
Google Makes $460 From Your Data. You Get $2.

Technology platforms extract enormous value from user data while compensating the sources minimally or not at all.

From December 2024 through February 2025, Anker's eufy security camera division offered customers $2 per video for footage of package thefts and car break-ins. The company needed training data for its AI, and eufy explicitly encouraged customers to stage the crimes if real ones weren't available. The program attracted over 120 participants who submitted thousands of videos, many of them filming dozens of staged theft scenarios to maximize their $200 payouts.

Meanwhile, Google generates approximately $460 annually from each US user's data. Meta earns $217 per US user. Your email address alone is worth $89 to brands over time. These aren't hypothetical valuations; they're actual advertising dollars paid by companies that want to reach you based on what these platforms know about your behavior.

So why is eufy paying $2 while Google captures $460? And why isn't anyone else paying you at all?

The Value Gap

The asymmetry is stark and quantifiable. On the consumer side, receipt-scanning apps like Fetch Rewards and Ibotta offer active users $3-10 monthly, or $36-120 annually. These apps frame the exchange as "cash back on purchases," never as "selling your data for AI training."

The math reveals the extraction rate: companies capture 90-95% of the value generated by consumer data. The people creating that data receive 5-10% at most, and the majority receive nothing.
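
To make that arithmetic concrete, here's the gap as a short calculation. The $460 figure and the 5-10% ceiling are the numbers cited above; treating them as a single per-user value pool is a simplifying assumption.

```python
# Figures cited in this article; the share scenarios are illustrative.
platform_value_per_user = 460        # Google's approx. annual value per US user

for consumer_share in (0.05, 0.10):  # the 5-10% ceiling noted above
    payout = platform_value_per_user * consumer_share
    print(f"{consumer_share:.0%} consumer share -> ${payout:.0f} per year")

# 5% consumer share -> $23 per year
# 10% consumer share -> $46 per year
# Most users see none of it, so the realized share is effectively 0%.
```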

This isn't a small market inefficiency. The AI training dataset market is projected to grow from $2.6 billion in 2024 to $9.6 billion by 2030. Yet direct consumer compensation remains extraordinarily rare.

Why Companies Don't Pay You Directly

The eufy program worked for the company's specific circumstances: focused scope, rare-event data, and self-imposed constraints from having marketed its cameras as "local storage only." But the model's extreme rarity reveals why it doesn't scale.

  1. Transaction costs exceed value for individual data points. Processing a $2 payment requires verification, payment infrastructure, fraud prevention, and customer service. For consumer-scale operations—millions or billions of users—these costs quickly exceed per-unit value. Professional data labeling services charge companies $6 per annotated image specifically because they handle all the operational complexity—verification, quality control, payments to annotators—that makes direct consumer payment uneconomical at scale.
  2. Legal compliance creates prohibitive friction. GDPR and CCPA mandate specific consent for each use of personal data. The FTC has explicitly warned that "quietly changing terms of service could be unfair or deceptive." Companies that collected data under one privacy regime cannot unilaterally decide to use it for AI training without fresh consent—triggering the compensation question.
  3. Better alternatives exist at every price point. Synthetic data generation costs approximately $0.06 per image versus $6 for professional human annotation, a 99% cost reduction that makes real consumer data economically unnecessary for many use cases. Gartner projected that 60% of AI training data would be synthetic in 2024, up from just 1% in 2021, with synthetic data expected to dominate outright by 2030. Why negotiate with millions of consumers when you can generate or purchase data more efficiently? The back-of-envelope sketch after this list puts numbers on that calculus.
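
A rough sketch of the unit economics behind points 1 and 3. The $2, $6, and $0.06 figures come from the list above; the $1.50 per-payment overhead is an illustrative assumption, not a reported number.

```python
payout_per_video = 2.00        # eufy-style direct consumer payment
overhead_per_payment = 1.50    # assumed cost of verification, fraud checks, support
human_annotation = 6.00        # professional labeling, per image
synthetic = 0.06               # synthetic generation, per image

all_in_direct = payout_per_video + overhead_per_payment
print(f"direct consumer payment, all-in: ${all_in_direct:.2f} per video")
print(f"synthetic alternative: ${synthetic:.2f} per image, "
      f"{synthetic / human_annotation:.0%} of the human-annotation cost")

# direct consumer payment, all-in: $3.50 per video
# synthetic alternative: $0.06 per image, 1% of the human-annotation cost
```

At those numbers, overhead alone nearly doubles the cost of paying people, while the synthetic route undercuts both.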

The Real Business Model

Walmart exemplifies how companies actually monetize data: collect through normal operations, sell insights business-to-business, compensate consumers indirectly if at all.

Walmart Data Ventures has grown its roster of supplier clients significantly, selling aggregated shopper insights to consumer packaged goods companies through platforms like Scintilla. The value proposition to suppliers: understand what customers actually buy, how products perform relative to competitors, which promotions drive sales, and which demographic segments respond to which marketing. Consumers contributed all the underlying data through their purchases but received zero dollars from these B2B transactions.

Health tech reveals an even more audacious model: consumers actually pay companies for the privilege of generating valuable training data. An Apple Watch costs $299-400+, Fitbit Premium runs $10 monthly, Whoop charges $20-30 monthly. Users pay these sums, then generate continuous health data that the companies use to train AI features.

The hospital-to-broker-to-developer pipeline operates with even less consumer visibility. Truveta has accumulated 120+ million patient records—approximately one-third of the U.S. population—making it one of the largest health data aggregators operating outside traditional HIPAA oversight. Truveta pays hospitals directly; patients receive absolutely nothing. Once properly de-identified per HIPAA standards, patient data can be sold commercially without patient notification or consent.

The LLM Double Extraction

The dynamics get even more striking when we look at the AI tools many of us use daily.

OpenAI and other LLM providers have perfected an elegant extraction model: charge users for access while using their interactions to improve models that generate enterprise revenue. ChatGPT Plus costs $20 monthly. Claude Pro costs $20 monthly. Users pay for access, then every conversation, every correction, every refinement potentially teaches the model.

The policies vary by provider—OpenAI's free tier uses conversations for training by default, while paid users have more control—but the fundamental dynamic remains consistent: consumer interactions improve models that generate far more revenue from enterprise customers than from individual subscriptions.

Microsoft's Copilot integration, Google's Gemini deployment, and Anthropic's Claude for Work all follow this pattern: consumer tier for training data collection and brand building, enterprise tier for revenue generation. The individual users teaching the model through conversations are compensated with better AI responses. The enterprises deploying those improved models pay six-figure annual contracts.

The synthetic data trajectory matters even more here. As LLMs increasingly train on AI-generated content, researchers warn of "model collapse"—deteriorating performance when AI trains exclusively on AI output. This makes human interaction data more valuable precisely as it becomes scarcer. Yet compensation mechanisms remain nonexistent.
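
A toy simulation shows the dynamic. Fit a trivial "model" (here, a Gaussian) to data, sample from the fit, refit on the samples, and repeat; averaged over many trials, the distribution's spread steadily shrinks. This is a caricature of the research, not a reproduction of it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_generations, n_trials = 20, 15, 1000

avg_sigma = np.zeros(n_generations)
for _ in range(n_trials):
    mu, sigma = 0.0, 1.0                       # generation 0: "human" data
    for g in range(n_generations):
        fake = rng.normal(mu, sigma, n_samples)  # sample from the last model
        mu, sigma = fake.mean(), fake.std()      # refit on synthetic output
        avg_sigma[g] += sigma / n_trials

for g, s in enumerate(avg_sigma, 1):
    print(f"generation {g:2d}: average sigma = {s:.3f}")

# The average sigma shrinks generation over generation: each refit
# discards a little of the original diversity, which is exactly why
# fresh human data stays valuable.
```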

The question for consumer LLM users isn't whether to use these tools—they're genuinely useful. It's whether we understand the exchange: we're not just customers; we're unpaid workers in a training operation generating billions in enterprise value.

The Coordination Problem No Individual Can Solve

Why don't market forces correct this asymmetry? The answer lies in coordination problems and power imbalances that individual action cannot solve.

Individual data is worth very little; aggregated data is worth everything. Your personal search history might be worth $8-10 monthly. But your search history combined with billions of other users' becomes the foundation for the world's most valuable advertising platform. The value doesn't scale linearly; it compounds through network effects.
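
A crude way to see this: suppose total network value grows with the square of the user count, a Metcalfe-style assumption rather than a measured law. Per-user value then grows in proportion to the network itself. The constant below is calibrated only so a billion-user network lands near the $460 figure.

```python
# Metcalfe-style toy model: total value ~ K * n^2, so per-user value ~ K * n.
# Both the functional form and the calibration are illustrative assumptions.
K = 4.6e-7

def per_user_value(n_users: int) -> float:
    return K * n_users            # (K * n^2 total) / n users

for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} users -> ${per_user_value(n):,.2f} per user per year")

#         1,000 users -> $0.00 per user per year
#     1,000,000 users -> $0.46 per user per year
# 1,000,000,000 users -> $460.00 per user per year
```

The same data point that's nearly worthless on its own becomes substantial inside the aggregate, which is precisely the value an individual can't capture alone.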

Information asymmetry operates at maximum intensity. Google knows precisely what your data is worth because it auctions ad inventory in real-time markets with transparent pricing. You have no idea because that information is proprietary and deliberately obscured. You can't negotiate effectively when you don't know the value of what you're trading.

Network effects create lock-in that prevents competition. Social networks are valuable because your friends use them. Switching to a hypothetical "paid for data" competitor means losing your social graph, email history, photos, and personalized results. The costs of switching exceed any compensation a competitor could offer.

This explains why Datacoup, which launched in 2014 as "the world's first personal data marketplace" promising members $8 monthly, quietly shut down after an acquisition around 2021-2022, having failed to achieve meaningful scale despite years of operation.

The Collective Alternative

If individual action fails and corporate self-regulation has demonstrably failed, data cooperatives represent the most promising alternative: collective bargaining power to address the asymmetry between platforms and individuals.

The model draws directly from labor union history. Individual workers have minimal negotiating power; collective bargaining creates countervailing power. Data cooperatives apply this logic to the digital economy.

Real-world implementations are emerging. Driver's Seat Cooperative pools gig worker data. DataUnion Foundation has attracted over 600 contributors through token-based rewards. CitizenMe has completed millions of data exchanges in which consumers are paid directly.

The cooperative structure solves problems individual payment schemes cannot. Collective bargaining power enables negotiating terms individuals cannot achieve. Governance rights give data subjects meaningful control over usage terms. Revenue sharing distributes the 90-95% currently captured by platforms more equitably. Transaction cost reduction through aggregation makes compensation economically viable.

Policy support is materializing. The EU Data Governance Act explicitly recognizes data cooperatives. California's proposed "data dividend" reflects growing political interest in mechanisms ensuring value flows to contributors.

Challenges remain formidable: achieving sufficient scale, standardizing crowdsourced data, navigating regulatory complexity across jurisdictions. But the cooperative model's historical success in agriculture, finance, and worker ownership suggests digital viability.

Why This Matters Now

Three forces are converging to make the current asymmetry unsustainable.

Consumer trust is eroding measurably. Confidence in the digital economy has declined significantly, with recent surveys showing that less than half of consumers now believe the benefits of being online outweigh the privacy concerns. Among 25-34 year-olds, 49% have already switched companies over data policies, compared to just 16% of those 65 and older. The generational divide suggests trust problems will intensify as digital natives gain purchasing power.

Regulatory enforcement has escalated from warnings to existential threats. GDPR fines totaled €1.2 billion in 2024, pushing cumulative fines since 2018 past €5.88 billion. AI-related enforcement now dominates: Italy's data protection authority fined OpenAI €15 million over ChatGPT's data practices, and France's competition authority fined Google €250 million, the first penalty tied specifically to AI training data.

Critically, the FTC can require deletion of models trained on unlawfully obtained data, creating existential risk beyond fines. This threat matters far more than financial penalties for trillion-dollar companies.

Technological alternatives reduce extraction necessity. Synthetic data, federated learning, and differential privacy enable AI development without invasive data collection. As these mature, the justification that extraction is necessary for functionality weakens.
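
For a flavor of what these techniques look like, here is a minimal sketch of the Laplace mechanism from differential privacy; the query, the data, and the epsilon value are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def private_count(values, epsilon=1.0):
    """Differentially private count: a counting query has sensitivity 1,
    so Laplace noise with scale 1/epsilon gives epsilon-DP."""
    return sum(values) + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Did each of ten users buy the product? (illustrative data)
purchases = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
print(f"true count: {sum(purchases)}")
print(f"private count: {private_count(purchases):.1f}")

# The aggregate stays useful while any single user's row remains
# plausibly deniable -- the kind of guarantee that weakens the case
# for collecting raw individual data at all.
```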

Where Will We Go From Here?

The eufy case matters not because it represents the future but because its exceptionalism proves the rule. Direct consumer payment can work technically—eufy demonstrated that. The reason companies don't compensate isn't technical impossibility; it's economic calculation.

The value gap—$460 to Google, $2 to you—persists because companies don't have to close it. Market forces don't correct power imbalances when network effects and switching costs prevent competition on data terms. Individual consumer action cannot solve collective problems.

But the current equilibrium isn't permanent. Consumer trust is eroding. Regulation is intensifying with real consequences. Cooperatives are organizing with policy support. Technology is evolving to reduce extraction dependence.

The question isn't whether your data is valuable—Google's $460 per user settles that. The question is who captures that value, and whether collective mechanisms and regulatory intervention can force redistribution from the current 90-95% platform capture toward something approaching fair compensation.

The next three years will determine whether data cooperatives achieve scale to negotiate effectively, whether regulatory frameworks enable collective bargaining, and whether enough consumers organize to create market pressure that individual action cannot.

The asymmetry will persist until external forces make it untenable. Those forces are building. Whether they build fast enough to shift outcomes before dominant models lock in permanently will determine whether we look back on this decade as when digital extraction peaked—or when it became permanent.

The next time you chat with an AI, scan a receipt, or search for something online, remember: you're not just a customer. You're generating value someone else captures. The question is whether we'll organize collectively to demand our share, or accept that digital extraction is simply how the internet works.