The development of humanoid robots is increasingly dependent not just on hardware breakthroughs or AI models, but on a growing global workforce capturing the physical world on camera. Across more than 50 countries, gig workers are now filming themselves performing everyday household tasks to generate training data for robots that are still years away from widespread deployment.
This model, led by startups such as Micro1, reflects a broader shift in how physical AI systems are built. Just as large language models relied on vast corpora of text scraped and labeled at scale, humanoid robots require detailed recordings of human interaction with objects in real-world environments. The difference is that this data must be created, not collected – and it is being produced inside people’s homes.
Building the Data Layer for Physical AI
Humanoid robots face a fundamentally different challenge from software-based AI systems: they must operate in unstructured, unpredictable environments. Tasks such as folding laundry, loading dishwashers, or organizing shelves involve subtle variations that are difficult to simulate or script.
To address this, companies are assembling large datasets of human activity, capturing how people manipulate objects in real settings. Workers are paid to record themselves performing routine tasks, often wearing cameras that track hand movements, object interactions, and spatial context.
The resulting footage forms the foundation for training robot perception and control systems. Companies such as Scale AI have already accumulated tens of thousands of hours of such material, while platforms like DoorDash have begun experimenting with letting gig workers contribute training data alongside their primary work.
This emerging pipeline suggests that physical AI will depend on a new category of data infrastructure – one that extends beyond digital content into the physical behaviors of human workers.
A Familiar Economic Structure in a New Domain
The economics of this system closely resemble those of earlier phases of the AI industry. Workers contributing data are typically paid hourly rates that are competitive within their local economies but represent a small fraction of the value generated downstream.
Participants receive no ownership over the data they produce and no share in the long-term value of the models trained on it. As humanoid robotics companies attract billions in investment, the gap between capital allocation and labor compensation is becoming more pronounced.
This structure mirrors the development of computer vision and natural language processing systems, where data labeling and annotation were outsourced globally. The key difference is that physical AI requires more invasive forms of data collection, capturing not just digital inputs but lived environments.
The result is a new layer of the gig economy, one that sits beneath the visible robotics industry and provides the raw material for its progress.
Privacy Risks Move Into the Home
Unlike earlier data pipelines, which largely relied on public or platform-generated content, the data used to train humanoid robots is often recorded in private spaces. Videos capture kitchen layouts, household items, and other details that collectively form a granular map of domestic life.
This raises questions about data ownership, consent, and long-term storage. Workers may have limited visibility into how their recordings are used, whether they are anonymized, or how long they are retained. The implications extend beyond individual privacy to broader concerns about the creation of large-scale visual datasets of private environments.
Researchers in human-centered computing have emphasized the need for clearer disclosure and safeguards, but industry practices remain inconsistent. As the volume of collected data grows, so too does the potential risk associated with breaches, misuse, or secondary applications.
The reliance on gig workers to generate training data underscores a central reality of humanoid robotics: progress depends not only on engineering advances, but on access to large-scale, real-world human behavior.
This data-centric approach may accelerate development, but it also introduces new questions about labor, ownership, and privacy. As physical AI moves closer to commercial deployment, the systems being built will increasingly reflect not just technological innovation, but the global infrastructure of work that supports them.