source: techcrunch ai: collecting robot training data is dirty, unglamorous work. some ai labs are already paying xdof to do it.

level: business

openai is restarting its robotics program, and other frontier ai labs are racing to teach machines to operate in the physical world. but robots need training data that captures physical interaction, and that data barely exists. unlike language models, which can train on public text, robotics models need data gathered through teleoperation, sensors, and real-world feedback loops. xdof, a startup emerging from stealth, aims to fill this gap by building data pipelines, collection tools, and annotation systems for robotics companies.

xdof was founded by philipp wu, fred shentu, and nemo jin after wu's phd work at uc berkeley revealed a chicken-and-egg problem: no large-scale data existed to train foundation models for robotics. their earlier project, gello, created a low-cost teleoperation system for generating training data. now xdof plans to work across three data tiers: teleoperation on target robots, general teleoperated data, and egocentric data from humans wearing sensors. the company will hire and train operators worldwide to produce this data.

the startup has raised $70 million from thrive capital, spark capital, a16z, lux, and wndrco. it already works with 20 customers, including unnamed frontier ai labs. xdof is also releasing abc, a large dataset with 130,000 robot manipulation trajectories, 300 hours of simulation, and 100 hours of evaluations, in partnership with uc berkeley. the company argues that most ai labs would rather outsource the capital-intensive work of maintaining robot warehouses and training operators, creating a market for dedicated data infrastructure.

why it matters: high-quality robot training data is scarce and hard to produce, so dedicated data providers could accelerate physical ai development by letting labs focus on models instead of logistics.

