Every robotics and world model company is hitting the same wall: not enough real-world data. AURI turns billions of earbud users into the world's largest egocentric data collection network — while giving consumers cinematic video they actually want. Current collection costs $75+/hr. Ours approaches zero.
Foundation models for robotics, world simulation, and embodied intelligence require massive amounts of real-world egocentric data. Current collection methods can't scale.
Research labs hire people to perform tasks while wearing cameras. Expensive, limited diversity, artificial behavior. It doesn't scale beyond a few thousand hours.
Teleoperated data only covers lab environments — kitchens, tabletops, warehouses. It misses the vast diversity of real human life: commuting, cooking, exercising, socializing.
AURI X1 is a pair of camera earbuds — fisheye lenses mounted at the ear, the most biomechanically stable point on the human body. Users wear them to exercise, commute, and live their lives while listening to music. The cameras capture continuous wide-angle egocentric video.
Our AI pipeline transforms this raw footage into structured embodied data: skeleton trajectories, action semantics, object interactions, 3D scene reconstructions. Data that robotics companies can directly use for training.
Users get something they want — AI-generated cinematic third-person video of their activities. We get something the industry needs — diverse, real-world egocentric data at near-zero marginal cost.
Ear-mounted fisheye cameras capture 170° egocentric video. Built-in IMU tracks head motion. High-quality earbuds make it worth wearing all day.
Our pipeline extracts skeleton pose, hand-object interactions, action segmentation, camera ego-motion, and scene geometry from raw fisheye video.
Structured data is packaged per customer spec and delivered via API. Users receive AI-rendered cinematic videos. Data buyers get training-ready datasets.
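For illustration, a minimal sketch of what one training-ready record delivered through that API could look like. The field names, units, and structure here are assumptions for this example, not AURI's actual delivery schema.

```python
# Hypothetical record layout for a single distilled clip; not AURI's real schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SkeletonFrame:
    timestamp_ms: int
    joints_xyz: List[List[float]]        # e.g. 17 joints x (x, y, z) in meters, camera frame

@dataclass
class ActionSegment:
    start_ms: int
    end_ms: int
    label: str                           # e.g. "pour_water", "open_door"
    objects: List[str] = field(default_factory=list)   # objects the hands interact with

@dataclass
class EgocentricClip:
    clip_id: str
    duration_ms: int
    skeleton: List[SkeletonFrame]        # per-frame skeleton trajectory
    actions: List[ActionSegment]         # action / task-phase annotations
    ego_motion: List[List[float]]        # per-frame 6-DoF camera pose
```

A customer schema would remap or subset these fields; the point is that buyers receive structured records like this rather than raw video.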
Wear AURI while running, cycling, cooking, or traveling. AI transforms egocentric footage into cinematic third-person video — follow-cam, orbit shots, hero moments.
Every minute of user activity generates training data for robotics, world models, and embodied AI. Our distillation pipeline outputs structured data mapped to customer schemas.
The ear is the body's most stable mounting point during locomotion. Unlike the forehead (glasses shake), chest (breathing), or wrist (arm swing), the ear sits at the natural pivot of head movement — the body's biological gimbal.
A single fisheye lens at the ear captures forward environment, peripheral context, AND the wearer's own shoulders, arms, and legs — simultaneously. This dual visibility is what makes ego-to-exo reconstruction possible.
500M+ people already wear earbuds daily. No new behavior required. No social stigma of camera glasses. Invisible, natural, all-day wearable. The best data collection device is one people already want to use.
Skeleton extraction from partial body visibility + IMU fusion. Fisheye distortion-aware models trained on our proprietary ego-view data.
Volumetric 3D scene via Gaussian Splatting. Camera ego-motion estimation. Object detection and tracking across frames.
VLM-powered action segmentation, hand-object interaction graphs, task phase annotation. Structured output mapped to customer schema.
Ego-to-exo view synthesis. Autonomous cinematography: follow-cam, orbit, hero shot. Neural rendering with style transfer for consumer output.
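Taken together, the four subsystems above form one distillation pass per clip. Below is a minimal sketch of how that pass could compose, with hypothetical stage names and signatures assumed for illustration; this is not AURI's code.

```python
# Hypothetical orchestration of the distillation pipeline described above.
from typing import Any, Dict, Tuple

def estimate_skeleton(video: Any, imu: Any) -> Dict:
    """Skeleton pose from partial body visibility fused with IMU head motion."""
    ...

def reconstruct_scene(video: Any) -> Dict:
    """Volumetric scene (e.g. Gaussian Splatting) plus camera ego-motion."""
    ...

def segment_actions(video: Any, pose: Dict, scene: Dict) -> Dict:
    """VLM-driven action segmentation and hand-object interaction graphs."""
    ...

def render_exocentric(pose: Dict, scene: Dict) -> Any:
    """Ego-to-exo view synthesis for the consumer-facing cinematic video."""
    ...

def distill(video: Any, imu: Any) -> Tuple[Dict, Any]:
    pose = estimate_skeleton(video, imu)
    scene = reconstruct_scene(video)
    semantics = segment_actions(video, pose, scene)
    exo_video = render_exocentric(pose, scene)
    # One pass yields two products: structured data for buyers, cinematic video for the wearer.
    return {"pose": pose, "scene": scene, "semantics": semantics}, exo_video
```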
85 patent claims filed under PPA 63/999,137 covering 16 subsystems including: ego-to-exo synthesis, autonomous cinematography, hardware-rooted reality proof, predictive safety, multi-person collaborative sensing, and embodied AI data engine.
Tesla's insight wasn't the car — it was turning millions of drivers into free data labelers. Every mile driven improves FSD for everyone. AURI applies the same logic: every minute a user wears our earbuds generates egocentric data that trains world models and robot AI.
"20,854 hours of egocentric video → 54% improvement in robot dexterity.
Log-linear scaling with R²=0.9983. No saturation observed."
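For reference, "log-linear scaling" here means downstream capability grows linearly in the logarithm of data hours, roughly of the form below. The offset a and slope b are whatever the quoted fit produced; their values are not stated in this deck.

```latex
% Log-linear scaling: capability grows linearly in log(data hours).
% a (offset) and b (slope) are fit parameters, not given here.
\mathrm{dexterity}(h) = a + b \log h, \qquad h = \text{hours of egocentric training video}
```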
More users → more diverse data → better AI output → better consumer experience → more users. The flywheel that makes egocentric data abundant and cheap while competitors pay $75+/hr.
Raw ego video is a commodity. Our proprietary pipeline that extracts structured embodied data — skeleton trajectories, action semantics, interaction graphs — from fisheye ego-view is the real barrier. Optimized for our hardware's specific optical characteristics.
Hardware is the entry point. Subscription is retention. Data licensing is where the real value compounds.
Kickstarter at $299, retail $399. BOM target under $80. The device consumers want to wear. The data collection infrastructure enterprises need deployed.
$9.99/mo for ego-to-exo cinematic rendering, cloud processing, AI-triggered highlights, and 3D reconstruction features.
Anonymized, consented egocentric data sold to robotics and world model companies. Per-hour pricing: structured data at a premium, raw video at standard rates. Custom schemas for enterprise.
Forbes 30 Under 30 (Consumer Technology). Youngest-ever Red Dot Design Award: Best of the Best recipient (age 19). 33 granted patents. Tsinghua University. Founded Nums — smart trackpad with global distribution, Unbox Therapy feature (1.7M+ views). Deep Shenzhen hardware supply chain expertise.
Stanford CS. Co-created Oasis — first real-time playable world model. Former researcher at World Labs (Fei-Fei Li). ICLR 2026 first author (Percy Liang). Chose to join AURI over MIT, Berkeley, and Stanford PhD offers.
Ex-Google 7 years (PM, Mountain View). UCLA Anderson MBA. Kickstarter launch experience. Deep Shenzhen supply chain network. Manages hardware production and vendor relationships.
Avin Wang (Meta Superintelligence Labs) · Yilin Zhu (Apple ML, Stanford CS PhD) · Growing network across Stanford, Tsinghua, and Silicon Valley.
The world's robots will learn from the world's people.
We're building the infrastructure to make that happen.
Shawn Gong — CEO / Founder