4.
Choosing storage formats for robot learning data
Robot learning datasets stress storage systems differently from analytics workloads, forcing tradeoffs among Parquet, MCAP, Lance, NCore, and purpose-built formats
Robot-learning datasets are multi-rate and multimodal by default, and .rrd is designed around that reality rather than treating logs as ordinary videos or tables
LLMs are trained on web data. Physical AI is trained on physical data. Physical data is different from web data. It is multi-rate: cameras might run at 10-30 Hz, joint angles at 100-200 Hz, and IMUs at 1 kHz. It is also multimodal: one stre