In-Home Daily-Life Captioning Using Radio Signals

Lijie Fan*      Tianhong Li*      Yuan Yuan      Dina Katabi
Massachusetts Institute of Technology
(* indicates equal contribution)


This paper aims to caption daily life, i.e., to create a textual description of people's activities and interactions with objects in their homes. Addressing this problem requires novel methods beyond traditional video captioning, as most people would have privacy concerns about deploying cameras throughout their homes. We introduce a new model for captioning daily life by analyzing privacy-preserving radio signals in the home together with the home's floormap. Our model can caption people's lives even through walls and occlusions, and in dark settings. In designing it, we exploit the ability of radio signals to capture people's 3D dynamics, and use the floormap to help the model learn people's interactions with objects. We also use a multi-modal feature-alignment training scheme that leverages existing video-based captioning datasets to improve the performance of our radio-based captioning model. Extensive experimental results demonstrate that our model generates accurate captions under visible conditions. It also sustains its good performance in dark or occluded settings, where video-based captioning approaches fail to generate meaningful captions.
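The multi-modal feature-alignment idea mentioned above can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: it assumes the training scheme pulls features extracted from radio signals toward paired features from a video-based captioning model, here modeled as a simple mean-squared-distance loss on toy feature matrices.

```python
import numpy as np

def alignment_loss(rf_feats, video_feats):
    # Mean squared distance between paired RF and video features
    # (an illustrative stand-in for the paper's alignment objective).
    return float(np.mean((rf_feats - video_feats) ** 2))

# Toy paired features: batch of 4 clips, 8-dim embeddings (shapes are
# hypothetical, chosen only for illustration).
rng = np.random.default_rng(0)
video_feats = rng.normal(size=(4, 8))  # from a video captioning encoder
rf_feats = rng.normal(size=(4, 8))     # from a radio-signal encoder

loss_before = alignment_loss(rf_feats, video_feats)

# One gradient step on the RF features toward the video features,
# showing how minimizing the loss aligns the two feature spaces.
grad = 2.0 * (rf_feats - video_feats) / rf_feats.size
rf_feats = rf_feats - 0.5 * grad

loss_after = alignment_loss(rf_feats, video_feats)
```

In the full system, the gradient would of course update the RF encoder's weights rather than the features directly; the point here is only that the alignment term lets labeled video data supervise the radio branch.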


Demo Video:

Talk Video:


Coming soon.

Also check out:

Learning Longterm Representations for Person Re-Identification Using Radio Signals
L. Fan*, T. Li*, R. Fang*, R. Hristov, Y. Yuan, and D. Katabi
Computer Vision and Pattern Recognition (CVPR), 2020

Through-Wall Human Pose Estimation using Radio Signals
M. Zhao, T. Li, M. Alsheikh, Y. Tian, H. Zhao, A. Torralba, and D. Katabi
Computer Vision and Pattern Recognition (CVPR), 2018