This repository contains the official codebase for Structure from Silence: Learning Scene Structure from Ambient Sound. [Project Page]
To set up the environment, simply run:

```bash
conda env create -f environment.yml
conda activate SFS
```
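Optionally, you can sanity-check the freshly activated environment. That `environment.yml` ships PyTorch with CUDA support is an assumption here, not something the file is guaranteed to contain:

```python
# Quick sanity check after `conda activate SFS`.
# Assumes environment.yml installs PyTorch (an assumption, not verified here).
import torch

print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```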
We collected **Static recording** data in 46 classrooms across 12 buildings on the University of Michigan's campus, amounting to approximately 200 minutes of audio. Inside each classroom, we selected 16–30 positions and recorded 10 seconds of audio at each one. The camera and microphone were pointed toward the nearest wall during recording so that the distance is well defined. Our data can be downloaded from Here. You can download the processed dataset by running:

```bash
cd Dataset/SFS-Static
sh ./download_static.sh
```
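To sanity-check the download, here is a minimal sketch for loading one clip. The `*.wav` layout under `Dataset/SFS-Static` is an assumption about the processed dataset; adjust the glob pattern to whatever `download_static.sh` actually produces.

```python
import glob

import torchaudio  # assumed to be available in the SFS environment

# Hypothetical layout; adjust the pattern to what download_static.sh produces.
wav_paths = sorted(glob.glob("Dataset/SFS-Static/**/*.wav", recursive=True))
print(f"found {len(wav_paths)} clips")

if wav_paths:
    waveform, sr = torchaudio.load(wav_paths[0])  # (channels, samples)
    print(f"shape={tuple(waveform.shape)}, sr={sr}, "
          f"duration={waveform.shape[1] / sr:.1f}s")  # should be ~10 s per clip
```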
We collected **Motion recording** data consisting of approximately 90 minutes of video recorded in motion (222 videos in total). During recording, the microphone and RGB-D camera move toward or away from a wall. Our data can be downloaded from Here. You can download the processed dataset by running:

```bash
cd Dataset/SFS-Motion
sh ./download_motion.sh
```
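Similarly, a quick way to verify the motion data is to total up the audio durations. Again, the file layout below is an assumption about the processed dataset, not a documented guarantee:

```python
import glob

import torchaudio  # assumed to be available in the SFS environment

# Hypothetical layout; adjust the pattern to what download_motion.sh produces.
paths = sorted(glob.glob("Dataset/SFS-Motion/**/*.wav", recursive=True))

total_seconds = 0.0
for path in paths:
    info = torchaudio.info(path)  # reads metadata only, no full decode
    total_seconds += info.num_frames / info.sample_rate

# The collection totals roughly 90 minutes across 222 videos.
print(f"{len(paths)} recordings, {total_seconds / 60:.1f} minutes of audio")
```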
We release several models trained with our proposed methods on our collected dataset. We hope they will benefit the research community.
| Model | Dataset | URL |
|---|---|---|
| static-obstacle-detection | Static Recording | url |
| static-relative-depth-order | Motion Recording | url |
| motion-obstacle-detection | Motion Recording | url |
| motion-relative-depth-order | Motion Recording | url |
| AV-order-pretext | Motion Recording | url |
To download all the checkpoints, simply run:

```bash
sh ./scripts/download_models.sh
```
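If you want to poke at a downloaded checkpoint before wiring it into a script, plain PyTorch is enough. The `'state_dict'` key is a common convention and only an assumption here; inspect the printed keys first. The path below is the one used by the linear-probing command later in this README:

```python
import torch

# Path taken from the linear-probing command below; other checkpoints are analogous.
ckpt = torch.load("pretrained-models/motion-avorder.pth.tar", map_location="cpu")

# .pth.tar files are usually plain dicts; inspect the keys before assuming a layout.
print(list(ckpt.keys()))

# Common convention (an assumption here): weights live under 'state_dict'.
state_dict = ckpt.get("state_dict", ckpt)
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```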
We provide training and evaluation scripts under `scripts`. Please check each bash file and run `chmod +x xx.sh` before executing it.
- To train the obstacle detection model on Static recordings, run:

  ```bash
  ./scripts/training/train-static-obstacle.sh
  ```

- To train the relative depth order model on Static recordings, run:

  ```bash
  ./scripts/training/train-static-relative-depth.sh
  ```

- To train the obstacle detection model on Motion recordings, run:

  ```bash
  ./scripts/training/train-motion-obstacle.sh
  ```

- To train the relative depth order model on Motion recordings, run:

  ```bash
  ./scripts/training/train-motion-relative-depth.sh
  ```

- To train the AV-Order model on Motion recordings, run:

  ```bash
  ./scripts/training/train-motion-avorder.sh
  ```
- To perform a linear probing experiment on a downstream task, here is an example for the relative depth order task:

  ```bash
  CUDA_VISIBLE_DEVICES=0 python main_lincls.py --exp='Audio-LinCLS-relative-depth' \
      --epochs=40 --setting='av_lincls_RD' --input='audio' \
      --batch_size=256 --num_workers=8 --save_step=1 --valid_step=1 --lr=0.01 \
      --optim='SGD' --repeat=50 --schedule='cos' --aug_wave --freeze \
      --resume='pretrained-models/motion-avorder.pth.tar'
  ```
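For context, `--freeze` together with `main_lincls.py` corresponds to standard linear probing: the pretrained encoder stays frozen and only a linear classifier is trained on its features. Below is a minimal, generic PyTorch sketch of that procedure; `encoder`, `feat_dim`, and `loader` are placeholders, not classes from this repo:

```python
import torch
import torch.nn as nn

def linear_probe(encoder: nn.Module, feat_dim: int, num_classes: int, loader):
    """Train a linear head on frozen features (generic sketch, not repo code)."""
    # Freeze the pretrained encoder: no gradients, eval-mode statistics.
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad = False

    head = nn.Linear(feat_dim, num_classes)
    optim = torch.optim.SGD(head.parameters(), lr=0.01, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    for audio, labels in loader:
        with torch.no_grad():  # features come from the frozen encoder
            feats = encoder(audio)
        loss = loss_fn(head(feats), labels)
        optim.zero_grad()
        loss.backward()
        optim.step()
    return head
```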
All test scripts should be run under the parent path, and you can change the checkpoint in each bash file.

- To evaluate our `static-obstacle-detection` model, simply run `./scripts/test/test-static-obstacle.sh`.
- To evaluate our `static-relative-depth-order` model, simply run `./scripts/test/test-static-relative-depth.sh`.
- To evaluate our `motion-obstacle-detection` model, simply run `./scripts/test/test-motion-obstacle.sh`.
- To evaluate our `motion-relative-depth-order` model, simply run `./scripts/test/test-motion-relative-depth.sh`.
If you find our project useful, please consider citing:

```bibtex
@inproceedings{chen2021structure,
  title={Structure from Silence: Learning Scene Structure from Ambient Sound},
  author={Ziyang Chen and Xixi Hu and Andrew Owens},
  booktitle={5th Annual Conference on Robot Learning},
  year={2021},
  url={https://openreview.net/forum?id=ht3aHpc1hUt}
}
```
This work was funded in part by DARPA Semafor and Cisco Systems. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.