Linok Sergey · Tatiana Zemskova · Svetlana Ladanova · Roman Titkov · Dmitry Yudin · Maxim Monastyrny · Aleksei Valenkov
Hardware requirements: at least 10 GB of VRAM to run mapping, and at least 16 GB of VRAM to run the local LLM and vLLM.
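To check whether a machine meets these thresholds before starting, a minimal sketch along the following lines may help. It assumes an NVIDIA GPU with `nvidia-smi` available on the PATH. The helper names `parse_vram_mib`, `meets_requirement`, and `query_vram_mib` are illustrative and not part of this repository, and reading "GB" as GiB (1024 MiB) is an assumption.

```python
import shutil
import subprocess
from typing import List

# Thresholds taken from the requirements above.
# Assumption: "GB" is interpreted as GiB (1024 MiB).
MAPPING_GB = 10
LLM_GB = 16


def parse_vram_mib(smi_output: str) -> List[int]:
    """Parse one integer (MiB) per non-empty line of nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def meets_requirement(total_mib: int, required_gb: int) -> bool:
    """Return True if a GPU with total_mib MiB has at least required_gb GiB."""
    return total_mib >= required_gb * 1024


def query_vram_mib() -> List[int]:
    """Ask nvidia-smi for the total memory of every visible GPU, in MiB.

    Returns an empty list if nvidia-smi is missing or fails.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_vram_mib(result.stdout)


if __name__ == "__main__":
    for index, mib in enumerate(query_vram_mib()):
        print(
            f"GPU {index}: {mib} MiB | mapping ok: {meets_requirement(mib, MAPPING_GB)}"
            f" | LLM ok: {meets_requirement(mib, LLM_GB)}"
        )
```

The total reported by the driver can be slightly below a card's nominal size, so a near miss is a hint to investigate rather than a verdict.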
Download the Replica RGB-D scan dataset using the download script in Nice-SLAM. It contains trajectories rendered from the mesh models provided by the original Replica dataset.
For ScanNet, please follow the instructions in ScanNet.
Build the Docker image and create the container:
./docker/build.sh
./docker/start.sh <path_to_data_folder>
./docker/into.sh
Install the bbq library. Run this once per container:
pip install -e .
First, build the 3D scene representation. Check the config before running. Inside the container, run:
python3 main.py --config_path=examples/configs/replica/room0.yaml #Replica
python3 main.py --config_path=examples/configs/scannet/scene0011_00.yaml #ScanNet
To visualize the construction process:
python3 main.py --config_path=examples/configs/replica/room0.yaml --save_path=output
python3 visualize/show_construction.py --animation_folder=output
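When several scenes need to be mapped in a row, the commands above can be wrapped in a small driver. The following is a sketch only: `build_mapping_command` and `run_all` are hypothetical helpers, not part of this repository, and writing each scene to its own folder under `output_root` is an illustrative convention that assumes `--save_path` accepts an arbitrary folder. Only the `--config_path` and `--save_path` flags shown above are used.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional

# Config paths copied from the commands above; extend the list with your own scenes.
CONFIGS = [
    "examples/configs/replica/room0.yaml",
    "examples/configs/scannet/scene0011_00.yaml",
]


def build_mapping_command(config_path: str, save_path: Optional[str] = None) -> List[str]:
    """Assemble the main.py invocation shown above as an argument list."""
    command = ["python3", "main.py", f"--config_path={config_path}"]
    if save_path is not None:
        command.append(f"--save_path={save_path}")
    return command


def run_all(configs: List[str], output_root: Optional[str] = None) -> None:
    """Run mapping for each config, skipping configs that do not exist on disk."""
    for config in configs:
        if not Path(config).is_file():
            print(f"skipping missing config: {config}", file=sys.stderr)
            continue
        # Assumption: one output folder per scene, named after the config file.
        save_path = str(Path(output_root) / Path(config).stem) if output_root else None
        # check=True stops the batch at the first failing scene.
        subprocess.run(build_mapping_command(config, save_path), check=True)


if __name__ == "__main__":
    run_all(CONFIGS)
```

Run it from the repository root inside the container so that the relative config paths and `main.py` resolve.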
Set up Meta-Llama-3-8B-Instruct according to its docs.
# Llama3-8B
python3 query.py --scene_file=examples/scenes/replica/room0.json --model_path=<your_path>/Meta-Llama-3-8B-Instruct #Replica
python3 query.py --scene_file=examples/scenes/scannet/scene0011_00.json --model_path=<your_path>/Meta-Llama-3-8B-Instruct #ScanNet
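Before launching a query, it can save time to confirm that both inputs exist. The sketch below is a hypothetical pre-flight check, not part of this repository. It assumes that the scene file is plain JSON (as its extension suggests) and that `--model_path` points to a local directory holding the downloaded weights; it does not inspect the contents of either.

```python
import json
from pathlib import Path
from typing import List


def check_query_inputs(scene_file: str, model_path: str) -> List[str]:
    """Return a list of problems; an empty list means both inputs look usable."""
    problems = []
    scene = Path(scene_file)
    if not scene.is_file():
        problems.append(f"scene file not found: {scene_file}")
    else:
        try:
            # Only checks that the file parses as JSON, not its schema.
            json.loads(scene.read_text(encoding="utf-8"))
        except json.JSONDecodeError as error:
            problems.append(f"scene file is not valid JSON: {error}")
    # Assumption: the model is a local directory, not a remote identifier.
    if not Path(model_path).is_dir():
        problems.append(f"model directory not found: {model_path}")
    return problems


if __name__ == "__main__":
    # <your_path> is left as a placeholder, exactly as in the commands above.
    for problem in check_query_inputs(
        "examples/scenes/replica/room0.json",
        "<your_path>/Meta-Llama-3-8B-Instruct",
    ):
        print(problem)
```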
Our work builds on the codebase of the following paper: ConceptGraphs.
If you find this work helpful, please consider citing it as:
@misc{linok2024barequeriesopenvocabularyobject,
title={Beyond Bare Queries: Open-Vocabulary Object Grounding with 3D Scene Graph},
author={Sergey Linok and Tatiana Zemskova and Svetlana Ladanova and Roman Titkov and Dmitry Yudin and Maxim Monastyrny and Aleksei Valenkov},
year={2024},
eprint={2406.07113},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2406.07113},
}
Please create an issue on this repository for questions, comments, and bug reports. For other inquiries, send an email to Linok Sergey.