Everyday Objects Rearrangement in a Human-Like Manner via Robotic Imagination and Learning from Demonstration
The rearrangement of objects is an essential task in daily human life. Humans subconsciously break such tasks down into three components: perception, reasoning, and execution, which are resolved automatically. This process represents a significant challenge for robots, which must apply complex reasoning to process all the information and successfully execute the task. In this research, we propose a solution that performs this task in a human-like manner. For that purpose, we developed a modular algorithm that provides the capability to observe and understand the scene, imagine the best solution and execute it, following human-like reasoning. This is done by combining a zero-shot deep learning model for perception, a zero-shot large diffusion model to provide a human-like final scene, and a Learning from Demonstration algorithm for execution. To test the performance, we carried out several experiments to verify the correct resolution of the rearrangement task. To this end, we evaluated the quality of the generated final scenes, the correctness of the paths learned from human demonstrations and, finally, the complete pipeline with two different robots in simulated and real environments. The results obtained demonstrate the adaptability of our algorithm to different environments, objects and robots.
The model presented is divided into three different modules:
- Observation Module: detects the elements of the environment using YOLOv8.
- Imagination Module: generates a final-state image from the detected elements using Stable Diffusion.
- Execution Module: applies a proprietary Learning from Demonstration algorithm, based on TP-GMM and the Kullback-Leibler divergence, that only takes relevant frames into account.
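As a rough illustration, the sketch below shows how the three modules chain together. The function names are hypothetical placeholders and do not correspond to the identifiers used in this repository.

```python
# Hypothetical sketch of the three-module pipeline; the actual entry point is
# Controlller.py and its internal names differ.

def observe(image_path):
    """Observation: detect the objects in the top-view image (YOLOv8)."""
    return ["cup", "fork", "plate"]  # placeholder detections

def imagine(objects):
    """Imagination: build a prompt from the detections and generate the final scene."""
    return "top view of a tidy table with a " + ", a ".join(objects)  # placeholder scene

def execute(initial_image, final_scene):
    """Execution: solve each pick-and-place sub-task with the TP-GMM-based LfD model."""
    print(f"Rearranging '{initial_image}' towards: {final_scene}")

execute("top_view.png", imagine(observe("top_view.png")))
```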
To use it on your device, follow the installation steps below.
Requirements:
- Python 3.10.0 or higher
It is highly recommended to install all the dependencies in a new virtual environment. For more information, check the conda documentation for installation and environment management. To create the environment, use the following commands in the terminal.
conda create -n relocateEnv python=3.10.0
conda activate relocateEnv
Clone the repository onto your system.
git clone https://github.com/AdrianPrados/klTPGMM.git
For each of the modules it is necessary to install different requirements:
Follow the instructions in Yolov8.
Note: This module requires the use of a RealSense D435 (or any other RealSense model).
Follow the instructions in StableDiffusion
For that module it is necessary to install the dependencies listed in executionReq.txt:
cd Task_Parameterized_Gaussian_Mixture_Model
pip install -r executionReq.txt
This process starts with the acquisition of a top-view image of the environment. The information extracted from this image is received by the Imagination module, which turns the detected objects into prompts for the Stable Diffusion model that generates the final-state scene.
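As an illustration, the following sketch reproduces this hand-off using the public ultralytics and diffusers APIs; the exact weights, prompt template and post-processing used in this repository may differ.

```python
# Sketch of the Observation -> Imagination hand-off; the model checkpoints and
# prompt template are illustrative assumptions, not the repository's exact setup.
import torch
from ultralytics import YOLO
from diffusers import StableDiffusionPipeline

# Observation: detect the objects present in the top-view image with YOLOv8
detector = YOLO("yolov8n.pt")
result = detector("top_view.png")[0]
labels = [result.names[int(c)] for c in result.boxes.cls]

# Imagination: turn the detections into a prompt and generate a candidate final scene
prompt = "top view of a tidy table with " + ", ".join(labels)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe(prompt).images[0].save("imagined_scene.png")
```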
To perform a correct rearrangement, the objects in the $\mathcal{S}_{o}$ state must end up as the objects in the $\mathcal{S}_{f}$ state. The transition between the two states is not direct, as the rearrangement task is broken down into a series of sub-tasks that allow the whole scene to be shaped correctly. To carry out this process in the most human-like way, a module has been created that, by means of Learning from Demonstration, is able to perform each of the established sub-tasks. For this purpose, a proprietary algorithm based on TP-GMM has been created, to which a relevant-frame selector has been added to optimise and improve the result for each of the actions and tasks to be performed. Prior to this process it is important to establish the optimal order of resolution, which is considered to be the order in which the objects do not collide during the state transitions. For this purpose, a sorting and manipulation-order selection logic has been implemented. Once this Execution process is finished, the task is considered successfully completed when the organisation obtained in the real environment matches the one produced by the Imagination module, starting from the state obtained by the Observation module.
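Since the relevant-frame selector is based on the Kullback-Leibler divergence, the sketch below shows the closed-form KL divergence between two multivariate Gaussians, the quantity such a selector relies on; how the divergence is thresholded and coupled with TP-GMM in this repository may differ from this illustration.

```python
# Closed-form KL divergence between two multivariate Gaussians; the selection
# threshold and its coupling with TP-GMM here are illustrative, not the
# repository's exact implementation.
import numpy as np

def gaussian_kl(mu0, sigma0, mu1, sigma1):
    """KL( N(mu0, sigma0) || N(mu1, sigma1) ) for d-dimensional Gaussians."""
    d = mu0.shape[0]
    inv_sigma1 = np.linalg.inv(sigma1)
    diff = mu1 - mu0
    return 0.5 * (
        np.trace(inv_sigma1 @ sigma0)
        + diff @ inv_sigma1 @ diff
        - d
        + np.log(np.linalg.det(sigma1) / np.linalg.det(sigma0))
    )

# A frame whose local Gaussian barely changes across demonstrations (low KL)
# carries little task information and can be discarded as irrelevant.
mu_a, cov_a = np.zeros(2), np.eye(2)
mu_b, cov_b = np.array([0.05, 0.0]), 1.1 * np.eye(2)
print(gaussian_kl(mu_a, cov_a, mu_b, cov_b))
```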
To launch the algorithm, the RealSense camera must be connected. First, the element-detection part is launched, which obtains the objects in the environment that will generate the prompts. These prompts then trigger the generation step of the Imagination module. This section can be run directly, or a previously generated model with a large number of generated images can be used. After enough candidate images have been generated, the algorithm chooses one of the correct options, checks for collisions, and provides the solution paths through the Learning from Demonstration algorithm. By default, all results are plotted so that the user can visualise the results of each section. This can be disabled by commenting out the plt.show() call in the section that should not be displayed.
To execute all the code run:
python Controlller.py
If you use this code, please cite our works 😊
@article{mendez2024everyday,
title={Everyday Objects Rearrangement in a Human-Like Manner via Robotic Imagination and Learning from Demonstration},
author={Mendez, Alberto and Prados, Adrian and Menendez, Elisabeth and Barber, Ramon},
journal={IEEE Access},
year={2024},
publisher={IEEE}
}
Other interesting works that use the same idea 🤖
@inproceedings{mendez2024user,
title={User-guided framework for scene generation using diffusion models},
author={Mendez, Alberto and Prados, Adrian and Fernandez, Noelia and Espinoza, Gonzalo and Barber, Ramon},
booktitle={2024 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC)},
pages={22--27},
year={2024},
organization={IEEE}
}
@inproceedings{prados2024f,
title={f-Divergence Optimization for Task-Parameterized Learning from Demonstrations Algorithm},
author={Prados, Adrian and Mendez, Alberto and Espinoza, Gonzalo and Fernandez, Noelia and Barber, Ramon},
booktitle={2024 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC)},
pages={9--14},
year={2024},
organization={IEEE}
}