An integration of the Segment Anything Model (SAM), Molmo, and Whisper to segment objects using voice and natural language.

sovit-123/SAM_Molmo_Whisper

SAM_Molmo_Whisper

Note: The project is in its very early stages and will change drastically in the near future. Things may break.

A simple integration of the Segment Anything Model (SAM), Molmo, and Whisper to segment objects using voice and natural language.

Capabilities:

  • Segment objects with SAM2.1 using point prompts.
  • Obtain the point prompts by querying Molmo in natural language. The query can be typed into the text box, or spoken into the microphone and transcribed by Whisper (speech to text).
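The two capabilities above chain together as speech or text, then Molmo points, then SAM masks. The sketch below covers only the glue between Molmo and SAM, and it rests on assumptions: that Molmo answers with `<point x=".." y=".." ...>` or `<points x1=".." y1=".." x2=".." y2=".." ...>` tags, and that those coordinates are percentages (0 to 100) of the image width and height. The helper names `molmo_points_to_pixels` and `to_sam_prompts` are hypothetical and are not taken from this repository's `app.py`.

```python
import re
from typing import List, Tuple

# Matches one opening <point ...> or <points ...> tag (closing tags start with "</" and are skipped).
_TAG_RE = re.compile(r"<points?\b[^>]*>")
# Matches attribute pairs such as x="61.5", y="40.6", x1="10", y2="40" inside a tag.
_COORD_RE = re.compile(r'\b([xy])(\d*)="(\d+(?:\.\d+)?)"')


def molmo_points_to_pixels(text: str, width: int, height: int) -> List[Tuple[float, float]]:
    """Convert Molmo point tags (assumed to be in percent units) to pixel (x, y) pairs."""
    points: List[Tuple[float, float]] = []
    for tag in _TAG_RE.findall(text):
        xs, ys = {}, {}
        for axis, idx, value in _COORD_RE.findall(tag):
            (xs if axis == "x" else ys)[idx] = float(value)
        # Pair x and y values that share an index: "" for a single point, "1", "2", ... otherwise.
        for idx in sorted(xs, key=lambda k: int(k) if k else 0):
            if idx in ys:
                points.append((xs[idx] * width / 100.0, ys[idx] * height / 100.0))
    return points


def to_sam_prompts(points: List[Tuple[float, float]]) -> Tuple[List[List[float]], List[int]]:
    """Build SAM-style point prompts: one [x, y] per point and label 1 (foreground) for each."""
    coords = [[x, y] for x, y in points]
    labels = [1] * len(points)
    return coords, labels
```

With SAM2 installed, these lists would typically be wrapped in NumPy arrays and passed as `point_coords` and `point_labels` to `SAM2ImagePredictor.predict` after calling `set_image`; check the SAM2 repository for the exact signature in the version you install.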

Run the Gradio demo using:

python app.py

Demo video: sam2_molmo_whisper-2024-10-11_07.09.47.mp4

What's New

October 30, 2024

  • Added a tabbed interface for video segmentation. The process is the same as for images: prompt via text or voice, upload a video, and get the segmentation maps of the objects.

Setup

Clone Repo

git clone https://github.com/sovit-123/SAM_Molmo_Whisper.git
cd SAM_Molmo_Whisper

Installing Requirements

Install PyTorch, Hugging Face Transformers, and the rest of the base requirements:

pip install -r requirements.txt

Install SAM2

It is highly recommended to clone SAM2 into a directory outside this project directory and run the installation commands there:

git clone https://github.com/facebookresearch/sam2.git && cd sam2

pip install -e .

To Use CLIP Auto Labelling

After installing the requirements, install spaCy's en_core_web_sm model:

spacy download en_core_web_sm
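Before running the app, a quick import check can confirm that each setup step above took effect. The sketch below uses only the standard library (`importlib.util.find_spec`). The module list is an assumption based on the packages this README mentions (PyTorch as `torch`, Transformers as `transformers`, SAM2 as `sam2`, spaCy and its `en_core_web_sm` model, and Gradio as `gradio`); it is not read from `requirements.txt`, and `check_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Dict, Iterable

# Assumed top-level import names for the packages mentioned in this README.
EXPECTED_MODULES = ("torch", "transformers", "sam2", "spacy", "en_core_web_sm", "gradio")


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Return {module_name: importable?} by locating each module without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules(EXPECTED_MODULES).items():
        print(f"{name:16s} {'OK' if ok else 'MISSING'}")
```

Saved under a name of your choice (for example, `check_setup.py`) and run with `python check_setup.py`, it prints OK or MISSING per module. Because `find_spec` only locates a module, a package that is present but fails at import time (for example, a PyTorch build that does not match the installed CUDA) would still show as OK.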

Run the App

python app.py
