Note: This project is in its very early stages and will change drastically in the near future. Things may break.
A simple integration of the Segment Anything Model (SAM), Molmo, and Whisper to segment objects using voice and natural language.
Capabilities:
- Segment objects with SAM2.1 using point prompts.
- Points can be obtained by prompting Molmo with natural language. Molmo accepts input either from the text box (typing) or through Whisper via the microphone (speech to text).
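Molmo answers pointing prompts with coordinates embedded in its text output as `<point>`/`<points>` tags, with x/y expressed as percentages of the image size (0-100). Below is a minimal sketch of turning such a reply into pixel-space points for SAM2; the helper name and regex are illustrative, not this repo's code, so verify the tag format against the Molmo checkpoint you load:

```python
import re

def parse_molmo_points(text, width, height):
    """Extract (x, y) pixel coordinates from Molmo's <point>/<points> tags.

    Molmo emits coordinates as percentages (0-100) of the image size,
    so each value is scaled by width/100 or height/100. The regex matches
    both single-point attributes (x=, y=) and numbered ones (x1=, y1=, ...).
    """
    points = []
    for m in re.finditer(r'x\d*="([\d.]+)"\s+y\d*="([\d.]+)"', text):
        x, y = float(m.group(1)), float(m.group(2))
        points.append((x * width / 100.0, y * height / 100.0))
    return points
```

The resulting pixel coordinates can then be used as point prompts for the SAM2 predictor.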
Run the Gradio demo with:

```
python app.py
```
*(Demo clip: sam2_molmo_whisper-2024-10-11_07.09.47.mp4)*
- Added a tabbed interface for video segmentation. The process remains the same: prompt via text or voice, upload a video, and get segmentation maps for the objects.
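In both tabs, the points obtained from Molmo end up as SAM2 point prompts, which the predictor expects as a list of (x, y) coordinates plus matching foreground/background labels (1 = foreground, 0 = background). A small sketch of that conversion, assuming every Molmo point marks a foreground object (`to_sam_prompts` is illustrative, not this repo's code):

```python
def to_sam_prompts(points):
    """Convert a list of (x, y) tuples into SAM2-style prompt lists:
    point_coords as [[x, y], ...] and point_labels as [1, ...],
    treating every point as foreground (label 1).
    """
    coords = [[float(x), float(y)] for x, y in points]
    labels = [1] * len(coords)
    return coords, labels
```

In practice these lists would be wrapped in NumPy arrays before being passed to the SAM2 predictor's `predict` call.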
```
git clone https://github.com/sovit-123/SAM_Molmo_Whisper.git
cd SAM_Molmo_Whisper
```
Install PyTorch, Hugging Face Transformers, and the rest of the base requirements:

```
pip install -r requirements.txt
```
It is highly recommended to clone SAM2 into a separate directory, outside this project directory, and run the installation commands there:

```
git clone https://github.com/facebookresearch/sam2.git && cd sam2
pip install -e .
```
After installing the requirements, install spaCy's en_core_web_sm model:

```
spacy download en_core_web_sm
```
Finally, run the demo:

```
python app.py
```