Seamless AI

Seamless AI is an AI suite that combines natural language processing, computer vision, and multimodal models to give users a single, consistent experience across input modes and languages. It supports 11 widely spoken Indian languages, including low-resource ones, so that it is accessible to a diverse Indian audience.

Image 2 Speech Pipeline

Speech 2 Speech Pipeline

Text 2 Speech Pipeline

Repository Structure

The repository is organized into the following directories:

- LID: Contains the implementation of Language Identification (LID) for both text and speech inputs, built on the SpeechBrain toolkit.
- Models Used: Provides information about the pre-trained models used in the project.
- Notebook-Recipes: Includes Jupyter Notebooks that demonstrate the Bhashini pipelines and the Seamless AI pipelines.
- SeamlessGlasses: Houses the core implementation of Seamless Glasses, our concept AR smart glasses built on Seamless AI's pipelines.
- Stella: Contains the source code for "Stella," a multilingual demo chatbot that showcases Seamless AI's capabilities.
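
To make the idea of text language identification concrete, here is a minimal sketch of a script-based first pass. It is not the code in the LID directory, which uses SpeechBrain; the function name `detect_script` and the table `SCRIPT_RANGES` are illustrative assumptions. It only identifies the writing system from Unicode block ranges, so it cannot separate languages that share a script (Devanagari alone is used by several languages).

```python
from collections import Counter
from typing import Optional

# Unicode block ranges for some Indic scripts.
# Hypothetical helper table; not taken from the repository.
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Gurmukhi": (0x0A00, 0x0A7F),
    "Gujarati": (0x0A80, 0x0AFF),
    "Odia": (0x0B00, 0x0B7F),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
}


def detect_script(text: str) -> Optional[str]:
    """Return the most frequent Indic script in `text`, or None if none is found."""
    counts = Counter()
    for ch in text:
        code_point = ord(ch)
        for script, (low, high) in SCRIPT_RANGES.items():
            if low <= code_point <= high:
                counts[script] += 1
                break
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

A script guess like this could be used to shortlist candidate languages before a trained classifier makes the final call.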

Technologies Used

Seamless AI is built on the following open-source models, each integrated for a specific role:

Natural Language Processing (NLP) Models:
- Mixtral 8x7B: Text comprehension and generation.
- Bhashini Suite: Automatic speech recognition (ASR), text-to-speech (TTS), and neural machine translation (NMT) for Indian languages.
- Whisper Large V2: Backup ASR.

Multimodal AI Models:
- FireLLaVA 13B: Combines language understanding and computer vision for multimodal processing.
- CogVLM: Image analysis and textual description generation.

Image Generation:
- Stable Diffusion XL: Generates high-fidelity images from textual prompts.
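
Since Whisper Large V2 is listed as a backup ASR, one way such a fallback could be wired is sketched below. This is an assumption about the control flow, not the project's actual code: `transcribe_with_fallback`, `failing_primary`, and `stub_backup` are hypothetical names, and the two stubs stand in for real Bhashini and Whisper calls, which are not shown.

```python
from typing import Callable, List, Sequence, Tuple

# A transcriber takes raw audio bytes and returns text (assumed interface).
Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(
    audio: bytes, backends: Sequence[Tuple[str, Transcriber]]
) -> Tuple[str, str]:
    """Try each ASR backend in order; return (backend_name, transcript)."""
    errors: List[str] = []
    for name, transcribe in backends:
        try:
            text = transcribe(audio)
        except Exception as exc:  # any backend failure triggers the fallback
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: empty transcript")
    raise RuntimeError("all ASR backends failed: " + "; ".join(errors))


# Stand-in backends for illustration only; real ones would call the models.
def failing_primary(audio: bytes) -> str:
    raise ConnectionError("service unreachable")


def stub_backup(audio: bytes) -> str:
    return "namaste"


used, transcript = transcribe_with_fallback(
    b"\x00\x01", [("bhashini", failing_primary), ("whisper", stub_backup)]
)
```

Keeping the backends as an ordered list means the priority can be changed without touching the fallback logic.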

Key Features

- Multimodal Interactions: Accepts text, speech, image, and video inputs for natural, intuitive interaction.
- Multilingual Support: Covers 11 Indian languages, including low-resource ones, to break down language barriers.
- Fast Processing Speeds: Pipelines are optimized for real-time, responsive interactions.
- Cloud-Based Integration: Processing runs in the cloud, so low-resource devices can use Seamless AI's capabilities.
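
The features above suggest that the three pipelines share a small set of stages. The sketch below shows one way they might be composed. The stage order, the use of English as a pivot language, and every name here (`Stages`, `text_to_speech`, `speech_to_speech`, `image_to_speech`) are assumptions for illustration; the actual pipelines are in the Notebook-Recipes and SeamlessGlasses directories.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stages:
    """Pluggable stages; each field would wrap a real model or API call."""
    asr: Callable[[bytes, str], str]            # (audio, lang) -> text
    caption: Callable[[bytes], str]             # image -> English description
    translate: Callable[[str, str, str], str]   # (text, src, tgt) -> text
    respond: Callable[[str], str]               # English prompt -> English reply
    tts: Callable[[str, str], bytes]            # (text, lang) -> audio


def text_to_speech(stages: Stages, text: str, lang: str) -> bytes:
    # Assumed flow: translate to English, generate a reply, translate back, speak.
    english = stages.translate(text, lang, "en")
    reply = stages.respond(english)
    localized = stages.translate(reply, "en", lang)
    return stages.tts(localized, lang)


def speech_to_speech(stages: Stages, audio: bytes, lang: str) -> bytes:
    # Speech input only adds an ASR step in front of the text pipeline.
    return text_to_speech(stages, stages.asr(audio, lang), lang)


def image_to_speech(stages: Stages, image: bytes, lang: str) -> bytes:
    # Assumed flow: describe the image in English, translate, then speak.
    description = stages.caption(image)
    localized = stages.translate(description, "en", lang)
    return stages.tts(localized, lang)
```

Because every stage is passed in as a callable, the same functions can be exercised with simple stubs before any model is loaded.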

Getting Started

To set up the project locally, follow these steps:

Clone the repository:

git clone https://github.com/your-username/seamless-ai.git

Then follow the instructions in the readme.md of the relevant subdirectory (LID, Notebook-Recipes, SeamlessGlasses, or Stella) to run that component or demo.

Contributing

Contributions to Seamless AI are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

License

This project is licensed under the MIT License.

Acknowledgments

Seamless AI was developed by Team UNDERGOD during the SAMSUNG PRISM GEN AI HACKATHON. We thank Samsung for organizing the event and for the opportunity to showcase our skills and innovation.


shresthasingh1501/Seamless-AI
