Seamless AI is an AI suite that combines natural language processing, computer vision, and multimodal capabilities to give users a unified experience across input modes and languages. The project supports 11 widely spoken Indian languages, including low-resource ones, making it accessible to a diverse Indian audience.
The repository is organized into the following directories:
- LID: Contains the implementation of Language Identification (LID) for both text and speech inputs, leveraging the SpeechBrain toolkit.
- Models Used: Provides information about the pre-trained models utilized in the project.
- Notebook-Recipes: Includes Jupyter Notebooks demonstrating the usage of Bhashini pipelines and Seamless AI pipelines.
- SeamlessGlasses: Houses the core implementation of Seamless Glasses, our concept AR smart glasses powered by Seamless AI's pipelines.
- Stella: Contains the source code for the demo chatbot "Stella," a multilingual chatbot showcasing Seamless AI's capabilities.
Seamless AI is built on an ensemble of open-source models:
- Natural Language Processing (NLP) Models:
  - Mixtral 8x7B: For text comprehension and generation.
  - Bhashini Suite: Provides automatic speech recognition (ASR), text-to-speech (TTS), and machine translation (NMT) capabilities for Indian languages.
  - Whisper Large V2: Backup ASR.
- Multimodal AI Models:
  - FireLLaVA 13B: Combines language understanding and computer vision for multimodal processing.
  - CogVLM: Enables image analysis and textual description generation.
- Image Generation:
  - Stable Diffusion XL: Generates high-fidelity images based on textual prompts.
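The list above names Whisper Large V2 as a backup ASR alongside the Bhashini Suite. How the project switches between them is not described here, so the following is only a minimal sketch of one way such a fallback could be arranged. The `AsrBackend` signature and `transcribe_with_fallback` are hypothetical names introduced for illustration; they are not Bhashini or Whisper APIs, and real backends would be wrapped to fit this shape.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical shape for an ASR backend: (audio bytes, language code) -> transcript.
# This is an assumption for the sketch, not the signature of any real client.
AsrBackend = Callable[[bytes, str], str]


def transcribe_with_fallback(
    audio: bytes,
    language: str,
    backends: Sequence[Tuple[str, AsrBackend]],
) -> Tuple[str, str]:
    """Try each backend in order and return (backend_name, transcript)."""
    errors: List[str] = []
    for name, backend in backends:
        try:
            text = backend(audio, language)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():  # an empty transcript also counts as a failure
            return name, text
        errors.append(f"{name}: empty transcript")
    raise RuntimeError("all ASR backends failed: " + "; ".join(errors))
```

Passing the backends as an ordered list keeps the primary/backup policy in one place: listing a Bhashini wrapper first and a Whisper wrapper second would give the "backup" behavior, and the returned name records which one actually answered.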
Key features:
- Multimodal Interactions: Support for text, speech, image, and video inputs, enabling natural and intuitive user interactions.
- Multilingual Support: Covers 11 Indian languages, including low-resource ones, breaking down language barriers.
- Fast Processing Speeds: Optimized pipelines for real-time, responsive interactions.
- Cloud-Based Integration: Cloud-based architecture allows low-resource devices to leverage Seamless AI's advanced capabilities.
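To make the multimodal and cloud-based points above more concrete, here is a rough sketch of how a low-resource device might package one user input before sending it to a cloud service. Everything in it is an assumption made for illustration: the function name `build_request`, the JSON field names, and the use of base64 for media are not taken from this repository, and no endpoint is contacted.

```python
import base64
import json
from typing import Optional

# Input modes taken from the feature list; the payload layout is hypothetical.
SUPPORTED_MODES = {"text", "speech", "image", "video"}


def build_request(mode: str, language: str, text: Optional[str] = None,
                  media: Optional[bytes] = None) -> str:
    """Serialize one user input into a JSON string for a cloud service."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    if mode == "text":
        if text is None:
            raise ValueError("text mode requires `text`")
        payload = {"mode": mode, "language": language, "text": text}
    else:
        if media is None:
            raise ValueError(f"{mode} mode requires `media` bytes")
        # JSON cannot carry raw bytes, so binary media is base64-encoded.
        payload = {
            "mode": mode,
            "language": language,
            "media_b64": base64.b64encode(media).decode("ascii"),
        }
    # ensure_ascii=False keeps Indic-script text readable in the payload.
    return json.dumps(payload, ensure_ascii=False)
```

The device only serializes and uploads; the heavy models stay on the server side, which is the point of the cloud-based design described above.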
To set up the project locally, follow these steps:
- Clone the repository:
git clone https://github.com/your-username/seamless-ai.git
- Follow the instructions (readme.md) in the respective subdirectories (LID, Notebook-Recipes, SeamlessGlasses, Stella) to run the desired components or demos.
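After cloning, a small helper can confirm that each component directory and its readme are where the steps above expect them. This is a convenience sketch, not part of the repository: `find_readmes` is a hypothetical name, and the `seamless-ai` path assumes the default directory that `git clone` creates for the URL above.

```python
from pathlib import Path
from typing import Dict, Optional

# Subdirectories named in the setup instructions above.
COMPONENTS = ("LID", "Notebook-Recipes", "SeamlessGlasses", "Stella")


def find_readmes(repo_root: Path) -> Dict[str, Optional[Path]]:
    """Map each component to its readme file, or None if none is found."""
    found: Dict[str, Optional[Path]] = {}
    for name in COMPONENTS:
        component_dir = repo_root / name
        match: Optional[Path] = None
        if component_dir.is_dir():
            # Match readme.md case-insensitively (README.md, Readme.md, ...).
            for entry in sorted(component_dir.iterdir()):
                if entry.is_file() and entry.name.lower() == "readme.md":
                    match = entry
                    break
        found[name] = match
    return found


if __name__ == "__main__":
    # Assumed clone location; adjust if the repository was cloned elsewhere.
    for name, path in find_readmes(Path("seamless-ai")).items():
        print(f"{name}: {path if path else 'readme not found'}")
```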
Contributions to Seamless AI are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
This project is licensed under the MIT License.
Seamless AI was developed by Team UNDERGOD during the SAMSUNG PRISM GEN AI HACKATHON. We thank Samsung for organizing the event and giving us the opportunity to showcase our skills and innovation.