shresthasingh1501/Seamless-AI


Seamless AI

Seamless AI is an AI suite that combines natural language processing, computer vision, and multimodal models to provide a unified, intuitive experience across input modes and languages. It supports 11 widely spoken Indian languages, including low-resource ones, making it accessible to a diverse Indian audience.

Image 2 Speech Pipeline

[Image: Image 2 Speech pipeline diagram]

Speech 2 Speech Pipeline

[Image: Speech 2 Speech pipeline diagram]

Text 2 Speech Pipeline

[Image: Text 2 Speech pipeline diagram]

Repository Structure

The repository is organized into the following directories:

  • LID: Contains the implementation of Language Identification (LID) for both text and speech inputs, leveraging the SpeechBrain toolkit.
  • Models Used: Provides information about the pre-trained models utilized in the project.
  • Notebook-Recipes: Includes Jupyter Notebooks demonstrating the usage of Bhashini pipelines and Seamless AI pipelines.
  • SeamlessGlasses: Houses the core implementation of Seamless Glasses, a concept for AR smart glasses built on Seamless AI's pipelines.
  • Stella: Contains the source code for the demo chatbot "Stella," a multilingual chatbot showcasing Seamless AI's capabilities.
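
To make the idea of text language identification concrete, here is a minimal sketch. It is not the approach used in the LID directory, which relies on SpeechBrain; it only guesses the writing script of a string from Unicode block ranges. The function name `detect_script` and the `SCRIPT_RANGES` table are hypothetical, and a script is not the same thing as a language (Devanagari, for example, is shared by Hindi and Marathi).

```python
from collections import Counter

# Unicode block ranges (inclusive) for several Indic scripts.
# Illustrative only: this detects the script, not the language.
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Gurmukhi": (0x0A00, 0x0A7F),
    "Gujarati": (0x0A80, 0x0AFF),
    "Odia": (0x0B00, 0x0B7F),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
}


def detect_script(text: str) -> str:
    """Return the most frequent Indic script in `text`, or "unknown"."""
    counts = Counter()
    for ch in text:
        code = ord(ch)
        for name, (low, high) in SCRIPT_RANGES.items():
            if low <= code <= high:
                counts[name] += 1
                break
    if not counts:
        return "unknown"
    return counts.most_common(1)[0][0]
```

A script guess like this could at most narrow the candidate languages before a trained classifier makes the final decision; speech input needs an acoustic model and cannot be handled this way.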

Technologies Used

Seamless AI is built upon a robust ensemble of state-of-the-art open-source models, carefully curated and integrated to deliver exceptional performance:

  • Natural Language Processing (NLP) Models:

    • Mixtral 8x7B: For text comprehension and generation.
    • Bhashini Suite: Provides automatic speech recognition (ASR), text-to-speech (TTS), and neural machine translation (NMT) for Indian languages.
    • Whisper Large V2: Serves as the backup ASR model.
  • Multimodal AI Models:

    • FireLLaVA 13B: Combines language understanding and computer vision for multimodal processing.
    • CogVLM: Enables image analysis and textual description generation.
  • Image Generation:

    • Stable Diffusion XL: Generates high-fidelity images based on textual prompts.
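
The list above names a primary ASR provider (Bhashini) and a backup (Whisper Large V2). The sketch below shows one way a primary/backup arrangement could be wired, assuming each backend is a plain callable that takes audio bytes and returns text. The function `transcribe_with_fallback` and both stub backends are hypothetical placeholders; they do not call Bhashini or Whisper and are not taken from the repository.

```python
from typing import Callable, List, Sequence, Tuple

# A transcriber takes raw audio bytes and returns the recognized text.
Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(
    audio: bytes, backends: Sequence[Tuple[str, Transcriber]]
) -> Tuple[str, str]:
    """Try each backend in order; return (backend_name, transcript)."""
    errors: List[str] = []
    for name, backend in backends:
        try:
            text = backend(audio)
        except Exception as exc:  # any backend failure triggers the fallback
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: empty transcript")
    raise RuntimeError("all ASR backends failed: " + "; ".join(errors))


# Hypothetical stand-ins for the real services, used only for illustration.
def primary_asr_stub(audio: bytes) -> str:
    raise TimeoutError("primary service unavailable")


def backup_asr_stub(audio: bytes) -> str:
    return "hello world"
```

Calling `transcribe_with_fallback(b"", [("primary", primary_asr_stub), ("backup", backup_asr_stub)])` falls through to the backup stub because the primary stub raises. Collecting the per-backend errors keeps the final exception informative when every backend fails.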

Key Features

  • Multimodal Interactions: Support for text, speech, image, and video inputs, enabling natural and intuitive user interactions.
  • Multilingual Support: Supports 11 Indian languages, including low-resource ones, breaking down language barriers.
  • Fast Processing Speeds: Optimized pipelines for real-time, responsive interactions.
  • Cloud-Based Integration: Cloud-based architecture allows low-resource devices to leverage Seamless AI's advanced capabilities.
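
As a rough illustration of multimodal input handling, the sketch below routes an input file to one of the three pipelines named in this README. Routing by file extension, the extension table, and the function `choose_pipeline` are assumptions made for the example, not the repository's logic. Video input is left out because the README does not name a pipeline for it.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the pipeline names used in this README.
PIPELINE_BY_EXTENSION = {
    ".txt": "Text 2 Speech",
    ".wav": "Speech 2 Speech",
    ".mp3": "Speech 2 Speech",
    ".png": "Image 2 Speech",
    ".jpg": "Image 2 Speech",
    ".jpeg": "Image 2 Speech",
}


def choose_pipeline(filename: str) -> str:
    """Pick a pipeline name from the file extension (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    if suffix not in PIPELINE_BY_EXTENSION:
        raise ValueError(f"unsupported input type: {filename!r}")
    return PIPELINE_BY_EXTENSION[suffix]
```

A real service would more likely inspect the content type of the upload than trust the extension; the table form simply keeps the routing rule easy to read and extend.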

Getting Started

To set up the project locally, follow these steps:

  1. Clone the repository:
     git clone https://github.com/shresthasingh1501/Seamless-AI.git
  2. Follow the instructions in the readme.md of the relevant subdirectory (LID, Notebook-Recipes, SeamlessGlasses, or Stella) to run the desired component or demo.

Contributing

Contributions to Seamless AI are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

License

This project is licensed under the MIT License.

Acknowledgments

Seamless AI was developed by Team UNDERGOD during the Samsung PRISM Gen AI Hackathon. We thank Samsung for organizing the event and giving us the opportunity to showcase our skills and innovation.

Certificate

[Image: hackathon certificate]

About

Winning Project - Samsung Prism Gen AI Hackathon
