#

multimodal-deep-learning

Here are 393 public repositories matching this topic...

salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

deep-learning salesforce image-captioning deep-learning-library vision-framework vision-and-language multimodal-deep-learning multimodal-datasets vision-language-transformer vision-language-pretraining visual-question-anwsering

Updated Oct 11, 2024
Jupyter Notebook

Awesome-Text-to-Image

Yutong-Zhou-cv / Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

survey generative-adversarial-network image-manipulation image-generation text-to-image image-synthesis multimodal multimodal-deep-learning awseome-list text-to-face

Updated Nov 7, 2024

kyegomez / BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch

machine-learning deep-neural-networks artificial-intelligence deeplearning multimodal multimodal-deep-learning gpt4

Updated Nov 11, 2024
Python

AI4Finance-Foundation / FinRobot

FinRobot: An Open-Source AI Agent Platform for Financial Applications using LLMs 🚀 🚀 🚀

finance multimodal-deep-learning robo-advisor large-language-models prompt-engineering chatgpt fingpt aiagent

Updated Nov 4, 2024
Jupyter Notebook

AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

ocr computer-vision artificial-intelligence text-recognition document text-detection document-analysis end-to-end-ocr multimodal scene-text-recognition multimodal-deep-learning scene-text-detection vision-language document-understanding scene-text-detection-recognition document-recognition document-intelligence documentai vision-language-transformer vision-language-model

Updated Sep 30, 2024
C++

KimMeen / Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"

machine-learning deep-learning time-series language-model time-series-analysis time-series-forecast time-series-forecasting multimodal-deep-learning cross-modality multimodal-time-series cross-modal-learning prompt-tuning large-language-models

Updated Nov 3, 2024
Python

pytorch-widedeep

jrzaurin / pytorch-widedeep

A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch

python deep-learning text images tabular-data pytorch pytorch-cv multimodal-deep-learning pytorch-nlp pytorch-transformers model-hub pytorch-tabular-data

Updated Nov 6, 2024
Python

DWCTOD / CVPR2024-Papers-with-Code-Demo

收集 CVPR 最新的成果，包括论文、代码和demo视频等，欢迎大家推荐！Collect the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos, etc., and welcome recommendations from everyone!

computer-vision segmentation object-detection cvpr multimodal-deep-learning cvpr2021 cvpr2022 llm cvpr2023 segment-anything cvpr2024

Updated Apr 25, 2024

yuewang-cuhk / awesome-vision-language-pretraining-papers

Recent Advances in Vision and Language PreTrained Models (VL-PTMs)

bert vision-and-language multimodal-deep-learning pretraining vl-ptms

Updated Aug 19, 2022

TheShadow29 / awesome-grounding

awesome grounding: A curated list of research papers in visual grounding

natural-language-processing computer-vision paper awesome-list arxiv papers video-understanding captioning-images captioning-videos phrase-grounding language-grounding multimodal-deep-learning grounding visual-grounding embodied-agent video-grounding image-grounding paper-roadmap

Updated Apr 9, 2023

declare-lab / multimodal-deep-learning

This repository contains various models targetting multimodal representation learning, multimodal fusion for downstream tasks such as multimodal sentiment analysis.

multimodal-interactions multimodal-learning multimodal-sentiment-analysis multimodal-deep-learning

Updated Mar 15, 2023
OpenEdge ABL

blended-latent-diffusion

omriav / blended-latent-diffusion

Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]

computer-vision deep-learning pytorch generative-model image-generation text-to-image diffusion multimodal multimodal-deep-learning text-to-image-synthesis diffusion-models text-guided-manipulation text-driven-editing

Updated Jun 4, 2024
Jupyter Notebook

richard-peng-xia / awesome-multimodal-in-medical-imaging

A collection of resources on applications of multi-modal learning in medical imaging.

medical-imaging multimodal-learning visual-question-answering multimodal-deep-learning large-language-models medical-report-generation multimodal-large-language-models large-multimodal-models

Updated Nov 11, 2024

jianghaojun / Awesome-Parameter-Efficient-Transfer-Learning

A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.

machine-learning computer-vision deep-learning transfer-learning multimodal-deep-learning parameter-efficient-learning parameter-efficient-tuning

Updated Sep 26, 2024

MMMU-Benchmark / MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

machine-learning natural-language-processing deep-neural-networks computer-vision deep-learning evaluation question-answering stem multimodality multimodal-learning visual-question-answering multimodal multimodal-deep-learning foundation-models large-language-models llm llms large-multimodal-models

Updated Nov 10, 2024
Python

theislab / scarches

Reference mapping for single-cell genomics

deep-learning scrna-seq data-integration single-cell rna-seq-analysis single-cell-genomics batch-correction multimodal-deep-learning multiomics human-cell-atlas

Updated Oct 27, 2024
Jupyter Notebook

kyegomez / Med-PaLM

Towards Generalist Biomedical AI

opensource deep-learning multimodality biomedical multimodal multimodal-deep-learning gpt4

Updated Feb 17, 2024
Python

Yutong-Zhou-cv / Awesome-Multimodality

A Survey on multimodal learning research.

awesome-list multimodality multimodal-deep-learning

Updated Aug 22, 2023

MUStARD

soujanyaporia / MUStARD

Multimodal Sarcasm Detection Dataset

sarcasm multimodal-interactions sarcasm-detection multimodal-deep-learning

Updated Aug 22, 2024
OpenEdge ABL

fcakyon / content-moderation-deep-learning

Deep learning based content moderation from text, audio, video & image input modalities.

profanity-detection nudity-detection genre-classification violence-detection multimodal-deep-learning movie-trailer nsfw-recognition content-moderation content-ratings movie-content-filter

Updated May 22, 2024

Improve this page

Add a description, image, and links to the multimodal-deep-learning topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the multimodal-deep-learning topic, visit your repo's landing page and select "manage topics."