This repository contains reference implementations developed by the Vector AI Engineering team, focused on advancing multimodal learning across structured data, image, audio, and text.
This project explores cutting-edge techniques in multimodal AI through the following key areas:
- Multimodal Representation Learning: Learning representations from multiple modalities for improved understanding and downstream tasks.
- Table Question Answering: Extending Retrieval-Augmented Generation (RAG) to structured data for intelligent question answering and table summarization.
- Vision-Language Models (VLMs): Enhancing document understanding by integrating visual layouts with textual representations.
- Audio-Language Models (ALMs): Fusing audio and text inputs to improve speech and language understanding tasks.
`implementations/`: Implementations are organized by topic. Each topic has its own directory containing notebooks and a README for guidance.
To begin working with this repository:
- Clone this repository to your local environment.
- Explore each topic in the `implementations/` directory, guided by the respective README files.
- Follow the instructions in each topic's README to set up the environment.
- Run the notebooks in the topic directory (a minimal shell sketch of these steps follows below).
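
A rough sketch of the steps above from the command line is shown here. The repository URL, directory names, and the Python environment commands are placeholders and assumptions; each topic's README remains the authoritative setup guide.

```bash
# Hedged sketch of the getting-started workflow; placeholders in <angle brackets>
# are hypothetical and the environment commands are assumptions.
git clone <repository-url>            # clone this repository to your local environment
cd <repository-name>/implementations

ls                                    # list the available topics
cd <topic-of-interest>                # each topic directory has its own README

# Environment setup varies by topic; a typical Python workflow might be:
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt       # if the topic provides a requirements file
jupyter lab                           # open and run the topic's notebooks
```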
For more information or help navigating this repository, please contact members of the Vector AI Engineering team:
- Vahid Reza Khazaie — vahidreza.khazaie@vectorinstitute.ai
- Mahshid Alinoori — mahshid.alinoori@vectorinstitute.ai
- Aravind Narayanan — aravind.narayanan@vectorinstitute.ai