multimodal-agents-course

by the-ai-merge · Python · ★ 556

An MCP Multimodal AI Agent with eyes and ears!

#agent #embeddings #groq #mcp #mcp-client #mcp-server #multimodal #openai #opik #pixeltable

Install

pip install git+https://github.com/the-ai-merge/multimodal-agents-course.git

Claude Desktop config

Add this to your claude_desktop_config.json:

{
  "mcpServers": {
    "multimodal-agents-course": {
      "command": "uvx",
      "args": [
        "git+https://github.com/the-ai-merge/multimodal-agents-course.git"
      ]
    }
  }
}
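If `claude_desktop_config.json` already registers other servers, the entry above has to be merged into the existing `mcpServers` object rather than pasted over the whole file. The following is a minimal Python sketch of that merge. It assumes the file holds a single JSON object. The helper names `add_mcp_server` and `update_config_file` are hypothetical and are not part of this project.

```python
import copy
import json
from pathlib import Path
from typing import Any

# Values taken from the config snippet above.
SERVER_NAME = "multimodal-agents-course"
REPO_URL = "git+https://github.com/the-ai-merge/multimodal-agents-course.git"


def add_mcp_server(
    config: dict[str, Any], name: str, command: str, args: list[str]
) -> dict[str, Any]:
    """Return a copy of `config` with `mcpServers[name]` added or replaced.

    Other servers already listed under "mcpServers" are kept unchanged,
    and the input dict is not mutated.
    """
    merged = copy.deepcopy(config)
    # Create the "mcpServers" object if the config does not have one yet.
    servers = merged.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    return merged


def update_config_file(path: Path) -> None:
    """Merge this server into the Claude Desktop config at `path`.

    Hypothetical helper: the caller supplies the path because the file's
    location depends on the operating system. A missing file is treated
    as an empty config.
    """
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    updated = add_mcp_server(config, SERVER_NAME, "uvx", [REPO_URL])
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical existing config that already registers another server.
    existing = {"mcpServers": {"other-server": {"command": "npx", "args": ["other-server"]}}}
    print(json.dumps(add_mcp_server(existing, SERVER_NAME, "uvx", [REPO_URL]), indent=2))
```

The merge works on a deep copy, so a failed write never leaves a half-modified dict in memory. Calling `update_config_file` with the config file's actual location writes the merged result back. Claude Desktop generally needs a restart before it picks up the new server.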

From the README

Kubrick Course

Hi Dave... Learn to build AI agents that can understand images, text, audio, and video.

A free, open-source course by The Neural Maze and Neural Bits, in collaboration with Pixeltable and Opik.

By completing this course, you'll learn how to design agents and enable them to understand multimodal data (image, video, audio, and text inputs) within a single system. Specifically, you'll:

- Build a complex multimodal processing pipeline
- Build a video search engine and expose its functionality to an agent via MCP (Model Context Protocol)
- Build a prod…