
vllm-mlx

by waybarrios · Python · ★ 1,053

OpenAI- and Anthropic-compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.

#anthropic #apple-silicon #audio-processing #claude-code #computer-vision #image-understanding #inference #llm #machine-learning #macos #mllm #mlx #multimodal-ai #speech-to-text #stt #text-to-speech #tts #video-understanding #vision-language-model #vllm

Install

pip install git+https://github.com/waybarrios/vllm-mlx.git
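Since the server exposes an OpenAI-compatible API, the standard openai Python client can talk to it once the server is running (start command per the project README). A minimal sketch; the port 8000, model ID, and placeholder API key below are assumptions, not taken from the project docs:

from openai import OpenAI

# Point the standard OpenAI client at the local vllm-mlx server.
# Base URL and model ID are assumptions -- match them to how you launched the server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="Qwen3-0.6B-8bit",  # hypothetical model ID; use the one your server loaded
    messages=[{"role": "user", "content": "Hello from Apple Silicon!"}],
)
print(response.choices[0].message.content)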

Claude Desktop config

Add this to your claude_desktop_config.json:

{
  "mcpServers": {
    "vllm-mlx": {
      "command": "uvx",
      "args": [
        "git+https://github.com/waybarrios/vllm-mlx.git"
      ]
    }
  }
}
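Note that uvx fetches and runs the package in an isolated, ephemeral environment, so the pip install step above is not required for the Claude Desktop integration.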

From the README

**Read this in other languages:** [English](README.md) · [Español](README.es.md) · [Français](README.fr.md) · [中文](README.zh.md)

**Continuous batching + OpenAI + Anthropic APIs in one server. Native Apple Silicon inference.**

| Model | Speed (tok/s) | Memory |
|------|------:|-------:|
| Qwen3-0.6B-8bit | 417.9 | 0.7 GB |
| Llama-3.2-3B-Instruct-4bit | 205.6 | 1.8 GB |
| Qwen3-30B-A3B-4bit | 127.7 | ~18 GB |

**Audi…
Read full README on GitHub →
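Because the server also speaks the Anthropic Messages API (per the README tagline above), the official anthropic Python client can target it as well by overriding its base URL. Another minimal sketch; the base URL, model ID, and dummy key are assumptions:

import anthropic

# Override the default Anthropic endpoint with the local server.
# Base URL and model ID are assumptions -- adjust to your setup.
client = anthropic.Anthropic(base_url="http://localhost:8000", api_key="not-needed")

message = client.messages.create(
    model="Qwen3-0.6B-8bit",  # hypothetical model ID
    max_tokens=256,
    messages=[{"role": "user", "content": "Describe continuous batching in one sentence."}],
)
print(message.content[0].text)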

