
Ollama

by Ollama

Ollama is an open-source tool that lets developers and users run large language models locally on their own hardware with a single command. It provides a simple CLI, a REST API compatible with the OpenAI API format, and a curated library of hundreds of models, including Llama, Mistral, Gemma, and Qwen. Ollama supports GPU acceleration via NVIDIA CUDA, AMD ROCm, and Apple Metal, making local inference fast and accessible across macOS, Linux, and Windows.
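Because the API follows the OpenAI chat-completions format, any OpenAI-style client can talk to a local Ollama server. The sketch below builds a request payload for Ollama's OpenAI-compatible endpoint (by default served at port 11434); the model name `llama3` and the prompt are illustrative, and actually sending the request assumes `ollama serve` is running locally.

```python
import json

# Ollama's OpenAI-compatible chat endpoint (default local port is 11434).
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build a chat-completion payload in the OpenAI API format."""
    return {
        "model": model,                                      # e.g. a model pulled with `ollama pull llama3`
        "messages": [{"role": "user", "content": prompt}],   # standard OpenAI-style message list
        "stream": False,                                     # request a single JSON response, not a stream
    }

payload = build_chat_request("llama3", "Why is the sky blue?")
print(json.dumps(payload, indent=2))

# To actually send it (requires a running `ollama serve`):
#   import urllib.request
#   req = urllib.request.Request(OLLAMA_URL, data=json.dumps(payload).encode(),
#                                headers={"Content-Type": "application/json"})
#   body = json.loads(urllib.request.urlopen(req).read())
#   print(body["choices"][0]["message"]["content"])
```

Pointing an existing OpenAI SDK at `http://localhost:11434/v1` works the same way, which is what makes migration from hosted APIs straightforward.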

9 Overall Score

Scores

Capability: 8
Ease of Use: 9
Documentation: 8
Reliability: 8
Value: 10
Momentum: 9

Details

Status: active
Pricing: open-source
Launch Date: 2023-07
Last Updated: 2026-03-15

Key Features

  • One-command local LLM serving (ollama run llama3)
  • OpenAI-compatible REST API for easy integration
  • Large model library with 200+ models (Llama, Mistral, Gemma, Qwen, etc.)
  • GPU acceleration across NVIDIA, AMD, and Apple Silicon
  • Modelfile system for customizing and creating derivative models
  • Cross-platform support: macOS, Linux, and Windows
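
The Modelfile feature mentioned above lets you derive a customized model from any base model in the library. A minimal sketch, assuming the `llama3` base model has already been pulled (the system prompt and parameter values are illustrative):

```
# Modelfile: derive a customized assistant from a base model
FROM llama3

# Sampling parameter override (illustrative value)
PARAMETER temperature 0.7

# Bake in a system prompt so every session starts with this persona
SYSTEM "You are a concise technical assistant."
```

Build and run the derivative with `ollama create my-assistant -f Modelfile` followed by `ollama run my-assistant`.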

Tech Stack

Go · GGUF · NVIDIA CUDA · AMD ROCm · Apple Metal · REST API · Docker