Running AI models on your own computer sounds complicated — but with Ollama, it's genuinely one of the easiest things you can set up in an afternoon. No subscriptions. No API bills. No data sent to a third-party server. Everything runs right on your machine.
This guide walks you through installing and configuring Ollama on both Windows and macOS, from checking whether your hardware is up to the task all the way to pulling your first model and chatting with it locally.
What Is Ollama, and Why Should You Care?
Ollama is an open-source tool that lets you download and run large language models (LLMs) directly on your computer. Think of it as a package manager for AI models — you type a single command, and it fetches the model, sets everything up, and gives you a local API to talk to it.
The appeal is straightforward: your conversations never leave your machine, you're not paying per token, and once a model is downloaded, it works completely offline. For developers, researchers, or anyone who handles sensitive information, that's a pretty compelling setup.
It wraps a popular inference engine called llama.cpp behind a clean command-line interface and REST API, handling the messy details of model quantization and GPU memory allocation so you don't have to.
Before You Install: Does Your Hardware Cut It?
This is the part most guides skip, and then people wonder why their machine grinds to a halt. Ollama itself is lightweight — the bottleneck is the model you're trying to run.
The core rule: the model has to fit in memory. Ollama can use your GPU's video memory (VRAM), your system RAM, or a combination of both. GPU memory is dramatically faster — the difference between a smooth conversation and waiting two minutes for a single response.
Memory Requirements by Model Size
| Model Size | Minimum RAM (CPU only) | Recommended VRAM (GPU) |
|---|---|---|
| 3B–4B | 8 GB RAM | 4 GB VRAM |
| 7B–8B | 16 GB RAM | 6–8 GB VRAM |
| 13B–14B | 32 GB RAM | 10–12 GB VRAM |
| 32B | 64 GB RAM | 24 GB VRAM |
| 70B+ | 128 GB RAM | 48+ GB VRAM |
A rough planning formula: budget about 0.6 GB per billion parameters at standard quantization (Q4_K_M), then add a bit of extra headroom for context.
GPU Support
- NVIDIA GPUs (compute capability 5.0+): Ollama uses CUDA for acceleration. You don't need to install the CUDA toolkit separately — Ollama bundles what it needs.
- AMD GPUs: Supported via ROCm, though Windows support is still somewhat experimental as of 2026. Linux is more reliable for AMD.
- Apple Silicon (M1–M4): Works beautifully out of the box using Metal. Apple's unified memory architecture means your GPU and CPU share the same RAM pool, so a Mac with 32 GB can comfortably load models that would need a dedicated GPU card on other platforms.
- Intel Macs: Functional but noticeably slower. Expect around 4–6 tokens per second on a 7B model.
- CPU-only (no GPU): Ollama still runs, just slowly. On a solid modern CPU, expect 7–12 tokens per second on smaller models. Usable for experimentation, but not comfortable for daily use.
Practical minimum for a decent experience: 16 GB RAM, a modern CPU (Intel 11th gen+ or AMD Zen 4+), and an SSD. Skip the HDD — loading a 5 GB model file from a spinning disk takes minutes.
Installing Ollama on Windows
Windows installation is straightforward. You don't need WSL2 or any workarounds anymore — Ollama has had native Windows support since version 0.3.
Method 1: Download the Installer (Easiest)
- Open your browser and go to https://ollama.com/download
- Click the Windows download button. You'll get an
.exeinstaller file. - Run the installer. It installs like any other Windows application — just click through the prompts.
- Once installation finishes, Ollama starts automatically and adds itself to your system tray.
That's the whole process. Open a new PowerShell or Command Prompt window and verify it worked:
ollama --version
If you see a version number, you're set. If you get command not found, restart your machine — the PATH update sometimes needs a reboot to take effect.
Method 2: Install via winget (For the Command-Line Inclined)
If you prefer working from the terminal, Windows Package Manager handles it in one line:
winget install Ollama.Ollama
Installing to a Custom Directory
If you want Ollama installed somewhere other than the default location (useful if your C: drive is running low), you can specify a path when running the installer from the command line:
OllamaSetup.exe /DIR="D:\Tools\Ollama"
Configuring Environment Variables on Windows
Ollama's behavior is controlled through environment variables. On Windows, these are set through the standard System Properties panel.
- Search for "Environment Variables" in the Start menu and open it.
- Under User variables, click New to add a variable.
- After adding your variables, restart Ollama from the Start menu for the changes to take effect.
The most useful variables to know:
| Variable | What It Does | Example Value |
|---|---|---|
OLLAMA_MODELS |
Where models are stored | D:\OllamaModels |
OLLAMA_HOST |
Which address/port Ollama listens on | 0.0.0.0:11434 |
OLLAMA_ORIGINS |
Allows API access from other origins | * |
OLLAMA_NUM_THREADS |
Number of CPU threads for inference | 12 |
Changing the model storage location is worth doing early if you have a small system drive. Models are large — a 7B model is around 4–5 GB, and you'll quickly accumulate several of them. Point OLLAMA_MODELS to a drive with more space before downloading anything.
Installing Ollama on macOS
Mac installation is equally painless, with two solid options depending on how you like to work.
Method 1: Download the App (Easiest)
- Go to https://ollama.com/download and click Download for macOS.
- Open the downloaded
.zipfile — it contains the Ollama app. - Drag Ollama into your Applications folder.
- Open Ollama from Applications. On first launch, macOS may ask for security confirmation (right-click → Open if needed).
Once running, you'll see the Ollama icon appear in your menu bar. The application starts the local server automatically in the background.
Method 2: Install via Homebrew (Developer-Friendly)
brew install ollama
Note that the Homebrew install gives you the CLI tool only — no menu bar app. You'll need to start the server manually each time:
ollama serve
Or set it to start automatically:
brew services start ollama
Configuring Environment Variables on macOS
Because Ollama runs as a GUI app on macOS, you can't just set environment variables in your shell profile and call it a day. You need to use launchctl:
launchctl setenv OLLAMA_MODELS "/Volumes/ExternalDrive/OllamaModels"
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
launchctl setenv OLLAMA_ORIGINS "*"
After setting these, quit and restart Ollama from the menu bar icon.
If you installed via Homebrew and run Ollama as a service, you can also add variables directly to the service configuration:
sudo nano /usr/local/opt/ollama/homebrew.mxcl.ollama.plist
Apple Silicon Mac users get Metal GPU acceleration automatically — no drivers to install, no flags to set. It just works.
Pulling and Running Your First Model
With Ollama installed, the next step is actually downloading a model. The ollama pull command handles this:
ollama pull llama3.2:3b
This fetches a 3-billion parameter version of Meta's Llama 3.2 — a good starting point that runs on modest hardware. Once the download finishes, start an interactive chat session:
ollama run llama3.2:3b
You'll get a >>> prompt. Type your message, hit Enter, and the model responds. Type /bye to exit.
Choosing the Right Model for Your Hardware
Not sure which model to start with? Here's a practical shortlist:
- 8 GB RAM or less / 4 GB VRAM or less: Start with
gemma3norllama3.2:3b. Small but genuinely capable. - 16 GB RAM / 6–8 GB VRAM:
llama3.1:8borqwen2.5-coder:7brun well here. This is the sweet spot for most people. - 32 GB RAM / 12 GB VRAM:
gemma3:12bordeepseek-r1:7bare solid choices. - 24 GB VRAM:
qwen2.5-coder:32bopens up, which is a genuinely impressive coding assistant.
To browse the full library of available models, visit https://ollama.com/library.
Useful Commands to Know
# List all downloaded models
ollama list
# See what's currently running
ollama ps
# Remove a model you no longer need
ollama rm llama3.2:3b
# Check your Ollama version
ollama --version
Using the API
One of Ollama's most useful features is its built-in REST API, which runs on port 11434 by default. This means you can talk to your local models from scripts, applications, or any tool that can send an HTTP request.
Test it quickly with curl:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2:3b",
"prompt": "What is the capital of France?",
"stream": false
}'
The API is designed to be compatible with OpenAI's client libraries, so many existing integrations work with minimal changes — just point them at http://localhost:11434 instead of the OpenAI endpoint.
Adding a Chat Interface
The command line works, but if you want something that feels more like a proper chat app, Open WebUI is the most popular option. It gives you a browser-based interface similar to ChatGPT, running entirely on your local machine and connecting to Ollama in the background.
If you have Docker installed, getting Open WebUI running takes one command:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data --name open-webui \
ghcr.io/open-webui/open-webui:main
Then open http://localhost:3000 in your browser.
Troubleshooting Common Issues
"command not found" after installation on Windows Restart your computer. The PATH update doesn't always kick in immediately.
"connection refused" errors Ollama's server might not be running. On Windows, check the system tray — the icon should be visible. On Mac, check the menu bar. If neither is there, restart the application or run ollama serve in your terminal.
Model runs but responses are extremely slow You're likely running on CPU only, or the model is too large for your available VRAM and spilling over into system RAM. Try a smaller model, or check your GPU is actually being used: OLLAMA_DEBUG=1 ollama run <model> will show you how many layers were loaded onto the GPU vs. kept in RAM.
AMD GPU not being used on Windows AMD ROCm support on Windows is still maturing. If your GPU isn't being picked up, try running in CPU mode with a smaller model, or check the Ollama GitHub issues page for your specific GPU model.
A Few Things Worth Knowing Before You Dive In
Ollama stores downloaded models in your home directory by default (~/.ollama/models on Mac, C:\Users\<username>\.ollama\models on Windows). If that drive has limited space, set OLLAMA_MODELS to point elsewhere before you start downloading.
Models load from disk each time you start a conversation, so an NVMe SSD makes a noticeable difference. On a decent SSD, a 7B model loads in a few seconds. On a mechanical hard drive, that same load can take a couple of minutes.
Finally: start small. Pull a 3B or 7B model first, get everything working end to end, then decide if you need something larger. It's easy to upgrade — one ollama pull command away. It's harder to discover you've filled your drive with 40 GB models that don't run fast enough to be useful.
Bonus: Chat With Your Ollama Models From Your Android Phone
Once Ollama is running on your PC or Mac, you're not limited to using it at your desk. With an Android app called LMSA, you can connect your phone to your local Ollama server over Wi-Fi and have a full chat interface in your pocket — no cloud involved, no data leaving your home network.
LMSA connects to local AI servers like Ollama over your local network, so conversations stay entirely on your device and never pass through third-party servers. It's a genuinely clean app — no remote database logging, no telemetry, and all API keys stored securely on your device.
The core app is free, with a one-time purchase of $14.99 to unlock premium features like templates, biometric lock, and text-to-speech. No subscriptions.
Step 1: Enable Network Access in Ollama's Settings
Ollama includes a built-in toggle for this, so you don't need to manually set environment variables.
On Windows, look for the Ollama icon in the system tray at the bottom-right corner of your taskbar. Click it, then select Settings. You'll see an option labeled "Expose Ollama to the network" — flip that on.
On macOS, click the Ollama icon in the menu bar at the top-right of your screen, then select Settings and enable the same "Expose Ollama to the network" toggle.
Restart Ollama after flipping the switch, and it will start listening for connections from other devices on your network instead of just your own machine.
Note: If you're on an older version of Ollama and don't see this option, update the app first — the toggle was added in v0.94. Alternatively, you can still manually set OLLAMA_HOST=0.0.0.0:11434 as an environment variable if needed.Step 2: Find Your Computer's Local IP Address
Your phone needs to know where on the network to find Ollama.
On Windows, open PowerShell and run:
powershell
ipconfigLook for the IPv4 Address under your active Wi-Fi adapter — it'll be something like 192.168.1.45.
On macOS, open Terminal and run:
bash
ipconfig getifaddr en0Write that IP down — you'll need it in a moment.
Step 3: Install LMSA on Your Android Phone
Search for LMSA on the Google Play Store, or go directly to the listing for "LMSA for LM Studio & Ollama." The app requires Android 6.0 or higher.
Step 4: Connect LMSA to Your Ollama Server
- Open LMSA and go to Settings.
- Select Ollama as your connection type.
- Enter your server address in the format:
http://192.168.1.45:11434(replace with your actual IP from Step 2). - Tap Save and return to the main screen.
LMSA will automatically detect which models you have downloaded in Ollama and show them in a dropdown. Select a model and start chatting.
One Requirement Worth Noting
Both your phone and your computer need to be on the same Wi-Fi network for this to work. The local connection is unencrypted, so stick to your home or a trusted private network — don't try this on public Wi-Fi.
What You Can Do From the App
Beyond basic chat, LMSA lets you attach files for the model to analyze — TXT, PDF, JSON, CSV, Markdown, and code files are all supported. You can also adjust temperature, top-p, and other inference parameters per conversation, and set custom system prompts without leaving the app.
It's a surprisingly complete setup. Your model runs on your computer, your phone is just the interface — and the whole thing works even if your internet goes out, as long as your home network is up.
Wrapping Up
Ollama manages to make something that used to require a lot of technical patience — running local AI models — feel genuinely approachable. The installation takes about five minutes on either platform, and most of the configuration is optional until you have a specific reason to change something.
The value proposition is real: private inference, no ongoing costs, and offline capability once the model is downloaded. Whether you're a developer building something that needs AI without cloud dependencies, or just someone who wants to experiment without sending your data anywhere, Ollama is a solid place to start.
Get the installer at https://ollama.com/download, pick a model that fits your hardware, and go from there.