Running a capable AI model on your own computer — no subscription, no internet connection, no data leaving your machine — sounds like it should be complicated. It really isn't. LM Studio has quietly become the go-to app for anyone who wants to experiment with open-source language models locally, and for good reason: the setup is straightforward, the interface feels familiar, and you don't need to touch a command line to get started.
This guide walks you through everything — system requirements, installation on both Mac and Windows, downloading your first model, and dialing in the settings that actually matter.
What Is LM Studio, and Why Does It Matter?
LM Studio is a free desktop application that lets you download and run large language models (LLMs) directly on your own hardware. Think of it as a ChatGPT-style chat interface, except the model runs entirely on your machine. Everything you type stays on your computer — no cloud servers, no API bills, no usage limits.
It supports models in the GGUF format (a compressed format optimized for local hardware), and it pulls from Hugging Face's massive library of open-source models — Llama, Mistral, Qwen, Gemma, DeepSeek, and many others. There's also a built-in local API server that lets developers swap it in as a drop-in replacement for the OpenAI API, which is genuinely useful.
For Mac users with Apple Silicon, there's an added bonus: LM Studio uses Apple's MLX framework under the hood, which takes advantage of unified memory so the GPU can access all of your RAM directly. That means better performance without any extra configuration on your part.
System Requirements Before You Download Anything
Getting the requirements right upfront will save you a lot of frustration. Here's what you actually need.
Mac Requirements
- Chip: Apple Silicon — M1, M2, M3, or M4. Intel-based Macs are not currently supported by LM Studio.
- RAM: 8GB is the floor, but you'll be limited to smaller models. 16GB is the practical sweet spot for most users. 24GB or more opens up larger, higher-quality models.
- Storage: Plan for at least 10–20GB of free space. Model files range from around 1GB for tiny models to 20GB+ for larger ones.
- macOS: Version 12.0 (Monterey) or later recommended.
Windows PC Requirements
- CPU: Must support AVX2 instructions. Most Intel and AMD chips from the last several years do, but it's worth verifying. You can check via the System Information tool — search for it in the Start menu and look under the processor details.
- RAM: 16GB is the recommended minimum. 8GB can work for basic testing with small models, but you'll notice the limitations quickly.
- GPU (optional but helpful): If you have a dedicated GPU with at least 4GB of VRAM, LM Studio can offload parts of the model to it for faster responses. This works with both NVIDIA and AMD cards.
- Windows: Windows 10 or Windows 11 (both x64 and ARM/Snapdragon X Elite are supported).
Installing LM Studio on Mac
Step 1: Download the Installer
Head to lmstudio.ai and download the macOS version. You'll get a .dmg file. Make sure you grab the Apple Silicon build — there's no Intel version.
Step 2: Install the App
Open the .dmg file from your Downloads folder. A window will appear showing the LM Studio icon next to your Applications folder. Drag the icon into Applications, then eject the .dmg when it's done.
Step 3: Launch and Get Past the Welcome Screen
Open LM Studio from your Applications folder. The first time you launch it, you may see a macOS security prompt — if so, go to System Settings → Privacy & Security and click "Open Anyway." This is standard for apps downloaded outside the App Store.
Once it opens, you'll land on a welcome screen. You can skip the tutorial if you'd like and jump straight in.
Installing LM Studio on Windows
Step 1: Download the Installer
Go to lmstudio.ai and download the Windows version. You'll get an .exe installer file.
Step 2: Run the Installer
Double-click the .exe file and follow the on-screen prompts. The installer handles everything — you don't need to install any additional dependencies manually. When it finishes, you can choose to create a desktop shortcut.
Step 3: First Launch
Open LM Studio. Windows Defender or your antivirus may flag it the first time — this is common with newer apps and not a concern. Allow it through, and you'll land on the same welcome screen as the Mac version.
Downloading Your First Model
Once LM Studio is open, the next step is getting a model to actually run. Here's how to do it.
Finding Models
Click the Discover tab in the left sidebar (it looks like a magnifying glass or compass icon). This opens a search interface connected to Hugging Face. You can search by model name or browse featured options.
If you're not sure where to start, here are some solid choices depending on your RAM:
- 8GB RAM: Qwen2.5 3B, Phi-3 Mini, or TinyLlama — these are small but surprisingly capable for everyday tasks.
- 16GB RAM: Llama 3.1 8B, Mistral 7B Instruct, or Qwen2.5 7B — this is where things get genuinely useful.
- 24GB+ RAM: Llama 3.1 70B (quantized), Qwen2.5 14B, or Gemma 2 27B — larger models with noticeably better reasoning.
Understanding Quantization
When you search for a model, you'll usually see multiple versions with names like Q4_K_M, Q5_K_S, or Q8_0. These are quantized variants — compressed versions of the same model that trade a small amount of accuracy for much lower memory usage.
As a general rule: Q4_K_M is the sweet spot for most people. It's roughly half the size of the full model with minimal quality loss. Q8_0 is closer to the original but needs more RAM. If you're tight on resources, go lower; if you have RAM to spare, go higher.
Downloading a Model
Select the model version you want and click Download. You can monitor progress from the Downloads panel. A 5–8GB model typically takes a few minutes on a decent connection.
Starting a Chat
Once a model finishes downloading, click the Chat tab in the sidebar. Select your model from the dropdown at the top of the screen and wait a moment for it to load into memory — you'll see a loading indicator while this happens.
Then just start typing. The chat interface works exactly like you'd expect.
A few things worth knowing:
- Responses from larger models on slower hardware can take 10–30 seconds for the first token. This is normal.
- You can stop a response mid-generation by clicking the stop button.
- Starting a new chat clears the conversation history — useful if the model starts going off track.
Configuring LM Studio: Settings That Actually Make a Difference
The default settings get you running, but spending a few minutes on configuration makes a meaningful difference in both performance and response quality.
System Prompt
The system prompt is one of the most useful tools in LM Studio. It's a set of instructions that runs silently at the start of every conversation, shaping how the model responds before you type a single word.
You'll find it in the Chat panel, usually accessible via a small icon or dropdown. A few examples of what you can put there:
- "You are a concise assistant. Always give direct answers without unnecessary preamble."
- "You are an expert in Python and software architecture. Assume the user has a technical background."
- "Always respond in plain language. Avoid jargon."
Take time to craft a system prompt that fits your use case — it dramatically improves consistency.
Context Length
This setting controls how much conversation history the model can "remember" during a session, measured in tokens. The default is often 4,096 tokens, which is enough for casual use.
If you're doing longer research sessions, document review, or complex back-and-forth coding help, consider bumping this to 8,192 or even 16,384 — provided your hardware can handle it. More context uses more RAM, so watch your memory usage.
Temperature
Temperature controls how creative or unpredictable the model's responses are:
- 0.0: Very focused and deterministic — good for factual questions and code.
- 0.7: Balanced — works well for most tasks.
- 1.0+: More creative and varied — useful for brainstorming or writing, though it can introduce inaccuracies.
Start at 0.7 and adjust from there based on what you're doing.
GPU Offloading (Windows / Nvidia or AMD)
If you're on a Windows PC with a dedicated GPU, LM Studio can split the model between your GPU and RAM for faster inference. In the model loading dialog, look for a GPU Offload slider. Moving it to the right offloads more layers to the GPU — experiment until you find the point where it runs fast without running out of VRAM.
Using RAG: Chat with Your Own Documents
One of the more practical features in LM Studio is the ability to attach documents to your conversation — a capability sometimes called RAG (Retrieval-Augmented Generation). Instead of relying purely on what the model was trained on, it can reference your actual files.
You can attach PDFs, Word documents, or plain text files to a chat message using the attachment icon in the chat input. LM Studio supports up to 5 files at a time with a combined maximum of 30MB. Once attached, ask specific questions about the content — it works well for reviewing contracts, summarizing reports, or pulling details from technical docs.
Developer Mode: Running a Local API Server
If you're a developer, this is where things get interesting. LM Studio can run as a local API server that's fully compatible with the OpenAI API format — meaning any app or script built for the OpenAI SDK can point to your local machine instead.
To enable it:
- Click on the Developer tab in the left sidebar.
- Toggle the Status switch from Stopped to Running.
- The server starts at
http://localhost:1234by default.
From there, you can test it immediately. Open a browser and go to http://localhost:1234/v1/models — if you see a JSON response listing your installed models, it's working.
You can also connect tools like LangChain, continue.dev, or your own Python scripts to this endpoint. Just set the base URL to http://localhost:1234/v1 and use any placeholder as the API key — LM Studio doesn't require authentication for local connections.
Quick Troubleshooting
The app won't open on Mac: Go to System Settings → Privacy & Security and look for a message about LM Studio being blocked. Click "Open Anyway."
The model loads but responses are very slow: Try a smaller or more heavily quantized model, or reduce the context length. On Windows, make sure GPU offloading is enabled if you have a dedicated card.
Out of memory errors: You've loaded a model that's too large for your available RAM. Switch to a smaller model or a lower quantization level (e.g., drop from Q8 to Q4).
Can't connect to the local API server: Make sure the server is running (green status indicator in the Developer tab). Also check that nothing else is using port 1234 — you can change the port in the server settings if needed.
Using LM Studio on Android With LMSA
Want to chat with your local model from your phone or tablet? There's an app for that. LMSA (LM Studio Assistant) is an open-source Android client that connects to your LM Studio server, letting you use your local model from anywhere on your home network.
The key difference from the localhost setup is that instead of restricting the server to your computer only, you'll configure LM Studio to serve on your local network, making it accessible to other devices on Wi-Fi.
Set Up LM Studio for Network Access
Before you touch your phone, you need to tell LM Studio to listen on your network instead of just localhost.
- Open LM Studio and go to the Developer tab.
- Click the Settings icon or gear button (usually in the Developer panel).
- Look for Serve on Local Network and toggle it on.
- In the same settings, enable CORS (Cross-Origin Resource Sharing) — this allows your Android device to communicate with the server.
That's it on the desktop side. LM Studio will now show your local network IP address (something like 192.168.1.123) instead of 127.0.0.1.
Important: Note this IP address and the port number (default is 1234). You'll need both for your phone.
Install LMSA on Android
- Open the Google Play Store on your Android phone or tablet (Android 7.0 or higher).
- Search for LMSA (or the full name: "LM Studio Assistant").
- Install the app from IslandApps.
It's a small install and free to download. There's a one-time optional purchase to remove ads, but the basic functionality is completely free.
Connect Your Phone to the Server
When you open LMSA for the first time, it will ask for your server details:
- Enter your LM Studio server IP address (e.g.,
192.168.1.123). - Enter the port number (default
1234). - Tap Connect.
The app will verify the connection. If it succeeds, you'll see your loaded model listed in the app.
Start Chatting
Once connected, LMSA works exactly like the desktop version — you get a familiar chat interface where you can:
- Type messages and get responses from your local model.
- Adjust the temperature and system prompt from the settings menu.
- Upload documents and attachments for analysis.
- Switch between models if you have multiple downloaded.
Everything stays on your network — nothing leaves your home.
A Security Note
LMSA communicates with LM Studio over standard HTTP by default, which means the connection is unencrypted. This is fine if you're on a trusted home network, but keep it in mind:
- Only use LMSA on private Wi-Fi networks you control.
- Don't use it on public Wi-Fi or shared networks.
- If network security is a concern, consider using a VPN or waiting for LMSA to support HTTPS in the future.
For most home users, this setup is perfectly safe and gives you a genuinely useful way to access your AI setup on the go.
Final Thoughts
LM Studio has done a lot of heavy lifting to make running local AI models accessible without requiring any real technical background. The installation takes minutes, the model library is enormous, and the experience of having a capable assistant running entirely offline — with no usage limits and no data going anywhere — is genuinely satisfying once you get there.
Start with a mid-sized quantized model, spend a few minutes setting up a system prompt that fits your workflow, and don't overthink the advanced settings until you have a reason to. Most people find that the defaults get them 80% of the way there, and small adjustments handle the rest.
The local AI space is moving fast, and LM Studio updates regularly — so it's worth checking back at lmstudio.ai/docs occasionally to see what's new.