February 23, 2026
Running AI locally on your phone or tablet is now practical for many workflows.
With ai.local, you can run models on-device and expose local endpoints compatible with both OpenAI-style (/v1) and Ollama-style (/api) clients.
This guide shows the fastest setup.

The server is exposed on your local network with:
Port: 11434
Base URL: http://<iphone-ip-address>:11434
Bind address: 0.0.0.0
You can use either the device IP address or the Bonjour host shown by the app.
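In scripts, it helps to build that base URL once. A minimal sketch (the host values are illustrative placeholders, not addresses the app guarantees):

```python
def base_url(host: str, port: int = 11434) -> str:
    """Build the server's base URL from a device IP or Bonjour host."""
    return f"http://{host}:{port}"

print(base_url("192.168.1.42"))     # device IP form
print(base_url("my-iphone.local"))  # Bonjour host form
```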

ai.local offers models from the mlx-community catalog, such as the quantized Llama-3.2-1B-Instruct-4bit used in the examples below.
On iPhone/iPad, smaller quantized models usually feel best for latency and memory.
If you are unsure what is installed, query the model list first:
curl http://<iphone-ip-address>:11434/v1/models
curl http://<iphone-ip-address>:11434/api/tags
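The two endpoints return differently shaped lists. A small Python sketch can normalize them, assuming the usual OpenAI-style `{"data": [{"id": ...}]}` and Ollama-style `{"models": [{"name": ...}]}` response shapes (verify against your server's actual output):

```python
def model_names(payload: dict) -> list[str]:
    """Extract model names from either listing endpoint."""
    if "data" in payload:  # OpenAI-style /v1/models
        return [m["id"] for m in payload["data"]]
    return [m["name"] for m in payload.get("models", [])]  # Ollama-style /api/tags

# Illustrative payloads, using the model name from the chat examples:
v1 = {"data": [{"id": "Llama-3.2-1B-Instruct-4bit"}]}
tags = {"models": [{"name": "Llama-3.2-1B-Instruct-4bit"}]}
print(model_names(v1), model_names(tags))
```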
From another device on the same network:
curl -I http://<iphone-ip-address>:11434/
curl http://<iphone-ip-address>:11434/status
Expected status response:
{
  "status": "Running",
  "message": "Server is currently running."
}
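A script can key off that JSON before sending real requests. A minimal readiness check, assuming the status shape shown above:

```python
import json
from urllib.request import urlopen

def is_running(status_body: dict) -> bool:
    """True when a /status response reports the server is up."""
    return status_body.get("status") == "Running"

def server_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Fetch /status and check it; False on any network or parse error."""
    try:
        with urlopen(f"{base_url}/status", timeout=timeout) as resp:
            return is_running(json.load(resp))
    except (OSError, ValueError):
        return False
```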
OpenAI-compatible API (/v1)
ai.local supports OpenAI-compatible routes like:
GET /v1/models
POST /v1/chat/completions
POST /v1/completions
POST /v1/audio/transcriptions
POST /v1/audio/speech
Example chat request:
curl http://<iphone-ip-address>:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer local-ai" \
  -d '{
    "model": "Llama-3.2-1B-Instruct-4bit",
    "messages": [{"role":"user","content":"Give me 3 tips to write cleaner Swift."}]
  }'
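The same call from Python's standard library, mirroring the curl example (the Bearer token follows the example above; whether ai.local validates it is not specified here):

```python
import json
from urllib.request import Request, urlopen

BASE = "http://<iphone-ip-address>:11434"  # replace with your device's address

def chat_request(model: str, prompt: str) -> Request:
    """Build the POST /v1/chat/completions request from the curl example."""
    body = {"model": model,
            "messages": [{"role": "user", "content": prompt}]}
    return Request(f"{BASE}/v1/chat/completions",
                   data=json.dumps(body).encode(),
                   headers={"Content-Type": "application/json",
                            "Authorization": "Bearer local-ai"})

req = chat_request("Llama-3.2-1B-Instruct-4bit",
                   "Give me 3 tips to write cleaner Swift.")
# On your network, send it and read the OpenAI-style reply:
# with urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```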
Notes:
For /v1 model names, do not include the mlx-community/ prefix.

Ollama-compatible API (/api)
ai.local also supports Ollama-compatible routes like:
GET /api/tags
POST /api/chat
POST /api/generate
POST /api/show
Example Ollama-style chat request:
curl http://<iphone-ip-address>:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.2-1B-Instruct-4bit",
    "messages": [{"role":"user","content":"Write a short release note."}],
    "stream": false
  }'
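And the Ollama-style equivalent in Python, again mirroring the curl example (no auth header here, and `"stream": false` asks for a single JSON reply rather than streamed chunks):

```python
import json
from urllib.request import Request, urlopen

BASE = "http://<iphone-ip-address>:11434"  # replace with your device's address

def ollama_chat_request(model: str, prompt: str) -> Request:
    """Build the POST /api/chat request from the curl example."""
    body = {"model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False}  # one JSON object back instead of a stream
    return Request(f"{BASE}/api/chat",
                   data=json.dumps(body).encode(),
                   headers={"Content-Type": "application/json"})

req = ollama_chat_request("Llama-3.2-1B-Instruct-4bit",
                          "Write a short release note.")
# On your network, send it and read the Ollama-style reply:
# with urlopen(req) as resp:
#     print(json.load(resp)["message"]["content"])
```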