@brunowernimont
Let's connect together

Run a private AI server directly on your iPhone or iPad

February 23, 2026

iOS iPadOS AI Ollama OpenAI API Privacy

Running AI locally on your phone or tablet is now practical for many workflows.
With ai.local, you can run models on-device and expose local endpoints compatible with both OpenAI-style (/v1) and Ollama-style (/api) clients.

This guide shows the fastest setup.

ai.local screenshot 1

1. Check device requirements

For a good experience, use:

  • iOS 18.2+ or iPadOS 18.2+
  • Newer high-performance devices (for example iPhone 15 Pro, iPhone 16 series, iPad Pro M1/M2)

2. Install and prepare ai.local

  1. Install ai.local on the App Store.
  2. Open the app and choose a model that matches your device performance target.
  3. Start the server in the app.

The server is exposed on your local network with:

  • Default port: 11434
  • Base URL format: http://<iphone-ip-address>:11434
  • LAN binding: 0.0.0.0

You can use either the device IP address or Bonjour host shown by the app.

3. Pick from famous LLM families

ai.local screenshot 2

Popular model families to look for in ai.local:

  • Llama 3.2 (for example 1B/3B Instruct)
  • Qwen 2.5 (for example 1.5B/3B Instruct)
  • Mistral 7B Instruct
  • Gemma 2 (2B/9B)
  • Phi-3.5 Mini
  • DeepSeek distilled models (small variants)

On iPhone/iPad, smaller quantized models usually feel best for latency and memory.
If you are unsure what is installed, query the model list first:

curl http://<iphone-ip-address>:11434/v1/models
curl http://<iphone-ip-address>:11434/api/tags

4. Verify the server is up

From another device on the same network:

curl -I http://<iphone-ip-address>:11434/
curl http://<iphone-ip-address>:11434/status

Expected status response:

{
  "status": "Running",
  "message": "Server is currently running."
}

5. Use OpenAI-style endpoints (/v1)

ai.local supports OpenAI-compatible routes like:

  • GET /v1/models
  • POST /v1/chat/completions
  • POST /v1/completions
  • POST /v1/audio/transcriptions
  • POST /v1/audio/speech

Example chat request:

curl http://<iphone-ip-address>:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer local-ai" \
  -d '{
    "model": "Llama-3.2-1B-Instruct-4bit",
    "messages": [{"role":"user","content":"Give me 3 tips to write cleaner Swift."}]
  }'

Notes:

  • Authentication is not enforced by the server.
  • If your client requires an API key, any non-empty value works.
  • For /v1 model names, do not include the mlx-community/ prefix.

6. Use Ollama-style endpoints (/api)

ai.local also supports Ollama-compatible routes like:

  • GET /api/tags
  • POST /api/chat
  • POST /api/generate
  • POST /api/show

Example Ollama-style chat request:

curl http://<iphone-ip-address>:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.2-1B-Instruct-4bit",
    "messages": [{"role":"user","content":"Write a short release note."}],
    "stream": false
  }'

Resources