Unified Swift SDK for LLM inference across local and cloud providers
SwiftAI provides a clean, idiomatic Swift interface for LLM inference. Choose your provider explicitly: local inference with MLX on Apple Silicon, cloud inference via HuggingFace or Anthropic, or system-integrated AI with Apple Foundation Models on iOS 26+.
| Capability | MLX | HuggingFace | Anthropic | Foundation Models |
|---|---|---|---|---|
| Text Generation | ✓ | ✓ | ✓ | ✓ |
| Streaming | ✓ | ✓ | ✓ | ✓ |
| Structured Output | ✓ | ✓ | ✓ | ✓ |
| Tool Calling | — | — | ✓ | — |
| Vision | — | — | ✓ | — |
| Extended Thinking | — | — | ✓ | — |
| Embeddings | — | ✓ | — | — |
| Transcription | — | ✓ | — | — |
| Image Generation | — | ✓ | — | — |
| Token Counting | ✓ | — | — | — |
| Offline | ✓ | — | — | ✓ |
| Privacy | ✓ | — | — | ✓ |
Add SwiftAI to your Package.swift:
dependencies: [
.package(url: "https://github.com/christopherkarani/SwiftAI", from: "0.1.0")
]
Then add "SwiftAI" to your target's dependencies.
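For example, a target entry might look like the following; the target name is illustrative and assumes the package product is named "SwiftAI":
.target(
    name: "MyApp",
    dependencies: ["SwiftAI"]
)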
import SwiftAI
let provider = MLXProvider()
let response = try await provider.generate(
"Explain quantum computing in simple terms",
model: .llama3_2_1B,
config: .default
)
print(response)
let provider = HuggingFaceProvider() // Uses HF_TOKEN environment variable
let response = try await provider.generate(
"Write a haiku about Swift",
model: .huggingFace("meta-llama/Llama-3.1-8B-Instruct"),
config: .creative
)
print(response)
let provider = MLXProvider()
let stream = provider.stream(
"Tell me a story about a robot",
model: .llama3_2_3B,
config: .default
)
for try await chunk in stream {
print(chunk.text, terminator: "")
}
Local inference on Apple Silicon. Zero network traffic, complete privacy.
Best for: Privacy-sensitive apps, offline functionality, consistent latency
// Default configuration
let provider = MLXProvider()
// Optimized for M1 devices
let provider = MLXProvider(configuration: .m1Optimized)
// Full control
let config = MLXConfiguration.default
.memoryLimit(.gigabytes(8))
.withQuantizedKVCache(bits: 4)
let provider = MLXProvider(configuration: config)
Configuration Presets:
| Preset | Memory | Use Case |
|---|---|---|
| .default | Auto | Balanced performance |
| .m1Optimized | 6 GB | M1 MacBooks, base iPads |
| .mProOptimized | 12 GB | M1/M2 Pro, Max chips |
| .memoryEfficient | 4 GB | Constrained devices |
| .highPerformance | 16+ GB | M2/M3 Max, Ultra |
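Any preset from the table can be passed to the initializer the same way .m1Optimized is above, for example:
// Constrained devices: cap memory use at the cost of some throughput
let provider = MLXProvider(configuration: .memoryEfficient)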
Warmup for Fast First Response:
let provider = MLXProvider()
// Warm up Metal shaders before first generation
try await provider.warmUp(model: .llama3_2_1B, maxTokens: 5)
// Now first response is fast
let response = try await provider.generate("Hello", model: .llama3_2_1B)Cloud inference via HuggingFace Inference API. Access hundreds of models.
Best for: Large models, embeddings, transcription, image generation, model variety
Setup:
export HF_TOKEN=hf_your_token_here
// Auto-detects HF_TOKEN from environment
let provider = HuggingFaceProvider()
// Or provide token explicitly
let provider = HuggingFaceProvider(token: "hf_...")
// Custom configuration
let config = HFConfiguration.default.timeout(120)
let provider = HuggingFaceProvider(configuration: config)
Embeddings:
let provider = HuggingFaceProvider()
let embedding = try await provider.embed(
"SwiftAI makes LLM inference easy",
model: .huggingFace("sentence-transformers/all-MiniLM-L6-v2")
)
print("Dimensions: \(embedding.dimensions)")
print("Vector: \(embedding.vector)")
// Similarity comparison
let other = try await provider.embed("AI frameworks for Swift", model: /* ... */)
let similarity = embedding.cosineSimilarity(with: other)
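A minimal semantic-search sketch built only from the calls shown above (embed and cosineSimilarity); the query, documents, and model choice are illustrative:
let provider = HuggingFaceProvider()
let query = try await provider.embed(
    "How do I run a model offline?",
    model: .huggingFace("sentence-transformers/all-MiniLM-L6-v2")
)
for doc in ["MLX runs models locally on Apple Silicon", "HuggingFace serves models from the cloud"] {
    let candidate = try await provider.embed(doc, model: .huggingFace("sentence-transformers/all-MiniLM-L6-v2"))
    // Higher cosine similarity means a closer semantic match to the query
    print(candidate.cosineSimilarity(with: query), doc)
}
Transcription: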
let provider = HuggingFaceProvider()
let result = try await provider.transcribe(
audioURL: audioFileURL,
model: .huggingFace("openai/whisper-large-v3"),
config: .detailed
)
print(result.text)
for segment in result.segments {
print("\(segment.startTime)s: \(segment.text)")
}
Image Generation:
let provider = HuggingFaceProvider()
// Simple text-to-image with defaults
let result = try await provider.textToImage(
"A cat wearing a top hat, digital art",
model: .huggingFace("stabilityai/stable-diffusion-3")
)
// Use directly in SwiftUI
result.image // SwiftUI Image (cross-platform)
// With configuration presets
let result = try await provider.textToImage(
"Mountain landscape at sunset, photorealistic",
model: .huggingFace("stabilityai/stable-diffusion-xl-base-1.0"),
config: .highQuality.width(1024).height(768)
)
// Available presets: .default, .highQuality, .fast, .square512, .square1024, .landscape, .portrait
// Save to file
try result.save(to: URL.documentsDirectory.appending(path: "landscape.png"))
// Save to Photos library (iOS only, requires NSPhotoLibraryAddUsageDescription)
try await result.saveToPhotos()
// Access raw data if needed
let data = result.data
System-integrated on-device AI. Zero setup, managed by the OS.
Best for: iOS 26+ apps, system integration, no model downloads
if #available(iOS 26.0, *) {
let provider = FoundationModelsProvider()
let response = try await provider.generate(
"What can you help me with?",
model: .foundationModels,
config: .default
)
print(response)
}
SwiftAI includes first-class support for Anthropic's Claude models via the Anthropic API.
Best for: Advanced reasoning, vision tasks, extended thinking, production applications
Setup:
export ANTHROPIC_API_KEY=sk-ant-api03-...
import SwiftAI
// Simple generation
let provider = AnthropicProvider(apiKey: "sk-ant-...")
let response = try await provider.generate(
"Explain quantum computing",
model: .claudeSonnet45,
config: .default.maxTokens(500)
)
// Streaming
for try await chunk in provider.stream(
"Write a poem about Swift",
model: .claude3Haiku,
config: .default
) {
print(chunk, terminator: "")
}
Available Models:
| Model | ID | Best For |
|---|---|---|
| Claude Opus 4.5 | .claudeOpus45 | Most capable, complex reasoning |
| Claude Sonnet 4.5 | .claudeSonnet45 | Balanced performance and speed |
| Claude 3.5 Sonnet | .claude35Sonnet | Fast, high-quality responses |
| Claude 3 Haiku | .claude3Haiku | Fastest, most cost-effective |
Features:
- Text generation (streaming and non-streaming)
- Multi-turn conversations with context
- Vision support (multimodal image+text)
- Extended thinking mode for complex reasoning
- Comprehensive error handling
- Environment variable support (ANTHROPIC_API_KEY); see the sketch after this list
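A hedged sketch of environment-based setup. It assumes a parameterless AnthropicProvider() initializer that reads ANTHROPIC_API_KEY, analogous to HuggingFaceProvider(); check the provider's actual initializers before relying on this:
// Assumes ANTHROPIC_API_KEY is exported in the environment (see Setup above)
let provider = AnthropicProvider() // hypothetical no-argument initializer
let reply = try await provider.generate(
    "Hello, Claude",
    model: .claude3Haiku,
    config: .default
)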
Vision Example:
let messages = Messages {
Message.user([
.text("What's in this image?"),
.image(base64Data: imageData, mimeType: "image/jpeg")
])
}
let result = try await provider.generate(
messages: messages,
model: .claudeSonnet45,
config: .default
)
Extended Thinking:
var config = AnthropicConfiguration.standard(apiKey: "sk-ant-...")
config.thinkingConfig = .standard
let provider = AnthropicProvider(configuration: config)
let result = try await provider.generate(
"Solve this complex problem...",
model: .claudeOpus45,
config: .default
)
Get your API key at: https://console.anthropic.com/
SwiftAI requires explicit model selection—no magic auto-detection:
// MLX models (local)
.mlx("mlx-community/Llama-3.2-1B-Instruct-4bit")
.llama3_2_1B // Convenience alias
.phi4
.qwen2_5_3B
// HuggingFace models (cloud)
.huggingFace("meta-llama/Llama-3.1-70B-Instruct")
.huggingFace("sentence-transformers/all-MiniLM-L6-v2")
// Anthropic models (cloud)
.claudeOpus45
.claudeSonnet45
.claude35Sonnet
.claude3Haiku
// Foundation Models (iOS 26+)
.foundationModels
Control generation behavior with presets or custom settings:
// Presets
.default // temperature: 0.7, topP: 0.9
.creative // temperature: 1.0, topP: 0.95
.precise // temperature: 0.3, topP: 0.8
.code // temperature: 0.2, topP: 0.9
// Custom
let config = GenerateConfig(
temperature: 0.8,
maxTokens: 500,
topP: 0.9,
stopSequences: ["END"]
)
// Fluent API
let config = GenerateConfig.default
.temperature(0.8)
.maxTokens(500)Build conversations with the Messages result builder:
let messages = Messages {
Message.system("You are a Swift expert.")
Message.user("What are actors?")
}
let result = try await provider.generate(
messages: messages,
model: .llama3_2_1B,
config: .default
)
print(result.text)
print("Tokens: \(result.usage.totalTokens)")Real-time token streaming with AsyncSequence:
// Simple text streaming
for try await text in provider.stream("Tell me a joke", model: .llama3_2_1B) {
print(text, terminator: "")
}
// With metadata
let stream = provider.streamWithMetadata(
messages: messages,
model: .llama3_2_1B,
config: .default
)
for try await chunk in stream {
print(chunk.text, terminator: "")
if let tokensPerSecond = chunk.tokensPerSecond {
// Track performance
}
if let reason = chunk.finishReason {
print("\nFinished: \(reason)")
}
}
// Collect all chunks into final result
let result = try await stream.collectWithMetadata()
print("Total tokens: \(result.tokenCount)")Generate type-safe structured responses using the @Generable macro, mirroring Apple's FoundationModels API from iOS 26.
import SwiftAI
@Generable
struct Recipe {
@Guide("The recipe title")
let title: String
@Guide("Cooking time in minutes", .range(1...180))
let cookingTime: Int
@Guide("Difficulty level", .anyOf(["easy", "medium", "hard"]))
let difficulty: String
@Guide("List of ingredients")
let ingredients: [String]
}
let provider = AnthropicProvider(apiKey: "sk-ant-...")
// Generate typed response
let recipe = try await provider.generate(
"Create a recipe for chocolate chip cookies",
returning: Recipe.self,
model: .claudeSonnet45
)
print(recipe.title) // "Classic Chocolate Chip Cookies"
print(recipe.cookingTime) // 25
print(recipe.difficulty) // "easy"
print(recipe.ingredients) // ["flour", "butter", "chocolate chips", ...]
Get progressive updates as the response is generated:
let stream = provider.stream(
"Generate a detailed recipe",
returning: Recipe.self,
model: .claudeSonnet45
)
for try await partial in stream {
// Update UI progressively
if let title = partial.title {
titleLabel.text = title
}
if let ingredients = partial.ingredients {
updateIngredientsList(ingredients)
}
}
// Get final complete result
let recipe = try await stream.collect()
| Type | Constraints |
|---|---|
| String | .pattern(_:), .anyOf(_:), .minLength(_:), .maxLength(_:) |
| Int/Double | .range(_:), .minimum(_:), .maximum(_:) |
| Array | .count(_:), .minimumCount(_:), .maximumCount(_:) |
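A short sketch combining constraints from the table in a single @Generable type, following the same @Guide pattern as the Recipe example above; the struct and field names are illustrative:
@Generable
struct Contact {
    @Guide("Contact's display name", .maxLength(80))
    let name: String
    @Guide("Age in years", .minimum(0))
    let age: Int
    @Guide("Up to five interest tags", .maximumCount(5))
    let tags: [String]
}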
Define and execute tools that LLMs can invoke during generation.
struct WeatherTool: AITool {
@Generable
struct Arguments {
@Guide("City name to get weather for")
let city: String
@Guide("Temperature unit", .anyOf(["celsius", "fahrenheit"]))
let unit: String?
}
var description: String { "Get current weather for a city" }
func call(arguments: Arguments) async throws -> String {
// Implement weather lookup
return "Weather in \(arguments.city): 22C, Sunny"
}
}
// Create executor and register tools
let executor = AIToolExecutor()
await executor.register(WeatherTool())
await executor.register(SearchTool())
// Configure provider with tools
let config = GenerateConfig.default
.tools([WeatherTool(), SearchTool()])
.toolChoice(.auto)
// Generate with tool access
let response = try await provider.generate(
messages: messages,
model: .claudeSonnet45,
config: config
)
// Handle tool calls if present
if let toolCalls = response.toolCalls {
let results = try await executor.execute(toolCalls: toolCalls)
// Continue conversation with results...
}
.toolChoice(.auto) // Model decides whether to use tools
.toolChoice(.required) // Model must use a tool
.toolChoice(.none) // Model cannot use tools
.toolChoice(.tool(name: "weather")) // Model must use specific tool
Stateful conversation management with automatic history:
let session = try await ChatSession(
provider: MLXProvider(),
model: .llama3_2_1B,
systemPrompt: "You are a helpful coding assistant.",
warmup: .eager // Fast first response
)
// Send messages—history is managed automatically
let response1 = try await session.send("What is a protocol in Swift?")
let response2 = try await session.send("Can you give me an example?")
// Stream responses
for try await text in session.streamResponse("Explain associated types") {
print(text, terminator: "")
}
// Access conversation history
let history = await session.messages
// Clear and start fresh
await session.clearHistory()
Warmup Options:
| Option | First Message Latency | Use Case |
|---|---|---|
| nil | 2-4 seconds | Infrequent use |
| .default | 1-2 seconds | Balanced |
| .eager | 100-300 ms | Chat interfaces |
Download, cache, and manage models:
let manager = ModelManager.shared
// Check if model is cached
if await manager.isCached(.llama3_2_1B) {
print("Model ready")
}
// Download with progress
let url = try await manager.download(.llama3_2_1B) { progress in
print("Downloading: \(Int(progress.percentComplete))%")
}
// Cache management
let size = await manager.cacheSize()
print("Cache size: \(size.formatted())")
// Evict to fit storage limit
try await manager.evictToFit(maxSize: .gigabytes(30))
// Remove specific model
try await manager.remove(.llama3_2_1B)
Storage Location: ~/Library/Caches/SwiftAI/Models/
Manage context windows with precise token counts:
let provider = MLXProvider()
// Count tokens in text
let count = try await provider.countTokens(
in: "Hello, world!",
for: .llama3_2_1B
)
print("Tokens: \(count.count)")
// Count tokens in conversation (includes chat template overhead)
let messageCount = try await provider.countTokens(
in: messages,
for: .llama3_2_1B
)
// Check if content fits in context window
if messageCount.fitsInContext(size: 4096) {
// Safe to generate
}
// Encode/decode
let tokens = try await provider.encode("Hello", for: .llama3_2_1B)
let decoded = try await provider.decode(tokens, for: .llama3_2_1B)
SwiftAI provides detailed, actionable errors:
do {
let response = try await provider.generate(prompt, model: model)
} catch AIError.modelNotCached(let model) {
// Download the model first
try await ModelManager.shared.download(model)
} catch AIError.providerUnavailable(let reason) {
// Check availability requirements
print("Provider unavailable: \(reason)")
} catch AIError.tokenLimitExceeded(let count, let limit) {
// Truncate input
print("Input has \(count) tokens, limit is \(limit)")
} catch AIError.networkError(let message) {
// Handle connectivity issues
print("Network error: \(message)")
}
| Requirement | Minimum Version |
|---|---|
| iOS | 17.0 |
| macOS | 14.0 |
| visionOS | 1.0 |
| Swift | 6.2 |
MLX Provider: Requires Apple Silicon (arm64)
Foundation Models: Requires iOS 26.0+
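A sketch of gating provider selection on these requirements with standard Swift conditional compilation and availability checks; only provider names already used in this README appear here:
#if arch(arm64)
// Apple Silicon: local, private inference with MLX
let provider = MLXProvider()
#else
// Non-arm64 targets: fall back to cloud inference
let provider = HuggingFaceProvider()
#endif
if #available(iOS 26.0, *) {
    // Foundation Models is available as a zero-setup, on-device alternative
    let systemProvider = FoundationModelsProvider()
}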
- Explicit Model Selection — You choose your provider. No magic auto-detection.
- Swift 6.2 Concurrency — Actors, Sendable types, and AsyncSequence throughout.
- Progressive Disclosure — Simple one-liners for beginners, full control for experts.
- Protocol-Oriented — Extensible via protocols with associated types.
- Type-Safe Structured Output — @Generable macro mirrors Apple's FoundationModels API.
- Tool Integration — First-class support for LLM tool/function calling.
MIT License — see LICENSE for details.