Unified Swift SDK for LLM inference across local and cloud providers
SwiftAI provides a clean, idiomatic Swift interface for LLM inference. Choose your provider explicitly: local inference with MLX on Apple Silicon, cloud inference via HuggingFace or Anthropic, or system-integrated AI with Apple Foundation Models on iOS 26+.
| Capability | MLX | HuggingFace | Anthropic | Foundation Models |
|---|---|---|---|---|
| Text Generation | ✓ | ✓ | ✓ | ✓ |
| Streaming | ✓ | ✓ | ✓ | ✓ |
| Structured Output | ✓ | ✓ | ✓ | ✓ |
| Tool Calling | — | — | ✓ | — |
| Vision | — | — | ✓ | — |
| Extended Thinking | — | — | ✓ | — |
| Embeddings | — | ✓ | — | — |
| Transcription | — | ✓ | — | — |
| Image Generation | — | ✓ | — | — |
| Token Counting | ✓ | — | — | — |
| Offline | ✓ | — | — | ✓ |
| Privacy | ✓ | — | — | ✓ |
Add SwiftAI to your Package.swift:
dependencies: [
.package(url: "https://github.com/christopherkarani/SwiftAI", from: "0.1.0")
]
Then add "SwiftAI" to your target's dependencies.
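For example, a target entry might look like the following; the target name is illustrative and assumes the package product is named "SwiftAI":
.target(
    name: "MyApp",
    dependencies: ["SwiftAI"]
)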
import SwiftAI
let provider = MLXProvider()
let response = try await provider.generate(
"Explain quantum computing in simple terms",
model: .llama3_2_1B,
config: .default
)
print(response)
let provider = HuggingFaceProvider() // Uses HF_TOKEN environment variable
let response = try await provider.generate(
"Write a haiku about Swift",
model: .huggingFace("meta-llama/Llama-3.1-8B-Instruct"),
config: .creative
)
print(response)
let provider = MLXProvider()
let stream = provider.stream(
"Tell me a story about a robot",
model: .llama3_2_3B,
config: .default
)
for try await chunk in stream {
print(chunk.text, terminator: "")
}
Local inference on Apple Silicon. Zero network traffic, complete privacy.
Best for: Privacy-sensitive apps, offline functionality, consistent latency
// Default configuration
let provider = MLXProvider()
// Optimized for M1 devices
let provider = MLXProvider(configuration: .m1Optimized)
// Full control
let config = MLXConfiguration.default
.memoryLimit(.gigabytes(8))
.withQuantizedKVCache(bits: 4)
let provider = MLXProvider(configuration: config)
Configuration Presets:
| Preset | Memory | Use Case |
|---|---|---|
| .default | Auto | Balanced performance |
| .m1Optimized | 6 GB | M1 MacBooks, base iPads |
| .mProOptimized | 12 GB | M1/M2 Pro, Max chips |
| .memoryEfficient | 4 GB | Constrained devices |
| .highPerformance | 16+ GB | M2/M3 Max, Ultra |
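Any preset from the table can be passed to the initializer the same way .m1Optimized is above, for example:
// Constrained devices: cap memory use at the cost of some throughput
let provider = MLXProvider(configuration: .memoryEfficient)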
Warmup for Fast First Response:
let provider = MLXProvider()
// Warm up Metal shaders before first generation
try await provider.warmUp(model: .llama3_2_1B, maxTokens: 5)
// Now first response is fast
let response = try await provider.generate("Hello", model: .llama3_2_1B)Cloud inference via HuggingFace Inference API. Access hundreds of models.
Best for: Large models, embeddings, transcription, image generation, model variety
Setup:
export HF_TOKEN=hf_your_token_here
// Auto-detects HF_TOKEN from environment
let provider = HuggingFaceProvider()
// Or provide token explicitly
let provider = HuggingFaceProvider(token: "hf_...")
// Custom configuration
let config = HFConfiguration.default.timeout(120)
let provider = HuggingFaceProvider(configuration: config)
Embeddings:
let provider = HuggingFaceProvider()
let embedding = try await provider.embed(
"SwiftAI makes LLM inference easy",
model: .huggingFace("sentence-transformers/all-MiniLM-L6-v2")
)
print("Dimensions: \(embedding.dimensions)")
print("Vector: \(embedding.vector)")
// Similarity comparison
let other = try await provider.embed("AI frameworks for Swift", model: /* ... */)
let similarity = embedding.cosineSimilarity(with: other)
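A minimal semantic-search sketch built only from the calls shown above (embed and cosineSimilarity); the query, documents, and model choice are illustrative:
let provider = HuggingFaceProvider()
let query = try await provider.embed(
    "How do I run a model offline?",
    model: .huggingFace("sentence-transformers/all-MiniLM-L6-v2")
)
for doc in ["MLX runs models locally on Apple Silicon", "HuggingFace serves models from the cloud"] {
    let candidate = try await provider.embed(doc, model: .huggingFace("sentence-transformers/all-MiniLM-L6-v2"))
    // Higher cosine similarity means a closer semantic match to the query
    print(candidate.cosineSimilarity(with: query), doc)
}
Transcription: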
let provider = HuggingFaceProvider()
let result = try await provider.transcribe(
audioURL: audioFileURL,
model: .huggingFace("openai/whisper-large-v3"),
config: .detailed
)
print(result.text)
for segment in result.segments {
print("\(segment.startTime)s: \(segment.text)")
}
Image Generation:
let provider = HuggingFaceProvider()
// Simple text-to-image with defaults
let result = try await provider.textToImage(
"A cat wearing a top hat, digital art",
model: .huggingFace("stabilityai/stable-diffusion-3")
)
// Use directly in SwiftUI
result.image // SwiftUI Image (cross-platform)
// With configuration presets
let result = try await provider.textToImage(
"Mountain landscape at sunset, photorealistic",
model: .huggingFace("stabilityai/stable-diffusion-xl-base-1.0"),
config: .highQuality.width(1024).height(768)
)
// Available presets: .default, .highQuality, .fast, .square512, .square1024, .landscape, .portrait
// Save to file
try result.save(to: URL.documentsDirectory.appending(path: "landscape.png"))
// Save to Photos library (iOS only, requires NSPhotoLibraryAddUsageDescription)
try await result.saveToPhotos()
// Access raw data if needed
let data = result.data
System-integrated on-device AI. Zero setup, managed by the OS.
Best for: iOS 26+ apps, system integration, no model downloads
if #available(iOS 26.0, *) {
let provider = FoundationModelsProvider()
let response = try await provider.generate(
"What can you help me with?",
model: .foundationModels,
config: .default
)
print(response)
}
SwiftAI includes first-class support for Anthropic's Claude models via the Anthropic API.
Best for: Advanced reasoning, vision tasks, extended thinking, production applications
Setup:
export ANTHROPIC_API_KEY=sk-ant-api03-...
import SwiftAI
// Simple generation
let provider = AnthropicProvider(apiKey: "sk-ant-...")
let response = try await provider.generate(
"Explain quantum computing",
model: .claudeSonnet45,
config: .default.maxTokens(500)
)
// Streaming
for try await chunk in provider.stream(
"Write a poem about Swift",
model: .claude3Haiku,
config: .default
) {
print(chunk, terminator: "")
}
Available Models:
| Model | ID | Best For |
|---|---|---|
| Claude Opus 4.5 | .claudeOpus45 | Most capable, complex reasoning |
| Claude Sonnet 4.5 | .claudeSonnet45 | Balanced performance and speed |
| Claude 3.5 Sonnet | .claude35Sonnet | Fast, high-quality responses |
| Claude 3 Haiku | .claude3Haiku | Fastest, most cost-effective |
Features:
- Text generation (streaming and non-streaming)
- Multi-turn conversations with context
- Vision support (multimodal image+text)
- Extended thinking mode for complex reasoning
- Comprehensive error handling
- Environment variable support (ANTHROPIC_API_KEY); see the sketch after this list
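A hedged sketch of environment-based setup. It assumes a parameterless AnthropicProvider() initializer that reads ANTHROPIC_API_KEY, analogous to HuggingFaceProvider(); check the provider's actual initializers before relying on this:
// Assumes ANTHROPIC_API_KEY is exported in the environment (see Setup above)
let provider = AnthropicProvider() // hypothetical no-argument initializer
let reply = try await provider.generate(
    "Hello, Claude",
    model: .claude3Haiku,
    config: .default
)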
Vision Example:
let messages = Messages {
Message.user([
.text("What's in this image?"),
.image(base64Data: imageData, mimeType: "image/jpeg")
])
}
let result = try await provider.generate(
messages: messages,
model: .claudeSonnet45,
config: .default
)
Extended Thinking:
var config = AnthropicConfiguration.standard(apiKey: "sk-ant-...")
config.thinkingConfig = .standard
let provider = AnthropicProvider(configuration: config)
let result = try await provider.generate(
"Solve this complex problem...",
model: .claudeOpus45,
config: .default
)
Get your API key at: https://console.anthropic.com/
SwiftAI requires explicit model selection—no magic auto-detection:
// MLX models (local)
.mlx("mlx-community/Llama-3.2-1B-Instruct-4bit")
.llama3_2_1B // Convenience alias
.phi4
.qwen2_5_3B
// HuggingFace models (cloud)
.huggingFace("meta-llama/Llama-3.1-70B-Instruct")
.huggingFace("sentence-transformers/all-MiniLM-L6-v2")
// Anthropic models (cloud)
.claudeOpus45
.claudeSonnet45
.claude35Sonnet
.claude3Haiku
// Foundation Models (iOS 26+)
.foundationModels
Control generation behavior with presets or custom settings:
// Presets
.default // temperature: 0.7, topP: 0.9
.creative // temperature: 1.0, topP: 0.95
.precise // temperature: 0.3, topP: 0.8
.code // temperature: 0.2, topP: 0.9
// Custom
let config = GenerateConfig(
temperature: 0.8,
maxTokens: 500,
topP: 0.9,
stopSequences: ["END"]
)
// Fluent API
let config = GenerateConfig.default
.temperature(0.8)
.maxTokens(500)Build conversations with the Messages result builder:
let messages = Messages {
Message.system("You are a Swift expert.")
Message.user("What are actors?")
}
let result = try await provider.generate(
messages: messages,
model: .llama3_2_1B,
config: .default
)
print(result.text)
print("Tokens: \(result.usage.totalTokens)")Real-time token streaming with AsyncSequence:
// Simple text streaming
for try await text in provider.stream("Tell me a joke", model: .llama3_2_1B) {
print(text, terminator: "")
}
// With metadata
let stream = provider.streamWithMetadata(
messages: messages,
model: .llama3_2_1B,
config: .default
)
for try await chunk in stream {
print(chunk.text, terminator: "")
if let tokensPerSecond = chunk.tokensPerSecond {
// Track performance
}
if let reason = chunk.finishReason {
print("\nFinished: \(reason)")
}
}
// Collect all chunks into final result
let result = try await stream.collectWithMetadata()
print("Total tokens: \(result.tokenCount)")Generate type-safe structured responses using the @Generable macro, mirroring Apple's FoundationModels API from iOS 26.
import SwiftAI
@Generable
struct Recipe {
@Guide("The recipe title")
let title: String
@Guide("Cooking time in minutes", .range(1...180))
let cookingTime: Int
@Guide("Difficulty level", .anyOf(["easy", "medium", "hard"]))
let difficulty: String
@Guide("List of ingredients")
let ingredients: [String]
}
let provider = AnthropicProvider(apiKey: "sk-ant-...")
// Generate typed response
let recipe = try await provider.generate(
"Create a recipe for chocolate chip cookies",
returning: Recipe.self,
model: .claudeSonnet45
)
print(recipe.title) // "Classic Chocolate Chip Cookies"
print(recipe.cookingTime) // 25
print(recipe.difficulty) // "easy"
print(recipe.ingredients) // ["flour", "butter", "chocolate chips", ...]
Get progressive updates as the response is generated:
let stream = provider.stream(
"Generate a detailed recipe",
returning: Recipe.self,
model: .claudeSonnet45
)
for try await partial in stream {
// Update UI progressively
if let title = partial.title {
titleLabel.text = title
}
if let ingredients = partial.ingredients {
updateIngredientsList(ingredients)
}
}
// Get final complete result
let recipe = try await stream.collect()
| Type | Constraints |
|---|---|
| String | .pattern(_:), .anyOf(_:), .minLength(_:), .maxLength(_:) |
| Int/Double | .range(_:), .minimum(_:), .maximum(_:) |
| Array | .count(_:), .minimumCount(_:), .maximumCount(_:) |
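A short sketch combining constraints from the table in a single @Generable type, following the same @Guide pattern as the Recipe example above; the struct and field names are illustrative:
@Generable
struct Contact {
    @Guide("Contact's display name", .maxLength(80))
    let name: String
    @Guide("Age in years", .minimum(0))
    let age: Int
    @Guide("Up to five interest tags", .maximumCount(5))
    let tags: [String]
}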
Define and execute tools that LLMs can invoke during generation.
struct WeatherTool: AITool {
@Generable
struct Arguments {
@Guide("City name to get weather for")
let city: String
@Guide("Temperature unit", .anyOf(["celsius", "fahrenheit"]))
let unit: String?
}
var description: String { "Get current weather for a city" }
func call(arguments: Arguments) async throws -> String {
// Implement weather lookup
return "Weather in \(arguments.city): 22C, Sunny"
}
}
// Create executor and register tools
let executor = AIToolExecutor()
await executor.register(WeatherTool())
await executor.register(SearchTool())
// Configure provider with tools
let config = GenerateConfig.default
.tools([WeatherTool(), SearchTool()])
.toolChoice(.auto)
// Generate with tool access
let response = try await provider.generate(
messages: messages,
model: .claudeSonnet45,
config: config
)
// Handle tool calls if present
if let toolCalls = response.toolCalls {
let results = try await executor.execute(toolCalls: toolCalls)
// Continue conversation with results...
}
.toolChoice(.auto) // Model decides whether to use tools
.toolChoice(.required) // Model must use a tool
.toolChoice(.none) // Model cannot use tools
.toolChoice(.tool(name: "weather")) // Model must use specific tool
Stateful conversation management with automatic history:
let session = try await ChatSession(
provider: MLXProvider(),
model: .llama3_2_1B,
systemPrompt: "You are a helpful coding assistant.",
warmup: .eager // Fast first response
)
// Send messages—history is managed automatically
let response1 = try await session.send("What is a protocol in Swift?")
let response2 = try await session.send("Can you give me an example?")
// Stream responses
for try await text in session.streamResponse("Explain associated types") {
print(text, terminator: "")
}
// Access conversation history
let history = await session.messages
// Clear and start fresh
await session.clearHistory()
Warmup Options:
| Option | First Message Latency | Use Case |
|---|---|---|
| nil | 2-4 seconds | Infrequent use |
| .default | 1-2 seconds | Balanced |
| .eager | 100-300 ms | Chat interfaces |
Download, cache, and manage models:
let manager = ModelManager.shared
// Check if model is cached
if await manager.isCached(.llama3_2_1B) {
print("Model ready")
}
// Download with progress
let url = try await manager.download(.llama3_2_1B) { progress in
print("Downloading: \(Int(progress.percentComplete))%")
}
// Cache management
let size = await manager.cacheSize()
print("Cache size: \(size.formatted())")
// Evict to fit storage limit
try await manager.evictToFit(maxSize: .gigabytes(30))
// Remove specific model
try await manager.remove(.llama3_2_1B)
Storage Location: ~/Library/Caches/SwiftAI/Models/
Manage context windows with precise token counts:
let provider = MLXProvider()
// Count tokens in text
let count = try await provider.countTokens(
in: "Hello, world!",
for: .llama3_2_1B
)
print("Tokens: \(count.count)")
// Count tokens in conversation (includes chat template overhead)
let messageCount = try await provider.countTokens(
in: messages,
for: .llama3_2_1B
)
// Check if content fits in context window
if messageCount.fitsInContext(size: 4096) {
// Safe to generate
}
// Encode/decode
let tokens = try await provider.encode("Hello", for: .llama3_2_1B)
let decoded = try await provider.decode(tokens, for: .llama3_2_1B)
SwiftAI provides detailed, actionable errors:
do {
let response = try await provider.generate(prompt, model: model)
} catch AIError.modelNotCached(let model) {
// Download the model first
try await ModelManager.shared.download(model)
} catch AIError.providerUnavailable(let reason) {
// Check availability requirements
print("Provider unavailable: \(reason)")
} catch AIError.tokenLimitExceeded(let count, let limit) {
// Truncate input
print("Input has \(count) tokens, limit is \(limit)")
} catch AIError.networkError(let message) {
// Handle connectivity issues
print("Network error: \(message)")
}
| Requirement | Minimum Version |
|---|---|
| iOS | 17.0 |
| macOS | 14.0 |
| visionOS | 1.0 |
| Swift | 6.2 |
MLX Provider: Requires Apple Silicon (arm64)
Foundation Models: Requires iOS 26.0+
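A sketch of gating provider selection on these requirements with standard Swift conditional compilation and availability checks; only provider names already used in this README appear here:
#if arch(arm64)
// Apple Silicon: local, private inference with MLX
let provider = MLXProvider()
#else
// Non-arm64 targets: fall back to cloud inference
let provider = HuggingFaceProvider()
#endif
if #available(iOS 26.0, *) {
    // Foundation Models is available as a zero-setup, on-device alternative
    let systemProvider = FoundationModelsProvider()
}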
- Explicit Model Selection — You choose your provider. No magic auto-detection.
- Swift 6.2 Concurrency — Actors, Sendable types, and AsyncSequence throughout.
- Progressive Disclosure — Simple one-liners for beginners, full control for experts.
- Protocol-Oriented — Extensible via protocols with associated types.
- Type-Safe Structured Output — @Generable macro mirrors Apple's FoundationModels API.
- Tool Integration — First-class support for LLM tool/function calling.
MIT License — see LICENSE for details.