Traditional UI testing is painful:
- Brittle selectors break with every design change
- Pixel-perfect comparisons fail on minor, acceptable variations
- Writing test assertions requires deep technical knowledge
- Cross-browser testing multiplies complexity
- Generic analysis lacks domain expertise in accessibility, conversion optimization, and mobile UX
- Accessibility checks need specialized tools and expertise
LayoutLens lets you test UIs the way humans see them - using natural language and domain expert knowledge:
# Basic analysis
result = await lens.analyze("https://example.com", "Is the navigation user-friendly?")
# Expert-powered analysis
result = await lens.audit_accessibility("https://example.com", compliance_level="AA")
# Returns: "WCAG AA compliant with 4.7:1 contrast ratio. Focus indicators visible..."Instead of writing complex selectors and assertions, just ask questions like:
- "Is this page mobile-friendly?"
- "Are all buttons accessible?"
- "Does the layout look professional?"
Get expert-level insights from built-in domain knowledge in accessibility, conversion optimization, mobile UX, and more.
95.2% accuracy on real-world UI testing benchmarks
pip install layoutlens
playwright install chromium  # For screenshot capture

from layoutlens import LayoutLens
# Initialize (uses OPENAI_API_KEY env var)
lens = LayoutLens()
# Test any website or local HTML
result = await lens.analyze("https://your-site.com", "Is the header properly aligned?")
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence:.1%}")That's it! No selectors, no complex setup, just natural language questions.
Test single pages with custom questions:
# Test local HTML files
result = await lens.analyze("checkout.html", "Is the payment form user-friendly?")
# Test with expert context
from layoutlens.prompts import Instructions, UserContext
instructions = Instructions(
expert_persona="conversion_expert",
user_context=UserContext(
business_goals=["reduce_cart_abandonment"],
target_audience="mobile_shoppers"
)
)
result = await lens.analyze(
"checkout.html",
"How can we optimize this checkout flow?",
instructions=instructions
)Perfect for A/B testing and redesign validation:
result = await lens.compare(
["old-design.html", "new-design.html"],
"Which design is more accessible?"
)
print(f"Winner: {result.answer}")Domain expert knowledge with one line of code:
# Professional accessibility audit (WCAG expert)
result = await lens.audit_accessibility("product-page.html", compliance_level="AA")
# Conversion rate optimization (CRO expert)
result = await lens.optimize_conversions("landing.html",
business_goals=["increase_signups"], industry="saas")
# Mobile UX analysis (Mobile expert)
result = await lens.analyze_mobile_ux("app.html", performance_focus=True)
# E-commerce audit (Retail expert)
result = await lens.audit_ecommerce("checkout.html", page_type="checkout")
# Legacy methods still work
result = await lens.check_accessibility("product-page.html")  # Backward compatible

Test multiple pages efficiently:
results = await lens.analyze(
sources=["home.html", "about.html", "contact.html"],
queries=["Is it accessible?", "Is it mobile-friendly?"]
)
# Processes 6 tests in parallel

# Async for maximum throughput
result = await lens.analyze(
sources=["page1.html", "page2.html", "page3.html"],
queries=["Is it accessible?"],
max_concurrent=5
)
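Either batched call returns one entry per (source, query) pair, so the first example above yields six results and the second three. A minimal sketch of iterating over a batch (assuming each entry exposes the same .answer and .confidence fields as a single-page result, which is an assumption rather than documented API):

# Summarize a batch run (sketch; assumes the batched call returns an iterable of results)
results = await lens.analyze(
    sources=["home.html", "about.html", "contact.html"],
    queries=["Is it accessible?", "Is it mobile-friendly?"],
)
for i, r in enumerate(results, start=1):
    print(f"Test {i}: {r.answer[:60]}... ({r.confidence:.0%})")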
result = await lens.analyze("page.html", "Is it accessible?")
# Export to clean JSON
json_data = result.to_json() # Returns typed JSON string
print(json_data)
# {
# "source": "page.html",
# "query": "Is it accessible?",
# "answer": "Yes, the page follows accessibility standards...",
# "confidence": 0.85,
# "reasoning": "The page has proper heading structure...",
# "screenshot_path": "/path/to/screenshot.png",
# "viewport": "desktop",
# "timestamp": "2024-01-15 10:30:00",
# "execution_time": 2.3,
# "metadata": {}
# }
# Type-safe structured access
from layoutlens.types import AnalysisResultJSON
import json
data: AnalysisResultJSON = json.loads(result.to_json())
confidence = data["confidence"] # Fully typed: floatChoose from 6 built-in domain experts with specialized knowledge:
Choose from 6 built-in domain experts with specialized knowledge:

# Available experts: accessibility_expert, conversion_expert, mobile_expert,
# ecommerce_expert, healthcare_expert, finance_expert
# Use any expert with custom analysis
result = await lens.analyze_with_expert(
source="healthcare-portal.html",
query="How can we improve patient experience?",
expert_persona="healthcare_expert",
focus_areas=["patient_privacy", "health_literacy"],
user_context={
"target_audience": "elderly_patients",
"accessibility_needs": ["large_text", "simple_navigation"],
"industry": "healthcare"
}
)
# Expert comparison analysis
result = await lens.compare_with_expert(
sources=["old-design.html", "new-design.html"],
query="Which design converts better?",
expert_persona="conversion_expert",
focus_areas=["cta_prominence", "trust_signals"]
)# Analyze a single page
layoutlens https://example.com "Is this accessible?"
# Analyze local files
layoutlens page.html "Is the design professional?"
# Compare two designs
layoutlens page1.html page2.html --compare
# Analyze with different viewport
layoutlens site.com "Is it mobile-friendly?" --viewport mobile
# JSON output for automation
layoutlens page.html "Is it accessible?" --output json- name: Visual UI Test
run: |
pip install layoutlens
playwright install chromium
layoutlens ${{ env.PREVIEW_URL }} "Is it accessible and mobile-friendly?"import pytest
from layoutlens import LayoutLens
@pytest.mark.asyncio
async def test_homepage_quality():
    lens = LayoutLens()
    result = await lens.analyze("homepage.html", "Is this production-ready?")
    assert result.confidence > 0.8
    assert "yes" in result.answer.lower()
LayoutLens includes a comprehensive benchmarking system to validate AI performance:

# Run LayoutLens against test data
python benchmarks/run_benchmark.py --api-key sk-your-key
# With custom settings
python benchmarks/run_benchmark.py \
    --api-key sk-your-key \
    --output benchmarks/my_results \
    --no-batch \
    --filename custom_results.json

# Evaluate results against ground truth
python benchmarks/evaluation/evaluator.py \
    --answer-keys benchmarks/answer_keys \
    --results benchmarks/layoutlens_output \
    --output evaluation_report.json

The benchmark runner outputs clean JSON for analysis:
# Example benchmark result structure
{
  "benchmark_info": {
    "total_tests": 150,
    "successful_tests": 143,
    "failed_tests": 7,
    "success_rate": 0.953,
    "batch_processing_used": true,
    "model_used": "gpt-4o-mini"
  },
  "results": [
    {
      "html_file": "good_contrast.html",
      "query": "Is this page accessible?",
      "answer": "Yes, the page has good color contrast...",
      "confidence": 0.89,
      "reasoning": "WCAG guidelines are followed...",
      "success": true,
      "error": null,
      "metadata": {"category": "accessibility"}
    }
  ]
}
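Because the report is plain JSON, downstream analysis needs only the standard library. A sketch that tallies successful tests per category from the structure above (the result-file path is illustrative):

import json
from collections import Counter

# Path is illustrative; use whatever --output/--filename you passed to the runner.
with open("benchmarks/my_results/custom_results.json") as f:
    report = json.load(f)

successes_by_category = Counter(
    r["metadata"].get("category", "unknown")
    for r in report["results"]
    if r["success"]
)
print(successes_by_category)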
Create your own test data and answer keys:

# Use the async API for custom benchmark workflows
from layoutlens import LayoutLens
async def run_custom_benchmark():
    lens = LayoutLens()
    test_cases = [
        {"source": "page1.html", "query": "Is it accessible?"},
        {"source": "page2.html", "query": "Is it mobile-friendly?"}
    ]
    results = []
    for case in test_cases:
        result = await lens.analyze(case["source"], case["query"])
        results.append({
            "test": case,
            "result": result.to_json(),  # Clean JSON output
            "passed": result.confidence > 0.7
        })
    return results
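To run this from a script or CI job, drive it from an event loop and summarize the outcome; a minimal sketch (the pass-rate summary is illustrative, not part of LayoutLens):

import asyncio

def main() -> None:
    results = asyncio.run(run_custom_benchmark())
    passed = sum(1 for r in results if r["passed"])
    print(f"{passed}/{len(results)} checks passed")

if __name__ == "__main__":
    main()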
Simple configuration options:

# Via environment
export OPENAI_API_KEY="sk-..."
# Via code
lens = LayoutLens(
api_key="sk-...",
model="gpt-4o-mini", # or "gpt-4o" for higher accuracy
cache_enabled=True, # Reduce API costs
cache_type="memory", # "memory" or "file"
)- π Full Documentation - Comprehensive guides and API reference
- Examples - Real-world usage patterns
- Issues - Report bugs or request features
- Discussions - Get help and share ideas
- Natural Language - Write tests like you'd describe the UI to a colleague
- Domain Expert Knowledge - Built-in expertise in accessibility, CRO, mobile UX, and more
- Rich Context Support - Business goals, user personas, compliance standards, and technical constraints
- Zero Selectors - No more fragile XPath or CSS selectors
- Visual Understanding - AI sees what users see, not just code
- Async-by-Default - Concurrent processing for optimal performance
- Simple API - One analyze method handles single pages, batches, and comparisons
- Structured JSON Output - TypedDict schemas for full type safety in automation
- Comprehensive Benchmarking - Built-in evaluation system with 95.2% accuracy
- Production Ready - Used by teams for real-world applications
Making UI testing as simple as asking "Does this look right?"