A synthetic data generator that creates realistic corporate datasets including emails, documents, and attachments using AI. Built with a Rust TUI frontend and Python generation backend.
- 📧 Email Generation: Creates threaded email conversations with realistic headers and attachments
- 📄 Document Creation: Generates Word documents and PDFs with corporate content
- 🎭 Tone Control: Configurable tone mix (regular, upset, friendly, formal, urgent)
- 🔗 Relationship Management: Builds realistic company rosters and employee relationships
- ⚙️ Flexible Configuration: Control file types, attachment ratios, and internal/external email distribution
- 🤖 AI-Powered: Uses Google Gemini or OpenRouter for content generation with Faker fallbacks
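
The "AI with Faker fallbacks" behaviour can be pictured roughly as below. This is only a sketch of the general try-then-fall-back pattern, not the project's actual code: `generate_subject`, `fallback_subject`, and the `llm_call` parameter are hypothetical names, and a small hard-coded list stands in for Faker so the example needs only the standard library.

```python
import random
from typing import Callable, Optional

# Hypothetical stand-in for a Faker-style local generator.
FALLBACK_SUBJECTS = ["Budget update", "Project status", "Meeting follow-up"]


def fallback_subject(rng: random.Random) -> str:
    """Produce a subject line locally, without any API call."""
    return rng.choice(FALLBACK_SUBJECTS)


def generate_subject(
    llm_call: Optional[Callable[[str], str]],
    prompt: str,
    rng: random.Random,
) -> str:
    """Try the LLM first; fall back to local generation on any failure.

    `llm_call` is an assumed callable wrapping whichever provider is
    configured; pass None when no API key is available.
    """
    if llm_call is not None:
        try:
            text = llm_call(prompt).strip()
            if text:
                return text
        except Exception:
            # Network error, quota, bad key, etc.: fall through.
            pass
    return fallback_subject(rng)
```

With this shape, running without an API key (the optional case) simply means every call takes the fallback path.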
- Frontend: Rust-based terminal UI (TUI) for configuration and control
- Backend: Python pipeline for data generation and file creation
- Output: Structured datasets in output/<topic>/ with consistent naming
- Rust 1.70+
- Python 3.8+
- API key for Google Gemini or OpenRouter (optional)
# Clone and build
git clone <repository>
cd syndata
cargo build --release
pip install -r requirements.txt
# Set up API keys (optional)
cp .env.example .env
# Edit .env with your API keys
# Launch the TUI
cargo run --release
# Or use CLI directly
python -m generators.cli --topic "Quarterly financial reports" --chains 5
The TUI provides controls for:
- Topic selection and chain count
- File type distribution (emails, Word docs, PDFs)
- Attachment percentages
- Tone mixing
- Internal vs external email ratios
- Company and employee management
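
To make the tone mix and percentage controls concrete, here is a minimal sketch of how such settings could drive per-email decisions. The weights, the default percentages, and the names `TONE_MIX`, `pick_tone`, and `plan_email` are illustrative assumptions, not values or functions taken from the project.

```python
import random

# Assumed example weights; the real defaults may differ.
TONE_MIX = {
    "regular": 0.50,
    "friendly": 0.20,
    "formal": 0.15,
    "urgent": 0.10,
    "upset": 0.05,
}


def pick_tone(mix, rng):
    """Draw one tone, with probability proportional to its weight."""
    tones = list(mix)
    weights = [mix[t] for t in tones]
    return rng.choices(tones, weights=weights, k=1)[0]


def plan_email(rng, mix=TONE_MIX, attachment_pct=30, internal_pct=70):
    """Decide tone, attachment, and internal/external for one email.

    Percentages are in the range 0-100; 0 means never, 100 means always.
    """
    return {
        "tone": pick_tone(mix, rng),
        "has_attachment": rng.random() * 100 < attachment_pct,
        "internal": rng.random() * 100 < internal_pct,
    }


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for repeatable runs
    for _ in range(3):
        print(plan_email(rng))
```

Weights do not need to sum to 1, since `random.choices` normalizes them.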
output/<topic>/
├── 001a_quarterly_review.eml
├── 001b_quarterly_review.docx
├── 002a_response.eml
├── 002b_response.pdf
└── ...
Files follow the NNN<letter>_<description>.ext naming convention for consistent sorting.
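
A small sketch of how names in this convention could be built, assuming NNN is a zero-padded sequence number and the letter orders the files that share it (as the tree above suggests). `slugify` and `output_name` are hypothetical helpers, not functions from the repository.

```python
import re
import string


def slugify(description):
    """Lowercase the text and collapse non-alphanumerics into underscores."""
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return slug or "untitled"


def output_name(sequence, position, description, ext):
    """Build a name like 001a_quarterly_review.eml.

    sequence: number shown as NNN (zero-padded to three digits)
    position: 0-based index within that number, mapped to a, b, c, ...
    """
    if not 0 <= position < 26:
        raise ValueError("position must map to a single letter a-z")
    letter = string.ascii_lowercase[position]
    return f"{sequence:03d}{letter}_{slugify(description)}.{ext.lstrip('.')}"


if __name__ == "__main__":
    print(output_name(1, 0, "Quarterly Review", "eml"))
    print(output_name(1, 1, "Quarterly Review", "docx"))
```

Because the number is zero-padded, a plain lexicographic sort of the file names keeps related files adjacent and in order.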
- Rust code: src/ - TUI and bridge logic
- Python generators: generators/ - Data generation pipeline
- Configuration: generators/config.py - Settings and defaults
[Add your license here]