chrispository/datasynth

SynthData 📊

A synthetic data generator that uses AI to create realistic corporate datasets: emails, documents, and attachments. It pairs a Rust TUI frontend with a Python generation backend.

Features

  • 📧 Email Generation: Creates threaded email conversations with realistic headers and attachments
  • 📄 Document Creation: Generates Word documents and PDFs with corporate content
  • 🎭 Tone Control: Configurable tone mix (regular, upset, friendly, formal, urgent)
  • 🔗 Relationship Management: Builds realistic company rosters and employee relationships
  • ⚙️ Flexible Configuration: Control file types, attachment ratios, and internal/external email distribution
  • 🤖 AI-Powered: Uses Google Gemini or OpenRouter for content generation with Faker fallbacks
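
As an illustration of the tone mix, the sketch below draws a tone for each generated message in proportion to a configured weight. The weights, the `TONE_MIX` name, and the `pick_tones` helper are hypothetical and are not taken from the project's code; only the five tone names come from the feature list above.

```python
import random
from collections import Counter

# Hypothetical weights for illustration; the project's real defaults are not documented here.
TONE_MIX = {"regular": 0.5, "friendly": 0.2, "formal": 0.15, "urgent": 0.1, "upset": 0.05}


def pick_tones(mix, count, seed=None):
    """Draw `count` tones, each with probability proportional to its weight."""
    if not mix or any(w < 0 for w in mix.values()) or sum(mix.values()) <= 0:
        raise ValueError("tone mix needs non-negative weights with a positive total")
    rng = random.Random(seed)  # a private RNG keeps runs reproducible when seeded
    tones = list(mix)
    weights = [mix[t] for t in tones]
    return rng.choices(tones, weights=weights, k=count)


# Example: tones for a ten-message chain, summarized by frequency.
print(Counter(pick_tones(TONE_MIX, 10, seed=42)))
```

Weights do not need to sum to 1, since `random.choices` normalizes them, so a mix expressed in percentages would work the same way.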

Architecture

  • Frontend: Rust-based terminal UI (TUI) for configuration and control
  • Backend: Python pipeline for data generation and file creation
  • Output: Structured datasets in output/<topic>/ with consistent naming

Quick Start

Prerequisites

  • Rust 1.70+
  • Python 3.8+
  • API key for Google Gemini or OpenRouter (optional)

Installation

# Clone and build
git clone <repository>
cd datasynth
cargo build --release
pip install -r requirements.txt

# Set up API keys (optional)
cp .env.example .env
# Edit .env with your API keys
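
Because the API keys are optional and content generation has Faker fallbacks, a backend of this kind needs some rule for choosing a provider. The sketch below shows one possible rule. The environment variable names, the Gemini-first precedence, and the `choose_provider` helper are all assumptions for illustration; check `.env.example` for the names the project actually reads.

```python
import os

# Hypothetical variable names and precedence; not confirmed by the project.
PROVIDER_ENV_VARS = (
    ("gemini", "GEMINI_API_KEY"),
    ("openrouter", "OPENROUTER_API_KEY"),
)


def choose_provider(env=None):
    """Return the first provider with a non-empty key, or "faker" if none is set."""
    env = os.environ if env is None else env
    for provider, var in PROVIDER_ENV_VARS:
        if env.get(var, "").strip():
            return provider
    return "faker"  # no key configured: fall back to locally generated content


print(choose_provider())
```

Passing a plain dictionary as `env` makes the rule easy to exercise without touching the real environment.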

Running

# Launch the TUI
cargo run --release

# Or use the CLI directly
python -m generators.cli --topic "Quarterly financial reports" --chains 5

Configuration

The TUI provides controls for:

  • Topic selection and chain count
  • File type distribution (emails, Word docs, PDFs)
  • Attachment percentages
  • Tone mixing
  • Internal vs external email ratios
  • Company and employee management

Output Structure

output/<topic>/
├── 001a_quarterly_review.eml
├── 001b_quarterly_review.docx
├── 002a_response.eml
├── 002b_response.pdf
└── ...

Files follow the NNN<letter>_<description>.ext naming convention for consistent sorting.
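
The naming convention can be sketched as a small helper that zero-pads the sequence number, appends a letter, and normalizes the description. The `build_filename` and `slugify` functions, the 999 upper bound, and the slug rules are assumptions made for illustration, not the project's actual implementation.

```python
import re
import string


def slugify(description):
    """Lowercase the text and collapse runs of non-alphanumerics into underscores."""
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return slug or "untitled"


def build_filename(index, position, description, ext):
    """Build NNN<letter>_<description>.ext, e.g. 001a_quarterly_review.eml."""
    if not 1 <= index <= 999:
        # Assumed limit: more than three digits would break lexicographic sorting.
        raise ValueError("index must fit in three digits")
    if not 0 <= position < 26:
        raise ValueError("position must map to a letter a-z")
    letter = string.ascii_lowercase[position]
    return f"{index:03d}{letter}_{slugify(description)}.{ext.lstrip('.')}"


print(build_filename(1, 0, "Quarterly Review", "eml"))  # 001a_quarterly_review.eml
print(build_filename(2, 1, "Response", ".pdf"))         # 002b_response.pdf
```

Zero-padding is what makes a plain alphabetical listing match the generation order, so `002a` sorts before `010a`.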

Development

  • Rust code: src/ - TUI and bridge logic
  • Python generators: generators/ - Data generation pipeline
  • Configuration: generators/config.py - Settings and defaults

License

[Add your license here]
