An AI-powered tool that automatically clones Git repositories, analyzes their code structure, and generates comprehensive documentation using OpenAI's GPT-4o-mini. It features a multi-agent architecture for processing multiple branches in parallel, a Q&A system with conversation memory, and support for 30+ programming languages.
- Multi-Agent Architecture: Parallel processing of multiple branches using ThreadPoolExecutor
- AI-Powered Analysis: Leverages OpenAI GPT-4o-mini for intelligent code documentation
- Multi-Language Support: 30+ programming languages (Python, JavaScript, TypeScript, Java, C++, Go, Rust, PHP, Ruby, and more)
- Multiple Documentation Types: API docs, class documentation, architecture analysis, setup guides, and comprehensive overviews
- Branch Management: Analyze single branches, multiple branches, or all branches in parallel
- Folder-Specific Analysis: Target specific folders within repositories
- Multiple Output Formats: Generate both Markdown (.md) and PDF (.pdf) reports
- Interactive Chat Interface: Ask questions about any repository in natural language
- Conversation Memory: Maintains context across the entire chat session
- File-Level Similarity Search: Smart retrieval of relevant code files for accurate answers
- Cross-Reference Understanding: Handles questions like "Can you explain the function we discussed earlier?"
- Multi-Repository Support: Switch between different repositories and branches
- Memory Management: View history, clear memory, get conversation summaries
- Visual Code Enhancement: Replace code blocks with beautiful IDE screenshots (optional)
- AI-Powered Workflow Diagrams: Automatically generate intelligent flowcharts showing application logic
- Interactive CLI: User-friendly menu system for easy operation
- Comprehensive Analysis: File structure, Git history, language statistics, and code complexity
The system follows a pipeline architecture with these main components:
- GitManager (git_manager.py) - Main orchestrator and CLI interface
- GitClone (functions/git_clone.py) - Repository cloning operations
- GitAnalysis (functions/git_analysis.py) - Code analysis and AI-powered documentation generation
- GitQnASystem (functions/git_qna.py) - Vector database and intelligent Q&A with conversation memory
- MultiAgentDocumentationGenerator - Parallel branch processing system
- VisualDocumentationEnhancer (functions/visual_enhancer.py) - IDE screenshot integration
- WorkflowDiagramGenerator (functions/workflow_generator.py) - AI-powered diagram generation
- Python 3.7+
- Git (accessible via command line)
- OpenAI API key
- Visual Enhancement: VS Code, Sublime Text, or Atom editor + Tesseract OCR
- Q&A System: ChromaDB and LangChain (auto-installed with requirements.txt)
- Clone the repository
  git clone <repository-url>
  cd Doc-Generation
- Install dependencies
  pip install -r requirements.txt
- Configure OpenAI API Key
  Option 1: Environment Variable (Recommended)
  export OPENAI_API_KEY="your-openai-api-key-here"
  Option 2: Create .env file
  # Create .env file in project root
  echo "OPENAI_API_KEY=your-openai-api-key-here" > .env
  Get your API key from the OpenAI Platform.
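The two configuration options above can be pictured with a minimal sketch of how a key might be resolved at startup. This is an illustration only, not the tool's actual loader: the function name `load_api_key` and the precedence (environment first, then `.env`) are assumptions.

```python
import os
from pathlib import Path


def load_api_key(env_file=".env", environ=None):
    """Return OPENAI_API_KEY from the environment, falling back to a .env file.

    Hypothetical helper: the precedence shown here is an assumption.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("OPENAI_API_KEY")
    if key:
        return key

    path = Path(env_file)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blanks and comments; accept KEY=value, optionally quoted.
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            if name.strip() == "OPENAI_API_KEY":
                return value.strip().strip("'\"")
    raise RuntimeError("OPENAI_API_KEY is not set in the environment or in .env")
```

Failing early with a clear message tends to be easier to debug than a later authentication error from the API.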
Run the interactive menu:
python git_manager.py
Analyze specific branch:
python git_manager.py <git_url> <branch_name>
Analyze specific branch and folder:
python git_manager.py <git_url> <branch_name> <folder_name>
Analyze multiple branches:
python git_manager.py <git_url> <branch1,branch2,branch3>
Analyze all branches:
python git_manager.py <git_url> all
Enable visual enhancement with IDE screenshots:
python git_manager.py <git_url> <branch_name> --visual
Disable workflow diagrams:
python git_manager.py <git_url> <branch_name> --no-diagrams
# Single branch analysis (entire repository)
python git_manager.py https://github.com/user/repo.git main
# Single branch analysis (specific folder only)
python git_manager.py https://github.com/user/repo.git main src
# Multiple branches (specific folder only)
python git_manager.py https://github.com/user/repo.git main,develop crawler
# All branches with parallel processing (specific folder)
python git_manager.py https://github.com/user/repo.git all backend
# With visual enhancement and workflow diagrams
python git_manager.py https://github.com/user/repo.git main --visual
# Interactive Q&A session
python git_manager.py
# Then select option 6 from the menu
The tool includes an intelligent Q&A system that allows you to ask questions about any repository in natural language.
- Conversation Memory: Remembers previous questions and builds context
- File-Level Search: Finds relevant code across multiple files
- Multi-Language Support: Works with all supported programming languages
- Branch-Specific Analysis: Can analyze different branches separately
help - Show available commands
repos - List all indexed repositories
history - Show conversation history
summary - Get conversation summary
clear - Clear conversation memory
quit/exit - End the session
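As a rough picture of how the session commands listed above could be dispatched, here is a small sketch. It is not the tool's implementation: `handle_command` is a hypothetical name, a plain list stands in for the LangChain-backed memory, answers are placeholders, and `repos` and `summary` are left out for brevity.

```python
def handle_command(line, history):
    """Handle one line of Q&A input and return (reply, keep_running).

    `history` is a plain list of (question, answer) tuples standing in
    for the real conversation memory (an assumption for illustration).
    """
    command = line.strip().lower()
    if command in ("quit", "exit"):
        return "Goodbye.", False
    if command == "help":
        return "Commands: help, repos, history, summary, clear, quit/exit", True
    if command == "history":
        if not history:
            return "No questions asked yet.", True
        return "\n".join(f"{i}. {q}" for i, (q, _) in enumerate(history, 1)), True
    if command == "clear":
        history.clear()
        return "Conversation memory cleared.", True
    # Anything else is treated as a question; the answer is a placeholder.
    answer = f"(placeholder answer to: {line.strip()})"
    history.append((line.strip(), answer))
    return answer, True
```

Keeping commands and questions in one dispatcher means a typo in a command is simply treated as a question rather than an error.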
Ask a question: What is the main purpose of this project?
Answer: [Detailed analysis based on repository structure and code]
Ask a question: How does authentication work?
Answer: [Response that can reference the previous discussion]
Ask a question: Can you show me the auth functions we discussed?
Answer: [Understands "we discussed" refers to previous conversation]
The tool generates five specialized documentation types:
- API Documentation - Function signatures, parameters, and usage examples
- Class Documentation - Class purposes, methods, and usage patterns
- Architecture Analysis - System design, data flow, and integration points
- Setup/Configuration - Installation guides, environment setup, and troubleshooting
- Comprehensive Documentation - General developer-focused explanations
Doc-Generation/
├── git_manager.py              # Main orchestrator and CLI
├── functions/
│   ├── git_clone.py            # Repository cloning operations
│   ├── git_analysis.py         # Code analysis and documentation generation
│   ├── git_qna.py              # Q&A system with conversation memory
│   ├── visual_enhancer.py      # IDE screenshot integration
│   └── workflow_generator.py   # AI-powered diagram generation
├── requirements.txt            # Python dependencies
├── CLAUDE.md                   # Project instructions for Claude Code
├── local-folder/               # Directory for cloned repositories
│   └── chroma/                 # Vector database storage for Q&A
├── diagrams/                   # Generated workflow diagrams
├── screenshots/                # IDE screenshots (if using --visual)
└── README.md                   # This file
- Cloned repositories: ./local-folder/
- Generated reports: Current working directory
- Vector database: ./local-folder/chroma/ (for Q&A system)
- Workflow diagrams: ./diagrams/
- Screenshots: ./screenshots/ (if using visual enhancement)
- Supported output formats: Markdown (.md) and PDF (.pdf)
Programming Languages (30+):
- Python (.py), JavaScript (.js), TypeScript (.ts/.tsx)
- Java (.java), C/C++ (.c/.cpp/.h), C# (.cs)
- PHP (.php), Ruby (.rb), Go (.go), Rust (.rs)
- Swift (.swift), Kotlin (.kt), Scala (.scala)
- And many more...
Web & Config Files:
- HTML (.html), CSS (.css/.scss/.less)
- Vue (.vue), Svelte (.svelte)
- JSON (.json), YAML (.yml/.yaml), XML (.xml)
- Markdown (.md), SQL (.sql), TOML (.toml)
- Filters out build/dependency directories (.git, node_modules, __pycache__, .venv)
- Limits individual file size to 50KB for OpenAI processing
- Detects programming languages via file extensions
- Prioritizes main code files over configuration files
- Creates vector embeddings for intelligent search and Q&A
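The filtering rules above can be sketched in a few lines. This is a simplified illustration, not the code in functions/git_analysis.py: the name `collect_files`, the short extension map, the reading of 50KB as 50 * 1024 bytes, and the choice of which languages count as configuration are all assumptions.

```python
import os

EXCLUDED_DIRS = {".git", "node_modules", "__pycache__", ".venv"}
MAX_FILE_BYTES = 50 * 1024  # assumes "50KB" means 50 * 1024 bytes

# Illustrative subset of an extension map; the real table is larger.
LANGUAGES = {".py": "Python", ".js": "JavaScript", ".ts": "TypeScript",
             ".go": "Go", ".rs": "Rust", ".json": "JSON", ".yml": "YAML",
             ".yaml": "YAML", ".toml": "TOML"}
CONFIG_LANGUAGES = {"JSON", "YAML", "TOML"}  # assumed "configuration" set


def collect_files(root):
    """Return (relative_path, language) pairs, code files before config files."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            language = LANGUAGES.get(os.path.splitext(name)[1].lower())
            path = os.path.join(dirpath, name)
            if language is None or os.path.getsize(path) > MAX_FILE_BYTES:
                continue
            found.append((os.path.relpath(path, root), language))
    # False (code) sorts before True (config); ties break on path.
    found.sort(key=lambda item: (item[1] in CONFIG_LANGUAGES, item[0]))
    return found
```

Pruning `dirnames` in place avoids walking large dependency trees at all, which matters far more for `node_modules` than filtering files after the fact.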
The MultiAgentDocumentationGenerator class provides:
- Parallel processing of multiple branches using ThreadPoolExecutor (max 4 workers)
- Safe branch checkout operations
- Result aggregation across all branches
- Automatic resource cleanup and thread management
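A minimal sketch of the parallel pattern described above, assuming a caller-supplied `worker(branch)` function that produces the documentation for one branch. The names `parse_branches` and `document_branches` are hypothetical and do not reflect the actual MultiAgentDocumentationGenerator interface.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 4  # matches the worker cap stated above


def parse_branches(spec, available):
    """Turn 'all' or 'main,develop' into a list of branch names."""
    if spec == "all":
        return list(available)
    return [name.strip() for name in spec.split(",") if name.strip()]


def document_branches(branches, worker):
    """Run worker(branch) for each branch in parallel and aggregate results.

    Failures are recorded per branch instead of aborting the whole run.
    """
    results = {}
    # The with-block waits for all threads and releases them on exit.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = {pool.submit(worker, branch): branch for branch in branches}
        for future in as_completed(futures):
            branch = futures[future]
            try:
                results[branch] = {"ok": True, "output": future.result()}
            except Exception as exc:  # keep going if one branch fails
                results[branch] = {"ok": False, "error": str(exc)}
    return results
```

One design point worth noting: checking out different branches concurrently in a single working tree would conflict, so a worker in this style would likely need its own clone or working directory per branch.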
- ChromaDB Integration: Persistent vector storage for intelligent code search
- File-Level Organization: Groups code chunks by file for better context
- Branch-Specific Collections: Separate analysis for different branches
- Similarity Search: Find relevant code based on semantic similarity
- Conversation Memory: LangChain-powered memory for continuous dialogues
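To illustrate what file-level organization and branch-specific collections might look like, the sketch below groups chunk-level search hits by file and derives a collection name per repository and branch. It uses only the standard library and does not call ChromaDB; `rank_files`, `collection_name`, and the naming scheme are assumptions made for illustration.

```python
from collections import defaultdict


def rank_files(chunk_hits, top_k=3):
    """Group chunk-level hits by file and rank files by their best chunk.

    `chunk_hits` is a list of (file_path, distance) pairs, where a smaller
    distance means a closer match, in the shape a vector store might return.
    """
    by_file = defaultdict(list)
    for file_path, distance in chunk_hits:
        by_file[file_path].append(distance)
    # A file is treated as being as relevant as its closest chunk.
    ranked = sorted(by_file.items(), key=lambda item: (min(item[1]), item[0]))
    return [(path, min(d), len(d)) for path, d in ranked[:top_k]]


def collection_name(repo, branch):
    """Hypothetical naming scheme: one collection per repository and branch."""
    safe = "".join(c if c.isalnum() else "_" for c in f"{repo}_{branch}")
    return safe.lower()
```

Ranking whole files rather than isolated chunks gives the model surrounding context, at the cost of sending more text per question.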
- IDE Screenshots: Automatic code screenshot generation with syntax highlighting
- OCR Verification: Tesseract OCR ensures screenshot accuracy
- Multiple IDE Support: VS Code, Sublime Text, Atom compatibility
- Workflow Diagrams: AI-generated flowcharts showing application logic
- Interactive prompts for target folder selection
- Optional folder-specific analysis
- Folder path validation before analysis
- Output files include folder names for organization
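Folder validation and folder-aware output names could look roughly like the sketch below. Both helper names and the file naming scheme are hypothetical; the tool's real output names may differ.

```python
from pathlib import Path


def resolve_target(repo_dir, folder=None):
    """Return the directory to analyze, rejecting paths outside the repo."""
    repo = Path(repo_dir).resolve()
    if not folder:
        return repo
    target = (repo / folder).resolve()
    # relative_to raises ValueError if target escapes the repository root.
    target.relative_to(repo)
    if not target.is_dir():
        raise FileNotFoundError(f"Folder not found in repository: {folder}")
    return target


def report_name(repo_name, branch, folder=None, ext="md"):
    """Hypothetical naming scheme that embeds the folder in the file name."""
    parts = [repo_name, branch]
    if folder:
        parts.append(folder.strip("/").replace("/", "-"))
    return "_".join(parts) + f".{ext}"
```

Resolving the path before checking it guards against inputs such as `../other` that would otherwise point outside the cloned repository.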
Extracts comprehensive repository insights:
- Commit count and contributor information
- Recent activity patterns
- Branch creation dates and relationships
- Language distribution statistics
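Commit counts and contributor lists like those above can be gathered from standard Git commands. The sketch below shells out to `git rev-list --count` and `git shortlog -sn`; the function names are hypothetical, and whether the tool uses these exact commands is an assumption.

```python
import subprocess


def run_git(repo_dir, *args):
    """Run a git command inside repo_dir and return its stdout."""
    result = subprocess.run(["git", "-C", repo_dir, *args],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_shortlog(text):
    """Parse `git shortlog -sn` output into (author, commit_count) pairs."""
    contributors = []
    for line in text.splitlines():
        # Each line is a right-aligned count, a tab, then the author name.
        count, _, author = line.strip().partition("\t")
        if count.isdigit():
            contributors.append((author, int(count)))
    return contributors


def history_summary(repo_dir):
    """Collect commit count and contributors for the checked-out branch."""
    return {
        "commits": int(run_git(repo_dir, "rev-list", "--count", "HEAD").strip()),
        "contributors": parse_shortlog(run_git(repo_dir, "shortlog", "-sn", "HEAD")),
    }
```

Passing `HEAD` explicitly to `git shortlog` matters when it runs without a terminal, since it otherwise reads the log from standard input.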
Repository not found
- Verify Git URL and network connectivity
- Check repository access permissions
Branch doesn't exist
- Check available branches: git ls-remote --heads <url>
- Ensure branch names are spelled correctly
OpenAI API errors
- Verify API key validity and format
- Check OpenAI account credits and usage limits
- Ensure proper environment variable setup
Q&A System Issues
- Install missing dependencies: pip install chromadb langchain-community
- Check if vector database exists in ./local-folder/chroma/
- Ensure repository has supported file types
- Clear memory if experiencing context issues: type clear in Q&A session
Visual Enhancement Issues
- Install required IDE (VS Code, Sublime Text, or Atom)
- Install Tesseract OCR: brew install tesseract (macOS)
- Ensure screenshots directory has write permissions
Permission errors
- Ensure write permissions in working directory
- Check Git credentials for private repositories
The tool includes comprehensive error handling for:
- Network connectivity issues
- Invalid repository URLs
- Missing branches or folders
- OpenAI API rate limits and errors
- File system permission issues
- Vector database initialization failures
- Missing IDE dependencies for visual enhancement
- Conversation memory overflow management
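For transient failures such as API rate limits, a common approach is to retry with exponential backoff. The sketch below shows that general pattern only; it is not taken from the tool, and `with_retries` with its parameters is a hypothetical helper.

```python
import time


def with_retries(call, attempts=4, base_delay=1.0, sleep=time.sleep,
                 retry_on=(Exception,)):
    """Call call() and retry with exponential backoff on failure.

    `sleep` is injectable so the delay can be replaced in tests.
    """
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

In practice `retry_on` would be narrowed to the specific rate-limit or network exceptions, so that programming errors still fail immediately.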
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for providing the GPT-4o-mini API for intelligent code analysis
- ChromaDB and LangChain for vector database and conversation memory capabilities
- ReportLab for PDF generation capabilities
- Tesseract OCR for visual enhancement features
- The Python community for excellent libraries and tools
Need help? Open an issue or check the troubleshooting section above.