CodeDocs-io/Doc-Generation

Git Repository Analysis and Documentation Generator

An AI-powered tool that automatically clones Git repositories, analyzes their code structure, and generates comprehensive documentation using OpenAI's GPT-4o-mini. It features a multi-agent architecture for processing multiple branches in parallel, an intelligent Q&A system with conversation memory, and support for 30+ programming languages.

✨ Features

🚀 Core Capabilities

  • Multi-Agent Architecture: Parallel processing of multiple branches using ThreadPoolExecutor
  • AI-Powered Analysis: Leverages OpenAI GPT-4o-mini for intelligent code documentation
  • Multi-Language Support: 30+ programming languages (Python, JavaScript, TypeScript, Java, C++, Go, Rust, PHP, Ruby, and more)
  • Multiple Documentation Types: API docs, class documentation, architecture analysis, setup guides, and comprehensive overviews
  • Branch Management: Analyze single branches, multiple branches, or all branches in parallel
  • Folder-Specific Analysis: Target specific folders within repositories
  • Multiple Output Formats: Generate both Markdown (.md) and PDF (.pdf) reports

🧠 Intelligent Q&A System

  • Interactive Chat Interface: Ask questions about any repository in natural language
  • Conversation Memory: Maintains context across the entire chat session
  • File-Level Similarity Search: Smart retrieval of relevant code files for accurate answers
  • Cross-Reference Understanding: Handles questions like "Can you explain the function we discussed earlier?"
  • Multi-Repository Support: Switch between different repositories and branches
  • Memory Management: View history, clear memory, get conversation summaries

🎨 Visual Enhancements

  • Visual Code Enhancement: Replace code blocks with beautiful IDE screenshots (optional)
  • AI-Powered Workflow Diagrams: Automatically generate intelligent flowcharts showing application logic
  • Interactive CLI: User-friendly menu system for easy operation
  • Comprehensive Analysis: File structure, Git history, language statistics, and code complexity

🏗️ Architecture

The system follows a pipeline architecture with these main components:

  • GitManager (git_manager.py) - Main orchestrator and CLI interface
  • GitClone (functions/git_clone.py) - Repository cloning operations
  • GitAnalysis (functions/git_analysis.py) - Code analysis and AI-powered documentation generation
  • GitQnASystem (functions/git_qna.py) - Vector database and intelligent Q&A with conversation memory
  • MultiAgentDocumentationGenerator - Parallel branch processing system
  • VisualDocumentationEnhancer (functions/visual_enhancer.py) - IDE screenshot integration
  • WorkflowDiagramGenerator (functions/workflow_generator.py) - AI-powered diagram generation

🚀 Quick Start

Prerequisites

  • Python 3.7+
  • Git (accessible via command line)
  • OpenAI API key

Optional Dependencies (for enhanced features)

  • Visual Enhancement: VS Code, Sublime Text, or Atom editor + Tesseract OCR
  • Q&A System: ChromaDB and LangChain (installed along with the other packages in requirements.txt)

Installation

  1. Clone the repository

    git clone <repository-url>
    cd Doc-Generation
  2. Install dependencies

    pip install -r requirements.txt
  3. Configure OpenAI API Key

    Option 1: Environment Variable (Recommended)

    export OPENAI_API_KEY="your-openai-api-key-here"

    Option 2: Create .env file

    # Create .env file in project root
    echo "OPENAI_API_KEY=your-openai-api-key-here" > .env

    Get your API key from OpenAI Platform
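The two configuration options above can be combined in one lookup: check the environment first, then fall back to a .env file. The sketch below is a minimal illustration under that assumption. `load_api_key` is a hypothetical helper, not a function confirmed to exist in this project, and the real tool may rely on a library such as python-dotenv instead.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(env_file: str = ".env") -> Optional[str]:
    """Return OPENAI_API_KEY from the environment, else from a .env file (hypothetical helper)."""
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key
    path = Path(env_file)
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not NAME=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "OPENAI_API_KEY":
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip('"').strip("'") or None
    return None
```

Checking the environment variable first matches the recommendation above: a key exported in the shell takes precedence over one stored on disk.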

Basic Usage

Interactive Mode

python git_manager.py

Command Line Usage

Analyze specific branch:

python git_manager.py <git_url> <branch_name>

Analyze specific branch and folder:

python git_manager.py <git_url> <branch_name> <folder_name>

Analyze multiple branches:

python git_manager.py <git_url> <branch1,branch2,branch3>

Analyze all branches:

python git_manager.py <git_url> all

Enable visual enhancement with IDE screenshots:

python git_manager.py <git_url> <branch_name> --visual

Disable workflow diagrams:

python git_manager.py <git_url> <branch_name> --no-diagrams

Examples

# Single branch analysis (entire repository)
python git_manager.py https://github.com/user/repo.git main

# Single branch analysis (specific folder only)
python git_manager.py https://github.com/user/repo.git main src

# Multiple branches (specific folder only)  
python git_manager.py https://github.com/user/repo.git main,develop crawler

# All branches with parallel processing (specific folder)
python git_manager.py https://github.com/user/repo.git all backend

# With visual enhancement and workflow diagrams
python git_manager.py https://github.com/user/repo.git main --visual

# Interactive Q&A session
python git_manager.py
# Then select option 6 from the menu
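The command forms above share one shape: a URL, a branch specifier (a single name, a comma-separated list, or `all`), an optional folder, and two flags. The following sketch shows how such arguments could be parsed with `argparse`. It is an illustration only: `parse_cli`, `branch_list`, and `diagrams` are hypothetical names, and git_manager.py may parse its arguments differently (it also has an interactive mode when run without arguments, which this sketch does not cover).

```python
import argparse
from typing import List


def parse_cli(argv: List[str]) -> argparse.Namespace:
    """Parse the documented command forms (hypothetical sketch, not the tool's own parser)."""
    parser = argparse.ArgumentParser(prog="git_manager.py")
    parser.add_argument("git_url")
    parser.add_argument("branches", help="branch name, comma-separated names, or 'all'")
    parser.add_argument("folder", nargs="?", default=None)
    parser.add_argument("--visual", action="store_true")
    # Diagrams are on unless --no-diagrams is given.
    parser.add_argument("--no-diagrams", dest="diagrams", action="store_false")
    args = parser.parse_args(argv)
    if args.branches == "all":
        args.branch_list = None  # None stands for "every branch"
    else:
        args.branch_list = [b.strip() for b in args.branches.split(",") if b.strip()]
    return args


args = parse_cli(["https://github.com/user/repo.git", "main,develop", "crawler"])
print(args.branch_list, args.folder, args.visual, args.diagrams)
```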

🤖 Interactive Q&A System

The tool includes an intelligent Q&A system that allows you to ask questions about any repository in natural language.

Key Features

  • Conversation Memory: Remembers previous questions and builds context
  • File-Level Search: Finds relevant code across multiple files
  • Multi-Language Support: Works with all supported programming languages
  • Branch-Specific Analysis: Can analyze different branches separately

Available Commands

help         - Show available commands
repos        - List all indexed repositories  
history      - Show conversation history
summary      - Get conversation summary
clear        - Clear conversation memory
quit/exit    - End the session
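A session loop for commands like these usually checks the input against the built-in commands first and treats everything else as a question. The sketch below is a toy version of that idea. `ChatSession` is a hypothetical class, the placeholder answer stands in for the model call, and only a subset of the commands is handled; the actual GitQnASystem is described as using LangChain memory.

```python
from typing import List, Tuple


class ChatSession:
    """Toy session: stores (question, answer) turns and handles a few built-in commands."""

    def __init__(self) -> None:
        self.history: List[Tuple[str, str]] = []

    def handle(self, text: str) -> str:
        command = text.strip().lower()
        if command in ("quit", "exit"):
            return "bye"
        if command == "clear":
            self.history.clear()
            return "memory cleared"
        if command == "history":
            return "\n".join(f"Q: {q}\nA: {a}" for q, a in self.history) or "(empty)"
        # Anything else is treated as a question; a real system would query the model here.
        answer = f"[answer to: {text.strip()}]"
        self.history.append((text.strip(), answer))
        return answer
```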

Example Q&A Session

🤖 Ask a question: What is the main purpose of this project?
📋 Answer: [Detailed analysis based on repository structure and code]

🤖 Ask a question: How does authentication work?
📋 Answer: [Response that can reference the previous discussion]

🤖 Ask a question: Can you show me the auth functions we discussed?
📋 Answer: [Understands "we discussed" refers to previous conversation]

📋 Documentation Types

The tool generates five specialized documentation types:

  1. 🔧 API Documentation - Function signatures, parameters, and usage examples
  2. 📚 Class Documentation - Class purposes, methods, and usage patterns
  3. 🏛️ Architecture Analysis - System design, data flow, and integration points
  4. ⚙️ Setup/Configuration - Installation guides, environment setup, and troubleshooting
  5. 📖 Comprehensive Documentation - General developer-focused explanations

🗂️ File Structure

Doc-Generation/
├── git_manager.py              # Main orchestrator and CLI
├── functions/
│   ├── git_clone.py           # Repository cloning operations
│   ├── git_analysis.py        # Code analysis and documentation generation
│   ├── git_qna.py            # Q&A system with conversation memory
│   ├── visual_enhancer.py    # IDE screenshot integration
│   └── workflow_generator.py # AI-powered diagram generation
├── requirements.txt           # Python dependencies
├── CLAUDE.md                 # Project instructions for Claude Code
├── local-folder/             # Directory for cloned repositories
│   └── chroma/              # Vector database storage for Q&A
├── diagrams/                # Generated workflow diagrams
├── screenshots/             # IDE screenshots (if using --visual)
└── README.md               # This file

🔧 Configuration

Storage Locations

  • Cloned repositories: ./local-folder/
  • Generated reports: Current working directory
  • Vector database: ./local-folder/chroma/ (for Q&A system)
  • Workflow diagrams: ./diagrams/
  • Screenshots: ./screenshots/ (if using visual enhancement)
  • Supported output formats: Markdown (.md) and PDF (.pdf)

Supported File Types

Programming Languages (30+):

  • Python (.py), JavaScript (.js), TypeScript (.ts/.tsx)
  • Java (.java), C/C++ (.c/.cpp/.h), C# (.cs)
  • PHP (.php), Ruby (.rb), Go (.go), Rust (.rs)
  • Swift (.swift), Kotlin (.kt), Scala (.scala)
  • And many more...

Web & Config Files:

  • HTML (.html), CSS (.css/.scss/.less)
  • Vue (.vue), Svelte (.svelte)
  • JSON (.json), YAML (.yml/.yaml), XML (.xml)
  • Markdown (.md), SQL (.sql), TOML (.toml)

File Processing Strategy

  • Filters out build/dependency directories (.git, node_modules, __pycache__, .venv)
  • Limits individual file size to 50 KB for OpenAI processing
  • Detects programming languages via file extensions
  • Prioritizes main code files over configuration files
  • Creates vector embeddings for intelligent search and Q&A
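The first three rules of this strategy (directory filtering, the size cap, and extension-based language detection) can be sketched as a single directory walk. This is an illustration, not the tool's code: `iter_source_files` and the short `LANGUAGES` table are hypothetical, and reading "50KB" as 50 * 1024 bytes is an assumption.

```python
import os
from typing import Dict, Iterator, Tuple

SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}
MAX_BYTES = 50 * 1024  # assumption: "50KB" read as 50 * 1024 bytes
# Small illustrative subset of the extension table; the real tool covers 30+ languages.
LANGUAGES: Dict[str, str] = {
    ".py": "Python", ".js": "JavaScript", ".ts": "TypeScript", ".go": "Go", ".rs": "Rust",
}


def iter_source_files(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (path, language) for files passing the directory, extension, and size filters."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            language = LANGUAGES.get(os.path.splitext(name)[1].lower())
            if language and os.path.getsize(path) <= MAX_BYTES:
                yield path, language
```

Pruning `dirnames` in place matters for speed: directories such as node_modules can hold many thousands of files that would otherwise be visited and then discarded.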

🌟 Advanced Features

Multi-Agent Processing

The MultiAgentDocumentationGenerator class provides:

  • Parallel processing of multiple branches using ThreadPoolExecutor (max 4 workers)
  • Safe branch checkout operations
  • Result aggregation across all branches
  • Automatic resource cleanup and thread management
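The pattern described here, a `ThreadPoolExecutor` capped at four workers with results aggregated per branch, can be sketched as follows. `process_branches` and the `analyze` callback are hypothetical stand-ins for whatever MultiAgentDocumentationGenerator actually does. Note that a single Git working tree can have only one branch checked out at a time, so parallel workers need some form of isolation (for example, separate clones); how the tool handles this is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def process_branches(
    branches: List[str],
    analyze: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run `analyze` for each branch in parallel and collect results keyed by branch name."""
    results: Dict[str, str] = {}
    # The with-block waits for all workers and releases the threads on exit.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze, branch): branch for branch in branches}
        for future in as_completed(futures):
            branch = futures[future]
            try:
                results[branch] = future.result()
            except Exception as exc:
                # One failing branch should not abort the others.
                results[branch] = f"error: {exc}"
    return results
```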

Vector Database & Q&A System

  • ChromaDB Integration: Persistent vector storage for intelligent code search
  • File-Level Organization: Groups code chunks by file for better context
  • Branch-Specific Collections: Separate analysis for different branches
  • Similarity Search: Find relevant code based on semantic similarity
  • Conversation Memory: LangChain-powered memory for continuous dialogues
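File-level organization means that chunk-level search hits are rolled up into a ranking of files. The sketch below illustrates only that roll-up step, assuming the vector store has already returned `(file_path, similarity)` pairs where higher is more similar. `rank_files` is a hypothetical helper; no ChromaDB or LangChain calls are shown, and scoring each file by its best chunk is one possible choice among several.

```python
from typing import Dict, List, Tuple


def rank_files(chunk_hits: List[Tuple[str, float]], top_k: int = 3) -> List[Tuple[str, float]]:
    """Collapse (file_path, similarity) chunk hits into a per-file ranking by best score."""
    best: Dict[str, float] = {}
    for path, score in chunk_hits:
        if path not in best or score > best[path]:
            best[path] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:top_k]


hits = [("auth.py", 0.62), ("db.py", 0.40), ("auth.py", 0.91), ("cli.py", 0.55)]
print(rank_files(hits, top_k=2))  # [('auth.py', 0.91), ('cli.py', 0.55)]
```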

Visual Enhancements

  • IDE Screenshots: Automatic code screenshot generation with syntax highlighting
  • OCR Verification: Tesseract OCR is used to check screenshot accuracy
  • Multiple IDE Support: VS Code, Sublime Text, Atom compatibility
  • Workflow Diagrams: AI-generated flowcharts showing application logic

Folder Selection

  • Interactive prompts for target folder selection
  • Optional folder-specific analysis
  • Folder path validation before analysis
  • Output files include folder names for organization
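Folder validation and folder-aware output names could look roughly like the sketch below. Both helpers are hypothetical, and the `repo_branch_folder.ext` naming scheme is an assumption made for illustration; the tool's actual file names may differ.

```python
import os
from typing import Optional


def resolve_folder(repo_root: str, folder: Optional[str]) -> str:
    """Return the absolute directory to analyze, or raise ValueError if it is invalid."""
    root = os.path.realpath(repo_root)
    if not folder:
        return root
    target = os.path.realpath(os.path.join(root, folder))
    # Reject paths such as "../other" that escape the repository.
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"{folder!r} is outside the repository")
    if not os.path.isdir(target):
        raise ValueError(f"{folder!r} does not exist in the repository")
    return target


def report_name(repo: str, branch: str, folder: Optional[str], ext: str = "md") -> str:
    """Build a report file name that includes the folder when one was selected."""
    parts = [repo, branch]
    if folder:
        parts.append(folder.strip("/").replace("/", "_"))
    return "_".join(parts) + "." + ext
```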

Git History Analysis

Extracts comprehensive repository insights:

  • Commit count and contributor information
  • Recent activity patterns
  • Branch creation dates and relationships
  • Language distribution statistics
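Contributor information of this kind can be read from `git shortlog -sn`, which prints one line per author in the form `<count><TAB><name>`. The sketch below parses that output. `parse_shortlog` and `list_contributors` are hypothetical helpers, and the source does not say which Git commands the tool itself runs.

```python
import subprocess
from typing import List, Tuple


def parse_shortlog(output: str) -> List[Tuple[str, int]]:
    """Parse `git shortlog -sn` output into (author, commit_count) pairs."""
    pairs = []
    for line in output.splitlines():
        count, _, author = line.strip().partition("\t")
        if count.isdigit() and author:
            pairs.append((author, int(count)))
    return pairs


def list_contributors(repo_path: str) -> List[Tuple[str, int]]:
    """Run git in `repo_path` and return contributors, most commits first."""
    # HEAD is passed explicitly: without a revision, shortlog reads from stdin
    # when it is not attached to a terminal.
    result = subprocess.run(
        ["git", "-C", repo_path, "shortlog", "-sn", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return parse_shortlog(result.stdout)
```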

🐛 Troubleshooting

Common Issues

Repository not found

  • Verify Git URL and network connectivity
  • Check repository access permissions

Branch doesn't exist

  • Check available branches: git ls-remote --heads <url>
  • Ensure branch names are spelled correctly
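The output of `git ls-remote --heads <url>` has one line per branch in the form `<sha><TAB>refs/heads/<name>`, so checking a branch name before cloning is a small parsing task. `parse_remote_heads` below is a hypothetical helper shown for illustration; it is not taken from the project.

```python
from typing import List


def parse_remote_heads(output: str) -> List[str]:
    """Extract branch names from `git ls-remote --heads <url>` output."""
    prefix = "refs/heads/"
    branches = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[1].startswith(prefix):
            # Keep everything after the prefix so names like feature/login survive.
            branches.append(parts[1][len(prefix):])
    return branches
```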

OpenAI API errors

  • Verify API key validity and format
  • Check OpenAI account credits and usage limits
  • Ensure proper environment variable setup

Q&A System Issues

  • Install missing dependencies: pip install chromadb langchain-community
  • Check if vector database exists in ./local-folder/chroma/
  • Ensure repository has supported file types
  • Clear memory if experiencing context issues: type clear in Q&A session

Visual Enhancement Issues

  • Install required IDE (VS Code, Sublime Text, or Atom)
  • Install Tesseract OCR: brew install tesseract (macOS)
  • Ensure screenshots directory has write permissions

Permission errors

  • Ensure write permissions in working directory
  • Check Git credentials for private repositories

Error Handling

The tool handles the following error conditions:

  • Network connectivity issues
  • Invalid repository URLs
  • Missing branches or folders
  • OpenAI API rate limits and errors
  • File system permission issues
  • Vector database initialization failures
  • Missing IDE dependencies for visual enhancement
  • Conversation memory overflow management
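For transient failures such as API rate limits, one common approach is to retry with exponential backoff. The sketch below shows that general technique; the source does not state whether the tool retries at all, so `with_retries` is a hypothetical helper and the delays are arbitrary defaults.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on the given exceptions with delays of base_delay * 2**attempt."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the requested delays instead of actually waiting.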

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenAI for providing the GPT-4o-mini API for intelligent code analysis
  • ChromaDB and LangChain for vector database and conversation memory capabilities
  • ReportLab for PDF generation capabilities
  • Tesseract OCR for visual enhancement features
  • The Python community for excellent libraries and tools

Need help? Open an issue or check the troubleshooting section above.
