diff --git a/audit-cli/README.md b/audit-cli/README.md index b057ae5..c5e124c 100644 --- a/audit-cli/README.md +++ b/audit-cli/README.md @@ -54,12 +54,14 @@ The CLI is organized into parent commands with subcommands: ``` audit-cli ├── extract # Extract content from RST files -│ └── code-examples +│ ├── code-examples +│ └── procedures ├── search # Search through extracted content or source files │ └── find-string ├── analyze # Analyze RST file structures │ ├── includes -│ └── usage +│ ├── usage +│ └── procedures └── compare # Compare files across versions └── file-contents ``` @@ -140,6 +142,111 @@ After extraction, the code extraction report shows: - Code examples by language - Code examples by directive type +#### `extract procedures` + +Extract unique procedures from reStructuredText files into individual files. This command parses procedures and creates +one file per unique procedure (grouped by heading and content). Each procedure file represents a distinct piece of content, +even if it appears in multiple selections or variations. 
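The grouping described above (one output file per distinct heading and content pair) can be sketched roughly as follows. This is an illustrative sketch only, not the tool's implementation: the `Procedure` struct, `contentHash`, and `groupUnique` are hypothetical names, and using SHA-256 truncated to six hex characters is an assumption chosen to match the "short 6-character hash" in the filenames.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Procedure is a hypothetical, simplified stand-in for a parsed procedure.
type Procedure struct {
	Heading   string   // section heading above the procedure
	Steps     []string // step content, in order
	Selection string   // selection in which this appearance was found
}

// contentHash returns a short hex digest of the step content.
// Assumption: SHA-256 truncated to 6 characters; the real tool may differ.
func contentHash(steps []string) string {
	sum := sha256.Sum256([]byte(strings.Join(steps, "\n")))
	return hex.EncodeToString(sum[:])[:6]
}

// groupUnique groups appearances by heading plus content hash. It returns the
// keys in first-seen order and, for each key, the selections it appears in.
func groupUnique(procs []Procedure) ([]string, map[string][]string) {
	order := []string{}
	selections := make(map[string][]string)
	for _, p := range procs {
		key := p.Heading + "|" + contentHash(p.Steps)
		if _, seen := selections[key]; !seen {
			order = append(order, key)
		}
		selections[key] = append(selections[key], p.Selection)
	}
	return order, selections
}

func main() {
	procs := []Procedure{
		{Heading: "Before You Begin", Steps: []string{"Pull the image"}, Selection: "with-search-docker"},
		{Heading: "Before You Begin", Steps: []string{"Pull the image"}, Selection: "without-search-docker"},
		{Heading: "Before You Begin", Steps: []string{"Download the tarball"}, Selection: "tarball"},
	}
	order, selections := groupUnique(procs)
	fmt.Printf("unique procedures: %d\n", len(order))
	for _, key := range order {
		fmt.Printf("%s appears in %d selection(s)\n", key, len(selections[key]))
	}
}
```

In this sketch, identical content under the same heading collapses into one entry that records every selection it appears in, while the same heading with different content yields a separate entry.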
+ +**Use Cases:** + +This command helps writers: +- Extract all unique procedures from a page for testing or migration +- Generate individual procedure files for each distinct procedure +- Understand how many different procedures exist in a document +- Create standalone procedure files for reuse or testing +- See which selections each procedure appears in + +**Basic Usage:** + +```bash +# Extract all unique procedures from a file +./audit-cli extract procedures path/to/file.rst -o ./output + +# Extract only procedures that appear in a specific selection +./audit-cli extract procedures path/to/file.rst -o ./output --selection "driver, nodejs" + +# Dry run (show what would be extracted without writing files) +./audit-cli extract procedures path/to/file.rst -o ./output --dry-run + +# Verbose output (shows all selections each procedure appears in) +./audit-cli extract procedures path/to/file.rst -o ./output -v + +# Expand include directives inline +./audit-cli extract procedures path/to/file.rst -o ./output --expand-includes +``` + +**Flags:** + +- `-o, --output ` - Output directory for extracted procedure files (default: `./output`) +- `--selection ` - Extract only procedures that appear in a specific selection (e.g., "python", "driver, nodejs") +- `--expand-includes` - Expand include directives inline instead of preserving them +- `--dry-run` - Show what would be extracted without writing files +- `-v, --verbose` - Show detailed processing information including all selections each procedure appears in + +**Output Format:** + +Extracted files are named: `{heading}_{first-step-title}_{hash}.rst` + +The filename includes: +- **Heading**: The section heading above the procedure +- **First step title**: The title of the first step (for readability) +- **Hash**: A short 6-character hash of the content (for uniqueness) + +Examples: +- `before-you-begin_pull-the-mongodb-docker-image_e8eeec.rst` +- `install-mongodb-community-edition_download-the-tarball_44c437.rst` +- 
`configuration_create-the-data-and-log-directories_f1d35b.rst` + +**Verbose Output:** + +With the `-v` flag, the command shows detailed information about each procedure: + +``` +Found 36 unique procedure(s): + +1. Before You Begin + Output file: before-you-begin-pull-the-mongodb-docker-image-e8eeec.rst + Steps: 5 + Appears in 2 selection(s): + - docker, None, None, None, None, None, without-search-docker + - docker, None, None, None, None, None, with-search-docker + +2. Install MongoDB Community Edition + Output file: install-mongodb-community-edition-download-the-tarball-44c437.rst + Steps: 4 + Appears in 1 selection(s): + - macos, None, None, tarball, None, None, None +``` + +**Supported Procedure Types:** + +The command recognizes and extracts: +- `.. procedure::` directives with `.. step::` directives +- Ordered lists (numbered or lettered) as procedures +- `.. tabs::` directives with `:tabid:` options for variations +- `.. composable-tutorial::` directives with `.. selected-content::` blocks +- Sub-procedures (ordered lists within steps) +- YAML steps files (automatically converted to RST format) + +**How Uniqueness is Determined:** + +Procedures are grouped by: +1. **Heading**: The section heading above the procedure +2. 
**Content hash**: A hash of the procedure's steps and content + +This means: +- Procedures with the same heading but different content are treated as separate unique procedures +- Procedures with identical content that appear in multiple selections are extracted once +- The output file shows all selections where that procedure appears (visible with `-v` flag) + +**Report:** + +After extraction, the report shows: +- Number of unique procedures extracted +- Number of files written +- Detailed list of procedures with step counts and selections (with `-v` flag) + ### Search Commands #### `search find-string` @@ -515,6 +622,153 @@ include : 3 files, 4 usages ./audit-cli analyze usage ~/docs/source/includes/fact.rst --exclude "*/deprecated/*" ``` +#### `analyze procedures` + +Analyze procedures in reStructuredText files to understand procedure complexity, uniqueness, and how they appear across +different selections. + +This command parses procedures from RST files and provides statistics about: +- Total number of unique procedures (grouped by heading and content) +- Total number of procedure appearances across all selections +- Implementation types (procedure directive vs ordered list) +- Step counts for each procedure +- Detection of sub-procedures (ordered lists within steps) +- All selections where each procedure appears + +**Use Cases:** + +This command helps writers: +- Understand the complexity of procedures in a document +- Count how many unique procedures exist vs. 
how many times they appear +- Identify procedures that use different implementation approaches +- See which selections each procedure appears in +- Plan testing coverage for procedure variations +- Scope work related to procedure updates + +**Basic Usage:** + +```bash +# Get summary count of unique procedures and total appearances +./audit-cli analyze procedures path/to/file.rst + +# Show summary statistics plus a list of procedure headings +./audit-cli analyze procedures path/to/file.rst --list-summary + +# List all unique procedures with full details +./audit-cli analyze procedures path/to/file.rst --list-all + +# Expand include directives inline before analyzing +./audit-cli analyze procedures path/to/file.rst --expand-includes +``` + +**Flags:** + +- `--list-summary` - Show summary statistics plus a list of procedure headings +- `--list-all` - Show full details for each procedure including steps, selections, and implementation +- `--expand-includes` - Expand include directives inline instead of preserving them + +**Output:** + +**Default output (summary only):** +``` +File: path/to/file.rst +Total unique procedures: 36 +Total procedure appearances: 93 +``` + +**With `--list-summary`:** +``` +File: path/to/file.rst +Total unique procedures: 36 +Total procedure appearances: 93 + +Unique Procedures: + 1. Before You Begin + 2. Install MongoDB Community Edition + 3. Configuration + 4. Run MongoDB Community Edition + ... +``` + +**With `--list-all`:** +``` +File: path/to/file.rst +Total unique procedures: 36 +Total procedure appearances: 93 + +================================================================================ +Procedure Details +================================================================================ + +1. 
Before You Begin + Line: 45 + Implementation: procedure-directive + Steps: 5 + Contains sub-procedures: no + Appears in 2 selection(s): + - docker, None, None, None, None, None, without-search-docker + - docker, None, None, None, None, None, with-search-docker + + Steps: + 1. Pull the MongoDB Docker Image + 2. Run the MongoDB Docker Container + 3. Verify MongoDB is Running + 4. Connect to MongoDB + 5. Stop the MongoDB Docker Container + +2. Install MongoDB Community Edition + Line: 123 + Implementation: ordered-list + Steps: 4 + Contains sub-procedures: yes + Appears in 10 selection(s): + - linux, None, None, tarball, None, None, with-search + - linux, None, None, tarball, None, None, without-search + ... + + Steps: + 1. Download the tarball + 2. Extract the files from the tarball + 3. Ensure the binaries are in a directory listed in your PATH + 4. Run MongoDB Community Edition +``` + +**Understanding the Counts:** + +The command reports two key metrics: + +1. **Total unique procedures**: Number of distinct procedures (grouped by heading and content hash) + - Procedures with the same heading but different content are counted separately + - Procedures with identical content are counted once, even if they appear in multiple selections + +2. **Total procedure appearances**: Total number of times procedures appear across all selections + - If a procedure appears in 5 different selections, it contributes 5 to this count + - This represents the total number of procedure instances a user might encounter + +**Example:** +- A file might have **36 unique procedures** that appear a total of **93 times** across different selections +- This means some procedures appear in multiple selections (e.g., a "Before You Begin" procedure that's the same for Docker with and without search) + +**Supported Procedure Types:** + +The command recognizes: +- `.. procedure::` directives with `.. step::` directives +- Ordered lists (numbered or lettered) as procedures +- `.. 
tabs::` directives with `:tabid:` options for variations +- `.. composable-tutorial::` directives with `.. selected-content::` blocks +- Sub-procedures (ordered lists within steps) +- YAML steps files (automatically converted to RST format) + +**Deterministic Parsing:** + +The parser ensures deterministic results by: +- Sorting all map iterations to ensure consistent ordering +- Sorting procedures by line number +- Computing content hashes in a consistent manner +- This guarantees the same file will always produce the same counts and groupings + +For more details about procedure parsing logic, refer to [docs/PROCEDURE_PARSING.md](docs/PROCEDURE_PARSING.md). + ### Compare Commands #### `compare file-contents` @@ -682,72 +936,90 @@ since features may be added or removed across versions. ``` audit-cli/ -├── main.go # CLI entry point -├── commands/ # Command implementations -│ ├── extract/ # Extract parent command -│ │ ├── extract.go # Parent command definition -│ │ └── code-examples/ # Code examples subcommand -│ │ ├── code_examples.go # Command logic -│ │ ├── code_examples_test.go # Tests -│ │ ├── parser.go # RST directive parsing -│ │ ├── writer.go # File writing logic -│ │ ├── report.go # Report generation -│ │ ├── types.go # Type definitions -│ │ └── language.go # Language normalization -│ ├── search/ # Search parent command -│ │ ├── search.go # Parent command definition -│ │ └── find-string/ # Find string subcommand -│ │ ├── find_string.go # Command logic -│ │ ├── types.go # Type definitions -│ │ └── report.go # Report generation -│ ├── analyze/ # Analyze parent command -│ │ ├── analyze.go # Parent command definition -│ │ ├── includes/ # Includes analysis subcommand -│ │ │ ├── includes.go # Command logic -│ │ │ ├── analyzer.go # Include tree building -│ │ │ ├── output.go # Output formatting -│ │ │ └── types.go # Type definitions -│ │ └── usage/ # Usage analysis subcommand -│ │ ├── usage.go # Command logic -│ │ ├── usage_test.go # Tests -│ │ ├── analyzer.go # 
Reference finding logic -│ │ ├── output.go # Output formatting -│ │ └── types.go # Type definitions -│ └── compare/ # Compare parent command -│ ├── compare.go # Parent command definition -│ └── file-contents/ # File contents comparison subcommand -│ ├── file_contents.go # Command logic -│ ├── file_contents_test.go # Tests -│ ├── comparer.go # Comparison logic -│ ├── differ.go # Diff generation -│ ├── output.go # Output formatting -│ ├── types.go # Type definitions -│ └── version_resolver.go # Version path resolution -├── internal/ # Internal packages -│ ├── pathresolver/ # Path resolution utilities -│ │ ├── pathresolver.go # Core path resolution -│ │ ├── pathresolver_test.go # Tests -│ │ ├── source_finder.go # Source directory detection -│ │ ├── version_resolver.go # Version path resolution -│ │ └── types.go # Type definitions -│ └── rst/ # RST parsing utilities -│ ├── parser.go # Generic parsing with includes -│ ├── include_resolver.go # Include directive resolution -│ ├── directive_parser.go # Directive parsing -│ └── file_utils.go # File utilities -└── testdata/ # Test fixtures - ├── input-files/ # Test RST files - │ └── source/ # Source directory (required) - │ ├── *.rst # Test files - │ ├── includes/ # Included RST files - │ └── code-examples/ # Code files for literalinclude - ├── expected-output/ # Expected extraction results - └── compare/ # Compare command test data - ├── product/ # Version structure tests - │ ├── manual/ # Manual version - │ ├── upcoming/ # Upcoming version - │ └── v8.0/ # v8.0 version - └── *.txt # Direct comparison tests +├── main.go # CLI entry point +├── commands/ # Command implementations +│ ├── extract/ # Extract parent command +│ │ ├── extract.go # Parent command definition +│ │ ├── code-examples/ # Code examples subcommand +│ │ │ ├── code_examples.go # Command logic +│ │ │ ├── code_examples_test.go # Tests +│ │ │ ├── parser.go # RST directive parsing +│ │ │ ├── writer.go # File writing logic +│ │ │ ├── report.go # Report generation 
+│ │ │ ├── types.go # Type definitions +│ │ │ └── language.go # Language normalization +│ │ └── procedures/ # Procedures extraction subcommand +│ │ ├── procedures.go # Command logic +│ │ ├── procedures_test.go # Tests +│ │ ├── parser.go # Filename generation and filtering +│ │ ├── writer.go # RST file writing +│ │ └── types.go # Type definitions +│ ├── search/ # Search parent command +│ │ ├── search.go # Parent command definition +│ │ └── find-string/ # Find string subcommand +│ │ ├── find_string.go # Command logic +│ │ ├── types.go # Type definitions +│ │ └── report.go # Report generation +│ ├── analyze/ # Analyze parent command +│ │ ├── analyze.go # Parent command definition +│ │ ├── includes/ # Includes analysis subcommand +│ │ │ ├── includes.go # Command logic +│ │ │ ├── analyzer.go # Include tree building +│ │ │ ├── output.go # Output formatting +│ │ │ └── types.go # Type definitions +│ │ ├── procedures/ # Procedures analysis subcommand +│ │ │ ├── procedures.go # Command logic +│ │ │ ├── procedures_test.go # Tests +│ │ │ ├── analyzer.go # Procedure analysis logic +│ │ │ ├── output.go # Output formatting +│ │ │ └── types.go # Type definitions +│ │ └── usage/ # Usage analysis subcommand +│ │ ├── usage.go # Command logic +│ │ ├── usage_test.go # Tests +│ │ ├── analyzer.go # Reference finding logic +│ │ ├── output.go # Output formatting +│ │ └── types.go # Type definitions +│ └── compare/ # Compare parent command +│ ├── compare.go # Parent command definition +│ └── file-contents/ # File contents comparison subcommand +│ ├── file_contents.go # Command logic +│ ├── file_contents_test.go # Tests +│ ├── comparer.go # Comparison logic +│ ├── differ.go # Diff generation +│ ├── output.go # Output formatting +│ ├── types.go # Type definitions +│ └── version_resolver.go # Version path resolution +├── internal/ # Internal packages +│ ├── pathresolver/ # Path resolution utilities +│ │ ├── pathresolver.go # Core path resolution +│ │ ├── pathresolver_test.go # Tests +│ │ ├── 
source_finder.go # Source directory detection +│ │ ├── version_resolver.go # Version path resolution +│ │ └── types.go # Type definitions +│ └── rst/ # RST parsing utilities +│ ├── parser.go # Generic parsing with includes +│ ├── include_resolver.go # Include directive resolution +│ ├── directive_parser.go # Directive parsing +│ ├── directive_regex.go # Directive regex patterns +│ ├── parse_procedures.go # Procedure parsing (core logic) +│ ├── parse_procedures_test.go # Procedure parsing tests +│ ├── get_procedure_variations.go # Variation extraction logic +│ ├── get_procedure_variations_test.go # Variation tests +│ ├── procedure_types.go # Procedure type definitions +│ └── file_utils.go # File utilities +└── testdata/ # Test fixtures + ├── input-files/ # Test RST files + │ └── source/ # Source directory (required) + │ ├── *.rst # Test files + │ ├── includes/ # Included RST files + │ └── code-examples/ # Code files for literalinclude + ├── expected-output/ # Expected extraction results + └── compare/ # Compare command test data + ├── product/ # Version structure tests + │ ├── manual/ # Manual version + │ ├── upcoming/ # Upcoming version + │ └── v8.0/ # v8.0 version + └── *.txt # Direct comparison tests ``` ### Adding New Commands diff --git a/audit-cli/commands/analyze/analyze.go b/audit-cli/commands/analyze/analyze.go index 222ea1b..d3cd9a0 100644 --- a/audit-cli/commands/analyze/analyze.go +++ b/audit-cli/commands/analyze/analyze.go @@ -4,12 +4,14 @@ // Currently supports: // - includes: Analyze include directive relationships in RST files // - usage: Find all files that use a target file +// - procedures: Analyze procedure variations and statistics // // Future subcommands could include analyzing cross-references, broken links, or content metrics. 
package analyze import ( "github.com/mongodb/code-example-tooling/audit-cli/commands/analyze/includes" + "github.com/mongodb/code-example-tooling/audit-cli/commands/analyze/procedures" "github.com/mongodb/code-example-tooling/audit-cli/commands/analyze/usage" "github.com/spf13/cobra" ) @@ -27,6 +29,7 @@ func NewAnalyzeCommand() *cobra.Command { Currently supports: - includes: Analyze include directive relationships (forward dependencies) - usage: Find all files that use a target file (reverse dependencies) + - procedures: Analyze procedure variations and statistics Future subcommands may support analyzing cross-references, broken links, or content metrics.`, } @@ -34,6 +37,7 @@ Future subcommands may support analyzing cross-references, broken links, or cont // Add subcommands cmd.AddCommand(includes.NewIncludesCommand()) cmd.AddCommand(usage.NewUsageCommand()) + cmd.AddCommand(procedures.NewProceduresCommand()) return cmd } diff --git a/audit-cli/commands/analyze/procedures/analyzer.go b/audit-cli/commands/analyze/procedures/analyzer.go new file mode 100644 index 0000000..c634945 --- /dev/null +++ b/audit-cli/commands/analyze/procedures/analyzer.go @@ -0,0 +1,144 @@ +package procedures + +import ( + "fmt" + + "github.com/mongodb/code-example-tooling/audit-cli/internal/rst" +) + +// AnalyzeFile analyzes procedures in a file and returns a report. +// +// This function parses all procedures from the file and generates analysis +// information including variation counts, step counts, implementation types, +// and sub-procedure detection. +// +// This function expands include directives to properly detect variations that +// may be defined in included files. 
+// +// Parameters: +// - filePath: Path to the RST file to analyze +// +// Returns: +// - *AnalysisReport: Analysis report containing all procedure information +// - error: Any error encountered during analysis +func AnalyzeFile(filePath string) (*AnalysisReport, error) { + return AnalyzeFileWithOptions(filePath, true) +} + +// AnalyzeFileWithOptions analyzes procedures in a file with options and returns a report. +// +// This function parses all procedures from the file and generates analysis +// information including variation counts, step counts, implementation types, +// and sub-procedure detection. +// +// Parameters: +// - filePath: Path to the RST file to analyze +// - expandIncludes: If true, expands include directives inline +// +// Returns: +// - *AnalysisReport: Analysis report containing all procedure information +// - error: Any error encountered during analysis +func AnalyzeFileWithOptions(filePath string, expandIncludes bool) (*AnalysisReport, error) { + // Parse all procedures from the file + procedures, err := rst.ParseProceduresWithOptions(filePath, expandIncludes) + if err != nil { + return nil, fmt.Errorf("failed to parse procedures from %s: %w", filePath, err) + } + + // Create the report + report := NewAnalysisReport(filePath) + + // Group procedures from the same tab set + // Track which tab sets we've already processed + processedTabSets := make(map[*rst.TabSetInfo]bool) + + for _, procedure := range procedures { + // If this procedure is part of a tab set and we haven't processed it yet + if procedure.TabSet != nil && !processedTabSets[procedure.TabSet] { + // Mark this tab set as processed + processedTabSets[procedure.TabSet] = true + + // Create a grouped analysis for all procedures in this tab set + analysis := analyzeTabSet(procedure.TabSet) + report.AddProcedure(analysis) + } else if procedure.TabSet == nil { + // Regular procedure (not part of a tab set) + analysis := analyzeProcedure(procedure) + report.AddProcedure(analysis) + } + 
// Skip procedures that are part of an already-processed tab set + } + + return report, nil +} + +// analyzeProcedure analyzes a single procedure and returns analysis results. +func analyzeProcedure(procedure rst.Procedure) ProcedureAnalysis { + // Get variations + variations := rst.GetProcedureVariations(procedure) + + // If no variations, count as 1 (single variation) + variationCount := len(variations) + if variationCount == 0 { + variationCount = 1 + variations = []string{"(no variations)"} + } + + // Count steps + stepCount := len(procedure.Steps) + + // Determine implementation type + implementation := string(procedure.Type) + + // Check for sub-steps + hasSubSteps := procedure.HasSubSteps + + return ProcedureAnalysis{ + Procedure: procedure, + Variations: variations, + VariationCount: variationCount, + StepCount: stepCount, + HasSubSteps: hasSubSteps, + Implementation: implementation, + } +} + +// analyzeTabSet analyzes a tab set containing multiple procedure variations. +// This groups all procedures from the same tab set for reporting purposes. 
+func analyzeTabSet(tabSet *rst.TabSetInfo) ProcedureAnalysis { + // Use the first procedure as the representative + // (they all have the same title/heading) + var firstProc rst.Procedure + for _, tabID := range tabSet.TabIDs { + if proc, ok := tabSet.Procedures[tabID]; ok { + firstProc = proc + break + } + } + + // Get all tab IDs as variations + variations := tabSet.TabIDs + + // Count total variations + variationCount := len(variations) + + // Use the step count from the first procedure + // (each tab may have different step counts, but we report the first one) + stepCount := len(firstProc.Steps) + + // Determine implementation type + implementation := string(firstProc.Type) + + // Check for sub-steps + hasSubSteps := firstProc.HasSubSteps + + return ProcedureAnalysis{ + Procedure: firstProc, + Variations: variations, + VariationCount: variationCount, + StepCount: stepCount, + HasSubSteps: hasSubSteps, + Implementation: implementation, + } +} + diff --git a/audit-cli/commands/analyze/procedures/output.go b/audit-cli/commands/analyze/procedures/output.go new file mode 100644 index 0000000..3b2b6fe --- /dev/null +++ b/audit-cli/commands/analyze/procedures/output.go @@ -0,0 +1,197 @@ +package procedures + +import ( + "fmt" + "strings" +) + +// OutputOptions controls what information is displayed in the output. +type OutputOptions struct { + ListAll bool // List all variations with their selection/tabid values + ListSummary bool // List procedures grouped by heading without selection details + Implementation bool // Show how each procedure is implemented + SubProcedures bool // Indicate if procedures contain nested sub-procedures + StepCount bool // Show step count for each procedure +} + +// PrintReport prints the analysis report to stdout based on the output options. 
+func PrintReport(report *AnalysisReport, options OutputOptions) { + // If no special options are set, just print the count + if !options.ListAll && !options.ListSummary && !options.Implementation && !options.SubProcedures && !options.StepCount { + printSummary(report) + return + } + + // Print detailed report + printDetailedReport(report, options) +} + +// groupProceduresByHeading groups procedures by their heading and returns the groups and order. +func groupProceduresByHeading(procedures []ProcedureAnalysis) (map[string][]ProcedureAnalysis, []string) { + headingGroups := make(map[string][]ProcedureAnalysis) + headingOrder := []string{} + + for _, analysis := range procedures { + heading := analysis.Procedure.Title + if heading == "" { + heading = "(Untitled)" + } + + if _, exists := headingGroups[heading]; !exists { + headingOrder = append(headingOrder, heading) + } + headingGroups[heading] = append(headingGroups[heading], analysis) + } + + return headingGroups, headingOrder +} + +// calculateTotals calculates total unique procedures and appearances from grouped data. +func calculateTotals(headingGroups map[string][]ProcedureAnalysis) (int, int) { + totalUniqueProcedures := 0 + totalAppearances := 0 + + for _, procedures := range headingGroups { + totalUniqueProcedures += len(procedures) + for _, proc := range procedures { + totalAppearances += proc.VariationCount + } + } + + return totalUniqueProcedures, totalAppearances +} + +// printSummary prints a summary of the analysis. +func printSummary(report *AnalysisReport) { + fmt.Printf("File: %s\n", report.FilePath) + fmt.Printf("Total unique procedures: %d\n", len(report.Procedures)) + fmt.Printf("Total procedure appearances: %d\n", report.TotalVariations) +} + +// printDetailedReport prints a detailed analysis report. 
+func printDetailedReport(report *AnalysisReport, options OutputOptions) { + fmt.Printf("Procedure Analysis for: %s\n", report.FilePath) + fmt.Println(strings.Repeat("=", 80)) + + // Group procedures by heading first to get accurate counts + headingGroups, headingOrder := groupProceduresByHeading(report.Procedures) + totalUniqueProcedures, totalAppearances := calculateTotals(headingGroups) + + fmt.Printf("\nTotal unique procedures: %d\n", totalUniqueProcedures) + fmt.Printf("Total procedure appearances: %d\n\n", totalAppearances) + + // Print implementation type summary if requested + if options.Implementation { + fmt.Println("Procedures by implementation type:") + for implType, count := range report.ProceduresByType { + fmt.Printf(" - %s: %d\n", implType, count) + } + fmt.Println() + } + + // Print details grouped by heading (headingGroups already created above) + fmt.Println("Procedures by Heading:") + fmt.Println(strings.Repeat("-", 80)) + + headingNum := 1 + for _, heading := range headingOrder { + procedures := headingGroups[heading] + + fmt.Printf("\n%d. %s\n", headingNum, heading) + fmt.Printf(" Unique procedures: %d\n", len(procedures)) + + // Calculate total appearances for this heading + totalAppearances := 0 + for _, proc := range procedures { + totalAppearances += proc.VariationCount + } + fmt.Printf(" Total appearances: %d\n", totalAppearances) + + // If only showing summary, skip the individual procedure details + if options.ListSummary && !options.ListAll { + headingNum++ + continue + } + + // Determine if we need sub-numbering (only when there are multiple unique procedures) + useSubNumbering := len(procedures) > 1 + + // Show each unique procedure under this heading + for i, analysis := range procedures { + fmt.Printf("\n ") + + // Only show sub-numbering if there are multiple unique procedures + if useSubNumbering { + fmt.Printf("%d.%d. 
", headingNum, i+1) + } + + // Show the first step to distinguish procedures (only if there are multiple) + if useSubNumbering { + if len(analysis.Procedure.Steps) > 0 && analysis.Procedure.Steps[0].Title != "" { + fmt.Printf("%s\n", analysis.Procedure.Steps[0].Title) + } else if len(analysis.Procedure.Steps) > 0 { + fmt.Printf("(Untitled first step)\n") + } else { + fmt.Printf("(No steps)\n") + } + } else { + // For single procedures, just show the step count + fmt.Printf("Steps: %d\n", len(analysis.Procedure.Steps)) + } + + // Indent based on whether we're using sub-numbering + indent := " " + if !useSubNumbering { + indent = " " + } + + // Only show step count if we already showed the first step title + if useSubNumbering { + fmt.Printf("%sSteps: %d\n", indent, len(analysis.Procedure.Steps)) + } + + // Print implementation type if requested + if options.Implementation { + fmt.Printf("%sImplementation: %s\n", indent, analysis.Implementation) + } + + // Print sub-procedures flag if requested + if options.SubProcedures { + if analysis.HasSubSteps { + fmt.Printf("%sContains sub-procedures: yes\n", indent) + } else { + fmt.Printf("%sContains sub-procedures: no\n", indent) + } + } + + // Print selections if requested + if options.ListAll { + if analysis.VariationCount == 1 { + fmt.Printf("%sAppears in 1 selection:\n", indent) + } else { + fmt.Printf("%sAppears in %d selections:\n", indent, analysis.VariationCount) + } + + if len(analysis.Variations) > 0 && analysis.Variations[0] != "(no variations)" { + for _, variation := range analysis.Variations { + fmt.Printf("%s - %s\n", indent, variation) + } + } else { + fmt.Printf("%s (single variation, no tabs or selections)\n", indent) + } + } else if options.ListSummary { + // For summary, just show the count without listing all selections + if analysis.VariationCount == 1 { + fmt.Printf("%sAppears in 1 selection\n", indent) + } else { + fmt.Printf("%sAppears in %d selections\n", indent, analysis.VariationCount) + } + } + } 
+ + headingNum++ + } + + fmt.Println() +} + diff --git a/audit-cli/commands/analyze/procedures/procedures.go b/audit-cli/commands/analyze/procedures/procedures.go new file mode 100644 index 0000000..10a59b9 --- /dev/null +++ b/audit-cli/commands/analyze/procedures/procedures.go @@ -0,0 +1,104 @@ +// Package procedures provides functionality for analyzing procedures in RST files. +// +// This package implements the "analyze procedures" subcommand, which parses +// reStructuredText files and analyzes procedure variations, providing statistics +// and details about: +// - Number of procedures and variations +// - Implementation types (procedure directive vs ordered list) +// - Step counts +// - Sub-procedure detection +// - Variation listings (composable tutorial selections and tabids) +package procedures + +import ( + "fmt" + "os" + + "github.com/spf13/cobra" +) + +// NewProceduresCommand creates the procedures subcommand for analysis. +// +// This command analyzes procedures in RST files and outputs statistics and details +// based on the specified flags: +// - Default: Count of procedure variations +// - --list-all, --list-summary: List procedures with or without full selection details +// - --implementation: Show how each procedure is implemented +// - --sub-procedures: Indicate if procedures contain nested sub-procedures +// - --step-count: Show step count for each procedure +func NewProceduresCommand() *cobra.Command { + var ( + listAll bool + listSummary bool + implementation bool + subProcedures bool + stepCount bool + ) + + cmd := &cobra.Command{ + Use: "procedures [filepath]", + Short: "Analyze procedure variations in reStructuredText files", + Long: `Analyze procedure variations in reStructuredText files. 
+ +This command parses procedures from RST files and provides analysis including: + - Total count of procedures and variations + - Implementation types (procedure directive vs ordered list) + - Step counts for each procedure + - Detection of sub-procedures (ordered lists within steps) + - Listing of all variations (composable tutorial selections and tabids) + +By default, outputs a summary count. Use flags to get more detailed information.`, + Args: cobra.ExactArgs(1), + RunE: func(cmd *cobra.Command, args []string) error { + filePath := args[0] + + options := OutputOptions{ + ListAll: listAll, + ListSummary: listSummary, + Implementation: implementation, + SubProcedures: subProcedures, + StepCount: stepCount, + } + + return runAnalyze(filePath, options) + }, + } + + cmd.Flags().BoolVar(&listAll, "list-all", false, "List all procedures with full selection details") + cmd.Flags().BoolVar(&listSummary, "list-summary", false, "List procedures grouped by heading without selection details") + cmd.Flags().BoolVar(&implementation, "implementation", false, "Show how each procedure is implemented") + cmd.Flags().BoolVar(&subProcedures, "sub-procedures", false, "Indicate if procedures contain nested sub-procedures") + cmd.Flags().BoolVar(&stepCount, "step-count", false, "Show step count for each procedure") + + return cmd +} + +// runAnalyze executes the analysis operation. 
+func runAnalyze(filePath string, options OutputOptions) error { + // Verify the file exists + fileInfo, err := os.Stat(filePath) + if err != nil { + return fmt.Errorf("failed to access path %s: %w", filePath, err) + } + + if fileInfo.IsDir() { + return fmt.Errorf("path %s is a directory; please specify a file", filePath) + } + + // Analyze the file + report, err := AnalyzeFile(filePath) + if err != nil { + return err + } + + if report.TotalProcedures == 0 { + fmt.Println("No procedures found in the file.") + return nil + } + + // Print the report + PrintReport(report, options) + + return nil +} + diff --git a/audit-cli/commands/analyze/procedures/procedures_test.go b/audit-cli/commands/analyze/procedures/procedures_test.go new file mode 100644 index 0000000..14bbd5f --- /dev/null +++ b/audit-cli/commands/analyze/procedures/procedures_test.go @@ -0,0 +1,222 @@ +package procedures + +import ( + "testing" +) + +func TestAnalyzeFile(t *testing.T) { + testFile := "../../../testdata/input-files/source/procedure-test.rst" + + report, err := AnalyzeFile(testFile) + if err != nil { + t.Fatalf("AnalyzeFile failed: %v", err) + } + + // Expected: 5 unique procedures (grouped by heading + content hash) + // 1. Simple Procedure with Steps (1 unique) + // 2. Procedure with Tabs (1 unique, appears in 3 selections: shell, nodejs, python) + // 3. Composable Tutorial (1 unique, appears in 3 selections: driver/nodejs, driver/python, atlas-cli/none) + // 4. Ordered List Procedure (1 unique) + // 5. 
Procedure with Sub-steps (1 unique) + if report.TotalProcedures != 5 { + t.Errorf("Expected to find 5 unique procedures, but got %d", report.TotalProcedures) + } + + // Expected total appearances: 1 + 3 + 3 + 1 + 1 = 9 + if report.TotalVariations != 9 { + t.Errorf("Expected to find 9 total procedure appearances, but got %d", report.TotalVariations) + } + + // Verify implementation types + if report.ProceduresByType["procedure-directive"] != 4 { + t.Errorf("Expected 4 procedure-directive implementations, got %d", report.ProceduresByType["procedure-directive"]) + } + if report.ProceduresByType["ordered-list"] != 1 { + t.Errorf("Expected 1 ordered-list implementation, got %d", report.ProceduresByType["ordered-list"]) + } + + t.Logf("Found %d unique procedures with %d total appearances", report.TotalProcedures, report.TotalVariations) +} + +func TestAnalyzeFileNonExistent(t *testing.T) { + _, err := AnalyzeFile("nonexistent-file.rst") + if err == nil { + t.Error("Expected error for nonexistent file, but got none") + } +} + +func TestAnalyzeFileDeterministic(t *testing.T) { + testFile := "../../../testdata/input-files/source/procedure-test.rst" + + // Run analysis multiple times to ensure deterministic results + var reports []*AnalysisReport + for i := 0; i < 5; i++ { + report, err := AnalyzeFile(testFile) + if err != nil { + t.Fatalf("AnalyzeFile failed on iteration %d: %v", i, err) + } + reports = append(reports, report) + } + + // Verify all runs produce the same counts + for i := 1; i < len(reports); i++ { + if reports[i].TotalProcedures != reports[0].TotalProcedures { + t.Errorf("Iteration %d: TotalProcedures = %d, want %d (non-deterministic!)", + i, reports[i].TotalProcedures, reports[0].TotalProcedures) + } + if reports[i].TotalVariations != reports[0].TotalVariations { + t.Errorf("Iteration %d: TotalVariations = %d, want %d (non-deterministic!)", + i, reports[i].TotalVariations, reports[0].TotalVariations) + } + } +} + +func TestAnalyzeFileWithExpandIncludes(t 
*testing.T) { + testFile := "../../../testdata/input-files/source/procedure-with-includes.rst" + + // The analyze command always expands includes (it calls AnalyzeFile which uses expandIncludes=true) + // This test verifies that include expansion works correctly for detecting variations + // in selected-content blocks within included files. + + // With expanding includes (the default behavior) + reportExpand, err := AnalyzeFileWithOptions(testFile, true) + if err != nil { + t.Fatalf("AnalyzeFile with expand failed: %v", err) + } + + // Should find 1 unique procedure (the composable tutorial) + if reportExpand.TotalProcedures != 1 { + t.Errorf("With expand: expected 1 unique procedure, got %d", reportExpand.TotalProcedures) + } + + // Should detect 3 variations (driver/nodejs, driver/python, atlas-cli/none) + // from the selected-content blocks in the included files + if reportExpand.TotalVariations != 3 { + t.Errorf("With expand: expected 3 appearances, got %d", reportExpand.TotalVariations) + } + + // Verify the procedure has the expected variations + if len(reportExpand.Procedures) > 0 { + proc := reportExpand.Procedures[0] + if len(proc.Variations) != 3 { + t.Errorf("Expected 3 variations, got %d: %v", len(proc.Variations), proc.Variations) + } + + // Verify expected selections + expectedSelections := map[string]bool{ + "driver, nodejs": true, + "driver, python": true, + "atlas-cli, none": true, + } + for _, variation := range proc.Variations { + if !expectedSelections[variation] { + t.Errorf("Unexpected variation: %s", variation) + } + } + } + + t.Logf("With expand: %d procedures, %d appearances", reportExpand.TotalProcedures, reportExpand.TotalVariations) +} + +func TestPrintReport(t *testing.T) { + testFile := "../../../testdata/input-files/source/procedure-test.rst" + + report, err := AnalyzeFile(testFile) + if err != nil { + t.Fatalf("AnalyzeFile failed: %v", err) + } + + // Test with default options (just summary) + options := OutputOptions{} + 
PrintReport(report, options) + + // Test with ListSummary + options = OutputOptions{ + ListSummary: true, + } + PrintReport(report, options) + + // Test with ListAll and all details + options = OutputOptions{ + ListAll: true, + Implementation: true, + SubProcedures: true, + StepCount: true, + } + PrintReport(report, options) + + // This test just ensures PrintReport doesn't panic + // In a real test, we might capture stdout and verify the output +} + +func TestProcedureAnalysisDetails(t *testing.T) { + testFile := "../../../testdata/input-files/source/procedure-test.rst" + + report, err := AnalyzeFile(testFile) + if err != nil { + t.Fatalf("AnalyzeFile failed: %v", err) + } + + // Verify we have the expected procedures + expectedTitles := map[string]bool{ + "Simple Procedure with Steps": true, + "Procedure with Tabs": true, + "Composable Tutorial Example": true, + "Ordered List Procedure": true, + "Procedure with Sub-steps": true, + } + + for _, proc := range report.Procedures { + if !expectedTitles[proc.Procedure.Title] { + t.Errorf("Unexpected procedure title: %s", proc.Procedure.Title) + } + + // Verify step counts are reasonable + if proc.StepCount == 0 { + t.Errorf("Procedure '%s' has 0 steps", proc.Procedure.Title) + } + } +} + +func TestAnalyzeTabsWithProcedures(t *testing.T) { + testFile := "../../../testdata/input-files/source/tabs-with-procedures.rst" + + report, err := AnalyzeFile(testFile) + if err != nil { + t.Fatalf("AnalyzeFile failed: %v", err) + } + + // Expected: 1 unique procedure (grouped as a tab set) + // with 3 appearances (macos, ubuntu, windows) + if report.TotalProcedures != 1 { + t.Errorf("Expected to find 1 unique procedure, but got %d", report.TotalProcedures) + } + + // Expected total appearances: 3 (one for each tab) + if report.TotalVariations != 3 { + t.Errorf("Expected to find 3 total procedure appearances, but got %d", report.TotalVariations) + } + + // Verify the procedure has the expected variations + if len(report.Procedures) > 
0 { + proc := report.Procedures[0] + if len(proc.Variations) != 3 { + t.Errorf("Expected 3 variations, got %d: %v", len(proc.Variations), proc.Variations) + } + + // Verify expected tab IDs + expectedTabs := map[string]bool{ + "macos": true, + "ubuntu": true, + "windows": true, + } + for _, variation := range proc.Variations { + if !expectedTabs[variation] { + t.Errorf("Unexpected variation: %s", variation) + } + } + } + + t.Logf("Found %d unique procedures with %d total appearances", report.TotalProcedures, report.TotalVariations) +} + diff --git a/audit-cli/commands/analyze/procedures/types.go b/audit-cli/commands/analyze/procedures/types.go new file mode 100644 index 0000000..7ffcffa --- /dev/null +++ b/audit-cli/commands/analyze/procedures/types.go @@ -0,0 +1,40 @@ +package procedures + +import "github.com/mongodb/code-example-tooling/audit-cli/internal/rst" + +// ProcedureAnalysis contains analysis results for a single procedure. +type ProcedureAnalysis struct { + Procedure rst.Procedure // The procedure being analyzed + Variations []string // List of variation identifiers + VariationCount int // Number of variations + StepCount int // Number of steps + HasSubSteps bool // Whether procedure contains sub-steps + Implementation string // How the procedure is implemented (directive or ordered-list) +} + +// AnalysisReport contains the complete analysis results for a file. +type AnalysisReport struct { + FilePath string // Path to the analyzed file + Procedures []ProcedureAnalysis // Analysis for each procedure + TotalProcedures int // Total number of procedures + TotalVariations int // Total number of variations across all procedures + ProceduresByType map[string]int // Count of procedures by implementation type +} + +// NewAnalysisReport creates a new analysis report. 
+func NewAnalysisReport(filePath string) *AnalysisReport { + return &AnalysisReport{ + FilePath: filePath, + Procedures: []ProcedureAnalysis{}, + ProceduresByType: make(map[string]int), + } +} + +// AddProcedure adds a procedure analysis to the report. +func (r *AnalysisReport) AddProcedure(analysis ProcedureAnalysis) { + r.Procedures = append(r.Procedures, analysis) + r.TotalProcedures++ + r.TotalVariations += analysis.VariationCount + r.ProceduresByType[analysis.Implementation]++ +} + diff --git a/audit-cli/commands/extract/code-examples/code_examples_test.go b/audit-cli/commands/extract/code-examples/code_examples_test.go index 4187c45..b6c63ee 100644 --- a/audit-cli/commands/extract/code-examples/code_examples_test.go +++ b/audit-cli/commands/extract/code-examples/code_examples_test.go @@ -582,8 +582,8 @@ func TestNoFlagsOnDirectory(t *testing.T) { // Should NOT include files in includes/ subdirectory // Expected: code-block-test.rst, duplicate-include-test.rst, include-test.rst, // io-code-block-test.rst, literalinclude-test.rst, nested-code-block-test.rst, - // nested-include-test.rst, index.rst (8 files) - expectedFiles := 8 + // nested-include-test.rst, index.rst, procedure-test.rst, procedure-with-includes.rst, tabs-with-procedures.rst (11 files) + expectedFiles := 11 if report.FilesTraversed != expectedFiles { t.Errorf("Expected %d files traversed (top-level only), got %d", expectedFiles, report.FilesTraversed) diff --git a/audit-cli/commands/extract/extract.go b/audit-cli/commands/extract/extract.go index 7f1926b..bd639d2 100644 --- a/audit-cli/commands/extract/extract.go +++ b/audit-cli/commands/extract/extract.go @@ -3,12 +3,14 @@ // This package serves as the parent command for various extraction operations. // Currently supports: // - code-examples: Extract code examples from RST directives +// - procedures: Extract procedure variations from RST files // // Future subcommands could include extracting tables, images, or other structured content.
package extract import ( "github.com/mongodb/code-example-tooling/audit-cli/commands/extract/code-examples" + "github.com/mongodb/code-example-tooling/audit-cli/commands/extract/procedures" "github.com/spf13/cobra" ) @@ -23,12 +25,15 @@ func NewExtractCommand() *cobra.Command { Long: `Extract various types of content from reStructuredText files. Currently supports extracting code examples from directives like literalinclude, -code-block, and io-code-block. Future subcommands may support extracting other -types of structured content such as tables, images, or metadata.`, +code-block, and io-code-block, as well as extracting procedure variations from +composable tutorials, tabs, and procedure directives. Future subcommands may +support extracting other types of structured content such as tables, images, +or metadata.`, } // Add subcommands cmd.AddCommand(code_examples.NewCodeExamplesCommand()) + cmd.AddCommand(procedures.NewProceduresCommand()) return cmd } diff --git a/audit-cli/commands/extract/procedures/parser.go b/audit-cli/commands/extract/procedures/parser.go new file mode 100644 index 0000000..42e9c77 --- /dev/null +++ b/audit-cli/commands/extract/procedures/parser.go @@ -0,0 +1,185 @@ +package procedures + +import ( + "crypto/sha256" + "encoding/hex" + "fmt" + "path/filepath" + "strings" + + "github.com/mongodb/code-example-tooling/audit-cli/internal/rst" +) + +// ParseFile parses a file and extracts all procedure variations. +// +// This function parses all procedures from the file and generates variations +// based on tabs, composable tutorials, or returns a single variation if no +// variations are present. +// +// Parameters: +// - filePath: Path to the RST file to parse +// - selectionFilter: Optional filter to extract only a specific variation +// - expandIncludes: If true, expands .. 
include:: directives inline +// +// Returns: +// - []ProcedureVariation: Slice of procedure variations to extract +// - error: Any error encountered during parsing +func ParseFile(filePath string, selectionFilter string, expandIncludes bool) ([]ProcedureVariation, error) { + // Parse all procedures from the file + procedures, err := rst.ParseProceduresWithOptions(filePath, expandIncludes) + if err != nil { + return nil, fmt.Errorf("failed to parse procedures from %s: %w", filePath, err) + } + + var variations []ProcedureVariation + + // Create one variation per unique procedure + // Each procedure represents a unique piece of content (grouped by heading + content hash) + for _, procedure := range procedures { + // Get all selections this procedure appears in + variationNames := rst.GetProcedureVariations(procedure) + + // If a selection filter is specified, check if this procedure matches + if selectionFilter != "" { + matches := false + for _, varName := range variationNames { + if varName == selectionFilter { + matches = true + break + } + } + if !matches { + continue + } + } + + // Create a single variation representing this unique procedure + // The VariationName will contain all selections this procedure appears in + variationName := "" + if len(variationNames) > 0 { + variationName = strings.Join(variationNames, "; ") + } + + variation := ProcedureVariation{ + Procedure: procedure, + VariationName: variationName, + SourceFile: filePath, + OutputFile: generateOutputFilename(filePath, procedure, ""), + } + + variations = append(variations, variation) + } + + return variations, nil +} + +// generateVariations generates all variations for a procedure. 
+func generateVariations(procedure rst.Procedure, sourceFile string, selectionFilter string) []ProcedureVariation { + var variations []ProcedureVariation + + // Get all variation identifiers for this procedure + variationNames := rst.GetProcedureVariations(procedure) + + // If no variations, create a single variation with empty name + if len(variationNames) == 0 { + variationNames = []string{""} + } + + // Generate a variation for each identifier + for _, variationName := range variationNames { + // If a selection filter is specified, only include matching variations + if selectionFilter != "" && variationName != selectionFilter { + continue + } + + variation := ProcedureVariation{ + Procedure: procedure, + VariationName: variationName, + SourceFile: sourceFile, + OutputFile: generateOutputFilename(sourceFile, procedure, variationName), + } + + variations = append(variations, variation) + } + + return variations +} + +// generateOutputFilename generates the output filename for a procedure. +// +// Format: {heading}_{first-step-title}_{hash}.rst +// Example: "before-you-begin_pull-the-mongodb-docker-image_a1b2c3.rst" +// +// The hash is a short (6 character) hash of the procedure content to ensure uniqueness. 
+func generateOutputFilename(sourceFile string, procedure rst.Procedure, variationName string) string { + // Get the base name from the source file + baseName := filepath.Base(sourceFile) + baseName = strings.TrimSuffix(baseName, filepath.Ext(baseName)) + + // Sanitize the procedure title (heading) for use in filename + title := sanitizeFilename(procedure.Title) + if title == "" { + title = baseName + } + + // Generate a short hash of the procedure content for uniqueness + contentHash := computeContentHash(procedure) + shortHash := contentHash[:6] + + // If the procedure has steps, use the first step title to make the filename descriptive + if len(procedure.Steps) > 0 && procedure.Steps[0].Title != "" { + firstStepTitle := sanitizeFilename(procedure.Steps[0].Title) + return fmt.Sprintf("%s_%s_%s.rst", title, firstStepTitle, shortHash) + } + + return fmt.Sprintf("%s_%s.rst", title, shortHash) +} + +// computeContentHash generates a hash of the procedure's content for uniqueness. +func computeContentHash(proc rst.Procedure) string { + var content strings.Builder + + // Include title + content.WriteString(proc.Title) + content.WriteString("|") + + // Include all step titles and content + for _, step := range proc.Steps { + content.WriteString(step.Title) + content.WriteString("|") + content.WriteString(step.Content) + content.WriteString("|") + } + + // Compute SHA256 hash + hash := sha256.Sum256([]byte(content.String())) + return hex.EncodeToString(hash[:]) +} + +// sanitizeFilename sanitizes a string for use in a filename. 
+func sanitizeFilename(s string) string { + // Convert to lowercase + s = strings.ToLower(s) + + // Replace spaces, underscores, and commas with hyphens; drop every other non-alphanumeric character (including existing hyphens) + s = strings.Map(func(r rune) rune { + if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') { + return r + } + if r == ' ' || r == '_' || r == ',' { + return '-' + } + return -1 // Remove character + }, s) + + // Remove multiple consecutive hyphens + for strings.Contains(s, "--") { + s = strings.ReplaceAll(s, "--", "-") + } + + // Trim hyphens from start and end + s = strings.Trim(s, "-") + + return s +} + diff --git a/audit-cli/commands/extract/procedures/procedures.go b/audit-cli/commands/extract/procedures/procedures.go new file mode 100644 index 0000000..34595bf --- /dev/null +++ b/audit-cli/commands/extract/procedures/procedures.go @@ -0,0 +1,143 @@ +// Package procedures provides functionality for extracting procedures from RST files. +// +// This package implements the "extract procedures" subcommand, which parses +// reStructuredText files and extracts procedure variations based on: +// - Composable tutorial selections +// - Tab selections (tabids) +// - Ordered lists +// - Procedure directives +// +// The extracted procedures are written to individual RST files with standardized naming: +// {heading}_{first-step-title}_{hash}.rst +// +// Supports filtering to extract only specific variations using the --selection flag. +package procedures + +import ( + "fmt" + "os" + "strings" + + "github.com/spf13/cobra" +) + +// NewProceduresCommand creates the procedures subcommand. +// +// This command extracts procedure variations from RST files and writes them to +// individual files in the output directory. 
Supports various flags for controlling behavior: +// - --selection: Extract only a specific variation (by selection or tabid) +// - -o, --output: Output directory for extracted files +// - --dry-run: Show what would be extracted without writing files +// - -v, --verbose: Show detailed processing information +func NewProceduresCommand() *cobra.Command { + var ( + selection string + outputDir string + dryRun bool + verbose bool + expandIncludes bool + ) + + cmd := &cobra.Command{ + Use: "procedures [filepath]", + Short: "Extract procedure variations from reStructuredText files", + Long: `Extract procedure variations from reStructuredText files. + +This command parses procedures from RST files and extracts all variations based on: + - Composable tutorial selections (.. composable-tutorial::) + - Tab selections (.. tabs:: with :tabid:) + - Procedure directives (.. procedure::) + - Ordered lists + +Each variation is written to a separate RST file with interpolated content, +showing the procedure as it would be rendered for that specific variation. + +The output files are named using the format: {heading}_{first-step-title}_{hash}.rst +For example: "before-you-begin_pull-the-mongodb-docker-image_e8eeec.rst" + +By default, include directives are preserved in the output. 
Use --expand-includes +to inline the content of included files.`, + Args: cobra.ExactArgs(1), + RunE: func(cmd *cobra.Command, args []string) error { + filePath := args[0] + return runExtract(filePath, selection, outputDir, dryRun, verbose, expandIncludes) + }, + } + + cmd.Flags().StringVar(&selection, "selection", "", "Extract only a specific variation (by selection or tabid)") + cmd.Flags().StringVarP(&outputDir, "output", "o", "./output", "Output directory for procedure files") + cmd.Flags().BoolVar(&dryRun, "dry-run", false, "Show what would be extracted without writing files") + cmd.Flags().BoolVarP(&verbose, "verbose", "v", false, "Provide additional information during execution") + cmd.Flags().BoolVar(&expandIncludes, "expand-includes", false, "Expand include directives inline instead of preserving them") + + return cmd +} + +// runExtract executes the extraction operation. +func runExtract(filePath string, selection string, outputDir string, dryRun bool, verbose bool, expandIncludes bool) error { + // Verify the file exists + fileInfo, err := os.Stat(filePath) + if err != nil { + return fmt.Errorf("failed to access path %s: %w", filePath, err) + } + + if fileInfo.IsDir() { + return fmt.Errorf("path %s is a directory; please specify a file", filePath) + } + + // Parse the file and extract procedure variations + if verbose { + fmt.Printf("Parsing procedures from %s\n", filePath) + if expandIncludes { + fmt.Println("Expanding include directives inline") + } + } + + variations, err := ParseFile(filePath, selection, expandIncludes) + if err != nil { + return err + } + + if len(variations) == 0 { + fmt.Println("No procedures found in the file.") + return nil + } + + // Report what was found + if verbose || dryRun { + fmt.Printf("\nFound %d unique procedure(s):\n", len(variations)) + for i, v := range variations { + fmt.Printf("\n%d. 
%s\n", i+1, v.Procedure.Title) + fmt.Printf(" Output file: %s\n", v.OutputFile) + fmt.Printf(" Steps: %d\n", len(v.Procedure.Steps)) + + if v.VariationName != "" { + // Split the selections and format as a list + selections := strings.Split(v.VariationName, "; ") + fmt.Printf(" Appears in %d selection(s):\n", len(selections)) + for _, sel := range selections { + fmt.Printf(" - %s\n", sel) + } + } else { + fmt.Printf(" Appears in: (no specific selections)\n") + } + } + fmt.Println() + } + + // Write the variations + filesWritten, err := WriteAllVariations(variations, outputDir, dryRun, verbose) + if err != nil { + return err + } + + // Print summary + if dryRun { + fmt.Printf("Dry run complete. Would have written %d file(s) to %s\n", len(variations), outputDir) + } else { + fmt.Printf("Successfully extracted %d unique procedure(s) to %s\n", filesWritten, outputDir) + } + + return nil +} + diff --git a/audit-cli/commands/extract/procedures/procedures_test.go b/audit-cli/commands/extract/procedures/procedures_test.go new file mode 100644 index 0000000..802db83 --- /dev/null +++ b/audit-cli/commands/extract/procedures/procedures_test.go @@ -0,0 +1,371 @@ +package procedures + +import ( + "os" + "path/filepath" + "strings" + "testing" +) + +func TestParseFile(t *testing.T) { + testFile := "../../../testdata/input-files/source/procedure-test.rst" + + variations, err := ParseFile(testFile, "", false) + if err != nil { + t.Fatalf("ParseFile failed: %v", err) + } + + // Expected: 5 unique procedures (one file per unique procedure) + // 1. Simple Procedure with Steps (1 unique) + // 2. Procedure with Tabs (1 unique, appears in 3 selections) + // 3. Composable Tutorial (1 unique, appears in 3 selections) + // 4. Ordered List Procedure (1 unique) + // 5. 
Procedure with Sub-steps (1 unique) + if len(variations) != 5 { + t.Errorf("Expected to find 5 unique procedures, but got %d", len(variations)) + } + + // Verify that procedures with multiple selections have them listed + foundMultiSelection := false + for _, v := range variations { + if strings.Contains(v.VariationName, ";") { + foundMultiSelection = true + // Count selections + selections := strings.Split(v.VariationName, "; ") + if len(selections) < 2 { + t.Errorf("Procedure with semicolon should have multiple selections, got: %s", v.VariationName) + } + } + } + if !foundMultiSelection { + t.Error("Expected to find at least one procedure with multiple selections") + } + + t.Logf("Found %d unique procedures", len(variations)) +} + +func TestParseFileWithFilter(t *testing.T) { + testFile := "../../../testdata/input-files/source/procedure-test.rst" + + variations, err := ParseFile(testFile, "python", false) + if err != nil { + t.Fatalf("ParseFile failed: %v", err) + } + + // Should only get procedures that appear in the "python" selection + // Expected: 1 procedure (the one with tabs that includes python) + if len(variations) != 1 { + t.Errorf("Expected 1 procedure matching 'python', got %d", len(variations)) + } + + // Verify the variation name contains "python" + for _, v := range variations { + if !strings.Contains(v.VariationName, "python") { + t.Errorf("Expected variation to contain 'python', got: %s", v.VariationName) + } + } + + t.Logf("Found %d procedure(s) matching 'python'", len(variations)) +} + +func TestParseFileDeterministic(t *testing.T) { + testFile := "../../../testdata/input-files/source/procedure-test.rst" + + // Run parsing multiple times to ensure deterministic results + var allVariations [][]ProcedureVariation + for i := 0; i < 5; i++ { + variations, err := ParseFile(testFile, "", false) + if err != nil { + t.Fatalf("ParseFile failed on iteration %d: %v", i, err) + } + allVariations = append(allVariations, variations) + } + + // Verify all 
runs produce the same count + for i := 1; i < len(allVariations); i++ { + if len(allVariations[i]) != len(allVariations[0]) { + t.Errorf("Iteration %d: found %d procedures, want %d (non-deterministic!)", + i, len(allVariations[i]), len(allVariations[0])) + } + } + + // Verify filenames are consistent + for i := 1; i < len(allVariations); i++ { + for j := 0; j < len(allVariations[0]); j++ { + if allVariations[i][j].OutputFile != allVariations[0][j].OutputFile { + t.Errorf("Iteration %d, procedure %d: filename = %s, want %s (non-deterministic!)", + i, j, allVariations[i][j].OutputFile, allVariations[0][j].OutputFile) + } + } + } +} + +func TestWriteVariation(t *testing.T) { + testFile := "../../../testdata/input-files/source/procedure-test.rst" + outputDir := t.TempDir() + + variations, err := ParseFile(testFile, "", false) + if err != nil { + t.Fatalf("ParseFile failed: %v", err) + } + + if len(variations) == 0 { + t.Fatal("No variations found to test writing") + } + + // Write the first variation + err = WriteVariation(variations[0], outputDir, false) + if err != nil { + t.Fatalf("WriteVariation failed: %v", err) + } + + // Check that the file was created + outputPath := filepath.Join(outputDir, variations[0].OutputFile) + if _, err := os.Stat(outputPath); os.IsNotExist(err) { + t.Errorf("Expected output file %s to exist, but it doesn't", outputPath) + } + + // Verify the file has content + content, err := os.ReadFile(outputPath) + if err != nil { + t.Fatalf("Failed to read output file: %v", err) + } + if len(content) == 0 { + t.Error("Output file is empty") + } + + t.Logf("Successfully wrote variation to %s (%d bytes)", outputPath, len(content)) +} + +func TestWriteAllVariations(t *testing.T) { + testFile := "../../../testdata/input-files/source/procedure-test.rst" + outputDir := t.TempDir() + + variations, err := ParseFile(testFile, "", false) + if err != nil { + t.Fatalf("ParseFile failed: %v", err) + } + + filesWritten, err := WriteAllVariations(variations, 
outputDir, false, false) + if err != nil { + t.Fatalf("WriteAllVariations failed: %v", err) + } + + if filesWritten != len(variations) { + t.Errorf("Expected to write %d files, but wrote %d", len(variations), filesWritten) + } + + // Verify all files exist + for _, v := range variations { + outputPath := filepath.Join(outputDir, v.OutputFile) + if _, err := os.Stat(outputPath); os.IsNotExist(err) { + t.Errorf("Expected output file %s to exist, but it doesn't", outputPath) + } + } + + t.Logf("Successfully wrote %d files", filesWritten) +} + +func TestParseFileWithIncludes(t *testing.T) { + testFile := "../../../testdata/input-files/source/procedure-with-includes.rst" + + // With expanding includes, should find 1 unique procedure appearing in 3 selections + // The selected-content blocks are in the included files (install-deps.rst, connect.rst, operations.rst) + variationsExpand, err := ParseFile(testFile, "", true) + if err != nil { + t.Fatalf("ParseFile with expand failed: %v", err) + } + + if len(variationsExpand) != 1 { + t.Fatalf("Expected 1 unique procedure with expanding includes, got %d", len(variationsExpand)) // Fatal: the final t.Logf indexes variationsExpand[0] + } + + // Verify the expanded version has multiple selections + if len(variationsExpand) > 0 { + // Should detect 3 selections from the included selected-content blocks + selectionsExpand := strings.Split(variationsExpand[0].VariationName, "; ") + if len(selectionsExpand) != 3 { + t.Errorf("Expected 3 selections with expand, got %d: %v", len(selectionsExpand), selectionsExpand) + } + + // Verify expected selections + expectedSelections := map[string]bool{ + "driver, nodejs": true, + "driver, python": true, + "atlas-cli, none": true, + } + for _, sel := range selectionsExpand { + if !expectedSelections[sel] { + t.Errorf("Unexpected selection: %s", sel) + } + } + } + + t.Logf("With expand: %d procedures with %d selections", + len(variationsExpand), + 
len(strings.Split(variationsExpand[0].VariationName, "; "))) +} + +func TestSanitizeFilename(t 
*testing.T) { + tests := []struct { + input string + expected string + }{ + {"Simple Procedure", "simple-procedure"}, + {"Connect to Cluster", "connect-to-cluster"}, + {"driver, nodejs", "driver-nodejs"}, + {"Multiple Spaces", "multiple-spaces"}, + {"Special!@#Characters", "specialcharacters"}, + {" Leading and Trailing ", "leading-and-trailing"}, + {"UPPERCASE", "uppercase"}, + {"Mixed_Case-With.Dots", "mixed-casewithdots"}, // Dots are removed, not converted to hyphens + } + + for _, tt := range tests { + t.Run(tt.input, func(t *testing.T) { + result := sanitizeFilename(tt.input) + if result != tt.expected { + t.Errorf("sanitizeFilename(%q) = %q, want %q", tt.input, result, tt.expected) + } + }) + } +} + +func TestGenerateOutputFilename(t *testing.T) { + testFile := "../../../testdata/input-files/source/procedure-test.rst" + + variations, err := ParseFile(testFile, "", false) + if err != nil { + t.Fatalf("ParseFile failed: %v", err) + } + + // Verify all filenames are unique + filenameMap := make(map[string]bool) + for _, v := range variations { + if filenameMap[v.OutputFile] { + t.Errorf("Duplicate filename generated: %s", v.OutputFile) + } + filenameMap[v.OutputFile] = true + + // Verify filename format + if !strings.HasSuffix(v.OutputFile, ".rst") { + t.Errorf("Filename should end with .rst: %s", v.OutputFile) + } + + // Verify filename contains a hash (6 characters before .rst) + parts := strings.Split(v.OutputFile, "_") + if len(parts) < 2 { + t.Errorf("Filename should contain at least heading and hash: %s", v.OutputFile) + } + } + + t.Logf("Generated %d unique filenames", len(filenameMap)) +} + +func TestDryRun(t *testing.T) { + testFile := "../../../testdata/input-files/source/procedure-test.rst" + outputDir := t.TempDir() + + variations, err := ParseFile(testFile, "", false) + if err != nil { + t.Fatalf("ParseFile failed: %v", err) + } + + // Write with dry run enabled + filesWritten, err := WriteAllVariations(variations, outputDir, true, false) + if err 
!= nil { + t.Fatalf("WriteAllVariations with dry run failed: %v", err) + } + + // Should report files that would be written + if filesWritten != len(variations) { + t.Errorf("Dry run should report %d files, but reported %d", len(variations), filesWritten) + } + + // Verify no files were actually created + entries, err := os.ReadDir(outputDir) + if err != nil { + t.Fatalf("Failed to read output directory: %v", err) + } + + if len(entries) > 0 { + t.Errorf("Dry run should not create files, but found %d files", len(entries)) + } + + t.Logf("Dry run correctly reported %d files without writing them", filesWritten) +} + +func TestContentHash(t *testing.T) { + testFile := "../../../testdata/input-files/source/procedure-test.rst" + + // Parse the file multiple times + var hashes []string + for i := 0; i < 3; i++ { + variations, err := ParseFile(testFile, "", false) + if err != nil { + t.Fatalf("ParseFile failed on iteration %d: %v", i, err) + } + + // Extract hash from first filename (last 6 chars before .rst) + if len(variations) > 0 { + filename := variations[0].OutputFile + // Remove .rst extension + nameWithoutExt := strings.TrimSuffix(filename, ".rst") + // Get last part after last underscore (the hash) + parts := strings.Split(nameWithoutExt, "_") + hash := parts[len(parts)-1] + hashes = append(hashes, hash) + } + } + + // Verify all hashes are identical (deterministic) + for i := 1; i < len(hashes); i++ { + if hashes[i] != hashes[0] { + t.Errorf("Iteration %d: hash = %s, want %s (non-deterministic!)", i, hashes[i], hashes[0]) + } + } + + t.Logf("Content hash is deterministic: %s", hashes[0]) +} + +func TestParseTabsWithProcedures(t *testing.T) { + testFile := "../../../testdata/input-files/source/tabs-with-procedures.rst" + + variations, err := ParseFile(testFile, "", false) + if err != nil { + t.Fatalf("ParseFile failed: %v", err) + } + + // Expected: 3 unique procedures (one for each tab: macos, ubuntu, windows) + if len(variations) != 3 { + t.Errorf("Expected to 
find 3 unique procedures, but got %d", len(variations)) + } + + // Verify each procedure has only one selection (its specific tab) + for _, v := range variations { + if strings.Contains(v.VariationName, ";") { + t.Errorf("Expected each procedure to have only one selection, got: %s", v.VariationName) + } + + // Verify the selection is one of the expected tabs + expectedTabs := map[string]bool{ + "macos": true, + "ubuntu": true, + "windows": true, + } + if !expectedTabs[v.VariationName] { + t.Errorf("Unexpected variation name: %s", v.VariationName) + } + } + + // Verify all three tabs are present + foundTabs := make(map[string]bool) + for _, v := range variations { + foundTabs[v.VariationName] = true + } + if len(foundTabs) != 3 { + t.Errorf("Expected to find all 3 tabs, but got: %v", foundTabs) + } + + t.Logf("Found %d unique procedures from tabs", len(variations)) +} diff --git a/audit-cli/commands/extract/procedures/types.go b/audit-cli/commands/extract/procedures/types.go new file mode 100644 index 0000000..27391b6 --- /dev/null +++ b/audit-cli/commands/extract/procedures/types.go @@ -0,0 +1,33 @@ +package procedures + +import "github.com/mongodb/code-example-tooling/audit-cli/internal/rst" + +// ProcedureVariation represents a single variation of a procedure to be extracted. +type ProcedureVariation struct { + Procedure rst.Procedure // The procedure + VariationName string // The variation identifier (e.g., "python", "nodejs", "driver, nodejs") + SourceFile string // Path to the source RST file + OutputFile string // Path to the output file for this variation +} + +// ExtractionReport contains statistics about the extraction operation. 
+type ExtractionReport struct { + TotalProcedures int // Total number of procedures found + TotalVariations int // Total number of variations extracted + FilesProcessed int // Number of files processed + FilesWritten int // Number of output files written + Errors []string // Any errors encountered +} + +// NewExtractionReport creates a new extraction report. +func NewExtractionReport() *ExtractionReport { + return &ExtractionReport{ + Errors: []string{}, + } +} + +// AddError adds an error to the report. +func (r *ExtractionReport) AddError(err string) { + r.Errors = append(r.Errors, err) +} + diff --git a/audit-cli/commands/extract/procedures/writer.go b/audit-cli/commands/extract/procedures/writer.go new file mode 100644 index 0000000..7fb554d --- /dev/null +++ b/audit-cli/commands/extract/procedures/writer.go @@ -0,0 +1,75 @@ +package procedures + +import ( + "fmt" + "os" + "path/filepath" + + "github.com/mongodb/code-example-tooling/audit-cli/internal/rst" +) + +// WriteVariation writes a procedure variation to a file. +// +// This function formats the procedure for the specific variation and writes +// it to the output file in RST format. 
+// +// Parameters: +// - variation: The procedure variation to write +// - outputDir: Directory where the file should be written +// - dryRun: If true, don't actually write the file +// +// Returns: +// - error: Any error encountered during writing +func WriteVariation(variation ProcedureVariation, outputDir string, dryRun bool) error { + // Format the procedure for this variation + content, err := rst.FormatProcedureForVariation(variation.Procedure, variation.VariationName) + if err != nil { + return fmt.Errorf("failed to format procedure variation: %w", err) + } + + // Generate output path + outputPath := filepath.Join(outputDir, variation.OutputFile) + + if dryRun { + fmt.Printf("Would write: %s\n", outputPath) + return nil + } + + // Ensure output directory exists + if err := os.MkdirAll(outputDir, 0755); err != nil { + return fmt.Errorf("failed to create output directory: %w", err) + } + + // Write the file + if err := os.WriteFile(outputPath, []byte(content), 0644); err != nil { + return fmt.Errorf("failed to write file %s: %w", outputPath, err) + } + + return nil +} + +// WriteAllVariations writes all procedure variations to files. 
+// +// Parameters: +// - variations: Slice of procedure variations to write +// - outputDir: Directory where files should be written +// - dryRun: If true, don't actually write files +// - verbose: If true, print detailed information +// +// Returns: +// - int: Number of files written (or would be written in dry run mode) +// - error: Any error encountered during writing +func WriteAllVariations(variations []ProcedureVariation, outputDir string, dryRun bool, verbose bool) (int, error) { + filesWritten := 0 + + for _, variation := range variations { + if err := WriteVariation(variation, outputDir, dryRun); err != nil { + return filesWritten, err + } + + filesWritten++ + } + + return filesWritten, nil +} + diff --git a/audit-cli/docs/PROCEDURE_PARSING.md b/audit-cli/docs/PROCEDURE_PARSING.md new file mode 100644 index 0000000..c792107 --- /dev/null +++ b/audit-cli/docs/PROCEDURE_PARSING.md @@ -0,0 +1,732 @@ +# Procedure Parsing - Business Logic and Design Decisions + +This document describes the business logic behind procedure parsing in the `audit-cli` tool. It explains what constitutes a procedure, how variations are detected, and key design decisions that govern the parser's behavior. + +## Table of Contents + +- [Overview](#overview) +- [What is a Procedure?](#what-is-a-procedure) +- [Procedure Formats](#procedure-formats) +- [Procedure Variations](#procedure-variations) +- [Include Directive Handling](#include-directive-handling) +- [Uniqueness and Grouping](#uniqueness-and-grouping) +- [Analysis vs. Extraction Semantics](#analysis-vs-extraction-semantics) +- [Key Design Decisions](#key-design-decisions) +- [Common Patterns and Edge Cases](#common-patterns-and-edge-cases) + +## Overview + +The procedure parser (`internal/rst/procedure_parser.go`) extracts and analyzes procedural content from MongoDB's reStructuredText (RST) documentation. 
MongoDB documentation uses procedures inconsistently across different contexts (drivers, deployment methods, platforms, etc.), so the parser must handle multiple formats and variation mechanisms. + +## What is a Procedure? + +A **procedure** is a set of sequential steps that guide users through a task. Examples include: +- Installing MongoDB +- Connecting to a cluster +- Creating a database +- Deploying an application + +Procedures have: +- A **title/heading** (the section heading above the procedure) +- A **series of steps** (numbered or bulleted instructions) +- Optional **variations** (different content for different contexts) +- Optional **sub-steps** (nested procedures within steps) + +## Procedure Formats + +MongoDB documentation uses three formats for procedures: + +### 1. Procedure Directive + +The most common format uses `.. procedure::` and `.. step::` directives: + +```rst +Before You Begin +---------------- + +.. procedure:: + + .. step:: Create a MongoDB Atlas account + + Navigate to the MongoDB Atlas website and sign up for a free account. + + .. step:: Create a cluster + + Click "Build a Cluster" and select the free tier. + + .. step:: Configure network access + + Add your IP address to the IP Access List. +``` + +### 2. Ordered Lists + +Some procedures use simple numbered or lettered lists: + +```rst +Installation Steps +------------------ + +1. Download the MongoDB installer from the official website. + +2. Run the installer and follow the prompts. + +3. Verify the installation by running ``mongod --version``. +``` + +Or with letters: + +```rst +a. First step +b. Second step +c. Third step +``` + +### 3. YAML Steps Files + +MongoDB's build system converts YAML files to procedures: + +```yaml +title: Connect to MongoDB +steps: + - step: Import the MongoDB client + content: | + Import the MongoClient class from the pymongo package. + - step: Create a connection string + content: | + Define your connection string with your credentials. 
+``` + +The parser detects references to these YAML files and extracts the steps. + +## Procedure Variations + +MongoDB documentation represents the same logical procedure differently for different contexts (Node.js vs. Python, Atlas CLI vs. drivers, macOS vs. Windows, etc.). The parser handles three mechanisms for variations: + +### 1. Composable Tutorials with Selected Content Blocks + +**Pattern:** A `.. composable-tutorial::` directive wraps a procedure and defines variation options. Within the procedure, `.. selected-content::` blocks provide different content for different selections. + +**Example:** + +```rst +Connect to Your Cluster +----------------------- + +.. composable-tutorial:: + :options: driver, atlas-cli + :defaults: driver=nodejs; atlas-cli=none + + .. procedure:: + + .. step:: Install dependencies + + .. selected-content:: + :selections: driver=nodejs + + Install the MongoDB Node.js driver: + + .. code-block:: bash + + npm install mongodb + + .. selected-content:: + :selections: driver=python + + Install the PyMongo driver: + + .. code-block:: bash + + pip install pymongo + + .. selected-content:: + :selections: atlas-cli=none + + No installation required for Atlas CLI. + + .. step:: Connect to the cluster + + .. selected-content:: + :selections: driver=nodejs + + .. code-block:: javascript + + const { MongoClient } = require('mongodb'); + const client = new MongoClient(uri); + + .. selected-content:: + :selections: driver=python + + .. code-block:: python + + from pymongo import MongoClient + client = MongoClient(uri) +``` + +**Parser Behavior:** +- Detects the composable tutorial and extracts options/defaults +- Parses selected-content blocks within steps +- Creates variations: `driver=nodejs`, `driver=python`, `atlas-cli=none` +- **Analysis:** Shows 1 unique procedure with 3 variations +- **Extraction:** Creates 1 file listing all 3 selections + +### 2. Tabs Within Steps + +**Pattern:** `.. 
tabs::` directives within procedure steps show different ways to accomplish the same task. + +**Example:** + +```rst +Procedure with Tabs +------------------- + +.. procedure:: + + .. step:: Connect to MongoDB + + Choose your programming language: + + .. tabs:: + + .. tab:: Node.js + :tabid: nodejs + + .. code-block:: javascript + + const { MongoClient } = require('mongodb'); + const client = new MongoClient(uri); + + .. tab:: Python + :tabid: python + + .. code-block:: python + + from pymongo import MongoClient + client = MongoClient(uri) + + .. tab:: Shell + :tabid: shell + + .. code-block:: bash + + mongosh "mongodb://localhost:27017" + + .. step:: Verify the connection + + Run a simple query to verify connectivity. +``` + +**Parser Behavior:** +- Detects tabs within the step content +- Extracts tab IDs: `nodejs`, `python`, `shell` +- Creates variations for each tab +- **Analysis:** Shows 1 unique procedure with 3 variations +- **Extraction:** Creates 1 file listing all 3 tab variations + +### 3. Tabs Containing Procedures + +**Pattern:** `.. tabs::` directives at the top level contain entirely different procedures for different platforms or contexts. + +**Example:** + +```rst +Installation Instructions +------------------------- + +.. tabs:: + + .. tab:: macOS + :tabid: macos + + .. procedure:: + + .. step:: Install Homebrew + + If you don't have Homebrew installed, run: + + .. code-block:: bash + + /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" + + .. step:: Install MongoDB + + .. code-block:: bash + + brew tap mongodb/brew + brew install mongodb-community + + .. step:: Start MongoDB + + .. code-block:: bash + + brew services start mongodb-community + + .. tab:: Ubuntu + :tabid: ubuntu + + .. procedure:: + + .. step:: Import the public key + + .. code-block:: bash + + wget -qO - https://www.mongodb.org/static/pgp/server-6.0.asc | sudo apt-key add - + + .. step:: Create a list file + + .. 
code-block:: bash + + echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-6.0.list + + .. step:: Update package database + + .. code-block:: bash + + sudo apt-get update + + .. step:: Install MongoDB + + .. code-block:: bash + + sudo apt-get install -y mongodb-org + + .. tab:: Windows + :tabid: windows + + .. procedure:: + + .. step:: Download the installer + + Navigate to the MongoDB Download Center and download the Windows installer. + + .. step:: Run the installer + + Double-click the downloaded .msi file and follow the installation wizard. + + .. step:: Configure MongoDB as a service + + During installation, select "Install MongoDB as a Service". +``` + +**Parser Behavior:** +- Detects tabs at the top level (before any procedure directive) +- Parses each tab's procedure separately +- Each procedure gets a unique content hash (different steps) +- All procedures share a `TabSet` reference for grouping +- **Analysis:** Shows 1 logical procedure with 3 appearances (macos, ubuntu, windows) +- **Extraction:** Creates 3 separate files, named `{heading}_{first-step-title}_{hash}.rst`: + - `installation-instructions_install-homebrew_87514e.rst` (appears in `macos` selection) + - `installation-instructions_import-the-public-key_87b686.rst` (appears in `ubuntu` selection) + - `installation-instructions_download-the-installer_1f0961.rst` (appears in `windows` selection) + +**Rationale:** Each platform has a completely different installation procedure with different steps, so they should be extracted as separate files. However, for analysis/reporting, they're grouped as one logical "Installation Instructions" procedure with platform variations. + +## Include Directive Handling + +MongoDB documentation uses `.. include::` directives to reuse content across files.
The parser handles includes with context-aware expansion: + +### Pattern 1: No Composable Tutorial + +If a file has NO composable tutorial, all includes are expanded globally before parsing: + +```rst +Simple Procedure +---------------- + +.. procedure:: + + .. step:: First step + + .. include:: /includes/common-setup.rst + + .. step:: Second step + + Do something else. +``` + +**Parser Behavior:** +- Expands all `.. include::` directives inline +- Then parses the expanded content + +### Pattern 2: Composable Tutorial with Selected Content in Main File + +If selected-content blocks are in the main file, includes within those blocks are expanded: + +```rst +.. composable-tutorial:: + :options: driver + :defaults: driver=nodejs + + .. procedure:: + + .. step:: Install dependencies + + .. selected-content:: + :selections: driver=nodejs + + .. include:: /includes/install-nodejs.rst + + .. selected-content:: + :selections: driver=python + + .. include:: /includes/install-python.rst +``` + +**Parser Behavior:** +- Detects selected-content blocks +- Expands includes within each block +- Preserves block boundaries + +### Pattern 3: Composable Tutorial with Includes Containing Selected Content + +If procedure steps include files that contain selected-content blocks: + +```rst +.. composable-tutorial:: + :options: driver, atlas-cli + :defaults: driver=nodejs; atlas-cli=none + + .. procedure:: + + .. step:: Install dependencies + + .. include:: /includes/install-deps.rst + + .. step:: Connect to cluster + + .. include:: /includes/connect.rst +``` + +Where `/includes/install-deps.rst` contains: + +```rst +.. selected-content:: + :selections: driver=nodejs + + npm install mongodb + +.. selected-content:: + :selections: driver=python + + pip install pymongo +``` + +**Parser Behavior:** +- When parsing step content, checks if it contains `.. 
include::` directives +- If no selected-content blocks have been found yet, expands the includes +- Re-parses the expanded content to detect selected-content blocks +- This ensures variations in included files are properly detected + +**Rationale:** MongoDB documentation uses composable tutorials inconsistently. Sometimes selected-content blocks are in the main file, sometimes in included files. The parser must handle both patterns. + +## Uniqueness and Grouping + +The parser uses two mechanisms to identify procedures: + +### 1. Heading (Title) + +The procedure's heading is the section title above the procedure. For example: + +```rst +Connect to Your Cluster +----------------------- + +.. procedure:: + ... +``` + +The heading is "Connect to Your Cluster". + +### 2. Content Hash + +The content hash is a SHA256 hash of: +- Step titles +- Step content (normalized) +- Variations (sorted for determinism) +- Sub-steps + +Two procedures with the same heading but different content will have different hashes. + +**Example:** + +```rst +# Procedure A +Install MongoDB +--------------- +.. procedure:: + .. step:: Download installer + .. step:: Run installer + +# Procedure B +Install MongoDB +--------------- +.. procedure:: + .. step:: Install via package manager + .. step:: Start the service +``` + +Both have heading "Install MongoDB" but different content hashes because the steps are different. + +### Grouping Logic + +**For Analysis/Reporting:** +- Procedures are grouped by heading only +- Shows "N unique procedures under this heading" +- Displays all variations for each unique procedure + +**For Extraction:** +- Procedures are grouped by heading + content hash +- Each unique procedure (by content hash) is extracted to a separate file +- Filename includes a 6-character hash suffix to prevent collisions + +## Analysis vs. Extraction Semantics + +The parser has different behavior for analysis vs. 
extraction: + +### Analysis (analyze procedures command) + +**Goal:** Give an overview of procedure structure and variations in the documentation. + +**Behavior:** +- Groups procedures by heading +- Shows unique procedure count and total appearances +- Lists all variations for each procedure +- **Tabs containing procedures:** Groups all procedures from the same tab set as one logical procedure + +**Example Output:** + +``` +1. Installation Instructions + Unique procedures: 1 + Total appearances: 3 + + Appears in 3 selections: + - macos + - ubuntu + - windows +``` + +### Extraction (extract procedures command) + +**Goal:** Extract each unique procedure to a separate file for reuse. + +**Behavior:** +- Creates one file per unique procedure (by content hash) +- Filename includes heading + first step + hash +- Each file lists which selections it appears in +- **Tabs containing procedures:** Creates separate files for each tab's procedure + +**Example Output:** + +``` +Would write: output/installation-instructions_install-homebrew_87514e.rst + Appears in 1 selection: + - macos + +Would write: output/installation-instructions_import-the-public-key_87b686.rst + Appears in 1 selection: + - ubuntu + +Would write: output/installation-instructions_download-the-installer_1f0961.rst + Appears in 1 selection: + - windows +``` + +**Rationale:** For analysis, we want to see that there's one logical "Installation Instructions" procedure with platform variations. For extraction, we want separate files because each platform has completely different steps. + +## Key Design Decisions + +### 1. Deterministic Ordering + +**Problem:** Go maps have randomized iteration order, which caused non-deterministic output (procedure counts varied between runs).
+ +**Solution:** All map iterations are sorted by key before processing: +- Tab IDs are sorted alphabetically +- Selected-content selections are sorted +- Variation lists are sorted +- Hash computation uses sorted keys + +**Impact:** Ensures consistent output across runs, critical for testing and CI/CD. + +### 2. Content Hashing for Uniqueness + +**Problem:** Need to detect when two procedures are identical vs. different, even if they have the same heading. + +**Solution:** Compute SHA256 hash of normalized step content, including: +- Step titles +- Step content (trimmed, normalized whitespace) +- Variations (sorted) +- Sub-steps (recursively hashed) + +**Impact:** Accurately detects duplicate procedures and prevents false grouping. + +### 3. Dual-Purpose TabSet Structure + +**Problem:** Tabs containing procedures need to be grouped for analysis but extracted separately. + +**Solution:** +- Each procedure has a `TabID` field (its specific tab) +- Each procedure has a `TabSet` reference (all procedures in the set) +- `GetProcedureVariations()` returns `TabID` for extraction, `TabSet.TabIDs` for grouping + +**Impact:** Same data structure supports both analysis and extraction semantics. + +### 4. Context-Aware Include Expansion + +**Problem:** Includes can appear at different levels (global, within selected-content, within steps) and need different handling. + +**Solution:** +- No composable tutorial → Expand all includes globally +- Composable tutorial with selected-content in main file → Expand includes within blocks +- Composable tutorial with includes in steps → Expand includes to detect selected-content blocks + +**Impact:** Handles all patterns of composable tutorial usage in MongoDB docs. + +## Common Patterns and Edge Cases + +### Pattern: Procedure with No Variations + +```rst +Simple Task +----------- + +.. procedure:: + + .. step:: Do this + .. 
step:: Do that +``` + +**Result:** +- Analysis: 1 unique procedure, 1 appearance, appears in 1 selection (empty string) +- Extraction: 1 file with no selection listed + +### Pattern: Multiple Procedures Under Same Heading + +```rst +Setup Instructions +------------------ + +.. procedure:: + .. step:: Install Node.js + .. step:: Install npm + +.. procedure:: + .. step:: Install Python + .. step:: Install pip +``` + +**Result:** +- Analysis: 2 unique procedures under "Setup Instructions" +- Extraction: 2 separate files (different content hashes) + +### Pattern: Nested Tabs (Tabs Within Tabs) + +```rst +.. tabs:: + .. tab:: Platform + .. tabs:: + .. tab:: macOS + .. tab:: Windows +``` + +**Current Behavior:** Only the outer tabs are detected. Inner tabs are treated as regular content. + +**Rationale:** Nested tabs are rare in MongoDB docs and add significant complexity. Can be added if needed. + +### Pattern: Composable Tutorial with Tabs Within Steps + +```rst +.. composable-tutorial:: + :options: driver + :defaults: driver=nodejs + + .. procedure:: + .. step:: Connect + .. tabs:: + .. tab:: Async + :tabid: async + .. tab:: Sync + :tabid: sync +``` + +**Result:** +- Variations are combined: `driver=nodejs; async`, `driver=nodejs; sync` +- Analysis: 1 unique procedure with multiple variations +- Extraction: 1 file listing all combined variations + +### Edge Case: Empty Procedure + +```rst +.. procedure:: +``` + +**Result:** Skipped (no steps to extract) + +### Edge Case: Procedure with Only Sub-steps + +```rst +.. procedure:: + .. step:: Main step + .. procedure:: + .. step:: Sub-step 1 + .. step:: Sub-step 2 +``` + +**Result:** +- Main procedure has 1 step with sub-procedure +- `HasSubSteps` flag is set to true +- Sub-procedure is not extracted separately (only top-level procedures are extracted) + +## Testing Strategy + +The parser has comprehensive test coverage: + +1. 
**Unit tests** (`internal/rst/procedure_parser_test.go`): + - Test each procedure format (directive, ordered list, YAML) + - Test each variation mechanism (composable tutorials, tabs) + - Test include expansion + - Test content hashing determinism + +2. **Integration tests** (`commands/analyze/procedures/procedures_test.go`): + - Test analysis output format + - Test grouping logic + - Test deterministic ordering + +3. **Extraction tests** (`commands/extract/procedures/procedures_test.go`): + - Test file generation + - Test filename uniqueness + - Test dry-run mode + - Test selection filtering + +4. **Test fixtures** (`testdata/input-files/source/`): + - `procedure-test.rst`: Comprehensive test file with all patterns + - `procedure-with-includes.rst`: Tests include expansion + - `tabs-with-procedures.rst`: Tests tabs containing procedures + +## Future Enhancements + +Potential improvements to consider: + +1. **Nested tabs support**: Handle tabs within tabs +2. **Procedure validation**: Detect malformed procedures and warn users +3. **Cross-file procedure tracking**: Detect when the same procedure appears in multiple files +4. **Variation conflict detection**: Warn when variations have conflicting content +5. **Performance optimization**: Cache parsed procedures for large documentation sets + +## Maintenance Guidelines + +When modifying the parser: + +1. **Update this document** when business logic changes +2. **Update package-level comment** in `procedure_parser.go` +3. **Add test cases** for new patterns or edge cases +4. **Run determinism tests** to ensure consistent output +5. **Check both analysis and extraction** to ensure changes work for both use cases + +## Questions? + +If you have questions about procedure parsing logic, please: + +1. Check this document first +2. Review the package-level comment in `procedure_parser.go` +3. 
Look at test cases in `*_test.go` files for examples diff --git a/audit-cli/go.mod b/audit-cli/go.mod index 788992d..d32a943 100644 --- a/audit-cli/go.mod +++ b/audit-cli/go.mod @@ -10,4 +10,5 @@ require ( require ( github.com/inconshreveable/mousetrap v1.1.0 // indirect github.com/spf13/pflag v1.0.10 // indirect + gopkg.in/yaml.v3 v3.0.1 // indirect ) diff --git a/audit-cli/go.sum b/audit-cli/go.sum index ce2736b..d072fa3 100644 --- a/audit-cli/go.sum +++ b/audit-cli/go.sum @@ -10,4 +10,5 @@ github.com/spf13/pflag v1.0.9/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An github.com/spf13/pflag v1.0.10 h1:4EBh2KAYBwaONj6b2Ye1GiHfwjqyROoF4RwYO+vPwFk= github.com/spf13/pflag v1.0.10/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg= gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= +gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= diff --git a/audit-cli/internal/rst/directive_regex.go b/audit-cli/internal/rst/directive_regex.go index 9fc7ac0..cc063ac 100644 --- a/audit-cli/internal/rst/directive_regex.go +++ b/audit-cli/internal/rst/directive_regex.go @@ -31,3 +31,27 @@ var OutputDirectiveRegex = regexp.MustCompile(`^\.\.\s+output::\s+(.+)$`) // Example: .. toctree:: var ToctreeDirectiveRegex = regexp.MustCompile(`^\.\.\s+toctree::`) +// ProcedureDirectiveRegex matches .. procedure:: directives in RST files. +// Example: .. procedure:: +var ProcedureDirectiveRegex = regexp.MustCompile(`^\.\.\s+procedure::`) + +// StepDirectiveRegex matches .. step:: directives in RST files. +// Example: .. step:: Connect to the database +var StepDirectiveRegex = regexp.MustCompile(`^\.\.\s+step::\s*(.*)$`) + +// TabsDirectiveRegex matches .. tabs:: directives in RST files. +// Example: .. tabs:: +var TabsDirectiveRegex = regexp.MustCompile(`^\.\.\s+tabs::`) + +// TabDirectiveRegex matches .. tab:: directives in RST files. 
+// Example: .. tab:: Python +var TabDirectiveRegex = regexp.MustCompile(`^\.\.\s+tab::\s*(.*)$`) + +// ComposableTutorialDirectiveRegex matches .. composable-tutorial:: directives in RST files. +// Example: .. composable-tutorial:: +var ComposableTutorialDirectiveRegex = regexp.MustCompile(`^\.\.\s+composable-tutorial::`) + +// SelectedContentDirectiveRegex matches .. selected-content:: directives in RST files. +// Example: .. selected-content:: +var SelectedContentDirectiveRegex = regexp.MustCompile(`^\.\.\s+selected-content::`) + diff --git a/audit-cli/internal/rst/get_procedure_variations.go b/audit-cli/internal/rst/get_procedure_variations.go new file mode 100644 index 0000000..d7814e5 --- /dev/null +++ b/audit-cli/internal/rst/get_procedure_variations.go @@ -0,0 +1,440 @@ +// Package rst provides parsing and analysis of reStructuredText (RST) procedures +// from MongoDB documentation. +package rst + +import ( + "fmt" + "sort" + "strings" +) + +// GetProcedureVariations returns all variations of a procedure based on tabs and selected content. 
+// +// Parameters: +// - procedure: The procedure to analyze +// +// Returns: +// - []string: List of variation identifiers (e.g., "python", "nodejs", "drivers-tab") +func GetProcedureVariations(procedure Procedure) []string { + // If this procedure has a composable tutorial, return those selections + if procedure.ComposableTutorial != nil { + return procedure.ComposableTutorial.Selections + } + + // If this procedure has a specific tab ID, return only that tab ID + // This is for individual procedures extracted from a tab set + if procedure.TabID != "" { + return []string{procedure.TabID} + } + + // If this procedure is part of a tab set (for grouping/analysis), + // return all tab IDs in the set + if procedure.TabSet != nil { + return procedure.TabSet.TabIDs + } + + // Otherwise, collect variations from tabs within steps + variations := []string{} + variationSet := make(map[string]bool) + + for _, step := range procedure.Steps { + for _, variation := range step.Variations { + if variation.Type == TabVariation { + for _, option := range variation.Options { + variationSet[option] = true + } + } + } + } + + // Convert set to slice and sort for deterministic order + for variation := range variationSet { + variations = append(variations, variation) + } + sort.Strings(variations) + + // If no variations found, return a single empty variation + if len(variations) == 0 { + return []string{""} + } + + return variations +} + +// parseComposableTutorial parses a .. composable-tutorial:: directive +func parseComposableTutorial(lines []string, startIdx int, title string, filePath string) (*ComposableTutorial, int) { + tutorial := &ComposableTutorial{ + Title: title, + Options: []string{}, + Defaults: []string{}, + Selections: []string{}, + GeneralContent: []string{}, + LineNum: startIdx + 1, + FilePath: filePath, + } + + i := startIdx + 1 // Skip the .. 
composable-tutorial:: line + baseIndent := -1 + + // Track selected-content blocks to check for procedures inside them + var selectedContentBlocks []SelectedContent + + // Parse options and procedure + for i < len(lines) { + line := lines[i] + trimmedLine := strings.TrimSpace(line) + + if trimmedLine == "" { + i++ + continue + } + + indent := getIndentLevel(line) + if baseIndent == -1 && indent > 0 { + baseIndent = indent + } + + // Check for options + if matches := optionRegex.FindStringSubmatch(line); len(matches) > 1 { + if matches[1] == "options" { + tutorial.Options = strings.Split(strings.TrimSpace(matches[2]), ",") + for j := range tutorial.Options { + tutorial.Options[j] = strings.TrimSpace(tutorial.Options[j]) + } + } else if matches[1] == "defaults" { + tutorial.Defaults = strings.Split(strings.TrimSpace(matches[2]), ",") + for j := range tutorial.Defaults { + tutorial.Defaults[j] = strings.TrimSpace(tutorial.Defaults[j]) + } + } + i++ + continue + } + + // Check for selected-content directive + if SelectedContentDirectiveRegex.MatchString(trimmedLine) { + selectedContent, endLine := parseSelectedContent(lines, i) + selectedContentBlocks = append(selectedContentBlocks, selectedContent) + i = endLine + 1 + continue + } + + // Check for procedure directive within composable tutorial + // NOTE: We only capture tutorial-level procedures if we haven't found any selected-content blocks yet. + // If we have selected-content blocks, all procedures should be extracted from those blocks, + // not from the tutorial level (which may contain expanded includes from multiple selections). 
+ if ProcedureDirectiveRegex.MatchString(trimmedLine) && len(selectedContentBlocks) == 0 { + procedure, endLine := parseProcedureDirectiveFromLines(lines, i, title, tutorial.FilePath) + tutorial.Procedure = &procedure + + // Extract all unique selections from the procedure's steps + selectionsMap := make(map[string]bool) + for _, step := range procedure.Steps { + for _, variation := range step.Variations { + if variation.Type == SelectedContentVariation { + for _, option := range variation.Options { + selectionsMap[option] = true + } + } + } + } + + // Convert to slice and sort for deterministic order + var selections []string + for selection := range selectionsMap { + selections = append(selections, selection) + } + sort.Strings(selections) + tutorial.Selections = selections + + i = endLine + 1 + continue + } + + // If we've dedented, we're done + if baseIndent > 0 && indent < baseIndent && trimmedLine != "" { + break + } + + if indent == 0 && trimmedLine != "" { + break + } + + // This is general content + tutorial.GeneralContent = append(tutorial.GeneralContent, line) + i++ + } + + // Store the selected-content blocks in the tutorial + tutorial.SelectedContentBlocks = selectedContentBlocks + + // NOTE: We do NOT set tutorial.Procedure here even if we find procedures in selected-content blocks. + // The extractProceduresFromComposableTutorial function will extract ALL procedures from all + // selected-content blocks and return them as separate Procedure objects. + + return tutorial, i - 1 +} + +// extractProceduresFromComposableTutorial extracts ALL procedures from a composable tutorial. +// This function finds all procedures across all selected-content blocks and returns them +// as separate Procedure objects, each with their own list of variations (selections). +// +// Key insight: A composable tutorial can contain MULTIPLE procedures, and each selected-content +// block can have DIFFERENT procedures. 
We need to track which procedures appear in which selections. +func extractProceduresFromComposableTutorial(tutorial *ComposableTutorial, startLine int) []Procedure { + var procedures []Procedure + + // If there's a procedure at the tutorial level, use it + if tutorial.Procedure != nil { + tutorial.Procedure.LineNum = startLine + 1 + tutorial.Procedure.ComposableTutorial = tutorial + procedures = append(procedures, *tutorial.Procedure) + return procedures + } + + // Map to track procedures by a unique identifier + // Key: unique procedure identifier (heading + "::" + content hash), Value: procedure info + type ProcedureInfo struct { + Procedure *Procedure + Selections []string + } + proceduresMap := make(map[string]*ProcedureInfo) + + // Extract procedures from each selected-content block + for _, sc := range tutorial.SelectedContentBlocks { + selectionKey := strings.Join(sc.Selections, ", ") + contentLines := strings.Split(sc.Content, "\n") + + // Expand includes in the selected-content block + // This is necessary because the selected-content block may contain include directives + // that reference files with procedure directives + expandedLines, err := expandIncludesInLines(tutorial.FilePath, contentLines) + if err != nil { + // Fall back to unexpanded lines if expansion fails + expandedLines = contentLines + } + + contentLines = expandedLines + + // Find ALL procedures in this selected-content block + // Track the most recent heading to use as the procedure title + currentHeading := tutorial.Title // Start with the composable tutorial's title + j := 0 + for j < len(contentLines) { + trimmedLine := strings.TrimSpace(contentLines[j]) + + // Check for headings (look ahead for underline) + if j+1 < len(contentLines) { + nextLine := strings.TrimSpace(contentLines[j+1]) + if isHeadingUnderline(nextLine) && len(nextLine) >= len(trimmedLine) { + // Skip empty headings and generic headings that don't provide meaningful context + headingLower := strings.ToLower(trimmedLine) 
+ if trimmedLine != "" && headingLower != "procedure" && headingLower != "overview" && headingLower != "steps" { + currentHeading = trimmedLine + } + j += 2 // Skip heading and underline + continue + } + } + + if ProcedureDirectiveRegex.MatchString(trimmedLine) { + // Parse the procedure and set its title to the most recent heading + procedure, endLine := parseProcedureDirectiveFromLines(contentLines, j, currentHeading, tutorial.FilePath) + procedure.LineNum = startLine + 1 + + // Create a unique identifier based on the procedure's actual content + // This allows us to detect when the same procedure appears in multiple selections + contentHash := computeProcedureContentHash(&procedure) + + // Use heading + content hash as the key + // This groups procedures with identical content but keeps them separate if content differs + procedureID := currentHeading + "::" + contentHash + + // Track this procedure and which selection it appears in + if proceduresMap[procedureID] == nil { + proceduresMap[procedureID] = &ProcedureInfo{ + Procedure: &procedure, + Selections: []string{}, + } + } + proceduresMap[procedureID].Selections = append(proceduresMap[procedureID].Selections, selectionKey) + + j = endLine + 1 + } else { + j++ + } + } + } + + // Convert the map to a list of procedures with their selections + // Sort the keys to ensure deterministic order + var procedureIDs []string + for id := range proceduresMap { + procedureIDs = append(procedureIDs, id) + } + sort.Strings(procedureIDs) + + for _, id := range procedureIDs { + info := proceduresMap[id] + // Create a new composable tutorial for this procedure with its specific selections + procTutorial := &ComposableTutorial{ + Options: tutorial.Options, + Defaults: tutorial.Defaults, + Selections: info.Selections, + GeneralContent: tutorial.GeneralContent, + LineNum: tutorial.LineNum, + } + + info.Procedure.ComposableTutorial = procTutorial + procedures = append(procedures, *info.Procedure) + } + + return procedures +} + +// 
parseSelectedContent parses a .. selected-content:: directive +func parseSelectedContent(lines []string, startIdx int) (SelectedContent, int) { + selectedContent := SelectedContent{ + Selections: []string{}, + LineNum: startIdx + 1, + } + + i := startIdx + 1 // Skip the .. selected-content:: line + baseIndent := -1 + var contentLines []string + + for i < len(lines) { + line := lines[i] + trimmedLine := strings.TrimSpace(line) + + if trimmedLine == "" { + contentLines = append(contentLines, "") + i++ + continue + } + + indent := getIndentLevel(line) + if baseIndent == -1 && indent > 0 { + baseIndent = indent + } + + // Check for :selections: option + if matches := optionRegex.FindStringSubmatch(line); len(matches) > 1 { + if matches[1] == "selections" { + selectedContent.Selections = strings.Split(strings.TrimSpace(matches[2]), ",") + for j := range selectedContent.Selections { + selectedContent.Selections[j] = strings.TrimSpace(selectedContent.Selections[j]) + } + } + i++ + continue + } + + // Check for next selected-content or step directive + if SelectedContentDirectiveRegex.MatchString(trimmedLine) || StepDirectiveRegex.MatchString(trimmedLine) { + break + } + + // If we've dedented, we're done + if baseIndent > 0 && indent < baseIndent && trimmedLine != "" { + break + } + + if indent == 0 && trimmedLine != "" { + break + } + + // Add content line + contentLines = append(contentLines, line) + i++ + } + + // Normalize indentation before storing + rawContent := strings.Join(contentLines, "\n") + selectedContent.Content = normalizeIndentation(rawContent) + + return selectedContent, i - 1 +} + +// FormatProcedureForVariation formats a procedure for a specific variation. +// +// This function interpolates the general content with the selection-specific content +// to produce the complete procedure as it would be rendered for that variation. 
+// +// Parameters: +// - procedure: The procedure to format +// - variation: The variation identifier (e.g., "python", "nodejs", "driver, nodejs") +// +// Returns: +// - string: The formatted procedure content in RST format +// - error: Any error encountered during formatting +func FormatProcedureForVariation(procedure Procedure, variation string) (string, error) { + var output strings.Builder + + // Write procedure header if it's a directive + if procedure.Type == ProcedureDirective { + output.WriteString(".. procedure::\n") + // Sort option keys so the output is deterministic (Go map iteration order is random) + optionKeys := make([]string, 0, len(procedure.Options)) + for key := range procedure.Options { + optionKeys = append(optionKeys, key) + } + sort.Strings(optionKeys) + for _, key := range optionKeys { + output.WriteString(fmt.Sprintf(" :%s: %s\n", key, procedure.Options[key])) + } + output.WriteString("\n") + } + + // Write each step + for i, step := range procedure.Steps { + if procedure.Type == ProcedureDirective { + output.WriteString(fmt.Sprintf(" .. step:: %s\n\n", step.Title)) + } else if procedure.Type == OrderedList { + // For ordered lists, preserve the list marker + output.WriteString(fmt.Sprintf("%d. %s\n\n", i+1, step.Title)) + } + + // Write step content, filtering for the specific variation + content := filterContentForVariation(step, variation) + + // Indent content for procedure directive + if procedure.Type == ProcedureDirective { + lines := strings.Split(content, "\n") + for _, line := range lines { + if line != "" { + output.WriteString(" " + line + "\n") + } else { + output.WriteString("\n") + } + } + } else { + output.WriteString(content) + } + + output.WriteString("\n") + } + + return output.String(), nil +} + +// filterContentForVariation filters step content to only include the specified variation +func filterContentForVariation(step Step, variation string) string { + var result strings.Builder + + // Start with general content (content that's not in variations) + if step.Content != "" { + result.WriteString(step.Content) + } + + // If no variations, return all content + if len(step.Variations) == 0 { + return result.String() + } + + // Add variation-specific content + for _, v := range 
step.Variations { + if content, ok := v.Content[variation]; ok { + if result.Len() > 0 { + result.WriteString("\n\n") + } + result.WriteString(content) + } + } + + return result.String() +} diff --git a/audit-cli/internal/rst/get_procedure_variations_test.go b/audit-cli/internal/rst/get_procedure_variations_test.go new file mode 100644 index 0000000..cda307c --- /dev/null +++ b/audit-cli/internal/rst/get_procedure_variations_test.go @@ -0,0 +1,123 @@ +package rst + +import ( + "testing" +) + +func TestGetProcedureVariations(t *testing.T) { + testFile := "../../testdata/input-files/source/procedure-test.rst" + + procedures, err := ParseProceduresWithOptions(testFile, false) + if err != nil { + t.Fatalf("ParseProceduresWithOptions failed: %v", err) + } + + // Find the procedure with tabs (should have 3 variations) + var tabProcedure *Procedure + for i := range procedures { + if procedures[i].Title == "Procedure with Tabs" { + tabProcedure = &procedures[i] + break + } + } + + if tabProcedure == nil { + t.Fatal("Could not find 'Procedure with Tabs'") + } + + variations := GetProcedureVariations(*tabProcedure) + if len(variations) != 3 { + t.Errorf("Expected 3 variations for tabbed procedure, got %d: %v", len(variations), variations) + } + + // Verify variations contain expected tabids + expectedTabids := map[string]bool{"shell": true, "nodejs": true, "python": true} + for _, variation := range variations { + if !expectedTabids[variation] { + t.Errorf("Unexpected variation: %s", variation) + } + } + + t.Logf("Found %d variations: %v", len(variations), variations) +} + +func TestParseProceduresWithExpandIncludes(t *testing.T) { + testFile := "../../testdata/input-files/source/procedure-with-includes.rst" + + // With expanding includes - should detect selected-content blocks in included files + proceduresExpand, err := ParseProceduresWithOptions(testFile, true) + if err != nil { + t.Fatalf("ParseProceduresWithOptions with expand failed: %v", err) + } + + // Should find 1 
unique procedure + if len(proceduresExpand) != 1 { + // Fatalf, not Errorf: the code below indexes proceduresExpand[0] + t.Fatalf("With expand: expected 1 procedure, got %d", len(proceduresExpand)) + } + + // Should detect 3 variations from the selected-content blocks in the included files + if len(proceduresExpand) > 0 { + variations := GetProcedureVariations(proceduresExpand[0]) + if len(variations) != 3 { + t.Errorf("With expand: expected 3 variations, got %d: %v", len(variations), variations) + } + + // Verify expected selections + expectedSelections := map[string]bool{ + "driver, nodejs": true, + "driver, python": true, + "atlas-cli, none": true, + } + for _, variation := range variations { + if !expectedSelections[variation] { + t.Errorf("Unexpected variation: %s", variation) + } + } + } + + t.Logf("With expand: %d procedures with %d variations", + len(proceduresExpand), len(GetProcedureVariations(proceduresExpand[0]))) +} + +func TestParseComposableTutorial(t *testing.T) { + testFile := "../../testdata/input-files/source/procedure-test.rst" + + procedures, err := ParseProceduresWithOptions(testFile, false) + if err != nil { + t.Fatalf("ParseProceduresWithOptions failed: %v", err) + } + + // Find the composable tutorial + var composableProc *Procedure + for i := range procedures { + if procedures[i].Title == "Composable Tutorial Example" { + composableProc = &procedures[i] + break + } + } + + if composableProc == nil { + t.Fatal("Could not find 'Composable Tutorial Example'") + } + + // Verify it has variations + variations := GetProcedureVariations(*composableProc) + if len(variations) != 3 { + t.Errorf("Expected 3 variations, got %d: %v", len(variations), variations) + } + + // Verify expected selections + expectedSelections := map[string]bool{ + "driver, nodejs": true, + "driver, python": true, + "atlas-cli, none": true, + } + + for _, variation := range variations { + if !expectedSelections[variation] { + t.Errorf("Unexpected variation: %s", variation) + } + } + + t.Logf("Composable tutorial parsed correctly with %d 
variations", len(variations)) +} diff --git a/audit-cli/internal/rst/parse_procedures.go b/audit-cli/internal/rst/parse_procedures.go new file mode 100644 index 0000000..a992cac --- /dev/null +++ b/audit-cli/internal/rst/parse_procedures.go @@ -0,0 +1,1175 @@ +// Package rst provides parsing and analysis of reStructuredText (RST) procedures +// from MongoDB documentation. +// +// # What is a Procedure? +// +// A procedure is a set of sequential steps that guide users through a task. In MongoDB +// documentation, procedures can be implemented in several formats: +// +// 1. Procedure Directive: Using .. procedure:: and .. step:: directives +// 2. Ordered Lists: Using numbered or lettered lists (1., 2., 3. or a., b., c.) +// 3. YAML Steps Files: Using .yaml files with a steps: array (converted to procedures during build) +// +// # Procedure Variations +// +// MongoDB documentation uses procedures inconsistently across different contexts (drivers, +// deployment methods, platforms, etc.). This parser handles three mechanisms for representing +// procedure variations: +// +// 1. Composable Tutorials with Selected Content Blocks +// +// A composable tutorial wraps a procedure and defines variations using selected-content blocks: +// +// .. composable-tutorial:: +// :options: driver, atlas-cli +// :defaults: driver=nodejs; atlas-cli=none +// +// .. procedure:: +// .. step:: Install dependencies +// .. selected-content:: +// :selections: driver=nodejs +// npm install mongodb +// .. selected-content:: +// :selections: driver=python +// pip install pymongo +// +// This creates variations like "driver=nodejs" and "driver=python" with different content +// for the same logical step. +// +// 2. Tabs Within Steps +// +// Tabs can appear within procedure steps to show different ways to accomplish the same task: +// +// .. procedure:: +// .. step:: Connect to MongoDB +// .. tabs:: +// .. tab:: Node.js +// :tabid: nodejs +// const client = new MongoClient(uri); +// .. 
tab:: Python +// :tabid: python +// client = MongoClient(uri) +// +// This creates variations "nodejs" and "python" for the same procedure. +// +// 3. Tabs Containing Procedures +// +// Tabs can contain entirely different procedures for different platforms/contexts: +// +// Installation Instructions +// -------------------------- +// .. tabs:: +// .. tab:: macOS +// :tabid: macos +// .. procedure:: +// .. step:: Install Homebrew +// .. tab:: Windows +// :tabid: windows +// .. procedure:: +// .. step:: Download the installer +// +// This creates separate procedures that are grouped for analysis but extracted separately. +// +// # Include Directive Expansion +// +// The parser handles .. include:: directives with special logic: +// +// - If a file has NO composable tutorial: Expands all includes globally before parsing +// - If a file HAS a composable tutorial: Expands includes within selected-content blocks +// and within procedure steps to detect selected-content blocks in included files +// +// This ensures that variations defined in included files are properly detected. +// +// # Uniqueness and Grouping +// +// Procedures are identified by their heading (title) and content hash. The content hash +// includes step titles, content, and variations to detect when procedures are identical +// vs. different. +// +// For Analysis/Reporting: +// - Procedures with the same TabSet are grouped as one logical procedure +// - Procedures with the same ComposableTutorial selections are grouped together +// - Shows "1 unique procedure with N variations" +// +// For Extraction: +// - Each unique procedure (by content hash) is extracted to a separate file +// - Tabs containing procedures: Each tab's procedure is extracted separately +// - Composable tutorials: One file per unique procedure, listing all selections +// - Tabs within steps: One file listing all tab variations +// +// # Key Design Decisions +// +// 1. 
Deterministic Ordering: All map iterations are sorted to ensure consistent output +// 2. Content Hashing: SHA256 hash of step content to detect identical procedures +// 3. Grouping Semantics: Same procedures grouped differently for analysis vs. extraction +// 4. Include Expansion: Context-aware expansion to detect variations in included files +// +// For detailed examples and edge cases, see docs/PROCEDURE_PARSING.md +package rst + +import ( + "crypto/sha256" + "encoding/hex" + "fmt" + "os" + "path/filepath" + "sort" + "strings" + + "gopkg.in/yaml.v3" +) + +// ParseProceduresWithOptions parses all procedures from an RST file with options. +// +// This function scans the file and extracts all procedures, whether they are +// implemented using .. procedure:: directives, ordered lists, or composable tutorials. +// +// Parameters: +// - filePath: Path to the RST file to parse +// - expandIncludes: If true, expands .. include:: directives inline to detect variations +// +// Returns: +// - []Procedure: Slice of all parsed procedures +// - error: Any error encountered during parsing +func ParseProceduresWithOptions(filePath string, expandIncludes bool) ([]Procedure, error) { + content, err := os.ReadFile(filePath) + if err != nil { + return nil, err + } + + lines := strings.Split(string(content), "\n") + + // Check if the file contains composable tutorials + hasComposableTutorial := false + for _, line := range lines { + if ComposableTutorialDirectiveRegex.MatchString(strings.TrimSpace(line)) { + hasComposableTutorial = true + break + } + } + + // If expandIncludes is true AND there are no composable tutorials, expand all include directives inline + // If there ARE composable tutorials, we DON'T expand includes globally because each selected-content + // block will expand its own includes to preserve the block boundaries + if expandIncludes && !hasComposableTutorial { + lines, err = expandIncludesInLines(filePath, lines) + if err != nil { + return nil, fmt.Errorf("failed 
to expand includes: %w", err) + } + } + + return parseProceduresFromLines(lines, filePath) +} + +// parseProceduresFromLines parses procedures from a slice of lines. +func parseProceduresFromLines(lines []string, filePath string) ([]Procedure, error) { + + var procedures []Procedure + var currentHeading string + + i := 0 + for i < len(lines) { + line := lines[i] + trimmedLine := strings.TrimSpace(line) + + // Track headings for procedure titles + if i+1 < len(lines) { + nextLine := strings.TrimSpace(lines[i+1]) + if isHeadingUnderline(nextLine) && len(nextLine) >= len(trimmedLine) { + // Skip empty headings and generic headings that don't provide meaningful context + headingLower := strings.ToLower(trimmedLine) + if trimmedLine != "" && headingLower != "procedure" && headingLower != "overview" && headingLower != "steps" { + currentHeading = trimmedLine + } + i += 2 // Skip heading and underline + continue + } + } + + // Check for composable tutorial directive + if ComposableTutorialDirectiveRegex.MatchString(trimmedLine) { + tutorial, endLine := parseComposableTutorial(lines, i, currentHeading, filePath) + if tutorial != nil { + // Extract ALL procedures from this composable tutorial + tutorialProcs := extractProceduresFromComposableTutorial(tutorial, i) + procedures = append(procedures, tutorialProcs...) + } + i = endLine + 1 + continue + } + + // Check for tabs directive at top level (tabs containing procedures) + if TabsDirectiveRegex.MatchString(trimmedLine) { + tabSet, endLine := parseTabSetWithProcedures(lines, i, currentHeading, filePath) + if tabSet != nil && len(tabSet.Procedures) > 0 { + // Extract procedures from the tab set + tabProcs := extractProceduresFromTabSet(tabSet) + procedures = append(procedures, tabProcs...) 
+ } + i = endLine + 1 + continue + } + + // Check for procedure directive + if ProcedureDirectiveRegex.MatchString(trimmedLine) { + procedure, endLine := parseProcedureDirectiveFromLines(lines, i, currentHeading, filePath) + procedure.LineNum = i + 1 + procedure.EndLineNum = endLine + 1 + procedures = append(procedures, procedure) + i = endLine + 1 + continue + } + + // Check for ordered list (potential procedure) + if isOrderedListStart(trimmedLine) { + procedure, endLine := parseOrderedListProcedure(lines, i, currentHeading) + if len(procedure.Steps) > 0 { + procedure.LineNum = i + 1 + procedure.EndLineNum = endLine + 1 + procedures = append(procedures, procedure) + } + i = endLine + 1 + continue + } + + i++ + } + + // Sort procedures by line number for deterministic order + sort.Slice(procedures, func(i, j int) bool { + return procedures[i].LineNum < procedures[j].LineNum + }) + + return procedures, nil +} + +// isHeading checks if the current line is part of a heading (checks next line for underline) +func isHeading(lines []string, idx int) bool { + if idx+1 >= len(lines) { + return false + } + nextLine := strings.TrimSpace(lines[idx+1]) + return isHeadingUnderline(nextLine) +} + +// isHeadingUnderline checks if a line is a heading underline +func isHeadingUnderline(line string) bool { + if len(line) == 0 { + return false + } + // RST headings are underlined with =, -, ~, ^, ", `, +, etc. 
+ firstChar := line[0] + underlineChars := "=-~`^\"'+*#" + if !strings.ContainsRune(underlineChars, rune(firstChar)) { + return false + } + // Check if entire line is the same character + for _, ch := range line { + if ch != rune(firstChar) { + return false + } + } + return true +} + +// computeProcedureContentHash generates a hash of the procedure's content +// to detect when procedures are identical across different selections +func computeProcedureContentHash(proc *Procedure) string { + var content strings.Builder + + // Include all step titles and content + for _, step := range proc.Steps { + content.WriteString(step.Title) + content.WriteString("|") + content.WriteString(step.Content) + content.WriteString("|") + + // Include variations + for _, variation := range step.Variations { + content.WriteString(string(variation.Type)) + content.WriteString("|") + for _, opt := range variation.Options { + content.WriteString(opt) + content.WriteString("|") + } + // Sort keys for deterministic hash + var keys []string + for key := range variation.Content { + keys = append(keys, key) + } + sort.Strings(keys) + for _, key := range keys { + content.WriteString(key) + content.WriteString(":") + content.WriteString(variation.Content[key]) + content.WriteString("|") + } + } + + // Include substeps + for _, substep := range step.SubSteps { + content.WriteString(substep.Title) + content.WriteString("|") + content.WriteString(substep.Content) + content.WriteString("|") + } + } + + // Compute SHA256 hash + hash := sha256.Sum256([]byte(content.String())) + return hex.EncodeToString(hash[:]) +} + +// isOrderedListStart checks if a line starts an ordered list +func isOrderedListStart(line string) bool { + return numberedListRegex.MatchString(line) || letteredListRegex.MatchString(line) +} + +// getIndentLevel returns the indentation level of a line +func getIndentLevel(line string) int { + count := 0 + for _, ch := range line { + if ch == ' ' { + count++ + } else if ch == '\t' { + 
count += 4 // Treat tab as 4 spaces + } else { + break + } + } + return count +} + +// expandIncludesInLines expands all .. include:: directives in the lines. +// +// This function recursively processes include directives, replacing them with +// the content of the included files. The included content is indented to match +// the indentation of the include directive. +// +// Special handling for YAML steps files: When a .yaml steps file is encountered, +// it's converted to a placeholder procedure directive so it can be detected as a procedure. +// +// Parameters: +// - filePath: Path to the file being parsed (for resolving relative includes) +// - lines: The lines to process +// +// Returns: +// - []string: Lines with includes expanded +// - error: Any error encountered during expansion +func expandIncludesInLines(filePath string, lines []string) ([]string, error) { + // Share one in-progress set across the whole recursion so circular includes are detected + return expandIncludesInLinesVisited(filePath, lines, make(map[string]bool)) +} + +// expandIncludesInLinesVisited does the work of expandIncludesInLines. The visited set holds +// the files currently being expanded and is shared by every recursive call. +func expandIncludesInLinesVisited(filePath string, lines []string, visited map[string]bool) ([]string, error) { + var result []string + + for i := 0; i < len(lines); i++ { + line := lines[i] + trimmedLine := strings.TrimSpace(line) + + // Check if this is an include directive + if matches := IncludeDirectiveRegex.FindStringSubmatch(trimmedLine); len(matches) > 1 { + includePath := strings.TrimSpace(matches[1]) + + // Resolve the include path + resolvedPath, err := ResolveIncludePath(filePath, includePath) + if err != nil { + // If we can't resolve the include, keep the directive as-is + result = append(result, line) + continue + } + + // Check for circular includes + if visited[resolvedPath] { + result = append(result, line) + continue + } + visited[resolvedPath] = true + + // Get the indentation of the include directive + indent := getIndentLevel(line) + + // Special handling for YAML steps files + // These are procedures defined in YAML format + if strings.HasSuffix(resolvedPath, ".yaml") && strings.Contains(resolvedPath, "steps-") { + // Parse the YAML steps file and convert to RST procedure format + yamlLines, err := 
parseYAMLStepsFile(resolvedPath, indent) + if err != nil { + // If parsing fails, skip this include + delete(visited, resolvedPath) + continue + } + result = append(result, yamlLines...) + delete(visited, resolvedPath) + continue + } + + // Read the included file + includeContent, err := os.ReadFile(resolvedPath) + if err != nil { + // If we can't read the file, keep the directive as-is + delete(visited, resolvedPath) + result = append(result, line) + continue + } + + // Split included content into lines + includeLines := strings.Split(string(includeContent), "\n") + + // Recursively expand includes in the included file, sharing the visited set + expandedLines, err := expandIncludesInLinesVisited(resolvedPath, includeLines, visited) + if err != nil { + // If expansion fails, use the original lines + expandedLines = includeLines + } + + // Add the included content with proper indentation + for _, includeLine := range expandedLines { + if strings.TrimSpace(includeLine) == "" { + result = append(result, "") + } else { + // Add the include directive's indentation to each line + result = append(result, strings.Repeat(" ", indent)+includeLine) + } + } + + delete(visited, resolvedPath) + } else { + result = append(result, line) + } + } + + return result, nil +} + +// parseYAMLStepsFile parses a YAML steps file and converts it to RST procedure format +func parseYAMLStepsFile(yamlPath string, indent int) ([]string, error) { + content, err := os.ReadFile(yamlPath) + if err != nil { + return nil, err + } + + // Split by YAML document separator (---) + docs := strings.Split(string(content), "\n---\n") + + var steps []YAMLStep + for _, doc := range docs { + if strings.TrimSpace(doc) == "" { + continue + } + + var step YAMLStep + if err := yaml.Unmarshal([]byte(doc), &step); err != nil { + // Skip malformed steps + continue + } + steps = append(steps, step) + } + + // Convert to RST format + var result []string + indentStr := strings.Repeat(" ", indent) + + result = append(result, indentStr+".. 
procedure::") + result = append(result, indentStr+" :style: normal") + result = append(result, "") + + for _, step := range steps { + result = append(result, indentStr+" .. step:: "+step.Title) + result = append(result, "") + + // Add pre-action content if present + if step.Pre != "" { + for _, line := range strings.Split(strings.TrimSpace(step.Pre), "\n") { + result = append(result, indentStr+" "+line) + } + result = append(result, "") + } + + // Add action content if present + if step.Action != nil { + // Action can be a map or a slice of maps + result = append(result, indentStr+" (Action content from YAML)") + result = append(result, "") + } + + // Add post-action content if present + if step.Post != "" { + for _, line := range strings.Split(strings.TrimSpace(step.Post), "\n") { + result = append(result, indentStr+" "+line) + } + result = append(result, "") + } + } + + return result, nil +} + +// extractStepsTitle extracts a title from a YAML steps filename +// Example: steps-run-mongodb-on-a-linux-distribution-systemd.yaml -> "Run mongodb on a linux distribution systemd" +func extractStepsTitle(yamlPath string) string { + basename := filepath.Base(yamlPath) + // Remove "steps-" prefix and ".yaml" suffix + basename = strings.TrimPrefix(basename, "steps-") + basename = strings.TrimSuffix(basename, ".yaml") + + // Convert hyphens to spaces and title case the first word + parts := strings.Split(basename, "-") + if len(parts) > 0 { + parts[0] = strings.Title(parts[0]) + return strings.Join(parts, " ") + } + return "Steps" +} + +// normalizeIndentation removes the base indentation from all lines +// This is used to normalize content before re-indenting it for output +func normalizeIndentation(content string) string { + lines := strings.Split(content, "\n") + if len(lines) == 0 { + return content + } + + // Find the minimum indentation (ignoring empty lines) + minIndent := -1 + for _, line := range lines { + if strings.TrimSpace(line) == "" { + continue + } + indent := getIndentLevel(line) + if 
minIndent == -1 || indent < minIndent { + minIndent = indent + } + } + + // If no indentation found, return as-is + if minIndent <= 0 { + return content + } + + // Remove the base indentation from all lines + var result []string + for _, line := range lines { + if strings.TrimSpace(line) == "" { + result = append(result, "") + } else if len(line) >= minIndent { + result = append(result, line[minIndent:]) + } else { + result = append(result, line) + } + } + + return strings.Join(result, "\n") +} + +// containsIncludeDirective checks if any line contains an include directive +func containsIncludeDirective(lines []string) bool { + for _, line := range lines { + trimmed := strings.TrimSpace(line) + if strings.HasPrefix(trimmed, ".. include::") { + return true + } + } + return false +} + +// parseProcedureDirectiveFromLines parses a .. procedure:: directive and its steps +func parseProcedureDirectiveFromLines(lines []string, startIdx int, title string, filePath string) (Procedure, int) { + procedure := Procedure{ + Type: ProcedureDirective, + Title: title, + Options: make(map[string]string), + Steps: []Step{}, + } + + i := startIdx + 1 // Skip the .. 
procedure:: line + baseIndent := -1 + + // Parse options and steps + for i < len(lines) { + line := lines[i] + trimmedLine := strings.TrimSpace(line) + + // Empty line + if trimmedLine == "" { + i++ + continue + } + + indent := getIndentLevel(line) + + // Set base indent from first non-empty line + if baseIndent == -1 && indent > 0 { + baseIndent = indent + } + + // Check for option + if matches := optionRegex.FindStringSubmatch(line); len(matches) > 1 { + procedure.Options[matches[1]] = strings.TrimSpace(matches[2]) + i++ + continue + } + + // Check for step directive + if matches := StepDirectiveRegex.FindStringSubmatch(trimmedLine); len(matches) > 0 { + step, endLine := parseStepDirectiveFromLines(lines, i, matches[1], filePath) + procedure.Steps = append(procedure.Steps, step) + i = endLine + 1 + continue + } + + // If we've dedented back to base level or beyond, procedure is done + if baseIndent > 0 && indent < baseIndent && trimmedLine != "" { + break + } + + // Check if line is not indented - end of procedure + if indent == 0 && trimmedLine != "" { + break + } + + i++ + } + + // Check for sub-steps + for _, step := range procedure.Steps { + if len(step.SubSteps) > 0 { + procedure.HasSubSteps = true + break + } + } + + return procedure, i - 1 +} + +// parseStepDirectiveFromLines parses a .. step:: directive +func parseStepDirectiveFromLines(lines []string, startIdx int, title string, filePath string) (Step, int) { + step := Step{ + Title: strings.TrimSpace(title), + Options: make(map[string]string), + LineNum: startIdx + 1, + Variations: []Variation{}, + SubSteps: []Step{}, + } + + i := startIdx + 1 // Skip the .. 
step:: line + baseIndent := -1 + var contentLines []string + var selectedContents []SelectedContent + + for i < len(lines) { + line := lines[i] + trimmedLine := strings.TrimSpace(line) + + // Empty line + if trimmedLine == "" { + contentLines = append(contentLines, "") + i++ + continue + } + + indent := getIndentLevel(line) + + // Set base indent from first non-empty line + if baseIndent == -1 && indent > 0 { + baseIndent = indent + } + + // Check for option + if matches := optionRegex.FindStringSubmatch(line); len(matches) > 1 { + step.Options[matches[1]] = strings.TrimSpace(matches[2]) + i++ + continue + } + + // Check for next step directive - we're done + if StepDirectiveRegex.MatchString(trimmedLine) { + break + } + + // Check for tabs directive + if TabsDirectiveRegex.MatchString(trimmedLine) { + variation, endLine := parseTabsVariation(lines, i) + step.Variations = append(step.Variations, variation) + // Don't add tabs content to contentLines - it's in the variation + i = endLine + 1 + continue + } + + // Check for selected-content directive + if SelectedContentDirectiveRegex.MatchString(trimmedLine) { + selectedContent, endLine := parseSelectedContent(lines, i) + selectedContents = append(selectedContents, selectedContent) + // Don't add selected-content to contentLines - it's tracked separately + i = endLine + 1 + continue + } + + // Check for ordered list (sub-steps) + if isOrderedListStart(trimmedLine) { + subSteps, endLine := parseOrderedListSteps(lines, i) + step.SubSteps = append(step.SubSteps, subSteps...) 
+ // Add the sub-steps to content as well + for j := i; j <= endLine; j++ { + contentLines = append(contentLines, lines[j]) + } + i = endLine + 1 + continue + } + + // If we've dedented significantly, we're done with this step + if baseIndent > 0 && indent < baseIndent && trimmedLine != "" { + break + } + + // Check if line is not indented - end of step + if indent == 0 && trimmedLine != "" { + break + } + + // Add content line (this is general content, not variation-specific) + contentLines = append(contentLines, line) + i++ + } + + // IMPORTANT: If we haven't found any selected-content blocks yet, but the content + // contains include directives, we need to expand them to check for selected-content + // blocks in the included files. This handles the case where composable tutorials + // have procedures with steps that include files containing selected-content blocks. + if len(selectedContents) == 0 && containsIncludeDirective(contentLines) { + expandedLines, err := expandIncludesInLines(filePath, contentLines) + if err == nil { + // Re-parse the expanded content to find selected-content blocks + j := 0 + for j < len(expandedLines) { + trimmedLine := strings.TrimSpace(expandedLines[j]) + + if SelectedContentDirectiveRegex.MatchString(trimmedLine) { + selectedContent, endLine := parseSelectedContent(expandedLines, j) + selectedContents = append(selectedContents, selectedContent) + j = endLine + 1 + continue + } + j++ + } + } + } + + // If we have selected-content blocks, create a variation from them + if len(selectedContents) > 0 { + variation := Variation{ + Type: SelectedContentVariation, + Options: []string{}, + Content: make(map[string]string), + } + + for _, sc := range selectedContents { + selectionKey := strings.Join(sc.Selections, ", ") + variation.Options = append(variation.Options, selectionKey) + variation.Content[selectionKey] = sc.Content + } + + step.Variations = append(step.Variations, variation) + } + + // Normalize indentation for general content + 
rawContent := strings.Join(contentLines, "\n") + step.Content = normalizeIndentation(rawContent) + return step, i - 1 +} + +// parseOrderedListProcedure parses an ordered list as a procedure +func parseOrderedListProcedure(lines []string, startIdx int, title string) (Procedure, int) { + procedure := Procedure{ + Type: OrderedList, + Title: title, + Steps: []Step{}, + } + + steps, endLine := parseOrderedListSteps(lines, startIdx) + procedure.Steps = steps + + return procedure, endLine +} + +// parseOrderedListSteps parses ordered list items as steps +func parseOrderedListSteps(lines []string, startIdx int) ([]Step, int) { + var steps []Step + i := startIdx + baseIndent := getIndentLevel(lines[i]) + + for i < len(lines) { + line := lines[i] + trimmedLine := strings.TrimSpace(line) + + // Empty line - might be between list items + if trimmedLine == "" { + i++ + continue + } + + indent := getIndentLevel(line) + + // Check if this is a list item at the same level + if indent == baseIndent && isOrderedListStart(trimmedLine) { + step, endLine := parseOrderedListItem(lines, i) + steps = append(steps, step) + i = endLine + 1 + continue + } + + // If we've dedented or hit a non-list line at base level, we're done + if indent <= baseIndent && !isOrderedListStart(trimmedLine) { + break + } + + i++ + } + + return steps, i - 1 +} + +// parseOrderedListItem parses a single ordered list item +func parseOrderedListItem(lines []string, startIdx int) (Step, int) { + line := lines[startIdx] + var title string + var contentLines []string + + // Extract the title from the list marker (don't add the line itself to content) + if matches := numberedListRegex.FindStringSubmatch(line); len(matches) > 3 { + title = strings.TrimSpace(matches[3]) + } else if matches := letteredListRegex.FindStringSubmatch(line); len(matches) > 3 { + title = strings.TrimSpace(matches[3]) + } + + baseIndent := getIndentLevel(line) + i := startIdx + 1 + + // The content indent should be greater than the list 
marker indent + contentIndent := -1 + + // Parse the content of this list item + for i < len(lines) { + currentLine := lines[i] + trimmedLine := strings.TrimSpace(currentLine) + + // Empty line - could be within the list item + if trimmedLine == "" { + contentLines = append(contentLines, "") + i++ + continue + } + + indent := getIndentLevel(currentLine) + + // Set content indent from first non-empty line after list marker + if contentIndent == -1 && indent > baseIndent { + contentIndent = indent + } + + // Check if this is the next list item at the same level + if indent == baseIndent && isOrderedListStart(trimmedLine) { + break + } + + // Check if we've dedented to or past the base indent (end of list item) + if indent <= baseIndent && trimmedLine != "" { + break + } + + // Check for RST directives at base level (end of list) + if indent == 0 && (strings.HasPrefix(trimmedLine, "..") || isHeading(lines, i)) { + break + } + + // Add content line + contentLines = append(contentLines, currentLine) + i++ + } + + step := Step{ + Title: title, + Content: strings.Join(contentLines, "\n"), + LineNum: startIdx + 1, + } + + return step, i - 1 +} + +// parseTabsVariation parses a .. tabs:: directive and its tab content +func parseTabsVariation(lines []string, startIdx int) (Variation, int) { + variation := Variation{ + Type: TabVariation, + Options: []string{}, + Content: make(map[string]string), + } + + i := startIdx + 1 // Skip the .. 
tabs:: line + baseIndent := -1 + + // Parse tabs options first + for i < len(lines) { + line := lines[i] + trimmedLine := strings.TrimSpace(line) + + if trimmedLine == "" { + i++ + continue + } + + indent := getIndentLevel(line) + if baseIndent == -1 && indent > 0 { + baseIndent = indent + } + + // Check for tabs options + if matches := optionRegex.FindStringSubmatch(line); len(matches) > 1 { + i++ + continue + } + + // Check for tab directive + if TabDirectiveRegex.MatchString(trimmedLine) { + tabid, content, endLine := parseTabContent(lines, i) + if tabid != "" { + variation.Options = append(variation.Options, tabid) + variation.Content[tabid] = content + } + i = endLine + 1 + continue + } + + // If we've dedented, we're done + if baseIndent > 0 && indent < baseIndent && trimmedLine != "" { + break + } + + if indent == 0 && trimmedLine != "" { + break + } + + i++ + } + + return variation, i - 1 +} + +// parseTabContent parses a single .. tab:: directive +func parseTabContent(lines []string, startIdx int) (string, string, int) { + var tabid string + var contentLines []string + + // Extract tabid from options + i := startIdx + 1 + baseIndent := -1 + + for i < len(lines) { + currentLine := lines[i] + trimmedCurrentLine := strings.TrimSpace(currentLine) + + if trimmedCurrentLine == "" { + contentLines = append(contentLines, "") + i++ + continue + } + + indent := getIndentLevel(currentLine) + if baseIndent == -1 && indent > 0 { + baseIndent = indent + } + + // Check for :tabid: option + if matches := optionRegex.FindStringSubmatch(currentLine); len(matches) > 1 { + if matches[1] == "tabid" { + tabid = strings.TrimSpace(matches[2]) + } + i++ + continue + } + + // Check for next tab directive + if TabDirectiveRegex.MatchString(trimmedCurrentLine) { + break + } + + // If we've dedented significantly, we're done + if baseIndent > 0 && indent < baseIndent && trimmedCurrentLine != "" { + break + } + + if indent == 0 && trimmedCurrentLine != "" { + break + } + + // Add 
content line + contentLines = append(contentLines, currentLine) + i++ + } + + // Normalize indentation before storing + rawContent := strings.Join(contentLines, "\n") + content := normalizeIndentation(rawContent) + return tabid, content, i - 1 +} + +// parseTabSetWithProcedures parses a top-level .. tabs:: directive that contains procedures. +// This is different from parseTabsVariation which handles tabs within steps. +func parseTabSetWithProcedures(lines []string, startIdx int, title string, filePath string) (*TabSet, int) { + tabSet := &TabSet{ + Title: title, + Tabs: make(map[string][]string), + TabIDs: []string{}, + Procedures: make(map[string]Procedure), + LineNum: startIdx + 1, + FilePath: filePath, + } + + i := startIdx + 1 // Skip the .. tabs:: line + baseIndent := -1 + + // Parse each tab + for i < len(lines) { + line := lines[i] + trimmedLine := strings.TrimSpace(line) + + if trimmedLine == "" { + i++ + continue + } + + indent := getIndentLevel(line) + if baseIndent == -1 && indent > 0 { + baseIndent = indent + } + + // Check for tabs options (skip them) + if matches := optionRegex.FindStringSubmatch(line); len(matches) > 1 { + i++ + continue + } + + // Check for tab directive + if TabDirectiveRegex.MatchString(trimmedLine) { + tabid, contentLines, endLine := parseTabContentLines(lines, i) + if tabid != "" { + tabSet.TabIDs = append(tabSet.TabIDs, tabid) + tabSet.Tabs[tabid] = contentLines + } + i = endLine + 1 + continue + } + + // If we've dedented, we're done + if baseIndent > 0 && indent < baseIndent && trimmedLine != "" { + break + } + + if indent == 0 && trimmedLine != "" { + break + } + + i++ + } + + // Now parse procedures from each tab's content + for _, tabid := range tabSet.TabIDs { + contentLines := tabSet.Tabs[tabid] + // Parse procedures from this tab's content + procedures, err := parseProceduresFromLines(contentLines, filePath) + if err == nil && len(procedures) > 0 { + // Take the first procedure found in this tab + // (typically there 
should only be one procedure per tab) + procedure := procedures[0] + procedure.Title = title // Use the heading as the title + tabSet.Procedures[tabid] = procedure + } + } + + return tabSet, i - 1 +} + +// parseTabContentLines parses a single .. tab:: directive and returns the content as lines. +// This is similar to parseTabContent but returns lines instead of normalized content. +func parseTabContentLines(lines []string, startIdx int) (string, []string, int) { + var tabid string + var contentLines []string + + // Extract tabid from options + i := startIdx + 1 + baseIndent := -1 + + for i < len(lines) { + currentLine := lines[i] + trimmedCurrentLine := strings.TrimSpace(currentLine) + + if trimmedCurrentLine == "" { + contentLines = append(contentLines, "") + i++ + continue + } + + indent := getIndentLevel(currentLine) + if baseIndent == -1 && indent > 0 { + baseIndent = indent + } + + // Check for :tabid: option + if matches := optionRegex.FindStringSubmatch(currentLine); len(matches) > 1 { + if matches[1] == "tabid" { + tabid = strings.TrimSpace(matches[2]) + } + i++ + continue + } + + // Check for next tab directive + if TabDirectiveRegex.MatchString(trimmedCurrentLine) { + break + } + + // If we've dedented significantly, we're done + if baseIndent > 0 && indent < baseIndent && trimmedCurrentLine != "" { + break + } + + if indent == 0 && trimmedCurrentLine != "" { + break + } + + // Add content line (preserve original indentation) + contentLines = append(contentLines, currentLine) + i++ + } + + return tabid, contentLines, i - 1 +} + +// extractProceduresFromTabSet extracts procedures from a tab set. +// Each tab's procedure is returned as a separate procedure, but they all share +// the same TabSet reference so they can be grouped for analysis/reporting. 
+func extractProceduresFromTabSet(tabSet *TabSet) []Procedure { + if len(tabSet.Procedures) == 0 { + return []Procedure{} + } + + // Sort tab IDs for deterministic order + sortedTabIDs := make([]string, len(tabSet.TabIDs)) + copy(sortedTabIDs, tabSet.TabIDs) + sort.Strings(sortedTabIDs) + + // Create a shared TabSetInfo that all procedures will reference + sharedTabSetInfo := &TabSetInfo{ + TabIDs: sortedTabIDs, + Procedures: tabSet.Procedures, + } + + // Extract each tab's procedure as a separate procedure + var procedures []Procedure + for _, tabid := range sortedTabIDs { + if proc, ok := tabSet.Procedures[tabid]; ok { + // Attach the shared TabSet reference for grouping + proc.TabSet = sharedTabSetInfo + // Set the specific tab ID for this procedure + proc.TabID = tabid + procedures = append(procedures, proc) + } + } + + return procedures +} diff --git a/audit-cli/internal/rst/parse_procedures_test.go b/audit-cli/internal/rst/parse_procedures_test.go new file mode 100644 index 0000000..36b422c --- /dev/null +++ b/audit-cli/internal/rst/parse_procedures_test.go @@ -0,0 +1,184 @@ +package rst + +import ( + "path/filepath" + "testing" +) + +func TestParseProceduresWithOptions(t *testing.T) { + testFile := "../../testdata/input-files/source/procedure-test.rst" + + procedures, err := ParseProceduresWithOptions(testFile, false) + if err != nil { + t.Fatalf("ParseProceduresWithOptions failed: %v", err) + } + + // Expected: 5 unique procedures + if len(procedures) != 5 { + t.Errorf("Expected 5 procedures, got %d", len(procedures)) + } + + // Verify each procedure has steps + for i, proc := range procedures { + if len(proc.Steps) == 0 { + t.Errorf("Procedure %d (%s) has no steps", i, proc.Title) + } + } + + t.Logf("Found %d procedures", len(procedures)) +} + +func TestParseProceduresDeterministic(t *testing.T) { + testFile := "../../testdata/input-files/source/procedure-test.rst" + + // Parse multiple times to ensure deterministic results + var allProcedures 
[][]Procedure + for i := 0; i < 5; i++ { + procedures, err := ParseProceduresWithOptions(testFile, false) + if err != nil { + t.Fatalf("ParseProceduresWithOptions failed on iteration %d: %v", i, err) + } + allProcedures = append(allProcedures, procedures) + } + + // Verify all runs produce the same count (fatal: the title check below indexes by position) + for i := 1; i < len(allProcedures); i++ { + if len(allProcedures[i]) != len(allProcedures[0]) { + t.Fatalf("Iteration %d: found %d procedures, want %d (non-deterministic!)", + i, len(allProcedures[i]), len(allProcedures[0])) + } + } + + // Verify procedure titles are in the same order + for i := 1; i < len(allProcedures); i++ { + for j := 0; j < len(allProcedures[0]); j++ { + if allProcedures[i][j].Title != allProcedures[0][j].Title { + t.Errorf("Iteration %d, procedure %d: title = %s, want %s (non-deterministic!)", + i, j, allProcedures[i][j].Title, allProcedures[0][j].Title) + } + } + } +} + +func TestComputeProcedureContentHash(t *testing.T) { + testFile := "../../testdata/input-files/source/procedure-test.rst" + + // Parse the file multiple times + var hashes []string + for i := 0; i < 5; i++ { + procedures, err := ParseProceduresWithOptions(testFile, false) + if err != nil { + t.Fatalf("ParseProceduresWithOptions failed on iteration %d: %v", i, err) + } + + if len(procedures) == 0 { + t.Fatalf("No procedures found on iteration %d", i) + } + hashes = append(hashes, computeProcedureContentHash(&procedures[0])) + } + + // Verify all hashes are identical (deterministic) + for i := 1; i < len(hashes); i++ { + if hashes[i] != hashes[0] { + t.Errorf("Iteration %d: hash = %s, want %s (non-deterministic!)", i, hashes[i], hashes[0]) + } + } + + t.Logf("Content hash is deterministic: %s", hashes[0]) +} + +func TestParseProcedureDirective(t *testing.T) { + testFile := "../../testdata/input-files/source/procedure-test.rst" + + procedures, err := ParseProceduresWithOptions(testFile, false) + if err != nil { + t.Fatalf("ParseProceduresWithOptions failed: %v", err) + } + + // Find a procedure directive (not 
ordered list) + var procedureDirective *Procedure + for i := range procedures { + if procedures[i].Title == "Simple Procedure with Steps" { + procedureDirective = &procedures[i] + break + } + } + + if procedureDirective == nil { + t.Fatal("Could not find 'Simple Procedure with Steps'") + } + + // Verify it has the expected number of steps + if len(procedureDirective.Steps) != 3 { + t.Errorf("Expected 3 steps, got %d", len(procedureDirective.Steps)) + } + + // Verify step titles + expectedTitles := []string{ + "Create a database connection", + "Insert a document", + "Close the connection", + } + + for i, expectedTitle := range expectedTitles { + if i >= len(procedureDirective.Steps) { + t.Errorf("Missing step %d", i) + continue + } + if procedureDirective.Steps[i].Title != expectedTitle { + t.Errorf("Step %d: title = %q, want %q", i, procedureDirective.Steps[i].Title, expectedTitle) + } + } + + t.Logf("Procedure directive parsed correctly with %d steps", len(procedureDirective.Steps)) +} + +func TestParseOrderedListProcedure(t *testing.T) { + testFile := "../../testdata/input-files/source/procedure-test.rst" + + procedures, err := ParseProceduresWithOptions(testFile, false) + if err != nil { + t.Fatalf("ParseProceduresWithOptions failed: %v", err) + } + + // Find the ordered list procedure + var orderedListProc *Procedure + for i := range procedures { + if procedures[i].Title == "Ordered List Procedure" { + orderedListProc = &procedures[i] + break + } + } + + if orderedListProc == nil { + t.Fatal("Could not find 'Ordered List Procedure'") + } + + // Verify it has the expected number of steps + if len(orderedListProc.Steps) != 4 { + t.Errorf("Expected 4 steps, got %d", len(orderedListProc.Steps)) + } + + t.Logf("Ordered list procedure parsed correctly with %d steps", len(orderedListProc.Steps)) +} + +func TestAbsolutePath(t *testing.T) { + // Test with relative path + relPath := "../../testdata/input-files/source/procedure-test.rst" + absPath, err := 
filepath.Abs(relPath) + if err != nil { + t.Fatalf("Failed to get absolute path: %v", err) + } + + // Parse with absolute path + procedures, err := ParseProceduresWithOptions(absPath, false) + if err != nil { + t.Fatalf("ParseProceduresWithOptions with absolute path failed: %v", err) + } + + if len(procedures) != 5 { + t.Errorf("Expected 5 procedures with absolute path, got %d", len(procedures)) + } + + t.Logf("Successfully parsed with absolute path: %s", absPath) +} diff --git a/audit-cli/internal/rst/procedure_types.go b/audit-cli/internal/rst/procedure_types.go new file mode 100644 index 0000000..cf4afb3 --- /dev/null +++ b/audit-cli/internal/rst/procedure_types.go @@ -0,0 +1,111 @@ +package rst + +import "regexp" + +// ProcedureType represents the type of procedure implementation. +type ProcedureType string + +const ( + // ProcedureDirective represents procedures using .. procedure:: directive + ProcedureDirective ProcedureType = "procedure-directive" + // OrderedList represents procedures using ordered lists + OrderedList ProcedureType = "ordered-list" +) + +// Procedure represents a parsed procedure from an RST file. 
+type Procedure struct { + Type ProcedureType // Type of procedure (directive or ordered list) + Title string // Title/heading above the procedure + Options map[string]string // Directive options (for procedure directive) + Steps []Step // Steps in the procedure + LineNum int // Line number where procedure starts (1-based) + EndLineNum int // Line number where procedure ends (1-based) + HasSubSteps bool // Whether any step in this procedure contains sub-steps + IsSubProcedure bool // Whether this is a sub-procedure within a step + ComposableTutorial *ComposableTutorial // Composable tutorial wrapping this procedure (if any) + TabSet *TabSetInfo // Tab set wrapping this procedure (if any) + TabID string // The specific tab ID this procedure belongs to (if part of a tab set) +} + +// TabSetInfo represents information about a tab set containing procedure variations. +// This is used for grouping procedures for analysis/reporting purposes. +type TabSetInfo struct { + TabIDs []string // All tab IDs in the set (for grouping) + Procedures map[string]Procedure // All procedures by tabid (for grouping) +} + +// Step represents a single step in a procedure. +type Step struct { + Title string // Step title (for .. step:: directive) + Content string // Step content (raw RST) + Options map[string]string // Step options + LineNum int // Line number where step starts + Variations []Variation // Variations within this step (tabs or selected content) + SubSteps []Step // Sub-steps (ordered lists within this step) +} + +// Variation represents a content variation within a step. +type Variation struct { + Type VariationType // Type of variation (tab or selected-content) + Options []string // Available options (tabids or selections) + Content map[string]string // Content for each option +} + +// VariationType represents the type of content variation. +type VariationType string + +const ( + // TabVariation represents variations using .. 
tabs:: directive + TabVariation VariationType = "tabs" + // SelectedContentVariation represents variations using .. selected-content:: directive + SelectedContentVariation VariationType = "selected-content" +) + +// ComposableTutorial represents a composable tutorial structure. +type ComposableTutorial struct { + Title string // Title/heading above the composable tutorial + Options []string // Available option names (e.g., ["interface", "language"]) + Defaults []string // Default selections (e.g., ["driver", "nodejs"]) + Selections []string // All unique selection combinations found + GeneralContent []string // Content lines that apply to all selections + LineNum int // Line number where tutorial starts + FilePath string // Path to the source file (for resolving includes) + Procedure *Procedure // The procedure within the composable tutorial + SelectedContentBlocks []SelectedContent // All selected-content blocks (for extracting multiple procedures) +} + +// TabSet represents a tabs directive containing procedures. +type TabSet struct { + Title string // Title/heading above the tabs + Tabs map[string][]string // Tab content by tabid (lines of RST) + TabIDs []string // Ordered list of tab IDs + Procedures map[string]Procedure // Parsed procedures by tabid + LineNum int // Line number where tabs start + FilePath string // Path to the source file (for resolving includes) +} + +// SelectedContent represents a selected-content block within a composable tutorial. +type SelectedContent struct { + Selections []string // The selections for this content (e.g., ["driver", "nodejs"]) + Content string // The content for this selection + LineNum int // Line number where this selected-content starts +} + +// Regular expressions for parsing ordered lists +var ( + // Matches numbered lists: 1. or 1) + numberedListRegex = regexp.MustCompile(`^(\s*)(\d+)[\.\)]\s+(.*)$`) + // Matches lettered lists: a. or a) or A. 
or A) + letteredListRegex = regexp.MustCompile(`^(\s*)([a-zA-Z])[\.\)]\s+(.*)$`) +) + +// YAMLStep represents a step in a YAML steps file +type YAMLStep struct { + Title string `yaml:"title"` + StepNum int `yaml:"stepnum"` + Level int `yaml:"level"` + Ref string `yaml:"ref"` + Pre string `yaml:"pre"` + Action interface{} `yaml:"action"` + Post string `yaml:"post"` +} diff --git a/audit-cli/testdata/input-files/source/includes/procedures/connect.rst b/audit-cli/testdata/input-files/source/includes/procedures/connect.rst new file mode 100644 index 0000000..c578569 --- /dev/null +++ b/audit-cli/testdata/input-files/source/includes/procedures/connect.rst @@ -0,0 +1,33 @@ +.. selected-content:: + :selections: driver, nodejs + + Create a connection using the Node.js driver: + + .. code-block:: javascript + + const { MongoClient } = require('mongodb'); + const uri = process.env.MONGODB_URI; + const client = new MongoClient(uri); + await client.connect(); + +.. selected-content:: + :selections: driver, python + + Create a connection using the Python driver: + + .. code-block:: python + + from pymongo import MongoClient + import os + uri = os.environ['MONGODB_URI'] + client = MongoClient(uri) + +.. selected-content:: + :selections: atlas-cli, none + + Authenticate with the Atlas CLI: + + .. code-block:: bash + + atlas auth login + diff --git a/audit-cli/testdata/input-files/source/includes/procedures/install-deps.rst b/audit-cli/testdata/input-files/source/includes/procedures/install-deps.rst new file mode 100644 index 0000000..ebbc530 --- /dev/null +++ b/audit-cli/testdata/input-files/source/includes/procedures/install-deps.rst @@ -0,0 +1,27 @@ +.. selected-content:: + :selections: driver, nodejs + + Install the MongoDB Node.js driver: + + .. code-block:: bash + + npm install mongodb + +.. selected-content:: + :selections: driver, python + + Install the MongoDB Python driver: + + .. code-block:: bash + + pip install pymongo + +.. 
selected-content:: + :selections: atlas-cli, none + + Install the Atlas CLI: + + .. code-block:: bash + + brew install mongodb-atlas-cli + diff --git a/audit-cli/testdata/input-files/source/includes/procedures/operations.rst b/audit-cli/testdata/input-files/source/includes/procedures/operations.rst new file mode 100644 index 0000000..33f36b6 --- /dev/null +++ b/audit-cli/testdata/input-files/source/includes/procedures/operations.rst @@ -0,0 +1,31 @@ +.. selected-content:: + :selections: driver, nodejs + + Insert a document using Node.js: + + .. code-block:: javascript + + const db = client.db('test'); + const result = await db.collection('users').insertOne({ name: 'Alice' }); + console.log('Inserted document:', result.insertedId); + +.. selected-content:: + :selections: driver, python + + Insert a document using Python: + + .. code-block:: python + + db = client.test + result = db.users.insert_one({'name': 'Alice'}) + print('Inserted document:', result.inserted_id) + +.. selected-content:: + :selections: atlas-cli, none + + Create a cluster using the Atlas CLI: + + .. code-block:: bash + + atlas clusters create myCluster --provider AWS --region US_EAST_1 + diff --git a/audit-cli/testdata/input-files/source/procedure-test.rst b/audit-cli/testdata/input-files/source/procedure-test.rst new file mode 100644 index 0000000..5f4d1ac --- /dev/null +++ b/audit-cli/testdata/input-files/source/procedure-test.rst @@ -0,0 +1,291 @@ +============================ +Procedure Testing Examples +============================ + +This file contains various procedure examples for testing the procedure parsing and extraction functionality. + +Simple Procedure with Steps +============================ + +.. procedure:: + :style: normal + + .. step:: Create a database connection + + First, establish a connection to the database: + + .. 
code-block:: javascript + + const { MongoClient } = require('mongodb'); + const client = new MongoClient('mongodb://localhost:27017'); + await client.connect(); + + .. step:: Insert a document + + Next, insert a document into the collection: + + .. code-block:: javascript + + const db = client.db('myDatabase'); + const collection = db.collection('myCollection'); + const result = await collection.insertOne({ name: 'Alice', age: 30 }); + + .. step:: Close the connection + + Finally, close the connection: + + .. code-block:: javascript + + await client.close(); + +Procedure with Tabs +==================== + +.. procedure:: + + .. step:: Connect to MongoDB + + Choose your preferred connection method: + + .. tabs:: + + .. tab:: MongoDB Shell + :tabid: shell + + Connect using the MongoDB Shell: + + .. code-block:: bash + + mongosh "mongodb://localhost:27017" + + .. tab:: Node.js Driver + :tabid: nodejs + + Connect using the Node.js driver: + + .. code-block:: javascript + + const { MongoClient } = require('mongodb'); + const client = new MongoClient('mongodb://localhost:27017'); + await client.connect(); + + .. tab:: Python Driver + :tabid: python + + Connect using the Python driver: + + .. code-block:: python + + from pymongo import MongoClient + client = MongoClient('mongodb://localhost:27017') + + .. step:: Verify the connection + + Verify that you're connected: + + .. tabs:: + + .. tab:: MongoDB Shell + :tabid: shell + + .. code-block:: bash + + db.runCommand({ ping: 1 }) + + .. tab:: Node.js Driver + :tabid: nodejs + + .. code-block:: javascript + + await client.db('admin').command({ ping: 1 }); + + .. tab:: Python Driver + :tabid: python + + .. code-block:: python + + client.admin.command('ping') + +Composable Tutorial Example +============================ + +.. composable-tutorial:: + :options: interface, language + :defaults: driver, nodejs + + .. procedure:: + + .. step:: Install dependencies + + .. 
selected-content:: + :selections: driver, nodejs + + Install the MongoDB Node.js driver: + + .. code-block:: bash + + npm install mongodb + + .. selected-content:: + :selections: driver, python + + Install the MongoDB Python driver: + + .. code-block:: bash + + pip install pymongo + + .. selected-content:: + :selections: atlas-cli, none + + Install the Atlas CLI: + + .. code-block:: bash + + brew install mongodb-atlas-cli + + .. step:: Connect to your cluster + + .. selected-content:: + :selections: driver, nodejs + + Create a connection using the Node.js driver: + + .. code-block:: javascript + + const { MongoClient } = require('mongodb'); + const uri = process.env.MONGODB_URI; + const client = new MongoClient(uri); + await client.connect(); + + .. selected-content:: + :selections: driver, python + + Create a connection using the Python driver: + + .. code-block:: python + + from pymongo import MongoClient + import os + uri = os.environ['MONGODB_URI'] + client = MongoClient(uri) + + .. selected-content:: + :selections: atlas-cli, none + + Authenticate with the Atlas CLI: + + .. code-block:: bash + + atlas auth login + + .. step:: Perform an operation + + General content that applies to all selections. + + .. selected-content:: + :selections: driver, nodejs + + Insert a document using Node.js: + + .. code-block:: javascript + + const db = client.db('test'); + const result = await db.collection('users').insertOne({ name: 'Alice' }); + console.log('Inserted document:', result.insertedId); + + .. selected-content:: + :selections: driver, python + + Insert a document using Python: + + .. code-block:: python + + db = client.test + result = db.users.insert_one({'name': 'Alice'}) + print('Inserted document:', result.inserted_id) + + .. selected-content:: + :selections: atlas-cli, none + + Create a cluster using the Atlas CLI: + + .. 
code-block:: bash + + atlas clusters create myCluster --provider AWS --region US_EAST_1 + +Ordered List Procedure +======================= + +1. Create a new directory for your project: + + .. code-block:: bash + + mkdir my-mongodb-project + cd my-mongodb-project + +2. Initialize a new Node.js project: + + .. code-block:: bash + + npm init -y + +3. Install the MongoDB driver: + + .. code-block:: bash + + npm install mongodb + +4. Create a connection file: + + .. code-block:: javascript + + const { MongoClient } = require('mongodb'); + const uri = 'mongodb://localhost:27017'; + const client = new MongoClient(uri); + +Procedure with Sub-steps +========================= + +.. procedure:: + + .. step:: Set up your environment + + a. Install Node.js from https://nodejs.org + b. Install MongoDB from https://www.mongodb.com/try/download/community + c. Verify installations: + + .. code-block:: bash + + node --version + mongod --version + + .. step:: Create your project + + a. Create a new directory + b. Initialize npm + c. Install dependencies + + .. code-block:: bash + + mkdir my-app && cd my-app + npm init -y + npm install mongodb + + .. step:: Write your code + + Create an `index.js` file with the following content: + + .. code-block:: javascript + + const { MongoClient } = require('mongodb'); + + async function main() { + const client = new MongoClient('mongodb://localhost:27017'); + await client.connect(); + console.log('Connected to MongoDB'); + await client.close(); + } + + main().catch(console.error); + diff --git a/audit-cli/testdata/input-files/source/procedure-with-includes.rst b/audit-cli/testdata/input-files/source/procedure-with-includes.rst new file mode 100644 index 0000000..26aab81 --- /dev/null +++ b/audit-cli/testdata/input-files/source/procedure-with-includes.rst @@ -0,0 +1,24 @@ +========================== +Procedure with Includes +========================== + +.. composable-tutorial:: + :options: interface, language + :defaults: driver, nodejs + + .. 
procedure:: + + .. step:: Install dependencies + + .. include:: /includes/procedures/install-deps.rst + + .. step:: Connect to your cluster + + .. include:: /includes/procedures/connect.rst + + .. step:: Perform an operation + + General content that applies to all selections. + + .. include:: /includes/procedures/operations.rst + diff --git a/audit-cli/testdata/input-files/source/tabs-with-procedures.rst b/audit-cli/testdata/input-files/source/tabs-with-procedures.rst new file mode 100644 index 0000000..4633281 --- /dev/null +++ b/audit-cli/testdata/input-files/source/tabs-with-procedures.rst @@ -0,0 +1,104 @@ +============================ +Tabs Containing Procedures +============================ + +This file tests tabs that contain different procedures. + +Installation Instructions +========================== + +Choose your operating system: + +.. tabs:: + + .. tab:: macOS + :tabid: macos + + .. procedure:: + + .. step:: Install Homebrew + + If you don't have Homebrew installed: + + .. code-block:: bash + + /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" + + .. step:: Install MongoDB + + Use Homebrew to install MongoDB: + + .. code-block:: bash + + brew tap mongodb/brew + brew install mongodb-community + + .. step:: Start MongoDB + + Start the MongoDB service: + + .. code-block:: bash + + brew services start mongodb-community + + .. tab:: Ubuntu + :tabid: ubuntu + + .. procedure:: + + .. step:: Import the public key + + Import the MongoDB public GPG key: + + .. code-block:: bash + + curl -fsSL https://www.mongodb.org/static/pgp/server-8.0.asc | \ + sudo gpg -o /usr/share/keyrings/mongodb-server-8.0.gpg \ + --dearmor + + .. step:: Create the list file + + Create the list file for Ubuntu: + + .. code-block:: bash + + echo "deb [ signed-by=/usr/share/keyrings/mongodb-server-8.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/8.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-8.0.list + + .. 
step:: Install MongoDB + + Update the package database and install: + + .. code-block:: bash + + sudo apt-get update + sudo apt-get install -y mongodb-org + + .. step:: Start MongoDB + + Start the MongoDB service: + + .. code-block:: bash + + sudo systemctl start mongod + + .. tab:: Windows + :tabid: windows + + .. procedure:: + + .. step:: Download the installer + + Download the MongoDB MSI installer from the MongoDB Download Center. + + .. step:: Run the installer + + Double-click the downloaded `.msi` file and follow the installation wizard. + + .. step:: Start MongoDB + + Start MongoDB as a Windows service: + + .. code-block:: powershell + + net start MongoDB +