|
| 1 | +# Dead Code Analysis Architecture |
| 2 | + |
| 3 | +This document describes the architecture of the reanalyze dead code analysis pipeline. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The DCE (Dead Code Elimination) analysis is structured as a **pure pipeline** with four phases: |
| 8 | + |
| 9 | +1. **MAP** - Process each `.cmt` file independently → per-file data |
| 10 | +2. **MERGE** - Combine all per-file data → immutable project-wide view |
| 11 | +3. **SOLVE** - Compute dead/live status → immutable result with issues |
| 12 | +4. **REPORT** - Output issues (side effects only here) |
| 13 | + |
| 14 | +This design enables: |
| 15 | +- **Order independence** - Processing files in any order gives identical results |
| 16 | +- **Incremental updates** - Replace one file's data without reprocessing others |
| 17 | +- **Testability** - Each phase is independently testable with pure functions |
| 17 | +- **Parallelization potential** - Phase 1 handles each file independently, and phases 2-3 operate on immutable data |
| 19 | + |
| 20 | +--- |
| 21 | + |
| 22 | +## Pipeline Diagram |
| 23 | + |
| 24 | +``` |
| 25 | +┌─────────────────────────────────────────────────────────────────────────────┐ |
| 26 | +│ DCE ANALYSIS PIPELINE │ |
| 27 | +└─────────────────────────────────────────────────────────────────────────────┘ |
| 28 | + |
| 29 | + ┌─────────────┐ |
| 30 | + │ DceConfig.t │ (explicit configuration) |
| 31 | + └──────┬──────┘ |
| 32 | + │ |
| 33 | + ╔════════════════════════════════╪════════════════════════════════════════╗ |
| 34 | + ║ PHASE 1: MAP (per-file) │ ║ |
| 35 | + ╠════════════════════════════════╪════════════════════════════════════════╣ |
| 36 | + ║ ▼ ║ |
| 37 | + ║ ┌──────────┐ process_cmt_file ┌───────────────────────────────┐ ║ |
| 38 | + ║ │ file1.cmt├──────────────────────►│ file_data { │ ║ |
| 39 | + ║ └──────────┘ │ annotations: builder │ ║ |
| 40 | + ║ ┌──────────┐ process_cmt_file │ decls: builder │ ║ |
| 41 | + ║ │ file2.cmt├──────────────────────►│ refs: builder │ ║ |
| 42 | + ║ └──────────┘ │ file_deps: builder │ ║ |
| 43 | + ║ ┌──────────┐ process_cmt_file │ cross_file: builder │ ║ |
| 44 | + ║ │ file3.cmt├──────────────────────►│ } │ ║ |
| 45 | + ║ └──────────┘ └───────────────────────────────┘ ║ |
| 46 | + ║ │ ║ |
| 47 | + ║ Local mutable state OK │ file_data list ║ |
| 48 | + ╚══════════════════════════════════════════════════╪══════════════════════╝ |
| 49 | + │ |
| 50 | + ╔══════════════════════════════════════════════════╪══════════════════════╗ |
| 51 | + ║ PHASE 2: MERGE (combine builders) │ ║ |
| 52 | + ╠══════════════════════════════════════════════════╪══════════════════════╣ |
| 53 | + ║ ▼ ║ |
| 54 | + ║ ┌─────────────────────────────────────────────────────────────────┐ ║ |
| 55 | + ║ │ FileAnnotations.merge_all → annotations: FileAnnotations.t │ ║ |
| 56 | + ║ │ Declarations.merge_all → decls: Declarations.t │ ║ |
| 57 | + ║ │ References.merge_all → refs: References.t │ ║ |
| 58 | + ║ │ FileDeps.merge_all → file_deps: FileDeps.t │ ║ |
| 59 | + ║ │ CrossFileItems.merge_all → cross_file: CrossFileItems.t │ ║ |
| 60 | + ║ │ │ ║ |
| 61 | + ║ │ CrossFileItems.compute_optional_args_state │ ║ |
| 62 | + ║ │ → optional_args_state: State.t │ ║ |
| 63 | + ║ └─────────────────────────────────────────────────────────────────┘ ║ |
| 64 | + ║ │ ║ |
| 65 | + ║ Pure functions, immutable output │ merged data ║ |
| 66 | + ╚══════════════════════════════════════════════════╪══════════════════════╝ |
| 67 | + │ |
| 68 | + ╔══════════════════════════════════════════════════╪══════════════════════╗ |
| 69 | + ║ PHASE 3: SOLVE (pure deadness computation) │ ║ |
| 70 | + ╠══════════════════════════════════════════════════╪══════════════════════╣ |
| 71 | + ║ ▼ ║ |
| 72 | + ║ ┌─────────────────────────────────────────────────────────────────┐ ║ |
| 73 | + ║ │ DeadCommon.solveDead │ ║ |
| 74 | + ║ │ ~annotations ~decls ~refs ~file_deps │ ║ |
| 75 | + ║ │ ~optional_args_state ~config ~checkOptionalArg │ ║ |
| 76 | + ║ │ │ ║ |
| 77 | + ║ │ → AnalysisResult.t { issues: Issue.t list } │ ║ |
| 78 | + ║ └─────────────────────────────────────────────────────────────────┘ ║ |
| 79 | + ║ │ ║ |
| 80 | + ║ Pure function: immutable in → immutable out │ issues ║ |
| 81 | + ╚══════════════════════════════════════════════════╪══════════════════════╝ |
| 82 | + │ |
| 83 | + ╔══════════════════════════════════════════════════╪══════════════════════╗ |
| 84 | + ║ PHASE 4: REPORT (side effects at the edge) │ ║ |
| 85 | + ╠══════════════════════════════════════════════════╪══════════════════════╣ |
| 86 | + ║ ▼ ║ |
| 87 | + ║ ┌─────────────────────────────────────────────────────────────────┐ ║ |
| 88 | + ║ │ AnalysisResult.get_issues │ ║ |
| 89 | + ║ │ |> List.iter (fun issue -> Log_.warning ~loc issue.description) │ ║ |
| 90 | + ║ │ │ ║ |
| 91 | + ║ │ (Optional: EmitJson for JSON output) │ ║ |
| 92 | + ║ └─────────────────────────────────────────────────────────────────┘ ║ |
| 93 | + ║ ║ |
| 94 | + ║ Side effects only here: logging, JSON output ║ |
| 95 | + ╚════════════════════════════════════════════════════════════════════════╝ |
| 96 | +``` |
| 97 | + |
| 98 | +--- |
| 99 | + |
| 100 | +## Key Data Types |
| 101 | + |
| 102 | +| Type | Purpose | Mutability | |
| 103 | +|------|---------|------------| |
| 104 | +| `DceFileProcessing.file_data` | Per-file collected data | Builders (mutable during AST walk) | |
| 105 | +| `FileAnnotations.t` | Source annotations (`@dead`, `@live`) | Immutable after merge | |
| 106 | +| `Declarations.t` | All exported declarations (pos → Decl.t) | Immutable after merge | |
| 107 | +| `References.t` | Value/type references (pos → PosSet.t) | Immutable after merge | |
| 108 | +| `FileDeps.t` | Cross-file dependencies (file → FileSet.t) | Immutable after merge | |
| 109 | +| `OptionalArgsState.t` | Computed optional arg state per-decl | Immutable | |
| 110 | +| `AnalysisResult.t` | Solver output with Issue.t list | Immutable | |
| 111 | +| `DceConfig.t` | Analysis configuration | Immutable (passed explicitly) | |
| 112 | + |
| 113 | +--- |
| 114 | + |
| 115 | +## Phase Details |
| 116 | + |
| 117 | +### Phase 1: MAP (Per-File Processing) |
| 118 | + |
| 119 | +**Entry point**: `DceFileProcessing.process_cmt_file` |
| 120 | + |
| 121 | +**Input**: `.cmt` file path + `DceConfig.t` |
| 122 | + |
| 123 | +**Output**: `file_data` containing builders for: |
| 124 | +- `annotations` - `@dead`, `@live` annotations from source |
| 125 | +- `decls` - Exported value/type/exception declarations |
| 126 | +- `refs` - References to other declarations |
| 127 | +- `file_deps` - Which files this file depends on |
| 128 | +- `cross_file` - Items needing cross-file resolution (optional args, exceptions) |
| 129 | + |
| 130 | +**Key property**: Local mutable state is acceptable here for performance, because each file is processed independently of the others. |
| 131 | + |
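The builder idea can be sketched as follows. This is a minimal illustration under assumptions: `Decl_builder` and `process_file` are hypothetical names, and a list of `(name, line)` pairs stands in for the typed tree that the real phase reads from a `.cmt` file.

```ocaml
(* Hypothetical builder: a mutable accumulator used while walking one file. *)
module Decl_builder = struct
  (* Maps a declaration name to the line where it is declared. *)
  type t = (string, int) Hashtbl.t

  let create () : t = Hashtbl.create 16

  let add (b : t) ~name ~line = Hashtbl.replace b name line

  (* Freeze into a sorted association list so the result is deterministic. *)
  let to_list (b : t) : (string * int) list =
    Hashtbl.fold (fun name line acc -> (name, line) :: acc) b []
    |> List.sort compare
end

(* Stand-in for the AST walk: each call creates and fills its own builder,
   so no mutable state is shared between files. *)
let process_file (items : (string * int) list) : Decl_builder.t =
  let builder = Decl_builder.create () in
  List.iter (fun (name, line) -> Decl_builder.add builder ~name ~line) items;
  builder
```

Because every call to `process_file` allocates a fresh builder, processing one file cannot observe or disturb the data collected for another.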
| 132 | +### Phase 2: MERGE (Combine Builders) |
| 133 | + |
| 134 | +**Entry point**: `Reanalyze.runAnalysis` (merge section) |
| 135 | + |
| 136 | +**Input**: `file_data list` |
| 137 | + |
| 138 | +**Output**: Immutable project-wide data structures |
| 139 | + |
| 140 | +**Operations**: |
| 141 | +```ocaml |
| 142 | +let annotations = FileAnnotations.merge_all (file_data_list |> List.map (fun fd -> fd.annotations)) |
| 143 | +let decls = Declarations.merge_all (file_data_list |> List.map (fun fd -> fd.decls)) |
| 144 | +let refs = References.merge_all (file_data_list |> List.map (fun fd -> fd.refs)) |
| 145 | +let file_deps = FileDeps.merge_all (file_data_list |> List.map (fun fd -> fd.file_deps)) |
| 146 | +``` |
| 147 | + |
| 148 | +**Key property**: Merge operations are commutative and associative, so the order of `file_data_list` doesn't matter. |
| 149 | + |
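To see why an order-insensitive merge is plausible, here is a small sketch that merges per-file reference maps with set union. The representation (`refs` as a string map of string sets) and the helpers `merge_two`, `merge_all`, and `of_list` are assumptions made for illustration; the real `References.merge_all` may be implemented differently.

```ocaml
module StringMap = Map.Make (String)
module StringSet = Set.Make (String)

(* Hypothetical per-file reference data: target -> positions that reference it. *)
type refs = StringSet.t StringMap.t

(* Set union is commutative and associative, so combining the maps
   in any order yields the same result. *)
let merge_two (a : refs) (b : refs) : refs =
  StringMap.union (fun _key s1 s2 -> Some (StringSet.union s1 s2)) a b

let merge_all (per_file : refs list) : refs =
  List.fold_left merge_two StringMap.empty per_file

(* Convenience constructor for small examples. *)
let of_list (pairs : (string * string list) list) : refs =
  List.fold_left
    (fun acc (target, sources) ->
      StringMap.add target (StringSet.of_list sources) acc)
    StringMap.empty pairs
```

For example, merging `[f1; f2]` and `[f2; f1]` produces maps that compare equal under `StringMap.equal StringSet.equal`.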
| 150 | +### Phase 3: SOLVE (Deadness Computation) |
| 151 | + |
| 152 | +**Entry point**: `DeadCommon.solveDead` |
| 153 | + |
| 154 | +**Input**: All merged data + config |
| 155 | + |
| 156 | +**Output**: `AnalysisResult.t` containing `Issue.t list` |
| 157 | + |
| 158 | +**Algorithm**: |
| 159 | +1. Build file dependency order (roots to leaves) |
| 160 | +2. Sort declarations by dependency order |
| 161 | +3. For each declaration, resolve references recursively |
| 162 | +4. Determine dead/live status based on reference count |
| 163 | +5. Collect issues for dead declarations |
| 164 | + |
| 165 | +**Key property**: Pure function - immutable in, immutable out. No side effects. |
| 166 | + |
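The recursive resolution step might look roughly like the sketch below. It is a simplified model, not `DeadCommon.solveDead`: it assumes a declaration is live when it is a root (for instance, annotated `@live`) or is referenced, directly or transitively, from a live declaration. The names `is_live`, `solve`, `roots`, and `refs_to` are hypothetical, and the dependency ordering from steps 1-2 is omitted.

```ocaml
module S = Set.Make (String)
module M = Map.Make (String)

(* refs_to maps a declaration to the set of declarations that reference it.
   visiting holds the declarations on the current path, to cut cycles. *)
let rec is_live ~roots ~refs_to ~visiting decl =
  if S.mem decl roots then true
  else if S.mem decl visiting then false (* cycle: no live referrer on this path *)
  else
    let referrers = Option.value (M.find_opt decl refs_to) ~default:S.empty in
    let visiting = S.add decl visiting in
    S.exists (fun r -> is_live ~roots ~refs_to ~visiting r) referrers

(* Pure solver: returns the dead declarations, sorted for determinism. *)
let solve ~roots ~refs_to ~decls : string list =
  decls
  |> List.filter (fun d -> not (is_live ~roots ~refs_to ~visiting:S.empty d))
  |> List.sort compare
```

Note that two declarations that reference only each other are both reported as dead here, since neither is reachable from a root. This path-based search favors clarity over speed; a production solver would memoize results.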
| 167 | +### Phase 4: REPORT (Output) |
| 168 | + |
| 169 | +**Entry point**: `Reanalyze.runAnalysis` (report section) |
| 170 | + |
| 171 | +**Input**: `AnalysisResult.t` |
| 172 | + |
| 173 | +**Output**: Logging / JSON to stdout |
| 174 | + |
| 175 | +**Operations**: |
| 176 | +```ocaml |
| 177 | +AnalysisResult.get_issues analysis_result |
| 178 | +|> List.iter (fun issue -> Log_.warning ~loc:issue.loc issue.description) |
| 179 | +``` |
| 180 | + |
| 181 | +**Key property**: All side effects live here at the edge. The solver never logs directly. |
| 182 | + |
| 183 | +--- |
| 184 | + |
| 185 | +## Incremental Updates (Future) |
| 186 | + |
| 187 | +The architecture enables incremental updates when a file changes: |
| 188 | + |
| 189 | +1. Re-run Phase 1 for the changed file only → new `file_data` |
| 190 | +2. Replace that file's entry in the `file_data` map (keyed by filename) |
| 191 | +3. Re-run Phase 2 (merge) - fast, pure function |
| 192 | +4. Re-run Phase 3 (solve) - fast, pure function |
| 193 | + |
| 194 | +The key insight: **immutable data structures enable safe incremental updates** - you can swap one file's data without affecting others. |
| 195 | + |
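The steps above can be sketched with a persistent map. Since this section describes future work, everything here is hypothetical: `FileMap`, the reduced `file_data` record, `merge`, and `update_file` only illustrate how replacing one entry leaves the previous project state intact.

```ocaml
module FileMap = Map.Make (String)

(* Hypothetical per-file data: just the declaration names found in a file. *)
type file_data = { decls : string list }

(* Phase 2 stand-in: merge every file's data into one sorted list. *)
let merge (files : file_data FileMap.t) : string list =
  FileMap.fold (fun _file fd acc -> fd.decls @ acc) files []
  |> List.sort_uniq compare

(* Replace one file's entry. Map.add returns a new map, so the old
   project state remains valid and unchanged. *)
let update_file ~file ~(data : file_data) files = FileMap.add file data files
```

After `update_file`, re-running `merge` on the new map reflects the changed file, while the old map still yields the old result.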
| 196 | +--- |
| 197 | + |
| 198 | +## Testing |
| 199 | + |
| 200 | +**Order-independence test**: Run with the `-test-shuffle` flag to randomize the file processing order. The test (`make test-reanalyze-order-independence`) verifies that shuffled runs produce identical output. |
| 201 | + |
| 202 | +**Unit testing**: Each phase can be tested independently: |
| 203 | +- Phase 1: Process a single `.cmt` file, verify `file_data` |
| 204 | +- Phase 2: Merge known builders, verify merged result |
| 205 | +- Phase 3: Call solver with known inputs, verify issues |
| 206 | + |
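An order-independence check can also be expressed as a small property test. The sketch below is an assumption-laden illustration, not the actual test harness: `analyze` is a toy stand-in for phases 1-3, and `shuffle` and `order_independent` are hypothetical helpers.

```ocaml
(* Deterministic shuffle: tag each element with a pseudo-random key and sort. *)
let shuffle ~seed xs =
  let st = Random.State.make [| seed |] in
  xs
  |> List.map (fun x -> (Random.State.bits st, x))
  |> List.sort (fun (a, _) (b, _) -> compare a b)
  |> List.map snd

(* Toy stand-in for the analysis: collect all declaration names, sorted. *)
let analyze (files : (string * string list) list) : string list =
  files |> List.concat_map snd |> List.sort_uniq compare

(* Check that several shuffled orders give the same result as the original. *)
let order_independent ~runs files =
  let expected = analyze files in
  List.for_all
    (fun seed -> analyze (shuffle ~seed files) = expected)
    (List.init runs (fun i -> i))
```

Using a fixed seed per run keeps failures reproducible: a failing seed can be replayed to recover the exact file order that exposed the problem.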
| 207 | +--- |
| 208 | + |
| 209 | +## Key Modules |
| 210 | + |
| 211 | +| Module | Responsibility | |
| 212 | +|--------|---------------| |
| 213 | +| `Reanalyze` | Entry point, orchestrates pipeline | |
| 214 | +| `DceFileProcessing` | Phase 1: Per-file AST processing | |
| 215 | +| `DceConfig` | Configuration (CLI flags + run config) | |
| 216 | +| `DeadCommon` | Phase 3: Solver (`solveDead`) | |
| 217 | +| `Declarations` | Declaration storage (builder/immutable) | |
| 218 | +| `References` | Reference tracking (builder/immutable) | |
| 219 | +| `FileAnnotations` | Source annotation tracking | |
| 220 | +| `FileDeps` | Cross-file dependency graph | |
| 221 | +| `CrossFileItems` | Cross-file optional args and exceptions | |
| 222 | +| `AnalysisResult` | Immutable solver output | |
| 223 | +| `Issue` | Issue type definitions | |
| 224 | +| `Log_` | Phase 4: Logging output | |
| 225 | + |