# Fix: Incomplete Teardown When Provision Phase Fails #6224
GitHub Issue: #6223
Branch: `fix/incomplete-teardown-on-provision-failure`
Related: Issue #1239
## Problem Statement

PerfKitBenchmarker fails to delete network and firewall resources when a benchmark fails during the provision phase. This leaves orphaned resources in the project whose names contain the failed run's `run_uri`.

## Root Cause Analysis
### Three-Layer Failure Mechanism

1. The `spec.networks` dictionary gets corrupted with JSON string fragments as keys instead of proper `GceNetwork` objects.
2. A `deleted=True` flag, set prematurely, prevents re-execution during teardown-only runs.
3. The invalid objects cause `Delete()` calls to fail silently.

### Evidence from Run URI `e848137d`

Examination of the pickle file revealed:
Result: Teardown completed in 0.006 seconds with no actual deletion.
Orphaned resources:

- `pkb-network-e848137d`
- `default-internal-10-0-0-0-8-e848137d`
- `perfkit-firewall-e848137d-22-22`

## Solution Design
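### Three-Layer Defense Strategy

1. Bypass the `deleted` flag to force cleanup attempts during teardown-only runs.
2. Validate that objects expose `Delete()` methods before calling them.
3. Fall back to direct `gcloud` commands that delete resources by name pattern.

## Implementation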
### File Modified

`perfkitbenchmarker/benchmark_spec.py`

### Changes Made
#### 1. Modified `Delete()` Method (Line ~1102)

Before:
After:
Rationale: The `deleted` flag can be set prematurely when provisioning fails. During teardown-only runs (`--run_stage=teardown`), we must attempt cleanup regardless of this flag.

#### 2. Added Validation Methods (Lines ~1150-1180)
Rationale: Detect corrupted pickle data before attempting to call methods on invalid objects. This prevents silent failures and enables fallback cleanup.
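A minimal sketch of such a check, assuming `spec.networks` is a dict whose values should be network objects with a callable `Delete()` method, and that corruption leaves non-object values (such as JSON string fragments) in their place. `is_deletable`, `networks_are_valid`, and `FakeNetwork` are hypothetical names for illustration, not the PR's actual methods.

```python
"""Hypothetical sketch of a pre-teardown validity check for spec.networks."""
from collections.abc import Mapping
from typing import Any


def is_deletable(resource: Any) -> bool:
  """True if the object exposes a callable Delete() method."""
  return hasattr(resource, 'Delete') and callable(resource.Delete)


def networks_are_valid(networks: Any) -> bool:
  """True if networks is a mapping whose values can all be Delete()d.

  A corrupted pickle that left plain strings where network objects belong
  fails this check. An empty mapping is treated as valid (nothing to delete).
  """
  if not isinstance(networks, Mapping):
    return False
  return all(is_deletable(net) for net in networks.values())


class FakeNetwork:
  """Stand-in for a real network object, used only for the demo below."""

  def Delete(self) -> None:
    pass


if __name__ == '__main__':
  print(networks_are_valid({'us-central1-a': FakeNetwork()}))  # True
  print(networks_are_valid({'{"name": "pkb-net': 'work-e848137d"}'}))  # False
```

The `hasattr()`/`callable()` pair is deliberately cheap: it never invokes a method on an object that may not be what the pickle claims it is.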
#### 3. Added Fallback Cleanup Method (Lines ~1182-1250)
Rationale: When pickle data is corrupted, we can't rely on object methods. Direct `gcloud` commands provide a reliable fallback that works regardless of object state.

#### 4. Integrated Validation and Fallback (Lines ~1195-1210)
Rationale: Integrate validation into the main deletion flow. If objects are invalid, skip normal deletion and use fallback cleanup immediately.
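A minimal sketch of how the `deleted`-flag override, the validation gate, and the `gcloud` fallback might fit together. It assumes orphaned resources can be found by the `run_uri` substring in their names (as with the orphaned names listed earlier), that only firewall rules and networks need cleanup (custom-mode subnets, if any, would also have to be deleted before the network), and that `spec` carries `networks` and `deleted` attributes. `teardown_networks`, `fallback_cleanup`, and the injectable `runner` are hypothetical names, not the PR's code.

```python
"""Hypothetical sketch: validation-gated network teardown with a gcloud fallback.

Not the PR's actual code; names and structure are illustrative only.
"""
import subprocess
from collections.abc import Mapping
from typing import Any, Callable, List

# Anything with subprocess.run's calling convention (injectable for tests).
Runner = Callable[..., subprocess.CompletedProcess]


def _has_delete(obj: Any) -> bool:
  return hasattr(obj, 'Delete') and callable(obj.Delete)


def _list_names(runner: Runner, resource: str, run_uri: str) -> List[str]:
  """Returns names of GCE resources of one type whose names contain run_uri."""
  result = runner(
      ['gcloud', 'compute', resource, 'list',
       f'--filter=name~{run_uri}', '--format=value(name)'],
      capture_output=True, text=True, check=False)
  return [line.strip() for line in (result.stdout or '').splitlines()
          if line.strip()]


def fallback_cleanup(run_uri: str, runner: Runner = subprocess.run) -> List[str]:
  """Deletes firewall rules, then networks, whose names contain run_uri.

  Firewall rules are deleted first because a VPC network generally cannot be
  deleted while firewall rules still reference it. check=False means a single
  failed delete does not raise and abort the remaining deletions.
  """
  deleted = []
  for resource in ('firewall-rules', 'networks'):
    for name in _list_names(runner, resource, run_uri):
      result = runner(
          ['gcloud', 'compute', resource, 'delete', name, '--quiet'],
          capture_output=True, text=True, check=False)
      if result.returncode == 0:
        deleted.append(name)
  return deleted


def teardown_networks(spec: Any, run_uri: str, teardown_only: bool,
                      runner: Runner = subprocess.run) -> None:
  """Deletes spec.networks, falling back to gcloud if the objects are invalid."""
  # Layer 1: a premature deleted=True must not block a teardown-only run.
  if getattr(spec, 'deleted', False) and not teardown_only:
    return
  networks = getattr(spec, 'networks', None)
  # Layer 2: only call Delete() when every value is a usable object.
  if isinstance(networks, Mapping) and all(
      _has_delete(net) for net in networks.values()):
    for net in networks.values():
      net.Delete()
  else:
    # Layer 3: corrupted state, so delete by run_uri name pattern instead.
    fallback_cleanup(run_uri, runner)
  # Mark as deleted afterwards so teardown is not retried indefinitely.
  spec.deleted = True
```

Passing a fake `runner` lets the flow be exercised without touching a real project; with the default `subprocess.run`, the commands execute against whatever project `gcloud` is configured for.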
## Testing
### Test Case: Run URI `e848137d`

Initial State (before fix):
Teardown Command:
Results (after fix):
Verification:
Success: All orphaned resources successfully deleted.
## Impact Analysis

### Positive Impacts

### Risk Assessment
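- Fallback cleanup runs its commands with `check=False` to prevent cleanup failures from blocking teardown.

### Edge Cases Handled

- Corrupted pickle data: validation returns `False`, which triggers the fallback.
- Objects lacking a `Delete()` method: caught by a `hasattr()` check.

## Future Improvements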
### Short Term

### Long Term

## Related Work
This fix addresses the same class of issues described in:
- Issues in the `object_storage_service` benchmark

## Checklist
## Notes

- The fallback uses `subprocess.run()` with `check=False` to ensure cleanup attempts don't block teardown.
- The `deleted` flag is set after fallback cleanup to prevent infinite loops.

Author: SADA Engineering Team
Date: November 17, 2025
Tested On: PKB v1.12.0-5933-g20771be8 (SADA fork)