Open
Description
Describe the bug
The health check script fails when the Forest node is downloading tipsets.
For example:
- The node starts at epoch 5356831
- The latest epoch is 6356831 (a gap of 1,000,000 epochs, requiring significant sync time)
- During syncing (downloading tipsets), the Head epoch remains fixed at 5356831 until the download finishes
This causes issues with the health check script, which depends on the Head epoch to verify node progress.
In the health check script:
- The script continuously fetches the node’s Head epoch metric.
- It compares the current value to the starting epoch (5356831 in this case).
- Since the Head epoch doesn’t change during the sync, the script incorrectly concludes that the node isn’t making progress.
- As a result, the health check fails, even though the node is actively syncing.
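The failure mode in the steps above can be modelled in a few lines. This is only a sketch of the logic as this issue describes it, not the actual health check script; the function name `tipset_moved_forward` is hypothetical, and the epoch values are the ones from the example.

```python
def tipset_moved_forward(start_epoch: int, end_epoch: int) -> bool:
    """Naive progress check: report progress only if the Head epoch advanced."""
    return end_epoch > start_epoch


# While tipsets are still being downloaded, the Head epoch stays where it
# started, so both samples are identical and the check reports no progress.
start = 5356831
end = 5356831
print(tipset_moved_forward(start, end))  # False, although the node is syncing
```

The check itself is not wrong; it is just sampled during a phase in which the Head epoch is not expected to move.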
Solution
When the Forest node is syncing (downloading tipsets), the health check should wait until the download completes before verifying node health.
This ensures that a node in the middle of sync isn’t marked as unhealthy.
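One possible shape for this, as a rough sketch rather than a proposed patch: wait while the node reports that it is downloading tipsets, and only then start the Head epoch comparison. Everything named here is an assumption for illustration. `is_downloading` and `get_head_epoch` are hypothetical callables standing in for however the script queries the node, and the timeout values are placeholders.

```python
import time
from typing import Callable


def wait_then_check_progress(
    is_downloading: Callable[[], bool],   # hypothetical: True while tipsets are being downloaded
    get_head_epoch: Callable[[], int],    # hypothetical: returns the current Head epoch
    download_timeout_s: float = 7200.0,   # placeholder limit for the download phase
    probe_window_s: float = 60.0,         # placeholder window for the progress check
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True if the Head epoch advances once the download phase is over."""
    # Phase 1: do not judge progress while the node is still downloading tipsets.
    deadline = clock() + download_timeout_s
    while is_downloading():
        if clock() >= deadline:
            return False  # download never finished within the allowed time
        sleep(poll_interval_s)

    # Phase 2: the usual check, started only after the download completed.
    start_epoch = get_head_epoch()
    probe_end = clock() + probe_window_s
    while clock() < probe_end:
        sleep(poll_interval_s)
        if get_head_epoch() > start_epoch:
            return True
    return False
```

Keeping a separate timeout for the download phase means a node that is stuck downloading still fails the check eventually, instead of being reported as healthy forever.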
To Reproduce
Steps to reproduce the behavior:
- Start a Forest node from a very old snapshot, then run the health check script
Log output
⏳ Waiting for Forest to start syncing (up to 7200 seconds)...
✅ Forest started syncing
⏳ Waiting for the health probe to finish...
👉 Tipset start: 5356831, end: 5356831
❌ Tipset epoch didn't move forward.
✅ No sync errors
❌ Health check failed
Expected behaviour
The health check should not fail while the node is actively downloading tipsets during sync.
It should wait or account for the syncing state before concluding that the node is unhealthy.
Screenshots
Environment (please complete the following information):
- OS:
- Rust version (e.g. rustc --version):
- Branch/commit:
Other information and links