-
Notifications
You must be signed in to change notification settings - Fork 125
[WIP] Fix system emulation reboot #638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Previously, the SBI RST extension did not distinguish between reboot and shutdown type. When a userspace reboot command was issued, the hart was incorrectly halted as if it were a shutdown. These changes fix the issue by properly detecting and handling two reboot types: cold and warm. Each type has its own handler: rv_cold_reboot() and rv_warm_reboot(). Since the initial system power-on is treated as a cold reboot, the initialization code has been refactored to share logic with the reboot path, adhering to the DRY principle. Additionally, device reset helper functions (plic_reset(), u8250_reset()) are introduced to support peripheral reinitialization during reboot. Key changes: 1. Rename rv_reset() to rv_cold_reboot() - Full system reset including processor, memory, and all peripherals. The initial power-on is treated as a cold reboot. 2. Introduce rv_warm_reboot() - Faster reboot that only resets processor and memory, skipping peripheral reinitialization. Can be triggered via echo "warm" > /sys/kernel/reboot/mode in guestOS. 3. Refactor boot image loading - Extract load_boot_images() helper to load kernel, DTB, and initrd, reducing code duplication between cold and warm reboot paths. 4. Introduce rv_reset_hart() - Static helper function to reset only hart state (GPRs, CSRs, PC), shared by both reboot modes. 5. Add reboot-safe resource management - Add "check for reboot" comments throughout initialization to reuse already-allocated resources (memory, fd_map, PLIC, UART, vblk, block_map) instead of re-allocating. 6. Use calloc for vblk/disk arrays - Changed from malloc to calloc so pointers are initialized to NULL, simplifying reboot checks. 7. Use setjmp/longjmp for clean reboot - Reboot rewrites all registers, causing call stack values to become stale. setjmp in rv_step() establishes a return point, longjmp after reboot provides a clean call stack. 8. Add plic_reset() and u8250_reset() - New device reset functions to reinitialize state without free/realloc. 9. Add sbi_rst_type_str() and sbi_rst_reason_str() - Helper functions for human-readable SBI reset type/reason in dmesg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7 issues found across 10 files
Prompt for AI agents (all 7 issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/syscall.c">
<violation number="1" location="src/syscall.c:541">
P2: Passing `NULL` to `%s` format specifier is undefined behavior. If `a0` or `a1` contains an unrecognized value, the helper functions return `NULL` which is then passed to `rv_log_info`. Consider returning a fallback string like `"unknown"` instead of `NULL`.</violation>
<violation number="2" location="src/syscall.c:554">
P1: Missing error handling for unknown reset types. If `a0` is not SHUTDOWN, COLD_REBOOT, or WARM_REBOOT, the code silently falls through without setting return values. Add an `else` clause to return `SBI_ERR_NOT_SUPPORTED`.</violation>
</file>
<file name="src/riscv.c">
<violation number="1" location="src/riscv.c:1070">
P2: Missing validation for empty vblk device path. The original code checked `if (!vblk_device_str[0])` to prevent empty disk paths. Without this check, an empty path would result in `vblk_device` being NULL after strtok processing, potentially causing issues in `virtio_blk_init()`.</violation>
<violation number="2" location="src/riscv.c:1082">
P2: Regression: Tilde (`~`) expansion for vblk device paths was removed. Users who specify paths like `~/disk.img` will now have those paths treated literally instead of expanding to their home directory, breaking existing configurations.</violation>
<violation number="3" location="src/riscv.c:1121">
P1: Memory leak on reboot: memory pools `block_mp` and `block_ir_mp` are created unconditionally without checking if they already exist. Unlike other resources that have `/* check for reboot */` guards, these pools are always recreated, leaking the previous allocations on reboot.</violation>
<violation number="4" location="src/riscv.c:1133">
P1: Memory leak on reboot (JIT mode): `jit_state` and `block_cache` are created unconditionally without reboot checks. This could cause memory leaks on reboot and may be related to the JIT segfault mentioned in the PR description.</violation>
</file>
<file name="src/riscv.h">
<violation number="1" location="src/riscv.h:285">
P2: These SBI reset type/reason macros use very generic names that may collide with system headers or other code. Consider using the established `SBI_RST_` prefix for consistency with existing naming conventions (e.g., `SBI_RST_TYPE_SHUTDOWN`, `SBI_RST_TYPE_COLD_REBOOT`, `SBI_RST_REASON_NONE`).</violation>
</file>
Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
| rv_halt(rv); | ||
| rv_set_reg(rv, rv_reg_a0, SBI_SUCCESS); | ||
| rv_set_reg(rv, rv_reg_a1, 0); | ||
| rv_log_info("System reset: type=%s, reason=%s", sbi_rst_type_str(a0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P2: Passing NULL to %s format specifier is undefined behavior. If a0 or a1 contains an unrecognized value, the helper functions return NULL which is then passed to rv_log_info. Consider returning a fallback string like "unknown" instead of NULL.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/syscall.c, line 541:
<comment>Passing `NULL` to `%s` format specifier is undefined behavior. If `a0` or `a1` contains an unrecognized value, the helper functions return `NULL` which is then passed to `rv_log_info`. Consider returning a fallback string like `"unknown"` instead of `NULL`.</comment>
<file context>
@@ -505,10 +538,31 @@ static void syscall_sbi_rst(riscv_t *rv)
- rv_halt(rv);
- rv_set_reg(rv, rv_reg_a0, SBI_SUCCESS);
- rv_set_reg(rv, rv_reg_a1, 0);
+ rv_log_info("System reset: type=%s, reason=%s", sbi_rst_type_str(a0),
+ sbi_rst_reason_str(a1));
+ if (a0 == SHUTDOWN) {
</file context>
| * | ||
| * The setjmp point is in rv_step() in src/emulate.c. | ||
| */ | ||
| longjmp(rv->reboot_jmp, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P1: Missing error handling for unknown reset types. If a0 is not SHUTDOWN, COLD_REBOOT, or WARM_REBOOT, the code silently falls through without setting return values. Add an else clause to return SBI_ERR_NOT_SUPPORTED.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/syscall.c, line 554:
<comment>Missing error handling for unknown reset types. If `a0` is not SHUTDOWN, COLD_REBOOT, or WARM_REBOOT, the code silently falls through without setting return values. Add an `else` clause to return `SBI_ERR_NOT_SUPPORTED`.</comment>
<file context>
@@ -505,10 +538,31 @@ static void syscall_sbi_rst(riscv_t *rv)
+ *
+ * The setjmp point is in rv_step() in src/emulate.c.
+ */
+ longjmp(rv->reboot_jmp, 1);
+ } else if (a0 == WARM_REBOOT) { /* reset halt only, echo "warm" >
+ /sys/kernel/reboot/mode to set */
</file context>
| vblk_opts[vblk_opt_idx++] = opt; | ||
| opt = strtok(NULL, ","); | ||
| } | ||
| char *vblk_device = vblk_opts[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P2: Regression: Tilde (~) expansion for vblk device paths was removed. Users who specify paths like ~/disk.img will now have those paths treated literally instead of expanding to their home directory, breaking existing configurations.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/riscv.c, line 1082:
<comment>Regression: Tilde (`~`) expansion for vblk device paths was removed. Users who specify paths like `~/disk.img` will now have those paths treated literally instead of expanding to their home directory, breaking existing configurations.</comment>
<file context>
@@ -1033,53 +806,374 @@ void rv_reset(riscv_t *rv, riscv_word_t pc)
+ vblk_opts[vblk_opt_idx++] = opt;
+ opt = strtok(NULL, ",");
+ }
+ char *vblk_device = vblk_opts[0];
+ char *vblk_readonly = vblk_opts[1];
+
</file context>
| for (int i = 0; i < attr->vblk_cnt; i++) { | ||
| /* Currently, only used for block image path and permission */ | ||
| #define MAX_OPTS 2 | ||
| char *vblk_device_str = attr->data.system.vblk_device[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P2: Missing validation for empty vblk device path. The original code checked if (!vblk_device_str[0]) to prevent empty disk paths. Without this check, an empty path would result in vblk_device being NULL after strtok processing, potentially causing issues in virtio_blk_init().
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/riscv.c, line 1070:
<comment>Missing validation for empty vblk device path. The original code checked `if (!vblk_device_str[0])` to prevent empty disk paths. Without this check, an empty path would result in `vblk_device` being NULL after strtok processing, potentially causing issues in `virtio_blk_init()`.</comment>
<file context>
@@ -1033,53 +806,374 @@ void rv_reset(riscv_t *rv, riscv_word_t pc)
+ for (int i = 0; i < attr->vblk_cnt; i++) {
+/* Currently, only used for block image path and permission */
+#define MAX_OPTS 2
+ char *vblk_device_str = attr->data.system.vblk_device[i];
+ char *vblk_opts[MAX_OPTS] = {NULL};
+ int vblk_opt_idx = 0;
</file context>
| char *vblk_device_str = attr->data.system.vblk_device[i]; | |
| char *vblk_device_str = attr->data.system.vblk_device[i]; | |
| if (!vblk_device_str || !vblk_device_str[0]) { | |
| rv_log_error("Disk path cannot be empty"); | |
| exit(EXIT_FAILURE); | |
| } |
| } | ||
| #else | ||
| INIT_LIST_HEAD(&rv->block_list); | ||
| rv->jit_state = jit_state_init(CODE_CACHE_SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P1: Memory leak on reboot (JIT mode): jit_state and block_cache are created unconditionally without reboot checks. This could cause memory leaks on reboot and may be related to the JIT segfault mentioned in the PR description.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/riscv.c, line 1133:
<comment>Memory leak on reboot (JIT mode): `jit_state` and `block_cache` are created unconditionally without reboot checks. This could cause memory leaks on reboot and may be related to the JIT segfault mentioned in the PR description.</comment>
<file context>
@@ -1033,53 +806,374 @@ void rv_reset(riscv_t *rv, riscv_word_t pc)
+ }
+#else
+ INIT_LIST_HEAD(&rv->block_list);
+ rv->jit_state = jit_state_init(CODE_CACHE_SIZE);
+ if (!rv->jit_state) {
+ rv_log_fatal("Failed to initialize JIT state");
</file context>
| rv->block_mp = mpool_create(sizeof(block_t) << BLOCK_MAP_CAPACITY_BITS, | ||
| sizeof(block_t)); | ||
| rv->block_ir_mp = mpool_create( | ||
| sizeof(rv_insn_t) << BLOCK_IR_MAP_CAPACITY_BITS, sizeof(rv_insn_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P1: Memory leak on reboot: memory pools block_mp and block_ir_mp are created unconditionally without checking if they already exist. Unlike other resources that have /* check for reboot */ guards, these pools are always recreated, leaking the previous allocations on reboot.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/riscv.c, line 1121:
<comment>Memory leak on reboot: memory pools `block_mp` and `block_ir_mp` are created unconditionally without checking if they already exist. Unlike other resources that have `/* check for reboot */` guards, these pools are always recreated, leaking the previous allocations on reboot.</comment>
<file context>
@@ -1033,53 +806,374 @@ void rv_reset(riscv_t *rv, riscv_word_t pc)
+#endif /* !RV32_HAS(SYSTEM) || (RV32_HAS(SYSTEM) && RV32_HAS(ELF_LOADER)) */
+
+ /* create block and IRs memory pool */
+ rv->block_mp = mpool_create(sizeof(block_t) << BLOCK_MAP_CAPACITY_BITS,
+ sizeof(block_t));
+ rv->block_ir_mp = mpool_create(
</file context>
| rv->block_mp = mpool_create(sizeof(block_t) << BLOCK_MAP_CAPACITY_BITS, | |
| sizeof(block_t)); | |
| rv->block_ir_mp = mpool_create( | |
| sizeof(rv_insn_t) << BLOCK_IR_MAP_CAPACITY_BITS, sizeof(rv_insn_t)); | |
| if (!rv->block_mp) { /* check for reboot */ | |
| rv->block_mp = mpool_create(sizeof(block_t) << BLOCK_MAP_CAPACITY_BITS, | |
| sizeof(block_t)); | |
| rv->block_ir_mp = mpool_create( | |
| sizeof(rv_insn_t) << BLOCK_IR_MAP_CAPACITY_BITS, sizeof(rv_insn_t)); | |
| } |
| /* Allows the supervisor to request system-level reboot or shutdown. */ | ||
| #define SBI_EID_RST 0x53525354 | ||
| #define SBI_RST_SYSTEM_RESET 0 | ||
| #define SHUTDOWN 0x0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P2: These SBI reset type/reason macros use very generic names that may collide with system headers or other code. Consider using the established SBI_RST_ prefix for consistency with existing naming conventions (e.g., SBI_RST_TYPE_SHUTDOWN, SBI_RST_TYPE_COLD_REBOOT, SBI_RST_REASON_NONE).
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/riscv.h, line 285:
<comment>These SBI reset type/reason macros use very generic names that may collide with system headers or other code. Consider using the established `SBI_RST_` prefix for consistency with existing naming conventions (e.g., `SBI_RST_TYPE_SHUTDOWN`, `SBI_RST_TYPE_COLD_REBOOT`, `SBI_RST_REASON_NONE`).</comment>
<file context>
@@ -282,6 +282,11 @@ enum TRAP_CODE {
/* Allows the supervisor to request system-level reboot or shutdown. */
#define SBI_EID_RST 0x53525354
#define SBI_RST_SYSTEM_RESET 0
+#define SHUTDOWN 0x0
+#define COLD_REBOOT 0x1
+#define WARM_REBOOT 0x2
</file context>
jserv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmarks
Details
| Benchmark suite | Current: ba3011f | Previous: 577e653 | Ratio |
|---|---|---|---|
Dhrystone |
1305 Average DMIPS over 10 runs |
1335 Average DMIPS over 10 runs |
1.02 |
Coremark |
956.404 Average iterations/sec over 10 runs |
967.171 Average iterations/sec over 10 runs |
1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
Description
Previously, the SBI RST extension did not distinguish between reboot and shutdown type. When a userspace reboot command was issued, the hart was incorrectly halted as if it were a shutdown.
These changes fix the issue by properly detecting and handling two reboot types: cold and warm. Each type has its own handler: rv_cold_reboot() and rv_warm_reboot(). Since the initial system power-on is treated as a cold reboot, the initialization code has been refactored to share logic with the reboot path, adhering to the DRY principle. Additionally, device reset helper functions (plic_reset(), u8250_reset()) are introduced to support peripheral reinitialization during reboot.
Key changes:
Rename rv_reset() to rv_cold_reboot() - Full system reset including processor, memory, and all peripherals. The initial power-on is treated as a cold reboot.
Introduce rv_warm_reboot() - Faster reboot that only resets processor and memory, skipping peripheral reinitialization. Can be triggered via echo "warm" > /sys/kernel/reboot/mode in guestOS.
Refactor boot image loading - Extract load_boot_images() helper to load kernel, DTB, and initrd, reducing code duplication between cold and warm reboot paths.
Introduce rv_reset_hart() - Static helper function to reset only hart state (GPRs, CSRs, PC), shared by both reboot modes.
Add reboot-safe resource management - Add "check for reboot" comments throughout initialization to reuse already-allocated resources (memory, fd_map, PLIC, UART, vblk, block_map) instead of re-allocating.
Use calloc for vblk/disk arrays - Changed from malloc to calloc so pointers are initialized to NULL, simplifying reboot checks.
Use setjmp/longjmp for clean reboot - Reboot rewrites all registers, causing call stack values to become stale. setjmp in rv_step() establishes a return point, longjmp after reboot provides a clean call stack.
Add plic_reset() and u8250_reset() - New device reset functions to reinitialize state without free/realloc.
Add sbi_rst_type_str() and sbi_rst_reason_str() - Helper functions for human-readable SBI reset type/reason in dmesg.
Testing
# rebootExpectation
The guestOS is rebooted and the behavior is similar as the demo video.
Demo
reboot-demo.mp4
The guestOS userspace applications runs smoothly even after reboot. This demo covers 4 phases:
power-on,cold reboot,warm reboot,poweroff.Bugs / WIP
Summary by cubic
Fixes system emulation reboot by correctly handling SBI reset types and adding cold and warm reboot paths. Reboots no longer halt the hart and now return cleanly to the run loop.
New Features
Refactors
Written for commit ba3011f. Summary will update automatically on new commits.