Add `run_scoped` method to run closures and conditionally perform partial resets #291

kryesh · 2025-10-12T03:57:32Z

This PR adds a "CheckPoint" struct that can be used to save a position for a Bump that can be later reset to, allowing for partial resets of the allocator.

I have a use case where I build a collection from many complex structs that use a bump allocator, however these structs must pass a filter before being added to the collection, In cases where very few events make it through the filter I found that the bump allocator would grow to be massive due to leaked allocations from rejected structs.
After testing using a second bump allocator as scratch space that resolved the memory leak issue however caused a ~20% performance hit due to the extra memory copies, this PR solves the problem by taking a checkpoint before building the complex struct, then performing a partial reset if the struct is rejected.

Not sure if this kind of conditional logic is used elsewhere but figured I'd make a PR anyway in case it's useful, happy to update docs etc if needed.

fitzgen

Thanks for the PR!

I do want to add a similar API to bumpalo, but I think that the exact details here are not what we want. I think the API as proposed here is not quite powerful enough to satisfy most uses cases that people have for checkpoints, particularly the ability to do stuff like the following:

let mut bump = Bump::new();

let x = bump.alloc(5);

let checkpoint = bump.checkpoint();

let y = bump.alloc(10);
foo(x, y); // okay

drop(checkpoint); // or `bump.reset_to_checkpoint(checkpoint)` or whatever...

bar(x); // still okay

// bar(y); // NOT OKAY, should cause borrow checker errors

The API in this PR will not allow x to continue to be used across sub-arenas/checkpoints. And to be fair, I don't the exact pseudocode above can be implemented by any safe+sound Rust API because without changing some API details we have the conflicting requirements that (a) the bump has live shared borrows from earlier allocations and (b) we must have exclusive access at reset points to act as a barrier and ensure that none of the allocations within the nested scope/checkpoint remain live. The only way I am aware of resolving this is to separate the underlying storage and the allocator/checkpoint/scope so that the allocator/checkpoint/scope has a lifetime and borrow of the storage and then you can make sub-allocators/checkpoints/scopes with sub-lifetimes. And each allocation would have the allocator's/checkpoint's/scope's lifetime, not the lifetime of the underlying storage.

There is some existing discussion of this feature in #105, and a proposed API, although it would be a breaking change. It might be possible to come up with something that is semver compatible though, even if not quite as nice API-wise. Anyways, I suggest reading up on that thread and grokking the API sketch in there before continuing any further down this line of investigation.

…fetimes for temporary allocations that may be reset

kryesh · 2025-10-22T01:27:45Z

Thanks for the feedback!
I had a read through #105 and came up with run_scoped as a minimal adaptation of my current code.

I made checkpoint and the associated methods private, then added a run_scoped method that provides temporary access to the allocator. If the function returns Some(T) then the result can persist on the Bump, but if the function returns None then the checkpoint is restored and the position of the allocator is reset.

The API could definitely be polished up a bit, maybe adding a Result variant as well so it's not just Option but I think it addresses the potential issues with the borrow checker.

As for functionality, the run_scoped method was sufficient for my use case at least - you can see the update for the changed api here: https://codeberg.org/Kryesh/crystalline/commit/da364cd0781bfbb88632ce1ec8c4e1a60a43737f?style=split&whitespace=show-all
My app probably shouldn't be used as a reference however since I do bunch of casts to 'static

fitzgen

Thanks for your patience, I've been traveling a bunch lately and haven't had time to sit down with this PR until just now.

I think the maybe-reset-maybe-not behavior is fairly niche, and not an API I want to support long-term. That said, I think it would be a failing of this crate if that functionality were impossible to build on top of bumpalo's public APIs.

So, what do you think of the following plan?

Add a pub struct RawCheckpoint { ... } type
Add pub fn Bump::raw_checkpoint(&self) -> RawCheckpoint and pub unsafe fn Bump::reset_to_raw_checkpoint(&self, checkpoint: RawCheckpoint) methods
Add pub fn Bump::with_scope<F, T>(&self, f: F) -> T where F: for<'a> FnOnce(&'a Bump) -> T

The first two items allow you to implement run_scoped with its dynamic-reset behavior on top of this crate, the third item is the typical scoped usage described in #105 and would be implemented on top of the first two as well.

Aside: I think you might be able to improve the ergonomics of your run_scoped method a little bit by defining a custom type with a default type parameter, instead of using Option, so that you don't ever have to type out Option::<()>::None:

pub enum Scope<T = ()> {
    Reset,
    Advance(T),
}

And then Scope::Reset shouldn't need an explicit turbofish.

kryesh · 2025-11-22T22:45:28Z

No problem,

I think that approach makes sense, sounds like the first 2 steps are pretty close to my original PR but with the reset function marked as unsafe - is that right?

If so I'll update the PR to match, I won't look at implementing Bump::with_scope though since that approach looks like it needs something more complex that just borrowing &self like I've got here.

Thanks for the tip on run_scoped as well! will definitely use that

fitzgen · 2025-11-24T19:15:36Z

I think that approach makes sense, sounds like the first 2 steps are pretty close to my original PR but with the reset function marked as unsafe - is that right?

Yes, and with the reset method taking &self instead of &mut self, so that it can be a proper unsafe building block for creating custom scoped behaviors like your desired maybe-reset-maybe-not.

fitzgen

Thanks! Some comments inline below.

Can you add a quickcheck test that uses this new API? See tests/all/quickchecks.rs. I'm thinking it should look something like this:

#[derive(Arbitrary)]
#[cfg(feature = "allocator_api")]
enum RawCheckpointOp {
    TakeCheckpoint,
    ResetToCheckpoint,
    DropCheckpoint,

    Alloc(u8, usize),
    Realloc(u8, usize),
    Dealloc,

    Reset,
}

#[cfg(feature = "allocator_api")]
fn stress_raw_checkpoints(ops: Vec<RawCheckpointOp>) {
    use std::alloc::{Allocator, Layout};

    const MAX_SIZE: usize = 100;
    const MAX_OPS: usize = 100;

    let mut bump = Bump::new();
    let mut live_allocs: Vec<(u8, NonNull<[u8]>, Layout)> = Vec::new();
    let mut checkpoints: Vec<(usize, RawCheckpoint)> = Vec::new();

    let check_alloc = |expected_byte: u8, alloc: NonNull<[u8]>| {
        let alloc = unsafe { alloc.as_ref() };
        for actual_byte in alloc {
            assert_eq!(actual_byte, expected_byte);
        }
    };

    ops.truncate(MAX_OPS);
    for op in ops {
        use RawCheckpointOp::*;
        match op {
            TakeCheckpoint => {
                checkpoints.push(bump.raw_checkpoint());
            }

            ResetToCheckpoint => {
                if let Some((len, checkpoint)) = checkpoints.pop() {
                    for (byte, alloc, _layout) in live_allocs.get(len..) {
                        check_alloc(byte, alloc);
                    }
                    live_allocs.truncate(len);
                    unsafe {
                        bump.reset_to_raw_checkpoint(checkpoint);
                    }
                }
            }

            DropCheckpoint => {
                let _ = checkpoints.pop();
            }

            Alloc(byte, size) => {
                let size = std::cmp::min(size, MAX_SIZE);
                let layout = Layout::array::<u8>(size).unwrap();
                let ptr = bump.allocate(layout).unwrap();
                unsafe {
                    ptr.cast::<u8>().write_bytes(byte, ptr.len());
                }
                live_allocs.push((byte, ptr, layout));
            }

            Realloc(new_byte, new_size) => {
                if let Some((old_byte, old_ptr, old_layout)) = live_allocs.pop() {
                    check_alloc(old_byte, old_ptr);

                    let new_size = std::cmp::min(new_size, MAX_SIZE);
                    let new_layout = Layout::array::<u8>(new_size).unwrap();

                    let new_ptr = if new_size > old_layout.size() {
                        unsafe { bump.grow(old_ptr, old_layout, new_layout).unwrap() }
                    } else {
                        unsafe { bump.shrink(old_ptr, old_layout, new_layout).unwrap() }
                    };

                    unsafe {
                        new_ptr.cast::<u8>().write_bytes(new_byte, new_ptr.len());
                    }
                    live_allocs.push((new_byte, new_ptr, new_layout));
                }
            }

            Dealloc => {
                if let Some((byte, ptr, layout)) = live_allocs.pop() {
                    check_alloc(byte, ptr);

                    unsafe {
                        bump.deallocate(ptr, layout);
                    }
                }
            }

            Reset => {
                for (byte, ptr, _layout) in live_allocs.drain(..) {
                    check_alloc(byte, ptr);
                }

                checkpoints.clear();
                bump.reset();
            }
        }
    }

    // Reset any extant checkpoints, checking its in-scope allocations.
    for (len, checkpoint) in checkpoints {
        for (byte, alloc, _layout) in live_allocs.get(len..) {
            check_alloc(byte, alloc);
        }
        live_allocs.truncate(len);
        unsafe {
            bump.reset_to_raw_checkpoint(checkpoint);
        }
    }

    // Check any extant allocations.
    for (byte, ptr, _layout) in live_allocs {
        check_alloc(byte, ptr);
    }
}

This should cover the realloc-to-before-the-checkpoint scenario I mention inline below, FWIW.

Thanks again!

fitzgen · 2025-12-01T19:43:08Z

src/lib.rs

+pub struct RawCheckpoint {
+    chunk_data: NonNull<u8>,
+    size: usize,
+    ptr: NonNull<u8>,
+}


Is there a benefit to storing these chunk_data and size fields instead of just a chunk: NonNull<ChunkFooter> field? The latter will result in smaller RawCheckpoints and less data to juggle/shepherd around and less things to check in RawCheckpoint::matches.

That makes sense - I'll update it

fitzgen · 2025-12-01T19:52:05Z

src/lib.rs

-            let chunk = unsafe { self.current_chunk_footer.get().as_ref() };
-            (chunk.data, chunk.layout.size(), chunk.ptr.get())
-        };
+    /// This will never free chunks, it will only change the pointer position of the current chunk.


This method should free any chunks beyond the checkpoint's chunk, or else the following usage pattern could "leak" when it really should not:

let mut bump = Bump::new(); loop { let checkpoint = bump.raw_checkpoint(); foo(&bump); unsafe { bump.reset_to_raw_checkpoint(checkpoint); } }

I think there's some nuance for this one - if checkpoints are used in a hot loop and trigger a new chunk allocation, then future iterations are also likely to trigger an allocation as well if the new chunk is freed.
Maybe a middle ground would be appropriate? Only keep the largest new chunk, if there have been multiple growths then free the intermediate ones since we know they won't be used for anything.
A pathological example of this would be creating a checkpoint for a Bump with no capacity since that would always trigger a new allocation after resetting to the checkpoint

fitzgen · 2025-12-01T20:25:03Z

src/lib.rs

+    /// # Safety
+    /// **This will invalidate references in a way that cannot be validated by the borrow checker!**
+    ///
+    /// Any allocation made between checkpoint creation and this function being called may be overwritten
+    /// by future allocations to this `Bump`. It is your responsibility to ensure that these allocations
+    /// are never reused.


I think we can avoid talking about how existing allocations made after the checkpoint might be overwritten, that is an implementation detail, and instead just focus on what the user must to do maintain safety (which is to never to access any allocations made between the checkpoint's creation and this call to reset_to_raw_checkpoint again).

Also we should document that you must not have called Bump::reset between creating the checkpoint and calling reset_to_raw_checkpoint.

Also we should document that you must reset to checkpoints in a LIFO-esque manner (LIFO but you're allowed to drop checkpoints rather than reset to them: that is, once you reset to a checkpoint A you can never reset to another checkpoint B that was created after A but before reset_to_raw_checkpoint(A)).

Also we should note that the given checkpoint must be associated with this Bump (i.e. returned by a call to raw_checkpoint on this Bump). And we should also debug_assert! that property by walking the chunk linked list and checking that we find this checkpoint's chunk. And we should also debug assert that this checkpoint's ptr is (a) within its chunk and (b) is greater than or equal to the current ptr.

Hmm, but now that I think about it, I think (b) might not actually hold 100% of the time. Consider the sequence:

create Bump

allocate a "relatively big" allocation

take a checkpoint

realloc the allocation such that it shrinks and we actually back up our bump's pointer

reset to the checkpoint

In this case, the bump's pointer could actually now be greater than the checkpoint's pointer. So I think we still want to debug-assert (a) but instead of (b) we want to only update the bump's pointer to the checkpoint's pointer when the checkpoint's pointer is greater than the bump's pointer.

Let's add a test that covers this case as well.

Finally, can you add an example to this doc comment? Something like this:

let bump = Bump::new(); let before_a = bump.raw_checkpoint(); let a = bump.alloc('a'); assert_eq!(*a, 'a'); // okay { let before_b = bump.raw_checkpoint(); let b = bump.alloc('b'); assert_eq!(*a, 'a'); // okay assert_eq!(*b, 'b'); // okay { let before_c = bump.raw_checkpoint(); let c = bump.alloc('c'); assert_eq!(*a, 'a'); // okay assert_eq!(*b, 'b'); // okay assert_eq!(*c, 'c'); // okay unsafe { bump.reset_to_raw_checkpoint(before_c); } assert_eq!(*a, 'a'); // still okay assert_eq!(*b, 'b'); // still okay // Not okay! This would be a use-after-free bug! // // assert_eq!(*c, 'c'); } unsafe { bump.reset_to_raw_checkpoint(before_b); } assert_eq!(*a, 'a'); // still okay // Not okay! This would be a use-after-free bug! // // assert_eq!(*b, 'b'); } unsafe { bump.reset_to_raw_checkpoint(before_a); } // Not okay! This would be a use-after-free bug! // // assert_eq!(*a, 'a');

…ng internal fields to compare

…_list` to fit checkpoint use case, update docs

kryesh · 2025-12-03T08:52:26Z

Have updated to address some of the feedback, still need to add tests though - not familiar with quickcheck so will need to figure that out. Probably won't have time to look at it until the weekend

Start implementing checkpoint mechanism for Bump

907e5fb

fitzgen reviewed Oct 21, 2025

View reviewed changes

Make checkpoint struct private, add run_scoped method to control li…

349e939

…fetimes for temporary allocations that may be reset

kryesh added 4 commits October 22, 2025 15:26

Clean up run_scoped implementation

a618e75

Fix lifetime annotations for run_scoped

c44b6e2

Update docs

88123e3

Remove CheckPoint struct because it was unnecessary

cb238eb

kryesh requested a review from fitzgen October 25, 2025 03:44

Reduce scope test iterations

2bf0423

kryesh changed the title ~~Add "checkpoint" to support partial arena resets~~ Add run_scoped method to run closures and conditonally perform partial resets Oct 25, 2025

kryesh changed the title ~~Add run_scoped method to run closures and conditonally perform partial resets~~ Add run_scoped method to run closures and conditionally perform partial resets Nov 11, 2025

fitzgen reviewed Nov 19, 2025

View reviewed changes

Revert run_scoped and implement proposed API from fitzgen#291

1faf4e2

fitzgen reviewed Dec 1, 2025

View reviewed changes

kryesh added 3 commits December 3, 2025 18:12

Change RawCheckpoint to use a pointer to the chunk instead of stori…

4aef372

…ng internal fields to compare

Free intermediate chunks on checkpoint reset, refactor `dealloc_chunk…

b581ae3

…_list` to fit checkpoint use case, update docs

Add example to doc comment

4a324ef

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add `run_scoped` method to run closures and conditionally perform partial resets #291

Add `run_scoped` method to run closures and conditionally perform partial resets #291

kryesh commented Oct 12, 2025

Uh oh!

fitzgen left a comment

Uh oh!

kryesh commented Oct 22, 2025 •

edited

Loading

Uh oh!

fitzgen left a comment

Uh oh!

kryesh commented Nov 22, 2025

Uh oh!

fitzgen commented Nov 24, 2025

Uh oh!

fitzgen left a comment

Uh oh!

fitzgen Dec 1, 2025

Uh oh!

kryesh Dec 3, 2025

Uh oh!

fitzgen Dec 1, 2025

Uh oh!

kryesh Dec 3, 2025 •

edited

Loading

Uh oh!

fitzgen Dec 1, 2025

Uh oh!

kryesh commented Dec 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add run_scoped method to run closures and conditionally perform partial resets #291

Are you sure you want to change the base?

Add run_scoped method to run closures and conditionally perform partial resets #291

Conversation

kryesh commented Oct 12, 2025

Uh oh!

fitzgen left a comment

Choose a reason for hiding this comment

Uh oh!

kryesh commented Oct 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fitzgen left a comment

Choose a reason for hiding this comment

Uh oh!

kryesh commented Nov 22, 2025

Uh oh!

fitzgen commented Nov 24, 2025

Uh oh!

fitzgen left a comment

Choose a reason for hiding this comment

Uh oh!

fitzgen Dec 1, 2025

Choose a reason for hiding this comment

Uh oh!

kryesh Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

fitzgen Dec 1, 2025

Choose a reason for hiding this comment

Uh oh!

kryesh Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fitzgen Dec 1, 2025

Choose a reason for hiding this comment

Uh oh!

kryesh commented Dec 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add `run_scoped` method to run closures and conditionally perform partial resets #291

Add `run_scoped` method to run closures and conditionally perform partial resets #291

kryesh commented Oct 22, 2025 •

edited

Loading

kryesh Dec 3, 2025 •

edited

Loading