WARNING: THIS SITE IS A MIRROR OF GITHUB.COM / IT CANNOT LOGIN OR REGISTER ACCOUNTS / THE CONTENTS ARE PROVIDED AS-IS / THIS SITE ASSUMES NO RESPONSIBILITY FOR ANY DISPLAYED CONTENT OR LINKS / IF YOU FOUND SOMETHING MAY NOT GOOD FOR EVERYONE, CONTACT ADMIN AT ilovescratch@foxmail.com
Skip to content

Conversation

@kryesh
Copy link

@kryesh kryesh commented Oct 12, 2025

This PR adds a "CheckPoint" struct that can be used to save a position for a Bump that can be later reset to, allowing for partial resets of the allocator.

I have a use case where I build a collection from many complex structs that use a bump allocator, however these structs must pass a filter before being added to the collection, In cases where very few events make it through the filter I found that the bump allocator would grow to be massive due to leaked allocations from rejected structs.
After testing using a second bump allocator as scratch space that resolved the memory leak issue however caused a ~20% performance hit due to the extra memory copies, this PR solves the problem by taking a checkpoint before building the complex struct, then performing a partial reset if the struct is rejected.

Not sure if this kind of conditional logic is used elsewhere but figured I'd make a PR anyway in case it's useful, happy to update docs etc if needed.

Copy link
Owner

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

I do want to add a similar API to bumpalo, but I think that the exact details here are not what we want. I think the API as proposed here is not quite powerful enough to satisfy most uses cases that people have for checkpoints, particularly the ability to do stuff like the following:

let mut bump = Bump::new();

let x = bump.alloc(5);

let checkpoint = bump.checkpoint();

let y = bump.alloc(10);
foo(x, y); // okay

drop(checkpoint); // or `bump.reset_to_checkpoint(checkpoint)` or whatever...

bar(x); // still okay

// bar(y); // NOT OKAY, should cause borrow checker errors

The API in this PR will not allow x to continue to be used across sub-arenas/checkpoints. And to be fair, I don't the exact pseudocode above can be implemented by any safe+sound Rust API because without changing some API details we have the conflicting requirements that (a) the bump has live shared borrows from earlier allocations and (b) we must have exclusive access at reset points to act as a barrier and ensure that none of the allocations within the nested scope/checkpoint remain live. The only way I am aware of resolving this is to separate the underlying storage and the allocator/checkpoint/scope so that the allocator/checkpoint/scope has a lifetime and borrow of the storage and then you can make sub-allocators/checkpoints/scopes with sub-lifetimes. And each allocation would have the allocator's/checkpoint's/scope's lifetime, not the lifetime of the underlying storage.

There is some existing discussion of this feature in #105, and a proposed API, although it would be a breaking change. It might be possible to come up with something that is semver compatible though, even if not quite as nice API-wise. Anyways, I suggest reading up on that thread and grokking the API sketch in there before continuing any further down this line of investigation.

…fetimes for temporary allocations that may be reset
@kryesh
Copy link
Author

kryesh commented Oct 22, 2025

Thanks for the feedback!
I had a read through #105 and came up with run_scoped as a minimal adaptation of my current code.

I made checkpoint and the associated methods private, then added a run_scoped method that provides temporary access to the allocator. If the function returns Some(T) then the result can persist on the Bump, but if the function returns None then the checkpoint is restored and the position of the allocator is reset.

The API could definitely be polished up a bit, maybe adding a Result variant as well so it's not just Option but I think it addresses the potential issues with the borrow checker.

As for functionality, the run_scoped method was sufficient for my use case at least - you can see the update for the changed api here: https://codeberg.org/Kryesh/crystalline/commit/da364cd0781bfbb88632ce1ec8c4e1a60a43737f?style=split&whitespace=show-all
My app probably shouldn't be used as a reference however since I do bunch of casts to 'static

@kryesh kryesh requested a review from fitzgen October 25, 2025 03:44
@kryesh kryesh changed the title Add "checkpoint" to support partial arena resets Add run_scoped method to run closures and conditonally perform partial resets Oct 25, 2025
@kryesh kryesh changed the title Add run_scoped method to run closures and conditonally perform partial resets Add run_scoped method to run closures and conditionally perform partial resets Nov 11, 2025
Copy link
Owner

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience, I've been traveling a bunch lately and haven't had time to sit down with this PR until just now.

I think the maybe-reset-maybe-not behavior is fairly niche, and not an API I want to support long-term. That said, I think it would be a failing of this crate if that functionality were impossible to build on top of bumpalo's public APIs.

So, what do you think of the following plan?

  • Add a pub struct RawCheckpoint { ... } type
  • Add pub fn Bump::raw_checkpoint(&self) -> RawCheckpoint and pub unsafe fn Bump::reset_to_raw_checkpoint(&self, checkpoint: RawCheckpoint) methods
  • Add pub fn Bump::with_scope<F, T>(&self, f: F) -> T where F: for<'a> FnOnce(&'a Bump) -> T

The first two items allow you to implement run_scoped with its dynamic-reset behavior on top of this crate, the third item is the typical scoped usage described in #105 and would be implemented on top of the first two as well.


Aside: I think you might be able to improve the ergonomics of your run_scoped method a little bit by defining a custom type with a default type parameter, instead of using Option, so that you don't ever have to type out Option::<()>::None:

pub enum Scope<T = ()> {
    Reset,
    Advance(T),
}

And then Scope::Reset shouldn't need an explicit turbofish.

@kryesh
Copy link
Author

kryesh commented Nov 22, 2025

No problem,

I think that approach makes sense, sounds like the first 2 steps are pretty close to my original PR but with the reset function marked as unsafe - is that right?

If so I'll update the PR to match, I won't look at implementing Bump::with_scope though since that approach looks like it needs something more complex that just borrowing &self like I've got here.

Thanks for the tip on run_scoped as well! will definitely use that

@fitzgen
Copy link
Owner

fitzgen commented Nov 24, 2025

I think that approach makes sense, sounds like the first 2 steps are pretty close to my original PR but with the reset function marked as unsafe - is that right?

Yes, and with the reset method taking &self instead of &mut self, so that it can be a proper unsafe building block for creating custom scoped behaviors like your desired maybe-reset-maybe-not.

Copy link
Owner

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Some comments inline below.

Can you add a quickcheck test that uses this new API? See tests/all/quickchecks.rs. I'm thinking it should look something like this:

#[derive(Arbitrary)]
#[cfg(feature = "allocator_api")]
enum RawCheckpointOp {
    TakeCheckpoint,
    ResetToCheckpoint,
    DropCheckpoint,

    Alloc(u8, usize),
    Realloc(u8, usize),
    Dealloc,

    Reset,
}

#[cfg(feature = "allocator_api")]
fn stress_raw_checkpoints(ops: Vec<RawCheckpointOp>) {
    use std::alloc::{Allocator, Layout};

    const MAX_SIZE: usize = 100;
    const MAX_OPS: usize = 100;

    let mut bump = Bump::new();
    let mut live_allocs: Vec<(u8, NonNull<[u8]>, Layout)> = Vec::new();
    let mut checkpoints: Vec<(usize, RawCheckpoint)> = Vec::new();

    let check_alloc = |expected_byte: u8, alloc: NonNull<[u8]>| {
        let alloc = unsafe { alloc.as_ref() };
        for actual_byte in alloc {
            assert_eq!(actual_byte, expected_byte);
        }
    };

    ops.truncate(MAX_OPS);
    for op in ops {
        use RawCheckpointOp::*;
        match op {
            TakeCheckpoint => {
                checkpoints.push(bump.raw_checkpoint());
            }

            ResetToCheckpoint => {
                if let Some((len, checkpoint)) = checkpoints.pop() {
                    for (byte, alloc, _layout) in live_allocs.get(len..) {
                        check_alloc(byte, alloc);
                    }
                    live_allocs.truncate(len);
                    unsafe {
                        bump.reset_to_raw_checkpoint(checkpoint);
                    }
                }
            }

            DropCheckpoint => {
                let _ = checkpoints.pop();
            }

            Alloc(byte, size) => {
                let size = std::cmp::min(size, MAX_SIZE);
                let layout = Layout::array::<u8>(size).unwrap();
                let ptr = bump.allocate(layout).unwrap();
                unsafe {
                    ptr.cast::<u8>().write_bytes(byte, ptr.len());
                }
                live_allocs.push((byte, ptr, layout));
            }

            Realloc(new_byte, new_size) => {
                if let Some((old_byte, old_ptr, old_layout)) = live_allocs.pop() {
                    check_alloc(old_byte, old_ptr);

                    let new_size = std::cmp::min(new_size, MAX_SIZE);
                    let new_layout = Layout::array::<u8>(new_size).unwrap();

                    let new_ptr = if new_size > old_layout.size() {
                        unsafe { bump.grow(old_ptr, old_layout, new_layout).unwrap() }
                    } else {
                        unsafe { bump.shrink(old_ptr, old_layout, new_layout).unwrap() }
                    };

                    unsafe {
                        new_ptr.cast::<u8>().write_bytes(new_byte, new_ptr.len());
                    }
                    live_allocs.push((new_byte, new_ptr, new_layout));
                }
            }

            Dealloc => {
                if let Some((byte, ptr, layout)) = live_allocs.pop() {
                    check_alloc(byte, ptr);

                    unsafe {
                        bump.deallocate(ptr, layout);
                    }
                }
            }

            Reset => {
                for (byte, ptr, _layout) in live_allocs.drain(..) {
                    check_alloc(byte, ptr);
                }

                checkpoints.clear();
                bump.reset();
            }
        }
    }

    // Reset any extant checkpoints, checking its in-scope allocations.
    for (len, checkpoint) in checkpoints {
        for (byte, alloc, _layout) in live_allocs.get(len..) {
            check_alloc(byte, alloc);
        }
        live_allocs.truncate(len);
        unsafe {
            bump.reset_to_raw_checkpoint(checkpoint);
        }
    }

    // Check any extant allocations.
    for (byte, ptr, _layout) in live_allocs {
        check_alloc(byte, ptr);
    }
}

This should cover the realloc-to-before-the-checkpoint scenario I mention inline below, FWIW.

Thanks again!

Comment on lines 384 to 388
pub struct RawCheckpoint {
chunk_data: NonNull<u8>,
size: usize,
ptr: NonNull<u8>,
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a benefit to storing these chunk_data and size fields instead of just a chunk: NonNull<ChunkFooter> field? The latter will result in smaller RawCheckpoints and less data to juggle/shepherd around and less things to check in RawCheckpoint::matches.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense - I'll update it

src/lib.rs Outdated
let chunk = unsafe { self.current_chunk_footer.get().as_ref() };
(chunk.data, chunk.layout.size(), chunk.ptr.get())
};
/// This will never free chunks, it will only change the pointer position of the current chunk.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should free any chunks beyond the checkpoint's chunk, or else the following usage pattern could "leak" when it really should not:

let mut bump = Bump::new();
loop {
    let checkpoint = bump.raw_checkpoint();
    foo(&bump);
    unsafe {
        bump.reset_to_raw_checkpoint(checkpoint);
    }
}

Copy link
Author

@kryesh kryesh Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's some nuance for this one - if checkpoints are used in a hot loop and trigger a new chunk allocation, then future iterations are also likely to trigger an allocation as well if the new chunk is freed.
Maybe a middle ground would be appropriate? Only keep the largest new chunk, if there have been multiple growths then free the intermediate ones since we know they won't be used for anything.
A pathological example of this would be creating a checkpoint for a Bump with no capacity since that would always trigger a new allocation after resetting to the checkpoint

src/lib.rs Outdated
Comment on lines 1050 to 1055
/// # Safety
/// **This will invalidate references in a way that cannot be validated by the borrow checker!**
///
/// Any allocation made between checkpoint creation and this function being called may be overwritten
/// by future allocations to this `Bump`. It is your responsibility to ensure that these allocations
/// are never reused.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can avoid talking about how existing allocations made after the checkpoint might be overwritten, that is an implementation detail, and instead just focus on what the user must to do maintain safety (which is to never to access any allocations made between the checkpoint's creation and this call to reset_to_raw_checkpoint again).

Also we should document that you must not have called Bump::reset between creating the checkpoint and calling reset_to_raw_checkpoint.

Also we should document that you must reset to checkpoints in a LIFO-esque manner (LIFO but you're allowed to drop checkpoints rather than reset to them: that is, once you reset to a checkpoint A you can never reset to another checkpoint B that was created after A but before reset_to_raw_checkpoint(A)).

Also we should note that the given checkpoint must be associated with this Bump (i.e. returned by a call to raw_checkpoint on this Bump). And we should also debug_assert! that property by walking the chunk linked list and checking that we find this checkpoint's chunk. And we should also debug assert that this checkpoint's ptr is (a) within its chunk and (b) is greater than or equal to the current ptr.

Hmm, but now that I think about it, I think (b) might not actually hold 100% of the time. Consider the sequence:

  • create Bump
  • allocate a "relatively big" allocation
  • take a checkpoint
  • realloc the allocation such that it shrinks and we actually back up our bump's pointer
  • reset to the checkpoint

In this case, the bump's pointer could actually now be greater than the checkpoint's pointer. So I think we still want to debug-assert (a) but instead of (b) we want to only update the bump's pointer to the checkpoint's pointer when the checkpoint's pointer is greater than the bump's pointer.

Let's add a test that covers this case as well.

Finally, can you add an example to this doc comment? Something like this:

let bump = Bump::new();

let before_a = bump.raw_checkpoint();
let a = bump.alloc('a');

assert_eq!(*a, 'a'); // okay

{
    let before_b = bump.raw_checkpoint();
    let b = bump.alloc('b');

    assert_eq!(*a, 'a'); // okay
    assert_eq!(*b, 'b'); // okay

    {
        let before_c = bump.raw_checkpoint();
        let c = bump.alloc('c');

        assert_eq!(*a, 'a'); // okay
        assert_eq!(*b, 'b'); // okay
        assert_eq!(*c, 'c'); // okay

        unsafe {
            bump.reset_to_raw_checkpoint(before_c);
        }

        assert_eq!(*a, 'a'); // still okay
        assert_eq!(*b, 'b'); // still okay

        // Not okay! This would be a use-after-free bug!
        //
        //     assert_eq!(*c, 'c');
    }

    unsafe {
        bump.reset_to_raw_checkpoint(before_b);
    }

    assert_eq!(*a, 'a'); // still okay

    // Not okay! This would be a use-after-free bug!
    //
    //     assert_eq!(*b, 'b');
}

unsafe {
    bump.reset_to_raw_checkpoint(before_a);
}

// Not okay! This would be a use-after-free bug!
//
//     assert_eq!(*a, 'a');

@kryesh
Copy link
Author

kryesh commented Dec 3, 2025

Have updated to address some of the feedback, still need to add tests though - not familiar with quickcheck so will need to figure that out. Probably won't have time to look at it until the weekend

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants