Description
This is a feature request to allow validation/checkpointing to run more often than once per epoch. For very large datasets, where a single epoch can take a couple of hours to complete, being able to validate only once per epoch is too coarse. Running validation more often would at least shorten the feedback loop when tuning parameters.
I was able to work around this with a hack in our usage of PyMarlin: we set max_steps_per_epoch to the desired validation frequency. However, this requires modifying the input dataset to track its current position within the actual epoch, and adjusting the number of epochs supplied to the trainer to account for these shortened "logging epochs". It also causes PyMarlin to report an inaccurate count of the epochs the model has actually been trained on.
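A minimal sketch of the idea behind this workaround, in plain Python and independent of PyMarlin. The names `ResumableBatches`, `logging_epochs`, and `run` are hypothetical and exist only for illustration; the inner loop stands in for a trainer that stops each epoch after max_steps_per_epoch steps, and the real integration may differ.

```python
import math
from itertools import islice


def logging_epochs(real_epochs, steps_per_real_epoch, validate_every):
    """Number of shortened 'logging epochs' needed to cover the real epochs.

    Rounds up, so training may overshoot slightly if the total step count
    is not a multiple of validate_every.
    """
    total_steps = real_epochs * steps_per_real_epoch
    return math.ceil(total_steps / validate_every)


class ResumableBatches:
    """Iterable that resumes where the previous iteration stopped.

    Each new iterator continues from the stored position instead of
    restarting at the first batch, wrapping around at the end of the data.
    """

    def __init__(self, batches):
        self.batches = batches
        self.position = 0  # steps consumed across all logging epochs

    def __iter__(self):
        while True:
            batch = self.batches[self.position % len(self.batches)]
            # Advance before yielding so the position stays correct even
            # if the consumer stops right after receiving this batch.
            self.position += 1
            yield batch

    @property
    def real_epochs_completed(self):
        """Full passes over the data, regardless of logging epochs."""
        return self.position // len(self.batches)


def run(batches, real_epochs, validate_every):
    """Stand-in for a trainer capped at validate_every steps per epoch."""
    ds = ResumableBatches(batches)
    validated_at = []
    for _ in range(logging_epochs(real_epochs, len(batches), validate_every)):
        for _batch in islice(ds, validate_every):
            pass  # a training step would go here
        validated_at.append(ds.position)  # validation/checkpoint point
    return ds, validated_at
```

With 10 batches per real epoch, validation every 4 steps, and 2 real epochs, the trainer in this sketch would be given 5 logging epochs, and the dataset wrapper (not the trainer) is what knows that 2 real epochs were completed. That mismatch is the inaccurate epoch reporting described above.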
Overall, the request is to either integrate the hack into PyMarlin's logic for a better experience from the user's perspective, or implement more frequent validation/checkpointing through a different method. I am more than happy to supply the code for the hack.