Concerning BatchGradientVerificationCallback: if the model uses BatchNorm, the gradients are mixed between the samples in the batch, so it makes no sense to use the callback there. Nowadays almost everybody uses BatchNorm, so what would be a realistic application of this feature in such models?
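
To make the concern concrete, here is a minimal sketch in plain PyTorch. It is not the callback's own implementation; the helper `grad_leaks_across_batch` and the toy model are hypothetical names made up for illustration. It backpropagates from the output of a single sample and checks whether any gradient reaches the inputs of the other samples. In training mode BatchNorm normalizes with statistics computed over the batch, so the samples are coupled; in eval mode it uses the stored running statistics, so they are not.

```python
import torch
import torch.nn as nn


def grad_leaks_across_batch(model: nn.Module, x: torch.Tensor, sample_idx: int = 0) -> bool:
    """Hypothetical helper: True if one sample's output depends on other samples' inputs."""
    x = x.clone().requires_grad_(True)
    out = model(x)
    # Backpropagate only from the chosen sample's output.
    out[sample_idx].sum().backward()
    # Ignore the chosen sample's own row; any gradient left over is cross-sample mixing.
    other_rows = torch.ones(x.shape[0], dtype=torch.bool)
    other_rows[sample_idx] = False
    return bool(x.grad[other_rows].abs().sum() > 0)


torch.manual_seed(0)
# Toy model (an assumption for this example), with BatchNorm in the middle.
model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.Tanh(), nn.Linear(8, 2))
x = torch.randn(6, 4)

model.train()  # BatchNorm uses batch statistics: samples are coupled
leaks_train = grad_leaks_across_batch(model, x)

model.eval()   # BatchNorm uses running statistics: samples are independent
leaks_eval = grad_leaks_across_batch(model, x)

print(leaks_train, leaks_eval)
```

So a check of this kind reports mixing for any BatchNorm model in training mode, which is exactly the situation described above, while the same model in eval mode passes it.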