Description
Hi, thanks for this great benchmark.
I have a question about the hyper-parameter tuning.
[screenshot: hyper-parameter sweep results]

During the hyper-parameter sweep, both the training accuracy and the validation accuracy look good, and the toolkit selects "Learning rate 0.01, L2 lambda 0.0001" as the best configuration for the final 50 epochs.
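To make sure I understand the procedure correctly, here is a rough sketch of the sweep-then-retrain flow I am describing. This is purely my own illustration with a toy stand-in for training, not the toolkit's actual API; `run_trial` and `select_and_retrain` are hypothetical names, and the toy model just encodes the failure mode I am seeing (a large LR that looks fine on short sweep runs but collapses over the longer final run):

```python
from itertools import product

def run_trial(lr, l2, epochs):
    """Toy stand-in for training; returns (train_acc, val_acc).

    Models the observed behavior: LR 0.01 looks fine in short sweep
    runs but degrades badly over the longer final run.
    """
    base = 0.9 - 10 * l2
    penalty = 0.5 if (epochs > 10 and lr >= 0.01) else 0.0
    train_acc = base - penalty
    return train_acc, train_acc - 0.02

def select_and_retrain(lrs, l2s, sweep_epochs=10, final_epochs=50):
    # Sweep: pick the (lr, l2) pair with the best validation accuracy.
    best = max(product(lrs, l2s),
               key=lambda cfg: run_trial(*cfg, sweep_epochs)[1])
    # Final run: reuse the selected config for the full 50 epochs.
    return best, run_trial(*best, final_epochs)

best_cfg, (train_acc, val_acc) = select_and_retrain(
    lrs=[0.01, 0.001], l2s=[0.0001, 0.001])
# The sweep picks (0.01, 0.0001), yet the final-run accuracy is poor.
```

Under these toy dynamics the sweep happily selects LR 0.01, exactly as in my run, even though that choice fails in the final 50-epoch training.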
However, the model trained with the selected hyper-parameters performs extremely badly.
[screenshot: final 50-epoch training results]
Have you ever faced this problem? It mainly shows up on the dtd, fer2013, and resisc45 datasets, and it usually occurs when a relatively large LR (like 0.01) is selected in the sweeping stage.
I don't think this problem comes from a gap between the validation set and the test set, because you can see that the training accuracy is also bad during the final 50 epochs of training.
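To be concrete about why I rule out a validation/test gap, here is a toy heuristic (my own illustration, not anything from the toolkit; the function name and thresholds are made up): low training accuracy points to an optimization failure such as a too-large LR, whereas a generalization gap would show good training accuracy with much worse test accuracy.

```python
def diagnose(train_acc, test_acc, low=0.5, gap=0.2):
    """Toy heuristic separating two failure modes."""
    if train_acc < low:
        # The model never fit the training data: optimization diverged
        # or stalled (e.g. the learning rate was too large).
        return "optimization failure"
    if train_acc - test_acc > gap:
        # Fit train well but not test: a genuine generalization gap.
        return "generalization gap"
    return "ok"

# My case looks like the first branch: train accuracy itself is low.
print(diagnose(train_acc=0.30, test_acc=0.25))
```

Since my final runs land in the first branch, the selected LR 0.01 seems to be destabilizing training itself, not merely generalizing poorly.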