VisionDataModule by default creates non-standard dataset splits

## 🐛 Bug

`VisionDataModule` by default subdivides the train split into a train and val split: https://github.com/Lightning-Universe/lightning-bolts/blob/541f7018492b6a3b16558f9ea2a763b02a007a66/src/pl_bolts/datamodules/vision_datamodule.py#L109

This behavior can be disabled by setting `val_split=0` (in some cases also by setting `strict_val_split=True`), but it is enabled by default.
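A minimal sketch of the opt-out, assuming `pl_bolts` is installed. The helper name `build_cifar10_without_val_split` and the `data_dir` default are hypothetical and only for illustration; the import is placed inside the function so the snippet can be loaded without the package present.

```python
def build_cifar10_without_val_split(data_dir: str = "."):
    """Build a CIFAR10DataModule with the train/val subdivision turned off."""
    # Imported lazily so this snippet can be loaded without pl_bolts installed.
    from pl_bolts.datamodules import CIFAR10DataModule

    # val_split=0 is the opt-out described above; data_dir is an arbitrary choice.
    return CIFAR10DataModule(data_dir=data_dir, val_split=0)
```

Calling `build_cifar10_without_val_split()` followed by the usual `prepare_data()` and `setup()` should then leave the full train split intact, but this has to be requested explicitly.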

For example, CIFAR-10 is supposed to have 50,000 train and 10,000 test images (see [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html)). There is no official val split. When using `CIFAR10DataModule` with default arguments, you instead get a 40,000-image train split and a 10,000-image val split, both carved out of the official train split.
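The arithmetic behind these numbers can be sketched as follows. This is not the library's code: `split_lengths` is a hypothetical helper, and the default fraction of `0.2` is an assumption that is consistent with the 40,000 / 10,000 sizes reported above.

```python
from typing import Tuple, Union


def split_lengths(len_dataset: int, val_split: Union[int, float] = 0.2) -> Tuple[int, int]:
    """Return (train_len, val_len) for a train set of len_dataset samples.

    An int val_split is read as an absolute number of val samples,
    a float as a fraction of the train set (assumed default: 0.2).
    """
    if isinstance(val_split, int):
        val_len = val_split
    elif isinstance(val_split, float):
        val_len = int(val_split * len_dataset)
    else:
        raise ValueError(f"Unsupported type {type(val_split)}")
    return len_dataset - val_len, val_len


print(split_lengths(50_000))     # (40000, 10000)
print(split_lengths(50_000, 0))  # (50000, 0)
```

With the assumed default, the 50,000 official CIFAR-10 train images become 40,000 train and 10,000 val images; only `val_split=0` keeps the train split at its published size.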

The documentation of the affected modules does not make this behavior clear. For example, the docstring for `CIFAR10DataModule` describes it as "Standard CIFAR10, train, val, test splits and transforms", which is misleading because CIFAR-10 has no standard val split. Documentation for other affected data modules is similar.

As a result, users of many vision data modules will not be able to reproduce results on standard datasets such as CIFAR-10, unless they explicitly disable this behavior.

As far as I can tell, this affects all classes that inherit from `VisionDataModule`:
* `BinaryMNISTDataModule`
* `CIFAR10DataModule`
* `TinyCIFAR10DataModule`
* `EMNISTDataModule` (which has a `strict_val_split` option to disable the unexpected behavior)
* `FashionMNISTDataModule`
* `MNISTDataModule`

### Expected behavior

I would generally expect splits to be the same as those published by the dataset authors. I understand that a val split may be required by PyTorch Lightning. However, I would not expect splits to be changed by default and without warning.

