A variety of hparams used recently in timm to train ViT, ConvNeXt, and ViT hybrids (MaxViT, CoAtNet).
All are variations on the same theme (DeiT / Swin pretraining) with different tweaks here and there.
These were all run on 4-8 GPU or TPU devices. They use `--lr-base`, which rescales the learning rate automatically based on the global batch size (relative to `--lr-base-size`), so adapting to different GPU counts works well within a range; running at significantly lower or higher global batch sizes will require re-running an LR search.
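A rough sketch of what this rescaling amounts to. The function name and structure here are illustrative, not timm's exact implementation; the idea is linear scaling of the LR with batch size for SGD-style optimizers and square-root scaling for adaptive optimizers (Adam/AdamW/LAMB), relative to a reference batch size:

```python
import math

def scale_lr(lr_base: float, global_batch_size: int,
             lr_base_size: int = 256, scale: str = "linear") -> float:
    """Rescale a base LR to the actual global batch size.

    'linear' scaling is the usual choice for SGD-style optimizers,
    'sqrt' for adaptive optimizers like Adam/AdamW/LAMB.
    """
    ratio = global_batch_size / lr_base_size
    if scale == "sqrt":
        ratio = math.sqrt(ratio)
    return lr_base * ratio

# base LR 1e-3 referenced to batch 256, training at global batch 1024:
lr_linear = scale_lr(1e-3, 1024)                # 1e-3 * (1024/256) = 4e-3
lr_sqrt = scale_lr(1e-3, 1024, scale="sqrt")    # 1e-3 * sqrt(4)    = 2e-3
```

This is why the hparams transfer across GPU counts only within a range: the scaling rules are empirical approximations, and they drift far from optimal at much smaller or larger global batch sizes.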
More recently, DeiT-III has proven to be a very compelling set of hparams for ViT-like models. I've yet to do full runs myself, but their recipe can be adapted to the timm train scripts (3-Augment was added recently).
https://github.com/facebookresearch/deit/blob/main/README_revenge.md
To use the yaml files directly with the timm train script, pass them via the `--config` arg; any additional command line args override values from the config.
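For example, assuming a config file named `hparams.yaml` and an ImageNet folder layout (both names are placeholders):

```shell
# load hparams from yaml, override the dataset path on the command line
./train.py --config hparams.yaml --data-dir /path/to/imagenet
```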