train – Trainer for the language models

train.py

usage: train.py [-h] [--model MODEL] [--max_epochs MAX_EPOCHS]
                [--batch_size BATCH_SIZE] [--updates UPDATES] [--profile]
                [--dbg] [--reset_cache] [--subset SUBSET]
                [--conv_ckpt CONV_CKPT] [--tf32 TF32] [--layers LAYERS]
                [--heads HEADS] [--hidden_size HIDDEN_SIZE]
                [--continue_from CONTINUE_FROM]

-h, --help

Show this help message and exit.

--model <model>

Model to use for pretraining.

--max_epochs <max_epochs>

Number of epochs to pretrain for.

--batch_size <batch_size>

Batch size to use for pretraining.

--updates <updates>

Number of batches to wait between training-progress log entries.
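
A minimal sketch of how a "log every N batches" setting is commonly applied. The helper name should_log and the 1-based counting are assumptions for illustration; the actual logic in train.py may differ.

```python
def should_log(batch_idx: int, updates: int) -> bool:
    """Return True when the (batch_idx + 1)-th batch completes a logging interval.

    `batch_idx` is the zero-based batch index; `updates` is the value that
    would be passed via --updates. Hypothetical helper, not part of train.py.
    """
    return updates > 0 and (batch_idx + 1) % updates == 0


# With --updates 4, a 10-batch run would log after the 4th and 8th batches,
# i.e. at zero-based indices 3 and 7.
logged = [i for i in range(10) if should_log(i, updates=4)]
```

A non-positive value disables logging in this sketch rather than raising an error; that is a design choice of the example only.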

--profile

Whether to profile the training process.

--dbg

Whether to run with a single thread.

--reset_cache

Whether to reset the cache before training.

--subset <subset>

Fraction of the dataset to use, applied across the train/val/test splits.
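
As an illustration of applying one fraction to every split, the sketch below truncates each split to the same proportion. The helper take_fraction, the split sizes, and the choice to keep the leading items are all assumptions; train.py may sample differently.

```python
def take_fraction(items: list, fraction: float) -> list:
    """Keep the leading `fraction` of `items` (hypothetical helper).

    At least one item is kept for a non-empty input so that very small
    fractions do not produce an empty split.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the interval (0, 1]")
    keep = max(1, int(len(items) * fraction))
    return items[:keep]


# Made-up split sizes, used only to show the effect of --subset 0.25.
splits = {
    "train": list(range(1000)),
    "val": list(range(100)),
    "test": list(range(100)),
}
subsets = {name: take_fraction(rows, 0.25) for name, rows in splits.items()}
sizes = {name: len(rows) for name, rows in subsets.items()}
```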

--conv_ckpt <conv_ckpt>

Convert a Lightning checkpoint to a Hugging Face checkpoint.

--tf32 <tf32>

Whether to use TF32 precision on Ampere GPUs.

--layers <layers>

Number of layers to use for the model.

--heads <heads>

Number of heads to use for the model.

--hidden_size <hidden_size>

Size of the model's hidden state, if applicable.

--continue_from <continue_from>

Path to a checkpoint to continue training from.
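
The interface documented above could be reconstructed with argparse roughly as follows. This is a sketch, not the actual train.py source: the flag names and help strings come from this page, while the argument types (int, float, str) and the absence of defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the documented flags; types below are assumed, not confirmed."""
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("--model", help="Model to use for pretraining.")
    parser.add_argument("--max_epochs", type=int,
                        help="Number of epochs to pretrain for.")
    parser.add_argument("--batch_size", type=int,
                        help="Batch size to use for pretraining.")
    parser.add_argument("--updates", type=int,
                        help="Batches to wait before logging training progress.")
    # Flags shown without a value in the usage line are modelled as switches.
    parser.add_argument("--profile", action="store_true",
                        help="Whether to profile the training process.")
    parser.add_argument("--dbg", action="store_true",
                        help="Whether to run with a single thread.")
    parser.add_argument("--reset_cache", action="store_true",
                        help="Whether to reset the cache before training.")
    parser.add_argument("--subset", type=float,
                        help="Fraction of the dataset to use across train/val/test.")
    parser.add_argument("--conv_ckpt",
                        help="Converts a Lightning checkpoint to a HuggingFace checkpoint.")
    # --tf32 takes a value in the usage line; it is left as a raw string here
    # because the accepted values are not documented.
    parser.add_argument("--tf32",
                        help="Whether to use tf32 precision on Ampere GPUs.")
    parser.add_argument("--layers", type=int,
                        help="Number of layers to use for the model.")
    parser.add_argument("--heads", type=int,
                        help="Number of heads to use for the model.")
    parser.add_argument("--hidden_size", type=int,
                        help="Size for the hidden state of the model if applicable.")
    parser.add_argument("--continue_from",
                        help="Path to a checkpoint to continue training from.")
    return parser


# Example invocation; "example-model" is a placeholder, not a real model name.
args = build_parser().parse_args(
    ["--model", "example-model", "--batch_size", "32", "--subset", "0.25", "--profile"]
)
```

Passing an explicit argument list to parse_args keeps the example independent of the real command line, which is convenient for trying out flag combinations before launching a run.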