train – Trainer for the language models

train.py

usage: train.py [-h] [--model MODEL] [--max_epochs MAX_EPOCHS]
                [--batch_size BATCH_SIZE] [--updates UPDATES] [--profile]
                [--dbg] [--reset_cache] [--subset SUBSET]
                [--conv_ckpt CONV_CKPT] [--tf32 TF32] [--layers LAYERS]
                [--heads HEADS] [--hidden_size HIDDEN_SIZE]
                [--continue_from CONTINUE_FROM]
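
The usage string above is consistent with a standard `argparse` parser. As a rough sketch only, such a parser might be declared as follows. The argument types, the absence of defaults, and the `build_parser` name are assumptions made for illustration and are not taken from `train.py`; `my_model` is a placeholder value.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser matching the usage string; types are assumed."""
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("--model", help="Model to use for pretraining.")
    parser.add_argument("--max_epochs", type=int, help="Number of epochs to pretrain for.")
    parser.add_argument("--batch_size", type=int, help="Batch size to use for pretraining.")
    parser.add_argument("--updates", type=int, help="Batches to wait before logging training progress.")
    # Flags shown without a value in the usage string behave like store_true switches.
    parser.add_argument("--profile", action="store_true", help="Whether to profile the training process.")
    parser.add_argument("--dbg", action="store_true", help="Whether to run with a single thread.")
    parser.add_argument("--reset_cache", action="store_true", help="Whether to reset the cache before training.")
    parser.add_argument("--subset", type=float, help="Fraction of the dataset to use across train/val/test.")
    parser.add_argument("--conv_ckpt", help="Converts a Lightning checkpoint to a HuggingFace checkpoint.")
    # --tf32 takes a value in the usage string; it is left as a raw string here.
    parser.add_argument("--tf32", help="Whether to use TF32 precision on Ampere GPUs.")
    parser.add_argument("--layers", type=int, help="Number of layers to use for the model.")
    parser.add_argument("--heads", type=int, help="Number of heads to use for the model.")
    parser.add_argument("--hidden_size", type=int, help="Size for the hidden state of the model if applicable.")
    parser.add_argument("--continue_from", help="Path to a checkpoint to continue training from.")
    return parser


# Placeholder values, for illustration only.
args = build_parser().parse_args(["--model", "my_model", "--batch_size", "32", "--profile"])
print(args.model, args.batch_size, args.profile)
```

Options that are not passed on the command line come back as `None` (or `False` for the switches) in this sketch, so the real defaults used by `train.py` would still need to be checked with `--help`.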

-h, --help: show this help message and exit

--model <model>: Model to use for pretraining.

--max_epochs <max_epochs>: Number of epochs to pretrain for.

--batch_size <batch_size>: Batch size to use for pretraining.

--updates <updates>: Batches to wait before logging training progress.
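
One plausible reading of `--updates` is a logging interval counted in batches. The sketch below illustrates that reading; the `should_log` helper is hypothetical and is not part of `train.py`. In a Lightning-based trainer, a value like this is often forwarded to the `log_every_n_steps` argument of `Trainer`, but whether `train.py` does so is an assumption.

```python
def should_log(batch_idx: int, updates: int) -> bool:
    """Return True once every `updates` batches, using a zero-based batch index."""
    if updates <= 0:
        raise ValueError("updates must be a positive integer")
    # batch_idx is zero-based, so the first log line appears after `updates` batches.
    return (batch_idx + 1) % updates == 0


# With updates=4, batches 3 and 7 (the 4th and 8th) trigger a log line.
logged = [i for i in range(10) if should_log(i, updates=4)]
print(logged)  # [3, 7]
```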

--profile: Whether to profile the training process.

--dbg: Whether to run with a single thread.

--reset_cache: Whether to reset the cache before training.

--subset <subset>: Fraction of the dataset to use across train/val/test.
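
To make the effect of `--subset` concrete, the sketch below applies one fraction to each split. The `subset_sizes` helper, the split sizes, and the rounding rule are illustrative assumptions; the way `train.py` actually selects examples is not documented here.

```python
def subset_sizes(split_sizes: dict[str, int], fraction: float) -> dict[str, int]:
    """Scale every split by the same fraction, keeping at least one example per non-empty split."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the interval (0, 1]")
    return {
        name: max(1, round(n * fraction)) if n > 0 else 0
        for name, n in split_sizes.items()
    }


# Hypothetical split sizes, for illustration only.
sizes = subset_sizes({"train": 10000, "val": 1000, "test": 1000}, fraction=0.1)
print(sizes)  # {'train': 1000, 'val': 100, 'test': 100}
```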

--conv_ckpt <conv_ckpt>: Converts a Lightning checkpoint to a Hugging Face checkpoint.

--tf32 <tf32>: Whether to use TF32 precision on Ampere GPUs.
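
Unlike `--profile` or `--dbg`, `--tf32` takes a value. If that value is meant to be a boolean, note that `argparse` with `type=bool` treats any non-empty string, including `"False"`, as true. The sketch below shows one common workaround; the `str2bool` helper and the accepted spellings are assumptions and may not match what `train.py` does. In PyTorch, the resulting flag is typically applied through `torch.backends.cuda.matmul.allow_tf32` and `torch.backends.cudnn.allow_tf32`, which is likewise assumed rather than confirmed for this script.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse common textual booleans; bool("False") would be True, hence this helper."""
    normalized = value.strip().lower()
    if normalized in {"1", "true", "yes", "y", "on"}:
        return True
    if normalized in {"0", "false", "no", "n", "off"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


# Hypothetical parser fragment; the default of False is an assumption.
tf32_parser = argparse.ArgumentParser(prog="train.py")
tf32_parser.add_argument("--tf32", type=str2bool, default=False)

print(tf32_parser.parse_args(["--tf32", "false"]).tf32)  # False
print(tf32_parser.parse_args(["--tf32", "True"]).tf32)   # True
```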

--layers <layers>: Number of layers to use for the model.

--heads <heads>: Number of heads to use for the model.

--hidden_size <hidden_size>: Size for the hidden state of the model if applicable.
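
For standard multi-head attention, the hidden size is split evenly across the heads, so `--hidden_size` generally needs to be divisible by `--heads`. The check below is a minimal sketch of that constraint; the `head_dim` helper is hypothetical, and whether `train.py` validates this itself or leaves it to the model class is not stated in this reference.

```python
def head_dim(hidden_size: int, heads: int) -> int:
    """Return the per-head dimension, rejecting sizes that do not split evenly."""
    if hidden_size <= 0 or heads <= 0:
        raise ValueError("hidden_size and heads must be positive integers")
    if hidden_size % heads != 0:
        raise ValueError(
            f"hidden_size ({hidden_size}) must be divisible by heads ({heads})"
        )
    return hidden_size // heads


# Example values chosen for illustration only.
print(head_dim(hidden_size=768, heads=12))  # 64
```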

--continue_from <continue_from>: Path to a checkpoint to continue training from.