PyTorch Lightning lets us use PyTorch-based code and easily add extra features such as distributed computing over several GPUs and machines, half-precision training, gradient accumulation…
In this example, we optimize the following hyper-parameters:
It's interesting to see which combinations of parameters lead to good performance (here defined by a low val_loss):
grad_batches cannot be too high, probably because the limited dataset size would then lead to fewer weight updates per epoch
num_layers is best at 3 or 4
Note: it's important to keep in mind that we limited training to 20 epochs. Deeper networks typically need more time to train.
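To make the grad_batches observation concrete, here is a minimal sketch of the arithmetic behind it. It assumes that one optimizer step happens every grad_batches batches and that leftover batches at the end of an epoch still trigger a final step. The function name and the figure of 100 batches per epoch are hypothetical and not taken from this example.

```python
import math


def optimizer_steps_per_epoch(num_batches: int, grad_batches: int) -> int:
    """Number of weight updates in one epoch under gradient accumulation.

    Assumption: gradients are accumulated over `grad_batches` batches before
    each optimizer step, and a partial group at the end still gets a step.
    """
    if num_batches < 0 or grad_batches < 1:
        raise ValueError("num_batches must be >= 0 and grad_batches >= 1")
    return math.ceil(num_batches / grad_batches)


# Hypothetical small dataset: 100 batches per epoch.
for k in (1, 4, 16):
    print(f"grad_batches={k}: {optimizer_steps_per_epoch(100, k)} updates per epoch")
```

With a small dataset, a high accumulation factor leaves very few updates per epoch, which becomes significant when training is capped at a fixed number of epochs.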
PyTorch Lightning includes a logger for W&B that can be called simply with:
from pytorch_lightning.loggers import WandbLogger
from pytorch_lightning import Trainer

wandb_logger = WandbLogger()
trainer = Trainer(logger=wandb_logger)
Refer to the documentation for more details.
Hyper-parameters can be defined manually, and every run is automatically logged to Weights & Biases, which makes it easier to analyze and interpret the results and to optimize the architecture.
You can also run sweeps to optimize hyper-parameters automatically.
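As a rough illustration, a sweep over the two hyper-parameters discussed above might be configured as follows. This is only a sketch: the search method, the candidate values, the project name and the launch_sweep helper are assumptions made for illustration, and train_fn stands for whatever training function you already have. It also assumes the training code logs a metric named val_loss.

```python
# Sweep definition in the W&B sweep configuration format.
sweep_config = {
    "method": "random",  # assumed search strategy
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        # Candidate values below are illustrative, not the ones used in this example.
        "grad_batches": {"values": [1, 2, 4, 8, 16]},
        "num_layers": {"values": [1, 2, 3, 4, 5]},
    },
}


def launch_sweep(train_fn, project="lightning-sweep-demo", count=10):
    """Hypothetical helper: register the sweep and run `count` trials locally."""
    import wandb  # imported here so the config can be inspected without wandb installed

    sweep_id = wandb.sweep(sweep_config, project=project)
    wandb.agent(sweep_id, function=train_fn, count=count)
    return sweep_id
```

Each trial would then read its values from wandb.config inside train_fn and pass them to the model and the Trainer.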
Note: this example has been adapted from the PyTorch Lightning examples.