PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research. With Neptune integration you can:
Note
This integration is tested with pytorch-lightning==1.0.7 and neptune-client==0.4.132.
To get started with this integration, follow the Quickstart below. You can also skip the basics and take a look at the advanced options.
If you want to try things out and focus only on the code you can either:
You can also check this public project with example experiments: PyTorch Lightning integration.
This quickstart will show you how to log PyTorch Lightning experiments to Neptune using NeptuneLogger (part of the pytorch-lightning library).
As a result you will have an experiment logged to Neptune. It will have train loss and epoch (visualized as charts), parameters, hardware utilization charts and experiment metadata.
You have Python 3.x and the following libraries installed:
You also need minimal familiarity with PyTorch Lightning. Have a look at the “Lightning in 2 steps” guide to get started.
Import necessary libraries.
import os
import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms
import pytorch_lightning as pl
Notice pytorch_lightning at the bottom.
Define a Python dictionary with hyper-parameters for model training.
PARAMS = {'max_epochs': 3,
          'learning_rate': 0.005,
          'batch_size': 32}
This dictionary is later passed to the Neptune logger (see step 4), so the hyper-parameters appear in the experiment's Parameters tab.
Implement a minimal example of a pl.LightningModule and a simple DataLoader.
# pl.LightningModule
class LitModel(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=PARAMS['learning_rate'])

# DataLoader
train_loader = DataLoader(MNIST(os.getcwd(), download=True, transform=transforms.ToTensor()),
                          batch_size=PARAMS['batch_size'])
A few explanations:
self.log('train_loss', loss)
This loss is logged to Neptune during training as train_loss. You will see it in the experiment's Charts tab (as the “train_loss” chart) and in the Logs tab (as raw numeric values).
Instantiate NeptuneLogger with necessary parameters.
from pytorch_lightning.loggers.neptune import NeptuneLogger
neptune_logger = NeptuneLogger(
    api_key="ANONYMOUS",
    project_name="shared/pytorch-lightning-integration",
    params=PARAMS)
NeptuneLogger is an object that integrates Neptune with PyTorch Lightning, allowing you to track experiments. It is part of the Lightning library. This minimal example uses the public user “neptuner”, whose public token is “ANONYMOUS”.
Tip
You can also use your API token. Read more about how to securely set Neptune API token.
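One way to keep a personal token out of the source code is to read it from an environment variable. The sketch below assumes the variable is named NEPTUNE_API_TOKEN and falls back to the public “ANONYMOUS” token when it is not set. The helper name resolve_api_key is hypothetical and is not part of Neptune or Lightning.

```python
import os


def resolve_api_key(env=None, var_name="NEPTUNE_API_TOKEN"):
    """Return the API token found in the environment, or the public token.

    `resolve_api_key` is a hypothetical helper written for this example;
    it is not part of the Neptune client or of PyTorch Lightning.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    # Fall back to the public token used throughout this quickstart.
    return token if token else "ANONYMOUS"


api_key = resolve_api_key()
```

The resulting value can then be passed as api_key to NeptuneLogger in place of the hard-coded string.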
Pass the instantiated NeptuneLogger to pl.Trainer.
trainer = pl.Trainer(max_epochs=PARAMS['max_epochs'],
                     logger=neptune_logger)
Simply pass neptune_logger to the Trainer so that Lightning uses this logger. Note that max_epochs comes from the PARAMS dictionary.
Fit model to the data.
model = LitModel()
trainer.fit(model, train_loader)
At this point you are all set to fit the model. Neptune logger will collect metrics and show them in the UI.
You just learned how to start logging PyTorch Lightning experiments to Neptune using the Neptune logger, which is part of the Lightning library.
The training above is logged to Neptune in near real-time. Click the link printed to the console or go here to explore an experiment similar to yours. In particular check:
Check this experiment here or view the quickstart code as a plain Python script on GitHub.
To learn more about the advanced options that the Neptune logger offers, follow the sections below; each describes one feature.
In addition to the contents of the “Before you start” section in Quickstart, you also need to have scikit-learn and scikit-plot installed.
pip install scikit-learn==0.23.2 scikit-plot==0.3.7
Check the scikit-learn installation guide or the scikit-plot GitHub project for more info.
Create NeptuneLogger with advanced parameters.
from pytorch_lightning.loggers.neptune import NeptuneLogger
ALL_PARAMS = {...}
neptune_logger = NeptuneLogger(
    api_key="ANONYMOUS",
    project_name="shared/pytorch-lightning-integration",
    close_after_fit=False,
    experiment_name="train-on-MNIST",
    params=ALL_PARAMS,
    tags=['1.x', 'advanced'],
)
Besides the required api_key and project_name, NeptuneLogger accepts other options. The example above sets close_after_fit, experiment_name, params and tags.
Tip
Use neptune_logger.experiment.ABC to call methods that you would use when working with the Neptune client, for example log_image, log_artifact, set_property or log_text, all of which appear in the sections below.
Check more methods here: experiment methods.
In the pl.LightningModule, log the loss for training, validation and test.
class LitModel(pl.LightningModule):
    (...)

    def training_step(self, batch, batch_idx):
        (...)
        loss = ...
        self.log('train_loss', loss, prog_bar=False)
        return loss  # training_step must return the loss for backpropagation

    def validation_step(self, batch, batch_idx):
        (...)
        loss = ...
        self.log('val_loss', loss, prog_bar=False)

    def test_step(self, batch, batch_idx):
        (...)
        loss = ...
        self.log('test_loss', loss, prog_bar=False)
Loss values will be tracked in Neptune automatically.
Tip
The Trainer parameter log_every_n_steps controls how often metrics are logged. Keep it relatively high, say above 100, for longer experiments.
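To get a feel for how this parameter affects the amount of logged data, the sketch below estimates the number of points in a per-step chart. It assumes roughly one logged value every log_every_n_steps training steps (the exact step on which Lightning logs may differ by one) and uses the quickstart settings: 60,000 MNIST training images, batch size 32, 3 epochs. The function name approx_logged_points is hypothetical.

```python
import math


def approx_logged_points(num_samples, batch_size, max_epochs, log_every_n_steps):
    """Estimate how many values of a per-step metric end up in the chart.

    Assumption: one value is logged every `log_every_n_steps` training steps.
    """
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * max_epochs
    return total_steps // log_every_n_steps


MNIST_TRAIN_SIZE = 60_000  # number of images in the MNIST training split

for n in (1, 50, 100, 500):
    points = approx_logged_points(MNIST_TRAIN_SIZE, 32, 3, n)
    print(f"log_every_n_steps={n}: about {points} points per metric")
```

Logging every step produces thousands of points per metric even for this small run, which is why a higher value is preferable for long experiments.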
In the pl.LightningModule, implement an accuracy score and log it.
from sklearn.metrics import accuracy_score

class LitModel(pl.LightningModule):
    (...)

    def training_epoch_end(self, outputs):
        for output in outputs:
            (...)
        acc = accuracy_score(y_true, y_pred)
        self.log('train_acc', acc)

    def validation_epoch_end(self, outputs):
        for output in outputs:
            (...)
        acc = accuracy_score(y_true, y_pred)
        self.log('val_acc', acc)

    def test_epoch_end(self, outputs):
        for output in outputs:
            (...)
        acc = accuracy_score(y_true, y_pred)
        self.log('test_acc', acc)
The accuracy score is calculated and logged after every train, validation and test epoch.
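The elided loop body above has to gather labels and predictions from every batch before the score is computed. A minimal sketch of that aggregation in plain Python follows. It assumes each step returns a dict with 'y_true' and 'y_pred' sequences; those keys and the function name epoch_accuracy are hypothetical. The result is the fraction of correct predictions, which is what accuracy_score returns by default.

```python
def epoch_accuracy(outputs):
    """Compute accuracy over all batches of one epoch.

    Each element of `outputs` is assumed to be a dict with 'y_true' and
    'y_pred' sequences of class labels (hypothetical keys).
    """
    y_true, y_pred = [], []
    for output in outputs:
        y_true.extend(output["y_true"])
        y_pred.extend(output["y_pred"])
    if not y_true:
        raise ValueError("no predictions collected for this epoch")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Two toy batches: 4 of the 5 predictions are correct.
outputs = [
    {"y_true": [3, 1, 4], "y_pred": [3, 1, 9]},
    {"y_true": [1, 5], "y_pred": [1, 5]},
]
acc = epoch_accuracy(outputs)
```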
Tip
You can find a full implementation of all metrics logging on GitHub.
Implement a learning rate monitor as a Callback.
from pytorch_lightning.callbacks import LearningRateMonitor
from torch.optim.lr_scheduler import LambdaLR

# Add scheduler to the optimizer
class LitModel(pl.LightningModule):
    (...)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)
        scheduler = LambdaLR(optimizer, lambda epoch: self.decay_factor ** epoch)
        return [optimizer], [scheduler]

# Instantiate LearningRateMonitor Callback
lr_logger = LearningRateMonitor(logging_interval='epoch')

# Pass lr_logger to the pl.Trainer as callback
trainer = pl.Trainer(logger=neptune_logger,
                     callbacks=[lr_logger])
The learning rate scheduler is defined in configure_optimizers. It changes the lr value after each epoch, and these values are tracked in Neptune automatically.
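LambdaLR multiplies the initial learning rate by the value the lambda returns for the current epoch, so the schedule above follows lr = learning_rate * decay_factor ** epoch. The sketch below computes the values you could expect to see in the chart. The decay factor of 0.95 is an assumed value chosen for illustration, and lr_per_epoch is a hypothetical helper.

```python
def lr_per_epoch(base_lr, decay_factor, num_epochs):
    """Learning rate used in each epoch for lr = base_lr * decay_factor ** epoch."""
    return [base_lr * decay_factor ** epoch for epoch in range(num_epochs)]


# base_lr matches the quickstart PARAMS; decay_factor is an assumed value.
schedule = lr_per_epoch(base_lr=0.005, decay_factor=0.95, num_epochs=3)
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.6f}")
```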
In the pl.LightningModule, implement the logic for identifying and logging misclassified images.
import numpy as np

class LitModel(pl.LightningModule):
    (...)

    def test_step(self, batch, batch_idx):
        x, y = batch
        (...)
        y_true = ...
        y_pred = ...
        # Iterate over the indices where the prediction differs from the label
        for j in np.where(np.not_equal(y_true, y_pred))[0]:
            img = np.squeeze(x[j].cpu().detach().numpy())
            img[img < 0] = 0
            img = (img / img.max()) * 256
            neptune_logger.experiment.log_image(
                'test_misclassified_images',
                img,
                description='y_pred={}, y_true={}'.format(y_pred[j], y_true[j]))
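The selection step in the snippet above can be tried without any images. The sketch below reproduces it in plain Python: it finds the positions where the prediction differs from the label and builds the same description string. The function name misclassified and the toy labels are hypothetical.

```python
def misclassified(y_true, y_pred):
    """Yield (index, description) for every sample whose prediction is wrong."""
    for j, (t, p) in enumerate(zip(y_true, y_pred)):
        if t != p:
            yield j, "y_pred={}, y_true={}".format(p, t)


# Toy labels: samples 1 and 3 are misclassified.
y_true = [7, 2, 1, 0]
y_pred = [7, 3, 1, 6]
wrong = list(misclassified(y_true, y_pred))
for index, description in wrong:
    print(index, description)
```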
Set pl.Trainer to log the gradient norm.
trainer = pl.Trainer(logger=neptune_logger,
                     track_grad_norm=2)
Neptune will visualize the gradient norm automatically.
Tip
When you use track_grad_norm, it is recommended to also set log_every_n_steps to a value above 100 to avoid logging a large amount of data.
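For readers unfamiliar with the quantity being tracked, the sketch below shows what a total 2-norm over gradients means: the square root of the sum of squared gradient values across all parameters. It is an illustration in plain Python, not Lightning's implementation. The function name total_grad_norm and the toy gradient values are hypothetical.

```python
def total_grad_norm(grads_per_param, norm_type=2.0):
    """p-norm over the gradient values of all parameters.

    `grads_per_param` maps a parameter name to a flat list of gradient values.
    """
    total = sum(abs(g) ** norm_type
                for grads in grads_per_param.values()
                for g in grads)
    return total ** (1.0 / norm_type)


# Toy gradients for the single linear layer of the quickstart model.
grads = {"l1.weight": [3.0, 0.0], "l1.bias": [4.0]}
norm = total_grad_norm(grads, norm_type=2.0)
```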
Use ModelCheckpoint to save checkpoints during training, then log the saved checkpoints to Neptune.
from pytorch_lightning.callbacks import ModelCheckpoint

# Instantiate ModelCheckpoint
model_checkpoint = ModelCheckpoint(filepath='my_model/checkpoints/{epoch:02d}-{val_loss:.2f}',
                                   save_weights_only=True,
                                   save_top_k=3,
                                   monitor='val_loss',
                                   period=1)

# Pass it to the pl.Trainer
trainer = pl.Trainer(logger=neptune_logger,
                     checkpoint_callback=model_checkpoint)

# Once training has finished, log model checkpoints to Neptune
for k in model_checkpoint.best_k_models.keys():
    model_name = 'checkpoints/' + k.split('/')[-1]
    neptune_logger.experiment.log_artifact(k, model_name)

# Log score of the best model checkpoint.
neptune_logger.experiment.set_property('best_model_score', model_checkpoint.best_model_score.tolist())
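The loop above derives the artifact name by splitting the checkpoint path on '/', which assumes POSIX-style separators. A slightly more defensive variant is sketched below. The helper artifact_name and the example paths are hypothetical; the first path is merely shaped like the filepath template above.

```python
import posixpath


def artifact_name(checkpoint_path, prefix="checkpoints"):
    """Map a local checkpoint path to '<prefix>/<file name>'."""
    # Normalize Windows separators so the file name is extracted correctly.
    normalized = checkpoint_path.replace("\\", "/")
    return posixpath.join(prefix, posixpath.basename(normalized))


# Hypothetical path, shaped like the filepath template above.
path = "my_model/checkpoints/epoch=02-val_loss=0.31.ckpt"
name = artifact_name(path)
print(name)
```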
Tip
You can find a full example implementation on GitHub.
Log the confusion matrix after testing.
import matplotlib.pyplot as plt
import numpy as np
from scikitplot.metrics import plot_confusion_matrix

model.freeze()
# dm is assumed to be the data module that provides the test data
test_data = dm.test_dataloader()
y_true = np.array([])
y_pred = np.array([])

# Collect true labels and predicted classes over the whole test set
for i, (x, y) in enumerate(test_data):
    y = y.cpu().detach().numpy()
    y_hat = model.forward(x).argmax(axis=1).cpu().detach().numpy()
    y_true = np.append(y_true, y)
    y_pred = np.append(y_pred, y_hat)

fig, ax = plt.subplots(figsize=(16, 12))
plot_confusion_matrix(y_true, y_pred, ax=ax)
neptune_logger.experiment.log_image('confusion_matrix', fig)
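The figure logged above is a grid of counts: the cell in row i and column j holds the number of test samples with true class i that were predicted as class j. A minimal sketch of that computation in plain Python, on toy labels, is shown below. The function name confusion_counts is hypothetical, and the sketch is not scikit-plot's implementation.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, num_classes):
    """Return a num_classes x num_classes matrix of counts.

    Rows correspond to true labels, columns to predicted labels.
    """
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in range(num_classes)]
            for t in range(num_classes)]


# Toy labels for a 3-class problem.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
matrix = confusion_counts(y_true, y_pred, num_classes=3)
for row in matrix:
    print(row)
```

Correct predictions sit on the diagonal, so off-diagonal cells show which classes the model confuses.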
Log the model summary and the number of GPUs used in the experiment.
# Log model summary, one line at a time
for chunk in str(model).split('\n'):
    neptune_logger.experiment.log_text('model_summary', str(chunk))

# Log number of GPU units used
neptune_logger.experiment.set_property('num_gpus', trainer.num_gpus)
Close the Neptune logger and experiment once everything is logged.
neptune_logger.experiment.stop()
NeptuneLogger was created with close_after_fit=False, so you need to close the Neptune experiment explicitly at the end. This is only necessary in notebooks; in scripts, the logger is closed automatically when the script finishes.
You just learned how to log PyTorch Lightning experiments to Neptune using the Neptune logger, which is part of the Lightning library.
The training above is logged to Neptune in near real-time. Click the link printed to the console or charts to explore an experiment similar to yours.
In particular check:
Check this experiment (charts) or view the code snippets above as a plain Python script on GitHub.
Please visit the Getting help page. Everything regarding support is there.