Learner#

class Learner[source]#

Bases: ABC

Abstract training and prediction routines for a model.

This can be subclassed to handle different computer vision tasks.

The datasets, model, optimizer, and schedulers will be generated from the cfg if not specified in the constructor.

If instantiated with training=False, the training apparatus (loss, optimizer, scheduler, logging, etc.) will not be set up and the model will be put into eval mode.

Note that various training and prediction methods have the side effect of putting Learner.model into training or eval mode. No attempt is made to put the model back into the mode it was previously in.
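
Because the previous mode is not restored, a caller that depends on it can save and restore it around such calls. The following is a minimal sketch, assuming the model exposes the standard torch.nn.Module interface (a boolean training attribute and a train(mode) method). The names preserve_mode and _StandInModel are hypothetical and not part of Learner; the stand-in class exists only so the snippet runs without PyTorch.

```python
from contextlib import contextmanager


@contextmanager
def preserve_mode(model):
    """Restore model.training to its original value on exit.

    Hypothetical helper: works with any object that has a boolean
    `training` attribute and a `train(mode)` method, as torch.nn.Module does.
    """
    was_training = model.training
    try:
        yield model
    finally:
        model.train(was_training)


class _StandInModel:
    """Tiny stand-in mimicking the train/eval switch of torch.nn.Module."""

    def __init__(self):
        self.training = True

    def train(self, mode=True):
        self.training = mode
        return self

    def eval(self):
        return self.train(False)


model = _StandInModel()
with preserve_mode(model):
    model.eval()  # stands in for a Learner call that switches the mode
    mode_inside = model.training
mode_after = model.training
```

Inside the block the model is in eval mode; on exit it is back in whatever mode it had before.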

__init__(cfg: LearnerConfig, output_dir: Optional[str] = None, train_ds: Optional[Dataset] = None, valid_ds: Optional[Dataset] = None, test_ds: Optional[Dataset] = None, model: Optional[torch.nn.Module] = None, loss: Optional[Callable] = None, optimizer: Optional[Optimizer] = None, epoch_scheduler: Optional[_LRScheduler] = None, step_scheduler: Optional[_LRScheduler] = None, tmp_dir: Optional[str] = None, model_weights_path: Optional[str] = None, model_def_path: Optional[str] = None, loss_def_path: Optional[str] = None, training: bool = True)[source]#

Constructor.

Parameters
  • cfg (LearnerConfig) – The learner configuration.

  • train_ds (Optional[Dataset], optional) – The dataset to use for training. If None, will be generated from cfg.data. Defaults to None.

  • valid_ds (Optional[Dataset], optional) – The dataset to use for validation. If None, will be generated from cfg.data. Defaults to None.

  • test_ds (Optional[Dataset], optional) – The dataset to use for testing. If None, will be generated from cfg.data. Defaults to None.

  • model (Optional[nn.Module], optional) – The model. If None, will be generated from cfg.model. Defaults to None.

  • loss (Optional[Callable], optional) – The loss function. If None, will be generated from cfg.solver. Defaults to None.

  • optimizer (Optional[Optimizer], optional) – The optimizer. If None, will be generated from cfg.solver. Defaults to None.

  • epoch_scheduler (Optional[_LRScheduler], optional) – The scheduler that updates after each epoch. If None, will be generated from cfg.solver. Defaults to None.

  • step_scheduler (Optional[_LRScheduler], optional) – The scheduler that updates after each optimizer-step. If None, will be generated from cfg.solver. Defaults to None.

  • tmp_dir (Optional[str], optional) – A temporary directory to use for downloads etc. If None, will be auto-generated. Defaults to None.

  • model_weights_path (Optional[str], optional) – URI of model weights to initialize the model with. Defaults to None.

  • model_def_path (Optional[str], optional) – A local path to a directory with a hubconf.py. If provided, the model definition is imported from here. This is used when loading an external model from a model-bundle. Defaults to None.

  • loss_def_path (Optional[str], optional) – A local path to a directory with a hubconf.py. If provided, the loss function definition is imported from here. This is used when loading an external loss function from a model-bundle. Defaults to None.

  • training (bool, optional) – If False, the training apparatus (loss, optimizer, scheduler, logging, etc.) will not be set up and the model will be put into eval mode. If True, the training apparatus will be set up and the model will be put into training mode. Defaults to True.

  • output_dir (Optional[str]) –

Methods

__init__(cfg[, output_dir, train_ds, ...])

Constructor.

build_dataloaders()

Set the DataLoaders for train, validation, and test sets.

build_datasets()

build_epoch_scheduler([start_epoch])

Returns an LR scheduler that changes the LR each epoch.

build_loss([loss_def_path])

Build a loss Callable.

build_metric_names()

Returns names of metrics used to validate model at each epoch.

build_model([model_def_path])

Build a PyTorch model.

build_optimizer()

Returns optimizer.

build_step_scheduler([start_epoch])

Returns an LR scheduler that changes the LR each step.

eval_model(split)

Evaluate model using a particular dataset split.

from_model_bundle(model_bundle_uri[, ...])

Create a Learner from a model bundle.

get_collate_fn()

Returns a custom collate_fn to use in DataLoader.

get_dataloader(split)

Get the DataLoader for a split.

get_start_epoch()

Get start epoch.

get_train_sampler(train_ds)

Return a sampler to use for the training dataloader or None to not use any.

get_visualizer_class()

Returns a Visualizer class object for plotting data samples.

load_checkpoint()

Load last weights from previous run if available.

load_init_weights([model_weights_path])

Load the weights to initialize model.

load_weights(uri, **kwargs)

Load model weights from a file.

log_data_stats()

Log stats about each DataSet.

main()

Main training sequence.

normalize_input(x)

Normalize x to [0, 1].

numpy_predict(x[, raw_out])

Make a prediction using an image or batch of images in numpy format.

on_epoch_end(curr_epoch, metrics)

Hook that is called at end of epoch.

on_overfit_start()

Hook that is called at start of overfit routine.

on_train_start()

Hook that is called at start of train routine.

output_to_numpy(out)

Convert output of model to numpy format.

overfit()

Optimize model using the same batch repeatedly.

plot_dataloader(dl, output_path[, ...])

Plot images and ground truth labels for a DataLoader.

plot_dataloaders([batch_limit, show])

Plot images and ground truth labels for all DataLoaders.

plot_predictions(split[, batch_limit, show])

Plot predictions for a split.

post_forward(x)

Post process output of call to model().

predict(x[, raw_out])

Make prediction for an image or batch of images.

predict_dataloader(dl[, batched_output, ...])

Returns an iterator over predictions on the given dataloader.

predict_dataset(dataset[, return_format, ...])

Returns an iterator over predictions on the given dataset.

prob_to_pred(x)

Convert a Tensor with prediction probabilities to class ids.

run_tensorboard()

Run TB server serving logged stats.

save_model_bundle()

Save a model bundle.

setup_data()

Set datasets and dataLoaders for train, validation, and test sets.

setup_loss([loss_def_path])

Setup self.loss.

setup_model([model_weights_path, model_def_path])

Setup self.model.

setup_tensorboard()

Setup for logging stats to TB.

setup_training([loss_def_path])

stop_tensorboard()

Stop TB logging and server if it's running.

sync_from_cloud()

Sync any previous output in the cloud to output_dir.

sync_to_cloud()

Sync any output to the cloud at output_uri.

to_batch(x)

Ensure that image array has batch dimension.

to_device(x, device)

Load Tensors onto a device.

train([epochs])

Training loop that will attempt to resume training if appropriate.

train_end(outputs, num_samples)

Aggregate the output of train_step at the end of the epoch.

train_epoch(optimizer[, step_scheduler])

Train for a single epoch.

train_step(batch, batch_ind)

Compute loss for a single training batch.

validate_end(outputs, num_samples)

Aggregate the output of validate_step at the end of the epoch.

validate_epoch(dl)

Validate for a single epoch.

validate_step(batch, batch_ind)

Compute metrics on validation batch.

build_dataloaders() Tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.utils.data.DataLoader][source]#

Set the DataLoaders for train, validation, and test sets.

Return type

Tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.utils.data.DataLoader]

build_datasets() Tuple[Dataset, Dataset, Dataset][source]#

Return type

Tuple[Dataset, Dataset, Dataset]

build_epoch_scheduler(start_epoch: int = 0) _LRScheduler[source]#

Returns an LR scheduler that changes the LR each epoch.

Parameters

start_epoch (int) –

Return type

_LRScheduler

build_loss(loss_def_path: Optional[str] = None) Callable[source]#

Build a loss Callable.

Parameters

loss_def_path (Optional[str]) –

Return type

Callable

build_metric_names() List[str][source]#

Returns names of metrics used to validate model at each epoch.

Return type

List[str]

build_model(model_def_path: Optional[str] = None) torch.nn.Module[source]#

Build a PyTorch model.

Parameters

model_def_path (Optional[str]) –

Return type

torch.nn.Module

build_optimizer() Optimizer[source]#

Returns optimizer.

Return type

Optimizer

build_step_scheduler(start_epoch: int = 0) _LRScheduler[source]#

Returns an LR scheduler that changes the LR each step.

Parameters

start_epoch (int) –

Return type

_LRScheduler

eval_model(split: str)[source]#

Evaluate model using a particular dataset split.

Gets validation metrics and saves them along with prediction plots.

Parameters

split (str) – the dataset split to use: train, valid, or test.

classmethod from_model_bundle(model_bundle_uri: str, tmp_dir: Optional[str] = None, cfg: Optional[LearnerConfig] = None, training: bool = False, **kwargs) Learner[source]#

Create a Learner from a model bundle.

Note

This is the bundle saved in train/model-bundle.zip and not bundle/model-bundle.zip.

Parameters
  • model_bundle_uri (str) – URI of the model bundle.

  • tmp_dir (Optional[str], optional) – Optional temporary directory. Will be used for unzipping bundle and also passed to the default constructor. If None, will be auto-generated. Defaults to None.

  • cfg (Optional[LearnerConfig], optional) – If None, will be read from the bundle. Defaults to None.

  • training (bool, optional) – If False, the training apparatus (loss, optimizer, scheduler, logging, etc.) will not be set up and the model will be put into eval mode. If True, the training apparatus will be set up and the model will be put into training mode. Defaults to False.

  • **kwargs – See Learner.__init__().

Raises

FileNotFoundError – If using custom Albumentations transforms and definition file is not found in bundle.

Returns

Object of the Learner subclass on which this was called.

Return type

Learner

get_collate_fn() Optional[callable][source]#

Returns a custom collate_fn to use in DataLoader.

None is returned if default collate_fn should be used.

See https://pytorch.org/docs/stable/data.html#working-with-collate-fn

Return type

Optional[callable]
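
A custom collate_fn is typically needed when samples cannot be stacked into a single array, for example when each image has a different number of targets. The following is a minimal sketch in plain Python. The name collate_variable_length and the (x, y) sample layout are assumptions for illustration, not something this class prescribes.

```python
def collate_variable_length(samples):
    """Group a list of (x, y) samples into (xs, ys) without stacking.

    Hypothetical example: keeps targets as a list so that each sample
    may carry a different number of them.
    """
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    return xs, ys


samples = [("img0", [1, 2]), ("img1", [3]), ("img2", [])]
xs, ys = collate_variable_length(samples)
```

A subclass could return a function like this from get_collate_fn() so that it is used as the collate_fn argument of torch.utils.data.DataLoader.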

get_dataloader(split: str) torch.utils.data.DataLoader[source]#

Get the DataLoader for a split.

Parameters

split (str) – a split name which can be train, valid, or test

Return type

torch.utils.data.DataLoader

get_start_epoch() int[source]#

Get start epoch.

If training was interrupted, this returns the last complete epoch + 1.

Return type

int
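
The "last complete epoch + 1" rule can be sketched as follows. This assumes a per-epoch CSV log whose first column is the epoch index. The log layout and the helper name start_epoch_from_log are assumptions for illustration and are not taken from the documented implementation.

```python
import csv
import io


def start_epoch_from_log(log_text):
    """Return the epoch to resume from, given the text of a CSV log.

    Assumes a header row followed by one row per completed epoch,
    with the epoch index in the first column (hypothetical layout).
    """
    rows = list(csv.reader(io.StringIO(log_text)))
    data_rows = [r for r in rows[1:] if r]  # skip header and blank rows
    if not data_rows:
        return 0  # nothing completed yet: start from the first epoch
    return int(data_rows[-1][0]) + 1


fresh = start_epoch_from_log("epoch,train_loss\n")
resumed = start_epoch_from_log("epoch,train_loss\n0,0.9\n1,0.7\n2,0.6\n")
```

With three completed epochs (0, 1, 2) in the log, training would resume at epoch 3; with an empty log it starts at 0.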

get_train_sampler(train_ds: Dataset) Optional[Sampler][source]#

Return a sampler to use for the training dataloader or None to not use any.

Parameters

train_ds (Dataset) –

Return type

Optional[Sampler]

abstract get_visualizer_class() Type[Visualizer][source]#

Returns a Visualizer class object for plotting data samples.

Return type

Type[Visualizer]

load_checkpoint()[source]#

Load last weights from previous run if available.

load_init_weights(model_weights_path: Optional[str] = None) None[source]#

Load the weights to initialize model.

Parameters

model_weights_path (Optional[str]) –

Return type

None

load_weights(uri: str, **kwargs) None[source]#

Load model weights from a file.

Parameters

uri (str) –

Return type

None

log_data_stats()[source]#

Log stats about each DataSet.

main()[source]#

Main training sequence.

This plots the dataset, runs a training and validation loop (which will resume if interrupted), logs stats, plots predictions, and syncs results to the cloud.

normalize_input(x: ndarray) ndarray[source]#

Normalize x to [0, 1].

If x.dtype is a subtype of np.unsignedinteger, normalize it to [0, 1] using the max possible value of that dtype. Otherwise, assume it is in [0, 1] already and do nothing.

Parameters

x (np.ndarray) – an image or batch of images

Returns

the same array scaled to [0, 1].

Return type

ndarray
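
The dtype-based rule described above can be sketched with NumPy as follows. The function name normalize_to_unit_range is hypothetical, and returning float32 is an assumption; only the rule itself (divide unsigned-integer input by the maximum value of its dtype, leave anything else alone) comes from the description.

```python
import numpy as np


def normalize_to_unit_range(x):
    """Scale unsigned-integer arrays to [0, 1]; leave other dtypes untouched."""
    if np.issubdtype(x.dtype, np.unsignedinteger):
        max_val = np.iinfo(x.dtype).max  # 255 for uint8, 65535 for uint16
        return x.astype(np.float32) / max_val
    # Non-integer input is assumed to be in [0, 1] already.
    return x


u8 = normalize_to_unit_range(np.array([0, 255], dtype=np.uint8))
u16 = normalize_to_unit_range(np.array([0, 65535], dtype=np.uint16))
f32 = normalize_to_unit_range(np.array([0.25, 0.5], dtype=np.float32))
```

Note that float input outside [0, 1] passes through unchanged, so callers are responsible for scaling it beforehand.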

numpy_predict(x: ndarray, raw_out: bool = False) ndarray[source]#

Make a prediction using an image or batch of images in numpy format. If x.dtype is a subtype of np.unsignedinteger, it will be normalized to [0, 1] using the max possible value of that dtype. Otherwise, x will be assumed to be in [0, 1] already and will be cast to torch.float32 directly.

Parameters
  • x (ndarray) – array of shape [height, width, channels] or [batch_sz, height, width, channels]

  • raw_out (bool) – if True, return prediction probabilities

Returns

predictions using numpy arrays

Return type

ndarray

on_epoch_end(curr_epoch, metrics)[source]#

Hook that is called at end of epoch.

Writes metrics to CSV and TB, and saves model.

on_overfit_start()[source]#

Hook that is called at start of overfit routine.

on_train_start()[source]#

Hook that is called at start of train routine.

output_to_numpy(out: torch.Tensor) ndarray[source]#

Convert output of model to numpy format.

Parameters

out (torch.Tensor) – the output of the model in PyTorch format

Returns

the output of the model in numpy format

Return type

ndarray

overfit()[source]#

Optimize model using the same batch repeatedly.

plot_dataloader(dl: torch.utils.data.DataLoader, output_path: str, batch_limit: Optional[int] = None, show: bool = False)[source]#

Plot images and ground truth labels for a DataLoader.

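Parameters
  • dl (torch.utils.data.DataLoader) –

  • output_path (str) –

  • batch_limit (Optional[int]) –

  • show (bool) –

plot_dataloaders(batch_limit: Optional[int] = None, show: bool = False)[source]#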

Plot images and ground truth labels for all DataLoaders.

Parameters
  • batch_limit (Optional[int]) –

  • show (bool) –

plot_predictions(split: str, batch_limit: Optional[int] = None, show: bool = False)[source]#

Plot predictions for a split.

Uses the first batch for the corresponding DataLoader.

Parameters
  • split (str) – dataset split. Can be train, valid, or test.

  • batch_limit (Optional[int]) – optional limit on (rendered) batch size

  • show (bool) –

post_forward(x: Any) Any[source]#

Post process output of call to model().

Useful for when predictions are inside a structure returned by model().

Parameters

x (Any) –

Return type

Any

predict(x: torch.Tensor, raw_out: bool = False) Any[source]#

Make prediction for an image or batch of images.

Parameters
  • x (Tensor) – Image or batch of images as a float Tensor with pixel values normalized to [0, 1].

  • raw_out (bool) – if True, return prediction probabilities

Returns

the predictions, in probability form if raw_out is True, in class_id form otherwise

Return type

Any

predict_dataloader(dl: torch.utils.data.DataLoader, batched_output: bool = True, return_format: Literal['xyz', 'yz', 'z'] = 'z', raw_out: bool = True, predict_kw: dict = {}) Union[Iterator[Any], Iterator[Tuple[Any, ...]]][source]#

Returns an iterator over predictions on the given dataloader.

Parameters
  • dl (DataLoader) – The dataloader to make predictions on.

  • batched_output (bool, optional) – If True, return batches of x, y, z as defined by the dataloader. If False, unroll the batches into individual items. Defaults to True.

  • return_format (Literal['xyz', 'yz', 'z'], optional) – Format of the return elements of the returned iterator. Must be one of: ‘xyz’, ‘yz’, and ‘z’. If ‘xyz’, elements are 3-tuples of x, y, and z. If ‘yz’, elements are 2-tuples of y and z. If ‘z’, elements are (non-tuple) values of z. Where x = input image, y = ground truth, and z = prediction. Defaults to ‘z’.

  • raw_out (bool, optional) – If True, return raw predicted scores. Defaults to True.

  • predict_kw (dict) – Dict with keywords passed to Learner.predict(). Useful if a Learner subclass implements a custom predict() method.

Raises

ValueError – If return_format is not one of the allowed values.

Returns

If return_format is ‘z’, the returned value is an iterator of whatever type the predictions are. Otherwise, the returned value is an iterator of tuples.

Return type

Union[Iterator[Any], Iterator[Tuple[Any, …]]]
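
The effect of return_format on the shape of each yielded element can be sketched in plain Python. The function format_predictions is a hypothetical stand-in that only mimics the selection logic described above; it does not run a model or use a DataLoader.

```python
def format_predictions(triples, return_format="z"):
    """Return an iterator shaped according to return_format.

    `triples` is an iterable of (x, y, z), where x = input image,
    y = ground truth, and z = prediction.
    """
    if return_format not in ("xyz", "yz", "z"):
        raise ValueError(f"Invalid return_format: {return_format!r}")

    def _generate():
        for x, y, z in triples:
            if return_format == "xyz":
                yield x, y, z
            elif return_format == "yz":
                yield y, z
            else:
                yield z  # bare value, not a 1-tuple

    return _generate()


data = [("x0", "y0", "z0"), ("x1", "y1", "z1")]
as_z = list(format_predictions(data))
as_yz = list(format_predictions(data, "yz"))
as_xyz = list(format_predictions(data, "xyz"))
```

The validation happens before the generator is created, so in this sketch an invalid value fails at call time rather than when the iterator is first consumed.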

predict_dataset(dataset: Dataset, return_format: Literal['xyz', 'yz', 'z'] = 'z', raw_out: bool = True, numpy_out: bool = False, predict_kw: dict = {}, dataloader_kw: dict = {}, progress_bar: bool = True, progress_bar_kw: dict = {}) Union[Iterator[Any], Iterator[Tuple[Any, ...]]][source]#

Returns an iterator over predictions on the given dataset.

Parameters
  • dataset (Dataset) – The dataset to make predictions on.

  • return_format (Literal['xyz', 'yz', 'z'], optional) – Format of the return elements of the returned iterator. Must be one of: ‘xyz’, ‘yz’, and ‘z’. If ‘xyz’, elements are 3-tuples of x, y, and z. If ‘yz’, elements are 2-tuples of y and z. If ‘z’, elements are (non-tuple) values of z. Where x = input image, y = ground truth, and z = prediction. Defaults to ‘z’.

  • raw_out (bool, optional) – If True, return raw predicted scores. Defaults to True.

  • numpy_out (bool, optional) – If True, convert predictions to numpy arrays before returning. Defaults to False.

  • predict_kw (dict) – Dict with keywords passed to Learner.predict(). Useful if a Learner subclass implements a custom predict() method.

  • dataloader_kw (dict) – Dict with keywords passed to the DataLoader constructor.

  • progress_bar (bool, optional) – If True, display a progress bar. Since this function returns an iterator, the progress bar won’t be visible until the iterator is consumed. Defaults to True.

  • progress_bar_kw (dict) – Dict with keywords passed to tqdm.

Raises

ValueError – If return_format is not one of the allowed values.

Returns

If return_format is ‘z’, the returned value is an iterator of whatever type the predictions are. Otherwise, the returned value is an iterator of tuples.

Return type

Union[Iterator[Any], Iterator[Tuple[Any, …]]]

prob_to_pred(x: torch.Tensor) torch.Tensor[source]#

Convert a Tensor with prediction probabilities to class ids.

The class ids should be the classes with the maximum probability.

Parameters

x (torch.Tensor) –

Return type

torch.Tensor
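
Choosing the class with the maximum probability amounts to an argmax over the class axis. The following is a minimal sketch that uses NumPy as a stand-in for torch.Tensor and assumes probabilities of shape [batch, classes], with the classes on the last axis. The actual axis depends on the task and the subclass; prob_to_class_ids is a hypothetical name.

```python
import numpy as np


def prob_to_class_ids(probs):
    """Pick the most probable class along the last axis (assumed class axis)."""
    return probs.argmax(axis=-1)


probs = np.array([
    [0.1, 0.7, 0.2],  # most probable class: 1
    [0.6, 0.3, 0.1],  # most probable class: 0
])
class_ids = prob_to_class_ids(probs)
```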

run_tensorboard()[source]#

Run TB server serving logged stats.

save_model_bundle()[source]#

Save a model bundle.

This is a zip file with the model weights in .pth format and a serialized copy of the LearnerConfig, which allows for making predictions in the future.

setup_data()[source]#

Set datasets and dataLoaders for train, validation, and test sets.

setup_loss(loss_def_path: Optional[str] = None) None[source]#

Setup self.loss.

Parameters
  • loss_def_path (str, optional) – Loss definition path. Will be available when loading from a bundle. Defaults to None.

Return type

None

setup_model(model_weights_path: Optional[str] = None, model_def_path: Optional[str] = None) None[source]#

Setup self.model.

Parameters
  • model_weights_path (Optional[str], optional) – Path to model weights. Will be available when loading from a bundle. Defaults to None.

  • model_def_path (Optional[str], optional) – Path to model definition. Will be available when loading from a bundle. Defaults to None.

Return type

None

setup_tensorboard()[source]#

Setup for logging stats to TB.

setup_training(loss_def_path: Optional[str] = None) None[source]#

Parameters

loss_def_path (Optional[str]) –

Return type

None

stop_tensorboard()[source]#

Stop TB logging and server if it’s running.

sync_from_cloud()[source]#

Sync any previous output in the cloud to output_dir.

sync_to_cloud()[source]#

Sync any output to the cloud at output_uri.

to_batch(x: torch.Tensor) torch.Tensor[source]#

Ensure that image array has batch dimension.

Parameters

x (torch.Tensor) – assumed to be either image or batch of images

Returns

x with extra batch dimension of length 1 if needed

Return type

torch.Tensor
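
Adding the batch dimension only when it is missing can be sketched as follows. NumPy is used here as a stand-in for torch.Tensor, and the rule "3 dimensions means a single image, 4 means a batch" is an assumption for illustration; ensure_batch_dim is a hypothetical name.

```python
import numpy as np


def ensure_batch_dim(x):
    """Add a leading batch axis of length 1 to a single 3-D image."""
    if x.ndim == 3:
        return x[np.newaxis, ...]
    return x  # already batched: returned unchanged


single = ensure_batch_dim(np.zeros((3, 8, 8)))
batch = ensure_batch_dim(np.zeros((4, 3, 8, 8)))
```

Here single ends up with shape (1, 3, 8, 8), while batch keeps its shape (4, 3, 8, 8).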

to_device(x: Any, device: str) Any[source]#

Load Tensors onto a device.

Parameters
  • x (Any) – some object with Tensors in it

  • device (str) – ‘cpu’ or ‘cuda’

Returns

x but with any Tensors in it on the device

Return type

Any
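
Moving "some object with Tensors in it" usually means walking nested containers and moving each tensor found. The following is a minimal duck-typed sketch, assuming tensors are recognised by having a to(device) method, as torch.Tensor does. The names move_to_device and _FakeTensor are hypothetical; the fake tensor exists only so the snippet runs without PyTorch.

```python
def move_to_device(x, device):
    """Recursively call .to(device) on anything that supports it."""
    if hasattr(x, "to"):
        return x.to(device)
    if isinstance(x, dict):
        return {k: move_to_device(v, device) for k, v in x.items()}
    if isinstance(x, (list, tuple)):
        return type(x)(move_to_device(v, device) for v in x)
    return x  # non-tensor leaves (ints, strings, ...) pass through


class _FakeTensor:
    """Stand-in for torch.Tensor that only tracks its device."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return _FakeTensor(device)


batch = {"x": _FakeTensor(), "y": [_FakeTensor(), 3]}
moved = move_to_device(batch, "cuda")
```

The original batch is left untouched; only the returned structure holds the moved tensors.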

train(epochs: Optional[int] = None)[source]#

Training loop that will attempt to resume training if appropriate.

Parameters

epochs (Optional[int]) –

train_end(outputs: List[Dict[str, float]], num_samples: int) Dict[str, float][source]#

Aggregate the output of train_step at the end of the epoch.

Parameters
  • outputs (List[Dict[str, float]]) – a list of outputs of train_step

  • num_samples (int) – total number of training samples processed in epoch

Return type

Dict[str, float]
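
One plausible way to aggregate per-step outputs into per-epoch metrics is to sum each metric over all steps and divide by the number of samples. The sketch below assumes that every step reports per-batch sums rather than per-batch means; that convention and the name aggregate_step_outputs are assumptions, not a statement of how this method is implemented.

```python
def aggregate_step_outputs(outputs, num_samples):
    """Sum each metric over all steps and divide by the sample count.

    Assumes each step output holds per-batch sums, so that the result
    is a per-sample average over the epoch.
    """
    totals = {}
    for step_output in outputs:
        for name, value in step_output.items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / num_samples for name, total in totals.items()}


# Two batches whose summed losses are 6.0 and 2.0, over 4 samples in total.
outputs = [{"train_loss": 6.0}, {"train_loss": 2.0}]
metrics = aggregate_step_outputs(outputs, num_samples=4)
```

Dividing by num_samples rather than by the number of batches keeps the average correct when the last batch is smaller than the others.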

train_epoch(optimizer: Optimizer, step_scheduler: Optional[_LRScheduler] = None) Dict[str, float][source]#

Train for a single epoch.

Parameters
  • optimizer (Optimizer) –

  • step_scheduler (Optional[_LRScheduler]) –

Return type

Dict[str, float]

abstract train_step(batch: Any, batch_ind: int) Dict[str, float][source]#

Compute loss for a single training batch.

Parameters
  • batch (Any) – batch data needed to compute loss

  • batch_ind (int) – index of batch within epoch

Returns

dict with ‘train_loss’ as key and possibly other losses

Return type

Dict[str, float]

validate_end(outputs: List[Dict[str, float]], num_samples: int) Dict[str, float][source]#

Aggregate the output of validate_step at the end of the epoch.

Parameters
  • outputs (List[Dict[str, float]]) – a list of outputs of validate_step

  • num_samples (int) – total number of validation samples processed in epoch

Return type

Dict[str, float]

validate_epoch(dl: torch.utils.data.DataLoader) Dict[str, float][source]#

Validate for a single epoch.

Parameters

dl (torch.utils.data.DataLoader) –

Return type

Dict[str, float]

abstract validate_step(batch: Any, batch_ind: int) Dict[str, float][source]#

Compute metrics on validation batch.

Parameters
  • batch (Any) – batch data needed to compute validation metrics

  • batch_ind (int) – index of batch within epoch

Returns

dict with metric names mapped to metric values

Return type

Dict[str, float]