DataConfig#

Note

All Configs are derived from rastervision.pipeline.config.Config, which itself is a pydantic Model.

pydantic model DataConfig[source]#

Config related to dataset for training and testing.

JSON schema:
{
   "title": "DataConfig",
   "description": "Config related to dataset for training and testing.",
   "type": "object",
   "properties": {
      "class_names": {
         "title": "Class Names",
         "description": "Names of classes.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "class_colors": {
         "title": "Class Colors",
         "description": "Colors used to display classes. Can be color 3-tuples in list form.",
         "type": "array",
         "items": {
            "anyOf": [
               {
                  "type": "string"
               },
               {
                  "type": "array",
                  "minItems": 3,
                  "maxItems": 3,
                  "items": [
                     {
                        "type": "integer"
                     },
                     {
                        "type": "integer"
                     },
                     {
                        "type": "integer"
                     }
                  ]
               }
            ]
         }
      },
      "img_channels": {
         "title": "Img Channels",
         "description": "The number of channels of the training images.",
         "exclusiveMinimum": 0,
         "type": "integer"
      },
      "img_sz": {
         "title": "Img Sz",
         "description": "Length of a side of each image in pixels. This is the size to transform it to during training, not the size in the raw dataset.",
         "default": 256,
         "exclusiveMinimum": 0,
         "type": "integer"
      },
      "train_sz": {
         "title": "Train Sz",
         "description": "If set, the number of training images to use. If fewer images exist, then an exception will be raised.",
         "type": "integer"
      },
      "train_sz_rel": {
         "title": "Train Sz Rel",
         "description": "If set, the proportion of training images to use.",
         "type": "number"
      },
      "num_workers": {
         "title": "Num Workers",
         "description": "Number of workers to use when DataLoader makes batches.",
         "default": 4,
         "type": "integer"
      },
      "augmentors": {
         "title": "Augmentors",
         "description": "Names of albumentations augmentors to use for training batches. Choices include: ['Blur', 'RandomRotate90', 'HorizontalFlip', 'VerticalFlip', 'GaussianBlur', 'GaussNoise', 'RGBShift', 'ToGray']. Alternatively, a custom transform can be provided via the aug_transform option.",
         "default": [
            "RandomRotate90",
            "HorizontalFlip",
            "VerticalFlip"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "base_transform": {
         "title": "Base Transform",
         "description": "An Albumentations transform serialized as a dict that will be applied to all datasets: training, validation, and test. This transformation is in addition to the resizing due to img_sz. This is useful for, for example, applying the same normalization to all datasets.",
         "type": "object"
      },
      "aug_transform": {
         "title": "Aug Transform",
         "description": "An Albumentations transform serialized as a dict that will be applied as data augmentation to the training dataset. This transform is applied before base_transform. If provided, the augmentors option is ignored.",
         "type": "object"
      },
      "plot_options": {
         "title": "Plot Options",
         "description": "Options to control plotting.",
         "default": {
            "transform": {
               "__version__": "1.4.1",
               "transform": {
                  "__class_fullname__": "rastervision.pytorch_learner.utils.utils.MinMaxNormalize",
                  "always_apply": false,
                  "p": 1.0,
                  "min_val": 0.0,
                  "max_val": 1.0,
                  "dtype": 5
               }
            },
            "channel_display_groups": null,
            "type_hint": "plot_options"
         },
         "allOf": [
            {
               "$ref": "#/definitions/PlotOptions"
            }
         ]
      },
      "preview_batch_limit": {
         "title": "Preview Batch Limit",
         "description": "Optional limit on the number of items in the preview plots produced during training.",
         "type": "integer"
      },
      "type_hint": {
         "title": "Type Hint",
         "default": "data",
         "enum": [
            "data"
         ],
         "type": "string"
      }
   },
   "additionalProperties": false,
   "definitions": {
      "PlotOptions": {
         "title": "PlotOptions",
         "description": "Config related to plotting.",
         "type": "object",
         "properties": {
            "transform": {
               "title": "Transform",
               "description": "An Albumentations transform serialized as a dict that will be applied to each image before it is plotted. Mainly useful for undoing any data transformation that you do not want included in the plot, such as normalization. The default value will shift and scale the image so the values range from 0.0 to 1.0 which is the expected range for the plotting function. This default is useful for cases where the values after normalization are close to zero which makes the plot difficult to see.",
               "default": {
                  "__version__": "1.4.1",
                  "transform": {
                     "__class_fullname__": "rastervision.pytorch_learner.utils.utils.MinMaxNormalize",
                     "always_apply": false,
                     "p": 1.0,
                     "min_val": 0.0,
                     "max_val": 1.0,
                     "dtype": 5
                  }
               },
               "type": "object"
            },
            "channel_display_groups": {
               "title": "Channel Display Groups",
               "description": "Groups of image channels to display together as a subplot when plotting the data and predictions. Can be a list or tuple of groups (e.g. [(0, 1, 2), (3,)]) or a dict containing title-to-group mappings (e.g. {\"RGB\": [0, 1, 2], \"IR\": [3]}), where each group is a list or tuple of channel indices and title is a string that will be used as the title of the subplot for that group.",
               "anyOf": [
                  {
                     "type": "object",
                     "additionalProperties": {
                        "type": "array",
                        "items": {
                           "type": "integer",
                           "minimum": 0
                        }
                     }
                  },
                  {
                     "type": "array",
                     "items": {
                        "type": "array",
                        "items": {
                           "type": "integer",
                           "minimum": 0
                        }
                     }
                  }
               ]
            },
            "type_hint": {
               "title": "Type Hint",
               "default": "plot_options",
               "enum": [
                  "plot_options"
               ],
               "type": "string"
            }
         },
         "additionalProperties": false
      }
   }
}
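The schema above enforces `additionalProperties: false` (unknown keys are rejected) and `exclusiveMinimum: 0` on `img_sz`. A minimal pure-Python sketch of those two checks, not the real pydantic machinery:

```python
# Illustrative sketch of two DataConfig schema constraints:
# unknown keys are rejected (additionalProperties: false) and
# img_sz must be a positive integer (exclusiveMinimum: 0).
KNOWN_KEYS = {
    "class_names", "class_colors", "img_channels", "img_sz",
    "train_sz", "train_sz_rel", "num_workers", "augmentors",
    "base_transform", "aug_transform", "plot_options",
    "preview_batch_limit", "type_hint",
}

def check_data_config(cfg: dict) -> dict:
    extra = set(cfg) - KNOWN_KEYS
    if extra:
        raise ValueError(f"extra fields not permitted: {sorted(extra)}")
    img_sz = cfg.get("img_sz", 256)  # schema default
    if not (isinstance(img_sz, int) and img_sz > 0):
        raise ValueError("img_sz must be a positive integer")
    return cfg

check_data_config({"class_names": ["bg", "building"], "img_sz": 300})  # ok
```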

Config
  • extra: str = forbid

  • validate_assignment: bool = True

field aug_transform: Optional[dict] = None#

An Albumentations transform serialized as a dict that will be applied as data augmentation to the training dataset. This transform is applied before base_transform. If provided, the augmentors option is ignored.

Validated by
  • ensure_class_colors

  • validate_plot_options
field augmentors: List[str] = ['RandomRotate90', 'HorizontalFlip', 'VerticalFlip']#

Names of albumentations augmentors to use for training batches. Choices include: [‘Blur’, ‘RandomRotate90’, ‘HorizontalFlip’, ‘VerticalFlip’, ‘GaussianBlur’, ‘GaussNoise’, ‘RGBShift’, ‘ToGray’]. Alternatively, a custom transform can be provided via the aug_transform option.

Validated by
  • ensure_class_colors

  • validate_augmentors

  • validate_plot_options
field base_transform: Optional[dict] = None#

An Albumentations transform serialized as a dict that will be applied to all datasets: training, validation, and test. This transformation is in addition to the resizing due to img_sz. This is useful, for example, for applying the same normalization to all datasets.

Validated by
  • ensure_class_colors

  • validate_plot_options
field class_colors: Optional[List[Union[str, Tuple[int, int, int]]]] = None#

Colors used to display classes. Each color can be a color string or an RGB 3-tuple in list form.

Validated by
  • ensure_class_colors

  • validate_plot_options
field class_names: List[str] = []#

Names of classes.

Validated by
  • ensure_class_colors

  • validate_plot_options
field img_channels: Optional[PositiveInt] = None#

The number of channels of the training images.

Constraints
  • exclusiveMinimum = 0

Validated by
  • ensure_class_colors

  • validate_plot_options
field img_sz: PositiveInt = 256#

Length of a side of each image in pixels. This is the size to transform it to during training, not the size in the raw dataset.

Constraints
  • exclusiveMinimum = 0

Validated by
  • ensure_class_colors

  • validate_plot_options
field num_workers: int = 4#

Number of workers to use when DataLoader makes batches.

Validated by
  • ensure_class_colors

  • validate_plot_options
field plot_options: Optional[PlotOptions] = PlotOptions(transform={'__version__': '1.4.1', 'transform': {'__class_fullname__': 'rastervision.pytorch_learner.utils.utils.MinMaxNormalize', 'always_apply': False, 'p': 1.0, 'min_val': 0.0, 'max_val': 1.0, 'dtype': 5}}, channel_display_groups=None)#

Options to control plotting.

Validated by
  • ensure_class_colors

  • validate_plot_options
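Per the PlotOptions schema, `channel_display_groups` accepts either a sequence of channel-index groups or a dict mapping subplot titles to groups. Two equivalent specifications for a hypothetical 4-channel (RGB + infrared) image:

```python
# Two ways to group 4 image channels into subplots, per the
# channel_display_groups description in the PlotOptions schema.
groups_as_list = [(0, 1, 2), (3,)]                  # untitled groups
groups_as_dict = {"RGB": [0, 1, 2], "IR": [3]}      # title -> group

# All channel indices must be non-negative integers (schema: minimum 0).
assert all(i >= 0 for g in groups_as_dict.values() for i in g)
```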
field preview_batch_limit: Optional[int] = None#

Optional limit on the number of items in the preview plots produced during training.

Validated by
  • ensure_class_colors

  • validate_plot_options
field train_sz: Optional[int] = None#

If set, the number of training images to use. If fewer images exist, then an exception will be raised.

Validated by
  • ensure_class_colors

  • validate_plot_options
field train_sz_rel: Optional[float] = None#

If set, the proportion of training images to use.

Validated by
  • ensure_class_colors

  • validate_plot_options
field type_hint: Literal['data'] = 'data'#
Validated by
  • ensure_class_colors

  • validate_plot_options
build(tmp_dir: Optional[str] = None) → Tuple[torch.utils.data.Dataset, torch.utils.data.Dataset, torch.utils.data.Dataset][source]#

Build and return train, val, and test datasets.

Parameters

tmp_dir (Optional[str]) –

Return type

Tuple[torch.utils.data.Dataset, torch.utils.data.Dataset, torch.utils.data.Dataset]

build_dataset(split: Literal['train', 'valid', 'test'], tmp_dir: Optional[str] = None) → torch.utils.data.Dataset[source]#

Build and return dataset for a single split.

Parameters
  • split (Literal['train', 'valid', 'test']) –

  • tmp_dir (Optional[str]) –

Return type

torch.utils.data.Dataset

validator ensure_class_colors  »  all fields[source]#
Parameters

values (dict) –

Return type

dict

classmethod from_file(uri: str) → Config#

Deserialize a Config from a JSON file, upgrading if possible.

Parameters

uri (str) – URI to load from.

Return type

Config
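The `from_file`/`to_file` pair round-trips a config through JSON on disk. A sketch of that round trip using only the standard library (the real methods additionally inject RV metadata such as plugin_versions and upgrade old configs on load):

```python
# Illustrative JSON round trip, standing in for to_file/from_file.
# The config dict below is a hypothetical serialized DataConfig.
import json
import os
import tempfile

cfg = {"type_hint": "data", "img_sz": 256, "class_names": ["bg", "fg"]}
path = os.path.join(tempfile.mkdtemp(), "data_config.json")
with open(path, "w") as f:
    json.dump(cfg, f)          # to_file analogue
with open(path) as f:
    loaded = json.load(f)      # from_file analogue
assert loaded == cfg
```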

get_bbox_params() → Optional[BboxParams][source]#

Returns BboxParams used by albumentations for data augmentation.

Return type

Optional[BboxParams]

get_custom_albumentations_transforms() → List[dict][source]#

Returns all custom transforms found in this config.

This should return all serialized albumentations transforms with a ‘lambda_transforms_path’ field contained in this config or in any of its members, no matter how deeply nested.

The purpose is to make it easier to adjust their paths all at once while saving to or loading from a bundle.

Return type

List[dict]
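The recursive collection described above can be sketched as a plain walk over nested dicts and lists, gathering every dict that carries a ‘lambda_transforms_path’ key (the nested config below is hypothetical, for illustration only):

```python
# Sketch of get_custom_albumentations_transforms: recursively collect
# every dict with a 'lambda_transforms_path' key, however deeply nested.
def find_custom_transforms(obj) -> list:
    found = []
    if isinstance(obj, dict):
        if "lambda_transforms_path" in obj:
            found.append(obj)
        for v in obj.values():
            found.extend(find_custom_transforms(v))
    elif isinstance(obj, (list, tuple)):
        for v in obj:
            found.extend(find_custom_transforms(v))
    return found

cfg = {
    "aug_transform": {"__class_fullname__": "Lambda",
                      "lambda_transforms_path": "s3://bucket/tf.pkl"},
    "plot_options": {"transform": {"__class_fullname__": "MinMaxNormalize"}},
}
assert len(find_custom_transforms(cfg)) == 1
```

Collecting the dicts themselves (rather than copies) is what makes it possible to rewrite their paths in place when saving to or loading from a bundle.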

get_data_transforms() → Tuple[BasicTransform, BasicTransform][source]#

Get albumentations transform objects for data augmentation.

Returns a 2-tuple of a “base” transform and an augmentation transform. The base transform comprises a resize transform based on img_sz followed by the transform specified in base_transform. The augmentation transform comprises the base transform followed by either the transform in aug_transform (if specified) or the transforms in the augmentors field.

The augmentation transform is intended to be used for training data, and the base transform for all other data where data augmentation is not desirable, such as validation or prediction.

Returns

base transform and augmentation transform.

Return type

Tuple[BasicTransform, BasicTransform]
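The base/augmentation split described above can be sketched as plain function composition. The transforms here are string-building stand-ins, not real albumentations objects; only the composition order mirrors the method's contract:

```python
# Sketch of the get_data_transforms composition order: the base
# transform is resize + base_transform; the augmentation transform is
# the base transform followed by the augmentors.
def compose(*fns):
    def run(x):
        for fn in fns:
            x = fn(x)
        return x
    return run

resize = lambda img: f"resize({img})"
normalize = lambda img: f"normalize({img})"  # stands in for base_transform
flip = lambda img: f"flip({img})"            # stands in for the augmentors

base_tf = compose(resize, normalize)  # for validation/test/prediction
aug_tf = compose(base_tf, flip)       # base followed by augmentation
```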

random_subset_dataset(ds: torch.utils.data.Dataset, size: Optional[int] = None, fraction: Optional[ConstrainedFloatValue] = None) → torch.utils.data.Subset[source]#
Parameters
  • ds (torch.utils.data.Dataset) –

  • size (Optional[int]) –

  • fraction (Optional[ConstrainedFloatValue]) –
Return type

torch.utils.data.Subset
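The subset logic can be sketched without torch: derive the subset size either from an absolute `size` or from a `fraction` of the dataset length, then sample indices without replacement. The `seed` parameter is mine, added to make the sketch deterministic; the real method wraps the result in a `torch.utils.data.Subset`:

```python
# Sketch of random_subset_dataset's index selection.
import random

def random_subset_indices(n: int, size=None, fraction=None, seed=0):
    if fraction is not None:
        size = round(n * fraction)  # fraction of the dataset length
    rng = random.Random(seed)
    return rng.sample(range(n), size)  # sample without replacement

idx = random_subset_indices(100, fraction=0.1)
assert len(idx) == 10
```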

recursive_validate_config()#

Recursively validate hierarchies of Configs.

This uses reflection to call validate_config on a hierarchy of Configs using a depth-first pre-order traversal.

revalidate()#

Re-validate an instantiated Config.

Runs all Pydantic validators plus self.validate_config().

Adapted from: https://github.com/samuelcolvin/pydantic/issues/1864#issuecomment-679044432

to_file(uri: str, with_rv_metadata: bool = True) → None#

Save a Config to a JSON file, optionally with RV metadata.

Parameters
  • uri (str) – URI to save to.

  • with_rv_metadata (bool) – If True, inject Raster Vision metadata such as plugin_versions, so that the config can be upgraded when loaded.

Return type

None

update(*args, **kwargs)#

Update any fields before validation.

Subclasses should override this to provide complex default behavior, for example, setting default values as a function of the values of other fields. The arguments to this method will vary depending on the type of Config.

validator validate_augmentors  »  augmentors[source]#
Parameters

v (str) –

Return type

str

validate_config()#

Validate fields that should be checked after update is called.

This is to complement the builtin validation that Pydantic performs at the time of object construction.

validate_list(field: str, valid_options: List[str])#

Validate a list field.

Parameters
  • field (str) – name of field to validate

  • valid_options (List[str]) – values that field is allowed to take

Raises

ConfigError – if field is invalid
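The check that `validate_list` performs can be sketched as follows. The function body and the `ConfigError` definition here are illustrative, not Raster Vision's actual implementation:

```python
# Sketch of validate_list: every element of a list field must be one
# of the allowed options, otherwise a config error is raised.
class ConfigError(ValueError):
    pass

def validate_list(values: dict, field: str, valid_options: list):
    for v in values.get(field, []):
        if v not in valid_options:
            raise ConfigError(f"{v} is not a valid option for {field}")

validate_list({"augmentors": ["HorizontalFlip"]}, "augmentors",
              ["Blur", "HorizontalFlip", "VerticalFlip"])  # ok
```

This is the kind of check that backs `validate_augmentors`, which restricts the `augmentors` field to the documented choices.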

validator validate_plot_options  »  all fields[source]#
Parameters

values (dict) –

Return type

dict

property num_classes#
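The property is undocumented here; it is presumably derived from the class list. A sketch under that assumption:

```python
# Assumption: num_classes reflects the length of class_names.
class_names = ["background", "building", "road"]
num_classes = len(class_names)
assert num_classes == 3
```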