ClassificationImageDataConfig

Note

All Configs are derived from rastervision.pipeline.config.Config, which itself is a pydantic Model.

pydantic model ClassificationImageDataConfig

Configure ClassificationImageDatasets.

JSON schema:
{
   "title": "ClassificationImageDataConfig",
   "description": "Configure :class:`ClassificationImageDatasets <.ClassificationImageDataset>`.",
   "type": "object",
   "properties": {
      "class_config": {
         "anyOf": [
            {
               "$ref": "#/$defs/ClassConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Class config."
      },
      "img_channels": {
         "anyOf": [
            {
               "exclusiveMinimum": 0,
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The number of channels of the training images.",
         "title": "Img Channels"
      },
      "img_sz": {
         "default": 256,
         "description": "Length of a side of each image in pixels. This is the size to transform it to during training, not the size in the raw dataset.",
         "exclusiveMinimum": 0,
         "title": "Img Sz",
         "type": "integer"
      },
      "train_sz": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "If set, the number of training images to use. If fewer images exist, then an exception will be raised.",
         "title": "Train Sz"
      },
      "train_sz_rel": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "If set, the proportion of training images to use.",
         "title": "Train Sz Rel"
      },
      "num_workers": {
         "default": 4,
         "description": "Number of workers to use when DataLoader makes batches.",
         "title": "Num Workers",
         "type": "integer"
      },
      "augmentors": {
         "default": [
            "RandomRotate90",
            "HorizontalFlip",
            "VerticalFlip"
         ],
         "description": "Names of albumentations augmentors to use for training batches. Choices include: ['Blur', 'RandomRotate90', 'HorizontalFlip', 'VerticalFlip', 'GaussianBlur', 'GaussNoise', 'RGBShift', 'ToGray']. Alternatively, a custom transform can be provided via the aug_transform option.",
         "items": {
            "type": "string"
         },
         "title": "Augmentors",
         "type": "array"
      },
      "base_transform": {
         "anyOf": [
            {
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "An Albumentations transform serialized as a dict that will be applied to all datasets: training, validation, and test. This transformation is in addition to the resizing due to img_sz. This is useful for, for example, applying the same normalization to all datasets.",
         "title": "Base Transform"
      },
      "aug_transform": {
         "anyOf": [
            {
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "An Albumentations transform serialized as a dict that will be applied as data augmentation to the training dataset. This transform is applied before base_transform. If provided, the augmentors option is ignored.",
         "title": "Aug Transform"
      },
      "plot_options": {
         "anyOf": [
            {
               "$ref": "#/$defs/PlotOptions"
            },
            {
               "type": "null"
            }
         ],
         "default": {
            "transform": {
               "__version__": "1.4.14",
               "transform": {
                  "__class_fullname__": "rastervision.pytorch_learner.utils.utils.MinMaxNormalize",
                  "dtype": 5,
                  "max_val": 1.0,
                  "min_val": 0.0,
                  "p": 1.0
               }
            },
            "channel_display_groups": null,
            "type_hint": "plot_options"
         },
         "description": "Options to control plotting."
      },
      "preview_batch_limit": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Optional limit on the number of items in the preview plots produced during training.",
         "title": "Preview Batch Limit"
      },
      "type_hint": {
         "const": "classification_image_data",
         "default": "classification_image_data",
         "enum": [
            "classification_image_data"
         ],
         "title": "Type Hint",
         "type": "string"
      },
      "data_format": {
         "allOf": [
            {
               "$ref": "#/$defs/ClassificationDataFormat"
            }
         ],
         "default": "image_folder"
      },
      "uri": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "items": {
                  "type": "string"
               },
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "One of the following:\n(1) a URI of a directory containing \"train\", \"valid\", and (optionally) \"test\" subdirectories;\n(2) a URI of a zip file containing (1);\n(3) a list of (2);\n(4) a URI of a directory containing zip files containing (1).",
         "title": "Uri"
      },
      "group_uris": {
         "anyOf": [
            {
               "items": {
                  "anyOf": [
                     {
                        "type": "string"
                     },
                     {
                        "items": {
                           "type": "string"
                        },
                        "type": "array"
                     }
                  ]
               },
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "This can be set instead of uri in order to specify groups of chips. Each element in the list is expected to be an object of the same form accepted by the uri field. The purpose of separating chips into groups is to be able to use the group_train_sz field.",
         "title": "Group Uris"
      },
      "group_train_sz": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "If group_uris is set, this can be used to specify the number of chips to use per group. Only applies to training chips. This can either be a single value that will be used for all groups or a list of values (one for each group).",
         "title": "Group Train Sz"
      },
      "group_train_sz_rel": {
         "anyOf": [
            {
               "maximum": 1.0,
               "minimum": 0.0,
               "type": "number"
            },
            {
               "items": {
                  "maximum": 1.0,
                  "minimum": 0.0,
                  "type": "number"
               },
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Relative version of group_train_sz. Must be a float in [0, 1]. If group_uris is set, this can be used to specify the proportion of the total chips in each group to use per group. Only applies to training chips. This can either be a single value that will be used for all groups or a list of values (one for each group).",
         "title": "Group Train Sz Rel"
      }
   },
   "$defs": {
      "ClassConfig": {
         "additionalProperties": false,
         "description": "Configure class information for a machine learning task.",
         "properties": {
            "names": {
               "description": "Names of classes. The i-th class in this list will have class ID = i.",
               "items": {
                  "type": "string"
               },
               "title": "Names",
               "type": "array"
            },
            "colors": {
               "anyOf": [
                  {
                     "items": {
                        "anyOf": [
                           {
                              "type": "string"
                           },
                           {
                              "items": {},
                              "type": "array"
                           }
                        ]
                     },
                     "type": "array"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Colors used to visualize classes. Can be color strings accepted by matplotlib or RGB tuples. If None, a random color will be auto-generated for each class.",
               "title": "Colors"
            },
            "null_class": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Optional name of class in `names` to use as the null class. This is used in semantic segmentation to represent the label for imagery pixels that are NODATA or that are missing a label. If None and the class names include \"null\", it will automatically be used as the null class. If None, and this Config is part of a SemanticSegmentationConfig, a null class will be added automatically.",
               "title": "Null Class"
            },
            "type_hint": {
               "const": "class_config",
               "default": "class_config",
               "enum": [
                  "class_config"
               ],
               "title": "Type Hint",
               "type": "string"
            }
         },
         "required": [
            "names"
         ],
         "title": "ClassConfig",
         "type": "object"
      },
      "ClassificationDataFormat": {
         "const": "image_folder",
         "enum": [
            "image_folder"
         ],
         "title": "ClassificationDataFormat",
         "type": "string"
      },
      "PlotOptions": {
         "additionalProperties": false,
         "description": "Config related to plotting.",
         "properties": {
            "transform": {
               "anyOf": [
                  {
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": {
                  "__version__": "1.4.14",
                  "transform": {
                     "__class_fullname__": "rastervision.pytorch_learner.utils.utils.MinMaxNormalize",
                     "dtype": 5,
                     "max_val": 1.0,
                     "min_val": 0.0,
                     "p": 1.0
                  }
               },
               "description": "An Albumentations transform serialized as a dict that will be applied to each image before it is plotted. Mainly useful for undoing any data transformation that you do not want included in the plot, such as normalization. The default value will shift and scale the image so the values range from 0.0 to 1.0 which is the expected range for the plotting function. This default is useful for cases where the values after normalization are close to zero which makes the plot difficult to see.",
               "title": "Transform"
            },
            "channel_display_groups": {
               "anyOf": [
                  {
                     "additionalProperties": {
                        "items": {
                           "minimum": 0,
                           "type": "integer"
                        },
                        "type": "array"
                     },
                     "type": "object"
                  },
                  {
                     "items": {
                        "items": {
                           "minimum": 0,
                           "type": "integer"
                        },
                        "type": "array"
                     },
                     "type": "array"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Groups of image channels to display together as a subplot when plotting the data and predictions. Can be a list or tuple of groups (e.g. [(0, 1, 2), (3,)]) or a dict containing title-to-group mappings (e.g. {\"RGB\": [0, 1, 2], \"IR\": [3]}), where each group is a list or tuple of channel indices and title is a string that will be used as the title of the subplot for that group.",
               "title": "Channel Display Groups"
            },
            "type_hint": {
               "const": "plot_options",
               "default": "plot_options",
               "enum": [
                  "plot_options"
               ],
               "title": "Type Hint",
               "type": "string"
            }
         },
         "title": "PlotOptions",
         "type": "object"
      }
   },
   "additionalProperties": false
}
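
As a rough illustration of the schema above, a config can be pictured as a plain dict. The sketch below is not output from Raster Vision: the URI and class names are made-up placeholders, and check_against_schema is a hypothetical helper that re-implements only a few of the constraints shown (no unknown keys, a positive img_sz, and the fixed type_hint).

```python
import json

# Field names copied from the "properties" section of the schema above.
ALLOWED_KEYS = {
    "class_config", "img_channels", "img_sz", "train_sz", "train_sz_rel",
    "num_workers", "augmentors", "base_transform", "aug_transform",
    "plot_options", "preview_batch_limit", "type_hint", "data_format",
    "uri", "group_uris", "group_train_sz", "group_train_sz_rel",
}
TYPE_HINT = "classification_image_data"


def check_against_schema(cfg: dict) -> None:
    """Hypothetical helper: checks a small subset of the schema's rules."""
    unknown = set(cfg) - ALLOWED_KEYS
    if unknown:
        # Mirrors "additionalProperties": false.
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if cfg.get("img_sz", 256) <= 0:
        # Mirrors "exclusiveMinimum": 0 on img_sz.
        raise ValueError("img_sz must be greater than 0")
    if cfg.get("type_hint", TYPE_HINT) != TYPE_HINT:
        raise ValueError("unexpected type_hint")


example_cfg = {
    "type_hint": TYPE_HINT,
    "data_format": "image_folder",
    "uri": "/path/to/chips.zip",  # placeholder URI, not a real dataset
    "img_sz": 256,
    "num_workers": 4,
    "augmentors": ["RandomRotate90", "HorizontalFlip", "VerticalFlip"],
    "class_config": {"names": ["building", "no_building"]},  # made-up names
}

check_against_schema(example_cfg)
print(json.dumps(example_cfg, indent=2))
```

The real class performs its own validation through pydantic; this helper only makes the listed constraints concrete.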

Config:
  • extra: str = forbid

  • validate_assignment: bool = True

field aug_transform: dict | None = None

An Albumentations transform serialized as a dict that will be applied as data augmentation to the training dataset. This transform is applied before base_transform. If provided, the augmentors option is ignored.

Validated by:
  • validate_group_uris

  • validate_plot_options

field augmentors: list[str] = ['RandomRotate90', 'HorizontalFlip', 'VerticalFlip']

Names of albumentations augmentors to use for training batches. Choices include: ['Blur', 'RandomRotate90', 'HorizontalFlip', 'VerticalFlip', 'GaussianBlur', 'GaussNoise', 'RGBShift', 'ToGray']. Alternatively, a custom transform can be provided via the aug_transform option.

Validated by:
  • validate_augmentors

  • validate_group_uris

  • validate_plot_options
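
To make the list of choices concrete, here is a minimal sketch of a name check. It is not the actual validate_augmentors implementation: check_augmentors is a hypothetical function, and the set of names is copied from the description above, which says "Choices include" and so may not be exhaustive.

```python
# Names listed in the augmentors field description; possibly incomplete.
KNOWN_AUGMENTORS = [
    "Blur", "RandomRotate90", "HorizontalFlip", "VerticalFlip",
    "GaussianBlur", "GaussNoise", "RGBShift", "ToGray",
]


def check_augmentors(names: list) -> list:
    """Hypothetical check: return the list unchanged if every name is known."""
    unknown = [name for name in names if name not in KNOWN_AUGMENTORS]
    if unknown:
        raise ValueError(f"unsupported augmentors: {unknown}")
    return names


# The field's default value passes the check.
default_augmentors = check_augmentors(
    ["RandomRotate90", "HorizontalFlip", "VerticalFlip"])
```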

field base_transform: dict | None = None

An Albumentations transform serialized as a dict that will be applied to all datasets: training, validation, and test. This transformation is in addition to the resizing due to img_sz. This is useful, for example, for applying the same normalization to all datasets.

Validated by:
  • validate_group_uris

  • validate_plot_options

field class_config: ClassConfig | None = None

Class config.

Validated by:
  • validate_group_uris

  • validate_plot_options
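
The ClassConfig schema states that the i-th entry of names has class ID i. A minimal sketch of that mapping follows; the class names are made up for illustration.

```python
# Made-up class names; only their order matters here.
names = ["background", "building", "road"]

# Class ID is the position of the name in the list.
name_to_id = {name: class_id for class_id, name in enumerate(names)}
id_to_name = dict(enumerate(names))

print(name_to_id)
```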

field data_format: ClassificationDataFormat = ClassificationDataFormat.image_folder
Validated by:
  • validate_group_uris

  • validate_plot_options

field group_train_sz: int | list[int] | None = None

If group_uris is set, this can be used to specify the number of chips to use per group. Only applies to training chips. This can either be a single value that will be used for all groups or a list of values (one for each group).

Validated by:
  • validate_group_uris

  • validate_plot_options

field group_train_sz_rel: Proportion | list[Proportion] | None = None

Relative version of group_train_sz. Must be a float in [0, 1]. If group_uris is set, this can be used to specify the proportion of the total chips in each group to use per group. Only applies to training chips. This can either be a single value that will be used for all groups or a list of values (one for each group).

Validated by:
  • validate_group_uris

  • validate_plot_options

field group_uris: list[str | list[str]] | None = None

This can be set instead of uri in order to specify groups of chips. Each element in the list is expected to be an object of the same form accepted by the uri field. The purpose of separating chips into groups is to be able to use the group_train_sz field.

Validated by:
  • validate_group_uris

  • validate_plot_options
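
The group fields accept either one value for all groups or one value per group. The sketch below shows one way to expand such a value into a per-group list. per_group_sizes is a hypothetical helper, and raising an error on a length mismatch is an assumption rather than documented behavior.

```python
from typing import Optional, Union


def per_group_sizes(num_groups: int,
                    group_train_sz: Union[int, list, None]) -> Optional[list]:
    """Hypothetical helper: expand group_train_sz to one value per group."""
    if group_train_sz is None:
        return None
    if isinstance(group_train_sz, int):
        # A single value is reused for every group.
        return [group_train_sz] * num_groups
    if len(group_train_sz) != num_groups:
        # Assumption: a list must have exactly one entry per group.
        raise ValueError("expected one size per group")
    return list(group_train_sz)


# Placeholder URIs standing in for three groups of chips.
group_uris = ["/data/group_a.zip", "/data/group_b.zip", "/data/group_c.zip"]
print(per_group_sizes(len(group_uris), 100))
print(per_group_sizes(len(group_uris), [100, 50, 25]))
```

The same expansion would apply to group_train_sz_rel, with proportions in [0, 1] instead of counts.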

field img_channels: PosInt | None = None

The number of channels of the training images.

Validated by:
  • validate_group_uris

  • validate_plot_options

field img_sz: PosInt = 256

Length of a side of each image in pixels. This is the size to transform it to during training, not the size in the raw dataset.

Constraints:
  • gt = 0

Validated by:
  • validate_group_uris

  • validate_plot_options

field num_workers: int = 4

Number of workers to use when DataLoader makes batches.

Validated by:
  • validate_group_uris

  • validate_plot_options

field plot_options: PlotOptions | None = PlotOptions(transform={'__version__': '1.4.14', 'transform': {'__class_fullname__': 'rastervision.pytorch_learner.utils.utils.MinMaxNormalize', 'p': 1.0, 'min_val': 0.0, 'max_val': 1.0, 'dtype': 5}}, channel_display_groups=None)

Options to control plotting.

Validated by:
  • validate_group_uris

  • validate_plot_options
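
The default plot transform is a MinMaxNormalize with min_val=0.0 and max_val=1.0, which the PlotOptions schema describes as shifting and scaling the image into the 0.0 to 1.0 range. The sketch below shows conventional min-max scaling on a flat list of values. It is an assumption about what that transform computes, not its source code, and the handling of constant input is an illustrative choice.

```python
def min_max_normalize(values: list, min_val: float = 0.0,
                      max_val: float = 1.0) -> list:
    """Sketch of min-max scaling: map the smallest value to min_val
    and the largest to max_val."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: constant input maps to min_val to avoid dividing by zero.
        return [min_val for _ in values]
    scale = (max_val - min_val) / (hi - lo)
    return [min_val + (v - lo) * scale for v in values]


# Values centered on zero, as they might be after normalization.
pixels = [-2.0, 0.0, 2.0]
print(min_max_normalize(pixels))  # [0.0, 0.5, 1.0]
```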

field preview_batch_limit: int | None = None

Optional limit on the number of items in the preview plots produced during training.

Validated by:
  • validate_group_uris

  • validate_plot_options

field train_sz: int | None = None

If set, the number of training images to use. If fewer images exist, then an exception will be raised.

Validated by:
  • validate_group_uris

  • validate_plot_options

field train_sz_rel: float | None = None

If set, the proportion of training images to use.

Validated by:
  • validate_group_uris

  • validate_plot_options

field type_hint: Literal['classification_image_data'] = 'classification_image_data'
Validated by:
  • validate_group_uris

  • validate_plot_options

field uri: str | list[str] | None = None

One of the following:

  1. a URI of a directory containing “train”, “valid”, and (optionally) “test” subdirectories

  2. a URI of a zip file containing (1)

  3. a list of (2)

  4. a URI of a directory containing zip files containing (1)

Validated by:
  • validate_group_uris

  • validate_plot_options
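
The accepted forms of uri can be told apart roughly as sketched below. describe_uri is a hypothetical helper, the paths are placeholders, and detecting zip files by file extension is an assumption. A plain directory URI could be form (1) or form (4); distinguishing those requires listing the directory, which this sketch does not do.

```python
from typing import Union


def describe_uri(uri: Union[str, list]) -> str:
    """Hypothetical helper: name which documented form a uri value takes."""
    if isinstance(uri, list):
        if not all(u.lower().endswith(".zip") for u in uri):
            raise ValueError("a list of URIs is expected to contain zip files")
        return "list of zip files"
    if uri.lower().endswith(".zip"):
        return "single zip file"
    return "directory (of split subdirectories or of zip files)"


print(describe_uri("/data/chips"))                   # placeholder paths
print(describe_uri("/data/chips.zip"))
print(describe_uri(["/data/a.zip", "/data/b.zip"]))
```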

build(tmp_dir: str) → tuple[torch.utils.data.dataset.Dataset, torch.utils.data.dataset.Dataset, torch.utils.data.dataset.Dataset]

Build an instance of the corresponding type of object using this config.

For example, BackendConfig will build a Backend object. The arguments to this method will vary depending on the type of Config.

Parameters:

tmp_dir (str) –

Return type:

tuple[torch.utils.data.dataset.Dataset, torch.utils.data.dataset.Dataset, torch.utils.data.dataset.Dataset]

build_dataset(split: Literal['train', 'valid', 'test'], tmp_dir: str | None = None) → Dataset

Build and return dataset for a single split.

Parameters:
  • split (Literal['train', 'valid', 'test']) –

  • tmp_dir (str | None) –

Return type:

Dataset

classmethod deserialize(inp: str | dict | Config) → Self

Deserialize Config from a JSON file or dict, upgrading if possible.

If inp is already a Config, it is returned as is.

Parameters:

inp (str | dict | Config) – a URI to a JSON file or a dict.

Return type:

Self

dir_to_dataset(data_dir: str, transform: BasicTransform) → ClassificationImageDataset
Parameters:
  • data_dir (str) –

  • transform (BasicTransform) –

Return type:

ClassificationImageDataset

classmethod from_dict(cfg_dict: dict) → Self

Deserialize Config from a dict.

Parameters:

cfg_dict (dict) – Dict to deserialize.

Return type:

Self

classmethod from_file(uri: str) → Self

Deserialize Config from a JSON file, upgrading if possible.

Parameters:

uri (str) – URI to load from.

Return type:

Self

get_bbox_params() → albumentations.core.bbox_utils.BboxParams | None

Returns BboxParams used by albumentations for data augmentation.

Return type:

albumentations.core.bbox_utils.BboxParams | None

get_custom_albumentations_transforms() → list[dict]

Returns all custom transforms found in this config.

This should return all serialized albumentations transforms with a ‘lambda_transforms_path’ field contained in this config or in any of its members, no matter how deeply nested.

The purpose is to make it easier to adjust their paths all at once while saving to or loading from a bundle.

Return type:

list[dict]

get_data_dirs(uri: str | list[str], unzip_dir: str) → list[str]

Extract data dirs from uri.

Data dirs are directories containing “train”, “valid”, and (optionally) “test” subdirectories.

Parameters:
  • uri (str | list[str]) –

    A URI or a list of URIs of one of the following:

    1. a URI of a directory containing “train”, “valid”, and (optionally) “test” subdirectories

    2. a URI of a zip file containing (1)

    3. a list of (2)

    4. a URI of a directory containing zip files containing (1)

  • unzip_dir (str) – Directory where zip files will be extracted to, if needed.

Returns:

Paths to directories that each contain contents of one zip file.

Return type:

list[str]

get_data_transforms() → tuple[albumentations.core.transforms_interface.BasicTransform, albumentations.core.transforms_interface.BasicTransform]

Get albumentations transform objects for data augmentation.

Returns a 2-tuple of a “base” transform and an augmentation transform. The base transform comprises a resize transform based on img_sz followed by the transform specified in base_transform. The augmentation transform comprises the base transform followed by either the transform in aug_transform (if specified) or the transforms in the augmentors field.

The augmentation transform is intended to be used for training data, and the base transform for all other data where data augmentation is not desirable, such as validation or prediction.

Returns:

base transform and augmentation transform.

Return type:

tuple[albumentations.core.transforms_interface.BasicTransform, albumentations.core.transforms_interface.BasicTransform]
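
The ordering described for get_data_transforms can be pictured with the sketch below, which builds lists of step names instead of real albumentations objects. plan_transforms and the step labels are hypothetical; the sketch only restates the description above and is not the library's implementation.

```python
from typing import Optional


def plan_transforms(img_sz: int, base_transform: Optional[str],
                    aug_transform: Optional[str],
                    augmentors: list) -> tuple:
    """Hypothetical helper: list the steps of the base and augmentation
    transforms in the order the method description gives."""
    # Base transform: resize to img_sz, then base_transform if set.
    base = [f"Resize({img_sz}, {img_sz})"]
    if base_transform is not None:
        base.append(base_transform)
    # Augmentation transform: the base steps followed by aug_transform
    # if given, otherwise by the named augmentors.
    extra = [aug_transform] if aug_transform is not None else list(augmentors)
    return base, base + extra


base, aug = plan_transforms(256, "Normalize", None,
                            ["RandomRotate90", "HorizontalFlip"])
print(base)
print(aug)
```

In this picture the first list would be used for validation and prediction data and the second for training data.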

random_subset_dataset(ds: Dataset, size: int | None = None, fraction: Optional[float] = None) → Subset
Parameters:
  • ds (Dataset) –

  • size (int | None) –

  • fraction (float | None) –

Return type:

Subset
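
The method is undocumented here, but its size and fraction parameters line up with the train_sz and train_sz_rel fields. The sketch below shows one plausible way to pick a random subset by absolute size or by fraction. subset_indices is a hypothetical helper working on indices only; the fixed seed, the rounding via int(), and rejecting both arguments at once are assumptions. Raising when too few items exist follows the train_sz description.

```python
import random
from typing import Optional


def subset_indices(n: int, size: Optional[int] = None,
                   fraction: Optional[float] = None, seed: int = 0) -> list:
    """Hypothetical helper: choose which of n items to keep."""
    if size is None and fraction is None:
        return list(range(n))  # nothing requested: keep everything
    if size is not None and fraction is not None:
        # Assumption: the two options are mutually exclusive.
        raise ValueError("specify either size or fraction, not both")
    if fraction is not None:
        size = int(n * fraction)  # assumption: round down
    if size > n:
        raise ValueError(f"requested {size} items but only {n} exist")
    rng = random.Random(seed)  # fixed seed so the subset is repeatable
    return sorted(rng.sample(range(n), size))


print(len(subset_indices(10, size=4)))
print(len(subset_indices(10, fraction=0.5)))
```

With PyTorch, a list of indices like this could be passed to torch.utils.data.Subset to obtain the subset dataset.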

recursive_validate_config()

Recursively validate hierarchies of Configs.

This uses reflection to call validate_config on a hierarchy of Configs using a depth-first pre-order traversal.
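
A depth-first pre-order traversal validates a config before any of the configs nested inside it. The sketch below illustrates that order with a hypothetical SketchConfig class (not the Raster Vision Config); using vars() for reflection and descending into lists and tuples are assumptions made for the example.

```python
class SketchConfig:
    """Hypothetical stand-in for a Config holding nested configs."""

    def __init__(self, name: str, **children):
        self.name = name
        for key, value in children.items():
            setattr(self, key, value)

    def validate_config(self, log: list) -> None:
        # A real config would check its fields; here we only record the visit.
        log.append(self.name)


def recursive_validate(obj, log: list) -> None:
    """Visit obj first, then its attributes in definition order (pre-order)."""
    if isinstance(obj, SketchConfig):
        obj.validate_config(log)
        for value in vars(obj).values():
            recursive_validate(value, log)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            recursive_validate(item, log)


root = SketchConfig(
    "root",
    data=SketchConfig("data", plot=SketchConfig("plot")),
    extras=[SketchConfig("extra_0"), SketchConfig("extra_1")],
)
visited = []
recursive_validate(root, visited)
print(visited)  # ['root', 'data', 'plot', 'extra_0', 'extra_1']
```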

revalidate()

Re-validate an instantiated Config.

Runs all Pydantic validators plus self.validate_config().

to_file(uri: str, with_rv_metadata: bool = True) → None

Save a Config to a JSON file, optionally with RV metadata.

Parameters:
  • uri (str) – URI to save to.

  • with_rv_metadata (bool) – If True, inject Raster Vision metadata such as plugin_versions, so that the config can be upgraded when loaded.

Return type:

None

unzip_data(zip_uris: list[str], unzip_dir: str) → list[str]

Unzip dataset zip files.

Parameters:
  • zip_uris (list[str]) – A list of URIs of zip files.

  • unzip_dir (str) – Directory where zip files will be extracted to.

Returns:

Paths to directories that each contain contents of one zip file.

Return type:

list[str]

update(*args, **kwargs)

Update any fields before validation.

Subclasses should override this to provide complex default behavior, for example, setting default values as a function of the values of other fields. The arguments to this method will vary depending on the type of Config.

validator validate_augmentors  »  augmentors
Parameters:

v (list[str]) –

Return type:

list[str]

validate_config()

Validate fields that should be checked after update is called.

This is to complement the builtin validation that Pydantic performs at the time of object construction.

validator validate_group_uris  »  all fields
Return type:

Self

validate_list(field: str, valid_options: list[str])

Validate a list field.

Parameters:
  • field (str) – name of field to validate

  • valid_options (list[str]) – values that field is allowed to take

Raises:

ConfigError – if field is invalid

validator validate_plot_options  »  all fields
Return type:

Self

property class_colors
property class_names
property num_classes