Codebase Design Patterns
Configuration vs Entity
In Raster Vision we keep the configuration of a thing separate from the creation of the thing itself. This lets us keep the client environment (the environment that runs the rastervision CLI application) and the runner environment (the environment that actually runs commands) totally separate. You can therefore install Raster Vision and launch experiments from a machine that has no GPU or machine learning library installed, as long as it can issue commands to an environment that does. It also lets us work with configuration on the client side very quickly and leave all the heavy lifting to the runner side.
This separation is expressed in a core design principle seen across the codebase: the use of the Config and ConfigBuilder classes.
Config
The Config class represents the configuration of a component of the experiment. It is a declarative encapsulation of exactly what we want to run, without actually running anything. We are able to serialize Configs, and because they describe exactly what we want to do, they become historical artifacts about what happened, messages for running on remote systems, and records that let us repeat experiments and verify results.
Constructing a configuration can involve some heavy logic, and we want a clean separation between the Config and the way we build it. This is why each Config has a separate ConfigBuilder class.
ConfigBuilder
The ConfigBuilder classes are the main interaction point for users of Raster Vision. They are generally instantiated when client code calls the static .builder() method on the Config. If there are multiple types of builders, a key is used to state which builder should be returned (e.g. rv.BackendConfig.builder(rv.KERAS_CLASSIFICATION)). The use of keys to return specific builder types allows for two things: (1) a standard interface for constructing builders that only changes based on the parameter passed in, and (2) a way for plugins to register their own keys, so that using plugins feels exactly like using core Raster Vision code.
The ConfigBuilders are immutable data structures that use a fluent builder pattern. When you call a method on a builder that sets a property, what you’re actually doing is creating a copy of the builder and returning it. Not modifying internal state allows us to fork builders into different transformed objects without having to worry about modifying the internal properties of the builders earlier in the chain of modifications. Using a fluent builder pattern also gives us a readable and standard way of creating and transforming ConfigBuilders and Configs.
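The copy-on-set behaviour described above can be sketched with a toy class. This is a minimal illustration, not Raster Vision's implementation: ToyTaskConfigBuilder, its props dict, and the _with helper are hypothetical names introduced only for this example.

```python
from copy import deepcopy


class ToyTaskConfigBuilder:
    """Hypothetical builder used for illustration; not a Raster Vision class."""

    def __init__(self, props=None):
        # All builder state lives in a plain dict of properties.
        self.props = dict(props or {})

    def _with(self, **changes):
        # Copy the builder, apply the change to the copy, and return the copy.
        # The builder the method was called on is left untouched.
        new_builder = deepcopy(self)
        new_builder.props.update(changes)
        return new_builder

    def with_chip_size(self, chip_size):
        return self._with(chip_size=chip_size)

    def with_classes(self, classes):
        return self._with(classes=list(classes))


# A shared starting point.
base = ToyTaskConfigBuilder().with_classes(["car", "building"])

# Fork the same base into two variants; neither call changes `base`.
small = base.with_chip_size(200)
large = base.with_chip_size(400)

print(base.props)
print(small.props)
print(large.props)
```

Because every with_* call returns a fresh copy, base can be reused as the root of any number of variants without the variants interfering with each other.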
The ConfigBuilder also has a .validate() method that is called whenever .build() is called, which gives the ConfigBuilder the chance to make sure all required properties are set and are sane. One major advantage of using the ConfigBuilder pattern over simply having long __init__ methods on Config objects is that you can set up a builder in one part of the code, without setting required properties, and pass it off to another decoupled part of the code that can configure it further. As long as the required properties are set before .build() is called, you can set as few or as many properties as you want.
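A rough sketch of validation at build time, and of handing a partially configured builder to another piece of code, might look like the following. All names here (ToyBackendConfig, ToyBackendConfigBuilder, ToyConfigError, choose_batch_size) are hypothetical and exist only for this example; the real builders define their own properties and checks.

```python
from copy import deepcopy


class ToyConfigError(Exception):
    """Hypothetical error raised when a builder fails validation."""


class ToyBackendConfig:
    """Hypothetical config object; just holds the validated values."""

    def __init__(self, model_uri, batch_size):
        self.model_uri = model_uri
        self.batch_size = batch_size


class ToyBackendConfigBuilder:
    def __init__(self):
        # Both properties are required, but neither has to be set up front.
        self.props = {"model_uri": None, "batch_size": None}

    def _with(self, **changes):
        new_builder = deepcopy(self)
        new_builder.props.update(changes)
        return new_builder

    def with_model_uri(self, uri):
        return self._with(model_uri=uri)

    def with_batch_size(self, batch_size):
        return self._with(batch_size=batch_size)

    def validate(self):
        # Check that required properties are set, then that they are sane.
        missing = [k for k, v in self.props.items() if v is None]
        if missing:
            raise ToyConfigError("missing required properties: " + ", ".join(missing))
        if self.props["batch_size"] <= 0:
            raise ToyConfigError("batch_size must be positive")

    def build(self):
        # Validation runs every time a Config is built.
        self.validate()
        return ToyBackendConfig(**self.props)


def choose_batch_size(builder):
    # A decoupled piece of code that only knows about the batch size.
    return builder.with_batch_size(8)


partial = ToyBackendConfigBuilder().with_model_uri("/tmp/model")
try:
    partial.build()
except ToyConfigError as e:
    print("not ready yet:", e)

config = choose_batch_size(partial).build()
print(config.model_uri, config.batch_size)
```

The incomplete builder is perfectly usable as a value to pass around; it only has to be complete at the moment build() is called.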
Fluent Builder Pattern
The ConfigBuilders in Raster Vision use a fluent builder design pattern. This allows the composition and chaining together of transformations on builders, which encourages readable configuration code. The usage of builders is always as follows:
- The Config type (SceneConfig, TaskConfig, etc.) will always be available through the top level import (which generally is import rastervision as rv).
- The ConfigBuilder is created from the static builder method on the Config class, e.g. rv.TaskConfig.builder(rv.OBJECT_DETECTION). Keys for builder types are also always exposed in the top level package (unless your key is for a custom plugin, in which case you’re on your own).
- The builder is then transformed using the .with_*() methods. Each call to a .with_*() method returns a new copy of the builder with the modifications set, which means you can chain them together. This is the “fluent” part of the fluent builder pattern.
- You call .build() when you are ready for your fully baked Config object.
You can also call .to_builder() on any Config object, which lets you move between the Config and ConfigBuilder space easily. This is useful when you want to take a config that was deserialized or constructed in some other way and use it as a base for further transformation.
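The round trip between Config and ConfigBuilder can be sketched as follows. ToySceneConfig and ToySceneConfigBuilder are hypothetical stand-ins written for this example; they only mimic the shape of the builder(), with_*(), build() and to_builder() calls described above.

```python
from copy import deepcopy


class ToySceneConfig:
    """Hypothetical config: a plain record of already-built values."""

    def __init__(self, scene_id, raster_uri):
        self.scene_id = scene_id
        self.raster_uri = raster_uri

    @staticmethod
    def builder():
        return ToySceneConfigBuilder()

    def to_builder(self):
        # Seed a new builder with a copy of this config's current values.
        return ToySceneConfigBuilder(vars(self))


class ToySceneConfigBuilder:
    def __init__(self, props=None):
        self.props = dict(props or {})

    def _with(self, **changes):
        new_builder = deepcopy(self)
        new_builder.props.update(changes)
        return new_builder

    def with_id(self, scene_id):
        return self._with(scene_id=scene_id)

    def with_raster_uri(self, uri):
        return self._with(raster_uri=uri)

    def build(self):
        return ToySceneConfig(**self.props)


# Build a config, then use it as the base for a second, modified config.
original = ToySceneConfig.builder().with_id("scene-1").with_raster_uri("a.tif").build()
derived = original.to_builder().with_id("scene-2").build()

print(original.scene_id, derived.scene_id, derived.raster_uri)
```

The derived config keeps every value it did not override, and the original config is not modified by the transformation.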
Global Registry
Another major design pattern of Raster Vision is the use of a global registry. This is what makes it possible to construct all subclass builders through a single interface, the static builder() method on the Config, via a key, e.g. rv.RasterSourceConfig.builder(rv.GEOTIFF_SOURCE). The key is used to look up what ConfigBuilders are registered inside the global registry, and the registry determines what builder to return from the builder() call. More importantly, this gives Raster Vision a flexible system for creating Plugins out of anything that has a keyed ConfigBuilder. The registry pattern goes beyond Configs and ConfigBuilders, though: this is also how internal classes and plugins are chosen for Default Providers, ExperimentRunners, and FileSystems.
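As a rough illustration of the keyed lookup, the sketch below uses a dictionary-backed registry. It is an assumption-laden toy: ToyRegistry, its method names, the group string "RASTER_SOURCE", and the plugin key "MY_PLUGIN_SOURCE" are all hypothetical and are not taken from the Raster Vision source.

```python
class ToyRegistry:
    """Hypothetical registry mapping (group, key) pairs to builder classes."""

    def __init__(self):
        self._builders = {}

    def register_config_builder(self, group, key, builder_class):
        self._builders[(group, key)] = builder_class

    def get_config_builder(self, group, key):
        try:
            return self._builders[(group, key)]
        except KeyError:
            raise KeyError(f"no builder registered for {group}/{key}") from None


# One module-level registry shared by core code and plugins.
REGISTRY = ToyRegistry()
GEOTIFF_SOURCE = "GEOTIFF_SOURCE"


class ToyGeoTiffSourceConfigBuilder:
    """Stands in for a builder shipped with the core library."""


class MyPluginSourceConfigBuilder:
    """Stands in for a builder contributed by a plugin."""


class ToyRasterSourceConfig:
    @staticmethod
    def builder(key):
        # The key decides which builder class is instantiated and returned.
        return REGISTRY.get_config_builder("RASTER_SOURCE", key)()


# Core code and plugin code register through the same call.
REGISTRY.register_config_builder("RASTER_SOURCE", GEOTIFF_SOURCE, ToyGeoTiffSourceConfigBuilder)
REGISTRY.register_config_builder("RASTER_SOURCE", "MY_PLUGIN_SOURCE", MyPluginSourceConfigBuilder)

core_builder = ToyRasterSourceConfig.builder(GEOTIFF_SOURCE)
plugin_builder = ToyRasterSourceConfig.builder("MY_PLUGIN_SOURCE")
print(type(core_builder).__name__, type(plugin_builder).__name__)
```

From the caller's point of view the only difference between a core builder and a plugin builder is the key that is passed in.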
Configuration Topics
Configuration objects have a couple of methods that require some understanding if you’d like deeper knowledge of how Raster Vision works, for example if you are creating plugins.
Implicit Configuration
Configuration values can be set implicitly from other configuration. For example, if a backend requires a model_uri to save a model to, and it is not set, the configuration may set it to /opt/data/rv_root/train/experiment-name/model.hdf. This value is derived from the root URI for the train command, /opt/data/rv_root/train/experiment-name, which is set on the experiment (by default constructed from the root_uri and experiment_id).
The mechanism that allows this is that configurations implement a method called update_for_command, with the following signature:
class rastervision.core.Config
    update_for_command(command_type, experiment_config, context=None, io_def=None)
        Updates this configuration for the given command.
        Note: While configuration is immutable for client facing operations, this is an internal operation and mutates the configuration.
        Args:
            command_type: The command type that is currently being preprocessed.
            experiment_config: The experiment configuration that this configuration is a part of.
            context: Optional list of parent configurations, to allow for child configurations contained in collections to understand their context in the experiment configuration.
        Returns:
            Nothing. Call should mutate the configuration object itself.
This method is called before running commands on an experiment, and gives the configuration a chance to update any values it needs to, based on the experiment and any other context it needs. The context argument holds, for example, the SceneConfig that a configuration (e.g. a RasterSourceConfig) is attached to. Context should be set whenever a parent configuration calls update_for_command on a child configuration and that parent configuration is part of a collection of configurations (e.g. the collection of SceneConfigs in a DataSetConfig).
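The model_uri example above can be sketched with toy classes. This is only an illustration of the idea under stated assumptions: ToyExperimentConfig, its train_uri attribute, ToyBackendConfig, and the TRAIN constant are hypothetical, and only the update_for_command signature is taken from the documentation above.

```python
import posixpath

TRAIN = "TRAIN"  # hypothetical command type key


class ToyExperimentConfig:
    """Hypothetical experiment config holding the root URI of the train command."""

    def __init__(self, root_uri, experiment_id):
        # By default the train root is constructed from root_uri and experiment_id.
        self.train_uri = posixpath.join(root_uri, "train", experiment_id)


class ToyBackendConfig:
    def __init__(self, model_uri=None):
        self.model_uri = model_uri

    def update_for_command(self, command_type, experiment_config, context=None, io_def=None):
        # Internal, mutating step: fill in a value the user left unset.
        if command_type == TRAIN and self.model_uri is None:
            self.model_uri = posixpath.join(experiment_config.train_uri, "model.hdf")


experiment = ToyExperimentConfig("/opt/data/rv_root", "experiment-name")

implicit = ToyBackendConfig()
implicit.update_for_command(TRAIN, experiment)
print(implicit.model_uri)  # /opt/data/rv_root/train/experiment-name/model.hdf

# A value that was set explicitly is left alone.
explicit = ToyBackendConfig(model_uri="/somewhere/else/model.hdf")
explicit.update_for_command(TRAIN, experiment)
print(explicit.model_uri)  # /somewhere/else/model.hdf
```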
Reporting IO
Raster Vision requires that configuration reports on its input and output files, which allows it to tie together commands into a Directed Acyclic Graph of operations that the ExperimentRunners can execute. The way this reporting happens is through the report_io method on Config.
class rastervision.core.Config
    report_io(command_type, io_def)
        Updates the given CommandIODefinition so that it includes the inputs, outputs, and missing files for this configuration at this command.
        Args:
            command_type: The command type that is currently being preprocessed.
            io_def: The CommandIODefinition that this call should modify.
        Returns:
            Nothing. This call should make the appropriate calls to the given io_def to mutate its state.
For each specific command, configuration should set any input files or directories onto the io_def through the add_input method, and set any output files or directories using the add_output method.
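A minimal sketch of per-command IO reporting is shown below. ToyCommandIODefinition is a hypothetical stand-in that implements only the add_input and add_output methods mentioned above; its input_uris and output_uris attributes, the ToyBackendConfig class, and the TRAIN and PREDICT constants are assumptions made for this example.

```python
class ToyCommandIODefinition:
    """Hypothetical stand-in that records the URIs a command reads and writes."""

    def __init__(self):
        self.input_uris = set()
        self.output_uris = set()

    def add_input(self, uri):
        self.input_uris.add(uri)

    def add_output(self, uri):
        self.output_uris.add(uri)


TRAIN, PREDICT = "TRAIN", "PREDICT"  # hypothetical command type keys


class ToyBackendConfig:
    def __init__(self, training_data_uri, model_uri):
        self.training_data_uri = training_data_uri
        self.model_uri = model_uri

    def report_io(self, command_type, io_def):
        # Report different files depending on which command is being preprocessed.
        if command_type == TRAIN:
            io_def.add_input(self.training_data_uri)
            io_def.add_output(self.model_uri)
        elif command_type == PREDICT:
            io_def.add_input(self.model_uri)


backend = ToyBackendConfig("/data/chips", "/data/train/model.hdf")

train_io = ToyCommandIODefinition()
backend.report_io(TRAIN, train_io)

predict_io = ToyCommandIODefinition()
backend.report_io(PREDICT, predict_io)

# A file that one command writes and another reads links the two commands.
shared = train_io.output_uris & predict_io.input_uris
print(shared)
```

In this toy, the overlap between one command's outputs and another command's inputs is what would let a runner order the commands into a graph.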
If a configuration does not correctly report on its IO, commands may fail to run, or may be rerun even though their output already exists and the --rerun flag is not used. This can be a common pitfall for plugin development, and care should be taken to ensure that IO is properly being reported. The --dry-run flag with the -v verbosity flag can be useful here for ensuring the IO that is reported is what is expected.