CHANGELOG#

Raster Vision 0.21.3#

  • Features:

    • Allow reading pre-chipped datasets with non-RGB TIFF chips (#1932)

  • Fixes:

    • Normalize pixel values in the Spacenet Vegas examples (#1930)

    • Account for bbox when saving predictions (#1931)

    • Ensure SS datasets always return label array with correct dtype (#1954)

    • Fix bug in Visualizer when plotting temporal data w/ batch size 1 (#1958)

    • Allow specifying chip_sz in StatsTransform.from_raster_sources() (#1933)

    • Misc. minor fixes (#1933)

  • Docs:

    • Update release instructions to simplify patch release process (#1934)

  • Maintenance:

    • Bump pillow to address CVE-2023-4863 (#1952)

    • Update CI and release workflows to free up disk space before building docker image (#1953, #1959)

Raster Vision 0.21.2#

  • Features:

    • Save model weights for each epoch 1720 (#1921)

  • Bug fixes:

    • Do not require every plugin recorded in the model bundle to be installed when using it (#1916)

    • Fix rastervision.core dependencies (#1920)

  • Docs:

    • Document config upgrader mechanism (#1917)

  • Maintenance:

    • Update GitHub actions’ versions (#1913, #1926)

    • Add scripts for building packages and publishing them to PyPi (#1915)

    • Add unit tests for VsiFileSystem (#1918)

Raster Vision 0.21.1#

  • Bug fixes:

    • Chipping: Try hard to return a window, but fail gracefully if not (#1898)

    • Fix inconsistent handling of RasterSourceConfig.bbox’s type (#1899)

  • Refactoring:

    • log.warn -> log.warning (#1901)

  • Dependencies:

    • Bump pycocotools from 2.0.6 to 2.0.7 in /rastervision_pytorch_learner (#1893)

  • Docs:

  • Docker:

    • Update Docker build to improve caching and image size (#1866, #1897)

    • Replace miniconda with micromamba in the Docker image (#1870, #1897)

  • CI:

    • Split CI tests into smaller pieces; prune docker (#1873, #1874)

Raster Vision 0.21#

This release brings some exciting new functionality to Raster Vision.

Highlights:

API changes:

  • To crop the extent of a RasterSource (or LabelSource), you now have to specify bbox instead of extent. The term “extent”, as used in the codebase, has also been redefined to always be the box Box(0, 0, height, width), where height and width are the height and width of the bbox.

  • GeoJSONVectorSource can now take a list of URIs, allowing geometries to be read from multiple files.

  • VectorOutputConfig (and subclasses) no longer require uri to be specified.
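
To make the bbox/extent distinction above concrete, here is a minimal sketch. The `Box` below is a hypothetical stand-in (a plain named tuple, not Raster Vision's own class), and the (ymin, xmin, ymax, xmax) field order is an assumption chosen to match the Box(0, 0, height, width) form mentioned above.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical stand-in for a pixel box: (ymin, xmin, ymax, xmax)."""
    ymin: int
    xmin: int
    ymax: int
    xmax: int


def extent_of(bbox: Box) -> Box:
    """Return the extent implied by a bbox: Box(0, 0, height, width)."""
    height = bbox.ymax - bbox.ymin
    width = bbox.xmax - bbox.xmin
    return Box(0, 0, height, width)


# A bbox that crops a 300 x 400 pixel region out of a larger raster.
bbox = Box(ymin=100, xmin=200, ymax=400, xmax=600)
extent = extent_of(bbox)
print(extent)
```

The bbox says where the cropped region sits in the full raster, while the extent only describes its size, with the origin reset to (0, 0).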

Features#

  • Add XarraySource to make it easier to consume imagery fetched from a STAC API (#1764)

  • Add experimental ONNX support (#1792)

  • Add support for temporal data (#1803, #1815)

Fixes/minor improvements/refactoring#

  • Improve efficiency of positive-window sampling in ObjectDetectionRandomWindowGeoDataset by filtering labels by AOI (#1705)

  • Misc object detection fixes and improvements (#1711)

  • Allow GeoJSONVectorSource to accept multiple URIs (#1712)

  • Allow specifying extra args for default model in ModelConfig (#1713)

  • Ensure RasterSource and LabelSource extents match up in Scene (#1740)

  • Allow all constituent object detection losses to be logged (#1716)

  • Remove the uri field from VectorOutputConfig (#1762)

  • Fix bugs related to extent-cropping (#1774, #1786, #1793)

  • Fix legend placement in SemanticSegmentationVisualizer plots (#1783)

  • Misc. refactoring and fixes (#1838)

  • Update tutorial notebooks + misc. minor changes (#1839)

  • Improve geometry-related validation in Scene and GeoJSONVectorSource and fix a bug in AoiSampler (#1856)

Development/maintenance#

  • Disable PDF build of docs (#1714)

  • Improve Codecov exclusion settings, add some more unit tests, and add a unit test README (#1717)

  • Fix CI errors (#1763)

  • Factor out numpy-like array indexing implementation and add unit tests (#1765)

  • Remove deprecated codecov dependency (#1775)

  • Add CITATION.cff (#1789, #1790)

  • Minor refactoring of learner.py for readability (#1791)

  • Conform to new torchvision API for specifying pretrained weights (#1794)

  • Use more concise cross-referencing syntax in docs (#1809)

  • Misc. documentation improvements (#1840)

  • Update dependencies (#1749, #1756, #1760, #1761, #1797, #1798, #1799, #1805, #1811)

  • Pre-release fixes and improvements (#1857)


Raster Vision 0.20.2#

  • Bump triangle from version 20200424 to 20220202 in rastervision_pytorch_learner (#1580)

  • Update example plugin __init__.py files to include registry.set_plugin_version() calls (#1665)

  • Add error handling for empty DataLoader in Visualizer.get_batch() (#1672)

  • Only set default stride if stride value is missing in GeoDataWindowConfig (#1674)

  • Minor doc and type-hint fixes and refactoring for OD (#1675, #1676)


Raster Vision 0.20.1#

Fixes#

  • Do not install rastervision_gdal_vsi by default (#1622)

  • Do not set cfg.model.pretrained=False in Learner.from_model_bundle() (#1626)

  • Fix docker build errors (#1629)

  • Documentation:

    • Improve docstrings for most commonly used classes and configs (#1630)

    • Minor textual fixes for the pre-chipped datasets tutorial (#1623)

    • Add comment about password for the ISPRS Potsdam dataset (#1627)

  • README:

    • fix broken links (#1608)

    • make CV-tasks image slightly smaller (#1624)


Raster Vision 0.20#

This release brings major improvements to Raster Vision’s usability as well as its usefulness.

Whereas previously Raster Vision was a framework where users could configure a pipeline and then let it run, it is now also a library from which users can pick individual components and use them to build new things.

We have also significantly improved the documentation. Most notably, it now contains detailed tutorial notebooks as well as a full API reference. The documentation for the Raster Vision pipeline, which used to make up most of the documentation in previous versions, is now located in the section titled The Raster Vision Pipeline.

In terms of features, some highlights are:

  • Support for multiband imagery, introduced in v0.13 for semantic segmentation, is now also available for chip classification and object detection. (#1345)

  • Improved data fusion: the MultiRasterSource can now combine RasterSources with varying extents and resolutions. (#1308)

  • You can now discard edges of predicted chips in semantic segmentation in order to reduce boundary artifacts (#1486). This can be used in addition to the previously introduced ability to average overlapping regions in adjacent chips.

  • Progress bars will now be shown for all downloads and uploads as well as other time-consuming operations that take longer than 5 seconds.

  • Improved caching of downloads: Raster Vision can now cache downloads. A bug that caused Raster Vision to download the same image multiple times has also been fixed, resulting in significant speedups.
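
The edge-discarding idea from the list above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper `inner_window` and the names `chip_sz`, `crop_sz`, and `stride` are hypothetical, and windows are written as (ymin, xmin, ymax, xmax) tuples. It is not Raster Vision's implementation.

```python
def inner_window(window, crop_sz):
    """Shrink a (ymin, xmin, ymax, xmax) window by crop_sz pixels on each side."""
    ymin, xmin, ymax, xmax = window
    if 2 * crop_sz >= min(ymax - ymin, xmax - xmin):
        raise ValueError("crop_sz is too large for this window")
    return (ymin + crop_sz, xmin + crop_sz, ymax - crop_sz, xmax - crop_sz)


chip_sz, crop_sz = 256, 32
# With this stride, the kept inner regions of adjacent chips touch exactly.
stride = chip_sz - 2 * crop_sz

# Three horizontally adjacent prediction windows.
windows = [(0, x, chip_sz, x + chip_sz) for x in range(0, 3 * stride, stride)]
kept = [inner_window(w, crop_sz) for w in windows]

for window, inner in zip(windows, kept):
    print(window, "->", inner)
```

Choosing a stride of `chip_sz - 2 * crop_sz` means the discarded borders overlap between neighbouring chips, so the retained regions still cover the scene interior without gaps.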

Warning

This release breaks backward-compatibility with previous versions.

Features#

  • Extend multiband support to all tasks (#1345)

  • Add support for external models for object detection (#1337)

  • Allow MultiRasterSource to read from sub raster sources with non-identical extents and resolutions (#1308)

  • Allow discarding edges of predicted chips in semantic segmentation (#1486)

  • Add numpy-like array indexing and slicing to RasterSource and LabelSource (#1470)

  • Make RandomWindowGeoDataset more efficient when sampling chips from scenes with sparse AOIs (#1225)

  • Add support for Albumentations’ lambda transforms (#1368)

  • Provide grouping mechanism for scenes and use it in the analyze and eval stages (#1375)

  • Update STAC-reading functionality to make it compatible with STAC v1.0.* (#1243)

  • Add progress bars for downloads and uploads (#1343)

  • Allow caching downloads (#1450)

Refactoring#

  • Refactor Learner and related configs to be more flexible and easier to use in a notebook (#1413)

  • Refactor to make it easier to programmatically make predictions on new scenes (#1434)

  • Refactor: make Evaluator easier to use independently (#1438)

  • Refactor vector data handling (#1437, #1461)

  • Add GeoDataset.from_uris() for convenient initialization of GeoDatasets (#1462, #1588)

  • Add Labels.save() convenience method (#1486)

  • Factor out dataset visualization into a Visualizer class (#1476)

  • Replace STRTree with GeoPandas GeoDataFrame-based spatial joins in ChipClassificationLabelSource and RasterizedSource (#1470)

  • Remove ActivateMixin entirely (#1470)

  • Remove the mask-to-polygons dependency (#1470)

Documentation#

Fixes#

  • Speed up RGBClassTransformer by an order of magnitude (#1485)

  • Fix rastervision_pipeline entry point to ensure commands from other plugins are available (#1250)

  • Fix incorrect F1 scores when aggregating evals for scenes in the eval stage (#1386)

  • Fix bug in semantic segmentation prediction output paths (#1354)

  • Do not zero out null class pixels when creating semantic segmentation training chips (#1556)

  • Fix a bug in DataConfig validation and refactor ClassConfig (#1436)

  • Fix #1052 (#1451)

  • Fix #991 and #1452 (#1484)

  • Fix #1430 (#1495)

  • Misc. fixes (#1260, #1281, #1453)

Development/maintenance#

  • Make the semantic segmentation integration test more deterministic (#1261)

  • Migrate from Travis to GitHub Actions (#1218)

  • Add GitHub issue templates (#1242, #1288, #1420)

  • Switch from Gitter to GitHub Discussions (#1464, #1465)

  • Update cloudformation template to allow use of on-demand GPU instances (#1482)

  • Add option to build ARM64 Docker image (#1545, #1559)

  • Make docker/run automatically find a free port for Jupyter server if the default port is already taken (#1558)

  • Set tutorial-notebooks path as the default jupyter path in docker/run (#1595)


Raster Vision 0.13.1#

Bug Fixes#

  • Fix image plot by adding default plot transform #1144

Raster Vision 0.13#

This release presents a major jump in Raster Vision’s power and flexibility. The most significant changes are:

Support arbitrary models and loss functions (#985, #992)#

Raster Vision is no longer restricted to using the built-in models and loss functions. It is now possible to import models and loss functions from a GitHub repo, a URI, or a zip file, as long as they interface correctly with RV’s learner code. This means that you can now easily swap models in your existing training pipelines, allowing you to take advantage of the latest models or to make customizations that help with your specific task, all with minimal changes.

This is made possible by PyTorch’s hub module.

Currently not supported for Object Detection.

Support for multiband images (even with Transfer Learning) (#972)#

It is now possible to train on imagery with more than 3 channels. Raster Vision automatically modifies the model to be able to accept more than 3 channels. If using pretrained models, the pre-learned weights are retained.

The model modification cannot be performed automatically when using an external model. But as long as the external model supports multiband inputs, it will work correctly with RV.

Currently only supported for Semantic Segmentation.

Support for reading directly from raster sources during training without chipping (#1046)#

It is no longer necessary to go through a chip stage to produce a training dataset. You can instead provide the DatasetConfig directly to the PyTorch backend and RV will sample training chips on the fly during training. All the examples now use this as the default. Check them out to see how to use this feature.

Support for arbitrary Albumentations transforms (#1001)#

It is now possible to supply an arbitrarily complicated Albumentations transform for data augmentation. In the DataConfig subclasses, you can specify a base_transform that is applied every time (i.e. in training, validation, and prediction), an aug_transform that is only applied during training, and a plot_transform (via PlotOptions) to ensure that sample images are plotted correctly (e.g. use plot_transform to rescale a normalized image to 0-1).

Allow streaming reads from Rasterio sources (#1020)#

It is now possible to stream chips from a remote RasterioSource without first downloading the entire file. To enable, set allow_streaming=True in the RasterioSourceConfig.
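
As a configuration sketch of the setting described above: the URI is a made-up placeholder, and the import path and the `uris` field are assumptions about the installed Raster Vision version, so check them against your release before relying on this.

```python
# Assumes the rastervision.core plugin is installed.
from rastervision.core.data import RasterioSourceConfig

raster_source = RasterioSourceConfig(
    uris=['s3://example-bucket/scene.tif'],  # hypothetical remote raster
    allow_streaming=True,  # read chips remotely instead of downloading the file
)
```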

Analyze stage no longer necessary when using non-uint8 rasters (#972)#

It is no longer necessary to go through an analyze stage to be able to convert non-uint8 rasters to uint8 chips. Chips can now be stored as numpy arrays, and will be normalized to float during training/prediction based on their specific data type. See spacenet_vegas.py for example usage.

Currently only supported for Semantic Segmentation.
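
The type-based normalization mentioned above can be illustrated with a small pure-Python sketch. It assumes that normalization means dividing unsigned-integer values by the maximum value of their type. The function name and the string type labels are hypothetical, and the real implementation may differ.

```python
# Maximum representable value for each unsigned integer type.
UINT_MAX = {"uint8": 2**8 - 1, "uint16": 2**16 - 1, "uint32": 2**32 - 1}


def normalize(values, dtype):
    """Scale unsigned-integer pixel values to [0, 1]; leave other types unscaled."""
    if dtype in UINT_MAX:
        scale = UINT_MAX[dtype]
        return [v / scale for v in values]
    # Non-unsigned types (e.g. float rasters) are only converted to float.
    return [float(v) for v in values]


print(normalize([0, 255], "uint8"))
print(normalize([0, 65535], "uint16"))
print(normalize([0.25, 2.5], "float32"))
```

Because the scale factor comes from the data type rather than from per-dataset statistics, no separate analyze pass over the imagery is needed.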

Features#

  • Add support for multiband images #972

  • Add support for vector output to predict command #980

  • Add support for weighted loss for classification and semantic segmentation #977

  • Add multi raster source #978

  • Add support for fetching and saving external model definitions #985

  • Add support for external loss definitions #992

  • Upgrade to pyproj 2.6 #1000

  • Add support for arbitrary albumentations transforms #1001

  • Minor tweaks to regression learner #1013

  • Add ability to specify number of PyTorch reader processes #1008

  • Make img_sz specifiable #1012

  • Add ignore_last_class capability to segmentation #1017

  • Add filtering capability to segmentation sliding window chip generation #1018

  • Add raster transformer to remove NaNs from float rasters, add raster transformers to cast to arbitrary numpy types #1016

  • Add plot options for regression #1023

  • Add ability to use fewer channels w/ pretrained models #1026

  • Remove 4GB file size limit from VSI file system, allow streaming reads #1020

  • Add reclassification transformer for segmentation label rasters #1024

  • Allow filtering out chips based on proportion of NODATA pixels #1025

  • Allow ignore_last_class to take either a boolean or the literal ‘force’; in the latter case validation of that argument is skipped so that it can be used with external loss functions #1027

  • Add ability to crop raster source extent #1030

  • Accept immediate geometries in SceneConfig #1033

  • Only perform normalization on unsigned integer types #1028

  • Make group_uris specifiable and add group_train_sz_rel #1035

  • Make number of training and dataloader previews independent of batch size #1038

  • Allow continuing training from a model bundle #1022

  • Allow reading directly from raster source during training without chipping #1046

  • Remove external commands (obsoleted by external architectures and loss functions) #1047

  • Allow saving SS predictions as probabilities #1057

  • Update CUDA version from 10.1 to 10.2 #1115

  • Add integration tests for the nochip functionality #1116

  • Update examples to make use of the nochip functionality by default #1116

Bug Fixes#

  • Update all relevant saved URIs in config before instantiating Pipeline #993

  • Pass verbose flag to batch jobs #988

  • Fix: Ensure Integer class_id #990

  • Use --ipc=host by default when running the docker container #1077


Raster Vision 0.12#

This release presents a major refactoring of Raster Vision intended to simplify the codebase, and make it more flexible and customizable.

To learn about how to upgrade existing experiment configurations, perhaps the best approach is to read the source code of the Examples to get a feel for the new syntax. Unfortunately, existing predict packages will not be usable with this release, and upgrading and re-running the experiments will be necessary. For more advanced users who have written plugins or custom commands, the internals have changed substantially, and we recommend reading Architecture and Customization.

Since the changes in this release are sweeping, it is difficult to enumerate a list of all changes and associated PRs. Therefore, this change log describes the changes at a high level, along with some justifications and pointers to further documentation.

Simplified Configuration Schema#

We are still using a modular, programmatic approach to configuration, but have switched to using a Config base class which uses the Pydantic library. This allows us to define configuration schemas in a declarative fashion, and let the underlying library handle serialization, deserialization, and validation. In addition, this has allowed us to DRY up the configuration code, eliminate the use of Protobufs, and represent configuration from plugins in the same fashion as built-in functionality. To see the difference, compare the configuration code for ChipClassificationLabelSource in 0.11 (label_source.proto and chip_classification_label_source_config.py), and in 0.12 (chip_classification_label_source_config.py).

Abstracted out Pipelines#

Raster Vision includes functionality for running computational pipelines in local and remote environments, but previously, this functionality was tightly coupled with the “domain logic” of machine learning on geospatial data in the Experiment abstraction. This made it more difficult to add and modify commands, as well as use this functionality in other projects. In this release, we factored out the experiment running code into a separate rastervision.pipeline package, which can be used for defining, configuring, customizing, and running arbitrary computational pipelines.

Reorganization into Plugins#

The rest of Raster Vision is now written as a set of optional plugins that have Pipelines which implement the “domain logic” of machine learning on geospatial data. Implementing everything as optional (pip installable) plugins makes it easier to install subsets of Raster Vision functionality, eliminates separate code paths for built-in and plugin functionality, and provides (de facto) examples of how to write plugins. See Codebase Overview for more details.

More Flexible PyTorch Backends#

The 0.10 release added PyTorch backends for chip classification, semantic segmentation, and object detection. In this release, we abstracted out the common code for training models into a flexible Learner base class with subclasses for each of the computer vision tasks. This code is in the rastervision.pytorch_learner plugin, and is used by the Backends in rastervision.pytorch_backend. By decoupling Backends and Learners, it is now easier to write arbitrary Pipelines and new Backends that reuse the core model training code, which can be customized by overriding methods such as build_model. See Customizing Raster Vision.
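
The pattern described above, a shared base class whose behavior is customized by overriding a method such as build_model, can be sketched as follows. All classes here are hypothetical stand-ins written for illustration (a dict stands in for a real network); they are not the actual classes in rastervision.pytorch_learner.

```python
from abc import ABC, abstractmethod


class Learner(ABC):
    """Hypothetical stand-in for a task-agnostic training base class."""

    def __init__(self, num_classes: int):
        self.num_classes = num_classes
        # Shared setup calls the hook that subclasses customize.
        self.model = self.build_model()

    @abstractmethod
    def build_model(self) -> dict:
        """Return a model; a dict stands in for a real network here."""

    def train(self, epochs: int) -> list:
        # Shared training loop; identical for every task.
        return [f"epoch {i}: trained {self.model['name']}" for i in range(epochs)]


class ClassificationLearner(Learner):
    def build_model(self) -> dict:
        return {"name": "classifier", "outputs": self.num_classes}


class CustomLearner(ClassificationLearner):
    def build_model(self) -> dict:
        # Swap in a different model without touching the training loop.
        model = super().build_model()
        model["name"] = "custom-classifier"
        return model


learner = CustomLearner(num_classes=3)
log = learner.train(epochs=2)
print(log)
```

Only `build_model` changes between the subclasses; the training loop in the base class is reused as-is.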

Removed Tensorflow Backends#

The Tensorflow backends and associated Docker images have been removed. It is too difficult to maintain backends for multiple deep learning frameworks, and PyTorch has worked well for us. Of course, it’s still possible to write Backend plugins using any framework.

Other Changes#

  • For simplicity, we moved the contents of the raster-vision-examples and raster-vision-aws repos into the main repo. See Examples and Setup AWS Batch using CloudFormation.

  • To help people bootstrap new projects using RV, we added Bootstrap new projects with a template.

  • All the PyTorch backends now offer data augmentation using albumentations.

  • We removed the ability to automatically skip running commands that already have output, “tree workflows”, and “default providers”. We also unified the Experiment, Command, and Task classes into a single Pipeline class which is subclassed for different computer vision (or other) tasks. These features and concepts had little utility in our experience, and presented stumbling blocks to outside contributors and plugin writers.

  • Although it’s still possible to add new VectorSources and other classes for reading data, our philosophy going forward is to prefer writing pre-processing scripts to get data into the format that Raster Vision can already consume. The VectorTileVectorSource was removed since it violates this new philosophy.

  • We previously attempted to make predictions for semantic segmentation work in a streaming fashion (to avoid running out of RAM), but the implementation was buggy and complex. So we reverted to holding all predictions for a scene in RAM, and now assume that scenes are roughly < 20,000 x 20,000 pixels. This works better anyway from a parallelization standpoint.

  • We switched to writing chips to disk incrementally during the CHIP command using a SampleWriter class to avoid running out of RAM.

  • The term “predict package” has been replaced with “model bundle”, since it rolls off the tongue better, and BUNDLE is the name of the command that produces it.

  • Class ids are now indexed starting at 0 instead of 1, which seems more intuitive. The “null class”, used for marking pixels in semantic segmentation that have not been labeled, used to be 0, and is now equal to len(class_ids).

  • The aws_batch runner was renamed batch due to a naming conflict, and the names of the configuration variables for Batch changed. See Running on AWS Batch.
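
The class-id convention from the list above (ids start at 0, and the null class is len(class_ids)) can be sketched as follows. The class names are hypothetical examples.

```python
class_names = ["building", "road", "water"]  # hypothetical classes

# Class ids are 0-indexed.
class_ids = list(range(len(class_names)))

# The null class takes the next id after the last real class.
null_class_id = len(class_ids)

print(class_ids, null_class_id)
```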

Future Work#

The next big features we plan on developing are:

  • the ability to read and write data in STAC format using the label extension. This will facilitate integration with other tools such as GroundWork.


Raster Vision 0.11#

Features#

  • Added the possibility for chip classification to use data augmentors from the albumentations library to enhance the training data. #859

  • Updated the Quickstart doc with pytorch docker image and model #863

  • Added the possibility to deal with class imbalances through oversampling. #868


Raster Vision 0.11.0#

Bug Fixes#

  • Ensure randint args are ints #849

  • The augmentors were not serialized properly for the chip command #857

  • Fix problems with pretrained flag #860

  • Correctly get_local_path for some zxy tile URIs #865


Raster Vision 0.10#

Raster Vision 0.10.0#

Notes on switching to PyTorch-based backends#

The current backends based on Tensorflow have several problems:

  • They depend on third-party libraries (Deeplab, TF Object Detection API) that are complex, not well suited to being used as dependencies within a larger project, and are each written in a different style. This makes the code for each backend very different from one another, and unnecessarily complex. This increases the maintenance burden, makes it difficult to customize, and makes it more difficult to implement a consistent set of functionality between the backends.

  • Tensorflow, in the maintainer’s opinion, is more difficult to write and debug than PyTorch (although this is starting to improve).

  • The third-party libraries assume that training images are stored as PNG or JPG files. This limits our ability to handle more than three bands and more than 8 bits per channel. We have recently completed some research on how to train models on > 3 bands, and we plan on adding this functionality to Raster Vision.

Therefore, we are in the process of sunsetting the Tensorflow backends (which will probably be removed) and have implemented replacement PyTorch-based backends. The main things to be aware of in upgrading to this version of Raster Vision are as follows:

  • Instead of there being CPU and GPU Docker images (based on Tensorflow), there are now tf-cpu, tf-gpu, and pytorch (which works on both CPU and GPU) images. Using ./docker/build --tf or ./docker/build --pytorch will only build the TF or PyTorch images, respectively.

  • Using the TF backends requires being in the TF container, and similarly for PyTorch. There are now --tf-cpu, --tf-gpu, and --pytorch-gpu options for the ./docker/run command. The default setting is to use the PyTorch image in the standard (CPU) Docker runtime.

  • The raster-vision-aws CloudFormation setup creates Batch resources for TF-CPU, TF-GPU, and PyTorch. It also now uses default AMIs provided by AWS, simplifying the setup process.

  • To easily switch between running TF and PyTorch jobs on Batch, we recommend creating two separate Raster Vision profiles with the Batch resources for each of them.

  • The way to use the ConfigBuilders for the new backends can be seen in the examples repo and the Backend reference

Features#

  • Add confusion matrix as metric for semantic segmentation #788

  • Add predict_chip_size as option for semantic segmentation #786

  • Handle “ignore” class for semantic segmentation #783

  • Add stochastic gradient descent (“SGD”) as an optimizer option for chip classification #792

  • Add option to determine if all touched pixels should be rasterized for rasterized RasterSource #803

  • Script to generate GeoTIFF from ZXY tile server #811

  • Remove QGIS plugin #818

  • Add PyTorch backends and add PyTorch Docker image #821 and #823.

Bug Fixes#

  • Fixed issue with configuration not being able to read lists #784

  • Fixed ConfigBuilders not supporting type annotations in __init__ #800


Raster Vision 0.9#

Raster Vision 0.9.0#

Features#

  • Add requester_pays RV config option #762

  • Unify Docker scripts #743

  • Switch default branch to master #726

  • Merge GeoTiffSource and ImageSource into RasterioSource #723

  • Simplify/clarify/test/validate RasterSource #721

  • Simplify and generalize geom processing #711

  • Predict zero for nodata pixels on semantic segmentation #701

  • Add support for evaluating vector output with AOIs #698

  • Conserve disk space when dealing with raster files #692

  • Optimize StatsAnalyzer #690

  • Include per-scene eval metrics #641

  • Make and save predictions and do eval chip-by-chip #635

  • Decrease semseg memory usage #630

  • Add support for vector tiles in .mbtiles files #601

  • Add support for getting labels from zxy vector tiles #532

  • Remove custom __deepcopy__ implementation from ConfigBuilders. #567

  • Add ability to shift raster images by given numbers of meters. #573

  • Add ability to generate GeoJSON segmentation predictions. #575

  • Add ability to run the DeepLab eval script. #653

  • Submit CPU-only stages to a CPU queue on AWS. #668

  • Parallelize CHIP and PREDICT commands #671

  • Refactor update_for_command to split out the IO reporting into report_io. #671

  • Add Multi-GPU Support to DeepLab Backend #590

  • Handle multiple AOI URIs #617

  • Give train_restart_dir Default Value #626

  • Use make to manage local execution #664

  • Optimize vector tile processing #676

Bug Fixes#

  • Fix Deeplab resume bug: update path in checkpoint file #756

  • Allow Spaces in --channel-order Argument #731

  • Fix error when using predict packages with AOIs #674

  • Correct checkpoint name #624

  • Allow using default stride for semseg sliding window #745

  • Fix filter_by_aoi for ObjectDetectionLabels #746

  • Load null channel_order correctly #733

  • Handle Rasterio crs that doesn’t contain EPSG #725

  • Fixed issue with saving semseg predictions for non-georeferenced imagery #708

  • Fixed issue with handling width > height in semseg eval #627

  • Fixed issue with experiment configs not setting key names correctly #576

  • Fixed issue with Raster Sources that have channel order #576


Raster Vision 0.8#

Raster Vision 0.8.1#

Bug Fixes#

  • Allow multipolygon for chip classification #523

  • Remove unused args for AWS Batch runner #503

  • Skip over lines when doing chip classification; use background_class_id for scenes with no polygons #507

  • Fix issue where get_matching_s3_keys fails when suffix is None #497