This page was generated from reading_raster_data.ipynb.


If running outside of the Docker image, you might need to set a couple of environment variables manually. You can do it like so:

import os
from subprocess import check_output

os.environ['GDAL_DATA'] = check_output('pip show rasterio | grep Location | awk \'{print $NF"/rasterio/gdal_data/"}\'', shell=True).decode().strip()
os.environ['AWS_NO_SIGN_REQUEST'] = 'YES'

See this Colab notebook for an example.

Reading raster data#

Reading rasters using RasterSource#

The RasterSource is Raster Vision’s abstraction for windowed reading from a raster image.

One concrete implementation of it is the RasterioSource which is a wrapper around the rasterio library and allows reading from all file formats supported by it.

We can create one from an image like shown below. allow_streaming=True allows us to take advantage of rasterio’s remote-file-reading capabilities; setting it to False will cause Raster Vision to download the image.

from import RasterioSource

img_uri = 's3://azavea-research-public-data/raster-vision/examples/spacenet/RGB-PanSharpen_AOI_2_Vegas_img205.tif'
raster_source = RasterioSource(img_uri, allow_streaming=True)
(650, 650, 3)

Reading chips#

RasterSources support numpy-like array slicing, so we can read a smaller chip within the full raster like so:

chip = raster_source[:400, :400]
(400, 400, 3)

The returned chip is a numpy array which we can plot using matplotlib. Note that since the values are uint16, we first normalize them before plotting.

from matplotlib import pyplot as plt

colors_mins = chip.reshape(-1, chip.shape[-1]).min(axis=0)
colors_maxs = chip.reshape(-1, chip.shape[-1]).max(axis=0)
chip_normalized = (chip - colors_mins) / (colors_maxs - colors_mins)

fig, ax = plt.subplots(figsize=(5, 5))

We can even do slightly fancier indexing. The example below reads a 400x400 chip from the the first band in the raster, subsampled to 100x100.

chip = raster_source[:400:4, :400:4, [0]]
print('chip.shape:', chip.shape)

fig, ax = plt.subplots(figsize=(5, 5))
chip.shape: (100, 100, 1)

The fancy slicing is just syntactic-sugar for the RasterioSource.get_chip() method. The last call is internally translated to the following call to the get_chip() method:

from import Box

chip = raster_source.get_chip(
    window=Box(ymin=0, xmin=0, ymax=400, xmax=400),
    out_shape=(100, 100),
(100, 100, 1)

Learn more about the handy Box class here.

Transforming rasters using RasterTransformers#

RasterSources accept a list of RasterTransformers, all of which are automatically applied (in the order specified) to each chip sampled from that RasterSource.

Below we’ll look at two such `RasterTransformer`s:

But Raster Vision also provides the following:

  • CastTransformer: type-cast chip.

  • NanTransformer: map NaN values to another value.

  • ReclassTransformer: map values to other values using a given mapping; most useful for modifying class IDs in semantic segmentation labels.

  • RGBClassTransformer: if your semantic segmentation labels are in the form of an RGB raster with different colors representing different classes, use this to map them to class IDs.


Above, we manually min-max normalized the uint16 chip to 0-1. If we wanted this normalization to be automatically applied to all chips sampled from a RasterSource, we can simply attach a MinMaxTransformer to that RasterSource.

from import RasterioSource, MinMaxTransformer

img_uri = 's3://azavea-research-public-data/raster-vision/examples/spacenet/RGB-PanSharpen_AOI_2_Vegas_img205.tif'
raster_source = RasterioSource(img_uri, allow_streaming=True)

raster_source_normalized = RasterioSource(
from matplotlib import pyplot as plt

chip = raster_source_normalized[:400, :400]

fig, ax = plt.subplots(figsize=(5, 5))


Another useful RasterTransformer is the StatsTransformer.

Unlike MinMaxTransformer, the StatsTransformer is able to deal with outlier values. It works by using channel means and standard deviations to convert values to z-scores and then clipping them to some number of standard deviations before scaling to 0-255 and converting to uint8.

We can create and use one like so:

from rastervision.core import RasterStats
from import RasterioSource, StatsTransformer

img_uri = 's3://azavea-research-public-data/raster-vision/examples/spacenet/RGB-PanSharpen_AOI_2_Vegas_img205.tif'
raster_source = RasterioSource(img_uri, allow_streaming=True)

stats_transformer = StatsTransformer.from_raster_sources(

raster_source_normalized = RasterioSource(
stats_transformer.means, stats_transformer.stds
(array([[[577.23597778, 716.4319    , 519.44995556]]]),
 array([[[351.254233  , 342.98780916, 200.60693   ]]]))
from matplotlib import pyplot as plt

chip_normalized = raster_source_normalized[:400, :400]

fig, ax = plt.subplots(figsize=(5, 5))

To get a closer look at StatsTransformer’s work, we can visualize each band’s pixel intensity distributions with and without it:

import seaborn as sns

band_names = 'RGB'

fig, (ax_l, ax_r) = plt.subplots(1, 2, squeeze=True, figsize=(12, 6))

# left
for i in range(chip.shape[-1]):
    sns.kdeplot(chip[..., i].flat, ax=ax_l, c=band_names[i].lower(), label=band_names[i])
ax_l.set_ylim((0, 0.01))
ax_l.set_xlabel('Pixel values')
ax_l.set_title('Without StatsTransformer')

# right
for i in range(chip_normalized.shape[-1]):
        chip_normalized[..., i].flat, ax=ax_r, c=band_names[i].lower(), label=band_names[i])
ax_r.set_xlabel('Pixel values')
ax_r.set_title('With StatsTransformer')

Subsetting and reordering bands#

Dealing with multi-band imagery is a commonn use case in the GIS domain. All parts of Raster Vision work seamlessly with arbitrary number of bands.

The following example shows how we can use RasterioSource to sample 6 specific channels from a 191-band hyperspectral image.

Data from:

!wget ""
!apt-get install unzip
!unzip ""

There are 2 ways to do this. We can either instantiate a RasterioSource like normal and sample the subset of bands using array indexing…

from import RasterioSource, MinMaxTransformer

img_uri = 'Hyperspectral_Project/dc.tif'
raster_source_hsi = RasterioSource(img_uri)

raster_source_hsi.shape, raster_source_hsi.dtype
((1280, 307, 191), dtype('int16'))
chip = raster_source_hsi[:100, :100, [10, 30, 50, 70, 110, 130]]
(100, 100, 6)

… Or we can specify the channel_order param while instantiating the RasterioSource, which will automatically restrict all sampled chips to these bands.

raster_source_hsi = RasterioSource(
    channel_order=[10, 30, 50, 70, 110, 130],

raster_source_hsi.shape, raster_source_hsi.dtype
((1280, 307, 6), dtype('uint8'))
chip = raster_source_hsi[200:500, :]
(300, 307, 6)
fig, axs = plt.subplots(2, 3, squeeze=True, figsize=(15, 9))

for i, (ax, ch) in enumerate(zip(axs.flat, raster_source_hsi.channel_order)):
    ax.matshow(chip[..., i])
    ax.set_title(f'Band {ch}')

Combining bands from different files using MultiRasterSource#

Another common use case is combining bands from multiple sources. This can done using a MultiRasterSource.

The following example combines RGB, SWIR, and SAR bands into a single 8-band RasterSource.

Data from: Cloud to Street - Microsoft flood dataset

uri_seninel_2_rgb = ''
uri_seninel_2_swir = ''
uri_seninel_1_vh = ''
uri_seninel_1_vv = ''

First, we create RasterSources for each individual source.

from import RasterioSource, MultiRasterSource, MinMaxTransformer

rs_sentinel_2_rgb = RasterioSource(uri_seninel_2_rgb, allow_streaming=True)
rs_sentinel_2_swir = RasterioSource(uri_seninel_2_swir, allow_streaming=True)

rs_seninel_1_vh = RasterioSource(uri_seninel_1_vh, allow_streaming=True, raster_transformers=[MinMaxTransformer()])
rs_seninel_1_vv = RasterioSource(uri_seninel_1_vv, allow_streaming=True, raster_transformers=[MinMaxTransformer()])

print('rs_sentinel_2_rgb', rs_sentinel_2_rgb.shape, rs_sentinel_2_rgb.dtype)
print('rs_sentinel_2_swir', rs_sentinel_2_swir.shape, rs_sentinel_2_swir.dtype)
print('rs_seninel_1_vh', rs_seninel_1_vh.shape, rs_seninel_1_vh.dtype)
print('rs_seninel_1_vv', rs_seninel_1_vv.shape, rs_seninel_1_vv.dtype)
rs_sentinel_2_rgb (512, 512, 3) uint8
rs_sentinel_2_swir (512, 512, 3) uint8
rs_seninel_1_vh (512, 512, 1) uint8
rs_seninel_1_vv (512, 512, 1) uint8

Next, we combine them into a MultiRasterSource.

raster_sources = [

raster_source_multi = MultiRasterSource(
    raster_sources=raster_sources, primary_source_idx=0)

raster_source_multi.shape, raster_source_multi.dtype
((512, 512, 8), dtype('uint8'))
chip = raster_source_multi[:, :]
(512, 512, 8)
from matplotlib import pyplot as plt

fig, axs = plt.subplots(1, 4, figsize=(20, 5))
(ax_rgb, ax_swir, ax_vh, ax_vv) = axs.flat

ax_rgb.matshow(chip[..., :3])
ax_rgb.set_title('RGB', fontsize=14)
ax_swir.matshow(chip[..., 3:6])
ax_swir.set_title('SWIR', fontsize=14)
ax_vh.matshow(chip[..., 6])
ax_vh.set_title('VH', fontsize=14)
ax_vv.matshow(chip[..., 7])
ax_vv.set_title('VV', fontsize=14)

Cropping the extent#

Sometimes you might want to crop a RasterSource. For example, if you want to use one part of it for training and another for validation.

The following example shows how we can crop out the top 200 pixels and left 200 pixels of the full raster.

from import RasterioSource, MinMaxTransformer
from import Box

img_uri = 's3://azavea-research-public-data/raster-vision/examples/spacenet/RGB-PanSharpen_AOI_2_Vegas_img205.tif'
raster_source_cropped = RasterioSource(
    extent=Box(200, 200, 650, 650),

raster_source_cropped.shape, raster_source_cropped.dtype
((450, 450, 3), dtype('uint8'))
from import Box
from matplotlib import pyplot as plt

chip = raster_source_cropped[:400, :400]

fig, ax = plt.subplots(figsize=(5, 5))
(400, 400, 3)