Python Quickstart

Reading and writing data files is a spatial data analyst’s bread and butter. This document explains how to use Rasterio to read existing files and to create new files. Some advanced topics are glossed over to be covered in more detail elsewhere in Rasterio’s documentation. Only the GeoTIFF format is used here, but the examples do apply to other raster data formats. It is presumed that Rasterio has been installed.

Opening a dataset in reading mode

Consider an “example.tif” file with 16-bit Landsat 8 imagery covering a part of the United States’s Colorado Plateau [1]. Because the imagery is large (70 MB) and has a wide dynamic range it is difficult to display it in a browser. A rescaled and dynamically squashed version is shown below.

_images/example.png

Import rasterio to begin.

>>> import rasterio

Next, open the file.

>>> dataset = rasterio.open('example.tif')

Rasterio’s rasterio.open() takes a path and returns a dataset object. The path may point to a file of any supported raster format. Rasterio will open it using the proper GDAL format driver. Dataset objects have some of the same attributes as Python file objects.

>>> dataset.name
'example.tif'
>>> dataset.mode
'r'
>>> dataset.closed
False

Dataset attributes

Properties of the raster data stored in “example.tif” can be accessed through attributes of the dataset object. Dataset objects have bands and this example has a band count of 1.

>>> dataset.count
1

A dataset band is an array of values representing the partial distribution of a single variable in 2-dimensional (2D) space. All band arrays of a dataset have the same number of rows and columns. The variable represented by the example dataset’s sole band is Level-1 digital numbers (DN) for the Landsat 8 Operational Land Imager (OLI) band 4 (wavelengths between 640-670 nanometers). These values can be scaled to radiance or reflectance values. The array of DN values has 7731 columns (its width) and 7871 rows (its height).

>>> dataset.width
7731
>>> dataset.height
7871

Some dataset attributes expose the properties of all dataset bands via a tuple of values, one per band. To get a mapping of band indexes to variable data types, apply a dictionary comprehension to the zip() product of a dataset’s indexes and dtypes attributes.

>>> {i: dtype for i, dtype in zip(dataset.indexes, dataset.dtypes)}
{1: 'uint16'}

The “example.tif” file’s sole band contains unsigned 16-bit integer values. The GeoTIFF format also supports signed integers and floats of different size.

Dataset georeferencing

A GIS raster dataset is different from an ordinary image; its elements (or “pixels”) are mapped to regions on the earth’s surface. Every pixels of a dataset is contained within a spatial bounding box.

>>> dataset.bounds
BoundingBox(left=358485.0, bottom=4028985.0, right=590415.0, top=4265115.0)

Our example covers the world from 358485 meters (in this case) to 590415 meters, left to right, and 4028985 meters to 4265115 meters bottom to top. It covers a region 231.93 kilometers wide by 236.13 kilometers high.

The bounds attribute is derived from a more fundamental attribute: the dataset’s geospatial transform.

>>> dataset.transform
Affine(30.0, 0.0, 358485.0,
       0.0, -30.0, 4265115.0)

The transform attribute is an affine transformation matrix that maps pixel locations in (row, col) coordinates to (x, y) spatial positions. The product of this matrix and (0, 0), the row and column coordinates of the upper left corner of the dataset, is the spatial position of the upper left corner.

>>> dataset.transform * (0, 0)
(358485.0, 4265115.0)

The position of the lower right corner is obtained similarly.

>>> dataset.transform * (dataset.width, dataset.height)
(590415.0, 4028985.0)

But what do these numbers mean? 4028985 meters from where? These coordinate values are relative to the origin of the dataset’s coordinate reference system (CRS).

>>> dataset.crs
CRS({'init': 'epsg:32612'})

“epsg:32612” identifies a particular coordinate reference system: UTM zone 12N. This system is used for mapping areas in the Northern Hemisphere between 108 and 114 degrees west. The upper left corner of the example dataset, (358485.0, 4265115.0), is 141.5 kilometers west of zone 12’s central meridian (111 degrees west) and 4265 kilometers north of the equator.

Coordinate reference systems are an advanced topic. Suffice it to say that between the crs attribute and the transform attribute a raster dataset is geo-referenced and can be compared to other GIS datasets.

Reading raster data

The raster array for a raster band can be accessed by calling dataset.read() with the band’s index number. Following the GDAL convention, bands are indexed from 1.

>>> dataset.indexes
(1,)
>>> band1 = dataset.read(1)

The read() method returns a Numpy N-D array.

>>> band1
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint16)

Values from the array can be had by their row, column index.

>>> band1[dataset.height // 2, dataset.width // 2]
17491

Spatial indexing

Datasets have a method of getting array indices for spatial points. To get the value for the pixel 100 kilometers east and 50 kilometers south of the dataset’s upper left corner, do the following.

>>> x, y = (dataset.bounds.left + 100000, dataset.bounds.top - 50000)
>>> row, col = dataset.index(x, y)
>>> row, col
(1666, 3333)
>>> band_one[row, col]
7566

To get the spatial coordinates of a pixel, use the dataset’s xy() method. The coordinates of the center of the image can be computed like this.

>>> dataset.xy(dataset.width // 2, dataset.height // 2)
(476550.0, 4149150.0)

Creating data

Reading data is only half the story. Using Rasterio dataset objects, arrays of values can be written to a raster data file and thus shared with other GIS applications such as QGIS.

As an example, consider an array of floating point values representing, e.g., a temperature or pressure anomaly field measured or modeled on a regular grid, 240 columns by 180 rows. The first and last grid points on the horizontal axis are located at 4.0 degrees west and 4.0 degrees east longitude, the first and last grid points on the vertical axis are located at 3 degrees south and 3 degrees north latitude.

>>> import numpy as np
>>> x = np.linspace(-4.0, 4.0, 240)
>>> y = np.linspace(-3.0, 3.0, 180)
>>> X, Y = np.meshgrid(x, y)
>>> Z1 = np.exp(-2 * np.log(2) * ((X - 0.5) ** 2 + (Y - 0.5) ** 2) / 1 ** 2)
>>> Z2 = np.exp(-3 * np.log(2) * ((X + 0.5) ** 2 + (Y + 0.5) ** 2) / 2.5 ** 2)
>>> Z = 10.0 * (Z2 - Z1)

The fictional field for this example consists of the difference of two Gaussian distributions and is represented by the array Z. Its contours are shown below.

_images/field.png

Opening a dataset in writing mode

To save this array along with georeferencing information to a new raster data file, call rasterio.open() with a path to the new file to be created, 'w' to specify writing mode, and several keyword arguments.

  • driver: the name of the desired format driver
  • width: the number of columns of the dataset
  • height: the number of rows of the dataset
  • count: a count of the dataset bands
  • dtype: the data type of the dataset
  • crs: a coordinate reference system identifier or description
  • transform: an affine transformation matrix, and
  • nodata: a “nodata” value

The first 5 of these keyword arguments parametrize fixed, format-specific properties of the data file and are required when opening a file to write. The last 3 are optional.

In this example the coordinate reference system will be “+proj=latlong”, which describes an equirectangular coordinate reference system with units of decimal degrees. The right affine transformation matrix can be computed using a function in the rasterio.transform module.

>>> from rasterio.transform import from_origin
>>> res = (x[-1] - x[0]) / 240.0
>>> transform = from_origin(x[0] - res / 2, y[-1] + res / 2, res, res)
>>> transform
Affine(0.033333333333333333, 0.0, -4.0166666666666666,
       0.0, -0.033333333333333333, 3.0166666666666666)

The upper left point in the example grid is at 3 degrees west and 2 degrees north. The raster pixel centered on this grid point extends res / 2, or 1/60 degrees, in each direction, hence the shift in the expression above.

A dataset for storing the example grid is opened like so

>>> new_dataset = rasterio.open('/tmp/new.tif', 'w', driver='GTiff',
...                             height=Z.shape[0], width=Z.shape[1],
...                             count=1, dtype=Z.dtype,
...                             crs='+proj=latlong', transform=transform)

Values for the height, width, and dtype keyword arguments are taken directly from attributes of the 2-D array, Z. Not all raster formats can support the 64-bit float values in Z, but the GeoTIFF format can.

Saving raster data

To save the grid, call the new dataset’s write() method with the grid and target band number as arguments.

>>> new_dataset.write(Z, 1)

and then call the close() method to sync data to disk and finish.

>>> new_dataset.close()

Because Rasterio’s dataset objects mimic Python’s file objects and implement Python’s context manager protocol, it is possible to do the following instead.

with rasterio.open('/tmp/new.tif', 'w', driver='GTiff', height=Z.shape[0],
                   width=Z.shape[1], count=1, dtype=Z.dtype,
                   crs='+proj=latlong', transform=transform) as dst:
    dst.write(Z, 1)

These are the basics of reading and writing raster data files. More features and examples are contained in the advanced topics section.

[1]“example.tif” is an alias for band 4 of Landsat scene LC80370342016194LGN00.