Python Quickstart

Reading and writing data files is a spatial data programmer’s bread and butter. This document explains how to use Rasterio to read existing files and to create new files. Some advanced topics are glossed over to be covered in more detail elsewhere in Rasterio’s documentation. Only the GeoTIFF format is used here, but the examples do apply to other raster data formats. It is presumed that Rasterio has been installed.

Opening a dataset in reading mode

Consider a GeoTIFF file named example.tif with 16-bit Landsat 8 imagery covering a part of the United States’s Colorado Plateau [1]. Because the imagery is large (70 MB) and has a wide dynamic range it is difficult to display it in a browser. A rescaled and dynamically squashed version is shown below.

Import rasterio to begin.

>>> import rasterio

Next, open the file.

>>> dataset = rasterio.open('example.tif')

Rasterio’s open() function takes a path string or path-like object and returns an opened dataset object. The path may point to a file of any supported raster format. Rasterio will open it using the proper GDAL format driver. Dataset objects have some of the same attributes as Python file objects.

>>> dataset.name
'example.tif'
>>> dataset.mode
'r'
>>> dataset.closed
False

Dataset attributes

Properties of the raster data stored in the example GeoTIFF can be accessed through attributes of the opened dataset object. Dataset objects have bands and this example has a band count of 1.

>>> dataset.count
1

A dataset band is an array of values representing the partial distribution of a single variable in 2-dimensional (2D) space. All band arrays of a dataset have the same number of rows and columns. The variable represented by the example dataset’s sole band is Level-1 digital numbers (DN) for the Landsat 8 Operational Land Imager (OLI) band 4 (wavelengths between 640-670 nanometers). These values can be scaled to radiance or reflectance values. The array of DN values is 7731 columns wide and 7871 rows high.

>>> dataset.width
7731
>>> dataset.height
7871

Some dataset attributes expose the properties of all dataset bands via a tuple of values, one per band. To get a mapping of band indexes to variable data types, apply a dictionary comprehension to the zip() product of a dataset’s DatasetReader.indexes and DatasetReader.dtypes attributes.

>>> {i: dtype for i, dtype in zip(dataset.indexes, dataset.dtypes)}
{1: 'uint16'}

The example file’s sole band contains unsigned 16-bit integer values. The GeoTIFF format also supports signed integers and floats of different size.

Dataset georeferencing

A GIS raster dataset is different from an ordinary image; its elements (or “pixels”) are mapped to regions on the earth’s surface. Every pixels of a dataset is contained within a spatial bounding box.

>>> dataset.bounds
BoundingBox(left=358485.0, bottom=4028985.0, right=590415.0, top=4265115.0)

Our example covers the world from 358485 meters (in this case) to 590415 meters, left to right, and 4028985 meters to 4265115 meters bottom to top. It covers a region 231.93 kilometers wide by 236.13 kilometers high.

The value of DatasetReader.bounds attribute is derived from a more fundamental attribute: the dataset’s geospatial transform.

>>> dataset.transform
Affine(30.0, 0.0, 358485.0,
       0.0, -30.0, 4265115.0)

A dataset’s DatasetReader.transform is an affine transformation matrix that maps pixel locations in (col, row) coordinates to (x, y) spatial positions. The product of this matrix and (0, 0), the column and row coordinates of the upper left corner of the dataset, is the spatial position of the upper left corner.

>>> dataset.transform * (0, 0)
(358485.0, 4265115.0)

The position of the lower right corner is obtained similarly.

>>> dataset.transform * (dataset.width, dataset.height)
(590415.0, 4028985.0)

But what do these numbers mean? 4028985 meters from where? These coordinate values are relative to the origin of the dataset’s coordinate reference system (CRS).

>>> dataset.crs
CRS.from_epsg(32612)

EPSG:32612 identifies a particular coordinate reference system: UTM zone 12N. This system is used for mapping areas in the Northern Hemisphere between 108 and 114 degrees west. The upper left corner of the example dataset, (358485.0, 4265115.0), is 141.5 kilometers west of zone 12’s central meridian (111 degrees west) and 4265 kilometers north of the equator.

Between the DatasetReader.crs and DatasetReader.transform attributes, the georeferencing of a raster dataset is described and the dataset can compared to other GIS datasets.

Reading raster data

Data from a raster band can be accessed by the band’s index number. Following the GDAL convention, bands are indexed from 1.

>>> dataset.indexes
(1,)
>>> band1 = dataset.read(1)

The DatasetReader.read() method returns a numpy.ndarray.

>>> band1
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint16)

Values from the array can be addressed by their row, column index.

>>> band1[dataset.height // 2, dataset.width // 2]
17491

Spatial indexing

Datasets have an DatasetReader.index() method for getting the array indices corresponding to points in georeferenced space. To get the value for the pixel 100 kilometers east and 50 kilometers south of the dataset’s upper left corner, do the following.

>>> x, y = (dataset.bounds.left + 100000, dataset.bounds.top - 50000)
>>> row, col = dataset.index(x, y)
>>> row, col
(1666, 3333)
>>> band1[row, col]
7566

To get the spatial coordinates of a pixel, use the dataset’s DatasetReader.xy() method. The coordinates of the center of the image can be computed like this.

>>> dataset.xy(dataset.height // 2, dataset.width // 2)
(476550.0, 4149150.0)

Creating data

Reading data is only half the story. Using Rasterio dataset objects, arrays of values can be written to a raster data file and thus shared with other GIS applications such as QGIS.

As an example, consider an array of floating point values representing, e.g., a temperature or pressure anomaly field measured or modeled on a regular grid, 240 columns by 180 rows. The first and last grid points on the horizontal axis are located at 4.0 degrees west and 4.0 degrees east longitude, the first and last grid points on the vertical axis are located at 3 degrees south and 3 degrees north latitude.

>>> import numpy as np
>>> x = np.linspace(-4.0, 4.0, 240)
>>> y = np.linspace(-3.0, 3.0, 180)
>>> X, Y = np.meshgrid(x, y)
>>> Z1 = np.exp(-2 * np.log(2) * ((X - 0.5) ** 2 + (Y - 0.5) ** 2) / 1 ** 2)
>>> Z2 = np.exp(-3 * np.log(2) * ((X + 0.5) ** 2 + (Y + 0.5) ** 2) / 2.5 ** 2)
>>> Z = 10.0 * (Z2 - Z1)

The fictional field for this example consists of the difference of two Gaussian distributions and is represented by the array Z. Its contours are shown below.

Opening a dataset in writing mode

To save this array along with georeferencing information to a new raster data file, call rasterio.open() with a path to the new file to be created, 'w' to specify writing mode, and several keyword arguments.

driver: the name of the desired format driver
width: the number of columns of the dataset
height: the number of rows of the dataset
count: a count of the dataset bands
dtype: the data type of the dataset
crs: a coordinate reference system identifier or description
transform: an affine transformation matrix, and
nodata: a “nodata” value

The first 5 of these keyword arguments parametrize fixed, format-specific properties of the data file and are required when opening a file to write. The last 3 are optional.

In this example the coordinate reference system will be '+proj=latlong', which describes an equirectangular coordinate reference system with units of decimal degrees. The proper affine transformation matrix can be computed from the matrix product of a translation and a scaling.

>>> from rasterio.transform import Affine
>>> res = (x[-1] - x[0]) / 240.0
>>> transform = Affine.translation(x[0] - res / 2, y[0] - res / 2) * Affine.scale(res, res)
>>> transform
Affine(0.033333333333333333, 0.0, -4.0166666666666666,
       0.0, 0.033333333333333333, -3.0166666666666666)

The upper left point in the example grid is at 3 degrees west and 2 degrees north. The raster pixel centered on this grid point extends res / 2, or 1/60 degrees, in each direction, hence the shift in the expression above.

A dataset for storing the example grid is opened like so

>>> new_dataset = rasterio.open(
...     '/tmp/new.tif',
...     'w',
...     driver='GTiff',
...     height=Z.shape[0],
...     width=Z.shape[1],
...     count=1,
...     dtype=Z.dtype,
...     crs='+proj=latlong',
...     transform=transform,
... )

Values for the height, width, and dtype keyword arguments are taken directly from attributes of the 2-D array, Z. Not all raster formats can support the 64-bit float values in Z, but the GeoTIFF format can.

Saving raster data

To copy the grid to the opened dataset, call the new dataset’s DatasetWriter.write() method with the grid and target band number as arguments.

>>> new_dataset.write(Z, 1)

Then call the DatasetWriter.close() method to sync data to disk and finish.

>>> new_dataset.close()

Because Rasterio’s dataset objects mimic Python’s file objects and implement Python’s context manager protocol, it is possible to do the following instead.

with rasterio.open(
    '/tmp/new.tif',
    'w',
    driver='GTiff',
    height=Z.shape[0],
    width=Z.shape[1],
    count=1,
    dtype=Z.dtype,
    crs='+proj=latlong',
    transform=transform,
) as dst:
    dst.write(Z, 1)

These are the basics of reading and writing raster data files. More features and examples are contained in the advanced topics section.