Contributing

Welcome to the Rasterio project. Here’s how we work.

Code of Conduct

First of all: the Rasterio project has a code of conduct. Please read the CODE_OF_CONDUCT.txt file; it’s important to all of us.

Rights

The BSD license (see LICENSE.txt) applies to all contributions.

Issue Conventions

The Rasterio issue tracker is for actionable issues.

Questions about installation, distribution, and usage should be taken to the project’s general discussion group. Opened issues which fall into one of these three categories may be perfunctorily closed.

Questions about development of Rasterio, brainstorming, requests for comment, and not-yet-actionable proposals are welcome in the project’s developers discussion group. Issues opened in Rasterio’s GitHub repo which haven’t been socialized there may be perfunctorily closed.

Rasterio is a relatively new project and highly active. We have bugs, both known and unknown.

Please search existing issues, open and closed, before creating a new one.

Rasterio employs C extension modules, so bug reports very often hinge on the following details:

  • Operating system type and version (Windows? Ubuntu 20.04? 18.04?)

  • The version and source of Rasterio (PyPI, Anaconda, or somewhere else?)

  • The version and source of GDAL (UbuntuGIS? Homebrew?)

Please provide these details as well as tracebacks and relevant logs. When using the $ rio CLI, logging can be enabled with $ rio -v and verbosity can be increased with -vvv. Short scripts and datasets demonstrating the issue are especially helpful!
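The details listed above can be gathered with a few lines of Python. The following is only a sketch: the function name environment_report is made up for illustration, and it assumes that rasterio exposes __version__ and __gdal_version__ attributes (the latter is read defensively). It cannot tell where Rasterio or GDAL were installed from, so please still state the source by hand.

```python
import platform
import sys


def environment_report():
    """Collect the details that bug reports usually hinge on."""
    report = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    try:
        import rasterio
    except ImportError:
        # Rasterio is not importable; record that instead of failing.
        report["rasterio"] = "not installed"
        report["gdal"] = "unknown"
    else:
        report["rasterio"] = rasterio.__version__
        # Read defensively in case a build lacks this attribute.
        report["gdal"] = getattr(rasterio, "__gdal_version__", "unknown")
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into an issue, next to the traceback, covers most of the list above.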

Design Principles

Rasterio’s API is both similar to and different from GDAL’s API and this is intentional.

  • Rasterio is a library for reading and writing raster datasets. Rasterio uses GDAL but is not a “Python binding for GDAL.”

  • Rasterio aims to hide, or at least contain, the complexity of GDAL.

  • Rasterio always prefers Python’s built-in protocols and types or Numpy protocols and types over concepts from GDAL’s data model.

  • Rasterio keeps I/O separate from other operations. rasterio.open() is the only library function that operates on filenames and URIs. dataset.read(), dataset.write(), and their mask counterparts are the methods that perform I/O.

  • Rasterio methods and functions should be free of side-effects and hidden inputs. This is challenging in practice because GDAL embraces global variables.

  • Rasterio leans on analogies to other familiar Python APIs.
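As a rough illustration of keeping I/O separate from other operations, a helper can accept an already opened dataset object instead of a filename. This is a sketch, not code from Rasterio: the function describe and the file name example.tif are hypothetical, and a simple stand-in object replaces a real dataset so the example runs without GDAL. It relies only on the count, width, height, and dtypes attributes of dataset objects.

```python
from types import SimpleNamespace


def describe(dataset):
    """Summarize an already opened dataset object without doing any I/O.

    Only the attributes count, width, height, and dtypes are used, so
    the function never needs to know where the data came from.
    """
    return {
        "bands": dataset.count,
        "shape": (dataset.height, dataset.width),
        "dtypes": tuple(dataset.dtypes),
    }


# Stand-in for a dataset object so the sketch runs without GDAL.
# With Rasterio installed, the caller would instead write:
#     with rasterio.open("example.tif") as dataset:
#         summary = describe(dataset)
fake = SimpleNamespace(count=3, width=512, height=256, dtypes=("uint8",) * 3)
summary = describe(fake)
print(summary)
```

Because the helper never touches a path, opening the data stays in one place, the call to rasterio.open(), and the helper is easy to test.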

Dataset Objects

Our term for the kind of object that allows read and write access to raster data is dataset object. A dataset object might be an instance of DatasetReader or DatasetWriter. The canonical way to create a dataset object is by using the rasterio.open() function.

This is analogous to Python’s use of file object.

Path Objects

A path object specifies the name and address of a dataset within some space (filesystem, internet, cloud) along with optional parameters. The first positional argument of rasterio.open() is a path. Some path objects also have an open method which can be used to create a dataset object.

Band Objects

Unlike GDAL’s original data model, rasterio has no band objects. In this way it’s more like GDAL’s multi-dimensional API. A dataset’s read() method returns N-D arrays.

GDAL Context

GDAL depends on some global context: a format driver registry, dataset connection pool, a raster block cache, a file header cache. Rasterio depends on this, too, but unlike GDAL’s official Python bindings, delays initializing this context as long as possible and abstracts it with the help of a Python context manager.

Git Conventions

We use a variant of the centralized workflow described in the Git Book. Since Rasterio 1.0 we tag and release versions of the form x.y.z from the maint-x.y branch.
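The relationship between a release version and its maintenance branch can be written down in a few lines. The helper below is hypothetical and is not part of Rasterio’s tooling; it assumes plain x.y.z versions and rejects anything else, including pre-release suffixes.

```python
import re

# Matches plain x.y.z versions only.
_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def maintenance_branch(version):
    """Return the maint-x.y branch name for an x.y.z version string."""
    match = _VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"not an x.y.z version: {version!r}")
    major, minor, _patch = match.groups()
    return f"maint-{major}.{minor}"


print(maintenance_branch("1.2.3"))
```

Under this convention every patch release in the same x.y series is tagged from the same branch.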

Work on features in a new branch of the rasterio/rasterio repo or in a branch on a fork. Create a GitHub pull request when the changes are ready for review. We recommend creating a pull request as early as possible to give other developers a heads up and to provide an opportunity for valuable early feedback.

Conventions

The rasterio namespace contains both Python and C extension modules. All C extension modules are written using Cython. The Cython language is a superset of Python. Cython files end with .pyx and .pxd and are where we keep all the code that calls GDAL’s C functions.

Rasterio works with Python versions 3.6 through 3.9.

We strongly prefer code adhering to PEP8.

Tests are mandatory for new code. We use pytest. Use pytest’s parameterization feature.
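A parameterized test might look like the sketch below. The function under test, parse_version, is a hypothetical helper defined only for this example; the decorator is pytest’s pytest.mark.parametrize.

```python
import pytest


def parse_version(text):
    """Hypothetical helper: split "x.y.z" into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


@pytest.mark.parametrize(
    "text, expected",
    [
        ("1.0.0", (1, 0, 0)),
        ("1.2.10", (1, 2, 10)),
        ("10.20.30", (10, 20, 30)),
    ],
)
def test_parse_version(text, expected):
    # pytest runs this function once per (text, expected) pair.
    assert parse_version(text) == expected
```

Each pair is reported as a separate test case, so a failure points at the exact input that broke.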

We aspire to 100% coverage for Python modules but coverage of the Cython code is a future aspiration (#515).

Use darker to reformat code as you change it. We aren’t going to run black on everything all at once.

Type hints are welcome as a part of refactoring work or new feature development. We aren’t going to make a large initiative about adding hints to everything.

Changes should be noted in CHANGES.txt. New entries go above older entries.

New Containerized Development Environment

Rasterio has a new Dockerfile that can be used to create images and containers for exploring or testing the package.

The command make dockertest will build a Docker image based on one of the official GDAL images, start a container that mounts the working directory, and run python setup.py develop && python -m pytest in the container.

Historical Development Environment

If you prefer not to use the new development environment you may install rasterio’s dependencies directly onto your computer.

Developing Rasterio requires Python 3.6 or any later final release, up to and including 3.10. We prefer developing with the most recent version of Python but recognize this is not possible for all contributors. A C compiler is also required to leverage existing protocols for extending Python with C or C++. See the Windows install instructions in the readme for more information about building on Windows.

Initial Setup

First, clone Rasterio’s git repo:

$ git clone https://github.com/rasterio/rasterio

Development should occur within a virtual environment to better isolate development work from custom environments.

In some cases installing a library with an accompanying executable inside a virtual environment causes the shell to initially look outside the environment for the executable. If this occurs try deactivating and reactivating the environment.

Installing GDAL

The GDAL library and its headers are required to build Rasterio. We do not currently have guidance for any platforms other than Linux and OS X.

On Linux, GDAL and its headers should be available through your distro’s package manager. For Ubuntu the commands are:

$ sudo add-apt-repository ppa:ubuntugis/ppa
$ sudo apt-get update
$ sudo apt-get install gdal-bin libgdal-dev

On OS X, Homebrew is a reliable way to get GDAL.

$ brew install gdal

Python build requirements

Provision a virtualenv with Rasterio’s build requirements. Rasterio’s setup.py script will not run unless Cython and Numpy are installed, so do this first from the Rasterio repo directory.

Linux users may need to install some additional Numpy dependencies:

$ sudo apt-get install libatlas-dev libatlas-base-dev gfortran

then:

$ pip install -U pip
$ pip install -r requirements-dev.txt

Installing Rasterio

Rasterio, its Cython extensions, normal dependencies, and dev dependencies can be installed with $ pip. Installing Rasterio in editable mode while developing is very convenient but only affects the Python files. Specifying the [test] extra in the command below tells $ pip to also install Rasterio’s dev dependencies.

$ pip install -e .[test]

Any time a Cython (.pyx or .pxd) file is edited the extension modules need to be recompiled, which is most easily achieved with:

$ pip install -e .

When switching between Python versions the extension modules must be recompiled, which can be forced with $ touch rasterio/*.pyx and then re-installing with the command above. If this is not done, an error is raised claiming that an object has the wrong size and suggesting that you try recompiling.

The dependencies required to build the docs can be installed with:

$ pip install -e .[docs]

Running the tests

Rasterio’s tests live in the tests/ directory and generally match the main package layout.

Note: rasterio must be installed in editable mode in order to run tests.

To run the entire suite and the code coverage report:

$ python -m pytest --cov rasterio --cov-report term-missing

A single test file:

$ python -m pytest tests/test_band.py

A single test:

$ python -m pytest tests/test_band.py::test_band

Additional Information

More technical information lives on the wiki.

The long-term goal is to consolidate that information into this document.