Virtual Filesystems

Rasterio uses GDAL’s virtual filesystem interface to access datasets on the web, in cloud storage, in archive files, and in Python objects. Rasterio maps familiar URI schemes to GDAL virtual filesystem handlers. For example, the https URI scheme maps to GDAL’s /vsicurl/. The file URI scheme maps to GDAL’s ordinary filesystem handler and is the default for dataset URIs that have no other scheme.

To access a dataset in a local ZIP file like the one in Rasterio’s test suite, preprend zip to the URI of the local file and add the interior path to the dataset after a ! character. For example:

with rasterio.open("zip+file://tests/data/files.zip!RGB.byte.tif") as src:
    print(src.shape)

# Printed:
# (718, 791)

Or use zip as shorthand for zip+file.

with rasterio.open("zip://tests/data/files.zip!RGB.byte.tif") as src:
    print(src.shape)

# Printed:
# (718, 791)

Similarly, datasets in ZIP files served on the web can be accessed by using zip+https.

with rasterio.open("zip+https://github.com/rasterio/rasterio/files/13675561/files.zip!RGB.byte.tif") as src:
    print(src.shape)

# Printed:
# (718, 791)

Tar and gzip archives can be accessed in the same manner by prepending with tar or gz instead of zip.

For compatibility with legacy systems and workflows or very niche use cases, Rasterio can also use GDAL’s VSI filenames.

with rasterio.open("/vsizip/vsicurl/https://github.com/rasterio/rasterio/files/13675561/files.zip/RGB.byte.tif") as src:
    print(src.shape)

# Printed:
# (718, 791)

The prefixes on which GDAL filesystem handlers are registered are considered by Rasterio to be an implementation detail. You shouldn’t need to think about them when using Rasterio. Use familiar and standard URIs instead, like elsewhere on the internet.

with rasterio.open("https://github.com/rasterio/rasterio/raw/main/tests/data/RGB.byte.tif") as src:
    print(src.shape)

# Printed:
# (718, 791)

AWS S3

This is an extra feature that must be installed by executing

pip install rasterio[s3]

After you have configured your AWS credentials as explained in the boto3 guide you can read metadata and imagery from TIFFs stored as S3 objects with no change to your code.

with rasterio.open("s3://landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF") as src:
    print(src.profile)

# Printed:
# {'blockxsize': 512,
#  'blockysize': 512,
#  'compress': 'deflate',
#  'count': 1,
#  'crs': {'init': u'epsg:32645'},
#  'driver': u'GTiff',
#  'dtype': 'uint16',
#  'height': 7791,
#  'interleave': 'band',
#  'nodata': None,
#  'tiled': True,
#  'transform': Affine(30.0, 0.0, 381885.0,
#        0.0, -30.0, 2512815.0),
#  'width': 7621}

Note

AWS pricing concerns While this feature can reduce latency by reading fewer bytes from S3 compared to downloading the entire TIFF and opening locally, it does make at least 3 GET requests to fetch a TIFF’s profile as shown above and likely many more to fetch all the imagery from the TIFF. Consult the AWS S3 pricing guidelines before deciding if aws.Session is for you.

Python file and filesystem openers

Datasets stored in proprietary systems or addressable only through protocols not directly supported by GDAL can be accessed using the opener keyword argument of rasterio.open. Here is an example of using fs_s3fs to access the dataset in sentinel-s2-l2a-cogs/45/C/VQ/2022/11/S2B_45CVQ_20221102_0_L2A/B01.tif from the sentinel-cogs AWS S3 bucket. Rasterio can access this without using the opener argument, but it makes a good usage example. Other custom openers would work in the same way.

import rasterio
from fs_s3fs import S3FS

fs = S3FS(
    bucket_name="sentinel-cogs",
    dir_path="sentinel-s2-l2a-cogs/45/C/VQ/2022/11/S2B_45CVQ_20221102_0_L2A",
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
)

with rasterio.open("B01.tif", opener=fs.open) as src:
    print(src.profile)

In this code AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are placeholders for the appropriate credentials.

Read and write access is supported, with some limitations. Only one opener at a time may be thus registered for a filename and access mode pair. Openers are unregistered when the dataset is closed or its context is exited. The other limitation is that auxiliary and sidecar files cannot be accessed and thus formats depending on them cannot be used in this way.

To gain support for auxiliary “sidecar” files such as .aux.xml and .msk files that may accompany GeoTIFFs, an fsspec-like filesystem object may be used as the opener.

import rasterio
from fsspec

fs = fsspec.filesystem("s3", anon=True)

with rasterio.open(
    "sentinel-cogs/sentinel-s2-l2a-cogs/45/C/VQ/2022/11/S2B_45CVQ_20221102_0_L2A/B01.tif",
    opener=fs
) as src:
    print(src.profile)

This kind of filesystem opener object must provide the following methods: isdir(), isfile(), ls(), mtime(), open(), and size().

New in version 1.4.0