Virtual Filesystems
Rasterio uses GDAL’s virtual filesystem interface to access datasets
on the web, in cloud storage, in archive files, and in Python objects. Rasterio maps familiar URI schemes to GDAL virtual filesystem handlers. For example, the https
URI scheme maps to GDAL’s /vsicurl/
. The file
URI scheme maps to GDAL’s ordinary filesystem handler and is the default for dataset URIs that have no other scheme.
To access a dataset in a local ZIP file like the one in Rasterio’s test suite, preprend zip
to the URI of the local file and add the interior path to the dataset after a !
character. For example:
with rasterio.open("zip+file://tests/data/files.zip!RGB.byte.tif") as src:
print(src.shape)
# Printed:
# (718, 791)
Or use zip
as shorthand for zip+file
.
with rasterio.open("zip://tests/data/files.zip!RGB.byte.tif") as src:
print(src.shape)
# Printed:
# (718, 791)
Similarly, datasets in ZIP files served on the web can be accessed by using zip+https
.
with rasterio.open("zip+https://github.com/rasterio/rasterio/files/13675561/files.zip!RGB.byte.tif") as src:
print(src.shape)
# Printed:
# (718, 791)
Tar and gzip archives can be accessed in the same manner by prepending with tar
or gz
instead of zip
.
For compatibility with legacy systems and workflows or very niche use cases, Rasterio can also use GDAL’s VSI filenames.
with rasterio.open("/vsizip/vsicurl/https://github.com/rasterio/rasterio/files/13675561/files.zip/RGB.byte.tif") as src:
print(src.shape)
# Printed:
# (718, 791)
The prefixes on which GDAL filesystem handlers are registered are considered by Rasterio to be an implementation detail. You shouldn’t need to think about them when using Rasterio. Use familiar and standard URIs instead, like elsewhere on the internet.
with rasterio.open("https://github.com/rasterio/rasterio/raw/main/tests/data/RGB.byte.tif") as src:
print(src.shape)
# Printed:
# (718, 791)
AWS S3
Note
Requires GDAL 2.1.0
This is an extra feature that must be installed by executing
pip install rasterio[s3]
After you have configured your AWS credentials as explained in the boto3 guide you can read metadata and imagery from TIFFs stored as S3 objects with no change to your code.
with rasterio.open("s3://landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF") as src:
print(src.profile)
# Printed:
# {'blockxsize': 512,
# 'blockysize': 512,
# 'compress': 'deflate',
# 'count': 1,
# 'crs': {'init': u'epsg:32645'},
# 'driver': u'GTiff',
# 'dtype': 'uint16',
# 'height': 7791,
# 'interleave': 'band',
# 'nodata': None,
# 'tiled': True,
# 'transform': Affine(30.0, 0.0, 381885.0,
# 0.0, -30.0, 2512815.0),
# 'width': 7621}
Note
AWS pricing concerns While this feature can reduce latency by reading fewer bytes from S3 compared to downloading the entire TIFF and opening locally, it does make at least 3 GET requests to fetch a TIFF’s profile as shown above and likely many more to fetch all the imagery from the TIFF. Consult the AWS S3 pricing guidelines before deciding if aws.Session is for you.