Statistics

Form `Readers`¶

rio-tiler's Readers provide simple .statistics method to retrieve dataset global statistics

with Reader("my.tif") as src:
    stats = src.statistics()

# Statistics result is in form of Dict[str, rio_tiler.models.BandStatistics]
print(stats.keys())
>>> ["b1", "b2", "b3"]

# rio_tiler.models.BandStatistics is a pydantic model
print(stats["1"].model_dump().keys())
[
    "min",
    "max",
    "mean",
    "count",
    "sum",
    "std",
    "median",
    "majority",
    "minority",
    "unique",
    "histogram",
    "valid_percent",
    "masked_pixels",
    "valid_pixels",
    # Percentile entries depend on user inputs
    "percentile_2",
    "percentile_98",
]

ImageData¶

You can get statistics from ImageData objects which are returned by all rio-tiler reader methods (e.g. .tile(), .preview(), .part(), ...)

with Reader("cog.tif") as src:
    image = src.preview()
    stats = image.statistics()

Area Weighted Statistics¶

When getting statistics from a feature, you may want to calculate values from the pixels which intersect with the geometry but also take the pixel intersection percentage into account. Starting with rio-tiler 6.2.0, we've added a coverage option to the statistics utility which enable the user to pass an array representing the coverage percentage such as:

import numpy as np
from rio_tiler.utils import get_array_statistics

# Data Array
# 1, 2
# 3, 4
data = np.ma.array((1, 2, 3, 4)).reshape((1, 2, 2))

# Coverage Array
# 0.5, 0
# 1, 0.25
coverage = np.array((0.5, 0, 1, 0.25)).reshape((2, 2))

stats = get_array_statistics(data, coverage=coverage)
assert len(stats) == 1
assert stats[0]["min"] == 1
assert stats[0]["max"] == 4
assert (
    round(stats[0]["mean"], 4) == 2.5714
)  # sum of weighted array / sum of weights | 4.5 / 1.75 = 2.57
assert stats[0]["count"] == 1.75

Adjusting geometry `align_bounds_with_dataset=True`¶

In rio-tiler 6.3,0 a new option has been introduced to reduce artifacts and produce more precise zonal statistics. This option is available in the low-level reader.part() method used in rio-tiler reader's .feature() and .part() methods.

with Reader("cog.tif") as src:
    data = src.feature(
        shape,
        shape_crs=WGS84_CRS,
        align_bounds_with_dataset=True,
    )

    coverage_array = data.get_coverage_array(
        shape,
        shape_crs=WGS84_CRS,
    )

    stats = data.statistics(coverage=coverage_array)

When passing align_bounds_with_dataset=True to the reader.part() method (forwarded from .feature or .part reader methods), rio-tiler will adjust the input geometry bounds to match the input dataset resolution/transform and avoid unnecessary resampling.

Zonal Statistics method¶

You can easily extend the rio-tiler's reader to add a .zonal_statistics() method as:

import attr
from typing import Any, Union, Optional, List, Dict

from rio_tiler import io
from rio_tiler.models import BandStatistics
from rio_tiler.constants import WGS84_CRS

from geojson_pydantic.features import Feature, FeatureCollection
from geojson_pydantic.geometries import Polygon

class Reader(io.Reader):
    """Custom Reader with zonal_statistics method."""

    def zonal_statistics(
        self,
        geojson: Union[FeatureCollection, Feature],
        categorical: bool = False,
        categories: Optional[List[float]] = None,
        percentiles: Optional[List[int]] = None,
        hist_options: Optional[Dict] = None,
        max_size: int = None,
        **kwargs: Any,
    ) -> Union[FeatureCollection, Feature]:
        """Return statistics from GeoJSON features.

        Args:
            geojson (Feature or FeatureCollection): a GeoJSON Feature or FeatureCollection.
            categorical (bool): treat input data as categorical data. Defaults to False.
            categories (list of numbers, optional): list of categories to return value for.
            percentiles (list of numbers, optional): list of percentile values to calculate. Defaults to `[2, 98]`.
            hist_options (dict, optional): Options to forward to numpy.histogram function.
            max_size (int, optional): Limit the size of the longest dimension of the dataset read, respecting bounds X/Y aspect ratio. Defaults to None.
            kwargs (optional): Options to forward to `self.preview`.

        Returns:
            Feature or FeatureCollection

        """
        kwargs = {**self.options, **kwargs}

        hist_options = hist_options or {}

        fc = geojson
        # We transform the input Feature to a FeatureCollection
        if isinstance(fc, Feature):
            fc = FeatureCollection(type="FeatureCollection", features=[geojson])

        for feature in fc:
            geom = feature.model_dump(exclude_none=True)

            # Get data overlapping with the feature (using Reader.feature method)
            data = self.feature(
                geom,
                shape_crs=WGS84_CRS,
                align_bounds_with_dataset=True,
                max_size=max_size,
                **kwargs,
            )
            coverage_array = data.get_coverage_array(
                geom,
                shape_crs=WGS84_CRS,
            )

            stats = data.statistics(
                categorical=categorical,
                categories=categories,
                percentiles=percentiles,
                hist_options=hist_options,
                coverage=coverage_array,
            )

            # Update input feature properties and add the statistics
            feature.properties = feature.properties or {}
            feature.properties.update({"statistics": stats})

        return fc.features[0] if isinstance(geojson, Feature) else fc