
How to recognize a COG and how to create a proper one!

Requirements

  • python 3.7
  • rio-cogeo

$ pip install rio-cogeo

The COG Specification is pretty basic

A cloud optimized GeoTIFF is a regular GeoTIFF file, aimed at being hosted on a HTTP file server, whose internal organization is friendly for consumption by clients issuing HTTP GET range request ("bytes: start_offset-end_offset" HTTP header). It contains at its beginning the metadata of the full resolution imagery, followed by the optional presence of overview metadata, and finally the imagery itself. To make it friendly with streaming and progressive rendering, we recommend starting with the imagery of the smallest overview and finishing with the imagery of the full resolution level.

Ref: github.com/cogeotiff/cog-spec/blob/master/spec.md
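
To make the "HTTP GET range request" part concrete, here is a minimal sketch of what a client might prepare before reading a COG header. On the wire, the header is written `Range: bytes=start-end`. The URL is hypothetical and the 16 KiB size is an arbitrary guess at how much to fetch first; the request is only built, never sent.

```python
from urllib.request import Request


def range_request(url: str, start: int, end: int) -> Request:
    """Build (but do not send) a GET request for bytes start..end, inclusive."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return Request(url, headers={"Range": f"bytes={start}-{end}"})


# Hypothetical URL; 16 KiB is an assumed, not measured, header size.
req = range_request("https://example.com/HYP_50M_SR_COG.tif", 0, 16383)
print(req.get_method(), req.get_header("Range"))  # prints: GET bytes=0-16383
```

A server that supports range requests would answer with `206 Partial Content` and only the requested bytes, which is why the ordering of metadata inside the file matters.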

In short, the specification means you MUST create a GeoTIFF with internal blocks (tiles), and the file must be ordered so that the metadata comes before the imagery.

From a command-line point of view, the tiling part means adding -co TILED=YES to a gdal_translate command.

1. Get some data

The Natural Earth website hosts really neat raster and vector datasets. Let's download a large-scale raster image: www.naturalearthdata.com/downloads/50m-raster-data/50m-cross-blend-hypso/

$ wget https://naciscdn.org/naturalearth/50m/raster/HYP_50M_SR.zip
$ unzip HYP_50M_SR.zip

2. Inspect the data

Here is what we want to look at:

  • the size in pixels (width x height)
  • the data type (byte, float, complex …)
  • the internal block size
  • the presence of overviews or not

$ rio cogeo info HYP_50M_SR.tif
Driver: GTiff
File: /Users/vincentsarago/Downloads/HYP_50M_SR/HYP_50M_SR.tif
Compression: None
ColorSpace: None

Profile
    Width:            10800
    Height:           5400
    Bands:            3
    Tiled:            False
    Dtype:            uint8
    NoData:           None
    Alpha Band:       False
    Internal Mask:    False
    Interleave:       PIXEL
    ColorMap:         False

Geo
    Crs:              EPSG:4326
    Origin:           (-179.99999999999997, 90.0)
    Resolution:       (0.03333333333333, -0.03333333333333)
    BoundingBox:      (-179.99999999999997, -89.99999999998201, 179.99999999996405, 90.0)

IFD
    Id      Size           BlockSize     Decimation
    0       10800x5400     10800x1       0

What we can see from the rio cogeo info output:

  • The raster has 3 bands
  • The data type is Byte (0 → 255)
  • It's not internally tiled (Tiled: False and BlockSize=10800x1, i.e. one strip per image row)
  • There is no overview (Only one IFD)

With this information we already know the GeoTIFF is not a COG (no internal blocks), but let's confirm with the validation script.
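
To see why the 10800x1 block size matters, here is a rough back-of-the-envelope sketch. It counts how many blocks a reader has to fetch to get a 512x512 window at the top-left corner of the image. It assumes uncompressed, pixel-interleaved uint8 data with 3 bands (as reported above); the function names are made up for this example.

```python
def blocks_touched(x0, y0, win_w, win_h, block_w, block_h):
    """Number of internal blocks intersecting a pixel window."""
    cols = (x0 + win_w - 1) // block_w - x0 // block_w + 1
    rows = (y0 + win_h - 1) // block_h - y0 // block_h + 1
    return cols * rows


def uncompressed_bytes(n_blocks, block_w, block_h, bands=3, itemsize=1):
    """Bytes to read if every touched block is fetched in full."""
    return n_blocks * block_w * block_h * bands * itemsize


# Striped layout of the original file: one 10800x1 block per row.
striped = uncompressed_bytes(blocks_touched(0, 0, 512, 512, 10800, 1), 10800, 1)
# Tiled layout: 512x512 blocks.
tiled = uncompressed_bytes(blocks_touched(0, 0, 512, 512, 512, 512), 512, 512)

print(striped, tiled)  # prints: 16588800 786432
```

Under these assumptions the striped file needs 512 blocks (about 16.6 MB) for the same window that a single 512x512 tile (about 0.8 MB) covers.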

3. COG validation

$ rio cogeo validate HYP_50M_SR.tif
The following warnings were found:
- The file is greater than 512xH or 512xW, it is recommended to include internal overviews

The following errors were found:
- The file is greater than 512xH or 512xW, but is not tiled
- The offset of the main IFD should be 8 for ClassicTIFF or 16 for BigTIFF. It is 174982088 instead
- The offset of the first block of the image should be after its IFD
/Users/vincentsarago/Downloads/HYP_50M_SR/HYP_50M_SR.tif is NOT a valid cloud optimized GeoTIFF

As mentioned earlier, the validation script confirms the GeoTIFF is not internally tiled and doesn't have overviews.
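
The "offset of the main IFD" error can be checked by hand, because that offset is stored in the first bytes of any TIFF. The sketch below parses a ClassicTIFF or BigTIFF header; it is an illustration of the file format, not how rio-cogeo implements its validation. The headers used here are synthetic bytes built for the example, not read from the real file.

```python
import struct
from typing import Tuple


def first_ifd_offset(header: bytes) -> Tuple[str, int]:
    """Return (flavour, offset of the first IFD) from the first 16 bytes of a TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(header[:2])
    if order is None:
        raise ValueError("not a TIFF: bad byte-order mark")
    (magic,) = struct.unpack(order + "H", header[2:4])
    if magic == 42:  # ClassicTIFF: 4-byte offset stored at byte 4
        (offset,) = struct.unpack(order + "I", header[4:8])
        return "ClassicTIFF", offset
    if magic == 43:  # BigTIFF: 8-byte offset stored at byte 8
        (offset,) = struct.unpack(order + "Q", header[8:16])
        return "BigTIFF", offset
    raise ValueError("not a TIFF: unknown magic number")


# Synthetic little-endian headers (assumed values, for illustration only).
cog_like = struct.pack("<2sHI", b"II", 42, 8)
strip_like = struct.pack("<2sHI", b"II", 42, 174982088)

print(first_ifd_offset(cog_like))    # prints: ('ClassicTIFF', 8)
print(first_ifd_offset(strip_like))  # prints: ('ClassicTIFF', 174982088)
```

To try it on a real file, pass the result of `open(path, "rb").read(16)`. An offset of 174982088 means the IFD sits at the very end of the file, after all the imagery, which is the opposite of what a COG reader expects.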

4. COG creation

Creating a valid Cloud Optimized GeoTIFF is not just about creating internal tiles and/or internal overviews. The internal structure of the file has to follow a specific order, which requires a complete copy of the file. This is what rio-cogeo does internally.

$ rio cogeo create HYP_50M_SR.tif HYP_50M_SR_COG.tif
Reading input: /Users/vincentsarago/Downloads/HYP_50M_SR/HYP_50M_SR.tif
    [####################################]  100%
Adding overviews...
Updating dataset tags...
Writing output to: /Users/vincentsarago/Downloads/HYP_50M_SR/HYP_50M_SR_COG.tif

You could get a similar COG with plain GDAL commands:

$ gdal_translate HYP_50M_SR.tif tmp.tif -co TILED=YES -co COMPRESS=DEFLATE
$ gdaladdo -r nearest tmp.tif 2 4 8 16 32
$ gdal_translate tmp.tif HYP_50M_SR_COG.tif -co TILED=YES -co COMPRESS=DEFLATE -co COPY_SRC_OVERVIEWS=YES

By default, rio-cogeo creates a COG with a 512x512 block size (for the full-resolution image) and uses DEFLATE compression to reduce the file size.

$ rio cogeo info HYP_50M_SR_COG.tif
Driver: GTiff
File: /Users/vincentsarago/Downloads/HYP_50M_SR/HYP_50M_SR_COG.tif
Compression: DEFLATE
ColorSpace: None

Profile
    Width:            10800
    Height:           5400
    Bands:            3
    Tiled:            True
    Dtype:            uint8
    NoData:           None
    Alpha Band:       False
    Internal Mask:    False
    Interleave:       PIXEL
    ColorMap:         False

Geo
    Crs:              EPSG:4326
    Origin:           (-179.99999999999997, 90.0)
    Resolution:       (0.03333333333333001, -0.03333333333333001)
    BoundingBox:      (-179.99999999999997, -89.99999999998204, 179.9999999999641, 90.0)

IFD
    Id      Size           BlockSize     Decimation
    0       10800x5400     512x512       0
    1       5400x2700      128x128       2
    2       2700x1350      128x128       4
    3       1350x675       128x128       8
    4       675x338        128x128       16
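
The Size column of the IFD table follows directly from the Decimation column. As a small sketch, dividing the full-resolution size by each decimation factor and rounding up reproduces the table (the rounding-up rule is inferred from the 675x338 row, and the function name is made up for this example).

```python
def overview_sizes(width, height, decimations):
    """Overview dimensions for each decimation factor, rounding up."""
    return [((width + d - 1) // d, (height + d - 1) // d) for d in decimations]


print(overview_sizes(10800, 5400, [2, 4, 8, 16]))
# prints: [(5400, 2700), (2700, 1350), (1350, 675), (675, 338)]
```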

The importance of the compression

$ ls -lah
-rw-r--r--@  1 youpi  staff   167M Oct 18  2014 HYP_50M_SR.tif
-rw-r--r--   1 youpi  staff    58M Jun 12 14:56 HYP_50M_SR_COG.tif

By using rio-cogeo, we are not only creating a valid COG with internal tiling, we are also adding internal overviews (which let us get previews of the full-resolution data with only a few GET requests).

Even with the addition of 4 overview levels (see the IFD section in the previous rio cogeo info output), we reduced the file size by a factor of about 3 (167 MB → 58 MB), because rio cogeo applies DEFLATE compression to the COG by default.

More magic?

As seen in the first rio cogeo info output, the data has 3 bands (RGB) and is of uint8 data type. With this configuration, we can use even more efficient compression schemes such as JPEG or WEBP.

$ rio cogeo create HYP_50M_SR.tif HYP_50M_SR_COG_jpeg.tif -p jpeg
Reading input: /Users/vincentsarago/Downloads/HYP_50M_SR/HYP_50M_SR.tif
    [####################################]  100%
Adding overviews...
Updating dataset tags...
Writing output to: /Users/vincentsarago/Downloads/HYP_50M_SR/HYP_50M_SR_COG_jpeg.tif

$ ls -lah
-rw-r--r--@  1 vincentsarago  staff   167M Oct 18  2014 HYP_50M_SR.tif
-rw-r--r--   1 vincentsarago  staff    58M Jun 12 14:56 HYP_50M_SR_COG.tif
-rw-r--r--   1 vincentsarago  staff   4.8M Jun 15 11:08 HYP_50M_SR_COG_jpeg.tif

Now our output file is only 4.8 MB, which is only ~3% of the original size 😱.
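
For the record, the arithmetic behind those numbers can be sketched in a few lines. The sizes are copied from the ls output above; the helper names are made up for this example.

```python
# Sizes in MB, as reported by `ls -lah` above.
sizes_mb = {"original": 167.0, "deflate": 58.0, "jpeg": 4.8}


def shrink_factor(name):
    """How many times smaller than the original a file is."""
    return sizes_mb["original"] / sizes_mb[name]


def percent_of_original(name):
    """Size of a file as a percentage of the original."""
    return 100 * sizes_mb[name] / sizes_mb["original"]


print(round(shrink_factor("deflate"), 1))     # prints: 2.9
print(round(percent_of_original("jpeg"), 1))  # prints: 2.9
```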

Note:

  • JPEG compression is lossy, not lossless, meaning we will lose some information (pixel values change). If you only need a COG for visual purposes, the gain in size might be worth it.
  • WEBP compression has a configuration option to be lossless and will result in a file which is ~50% smaller than the DEFLATE version. Sadly, WEBP support is not included by default in all geospatial software.
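
To illustrate the lossless versus lossy distinction, here is a toy sketch using Python's zlib module, which implements DEFLATE. The "pixels" are a synthetic ramp, not real imagery, and the lossy part is a crude bit-dropping stand-in, NOT actual JPEG: it only shows that values change and cannot be recovered.

```python
import zlib

# Synthetic 8-bit values: a smooth ramp standing in for one image row.
pixels = bytes((i // 4) % 256 for i in range(4096))

# Lossless: DEFLATE round trip gives back exactly the same bytes.
packed = zlib.compress(pixels, 6)
restored = zlib.decompress(packed)
print(restored == pixels, len(packed) < len(pixels))  # prints: True True

# Lossy stand-in (not JPEG): drop the 3 lowest bits of every value.
quantized = bytes(p & 0b11111000 for p in pixels)
max_error = max(abs(a - b) for a, b in zip(pixels, quantized))
print(max_error)  # prints: 7
```

The DEFLATE output is smaller and fully reversible, while the quantized values differ from the originals by up to 7 and the original values are gone for good.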

5. Visualize

You can either load the COG in QGIS or use our plugin (rio-viz) to load it in a web browser.

$ pip install rio-viz
$ rio viz HYP_50M_SR_COG.tif