Cloud Optimized Geotiff Part 1


A short walkthrough to create and use Cloud Optimized Geotiff

Part 1: get up and running with gdal, python and some testdata

In this post, I’d like to show how to install all needed tools you need to create cog-files. As example data I use grib files as they are standards in meteorology. But unfortunately, this format is in many ways pure pain in your butt. I like to test if converting this format to a more modern format allows as to better handle large formats of rasterized data.

The tool-of-choice to handle rasterdata is gdal. eccodes is no match 😃

As we also want to build the gdal python bindings we need to install python-dev

sudo apt-get install python3-dev

NOTE: I strongly recommend using a virtual environment for the python bindings!

Download, extract and configure gdal installation

wget https://github.com/OSGeo/gdal/releases/download/v3.4.0/gdal-3.4.0.tar.gz
tar xvf gdal-3.4.0.tar.gz
cd gdal-3.4.0/
./configure --with-python

Build and install gdal

NOTE: Adjust the -j parameter to the number of cores of your system to speed up compile time!

make -j 8
make install (as root with activated venv)

NOTE: sudo make install resulted in python errors….

Install gdal-python in your venv

pip install gdal

Verify installation

(gdal)  difu@Diggler ~/Programming/cogTests> python
Python 3.8.10 (default, Sep 28 2021, 16:10:42)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from osgeo import gdal
>>>

My test dataset is taken from DWDs Open-Data portal. I use the COSMO_REA6 dataset. Download for example the first file and decompress:

wget https://opendata.dwd.de/climate_environment/REA/COSMO_REA6/hourly/2D/T_2M/T_2M.2D.199501.grb.bz2
bunzip2 T_2M.2D.199501.grb.bz2

You get an about 1G grib file.

(gdal)  difu@Diggler ~/Programming/cogTests > ls -lh T_2M.2D.199501.grb
-rw-r--r-- 1 difu difu 992M Jul 23  2019 T_2M.2D.199501.grb

This is hourly data of the temperature at 2 meters above ground of one month, here of January 1995. So this is 24h*31=744 hours of data. Let’s verify that:

(gdal)  difu@Diggler  ~/Programming/cogTests > gdalinfo T_2M.2D.199501.grb
Driver: GRIB/GRIdded Binary (.grb, .grb2)
Files: T_2M.2D.199501.grb
Size is 848, 824
Coordinate System is:
GEOGCRS["Coordinate System imported from GRIB file",
    BASEGEOGCRS["Coordinate System imported from GRIB file",
        DATUM["unnamed",
            ELLIPSOID["Sphere",6367470,0,
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]],
    DERIVINGCONVERSION["Pole rotation (GRIB convention)",
        METHOD["Pole rotation (GRIB convention)"],
        PARAMETER["Latitude of the southern pole (GRIB convention)",-39.25,
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        PARAMETER["Longitude of the southern pole (GRIB convention)",18,
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        PARAMETER["Axis rotation (GRIB convention)",0,
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]]
Data axis to CRS axis mapping: 2,1
Origin = (-28.430500590318776,21.890500607533415)
Pixel Size = (0.055001180637544,-0.055001215066829)
Corner Coordinates:
Upper Left  ( -28.4305006,  21.8905006) ( 44d44'46.43"W, 60d12'10.20"N)
Lower Left  ( -28.4305006, -23.4305006) ( 10d 5'25.85"W, 21d54'59.99"N)
Upper Right (  18.2105006,  21.8905006) ( 65d 9'11.38"E, 66d42' 8.34"N)
Lower Right (  18.2105006, -23.4305006) ( 36d27' 6.04"E, 25d 2'49.86"N)
Center      (  -5.1100000,  -0.7700000) ( 10d 5' 3.84"E, 49d42'23.66"N)
Band 1 Block=848x1 Type=Float64, ColorInterp=Undefined
  Description = 2[m] HTGL (Specified height level above ground)
  Metadata:
    GRIB_COMMENT=temperature [C]
    GRIB_ELEMENT=T
    GRIB_FORECAST_SECONDS=0
    GRIB_REF_TIME=788922000
    GRIB_SHORT_NAME=2-HTGL
    GRIB_UNIT=[C]
    GRIB_VALID_TIME=788922000
Band 2 Block=848x1 Type=Float64, ColorInterp=Undefined
  Description = 2[m] HTGL (Specified height level above ground)
  Metadata:
    GRIB_COMMENT=temperature [C]
    GRIB_ELEMENT=T
    GRIB_FORECAST_SECONDS=0
    GRIB_REF_TIME=788925600
...

...
Band 744 Block=848x1 Type=Float64, ColorInterp=Undefined
  Description = 2[m] HTGL (Specified height level above ground)
  Metadata:
    GRIB_COMMENT=temperature [C]
    GRIB_ELEMENT=T
    GRIB_FORECAST_SECONDS=0
    GRIB_REF_TIME=791596800
    GRIB_SHORT_NAME=2-HTGL
    GRIB_UNIT=[C]
    GRIB_VALID_TIME=791596800

Great! As expected, all layers/bands are accessible!

Now for some simple python stuff! Install rasterio

pip install rasterio

Let’s check some attributes.

(gdal)  difu@Diggler  ~/Programming/cogTests > python
Python 3.8.10 (default, Sep 28 2021, 16:10:42)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import rasterio
>>> dataset = rasterio.open('T_2M.2D.199501.grb')
>>> dataset.count
744
>>> dataset.width
848
>>> dataset.height
824
>>> dataset.bounds
BoundingBox(left=-28.430500590318776, bottom=-23.43050060753342, right=18.21050059031877, top=21.890500607533415)
>>>

For my first testing purposes one single 1GB sized file is pretty unhandy. As a grib is more or less also a simple stream it is easy to split a large grib file that is simply concatinated by multiple single grib files, you just have to look for the header/footer and split.

Or… use eccodes. DWD provides an easy to use docker container we can use right away. We split up doing:

docker run --rm --mount type=bind,source="$(pwd)"/,target=/local deutscherwetterdienst/python-eccodes grib_copy  T_2M.2D.199501.grb 'out_[dataDate].grb'

We now get smaller files:

(gdal)  difu@Diggler ~/Programming/cogTests > ls -Alhr --time-style=+""
total 2.0G
-rw-r--r-- 1 root root 1.4M  out_19950201.grb
-rw-r--r-- 1 root root  32M  out_19950131.grb
-rw-r--r-- 1 root root  32M  out_19950130.grb
-rw-r--r-- 1 root root  32M  out_19950129.grb
-rw-r--r-- 1 root root  32M  out_19950128.grb
-rw-r--r-- 1 root root  32M  out_19950127.grb
-rw-r--r-- 1 root root  32M  out_19950126.grb
-rw-r--r-- 1 root root  32M  out_19950125.grb
-rw-r--r-- 1 root root  32M  out_19950124.grb
-rw-r--r-- 1 root root  32M  out_19950123.grb
-rw-r--r-- 1 root root  32M  out_19950122.grb
-rw-r--r-- 1 root root  32M  out_19950121.grb
-rw-r--r-- 1 root root  32M  out_19950120.grb
-rw-r--r-- 1 root root  32M  out_19950119.grb
-rw-r--r-- 1 root root  32M  out_19950118.grb
-rw-r--r-- 1 root root  32M  out_19950117.grb
-rw-r--r-- 1 root root  32M  out_19950116.grb
-rw-r--r-- 1 root root  32M  out_19950115.grb
-rw-r--r-- 1 root root  32M  out_19950114.grb
-rw-r--r-- 1 root root  32M  out_19950113.grb
-rw-r--r-- 1 root root  32M  out_19950112.grb
-rw-r--r-- 1 root root  32M  out_19950111.grb
-rw-r--r-- 1 root root  32M  out_19950110.grb
-rw-r--r-- 1 root root  32M  out_19950109.grb
-rw-r--r-- 1 root root  32M  out_19950108.grb
-rw-r--r-- 1 root root  32M  out_19950107.grb
-rw-r--r-- 1 root root  32M  out_19950106.grb
-rw-r--r-- 1 root root  32M  out_19950105.grb
-rw-r--r-- 1 root root  32M  out_19950104.grb
-rw-r--r-- 1 root root  32M  out_19950103.grb
-rw-r--r-- 1 root root  32M  out_19950102.grb
-rw-r--r-- 1 root root  31M  out_19950101.grb
-rw-r--r-- 1 difu difu 992M  T_2M.2D.199501.grb

Each gribfile contains now 24 single gribs, one per hour of the given day. Verify:

difu@Diggler  ~/Programming/cogTests > gdalinfo out_19950102.grb
Driver: GRIB/GRIdded Binary (.grb, .grb2)
Files: out_19950102.grb
Size is 848, 824
Coordinate System is:
GEOGCRS["Coordinate System imported from GRIB file",
    BASEGEOGCRS["Coordinate System imported from GRIB file",
        DATUM["unnamed",
            ELLIPSOID["Sphere",6367470,0,
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]],
    DERIVINGCONVERSION["Pole rotation (GRIB convention)",
        METHOD["Pole rotation (GRIB convention)"],
        PARAMETER["Latitude of the southern pole (GRIB convention)",-39.25,
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        PARAMETER["Longitude of the southern pole (GRIB convention)",18,
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        PARAMETER["Axis rotation (GRIB convention)",0,
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]]
Data axis to CRS axis mapping: 2,1
Origin = (-28.430500590318776,21.890500607533415)
Pixel Size = (0.055001180637544,-0.055001215066829)
Corner Coordinates:
Upper Left  ( -28.4305006,  21.8905006) ( 44d44'46.43"W, 60d12'10.20"N)
Lower Left  ( -28.4305006, -23.4305006) ( 10d 5'25.85"W, 21d54'59.99"N)
Upper Right (  18.2105006,  21.8905006) ( 65d 9'11.38"E, 66d42' 8.34"N)
Lower Right (  18.2105006, -23.4305006) ( 36d27' 6.04"E, 25d 2'49.86"N)
Center      (  -5.1100000,  -0.7700000) ( 10d 5' 3.84"E, 49d42'23.66"N)
Band 1 Block=848x1 Type=Float64, ColorInterp=Undefined
  Description = 2[m] HTGL (Specified height level above ground)
  Metadata:
    GRIB_COMMENT=temperature [C]
    GRIB_ELEMENT=T
    GRIB_FORECAST_SECONDS=0
    GRIB_REF_TIME=789004800
    GRIB_SHORT_NAME=2-HTGL
    GRIB_UNIT=[C]
    GRIB_VALID_TIME=789004800
...

...
Band 24 Block=848x1 Type=Float64, ColorInterp=Undefined
  Description = 2[m] HTGL (Specified height level above ground)
  Metadata:
    GRIB_COMMENT=temperature [C]
    GRIB_ELEMENT=T
    GRIB_FORECAST_SECONDS=0
    GRIB_REF_TIME=789087600
    GRIB_SHORT_NAME=2-HTGL
    GRIB_UNIT=[C]
    GRIB_VALID_TIME=789087600

Now let’s convert a grib into a geoTIFF and compress it with LZW, according to documentation, this should be the best compression ratio.

gdal_translate out_19950102.grb -co COMPRESS=LZW -of Gtiff out_19950102.tiff

I am a little surprised, that the output file size doubles! This needs further investigation later.

(gdal) difu@Diggler ~/Programming/cogTests > ls -lh --time-style=+"" out_19950102* 
-rw-r--r-- 1 root root  32M  out_19950102.grb
-rw-r--r-- 1 difu difu  60M  out_19950102.tiff
-rw-r--r-- 1 difu difu 4.1K  out_19950102.tiff.aux.xml

How far are we away from a Cloud Optmized GeoTIFF? There is a little helper script, that verifies if a given file is already a COG or if something else is missing. It is installed in your python site-packages, in my case

(gdal) difu@Diggler ~/Programming/cogTests > python ~/venv/gdal/lib/python3.8/site-packages/osgeo_utils/samples/validate_cloud_optimized_geotiff.py out_19950102.tiff
The following warnings were found:
 - The file is greater than 512xH or Wx512, it is recommended to include internal overviews

out_19950102.tiff is a valid cloud optimized GeoTIFF

Wow! It looks we have already what we want! But wait, there is a warning that says we do not have overviews. This is easy to fix. We use gdaladdo (soundzzz a bit funny?):

(gdal)  AWS: difu difu@Diggler  ~/Programming/cogTests > gdaladdo -r average out_19950102.tiff 2 4 8 16
0...10...20...30...40...50...60...70...80...90...100 - done.

But!!! If checking again:

(gdal)  AWS: difu difu@Diggler ~/Programming/cogTests > python ~/venv/gdal/lib/python3.8/site-packages/osgeo_utils/samples/validate_cloud_optimized_geotiff.py out_19950102.tiff
out_19950102.tiff is NOT a valid cloud optimized GeoTIFF.
The following errors were found:
 - The offset of the first block of overview of index 2 should be after the one of the overview of index 3
 - The offset of the first block of overview of index 1 should be after the one of the overview of index 2
 - The offset of the first block of overview of index 0 should be after the one of the overview of index 1
 - The offset of the first block of the main resolution image should be after the one of the overview of index 3

For some reason gdaladdo screws up the cog. If this is a feature or a bug? Or am I missing something? I don’t know. We have to run gdal_translate again and add the option COPY_SRC_OVERVIEWS=YES to include the created overviews.

(gdal)  AWS: difu difu@Diggler ~/Programming/cogTests > gdal_translate out_19950102.tiff out_19950102_cog.tiff -co COMPRESS=LZW -co TILED=YES -co COPY_SRC_OVERVIEWS=YES 
Input file size is 848, 824
0...10...20...30...40...50...60...70...80...90...100 - done.
(gdal)  AWS: difu difu@Diggler ~/Programming/cogTests > python ~/venv/gdal/lib/python3.8/site-packages/osgeo_utils/samples/validate_cloud_optimized_geotiff.py out_19950102_cog.tiff                                                                                                                                                                                                         <aws:difu>
out_19950102_cog.tiff is a valid cloud optimized GeoTIFF

The size of all IFD headers is 14188 bytes

Ok! Looks good at this point. In part 2 we will upload our COGs and so some testings and comparisons. Stay tuned!