Cloud Optimized Geotiff Part 1
A short walkthrough to create and use Cloud Optimized Geotiff
Part 1: get up and running with gdal, python and some testdata
In this post, I’d like to show how to install all needed tools you need to create cog-files. As example data I use grib files as they are standards in meteorology. But unfortunately, this format is in many ways pure pain in your butt. I like to test if converting this format to a more modern format allows as to better handle large formats of rasterized data.
The tool-of-choice to handle rasterdata is gdal. eccodes is no match 😃
As we also want to build the gdal
python bindings we need to install python-dev
sudo apt-get install python3-dev
NOTE: I strongly recommend using a virtual environment for the python bindings!
Download, extract and configure gdal installation
wget https://github.com/OSGeo/gdal/releases/download/v3.4.0/gdal-3.4.0.tar.gz
tar xvf gdal-3.4.0.tar.gz
cd gdal-3.4.0/
./configure --with-python
Build and install gdal
NOTE: Adjust the -j parameter to the number of cores of your system to speed up compile time!
make -j 8
make install (as root with activated venv)
NOTE: sudo make install
resulted in python errors….
Install gdal-python
in your venv
pip install gdal
Verify installation
(gdal) difu@Diggler ~/Programming/cogTests> python
Python 3.8.10 (default, Sep 28 2021, 16:10:42)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from osgeo import gdal
>>>
My test dataset is taken from DWDs Open-Data portal. I use the COSMO_REA6 dataset. Download for example the first file and decompress:
wget https://opendata.dwd.de/climate_environment/REA/COSMO_REA6/hourly/2D/T_2M/T_2M.2D.199501.grb.bz2
bunzip2 T_2M.2D.199501.grb.bz2
You get an about 1G grib file.
(gdal) difu@Diggler ~/Programming/cogTests > ls -lh T_2M.2D.199501.grb
-rw-r--r-- 1 difu difu 992M Jul 23 2019 T_2M.2D.199501.grb
This is hourly data of the temperature at 2 meters above ground of one month, here of January 1995. So this is 24h*31=744 hours of data. Let’s verify that:
(gdal) difu@Diggler ~/Programming/cogTests > gdalinfo T_2M.2D.199501.grb
Driver: GRIB/GRIdded Binary (.grb, .grb2)
Files: T_2M.2D.199501.grb
Size is 848, 824
Coordinate System is:
GEOGCRS["Coordinate System imported from GRIB file",
BASEGEOGCRS["Coordinate System imported from GRIB file",
DATUM["unnamed",
ELLIPSOID["Sphere",6367470,0,
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]]],
DERIVINGCONVERSION["Pole rotation (GRIB convention)",
METHOD["Pole rotation (GRIB convention)"],
PARAMETER["Latitude of the southern pole (GRIB convention)",-39.25,
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]],
PARAMETER["Longitude of the southern pole (GRIB convention)",18,
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]],
PARAMETER["Axis rotation (GRIB convention)",0,
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]]]
Data axis to CRS axis mapping: 2,1
Origin = (-28.430500590318776,21.890500607533415)
Pixel Size = (0.055001180637544,-0.055001215066829)
Corner Coordinates:
Upper Left ( -28.4305006, 21.8905006) ( 44d44'46.43"W, 60d12'10.20"N)
Lower Left ( -28.4305006, -23.4305006) ( 10d 5'25.85"W, 21d54'59.99"N)
Upper Right ( 18.2105006, 21.8905006) ( 65d 9'11.38"E, 66d42' 8.34"N)
Lower Right ( 18.2105006, -23.4305006) ( 36d27' 6.04"E, 25d 2'49.86"N)
Center ( -5.1100000, -0.7700000) ( 10d 5' 3.84"E, 49d42'23.66"N)
Band 1 Block=848x1 Type=Float64, ColorInterp=Undefined
Description = 2[m] HTGL (Specified height level above ground)
Metadata:
GRIB_COMMENT=temperature [C]
GRIB_ELEMENT=T
GRIB_FORECAST_SECONDS=0
GRIB_REF_TIME=788922000
GRIB_SHORT_NAME=2-HTGL
GRIB_UNIT=[C]
GRIB_VALID_TIME=788922000
Band 2 Block=848x1 Type=Float64, ColorInterp=Undefined
Description = 2[m] HTGL (Specified height level above ground)
Metadata:
GRIB_COMMENT=temperature [C]
GRIB_ELEMENT=T
GRIB_FORECAST_SECONDS=0
GRIB_REF_TIME=788925600
...
...
Band 744 Block=848x1 Type=Float64, ColorInterp=Undefined
Description = 2[m] HTGL (Specified height level above ground)
Metadata:
GRIB_COMMENT=temperature [C]
GRIB_ELEMENT=T
GRIB_FORECAST_SECONDS=0
GRIB_REF_TIME=791596800
GRIB_SHORT_NAME=2-HTGL
GRIB_UNIT=[C]
GRIB_VALID_TIME=791596800
Great! As expected, all layers/bands are accessible!
Now for some simple python stuff! Install rasterio
pip install rasterio
Let’s check some attributes.
(gdal) difu@Diggler ~/Programming/cogTests > python
Python 3.8.10 (default, Sep 28 2021, 16:10:42)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import rasterio
>>> dataset = rasterio.open('T_2M.2D.199501.grb')
>>> dataset.count
744
>>> dataset.width
848
>>> dataset.height
824
>>> dataset.bounds
BoundingBox(left=-28.430500590318776, bottom=-23.43050060753342, right=18.21050059031877, top=21.890500607533415)
>>>
For my first testing purposes one single 1GB sized file is pretty unhandy. As a grib is more or less also a simple stream it is easy to split a large grib file that is simply concatinated by multiple single grib files, you just have to look for the header/footer and split.
Or… use eccodes
. DWD provides an easy to use docker container we can use right away. We split up doing:
docker run --rm --mount type=bind,source="$(pwd)"/,target=/local deutscherwetterdienst/python-eccodes grib_copy T_2M.2D.199501.grb 'out_[dataDate].grb'
We now get smaller files:
(gdal) difu@Diggler ~/Programming/cogTests > ls -Alhr --time-style=+""
total 2.0G
-rw-r--r-- 1 root root 1.4M out_19950201.grb
-rw-r--r-- 1 root root 32M out_19950131.grb
-rw-r--r-- 1 root root 32M out_19950130.grb
-rw-r--r-- 1 root root 32M out_19950129.grb
-rw-r--r-- 1 root root 32M out_19950128.grb
-rw-r--r-- 1 root root 32M out_19950127.grb
-rw-r--r-- 1 root root 32M out_19950126.grb
-rw-r--r-- 1 root root 32M out_19950125.grb
-rw-r--r-- 1 root root 32M out_19950124.grb
-rw-r--r-- 1 root root 32M out_19950123.grb
-rw-r--r-- 1 root root 32M out_19950122.grb
-rw-r--r-- 1 root root 32M out_19950121.grb
-rw-r--r-- 1 root root 32M out_19950120.grb
-rw-r--r-- 1 root root 32M out_19950119.grb
-rw-r--r-- 1 root root 32M out_19950118.grb
-rw-r--r-- 1 root root 32M out_19950117.grb
-rw-r--r-- 1 root root 32M out_19950116.grb
-rw-r--r-- 1 root root 32M out_19950115.grb
-rw-r--r-- 1 root root 32M out_19950114.grb
-rw-r--r-- 1 root root 32M out_19950113.grb
-rw-r--r-- 1 root root 32M out_19950112.grb
-rw-r--r-- 1 root root 32M out_19950111.grb
-rw-r--r-- 1 root root 32M out_19950110.grb
-rw-r--r-- 1 root root 32M out_19950109.grb
-rw-r--r-- 1 root root 32M out_19950108.grb
-rw-r--r-- 1 root root 32M out_19950107.grb
-rw-r--r-- 1 root root 32M out_19950106.grb
-rw-r--r-- 1 root root 32M out_19950105.grb
-rw-r--r-- 1 root root 32M out_19950104.grb
-rw-r--r-- 1 root root 32M out_19950103.grb
-rw-r--r-- 1 root root 32M out_19950102.grb
-rw-r--r-- 1 root root 31M out_19950101.grb
-rw-r--r-- 1 difu difu 992M T_2M.2D.199501.grb
Each gribfile contains now 24 single gribs, one per hour of the given day. Verify:
difu@Diggler ~/Programming/cogTests > gdalinfo out_19950102.grb
Driver: GRIB/GRIdded Binary (.grb, .grb2)
Files: out_19950102.grb
Size is 848, 824
Coordinate System is:
GEOGCRS["Coordinate System imported from GRIB file",
BASEGEOGCRS["Coordinate System imported from GRIB file",
DATUM["unnamed",
ELLIPSOID["Sphere",6367470,0,
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]]],
DERIVINGCONVERSION["Pole rotation (GRIB convention)",
METHOD["Pole rotation (GRIB convention)"],
PARAMETER["Latitude of the southern pole (GRIB convention)",-39.25,
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]],
PARAMETER["Longitude of the southern pole (GRIB convention)",18,
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]],
PARAMETER["Axis rotation (GRIB convention)",0,
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]]]
Data axis to CRS axis mapping: 2,1
Origin = (-28.430500590318776,21.890500607533415)
Pixel Size = (0.055001180637544,-0.055001215066829)
Corner Coordinates:
Upper Left ( -28.4305006, 21.8905006) ( 44d44'46.43"W, 60d12'10.20"N)
Lower Left ( -28.4305006, -23.4305006) ( 10d 5'25.85"W, 21d54'59.99"N)
Upper Right ( 18.2105006, 21.8905006) ( 65d 9'11.38"E, 66d42' 8.34"N)
Lower Right ( 18.2105006, -23.4305006) ( 36d27' 6.04"E, 25d 2'49.86"N)
Center ( -5.1100000, -0.7700000) ( 10d 5' 3.84"E, 49d42'23.66"N)
Band 1 Block=848x1 Type=Float64, ColorInterp=Undefined
Description = 2[m] HTGL (Specified height level above ground)
Metadata:
GRIB_COMMENT=temperature [C]
GRIB_ELEMENT=T
GRIB_FORECAST_SECONDS=0
GRIB_REF_TIME=789004800
GRIB_SHORT_NAME=2-HTGL
GRIB_UNIT=[C]
GRIB_VALID_TIME=789004800
...
...
Band 24 Block=848x1 Type=Float64, ColorInterp=Undefined
Description = 2[m] HTGL (Specified height level above ground)
Metadata:
GRIB_COMMENT=temperature [C]
GRIB_ELEMENT=T
GRIB_FORECAST_SECONDS=0
GRIB_REF_TIME=789087600
GRIB_SHORT_NAME=2-HTGL
GRIB_UNIT=[C]
GRIB_VALID_TIME=789087600
Now let’s convert a grib into a geoTIFF and compress it with LZW, according to documentation, this should be the best compression ratio.
gdal_translate out_19950102.grb -co COMPRESS=LZW -of Gtiff out_19950102.tiff
I am a little surprised, that the output file size doubles! This needs further investigation later.
(gdal) difu@Diggler ~/Programming/cogTests > ls -lh --time-style=+"" out_19950102*
-rw-r--r-- 1 root root 32M out_19950102.grb
-rw-r--r-- 1 difu difu 60M out_19950102.tiff
-rw-r--r-- 1 difu difu 4.1K out_19950102.tiff.aux.xml
How far are we away from a Cloud Optmized GeoTIFF? There is a little helper script, that verifies if a given file is already a COG or if something else is missing.
It is installed in your python site-packages
, in my case
(gdal) difu@Diggler ~/Programming/cogTests > python ~/venv/gdal/lib/python3.8/site-packages/osgeo_utils/samples/validate_cloud_optimized_geotiff.py out_19950102.tiff
The following warnings were found:
- The file is greater than 512xH or Wx512, it is recommended to include internal overviews
out_19950102.tiff is a valid cloud optimized GeoTIFF
Wow! It looks we have already what we want! But wait, there is a warning that says we do not have overviews. This is easy to fix. We use gdaladdo
(soundzzz a bit funny?):
(gdal) AWS: difu difu@Diggler ~/Programming/cogTests > gdaladdo -r average out_19950102.tiff 2 4 8 16
0...10...20...30...40...50...60...70...80...90...100 - done.
But!!! If checking again:
(gdal) AWS: difu difu@Diggler ~/Programming/cogTests > python ~/venv/gdal/lib/python3.8/site-packages/osgeo_utils/samples/validate_cloud_optimized_geotiff.py out_19950102.tiff
out_19950102.tiff is NOT a valid cloud optimized GeoTIFF.
The following errors were found:
- The offset of the first block of overview of index 2 should be after the one of the overview of index 3
- The offset of the first block of overview of index 1 should be after the one of the overview of index 2
- The offset of the first block of overview of index 0 should be after the one of the overview of index 1
- The offset of the first block of the main resolution image should be after the one of the overview of index 3
For some reason gdaladdo
screws up the cog. If this is a feature or a bug? Or am I missing something? I don’t know.
We have to run gdal_translate
again and add the option COPY_SRC_OVERVIEWS=YES
to include the created overviews.
(gdal) AWS: difu difu@Diggler ~/Programming/cogTests > gdal_translate out_19950102.tiff out_19950102_cog.tiff -co COMPRESS=LZW -co TILED=YES -co COPY_SRC_OVERVIEWS=YES
Input file size is 848, 824
0...10...20...30...40...50...60...70...80...90...100 - done.
(gdal) AWS: difu difu@Diggler ~/Programming/cogTests > python ~/venv/gdal/lib/python3.8/site-packages/osgeo_utils/samples/validate_cloud_optimized_geotiff.py out_19950102_cog.tiff <aws:difu>
out_19950102_cog.tiff is a valid cloud optimized GeoTIFF
The size of all IFD headers is 14188 bytes
Ok! Looks good at this point. In part 2 we will upload our COGs and so some testings and comparisons. Stay tuned!