
In which formats are the Copernicus Arctic Hub data delivered?

Let's see the different data formats available in the Copernicus Arctic Hub and some simple steps to open them!

Written by Cedric
Updated over 3 weeks ago

Context


The Arctic Hub provides more than 100 Earth Observation datasets from several originating centers (more info on data available on Arctic Hub).

In this article, we will discuss the various formats of the data provided and show easy instructions on how to open the data using Python.

Table of data formats


Observation domain | Data format
Atmosphere         | GRIB and NetCDF* in .zip
Climate Change     | GRIB and NetCDF*
Emergency          | GRIB and NetCDF in .zip
Land               | GRIB, GeoTIFF and NetCDF* in .zip
Marine             | NetCDF

*the NetCDF format is experimental for these Services
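Several of the formats above are delivered inside a .zip archive, so the first step is usually to unpack it. The following is a minimal sketch using only the Python standard library. The helper name extract_data_files, the archive name and the target folder are hypothetical, and the list of file extensions is an assumption based on the table above.

```python
import zipfile
from pathlib import Path

# File extensions of the formats listed in the table (assumed spellings).
DATA_SUFFIXES = {".nc", ".grib", ".tif", ".tiff"}


def extract_data_files(archive_path, target_dir):
    """Unpack a downloaded .zip and return the paths of the data files inside."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    # Keep only the files whose extension matches a known data format.
    return sorted(
        p for p in target.rglob("*")
        if p.is_file() and p.suffix.lower() in DATA_SUFFIXES
    )
```

For example, extract_data_files("download.zip", "data") would unpack the archive into the data folder and return the NetCDF, GRIB or GeoTIFF files it contains, ready to be opened as described in the sections below.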

πŸ“ In the following examples, we will use different datasets over Italy.

NetCDF format


Let's see here how to open a NetCDF data file.

We will focus on the atmospheric temperature of January 1978, provided in the product ERA5 hourly data on pressure levels from 1940 to present (datasetID = EO:ECMWF:DAT:REANALYSIS_ERA5_PRESSURE_LEVELS).

The main packages we use are xarray, matplotlib and cartopy.

This simple code gives access to all the information (dimensions, coordinates, variables, attributes) of the NetCDF file:

import xarray as xr
dataset = xr.open_dataset("ERA5_CAMS_1978.nc")
dataset

Now, to generate a quick map, call the .plot() method on the selected variable as follows:

time = "1978-01-01"
dataset.t.sel(time=time).plot()

We can then use matplotlib and cartopy to enhance the plot. Here we visualize the atmospheric temperature, add the coastlines and georeference the map:

import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature

f = plt.figure(figsize=(15, 10))
# create a map axis in the plate carree (longitude/latitude) projection
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
ax.add_feature(cfeature.LAND, zorder=1, edgecolor='k')

# draw the temperature field on the map axis
dataset.t.sel(time=time).plot(ax=ax)
plt.title(f"Atmospheric Temperature (K) on {time}", size=15)

GRIB format


For the GRIB data format, we downloaded the ERA5-Land hourly data from 1950 to present product (datasetID = "EO:ECMWF:DAT:REANALYSIS_ERA5_LAND"), and its leaf area index (lai_hv) for the first week of January 2022.

We recommend installing cfgrib via conda:

conda install -c conda-forge cfgrib

As with the NetCDF data, we use xarray to open the file:

import xarray as xr
grib = xr.load_dataset("mydirectory/ERA_CLMS_2022.grib", engine = "cfgrib")
grib

📌 Note: it is important here to specify the engine type engine = "cfgrib".

This command opens the .grib file and displays a summary of the downloaded data.

💡 WEkEO Pro Tip: make sure you have the cfgrib package installed, otherwise the error message "ValueError: unrecognized engine cfgrib must be one of: ['netcdf4', 'scipy', 'store']" will be displayed.
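To catch a missing installation before xarray raises that error, you can check whether the package is importable. This is only a sketch: the helper name backend_available is hypothetical, and the check only confirms that the Python module can be found, not that the underlying GRIB decoding library works.

```python
import importlib.util


def backend_available(module_name="cfgrib"):
    """Return True if the given top-level Python module can be found."""
    return importlib.util.find_spec(module_name) is not None


if not backend_available("cfgrib"):
    # Same installation command as recommended earlier in this article.
    print("cfgrib is not installed; run: conda install -c conda-forge cfgrib")
```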

Finally, a simple line of code to plot and view the data:

grib.lai_hv.sel(time="2022-01-01").plot()
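Because the Hub delivers both NetCDF and GRIB files, it can be convenient to choose the xarray engine from the file extension instead of typing it each time. The sketch below assumes the extensions .nc, .grib and .grb; the helper name guess_engine and the mapping are hypothetical and should be adapted to your own downloads.

```python
from pathlib import Path

# Mapping from file extension to xarray engine name.
# The extension spellings are assumptions, not an official list.
ENGINE_BY_SUFFIX = {
    ".nc": "netcdf4",
    ".grib": "cfgrib",
    ".grb": "cfgrib",
}


def guess_engine(path):
    """Return the xarray engine name matching the extension of `path`."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"No engine known for files ending in {suffix!r}") from None
```

The result can then be passed to xarray, for example xr.open_dataset(path, engine=guess_engine(path)).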

GeoTIFF format


For the GeoTIFF data format, we downloaded the Fractional Snow Cover (raster 20m) 2016-present, Europe, daily, Jul. 2020 product (datasetID = "EO:CRYO:DAT:HRSI:FSC").

Using the rasterio package, we will loop through a folder containing .tif files, open each one, print some basic information, and display the images.

import os

import rasterio as rs
import rasterio.plot

# set the directory where .tif files are stored
data_dir = './tiff_files'
all_tiff = []

# loop to open and plot all .tif files
for path in os.listdir(data_dir):
    if os.path.isfile(os.path.join(data_dir, path)):
        all_tiff.append(path)
        print('file_name = ', path)  # file's name
        with rs.open(os.path.join(data_dir, path)) as file:
            print("data info : ", file.profile)  # file's information
            rasterio.plot.show(file)  # plot
print(all_tiff)

Thus, for each GeoTIFF in the data_dir directory, we will obtain:

  • The file name:

file_name =  FSC_20250121T082000_S2B_T37SCA_V102_1_CLD.tif
  • General file information, such as the data format, data type, crs, and more:

data info :  {'driver': 'GTiff', 'dtype': 'uint8', 'nodata': None, 'width': 5490, 'height': 5490, 'count': 1, 'crs': CRS.from_wkt('...'), 'transform': Affine(20.0, 0.0, 300000.0,        0.0, -20.0, 4100040.0), 'blockxsize': 1024, 'blockysize': 1024, 'tiled': True, 'compress': 'deflate', 'interleave': 'band'}
  • The basic generated image:

📌 Note: rasterio also exposes some information about the .tif file as attributes of the opened dataset (here, tiff is a dataset returned by rasterio.open()):

  • tiff.bounds: indicates the spatial bounding box

  • tiff.count: number of bands

  • tiff.width: number of columns of the raster dataset

  • tiff.height: number of rows of the raster dataset

  • tiff.crs: coordinate reference system
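To see how these attributes relate to each other, the bounding box can be recomputed by hand from the transform, width and height printed in the profile above. This is a plain-Python sketch that does not use rasterio; the helper name bounds_from_profile is hypothetical, and it assumes a north-up raster with no rotation.

```python
def bounds_from_profile(transform, width, height):
    """Compute (left, bottom, right, top) for a north-up raster.

    `transform` holds the six affine coefficients (a, b, c, d, e, f) in the
    order they are printed in the profile; b and d are assumed to be zero.
    """
    a, b, c, d, e, f = transform
    left, top = c, f            # coordinates of the upper-left corner
    right = c + a * width       # pixel width times number of columns
    bottom = f + e * height     # e is negative: rows go from north to south
    return (left, bottom, right, top)


# Values taken from the profile printed above.
print(bounds_from_profile((20.0, 0.0, 300000.0, 0.0, -20.0, 4100040.0), 5490, 5490))
# (300000.0, 3990240.0, 409800.0, 4100040.0)
```

With 20 m pixels and 5490 columns and rows, the tile covers 109.8 km in each direction, in the units of the file's coordinate reference system.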

For more information about this Python package, please consult the rasterio documentation page.

What's next?


Please let us know if anything is missing or confusing that would need modification or more explanation! Feel free to contact us:

  • through a chat session available in the bottom right corner of the page

  • via e-mail to our support team (supportATarctic.hub.copernicus.eu)
