|Paper title||The CCI Open Data Portal: Recent Developments and Lessons Learnt|
|Form of presentation||Poster|
The CCI Open Data Portal (ODP) has been developed as part of the European Space Agency (ESA) Climate Change Initiative (CCI) programme to provide a central point of access to the wealth of data produced across the programme. It is an open-access portal for data discovery that supports faceted search and multiple download routes for all the key CCI datasets. The CCI ODP can be accessed at https://climate.esa.int/data.
The CCI Open Data Portal has been in operation since 2015 and, since its inception, has provided access to over 450 datasets and served more than 50 million file accesses. It offers two front-end access routes for data discovery: a CCI dashboard, which shows the breadth of CCI products available and the time ranges they cover, and which can be drilled down to select the appropriate datasets; and a faceted search index, which allows users to search for data over a wider range of characteristics. These are supported at the back end by a range of services provided by the Centre for Environmental Data Analysis (CEDA), which include data storage and archival, catalogue and search services, and download servers supporting multiple access routes (FTP, HTTP, OPeNDAP, OGC WMS and WCS). Direct access to the discovery metadata is also available and can be used by downstream tools to build other interfaces on top of these components; for example, the CCI Toolbox uses the search and OPeNDAP access services to include direct access to data.
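As an illustration of what faceted search over discovery metadata involves, the following is a minimal in-memory sketch. It is not the portal's implementation: the record identifiers, the facet names (`ecv`, `frequency`, `processing_level`) and the helper functions are all hypothetical, chosen only to show how selecting facet values narrows a result set and how per-facet counts are derived.

```python
from collections import Counter

# Hypothetical discovery-metadata records; field names and values are
# illustrative assumptions, not the portal's actual schema.
RECORDS = [
    {"id": "ds-1", "ecv": "sea_surface_temperature", "frequency": "day", "processing_level": "L4"},
    {"id": "ds-2", "ecv": "sea_surface_temperature", "frequency": "month", "processing_level": "L3C"},
    {"id": "ds-3", "ecv": "soil_moisture", "frequency": "day", "processing_level": "L3S"},
    {"id": "ds-4", "ecv": "ozone", "frequency": "month", "processing_level": "L3"},
]


def facet_search(records, **selected):
    """Return the records that match every selected facet value."""
    return [
        r for r in records
        if all(r.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records, facet):
    """Count how many records carry each value of a single facet."""
    return Counter(r[facet] for r in records if facet in r)


# Narrow the result set by one facet, then summarise another facet
# over the remaining hits, as a faceted search interface would.
hits = facet_search(RECORDS, frequency="day")
print([r["id"] for r in hits])
print(dict(facet_counts(hits, "ecv")))
```

A downstream tool could apply the same pattern to metadata retrieved from a search service, refining by one facet at a time and redisplaying the counts for the remaining facets.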
In the initial phase of the CCI Open Data Portal, a combination of Earth System Grid Federation (ESGF) search and CEDA’s Catalog Service for Web (CSW) was used to provide the search functionality of the portal. However, combining the two services, together with the specialised requirements of ESGF, added complexity and increased the effort needed to publish data, so the portal was redeveloped in 2019 under the CCI Knowledge Exchange project. In this new phase, the Open Data Portal combines search and data cataloguing using OpenSearch with data serving using Nginx and THREDDS, which has simplified the publication process and allowed more flexibility when including data. A number of innovations have been made to the data serving functionality, including the adoption of containers and Kubernetes to provide a scalable data service, and the provision of an analysis-ready data cache on JASMIN’s object store using Zarr serialisation of the source netCDF files. The latter augments the existing data service by giving the CCI Toolbox application access to data rechunked for optimal performance in data analysis queries. Publishing has been further streamlined through two changes. First, the servers providing data download and OPeNDAP services (Nginx and THREDDS) read directly from the file system, so data becomes available through them as it reaches the CEDA archive. Second, through the use of a message passing framework (RabbitMQ) and containerised processing scripts, the metadata needed for search is generated in parallel with the files reaching the archive. In some cases, manual changes to this metadata are needed. These are supplied through configuration files and become part of an automated workflow, driven by Continuous Integration pipelines, that re-tags the affected data files.
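The effect of rechunking on analysis queries, mentioned above for the analysis-ready cache, can be illustrated with a small piece of arithmetic. The sketch below counts how many chunks a read has to touch under two chunk layouts. The array shape and both chunk shapes are assumptions made for illustration and do not describe the chunking actually used in the CCI cache; the function `chunks_touched` is likewise hypothetical.

```python
import math


def chunks_touched(selection, chunk_shape):
    """Number of chunks intersected by a selection that starts on a chunk
    boundary, given its extent along each axis (time, lat, lon)."""
    return math.prod(math.ceil(s / c) for s, c in zip(selection, chunk_shape))


# Assumed array: 3650 daily time steps on a 720 x 1440 grid.
time_series_request = (3650, 1, 1)   # full time series at one grid cell
single_map_request = (1, 720, 1440)  # one global field for one day

map_layout = (1, 720, 1440)          # one chunk per daily global field
series_layout = (3650, 20, 20)       # long in time, small in space

for name, layout in [("map layout", map_layout), ("series layout", series_layout)]:
    print(
        name,
        chunks_touched(time_series_request, layout),
        chunks_touched(single_map_request, layout),
    )
```

Under these assumptions the time-series request touches 3650 chunks in the map-oriented layout but only one in the series-oriented layout, while the single-map request shows the opposite trade-off (1 chunk against 2592). This is why a chunk layout is chosen to suit the dominant access pattern.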
A key challenge in operating the CCI Open Data Portal comes from the heterogeneity of the datasets produced across the Climate Change Initiative programme: different scientific areas and different user communities all have differing needs in terms of the format and types of data produced. To this end, the work of the CCI Open Data Portal also includes maintaining the CCI data standards. These standards aim to provide a common format for the data but necessarily still leave considerable breadth in the types of data produced. This poses challenges for providing harmonised search and access services, and solutions have been developed to ensure that every dataset can still be fully integrated into our faceted search services.
In this presentation we will describe the CCI Open Data Portal, recent developments, and the lessons that we have learnt from over six years of operations.