Day 4

Detailed paper information

Paper title pyjeo, an open source Python library for processing geospatial data
Authors
  1. Pieter Kempeneers European Commission Joint Research Centre Speaker
  2. Pierre Soille Joint Research Centre - European Commission
Form of presentation Poster
Topics
  • Open Earth Forum
    • C5.03 Open Source, data science and toolboxes in EO: Current status & evolution
Abstract text Recent years have witnessed a dynamic development of open source software libraries and tools for the analysis of geospatial data. The European Commission Joint Research Centre (JRC) has released a Python package, pyjeo, as open source under the GNU General Public License (GPLv3). It has been written by and for scientists and builds upon existing open source software libraries such as the GNU Scientific Library (GSL) and GDAL. Its design allows for easy integration with existing libraries, so that users can take full advantage of the plethora of functions these libraries offer. Particular care was taken in selecting the underlying data model to avoid unnecessary copying of data. This minimizes the memory footprint and avoids time-consuming disk operations. With EO data volumes increasing at an unprecedented pace, this has become particularly important.

A multi-band three-dimensional (3D) data model was selected, where each band is a contiguous 3D array in C/C++ of a generic data type. The lower-level algorithmic part of the library, where processing performance is important, has been written in C/C++. Parallel computing is introduced using OpenMP. The C/C++ functions were ported to Python through Simplified Wrapper and Interface Generator (SWIG) modules. Python is increasingly used within the scientific computing community, with popular libraries for multi-dimensional data processing such as SciPy ndimage and xarray. Important within the context of this work is that Python allows for easy interfacing with C/C++ libraries through a C API that gives access to the NumPy array object. This allows pyjeo to integrate smoothly with packages such as xarray and, by extension, with other packages that use the NumPy array object at their core.
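The idea of sharing one contiguous block of memory between C/C++ code and NumPy without copying can be illustrated with a minimal sketch. It does not use pyjeo itself: the `bytearray` is only a hypothetical stand-in for a band buffer owned by a C/C++ library, and the (plane, row, column) dimension order and the variable names are assumptions made for illustration.

```python
import struct

import numpy as np

# Assumed band dimensions (plane, row, column); purely illustrative.
nplane, nrow, ncol = 2, 3, 4
dtype = np.dtype(np.float32)

# Hypothetical stand-in for a band buffer allocated on the C/C++ side:
# one contiguous block of bytes, initialised to zero.
raw = bytearray(nplane * nrow * ncol * dtype.itemsize)

# Wrap the existing buffer as a 3D NumPy array. No pixel data is copied:
# np.frombuffer creates a view, and reshaping a contiguous array keeps it a view.
band = np.frombuffer(raw, dtype=dtype).reshape(nplane, nrow, ncol)

# Basic slicing returns another view onto the same memory.
plane = band[0]
plane[1, 2] = 42.0

# Read the value straight from the underlying buffer to confirm that the
# write through the NumPy view reached the original memory.
offset = ((0 * nrow + 1) * ncol + 2) * dtype.itemsize
stored = struct.unpack_from("f", raw, offset)[0]

print(band.flags["C_CONTIGUOUS"], np.shares_memory(band, plane), stored)
```

Because the array is only a view, libraries built on NumPy arrays can operate on the same memory, which is the property the abstract relies on for integration with packages such as xarray.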

In this talk, we will present the design of pyjeo and focus on how it has been integrated in the JRC Big Data Analytics Platform (BDAP). For instance, we will show how virtual data cubes are created to serve various JRC use cases based on Sentinel-1 and Sentinel-2 collections. We will also introduce the BDAP as an openEO-compatible backend that uses pyjeo as its basis and where scientists can deploy their EO data analysis workflows without knowing the infrastructure details. Finally, results on optimal parallel processing strategies will be discussed.