Day 4

Detailed paper information


Paper title AI4GEO ENGINE: a Jupyter-based platform for Earth observation data processing.
Authors
  1. Michael Darques CS GROUP - France Speaker
  2. Vincent Gaudissart CS GROUP - France
  3. Christophe Triquet
  4. Robin Leclair CS GROUP - France
  5. Brice Mora CS GROUP
  6. Fabien Adam CS GROUP - France
Form of presentation Poster
Topics
  • Open Earth Forum
    • C5.03 Open Source, data science and toolboxes in EO: Current status & evolution
Abstract text The production of precise geospatial information has become a major challenge for many applications in various fields such as agriculture, environment, civil or military aviation, geology, cartography, marine services, urban planning, natural disasters, etc.
These applications would greatly benefit from both automation and Big Data scalability, which increase work efficiency as well as the throughput, quality, and availability of final products.

Our ambition is to address these challenges by developing a Jupyter-based, AI-oriented platform for Earth Observation (EO) data processing whose architecture offers a fully automated production chain for highly detailed images.
At the core of the platform lies the Virtual Research Environment (VRE), a collaborative prototyping environment based on JupyterLab that relies on web technologies and integrates the tools required by scientists and researchers. The VRE allows selecting, querying, and performing in-depth analysis on 2D and 3D geographic data via a simple web interface, with the performance and responsiveness needed to quickly display large EO products within a web browser. The environment is not solely based on Jupyter: it also offers an IDE (Code Server) and a remote desktop for using specific software such as QGIS.
Users can therefore run such software remotely and manipulate remote data without transferring it from distant repositories to their own computers.

The objective is to offer a turnkey service that facilitates access to data and computing resources. All the major required tools and libraries are open source and available for scientific analysis (e.g. scikit-learn), geographic data processing (e.g. Orfeo ToolBox, OTB), deep learning (e.g. PyTorch), 2D and 3D plotting, etc. Installation, configuration, and compatibility across this palette of tools are handled at the platform level, which removes hardware and software constraints for end users, who can concentrate on their scientific work instead of resolving dependency conflicts.

To ease access to input products, EODAG, an open-source Python SDK for searching, aggregating, and downloading remote EO products, has been integrated into the JupyterLab environment via a plugin that lets users search for products by drawing a region of interest (ROI) on an interactive map and applying specific search criteria. With EODAG, the user can also access pixels directly: for example, a specific band of a product at a given resolution and geographic projection. This feature improves productivity and lowers infrastructure costs by drastically reducing download time, bandwidth usage, and the user's disk space.
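As an illustration, the sketch below shows a typical EODAG search-and-download workflow similar to what the plugin drives interactively. It is a minimal example, assuming a recent eodag version in which search() returns a SearchResult directly; the product type, bounding box, and dates are purely illustrative.

```python
# Minimal sketch of an EODAG search-and-download workflow
# (product type, ROI, and dates are illustrative).
from eodag import EODataAccessGateway

dag = EODataAccessGateway()  # uses the providers/credentials configured for the user

# Search for Sentinel-2 L1C products over a lon/lat bounding box and a time range
results = dag.search(
    productType="S2_MSI_L1C",
    geom={"lonmin": 1.0, "latmin": 43.0, "lonmax": 2.0, "latmax": 44.0},
    start="2021-11-01",
    end="2021-11-30",
)

# Download the first matching product into the user's online home folder
if len(results) > 0:
    local_path = dag.download(results[0])
    print("Downloaded to", local_path)
```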
Once the products have been selected and downloaded into their online home folder, users can rapidly prototype and execute scientific analyses and computations in JupyterLab: from simple statistics to complex deep-learning modeling and inference with libraries such as PyTorch or TensorFlow, and associated tools such as TensorBoard to help measure and visualize the machine-learning workflow directly from the web interface.
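For example, a notebook-based deep-learning prototype could resemble the hedged sketch below, which trains a toy PyTorch model and logs its loss curve to TensorBoard; the model, data, and log directory are placeholders, not part of the platform.

```python
# Minimal sketch: toy PyTorch training loop with TensorBoard logging.
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/demo")   # hypothetical log directory

# Toy regression model and synthetic data, standing in for a real EO workflow
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
x = torch.randn(256, 10)
y = torch.randn(256, 1)

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    writer.add_scalar("train/loss", loss.item(), step)  # curve appears in TensorBoard

writer.close()
```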

Our platform is not only a prototyping tool: processing or transforming EO products often relies on complex algorithms that require heavy computational resources. To improve their efficiency, we offer parallel and distributed computation (in the cloud or on premise, even without Kubernetes) using technologies such as Dask for parallelism and Dask Distributed or Ray for distributed computing. The main advantage of Dask is that it is a Python framework built on the most widely used data-analysis tools (e.g. pandas, NumPy), so researchers can reuse existing code and benefit from multi-node computing with very little programming effort. The Dask dashboard is available within the web browser, or as a frame inside a Jupyter Notebook, to monitor the status of workers (CPU, memory, ...) and tasks and to follow the execution of Dask graphs in real time.
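A minimal sketch of this pattern is shown below: a NumPy-like Dask array is processed in parallel on a local cluster, and the dashboard address printed by the client can be opened in the browser or embedded in a notebook. The worker count and array sizes are illustrative; on the platform, the scheduler and workers may instead run on distributed nodes, but the user code stays the same.

```python
# Minimal sketch of parallel computation with Dask (local cluster shown).
import dask.array as da
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=4)      # illustrative worker count
client = Client(cluster)
print("Dask dashboard:", client.dashboard_link)  # monitor workers and tasks live

# A NumPy-like array split into chunks that Dask processes in parallel
x = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))
result = (x - x.mean(axis=0)).std(axis=0).compute()
print(result.shape)  # (20000,)

client.close()
cluster.close()
```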
When their analyses are complete, users can explore and visualize data in several ways: from a Jupyter Notebook with standard visualization libraries for regular 2D or 3D products, or from the remote desktop using e.g. QGIS for geographic data.
For larger products that cannot be properly handled by these libraries (e.g. matplotlib, Bokeh), we have developed and integrated into the platform specific libraries that display 2D (QGISlab) and 3D (view3Dlab) products in a smooth and reactive way inside Jupyter Notebooks.

Finally, users can share their developments and communicate their analyses to third parties by transforming Jupyter Notebooks into operational services with Voilà. Voilà converts notebooks into interactive dashboards, including HTML widgets, served as a webpage that can be used from a simple browser.
The platform targets both cloud and high-performance computing (HPC) deployments. It is already used in production, for example in the AI4GEO project and at the French space agency (CNES).

The VRE has also been deployed in the cloud for the EO Africa project, which fosters an African-European R&D partnership facilitating the sustainable adoption of Earth Observation and related space technology in Africa, following an African user-driven approach with a long-term (>10 years) vision for the digital era in Africa. Thanks to containerization technologies, our VRE can be deployed easily on any DIAS and benefit from its infrastructure and data access. For EO Africa, Creodias has been selected: it provides direct access to a large amount of EO data and meets all the requirements for deploying our platform. Throughout the project life cycle, multiple versions of the VRE will be created to fulfill the needs of various events.

The platform was used during a hackathon in November 2021 with up to 40 participants, each with access to their own instance of the VRE, ready to visualize and transform EO data using Jupyter-based tools. Each participant could work independently or collaborate by sharing work with their own team directly within the VRE, thanks to a shared directory. On top of that, the VRE provides an additional tool to share, save, and keep the history of all the work done by people involved in the EO Africa project.