Day 4

Detailed paper information

Paper title Pangeo, an Open source driven community for Big Data geoscience
Authors
  1. Anne Fouilloux University of Oslo Speaker
  2. Alejandro Coca-Castro The Alan Turing Institute
  3. Pier Lorenzo Marasco SEIDOR Italia
  4. Peter Strobl European Commission - Joint Research Centre (JRC)
  5. Tina Odaka IFREMER
  6. Guillaume Eynard-Bontemps CNES
Form of presentation Poster
Topics
  • Open Earth Forum
    • C5.03 Open Source, data science and toolboxes in EO: Current status & evolution
Abstract text Pangeo is first and foremost an inclusive community promoting open, reproducible and scalable science. This community provides documentation, develops and maintains software, and deploys computing infrastructure to make scientific research and programming easier.
There is no single software package called “Pangeo”; rather, the Pangeo project serves as a coordination point between scientists, software, and computing infrastructure.
Pangeo is based around the Python programming language and the scientific Python software ecosystem. The Pangeo stack is an agile collection of open-source Python tools which, when combined, enable efficient and flexible distributed processing of large geospatial datasets. So far it has been used primarily in the ocean, weather, climate, and remote sensing domains, but it is equally relevant across the whole geospatial field.
The Pangeo software ecosystem includes open-source tools such as xarray, a data model and analysis toolkit based on the NetCDF data model; Zarr, for cloud-optimised data storage; Dask, a framework for parallel computing; and Jupyter, for user interaction with remote computing systems.
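As a rough illustration of how two of these tools fit together, the following minimal sketch builds a small synthetic data cube with xarray, splits it into Dask-backed chunks, and computes a time average lazily. The array shape, coordinate values, variable name, and chunk size are assumptions chosen for illustration and do not come from any Pangeo deployment; Zarr storage and Jupyter are left out to keep the sketch short.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a geospatial data cube (time, lat, lon).
# Shape, coordinates, and values are illustrative assumptions only.
rng = np.random.default_rng(0)
temperature = xr.DataArray(
    rng.normal(15.0, 5.0, size=(12, 18, 36)),
    dims=("time", "lat", "lon"),
    coords={
        "time": np.arange(12),
        "lat": np.linspace(-85.0, 85.0, 18),
        "lon": np.linspace(-175.0, 175.0, 36),
    },
    name="temperature",
)

# Split the array into Dask-backed chunks of 4 time steps each
# (requires Dask to be installed alongside xarray).
chunked = temperature.chunk({"time": 4})

# Build the computation lazily: no data is reduced at this point.
time_mean = chunked.mean(dim="time")

# Trigger the chunk-wise computation and load the result into memory.
result = time_mean.compute()
print(result.dims, result.shape)
```

With a real dataset, the synthetic array would typically be replaced by one opened from storage, for example with xarray's `open_zarr`, and the same lazy operations would then run across a Dask cluster rather than on a single machine.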
The Pangeo tools can be adapted to a wide range of usage scenarios and deployed on many different architectures. The community focuses on acting as a coordination point between scientists and engineers, software, and computing infrastructure.
In this presentation we would like to showcase real-world applications of the Pangeo stack and discuss with all stakeholders how Pangeo can be part of a European approach to geospatial “Big Data” processing. Such an approach should be sustainable in the long term, inclusive in that it is open to everyone, and flexible and open enough to allow us to move smoothly from one platform to another.

Come and learn about a pace-setting, fully open-source initiative that is already at the core of many data cube implementations and is bringing the European community together to participate in this global effort. Pangeo (https://pangeo.io/) has huge potential to become a common gateway able to leverage a wide variety of infrastructures and data providers.