|Paper title||OPERATIONAL INNOVATIONS AND LESSONS LEARNED FROM SWARM REPROCESSING|
|Form of presentation||Poster|
Since the launch of the Swarm mission in 2013, Swarm mission data have been produced systematically up to Level 2 (CAT2) within the ESA Archiving and Payload Data Facility (APDF). In parallel with nominal operations, the L1b and L2CAT2 processing algorithms undergo constant improvement, and new Instrument Processing Facility (IPF) versions are released whenever the Swarm Data, Innovation, and Science Cluster (DISC) team has approved stable algorithms. With every new major IPF release, a complete reprocessing of the Swarm mission data is required before a new baseline can be published to the end user. The reprocessing is carried out in a dedicated environment and in individual reprocessing campaigns. Since the start of operations, two reprocessing campaigns have been completed successfully in this way, and a third campaign is being executed to reprocess the full 8 years of mission data.
As reprocessing the full mission data is computationally intensive, the reprocessing environment of the Swarm APDF is equipped with scalable processing nodes in a cluster streamlined for high load, with parallel processing of the IPFs, optimized quality control, and report generation for monitoring purposes.
To meet the demands of the reprocessing campaigns, the IPF executables have been optimized for parallel operation by removing dependencies on previous-day input and on external licenses, so that they can be scaled linearly to achieve the required throughput.
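The effect of removing the previous-day dependency can be illustrated with a minimal Python sketch: once each day of data can be processed on its own, the days can be handed to a worker pool in any order. This is only an illustration under stated assumptions, not the APDF implementation. The names `daily_range`, `process_day`, and `reprocess` are hypothetical, and `process_day` merely stands in for launching one IPF run; a thread pool is used on the assumption that the real work would happen in an external executable.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def daily_range(start: date, end: date) -> list[date]:
    """Return every date from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def process_day(day: date) -> str:
    """Hypothetical stand-in for one IPF run.

    It needs only the given day's input, so no run has to wait
    for the product of the previous day.
    """
    return f"product_{day.isoformat()}"


def reprocess(days: list[date], instances: int) -> dict[date, str]:
    """Run process_day for all days on a pool of `instances` workers."""
    with ThreadPoolExecutor(max_workers=instances) as pool:
        # map() returns results in input order, whatever order the runs finish in.
        return dict(zip(days, pool.map(process_day, days)))


if __name__ == "__main__":
    days = daily_range(date(2014, 1, 1), date(2014, 1, 10))
    products = reprocess(days, instances=4)
    print(len(products), "days reprocessed")
```

Had each day depended on the previous day's output, the runs would form a chain and additional workers would bring no benefit; with independent days, the number of workers becomes the main lever on throughput.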
With a design that makes it scalable, configurable and robust, the APDF software additionally supports smooth and successful execution of the reprocessing.
The reprocessing environment makes use of up to 30 L1b Magnet processing instances and 110 L2CAT2 IPF instances in parallel, which are spread over 10 virtual machines in ESA ESRIN's cluster infrastructure.
This setup, in combination with the related system optimizations, achieves a very high throughput: 3 months of operational L1b data can be reprocessed in one day, and one year of L2CAT2 data in one day.
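As a rough back-of-the-envelope check, the throughput figures above can be turned into wall-clock estimates for a campaign covering 8 years of mission data. The sketch below assumes that the quoted rates are sustained for the whole campaign and apply to the complete data set; neither assumption is confirmed here, and the function name `campaign_days` is hypothetical.

```python
def campaign_days(mission_years: float, years_per_day: float) -> float:
    """Wall-clock days needed to reprocess mission_years at a sustained rate."""
    return mission_years / years_per_day


MISSION_YEARS = 8.0            # full mission data volume in the third campaign
L1B_YEARS_PER_DAY = 3 / 12     # 3 months of L1b data per day (assumed sustained)
L2CAT2_YEARS_PER_DAY = 1.0     # one year of L2CAT2 data per day (assumed sustained)

l1b_days = campaign_days(MISSION_YEARS, L1B_YEARS_PER_DAY)
l2cat2_days = campaign_days(MISSION_YEARS, L2CAT2_YEARS_PER_DAY)

print(f"L1b:    about {l1b_days:.0f} days")     # about 32 days
print(f"L2CAT2: about {l2cat2_days:.0f} days")  # about 8 days
```

Under these assumptions, pure processing time is on the order of weeks; quality control, data transfer, and any reruns would add to it.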
The overall success of the Swarm reprocessing campaigns can further be attributed to the close collaboration of all teams involved. The APDF system evolutions are based on the operations team's direct needs, which are formulated and communicated to the system maintainers in short communication loops following an agile method. This process, too, is supported by the underlying APDF software with its high configurability and overall robustness.
In conclusion, the Swarm reprocessing campaigns can serve as a model for other missions when it comes to the cost-effective introduction of system changes and the effective execution of change procedures with only a small overhead.