|Paper title||Automatic machine learning for efficient forest classification service: from modelling to operational service|
|Form of presentation||Poster|
The global availability of Sentinel-2 images makes mapping tree species distribution over large areas easier than ever before, which can greatly benefit the management of forest resources. Methods for deriving tree species classifications from Sentinel-2 data are well researched, including tests and comparisons of various Machine Learning (ML) algorithms (Grabska et al., 2019; Immitzer et al., 2019; Lim et al., 2020; Persson et al., 2018; Thanh Noi and Kappas, 2018; Wessel et al., 2018). However, implementing this knowledge in an operational service that delivers products to end users, such as forest managers and forestry consulting companies, remains a major challenge. In this presentation we share our experience of turning ML modelling into an operational service dedicated to tree species classification.
NextLand is an alliance of Earth Observation (EO) stakeholders who collaborate to offer cutting-edge EO technology by co-designing 15 commercial agriculture and forestry services. The NextLand Forest Classification service pursues the ambitious goal of combining ML expertise with geoscience knowledge and cloud service know-how to provide end-to-end solutions to our users. Achieving this objective requires addressing several key issues, including algorithm selection, modular pipeline development, close cooperation with users for service fine-tuning, and service integration into a visible marketplace.
The ML algorithm selection process has already been presented in Łoś et al. (2021). We compared the performance of XGBoost and Light Gradient Boosting Machine (LGBM) with that of Random Forest, Support Vector Machine and K-Nearest Neighbour, algorithms widely used in remote sensing, by classifying 8 classes of tree species over a 40 000 km² area in central Portugal. LGBM was chosen as the best fit for our needs, taking into account both efficacy (measured through F1-score and accuracy) and efficiency (measured through processing time).
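The selection criteria named above can be illustrated with a minimal sketch. It is not the evaluation code used in the study: the function names are hypothetical, the F1-score is macro-averaged over classes (the averaging scheme is an assumption, as it is not stated here), and only prediction time is measured, whereas the reported processing time may also cover training.

```python
import time
from typing import Callable, Dict, List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of samples whose predicted class equals the reference class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class: List[float] = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def evaluate(
    predict: Callable[[Sequence[Sequence[float]]], List[str]],
    features: Sequence[Sequence[float]],
    y_true: Sequence[str],
) -> Dict[str, float]:
    """Score one candidate classifier on efficacy and on prediction time."""
    start = time.perf_counter()
    y_pred = predict(features)
    elapsed = time.perf_counter() - start
    return {
        "accuracy": accuracy(y_true, y_pred),
        "macro_f1": macro_f1(y_true, y_pred),
        "seconds": elapsed,
    }
```

Any classifier that exposes a prediction function can be passed to `evaluate`, so candidates such as LGBM and Random Forest could be compared on the same held-out samples and ranked on all three values.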
The processing pipeline for the NextLand Forest Classification service is built from modules, which makes adaptation and further development convenient. Individual modules contribute to larger tasks such as image pre-processing or data preparation. Because we cooperate with users who have varying requirements, the ability to adapt the pipeline by selecting the relevant modules is crucial. For example, a user can choose a product generated with an in-house model owned by the service provider, or can provide their own reference data to develop a new model. In the first case the pipeline runs in classification mode only. In the second case it runs first in training mode and then in classification mode. Users can provide reference data as points or as polygons, so the module that reads reference data must handle both types. When a new model is developed, a user can choose whether the final product representing tree species distribution is generated from the same Sentinel-2 data that were used for model development, or from Sentinel-2 data representing another year, e.g., the most recent one. We found that users often own archival forest inventory data, which are used for model development, while being interested in tree species distribution in recent years. This requirement is handled by the module dedicated to satellite data download. Some users prefer products delivered as GeoTIFF, while others prefer shapefiles. By default, the pipeline provides the tree species classification as a GeoTIFF, and on request a module converting raster to vector is added to the pipeline. The examples above confirm the importance of a modular approach in developing an operational EO-based service.
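The module selection described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the service's actual code: `ServiceRequest`, `build_pipeline` and all step names are hypothetical, and each module is represented only by a label rather than by a real processing step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceRequest:
    """Hypothetical description of what a user asked for."""
    classification_year: int
    reference_geometry: Optional[str] = None  # "points", "polygons", or None (in-house model)
    training_year: Optional[int] = None       # defaults to classification_year
    output_format: str = "geotiff"            # "geotiff" (default) or "shapefile"


def build_pipeline(request: ServiceRequest) -> List[str]:
    """Return the ordered list of module labels needed for one request."""
    steps: List[str] = []
    target_year = request.classification_year

    if request.reference_geometry is None:
        # Classification mode only: the in-house model is applied directly.
        steps += [f"download_sentinel2[{target_year}]", f"preprocess[{target_year}]"]
    else:
        # Training mode first, using the user's reference data.
        if request.reference_geometry not in ("points", "polygons"):
            raise ValueError("reference data must be points or polygons")
        train_year = target_year if request.training_year is None else request.training_year
        steps += [
            f"download_sentinel2[{train_year}]",
            f"preprocess[{train_year}]",
            f"read_reference[{request.reference_geometry}]",
            "train_model",
        ]
        if train_year != target_year:
            # Archival reference data, recent product: fetch a second year of imagery.
            steps += [f"download_sentinel2[{target_year}]", f"preprocess[{target_year}]"]

    steps.append("classify")

    if request.output_format == "shapefile":
        steps.append("raster_to_vector")
    elif request.output_format != "geotiff":
        raise ValueError("output format must be geotiff or shapefile")
    return steps
```

For instance, `build_pipeline(ServiceRequest(classification_year=2021))` yields only the download, pre-processing and classification steps, while a request with polygon reference data from an earlier year and shapefile output adds the reference reading, training, second download and raster-to-vector steps.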
As the service is developed for users, close cooperation with them is crucial to building a successful application. We target users with expertise in forest management, which does not necessarily include ML and EO knowledge. Users have to be informed about the requirements of ML approaches, in particular that a model can only be as good as its training data. Forest inventory data are an excellent input for ML models because of their high accuracy. Moreover, as these data are collected by forest owners for various applications, using them in EO-based services does not generate additional data acquisition costs. In practice, however, forest inventory data are rarely shared, for confidentiality, privacy and other reasons (e.g., the economic value of the data). Apart from Finland, to the best of our knowledge, none of the European Union countries provides open access to its national forest inventory. Limited access to high-quality training data is one of the main limiting factors for ML EO-based applications in forestry. It can be mitigated by, e.g., signing an agreement on the usage of data provided by a user. We learnt that close cooperation with a user is also important at the product evaluation stage. Limitations of EO-based services, e.g., regarding spatial resolution, should be clearly stated before product delivery.
Convenient access to a service is another key element of a successful EO-based application. The NextLand Forest Classification service is integrated into Store4EO, the Deimos EO Exploitation Platform solution. This platform hosts service development, integration, deployment, delivery and operation activities. Its design and deployment are driven by the need to deliver services that are easily tailored to real operational conditions, accepted by the users, and become a constituent element of the users’ business-as-usual working scheme.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 776280.
Grabska, E., Hostert, P., Pflugmacher, D., & Ostapowicz, K. (2019). Forest stand species mapping using the Sentinel-2 time series. Remote Sensing, 11(10), 1197.
Immitzer, M., Neuwirth, M., Böck, S., Brenner, H., Vuolo, F., & Atzberger, C. (2019). Optimal input features for tree species classification in Central Europe based on multi-temporal Sentinel-2 data. Remote Sensing, 11(22), 2599.
Lim, J., Kim, K. M., Kim, E. H., & Jin, R. (2020). Machine Learning for Tree Species Classification Using Sentinel-2 Spectral Information, Crown Texture, and Environmental Variables. Remote Sensing, 12(12), 2049.
Łoś, H., Mendes, G. S., Cordeiro, D., Grosso, N., Costa, H., Benevides, P., & Caetano, M. (2021). Evaluation of Xgboost and Lgbm Performance in Tree Species Classification with Sentinel-2 Data. In 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS (pp. 5803-5806). IEEE.
Persson, M., Lindberg, E., & Reese, H. (2018). Tree species classification with multi-temporal Sentinel-2 data. Remote Sensing, 10(11), 1794.
Thanh Noi, P., & Kappas, M. (2018). Comparison of random forest, k-nearest neighbor, and support vector machine classifiers for land cover classification using Sentinel-2 imagery. Sensors, 18(1), 18.
Wessel, M., Brandmeier, M., & Tiede, D. (2018). Evaluation of different machine learning algorithms for scalable classification of tree types and tree species based on Sentinel-2 data. Remote Sensing, 10(9), 1419.