|Paper title||Large-scale machine learning techniques for vegetation traits estimation: opportunities for the CHIME mission|
|Form of presentation||Poster|
Growing food demand from an increasing population, together with a changing Earth, calls for more agricultural resources and up-to-date cropland monitoring. Optical Earth observation provides valuable data for estimating the vegetation traits that directly affect agricultural resources and vegetation quality [Verrelst2015].
In the current data era, an unprecedented inflow of information is acquired from satellite missions such as the Sentinel constellations, and considerably more data is expected from upcoming Sentinels such as the imaging spectroscopy mission CHIME. This valuable data stream can be used to obtain spatiotemporally explicit quantification of a suite of vegetation traits across the globe.
Despite the plethora of satellite data freely available to the community, the most valuable information for developing and validating vegetation retrieval models is the ground truth of the observations. Obtaining it is challenging, as it requires human-assisted annotation through field campaigns with high monetary costs.
Due to the impossibility of collecting ground truth for the whole Earth at any time, one feasible alternative is to use prior knowledge about the Earth system in order to generate physically-plausible data.
As an alternative to in situ observations, spectral observations of surfaces can be approximated with radiative transfer models (RTMs). RTMs are physically based models that generate pairs of spectra and variables; they are of crucial importance in optical remote sensing because of their capability to model surface-radiation interactions.
In this work we propose the use of RTM simulations and large-scale machine learning (ML) algorithms to develop hybrid models of vegetation traits such as chlorophyll (Chl), at both leaf and canopy levels, and leaf area index (LAI). Kernel ridge regression (KRR) has proven to be an effective ML algorithm for inferring such variables, but it is limited in the amount of data that can be used to build the model, because its computational cost grows cubically with the number of training samples. To alleviate this complexity burden, we compare three large-scale techniques: random Fourier features (RFF) [Rahimi2008], orthogonal random features (ORF) [Yu2016], and the Nyström method [Williams2001].
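As an illustration of the complexity argument above, the following minimal sketch contrasts exact KRR with an RFF approximation of a Gaussian kernel. It is not the implementation used in this work: the synthetic data, the kernel width `sigma`, the regularisation `lam`, and all function names are assumptions made for the example.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def krr_fit(X, y, sigma, lam):
    """Exact KRR: solve (K + lam * I) alpha = y. The n x n solve costs O(n^3)."""
    K = rbf_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(X.shape[0]), y)


def rff_map(X, W, b):
    """Random Fourier features z(x) = sqrt(2 / D) * cos(x W + b)."""
    n_features = W.shape[1]
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def rff_fit(X, y, sigma, lam, n_features, rng):
    """Linear ridge regression on D random features: O(n D^2 + D^3)."""
    # Frequencies are drawn from the Fourier transform of the Gaussian kernel.
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    Z = rff_map(X, W, b)
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(n_features), Z.T @ y)
    return W, b, w


def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Synthetic regression data (hypothetical stand-in for simulated spectra/traits).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(600, 5))
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.normal(size=600)
X_tr, y_tr, X_te, y_te = X[:500], y[:500], X[500:], y[500:]

sigma, lam = 1.0, 1e-2  # assumed hyperparameters, not tuned

# Exact KRR prediction.
alpha = krr_fit(X_tr, y_tr, sigma, lam)
pred_krr = rbf_kernel(X_te, X_tr, sigma) @ alpha

# RFF prediction with D = 300 random features.
W, b, w = rff_fit(X_tr, y_tr, sigma, lam, n_features=300, rng=rng)
Z_te = rff_map(X_te, W, b)
pred_rff = Z_te @ w

# How well the feature inner products reproduce the kernel on the test points.
kernel_err = float(np.mean(np.abs(Z_te @ Z_te.T - rbf_kernel(X_te, X_te, sigma))))

rmse_krr, rmse_rff = rmse(pred_krr, y_te), rmse(pred_rff, y_te)
print(f"RMSE exact KRR: {rmse_krr:.3f}  RMSE RFF: {rmse_rff:.3f}")
print(f"mean |Z Z^T - K| on test points: {kernel_err:.3f}")
```

In this sketch the exact solve works on an n × n matrix, whereas the RFF model only solves a D × D system, which is where the saving comes from when D is much smaller than n. ORF follows the same pattern but draws the frequency matrix with orthogonal rows; it is omitted here for brevity.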
We focus on the retrieval of the above-mentioned biophysical variables by building hybrid models from training data generated with the SCOPE RTM. Several experiments were designed; for the large-scale methods, we studied both error and execution time as a function of the rank of each method. The predictive performance of the proposed versions is comparable to that of the original KRR, while their execution time is lower [PerezSuay2017]. In particular, when estimating canopy chlorophyll content, the Nyström method achieved a root mean squared error (RMSE) close to 0.45, which is relatively close to the 0.4 achieved by KRR. For LAI, Nyström achieved an RMSE of 0.8, close to the 0.77 of KRR (the lowest value). Regarding computational cost, all the proposed methods reduce execution time by almost one order of magnitude in the current configuration, where the selected rank is 300, representing 10% of the data sample used to build the model. Furthermore, all models were validated against in situ data, achieving promising results in terms of accuracy. We also evaluated the validity of the models by making inferences on CHIME-like scenes derived from PRISMA data. The obtained results are promising in terms of error and provide a pathway to building more generic models from a larger amount of training data, and thus to reaching globally applicable models, e.g. in the context of the upcoming CHIME mission.
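The rank-versus-cost trade-off described above can be sketched with a Nyström approximation whose rank is 10% of the training set, mirroring the ratio quoted in the text (300 out of 3000). This is only an illustrative sketch on synthetic data: the uniform landmark sampling, the hyperparameters, the `jitter` term, and the function names are assumptions, and the timings it prints depend on the machine.

```python
import time

import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def krr_fit_predict(X, y, X_new, sigma, lam):
    """Exact KRR on all n training samples (n x n linear system)."""
    K = rbf_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(X.shape[0]), y)
    return rbf_kernel(X_new, X, sigma) @ alpha


def nystrom_fit_predict(X, y, X_new, sigma, lam, rank, rng, jitter=1e-8):
    """Nystrom KRR with `rank` landmarks sampled uniformly at random.

    Solves (K_nm^T K_nm + lam * K_mm) beta = K_nm^T y, an m x m system.
    """
    idx = rng.choice(X.shape[0], size=rank, replace=False)
    landmarks = X[idx]
    K_nm = rbf_kernel(X, landmarks, sigma)
    K_mm = rbf_kernel(landmarks, landmarks, sigma)
    # The small jitter is an assumption added for numerical stability.
    A = K_nm.T @ K_nm + lam * K_mm + jitter * np.eye(rank)
    beta = np.linalg.solve(A, K_nm.T @ y)
    return rbf_kernel(X_new, landmarks, sigma) @ beta


def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Synthetic data (hypothetical; not SCOPE simulations).
rng = np.random.default_rng(1)
n_train, n_test = 3000, 500
X = rng.uniform(-1.0, 1.0, size=(n_train + n_test, 5))
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.normal(size=n_train + n_test)
X_tr, y_tr, X_te, y_te = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

sigma, lam = 1.0, 1e-2  # assumed hyperparameters, not tuned
rank = n_train // 10    # 10% of the training samples

t0 = time.perf_counter()
pred_krr = krr_fit_predict(X_tr, y_tr, X_te, sigma, lam)
t_krr = time.perf_counter() - t0

t0 = time.perf_counter()
pred_nys = nystrom_fit_predict(X_tr, y_tr, X_te, sigma, lam, rank, rng)
t_nys = time.perf_counter() - t0

rmse_krr, rmse_nys = rmse(pred_krr, y_te), rmse(pred_nys, y_te)
print(f"exact KRR: RMSE {rmse_krr:.3f}, {t_krr:.2f} s")
print(f"Nystrom (rank {rank}): RMSE {rmse_nys:.3f}, {t_nys:.2f} s")
```

Repeating the last step over a range of `rank` values gives the kind of error and execution-time curves discussed above.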
[PerezSuay2017] A. Pérez-Suay, J. Amorós-López, L. Gómez-Chova, V. Laparra, J. Muñoz-Marí, and G. Camps-Valls. "Randomized kernels for large scale earth observation applications". Remote Sensing of Environment, 202:54--63, 2017.
[Rahimi2008] A. Rahimi and B. Recht. "Random features for large-scale kernel machines". Advances in Neural Information Processing Systems, volume 20. Curran Associates, Inc., 2008.
[Verrelst2015] J. Verrelst, G. Camps-Valls, J. Muñoz-Marí, J. P. Rivera, F. Veroustraete, J. G. Clevers, and J. Moreno. "Optical remote sensing and the retrieval of terrestrial vegetation bio-geophysical properties – a review". ISPRS Journal of Photogrammetry and Remote Sensing, 108:273--290, 2015.
[Williams2001] C. Williams and M. Seeger. "Using the Nyström method to speed up kernel machines". Advances in Neural Information Processing Systems, volume 13. MIT Press, 2001.
[Yu2016] F. X. X. Yu, A. T. Suresh, K. M. Choromanski, D. N. Holtmann-Rice, and S. Kumar. "Orthogonal random features". Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016.