|Paper title||Exploiting Spatial and Temporal Information with ConvLSTM Networks for Cloud Detection over Landmarks|
|Form of presentation||Poster|
This work explores cloud detection in time series of Earth observation satellite images using deep learning methods. In recent years, machine-learning-based techniques have demonstrated excellent performance in classification tasks compared with threshold-based methods that rely on the spectral characteristics of satellite images. In this study, we use MSG/SEVIRI data acquired over one year at a 15-min temporal resolution over 13 landmarks distributed across different geographic locations with diverse properties and scenarios. In particular, we implement an end-to-end deep learning network, called ConvLSTM, which consists of a U-Net segmentation CNN coupled to a long short-term memory (LSTM) layer. The network design aims to exploit the spatial information contained in the images and the temporal dynamics of the time series simultaneously to provide state-of-the-art classification results. In the experiments, we address several related problems. On the one hand, we compare the proposed network with standard baselines, such as an ensemble of SVMs, and with other recurrent models, such as convRNN. On the other hand, we validate the robustness of the proposed method by training with data from all the available landmarks except the one used for evaluation. Then, the network is fine-tuned to measure its generalization and global fitness through the impact on the performance metrics. Secondary objectives of the work include evaluating different training strategies for the implemented model through architecture modifications, e.g. measuring the impact of removing the batch normalization layers. Moreover, we have evaluated two different strategies for training the ConvLSTM. The standard approach consists of training the full network from scratch at once. However, we achieve better performance with two-phase training, i.e. training the CNN part first and then training the full network end-to-end, starting from the CNN weights. The results provide interesting insights into the nature of the image time series and its relation to network architecture and training.
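The robustness experiment described above (training on every landmark except the one used for evaluation) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the authors' code: the function name `leave_one_landmark_out`, the landmark names, and the sample identifiers are all hypothetical, and the actual training and fine-tuning steps are omitted.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_landmark_out(
    samples_by_landmark: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out, train_samples, test_samples) once per landmark.

    Each landmark is held out in turn; the training set gathers the
    samples of all remaining landmarks.
    """
    for held_out in sorted(samples_by_landmark):
        train = [
            sample
            for name, samples in sorted(samples_by_landmark.items())
            if name != held_out
            for sample in samples
        ]
        test = list(samples_by_landmark[held_out])
        yield held_out, train, test


# Hypothetical toy data: three landmarks with a few image identifiers each.
toy = {
    "landmark_a": ["a_000", "a_001"],
    "landmark_b": ["b_000"],
    "landmark_c": ["c_000", "c_001", "c_002"],
}

splits = list(leave_one_landmark_out(toy))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

In such a protocol, a model would be trained on `train`, evaluated on `test`, and optionally fine-tuned on part of the held-out landmark to measure the change in the performance metrics.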
Keywords: convolutional neural networks, CNN, LSTM, landmarks, MSG/SEVIRI, cloud detection.
Acknowledgements: This work was supported by the Spanish Ministry of Science and Innovation under the project PID2019-109026RB-I00.
 L. Gomez-Chova, G. Camps-Valls, J. Calpe, L. Guanter, and J. Moreno, “Cloud-screening algorithm for ENVISAT/MERIS multispectral images,” IEEE Trans. on Geoscience and Remote Sensing, vol. 45, no. 12, Part 2, pp. 4105–4118, Dec. 2007.
 Mateo-García, G., Adsuara, J. E., Pérez-Suay, A., & Gómez-Chova, L. (2019, July). Convolutional Long Short-Term Memory Network for Multitemporal Cloud Detection Over Landmarks. In IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium (pp. 210-213). IEEE.
 Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in MICCAI. Oct. 2015, Lecture Notes in Computer Science, pp. 234–241, Springer, Cham.
 Sepp Hochreiter and Jürgen Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
 Pérez-Suay, A., Amorós-López, J., Gómez-Chova, L., Muñoz-Marí, J., Just, D., & Camps-Valls, G. (2018). Pattern recognition scheme for large-scale cloud detection over landmarks. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(11), 3977-3987.
 Turkoglu, M. O., D'Aronco, S., Perich, G., Liebisch, F., Streit, C., Schindler, K., & Wegner, J. D. (2021). Crop mapping from image time series: deep learning with multi-scale label hierarchies. arXiv preprint arXiv:2102.08820.