Day 4

Detailed paper information

Paper title Land Use and Land Cover Classification using CNN Deep Learning Architectures
  1. Luigi Selmi datiaperti Speaker
Form of presentation Poster
  • C1. AI and Data Analytics
    • C1.04 AI4EO applications for Land and Water
Abstract text Classical machine learning algorithms, such as Random Forests or Support Vector Machines, are commonly used for Land Use and Land Cover (LULC) classification tasks. Land cover indicates the type of surface, such as forest, agriculture or urban, whereas land use indicates how people use the land. Land cover can be determined from the reflectance properties of the surface. This information is commonly extracted from aerial or satellite imagery whose pixel values represent the solar energy reflected by the Earth's surface in different spectral bands. Spectral data at the pixel level alone, however, cannot provide information about land use: an image patch has to be considered in its entirety to infer its use. Additional information is often required to disambiguate among the possible uses of a piece of land.
The purpose of this work was to study how accurately Convolutional Neural Networks (CNNs) can learn the spatial and spectral characteristics of image patches of the Earth's surface, extracted from Sentinel-2 satellite images, for LULC classification tasks. A CNN that can learn to distinguish different types of land cover, where geometries and reflectance properties can be mixed in many different ways, requires an architecture with many layers to achieve good accuracy. Such architectures are expensive to train from scratch, both in the amount of labeled data needed and in time and computing resources. It is now common practice in computer vision to reuse a model that has been pretrained on a different but large set of examples, such as ImageNet, and to fine-tune it with data specific to the task at hand. Fine-tuning is a transfer learning technique in which the parameters of a pretrained neural network are updated using the new data.
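The fine-tuning idea described above can be illustrated with a deliberately tiny sketch. It is not the ResNet50 setup used in this work: a single-layer logistic model stands in for the network, and the "pretrained" parameters and the task data are hypothetical values chosen only for illustration. The point it shows is that training starts from existing parameters and continues on new data, rather than starting from a random initialization.

```python
import math


def predict(weights, bias, x):
    """Sigmoid output of a single-layer model for one feature vector."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(weights, bias, data):
    """Mean binary cross-entropy over (features, label) pairs."""
    total = 0.0
    for x, y in data:
        p = min(max(predict(weights, bias, x), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def fine_tune(weights, bias, data, lr=0.1, epochs=50):
    """Continue gradient descent starting from the given parameters."""
    w, b = list(weights), bias  # copy, so the pretrained values are kept
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, b, x) - y  # loss gradient w.r.t. the pre-activation
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


# Hypothetical parameters standing in for a model pretrained on another task.
PRETRAINED_W, PRETRAINED_B = [0.5, -0.5], 0.0

# Hypothetical task-specific data: (features, label) pairs.
NEW_DATA = [([1.0, 0.2], 1), ([0.9, 0.4], 1), ([0.2, 0.9], 0), ([0.1, 0.7], 0)]

TUNED_W, TUNED_B = fine_tune(PRETRAINED_W, PRETRAINED_B, NEW_DATA)
```

Comparing `mean_loss(PRETRAINED_W, PRETRAINED_B, NEW_DATA)` with `mean_loss(TUNED_W, TUNED_B, NEW_DATA)` shows how much the updated parameters improve the fit on the new data. In a deep network the same principle applies to millions of parameters, usually with a small learning rate so that the pretrained features are adjusted rather than overwritten.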
In this work we used the ResNet50 architecture, pretrained on the ImageNet dataset and fine-tuned with the EuroSAT dataset, a set of 27,000 image patches extracted from Sentinel-2 images, containing 13 spectral bands from the visible to the shortwave infrared at 10 m spatial resolution and divided into 10 classes. To further improve the classification accuracy, we used data augmentation to create additional images from the original EuroSAT dataset by applying transformations such as flipping, rotation and brightness modification. Finally, we analyzed how accurately the fine-tuned CNN detects changes in image patches that were not included in the EuroSAT dataset. A change in an image patch is represented by a change in the probability values predicted for each class. Since the ResNet50 model was pretrained on ImageNet, whose images have only the three RGB bands, the other bands available in the Sentinel-2 MSI products and in the EuroSAT images are not used. To investigate the accuracy that can be achieved with the additional bands available in the EuroSAT dataset, we trained a smaller CNN architecture from scratch using only the EuroSAT dataset and compared its results with those of the ResNet50 architecture pretrained on the ImageNet dataset.
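The data augmentation step mentioned above (flipping, rotation and brightness modification) can be sketched in a few lines. This is a minimal illustration, not the pipeline used in this work: it assumes a single-band patch stored as a list of rows with integer values from 0 to 255, and the function names, the brightness factor and the example patch are all hypothetical.

```python
def flip_horizontal(patch):
    """Mirror the patch left to right."""
    return [row[::-1] for row in patch]


def flip_vertical(patch):
    """Mirror the patch top to bottom."""
    return [row[:] for row in patch[::-1]]


def rotate90(patch):
    """Rotate the patch by 90 degrees clockwise."""
    return [list(col) for col in zip(*patch[::-1])]


def adjust_brightness(patch, factor, max_value=255):
    """Scale every pixel by `factor`, clipping to the range [0, max_value]."""
    return [
        [min(max_value, max(0, round(value * factor))) for value in row]
        for row in patch
    ]


def augment(patch, brightness_factor=1.5):
    """Return transformed copies of one patch; the original is not modified."""
    return [
        flip_horizontal(patch),
        flip_vertical(patch),
        rotate90(patch),
        adjust_brightness(patch, brightness_factor),
    ]


# Hypothetical 2x2 single-band patch, used only to exercise the functions.
PATCH = [[10, 20], [30, 200]]
VARIANTS = augment(PATCH)
```

Each transformed copy keeps the class label of the patch it was derived from, which is what allows the training set to grow without new labeling. For multi-band patches the same geometric transformations would be applied identically to every band.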