Day 4

Detailed paper information

Paper title: Positive Unlabelled Learning for mapping Cereal and Forest land cover classes from Satellite Images Time Series
  1. Johann Desloires, Syngenta Seeds (Speaker)
  2. Dino Ienco, INRAE - UMR TETIS
Form of presentation: Poster
  • C1. AI and Data Analytics
    • C1.04 AI4EO applications for Land and Water
Abstract text:
Satellite image time series (SITS) offer opportunities to support agricultural resource monitoring and to deploy yield prediction models for specific forest types and cereal crops. In such a context, a common preliminary step is to produce binary land cover maps in which the class of interest is well defined over the study area, whereas the other class is hard to describe because it gathers all the remaining land cover classes. In addition, traditional supervised classification models need labels for every class to learn a discriminative model, and labelling each land cover type is time-consuming and labour-intensive.
Positive Unlabelled Learning (PUL) is a machine learning paradigm particularly suited to this one-class classification problem, since it only requires labelled samples of the class of interest. In a PU setting, the training data consist of one set of positive samples and one set of unlabelled samples, the latter potentially containing both positive and negative samples. PU data arise naturally in many classification problems, and the setting is well adapted to Earth observation applications, where unlabelled samples are plentiful. To the best of our knowledge, only a limited number of approaches have been proposed to cope with the complexity of satellite image time series and to exploit the abundance of unlabelled samples.
Our objective is to propose a new framework, PUL-SITS (Positive Unlabelled Learning of Satellite Image Time Series), based on a two-step learning strategy. In the first step, a recurrent neural network autoencoder is trained on positive samples only; the same autoencoder is then used to extract reliable negative samples from the unlabelled data according to the reconstruction error of each sample. In the second step, the labelled samples (positives and reliable negatives) and the unlabelled samples are exploited in a semi-supervised manner to build the final binary classification model.
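The reliable-negative selection of the first step can be sketched in dependency-free Python. The premise is that a model fitted to positives reconstructs positive-like series well and other series poorly. Two assumptions are made here and are not taken from the abstract: the RNN autoencoder is replaced by a hypothetical stand-in (`fit_mean_profile`) that returns the mean positive profile, and an unlabelled sample is kept as a reliable negative when its error exceeds the 95th percentile of the errors on the positive training samples. The abstract does not specify the actual selection rule.

```python
import math
from statistics import mean
from typing import Callable, List, Sequence

Series = Sequence[float]
Reconstructor = Callable[[Series], List[float]]


def fit_mean_profile(positives: List[Series]) -> Reconstructor:
    """Hypothetical stand-in for the RNN autoencoder trained on positives:
    it 'reconstructs' every input as the average positive time series."""
    profile = [mean(s[t] for s in positives) for t in range(len(positives[0]))]
    return lambda series: list(profile)


def reconstruction_error(series: Series, reconstruct: Reconstructor) -> float:
    """Mean squared error between a series and its reconstruction."""
    return mean((x - r) ** 2 for x, r in zip(series, reconstruct(series)))


def nearest_rank_quantile(values: List[float], q: float) -> float:
    """Nearest-rank q-quantile of values, for q in (0, 1]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def select_reliable_negatives(positives: List[Series], unlabelled: List[Series],
                              reconstruct: Reconstructor, q: float = 0.95) -> List[int]:
    """Indices of unlabelled series reconstructed worse than the q-quantile of the
    errors on the positive training series (hypothetical thresholding rule)."""
    threshold = nearest_rank_quantile(
        [reconstruction_error(p, reconstruct) for p in positives], q)
    return [i for i, u in enumerate(unlabelled)
            if reconstruction_error(u, reconstruct) > threshold]


# Toy NDVI-like profiles over four dates (hypothetical values).
positives = [[0.20, 0.50, 0.80, 0.50], [0.25, 0.55, 0.85, 0.50],
             [0.20, 0.45, 0.75, 0.45], [0.15, 0.50, 0.80, 0.55]]
unlabelled = [[0.20, 0.50, 0.80, 0.50], [0.60, 0.60, 0.60, 0.60],
              [0.22, 0.52, 0.78, 0.50], [0.10, 0.10, 0.10, 0.10]]

model = fit_mean_profile(positives)
print(select_reliable_negatives(positives, unlabelled, model))  # [1, 3]
```

In PUL-SITS the reconstruction would come from the trained autoencoder rather than a fixed profile; the selected series, labelled negative, then join the positives for the semi-supervised second step, which this sketch does not cover.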
We choose a study area in the Haute-Garonne department in southwestern France, strongly characterised by the Cereals/Oilseeds and Forest land cover classes. The study site, which covers 4,146.2 km², lies entirely within Sentinel-2 tile T31TCJ. The ground truth labels are obtained from several public land cover maps published in 2019, for a total of 846,838 pixels extracted from 7,358 randomly sampled objects. Since we address a positive and unlabelled learning setting, we consider two scenarios, each treating one land cover class as positive and all the other land cover classes as negative: in the first, Cereals/Oilseeds is the positive class (898 labelled objects in Haute-Garonne); in the second, Forest is the positive class (846 labelled objects). The attached figure shows (a) the location of the study area, (b) the spatial distribution of the ground truth and (c) the Sentinel-2 RGB composite.
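In each scenario the multi-class ground truth is collapsed into a one-vs-rest binary labelling. A minimal Python sketch, assuming labels are attached to objects and every extracted pixel inherits the label of its object; the helper names, object ids, pixel ids and class list below are hypothetical toy values, not the study's data.

```python
from collections import Counter
from typing import Dict


def one_vs_rest(object_classes: Dict[int, str], positive_class: str) -> Dict[int, int]:
    """Binary label per object: 1 for the positive class, 0 for every other class."""
    return {obj: int(cls == positive_class) for obj, cls in object_classes.items()}


def pixel_labels(pixel_to_object: Dict[int, int], object_labels: Dict[int, int]) -> Dict[int, int]:
    """Each pixel inherits the binary label of the object it was extracted from."""
    return {px: object_labels[obj] for px, obj in pixel_to_object.items()}


# Toy ground truth: object id -> land cover class (hypothetical).
object_classes = {1: "Cereals/Oilseeds", 2: "Forest", 3: "Urban",
                  4: "Cereals/Oilseeds", 5: "Water", 6: "Forest"}
# Toy pixel id -> object id mapping (hypothetical).
pixel_to_object = {10: 1, 11: 1, 12: 2, 13: 3, 14: 4, 15: 6}

for positive in ("Cereals/Oilseeds", "Forest"):
    obj_labels = one_vs_rest(object_classes, positive)
    px_labels = pixel_labels(pixel_to_object, obj_labels)
    print(positive, Counter(obj_labels.values()), Counter(px_labels.values()))
```

Labelling at object level keeps all pixels of an object on the same side of the positive/negative divide.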
To assess the proposed methodology, we design a fair evaluation protocol in which, for each experiment, the data (both positive and negative classes) are divided into a training set and a test set. The training set is then split again into a positive set, which contains only positive samples, and an unlabelled set, which contains samples from both the positive and negative classes. Since the number of positive samples may influence model behaviour, we vary the size of the positive set over {20, 40, 60, 80, 100} objects.
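The protocol can be sketched in Python as follows. It assumes the split is made at object level (so pixels of one object never fall on both sides), a hypothetical 30% test fraction and a fixed seed, and that the unlabelled set keeps every training object not chosen as positive. `pu_split` and the toy labels are illustrative, not the authors' code.

```python
import random
from typing import Dict, List, Tuple


def pu_split(binary_labels: Dict[int, int], n_positive: int,
             test_fraction: float = 0.3, seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Split object ids into a positive set P, an unlabelled set U and a test set.

    test_fraction and seed are hypothetical choices, not values from the study.
    """
    rng = random.Random(seed)
    ids = sorted(binary_labels)
    rng.shuffle(ids)  # same seed -> same train/test split for every n_positive
    n_test = round(test_fraction * len(ids))
    test, train = ids[:n_test], ids[n_test:]

    train_positives = [i for i in train if binary_labels[i] == 1]
    if n_positive > len(train_positives):
        raise ValueError("not enough positive objects in the training set")
    positive = rng.sample(train_positives, n_positive)

    # U keeps every other training object: leftover positives plus all negatives,
    # whose labels are hidden from the learner.
    chosen = set(positive)
    unlabelled = [i for i in train if i not in chosen]
    return positive, unlabelled, test


# Toy data: 1,000 objects, half of them positive (hypothetical).
labels = {i: int(i % 2 == 0) for i in range(1000)}
for n in (20, 40, 60, 80, 100):
    P, U, T = pu_split(labels, n)
    print(n, len(P), len(U), len(T))
```

Reusing the seed keeps the test set identical across the five positive-set sizes, so scores obtained with different numbers of positive objects remain comparable.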
Moreover, we provide a quantitative and qualitative comparison of our method with recent state-of-the-art Positive Unlabelled Learning methods for satellite images. We first consider a One-Class SVM trained on positive samples only, and then a PU method that weights unlabelled samples to bias the learning stage, evaluated separately with a Random Forest and with an ensemble of supervised algorithms. In addition, we provide two ablation studies to disentangle the contribution of each component of the proposed semi-supervised approach. While the One-Class SVM achieves the best performance among the state-of-the-art competitors, with weighted F-Measure values ranging from 63.9 to 65.2 for Cereals/Oilseeds and from 82.7 to 87.2 for Forest, PUL-SITS outperforms all other approaches, with values ranging from 78.9 to 88.6 and from 91.4 to 92.9, respectively.
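Assuming the weighted F-Measure is the usual support-weighted average of per-class F1 scores (the quantity scikit-learn's `f1_score` computes with `average="weighted"`), taken over the positive and negative classes and reported ×100, it can be sketched without dependencies as follows; the toy labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1 averaged with weights equal to each class's support in y_true."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        # F1 is taken as 0 when a class has no true positives.
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += n_true * f1
    return total / len(y_true)


# Toy binary example: 1 = positive class, 0 = negative class (hypothetical labels).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(round(100 * weighted_f1(y_true, y_pred), 1))  # 70.3
```

Weighting by support means the larger class dominates the score, which matters when negatives far outnumber positives in the test set.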