Day 4

Detailed paper information

Paper title Distance to trees estimation on landscape photos using semantic segmentation and skyline variations
  1. Laura Martinez-Sanchez Joint Research Centre - European Commission Speaker
  2. Daniele Borio European Commission, Joint Research Centre (JRC)
  3. Raphael d’Andrimont European Commission, Joint Research Centre (JRC)
  4. Marijn Van Der Velde European Commission, Joint Research Centre, Ispra, Italy
Form of presentation Poster
  • C1. AI and Data Analytics
    • C1.04 AI4EO applications for Land and Water
Abstract text The human eye can approximately estimate the distance to nearer and farther objects in landscape photos. Advances in image analysis, such as semantic and instance segmentation, allow computers to identify objects in photos or videos in near real time. This capacity is also revolutionizing in-situ data collection for Earth Observation, potentially turning existing geo-tagged photos into sources of in-situ data. Automatically estimating the distance between the point of observation and the identified objects is the first step toward localizing them. Moreover, approximate distance estimates can be used to determine fundamental landscape properties, including openness. In this respect, a landscape is open if it is not surrounded by nearby objects that occlude the view.

In this work, we show how variations in the skyline of landscape photos can be used to approximate the distance to trees on the horizon. This is done by detecting the objects forming the skyline and analysing the skyline signal itself. The skyline is defined as the boundary between the sky and the non-sky (ground object) regions of an image. The skyline signal is the height of the skyline (its y coordinate in the image) expressed as a function of the horizontal image coordinate (x).
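The definition of the skyline signal above can be illustrated with a minimal sketch. It assumes a boolean sky mask is already available from a segmentation step; the function name `skyline_signal`, the top-down row convention, and the treatment of all-sky columns are illustrative assumptions, not details given in the abstract.

```python
import numpy as np

def skyline_signal(sky_mask):
    """Return the skyline row index for every image column.

    sky_mask: 2-D boolean array (rows x columns), True where a pixel is sky.
    Rows are assumed to be counted from the top of the image, so a smaller
    value means a higher skyline. Columns that contain only sky get the
    image height as a "no skyline found" marker (an assumption of this sketch).
    """
    sky_mask = np.asarray(sky_mask, dtype=bool)
    height = sky_mask.shape[0]
    non_sky = ~sky_mask
    # argmax gives the first True (first non-sky pixel) from the top of each column
    first_non_sky = non_sky.argmax(axis=0)
    has_ground = non_sky.any(axis=0)
    return np.where(has_ground, first_non_sky, height)
```

Scanning each column from the top keeps only the uppermost sky/non-sky transition, which matches the idea of a single-valued signal of x.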

In this study, we use 150 landscape photos collected during the 2018 Land Use/Cover Area frame Survey (LUCAS) campaign. In a first step, the landscape photos are semantically segmented with DeepLab-V3, trained on the Common Objects in Context (COCO) dataset, to provide a pixel-level classification of the objects forming the image. In a second step, a Conditional Random Fields (CRF) algorithm is applied to refine the segmentation and to extract the skyline signal. The CRF algorithm improves the skyline resolution, increasing the skyline length by a factor of two on average. This is an important result, which improves performance when estimating tree distances. For each photo, the skyline is described by the skyline signal, ysky[x], and by the associated object classes, ck[x]. In particular, the objects forming the skyline are identified and associated with different classes. The signal ck[x] returns the class to which pixel (x, ysky[x]) belongs. Different objects, such as trees, houses and buildings, have different geometrical properties and need to be analyzed separately. For this reason, object classification is a crucial step in the methodology developed in this work.
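As a rough illustration of how the class signal ck[x] relates to the skyline signal, the sketch below reads the class label at each skyline pixel and then groups adjacent columns that share a class. The function names, the integer label map, and the use of -1 for columns without a skyline are assumptions made for this example only.

```python
import numpy as np

def class_signal(label_map, y_sky):
    """Return the class label at pixel (x, y_sky[x]) for every column x.

    label_map: 2-D integer array of per-pixel class ids (rows x columns).
    y_sky:     1-D integer array with one skyline row index per column.
    Columns whose skyline index falls outside the image get -1.
    """
    label_map = np.asarray(label_map)
    y_sky = np.asarray(y_sky)
    height, width = label_map.shape
    cols = np.arange(width)
    valid = (y_sky >= 0) & (y_sky < height)
    classes = np.full(width, -1, dtype=int)
    classes[valid] = label_map[y_sky[valid], cols[valid]]
    return classes

def same_class_segments(classes):
    """Split a class signal into runs: a list of (class_id, start, stop)."""
    classes = np.asarray(classes)
    if classes.size == 0:
        return []
    # Positions where the class changes mark the start of a new segment
    breaks = np.flatnonzero(np.diff(classes) != 0) + 1
    starts = np.concatenate(([0], breaks))
    stops = np.concatenate((breaks, [classes.size]))
    return [(int(classes[s]), int(s), int(e)) for s, e in zip(starts, stops)]
```

Each returned run covers columns `start` to `stop - 1`, so the matching piece of the skyline signal is `y_sky[start:stop]`.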

The main idea developed and exploited in this work is that distant objects show lower variations in the corresponding skyline signal. For instance, a close tree is characterized by an irregular profile that is rich in detail. When such a tree forms the skyline, the corresponding skyline signal shows large and fast variations. As the distance between the point of observation and the tree increases, details are lost and the skyline signal becomes smoother, with fewer details and variations. This principle has been developed by considering different metrics to quantify signal variations and by investigating potential relationships between object distance and the variation metrics.

Variation metrics have been computed from first order differences of the skyline signals. First order differences, which correspond to a numerical derivative, remove offsets in the skyline signal and act as a high-pass filter that enhances high-frequency signal variations. After computing first order differences, three metrics were evaluated: the normalized segment length, the sample variance, and the absolute deviation. Each metric has been computed on skyline segments belonging to the same object class, as identified by the signal ck[x]. In addition, the effect of windowing has been considered. Windowing limits the length of the segment used for the metric computation and has been introduced to mitigate the effect of different objects belonging to the same class. Consider, for instance, the case where a line of trees is present in the skyline. This line of trees can be slanted, so the trees may be at different distances. Since all the trees belong to the same object class, the whole corresponding skyline segment would be used for the metric computation. With windowing, only a portion of the skyline segment is used, reducing the impact of objects at different distances.
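The metric computation described above can be sketched as follows. The abstract does not give the exact formulas, so the definitions here are assumptions: the normalized segment length is taken as the polyline length per horizontal pixel step, the absolute deviation as the mean absolute deviation of the differences around their mean, and the window as a centred slice of the segment. The function names are hypothetical.

```python
import numpy as np

def variation_metrics(y_segment):
    """Compute three variation metrics from one same-class skyline segment."""
    y = np.asarray(y_segment, dtype=float)
    if y.size < 3:
        raise ValueError("need at least three skyline samples")
    # First order differences: remove any constant offset in the signal
    d = np.diff(y)
    return {
        # Assumed definition: length of the skyline polyline per pixel step
        "normalized_length": float(np.mean(np.sqrt(1.0 + d ** 2))),
        # Unbiased sample variance of the differences
        "sample_variance": float(np.var(d, ddof=1)),
        # Assumed definition: mean absolute deviation around the mean
        "absolute_deviation": float(np.mean(np.abs(d - d.mean()))),
    }

def windowed_metrics(y_segment, window):
    """Compute the metrics on a centred window of at most `window` samples."""
    y = np.asarray(y_segment, dtype=float)
    if y.size > window:
        start = (y.size - window) // 2
        y = y[start:start + window]
    return variation_metrics(y)
```

With these definitions a perfectly flat segment gives a normalized length of 1 and zero variance and deviation, while a jagged segment gives larger values for all three metrics.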

The variation metrics have been evaluated against 475 reference distances carefully measured on orthophotos for the objects belonging to the ‘trees’, ‘houses’, ‘other plants’ and ‘other buildings’ classes. As hypothesized, because trees and other plants have fractal shapes, the metrics based on skyline variations scale with distance for these two classes, but they do not show a clear relationship for the buildings and houses classes, which are characterized by flat skyline profiles. Linear regression has been performed between the different metrics and the reference distances expressed on a logarithmic scale. For trees, the best performing windowed metric achieved an R² of 0.47. This implies that 47% of the variance observed in the variation metric is explained through a linear relationship with the logarithm of the distance. The metric works from a couple of meters to over 1000 meters, effectively determining the order of magnitude of the distance. This is an encouraging result, which shows the potential of skyline variation metrics for estimating the distance between trees and observation points.
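A minimal sketch of the regression step is given below, assuming an ordinary least-squares fit of log10(distance) on a single variation metric. The direction of the fit, the base of the logarithm, and the function names are assumptions, since the abstract only states that a linear regression against log-scaled distances was performed.

```python
import numpy as np

def fit_log_distance(metric, distance_m):
    """Fit log10(distance) = slope * metric + intercept and return R^2."""
    metric = np.asarray(metric, dtype=float)
    log_d = np.log10(np.asarray(distance_m, dtype=float))
    # Degree-1 polynomial fit is an ordinary least-squares line
    slope, intercept = np.polyfit(metric, log_d, 1)
    predicted = slope * metric + intercept
    ss_res = np.sum((log_d - predicted) ** 2)
    ss_tot = np.sum((log_d - log_d.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return float(slope), float(intercept), float(r_squared)

def predict_distance(metric, slope, intercept):
    """Convert a metric value back to a distance in meters."""
    return 10.0 ** (slope * np.asarray(metric, dtype=float) + intercept)
```

For a simple linear regression, R² is the same whichever variable is treated as the predictor, so this choice of direction does not affect the reported goodness of fit; it only makes the fitted line directly usable for predicting distance.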

The distance metrics analyzed in this work can be useful to quantify the evolution and perceptions of landscape openness, to guide simultaneous object location on oblique (e.g. street level) and ortho-imagery, and to gather in-situ data for Earth Observation.