|Paper title||HIECTOR: Hierarchical object detector using multi-scale satellite imagery|
|Form of presentation||Poster|
Object detection, classification and semantic segmentation are ubiquitous and fundamental tasks in extracting, interpreting and understanding the information acquired by satellite imagery. The suitable spatial resolution of the imagery mainly depends on the application of interest, e.g. agricultural activity monitoring, land cover mapping or building detection. Applications for locating and classifying man-made objects, such as buildings, roads, aeroplanes, ships and cars, typically require Very High Resolution (VHR) imagery, with spatial resolution ranging approximately from 0.3 to 5m. However, such VHR imagery is generally proprietary and commercially available only at a high cost. This prevents its uptake by the wider community, in particular when analysis at large scale is desired. HIECTOR (HIErarchical deteCTOR) tackles the problem of efficiently scaling object detection in satellite imagery to large areas by leveraging the sparsity of such objects over the considered area-of-interest (AOI). In particular, this work proposes a hierarchical method for the detection of man-made objects, using multiple satellite image sources at different spatial resolutions. The detection is carried out in a hierarchical fashion, starting at the lowest resolution and proceeding to the highest. Detections at each stage of the pyramid are used to request imagery and apply the detection at the next higher resolution, thereby reducing the amount of data requested and processed. In an ideal scenario, where objects of interest cover only a very small fraction of the whole AOI, the hierarchical method would use a significantly smaller amount of VHR imagery. We investigate how the accuracy and cost efficiency of the proposed method compare to those of a method that uses VHR imagery only, and report on the influence that detections at each pyramidal stage have on the final result.
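The coarse-to-fine idea described above can be illustrated with a minimal sketch. It is not the HIECTOR implementation: the tile indexing, the `children` and `hierarchical_detect` helpers, the subdivision factor and the toy detector are all assumptions made for illustration. The sketch only shows how detections at one level restrict what is processed at the next, and how that translates into the fraction of finest-level data touched.

```python
from typing import Callable, List, Tuple

Tile = Tuple[int, int, int]  # (pyramid level, row, column); level 0 is the coarsest


def children(tile: Tile, factor: int) -> List[Tile]:
    """Tiles at the next finer level that cover the same ground as `tile`."""
    level, row, col = tile
    return [
        (level + 1, row * factor + dr, col * factor + dc)
        for dr in range(factor)
        for dc in range(factor)
    ]


def hierarchical_detect(
    root_tiles: List[Tile],
    detect: Callable[[Tile], bool],
    n_levels: int,
    factor: int = 2,
) -> Tuple[List[Tile], List[int]]:
    """Run `detect` level by level, refining only tiles that were flagged.

    Returns the tiles flagged at the finest level and the number of tiles
    processed at each level (a proxy for the imagery requested).
    """
    active = list(root_tiles)
    processed: List[int] = []
    hits: List[Tile] = []
    for level in range(n_levels):
        processed.append(len(active))
        hits = [tile for tile in active if detect(tile)]
        if level < n_levels - 1:
            # Only flagged tiles are subdivided and requested at the next level.
            active = [child for tile in hits for child in children(tile, factor)]
    return hits, processed


# Toy example (hypothetical): one object in an 8 x 8 finest-level grid.
N_LEVELS, FACTOR = 3, 2
OBJECT_CELL = (5, 2)  # finest-level (row, column) containing the object


def toy_detect(tile: Tile) -> bool:
    """Stand-in for a trained detector: true if the tile contains the object."""
    level, row, col = tile
    size = FACTOR ** (N_LEVELS - 1 - level)  # tile size in finest-level cells
    return (OBJECT_CELL[0] // size, OBJECT_CELL[1] // size) == (row, col)


roots = [(0, r, c) for r in range(2) for c in range(2)]
hits, processed = hierarchical_detect(roots, toy_detect, N_LEVELS, FACTOR)
finest_cells = (2 * FACTOR ** (N_LEVELS - 1)) ** 2  # 64 cells in the full grid
vhr_fraction = processed[-1] / finest_cells  # share of finest-level data processed
```

In this toy setting only 4 of the 64 finest-level cells are processed. The saving in a real deployment depends on how sparse the objects are and on how many false positives the coarser levels produce.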
We evaluate HIECTOR on the task of building detection at the country level, and frame it as object detection, meaning that a bounding box is estimated around each object of interest. The same approach could, however, be applied to different objects or land covers, and a different task such as semantic segmentation could replace the detection task.
For the detection of buildings, HIECTOR is demonstrated using the following data sources: a Global Mosaic of Sentinel-2 imagery at 120m spatial resolution, Sentinel-2 imagery at 10m spatial resolution, Airbus SPOT imagery pan-sharpened to 1.5m resolution and Airbus Pleiades imagery pan-sharpened to 0.5m resolution. Sentinel-2 imagery and the derived mosaic are openly available, making their use very cost efficient. Given that single buildings are not discernible at 120m and 10m resolutions, we reformulate the task for these levels of the pyramid. Using the Sentinel-2 mosaic at 120m resolution, we regress the fraction of buildings at the pixel level, and threshold the estimated fraction at a given value to obtain predictions of built-up areas. This threshold is optimised to minimise both the amount of detected area and the number of missed detections, while maximising the true detections. Once the built-up area is detected on the 120m mosaic, Sentinel-2 imagery at 10m resolution is requested, and an object detection algorithm is applied to the imagery to refine the estimation of built-up areas. In this case, a bounding box does not describe a single building but rather a collection of buildings. The estimated bounding boxes at 10m are joined and the resulting polygon is used to request SPOT imagery at the pan-sharpened spatial resolution of 1.5m. In the case of SPOT imagery, given the higher spatial resolution, one bounding box is estimated for each building. As a final step, predictions are improved in areas with low confidence by requesting Airbus Pleiades imagery at the pan-sharpened 0.5m resolution. Within this framework, the VHR imagery at 0.5m resolution is requested only for a small percentage of the entire AOI, greatly reducing costs.
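Two steps of the pipeline above lend themselves to a short sketch: choosing the threshold on the regressed building fraction, and joining the 10m bounding boxes into areas for the next imagery request. The code below is illustrative only and rests on assumptions. The cost function and its `miss_weight` are hypothetical stand-ins for the optimisation criterion, and overlapping boxes are replaced by their enclosing rectangle, which is a simplification of the polygon union described in the text.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def choose_threshold(
    predicted: Sequence[float],
    reference: Sequence[bool],
    candidates: Sequence[float],
    miss_weight: float = 10.0,
) -> float:
    """Pick the candidate threshold with the lowest cost.

    Assumed cost: number of flagged pixels (imagery to request at the next
    level) plus `miss_weight` times the number of built-up pixels missed.
    """
    best_cost, best_threshold = None, None
    for threshold in candidates:
        flagged = [fraction >= threshold for fraction in predicted]
        detected_area = sum(flagged)
        missed = sum(1 for f, truth in zip(flagged, reference) if truth and not f)
        cost = detected_area + miss_weight * missed
        if best_cost is None or cost < best_cost:
            best_cost, best_threshold = cost, threshold
    return best_threshold


def overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes intersect or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge_boxes(boxes: Sequence[Box]) -> List[Box]:
    """Replace each group of overlapping boxes by its enclosing rectangle."""
    merged: List[Box] = []
    for box in boxes:
        grown = True
        while grown:  # a grown box may newly overlap other merged boxes
            grown = False
            for other in merged:
                if overlaps(box, other):
                    merged.remove(other)
                    box = (
                        min(box[0], other[0]),
                        min(box[1], other[1]),
                        max(box[2], other[2]),
                        max(box[3], other[3]),
                    )
                    grown = True
                    break
        merged.append(box)
    return merged


# Hypothetical per-pixel building fractions and reference built-up labels.
predicted = [0.0, 0.02, 0.05, 0.2, 0.6, 0.01]
reference = [False, False, True, True, True, False]
threshold = choose_threshold(predicted, reference, [0.01, 0.05, 0.1, 0.3])

# Hypothetical 10m detections; the first two overlap and are joined.
request_areas = merge_boxes([(0, 0, 2, 2), (1, 1, 3, 3), (10, 10, 11, 11)])
```

A larger `miss_weight` favours lower thresholds, which request more imagery at the next level but miss fewer built-up areas. This is the same trade-off between cost and accuracy that the hierarchical method has to balance at every stage.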
The Single-Stage Rotation-Decoupled Detector (SSRDD) algorithm proposed by Zhong and Ao has been adapted and used for building detection in Sentinel-2 10m images, and in Airbus SPOT and Pleiades imagery. The Sentinel Hub service is used by HIECTOR to request the imagery sources on the polygons determined at each level of the pyramid, which makes it possible to request, access and process only specific sub-parts of the AOI. In this presentation we will give an in-depth analysis of the experiments carried out to train, evaluate and deploy HIECTOR on a country-level AOI. In particular, the trade-off between detection accuracy and cost savings will be presented and discussed.
Sentinel-2 L2A 120m Mosaic, https://collections.sentinel-hub.com/sentinel-s2-l2a-mosaic-120/
Zhong B. and Ao K., Single-Stage Rotation-Decoupled Detector for Oriented Object, Remote Sens. *2020*, 12(19), 3262, https://doi.org/10.3390/rs12193262
Sentinel Hub, https://www.sentinel-hub.com