Culex pipiens mosquitoes are important vectors of emerging and re-emerging diseases in Europe. This mosquito species is able to adapt to a wide variety of environments, resulting in a complex picture in terms of trophic behaviour and vectorial capacity. A fine understanding of the habitat suitability that facilitates the vectors' survival, reproduction and dispersal is of paramount importance for determining the risk of local establishment, persistence and spread.
In the last 20 years, studies on vector distribution models have grown exponentially. Nowadays, the ever-increasing abundance of Remote Sensing and Earth Observation (EO) data, together with new Artificial Intelligence (AI) techniques, offers enormous opportunities for vector-borne disease investigations.
With the objective of explaining the spatial distribution of Culex pipiens in the Abruzzo and Molise regions of central Italy, we tested the integration of field entomological data, high-resolution remotely sensed imagery and innovative artificial intelligence algorithms.
Field collections were carried out during two seasonal campaigns, in 2019 and 2020: the presence/absence and abundance of Culex pipiens were recorded at about 50 sites on a biweekly basis during the vector season (May to November). The sites were chosen to cover a variety of climatic and environmental conditions (Ippoliti et al., 2019).
The presence/absence of Culex pipiens in each entomological collection (the labels, or annotations) was associated with the different EO datasets, using the collection date for temporal matching. For each site and each date, a patch of 224 × 224 pixels at 20 m spatial resolution was extracted around the site, forming the ground-truth dataset.
The EO datasets considered were:
- the multi-spectral bands captured by the optical instruments onboard the Sentinel-2A and Sentinel-2B satellites of the Copernicus programme, at each revisit time;
- daytime and nighttime Land Surface Temperature (LST) from the MODIS mission, every 8 days.
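As a minimal sketch of the patch extraction described above, the function below crops a 224 × 224 window (about 4.48 km × 4.48 km at 20 m resolution) centred on a site's pixel. It assumes the Sentinel-2 bands are already co-registered in a NumPy array of shape (bands, H, W) and that the site's row and column have already been derived from its coordinates; the edge-handling rule (shifting the window inwards) is a hypothetical choice, not taken from the study.

```python
import numpy as np

PATCH_SIZE = 224     # pixels per side, as stated in the text
PIXEL_SIZE_M = 20.0  # spatial resolution in metres, as stated in the text
FOOTPRINT_KM = PATCH_SIZE * PIXEL_SIZE_M / 1000  # ground extent of one patch side


def extract_patch(raster, row, col, size=PATCH_SIZE):
    """Crop a (bands, H, W) raster to a size x size window centred on pixel (row, col).

    Windows that would cross the raster edge are shifted inwards so every patch
    has the full size (an assumption; padding would be an equally valid choice).
    """
    _, h, w = raster.shape
    if h < size or w < size:
        raise ValueError("raster is smaller than the requested patch")
    top = min(max(row - size // 2, 0), h - size)
    left = min(max(col - size // 2, 0), w - size)
    return raster[:, top:top + size, left:left + size]
```

Shifting rather than padding keeps every patch filled with real reflectance values, at the cost of the site not being exactly centred near the raster border.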
The whole dataset (1384 site observations) was split into training (80%) and test (20%) sets so that all images of a given site belonged to only one of the two sets, avoiding over-optimistic results caused by information leaking between them. A stratified five-fold cross-validation was used, preserving the proportion of positive and negative labels between the training and test sets.
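The abstract does not say how the site-level constraint and the label stratification were combined. One way to obtain both is a stratified group k-fold with each site as a group, sketched below in plain Python. The greedy assignment rule is an illustrative assumption; scikit-learn's `StratifiedGroupKFold` implements a comparable procedure.

```python
import random
from collections import defaultdict


def stratified_group_kfold(site_ids, labels, n_splits=5, seed=0):
    """Assign each site to exactly one fold while roughly balancing label proportions.

    site_ids: one site identifier per observation; labels: 0 (absence) or 1 (presence).
    Returns a list of (train_indices, test_indices) tuples, one per fold.
    """
    # Count negative and positive observations per site.
    per_site = defaultdict(lambda: [0, 0])  # site -> [n_negative, n_positive]
    for site, y in zip(site_ids, labels):
        per_site[site][int(y)] += 1
    totals = [sum(c[0] for c in per_site.values()), sum(c[1] for c in per_site.values())]

    # Shuffle for tie-breaking, then place the largest sites first.
    sites = list(per_site)
    random.Random(seed).shuffle(sites)
    sites.sort(key=lambda s: sum(per_site[s]), reverse=True)

    fold_counts = [[0, 0] for _ in range(n_splits)]
    fold_of = {}
    for site in sites:
        neg, pos = per_site[site]
        # Greedy rule: pick the fold whose fuller class (as a fraction of that
        # class's total) would stay lowest after adding this site.
        best = min(
            range(n_splits),
            key=lambda f: max((fold_counts[f][0] + neg) / max(totals[0], 1),
                              (fold_counts[f][1] + pos) / max(totals[1], 1)),
        )
        fold_of[site] = best
        fold_counts[best][0] += neg
        fold_counts[best][1] += pos

    splits = []
    for f in range(n_splits):
        test = [i for i, s in enumerate(site_ids) if fold_of[s] == f]
        train = [i for i, s in enumerate(site_ids) if fold_of[s] != f]
        splits.append((train, test))
    return splits
```

Because whole sites are moved between folds, the label proportions can only be matched approximately. The stronger guarantee is that no site ever appears in both the training and the test part of a fold.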
A Deep Convolutional Neural Network (DCNN) was applied to classify each site according to the presence/absence of Cx. pipiens mosquitoes (a binary classification task). We evaluated three models: the first (baseline) exploited a single multi-band image to predict the binary target; the second extended the baseline with the spatial relationships among sites, through a graph feature aggregation method; the third focused on the temporal aspect, using features extracted from a sequence of images of the same site.
Because of the small size of the dataset, we first tested two different pre-training strategies, each followed by fine-tuning (knowledge transfer) on our data. The first pre-training phase targeted the RGB bands (B4, B3, B2), taking advantage of the ImageNet dataset (Deng et al., 2009). The second involved a self-supervised procedure called colourization (Vincenzi et al., 2021), which recovers the RGB information from the remaining spectral bands given as input.
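To make the colourization pretext task concrete, the sketch below splits a multi-band patch into model inputs (the non-RGB bands) and targets (B4, B3, B2). As a deliberately simple stand-in for the convolutional network used for self-supervised pre-training, it fits a per-pixel linear least-squares mapping. The band list and the linear model are assumptions for illustration only.

```python
import numpy as np

# Hypothetical band order of a Sentinel-2 patch stack (an assumption, not from the study).
BANDS = ["B2", "B3", "B4", "B5", "B6", "B7", "B8", "B8A", "B11", "B12"]
RGB = ["B4", "B3", "B2"]  # red, green, blue: the colourization targets


def colourization_pair(patch):
    """Split a (bands, H, W) patch into (non-RGB input bands, RGB target bands)."""
    rgb_idx = [BANDS.index(b) for b in RGB]
    other_idx = [i for i in range(len(BANDS)) if i not in rgb_idx]
    return patch[other_idx], patch[rgb_idx]


def _pixels_with_bias(inputs):
    """Flatten (n_in, H, W) to (pixels, n_in + 1), appending a constant bias column."""
    x = inputs.reshape(inputs.shape[0], -1).T
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_linear_colouriser(inputs, targets):
    """Least-squares per-pixel map from non-RGB bands to RGB (stand-in for a DCNN)."""
    x = _pixels_with_bias(inputs)
    y = targets.reshape(targets.shape[0], -1).T  # (pixels, 3)
    weights, *_ = np.linalg.lstsq(x, y, rcond=None)
    return weights


def predict_rgb(weights, inputs):
    """Reconstruct a (3, H, W) RGB image from the non-RGB bands."""
    h, w = inputs.shape[1:]
    return (_pixels_with_bias(inputs) @ weights).T.reshape(3, h, w)
```

The point of the pretext task is that the targets come for free from the imagery itself, so a network can be pre-trained on unlabelled patches before fine-tuning on the few labelled catches.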
In the baseline model, we applied the DCNN to extract meaningful features from a single image per site (the one acquired closest to, and before, the catch date). The final classification layer then delivered the probability of each site being positive or negative.
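Selecting the baseline input image amounts to a simple date lookup, sketched below. Whether an image acquired on the catch date itself was allowed is not stated, so the `inclusive` flag is an assumption.

```python
from datetime import date


def closest_image_before(acquisition_dates, catch_date, inclusive=False):
    """Return the most recent acquisition date before catch_date, or None if there is none.

    inclusive=True also accepts an image acquired on the catch date itself
    (the study does not say which rule it used; this flag is an assumption).
    """
    if inclusive:
        candidates = [d for d in acquisition_dates if d <= catch_date]
    else:
        candidates = [d for d in acquisition_dates if d < catch_date]
    return max(candidates, default=None)


# Usage with hypothetical Sentinel-2 revisit dates for one site.
revisits = [date(2019, 6, 1), date(2019, 6, 6), date(2019, 6, 11), date(2019, 6, 16)]
print(closest_image_before(revisits, date(2019, 6, 12)))  # 2019-06-11
```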
The second approach incorporated the similarity among sites. As in the baseline model, we first extracted features from each image with a DCNN. On these embeddings we applied a graph aggregation layer, which updates each site's features according to their similarity with those of other sites. Similarity was computed as a multidimensional measure combining temperature and haversine distance: sites with similar values of these measures are expected to have highly correlated features. The hypothesis is that exploiting this correlation among sites can reduce false-negative and false-positive results.
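The exact similarity function and aggregation rule are not given, so the sketch below is one plausible reading under stated assumptions. It computes pairwise haversine distances between sites, combines Gaussian kernels on distance and on temperature difference (the scale parameters are hypothetical), and replaces each DCNN embedding with a row-normalised, similarity-weighted average. Here the aggregation is a fixed, parameter-free operation.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat, lon):
    """Pairwise great-circle distances (km) between sites given in decimal degrees."""
    lat, lon = np.radians(lat), np.radians(lon)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def site_similarity(lat, lon, temp, dist_scale_km=20.0, temp_scale=2.0):
    """Similarity in [0, 1] combining spatial distance and temperature difference.

    The Gaussian kernels and their scales are illustrative assumptions.
    """
    d = haversine_km(lat, lon)
    dt = np.abs(temp[:, None] - temp[None, :])
    return np.exp(-(d / dist_scale_km) ** 2) * np.exp(-(dt / temp_scale) ** 2)


def graph_aggregate(features, similarity):
    """Replace each site's embedding with a similarity-weighted average of all embeddings.

    Self-similarity is 1, so every row sum is at least 1 and never zero.
    """
    weights = similarity / similarity.sum(axis=1, keepdims=True)
    return weights @ features
```

With these kernels, nearby sites with similar temperatures pull each other's embeddings together, while distant or climatically different sites are left almost unchanged.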
The baseline model reached an F1 score of 0.80, which increased to 0.82 with the addition of graph aggregation.
The third approach focused on the temporal component, using sequences of Sentinel-2 images of the same site as input to the deep model: the temporal window comprised ten images, with the first images taken 15-24 days before the Culex pipiens catch date. We extracted an embedding from each image and applied an attention module that computes a weighted average of these embeddings. In this way, the most relevant embeddings contributed most to the final classification.
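The attention module is described only as computing a weighted average of instances. The sketch below assumes one common form of attention pooling: each embedding is scored with a small tanh layer, the scores are normalised with a softmax, and the weighted sum is returned. The parameters `V` and `w` would be learned during training; here they are plain arrays, and all shapes are illustrative.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def attention_pool(embeddings, V, w):
    """Attention-weighted average of a sequence of image embeddings.

    embeddings: (T, D), one embedding per image in the temporal window (T = 10 in the text).
    V: (K, D) and w: (K,) are the attention parameters (assumed form).
    Returns the pooled (D,) embedding and the (T,) attention weights.
    """
    scores = np.tanh(embeddings @ V.T) @ w  # one relevance score per image
    alpha = softmax(scores)                 # non-negative weights summing to 1
    return alpha @ embeddings, alpha
```

Because the weights are non-negative and sum to 1, the pooled vector is a convex combination of the input embeddings; if all scores are equal it reduces to the plain mean of the sequence.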
The third model reached an F1 score of 0.83.
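For reference, the F1 score used to compare the three models is, in its standard binary form (assumed here to be computed on the presence class), the harmonic mean of precision P and recall R, where TP, FP and FN are the numbers of true positives, false positives and false negatives:

```latex
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
```

Since true negatives do not appear in the formula, F1 measures how well presence is detected without rewarding correct absences.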
The performance of the best approach was examined at specific sites of interest, providing useful information for targeting surveillance activities in the following seasons.
This work describes a successful synergy between entomological field activities and new, advanced technologies, i.e. Sentinel-2 satellite imagery and AI deep learning algorithms. The methodology adopted can be extended to the whole national territory and to other vectors, to support the Ministry of Health in surveillance and control strategies for vectors and the diseases they transmit.
Ippoliti C, Candeloro L, Gilbert M, Goffredo M, Mancini G, Curci G, Falasca S, Tora S, Di Lorenzo A, Quaglia M, Conte A. 2019. Defining ecological regions in Italy based on a multivariate clustering approach: A first step towards a targeted vector borne disease surveillance. PLoS ONE 14(7): e0219072. https://doi.org/10.1371/journal.pone.0219072
Deng J, Dong W, Socher R, Li L, Li K, Fei-Fei L. 2009. ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255. https://doi.org/10.1109/CVPR.2009.5206848
Vincenzi S, Porrello A, Buzzega P, Cipriano M, Fronte P, Cuccu R, Ippoliti C, Conte A, Calderara S. 2021. The color out of space: learning self-supervised representations for Earth Observation imagery. Proceedings of the 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10-15 January 2021, pp. 3034-3041. https://doi.org/10.1109/ICPR48806.2021.9413112