Given accelerating global change, protecting plant biodiversity is of major importance, which inevitably requires understanding its global distribution and its relation to the Earth system as a whole. Such plant biodiversity-environment relationships can be understood in terms of functional rather than taxonomic diversity. Still, knowledge of the global distribution of functional diversity and how it is shaped by individual functional traits remains sparse. In a joint effort, the scientific community is constantly gathering plant trait observations and making them available through the TRY database, which provides more than 11.8 million records of 2,091 traits across 280,000 plant taxa (TRY v. 5). Several studies have attempted to spatially extrapolate these sparse trait observations using globally available predictors, including data on climate, soil properties, or land cover. However, global maps exist so far for only a few traits, and their uncertainties are hard to quantify and probably high. Earth observation satellite data are also becoming a key technology for generating global products on plant functional traits. However, the bird's-eye view from Earth observation primarily informs on upper canopies and, as a result, may only represent functional traits of the most competitive plants and not the actual plant community.
Here, we take a completely different perspective: an ever-growing plethora of geocoded plant photographs with species information, crowdsourced by more than a million citizens around the globe through the citizen science project iNaturalist. With the human eye, we immediately see that such plant photos, even if they are very heterogeneous, can provide information about morphological plant characteristics - and thus also about functional traits. To effectively harness this data treasure, we make use of deep learning and convolutional neural networks (CNNs). Training CNNs to accurately and robustly capture plant functional traits from plant photographs requires a very large amount of training data, consisting of pairs of plant photographs and plant traits. Such datasets are not readily available; however, while iNaturalist data inform on the plant species in the photographs, the above-mentioned TRY database contains enormous amounts of species-specific plant trait records. Therefore, we investigated how joining these two databases (iNaturalist, TRY) by species affiliation enables weakly supervised learning of CNN models for predicting functional plant traits from simple photographs. Our results show that morphological image features indeed suffice to predict several traits representing the main axes of plant functioning, including growth height, leaf area, and leaf mass. Lower but still promising accuracies were obtained for traits that do not relate directly to visible morphological features (nitrogen concentration and stem specific density). Accuracy improved when using CNN ensembles (combinations of different CNN architectures) and when incorporating prior knowledge on trait plasticity and contextual information on climate (WorldClim). Our results suggest that these models generalised across growth forms, taxa, and biomes.
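The species-level join between the two databases can be illustrated with a minimal sketch. It assumes, purely for illustration, that each photograph receives the species mean of a trait as its weak label; the record layouts, the trait name `growth_height_m`, and the toy values are hypothetical and do not reflect the actual TRY or iNaturalist export formats or the exact labelling scheme used in this study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records (species, trait name, value); not the real TRY schema.
try_records = [
    ("Quercus robur", "growth_height_m", 30.0),
    ("Quercus robur", "growth_height_m", 25.0),
    ("Bellis perennis", "growth_height_m", 0.1),
]

# Hypothetical photo metadata; not the real iNaturalist schema.
inat_photos = [
    {"photo_id": "p1", "species": "Quercus robur"},
    {"photo_id": "p2", "species": "Bellis perennis"},
    {"photo_id": "p3", "species": "Unmatched species"},
]


def species_trait_means(records):
    """Average all trait records per (species, trait) pair."""
    grouped = defaultdict(list)
    for species, trait, value in records:
        grouped[(species, trait)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


def weak_labels(photos, trait_means, trait):
    """Pair each photo with the species-level trait mean (assumed weak label).

    Photos whose species has no record for the trait are skipped.
    """
    labelled = []
    for photo in photos:
        key = (photo["species"], trait)
        if key in trait_means:
            labelled.append((photo["photo_id"], trait_means[key]))
    return labelled


means = species_trait_means(try_records)
pairs = weak_labels(inat_photos, means, "growth_height_m")
print(pairs)
```

The labels are "weak" in the sense that the trait value was never measured on the photographed individual; it is inherited from other individuals of the same species.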
We did not find any phylogenetic signal in the residuals (e.g., bias towards taxonomic groups), indicating that the models indeed learned to see plant traits in the photographs. The trait predictions were robust to the heterogeneity of crowdsourced photographs (e.g., image quality and other image acquisition settings). Spatially aggregating such CNN-based trait predictions from 185,000 independent iNaturalist photographs via their geolocation enabled the generation of global trait distribution maps, which reflect known macroecological patterns. A quantitative comparison with previous global trait distribution maps revealed significant correlations with Pearson's r > 0.5 for growth height, seed mass, specific leaf area, and stem specific density. Still, global trait distribution maps commonly show large discrepancies, which our products partly confirm. Note that most global trait distribution maps feature substantial uncertainty, as trait expressions are commonly spatially extrapolated across large extents from sparse TRY records. In contrast, the extremely high and steadily increasing sampling density in the iNaturalist database enables the generation of global products without spatial extrapolation. All trait products derived this way are freely available, including the first-ever published map on the global distribution of leaf area.
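The aggregation and map comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing chain used here: the 2-degree cell size, the use of a simple cell mean, the coordinates, the trait values, and the reference map are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def grid_cell(lat, lon, cell_deg):
    """Map a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def aggregate(predictions, cell_deg=2.0):
    """Average per-photo trait predictions within each grid cell."""
    cells = defaultdict(list)
    for lat, lon, value in predictions:
        cells[grid_cell(lat, lon, cell_deg)].append(value)
    return {cell: mean(values) for cell, values in cells.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical (latitude, longitude, predicted trait value) triples.
predictions = [
    (48.1, 11.5, 10.0),
    (48.9, 11.9, 20.0),   # falls into the same 2-degree cell as the first photo
    (-3.2, -60.1, 30.0),
    (51.0, 0.5, 4.0),
]

# Hypothetical reference map on the same grid (cell index -> trait value).
reference_map = {(24, 5): 12.0, (-2, -31): 28.0, (25, 0): 5.0, (0, 0): 1.0}

trait_map = aggregate(predictions)

# Compare only cells covered by both maps, in a fixed order.
shared = sorted(set(trait_map) & set(reference_map))
r = pearson_r([trait_map[c] for c in shared], [reference_map[c] for c in shared])
print(trait_map, round(r, 3))
```

Because each cell value is computed only from photographs located in that cell, cells without observations simply remain empty rather than being filled by extrapolation.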
The presented approach offers an alternative way to understand macroecological patterns. There is great potential in fusing crowdsourced data with spaceborne remote sensing data, or in using the respective products for comparison. Yet, biases in citizen science-based data remain largely unknown, and so far no comprehensive reference dataset exists to evaluate global trait maps. Our results show that integrating crowdsourced data may help to further close the gap between actual and intended spatial coverage, which ecology has been falling short of for decades. Moreover, these findings demonstrate the potential of exploiting big data, with its volumes, sparsity, and variety, derived from professional and citizen science in concert with deep learning for assessing Earth's plant functional diversity.