Division Federated Systems and Data
Group Member of Earth System Data Exploration
Principal Investigator of the KI:STE project
Artificial Intelligence and Machine Learning Research
Mapping of Geographical Data to Air Quality Data
Every year, Forschungszentrum Jülich organizes a festive lecture evening to mark the end of the year. This year, Jülich’s young investigators give an insight into their exciting and forward-looking work, and I was lucky to be among them! Johannes Laube uses weather balloons to send measuring instruments into the little-researched stratosphere, where he collects data on its composition. My talk is about modelling environmental and Earth system data using artificial intelligence.
Environmental data consists of large, heterogeneous datasets that encode
spatio-temporal processes which are not yet fully understood. These
datasets pose unique challenges to Earth scientists who decode natural
processes in order to solve global environmental problems. Recent
algorithmic developments and the impressive capabilities of artificial
intelligence have led to the first real-world applications using
environmental data. Under JSC leadership, the KI:STE project aims to
facilitate the application of large-scale machine learning on HPC
systems for environmental data. Its strategy combines the development
of an Earth-AI-Platform with a strong training and networking concept.
The Earth-AI-Platform will create the technical prerequisites to make
high-performance AI applications on environmental data portable for
future users and to establish environmental AI as a key technology.
Jülich, 28 June 2021 – Federal Environment Minister Svenja Schulze visited Forschungszentrum Jülich today as part of her summer tour. The visit focused on energy-efficient supercomputing and the use of artificial intelligence (AI) for climate and environmental protection. The Jülich researchers want to use AI methods to detect dangers posed by climate change at an early stage. For this purpose, they have access to JUWELS, an extremely energy-efficient supercomputer that is currently the fastest in Europe.
Abstract. With the AQ-Bench dataset, we contribute to
the recent developments towards shared data usage and machine learning
methods in the field of environmental science. The dataset presented
here enables researchers to relate global air quality metrics to
easy-access metadata and to explore different machine learning methods
for obtaining estimates of air quality based on this metadata.
AQ-Bench contains a unique collection of aggregated air quality data
from the years 2010–2014 and metadata at more than 5500 air quality
monitoring stations all over the world, provided by the first
Tropospheric Ozone Assessment Report (TOAR). It focuses in particular
on metrics of tropospheric ozone, which has a detrimental effect on
climate, human morbidity and mortality, as well as crop yields. The
purpose of this dataset is to produce estimates of various long-term
ozone metrics based on time-independent local site conditions. We
combine this task with a suitable evaluation metric. Baseline scores
obtained from a linear regression method, a fully connected neural
network and random forest are provided for reference and validation.
AQ-Bench offers a low-threshold entrance for all machine learners with
an interest in environmental science and for atmospheric scientists
who are interested in applying machine learning techniques. It enables
them to start with a real-world problem relevant to humans and nature.
The dataset and introductory machine learning code are available in
(Betancourt et al., 2020) and
(Betancourt et al., 2021). AQ-Bench thus provides a blueprint for
environmental benchmark datasets as well as an example for data re-use
according to the FAIR principles.
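
The benchmark task described in the abstract above (estimating a long-term ozone metric from time-independent station metadata and scoring the estimate against a baseline) can be illustrated with a minimal sketch. It is not the published AQ-Bench code: the data is synthetic, the two features (`altitude_km`, `population_index`) and their coefficients are hypothetical, and the coefficient of determination R² is assumed here as the evaluation metric. Only the linear regression baseline is shown, using the Python standard library.

```python
import random

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(n):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(w, X):
    """Apply weights w = [intercept, w1, w2, ...] to each feature row."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) for x in X]

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic "stations": hypothetical metadata and an invented linear relation.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    altitude_km = rng.uniform(0.0, 3.0)
    population_index = rng.uniform(0.0, 5.0)
    ozone = 30.0 + 5.0 * altitude_km - 2.0 * population_index + rng.gauss(0.0, 1.0)
    X.append((altitude_km, population_index))
    y.append(ozone)

# Fixed 80/20 split into training and test stations.
X_train, X_test = X[:160], X[160:]
y_train, y_test = y[:160], y[160:]

weights = fit_linear(X_train, y_train)
r2_linear = r2_score(y_test, predict(weights, X_test))

# Trivial reference: always predict the training mean.
train_mean = sum(y_train) / len(y_train)
r2_mean = r2_score(y_test, [train_mean] * len(y_test))

print(f"linear regression R2 on test stations: {r2_linear:.3f}")
print(f"mean predictor R2 on test stations:    {r2_mean:.3f}")
```

Comparing against the mean predictor shows what a baseline score is for: any more elaborate model, such as the neural network or random forest mentioned in the abstract, would be evaluated on the same held-out stations with the same metric.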
Abstract. The recent hype about artificial
intelligence has sparked renewed interest in applying the
successful deep learning (DL) methods for image recognition,
speech recognition, robotics, strategic games and other
application areas to the field of meteorology. There is some
evidence that better weather forecasts can be produced by
introducing big data mining and neural networks into the weather
prediction workflow. Here, we discuss the question of whether it
is possible to completely replace the current numerical weather
models and data assimilation systems with DL approaches. This
discussion entails a review of state-of-the-art machine learning
concepts and their applicability to weather data with its
pertinent statistical properties. We think that it is not
inconceivable that numerical weather models may one day become
obsolete, but a number of fundamental breakthroughs are needed
before this goal comes into reach. This article is part of the
theme issue ‘Machine learning for weather and climate modelling’.