The goal is tunable, reusable foundation models that make it easier to mine vast datasets for new knowledge to advance science and help us adapt to a changing environment.
IBM and NASA’s Marshall Space Flight Center today announce a collaboration to use IBM’s artificial intelligence (AI) technology to discover new insights in NASA’s massive trove of Earth and geospatial science data. The joint work will apply AI foundation model technology to NASA’s Earth-observing satellite data for the first time.
Foundation models are AI models that are trained on a broad set of unlabeled data, can be used for different tasks, and can apply information about one situation to another. These models have rapidly advanced the field of natural language processing (NLP) technology over the last five years, and IBM is pioneering applications of foundation models beyond language.
Earth observations that allow scientists to study and monitor our planet are being gathered at unprecedented rates and volumes. New and innovative approaches are required to extract knowledge from these vast data resources. The goal of this work is to provide an easier way for researchers to analyze and draw insights from these large datasets. IBM’s foundation model technology has the potential to speed up the discovery and analysis of these data in order to quickly advance scientific understanding of Earth and the response to climate-related issues.
IBM and NASA plan to develop several new technologies to extract insights from Earth observations. One project will train an IBM geospatial intelligence foundation model on NASA’s Harmonized Landsat Sentinel-2 (HLS) dataset, a record of land cover and land use changes captured by Earth-orbiting satellites. By analyzing petabytes of satellite data to identify changes in the geographic footprint of phenomena such as natural disasters, cyclical crop yields, and wildlife habitats, this foundation model technology will help researchers provide critical analysis of our planet’s environmental systems.
Another output from this collaboration is expected to be an easily searchable corpus of Earth science literature. IBM has developed an NLP model trained on nearly 300,000 Earth science journal articles to organize the literature and make it easier to discover new knowledge. Representing one of the largest AI workloads trained on Red Hat’s OpenShift software to date, the fully trained model uses PrimeQA, IBM’s open-source multilingual question-answering system. Beyond providing a resource to researchers, the new language model for Earth science could be infused into NASA’s scientific data management and stewardship processes.
“The beauty of foundation models is they can potentially be used for many downstream applications,” said Rahul Ramachandran, senior research scientist at NASA’s Marshall Space Flight Center in Huntsville, Alabama. “Building these foundation models cannot be tackled by small teams,” he added. “You need teams across different organizations to bring their different perspectives, resources, and skill sets.”
“Foundation models have proven successful in natural language processing, and it’s time to expand that to new domains and modalities important for business and society,” said Raghu Ganti, principal researcher at IBM. “Applying foundation models to geospatial, event-sequence, time-series, and other non-language factors within Earth science data could make enormously valuable insights and information suddenly available to a much wider group of researchers, businesses, and citizens. Ultimately, it could facilitate a larger number of people working on some of our most pressing climate issues.”
Other potential IBM-NASA joint projects in this agreement include constructing a foundation model for weather and climate prediction using MERRA-2, a dataset of atmospheric observations. This collaboration is part of NASA’s Open-Source Science Initiative, a commitment to building an inclusive, transparent, and collaborative open science community over the next decade.