Tyler Josephson’s lab sits off a main corridor in the department of chemical, biochemical and environmental engineering at UMBC. Open the door, though, and you’ll see nary a beaker, chemical closet, or lab coat. Inside, a few computers sit on tables. You might see equations scrawled on the white board or a few students poring over lines of code.
Using this modest setup, Josephson has launched an ambitious project to equip computers to make scientific discoveries—starting in the realm of chemistry. This March he won a prestigious NSF CAREER award to advance the project.
The goal of the work is ultimately to speed up the process of science, which should in turn give humanity new knowledge and tools to face down big challenges such as climate change and environmental degradation.
As Josephson and his students dive into the work, they are bringing together techniques from across mathematics, computer science, and chemical engineering. Their first step is to translate chemical theories into a rigorous mathematical language that a computer can understand.
Math as the language of science
In 1623, the Italian natural philosopher Galileo Galilei wrote an essay in which he described nature as a book written in “the language of mathematics.” Many scientists since have puzzled over the mysterious power of math to describe physical phenomena.
Josephson and his students are tapping into this power. They are using a tool developed by researchers at Microsoft called the Lean theorem prover. Lean is both a computer language and a program for checking each step of a rigorous mathematical proof.
“Formal proofs, which are verified by a computer, differ from the informal, handwritten versions often used by scientists,” says Josephson. Informal proofs are easier to write, but they usually skip logical steps, assuming a human reader will have the knowledge and skill to follow along. This means that errors can creep in undetected.
On the other hand, if a proof has been written and checked in Lean, it is guaranteed to be correct as long as the stated assumptions are true.
A community of Lean programmers
Lean has a dedicated community of volunteer developers who have built a large library of mathematical proofs, each of which can then serve as a building block for more complicated proofs. They aim to digitize mathematics, starting with the entirety of the undergraduate math curriculum, which will lay the foundation for formal proofs in advanced modern mathematics.
Josephson plans to build a similar library with formally correct derivations in science and engineering, starting with chemical concepts such as the thermodynamic behavior of gases and of molecules sticking and unsticking from surfaces.
He and his students describe their approach in a first paper on the subject, and are in the process of submitting it to journals.
The power of the work will multiply as more of the foundations of science are translated into Lean, so a large part of the team’s work will also be to recruit, inspire, and train fellow proof creators. They will hold workshops to showcase Lean for scientists and engineers, and they plan to create fun and educational games that will teach Lean-programming skills to newcomers.
“I’m really excited to share this tool with students and the scientific community,” Josephson says.
Building better scientific computing tools
Josephson’s goal to formally verify scientific theories isn’t just an intellectual exercise—it’s a means of building better tools for better science. One such tool he plans to create with NSF CAREER award support is Lean-based computer software that can simulate the behavior of molecules under a range of conditions.
Scientists often use such software to test theories as an alternative to physical experiments. It can be easier to run simulations of reactions on a computer, for example, than to mix real chemicals again and again, and some molecular phenomena may happen so fast, or under such extreme conditions, that current experimental tools cannot capture them.
However, bugs can mar the performance of the software. For example, starting in 2011, a hidden coding error fueled a seven-year “war over supercooled water,” in which two scientific groups disagreed about what happens to ultrapure water when it is cooled significantly below the freezing point of normal water, and then suddenly crystallizes.
Code written in Lean is unique from that written in the programming languages commonly used in scientific computing, since it can be provably free of such math errors, Josephson says.
As more scientists and engineers learn to write code and proofs in Lean, others will be able to write bug-free software for applications as diverse as weather forecasting, drug discovery, and predicting material performance.
“AI scientists” who reason on their own
Ultimately, Josephson hopes to use a Lean-based library of scientific knowledge to train computers as fellow scientists. For example, large language models, such as the recently popularized ChatGPT, might be trained on a library of scientific proofs and gain the ability to “autocomplete” proofs on their own, translate informal proofs from the literature into formal ones, and even discover entirely new scientific theories, which could then be checked for correctness by Lean.
A tool like this might revolutionize science. In Galileo’s time, a single person could master large portions of human scientific knowledge, but now scientists usually go to school for decades to become experts in a tiny subfield, Josephson says.
AI scientists capable of digesting a database of thousands of scientific proofs in multiple disciplines might draw connections across them to reveal new discoveries. “Such a tool could lead to an AI-powered Renaissance in interdisciplinary scientific discovery,” says Josephson.
While such lofty goals remain in the future, Josephson and his students are energized by the possibilities. As they embark on an exciting scientific journey, they are thrilled to bring as many people as possible along on the ride.