Unpacking the ‘black box’ to build better AI models | MIT News

When deep learning models are deployed in the real world, perhaps to detect financial fraud from credit card activity or identify cancer in medical images, they are often able to outperform humans.

But what exactly are these deep learning models learning? Does a model trained to detect skin cancer in clinical images, for example, actually learn the colors and textures of cancerous tissue, or is it picking up on some other trait or pattern?

These powerful machine learning models are typically based on artificial neural networks that can contain millions of nodes that process data to make predictions. Because of their complexity, researchers often call these models “black boxes”: even the scientists who build them don’t fully understand everything that goes on under the hood.

Stefanie Jegelka isn’t satisfied with the “black box” explanation. A newly appointed associate professor in MIT’s Department of Electrical Engineering and Computer Science, Jegelka is digging into deep learning to understand what these models can learn, how they behave, and how to build certain prior information into them.

“At the end of the day, what a deep learning model will learn depends on many factors. But building an understanding that is relevant in practice will help us design better models, and also help us understand what’s going on inside them so we know when we can deploy a model and when we can’t. That is critically important,” says Jegelka, who is also a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Institute for Data, Systems, and Society (IDSS).

Jegelka is particularly interested in optimizing machine learning models when the input data are in the form of graphs. Graph data pose specific challenges: for instance, the information they carry consists of both information about individual nodes and edges and the structure itself, that is, what is connected to what. In addition, graphs have mathematical symmetries that the machine learning model must respect so that, for example, the same graph always leads to the same prediction. Building such symmetries into a machine learning model is usually not easy.

Take molecules, for example. Molecules can be represented as graphs, with vertices corresponding to atoms and edges corresponding to the chemical bonds between them. Pharmaceutical companies may want to use deep learning to quickly predict the properties of many molecules, narrowing down the number they have to physically test in a lab.
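As a toy illustration (not from the article, and no real chemistry model), a molecule can be encoded as node features plus an edge list, and a permutation-invariant “readout” such as summing node features gives the same output no matter how the atoms happen to be numbered — the symmetry Jegelka describes:

```python
# Toy sketch: a molecule as a graph, with a permutation-invariant readout.
# The feature values and readout function are illustrative assumptions.

def readout(node_features, edges):
    # Sum the node features and count the bonds. Both quantities are
    # unchanged if the atoms are renumbered, so the prediction is too.
    return sum(node_features.values()) + len(edges)

# Water (H2O): node 0 = oxygen, nodes 1-2 = hydrogen; edges are O-H bonds.
features = {0: 8.0, 1: 1.0, 2: 1.0}   # e.g., atomic numbers as features
edges = [(0, 1), (0, 2)]

# The same molecule with the atoms relabeled (hydrogens first).
features_perm = {0: 1.0, 1: 1.0, 2: 8.0}
edges_perm = [(2, 0), (2, 1)]

# Same graph, different labeling, identical prediction.
assert readout(features, edges) == readout(features_perm, edges_perm)
print(readout(features, edges))  # 12.0
```

A model that instead read the features in node order (say, concatenating them) would give different outputs for the two labelings, which is why invariance has to be designed in rather than hoped for.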

Jegelka is studying ways to build machine learning models that can take graph data as input and output something else, in this case a prediction of a molecule’s chemical properties. This is particularly challenging since a molecule’s properties are determined not only by the atoms within it, but also by the bonds between them.

Other examples of machine learning on graphs include traffic routing, chip design, and recommendation systems.

Designing these models is made more difficult by the fact that the data used to train them often differs from the data that the models see in practice. The model may have been trained using small molecular graphs or traffic networks, but the graphs it sees once deployed are larger or more complex.

In this case, what would the researchers expect the model to learn, and would it still work in practice if the real-world data were different?

“Your model won’t be able to learn everything because of certain hardness results in computer science, but what you can learn and what you can’t learn depends on how you set up the model,” says Jegelka.

She tackles this question by combining her passion for algorithms and discrete mathematics with her enthusiasm for machine learning.

From butterflies to bioinformatics

Jegelka grew up in a small town in Germany and became interested in science as a high school student. A supportive teacher encouraged her to take part in an international science competition. She and her teammates from the U.S. and Singapore won a prize for a trilingual website about butterflies.

“For our project, we took pictures of wings with a scanning electron microscope at a local university of applied sciences. I also got the opportunity to use a high-speed camera at Mercedes-Benz, a camera that usually filmed combustion engines, and I used it to capture a slow-motion video of the movement of a butterfly’s wings. That was the first time I really got in touch with science and exploration,” she recalls.

Intrigued by both biology and mathematics, Jegelka decided to study bioinformatics at the University of Tübingen and the University of Texas at Austin. She had a few opportunities to do research as an undergraduate, including an internship in computational neuroscience at Georgetown University, but she wasn’t sure what career to pursue.

When she returned for her final year of college, Jegelka moved in with two roommates who were working as research assistants at the Max Planck Institute in Tübingen.

“They were working on machine learning, and that sounded really cool to me. I had to write my bachelor’s thesis, so I asked at the institute if they had a project for me. I started working on machine learning at the Max Planck Institute, and I loved it. I learned so much there, and it was a great place for research,” she says.

She stayed on at the Max Planck Institute to complete a master’s thesis, and then went on to a PhD in machine learning at the Max Planck Institute and the Swiss Federal Institute of Technology.

During her PhD, she explored how concepts from discrete mathematics can help improve machine learning techniques.

Teaching models for learning

The more Jegelka learned about machine learning, the more she became interested in the challenges of understanding how models behave, and how to direct that behavior.

“You can do a lot with machine learning, but only if you have the right model and data. It’s not a black box where you just throw it at the data and it works. You actually have to think about the model, its properties, and what you want it to learn and do,” she says.

After completing her postdoc at the University of California at Berkeley, Jegelka was hooked on research and decided to pursue a career in academia. She joined the MIT faculty in 2015 as an assistant professor.

“What I’ve really loved about MIT, from the very beginning, is that people genuinely care about research and creativity. People here really value originality and depth in research,” she says.

This focus on creativity has enabled Jegelka to explore a wide range of subjects.

In collaboration with other MIT faculty, she studies applications of machine learning in biology, imaging, computer vision, and materials science.

But what really drives Jegelka is investigating the fundamentals of machine learning and, most recently, the issue of robustness. Often, a model performs well on training data, but its performance deteriorates when it is deployed on slightly different data. Building prior knowledge into a model can make it more reliable, but understanding what information the model needs to be successful, and how to build it in, isn’t so simple, she says.

She is also exploring ways to improve the performance of machine learning models for image classification.

Image classification models are everywhere, from facial recognition systems on cell phones to tools that identify fake accounts on social media. These models need huge amounts of data for training, but since it is expensive for humans to hand-label millions of images, researchers often use unlabeled datasets to pretrain models instead.

These models then reuse the representations they have learned when they are later fine-tuned for a specific task.

Ideally, the researchers want the model to learn as much as it can during pre-training, so that it can apply this knowledge to its final task. But in practice, these models often learn a few simple associations — such as which image has sunlight and which one has shadow — and use these “shortcuts” to classify the images.
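The shortcut problem can be sketched with a made-up toy example (the data and the brightness feature are illustrative assumptions, not from Jegelka’s work): if a spurious feature happens to correlate perfectly with the label during training, even a trivial rule scores perfectly, then collapses when the correlation breaks at test time:

```python
# Toy sketch of "shortcut" learning: in training, the label happens to
# track image brightness, so a trivial threshold rule looks perfect --
# and then fails once the spurious correlation is broken.

train = [  # (mean_brightness, true_label): brightness tracks the label
    (0.9, 1), (0.8, 1), (0.85, 1),
    (0.1, 0), (0.2, 0), (0.15, 0),
]
test = [  # same classes, but the brightness correlation is broken
    (0.2, 1), (0.1, 1),
    (0.9, 0), (0.8, 0),
]

def shortcut_classifier(brightness, threshold=0.5):
    # Ignores the image content entirely; uses only the shortcut feature.
    return 1 if brightness > threshold else 0

train_acc = sum(shortcut_classifier(b) == y for b, y in train) / len(train)
test_acc = sum(shortcut_classifier(b) == y for b, y in test) / len(test)
print(train_acc, test_acc)  # 1.0 0.0
```

Nothing in the training signal distinguishes the shortcut from the real pattern here, which is why, as Jegelka notes below, changing what data the model sees is one of the few levers for steering what it learns to represent.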

“We’ve shown, both theoretically and experimentally, that this is an issue with ‘contrastive learning,’ which is a standard technique for pre-training. But we also show that you can influence the kinds of information the model will learn to represent by modifying the types of data you show the model. This is one step toward understanding what models will actually do in practice,” she says.

Researchers still don’t understand everything that goes on inside a deep learning model, or the details of how models can affect what is learned and how they behave, but Jegelka looks forward to continuing to explore these topics.

“Often in machine learning, we see something happen in action and try to understand it theoretically. That’s a big challenge. You want to build an understanding that matches what you see in practice, so you can do a better job. We’re still at the beginning of understanding this,” she says.

Outside of the lab, Jegelka is a fan of music, art, travel, and cycling. But these days, she enjoys spending most of her free time with her preschool-age daughter.
