Research Spotlight: Ameet Soni - Computational Biology and Machine Learning

Editor’s note: This article was initially published in The Daily Gazette, Swarthmore’s online, daily newspaper founded in Fall 1996. As of Fall 2018, the DG has merged with The Phoenix. See the about page to read more about the DG.

This is the sixth interview in the series Research Spotlight, in which I share conversations that I have with faculty regarding their research, their journeys within their fields, and their fields in a broader context.

Ameet Soni is an Assistant Professor of Computer Science.

REDDY: Generally, how would you describe your research?

SONI: My research is in the field of computer science, and within that, I would say it focuses on two subareas. One is machine learning. Machine learning is the study of solutions or algorithms that can automatically learn from data and detect patterns in data to be able to solve problems. The second area is computational biology. I’m very interested in applying computer science algorithms to solve problems in biology and medicine. The problems I look at are generally data-oriented. We have lots of data and very little understanding of the underlying phenomena of that data. So, we want to come up with solutions and algorithms that can make sense of that data.

REDDY: What research projects are you working on right now?

SONI: There are a few that I have open right now. I’ve been working on one with Ali Valliani ’16, who was a student here, but has continued working with me after he graduated. We’re studying Alzheimer’s disease. In our data set we have brain images of about a thousand patients that are either susceptible to Alzheimer’s disease or have already been diagnosed with Alzheimer’s disease. We have very large brain images. In some of them we have a series of images over time. We want to see if we can detect patterns in the patients that have Alzheimer’s or are developing Alzheimer’s vs. patients that don’t have Alzheimer’s. So, hopefully we can come up with a model that we can apply to a new patient and say, “Well, this person has an x percent chance of developing Alzheimer’s, or they already have these indicators of the disease, even though they haven’t shown any symptoms yet.” The main issue there that we have is that doctors can’t actually diagnose Alzheimer’s disease until much later after the onset of symptoms. It’s really hard to detect what the precursors are. They can look at a brain image and if the person has late-stage Alzheimer’s, they can easily see that, but they haven’t been able to determine how to look at images maybe five or ten years beforehand and predict whether somebody is going to develop Alzheimer’s. So, we’ve been doing work on that problem, and we’ve gotten some really interesting results recently on being able to improve the current state of the art in terms of how accurate we can be with those predictions. Depending on the metric, we’re around 60-70% accuracy on diagnosing across a spectrum of ways that you can classify Alzheimer’s disease. So, not good enough, but we’re making improvements, and it’s really interesting to see where that’s moving.

We’re working with a group at the University of Texas at Dallas on Parkinson’s disease. There, instead of just a brain image, we have a lot of different pieces of data for the patients. We have a doctor’s assessment, we have blood tests, we have genetic information for some. We do have brain images for some. Can we combine all these pieces of information together to get a better understanding of Parkinson’s disease and the development of the disease? That matters because it varies across different subpopulations.

Another project that I’m working on currently with a student here is more on the biology side. We’re trying to understand if we can look at the sequence of the human genome and detect patterns in the sequence that tell us what’s common across different areas of the genome. That would allow us to understand how different genes connect and work with one another in a network.

This is work with Chris Magnano ’14 on identifying grey matter and white matter tissue in the brain. This is an important step in trying to detect various diseases or tissue decay in the brain. Our approach, CRF, outperforms other approaches by using machine learning to detect local patterns in the brain structure.

REDDY: Where are you getting the data for these projects?

SONI: We don’t have a medical school here at Swarthmore, so my projects rely either on collaboration with labs in outside medical schools or the use of data that has been released publicly. National funding agencies, such as the NIH, increasingly require or encourage the latter option. So, for the Alzheimer’s data set, it comes from an initiative out of either USC or UCLA out in California, where they’ve anonymized all of the patients and made the data publicly available, and so I’ve been able to get it through that. The Parkinson’s data set is available through The Michael J. Fox Foundation, and the DNA data set has actually been made available by an organization called DREAM. They like to have competitions between different groups where they’ll release the data and set up different benchmarks, like getting to a certain point within six months or a year.

REDDY: Machine learning and artificial intelligence are very big buzzwords at the moment. How would you define machine learning and how is it distinct from artificial intelligence?

SONI: I often go back and forth in terms of those boundaries because they seem to be getting blurred a lot more these days than when I learned about it ten or fifteen years ago in graduate school. Artificial intelligence, to me, is much more focused on the parallels, or even direct modeling, of the cognitive side of intelligence. So, the “intelligence” word is really being emphasized there. A lot of people who do research there are trying to mimic human cognition or use human cognition as a basis for evaluating systems. Artificial intelligence is more big picture in terms of trying to get an intelligence that can work across a variety of tasks.

Machine learning is a subfield of artificial intelligence, and it’s much more focused on the machinery of doing individual learning. For example, in machine learning, we might pick a specific task and only care about developing an algorithm that can learn for that specific task. Artificial intelligence wants to work on many different tasks using the same algorithm. When I talk about brain images, an artificial intelligence person may say, “Well, there are different types of images. You could be talking about a brain, you could be talking about a picture of a dog, you could be talking about the sensor on a car.” So, they might be thinking about the different ways you can be looking at images and having a more general-purpose machinery. I think that’s incredibly useful, but, for my purposes, I’m interested in solving very particular tasks that humans are struggling with. It would be great if I could get something that generalizes to other things, but I’m much more focused on working on a specific application for a specific purpose. So there, I’m thinking more about the algorithm in terms of how I can attack a very specific issue.

I think machine learning is a subproblem, and artificial intelligence is a global study of a lot of different things that interact with each other. It’s a little more philosophical, in terms of “What does intelligence mean? What does artificial mean? What are our goals? What are our benchmarks?” Machine learning is a little more on the practical side. It’s very mathematical in nature and much more applied in nature.

One experimental result, showing the average length of the root over time across hundreds of individual variants.

A bigger experiment, showing how the Kaplinsky lab can easily quantify and visualize the difference between different genetic variants using our tool. The following is a video produced by Scout Clark ’19, who worked on a joint project with Prof. Kaplinsky in Biology.

REDDY: How did you develop an interest in computer science and, specifically, machine learning and computational biology?

SONI: When I was in high school, there was a new program to encourage students to explore new areas. One of the things in there was a computer science course. That was my first experience with it. My parents are immigrants, and my dad started a small business when I was young, and I was kind of the computer guy. So, I had already had an inclination towards that. I had to learn everything about the computer and how to use it for particular things with the business, and I got really interested in that. So I took the computer science class, and that was when I really started to get interested in it. Up to that point, I thought math was really the thing I was most interested in, and a lot of the same skills that I really enjoyed using in math also applied to computer science. But it also drew on many other different types of skills and interests that I had. It felt a little bit more interdisciplinary.

I was always really interested in biology and medicine. Talking to a teacher, she mentioned that, with the human genome, there was this field called bioinformatics that was becoming big, and she was interested in it. She was a high school teacher, so she didn’t know a ton about it, but we talked about it and read some articles about it, and it really sparked my interest. But, it wasn’t until college that I really got a much firmer grasp of what was going on there. I always maintained my interests in both biology and computer science. I just continued pushing in that direction and got much more interested in it. I actually wasn’t interested in machine learning up until I got into graduate school. So, I knew generally I was interested in how computer science could be used to solve problems in biology. When I got to graduate school, there was a project that I had interest in working on and the professor that I paired up with specialized in artificial intelligence and machine learning. So, that’s how I got into that area, and I really started becoming much more interested in that area as my depth area in terms of my studies at that point.

This is a result from my PhD thesis. My research on this project was to develop an algorithm to help determine the atomic structure of proteins. We focused on very difficult proteins which humans either cannot solve, or require laborious effort to solve. The image is of a protein that my collaborators were finally able to solve using our algorithm.

REDDY: Computer science as a field is growing rapidly and has been for years. What do you think the future of computer science looks like?

SONI: I think computer science is increasingly a necessary skill for individuals in society, let alone as a major. Just as we think everybody should have a math background and take English courses, I think having quantitative skills is necessary for most of us to be functioning members of society. I think we’re starting to think much more closely and carefully about how we translate computer science into a service discipline, in addition to being a specialty that people can focus in on. I think computer scientists need to do a better job, and are increasingly doing so, of thinking about the interface between technology and society. I think we’ve kind of left that off as a question for either philosophers or people in industry to think about. But, how we think about privacy, how we think about technology and the role of technology in our lives is a question that we should be asking as well, and we should be involved in that discussion. I think those are really two important directions. How can we increase the accessibility of computing to the entire population? How can we think about the role of technology in society?

Part of the boom is specifically how we think about distributed computing. It used to be that you had a personal computer and every file that you had was on that computer, and all of the things that you wanted to do were on there. Maybe you had a CD or a floppy disk that you could use. But, increasingly all of our stuff is in “the cloud.” That really introduces new problems and new questions, so that’s a big area of computer science right now.

Professor Krista Thomason (Department of Philosophy) and I are planning on co-teaching a course on ethics in technology next year as a first-year seminar. I hope it’s the beginning of a lot of new collaborations between computer science and the humanities. We’re starting to see interest in digital humanities as well. I’m not involved in that; Professor Wicentowski is. We’re starting to think about the two roles I mentioned: making computing more accessible to everybody, and how it is intertwined in our lives. Hopefully, students are interested in those things. We hope to expand them as we go forward.

REDDY: Are there other people in computer science and other departments working to collaborate on similar, interdisciplinary courses?

SONI: With our enrollment problems, we’ve kind of had to cut back a little bit on them, but we have had them in the past, and I think we’re going to start them up again in the future. For example, Professor Wicentowski has worked a lot with linguists because he does natural language processing. A couple of my courses are cross-listed in biology. I’ve collaborated with researchers over there. There are obviously collaborations between math and computer science. But, I think there are more avenues out there for us to be able to continue to collaborate with different disciplines across the College.