Steve Wang has been a statistics professor at Swarthmore for 22 years. He focuses on evolution and mass extinctions, using incomplete fossil data to examine trends and causes. The Phoenix spoke to him about his research, teaching, and music.
What is your current research focused on?
I study statistics applied to paleontology. I look primarily at mass extinctions. I often try to study [questions such as] “How do you tell there’s a mass extinction and what’s the timing of the extinction?” “How long did it take – a million years, half a million years, 1,000 years?” – and then try to pinpoint when the extinction occurred. If you can figure out when it occurred, and how long it took, that often gives you clues as to what caused the extinction.
Were you always interested in statistics?
Ever since I was a kid, I wanted to study statistics because I was a baseball fan. I was hoping to go into baseball as a baseball statistician. That was my original goal in learning statistics.
How come you didn’t follow through on that?
Now I think it’s commonplace for almost every baseball team to employ statisticians or data analysts. When I was growing up that career path didn’t exist at all. I don’t think teams employed statisticians. There was no way to become a baseball statistician; there was no path to doing that. I think that was more of a fantasy at that point than an achievable goal.
What baseball teams do you follow?
The Yankees!
Were you interested in paleontology and extinctions growing up?
As a kid I was interested in dinosaurs just like lots of kids are, but didn’t really go beyond that. I read some books on evolution and paleontology, but I didn’t know that you could use statistics in paleontology. I figured paleontologists just dig up dinosaur bones in the desert and I didn’t realize there was any statistics in paleontology until after I finished grad school.
Do you think that paleontology is special when applying statistics to it?
I like paleontology because the statistics that we do in paleontology is very different from what we typically teach people in intro. We often teach people methods that apply to fairly large datasets. If you’re taking an opinion poll, you can easily sample 1,000 people. If you’re doing a medical study, you might have hundreds of people you’re giving a drug or placebo to. But in paleontology when you’re working with things like dinosaurs, it’s pretty common that dinosaurs are only known from maybe one fossil. You can’t assume that there are hundreds of specimens or hundreds of data points and you’re often trying to deal with a very small sample. The kind of statistics that we do is different from what I’ve primarily been trained to do because those methods are typically focused on larger samples. To me, it’s fun because you’re trying to come up with new methods and you can’t rely on what you’ve been taught in college.
I watched your TED Talk where you talked about what’s missing from raw data and filling in those gaps. Does that play a large role in your teaching?
The theme I always try to communicate is that the datasets you get often in class are cleaned up and have no errors, and are very straightforward. But in real life, often you have to spend a lot of time cleaning up datasets, tracking down mistakes, and trying to figure out if there are errors and why they’re there. I think the idea that there are things that are missing in the dataset and from things that are measured is an important one for any kind of statistical work that one does.
Is paleontology specifically challenging for finding those clean datasets?
In addition to the issue of not having large samples, as I mentioned before, the fossils that we have are from all over the world, but there are some places where people look for fossils much more. So a lot of what’s known about dinosaurs is primarily known from the U.S. and Europe. There are other places in the world where there are either not as many paleontologists or not as much funding to do paleontology, or for whatever reason the terrain might be harder to find fossils in if the land is covered with a jungle as opposed to being an exposed desert. The data that we have from fossils is very biased by who’s looking, where they’re looking, when they looked, and other factors.
How do you overcome those biases and account for them?
Part of my research is trying to figure out when things go extinct. If the fossil record were perfect, it’d be easy to figure out when things went extinct because you’d start finding their fossils and then you assume once you stop finding fossils, then the species has gone extinct. But since you’re missing a lot of fossils, for any number of reasons, you’re trying to figure out what are the biases that create the fossil record that you see, and then trying to adjust for that to try to figure out when something went extinct. There are ways you can use statistics to infer what the process that led to the fossil record looked like, and how you use that information to try to account for misleading biases in what you see.
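[Editor’s illustration, not from the interview: the short Python sketch below shows the general idea under a deliberately simple textbook assumption, namely that fossils of a species are scattered uniformly at random over its lifetime. The function names and numbers are hypothetical, and this is not presented as Wang’s own method. With only a handful of finds, the youngest fossil tends to predate the true extinction, and scaling it by (n + 1) / n removes that bias on average.]

```python
import random


def simulate_last_find(true_extinction, n_fossils, rng):
    """Time of the youngest fossil when n finds fall uniformly between
    origination (time 0) and the true extinction time (an assumption)."""
    return max(rng.uniform(0.0, true_extinction) for _ in range(n_fossils))


def adjusted_estimate(last_find, n_fossils):
    """Scale the youngest find by (n + 1) / n, the standard unbiased
    correction for the maximum of n uniform draws on [0, theta]."""
    return last_find * (n_fossils + 1) / n_fossils


def average_estimates(true_extinction=100.0, n_fossils=5, trials=20000, seed=0):
    """Average the raw and adjusted extinction estimates over many
    simulated fossil records."""
    rng = random.Random(seed)
    raw_total = 0.0
    adjusted_total = 0.0
    for _ in range(trials):
        last = simulate_last_find(true_extinction, n_fossils, rng)
        raw_total += last
        adjusted_total += adjusted_estimate(last, n_fossils)
    return raw_total / trials, adjusted_total / trials


if __name__ == "__main__":
    raw_mean, adjusted_mean = average_estimates()
    print(f"average youngest fossil: {raw_mean:.1f}")
    print(f"average adjusted estimate: {adjusted_mean:.1f}")
```

[With five fossils and a true extinction at time 100, the youngest fossil averages about 83, while the adjusted estimate averages close to 100. Real fossil records violate the uniform assumption in the ways described above, which is why more elaborate methods are needed.]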
Do you think using statistics provides any perspectives on paleontology and climate?
One of the projects that we’ve been working on is trying to look at extinctions in the past and compare them to extinctions now. If you look at, for instance, sharks, our shark populations are declining very rapidly now. It’s been estimated that for some species, we’ve lost 90% of the number of sharks of that species. Sharks have been around for a long, long time – much longer than dinosaurs – and sharks have made it through all these mass extinctions previously. So it’s somewhat alarming that these creatures that have lived through 400 million years of mass extinctions are now threatened with going extinct. We’re trying to look at extinctions in the past and compare them with what’s going extinct now, and asking the question, “Is what we see now more like a mass extinction in the past, or is it more like what we call background extinction?” And what we found is the patterns definitely look more like a mass extinction. That could be because of climate change. It could be because of habitat loss. It could be because of hunting. But we can definitely use the fossil record, since that’s our best record of what life looked like before humans, as a baseline to compare against the effects that we’re having on the environment now.
What’s your favorite part about teaching?
I strive to try to create a story in each course that I have. I try to have a beginning, a middle, and an end, and I try to think of the topic that I’m trying to teach and create a story that explains that topic either by some metaphor or an analogy, or sometimes I use props to try to explain something. For instance, when I teach correlation coefficients, I use an example of taking a dating quiz. When I teach opinion polling, I use an example of tasting soup to see if the soup has enough salt in it. When I talk about confidence intervals, I use an example of walking a dog. So I try to think of a vivid example that I can use to tell the story of each topic. Trying to find those examples and craft those stories is the most fun part.
Do you think statistics is changing as technology changes?
The work I do would not have been possible, say, 30 years ago. In Intro Stat, you often learn a formula and then you plug numbers into the formula and you get an answer. In the work I do, there’s almost never a formula and so I’m trying to construct an algorithm that carries out a calculation on the computer. So often, I need to do simulations on the computer that run for several days, sometimes even weeks, and then I’ll get an answer at the end. The work I do would not be possible without computing.
Why does your work not lend itself to formulas?
I think it doesn’t lend itself to having formulas because a lot of the formulas assume you have a large enough sample size. A lot of the formulas we do in Intro assume that something has a normal distribution. There’s a normal distribution and there’s a clear formula that you can get an answer out of. With the small sample sizes I deal with, I almost can never assume that you’re going to have a normal distribution. The standard methods that one learns about in Intro Stat don’t often apply and so there often is no formula that will apply.
Do you think you’re more interested in applications with small sample sizes?
I think it’s more because in paleontology we have small sample sizes. I think because a lot of the theory that’s been worked out is for larger sample sizes, the same results don’t necessarily work with small sample sizes.
Within extinctions, are there any phenomena or events that you find interesting in terms of statistics?
Mostly what I’ve looked at in my career is mass extinctions. You have these great events in the history of life where large numbers of things have gone extinct. For instance, in the greatest mass extinction in history, it’s estimated that maybe 90% of all species went extinct. That’s almost like losing everything on Earth. We weren’t too far away from that. I think mass extinctions are especially interesting because these big events in the history of the Earth really shaped what happened to life on Earth afterward.
Does studying these mass extinctions change your perspective on Earth in its current state?
I think a lot of what we see in the aftermath of mass extinctions is that life will eventually rebound, but often it takes millions of years. I think that gives us more urgency for trying to prevent a mass extinction from happening now because it’s not something where ten years later everything’s going to be fine. We can be confident that eventually life will rebound but it’s going to take millions of years for life to come back after a mass extinction. It is going to take a long time to recover.
Are you interested in the biology side of extinction or more of the patterns and trends?
A little of both. Part of what I study is what the things that go extinct have in common. Is it kinds of animals that live in certain kinds of habitats, or animals or plants that have certain characteristics? Part of what I study is trying to look at what are the biological aspects of the things that go extinct versus the things that survive. For instance, we know that dinosaurs went extinct except for birds during the last mass extinction. But why did birds survive? What is it about birds as opposed to other dinosaurs that allows them to survive?
Is paleontology experiencing more data as experiments and fieldwork go on?
Yeah, there’s been a big push in the last maybe twenty years. There’s a large online database and so the push is to try to collect the entire history of paleontological discoveries and put them in this online database. So for instance, you could look up all the T. rex fossils ever found, where they were found, and what parts of the animal were found. If you want to find all the Cretaceous fossils that were found, you can pull up a list of them. Using this database, there’s a wealth of information that you can use to try to answer questions.
Do you have any advice for students looking to go into stats as an applied field later in life?
I would say stats nowadays is almost always used in application with other fields – stats and paleontology, biology, psychology, or whatever. So learn some other fields that you care about and that you’re interested in, and learn how statistics can be used in that field. And the other thing I’d say is that statistics nowadays involves lots of computer programming. So that’s definitely a skill that you want to practice and be good at.
I saw the keyboard in your background. Do you think music and statistics have links?
I haven’t really thought about that a whole lot. I’m sure there’s something that one could pursue there.
Do you play music?
I’m more interested in writing music, so I don’t play anything very well. I can play well enough to try to work out pieces that I want to write. I’ve always been interested in music ever since I was a kid. I had a piece performed here at the college. When I write music on my own, I’m just doing it on GarageBand on my computer, so having the chance to hear something you wrote being performed by top-notch musicians in Lang Concert Hall was a huge thrill. It was a percussion ensemble piece. Andrew Hauze [director of Swarthmore orchestra and wind ensemble] rounded up people he knew. I do that for fun; I’m not doing it in any serious way, but I enjoy it when I have time.
Any other hobbies?
A fun fact is I was on “Jeopardy!” a long time ago and I was leading the entire game until Final Jeopardy. The Final Jeopardy question I got was the most obscure question I’ve ever seen.
Do you remember what the question was?
Oh yes, I’m never going to forget this question. It was, “Lloyd Bucher was the last captain of this U.S. ship?” I think with a lot of the Final Jeopardy questions, you may not know them right away, but you can figure them out. And this question you just had to know cold; there was no way to figure it out.
Do you know the answer now?
Yes, it was the Pueblo. It was the ship that got captured by North Korea during the Vietnam War.
What made you want to go on Jeopardy?
I played Quiz Bowl in college and grad school, and I’ve always been interested in lots of different things. Quiz Bowl rewards broad knowledge in lots of different things as opposed to deep knowledge in a few things and so I think as someone curious about lots of different topics, statistics is the perfect field because as a statistician you get to work in lots of different fields and learn about lots of different people.