Two weeks ago today, I woke up to tragic news from Brussels. As with many others in the United States, I started the day in stunned disbelief, reading everything I could about the attacks – looking in horror at images of carnage, hanging on the words of survivors, praying for the families of those lost.
After a while, though, I closed my browser and returned to two open files on my computer. In particular, I looked again at these two charts:
I’ve been staring at those charts, and trying to make sense of what they mean, since last October. The first plots a Syrian foreign fighter measure against itself for fifty countries. The second chart shows which variables, out of the forty or so it was given, a machine learning algorithm thinks are best at predicting the dots in the first chart. As you can see, the algorithm flagged the Francophone variable (whether a country currently has, or previously had, French as an official language) as far and away the most important.
When I first created these charts in October, they seemed significant and interesting, but only in an academic sense, not a practical one. Sure, Tunisia had been struck the prior spring in the Sousse attacks, but you could explain that away as a one-off event. And in any case, the plot was created to measure foreign fighter out-flows, not actual terror attacks.
A couple weeks later though, in mid-November, Beirut was hit. Then a day later, Paris.
At that point I began looking at both charts more intently. The raw data for the first one come from the best foreign fighter estimates we have for 2011-2014. The transformed measure we’d created was clearly picking up something important about the way Sunni militant networks, particularly Francophone ones, were behaving post-2014.
The more I looked at the data, the more I began to idly worry about Belgium. No matter how I ran the numbers (and no matter how I tried to discount the biases involved in them), Belgium always stood out as the country outside the Syrian theater most likely to be hit next. Of course, it didn’t take a genius to come to that conclusion: the network that carried out the Paris attacks had roots in Belgium, so I was by no means alone in flagging the country as being at risk.
What concerned me, though, was that the algorithm didn’t know anything about the Paris attacks. Nor did it know anything about the specific networks that planned them. And yet it was basically saying the exact same thing.
***
In the coming weeks I’ll be spending a lot of time explaining those graphs. The obvious story they tell is about Francophone foreign fighter networks, but in my view they also tell a story about machine learning – and how the new tools at our disposal both can and cannot help us with important social and political questions.
Before I get to that though, I want to stress something else. When Will McCants and I first started our research, our starting point had nothing to do with a “French Connection.”
Instead, as I mentioned to Vox last week, our concern was something much broader: fifteen years after the Pentagon and WTC attacks, we still didn’t have a great understanding of what was going on. For all the policies we had crafted and all the resources we had expended, the problem of Sunni militancy had not gotten any better – and if anything, appeared to have only grown worse.
In the interim, a lot of scholars much smarter and more knowledgeable than ourselves had tried to explain what exactly was happening. Rather than tossing out yet another explanation, we thought, why not craft a study whose design itself could prompt a rethink on the question?
Accordingly, we set about thinking through what kind of design to use, especially given recent advances in machine learning. By way of background, social scientists have known for a while that machine learning algorithms can predict political outcomes better than even our best quantitative models, which suggests that they are better at picking up the latent structure underlying complex social processes. However, those algorithms were themselves generally too complex for us to understand. In the last few years, that has begun to change. For a certain class of algorithm in particular, we can now often peek inside the “black box” – and as a result, we can use the structure the algorithms uncover to theorize anew about otherwise intractable problems.
In our case, we decided to use one type of algorithm, called Bayesian Additive Regression Trees (BART), to help us gain new purchase on questions about Sunni militancy. As I noted above, a lot of very talented social scientists have been thinking about violent Sunni extremism for a while now. Why not gather as much data as we could for the explanations they came up with, and then use machine learning to flag what it thought was most important and why? If we then coupled those results with in-depth qualitative work, we might be able to make new headway on what was happening.
I’ll walk through both what we did and what we found more clearly in a deep dive to be published next. For now, though, the main point is simply this: everything about our project was designed to start a conversation, rather than end one.
It just so happened that an important part of that conversation, at least at this point, should concern what’s going on in Francophone countries – and particularly in Tunisia, France, and Belgium.