In 1936, the Literary Digest infamously predicted a landslide win for Alf Landon over FDR. In 1948, Gallup even more infamously had Dewey decisively besting Truman. And now, in 2016, yet another reckoning: “Clinton Trumps Trump” has yielded to “Trump Triumphs.”
No doubt it would be easy to lump these failures together. However, it would also be a mistake. Landon’s hopes and Dewey’s aspirations ultimately lost out to fundamental errors in polling methodology. By contrast, Clinton fell prey to something far more befitting the 21st century. The problem wasn’t with the polls, but with how they were aggregated — or more specifically, with the sophisticated models that were built on top of them.
Where the models went wrong was this: they assumed that the fifty state elections were largely independent. Think of the election as a series of fifty coin flips. One way to model that series is by flipping fifty distinct coins exactly once, with each coin biased in a way that’s both unique and random. The allure of this model is its simplicity. If each state’s bias is independent from the others, then over fifty states the biases will ultimately cancel out, and the actual results will fall in a narrow band around the national polling average. (As best I can tell, this is why Huff Post’s model had a win probability for Hillary of 98%.)
However, that model is almost certainly not true. In reality, many state elections are likely to be biased in a similar way. In that case, the election overall is more like fifty flips of the same biased coin. If we model the election that way, then the actual election results will converge on the polling average plus the consistent bias. The polls for PA, NC, FL, MI, and WI, among others, suggest that this is likely what happened: the polling error in each of these states moved in the same way, due to mistaken likely-voter estimates for white citizens without a college degree. In that regard, it’s not a coincidence that Nate Silver, the aggregator who most strongly accounted for the possibility of correlated state-level errors, also had the greatest uncertainty in his forecasts.
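The difference between the two models is easy to see in a quick simulation. The sketch below uses made-up error magnitudes (3% at both levels, purely for illustration) and compares how much the fifty-state average swings when state errors are independent versus when they share a common component:

```python
import random
import statistics

def national_error_spread(shared_sd, state_sd, n_states=50, n_sims=20000):
    """Std. deviation of the average polling error across n_states states,
    when each state's error = a shared national bias + an independent bias."""
    averages = []
    for _ in range(n_sims):
        shared = random.gauss(0, shared_sd)
        errors = [shared + random.gauss(0, state_sd) for _ in range(n_states)]
        averages.append(sum(errors) / n_states)
    return statistics.stdev(averages)

# Fifty distinct coins: independent errors mostly cancel across states.
independent = national_error_spread(shared_sd=0.00, state_sd=0.03)

# One shared coin: a common bias does not cancel, no matter how many states.
correlated = national_error_spread(shared_sd=0.03, state_sd=0.03)
```

With these illustrative numbers, the independent model leaves the national average within a fraction of a point of the polls, while the shared-bias model leaves it swinging by roughly the full size of the common bias. That gap is exactly why a model that ignores correlation will report overconfident win probabilities.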
So where to go from here?

To be sure, the polls themselves could be improved. But again, they were not the problem. The median polling error was only around 4%, and even then the polls got the popular vote right. It’s the aggregation that needs work. And to be clear, it’s not that the aggregators should have identified the specific bias among white, non-college voters — it’s that they should have accounted for the possibility that that kind of bias might exist. In the next election, aggregators either need to pay more attention to how they model correlated errors, or publish clear disclaimers about potential sources of uncertainty that are not represented in their models.

In particular, there need to be greater incentives for pollsters to release their raw data. When the Times commissioned a survey of Florida voters last September and asked five separate teams to analyze it, it’s telling that the one team that got Florida “right” also used a unique likely-voter screen. Ideally, this kind of open analysis should be the norm, not the exception.

I’d also go a step further and suggest that in this case the bias probably wasn’t just correlated across states in the U.S. If you had tried to model U.S. and U.K. politics together one year ago, you likely would have found that the national-level errors were correlated too — i.e., that the polls had a similar downward bias in their likely-voter screens for white non-college voters.

Finally, one last point. This is the same mistake that contributed to the housing bubble a decade ago, just prior to Obama entering office. As with the aggregators today, the banks back then assumed that state and regional real estate markets were all independent — many separate coins, so to speak, rather than one coin flipped many times. In retrospect, treating sub-national errors in this way was clearly wrong then, and it is clearly wrong now.

The first time we made this mistake it was devastating for millions. But at least then we could argue, with something of a straight face, that we didn’t know better.

We have no such luxury here.
When it comes to prosecuting human rights violations, one major challenge is establishing how extensive those violations were in the first place.
The worst cases of mass atrocities often occur in the contexts of civil wars and protracted insurgencies, neither of which are great research environments. As a result, political scientists and policy researchers are often left only with multiple, fragmented datasets. The situation is akin to Rumi’s parable about blind men touching an elephant — each dataset offers a glimpse of the overall picture of what happened, but none give the whole thing.
So how do we best use what little information we do have?
Multiple systems estimation (MSE) is still relatively arcane. However, last week Ball posted a great introduction to several new implementations of it:
This is a semi-technical introduction to estimating undocumented events by using multiple, intersecting datasets. This approach is called “capture-recapture” or “multiple systems estimation” …
The point of MSE is to use multiple, partial lists of a population to estimate the total population. The method depends on the integration of the lists, in which records that refer to the same elements of the population (called “co-referent” records) are linked in clusters.
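The simplest version of this idea, using just two lists, is the classic Lincoln–Petersen estimator. Here’s a toy sketch in Python — the casualty numbers are entirely hypothetical, and real MSE work like Ball’s uses more lists and far more careful record linkage:

```python
def lincoln_petersen(n1, n2, both):
    """Two-list capture-recapture estimate of total population size.
    n1, n2: sizes of two independently compiled lists of victims;
    both: number of co-referent records appearing on both lists."""
    if both == 0:
        raise ValueError("the lists must overlap for the estimate to exist")
    return n1 * n2 / both

# Hypothetical: two casualty lists of 600 and 400 names share 120 entries.
# If the lists are independent samples, 120/400 of the population is on
# list 1, so the total is roughly 600 / (120/400) = 2000.
estimated_total = lincoln_petersen(600, 400, 120)  # 2000.0
```

The assumptions (independent lists, a closed population, equal capture probability) rarely hold in conflict settings, which is why the implementations Ball describes use three or more lists and model the dependencies between them.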
If you know a bit of R and are interested in human rights and foreign policy, do yourself a favor and take a quick walk-through of Ball’s post.
In case it’s not obvious: one reason I’d love to see MSE get more traction has to do with ISIS and Syria. If there’s ever to be full reckoning of ISIS’s war crimes (much less Assad’s), it will only happen through the methods that Ball and others have pioneered.
It is difficult to design an effective policy when there is little consensus on what it means or what it constitutes. Consequently, CVE policy and efforts are mostly designed and funded on the basis of anecdotal evidence, with unknown results.
Not to beat a drum on this, but we badly need a better empirical understanding of what’s going on, both in terms of what drives individuals to violence and what the most effective interventions are. Partly that means more funding for independent CVE research, but it also means greater coordination of existing programs and analysis.
One last point I loved about Bobby’s piece: I actually learned a lot. I follow the space pretty closely, but even I wasn’t aware of some of the resources and programs he mentioned.
Murtaza Hussein has a fascinating piece up on “Rage Wind”, a new short film on the recent battle for Aleppo:
ISIS exploded into the popular consciousness of many Americans following a series of videotaped executions of Western hostages. Those images have become grimly iconic of the Syrian conflict to many in the West — but ISIS is only one of many parties producing its own media about the war. The video produced by Ahrar al-Sham eschews the blood and gore that is ISIS’s signature, turning the subjects of the film into identifiable characters and imitating the storytelling style of traditional war movies.
The film bears an easy and purposeful resemblance to multi-player video games, but the camera movement evokes Saving Private Ryan as much as Call of Duty. The coupling of distinct cinematic and gamer aesthetics is what makes Rage Wind visually compelling.
To me, though, the main takeaway of the film is how ubiquitous smartphones and HD cameras are on the battlefield now. More and more it feels like Syria will go down as the first smartphone war: not just in the sense that Syria bears the same relationship to smartphones that Crimea or the US Civil War bear to photography, but in the sense that the smartphone itself has become a kind of battleground in its own right. The parties are no longer competing over territory alone, if they ever were. They’re also investing heavily in a competition over how the war is experienced and consumed on a four-inch piece of glass.
The overall trend is of a global Muslim community that has lowered its birthrate at a much slower pace than the rest of the world, according to Pew senior demographer Conrad Hackett.
“That Muslims are growing twice as fast as the world’s population is really striking and remarkable,” he said.
When young people lack economic opportunities and the prospect of being able to support families of their own, experts say, they are especially susceptible to the lure of anti-establishment ideology. In Latin America during the 1970s and 1980s, when countries in the region were experiencing youth bulges, that draw often was Marxism, Assaad noted. But it could take the form of austere varieties of Islam for disgruntled Muslim youths today.
I wish Emont had homed in more on Hackett’s point. We don’t know anything about why youth bulges in Muslim countries might be different from those elsewhere. We do know, however, that many of the factors that reduced fertility in non-Muslim societies have not had the same effect in Muslim ones.
If there’s something distinctive about “Muslim” youth bulges, I suspect it will lie more in the cause than the effect.
Globally, over one billion people self-identify as Sunni Muslim. The overwhelming majority of those individuals live peaceful and productive lives. Yet a minuscule fraction, far less than one percent of one percent, resort to political violence in pursuit of what they believe to be an Islamic ideal.
The question we all want an answer to is this: What’s going on with that tiny fraction? What could possibly compel an individual to join it?
That question is an exceptionally difficult one to answer. Yet as recent violence from Pakistan to Paris can attest, it’s one that urgently beckons us to try.
Will McCants and I recently set out to rethink how we do that. Using new techniques in machine learning, we arrived at some surprising conclusions. Neither of us anticipated or particularly liked what we found; the early results, which we recently discussed in a short essay in Foreign Affairs, went against our own prior beliefs about what drives Islamist violence. Yet it is precisely because of our own misgivings about those results, as well as the heated responses they have provoked, that I want to walk through how we arrived at them, and articulate why we believe they warrant further attention.
The place to start is with two barriers to any project on radicalization.
The first is epistemological: we will never know with full certainty what causes an individual to radicalize. That may sound like a depressing statement, but it’s not. We also cannot know with full certainty that smoking causes lung cancer, or that humans have caused climate change. Yet in those cases a decades-long effort by teams of researchers around the world – each one further testing and refining correlational data – has eventually led us to believe that those relationships are in fact causal.
The second barrier, as you might guess, is that to date there has been no decades-long, quantitative effort to study radicalization. Why does someone decide to engage in political violence, including against innocent civilians, in pursuit of an Islamic ideal? Answering that question effectively will require a generational effort on par with cancer and climate research. Obviously social processes are different from biological and natural ones, and any research agenda needs to take that into account. But if we’re to make progress on radicalization, we need to begin collecting better data while also systematically refining and exploring the correlations that appear within them. Although we may never enjoy as much clarity with radicalization as we do with climate change, we need to begin at least trying to attain it – study by study, correlation by correlation.
At present we have way more theory than data. Accordingly, Will and I started by designing a study that would let us pare those theories down. Put differently, unlike conventional social science, the goal of our study was not to prove any one argument conclusively. Instead, the goal was to build better, more precise theory that we and others could then use as the basis for subsequent research.
To explain how we did that, I need to introduce two concepts here. The terms for them may be a little jargony, but in the right context they’re intuitive to understand.
The first concept is what are called “linear” and “non-linear” effects. Just think of the effect of rainfall on plants. Initially, it may seem like the effect is linear: a little rain will make a plant grow a little bit, and a little more rain will make it grow a little bit more. In reality though that’s not true. If there’s no rain, plants will die; if there’s some rain, they’ll grow well; if there’s too much, they go back to dying. It turns out that what seemed simple and straightforward is actually more complicated. In this case, the effect of rain depends on how much rain has already fallen. In social science terms, this means rain’s effect is “non-linear.”
The second concept is what’s called an “interaction.” Initially, it may seem like the effect of sunlight on plants is also straightforward: plant a seed where it can get some sun, and it should grow. Yet it’s not that simple either. Plant a seed in the desert, where there’s plenty of sun, and it still probably won’t grow. The reason is that the effect of sunlight depends on other factors, such as whether there’s water. In social science terms, this means that the effect of sunlight is “interactive” with the effect of water.
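Both concepts can be made concrete with a toy function. The model below is entirely invented, just to illustrate the vocabulary: rain’s effect is non-linear (none is bad, some is good, too much is bad again), and sun’s effect interacts with rain (sunlight only helps when there is also water):

```python
def plant_growth(rain, sun):
    """Toy model; `rain` and `sun` are in arbitrary units.
    Rain's effect is non-linear, and sun's effect interacts with rain."""
    rain_effect = max(rain * (100 - rain), 0)  # peaks at moderate rainfall
    sun_effect = sun if rain > 0 else 0        # sun needs water to matter
    return rain_effect + sun_effect

# Non-linear: moderate rain beats both drought and flood.
# Interactive: extra sun helps a watered plant, but not a desert seed.
```

Note that neither effect can be captured by simply adding up a rain term and a sun term with fixed coefficients, which is exactly the problem with standard linear models discussed below.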
When we as researchers have no prior knowledge of what causes a social phenomenon, the principal question we face is not just whether we think the social world is as complex a process as plant growth. Instead, it’s also whether we believe we can adequately theorize about that complexity on the basis of our intuitions alone. For instance, with respect to radicalization, we might think that poverty plays a role, and education too. But how exactly does each play a role? Is the effect of poverty or education non-linear? And if we think either effect is non-linear, how is it non-linear? Do we think the effect of going from 5% to 15% youth unemployment, say, will be greater or less than the effect of going from 35% to 45%? If so, why? And in any case, do we think the effect of education depends on the effect of youth unemployment, or vice versa? If so, do we think their interaction has a negative effect? And if so, in what ranges do we think that negative effect is most pronounced?
In the past there wasn’t much we could do to answer these questions. Standard methods of analysis, or what are generally known as linear regressions, require researchers to specify up front which variables they think are “non-linear” and which are “interactive” – which is an obvious problem for researchers who lack strong intuitions about the structure of their data. By contrast, we’ve long known that machine learning algorithms could theoretically pick up on non-linearities and interactions, but until recently the algorithms were so complex in themselves that we couldn’t actually peek inside them to see the structure they were uncovering.
Recently that’s begun to change. We can start to peek inside these algorithms now. For a certain class of algorithm in particular, we generally can identify the “non-linear” and “interactive” relationships the algorithm uncovers. As a result, these algorithms have opened up a new way to do social science: in cases where we have more theory than data, we can use them to flag both which theories appear to matter most, and how exactly they seem to matter.
That is what Will and I did with our study on radicalization. There are a lot of plausible theories about what might be going on, and we felt we couldn’t know beforehand how exactly all those different theories might fit together. So we thought we would turn our data over to an algorithm and let it tell us what it thought mattered most and why. Once we’d done that, we could then go through the results to theorize anew about what might be going on.
Of course, this approach raises another question: which data should we use, and why?
Recall that what we really want to know is, why do individuals radicalize? (In this case, Sunni Muslims who actively support violent extremist groups.)
The only way you can answer this question directly is with a dataset about individuals. Think of a spreadsheet where each row is a person, and then each column is a data point about that individual, such as their age, income, or gender.
The catch is that such a spreadsheet is exceptionally difficult to build in the right way – and especially so if you want it to include individuals from more than a handful of countries. I’ll have much more to say about this in a future post. But for now suffice it to say that a global dataset of individual extremists was beyond the scope of what we could do.
As a result, we had to change the question slightly. In particular, we asked, where is the problem of radicalization most acute, and why?
Put differently: if we were policy-makers concerned with radicalization, which countries would we most want to focus on, and why?
The advantage of posing the question this way is you can answer it with what’s called “country-level” data. Think of a spreadsheet where each row is a country, and each column is a data point about that country, such as GDP, literacy rate, or child mortality. Then in the last column, there’s a relative measure of radicalization.
The good news is that we have this kind of data. The bad news is that it comes with a drawback: aggregate data places strong limits on the kinds of inferences we can make. In this case, once we start using country data, we can draw conclusions about larger trends, but we can’t draw conclusions about individuals. I’ll be returning to this point below, because it bears on what we can and can’t say about our Francophone finding.
For now, I will just note that this issue, whose technical name is the “ecological inference problem”, was not a pressing one for our research design. Recall that our goal was to build better theory that we could then test later. The whole idea here was to use data about countries to identify the major aggregate factors that appear to matter for radicalization. As individual data becomes available, we could then look at it directly to see if the aggregate issues we flagged really do bear out at the individual level.
By way of comparison, imagine if you were a doctor a hundred years ago, and you were trying to figure out what was going on with lung cancer. Further, imagine all you had was county-level information across the United States. You’d notice pretty quickly that there was a correlation between smoking rates and lung cancer rates. That wouldn’t tell you conclusively there was an individual-level relationship, but it would tell you to take a close look at smoking as a risk factor once individual data became available.
In a nutshell, that’s what Will and I were trying to do here: flag strong aggregate correlations now, so that we and others can take a closer look at them later.
All that is preamble for the foreign fighter data we used. About a year ago the International Centre for the Study of Radicalization and Political Violence (ICSR) released estimates for the number of foreign fighters who traveled from a given country to Syria. The ICSR dataset covers fifty separate countries, and each estimate is for the entire 2011 to 2014 period.
As with any observational data, the data we used is not perfect. It’s got two potential biases in particular, although I don’t think they’re especially problematic here. Further, Syrian foreign fighter data also is not a perfect proxy for radicalization overall – some Sunni radicals may stay and fight at home, after all, while others may travel abroad to countries other than Syria. Nonetheless, the Syrian civil war was clearly the most important conflict for Sunni jihadists in the 2011-2014 period, and traveling there was relatively easy. As a result we felt it served as a reasonable proxy for radicalization.
The big issue was what to do with the data. To lay readers, it may seem like the hardest part of data science is knowing which algorithm to use, or which analysis to run. In my experience that is not true. Instead, the hardest part comes well before that, when you decide how to reshape your data so that it can actually tell you what you want to know. Get this wrong, and everything else falls apart.
In our case, the raw ICSR data couldn’t tell us what we wanted to know. So we had to transform it in a few ways to be able to make useful comparisons.
First, we had to account for the size of a country’s Sunni Muslim population. Without doing this, you can’t make an honest comparison between foreign fighter contingents from separate countries. For example, Saudi Arabia produced more foreign fighters than France, but those fighters make up a much smaller percentage of Saudi Arabia’s Muslim population than France’s do of its. To make valid comparisons, we thus had to calculate a radicalization rate: namely, the number of foreign fighters from a country divided by its Muslim population. (For data on Muslim populations, we used the 2010 estimates in the World Religion Database.)
Second, we also had to determine a country’s share of the overall foreign fighter problem. Note that where is the problem of radicalization most acute? is a relative question. It’s not asking which countries produce the most militants. Instead, it’s asking where the problem is most pressing relative to other countries where it’s not so urgent. Since the question is a relative one, the data needs to be relative too. And the way you get relative data is to divide the estimates for each country by the same number – in our case, we divided the number of foreign fighters from each country by the sum of all Syrian foreign fighters globally. Doing this puts each number on the same footing, and produces a relative measure we can use to make informative comparisons. In our case, we called that measure the foreign fighter share.
However, note that neither the radicalization rate nor the foreign fighter share tell us all we need to know. For instance, consider the radicalization rate. Suppose that Latvia had one self-identifying Muslim, and that individual went to fight in Syria. Do we then want to say that Latvia is the epicenter of radicalization because 100% of its Sunni Muslim population radicalized? No, not at all. If another country had 10 million Muslims and only 0.01% of them radicalized, we would want to pay much more attention to that country than Latvia. By contrast, the foreign fighter share has the opposite problem. It gives us a good relative measure of where the most foreign fighters are coming from, but it doesn’t control for Muslim populations. Again, Saudi Arabia’s share is slightly higher than France’s, but since Saudi Arabia has far more Muslims, the share alone is misleading.
Both the radicalization rate and the foreign fighter share tell us something about what we want to know, but not all of it. To get the complete picture, we thus multiplied the two numbers together. That combined figure is what we called a foreign fighter score.
However, that foreign fighter score can be difficult to interpret on its own. Population statistics tend to be quite large, so these scores tend to be quite small. Is a score of 0.000018 bad? It’s hard to say by itself. So we standardized the number to make it easier to interpret. The resultant standardized foreign fighter score tells us how far a country’s foreign fighter score is above or below the mean. That gives us what we want: a measure of how acute radicalization is in some countries relative to others. The countries at the top of this score are the ones we especially wanted to pay attention to – they are the ones that have both high rates of radicalization and also contribute a large share of all foreign fighters globally. If we were policy-makers, these are the countries we’d want to focus on most.
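For concreteness, the whole transformation fits in a few lines of Python. The counts below are invented purely for illustration; the real inputs were the ICSR fighter estimates and the World Religion Database population figures:

```python
import statistics

# Hypothetical countries with made-up fighter counts and Muslim populations.
fighters = {"A": 2500, "B": 1200, "C": 400, "D": 50}
muslim_pop = {"A": 20_000_000, "B": 5_000_000, "C": 4_000_000, "D": 30_000_000}

total = sum(fighters.values())
score = {}
for c in fighters:
    rate = fighters[c] / muslim_pop[c]   # radicalization rate
    share = fighters[c] / total          # foreign fighter share
    score[c] = rate * share              # foreign fighter score

# Standardize: distance from the mean, in standard deviations.
mean = statistics.mean(score.values())
sd = statistics.stdev(score.values())
std_score = {c: (s - mean) / sd for c, s in score.items()}
```

In this toy example, country A ends up with the highest standardized score because it combines a high rate with a large share, while country D, despite its enormous Muslim population, falls below the mean on both components.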
I’ve released the full foreign fighter scores here. For a snapshot though, this is what the data look like:
[Table: countries ranked by average standardized foreign fighter score]
Table 1: Standardized Foreign Fighter Scores. Countries are listed left-to-right with the higher scoring country at left. The first number in each row is how many standard deviations above or below the mean the initial country in each row is. All countries above the mean are italicized. Countries in bold are Francophone, with asterisked countries (*) representing countries that officially speak French but are not former French colonies.
The first five countries are the ones you want to pay attention to. They’re way above everyone else.
In fact, just how far above those countries are becomes a lot clearer when you plot that standardized score against itself:
As you can see, the first five countries are clustered well above the remaining countries. And while the data for that chart come from 2011-2014, the events of 2015 and early 2016 seem to bear out their importance. Four of those five, or 80%, have suffered terror attacks in which thirty or more people were killed by Sunni extremists; less than 10% of the remaining 45 countries have. Although we didn’t design the data to predict attacks, the fact that those countries have all been attacked seems to confirm that our transformation is in fact picking up important information about where radicalization is most problematic.
Once we had that list of important countries, we wanted to know what they had in common that most other countries didn’t. In that regard, Jordan and Lebanon are easy to explain: they are both small Muslim countries that border Syria. In a sense it would be weird if they didn’t score highly.
But Tunisia, Belgium, and France are more difficult to understand. Obviously they are all Francophone countries. But that could be random; likewise, it could owe to something else they all share that others do not, such as political institutions or unemployment patterns. To determine what was producing the result, we had to analyze the data using some of the machine learning I discussed above.
More precisely, we passed all our data to something called a Bayesian Additive Regression Tree (BART).
That’s a mouthful. But the core idea for how the algorithm works should make sense to anyone who’s ever shuffled a deck of cards.
In our case, imagine if for every theory you thought might explain radicalization, you wrote down the name of a corresponding dataset on an index card. So for example, since some scholars have proposed poverty as a driver of Sunni militancy, you could write down “average income” on one card. Meanwhile, other scholars have argued that it’s not really the overall level of wealth that matters, but whether people are employed or not; youth with jobs, after all, may be less likely to radicalize than those without them. So for that theory, you could write down “youth unemployment” on an index card. Likewise, other scholars have framed radicalization as a problem of education, so you could write down “literacy rate” on another card. As you do this for more and more theories, pretty soon you’re left with a sizable deck of cards.
Now comes the fun part. What we want to do is see which cards – and more, which combinations of cards – best predict our foreign fighter score.
Suppose we shuffle the deck and pick up the card on top. If it’s the “youth unemployment” card, we go and get our unemployment data and see how well it predicts the foreign fighter score. If in general youth unemployment is high when the foreign fighter score is high, then the card helped us make a better prediction, so we put it in a pile to our left, which we’ll call the “keep” pile. If the card didn’t, we instead put it in a pile to our right, which we’ll call our “garbage” pile.
Let’s say the unemployment data does help us predict the foreign fighter score. So now we’ve got one card in our “keep” pile. We then take the next card from the deck. It says “literacy rate”, so we go and get our literacy data. This time though, we don’t just want to see if it helps to predict our foreign fighter score. Instead, we want to see if combining the unemployment data and literacy data helps us predict our foreign fighter score better than just using the unemployment data alone. (It’s this emphasis on combinations of cards that allows us to build up a clear understanding of interactions in the data.) In any case, if the unemployment data and literacy data do a better job than just unemployment alone, we put the literacy card in the “keep” pile. If the combination doesn’t work, the literacy card goes straight into the garbage pile.
We keep doing this until we have, say, five cards in the “keep” pile. At this point, we write down the order of the cards in the “keep” pile. Then we put all the cards back into one deck, shuffle like crazy, and go through the whole process again.
It turns out that if we do this enough times, we can start to average across all the “keep” piles we’ve written down. And as long as we’ve been recording what’s in those piles, we can get an incredibly granular understanding of our data – including which cards are best at predicting our data, whether or not certain cards usually appear in the “keep” deck with others (a strong interaction), and so forth.
In reality what the BART algorithm does is a bit different than what I’ve described, and altogether more complex. But it’s generally consistent with the core idea I’ve illustrated. When all is said and done, this method helps us paint a remarkably clear picture of what might be going on.
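As a rough sketch of the card-shuffling idea — emphatically not the actual BART algorithm, which fits sums of regression trees under Bayesian priors — here’s a toy Python version that tallies inclusion proportions over many shuffles. The `improves` function is a stand-in for the model-fitting step, and the variable names are just examples:

```python
import random

def inclusion_proportions(variables, improves, n_rounds=2000, pile_size=5):
    """Toy card-shuffling procedure. Each round: shuffle the 'deck' of
    variables, greedily keep each card that improves prediction given the
    cards already kept, then tally how often each variable was kept."""
    counts = {v: 0 for v in variables}
    for _ in range(n_rounds):
        deck = list(variables)
        random.shuffle(deck)
        kept = []
        for card in deck:
            if len(kept) == pile_size:
                break
            if improves(kept, card):
                kept.append(card)
                counts[card] += 1
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

# Stand-in scoring rule: two variables are genuinely predictive,
# the rest only help by chance.
strong = {"francophone", "youth_unemployment"}
props = inclusion_proportions(
    ["francophone", "youth_unemployment", "urbanization", "literacy", "gdp"],
    improves=lambda kept, card: card in strong or random.random() < 0.1,
)
```

Run this and the two genuinely predictive variables dominate the inclusion proportions, while the chance-only variables hover near the noise floor — which is the same logic behind the inclusion-proportion graph discussed below.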
In our case, I fed the BART algorithm data on about 40 variables that scholars have theorized might have something to do with radicalization. I’ve included a list of all the data we used in the footnotes below. (In reality, we looked at even more variables than this, but for technical reasons I pared it down. I’ll have more on this and other details, such as model fit and robustness checks, in a more technical follow-up post.)
Most of the data we used, such as average income, urbanization rates, and literacy rates, are fairly self-explanatory. But there are also a few others that I need to explain here, because they bear on our Francophone finding.
In particular, we included data on political institutions and culture. The data we used for political institutions included what you might expect, such as how democratic a country is, or how well it protects civil liberties and political rights. However, because we are interested in a form of religious extremism, we also included data on the degree to which the state imposed secularism on its populace, and the degree to which it discriminated against Muslims. (We took data for each from the well-known Religion and State and Religion and State-Minorities datasets of Jonathan Fox.)
In addition, we included several variables designed to capture any historical or cultural effects shared widely across countries. In particular, we included whether a country currently speaks, or previously spoke, French (Francophone), English (Anglophone), and/or Arabic (Arabophone) as an official language. Further, we also included a marker for countries that were formerly Soviet Socialist Republics.
Once BART had finished, the first thing we wanted to know was which variables seemed to do a good job predicting our foreign fighter scores. Put differently, if you think back to the “keep” piles I mentioned above, what we wanted to know was this: if we picked a card from one of those “keep” piles at random, how likely would we be to pick a given variable’s card? If all the approximately forty variables were equally predictive, each variable would show up a little less than 3% of the time. So what we want are variables that are above 3% (0.03) in the “inclusion proportion” referenced in the graph. That’ll tell us which of the variables are best at predicting our foreign fighter scores.
As you can see, there are maybe a half dozen variables above that number. But only one is well above it (Francophone), and only two more stand out at the next tier (urbanization and youth unemployment). These are the three variables doing the best job of predicting our foreign fighter score.
That said, we don’t just want to know which variables are good at predicting foreign fighter scores. We also want to know how strong the relationship is between those variables and our foreign fighter score. To get that, we can use what’s called a partial dependence (PD) plot. In our case, the PD plot below shows how many standard deviations above or below the mean a country would be, on average, if it were Francophone rather than not. The plot shows that the Francophone variable seems to drive the standardized score up by more than half a standard deviation, which here is a large effect.
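For a binary variable like Francophone, partial dependence boils down to a simple counterfactual average: force the variable on for every country, force it off for every country, and compare the model’s mean predictions. Here is a rough sketch of that computation, with a hypothetical toy prediction function standing in for the fitted BART model:

```python
def partial_dependence_binary(predict, rows, feature):
    """Average prediction with `feature` forced on vs. forced off
    for every row, holding all other features at their observed values."""
    def avg(value):
        preds = [predict({**row, feature: value}) for row in rows]
        return sum(preds) / len(preds)
    return avg(1) - avg(0)

# Toy stand-in for a fitted model predicting the standardized score;
# the coefficients here are invented for illustration only.
def toy_model(row):
    return 0.6 * row["francophone"] + 0.2 * row["urbanization"]

countries = [
    {"francophone": 1, "urbanization": 0.7},
    {"francophone": 0, "urbanization": 0.4},
    {"francophone": 0, "urbanization": 0.9},
]

effect = partial_dependence_binary(toy_model, countries, "francophone")
```

Because the toy model is linear, `effect` works out to its Francophone coefficient (0.6 standard deviations); with a real BART model, the same averaging reveals whatever nonlinear shift the trees have learned.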
The million-dollar question, of course, is whether we can trust that result. Empirically it doesn’t seem to be random, nor so far does it seem to be better explained by something else. Yet to fully believe it, we still need to have some theoretical story, preferably one grounded in additional evidence, for why the relationship might be so strong.
Consider again Figure 2 above, which is the graph that weighs the variables’ relative importance. When I first saw it, I immediately thought of Ferguson or Baltimore. Each of those cities has high rates of youth unemployment overall, driven primarily by disproportionately high levels of unemployment among African-American youth. The chart suggests that something similar may be going on in Europe and North Africa, but with a religious/secular divide overlaid on top of any racial one.
In the future I’ll explain more fully what I mean by that. What I want to focus on here is just the secular part, and how it relates to what Will and I called “French political culture” in our Foreign Affairs article.
As we said in the article, we don’t believe speaking French has any real effect, and certainly not a negative one. “Francophone” is a proxy for something else. We think that something else is French political culture, specifically the French approach to secularism or laïcité.
Recall that we included a “forced secularism” variable in our data. That variable codes China and Turkmenistan, among others, as having “forced secularism.” Since China and Turkmenistan score far below Francophone countries in our standardized foreign fighter score, we think the Francophone coding is picking up on whatever is unique to the political discourse surrounding secularism within France, and not present in the discourse around secularism in countries like China or Turkmenistan.
Of course, the political discourse around laïcité has been relatively constant in France for quite some time, as it has in the post-colonial Francophone countries of North Africa. Yet based on qualitative research, we don’t think foreign fighters were as much of a problem in Francophone countries prior to 2010 or so as they were in the 2011-2014 period covered by our study. How then can a constant variable (Francophone) suddenly produce such a major change in behavior? For the political culture theory to be compelling, there needs to be some story that explains this abrupt change. We believe the story has to do with a change in the discourse around laïcité in some French-speaking countries prior to the onset of the Syrian civil war.
Both France and Belgium had public national debates in 2010 about whether to ban face veils, which are commonly worn by Muslim women in particular. France passed a bill outlawing the niqab and certain burqas in September 2010, and the bill began to be enforced in April 2011. (In the same month, the Assad regime began large-scale attacks on civilian protestors in Syria, leading soon thereafter to outright civil war.) Meanwhile, the lower house of Belgium’s parliament passed a similar bill in April 2010. It began to be enforced in July 2011.
In Tunisia, the opposite experience produced the same effect. In 1981, Tunisia’s secular regime passed a law banning headscarves from being worn in government offices and schools. The law remained in place over the next 25 years, but was unevenly enforced. Then in 2006, the Ben Ali regime began enforcing the law more consistently – as the BBC put it, the regime “launched a campaign against the Islamic veil.” When the demonstrations and protests that started in late 2010 led to Ben Ali’s ouster in January 2011, the ban was effectively overturned. But in the political discourse leading up to the elections of October 2011, secularism – and the legacy of the veil – was still very much a part of the national discourse. (Nor has it fully gone away since; in February 2014, the Tunisian Interior Ministry “vowed to enforce stricter controls” on the niqab.)
So among the countries that do not border Syria, the three that had the highest foreign fighter score were all Francophone, and all had rancorous national debates about legislation that would enforce secular norms in the immediate period prior to the onset of the Syrian civil war. If it’s not clear, what we think matters most here is not the bill or law itself, but the political discourse around it – particularly the way it heightens, for a time, popular conceptions that it is not possible to have both a western or secular nationalist identity, and also a Muslim one.
Why would this lead to more foreign fighters? In short, such a discourse helps jihadist recruiters who want to sign up Muslims who believe they don’t belong. Note that in jihadist texts, a core strategic goal of carrying out attacks in the west is to force western societies to over-react such that they start discriminating against their Muslim populations. In that sense, the ultimate goal of the violence is not so much to kill as it is to force the Muslims within those populations to make a black and white choice between identifying as Muslim and identifying as western. ISIS calls this strategy eliminating the “grayzone.”
When a state seeks to ban the niqab or hijab, it forces that same choice. Debates over such a ban are thus debates, in a sense, about whether Muslims who cover their heads in adherence to their faith can be both pious and western at the same time. For jihadist recruiters in the west, those debates play right into their hands. Indeed, as one French sociologist has noted about French foreign fighters and France’s 2010 veil ban: “Those who have left to go and fight in Syria say that this law is one of the things that encouraged them. They saw it as a law against Islam. It had the effect of sending a message that Islam was not welcome in France.”
Admittedly, one downside of our research design is that the foreign fighter data only covers the period after these national debates took place. To demonstrate our theory more conclusively, we’d want similar data for how intense radicalization was in those countries prior to those debates; if our theory is right, we’d expect radicalization to be less of an issue in the prior period.
We don’t have that “before” data for Tunisia, France, and Belgium. However, we do have some detailed data on foreign fighter radicalization in Quebec. About halfway through the Syrian civil war, in September 2013, the Parti Quebecois introduced a bill in Quebec’s parliament called the “Charter of Values.” Although the bill was ultimately defeated in the spring of 2014, the legislation would have imposed strict regulations on Islamic headscarves in public places; more importantly, it set off a broad debate within Quebec about secular and Islamic identities.
Fortunately, the data we have on Quebec’s foreign fighters can be disaggregated by time period. It was compiled by Amarnath Amarasingam, who is a Fellow at the George Washington University’s Program on Extremism, and who also co-directs a project on Canadian foreign fighters at the University of Waterloo. As part of that project, Amarasingam has confirmed sixteen cases in which an individual traveled from Quebec to Syria. Only two of those cases came before September 2013. The remaining fourteen all came after. When I asked him about that disparity, Amarasingam confirmed that in his interviews with the friends and family of several foreign fighters who have left, the bill appeared to have been a “big catalyst” for recruitment. Indeed, as Amarasingam put it, those who left “had experienced a general sense of discrimination and racism in the Quebec context, but the Charter was kind of the straw that broke the camel’s back.”
The Quebec evidence is too small to be conclusive in itself. But it is certainly consistent with what we would expect to see if our theory about the political discourse surrounding these bills is correct.
Before I move on there are several further points I should make. First, hopefully it is clear at this point that our argument is not that Muslims who speak French are more prone to radicalization. As we noted in our Foreign Affairs article, we think language serves only as a proxy, not as any kind of causal factor in itself. Consider Belgium. As our critics have pointed out, many of the Belgian Muslims who have traveled to Syria speak Dutch, not French. That is undeniably true. However, it’s also beside the point here. Even if those individuals spoke Dutch, they were still subject to a national political debate about secularism and Islam. The bill at the heart of the debate was to be enforced nationwide, not just in French-speaking communities.
Second, the mechanism underlying our argument is generalizable. Put differently, if rancorous public debates about veils are at issue here, then we should expect them to play into jihadist recruiters’ hands in places like the United States or the United Kingdom too, if either country ever introduced legislation to ban Muslim head coverings nationwide. However, we expect the effect would likely be less pronounced than in French-speaking countries, given both the historical importance of laïcité in those countries and the legacy of prior (sometimes violent) conflicts over it.
Finally, I would also stress that the theory we’ve developed here is not monocausal. Will and I alluded to this in our Foreign Affairs piece when we mentioned our findings about urbanization and youth unemployment, and how they appear to exacerbate the problem. Yet it seems like there is a lot of uncertainty around this point, so I want to make sure I clear it up. Think again of lung cancer. Hundreds of thousands of people have suffered from it without ever having smoked. Yet the fact that lung cancer can develop for separate genetic or environmental reasons does not disprove that smoking is a key risk factor for lung cancer.
Will and I believe something similar is at play with Sunni violent extremism. Consider Pakistan. Sunni militants there have carried out no shortage of attacks, and the carnage they have left in their wake is horrific in scope. Yet despite the scale of the problem there, laïcité and veil ban debates obviously aren’t a main part of the story. Whatever is going on there, it must be something else. Yet that does not mean the case of Pakistan somehow disproves that laïcité and veil ban debates may be at issue elsewhere, any more than the existence of many other kinds of carcinogens disproves that cigarette tar is one of them.
Where radicalization is concerned, there are many possible causes. But the combination of laïcité and veil ban debates appears to be, at present, among the most important.
As I noted at the outset, Will and I did not embark on this study to prove any one theory conclusively. Again, the goal was to use machine learning to improve our theories about what might be going on. Once we had our initial results, we could then investigate the main findings qualitatively to confirm that they seemed plausible, and design follow-up studies accordingly.
At this point we think there is enough quantitative and qualitative evidence to take the “laïcité plus veil ban debate” theory seriously. As a result our next step will be to code an additional variable for all fifty countries. The variable will capture whether a “veil ban” bill was introduced in a given country’s national legislature in the years immediately prior to the onset of the Syrian civil war, or whether a bill to repeal such a law was introduced. (Again, we don’t think the bill itself is at issue so much as the debate around it; empirically, though, the introduction of such a bill serves as the best proxy we can think of for the presence of a significant national debate.)
Once we have that data, we will then re-run the analysis I described above. If only a few countries outside Tunisia, France, and Belgium also had major national debates around the veil ban in the period prior to spring 2011, then the results for the veil ban variable will likely be stronger than those for the Francophone one. However, if several other countries have also debated such bills, then it’s likely that the interaction between the Francophone and veil ban variables will be the most significant. Further research should focus on what exactly it is about the discussions and debates over these bills that is doing the damage.
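Mechanically, testing that interaction just means adding a product column to the dataset before re-running the model. Here is a sketch with hypothetical column names (BART can detect interactions on its own, but an explicit product term makes the comparison against the main effects easy to read off):

```python
def add_interaction(rows):
    """Return copies of each row with a Francophone x veil-ban
    interaction column appended; the column names are illustrative,
    not our actual dataset's schema."""
    out = []
    for row in rows:
        new = dict(row)  # copy so the original rows are untouched
        new["francophone_x_veil_ban"] = row["francophone"] * row["veil_ban_debate"]
        out.append(new)
    return out

# Toy rows: the interaction is 1 only where both indicators are 1.
countries = [
    {"name": "Tunisia", "francophone": 1, "veil_ban_debate": 1},
    {"name": "Canada",  "francophone": 1, "veil_ban_debate": 0},
    {"name": "Jordan",  "francophone": 0, "veil_ban_debate": 0},
]

augmented = add_interaction(countries)
```

If the interaction column then posts a higher inclusion proportion than either indicator alone, that would be evidence the combination, not Francophone status by itself, is doing the work.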
Finally, the other data collection effort we need, which again I’ll discuss at length later, is the construction of an individual-level dataset with truly global coverage. We cannot make significant and meaningful progress on the radicalization issue without one. For instance, our study also flagged urbanization and youth unemployment as major factors, which is why I referenced Ferguson and Baltimore above. An individual-level dataset would help confirm whether this really was a story about religious extremism in urban enclaves of unemployed youth – or whether it was actually about, say, rural believers getting left behind in rapidly-urbanizing countries. Further, individual-level data would also allow us to test co-ethnic networks as a potential cause. A lot of readers have suggested, publicly and privately, that the real issue is Moroccan networks in France and Belgium. I’m a bit skeptical of this; Spain too has a massive Moroccan population, but it has produced remarkably few foreign fighters. Yet without individual-level data, it’s impossible to rule out alternative stories.
If I may, I’d like to close with one last point. The decision to publish our findings early, in the aftermath of the Brussels attacks, was not an easy one to make.
In particular, I was worried it might seem insensitive to those still grieving. Prior to my doctoral work I was a seminarian, and I am profoundly aware of the many pained intimacies of grief – particularly that strange and beautiful and haunting way it burrows ever downward within our souls, much too far for words to reach but shallow enough, mercifully, that touch and presence can yet follow. Over the past month I would have been more comfortable providing that touch, in silence and in tears, than composing this article.
Yet I also believe policy matters, and especially empirically-informed policy. That is, I believe we owe it to the future victims of these attacks, and particularly the families that will otherwise survive them, to study this issue as best we can, and then set about implementing the policies that are most likely to ensure as few families as possible suffer from these attacks again.
That is why I agreed to publish our earlier article in Foreign Affairs, and this one now. In the coming months, France and Belgium – and no doubt many other countries too – will begin having difficult and painful debates about what to do going forward.
The question to my mind was whether our study was relevant to those debates. What I kept returning to was a set of five facts. We used data on Syrian foreign fighters from 2011-2014 to create a measure of radicalization in fifty countries. Among the countries that do not border Syria, the three highest-scoring were Tunisia, Belgium, and France. Each of them is Francophone. Each of them had a painful public debate about Islamic headdress in the year before the Syrian civil war began. And in the post-2014 period, each has suffered a major, mass-casualty terror attack carried out by Sunni extremists.
I cannot say with full certainty that those facts are all causally related. But I can say with certainty that if I were a citizen of one of those countries, I would want to know about that information, and begin discussing what it might mean.
1. As Gilles Kepel has noted in response to our article, the term “radicalization” is a problematic one. We certainly agree that the term is far from perfect, since it implies both a passive process (i.e., something that happens to people, rather than something people choose), and also a homogeneous one (i.e., something that happens in the same way for everyone, rather than different ways for different individuals). Yet we still need some way of referring to the general phenomenon in which an individual sheds mainstream Islamic beliefs in favor of violent extremist ones. However inelegant it may be, radicalization is already widely used in that sense, so we use it here too.
4. Two biases are important here. The first is selection bias. ICSR produced foreign fighter estimates for fifty countries, and those countries were selected based on data availability rather than at random. If countries outside the sample were included in our study, it is likely some of our findings would change. However, we likely would still see interesting findings regarding Francophone countries. Since we think the issue is particularly acute in Francophone countries that had national debates about banning the veil in the period immediately prior to the Syrian civil war, an expanded dataset would likely yield strong interactions between the Francophone variable and the characteristics of the countries that held such debates. Put differently, a global dataset would likely just shift how the Francophone variable registered as important, not whether it was important.
The second potential bias is reporting bias. Journalists and reporters are not evenly distributed throughout the countries in the dataset, so it is possible that foreign fighters are being under-reported from some countries and over-reported from others. However, while there are fewer journalists in North Africa and the Middle East than Europe, the French and Arabic media nonetheless provide good coverage of those regions – and given the profile of the Syrian conflict, they also have strong incentive to report on contingents of expatriate fighters. Further, many of the Syrian foreign fighter militias and brigades have placed a heavy emphasis on propaganda, and actively seek to demonstrate the extent of their recruitment abroad in order to facilitate further recruitment. As a result, it’s hard to think of a civil war in which we have had greater or more fine-grained real-time knowledge of foreign fighter contingents. Again, this does not mean that the numbers are perfect, but it’s unlikely that they are radically off – or at least not so far off that our strongest inferences would be grossly mistaken.
5. Bayesian Additive Regression Trees were first proposed by Hugh A. Chipman, Edward I. George, and Robert E. McCulloch, “BART: Bayesian Additive Regression Trees”, The Annals of Applied Statistics (2010), 4(1): 266–298.
7. Francophone is difficult to operationalize. If colonial heritage is the important measure, then Francophone should be restricted only to France and its former colonies (here, Algeria, Morocco, Tunisia and Lebanon). If French culture is the more important measure, then the Francophone variable should also include Canada, Switzerland, and Belgium, which have significant French-speaking populations and list French as an official language.
Significantly, the results reported above define Francophone in terms of French being an official language. In other words, in the results reported here, Francophone includes France and its former colonies as well as Canada, Switzerland, and Belgium. However, for reasons I’ll explain in a technical post, even if we had used the results from models that used the more limited definition of Francophone, the results wouldn’t vary too much.
Two weeks ago today, I woke up to tragic news from Brussels. As with many others in the United States, I started the day in stunned disbelief, reading everything I could about the attacks – looking in horror at images of carnage, hanging on the words of survivors, praying for the families of those lost.
After a while, though, I closed my browser and returned to two open files on my computer. In particular, I looked again at these two charts:
I’ve been staring at those charts, and trying to make sense of what they mean, since last October. The first plots a Syrian foreign fighter measure against itself for fifty countries. The second chart shows which variables, out of the forty or so it was given, a machine learning algorithm thinks are best at predicting the dots in the first chart. As you can see, the algorithm flagged the Francophone variable — whether a country currently or previously speaks French as an official language — as far and away the most important.
When I first created these charts in October, they seemed significant and interesting, but only in an academic sense, not a practical one. Sure, Tunisia had been struck the prior spring in the Sousse attacks, but you could explain that away as a one-off event. And in any case, the plot was created to measure foreign fighter out-flows, not actual terror attacks.
A couple weeks later though, in mid-November, Beirut was hit. Then a day later, Paris.
At that point I began looking at both charts more intently. The raw data for the first one come from the best foreign fighter estimates we have for 2011-2014. The transformed measure we’d created was clearly picking up something important about the way Sunni militant networks, particularly Francophone ones, were behaving post-2014.
The more I looked at the data the more I began to idly worry about Belgium. No matter how I ran the numbers (and no matter how I tried to discount the biases involved in them), Belgium always stood out as the country outside the Syrian theater most likely to be hit next. Of course, it didn’t take a genius to come to that conclusion: the network that carried out the Paris attacks had roots in Belgium, so I was by no means alone in flagging the country as being at-risk.
What concerned me though was that the algorithm didn’t know anything about the Paris attacks. Nor did it know anything about the specific networks that planned it. And yet it was basically saying the exact same thing.
In the coming weeks I’ll be spending a lot of time explaining those graphs. The obvious story they tell is about Francophone foreign fighter networks, but in my view they’re also a story about machine learning – and how the new tools at our disposal both can and cannot help us with important social and political questions.
Before I get to that though, I want to stress something else. When Will McCants and I first started our research, our starting point had nothing to do with a “French Connection.”
Instead, as I mentioned to Vox last week, our concern was something much broader: fifteen years after the Pentagon and WTC attacks, we still didn’t have a great understanding of what was going on. For all the policies we had crafted and all the resources we had expended, the problem of Sunni militancy had not gotten any better – and if anything, appeared to have only grown worse.
In the interim, a lot of scholars much smarter and more knowledgeable than ourselves had tried to explain what exactly was happening. Rather than tossing out yet another explanation, we thought, why not craft a study whose design itself could prompt a rethink on the question?
Accordingly, we set about thinking through what kind of design to use, especially given recent advances in machine learning. By way of background, social scientists have known for a while that machine learning algorithms can predict political outcomes better than even our best quantitative models, which suggests that they are better at picking up the latent structure underlying complex social processes. However, those algorithms were themselves generally too complex for us to understand. Yet in the last few years that’s begun to change. For a certain class of algorithm in particular, we now often can peek inside the “black box” – and as a result, we can use the structure the algorithms uncover to theorize anew about otherwise intractable problems.
In our case, we decided to use one type of algorithm, called a Bayesian Additive Regression Tree, to help us gain new purchase over questions about Sunni militancy. As I noted above, a lot of very talented social scientists have been thinking about violent Sunni extremism for a while now. Why not gather as much data as we could for the explanations they came up with, and then use machine learning to flag what it thought was most important and why? If we then coupled those results with in-depth qualitative work, we might be able to make new headway on what was happening.
I’ll walk through both what we did and what we found more clearly in a deep dive to be published next. For now, though, the main point is simply this: everything about our project was designed to start a conversation, rather than end one.
It just turned out that an important part of that conversation, at least at this point, should concern what’s going on in Francophone countries – and particularly in Tunisia, France, and Belgium.