A Practitioner’s Guide To Interrupted Time Series

In the world of causal inference, Randomized Controlled Trials (RCTs) are considered the gold standard because randomization rules out any covariate differences before the intervention. However, running an RCT often isn’t an option, for multiple reasons (e.g., too expensive, invalid assumptions, too long, not ethical).


Under these circumstances, the Interrupted Time Series (ITS) design comes in handy (see Netflix for an industry example). As a quasi-experimental method, ITS carries strong inferential power and has wide applications in epidemiology, medication research, and program evaluation in general.

Arguably, ITS is the strongest quasi-experimental method in causal inference (Penfold and Zhang, 2013).

In this post, we will learn the basics of the method and how to apply it in real life.

What is an ITS?

As a quasi-experimental design, ITS is an analysis of a single time series observed before and after an intervention (Bernal et al., 2017). From the perspective of research design, ITS builds upon a rather straightforward idea: the outcome variable would not have been altered if there were no intervention.

However, the tricky parts are:

How can we derive causal claims from a single time series?

How can we eliminate confounders?

In other words, it is crucial to construct a “counterfactual” that serves as the baseline; we can then attribute the “altered” trajectory to the presence of the intervention.

Fortunately, as the name suggests, ITS has a time component that allows us to estimate what the outcome variable would have been had the intervention not occurred.
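The counterfactual idea can be sketched in a few lines of R: fit a trend on the pre-intervention data only, then project it forward as the baseline. All numbers below are simulated and hypothetical, so this is a minimal sketch rather than a full analysis.

```r
# Sketch of the counterfactual idea: a pre-intervention trend projected forward
# serves as the baseline (simulated data; intervention occurs at t = 51).
set.seed(7)
t <- 1:100
true_effect <- 2                                   # level shift after t = 50
y <- 5 + 0.1 * t + ifelse(t > 50, true_effect, 0) + rnorm(100, sd = 0.5)

pre <- data.frame(t = t[1:50], y = y[1:50])        # pre-intervention data only
trend <- lm(y ~ t, data = pre)
counterfactual <- predict(trend, newdata = data.frame(t = 51:100))

# Average gap between observed and counterfactual outcomes in the post period
estimated_effect <- mean(y[51:100] - counterfactual)
estimated_effect                                   # should be close to 2
```

The key design choice is that the trend model never sees post-intervention data, so the post-period gap can be read as the intervention's effect under the no-intervention-trend assumption.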

Besides, if there are multiple data entries, we can examine whether the outcome variable returns to the baseline after the treatment condition is removed (see Netflix for examples).

Furthermore, we must control for time-varying confounders, including seasonal trends and concurrent events that may interfere with the results.

For example, researchers have challenged the prior finding that the 2008 Great Recession led to more suicides in the U.S., arguing that the previous studies failed to account for seasonality and social groupings (Harper and Bruckner).
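One common way to adjust for seasonality in an ITS regression is to add harmonic (sine/cosine) terms for the seasonal period. The sketch below uses simulated monthly data with hypothetical effect sizes; it is an illustration of the technique, not a replication of any study.

```r
# Hedged sketch: adjusting for seasonality with harmonic terms
# (simulated monthly data; policy starts in month 49).
set.seed(11)
month <- 1:96                                  # 8 years of monthly observations
intervention <- as.numeric(month > 48)
seasonal <- 3 * sin(2 * pi * month / 12)       # true seasonal pattern
y <- 20 + 0.05 * month + 2 * intervention + seasonal + rnorm(96)

# Model the seasonal cycle with a sine/cosine pair at a 12-month period
adjusted <- lm(y ~ month + intervention +
                 sin(2 * pi * month / 12) + cos(2 * pi * month / 12))
coef(adjusted)[["intervention"]]               # close to the true effect of 2
```

Including both sine and cosine terms lets the regression estimate the phase of the seasonal cycle rather than assuming it.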


Strengths and Limitations of ITS

Penfold and Zhang (2013) have provided a complete list of the strengths and limitations, and I’ll summarize the key points below.


Strengths:

  1. To control for long-term time trends in the data. ITS covers an extended study period, which better captures underlying trends in the data.
  2. To account for individual-level bias and to evaluate the outcome variable at the population level. Aggregating to the population level avoids biases that individual-level data may introduce. Honestly, this is both a blessing and a curse; we will elaborate on the downside under the limitations.
  3. To evaluate both intended and unintended consequences of interventions. We can easily extend the analysis to incorporate more outcome variables with minimal or no adaptation.
  4. To conduct stratified analyses of subpopulations and to derive different causal effects. This is critical. We can divide the total population into sub-groups according to various criteria and examine how each sub-group behaves differently. Social groups differ, and lumping them together may dilute or hide critical information, as positive and negative effects mix and cancel out (see Harper and Bruckner for examples).
  5. To provide clear and interpretable visual results. Visual inspections are always welcome and should be taken seriously (see my other post for more explanations).


Limitations:

  1. Multiple rounds of data entries. Penfold and Zhang (2013) recommend a minimum of 8 periods before and 8 after an intervention to evaluate the changes, i.e., 16 data entries in total, which may not always be available. I think they are being cautious about the number of data entries: it’s still possible to apply ITS with fewer rounds, though the causal claims will not be as robust as with multiple rounds.
  2. Time lag. It takes some unknown amount of time for a program to achieve its intended results, which makes it difficult to pinpoint the causal effects of several events that coincide. Let’s say the transportation department in the U.S. adopts three policies within a two-year timespan to curb highway speeding. Playing God, we somehow know it would take one year for Policy A to have any effect, 1.5 years for Policy B, and three years for Policy C. With the effects intertwined like this, it becomes impossible to separate them using ITS.
  3. Inference level. ITS uses population-level data, so we cannot make inferences about individuals.


How Does ITS Work?

ITS uses segmented regression to estimate the effects of the intervention. The analysis requires two segments: one before the intervention and one after. Each segment has its own slope and intercept, and we compare the two segments to derive the effects.

We attribute any change in direction (e.g., from a positive to a negative slope) and/or in magnitude (e.g., from large to small effects) between these two segments to the intervention.

This is how ITS overcomes the limitation of having only a single case while retaining strong inferential power.
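The two-segment idea above can be written as a single regression with a time trend, a level-change indicator, and a time-since-intervention term. The sketch below uses simulated data with hypothetical effect sizes (a level change of 5 and a slope change of 0.3 at t = 51).

```r
# A minimal segmented-regression sketch (simulated, hypothetical data).
set.seed(42)
t <- 1:100
intervention <- as.numeric(t > 50)   # level-change indicator (0 pre, 1 post)
t_since <- pmax(0, t - 50)           # time elapsed since the intervention
y <- 10 + 0.2 * t + 5 * intervention + 0.3 * t_since + rnorm(100)

seg_fit <- lm(y ~ t + intervention + t_since)
coef(seg_fit)
# `intervention` estimates the immediate level change (about 5);
# `t_since` estimates the change in slope between the segments (about 0.3).
```

Writing both segments as one model, rather than fitting two separate regressions, makes it easy to test the level and slope changes directly via the coefficients on `intervention` and `t_since`.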

Below is an example of an ITS analysis: a simple illustration of how to test for significant intervention effects using simulated data.

#1 Simulated data

# data preparation
CaseID = rep(1:100, 6)

# some intervention: 0 for the first 300 observations, 1 for the last 300
Intervention = c(rep(0, 300), rep(1, 300))
Outcome_Variable = c(rnorm(300), abs(rnorm(300) * 4))
mydata = data.frame(CaseID, Intervention, Outcome_Variable)

# construct a simple OLS model
model = lm(Outcome_Variable ~ Intervention, data = mydata)
summary(model)
Call:
lm(formula = Outcome_Variable ~ Intervention, data = mydata)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.3050 -1.2315 -0.1734  0.8691 11.9185 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.03358    0.11021   0.305    0.761    
Intervention  3.28903    0.15586  21.103   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.909 on 598 degrees of freedom
Multiple R-squared:  0.4268, Adjusted R-squared:  0.4259 
F-statistic: 445.3 on 1 and 598 DF,  p-value: < 2.2e-16

As can be seen, the coefficient on the intervention variable is large and statistically significant.
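A quick sanity check on this result (a sketch with freshly simulated data): in a regression with a single binary regressor and no time trend, the intervention coefficient is simply the post-minus-pre difference in mean outcomes.

```r
# With a binary regressor and no trend, the OLS coefficient equals the
# difference in group means (simulated data, same setup as above).
set.seed(123)
Intervention <- c(rep(0, 300), rep(1, 300))
Outcome_Variable <- c(rnorm(300), abs(rnorm(300) * 4))

fit <- lm(Outcome_Variable ~ Intervention)
diff_in_means <- mean(Outcome_Variable[Intervention == 1]) -
  mean(Outcome_Variable[Intervention == 0])

all.equal(coef(fit)[["Intervention"]], diff_in_means)  # TRUE
```

This is why the estimate above (about 3.3) matches the gap we built into the simulation: the post-intervention draws `abs(rnorm(300) * 4)` have a much higher mean than the standard-normal pre-intervention draws.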


This has been a quick introduction to ITS using simulated data. ITS can do much more in causal inference, and I’ll elaborate in a follow-up post soon.

Originally Posted Here

Leihua Ye

Leihua is a Ph.D. Candidate in Political Science with a Master’s degree in Statistics at UC Santa Barbara. As a Data Scientist, Leihua has six years of research and professional experience in Quantitative UX Research, Machine Learning, Experimentation, and Causal Inference. His research interests include:

1. Field Experiments, Research Design, Missing Data, Measurement Validity, Sampling, and Panel Data
2. Quasi-Experimental Methods: Instrumental Variables, Regression Discontinuity Design, Interrupted Time-Series, Pre-and-Post-Test Design, Difference-in-Differences, and Synthetic Control
3. Observational Methods: Matching, Propensity Score Stratification, and Regression Adjustment
4. Causal Graphical Models, User Engagement, Optimization, and Data Visualization
5. Python, R, and SQL

Connect here:
1. http://www.linkedin.com/in/leihuaye
2. https://twitter.com/leihua_ye
3. https://medium.com/@leihua_ye