Seasonality in the DAX

Seasonalcharts.de claims that stock indices exhibit persistent seasonality that may be exploited through an appropriate trading strategy. As part of a job application, I had to replicate the seasonal pattern for the DAX and then test whether this pattern entails a profitable trading strategy. To sum up, I indeed find that a trading strategy that holds the index only over a specific season outperforms the market significantly, but these results might be driven by a few outliers. The text below reflects my opinion and is for information purposes only. I do not intend to provide any investment advice.

The code is structured in a way that allows for a straightforward replication of the methodology for other indices. The document was knitted in RStudio using the following packages:

library(tidyverse)  # for grammar
library(scales)     # for pretty breaks in figures
library(tidyquant)  # for data download
library(knitr)      # for html knitting
library(kableExtra) # for nicer tables

Data Preparation

First, I download data from Yahoo Finance using the tidyquant package. Note that the DAX was officially launched in July 1988, so this is where our sample starts.

dax_raw <- tq_get("^GDAXI", get = "stock.prices", 
                  from = "1988-07-01", to = "2019-11-30") 

Then, select only date and the adjusted price (i.e., closing price after adjustments for all applicable splits and dividend distributions) as the relevant variables and compute summary statistics to check for missing or weird values. The results are virtually the same if I use unadjusted closing prices.

dax <- dax_raw %>%
  select(date, price = adjusted)
summary(dax)
##       date                price      
##  Min.   :1988-07-01   Min.   : 1150  
##  1st Qu.:1996-04-04   1st Qu.: 2512  
##  Median :2004-01-08   Median : 5286  
##  Mean   :2004-02-06   Mean   : 5657  
##  3rd Qu.:2011-12-03   3rd Qu.: 7445  
##  Max.   :2019-11-29   Max.   :13560  
##                       NA's   :160

The summary shows 160 missing prices, which I replace with the last available index value.

dax <- dax %>%
  arrange(date) %>%
  fill(price, .direction = "down")
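
As a small illustration of the forward fill, the following sketch applies fill() to a made-up price series (the toy data frame is hypothetical and not part of the DAX sample). Note that a missing value at the very start of a series would stay missing, because there is no earlier value to carry forward.

```r
library(tidyr)

# hypothetical toy series with gaps (not actual DAX data)
toy <- data.frame(
  date  = as.Date("2020-01-01") + 0:4,
  price = c(100, NA, NA, 103, NA)
)

# carry the last observed price forward into the gaps
toy_filled <- fill(toy, price, .direction = "down")
toy_filled$price
```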

As a last immediate plausibility check, I plot the DAX over the whole sample period.

dax %>%
  ggplot(aes(x = date, y = price)) +
  geom_line() + 
  labs(x = "", y = "Adjusted Price") +
  scale_x_date(expand = c(0, 0), breaks = "5 years") +
  scale_y_continuous(breaks = pretty_breaks()) + 
  theme_classic()

The data seems to be good to go, so let us turn to some graphical evidence for seasonality.

Graphical Evidence for Seasonality

To replicate the average seasonal pattern shown on the Seasonalcharts website, I construct an index of cumulative returns that starts at 100 in each year of my sample. To do so, I nest the data by year (i.e., create a list-column that holds the daily data of each year in the sample).

dax_nested <- dax %>%
  filter(date >= "1989-01-01" & date <= "2018-12-31") %>% # full calendar years only (data start in July 1988)
  mutate(year = year(date)) %>%
  group_by(year) %>%
  nest()

Next, I define the function that creates the seasonal pattern, given the sample of returns of a specific year, apply the function to each year and compute summary statistics for each trading day across all years in my sample.

get_seasonality <- function(data) {
  data %>%
    arrange(date) %>%
    mutate(trading_day = 1:n(), # use number of trading days as index
           ret = (log(price) - lag(log(price)))*100,
           ret = if_else(is.na(ret), 0, ret),
           cum_ret = 100 + cumsum(ret)) %>%
    select(trading_day, ret, cum_ret)
}

dax_seasonality <- dax_nested %>%
  mutate(seasonality = map(data, get_seasonality)) %>%
  select(year, seasonality) %>%
  unnest(cols = c(seasonality)) %>%
  ungroup()

dax_seasonality_summary <- dax_seasonality %>%
  group_by(trading_day) %>%
  summarize(mean = mean(cum_ret),
            q05 = quantile(cum_ret, 0.05),
            q95 = quantile(cum_ret, 0.95))
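
To make the construction concrete, here is a minimal sketch with four made-up prices (hypothetical numbers, base R only). Because the index sums log returns, its final value is 100 plus 100 times the log of the total price ratio, which is slightly below what compounding simple returns would give.

```r
# hypothetical prices within one year
price <- c(100, 102, 101, 105)

# daily log returns in percent; the first day has no return and is set to zero
ret <- c(0, diff(log(price)) * 100)

# index of cumulative log returns that starts at 100
cum_ret <- 100 + cumsum(ret)

# the last value equals 100 + 100 * log(105 / 100), i.e. roughly 104.88,
# whereas compounding simple returns would end at exactly 105
cum_ret[length(cum_ret)]
```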

The latter data frame contains the average cumulative return and corresponding quantiles for each trading day of a year in my sample. Let us now take a look at the average pattern across trading days.

dax_seasonality_summary %>%
  ggplot(aes(x = trading_day)) +
  geom_line(aes(y = mean)) +
  labs(x = "Trading Days", y = "Cumulative Returns (in %)") +
  scale_x_continuous(expand = c(0, 0), breaks = pretty_breaks()) +
  scale_y_continuous(breaks = pretty_breaks()) +
  theme_classic()

While it is unclear what exactly Seasonalcharts plots on their website, the above pattern seems to be fairly consistent with their claimed seasonality. However, even if the pattern seems to be there on average, let us add the 5% and 95% quantiles across years as a band around the mean before we think about trading strategies.

dax_seasonality_summary %>%
  ggplot(aes(x = trading_day)) +
  geom_ribbon(aes(ymin = q05, ymax = q95), alpha = 0.25) + 
  geom_line(aes(y = mean)) +
  labs(x = "Trading Days", y = "Cumulative Returns (in %)") +
  scale_x_continuous(expand = c(0, 0), breaks = pretty_breaks()) +
  scale_y_continuous(breaks = pretty_breaks()) +
  theme_classic()

Naturally, these bands mechanically widen over the course of a year as each index starts at 100. Note that they are empirical quantiles across the years in the sample rather than confidence intervals for the mean. Even with the bands, the pattern is still visible. Given the graphical evidence, let us now turn to the analysis of a trading strategy that aims to exploit this seasonal pattern.
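
The mechanical widening can be reproduced with simulated data. The sketch below is an illustration under assumed parameters (31 hypothetical years of 250 trading days with i.i.d. normal daily returns and no seasonality at all), not a statement about the DAX.

```r
set.seed(1)
n_years <- 31   # assumed number of years
n_days  <- 250  # assumed number of trading days per year

# one column per simulated year; the first daily return is zero, as in get_seasonality()
ret <- rbind(0, matrix(rnorm((n_days - 1) * n_years, mean = 0, sd = 1), ncol = n_years))
cum_ret <- 100 + apply(ret, 2, cumsum)

# width of the 5%-95% quantile band across years for each trading day
band_width <- apply(cum_ret, 1, function(x) diff(quantile(x, c(0.05, 0.95))))
unname(band_width[c(1, 50, 250)])
```

The band has zero width on the first trading day and grows afterwards, even though the simulated returns contain no seasonal pattern.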

Trading Strategy

The main idea of Seasonalcharts is to implement the strategy proposed by Jacobsen and Bouman (2002) and Jacobsen and Zhan (2018), which they label ‘The Halloween Indicator’ (or ‘Sell in May Effect’). The main finding of these papers is that stock index returns seem to be significantly lower during the May-October period than during the remainder of the year. The corresponding trading strategy holds an index during the months November-April, but holds the risk-free asset in the May-October period.

To replicate their approach (and avoid noise in the daily data), I focus on monthly returns from now on.

dax_monthly <- dax %>%
  mutate(year = year(date),
         month = month(date)) %>%
  group_by(year, month) %>%
  slice(which.max(date)) %>%
  ungroup() %>%
  arrange(date) %>%
  mutate(ret = (log(price) - lag(log(price))) * 100) %>%
  na.omit()
nrow(dax_monthly)
## [1] 376
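
The month-end selection can be checked on a few made-up observations. The sketch below is a base R equivalent of the group_by()/slice() step above, using hypothetical dates and prices.

```r
# hypothetical daily observations (not actual DAX data)
daily <- data.frame(
  date  = as.Date(c("2020-01-30", "2020-01-31", "2020-02-27",
                    "2020-02-28", "2020-03-30", "2020-03-31")),
  price = c(100, 102, 101, 99, 104, 105)
)
daily <- daily[order(daily$date), ]

# keep the last available observation of each year-month
year_month <- format(daily$date, "%Y-%m")
month_end  <- daily[!duplicated(year_month, fromLast = TRUE), ]

# monthly log returns in percent; the first month has no previous price
month_end$ret <- c(NA, diff(log(month_end$price)) * 100)
month_end
```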

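As usual in empirical asset pricing, we do not care about raw returns, but about returns in excess of the risk-free rate. I simply take the risk-free rate (column RF) from the European factor file of the Fama-French data library as the corresponding reference point. Note that, according to the description of the data library, this RF series is the US one-month T-bill rate rather than a European rate. Of course, one could use other measures for the risk-free rate, but the impact on the results won’t be substantial.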

temp <- tempfile()
download.file("https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Europe_3_Factors_CSV.zip",
              temp)
unzip(temp, "Europe_3_Factors.csv")
rf_raw <- read_csv("Europe_3_Factors.csv", skip = 3)
## Parsed with column specification:
## cols(
##   X1 = col_character(),
##   `Mkt-RF` = col_character(),
##   SMB = col_character(),
##   HML = col_character(),
##   RF = col_character()
## )
rf <- rf_raw %>%
  mutate(date = ymd(paste0(X1, "01")),
         year = year(date),
         month = month(date),
         rf = as.numeric(RF)) %>%
  filter(date <= "2019-09-01") %>%
  select(year, month, rf)

dax_monthly <- dax_monthly %>%
  left_join(rf, by = c("year", "month")) %>%
  mutate(excess_ret = ret - rf) %>%
  na.omit()
nrow(dax_monthly) # lose a few obs b/c ff data starts in july 1990
## [1] 351
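
The column specification above shows that every column is parsed as character. This is presumably because the file stacks several sections (for instance monthly and annual figures) in one CSV, which is an assumption about the file layout. A slightly more defensive sketch keeps only rows whose first field looks like a YYYYMM date before converting anything. The raw data frame below is a hypothetical stand-in for rf_raw with made-up values.

```r
# hypothetical stand-in for the raw file: monthly rows, a section header, an annual row
raw <- data.frame(
  X1 = c("199007", "199008", "Annual Factors: January-December", "1991"),
  RF = c("0.68", "0.66", NA, "5.60"),
  stringsAsFactors = FALSE
)

# keep only rows whose first field consists of exactly six digits (YYYYMM)
monthly <- raw[grepl("^[0-9]{6}$", raw$X1), ]

# convert to proper types
monthly$date <- as.Date(paste0(monthly$X1, "01"), format = "%Y%m%d")
monthly$rf   <- as.numeric(monthly$RF)
monthly[, c("date", "rf")]
```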

Before examining the average excess returns per month, I define a new (factor) column that ensures that R sorts months correctly in the following analyses.

# build the factor from the numeric month so that it does not depend on the system locale
dax_monthly$month_factor <- factor(x = month.name[dax_monthly$month],
                                   levels = month.name)

Let us now take a look at the average excess returns per month. I also add the standard deviation, 5% and 95% quantiles, and t-statistic of a t-test of the null hypothesis that average returns are zero in a given month.

dax_monthly %>%
  group_by(`Month` = month_factor) %>%
  summarize(Mean = mean(excess_ret),
            SD = sd(excess_ret),
            Q05 = quantile(excess_ret, 0.05),
            Q95 = quantile(excess_ret, 0.95),
            `t-Statistic` = sqrt(n()) * mean(excess_ret) / sd(excess_ret) ) %>%
  kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Month       Mean    SD     Q05    Q95  t-Statistic
January     0.50  5.93   -9.88   8.46         0.45
February    0.63  5.49   -8.72   8.16         0.62
March       0.38  4.46   -6.05   7.41         0.46
April       2.78  5.68   -4.24  13.02         2.63
May         0.11  4.20   -5.90   5.73         0.14
June       -0.69  4.55   -8.13   5.48        -0.82
July        0.95  6.38   -7.59   9.23         0.82
August     -3.07  7.05  -19.05   2.87        -2.38
September  -2.94  8.46  -19.82   5.66        -1.90
October     2.18  6.49   -9.12  11.57         1.80
November    1.79  4.27   -4.86   6.70         2.25
December    1.43  5.43   -6.27   8.53         1.41
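
The t-statistic in the last column is the usual one-sample statistic \(\sqrt{n}\,\bar{x}/s\). As a sanity check, the following sketch (simulated numbers, not DAX returns) confirms that the manual formula matches the statistic reported by t.test().

```r
set.seed(123)
x <- rnorm(30, mean = 0.5, sd = 5)  # hypothetical monthly excess returns in percent

# manual t-statistic for the null hypothesis of a zero mean
t_manual <- sqrt(length(x)) * mean(x) / sd(x)

# the same statistic from the built-in one-sample t-test
t_builtin <- unname(t.test(x, mu = 0)$statistic)

all.equal(t_manual, t_builtin)
```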

August and September usually exhibit negative excess returns with an average of about -3% over all years. The August mean is statistically significant at the 5% level, while the September mean (t-statistic of -1.90) is only marginally significant. April and November are the only months that exhibit statistically significant positive excess returns. For a graphical illustration of the above table, I complement it with boxplots for each month. The takeaway is essentially the same, but we can see that August and September exhibit a couple of outliers that might considerably drive the results.

dax_monthly %>% 
  ggplot(aes(x = month_factor, y = excess_ret)) + 
  geom_boxplot() +
  labs(x = "", y = "Monthly Excess Return (in %)") +
  theme_classic()

Let us proceed to test for the presence of statistically significant excess returns due to seasonal patterns. In the above table, I only test for significance for each month separately. To test for positive returns in a joint model, I regress the monthly excess returns on month indicators. Note that the standard errors reported below are the conventional OLS standard errors from summary(); they are not adjusted for heteroskedasticity.

# summary() for lm objects has no `robust` argument, so conventional OLS standard errors are reported
summary(lm(excess_ret ~ month_factor, data = dax_monthly))
## 
## Call:
## lm(formula = excess_ret ~ month_factor, data = dax_monthly)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -26.5324  -3.0783   0.7171   3.7981  16.4934 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)  
## (Intercept)             0.4970     1.0852   0.458   0.6473  
## month_factorFebruary    0.1380     1.5347   0.090   0.9284  
## month_factorMarch      -0.1179     1.5347  -0.077   0.9388  
## month_factorApril       2.2835     1.5347   1.488   0.1377  
## month_factorMay        -0.3909     1.5347  -0.255   0.7991  
## month_factorJune       -1.1862     1.5347  -0.773   0.4401  
## month_factorJuly        0.4564     1.5218   0.300   0.7645  
## month_factorAugust     -3.5626     1.5218  -2.341   0.0198 *
## month_factorSeptember  -3.4373     1.5218  -2.259   0.0245 *
## month_factorOctober     1.6789     1.5347   1.094   0.2747  
## month_factorNovember    1.2913     1.5347   0.841   0.4007  
## month_factorDecember    0.9298     1.5347   0.606   0.5450  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.844 on 339 degrees of freedom
## Multiple R-squared:  0.08555,    Adjusted R-squared:  0.05588 
## F-statistic: 2.883 on 11 and 339 DF,  p-value: 0.001222
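
Since summary() for lm objects has no robust argument (extra arguments are silently ignored), one way to obtain heteroskedasticity-robust (White, HC1) standard errors is the sandwich formula. The sketch below computes it in base R on simulated data; the variable names and the data-generating process are assumptions for illustration only. In practice, sandwich::vcovHC() combined with lmtest::coeftest() provides this kind of correction.

```r
set.seed(42)
n <- 120
d <- rep(c(1, 0), each = n / 2)                       # hypothetical season dummy
y <- 0.5 + 1.5 * d + rnorm(n, sd = ifelse(d == 1, 3, 7))  # unequal variances by group
fit <- lm(y ~ d)

# the `robust` argument is ignored: both calls give identical coefficient tables
identical(coef(summary(fit, robust = TRUE)), coef(summary(fit)))

# sandwich estimator: (X'X)^-1 X' diag(e^2) X (X'X)^-1
X     <- model.matrix(fit)
e     <- residuals(fit)
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)
vcov_hc0 <- bread %*% meat %*% bread
vcov_hc1 <- vcov_hc0 * n / (n - ncol(X))  # HC1 small-sample scaling

se_ols    <- sqrt(diag(vcov(fit)))
se_robust <- sqrt(diag(vcov_hc1))
cbind(estimate = coef(fit), se_ols, se_robust, t_robust = coef(fit) / se_robust)
```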

It seems that August and September indeed have lower returns on average than January (which is the omitted reference category in this regression). Note that the monthly means implied by the regression (i.e., constant plus coefficient) are the same as the means in the table above. Next, I follow Jacobsen and Bouman (2002) and simply regress excess returns on dummies that indicate specific seasons, i.e., I estimate the model \[ y_t=\alpha + \beta D_t + \epsilon_t, \] where \(y_t\) is the monthly excess return and \(D_t\) is a dummy variable equal to one for the months in a specific season and zero otherwise. I consider both the ‘Halloween’ season (where the dummy is one for November-April) and a ‘Seasonality’ season which only excludes July-September (and the dummy is one for October-June). If the coefficient \(\beta\) is statistically significant and positive for the corresponding season, then I take this as evidence for the presence of seasonality effects.

halloween_months <- c(11, 12, 1, 2, 3, 4)
seasonality_months <- c(10, 11, 12, 1, 2, 3, 4, 5, 6)
dax_monthly <- dax_monthly %>%
  mutate(halloween = if_else(month %in% halloween_months, 1L, 0L),
         seasonality = if_else(month %in% seasonality_months, 1L, 0L))
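
In this model, \(\alpha\) is the average excess return outside the season and \(\beta\) is the difference between the in-season and out-of-season averages. The following sketch verifies this on simulated data (hypothetical numbers, base R only).

```r
set.seed(7)
month <- rep(1:12, times = 10)                          # ten hypothetical years
d     <- as.integer(month %in% c(11, 12, 1, 2, 3, 4))   # 'Halloween' dummy
y     <- rnorm(length(month), mean = ifelse(d == 1, 1.2, -0.6), sd = 6)

fit <- lm(y ~ d)

# OLS estimates versus plain group means
alpha_hat <- unname(coef(fit)[1])
beta_hat  <- unname(coef(fit)[2])
c(alpha_hat, mean(y[d == 0]))
c(beta_hat, mean(y[d == 1]) - mean(y[d == 0]))
```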

The first model considers the ‘Halloween’ effect:

# conventional OLS standard errors (summary() has no `robust` argument for lm objects)
summary(lm(excess_ret ~ halloween, data = dax_monthly))
## 
## Call:
## lm(formula = excess_ret ~ halloween, data = dax_monthly)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -28.8772  -2.9011   0.5344   3.8357  18.0227 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  -0.5954     0.4473  -1.331  0.18402   
## halloween     1.8465     0.6353   2.906  0.00389 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.951 on 349 degrees of freedom
## Multiple R-squared:  0.02363,    Adjusted R-squared:  0.02083 
## F-statistic: 8.447 on 1 and 349 DF,  p-value: 0.00389

I indeed find evidence that excess returns are higher during the months November-April relative to the remaining months. Let us take this even further by adding more months to the season:

# conventional OLS standard errors (summary() has no `robust` argument for lm objects)
summary(lm(excess_ret ~ seasonality, data = dax_monthly))
## 
## Call:
## lm(formula = excess_ret ~ seasonality, data = dax_monthly)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -27.7885  -3.0178   0.5807   4.0105  18.2628 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.6842     0.6226  -2.705 0.007159 ** 
## seasonality   2.6952     0.7220   3.733 0.000221 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.906 on 349 degrees of freedom
## Multiple R-squared:  0.0384, Adjusted R-squared:  0.03564 
## F-statistic: 13.94 on 1 and 349 DF,  p-value: 0.0002207

The effect seems to be even stronger if I also include October, May and June.

As a last step, let us compare five different strategies: (i) buy and hold the index over the full year, (ii) go long in the index during the Halloween season (November-April) and otherwise hold the risk-free asset, (iii) go long in the index during the Halloween season and otherwise short the index, (iv) go long in the index during the extended seasonality period (October-June) and otherwise hold the risk-free asset, and (v) go long in the index during the extended seasonality period and short the index otherwise. Below I compare the returns of the five strategies on an annual basis:

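# note: despite the excess_ret_ prefix, these columns hold raw returns (ret, -ret or rf),
# whereas the buy-and-hold benchmark below uses excess_ret = ret - rf
dax_monthly <- dax_monthly %>%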
  mutate(excess_ret_halloween = if_else(halloween == 1, ret, rf),
         excess_ret_halloween_short = if_else(halloween == 1, ret, -ret),
         excess_ret_seasonality = if_else(seasonality == 1, ret, rf),
         excess_ret_seasonality_short = if_else(seasonality == 1, ret, -ret))
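
The columns above contain raw strategy returns. If one wants the long/risk-free strategies in the same units as the buy-and-hold benchmark, a possible variant (an assumption on my side, not what the code above does) subtracts the risk-free rate from each strategy return, so that months in the risk-free asset contribute an excess return of zero. A toy example with made-up numbers:

```r
# hypothetical monthly returns in percent for one year
toy <- data.frame(
  month = 1:12,
  ret   = c(1.0, -0.5, 2.0, 3.0, 0.5, -1.0, 1.5, -3.0, -2.5, 2.0, 1.8, 1.2),
  rf    = rep(0.2, 12)
)
toy$excess_ret <- toy$ret - toy$rf
in_season <- toy$month %in% c(11, 12, 1, 2, 3, 4)

# raw strategy return as in the code above: index in season, risk-free asset otherwise
raw_strategy <- ifelse(in_season, toy$ret, toy$rf)

# excess-return version: subtract rf in every month
excess_strategy <- raw_strategy - toy$rf

# equivalent formulation: excess return in season, zero otherwise
all.equal(excess_strategy, ifelse(in_season, toy$excess_ret, 0))

# over the year the two versions differ by the accumulated risk-free rate
sum(raw_strategy) - sum(excess_strategy)
```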

dax_monthly %>%
  group_by(year) %>%
  summarize(`Buy and Hold` = sum(excess_ret),
            `Seasonality` = sum(excess_ret_seasonality),
            `Seasonality-Short` = sum(excess_ret_seasonality_short),
            `Halloween` = sum(excess_ret_halloween),
            `Halloween-Short` = sum(excess_ret_halloween_short)) %>%
  pivot_longer(-year, names_to = "strategy", values_to = "excess_ret") %>%
  ggplot(aes(x = year, group = strategy)) +
  geom_col(aes(y = excess_ret, fill = strategy), position = "dodge") +
  labs(x = "", y = "Annual Excess Return (in %)", fill = "Strategy") +
  scale_x_continuous(expand = c(0, 0), breaks = pretty_breaks()) + 
  theme_classic()

The ‘Halloween’ and ‘Seasonality’ strategies seem to outperform the ‘Buy and Hold’ strategy in most years, with ‘Seasonality’ typically outperforming ‘Halloween’. The strategies that short the index rather than hold the risk-free asset also seem to outperform their counterparts. Plotting the overall cumulative excess return of the five strategies confirms these impressions.

dax_monthly %>%
  arrange(date) %>%
  mutate(`Buy and Hold` = 100 + cumsum(excess_ret),
         `Seasonality` = 100 + cumsum(excess_ret_seasonality),
         `Seasonality-Short` = 100 + cumsum(excess_ret_seasonality_short),
         `Halloween` = 100 + cumsum(excess_ret_halloween),
         `Halloween-Short` = 100 + cumsum(excess_ret_halloween_short)) %>%
  select(date, `Buy and Hold`, `Seasonality`, `Seasonality-Short`, 
         `Halloween`, `Halloween-Short`) %>%
  pivot_longer(-date, names_to = "strategy", values_to = "cum_excess_ret") %>%
  ggplot(aes(x = date)) +
  geom_line(aes(y = cum_excess_ret, color = strategy)) +
  scale_x_date(expand = c(0, 0), breaks = pretty_breaks()) +
  scale_y_continuous(breaks = pretty_breaks()) + 
  labs(x = "", y = "Cumulative Excess Return (in %)", color = "Strategy") +
  theme_classic()

Which of these strategies might constitute a better investment opportunity? For a very simple assessment, let us compute the corresponding Sharpe ratios. Note that I annualize Sharpe ratios by multiplying them by \(\sqrt{12}\), which strictly speaking is only valid for IID returns (an assumption that is unlikely to hold), but which suffices for the purpose of this note.

sharpe_ratio <- function(x) {
  sqrt(12) *  mean(x) / sd(x)
}

dax_monthly %>%
  arrange(date) %>%
  summarize(`Buy and Hold` = sharpe_ratio(excess_ret),
            `Seasonality` = sharpe_ratio(excess_ret_seasonality),
            `Seasonality-Short` = sharpe_ratio(excess_ret_seasonality_short),
            `Halloween` = sharpe_ratio(excess_ret_halloween),
            `Halloween-Short` = sharpe_ratio(excess_ret_halloween_short))
## # A tibble: 1 x 5
##   `Buy and Hold` Seasonality `Seasonality-Short` Halloween `Halloween-Shor~
##            <dbl>       <dbl>               <dbl>     <dbl>            <dbl>
## 1          0.184       0.737               0.754     0.774            0.530
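
The \(\sqrt{12}\) factor follows from scaling the monthly mean by 12 and, under the IID assumption, the monthly standard deviation by \(\sqrt{12}\). A short check on simulated returns (hypothetical numbers):

```r
sharpe_ratio <- function(x) {
  sqrt(12) * mean(x) / sd(x)
}

set.seed(99)
x <- rnorm(120, mean = 0.5, sd = 5)  # hypothetical monthly excess returns in percent

# annualize mean and standard deviation separately
annual_mean <- 12 * mean(x)
annual_sd   <- sqrt(12) * sd(x)

all.equal(sharpe_ratio(x), annual_mean / annual_sd)
```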

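The ‘Seasonality-Short’ strategy delivers the highest cumulative excess return in my sample period, but the ‘Halloween’ strategy exhibits a slightly higher Sharpe ratio than the others (0.774 versus 0.754 for ‘Seasonality-Short’ and 0.737 for ‘Seasonality’) and thus looks like the better investment opportunity on a risk-adjusted basis. Keep in mind that the strategy columns are built from raw returns while ‘Buy and Hold’ uses excess returns, so the gap to the benchmark is likely somewhat overstated.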

Christoph Scheuch