Tidy Asset Pricing - Part II: Beta and Stock Returns

Update 2022-02-11: check out our new book Tidy Finance with R for an updated and improved implementation of the content below.

This note proposes an estimation and evaluation procedure for market betas following Bali, Engle and Murray (BEM). I do not go into details about the foundations of market beta, but simply refer to any treatment of the Capital Asset Pricing Model (CAPM). However, I provide details about all the functions that I use to compute my results. As the post introduces a number of approaches of BEM, there are also many functions to discuss. Subsequent blog posts then recycle most of the functions.

This note has two objectives: the first one is to implement several approaches to estimate a stock’s beta in a tidy manner and to empirically examine these measures. The second objective is to analyze the cross-sectional relation between stock returns and market beta. The following disclaimer applies: the text below references an opinion and is for information purposes only and I do not intend to provide any investment advice.

I mainly use the following packages throughout this note:

library(tidyverse)  # for grammar
library(scales)     # for nicer axes of figures
library(viridis)    # for nicer colors in figures
library(slider)     # for rolling window operations
library(moments)    # for skewness and kurtosis 
library(corrr)      # for correlations in table format
library(kableExtra) # for nicer html tables
library(broom)      # for converting statistical objects to tidy tibbles
library(sandwich)   # for Newey-West standard errors

First, I load the monthly and daily return data that I prepared in an earlier post and compute the stock and market excess returns that are required for the beta estimation procedure.

tbl.crsp_monthly <- read_rds("data/crsp_monthly.rds") %>%
  mutate(ret = ret_adj - rf_ff,
         mkt = mkt_ff - rf_ff) %>%
  select(permno, month, mkt, ret)

tbl.crsp_daily <- read_rds("data/crsp_daily.rds") %>%
  mutate(ret = ret - rf_ff,
         mkt = mkt_ff - rf_ff) %>%
  select(permno, month, mkt, ret)

Estimation

According to the CAPM, cross-sectional variation in expected asset returns should be a function of the covariance between the return of the asset and the return on the market portfolio – the beta. To estimate stock-specific betas, I have to regress excess stock returns on excess returns of the market portfolio. Throughout the note, I use the market excess return and risk-free rate provided in Ken French’s data library. The estimation procedure is based on a rolling window estimation where I may use either monthly or daily returns and different window lengths. I follow BEM and examine nine different combinations of estimation periods and data frequencies. For monthly return data, I use excess return observations over the past 1, 2, 3 and 5 years, requiring at least 10, 20, 24 and 24 valid return observations, respectively. For daily data measures, I use period lengths of 1, 3, 6, 12 and 24 months and require 15, 50, 100, 200 and 450 days of valid return data. The function below implements these requirements for a given estimation window and data frequency and returns the estimated beta.

fun.capm_regression <- function(x, window, freq) {
  # Drop missing values
  x <- na.omit(x)
  
  # Determine minimum number of observations depending on window size and data frequency
  if (freq == "monthly") {
    if (window == 12) {check <- 10}
    if (window == 24) {check <- 20}
    if (window == 36) {check <- 24}
    if (window == 60) {check <- 24}
  }
  if (freq == "daily") {
    if (window == 1) {check <- 15}
    if (window == 3) {check <- 50}
    if (window == 6) {check <- 100}
    if (window == 12) {check <- 200}
    if (window == 24) {check <- 450}
  }
  
  # Check if minimum number of observations is satisfied
  if (nrow(x) < check) {
    return(as.numeric(NA))
  } else {
    reg <- lm(ret ~ mkt, data = x)
    return(as.numeric(reg$coefficients[2]))
  }
}

To perform the rolling window estimation, I use the slider package of Davis Vaughan, which provides a family of sliding window functions similar to purrr::map(). Most importantly, the slide_period function is able to handle months in its window input in a straight-forward manner. I thus avoid using any time-series package (e.g., zoo) and converting the data to fit the package functions, but rather stay in the tibble world.

fun.rolling_capm_regression <- function(x, window, freq) {
  # Nested tibbles of daily data need to be binded
  if (freq == "daily") {
    x <- bind_rows(x)
  }
  out <- slide_period_vec(.x = x, .i = x$month, .period = "month",
                   .f = function(x) {fun.capm_regression(x, window, freq)},
                   .before = (window-1), .complete = FALSE)
  return(out)
}

I use the above function to compute several different variants of the beta estimator put forward by BEM. First, I use monthly data to estimate betas with different years as input. However, the estimation below takes up a considerable amount of time when I use the whole CRSP sample (around 2.5 hours). In principle, I could considerably speed up the estimation procedure by parallelizing it across stocks (e.g. using multidplyr or putting the data into some SQL table). However, for the sake of exposition, I just provide the code for the joint estimation below.

tbl.betas_monthly <- tbl.crsp_monthly %>%
  group_by(permno) %>%
  arrange(month) %>%
  mutate(beta_1y = fun.rolling_capm_regression(cur_data(), window = 12, freq = "monthly"),
         beta_2y = fun.rolling_capm_regression(cur_data(), window = 24, freq = "monthly"),
         beta_3y = fun.rolling_capm_regression(cur_data(), window = 36, freq = "monthly"),
         beta_5y = fun.rolling_capm_regression(cur_data(), window = 60, freq = "monthly")) %>%
  ungroup() %>%
  select(-c(mkt, ret)) %>%
  arrange(permno, month)

Next, I use the same function to perform rolling window estimation on the daily return data. To use the above function, I first nest the data by month. the function then loads the relevant months of daily data and binds them together. Note that that the computation now takes about 3.5 hours for the whole CRSP sample, but is again in principle parallelizable.

tbl.crsp_daily_nested <- tbl.crsp_daily %>%
  group_by(permno, month_group = month) %>%
  nest(daily_data = c(month, mkt, ret)) %>% 
  ungroup()

tbl.betas_daily <- tbl.crsp_daily_nested %>% 
  group_by(permno) %>% 
  arrange(month_group) %>%
  mutate(beta_1m = fun.rolling_capm_regression(daily_data, window = 1, freq = "daily"),
         beta_3m = fun.rolling_capm_regression(daily_data, window = 3, freq = "daily"),
         beta_6m = fun.rolling_capm_regression(daily_data, window = 6, freq = "daily"),
         beta_12m = fun.rolling_capm_regression(daily_data, window = 12, freq = "daily"),
         beta_24m = fun.rolling_capm_regression(daily_data, window = 24, freq = "daily")
  ) %>%
  ungroup() %>% 
  select(permno, month = month_group, beta_1m, beta_3m, beta_6m, beta_12m, beta_24m) %>% 
  arrange(permno, month)

Descriptives

Let us add the beta estimates to the monthly CRSP sample and look at some descriptives.

tbl.betas <- read_rds("data/crsp_monthly.rds") %>%
  select(permno, month, ret_adj_excess_f1, mkt_ff_excess_f1, mktcap, mktcap_cpi, exchange) %>% 
  left_join(tbl.betas_monthly, by = c("permno", "month")) %>%
  left_join(tbl.betas_daily, by = c("permno", "month"))

# Define vector of betas for convenience (e.g. factor labels and maps)
vec.betas <- c("beta_1y", "beta_2y", "beta_3y", "beta_5y",
               "beta_1m", "beta_3m", "beta_6m", "beta_12m", "beta_24m")

I first check whether our estimation procedure actually produce beta estimates. The figure below shows the share of stocks with a corresponding beta estimate for each month since 1927. I can see drops in 1962 and 1972 because that is when AMEX and NASDAQ enter the sample, respectively. As a new batch of stocks enters the sample, I cannot compute beta estimates until there is sufficient data available for these stocks. Finally, note that the estimates based on daily data drop in 2019 since I currently don’t have access to more recent daily CRSP data.

tbl.coverage <- tbl.betas %>% 
  select(month, permno, contains("beta")) %>% 
  pivot_longer(cols = contains("beta"), names_to = "measure", values_to = "estimate") %>% 
  group_by(month, measure) %>% 
  summarize(share = n_distinct(permno[!is.na(estimate)]) / n_distinct(permno)) %>% 
  ungroup()

fig.coverage <- tbl.coverage %>% 
  mutate(measure = factor(measure, levels = vec.betas)) %>% 
  ggplot(aes(x = month, y = share, color = measure, linetype = measure)) +
  geom_line() +
  labs(x = NULL, y = NULL, color = NULL, linetype = NULL,
       title = "Monthly share of stocks with beta estimate") + 
  scale_x_date(expand = c(0, 0), date_breaks = "10 years", date_labels = "%Y") +
  scale_y_continuous(labels = percent, breaks = pretty_breaks()) + 
  theme_classic() +
  scale_color_viridis_d()

As a next evaluation, I compute average summary statistics across all months, just like BEM.

# Compute summary statistics for each month
tbl.betas_ts <- tbl.betas %>%
  select(month, contains("beta")) %>%
  pivot_longer(cols = contains("beta"), names_to = "measure", values_drop_na = TRUE) %>%
  group_by(measure, month) %>%
  summarize(mean = mean(value),
            sd = sd(value),
            skew = skewness(value),
            kurt = kurtosis(value),
            min = min(value),
            q05 = quantile(value, 0.05),
            q25 = quantile(value, 0.25),
            q50 = quantile(value, 0.50),
            q75 = quantile(value, 0.75),
            q95 = quantile(value, 0.95),
            max = max(value),
            n = n()) %>%
  ungroup()

# Compute average summary statistics across months
tab.betas_summary <- tbl.betas_ts %>%
  select(-month) %>%
  group_by(measure) %>%
  summarize_all(list(~mean(., na.rm = TRUE))) %>%
  mutate(measure = factor(measure, levels = vec.betas)) %>%
  arrange(measure) %>% 
  kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
tab.betas_summary
measure mean sd skew kurt min q05 q25 q50 q75 q95 max n
beta_1y 1.15 1.17 0.71 21.40 -7.60 -0.41 0.48 1.05 1.73 3.06 12.36 2972.14
beta_2y 1.16 0.82 0.63 9.21 -3.62 0.05 0.63 1.08 1.60 2.56 7.29 2611.63
beta_3y 1.16 0.70 0.64 6.31 -2.09 0.20 0.69 1.09 1.54 2.37 5.71 2255.79
beta_5y 1.15 0.61 0.56 6.91 -1.81 0.30 0.73 1.09 1.50 2.21 5.01 1975.01
beta_1m 0.86 1.26 0.27 27.77 -11.01 -0.81 0.20 0.77 1.47 2.83 12.76 3144.43
beta_3m 0.88 0.82 0.34 12.71 -4.92 -0.23 0.36 0.81 1.34 2.26 6.83 3112.60
beta_6m 0.89 0.68 0.39 7.08 -2.84 -0.05 0.43 0.83 1.30 2.08 4.89 3069.49
beta_12m 0.90 0.60 0.44 4.22 -1.61 0.06 0.47 0.84 1.27 1.96 3.68 2972.62
beta_24m 0.90 0.54 0.45 3.40 -0.91 0.13 0.50 0.85 1.25 1.87 3.03 2734.99

There appears to be more stability in the distribution of estimated betas as I extend the estimation period. This might indicate that longer estimation periods yield more precise beta estimates. However, the average number of observations mechanically decreases as the estimation windows increase. Moreover, estimates based on monthly observations are on average higher than estimates based on daily returns.

Correlations

The next table presents the time-series averages of the monthly cross-sectional (Pearson product-moment) correlations between different measures of market beta.

fun.compute_correlations <- function(data, use = "complete.obs", method = "pearson",
                                     diagonal = 1, quiet = TRUE, shave_upper = TRUE) {
  data %>% 
    correlate(use = use, method = method, diagonal = diagonal, quiet = quiet) %>% 
    shave(., upper = shave_upper)
}

tbl.correlations <- tbl.betas %>%
  na.omit() %>% 
  select(month, contains("beta")) %>% 
  group_by(month) %>% 
  nest(betas = contains("beta")) %>% 
  mutate(correlations = map(betas, fun.compute_correlations)) %>% 
  select(-betas) %>% 
  unnest(correlations) %>% 
  rename(measure = term) %>% 
  mutate(measure = factor(measure, levels = vec.betas)) %>% 
  group_by(measure) %>% 
  summarize_at(vars(contains("beta")), list(~mean(., na.rm = TRUE)))

tab.correlations <- tbl.correlations %>%
  kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
tab.correlations
measure beta_1y beta_2y beta_3y beta_5y beta_1m beta_3m beta_6m beta_12m beta_24m
beta_1y 1.00
beta_2y 0.76 1.00
beta_3y 0.66 0.87 1.00
beta_5y 0.58 0.78 0.88 1.00
beta_1m 0.23 0.27 0.29 0.29 1.00
beta_3m 0.36 0.41 0.42 0.43 0.69 1.00
beta_6m 0.42 0.48 0.50 0.50 0.59 0.85 1.00
beta_12m 0.46 0.54 0.56 0.56 0.53 0.75 0.89 1.00
beta_24m 0.42 0.57 0.61 0.62 0.48 0.68 0.80 0.91 1

Correlations range from as low as 0.22 between the one-year measure based on monthly data and the one-month measure based on daily data to as high as 0.91 between the one-year and two-year measure based on daily data. Again, correlations seem to increase as the length of the measurement period increases.

Persistence

If I use a measure of a stock’s beta in further analyses, I want it to be fairly stable over time. To examine whether this is the case, I calculate the persistence of each measure as the time-series averages of monthly cross-sectional correlations between month \(t\) and month \(t+\tau\). Following BEM, I winsorize each measure at the 0.5% level on a monthly basis to mitigate the impact of outliers. The table below presents the persistence measures for lags one, three, six, 12, 24, 36, 48, 60 and 120 months.

fun.winsorize <- function(x, cut = 0.005){
  cut_point_top <- quantile(x, 1 - cut, na.rm = TRUE)
  cut_point_bottom <- quantile(x, cut, na.rm = TRUE)
  i <- which(x >= cut_point_top)
  x[i] <- cut_point_top
  j <- which(x <= cut_point_bottom)
  x[j] <- cut_point_bottom
  return(x)
}

fun.compute_persistence <- function(data, var, tau, cut = 0.005){
  # Initialize output
  out <- list()
  for (t in 1:length(tau)) {
    # Create a table with lagged dates
    dates <- data %>%
      distinct(month) %>%
      mutate(month_lag = lag(month, tau[t]))
    
    # Winsorize the data
    data_winsorized <- data %>%
      filter(!is.na({{var}})) %>%
      group_by(month) %>%
      mutate(beta = fun.winsorize({{var}}, cut = cut)) %>%
      ungroup() %>% 
      select(permno, month, beta)
    
    # Compute correlation to lagged value
    correlation <- data_winsorized %>%
      left_join(dates, by = "month") %>%
      left_join(data_winsorized %>% 
                  select(permno, month, beta_lag = beta),
                by = c("permno", "month_lag" = "month")) %>%
      select(month, beta, beta_lag) %>%
      na.omit() %>% 
      nest(betas = c(beta, beta_lag)) %>%
      mutate(correlations = map(betas, fun.compute_correlations)) %>%
      unnest(correlations) %>%
      filter(term == "beta_lag") %>% 
      summarise(mean = mean(beta, na.rm = TRUE))
    
    out[[t]] <- tibble(tau = tau[t],
                       {{var}} := as.numeric(correlation))
  }
  return(bind_rows(out))
}

vec.tau <- c(1, 3, 6, 12, 24, 36, 48, 60, 120)

fun.compute_persistence(tbl.betas, beta_1y, vec.tau)

lst.persistence <- vec.betas %>%
  map(~fun.compute_persistence(tbl.betas, !!sym(.), vec.tau)) %>%
  set_names(vec.betas)
tab.persistence <- lst.persistence %>% 
  reduce(left_join) %>%
  kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
tab.persistence
tau beta_1y beta_2y beta_3y beta_5y beta_1m beta_3m beta_6m beta_12m beta_24m
1 0.92 0.97 0.98 0.99 0.28 0.82 0.93 0.98 0.99
3 0.78 0.91 0.95 0.97 0.26 0.48 0.79 0.92 0.97
6 0.58 0.82 0.90 0.95 0.24 0.45 0.60 0.84 0.93
12 0.24 0.66 0.80 0.90 0.23 0.42 0.55 0.67 0.86
24 0.22 0.37 0.61 0.80 0.20 0.37 0.49 0.60 0.71
36 0.20 0.35 0.44 0.70 0.19 0.35 0.45 0.55 0.65
48 0.19 0.32 0.41 0.60 0.18 0.33 0.42 0.52 0.60
60 0.18 0.30 0.39 0.49 0.17 0.31 0.40 0.49 0.57
120 0.14 0.24 0.31 0.38 0.14 0.25 0.32 0.39 0.44

Of course, entries that correspond to lags for which the measurement periods overlap mechanically exhibit high persistence. The results indicate that the calculation of beta using short measurement periods is very noisy. Moreover, as the length of the measurement window increases, so does the persistence of beta calculated from daily return data. The measures based on monthly return data exhibit similar persistence patterns. In summary, the results indicate that the use of longer measurement periods results in more accurate measures of beta and that beta seems to be more accurately measured using daily return data.

Portfolio Analysis

Next, I turn to the relation of beta and stock returns. The fundamental prediction of the CAPM is that there is a positive relation between market beta and expected stock returns where the slope of this relation is the market risk premium. I should hence expect to find a positive cross-sectional association between beta and future excess returns. However, as I show below, contrary to this prediction, analyses fail to detect any strong relation or even exhibit a negative relation. This result is typically viewed as one of the most persistent empirical anomalies in all of empirical asset pricing.

The construction of univariate portfolios relies on a couple of helper functions. First, I need to introduce a wrapper for the basic weighted mean function to also exclude observations with missing weights.

fun.weighted_mean <- function(x, w, ..., na.rm = FALSE){
  if(na.rm){
    x1 <- x[!is.na(x) & !is.na(w)]
    w <- w[!is.na(x) & !is.na(w)]
    x <- x1
  }
  weighted.mean(x, w, ..., na.rm = FALSE)
}

The next function I introduce is a helper function that returns a list of quantile functions that I can throw into the summarize command to get a list of breakpoints for any given number of target portfolios. The code snippet below is a modified version of the code found here. Most importantly, the function uses purrr:partial to fill in some arguments to the quantile functions.

fun.get_breakpoints_functions <- function(var, n_portfolios = 10) {
  # Get relevant percentiles
  percentiles <- seq(0, 1, length.out = (n_portfolios + 1))
  percentiles <- percentiles[percentiles > 0 & percentiles < 1]
  
  # Construct set of named quantile functions
  percentiles_names <- map_chr(percentiles, ~str_c(rlang::quo_text(enquo(var)), "_q", .x*100))
  percentiles_funs <- map(percentiles, ~partial(quantile, probs = .x, na.rm = TRUE)) %>% 
    set_names(nm = percentiles_names)
  
  return(percentiles_funs)
}

Given a list of breakpoints, I want to sort stocks into the corresponding portfolio. This turns out to be a bit tricky if I want to write a function that does the sorting for a flexible number of portfolios. Fortunately, my brilliant colleague Stefan Voigt pointed out the findInterval function to me.

fun.get_portfolio <- function(x, breakpoints) {
  portfolio <- as.integer(1 + findInterval(x, unlist(breakpoints)))
  
  return(portfolio)
}

The next function finally constructs univariate portfolios for any sorting variable, an abitrary number of portfolios and using all stocks or only NYSE stocks to compute breakpoints. The rationale behind using only NYSE stocks to calculate quantiles is that for a large portion of the sample period, these stocks tend to be much larger than stocks listed on AMEX or NASDAQ. If the breakpoints are calculated using all stocks in the CRSP sample, the breakpoints would effectively just separate NYSE stocks from all other stocks. However, it does not ensure an equal number of NYSE stocks in each portfolios, as I demonstrate in later posts.

Followin BEM, I sort stocks into 10 portfolios ranked by the value of the beta estimate. The breakpoints for each portfolio are based on deciles constructed using all stocks. In addition to the 10 decile portfolios, the function below also adds the 10-1 portfolio. The function also computes a set of average stock characteristics for each portfolio, which becomes relevant in the other blog posts.

fun.construct_univariate_portfolios <- function(data, var, n_portfolios = 10, nyse_breakpoints = FALSE) {
  # Keep only observations where sorting variable is defined
  data <- data %>%
    filter(!is.na({{var}}))
  
  # Determine breakpoints based on NYSE stocks only or on all stocks
  if (nyse_breakpoints == TRUE) {
    data_quantiles <- data %>%
      filter(exchange == "NYSE") %>%
      select(month, {{var}})
  } else {
    data_quantiles <- data %>%
      select(month, {{var}}) 
  }
  
  # Compute quantiles
  var_funs <- fun.get_breakpoints_functions({{var}}, n_portfolios)
  quantiles <- data_quantiles %>%
    group_by(month) %>%
    summarize_at(vars({{var}}), lst(!!!var_funs)) %>%
    group_by(month) %>%
    nest(quantiles = -month) 
  
  # Independently sort all stocks into portfolios based on breakpoints
  portfolios <- data %>%
    left_join(quantiles, by = "month") %>%
    mutate(portfolio = map2_dbl({{var}}, quantiles, fun.get_portfolio)) %>%
    select(-quantiles)
  
  # Compute average portfolio characteristics
  portfolios_ts <- portfolios %>%
    group_by(portfolio, month) %>%
    summarize(ret_ew = mean(ret_adj_excess_f1, na.rm = TRUE),
              ret_vw = fun.weighted_mean(ret_adj_excess_f1, mktcap, na.rm = TRUE),
              ret_mkt = mean(mkt_ff_excess_f1, na.rm = TRUE),
              {{var}} := mean({{var}}, na.rm = TRUE),
              mktcap = mean(mktcap, na.rm = TRUE),
              mktcap_cpi_all = mean(mktcap_cpi, na.rm = TRUE),
              mktcap_cpi_nyse = mean(mktcap_cpi[exchange == "NYSE"], na.rm = TRUE),
              n_stocks = n(),
              n_stocks_nyse = sum(exchange == "NYSE", na.rm = TRUE)) %>%
    ungroup()
  
  # Construct long-short portfolio
  portfolios_ts_ls <- portfolios_ts %>%
    select(portfolio, month, ret_ew, ret_vw, ret_mkt) %>%
    filter(portfolio %in% c(max(portfolio), min(portfolio))) %>%
    pivot_wider(names_from = portfolio, values_from = c(ret_ew, ret_vw)) %>%
    mutate(ret_ew = .[[4]] - .[[3]],
           ret_vw = .[[6]] - .[[5]],
           portfolio = paste0(n_portfolios, "-1")) %>%
    select(portfolio, month, ret_ew, ret_vw, ret_mkt)
  
  # Combine everything
  out <- portfolios_ts %>%
    mutate(portfolio = as.character(portfolio)) %>%
    bind_rows(portfolios_ts_ls) %>%
    mutate(portfolio = factor(portfolio, levels = c(as.character(seq(1, n_portfolios, 1)), str_c(n_portfolios, "-1"))))
  
  return(out)
}

lst.portfolios <- vec.betas %>%
  map(~fun.construct_univariate_portfolios(tbl.betas, !!sym(.))) %>%
  set_names(vec.betas)

Given the portfolio returns, I want to evaluate whether the portfolio exhibit on average positive or negative excess returns. The following function estimates the mean excess return and CAPM alpha for each portfolio and computes corresponding Newey and West (1987) \(t\)-statistics (using six lags as common in the literature) testing the null hypothesis that the average portfolio excess return or CAPM alpha is zero.

fun.estimate_portfolio_returns <- function(data, ret, n_portfolios = 10, lag = 6) {
  # Compute average returns per portfolio
  average_ret <- data %>%
    select(portfolio, month, ret = {{ret}}) %>% 
    group_by(portfolio) %>%
    nest(data = c(month, ret)) %>% 
    mutate(model = map(data, ~lm("ret ~ 1", data = .)),
           nw_stderror = map_dbl(model, ~sqrt(diag(sandwich::NeweyWest(., lag = lag)))),
           model = map(model, tidy)) %>% 
    unnest(model) %>%
    ungroup() %>% 
    mutate(nw_tstat = estimate / nw_stderror) %>%
    select(estimate, nw_tstat) %>%
    t()
  
  # Estimate capm alpha per portfolio
  average_capm_alpha <- data %>%
    select(portfolio, month, ret = {{ret}}, ret_mkt) %>% 
    group_by(portfolio) %>%
    nest(data = c(month, ret, ret_mkt)) %>% 
    mutate(model = map(data, ~lm("ret ~ 1 + ret_mkt", data = .)),
           nw_stderror = map_dbl(model, ~sqrt(diag(sandwich::NeweyWest(., lag = lag))[1])),
           model = map(model, tidy)) %>% 
    unnest(model) %>%
    ungroup() %>% 
    filter(term == "(Intercept)") %>% 
    mutate(nw_tstat = estimate / nw_stderror) %>%
    select(estimate, nw_tstat) %>%
    t()
  
  # Construct output table
  out <- rbind(average_ret, average_capm_alpha)
  colnames(out) <-c(as.character(seq(1, n_portfolios, 1)), str_c(n_portfolios, "-1"))
  rownames(out) <- c("Excess Return", "t-Stat", "CAPM Alpha", "t-Stat")
  
  return(out)
}

Let us first apply the function to equal-weighted porfolios.

tab.portfolio_returns_ew <- lst.portfolios %>% 
  map(~fun.estimate_portfolio_returns(., ret_ew)) %>% 
  reduce(rbind) %>%
  kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  pack_rows("Estimation Period: 1 Year (Monthly Data) - Equal-Weighted Portfolio Returns", 1, 4) %>%
  pack_rows("Estimation Period: 2 Years (Monthly Data) - Equal-Weighted Portfolio Returns", 5, 8) %>%
  pack_rows("Estimation Period: 3 Years (Monthly Data) - Equal-Weighted Portfolio Returns", 9, 12) %>%
  pack_rows("Estimation Period: 5 Years (Monthly Data) - Equal-Weighted Portfolio Returns", 13, 16) %>%
  pack_rows("Estimation Period: 1 Month (Daily Data) - Equal-Weighted Portfolio Returns", 17, 20) %>%
  pack_rows("Estimation Period: 3 Months (Daily Data) - Equal-Weighted Portfolio Returns", 21, 24) %>%
  pack_rows("Estimation Period: 6 Months (Daily Data) - Equal-Weighted Portfolio Returns", 25, 28) %>%
  pack_rows("Estimation Period: 12 Months (Daily Data) - Equal-Weighted Portfolio Returns", 29, 32) %>%
  pack_rows("Estimation Period: 24 Months (Daily Data) - Equal-Weighted Portfolio Returns", 33, 36)
tab.portfolio_returns_ew
1 2 3 4 5 6 7 8 9 10 10-1
Estimation Period: 1 Year (Monthly Data) - Equal-Weighted Portfolio Returns
Excess Return 0.84 0.73 0.80 0.88 0.93 0.91 0.99 1.01 0.93 0.81 -0.03
t-Stat 2.98 3.08 3.32 3.63 3.65 3.28 3.45 3.32 2.93 2.24 -0.18
CAPM Alpha 0.24 0.15 0.15 0.19 0.17 0.08 0.09 0.04 -0.10 -0.32 -0.56
t-Stat 1.37 1.07 1.13 1.52 1.34 0.61 0.69 0.25 -0.62 -1.57 -3.63
Estimation Period: 2 Years (Monthly Data) - Equal-Weighted Portfolio Returns
Excess Return 0.77 0.78 0.81 0.90 0.93 0.97 1.05 0.98 0.93 0.84 0.08
t-Stat 3.19 3.27 3.33 3.68 3.47 3.39 3.64 3.15 2.78 2.25 0.35
CAPM Alpha 0.30 0.24 0.21 0.23 0.19 0.18 0.19 0.05 -0.07 -0.25 -0.55
t-Stat 1.88 1.83 1.60 1.82 1.42 1.27 1.29 0.32 -0.43 -1.19 -3.15
Estimation Period: 3 Years (Monthly Data) - Equal-Weighted Portfolio Returns
Excess Return 0.81 0.88 0.91 1.00 1.01 1.05 1.10 1.09 1.02 0.97 0.16
t-Stat 3.57 3.89 4.01 3.99 3.83 3.72 3.88 3.47 3.07 2.60 0.73
CAPM Alpha 0.33 0.30 0.27 0.27 0.22 0.18 0.18 0.07 -0.06 -0.23 -0.56
t-Stat 2.14 2.45 2.21 2.05 1.58 1.29 1.20 0.48 -0.32 -1.09 -3.13
Estimation Period: 5 Years (Monthly Data) - Equal-Weighted Portfolio Returns
Excess Return 0.78 0.91 0.94 0.97 1.05 1.13 1.03 1.05 1.01 0.93 0.15
t-Stat 3.64 4.41 4.20 4.30 4.20 4.14 3.79 3.54 3.20 2.57 0.67
CAPM Alpha 0.29 0.34 0.26 0.21 0.22 0.20 0.05 -0.01 -0.13 -0.36 -0.65
t-Stat 2.02 2.51 2.08 1.63 1.61 1.44 0.34 -0.05 -0.77 -1.86 -3.94
Estimation Period: 1 Month (Daily Data) - Equal-Weighted Portfolio Returns
Excess Return 1.04 0.94 0.92 1.02 0.98 1.01 0.99 0.96 0.93 0.74 -0.30
t-Stat 3.52 4.31 4.38 4.73 4.34 4.19 3.97 3.63 3.11 2.14 -2.17
CAPM Alpha 0.30 0.31 0.29 0.34 0.23 0.22 0.15 0.05 -0.10 -0.38 -0.69
t-Stat 1.90 3.25 3.39 3.98 2.80 2.46 1.58 0.48 -0.86 -2.34 -5.40
Estimation Period: 3 Months (Daily Data) - Equal-Weighted Portfolio Returns
Excess Return 1.06 0.95 1.00 0.95 1.03 0.98 1.00 0.93 0.90 0.71 -0.35
t-Stat 3.74 4.52 4.64 4.37 4.52 4.04 3.92 3.48 3.01 1.99 -2.07
CAPM Alpha 0.39 0.39 0.38 0.26 0.29 0.18 0.14 0.01 -0.14 -0.48 -0.87
t-Stat 2.71 3.88 4.10 2.90 3.35 2.03 1.45 0.10 -1.20 -2.91 -5.85
Estimation Period: 6 Months (Daily Data) - Equal-Weighted Portfolio Returns
Excess Return 1.13 0.98 0.98 0.97 1.05 1.02 0.97 0.92 0.89 0.73 -0.40
t-Stat 4.20 4.61 4.45 4.54 4.49 4.16 3.73 3.38 2.92 2.07 -2.22
CAPM Alpha 0.52 0.41 0.35 0.30 0.29 0.20 0.09 -0.02 -0.17 -0.48 -1.00
t-Stat 3.74 4.20 3.58 3.31 3.15 2.14 0.93 -0.15 -1.47 -2.96 -6.46
Estimation Period: 12 Months (Daily Data) - Equal-Weighted Portfolio Returns
Excess Return 1.13 1.03 1.02 1.02 1.04 1.05 1.01 0.90 0.87 0.72 -0.42
t-Stat 4.67 4.79 4.71 4.34 4.35 4.13 3.85 3.27 2.87 2.09 -2.15
CAPM Alpha 0.60 0.47 0.40 0.33 0.28 0.22 0.13 -0.02 -0.18 -0.48 -1.08
t-Stat 4.77 4.60 4.03 3.39 2.86 2.32 1.27 -0.21 -1.46 -3.15 -6.64
Estimation Period: 24 Months (Daily Data) - Equal-Weighted Portfolio Returns
Excess Return 1.14 1.01 1.08 1.05 1.08 1.02 1.03 0.97 0.92 0.69 -0.45
t-Stat 4.96 4.69 4.72 4.49 4.29 4.05 3.97 3.56 3.02 2.04 -2.17
CAPM Alpha 0.64 0.47 0.45 0.36 0.32 0.22 0.18 0.07 -0.10 -0.48 -1.12
t-Stat 5.62 4.68 4.20 3.73 3.14 2.19 1.76 0.61 -0.83 -3.15 -6.60

There is neither a clear pattern across the portfolios in average excess returns, nor does the 10-1 portfolio yield statistically significant excess returns. However, the CAPM alphas exhibit a decreasing pattern across portfolios and a statistically significant negative coefficient on the 10-1 portfolio for all beta estimates. This result is actually not surprising since the risk-adjustment using the CAPM model adjusts returns for the sensitivity to the market factor which is the same factor used to calculate beta. Stocks in the highest decile hence mechanically exhibit a higher sensitivity than stocks in the low decile. As a result, the 10-1 portfolio has a high sensitivity to the market portfolio. Since the market factor generates positive returns in the long run and the 10-1 portfolio has a positive sensitivity, the effect of the risk-adjustment is negative.

Next, let us repeat the analysis using value-weighted returns for each portfolio.

tab.portfolio_returns_vw <- lst.portfolios %>% 
  map(~fun.estimate_portfolio_returns(., ret_vw)) %>% 
  reduce(rbind) %>%
  kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  pack_rows("Estimation Period: 1 Year (Monthly Data) - Value-Weighted Portfolio Returns", 1, 4) %>%
  pack_rows("Estimation Period: 2 Years (Monthly Data) - Value-Weighted Portfolio Returns", 5, 8) %>%
  pack_rows("Estimation Period: 3 Years (Monthly Data) - Value-Weighted Portfolio Returns", 9, 12) %>%
  pack_rows("Estimation Period: 5 Years (Monthly Data) - Value-Weighted Portfolio Returns", 13, 16) %>%
  pack_rows("Estimation Period: 1 Month (Daily Data) - Value-Weighted Portfolio Returns", 17, 20) %>%
  pack_rows("Estimation Period: 3 Months (Daily Data) - Value-Weighted Portfolio Returns", 21, 24) %>%
  pack_rows("Estimation Period: 6 Months (Daily Data) - Value-Weighted Portfolio Returns", 25, 28) %>%
  pack_rows("Estimation Period: 12 Months (Daily Data) - Value-Weighted Portfolio Returns", 29, 32) %>%
  pack_rows("Estimation Period: 24 Months (Daily Data) - Value-Weighted Portfolio Returns", 33, 36)
tab.portfolio_returns_vw
1 2 3 4 5 6 7 8 9 10 10-1
Estimation Period: 1 Year (Monthly Data) - Value-Weighted Portfolio Returns
Excess Return 0.36 0.45 0.54 0.65 0.72 0.66 0.65 0.68 0.75 0.66 0.30
t-Stat 1.90 2.40 3.04 3.59 3.52 3.14 2.84 2.73 2.74 2.06 1.36
CAPM Alpha -0.10 -0.05 -0.01 0.05 0.04 -0.07 -0.15 -0.22 -0.22 -0.45 -0.35
t-Stat -0.80 -0.35 -0.06 0.55 0.39 -0.67 -1.42 -1.92 -1.64 -2.59 -1.86
Estimation Period: 2 Years (Monthly Data) - Value-Weighted Portfolio Returns
Excess Return 0.43 0.43 0.56 0.72 0.66 0.68 0.67 0.63 0.68 0.66 0.23
t-Stat 2.40 2.42 3.11 3.77 3.03 3.16 2.89 2.50 2.40 1.98 0.92
CAPM Alpha 0.04 -0.01 0.05 0.14 0.00 -0.02 -0.12 -0.22 -0.27 -0.44 -0.49
t-Stat 0.33 -0.09 0.41 1.32 -0.04 -0.18 -1.04 -1.86 -1.97 -2.69 -2.59
Estimation Period: 3 Years (Monthly Data) - Value-Weighted Portfolio Returns
Excess Return 0.42 0.50 0.58 0.69 0.68 0.65 0.70 0.69 0.70 0.75 0.33
t-Stat 2.45 2.78 3.35 3.54 3.18 3.02 2.90 2.69 2.44 2.23 1.26
CAPM Alpha 0.04 0.03 0.03 0.06 -0.04 -0.12 -0.15 -0.24 -0.33 -0.47 -0.50
t-Stat 0.26 0.19 0.26 0.53 -0.38 -1.04 -1.23 -2.03 -2.44 -2.91 -2.69
Estimation Period: 5 Years (Monthly Data) - Value-Weighted Portfolio Returns
Excess Return 0.52 0.59 0.72 0.67 0.76 0.69 0.70 0.76 0.65 0.78 0.26
t-Stat 3.21 3.57 4.27 3.52 3.67 3.30 3.14 3.13 2.34 2.49 1.03
CAPM Alpha 0.14 0.09 0.12 -0.03 -0.02 -0.14 -0.22 -0.23 -0.45 -0.47 -0.60
t-Stat 1.00 0.64 1.04 -0.25 -0.17 -1.22 -1.80 -1.85 -3.31 -3.01 -3.30
Estimation Period: 1 Month (Daily Data) - Value-Weighted Portfolio Returns
Excess Return 0.58 0.56 0.62 0.70 0.67 0.70 0.77 0.79 0.75 0.54 -0.05
t-Stat 2.95 3.62 4.24 4.55 4.29 4.12 4.22 3.95 3.19 1.78 -0.25
CAPM Alpha 0.01 0.12 0.13 0.18 0.10 0.07 0.08 0.02 -0.15 -0.52 -0.53
t-Stat 0.12 1.64 2.30 3.22 2.23 1.35 1.45 0.38 -2.19 -3.92 -3.22
Estimation Period: 3 Months (Daily Data) - Value-Weighted Portfolio Returns
Excess Return 0.47 0.63 0.60 0.66 0.68 0.75 0.78 0.78 0.71 0.55 0.08
t-Stat 2.31 4.35 4.17 4.46 4.42 4.25 4.17 3.87 2.89 1.84 0.37
CAPM Alpha 0.01 0.22 0.16 0.14 0.12 0.11 0.09 0.02 -0.19 -0.53 -0.54
t-Stat 0.07 2.94 2.38 2.36 2.36 1.95 1.56 0.28 -2.40 -4.18 -2.82
Estimation Period: 6 Months (Daily Data) - Value-Weighted Portfolio Returns
Excess Return 0.59 0.68 0.58 0.62 0.72 0.70 0.77 0.74 0.68 0.64 0.05
t-Stat 3.35 4.71 4.07 4.30 4.66 4.10 4.31 3.57 2.80 2.15 0.25
CAPM Alpha 0.17 0.27 0.15 0.14 0.15 0.07 0.08 -0.02 -0.22 -0.46 -0.63
t-Stat 1.58 3.37 2.13 2.22 2.76 1.25 1.38 -0.28 -2.75 -3.80 -3.53
Estimation Period: 12 Months (Daily Data) - Value-Weighted Portfolio Returns
Excess Return 0.64 0.64 0.68 0.66 0.67 0.74 0.80 0.72 0.69 0.62 -0.02
t-Stat 3.87 4.58 4.78 4.24 4.50 4.28 4.29 3.48 2.81 2.07 -0.10
CAPM Alpha 0.26 0.23 0.24 0.14 0.13 0.14 0.12 -0.04 -0.20 -0.47 -0.73
t-Stat 2.69 3.09 3.50 2.22 2.35 2.30 1.86 -0.72 -2.53 -3.90 -4.19
Estimation Period: 24 Months (Daily Data) - Value-Weighted Portfolio Returns
Excess Return 0.54 0.69 0.64 0.69 0.68 0.71 0.77 0.76 0.62 0.58 0.05
t-Stat 3.64 4.69 4.49 4.42 4.34 4.29 4.24 3.60 2.57 1.97 0.20
CAPM Alpha 0.21 0.30 0.20 0.19 0.14 0.12 0.12 0.03 -0.23 -0.48 -0.69
t-Stat 2.25 3.62 2.74 2.92 2.44 2.00 1.95 0.45 -3.07 -4.28 -4.04

The results show that the negative relation between beta and future stock returns detected using equal-weighted portfolios is weaker using value-weighted portfolios. Taken together, the portfolio analyses provide evidence that contradicts the predictions of the CAPM. According to the CAPM, returns should increase across the decile portfolios and risk-adjusted returns should be statistically indistinguishable from zero.

Regression Analysis

As a last step, I analyze the relation between beta and stock returns using Fama and MacBeth (1973) regression analysis. Each month, I perform a cross-sectional regression of one-month-ahead excess stock returns on the given measure of beta. Time-series averages over these cross-sectional regressions then provide the desired results.

The next function performs Fama-MacBeth regressions for a given variable and returns the time-series average of monthly regression coefficients and corresponding \(t\)-statistics.

fun.fama_macbeth_regression <- function(data, model, model_name = "value") {
  # Prepare and winsorize data
  data_nested <- data %>%
    group_by(month) %>% 
    nest(data = -month) 
  
  # Perform cross-sectional regressions for each month
  cross_sectional_regs <- data_nested %>%
    mutate(model = map(data, ~lm(model, data = .x)),
           tidy = map(model, tidy),
           glance = map(model, glance),
           n = map_int(model, nobs))
  
  # Extract average coefficient estimates
  fama_macbeth_coefs <- cross_sectional_regs %>%
    unnest(tidy) %>%
    group_by(term) %>%
    summarize(coefficient = mean(estimate))
  
  # Compute newey-west standard errors of average coefficient estimates
  newey_west_std_errors <- cross_sectional_regs %>%
    unnest(tidy) %>%
    select(month, term, estimate) %>% 
    group_by(term) %>%
    arrange(month) %>%
    nest(data_nw = c(month, estimate)) %>% 
    mutate(nw_std_error = map_dbl(data_nw, function(x) sqrt(diag(NeweyWest(lm(estimate ~ 1, data = x), lag = 6))))) %>%
    select(term, nw_std_error)
  
  # Put coefficient estimates and standard errors together and compute t-statistics
  fama_macbeth_coefs <- fama_macbeth_coefs %>%
    left_join(newey_west_std_errors, by = "term") %>%
    mutate(nw_t_stat = coefficient / nw_std_error) %>%
    select(term, coefficient, nw_t_stat) %>%
    pivot_longer(cols = c(coefficient, nw_t_stat), 
                 names_to = "statistic", values_to = model_name) %>%
    mutate(statistic = paste(term, statistic, sep = " ")) %>%
    select(-term)

  # Extract average r-squared and average number of observations
  fama_macbeth_stats <- cross_sectional_regs %>% 
    unnest(c(glance, n)) %>%
    ungroup() %>%
    summarize(adj_r_squared = mean(adj.r.squared),
              n = mean(n)) %>%
    pivot_longer(c(adj_r_squared, n), names_to = "statistic", values_to = model_name)
  
  # Combine desired output and return results
  out <- rbind(fama_macbeth_coefs, fama_macbeth_stats)
  return(out)
}

Before I apply the function, I winsorize all beta measures at 0.05% on a monthly basis to mitigate the impact of outliers or estimation errors.

tbl.betas_regression <- tbl.betas %>% 
  filter(!is.na(ret_adj_excess_f1)) %>%
  group_by(month) %>%
  mutate_at(vars(contains("beta")), ~fun.winsorize(., cut = 0.005)) %>%
  ungroup()

lst.regressions <- vec.betas %>%
  map(function(x) {tbl.betas_regression %>% 
      filter(!is.na(!!sym(x))) %>% 
      fun.fama_macbeth_regression(str_c("ret_adj_excess_f1 ~ ", x), x)}) %>%
  set_names(vec.betas)

The following table provides the regression results for each beta measure.

tbl.regressions <- lst.regressions %>%
  map(~mutate(., statistic = str_replace_all(statistic, str_c(vec.betas, collapse = "|"), "beta"))) %>% 
  reduce(full_join)
tbl.regressions$statistic <- c("intercept", "t-stat", "beta", "t-stat", "adj. R-squared", "n")

tab.regressions <- tbl.regressions %>%
  kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
tab.regressions
statistic beta_1y beta_2y beta_3y beta_5y beta_1m beta_3m beta_6m beta_12m beta_24m
intercept 0.82 0.78 0.83 0.84 0.99 1.04 1.08 1.14 1.18
t-stat 3.39 3.59 4.16 4.48 4.26 4.60 4.91 5.30 5.59
beta 0.04 0.07 0.11 0.10 -0.06 -0.12 -0.15 -0.21 -0.23
t-stat 0.69 0.73 0.96 0.79 -1.28 -1.61 -1.66 -2.02 -2.05
adj. R-squared 0.02 0.02 0.03 0.03 0.01 0.02 0.03 0.03 0.03
n 2944.37 2587.02 2232.09 1949.31 3131.99 3094.10 3045.12 2948.86 2713.91

For all measures based on monthly data, I do not find any statistically significant relation between beta and stock returns, while the measures based on daily returns do exhibit a statistically significant relation for longer estimation periods. Moreover, the intercepts are positive and statistically significant across all beta measures, although, controlling for the effect of beta, the average excess stock returns should be zero according to the CAPM. The results thus provide no evidence in support of the CAPM.

It is worth noting that the average adjusted R-squared ranges only from 2 to 3 percent. Low values of R-squared are common in empirical asset pricing studies. The main reason put forward in the literature is that predicting stock returns is that realized stock returns are just a very noisy measure for expected stock returns (e.g., Elton, 1999).

In all future blog posts, I use the 12-month beta estimate based on daily data just asBEM do. The figure below shows the cumulative log returns of the corresponding value-weighted beta portfolios. It clearly demonstrates that high-beta stocks underperform low-beta stocks over the long-run, at least in the CRSP data.

fig.cum_returns <- lst.portfolios$beta_12m %>%
  group_by(portfolio) %>%
  arrange(month) %>%
  mutate(cum_return = cumsum(log(1+ret_vw/100))) %>%
  ungroup() %>%
  ggplot(aes(x = month, y = cum_return, color = portfolio, linetype = portfolio)) +
  geom_line() +
  labs(x = NULL, y = NULL, color = NULL, linetype = NULL,
       title = "Cumulative log returns of value-weighted beta portfolios") + 
  scale_x_date(expand = c(0, 0), date_breaks = "10 years", date_labels = "%Y") +
  scale_y_continuous(labels = percent, breaks = pretty_breaks()) + 
  theme_classic() +
  scale_color_viridis_d()
Christoph Scheuch
Christoph Scheuch
Director of Product

My interests include Product Management, Data Science, R and FinTech-related stuff. matter.

Related