12 Clustered and Panel Data

Grouped data structures, in which we observe individual units within larger groups, are common in political science and other social sciences. Examples include:

  • Cross-national survey data, where we observe individual respondents grouped by country.

  • Block-randomized field experiments. For example, in an experiment where the treatment is administered at the village level, we observe individual outcomes grouped by village.

  • Panel data, where we observe the same units repeatedly over time. This includes panel surveys as well as observational data on states, countries, or other political units over time.

Grouped data presents problems and opportunities. At the root of both is the idea of unobserved heterogeneity—that some variation across groups might be due to unobservable features of the groups.

To return to our running example, suppose you are interested in the correlates of voting for Donald Trump in the 2016 general election. You observe survey respondents grouped by state. Vote choice might be affected by:

  1. Individual characteristics like one’s age, gender, race, income, and education.

  2. State characteristics29 like the unemployment rate, undocumented population, and exposure to trade with China.

Some state characteristics that might affect vote choice are difficult or impossible to measure. For example, some states have a more cosmopolitan culture, while others have a more traditional culture. A 50-year-old white man living in Connecticut is, we would expect, less likely to have voted for Trump than a 50-year-old white man living in Alabama. This is unobserved heterogeneity—characteristics that we do not observe, and therefore cannot control for, but which affect the response.

If we are not careful in dealing with grouped data, unobserved heterogeneity can be a major problem. The spherical errors assumption is usually not tenable for grouped data, since observations will be correlated with others in the same group. OLS will therefore be inefficient and yield invalid inferences, as we saw last week. Even worse, if group-level sources of unobserved heterogeneity are correlated with individual characteristics—if, say, younger voters are more likely to live in cosmopolitan states—OLS may also be biased and inconsistent.

On the other hand, if we deal with grouped data properly, we can enhance the credibility of our inferences. We can eliminate the influence of variation across groups, allowing us to focus on the comparisons within groups that we are usually most interested in.

Before we get started, a note of caution. One week allows us just enough time to scratch the surface of how to analyze grouped data. If you work with grouped data in your dissertation or other research, you should think carefully about your data structure and potential sources of unobserved heterogeneity. The methods we discuss this week may not solve your problems. As further references, I recommend Wooldridge (2002) and Gelman and Hill (2006).

12.1 The Linear Model with Grouped Data

We need to change our notation a bit to reflect the arrangement of observations into groups. Let there be \(G\) groups indexed by \(g = 1, \ldots, G\). Within each group, we have \(N\) observations indexed by \(n = 1, \ldots, N\).30 (In the special case of panel data, where each group is a unit observed over time, the standard notation is to use \(t = 1, \ldots, T\) instead.)

We will now index individual “rows” of the data by \(gn\), which stands for the \(n\)’th observation within group \(g\).

  • Unit-level
    • \(Y_{gn}\): response for the \(gn\)’th observation
    • \(\mathbf{x}_{gn}\): vector of \(K\) covariates for the \(gn\)’th observation
    • \(\epsilon_{gn}\): random shock to the \(gn\)’th response
  • Group-level
    • \(\mathbf{Y}_g\): vector of \(N\) responses for the \(g\)’th group
    • \(\mathbf{X}_g\): \(N \times K\) matrix of covariates for the \(g\)’th group
    • \(\epsilon_g\): vector of \(N\) random shocks for the \(g\)’th group
  • Full data
    • \(\mathbf{Y}\): vector of all \(GN\) responses, where \[\mathbf{Y} = \begin{bmatrix} \mathbf{Y}_1 \\ \mathbf{Y}_2 \\ \vdots \\ \mathbf{Y}_G \end{bmatrix}.\]
    • \(\mathbf{X}\): \(GN \times K\) matrix of covariates, where \[\mathbf{X} = \begin{bmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_G \end{bmatrix}.\]
    • \(\mathbf{D}\): \(GN \times G\) matrix of group membership indicators.
    • \(\epsilon\): vector of all \(GN\) random shocks
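To make the notation concrete, here is a quick sketch of how these stacked objects look in R for a toy dataset (the group and variable names are made up). The group-indicator matrix \(\mathbf{D}\) will reappear when we discuss fixed effects.

# Toy grouped data: G = 3 groups with N = 2 observations each
# (all names here are hypothetical).
dat <- data.frame(g = rep(c("a", "b", "c"), each = 2),
                  x = rnorm(6),
                  y = rnorm(6))

Y <- dat$y                                      # vector of all GN responses
X <- model.matrix(~ x, data = dat)              # GN x K covariate matrix (first column is the constant)
D <- model.matrix(~ factor(g) - 1, data = dat)  # GN x G group membership indicators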

If we assume a standard linear model for each \(gn\)’th response, \[ Y_{gn} = \mathbf{x}_{gn} \cdot \beta + \epsilon_{gn}, \] we end up with the familiar matrix equation \[ \mathbf{Y} = \mathbf{X} \beta + \epsilon. \]

You may look at this equation and think, “This looks like something we’ve used OLS on. What’s wrong with OLS?” Two reasons. First, at a minimum, we are unlikely to have spherical errors in grouped data. Observations in the same group—students in the same classroom, voters in the same state, LAPOP respondents in the same country—are likely to have residual correlation. There are unmeasured factors that commonly affect their responses. As in any non-spherical error model, this means OLS will yield invalid inferences. Moreover, because we are talking about autocorrelation and not just heteroskedasticity, the standard error correction that we encountered last week won’t do the trick.

Second, depending on the nature of the group-specific effects, we may also have a failure of strict exogeneity. I hope I have impressed on you by now that this is a major problem. To see why we have a failure of strict exogeneity, let us return to the example of voters living in states with cosmopolitan versus traditional cultures. If we cannot measure cosmopolitanism (as I am assuming), then it ends up in the error term of our model: \[ \epsilon_{gn} = \text{Cosmopolitanism}_g + \text{Other Stuff}_{gn}. \] But we know that younger and more educated voters are more likely to live in cosmopolitan areas. So if our covariate matrix includes age and education, we have \[ E[\epsilon \,|\, \mathbf{X}] \neq \mathbf{0}. \]

We will proceed from the easiest problems to the hardest. We will first consider a standard error correction and an efficiency improvement for the case where errors are correlated within groups but strict exogeneity still holds. We will then identify an unbiased estimator in the case where strict exogeneity fails.

12.2 Clustered Standard Errors

Imagine the following policy experiment. The federal government randomly selects half of the states to receive a grant intended to improve high school education, while the other half do not receive it. We observe some indicator of educational quality (e.g., graduation rates) at the school district level, where school districts are grouped within states. So our model looks like \[ \text{Quality}_{gn} = \beta_1 + \beta_2 \text{Grant}_g + \epsilon_{gn}. \] We want to know whether receiving the grant affected quality. What we cannot measure is how well the states used the money. If some states used it better than others, then we would expect the error term to be correlated across school districts within the state. In a state where the money was used wisely, we would expect most of the schools to do “better than expected”—to have positive residuals, in the language of regression. Conversely, in a state where the money was squandered, we would expect most of the schools to do worse than we would otherwise predict.
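Here is a toy simulation of this setup, just to make the mechanics concrete (all of the numbers and names below are invented). The state-level shock, shared by every district in a state, is exactly what induces the within-state correlation in the errors.

# Toy grant experiment: 50 states, 20 school districts per state.
set.seed(12)
G <- 50; N <- 20
state <- rep(1:G, each = N)
grant <- rep(rbinom(G, 1, 0.5), each = N)  # treatment assigned at the state level
alpha <- rep(rnorm(G, sd = 1), each = N)   # how well each state used the money (unobserved)
quality <- 70 + 2 * grant + alpha + rnorm(G * N, sd = 2)
fit_toy <- lm(quality ~ grant)
# The residuals from fit_toy are correlated within states because the shock
# alpha is shared by every district in a state, so the usual OLS standard
# errors overstate how much we have learned about the grant effect.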

This is one example of the general phenomenon Moulton (1986; 1990) identifies—that there is often substantial correlation in the random errors within groups, especially when we are looking at the effects of variables that only vary at the group level. One way I think about it is that, with grouped data and group-level covariates, the effective number of observations is less than the nominal number of observations. If we use the OLS standard errors, we are pretending to have more data than we really do.

Luckily, we can correct our standard errors for these “clustered” errors in much the same way as we handled heteroskedasticity of unknown form last week. The most we are willing to assume about \(\Omega = V[\epsilon \,|\, \mathbf{X}]\) is:

  • Heteroskedasticity of unknown form, within and across groups.

  • Autocorrelation of unknown form within groups.

  • No autocorrelation across groups.

Under these assumptions, \(\Omega\) has the block-diagonal form \[ \Omega = \begin{bmatrix} \Omega_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \Omega_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \Omega_G \end{bmatrix}, \] where each \(\Omega_g\) is a symmetric \(N \times N\) matrix. These may differ from each other—each group may have its own special autocorrelation structure.

If we knew \(\Omega\) exactly, we could use GLS to efficiently estimate \(\beta\) and obtain correct standard errors. That is unlikely. Instead, we will use OLS to obtain our coefficient estimates, then we will correct the standard errors so as to be approximately correct in large samples. The cluster-robust variance estimator is \[ \hat{\Sigma}_{\text{CR}} = (\mathbf{X}^\top \mathbf{X})^{-1} \left( \sum_{g=1}^G \mathbf{X}_g^\top \hat{e}_g \hat{e}_g^\top \mathbf{X}_g \right) (\mathbf{X}^\top \mathbf{X})^{-1}, \] where \(\hat{e}_g\) is the \(N \times 1\) vector of OLS residuals for the \(g\)’th group. Like White’s estimator for heteroskedasticity, this is a “sandwich” estimator. The “meat” of the sandwich here accounts for the within-group correlations in the error term.
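To demystify the formula, here is a bare-bones version of the sandwich computed by hand for an lm fit, assuming a grouping vector aligned with the rows of the data (the function name and arguments are my own, and no finite-sample correction is applied). In practice you should use a packaged implementation, such as plm's vcovHC() shown in the appendix.

# Cluster-robust sandwich for a fitted lm object `fit` and grouping vector `g`
# (both hypothetical).
cluster_vcov <- function(fit, g) {
  X <- model.matrix(fit)
  e <- residuals(fit)
  bread <- solve(crossprod(X))          # (X'X)^{-1}
  meat <- matrix(0, ncol(X), ncol(X))
  for (grp in unique(g)) {
    idx <- which(g == grp)
    s <- crossprod(X[idx, , drop = FALSE], e[idx])  # X_g' e_g
    meat <- meat + s %*% t(s)                       # X_g' e_g e_g' X_g
  }
  bread %*% meat %*% bread
}
# Cluster-robust standard errors are then sqrt(diag(cluster_vcov(fit, g))).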

Also like White’s estimator for heteroskedasticity, the cluster-robust estimator is consistent, but not unbiased. It approaches the truth as \(G\), the number of groups, grows large (holding fixed \(N\), the number of observations per group), but it may be badly biased in small samples. The consensus seems to be that \(G = 50\) is large enough (Cameron and Miller 2015), so you Americanists may go ahead and rejoice. However, if you observe a small number of units over a long period of time (e.g., in a study of the historical political economy of Western European countries), the cluster-robust standard error estimator will be severely biased. Beck and Katz (1995) provide a similar estimator for such data, except with the summation in the middle of the “sandwich” taken over the \(N\) time periods instead of the \(G\) groups.

The earliest derivation of this estimator appears in Liang and Zeger (1986), which is the piece you should cite when you use it. Arellano (1987) derives the same estimator but is a bit easier to follow. As with White’s estimator, there are various finite-sample corrections to \(\hat{\Sigma}_{\text{CR}}\) that you may want to use; we will not go through those here.

12.3 Random Effects

The coverage and notation in this section and the next one closely follow Johnston and DiNardo (1997, chap. 12).

Just as Huber-White standard errors do not fix the inefficiency of OLS under heteroskedasticity, cluster-robust standard errors do not fix the inefficiency of OLS under within-group correlation. To get a handle on the efficiency problem, we will make further assumptions about the source of that correlation.

We will assume each group \(g\) has a (potentially) different intercept. We can incorporate this into our original model, \[ Y_{gn} = \mathbf{x}_{gn} \cdot \beta + \epsilon_{gn}, \] by decomposing the error term into independent group-specific and individual-specific components: \[ \epsilon_{gn} = \underbrace{\alpha_g}_{\text{group}} + \underbrace{\eta_{gn}}_{\text{individual}}. \] In this equation, \(\alpha_g\) represents the difference between the intercept for group \(g\) and the overall average, while \(\eta_{gn}\) is an idiosyncratic shock specific to the \(n\)’th observation of group \(g\). We will assume that \(\eta_{gn}\) is independent and identically distributed across observations, so that the only source of autocorrelation is that observations in the same group share the same \(\alpha_g\).

When we decompose the error term like this, we are assuming implicitly that the \(\alpha_g\)’s are uncorrelated with the covariates in the model. Otherwise, strict exogeneity fails and the techniques that follow are useless. Let me put that another way—the random effects model depends on the assumption that the group-specific shocks are uncorrelated with the covariates. We will proceed under this assumption, and return shortly to the questions of testing it and what to do if it fails.

The random-intercepts model gives us a convenient structure for the \(\Omega\) matrix in GLS. Let \(V[\alpha_g] = \sigma^2_\alpha\) and \(V[\eta_{gn}] = \sigma^2_\eta\). The error variance matrix for the \(g\)’th group then works out to \[ \Omega_g = \begin{bmatrix} \sigma_{\eta}^2 + \sigma_{\alpha}^2 & \sigma_{\alpha}^2 & \cdots & \sigma_{\alpha}^2 \\ \sigma_{\alpha}^2 & \sigma_{\eta}^2 + \sigma_{\alpha}^2 & \cdots & \sigma_{\alpha}^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{\alpha}^2 & \sigma_{\alpha}^2 & \cdots & \sigma_{\eta}^2 + \sigma_{\alpha}^2 \end{bmatrix}. \] Since there is no autocorrelation across groups, \(\Omega\) takes the same block-diagonal form as in our discussion of cluster-robust standard errors.
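Written out in R, \(\Omega_g\) is just a constant \(\sigma^2_\alpha\) everywhere plus \(\sigma^2_\eta\) added along the diagonal. The numbers below are arbitrary placeholders.

# Error variance matrix for one group under the random-intercepts model.
sigma2_alpha <- 1.5
sigma2_eta <- 3.0
N <- 4
Omega_g <- matrix(sigma2_alpha, N, N) + diag(sigma2_eta, N)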

If we knew \(\sigma^2_\alpha\) and \(\sigma^2_\eta\), we could estimate \(\beta\) by GLS. It is unlikely, however, that we would know these in advance. Luckily, there is a compromise option available. Feasible GLS, or FGLS, entails using a pair of first-stage regressions to estimate the variance of \(\alpha\) and \(\eta\), then plugging these into the GLS formula. I won’t go through all the math, but the basic idea is as follows.

  1. Calculate the average response \(\bar{Y}_g\) and average covariate vector \(\bar{\mathbf{x}}_g\) for each group. Estimate \(\sigma^2_\alpha\), the between-group variance, using the residual variance of a regression of \(\bar{\mathbf{Y}}\) on \(\bar{\mathbf{X}}\).

  2. Estimate \(\sigma^2_\eta\), the within-group variance, using the residual variance of a regression of \(\mathbf{Y}\) on \(\mathbf{X}\) and \(\mathbf{D}\), the full set of group dummies. We will see the importance of this regression very shortly.

  3. Form the matrix \(\hat{\Omega}\) by plugging \(\hat{\sigma}^2_\alpha\) and \(\hat{\sigma}^2_\eta\) into the formulas above, then run GLS using \(\hat{\Omega}\) as the weighting matrix.

This gives us the random effects estimator, \[ \hat{\beta}_{\text{RE}} = (\mathbf{X}^\top \hat{\Omega}^{-1} \mathbf{X})^{-1} \mathbf{X}^\top \hat{\Omega}^{-1} \mathbf{Y}. \]
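For concreteness, here is a rough sketch of those three steps for a balanced panel stored in a data frame dat with response y, a single covariate x, and group identifier g (all names hypothetical). One wrinkle I glossed over: in the balanced case, the residual variance of the between-groups regression estimates \(\sigma^2_\alpha + \sigma^2_\eta / N\), so we subtract off the second piece. In practice, plm() does all of this for you.

# Step 1: "between" regression on the group means
means <- aggregate(cbind(y, x) ~ g, data = dat, FUN = mean)
fit_between <- lm(y ~ x, data = means)

# Step 2: "within" regression with the full set of group dummies
fit_within <- lm(y ~ x + factor(g), data = dat)
sigma2_eta <- sigma(fit_within)^2

# Step 3: variance components, then GLS via quasi-demeaning
N <- nrow(dat) / nrow(means)   # observations per group (balanced panel)
sigma2_alpha <- max(sigma(fit_between)^2 - sigma2_eta / N, 0)
theta <- 1 - sqrt(sigma2_eta / (sigma2_eta + N * sigma2_alpha))

# GLS with this Omega-hat is equivalent to OLS on quasi-demeaned data;
# note that the intercept of this regression estimates (1 - theta) * beta_1.
dat_qd <- transform(dat,
                    y_qd = y - theta * ave(y, g),
                    x_qd = x - theta * ave(x, g))
fit_re_by_hand <- lm(y_qd ~ x_qd, data = dat_qd)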

See Johnston and DiNardo (1997, 392–95) for full formulas and details. FGLS is consistent but not unbiased, so the random effects model may not be a good idea in small samples. If our specification of the error structure is correct, it is asymptotically efficient—as the sample size increases, no other consistent estimator has lower standard errors.

12.4 Fixed Effects

What if the group-specific intercepts are correlated with the covariates? Then, in order to maintain strict exogeneity, we must pull them out of the error term and into the covariate matrix. This is easy to do—we can rewrite the full model as \[ \mathbf{Y} = \mathbf{X} \beta + \mathbf{D} \alpha + \eta, \] where \(\alpha\) is the \(G \times 1\) vector of group-specific intercepts. This suggests that we run OLS on our covariates plus the full set of group membership indicators. (As with any set of indicators, we need to omit one category.) The first \(K\) elements of the resulting coefficient vector constitute the fixed effects estimator.

Most textbook treatments of fixed effect estimators go through a whole rigmarole about computation, because it used to be challenging to invert a \((K + G) \times (K + G)\) matrix when \(G\) was moderately large. This is no longer true,31 so you can safely ignore most of the hand-wringing about computational difficulties.
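If you want to see the dummy-variable version in action, here is a quick check using the Produc data introduced in the appendix below. The pcap and pc estimates should match those from the within estimator reported there; only the treatment of the intercepts differs.

# Fixed effects by brute force: OLS with a full set of state dummies.
library("plm")
data(Produc, package = "plm")
fit_lsdv <- lm(unemp ~ pcap + pc + factor(state), data = Produc)
coef(fit_lsdv)[c("pcap", "pc")]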

The standard errors of the fixed effects estimator are usually higher than those of the random effects estimator, since estimating \(G\) additional parameters uses a lot of degrees of freedom. This leads us to the following pair of observations.

  • If the random effects assumption is met (group-specific effects are uncorrelated with covariates), then the random effects and fixed effects estimators are both consistent, but fixed effects is less efficient.

  • If the random effects assumption is not met, then the random effects estimator is inconsistent while the fixed effects estimator is consistent.

The typical test for whether fixed effects are necessary comes from Hausman (1978). Under the null hypothesis that both estimators are consistent (and thus fixed effects are unnecessary and inefficient), the test statistic \[ H = (\hat{\beta}_{\text{RE}} - \hat{\beta}_{\text{FE}})^\top (\hat{\Sigma}_{\text{FE}} - \hat{\Sigma}_{\text{RE}})^{-1} (\hat{\beta}_{\text{RE}} - \hat{\beta}_{\text{FE}}) \] asymptotically has a \(\chi^2\) distribution with \(K\) degrees of freedom.

The other main drawback of fixed effects estimation is that you cannot estimate the effects of variables that do not vary within groups. (Why not?) See Greene (2003, sec. 13.5) for estimation strategies with panel data and time-invariant covariates.

One final note: Arellano (1987) shows that the cluster-robust variance matrix estimator can be used with fixed effects. Cameron and Miller (2015) recommend doing so, at least when the number of groups is large.

12.5 Appendix: Implementation in R

The methods introduced here can be implemented via the plm package.

library("plm")

We will use the Produc dataset from plm, a riveting collection of economic statistics about the U.S. states from 1970 to 1986.

data(Produc, package = "plm")
head(Produc)
##     state year region  pcap    hwy  water   util    pc   gsp    emp unemp
## 1 ALABAMA 1970      6 15033 7325.8 1655.7 6051.2 35794 28418 1010.5   4.7
## 2 ALABAMA 1971      6 15502 7525.9 1721.0 6255.0 37300 29375 1021.9   5.2
## 3 ALABAMA 1972      6 15972 7765.4 1764.8 6442.2 38670 31303 1072.3   4.7
## 4 ALABAMA 1973      6 16406 7907.7 1742.4 6756.2 40084 33430 1135.5   3.9
## 5 ALABAMA 1974      6 16763 8025.5 1734.8 7002.3 42057 33749 1169.8   5.5
## 6 ALABAMA 1975      6 17316 8158.2 1752.3 7405.8 43972 33604 1155.4   7.7

The functions in plm assume that your data are organized like Produc, with the grouping variable in the first column and the identification variable (time, in the case of panel data) in the second column. See the plm package vignette on how to get datasets not organized this way into line.
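Alternatively, you can tell plm() explicitly which columns identify the groups and time periods via its index argument, which saves you from reordering the data. The call below should be equivalent for Produc.

fit_pooling_ix <- plm(unemp ~ pcap + pc,
                      data = Produc,
                      index = c("state", "year"),
                      model = "pooling")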

We will treat unemployment (unemp) as our response and public capital stock (pcap) and private capital stock (pc) as our covariates. As a benchmark, let’s use OLS.

fit_ols <- lm(unemp ~ pcap + pc,
              data = Produc)
summary(fit_ols)
## 
## Call:
## lm(formula = unemp ~ pcap + pc, data = Produc)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.559 -1.622 -0.337  1.213 11.595 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.20e+00   1.08e-01   57.57   <2e-16
## pcap        1.01e-05   5.52e-06    1.82    0.069
## pc          2.56e-06   2.56e-06    1.00    0.319
## 
## Residual standard error: 2.2 on 813 degrees of freedom
## Multiple R-squared:  0.0352, Adjusted R-squared:  0.0328 
## F-statistic: 14.8 on 2 and 813 DF,  p-value: 4.79e-07

The “pooling” estimator implemented by plm() ought to give us the same results.

fit_pooling <- plm(unemp ~ pcap + pc,
                   data = Produc,
                   model = "pooling")
summary(fit_pooling)
## Pooling Model
## 
## Call:
## plm(formula = unemp ~ pcap + pc, data = Produc, model = "pooling")
## 
## Balanced Panel: n=48, T=17, N=816
## 
## Residuals :
##    Min. 1st Qu.  Median 3rd Qu.    Max. 
##  -3.560  -1.620  -0.337   1.210  11.600 
## 
## Coefficients :
##             Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 6.20e+00   1.08e-01   57.57   <2e-16
## pcap        1.01e-05   5.52e-06    1.82    0.069
## pc          2.56e-06   2.56e-06    1.00    0.319
## 
## Total Sum of Squares:    4060
## Residual Sum of Squares: 3920
## R-Squared:      0.0352
## Adj. R-Squared: 0.0328
## F-statistic: 14.8141 on 2 and 813 DF, p-value: 4.79e-07

We can obtain the cluster-robust variance matrix estimate via vcovHC(). Make sure to specify method = "arellano" so as to get the usual estimator. It is not entirely clear to me which of the various finite-sample adjustments corresponds to the defaults in Stata.

crvm_pooling <- vcovHC(fit_pooling,
                       method = "arellano",
                       type = "HC1")
summary(fit_pooling, vcov = crvm_pooling)
## Pooling Model
## 
## Note: Coefficient variance-covariance matrix supplied: crvm_pooling
## 
## Call:
## plm(formula = unemp ~ pcap + pc, data = Produc, model = "pooling")
## 
## Balanced Panel: n=48, T=17, N=816
## 
## Residuals :
##    Min. 1st Qu.  Median 3rd Qu.    Max. 
##  -3.560  -1.620  -0.337   1.210  11.600 
## 
## Coefficients :
##             Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 6.20e+00   2.45e-01   25.34   <2e-16
## pcap        1.01e-05   1.21e-05    0.83     0.41
## pc          2.56e-06   7.30e-06    0.35     0.73
## 
## Total Sum of Squares:    4060
## Residual Sum of Squares: 3920
## R-Squared:      0.0352
## Adj. R-Squared: 0.0328
## F-statistic: 6.45144 on 2 and 47 DF, p-value: 0.00334

Notice that our \(t\)-statistics are cut by more than half, even though our covariates do vary within groups (unlike in Moulton’s (1990) example).
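A quick side-by-side comparison of the two sets of standard errors, using the objects already in hand, makes the point.

# OLS versus cluster-robust standard errors for the pooling fit.
cbind(ols     = sqrt(diag(vcov(fit_pooling))),
      cluster = sqrt(diag(crvm_pooling)))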

We can also use plm() to estimate a random-effects model.

fit_random <- plm(unemp ~ pcap + pc,
                  data = Produc,
                  model = "random")
summary(fit_random)
## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = unemp ~ pcap + pc, data = Produc, model = "random")
## 
## Balanced Panel: n=48, T=17, N=816
## 
## Effects:
##                var std.dev share
## idiosyncratic 3.09    1.76  0.69
## individual    1.36    1.17  0.31
## theta:  0.656  
## 
## Residuals :
##    Min. 1st Qu.  Median 3rd Qu.    Max. 
##  -3.640  -1.360  -0.286   1.010   9.730 
## 
## Coefficients :
##             Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 5.67e+00   2.53e-01   22.42  < 2e-16
## pcap        1.55e-09   1.12e-05    0.00  0.99989
## pc          1.59e-05   4.55e-06    3.51  0.00048
## 
## Total Sum of Squares:    2920
## Residual Sum of Squares: 2800
## R-Squared:      0.0421
## Adj. R-Squared: 0.0398
## F-statistic: 17.88 on 2 and 813 DF, p-value: 2.52e-08

And, finally, a fixed-effects model, which plm() calls the "within" model.

fit_fixed <- plm(unemp ~ pcap + pc,
                 data = Produc,
                 model = "within")
summary(fit_fixed)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = unemp ~ pcap + pc, data = Produc, model = "within")
## 
## Balanced Panel: n=48, T=17, N=816
## 
## Residuals :
##    Min. 1st Qu.  Median 3rd Qu.    Max. 
##  -3.810  -1.170  -0.248   0.947   8.390 
## 
## Coefficients :
##      Estimate Std. Error t-value Pr(>|t|)
## pcap 2.27e-04   3.18e-05    7.14  2.2e-12
## pc   4.18e-06   6.57e-06    0.64     0.52
## 
## Total Sum of Squares:    2770
## Residual Sum of Squares: 2370
## R-Squared:      0.144
## Adj. R-Squared: 0.0889
## F-statistic: 64.2646 on 2 and 766 DF, p-value: <2e-16

We can extract the fixed-effect estimates themselves via fixef().

fixef(fit_fixed)
##        ALABAMA        ARIZONA       ARKANSAS     CALIFORNIA       COLORADO 
##       3.763902       3.201232       5.020980     -24.351816       1.906885 
##    CONNECTICUT       DELAWARE        FLORIDA        GEORGIA          IDAHO 
##       2.222801       5.723806      -2.950385       0.090978       5.926215 
##       ILLINOIS        INDIANA           IOWA         KANSAS       KENTUCKY 
##      -7.483176       1.499181       1.176446       1.258062       2.487150 
##      LOUISIANA          MAINE       MARYLAND  MASSACHUSETTS       MICHIGAN 
##       2.394977       6.132823      -0.469424      -0.081008      -2.073261 
##      MINNESOTA    MISSISSIPPI       MISSOURI        MONTANA       NEBRASKA 
##      -0.913146       4.790411       0.576112       5.398842       1.033804 
##         NEVADA  NEW_HAMPSHIRE     NEW_JERSEY     NEW_MEXICO       NEW_YORK 
##       6.121925       3.867146      -0.475632       5.875722     -22.416937 
## NORTH_CAROLINA   NORTH_DAKOTA           OHIO       OKLAHOMA         OREGON 
##       0.588848       3.714982      -5.208327       2.212210       4.913671 
##   PENNSYLVANIA   RHODE_ISLAND SOUTH_CAROLINA   SOUTH_DAKOTA       TENNESSE 
##      -6.740590       6.045561       3.895325       2.942687       1.080415 
##          TEXAS           UTAH        VERMONT       VIRGINIA     WASHINGTON 
##     -11.161384       4.233540       5.624496      -0.960964       1.080757 
##  WEST_VIRGINIA      WISCONSIN        WYOMING 
##       6.940764       0.191971       3.798969

If we wanted to include time dummies as well, we could specify effect = "twoways" in the fitting function.

fit_fixed_2 <- plm(unemp ~ pcap + pc,
                   data = Produc,
                   effect = "twoways",
                   model = "within")
summary(fit_fixed_2)
## Twoways effects Within Model
## 
## Call:
## plm(formula = unemp ~ pcap + pc, data = Produc, effect = "twoways", 
##     model = "within")
## 
## Balanced Panel: n=48, T=17, N=816
## 
## Residuals :
##    Min. 1st Qu.  Median 3rd Qu.    Max. 
## -3.3300 -0.8030 -0.0449  0.7500  6.0200 
## 
## Coefficients :
##       Estimate Std. Error t-value Pr(>|t|)
## pcap  9.10e-05   2.62e-05    3.47  0.00054
## pc   -1.27e-05   5.04e-06   -2.51  0.01216
## 
## Total Sum of Squares:    1260
## Residual Sum of Squares: 1240
## R-Squared:      0.0164
## Adj. R-Squared: -0.0689
## F-statistic: 6.23517 on 2 and 750 DF, p-value: 0.00206
fixef(fit_fixed_2, effect = "time")
##   1970   1971   1972   1973   1974   1975   1976   1977   1978   1979 
## 3.5548 4.2823 3.7710 3.1935 3.8319 6.5160 5.4577 4.9365 3.9381 3.8231 
##   1980   1981   1982   1983   1984   1985   1986 
## 5.1814 5.6677 7.7005 7.6848 5.6401 5.4863 5.3504

phtest() implements the Hausman test. Remember that the null hypothesis is that both estimators are consistent; if we reject it, then the random effects estimator is inconsistent and we must use fixed effects.

phtest(fit_random, fit_fixed)
## 
##  Hausman Test
## 
## data:  unemp ~ pcap + pc
## chisq = 95.9, df = 2, p-value <2e-16
## alternative hypothesis: one model is inconsistent
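As a cross-check, here is the statistic computed by hand from the two fits, following the formula in Section 12.4. This is only a sketch; phtest() handles details (such as a variance difference that is not positive definite) that I ignore here.

# Hausman statistic by hand, using the coefficients the two models share.
common <- names(coef(fit_fixed))                     # "pcap", "pc"
d <- coef(fit_random)[common] - coef(fit_fixed)[common]
V <- vcov(fit_fixed)[common, common] - vcov(fit_random)[common, common]
H <- as.numeric(t(d) %*% solve(V) %*% d)
pchisq(H, df = length(common), lower.tail = FALSE)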

References

Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. MIT Press.

Gelman, Andrew, and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Moulton, Brent R. 1986. “Random Group Effects and the Precision of Regression Estimates.” Journal of Econometrics 32 (3): 385–97.

Moulton, Brent R. 1990. “An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units.” The Review of Economics and Statistics 72 (2): 334.

Cameron, A. Colin, and Douglas L. Miller. 2015. “A Practitioner’s Guide to Cluster-Robust Inference.” Journal of Human Resources 50 (2): 317–72.

Beck, Nathaniel, and Jonathan N. Katz. 1995. “What to Do (and Not to Do) with Time-Series Cross-Section Data.” American Political Science Review 89 (3): 634–47.

Liang, Kung-Yee, and Scott L. Zeger. 1986. “Longitudinal Data Analysis Using Generalized Linear Models.” Biometrika 73 (1): 13.

Arellano, M. 1987. “Practitioners Corner: Computing Robust Standard Errors for Within-groups Estimators.” Oxford Bulletin of Economics and Statistics 49 (4): 431–34.

Johnston, John, and John DiNardo. 1997. Econometric Methods. 4th ed. McGraw-Hill.

Hausman, J. A. 1978. “Specification Tests in Econometrics.” Econometrica 46 (6): 1251.

Greene, William H. 2003. Econometric Analysis. 5th ed. Prentice Hall.


  29. Why did I leave out national characteristics, like inflation or the national unemployment rate? In a single-election study, these are constant. While they might shape variation across elections, they cannot explain variation within a single election.

  30. Everything we do here would go through if we allowed the number of observations to vary across groups, but the notation would get uglier. In the panel data context, we say that the panels are balanced if the number of observations is the same for each group and unbalanced otherwise.

  31. When I taught this class last year, I simulated a dataset with a single covariate, \(N = 1{,}000\), and \(G = 1{,}000\), meaning the fixed effects regression entailed estimating \(1{,}001\) parameters from \(1{,}000{,}000\) observations. It took less than two minutes to run on my not-especially-powerful laptop. That said, should you ever have \(G\) another order of magnitude above that, the computational tricks given in the textbook treatments will once again be of use.