The Trouble with Instruments: Re-Examining Shock-Based IV Designs

Vladimir Atanasov*
College of William and Mary, Mason School of Business

Bernard Black**
Northwestern University, Law School and Kellogg School of Management

(draft March 2015)

European Corporate Governance Institute Finance Working Paper 2014/xx
Northwestern University School of Law, Law and Economics Research Paper Number 14-xx
Available on SSRN at: http://ssrn.com/abstract=2417689

Abstract: Credible causal inference in accounting and finance research often comes from “natural” experiments. These natural experiments generate “shocks” which can be exploited using various research designs, including difference-in-differences (DiD), instrumental variables based on the shock (shock-based IV), and regression discontinuity (RD). There is much to be said for shock-based designs. Moreover, if one must use IV, shock-based IV designs are highly likely to be preferred to non-shock IV designs. But shock-based IV designs remain problematic. Often, a near-equivalent DiD design is available, and is usually preferable. We illustrate the problems with shock-based IV by re-analyzing three recent, high-quality papers. None of the IVs turn out to be valid. For Desai and Dharmapala’s (REStat 2009) study of the interaction between tax shelter opportunities and corporate governance, their first stage fails when we impose a balanced sample of firms with data both before and after the shock. For Duchin, Matsusaka and Ozbas’s (DMO) (JFE 2010) study of the effect of board independence on firm performance, their first stage also fails when we balance treated and control firms on the pre-shock proportion of independent directors.
For Iliev’s (JF 2010) RD/IV study of the cost of compliance with SOX § 404, we use combined DiD/RD and principal strata methods, and find cost estimates somewhat below his RD estimate, and well below his RD/IV estimate. The principal problem is that Iliev’s IV does not, for subtle reasons, satisfy the core “only through” condition (exclusion restriction) for a valid instrument. We discuss common themes that emerge from our re-analysis, including the fragility of IV compared to other shock-based designs; the need for covariate balance between treated and control firms; and the difficulty in satisfying the only-through condition. Our results suggest that even for shock-based designs, the scope for IV methods is very limited. Keywords: Instrumental variables; shock-based research design; exclusion restriction; covariate balance JEL codes: C26, G34, G38 * Mason School of Business, College of William and Mary, P.O. Box 8795, Williamsburg, VA 23187, vladimir.atanasov@mason.wm.edu, 757-221-2954. We owe special thanks to Mihir Desai and Dhammika Dharmapala; Peter Iliev; and Ran Duchin, John Matsusaka, and Oguzhan Ozbas for their willingness to share their datasets and statistical code with us, which made this project possible. We also owe strong thanks to Dhammika Dharmapala, Peter Iliev, and John Matsusaka for reviewing a draft of this paper and discussing with us our reinterpretation of their results. We also thank [*to come] and participants in finance workshops at Emory University (accounting department) Rice University (finance department), Rutgers University (finance department), [*others to come] for comments and suggestions; [to come] for research assistance; and the Searle Center on Law, Regulation and Economic Growth at Northwestern Law School for financial support. ** Corresponding author. Nicholas J. Chabraja Professor at Northwestern University, Law School and Kellogg School of Management. Tel. 312-503-2784, email: bblack@northwestern.edu. 
1. Introduction

Accounting and finance scholarship is moving toward greater stress on “identification” of causal effects. That has led to increased use of “natural” experiments, which exploit “shocks” that plausibly satisfy the core “as-if random assignment to treatment” and “only through” conditions for credible causal inference.1 For a recent survey, see our related paper (Atanasov and Black, 2015; below, AB-2015), on which we build here. Shocks can be exploited in a number of ways, including difference-in-differences (DiD), event studies (ES), regression discontinuity (RD), and instrumental variable (IV) designs, as well as combined designs such as DiD/RD (DiD on a sample limited to firms in a bandwidth around an RD threshold).

1 The only-through condition is often called an “exclusion restriction.” E.g., Angrist and Pischke (2009), § 4.1. We use the phrase “only through condition” to clarify what the exclusion restriction is excluding.

We focus here on IV designs where the instrument is, or is based on, an underlying shock. “Shock-based IV” designs generally rest on a much sounder basis than the non-shock IV designs that are often used in accounting and finance research. But they remain vulnerable to a number of threats to validity. We discuss those threats, some responses, and ways to improve shock-based IV designs.

We illustrate the fragility of shock-based IV designs by re-analyzing three recent, high-quality papers by strong authors: Desai and Dharmapala (Review of Economics and Statistics, 2009); Duchin, Matsusaka and Ozbas (Journal of Financial Economics, 2010); and Iliev (Journal of Finance, 2010).

Desai and Dharmapala (2009, below D&D) study how corporate governance mediates the effect of tax shelter opportunities on firm value. Their shock is 1996 Treasury regulations that simplified taxation for small private firms. As an unintended side effect, these rules increased tax shelter opportunities for multinational firms. D&D use this shock, interacted with measures of the firm’s need to shelter income, as instruments for “book-tax gap” (a proxy for tax sheltering). They find that greater sheltering opportunities increase firm value, but only for firms with high institutional ownership (a proxy for corporate governance).

Duchin, Matsusaka and Ozbas (2010, below DMO) study the effect of board independence on firm value and profitability. Their instrument for a change in board independence is whether a firm had to add independent directors to its audit committee to meet a 1999 New York Stock Exchange (NYSE) and NASDAQ requirement that audit committees consist entirely of independent directors (“Audit Committee Shock”). DMO find that a higher proportion of independent directors is value-neutral overall, but positive (negative) for firms with low (high) information costs. Over 2000-2005, firms in the top quartile of information cost that increase board independence by 10% (the amount predicted by their instrument) suffer a 3.0% drop in ROA relative to bottom-quartile firms; a 24% relative drop in Tobin’s q; and 31% lower cumulative share returns.

Iliev (2010) studies the cost of compliance with § 404 of the Sarbanes-Oxley Act (SOX) for firms near the compliance threshold (public float of $75M), using a combined regression discontinuity (RD) and IV design. His RD design exploits the discontinuity at $75M in float between firms that do and do not need to comply with SOX § 404. Iliev finds that some firms manipulate their float to stay below the $75M threshold, and uses IV to address this manipulation. His clever IV is whether a firm had float > $75M in 2002. This instrument relies on an SEC rule, adopted in 2003, which required compliance by firms with float > $75M in 2002 (before the $75M rule was known). Iliev estimates a mean increase in ln(audit fees) of 0.744 (110% increase) with RD alone, and 0.983 (167% increase) with combined RD/IV.
We count ourselves as enthusiastic participants in the move toward stronger research designs, often shock-based. We look for shocks and exploit them in our own work when we can.2 We began this project expecting to illustrate how the strategies for shock-based causal inference discussed in AB-2015 could be used to improve already strong papers, and perhaps lead to different insights. For example, DiD is similar to shock-based IV. It provides an “intent-to-treat” estimate of the effect of a shock on all firms to which the shock applies. Shock-based IV, using the same shock, provides an estimate for “compliers”: firms whose behavior is changed by the shock. What would we learn by applying DiD to a shock-based IV paper? What would change if we attended closely to “covariate balance” (the need for treated and control firms to be similar on pre-treatment covariates)? If we used “balancing methods,” adapted from pure observational studies, to improve covariate balance and ensure “common support” (reasonable overlap between treated and control firms on all covariates)? If we applied principal strata thinking (Frangakis and Rubin, 1999, 2002), which generalizes the “causal IV” concepts of always takers, never takers, compliers and defiers (Angrist, Imbens, and Rubin, 1996)? If we used a combined DiD/RD design where feasible?

2 See, for example, Atanasov et al. (2010) (exploiting a corporate governance shock in Bulgaria); Black, Jang and Kim (2006), Black and Kim (2012) (exploiting a governance shock in Korea).

To choose these three papers, we began with the eight shock-based IV papers in the AB-2015 sample.3 We put aside Bennedsen et al. (2007), who rely on a truly random shock, and picked what we saw as the next three strongest IV papers (as many as one paper could reasonably re-analyze with care). An Appendix discusses our concerns with IV validity for the remaining papers. In brief, we chose papers we liked, from authors we know and respect, that illustrated different uses of shock-based IV designs. All three papers begin with plausible, clearly exogenous instruments. All are careful in many ways. They address important issues, and are deservedly published in top journals. The authors were generous enough to share their datasets and code (and went to considerable trouble to do so). We thought our re-analysis would show some differences in inference, but did not know, and had no priors, whether those differences would be large or small.

3 In AB-2015, we surveyed the research designs used in 863 empirical corporate governance papers, published from 2001-2011 in 22 major journals. Of these, 285 use IV (either directly or an IV-based Heckman selection model), but only 8 papers use a shock-based IV (not counting Black, Jang and Kim, 2006, who use a “fuzzy RD” design).

Some projects, however, turn out differently than you expect.

For D&D, we apply both DiD and IV to a “balanced before-after sample” of firms which appear in their dataset in both 1996 (just before the shock) and 1997 (just after it). We find no evidence that the 1996 rule change affects book-tax gap for this balanced sample. Instead, their IVs, already “weak,” become insignificant predictors of book-tax gap. The first stage of their two-stage least squares (2SLS) analysis fails. We also discuss why their instruments are not exogenous (despite an exogenous shock). Moreover, even with the balanced before-after sample, there is substantial covariate imbalance between treated and control firms, and evidence of non-parallel trends between treated and control firms. One would need to address these issues if the results had otherwise survived.

For DMO, we apply a combination of DiD and balancing methods.
The treated firms in their sample (which had to change their audit committees) had, on average, far fewer independent directors than control firms (which already had 100% independent audit committees). If we compare treated firms to control firms with similar initial proportions of independent directors, the Audit Committee Shock no longer predicts a meaningful change in the proportion of independent directors. In effect, the first stage of DMO’s 2SLS analysis fails. Their IVs also likely violate the only-through condition. And their main 2SLS regression specification violates the standard advice to never include an interaction variable without including the non-interacted components. IV technology makes it easy to miss that flaw, which becomes apparent when we use DiD instead.

For Iliev, his core RD design is sound, though we would prefer a combined DiD/RD design. In re-assessing his combined RD/IV design, we apply a principal strata approach in which we divide firms into strata based on growth over 2002-2004. We estimate that SOX § 404 compliance increases ln(audit fees) for firms near the $75M threshold by around 0.60 (an 80% increase in fees), with smaller estimates for narrow bandwidths around the threshold. In contrast, Iliev’s RD-only estimate is a 110% increase and his preferred RD/IV estimate is a 167% increase in fees. Iliev’s higher RD estimate, and much higher RD/IV estimate, flow from a subtle violation of the covariate balance that a valid RD design should achieve, and an even subtler but much larger violation of covariate balance between the “compliers” with his IV and control firms, and thus a violation of the “only through” condition for a valid instrument (the instrument must predict the outcome only through the instrumented variable).
We show that the concern which led Iliev to use IV – firms that manipulated their float to avoid complying with SOX § 404 might have higher SOX compliance costs than the firms that let their float grow—is not a significant issue for his sample. Three strong papers, yet in all three, the instruments fail, for different reasons! And yet Bennedsen et al. (2007) aside, these papers appeared to be the stronger shock-based IV papers that we found in AB-2015. We have no reason to expect the IVs in the other four shock-based IV papers to survive similar scrutiny. We discuss these papers briefly in the Appendix. Indeed, even before any re-examination, only one of these four papers reports statistically significant results (at the 5% level) for their shock-based IVs. Nor is it likely that any of the IVs in the 276 non-shock IV papers are valid.4 Clearly, finding a valid instrument is a tricky business. 4 AB-2015 explain why none of these non-shock IVs in their study are likely to be valid. Larcker and Rusticus (2010) similarly conclude that none of the non-shock IVs in their review of accounting papers are likely to be valid. 7 Several common themes emerge from our analysis. 
These include: (i) the crucial role of covariate balance for shock-based designs; (ii) the importance of using extensive covariates (in part to check for covariate balance); (iii) the need for common support; (iv) the frequent need to use balancing methods, including sample trimming, to ensure covariate balance (including common support); (v) the value of exploiting a shock using more than one research design, using combined research designs where feasible, and assessing whether the same shock leads to similar results across designs; and (vi) use of principal strata analysis to clarify for what subsample one can estimate a “local average treatment effect (LATE).” Many of these steps can be completed in a “design phase” of the analysis, with outcomes hidden, to ensure that design decisions are not affected by knowledge of which approach will produce stronger results (Rubin, 2008). We skip the usual literature review, because the use of shock-based IVs in accounting and finance research is relatively new, and there is little to review. We build on the AB-2015 survey of shock-based research designs in corporate finance. Larcker and Rusticus (2010) and Roberts and Whited (2013) discuss the difficulties in finding valid IVs in accounting and finance research, but do not address shock-based instruments. Karpoff and Wittry (2015) and Catan and Kahan (2015) reassess and criticize DiD studies of state adoptions of antitakeover laws. We are not aware of use of principal strata methods in accounting or finance. Do our results imply that finance and accounting researchers should abandon efforts to find even shock-based IVs? Not quite. Consider Bennedsen et al. (2007), who use biological chance, which determines the gender of first-born children of CEOs of family-run firms, as an instrument for within-family CEO succession. In the AB-2015 survey, this was our favorite among the 77 74 shockbased papers. 
(Iliev (2010) was our second-favorite, and remains so, despite our re-analysis here.) We wrote that Bennedsen et al. “have, in effect, a randomized experiment, with an encouragement design. This is a beautiful paper.”

In the AB-2015 sample, Black, Jang and Kim (2006) use a fuzzy RD design: they exploit a legal shock to the board structure of large Korean firms (assets > 2 trillion Korean won), with no similar change for smaller firms. Yet “fuzzy RD is IV,” under a different name (Angrist and Pischke, 2009, § 6.2). In AB-2015, we classified this paper as RD, rather than IV (or both). Viewed as a shock-based IV design, Black, Jang, and Kim have a plausible instrument. Other fuzzy RD designs could be reasonably valid as well. There will also be times when applying both DiD in a primary analysis, and IV based on the same shock in a secondary analysis, provides insight into how change in a shocked variable affects the outcome. Still, fuzzy RD aside, valid shock-based IVs will be rare, and situations where one should use shock-based IV alone (not just as part of a DiD-primary study) will be rarer still.

Do our results imply, more broadly, that the current stress on shock-based designs in finance and accounting research is misguided? Not at all. We believe in exploiting shocks, when they can be found. DiD and RD designs will often be more robust than shock-based IV. And an imperfect shock-based paper will often still be more convincing than the non-shock alternatives. We share neither the view of some researchers, which can be caricatured as “endogeneity is everywhere, one can never solve it, so let’s stop worrying about it”; nor that of the “endogeneity police,” who believe that “if causal inference isn’t (nearly) perfect, a research design is (nearly) worthless.”

2. Background on Shock-Based Research Designs and Shock-Based IV

We offer here a condensed review of shock-based research designs.
We assume readers are generally familiar with the reverse causation and omitted variable bias risks that plague much corporate finance and accounting research, and how shock-based designs can respond to those risks. We focus on firm-level analyses; assume a binary shock ($w_i = 1$ if firm $i$ is “treated” (subject to the shock), and 0 otherwise); discuss the qualitative aspects of shock-based design; and refer readers to AB-2015 for details, regression mechanics, and citations to the causal inference literature.

2.1. Shock-based Designs in General

Firm-level causal analyses typically seek to estimate the causal effect $\tau_i$ (for firm $i$) of treatment on an outcome $y_i$, where $\tau_i$ is the value of $y_i$ if firm $i$ is treated, minus the value of $y_i$ if firm $i$ is not treated:

$$\tau_i = y_i(w_i = 1) - y_i(w_i = 0),$$

or, more compactly:

$$\tau_i = y_{i1} - y_{i0}$$

The “fundamental problem of causal inference” (Holland, 1986) is that we observe only one of the two potential outcomes, $y_{i1}$ and $y_{i0}$. The usual response is to impute the missing potential outcome for the treated firms from the control firms. The central challenge to imputation is “selection bias”: the treated and control firms may differ in one or more ways, perhaps unobserved, which will bias the estimated treatment effects.

A randomized experiment addresses selection bias by ensuring that treated and control firms have similar expected values for both observed and unobserved covariates. But randomized experiments are rarely available in corporate finance and accounting research. Shock-based designs are a second-best alternative. A shock-based design “works” only if, and to the extent, it creates conditions that come close to those one would achieve from a true randomized experiment.

Different shock-based designs, including DiD, RD, and IV, appear to rely on different assumptions. However, all rely on a “good shock”: one which permits credible causal inference. AB-2015 state five conditions for a good shock.
To summarize:

(1) Shock Strength: The shock should be strong enough to significantly change firm behavior or incentives.

(2) Exogenous Shock: The shock came from “outside” the system one is studying. Treated firms did not choose whether to be treated, could not change their behavior to anticipate the shock, the shock is expected to be permanent, and there is no reason to believe that which firms were treated depends on unobserved firm characteristics.

(3) “As If Random” Assignment: The shock must separate firms into treated and controls in a manner which is close to random. One often needs to allow an exception for the forcing variable which determines which firms are affected by the shock and, in some studies, a variable which is changed by the shock.

(4) Covariate Balance: The forcing and forced variables aside, the shock should produce reasonable covariate balance between treated and control firms, including “common support” (reasonable overlap between treated and control firms on all covariates). Somewhat imperfect balance can be addressed with balancing methods, but severe imbalance undermines shock credibility.

(5) Only-Through Condition(s): The apparent effect of the shock on the outcome must come only through the shock. There must be no other shock, at around the same time, that could affect treated firms differently than control firms. If, as in an IV analysis, one expects the shock to affect outcomes through a particular instrumented variable, the shock must affect the outcome only through that variable. In IV analysis, this is often called an “exclusion restriction”; we prefer the term “only-through condition.”

2.2. Shock-Based IV and Alternatives

Conditions (1), (2) and (5) are well-known for “causal IV” (e.g., Angrist and Pischke, 2009, § 4.1).
The exogeneity [as phrased in condition (2)] and only-through conditions are implicit in the formal “exogeneity” requirement, stated in econometrics texts, that Cov(z, ε) = 0, where ε is the unobservable true error from regressing the outcome y on the instrument z and other “exogenous” covariates x. But standard discussions of causal IV do not discuss the need for as-if random assignment or its corollary, covariate balance. Indeed, severe imbalance on core covariates will be central to our re-examination of DMO and Iliev.

Given a shock, one can often either run DiD based on the shock, or use the shock as an IV. The DiD design lets the researcher remain agnostic about the channels through which the shock affects the outcome. In contrast, the IV design forces the researcher to assume that the shock affects the outcome only through the instrumented variable. That assumption is not testable, and is often suspect.

Let us borrow from the language of randomized experiments with partial compliance. If the instrumented variable is binary (or can be made so by binning), one can see DiD as an “intent-to-treat” design, in which one estimates the average effect of the shock on all firms which were subject to the shock (assigned to treatment). In contrast, the IV design provides a “local average treatment effect” (LATE) for compliers (those firms which complied with the assignment to treatment by changing their instrumented variable in the predicted direction), at the cost of making an additional only-through assumption.

One goal of this paper is to highlight the similarity between DiD and shock-based IV. In our view, most uses of shock-based IV will benefit if the researchers also report intent-to-treat DiD results. Below, we illustrate the similarities between the two designs in our re-examination of D&D and DMO.

Following AB-2015, let q be the outcome, z be the instrumental variable, and gov (for governance) be the instrumented variable.
In 2SLS, the instrument z substitutes for the instrumented variable, and we make the only-through assumption that the power of the instrument to predict the outcome reflects the true power of the instrumented variable, here gov. This assumption is reflected in the 2SLS estimate of the coefficient on gov. Without covariates, this estimate is:

$$\hat{\beta}_{2SLS} = \frac{\mathrm{Cov}(z, q)}{\mathrm{Cov}(z, \mathit{gov})} \tag{1}$$

The 2SLS coefficient $\hat{\beta}_{2SLS}$ can also be expressed in terms of the intent-to-treat DiD coefficient $\hat{\delta}_{DiD}$:

$$\hat{\beta}_{2SLS} = \frac{\hat{\delta}_{DiD}}{\hat{\beta}_{1S}} = \frac{\text{effect of shock on } q}{\text{effect of shock on } \mathit{gov}} \tag{2}$$

Here $\hat{\beta}_{1S}$ is the coefficient on z from the first-stage regression of gov on z. Eqn. (2) is known as a Wald estimate. If we add covariates, the DiD and 2SLS estimators will diverge slightly, but should be quite similar.

Statistical strength should be similar as well. If the first stage is strong, so that the estimate of $\hat{\beta}_{1S}$ is precise, the t-statistics for $\hat{\beta}_{2SLS}$ and $\hat{\delta}_{DiD}$ will be similar. If the first stage is not strong, this will be reflected in a lower t-statistic for $\hat{\beta}_{2SLS}$ than for $\hat{\delta}_{DiD}$, which reflects the combined uncertainty in estimating both $\hat{\delta}_{DiD}$ and $\hat{\beta}_{1S}$.

If the first stage is not strong, the IV coefficient is prone to a “blowup” problem: the 2SLS coefficient will often be much larger than the non-IV coefficient one would obtain by regressing the outcome on the instrumented variable. Unless the IV is “perfect” (fully satisfies the only-through condition), the large 2SLS coefficient can be spurious, and will mostly reflect the direct effect of z on q, rather than the effect of gov. In our experience, a high (2SLS coefficient/non-IV coefficient) ratio is a strong warning sign for likely violation of the only-through condition.
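The Wald relationship and the “blowup” problem can be illustrated with a small simulation. The sketch below is not drawn from any of the re-examined papers: the data-generating process, the function names (`wald_2sls`, `simulate`), and all parameter values are assumptions chosen for illustration. It uses a single binary instrument, no covariates, and a cross-sectional difference in means as a stand-in for the intent-to-treat coefficient.

```python
import numpy as np

def wald_2sls(z, gov, q):
    """Return (beta_2sls, delta_itt, beta_1s, beta_ols) with one instrument and no covariates."""
    z, gov, q = (np.asarray(a, dtype=float) for a in (z, gov, q))
    cov_zq = np.cov(z, q)[0, 1]
    cov_zgov = np.cov(z, gov)[0, 1]
    var_z = np.var(z, ddof=1)
    delta_itt = cov_zq / var_z      # reduced form: effect of shock on q
    beta_1s = cov_zgov / var_z      # first stage: effect of shock on gov
    beta_2sls = cov_zq / cov_zgov   # eqn (1); equals delta_itt / beta_1s, as in eqn (2)
    beta_ols = np.cov(gov, q)[0, 1] / np.var(gov, ddof=1)  # non-IV coefficient
    return beta_2sls, delta_itt, beta_1s, beta_ols

def simulate(n, first_stage, direct_effect, true_beta=1.0, seed=0):
    """Hypothetical data: z shifts gov by first_stage and q directly by direct_effect."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)
    u = rng.normal(size=n)  # unobserved confounder of gov and q
    gov = first_stage * z + u + rng.normal(size=n)
    q = true_beta * gov + direct_effect * z + u + rng.normal(size=n)
    return z, gov, q

if __name__ == "__main__":
    # Strong first stage, only-through condition holds (no direct effect of z on q).
    print(wald_2sls(*simulate(200_000, first_stage=1.0, direct_effect=0.0)))
    # Weaker first stage plus a modest direct effect: the 2SLS coefficient inflates.
    print(wald_2sls(*simulate(200_000, first_stage=0.1, direct_effect=0.2)))
```

In the second scenario the direct effect of z on q is divided by a small first-stage coefficient, so the 2SLS coefficient moves far from the assumed true value of 1 even though the instrument is randomly assigned.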
Shock-based IV with multiple instruments, where the shock is interacted with other firm attributes, as in the D&D study, is also subject to the classical “weak instruments” problem: even if the only-through condition is completely satisfied, standard errors can be downward-biased if one uses multiple instruments which do not strongly predict the instrumented variable in the first-stage regression. A common rule of thumb is that one wants a first-stage F-statistic, for the instruments taken together, of at least 10 to have reasonable comfort that this bias will be small (Stock, Wright, and Yogo, 2002).

If a shock does not directly produce covariate balance between treated and control firms, balance can often be improved through balancing methods developed for pure observational studies, including trimming to common support, matching, and inverse propensity weighting. We use some of these methods below, but it is beyond our scope to discuss the many available methods and how to choose among them.5

5 See generally Imbens and Rubin (2014). For trimming to common support, see Crump et al. (2009); for matching, see Rosenbaum (2009); for inverse propensity weighting, see Busso, DiNardo and McCrary (2014).

2.3. Some Regression Details

Unless otherwise specified, all cross-sectional regressions in this paper use robust standard errors; all panel regressions use standard errors clustered on firm. D&D, DMO, and Iliev vary in whether they report standard errors or t-statistics; we report t-statistics throughout.

3. Re-Examination of Desai and Dharmapala (2009)

The first shock-based IV paper we examine is Desai and Dharmapala (2009). The authors examine the effect of tax avoidance on the value of U.S. firms, and whether corporate governance mediates this effect. They find evidence that greater tax sheltering opportunities increase Tobin’s q, but only for firms with high institutional ownership (which, they posit, are better governed).

3.1. Research Design

D&D rely for causal inference on a legal rule adopted in late 1996 (“check-the-box” rules for pass-through taxation of non-public U.S. firms) as a legal shock. These rules were intended to simplify reporting for small, private companies. As an unintended byproduct, they also reduced tax sheltering costs for large public firms that had, or could create, offshore subsidiaries. They hypothesize that: (i) the difference between “book” income reported to shareholders, and taxable income reported on income tax returns (book-tax gap, below “BTG”) is a good (or at least best available) measure of a firm’s tax sheltering activities; (ii) firms with low BTG before the 1996 law change would gain more sheltering opportunities than firms with higher BTG; and (iii) at firms with high institutional ownership, greater sheltering will increase firm value (proxied by tax-adjusted Tobin’s q). In contrast, at worse-governed firms, insiders will appropriate the additional value, so Tobin’s q will not rise.

BTG, however, is endogenous: it can be affected by many firm attributes, some unobserved. D&D address that issue by constructing shock-based instruments for BTG: they interact the 1996 shock with three firm attributes that predict the firm’s need to shelter its income: net operating loss carryforwards (NOLs), short-term debt, and long-term debt, each scaled by total assets. These variables are endogenous too. The D&D idea is that interacting them with an exogenous legal shock will make the instruments effectively exogenous as well. They address the only-through condition for instrument validity by including NOLs, short-term debt, and long-term debt as separate covariates in their IV analysis. It is then plausible that the interacted variables will affect Tobin’s q only indirectly through BTG, and not directly, nor indirectly through an omitted variable. This is wonderfully clever.

The research design can fail in many ways.
BTG might be a poor proxy for tax sheltering. High-BTG firms (which already engage in tax sheltering) might be better at exploiting the new check-the-box opportunities than low-BTG firms. Institutional ownership might be a poor proxy for firm governance. The IV strategy treats firm financial characteristics, including the ones they use as instruments, as exogenous when we know they are not. And the IVs might not be strong enough. But the authors begin with an exogenous shock and carefully defend exogeneity. Assuming they find results, and they do, the research design is reasonably convincing. Or so we thought when we first looked at their study. The editor and reviewers at REStat, a major journal, were also convinced.

D&D report results with firm and year fixed effects, and use the following covariates: total accruals/assets, ratio of option to total compensation, sales, implied volatility of share price, NOLs/assets, short-term debt/assets, long-term debt/assets, |foreign income or loss|/assets, and R&D expenditures/assets. Their principal dependent variable is tax-adjusted Tobin’s q. We use the same variables; see D&D for details and summary statistics. Both they and we use standard errors with firm clusters.6

3.2. Creating a “Pre-Post Balanced” Dataset

The D&D dataset includes 862 firms observed at least once over 1993-2001. Of these, 100 are observed only once, so effectively drop out with firm FE, leaving an effective sample of 762 firms.

A general theme of this paper is the importance of sample selection, including ensuring covariate balance. We therefore begin by assessing which firms belong in the sample. Consistent with good practice in a natural experiment, we do so in a “design” stage of our analysis, with outcomes hidden (Rubin, 2008; Rosenbaum, 2009).

D&D’s central empirical method is shock-based IV, using instruments that interact a post-1996 dummy with NOLs, short-term debt, and long-term debt (below, “BTG predictors”).
As they recognize, this design relies on the shock for exogeneity.7 There is no basis for causal inference for firms that are in the sample only “pre-1996” (including 1996) or only post-1996. Yet, of the 762 firms in the effective D&D sample, only 510 appear both before and after the 1996 shock. Of the other 252 firms, 80 appear only pre-1996. Including these firms can affect the coefficients on covariates, but will not directly affect IV estimates, because the shock*covariate IVs are zero in 1996 and earlier years. The 172 firms that appear only post-1996 are more troublesome. For these firms, the IVs are identical to the covariates that D&D use to predict BTG and are equally endogenous. Thus, one should limit the sample to, at most, the 510 firms that appear both before and after the shock.

6 D&D use individual firm dummies. Each absorbs a degree of freedom. One can avoid the loss of degrees of freedom and obtain slightly smaller standard errors using the xtivreg2.ado module for Stata. For consistency with their results, we also use individual firm dummies.

7 See D&D at 542 (“In order to address [endogeneity] concerns, an exogenous source of variation in firms’ opportunities for tax avoidance is required.”)

We next consider the remaining 510 firms. Of these, 487 have data in both 1996 and 1997; 8 have data at least once over 1993-1995, but not in 1996 (the last pre-shock year); and 15 firms have data at least once over 1998-2001, but not in 1997 (the first post-shock year). There may be something odd about the 23 firms which pop into and out of the sample. We confirm in unreported regressions that they are very different from the other 487 firms on several covariates.8 Moreover, one goal of our paper is to compare IV and DiD results using the same shock. For DiD analysis, we require data in 1996 because we define treated and control groups based on 1996 covariates.
Given the small loss in sample size, the differences between “popin” and other firms, and the desire to use a similar sample for DiD and IV analyses, we use a “pre-post balanced sample” of 487 firms with data in both 1996 and 1997 in our principal analyses. Below, we first replicate the D&D results with their sample and then switch to the pre-post balanced sample.

8 For each covariate, we regress the covariate on a “popin” dummy (=1 for these 23 firms; 0 otherwise), year dummies (omitting 1996), and constant term. The constant term then gives the mean for non-popin firms in 1996. The coefficients on popin dummy are often large relative to the constant term. For NOLs, the coefficient on the constant term is near zero at 0.011; the coefficient on popin dummy is 20 times larger, at 0.231, although statistically insignificant (t = 1.037). Thus the popin firms are much more likely to have NOLs than other firms. For sales, the constant term coefficient equals 3.409 vs. the popin dummy coefficient of -1.647 (t-stat = -1.965), suggesting that popin firms have average sales almost 50% lower than the other firms. For R&D, the coefficient on the constant term is 0.043; the coefficient on popin dummy is similar in magnitude at 0.039 and marginally significant (t = 1.93).

3.3. Firm FE Results (D&D, Table 3)

In Table DD-1 we replicate the pooled OLS results in D&D, Table 3. The dependent variable is tax-adjusted Tobin’s q, as defined by D&D. Following D&D, we estimate two models: Model 1 uses BTG as the principal independent variable of interest; Model 2 uses three main independent variables: BTG, institutional ownership, and, of core interest for D&D, an interaction between BTG and institutional ownership. We estimate each model separately with the original sample and the pre-post balanced sample. Consider first Model (1). The results for the original and pre-post balanced samples are similar.
In both, the coefficient on BTG is positive, at around 0.6, but not statistically significant. The differences between the two samples are more pronounced when estimating Model (2). With the original sample, the coefficients on the interaction between BTG and institutional ownership are positive and statistically significant. With the pre-post balanced sample, the coefficient is somewhat smaller and only marginally significant. But there are large differences between the original and pre-post balanced samples in the coefficients on BTG and institutional ownership, which suggest that the additional firms in the full sample differ from those in the balanced sample.

3.4. DiD and DiDiD Results

Given our interest in using multiple research designs based on the same shock, we next consider a DiD and then a DiDiD framework, and report results in Table DD-2. D&D hypothesize that: (i) the 1996 check-the-box rules expand tax sheltering opportunities more for firms which had done less sheltering (measured by BTG) than for higher-BTG firms; and (ii) additional sheltering adds value only for firms with high institutional ownership. In our DiD model, we define treated firms as firms with below-median BTG in 1996 (lowBTG96 dummy = 1) and control firms as firms with above-median BTG in 1996 (lowBTG96 dummy = 0). The median BTG for the pre-post balanced sample in 1996 is 0.001. We report two specifications: one with lowBTG96 dummy as the only independent variable, and the other including time-varying controls. In both specifications, the coefficients on lowBTG96 dummy are close to zero. There is no evidence that firms with low BTG in 1996 achieve higher Tobin’s q relative to high BTG firms, following the 1996 rule change.

In our DiDiD model, we define the third difference as firms with above-median institutional ownership in 1996 (highInstOwn96 = 1) versus below-median institutional ownership (highInstOwn96 = 0). The median institutional ownership in 1996 is 0.58.
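The logic of the median-split DiD and DiDiD comparisons can be illustrated with cell means. The sketch below is not the regression specification behind Table DD-2 (it has no firm or year fixed effects, covariates, or clustered standard errors), and every number in it is hypothetical; it shows only how the two dummies and the double and triple differences are constructed.

```python
from statistics import mean, median


def did(rows, group_key):
    """Difference-in-differences of mean q: (treated post - pre) - (control post - pre)."""
    def cell(group, post):
        return mean(r["q"] for r in rows if r[group_key] == group and r["post"] == post)
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))


def didid(rows, group_key, third_key):
    """Triple difference: DiD among third_key == 1 minus DiD among third_key == 0."""
    high = [r for r in rows if r[third_key] == 1]
    low = [r for r in rows if r[third_key] == 0]
    return did(high, group_key) - did(low, group_key)


# Hypothetical 1996 characteristics for four firms (illustrative values only).
firms_1996 = {
    "A": {"btg": -0.02, "inst_own": 0.70},
    "B": {"btg": -0.01, "inst_own": 0.40},
    "C": {"btg": 0.03, "inst_own": 0.65},
    "D": {"btg": 0.05, "inst_own": 0.30},
}
btg_median = median(f["btg"] for f in firms_1996.values())
inst_median = median(f["inst_own"] for f in firms_1996.values())

# Hypothetical outcomes: (firm, post-shock dummy, Tobin's q).
outcomes = [
    ("A", 0, 1.0), ("A", 1, 1.5),
    ("B", 0, 1.0), ("B", 1, 1.1),
    ("C", 0, 2.0), ("C", 1, 2.2),
    ("D", 0, 2.0), ("D", 1, 2.1),
]

rows = [
    {
        "q": q,
        "post": post,
        # Treated firms: below-median BTG in 1996.
        "lowBTG96": int(firms_1996[firm]["btg"] < btg_median),
        # Third difference: above-median institutional ownership in 1996.
        "highInstOwn96": int(firms_1996[firm]["inst_own"] > inst_median),
    }
    for firm, post, q in outcomes
]

did_estimate = did(rows, "lowBTG96")
didid_estimate = didid(rows, "lowBTG96", "highInstOwn96")
print(round(did_estimate, 3), round(didid_estimate, 3))
```

In a regression framework the same quantities would be the coefficients on the treated*post and treated*third-difference*post interactions.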
We add highInstOwn96 dummy and its interaction with lowBTG96 dummy to our DiD models. D&D predict a positive coefficient on the interaction term. We find, however, that the coefficients on the interaction term are small, with t-statistics well below one, and with mixed sign -- positive without controls, but negative with controls.

To assess whether our non-results reflect the binary nature of our low BTG and high institutional ownership variables, we consider continuous versions of these variables in unreported results. In a “DiD-continuous” model, the coefficient on BTG is negative (consistent with Table DD-2) but not statistically significant. We then estimate three “DiDiD-continuous” models: (i) continuous BTG and binary institutional ownership; (ii) continuous institutional ownership and binary BTG; and (iii) continuous BTG and continuous institutional ownership. The coefficients on the triple interaction term are insignificant in all specifications, and are positive with binary BTG but negative with continuous BTG.

In sum, our DiD/DiDiD analysis provides no evidence that firms with low tax shields prior to the tax law change realized higher Tobin’s q after the change, nor evidence for the D&D hypothesis that only firms with both low tax shields and high institutional ownership gained from the law change.

This is remarkable. The core results from a careful paper by major scholars, published in a top journal known for attention to empirical methods, vanish when we use the pre-post balanced sample and switch from IV to a DiDiD design. They will not reappear below, when we apply their IV analysis to the pre-post balanced sample. We conduct that replication next, and then explore why their results went away.

3.5. Shock Strength (D&D, Table 4)

We next replicate the first-stage IV results in D&D, Table 4, first with the D&D original sample and then with our pre-post balanced sample. We present results in Table DD-3.
With the original D&D specification, the instruments are statistically significant but “weak.” They have F-statistics of 3.33 (model (1)) or 3.00 (model (2)), well below the F > 10 rule of thumb for avoiding the classical weak instruments problem. With the pre-post balanced sample, the F-statistics drop to only 1.48 (model (1)) or 1.05 (model (2)), and are statistically insignificant. The instruments no longer predict the instrumented variables strongly enough to be usable. The greater (though still modest) strength of the instruments with the original sample turns out to be driven by post-1996-only firms, for which the instruments are clearly endogenous.

To further explore instrument strength, we recast the first-stage IV regression in a DiD/DiDiD framework and present results in Table DD-4. We use a similar strategy as in Table DD-2 to define treated and control firms, but instead of using BTG, we construct an index based on the three instruments (NOL, short-term debt, and long-term debt) as of 1996. We rank each firm in 1996 on each instrument, and sum the ranks. We use the sum of ranks to define a dummy variable, lowTaxShield96 = 1 for firms with above-median sum of ranks; 0 otherwise. We treat the above-median firms as treated and the below-median firms as controls. For the DiDiD model, we again define the third difference as above vs. below median institutional ownership in 1996.

None of the DiD/DiDiD coefficients in Table DD-4 are statistically significant. Thus, there is no evidence that the tax rule shock led firms with below-median tax shields (hence greater need to shelter income) to take advantage of the check-the-box rules and increase their BTG, relative to firms with above-median tax shields. The analyses in Tables DD-3 and DD-4 are consistent. The opportunity for sheltering created by the tax rule may well lead some firms to engage in more sheltering, but it does not do so differentially for firms with high versus low apparent need for sheltering.
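A minimal sketch of the sum-of-ranks index follows. The text does not state the direction of the ranking or how ties are handled, so two assumptions are made here and flagged in the code: rank 1 goes to the largest value of each instrument (so that a high rank sum corresponds to low tax shields), and tied firms share the average rank. The firm values are hypothetical.

```python
from statistics import median


def ranks_descending(values):
    """Rank 1 = largest value (assumption); ties share the average rank (assumption)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next firm ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


# Hypothetical 1996 values, each scaled by assets (illustrative only).
firms = ["A", "B", "C", "D"]
nol = [0.20, 0.00, 0.05, 0.00]
short_debt = [0.10, 0.02, 0.08, 0.01]
long_debt = [0.30, 0.05, 0.25, 0.10]

rank_sum = [
    sum(triple)
    for triple in zip(
        ranks_descending(nol),
        ranks_descending(short_debt),
        ranks_descending(long_debt),
    )
]
cutoff = median(rank_sum)

# lowTaxShield96 = 1 for firms with an above-median sum of ranks.
low_tax_shield_96 = {f: int(s > cutoff) for f, s in zip(firms, rank_sum)}
print(rank_sum, low_tax_shield_96)
```

Under the opposite ranking convention the dummy would simply flip to below-median rank sums; the construction is otherwise unchanged.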
This weak shock cannot identify a causal connection between sheltering need (proxied by BTG) and firm value, whether or not mediated by institutional ownership.

3.6. 2SLS Estimates (D&D, Table 5)

One might stop here, but for completeness, we replicate the second-stage IV estimates from D&D Table 5, models (1)-(3). We report results with the original and pre-post balanced samples in Table DD-5. The coefficients on instrumented BTG*institutional ownership decrease by more than 50% when we switch from the original to the pre-post balanced sample and become insignificant. We again see that including firms observed only post-1996 was central to the original results.

3.7. Intent-to-Treat DiD and DiDiD Estimates

We turn next to additional steps one would want to take, for a shock-based IV study which had robust 2SLS results. In the spirit of this paper, in which we seek to use both DiD and IV designs to study the same shock, we present Intent-to-Treat DiD and DiDiD models in Table DD-6. In the DiD analysis in regressions (1) and (2), we regress tax-adjusted Tobin’s q on the interaction between lowTaxShield96 and a post-shock dummy (plus constant term and firm and year FE). Regression (1) omits covariates, regression (2) adds them. We call these “intent-to-treat” models (using language borrowed from IV methods) because greater need in 1996 to shelter income can be seen as encouraging firms to take advantage of the check-the-box rules (as D&D hypothesize), but does not require them to do so. The coefficient on the interaction term is positive and marginally significant in regression (1) but falls in magnitude and becomes insignificant when we add covariates.

For the DiDiD analysis, we proceed similarly to Table DD-2. We add as additional variables highInstOwn96 * post and the triple interaction lowTaxShield96 * highInstOwn96 * post. The coefficients on the triple interaction are positive in both specifications but not close to being statistically significant.
This is consistent with the encouragement being too weak to induce much of a differential response by firms with both higher sheltering need and high institutional ownership.

3.8. Covariate Balance

In randomized trials and pure observational studies, it is customary to assess covariate balance between the treated and control firms. Reporting covariate balance is not the norm in DiD and shock-based IV studies, but in our judgment, should be. In Table DD-7, we present limited results for the pre-post balanced sample. We use three definitions of treated and control firms. In Panel A, we divide firms into treated and control based on BTG in 1996. Treated firms have below-median BTG (lowBTG96 = 1); control firms have above-median. In Panel B, we divide the same firms based on institutional ownership in 1996. Treated firms have above-median institutional ownership (highInstOwn96 = 1). In Panel C, we divide firms based on our sum-of-ranks based measure of overall need to shelter taxable income. Treated firms are those with greater sheltering needs in 1996 (lowTaxShield96 = 1).

In each panel we report means for the treated and control groups for 1996 for the covariates used in prior tables. We also report absolute values of t-statistics from a two-sample t-test for difference in means and a measure of “normalized differences,” suggested by Imbens and Rubin (2015). Unlike t-values, the normalized difference does not increase with sample size. A fuller check for covariate balance might include assessing whether the treated and control groups show similar dispersion around the mean, visual inspection of kernel density plots for each covariate, and running a Kolmogorov-Smirnov test for similarity of the full distributions.

In Panels A and B, treated and control firms are relatively balanced on most covariates.
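The contrast between the two balance measures can be made concrete. The sketch below uses the Imbens and Rubin normalized difference, (mean_t - mean_c) / sqrt((s_t^2 + s_c^2) / 2), alongside an unequal-variance two-sample t-statistic. The choice of the unequal-variance form is an assumption (the text does not say which t-test is used), and the covariate values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def normalized_difference(treated, control):
    """Difference in means scaled by the average of the two sample variances."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def t_statistic(treated, control):
    """Two-sample t-statistic with unequal variances (an assumed choice)."""
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return (mean(treated) - mean(control)) / se


# Hypothetical covariate values for a small sample (illustrative only).
treated_small = [1.0, 2.0, 3.0, 4.0, 5.0]
control_small = [2.0, 3.0, 4.0, 5.0, 6.0]

# Same distributions with four times as many observations.
treated_large = treated_small * 4
control_large = control_small * 4

nd_small = normalized_difference(treated_small, control_small)
nd_large = normalized_difference(treated_large, control_large)
t_small = t_statistic(treated_small, control_small)
t_large = t_statistic(treated_large, control_large)

print(round(nd_small, 3), round(nd_large, 3))
print(round(t_small, 3), round(t_large, 3))
```

Quadrupling the sample leaves the normalized difference roughly where it was, while the t-statistic roughly doubles, which is why the normalized difference is the more informative gauge of balance in large samples.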
However, low BTG firms have significantly more share price volatility, and firms with high institutional ownership grant more options, as a proportion of total compensation. Balance is substantially worse in Panel C, when we classify firms based on overall need to shelter income, proxied by lowTaxShield96. Treated firms have much lower Tobin’s q, higher sales, and lower share price volatility, and do less R&D. This imbalance would raise concerns for any shock-based design. If the D&D results had survived to this stage, one would want to address this imbalance using balancing methods. We illustrate that process below, in our re-examination of DMO.

3.9. Non-Parallel Trends

In our judgment, in any DiD or shock-based IV study, one should check for parallel trends on the outcome variable between treated and control firms during the pre-treatment period. Checks for parallel trends are sometimes conducted in DiD studies, but rarely in IV studies (DMO is an exception). Non-parallel trends are a major warning sign for both designs.

In Figure DD-1, we perform such a check. We show mean Tobin’s q by year for “treated” firms with below-median tax shields in 1996 (lowTaxShield96 = 1), hence greater need to shelter income, versus “control” firms with above-median tax shields (lowTaxShield96 = 0). The high-tax shield controls have much higher Tobin’s q in all years. This is consistent with the large difference in means we found in Table DD-7, Panel C, and provides further evidence that they are not a suitable control group for the treated firms.

Moreover, there is clear evidence of non-parallel trends. Mean Tobin’s q rises for the control firms, relative to treated firms, in 1996, just prior to treatment. Mean Tobin’s q for the controls rises again in 1999, relative to treated firms, well after the shock, and then falls in 2000 and 2001. The changes over 1999-2001 suggest differing reaction to the “tech bubble” of the late 1990s, which “popped” in 2000 and 2001.
Given these non-parallel changes at times unrelated to the shock, even if an effect had been found for (say) 1997 versus 1996, one could not ascribe it with confidence to the shock.

Non-parallel pre-treatment trends (or, as in D&D, non-parallel post-treatment trends at a time not consistent with response to the shock) severely undermine the credibility of one’s results. One sometimes sees researchers addressing non-parallel trends by adding linear time trends to a regression specification. In the D&D study, one would add an interaction f_i*t (where f_i are firm dummies and t is year) to a panel data specification with firm and year fixed effects. But adding linear trends will rescue the design only if (i) any non-parallel effects in the pre-treatment period were linear; and (ii) the linear trend would have continued in the post-treatment period, but for the shock. Yet a trend without known cause might also stop, or even reverse, in the post-shock period. Non-parallel pre-treatment trends can sometimes be addressed through careful balancing of the treated and control groups, or adding covariates that absorb the non-parallelism. Short of that, no robust shock-based specification is available.

3.10. Principal Strata Approach (Optional)

Our analysis suggests a possible way to strengthen the D&D design. In the intent-to-treat analysis in Table DD-6, the coefficients have the predicted signs and nontrivial magnitudes. They are just not statistically significant. Perhaps the authors’ hypothesis is right, but their instruments are not strong enough. If one could find stronger proxies for firms’ need to shelter their income, the results might strengthen as well.

Perhaps too, some firms respond to tax sheltering opportunities, while others do not. If one could isolate and study a subsample of responsive firms, the results might strengthen. This idea can be seen as an adaptation of principal strata analysis, an approach that we develop below in our reexamination of Iliev.
Assume that there are latent strata of responsive and nonresponsive firms, analogous to compliers and noncompliers in a standard causal IV analysis, or to shrinkers and modest growers in our discussion below of Iliev. The tax reform shock will be a weak shock for the non-responsive firms, but should be stronger for the responsive firms, perhaps enough so to rescue the design, and let D&D investigate their interesting hypothesis. This approach is related to methods for strengthening an instrument which affects some units more than others, by excluding from the sample units for which the instrument is weak (Baiocchi et al., 2010; Small and Rosenbaum, 2008; Keele and Morgan, 2013). However, separating firms into responsive and non-responsive strata would require tax expertise, which we lack.

4. Re-Examination of Duchin, Matsusaka, and Ozbas (2010)

4.1. Overview

DMO examine two research questions: (1) does a change in the proportion of independent directors on company boards causally affect three outcomes (Tobin’s q, ROA, and share returns), and (2) how does any effect of independent directors on these outcomes depend on the firm’s informational environment. DMO hypothesize that adding independent directors can improve performance in firms with low information acquisition costs, yet be counterproductive for firms with high information acquisition costs. This is an interesting and plausible hypothesis, and the authors develop a creative way to test it.

DMO recognize that firms endogenously choose board composition. They rely for causal inference on a legal shock to audit committee independence, which could lead some firms to add independent directors to their boards, to staff the audit committee. The shock is a 1999 change in NYSE and Nasdaq listing rules that forced listed firms to have 100% independent directors on the audit committee. Previously, a majority had been permitted. These listing requirements were later included in SOX.
DMO argue that this rule change will affect only firms that lack a fully independent audit committee in 2000 (the rule was adopted in late 1999 and became effective after the spring shareholder meeting season, when most firms elected their 2000 boards). Firms which already had 100% independent audit committees serve as controls. This can be seen as an “encouragement” design, where the law change encourages firms with less-than-100%-independent audit committees to add independent directors to their boards.

One could exploit the shock using DiD, but DMO choose instead to use IV. Their core independent variable is the change in percentage of independent directors from 2000 to 2005 (δIndep). Their instrument for δIndep is the shock -- a dummy variable that equals 1 for firms without a fully independent audit committee in 2000 (“non-comply dummy”). They use 2SLS together with a first-differences specification, using 2005-minus-2000 differences in the outcomes and their covariates. They use board size, leverage, firm age, ln(market capitalization), and industry fixed effects (for the 48 Fama-French industries) as covariates.

Instrumented δIndep is DMO’s first variable of interest. They find, consistent with prior literature, that δIndep alone is not a significant predictor of their outcome variables. Their main contribution, besides testing the effect of board independence in a causal setting, is to hypothesize (plausibly) that the effect of adding independent directors depends on the firm’s information environment: if a DMO-constructed information cost index (“Info Cost”), which combines measures of the number of analysts, analyst forecast dispersion, and analyst forecast error, is low (high), adding independent directors will add (subtract) value. DMO find strong negative coefficients on the interaction between δIndep and Info Cost for all three outcome variables.

4.2. One Instrument or Two?
DMO instrument for δIndep in a first-stage regression of δIndep on the instrument (non-comply dummy) and covariates. In their core regressions, in which they interact δIndep with Info Cost, they use what one might call a “quasi-instrument” for δIndep*Info Cost, defined as (predicted value of δIndep from first-stage regression) * Info Cost. This is technically incorrect, and is sometimes called a “forbidden regression” (Wooldridge, 2010, § 9.5.2; Angrist and Pischke, § 4.6.1). The resulting estimator is inconsistent; the standard errors are also incorrect. One should instead use two instruments. In our re-analysis, we use two instruments for the two endogenous variables δIndep and δIndep * Info Cost: the instruments are non-comply dummy and non-comply dummy * Info Cost.

4.3. Covariate Balance

A strength of the DMO paper is that they report on covariate balance. They compare the 2000 characteristics of treated and control firms, in their Table 2. We present a similar covariate balance analysis in Table DMO-1.9 We add several other variables that are in the DMO dataset, but were not used in their analysis. We report both normalized differences (Imbens and Rubin, 2015) and the t-statistics reported by DMO. Overall, treated and control firms are fairly similar on the outcomes, on the covariates that DMO used, and on the additional potential covariates listed at the bottom of Table DMO-1. There is a moderate imbalance in the overall information cost index (Info Cost) and one of its components: dispersion of analyst forecasts.

There is huge imbalance in the percentage of independent directors on the audit committee. This is by construction of their treated and control groups. More troubling, there is a large imbalance in the percentage of independent directors on the board in the base year of 2000 (“PctIndep”). Control firms average 70% independent directors, versus 53% for treated firms (t = 15.70, normalized diff. = 0.59).
Although it’s not surprising that firms without a fully independent audit committee have fewer independent directors generally, this imbalance is a cause for concern. PctIndep could directly predict δIndep (we will see below that it does) and the outcome variables, leading to bias. Thus, careful attention to balance on PctIndep is crucial for proper research design. A core weakness in the DMO study is their failure to address this imbalance. We pursue this concern in detail below.

9 Table DMO-1 differs slightly from DMO Table 2, because we restrict the sample for this comparison to firms used in the Tobin’s q regressions (with non-missing values for all covariates and δQ). We impose the same restriction in reporting first-stage results in Tables DMO-3 to -5 and in creating Figures DMO-1 to DMO-3. The usable sample varies slightly depending on the outcome variable.

4.4. Replicating and Extending DMO’s Original Results: One IV or Two?

We next replicate and then correct and extend DMO’s results. We pursue three main corrections: (i) replacing their forbidden regression with a permitted one; (ii) controlling for Info Cost; and (iii) controlling for pre-treatment levels of PctIndep. As we do so, their results progressively weaken. Once we control for PctIndep, the DMO results disappear entirely.

In Table DMO-2, columns (1)-(4), we replicate the first and second stage of the simple DMO IV model, which allows for a direct effect of δIndep on their outcomes, without as yet interacting δIndep with Info Cost. We also report, in columns (5)-(7), corresponding intent-to-treat DiD regressions, in which we directly use the non-comply dummy to predict the outcomes. The first stage of the IV specification appears very strong. Non-comply dummy takes a coefficient of 11.4, implying that treated firms, on average, increase the percentage of independent directors on their boards by 11.4 percentage points more than control firms over 2000-2005.
The t-statistic is 9.40, which easily allays any concerns about instrument strength. The coefficient on instrumented δIndep is basically zero for all three outcomes, in both the IV and DiD specifications. We note, however, a concern with the DMO specification: The variables with covariate imbalance are Info Cost (moderately) and PctIndep (strongly). Both variables should therefore be included as covariates; but neither is. We indicate this by adding rows for these variables, which indicate that they were not included in the regressions.

We turn in Table DMO-3 to the core DMO specification, in which they include both predicted δIndep (from a separate first-stage regression) and predicted δIndep interacted with Info Cost. In columns (1)-(3), we replicate their results. They find an economically large and statistically strong negative coefficient on (predicted δIndep)*Info Cost for all three outcome variables. A plausible summary measure of statistical strength is the average t-statistic for the three outcomes, which is (3.10 + 7.90 + 4.98)/3 = 5.33.

In the remaining columns of Table DMO-3, we report first and second-stage results from conventional 2SLS, in which we use non-comply dummy and (non-comply dummy * Info Cost) as separate instruments, and instrument for both δIndep and δIndep*Info Cost. The first stage remains respectable, but weakens noticeably. The coefficient on non-comply dummy, as a predictor of δIndep, remains large at 12.13. But the first stage t-statistic falls to 4.43, implying larger overall 2SLS standard errors. In columns (5)-(7), the second-stage 2SLS coefficients on instrumented (δIndep*Info Cost) are similar to those that DMO report on (predicted δIndep)*Info Cost. The t-statistics fall, slightly for δROA and mean return, and more sharply for δQ, but remain significant. The average t-statistic falls from 5.33 to 4.25.
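The conventional two-instrument specification just described can be written out explicitly. The notation below is ours and is introduced only for illustration: Z_i is the non-comply dummy, C_i is Info Cost, X_i collects the covariates and industry fixed effects, and ΔY_i is the 2005-minus-2000 change in an outcome. It is a sketch of the standard 2SLS setup with two endogenous variables, not a transcription of the exact equations estimated for Table DMO-3.

```latex
\begin{align*}
\delta\mathrm{Indep}_i
  &= \pi_{10} + \pi_{11} Z_i + \pi_{12}\,(Z_i \times C_i) + X_i'\gamma_1 + u_{1i} \\
(\delta\mathrm{Indep}_i \times C_i)
  &= \pi_{20} + \pi_{21} Z_i + \pi_{22}\,(Z_i \times C_i) + X_i'\gamma_2 + u_{2i} \\
\Delta Y_i
  &= \beta_0 + \beta_1\,\widehat{\delta\mathrm{Indep}}_i
     + \beta_2\,\widehat{(\delta\mathrm{Indep} \times C)}_i
     + X_i'\gamma + \varepsilon_i
\end{align*}
```

Each endogenous variable has its own first stage, and both first stages include both instruments. The forbidden regression instead replaces the second fitted value with the product of the first-stage fitted value of δIndep and C_i, which is not a linear projection on the instruments and exogenous covariates.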
Still, in another study, the forbidden regression used by DMO could make a larger difference in coefficients or statistical significance (see Wooldridge, 2010, § 9.5.2, for an example).

4.5. Adding Info Cost as a Covariate

DMO do not include Info Cost as a covariate in their main results (they include this variable later as a robustness check). Omitting this variable from their main results is a clear error. Any regression which includes an interaction term should include the non-interacted components. The omitted non-interacted components will normally correlate with the interaction term. If the non-interacted components also predict the outcome, omitting them will lead to omitted variable bias. In Table DMO-4, columns (1)-(5) we report 2SLS results, adding Info Cost as a covariate. The first stage weakens again, as a predictor of δIndep. The coefficient is still large, at 10.45, but the first stage t-statistic falls to 3.33. In the second stage, the coefficients weaken for δROA and (more strongly) for δQ. The t-statistics again fall for all three outcomes; the coefficient in the δROA regression is now only marginally significant. The average t-statistic is now only 2.24. In columns (6)-(8), we report the intent-to-treat DiD results that correspond to the IV results. The statistical strength of the IV and DiD results is similar, as expected.

4.5. Imbalance on PctIndep: Graphical Evidence

Thus far, the reader’s reaction might be muted. DMO made two technical errors, which a referee might have caught. But their results largely survive, with both 2SLS and intent-to-treat DiD specifications. This muted view would overlook an important concern with IV designs. Technical errors often have much larger consequences than in OLS. First, these errors can substantially inflate first-stage t-statistics, and make an instrument appear much stronger than it is.
Second, IV results are vulnerable to “blowup”: IV mechanics, in attributing the effect of the instrument on the outcome to the instrumented variable, magnify, often greatly, any bias in estimating that effect.

Also, there are larger problems to come. We saw above that treated and control firms have very different means for percentage of independent directors (PctIndep). This percentage is an important, endogenous firm characteristic that strongly predicts δIndep, and may also predict outcomes. Yet, DMO omit PctIndep from their regressions.

We illustrate graphically the importance of PctIndep to DMO’s first-stage results in Figure DMO-1. We show a scatter plot of PctIndep in 2000 versus δIndep, with (green) circles for control firms, which met the audit committee rule in 2000, and orange triangles for treated firms, which did not. Several features are apparent. First, there is a strong negative correlation between PctIndep and δIndep (r = -0.69). Second, treated (control) firms mostly have low (high) PctIndep. Thus, Figure DMO-1 confirms our initial concerns that PctIndep is both strongly imbalanced between treated and control firms and strongly predicts δIndep. A first-stage regression that omits PctIndep will wrongly attribute the change in board independence to the instrument, rather than to PctIndep.

Third, there is a serious lack of overlap problem. There are almost no control firms with PctIndep < 25%, and almost no treated firms with PctIndep > 80%. Simply adding PctIndep to the 2SLS equations won’t solve that problem. Instead regression coefficients will be affected by extrapolation beyond the region of common support. One response to this problem is to trim the sample to a region of reasonably thick common support. Sensible bounds might be PctIndep in [0.25, 0.80]. Any results one might find with the full sample, but not the trimmed sample, would be highly suspect, because they rely on extrapolation beyond common support.
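The effect of trimming to common support can be sketched as follows. The firm records are hypothetical, PctIndep and δIndep are expressed in whole percentage points for convenience, and the bounds follow the [25%, 80%] range suggested above; the function names are ours.

```python
from statistics import mean

LOW, HIGH = 25, 80  # trimming bounds for PctIndep, in percent


def trim(firms, low=LOW, high=HIGH):
    """Keep only firms whose PctIndep lies within the common-support bounds."""
    return [f for f in firms if low <= f["pct_indep"] <= high]


def treated_control_gap(firms):
    """Mean change in board independence: treated minus control."""
    treated = [f["d_indep"] for f in firms if f["treated"] == 1]
    control = [f["d_indep"] for f in firms if f["treated"] == 0]
    return mean(treated) - mean(control)


# Hypothetical firms: PctIndep in 2000, non-comply dummy, change in independence.
firms = [
    {"pct_indep": 10, "treated": 1, "d_indep": 40},
    {"pct_indep": 30, "treated": 1, "d_indep": 20},
    {"pct_indep": 32, "treated": 0, "d_indep": 19},
    {"pct_indep": 50, "treated": 1, "d_indep": 8},
    {"pct_indep": 52, "treated": 0, "d_indep": 7},
    {"pct_indep": 75, "treated": 0, "d_indep": 0},
    {"pct_indep": 90, "treated": 0, "d_indep": -5},
]

trimmed = trim(firms)
dropped_treated = sum(f["treated"] for f in firms) - sum(f["treated"] for f in trimmed)
dropped_control = (len(firms) - len(trimmed)) - dropped_treated

raw_gap = treated_control_gap(firms)
trimmed_gap = treated_control_gap(trimmed)
print(dropped_treated, dropped_control)
print(round(raw_gap, 2), round(trimmed_gap, 2))
```

In this toy example much of the raw gap comes from firms outside the overlap region, which is the pattern that would make a full-sample-only result suspect.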
Fourth, if we look within the common support region, among firms with similar PctIndep, the triangles and circles are well mixed. If the instrument was strongly inducing firms to add independent directors to their boards, the triangles should be systematically above the circles, holding constant PctIndep. They are not. We investigate the overlap problem further in Figure DMO-2. We divide the sample into 20 bins based on PctIndep. The bins cover [0-5%], (5-10%], (10-15%], and so on. We show the number of treated and control firms in each bin, side by side. Figure DMO-2 shows the imbalance on PctIndep in a different way. There are no control firms with PctIndep < 15%, and only a handful with PctIndep < 25%. Trimming the sample at 25% PctIndep excludes 24 treated firms, but only 3 control firms; trimming at 80% excludes only 6 treated firms, but 153 control firms. This confirms our judgment that one should trim the sample to firms with PctIndep (25%, 80%]. In Figure DMO-3, we examine shock strength. We trim the sample to PctIndep (25%, 80%] 31 and then plot mean PctIndep separately for treated and controls, within each 5% bin for PctIndep. Dashed lines show the overall means for δIndep for treated and controls. The overall means is 7% (well below the 10-12% difference implied by the first stages in Tables DMO-2 to DMO-4). Within each bin, however, the differences are small. The mean difference of 7% is driven by a combination of: (i) the strong tendency, even within the trimmed sample, for firms that don’t initially comply with the audit committee rule to have fewer independent directors; and (ii) the general tendency for firms with low PctIndep in 2000 to increase their independent directors by 2005. The within-bin comparison of means in Figure DMO-3 confirms the impression from Figure DMO-1 that, if one controls for PctIndep, the audit committee shock loses much of its power to predict δIndep. 4.6. 
Addressing Imbalance on PctIndep The audit committee shock is clearly much weaker, once one controls for PctIndep. Is it still strong enough to be usable? We take a first pass at that question in Table DMO-5. We limit the sample to PctIndep (25%, 80%], include PctIndep and Info Cost as covariates in the 2SLS models and report both stages. Column (1)-(4) report results for a simple specification, similar to Table DMO-2, with one instrument (non-comply dummy) and one instrumented variable (δIndep). In column (1), non-comply dummy takes a coefficient of 2.28 (t = 2.34) – still significant at the 5% level, but likely too weak to be usable. The second-stage coefficients are insignificant, as expected. In columns (5-(9), we report results instrumenting for both δIndep and δIndep * Info Cost. As predictors of δIndep, the two instruments have an F-statistic of 2.58 (p = .09), well below the F > 10 rule of thumb for avoiding bias due to weak instruments. In column (2), the instruments are still significant predictors of the interaction term δIndep * Info Cost. But the F-statistic is well below 10, and we would be uncomfortable relying on this specification, without a good story for why the dual instruments predict the interaction term, even though non-comply dummy only weakly predicts 32 δIndep, once we control for PctIndep. If we swallow those doubts and continue with the second stage, the coefficients on the interaction term are insignificant for all three outcomes. Thus, the original DMO results critically depend on not controlling for PctIndep in the regression. If we did not trim the sample, the instruments would remain weak. In column (9), δIndep * Info Cost would significantly predict mean return (coeff. = -0.120; t = 2.76); but would be insignificant for the other outcomes. We view the one significant result as spurious –significance depends on assuming that the linear regression model holds outside common support. 4.7. 
Summary for DMO and Further Comments

There are several problems with the DMO study. First, they use a “forbidden regression” instead of two separate IVs. Their results weaken somewhat, but remain significant. Still, using two separate instruments could be important in another study. Second, and more importantly, they omit a non-interacted variable, Info Cost, in their main results, which involve instrumenting for Info Cost*δIndep. Their results weaken substantially when we control for Info Cost. Most critically, they have extreme imbalance on a core pre-shock covariate (PctIndep). That imbalance drives instrument strength. Once we control for PctIndep, the nudge toward higher board independence provided by their audit committee shock is too weak to be usable. Second-stage IV and (not reported) intent-to-treat DiD estimates also become insignificant.

Further steps. Imagine, though, that the DMO results had survived. What else might one do to strengthen inference? Omitted variable bias remains a major concern. After all, non-comply dummy is an endogenous firm choice, just as much as PctIndep. Both DiD and its shock-IV cousin rely on a parallel trends assumption -- that treated and control firms would have evolved similarly, but for the shock, even though they made different pre-shock choices, for only partly observed reasons. The defenses for this assumption include: (i) checking for covariate balance; (ii) using extensive covariates (which also allows a more sensitive test for covariate balance); (iii) improving balance (including trimming to common support); and (iv) showing parallel trends in the pre-treatment period.

DMO provide a covariate balance table and show pre-treatment trends for ROA (one of their three outcomes), in their Figure 3. These steps strengthen the paper, and are part of why we chose it for replication. But their covariates are thin (only board size, leverage, firm age, and ln(market cap)).
And while ROA trends are reasonably parallel over 1996-2000, they are not perfectly so – there is a possible divergence between the controls and the high-Info-cost treated firms over 1998-2000. And no similar graphs are provided for Tobin’s q and share returns. We would want to see many more covariates (which should be available), and graphs for all three outcome variables, for a longer pre-treatment period. In our experience, mild non-parallel trends that are apparent with, say, 6-7 pre-treatment periods, can be hard to detect with only 3-4 periods.

Given the great power of PctIndep in predicting δIndep, we would also worry about whether one can assume a linear relation between PctIndep and δIndep, as we did by adding PctIndep as a regressor. Figure DMO-3 suggests that any non-linearity is minor. Still, avoiding this assumption seems warranted. Here is one balancing method that we pursued in unreported results. Our approach loosely follows Imbens and Rubin, 2015, ch. 17, but the literature on pure observational studies includes many others (e.g., Rosenbaum, 2009; Imbens, 2014). Trim the sample (as above). Use all covariates to estimate the propensity to have a non-compliant audit committee in 2000. Then divide the sample into blocks based on the propensity score (we use 5 blocks); run 2SLS (or intent-to-treat DiD) within each block, and sum across blocks to obtain overall coefficients and t-statistics. In unreported results, the most important contributor to the propensity score is PctIndep, but other variables like Info Cost and market cap are important. All intent-to-treat DiD and 2SLS coefficients are insignificant, both within each block and summed across blocks.

IV Validity and the Blowup Problem. We also have serious doubts about the only-through condition. The only-through condition requires that an increase in audit committee independence affects the DMO outcome variables only through board independence.
For this condition to hold, one must believe that the shock to the audit committee has no direct effect on ROA, Tobin’s q, or share returns. But why should this be true? Fully independent audit committees may affect firms’ accounting choices and hence reported profits. A fully independent audit committee could also reduce fraud risk, and thus lead investors to pay more for the same reported earnings, which would affect Tobin’s q and share returns. The other two papers we study, D&D and Iliev, address the only-through condition with care. In contrast, DMO simply note near the end of their paper the possibility of other channels. We believe every IV design should include a careful defense of the only-through condition.

Intent-to-treat DiD is usually an alternative to shock-based IV, which avoids this problem. DMO could have assessed whether the audit committee change caused a change in their outcomes, while remaining agnostic on the channel – whether a direct effect of the audit committee change, an indirect effect through board independence, or a third channel (perhaps appointing directors with financial expertise, to staff the audit committee) causes any observed change.

As we noted in Part 2, coefficient blowup, with 2SLS coefficients much larger than OLS coefficients, is a strong warning of a likely violation of the only-through condition. So are results that are significant in 2SLS but not in OLS (or intent-to-treat DiD). DMO report OLS results (as every IV paper should), and they have both problems – their 2SLS coefficients are 4-9 times the OLS coefficients (depending on the outcome variable), and in OLS, the δIndep * Info Cost interaction term is significant only with ln(q) as the outcome. They comment only that the differences between OLS and their IV-like results “suggest that endogeneity of board composition may be a significant problem.” (DMO at 205).
In our view, any IV paper with these issues needs a careful defense of why endogeneity should lead to coefficient blowup, stronger t-statistics in 2SLS, or both.

5. Re-analysis of Iliev (2010)

5.1. Overview

SOX § 404, as implemented by the SEC and later amended by the Dodd-Frank Act, requires firms with “public float” (the market value of shares not held by insiders) > $75M to have their auditors certify that the firm has adequate internal controls, beginning in 2004. Complying with SOX § 404 surely increases firms’ auditing costs, but one would like to know by how much. For larger public firms, there is no control group, so the best available research design is interrupted time series (ITS), in which one would estimate the time trend in audit fees, before and after 2004, and look for an unusual jump in 2004, relative to that trend. But audit fees might have changed from 2003-2004 for reasons other than SOX § 404.

In a careful and clever paper, Iliev (2010) seeks to do better, for firms near the $75M (in free float, which we omit below) threshold for SOX § 404 to apply. His central research design is RD, in which he compares firms just above this threshold to firms just below it, and estimates a mean increase in ln(audit fees) of 0.744 (t = 7.39). The corresponding percentage increase is e^0.744 – 1 = 110%, implying that audit fees more than doubled. See Table Iliev-1, regression (3).

Iliev finds evidence that firms manipulate their free float to avoid complying with SOX § 404 (below, “SOX compliance”). He argues that these firms might have especially high SOX compliance costs, which would bias his estimate of average costs downward. He addresses this concern with an IV strategy. The SEC adopted the rule containing the $75M threshold for SOX compliance in 2003. The rule required firms to comply in 2004 if their float in 2002, 2003, or 2004 exceeded $75M.
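The compliance rule just described can be written as a short predicate. This is only an illustrative sketch: the function name `must_comply_2004`, the constant name, and the example float values are hypothetical and are not taken from Iliev's data or code. Only the $75M threshold and the three measurement years come from the rule as described above.

```python
# Illustrative sketch of the compliance rule as described in the text.
# Names and example values are hypothetical; floats are in $ millions.
THRESHOLD_M = 75.0


def must_comply_2004(float_2002, float_2003, float_2004):
    """Return True if float exceeded the threshold in any of the three years."""
    return any(f > THRESHOLD_M for f in (float_2002, float_2003, float_2004))


# A firm whose float was above the threshold only in 2002 must still comply.
shrinker = must_comply_2004(90.0, 70.0, 60.0)
# A firm whose float rises above the threshold by 2004 must comply.
grower = must_comply_2004(60.0, 70.0, 80.0)
# A firm that stays below the threshold in all three years need not comply.
small = must_comply_2004(60.0, 65.0, 70.0)
print(shrinker, grower, small)
```

In the first example, the 2002 float alone is enough to trigger compliance, whatever happens to float afterwards.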
Firms could not go back in time to manipulate their 2002 float, and in 2002, they had no reason to expect $75M in float to become a magic number in the future. Iliev therefore uses (2002 float > $75M) as an instrument for SOX compliance. His two-stage-least-squares (2SLS) estimated increase in ln(fees) is 0.983 (t = 3.65). The corresponding percentage increase is e^0.983 – 1 = 167%, well above his OLS estimate. See Table Iliev-1, regression (3).

Iliev is a state-of-the-art finance paper. In many ways it is better than state-of-the-finance-art. In AB-2014, we review 863 empirical corporate governance papers published in 22 major journals over 2001-2011, and find 77 shock-based papers. Iliev is one of only two RD papers in our sample. His RD design is carefully done, including checking for covariate balance, including a flexible control for the running variable (float or, more generally, firm size), varying the bandwidth around the discontinuity, checking for evidence that firms manipulate their float to avoid complying with SOX § 404, and running an array of placebo and robustness checks. He finds evidence of threshold manipulation and addresses it with a clever and plausible instrument. Of the 74 shock-based papers we studied in AB-2015, this is our second favorite, after only Bennedsen et al. (2007). Iliev’s RD-only design is credible, even though we would supplement it by using DiD/RD combined. His article is one of the very first finance papers to use an RD design, and one of only two papers in the AB-2015 sample to do so.

And yet, Iliev’s 2SLS estimate is much too high. With our combined RD/DiD design, we estimate a coefficient on a SOX-compliance dummy of 0.586 (t = 4.59). This corresponds to an e^0.586 – 1 = 80% increase, or only about half of Iliev’s RD/IV estimate of 167%. We show below that his RD design, for subtle reasons, does not produce covariate balance on growth. That alone does not matter much.
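The percentage figures quoted in this overview follow from the usual conversion of a coefficient on a log outcome, 100 * (e^β – 1). A minimal sketch of that arithmetic is below; the function name and labels are ours and purely illustrative, while the three coefficients are the ones quoted in the text.

```python
import math


def pct_change_from_log_coef(beta):
    """Percentage change implied by a coefficient on a log outcome."""
    return 100.0 * (math.exp(beta) - 1.0)


# Coefficients quoted in the text; labels are illustrative only.
estimates = [("RD", 0.744), ("RD/IV", 0.983), ("DiD/RD", 0.586)]
for label, beta in estimates:
    pct = pct_change_from_log_coef(beta)
    print(f"{label}: beta = {beta:.3f} -> {pct:.0f}% increase")
```

Because the conversion is convex in β, a given gap between two log coefficients translates into a larger percentage-point gap at higher values of β.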
Controlling for growth modestly reduces the coefficient on a SOX-compliance dummy from 0.744 to 0.706.

Even more subtly, Iliev’s IV design leads to gross imbalance on growth and to violation of the only-through condition. In effect, his IV predicts audit fees both through the instrumented variable (SOX 404 compliance) and through omitted growth variables. The “IV-compliers” (who comply with SOX § 404 only because they are forced to do so by his instrument) are also a small, and likely unrepresentative, subsample of all SOX-complier firms. We also explain why Iliev’s IV design is unnecessary (firms that manipulate their float to avoid SOX compliance do not have significantly higher compliance costs than other firms).

5.2. Data Manipulations

We start with Iliev’s dataset, which includes 1,492 unique firms over 2002-2004 (the period we focus on here); of which 815 firms have full data on float, audit fees, and covariates, and 281 of these firms are within his [$50M, $100M] bandwidth for 2004 float.10 Iliev mostly uses a fixed bandwidth around the $75M threshold of [$50M, $100M] measured in 2004.11 In our re-analysis, we vary bandwidth systematically, in both 2002 and 2004. To do so, we define a bandwidth parameter b, and a corresponding bandwidth of [$75M/b, $75M * b]. In our main analysis we use b = 1.5, which implies a bandwidth of [$50M, $112.5M], similar but not identical to Iliev’s.

10 Iliev (2010) also includes results through 2007. Following Iliev, we require that firms have non-missing data on public float, audit fees, and covariates for 2002 and 2004. In unreported robustness checks, we obtain very similar results if we also insist that this data be non-missing for 2003. To determine which firms must comply with SOX § 404, the SEC uses float at the end of the second fiscal quarter (“SEC float”). Iliev’s dataset flags firm-year observations where float is reported at a different date (generally year-end). In our extensions, to avoid dropping these observations, we assume that reported float = SEC float, but drop two firms for which this assumption is incorrect – they reported float > $75M in 2002, yet did not comply with SOX § 404. This implies that their SEC float for 2002 was < $75M.

11 Iliev finds similar (unreported) results with bandwidths of [$60M, $90M] and [$40M, $110M].

5.3. Replication and Extension of Iliev RD Results

We begin our re-analysis by replicating and extending Iliev’s RD results. In Table Iliev-1, regression (1), we replicate Iliev’s core RD result, with no covariates other than a cubic in float and industry fixed effects (“FE”), from his Table II. SOX compliance predicts a 0.866 (t = 7.57) increase in ln(2004 audit fees). This estimate, however, is biased upward. Within the [$50M, $100M] bandwidth, larger firms likely pay larger fees. To develop a better estimate, one must control for firm size. Iliev does so with an admirably flexible functional form in his (and our) regression (2). He controls for ln(sales), ln(assets), ln(market value) and a cubic in public float, all measured in 2004, along with other covariates (leverage, receivables/assets, dummy for Big 4 auditor, number of business segments, and number of geographic segments). His coefficient estimate falls to 0.744, implying a 110% increase in fees (e^0.744 = 2.10).

It is customary for an RD design to control flexibly for the “running” or forcing variable, as Iliev does, and common to find that doing so changes the estimated jump at the threshold. The theory behind RD implies, however, that (i) firms near the threshold should be similar on everything but the running variable; and (ii) adding additional covariates should not greatly affect treatment effect estimates. We verify differences in firm size, and similarity on Iliev’s other control variables, in two ways. First, in Table Iliev-1, we include an additional regression (1A), in which we add only size-related covariates to regression (1).
The coefficient on the SOX compliance dummy is 0.749. Adding the non-size variables in regression (2) changes the estimate only trivially, to 0.744. Second, we assess covariate balance in Table Iliev-2, both in 2002 (before $75M in float became a magic number) and in 2004 (when Iliev builds his sample). In Panel A, we compare “future treated” firms, with 2002 float in [$75M, $100M], to “potential future control” firms with 2002 float in [$50M, $75M]. The two groups are similar on all of Iliev’s control variables except size and related variables (number of geographic segments is related to size). As Panel B shows, in 2004, within Iliev’s [$50M, $100M] bandwidth, treated firms are tolerably similar to control firms on Iliev’s covariates.

So far, so good. But are the SOX-complier and control firms really similar? In both panels, we include variables for growth from 2002 to 2004 in float, market capitalization, sales, and assets.12 In 2002, firms below the future $75M threshold are similar to firms above the threshold, as expected. But in Panel B, SOX compliers grow more slowly than controls, significantly so for float and market cap. These differences arise because the SEC rule is asymmetric with respect to how changes in float affect whether a firm must comply with SOX § 404: Firms can grow into compliance, but cannot shrink out of compliance. We discuss this asymmetry below. A core RD assumption is violated. Iliev missed this. So did we, until we carefully thought about which firms were the compliers, always-takers, and never-takers for his IV.

Does omitting growth covariates matter? The OLS answer is yes, at least somewhat. In Table Iliev-1, regression (2A), we add growth variables to his regression. The treatment effect estimate drops from 0.744 to 0.706. More centrally, our confidence in the RD design also falls. Linear controls for growth may be imperfect. Also, if the treatment and control groups are not balanced on growth, they may not be balanced on unobserved variables.
As we will see below, the IV-compliers (the SOX-compliers who comply only because they have 2002 float > $75M) are much more unbalanced, versus the controls, than the full set of SOX-compliers. A core task for our re-analysis will be to construct treatment and control groups that are balanced on growth, and thus more likely to be balanced on unobservables.

One lesson from imbalance on growth, which will generalize to DMO, D&D, and many other finance papers: Black et al. (2014) stress the importance of using extensive firm-level covariates, even in studies with firm fixed effects (FE). Shock-based research designs should similarly include far more covariates than is the norm today. Had Iliev done so, he might well have found the growth imbalance and addressed it in some way, even if not by using the principal strata approach we develop below.

12 Iliev’s dataset starts in 2002, so we cannot compute growth from earlier years through 2002.

5.4. Replication and Extension of Iliev’s RD/IV Results

We next replicate and extend Iliev’s combined RD/IV analysis, in Table Iliev-3. We show the first stage and second stage of his 2SLS regressions in separate columns. In the first stage, he uses the instrument (float > $75M in 2002), plus covariates, to predict the instrumented variable (SOX compliance in 2004). We begin with regression (3) from Iliev’s Table II, which includes no covariates other than industry FE. In this RD/IV specification, Iliev omits the cubic in float that he includes in his RD regressions.13 This is an odd choice – an RD design should control flexibly for the running variable, as Iliev did in his straight RD regressions. In regression (3A), we therefore add the float cubic. Adding the float cubic reduces the first stage coefficient from 0.466 to 0.325, but the instrument remains reasonably strong (first-stage t-stat = 6.48).
More problematically, adding the float cubic increases the second-stage coefficient, already uncomfortably high at 1.171, even further to 1.332, implying a near quadrupling of audit fees (278% increase). SOX § 404 compliance was expensive, but no one, to our knowledge, thought it was that expensive. As we discuss in AB-2014, an implausibly large second-stage coefficient is a strong warning sign that one’s instrument fails the only-through condition.

13 In his Table II, Iliev says in one place that both stages use the “same controls” but in another that the first-stage, but not the second, includes “public float terms.” The latter would be an incorrect use of 2SLS if true, but based on our replication, Iliev excluded float terms in both stages.

We next report the first and second stage for Iliev Table II, regression (4), which includes covariates. The coefficient on instrumented SOX compliance falls to 0.983, which still implies a 167% increase in fees (e^0.983 = 2.67). We then extend Iliev’s results by adding a cubic in float, in regression (4A), and further adding growth variables in regression (4B). The first-stage becomes progressively weaker, with a coefficient in regression (4B) of 0.237. At the same time, the coefficient on instrumented SOX compliance rises to 1.179, implying an increase in audit fees of 225% as a result of SOX 404 compliance.

5.5. Impact of Growth Imbalance on Coefficient Estimates

We next investigate the effect of imbalance in growth on estimates of SOX § 404 compliance costs, relying primarily on graphical analysis. Note first that the IV-compliers – the firms which comply with SOX § 404 only because they had float > $75M in 2002 – are a very particular subgroup of the SOX-compliers (all firms which complied with SOX § 404 in 2004). The IV-compliers must have suffered a drop in float to < $75M in each of 2003 and 2004. Could their growth trajectory over 2002-2004 explain their 2004 audit fees, at least in part?
We provide evidence that the answer is yes in Figure Iliev-1. We plot the relation between ln(audit fees) and float in 2004 for the 281 firms used in Iliev’s OLS and IV analysis, but split them into four groups:
(1) 41 “shrinker-compliers” (shown with orange triangles), defined as firms that are SOX-compliers, but have float < $75M in 2004, implying that their float exceeded $75M in 2002 or 2003;
(2) 80 “grower-compliers” (shown with green diamonds), defined as SOX-compliers with 2002 float < $75M but 2004 float > $75M;
(3) 117 small control firms (shown with black dots), defined as firms with 2004 float < $75M that do not comply with SOX § 404 in 2004; and
(4) 43 “large-compliers” (also shown with black dots), defined as firms with float > $75M in both 2002 and 2004.

We also add two regression lines showing predicted ln(2004 fees). The first line, at the lower left, is for control firms, ends at 2004 float of $75M, and comes from a regression of ln(fees) on float and constant term. The second line, which covers the full range of 2004 float, is for SOX-compliers; we regress ln(fees) on float, constant term, and a “large dummy” (=1 if 2004 float > $75M). This specification lets shrinker-compliers have a different constant term than other SOX-compliers.

The scatter plot and predicted fee lines show that, controlling for float, shrinkers have much higher audit fees than other SOX-compliers (the coefficient on the shrinker dummy is 0.53). If we use the regression lines to compare the audit fees of control firms to shrinker-compliers at 2004 float just below $75M, the difference in predicted ln(fees) is 1.07, which is similar to Iliev’s IV estimate. In contrast, if we compare the same control firms to other SOX-compliers with float just above $75M, the difference in predicted ln(fees) is only 0.54. This is quite close to the DiD/RD estimate we develop below.

We can now see how failing to control for growth led Iliev astray.
Shrinker-compliers had much higher audit fees than other SOX-compliers, pre-SOX. This is plausible – these firms used to be larger, and may retain the higher fees of larger firms. Some may also incur higher fees because they have shrunk due to business troubles that might call for more intensive auditing. Iliev’s IV estimate compares shrinkers (more precisely, shrinkers with 2002 float > $75M) to control firms. This leads to a large upward bias in estimating the cost of SOX compliance.

Figure Iliev-2 illustrates the large differences in growth between IV-compliers and control firms. The figure shows a scatter plot with 2002 float on the x-axis and 2004 float on the y-axis. Vertical lines show 2002 float of $50M, $75M, and $100M; horizontal lines are similar for 2004. A 45-degree line indicates no change in float from 2002 to 2004. The IV-compliers are the 22 shrinkers in the red-bordered box (2002 float > $75M but 2004 float < $75M).14 All fall below – often far below – the 45-degree dotted line. The 117 control firms are in the green-bordered box (2002 and 2004 float both < $75M). Almost all (110 of 117) are above the 45-degree line, often far above it. There are 10 control firms whose float shrank from 2002-2004; of these, three have reported 2002 float > $75M. We drop these three firms because their reported 2002 float was not the float used by the SEC to determine SOX 404 compliance. The largest negative change in ln(float) among the remaining 7 control firms is -0.20. The other SOX-compliers are in the black-bordered box, above the $75M line for 2004.

The IV-compliers are very different from the controls on growth. There was already imbalance on growth between the full set of SOX-compliers and the controls, as we saw in Table Iliev-2, Panel B. That imbalance is far worse for IV-compliers versus controls. We show that difference numerically in Table Iliev-2, Panel C. The mean change in ln(float) is -0.57 for IV-compliers versus +0.96 for controls.
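The comparison described for Figure Iliev-2 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the procedure behind Table Iliev-2: the function names are ours, the float pairs are invented, and the classification uses only 2002 and 2004 float (it ignores 2003 float and the distinction between reported float and SEC float).

```python
import math
from statistics import mean

THRESHOLD_M = 75.0  # $ millions


def box(float_2002, float_2004):
    """Assign a firm to one of the three boxes described for Figure Iliev-2."""
    if float_2004 > THRESHOLD_M:
        return "other SOX-complier"
    if float_2002 > THRESHOLD_M:
        return "IV-complier"  # 2002 float above, 2004 float below the threshold
    return "control"  # 2002 and 2004 float both below the threshold


def mean_log_growth(firms):
    """Mean change in ln(float) from 2002 to 2004 for (2002, 2004) pairs."""
    return mean(math.log(f04 / f02) for f02, f04 in firms)


# Hypothetical (2002 float, 2004 float) pairs in $ millions.
firms = [(90.0, 55.0), (80.0, 60.0), (30.0, 65.0), (25.0, 70.0), (60.0, 95.0)]
groups = {}
for f02, f04 in firms:
    groups.setdefault(box(f02, f04), []).append((f02, f04))
for name, members in groups.items():
    print(name, len(members), round(mean_log_growth(members), 2))
```

By construction, every firm in the IV-complier box has negative growth in float, which is why imbalance on growth between that box and the control box is built into the design rather than an accident of the sample.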
Across all four growth measures, controls grow, while IV-compliers shrink.

Standard advice in the causal inference literature is to avoid using regression to extrapolate beyond the “common support” of the data – the area for which treated and control firms have overlapping values (e.g., Imbens and Rubin, 2015, ch. 14). The great strength of RD is that it should lead to covariate balance and nearly complete overlap, on both observed and unobserved covariates.

14 In Figure Iliev-2, we drop [*xx] shrinker-compliers with float in both 2002 and 2003 > $75M, but 2004 float < $75M, in order to focus on the IV-compliers.

In this respect, Iliev’s IV design grossly departs from the RD design he began with. In trying creatively to address one problem (some firms manipulated their 2004 float to avoid complying with SOX), Iliev stumbled into a larger problem with covariate imbalance. That imbalance was pernicious, because growth in float strongly predicts audit fees.

How much overlap is there between IV-compliers and controls on growth in ln(float)? Not much. Figure Iliev-3 shows a histogram of the number of IV-compliers and controls within different ranges for change in ln(float). The only overlap is for change in ln(float) in [-0.25, 0]. That region of common support includes only 7 control firms and 6 IV-compliers. Iliev’s IV/RD estimate already rests on only 22 IV-complier firms. If one insists on overlap on growth in ln(float), the number of IV-compliers for which one could estimate a treatment effect is down to 6 firms -- manifestly too small to support a credible estimate.

In sum, Iliev’s IV is fatally undermined by imbalance between IV-compliers and control firms on growth. His RD design is also infected by imbalance on growth, but less severely.
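A crude version of the overlap check discussed above can be sketched as follows. It is an assumption-laden illustration: it uses the overlap of the two groups' min-max ranges as a stand-in for the histogram comparison in Figure Iliev-3, the function names are ours, and the growth values are invented rather than drawn from Iliev's data.

```python
def common_support(treated, controls):
    """Return (lo, hi) of the overlapping range, or None if there is no overlap."""
    lo = max(min(treated), min(controls))
    hi = min(max(treated), max(controls))
    return (lo, hi) if lo <= hi else None


def trim_to_support(values, support):
    """Keep only the values that fall inside the common-support interval."""
    lo, hi = support
    return [v for v in values if lo <= v <= hi]


# Hypothetical changes in ln(float), 2002-2004.
iv_compliers = [-1.10, -0.80, -0.55, -0.20, -0.05]
controls = [-0.15, 0.10, 0.60, 1.20, 1.90]

support = common_support(iv_compliers, controls)
print(support)
print(trim_to_support(iv_compliers, support), trim_to_support(controls, support))
```

Counting how many firms in each group survive the trim is the informative step: a treatment effect estimated on a handful of overlapping firms carries little weight, and one estimated outside the overlap relies on the regression's functional form.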
One would be better off staying with an RD design, recognizing that SOX-avoidance by some firms could lead to an underestimate of the average treatment effect, assessing how large that underestimate might be, and estimating the cost of SOX 404 compliance for a sample of treated and control firms that is balanced on growth. We turn to those tasks next.

But first, some remarks. Iliev’s IV fails due to severe imbalance on growth. It’s tempting to think – why didn’t he see this? In fact, finding the imbalance is quite subtle. Iliev wasn’t looking at 2002 – he ran cross-sectional regressions in 2004. He had an RD design, which normally produces covariate balance on all but the running variable, confirmed balance on levels in 2004, and likely never suspected there could be imbalance in changes. We convinced ourselves, early on, that growth was an important covariate, and that his IV estimate was much too high. But for a long time, we didn’t understand why. After all, if we control for growth in the RD regression in Table Iliev-1, regression (2A), the compliance cost estimate falls, but not by that much (from 0.744 to 0.706). And in the IV regression (4B), in Table Iliev-3, the coefficient estimate increases.

For us, the takeaway messages from the failure of Iliev’s IV include: (i) ensuring IV validity is quite a tricky business. An exogenous instrument (which Iliev has) is necessary but not sufficient; (ii) it’s crucial to use an extensive set of control variables, and check for covariate balance on all of them; (iii) it’s crucial to think carefully about who are the IV-compliers, and how they might differ from the controls;15 (iv) subtle violations of the only-through condition can lead to large biases in IV results; and (v) using regression to extrapolate beyond common support is dangerous.

5.6.
Our Preferred Analysis: DiD/RD within Principal Strata

We start our own analysis in 2002, with a sample of firms with 2002 float in [$50M, $112.5M] (using a bandwidth parameter b = 1.5, which we will later vary). We address the imbalance on growth between all SOX-compliers and all controls by isolating subsamples of firms with similar change in float from 2002 to 2004.

Our first principal strata, shrinkers, are defined as firms with 2002 float > 2004 float.16 The shrinkers group includes two strata: shrinker-compliers with 2002 float > $75M, who must comply with SOX § 404 regardless of their 2004 float, and “shrinker-controls” – firms with 2002 float < $75M. Since compliance is mandatory and no firms comply with SOX § 404 unless required to, both strata are fully observed. Moreover, since firms cannot avoid SOX by shrinking, there is no reason to expect either stratum to include SOX-avoiders. Thus, the worry that SOX-avoiders might have higher compliance costs than other firms does not apply to the shrinkers strata.17

15 [*analogy to come to Angrist quarter-of-birth instrument; and Buckles and Hungerman (ReStat 2011 or so) critique for lack of balance [winter babies are different]].

16 [*note to come explaining the concept of principal strata and how it generalizes the causal IV categories of always takers, never takers, instrument compliers, and instrument defiers].

We show the division of shrinkers into shrinker-compliers and shrinker-controls in Figure Iliev-4. The average change in ln(float) from 2002 to 2004 (the average distance from each point to the 45 degree dashed line) is similar for the two strata; we confirm balance on growth and other covariates, except size covariates, in unreported regressions. A DiD comparison of the ln(audit fees) of these two groups can provide an estimate of the effect of SOX on audit fees that is free of potential bias due to lack of balance on growth in float.

The second pair of strata we define is for modest growers.
These are firms that experience growth in float but either start above $75M in 2002 and do not go above $112.5M in 2004 (these will be SOX complier firms), or start below $75M in 2002 and do not cross the $75M threshold in 2004 (these will be SOX non-complier firms). The modest grower stratum and its two substrata of complier and non-complier firms are shown in Figure Iliev-5. One can easily verify that the two substrata are well balanced on mean change of ln(float), which suggests another DiD estimate of the effect of SOX compliance on audit fees, which will be free of the growth imbalance concerns.

In Table Iliev-4, we report the results from DiD regressions comparing the log(audit fees) of complier and non-complier firms in the shrinker strata, modest grower strata, and both strata combined. All regressions include firm and year fixed effects. For each stratum we report three specifications: (1) no controls; (2) with the control variables included in Iliev; and (3) Iliev controls plus cubic terms of float. The estimated treatment effects on log(audit fees) of SOX-404 compliance range from 0.507 to 0.645. Taking the model with most control variables, the estimated effect for shrinkers is 0.606; for modest growers it is 0.507, and for both strata combined, it is 0.586. This 0.586 estimate translates into an 80% increase in audit fees attributable to SOX 404 compliance, which equals roughly half of the 167% estimate from Iliev’s IV specification. The combined-strata coefficient is also a fair bit below Iliev’s RD-only estimate of 0.744, although our DiD/RD estimate is within the 95% CI for Iliev’s estimate, and vice-versa.

17 In unreported results, we confirm covariate balance between the shrinker-compliers and shrinker-controls on all variables except size-related variables, including balance on growth variables.

5.7.
DiD within Strata -- Further Tests

The principal strata framework also allows us to examine the extent of potential selection bias, in which firms with higher compliance costs manipulate their float to opt out of SOX 404 compliance. Correcting for this bias was the main reason for Iliev’s IV design. In Figure Iliev-6 we show the stratum of growers and its two substrata: (1) growers-forced-compliers, which include firms that start with 2002 float > $75M; and (2) growers-voluntary-compliers, consisting of firms with 2002 float < $75M that grow their float above $75M in 2004. If Iliev’s selection bias concern is valid, one would expect the audit fees of growers-voluntary-compliers to be lower than the fees of growers-forced-compliers.

We report the DiD comparisons of ln(audit fees) of these two substrata in Table Iliev-5. We find small, insignificant differences between growers-voluntary-complier (treated) and growers-forced-complier (control) firms. So, there is no evidence that the growers-voluntary-compliers have significantly lower cost of SOX 404 than the mix of grower-compliers and grower-avoiders (if they could) in the control group. This suggests that the IV analysis in Iliev is addressing a source of potential bias that is limited in practice.

For our main tests we chose a bandwidth of $50M-$112.5M in 2002 float. As our last robustness test of the within-strata DiD analysis we systematically vary this bandwidth. Define a bandwidth parameter b. For a given b, we calculate a 2002 float bandwidth as [$75M/b, $75M * b]. The original bandwidth choice corresponds to b = 1.5. We vary b from 1.10 to 3.00 and show the estimated treatment effects as a function of b for the shrinker and modest grower strata in Figure 7, Panels A and B, respectively. To preserve degrees of freedom for narrower bandwidths, which have far fewer treated and control firms, we present the estimated coefficients only for the specification without covariates.
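The bandwidth construction above is simple enough to sketch directly. The function and constant names here are hypothetical, and the grid of b values is only an example within the 1.10 to 3.00 range mentioned in the text; the formula [$75M/b, $75M * b] is the one given above.

```python
CUTOFF_M = 75.0  # $ millions


def bandwidth(b, cutoff=CUTOFF_M):
    """Return the float window [cutoff / b, cutoff * b] for a parameter b > 1."""
    if b <= 1:
        raise ValueError("b must exceed 1")
    return cutoff / b, cutoff * b


def in_window(float_2002, b):
    """True if a firm's 2002 float falls inside the window implied by b."""
    lo, hi = bandwidth(b)
    return lo <= float_2002 <= hi


# Example grid of bandwidth parameters (illustrative values only).
for b in (1.1, 1.5, 2.0, 3.0):
    lo, hi = bandwidth(b)
    print(f"b = {b:.2f}: [${lo:.1f}M, ${hi:.1f}M]")
```

Defining the window multiplicatively makes it symmetric around the threshold in ln(float) rather than in dollars, which is why b = 1.5 gives an upper bound of $112.5M rather than $100M.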
The estimated treatment effects are relatively stable around 0.6 for large ranges of b, except when b approaches 1.1. The drop in the treatment effect for narrower bands could be driven by increased audit fees of firms that are very close to but below the threshold and prepare preemptively for expected SOX 404 compliance.

Notes on Iliev: We get very different results for covariates than he does, within our three comparison groups. This suggests that there are meaningful differences between our three strata. Ln(sales) is insignificant in Iliev’s base RD analysis. For us, it is significant and positive for the rapid growers, and insignificant for the shrinkers and modest growers. The number of geographic segments is significant and positive for Iliev. For us, it is insignificant for all three groups, with varying coefficients and signs. Both Iliev and we find that big auditors charge higher fees. But his point estimate is 0.37, while our estimates differ among groups: 0.38 for modest growers and about 0.55 for grower-compliers and shrinkers.

6. Guidance: What One Needs for Reliable Shock-Based IV

6.1. An Extended IV Validity Checklist

As we discussed in Section 2, the modern checklist for instrument validity requires authors to verify three conditions: 1) instrument exogeneity, 2) instrument strength, and 3) only-through. When we started this study, we viewed the instruments in the three shock-based papers we reexamine as valid according to the modern IV checklist. Our reexamination identifies settings in which apparently valid shock-based instruments fail one or more of the IV validity conditions. First, constructing an instrument as an interaction of the shock with post-shock covariates can violate the exogeneity condition, as in the D&D case. Second, an instrument can appear strong only because of lack of balance. Once covariate balance is improved, the instrument can fail the strength requirement, as in the DMO case.
Third, an instrument can fail the only-through condition due to lack of balance on post-shock covariates, as in the Iliev case. Our analysis suggests the following instrument validity checklist, which extends and refines the three conditions for valid instruments.

Condition 1: Instrument Exogeneity

The key starting point for credible shock-based IV is the choice of a credibly exogenous shock. The exogeneity of a shock is not statistically testable. Authors need to argue that a shock is exogenous from the point of view of the business entities in their sample. Such arguments can be based on theory or institutional knowledge. For the three papers in our analysis, the shock in Iliev is clearly exogenous because rule SOX 404 was passed in 2004 and used free float in 2002, which is non-manipulable by firms. D&D carefully defend shock exogeneity with the argument that the change in tax rules was designed to help small companies and had only unintended consequences. The variables they interact with the shock depend on unobserved firm characteristics, but the shock itself does not. DMO are on more slippery ground here: before their 1999 shock, firms chose whether to have < 100% independent audit committees. This is not fatal for DiD, but puts stress on controlling for a wide array of pre-shock covariates.

1. Instrument exogeneity
a. Construct the instrument directly as a shock dummy, or as an interaction of the shock dummy and pre-shock firm characteristics only. Adding firm fixed effects will then automatically ensure that firms have observations both pre and post (D&D lesson).
b. Thus, of our three papers, two satisfy exogeneity; the third does not, but one can hope to limit the damage.
c. The vast majority of non-shock-IV papers in AB2015 and Larcker and Rusticus (2010) do not.
2. Instrument strength
a. Reiterate general advice about weak instruments
b.
If the instrument appears weak, one can perhaps find a subsample with covariate balance, for which the instrument is both strong and plausibly satisfies only-through (DD as example). More formally, one could use Keele and Morgan-like methods to select an optimal subsample where the instrument is stronger (but then see concerns about lack of balance below).
c. If instruments appear strong, perform extensive covariate balance analysis of the variables in the first-stage regression, because lack of balance on a particular covariate can generate false instrument strength (DMO lesson).
d. If lack of balance is detected, use matching/balancing methods from the observational studies literature.
3. Only-through condition
a. Briefly reiterate general discussion about the only-through condition (DD talk about it, DMO don’t, Iliev does as well)
b. Analyze only-through argument made by DD and see if it holds
c. Then discuss the subtle violation of this condition in Iliev, mentioning collider variables from Pearl and other analogies with famous IV papers, e.g. Angrist & X study using birth month as instrument for education
d. The Iliev violation is detectable if researchers can identify the subsample of compliers and perform covariate balance tests comparing this subsample with the remaining sample.
e. If imbalance is found, IV is suspect. One could use balancing methods to restore IV validity, but the resulting sample will likely be very small (7 vs. 6 firms in the Iliev case).

6.2. Exploiting the Same Shock with Both IV and Other Methods

Our second piece of advice: when doing causal inference, authors should not exploit exogenous shocks using solely IV. Instead, they should use other shock-based methods like DiD, ES, and RD, at least first, and often last as well! For shock-based IV in which the shock is used directly as the IV, intent-to-treat DiD will provide similar statistical strength with weaker assumptions.
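The relation between intent-to-treat DiD and shock-based IV can be sketched with a small numerical example. When the shock dummy is used directly as the instrument and there are no covariates, the IV (Wald) estimate equals the intent-to-treat DiD for the outcome divided by the DiD for the instrumented variable (the first stage). The intent-to-treat estimate relies only on the DiD assumptions; reading the ratio as a causal effect additionally requires the only-through condition. All data values and function names below are hypothetical and do not come from any of the three papers.

```python
from statistics import mean


def did(pre_treated, post_treated, pre_control, post_control):
    """2x2 difference-in-differences computed from group means."""
    change_treated = mean(post_treated) - mean(pre_treated)
    change_control = mean(post_control) - mean(pre_control)
    return change_treated - change_control


def wald_iv(itt_outcome, first_stage):
    """IV estimate as the reduced form (intent-to-treat) over the first stage."""
    if first_stage == 0:
        raise ZeroDivisionError("first stage is zero: the instrument has no strength")
    return itt_outcome / first_stage


# Hypothetical outcome values for treated and control firms, pre and post shock.
itt = did([1.0, 1.2], [1.5, 1.7], [1.0, 1.1], [1.1, 1.2])

# Hypothetical instrumented variable (for example, a 0/1 compliance indicator).
first_stage = did([0, 0], [1, 1], [0, 0], [0, 1])

# Wald ratio: effect per unit of the instrumented variable.
iv_estimate = wald_iv(itt, first_stage)
```

Because the IV estimate divides by the first stage, a small or fragile first stage magnifies both the point estimate and its noise, whereas the intent-to-treat estimate is unaffected.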
If one gets results with DiD, and there is a strong case to be made that the causal effects from the shock to the outcome variables flow through a single channel, then IV can make sense as a secondary analysis.

For IV designs where the instrument is constructed as an interaction of the shock and a pre-shock covariate, if the covariate is binary, you are back to DiD with a treated group and a control group defined using the covariate. If the covariate is continuous, you have, in effect, a DiD-continuous design (for discussion of such designs, see Atanasov and Black, 2015), and should start with that.

If the causal question of interest involves an interaction between two variables of interest, one can instead start with a version of the DiDiD design. If both variables are binary, the design is true DiDiD; if one variable is binary and the other continuous (as in the DMO case, in which the authors interact a rule compliance dummy and a pre-treatment covariate measuring information costs), the design is DiDiD-continuous; if both variables are continuous (as in the D&D case, in which the interacted variables are a measure of governance and measures of need for tax shields), one would have a DiDiD-double-continuous design. As with any DiD-continuous or DiDiD-continuous design, it is often useful to turn the continuous variable into a binary one as one part of the analysis. We illustrate this approach above for D&D, but one could do this for DMO as well.

Other lessons from Iliev: RD on cross-sectional data can be improved by a combined DiD + RD design using panel data.

7. Conclusion

Larcker and Rusticus (2010) remind us that efforts to use “bad” (non-shock-based) instruments will rarely succeed. This paper provides evidence that apparently “good” (shock-based) instruments will also rarely succeed. Researchers who begin with a plausible shock should verify that it meets the conditions for a good shock, outlined above, and that IV is a sensible use of the shock.
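One such verification step is a covariate balance check between treated and control firms, which does not require looking at outcomes. The sketch below computes, for a single covariate, the two statistics reported in the covariate balance tables: the absolute two-sample t-statistic and the normalized difference. The sample values are hypothetical, and the use of the n - 1 sample variance is an assumption.

```python
from math import sqrt
from statistics import mean, variance


def abs_t_stat(treated, control):
    """|t| for a difference in means with unequal variances."""
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return abs(mean(treated) - mean(control)) / se


def normalized_difference(treated, control):
    """|difference in means| scaled by the average of the two group variances."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return abs(mean(treated) - mean(control)) / pooled_sd


# Hypothetical pre-shock covariate values for treated and control firms.
treated_x = [1.0, 2.0, 3.0, 4.0, 5.0]
control_x = [2.0, 3.0, 4.0, 5.0, 6.0]

t_value = abs_t_stat(treated_x, control_x)
nd_value = normalized_difference(treated_x, control_x)
```

Unlike the t-statistic, the normalized difference does not grow mechanically with sample size, so it is the more informative of the two when comparing balance across samples of different sizes.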
Much of this work can be done in a design stage of the research, with outcomes hidden. [*more to come]

References

Adams, Renee B., and Joao A.C. Santos (2006), Identifying the Effect of Managerial Control on Firm Performance, Journal of Accounting and Economics 41: 55-85.
Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin (1996), Identification of Causal Effects Using Instrumental Variables, Journal of the American Statistical Association 91: 444-455.
Angrist, Joshua D., and Jorn-Steffen Pischke (2009), Mostly Harmless Econometrics: An Empiricist’s Companion.
Atanasov, Vladimir, and Bernard Black (2015), Shock-Based Causal Inference in Corporate Finance and Accounting Research, Critical Finance Review, forthcoming, working paper at http://ssrn.com/abstract=1718555.
Atanasov, Vladimir, Bernard Black, Conrad Ciccotello, and Stanley Gyoshev (2010), How Does Law Affect Finance? An Examination of Equity Tunneling in Bulgaria, Journal of Financial Economics 96: 155-173.
Baiocchi, Mike, Dylan S. Small, Scott Lorch, and Paul R. Rosenbaum (2010), Building a Stronger Instrument in an Observational Study of Perinatal Care for Premature Infants, Journal of the American Statistical Association 105: 1285-1296.
Bennedsen, Morten, Kasper Meisner Nielssen, Francisco Perez-Gonzalez, and Daniel Wolfenzon (2007), Inside the Family Firm: The Role of Families in Succession Decisions and Performance, Quarterly Journal of Economics 122: 647-691.
Black, Bernard, Hasung Jang, and Woochan Kim (2006), Does Corporate Governance Affect Firms' Market Values? Evidence from Korea, Journal of Law, Economics and Organization 22: 366-413.
Black, Bernard, and Woochan Kim (2012), The Effect of Board Structure on Firm Value: A Multiple Identification Strategy Approach Using Korean Data, Journal of Financial Economics 103: 203-226.
Black, Bernard, Antonio Gledson de Carvalho, Vikramaditya Khanna, Woochan Kim and B.
Burcin Yurtoglu (2014), Methods for Multicountry Studies of Corporate Governance: Evidence from the BRIKT Countries, Journal of Econometrics (forthcoming), working paper at http://ssrn.com/abstract=2219525.
Busso, Matias, John DiNardo, and Justin McCrary (2014), New Evidence on the Finite Sample Properties of Propensity Score Reweighting and Matching Estimators, Review of Economics and Statistics 96: 885-897.
Catan, Emiliano, and Marcel Kahan (2014), The Law and Finance of Anti-Takeover Statutes, working paper, at http://ssrn.com/abstract=2517594.
Dharmapala, Dhammika, Fritz Foley, and Kristin Forbes (2011), Watch What I Do, Not What I Say: The Unintended Consequences of the Homeland Investment Act, Journal of Finance 66: 753-787.
Desai, Mihir, and Dhammika Dharmapala (2009), Corporate Tax Avoidance and Firm Value, Review of Economics and Statistics 91: 537-546.
Duchin, Ran, John Matsusaka, and Oguzhan Ozbas (2010), When Are Outside Directors Effective?, Journal of Financial Economics 95: 195-214.
Frangakis, Constantine E., and Donald B. Rubin (2002), Addressing Complications of Intention-to-Treat Analysis in the Combined Presence of All-or-None Treatment-Noncompliance and Subsequent Missing Outcomes, Biometrika 96: 365-379.
Frangakis, Constantine E., and Donald B. Rubin (2002), Principal Stratification in Causal Inference, Biometrics 58: 21-29.
Giannetti, Mariassunta, and Luc Laeven (2009), Pension Reform, Ownership Structure and Corporate Governance: Evidence from a Natural Experiment, Review of Financial Studies 22: 4092-4127.
Guner, Burak, Ulrike Malmendier, and Jeffrey Tate (2008), Financial Expertise of Directors, Journal of Financial Economics 88: 323-354.
Holland, Paul (1986), Statistics and Causal Inference, Journal of the American Statistical Association 81: 945-960.
Iliev, Peter (2010), The Effect of SOX Section 404: Costs, Earnings Quality, and Stock Prices, Journal of Finance 65: 1163-1196.
Imbens, Guido W. (2014), Matching Methods in Practice: Three Examples, working paper, at http://ssrn.com/abstract=2417602.
Imbens, Guido W., and Donald B. Rubin (2015), An Introduction to Causal Inference in Statistics, Biomedical and Social Sciences.
Karpoff, Jonathan M., and Michael D. Wittry (2014), Test identification with legal changes: The case of state antitakeover laws, working paper, at http://ssrn.com/abstract=2493913.
Keele, Luke, and Jason Morgan (2013), Stronger Instruments by Design, working paper, at http://ssrn.com/abstract=2280347.
Larcker, David F., and Tjomme O. Rusticus (2010), On the Use of Instrumental Variables in Accounting Research, Journal of Accounting and Economics 49: 186-205.
Roberts, Michael R., and Toni M. Whited (2013), Endogeneity in Empirical Corporate Finance, in George M. Constantinides, Milton Harris, and Rene M. Stulz, eds., Handbook of the Economics of Finance, vol. 2A, 493-572.
Rosenbaum, Paul R. (2009), Design of Observational Studies.
Rubin, Donald B. (2008), For Objective Causal Inference, Design Trumps Analysis, Annals of Applied Statistics 2: 808-840.
Small, Dylan, and Paul R. Rosenbaum (2008), War and Wages: The Strength of Instrumental Variables and Their Sensitivity to Unobserved Biases, Journal of the American Statistical Association 103: 924-933.
Stock, James H., Jonathan H. Wright, and Motohiro Yogo (2002), A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments, Journal of Business and Economic Statistics 20: 518-529.
Wooldridge, Jeffrey M. (2010), Econometric Analysis of Cross Section and Panel Data.

Table D&D-1. OLS Results (Table 3 in D&D) with Original and Balanced Samples

Firm and year fixed effects regressions of tax-adjusted Tobin’s q (defined in text) on indicated variables over 1993-2001. Model 1: Regression of tax-adjusted Tobin’s q on BTG (book-tax gap, defined in text) and covariates. Odd-numbered regressions use D&D sample (862 firms; 762 with two or more observations, 4,392 effective observations).
Even-numbered regressions use “balanced” sample of 487 firms (3,466 observations) with data in both 1996 and 1997. Model 2 adds institutional ownership and interaction between BTG and institutional ownership. Covariates are NOLs (net operating losses), total accruals, long term debt, and current debt, all divided by assets; R&D dummy, foreign losses dummy, ratio of option compensation to total compensation for top 5 executives, sales ($ millions), and implied Black-Scholes share price volatility. t-stats clustered on firm in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface.

Dependent variable: tax-adjusted Tobin’s q

| Model | Direct Effect (1) | Direct Effect (2) | Mediated by Inst. Ownership (3) | Mediated by Inst. Ownership (4) |
|---|---|---|---|---|
| Sample | original | balanced | original | balanced |
| BTG | 0.578 (1.03) | 0.645 (1.06) | -2.166 (1.45) | 0.131 (0.10) |
| BTG * Inst. Ownership | | | 5.669* (1.70) | 0.873 (0.36) |
| Institutional Ownership | | | 0.682* (1.89) | 0.771** (2.34) |
| Total Accruals | 1.327*** (3.48) | 0.891*** (2.62) | 1.269*** (3.51) | 0.810** (2.45) |
| Ratio of option to total compensation | 0.439*** (3.64) | 0.465*** (3.76) | 0.437*** (3.64) | 0.446*** (3.65) |
| Sales | 0.044* (1.78) | 0.047* (1.88) | 0.060*** (2.65) | 0.063*** (2.78) |
| Implied volatility | -2.114*** (3.16) | -1.557*** (2.81) | -1.947*** (3.01) | -1.401*** (2.62) |
| Net operating losses | 0.236 (0.73) | 0.177 (0.47) | 0.237 (0.75) | 0.206 (0.56) |
| Foreign losses | 5.545*** (3.40) | 5.489*** (3.85) | 5.169*** (3.34) | 5.338*** (3.79) |
| Long term debt | -2.317*** (5.82) | -2.214*** (5.47) | -2.250*** (5.58) | -2.158*** (5.14) |
| Current debt | -2.446*** (4.28) | -2.576*** (4.92) | -2.472*** (4.63) | -2.459*** (4.56) |
| R&D | -0.089 (0.04) | -0.551 (0.37) | -0.107 (0.05) | -0.432 (0.29) |
| No. of firms | 862 | 487 | 862 | 487 |
| No. of observations | 4,492 | 3,466 | 4,492 | 3,466 |

Table DD-2. DiD and DiDiD Analysis

Difference-in-difference (DiD) and triple difference (DiDiD) regressions of tax-adjusted Tobin’s q on indicated variables, with firm and year fixed effects, over 1993-2001.
“Treated” firms in DiD analysis have lowBTG96 = 1 (below-median BTG in 1996). Treated firms in DiDiD analysis have lowBTG96 = 1 and highInstOwn96 = 1 (above-median institutional ownership in 1996); post = 1 for 1997 and after. Noninteracted dummies are absorbed by firm and year effects. Covariates are same as in Table 1. Sample is “balanced” sample of 487 firms (3,466 observations) with data in both 1996 and 1997. t-stats clustered on firm in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface.

Dependent variable: tax-adjusted Tobin’s q

| Model | DiD: Direct Effect (1) | DiD: Direct Effect (2) | DiDiD: Mediated by Inst. Ownership (3) | DiDiD: Mediated by Inst. Ownership (4) |
|---|---|---|---|---|
| lowBTG96 * post | -0.006 (0.05) | -0.003 (0.03) | -0.039 (0.26) | 0.033 (0.25) |
| highInstOwn96 * post | | | 0.051 (0.35) | 0.133 (1.01) |
| lowBTG96 * highInstOwn96 * post | | | 0.076 (0.35) | -0.054 (0.27) |
| Total Accruals | | 1.019*** (3.10) | | 1.009*** (3.07) |
| Ratio of option to total compensation | | 0.461*** (3.76) | | 0.456*** (3.71) |
| Sales | | 0.060*** (2.67) | | 0.060*** (2.64) |
| Implied volatility | | -1.592*** (2.82) | | -1.622*** (2.87) |
| NOLs | | 0.083 (0.22) | | 0.081 (0.22) |
| Foreign losses | | 5.433*** (3.83) | | 5.461*** (3.86) |
| LT debt | | -2.263*** (5.73) | | -2.261*** (5.72) |
| Current debt | | -2.668*** (5.32) | | -2.653*** (5.34) |
| R&D | | -1.047 (0.80) | | -1.108 (0.85) |

Table DD-3. First-Stage IV (Table 4 in D&D) with Original and Balanced Samples

First-stage instrumental variable regressions of BTG (book-tax gap, defined in text) on indicated instrumental variables, with firm and year fixed effects, over 1993-2001. Model 1: Instruments are NOL, Long Term Debt and Current Debt, each interacted with post dummy. Model 2: Adds, as additional instruments, the interactions between these instruments and institutional ownership. Covariates are same as in Table DD-1; variables are defined in Table DD-1. Noninteracted post dummies are absorbed by firm and year effects.
Odd-numbered regressions use D&D sample (862 firms; 762 with two or more observations, 4,392 effective observations). Even-numbered regressions use “balanced” sample of 487 firms (3,466 observations) with data in both 1996 and 1997. t-stats clustered on firm in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface.

Dependent variable: BTG

| Model | Direct Effect (1) | Direct Effect (2) | Mediated by Inst. Ownership (3) | Mediated by Inst. Ownership (4) |
|---|---|---|---|---|
| Sample | original | balanced | original | balanced |
| NOL * post | -0.080* (1.74) | -0.016 (0.45) | -0.092 (0.83) | -0.035** (2.51) |
| LongTermDebt * post | -0.013 (0.80) | -0.010 (0.67) | -0.047 (1.23) | -0.045 (1.07) |
| CurrentDebt * post | -0.088* (1.73) | -0.047 (1.27) | -0.241 (1.34) | -0.310*** (2.59) |
| NOL * post * inst. ownership | | | -0.058 (0.43) | 0.037 (0.19) |
| LongTermDebt * post * inst. ownership | | | 0.055 (0.96) | 0.055 (0.87) |
| CurrentDebt * post * inst. ownership | | | 0.334 (1.24) | 0.415** (2.52) |
| Covariates | Y | Y | Y | Y |
| No. of firms | 862 | 487 | 862 | 487 |
| No. of obs. | 4,492 | 3,466 | 4,492 | 3,466 |
| F-test, joint significance of instruments (p-value) | 1.48 (0.22) | 1.05 (0.39) | 3.33** (0.02) | 3.00** (0.01) |

Table DD-4. DiD/DiDiD Assessment of Instrument Strength

DiD/DiDiD regressions, with firm and year fixed effects, of BTG (book-tax gap, defined in text), on indicated variables. In Models (1), (2), (5) and (6) treated firms are defined as firms with LowTaxShield96 = 1 and HighInstOwn96 = 1. LowTaxShield96 = firm has below-median sum of ranks for NOLs, long term debt, and current debt in 1996. HighInstOwn96 is defined in Table DD-2. Noninteracted LowTaxShield96, HighInstOwn96, and post are absorbed by firm and year effects. Even-numbered regressions include covariates. In Models (3), (4), (7) and (8) treated firms are defined as firms with LowNOL96 = 1 and HighInstOwn96 = 1. LowNOL96 = firm that has below-median NOLs in 1996. Sample is “balanced” sample of 487 firms (3,466 observations) with data in both 1996 and 1997. t-stats clustered on firm in parentheses.
*, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface. Dependent variable BTG Model DiD: Direct Effect DiDiD: Mediated by Inst. Ownership lowTaxShield96*post (1) 0.003 (0.72) (2) -0.002 (0.52) lowNOLl96*post (3) (4) -0.002 (0.36) -0.000 (0.09) -0.005 (0.66) 0.002 (0.20) highInstOwn96*post lowTaxShield96*highInstOwn 96*post lowNol96*highInst Own96*post Total Accruals Ratio of option to total compensation Sales Implied volatility NOLs Foreign losses Long Term debt Current debt R&D (5) 0.002 (0.42) 0.200*** (5.69) -0.007 (1.06) 0.000 (0.19) -0.056** (2.29) -0.147*** (6.38) -0.086 (1.20) -0.076*** (3.93) -0.144*** (3.00) -0.770*** (8.30) 0.200*** (5.69) -0.007 (1.04) 0.000 (0.19) -0.056** (2.29) -0.147*** (6.37) -0.087 (1.20) -0.076*** (3.91) -0.143*** (2.99) -0.770*** (8.28) 59 (6) -0.001 (0.21) 0.002 (0.37) -0.002 (0.28) 0.197*** (5.57) -0.008 (1.21) 0.000 (0.41) -0.050** (2.18) -0.146*** (6.37) -0.091 (1.27) -0.073*** (3.79) -0.138*** (2.86) -0.762*** (8.18) (7) (8) -0.004 (0.69) -0.008 (0.84) 0.002 (0.33) 0.004 (0.59) 0.004 (0.43) -0.004 (0.49) 0.196*** (5.56) -0.007 (1.17) 0.000 (0.38) -0.050** (2.20) -0.146*** (6.39) -0.092 (1.28) -0.073*** (3.79) -0.137*** (2.86) -0.762*** (8.18) Table DD-5. Second-Stage IV (Table 5 in D&D) with Original and Balanced Samples Second stage instrumental variables regression of tax-adjusted Tobin's q or market/book ratio on instrumented BTG, instrumented (BTG * institutional ownership), and covariates, with firm and year fixed effects. Instruments for BTG in regression (1) are NOLs*post, long term debt * post, and current debt * post. Additional instruments in regressions (2) and (3) are these instruments interacted with institutional ownership. Variables are defined in Table DD-1. Oddnumbered regressions use D&D sample (862 firms; 762 with two or more observations, 4,392 effective observations). 
Even-numbered regressions use “balanced” sample of 487 firms (3,466 observations) with data in both 1996 and 1997. t-stats clustered on firm in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface. Dependent Variable tax-adjusted Tobin’s q Market/Book Model Direct Effect Mediated by Inst. Ownership (1) (2) (3) (4) (5) (6) Sample Original Balanced Original Balanced Original Balanced 14.523 3.623 -5.871 -6.464 -6.931 -8.712 instrumented BTG (1.18) (0.48) instrumented BTG*institutional ownership Institutional ownership Total accruals Ratio of option to total compensation Sales Implied volatility NOLs Foreign losses Long Term debt Current debt R&D -2.831 (0.78) 0.349 (1.13) 0.043 (1.57) -1.022 (1.04) 1.879 (1.29) 6.519*** (2.87) -1.093 (0.88) 0.998 (0.34) 11.035 (1.06) 0.295 (0.19) 0.485*** (3.49) 0.059*** (2.63) -1.391** (2.17) 0.616 (0.51) 5.747*** (3.55) -1.988*** (2.96) -2.150* (1.84) 1.741 (0.29) 60 (1.14) 32.820** (2.52) (1.17) 14.255 (1.34) (1.41) 31.446** (2.45) (1.19) 16.230 (1.17) 1.033** (2.36) -1.359 (0.57) 0.484** (2.26) 0.047* (1.79) -1.210 (1.57) 1.191 (1.42) 4.570** (2.50) -1.401* (1.66) -1.042 (0.61) 6.327 (0.97) 0.925** (2.24) 0.742 (0.50) 0.461*** (3.40) 0.063*** (2.76) -1.459** (2.37) 0.195 (0.18) 5.143*** (3.54) -2.198*** (3.39) -2.568** (2.46) -0.963 (0.17) 1.059** (2.49) -0.445 (0.19) 0.553*** (2.88) 0.058** (2.37) -1.220 (1.55) 1.019 (1.30) 3.730** (2.09) -2.341*** (2.71) -2.604 (1.55) 4.791 (0.72) 1.157** (2.55) 1.344 (0.98) 0.494*** (3.62) 0.067*** (2.71) -1.541** (2.25) 0.031 (0.03) 4.461*** (3.20) -3.150*** (5.01) -4.104*** (4.76) -2.549 (0.48) Table DD-6. Intent-to-Treat Estimates Difference-in-difference (DiD) and triple difference (DiDiD) regressions of tax adjusted Tobin’s q on indicated variables, with firm and year fixed effects, over 1993-2001. LowTaxShield96 is defined in Table DD-4; HighInstOwn96 is defined in Table DD-2. 
Noninteracted LowTaxShield96, HighInstOwn96, and post are absorbed by firm and year effects. Even-numbered regressions include same covariates as in Table DD-1. Sample is “balanced” sample of 487 firms (3,466 observations) with data in both 1996 and 1997. t-stats clustered on firm in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface.

Dependent variable: tax-adjusted Tobin’s q

| Model | DiD: Direct Effect (1) | DiD: Direct Effect (2) | DiDiD: Mediated by Inst. Ownership (3) | DiDiD: Mediated by Inst. Ownership (4) |
|---|---|---|---|---|
| lowTaxShield96 * post | 0.035 (0.31) | -0.016 (0.16) | 0.063 (0.42) | -0.007 (0.05) |
| highInstOwn96 * post | | | 0.117 (0.60) | 0.119 (0.70) |
| lowTaxShield96 * highInstOwn96 * post | | | -0.057 (0.26) | -0.023 (0.11) |
| Total Accruals | | 1.022*** (3.10) | | 1.010*** (3.06) |
| Ratio of option to total compensation | | 0.460*** (3.75) | | 0.455*** (3.69) |
| Sales | | 0.060*** (2.66) | | 0.060*** (2.64) |
| Implied volatility | | -1.596*** (2.83) | | -1.622*** (2.86) |
| NOLs | | 0.082 (0.22) | | 0.079 (0.22) |
| Foreign losses | | 5.435*** (3.83) | | 5.460*** (3.86) |
| LT debt | | -2.262*** (5.73) | | -2.255*** (5.76) |
| Current debt | | -2.675*** (5.29) | | -2.660*** (5.34) |
| R&D | | -1.049 (0.80) | | -1.111 (0.85) |

Table DD-7. Covariate Balance

Summary statistics on covariate balance for “balanced” sample of 487 firms (3,466 observations) with data in both 1996 and 1997. Table shows means for three possible ways to define treated and control firms. Panel A: treated if lowBTG96 (defined in Table DD-2) = 1. Panel B: treated if highInstOwn96 (defined in Table DD-2) = 1. Panel C: treated if lowTaxShield96 (defined in Table DD-4) = 1. Covariates are defined in Table 1 and measured in 1996. Table shows t-test for differences in covariates $x_j$ (indexed by $j$), $|t_j| = |\bar{x}_{jt} - \bar{x}_{jc}| / [s_{jt}^2/N_t + s_{jc}^2/N_c]^{1/2}$, where $\bar{x}_{jt}$ and $\bar{x}_{jc}$ are means, $s_{jt}$ and $s_{jc}$ are standard deviations, and $N_t$ and $N_c$ are numbers of firms for treated and control groups. *, **, *** indicate significance at the 10%, 5%, and 1% levels.
Table also shows absolute values of “normalized differences”, suggested by Imbens and Rubin (2014), $ND_j = |\bar{x}_{jt} - \bar{x}_{jc}| / [(s_{jt}^2 + s_{jc}^2)/2]^{1/2}$.

Panel A. Treated = Below-Median BTG in 1996; Control = Above-Median

| Variable | Mean (Controls) | Mean (Treated) | Norm. Difference | t-statistic |
|---|---|---|---|---|
| Tax-adjusted q | 2.406 | 2.311 | 0.040 | 0.62 |
| Total Accruals | -0.031 | -0.039 | 0.078 | 1.23 |
| Options to Total Comp. | 0.355 | 0.352 | 0.007 | 0.10 |
| Sales | 3.753 | 3.017 | 0.076 | 1.19 |
| Implied Volatility | 0.312 | 0.365 | 0.252 | 4.07*** |
| Foreign Losses | 0.030 | 0.034 | 0.097 | 1.53 |
| R&D | 0.039 | 0.049 | 0.120 | 1.89* |
| No. Firms | 248 | 239 | | |

Panel B. Treated = Above-Median Institutional Ownership in 1996; Control = Below-Median

| Variable | Mean (Controls) | Mean (Treated) | Norm. Difference | t-statistic |
|---|---|---|---|---|
| Tax-adjusted q | 2.357 | 2.361 | 0.002 | 0.03 |
| Total Accruals | -0.039 | -0.032 | 0.073 | 1.14 |
| Options to Total Comp. | 0.323 | 0.384 | 0.173 | 2.74*** |
| Sales | 3.613 | 3.169 | 0.046 | -0.72 |
| Implied Volatility | 0.350 | 0.326 | 0.119 | 1.86* |
| Foreign Losses | 0.030 | 0.034 | 0.077 | 1.21 |
| R&D | 0.042 | 0.045 | 0.035 | 0.54 |
| No. Firms | 244 | 243 | | |

Panel C. Treated = Below-Median Tax Shields in 1996; Control = Above-Median

| Variable | Mean (Controls) | Mean (Treated) | Norm. Difference | t-statistic |
|---|---|---|---|---|
| Tax-adjusted q | 2.774 | 1.954 | 0.329 | 5.46*** |
| Total Accruals | -0.032 | -0.038 | 0.054 | 0.85 |
| Options to Total Comp. | 0.368 | 0.339 | 0.083 | 1.29 |
| Sales | 2.171 | 4.587 | 0.247 | 3.97*** |
| Implied Volatility | 0.366 | 0.311 | 0.265 | 4.28*** |
| Foreign Losses | 0.033 | 0.030 | 0.063 | 0.98 |
| R&D | 0.054 | 0.033 | 0.250 | 4.03*** |
| No. Firms | 241 | 246 | | |

Table DMO-1. Comparison of Compliers and Non-Compliers as of 2000

Treated (control) firms are firms which lack (have) 100% independent directors on the audit committee as of 2000. Sample is 905 firms included in the Tobin’s q regressions in Table DMO-3. Table shows two-sample t-test for differences in means and “normalized differences”, suggested by Imbens and Rubin (2014), $ND_j = |\bar{x}_{jt} - \bar{x}_{jc}| / [(s_{jt}^2 + s_{jc}^2)/2]^{1/2}$, where $\bar{x}_{jt}$ and $\bar{x}_{jc}$ are mean values for treated and control firms, and $s_{jt}$ and $s_{jc}$ are the corresponding standard deviations.
*, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface. Amounts in $ millions.

| Variable | Mean (Controls) | Mean (Treated) | Norm. Difference | t-statistic |
|---|---|---|---|---|
| Core Variables | | | | |
| Pct. Independent Directors | 69.69 | 53.04 | -0.586 | 15.70*** |
| Pct. Independent Directors on Audit Committee | 100.00 | 63.57 | -0.903 | 53.60*** |
| Information Cost Index | 0.486 | 0.459 | -0.099 | 2.03** |
| Number of Analysts | 16.076 | 16.683 | 0.038 | 0.77 |
| Analyst dispersion | 0.085 | 0.067 | -0.120 | 2.35** |
| Analyst Forecast Error | 0.162 | 0.152 | -0.026 | 0.56 |
| Pre-treatment outcome variables | | | | |
| ROA | 0.149 | 0.152 | 0.028 | 0.58 |
| Q | 2.157 | 2.365 | 0.064 | 1.41 |
| Annual Return | 0.013 | 0.012 | -0.024 | 0.51 |
| Covariates used by DMO | | | | |
| MV of equity | 8,471 | 12,451 | 0.081 | 1.83* |
| Assets | 12,096 | 15,205 | 0.038 | 0.84 |
| Board Size | 9.657 | 9.989 | 0.079 | 1.68* |
| Book leverage | 0.412 | 0.325 | -0.061 | 1.41 |
| Firm Age | 26.9 | 26.30 | -0.029 | 0.61 |
| Other potential covariates (in DMO dataset) | | | | |
| Annualized std. dev. of returns | 0.152 | 0.145 | -0.073 | 1.53 |
| Intangible assets | 0.716 | 0.700 | -0.049 | 1.04 |
| Market/book ratio | 3.506 | 3.752 | 0.031 | 0.63 |
| Number of business segments | 2.909 | 2.855 | -0.020 | 0.40 |

Table DMO-2. DMO Table 3, cols. (1)-(4) and Intent-to-Treat DiD

Dependent variables δROA and δQ are differences between 2005 and 2000 values of ROA and Q, respectively. Dependent variable mean return equals the average monthly return from 2000 to 2005. Non-comply dummy = 1 if firm did not have 100% independent audit committee in 2000. The reported first-stage regression corresponds to Model (3) (δQ). DMO’s reported first stage differs slightly (coefficient = 11.383, s.e. = 1.021) because they include in their first-stage regression observations with missing second-stage dependent variable. FF Industry Dummies = dummies for each of the 48 Fama-French industries. t-stats, clustered on industries, in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface.

Dependent variable Non-comply dummy First-Stage IV δIndep.
Directors (1) 11.399*** (9.40) δIndep Info Cost PctIndep Board Size Book Leverage Age Market Cap FF Industry Dummies Number of obs. R2 not incl. not incl. -0.186 (0.95) 0.149 (0.30) -0.073* (2.00) 0.275 (1.42) Yes 990 0.14 Second-Stage IV Intent-to-Treat DiD δROA δq (2) (3) mean return (4) 0.001 (0.03) not incl. not incl. -0.021 (0.17) 0.967** (2.53) 0.010 (0.54) -0.366*** (2.63) Yes 983 0.06 -0.252 (1.20) not incl. not incl. 1.393* (1.93) 5.090*** (4.26) 0.495*** (4.08) -13.873*** (7.33) Yes 990 0.33 0.005 (0.93) not incl. not incl. 0.002 (0.10) 0.045 (0.77) 0.002 (0.81) -0.361*** (7.65) Yes 880 0.29 64 δROA δq (5) 0.011 (0.03) (6) -2.874 (1.13) mean return (7) 0.057 (0.91) not incl. not incl. -0.021 (0.16) 0.967** (2.44) 0.010 (0.51) -0.365** (2.52) Yes 983 0.06 not incl. not incl. 1.440* (1.89) 5.053*** (4.41) 0.513*** (3.88) -13.942*** (6.99) Yes 990 0.33 not incl. not incl. 0.001 (0.07) 0.046 (0.79) 0.002 (0.72) -0.361*** (7.36) Yes 880 0.29 Table DMO-3. DMO Model versus IV Model without Control for Information Cost Results for DMO model and analogous 2SLS regressions. Dependent variables δROA and δQ are differences between 2005 and 2000 values of Q and ROA respectively. Mean return is average monthly return from 2000 through 2005. Non-comply dummy =1 if firm lacks 100% independent audit committee in 2000; 0 otherwise. Reported first-stage regressions are for sample with δQ as outcome variable. δIndep is predicted in a separate regression in the DMO model, and instrumented in our IV results. FF Industry Dummies = dummies for 48 Fama-French industries. t-statistics, with industry clusters, in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface. 
0 (1) δROA DMO Model (2) δQ (3) Mean return 0.269*** (2.72) 1.918*** (5.82) 0.056*** (6.15) Non-comply dummy Non-comply dummy * Info Cost Predicted δIndep First Stage IV (4) (5) δIndep δIndep*Info Cost -1.714* 12.126*** (1.79) (4.43) -0.548 15.436*** (0.13) (6.29) Instrumented δIndep (Predicted δIndep) * Info Cost -0.587*** (3.10) -4.714*** (7.90) Book Leverage Age Market Cap FF Industry Dummies 1st stage F-statistic Number of obs. R2 not incl. not incl. -0.000 (0.00) 1.000*** (2.93) 0.011 (0.49) -0.442*** (2.95) Yes -897 not incl. not incl. 1.307* (1.83) 5.167*** (8.18) 0.562*** (3.89) -14.985*** (6.83) Yes -905 Second Stage IV (7) (8) δQ Mean return 0.258*** (2.72) 1.754*** (4.90) 0.062*** (5.59) -0.556*** (2.79) not incl. not incl. -0.041 (0.29) 0.935** (2.58) 0.022 (1.04) -0.447*** (2.73) Yes -897 0.03 -4.281*** (5.38) not incl. not incl. 1.010 (1.42) 4.673*** (4.47) 0.647*** (4.95) -15.035*** (8.47) Yes -905 0.28 -0.113*** (4.59) not incl. not incl. -0.014 (0.72) 0.030 (0.49) 0.006* (1.76) -0.391*** (8.34) Yes -805 0.20 -0.103*** (4.98) Instrumented (δIndep * Info Cost) Info Cost PctIndep Board Size (6) δROA not incl. not incl. -0.003 (0.17) 0.045 (0.67) 0.004 (1.07) -0.384*** (7.42) Yes -897 not incl. not incl. -0.118 (0.58) 0.225 (0.48) -0.073* (1.88) 0.242 (1.03) Yes 50.94 905 0.15 65 not incl. not incl. -0.129 (1.33) 0.054 (0.23) -0.015 (0.82) 0.032 (0.22) Yes 43.26 905 0.17 Table DMO-4. Adding Information Cost as Control: IV and Intent-to-Treat DiDiD-Continuous Results for 2SLS and analogous intent-to-treat DiD regressions. Dependent variables, instruments, instrumented variables, and sample are same as in Table DMO3. Covariates are same, except we add Info Cost as a covariate. Reported first-stage regressions are for sample with δQ as outcome variable. t-statistics, with industry clusters, in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface. 
Dependent Variable: Non-comply dummy Non-comply dummy * Info Cost First Stage IV (1) (2) δIndep δIndep*Info Cost -0.299 10.490*** (0.21) (3.23) 2.869 12.482*** (0.46) (3.49) Instrumented δIndep Instrumented δIndep * Info Cost Info Cost PctIndep Board Size Book Leverage Age Market Cap FF Industry Dummies 1st stage F-statistic Number of obs. R2 -4.025 (0.87) not incl. -0.127 (0.64) 0.237 (0.54) -0.073* (1.89) 0.191 (0.78) Yes 49.08 905 0.16 3.480 (1.42) not incl. -0.122 (1.18) 0.044 (0.17) -0.015 (0.81) 0.076 (0.55) Yes 45.38 905 0.17 Second Stage IV (4) (5) δQ Mean return (3) δROA 0.234* (1.72) -0.507* (1.72) -0.914 (0.29) not incl. -0.039 (0.27) 0.940*** (2.67) 0.021 (0.97) -0.456*** (3.07) Yes -897 0.04 66 1.025 (1.63) -2.758** (2.15) -28.171* (1.89) not incl. 1.060 (1.49) 4.835*** (5.83) 0.617*** (4.95) -15.263*** (7.85) Yes -905 0.33 0.063*** (3.20) -0.116*** (2.85) 0.049 (0.11) not incl. -0.014 (0.72) 0.029 (0.49) 0.006* (1.69) -0.390*** (8.57) Yes -805 0.19 (6) δROA Intent-to-Treat DiDiD (7) (8) δQ Mean return 2.573* (1.76) -5.596* (1.83) 11.575* (1.81) -31.487** (2.53) 0.613*** (3.03) -1.109** (2.57) -3.639 (1.68) not incl. -0.004 (0.14) 0.973*** (0.36) 0.011 (0.02) -0.457*** (0.15) Yes -897 0.08 -41.892*** (4.84) not incl. 1.266* (0.74) 4.956*** (0.67) 0.584*** (0.15) -15.277*** (2.23) Yes -905 0.38 -0.660** (2.30) not incl. -0.005 (0.02) 0.042 (0.06) 0.003 (0.00) -0.388*** (0.05) Yes -805 0.32 Table DMO-5. Adding Pct. Independent Directors as Control: IV Results 2SLS regressions. Sample is trimmed to PctIndep (0.25 , 0.80]. Columns (1)-(4) report first- and second-stage results using non-comply dummy to instrument for δIndep. Columns (5)-(9) report results using non-comply dummy to instrument for δIndep. In these columns, dependent variables, instruments, and instrumented variables are same as in Table DMO-4, and covariates are the same, except we add PctIndep. Reported first-stage regressions are for sample with δQ as outcome variable. 
t-statistics, with industry clusters, in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface. Sample PctIndep (0.25 , 0.80] Dependent Variable: Non-comply dummy First Stage IV (1) δIndep (2) δROA Second Stage IV (3) (4) δQ Mean return 2.277** (2.34) Non-comply dummy * Info Cost -0.037 (-0.16) -2.514* (-1.74) 0.001 (0.03) -1.487 (-0.52) -0.589*** (-13.30) 0.064 (0.35) 0.569** (2.41) 0.050 (1.34) 0.495 (1.64) Yes -6.371*** (-3.05) -0.042 (-0.29) -0.005 (-0.04) 1.300*** (3.70) 0.006 (0.22) -0.355 (-1.29) Yes -53.387*** (-5.18) -1.542* (-1.73) 1.651* (1.93) 6.723*** (4.99) 0.752*** (4.32) -14.998*** (-7.48) Yes -1.109*** (-3.12) -0.003 (-0.09) -0.005 (-0.17) -0.018 (-0.42) 0.004 (0.75) -0.371*** (-8.62) Yes 719 0.33 712 0.10 719 0.01 638 0.30 Instrumented δIndep First Stage IV (5) (6) δIndep δIndep* Info Cost -1.263 -3.346*** (0.61) (2.90) 7.519* 10.083*** (1.82) (3.45) Instrumented δIndep * Info Cost Info Cost PctIndep Board Size Book Leverage Age Market Cap FF Industry Dummies 1st stage F-stat (p-value) Number of obs. R2 67 -4.654 (1.38) -0.586*** (13.46) 0.054 (0.31) 0.573** (2.45) 0.048 (1.27) 0.488 (1.65) Yes 4.09 (.023) 5.574*** (2.74) -0.278*** (11.41) -0.045 (0.39) 0.267* (1.91) 0.045** (2.58) 0.173 (1.18) Yes 6.14 (.0043) (7) δROA 0.412 (0.77) -0.696 (1.06) 1.161 (0.17) 0.026 (0.15) -0.060 (0.37) 1.227*** (3.61) 0.016 (0.56) -0.441 (1.32) Yes -- Second Stage IV (8) (9) δQ Mean return -1.017 (0.36) -2.433 (0.74) -27.261 (0.75) -1.346 (1.31) 1.477 (1.63) 6.507*** (4.55) 0.791*** (5.01) -15.294*** (7.75) Yes -- 0.170 (0.94) -0.256 (1.39) 1.641 (0.74) 0.026 (0.40) -0.030 (0.64) -0.056 (0.80) 0.007 (1.18) -0.403*** (7.95) Yes -- Table Iliev-1. Extended Version of Iliev (2010) Table 2 Ordinary least squares (2SLS) regressions of ln(audit fees in 2004) on dummy for SOX § 404 compliance, indicated covariates (measured for, or at end of, fiscal 2004), 10 industry dummies, and constant term. 
Sample is 281 firms with 2004 free float [$50M, $100M]. Non-growth variables are defined in Iliev (2010); growth variables are measured from 2002-2004. The first-stage regressions have the same controls and fixed effects as the second stage. t-statistics, with heteroskedasticity-consistent standard errors, are in brackets. 95% confidence interval (CI) for compliance dummy shown below t-statistic. *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively. Significant results (at 5% or better) in boldface. Amounts in $M.

Dependent variable: ln(2004 audit fees)

| Our regression | (1) | (1A) | (2) | (2A) |
|---|---|---|---|---|
| Iliev regression | (1) | | (2) | |
| Compliance dummy | 0.866*** [7.57] | 0.749*** [7.36] | 0.744*** [7.39] | 0.706*** [6.40] |
| 95% CI | [0.64, 1.09] | [0.55, 0.95] | [0.55, 0.94] | [0.49, 0.92] |
| Implied % increase in fees | 138% | 112% | 111% | 103% |
| Cubic in float | Yes | Yes | Yes | Yes |
| Ln(market cap) | | 0.020 [0.21] | 0.050 [0.51] | 0.088 [0.87] |
| Ln(sales) | | 0.042 [1.52] | 0.031 [1.09] | 0.047 [1.32] |
| Ln(assets) | | 0.354*** [5.43] | 0.235*** [3.35] | 0.183** [2.33] |
| Leverage | | | 0.612*** [2.62] | 0.653*** [2.69] |
| Receivables/assets | | | 0.086 [0.35] | 0.105 [0.43] |
| Big auditor | | | 0.370*** [3.94] | 0.361*** [3.80] |
| No. of business segments | | | 0.040 [1.45] | 0.043 [1.53] |
| No. of geographic segments | | | 0.070*** [2.91] | 0.072*** [3.03] |
| Growth variables: | | | | |
| Change in ln(float) | | | | 0.012 [0.55] |
| Change in ln(market cap) | | | | -0.096* [-1.95] |
| Change in ln(sales) | | | | -0.020 [-0.45] |
| Change in ln(assets) | | | | 0.078 [0.85] |
| F-stat for growth variables | | | | 10.93*** |
| Industry dummies, constant | Yes | Yes | Yes | Yes |
| Sample | 281 | 281 | 281 | 275 |
| R2 | 0.32 | 0.49 | 0.55 | 0.56 |

Table Iliev-2. Summary Statistics for Covariate Balance
Panel A. Covariate balance in 2002, for 180 firms in Iliev dataset with 2002 free float [$50M, $75M] versus 110 firms with 2002 free float [$75M, $100M]. Panel B. Covariate balance in 2004, for 164 SOX compliers and 117 controls with 2004 free float [$50M, $100M]. Panel C.
Covariate balance in 2004, for 22 shrinkers (IV-compliers, with 2002 float > $75M and 2004 float < $75M), and 117 controls. All panels. Growth variables are measured from 2002-2004. Tables show t-statistic for differences in means between treated and control group and "normalized difference" (see Imbens and Rubin, 2014), defined as

$$ND_j = |\bar{x}_{jt} - \bar{x}_{jc}| \, / \, [(s_{jt}^2 + s_{jc}^2)/2]^{1/2},$$

where $x_j$ is a covariate, $\bar{x}_{jt}$ and $\bar{x}_{jc}$ are its means, and $s_{jt}$ and $s_{jc}$ are its standard deviations, for the treated and control groups. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant differences (at 5% level or better) are in boldface.

Panel A. Covariate Balance in 2002: Free Float [$50M, $75M] versus [$75M, $100M]

| Variable | Mean, float < $75M | Mean, float > $75M | Norm. difference | t-statistic |
|---|---|---|---|---|
| Size variables: | | | | |
| ln(float) | 4.474 | 4.769 | 0.194 | 2.14** |
| ln(market cap) | 4.819 | 5.027 | 0.277 | 3.34*** |
| ln(sales) | 4.173 | 4.527 | 0.107 | 1.28 |
| ln(assets) | 4.646 | 4.982 | 0.238 | 2.86*** |
| Iliev's other covariates: | | | | |
| Leverage | 0.135 | 0.159 | 0.086 | 1.02 |
| Receivables/assets | 0.257 | 0.298 | 0.132 | 1.57 |
| Big auditor | 0.783 | 0.791 | 0.013 | 0.15 |
| No. of business segments | 1.739 | 1.845 | 0.057 | 0.68 |
| No. of geographic segments | 1.600 | 2.145 | 0.229 | 2.77*** |
| Growth variables: | | | | |
| Change in ln(float) | 0.353 | 0.315 | -0.025 | 0.28 |
| Change in ln(market cap) | 0.252 | 0.124 | -0.117 | 1.30 |
| Change in ln(assets) | 0.121 | 0.158 | 0.057 | 0.68 |
| Change in ln(sales) | 0.134 | 0.159 | 0.026 | 0.31 |

Panel B. Covariate Balance in 2004: SOX-compliers versus Controls

| Variable | Mean, controls | Mean, SOX-compliers | Norm. difference | t-statistic |
|---|---|---|---|---|
| Size variables: | | | | |
| ln(float) | 4.128 | 4.386 | 0.762 | 13.43*** |
| ln(market cap) | 4.385 | 4.783 | 0.533 | 7.42*** |
| ln(sales) | 4.221 | 4.138 | -0.027 | 0.31 |
| ln(assets) | 4.467 | 4.701 | 0.154 | 1.855* |
| Iliev's other covariates: | | | | |
| Leverage | 0.177 | 0.152 | -0.085 | 1.00 |
| Receivables/assets | 0.304 | 0.261 | -0.141 | 1.67* |
| Big auditor | 0.744 | 0.829 | 0.146 | 1.75* |
| No. of business segments | 2.154 | 1.774 | -0.182 | 2.19** |
| No. of geographic segments | 1.974 | 1.976 | 0.000 | 0.01 |
| Growth variables: | | | | |
| Change in ln(float) | 0.960 | 0.303 | -0.331 | 4.33*** |
| Change in ln(market cap) | 0.508 | 0.159 | -0.272 | 3.25*** |
| Change in ln(assets) | 0.142 | 0.073 | -0.113 | 1.30 |
| Change in ln(sales) | 0.121 | 0.008 | -0.089 | 1.00 |

Panel C. Covariate Balance in 2004: Shrinkers (IV-compliers) versus Controls Variable Size variables ln(float) ln(market cap) ln(sales) ln(assets) Iliev’s other covariates Leverage Receivables/ assets Big auditor No. of business segments No. of geographic segments Growth variables Change in ln(float) Change in ln(market cap) Change in ln(assets) Change in ln(sales) Controls Means SOX-compliers Norm. Difference T-Test Value 4.128 4.385 4.221 4.467 4.125 4.616 3.653 4.818 -0.015 0.353 -0.138 0.242 0.09 2.18** 1.12 1.36 0.177 0.304 0.744 2.154 1.974 0.090 -0.224 0.298 -0.223 0.153 0.589 -1.295 1.697* -1.333 0.954 0.09 0.22 0.30 0.22 0.15 0.960 0.508 0.142 0.121 -0.566 -0.582 -0.105 -0.245 -0.652 -0.700 -0.348 -0.144 4.19*** 5.76*** 2.61** 1.43

Table Iliev-3. Extended Version of Iliev (2010) IV Analysis in His Table 2
Two-stage least squares (2SLS) regressions of ln(audit fees in 2004) on dummy for SOX § 404 compliance, indicated covariates (measured for, or at end of, fiscal 2004), 10 industry dummies, and constant term. Sample is 281 firms with 2004 free float [$50M, $100M]. Non-growth variables are defined in Iliev (2010); growth variables are measured from 2002-2004. First-stage regressions have the same covariates as the second stage. t-statistics, with heteroskedasticity-consistent standard errors, are in brackets. *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively. Significant results (at 5% or better) in boldface.18

Stage Dep.
variable Our regression Iliev regression IV (2002 float > $75M) First (3) (3) 0.466*** [7.28] First First First SOX Compliance (3A) (4) (4A) (4) 0.325*** 0.376*** 0.287*** [6.48] [6.10] [5.88] First Second (4B) (3) (3) Log market cap Log sales Log assets Leverage Receivables/assets Big auditor No. of business segments No. of geographic segments No Yes No 0.328*** [5.59] -0.002 [0.13] 0.030 [0.72] -0.063 [0.38] -0.165 [1.15] -0.014 [0.22] -0.029 [1.51] 0.004 [0.31] Yes 0.148*** [3.03] -0.007 [-0.43] 0.024 [0.74] 0.054 [0.39] -0.230** [-2.11] 0.009 [0.18] -0.031** [-2.30] 0.006 [0.60] Second Second Ln(2004 audit fees) (3A) (4) (4A) (4) Second (4B) 0.237*** [4.53] Instrumented SOX compliance 95% CI Implied % increase in fees Cubic in float Second Yes 0.156*** [3.09] 0.004 [0.25] -0.001 [-0.02] 0.095 [0.65] -0.237** [-2.21] 0.002 [0.05] -0.031** [-2.35] 0.008 [0.86] 1.171*** [4.95] [0.71,1.64] 223% No 1.332*** [3.98] [0.67,1.99] 279% Yes 0.983*** [3.65] [0.45,1.51] 167% No -0.052 [-0.38] 0.034 [1.09] 0.218*** [2.79] 0.647*** [2.75] 0.129 [0.55] 0.373*** [3.86] 0.047* [1.71] 0.069*** [2.77] 1.070*** [3.11] [0.39,1.75] 192% Yes -0.006 [-0.05] 0.036 [1.14] 0.218*** [2.78] 0.598** [2.51] 0.160 [0.64] 0.364*** [3.81] 0.051* [1.82] 0.069*** [2.71] 1.179** [2.45] [0.23,2.12] 225% Yes 0.001 [0.01] 0.048 [1.34] 0.180** [2.17] 0.598** [2.32] 0.208 [0.80] 0.357*** [3.69] 0.057* [1.90] 0.067** [2.58] Growth variables -0.029*** [-2.91] -0.030 Change in ln(float) Change in ln(market cap) 18 0.038 [1.14] -0.073 We replicated Iliev’s first- and second-stage coefficients, and his second-stage, but not first-stage standard errors. For regression (3), he reports t = 10.15 (we find 7.28); for regression (4), he reports t = 7.74 (we find 6.10). 71 Stage Dep. 
variable Our regression Iliev regression First First (3) (3) (3A) First SOX Compliance (4) (4) First First Second (4A) (4B) (3) (3) Yes Yes 281 0.60 [-1.00] -0.015 [-0.30] -0.022 [-0.98] 3.96*** Yes Yes 275 0.62 Change in ln(sales) Change in ln(assets) F-test for growth vars. Free float cubic Industry dummies, constant Observations R2 No Yes 281 0.21 Yes Yes 281 0.56 No Yes 281 0.54 72 No Yes 281 0.28 Second Second Second Ln(2004 audit fees) (3A) (4) (4A) (4) Yes Yes 281 0.28 No Yes 281 0.54 Yes Yes 281 0.54 Second (4B) [-1.26] 0.081 [0.84] -0.009 [-0.17] 9.99*** Yes Yes 275 0.53 Table Iliev-4. DiD Analysis Using Strata (sample with 2002 float [$ 50M, $112.5M) Firm fixed effects regressions of ln(audit fees) on SOX compliance dummy and indicated covariates, using combined DiD/RD research design (DiD for firms with 2002 float with a [$50M, $112.5M] bandwidth around the SOX compliance threshold of $75M). Sample period is 2002-2004. For the Shrinkers strata (regressions (1)-(3)), sample is firms with ln(2004 float/2002 float) [-1.3, 0] (maximum drop in float of 73%). Treated firms are firms with 2002 float > 75M; control firms have 2002 float < 75M. in the Modest Grower strata comparison, We focus only on modest growers (firms with higher float in 2004 than 2002) and exclude firms with 2002 float < $75M but 2004 float > $75M and firms with 2002 float > $75M and 2004 float > 112.5M. Treated firms are modest growers with 2002 float > 75M. Control firms are firms with 2002 float < 75M. Some control firms may have limited 2004 float to avoid SOX § 404 compliance. In the combined strata comparison, we pool the treated and control firms from the Shrinker and Modest Grower strata. Standard errors clustered on firm in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface. 
Dependent variable: Ln(audit fees)

| Regression | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) |
|---|---|---|---|---|---|---|---|---|---|
| Strata | Shrinker | Shrinker | Shrinker | Modest Grower | Modest Grower | Modest Grower | Both | Both | Both |
| Cubic Float Terms | No | No | Yes | No | No | Yes | No | No | Yes |
| SOX Compliance Dummy (Treated * Post) | 0.616*** [3.53] | 0.590*** [3.17] | 0.606*** [3.25] | 0.645*** [3.48] | 0.632*** [3.75] | 0.507** [2.29] | 0.627*** [4.97] | 0.578*** [4.43] | 0.586*** [4.58] |
| 95% CI | [0.27,0.97] | [0.22,0.96] | [0.23,0.98] | [0.34,1.03] | [0.29,0.97] | [0.06,0.96] | [0.38,0.88] | [0.32,0.84] | [0.33,0.84] |
| Implied % increase in fees | 85% | 80% | 83% | 91% | 88% | 66% | 87% | 78% | 80% |
| Log sales | | 0.128** [2.49] | 0.141** [2.11] | | -0.304 [-0.98] | -0.184 [-0.49] | | 0.105 [1.37] | 0.120 [1.46] |
| Log assets | | 0.108 [0.61] | 0.130 [0.71] | | 0.996** [2.54] | 0.960** [2.41] | | 0.270* [1.67] | 0.285* [1.72] |
| Log market size of equity | | 0.023 [0.29] | 0.015 [0.18] | | -0.078 [-0.79] | -0.090 [-0.89] | | 0.010 [0.14] | 0.001 [0.02] |
| Leverage | | 0.162 [0.52] | 0.130 [0.45] | | -0.450 [-0.93] | -0.342 [-0.70] | | -0.038 [-0.14] | -0.056 [0.21] |
| Receivables/total assets | | -0.228 [-0.27] | -0.326 [0.35] | | 0.981 [1.52] | 0.898 [1.28] | | 0.247 [0.44] | 0.177 [0.29] |
| Big auditor | | 0.484*** [4.09] | 0.465*** [3.51] | | 0.379*** [2.84] | 0.363** [2.64] | | 0.434*** [4.77] | 0.424*** [4.57] |
| No. of business segments | | -0.024 [-0.22] | -0.021 [0.19] | | 0.024 [0.25] | 0.051 [0.51] | | -0.021 [-0.22] | -0.018 [0.18] |
| No. of geogr. segments | | 0.036 [0.47] | 0.038 [0.50] | | 0.096 [0.53] | 0.083 [0.42] | | 0.045 [0.65] | 0.048 [0.70] |
| No. Obs. | 197 | 197 | 197 | 120 | 120 | 120 | 317 | 317 | 317 |
| No. Firms | 67 | 67 | 67 | 40 | 40 | 40 | 107 | 107 | 107 |
| No. Treated Firms | 39 | 39 | 39 | 20 | 20 | 20 | 59 | 59 | 59 |

Table Iliev-5. Selection Bias Check: Grower-compliers vs. Growers Forced to Comply
DiD model of log(2004 audit fees) for grower-complier firms versus growers forced to comply. Sample period is 2002-2004. We start with a sample of firms with 2002 float between $50M and $112.5M. We focus only on growers (firms with higher float in 2004 than 2002) and exclude firms with an increase in ln(float) from 2002 to 2004 greater than 1.3 (increase of 267%).
Grower-compliers are firms that move from 2002 float < $75M to 2004 float > $75M. Growers forced to comply are firms with 2002 float > $75M and 2004 float > $112.5M. Models 1 and 2 use firms with data in 2002 and 2004; Model 3 uses a balanced sample of firms with data in all three years. Standard errors clustered on firm in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface.

Dependent variable: Ln(Audit Fees)

| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| Cubic Float Terms | No | No | Yes |
| growerComplier * Post | -0.064 [-0.81] | -0.041 [0.53] | -0.037 [0.48] |
| Log sales | | 0.124*** [2.64] | 0.124*** [2.62] |
| Log assets | | 0.117 [1.05] | 0.116 [1.01] |
| Log market size of equity | | -0.069 [1.30] | -0.070 [1.28] |
| Leverage | | 0.198 [0.67] | 0.198 [0.66] |
| Receivables scaled by total assets | | -0.009 [0.02] | -0.005 [0.01] |
| Big auditor | | 0.557*** [3.56] | 0.556*** [3.54] |
| No. of business segments | | 0.028 [0.39] | 0.028 [0.40] |
| No. of geogr. segments | | 0.043 [1.02] | 0.043 [1.02] |
| No. Obs. | 546 | 546 | 516 |
| No. Firms | 187 | 187 | 172 |
| No. Treated Firms | 106 | 106 | 93 |

Figure DD-1. Average Tobin’s q of Treated vs. Control Firms through Time
Average Tobin’s q for treated and control firms over the 1993-2001 period. Treated firms have below-median LowTaxShield96 (defined in Table DD-4), indicating greater need to shelter taxable income. Control firms have above-median values for this variable.

Figure DMO-1. Scatter Plot of PctIndep versus δIndep
Scatter plot of percentage of independent directors in 2000 (PctIndep) and change in this percentage from 2000 to 2005 (δIndep). Complier (non-complier) firms are firms that do (do not) have 100% independent audit committee in 2000. Compliers (non-compliers) are shown with green circles (orange triangles). We add vertical lines at PctIndep=25% and 80% to highlight the limited overlap between treated and control firms outside these bounds. Sample = 905 firms used in Tobin’s q regressions in Table DMO-3. Correlation between PctIndep and δIndep is r = -0.67.
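The overlap restriction and the correlation reported in the Figure DMO-1 caption can be illustrated with a minimal sketch. It assumes each firm is a record with a pre-shock share of independent directors and a subsequent change in that share. The field names (`pct_indep`, `delta_indep`, `non_comply`) and the toy values are hypothetical and are not taken from the DMO dataset; only the (0.25, 0.80] band comes from the text.

```python
from math import sqrt


def trim_to_overlap(firms, lo=0.25, hi=0.80):
    """Keep firms whose pre-shock PctIndep lies in the half-open band (lo, hi]."""
    return [f for f in firms if lo < f["pct_indep"] <= hi]


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy records, for illustration only (not data from the paper).
firms = [
    {"pct_indep": 0.20, "delta_indep": 0.30, "non_comply": 1},
    {"pct_indep": 0.25, "delta_indep": 0.28, "non_comply": 1},
    {"pct_indep": 0.40, "delta_indep": 0.15, "non_comply": 1},
    {"pct_indep": 0.60, "delta_indep": 0.05, "non_comply": 0},
    {"pct_indep": 0.80, "delta_indep": 0.00, "non_comply": 0},
    {"pct_indep": 0.90, "delta_indep": -0.05, "non_comply": 0},
]

# Firms at exactly 0.25 are dropped; firms at exactly 0.80 are kept.
trimmed = trim_to_overlap(firms)

# Correlation between the pre-shock level and the later change, full toy sample.
r_full = pearson_r(
    [f["pct_indep"] for f in firms],
    [f["delta_indep"] for f in firms],
)
```

A negative `r_full` in real data would echo the pattern in the figure: firms that start with fewer independent directors add more of them, whether or not the shock applies, which is why balance on PctIndep matters for the first stage.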
Figure DMO-2. Number of Treated and Control Firms by PctIndep Bins
Histogram plot of number of complier (control) and non-complier (treated) firms, by percentage of independent directors in 2000 (PctIndep). We add vertical lines at PctIndep=25% and 80% to highlight the limited overlap between treated and control firms outside these bounds. Firms with exactly 25% independent directors are included in the 20-25% bin, and similarly for other bins. Sample is same as Figure DMO-1.

Figure DMO-3. δIndep for Compliant and Non-Compliant Firms by Bins of Percent Independent Directors in 2000
Figure shows mean δIndep, separately for complier (control) and non-complier (treated) firms, within bins for percentage of independent directors in 2000 (PctIndep). Firms with exactly 30% independent directors are included in the 25-30% bin, and similarly for other bins. Sample is 711 firms with PctIndep (0.25, 0.80].

Figure Iliev-1. Audit Fees vs. Free Float and SOX 404 Compliance in 2004
Natural logarithm of 2004 audit fees versus 2004 free float. Graph shows four groups: (i) 41 “shrinker-complier” firms which shrink from float > $75M (in 2002 or 2003) to < $75M in 2004 [red triangles]; (ii) 80 “grower-complier” firms which grow from float < $75M in 2002 to > $75M in 2004 [green diamonds]; (iii) 117 control firms with float < $75M over 2002-2004 [black circles]; and (iv) 42 “large-complier” firms with float > $75M in 2004 [black hollow circles]. Red line is from regression, for the 163 SOX-complier firms, of ln(2004 audit fees) on 2004 float, large dummy (=1 if 2004 float > $75M), and constant term. Equation is: 12.25 + .013 * Float2004 - 0.527 [t = 2.14] * (large dummy). Green line ending at float of $75M is from regression for 117 control firms of ln(2004 fees) on 2004 float and constant term. Equation is: 12.23 + .006 * Float2004.
For firms with 2004 float just below (above) $75M, predicted difference in ln(2004 fees) between shrinker-compliers (other SOX-compliers) and control firms = 1.07 (0.54).

Figure Iliev-2. Growth Trajectories for Instrument-compliers, Other SOX-compliers, and Control Firms
Scatter plot of free float in 2002 and 2004 and representation of three groups of firms included in Iliev’s study: 22 IV-compliers with 2002 float > $75M but float in 2003 and 2004 < $75M, shown with red triangles and red border; 119 control firms, with 2002 float and 2004 float < $75M, shown with green diamonds and green border; and 142 SOX-complier firms with 2002 float > $75M, shown with black circles and black border. The remaining firms, represented with black dots, are included in the dataset provided by Iliev but fall outside his $50M-$100M 2004 free float band. We truncate the sample of firms not used by Iliev at 2002 and 2004 free float between $20M and $250M. Dotted line is 45-degree line; firms above (below) the line have an increase (decrease) in float from 2002 to 2004.

Figure Iliev-3. Change in Ln(Float) for IV Compliers versus Controls
Histogram of number of firms within indicated ranges for change in ln(float) from 2002-2004, for 22 “IV-complier” firms (2002 float > $75M but 2004 float < $75M) and 116 SOX-exempt firms (float < $75M over 2002-2004). We drop three of the 119 SOX-exempt firms in the Iliev sample, because they report float in 2002 > $75M. There is overlap only in the [-0.25, 0] bin of change in ln(float). This bin includes 7 IV-compliers and 6 control firms.

Figure Iliev-4. Ln(Float) in 2004 vs 2002 for Shrinkers Strata
Scatter plot of free float in 2002 and 2004 and visual representation of the treated and control firms included in the DiD analysis of shrinkers reported in Table Iliev-4. The sample is confined to firms with 2002 float between $50M and $112.5M. Treated firms, represented by red triangles, have 2002 float > $75M and 2002 float > 2004 float.
Control firms, represented by green diamonds, have 2002 float < $75M and 2002 float > 2004 float. The red and green areas represent the additional filter that the difference between ln(2004 float) and ln(2002 float) should be higher than -1.3 (less than 73% drop in float). We truncate the sample of remaining firms, represented by black dots, at 2002 and 2004 free float between $20M and $250M.

Figure Iliev-5. Log(Float) 2004 vs Log(Float) 2002 and the Modest Grower Strata
Visual representation of the treated and control firms included in the DiD analysis of modest growers reported in Table Iliev-4. The sample is confined to firms with 2002 float between $50M and $112.5M. Treated firms, represented by red triangles, have 2002 float > $75M and 2004 float < $112.5M. Control firms, represented by green diamonds, have 2002 float < $75M and 2004 float < $75M. We truncate the sample of remaining firms, represented by black dots, at 2002 and 2004 free float between $20M and $250M.

Figure Iliev-6. Log(Float) 2004 vs Log(Float) 2002: Grower-Compliers Versus Similar Growers Forced to Comply
Visual representation of the treated and control firms included in the DiD analysis of grower-compliers vs. growers forced to comply in Table Iliev-5. Sample is limited to firms with $50M < 2002 float < $112.5M. Grower-compliers, represented by red triangles, have 2002 float < $75M and 2004 float > $75M. Growers forced to comply, represented by green triangles, have 2002 float > $75M and 2004 float > $112.5M. Red and green shaded areas represent additional filter that ln(2004 float) - ln(2002 float) < 1.3 (< 267% increase in float). We truncate the sample of remaining firms, represented by black dots, at 2002 and 2004 free float between $20M and $400M.

Figure Iliev-7. Effect of Bandwidth on Estimated Treatment Effects
Panel A. Shrinkers strata. Graph presents the estimated coefficient on ln(2004 audit fees) for the shrinkers strata, for different bandwidth parameters b (b > 1).
We estimate Regression (1) of Table Iliev-4 for firms with 2002 free float [$75M/b, $75M*b] (b incremented by 0.1) and record the coefficient on Treated * Post. Panel B. Modest growers strata. Similar to panel A, except sample is modest growers; we estimate Regression (3) of Table Iliev-4. Both panels. Dotted lines indicate upper and lower 90% confidence bounds.

[Figure Iliev-7 panels: Panel A. Shrinkers; Panel B. Modest Growers; Panel C. Shrinkers and Modest Growers Combined]

Figure Iliev-8. Treatment Effects over Time
Top panel shows the mean change from 2002 in ln(audit fees) for SOX-compliers vs. non-compliers within shrinkers strata (defined in Figure Iliev-4) in 2003 and 2004, together with 90% confidence interval for specifications with and without covariates. Bottom panel is similar, for modest growers strata (defined in Figure Iliev-5).

Appendix. Other Shock-Based IV Papers in the AB-2015 Sample

AB-2015 found eight shock-based IV papers in their sample of 863 empirical corporate governance papers published in major journals over 2001-2011. This appendix explains why we chose to re-examine D&D, DMO, and Iliev, rather than other papers. We excluded Bennedsen et al. (2007), who rely on a truly random shock. Their data would likely have been proprietary in any case. We spoke with Dhammika Dharmapala, who was a coauthor of two of these papers, and he recommended that we review D&D rather than Dharmapala, Foley, and Forbes (2011), for which the data was proprietary. We discuss here the remaining three papers, and why we did not select them for review: Adams and Santos (2006); Giannetti and Laeven (2009); and Guner, Malmendier and Tate (2008). Of note: in none of these three papers were the IV results statistically significant at the conventional 5% level.

Adams and Santos (2006)

Adams and Santos (2006) is the first shock-based IV paper in our sample (by time of publication). It is a strong and careful paper in many ways.
The authors study whether the “wedge” between managers’ voting rights and their cash flow rights affects Tobin’s q. They study banks with trust departments that hold the bank’s own shares. The bank’s managers control the voting of these shares, but have no associated cash flow rights. The main empirical design is cross-sectional OLS regressions of Tobin’s q on bank voting rights in 1966 (the year for which they were able to obtain ownership data, from a U.S. House of Representatives report).

The authors use a shock-based IV design in a secondary analysis, intended to address concerns about possible endogeneity between the bank’s holdings of its own shares in fiduciary accounts and Tobin’s q. They use, as instruments for wedge, dummy variables for the four types of state laws that regulate whether and how a bank trust department can vote the bank’s own shares. These laws range from no restriction on voting to a ban on voting. These laws were in place well before 1966. One might question whether these laws affect Tobin’s q only through managerial voting of trust shares. For example, tight regulation of voting might well be associated with tight regulation of banks generally, which might predict Tobin’s q. The authors are aware of this issue, and discuss why the only through condition is plausibly satisfied.

Adams and Santos (2006) use two different measures of wedge. For the first measure, they report that the F-statistics for their four instruments are quite low and, in most specifications, are not statistically significant. They therefore instrument only for the second measure. The F-statistics for this measure in the first-stage regressions range from 3.02 to 6.94, which suggests that the instruments, although jointly significant, remain vulnerable to a weak instruments problem. The authors do not report their 2SLS results in a table, but do provide limited results in a footnote.
The coefficients on instrumented wedge are much smaller than in the OLS specification and are not statistically significant. The authors run a Durbin-Wu-Hausman test for endogeneity, which does not reject the null of no endogeneity, in which case OLS is the preferred specification.

We did not select this paper to replicate because the 2SLS results were statistically insignificant, not reported in text, and not the focus of the paper. We also had concerns about the only through condition.

Giannetti and Laeven (2009)

Giannetti and Laeven (2009) examine the effect of pension fund ownership on firm market value. They use pension reform in Sweden as a shock to ownership of public companies by public pension funds. The reform proceeded in two phases. In the first phase, the one public pension fund with significant equity holdings was required to divest most of these holdings. In the second phase, four other public pension funds, which had previously invested primarily in debt, were required to increase their equity holdings.

IV is their main research design. The authors use two sets of instruments for changes in ownership by pension funds: one for the first phase of the reform, the other for the second phase. We focus here on the first phase, which (potentially) has a clean shock-based IV, which the second phase does not. The instruments are a dummy indicating whether the divesting fund held shares in the company pre-shock, plus three non-shock instruments: the cash flow rights of the other public pension funds in the firm, the cash flow rights of private pension funds, and the firm’s market capitalization. The authors do not discuss the only through condition. We treat this as a shock-based IV paper because the divestment shock is the strongest of the four instruments in their first stage.
We did not select this paper to replicate because the authors combined a shock-based IV with several non-shock IVs, and because in the second stage, the instrumented variable (ownership by the divesting fund) is only marginally significant. The IV results are significant for the second reform phase, but for this phase, the authors lack a clean shock, because the other public funds could choose which companies to invest in.

Guner, Malmendier and Tate (2008)

Guner, Malmendier, and Tate (2008) examine the effect of bank-affiliated directors on the sensitivity of investment to cash flow over 1988-2001. Their main design is panel data with firm fixed effects. They use shock-based IV as a robustness test. The shock is a commercial banking crisis in the US in the late 1970s and early 1980s. The instrument for bank-affiliated directors is the total number of directors hired during 1976-1985. The authors argue that bank failures during the crisis reduced the availability of bank-affiliated directors. They show that, in contrast, the overall rate of new director appointments did not change.

The authors discuss the only through condition with care. They are aware that their instrument could generate imbalance on other firm characteristics, specifically board turnover, and state that in unreported regressions, their main results are robust to adding board turnover as a control variable. They do not, however, report on covariate balance. They report that a placebo test using number of directors hired during 1966-1975 “fails to replicate the results”, but do not specify how.

We did not select this paper to replicate because the 2SLS coefficient of principal interest, on instrumented (no. of bank-affiliated directors * cash flow), is only marginally significant with industry fixed effects. The authors do not report 2SLS results with firm fixed effects, likely because the results were statistically insignificant. Also, the coefficient on instrumented (no. of bank-affiliated directors * cash flow) is -0.820, almost 10 times the -0.085 coefficient with firm fixed effects. In our experience, this level of “blowup” of the 2SLS coefficient is a strong warning sign for violation of the only through condition.
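The "blowup" comparison for Guner, Malmendier and Tate (2008) is simple arithmetic, sketched below under stated assumptions. The two coefficients are the ones quoted in the text; the function name `blowup_ratio` is hypothetical, and the text does not propose any formal cutoff for how large a ratio is too large.

```python
def blowup_ratio(iv_coef, baseline_coef):
    """Ratio of a 2SLS coefficient to the corresponding non-IV coefficient."""
    if baseline_coef == 0:
        raise ValueError("baseline coefficient must be non-zero")
    return iv_coef / baseline_coef


# Coefficients quoted in the text: 2SLS estimate vs. firm fixed effects estimate.
ratio = blowup_ratio(-0.820, -0.085)
print(round(ratio, 2))  # about 9.65, i.e. "almost 10 times"
```

A ratio this far above 1 does not by itself prove an exclusion restriction violation; it is the kind of diagnostic that, per the discussion above, should prompt closer scrutiny of the only through condition.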