Hubris in Scorekeeping: Why Confidence Needs a Calibration
CBO and JCT predictions based on highly uncertain models and parameters need sensitivity analysis, or at least a flag. Part 3 of our series on Checking the Scorekeepers.
Our third installment of the Checking the Scorekeepers series looks at how to improve budget and economic forecasts, not just for the recent tax bill but for all major legislation. The problem isn’t only whether the assumptions and parameters are accurate. It’s that the scorekeepers’ estimates and reports often convey more confidence in their projections than the evidence warrants. Building trust will require more caution, more transparency, and a willingness to show the limits of what a forecaster can know.
The OBBBA Forecasting Debate
In June, before the passage of the One Big, Beautiful Bill Act (OBBBA), the President’s Council of Economic Advisers (CEA) released a detailed report containing a number of fiscal and economic projections. The President’s economists are optimistic about the economic effects of the bill. The Committee for a Responsible Federal Budget (CRFB) has taken issue with the CEA’s conclusions, stating that the estimates are “far more optimistic than those of the Congressional Budget Office (CBO) and other credible estimates.” Others have expressed similar skepticism about the CEA’s numbers.
The CEA projects that the OBBBA’s tax provisions will increase GDP by 4.6 to 4.9 percent relative to the CBO’s January economic projections. Some of those gains will be transitory, but over the long term the CEA still expects GDP to be 2.4 to 2.7 percent above the CBO baseline.1 That growth would reduce the cost of the tax bill: the CEA projects that it would increase revenue by $1.8 to $2.0 trillion, offsetting more than half of the CBO’s projected revenue decline from the OBBBA.
Optimism often abounds at the CEA, particularly relative to the CBO’s projections. In April 2022, President Biden’s CEA projected that real GDP growth would average 2.3 percent over the next decade.2 A month later, the CBO projected it would average a mere 2.0 percent.3 That difference might seem small, but compounded over 10 years it would leave the economy 3.2 percent larger. Likewise, in 2010 President Obama’s CEA projected that real GDP would be 3.7 percent larger by 2020 than the contemporaneous CBO projection.4
If the CEA’s growth forecast fails to materialize, it could be somewhat embarrassing. We say “somewhat” because ex post data cannot prove or disprove whether a forecast was the best available forecast at the time it was made. Forecasters generally think of their projected paths for economic or financial variables either as average (mean) outcomes or as modal (most likely) outcomes. Either way, the forecast is a single summary of a whole distribution of possible outcomes. Shocks and other unforeseen factors can make a mockery of a forecast, or leave a forecaster right for the wrong reasons.
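The compounding behind that comparison can be sketched in a few lines of Python. This is a minimal illustration, not the CEA’s or the CBO’s method. The function name `level_gap_pct` is hypothetical, and it simply compares the GDP levels implied by two paths of annual growth rates. Plugging in the rounded 2.3 and 2.0 percent averages gives a gap of roughly 3 percent. A figure computed from the unrounded year-by-year projections can differ somewhat.

```python
from math import prod


def level_gap_pct(high_path, low_path):
    """Percent by which real GDP ends up higher along high_path than low_path.

    Each path is a sequence of annual real growth rates in percent, one per year.
    """
    if len(high_path) != len(low_path):
        raise ValueError("growth paths must cover the same number of years")
    # Compound each path: the product of (1 + g) gives the end-of-period GDP level.
    high_level = prod(1 + g / 100 for g in high_path)
    low_level = prod(1 + g / 100 for g in low_path)
    return (high_level / low_level - 1) * 100


# Rounded 10-year averages quoted in the text.
cea_2022 = [2.3] * 10
cbo_2022 = [2.0] * 10
print(f"{level_gap_pct(cea_2022, cbo_2022):.1f} percent")
```

Taking full paths rather than a single average lets the same function accept year-by-year projections when they are available.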
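To make the point about distributions concrete, here is a sketch that treats a growth forecast as the center of a normal distribution of outcomes. The 2.0 percent point forecast and the 0.75-percentage-point error standard deviation are hypothetical values chosen for illustration, not CBO or CEA parameters. A normal distribution is symmetric, so its mean and mode coincide. With a skewed distribution, the two would differ.

```python
from statistics import NormalDist

# Hypothetical assumptions, for illustration only (not CBO or CEA parameters):
POINT_FORECAST = 2.0  # forecast of average annual real GDP growth, percent
ERROR_SD = 0.75       # assumed standard deviation of the forecast error, percentage points

outcomes = NormalDist(mu=POINT_FORECAST, sigma=ERROR_SD)


def prob_between(dist: NormalDist, lo: float, hi: float) -> float:
    """Probability that the realized value falls between lo and hi."""
    return dist.cdf(hi) - dist.cdf(lo)


p_near = prob_between(outcomes, 1.9, 2.1)  # within 0.1 pp of the forecast
p_high = 1 - outcomes.cdf(2.5)             # at least 0.5 pp above the forecast

print(f"P(within 0.1 pp of forecast) = {p_near:.2f}")
print(f"P(2.5 percent or more)       = {p_high:.2f}")
```

Under these assumptions, the realized average lands within 0.1 percentage point of the forecast only about 11 percent of the time. It comes in at 2.5 percent or higher about 25 percent of the time, even though the forecast itself is unbiased.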
The Budget Effects of Missed Economic Forecasts
You don’t have to look far to find CBO economic forecasts that turned out to be well off the mark. Months after the Tax Cuts and Jobs Act of 2017 (TCJA) was enacted, the CBO projected that economic growth would average 2.0 percent per year between 2018 and 2024. As the figure below shows, the projections underestimated growth. Even with the COVID-19 recession, real GDP growth averaged 2.5 percent over that period. GDP growth can be decomposed into labor productivity growth and growth in total hours worked, which rises if the labor force grows or if average hours per worker increase. The CBO’s forecast undershot both: annual growth in labor productivity (measured as real GDP per hour worked) was 0.28 percentage point higher than projected, and annual growth in hours worked was 0.19 percentage point higher.
The gap between 2.0 percent and 2.5 percent growth is only half a percentage point, but it means the economy grew 25 percent faster than projected. That’s substantial. A difference that large means much faster wage growth and more economic opportunity. It also has significant effects on the federal budget and on the accuracy of the CBO’s budget projections.
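The decomposition and the percent-versus-percentage-point distinction can be checked with simple arithmetic. GDP equals GDP per hour times hours worked, so growth rates approximately add. In log terms they add exactly. The sketch below uses only the figures quoted in the text, and the helper name `relative_difference_pct` is hypothetical.

```python
# Figures quoted in the text (percent per year, 2018-2024).
PROJECTED_GROWTH = 2.0    # CBO's post-TCJA projection of real GDP growth
ACTUAL_GROWTH = 2.5       # realized average real GDP growth
PRODUCTIVITY_MISS = 0.28  # pp by which productivity growth exceeded the projection
HOURS_MISS = 0.19         # pp by which hours-worked growth exceeded the projection


def relative_difference_pct(actual: float, projected: float) -> float:
    """Relative difference in percent (not percentage points)."""
    return (actual - projected) / projected * 100


# GDP = (GDP / hours) * hours, so in log terms g_gdp = g_productivity + g_hours;
# with ordinary growth rates the identity holds only approximately.
total_miss_pp = ACTUAL_GROWTH - PROJECTED_GROWTH    # percentage points
explained_miss_pp = PRODUCTIVITY_MISS + HOURS_MISS  # percentage points

relative = relative_difference_pct(ACTUAL_GROWTH, PROJECTED_GROWTH)
print(f"Gap: {total_miss_pp:.2f} pp, i.e. {relative:.0f}% faster growth")
print(f"Productivity + hours: {explained_miss_pp:.2f} pp")
```

The two components account for 0.47 of the 0.5-percentage-point miss. The small remainder presumably reflects rounding in the published averages and the approximation that growth rates add.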
The CBO regularly publishes an Excel workbook that shows the budget effects of varying some of its economic assumptions. Using its 2018 workbook, we can see how the CBO at the time would have scored faster growth in productivity and hours worked. A 0.25-percentage-point increase in annual productivity growth would have raised projected revenues by $818 billion over 2019–2028, while a 0.20-percentage-point increase in hours-worked growth would have added $291 billion. To put these numbers in perspective, the CBO’s 2017 TCJA score estimated that revenue would fall by $1.4 trillion over 10 years (2018 to 2027).
The revenue effects of productivity gains are especially large. In 2018, the CBO estimated that a 0.5-percentage-point boost to annual real GDP growth over 10 years would raise revenues by $1.6 trillion. The CBO’s 2025 workbook projects that a similar increase in GDP growth over the next 10 years would yield $2.9 trillion in additional revenue. This would be partially offset by higher interest payments, because a faster-growing economy would likely push up interest rates.5 The net effect would be $2 trillion in deficit reduction over 10 years, or about half of the OBBBA’s cost including debt service.
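The comparisons in the last two paragraphs are straightforward arithmetic on the workbook figures. The sketch below reproduces them from the numbers quoted in the text, in billions of dollars. The 2018 workbook window (2019–2028) is shifted one year from the TCJA score window (2018–2027), so the comparison is approximate. The implied offset in the 2025 case combines higher interest costs with the entitlement spending mentioned in footnote 5.

```python
# Figures quoted in the text, in billions of dollars over 10-year windows.
PRODUCTIVITY_EFFECT_2018 = 818   # +0.25 pp productivity growth (2018 workbook, 2019-2028)
HOURS_EFFECT_2018 = 291          # +0.20 pp hours-worked growth (2018 workbook, 2019-2028)
TCJA_REVENUE_LOSS = 1_400        # CBO's 2017 TCJA score (2018-2027)

GROSS_REVENUE_2025 = 2_900       # +0.5 pp GDP growth (2025 workbook)
NET_DEFICIT_EFFECT_2025 = 2_000  # net of higher interest and other outlays

combined_2018 = PRODUCTIVITY_EFFECT_2018 + HOURS_EFFECT_2018
share_of_tcja = combined_2018 / TCJA_REVENUE_LOSS
# Gross revenue minus net deficit effect = spending that rises with faster growth.
implied_offsets_2025 = GROSS_REVENUE_2025 - NET_DEFICIT_EFFECT_2025

print(f"2018 workbook, combined: ${combined_2018} billion "
      f"({share_of_tcja:.0%} of the TCJA revenue loss)")
print(f"2025 workbook, implied spending offsets: ${implied_offsets_2025} billion")
```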
Could the OBBBA by itself generate such increases in GDP? There are reasons to be skeptical. Among the independent scores, the Tax Foundation’s is the most optimistic. It estimates that long-run GDP will be 1.2 percent larger, less than half of what the CEA predicts.
But there are other mechanisms that might reduce the actual cost of the OBBBA. As we have written previously in our Checking the Scorekeepers series, not all taxable income responses must work through GDP growth. Using bracket-level elasticities of taxable income (ETIs) that combine labor-supply and evasion/avoidance responses, we find that plausible taxpayer responses alone could produce about $438 billion in additional revenue over the 2025–2030 budget window, even under the baseline assumption of 1.8 percent GDP growth. That roughly $90 billion per year matches the revenue the score would generate if the assumed wage growth rate were raised mechanically by enough to lift annual GDP growth from 1.8 to 2.6 percent.6
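For readers unfamiliar with the elasticity of taxable income, the sketch below shows the textbook calculation in Python. A bracket’s taxable income responds in proportion to the percent change in its net-of-tax rate (one minus the marginal rate), and the induced income is taxed at the new rate. This is a generic illustration, not the model behind the $438 billion estimate above. The brackets, incomes, rates, and elasticities are hypothetical. Treating each bracket’s income as facing a single marginal rate is also a simplification.

```python
from dataclasses import dataclass


@dataclass
class Bracket:
    """Hypothetical bracket-level inputs (not actual IRS or JCT data)."""
    name: str
    taxable_income: float  # baseline taxable income in the bracket, $ billions per year
    old_rate: float        # marginal tax rate before the change
    new_rate: float        # marginal tax rate after the change
    eti: float             # elasticity of taxable income for the bracket


def behavioral_revenue(b: Bracket) -> float:
    """Revenue from the taxable-income response in one bracket, $ billions.

    Taxable income changes by ETI times the percent change in the
    net-of-tax rate (1 - marginal rate); the induced income is taxed
    at the new marginal rate.
    """
    old_net, new_net = 1 - b.old_rate, 1 - b.new_rate
    pct_change_net_of_tax = (new_net - old_net) / old_net
    income_response = b.eti * pct_change_net_of_tax * b.taxable_income
    return b.new_rate * income_response


# Hypothetical example brackets, for illustration only.
brackets = [
    Bracket("middle", taxable_income=2_000, old_rate=0.25, new_rate=0.22, eti=0.25),
    Bracket("top", taxable_income=1_000, old_rate=0.396, new_rate=0.37, eti=0.60),
]
total = sum(behavioral_revenue(b) for b in brackets)
for b in brackets:
    print(f"{b.name}: ${behavioral_revenue(b):.1f} billion")
print(f"total: ${total:.1f} billion per year")
```

This revenue arises without any assumed change in GDP growth, which is the point of the paragraph above. A rate cut raises the net-of-tax rate, taxable income rises, and part of the static revenue loss is recovered.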
In short, there is a high probability that most—perhaps all—OBBBA forecasts will prove to be far off the mark.
What Are the Forecasts Missing?
As we mentioned above, a missed forecast doesn’t mean it was a bad forecast at the time it was made. In fact, as the CBO often notes, the agency’s short-run economic projections generally stack up well against other forecasts.
However, all forecasts inevitably carry the risk of large errors. The more heavily we rely on point estimates, whether from the CEA, the CBO, or the Joint Committee on Taxation (JCT), the more likely we are to be surprised by subsequent events.
To the CBO’s credit, its publicly available workbooks allow lawmakers and the public to see just how sensitive the budget and economic projections are to varying assumptions. The problem is that the JCT and the CBO rarely offer this type of analysis. The workbook is the exception, not the rule. Occasionally, the JCT and the CBO acknowledge that their estimates are subject to significant uncertainty. The CBO’s budget outlooks, for example, regularly discuss how the projections could be wrong. More often, however, the scores and reports lack nuance. The qualifiers they do include are ad hoc and often buried in dense reports.
Why not perform sensitivity analyses regularly? When we’ve posed that question to representatives of the CBO and the JCT, they argue that lawmakers will inevitably cherry-pick whichever number suits them. Of course, politicians do that now. They just use the scores of outside think tanks or the CEA’s numbers.
The other objection is that the scorekeepers simply don’t have the time or capacity to perform sensitivity analyses consistently. That objection may be reasonable for cost estimates with relatively minor budget effects. But for major legislation such as the TCJA, the Inflation Reduction Act, or the OBBBA, failing to systematically acknowledge their models’ limitations poses real risks to the scorekeepers’ credibility.
Short of a robust sensitivity analysis, the scorekeepers could regularly state the degree of confidence they have in their scores and projections. There is precedent for this approach. The U.S. Intelligence Community uses “analytic confidence” ratings to convey its level of confidence in its intelligence assessments. The CBO and the JCT could construct a similar rating scale or perhaps a multidimensional one. Each major score component, such as growth effects, behavioral responses, or debt-service assumptions, could be tagged low, medium, or high confidence, with a short explanation.7
This framework could flag model components that materially change a score yet are in flux even as the estimate is released. For example, in the CBO’s dynamic score of the OBBBA, the part of the crowding-out effect that governs how interest rates respond to higher debt produced a dynamic cost estimate higher than the conventional score. That component could have been given a low or medium confidence rating, because the CBO was actively reconsidering it within a month of the bill’s passage. Failing to express limited confidence in that component, while changes to it were under active discussion, is a recipe for distrust.
Likewise, this type of framework could flag the JCT’s relatively ad hoc “off-model” adjustments as low confidence. For example, the revenue estimate for the enhanced deduction for state and local taxes (SALT) rests on limited data, because many recent filers haven’t reported the state and local taxes they actually paid. They either claimed the standard deduction or, in some cases, simply reported a flat $10,000 (the TCJA cap). As a result, the JCT’s estimate of the full revenue effect of raising the SALT cap is subject to significant error.
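One way to make such ratings systematic is a small rule set built from the criteria in footnote 7. The Python sketch below is hypothetical. The `ScoreComponent` fields, the fallback rules for cases the footnote does not cover, and the example components are illustrative assumptions, not a CBO or JCT procedure.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class ScoreComponent:
    """One component of a score, described by hypothetical yes/no attributes."""
    name: str
    first_time_scored: bool = False
    limited_data: bool = False
    model_under_revision: bool = False
    standard_model: bool = True
    clean_data: bool = False
    broad_agreement: bool = False


def rate(c: ScoreComponent) -> tuple[Confidence, str]:
    """Assign a confidence tag and a short explanation, following footnote 7."""
    reasons = []
    if c.first_time_scored:
        reasons.append("policy scored for the first time")
    if c.limited_data:
        reasons.append("limited data")
    if c.model_under_revision:
        reasons.append("model actively being updated")
    if reasons:
        return Confidence.LOW, "; ".join(reasons)
    if c.standard_model and c.clean_data and c.broad_agreement:
        return Confidence.HIGH, "standard model, clean data, broad agreement in the literature"
    if c.standard_model:
        # Assumption: anything short of the "high" bar on a standard model is medium.
        return Confidence.MEDIUM, "standard model, but noisy data or unsettled literature"
    # Assumption: the footnote does not cover nonstandard models; treat them as low.
    return Confidence.LOW, "nonstandard model"


# Hypothetical tags for components of the kind discussed in this section.
components = [
    ScoreComponent("debt-service assumptions", model_under_revision=True),
    ScoreComponent("behavioral responses"),
    ScoreComponent("SALT off-model adjustment", limited_data=True),
]
for comp in components:
    level, why = rate(comp)
    print(f"{comp.name}: {level.value} ({why})")
```

Pairing each tag with its explanation mirrors the proposal’s “short explanation” and keeps the reasons visible alongside the rating.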
Finally, as we’ve noted before, the CBO and the JCT would be well served by releasing all of their models and parameters. This would allow outside researchers to understand how sensitive these models are to varying assumptions and to assess whether missed projections are due to unforeseeable events or bad forecasting.
In conversations with representatives of the CBO and the JCT, we are consistently struck by their nuanced approach to analyzing complicated issues. Their presentations often convey a clear sense of epistemic humility. Yet their published reports and scores frequently do not. This disconnect, compounded by the opacity of their assumptions and models, breeds distrust and undermines the integrity of the entire budget process.
1. These growth effects are limited to the tax provisions in the OBBBA; they don’t include growth effects from the President’s other policies. The CEA has developed separate estimates that include these effects. See its June chartbook for an overview.
2. See Table 2-6 in the CEA’s Economic Report of the President (April 2022). https://bidenwhitehouse.archives.gov/wp-content/uploads/2022/04/ERP-2022.pdf.
3. See the CBO’s May 2022 economic projections. https://www.cbo.gov/data/budget-economic-data#3.
4. See Table 2-3 in the CEA’s Economic Report of the President (February 2010). https://fraser.stlouisfed.org/files/docs/publications/ERP/2010/erp_2010.pdf.
5. Entitlement spending, driven by Social Security and Medicare, would also rise.
6. A full score and breakdown of this behavioral-response channel will be provided in a forthcoming piece.
7. For example, low confidence might be assigned to a policy being scored for the first time, when the data used for the score are limited, or when the model is being actively updated. Medium confidence might be assigned when the score is based on standard models but noisy data. Lastly, high confidence might be assigned when the result rests on standard models, clean data, and broad agreement in the academic and policy literature.



