(Un)Conventional Tax Scoring
Transparency improvements are necessary to restore trust, especially around assumptions on taxpayer responses to policy. Part 2 of our series on Checking the Scorekeepers.
The debate surrounding the “One Big, Beautiful Bill” revealed deep concerns regarding how Congress’s scorekeeping agencies—the Congressional Budget Office (CBO) and the Joint Committee on Taxation (JCT)—arrive at their official estimates of the budgetary effects of proposed legislation. While the bill is now law, the issues with the scorekeepers remain unresolved and will inevitably resurface.
In this second installment of our series, Checking the Scorekeepers, we examine how the JCT scores tax legislation and highlight how greater transparency, particularly in what the JCT calls its “conventional model,” is essential to restoring confidence in scoring before the next tax debate.
For decades, conservatives have argued that the JCT and CBO underestimate the dynamic effects of tax reform—how tax changes influence work, saving, and investment. Less widely understood, but perhaps more consequential, is the JCT’s conventional score, the starting point for the analysis before these dynamic effects are included. This initial estimate still rests on a host of assumptions, including behavioral responses related to tax avoidance and evasion, that can determine whether a bill becomes law or dies in committee.
In several important respects, the JCT’s public documentation and comments provide an incomplete picture of their model and underlying assumptions. The opaqueness leaves little recourse for outside experts to scrutinize or challenge their estimates—even before we get to the “dynamic” assumptions about how a bill might affect GDP growth. For example, while the JCT cites academic research on how taxpayers respond to tax rates, it does not disclose the specific parameters it uses in modeling those behavioral responses, making it difficult for outside analysts to verify or replicate those key assumptions. This also makes it challenging to understand the economic and budgetary effects of specific provisions, potentially undermining the case for pro-growth tax reforms.
This opacity has helped to spur a proliferation of budget model groups, including Yale Budget Lab, Penn-Wharton Budget Model, and Tax Foundation, among others.1 That so many well-funded efforts have emerged underscores both the importance of independent replication and the difficulty of achieving it; it should not require millions of dollars and an army of economists to understand what’s driving these estimates.
Background: How Scoring Works
Tax scorers estimate future revenue under the baseline tax code, then re-estimate revenue with the proposed changes. The difference between these two estimates is the central figure in budget debates. As we saw with the recent debate, the choice of baseline—current law or current policy—has significant implications for that figure.
In the recent passage of the One Big, Beautiful Bill, a key debate centered on whether to score the legislation against a current law or a current policy baseline. A current law baseline assumes that the law plays out as enacted. So, for example, provisions of the Tax Cuts and Jobs Act (TCJA) scheduled to expire are treated as if they will expire. In contrast, some programs, such as Social Security, Medicare, and some excise taxes tied to permanent trust funds, are not scored in this manner. These programs are scored on a current policy baseline, meaning that even if they are scheduled in law to receive substantial reductions, they are assumed to continue.
A way to think about this distinction is to imagine forecasting a company’s future cash flows. A current law approach would assume the company follows its official plans to the letter, discontinuing programs or contracts set to end. A current policy forecast, by contrast, would assume the company keeps doing what it is doing, regardless of whether formal plans say otherwise. The difference in projected outcomes can be substantial.
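The effect of the baseline choice can be sketched with a toy calculation. The revenue figures below are invented for illustration and are not JCT or CBO numbers; the function name `score` is ours. The sketch assumes a proposal that does nothing except extend an expiring tax cut.

```python
# Hypothetical illustration: every number here is invented, not a JCT or CBO figure.

def score(baseline_revenue, proposal_revenue):
    """Return the year-by-year score: proposal revenue minus baseline revenue."""
    return [p - b for p, b in zip(proposal_revenue, baseline_revenue)]

# Revenue (in $ billions) over a three-year window.
# Under current law, an expiring tax cut lapses after year 1, so revenue rises.
current_law_baseline = [100, 110, 110]
# Under current policy, the expiring tax cut is assumed to continue.
current_policy_baseline = [100, 100, 100]
# The proposal simply extends the expiring tax cut.
proposal = [100, 100, 100]

score_vs_current_law = score(current_law_baseline, proposal)
score_vs_current_policy = score(current_policy_baseline, proposal)

print(sum(score_vs_current_law))     # -20: the extension loses $20 billion
print(sum(score_vs_current_policy))  # 0: the same extension scores as costless
```

The proposal is identical in both cases; only the yardstick changes.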
Given the statutory constraints and conventions on scoring relative to current law or current policy, the JCT scores are not going to be perfect forecasts of economic outcomes. These constraints and conventions are deliberate because they allow for some consistency across proposals. Nevertheless, the JCT aims to have scores that, contingent on the chosen baseline, reflect expected changes in revenue. That requires a focus on how tax changes will affect behavior.
All taxes create behavioral responses by taxpayers that impact the amount of revenue raised. Property taxes create incentives to own less property or make fewer improvements to existing property. Sales taxes and import tariffs reduce consumption. Individual income taxes create incentives for taxpayers to work and invest less. Beyond these effects, taxes create incentives for avoidance or for the underreporting (i.e., evading) of ownership, transactions, and labor income.
In academic research, the behavioral response to an income tax rate change is summarized by the Elasticity of Taxable Income (ETI), which measures the percentage change in taxable income associated with a one percent change in the marginal net-of-tax rate.2 This behavioral response can broadly be decomposed into three categories:
Evasion – Illegal behavioral actions taken to reduce taxable income.
Avoidance or tax minimization – Legal behavioral responses that reduce taxable income, including retiming and offshoring.
Labor supply and investment changes – Changes in labor hours or work effort due to tax policy shifts and changes in savings and investment.
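In symbols, the ETI described above is commonly written as follows. The notation is ours rather than the JCT's: z denotes taxable income, τ the marginal tax rate, and 1 − τ the net-of-tax rate.

```latex
e \;=\; \frac{\partial z / z}{\partial (1-\tau) / (1-\tau)}
  \;=\; \frac{\partial \ln z}{\partial \ln (1-\tau)}
```

An elasticity of 0.5, for instance, means that a 1 percent increase in the net-of-tax rate is associated with a 0.5 percent increase in reported taxable income, with all three categories of response combined.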
Scorers have many options in how to incorporate these effects into a tax score. Generally, we see three types of scores:
Truly static score – Omits any behavioral responses.
Conventional score – Assumes that gross national product (GNP) remains fixed but includes some micro-level behavioral responses to account for avoidance and evasion.3
Dynamic score – Incorporates a macroeconomic model to estimate changes in GNP and GDP, which includes labor supply and investment responses.
The JCT’s Opaque Conventional Model
The JCT provides revenue scoring for federal tax proposals considered by Congress.4 Many wrongly assume the JCT’s score is truly static, and politicians and pundits often refer to it that way. By default, however, the JCT provides conventional scores.5 Then, if Congress requests dynamic scoring, the JCT adds labor supply and investment responses on top of the estimates produced in the conventional stage.6
The JCT’s conventional scores are based on its Individual Tax Model (ITM). Built in the 1950s-era coding language FORTRAN, the model uses confidential data on individual taxpayers to estimate how tax policy changes affect future revenue. The ITM accounts for some avoidance and evasion, including shifts in economic activity and changes in the timing of activity.7 But the model does not directly estimate all behavioral responses. Instead, the JCT applies mechanical tax minimization behaviors “on-model,” while other behavioral responses, such as credit take-up rates, are handled “off-model.” This on-model versus off-model split seems to reflect what the JCT can simulate within the FORTRAN framework versus what must be estimated separately because of the limitations of the legacy codebase.
On-model behavioral responses are typically limited to mechanical choices, such as determining whether a taxpayer is going to claim the standard deduction or itemize, based on which option minimizes tax liability. The ITM assumes that an individual makes the choice that “produces the lowest tax liability.”8 In contrast, it generally does “not model behavioral responses of individuals to changes in disposable income or tax rates.” Rather, “behavioral effects are usually accounted for ‘off-model’” in the conventional score because integrating behavioral response equations “may slow down its operation, or make it more difficult to error-check simulations.”9 For example, non-mechanical behavioral activity such as the take-up rate for a tax credit, which depends on unobserved characteristics like awareness, is handled off-model. For these cases, the JCT estimates the likely response separately and adds a dollar adjustment to the model output estimate after the fact.
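The split described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the ITM: the flat 25 percent rate, the $15,000 standard deduction, the two taxpayers, and the size of the off-model adjustment are all hypothetical, as are the function names.

```python
# Hypothetical sketch: the flat rate, deduction amount, taxpayers, and
# adjustment are invented for illustration and are not JCT parameters.

def on_model_liability(income, itemized_deductions, standard_deduction, rate):
    """Mechanical choice: take whichever deduction yields the lower tax."""
    tax_if_standard = rate * max(income - standard_deduction, 0)
    tax_if_itemized = rate * max(income - itemized_deductions, 0)
    return min(tax_if_standard, tax_if_itemized)

def conventional_estimate(on_model_total, off_model_adjustment):
    """Off-model step: add a separately estimated dollar adjustment."""
    return on_model_total + off_model_adjustment

taxpayers = [
    {"income": 60_000, "itemized": 9_000},    # better off with the standard deduction
    {"income": 150_000, "itemized": 30_000},  # better off itemizing
]

on_model_total = sum(
    on_model_liability(t["income"], t["itemized"], 15_000, 0.25)
    for t in taxpayers
)

# Assumed off-model figure, e.g. the revenue cost of a credit given some take-up rate.
off_model_adjustment = -1_000

print(on_model_total)                                               # 41250.0
print(conventional_estimate(on_model_total, off_model_adjustment))  # 40250.0
```

The on-model step can be reproduced by anyone with the data and the tax rules. The off-model number cannot, unless the assumption behind it is published.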
The on-model/off-model distinction may appear to be technical, but it has important implications for understanding the JCT’s methods and assessing their underlying assumptions. Every off-model provision introduces more assumptions and more parameters that could be subject to debate. Yet the JCT’s public documentation offers only examples of when it opts to go off-model. Outside researchers are often left guessing what is actually “on” the model. For example, the take-up rates for credits discussed earlier are not in the core model, which makes it difficult for outside analysts to reproduce the estimate, assess its sensitivity to assumptions, or compare it against alternative approaches.
The JCT’s (Partial) Elasticity of Taxable Income
The JCT incorporates a behavioral response by income bracket in their conventional model. They refer to this behavioral response as the “Elasticity of Taxable Income.” But this is not the ETI we see in the academic literature. This is because the ETIs used by the JCT explicitly exclude labor supply responses during the conventional stage.10 Their ETI is more accurately characterized as a “partial” elasticity of taxable income because it only attempts to incorporate the behavioral responses that do not impact the fixed-GNP conventional assumption.
In its documentation, the JCT cites academic studies estimating the elasticity of taxable income. It references ETI estimates in the academic literature ranging from 0.1 to 1.0.11 More recent studies report ETI values of 1.071 (Gorry, Hubbard, and Mathur, 2021), 0.09 to 2.4 depending on method (He, Peng, and Wang, 2021),12 and 2.5 to 3.2 for high earners (Rauh and Shyu, 2024), who bear large shares of the tax burden and are often the targets of progressive tax policy. A range of ETIs this wide would produce substantially different scores, making the JCT’s choice of ETI a key parameter in its model.
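To see how much the elasticity matters, consider a stylized calculation. It is a sketch under strong assumptions, not a reconstruction of the JCT's method: a single taxpayer, a flat rate applied to all income, a constant-elasticity response, and invented dollar amounts. Only the elasticity values echo figures cited above, and they are full academic ETIs rather than the JCT's undisclosed partial ones.

```python
# Hypothetical sketch: one taxpayer, a flat marginal rate, and invented dollar
# amounts. Only the elasticity values echo figures cited in the text.

def taxable_income_after(income, old_rate, new_rate, elasticity):
    """Constant-elasticity response of taxable income to the net-of-tax rate."""
    return income * ((1 - new_rate) / (1 - old_rate)) ** elasticity

def revenue_change(income, old_rate, new_rate, elasticity):
    """Revenue after the rate change minus revenue before it."""
    new_income = taxable_income_after(income, old_rate, new_rate, elasticity)
    return new_rate * new_income - old_rate * income

income, old_rate, new_rate = 1_000_000, 0.40, 0.35

# An elasticity of 0.0 corresponds to a truly static estimate.
for e in (0.0, 0.1, 1.0, 2.5):
    change = revenue_change(income, old_rate, new_rate, e)
    print(f"elasticity {e}: revenue change = {change:,.0f}")
```

With these made-up inputs, the estimated revenue loss from the rate cut shrinks as the assumed elasticity rises, and at the highest value shown the sign flips. Applying the rate to all income exaggerates the effect relative to a real bracket structure, but the sensitivity is the point.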
The choice of ETI matters not only for the top-line budget score, but also for determining which provisions to include in tax legislation. Higher assumed ETIs make rate reductions and other pro-growth measures more attractive, while lower ETIs encourage lawmakers to opt for tax cuts that will have little effect on growth. Greater transparency around the JCT’s assumptions would help lawmakers better understand the economic and budgetary effects of each provision.
The JCT says it uses the referenced academic literature to construct and integrate a table of partial ETIs by income bracket.13 Given the importance of these partial ETIs to the score, one would expect the JCT to have thoroughly analyzed and documented the specific parameters. But the specific partial ETI values used by the JCT in the conventional stage are not disclosed in public documentation or provided upon request by the JCT. Nor has the JCT provided any evidence that the combined behavioral adjustments across conventional and dynamic scores sum to the empirically measured ETI values referenced in the literature.
As a result, while the cited literature informs the JCT’s thinking, the final modeled elasticities are not transparently benchmarked to any specific academic estimate.
Ambiguous Weights for the Dynamic Models
When calculating dynamic effects, the JCT uses the individual tax model to calculate the average and marginal tax rates by income type, which are then incorporated into their dynamic macroeconomic models. The off-model behavioral responses are integrated into the conventional scores before being incorporated into the dynamic macroeconomic models. The JCT then uses three macroeconomic models to estimate the dynamic revenue effects of major tax policy changes:
Overlapping Generations Model (OLG)
Macroeconomic Equilibrium Growth Model (MEG)
Dynamic Stochastic General Equilibrium Model (DSGE)
Unlike with the conventional model, the JCT has provided specific point estimates for key parameters in its dynamic models, including labor supply elasticities.14 But much of its methodology remains opaque. In particular, the JCT does not publish a methodology for how it weights each macroeconomic model when producing its final dynamic point estimate. Instead, it says it tailors the choice of weights to each piece of legislation. This makes outside replication of the point estimate virtually impossible prior to the release of the score. It also raises concerns that model weights may be changed selectively or applied inconsistently.
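A small sketch shows why the weights matter. It assumes the final point estimate is a simple weighted average of the three models' outputs, which is our assumption for illustration. The JCT does not publish its weighting scheme, and the model estimates and weights below are invented.

```python
# Hypothetical sketch: the model estimates and weights below are invented,
# and the weighted-average form itself is an assumption.

def weighted_estimate(estimates, weights):
    """Combine per-model estimates using weights that must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(estimates[name] * weights[name] for name in estimates)

# Dynamic revenue feedback from each model, in $ billions (made up).
estimates = {"OLG": 100.0, "MEG": 200.0, "DSGE": 300.0}

equal_weights = {"OLG": 1 / 3, "MEG": 1 / 3, "DSGE": 1 / 3}
tilted_weights = {"OLG": 0.5, "MEG": 0.25, "DSGE": 0.25}

print(round(weighted_estimate(estimates, equal_weights), 1))   # 200.0
print(round(weighted_estimate(estimates, tilted_weights), 1))  # 175.0
```

The same three model runs produce different headline numbers depending on the weights alone, so an outside analyst who matched every model parameter would still miss the published figure without knowing them.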
Enhancing Trust in the Scores
Given the JCT’s outsized role in tax debates, it is no surprise that it is often the target of much ire from politicians, their staffs, and outside groups. By all accounts, the JCT employs talented economists who want to provide accurate scores to Congress.
But relatively poor transparency has fueled distrust.
Some propose independent estimates that can be compared to the JCT’s. Outside estimates do exist, but no other entity can replicate the work of the JCT. It is uniquely positioned to provide tax scores to Congress, particularly because it has access to confidential datasets available to only a few outside of the IRS and the Treasury Department.
Instead, the answer should be a greater commitment to transparency. Parameters and methods that have substantial effects on scores should be publicly disclosed by the JCT. This likely isn’t an easy task, particularly given the JCT’s reliance on a model built on a 70-year-old programming language and with so many components being run off-model. But improving trust in scorekeepers is vital in ensuring a credible budget process.
The Committee for a Responsible Federal Budget summarized the various budget estimates of the One Big, Beautiful Bill from these different groups.
GDP measures the value of all goods and services produced within a country’s borders. GNP measures the value of all goods and services produced by a country’s citizens. GNP might incorporate income produced by US firms that is held outside of our country’s borders. GDP would not incorporate that income. However, the extent to which GNP actually captures income evading US tax authorities is likely a narrow point.
For a helpful primer, see the JCT’s Frequently Asked Questions.
This assumption originates from early practices by the JCT following its formation under the Revenue Act of 1926. Scoring by convention was reaffirmed in the Congressional Budget Act of 1974, which required official cost estimates for legislation without macroeconomic feedback.
Since 2003, the JCT has provided macroeconomic analysis for legislation when requested and required to the House Committee on Ways and Means.
See the JCT’s 2011 Summary of Economic Models and Estimating Practices.
See page 39 in JCX-48-23.
See page 41 in JCX-48-23.
The elasticity incorporated in this conventional stage would be more accurately characterized as the elasticity of adjusted gross income (less labor supply) with respect to the tax rate.
Based on administrative wage income data: 0.09–0.41 from bunching and 1.15–4.4 from tax reform methods, with 2.423 as the authors’ preferred estimate (Table 1). The abstract cites longer-run values of ~0.5 and ~4.0 (Sections 3–4).