Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist

Optimizing economic and public policy is critical to address socioeconomic issues that affect us all, like equality, productivity, and wellness. However, policy design is challenging in real-world scenarios: policymakers need to consider multiple objectives, policy levers, and how people might respond to new policies.

How can we use AI to design effective and fair policy?

At Salesforce Research, we believe that business is the greatest platform for change. We developed the AI Economist framework to apply AI to economic policy design and use AI to improve social good.

Keep scrolling to learn how the AI Economist framework can design policies with two-level reinforcement learning and data-driven simulations. You will learn about desirable features and ethical considerations of AI policy design, and how this might be used to respond to future pandemics.

This work should be regarded as a proof of concept. There are many aspects of the real world that our simulation does not capture. The AI policies learned in this simulation should not be used for policy making, nor is the simulation intended for the evaluation or development of policy. The data this simulation uses to model health and economic tradeoffs are limited in their ability to model impacts to specific segments of the population (for example, the vulnerable groups who have been disproportionately impacted by the COVID-19 pandemic). Our hope is to develop more realistic simulations in the future.

For more info, check out our code, paper, simulation card, and blog.

An AI Policy Design Framework

The AI Economist framework combines data-driven simulations with AI agents and a social planner (policymaker), who optimize their behavior using two-level reinforcement learning. Using AI for policy design calls for a number of features:

  • Simulating complex economies. The simulation should model the right economic processes that are relevant to the policy objective.
  • Fitting to real data. The simulation should be grounded in real data.
  • Using multiple policy levers. The framework can include different types of policy choices, e.g., taxes, subsidies, closures, etc. Reinforcement learning can optimize any type of policy.
  • Considering many policy objectives. The policy designer can include any metric of interest into the policy objective, and these do not need to be analytic or differentiable.
  • Finding strategic equilibria. Optimal policies should consider how economic agents respond to (changes in) policy.
  • Emulating human-like behavior. Economic agents should behave and respond like humans.
  • Being robust. The performance of learned policies should be robust to differences between the simulation and the real world.
  • Explaining decisions. The causal factors for policy decisions should be explainable.
  • Being implementable. The behavior of policies should be simple and consistent enough to be applicable in the world.

Case Study: Designing Pandemic Response Policy

We now apply this framework to designing pandemic response policy, based on COVID-19 data. We simulate the US and train state and federal policies using RL.

In our model, each state and the federal government need to balance public health and the economy to improve social welfare. Social welfare is a (weighted) combination of two indices:

  • A health index, which decreases as deaths increase.
  • An economic index, which tracks gross domestic product (GDP), unemployment, and federal subsidies.
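As a minimal sketch of such a weighted combination, the snippet below assumes a convex combination with a single weight. The function name `social_welfare`, the parameter `health_priority`, and all index values are hypothetical and chosen only for illustration; the exact form used in the simulation may differ.

```python
def social_welfare(health_index: float, economic_index: float, health_priority: float) -> float:
    """Assumed form: a convex combination of the health and economic indices."""
    if not 0.0 <= health_priority <= 1.0:
        raise ValueError("health_priority must lie in [0, 1]")
    return health_priority * health_index + (1.0 - health_priority) * economic_index


# Hypothetical, normalized index values for two candidate outcomes.
outcome_a = {"health_index": 0.8, "economic_index": 0.4}  # fewer deaths, weaker economy
outcome_b = {"health_index": 0.5, "economic_index": 0.9}  # more deaths, stronger economy

for priority in (0.2, 0.8):
    welfare_a = social_welfare(**outcome_a, health_priority=priority)
    welfare_b = social_welfare(**outcome_b, health_priority=priority)
    preferred = "A" if welfare_a > welfare_b else "B"
    print(f"health_priority={priority}: A={welfare_a:.2f}, B={welfare_b:.2f}, preferred={preferred}")
```

With these made-up numbers, the preferred outcome flips as the health priority grows, which is why the choice of weight matters when comparing policies.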

For a fair comparison, we first determined the health prioritization under which the real-world policy achieves the highest social welfare. AI policies were then trained to maximize the same social welfare objective.

The graphic below summarizes how the RL agents and simulation interact:

How do the AI and Real World Policy Compare?

Check out the interactive visualization below to compare outcomes between the AI simulation and the real world.

[Interactive visualization: the AI simulation (AI policies learned and simulated for the US, using real data) compared with the real world (what really happened, based on COVID-19 data). Panels: Social Welfare, Active Cases, Total Deaths.]

Flattening the Curve at the Cost of More Unemployment

AI policies can "flatten the curve," reducing the number of deaths by 50% while increasing average daily unemployment by 1%, compared to the real-world policy implemented in the period of March 2020 to April 2021.

Our framework also forecasts that under these AI-informed policies, deaths would remain substantially lower, while unemployment can be temporarily higher but is predicted to recover to normal levels quickly.

AI forecasts are probabilistic. Shaded areas represent variance over multiple distinct repetitions.
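One way to build such a shaded band is to run the simulation several times with different random seeds and summarize the spread at each time step. The sketch below is an illustration only: `simulate_once` is a hypothetical stand-in for a stochastic simulation run, and it uses the standard deviation as the spread measure, which is an assumption about how the band is drawn.

```python
import random
import statistics


def simulate_once(seed: int, num_days: int = 5) -> list[float]:
    """Hypothetical stand-in for one stochastic simulation run (not the real simulator)."""
    rng = random.Random(seed)
    level = 100.0
    trajectory = []
    for _ in range(num_days):
        level *= 1.0 + rng.uniform(-0.05, 0.10)  # random daily growth or decline
        trajectory.append(level)
    return trajectory


def summarize(runs: list[list[float]]) -> list[tuple[float, float]]:
    """Per-day (mean, sample standard deviation) across repetitions."""
    return [(statistics.mean(day), statistics.stdev(day)) for day in zip(*runs)]


runs = [simulate_once(seed) for seed in range(10)]
band = summarize(runs)
for day, (mean, spread) in enumerate(band):
    print(f"day {day}: mean={mean:.1f}, band=[{mean - spread:.1f}, {mean + spread:.1f}]")
```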

[Figure legend: AI policy in simulation; real-world data; real-world policy in simulation.]

AI Policies Can Be More Stringent and Subsidize Less

The stringency level summarizes real-world state policies, like restricting indoor dining. The more stringent a state is (e.g., also restricting outdoor dining or closing schools), the higher its stringency level.

Compared to the real world, AI policies set stringency 5% higher on average. AI policies curb infections early on by setting stringency high, then gradually reduce the stringency. Once vaccinations start, AI policies are able to reduce the stringency even more rapidly.

The federal government subsidizes citizens with direct payments, among other policies. AI policies only require about $36B in subsidies on average.

[Figure legend: AI policy in simulation; real-world data; real-world policy in simulation.]

The Impact of Different Objectives

Our framework is flexible: AI policies can be trained for any set of priorities to navigate the complex relationship between health and the economy created by the pandemic. Here you can see how the state-level and federal-level outcomes change as policy objectives change.

When states prioritize health (darker blue), the health index improves at the cost of the economic index. When the federal government prioritizes health (thicker lines), it tends to spend more on subsidies to help states economically during shutdowns, which results in a higher health index.

Subsidy-driven shutdown has a high economic cost for the federal government, which funds the subsidies. On the other hand, subsidies increase the state-level economic index.

The federal government's subsidy strategy doesn't just depend on its own policy priorities. It also depends on how the states respond to the subsidies. That response changes with the states' own priorities. AI policies are well-suited to navigate these complex interdependencies.

Explaining AI Policies

While reinforcement learning (RL) can be used to learn complex policy models, we keep things simple to make interpreting the AI policy easier. The AI policy learns a single set of weights for converting observations like number of infections, subsidy level, etc. into the decision logits for each stringency level. The higher the logit, the more likely the policy will choose the associated stringency level. Use the interactive visualization to explore the weights learned by the AI policy.
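A linear policy of this kind can be sketched in a few lines. The example below is a simplified illustration, not the trained model: the two features, the three stringency levels, and all weight values are hypothetical, and turning logits into choice probabilities with a softmax is an assumption (a common convention for logits).

```python
import math


def stringency_logits(observation, weights):
    """One logit per stringency level: logits[k] = sum_i weights[k][i] * observation[i]."""
    return [sum(w * x for w, x in zip(row, observation)) for row in weights]


def softmax(logits):
    """Convert logits to probabilities; subtracting the max keeps exp() numerically stable."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical features: [fraction infected, fraction vaccinated].
# Hypothetical weights: one row per stringency level (0 = most open, 2 = most strict).
weights = [
    [-2.0, 2.0],
    [0.0, 0.0],
    [2.0, -2.0],
]

for name, observation in [("many infections", [0.9, 0.1]), ("many vaccinated", [0.1, 0.9])]:
    probabilities = softmax(stringency_logits(observation, weights))
    best_level = max(range(len(probabilities)), key=probabilities.__getitem__)
    print(name, [round(p, 3) for p in probabilities], "most likely level:", best_level)
```

Because the model is linear, each weight can be read directly: a positive weight on a feature raises the logit of that stringency level as the feature grows.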

Some of the weights are shared across states. These are shown in the first panel. From these, we can see that when the numbers of susceptible people and infected people are high, the policy model recommends shifting towards higher stringency (i.e., stronger closure). On the other hand, as more people are vaccinated, the policy model recommends shifting towards lower stringency (i.e., re-opening). Note: the weights are re-scaled based on the associated feature's value range.

State-by-state differences are learned through state-specific stringency logit biases, which add flexibility in how each state responds to the pandemic. These are shown in the second panel. When calibrating the simulation, we use available data to estimate how each state's policy choices impact its COVID-19 transmission and unemployment. As a result, the optimal policy response is different for different states.
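To illustrate how a state-specific bias can change the outcome, the sketch below adds a per-state bias vector to a shared set of logits. The state names, logits, and bias values are all hypothetical.

```python
def most_likely_level(shared_logits, state_bias):
    """Add a state's bias to the shared logits and return the index of the largest logit."""
    logits = [s + b for s, b in zip(shared_logits, state_bias)]
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical logits from the shared weights for one observation (levels 0, 1, 2).
shared_logits = [0.5, 1.0, 0.2]

# Hypothetical state-specific biases, one value per stringency level.
state_biases = {
    "state_a": [0.0, 0.0, 1.5],  # leans towards the strictest level
    "state_b": [1.0, 0.0, 0.0],  # leans towards the most open level
}

for state, bias in state_biases.items():
    print(state, "most likely stringency level:", most_likely_level(shared_logits, bias))
```

Given identical observations, the two hypothetical states end up favoring different stringency levels purely because of their biases.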

Comparing Social Welfare Improvements Across States

AI policies may achieve higher social welfare than real-world policies, executed in our simulation based on the available, though limited, data. The improvement varies state by state, and depends on how strongly infection rates and unemployment respond to increased stringency. For this comparison, we used the health prioritization that maximizes social welfare for real-world policy.

Ethical Considerations

Intended Use. This simulation is a proof-of-concept and is not intended for the development, implementation, analysis, or evaluation of current United States state or federal policies. It is not meant for real-world deployment or use as a tool for policy-making. Our release is only intended to facilitate robust debate and broad multidisciplinary discussion of our work.

Some of the simulation’s modeling choices are based on data observed during the COVID-19 pandemic in 2020-2021; hence, the results may not extend directly to other waves of the pandemic or to other current or future pandemics.

We do not endorse any particular choice of objective/trade-off, at any jurisdictional level; we evaluate on a spectrum of objectives. Different and broader notions of social welfare can and should be considered in future iterations.

Limitations in the data. This simulation is informed by a limited number of data sets. Current data sources include US Census data, COVID-19 time-series data (SIR), U.S. Bureau of Labor Statistics data on unemployment, federal and state COVID-19 policy information, as well as records of COVID stimulus payments and vaccination rates.

Our results are based on public data that are currently available, which are not fine-grained due to data collection issues. The data currently included in the simulation are not disaggregated by race, gender, age, or other protected categories, and this limits the model’s ability to extrapolate health or economic impacts of COVID-19 policies to specific groups or populations. Disaggregation and differential data analysis are particularly important given the disparate impacts of public health policies, COVID-19, and job loss on certain social groups. Any policy analysis that fails to take these differential impacts into account could exacerbate underlying societal vulnerabilities or create new negative impacts for these communities.

The limited number of data sources risks oversimplifying the connection between variables and may restrict the simulation’s ability to accurately model the complex interplay between public health decisions, COVID deaths, and the state of the economy (for example, data on public adherence to public health policies or unemployment rates across fine-grained job categories are currently unavailable).

Our results motivate the development and collection of fair, representative, diverse datasets and responsible simulations.

Limitations in the model. Our simulation makes assumptions which abstract certain elements of the real world. A more fine-grained model would require more data, which are not currently available. For example, we assume that there is no disease spread between states, that people once vaccinated cannot be re-infected, that individuals who contract COVID-19 have only one of two health outcomes, recovery or death (while long-term health impacts may have a sizable economic effect), and that the magnitude of policy impacts is constant throughout time. More consequentially, the simulation does not consider different demographic or economic groups, and complex economic processes and ripple effects are not modeled. Finally, we only consider the COVID-19 variant that was dominant in the U.S. in 2020; we do not model others, e.g., the Indian or British variants.

For more information, see our blog.

Data Sources

Our simulation and experiments are based on real-world data.

Join us in Building AI for Economic Policy Design!

If you are a machine learning researcher, an economist, a policy expert, or policymaker, get in touch to build the next generation of policy design frameworks!

  • Fork our code on GitHub.
  • Read the simulation card to learn about the intended use and ethical aspects of our simulation.
  • Read our paper for all the details.
  • Read the blog for an ethical analysis.
  • Follow us on Twitter: @SalesforceResearch @Salesforce @StephanZheng @AlexTrott.
  • Check out our website and learn more about what Salesforce AI Research is working on!
  • Check out the COVID data hub.
  • Email us with questions or feedback.
Authors

    Alex Trott, Sunil Srinivasa, Douwe van der Wal, Sebastien Haneuse (Harvard), Stephan Zheng.


    This work is based on a research project conducted and authored by Alex Trott*, Sunil Srinivasa*, Douwe van der Wal, Sebastien Haneuse, and Stephan Zheng* (* = equal contribution).

    We thank Gang Wu and Denise Perez for implementing and contributing to this demo. We thank Yoav Schlesinger and Kathy Baxter for the ethical review. We thank Michael Jones, Feihong Wu, and Silvio Savarese for their support. We thank Lav Varshney, Andre Esteva, Caiming Xiong, Michael Correll, Dan Crorey, Sarah Battersby, Ana Crisan, and Maureen Stone for valuable discussions.


    There are many limitations and caveats on the ethical use of this simulation; click here to read more.