PolicyLogic tracks whether elected officials — governors, mayors, members of Congress, and others — follow through on commitments made during their campaigns and in office. We focus on three questions for each promise:
- **Did they start?** Did the official take concrete action to advance this commitment, or did it remain a talking point?
- **Did it happen?** Was the policy enacted, the program launched, the goal reached?
- **Did it work?** To the extent outcomes can be measured, did reality match the promise?
Each promise moves through three independent scoring stages, each with its own standardized rubric: specificity, action, and outcome.
The specificity rubric scores how concrete the promise was:

| Score | Specificity Level | Example |
|---|---|---|
| 0 | Values statement only | "I believe in stronger communities." |
| 1 | Directional goal, no specifics | "We will improve public safety." |
| 2 | Specific policy named | "I will pass a tenant protection bill." |
| 3 | Specific + measurable target | "Reduce violent crime 20% by 2026." |
The action rubric scores how far the official advanced the promise:

| Score | Description |
|---|---|
| 0 | No action taken |
| 1 | Public statements only, no formal action |
| 2 | Proposal introduced, bill filed, or program announced |
| 3 | Passed, signed, or formally launched |
| 4 | Fully operational and implemented |
The outcome rubric scores how much of the stated goal was achieved:

| Score | Description |
|---|---|
| 0 | Failed, reversed, or abandoned |
| 1 | Under 10% of stated goal achieved |
| 2 | 10% to under 40% achieved |
| 3 | 40% to under 70% achieved |
| 4 | 70% to under 90% achieved |
| 5 | 90% or more achieved |
Outcomes are rarely caused by one person alone. The "Their Role" modifier scales the outcome score based on how directly the official caused the result — from 1.0 (sole cause) to 0.0 (no causal connection). This prevents officials from claiming credit for conditions they didn't create, and protects them from blame for outcomes beyond their control.
Not all promises are equally hard to keep. A promise that requires only executive action is scored at 1.0x. One requiring multi-government coordination scores up to 2.5x. This rewards officials who attempt harder reforms.
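The modifiers can be combined with the rubric scores in code. A minimal sketch, assuming the "Their Role" modifier and the difficulty multiplier scale the outcome score multiplicatively; the actual aggregation formula is not specified here, and all names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class PromiseScore:
    specificity: int   # 0-3: how concrete the promise was
    action: int        # 0-4: how far the official advanced it
    outcome: int       # 0-5: share of the stated goal achieved
    their_role: float  # 0.0 (no causal connection) to 1.0 (sole cause)
    difficulty: float  # 1.0 (executive action) up to 2.5 (multi-government)

    def adjusted_outcome(self) -> float:
        # Assumption: modifiers scale the outcome score multiplicatively.
        return self.outcome * self.their_role * self.difficulty

# A specific, measurable promise (specificity 3) that was signed into law
# (action 3) and reached 40-70% of its goal (outcome 3), where the official
# was the primary but not sole cause (0.8) of a moderately hard reform (1.5).
example = PromiseScore(specificity=3, action=3, outcome=3,
                       their_role=0.8, difficulty=1.5)
```

The multiplicative form captures both properties described above: a low causal-role value pulls credit toward zero, while a high difficulty multiplier rewards harder reforms.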
Rubric scores and modifiers roll up into an overall letter grade for each official:

| Grade | Description |
|---|---|
| A | Exceptional. Strong delivery across most promises, including difficult ones. |
| B | Strong. Solid execution on core commitments with some gaps. |
| C | Mixed. Meaningful action on some promises, significant shortfalls on others. |
| D | Weak. Most promises not delivered or substantially diminished. |
| F | Poor. Systematic non-delivery, reversals, or pattern of broken commitments. |
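A sketch of mapping an aggregate delivery score to these letter grades. The 0-100 scale and the numeric cutoffs are illustrative assumptions, not PolicyLogic's published thresholds:

```python
def letter_grade(aggregate: float) -> str:
    """Map an aggregate delivery score (assumed 0-100) to a letter grade.

    The cutoffs below are hypothetical placeholders for illustration.
    """
    for floor, grade in [(90, "A"), (75, "B"), (60, "C"), (45, "D")]:
        if aggregate >= floor:
            return grade
    return "F"
```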
The International Commitments tracker operates on a different logic than the official scorecards. International commitments — treaty obligations, multilateral funding pledges, emissions targets, assessed contributions to international organizations — are institutional. They are made by administrations, inherited by successors, honored or abandoned across decades.
Assigning a letter grade to a commitment that has passed through multiple administrations would collapse decades of contested political history into a single letter. Instead, each commitment displays a delivery ratio — the percentage of the pledge delivered as of the research date — alongside a full timeline of events. Administration color bands make clear which events happened under whose watch.
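The delivery-ratio display described above can be sketched as follows: the percentage of the pledge delivered by events on or before the research date, with each timeline event tagged by administration. Field names and the sample data are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Event:
    date: str            # ISO date of the event
    administration: str  # which administration it occurred under
    delivered: float     # amount delivered at this event, in pledge units

def delivery_ratio(pledged: float, events: list[Event], as_of: str) -> float:
    """Percent of the pledge delivered by events on or before the research date."""
    total = sum(e.delivered for e in events if e.date <= as_of)
    return round(100 * total / pledged, 1)

# Hypothetical timeline spanning three administrations.
timeline = [
    Event("2016-03-01", "Admin A", 500.0),
    Event("2019-06-15", "Admin B", 0.0),    # pledge reaffirmed, nothing paid
    Event("2022-11-30", "Admin C", 250.0),
]
print(delivery_ratio(3000.0, timeline, as_of="2023-01-01"))  # 25.0
```

Because events carry an administration tag, the same data drives both the ratio and the color bands in the timeline view.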
The Presidential Foreign Policy tracker scores individual promises, not the president as a whole. Each commitment carries a status — Kept, Broken, Partial, In Progress, Reversed, Contested — based on the available record. There is no aggregate presidential grade, for three reasons:
- **Selection effects.** Trackable foreign policy promises are not a representative sample of presidential performance. An aggregate grade would reflect political communication as much as governance.
- **Causal complexity.** Foreign policy outcomes are rarely attributable to a single decision. The "Their Role" modifier partially addresses this, but causal chains in foreign policy are longer and more contested.
- **Cross-administration comparability.** The tracker intentionally shows commitments across administrations to surface reversals and patterns. A grade attached to one president would invite comparisons the underlying data may not support.
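The status vocabulary and per-commitment record described above might be represented as follows; the structure and field names are assumptions for illustration:

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    KEPT = "Kept"
    BROKEN = "Broken"
    PARTIAL = "Partial"
    IN_PROGRESS = "In Progress"
    REVERSED = "Reversed"
    CONTESTED = "Contested"

@dataclass
class Commitment:
    promise: str         # the tracked promise, in the official's words
    administration: str  # administration the promise was made under
    status: Status       # current status per the available record
    # Note: deliberately no aggregate-grade field.

c = Commitment("Rejoin the climate accord", "Admin X", Status.KEPT)
```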
Each scorecard is generated in two stages. First, an AI research assistant searches the web for campaign materials, news coverage, legislative records, and outcome data. Second, a scoring model applies the PolicyLogic rubric to produce a structured draft scorecard. All scorecards are marked pending human review until verified by a researcher.
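The two-stage flow can be sketched as a simple pipeline. `research_web` and `apply_rubric` stand in for the AI research assistant and the scoring model; both names, and the dict-shaped draft, are hypothetical:

```python
def generate_scorecard(official: str, research_web, apply_rubric) -> dict:
    evidence = research_web(official)          # stage 1: gather sources
    draft = apply_rubric(evidence)             # stage 2: score against the rubric
    draft["status"] = "pending human review"   # held until a researcher verifies
    return draft

# Usage with stub stages standing in for the real models:
card = generate_scorecard(
    "Jane Doe",
    research_web=lambda name: {"official": name, "sources": []},
    apply_rubric=lambda ev: {"official": ev["official"], "scores": {}},
)
```

The key design point the sketch preserves is that every draft leaves the pipeline flagged for human review rather than published directly.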
PolicyLogic is a nonpartisan project. Scores are determined by the methodology above, not by political affiliation. Officials of both parties are evaluated on the same rubric.
If you find a factual error, a missing promise, or a score you believe is wrong, please tell us. Every submission is reviewed.
Open a Scorecard to Report an Error →