Wednesday, May 31, 2017

What does the "Partisan Selective Sharing" study say about PolitiFact?

A recent study called "Partisan Selective Sharing" (hereafter PSS) noted that Twitter users were more likely to share fact checks that aided their own side of the political aisle.


Even so, the paper came up in a search we did of scholarly works mentioning "PolitiFact."

The search preview mentioned the University of Minnesota's Eric Ostermeier. So we couldn't resist taking a peek to see how the paper handled the data hinting at PolitiFact's selection bias problem.

The mention of Ostermeier's work was effectively neutral, we're happy to say. And the paper had some surprising value to it.

PSS coded tweets from the "elite three" fact checkers (FactCheck.org, PolitiFact and the Washington Post Fact Checker), classifying them as neutral, beneficial to Republicans or beneficial to Democrats.

In our opinion, that's where the study proved briefly interesting:
Preliminary analysis
Fact-checking tweets
42.3% of the 194 fact-check (n=82) tweets posted by the three accounts in October 2012 contained rulings that were advantageous to the Democratic Party (i.e., either positive to Obama or negative to Romney), while 23.7% of them (n=46) were advantageous to the Republican Party (i.e., either positive to Romney or negative to Obama). The remaining 34% (n=66) were neutral, as their statements contained either a contextualized analysis or a neutral anchor.

In addition to the relative advantage of the fact checks, the valence of the fact-checking tweet toward each candidate was also analyzed. Of the 194 fact checks, 34.5% (n=67) were positive toward Obama, 46.9% (n=91) were neutral toward Obama, and 18.6% (n=36) were negative toward Obama. On the other hand, 14.9% (n=29) of the 194 fact checks contained positive valence toward Romney, 53.6% (n=104) were neutral toward Romney, and 31.4% (n=61) were negative valence toward Romney.
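The quoted percentages follow directly from the raw counts. This short Python sketch (ours, not the authors') recomputes them from the counts in the excerpt and checks that each set of counts sums to 194:

```python
# Counts quoted from the PSS paper: 194 fact-check tweets, October 2012.
TOTAL = 194
advantage = {"Democratic": 82, "Republican": 46, "neutral": 66}
obama_valence = {"positive": 67, "neutral": 91, "negative": 36}
romney_valence = {"positive": 29, "neutral": 104, "negative": 61}


def percentages(counts, total=TOTAL):
    """Convert raw counts to percentages of the total, rounded to one decimal."""
    if sum(counts.values()) != total:
        raise ValueError("counts do not sum to the stated total")
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


for name, counts in [("advantage", advantage),
                     ("Obama", obama_valence),
                     ("Romney", romney_valence)]:
    print(name, percentages(counts))
```

Each recomputed figure matches the paper's rounding. The same arithmetic puts PolitiFact's 126 tweets at about 64.9 percent of the 194.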
Of course, many have no problem interpreting results like these as a strong indication that Republicans lie more than Democrats. And we cheerfully admit the data are consistent with the assumption that Republicans lie more.

Still, if one has some interest in applying the methods of science, on what do we base the hypothesis that Republicans lie more? We cannot base that hypothesis on these data without ruling out the idea that fact-checking journalists lean to the left. And unfortunately for the "Republicans lie more" hypothesis, we have some pretty good data showing that American journalists tend to lean to the left.

Until we have some reasonable argument why left-leaning journalists do not allow their bias to affect their work, the results of studies like PSS give us more evidence that the media (and the mainstream media subset "fact checkers") lean left while they're working.

The "liberal bias" explanation has better evidence than the "Republicans lie more" hypothesis. Because PolitiFact posted 126 of the 194 fact-check tweets, a healthy share of the blame likely falls on PolitiFact.

We wish the authors of the study, Jieun Shin and Kjerstin Thorson, had separated the three fact checkers in their results.

Wednesday, May 24, 2017

What if we lived in a world where PolitiFact applied to itself the standards it applies to others?

In that impossible world where PolitiFact applied its own standards to itself, PolitiFact would doubtless crack down on PolitiFact's misleading headlines, like the following headline over a story by Lauren Carroll:

While the PolitiFact headline claims that the Trump budget cuts Medicaid, and the opening paragraph says Trump's budget "directly contradicts" President Trump's promise not to cut Medicaid, in short order Carroll's story reveals that the Medicaid budget goes up under the new Trump budget.

So it's a cut when the Medicaid budget goes up?

Such reasoning has precedent at PolitiFact. We noted in December 2016 that veteran PolitiFact fact-checker Louis Jacobson wrote that the most natural way to interpret "budget cut" was against the baseline of expected spending, not against the previous year's spending.

Jacobson's approach in December 2016 helped President Obama end up with a "Compromise" rating on his aim to cut $1 trillion to $1.5 trillion in spending. By PolitiFact's reckoning, the president cut $427 billion from the budget. PolitiFact obtained that figure by subtracting actual outlays from the estimates the Congressional Budget Office published in 2012 and using the cumulative total for the four years.
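The gap between the two yardsticks is simple arithmetic. This sketch uses invented round numbers (not actual CBO, Medicaid or federal budget figures) to show how spending can rise every year and still count as a "cut" against a projected baseline. The cumulative baseline calculation mirrors the method described above for the $427 billion figure; the function names are hypothetical.

```python
# Hypothetical spending figures in billions of dollars, for illustration only.
# These are NOT real CBO projections or actual outlays.
baseline = [100.0, 105.0, 110.0, 115.0]  # projected spending, year by year
actual = [98.0, 101.0, 104.0, 108.0]     # actual (or proposed) spending
prior_year_spending = 95.0               # spending in the year before the window


def cut_vs_baseline(projected, spent):
    """Cumulative 'cut' against a projected baseline:
    the sum of (projected - spent) over every year in the window."""
    return sum(p - s for p, s in zip(projected, spent))


def change_vs_prior_year(spent, prior):
    """Year-over-year change in spending; positive values are increases."""
    previous = [prior] + spent[:-1]
    return [s - p for s, p in zip(spent, previous)]


print(cut_vs_baseline(baseline, actual))                  # 19.0
print(change_vs_prior_year(actual, prior_year_spending))  # [3.0, 3.0, 3.0, 4.0]
```

Measured against the baseline, this hypothetical program was "cut" by 19 billion dollars; measured against the prior year, its spending went up every single year.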

Jacobson took a different tack back in 2014 when he faulted a Republican ad attacking the Affordable Care Act's adjustments to Medicare spending (which we noted in the earlier linked article):
First, while the ad implies that the law is slicing Medicare benefits, these are not cuts to current services. Rather, as Medicare spending continues to rise over the next 10 years, it will do so at a slower pace would [sic] have occurred without the law. So claims that Obama would "cut" Medicare need more explanation to be fully accurate.
We can easily rework Jacobson's paragraph to address Carroll's story:
First, while the headline implies that the proposed budget is slicing Medicaid benefits, these are not cuts to current services. Rather, as Medicaid spending continues to rise over the next 10 years, it will do so at a slower pace than would occur without the law. So claims that Trump would "cut" Medicaid need more explanation to be fully accurate.
PolitiFact is immune to the standard it applies to others.

We also note that a pledge not to cut a program's spending is not reasonably taken as a pledge not to slow the growth of spending for that program. Yet that unreasonable interpretation is the foundation of PolitiFact's "Trump-O-Meter" article.

Correction May 24, 2017: Changed the first instance of "law" in our reworking of Jacobson's sentence to "proposed budget." It better fits the facts that way.
Update May 26, 2017: Added link to the PolitiFact story by Lauren Carroll

Friday, May 19, 2017

What "Checking How Fact Checkers Check" says about PolitiFact

A study by doctoral student Chloe Lim (Political Science) of Stanford University gained some attention this week, inspiring some unflattering headlines like this one from Vocativ: "Great, Even Fact Checkers Can’t Agree On What Is True."

Katie Eddy and Natasha Elsner explain inter-rater reliability

Lim's research approach somewhat resembled research by Michelle A. Amazeen of Rider University. Amazeen and Lim both used tests of coding consistency to assess the accuracy of fact checkers, but the two reached roughly opposite conclusions. Amazeen concluded that consistent results helped strengthen the inference that fact-checkers fact-check accurately. Lim concluded that inconsistent fact-checker ratings may undermine the public impact of fact-checking.

Key differences in the research procedure help explain why Amazeen and Lim reached differing conclusions.

Data Classification

Lim used two different methods for classifying data from PolitiFact and the Washington Post Fact Checker. She converted PolitiFact ratings to a five-point scale corresponding to the Washington Post Fact Checker's "Pinocchio" ratings, and she divided ratings into "True" and "False" groups using the line between "Mostly False" and "Half True" as the barrier between true and false statements.

Amazeen opted for a different approach. She did not try to reconcile the two different rating systems at PolitiFact and the Fact Checker, electing instead to use a binary system that counted every statement rated anything other than "True" or "Geppetto check mark" as false.
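The two binary codings are easy to write out for the Truth-O-Meter labels. In this minimal Python sketch the function names are ours; only the cut points come from the descriptions above. Lim's exact Truth-O-Meter-to-Pinocchio mapping isn't spelled out here, so it is left out.

```python
# PolitiFact's Truth-O-Meter ratings, from most to least accurate.
TRUTH_O_METER = ["True", "Mostly True", "Half True",
                 "Mostly False", "False", "Pants on Fire"]
# The Washington Post Fact Checker's five-point scale, from best to worst.
FACT_CHECKER = ["Geppetto check mark", "One Pinocchio", "Two Pinocchios",
                "Three Pinocchios", "Four Pinocchios"]


def lim_binary(rating):
    """Lim's split for PolitiFact: 'Half True' or better counts as true."""
    cut = TRUTH_O_METER.index("Half True")
    return "true" if TRUTH_O_METER.index(rating) <= cut else "false"


def amazeen_binary(rating):
    """Amazeen's split: only the top rating on either scale counts as true."""
    if rating not in TRUTH_O_METER + FACT_CHECKER:
        raise ValueError(f"unknown rating: {rating!r}")
    return "true" if rating in ("True", "Geppetto check mark") else "false"


for r in TRUTH_O_METER:
    print(f"{r:>14}  Lim: {lim_binary(r):5}  Amazeen: {amazeen_binary(r)}")
```

Under Lim's split, three of the six Truth-O-Meter ratings count as true; under Amazeen's, only one does.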

Amazeen's method essentially guaranteed high inter-rater reliability, because "True" judgments from the fact checkers are rare. Imagine comparing movie reviewers who use a five-point scale, but with their ratings divided into great movies and not-great movies. A one-star rating of "Ishtar" by one reviewer would show agreement with a four-star rating of the same movie by another reviewer. Disagreements only occur when one reviewer gives five stars while the other gives a lower rating.
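The movie analogy is easy to make concrete. This sketch uses made-up star ratings and simple percent agreement (a cruder measure than the K-alpha statistic, used here only to show what collapsing categories does). The reviewers and their ratings are hypothetical.

```python
def percent_agreement(a, b):
    """Share of items on which two raters give the same rating."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty rating lists")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def great_or_not(stars):
    """Collapse a 1-5 star rating into 'great' (5 stars) or 'not great'."""
    return "great" if stars == 5 else "not great"


# Hypothetical star ratings of the same six movies by two reviewers.
reviewer_a = [1, 2, 3, 4, 5, 2]
reviewer_b = [4, 3, 1, 2, 5, 2]

full_scale = percent_agreement(reviewer_a, reviewer_b)
collapsed = percent_agreement([great_or_not(s) for s in reviewer_a],
                              [great_or_not(s) for s in reviewer_b])
print(f"five-point agreement: {full_scale:.0%}")       # 33%
print(f"great/not-great agreement: {collapsed:.0%}")   # 100%
```

The two reviewers agree on only two of six movies when the full scale counts, yet they agree perfectly once every rating below five stars goes into the same bin.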

Professor Joseph Uscinski's reply to Amazeen's research, published in Critical Review, put it succinctly:
Amazeen’s analysis sets the bar for agreement so low that it cannot be taken seriously.
Amazeen found high agreement among fact checkers because her method guaranteed that outcome.

Lim's methods provide for more varied and robust data sets, though Lim ran into the same problem Amazeen faced: two different fact-checking organizations only rarely check the same claims. Both researchers used relatively small data sets.

The meaning of Lim's study

In our view, Lim's study rushes to its conclusion that fact-checkers disagree without giving proper attention to the most obvious explanation for the disagreement she measures.

The rating systems the fact checkers use lend themselves to subjective evaluations. We should expect that condition to lead to inconsistent ratings. When I reviewed Amazeen's method at Zebra Fact Check, I criticized it for applying inter-coder reliability standards to a process much less rigorous than social science coding.

Klaus Krippendorff, creator of the K-alpha measure Amazeen used in her research, explained the importance of giving coders good instructions to follow:
The key to reliable content analyses is reproducible coding instructions. All phenomena afford multiple interpretations. Texts typically support alternative interpretations or readings. Content analysts, however, tend to be interested in only a few, not all. When several coders are employed in generating comparable data, especially large volumes and/or over some time, they need to focus their attention on what is to be studied. Coding instructions are intended to do just this. They must delineate the phenomena of interest and define the recording units to be described in analyzable terms, a common data language, the categories relevant to the research project, and their organization into a system of separate variables.
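For readers curious about what the K-alpha statistic computes, here is a minimal sketch of Krippendorff's alpha for the simplest case: two coders, nominal categories, no missing values. It follows the standard coincidence-matrix formula (alpha = 1 minus observed disagreement divided by chance-expected disagreement). It is not the code Amazeen or Lim used, and the example codes are invented.

```python
from collections import Counter


def krippendorff_alpha_nominal(coder_1, coder_2):
    """Krippendorff's alpha for two coders, nominal data, no missing values."""
    if len(coder_1) != len(coder_2) or not coder_1:
        raise ValueError("need two equal-length, non-empty lists of codes")
    # Coincidence matrix: each unit contributes both ordered pairs of its values.
    coincidences = Counter()
    for a, b in zip(coder_1, coder_2):
        coincidences[(a, b)] += 1
        coincidences[(b, a)] += 1
    # Marginal totals n_c and grand total n (= 2 * number of units).
    marginals = Counter()
    for (c, _k), count in coincidences.items():
        marginals[c] += count
    n = sum(marginals.values())
    # Off-diagonal coincidences (observed) vs. products of marginals (expected).
    observed = sum(cnt for (c, k), cnt in coincidences.items() if c != k)
    expected = sum(marginals[c] * marginals[k]
                   for c in marginals for k in marginals if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1 - (n - 1) * observed / expected


# Invented codes from two hypothetical coders rating four statements.
coder_1 = ["false", "false", "true", "true"]
coder_2 = ["false", "true", "true", "true"]
print(round(krippendorff_alpha_nominal(coder_1, coder_2), 3))  # 0.533
```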
The rating systems of PolitiFact and the Washington Post Fact Checker are gimmicks, not coding instructions. The definitions mean next to nothing, and PolitiFact's creator, Bill Adair, has called PolitiFact's determination of Truth-O-Meter ratings "entirely subjective."

Lim's conclusion is right. The fact checkers are inconsistent. But Lim's use of coder reliability ratings is, in our view, a little like using a plumb line to measure whether a building has collapsed in an earthquake. The tool is too sophisticated for the job. The "Truth-O-Meter" and "Pinocchio" rating systems as described and used by the fact checkers do not qualify as adequate sets of coding instructions.

We've belabored the point about PolitiFact's rating system for years. It's a gimmick that tends to mislead people. And the fact-checking organizations that do not use a rating system avoid it for precisely that reason.

Lucas Graves' history of the modern fact-checking movement, "Deciding What's True: The Rise of Political Fact-Checking in American Journalism," (Page 41) offers an example of the dispute:
The tradeoffs of rating systems became a central theme of the 2014 Global Summit of fact-checkers. Reprising a debate from an earlier journalism conference, Bill Adair staged a "steel-cage death match" with the director of Full Fact, a London-based fact-checking outlet that abandoned its own five-point rating scheme (indicated by a magnifying lens) as lacking precision and rigor. Will Moy explained that Full Fact decided to forgo "higher attention" in favor of "long-term reputation," adding that "a dodgy rating system--and I'm afraid they are inherently dodgy--doesn't help us with that."
Coding instructions should provide coders with clear guidelines preventing most or all debate in deciding between two rating categories.

Lim's study in its present form does its best work in raising questions about fact checkers' use of rating systems.

Sunday, May 14, 2017

PolitiFact and robots.txt (updated)

We were surprised earlier this week when our attempt to archive a PolitiFact fact check at the Internet Archive failed.

Saving a page to the Internet Archive has served as one of the standard methods for keeping a record of changes at a website. PolitiFact Bias has often used the Internet Archive to document PolitiFact's mischief.

Webmasters can instruct search engines and other web crawlers to skip content on a website by posting a "robots.txt" file. Historically, the Internet Archive has honored robots.txt prohibitions.

PolitiFact apparently began using a restrictive robots.txt file recently. As a result, links to archived PolitiFact pages will likely fail for a time, whether at PolitiFact Bias or elsewhere.

The good news in all of this? The Internet Archive is likely to start ignoring the robots.txt instruction in the very near future. Once that happens, PolitiFact's sketchy Web history will return from the shadows back into the light.

PolitiFact may have had a legitimate reason for the change, but our benefit of the doubt comes with a big caveat: the PolitiFact webmaster could have created an exception for the Internet Archive in the robots.txt file. At minimum, that oversight embarrasses PolitiFact.
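The exception we have in mind takes only a few lines. Below is a hypothetical robots.txt (not PolitiFact's actual file) that blocks every crawler except one, checked with Python's standard-library `urllib.robotparser`. The user-agent token `ia_archiver` is our assumption for the Internet Archive's crawler; the archive's own documentation is the authority on which token it honors. The URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block every crawler except one archive crawler.
# "ia_archiver" is an ASSUMED user-agent token for the Internet Archive.
# An empty "Disallow:" line means "nothing is disallowed" for that agent.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://www.example.com/fact-checks/some-story/"  # placeholder URL
print(parser.can_fetch("ia_archiver", page))   # True: its own entry allows all
print(parser.can_fetch("SomeOtherBot", page))  # False: falls through to "*"
```

A crawler that honors robots.txt uses the most specific matching entry, so the archive's crawler gets its own allow-all rule while everyone else gets the blanket prohibition.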

Update May 18, 2017:

This week the Internet Archive Wayback Machine once again functioned properly in saving Web pages at PolitiFact. Links to archived pages likewise function properly.

We do not know at this point whether PolitiFact created an exception for the Internet Archive (and others), or whether the Internet Archive has already started ignoring robots.txt. PolitiFact has made no announcement regarding any change, so far as we can determine.