Sunday, March 27, 2016

Basic Data Analysis: How "RealClearPolitics" obscures data and may be too incompetent to exist

I was trolling around the Internet when someone linked me to the RealClearPolitics tally of the Democratic popular vote. Here's the data as it was presented when I looked, copied here because I would not be surprised if it were to change:


2016 Democratic Popular Vote

| State | Date | Clinton | Sanders | Spread |
|---|---|---|---|---|
| RCP Total | - | 8,924,920 | 6,398,420 | Clinton +2,526,500 |
| Iowa | February 1 | | | |
| New Hampshire | February 9 | 95,252 | 151,584 | Sanders +56,332 |
| Nevada | February 20 | | | |
| South Carolina | February 27 | 271,514 | 95,977 | Clinton +175,537 |
| Alabama | March 1 | 309,928 | 76,399 | Clinton +233,529 |
| American Samoa | March 1 | | | |
| Arkansas | March 1 | 144,580 | 64,868 | Clinton +79,712 |
| Colorado | March 1 | 49,314 | 72,115 | Sanders +22,801 |
| Democrats Abroad | March 1-8 | | | |
| Georgia | March 1 | 543,008 | 214,332 | Clinton +328,676 |
| Massachusetts | March 1 | 603,784 | 586,716 | Clinton +17,068 |
| Minnesota | March 1 | 73,510 | 118,135 | Sanders +44,625 |
| Oklahoma | March 1 | 139,338 | 174,054 | Sanders +34,716 |
| Tennessee | March 1 | 245,304 | 120,333 | Clinton +124,971 |
| Texas | March 1 | 935,080 | 475,561 | Clinton +459,519 |
| Vermont | March 1 | 18,335 | 115,863 | Sanders +97,528 |
| Virginia | March 1 | 503,358 | 275,507 | Clinton +227,851 |
| Louisiana | March 5 | 221,615 | 72,240 | Clinton +149,375 |
| Nebraska | March 5 | 14,340 | 19,120 | Sanders +4,780 |
| Kansas | March 5 | 12,593 | 26,450 | Sanders +13,857 |
| Maine | March 6 | | | |
| Mississippi | March 8 | 182,447 | 36,348 | Clinton +146,099 |
| Michigan | March 8 | 576,795 | 595,222 | Sanders +18,427 |
| Northern Marianas | March 12 | | | |
| Florida | March 15 | 1,097,400 | 566,603 | Clinton +530,797 |
| Illinois | March 15 | 1,007,382 | 971,555 | Clinton +35,827 |
| Missouri | March 15 | 310,602 | 309,071 | Clinton +1,531 |
| North Carolina | March 15 | 616,383 | 460,316 | Clinton +156,067 |
| Ohio | March 15 | 679,266 | 513,549 | Clinton +165,717 |
| Arizona | March 22 | 235,697 | 163,400 | Clinton +72,297 |
| Idaho | March 22 | 5,065 | 18,640 | Sanders +13,575 |
| Utah | March 22 | 15,666 | 61,333 | Sanders +45,667 |
| Alaska | March 26 | 99 | 440 | Sanders +341 |
| Hawaii | March 26 | 10,125 | 23,530 | Sanders +13,405 |
| Washington | March 26 | 7,140 | 19,159 | Sanders +12,019 |

People were using this "data" to support the assertion that Hillary Clinton has "more of the popular vote" than Bernie Sanders. That really seems to defy intuition, given the record crowds Bernie draws to his speeches. But really, look at the last 3 rows of that data. Do they make any sense at all?

Only 500 people showed up to vote in AK? Really?

But its population is about 750K. HI has a population of 1.3M. WA has a population of 7M.

Yet HI had more voters than WA?
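
As a quick sanity check, here is a minimal sketch that divides each state's reported total by its population. The reported numbers are the Clinton and Sanders columns from the table above; the populations are the round figures quoted in this post, not census data, and the names `reported`, `population`, and `share_of_population` are made up for illustration.

```python
# Reported totals (Clinton + Sanders) for the last three rows of the table.
reported = {
    "AK": 99 + 440,
    "HI": 10_125 + 23_530,
    "WA": 7_140 + 19_159,
}

# Rough populations as quoted in the post (assumed round figures).
population = {"AK": 750_000, "HI": 1_300_000, "WA": 7_000_000}


def share_of_population(state):
    """Fraction of the state's population that the reported total represents."""
    return reported[state] / population[state]


for state in reported:
    print(f"{state}: {reported[state]:>6,} reported, "
          f"{share_of_population(state):.2%} of population")
```

If these were really vote counts, WA's total would be well under 1% of its population, and smaller than HI's despite WA having several times as many people.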

Take WA: Even if you assume only 30% of the population is eligible to vote (that's a lot of kids and foreigners!), only 50% of those people are Democrats (which would be low for The Left Coast), and only 25% of Democrats show up for the primaries (because people in WA don't care about politics... FALSE), that would be about 262K people (7M × 0.30 × 0.50 × 0.25).
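
The back-of-the-envelope estimate above can be written out as a small sketch. The three percentages are the deliberately pessimistic assumptions from the paragraph, not measured values, and the function name `estimated_turnout` is hypothetical.

```python
def estimated_turnout(population, eligible_share, democrat_share, primary_turnout):
    """Multiply a population by successive (assumed) shares to get a turnout guess."""
    return population * eligible_share * democrat_share * primary_turnout


# WA with the lowball assumptions from the post:
# 7M people, 30% eligible, 50% Democrats, 25% turnout.
wa_low_estimate = estimated_turnout(7_000_000, 0.30, 0.50, 0.25)
print(f"{wa_low_estimate:,.0f}")  # about 262K
```

Even this lowball guess is roughly ten times the 26,299 that the table lists for WA.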

What WA *actually* reported are _delegate numbers_. In one precinct in Washington, 149 people showed up and 8 delegates were awarded. That means the ratio of actual participants to reported delegates was about 18:1.

If that one precinct's ratio held statewide, the actual popular vote in WA would be about 473K (18 × the 26,299 delegates reported in the table).
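
Here is a rough sketch of that scaling, assuming the single precinct (149 participants, 8 delegates) is representative of the whole state, which is a big assumption. The WA total comes from the table above; `scaled_participants` and the other names are hypothetical.

```python
participants_in_precinct = 149
delegates_awarded = 8

# Participants per delegate in that one precinct (a bit over 18).
participants_per_delegate = participants_in_precinct / delegates_awarded

# Clinton + Sanders numbers reported for WA in the table.
wa_reported_delegates = 7_140 + 19_159


def scaled_participants(reported_delegates, ratio):
    """Scale a delegate count up to an estimated number of participants."""
    return reported_delegates * ratio


low = scaled_participants(wa_reported_delegates, 18)
high = scaled_participants(wa_reported_delegates, participants_per_delegate)
print(f"rounded ratio: {low:,.0f}")
print(f"exact ratio:   {high:,.0f}")
```

Depending on the ratio and delegate total plugged in, the estimate lands in the mid-to-high 400,000s, more than an order of magnitude above the number the table presents as WA's "popular vote".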

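This chart is completely invalid. It CLEARLY has a BIG CATEGORY ERROR: it mixes delegate counts from some states with raw vote counts from others, then adds them up as if they were the same thing. It compares apples and oranges. It takes only a little common sense to realize this.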

This is a dramatic failure of data analysis 101, one that a high school student in a reasonable education system should be able to catch. I can only attribute it to either malice or complete and utter incompetence. If I were a "tinfoil hat" wearing person, I would say this data is purposefully being misrepresented so it can be misconstrued...

But I'm a charitable person. So instead, I'll assume it's complete and utter incompetence by "RealClearPolitics" that obscures more than it helps. I could go wild and say this kind of dramatic failure should cause us to immediately discount all of their data as lacking basic data sense, but maybe this particular table is "just the intern."

But isn't it funny how fundamental errors of basic analysis in these types of "canonical data sources" can spread misinformation like a plague throughout the Internet?


Don't forget to question the data, and the methods used to collect and analyze it.