Mar 17 2007

Reading First In Madison

Published by rightwingprof at 12:27 pm under Math

Ken DeRosa took a journalist to task for inaccuracies in her article about four Reading First schools in Madison, Wisconsin (go here for all the relevant information), then pointed me to the reading proficiency data the state of Wisconsin reported for all schools (the Wisconsin data are here). I downloaded the data, cleaned them up in Excel, and ran the stats, comparing the 98-99 and 04-05 school years. The state reported four proficiency levels: minimal, basic, proficient, and advanced. We are interested in the percentages testing proficient or above (proficient+ in the tables below), so I summed the percent proficient and the percent advanced for each school, and analyzed those totals.

Before I go on, let me quickly address why we must analyze the data statistically, and cannot just report means. If we gave the same kids the same proficiency exams on two different days, say only a week apart, their scores would be different. Anytime we see a difference between scores, without statistics, we do not know if those differences are due to random variation or not. We cannot without statistics point to two different scores or means and say, "See? The scores increased!"
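To make that thought experiment concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the 200 students, the "true" score scale, and the size of the day-to-day noise are assumptions, not Wisconsin data.

```python
import random
import statistics

def simulate_two_sittings(n_students=200, noise_sd=5.0, seed=1):
    """Give the same hypothetical students the same exam twice.

    Each student has a fixed "true" score; each sitting adds its own
    random noise. Nothing about the students changes between sittings.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(70, 10) for _ in range(n_students)]
    day1 = [t + rng.gauss(0, noise_sd) for t in true_scores]
    day2 = [t + rng.gauss(0, noise_sd) for t in true_scores]
    return statistics.mean(day1), statistics.mean(day2)

mean1, mean2 = simulate_two_sittings()
print(f"Sitting 1 mean: {mean1:.2f}")
print(f"Sitting 2 mean: {mean2:.2f}")
print(f"Difference:     {mean2 - mean1:+.2f}")
```

The two means come out different even though the students are identical, which is exactly why a raw difference between two averages proves nothing by itself.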

Also, let me mention a few crucial points.

  • The more data we have, the more reliable our statistical analysis will be (this will become an issue later on).
  • Means (averages) alone do not give us a complete picture, particularly when they are means of aggregated data, as these are (this is why I look at other descriptive statistics).
  • Statistics always deals with probability (uncertainty), and we test against a chosen confidence level, 95% here (sometimes 99% is used). The matching level of significance (alpha) is one minus the confidence level: here, 0.05, or 5%.
  • We are assuming here either that the proficiency exam standards did not change between the two years or that the proficiency reports for the two years are comparable (if they are not, then Wisconsin cannot make any statement about their proficiency levels over time — and we will address this later).

First, I tackled the reading proficiency scores for the entire state. Here are the descriptive stats:

Statistic          % Proficient+ 98-99   % Proficient+ 04-05
Mean               71.04                 87.54
SE                 0.48                  0.36
Median             73.58                 91.00
Mode               75.00                 100.00
Stdev              16.12                 12.11
Sample Variance    259.79                146.72
Kurtosis           1.03                  4.27
Skewness           -0.92                 -1.86
Range              100.00                83.40
Minimum            0.00                  16.70
Maximum            100.00                100.00
Sum                80559.47              100494.30
Count              1134                  1148
CL (95.0%)         0.94                  0.70

The mean increased from 98-99 to 04-05 by 16.5 percentage points (71.04% to 87.54%). The standard deviation, which we can roughly define as the typical amount each school differed in one direction or another from the mean, decreased. The lower the standard deviation, the less "spread out" the data. This suggests that in the 04-05 school year, more schools clustered around the mean.
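The SE and CL (95.0%) rows in the table follow directly from the standard deviation and the count. The numbers in this post came out of Excel; the Python below is only an illustrative stand-in, and it uses the normal quantile for the 95% half-width, an approximation that is reasonable only because there are over a thousand schools.

```python
import math
from statistics import NormalDist

def standard_error(stdev, count):
    """Standard error of the mean: stdev / sqrt(n)."""
    return stdev / math.sqrt(count)

def ci_half_width(stdev, count, confidence=0.95):
    """Approximate half-width of the confidence interval for the mean.

    Assumption: the normal quantile stands in for the t quantile,
    which is fine for large n but not for small samples.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * standard_error(stdev, count)

# Stdev and Count copied from the statewide table.
for label, stdev, count in [("98-99", 16.12, 1134), ("04-05", 12.11, 1148)]:
    se = standard_error(stdev, count)
    cl = ci_half_width(stdev, count)
    print(f"{label}: SE = {se:.2f}, CL (95%) = {cl:.2f}")
```

Rounded to two places, these agree with the SE and CL (95.0%) rows of the table.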

The kurtoses support this. Kurtosis is the "peakedness" of the data distribution. Visualize a bell curve as a water balloon (sorry no, I am not going to do graphics; you’ll have to use your imagination). Now, if you place your hand on the top and squish it downward, more of it will squish into the tails at either side. This would be a "flattened" curve, with a low kurtosis. Now, instead of squishing the top down, visualize taking either hand and pushing it in from the sides. The water (data) would push the middle of the curve upward, or make it more "peaked." This would be a curve with a higher kurtosis. So the higher the kurtosis, the more data are squished up into the middle, where the mean is. Now note that the kurtosis for 04-05 is higher than that for 98-99. This supports what the standard deviations tell us, that in 04-05, more data were clustered around the mean (and in 98-99, more data were squished into the tails).
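For anyone who wants to see the arithmetic behind a kurtosis figure, here is a rough Python sketch. It uses the bias-corrected sample formula that, as far as I know, matches what Excel reports; treat that match as an assumption. The two small data sets are invented for illustration.

```python
import math

def excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (needs at least 4 values).

    A normal bell curve scores about 0; higher values go with a
    sharper peak, lower (negative) values with a flatter shape.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    fourth = sum(((x - mean) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction

flat = [1, 2, 3, 4, 5]                # evenly spread, no peak
peaked = [1, 5, 5, 5, 5, 5, 5, 5, 9]  # piled up in the middle
print(round(excess_kurtosis(flat), 2))
print(excess_kurtosis(peaked) > excess_kurtosis(flat))
```

The evenly spread set scores below zero, and the set piled up in the middle scores well above it.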

So far, everything looks positive, until we look at the skewness. Go back to that bell curve water balloon. If you grab the tail on the right and pull it further to the right, more of the water will spill into that tail, right? That’s what we call a right-skewed curve, and it has a positive skewness factor. If you pull the left tail out, more water spills into that left tail, and we have a left-skewed curve, with a negative skewness factor. When we look at the skewness for the two years, both are left-skewed (that is, in both years the longer tail is on the left, toward the lower percentages proficient+), but the 04-05 curve is more left-skewed than 98-99 (-1.86 and -0.92, respectively). So even though it does look like Wisconsin may have improved the reading proficiency between the two years, the 04-05 curve also has a more pronounced tail of schools trailing well below the rest: the median (91.00) and mode (100.00) both sit above the mean (87.54), which is what a long left tail does (if this seems like a paradox to you, think of the water balloon again, and all will be made clear).
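Here is a small Python sketch of how a skewness figure is computed, using the bias-corrected sample formula (which I believe is what Excel reports, though that is an assumption). The school percentages below are hypothetical, not the Wisconsin data.

```python
import math

def sample_skewness(values):
    """Bias-corrected sample skewness (needs at least 3 values)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

# Hypothetical percentages: most values high, one straggler far below.
long_left_tail = [40, 85, 88, 90, 92, 95, 100]
# The mirror image: most values low, one far above.
long_right_tail = [0, 5, 8, 10, 12, 15, 60]
print(sample_skewness(long_left_tail) < 0)   # left-skewed: negative
print(sample_skewness(long_right_tail) > 0)  # right-skewed: positive
```

Note that a single straggler far below an otherwise high-scoring group is enough to produce a negative skewness, even when almost everyone is doing well.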

However, we cannot say whether these differences are meaningful (statistically significant) or whether they are due to random variation from looking at the descriptive statistics alone. So we ran ANOVA to a probability of 95% (alpha=0.05) to test the null hypothesis, that the differences are due to random variation:

Anova: Single Factor

SUMMARY
Groups                 Count   Sum        Average   Variance
% Proficient+ 98-99    1134    80559.47   71.04     259.79
% Proficient+ 04-05    1134    99261.70   87.53     146.58

ANOVA
Source of Variation    SS          df     MS          F        P-value    F crit
Between Groups         154221.06   1      154221.06   759.01   2.3E-144   3.85
Within Groups          460420.91   2266   203.19
Total                  614642      2267

The value of p is extremely small (2.3E-144), far smaller than 0.05, so we reject the null hypothesis at the 0.05 level of significance. In other words, if the difference between the percent proficient and above in the two years were due only to random variation, a difference this large would almost never turn up. Wisconsin can from these data validly claim that they raised their proficiency.
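The F value in the table can be rebuilt from nothing more than the SUMMARY rows. The sketch below is a plain-Python stand-in for what Excel does, not the original workbook; it compares F against the F crit printed in the table rather than computing a p-value, since that would need an F-distribution routine outside the standard library.

```python
def one_way_anova(groups):
    """Single-factor ANOVA from summary statistics.

    groups: list of (count, mean, sample_variance) tuples.
    Returns (ss_between, ss_within, df_between, df_within, f_stat).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * v for n, _, v in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

# Count, Average, Variance from the SUMMARY rows of the statewide table.
state = [(1134, 71.04, 259.79), (1134, 87.53, 146.58)]
ssb, ssw, dfb, dfw, f_stat = one_way_anova(state)
F_CRIT = 3.85  # copied from the table, not computed here
print(f"F = {f_stat:.1f} on ({dfb}, {dfw}) df; reject null: {f_stat > F_CRIT}")
```

The result lands within a fraction of a point of the table's 759.01; the small gap comes from feeding in means and variances already rounded to two places.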

Now, let’s turn to Madison. First, the descriptive stats:

Statistic          % Proficient+ 98-99   % Proficient+ 04-05
Mean               62.82                 82.46
SE                 2.14                  1.91
Median             63.91                 82.80
Mode               #N/A                  92.30
Stdev              10.89                 9.76
Sample Variance    118.59                95.34
Kurtosis           0.54                  -0.35
Skewness           -0.43                 -0.55
Range              48.64                 35.30
Minimum            36.36                 60.30
Maximum            85.00                 95.60
Sum                1633.45               2143.90
Count              26                    26
CL (95.0%)         4.40                  3.94

We certainly see a difference between the two years in the means, a larger difference than we saw for the whole state, though keep in mind that while we had over a thousand reporting schools for the state, we have 26 reporting schools for Madison, and recall that the more data we have, the more reliable our statistics are. So don’t jump immediately to conclusions. The kurtosis in 04-05 is lower than that in 98-99, indicating a flatter curve with more data in the tails in 04-05; the skewness in both years is roughly the same.

Again, we have to run ANOVA to test the null hypothesis, that the difference is due to random variation:

Anova: Single Factor

SUMMARY
Groups                 Count   Sum       Average   Variance
% Proficient+ 98-99    26      1633.45   62.82     118.59
% Proficient+ 04-05    26      2143.90   82.46     95.34

ANOVA
Source of Variation    SS         df   MS        F       P-value    F crit
Between Groups         5010.78    1    5010.78   46.84   1.05E-08   4.03
Within Groups          5348.45    50   106.97
Total                  10359.23   51

And again, ANOVA rejects the null hypothesis. The value of p is 1.05E-08, far smaller than alpha (0.05), so the difference is very unlikely to be due to random variation alone. Madison can from these data claim that they raised their proficiency levels.

However, the real issue here is the group of Reading First schools in Madison (Glendale, Hawthorne, Lincoln, and Orchard Ridge). But before I go on, let me mention a crucial statistical issue we’re about to encounter, one I listed above and said would become an issue later on.

For the entire state of Wisconsin, we have over 1,000 schools reporting proficiency levels. For Madison, we have 26 schools reporting (this is why the p-value for the Madison ANOVA is larger than the p-value for the whole state, even though Madison reports a larger difference in proficiency than the state). A data set of twenty-six data points isn’t statistically ideal, but we can work with it.
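To see how much the number of schools matters, here is a hedged Python sketch that holds Madison's means and variances fixed and changes only the count. The counts of 4 and 1134 are hypothetical; only 26 is Madison's real figure.

```python
def f_two_equal_groups(n, mean_a, var_a, mean_b, var_b):
    """F statistic for single-factor ANOVA with two groups of n each."""
    grand = (mean_a + mean_b) / 2
    ss_between = n * ((mean_a - grand) ** 2 + (mean_b - grand) ** 2)
    ms_within = ((n - 1) * (var_a + var_b)) / (2 * n - 2)
    return ss_between / ms_within  # df_between is 1, so MS = SS

# Means and variances from the Madison table.
madison = (62.82, 118.59, 82.46, 95.34)
for n in (4, 26, 1134):  # 4 and 1134 are hypothetical counts
    print(n, round(f_two_equal_groups(n, *madison), 1))
```

With the same means and the same spread, F grows in step with the number of schools, so the identical difference is far stronger evidence with a thousand schools than with a handful. At n = 26 the sketch comes out close to the 46.84 in the table, with the small gap due to rounded inputs.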

At issue here, however, are the Reading First schools in Madison, and there are only four. We could run ANOVA, but with only four data points the results wouldn’t be reliable. All we can do is calculate the descriptive statistics and interpret them cautiously:

Statistic          % Proficient+ 98-99   % Proficient+ 04-05
Mean               50.64                 66.50
SE                 7.53                  2.82
Median             47.14                 65.90
Mode               #N/A                  #N/A
Stdev              15.06                 5.63
Sample Variance    226.92                31.73
Kurtosis           2.46                  1.33
Skewness           1.29                  0.61
Range              35.55                 13.60
Minimum            36.36                 60.30
Maximum            71.91                 73.90
Sum                202.56                266.00
Count              4                     4
CL (95.0%)         23.97                 8.96

Did the percentage of students proficient or higher increase? Yes (50.64% to 66.5%). Again, though, we only have four data points here, so even the descriptive stats are questionable. The kurtosis suggests that more data were clustered around the mean in 98-99 than in 04-05 (2.46 versus 1.33), though the standard deviations and ranges indicate the reverse, but again, there are only four data points here.
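One rough way to see how little four schools can tell us is to turn each mean and its CL (95.0%) into an interval and check whether the two intervals overlap. This is an informal sanity check, not a substitute for a proper test, and the helper names below are my own.

```python
def confidence_interval(mean, half_width):
    """Interval from a mean and the CL (95.0%) half-width in the table."""
    return mean - half_width, mean + half_width

def intervals_overlap(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Mean and CL (95.0%) copied from the Reading First table.
ci_9899 = confidence_interval(50.64, 23.97)
ci_0405 = confidence_interval(66.50, 8.96)
print(f"98-99: {ci_9899[0]:.2f} to {ci_9899[1]:.2f}")
print(f"04-05: {ci_0405[0]:.2f} to {ci_0405[1]:.2f}")
print("Overlap:", intervals_overlap(ci_9899, ci_0405))
```

The 98-99 interval runs from roughly 27 to 75, wide enough to swallow nearly all of the 04-05 interval.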

From these data, we cannot say that the Reading First schools did or did not increase their proficiencies. There just aren’t enough data. But (and this is crucial) neither can the Reading First schools claim that they raised their proficiencies using these data. The only way we can determine whether these schools did or did not raise their proficiencies is by analyzing the raw data, and not the aggregates by school. In other words, Ken was right, and the journalist was wrong.

Remember that I said we are assuming that the state standards did not change, or if they did, that the scores in the two years we looked at are comparable? And remember how Wisconsin’s percentage of proficient or better went from 71.04% to 87.54% between the 98-99 and 04-05 school years? Ken states:

As the NAEP data clearly shows, the Wisconsin’s proficiency exam standards did change between 1998 and 2005. NAEP scores declined slightly, while the Wisconsin scores magically skyrocketed. Suspiciously so.

So, when RWP says that Wisconsin and Madison can validly claim that scores have increased, this conclusion only holds if Wisconsin’s proficiency exam standards didn’t change between 1998 and 2005. And, as we know from NAEP scores, they did.

Houston, we have a problem. You can write exams that are comparable to earlier exams. That’s why standardized test scores (SAT, GRE, GMAT, LSAT, etc.) are reliable from one administration to another. But if you change the standards, you can’t, because you change the definition of proficiency. Wisconsin has to explain the discrepancy between the NAEP data and their reported data, and if they did change standards, they must explain how, exactly, their scores from year to year are comparable, and how they can claim their reading proficiency increased.

Here endeth the lesson (add cute little smiley face here).

UPDATE: It looks like Wisconsin fudged the data.

9 Responses to “Reading First In Madison”

  1. Catherine Johnson on 18 Mar 2007 at 12:06 pm

    This is fantastic!

    So helpful.

    I don’t grasp this part:

    So even though it does look like Wisconsin may have improved the reading proficiency between the two years, they also slightly increased those who were less than proficient (if this seems like a paradox to you, think of the water balloon again, and all will be made clear).

    For some reason, the water balloon image isn’t making this clear….

    (Will finish reading, then re-read.)

  2. Catherine Johnson on 18 Mar 2007 at 12:08 pm

    hmm…interestingly, reading the chart makes it clearer….

    For some reason, reading the chart and imagining the curve seem to “work.”

  3. Catherine Johnson on 18 Mar 2007 at 12:11 pm

    I think the water balloon is distracting on the issue of skewedness. I didn’t start getting the point (which is a point I ought to know — it’s not new to me—) until I dropped the water balloon and visualized a bell curve on a chart AND read the data.

    The balloon was helpful on the issue of kurtosis.

    I don’t mean to nitpick; I have no idea whether other people would have the same experience.

    I’m reporting this just as a “data point” for instruction!

  4. rightwingprof on 18 Mar 2007 at 12:26 pm

    Cause it’s the shape of the curve. You can squish the water balloon so that you get more water up in the top and more water in the left tail, you see.

  5. Linda Seebach on 18 Mar 2007 at 3:04 pm

    I have to know this before I can wrap my head around the other stuff . . . Madison, Wis. . . . tiny Madison, Wis. . . . has 26 legally separate school districts? I’m in Denver, which is quite a lot bigger, and we have only one.

  6. KDeRosa on 18 Mar 2007 at 6:20 pm

    No, those are individual schools.

  7. rightwingprof on 20 Mar 2007 at 4:39 pm

    Linda Seebach on March 18, 2007 at 3:04 pm said:

    I have to know this before I can wrap my head around the other stuff . . . Madison, Wis. . . . tiny Madison, Wis. . . . has 26 legally separate school districts? I’m in Denver, which is quite a lot bigger, and we have only one.

    I guess it depends on your perspective, but considering that Madison has around 230,000 people, I wouldn’t call it tiny — that’s roughly half the size of Denver. The town where I grew up had 2,000 people. The county seat here has about 6,000 people.

    And I misspoke, er, whatever. I shouldn’t have said districts. They’re different schools.

    Since I submitted this to the carnival, I’ll correct that.

  8. Right Wing Nation on 21 Mar 2007 at 1:48 pm

    […] If you recall, I ran a statistical analysis of Wisconsin’s reading proficiency stats, and found that Madison’s Reading First schools could not validly claim that they had raised their proficiency levels. That analysis, of course, rested upon the assumption that Wisconsin had not changed their standards between 98-99 and 04-05, an assumption Ken DeRosa challenged: As the NAEP data clearly shows, the Wisconsin’s proficiency exam standards did change between 1998 and 2005. NAEP scores declined slightly, while the Wisconsin scores magically skyrocketed. Suspiciously so. […]

  9. Right Wing Nation on 24 Mar 2007 at 11:13 am

    […] Given the apparent inability of Wisconsin’s educrats to interpret data (see here and here), I thought I’d check for myself–after all, that’s quite a claim the superintendent is making. It took a few minutes to find the data (they aren’t on the download page with the other data, but with the reports), but find it, I did: […]
