Mar 04 2008
Pesky, Pesky Ed Data!
I realize I’m being redundant here, but it makes little sense to link to a short post, so bear with me.
Ken De Rosa of D-Ed Reckoning fame turned me on to schooldatadirect, which is, as he put it, the motherlode of all education data sites. You can download data aggregated by school or by school district for each state, but read this before you do.
I don’t know what moron organized the data, but these files are an incredible pain in the ass to analyze. Each variable for each school or district (depending on which file you download) is in its own row. This means you can’t just download it and analyze it. The data file for Pennsylvania (by district) has too many rows to open in Excel. So after fighting with the data, here’s what finally worked (though like I said, it’s a pain in the ass).
Download the file. It will be delimited with a | character. Create an Access database, then import the data file as a table (File, Get external data). Some rows will not import. I have no idea why, and after all the fighting I had done, I didn’t care.
Because of the idiotic way in which the data are organized, you’ll have to create queries, each one containing the variable you want to analyze, and export them to Excel. This is even more of a pain in the ass than it sounds because the idiot who entered the data used reserved SQL characters (%, for example) in the data. Since trying to work with these data is such an incredible, time-consuming pain in the ass, I have so far only analyzed four variables.
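For anyone who would rather skip Access, the one-variable-per-row layout can also be pivoted in a few lines of Python. This is only a minimal sketch, not the Access workflow described above: the column names (`district`, `variable`, `value`) and the sample rows are made up for illustration, and the real files will almost certainly use different headers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the long layout: one row per (district, variable)
# pair, pipe-delimited. Column names and values are assumptions, not the
# real file's contents.
SAMPLE = """district|variable|value
Alpha SD|Core Spending ($ Per Student)|8000
Alpha SD|Math Grade 11 (% Proficient)|55.5
Beta SD|Core Spending ($ Per Student)|9500
Beta SD|Math Grade 11 (% Proficient)|87
"""

def long_to_wide(text_file, key="district", name="variable", value="value"):
    """Pivot long rows into {district: {variable: value}}."""
    wide = defaultdict(dict)
    # QUOTE_NONE makes the reader treat quote characters as plain data;
    # characters such as % or $ need no special handling here.
    reader = csv.DictReader(text_file, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        wide[row[key]][row[name]] = row[value]
    return dict(wide)

wide = long_to_wide(io.StringIO(SAMPLE))
for district, values in wide.items():
    print(district, values)
```

Because the file is read one row at a time, the row count that chokes Excel is not a problem here; to run it on a real download, pass an opened file instead of the `io.StringIO` sample.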
I looked at the percent proficient on the state math exam for grade 11, both the total proficient and the SES proficient. Now, because nearly every variable imaginable exists in these data, there are several different variables to use for SES. I used the “economically disadvantaged” variable, students who qualify for school lunch. I also looked at core spending per student, and the “Salaries by Function - Instruction ($ Per Student)” variable, because it’s instruction-specific; again, there are about 120 different financial variables, and therefore, God knows how many ways of looking at spending per student.
I exported only the rows (districts) that had values for all four variables. There are 500 or 501 districts in Pennsylvania represented in the data; 446 of those districts reported data for all four variables. Therefore, my sample size is 446 (n = 446).
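The "all four variables" filter is easy to sketch once the data are in one-row-per-district form. The district names, short variable keys, and values below are hypothetical, chosen only to show the idea; note that a reported value of 0 counts as present, while a blank or missing value does not.

```python
def complete_cases(wide, required):
    """Keep only districts with a non-empty value for every required variable."""
    return {
        district: values
        for district, values in wide.items()
        if all(values.get(var) not in (None, "") for var in required)
    }

# Hypothetical wide table; keys and numbers are made up for illustration.
districts = {
    "Alpha SD": {"spend": "8000", "inst": "3900", "ses_math": "0", "total_math": "55.5"},
    "Beta SD": {"spend": "9500", "inst": "", "ses_math": "40.1", "total_math": "87"},
    "Gamma SD": {"spend": "7200", "ses_math": "33.3", "total_math": "49.0"},
}
REQUIRED = ["spend", "inst", "ses_math", "total_math"]

kept = complete_cases(districts, REQUIRED)
print(len(kept), "of", len(districts), "districts have all four variables")
```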
First, let’s look at the two financial variables: Core spending per student ($ per Student) and Salaries by Function - Instruction $ Per Student (Inst per student).
| | Inst per student | $ per Student |
| Mean | $3,918.86 | $8,220.72 |
| Range | $5,138.00 | $10,435.00 |
| Minimum | $2,270.00 | $5,952.00 |
| Maximum | $7,408.00 | $16,387.00 |
Note that the monies spent on actual instruction are less than half the core spending per student. When I ran a correlation between the two financial variables and the two proficiency variables (SES State Math exam % proficient, and Total State Math exam % proficient), the results were striking.
| | Inst per student | $ per Student |
| Inst per student | 1 | |
| $ per Student | 0.877978 | 1 |
| SES Math 11 Prof % | 0.050577 | -0.01789 |
| Total Math 11 Prof % | 0.075703 | -0.05091 |
There are no statistically significant correlations between either the core spending per student or the instructional spending per student and either of the proficiencies on the state math exam. This does not support the “give us more money!” argument.
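One way to check the "not significant" claim is to convert each correlation into a t statistic, t = r·sqrt(n − 2) / sqrt(1 − r²), and compare it with the two-tailed 5% cutoff, which is a little above 1.96 for a sample this size. The sketch below uses the four spending-versus-proficiency correlations from the table and n = 446; the dictionary labels are just shorthand, and this is a sanity check rather than the exact procedure Excel runs.

```python
import math

N = 446  # districts with all four variables

# Spending-vs-proficiency correlations copied from the correlation table
correlations = {
    "inst_vs_ses": 0.050577,
    "inst_vs_total": 0.075703,
    "core_vs_ses": -0.01789,
    "core_vs_total": -0.05091,
}

def t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

t_values = {label: t_statistic(r, N) for label, r in correlations.items()}
for label, t in t_values.items():
    print(f"{label}: t = {t:.2f}")
```

Even the largest of the four comes out at roughly t = 1.6, short of the cutoff, which is consistent with the conclusion above.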
If we look at the districts with the maximum and minimum proficiency percentages for both variables, we see that although the top districts spent more than the mean for core spending per student, so did the districts with the minimum proficiency percentages. In fact, the district that produced the lowest total proficiency (along with an SES proficiency of 0) spent more money per student than either of the districts that produced the best proficiencies:
| Max: | $ per student | SES Math 11 Prof % | Total Math 11 Prof % |
| SES | $8,794 | 88.2 | 77.5 |
| Total | $10,163 | 86.9 | 87 |
| Min: | $ per student | SES Math 11 Prof % | Total Math 11 Prof % |
| SES | $8,495 | 0 | 27.2 |
| Total | $11,937 | 0 | 4.9 |
Nothing in these data — at least these financial variables — supports the hypothesis that spending more money will produce better education.
Let’s turn to the proficiency percentages. Here are the descriptive stats:
| | SES Math 11 Prof % | Total Math 11 Prof % |
| Mean | 34.41 | 51.68 |
| StdErr | 0.70 | 0.65 |
| Median | 33.35 | 52.40 |
| Mode | 33.30 | 57.40 |
| StdDev | 14.82 | 13.63 |
| SampVar | 219.69 | 185.87 |
| Kurtosis | 0.29 | 0.39 |
| Skewness | 0.39 | -0.37 |
| Range | 88.20 | 82.10 |
| Min | 0.00 | 4.90 |
| Max | 88.20 | 87.00 |
| Sum | 15347.50 | 23048.80 |
| Count | 446 | 446 |
| CI (95%) | 1.38 | 1.27 |
It certainly looks like there is a significant difference between the two, but because variation can be due to random factors, I ran ANOVA with an alpha of 0.01 (99%), rather than the more usual 0.05 (95%).
Anova: SES Math Prof, Total Math Prof
| Groups | Count | Sum | Average | Variance |
| SES Math 11 Prof % | 446 | 15347.5 | 34.41143 | 219.6855 |
| Total Math 11 Prof % | 446 | 23048.8 | 51.67892 | 185.8701 |

| Source of Variation | SS | df | MS | F | P-value | F crit |
| Between Groups | 66491.06 | 1 | 66491.06 | 327.9011 | 1.24E-62 | 6.663443 |
| Within Groups | 180472.2 | 890 | 202.7778 | | | |
| Total | 246963.3 | 891 | | | | |
And the value of F is far larger than the critical value of F (F crit above), and the P-value is far below my alpha of 0.01, so the difference between the SES and total proficiencies is statistically significant. More specifically: if there were no real difference between the two, random variation alone would produce a gap this large far less than 1% of the time. It’s highly unlikely, however, that funding has anything to do with the difference, given the variables I have so far analyzed, and that implies that spending more money will do nothing to alleviate the problem.
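The F value can be reproduced from nothing more than the count, average, and variance of each group, which makes for a handy check on the Excel output. The sketch below is a plain one-way ANOVA computed from those summary numbers; the function name is my own, and it assumes the variances are sample variances (n − 1 in the denominator), as Excel reports them.

```python
def one_way_anova(groups):
    """One-way ANOVA from summary stats.

    groups: list of (count, mean, sample_variance) tuples.
    Returns (F, ss_between, ss_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group sum of squares: recovered from each sample variance
    ss_within = sum((n - 1) * v for n, _, v in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, ss_between, ss_within

# Count, average, and variance for the SES and total proficiency groups
groups = [
    (446, 34.41143, 219.6855),
    (446, 51.67892, 185.8701),
]
f_value, ss_between, ss_within = one_way_anova(groups)
print(f"F = {f_value:.1f}")
```

This gives an F of about 327.9 with 1 and 890 degrees of freedom, matching the table.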
I’ll play around more with these data, though like I said, it’s a pain in the ass. But the one variable I would most like to see is never present in these data files: student GPAs. An excellent way to know how well you’re teaching the material is to run your own grades against test scores; I would very much like to see how teacher-assigned grades correlate with test scores. Then again, if I were a teachers’ union, I’d make very sure that student GPAs never appeared in any publicly accessible data.
Ken is working on the data. Here’s an analysis of teacher compensation, and here’s an analysis of SES performance. And here’s my analysis of teacher compensation and proficiency.