There may be a few problems with the data behind this headline. The trend could be real, or it could be tangled up with a few data anomalies. I assume the authors know this.
In 2008, primary and secondary education in the US began reporting race and ethnicity codes in a two-question format. For postsecondary education this format was optional in 2008 but required in 2010. It may be just a coincidence, but the upward trend begins around this time. Second, and this is what really caught my attention, the notes state that there were small-sample-size problems, so the authors used a moving average. Couple this with the data change and it may magnify the error. I have seen a lot of these kinds of errors recently around the reporting change.
Students are now allowed to select an “ethnic” code of Hispanic or not, then also select multiple “Race” codes. This has two effects that increase Hispanic reporting and wash out the other races.
First, and this affects the sample, when a student reports Hispanic and Asian or White, the student will be part of the Hispanic sample under the new reporting, and the data show that in this case students just as often self-selected as Asian or White under the old reporting.
Second, though it would not have an effect on this sample, in the new coding a student who reports Asian and White, or Native American and White, is removed from their “Race” category and put in the new “Two or more races” category, whereas in the past such students usually would have reported just Native American or Asian.
Third, and this may be addressed in the methods of the study, the ethnic code often changes between secondary and higher education. Compound this with the reporting and re-sampling changes over the transition years, and it may have created a situation where students who were coded under the old system and failed to show up in postsecondary education the next year could skew the data.
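To make the reclassification concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the six students, their old self-selected categories, and the `code_two_question` helper are made up for illustration. It also assumes the two-question rule works the way I described it above (Hispanic overrides race, and more than one race becomes "Two or more races"). It is not the study's data or method.

```python
from collections import Counter


def code_two_question(hispanic, races):
    """Assumed two-question rule: Hispanic overrides race; otherwise
    more than one race collapses into 'Two or more races'."""
    if hispanic:
        return "Hispanic"
    if len(races) > 1:
        return "Two or more races"
    return races[0]


# Hypothetical students:
# (answered Hispanic?, races selected, single category chosen under the old format)
students = [
    (True, ["White"], "Hispanic"),
    (True, ["White"], "White"),
    (True, ["Asian"], "Asian"),
    (False, ["Asian", "White"], "Asian"),
    (False, ["Native American", "White"], "Native American"),
    (False, ["White"], "White"),
]

# Tally the same six people under each coding scheme.
old_counts = Counter(old for _, _, old in students)
new_counts = Counter(code_two_question(h, r) for h, r, _ in students)

print("old:", dict(old_counts))
print("new:", dict(new_counts))
```

With these made-up students the Hispanic count goes from 1 to 3 without anyone new enrolling, and the Asian and Native American counts drop to zero. The population is identical; only the coding changed.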
None of this may have an effect, and the timing may be just a coincidence. It could also be that, had there been a consistent measure, there would still be a real effect, or that some well-meaning think tank decided to take advantage of the data to make a headline.