AIMSweb

Reading Curriculum-Based Measurement

Rating Summary

Classification Accuracy: half bubble
Generalizability: Moderate High
Reliability: full bubble
Validity: half bubble
Disaggregated Reliability and Validity Data: half bubble
Efficiency
Administration: Individual
Administration & Scoring Time: 1-5 Minutes
Scoring Key: Computer Scored
Benchmarks / Norms: Yes
Cost

Annual cost per student:

AIMSweb assessment materials are included with an AIMSweb System software subscription:

AIMSweb Systems
Grades K – 8: $3.00-$5.00 per student per year.

Included in the price are manuals and test materials, directions for administration, test forms, technical manuals, and a protocol for each student.

*All materials are provided as downloadable PDF files.
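Because the subscription is priced per student, the annual cost for a site scales linearly with enrollment. The minimal Python sketch below assumes the quoted $3.00 to $5.00 range applies uniformly to every student; the function name and the 450-student school are hypothetical.

```python
def annual_cost_range(n_students: int, low: float = 3.00, high: float = 5.00) -> tuple[float, float]:
    """Return the (minimum, maximum) annual cost in dollars for n_students.

    Assumes the quoted per-student, per-year price range applies to every student.
    """
    if n_students < 0:
        raise ValueError("n_students must be non-negative")
    return n_students * low, n_students * high


# Hypothetical school with 450 students in grades K-8.
print(annual_cost_range(450))  # (1350.0, 2250.0)
```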

Technology, Human Resources, and Accommodations for Special Needs

Internet access is required for full use of product services.

Testers require 4 to 8 hours of training.

Paraprofessionals can administer the test.

Alternate forms are available in Spanish for benchmarking.

Service and Support

Pearson
19500 Bulverde Road
San Antonio, TX 78259
Phone: 210-339-5247
Visit AIMSweb.com
Tech support: aimswebsupport@pearson.com

Field-tested training manuals are included and should provide all the information needed for implementation.

AIMSweb training sessions are available.

Ongoing technical support is provided.

Purpose and Other Implementation Information

As a reading screening tool, Reading-CBM is used to identify children at risk of reading failure and students performing significantly below grade-level expectations.

As a progress monitoring tool, additional standardized, equivalent, graded alternate forms are used to measure, at frequent intervals, student progress toward specific goals and to monitor the effects of instructional changes.

Reading-CBM is a standardized, individually administered 1-minute measure of oral reading on graded passages.

Usage and Reporting

Raw scores, percentile scores, developmental benchmark scores (cut points and benchmarks), probability scores, and error analysis scores are available.

The raw score is the total number of words read correctly (WRC) within the 1-minute period. A raw score is also reported for the total number of errors (words read incorrectly). These data can be interpreted in a norm-referenced way via percentiles, or categorically in a standard interpretive format (e.g., below average, average, above average). Progress over time is also summarized as a Rate of Improvement (ROI) index, typically the slope of an ordinary least squares regression line fitted through the data. A composite score is not calculated.
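The ROI can be sketched as the ordinary least squares slope of WRC against time. This Python sketch assumes time is measured in weeks and uses made-up scores; it illustrates the OLS calculation and is not AIMSweb's own implementation.

```python
def rate_of_improvement(weeks: list[float], wrc: list[float]) -> float:
    """Return the ordinary least squares slope of WRC on time.

    With time in weeks, the slope is the ROI in words read correctly per week.
    """
    n = len(weeks)
    if n != len(wrc) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_t = sum(weeks) / n
    mean_y = sum(wrc) / n
    s_tt = sum((t - mean_t) ** 2 for t in weeks)
    if s_tt == 0:
        raise ValueError("observations must come from more than one time point")
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(weeks, wrc))
    return s_ty / s_tt


# Hypothetical weekly progress monitoring scores for one student.
weeks = [0, 1, 2, 3, 4, 5]
wrc = [42, 45, 44, 49, 51, 52]
print(round(rate_of_improvement(weeks, wrc), 2))  # 2.09
```

A least squares slope uses every data point, so a single unusually high or low probe shifts the ROI less than a simple first-to-last difference would.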

Reading-CBM has 33 alternate forms for each of grades 2 through 8 and 20 alternate forms for grade 1.

Individually administered.

 

Classification Accuracy

Classification Accuracy in Predicting Proficiency on Pennsylvania System of School Assessment
  Grade 1 Grade 3
False Positive Rate 0.10 0.19
False Negative Rate 0.28 0.23
Sensitivity 0.72 0.77
Specificity 0.90 0.81
Positive Predictive Power 0.74 0.65
Negative Predictive Power 0.89 0.88
Overall Classification Rate 0.85 0.80
AUC (ROC) 0.88 0.88
Base Rate Not reported Not reported
Cut Points: 36 110
At 90% Sensitivity, Specificity equals Not reported Not reported
At 80% Sensitivity, Specificity equals Not reported Not reported
At 70% Sensitivity, Specificity equals Not reported Not reported
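The rows in these tables follow standard definitions based on a 2 x 2 table of screening decisions against criterion outcomes. The sketch below assumes a "positive" is a student flagged at risk by R-CBM and a "condition" is not reaching proficiency on the state test; the counts in the example are hypothetical, not taken from the studies above. As in the tables, the false positive rate equals 1 minus specificity and the false negative rate equals 1 minus sensitivity.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # flagged at risk and not proficient on the criterion test
    fp: int  # flagged at risk but proficient
    fn: int  # not flagged but not proficient
    tn: int  # not flagged and proficient


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the classification accuracy indices reported in the tables."""
    n = c.tp + c.fp + c.fn + c.tn
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return {
        "false_positive_rate": 1 - specificity,
        "false_negative_rate": 1 - sensitivity,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive_predictive_power": c.tp / (c.tp + c.fp),
        "negative_predictive_power": c.tn / (c.tn + c.fn),
        "overall_classification_rate": (c.tp + c.tn) / n,
        "base_rate": (c.tp + c.fn) / n,
    }


# Hypothetical screening results for 400 students.
counts = ConfusionCounts(tp=72, fp=30, fn=28, tn=270)
for name, value in classification_metrics(counts).items():
    print(f"{name}: {value:.2f}")
```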

 

Classification Accuracy in Predicting Proficiency on TerraNova Achievement Test
  Grade 2
False Positive Rate 0.09
False Negative Rate 0.21
Sensitivity 0.79
Specificity 0.91
Positive Predictive Power 0.81
Negative Predictive Power 0.90
Overall Classification Rate 0.87
AUC (ROC) 0.94
Base Rate Not reported
Cut Points: 81
At 90% Sensitivity, Specificity equals Not reported
At 80% Sensitivity, Specificity equals Not reported
At 70% Sensitivity, Specificity equals Not reported
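AUC (ROC) can be read as the probability that a randomly chosen non-proficient student scores lower on R-CBM than a randomly chosen proficient student. The brute-force sketch below computes that probability directly, counting ties as one half, which is equivalent to the Mann-Whitney U statistic divided by the number of pairs. It assumes lower WRC indicates higher risk; the function name and scores are hypothetical.

```python
def auc_lower_is_risk(scores_not_proficient: list[float], scores_proficient: list[float]) -> float:
    """Probability that a non-proficient student scores below a proficient one.

    Ties count as one half. Compares every pair, so it suits small samples.
    """
    if not scores_not_proficient or not scores_proficient:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for a in scores_not_proficient:
        for b in scores_proficient:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_not_proficient) * len(scores_proficient))


# Hypothetical WRC scores.
print(auc_lower_is_risk([40, 55, 70, 90], [65, 85, 95, 110, 120]))  # 0.85
```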

 

Classification Accuracy in Predicting Proficiency on North Carolina End of Grade Test
                                        Grade 3         Grade 4         Grade 5
                                        Fall    Winter  Fall    Winter  Fall    Winter
False Positive Rate                     0.24    0.25    0.26    0.23    0.26    0.27
False Negative Rate                     0.23    0.23    0.22    0.22    0.25    0.21
Sensitivity                             0.77    0.77    0.78    0.78    0.75    0.79
Specificity                             0.76    0.75    0.74    0.77    0.74    0.73
Positive Predictive Power               0.59    0.61    0.51    0.58    0.51    0.53
Negative Predictive Power               0.88    0.87    0.91    0.90    0.90    0.90
Overall Classification Rate             0.76    0.76    0.75    0.77    0.75    0.75
AUC (ROC)                               0.85    0.86    0.83    0.86    0.83    0.84
Base Rate                               0.31    0.34    0.26    0.29    0.26    0.28
Cut Points                              84      103     101     113     115     135
At 90% Sensitivity, Specificity equals  0.66    0.68    0.66    0.67    0.65    0.65
At 80% Sensitivity, Specificity equals  0.77    0.76    0.76    0.83    0.75    0.77
At 70% Sensitivity, Specificity equals  0.86    0.85    0.81    0.87    0.82    0.84
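Positive and negative predictive power depend on the base rate as well as on sensitivity and specificity. Using the standard identities, the Grade 3 Fall column (sensitivity 0.77, specificity 0.76, base rate 0.31) reproduces the reported positive predictive power (0.59), negative predictive power (0.88), and overall classification rate (0.76) after rounding. The Python sketch below is a check on that arithmetic, not the procedure used to produce the table; the function name is hypothetical.

```python
def predicted_rates(sensitivity: float, specificity: float, base_rate: float) -> dict[str, float]:
    """Derive predictive power and overall accuracy from sensitivity,
    specificity, and the proportion of students who are not proficient."""
    true_pos = sensitivity * base_rate
    false_neg = (1 - sensitivity) * base_rate
    true_neg = specificity * (1 - base_rate)
    false_pos = (1 - specificity) * (1 - base_rate)
    return {
        "positive_predictive_power": true_pos / (true_pos + false_pos),
        "negative_predictive_power": true_neg / (true_neg + false_neg),
        "overall_classification_rate": true_pos + true_neg,
    }


grade3_fall = predicted_rates(sensitivity=0.77, specificity=0.76, base_rate=0.31)
print({name: round(value, 2) for name, value in grade3_fall.items()})
# {'positive_predictive_power': 0.59, 'negative_predictive_power': 0.88, 'overall_classification_rate': 0.76}
```

The same identities explain why positive predictive power is lower in groups or grades with a lower base rate even when sensitivity and specificity are similar.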

 

Classification Accuracy in Predicting Proficiency on Illinois Standards Achievement Test (ISAT)
                                        Grade 6         Grade 7         Grade 8
                                        Fall    Winter  Fall    Winter  Fall    Winter
False Positive Rate                     0.26    0.27    0.25    0.24    0.21    0.26
False Negative Rate                     0.23    0.22    0.24    0.23    0.20    0.21
Sensitivity                             0.77    0.78    0.76    0.77    0.80    0.79
Specificity                             0.74    0.73    0.75    0.76    0.79    0.74
Positive Predictive Power               0.48    0.46    0.48    0.48    0.47    0.46
Negative Predictive Power               0.91    0.92    0.91    0.92    0.94    0.92
Overall Classification Rate             0.75    0.74    0.75    0.76    0.79    0.75
AUC (ROC)                               0.83    0.83    0.84    0.84    0.86    0.84
Base Rate                               0.24    0.23    0.23    0.22    0.19    0.22
Cut Points                              128     141     133     144     130     144
At 90% Sensitivity, Specificity equals  0.59    0.58    0.64    0.67    0.69    0.63
At 80% Sensitivity, Specificity equals  0.76    0.75    0.76    0.78    0.81    0.76
At 70% Sensitivity, Specificity equals  0.83    0.85    0.84    0.84    0.88    0.82
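The "At X% Sensitivity, Specificity equals" rows report the specificity obtained when the cut point is set just high enough to reach a target sensitivity. A minimal sketch of that search follows, assuming a student is flagged at risk when the score falls below the cut point; the function name and the ten scores are hypothetical.

```python
def specificity_at_sensitivity(
    scores: list[float], not_proficient: list[bool], target: float
) -> tuple[float, float, float]:
    """Return (cut point, sensitivity, specificity) for the lowest cut point
    whose sensitivity reaches `target`. Students scoring below the cut are flagged."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    positives = [s for s, y in zip(scores, not_proficient) if y]
    negatives = [s for s, y in zip(scores, not_proficient) if not y]
    if not positives or not negatives:
        raise ValueError("need both proficient and non-proficient students")
    # Raising the cut can only raise sensitivity and lower specificity, so the
    # first cut that meets the target gives the best specificity among them.
    candidates = sorted(set(scores)) + [max(scores) + 1]
    for cut in candidates:
        sensitivity = sum(s < cut for s in positives) / len(positives)
        if sensitivity >= target:
            specificity = sum(s >= cut for s in negatives) / len(negatives)
            return cut, sensitivity, specificity
    raise AssertionError("unreachable: the last candidate flags every student")


# Hypothetical WRC scores and criterion outcomes for ten students.
scores = [45, 60, 72, 80, 88, 95, 101, 110, 118, 130]
not_proficient = [True, True, True, False, True, False, False, False, False, False]
print(specificity_at_sensitivity(scores, not_proficient, 0.90))
```

With a 0.90 target, the search settles on a cut point of 95: all four non-proficient students are flagged, and five of the six proficient students are correctly passed.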

 

Generalizability

Description of Study Sample 1: Keller-Margulis, M., Shapiro, E. S., & Hintze, J. M. (2008). Long-term diagnostic accuracy of curriculum-based measures in reading and mathematics. School Psychology Review, 37, 374-390.

  Grades 1-3
Number of States: 1
Size: Approximately 200
Gender: Unknown 100% (not reported)
SES: Eligible for free or reduced-price lunch 32.8% (low-income level based on national poverty levels)
Race/ Ethnicity: White, Non-Hispanic 58%
Black, Non-Hispanic 9%
Hispanic 31%
American Indian/Alaska Native <1%
Asian/Pacific Islander 3%
Language proficiency status: % LEP 8%

Description of Study Sample 2:

                                            Grade 3       Grade 4       Grade 5       Grade 6       Grade 7       Grade 8
                                            Fall   Winter Fall   Winter Fall   Winter Fall   Winter Fall   Winter Fall   Winter
Number of States                            1      1      1      1      1      1      1      1      1      1      1      1
Size                                        1,105  1,248  1,193  1,332  1,105  1,171  1,393  1,599  1,444  1,656  1,276  1,223
Gender
  Male                                      39%    40%    37%    39%    31%    33%    53%    52%    51%    50%    50%    50%
  Female                                    37%    39%    37%    38%    34%    35%    42%    44%    45%    47%    46%    46%
  Unknown                                   24%    21%    26%    23%    34%    33%    5%     4%     4%     3%     4%     4%
SES
  Eligible for free or reduced-price lunch                                                36%    34%    31%    30%    32%    36%
Race/Ethnicity
  White, Non-Hispanic                       39%    36%    36%    33%    35%    34%    46%    47%    47%    45%    48%    56%
  Black, Non-Hispanic                       18%    23%    21%    25%    20%    22%    11%    12%    9%     10%    9%     12%
  Hispanic                                  10%    11%    11%    12%    7%     8%     17%    18%    17%    18%    16%    20%
  American Indian/Alaska Native             0.1%   0.1%   0.2%   0.1%   0.3%   0.3%   0.1%   0.1%   0.1%   0.1%   0.1%   0.1%
  Asian/Pacific Islander                    3%     3%     2%     2%     2%     2%     1%     2%     1%     1%     1%     1%
  Other                                     4%     4%     3%     3%     3%     3%     2%     2%     2%     3%     2%     3%
  Unknown                                   25%    22%    26%    24%    32%    31%    22%    19%    25%    23%    23%    72%
Disability classification
  % with disability classification          2%     3%     2%     3%     1%     1%     8%     8%     9%     8%     7%     7%
Language proficiency status
  % ELL                                     4%     4%     3%     3%     1%     1%     3%     3%     2%     2%     1%     1%

Reliability

Type of Reliability   Age or Grade   n   Coefficient (range)   Coefficient (median)   SEM   Information (including normative data)/Subjects
Alternate form 1 (W,S) 1,000 0.97-0.97 0.97 6.1 Calculated separately at each benchmark period (fall, winter, spring). Reliability of the median of three probe scores, based on the average inter-form correlation.

Alternate-form reliability of the median: Just as the mean of several scores is more reliable than a single score, the median is also more reliable than a single score. However, there is no formula for estimating the reliability of a median that is analogous to the Spearman-Brown formula for estimating the reliability of a mean. Therefore, we conducted a series of 1,000-case simulation studies, each of which assumed an average inter-probe correlation at one of the values from 0.88 to 0.95. In each study, six variables were generated by adding random error to a true-score variable, with the error variance controlled so that the intercorrelations among the variables equaled the target inter-probe correlation. The median score on variables 1 to 3 was then correlated with the median score on variables 4 to 6 to yield the alternate-form reliability of the median of three probes. Results indicated that average probe intercorrelations of 0.88 through 0.90 produced a reliability-of-the-median value of 0.95; intercorrelations of 0.91 through 0.93 produced a value of 0.96; and intercorrelations of 0.94 through 0.96 produced a value of 0.97.

Gender: F 50% M 50%
Ethnicity:
African American 10%
Asian 4%
Hispanic 10%
White, non-Hispanic 73%
Other 3%

2 1,000 0.97-0.97 0.97 6.4
3 1,000 0.96-0.97 0.97 7.2
4 1,000 0.97-0.97 0.97 6.6
5 1,000 0.97-0.97 0.97 7.3
6 1,000 0.96-0.97 0.97 7.6
7 1,000 0.96-0.97 0.97 7.8
8 1,000 0.96-0.97 0.96 6.9
Inter-rater 2 61   0.99   Administrations of a single R-CBM probe were digitally recorded, and then each recording was independently scored by two different raters who did their own timing. The sample was obtained at two suburban Minnesota schools and three urban Texas schools. Demographic characteristics were similar at all four grades; for the total sample, they were:

Gender: F 51% M 49%
Ethnicity:
African American 2%
Asian 1%
Hispanic 65%
White, non-Hispanic 32%
ELL: 1%
Receiving special education services: 3%

4 63   0.99  
6 73   0.99  
8 63   0.99  
Split-half 2 61   0.97   Same sample as for the inter-rater study. Correlations between scores (WRC) on the first and second 30-second halves of each probe were adjusted with the Spearman-Brown formula and then used to compute the reliability of the median of three probe scores.
4 63   0.97  
6 73   0.96  
8 63   0.97  
Retest 1 1,000   0.91   Correlations between median scores at adjacent benchmark periods (fall-winter or winter-spring; winter-spring only for grade 1). Same sample as for alternate-form reliability.
2 1,000 0.93,0.94 0.93  
3 1,000 0.93,0.94 0.93  
4 1,000 0.94,0.95 0.94  
5 1,000 0.95,0.95 0.95  
6 1,000 0.95,0.95 0.95  
7 1,000 0.95,0.95 0.95  
8 1,000 0.95,0.96 0.95  
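The simulation described in the alternate-form row, and the Spearman-Brown adjustment used for the split-half coefficients, can both be sketched in a few lines of Python. The sketch assumes normally distributed true scores and errors, which the description above does not state; the function names, case counts, and seeds are illustrative. Because the true score has unit variance and the error variance is set to 1/r - 1, any two simulated probes correlate at r in expectation.

```python
import math
import random


def spearman_brown(r: float, k: float) -> float:
    """Projected reliability when a test is lengthened by a factor of k.

    k = 2 steps a half-test correlation up to full length (the split-half case);
    k = 3 gives the reliability of the mean of three parallel probes.
    """
    return k * r / (1 + (k - 1) * r)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_of_three_reliability(inter_probe_r: float, n_cases: int = 1000, seed: int = 0) -> float:
    """Correlate the median of probes 1-3 with the median of probes 4-6."""
    rng = random.Random(seed)
    # True-score variance is 1, so error variance 1/r - 1 makes corr(probe_i, probe_j) = r.
    error_sd = math.sqrt(1 / inter_probe_r - 1)
    first, second = [], []
    for _ in range(n_cases):
        true_score = rng.gauss(0, 1)
        probes = [true_score + rng.gauss(0, error_sd) for _ in range(6)]
        first.append(sorted(probes[:3])[1])  # median of three = middle value
        second.append(sorted(probes[3:])[1])
    return pearson(first, second)


for r in (0.88, 0.90, 0.93, 0.95):
    print(r, round(median_of_three_reliability(r, n_cases=20_000), 3), round(spearman_brown(r, 3), 3))
```

Under these assumptions, an inter-probe correlation of 0.90 gives a median-of-three reliability of roughly 0.95, a little below the 0.964 that the Spearman-Brown formula gives for the mean of three probes. This is expected, because for normally distributed errors the median is a less efficient summary than the mean.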

 

Validity

Type of Validity   Age or Grade   Test or Criterion   n (range)   Coefficient range (if applicable)   Coefficient median   Information (including normative data)/Subjects
Predictive 2 PSSA ~ 200 0.69-0.71 0.71 1-year interval (Keller-Margulis et al., 2008)
4 PSSA ~ 200 0.67-0.69 0.69
Predictive 3 (F,W) MCA 2,051 0.68-0.70 0.69 Silberglitt & Hintze (2005)
Predictive 3 (F,W) PSSA 185 0.65-0.66 0.65 Shapiro et al. (2006)
4 (F,W) MAT8 213 0.71-0.72 0.71
5 (F,W) PSSA 185 0.68-0.69 0.68
Predictive 3 (F) MAP 137   0.76 Andren (2010)
3 (F,W) NECAP 137 0.68-0.71 0.69
Predictive 3 (F) NCEGT 1,087   0.69 (0.67) 2009-2010
4 (F) NCEGT 1,174   0.70 (0.65)
5 (F) NCEGT 1,088   0.68 (0.66)
6 (F) ISAT 1,326   0.64 (0.64)
7 (F) ISAT 1,328   0.63 (0.65)
8 (F) ISAT 911   0.60 (0.62)
3 (W) NCEGT 1,087   0.71 (0.70)
4 (W) NCEGT 1,174   0.71 (0.67)
5 (W) NCEGT 1,088   0.67 (0.66)
6 (W) ISAT 1,326   0.65 (0.66)
7 (W) ISAT 1,328   0.63 (0.65)
8 (W) ISAT 911   0.60 (0.62)
Construct 3 MCA 2,126   0.71 Silberglitt & Hintze (2005)
Construct 3 PSSA 185   0.67 Shapiro et al. (2006)
4 MAT8 213   0.70
5 PSSA 206   0.67
Construct 2, 3, 4 MAP 71-85 0.68-0.72 0.70 Merino & Beckman (2010)
Construct 3 (F,W) MAP 137 0.77-0.81 0.79 Andren (2010)
Construct 3 (S) NCEGT 1,087   0.72 (0.71) 2009-2010
4 (S) NCEGT 1,174   0.72 (0.68)
5 (S) NCEGT 1,088   0.69 (0.67)
6 (S) ISAT 1,326   0.64 (0.65)
7 (S) ISAT 1,328   0.62 (0.64)
8 (S) ISAT 911   0.60 (0.62)
Content The passages used in AIMSweb R-CBM were developed to represent the types of narrative text that students in a particular grade typically encounter in school. The creation and refinement of the set of passages followed a careful development process documented by Howe and Shinn (2002). Key components of this process included:
  • authors with experience writing for students at various grade levels
  • specifications for readability (such as number of syllables and sentences per 100 words)
  • monitoring readability using the Fry formula and Lexile scaling
  • field testing the passages and eliminating those with low alternate-form reliability, an atypical score level, or an inappropriate readability index or Lexile score
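The readability specifications above rely on counts of sentences and syllables per 100 words, which are the two inputs to the Fry graph. The Python sketch below computes those two numbers for a passage. It uses a rough vowel-group heuristic for syllables (an assumption, not a dictionary-based count), counts sentences by terminal punctuation, normalizes the whole passage to 100 words rather than sampling 100-word segments, and does not look up a grade level on the Fry graph itself. The function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups and drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fry_inputs(text: str) -> tuple[float, float]:
    """Return (sentences per 100 words, syllables per 100 words) for a passage."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = sum(count_syllables(w) for w in words)
    scale = 100 / len(words)
    return sentences * scale, syllables * scale


passage = "The cat sat. The dog ran to the big red barn."
print(tuple(round(v, 1) for v in fry_inputs(passage)))  # (18.2, 100.0)
```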

Disaggregated Reliability, Validity, and Classification Data for Diverse Populations

Disaggregated Classification Accuracy

Classification Accuracy in Predicting Proficiency on North Carolina End of Grade Test
AA = African American; W = White, non-Hispanic
                                        Grade 3                 Grade 4                 Grade 5
                                        Fall        Winter      Fall        Winter      Fall        Winter
                                        AA    W     AA    W     AA    W     AA    W     AA    W     AA    W
False Positive Rate                     0.27  0.29  0.23  0.23  0.29  0.31  0.28  0.28  0.37  0.26  0.33  0.26
False Negative Rate                     0.25  0.15  0.22  0.17  0.25  0.23  0.21  0.16  0.23  0.13  0.21  0.18
Sensitivity                             0.75  0.85  0.78  0.83  0.75  0.77  0.79  0.84  0.77  0.87  0.79  0.82
Specificity                             0.73  0.71  0.77  0.77  0.71  0.69  0.72  0.72  0.63  0.74  0.67  0.74
Positive Predictive Power               0.79  0.30  0.84  0.35  0.67  0.24  0.71  0.30  0.70  0.35  0.73  0.34
Negative Predictive Power               0.68  0.97  0.70  0.97  0.78  0.96  0.80  0.97  0.70  0.97  0.73  0.96
Overall Classification Rate             0.75  0.73  0.77  0.78  0.73  0.70  0.75  0.74  0.70  0.76  0.73  0.75
AUC (ROC)                               0.82  0.86  0.85  0.87  0.79  0.82  0.82  0.85  0.78  0.84  0.80  0.82
Base Rate                               0.58  0.13  0.60  0.13  0.44  0.11  0.47  0.13  0.53  0.14  0.54  0.14
Cut Points                              80    90    98    102   97    108   109   122   117   123   132   143
At 90% Sensitivity, Specificity equals  0.62  0.71  0.65  0.75  0.55  0.60  0.59  0.67  0.51  0.74  0.56  0.66
At 80% Sensitivity, Specificity equals  0.73  0.80  0.81  0.83  0.71  0.70  0.76  0.80  0.63  0.80  0.68  0.74
At 70% Sensitivity, Specificity equals  0.84  0.87  0.88  0.89  0.76  0.78  0.81  0.87  0.75  0.83  0.81  0.82

 

Classification Accuracy in Predicting Proficiency on Illinois Standards Achievement Test (ISAT)
H = Hispanic; W = White, non-Hispanic
                                        Grade 6                 Grade 7                 Grade 8
                                        Fall        Winter      Fall        Winter      Fall        Winter
                                        H     W     H     W     H     W     H     W     H     W     H     W
False Positive Rate                     0.29  0.34  0.23  0.28  0.36  0.27  0.37  0.27  0.29  0.20  0.26  0.21
False Negative Rate                     0.16  0.25  0.22  0.25  0.24  0.19  0.25  0.18  0.20  0.17  0.23  0.20
Sensitivity                             0.84  0.75  0.78  0.75  0.76  0.81  0.75  0.82  0.80  0.83  0.77  0.80
Specificity                             0.71  0.66  0.77  0.72  0.64  0.73  0.63  0.73  0.71  0.80  0.74  0.79
Positive Predictive Power               0.55  0.39  0.58  0.42  0.54  0.46  0.50  0.45  0.53  0.47  0.55  0.43
Negative Predictive Power               0.91  0.90  0.89  0.91  0.83  0.93  0.84  0.94  0.90  0.96  0.88  0.95
Overall Classification Rate             0.75  0.68  0.77  0.73  0.68  0.74  0.67  0.75  0.74  0.81  0.75  0.80
AUC (ROC)                               0.84  0.80  0.84  0.81  0.76  0.85  0.78  0.85  0.82  0.88  0.81  0.88
Base Rate                               0.30  0.23  0.29  0.21  0.36  0.22  0.33  0.21  0.29  0.17  0.30  0.16
Cut Points                              127   136   133   144   136   138   144   151   132   128   141   140
At 90% Sensitivity, Specificity equals  0.68  0.53  0.64  0.53  0.44  0.66  0.50  0.71  0.59  0.74  0.61  0.75
At 80% Sensitivity, Specificity equals  0.74  0.66  0.80  0.72  0.64  0.75  0.63  0.80  0.77  0.87  0.76  0.83
At 70% Sensitivity, Specificity equals  0.84  0.78  0.91  0.80  0.71  0.83  0.75  0.84  0.78  0.91  0.80  0.89

Disaggregated Reliability

Type of Reliability   Age or Grade   n   Coefficient (range)   Coefficient (median)   SEM   Information (including normative data)/Subjects
Alternate form 1 (W,S) 100 0.97-0.97 0.97 6.1 African American students.
Reliability of the median of 3 probe scores administered at a benchmark testing period.
2 54 0.96-0.97 0.96 6.3
3 77 0.97-0.97 0.97 6.5
4 51 0.97-0.97 0.97 6.6
5 46 0.97-0.97 0.97 6.8
6 59 0.96-0.97 0.97 8.0
7 88 0.96-0.97 0.96 6.6
8 130 0.97-0.97 0.97 6.9
Alternate form 1 (W,S) 105 0.96-0.97 0.96 5.1 Hispanic students.
Reliability of the median of 3 probe scores administered at a benchmark testing period.
2 58 0.96-0.97 0.96 6.2
3 68 0.97-0.97 0.97 7.7
4 69 0.97-0.97 0.97 6.6
5 69 0.95-0.97 0.96 7.3
6 88 0.96-0.97 0.96 6.9
7 63 0.97-0.97 0.97 7.0
8 44 0.95-0.96 0.95 6.2
Alternate form 1 (W,S) 449 0.97-0.97 0.97 6.0 White non-Hispanic students.
Reliability of the median of 3 probe scores administered at a benchmark testing period.
2 443 0.96-0.97 0.96 6.9
3 511 0.96-0.97 0.96 7.5
4 399 0.97-0.97 0.97 6.8
5 468 0.97-0.97 0.97 7.4
6 608 0.97-0.97 0.97 7.2
7 487 0.96-0.97 0.97 8.0
8 508 0.96-0.97 0.97 6.5
Alternate form 1 (W,S) 65 0.96-0.97 0.97 5.3 ELL students.
Reliability of the median of 3 probe scores administered at a benchmark testing period.
2 75 0.96-0.97 0.97 6.4
3 74 0.96-0.97 0.97 6.5
4 51 0.96-0.97 0.96 5.9
5 60 0.96-0.96 0.96 6.7
6 77 0.95-0.95 0.95 8.0
7 98 0.96-0.96 0.96 7.2
8 82 0.97-0.97 0.97 6.2
Alternate form 1 (W,S) 175 0.97-0.97 0.97 5.6 Students receiving free/reduced lunch.
Reliability of the median of 3 probe scores administered at a benchmark testing period.
2 133 0.96-0.97 0.96 6.6
3 162 0.96-0.97 0.97 6.9
4 157 0.97-0.97 0.97 6.9
5 181 0.96-0.97 0.96 7.6
6 246 0.95-0.96 0.96 8.0
7 139 0.97-0.97 0.97 7.4
8 100 0.96-0.97 0.97 6.7
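The SEM column can be used to put an error band around an individual student's median score. The source does not state how SEM was computed; the sketch assumes the classical test theory formula SEM = SD x sqrt(1 - reliability) and a normal-theory 95% band (z = 1.96). The observed score of 90 WRC is hypothetical; the SEM of 6.5 is the grade 3 value for African American students in the table above, and the function names are illustrative.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)


def score_band(observed: float, sem: float, z: float = 1.96) -> tuple[float, float]:
    """Symmetric band of +/- z SEMs around an observed score."""
    return observed - z * sem, observed + z * sem


# Hypothetical grade 3 student with a median of 90 WRC; SEM of 6.5 taken from the table.
low, high = score_band(90, 6.5)
print(f"{low:.1f} to {high:.1f} WRC")  # 77.3 to 102.7 WRC

# Inverting the assumed formula: reliability 0.97 with SEM 6.1 implies an SD of about 35 WRC.
print(round(6.1 / math.sqrt(1 - 0.97), 1))  # 35.2
```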

Disaggregated Validity


Type of Validity   Age or Grade   Test or Criterion   n   Coefficient (median)   Information (including normative data)/Subjects
Predictive 3 (F) NCEGT 201 0.72 (0.65) African American students in the Classification Accuracy samples
4 (F) NCEGT 246 0.64 (0.59)
5 (F) NCEGT 210 0.64 (0.61)
6 (F) ISAT 144 0.56 (0.57)
7 (F) ISAT 100 0.56 (0.55)
8 (F) ISAT 96 0.56 (0.55)
3 (W) NCEGT 201 0.72 (0.67)
4 (W) NCEGT 246 0.68 (0.62)
5 (W) NCEGT 210 0.66 (0.64)
6 (W) ISAT 144 0.66 (0.65)
7 (W) ISAT 100 0.55 (0.55)
8 (W) ISAT 96 0.57 (0.54)
3 (F) NCEGT 103 0.70 (0.60) Hispanic students in the Classification Accuracy samples
4 (F) NCEGT 127 0.68 (0.58)
5 (F) NCEGT 81 0.55 (0.43)
6 (F) ISAT 228 0.67 (0.62)
7 (F) ISAT 211 0.58 (0.54)
8 (F) ISAT 177 0.64 (0.64)
3 (W) NCEGT 103 0.70 (0.61)
4 (W) NCEGT 127 0.73 (0.65)
5 (W) NCEGT 81 0.59 (0.50)
6 (W) ISAT 228 0.70 (0.66)
7 (W) ISAT 211 0.62 (0.57)
8 (W) ISAT 177 0.65 (0.65)
Construct 3 (S) NCEGT 201 0.68 (0.61) African American students in the Classification Accuracy samples
4 (S) NCEGT 246 0.70 (0.64)
5 (S) NCEGT 210 0.66 (0.63)
6 (S) ISAT 144 0.63 (0.61)
7 (S) ISAT 100 0.54 (0.54)
8 (S) ISAT 96 0.64 (0.61)
3 (S) NCEGT 103 0.70 (0.63) Hispanic students in the Classification Accuracy samples
4 (S) NCEGT 127 0.70 (0.63)
5 (S) NCEGT 81 0.58 (0.50)
6 (S) ISAT 228 0.68 (0.63)
7 (S) ISAT 211 0.57 (0.56)
8 (S) ISAT 177 0.61 (0.62)