IGOs and Stanford Testing in Preston County Schools

Commentary by
Timothy C. Miller, M.D.
Member, Preston County Board of Education

April 10, 2000


Outline

I. Why do some teachers object to teaching the IGOs?
II. Is the emphasis placed on standardized testing in Preston County schools inappropriate?
III. If the issues addressed by standardized testing are valid, is the SAT 9 an appropriate tool with which to address those issues?
IV. Summary and conclusions
Appendix One: Standardized Testing in Preston County Schools


During the 1999-2000 school year, the Preston County Education Association (PCEA) conducted a "Morale Survey" of its membership. One of the questions on that survey was as follows:

"If you believe that you are suffering from low morale now, which of the following have been contributing factors?"

The respondent was given a series of statements with which to either agree or disagree. The second most commonly identified factor contributing to low morale was "Extreme emphasis on Stanford Test results making the test more important than the students."

In an attempt to clarify the issue, the Preston County Board of Education asked the PCEA to perform a follow-up survey of its membership. This follow-up survey asked five questions, and the respondent composed his/her own reply:

1) Give specific examples of how the "extreme emphasis" has translated into extra and unreasonable demands placed on you outside the classroom.

2) What unreasonable tasks are you required to do in the classroom which you otherwise would not do as a result of this "extreme emphasis?"

3) What unreasonable modifications have you made to your lesson plans in order to comply with this extreme emphasis?

4) If the reasons for low morale related to Stanford tests are not explained by the above three broad areas of concern, then what is it about these tests which has you so upset?

5) If you object to standardized tests in general, what other instrument should the BOE, parents, and taxpayers use to gauge the performance of its schools?

Seventy survey documents were returned from this follow-up survey, and they serve as the basis for this commentary. This commentary addresses a specific subset of concerns: those dealing with West Virginia Board of Education Instructional Goals and Objectives (IGOs) and the use of the Stanford 9 examination (SAT 9) as the basis for assessing mastery of those objectives.


The most frequently cited objection with regard to "extreme emphasis on the Stanford Test" is that teachers have been directed to teach from specific Instructional Goals and Objectives (IGOs) rather than from the teacher's own curriculum goals (31 comments). Examples of this sentiment, often expressed in terms such as "teaching the test -- the test is the curriculum," are as follows:

While other examples of "teaching the test," such as using classroom time to take practice tests (four comments) or giving regular classroom exams in the Stanford multiple choice format (three comments) are cited, teachers generally agree that the term "teaching the test" may be defined as focusing the curriculum on a group of selected IGOs (Instructional Goals and Objectives).

In reality it is the IGOs and the emphasis on basing curriculum on them - rather than the test itself - which seem to be the major cause of resentment related to the Stanford Test.

The IGOs - West Virginia Board of Education Policy 2520, Instructional Goals and Objectives for West Virginia Schools - define the instructional goals and objectives for the programs of study and establish a standardized format for them in West Virginia public schools. The effective date of the current policy is given as July 1, 1997; the most recent revision was effective February 27, 1998. The IGOs are the curriculum standards which educators are asked to meet. For example, the Mathematics IGOs for Grade 8 consist of 60 separate objectives, 32 of which have been designated (in bold face) as the emphasized or tested IGOs. Three of the IGOs are listed here as examples. The numbers in parentheses indicate the grade levels in which these particular IGOs are to be addressed.

8.11 (5,6,7) add, subtract, multiply, or divide fractions, mixed numbers, and integers resulting from problem situations using mental math, paper/pencil, and calculators

8.12 develop computational strategies based on the commutative, associative, and identity properties with emphasis on the inverse and distributive properties

8.13 (9,10,11) solve traditional and non-routine problems, which may include missing information, using appropriate tools

All the IGOs for all the grade levels may be found at the West Virginia Department of Education (WVDE) website at http://wvde.state.wv.us

I. Why do some teachers object to teaching the IGOs?

Either ...
1) The teacher objects to externally imposed standards of instruction in the classroom, taking the position that the teacher should be the sole arbiter of the subject matter that will be covered in his/her classroom.
Or ...
2) The teacher is willing to accept that there must be standards concerning the curriculum, but holds that this particular set of IGOs is flawed.

I-1) Should the teacher be the sole arbiter of classroom content?

Some teachers express the opinion that the teacher should use his/her individual judgement to determine what should be taught in the classroom - that rather than conforming to a set of externally imposed standards, the teacher should set his/her own educational objectives. For example:

The imposition of academic standards is a national as well as a local issue. Ravitch notes that the decade of the 1970s and into the 1980s ...

Ravitch goes on to point out that...

No further attempt to justify the need for academic standards in our public schools will be made here, except to agree with Ravitch that "The actual practice of setting standards is now recognized by virtually everyone as a function legitimately lodged with the states." (Ravitch 1996).

I-2) Are the current IGOs flawed?

Many Preston County educators acknowledge that standards are needed, but object to this particular set of standards - the West Virginia State Board of Education Instructional Goals and Objectives, the IGOs - as the standard to which we should align our curriculum.

The "flawed IGO" objection generally takes one of several forms.

a) The IGOs were specifically written to match the content of the Stanford 9 test. Rather than defining a good and efficient curriculum for a given grade level, the IGOs were designed to match areas tested on the Stanford 9 - "A faulty curricula based on a 40 question test."

b) The IGOs (and the test which is intended to test the mastery of those IGOs) are too advanced for the grade level under consideration.

c) Too many IGOs -- IGOs may be appropriate, but there is more material to cover in a year than is possible.

Two questions pertaining to the "flawed IGO" issue should be addressed:

a) Were the IGOs specifically and narrowly written to lead to success on the Stanford 9 test?

b) Do the IGOs outlined in West Virginia State Board of Education Policy 2520 constitute a rational standard on which to base public education?

I-2-a) Were the IGOs specifically and narrowly designed to lead to high scores on the Stanford 9 test?

This question echoes the old conundrum "Which came first - the chicken or the egg?" Does the test drive the curriculum, or does the curriculum drive the test? How were the IGOs developed? Were they in fact taken from the latest editions of the Stanford 9 examinations? This question was posed to Mr. William Luff, Associate Superintendent, West Virginia Department of Education. Mr. Luff notes...

In other words, the West Virginia State Board of Education IGOs were developed by leading educators in our state.

In reality, the question "Whence cometh the IGOs" can be viewed as moot if we recognize that the IGOs should be judged on their own merit. Either they are good or they are bad. West Virginia State Board of Education Policy 2520 either constitutes a sound basis for our curriculum or it doesn't. If the IGOs are sound, then it makes no difference whether they were formulated after years of extensive study by national experts, drawn up by the faculty senate, or even, yes, taken from the current edition of the Stanford 9. The same observation may be made if they are judged to be unsound. Either they are good or bad - how they were derived is not the issue.

I-2-b) Do the IGOs outlined in West Virginia State Board of Education Policy 2520 constitute a rational standard on which to base public education?

Standards such as the West Virginia State Board of Education IGOs are of necessity formulated by committee. It is to be expected that individuals may agree with some of the standards and disagree with others. Cizek has noted that...

That said, is there any peer review data which might shed light on questions regarding the suitability of the West Virginia State Board of Education IGOs?

Education standards in all 50 states have recently been reviewed by the American Federation of Teachers (AFT), published in their report "Making Standards Matter 1999." The authors of this report note that...

To assess the progress being made in the setting of academic standards in America's public schools, the AFT examined each state's achievement standards in the four core academic subjects - English, math, science, and social studies. An assessment of each state's standards was made to determine if said standards were "clear, specific, and grounded in content." The following criteria were used:

1. Standards must define in every grade, or for selected clusters of grades, the common content and skills students should learn in each of the core subjects.

2. Standards must be detailed, explicit, and firmly rooted in the content of the subject area to lead to a common core curriculum.

3. For each of the four core curriculum areas, particular content must be present.

4. Standards must provide attention to both content and skills.

Their question, "Are the standards clear, specific, and grounded in content," was answered either yes or no for each state for each of the 4 core subject areas for each of the 3 levels - elementary, middle, and high school. This resulted in 12 separate determinations for each state. For a state to be judged as having quality standards overall, at least nine of the 12 determinations must have been judged to be clear and specific and include the necessary content. (American Federation of Teachers, B, 2000).

West Virginia's IGOs earned a quality rating from this group, adjudged as having met criteria in 10 of 12 determinations. The two areas judged as not having met criteria were elementary and high school social studies, for which the report noted "vague U.S. and world history standards." (American Federation of Teachers, C, 2000). Of interest, one of the conclusions drawn from the entire study is that "most states have more difficulty setting clear and specific standards in English and social studies than in math and science," and that "social studies standards are particularly weak across the states; these standards tend to lack specific references to U.S. and/or world history. Only six states have social studies standards that are clear, specific, and grounded in content across all three levels of schooling." The authors go on to speculate that "the overall weakness of the social studies and English standards may be due to the controversy surrounding efforts to develop national standards in these subjects by the subject-area professional associations." (American Federation of Teachers, D, 2000). Of final note, only 2 states earned a "perfect 12" from this group - Arizona and California. Six states scored 11, and West Virginia was one of 7 states scoring a 10. One might conclude that West Virginia's IGOs, as a document describing academic standards in schools, were ranked among the top 15 in the United States. (American Federation of Teachers, E, 2000).
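The tally described above, and West Virginia's result under it, can be restated as a short Python sketch. This is an illustration only and not anything published by the AFT; the names SUBJECTS, LEVELS, count_met, and has_quality_standards are hypothetical, and the two unmet determinations are simply those reported in this commentary.

```python
# Illustrative sketch of the AFT tally as described in this commentary.
# All names here are hypothetical; only the 4 x 3 grid and the
# "at least 9 of 12" rule come from the text.
SUBJECTS = ("English", "math", "science", "social studies")
LEVELS = ("elementary", "middle", "high")
QUALITY_THRESHOLD = 9  # minimum number of "yes" determinations out of 12


def count_met(not_met):
    """Count the determinations judged clear, specific, and grounded in content."""
    all_determinations = {(s, lvl) for s in SUBJECTS for lvl in LEVELS}
    return len(all_determinations - set(not_met))


def has_quality_standards(not_met):
    """Apply the nine-of-twelve rule to a state's unmet determinations."""
    return count_met(not_met) >= QUALITY_THRESHOLD


# West Virginia: elementary and high school social studies did not meet criteria.
wv_not_met = {("social studies", "elementary"), ("social studies", "high")}
print(count_met(wv_not_met), has_quality_standards(wv_not_met))
```

Under these assumptions the sketch reports 10 determinations met, which clears the nine-of-twelve bar.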

In another independent analysis of state standards - "Quality Counts 99" - the AFT's assessments of state standards were expanded upon by considering each state's assessment and accountability process. West Virginia's public education system was awarded an "A-" in the area of "academic standards, assessments, and accountability," and was ranked 6th in the nation in this regard. (Education Week, 1999).

Finally, the quality of states' academic standards has also been evaluated in studies sponsored by The Thomas B. Fordham Foundation and released as "The State of State Standards 2000." (Finn, 2000). This group evaluated the academic standards of the various states in five "core subject" areas of English, History, Geography, Mathematics, and Science according to published criteria. This group's assessments are summarized in the following table.

The State of State Standards 2000
Grades assigned to West Virginia State Board of Education Instructional Goals and Objectives (Finn 2000)

STATE   ENGLISH   HISTORY   GEOGRAPHY   MATH   SCIENCE   CUM. GPA   GRADE   RANK
WV      B         C         B           B      F         2.2        C+      14
USA     C-        D+        C-          C      C         1.72       C-
RANK. 7th among states. Among top 8 in nation 6th among states. 6th among states 33rd among states. 27 states received a D or better...

As the data above indicate, West Virginia IGOs were rated among the top ten in the areas of history, geography and mathematics; well above average in English (no rank given in this particular section); and unsatisfactory in science. Overall, West Virginia's IGOs were ranked above the standards used in most other states, in 14th place.
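The cumulative GPA shown for West Virginia is consistent with a simple average of its five letter grades. The sketch below assumes a conventional 4-point scale (A=4, B=3, C=2, D=1, F=0); that scale and the names GRADE_POINTS and cumulative_gpa are assumptions made for illustration, not details taken from the Fordham report.

```python
# Assumed 4-point scale; the Fordham report's own conversion is not
# given in this commentary, so treat this mapping as hypothetical.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def cumulative_gpa(letter_grades):
    """Average the grade points of a list of plain letter grades."""
    points = [GRADE_POINTS[grade] for grade in letter_grades]
    return round(sum(points) / len(points), 2)


# West Virginia: English, History, Geography, Math, Science.
wv_grades = ["B", "C", "B", "B", "F"]
print(cumulative_gpa(wv_grades))  # 2.2
```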

These studies and assessments represent the judgements of well-credentialed independent educators from across the nation. It is stipulated that these assessments represent their opinions, and opinions are always subject to question. For example, it should be noted that the science curriculum assessment offered by the Fordham report is credited to only one person (assessments in other areas were performed by a panel of experts) and thus could be shaded by one person's singular preferences. (For example, the science IGOs were soundly criticized, among other reasons, because of the "bolding" of selected IGOs implying that other objectives were less important. Interestingly, other evaluators in other subjects praised the identification of emphasized IGOs in that it provides clear and concise guidance to the teacher). Furthermore, it may be unclear if the various "evaluators" were basing their assessments on the latest documents.

It is stipulated that any such set of Instructional Goals from any state should go through a continual process of review, refinement, and improvement. It may be noted that the current West Virginia State Board of Education IGOs do go through a continual process of review and refinement, and even at the time of this writing the K-3 English Language Arts IGOs are under review for proposed revision. Nevertheless, it is difficult to accept the proposition that the West Virginia State Board of Education IGOs are without merit - as many would have us believe. The available evidence indicates that West Virginia's IGOs were formulated by leading educators in our state and have been the subject of detailed analysis by other independent experts. The West Virginia IGOs have been found to be at least suitable if not among the best in the nation for use as a set of standards on which to base our public education.

II. Is the emphasis placed on standardized testing in Preston County schools inappropriate?

In order to address this question, one must ask two questions:

1) What is the purpose of standardized testing in West Virginia public schools?

2) Are the stated purposes for administering the test valid?

II-1) What is the purpose of standardized testing in West Virginia public schools?

On the surface, the reasons for giving tests to school students would seem to be self evident. Most would agree that testing is required to determine if a student has mastered the subject matter and assign a grade to the student's performance. Cizek has noted that standardized achievement tests are given in America for a variety of reasons, running the gamut from a simple classroom test to large scale achievement tests such as the National Assessment of Educational Progress (NAEP) designed to assess the country's overall educational health. "Near the middle of the continuum are state-level competency tests used by many states as 'gatekeepers' for grade-to-grade promotion or graduation" and/or used (or mandated) by parents and policy makers "as markers in efforts to improve the education of American children." (Cizek, 1998).

Why are standardized tests administered in West Virginia public schools? The following tables summarize the implications of standardized test results in West Virginia. This data was gleaned from the West Virginia Board of Education Training Manual and Handbook for Education Performance Audits (Accreditation Manual), West Virginia Board of Education, Policy 2510, and Chapter 18A-3A-2B of the West Virginia State Code.

Implications of Standardized Test Scores in West Virginia Public Schools

Identify seriously impaired schools
Test score: The total basic skills score for one or more grade levels in grades 3 through 11 is at or below the 30th percentile in the most recent year for which data are available and one of the two preceding years.
Results in: The school shall be considered "seriously impaired."
Immediate effect: The West Virginia Board of Education shall appoint a team of improvement consultants to make recommendations within sixty days of appointment for correcting the impairment. Principal required to attend the next Principals Academy.*
Possible ultimate effect: If progress is not made in correcting impairments, the State Board of Education may ultimately "intervene in the operation of the school system to cause improvements to be made" which may include a variety of personnel actions up to and including "declaring that the office of the county superintendent is vacant."
Reference: WV BOE Accreditation Manual: 8.1, 8.5.1, 8.5.2, 8.5.3, 10.6.1, 10.6.2, 10.6.3

Identify students for skill improvement efforts
Test score: The student scores below the 50th percentile in the areas of reading, mathematics, and/or language arts at grade 8 or above.
Results in: Student is placed in a skills improvement program.
Immediate effect: Written curriculum must be designed to implement the skills improvement program which must concentrate on improving deficiencies.
Possible ultimate effect: Additional time and resources spent by teachers and school administrators to analyze student areas of weakness and design appropriate curriculum for reteaching. Student will be given opportunity to bring basic skills scores to the 50th percentile.
Reference: WV BOE Accreditation Manual: 5.6.22; WV BOE Policy 2510: 8.2.8

Graduation warranty (basic skills)
Test score: The student scores at the 50th percentile or greater at grade 11 in the areas of reading, mathematics, and language.
Results in: Upon graduation, the student is issued a "warranty" indicating competency in basic skills.
Immediate effect: The warranty indicates that the graduate has mastered the basic skills of reading, mathematics, and language at a level appropriate for an entry level position in the workplace.
Possible ultimate effect: If the student does not function successfully, the graduating school system will provide additional instruction in the basic skills at no cost to the student, employer, or post secondary institution. Warranty in effect for five years.
Reference: WV BOE Accreditation Manual: 5.6.23; WV BOE Policy 2510: 5.49, 8.2.7

Graduation warranty (advanced)
Test score: The student scores at the 70th percentile or greater at grade 11 in the areas of reading, mathematics, and language.
Results in: Upon graduation, the student is issued a "warranty" indicating competency for advanced work place positions and entry into post-secondary education.
Immediate effect: The warranty indicates that the graduate has mastered the basic skills of reading, mathematics, and language at a level appropriate for advanced work place positions and entry into post-secondary education.
Possible ultimate effect: If the student does not function successfully, the graduating school system will provide additional instruction in the basic skills at no cost to the student, employer, or post secondary institution. Warranty in effect for five years.
Reference: WV BOE Accreditation Manual: 5.6.23; WV BOE Policy 2510: 5.49, 8.2.9

A factor in school accreditation
Test score: A minimum of 50% of the school's students in grades 3 through 11 perform at or above the 3rd quartile in total basic skills; and no more than 15% of the students perform within the 1st quartile; or the percentage of students performing within the 1st quartile is decreased based on two of the most recent three years.
Results in: The school should address the area(s) in the Unified School Improvement Plan or equivalent strategic plan.
Immediate effect: Written curriculum must be designed to implement the skills improvement program which must concentrate on improving deficiencies. Principal may be required to attend the next Principals Academy.*
Possible ultimate effect: Additional time and resources spent by teachers and school administrators to analyze student areas of weakness and design appropriate curriculum for reteaching. Could lead (along with other factors) to less than full accreditation for the school if not corrected; could lead to "seriously impaired status."
Reference: WV BOE Accreditation Manual: 4.1

A stated objective (graduates at the 50th percentile)
Test score: The percentage of graduates attaining the 50th percentile in reading, mathematics, and language is at or above 60%.
Results in: No particular consequences apparent at this time.
Immediate effect: Applies to students entering the 9th grade in the fall of 1998.
Possible ultimate effect: (none listed)
Reference: WV BOE Accreditation Manual: 4.12

A stated objective (graduates at the 70th percentile)
Test score: The percentage of graduates attaining the 70th percentile in reading, mathematics, and language is at or above 33%.
Results in: No particular consequences apparent at this time.
Immediate effect: Applies to students entering the 9th grade in the fall of 1998.
Possible ultimate effect: (none listed)
Reference: WV BOE Accreditation Manual: 4.12

* Principals are required to attend the Principals Academy every four years regardless of standardized test performance.
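The percentile thresholds summarized above can be read as a few simple rules. The following Python sketch is one reading of those rules, offered for illustration rather than as a statement of state policy. The function names are hypothetical, and two points are assumptions: that the warranty thresholds must be met in all three subject areas, and that a student who qualifies for the advanced warranty is reported at that level only.

```python
# One possible reading of the thresholds summarized in the tables above.
# Function names and return labels are hypothetical.
SUBJECT_AREAS = ("reading", "mathematics", "language")


def student_outcomes(grade, percentiles):
    """Return the outcomes triggered by one student's national percentiles."""
    scores = [percentiles[area] for area in SUBJECT_AREAS]
    outcomes = []
    # Below the 50th percentile in any area at grade 8 or above.
    if grade >= 8 and any(score < 50 for score in scores):
        outcomes.append("skills improvement program")
    # Warranty levels are based on grade 11 scores (assumed: all three areas).
    if grade == 11:
        if all(score >= 70 for score in scores):
            outcomes.append("advanced warranty")
        elif all(score >= 50 for score in scores):
            outcomes.append("basic warranty")
    return outcomes


def grade_level_flagged(percentiles_recent_first):
    """True if the most recent score and one of the two preceding years' scores
    are at or below the 30th percentile. Expects a non-empty list, newest first."""
    latest, *earlier = percentiles_recent_first[:3]
    return latest <= 30 and any(score <= 30 for score in earlier)


def school_seriously_impaired(scores_by_grade):
    """True if one or more grade levels meets the 30th percentile condition."""
    return any(grade_level_flagged(scores) for scores in scores_by_grade.values())


# Hypothetical example data, not actual Preston County scores.
print(student_outcomes(11, {"reading": 55, "mathematics": 60, "language": 50}))
print(school_seriously_impaired({3: [28, 35, 30], 6: [55, 52, 50]}))
```

The school accreditation criterion is left out of the sketch because the table's wording ("and ... or") does not make clear how its three conditions are grouped.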

It may be concluded that in West Virginia public schools, standardized test results are used ...

a) To serve as one factor in the overall accreditation of a school (and school system).

b) To identify seriously impaired schools (and school systems).

c) To assess a student's mastery of the IGOs.

d) To identify students with academic weaknesses in basic skills - reading, mathematics, and language arts - and to provide specific information to teachers concerning those areas in which the student needs additional instruction.

e) To identify students who will be placed in a skills improvement program.

f) To serve as the basis for the school system to certify its graduates as proficient in basic skills at two levels - the graduation "warranty."

Of note, West Virginia does not require a threshold performance on a standardized test for a student to advance a grade level or graduate, as do some states (Texas and North Carolina, for example). Additionally, there is no indication that standardized test results are used in the individual classroom teacher's evaluation. West Virginia Board of Education Policy 5310 - Performance Evaluation of School Personnel - makes no mention of standardized test results as figuring into the teacher's performance evaluation (in fact, the word "test" does not appear in the policy).

II-2) Are the stated purposes for administering a standardized test valid?

One might suppose that there would be little debate concerning the validity of at least some of the issues which standardized testing in West Virginia public schools is intended to address. However, such is not the case. Some teachers object to the use of standardized test scores to draw any conclusions about the school's performance. For example...

Some may object to the very concept of attempting to "re-teach" students with documented deficiencies in basic skills. For example...

These objections aside, it is difficult to justify the position that the individual school, the county school system, or the West Virginia public education system should be free from any attempt to evaluate effectiveness. Since schools exist to educate students, it must follow that schools should be judged at least in part, if not exclusively, on how well they educate students. School performance must be tied to student performance.

The degree to which schools and teachers should focus on re-teaching students who are academically deficient may be debatable. What is "re-teaching" if it is not "teaching a student something that he/she doesn't know or hasn't mastered?" If we object to teaching a student something he/she doesn't know, then the whole idea of education would seem pointless. And how can we assess what a student does or doesn't know without some objective test or measurement?

III. If the issues addressed by standardized testing are valid, is the SAT 9 an appropriate tool with which to address those issues?

If we may stipulate that it is necessary to assess school and individual student performance, that some mechanism should be in place to provide objective identification of deficiencies, and that there should be an attempt to correct identified deficiencies, then the question presents itself: Is the Stanford 9 an appropriate tool with which to address those issues?

Several teachers have expressed the opinion that the SAT 9 in fact is not an appropriate tool with which to assess the mastery of IGOs by Preston County students. For example...

In addition, other objections to the SAT 9 examination seem to be indirectly related to its use as a tool to measure the student's mastery of the IGOs. Fifteen teachers objected to the use of the SAT 9 scores as a tool to assess the performance of the teacher, the principal, or the school. For example...

Ten teachers objected to the use of the SAT 9 test to plan skill improvement programs for students. For example...

The objections specifically directed against the Stanford 9 test as an instrument for assessing mastery of the West Virginia State Board of Education IGOs fall into several categories:

1) The SAT 9 examination and the manner in which it is administered are technically flawed. Examples given include...

2) The SAT 9 is not properly keyed to the West Virginia State Board of Education IGOs.

3) SAT 9 results are an appropriate factor neither for school accreditation nor for identifying seriously impaired schools.

4) The SAT 9 is not an appropriate tool with which to assess a student's mastery of IGOs.

5) The SAT 9 is not an appropriate tool to identify students with deficiencies in basic skills - reading, mathematics, and language arts.

6) The SAT 9 is not an appropriate tool to identify students who will be placed in a skills improvement program.

7) The SAT 9 is not an appropriate tool to provide specific information to teachers concerning those areas in which the student needs additional instruction.

And, for completeness' sake, although not commented on in the recent survey,
8) Should SAT 9 results serve as the basis for the school system to certify its graduates as proficient in basic skills at two levels - the graduation "warranty?"

III-1) Is the SAT 9 technically flawed?

Any test, whether it is a nationally normed "standardized test" or the weekly classroom math quiz, may contain unfair or ambiguous questions, may include items which are "too difficult," or may be administered under suboptimal conditions. Any test, whether devised locally or at the state level, whether it is criterion referenced or norm referenced (terms discussed later in this report), or whether it is select-response (e.g., multiple-choice, matching, true/false) or constructed-response (e.g., essay, short-answer, speech, project) will be subject to these same criticisms. These are valid concerns; however, it is doubtful that any test could be devised which would be free from such problems. Standardized tests such as the SAT 9, CTBS, etc, are reported to have been extensively analyzed for such errors and claims are made by the publishers that such errors are minimal. Whether a test could be devised at the local or state level which would be free of such errors is open to speculation. Were the SAT 9 scrapped in favor of another assessment tool, one might predict that these same criticisms might persist.

Problems with infrequent "norming" of the test and resultant score inflation ("Lake Wobegon effect") have been well described (Cizek, 1998). In addition, reports abound which suggest that many well known national "standardized tests" may be biased against a particular racial, gender, or socioeconomic group. The Texas Assessment of Academic Skills (TAAS), a criterion-referenced "high stakes" test ("high stakes" in that satisfactory performance on this exam is required to advance to the next grade level or graduate), has been the subject of two separate lawsuits alleging racial discrimination (Phelps, 1999). The TAAS case is interesting in that it is a test locally developed in Texas and specifically written to assess Texas instructional goals. One may speculate that even if there were a locally developed "West Virginia test," such "discrimination" charges might still surface.

Choosing a "norm group" which is not representative of the tested group is a significant issue. An example of the difficulties which may arise in interpreting scores from norm referenced tests is illustrated by examining Preston County standardized scores from the past 22 years (Appendix One). Preston County scores may be compared with West Virginia's average county score for each year for grades 3, 6, 9, and 11. Analyzed in this fashion, the "reference group" for a given year is not that particular test edition's "national norm group," but rather the performance of students in other counties in West Virginia. This would seem to control for socioeconomic differences between the tested group and the reference group. This approach would also seem to control for infrequent or unexpected "re-norming" of the test. The change from the CTBS test to the SAT 9 which occurred in 1997 is controlled for, in that all counties made the change the same year, and analysis is based on performance compared to other counties in the state - rather than the specific test's "national reference group."

When Preston County's total basic skills scores are analyzed in this fashion, the following picture develops: Third grade basic skills scores have been below the state average every year since 1981, but have been over the 50th "national percentile" every year during the same interval (with the exception of the 1989-90 year when no testing was done). Similarly, sixth grade basic skills scores have been below the state average every year since 1979, but above the "50th national percentile" every year except for 1978. Eleventh grade basic skills scores have only reached the state average one year (1994) but have been above the 50th percentile nationally 7 out of the last 22 years, and 6 out of the last 7 years.

Analysis of test scores in this fashion demonstrates several caveats in interpreting test scores. For instance, one might be heartened to find that Preston County's 3rd grade basic battery score in 1999 was at the 56th percentile compared to the "national norm reference group." However, the state average that year for the same group was the 63rd percentile. While Preston County scores could be said to be "above the national average," those same scores were well below the state average. The average scores for West Virginia counties at all grade levels have been above the 50th percentile compared to the "national norm reference group" every year since 1987 - the "Lake Wobegon effect," where all students are above average. Of further note, Preston County's third grade score in basic battery apparently "improved" from the 52nd percentile in 1997 to the 55th percentile in 1998. While this improvement is notable, equally notable is that the state average score for the same group improved from 58 to 62 in the same interval. One could say that Preston County's third grade scores improved three points in 1998, or one could say with equal authority that our third grade students fell behind their West Virginia peers by one point the same year.
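The two readings of the 1997-1998 third grade scores come down to simple arithmetic, restated here as a Python sketch. The variable and function names are hypothetical; the four percentile figures are those quoted in the paragraph above, and percentile points are treated as plain differences, as the paragraph itself does.

```python
# Third grade basic battery percentiles quoted in this commentary.
preston = {1997: 52, 1998: 55}
state_average = {1997: 58, 1998: 62}


def change(series, start, end):
    """Year-over-year change in percentile points."""
    return series[end] - series[start]


def gap(year):
    """Preston County's score minus the state average (negative = behind)."""
    return preston[year] - state_average[year]


county_change = change(preston, 1997, 1998)        # measured against itself
state_change = change(state_average, 1997, 1998)   # the state's own gain
relative_change = gap(1998) - gap(1997)            # measured against peers

print(county_change, state_change, relative_change)  # 3 4 -1
```

The county gained three points against the national norm group while its gap to the state average widened from six points to seven.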

Comparing our student performance only to those in other counties in West Virginia gives only an incomplete picture, as will be discussed later in this report. Nevertheless, such technical objections to any given test regarding its format, difficulty, reliability, and validity will continue to be raised. Such objections may be easy to raise but difficult to remedy.

III-2) Is the SAT 9 properly keyed to the West Virginia State Board of Education IGOs?

If we were to take the assertions of many of our teachers at face value, such as...

... then it would seem self-evident that the answer is yes -- the test is aligned with the curriculum. If one makes the objection that the entire curriculum is based around this test, then one concedes that the SAT 9 is a specific test for the current IGO based curriculum.

"How can it be that an 'off-the-shelf' test such as the SAT 9 can prove to be keyed to the West Virginia State Board of Education IGOs?" This question was posed to Mr. William Luff, Associate Superintendent, West Virginia Department of Education. Mr. Luff notes:

It may at first glance seem curious that IGOs specific for West Virginia public schools would prove to be properly keyed to an "off the shelf" testing instrument such as the SAT 9 without performing significant alignment of the IGOs to that particular test. Cizek has noted that "to remain competitive, commercial publishers of norm-referenced tests have retained their traditional goal of providing comparative information, but have also begun marketing tests that are keyed to the content standards promulgated by professional organizations such as the National Council of Teachers of Mathematics (NCTM, 1989), and that are capable of providing diagnostic information about students' areas of strength and weakness along the lines of criterion-referenced tests. These diverse aims blur traditional terminology and conceptions of NRTs, CRTs, and SRTs." (Cizek, 1998). In that the West Virginia State Board of Education IGOs are to a large part based on "national standards" and that the major commercial standardized tests are increasingly keyed to those same standards, it should not seem curious at all that the test could very well match the IGOs after minimal further alignment of the IGOs for the specific test.

Furthermore, the degree to which the assessment tool (the test) achieves critical importance is relative only to the implications which attend success or failure on the test. For example, Texas public school students must pass the Texas graduation test to receive a diploma. In this case the mechanics of the test, including how well it reflects the curriculum offered the student, achieve critical importance since graduation is directly tied to performance on the test. (This no doubt explains why such "high stakes" tests are often the subject of litigation.) In West Virginia, the implications of success or failure on the SAT 9 do not approach this level of significance. For the most part, poor performance on this test only means that curriculum must be adjusted for the school and/or for the student. However, in that the curriculum is based on demonstrably valid IGOs, how can curriculum adjustments based on valid IGOs possibly be viewed as objectionable? The rhetorical question arises: "How is it a bad thing to teach a student something he/she doesn't know?"

The debate may continue concerning the degree to which the SAT 9 is accurately keyed to the West Virginia State Board of Education IGOs. However, most would agree that it is the Instructional Goals - the standards - which define our attempts to educate our children and which are of primary importance.

III-3) Are SAT 9 results an appropriate factor to consider in school accreditation and to identify seriously impaired schools?

One might concede that as far as standardized tests go the SAT 9 is in and of itself not a bad test. However, one might correctly raise objections to its use for specific purposes - "A good tool but used for the wrong purpose." The objections addressed above concerning tests in general - technical flaws and proper keying to the IGOs - may apply to any type of test. In order to further analyze the suitability of a test such as the SAT 9 for a given objective, it is necessary to briefly review the various types of standardized tests, note several definitions and distinctions among the tests, and examine purposes for which various types of tests might be used.

Norm-referenced tests (NRTs) are designed to describe relative rank among students at a particular grade level, providing information about how a student's performance compares with a reference group of students called the norm group. The Stanford 9 (and most "achievement tests") are examples of NRTs. For example, the national standardization sample (norm group) for the current edition of Stanford 9 is based on spring and fall 1995 testing, with between 500,000 and 600,000 students participating (Harcourt, Stanford 9 Technical Information, 2000). A student who performs at the 50th percentile on a norm-referenced test may be said to have performed as well as or better than 50% of the students in the norm group who took the test. Other examples of NRTs include the Comprehensive Test of Basic Skills (CTBS), the California Achievement Test, and Terra Nova, all published by CTB/McGraw-Hill; the Metropolitan Achievement Test, published by Harcourt-Brace Educational Measurement (as is the Stanford 9); and the Iowa Tests of Basic Skills, published by Riverside Publishing. "Together, these tests substantially define large-scale, norm-referenced achievement testing in the United States. Nearly 60% of the state-mandated achievement tests used across the country are commercially published, with the achievement tests of these three major publishers accounting for 43% of all system-wide tests" (Cizek, 1998).
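The definition of a percentile rank given above ("as well as or better than" a stated share of the norm group) can be written out directly. The following Python sketch illustrates only that definition; the function name `percentile_rank` and the ten-student norm group are hypothetical, and the actual norming procedures used by test publishers are more elaborate and are not described in this report.

```python
def percentile_rank(score, norm_group_scores):
    """Percent of the norm group that scored at or below `score`."""
    if not norm_group_scores:
        raise ValueError("norm group must not be empty")
    at_or_below = sum(1 for s in norm_group_scores if s <= score)
    return 100.0 * at_or_below / len(norm_group_scores)


# Hypothetical raw scores for a ten-student norm group (illustrative only).
norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 36]

# A raw score of 22 matches or beats 5 of the 10 students in the norm group.
print(percentile_rank(22, norm_group))
```

The result says nothing about how many questions the student answered correctly in absolute terms; it only locates the student relative to the chosen norm group.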

As an aside, a commonly heard observation concerning norm-referenced tests is that the nature of such tests dictates that there will be questions on the test which the test taker will not be expected to be able to answer; that, in order to allow for a wide dispersal of scores and a uniform "curve," there will be questions intentionally placed in such tests which are far above the expected skill or knowledge level of the test taker. Of note, Cizek's definition includes the proviso that norm-referenced tests "are constructed to cover content that is considered fairly universal at each grade level" (Cizek, 1998). The publishers of the Stanford 9 assert that "All items (in the current Stanford 9) are grade-level appropriate so that they are within the experience of students taking the test" (Harcourt, Stanford 9 Overview, 2000).

Criterion-referenced tests (CRTs) are intended to "gauge whether a student knows or can do specific things" (Cizek, 1998). They are based on content judged to be important in regards to the area being tested, and criteria for success are established in a judgmental fashion. An example of a criterion-referenced test would be the recertification exam given by the American Board of Surgery. The questions on this test are based on knowledge that a practicing surgeon is expected to have at his/her command. Success (recertification) or failure is based on how the individual scores on the test - 70% is pass, less than 70% is fail. Theoretically, everyone who takes the test could pass, or everyone could fail. An ordinary classroom test is an example of a criterion-referenced test. "High stakes" tests (see below) are generally criterion-referenced.

Standards-referenced tests (SRTs) are similar to criterion-referenced tests, with an additional attempt made to "link students' scores to concrete statements about what performance at the various levels means" (Cizek, 1998). Content standards are devised to represent "what the student should know" and performance standards are developed to describe "how well students need to be able to perform on a set of content standards in order to meet pre-defined specified levels of expected performance." An example of a standards-referenced test is the National Assessment of Educational Progress (NAEP), which reports students' performance as Basic, Proficient, and Advanced.

"High Stakes" test - A test in which significant consequences are associated with performance on the test. A test to determine if a student will graduate or pass to the next grade would be a "high stakes" test. A test which one must pass in order to obtain licensure or qualify for a specific job or occupation would be a "high stakes" test.

"Low Stakes" test - A test in which serious consequences do not follow from the performance on that single test. A weekly classroom quiz which is averaged with other tests to arrive at a particular grade in a course would be a "low stakes" test. The NAEP (National Assessment of Educational Progress) is described as a "low stakes" test for the individual student in that individual student scores are not even reported.

Comparisons of three types of "standardized tests"
From Cizek, 1998

Answers the question...

Norm-referenced tests (NRTs): "Where does this student stand compared to others at his or her grade level?"

Criterion-referenced tests (CRTs): "Can the student demonstrate knowledge or skill to a specified level?"

Standards-referenced tests (SRTs): "How would this student's performance be rated, according to pre-set standards?"

Scores are reported as...

NRTs: Percentile rank

CRTs: Generally, pass or fail

SRTs: Terms such as "Beginning, Proficient, Expert," or "Good, Better, Best." A, B, C, D, F

Key point

NRTs: A student's performance "does not necessarily indicate anything about the knowledge or skills a student has mastered, nor whether scoring at the reported percentile represents acceptable progress, nor whether instruction has been of sufficient quality, nor whether the content is sufficiently challenging or the outcomes measured desirable."

CRTs: A student's performance "does not necessarily indicate anything about whether the student is better or worse than average, nor whether the criteria represent noteworthy expectations given the student's age or grade level, nor whether the content is challenging or the outcomes measured desirable."

SRTs: A student's performance "(does) not necessarily indicate anything about whether the student is better or worse than average, nor whether the criteria represent noteworthy expectations given the student's age or grade level, nor whether the content standards associated with the performance are particularly challenging."

Caveats and unanswered questions

NRTs: "Performing at grade level means only that a student is performing about as well as the average performance of the norm group; no evaluation is made regarding whether the norm group as a whole is performing superbly or terribly." "A student performing 'at grade level' on an NRT could be well-prepared for global competition or woefully lacking in even the most rudimentary areas."

CRTs: Because the criteria (what the student is expected to know) are established in a subjective manner, results are linked to the expectations of those who establish the criteria and write the test.

SRTs: "Because performance standards ... are established in a subjective manner, classifications such as 'Proficient' or 'Expert' are inextricably linked to the conceptions of competence held by those who establish them. If those who set the standards have high expectations for performance, a classification such as 'proficient' might mean magnificent accomplishment; if the standard-setters have low expectations, the same classification could represent mediocrity."

The question at hand is, "Are SAT 9 results an appropriate factor to consider in school accreditation and to identify seriously impaired schools?"
In theory, the answer might seem to be "no." A criterion-referenced test rather than a norm referenced test such as the SAT 9 might be advocated when testing is "high-stakes" (and for the moment we shall stipulate that this accreditation process is "high-stakes"). All schools should have the opportunity to be accredited based on their own merit and compliance with established standards. There should be no stipulation that a certain percentage of schools should receive less than full accreditation, and success should not depend on the relative performance of other schools during a given assessment period. Under this "ideal" model, curriculum goals (IGOs) for each grade level would be established and a test devised to assess student mastery of those goals. Students would take the test and each student would either pass or fail based on his/her own performance (percentage of questions answered correctly). Each student could pass, or each student could fail. Average student scores for each grade level could be devised, and the performance of each grade level and/or the school as a whole could be assessed and rated as "satisfactory" or "unsatisfactory" based upon average student score and established performance standards.
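The "ideal" criterion-referenced model just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 70% pass mark is borrowed from the recertification example given earlier, the 70% grade-level standard is invented for the sketch, and the scores and function names are hypothetical.

```python
PASS_MARK = 70.0          # hypothetical passing percentage for one student
SATISFACTORY_MEAN = 70.0  # hypothetical standard for a grade level's average


def student_passes(percent_correct):
    """A student passes or fails on his/her own score alone."""
    return percent_correct >= PASS_MARK


def rate_grade_level(percent_correct_scores):
    """Rate a grade level from the average of its students' scores."""
    mean = sum(percent_correct_scores) / len(percent_correct_scores)
    return "satisfactory" if mean >= SATISFACTORY_MEAN else "unsatisfactory"


# Hypothetical percent-correct scores for one grade level.
grade_3 = [82.0, 74.0, 65.0, 91.0, 70.0]

print([student_passes(s) for s in grade_3])
print(rate_grade_level(grade_3))
```

Nothing in this calculation refers to any other school or student, which is why every student could pass or every student could fail under such a model.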

In reality however, the SAT 9 does seem to be an appropriate test for this purpose. Perhaps in an ideal world, each school would set its own curriculum goals and devise its own assessments, as was suggested.

Unfortunately, such an assessment program would only reflect the expectations of those who establish the instructional goals and write the test. Such assessments "would not necessarily indicate anything about whether the school's performance is better or worse than average, nor whether the instructional goals represent noteworthy expectations given the student's age or grade level, nor whether the content is challenging or the outcomes measured desirable" (Cizek, 1998). Furthermore, as distasteful as it may be for the teacher in the classroom whose academic credentials may be equivalent or superior to those who set the standards, it is the State of West Virginia, rather than the school or the county, which is constitutionally charged with assuring a "thorough and efficient education" for West Virginia children. It is the state standard which each school is obligated to recognize. It is the state which may select the tool with which to assess the school in regards to compliance with its standards.

If the state standards match the local school standards, then one assessment tool would seem to suffice. However, in our ideal world where the school establishes its own standards, a second testing program would be required to assess the school's performance according to state standards. This second test should be a criterion-referenced test developed by the West Virginia Department of Education to assess the student (and school) against state expectations. Every school could theoretically be accredited or not accredited. Unfortunately, the same problems would exist with this state assessment as with the county assessment - the state criterion-referenced assessment test would not necessarily indicate anything about whether the school and state's performance is better or worse than average, nor whether the instructional goals established by the state represent noteworthy expectations given the student's age or grade level, nor whether the content is challenging or the outcomes measured desirable (Cizek, 1998).

This scenario finally plays out when the student each year would need to take yet a third standardized test, such as the NAEP, to determine if the performance of the school, as defined by state standards, meets national standards. In summary, in this "ideal" world the student would take not one but three (or more) major standardized tests each year, each subject to the same criticisms which are currently leveled against the SAT 9.

One may correctly conclude that although this might represent the "ideal," the cost involved in terms of dollars and time spent on testing would make this approach prohibitive.

Cizek has noted that "public demands for accountability and legislative responses tied to testing have created the need for tests that serve many masters and purposes. Responding to pressures to address these diverse concerns, commercial test publishers have attempted to develop products that attempt to serve multiple purposes" (Cizek, 1998). In that the SAT 9 is a norm-referenced test, an additional use of the test does present itself. Information can be gleaned from the test regarding the student and school performance as it relates to other students and schools throughout the nation. Some measure of the rigor and suitability of West Virginia IGOs may be made in relation to educational standards which exist in other states. It is stipulated that "performing at grade level means only that a student is performing about as well as the average performance of the norm group (and that) no evaluation is made regarding whether the norm group as a whole is performing superbly or terribly" (Cizek, 1998). Nevertheless, the SAT 9 does provide school assessment information which not only can be correlated to statewide standards but also indicates performance relative to a nationally representative group.

It is stipulated that there is a theoretical disadvantage to the use of the SAT 9 for performance assessment, in that the SAT 9 is a norm-referenced test. As has been noted, in "high stakes" testing there should be no expectation that a certain number of those tested should fail. As an example, West Virginia State Board of Education accreditation standards dictate that for a given school grade level, performance below the 30th percentile in grades 3 through 11 in one or more grade levels in the most recent year for which data are available and one of the two preceding years mandates school classification as "seriously impaired." On the surface, one might conclude that 30% of schools in West Virginia would be relegated to "seriously impaired" status each year. However, this conclusion is faulty.

First, it is debatable whether the SAT 9 test is truly a "high stakes" test for the school. It is true that "seriously impaired" or other less than full accreditation status is attached to test results, but the next question is "So what?" What are the implications of less than full accreditation? The only immediate implication is that the school personnel must set about to write improvement plans and make curriculum adjustments. Investigators and "improvement teams" may appear on the scene. Principals and teachers may be directed to modify their instructional techniques. Teachers and principals must spend additional time preparing reports and making such curriculum improvements as directed. Principals may be required to attend the Principal's Academy a year or two sooner than they otherwise would have. Pride will be wounded. If improvement is not noted after months of intervention and curriculum adjustment, teachers and principals might be transferred or given improvement plans and/or other personnel changes could be made. However, rarely does the process get further than the paperwork and curriculum alignment stage. A pronouncement of "seriously impaired" or "less than fully accredited" cannot by itself lead to termination of an employee or a reduction in pay, or have any other adverse consequence for the school personnel.

Secondly, the assertion that "thirty percent of the schools are relegated to seriously impaired status" is flawed. A school or county's grade level percentile rank as reported by SAT 9 results is relative to a national norm reference group. Thirty percent of grade level scores in West Virginia are not at the 30th or lower national percentile. As has been noted earlier, the average percentile rank for grade levels in West Virginia is well above the 50th national percentile. The 1999 average county grade level score on the SAT 9 was 61 (both mean and median) with a range of 58 to 65 for each of grades 3-11. The only way that these scores could be manipulated to result in 30% of schools failing to meet a certain criterion would be if all West Virginia school grade level scores were plotted and a second norm curve relative to West Virginia grade levels was derived. In this case, a school grade-level score of 30th percentile relative to the national norm might in reality be at the 5th or 10th percentile relative to state norms (if indeed any grade levels in West Virginia actually scored that low).
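The accreditation rule cited above can be expressed as a simple check against a fixed national percentile, which makes clear that it sets no quota of failing schools. The Python sketch below follows one reading of the rule (at least one grade level below the 30th percentile in the most recent year, and at least one grade level below it in either of the two preceding years); whether the same grade level must be involved in both years is not stated in this report, so that reading is an assumption. All school data and function names are hypothetical.

```python
THRESHOLD = 30  # national percentile cited in the accreditation standard


def any_grade_below(year_scores, threshold=THRESHOLD):
    """True if any grade level's national percentile is below the threshold."""
    return any(p < threshold for p in year_scores.values())


def seriously_impaired(most_recent, preceding_two):
    """Apply one reading of the rule (an assumption, not the official text).

    most_recent: {grade: national percentile} for the latest year with data.
    preceding_two: list of two such dicts, one per preceding year.
    """
    if not any_grade_below(most_recent):
        return False
    return any(any_grade_below(year) for year in preceding_two)


# Hypothetical school scoring in the 58-65 range cited for county averages.
school_a = ({3: 61, 6: 58, 9: 65, 11: 60},
            [{3: 60, 6: 59, 9: 63, 11: 62}, {3: 59, 6: 58, 9: 64, 11: 61}])

# Hypothetical school with a low third grade score in two of three years.
school_b = ({3: 28, 6: 45},
            [{3: 35, 6: 44}, {3: 25, 6: 40}])

print(seriously_impaired(*school_a))
print(seriously_impaired(*school_b))
```

Because the cutoff is fixed on the national scale, the number of schools flagged depends entirely on how their grade levels actually score; if every grade level scored at or above the 30th national percentile, no school would be flagged.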

Those who condemn the use of a norm-referenced test in "high-stakes" testing must accept the alternative - a criterion-referenced test. This would require that schools and grade levels be assessed according to absolute performance on a criterion-referenced test devised by the West Virginia Department of Education. Every school could pass or every school could fail. Criterion-referenced tests, like norm-referenced tests, are based on content judged to be important in regards to the area being tested. However, with criterion-referenced tests, criteria for success are established in a judgmental fashion. Those criteria will be open to question, particularly if a significant number of test takers fail (or pass) the test. One may speculate that if such a change to criterion-referenced testing with minimum competency standards were made, the result would be simply trading one set of objections for another.

As a final note, Cizek has observed that "because no single approach currently provides a complete picture of student achievement, those responsible for mandating, conducting, or interpreting the results of testing programs must demand as much standards-based and norm-referenced information as possible" (Cizek, 1998). One may ponder the following question: What happens if, during the course of the year, everyone in a particular class has a "failing" (below 70% performance) grade? Generally, the grades are "curved." In other words, the criteria for success are adjusted by applying some norm-referenced standard. In the final analysis, one might contend that all criterion-referenced tests are in reality norm-referenced tests, in that the criteria for success are calibrated by norm-referenced data.

In summary, we may conclude that schools and teachers are obligated to design curriculum around the IGOs as these represent the standards appropriately established by the state. The SAT 9 is keyed to those standards. The state may attach such criteria for "success" as it chooses. Less than satisfactory performance by the school in relation to the SAT 9 test results in exhortations to the school/teacher to give more attention to achieving established educational goals, but has no other immediate detrimental effect. It is debatable whether this even amounts to "high stakes" testing. SAT 9 results are an appropriate factor to consider in the school accreditation process and to identify seriously impaired schools.

III-4) Is the SAT 9 an appropriate tool with which to assess a student's mastery of IGOs?

Perhaps the real question might be "Is the SAT 9 (or should it be) the only tool used to assess a student's mastery of the IGOs?" This might be a philosophical question, but in reality the answer is no - in West Virginia the SAT 9 is not the only tool used to assess the student's mastery of the IGOs. With the exception of the skills improvement program and the graduation warranty (discussed below), performance on the SAT 9 test seems to figure little into the overall assessment of the student's academic performance. Promotion to the next grade level is still based on performance standards set by the classroom teacher. Graduation is still dependent on the composite of those teacher-chosen standards. Standardized test performance does not figure into a student's eligibility for participation in extra-curricular activities as do the teacher-chosen performance standards.

Some students do poorly on the SAT 9 test, some do average, some do well. The next question is another "So what?" What are the implications of a student performing below or above an admittedly arbitrary standard on the SAT 9?

a) The student may be placed in a skills improvement program

b) Certain pronouncements will be attached to the student's graduation credentials - the graduation "warranty."

In order to further discuss this question, one must analyze these implications.

III-5) Is the SAT 9 an appropriate tool to identify students with deficiencies in basic skills - reading, mathematics, and language arts?

This is a difficult question to address. "Deficiencies" is entirely a subjective pronouncement. Whether a student is "deficient" or "proficient" depends on how the terms are defined and what judgmental standard is used. One educator may be of the opinion that a given student's skills are deficient, while another educator might feel that the same student's skills are quite satisfactory. This survey has uncovered opinions from several high school teachers who indicate that many students advance to high school deficient in the basic skills. Perhaps the student's elementary and middle school teachers might disagree. It all boils down to the following questions: "What do we expect a student to master at a given grade level, what performance standards shall we set, and what are the implications for a student who fails to meet those standards? How do we define 'deficient' and what are the implications of being identified as deficient?"

Answering those questions is the intent behind educational standards (the IGOs) and the use of a uniform tool as one factor in assessing mastery of those standards. Performance standards are judgmental in nature, and it is the state which may set those standards. The validity of a given set of performance standards and the tool used to assess those standards is relative to the implications which attend success or failure in meeting those standards. For instance, one could set a standard: "All classroom teachers in West Virginia will pass minimum competency tests." If there are no particular implications attendant to that standard, then this standard might correctly be viewed as a minor annoyance. If it were to turn out that significant numbers of teachers were to be terminated next year because of failure to demonstrate proficiency, then, of course, the validity of the standard and the mechanics of the competency test come seriously into question.

Ultimately, the question comes down to another "So what?" Students who score below the 50th percentile in basic skills will be placed in a skill improvement program. So what?

III-6) Is the SAT 9 an appropriate tool to identify students who will be placed in a skills improvement program?

III-7) Is the SAT 9 an appropriate tool to provide specific information to teachers concerning those areas in which the student needs additional instruction?

In that the end result for the student is that he/she is being given additional instruction in areas of deficiency, it would seem that there should be no objection to this use of the test. The student is not being punished. No one should disagree with the concept of teaching a child something he/she doesn't know. Nevertheless, objections may be raised by parents as well as educators:

1) My child has made A's and B's all his life in English. Now you want to put him in remedial "bonehead" English. My child is not dumb. How dare you suggest my child is stupid!

We should identify skill improvement efforts for what they are - efforts to improve the student's skills in a particular area. We should avoid negative connotations which are often attached to such efforts, such as "remedial" or "bonehead" English. The choice of indicator used to place a student in a skills improvement program, whether it is performance on the SAT 9, overall performance in classroom work, recommendations made by a committee of educators, or some other process, is not the critical issue. As long as the student is being taught something he/she doesn't know, there should be no objection to this endeavor. Perhaps a public relations program to explain efforts directed to skill improvement is in order.

2) Why should the student be forced to waste his/her time covering IGOs relating to basic skills in English, math, and language arts when there are other subjects on which he/she could be spending time?

Phelps has noted that "Survey results show clearly that the public wants students to master the basic skills first, before they go on to explore the rest of the possible curriculum. If this means spending more time on the 'basics,' so be it" (Phelps, 1999). Teaching Stanford deficiencies is teaching basic skills in English, math, and language arts. If we object to spending extra time in this endeavor, then what might we suggest is more important? In fact, many teachers in the current survey lament the fact that students arrive in their classrooms deficient in basic skills - the very problem this skills improvement program is designed to address. While many may object to students spending extra time on basic skills, the tide of opinion seems to be flowing in the other direction - that schools need to emphasize the "basics" before going off into the electives.

3) The skill improvement is a waste of time for the student. He/she already knows the material being taught in the skill improvement class and this time could be better spent for something else. The SAT 9 has forced the student to spend time on curriculum he/she has already mastered.

Skill improvement activities should be worthwhile. The student should be taught something he/she doesn't know. If students find themselves in class being taught something they have already mastered, then there is a flaw in the system. This should be a simple problem to resolve. Students could be given a pre-test prior to the implementation of the re-teaching process based on the specific IGOs which will be covered in the re-teaching effort. At the conclusion of the re-teaching effort a post-test could be given. If the student improves his/her score from pre-test to post-test, then we can conclude that the re-teaching effort was worthwhile - that the student learned something he/she didn't know. If the student does not improve his/her performance on the post-test, it would be for one of three reasons:

a) The student performed quite satisfactorily on the pre-test and probably didn't need the re-teaching (flawed selection process)

b) The re-teaching process itself was flawed.

c) Despite our best efforts the student for whatever reason didn't learn anything.

If it is determined that students are being put into skill improvement classes which they don't need (e.g. they have already mastered the curriculum being taught in those classes), and assuming that the curriculum is based on IGOs identified by analysis of SAT 9 tests, then we may conclude that the SAT 9 is a flawed indicator for placement of students in those classes. If, however, we can conclude that the student did learn something in the skill improvement program, then it is successful regardless of the process by which the student was placed in the program. If the student did poorly on both the pre-test and the post-test, then either the re-teaching process itself was flawed or the student for whatever reason can't or doesn't want to learn.
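The pre-test/post-test reasoning laid out above amounts to a small decision procedure, sketched here in Python. The 70% cutoff for a "satisfactory" pre-test and the function name `evaluate_reteaching` are hypothetical; the report does not specify what score would count as mastery.

```python
SATISFACTORY = 70.0  # hypothetical mastery cutoff, percent correct


def evaluate_reteaching(pre_score, post_score, satisfactory=SATISFACTORY):
    """Classify one student's re-teaching outcome from pre- and post-test scores."""
    if post_score > pre_score:
        # The student learned something: the effort was worthwhile,
        # regardless of how the student was selected.
        return "worthwhile"
    if pre_score >= satisfactory:
        # No improvement, but the student already knew the material.
        return "flawed selection"
    # Poor on both tests: the scores alone cannot separate a flawed
    # re-teaching process from a student who did not learn.
    return "flawed re-teaching or student did not learn"


print(evaluate_reteaching(40.0, 75.0))
print(evaluate_reteaching(90.0, 90.0))
print(evaluate_reteaching(40.0, 38.0))
```

Only the second outcome, seen across many students, would point to the SAT 9 as a flawed indicator for placement; the third outcome requires looking at the re-teaching process itself.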

4) My students do need skill improvement, but not in the areas indicated by the SAT 9. This test does not give useful information about where my students need help.

Are re-teaching efforts based on SAT 9 data beneficial? If not, where is the flaw? It could be that some other indicator should be used to identify the basic skills, if any, which should be "re-taught" to the student.

Should we be using another indicator as the basis for identifying areas in which the student needs re-teaching? We could only arrive at these answers if we set about to perform critical analysis of our SAT 9 re-teaching efforts as described above. This would require, unfortunately, more tests and more test analysis. If another mechanism were to be advocated for identifying basic skills needing improvement, then the same scrutiny must be applied to the alternate mechanism. We must again ask: "What do we expect a student to master at a given grade level, what performance standards shall we set, and what are the implications for a student who fails to meet those standards? How do we define 'deficient' and what are the implications of being identified as 'deficient'?" The classroom teacher may take the position "I don't need a standardized test to tell me what my students need..." In that case, it becomes incumbent upon the teacher to demonstrate that his/her expectations match those of the teachers who have preceded and will follow in the student's education, that all the involved teachers agree on acceptable performance standards for that student, and that those standards are congruent with the standards set by the state.

In summary, the current SAT 9-driven IGO re-teaching efforts are valid unless and until we determine that students who need no improvement in the basic skills are being placed in skill improvement programs. If it is demonstrated that students are learning something in skill improvement programs driven by SAT 9 testing and IGO re-teaching, then the program is appropriate and successful, and the SAT 9 is an appropriate test to use in this regard.
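The pre-test and post-test reasoning in this section can be written out as a small decision rule. The following is only an illustrative sketch: the function name, the use of percentile scores as inputs, and the mastery cutoff of 50 are assumptions made for the example, not a procedure taken from the county's program.

```python
# Illustrative sketch only: the names and the cutoff below are hypothetical.
MASTERY_CUTOFF = 50  # assumed percentile at or above which a skill counts as mastered


def assess_placement(pre_test, post_test, cutoff=MASTERY_CUTOFF):
    """Classify one student's skill improvement outcome from two percentile scores."""
    if post_test > pre_test:
        # The student learned something, so the program worked for this
        # student regardless of how the student came to be placed in it.
        return "program successful"
    if pre_test >= cutoff:
        # No gain, but the pre-test already showed mastery: the placement
        # was unnecessary, which points at the placement indicator.
        return "placement not needed"
    # Poor on both tests: either the re-teaching was flawed or the
    # student could not or would not learn.
    return "no gain after re-teaching"


for pre, post in [(35, 55), (70, 70), (30, 28)]:
    print(pre, post, assess_placement(pre, post))
```

Tallying these three outcomes across all students in a program would show which of the section's conclusions the data supports.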

III-8) Should SAT 9 results serve as the basis for the school system to certify its graduates as proficient in basic skills at two levels - the graduation "warranty"?

This issue was not raised by the survey but is listed here for the sake of completeness. No data is available on this program, and no further discussion of this question will be attempted here.

Summary and Conclusions

The practice of setting educational standards and instructional goals for public schools is recognized by virtually everyone as a function legitimately lodged with the state. The State of West Virginia has the constitutional mandate to assure a "thorough and efficient education" for its children. The available evidence indicates that the West Virginia State Board of Education Instructional Goals and Objectives were formulated by leading educators in our state and have been the subject of detailed analysis by other independent experts. West Virginia's Instructional Goals and Objectives have been found to be at least suitable if not among the best in the nation for use as a set of standards on which to base our public education. Schools and teachers are obligated to design curriculum around the West Virginia State Board of Education Instructional Goals and Objectives as these represent the standards appropriately established by the state.

The state may use such criteria for accreditation of its schools as it chooses. In West Virginia the Stanford 9 test is only one of several indicators used to assess school and individual student performance. The Stanford 9 is appropriately keyed to the West Virginia State Board of Education Instructional Goals and Objectives. Technical objections may be made to the Stanford 9, but such objections may be raised against any test; they are not specific to the SAT 9, nor do they invalidate it as an appropriate testing instrument. Less than satisfactory school performance on the SAT 9 leads to exhortations to the school and teacher to focus on achieving established educational goals, but has no other immediate detrimental effect. The Stanford 9 results are therefore an appropriate factor to consider in the school accreditation process and in identifying seriously impaired schools.

The individual student who performs below the 50th percentile on the Stanford 9 in the areas of reading, mathematics, and/or language arts is placed in a skills improvement program in those basic skills areas. The utility of the Stanford 9 in this regard can only be assessed by determining if students placed in these improvement programs actually improve their basic skills. If it can be demonstrated that students are learning something in skill improvement programs driven by Stanford 9 testing and re-teaching of the instructional goals and objectives, then the Stanford 9 is being appropriately and successfully used to help the students learn. If students who need no improvement in the basic skills are inappropriately placed in skill improvement programs on the basis of their Stanford 9 performance, then the use of the Stanford 9 in this regard would seem inappropriate.


Appendix One: Standardized Testing in Preston County Schools

The following tables summarize Preston County's performance on standardized testing over the last 22 years.

Table One lists two numbers for each grade level for each year. The first number indicates how many of the 55 counties in West Virginia scored the same as or below Preston County for a given year in basic skills on that year's standardized test. The number in parentheses indicates how many points above or below the state average Preston County scored that year. For instance, in 1977 our third grade CTBS basic skills score was equal to or better than 44 of 55 counties, and was 5 points above the state average. The same year, our 11th grade CTBS basic skills score was the same as or better than 18 of 55 counties, and was 4 points below the state average.

Table Two lists two numbers for each grade level for each year. The first number indicates Preston County's "national percentile rank" for a given year in basic skills on that year's standardized test. The number in parentheses indicates the West Virginia state average for the "national percentile rank" in basic skills. For instance, in 1977 our third grade CTBS basic skills score was at the 58th percentile nationally, while the state average was the 53rd percentile nationally. The same year, our 11th grade CTBS basic skills score was at the 40th percentile nationally, while the state average was the 44th percentile nationally.
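In the 1977 examples quoted above, the parenthesized figure in Table One equals the county's national percentile rank minus the state average from Table Two: 58 - 53 = +5 for third grade and 40 - 44 = -4 for 11th grade. A minimal sketch of that arithmetic follows; the function names are hypothetical and serve only to illustrate how the two tables appear to relate.

```python
# Illustrative sketch only: these helper names are not from the original report.
def points_vs_state(county_percentile, state_percentile):
    """Return the county's margin over the state average, in percentile points."""
    return county_percentile - state_percentile


def format_margin(margin):
    """Format a margin the way Table One prints it: +5, -4, or a bare 0."""
    return f"{margin:+d}" if margin else "0"


# (county, state) national percentile ranks quoted in the text for 1977.
examples = {"1977 grade 3": (58, 53), "1977 grade 11": (40, 44)}
for label, (county, state) in examples.items():
    print(label, format_margin(points_vs_state(county, state)))
```

The same subtraction can be used to spot-check any row of the two tables against each other.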

The CTBS (Comprehensive Tests of Basic Skills) was used 1977-1996; the SAT-9 (Stanford Achievement Test, 9th Edition) has been in use since 1997.

Table One
Preston County Schools Standardized Test Scores:
Number of counties scoring the same as or below Preston County in basic skills (Points above or below the state average in basic skills)

Year  Grade 3   Grade 6   Grade 9   Grade 11
1977  44 (+5)   25 (-3)   19 (-6)   18 (-4)
1978  43 (+4)   30 (+1)   27 (-1)   23 (-4)
1979  32 (+2)   27 (-2)   17 (-5)   21 (-4)
1980  40 (+4)   9 (-11)   25 (-3)   16 (-4)
1981  24 (-2)   15 (-6)   26 (-2)   11 (-8)
1982  25 (-2)   14 (-7)   24 (-3)   16 (-6)
1983  10 (-8)   14 (-7)   25 (-5)   25 (-4)
1984  11 (-6)   15 (-6)   13 (-7)   16 (-5)
1985  20 (-3)   13 (-6)   21 (-5)   20 (-7)
1986  10 (-8)   14 (-6)   20 (-7)   8 (-11)
1987  21 (-3)   21 (-2)   17 (-7)   6 (-15)
1988  4 (-11)   16 (-6)   18 (-3)   8 (-10)
1989  17 (-3)   20 (-2)   11 (-7)   8 (-10)
1990  *         *         15 (-3)   17 (-4)
1991  17 (-3)   9 (-6)    5 (-11)   2 (-14)
1992  23 (-2)   18 (-3)   26 (-1)   13 (-7)
1993  9 (-6)    25 (-3)   18 (-3)   9 (-8)
1994  3 (-11)   6 (-9)    43 (+5)   30 (0)
1995  20 (-3)   10 (-5)   42 (+4)   24 (-2)
1996  18 (-6)   16 (-5)   30 (+1)   22 (-3)
1997  14 (-6)   23 (-2)   33 (0)    21 (-3)
1998  6 (-7)    18 (-4)   40 (+1)   19 (-3)
1999  4 (-7)    20 (-4)   4 (-6)    22 (-1)

Table Two
Preston County Schools Standardized Test Scores:
Preston County's national percentile rank score in basic skills (West Virginia average national percentile rank score in basic skills)

Year  Grade 3   Grade 6   Grade 9   Grade 11
1977  58 (53)   47 (50)   41 (47)   40 (44)
1978  54 (50)   50 (49)   46 (47)   40 (44)
1979  55 (53)   49 (51)   43 (48)   41 (45)
1980  60 (56)   43 (54)   46 (49)   41 (45)
1981  55 (57)   49 (55)   48 (50)   38 (46)
1982  57 (59)   51 (58)   49 (52)   41 (47)
1983  52 (60)   51 (58)   49 (54)   45 (49)
1984  55 (61)   53 (59)   49 (56)   44 (49)
1985  54 (57)   49 (55)   45 (50)   47 (54)
1986  54 (62)   54 (60)   43 (50)   44 (55)
1987  62 (65)   60 (62)   44 (51)   38 (53)
1988  57 (68)   56 (62)   49 (52)   48 (58)
1989  65 (68)   60 (62)   46 (53)   47 (57)
1990  *         *         50 (53)   54 (58)
1991  67 (70)   58 (64)   42 (53)   44 (58)
1992  57 (59)   50 (53)   55 (56)   48 (55)
1993  57 (63)   55 (58)   54 (57)   48 (56)
1994  54 (65)   50 (59)   62 (57)   59 (59)
1995  63 (66)   54 (59)   63 (59)   56 (58)
1996  64 (70)   58 (63)   61 (60)   56 (59)
1997  52 (58)   61 (63)   55 (55)   53 (56)
1998  55 (62)   61 (65)   59 (58)   55 (58)
1999  56 (63)   61 (65)   53 (59)   58 (59)

* Strike year - no testing done at grades 3 and 6.


Third Grade: Number of counties scoring the same as or less than Preston County.
Yr  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99
#   44  43  32  40  24  25  10  11  20  10  21   4  17   *  17  23   9   3  20  18  14   6   4

Third Grade: Variance of Preston County test scores from the state mean.
Yr  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99
V   +5  +4  +2  +4  -2  -2  -8  -6  -3  -8  -3 -11  -3   *  -3  -2  -6 -11  -3  -6  -6  -7  -7
* Strike year 1990. No testing.


Sixth Grade: Number of counties scoring the same as or less than Preston County.
Yr  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99
#   25  30  27   9  15  14  14  15  13  14  21  16  20   *   9  18  25   6  10  16  23  18  20

Sixth Grade: Variance of Preston County test scores from the state mean.
Yr  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99
V   -3  +1  -2 -11  -6  -7  -7  -6  -6  -6  -2  -6  -2   *  -6  -3  -3  -9  -5  -5  -2  -4  -4


Ninth Grade: Number of counties scoring the same as or less than Preston County.
Yr  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99
#   19  27  17  25  26  24  25  13  21  20  17  18  11  15   5  26  18  43  42  30  33  40   4

Ninth Grade: Variance of Preston County test scores from the state mean.
Yr  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99
V   -6  -1  -5  -3  -2  -3  -1  -7  -5  -7  -7  -3  -3  -3 -11  -1  -3  +5  +4  +1   0  +1  -6


Eleventh Grade: Number of counties scoring the same as or less than Preston County.
Yr  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99
#   18  23  21  16  11  16  25  16  20   8   6   8   8  17   2  13   9  30  24  22  21  19  22

Eleventh Grade: Variance of Preston County test scores from the state mean.
Yr  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99
V   -4  -4  -4  -4  -8  -6  -4  -5  -7 -11 -15 -10 -10  -2 -14  -7  -8   0  -2  -3  -3  -3  -1


References

American Federation of Teachers (2000, A) Introduction [Online] Making Standards Matter 1999, Available: http://www.aft.org//Edissues/standards99/intro.htm

American Federation of Teachers (2000, B) Judging State Standards Reforms [Online] Making Standards Matter 1999, Available: http://www.aft.org//Edissues/standards99/Judging.htm

American Federation of Teachers (2000, C) State by State Analysis, West Virginia [Online] Making Standards Matter 1999, Available: http://www.aft.org//Edissues/standards99/states/Westvirginia.htm

American Federation of Teachers (2000, D) Major Findings, Standards [Online] Making Standards Matter 1999, Available: http://www.aft.org//Edissues/standards99/findings.htm

American Federation of Teachers (2000, E) Table 2 [Online] Making Standards Matter 1999, Available: http://www.aft.org//Edissues/standards99/Table2.htm

Braden, Lawrence S., with Ralph A. Raimi (1998, March) State Mathematics Standards [Online] Fordham Foundation Standards Project, Vol. 2, No. 3. Available: http://www.edexcellence.net/standards/math.html

Cizek, Gregory J. (1998, October) Filling In the Blanks -- Putting Standardized Tests to the Test [Online] Fordham Report, Vol. 2, No. 11. Available: http://www.edexcellence.net/library/cizek.pdf

Education Week on the Web (1999) Academic Standards, Assessments, and Accountability [Online] Quality Counts 99, Vol. 18, No. 17. Available: http://www.edweek.org/sreports/qc99/states/indicators/in-t2.htm

Finn, Chester E., Jr., with Michael J. Petrilli (2000, January) The State of State Standards 2000 [Online] Fordham Report, Available: http://www.edexcellence.net/library/soss2000/standards%202000.html

Munroe, Susan, with Terry Smith (1998, February) State Geography Standards [Online] Fordham Foundation Standards Project, Vol. 2, No. 2. Available: http://www.edexcellence.net/standards/geogrph.html

Harcourt Brace Educational Measurement (2000, January) Stanford Nine - Technical Information [Online] Harcourt Brace Educational Measurement. Available: http://www.hbem.com/trophy/achvtest/techinf.htm

Harcourt Brace Educational Measurement (2000, January) Stanford Nine - Overview [Online] Harcourt Brace Educational Measurement. Available: http://www.hbem.com/trophy/achvtest/sat9view.htm

Lerner, Lawrence S. (1998, March) State Science Standards [Online] Fordham Foundation Standards Project, Vol. 2, No. 4. Available: http://www.edexcellence.net/standards/science.html

Phelps, Richard P. (1999, January) Why Testing Experts Hate Testing [Online] Fordham Report, Vol. 3, No. 1. Available: http://www.edexcellence.net/library/phelps.htm

Ravitch, Diane. (1996, December) The State of Standards [Online] Network News & Views. Available: http://www.edexcellence.net/library/standard.html

Saxe, David Warren (1998, February) State History Standards [Online] Fordham Foundation Standards Project, Vol. 2, No. 1. Available: http://www.edexcellence.net/standards/history.html

West Virginia Board of Education (1999, January) Training Manual and Handbook for Education Performance Audits

