Lakewood the Thinking City

 

OHIO SCHOOL PROFICIENCY TESTS: An Analytic Survey

 

 

What This Report Is About

       Most discussions of the proficiency-test controversy have been reminiscent of the story about the blind men and the elephant.  They have aimed to show only one side of the controversy and one part of the overall situation.  Hence they could not show relationships between one aspect of the controversy and another. 

       In this report I aim to present the controversy clearly and as a whole.  I aim to show the structure of the controversy, to examine its various threads and untangle them.  This means considering all the arguments pro and con, as well as the criticisms and rebuttals.  It means focusing on the issues (questions) that need to be answered in order to resolve the controversy. 

       This report will not provide many new facts, but rather aims to show what facts, and value-judgments as well, are needed to come to a justified conclusion.

       The proficiency-test program is a program in flux.  I will not be talking about any specific version, but rather about the general characteristics of the program (standardized tests in the fourth, sixth, ninth or tenth and twelfth grades, with passage of the ninth/tenth grade test required for graduation) as of spring 2001.

       The report is based on research plus six discussion meetings held at Lakewood Public Library from the end of February through the beginning of April, 2001, attended by a mix of teachers and other citizens.  Members of the Committee for the Fourth “R” provided valuable comments.  

       The accounts of the arguments will be not only anonymous but impersonal, with the exception of one published report.  In the controversy as depicted here, the players are arguments and issues, not persons.  The aim is to show the logic of the controversy – the structure of the reasoning – rather than the politics. 

 

Distinctions and Related Areas

       Understanding the controversy requires recognizing a number of distinctions, of which these seem to be the most crucial:

Standardized testing vs. High-stakes testing.  The proficiency tests are both standardized (identical throughout the state) and high-stakes (serious negative consequences attached to failure), but different conclusions follow from these two characteristics. 

       Within the concept of high-stakes testing, we may distinguish objectively high-stakes (carrying important consequences) vs. subjectively high-stakes (felt to be important by the persons involved).

Lower-grade tests (4th and 6th grade) vs. upper-grade tests (9th/10th and 12th). 

Among subjects:  reading (which by all accounts is fundamental and all-important) vs. science, citizenship or writing (with math perhaps in-between).

Among uses to which the tests may be put:  diagnosis vs. intervention vs. discipline.

Among types of school district – in particular, inner-city vs. suburban.

Between individual students taking the tests, and schools or school districts.

Among educational goals:  expertise for an elite (e.g. future computer designers) vs. everyday knowledge (e.g. how to make change for a dollar).

Between criticisms of the tests overall, and criticism of specific aspects.

Between judgments that apply universally and those that apply on a case-by-case basis.

 

Finally, it’s worth mentioning other areas of discussion or dispute that the proficiency-testing controversy relates to, most notably:

              school funding

              the purposes of K-12 education

              school vouchers

              educational innovation.

Discussions of proficiency testing almost inevitably lead into one or more of these areas.

 

The ultimate aim of any discussion of the proficiency tests is evaluation – deciding what to do about them.   For this, we need to know the arguments pro and con, and we need to identify the issues (questions) they raise.  Therefore I will give the arguments and the criticisms of the arguments and the themes that appear in these arguments.  I will then look at  the proficiency tests as a means to achieving the goals of the educational system and consider those goals, or lack of them.  After that I will look at the major issues raised by the controversy and finally I will suggest a few policy alternatives.

 

Arguments For and Against

The basic argument for proficiency testing, implicit in the state’s policies, begins with the assertion that education in Ohio ought to be improved.   One often-cited reason for this is the need for remedial teaching – teaching of material that should have been learned in high school – to large numbers of college freshmen.   From this basic premise, the argument proceeds:

         1) Ohio schools should achieve the goals of the educational system.

         2) To meet the goals of the educational system, it is necessary to make the schools more accountable to the state.

            Therefore, the schools should be more accountable to the state.

         3) Proficiency tests are the best and most effective way to make the schools accountable to the state.

            Therefore, proficiency tests should be given.

 

This is the basic argument for proficiency testing.   Other arguments, not appealing to accountability, have been given – for example, that the proficiency tests force school districts to re-examine their methods and curricula, and that the tests uncover talented students who for one reason or another have not done well in class.  However, most of the controversy centers on this argument.   Criticisms of the tests are all or almost all attacks on this argument, in one form or another, and specific arguments in favor of the tests consist of attacks against the attacks.

       We might note that the basic premise of the argument – referring to the “goals” of the educational system – is exceedingly vague.  I will revisit this point, but first I will look at the ways in which the argument, and therefore the proficiency test system, has been attacked.

 

Attacks on the argument:  rejecting accountability to the state

The first premise in the argument is not an object of attack.  No one would deny that the schools should meet the goals of the educational system. 

Some critics, however, reject the second premise.  They deny that the schools should be more accountable to the state, claiming that efforts toward greater accountability, such as proficiency tests, are simply an effort by the state to exert greater control over curriculum on behalf of the business community.  The argument against accountability – against the second premise of the argument above – can be put this way:

Local districts should have freedom to determine their own methods and curriculum.

More accountability to the state prevents such freedom.

 Therefore, local districts should not be made more accountable.

 

The counter-argument is that local control is unimportant compared with the need to provide good education for all students. 

 

Attacks on the argument:  rejecting proficiency tests as the means to accountability

However, by far the greatest amount of controversy concerns the third premise, which holds that proficiency tests are the best and most effective way to ensure accountability.

       The premise is not self-evident.   To begin, there are other ways to promote accountability.   For example, state inspectors – or better, inspector-consultants – could examine instruction in the schools.  (I am not advocating this plan.  I doubt that it would be popular with local school districts.  I merely mention it to show that there are alternatives.)   And of course requiring proficiency tests does nothing to provide funding needed to correct deficiencies that the tests may uncover.   (Here, discussion of proficiency testing immediately runs over into discussion of school funding.)   In this regard, proficiency testing may be a sort of unfunded mandate.

Here are the criticisms of the premise that proficiency tests are the best and most effective way to achieve accountability.  In considering such criticisms, we must look at the proficiency tests as an entire system, including the tests themselves, the way they are to be given, and the way the results of the tests are to be used.

a)  The proficiency tests require preparation that crowds out more valuable learning.

b)   Proficiency test results are closely correlated with socioeconomic level, so they are at best superfluous.  Worse, they are taken to indicate that poor performance is the fault of poor teaching; thus they are grossly unfair in addition. 

c) The tests cause excessive emotional distress among students and in some cases among teachers as well.

d)  They represent only one type of testing and therefore are inadequate to measure actual performance.

e)  They are badly crafted. 

Let’s look at these criticisms, and the arguments in rebuttal, more closely:

 

a) Criticism:  The proficiency tests crowd out more valuable learning.

       Critics paint a picture in which, because of the high stakes involved in the proficiency tests, teachers spend vast amounts of time “teaching to the test,” so as to ensure that their students do well.  This requires coaching the students in how to take a standardized test as well as extensive drill in the specific material the proficiency test is to cover.  This material is factual and low-level and the preparation for it leaves little time for teaching students how to be critical and creative.  (The assumption of course is that learning to be critical and creative is better than learning merely factual material.  Thus discussion of proficiency testing immediately crosses over into discussion of the goals of education.)

       Furthermore, since the tests are standardized, with standardized material, the teacher is hampered in attempts to adapt his/her teaching to the students’ particular characteristics.

Rebuttal:  Such extensive preoccupation with the tests ought to be unnecessary.  The proficiency tests only cover the minimum that every student should learn.  If teachers are teaching what they should be teaching, and teaching it adequately, students should be able to take the tests in stride, without undue preparation that takes time away from the “higher” parts of the curriculum.  Conversely, if teachers find that the proficiency tests cover material they are unfamiliar with, it is because they are not following the state guidelines.   

       This aspect of the dispute involves two ambiguous concepts:  “teaching to the test” and instruction in taking the test.

       “Teaching to the test” can refer to any of three scenarios:   1)  It may mean merely that a teacher tests students on what they have been taught.  (This might be called “testing to the teach.”)

Or 2) it may mean that certain tests are required and the curriculum must be fashioned to enable students to pass those tests. In other words, the test is used as a means of mandating a certain curriculum.  In this case complaints about “teaching to the test” are in effect complaints about the curriculum attached to the tests, and they raise the question as to what the curriculum should be.

Or 3) it may mean that students must be drilled specifically on possible test questions, either a) because the test questions do not sample significant material from the curriculum or b) because the students have been taught material covered on the test but have forgotten it.  In this case, of course, the test needs to be improved or the students’ memories need to improve.

       As for instruction in taking the test, we must distinguish two types.  One is mechanical instruction in filling in the bubbles, etc.  This is indeed sterile, but it’s hard to believe that it takes up any substantial amount of time.   The other type is substantive instruction in such things as methods for determining which multiple-choice answer is the correct one.  This may take time, to be sure, but the instruction may be valuable.  In fact, it may be the student’s first acquaintance with the notion of inference.

 

b) Criticism:  Proficiency test results correlate highly with socioeconomic status.  Therefore it is unfair to attribute low scores to poor teaching.

       Proficiency tests are an attempt to solve the problem on the cheap by endorsing a bogus accountability which scapegoats teachers and administration, when in fact the cause of poor performance lies in students’ backgrounds and the subsequent failure of state government to provide sufficient aid.

       The logical corollary is that money now spent on proficiency tests should be given to schools in poverty-stricken areas.

       The cutting edge of this criticism is a study by Randy L. Hoover, Ph.D., of Youngstown State University.   He found a correlation of 0.80 between test scores in 593 Ohio school districts and measures of socioeconomic advantage-disadvantage.
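       To give a sense of what a correlation of this size means, one common reading is to square it: under a simple linear model, the square of the correlation is the share of the variation in district scores that is statistically associated with the socioeconomic measure.  The short Python sketch below works through that arithmetic for the 0.80 figure cited above.  The function name is hypothetical, chosen only for illustration, and the calculation is not taken from the study itself; it describes association, not cause.

```python
def variance_explained(r: float) -> float:
    """Return r squared, the share of variance associated with a linear fit.

    `r` is a Pearson correlation coefficient, so it must lie in [-1, 1].
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# 0.80 is the correlation cited in the text; everything else is illustrative.
r = 0.80
shared = variance_explained(r)
print(f"correlation r = {r:.2f}")
print(f"share of variance associated with the measure: {shared:.0%}")
print(f"share left to other factors: {1 - shared:.0%}")
```

       On this reading, roughly 64 percent of the variation in district scores goes along with socioeconomic standing, and about 36 percent is left to everything else, teaching included.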

Rebuttal: Schools with low scores can improve, no matter what the socioeconomic situations of their students, and proficiency-test scores can be a stimulus to innovation.  Furthermore, the prevalence of poorly-educated graduates shows that inadequate performance is not confined to poverty-area schools. 

       In any case, this criticism has to do with the use of the proficiency tests, not the tests themselves.  It implies that the state should intervene to help low-scoring districts rather than blaming or threatening them.  So a combination of testing and intervention would be appropriate.  Schools in poverty areas would be given additional funds and tested to see whether they improve.  If they didn’t improve in, say, three years, they would be disciplined.  (In order to avoid the perverse incentive to perform poorly in order to get additional funds, low socioeconomic standing would be the basis for funding, not merely poor performance.)

 

c) Criticism:  Proficiency tests cause excessive emotional distress for students, and this distress, in addition to being an evil in itself, has deleterious effects on the student’s attitude toward school.  In addition, some teachers are highly distressed.  These injurious effects outweigh any benefit the tests may provide.

One hears many stories of mental and even physical distress – of students vomiting in the test room, of students dropping out because of the tests, of even good  students in the fourth grade worrying obsessively.   The complaints may be anecdotal, but they are too substantial to ignore.

Rebuttal:  Much the same can be said for this criticism as for the criticism of crowding-out.   If the proper curriculum is being taught, students should be able to take the test in stride, without stress.   If students have a medical/psychological condition that causes undue stress (determined independently, of course, and certified by a doctor), they can be excused from the tests.

 

d)  Criticism:  Proficiency tests by themselves are inadequate to show how much students have learned.

       An adequate assessment must be based on a variety of measures, because different students may have different strengths and weaknesses.   Among the other measures that should be used are student portfolios, written work, debate/discussion performances, classroom work as observed by the teacher, writing assignments and a variety of standardized tests other than the Ohio proficiency test.

       Assessment of schools or school districts should be based not only on test scores but also on other measures, e.g. library or computer resources and usage.

Rebuttal:  Standardized tests are useful because they are standardized.   Non-standardized tests are relative to the teacher who creates and grades them.  Therefore they provide no better measure of a student’s achievement than passing grades – and it is the hollowness of passing grades as a measure that has made proficiency testing necessary in the first place.

       If the present proficiency tests are to be replaced by a variety of standardized tests, then these would be subject to the same objections that apply to the present standardized tests.  Furthermore, any collection of tests would have to live up to the standards set forth by the Department of Education.   Which is to say that the Department would have to devise its own set of standardized tests, and that is what we already have in the proficiency tests.

 

e) Criticism:  Proficiency tests are badly crafted.

       Among the complaints are these: 

Questions are too difficult for the grade level (especially for lower grades).

The wording of questions is confusing, and since test administrators have no knowledge of the test until it is given, they can’t get clarification.

Questions are not phrased in the language that students are familiar with.

In exams on writing, the format may be foreign to the ways in which writing has been taught.

The testing schedule is too rigorous, especially for fourth or sixth grades – too much testing is crammed into too few days.

Grading is arbitrary, subjective, sloppy, and/or done too hastily.

Notwithstanding that students have multiple opportunities to pass a given test, a school and to some extent the individual student are judged on what the student has done on one particular day, and he or she may do badly on that day for extraneous reasons.

Rebuttal:  Questions on the proficiency test have been carefully composed by means of an elaborate process, and flaws will be eliminated as time goes on.  In any case, some persons will always see some flaws in every program.   Students have several chances to pass any one test, and that should allow them to familiarize themselves with the format.

       If the questions seem too difficult, it is because teachers haven’t covered the material.  As for grading, it is done in a well-supervised manner, and if it can be proven (by actual examination of the graders’ decisions) that grading is done too hurriedly, the remedy lies in making sure that graders work more slowly, not in eliminating the tests altogether.  

As for subjectivity in grading:  This never occurs in multiple-choice or true-false questions, but only in the relatively higher-level questions – the kind of questions that critics say should be emphasized.  Every teacher knows that whenever a question is answered by the student’s own response, the grading of that question necessarily has an element of subjectivity.  So the critics can’t have it both ways:  They can’t demand that questions call for creative responses, and then demand that the answers be graded as objectively as the answers to multiple-choice tests.

 

 

The Controversy in Brief:  Themes, Tradeoffs

     Here is a distillation of the foregoing arguments:

       At the core of the state’s justification of the proficiency tests is the concept of accountability, involving negative consequences for failure to perform up to standards set by the state. 

       In arguing for accountability, therefore, the state implies that poor-performing schools either:

              1) don’t know what they should be teaching

                                    or

2) do know what they should be teaching, but are unwilling or unable to teach it adequately.  (And if they are unable, it is because of culpable shortcomings on the school’s part, not because of circumstances beyond the school’s control.)

       (State intervention – i.e. intervention in the form of help rather than punishment – does not enter into the concept of accountability, for it does not rest on the assumption that the fault for poor performance is in the schools.)

       Opponents of the proficiency tests make some or all of the following claims:

·        The state has no right to dictate what should be taught.

·        The state doesn’t have a better idea of what should be taught than local districts and teachers.

·        Therefore the necessity of preparing for the tests crowds out worthwhile learning.

·        The high-stakes nature of the tests produces undue stress.

·        Standardized tests, or at least a single set of standardized tests, don’t provide a good assessment of student performance.

·        Poor performance is a result of socioeconomic factors and therefore cannot be fairly attributed to poor teaching.

·        The tests have crucial flaws some of which may appear only when the tests are being given and it is too late to correct them.

These themes will be expanded on below, in the section on issues.  Accountability will also be discussed at greater length further on. 

 

Tradeoffs/Dilemmas

 

Matching the state’s purposes with the criticisms just described brings out three sets of opposing considerations.  That is to say, there are three sets of extremes: for each set the questions may tend in the one direction or the other.  Thus the test designer is faced with three tradeoffs, or dilemmas, depending on your point of view. These are:

 

1)  Ideal content vs. minimal content.  On one hand, the test might be ambitious, covering everything we would want a student to know.  On the other, it might be pegged to a minimal level, covering the very least we would expect every person to know.   The former, of course, promises better results but is more demanding of teachers’ time, thus threatening to crowd out individual spontaneous effort.  The latter is less demanding but only promises to raise every student to the level of mediocrity. 

       Looking at the tests, I get the impression that the fourth and sixth grade tests are pegged at the ideal level, while the ninth grade tests are pegged at the minimal level.  This has not come out in the criticisms of the tests, however (although one fourth-grade teacher said, “If the public could see what fourth graders are required to do, they would be amazed.”)

       Pegging the tests at a level somewhere in the middle is a possibility, though it raises the question as to exactly what that middle consists of.  (More about this in discussion on the goals of education, below.)  Also there is the question as to whether the  middle course would  have the best of both worlds or the worst of both worlds. 

2)  High pressure vs. low pressure.   At the one end, seriously unfavorable consequences are attached to poor performance.  This, it would seem, motivates teachers and educators to a high degree, but it also produces the stress that has brought complaints.  On the other end, the unfavorable consequences are slight, which means less stress but perhaps less incentive to perform well.

3)  Standardization vs. customization.  At the one end, complete standardization of the tests allows for across-the-state comparisons and discourages teachers or schools from going off in their own possibly-unproductive directions, but it allows for no innovation by the individual teacher or school.   At the other end, allowing teachers or schools to customize their curricula has the opposite faults and the opposite virtues.

       Needless to say, the present proficiency tests are at the standardization extreme.

 

Explanations of non-factors:

Before continuing the explanation, I will mention several points that have come up in discussion but which do not seem to play a significant role in evaluating the proficiency tests:

-- Curriculum models:  These are handed down by the state for local districts to follow, and the proficiency tests are based on them.  On some accounts they are significant because if teachers have not been following the models, the materials covered by the tests will be foreign to them.  Conversely, it is claimed, once the teachers begin to follow the models, their students will be able to take the tests in stride.

       This may be true, but it is still true that the curriculum models merely express the same requirements that the proficiency tests do.   It is the requirements themselves that are significant, not the fact that the models reflect them.  The curriculum models would serve to rebut an argument that the state fails to disclose what is in the proficiency test, but that is all.

-- Grading methods:  These have come under some criticism, alluded to above.  However, as mentioned, there will always be some element of subjectivity in the grading of any answers given in the students’ own words, and any further failings can apparently be corrected.  So the low quality of grading (if indeed it is of low quality) may be one item in an accumulation of flaws that weigh against the proficiency tests, but by itself it doesn’t merit consideration.

-- Percentage of correct answers required:   In many of the tests, a rather low score is required to pass.  I have not taken this into consideration, because a teacher will feel compelled to teach all the material covered by a test, even though students are only required to know part of it.

 

 

On Proficiency Tests as a Means to a Goal

 

       As mentioned above, the rationale for proficiency tests is that they are necessary for accountability. Accountability, in turn, is required to achieve the goals of education.  It does so through its three essential elements:

       1) A clear statement of specific and agreed-on goal(s) of the educational system.

       2) Assessment of the achievements of students in meeting that goal.

       3) “Teeth,” that is, consequences for not meeting assessment standards that will induce students and teachers to give the tests high priority.

These three elements are necessary to one another.   Educational goals have no effect unless students are assessed to determine whether they are achieving those goals or not.    Assessment has no effect unless there are “teeth” to give incentives for avoiding failure.  So the rationale for the assessments and that for the “teeth”  both rest on the clear statement of specific and agreed-on goals.

 

No clear and specific statement of goal(s).   But in fact, there is no clear statement of specific and agreed-on goal(s) from the state.   All we find are vague general statements and empty rhetoric, such as “high expectations,” “meeting expectations of employers,” “moving into the workforce,” “moving on to higher education,” “success of all children,” “high-quality education” and the like.  These statements tell us nothing; they raise more questions than they answer.   What is a high-quality education?  What constitutes success?  What is required to move into higher education?  What expectations do employers have, and which employers are we talking about?   We find no answers to such questions.  We find nothing but high-sounding but vacuous phrases, with no statement specific enough to be useful.

       The proficiency-test system is in effect a chain of reasoning culminating in a set of quite specific judgments – pass or fail – about the students who take the tests (and about teachers and schools).  You cannot legitimately derive such specific judgments from vague and meaningless statements such as the ones just mentioned. 

       Compounding the difficulty is the fact that for some fields, different goals are appropriate for different students.  Math is an example.  We would like to have some students – but not all – reach the level where they could, say, calculate the trajectory of a missile.   For more ordinary students, we would like to see them reach the level where they can understand a loan application.  Two completely different levels of achievement – and therefore two completely different sets of test questions – apply to these two groups, and the fact that one math test applies to all is testimony to the failure of the test-makers to get a grasp on the goals they want to achieve.

Goals implicit in the standards and questions?  In reply, it might be claimed that the specific goals to be achieved by the education system are intuitively obvious and are reflected in the test questions themselves along with standards (“outcomes”) that the questions are based on.  In other words, it is claimed, the test-makers had the proper educational goals in mind when making up the questions on the tests, and so passing the tests will mean achieving those goals.  For example, a math test question will require calculation of a rectangular area, because one of the goals of the educational system, obviously, is to enable graduates to determine the areas of walls or other surfaces they will encounter in their lives.

       This claim is not borne out by consideration of the questions and the standards.  Consider the sixth-grade science test of March 2000.  One of the questions concerns the three forms of energy – kinetic energy, heat energy and potential energy – that are involved when a boy pulls a swing on the swingset.   This question calls on the student to identify the transformations between these types of energy as the swing moves back and forth, and asks which of the transformations causes the swing to stop moving.  What goal is achieved by requiring all students to understand these energy transformations, and why do they need to understand it by the sixth grade?

       If the sixth-grade science test may seem too advanced, the ninth-grade science test may seem too easy.  Most questions ask the students to read maps, charts or diagrams (which means that they do not test scientific understanding but rather the ability to read maps, charts or diagrams).  Such conceptual understanding is laudable, but there is nothing concerning such concepts as controlled experiment or other aspects of the scientific method, nor is there any mention of sampling bias, nor the difference between the validity and the reliability of tests – all of which are directly relevant to judgments a person must make as consumer and citizen.

       Science is a field in which there is a marked distinction between experts and laypersons.  Let’s look at a field – “Citizenship” – whose requirements hold for all alike.  The ninth-grade citizenship test for March 2000 consists of questions touching on a variety of topics including U.S. history, economics, geography, consumer choice, and basic attributes of the government in the U.S.  Some are quite elementary (e.g. locating the United States on a map), and a few are fairly sophisticated (e.g. drawing an inference from a newspaper story).  But there is nothing about the role (if any) of money in politics.  There is nothing about civilian control of the military, an essential and glorious aspect of our political culture.  (In some other countries, a general like MacArthur would have become dictator instead of merely fading away.)  There is nothing about market imperfections (as in the case of pollution, for example), a subject vital to the understanding of free-market economics.  There is nothing about alternative forms of democracy, such as the parliamentary system or proportional representation.  If we are giving a test to determine whether a student is fit to become a citizen, it seems to me that some understanding of these topics would be an essential ingredient.  But they are completely ignored. 

 

       In pointing to these examples, my purpose is not to argue for a different set of questions, but only to demonstrate that the selection of questions and standards is to some extent arbitrary, lacking a connection to any specific and well-justified set of educational goals, either explicit or implied by the standards and questions.  

 

The other elements of accountability are therefore unjustified.   Remember that the proficiency tests are justified by their (supposedly) being necessary for accountability, and that accountability has three related elements of which the first is a clear statement of specific agreed-on goals.   The second element – assessment – is justified by its relation to the first, and the third element – the “teeth” – is justified by its relation to the second.   If the first element is missing – as we have seen to be true – then both of the other elements lose their justification.  It cannot be claimed that the proficiency tests assess achievement of the proper goals of education, and therefore it is unjustified to inflict sanctions (the “teeth”) on those who do not succeed in the tests.

       The proficiency test system eventuates in sharply specific judgments about the status of specific students.  And just as you cannot derive gold from lead, you cannot legitimately derive specific judgments of fitness from vague and mushy statements of basic goals.  There is no substitute for first  working out a clear statement of specific agreed-on goals and then deriving standards and questions from that statement.   (This may be a difficult process, perhaps impossible, but still there is no substitute for it.)

 

The standards and questions set the de facto goals.   In reality, of course, the goals of the proficiency tests – and therefore of the education system as a whole – are determined by the standards and questions themselves.  The standards and questions are made up by educators and others through a process of reasoning best understood by them, and whatever goals these standards and questions conduce to are the de facto goals of the education system.  Needless to say, this puts the cart before the horse. 

       In commenting on the meaning of “intelligence,” some psychologists sarcastically say, “Intelligence is what intelligence tests test.”  Similarly we can say, “Proficiency is what proficiency tests test.” 

 

What are such tests good for?  Still, there must be tests, so it is useful to ask in what ways the proficiency tests are valuable, and in what ways they are not.

What such tests can do:

•      They can provide comparisons of students and schools with respect to the kinds of materials covered.   Of course, drawing any implications from such comparison is another matter.

This brings up an interesting point that has been made in defense of the proficiency tests, namely comparison with the SATs.  The SATs, it is said, are standardized tests – even more standardized than the proficiency tests – and they have served very well.  However, laying aside the multitudinous controversies over the SATs, there are significant differences.  The SATs do not serve to divide students into successes and failures, as the proficiency tests do.  Rather, they serve simply to rank applicants in the competition for places in a college class, based on their purported ability to predict success in college.  So if they are justified it is because they do predict success.   But as we have seen, there is no way the proficiency tests can claim to predict success.

•      They provide diagnostic information about an individual student.  This may serve as a signal that intervention is needed and/or that the instruction being given that student may need further examination.  (Note the difference between saying that the instruction may need further examination, vs. judging the instruction necessarily to be defective.) 

•      They provide the occasion and the impetus for school districts and individual teachers to examine their curricula (and to some extent this holds true for the state educational apparatus as well).

       What such tests cannot do:

•      As I have discussed at length, they cannot claim to be a justified measure of a student’s success in achieving agreed-on educational goals. 

•      They cannot claim necessarily to test material more valuable than what a teacher would be teaching in the absence of the tests.   (To be sure, the proficiency-test material may well be more valuable than what a teacher would be teaching otherwise, but it could be the other way around.  What if a teacher wanted to teach about market imperfections?)

 

Issues

 

The goal to be achieved by the proficiency tests, however important, is only one aspect of the program, and raises only one of the issues to be addressed in evaluating the tests.   Here is a list:

      

What are the proper goals of K-12 education?  (What is a good K-12 education? What materials must be mastered?)

 

The importance of this issue has been dwelt on at length.  Here I will only mention some of the dimensions to be considered.

Factual vs. conceptual vs. critical/analytical.  By all accounts, education should go beyond merely factual instruction to impart an understanding of concepts, but that leaves open the question as to how much factual knowledge should be instilled.  And beyond conceptual knowledge are the analytical and critical abilities – perhaps best called “reasoning abilities” – e.g., recognition of premises and conclusions, analysis of evidence claims, and the like. 

Strictly intellectual vs. “creative” or “holistic.”   One of the familiar complaints against the proficiency tests is that they crowd out efforts to make children “creative.”   Similarly, it is claimed that education should be well-rounded (“holistic”), involving the arts, foreign languages, career development, sports, and the like.  (The notion of “multiple intelligences” appeared in discussion of this point.)  Clearly, public education is devoted to making its students well-rounded in this sense.  But how far should it go in this direction, and exactly what should be included?  (Should character-building be part of the educational program, for example?)     

Abilities for all, vs. abilities for some.   This has been touched on above, in connection with math, and it also applies to science and possibly to writing.  We want to put some students on the road to elite technical expertise, but it would be both unrealistic and uneconomical to try to put all students on this road.  What goal(s) apply to all students and what goals apply to only selected students, and how do we define the select group in a particular case?

The role of reading.  Reading is uniquely important.   How can its importance be recognized in practice? 

 

Do the proficiency tests conduce to the achievement of the proper goals of education (whatever they may be) and to what extent?

       This and the next issue assume that the proper goals of education have been determined.   Given that we have a specific and agreed-on conception of these goals, we can ask how the proficiency tests relate to them.  Three possibilities have appeared:

1)  Proficiency tests fail to cover all worthwhile curricular material, and they are so obtrusive that they crowd out much of the worthwhile curriculum.  (Criticism a.)

2)  Proficiency tests merely cover the minimum that is necessary for competence as a citizen and functioning human being.  Therefore, students and teachers should be able to take them in stride.  (Rebuttal to criticism a.)  On this view, it makes little difference how we define a good education, for the proficiency tests will cover the minimal basics in any case. 

3)  The proficiency tests cover material that ought to be covered and challenge students and teachers to attain a higher level of mastery; they raise expectations.

       Needless to say, there are conflicting answers on this issue.   In looking at the conflicts, we find views on the proficiency tests themselves entangled with differing views on what the goals of the tests should be, which is of course the previous issue.  A clear understanding requires that these two types of views be distinguished. 

 

In the absence of the proficiency tests, to what extent would teachers be achieving the proper goals of education? 

       To point out deficiencies in the state’s approach is not to imply that all teachers and school districts are beyond reproach.  The situation of course varies from case to case.  But we need to have some sort of generalization or summary view if we are to evaluate the proficiency tests.  

The picture is far from clear.   When teachers complain that the tests distort their curricula, they are of course implying that what they would otherwise teach is of high value.  But some observers contend that the proficiency tests have forced teachers and school districts to re-examine their curricula, the implication being that the pre-existing curricula were below par. 

We need to know more about this.   We need to know, for example, exactly what teachers refer to when they use terms such as “creativity.”   Or to what extent teachers are introducing the elements of critical thinking and reasoning. 

It’s difficult to find answers, largely because as citizens we know very little about what happens in the classroom.   Through the media we get on the one hand descriptions of exceptional classrooms, and on the other abstract statistics such as school report cards.  But exceptional classrooms tell us little about the general case, and abstract statistics are not only open to various interpretations (as we have seen), but also tell us nothing about actual teaching processes.   Parents see the teaching process through the highly distorted lens of their children’s accounts.  And that is all.   We could use an overall survey, for example one that describes various models of teaching in each subject and tells what proportion of classes conform to each model.  That is a tall order.  In the absence of such surveys, we need to recognize our uncertainty.

 

To what extent should educational content be determined by the state, and to what extent by local districts and individual teachers?

       This issue arises primarily because of the argument that the state should not control educational policy because state control is anti-democratic and excessively business-oriented.  If this argument is rejected, then the desirability of state control is an issue only as it is related to issues of standardization or of content, as previously mentioned.

 

What is the best method for assessment:  Standardized tests or tests created by individual teachers?

       On the one side it is argued that standardized tests are narrow and routine.  On the other side it is argued that only standardized tests can compare students across the state and escape the relativism in teachers’ assessments. 

 

What is the explanation for poor test scores?   Is the strategy of holding teachers and districts accountable an effective one?

       As we have seen, the state seems to imply that teachers and school districts are responsible for poor test scores and that if they are sufficiently disciplined the scores will improve.   Critics, on the other hand, contend that poor scores are to a high degree the product of unfavorable socioeconomic conditions.

Is it possible to implement other forms of accountability, not using standardized tests?

       I have used the term “accountability” to apply to a system based on standardized tests, i.e. the proficiency tests.   But other forms of accountability are conceivable.  For example, schools might be held accountable for using certain practices as prescribed by an inspector-consultant (as mentioned earlier).   Whether an alternative method would be feasible and desirable – especially financially – is of course an open question.

      

Can flaws in the proficiency tests be repaired?  If not, how serious are they? 

       The flaws cited by critics might be inherent in the system of proficiency tests, in which case they would provide reason for abolishing the tests.  Or they might be relatively superficial, in which case teachers and students can reasonably be expected to live with them.  Or they might be subject to correction, in which case they are not an objection to the proficiency tests per se.


Policy Alternatives

       Finally, here is a list – clearly not exhaustive – of policy alternatives suggested by the issues, along with their rationales.  We begin with the two extremes:

I.   Keep the program essentially as it is, and everything will fall into place.

Everything covered in the tests is material that students should know.  If teachers have trouble, it’s either because they haven’t aligned with the model curriculum or because they are going off in their own directions.   Once teaching is aligned with the model curriculum, teachers should have no trouble and students shouldn’t feel stressed.

I.a).   Keep the program, but make a strong commitment – including a financial commitment – to fixing the flaws that have furnished cause for complaint.  For example, pay graders enough to ensure that they are all properly qualified and can spend sufficient time on each test.

 

II.  We know what’s wrong and testing won’t help, so scrap the program.

Test scores are determined by socioeconomic status, so they really don’t show anything.  Since they don’t show anything, they can’t be used to make teachers or districts accountable, and in the meantime they degrade the curriculum and cause unnecessary stress.   Money saved by scrapping the program can be used to help schools in poor areas.

 

III.  Continue proficiency tests as they are, but take the pressure off.

Test scores are good indicators of achievement, but it’s unfair to use information on achievement to make judgments about the quality of teaching.  So let the schools use the scores diagnostically, and use the results to show which schools need extra help.

III.a).  Create an index of socioeconomic level, and rank schools according to this index.  Also rank them according to their test scores.  In the cases (probably few) in which a district’s or school’s test-score rank falls significantly below its socioeconomic rank, single that district or school out for attention – find out what it is doing wrong, and take steps to improve its performance.   (For schools with low scores on both the socioeconomic index and the index of test results, the remedy lies in more equitable school funding.)
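The comparison of ranks proposed in III.a can be illustrated with a minimal sketch.  Everything in it is an assumption made for illustration: the index values, the scores, the school names, and the threshold for what counts as “significantly below” (here a gap of two places in rank) are all invented, and ties in rank are ignored.

```python
from dataclasses import dataclass


@dataclass
class School:
    name: str
    ses_index: float   # hypothetical socioeconomic index; higher = more advantaged
    mean_score: float  # hypothetical average proficiency-test score


def rank_by(schools, key):
    """Return {school name: rank}, where rank 1 goes to the highest value of `key`."""
    ordered = sorted(schools, key=key, reverse=True)
    return {s.name: position for position, s in enumerate(ordered, start=1)}


def flag_underperformers(schools, min_gap):
    """List schools whose test-score rank trails their socioeconomic rank
    by at least `min_gap` places (an assumed reading of "significantly below")."""
    ses_rank = rank_by(schools, lambda s: s.ses_index)
    score_rank = rank_by(schools, lambda s: s.mean_score)
    flagged = []
    for s in schools:
        gap = score_rank[s.name] - ses_rank[s.name]
        if gap >= min_gap:
            flagged.append((s.name, ses_rank[s.name], score_rank[s.name]))
    return flagged


# Invented figures for illustration only; these are not real schools or results.
schools = [
    School("A", ses_index=0.9, mean_score=88.0),
    School("B", ses_index=0.8, mean_score=61.0),
    School("C", ses_index=0.5, mean_score=75.0),
    School("D", ses_index=0.3, mean_score=70.0),
    School("E", ses_index=0.1, mean_score=55.0),
]

flagged = flag_underperformers(schools, min_gap=2)
print(flagged)
```

With these invented figures only school B is singled out: it ranks second on the socioeconomic index but fourth on test scores.  School E ranks last on both measures and is not flagged, which corresponds to the case in which the remedy is held to lie in funding rather than in intervention.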

IV.  Replace the present program with a varied and integrated system of diagnostic assessments and interventions.

       Give a multitude of tests, at all levels, and use them in the first instance for diagnosis, in combination with well-funded interventions.   For example, a consultant visiting a school might begin by entering into a dialogue with teachers and administrators about the school’s educational goals.  Then the consultant would survey practices in use at the school and suggest changes where appropriate.  The various assessments and the reports of the interventions would provide a complete, well-rounded picture of a school’s or a teacher’s competence, and on that basis state officials could decide what more, if anything, needs to be done.

 

V.  Make the proficiency tests into a measure of truly minimal standards.

The trouble with the tests is that to a large extent they don’t measure what every student must know, but rather what every student should know.  So review the tests and cut them back to ensure that they are in fact testing the minimal level required and can be taken in stride while teachers proceed to higher-level, more innovative material. 

V.a).  Also create another set of tests to measure achievement beyond the minimum.  Use these for diagnostic purposes only.  (Even with this provision, there will be a danger that these tests come to play the same role, with all its drawbacks, that proficiency tests now play.) 

VI.   By their tests you shall know them.

       Require every school district to make up its own set of tests, and make sure they are administered fairly.   Disseminate the tests themselves, along with results.  The tests composed by a school district will be the best reflection of that district’s expectations, and so they will give interested parties – parents and professionals, mostly – a good picture of its character and capabilities.  Tests that are exceedingly elementary and/or scores that are exceedingly low may provide reason for state intervention. 

 

VII.  Reading is everything.  So expand the role of the reading tests and do away with all the others. 

Reading is crucial for almost all kinds of academic achievement, including success on the proficiency tests.   So enlarge the set of reading tests.  Have more of them, so that any one test becomes less important, and therefore students feel less pressure.   For students who do poorly, have state-funded intervention programs.    Such a guarantee that every student will read well is worth the trouble and effort.   Tests in the other subjects are not, so get rid of them.

                                                                                                                          - G. B.