Educational Research - Discovering the Truth, Learning the Tricks or Forecasting the Weather?

Michael Bailey
School of Educational Psychology, Measurement and Technology
University of Sydney NSW 2006
e-mail: m.bailey@edfac.usyd.edu.au

Abstract

Research in education is produced in ever-increasing amounts. Despite this, many people believe that the growth in our knowledge has been slow. This paper argues that most empirical educational research is implicitly based on one of two models or belief-structures about the nature and purposes of educational inquiry: either that it should be an effort to discover universal laws which apply to human thinking, learning and behaviour, or that it should be an attempt to provide prescriptions for the appropriate methods and techniques to use in facilitating learning. This implicit basis results in findings which, in quantitative research, use inappropriate statistical assumptions. It also leads to conclusions being expressed with too much generality. The paper proposes that a third model may more closely reflect the nature of the reality which we are investigating: that we should explicitly recognise the difficulty of generalising about people, and should report findings and conclusions in terms of probabilities and expectations with limited scope of application, while believing this limitation to be inherent in the nature of the phenomena being studied, as in the case of weather-forecasting. Situated cognition has been accepted; situated assessment is increasingly acceptable; now it is time for a model of situated research findings.

This argument was originally devised as a reaction to Michael Scriven's (reprinted 1986) paper "Evaluation as a Paradigm for Educational Research".
In a more general way it seeks to present a case for a view of educational research of a non-evaluative kind which accepts neither the simplistic (Physics-like or Biology-like?) paradigm rightly criticised by Scriven and also by Glass (1972) and others, nor the nihilistic view that pure research in education is impossible and that the proper role of educational researchers is the evaluation of specific techniques and tricks designed for the improvement of particular learning experiences.

It is inevitable that any attempt to study situations in which the effects on and reactions of individuals are the main focus will run into difficulties of generalisability, but I do not believe that it follows that we should give up, or declare the attempt to be out of bounds or illegitimate. One role of educational research may be to study the application of general principles in particular contexts. To avoid, or limit the use of, the classical experimental approach is not, however, to be unscientific; many sciences - cosmography, palaeontology and climatology, to name only three - by their nature cannot be based on experiments. The assumption that scientific method is confined to the development of general laws by repeated experimentation is quite erroneous.

It is my contention that Scriven's argument for the central role of evaluation as a research model in education is too simplistic. In asserting the central importance of evaluation to educational research it ignores the influence of ideology on evaluation. As Campbell (1988a) puts it (p. 323): "The evaluation model we offered mistakenly bought into the logical positivist's definitional operationalism, specifying as program goals fallible measures open to bureaucratic manipulation." Moreover, in attempting to declare fundamental issues of, say, learning theory, as off-limits to educational research and the exclusive concern of psychologists, it does not make them any easier to resolve or less relevant to educators.
Scriven's view may stem from the valid insight that "scientific" research in the reductionist, Laplacian sense hasn't got us very far. However, not only is this paradigm no longer in use in the "hard" sciences, but the criticism that educators have learned little from educational research could be applied with equal or greater force to psychology. When psychologists have studied human learning, the kinds of learning they have studied have usually been reductionist in nature and avoided the problems of cognition and cognitive processing; thus they have had little relevance to education. This may account for the persistence of the behaviourist approach and such simplistic measures as "time-on-task".

This is not to say that I disagree with Scriven's definition of educational research as "research that contributes to the facilitation of education" (though I'm not sure why he excludes learning theory, which I would have thought fundamental: my criticism of it is that most of it doesn't tell us much about how, still less why, people learn). Many kinds of research can facilitate education (the physiology of visual perception is a good recent example), and they don't need to have any evaluative components.

However, Scriven also offers an instance of a practical research approach: "how to maintain order in an unruly classroom". I agree that we don't need to develop a theory of unruly behaviour, but his prescription suggested to me, not a sudden access of commonsense, as he asserts, but only that he had spent too long remote from the practical problems of classroom management. He proposes to "begin by identifying a number of practitioners who are outstandingly successful at the task in question; you must then use all the tricks in the book to identify the distinctive features of their approach ... you then teach new or unsuccessful practitioners to use the winning ways and retest until you get an exportable formula."
But if there were any such exportable formulae, there would be no disorderly classrooms. The problem is, as every practitioner knows, that sometimes it works and sometimes it doesn't; every competent teacher has had the experience of doing apparently identical things with different groups of learners and getting diametrically opposite reactions; every experienced teacher has worked with successful, and unsuccessful, colleagues, who use every conceivable kind of classroom management technique. What works this year doesn't work next year, what works for Ms A doesn't work for Mr B, what works in Downtown High doesn't work in Suburban Road. Research findings may help to refine and improve specific elements of technique, but the broad issues of education cannot be handled by the cookbook approach for the same reason that Alice couldn't carve the pudding: it wasn't a lack of technique, but the unexpected complexity and elusive nature of the interaction, as soon as some human characteristics were revealed by the pudding. All this is hardly new; Cronbach (1975) pointed out that complexity of circumstances and changes over time were inherent in educational investigations. Kuhn's paradigm-embeddedness appears to require the constant reinterpretation of conclusions and decisions, and Campbell's (1988b) argument that we need to accept at least a degree of ontological relativism in educational research appears conclusive. However, as he also points out (p. 518), ontological nihilism, or complete situational and historical uniqueness, would eliminate the possibility of theory development and even the emergence of shared meanings. We have to find a way to proceed. The recent development of chaos theory suggests a way of understanding the nature of this complexity. In education we almost certainly are often dealing with chaos phenomena, where very small changes in initial conditions can rapidly generate wide variations in system output. 
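What "very small changes in initial conditions" can mean is easy to show with a toy system. The sketch below uses the logistic map, a standard textbook example of chaotic behaviour; it is offered only as an illustration and is not a model of any classroom. The function name, the parameter value r = 4.0 and the two starting values are assumptions chosen for the demonstration.

```python
def logistic_trajectory(x0, r=4.0, steps=50):
    """Iterate x -> r * x * (1 - x) and return all states, starting with x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two starting points that differ by one part in a million (illustrative values).
a = logistic_trajectory(0.200000)
b = logistic_trajectory(0.200001)

# Absolute difference between the two trajectories at every step.
gaps = [abs(p - q) for p, q in zip(a, b)]
max_gap = max(gaps)

for step in (0, 10, 20, 30, 40, 50):
    print(f"step {step:2d}: gap = {gaps[step]:.6f}")
print(f"largest gap over the run = {max_gap:.6f}")
```

The two runs start almost indistinguishably close, yet within a few dozen iterations they differ by an amount comparable to the whole range of the variable. The rule itself is completely deterministic; it is the practical predictability that is lost.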
Large effects do not necessarily require large causes. Cziko (1989), Sungaila (1990) and Braggett (1992) have offered some relevant ideas on the applicability of chaos concepts in education, but there may be many more applications than they consider. In addition, as will be argued, the idea of chaos theory as applied to education may throw light on the apparent lack of relevance of much educational research rightly complained of by Scriven. We should expect similarities, and characteristic patterns, but not necessarily simple deterministic predictability.

It must be admitted that many writers do not share Scriven's rather one-sided - perhaps polemical - viewpoint. Orteza Y Miranda (1988) takes a rather more balanced view than Scriven of the nature of educational research. She classifies it broadly into conclusion-oriented research - what is the case, and how it came about - and decision-oriented research - what to do, approximately equivalent to Scriven's position. Noting that action in education must proceed anyway, and cannot wait for research findings, she defends collaborative (action) research as a third valid approach despite the difficulty of generating sufficient research expertise in teachers, and the problem of aligning academic researchers' viewpoint (usually of events) with that of teachers (often of intentions).

Getzels, in Suppes (1978), wrote: "Research may have the greatest effects ... not so much where it attempts to alter an element of practice directly (in the mode of the packager) as indirectly ... where it raises new questions and contributes to transformations in the general paradigms of the human being ..."

In the U.S., again, in the previous year, the National Institute of Education (1977) produced a report which distinguished four purposes, or regions of influence, for research in education: the view of reality (including language and concepts as well as values), a vision of the achievable, know-how, and the commitment to act.
Scriven seemed mainly concerned with the third, the very one with which Getzels suggested we are perhaps too narrowly preoccupied. Scriven may have been flogging an all too willing horse.

A more recent development in our ideas about what research should do is that it may also seek understanding: trying to get behind the obvious and immediately observable, and at least to incorporate a humanistic perspective. Here Scriven may perhaps be forgiven for writing in a particular historical context, when the "positivist" or naive quasi-scientific model was still influential in the social sciences, and the chief purpose of researchers was still to mould reality nearer to their (or others') desire - the social engineering model.

In what reads like a remarkably prescient analysis of current tendencies in Australia, Wrigley (1989) also took a rather different view from that of Scriven. He noted with concern the increasing emphasis on sponsored and commissioned research in Britain. He saw this as encouraging evaluation and short-term research at the expense of fundamental research and the construction of an adequate theoretical base, and as threatening both the time available for research - absorbing it in the preparation of elaborate and detailed research proposals and tenders - and academic freedom - since those who commission research, like the Queen of Hearts, usually want the findings of investigations to support their already-held views. Wrigley's argument appears to support that of Getzels referred to above in asserting that there is too much research, rather than too little, which fits Scriven's desired model.

But what is actually happening in educational research apart from these commissioned evaluations? Plainly educational research is a growth industry. The most casual glance at the shelves of a university library, or reference to the ERIC index, will reveal the ever-increasing amount of published educational research.
One would imagine, therefore, that knowledge in the field would be increasing at the same rapid rate. However, this doesn't seem to be happening. Perhaps we should ask why it isn't. If we are discovering the truth, why does it seem to be so elusive?

The first thing to note is that, even in conclusion-oriented quantitative research, conclusive findings are rare. It appears to be obligatory to write "More research is needed", or words to that effect, at the end of every research article, despite the customary use, in the analysis section of any study with a quantitative component, of decision-theory logic which would appear to imply quite the opposite. It would be refreshing to find an author claiming to have made the definitive study of a problem, and settled it once and for all.

But then research paper titles often have little relationship to the actual content of the study. Consider the following three published papers (see note 1), selected largely at random from articles whose titles sounded interesting to me, all from respectable international refereed journals, and all published within the last seven years:

1. Patterns of spatial awareness
2. Teacher judgement in student evaluation: a comparison of grading methods
3. Homework: a survey of teachers' beliefs and practices.

At first glance, one might expect the first paper to deal with issues of some generality in the field of perception and cognition, the second to be a methodological paper in the field of assessment and evaluation, and the third to be a fairly wide-ranging descriptive study of its topic. Readers who expected any of these things would be in for some degree of disappointment - but perhaps not if they merely read the conclusions of the papers. Briefly summarised, these were:

1. The development of map-reading skills in students is needed and will assist cognitive understanding in the social sciences.
2. Teachers are able to discriminate between effort and achievement ratings, and their professional judgement based on knowledge of students' work over a period makes a significant contribution to the prediction of standardised test results in addition to that made by student grades.
3. Teachers believe that homework is a good thing; whole-school policies should be used to ensure that expectations are clear, and in-service education for beginning teachers is needed to ensure that they understand the benefits of homework.

The disappointment emerges when one looks at the evidence on which these conclusions are based.

Study 1 was a long-running study of first year geography students in an American State College. Data were accumulated over each of two five-year periods on the performance of students (about 450 in all) in identifying the States of the U.S.A. on an outline map. Many of them were not able to identify many of the states correctly, and in the more recent five-year period somewhat fewer of them could do so than in the earlier period. Whether the intakes were comparable, and whether any task beyond simple recognition should have been assessed, were issues not considered.

Study 2 was a correlational study of the judgements of 39 Primary teachers in a U.S. school district compared with standardised test results of students. Each teacher was asked to rate five students at different ranking levels within their class, and the analysis treated the data-set as 195 cases independently randomly sampled from the population of students, rather than 39 cases from the population of teachers. Nesting of teachers within schools was also not considered.

Study 3 surveyed 120 teachers in a single American secondary school using a researcher-constructed Likert-type attitude survey instrument. The response rate was 52%.

We may note that all three of these studies would have satisfied Scriven's primary criterion as being evaluative in character.
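The cost of ignoring the nesting in Study 2 can be sketched with the standard design-effect approximation for clustered samples of equal size, n_eff = n / (1 + (m - 1) * ICC), where m is the cluster size and ICC is the intraclass correlation. The figures 39 and 5 come from the study as described above; the ICC values below are purely hypothetical, since the paper in question is not identified and no such estimate is reported here, and the function name is an illustrative assumption.

```python
def effective_sample_size(n_clusters, cluster_size, icc):
    """Design-effect approximation for equal-sized clusters.

    n_eff = n / (1 + (m - 1) * icc), where n is the total number of cases,
    m is the cluster size and icc is the intraclass correlation.
    """
    n = n_clusters * cluster_size
    design_effect = 1.0 + (cluster_size - 1) * icc
    return n / design_effect


# 39 teachers each rating 5 students; the ICC values are hypothetical.
for icc in (0.0, 0.1, 0.3, 0.5, 1.0):
    n_eff = effective_sample_size(39, 5, icc)
    print(f"assumed ICC = {icc:.1f}: effective sample size = {n_eff:.1f}")
```

Only when ratings made by the same teacher are entirely unrelated (ICC = 0) do the data behave like 195 independent cases; if they were perfectly correlated (ICC = 1) the study would carry the information of just 39 cases. Any real value lies somewhere in between, which is why significance tests that assume 195 independent observations are likely to overstate the evidence.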
Moreover, only one of the three studies was methodologically dubious, and that only because it used the very common procedure (see note 2) of disregarding the nested structure of the data. Study 1 and Study 3 certainly added something to our knowledge, and so, if we overlook the methodological problem, did Study 2. But what did they add? Perhaps it was not quite so much as the authors' conclusion sections led us to suppose.

We might think of each of the issues researched as being a sort of giant mosaic, in which each of the authors has placed - perhaps quite correctly - one very small piece. To pretend that this tiny piece is the whole picture, or even that it can indicate the nature of that whole picture, is manifestly absurd. However, if the big picture really is a mosaic, only the accumulation of little pieces will reveal it.

Meta-analysis constrains methods to comparability, averages out particularities and risks ending up resembling fortune-telling, where the predictions are sufficiently vague and general to fit just about anything that actually happens. Perhaps any attempt to enunciate universal laws in education is doomed to produce nothing more than fortune-telling: the universal (or null) prediction. An attempt to determine the average colour or pattern may result in failing to see the regularities and repetitions that do exist, and we should be wary of social scientists who continue to talk as if simple universal laws - or perhaps universally valid rules - are waiting to be discovered by careful educational researchers.

As a better method, not placing too much emphasis on the band-aid, cook-book model of research, Campbell (1988a) has forcefully argued for the increased use of replication of studies.
Meta-analysis has encouraged replication where the studies are as comparable as possible, but, as Campbell has pointed out, a more powerful replication would be one which did not do that, but one where the researcher considered the problem, and independently developed their own way of investigating it. The added power of triangulation would then be available. Of course this is not, as Campbell himself points out, a new idea. Before the development of formal meta-analysis, Lykken (1968) clearly distinguished between operational replication, where the methods used are the same as far as possible, and constructive replication, where the problem is the same but researchers independently develop ways of studying it, and strongly advocated constructive replication in psychological research. This view of the accumulation of small, often situation-specific, pieces of knowledge, if correct, is not a negative view of the possibilities of effective research - even effective quantitative research - in education. Weather forecasters use the science of meteorology, but fluid dynamics is not sufficient to predict tomorrow's weather. Macro-scale meteorological variables interact with features of topography and with the results of the weather which has occurred in the past. These interacting feedback systems generate chaotic behaviour. Predictions cannot be generalised far in space or time. However, that in no way devalues the importance of good weather forecasting, nor has it led to an abandonment of efforts to make it more and more reliable. I believe that there is at least a useful analogy, and more probably a similarity, between weather forecasting and the models we should be using in educational research. Common-sense reflection can reinforce this view. Suppose that my daughter is about to go to high school, and is interested in mathematics. How can educational research findings help her to choose a school? 
We may find that a study, comparing outcomes in two specific schools, suggests that, on the average, girls do better at mathematics in single-sex than comprehensive high schools. We may even find that a properly conducted study of many schools shows that, on the average, girls in the schools included at the time when the study was done in the country where it was done did better at mathematics in single-sex than comprehensive high schools. But my daughter has a friend she has worked with since Kinder who is going to a comprehensive high school. Furthermore, she wants to take drama, which the girls' school doesn't offer. And the local high schools aren't average schools anyway, but individual schools with their own strengths and weaknesses. All that classical educational research can tell us is that, everything else being equal, there is a slight preference one way.

In forecasting the weather we are not attempting to manipulate variables. The focus is on understanding and predicting (not seen as quite so separable as they are in Scriven's model), but even here success has been very limited. Macro-scale meteorological phenomena, such as air masses, interact in very complex ways with physiographic features to produce weather, and thus far our predictions are at their best when fairly broad. More specific predictions for particular local areas can often be made by people with a knowledge of the relationship between the meso-scale climate variables there and the more macro-scale meteorological phenomena typical of the region. We know enough to know that generalisations can be obtained and predictions improved; progress is slow not because we are using the wrong research paradigm but because of the complex interdependency of the variables. Because weather is a chaos phenomenon its prediction is at best probabilistic and at worst impossible beyond the short term.
However, characteristic patterns of association recur frequently, and regularities appear often enough for us to be able to identify some occurrences as unusual.

In education, too, large-scale simplified generalisations from psychology and sociology interact with a wide variety of local conditions to produce outcomes in particular circumstances. Being impatient for accurate forecasts, we sometimes think we would do as well by flipping a coin, but in fact we couldn't. If we approach educational research in the belief that causes or treatments produce invariable and completely generalisable effects, we are like a forecaster who believes that frontal systems invariably produce a given amount of rain distributed evenly over the whole area they cover. The nature of the phenomena being studied limits the generality and generalisability of results. But it does not invalidate research, nor prove it to be a waste of effort.

Further, educational research may, like research into climate, require investigations using different techniques at the macro-, meso- and micro-scales. Given the existence of certain macro-scale effects, we can learn how these can be modified by local circumstances - or in educational applications, we may be able to modify these inputs locally so as to modify the outcomes for learners. In this way we may have a bit more power than the forecasters, who may only be able to say that we'd better take our umbrellas.

Of course there are ways in which research into education is even more complicated than research into weather. Not only do treatments work differently in different schools; methods that worked thirty years ago may be quite ineffective, or even unacceptable, today. Nevertheless, in a particular school context, individual teachers have often accumulated standing which allows them to "get away with" things that others couldn't.
This is the weakest point in the "study the experts" argument presented by Scriven; educators don't work in the same sort of context (willing or inert materials) that chefs, plant hybridisers, and even doctors enjoy.

The second additional complicating factor is that humans are social beings. Every teacher knows that interaction with a single student alone is very different from the interaction with that same student in a class situation, and it is a commonplace of psychology that people in groups behave and react quite differently from the same people as individuals. As yet we are only starting to learn about the ways in which the characteristics of the individual members of a group affect that group's actions in particular circumstances: our findings so far are no more than tentative gropings toward understanding.

Taking all this into account, it is no wonder that completely generalisable laws are hard to establish; it is frustrating, too, that most findings have an expiry date after which they can no longer be safely consumed. Nevertheless, I believe the research effort is still worth while; if our purpose is to inform and improve education, there are many ways in which this can be done which don't require the provision of infallible rules.

References

Braggett, K. C. E. (1992). "Chaos Theory for Administrators", Studies in Educational Administration 56, 11-18.

Campbell, D. T. (1988a). "Can we be scientific in applied social science?", ch. 12 in Methodology and epistemology for social science: selected papers. Chicago: University of Chicago Press.

Campbell, D. T. (1988b). "Science's social system of validity-enhancing collective belief change and the problems of the social sciences", ch. 19 in Methodology and epistemology for social science: selected papers. Chicago: University of Chicago Press.

Cronbach, L. J. (1975). "Beyond the two disciplines of educational psychology", American Psychologist 30, 116-127.

Cziko, G. A. (1989). "Unpredictability and indeterminism in human behavior: Arguments and implications for educational research", Educational Researcher 18, 3, 17-25.

Glass, G. V. (1972). "The Wisdom of Scientific Enquiry on Education", Journal of Research in Science Teaching 9, 1, 3-18.

Lykken, D. T. (1968). "Statistical significance in psychological research", Psychological Bulletin 70, 3, 151-159.

National Institute of Education (1977). Fundamental Research and the Process of Education in the fourth annual report of the National Council on Educational Research. Washington: National Academy of Sciences.

Orteza Y Miranda, E. (1988). "Broadening the focus of research in education", Journal of Research and Development in Education 22, 1, 23-38.

Scriven, M. (1986). "Evaluation as a paradigm for educational research" in House, E. R. (ed.) New Directions in Educational Evaluation. London: Falmer Press.

Sungaila, H. (1990). "The new science of chaos: Making a new science of leadership?", Journal of Educational Administration 28, 2, 4-23.

Suppes, P. (ed.) (1978). Impact of Research on Education: some case studies. Washington: National Academy of Education.

Wrigley, J. (1989). "Curriculum development, teacher training and educational research: a view from the inside", Research Papers in Education 4, 3, 3-15.

1 As my intention is not to hold the authors or editors up to ridicule, but merely to illustrate our norms of accepted practice, I have refrained from referencing these articles here, but will do so privately on request.

2 See the currently fashionable national Course Evaluation Questionnaire study for a good example of this prevalent error.