Assessing Student Performance via the Internet

 

Rosemary A Callingham Department of Education, Community and Cultural

Development

Patrick Griffin University of Melbourne

 

 

Introduction

 

The Year 9 Numeracy Assessment and Monitoring Program was undertaken by

all year 9 students in Tasmanian government schools in 1997 as part of

the ongoing state-wide assessment of student performance. The view of

numeracy that underpinned this program was that developed by the

Tasmanian Education Department (Department of Education and the Arts

1995). This has a strong cross-curriculum focus, in which learning

areas other than mathematics both contribute to and make demands on

students' numeracy development. The program had several parts,

including a test of mental computation, two multiple choice tests and a

constructed response test. Of interest here, however, is the extended

investigative task that was included in the bank of assessment tests.

Before considering the task itself and its delivery, it is worth

examining why a task of this nature was included in a large-scale

testing program.

 

Why include an investigative task?

 

Essentially there were three reasons why such a task was included in a

state-wide testing program - the nature of numeracy itself; the notion of assessing something other than mathematical skills alone; and the desire to involve teachers in assessment. Each of these will be considered separately.

 

Numeracy

 

The concept of numeracy is relatively recent, and has undergone some

shifts in meaning since the word itself was first coined in 1959

(Crowther 1959). There is a growing opinion today that numeracy

requires some mathematical action within a social context. "Numeracy is

more than being able to manipulate numbers" (Department of Education

and the Arts 1995, p6). Clearly it has elements of mathematical

knowledge, but it also requires application of this mathematical

knowledge and an understanding and appreciation of when mathematics is

being used or needs to be used. A recent Australian description

supporting this view emerged from the Numeracy Education Strategy

Development Conference:

 

To be numerate is to use mathematics effectively to meet the general

demands of life at home, in paid work, and for participation in

community and civic life.

 

In school education, numeracy is a fundamental component of learning,

performance, discourse and critique across all areas of the curriculum.

It involves the disposition to use, in context, a combination of:

underpinning mathematical concepts and skills from across the

discipline (numerical, spatial, graphical, statistical and algebraic);

mathematical thinking and strategies; general thinking skills; and

grounded appreciation of context.

(Australian Association of Mathematics Teachers 1997, p. 15)

While not informed by this description at the time, the extended task

developed for this assessment program is very much in keeping with the

spirit of this definition.

 

Assessment of ?

 

There is a growing awareness among educators that we need to refine and

improve assessment techniques in order to measure a wider range of

important educational objectives (Shepard 1992). In particular

assessment should allow a student to make links with prior knowledge

and to demonstrate ability to transfer or generalise learning to

different contexts (Griffin 1997). Monitoring programs in general do

this only to a limited extent.

The importance of assessing more than basic skills has been

acknowledged. The Third International Mathematics and Science Study,

for example, included some performance tasks in which students

undertook practical activities (Harmon et al 1997). In Israel,

assessment of "scientific literacy" undertaken using a multiple choice

test and a more open-ended constructed response task indicated that

results on the open-ended tasks elicited information that was different

from that shown by the multiple choice formats (Zuzovsky 1997).

The inclusion of an investigative task was thus in keeping with a

growing trend in recent testing programs to include open-ended

questions, but also went further. Performance tasks of this nature are

being increasingly used in classroom assessment programs, and are well

accepted by teachers in this context (Nuttall 1992). They are currently

found in high stakes situations such as the Victorian Certificate of

Education and the UK GCSE. Inclusion of a complex task in which the

student has to formulate the problem and then devise methods of

solution is much rarer, however, in large scale testing programs run

for monitoring and accountability purposes.

 

Teacher Judgement

 

The task was conceived as a learning and teaching task that could be

integrated into classroom practice. The inclusion of a task of this

nature into the classroom context was based on more than just a belief

in the importance of context in numeracy, or the need to move beyond

measuring basic skills. It was also a recognition that teachers should

be involved in the testing process, especially if the results are to

have any relevance for practice. Increasing teachers' authority in the

area of evaluation of educational achievement, together with a

questioning of some psychometric practices, has been termed a

'post-modern' approach to assessment (Lewy 1996). Unlike some

practitioners who criticise psychometric approaches, however, we wanted

to develop a rigorous procedure for assessing student achievement

through a classroom based and contextualised task. This raised issues

of validity and reliability, and not a little controversy.

 

The Task

 

The task itself involved an analysis of a daily newspaper. It was

developed during two item writing sessions with practising Tasmanian

teachers late in 1996 and trialed in Victoria. A very similar activity

is included in the Tasmanian K-8 Mathematics Guidelines in the Chance

and Data strand (Department of Education and the Arts, 1993). Students

were asked to undertake a short project in which they considered the

arrangement of adverts and articles in any ten pages of a daily

newspaper. The project was divided into four sections: Definition,

Measurement, Classification and Application (Appendix 1).

In the first section, Definition, students defined the terms they would

use throughout the project. The four categories used were adverts,

news, features and information, although students could create their

own categories if they wished. Emphasis was laid on the fact that each

student's definition could be different.

 

The second section, Measurement, asked students to measure the amount

of space given to adverts and articles on the ten pages they had

chosen. Some suggested units of measurement were provided as a starting

point, including number of columns or word count, and students were

instructed to prepare graphs, charts or tables to present their

findings.

 

An element of choice was included in parts three, Classification, and

four, Application, in which students could choose whether to focus on

adverts or articles. In practice, nearly every student considered

adverts exclusively, probably reflecting year 9 students' interest in

daily newspapers. In Classification, students were instructed to

classify either adverts or articles and relate these classifications to

the kinds of people who might read them. Finally, students had to apply

this knowledge to decide where to place an advert in the newspaper and

estimate its cost. The alternative was to investigate the use of

statistical information in a newspaper, but no student who responded

chose this option.

 

Each section included basic instructions and some suggestions for

getting started, in much the same way that a classroom investigation

would be organised. The task was thus provided within a structured

framework, but allowed each student to respond in a unique way.

Each section was scored using the same scoring rubric. Scoring rubrics

were developed for two aspects. 'Process' referred to the manner in

which the task was attempted and included elements of choosing and

using appropriate skills and strategies, communication of ideas and

justification of conclusions. 'Content' was based on the range and

complexity of the numeracy ideas used by students in answering the

different parts of the task. The scoring rubrics were based on trial

students' responses, and the SOLO taxonomy (Biggs and Collis 1982).

Internet delivery

 

The task was presented to students in hard copy form, or could be

accessed via the Internet. The Internet pages were produced using

standard software. They were a replica of the printed form, with the

addition of hot links to two useful addresses that had supporting

material available - News Corporation's "News on disk" page and the

Hobart Mercury's "Chance and Data" home page. The scoring rubric was

also provided via a link so that students could see how they would be

marked.

 

Students entered their responses to each part of the task in

interactive text boxes. These were not limited in size, but were unable

to support graphics such as charts or diagrams. In order to accommodate

this limitation an email link was created that would allow students to

mail supporting material as an attachment. Before they could send any

response, students had to provide their name, school and project

identification number and were asked to print their response and give

this to their teacher. This safeguarded students against technological

failures. Responses were automatically sent via a "Send" button. A

specially set up email address collected the responses and sent an

acknowledgment to the sender.
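The submission step described above can be sketched in a few lines of Python. This is not the software used in the program, which the paper describes only as "standard software"; it is a minimal illustration under stated assumptions. The function name build_submission, the field keys, the subject line and the address numeracy@example.org are all hypothetical. Only the requirement that name, school and identification number be present before sending, and the "Question 1a :" style of labelling, come from the text and the sample response.

```python
from email.message import EmailMessage

# Fields the paper says had to be supplied before a response could be sent.
REQUIRED_FIELDS = ("name", "id_number", "school")


def build_submission(details, answers, to_addr="numeracy@example.org"):
    """Compose an email carrying one student's text-box answers.

    details -- dict with the keys in REQUIRED_FIELDS (hypothetical key names)
    answers -- dict mapping a question label such as "1a" to the answer text
    to_addr -- placeholder collection address (an assumption, not the real one)
    """
    missing = [f for f in REQUIRED_FIELDS if not details.get(f, "").strip()]
    if missing:
        # Refuse to send an unidentified response.
        raise ValueError("missing required fields: " + ", ".join(missing))

    lines = [
        "Student Name : " + details["name"],
        "ID Number : " + details["id_number"],
        "School : " + details["school"],
        "",
    ]
    for label in sorted(answers):
        lines.append("Question " + label + " : " + answers[label])

    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = "Extended task response"  # illustrative subject line
    msg.set_content("\n".join(lines))
    return msg
```

A caller would pass the returned message to whatever mail transport is available. Charts and diagrams, which the text boxes could not hold, would still need to travel as attachments.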

 

Administration

 

For a variety of practical and political reasons the task became

optional and was eventually attempted by over 1200 students. While

originally intended to be teacher marked, the option was later given to

schools for central marking. In some schools teachers chose to mark

their own students' work, using the scoring rubric. Others returned the

responses for central marking. Approximately 790 were marked by an

experienced team of markers in Melbourne, using the scoring rubric

provided. The rest were marked in their schools by the teachers

concerned. The Internet responses were printed out and marked together

with the others sent in by schools.

Thus we had an interesting mix of teacher and centrally marked

responses, with some responses provided in hard copy form and some via

a specially set up email address. In addition, classroom observation of

students undertaking the task in three schools, and discussions with

the teachers involved, provided some useful insights into the advantages and

difficulties of undertaking a task of this nature in a large scale

assessment project.

As part of the overall monitoring program, teachers and students were

asked to complete a questionnaire. For teachers this included a section

about their teaching and assessment methodology as well as their

opinion of each part of the program measured on a four point Likert

scale. Students were asked about their computer use and homework, as

well as their opinion of the program. Data from these sources also

provided useful information about teacher and student attitudes to the

extended task.

 

Findings

 

Measurement aspects

The analysis of this task using the Rasch method was included as part

of the overall analysis of the total monitoring program using the

computer program QUEST (Adams and Khoo, 1995). Calibration of the items

was carried out using the compulsory parts of the monitoring program.

Using this set of items, the student abilities were calculated, and

these abilities were used as a person anchor to calibrate the data from

the extended task. In this way, data from the optional part of the

program were at no stage used in the estimation of student abilities.
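The idea of person anchoring can be illustrated with a small Python sketch. It is a deliberate simplification and not the algorithm used by QUEST, which the paper does not describe: it assumes a dichotomous (0/1) Rasch item, whereas the extended task was scored in several ordered categories. The function names are hypothetical. With the abilities held fixed, the difficulty of a new item is the value at which the expected number of successes equals the observed number, and that value can be found by Newton-Raphson iteration.

```python
import math


def rasch_prob(theta, b):
    """Probability of success for ability theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def calibrate_item_anchored(thetas, responses, tol=1e-8, max_iter=100):
    """Estimate one item's difficulty with person abilities held fixed.

    thetas    -- anchored abilities in logits, estimated from other items
    responses -- 0/1 scores of the same persons on the item being calibrated
    """
    observed = sum(responses)
    if observed == 0 or observed == len(responses):
        raise ValueError("difficulty is not estimable for an all-0 or all-1 item")

    b = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(t, b) for t in thetas]
        expected = sum(probs)
        info = sum(p * (1.0 - p) for p in probs)
        # Newton-Raphson step on the log-likelihood; the abilities never change.
        step = (expected - observed) / info
        step = max(-1.0, min(1.0, step))  # damp very large steps
        b += step
        if abs(step) < tol:
            break
    return b
```

Because the abilities are inputs rather than estimates, responses to the anchored item cannot feed back into them, which is the property the paragraph above relies on.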

The variable map for the extended task indicated that there was a good

match between the student cohort ability and the item difficulties

(Figure 1). The task allowed students at all levels to demonstrate

their ability. In fact the top ability levels displayed in this task

exceeded the ability levels of any other part of the program. The

extended task allowed the very top students to demonstrate their

ability.

The fit map for task 4 provided further information. It showed an

overfit to the model suggesting that a secondary variable had been

measured. The task thus provided additional information about students'

performances that could not be measured by the other parts of the

program.

Figure 1. Variable Map of Extended Task

Distracter analyses of every item in the test were produced, including

those of the extended task. One of these is shown in Figure 2.

Categories          0      1      2      3      4      5
Percent %        36.8   17.8   21.0   15.3    7.8    1.3
Mean Ability     0.02   0.05   0.28   0.68   1.07   1.50
Thresholds             -0.06   0.41   0.97   1.71   3.11
Error                   0.13   0.12   0.14   0.18   0.30

Infit MNSQ = 1.16

Figure 2: Student performance on Extended Task Part 2 Measurement - Process

 

 

 

This demonstrates increasing ability associated with increasing score,

as expected. This was shown in analyses of each part of this task and

suggests that the marking scheme had been consistently applied by all

raters, whether teachers in schools or part of the central marking

team.
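The pattern described here can be checked mechanically. The Python sketch below uses the values reported in Figure 2. The variable and function names are illustrative only, and the check is a sanity test on the reported summary statistics rather than a substitute for the full item analysis.

```python
# Values reported in Figure 2 (Extended Task Part 2, Measurement - Process).
categories = [0, 1, 2, 3, 4, 5]
percent = [36.8, 17.8, 21.0, 15.3, 7.8, 1.3]
mean_ability = [0.02, 0.05, 0.28, 0.68, 1.07, 1.50]
thresholds = [-0.06, 0.41, 0.97, 1.71, 3.11]  # one per step between categories


def strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Higher score categories should be reached by students of higher mean ability.
ability_ordered = strictly_increasing(mean_ability)

# Each successive threshold should be harder to pass than the previous one.
thresholds_ordered = strictly_increasing(thresholds)

# The category percentages should account for every student.
percent_total = sum(percent)
```

A mean ability that fell as the score category rose, or thresholds out of order, would be the kind of signal that a rubric had been applied inconsistently.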

The relationship between student ability and score assigned for each

component of the extended task is shown in figure 3.

 

 

Figure 3: Mean ability by score on extended task.

 

A strong relationship is shown - score is almost perfectly predicted by

student ability level. It should be remembered that the student ability

was calculated only on the other parts of the program. This makes the

strong fit (R² = 0.98) even more interesting.
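For readers who wish to reproduce this kind of summary, the coefficient of determination for a straight-line fit can be computed as the squared correlation between score and mean ability. The Python sketch below is illustrative: the function name is hypothetical, and because the data behind Figure 3 are not tabulated in this paper, the example call uses only the single component reported in Figure 2. It is therefore not expected to reproduce the 0.98 quoted above.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of a simple linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Score categories and mean abilities for one component only (Figure 2).
scores = [0, 1, 2, 3, 4, 5]
mean_ability = [0.02, 0.05, 0.28, 0.68, 1.07, 1.50]
fit = r_squared(scores, mean_ability)
```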

The results from this task were combined with a small number of other

items from other parts of the task and reported as a strand called

"Competencies". Given that the task appears to be measuring a different

variable, students who completed this task were effectively provided

with additional, richer information.

Student and teacher attitudes

It was clear from a number of responses that a task of this nature was

unfamiliar to many teachers. Some criticised it on the grounds that

this was literacy or social science and not mathematics. Others

complained about the amount of time required. In some schools, however,

the task was approached positively as a learning experience for all.

One teacher, in a school in a low socio-economic area, commented

afterwards that the task was "... horrendous. But it made me realise

that we weren't doing enough of this kind of task with the students."

Unfamiliarity with open-ended project tasks was borne out by the

teacher questionnaire results. Of 157 responses, only 5.1% used

investigations often, and 59.2% rarely or never used open-ended tasks

in their teaching. The dominant model of assessment was school-based

tests, with about half reporting that they sometimes used teacher

initiated projects. Student initiated projects, or negotiated studies,

were much rarer, with 63% of teachers never using this form of

assessment. In this climate it is not surprising that the task created

problems for some, and there are implications here both for classroom

practice and future testing programs.

Classroom context

The introduction of the task was observed in a class of middle level

students in one school. The teacher concerned had photocopied

dictionary definitions of the four categories and

students were asked to find examples of each in their newspaper.

Initially the students were uncomfortable with the idea that they could

use a definition or identify something in a different way from that of

other students. As they became more confident, however, they began to

defend their choices to teachers and other students. The ability in a

real life context to formulate defensible definitions as a basis for

measurement is an important one, which is often not addressed

explicitly in schools where much of what students work with is defined

for them.

The task created cognitive conflict for many students. Although it was

done during "maths" time in most cases, this was not a usual maths task. Some

students treated it as an English assignment, and wrote pages of text

completely missing the numeracy aspects. In schools where students

appeared to have been given little guidance by teachers this was

particularly evident. A teacher reported that one student came to her

in tears. The student recognised that she had not completed the task in

the way that it required but she was completely unable to reorganise

her thinking to meet the demands. While not wanting to put students

under undue duress, it is necessary to remember that this sort of

conflict is at the heart of real learning. The teacher's responsibility

is to support and question to help the student resolve the conflict.

Computer use

Responses via the Internet were disappointingly low. The major issue

appeared to be access to computers during mathematics lessons, even

though schools had well equipped computer facilities. Computer use was

not common during mathematics time and one teacher commented that it

was "...quite frankly too much trouble to use the Internet."

Observations in one trial school were revealing. During the Monday

morning lesson students were free to go to the computer room to

continue work on the project. A group of about six students, all boys,

took advantage of this. The others preferred to work in traditional

ways and had produced some excellent work. When they were asked why

they hadn't chosen to use the computer facilities students were

non-committal. They were all familiar with computers - indeed some of

these students were participating in a special extension program

centred around technology use - but were not disposed to use computers

for a "maths project". In another school a class observed in a very

well equipped computer laboratory were producing some outstanding

mathematical models of the structure of a newspaper as shown in figure

4. The teacher's comment was "I could only arrange to be here because

you were coming." In this classroom, a number of students had not used

a spreadsheet and the basics of spreadsheet use had to be taught before

they could get into the task itself. This extended the time required

and further exacerbated the access problem.

 

Figure 4: Ribbon graph produced to show the analysis of a newspaper

 

 

 

 

In general, the students who answered only via the Internet text

interface did not produce outstanding work. This seemed to be partly

related to the interface itself, which mirrored the hard copy but had

some drawbacks. Students could not work on the net page and save their

work ready for another occasion. Some students got around this by

sending the work in two separate lots; others worked in familiar

programs and then used the email link provided to send their completed

project to the centre.

Discussion

Technical issues

Delivering a task of this nature via the web provided some

technological challenges, some of which have already been described. To

place a task of this nature on the web was relatively easy but the

quality and quantity of the responses were far lower than had been

expected. This seems to reinforce recent newspaper reports that

computer ownership and Internet usage were far lower than had been

expected (Crawford, 1997).

Schools in general have extensive computer facilities today and we had

not anticipated the difficulty of access and the relatively low skills

of the students. Responses to the survey question are shown in figure

5.

 

Figure 5: Computer use

 

 

The lack of use in mathematics is very clear and explains the lack of

familiarity with spreadsheets that was reported by some teachers. If

students are to progress from pressing buttons to using computers as a

tool for analysis and synthesis of information there are major

organisational issues for schools in Tasmania to resolve.

Students who responded via the Internet produced responses that

indicated that the interface itself did not stimulate students'

thinking. Responses via the text boxes were usually short and

simplistic, as shown in figure 6, and did not provide much evidence of

the development of general thinking skills or use of mathematical

thinking. The competencies strand performance of the same student whose

response is shown in figure 6 was depressed relative to the other

strands (figure 7), although still relatively high compared to the rest

of his school. The text interface did not seem to encourage exploration

of ideas. This may have been compounded by the need to work partly with

concrete materials, using the newspaper, and partly through the ikonic

medium of the computer. In contrast, students who emailed responses as

an attachment provided a much richer picture. They were using familiar

computer programs but were also generally better students. The thinking

processes of students using computers may need to be explored further

in order to provide an interface more conducive to developing complex

responses. As well, the computer interface needs to provide

opportunities for less able students to achieve to their capability.

This is an area where test constructors may need to learn from computer

games designers.

The need to save work and return to it at another time is crucial and

this could have been overcome with more development time. The

compromise email link, however, was effective and this may provide some

direction for the future.

 

From: TOL guest account[SMTP:tolguest@shelob.ecc.tased.edu.au]

Sent: Monday, 28 July 1997 5:10PM

To: OER Numeracy

Subject: Y9NAMP -

 

------------------------------------------------------------

Student Name : XXX

ID Number : XXX

School : XXX High School, Hobart

 

The following questionnaire response has been received :

 

 

Question 1a : An ad is any device or public announcement, as a

printed notice in a newspaper, a commercial film on television, a neon

sign, etc. Designed to attract public attention, bring in custom, etc.

The article titles on page one are literally advertising the article,

so I would regard the titles to be ads.

Question 1b : A report of any recent event, situation, etc. Yes

i would regard the tables of share movements as news, becuase they

advise people of any recent changes in share values

Question 1c : A prominate or conspicous part or characteristic.

It depends if the article has been talked about before, I surpose that

it could be a bit of both.

Question 1d : Knowledge communicated or received concerning some

fact or circumstances; news. Public notices are most likly to be

information, but it could be thought to be and ad, as it is advising

people to an event.

Question 2a : Using number of columns of writing per page.

Question 2b : The pages towards the middle of the paper had less

writing and more advertisments, the pages towards the beginning had

less ads and more writing, and the pages towards the back of the paper

had medium amounts of writing, little ads, and alot of pictures.

Question 3a : There were less ads towards the front of the paper,

in the middle of the paper the were a medium amount of ads that were

aimed at an older audience, and towards the end of the paper, there was

alot of ads mainly aimed at a young adult age group ( 15 - 30 ).

Question 4a : I would place an ad for sporting goods in the

sports section of the paper, because people interested in sport would

read this section more closly than the rest of the paper. The ad would

cost a medium rate probably about $80 - $100.

I would place a ad aimed at year nine students in the sports

section as well, but in one of the back sports pages.

Because a younger person might read the paper back to front, ( because

the sports section is towards the back). It would cost about $80 -

$100.

------------------------------------------------------------

Request came from :

- Remote host: dslip3.its.utas.edu.au

- Remote IP address: 131.217.8.3

 

Figure 6: Response via Internet text box

 

 

Figure 7: Part of student XXX report

 

 

Use of performance tasks

The extended task attracted criticism on several grounds. Although an

investigative project was a required part of the year 9 mathematics

syllabus, teachers objected to the amount of time this task took. This

is a valid criticism while teachers see monitoring programs primarily

as "add ons" for accountability purposes rather than an integral part

of their classroom teaching and assessment plans. This has implications

for all testing programs.

Teachers raised issues of validity and reliability, particularly

relating to the authenticity of the response on two grounds: firstly,

how would we know that any response received via the web came from the

actual student that it claimed to be from; secondly, wouldn't students

who got help be advantaged?

The first concern is an issue for any monitoring program - we do not

know for sure that the student identified has actually answered the

questions, regardless of the medium used to collect these answers.

Instead we rely on the professionalism of teachers to ensure that

mistakes are not made. The second concern could well be answered in the

same way. The teaching process requires a teacher to interact with an

individual. The nature of that interaction depends on the individuals

involved but teachers would generally agree that they ask different

questions of students with different abilities. The professionalism of

teachers using a performance assessment task as part of their teaching

and learning program will ensure that classroom support is appropriate

to the student and aimed at eliciting the student's best possible

response. In this way we can begin to get an assessment of optimal

rather than functional performance (Lamborn and Fischer 1988). The

issue of some students being advantaged by getting help from parents or

other people also did not show up in the analysis of results, as shown

by figure 3.

Both concerns can also be addressed in a program of this nature by a

consideration of the performances of students across all tasks. If

major discrepancies are detected then we could infer that some

performances were not authentic. In this program the fit of the

extended tasks onto the common scale of item difficulty through common

person linking indicated that this was not a problem. Students with

high ability, as measured in the other parts of the program, produced

high performance on the extended task.

Criticism of the task because the results were based on teacher

judgement also appears to be unfounded. The item analyses for the

extended task did not indicate any randomness in response as might have

been expected if the marking had been inconsistent. The use of literacy

profiles and the work by Hill, Rowe and others for almost a decade have

shown that judgement using a holistic frame of reference - such as that

provided by the scoring rubric for this task - can lead to high levels

of reliability (Rowe and Hill 1995). Maybe it is time that we started

to trust teachers as highly trained and competent professionals.

Summary

This task aimed to assess numeracy in a wider context than had

previously been attempted in a statewide monitoring program. From a

measurement perspective this was successful - the task provided a

reliable assessment of competence beyond basic mathematical skills. It

also demonstrated the technical feasibility of delivering a performance

task via the Internet.

There are, however, issues still to be resolved. The nature of numeracy

itself and its relationship to mathematics and other curriculum areas

is currently being widely debated. Our belief that this was a numeracy

task was not shared by all teachers.

The use of computers carries with it the need to consider school

organisation to improve access during mathematics and science lessons.

Only in this way will students improve their data handling skills and

begin to use the computer as an effective analysis tool, beyond just

word processing. Access to the technology will be an issue for all

systems as computer use and testing become more widespread.

There is also the need to consider the computer interface. Merely

reproducing a paper test on the screen does not appear to draw out

optimal responses. Further research into students' cognitive processes

when faced with a computer interactive task is needed.

Finally, there are political issues to consider in any test that

involves teachers. There is evidence that teachers are skilled at

making judgements about students' performances. It is necessary,

however, for society in general and, in some instances, teachers

themselves, to accept this before teachers can hope to be actively

involved in assessment as part of a large scale monitoring program.

 

References

Adams, R. and Khoo, S.T. (1995) Quest: an interactive item analysis

program. Melbourne: Australian Council for Educational Research.

Australian Association of Mathematics Teachers (1997) Numeracy =

everyone's business. Report of the Numeracy Education Strategy

Development Conference, May 1997. Adelaide: Author.

Biggs, J.B. and Collis, K.F. (1982). Evaluating the quality of

learning: The SOLO Taxonomy. New York: Academic Press.

Chance and data home page:

http://www.ni.com.au/mercury/mathguys.mercury.htm

Crawford, W. (1997) Blurb artists entangled in net of hype. Mercury,

Hobart:22 November p. 40.

Crowther, G. (1959) Report to the Central Advisory Council for

Education. London: Her Majesty's Stationery Office.

Department of Education and the Arts (1994) Mathematics Guidelines K-8.

Chance and Data. Hobart: Curriculum Services Branch.

Department of Education and the Arts (1995) Numerate students -

Numerate adults. Hobart: Author.

Griffin, P. (1997) Developing assessment in schools and workplaces.

Inaugural Professorial Lecture, University of Melbourne, September 18.

Harmon, M., Smith, T.A., Martin, M.O., Kelly, D.L., Beaton, A.E.,

Mullis, I.V.S., Gonzalez, E.J. & Orpwood, G. (1997) Performance

Assessment in IEA's Third International Mathematics and Science Study.

Chestnut Hill, MA: International Association for the Evaluation of

Educational Achievement (IEA).

Lamborn, S.D. and Fischer, K.W. (1988). Optimal and functional levels

in cognitive development: The individual's developmental range.

Newsletter of the International Society for the Study of Behavioural

Development, 1988 Number 2, Serial No.14 1-4.

Lewy, A. (1996) Postmodernism in the field of achievement testing.

Studies in Educational Evaluation, Vol. 22, No. 3. pp. 223-244.

News on disk: http://www.marketing.newsltd.com.au

Nuttall, D.L. (1992) The message from England. In K. Burke (Ed.)

Authentic Assessment. A Collection. Cheltenham, Vic.: Hawker Brownlow

Education.

Rowe, K. and Hill, P. (1995, November) Assessing, recording and

reporting students' educational progress: The case for subject

profiles. Paper presented at the Annual Conference of the Australian

Association for Research in Education, Newcastle, Australia.

Shepard, L.A. (1992) Why we need better assessments. In K. Burke (Ed.)

Authentic Assessment. A Collection. Cheltenham, Vic.: Hawker Brownlow

Education.

Zuzovsky, R. (1997) Assessing scientific and technological literacy

among sixth graders in Israel. Studies in Educational Evaluation, Vol.

23, No. 3. pp. 231-256.

 

Copies of the task and the scoring rubric can be accessed at:

http://www.tased.edu.au/tasonline/edreview