Assessing Student Performance via the Internet
Rosemary A Callingham, Department of Education, Community and Cultural Development
Patrick Griffin, University of Melbourne
Introduction
The Year 9 Numeracy Assessment and Monitoring Program was undertaken by
all year 9 students in Tasmanian government schools in 1997 as part of
the ongoing state-wide assessment of student performance. The view of
numeracy that underpinned this program was that developed by the
Tasmanian Education Department (Department of Education and the Arts
1995). This has a strong cross-curriculum focus, in which learning
areas other than mathematics both contribute to and make demands on
students' numeracy development. The program had several parts,
including a test of mental computation, two multiple choice tests and a
constructed response test. Of interest here, however, is the extended
investigative task that was included in the bank of assessment tests.
Before considering the task itself and its delivery, it is worth
examining why a task of this nature was included in a large-scale
testing program.
Why include an investigative task?
Essentially there were three reasons for including such a task in a
state-wide testing program: the nature of numeracy itself; the notion of assessing something other than mathematical skills alone; and the desire to involve teachers in assessment. Each of these is considered separately below.
Numeracy
The concept of numeracy is relatively recent, and has undergone some
shifts in meaning since the word itself was first coined in 1959
(Crowther 1959). There is a growing opinion today that numeracy
requires some mathematical action within a social context. "Numeracy is
more than being able to manipulate numbers" (Department of Education
and the Arts 1995, p. 6). Clearly it has elements of mathematical
knowledge, but it also requires application of this mathematical
knowledge and an understanding and appreciation of when mathematics is
being used or needs to be used. A recent Australian description
supporting this view emerged from the Numeracy Education Strategy
Development Conference:
To be numerate is to use mathematics effectively to meet the general
demands of life at home, in paid work, and for participation in
community and civic life.
In school education, numeracy is a fundamental component of learning,
performance, discourse and critique across all areas of the curriculum.
It involves the disposition to use, in context, a combination of:
underpinning mathematical concepts and skills from across the
discipline (numerical, spatial, graphical, statistical and algebraic);
mathematical thinking and strategies; general thinking skills; and
grounded appreciation of context.
(Australian Association of Mathematics Teachers 1997, p. 15)
Although it was not informed by this description at the time, the
extended task developed for this assessment program is very much in
keeping with the spirit of this definition.
Assessment of ?
There is a growing awareness among educators that we need to refine and
improve assessment techniques in order to measure a wider range of
important educational objectives (Shepard 1992). In particular
assessment should allow a student to make links with prior knowledge
and to demonstrate ability to transfer or generalise learning to
different contexts (Griffin 1997). Monitoring programs in general do
this only to a limited extent.
The importance of assessing more than basic skills has been
acknowledged. The Third International Mathematics and Science Study,
for example, included some performance tasks in which students
undertook practical activities (Harmon et al. 1997). In Israel, an
assessment of "scientific literacy" that used both a multiple choice
test and a more open-ended constructed response task found that the
open-ended tasks elicited information different from that provided by
the multiple choice format (Zuzovsky 1997).
The inclusion of an investigative task was thus in keeping with a
growing trend in recent testing programs to include open-ended
questions, but also went further. Performance tasks of this nature are
being increasingly used in classroom assessment programs, and are well
accepted by teachers in this context (Nuttall 1992). They are currently
found in high stakes situations such as the Victorian Certificate of
Education and the UK GCSE. Inclusion of a complex task in which the
student has to formulate the problem and then devise methods of
solution is much rarer, however, in large scale testing programs run
for monitoring and accountability purposes.
Teacher Judgement
The task was conceived as a learning and teaching task that could be
integrated into classroom practice. The inclusion of a task of this
nature in the classroom context was based on more than just a belief
in the importance of context in numeracy, or the need to move beyond
measuring basic skills. It was also a recognition that teachers should
be involved in the testing process, especially if the results are to
have any relevance for practice. Increasing teachers' authority in the
area of evaluation of educational achievement, together with a
questioning of some psychometric practices, has been termed a
'post-modern' approach to assessment (Lewy 1996). Unlike some
practitioners who criticise psychometric approaches, however, we wanted
to develop a rigorous procedure for assessing student achievement
through a classroom based and contextualised task. This raised issues
of validity and reliability, and not a little controversy.
The Task
The task itself involved an analysis of a daily newspaper. It was
developed during two item writing sessions with practising Tasmanian
teachers late in 1996 and trialled in Victoria. A very similar activity
is included in the Tasmanian K-8 Mathematics Guidelines in the Chance
and Data strand (Department of Education and the Arts, 1993). Students
were asked to undertake a short project in which they considered the
arrangement of adverts and articles in any ten pages of a daily
newspaper. The project was divided into four sections: Definition,
Measurement, Classification and Application (Appendix 1).
In the first section, Definition, students defined the terms they would
use throughout the project. The four categories used were adverts,
news, features and information, although students could create their
own categories if they wished. Emphasis was laid on the fact that each
student's definition could be different.
The second section, Measurement, asked students to measure the amount
of space given to adverts and articles on the ten pages they had
chosen. Some suggested units of measurement were provided as a starting
point, including number of columns or word count, and students were
instructed to prepare graphs, charts or tables to present their
findings.
An element of choice was included in parts three, Classification, and
four, Application, in which students could choose whether to focus on
adverts or articles. In practice, nearly every student considered
adverts exclusively, probably reflecting year 9 students' interest in
daily newspapers. In Classification, students were instructed to
classify either adverts or articles and relate these classifications to
the kinds of people who might read them. Finally, students had to apply
this knowledge to decide where to place an advert in the newspaper and
estimate its cost. The alternative was to investigate the use of
statistical information in a newspaper, but no student who responded
chose this option.
Each section included basic instructions and some suggestions for
getting started, in much the same way that a classroom investigation
would be organised. The task was thus provided within a structured
framework, but allowed each student to respond in a unique way.
Each section was scored against the same scoring rubrics, one for each
of two aspects. 'Process' referred to the manner in
which the task was attempted and included elements of choosing and
using appropriate skills and strategies, communication of ideas and
justification of conclusions. 'Content' was based on the range and
complexity of the numeracy ideas used by students in answering the
different parts of the task. The scoring rubrics were based on trial
students' responses and on the SOLO taxonomy (Biggs and Collis 1982).
Internet delivery
The task was presented to students in hard copy form, or could be
accessed via the Internet. The Internet pages were produced using
standard software. They were a replica of the printed form, with the
addition of hot links to two useful addresses that had supporting
material available - News Corporation's "News on disk" page and the
Hobart Mercury's "Chance and Data" home page. The scoring rubric was
also provided via a link so that students could see how they would be
marked.
Students entered their responses to each part of the task in
interactive text boxes. These were not limited in size, but were unable
to support graphics such as charts or diagrams. In order to accommodate
this limitation an email link was created that would allow students to
mail supporting material as an attachment. Before they could send any
response, students had to provide their name, school and project
identification number and were asked to print their response and give
this to their teacher. This safeguarded students against technological
failures. Responses were automatically sent via a "Send" button. A
specially set up email address collected the responses and sent an
acknowledgment to the sender.
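The submission sequence described above (identification fields that must be filled in, free-text answers, a "Send" action and an automatic acknowledgment) can be sketched as follows. This is a minimal Python illustration built on the standard library's `email.message.EmailMessage`, not the software used in 1997: the collection address, field names and function names are hypothetical, and the message layout simply imitates the response reproduced in Figure 6. The functions only build messages; nothing is sent.

```python
from email.message import EmailMessage

# Hypothetical address standing in for the specially set up collection mailbox.
COLLECTION_ADDRESS = "numeracy-responses@example.edu.au"
REQUIRED_FIELDS = ("Student Name", "ID Number", "School")


def build_response(identity, answers, sender):
    """Build the email for one submission; refuse to send without identification."""
    missing = [f for f in REQUIRED_FIELDS if not identity.get(f, "").strip()]
    if missing:
        raise ValueError("cannot send response, missing: " + ", ".join(missing))

    lines = [f"{field} : {identity[field].strip()}" for field in REQUIRED_FIELDS]
    lines.append("The following questionnaire response has been received :")
    # Text boxes are unlimited in length but carry plain text only, so charts
    # and diagrams have to travel separately as an email attachment.
    for question, text in answers.items():
        lines.append(f"Question {question} : {text}")

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = COLLECTION_ADDRESS
    msg["Subject"] = "Y9NAMP - " + identity["ID Number"].strip()
    msg.set_content("\n".join(lines))
    return msg


def build_acknowledgment(received):
    """Automatic reply from the collection address back to the sender."""
    ack = EmailMessage()
    ack["From"] = COLLECTION_ADDRESS
    ack["To"] = str(received["From"])
    ack["Subject"] = "Received: " + str(received["Subject"])
    ack.set_content("Your response has been received. "
                    "Remember to give a printed copy to your teacher.")
    return ack
```

Sending would additionally need an SMTP connection, for example through the standard `smtplib` module, which is left out here.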
Administration
For a variety of practical and political reasons the task became
optional and was eventually attempted by over 1200 students. While
originally intended to be teacher marked, the option was later given to
schools for central marking. In some schools teachers chose to mark
their own students' work, using the scoring rubric. Others returned the
responses for central marking. Approximately 790 were marked by an
experienced team of markers in Melbourne, using the scoring rubric
provided. The rest were marked in their schools by the teachers
concerned. The Internet responses were printed out and marked together
with the others sent in by schools.
Thus we had an interesting mix of teacher and centrally marked
responses, with some responses provided in hard copy form and some via
a specially set up email address. In addition classroom observation of
students undertaking the task in three schools, and discussions with
teachers involved provided some useful insights into the advantages and
difficulties of undertaking a task of this nature in a large scale
assessment project.
As part of the overall monitoring program, teachers and students were
asked to complete a questionnaire. For teachers this included a section
about their teaching and assessment methodology as well as their
opinion of each part of the program measured on a four point Likert
scale. Students were asked about their computer use and homework, as
well as their opinion of the program. Data from these sources also
provided useful information about teacher and student attitudes to the
extended task.
Findings
Measurement aspects
The analysis of this task using the Rasch method was included as part
of the overall analysis of the total monitoring program using the
computer program QUEST (Adams and Khoo, 1995). Calibration of the items
was carried out using the compulsory parts of the monitoring program.
Using this set of items, the student abilities were calculated, and
these abilities were used as a person anchor to calibrate the data from
the extended task. In this way, data from the optional part of the
program were at no stage used in the estimation of student abilities.
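The anchoring idea can be illustrated with a small sketch. It assumes a partial credit model for one extended-task component scored 0 to m, and treats each student's ability, already estimated from the compulsory items, as fixed. Only the component's step difficulties are then estimated, by maximum likelihood using coordinate-wise Newton-Raphson updates, so the optional task cannot feed back into the abilities. This is an illustrative Python reimplementation, not QUEST's internal algorithm, and all function names are hypothetical.

```python
import math
import random


def pcm_probs(theta, deltas):
    """Partial credit model: P(score = k | theta) for k = 0..len(deltas)."""
    logits = [0.0]
    for d in deltas:
        logits.append(logits[-1] + theta - d)
    top = max(logits)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_scores(thetas, deltas, seed=0):
    """Draw one score per student from the model (for trying the estimator out)."""
    rng = random.Random(seed)
    categories = range(len(deltas) + 1)
    return [rng.choices(categories, weights=pcm_probs(t, deltas))[0] for t in thetas]


def calibrate_anchored(thetas, scores, n_steps, sweeps=200, tol=1e-6):
    """Estimate step difficulties with person abilities held fixed (anchored).

    thetas: abilities already estimated from the compulsory items
    scores: each student's score (0..n_steps) on the optional extended task
    """
    deltas = [0.0] * n_steps
    for _ in range(sweeps):
        largest_step = 0.0
        for j in range(n_steps):
            # Observed number of students who reached step j + 1 or beyond.
            observed = sum(1 for x in scores if x >= j + 1)
            expected = information = 0.0
            for theta in thetas:
                p_reached = sum(pcm_probs(theta, deltas)[j + 1:])
                expected += p_reached
                information += p_reached * (1.0 - p_reached)
            # One Newton-Raphson step on the log-likelihood for delta_j,
            # clipped so that a poor starting value cannot cause a wild jump.
            step = (expected - observed) / max(information, 1e-9)
            step = max(-1.0, min(1.0, step))
            deltas[j] += step
            largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return deltas
```

With scores simulated from known step difficulties (as `simulate_scores` does), `calibrate_anchored` should recover values close to the ones used. If no student reaches a category, that step has no finite estimate; the clipped step and the sweep limit only stop the loop in that case.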
The variable map for the extended task indicated that there was a good
match between the student cohort ability and the item difficulties
(Figure 1). The task allowed students at all levels to demonstrate
their ability. In fact, the top ability levels displayed in this task
exceeded those of any other part of the program, so the extended task
gave the very top students room to demonstrate what they could do.
The fit map for task 4 provided further information. It showed
misfit to the model, suggesting that a secondary variable had been
measured. The task thus provided additional information about students'
performances that could not be measured by the other parts of the
program.
Figure 1. Variable Map of Extended Task
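Fit statistics such as the infit mean square reported in Figure 2 (1.16) are conventionally computed, for each item, as the sum of squared differences between observed and expected scores divided by the sum of the model variances. Values near 1 indicate responses about as variable as the model predicts; values above 1 indicate more variation than predicted, which is what an unmodelled secondary variable would tend to produce. The sketch below assumes a partial credit model with known abilities and step difficulties; the function names and the step values in `demo` are hypothetical and not taken from this program.

```python
import math
import random


def pcm_probs(theta, deltas):
    """Partial credit model: P(score = k | theta) for k = 0..len(deltas)."""
    logits = [0.0]
    for d in deltas:
        logits.append(logits[-1] + theta - d)
    top = max(logits)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def infit_mnsq(thetas, scores, deltas):
    """Information-weighted (infit) mean square for one polytomous item."""
    squared_residuals = 0.0
    total_variance = 0.0
    for theta, x in zip(thetas, scores):
        p = pcm_probs(theta, deltas)
        expected = sum(k * pk for k, pk in enumerate(p))
        variance = sum((k - expected) ** 2 * pk for k, pk in enumerate(p))
        squared_residuals += (x - expected) ** 2
        total_variance += variance
    return squared_residuals / total_variance


def demo(n=2000, seed=0):
    """Infit for scores simulated from the model and for purely random scores."""
    rng = random.Random(seed)
    deltas = [-0.5, 0.0, 0.5, 1.0, 1.5]  # hypothetical step difficulties
    thetas = [rng.gauss(0.5, 1.0) for _ in range(n)]
    model_scores = [rng.choices(range(6), weights=pcm_probs(t, deltas))[0]
                    for t in thetas]
    random_scores = [rng.randrange(6) for _ in thetas]
    return (infit_mnsq(thetas, model_scores, deltas),
            infit_mnsq(thetas, random_scores, deltas))
```

`demo()` returns a value close to 1 for scores simulated from the model and a clearly larger value for scores drawn at random, which is the contrast a fit map displays.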
Distracter analyses of every item in the test were produced, including
those of the extended task. One of these is shown in Figure 2.
Category             0       1       2       3       4       5
Percent (%)       36.8    17.8    21.0    15.3     7.8     1.3
Mean ability      0.02    0.05    0.28    0.68    1.07    1.50
Threshold            -   -0.06    0.41    0.97    1.71    3.11
Error                -    0.13    0.12    0.14    0.18    0.30
Infit MNSQ = 1.16
Figure 2: Student performance on Extended Task Part 2, Measurement - Process
Mean ability increases with score, as expected. The same pattern
appeared in the analyses of every part of this task, which suggests
that the marking scheme had been applied consistently by all raters,
whether teachers in schools or members of the central marking team.
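The pattern described above can be checked directly on the values in Figure 2. The sketch below tests whether mean ability rises strictly from one score category to the next, and applies the same test to the thresholds; it assumes the five thresholds belong to categories 1 to 5, and the function name is hypothetical.

```python
# Figure 2 values (Extended Task Part 2, Measurement - Process).
MEAN_ABILITY = [0.02, 0.05, 0.28, 0.68, 1.07, 1.50]  # score categories 0..5
THRESHOLDS = [-0.06, 0.41, 0.97, 1.71, 3.11]         # assumed to belong to categories 1..5


def strictly_increasing(values):
    """True if each value is larger than the one before it."""
    return all(later > earlier for earlier, later in zip(values, values[1:]))


print(strictly_increasing(MEAN_ABILITY))  # True
print(strictly_increasing(THRESHOLDS))    # True
```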
The relationship between student ability and score assigned for each
component of the extended task is shown in figure 3.
Figure 3: Mean ability by score on extended task.
A strong relationship is shown: score is almost perfectly predicted by
student ability level. It should be remembered that student ability
was calculated only from the other parts of the program, which makes
the strong fit (R² = 0.98) all the more notable.
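The strength of such a relationship is usually summarised by the coefficient of determination, R², of a least-squares straight line. The data behind Figure 3 are not reproduced here, so the sketch below, with a hypothetical function name, applies the calculation to the six category means of Figure 2 instead, purely as an illustration. For a straight-line fit, R² equals the squared Pearson correlation, so it does not matter whether score is regressed on ability or the reverse.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least-squares straight line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


scores = [0, 1, 2, 3, 4, 5]
mean_ability = [0.02, 0.05, 0.28, 0.68, 1.07, 1.50]  # Figure 2 category means
print(round(r_squared(scores, mean_ability), 3))  # 0.947
```

For these six points the result is about 0.95.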
The results from this task were combined with a small number of other
items from other parts of the program and reported as a strand called
"Competencies". Given that the task appears to be measuring a different
variable, students who completed this task were effectively provided
with additional, richer information.
Student and teacher attitudes
It was clear from a number of responses that a task of this nature was
unfamiliar to many teachers. Some criticised it on the grounds that
this was literacy or social science and not mathematics. Others
complained about the amount of time required. In some schools, however,
the task was approached positively as a learning experience for all.
One teacher, in a school in a low socio-economic area, commented
afterwards that the task was "... horrendous. But it made me realise
that we weren't doing enough of this kind of task with the students."
Unfamiliarity with open-ended project tasks was borne out by the
teacher questionnaire results. Of 157 responses, only 5.1% used
investigations often, and 59.2% rarely or never used open-ended tasks
in their teaching. The dominant model of assessment was school-based
tests, with about half reporting that they sometimes used teacher-initiated
projects. Student-initiated projects, or negotiated studies,
were much rarer, with 63% of teachers never using this form of
assessment. In this climate it is not surprising that the task created
problems for some, and there are implications here both for classroom
practice and future testing programs.
Classroom context
The introduction of the task was observed in a class of middle level
students in one school. The teacher concerned had photocopied
dictionary definitions of the four categories (adverts, news, features
and information), and students were asked to find examples of each in
their newspaper.
Initially the students were uncomfortable with the idea that they could
use a definition or identify something in a different way from that of
other students. As they became more confident, however, they began to
defend their choices to teachers and other students. The ability to
formulate defensible definitions in a real-life context as a basis for
measurement is an important one, but it is often not addressed
explicitly in schools, where much of what students work with is defined
for them.
The task created cognitive conflict for many students. Although it was done
during "maths" time in most cases, this was not a usual maths task. Some
students treated it as an English assignment, and wrote pages of text
completely missing the numeracy aspects. In schools where students
appeared to have been given little guidance by teachers this was
particularly evident. A teacher reported that one student came to her
in tears. The student recognised that she had not completed the task in
the way that it required but she was completely unable to reorganise
her thinking to meet the demands. While not wanting to put students
under undue duress, it is necessary to remember that this sort of
conflict is at the heart of real learning. The teacher's responsibility
is to support and question to help the student resolve the conflict.
Computer use
Responses via the Internet were disappointingly low. The major issue
appeared to be access to computers during mathematics lessons, even
though schools had well equipped computer facilities. Computer use was
not common during mathematics time and one teacher commented that it
was "...quite frankly too much trouble to use the Internet."
Observations in one trial school were revealing. During the Monday
morning lesson students were free to go to the computer room to
continue work on the project. A group of about six students, all boys,
took advantage of this. The others preferred to work in traditional
ways and had produced some excellent work. When asked why they
hadn't chosen to use the computer facilities, students were
non-committal. They were all familiar with computers - indeed some of
these students were participating in a special extension program
centred around technology use - but were not disposed to use computers
for a "maths project". In another school a class observed in a very
well equipped computer laboratory were producing some outstanding
mathematical models of the structure of a newspaper as shown in figure
4. The teacher's comment was "I could only arrange to be here because
you were coming." In this classroom, a number of students had not used
a spreadsheet and the basics of spreadsheet use had to be taught before
they could get into the task itself. This extended the time required
and further exacerbated the access problem.
Figure 4: Ribbon graph produced to show the analysis of a newspaper
In general, students who answered only via the Internet text interface
did not produce outstanding work. This seemed to be partly related to
the interface itself, which mirrored the hard copy but had some
drawbacks. Students could not work on the net page and save their
work ready for another occasion. Some students got around this by
sending the work in two separate lots, others worked in familiar
programs and then used the email link provided to send their completed
project to the centre.
Discussion
Technical issues
Delivering a task of this nature via the web provided some
technological challenges, some of which have already been described.
Placing a task of this nature on the web was relatively easy, but the
quality and quantity of the responses were far lower than expected.
This seems to reinforce recent newspaper reports that computer
ownership and Internet usage were far lower than had been expected
(Crawford, 1997).
Schools in general have extensive computer facilities today, and we had
not anticipated the difficulty of access or the relatively low computer
skills of the students. Responses to the survey question on computer use
are shown in figure 5.
Figure 5: Computer use
The lack of computer use in mathematics is very clear and explains the lack of
familiarity with spreadsheets that was reported by some teachers. If
students are to progress from pressing buttons to using computers as a
tool for analysis and synthesis of information there are major
organisational issues for schools in Tasmania to resolve.
The responses of students who answered via the Internet suggested
that the interface itself did not stimulate their
thinking. Responses via the text boxes were usually short and
simplistic, as shown in figure 6, and did not provide much evidence of
the development of general thinking skills or use of mathematical
thinking. The competencies strand performance of the same student whose
response is shown in figure 6 was depressed relative to the other
strands (figure 7), although still relatively high compared to the rest
of his school. The text interface did not seem to encourage exploration
of ideas. This may have been compounded by the need to work partly with
concrete materials, using the newspaper, and partly through the ikonic
medium of the computer. In contrast, students who emailed responses as
an attachment provided a much richer picture. They were using familiar
computer programs but were also generally better students. The thinking
processes of students using computers may need to be explored further
in order to provide an interface more conducive to developing complex
responses. As well the computer interface needs to provide
opportunities for less able students to achieve to their capability.
This is an area where test constructors may need to learn from computer
games designers.
The need to save work and return to it at another time is crucial and
this could have been overcome with more development time. The
compromise email link, however, was effective and this may provide some
direction for the future.
From: TOL guest account[SMTP:tolguest@shelob.ecc.tased.edu.au]
Sent: Monday, 28 July 1997 5:10PM
To: OER Numeracy
Subject: Y9NAMP -
------------------------------------------------------------
Student Name : XXX
ID Number : XXX
School : XXX High School, Hobart
The following questionnaire response has been received :
Question 1a : An ad is any device or public announcement, as a
printed notice in a newspaper, a commercial film on television, a neon
sign, etc. Designed to attract public attention, bring in custom, etc.
The article titles on page one are literally advertising the article,
so I would regard the titles to be ads.
Question 1b : A report of any recent event, situation, etc. Yes
i would regard the tables of share movements as news, becuase they
advise people of any recent changes in share values
Question 1c : A prominate or conspicous part or characteristic.
It depends if the article has been talked about before, I surpose that
it could be a bit of both.
Question 1d : Knowledge communicated or received concerning some
fact or circumstances; news. Public notices are most likly to be
information, but it could be thought to be and ad, as it is advising
people to an event.
Question 2a : Using number of columns of writing per page.
Question 2b : The pages towards the middle of the paper had less
writing and more advertisments, the pages towards the beginning had
less ads and more writing, and the pages towards the back of the paper
had medium amounts of writing, little ads, and alot of pictures.
Question 3a : There were less ads towards the front of the paper,
in the middle of the paper the were a medium amount of ads that were
aimed at an older audience, and towards the end of the paper, there was
alot of ads mainly aimed at a young adult age group ( 15 - 30 ).
Question 4a : I would place an ad for sporting goods in the
sports section of the paper, because people interested in sport would
read this section more closly than the rest of the paper. The ad would
cost a medium rate probably about $80 - $100.
I would place a ad aimed at year nine students in the sports
section as well, but in one of the back sports pages.
Because a younger person might read the paper back to front, ( because
the sports section is towards the back). It would cost about $80 -
$100.
------------------------------------------------------------
Request came from :
- Remote host: dslip3.its.utas.edu.au
- Remote IP address: 131.217.8.3
Figure 6: Response via Internet text box
Figure 7: Part of student XXX report
Use of performance tasks
The extended task attracted criticism on several grounds. Although an
investigative project was a required part of the year 9 mathematics
syllabus, teachers objected to the amount of time this task took. This
criticism is valid as long as teachers see monitoring programs primarily
as "add ons" for accountability purposes rather than as an integral part
of their classroom teaching and assessment plans. This has implications
for all testing programs.
Teachers raised issues of validity and reliability, particularly
relating to the authenticity of the response on two grounds: firstly,
how would we know that any response received via the web came from the
actual student that it claimed to be from; secondly, wouldn't students
who got help be advantaged?
The first concern is an issue for any monitoring program - we do not
know for sure that the student identified has actually answered the
questions, regardless of the medium used to collect these answers.
Instead we rely on the professionalism of teachers to ensure that
mistakes are not made. The second concern could well be answered in the
same way. The teaching process requires a teacher to interact with an
individual. The nature of that interaction depends on the individuals
involved but teachers would generally agree that they ask different
questions of students with different abilities. The professionalism of
teachers using a performance assessment task as part of their teaching
and learning program will ensure that classroom support is appropriate
to the student and aimed at eliciting the student's best possible
response. In this way we can begin to get an assessment of optimal
rather than functional performance (Lamborn and Fischer 1988). Nor did
the issue of some students being advantaged by getting help from
parents or other people show up in the analysis of results, as shown
by figure 3.
Both concerns can also be addressed in a program of this nature by a
consideration of the performances of students across all tasks. If
major discrepancies are detected then we could infer that some
performances were not authentic. In this program the fit of the
extended tasks onto the common scale of item difficulty through common
person linking indicated that this was not a problem. Students with
high ability, as measured in the other parts of the program, produced
high performance on the extended task.
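A screen of the kind described can be sketched by comparing each student's observed score on an extended-task component with the score expected from the ability estimated on the other parts of the program, and flagging large standardised residuals. The sketch assumes a partial credit model with already calibrated step difficulties; the cut-off of 2, the step values in the example and all names are hypothetical. A flag only marks a response for a closer look; it does not show that the work was not the student's own.

```python
import math


def pcm_probs(theta, deltas):
    """Partial credit model: P(score = k | theta) for k = 0..len(deltas)."""
    logits = [0.0]
    for d in deltas:
        logits.append(logits[-1] + theta - d)
    top = max(logits)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def flag_discrepancies(students, deltas, cutoff=2.0):
    """Return (id, z) for students whose score is far from what ability predicts.

    students: iterable of (student_id, anchored_ability, observed_score)
    """
    flagged = []
    for student_id, theta, score in students:
        p = pcm_probs(theta, deltas)
        expected = sum(k * pk for k, pk in enumerate(p))
        variance = sum((k - expected) ** 2 * pk for k, pk in enumerate(p))
        z = (score - expected) / math.sqrt(variance)  # standardised residual
        if abs(z) > cutoff:
            flagged.append((student_id, round(z, 2)))
    return flagged
```

For example, with steps [-0.5, 0.0, 0.5, 1.0, 1.5], a student of ability -2.0 who scores the maximum of 5 is flagged, while a student of ability 0.5 scoring 2 is not.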
Criticism of the task on the grounds that the results were based on teacher
judgement also appears to be unfounded. The item analyses for the
extended task did not indicate any randomness in response as might have
been expected if the marking had been inconsistent. The use of literacy
profiles and the work by Hill, Rowe and others for almost a decade have
shown that judgement using a holistic frame of reference - such as that
provided by the scoring rubric for this task - can lead to high levels
of reliability (Rowe and Hill 1995). Maybe it is time that we started
to trust teachers as highly trained and competent professionals.
Summary
This task aimed to assess numeracy in a wider context than had
previously been attempted in a statewide monitoring program. From a
measurement perspective this was successful - the task provided a
reliable assessment of competence beyond basic mathematical skills. It
also demonstrated the technical feasibility of delivering a performance
task via the Internet.
There are, however, issues still to be resolved. The nature of numeracy
itself and its relationship to mathematics and other curriculum areas
is currently being widely debated. Our belief that this was a numeracy
task was not shared by all teachers.
The use of computers carries with it the need to consider school
organisation to improve access during mathematics and science lessons.
Only in this way will students improve their data handling skills and
begin to use the computer as an effective analysis tool, beyond just
word processing. Access to the technology will be an issue for all
systems as computer use and testing become more widespread.
There is also the need to consider the computer interface. Merely
reproducing a paper test on the screen does not appear to draw out
optimal responses. Further research into students' cognitive processes
when faced with a computer interactive task is needed.
Finally, there are political issues to consider in any test that
involves teachers. There is evidence that teachers are skilled at
making judgements about students' performances. It is necessary,
however, for society in general and, in some instances, teachers
themselves to accept this before teachers can hope to be actively
involved in assessment as part of a large scale monitoring program.
References
Adams, R. and Khoo, S.T. (1995) Quest: an interactive item analysis
program. Melbourne: Australian Council for Educational Research.
Australian Association of Mathematics Teachers (1997) Numeracy =
everyone's business. Report of the Numeracy Education Strategy
Development Conference, May 1997. Adelaide: Author.
Biggs, J.B. and Collis, K.F. (1982). Evaluating the quality of
learning: The SOLO Taxonomy. New York: Academic Press.
Chance and data home page:
http://www.ni.com.au/mercury/mathguys.mercury.htm
Crawford, W. (1997) Blurb artists entangled in net of hype. Mercury,
Hobart, 22 November, p. 40.
Crowther, G. (1959) Report to the Central Advisory Council for
Education. London: Her Majesty's Stationery Office.
Department of Education and the Arts (1994) Mathematics Guidelines K-8.
Chance and Data. Hobart: Curriculum Services Branch.
Department of Education and the Arts (1995) Numerate students -
Numerate adults. Hobart: Author.
Griffin, P. (1997) Developing assessment in schools and workplaces.
Inaugural Professorial Lecture, University of Melbourne, September 18.
Harmon, M., Smith, T.A., Martin, M.O., Kelly, D.L., Beaton, A.E.,
Mullis, I.V.S., Gonzalez, E.J. & Orpwood, G. (1997) Performance
Assessment in IEA's Third International Mathematics and Science Study.
Chestnut Hill, MA: International Association for the Evaluation of
Educational Achievement (IEA).
Lamborn, S.D. and Fischer, K.W. (1988). Optimal and functional levels
in cognitive development: The individual's developmental range.
Newsletter of the International Society for the Study of Behavioural
Development, 1988, No. 2, Serial No. 14, pp. 1-4.
Lewy, A. (1996) Postmodernism in the field of achievement testing.
Studies in Educational Evaluation, Vol. 22, No. 3. pp. 223-244.
News on disk: http://www.marketing.newsltd.com.au
Nuttall, D.L. (1992) The message from England. In K. Burke (Ed.)
Authentic Assessment. A Collection. Cheltenham, Vic.: Hawker Brownlow
Education.
Rowe, K. and Hill, P. (1995, November) Assessing, recording and
reporting students' educational progress: The case for subject
profiles. Paper presented at the Annual Conference of the Australian
Association for Research in Education, Newcastle, Australia.
Shepard, L.A. (1992) Why we need better assessments. In K. Burke (Ed.)
Authentic Assessment. A Collection. Cheltenham, Vic.: Hawker Brownlow
Education.
Zuzovsky, R. (1997) Assessing scientific and technological literacy
among sixth graders in Israel. Studies in Educational Evaluation, Vol.
23, No. 3. pp. 231-256.
Copies of the task and the scoring rubric can be accessed at:
http://www.tased.edu.au/tasonline/edreview