Let’s save the Trees – Online Testing to the Rescue
Sivakumar Alagumalai
Jonathan Anderson
The Flinders University of South Australia
Traditionally, much research begins with a survey or test completed on
paper, and the data must then be transferred from hard copy to digital
form on a computer before any statistical analysis can be attempted.
This can be a huge undertaking, especially
if numerous test instruments are used so as to understand or answer a
comprehensive research question. At the end of the study, the ‘used’
hard copies are destroyed or recycled. This can be both an expensive
and time-consuming activity. This paper examines an approach to, and a
trial of, internet-based testing (net-testing), and highlights the
advantages online testing has over traditional methods of data
collection. An IT course currently taught at the School of Education is
used to illustrate net-testing. Psychometric differences between
traditional and net-testing methods are discussed. Findings on
students’ views about net-testing are also reported. Technical details
of setting up net-tests, and procedures for data manipulation and
statistical analysis, are discussed.
Please send suggestions and queries to:
sivakumar.alagumalai@flinders.edu.au
Introduction
Computer administered and managed tests (CAMT) are here.
Computerised-Linear Tests (CLT) and Computer-Adaptive Tests (CAT),
which are forms of CAMT, have been shown "to reduce testing time, to
obtain more information about the test takers, to increase test
security, to provide instant scoring and to be scheduled more easily
than paper-and-pencil-administered tests" (PAPAT) (Bugbee, 1996, p.282).
Most, if not all, CAMT have been conducted on standalone workstations
and on Local-Area-Network (LAN) systems. This paper seeks to make the
leap from testing in a LAN environment to Wide-Area-Network (WAN)
systems, with special reference to the internet. It also attempts to
extend CAMT to survey scales and opinionnaires. CAMT raises problems
related to the psychometric properties of the scales used, and these
are examined. Students’ views on using CAMT are also discussed. In
addition, the paper addresses the technical set-up and problems of
internet-based CAMT (net-testing). It concludes with illustrations of
data management and analysis.
Advantages of net-testing
Rapidly emerging technologies are now making it possible for all
schools and institutions to have the infrastructure for CAMT (Sandals,
1992). Sandals indicates that technological enhancements and their
related psychometric capabilities have brought psychological testing,
and testing in related fields, into the teaching and learning arena.
There are advantages, disadvantages and differences to be found in
net-tests (and CAMT) as compared with traditional PAPAT. Numerous
researchers (Sandals, 1992; Glowacki et al., 1995; Kumar & Helgeson,
1995; Bugbee, 1996; Lloyd et al., 1996) indicate the advantages of
net-tests. Tables 1 and 2 (adapted from Sandals, 1992) summarise the
major advantages.
Tables 1 and 2. Major advantages of net-tests (Sandals, 1992)
Figure 1 is an example of a survey form used to obtain
students’ feedback about teaching.
Figure 1. Sample student feedback form
The advantage of putting such survey forms, and any CAMT, on the
internet is that it allows students access at any time. However, in the
case of CAMT, safeguards against student collusion must be in place.
One option is to maintain a large database of items. A large item pool
makes it possible to set parallel tests whose items are of comparable
difficulty, so that no two students sit the same test, i.e. one with
identical items. Ideally, net-tests should be administered in an
assigned period to reduce variations in test conditions.
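The idea of drawing parallel forms from a large item pool can be sketched as follows. This is a minimal illustration only, assuming each item has already been assigned to a difficulty band; the item identifiers, the three band names and the function names are hypothetical and are not taken from the course software.

```python
import random
from collections import defaultdict


def build_parallel_form(item_bank, items_per_band, rng):
    """Draw a form with the same number of items from every difficulty band."""
    by_band = defaultdict(list)
    for item_id, band in item_bank:
        by_band[band].append(item_id)
    form = []
    for band in sorted(by_band):
        # Sampling without replacement inside each band keeps forms comparable.
        form.extend(rng.sample(by_band[band], items_per_band))
    return form


# Hypothetical bank: 30 items, 10 in each of three assumed difficulty bands.
ITEM_BANK = [(f"Q{n:02d}", ("easy", "medium", "hard")[n % 3]) for n in range(30)]
BAND_OF = dict(ITEM_BANK)


def band_profile(form):
    """Return the sorted list of difficulty bands represented in a form."""
    return sorted(BAND_OF[item_id] for item_id in form)
```

Two students would each receive the result of a separate call to build_parallel_form. The forms share a difficulty profile, although with a small bank they may still share some items, which is why a large pool matters.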
Although there are technical problems associated with net-tests and
net-surveys, there exists great potential for large participation
rates. In line with the arguments of Anderson and Alagumalai (1996) and
Alagumalai, Anderson and Mala (1997) concerning the accessibility and
availability of cyber-based HyperText Markup Language (HTML) forms
across platforms, research and testing programs can be managed
effectively on a global scale. Furthermore, net-tests allow for a wider
range of testing modalities than is possible through PAPAT. Sound,
animations and video-clips can easily be incorporated into net-tests,
allowing for authentic testing. The use of these modalities in real
time in net-tests answers the queries raised by Keeves and Alagumalai
(in print) by providing a means of improving measurement in science and
mathematics education. Thus, the implications of net-tests and
net-surveys are wide and far-reaching.
Psychometric properties of net-tests and net-surveys
Associated with net-tests and net-surveys is the mode effect on the
examinee (Parshall & Kromrey, 1993). This raises further questions
about the psychometric equivalence of CAMT and PAPAT. Validity,
reliability, parallax issues and students’ preferences are inevitable
problems that need to be addressed.
Olson (1986) indicated that there were no significant differences in
measurement precision between PAPAT and CAMT. Furthermore, CAT was the
most precise in estimating ability and required 75% less time to do
so. In line with this, Perkins (1993) reports that there was no
significant difference in scores between CAMT and PAPAT. This has
important implications for the ‘transferability’ or parallax between
tests. Mazzeo et al. (1991) argue that in CLT, the constructs being
measured were not affected by the ‘mode-of-administration’ effect.
However, the mode effect needs to be further explored with CAT.
Associated with the mode effect are the reliabilities and validities
of net-tests. Watson et al. (1990) and Watson et al. (1992) indicate
that the reliabilities of CAMT and PAPAT are about equal. However,
Sukigara (1996) argues that test-retest reliabilities of CAMT are
slightly higher than those of PAPAT. Although Russell et al. (1986)
report that CAMT yields relatively more reliable scores, it may be
necessary to re-examine reliabilities based on Classical Test Theory.
Reliability checks through Item Response Theory (IRT), especially with
the Person-Separation Index and Item-Separation Index, may shed new
light on the reliability of net-tests as compared with PAPAT.
Furthermore, it may be necessary to look into students’ time and rate
of response, which can easily be captured through net-tests, and
compare them with those of PAPAT.
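As an illustration of the Person-Separation Index mentioned above, the sketch below uses the definition common in Rasch measurement: separation is the error-adjusted ("true") standard deviation of person measures divided by the root mean square standard error, and the corresponding reliability is true variance over observed variance. The function name is hypothetical, the use of population variance is an assumption, and the person measures and standard errors are assumed to come from whatever IRT software is in use.

```python
from math import sqrt
from statistics import mean, pvariance


def person_separation(measures, standard_errors):
    """Return (separation index, separation reliability) for person measures."""
    observed_var = pvariance(measures)
    error_var = mean(se ** 2 for se in standard_errors)
    # Observed variance is treated as true variance plus error variance.
    true_var = max(observed_var - error_var, 0.0)
    reliability = true_var / observed_var
    separation = sqrt(true_var) / sqrt(error_var)
    return separation, reliability
```

Computing the index for the same students under PAPAT and under a net-test would give one way of comparing the two modes; an Item-Separation Index can be obtained in the same way from item difficulties and their standard errors.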
In discussing the validity of CAMT, Bugbee (1996, p.286) stresses that
"validity of computer-based versions of a test must be proved by the
developers." Thus, the onus is on the test developer to ensure
parallel internal and external validities of test forms, especially
when items are selected at random to create parallel test forms. If
these parallel tests satisfy equality of ranks but not equivalence of
score distributions, then the scores need to be re-scaled. Currently
available software allows for automatic rescaling (Gibbs & Lario,
1995).
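One simple form of rescaling is a linear transformation that gives the scores on one form the same mean and standard deviation as those on a reference form while leaving the rank order unchanged. The sketch below shows this linear approach only as an illustration; it is not a description of how the software cited above performs rescaling, and the function name and score lists are hypothetical.

```python
from statistics import mean, pstdev


def rescale_to_reference(scores, reference_scores):
    """Linearly map `scores` onto the mean and SD of `reference_scores`."""
    m_x, s_x = mean(scores), pstdev(scores)
    m_ref, s_ref = mean(reference_scores), pstdev(reference_scores)
    if s_x == 0:
        raise ValueError("scores have zero spread and cannot be rescaled")
    # Standardise each score, then express it on the reference scale.
    return [m_ref + s_ref * (x - m_x) / s_x for x in scores]


# Hypothetical raw scores on two parallel forms.
form_a_scores = [12, 15, 18, 20, 25]
form_b_scores = [8, 10, 14, 15, 18]
form_b_rescaled = rescale_to_reference(form_b_scores, form_a_scores)
```

Because the transformation is linear with a positive slope, a student who ranked above another on the original form still ranks above them after rescaling.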
Hence, to reduce parallax errors between PAPAT and CAMT and to make
meaningful comparisons between them, all test form versions need to
have approximately equal validities, reliabilities and correlations
with criterion variables, and equal correlations among multiple
versions. The current state of software control through HTML forms,
Common-Gateway-Interfaces (CGIs), Integrated-Developmental-Environment
(IDE) and Java-applets removes the complexities of adjusting the
psychometric properties and indices of net-tests and net-surveys.
The relationship between computer familiarity and scores on CAMT
still requires further research, as current findings are inconclusive
(Mazzeo et al., 1991; Perkins, 1993). Lee (1986) showed that past
computer experience significantly affected performance on CAMT. As CAMT
may discriminate against those who have no experience of it,
administrators need to take cognisance of students’ previous experience
(Heywood, 1989).
Perkins (1993) found that computer ownership lowered anxiety about
computer use. He also found that computer use and exposure to computers
during Computer-based Instruction were not sufficient to lower anxiety.
This latter finding contradicts what was reported by our students.
In the Multimedia Literacy course taught in the School of Education at
Flinders University, students indicated lowered anxiety levels during
CAMT, which they attributed to frequent exposure to computer use.
All lecture materials and slides for the Multimedia Literacy course
were made available to the students through the department’s intranet
facility. Furthermore, all assignments and tutorials were done on
computers.
Students were given parallel versions of a PAPAT and a CAMT, and
scores on the two tests were identical. Most students indicated
‘acclimatisation’ to computers and had spent, on average, four to nine
hours a week on their assignments and projects. Practice on model
net-tests prior to the actual net-test appeared to have lowered
students’ anxiety. It can be argued that constant exposure to
computers, and teaching and learning through computers, especially
through flexible delivery, enhanced their performance on the net-test.
The whole structure and approach of the course, which differs from the
traditional definition of Computer-based Instruction, has helped lower
high anxiety levels, leading to a more positive attitude towards
net-tests and better performance in them.
Open-book net-tests, in which students are allowed to browse the
internet for information and to use textbooks and personal notes, may
further help remove net-test anxiety. With the emphasis on the
processes and skills of information use, it may be necessary to move in
the direction of open-book net-tests; this needs further exploration.
Setting up the Net-test
CGIs and IDE-links are the basis for all forms of net-tests and
net-surveys. The HTTP protocol used on the internet is generally a
one-way street, going from servers to clients. However, browsers can
send specific requests to the server, so there is also a return
(requester) path. CGIs function on this path (McComb, 1996). CGIs
pass data in two ways: one is URL-based and readily visible, while
the other is hidden. The two commonly used methods are GET and POST,
respectively.
McComb (1996, p.550) indicates that "CGI programs that use the GET
method are generally easier to write, but the URL is limited to 256
characters. The POST method is ideal when lots of data has to be
provided by the client, and there are no restriction to the number of
character used."
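The difference between the two methods can be seen by encoding the same form data both ways. The sketch below is illustrative only: the field names, the student number and the URL are hypothetical. With GET the encoded data become part of the URL, whereas with POST the same encoding travels in the body of the request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical responses from a three-field net-test form.
responses = {"student_id": "9701234", "q1": "B", "q2": "D"}
encoded = urlencode(responses)

# GET: the data are appended to the URL and are visible to the user.
get_url = "http://example.edu/cgi-bin/nettest?" + encoded

# POST: the same encoded string is sent in the request body instead.
post_body = encoded.encode("ascii")


def decode_form(data):
    """Recover the submitted fields from URL-encoded form data."""
    return {name: values[0] for name, values in parse_qs(data).items()}
```

On the server side, a CGI program using GET would read the encoded string from the query part of the URL, while one using POST would read it from its input stream; decode_form applies to either.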
Tied to these CGI methods are the necessary IDE-links that pass
commands from the CGIs to the databases that store the information.
Microsoft’s Access *.mdb database format was used as it provided
several advantages over other database engines (Garcia, 1997).
The CLT versions of net-tests and net-surveys have a similar form
structure. As illustrated in Figure 1, the form has a title and an
introduction, followed by the test questions or survey items. At the
end of these items are the "SEND FORM / SUBMIT" and "RESET ENTRIES"
buttons. Behind these buttons are the associated CGI functions. On
completing the net-survey or net-test, the user clicks on the first
button, which initiates the sequence of events summarised in
Figure 2.
[Diagram panels: CLIENT, SERVER, CLIENT]
Figure 2. Technical structure for net-tests and net-surveys
The front-end panel was a form designed using JavaScript and HyperText
Markup Language (HTML). This HTML form provides the user or client with
the necessary entry fields and instructions. When users submit their
responses, the related CGI routines, through the server software,
respond by producing another form, this time acknowledging receipt of
the user’s information.
All the client’s information is captured in an Access database that has
fields corresponding to the headings of the test database and sits in
the directory where the CGI functions are located. Website Professional
Version (Beta) II server software was used to pass data between the
database engine, the compiled CGI functions and the form on the
internet. The IDE provided for the transfer of all data received to the
database, which can then be exported to a spreadsheet for data
management and analysis.
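The path from submitted form to database to spreadsheet can be sketched as below. This is not the set-up described above: an in-memory SQLite database stands in for the Access database, comma-separated text stands in for the spreadsheet, and the table, field and function names are all hypothetical.

```python
import csv
import io
import sqlite3

FIELDS = ("student_id", "q1", "q2", "q3")


def save_response(conn, form_data):
    """Store one submitted form as a row; fields mirror the form's names."""
    conn.execute(
        "INSERT INTO responses (student_id, q1, q2, q3) VALUES (?, ?, ?, ?)",
        tuple(form_data[name] for name in FIELDS),
    )


def export_csv(conn):
    """Dump all stored responses as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(
        conn.execute(
            "SELECT student_id, q1, q2, q3 FROM responses ORDER BY student_id"
        )
    )
    return buffer.getvalue()


# SQLite is used here only as a stand-in for the Access *.mdb file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE responses "
    "(student_id TEXT PRIMARY KEY, q1 TEXT, q2 TEXT, q3 TEXT)"
)
save_response(conn, {"student_id": "9701234", "q1": "B", "q2": "D", "q3": "A"})
```

Making the student number the primary key means a second submission from the same student is rejected rather than silently stored twice; whether that is desirable depends on the test rules.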
Net-tests can also have any level of password protection. In the
Multimedia Literacy course, two-level password protection was used.
Students first had to enter their name and student identification
number, which were then passed to the server. The compiled routines
checked the received information against a database of students’
personal particulars. If and only if the information received matched
the information held was the student allowed to proceed to their
respective tutorial/test group, the password for which was given out
only when the whole class was present in the computer laboratory. This
was to prevent ‘outsiders’ hacking into the test system and to prevent
cheating. Only when this second-level password tallied with the
examiner’s did the student get to sit the net-test. Figure 3 is an
example of a one-level password protection system, in which all
information is passed to and checked by the server in one sweep.
Figure 3. One-level password protection
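The two-level check described above can be sketched as a pair of server-side tests. The student record, the group name and the session password below are hypothetical, and a real system would hold them in the database rather than in the program.

```python
import hmac

# Hypothetical records; in practice these would be held in the database.
STUDENTS = {"9701234": "Lee Wei"}
SESSION_PASSWORDS = {"tutorial-A": "maple-42"}


def check_identity(name, student_id):
    """Level one: the name must match the record held for this student ID."""
    held = STUDENTS.get(student_id)
    if held is None:
        return False
    return hmac.compare_digest(
        held.casefold().encode(), name.strip().casefold().encode()
    )


def may_start_test(name, student_id, group, session_password):
    """Level two: the group password announced in the laboratory must tally."""
    if not check_identity(name, student_id):
        return False
    expected = SESSION_PASSWORDS.get(group)
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), session_password.encode())
```

A one-level system of the kind shown in Figure 3 corresponds to calling may_start_test once, with all fields submitted on a single form.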
In the CLT form of the net-test, a pre-determined number of items is
presented. The use of buttons and check-boxes for the alternatives in a
multiple-choice test allows for easy checking and correction.
Free-response answers can also be collected through text-fields (Figure
4). Unlike numeric responses, which can easily be analysed on a
spreadsheet, free-response items need to be looked at individually by
the examiner.
Figure 4. CLT-version of net-test (HTML Form)
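Checking multiple-choice responses against a key, while setting free-response answers aside for the examiner, might look like the sketch below. The answer key, the item names and the function name are hypothetical.

```python
# Hypothetical answer key for three multiple-choice items.
ANSWER_KEY = {"q1": "B", "q2": "D", "q3": "A"}
# Hypothetical free-response items, which cannot be machine-scored here.
FREE_RESPONSE_ITEMS = ("q4",)


def score_submission(submission):
    """Return (multiple-choice score, free-response answers left to mark)."""
    score = sum(
        1 for item, key in ANSWER_KEY.items() if submission.get(item) == key
    )
    to_mark = {item: submission.get(item, "") for item in FREE_RESPONSE_ITEMS}
    return score, to_mark
```

An unanswered multiple-choice item simply scores zero, and an unanswered free-response item is passed to the examiner as an empty string.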
Unlike the CLT-version of the net-test illustrated in Figure 4, the
CAT-version presents greater technical challenges: it is both memory
and resource intensive. Administering a CAT-version of a net-test is
comparable to going through a series of password levels. Information
is sent to and fro between the client and the server. Each response
from the client is checked by the compiled functions in order to
select the next item to suit the user’s ability level. Trials have
been conducted with a maximum of three students sitting a CAT-version
of a net-test, with the web server running on a 100 MHz Pentium
machine with 16 MB of RAM. It was very slow. With current developments
towards faster machines and larger memory, it is hoped that the
CAT-version of the net-test will become commercially viable.
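The selection of the next item to suit the user's ability level can be illustrated with a deliberately simplified rule: pick the unused item whose difficulty is closest to the current ability estimate, then move the estimate up or down by a fixed step depending on the response. This is a sketch under those assumptions, not the routine used in the trials, and an operational CAT would normally estimate ability with an IRT model. The item difficulties and function names are hypothetical.

```python
def next_item(ability, difficulties, administered):
    """Pick the unused item whose difficulty is closest to the ability estimate."""
    remaining = {
        item: b for item, b in difficulties.items() if item not in administered
    }
    if not remaining:
        return None
    # Ties are broken by item name so that selection is deterministic.
    return min(remaining, key=lambda item: (abs(remaining[item] - ability), item))


def update_ability(ability, correct, step):
    """Fixed-step update: move up after a correct answer, down otherwise."""
    return ability + step if correct else ability - step


# Hypothetical item difficulties on an arbitrary scale.
DIFFICULTIES = {"a": -1.0, "b": 0.0, "c": 1.0}
```

Each client response triggers one call to update_ability and one to next_item on the server, which is the to-and-fro traffic that made the trial slow.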
Data manipulation, analysis and Conclusion
Traditionally, all test responses have to be entered through the
keyboard. This may be a challenging task, especially if it involves a
large number of test items and respondents. Beyond the huge job of
entering the data, one also has to understand the software into which
the data are being entered. Data manipulation and analysis pose a
further challenge to the examiner or researcher. Net-tests simplify the
process of data entry, manipulation and analysis.
As the responses of the examinee / client go directly into a
spreadsheet, many man-hours can be saved. The IDE concept, together
with the availability of developers’ versions of spreadsheets
(Microsoft’s Excel), enables the programmer to write routines for
scoring and to incorporate interactive functions for the examiner to
print information collected about the examinee. Similarly, immediate
feedback can be given to the client through HTML forms. The
possibilities are there to be explored.
Student assessment is an area of work which can be expensive,
especially in terms of examiners’ time and the amount of resources
used. Net-tests and net-surveys offer great potential savings in both
management time and the huge quantities of paper used. Net-tests and
net-surveys are the way to go if we want to save the trees and,
through them, our own existence on Earth. As educators and researchers
shift gear into the era of net-tests and net-surveys, we can all but
pray for technology to bring us through.
THE NET PRAYER
In you we trust, not only for information retrieval,
but also for assessing information digestion and assimilation.
Please do not fail us, for you are governed by Murphy’s Laws;
… your hiccups are our "DNS Failure",
… your sighs are our "URL not Found",
… your frustrations are our "Server not Found", and
… your indifferences are our "Still Connecting".
For if you do give up on us,
you may create regurgitation and future inhibitions,
both for you and your future descendants.
References
Alagumalai, S., Anderson, J., & Mala, V. (1997). Software evaluation –
A Pedagogic solution. Paper presented at the 11th Conference of the
Australian Association for Research in Education Brisbane, Australia
(1-5 December, 1997).
Anderson, J., & Alagumalai, S. (1996). From text-based pedagogy to
cyber-based methodology. Paper presented at the 10th Joint Conference
of the Australian Association for Research in Education and the
Singapore Educational Research Association, Singapore (25-29 November,
1996).
Bugbee, A. C. (1996). The Equivalence of paper-and-pencil and
computer-based testing. Journal of Research on Computing in Education,
28(3), pp. 282-299.
Bunderson, C., Inouye, D., & Olsen, J. (1989). The four generations of
computerised education measurement. In Robert Linn (Ed.), Educational
Measurement, (3rd Ed.), pp. 367-407. NY: American Council of Education
/ Macmillan.
Eaves, C.R. (1984-1985). Educational Assessment in the United States.
Diagnostique, 10, pp. 5-39.
Garcia, J. (1997). Make your database sing. VisualBasic Programmer’s
Journal, 7(9), pp. 95-97.
Gibbs, W.J., & Lario, G.A. (1995). TestMaker: A computer-based test
development tool. In Proceedings of Association of Small Computer Users
in Education (ASCUE) Summer Conference. North Myrtle Beach, South
Carolina. (18-22 June, 1995)
Glowacki, M.L., et al. (1995). Developing computerised tests for
classroom teachers: A pilot study. Paper presented at the Annual
Meeting of the Mid-South Educational Research Association. Biloxi, MS.
(8-10 November, 1995).
Heywood, J. (1989). Assessment in Higher Education. Chichester: John
Wiley & Sons.
Hicken, S. (1993). Administering comprehensive examinations using
computers. Collegiate Microcomputer. 11(3), pp.194-198.
Keeves, J.P. & Alagumalai, S. (in print). Advances in Measurement in
Science Education. In Fraser, B.J., & Tobin, K.G. (Eds.) International
Handbook of Science Education. Dordrecht, The Netherlands: Kluwer
Academic Publishers. pp. 1229-1244.
Kumar, D.D., & Helgeson, S.L. (1995). Trends in computer applications
in science assessment. Journal of Science Education and Technology.
4(1), pp. 29-36.
Lee, J.A. (1986). The effects of past computer experience on
computerised aptitude test performance. Educational and Psychological
Measurement, 46(3), pp. 723-733.
Lloyd, D., Martin, J.G., & McCaffery, K. (1996). The introduction of
computer-based testing on an engineering technology course. Assessment
and Evaluation in Higher Education, 21(1), pp. 83-90.
Mazzeo, J. E., et al. (1991). Comparability of Computer and
paper-and-pencil scores for two CLEP General Examinations. College
Entrance Examination Board Report. NY: Educational Testing Service.
McComb, G. (1996). JavaScript Sourcebook. New York: John Wiley & Sons,
Inc.
Olson, J.B. (1986). Comparison and equating of paper-administered,
computer-administered and computerised-adaptive tests of achievement.
Paper presented at the Annual Meeting of the American Educational
Research Association, San Francisco, CA. (April, 1986).
Parshall, C. & Kromrey, J.D. (1993). Computer testing versus
paper-and-pencil testing: An analysis of examinee characteristics
associated with mode effect. Paper presented at the Annual Meeting of
the American Educational Research Association. Atlanta, GA. (12-16
April, 1993).
Perkins, B. (1993). Differences between computer-administered and
paper-administered computer anxiety and performance measure.
Russell, G.K., Peace, K.A., & Mellsop, G.W. (1986). The reliability of
a micro-computer administration of the MMPI. Journal of Clinical
Psychology, 42, pp. 120-122.
Sandals, L.H. (1992). An overview of the uses of computer-based
assessment and diagnosis. Canadian Journal of Educational
Communication, 21(1), pp. 67-78.
Sukigara, M. (1996). Equivalence between computer and booklet
administrations of the new Japanese version of the MMPI. Educational
and Psychological Measurement, 56(4), pp. 570-584.
Watson, C.G., Manifold, V., Klett, W.G., Brown, J., & Thomas, D.
(1990). Comparability of computer- and booklet-administered Minnesota
Multiphasic Personality Inventories among primarily chemical dependent
patients. Psychological Assessment: Journal of Consulting and Clinical
Psychology, 2, pp. 276-280.
Watson, C.G., Thomas, D., & Anderson, P.E. (1992). Do
computer-administered Minnesota Multiphasic Personality Inventories
underestimate booklet-based scores? Journal of Clinical Psychology, 48,
pp. 744-748.