The introduction into schools of the assessment procedures devised by SEAC (the School Examinations and Assessment Council) for recording the performance of primary pupils on the various attainment targets in key stages one (7+) and two (11+) of the National Curriculum has been the subject of widespread criticism by teachers (Muschamp et al. 1992). As a result of these complaints SEAC is now proposing simpler versions, with an emphasis on pencil-and-paper group tests wherever possible. So far, however, little empirical evidence has been forthcoming which would help test constructors decide which domains can readily be assessed through the written mode and which cannot. Teachers find themselves in a "Catch-22" situation: having complained about the length of time it took to carry out the testing programme at key stage one, they now find it difficult to oppose the government's intention to give them some relief, even if the solution offered looks similar to that used two decades ago for selection at 11.

The original test specifications, drawn up by SEAC when commissioning the pilot projects for key stage one, were based upon the recommendations of the TGAT report (DES 1988). The report argued that the three main modes of assessment currently in use in the primary school all had certain defects which gave rise to different forms of unreliability. Written tests, by their nature, were constructed in such a way that they could not replicate the conditions in which curriculum knowledge and curriculum processes had been acquired by the pupils. Practical tasks, while overcoming this problem of curriculum invalidity, were subject to observer error, while the third mode suggested by TGAT, teacher assessment, required summative judgements to be recorded, and these were subject to teacher bias and expectancy effects. By combining a pupil's scores on all three modes, TGAT argued, these non-systematic errors would tend to cancel out.
Behind these arguments is an assumption that the three different modes all measure very similar traits, since if this were not so there would be no justification for combining the three scores. However, a pupil's performance across the three modes could vary according to the cognitive demands generated by the task, either because the skill assessed was context-dependent or because there existed an interaction between the mode and the pupil's ability. The present paper is designed to explore these possibilities.