Graph Theory and the Analysis of Ordered Tree Responses

Paper presented by Michael Bailey, University of Sydney, and Jude Butcher, Australian Catholic University - Sydney, at the joint annual conference of the Educational Research Association Singapore and Australian Association for Research in Education, Singapore, November 1996

Abstract

The analysis of changes in people's cognitions has been found to be an important means of studying the influence of education programs. Ordered trees are a valid tool for identifying differences and changes in participants' holistic forms and schema structures. In an earlier paper, the authors applied graph-theoretic concepts to the analysis of ordered tree task responses and suggested some quantitative and qualitative measures applicable across hierarchical and associative types of holistic forms. The present paper further develops the possible application of some graph-theoretic concepts and procedures to the analysis of concept maps, and discusses issues involved in their use. One of the chief difficulties in applying quantitative methods to the analysis of concepts is the development of meaningful measures which can be applied to both associative and hierarchical holistic forms of organisation. The application of the suggested measures is illustrated with data from some examples.

Introduction and statement of problem

Researchers have analysed the structure of people's thinking as a means of studying their development in knowledge (Biggs & Collis, 1982) or pedagogical domains (Morine-Dershimer, XX). It has also been found that restructuring of knowledge bases is an integral part of teachers' professional development in the management domain (Butcher, 1995).
Researchers using the ordered tree as a research tool have found it capable of identifying differences between groups of novices and experts (Roehler, Herrmann, & Reinken, 1989) and between participants across courses (Beyerbach, 1988; Morine-Dershimer, 1989, 1991). Differences between teachers have been found in the structure of their responses (Elbaz, Hoz, Tomer, Chayot, Mahler, & Yeheskel, 1986; Naveh-Benjamin, McKeachie, Lin, & Tucker, 1986), with experienced teachers presenting ordered tree responses which were more complex and integrated than those of novice teachers (Roehler et al., 1989; Strahan, 1989).

A set of quantitative and qualitative measures was developed as a means of analysing ordered tree task responses (Butcher, 1991). These measures were used across hierarchical tree and associative network responses. Graph-theoretic concepts were adopted as the basis for the construction of the set of quantitative measures for analysing the organisational characteristics of the responses (Bailey & Butcher, 1994).

The present paper develops further the application of graph-theoretic concepts and procedures to the analysis of ordered tree responses and discusses some of the issues involved. The emphasis is on applying quantitative methods to the analysis of both associative and hierarchical type tree response forms. The application of these measures is illustrated in this paper with six prototype holistic form responses and data from a study of teacher development.

The paper discusses some of the structural measures which have been used in analysing ordered tree responses. Principles for constructing quantitative measures of organisational dimensions of responses, based on graph theory and information system approaches, are presented. A set of measures for analysing data from hierarchical or network ordered tree responses is then proposed with data from some examples.
Issues in ordered tree organisational measures

Much research attention has been given to measures of declarative knowledge structure. Cognitive interconnecting of elements in a domain can be discussed in terms of its holistic form and/or in terms of specific characteristics such as integration, differentiation, complexity or interrelatedness. The use of the ordered tree has provided opportunities for quantitative measures of these components of cognitive organisation (Roehler et al., 1989; Strahan, 1989). The analysis of ordered tree interview data (Strahan, 1989) has used qualitative measures. Both quantitative and qualitative measures of cognitive organisation have been used to infer qualitative differences in participants' responses.

Studies using quantitative indices have at times combined several measures of the extent of schema organisation to form a total complexity score. The reliability of the quantitative measures has often not been comprehensively documented. While this is understandable for the more objective features of the measurement, for example number of links and nodes, the reliability of coding chunk measures which involved decisions of superordinate and subordinate relationships among elements warranted more documentation.

Several issues have been identified with the use of the ordered tree technique. Differences in the nature of the ordered tree task instructions have raised issues about the comparability of measures across tasks and their suitability for grounded theory studies of teacher thinking (Bailey & Butcher, 1994; Winitzky, Kauchak, & Kelly, 1994). These differences have included listing or not listing the concepts participants are to incorporate in their responses and/or constraining or not constraining the graphical form of the response. Such differences in the nature of the tasks are related to the degree of constraint (low or high) which the researchers' tasks placed upon respondents.
Moreover, differences in data analysis procedures can impose further constraints on the interpretation of the phenomenon being studied. For ease of analysis, different forms of response are often converted to a hierarchical form (Roehler et al., 1989; Winitzky, 1992). Without this distortion, analysis can rightly be seen as a time-consuming technique (Kagan, 1990). Winitzky, Kauchak and Kelly, comparing the structured and unstructured approaches to concept mapping exercises, commented that:

There are advantages and disadvantages to each approach. The unstructured approach provides a picture of subjects' individual and idiosyncratic views of a content domain, but lost is the capacity to focus specifically on course or target concepts of particular interest to the researcher. In addition, because each subject's map is unique, cross-subject comparisons are rendered more difficult and cumbersome. By contrast, the structured approach allows focused analysis and cross-subject comparisons, but is less useful for investigating individuals' developing conceptualizations of a domain. (1994)

The measures presented in the following section are designed to address the tension between a less constrained task for exploring the nature and development of people's thinking and the need for measures which facilitate cross-sectional and longitudinal comparisons of differences in participants' responses. This tension is addressed through measures which are respectful of the complexity and idiosyncratic nature of people's thinking and cognitive development.

Principles and framework for constructing organisational measures

The construction of a set of organisational measures applicable to different graphic forms of ordered tree responses needs to be based upon a set of principles which respect the nature of the data and incorporate salient organisational characteristics of the phenomenon being studied.
They will also need to be applicable within different research and professional development contexts. The measures also need to acknowledge that responses feature the direct more than the indirect links between content elements (see Figure 1, where AC in the first network is an indirect link and AC in the second network is a direct link).

Figure 1 Two contrasting networks for nodes A, B, and C

The measures are to
• provide quantitative measures applicable to qualitatively different responses;
• be applicable to both hierarchical and non-hierarchical ordered tree responses;
• address graphical and information dimensions of responses;
• focus upon a set of salient, distinctive, though at times complementary, organisational dimensions of respondents' schemata;
• be reliable, with researchers agreeing on the constructs being examined and the matching of data to constructs;
• account for indirect, as well as direct, links between nodes in ordered tree responses.

The constructs adopted for investigating specific characteristics are integration and differentiation of the schema, the accessibility and availability of the information, and the centrality of content. Integration and differentiation are measures of the graphical organisational dimensions of the schema responses. Integration is seen as an expression of the mental process by which the content is grouped together through principles or content seen to be held in common or related to each other. Differentiation is seen as an expression of the mental process which discriminates between the content elements in the schema.

The accessibility of a content element or node is a measure of the least number of links involved in moving from a particular node to each other node in the ordered tree response. Mean accessibility is an overall measure of the ease of accessing the different nodes within a particular ordered tree response.
Content centrality is a measure of the salience of particular content or a content category in the response and is based upon the identification of the node or nodes which have the lowest mean accessibility score(s).

Another key issue in describing ordered tree networks is the information they contain about the respondent's organisation of the domain. Information theory considerations would suggest that too much integration is a bad thing. Both links and their absence convey information. The two representations in Figures 2 and 3 are identical from the point of view of information theory.

Figure 2 No links between nodes

Figure 3 Direct links between all nodes

What kind of measure is appropriate for our purposes? The term "information" appears unsuitable because the measure we need is not directly one of information as defined in information theory, so we propose to call it a measure of "availability" in a concept map. Information availability is an index of the overall ease of retrieval of the information in the schema response. It is a measure of the degree to which the information is ready for use and is related to the number of direct and indirect links used to relate the total set of content elements in the schema.

In a conceptual structure, analogously to sending a message along a network, it is possible to connect concepts by means of intermediate concepts. Conceptualising a link as being implied along a branch in an ordered tree response would facilitate comparison between hierarchical and other types of networks, and reflect the mediated associations which exist between concepts. A simple approach is to regard an indirect path as a path with weighting (0.5)^n, where n is the number of intermediate nodes, and to use only the shortest path between the concepts. We may reasonably guess that, since complete linkage and no linkage equally convey no information beyond the list of nodes, a possible suitable measure might be the one proposed below.
It is, of course, possible to argue that the sum of all possible paths should be used in measuring the strength of the link between two nodes. However valid this is for currents in an electrical network, an ordered tree or concept map may be more analogous to an electrical network with very high resistance at the junctions than to one where most resistance occurs along the links.

Butcher (1995) had previously used the following definitions:

Integration is: ((total links) + (initial nodes) - (total nodes))/(total nodes)

Differentiation is: ((total links) - (parallel links))/((total nodes) - (terminal nodes))

Centrality of a category (node subset) is: (1 + (total depth of tree) - (level of first occurrence of category))/(total depth of tree)

These operational definitions, which worked well when applied to hierarchical tree-type networks, became somewhat problematic when considering some of the complex associative networks generated by investigations with relatively unconstraining instructions to respondents. The proposed simpler definitions of integration and differentiation have the advantages of simplicity and applicability across different types of networks, and the measures appear to correspond well to intuitive understanding of the meanings of the terms (see illustrative examples below).

Measures of specific characteristics of cognitive organisation

Operational definitions for the total set of proposed measures are presented below. They are:

Accessibility of node N is the mean number of links on the shortest path from N to each other node. The lower this value, the greater the centrality of N.

Mean accessibility of an ordered tree network is the mean number of links from node to node, that is, the mean of the accessibility values of the nodes in the network. The lower this value, the more closely interconnected the ordered tree network is.
Integration is defined as the number of nodes with three or more links to them (3-nodes), expressed as a percentage of the total number of nodes in the network.

Differentiation is defined as the number of nodes with only one link to them (1-nodes or terminal nodes), expressed as a percentage of the total number of nodes in the network.

Information availability is defined as 100*K/0.5 percent, where K is the smaller of the two quantities (weighted total links)/(maximum possible links) and 1 - (weighted total links)/(maximum possible links), with weighted total links calculated including implied links. The formal definition of this measure is discussed in Appendix B.

Where two ordered tree networks have a common node-set, it seems reasonable to define a measure of the similarity of the two networks. Similarity is defined as the proportion of the total links which they have in common. It is the number of links in the intersection of the two networks divided by the number of links in the union of the two networks. This measure analyses the ordered tree responses with respect to the degree of connectedness which is in common across each set of two ordered tree responses.

A consideration of pairs of simple networks such as the two in Figure 1 suggests that the similarity measure should not use direct links only. It should also include implied links, as defined in the measure of availability proposed above, because if only direct links are used the two networks in Figure 1 have zero similarity, which seems intuitively unreasonable. With implied links included, their resemblance is 0.17. The use of implied links requires intersection and union to be generalised, so that the union of two links is formed by logical addition (e.g. 0.25 + 0.5 = 0.5) and the intersection of the links by logical multiplication (0.25 x 0.5 = 0.25). As with the definition of availability, a spreadsheet can easily be used to perform the necessary calculations.
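The similarity calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' spreadsheet procedure: the function names are hypothetical, implied links are weighted (0.5)^n along the shortest path, unconnected node pairs are simply given weight 0, and the second example network is assumed to contain only the direct link A-C.

```python
from collections import deque
from itertools import combinations


def link_weights(nodes, links):
    """Weight every node pair: 1 for a direct link, 0.5**n for a shortest
    path with n intermediate nodes, 0 if the pair is unconnected (assumption)."""
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    weights = {frozenset(pair): 0.0 for pair in combinations(nodes, 2)}
    for start in nodes:
        # Breadth-first search gives the fewest links from start to each node.
        links_to = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current]:
                if nxt not in links_to:
                    links_to[nxt] = links_to[current] + 1
                    queue.append(nxt)
        for other, count in links_to.items():
            if other != start:
                weights[frozenset((start, other))] = 0.5 ** (count - 1)
    return weights


def similarity(nodes, links_1, links_2):
    """Weighted links in common divided by weighted links in the union."""
    w1 = link_weights(nodes, links_1)
    w2 = link_weights(nodes, links_2)
    intersection = sum(min(w1[p], w2[p]) for p in w1)  # "logical multiplication"
    union = sum(max(w1[p], w2[p]) for p in w1)         # "logical addition"
    return intersection / union if union else 0.0


nodes = ["A", "B", "C"]
chain = [("A", "B"), ("B", "C")]   # A-C is only an implied link
direct = [("A", "C")]              # assumed second network: A-C linked directly
print(round(similarity(nodes, chain, direct), 2))  # 0.17
```

Under these assumptions the two three-node networks share no direct link, yet the implied A-C link in the chain contributes 0.5 to the intersection against a union of 3, which gives the 0.17 quoted in the text.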
It may also be possible to define a measure of similarity over different (non-disjoint) node-sets, both in terms of the proportion of nodes in common (content) and the proportion of links in common (connectedness). However, the application of this needs further investigation (see Naveh-Benjamin et al., 1986).

Where an ordered tree network contains unconnected or disjoint parts, it is proposed that the number of links from a node in one part to a node in another may be taken as its minimax value of (the smaller depth of the two parts) + 2 [the two being added because it is assumed that at least one unspecified intermediate node exists, or else the link would have been specified directly].

Illustration of organisational measures

To illustrate the proposed measures, six simple prototypical 7-node networks showing different holistic forms of cognitive organisation are used initially as example data. This analysis is followed by the application of the measures to the analysis of three ordered tree responses, from a novice teacher, a beginning teacher and an experienced classroom manager.

The criteria used for validating the measures were their ability to describe and quantify structural dimensions of ordered tree task responses and to illuminate differences between them. The quantitative descriptions and comparisons needed to reflect the apparent similarities and differences in the graphic nature of the responses. The illustrative networks, for the first step in validating the measures, are simple tree, radial tree, associative network, tree with cross-links, disjoint and ring structures (see Figure 4).

Figure 4 Illustrative ordered tree networks: I Simple tree; II Radial tree; III Associative network; IV Tree with cross-links; V Disjoint structure; VI Ring

The six networks were analysed with respect to their accessibility, integration, differentiation and availability.
The networks with the best, that is numerically lowest, accessibility scores (see Table 1) were the radial (1.71), tree with cross-links (1.76) and associative network (1.81). The next most accessible network was the ring (2.00) with the simple tree (2.29) and the disjoint structure (2.80) being the least accessible information networks. The analysis of integration and differentiation of the six example networks (see Table 2) showed the associative network (43%) to be the most integrated with the tree with cross-links and simple tree (each 29%) to be the next most integrated and clearly more integrated than the radial tree and disjoint structure (each 14%) and the ring with no integration. The order of the networks with respect to degree of differentiation, from most differentiated to least differentiated, is radial tree (86%), disjoint structure (71%), simple tree (57%), tree with cross-links (29%), associative network (14%) and finally the ring with no differentiation. An analysis of the availability of the information contained in each of the six forms (see Table 2) showed that the simple tree (95%) had the highest availability score and the ring had the lowest (33%). In between these two the tree with cross-links was second (76%), radial tree and associative network were equal third (71%) and the disjoint structure was fifth (67%). 
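The values reported for the simple tree and the radial tree can be reproduced with a short Python sketch of the four measures. This is an illustration only: the function names are hypothetical, the code assumes a connected network, and the two layouts are assumptions (a binary tree rooted at A with B and C as children and D to G as leaves, and a star with A as hub), chosen because they are consistent with the node accessibility values reported in Table 1.

```python
from collections import deque


def links_from(start, neighbours):
    """Breadth-first search: fewest links from start to every reachable node."""
    found = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in found:
                found[nxt] = found[current] + 1
                queue.append(nxt)
    return found


def measures(nodes, links):
    """Accessibility, integration, differentiation and availability
    for a connected network (disjoint parts are not handled here)."""
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    k = len(nodes)
    accessibility = {}
    weighted_total = 0.0
    for n in nodes:
        found = links_from(n, neighbours)
        counts = [found[m] for m in nodes if m != n]
        accessibility[n] = sum(counts) / (k - 1)
        # A shortest path with c links has c - 1 intermediate nodes.
        weighted_total += sum(0.5 ** (c - 1) for c in counts)
    weighted_total /= 2                       # each pair was counted twice

    ratio = weighted_total / (k * (k - 1) / 2)  # share of maximum possible links
    return {
        "accessibility": accessibility,
        "mean_accessibility": sum(accessibility.values()) / k,
        "integration": 100 * sum(len(neighbours[n]) >= 3 for n in nodes) / k,
        "differentiation": 100 * sum(len(neighbours[n]) == 1 for n in nodes) / k,
        "availability": 100 * min(ratio, 1 - ratio) / 0.5,
    }


nodes = list("ABCDEFG")
simple_tree = [("A", "B"), ("A", "C"), ("B", "D"), ("B", "E"), ("C", "F"), ("C", "G")]
radial_tree = [("A", n) for n in "BCDEFG"]

for name, links in (("I", simple_tree), ("II", radial_tree)):
    m = measures(nodes, links)
    print(name, round(m["mean_accessibility"], 2), round(m["integration"]),
          round(m["differentiation"]), round(m["availability"]))
# I 2.29 29 57 95
# II 1.71 14 86 71
```

For the simple tree, for example, the weighted total links are 6 + 7(0.5) + 4(0.25) + 4(0.125) = 11 out of a maximum of 21, so K = 1 - 11/21 and the availability is about 95%.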
Table 1 Accessibility measures for the six example networks

Node    I      II     III    IV     V      VI
A       1.67   1.00   1.50   1.67   2.83   2.00
B       1.83   1.83   1.83   1.33   2.50   2.00
C       1.83   1.83   1.67   1.33   3.00   2.00
D       2.67   1.83   1.83   2.17   2.83   2.00
E       2.67   1.83   1.67   1.83   2.83   2.00
F       2.67   1.83   1.83   1.83   3.17   2.00
G       2.67   1.83   2.33   2.17   3.17   2.00
Mean    2.29   1.71   1.81   1.76   2.87   2.00

Table 2 Availability (information) measures for the six example networks

                  I      II     III    IV     V      VI
Integration       29%    14%    43%    29%    14%    0%
Differentiation   57%    86%    14%    29%    71%    0%
Availability      95%    71%    71%    76%    67%    33%

A comparison of the six example networks with respect to the mean accessibility, integration, differentiation and availability measures (see Table 3) showed that the measures discriminated between the different networks on each of the measures except where two networks were similar with respect to a particular dimension or criterion. The results for each dimension or measure were seen by the authors as being intuitively correct and accounted for both direct and indirect links between nodes. The characteristics of the five networks which showed the nature of the relationship between particular content elements were featured more than those of the ring which had minimal discrimination of content and organisation.

Table 3 Comparison of networks with respect to accessibility, integration, differentiation and availability

The similarity measure was employed for each set of two networks as the basis for comparing responses across the common node set, nodes A to G (see Table 4). This measure took the implied or indirect links into account and showed the greatest similarity to be between the simple tree and the tree with cross-links (networks I and IV). The lowest degrees of similarity were those which involved a comparison with the ring or the disjoint structure.
Table 4 Matrix of similarity measures for the six example networks

Application of measures to three ordered tree responses

To extend the use of the measures to data from a study of teacher thinking, ordered tree responses from a novice, a beginning and an experienced teacher were analysed. The ordered tree task required these respondents to develop a tree representing the concepts they associated with classroom management. They initially wrote down the concepts which came to their minds when they thought of classroom management. Then they drew a tree-type diagram linking those concepts which they saw as being related. In this way participants were illustrating how they organised their thoughts about classroom management. Participants also wrote an explanation of how they organised their ordered tree. The graphic parts of the responses, that is, without their written explanations, are presented in Appendix A. The data from the quantitative analysis of their responses are presented in Tables 5 and 6. Because the respondents developed their own lists of concepts, a similarity matrix could not be constructed.

An examination of the accessibility, integration, differentiation and availability measures showed that respondent 5000 presented a highly interconnected associative network which had an extensive content base but a low availability of information and little differentiation. Respondent 42/1 also used an associative network with little differentiation, but the interconnection was less and there was a fairly high availability of information. Respondent 6002, by contrast, produced a hierarchical holistic form which was highly differentiated. However, the radial character of the network limited the availability of information except from the key organising concepts of organisation and discipline.
Table 5 Comparison of node accessibility across three ordered tree responses

Table 6 Organisational characteristics of responses

Respondent                   42/1   6002   5000
Integration                  60%    16%    82%
Differentiation              10%    74%    12%
Availability (information)   58%    47%    37%

Conclusions

The proposed measures appear to be effective in describing salient characteristics of different types of ordered tree responses, and can be applied across different node-sets and to quite different types of holistic forms of organisation. The acknowledgement of indirect links in the measures used appeared to improve the validity of those measures in this application, although further refinement of the way in which indirect links are allowed for may be possible.

Bibliography

Bailey, M., & Butcher, J. (1994). Dimensionality and conceptual space in measures of cognitive structure: An analysis of ordered tree task responses. Paper presented at the Australian Association for Research in Education Annual Conference, Newcastle, New South Wales.

Beyerbach, B. A. (1988). Developing a technical vocabulary on teacher planning: Preservice teachers' concept maps. Teaching and Teacher Education, 4, 337-347.

Biggs, J. B., & Collis, K. F. (1982). Evaluating the quality of learning: The SOLO taxonomy (Structure of the Observed Learning Outcome). New York: Academic Press.

Butcher, J. (1989). Methodological questions in analysing the content structure and meaning of student teachers' management schemata. Paper presented at the Australian Association for Research in Education Annual Conference, Adelaide, South Australia.

Butcher, P. J. (1991). A study of instrumentation in measuring the impact of the practicum on student teacher cognition. Paper presented at the Australian Teacher Education Association Annual Conference, Melbourne, Australia.

Butcher, P. J. (1995). A theory of development of in the management domain. Paper presented at the Australian Teacher Education Association Annual Conference, Sydney, Australia.
Elbaz, F., Hoz, R., Tomer, Y., Chayot, R., Mahler, S., & Yeheskel, N. (1986). The use of concept mapping in the study of teachers' knowledge structures. In M. Ben-Peretz, R. Bromme, & R. Halkes (Eds.), Advances on research on teacher thinking (pp. 45-54). Lisse: The International Study Association on Teacher Thinking.

Kagan, D. M. (1990). Ways of evaluating teacher cognition: Inferences concerning the Goldilocks principle. Review of Educational Research, 60(3), 419-469.

Morine-Dershimer, G. (1989). Preservice teachers' conceptions of content and pedagogy: Measuring growth in reflective, pedagogical decision-making. Journal of Teacher Education, 40(5), 46-52.

Morine-Dershimer, G. (1991, April). Tracing conceptual change in preservice teachers. Paper presented at the annual conference of the American Educational Research Association, Chicago.

Naveh-Benjamin, M., McKeachie, W. T., Lin, Y-G, & Tucker, D. G. (1986). Inferring students' cognitive structures and their development using the "Ordered Tree Technique". Journal of Educational Psychology, 78, 130-140.

Roehler, L. R., Herrmann, B. A., & Reinken, B. (1989). Exploring knowledge structures through the ordered tree technique: A manual for use. Unpublished manuscript, Michigan State University at East Lansing.

Strahan, D. B. (1989). How experienced and novice teachers frame their views of instruction: An analysis of semantic ordered trees. Teaching and Teacher Education, 5(1), 53-67.

Winitzky, N. (1992). Structure and process in thinking about classroom management: An exploratory study of prospective teachers. Teaching and Teacher Education, 8(1), 1-14.

Winitzky, N., Kauchak, D., & Kelly, M. (1994). Measuring teachers' structural knowledge. Teaching and Teacher Education, 10(2), 125-139.
Appendix A - Three graphic ordered tree responses in study of development in management domain

[Figures: Respondent 42/1; Record 6002; Record 5000]

Appendix B - Mathematical note on weighted total links

Formally we may define weighted total links as follows. Let $M = [m_{ij}]$ be the node-node matrix of the network, with $m_{ij} = 1$ if there is a direct link between nodes $i$ and $j$, and $0$ otherwise. Define $M^{(2)}$ as the matrix whose elements $[m^{(2)}_{ij}]$ are given by

$$m^{(2)}_{ij} = m_{i1}m_{1j} \vee m_{i2}m_{2j} \vee \dots \vee m_{in}m_{nj} \quad \text{if } i \neq j, \qquad m^{(2)}_{ij} = 0 \quad \text{if } i = j,$$

where $a \vee b$, analogous to logical addition, is the greater of $a$ and $b$, and similarly for $M^{(3)}$ etc. Then the weighted total links are calculated as half the sum of the elements of the matrix $P$ given by

$$P = M \vee \tfrac{1}{2} M^{(2)} \vee \dots \vee \tfrac{1}{2^{\,n-1}} M^{(n)},$$

where $n$ is the number of links between the two nodes which have the greatest number of links on the shortest path between them, and the operation $\vee$ on matrices is applied to corresponding elements in the matrices being added.

For the purposes of formal definition, the mean accessibility of node $j$ in a network comprising $k$ nodes is given by $1/(k-1)$ times the sum of the elements in row $j$ of the matrix $A = [a_{ij}]$, where $a_{ij} = \log_2(1/p_{ij}) + 1$ if $i \neq j$, with $P = [p_{ij}]$ the matrix defined above, and $a_{ij} = 0$ if $i = j$. In practice it is, of course, easier to calculate the matrix $A$ directly from the network, since $a_{ij}$ is the number of links on the shortest path between nodes $i$ and $j$, and then obtain $p_{ij}$ as $(0.5)^{a_{ij} - 1}$.
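The matrix construction in Appendix B can be sketched in plain Python as follows. This is a hedged illustration, not the authors' procedure: the function names are hypothetical, the weight attached to the k-th matrix is assumed to be (0.5)^(k-1) so that it agrees with the shortest-path weighting, the loop simply runs over every possible path length up to one less than the number of nodes (longer lengths carry smaller weights and are ignored by the maximum), and the example is an assumed binary-tree layout of a seven-node simple tree.

```python
import math


def max_product(X, Y):
    """'Multiply' two matrices using the greater-of rule in place of addition,
    with the diagonal set to zero as in the definition of M(2)."""
    size = len(X)
    return [[0 if i == j else max(X[i][l] * Y[l][j] for l in range(size))
             for j in range(size)] for i in range(size)]


def weighted_link_matrix(M):
    """P = M v (1/2)M(2) v ... , taking the elementwise greater value."""
    size = len(M)
    P = [row[:] for row in M]
    power = [row[:] for row in M]
    for k in range(2, size):              # a shortest path has at most size-1 links
        power = max_product(power, M)     # marks pairs joined by k links
        for i in range(size):
            for j in range(size):
                P[i][j] = max(P[i][j], 0.5 ** (k - 1) * power[i][j])
    return P


def accessibility(P, j):
    """Mean of a_ij = log2(1/p_ij) + 1 over the other nodes (connected network)."""
    size = len(P)
    return sum(math.log2(1 / P[j][i]) + 1 for i in range(size) if i != j) / (size - 1)


# Assumed simple tree on nodes A..G (indices 0..6): A-B, A-C, B-D, B-E, C-F, C-G.
links = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
M = [[0] * 7 for _ in range(7)]
for a, b in links:
    M[a][b] = M[b][a] = 1

P = weighted_link_matrix(M)
weighted_total_links = sum(map(sum, P)) / 2
print(weighted_total_links)              # 11.0
print(round(accessibility(P, 0), 2))     # 1.67
```

The sketch confirms, for this example, that the matrix definition and the direct shortest-path count give the same weights: nodes D and F are four links apart, and the corresponding element of P is (0.5)^3 = 0.125.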