Data mining methodologies in educational organizations

Date of Completion

January 2005


Education, Tests and Measurements|Education, Administration|Education, Technology of|Information Science




Never before in the history of public education in the United States have schools been held to the level of accountability required by the No Child Left Behind (NCLB) Act. Unquestionably, NCLB is the most data-driven measure for monitoring student achievement ever imposed on states and local municipalities by federal legislators. Educators must report their data, but more importantly, they must understand their data and use it to make informed decisions that will lead to the Adequate Yearly Progress (AYP) mandated by this legislation.

Technological advances in predictive analytics have made it possible to analyze tremendously large data sets to reveal relationships among variables that even the most informed experts could not have predicted. This process, known as data mining, differs significantly from traditional statistical analyses in that it involves Exploratory Data Analysis (EDA) and is not driven by any a priori hypotheses.

This study proved that the CRoss-Industry Standard Process for Data Mining (CRISP-DM), a non-proprietary data mining process that was developed for and is currently used in the business world, can be transferred to educational settings and provide a start-to-end structure that is capable of producing operationally actionable information to address the student achievement questions of educational leaders. Data mining programs that contain the Classification and Regression Tree (CART) algorithm, like the one used in this study (SPSS's Clementine), can strongly predict student performance within a cohort of students. Furthermore, this study showed that the CART algorithm is able to predict the performance of one cohort of students from the performance model of a different cohort of students with moderate correlational values. By using these analyses, educators can identify the important variables influencing their students' achievement.

These findings elicit both elation and caution regarding the use of data mining methodologies as a tool for improving student achievement. Further study is required to determine if the techniques used in this study can be universally applied and the extent to which the information gleaned from data mining analyses can be used by educational organizations to make decisions that lead to improved student achievement.