Participation-based Student Final Performance Prediction Model through Interpretable Genetic Programming: Integrating Learning Analytics, Educational Data Mining and Theory
Publication Type: Journal Article
Source: Computers in Human Behavior, Issue Accepted (2014)
Keywords: cscl, genetic programming, learning analytics
Building a student performance prediction model that is both practical and understandable for users is challenging, not least because many confounding factors are difficult to collect and measure. Traditionally, most prediction models are difficult to interpret for teachers without a significant background in probability. This poses significant problems for model use (e.g. personalizing education and intervention) as well as for model evaluation. In this paper, we synthesize learning analytics approaches, educational data mining (EDM) and HCI theory to explore the development of more usable prediction models and prediction model representations, using data from a collaborative geometry problem-solving environment: Virtual Math Teams with GeoGebra (VMTwG). First, based on the theory proposed by Hrastinski (2009), which establishes online learning as online participation, we operationalized activity theory to holistically quantify students’ participation in a CSCL (Computer-Supported Collaborative Learning) course. As a result, six variables were constructed: Subject, Rules, Tools, Division of Labor, Community, and Object. Unlike some traditional blunt instruments (feature selection, ad hoc guesswork, etc.), this step reduces data dimensionality and systematically contextualizes the data in a semantic background. Second, an advanced modeling technique, Genetic Programming (GP), is implemented to develop the prediction model. We demonstrate how connecting the structure of VMTwG trace data to a theoretical framework and processing that data with GP outperforms traditional models in both prediction rate and interpretability. Theoretical and practical implications are then discussed.
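To make the idea of an interpretable GP model concrete, the following is a minimal sketch of how genetic programming could evolve a readable formula over the six participation variables named in the abstract. It is not the authors' implementation: the primitive set (+, -, *), the squared-error fitness with a size penalty, the truncation selection scheme, the node cap, and the synthetic data are all assumptions made purely for illustration.

```python
import operator
import random

# The six participation variables named in the abstract.
FEATURES = ["Subject", "Rules", "Tools", "Division of Labor", "Community", "Object"]
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}  # assumed primitive set
MAX_SIZE = 31  # assumed cap on nodes per formula, to keep models readable


def random_tree(rng, depth):
    """Grow a random expression: a feature name, a constant, or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return rng.choice(FEATURES)
        return round(rng.uniform(-1.0, 1.0), 2)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, row):
    """Evaluate an expression tree against one student's feature dict."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, row), evaluate(right, row))
    return row[tree] if isinstance(tree, str) else tree


def show(tree):
    """Render the tree as an infix formula that can be read directly."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return f"({show(left)} {op} {show(right)})"
    return str(tree)


def size(tree):
    """Count the nodes in a tree."""
    return 1 + size(tree[1]) + size(tree[2]) if isinstance(tree, tuple) else 1


def error(tree, rows, targets):
    """Mean squared error plus a small size penalty that favours short formulas."""
    residuals = [evaluate(tree, r) - t for r, t in zip(rows, targets)]
    return sum(d * d for d in residuals) / len(rows) + 0.001 * size(tree)


def mutate(tree, rng):
    """Replace one randomly chosen subtree with a freshly grown one."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, 2)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))


def random_subtree(tree, rng):
    while isinstance(tree, tuple) and rng.random() < 0.7:
        tree = tree[rng.choice((1, 2))]
    return tree


def crossover(a, b, rng):
    """Graft a random subtree of b into a random position in a."""
    if not isinstance(a, tuple) or rng.random() < 0.3:
        return random_subtree(b, rng)
    op, left, right = a
    if rng.random() < 0.5:
        return (op, crossover(left, b, rng), right)
    return (op, left, crossover(right, b, rng))


def evolve(rows, targets, generations=30, pop_size=60, seed=0):
    """Truncation selection with elitism: the best quarter survives unchanged."""
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: error(t, rows, targets))
        survivors = population[: pop_size // 4]
        children = []
        while len(survivors) + len(children) < pop_size:
            child = crossover(rng.choice(survivors), rng.choice(survivors), rng)
            if rng.random() < 0.3:
                child = mutate(child, rng)
            # Oversized offspring are replaced by a copy of a surviving parent.
            children.append(child if size(child) <= MAX_SIZE else rng.choice(survivors))
        population = survivors + children
    return min(population, key=lambda t: error(t, rows, targets))


def synthetic_data(n, seed):
    """Invented data for illustration only: outcome = Tools + Community."""
    rng = random.Random(seed)
    rows = [{name: rng.random() for name in FEATURES} for _ in range(n)]
    return rows, [r["Tools"] + r["Community"] for r in rows]


if __name__ == "__main__":
    rows, targets = synthetic_data(40, seed=1)
    best = evolve(rows, targets)
    print(show(best), error(best, rows, targets))
```

The point of the sketch is the output format: the evolved model is an explicit arithmetic expression over named participation variables, which a reader can inspect term by term, in contrast to a model whose parameters require a background in probability to interpret. The size penalty and node cap are one possible way to bias the search toward short formulas; a real study would choose these settings, and the outcome measure, from its own data.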