Predicting student performance by multiple regression


In Educational Data Science, student academic performance prediction draws on Educational Data Mining, which seeks to quantify student performance in order to guide teachers and educational institutions. Multiple linear regression is a forecasting method that can be applied to educational data, such as data from the Brazilian National High School Exam (ENEM). Using data from the 2019 edition of ENEM, this research proposed, tested, and analyzed seven multiple regression models on a sample of 18,908 candidates. The models considered the scores on the tests of (i) Languages, Codes and their Technologies, (ii) Mathematics and its Technologies, (iii) Natural Sciences and their Technologies, (iv) Human Sciences and their Technologies, and (v) Writing, together with the personal data (vi) age, (vii) sex, and (viii) whether high school was completed in a public or private school. Six of the seven models showed independence and constant variance of the residuals and an absence of influential and significant outliers, allowing for an excellent predictive capacity of student performance.
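
As a rough illustration of the kind of model described above, the following minimal sketch fits a multiple linear regression by ordinary least squares. It is not the study's code: the data are synthetic, the function name `fit_multiple_regression`, the variable names, the coefficient values, and the choice of response variable are all hypothetical, and only a subset of the predictors listed above is used.

```python
import numpy as np


def fit_multiple_regression(X, y):
    """Ordinary least squares fit.

    Returns the coefficient vector (intercept first) and the R^2 of the fit.
    """
    # Prepend a column of ones so the first coefficient is the intercept.
    X_design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    residuals = y - X_design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot


# Synthetic stand-in data (NOT ENEM data); all values below are assumptions.
rng = np.random.default_rng(0)
n = 500
languages = rng.normal(520, 60, n)                     # test score
mathematics = rng.normal(520, 100, n)                  # test score
age = rng.integers(17, 30, n).astype(float)            # years
private_school = rng.integers(0, 2, n).astype(float)   # 1 = private, 0 = public

X = np.column_stack([languages, mathematics, age, private_school])

# Hypothetical "true" coefficients used only to generate a response variable.
true_beta = np.array([50.0, 0.6, 0.3, -1.0, 40.0])
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0, 30, n)

beta, r2 = fit_multiple_regression(X, y)
print("coefficients (intercept first):", np.round(beta, 3))
print("R^2:", round(r2, 3))
```

In practice, the residuals of such a fit would then be checked for independence, constant variance, and influential outliers, which are the diagnostics the abstract reports.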

Statistics Education