Predicting student performance by multiple regression

Authors

A. Almeida de Novaes, J. Piton-Gonçalves

Keywords:

Prediction, performance, Multiple Linear Regression, Data Science, Educational Data Mining, National High School Exam

Abstract

In the context of Educational Data Science, predicting students' academic performance falls within Educational Data Mining, which seeks to quantify student performance in order to guide teachers and educational institutions. Multiple linear regression is a prediction method that can be applied to educational data, such as data from the National High School Exam (ENEM). Using data from the 2019 edition of ENEM, this study proposed, tested and analyzed seven multiple regression models on a sample of 18,908 candidates. The models considered the scores on the tests of (i) Languages, Codes and their Technologies, (ii) Mathematics and its Technologies, (iii) Natural Sciences and their Technologies, (iv) Human Sciences and their Technologies and (v) Writing, together with the personal data (vi) age, (vii) sex and (viii) whether high school was completed in a public or private school. Six of the models showed independence and constant variance of the residuals and no influential or significant outliers, which supports an excellent predictive capacity for student performance.
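To make the method concrete, the following is a minimal sketch of fitting a multiple linear regression by ordinary least squares. It is not the authors' code and does not use ENEM data: the sample is synthetic, and the choice of response (a writing-like score), the predictors (two test scores, age, sex, public school) and the coefficient values are assumptions made purely for illustration.

```python
import random


def fit_ols(X, y):
    """Return OLS coefficients (intercept first) by solving the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for k in range(p):
            if k != col:
                f = A[k][col]
                A[k] = [vk - f * vc for vk, vc in zip(A[k], A[col])]
    return [A[i][p] for i in range(p)]


def predict(beta, row):
    """Predicted response for one row of predictors."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def r_squared(beta, X, y):
    """Coefficient of determination of the fitted model on (X, y)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(beta, r)) ** 2 for r, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic sample (hypothetical values, NOT ENEM data).
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    math_score = rng.uniform(300, 900)
    lang_score = rng.uniform(300, 900)
    age = rng.randint(17, 30)
    sex = rng.randint(0, 1)      # dummy-coded
    public = rng.randint(0, 1)   # 1 = public school, 0 = private
    noise = rng.gauss(0, 10)
    X.append([math_score, lang_score, age, sex, public])
    # Assumed "true" relationship, chosen only to generate data
    y.append(100 + 0.5 * math_score + 0.3 * lang_score
             - 2 * age + 5 * sex - 20 * public + noise)

beta = fit_ols(X, y)
r2 = r_squared(beta, X, y)
print("coefficients:", [round(b, 3) for b in beta])
print("R^2:", round(r2, 4))
```

In practice, the checks mentioned in the abstract (independence, constant variance, influential outliers) would be run on the residuals `y_i - predict(beta, X_i)` before trusting the predictions. A statistics package would normally replace the hand-written solver used here.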

Published

16-09-2021

How to Cite

Almeida de Novaes, A., & Piton-Gonçalves, J. (2021). Predicting student performance by multiple regression. Sigmae, 10(1), 82–98. Retrieved from https://publicacoes.unifal-mg.edu.br/revistas/index.php/sigmae/article/view/1552

Issue

Vol. 10 No. 1 (2021)

Section

Statistics Education