Predicting Students Drop Out

City:

Eindhoven, North Brabant, Netherlands

Organization:

Eindhoven University of Technology

Project Start Date:

September 2000

Project End Date:

January 2009

Reference:

Dekker, G. W., Pechenizkiy, M., & Vleeshouwers, J. M. (2009). Predicting Students Drop Out: A Case Study. In Proceedings of the 2nd International Conference on Educational Data Mining (EDM 2009), 41-50.

Problem:

The Electrical Engineering (EE) Department of Eindhoven University of Technology had a freshman dropout rate of 40%. To lower this number, the department wanted to identify successful and unsuccessful students at an early stage. A wide range of factors relate to a student’s academic success, and helping teachers, education personnel, and management understand these factors would make it easier to support students and reduce dropout. The existing approach was for the student counselor to advise each student on whether to continue the degree, based on the student’s grades. Although this approach yielded decent results, it was deemed unsatisfactory because of its subjective nature. Data mining was therefore proposed as a more robust and objective process.

Technical Solution:

  • OneR classifier
  • Two decision tree algorithms, compared against each other: CART (SimpleCart) and C4.5 (J48)
  • Bayesian classifier (BayesNet)
  • Logistic model (SimpleLogistic)
  • Rule-based learner (JRip)
  • Random forest (RandomForest)
  • Cost matrix (used in an attempt to increase the accuracy of results)
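
To illustrate the simplest technique in the list, the following is a minimal sketch of the OneR idea: for each attribute, map every attribute value to its majority class, then keep the single attribute whose rule makes the fewest training errors. It is not the implementation used in the study. The function name one_r, the attribute bands, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Return (attribute_index, rule, training_errors) for the best one-attribute rule."""
    best = None
    for a in range(len(rows[0])):
        # Count the class labels seen for each value of attribute a.
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[a]][y] += 1
        # The rule maps each attribute value to its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[a]] != y for row, y in zip(rows, labels))
        # Keep the attribute with the fewest training errors.
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best


# Hypothetical toy records: (pre-university maths band, first-semester result band).
rows = [
    ("high", "pass"), ("high", "pass"), ("low", "pass"),
    ("low", "fail"), ("high", "fail"), ("low", "fail"),
]
labels = ["success", "success", "success", "dropout", "dropout", "dropout"]

attr, rule, errors = one_r(rows, labels)
print(attr, rule, errors)
```

On this toy data the second attribute separates the two classes without error, so it is the one selected. A rule this simple is mainly useful as a reference point when judging whether more complex learners add anything.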

Datasets Used:

  • Dataset 1: Pre-University dataset, Eindhoven University of Technology Electrical Engineering Department, 2000-2009
  • Dataset 2: University Grades only dataset, Eindhoven University of Technology Electrical Engineering Department, 2000-2009
  • Dataset 3: Combined dataset of Pre-University and University grades, Eindhoven University of Technology Electrical Engineering Department, 2000-2009

Outcome:

Before the data mining study, every enrolled student received study advice in December, based on the student’s grades and other results. The department’s student counselor examined the academic data and advised each student on whether to continue the program. According to the department, this advice was generally accurate, but no accuracy figure was released.

The models gave useful predictions of successful and unsuccessful students, with accuracies between 75% and 80%. The cost matrix helped address misclassification but did not bring a significant improvement. It did, however, expose the main classification issue: LinAlgAB was recorded as zero when no grade had been entered, even though those students had not necessarily failed.
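
The role of a cost matrix can be sketched as follows: instead of predicting the most probable class, the classifier predicts the class with the lowest expected misclassification cost. This is a generic illustration, not the study's configuration. The function name min_cost_class, the class names, and the cost values are hypothetical assumptions; the actual costs used in the study are not given here.

```python
def min_cost_class(p_success, cost):
    """Pick the prediction with the lowest expected cost.

    cost[actual][predicted] is the penalty for predicting `predicted`
    when the true class is `actual`.
    """
    probs = {"success": p_success, "dropout": 1.0 - p_success}
    expected = {
        predicted: sum(probs[actual] * cost[actual][predicted] for actual in probs)
        for predicted in ("success", "dropout")
    }
    return min(expected, key=expected.get)


# Hypothetical costs: wrongly labelling a successful student as a dropout
# is assumed to be three times as costly as the opposite mistake.
skewed = {
    "success": {"success": 0, "dropout": 3},
    "dropout": {"success": 1, "dropout": 0},
}
# Uniform costs reduce to ordinary most-probable-class prediction.
uniform = {
    "success": {"success": 0, "dropout": 1},
    "dropout": {"success": 1, "dropout": 0},
}

print(min_cost_class(0.4, uniform))  # most probable class
print(min_cost_class(0.4, skewed))   # cost-weighted decision
```

With a 40% estimated probability of success, uniform costs lead to a dropout prediction, while the skewed costs shift the decision to success. A cost matrix therefore changes which errors are made rather than guaranteeing higher overall accuracy.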

Issues that arose:

- During the experiments, it was found that there was not much room for improvement.

- Almost all misclassified students had no database entry for LinAlgAB (a missing entry is automatically mapped to zero).

- A negative classification can only be given after 3 years, and there is no guarantee that a student who has not obtained a diploma after 3 years will be unsuccessful.

- Some mistakes were due to the classification measure.
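
The LinAlgAB issue above comes from treating "no entry" and "grade of zero" as the same value. One possible way to keep them apart, shown here only as a sketch, is to store a missing grade as None and add an explicit missing-value flag. The function names parse_grade and encode, the record layout, and the feature names are hypothetical and do not describe the study's actual database.

```python
def parse_grade(raw):
    """Return the grade as a float, or None when the database has no entry."""
    if raw is None or str(raw).strip() == "":
        return None
    return float(raw)


def encode(record, course="LinAlgAB"):
    """Encode one course as a grade plus an explicit 'missing' indicator."""
    grade = parse_grade(record.get(course))
    return {
        f"{course}_grade": grade,           # None instead of a misleading 0
        f"{course}_missing": grade is None,  # lets a model treat 'no entry' separately
    }


# Hypothetical records: one with no entry, one with a genuine zero.
print(encode({"LinAlgAB": ""}))
print(encode({"LinAlgAB": "0"}))
```

With this encoding, a student who never took the exam is distinguishable from a student who scored zero, so a learner is not pushed to treat both as failures.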

Status:

Terminated

Entered by:

October 30, 2020: Danny Zhao, 1001533655, shibo.zhao@mail.utoronto.ca



CEM1002,
Civil Engineering, University of Toronto
Contact: msf@eil.utoronto.ca