How are machine learning questions evaluated?

This article explains what the HackerEarth platform needs to grade machine learning questions automatically and how the evaluation works.

Prerequisites for automatic evaluation

  • Candidate's results.csv file
  • Test case (ground truth values of the test data)
  • Checker file

Generating the results.csv file on HackerEarth

When candidates start the assessment, they do the following to generate the results.csv file on their local system:

  1. Download the dataset (both the train.csv and test.csv files).
  2. Build the required model on their local system.
  3. Train the model on the data in the train.csv file.
  4. Feed the data from the test.csv file into the trained model to generate the results.csv file. This file contains the model's predictions.
    Important: The candidate does not have access to the ground truth values (that is, the correct or “true” answers) for the test.csv file.
  5. Upload the results.csv file to the HackerEarth ML platform.
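The steps above can be sketched in Python. This is only an illustration under stated assumptions: the column names (id, feature, target, prediction) and the single-feature linear model are hypothetical, and the actual dataset layout and required output format depend on the question.

```python
import csv


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def generate_results(train_path, test_path, out_path):
    """Train on train.csv, predict for test.csv, and write the predictions file.

    Assumed (hypothetical) columns: train.csv has id, feature, target;
    test.csv has id, feature; the output has id, prediction.
    """
    # Step 3: train the model on the training data.
    with open(train_path, newline="") as f:
        train_rows = list(csv.DictReader(f))
    slope, intercept = fit_line(
        [float(r["feature"]) for r in train_rows],
        [float(r["target"]) for r in train_rows],
    )

    # Step 4: predict for every row of the test data (no ground truth here).
    with open(test_path, newline="") as f:
        test_rows = list(csv.DictReader(f))
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "prediction"])
        for r in test_rows:
            writer.writerow([r["id"], slope * float(r["feature"]) + intercept])

    # Number of predictions written; it should equal the number of test rows.
    return len(test_rows)
```

Calling generate_results("train.csv", "test.csv", "results.csv") would produce the file that the candidate uploads in step 5. In practice a candidate would typically use a machine learning library rather than a hand-written model.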

How is the candidate's submission evaluated?

Every ML question in the HackerEarth library has a checker file. This file allows the HackerEarth platform to automatically evaluate a candidate’s submission (the results.csv file) and generate a score. Specifics about the checker file are as follows:

  • The checker file is a piece of code that contains the following:
    • Checkpoints (whether the submission meets the required .csv file format and whether it covers all data points from the dataset)
    • Evaluation metric (for example, r2_score)

When a candidate uploads their results file (in .csv format) to the platform, the checker file for the question does the following:

  1. Compares the CSV file with the ground truth values, which are added to the platform as a test case.
  2. Generates a score based on the evaluation metric that is defined in the checker file.
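To make the checker's role concrete, here is a minimal sketch of what such a file might do. It is not HackerEarth's actual checker: the column names (id, target, prediction), the function names, and the rule that maps the metric to a score (R² clipped at 0 and multiplied by a hypothetical max_score) are all assumptions for illustration.

```python
import csv


def read_values(path, value_column):
    """Read a CSV into {id: value}; raise ValueError if a required column is missing."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        if "id" not in columns or value_column not in columns:
            raise ValueError(f"{path} must have 'id' and '{value_column}' columns")
        return {row["id"]: float(row[value_column]) for row in reader}


def r2(y_true, y_pred):
    """Coefficient of determination; assumes y_true is not constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def check_submission(submission_path, truth_path, max_score=100.0):
    """Validate a submission against the ground truth and return a score."""
    truth = read_values(truth_path, "target")
    predictions = read_values(submission_path, "prediction")

    # Checkpoint: the submission must cover every data point of the test set.
    if set(predictions) != set(truth):
        raise ValueError("submission ids do not match the test data ids")

    # Evaluation metric: compare predictions with ground truth, id by id.
    ids = sorted(truth)
    metric = r2([truth[i] for i in ids], [predictions[i] for i in ids])

    # Assumed scoring rule: negative R² values earn no points.
    return max(0.0, metric) * max_score
```

In this sketch the maximum score is a parameter of the checker, which shows why a score defined in the checker file has to agree with the score shown in the UI.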

To change the score for a machine learning question, update the score both on the UI and in the checker file. Both values must match for submissions to be evaluated correctly.