Guide to creating a Data Science question in a test

Creating a custom Data Science question

  1. Log in to HackerEarth Assessment using admin credentials.
  2. Click Add question in the Overview section.
  3. Click Create a new question.
  4. Click Data Science in the Coding section.

DS20.gif

Template

The template contains the following sections:

DS21.png

 

S. no. | Section name | Description
1 | Description | This section describes the problem that you want to create. You are required to add details about the problem statement, difficulty level, etc.
2 | Data & test cases | This section contains the datasets, test cases, etc.
3 | Languages | This section contains the languages that you can enable for a candidate to solve the question.
4 | Editorial | This section can contain the approach to solve the problem. This is optional.

The Description section

Creating a problem statement

The question-creation template is displayed on your screen with the following fields:

  • Problem Name
  • Problem Statement
  • Difficulty level
  • Maximum Score
  • Tags

1. Add the name or title of your question in the Problem Name field.

2. In the Problem Statement field, add the problem statement that you want the candidate to solve. A good problem statement has the following parts:

  • Problem statement: The task that the candidate must solve.
  • Data description: A description of the data set that is provided to solve the problem.
  • Submission criteria: The formats and other criteria that a candidate must follow when making a submission.
  • Evaluation criteria: The scoring metric that is used to evaluate a submission.

3. Set the complexity of your question from the Difficulty level list.

4. Add the required tags in the Tags field.
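
To make the evaluation criteria from step 2 more concrete, the sketch below shows one possible scoring metric: plain accuracy scaled to the maximum score. The function name, the label values, and the choice of accuracy are illustrative assumptions, not a metric built into HackerEarth. State in the problem statement whichever metric your own checker actually uses.

```python
def accuracy_score_scaled(expected, predicted, max_score=100.0):
    """Return accuracy (fraction of matching labels) scaled to max_score.

    Hypothetical helper for illustration only; not part of any HackerEarth API.
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        raise ValueError("expected must not be empty")
    matches = sum(1 for e, p in zip(expected, predicted) if e == p)
    return max_score * matches / len(expected)


if __name__ == "__main__":
    truth = ["yes", "no", "no", "yes"]
    preds = ["yes", "no", "yes", "yes"]
    print(accuracy_score_scaled(truth, preds))  # 3 of 4 correct -> 75.0
```

Describing the metric this precisely in the Evaluation criteria part lets candidates estimate their score before they submit.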

The Data & test cases section

Adding dataset and test cases

DS22.png

S. no.

Label name

Description

1

Sample Data Set

A sample data set is a subset of the full data set.

2

Expected Output 

The sample expected output file contains the true values for the sample test data in the sample data set.

3

Full Data Set

The full data set is the complete data set that is used to train and test the model that solves the problem given in the problem statement.

4

Checker

A checker automatically evaluates a submission and generates a score for it. The submission contains the candidate's predictions for the test data.

Click Add Checker to display the following fields:

  1. Expected Output: Represents the true values of the full test data set that is present in the Full data set.
  2. Checker file: Represents the checker file that must be added for auto-evaluation. For more information regarding the checker file, refer to this article. 
  3. Maximum score: Represents the maximum score assigned to the expected output.
  4. Checker language: Represents the language of the checker file.

To learn about partial scoring and multiple test cases, refer to this link.

Once you have uploaded the required files, click Upload.

5

Time Limit

The evaluation of your submission file during COMPILE & TEST and SUBMIT is limited to a specific amount of time. If your code exceeds the specified limit, then you will see the time limit exceeded (TLE) error on the screen. 

Note: The maximum time limit is 300 seconds.

The time limit is multiplied by a language-specific time factor. For example, the factor for Python is 5. If you set the time limit to 300 seconds, the total time is 300 × 5 = 1500 seconds, that is, 25 minutes.

Similarly, the factor for R is 1.5, so a time limit of 300 seconds gives a total time of 300 × 1.5 = 450 seconds.

6

Memory Limit

The evaluation of your submission file during COMPILE & TEST and SUBMIT is limited to a specific amount of memory. If your code exceeds the specified limit, then you will see the memory limit exceeded (MLE) error on the screen. 

Note: The maximum memory limit is 3072 MB.

7

Code Snippet

Code snippets are boilerplate code that the candidate can use as a reference to solve the question. You can add instructions as comments, or add comments that tell the candidate which libraries to import.
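
The time-factor rule described in the Time Limit row above can be checked with a short calculation. This is only a sketch: the factors for Python (5) and R (1.5) and the 300-second maximum come from this guide, while the dictionary and function names are hypothetical. The guide does not state a factor for other languages, so none is assumed here.

```python
# Time factors stated in this guide; other languages are intentionally omitted.
TIME_FACTORS = {"python": 5.0, "r": 1.5}
MAX_TIME_LIMIT_SECONDS = 300  # maximum configurable time limit per this guide


def effective_time_limit(time_limit_seconds, language):
    """Return the time limit after applying the language's time factor."""
    if not 0 < time_limit_seconds <= MAX_TIME_LIMIT_SECONDS:
        raise ValueError("time limit must be greater than 0 and at most 300 seconds")
    return time_limit_seconds * TIME_FACTORS[language.lower()]


if __name__ == "__main__":
    print(effective_time_limit(300, "Python"))  # 1500.0 seconds (25 minutes)
    print(effective_time_limit(300, "R"))       # 450.0 seconds
```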

Notes

  • The data set must be uploaded as a .zip file. The maximum file size is 30 MB.
  • The data set folder should contain the following files:
    • train.csv: Data set that candidates use to train their models
    • test.csv: Data set that candidates use to predict an outcome
    • sample_submission.csv: Format that candidates should follow to create their submission file
  • The test data, train data, and sample submission must be .csv files.
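
The notes above can be followed with a small packaging script. The sketch below uses Python's standard zipfile module. The three file names and the 30 MB limit come from this guide; the function name and the folder and archive names in the usage line are hypothetical. Two further assumptions are made because the guide does not specify them: 30 MB is taken as 30 × 1024 × 1024 bytes, and the files are placed at the root of the archive.

```python
import os
import zipfile

REQUIRED_FILES = ("train.csv", "test.csv", "sample_submission.csv")
MAX_ZIP_BYTES = 30 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def build_dataset_zip(dataset_dir, zip_path):
    """Zip the three required CSV files and verify the archive size."""
    missing = [name for name in REQUIRED_FILES
               if not os.path.isfile(os.path.join(dataset_dir, name))]
    if missing:
        raise FileNotFoundError("missing required files: " + ", ".join(missing))

    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in REQUIRED_FILES:
            # arcname keeps the files at the archive root (an assumption).
            archive.write(os.path.join(dataset_dir, name), arcname=name)

    size = os.path.getsize(zip_path)
    if size > MAX_ZIP_BYTES:
        raise ValueError(f"{zip_path} is {size} bytes, above the 30 MB limit")
    return size
```

For example, build_dataset_zip("dataset", "dataset.zip") returns the archive size in bytes, or raises an error if a file is missing or the archive is too large.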

The Languages section

Adding code snippets

Code snippets are boilerplate code that the candidate can use as a reference to solve the question. You can add instructions as comments, or add comments that tell the candidate which libraries to import. You can enable code snippets for the following languages:

  1. Python
  2. Python 3
  3. R

Example

DS23.png
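
As a further illustration, a Python 3 code snippet for candidates might look like the sketch below. It is a hypothetical starter and not the snippet shown in the screenshot: the column names id and target, the output file name submission.csv, and the constant prediction in the usage comment are placeholders. Only the names train.csv, test.csv, and sample_submission.csv come from this guide.

```python
# Instructions for the candidate (placeholder text):
# 1. Train a model on train.csv.
# 2. Predict the target for every row in test.csv.
# 3. Write the predictions in the format shown in sample_submission.csv.

import csv


def write_submission(test_path, output_path, predict):
    """Write one prediction per test row.

    'id' and 'target' are assumed column names; adapt them to your data set.
    """
    with open(test_path, newline="") as src, open(output_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow(["id", "target"])
        for row in reader:
            writer.writerow([row["id"], predict(row)])


# Example call with a placeholder model that always predicts 0:
# write_submission("test.csv", "submission.csv", predict=lambda row: 0)
```

Keeping the instructions in comments at the top means the candidate sees them in the editor without leaving the question.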

The Editorial section

Adding the approach to solve a problem 

You can add content that helps a candidate understand how to solve the problem. This can include the approach, directions, or steps to solve the problem.

Click Publish to create the Data Science question in the test.

To try the question and see how a candidate solves it and makes submissions on HackerEarth’s platform, refer to this article.