ML problems on the HackerEarth ML platform

HackerEarth’s ML platform supports the machine learning flow shown in the diagram in the Typical learning section.

How do we create an ML problem?

We divide the entire data set into two parts:

  • Training data set
  • Test data set

What is a training data set?

A training data set is the data that candidates will use to train their models.

What is a test data set?

A test data set is the unseen data for which candidates predict an outcome using their trained models.

Note: The test data that we give to the candidates does not include the outcome (the target variable).

What does the candidate do after the models are trained?

After the models have been trained, the candidates are expected to do the following:

  • Predict an outcome on the test data set
  • Submit the prediction file
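
The two steps above can be sketched in Python using the small weather data set from the example later in this article. This is only an illustration under stated assumptions: the majority-class "model" is a placeholder for whatever model a candidate actually trains, the function names are hypothetical, and the two-column ID/Play layout of the prediction file is assumed, since this article does not define the file format a given problem expects.

```python
import csv
import io
from collections import Counter

# Training rows: ID, four features, and the known outcome (Play).
train_rows = [
    (1, "Sunny", "Hot", "High", "False", "No"),
    (2, "Rainy", "Mild", "High", "False", "Yes"),
    (3, "Sunny", "Cool", "Normal", "False", "Yes"),
    (4, "Overcast", "Hot", "High", "False", "Yes"),
    (5, "Rainy", "Mild", "High", "False", "Yes"),
]

# Test rows: ID and features only; the outcome is withheld.
# These are the remaining five rows of the example data set.
test_rows = [
    (1, "Overcast", "Hot", "Normal", "False"),
    (2, "Sunny", "Mild", "Normal", "True"),
    (3, "Sunny", "Mild", "High", "False"),
    (4, "Overcast", "Cool", "Normal", "True"),
    (5, "Rainy", "Mild", "High", "True"),
]


def train(rows):
    """Placeholder model: remember the most frequent outcome in the training data."""
    outcomes = Counter(row[-1] for row in rows)
    return outcomes.most_common(1)[0][0]


def predict(model, rows):
    """Return one (ID, predicted outcome) pair per test row."""
    return [(row[0], model) for row in rows]


def to_csv(predictions):
    """Serialize predictions as CSV text (assumed layout: ID, Play)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ID", "Play"])
    writer.writerows(predictions)
    return buffer.getvalue()


model = train(train_rows)
predictions = predict(model, test_rows)
prediction_file = to_csv(predictions)
print(prediction_file)
```

Writing the text in prediction_file to disk would produce the file that a candidate then submits. A real model would use the feature columns instead of ignoring them.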

Example

The following data set of 10 rows can be divided into:

  • Training data set (50% of the rows)
  • Test data set (remaining 50% of the rows)

Entire data set

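| Outlook  | Temperature | Humidity | Wind  | Play |
|----------|-------------|----------|-------|------|
| Sunny    | Hot         | High     | False | No   |
| Rainy    | Mild        | High     | False | Yes  |
| Sunny    | Cool        | Normal   | False | Yes  |
| Overcast | Hot         | High     | False | Yes  |
| Rainy    | Mild        | High     | False | Yes  |
| Overcast | Hot         | Normal   | False | Yes  |
| Sunny    | Mild        | Normal   | True  | Yes  |
| Sunny    | Mild        | High     | False | No   |
| Overcast | Cool        | Normal   | True  | Yes  |
| Rainy    | Mild        | High     | True  | Yes  |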

Training data set

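| ID | Outlook  | Temperature | Humidity | Wind  | Play |
|----|----------|-------------|----------|-------|------|
| 1  | Sunny    | Hot         | High     | False | No   |
| 2  | Rainy    | Mild        | High     | False | Yes  |
| 3  | Sunny    | Cool        | Normal   | False | Yes  |
| 4  | Overcast | Hot         | High     | False | Yes  |
| 5  | Rainy    | Mild        | High     | False | Yes  |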

Test data set (test.csv)

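| ID | Outlook  | Temperature | Humidity | Wind  |
|----|----------|-------------|----------|-------|
| 1  | Overcast | Hot         | Normal   | False |
| 2  | Sunny    | Mild        | Normal   | True  |
| 3  | Sunny    | Mild        | High     | False |
| 4  | Overcast | Cool        | Normal   | True  |
| 5  | Rainy    | Mild        | High     | True  |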

Note: The test data set does not include the target variable (Play).
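
The split illustrated in this example can be sketched in Python as follows. The function name split_data and the train_fraction parameter are hypothetical, and restarting the IDs at 1 in each part is an assumption. The rows are split in their original order to follow the 50/50 division described in the example; real problems would typically shuffle or sample the rows first.

```python
# The entire example data set: Outlook, Temperature, Humidity, Wind, Play.
ROWS = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Rainy", "Mild", "High", "True", "Yes"),
]


def split_data(rows, train_fraction=0.5):
    """Split rows in order: the first part for training, the rest for testing."""
    cut = int(len(rows) * train_fraction)
    # Training rows keep every column, including the target (Play).
    train = [(i, *row) for i, row in enumerate(rows[:cut], start=1)]
    # Test rows drop the last column, so the target is not revealed.
    test = [(i, *row[:-1]) for i, row in enumerate(rows[cut:], start=1)]
    return train, test


train_set, test_set = split_data(ROWS)

for row in test_set:
    print(row)
```

Each tuple in train_set has six fields (ID, four features, Play), while each tuple in test_set has only five, because the Play column is withheld from candidates.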