A competition hosted by Analytics India Magazine and Tredence on MachineHack.

Contents:

  1. Introduction
  2. Problem statement
  3. About the dataset
  4. Approach
    1. Data cleaning
    2. Models
    3. CV and hyperparameter tuning
  5. Results

👋 Introduction:

Competitions with tabular datasets are always fun. I had been focusing on computer vision, with RespAI and the Petfinder competition, so this competition came as a pleasant surprise. I entered it as an escape from computer vision, but that soon changed as I found myself working on it for hours on end. The data here is not real-world: it is clearly synthetic, with clean rows and NaNs that seem to have been inserted deliberately into specific columns. I spent most of my time working out how to fill these NaNs, and finally settled on filling them with model predictions. Beginner-friendly as it was, the competition taught me a great deal. It was the first time I explored hyperparameter tuning libraries (beyond sklearn’s random search and grid search), and I stuck with Optuna. Alongside classical machine learning models, I also tried a neural network approach. My final submission was an ensemble of nearly all the models I successfully evaluated. Below, I note down the details of the competition and my approach to it.
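Filling NaNs with model predictions means treating each column that has missing values as a target: fit a model on the rows where that column is present, then predict it for the rows where it is missing. The post does not say which models or features were used for this step, so the following is only a minimal pure-Python sketch under stated assumptions: numeric features, `None` marking a missing cell, and a k-nearest-neighbour regressor standing in for the real model. The helper name `impute_column_knn` and the toy table are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def impute_column_knn(rows: List[Row], target: int, k: int = 3) -> List[Row]:
    """Return a copy of `rows` with None in column `target` filled by a k-NN prediction.

    Hypothetical helper: k-NN regression is only a stand-in for the model
    actually used. Distances use the other columns present in both rows.
    """
    # Rows where the target column is observed form the training set.
    train = [r for r in rows if r[target] is not None]
    filled: List[Row] = []
    for row in rows:
        new_row = list(row)
        if row[target] is None:
            scored = []
            for cand in train:
                # Squared differences over non-target features observed in both rows.
                diffs = [
                    (a - b) ** 2
                    for j, (a, b) in enumerate(zip(row, cand))
                    if j != target and a is not None and b is not None
                ]
                if diffs:
                    # Root-mean-square distance, so rows sharing fewer features stay comparable.
                    scored.append((math.sqrt(sum(diffs) / len(diffs)), cand[target]))
            scored.sort(key=lambda pair: pair[0])
            neighbours = [value for _, value in scored[:k]]
            # Predict the mean target of the k nearest rows; leave None if no row is comparable.
            if neighbours:
                new_row[target] = sum(neighbours) / len(neighbours)
        filled.append(new_row)
    return filled


# Usage on a made-up toy table: impute column 2 from the other columns.
data = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [5.0, 6.0, 50.0],
    [1.05, None, None],
]
print(impute_column_knn(data, target=2, k=2)[-1])  # [1.05, None, 11.0]
```

The helper handles one column at a time, so in practice it would be called once per column with missing values (column 1 above is still `None`). Swapping the k-NN step for any fitted regressor or classifier keeps the same "train on known rows, predict missing rows" structure.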


🎯 Problem statement:

The year is 2050, and a team of astronauts from all over the world has gone on a mission to an exoplanet and discovered a vast amount of life and awesome weather. The scientists began collecting samples of the fruits found at their landing site and were curious about their shape and size. They collected data for more than one solar year of the planet to understand the fruit-growing conditions under different weather.

To analyze the data and grow fruits similar to Earth’s, they began transmitting the data back to Earth. However, due to solar radiation, some of the data got corrupted and was lost in transmission. Back on Earth, the scientists realized they needed to identify the type of climate the exoplanet has based on the properties of the fruit, despite the missing data. Help the scientists identify the Earth-like season in which each fruit must have grown, using the data collected.


📚 About the dataset:

Find out more about the columns in the dataset here.


✅ Approach:


🧹 Data cleaning:


🧠 Models:

Ensemble of the following models:

```mermaid
flowchart LR
    subgraph layer1
        direction TB
        B[Dense 128] --> C[Relu]
        C --> D[Dropout 0.3]
    end
    subgraph layer2
        direction TB
        E[Dense 64] --> F[Relu]
        F --> G[Dropout 0.2]
    end
    subgraph layer3
        direction TB
        H[Dense 16] --> I[Relu]
    end
    subgraph layer4
        direction TB
        J[Dense 4] --> K[Softmax]
    end
    A[Input] --> layer1
    layer1 --> layer2
    layer2 --> layer3
    layer3 --> layer4
    layer4 --> L[Output]
```
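The diagram describes a small multilayer perceptron: Dense 128 → ReLU → Dropout 0.3, Dense 64 → ReLU → Dropout 0.2, Dense 16 → ReLU, and a Dense 4 → Softmax output, presumably one unit per season class. The post does not name the framework, so here is a framework-free forward pass in plain Python, just to make the layer order concrete. The input width of 10 features, the He-style random initialisation, and all function names are assumptions; training (loss, optimiser) is not shown.

```python
import math
import random
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]

# Layer widths read off the diagram; the input width (10) is an assumption.
LAYER_SIZES = [10, 128, 64, 16, 4]
# Dropout rate after each hidden block (None = no dropout), as in the diagram.
DROPOUT: List[Optional[float]] = [0.3, 0.2, None]


def init_dense(n_in: int, n_out: int, rng: random.Random):
    """He-style random init for a Dense layer: weights[out][in], bias[out]."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def dropout(x: Vector, rate: float, training: bool, rng: random.Random) -> Vector:
    # Inverted dropout: zero units and rescale survivors when training, identity otherwise.
    if not training:
        return x
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def softmax(x: Vector) -> Vector:
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x: Vector, params, training: bool = False, rng=None) -> Vector:
    rng = rng or random.Random(0)
    h = x
    # Hidden blocks: Dense -> ReLU -> (optional) Dropout.
    for i, (weights, bias) in enumerate(params[:-1]):
        h = relu(dense(h, weights, bias))
        rate = DROPOUT[i]
        if rate is not None:
            h = dropout(h, rate, training, rng)
    # Output block: Dense 4 -> Softmax, giving one probability per class.
    weights, bias = params[-1]
    return softmax(dense(h, weights, bias))


rng = random.Random(42)
params = [init_dense(n_in, n_out, rng) for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])]
x = [rng.uniform(-1.0, 1.0) for _ in range(LAYER_SIZES[0])]
probs = forward(x, params)
print(len(probs), round(sum(probs), 6))  # 4 1.0
```

Dropout only changes the output when `training=True`; at inference the network is deterministic, which is why both dropout layers sit between hidden blocks rather than after the softmax.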

🚀 CV and hyperparameter tuning:



🎯 Results: