The sinking of the Titanic is one of the most infamous shipwrecks in history. Today I will use a Decision Tree model to predict which passengers survived. The data set comes from https://www.kaggle.com/c/titanic and consists of two CSV files provided by Kaggle: a training set and a test set.

Scikit-learn

Scikit-learn (Sklearn) is a Python library that specializes in machine learning. It lets you build machine learning models and provides utility functions for data preparation, post-model analysis, and evaluation. If I had to sum up the essence of applying learning algorithms in Sklearn, I would say it comes down to five steps.

Case study: walking through the Sklearn machine learning steps

Here I will use the Decision Tree algorithm to create a model that predicts which passengers will survive the Titanic shipwreck.

Step 1: Importing the required libraries
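As a rough sketch, the imports for this walkthrough might look like the following. The exact set is an assumption inferred from the functions named in the later steps (train_test_split(), DecisionTreeClassifier(), GridSearchCV()), not the original notebook's import list.

```python
# Assumed imports, inferred from the functions mentioned in the steps of this post.
import pandas as pd                                   # loading and processing the CSV data
from sklearn.model_selection import train_test_split  # splitting into train/test subsets
from sklearn.model_selection import GridSearchCV      # hyperparameter search
from sklearn.tree import DecisionTreeClassifier       # the model itself
```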

Step 2: Loading data from CSV files via Pandas
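A minimal sketch of the loading step is shown here. The helper name load_titanic is hypothetical, and the default paths assume the Kaggle files were saved as train.csv and test.csv in the working directory.

```python
import pandas as pd


def load_titanic(train_path="train.csv", test_path="test.csv"):
    """Read the two CSV files into DataFrames.

    load_titanic is a hypothetical helper; the default file names are an
    assumption about where the Kaggle downloads were saved.
    """
    train_df = pd.read_csv(train_path)
    test_df = pd.read_csv(test_path)
    return train_df, test_df
```

Calling train_df, test_df = load_titanic() and then train_df.head() or train_df.info() is a quick way to check column names and spot missing values before any processing.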

Step 3: Processing the data. Once the data is in a DataFrame, it can be processed with various Pandas methods: handling missing data, selecting a specific column or range of columns, applying feature transformations, conditional filtering, and so on.
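To make these operations concrete, here is a small sketch on a made-up four-row sample. The column names follow the Kaggle Titanic files, but the rows are illustrative, and the particular choices (median imputation for Age, a 0/1 encoding for Sex) are assumptions rather than the exact processing used for this model.

```python
import pandas as pd

# Made-up sample that mimics a few Kaggle Titanic columns (not real rows).
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 7.92, 13.0],
})


def preprocess(df):
    """Hypothetical helper: fill gaps, encode text, keep chosen columns."""
    out = df.copy()
    # Handling missing data: fill missing ages with the median age.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Feature transformation: encode Sex as 0 (male) / 1 (female).
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    # Column selection: keep only the columns used for modelling.
    return out[["Pclass", "Sex", "Age", "Fare", "Survived"]]


clean = preprocess(sample)

# Conditional filtering: only third-class passengers.
third_class = clean[clean["Pclass"] == 3]
```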

 

Step 4: Splitting the data. After processing, the data set can be divided into training and test sets with the train_test_split() function, which splits the features and the target variable into training and test subsets (X_train, X_test, y_train, and y_test).
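The split can be sketched as follows on ten made-up rows. The feature columns, test_size=0.2, and random_state=42 are assumptions chosen for illustration; the post does not state the values actually used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten made-up, already-processed rows (illustrative only).
features = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3, 3, 2],
    "Sex":    [0, 1, 1, 1, 0, 0, 0, 0, 1, 1],
    "Age":    [22, 38, 26, 35, 35, 28, 54, 2, 27, 14],
})
target = pd.Series([0, 1, 1, 1, 0, 0, 0, 0, 1, 1], name="Survived")

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)
```

Note the order of the returned values: both feature subsets come first for each input, so the result is X_train, X_test, y_train, y_test.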

Step 5: Building the Decision Tree classifier. Here I create an object, Clf, of DecisionTreeClassifier().
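A minimal sketch of this step, with a tiny made-up data set added only to show that the object can be fitted and used. The random_state argument and the toy data are assumptions; the name Clf follows the text.

```python
from sklearn.tree import DecisionTreeClassifier

# Clf follows the object name used in the text; random_state is an assumption
# added so repeated runs build the same tree.
Clf = DecisionTreeClassifier(random_state=42)

# Made-up toy data: one feature (Sex encoded as 0/1) and a survival label.
X_toy = [[0], [1], [0], [1]]
y_toy = [0, 1, 0, 1]

Clf.fit(X_toy, y_toy)
prediction = Clf.predict([[1]])  # predicted class for one new sample
```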

Step 6: Initializing the GridSearchCV() object with a grid of hyperparameters and fitting it to the training data
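The grid search might be set up roughly like this. The parameter grid, cv=5, and the accuracy scoring are all assumptions for illustration, and synthetic data from make_classification stands in for the processed Titanic features so that the snippet runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the processed training data (not Titanic data).
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)

Clf = DecisionTreeClassifier(random_state=0)

# Hypothetical grid: 3 * 2 * 2 = 12 candidate settings.
param_grid = {
    "max_depth": [2, 4, 6],
    "min_samples_split": [2, 10],
    "criterion": ["gini", "entropy"],
}

# Each candidate is scored with 5-fold cross-validation on the training data.
grid = GridSearchCV(Clf, param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)
```

With the default refit=True, GridSearchCV retrains the best candidate on the full training data after the search, so the fitted grid object can be used directly for prediction.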

Step 7: Obtaining the best hyperparameters and best score. This returns the hyperparameter values that gave the best performance for the estimator we specified, along with the corresponding score.
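As a sketch, the results can be read from the best_params_ and best_score_ attributes of a fitted GridSearchCV object. The snippet below repeats a small search on synthetic stand-in data so that it runs on its own; the grid values are assumptions, and no output is shown because the numbers depend on the data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data and a hypothetical grid, only to get a fitted search.
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5]},
    cv=5,
)
grid.fit(X_train, y_train)

best_params = grid.best_params_    # dict mapping hyperparameter name to best value
best_score = grid.best_score_      # mean cross-validated score of that setting
best_model = grid.best_estimator_  # tree refitted on all training data

print("Best hyperparameters:", best_params)
print("Best cross-validation score:", best_score)
```

Keep in mind that best_score_ is a cross-validation average over the training data, so a final check on the held-out test split is still worthwhile.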

Author: Gary Li