The sinking of the Titanic is one of the most infamous shipwrecks in history. Today I will use a Decision Tree model to predict which passengers survived. The data set comes from https://www.kaggle.com/c/titanic and consists of two CSV files: a training set and a test set, both provided by Kaggle.
Scikit-learn
Scikit-learn is a Python library that specializes in machine learning. It lets you build machine learning models and provides utility functions for data preparation, post-model analysis, and evaluation. If I had to sum up the essence of applying learning algorithms in Sklearn, I would say it goes 5 steps like this:
Case study: walking through the Sklearn machine learning steps
Here I will use the Decision Tree algorithm to create a model that predicts which passengers will survive the Titanic shipwreck.
Step 1: Importing the required libraries
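The imports for the steps below might look like the following. This is a minimal sketch that assumes only the pieces named in this walkthrough (Pandas, train_test_split(), DecisionTreeClassifier() and GridSearchCV()); the original notebook may have imported more.

```python
# Pandas for loading and cleaning the CSV data
import pandas as pd

# Scikit-learn pieces used in the later steps
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier
```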
Step 2: Loading data from CSV files via Pandas
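As a rough sketch of the loading step, the snippet below reads CSV text into a DataFrame and checks its shape and missing values. To stay self-contained it parses a tiny in-memory sample instead of the downloaded files; the column names follow Kaggle's train.csv, but the rows are illustrative stand-ins, and the file names train.csv and test.csv in the comment are assumed to match what Kaggle provides.

```python
import io

import pandas as pd

# Illustrative stand-in for a few columns of Kaggle's train.csv
# (real column names, made-up rows so the example runs anywhere).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age,Fare,Embarked
1,0,3,male,22,7.25,S
2,1,1,female,38,71.28,C
3,1,3,female,,7.92,S
"""

# With the downloaded files this would be, for example:
#   train_df = pd.read_csv("train.csv")
#   test_df = pd.read_csv("test.csv")
train_df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(train_df.shape)           # number of rows and columns
print(train_df.isnull().sum())  # missing values per column
```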
Step 3: Processing the data. Once the data is loaded, the DataFrame can be cleaned and reshaped with various Pandas methods, such as handling missing data, selecting a specific column or range of columns, conducting feature transformations, conditional filtering, and so on.
Step 4: After processing the data, the data set can be divided into training and test sets. This can be done with the train_test_split() function, which splits the features (X) and the target variable (y) into training and test subsets (X_train, X_test, y_train, and y_test).
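The operations listed in Step 3 can be sketched on a small hand-made DataFrame. The specific choices here (filling missing Age values with the median, encoding Sex as 0/1, and keeping four feature columns) are assumptions for illustration, not necessarily the processing used in the original analysis.

```python
import pandas as pd

# Tiny illustrative frame using Titanic-style column names
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 7.92, 13.0],
})

# Handling missing data: fill the missing Age with the column median
df["Age"] = df["Age"].fillna(df["Age"].median())

# Feature transformation: encode the text column as integers
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# Conditional filtering: rows for first-class passengers only
first_class = df[df["Pclass"] == 1]

# Column selection: features (X) and target (y)
X = df[["Pclass", "Sex", "Age", "Fare"]]
y = df["Survived"]
```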
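A minimal sketch of the split, using synthetic arrays in place of the processed Titanic features. The test_size, random_state and stratify values are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 20 samples with 2 features and a binary target
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

# Hold out 20% of the rows for testing; stratify keeps the class
# balance similar in both subsets, random_state makes it repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```

Note the order of the returned values: both feature subsets come first, followed by both target subsets.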
Step 5: Building the Decision Tree classifier. Here, I created the object Clf, an instance of DecisionTreeClassifier().
Step 6: Initializing the GridSearchCV() object with a grid of candidate hyperparameters and fitting it to the training data
Step 7: Obtaining the best hyperparameters and best score. This returns the hyperparameter values that gave the best performance for the estimator we specified, along with the corresponding score.
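Creating the classifier might look like the sketch below. The variable is named clf here, and the four toy rows (a passenger class and an encoded sex value) are made up so the example can run on its own. Fitting directly is shown only to demonstrate the estimator interface.

```python
from sklearn.tree import DecisionTreeClassifier

# random_state is an assumed setting that makes the tree reproducible
clf = DecisionTreeClassifier(random_state=42)

# Toy rows of [Pclass, Sex] with a made-up survival label
X_toy = [[3, 0], [1, 1], [3, 1], [2, 0]]
y_toy = [0, 1, 1, 0]

clf.fit(X_toy, y_toy)
print(clf.predict([[1, 1]]))
```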
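Steps 6 and 7 can be sketched together as follows. The parameter grid, the 5-fold cross-validation, and the accuracy scoring are assumptions for illustration, and make_classification() generates synthetic data in place of the Titanic training set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the processed training data
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Candidate hyperparameter values (an assumed grid)
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 4, 6],
    "min_samples_leaf": [1, 5],
}

# Step 6: try every combination with 5-fold cross-validation
grid_search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,
    scoring="accuracy",
)
grid_search.fit(X, y)

# Step 7: best combination and its mean cross-validated score
print(grid_search.best_params_)
print(grid_search.best_score_)
```

After fitting, grid_search.best_estimator_ holds a tree refit on all the supplied data with the winning settings, so it can be used directly for prediction.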