Welcome to this four-part blog series where we introduce a powerful analytical tool called Survival Analysis. In this series, I will provide a beginner-friendly guide to help you understand this popular statistical method.

In the first part, I will introduce the key concepts of survival analysis and show you some use cases where it can be applied. In the second part, we will dive deeper into the key models in survival analysis. In the third part of the series, we will walk you through how you can perform survival analysis using Alteryx. Finally, we will see how we can perform survival analysis using Python.

Whether you are a marketing analyst, medical researcher, engineer, or social scientist, this series will help you understand how to analyse time-to-event data and predict survivability. So, let’s dive in!

 

Content

  1. Part I: Introduction to Survival Analysis
    • What is survival analysis?
    • Why do we use it?
    • Key concept: Censorship
    • Use cases
  2. Part II: Key Models in Survival Analysis
    • The Kaplan-Meier Model
    • The Cox Proportional Hazards Model
  3. Part III: Survival Analysis using Alteryx
    • Data Preparation
    • The KM Model in Alteryx
    • The CPH Model in Alteryx
  4. Part IV: Survival Analysis using Python
    • Installing the Library
    • Demo

 

Let’s continue using the same dataset from the previous blog.

If you want to conduct survival analysis using Python, you will need either a Python Integrated Development Environment (IDE) or a Jupyter Notebook environment. Fortunately, Alteryx provides the Python Tool, which lets you write Python in a notebook inside your workflow. If you are not familiar with the Python Tool, you are welcome to refer to my prior blog where I demonstrated how to perform a machine learning analysis using the Python Tool.

 

Installing the Library

The Python library that we will use to perform survival analysis is called lifelines.

We can install this package from within the Alteryx Python Tool. We will also import the other libraries we need (install them first if you haven’t already).

You can find its documentation here: https://lifelines.readthedocs.io/en/latest/
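As a rough sketch of the install step: inside the Python Tool, the ayx module exposes Package.installPackages. The snippet below checks which libraries are missing and installs only those. The list of package names is my assumption about what this post needs, and outside Alteryx (where ayx does not exist) it simply prints the equivalent pip command.

```python
import importlib.util

# Assumed list of libraries used in this post; adjust it to your workflow.
REQUIRED = ["lifelines", "pandas", "matplotlib"]

# Names of the packages that cannot be imported in the current environment.
missing = [name for name in REQUIRED if importlib.util.find_spec(name) is None]

try:
    # The ayx module is only available inside the Alteryx Python Tool.
    from ayx import Package
except ImportError:
    Package = None

if missing and Package is not None:
    # Install only what is missing, from within Alteryx.
    Package.installPackages(missing)
elif missing:
    print("Outside Alteryx, run: pip install " + " ".join(missing))
else:
    print("All required packages are already installed.")
```

Depending on how Designer was installed, the install may need elevated permissions, so it is worth running this cell once on its own before building the rest of the notebook.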

Demo

Step 1: Connect to our data source

Remember, you first need to run the workflow once after connecting the Python Tool to your input data stream. This lets the tool pick up all the necessary metadata.

Then, you can read the input data into a pandas DataFrame with Alteryx.read().
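Reading the incoming connection could look something like the sketch below. Alteryx.read("#1") is the Python Tool call for the first input anchor; everything in the fallback branch (the column names tenure, churn and Gender, and the rows themselves) is made up so the snippet also runs outside Alteryx, and does not reflect the real dataset.

```python
import pandas as pd

try:
    # The ayx module is only available inside the Alteryx Python Tool.
    from ayx import Alteryx

    # "#1" is the label of the first incoming connection.
    df = Alteryx.read("#1")
except ImportError:
    # Outside Alteryx: a few made-up rows with hypothetical column names.
    df = pd.DataFrame({
        "tenure": [1, 3, 12, 24],   # time observed, e.g. in months
        "churn": [1, 0, 1, 0],      # 1 = churned, 0 = censored
        "Gender": ["Female", "Male", "Male", "Female"],
    })

# Check the column types before modelling; string columns show up as "object".
print(df.dtypes)
print(df.head())
```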

Step 2: Kaplan-Meier Model

We first need to create an instance of KaplanMeierFitter, and then call its fit() method to calculate the survival curve values. We can then visualize the survival curve using the Matplotlib library.

Alternatively, you could construct a separate KM model for each group of customers and compare their survival curves on the same chart.

 

Step 3: The Cox Proportional Hazards Model

We first need to convert our covariates from “String” (the object dtype in pandas) to numerical values. We then create an instance of CoxPHFitter and fit it to the data.

 

Based on the survival regression output, we can see that the Python results are consistent with the results from the Survival Analysis Tool (which is based on R). Gender is the only statistically significant covariate, and being female is expected to reduce the risk of churning: the coefficient for Gender is negative, so its hazard ratio (the exponential of the coefficient) is below 1, and remember that Female was encoded as 1.

 

Let’s plot the variable coefficients! In our case, the values of the three variables can easily be read from the table, but as the number of variables increases, a coefficient plot will help us quickly identify the bigger contributors.

Author: Martin Ding

Martin earned his Honours degree in Economics at the University of Melbourne in 2011. He has more than 7 years of experience in product development, both as an entrepreneur and as a project manager in robotics at an AI unicorn. Martin is expecting to receive his Master’s degree in Data Science from CU Boulder at the end of 2022. Martin is excited about data and its power to transform organizations. He witnessed first-hand how instrumental data-driven decision making (DDDM) was in leading to more team buy-in and insightful decisions. Martin joined the Data School to systematically enhance his knowledge of the tools, methodologies and know-how of Data Analytics and DDDM. When not working, Martin enjoys reading, cooking, traveling and golf. He is also thoroughly interested in the practice of mindfulness and meditation.