Welcome to this four-part blog series where we introduce a powerful analytical tool called Survival Analysis. In this series, I will provide a beginner-friendly guide to help you understand this popular statistical method.

In the first part, I will introduce the key concepts of survival analysis and show you some use cases where it can be applied. In the second part, we will dive deeper into the key models in survival analysis. In the third part of the series, we will walk you through how you can perform survival analysis using Alteryx. Finally, we will see how we can perform survival analysis using Python.

Whether you are a marketing analyst, medical researcher, engineer, or social scientist, this series will help you understand how to analyse time-to-event data and predict survivability. So, let’s dive in!

**Content**

**Part I: Introduction to Survival Analysis**

- What is survival analysis?
- Why do we use it?
- Key concept: Censorship
- Use cases

**Part II: Key Models in Survival Analysis**

- The Kaplan-Meier Model
- The Cox Proportional Hazard Model

**Part III: Survival Analysis using Alteryx**

- Data Preparation
- The KM Model in Alteryx
- The CPH Model in Alteryx

**Part IV: Survival Analysis using Python**

- Installing the Library
- Demo

Let’s continue using the same dataset from the previous blog.

If you want to conduct survival analysis using Python, you will require either a Python Integrated Development Environment (IDE) or a Jupyter Notebook environment. Fortunately, Alteryx provides the Python Tool option to code in Python. If you are not familiar with the Python Tool, you are welcome to refer to my prior blog where I demonstrated how to perform a machine learning analysis using the Python Tool.

**Installing the Library**

The Python library that we will use to perform survival analysis is called lifelines.

We can install this package directly from within the Alteryx Python Tool. We will also import the other libraries we need (install them first if you haven’t already).
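Here is a minimal sketch of what the install step might look like. `Package.installPackages` comes from the `ayx` module that ships with the Alteryx Python Tool; the `is_installed` helper and the fallback message for running outside Alteryx are my own additions for illustration.

```python
import importlib.util


def is_installed(name):
    """Return True if a top-level package can be found in this environment."""
    return importlib.util.find_spec(name) is not None


if not is_installed("lifelines"):
    try:
        # The ayx module only exists inside the Alteryx Python Tool.
        from ayx import Package
        Package.installPackages(["lifelines"])
    except ImportError:
        # Outside Alteryx, install from a terminal instead.
        print("Run 'pip install lifelines' in your environment.")

# Other libraries used in the rest of this post.
import pandas as pd
import matplotlib.pyplot as plt
```

Depending on how Alteryx is installed on your machine, you may need to run Designer as an administrator for the install to succeed.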

You can find its documentation here: https://lifelines.readthedocs.io/en/latest/

**Demo**

**Step 1: Connect to our data source**

Remember to run the workflow once after connecting the Python Tool to your input data stream; this lets the tool pick up all the necessary metadata.

Then, you can read in the input data like this:
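A minimal sketch of this step, assuming the data arrives on the first input anchor (`#1`). `Alteryx.read` is part of the `ayx` module; the fallback DataFrame and its column names (`Tenure`, `Churn`, `Gender`) are made up here so that the snippet also runs outside Alteryx, and they may not match the real dataset.

```python
import pandas as pd

try:
    # Inside the Alteryx Python Tool: read the stream on input anchor #1.
    from ayx import Alteryx
    df = Alteryx.read("#1")
except ImportError:
    # Outside Alteryx: a tiny made-up stand-in (hypothetical column names).
    df = pd.DataFrame({
        "Tenure": [5, 12, 24, 36, 8, 30],   # months observed
        "Churn": [1, 1, 0, 0, 1, 0],        # 1 = churned, 0 = still a customer
        "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    })

# Quick sanity check of what came through.
print(df.head())
print(df.dtypes)
```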

**Step 2: Kaplan-Meier Model**

We first need to create an instance of KaplanMeierFitter() and then call its fit() method to calculate the survival curve values. We can then visualize the survival curve using the Matplotlib library.

Alternatively, you could construct a KM model for each group of customers:

**Step 3: The Cox Proportional Hazards Model**

We first need to convert our covariates from “String” (the object type in pandas) to a numerical type. Then we create an instance of CoxPHFitter().

Based on the survival regression output, we can see that the Python results are consistent with the results from the Survival Analysis Tool (which is based on R). Gender is the only statistically significant covariate, and Gender = “Female” is expected to reduce the risk of churning (the coefficient for Gender is negative and, remember, Female was encoded as 1).

Let’s plot the variable coefficients! In our case, the values for the three variables we currently have can be read straight from the table, but as the number of variables increases, a coefficient plot will help us easily identify the bigger contributors!