Welcome to this four-part blog series where we introduce a powerful analytical tool called Survival Analysis. In this series, I will provide a beginner-friendly guide to help you understand this popular statistical method.

In the first part, I will introduce the key concepts of survival analysis and show you some use cases where it can be applied. In the second part, we will dive deeper into the key models in survival analysis. In the third part of the series, we will walk you through how you can perform survival analysis using Alteryx. Finally, we will see how we can perform survival analysis using Python.

Whether you are a marketing analyst, medical researcher, engineer, or social scientist, this series will help you understand how to analyse time-to-event data and predict survivability. So, let’s dive in!

#### Content

1. Part I: Introduction to Survival Analysis
• What is survival analysis?
• Why do we use it?
• Key concept: Censorship
• Use cases
2. Part II: Key Models in Survival Analysis
3. Part III: Survival Analysis using Alteryx
4. Part IV: Survival Analysis using Python

#### What is Survival Analysis?

Survival analysis is the statistical method that answers the all-important question: “how long until it happens?” It was originally developed in medical research to predict the time until patient death (hence the name survival analysis). On a lighter note, survival analysis is nowadays also widely used in engineering, the social sciences and marketing analytics.

I first encountered survival analysis when analysing customer churn data, so why don’t we use churn to help us understand the topic?

#### Why do we use Survival Analysis?

“Why do we even need survival analysis when we have machine learning?”

“Can’t we use classification models to predict churn?”

Yes, I hear you, and I agree: when it comes to predicting customer churn, it’s easy to get caught up in the hype of modern machine learning tools. And, let’s face it, who can resist the temptation of high accuracy rates? However, there is at least one area where machine learning-based classification models fall short: predicting when churn will occur. This is where survival analysis truly shines, and knowing the “when” is really valuable to businesses:

• Understanding when churn is likely to occur can significantly improve a business’s ability to prioritize and target customers. For instance, by distinguishing customers who are likely to churn after only one week of use from those who are likely to churn after five years of tenure, the marketing team can develop a tailored retention strategy for each group.
• A customer’s value is often related to how long they stay with a business. For subscription-based businesses such as Netflix, a customer who churns in 1 month is not the same as a customer who churns in 1 year in terms of Customer Lifetime Value (CLV).
• Survival analysis allows us to deal with censorship (more on this later). If we do not predict that a customer will churn right now, it does not imply that the customer never will. This aspect is often neglected in classification analysis, and the ability to deal with ‘censorship’ in data makes survival analysis better suited to time-to-event problems than traditional classification techniques.
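
To make the last point a little more concrete, here is a minimal sketch of the Kaplan-Meier estimator, one common way to estimate a survival curve, written in plain Python. The function name and the toy data are made up for illustration, and this is only a sketch under those assumptions; the later parts of this series cover the models properly. Notice that customers who have not churned yet are not thrown away: they stay in the “at risk” count until the moment we lose sight of them.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] for each time t at which at least one churn was observed.

    durations: how long each customer was followed (here: months).
    observed:  True if churn was seen at that time, False if the customer
               was censored (still subscribed when we stopped watching).
    """
    churns = Counter(t for t, seen in zip(durations, observed) if seen)
    exits = Counter(durations)  # everyone leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = churns.get(t, 0)
        if d:
            # Of the customers still at risk at time t, a fraction d / at_risk churn.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Churned AND censored customers both leave the risk set after time t.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: tenure in months and whether churn was observed.
durations = [1, 2, 2, 3, 5, 5, 6, 8]
observed = [True, True, False, True, False, True, False, False]

curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"month {t}: estimated P(still subscribed) = {s:.3f}")
```

With this toy data the estimated probability of still being subscribed drops from 0.875 after month 1 to 0.45 after month 5. A plain churn/no-churn label would only tell us that half of these customers churned, not when.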

#### Key Concept: Censorship

Censorship (more commonly called censoring in the statistics literature) refers, in the context of survival analysis, to losing track of an instance (in our case, a customer) during an observation period, or to the event (churn) not having been observed for a customer during this period. This is an important concept because, if we ignore censorship, we risk introducing bias into our predictions: just because we haven’t observed a customer cancelling a subscription doesn’t mean they never will. More specifically, there are three types of censorship:

1. Right-censored data: When we know when a customer started the subscription, but don’t know when churn occurred (event end time):
• either because the customer record was withdrawn for reasons other than churn (e.g. a data entry issue) or,
• because the customer simply hasn’t churned yet at the time of the analysis.
2. Left-censored data: When the customer churn time (end time) is known, but we don’t know when they started the subscription:
• This may happen if a customer started the subscription before our observation period (e.g. when some customer data sits in an older database and hasn’t been migrated to the current database used for analysis).
3. Interval-censored data: When the relevant data is only collected at certain time intervals, so we know the interval in which the start or end fell but not the exact start and end times.
• For example, when we need daily data for churn analysis, but some customers’ info is only available at monthly granularity.
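
Right-censoring is the case you will meet most often in churn data, so here is a small sketch of how it is typically encoded: each customer becomes a pair of (duration, event observed). The cutoff date, the records and the helper name below are all hypothetical and only serve as an illustration.

```python
from datetime import date


def to_duration_and_event(start, end, cutoff):
    """Convert one subscription record into (duration_in_days, churn_observed)."""
    if end is not None and end <= cutoff:
        # Churn happened inside the observation period: the duration is exact.
        return (end - start).days, True
    # No churn seen by the cutoff: right-censored. All we know is that the
    # customer lasted AT LEAST this long.
    return (cutoff - start).days, False


cutoff = date(2023, 6, 30)  # hypothetical date on which the analysis is run
records = [
    (date(2023, 1, 1), date(2023, 3, 1)),   # churned
    (date(2023, 2, 15), None),              # still subscribed at the cutoff
    (date(2023, 5, 1), date(2023, 5, 31)),  # churned
]

rows = [to_duration_and_event(start, end, cutoff) for start, end in records]
print(rows)

# Naive approach: average tenure of churned customers only.
churned = [days for days, seen in rows if seen]
naive_mean = sum(churned) / len(churned)
print(naive_mean)
```

Here the naive average tenure of the churned customers is 44.5 days, even though the censored customer has already stayed 135 days and counting. Dropping that customer, or pretending they churned at the cutoff, is exactly the kind of bias described above.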

#### Use Cases

Survival analysis gives us the ability to analyse time-to-event data on a wide range of topics. In principle, we can apply it to any event of interest that happens over time, as long as we can define a clear start and end. Some of the common use cases include:

1. Medical Research: Survival analysis is frequently used in medical research to study the time to onset of disease or death. For example, a researcher might use survival analysis to study the survival time of patients with cancer after treatment or to study the time to progression of a disease.
2. Engineering: Survival analysis is used in engineering to help predict maintenance and time to failure. For example, a researcher might use survival analysis to study the time to failure of a mechanical component or the time to failure of a bridge.
3. Finance: Using survival analysis, we can predict the time to default of a borrower. For example, a lender might use survival analysis to study the probability of default of a loan portfolio.
4. Social Sciences: Survival analysis is used in social sciences to study the time to event for a range of outcomes. For example, a researcher might use survival analysis to study the time to first marriage or the time to unemployment for a group of people.
5. Marketing: Finally, survival analysis is used in marketing to analyse customer retention rates and churn. For example, a company might use survival analysis to study the time to churn of its customer base, to determine what factors influence churn, and to develop strategies to reduce churn rates. We will see this in action in a later post in this series.

Stay tuned!

##### Author: Martin Ding

Martin earned his Honours degree in Economics at the University of Melbourne in 2011. He has more than 7 years of experience in product development, both as an entrepreneur and as a project manager in robotics at an AI unicorn. Martin is expecting to receive his Master’s degree in Data Science from CU Boulder at the end of 2022. Martin is excited about data and its power to transform organizations. He has witnessed first-hand how instrumental data-driven decision making (DDDM) can be in building team buy-in and reaching more insightful decisions. Martin joined the Data School to systematically enhance his knowledge of the tools, methodologies and know-how of Data Analytics and DDDM. When not working, Martin enjoys reading, cooking, traveling and golf. He is also deeply interested in the practice of mindfulness and meditation.