Lifelines for Survival Analysis — ELI5
Imagine you buy a pack of lightbulbs and want to know how long each one will last. You screw them all in at the same time and wait. Some burn out after a month. Some last a year. Some are still going when you get bored and stop watching.
That “how long until something happens” question is called survival analysis, and it shows up everywhere:
- How long do patients live after starting a new treatment?
- How many months before a customer cancels their subscription?
- How many miles before a car part breaks?
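The tricky part is the lightbulbs that were still working when you stopped watching. You do not know when they will eventually burn out. You only know they lasted at least this long. Throwing those bulbs away wastes information and makes lifetimes look shorter than they really are, because the longest-lasting bulbs are the ones you drop. Statisticians call these incomplete observations "censored," and survival analysis is designed to include them.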
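To make that idea concrete, here is a small hand-rolled sketch of the Kaplan-Meier estimate, one common way to include unfinished observations, in plain Python. The function name `kaplan_meier` and the six lightbulb lifetimes are invented for illustration, and this is not how Lifelines is implemented internally.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time a failure was seen."""
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        # Bulbs still being watched just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # Bulbs that actually burned out at time t
        failures = sum(1 for d, o in zip(durations, observed) if d == t and o)
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
    return curve


# Made-up data: months each bulb was watched, and whether it burned out.
# The last two bulbs were still glowing when we stopped at month 12.
durations = [1, 3, 6, 12, 12, 12]
observed = [True, True, True, True, False, False]

curve = kaplan_meier(durations, observed)
km_past_6 = dict(curve)[6]  # about 0.5

# Naive approach: drop the unfinished bulbs, then count the share lasting > 6 months
finished = [d for d, o in zip(durations, observed) if o]
naive_past_6 = sum(d > 6 for d in finished) / len(finished)  # 0.25

print(f"Kaplan-Meier: {km_past_6:.2f}, naive: {naive_past_6:.2f}")
```

With this toy data, dropping the unfinished bulbs suggests only a quarter last past six months, while keeping them gives about a half.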
Lifelines is a Python library built for exactly this kind of question. You give it a list of times (how long each lightbulb lasted) and a list of flags (did it burn out, or was it still working?). It draws a curve that shows the probability of surviving past any given time.
For a hospital, that curve might show: “80% of patients are still alive after one year, 60% after two years, 35% after five years.” For a streaming service, it might show: “Half of new subscribers cancel within the first three months.”
Lifelines also compares groups. Did patients who took Drug A survive longer than those who took Drug B? Did premium customers stick around longer than free-tier users? It answers these questions with formal statistical tests, such as the log-rank test, rather than guesswork.
The one thing to remember: Lifelines helps Python answer “how long until something happens?” — even when you have incomplete data about things that have not happened yet.
See Also
- Python Statsmodels Regression: How Python draws the best-fit line through messy data and tells you whether to trust it.