Building an On-Call Culture That Doesn't Burn People Out
For engineering managers and CTOs who are tired of watching their best engineers leave because of a broken on-call process. It's time to treat your team's mental health like a first-class citizen of your architecture.
of engineers cite "unreasonable on-call expectations" as a primary reason for leaving a job.
Average loss in productivity per engineer per year due to sleep disruption.
Faster resolution times reported by teams with documented runbooks.
The Human Cost of Reactive Engineering
It starts with a notification at 2:14 AM. The pager goes off, waking the whole house. Your engineer is groggy, frustrated, and terrified of making a mistake. They spend the next hour debugging a timeout that "shouldn't happen."
By 6:00 AM, the incident is resolved, but the emotional cost has already been paid. When this happens three times a month for six months straight, you stop attracting great engineers and start losing the ones you have to startups with flexible hours.
On-call isn't just a technical burden; it's a psychological one. To fix it, you have to treat it as shift work, not as a badge of honor.
Six Principles for a Humane On-Call Program
Sleep Parity
Compensation for off-hours work should match or exceed day-shift rates. If a shift runs from 10 PM to 4 AM, that is a 6-hour shift that requires full alertness. Pay for the whole shift, not just the minutes spent actively working an incident.
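Here is a small sketch that makes the arithmetic concrete. It computes shift length across midnight, so 10 PM to 4 AM counts as 6 hours. It then pays every hour of that shift at no less than the day rate. The function names, the hourly rate, and the 1.5 multiplier in the usage example are hypothetical, not figures from the article.

```python
from datetime import date, datetime, time, timedelta


def shift_hours(start: time, end: time) -> float:
    """Length of a shift in hours, handling shifts that cross midnight."""
    anchor = date(2000, 1, 1)  # arbitrary reference day; only the difference matters
    s = datetime.combine(anchor, start)
    e = datetime.combine(anchor, end)
    if e <= s:
        e += timedelta(days=1)  # the shift ends on the following day
    return (e - s).total_seconds() / 3600


def shift_pay(start: time, end: time, day_hourly_rate: float, multiplier: float = 1.0) -> float:
    """Pay for the whole shift at the day rate times a multiplier of at least 1.0."""
    if multiplier < 1.0:
        raise ValueError("off-hours pay must match or exceed the day-shift rate")
    return shift_hours(start, end) * day_hourly_rate * multiplier
```

For example, `shift_pay(time(22, 0), time(4, 0), 50, 1.5)` pays for all 6 hours and returns 450.0, whether or not a page actually fired.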
The 15-Minute Rule
Every engineer on call should be able to respond to a page within 15 minutes. Only alerts that genuinely need a human within that window should page anyone. Anything that can wait is "observed" instead, and should wake someone up only if the situation escalates.
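One possible reading of this rule can be expressed as alert routing. An alert pages only if a human must act within 15 minutes. Otherwise it is observed, and it pages only once it escalates. The `Alert` fields and the `"page"`/`"observe"` labels are hypothetical and not taken from any particular alerting tool.

```python
from dataclasses import dataclass

PAGE_THRESHOLD_MIN = 15  # the 15-minute rule


@dataclass
class Alert:
    name: str
    needs_human_within_min: int  # hypothetical field: how soon a human must act
    escalated: bool = False      # set when an observed alert gets worse


def route(alert: Alert) -> str:
    """Page only for urgent or escalated alerts; observe everything else."""
    if alert.escalated or alert.needs_human_within_min <= PAGE_THRESHOLD_MIN:
        return "page"
    return "observe"
```

With this split, a disk that is slowly filling stays in the "observe" state overnight. A database that is down still wakes someone immediately.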
Rotation Equity
Avoid the "senior engineer" trap where the CTO or Principal Engineer always takes the hardest shifts. Rotate through a fixed schedule. Rotate people out of on-call every 3 months to prevent burnout.
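Here is a minimal sketch of a fixed rotation. It assumes weekly shifts and one hypothetical way to honor the 3-month break. The team is split into cohorts, and each 13-week quarter one cohort rotates off on-call entirely. Everyone else takes turns in a fixed round-robin order. `build_rotation`, `cohorts`, and the weekly granularity are all assumptions, not a prescribed scheme.

```python
from datetime import date, timedelta


def build_rotation(engineers, start: date, weeks: int, cohorts: int = 4, weeks_per_quarter: int = 13):
    """Return [(week_start, engineer)] for a fixed weekly round-robin.

    Engineers are assigned to cohorts by position (index % cohorts).
    During each quarter, one cohort is rotated off on-call for a break.
    """
    if cohorts < 2 or len(engineers) <= cohorts:
        raise ValueError("need at least 2 cohorts and more engineers than cohorts")
    schedule = []
    for week in range(weeks):
        resting = (week // weeks_per_quarter) % cohorts  # cohort on break this quarter
        eligible = [e for i, e in enumerate(engineers) if i % cohorts != resting]
        schedule.append((start + timedelta(weeks=week), eligible[week % len(eligible)]))
    return schedule
```

Because the order is fixed and computed, nobody (including the CTO) quietly absorbs the hardest shifts.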
Incident Boundaries
Set a hard stop time. If the incident isn't resolved by 6 AM, that is not a personal failure. The goal is to contain the damage, not to stay up all night fixing it. Once you're past your stop time, escalate to a senior lead.
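A hard stop is easy to encode so that nobody has to decide at 5 AM whether to keep going. The sketch below takes 6 AM as the stop time, matching the example above. It handles incidents that start after that day's stop time. It also assumes timestamps share a single time zone. The function names and action labels are hypothetical.

```python
from datetime import datetime, time, timedelta

STOP_TIME = time(6, 0)  # hypothetical policy value: the 6 AM hard stop


def hard_stop_for(incident_start: datetime, stop: time = STOP_TIME) -> datetime:
    """Return the first occurrence of the stop time after the incident began."""
    candidate = datetime.combine(incident_start.date(), stop)
    if candidate <= incident_start:
        candidate += timedelta(days=1)  # incident began after today's stop time
    return candidate


def next_action(incident_start: datetime, now: datetime, resolved: bool) -> str:
    if resolved:
        return "close"
    if now >= hard_stop_for(incident_start):
        return "escalate_to_senior_lead"  # past the hard stop: hand off
    return "contain"  # still inside the window: focus on limiting damage
```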
Cognitive Load Budget
A developer has a limited capacity for "thinking hard" in a day. Don't stack major feature launches on an engineer's on-call day, and protect the person on call from high-stakes meetings.
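A calendar check can enforce this budget before anything is booked. The sketch assumes two inputs: an on-call map from date to engineer, and events tagged with a kind. The `HIGH_STAKES` set, the `Event` shape, and the sample names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

HIGH_STAKES = {"feature_launch", "exec_review"}  # hypothetical event kinds


@dataclass(frozen=True)
class Event:
    day: date
    owner: str
    kind: str
    title: str


def conflicts(events, on_call_by_day):
    """Return high-stakes events owned by whoever is on call that day."""
    return [
        ev for ev in events
        if ev.kind in HIGH_STAKES and on_call_by_day.get(ev.day) == ev.owner
    ]
```

Running this check when events are scheduled, rather than on the day, gives time to move the launch or swap the shift.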
The "No" Culture
If you are on-call and you are sick, tired, or just want to go to dinner, you should feel safe saying no to a page. The system should be designed so that one person missing doesn't bring down the world.
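"One person missing doesn't bring down the world" usually means an escalation chain. If the primary declines or doesn't acknowledge in time, the page moves on. Here is a minimal sketch, assuming a hypothetical chain of primary, secondary, and engineering manager with made-up timeouts. For simplicity, a decline removes that person from the chain immediately, and the last person in the chain stays paged until someone responds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Responder:
    name: str
    ack_timeout_min: int  # minutes before the page moves to the next person


def current_target(chain, minutes_since_page, declined=frozenset()):
    """Return who should be paged right now, skipping anyone who declined."""
    available = [r for r in chain if r.name not in declined]
    if not available:
        raise RuntimeError("escalation chain exhausted")
    deadline = 0
    for responder in available:
        deadline += responder.ack_timeout_min
        if minutes_since_page < deadline:
            return responder.name
    return available[-1].name  # last resort stays paged


chain = [
    Responder("primary", 15),
    Responder("secondary", 15),
    Responder("eng-manager", 30),
]
```

Saying no then costs the primary nothing more than a click. The secondary is paged at once instead of after a timeout.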
Compensation and Time-Off Policies That Actually Work
Standard comp time policies often fail because they are vague. "Bank your hours" sounds nice until you realize you can't use them for anything but vacation.
The Fix: Implement "On-Call Bank Time."
- 1.5x Accrual: For every hour you spend on call, you earn 1.5 hours of time off.
- Flexible Use: You can use this time to leave work 2 hours early on a Friday, or take a half-day on a Tuesday.
- No Expiration: These hours roll over until you use them.
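The three rules above fit in a tiny ledger. It accrues at 1.5x, allows withdrawals of any size, and has no expiry logic at all. The class name and method names are hypothetical; this is a sketch of the policy's arithmetic, not a payroll system.

```python
class OnCallBank:
    """Banked time off: 1.5 hours earned per on-call hour, never expires."""

    ACCRUAL_RATE = 1.5

    def __init__(self) -> None:
        self.balance_hours = 0.0

    def record_on_call(self, hours: float) -> None:
        if hours < 0:
            raise ValueError("on-call hours cannot be negative")
        self.balance_hours += hours * self.ACCRUAL_RATE

    def use(self, hours: float) -> None:
        # Any increment is allowed: leave 2 hours early, take a half-day, etc.
        if hours <= 0:
            raise ValueError("must use a positive number of hours")
        if hours > self.balance_hours:
            raise ValueError("insufficient banked time")
        self.balance_hours -= hours
```

For example, 12 on-call hours bank 18 hours of time off. Leaving 2 hours early on Friday leaves 16. A Tuesday half-day of 4 hours leaves 12, which roll over indefinitely.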
Runbook Investment: How Good Documentation Reduces Stress
Nothing causes more anxiety than staring at a black screen of logs you've never seen before. The root cause of on-call panic is almost always "I don't know what to do."
A great runbook doesn't just list commands. It tells a story:
- What symptom are we seeing?
- Why does it happen?
- What is the step-by-step recovery procedure?
- Who do we escalate to if we are stuck?
Invest in your documentation infrastructure. If it takes you 20 minutes to find the right logs, you've already lost the battle against burnout.
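One way to keep runbooks from drifting back into bare command lists is to give them the four-part structure above as a schema. Incomplete runbooks can then be flagged automatically. Here is a sketch, assuming runbooks are indexed by alert name so the on-call engineer can find one instantly. The `Runbook` fields and the sample alert are hypothetical.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Runbook:
    alert: str
    symptom: str = ""                 # What symptom are we seeing?
    cause: str = ""                   # Why does it happen?
    recovery_steps: list = field(default_factory=list)  # Step-by-step recovery
    escalation_contact: str = ""      # Who do we escalate to if we are stuck?

    def missing_sections(self) -> list:
        """Names of story sections that are still empty."""
        return [f.name for f in fields(self) if f.name != "alert" and not getattr(self, f.name)]


def index_runbooks(runbooks):
    """Map alert name -> runbook, so the page can link straight to it."""
    return {rb.alert: rb for rb in runbooks}
```

A check like `missing_sections()` can run in CI, so a runbook with no escalation contact is caught in review rather than at 2 AM.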
Blameless Post-Mortems and Psychological Safety
The moment you blame an individual for an incident, you kill your culture. The goal of a post-mortem is to improve the system, not punish the person.
Start every post-mortem by acknowledging that the system failed, not the human. Use phrases like "The process broke" instead of "You didn't check the logs." When your team feels safe admitting mistakes, they will fix them before they cause outages.
On-Call Health Survey
Use this quarterly to gauge how your team is actually feeling.
Your team deserves better than chaos.
If you need help building a sustainable on-call program, we offer hands-on Team Training Workshops. We audit your current practices, train your engineers, and set up the runbooks that keep everyone sane.
Written by Sarah Jenkins
Sarah is the Co-Founder of LogFlow and has been managing on-call rotations for high-traffic e-commerce platforms for over a decade. She is a passionate advocate for developer wellbeing and believes that a good night's sleep is a critical infrastructure component.