Welcome to The Long View—where we peruse the news of the week and strip it to the essentials. Let’s work out what really matters.
20 Years of Neglect Led to ‘Meltdown’
Last month’s débâcle of canceled Southwest Airlines flights was caused by decades of technical debt. That’s the analysis of Columbia University professor Zeynep Tufekci.
Analysis: SWA needs a cloud burst
Although there were several contributing factors, a lack of scalability in a critical crew scheduling system led to days of near-total paralysis: In many cases, the staff were in the right place to fly and crew the planes, but the SkySolver system had no way of knowing that. Making things worse, manual fallbacks collapsed under the weight of the workload.
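To make that failure mode concrete, here is a toy sketch in Python. It is not how SkySolver works (its internals aren’t described in any of the reports quoted here); the `Crew` record, the `assign_crews` function and the airport data are all hypothetical. It only illustrates how a scheduler that trusts stale location records can leave a flight unstaffed even though a crew is standing at the gate.

```python
from dataclasses import dataclass


@dataclass
class Crew:
    name: str
    recorded_airport: str  # where the (hypothetical) system believes the crew is
    actual_airport: str    # where the crew really is


def assign_crews(flights, crews, use_actual=False):
    """Greedily assign one crew per flight, matching on airport code.

    flights: list of (flight_id, departure_airport) tuples.
    Returns a dict mapping flight_id to a crew name, or None if unstaffed.
    """
    available = list(crews)
    assignments = {}
    for flight_id, airport in flights:
        match = next(
            (c for c in available
             if (c.actual_airport if use_actual else c.recorded_airport) == airport),
            None,
        )
        if match is not None:
            available.remove(match)  # a crew can only staff one flight
        assignments[flight_id] = match.name if match else None
    return assignments


# Invented example data: crew_a was rerouted to MDW, but the record still says DEN.
crews = [
    Crew("crew_a", recorded_airport="DEN", actual_airport="MDW"),
    Crew("crew_b", recorded_airport="DAL", actual_airport="DAL"),
]
flights = [("F1", "MDW"), ("F2", "DAL")]

stale = assign_crews(flights, crews)                   # trusts stale records
fresh = assign_crews(flights, crews, use_actual=True)  # uses real locations
print(stale)
print(fresh)
```

With the stale records, flight F1 goes unstaffed; with accurate locations, both flights get a crew. In this toy model, every out-of-date record needs a manual correction before the match can succeed, which is the kind of workload that swamps a phone queue when thousands of records go stale at once.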
What’s the story? Prof. Zeynep Tufekci explains technical debt to a mainstream audience—“The Shameful Open Secret Behind Southwest’s Failure”:
“Technical debt is real debt”
It’s been an open secret within Southwest for some time … that the company desperately needed to modernize its scheduling systems. … This problem — relying on older or deficient software that needs updating — is known as incurring technical debt [and it] appears to be a key factor in why Southwest Airlines couldn’t return to business as usual the way other airlines did after last [month’s] major winter storm.
When hiccups or weather events happen, the employees have to go through a burdensome, arduous process … because Southwest hadn’t sufficiently modernized its crew-scheduling systems. For example, if … their flight was canceled … employees have had to manually call in to let the company know where they are [sometimes] being left on hold on the phone for … hours just to let the company know their whereabouts. … Online forums are full of employee accounts of such misery.
This can easily cascade to a systemwide halt. … Such breakdowns resulting from technical debt are often triggered by external events, like weather. … So why didn’t Southwest simply update its software and systems? … Updating software is costly and difficult. [But] we can’t just keep leaving the operation of more and more of our infrastructure and our lives to antiquated software. … Technical debt is real debt. It will eventually be paid by someone.
And Andrew Paul knows what to blame—“outdated software”:
“Multiple logistical collapses”
While many weary travelers are reportedly still waiting to reunite with their lost luggage, others continue to voice bewilderment at how such a logistical nightmare occurred within one of the nation’s most popular aviation providers. The answer: … employee scheduling software that debuted around the same time as the Xbox 360 and PlayStation 3. … Southwest pilots have reportedly begged company executives to update the “antiquated” systems since at least 2015.
Southwest’s long-running reliance on a crew scheduler program called SkySolver is largely to blame. … The nearly two-decade-old program couldn’t [scale] to tackle the multiple waves of cancellations and delays. This left actual Southwest employees to manually attempt matching flight crews with available planes.
Despite Southwest’s previous CEO Gary Kelly praising the airline’s “wonderful technology” in 2021, it has since experienced multiple logistical collapses due to its long-delayed adoption of … cloud-based data systems that can handle crises as large as winter storms. … Until companies like Southwest actually start investing in bringing their software into the modern age, travelers can continue to expect future headaches.
Hello, IT. Tobias Mann asks, “Have they tried turning it off and back on again?”:
Over the past year several airlines have announced sweeping upgrades to their IT infrastructure, in part to avoid pitfalls like the one Southwest is now digging its way out of. … Delta announced it would modernize and migrate its workloads to Amazon Web Services.
American Airlines went even farther announcing a similar partnership with Microsoft to migrate its data warehousing and legacy apps to a single operations hub on Azure [aiming to] speed up bag tracking, enable preemptive rerouting based on weather conditions, and simulate larger changes using digital twins. The collaboration also involved the development of a mobile application that allows airline staff to access applications and communicate from wherever they are.
But are we really talking about debt? jgwil2 tries to unpick and translate:
Technical debt could be better explained here. I think the issue is that very old software systems tend to have a lot of technical debt, which could lead to resistance to adding new features. In this instance, the tech debt seems to have led to a fragile system that could not be updated to handle more than baseline levels of traffic, forcing the company to resort to manual solutions when a crisis occurred.
Do we need more regulation? skogs thinks not:
I don’t think we need to press some stupid government oversight into this. … More useful than a thousand angry phone calls and a government writeup or fine is the fact that they literally made zero dollars during this busy time of the year. They lost money:
Aircraft that aren’t moving don’t make money.
Lost luggage deliveries.
This is industry correcting itself. You screw up big enough and it hurts your wallet. Ignore it and it hurts more later. Then Southwest simply goes out of business and firesales all their assets to a reasonable company that can … manage properly.
However, Glenn Ribotsky fundamentally disagrees:
No self-respecting economy can rely on just unregulated laissez-faire capitalism. … There need to be legal checks and balances on all aspects of economic behavior. Even if they might occasionally dampen those entrepreneurial animal spirits.
So it’s a management failing? SWA pilot Larry Lonero is not a fan:
“Two decades of neglect”
I’ve been a pilot for Southwest Airlines for over 35 years. … Unfortunately, the frontline employees have been watching this meltdown coming like a slow motion train wreck for some time. And we’ve been begging our leadership to make much needed changes.
We were watching in frustration and disbelief as our once amazing airline was becoming a house of cards. A half dozen small scale meltdowns occurred during the mid to late 2010’s. … We could see that the wheels were about ready to fall off the bus. But no one in leadership would heed our pleas.
[But] in early 2022, Bob Jordan was named CEO. He was a more operationally oriented leader. He replaced our Chief Operating Officer with a very smart man and they announced their priority would be to upgrade our airline’s technology and provide the frontline employees the operational tools we needed to care for our customers and employees. … But two decades of neglect takes several years to overcome. … This meltdown was not his failure but the failure of those before him.
Does that ring true? Yes, says u/Untgradd:
My father retired from SWA as a captain after ~30 years. … When I asked him, this was his take as well.
I’ve seen their bidding / scheduling software. If that was what the system looked like with lipstick on I am sure the backend is … a sight to see.
From the other side of the cockpit door, this Anonymous Coward agrees:
“Greedy, nasty, worthless bean counter”
Herb Kelleher built the best airline in the world bar none in the decades he ran SouthWest. … Then Herb retired and soon after this bean counter named Gary Kelly took over. … Once Herb died Kelly was soon treating the staff as badly as almost all the other big carriers. … Then he bailed last year and handed the big stinking mess on the point of collapse over to … the designated fall guy … Bob Jordan.
The people I feel sorriest for are the employees. … To see the company they dedicated their lives to, and for which they worked so hard for, utterly destroyed by some greedy, nasty, worthless bean counter.
Meanwhile, Local ID10T offers this neat summary:
Technical debt builds up because it costs money to dedicate developer time to fix the issues. So they get ignored. … Every business has a problem with technical debt. Most just get lucky and don’t get publicly smacked in the face with theirs.