Ask HN: What's Happening at Southwest?
76 points by JimmyL on Dec 29, 2022 | 102 comments
The media narrative is that there's a massive software failure going on in its scheduling system, and they need to stop flying for a day or two to "reset" it. Anyone have first-hand experience on their systems about what that really means?



From what I know... Not affiliated at all with the airline industry, but I fly a lot and tend to nerd out on this stuff...

Their crew (and flight) scheduling software works, more or less, by simulating a "perfect" day of operations: airplanes take off on time, land, and continue on. If anything disrupts this simulation, crew members have to call in and talk to someone who updates the computer system to tell it that the airplane and crew members are not where the system thought they were.

Once the call center got overwhelmed, it was a cascading failure, with Southwest quickly losing track of where most of its flight crew were at any given moment. It appears they feel the only way to solve this is to get everything (planes and crews) back into "starting position" and restart the simulation.
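A toy model of the cascade described above. It assumes, hypothetically, that every disruption leaves one crew whose position can only be corrected by a manual phone update, and that the call center clears a fixed number of updates per hour. The function name and numbers are illustrative and do not describe Southwest's actual system:

```python
def unaccounted_crews(disruptions_per_hour, calls_handled_per_hour, hours):
    """Hour-by-hour count of crews whose real location the scheduler does not know.

    Hypothetical model: each disruption moves one crew away from where the
    "perfect day" plan expects it, and only a phone update corrects the record.
    """
    backlog = 0  # crews the system currently has in the wrong place
    history = []
    for _ in range(hours):
        backlog += disruptions_per_hour                  # new mismatches this hour
        backlog -= min(backlog, calls_handled_per_hour)  # updates the call center clears
        history.append(backlog)
    return history


print(unaccounted_crews(30, 40, 4))  # capacity exceeds demand: backlog stays at zero
print(unaccounted_crews(50, 40, 4))  # demand exceeds capacity: backlog grows every hour
```

Once disruptions outpace updates, the backlog never drains on its own, which is one way to read the "reset": stop adding disruptions until the records catch up.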


In this age of more or less fully networked devices, this kind of setup seems archaic in the extreme, likely some hangover from the 70s or 80s.


Airline systems are some of the oldest computing architectures in use.

The Sabre reservation system was created by IBM for American Airlines in the 1950s — when a computer filled an entire building floor and was built out of vacuum tubes — and remains actively used today by multiple airlines. The programming language has been switched several times over the past 60+ years, but essential compatibility remains.


I can't remember where I read this and I can't find a cite right away, but I bet someone can confirm or correct me: the 6-letter reservation code for your flight? That used to be an explicit memory pointer in old, old Sabre.


The IBM 1301 disk manual (A22-6785), for the 7090 computer that SABRE was first used with, seems to have a record address scheme that matches (see page 10, Record Address): http://bitsavers.org/pdf/ibm/7090/A22-6785_1301_1302_on_709x...


I really hope that’s true, it’s hilarious either way!


Indeed. Probably the only equally complex and old dynamic computing systems out there are railroad and federal economic ones.


Apparently Southwest is one of the biggest offenders in the airline industry when it comes to not investing in their IT infrastructure. You can only get away with that sort of debt for so long.


I knew someone who was working on this. My understanding was that technology infrastructure was a huge blocker to their plans for international flights. The scheduling system required many hours of downtime every night to process: it effectively ran from the moment the last airport on the west coast closed until the start of operations on the east coast the next day, and that window was just getting longer.


If that's true, and I have no reason to doubt it, the blame will fall squarely on the team who were asking for IT investment, and not the leadership who said no to them.


If IT is smart, there will be a paper trail to prove these asks were made/denied.



>You can only get away with that sort of debt for so long.

Kevlin Henney has recently suggested we should stop calling it technical debt, and start calling it technical neglect instead.

Managed debt is a useful thing; if you ignore debt, you end up out of business.


I have a relative who worked in the IT department at SW. I can't remember the details, but he said some main parts of their scheduling software were written in the '60s (or maybe even the '50s), and it is still running.

Edit: OK, reading the comments jogged my memory. It was something to do with the 'Sabre' system, which dates from the 1950s.


This is true but also true for AA and several other airlines, for reservations and more but not crew scheduling. Sabre is old, huge, and complicated, but is not at fault here.

They run on a specialized version of Sabre that doesn't work the same as the other airlines on Sabre (AA, etc.) which is one reason you don't get SWA flights in Orbitz, etc.


Maybe it even runs on mainframe systems that have been around since the 80s.


The airline industry is one of the most competitive in the country. You succeed by ruthlessly cutting costs and offering tickets for $5 less than your competitors. And that’s how we end up with software from the 80s.


I’m not sure that’s the case, although we heard that thirdhand from Reddit.

https://blog.geaerospace.com/technology/big-wins-in-flight-e...

Skysolver is a GE Flight Services trademark; there’s a video here showing how it works, with SW planes in it. Contrary to the Reddit claim, it does appear to use a predictive algorithm.

Highlight quote from the video: “It is humanly impossible when there’s a major disruption for somebody to figure out what the optimal approach is to get them back on schedule”

Edit: I could see tracking off-duty crew being difficult and done by phone - employees don’t carry beacons and would you want that surveillance from your company? In this situation many crew could be home for the holidays and far from their last known location, and stranded due to problems with other airlines, trains, or roads.


Via reddit/r/bestof https://old.reddit.com/r/flying/comments/zw5lsl/southwest_pi...

"So the storm came and it impacted ground ops so bad that many many crews were now “unaccounted” for and the system in place couldn’t keep up. Then it happened for several more days. By Xmas evening the CS department had essentially reached the inability to do anything but simple, one off assignments. And to make matters worse, the phone system was updated not too long ago and it was not working well."

"I used to work for a large company trying to fill the void [huge gap in the market for good aviation scheduling software], and our software was damn good too. SW was one of the airlines interested, we would demo it exactly like the scenario today, but it was "too expensive" and they stayed on their homebuilt stuff."


> "I used to work for a large company trying to fill the void [huge gap in the market for good aviation scheduling software], and our software was damn good too. SW was one of the airlines interested, we would demo it exactly like the scenario today, but it was "too expensive" and they stayed on their homebuilt stuff."

This is a common situation in any industry. A company builds a solution in-house in its infancy to support itself, and as it grows, it comes to rely very heavily on this software. At a certain point, everyone involved with the software (from the developers to the users to the customers) knows that it is not on par with third-party software developed by a team dedicated to all of its potential use cases and pitfalls, but it is very difficult for the business to replace it because the ongoing cost is a relatively low maintenance cost, and the replacement cost is a one-time, relatively high purchase fee (with, usually, a similar if not higher ongoing maintenance cost). So the immediate reaction is to stay with the low ongoing cost, even though everyone "knows" that the long-term benefits of the third-party solution far outweigh the financial savings of keeping the in-house solution, whether those benefits might be avoiding a national week-long service outage or something simpler, like the ability for staff to get more done in other areas of the company that need attention as it grows larger.

I have been there. It can be very difficult to make the financial case when the future benefits are somewhat speculative or intangible. Accountants tend to devalue those benefits, relative to the hard numbers they already have. It does not help that system replacement projects often go over budget and introduce other problems that cost more money. Replacing home-grown systems is hard because they are not just software replacements; they typically also require reworking business processes.


It can be cost-driven, but it's just as likely to be complexity-gated (w/ cost frequently being implementation driven and a proxy for complexity). The cost comes from actually unwinding a gnarly legacy system (that may be less than well understood) and making the successor system work. Also, Gall's law has entered the chat:

A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.


For folks interested in legacy software systems (and managing the teams managing these systems), I highly recommend Marianne Bellotti's "Kill It with Fire: Manage Aging Computer Systems (and Future Proof Modern Ones)". The author worked at the United States Digital Service.

https://nostarch.com/kill-it-fire


You just described the last 30 years of the US phone network.


> It does not help that system replacement projects often go over budget and introduce other problems that cost more money.

If justifying speculative future benefits to the CFO is the first strike against these sorts of projects, this is the second strike — the operations team has likely been burned by failed IT modernization projects before. How hard are you going to fight for something you aren’t 100% sure is even going to work?


> How hard are you going to fight for something you aren’t 100% sure is even going to work?

Tangent: maybe I'm just tired of this industry, but I have trouble imagining myself "fighting" to convince a CxO to do anything that benefits the company but not me.

My attitude is like, here's my advice and reasoning. Listen to me or don't. I'm not going to participate in some budgetary cage match just because the CxO likes drama.

I wonder if this shows my age :)


> and the replacement cost is a one-time, relatively high purchase fee (with, usually, a similar if not higher ongoing maintenance cost)

I'd wager complex scheduling software is probably licensed, not sold outright: many, many millions a year to license.

But yeah, a lot of good factors you cite.


> even though everyone "knows" that the long-term benefits of the third-party solution far outweigh the financial savings of keeping the in-house solution, whether those benefits might be avoiding a national week-long service outage

Although it's also a regular occurrence that the replacement of an old 'just works' system brings about a week of downtime, mass disruption and still basic features missing.


The existing solution needn't even be home built. I worked for a company that replaced one ERP system with another. It was a medical manufacturer, so in many cases the new system had to warp to the established government mandated business practices. Over time and over budget, you bet - they weren't finished with the transition when I left.


One big problem with this kind of upgrade is that few people in the trenches get to provide feedback or recommendations. You're lucky if even some managers in the trenches get consulted.


Basically, what I read was that the former CEO was trained as an accountant, focused on stock price and financials, and ignored system upgrades for about a decade. That, coupled with their decentralized way of operating (no hubs or star pattern), meant the recent storms caused their systems to not “know” where people or planes were, even though the people knew where they were. That knowledge is critical because flight hours for pilots and airplanes drive legally mandated rest periods, maintenance, and logistics.

Having personally experienced the delays and issues with SW earlier this year I can suggest not flying them other than direct flights. Them not having any hubs means that if one plane is late or a crew times out because of delays, you are stranded. We got stranded at one airport overnight and got routed through 2 other airports, changing planes at one, before reaching our destination. They have no spare crews or planes on standby at any airport, probably because of the former CEO's leadership and pandemic complications.


Ah, so bean counting treated IT as a cost center and cut CapEx to the bone. Running things with no reserve capacity, crew-wise and system-wise, is just asking for a catastrophic, fall-off-the-cliff shutdown.

Edit: simple queueing theory can explain why no reserve capacity leads to system breakdown.
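The queueing point can be made concrete with the textbook M/M/1 queue (one server, random arrivals and service times). This is a standard formula swapped in for illustration, not a model of any airline's operation; "arrival_rate" could stand for disruptions needing rework and "service_rate" for how fast schedulers clear them:

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state M/M/1 queue: utilization, mean number in system, mean time in system."""
    if arrival_rate >= service_rate:
        # With no spare capacity there is no steady state: the queue grows without bound.
        raise ValueError("arrival_rate must be below service_rate")
    rho = arrival_rate / service_rate                        # utilization
    mean_in_system = rho / (1 - rho)                         # L = rho / (1 - rho)
    mean_time_in_system = 1 / (service_rate - arrival_rate)  # W = 1 / (mu - lambda)
    return rho, mean_in_system, mean_time_in_system


for utilization in (0.5, 0.9, 0.99):
    rho, n, hours = mm1_metrics(utilization * 10.0, 10.0)
    print(f"utilization {rho:.2f}: about {n:.0f} jobs in system, {hours:.1f} h each")
```

At a service rate of 10 per hour, the mean time in the system goes from 0.2 hours at 50% utilization to 1 hour at 90% and about 10 hours at 99%. The blow-up near full utilization is why running with no reserve capacity is fragile.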


Yup. For me they had brand prominence because of various customer friendly things like bags fly free, sit where you want, etc. It will take a mountain of good reports from other people over a significant period of time for me to want to fly with them.


This is the first time I’ve ever seen the plane-boarding hunger games of sit-where-you-like referred to as a customer-friendly thing.


People either love it or hate it.

Personally, having flown with them for business for 5 or so years, I generally preferred it to assigned seating.


Yes, I like it when flying alone. On other airlines you get to choose where your seat is. On Southwest you get to choose who you sit beside.


I'll add another data point of strongly preferring it. Check in early so you can board early so you can get a good seat. I'm fairly punctual and this system rewards me for that without needing to open my wallet, so I'm a fan.

That said I agree with the grandparent comment about avoiding them for the foreseeable future based on this fiasco. Hopefully they get it together at some point.


Absolutely love it, choice of window or aisle seat is guaranteed if you’re not obsessed with being the very first person off the plane. Also in practice Southwest’s system is the fastest way to board.


They also used to be among the cheapest flights available; now they are among the most expensive, and their little extras don't cover the difference.


The question is, will they still come out ahead even accounting for the catastrophic shutdown? Maybe once a year is fine.


Another question is whether executive bonuses earned on a stock price lifted by cost cutting should be clawed back when that cost cutting leads to a catastrophic shutdown due to deferred maintenance.


Not sure it's as cut and dry as "CFO becomes CEO and overly focuses on financials" in Southwest's case.

In 2021, Gary Kelly (former CFO/CEO) announced his retirement and picked Bob Jordan (CS trained, having previously worked on the AirTran integration) to take over as CEO, which seems a realization that Southwest has some serious technical debt to address.

All businesses have highest priorities at a given point in time.

In Southwest's early years, the priorities were legal and operational, hence Herb Kelleher.

From 2000 to 2020, I can't say financials weren't top of the list, as Southwest migrated off its older plane models.

2020+, maybe they're technical.

A good company picks the CEO it needs for the moment, not always one particular type.


But the Southwest pilots' union is accusing the former CEO, Gary Kelly, of running the company at full speed into operational suicide. I don't see that as picking the right CEO for the moment, even if the stock price did well.


The pilots' union doesn't have a monopoly on truth. They would always like to be paid more and work less, as would I!

Are they accurate or inaccurate? I can't say.

If they were unionized, I'm sure the Twitter employees' union would be crying bloody murder right now. But it's fair to say that organizationally Twitter had a lot of headcount for their product. And when you tighten a financial belt, employees are going to be unhappy.


Classic Jack Welch move. Get out before the shit you created hits the fan and leave the next guy to deal with it.


> Having personally experienced the delays and issues with SW earlier this year I can suggest not flying them other than direct flights

And it seems to me they've cut direct flights -- still plentiful on many of the hops (say, LAS <-> LAX) but scarcer on longer routes (say SLC <-> LAX).

And on the shorter routes, driving is roughly competitive time-wise...


What's being skipped over in a lot of reporting is that Southwest has had serious issues keeping its ground operations staffed, and has been struggling with that the entire pandemic because they refuse to pay market rate. This means that planes can't be unloaded and turned around in any reasonable amount of time. I gave up completely on Southwest after I was stuck on a plane for 4 hours this summer because of a lack of ground crew to unload the planes, which left planes stuck on the tarmac.

Add to that the latest technology woes, and the storm was just the thing that tipped over a company that had cut operational capacity to the bone.


A few days ago they declared a "state of emergency" in a few airports, which they're pretty transparently using to squeeze staff.

I don't see any non-bogus reason to, for example, stop accepting doctor's letters from telehealth visits only in cities where they're particularly short staffed.


From what I've gathered, Southwest effectively performs Just-In-Time resource management.

They arrange flights and crews so that the right number of planes and people are in the right places at the right times.

There's some tolerance in the system. So if the plane from New York to Cincinnati is late, it's ok-ish. The flight from Cincinnati to Dallas should be able to make it in time if things aren't too bad. Then the flight from Dallas to Phoenix should take off when it should. The Phoenix to Las Vegas flight will never know there was a problem.

It also matters for crew. Pilots can only fly for so many hours, so if you have someone stuck in a holding pattern, that cuts into their hours.

However, if that plane from New York to Cincinnati shits the bed, it'll fuck over Cincinnati to Dallas, Dallas to Phoenix, and Phoenix to Las Vegas. The failure just cascades. You lose planes, you lose crews, nothing is matching up and everything is fucked.

Now imagine this happens a few hundred times. Thousands of flights are affected.

Other airlines don't have this problem because they can just not do a flight. They fly people into a hub, then out of a hub. Delta will go from New York to Atlanta and back again, Cincinnati to Atlanta and back again. They work more like buses: miss a bus, catch the next one. So if you crap out a day's worth of flights, you can still put those people on planes and get them out. You know they're either at the hub or on their spoke, so if they're not in Atlanta, they're in their city.
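The slack-and-cascade behavior described above can be sketched as one aircraft and one crew working a chain of legs. The leg times, slack, and 10-hour duty cap are made-up numbers (real duty limits are set by regulation and are more complicated), there is assumed to be no reserve crew, and the Leg class and propagate_delay function are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Leg:
    origin: str
    dest: str
    block_hours: float  # scheduled flying time
    slack_hours: float  # scheduled ground buffer before this leg departs


def propagate_delay(legs, initial_delay, max_duty_hours):
    """Push a delay down one aircraft/crew's chain of legs (toy model)."""
    delay = initial_delay
    scheduled_elapsed = 0.0  # duty time the plan expects by the end of each leg
    timed_out = False
    outcome = []
    for leg in legs:
        route = f"{leg.origin}-{leg.dest}"
        if timed_out:
            outcome.append((route, "cancelled"))  # no legal crew left for later legs
            continue
        delay = max(0.0, delay - leg.slack_hours)  # ground slack absorbs some delay
        scheduled_elapsed += leg.slack_hours + leg.block_hours
        if scheduled_elapsed + delay > max_duty_hours:
            timed_out = True  # crew would exceed its duty cap
            outcome.append((route, "cancelled"))
        else:
            outcome.append((route, f"delayed {delay:.1f}h" if delay > 0 else "on time"))
    return outcome


legs = [
    Leg("New York", "Cincinnati", 2.0, 0.5),
    Leg("Cincinnati", "Dallas", 2.5, 0.5),
    Leg("Dallas", "Phoenix", 2.0, 0.5),
    Leg("Phoenix", "Las Vegas", 1.0, 0.5),
]
print(propagate_delay(legs, 1.0, 10.0))  # small delay: absorbed after the first leg
print(propagate_delay(legs, 5.0, 10.0))  # big delay: crew times out, later legs cancelled
```

With a one-hour delay, Phoenix to Las Vegas never notices. With a five-hour delay, the crew runs out of duty time at Dallas and everything after it falls over, which is the cascade in miniature.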


In HN terms: imagine your assets are encoded in an array or list. Movement is represented by shifting elements right or left.

Most airlines are hub and spoke. Elements shift right (planes leave the hub), then shift left (return). Sometimes they shift two places right. If something disrupts this, like a plane stuck in [1] when it needs to be in [0], it's generally recoverable when the event clears, or you can use one of those two-step moves in another flight to pick up stranded passengers on your way back to the starting position [0].

Southwest operates a point-to-point model. Assets start at [0] and have to traverse every point on the list through to [N] to succeed. N can be 3, 4, or higher. You can imagine what happens if a disruption happens in the middle of this list: everybody upstream of the break is left planeless, everybody downstream who wants to go up can't make it further than the airport before the break, and everybody at the breakpoint is miserable.

So Southwest then falls back to a new scheduling model for which their system was not designed. It's like having a graph traversal system and trying to get it to solve matrix equations. Yes, there are some similarities, but it's not really the same.
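A toy version of the array analogy, assuming one aircraft's day is a list of stations and that spare aircraft exist only at the stations passed in. The function is hypothetical; it counts the legs that lose their aircraft before a spare can take over:

```python
def unrecoverable_legs(rotation, stuck_at, stations_with_spares):
    """Legs that lose their aircraft when it is stuck at rotation[stuck_at].

    rotation: ordered stations one aircraft is scheduled to visit today.
    stations_with_spares: stations where a spare aircraft could take over.
    """
    lost = []
    for origin, dest in zip(rotation[stuck_at:], rotation[stuck_at + 1:]):
        if origin in stations_with_spares:
            break  # a spare picks up the rest of the rotation from here
        lost.append((origin, dest))
    return lost


# Point to point: the aircraft hops A -> B -> C -> D -> E with no spares anywhere.
print(unrecoverable_legs(["A", "B", "C", "D", "E"], 1, set()))
# Hub and spoke: every other stop is the hub, where spares wait.
print(unrecoverable_legs(["HUB", "X", "HUB", "Y", "HUB"], 1, {"HUB"}))
```

The first call loses every leg after B (three legs); the second loses only X to HUB, because a spare at the hub covers the rest of the day.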


Fun fact I found in an NYT article and on LinkedIn: the current CEO, Bob Jordan, started at Southwest in 1988 as a computer programmer. From 2006 to 2008 he was EVP of Strategy and Technology.


FWIW, a Southwest pilot with 35 years at the company placed most of the blame on the previous CEO's lack of investment in operations in pursuit of shareholder profits.

The pilot gave the current CEO Bob Jordan (who started in early 2022) a vote of confidence and stated that he has strongly signaled his intention to improve the state of SW's systems, but obviously was unable to do so before this meltdown.

Source: https://www.reddit.com/r/SouthwestAirlines/comments/zxg6op/t...


Someone in charge of an org that large, less than 10 months in, getting the blame for over a decade of skipped maintenance.... pretty unfair.


A related question: can Southwest ever fully recover from this? Their reputation has taken such a hit that many loyal customers will hesitate to fly with them again, and even more fair-weather customers will permanently cross Southwest off their list of carriers to consider.

Their stock price is only down ~10% since the meltdown, which seems extremely optimistic to me.


Yes. I'm a very frequent flyer, and used to be far more frequent. No one, and I mean no one, flies Southwest because it's their choice. "Southwest is the greyhound of the sky" is a well known phrase. You fly Southwest because it's slightly better than being cramped on 3 legs of a 50-60 seat commuter jet. Your home airport dictates which airline you take. Unless another airline steps up, takes some of the routes, and provides better planes, you're stuck with SWA.


> No one, and I mean no one, flies Southwest because it's their choice.

I wholeheartedly disagree. I don't fly SW except to save money, but everyone else I've spoken to with an opinion on airlines loves to say how much they love the carrier.

Coworkers and I went to look at the sea of unclaimed bags yesterday. I mentioned that SW drastically underpays ground crews and has flight crews sleeping in crew rooms in airports, and they said they'd repeatedly heard how much flight crews enjoy working for the company.

I really don't understand the appeal, but it's definitely out there.


I would rank SW roughly in the middle:

Alaska, Hawaiian, [Delta, American, United, Southwest] (rough tie here), Frontier, Spirit

It depends on what you value. If you always fly first class, of course SW is worse. SW does offer a lot of direct flights that no other airlines offer. For me saving several hours in not changing planes is worth a slightly worse flight experience.

Also SW doesn't gouge you when cancelling or changing flights, unlike all the others.


Are you on one of the coasts?

In the Midwest I know a ton of business travelers (>1 trip/month) who default to Southwest. The weekly travelers tend to prefer the legacy carriers if they're available (so I'm sure there's something to your point even here) but IME Southwest rules the monthly/short notice flyers around me.


Weekly travelers prefer legacy carriers because they get upgraded to first class, get special phone numbers for customer service and rebooking, board first, etc.


I can anecdotally back this statement up via experience with frequent flyers based in Nashville.


Home airport is STL, I normally fly to the coasts.


> No one, and I mean no one, flies Southwest because it's their choice.

Everyone in my family preferred to fly Southwest (at least before this round of incidents). Among other things, this was based on a history of better customer service, especially in extenuating circumstances. They've been at the top of, e.g., the J.D. Power customer satisfaction rankings for years.

If you're trying to say no one flies economy because it's their choice... I could certainly afford not to, but I _choose_ otherwise.


My boss likes seeing Southwest on my expense report because it means I’m saving money for the company.

I also enjoy the extra legroom on Southwest.


> "Southwest is the greyhound of the sky" is a well known phrase.

This had to be before Spirit and Frontier came onto the scene.


In various parts of the US there are sketchy bus companies with unpredictable and unreliable service that are cheaper than Greyhound. One example is the buses between New York and nearby large cities. They tend to accumulate accident records, close, and reopen under a new name. Boarding involves standing in a scrum they vaguely measure.


> Their reputation had taken such a hit that many loyal customers will hesitate to fly with them again

Zero damage. This will all be forgotten in a matter of weeks. People will continue flying with whoever has the best rates.


Fully? No. Mostly, yes. They've definitely taken a hit, but most fliers have pretty short memories. Remember when this happened to Delta a few years ago? Most of the alternatives to SW are nearly as bad. In a year nearly everyone will be back to just choosing the cheapest flight.


People fly Spirit and Frontier all the time. The only thing that matters to many, many flyers is price. The airlines realize that, so they all provide shit service; then people come to expect shit service, and the race to the lowest price continues.


There are plenty of PR firms that will help them get out of this.


Everyone here is at least a little misinformed regarding their software and its simulation abilities. They have control systems with the data they need, and this data is fed into the flight scheduling & monitoring UI. Their tech works; it just gets bashed on because it's different from a lot of other airlines. Used properly, it can account for these things.

What causes things like this is overbooking. Overbooking happens in more than just butts in seats. If you're not going to have enough ground crew to handle your SLA on flights, flights should be preemptively cancelled to keep them under that SLA. Airlines keep that SLA as low as safety permits. The max is set by the FAA at 3 hours for domestic flights and 4 hours for international flights (both ends).

Southwest's software is doing its job; it has adjustable tolerances. It will even take into account weather conditions reducing ground crew, and therefore raising the SLA. But the effect of the weather conditions on the ground crew is also adjustable by humans. As a matter of fact, while one would think it would just reference historical data, that's not exactly true. It references predictive models that are adjusted by experienced individuals. Those individuals can be ordered to adjust the parameters outside their honest assessment to allow for steps at the beginning of the process to operate smoothly.

Weather prediction has a horizon. Southwest allowed excess bookings beyond that horizon, or they allowed excess last minute bookings. This weather event was massive and one-sided, yes, but it was also completely predictable. A human made the decision to widen the guardrails. There aren't a ton of people allowed to do that, I can think of maybe 3.

Disclosure: I wrote software for SWA many years ago. Nothing I've said here is privileged, in fact most airlines operate this exact way.
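A rough sketch of the preemptive-cancellation idea, assuming ground turns are handled at a fixed hourly rate that a human-set weather factor scales down. The function, its parameters, and the simple "everything over capacity waits in line" approximation are illustrative assumptions, not SWA's model. It shows why the weather factor is the lever: setting it optimistically shrinks the number of cancellations the system recommends.

```python
import math


def turns_to_cancel(scheduled_turns, full_rate_per_hour, weather_factor,
                    window_hours, max_wait_hours):
    """How many aircraft turns to cancel up front so no turn waits past the limit.

    weather_factor: human-tuned fraction of normal ground-crew throughput
    (1.0 = normal conditions). Simple fluid approximation: turns over capacity
    queue up and are worked off at the effective rate.
    """
    effective_rate = full_rate_per_hour * weather_factor
    # Turns that can be finished within the window plus the allowed wait
    can_handle = math.floor(effective_rate * (window_hours + max_wait_hours))
    return max(0, scheduled_turns - can_handle)


print(turns_to_cancel(100, 10.0, 0.5, 8, 3))   # honest storm estimate
print(turns_to_cancel(100, 10.0, 0.75, 8, 3))  # optimistic estimate: fewer cancels
```

With these made-up numbers, the honest estimate calls for 45 cancellations up front and the optimistic one for 18; the other 27 turns show up later as tarmac delays instead.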


> Nothing I've said here is privileged, in fact most airlines operate this exact way.

I consulted for a few large carriers and this was my conclusion also. In fact most very large American enterprises have all the same issues. With airlines the problems simply become really visible all at once really quickly when things go wrong.

For the folks here on HN whose experience is mainly in Silicon Valley: it's hard to appreciate how little the execs at these companies care about software. They don't care, not even a little bit, and they definitely don't care what your opinions are. Their priority is growth, the stock price, and answering to the board (not necessarily in that order). The only carrier I heard about that cared for their IT was Continental and they were bought by United.

Compounding the issue is how staffing works at these companies. Being a full-time employee at an airline is considered attractive because you get benefits including cheap airline tickets. My understanding is that the airline ticket perk used to be much more awesome than it is today.

In any case, airlines bend over backwards not to have full-time employees. Depending on the company they have a vast army of contractors (on shore and off shore) who are on a revolving door policy lasting from 6 to 18 months. These folks come in, get trained up for a few months, turn around tickets in a grueling and dehumanizing environment, then they get to take a hike for a while until they can come back on another short-term stint. I had a colleague who had been full-time at SWA, he said his job literally only consisted of training people up on the systems, he rarely wrote much code himself, he was there to 'keep it all together.'

But honestly, the current crisis is not a surprise. The IT systems at big American enterprises are truly horrific. It is decades of homegrown software "integrated" with decades of acquisitions where systems are smashed together on short timelines in service of quarterly goals.

If you want to find the true culprit here, look up at the broader structure of the economic system. This mess is created by how we run the economy.


> A human made the decision to widen the guardrails. There aren't a ton of people allowed to do that…

Assuming this is the case, where folks can override during an incident like this, it sounds like the simulation tooling doesn’t have a human-in-the-loop interface to demonstrate the potential effects of an override.

Does that type of simulation interface exist in the industry? And would it have helped?


I don't think the reset is around resetting the software logic exactly, but rather that things have gotten so far outside of what they should be they no longer know where planes are, where crews that can find those planes are, and how many people even are still planning on using Southwest to fly to their intended destination.

They need a "reset" to spend a few days inputting as much of that data as possible into their computer system, at which point they can start operating with some level of efficiency and start functioning again.


Truth is even they won't really know until they have time to do a post mortem.

Some observations, though.

- Via departure control, manifests, etc., they DID know where their crews went. Any employee flying non-rev, deadhead, or on duty is recorded by employee number. There was likely a problem keeping that synced with the crew management platform.

- Large parts of this have little to do with technology. Point to point versus hub and spoke means less slack. And recently, SWA has removed even more slack trying to drive revenue with aggressively optimistic schedules, overbooking, tight crew slack, etc. If they didn't pull back the schedule in small, manageable pieces like the other airlines did, there's no system that would save them. Seems likely they didn't pull back enough before the storm. There's a brink you can't go past, and a rate of cancels you shouldn't exceed. You have to guess right earlier.

- Crew unions have historically negotiated out anything that looks like Big Brother, location tracking for example... even an on-demand "I'm here". So lots of things that could have made this better didn't exist, on purpose.

- A fair amount of the reset is just practicalities of using planes empty of pax to get rested crews staged in the right cities and bags to the right places.
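The point about keeping manifests synced with the crew platform can be sketched as a reconciliation step. The record layout (timestamp, employee number, arrival station), the reconcile function, and the sample data are all hypothetical:

```python
def reconcile(manifest_events, crew_platform):
    """Find crew members whose latest manifest location disagrees with the crew system.

    manifest_events: iterable of (timestamp, employee_id, arrival_station)
    crew_platform: dict of employee_id -> station the crew system believes
    Returns {employee_id: (platform_station, manifest_station)} for mismatches.
    """
    latest = {}
    for _, employee, station in sorted(manifest_events):
        latest[employee] = station  # later events overwrite earlier ones
    return {
        employee: (crew_platform.get(employee), station)
        for employee, station in latest.items()
        if crew_platform.get(employee) != station
    }


events = [(1, "E1", "DEN"), (2, "E1", "MDW"), (1, "E2", "BWI")]
print(reconcile(events, {"E1": "DEN", "E2": "BWI"}))  # E1 moved; crew system still says DEN
```

In this sketch the data to locate crews already exists; the failure mode is simply that nothing pushes the newer manifest record into the crew system.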


I basically only used them for the PIT -> LAS direct flight for Defcon, and while I never got enough miles or whatever for any kind of formal loyalty program, I got the impression that a decade and a half of respecting sky law[1] got me some conversations most passengers don't get to have :-)

One of their pilots once confided that the reason he had to have them plant a drink cart in the aisle while he popped out to use the bathroom was that they're given so little time between connections they can't even stop at a urinal (<15 minutes).

They flew too close to the sun, and they had a cascading failure. Womp womp!

[1] https://www.youtube.com/watch?v=vK1HINtjdHY


A key point is how they do routing - https://www.ajot.com/news/southwest-air-faces-gridlock-with-...

> Unlike competitors that use a so-called hub-and-spoke system to funnel passengers to large airports, Southwest is focused on point-to-point service, flying the same aircraft — Boeing Co. 737s — on trips that may hopscotch around the US.

With a hub and spoke system, all the planes go from A to HUB and then from the HUB to somewhere. If the route A-HUB gets saturated, they can put more planes on that, and those planes can always be found at the HUB. This applies to crews too.

You'll have something that looks like this: https://www.airlineroutemaps.com/maps/Delta_Air_Lines/North_...

This comes at the cost of having oversupply at some spots, and it's harder to offer the "ideal" routes, as everyone needs to transfer to another plane with a layover somewhere... and your baggage is more likely to get lost. There's a bit not to like as a passenger on such an airline unless it's a nice one-leg route - but then who wants to fly to Detroit?

Southwest is different - they go from anywhere they want to anywhere they want with non-stop flights, picking the most lucrative routes they can. This lowers the effective cost per flight, and the flights are likely to be non-stop. Everything that a customer wants.
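A rough way to see the difference: when a leg loses its aircraft, the quickest fix is a spare already parked at that leg's origin. The station lists below are invented snapshots, not real fleet data, and `spares_at` is a hypothetical helper; the point is only that spares pool at hubs in one design and scatter in the other.

```python
from collections import Counter

def spares_at(origin: str, idle_aircraft: list[str]) -> int:
    """Count idle aircraft parked at the station where a broken leg must depart."""
    return Counter(idle_aircraft)[origin]  # Counter returns 0 for missing keys

# Hypothetical snapshots of where idle (spare) aircraft sit in each network style.
hub_network = ["DTW", "DTW", "DTW", "ATL", "ATL", "MSP"]  # spares pool at hubs
p2p_network = ["MDW", "ABQ", "HOU", "BWI", "LAS", "DEN"]  # spares scattered

# A leg departing a hub can usually be recovered locally...
print(spares_at("DTW", hub_network))  # 3
# ...while a point-to-point leg may have one local spare, or none at all.
print(spares_at("BWI", p2p_network))  # 1
print(spares_at("PIT", p2p_network))  # 0
```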

Southwest routes from 2001 https://www.flickr.com/photos/erussell1984/15863298679 - you can see the lack of hubs there.

This is done through some crafty constraint programming to try to make sure that all the capacity is where it needs to be when it needs to be there.
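A minimal sketch of the kind of constraints such a system checks, assuming each aircraft (or crew) follows a fixed rotation of legs: every leg must depart from where the previous one landed, with some minimum ground time in between. `Leg`, `violations`, and the 30-minute `MIN_TURN` are hypothetical names and values for illustration; a real constraint model has far more rules (crew duty limits, maintenance, gates).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Leg:
    origin: str
    dest: str
    dep: int  # departure, minutes after midnight
    arr: int  # arrival, minutes after midnight

MIN_TURN = 30  # hypothetical minimum ground time between legs, in minutes

def violations(rotation: list[Leg]) -> list[str]:
    """Return the constraints a single aircraft/crew rotation breaks."""
    problems = []
    for prev, nxt in zip(rotation, rotation[1:]):
        if prev.dest != nxt.origin:
            problems.append(f"out of position: at {prev.dest}, next leg leaves {nxt.origin}")
        elif nxt.dep - prev.arr < MIN_TURN:
            problems.append(f"turn too short at {prev.dest}: {nxt.dep - prev.arr} min")
    return problems

plan = [Leg("MDW", "ABQ", 420, 560), Leg("ABQ", "HOU", 600, 720), Leg("HOU", "BWI", 760, 960)]
print(violations(plan))  # []

# A storm cancels ABQ->HOU: the aircraft stays in ABQ, but the plan still expects it in HOU.
disrupted = [plan[0], plan[2]]
print(violations(disrupted))  # ['out of position: at ABQ, next leg leaves HOU']
```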

However, when the capacity hits "holidays - everything is at max", along with "big storm prevents flights from going to where they need to be for the next leg" this system breaks down and planes and crews are out of position or need to sleep. Their software was able to handle this constraint system when it was a smaller company with fewer routes - there were fewer constraints.

The "reset" is not "shut down the computers and start them back up" but rather "let all the crews get their required sleep and then go to the spot where they need to be in order to handle the load - not where they currently are (out of position)".
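In data terms, the "reset" described above amounts to diffing where everyone is against where tomorrow's clean schedule needs them, then flying or driving people to close that gap. A sketch with hypothetical crew IDs and function name; a crew whose location the system has lost shows up as `UNKNOWN`, which is the state the phone-in process struggled to clear.

```python
def reposition_moves(current: dict[str, str], start_of_day: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map each out-of-position crew to (where they are, where the schedule needs them)."""
    moves = {}
    for crew, needed in start_of_day.items():
        here = current.get(crew, "UNKNOWN")  # an unknown location is itself a problem
        if here != needed:
            moves[crew] = (here, needed)
    return moves

current = {"crew_A": "ABQ", "crew_B": "BWI", "crew_C": "DEN"}
start_of_day = {"crew_A": "MDW", "crew_B": "BWI", "crew_C": "DEN", "crew_D": "HOU"}
print(reposition_moves(current, start_of_day))
# {'crew_A': ('ABQ', 'MDW'), 'crew_D': ('UNKNOWN', 'HOU')}
```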

Hypothetically, this is solvable if you have enough compute... but that's a lot of compute that needs to be recomputed each time something changes (weather, crew gets sick, passenger load changes) and that ends up being impractical and expensive.
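To see why naively recomputing from scratch gets expensive: even the stripped-down problem of matching n crews one-to-one with n rotations has n! possible assignments to enumerate. This only illustrates brute-force growth; practical solvers never enumerate this space (the plain one-to-one matching can even be solved in polynomial time), and it is the full problem with legality and positioning rules that gets hard. `assignment_count` is a hypothetical helper.

```python
import math

def assignment_count(n: int) -> int:
    """Number of ways to pair n crews with n rotations, one crew per rotation."""
    return math.factorial(n)

for n in (5, 10, 20):
    print(n, assignment_count(n))
# 5 120
# 10 3628800
# 20 2432902008176640000
```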

---

Related reading:

NYT - What Caused the Chaos at Southwest : https://www.nytimes.com/2022/12/28/travel/southwest-airlines...

WSJ - How Southwest Airlines Melted Down : https://www.wsj.com/articles/southwest-airlines-melting-down... --- https://news.ycombinator.com/item?id=34165791


I find it hard to believe that compute power is the limitation here… something like simulated annealing would probably provide a reasonable, if far from optimal, solution on a regular PC. I imagine feeding all of the data and constraints in is the bigger issue when there is a lot of confusion and disorganization.
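For concreteness, here is what a bare-bones simulated annealing pass might look like on a toy version of the problem: assign crews to flights so that as few crews as possible need a deadhead to reach their flight's origin. This is a hypothetical sketch (the function names, swap move, linear cooling schedule, and cost function are all assumptions), not anything Southwest is known to run. It keeps the best assignment seen so far.

```python
import math
import random

def cost(assignment: list[int], crew_base: list[str], flight_origin: list[str]) -> int:
    """Number of crews that would need a deadhead to reach their assigned flight."""
    return sum(crew_base[c] != flight_origin[f] for f, c in enumerate(assignment))

def anneal(crew_base, flight_origin, steps=5000, t0=2.0, seed=0):
    """Simulated annealing over crew permutations; needs as many crews as flights (>= 2)."""
    rng = random.Random(seed)
    current = list(range(len(flight_origin)))  # flight i is flown by crew current[i]
    rng.shuffle(current)
    cur_cost = cost(current, crew_base, flight_origin)
    best, best_cost = current[:], cur_cost
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9  # linear cooling toward (almost) zero
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]  # propose: swap two crews
        new_cost = cost(current, crew_base, flight_origin)
        delta = new_cost - cur_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur_cost = new_cost  # accept: always if no worse, sometimes if worse
            if cur_cost < best_cost:
                best, best_cost = current[:], cur_cost
        else:
            current[i], current[j] = current[j], current[i]  # reject: undo the swap
    return best, best_cost

crew_base = ["MDW", "ABQ", "HOU", "BWI", "MDW", "LAS"]
flight_origin = ["BWI", "MDW", "LAS", "ABQ", "MDW", "HOU"]
best, best_cost = anneal(crew_base, flight_origin)
print(best, best_cost)
```

Accepting occasional worse moves early on is what lets annealing escape local minima; as the temperature drops it behaves more and more like plain hill climbing.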


It would, and that's likely what they're using - but it isn't perfect.

When it becomes non-perfect and you get a number of events that throw it off (like storms preventing certain legs from being completed, so the plane never moves to the proper spot, and holidays causing disproportionate load in certain parts of the network), then everything gets messy.


FTA: "In the event of a disruption you call scheduling and they manually adjust you"

Maybe the airline industry needs AlphaScheduler deep learning with Monte Carlo tree search. Or simulated annealing. These cheap, short-sighted stuffed suits with MBAs think that a fat tail refers to some part of the airplane.


It doesn’t seem like that, they have canceled virtually all flights to and from airports unaffected by storms. They seem to have huge numbers of planes and flight crews ready to go that are just sitting idle.


I’ve worked on an airline scheduling system, and solving to optimality is indeed NP-hard and scales badly.

However, you can get to a very good spot with heuristics. This particular issue with SW looks like bad data collection (crew has to phone in their location!) combined with a lack of reserves, bad weather, and the holiday surge.
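One of the simplest heuristics in this space is a greedy pass that covers each uncrewed flight with a reserve crew already sitting at its origin, and cancels the flight if there is none. This is a hypothetical sketch (names, flight numbers, and reserve counts are invented), but it shows why thin reserves turn directly into cancellations however clever the rest of the system is.

```python
def cover_with_reserves(uncovered_flights: list[tuple[str, str]], reserves: dict[str, int]):
    """Greedy heuristic: give each uncrewed flight a reserve crew already at its origin.

    uncovered_flights: list of (flight_id, origin station)
    reserves: station -> number of reserve crews on call there
    Returns (covered flight ids, flight ids that must be cancelled).
    """
    pool = dict(reserves)  # copy so the caller's counts aren't mutated
    covered, cancelled = [], []
    for flight, origin in uncovered_flights:
        if pool.get(origin, 0) > 0:
            pool[origin] -= 1
            covered.append(flight)
        else:
            cancelled.append(flight)
    return covered, cancelled

flights = [("WN101", "DEN"), ("WN202", "DEN"), ("WN303", "MDW"), ("WN404", "BWI")]
print(cover_with_reserves(flights, {"DEN": 1, "MDW": 2}))
# (['WN101', 'WN303'], ['WN202', 'WN404'])
```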


I found the hub-and-spoke explanation misleading. Even if you don't have a star arrangement you could still have a network of point-to-point flights that just constantly bounce back and forth between the same two cities. This is like a max flow network then and you can lose a flight or two and still be fine.

What I think is happening with SW is their planes are like pilots in Elite Dangerous, flying around the country to random destinations as the opportunity arises. This is not what point-to-point means to me, this is more like... sky Uber?


Not quite that - they've only got one plane body (the 737). Additionally, they've got the plan for what the routes should be. If you buy a flight from Albuquerque to Chicago ( https://www.southwest.com/routes/flights-from-albuquerque-to... - https://www.southwest.com/route-map/ - https://www.southwest.com/air/low-fare-calendar/select-dates... ) there is going to be a plane flying that leg.

What they don't have is a crew that does MDW - ABQ - MDW - ABQ - MDW - ... but rather a crew that hypothetically does MDW - ABQ - HOU - BWI - MDW. The exact route is based on what capacity people have purchased on different legs and may change from loop to loop based on demand.

However, this means that if there's something that disrupts part of that (a storm cancels all the flights in MDW and HOU for a day) and there's an odd demand pattern, the plane that needs to do the BWI to MDW route is currently in ABQ. And while there is a plane in BWI that could pick up the leg of BWI to MDW, that crew is currently on a mandatory rest period... but you could use the crew that is scheduled to be deadheaded in from AUS to DCA (and then send them by ground to BWI)... but that plane is delayed.
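The search described above boils down to filtering crews on a few conditions: in the right city (or reachable by ground), out of mandatory rest, and not stuck on a delayed inbound. A sketch with hypothetical names, hour-of-day timestamps, and a made-up DCA/BWI ground link; the data mirrors the scenario in the comment, where every option fails for a different reason.

```python
from dataclasses import dataclass

@dataclass
class Crew:
    name: str
    station: str      # where the crew is (or is scheduled to arrive)
    rest_ends: int    # hour of day when mandatory rest ends (hypothetical unit)
    delayed: bool = False

# Hypothetical ground connections: a crew landing at DCA can be driven to BWI.
GROUND_LINKS = {"DCA": {"BWI"}, "BWI": {"DCA"}}

def candidates(crews: list[Crew], origin: str, dep_hour: int) -> list[str]:
    """Crews that could physically and legally take a leg leaving `origin` at `dep_hour`."""
    ok = []
    for c in crews:
        reachable = c.station == origin or origin in GROUND_LINKS.get(c.station, set())
        rested = c.rest_ends <= dep_hour
        if reachable and rested and not c.delayed:
            ok.append(c.name)
    return ok

crews = [
    Crew("BWI crew", "BWI", rest_ends=20),                         # right place, still resting
    Crew("AUS->DCA deadhead", "DCA", rest_ends=0, delayed=True),  # reachable by car, inbound delayed
    Crew("ABQ crew", "ABQ", rest_ends=0),                          # rested, wrong city
]
print(candidates(crews, "BWI", dep_hour=14))  # []
print(candidates(crews, "BWI", dep_hour=21))  # ['BWI crew']
```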

At some point they say/said "stop, cancel everything - demand is now 0, for all planes, reposition the planes so that we can start flights on Dec 29 according to the purchased demand."


I wonder how long the “direct chains” are - if a plane is used on a flight on Monday how long until it runs that leg again, if ever?


From what I've read here and other places there was a unique situation of:

1. Large holiday volume of flights AND a lot of cancelations due to weather happened at the same time.
2. Their software system was unable to recommend optimal next steps for flight crews because it had not been designed to take this specific situation into account.

Other airlines also had problem #1. But since they mostly operate on a hub model it is easy to tell their flight crews where they should go once the weather gets better.


Disclaimer: I've read the comments before writing my own.

This reminds me of the long history of failures of the post-Soviet passenger railroad booking system.

They constantly have issues; it is simply impossible to reliably buy tickets during peak season.

The situation has become much better in recent years (to be honest, we just don't see the issues anymore), simply because of the free market: the railroads have given up a large share of the load to other transport (air, buses, private cars), and a large share of tourists now go abroad.


The reasons for such a long, constant decline are simple.

First, the old Soviet style of management, unfortunately preserved in a government-backed monopoly.

Second, chronically underpaid computer/software departments.



Unlike other airlines that operate directly to and from hub airports, Southwest tends to operate a point-to-point service, meaning that when one flight is disrupted, there may not be spare aircraft and crews to pick up the route, leading to disruptions through the scheduling chain.


So how do other airlines track where their planes are? Radio? ADS-B?

How about their crews? In-house mobile-phone app?

The Southwest telephone-based system sounds super low-tech. AirTags might work better.


Everyone knows where the planes are. It's the pilots and crews deadheading around the country to get into position that got lost.


This is going to add fuel to the fire of demand for autonomous/remotely piloted aircraft.

Airlines would like nothing more than to be able to fly passengers in drones and fire all their flight crew.


How could a CEO let this persist for more than an hour? Is there nobody with the leadership skills to just give everyone free flights and promise everything will be sorted out, while having people record who worked, who flew, and what services were used as best they can?

The software failed, just Jerry-rig something up to get people where they need to be, let the accountants and programmers clear up the mess after the fact.

[Edit] Software helps optimize things, but as long as the crews and suppliers have sufficient faith that they will be treated fairly, you could run this all with paper and pen for a few days while the software gets straightened out.

Decide which hubs are most important, and have someone work out what goes where by hand. Once those routes are up and running, you can work outward to the less trafficked airports. Just work on getting the most people to the places that help the most.

Crews know how many hours they've worked, and can track that themselves for the moment, or perhaps the Captain could do that. Everyone has cell phones, and could route around this damage.

The consensus here on HN seems to be to give in without trying, which I find disturbing.


You can't give people free flights if you don't know where your planes and crews are and don't know where to tell your angry customers to go. There are also laws to adhere to: there is a maximum amount of time a flight crew can work, so if you tell them to "just fly if you can and we'll figure it out later" you're probably going to break a lot of laws and endanger a lot of passengers.

I think they probably are trying to jerry-rig a system, but the airline industry is heavily regulated for safety reasons (a good idea that has been extremely successful), so it's very difficult to get a plane in the air if you don't know what the fuck you're doing.


It's so simple, why didn't this entire company full of experienced specialists just do the random idea I just came up with without thinking about it at all?


Somebody once said "the H in HN stands for hubris" and I often find it painfully accurate.


It's almost as simple as crapping on an idea without saying why the idea is a bad one, which I personally find to be a much worse failure and an indicator of someone who likes to tear down without making suggestions.


I don't have enough information to make suggestions. Sometimes it's okay to not have an opinion on something, especially when you have no relationship with the company and have no idea what the reality of the situation is.


If your FAA-approved operations specifications (ops specs) say that "all revenue flights will be approved for dispatch via process X", you better make sure that all flights are approved by that process, not by Enterprising Ernie on a whiteboard somewhere.


Would that really work? How do you make sure too many people don’t show up to get on the same plane? How do you make sure every flight has a crew? How do you decide which flights to still run when only half the incoming planes actually showed up?


> have someone work out ... by hand

How many minutes for that someone to work out where a plane is, how to crew it legally, and where it should go?

How many planes, crews, and flights are backlogged?


According to Wikipedia, Southwest has 779 planes in service.[1]

Also, there are 11 hubs if I count correctly[2], so an average of 71 planes/hub.

Surely a team of people could manage that amount of information in each hub on a temporary basis.

[1] https://en.wikipedia.org/wiki/Southwest_Airlines_fleet

[2] https://en.wikipedia.org/wiki/Southwest_Airlines


> how to crew it legally

The rules for this are insanely complicated and practically require automation to keep track of if you have more than a small handful of crew members.

https://www.ecfr.gov/current/title-14/chapter-I/subchapter-G...
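To give a flavor of the bookkeeping, here is a deliberately oversimplified legality check covering just two kinds of rule: a cap on flight time per duty period and a minimum rest before the next duty. The limits are placeholders chosen for illustration, not values quoted from the regulation linked above, which layers on many more conditions; the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Placeholder limits for illustration only; the real rules add many more conditions
# (report time, cumulative look-back windows, sleep opportunity, and so on).
MIN_REST_HOURS = 10.0
MAX_FLIGHT_HOURS_PER_DUTY = 9.0

@dataclass
class DutyPeriod:
    start: float         # hours since an arbitrary reference point
    end: float
    flight_hours: float

def legal_to_assign(history: list[DutyPeriod], proposed: DutyPeriod) -> bool:
    """Check only two simplified rules: flight-time cap and rest before the new duty."""
    if proposed.flight_hours > MAX_FLIGHT_HOURS_PER_DUTY:
        return False
    if history:
        rest = proposed.start - history[-1].end
        if rest < MIN_REST_HOURS:
            return False
    return True

yesterday = [DutyPeriod(start=6.0, end=18.0, flight_hours=7.5)]
print(legal_to_assign(yesterday, DutyPeriod(start=26.0, end=36.0, flight_hours=6.0)))  # False (8 h rest)
print(legal_to_assign(yesterday, DutyPeriod(start=29.0, end=39.0, flight_hours=6.0)))  # True (11 h rest)
```

Even these two rules require every crew member's duty history to be accurate and current, which is exactly the data that goes stale when locations have to be phoned in.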



