meshko's comments — Hacker News

I so hate it when people fill these postmortems with marketing speak. Don't they know it is counterproductive?


To be fair, they needed to scale to roughly a thousand concurrent users (where "concurrent" is used very loosely here), so I wouldn't say that scalability was really a requirement. 1000 users is nothing.


All the talk here about how crappy Boeing's engineering was is bullshit and speculation, and I am surprised PG participates in it. What we can discuss objectively is the incident response, in which Boeing allowed the situation to continue after the first crash. How did they not run hundreds of hours of simulations, code reviews, etc. on the system assumed to be at fault? How did they not immediately make the safety features associated with MCAS free and mandatory for everyone? Engineering mistakes happen and are hard to prevent. Business mistakes like this are a sign of terrible culture and a lack of priorities, and are an existential threat to the company.


It's not really speculation. The proof, as they say, is in the impact crater.

The only mystery left is, what is the nature of the paper trail that led to this catastrophe?

Was there malicious malfeasance? Overt and irresistible pressure to certify at all costs?

Was it all just a tragic mistake? We don't know. We only know the physical systems that contributed to the crashes, and some of the motivations that would have contributed. The technical implementation can be roughly inferred by any programmer, and it doesn't take a rocket scientist to figure out a ball was dropped somewhere for a plane development program to fall afoul of such a foreseeable failure state.


"The technical implementation can be roughly inferred by any programmer", "such a foreseeable failure state"... how many years of experience do you have?


How rude. Here I was thinking we were having a civil discourse over the Internet. More than born yesterday, less than since the Moon landing.

Regardless, my assessment is based on most juniors I've worked with. By their third year, most have already grasped the need to test for boundary conditions and to ensure proper error handling for GIGO failures. Anyone with a year-plus of experience and at least FizzBuzz-level understanding can be hand-held toward it with the right nudge. In fact, the less experience they have, the more eager they tend to be to pick up on error handling, since they haven't yet developed the skill of rationalizing away the "test you don't need to write" because the result can be inferred from a test at another level of the system (a frequent coping strategy that starts creeping its way in with increased familiarity with a complex system).

Any problem grokking the above points is usually solved with an impromptu exercise and lecture where I have the junior play the part of a computer until they realize just how much the computer "doesn't know" and has no capacity to derive from reasoning unless it's actually coded/implemented to. I've not yet had a junior who failed to grasp this to some degree (though a recent one is giving me a run for my money) and become capable, within a couple of months, of inferring error states two or three functions away to test for. Within the year, I can typically point them at an arbitrary code block and get back a reasonable testing surface.
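The boundary-condition and GIGO testing habit described above can be sketched in a few lines. This is a toy Python illustration with made-up names (`clamp_sensor_reading` is hypothetical), not anything from an actual avionics codebase:

```python
def clamp_sensor_reading(value, lo=-40.0, hi=40.0):
    """Clamp a raw sensor reading to a plausible physical range.

    Raises ValueError on non-numeric garbage-in (including NaN) so the
    failure is visible rather than silently propagated downstream.
    """
    if not isinstance(value, (int, float)) or value != value:  # value != value catches NaN
        raise ValueError(f"non-numeric sensor reading: {value!r}")
    return max(lo, min(hi, value))

# The boundary cases a junior learns to enumerate:
assert clamp_sensor_reading(0.0) == 0.0        # nominal
assert clamp_sensor_reading(-40.0) == -40.0    # exactly on the lower boundary
assert clamp_sensor_reading(40.0) == 40.0      # exactly on the upper boundary
assert clamp_sensor_reading(1000.0) == 40.0    # wildly out of range high
assert clamp_sensor_reading(-1000.0) == -40.0  # wildly out of range low
try:
    clamp_sensor_reading(float("nan"))         # garbage in: must not pass through
except ValueError:
    pass
```

The point of the exercise is the list of assertions, not the function: the computer "doesn't know" that a reading of 1000 degrees is absurd unless someone writes the line that says so.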

Which brings me to my next observation, where I think you may be attempting to make a point:

If I run into juniors of 1-3 years experience who need coaching to fully understand what I explained above, then perhaps the average programmer is not capable of inferring what I claim.

To which all I can say is, my observation may be skewed, because I'm a bloody paranoid polyglot of a tester when it comes to safety-critical systems. Even when I was programming calculators pre-college, the more someone else actually depended on something, the greater the lengths I'd go to to test things before cutting them loose with anything I was producing for them. The Therac-25 postmortem is bedtime reading for me, and I've pushed myself to understand computer science and software engineering as more than mere 'coding'.

If the argument, then, is that I'm an atypical representative of my software-composing brethren, then I'd like to know why in the $deity's name we're not triple-checking safety-critical code at system integration time, seeing as we can assume this level of inattention to detail by the average programmer. Especially given that the languages these types of systems are implemented in are typically not the most 'friendly' languages.

This suggests cultural issues, undue pressure to fast-track approval, disincentives to raise red flags that could impede delivery, or an "over-the-wall" hyper-siloing of expertise/responsibility that led to the least experienced in complex system implementation being blindly trusted by those who had the experience to realize something was horribly wrong.

If the above doesn't assuage any concerns relating to my experience, I'm afraid not much else will.


You think I was rude asking you about your experience. Now think how rude this unsubstantiated allegation of the obvious simplicity of the code in question is to the person who wrote it -- with the weight of hundreds of lost lives on their shoulders. These control systems can get arbitrarily complex. We don't know anything about the hardware this runs on and what it has to interface with. We don't know the constraints and age of the codebase. Nothing. To assume that this boils down to a simple if statement is something I would expect from a recent college graduate, or someone who has only worked at a web startup, not a person with 5+ years of real-world experience building complex systems. I agree with all the points about testing and business processes. We have enough evidence to conclude that unforgivable mistakes were made there (and I point to that in my original comment).


>You think I was rude asking you about your experience.

You asked for a number. Who I am doesn't factor in. The experiences and insights I can bring to the conversation do. Of which I provided more than enough to get you in the right ballpark, experience-wise.

>We don't know anything about the hardware this runs on and what it has to interface with.

These are essentially networked embedded microcomputers, likely using one of several potential protocol stacks for networking: CAN, CAN FD, AFDX® / ARINC 664, ARINC 429, ARINC 825, Ethernet, or CANopen.

They are likely highly constrained, and must be compliant with DO-178B/C, which includes a need to verify the software down to the object code emitted by the compiler.

The most popular languages for this purpose are known to be C, C++, FORTRAN, and Ada.

There's this wonderful place called the Internet where Engineers and other really dedicated people share information about what they use to do things.

>Arbitrary complexity

Is a possibility, but it tends to be bounded by the fact that humans still need to be able to implement and verify the systems they make in a reasonable amount of time. Which, coincidentally, seems to have missed a few layers or so, given we're here talking about this.

The world has very little that can't be found with a little digging, and in the interests of saving time, we tend to reuse technologies from things like cars in other things, like airplanes, when appropriate.

If you can gain a mastery of how to network and program computers in general, you gain insights into how other physical systems, even though they aren't Turing machines, interconnect and propagate information and forces.

If you can then understand engineering principles well enough to decompose complex things into a network of simpler basic parts, and understand how to employ mathematics to analyze and predict the behavior of those systems, you can quickly formulate broad guesses about contributory factors to a failure state given even small amounts of information.

And if you say it's impossible for all of that to appear in one person, I don't know what to tell you. I'm not asking you to have faith, I'm asking you to think, question, imagine, and connect the dots between what information is available out there.

But hey, what do I know? I'm just a guy who objects to having his credibility pigeonholed based on some number instead of the content of what is being communicated.

I apologize if I sound aggravated or hostile, but I do not appreciate it when something as tightly regulated as aircraft out of the blue starts killing people, and the reason looks to be a lack of scrutiny/verification, rushed implementation, intentionally sparse communication, and unethical sales practices for whatever reason.

There are ways to do things, and there are ways not to do things. I expect a leader of an industry to at least show a level of effort such that I can entertain the benefit of the doubt that gross incompetence or greed was not a factor. I have no such illusions left to me based upon what I've been able to work out. The cause is somewhere in their culture or business practices, and I want it ripped out into the light as an example to everyone, everywhere.

I don't care half as much about what happens to the people involved as long as it is enough to dissuade anyone thinking of doing the same thing from going down that path.


how about they just use ML for once and detect the typist and mute her automatically?


Meh, it is not as hard as it seems. Just get yourself two kids in quick succession. Adding the second one will be easier than having the first, and it will highlight how silly it was to feel it was difficult when there was just one. Also, they will play with each other while you watch TV and drink beer.


Then, once you feel comfortable with two, add a third.

Because, what the hell!


Dude, I lost 10 pounds and keeping it off is constant work. You go!


I agree. I know some doctors, and while they do tell stories about their patients (obviously completely anonymized) and sometimes have a laugh about them, you can always feel a certain base level of respect, love, and care, not unlike someone telling you about their kids doing something silly. Good doctors don't dehumanize their patients, ever.


I wonder if it would be possible to design a system that would power down some of the memory when not plugged into a power source. I do all my dev on an MBP and would be more or less fine if the memory-intensive things were only available when plugged in. That said, I don't really feel like 16 GB limits me in any way except for doing data processing, but that should be happening on the cluster anyway.


Possible but hard because:

1. OSs are generally unaware of the details of memory controller configuration, everything is preconfigured by firmware. Apple's vertical integration could help here, but they would probably need help from Intel and it would work only in OSX.

2. It's not the case that if you install four 8GB sticks they will be mapped at 0-8G, 8-16G, etc. and the OS can simply move data from one area to another. For performance, data are striped RAID0-style across all modules. They would have to disable this.

3. I don't know if OSX supports non-identity VM-mapping of kernel memory pages. Linux for example doesn't afaik. Without this it's impossible to move arbitrary kernel objects because you can't find and update all pointers pointing to them.
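The RAID0-style striping in point 2 can be pictured with a toy address-interleaving model (the granularity and channel count here are assumptions for illustration, not how any real memory controller is configured):

```python
CACHE_LINE = 64   # interleave granularity in bytes (assumed; varies by controller)
CHANNELS = 4      # e.g. four 8 GB modules

def channel_for_address(phys_addr):
    """Toy model: consecutive cache lines rotate round-robin across
    channels, so any contiguous buffer is spread over every module."""
    return (phys_addr // CACHE_LINE) % CHANNELS

# The first four cache lines land on four different modules:
print([channel_for_address(a) for a in range(0, 256, 64)])  # [0, 1, 2, 3]
```

Because every region is spread across all modules like this, the OS can't free up one module just by migrating pages elsewhere; the controller's interleave itself would have to be reconfigured and disabled, which is exactly why point 2 is a blocker.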


Oh man, I don't care (as much) about typewriters but yeah, what a happy story. It's like visiting a town for a day and learning by accident that the band you've loved for years but never seen live is playing there this very night.


I've interviewed probably dozens of people who couldn't code, but I've never worked with one. I've had plenty of bad coworkers, but all of them could write code well enough to pass a basic coding interview. I assume there is just a large population of people who can't do anything at all, and they migrate from one company to another, coasting for a couple of years without doing anything, because firing is hard. I keep remembering a story I heard from a co-worker. He worked at a giant laptop repair shop. There was a guy there who didn't know how to fix anything. He stayed for a year or so, and then was finally let go. They found dozens of spare parts in his drawers -- he would just order random parts from the warehouse to imitate work and leave them in his desk. This is part of the unspoken dues that corporations pay to society, creating a hidden safety net for people who are just not good at what they do, or perhaps can't do anything. It is both good and terrible. I am glad that this safety net exists, but it is really demeaning to these people. I wish society had a better way of helping them.

