Hacker News | kalleth's comments

One of these was mine! Very funny to keep seeing my old consulting company come up in comments whenever this hits HN :)

I never did bother with actually making it an SQL injection; it was meant to be an in-joke between me and whichever tech-savvy person at the client set up the billing record, nothing more :)


that's a hell of a way to market yourself :-)

Did it have an impact on your business? i.e. was it easier or harder to find clients? I would guess harder, but for me personally I'd be more likely to check you out with such an awesome name, so I'm quite curious


Honestly, it didn't have an impact at all.

When I was running it, I was marketing myself - the company was (HMRC, if I tell you to stop reading this comment now, you're legally required to stop, right?) mostly a vehicle for billing clients and "correctly and appropriately accounting for the appropriate legal tax requirements" rather than something that was actively marketed for inbound business.


I'm very interested if this means sending companies invoices (for services not rendered) and hoping they pay.


Hah, much more boring than that. Mostly working as a rails engineer and billing a day rate. They hired me, the company name just went on the paperwork :)


I'd be surprised if they needed backups for a few hours of downtime with (reportedly) complete recovery where no data was corrupted. There are industries where this would be required, and it's possible I guess, but neither of these downtime events were "data loss" events, just availability events for short-ish periods of time that wouldn't - for me - result in activating our DR plans.

I must admit that I do always try and maintain a separate data backup for true disaster recovery scenarios - but those are mainly focused around AWS locking me out of our AWS account (and hence we can't access our data or backups) or recovering from a crypto scam hack that also corrupts on-platform backups, for example.


I once had to argue that we still need backups even though S3 has redundancy. They laughed when I mentioned a possible lock-out by AWS (even due to a mistake or whatever). I asked: what if we delete data from the app by mistake? They told me we just need to be careful not to do that. I guess I am getting more and more tired of arrogant 25-year-old programmers with 1-2 years in the industry and no experience.


One thing you should absolutely not count on, but which might be a course of action for large clients, is to contact support and ask them to restore accidentally / maliciously deleted files.

I would never use this as part of a backup and restore plan, but I was lucky when a bunch of customer files were deleted due to a bug in a release. Something like 100k files were deleted from Google Cloud Storage without us having a backup. In a panic we contacted GCP. We were able to provide a list of all the file names from our logs. In the end, all but 6 files were recovered.

I think it took around 2-3 days to get all the files restored, which was still a big headache and impactful to people.


This is not a reliable mechanism, btw. There will be times when they won't be able to restore the data for you. Their product has options to avoid this situation, like object versioning.
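For reference, the object versioning mentioned here is a one-call change. A minimal boto3 sketch - the bucket name is a hypothetical placeholder, and the live call is left as a comment:

```python
# Build the parameters for enabling S3 object versioning on a bucket.
# The bucket name below is a made-up example.
import json

def versioning_request(bucket: str) -> dict:
    """Kwargs for s3.put_bucket_versioning()."""
    return {
        "Bucket": bucket,
        "VersioningConfiguration": {"Status": "Enabled"},
    }

req = versioning_request("example-backup-bucket")
print(json.dumps(req))
# With credentials configured, this would be applied with:
#   import boto3
#   boto3.client("s3").put_bucket_versioning(**req)
```

Once enabled, deletes and overwrites create new versions rather than destroying the old object, which covers the accidental-deletion case discussed above.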


S3 (and others) have version history that can be enabled.

If you have to take care of availability, redundancy, delete protection, and backups yourself, then why pay the premium S3 is charging?

Either you don't trust the cloud, in which case you can run a NAS or equivalent (with S3 APIs, easily, today) much more cheaply, or you trust them to keep your data safe and available.

No point in investing in S3 and then doing it again yourself.


> No point in investing in S3 and then doing it again yourself.

I mean that's just obviously wrong, though.

There is a point.

> Either you don't trust the cloud, in which case you can run a NAS or equivalent (with S3 APIs, easily, today) much more cheaply, or you trust them to keep your data safe and available.

What if you trust the cloud 90%, and you trust yourself 90%, and you think the failure cases of the two are independent? Then it seems like the smart decision would be to do both.

Your position is basically arguing that redundant systems are never necessary, because "either you trust A or you trust B, why do both?" If it's absolutely critical that you don't suffer a particular failure, then having redundant systems is very wise.


My point is: if your redundancy is better than AWS's, then why pay for them? If it's not, then why invest in your own?

You can argue that you protect against different threats than AWS does. So far I have not seen a meaningful argument for a threat that on-prem protects against differently enough from the cloud that you need both.

Say, for example, your solution is to put all your data backups on the moon. Then it makes sense to do both, since AWS does not protect against planet-wide threats.

However, if you are both protecting against the exact same risks, provider redundancy only protects against events like AWS going down for days/months or going bankrupt.

All business decisions carry some risk; provider redundancy does not seem to be a risk worth mitigating, given what it would cost, for most businesses I have seen.

Even Amazon.com and Google's apps host on their own clouds rather than multi-cloud, after all. Their regular businesses are much bigger than their cloud businesses, yet they still risk them by sticking to their own cloud/services only.


> My point is: if your redundancy is better than AWS's, then why pay for them? If it's not, then why invest in your own?

This is a really confusing question. Redundancy requires more than 1 option. It's not about it being better than AWS, it's that in order to have it you need something besides just AWS. AWS may provide redundant drives, but they don't provide a redundant AWS. AWS can protect against many things, but it cannot protect against AWS being unavailable.


> Even Amazon.com and Google's apps host on their own clouds rather than multi-cloud, after all. Their regular businesses are much bigger than their cloud businesses

This is probably true with Google, but AWS contributes > 50% of Amazon's operating income. [1]

[1] https://www.techradar.com/news/aws-is-now-a-bigger-part-of-a...


Interesting; no wonder the AWS head became Amazon CEO.

Their retail/e-commerce side is less profitable than AWS, but the absolute revenue is still massive, and the risk of losing a chunk of that revenue (and income) due to tech issues is still an enormous risk for Amazon.


If you trust your airbag, why bother with the seatbelt?


True. With two independent servers at 90% each, there's a 0.1^2 = 1% chance both fail - so redundancy can add a lot of reliability.
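The arithmetic, spelled out (assuming genuinely independent failures, which the replies below rightly question):

```python
# Combined failure probability of two independent 90%-reliable systems.
p_fail_single = 1 - 0.90            # each fails 10% of the time
p_fail_both = p_fail_single ** 2    # independence assumed
print(round(p_fail_both, 4))        # 0.01 -> 1% chance of losing both
print(round(1 - p_fail_both, 4))    # 0.99 -> 99% combined reliability
```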


Only if they are truly independent of each other.

You and AWS are using similar chips and similar hard disks, with similar failure rates.

If you both use hardware from, say, the same batch, both can have defects and fail at similar times. Or you both use the same file system, which, say, corrupts both your backups.

90% is not a magic number; you would need to know AWS's supply chains and practices thoroughly, and keep yours different enough to avoid sharing AWS's risks, for your system to have a truly independent probability of failure.


True. One would want to continually decorrelate services or model the dependencies. Redundancy will help even with some dependency, but you raise an important point.


You assume failures are uncorrelated. Which, depending on what you think you are protecting yourself from, might or might not be true.

(Consider a buggy software release which incorrectly deletes a backup. Depending on the bug it’s very possible it will delete in both places.)


If one buggy software release can delete both copies, then you don't have actual redundancy from the point of view of that issue.


In most startups? You're mostly correct.

But you still have some risks here, yes, with a super low probability, but a company-killing impact.

In some industries - banking, finance, anything regulated, or really (I'd argue) anywhere where losing all of your data is company killing - you will need a disaster recovery strategy in place.

The risks requiring non-AWS backups are things like:

- A failed payment goes unnoticed, AWS locks you out of your AWS account, that also goes unnoticed, and the account and data are deleted

- A bad actor gains access to the root account - by faxing Amazon a fake notarized letter, finding a leaked AWS key, or social engineering one of your DevOps team - and encrypts all of your data while removing your AWS-based backups

- An internal bad actor deletes all of your AWS data because they know they're about to be fired

...and so on.

There are so many non-technical scenarios which can make a single-vendor dependency for your entire business unwise.

A storage array in a separate DC somewhere, where your platform can send (and only send! not access or modify) backups of your business-critical data, ticks off those super-low-probability but company-killing risks.

This is why risk matrices have separate probability and impact sections. Minuscule probability but "the company directors go to jail" impact? Better believe I'm spending some time on that.
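A "send only" destination like the one described can be sketched as an S3 bucket policy that allows writes but explicitly denies deletes. All account IDs, role names, and bucket names below are hypothetical placeholders:

```python
# Sketch of a write-only backup bucket policy: the sending role may
# deposit new objects but can never delete them. Names are made up.
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # allow the platform to deposit backups
            "Sid": "AllowBackupWrites",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/app-backup-writer"},
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::example-offsite-backups/*",
        },
        {   # explicitly deny deletion, even if another policy allows it
            "Sid": "DenyDeletes",
            "Effect": "Deny",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/app-backup-writer"},
            "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
            "Resource": "arn:aws:s3:::example-offsite-backups/*",
        },
    ],
}
print(json.dumps(policy, indent=2))
# With credentials, this would be applied via:
#   boto3.client("s3").put_bucket_policy(
#       Bucket="example-offsite-backups", Policy=json.dumps(policy))
```

The same shape works for a non-AWS storage array that speaks the S3 API; the key property is that the writing side holds no delete permission at all.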


Just to add that S3 supports a compliance-mode object lock that can't be overridden even by the root user. Also, AWS doesn't delete your account or data until 90 days after the account is closed.

Between these two protections, it's pretty hard to lose data from S3 if you really want to keep it. I would guess they are better protections than you could achieve in your own self-managed DC.

I'm guessing AWS has some clause in their contract that means they can refuse to deal with you, or even to return your data, if they feel like it. Not sure that's ever happened, but it's still worth considering.
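The compliance lock mentioned above can be sketched as an S3 Object Lock configuration: COMPLIANCE-mode retention cannot be shortened or removed once set, unlike GOVERNANCE mode, which privileged users can bypass. The retention period and bucket name below are illustrative assumptions:

```python
# Sketch of an S3 Object Lock configuration in COMPLIANCE mode.
# The bucket must have been created with Object Lock enabled.
lock_config = {
    "ObjectLockEnabled": "Enabled",
    "Rule": {
        "DefaultRetention": {
            "Mode": "COMPLIANCE",  # immutable even for the root user
            "Days": 90,            # illustrative retention window
        }
    },
}
print(lock_config["Rule"]["DefaultRetention"]["Mode"])
# With credentials, this would be applied via:
#   boto3.client("s3").put_object_lock_configuration(
#       Bucket="example-compliance-bucket",
#       ObjectLockConfiguration=lock_config)
```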


Yes, threat models are the obvious qualifier: if you have a business that requires a backup on the moon in case of an asteroid collision, then by all means go for it. [1]

For most companies, what AWS or Azure offers is more than adequate.

An internal bad actor with that level of privileged access can delete your local backups too; anything an external attacker can do to AWS, he can likely do more easily to your company's storage DC.

Bottom line: it doesn't much matter. If customers will pay for all this low-probability stuff that can only happen in the cloud and not on-prem, sure, go ahead. Half the things customers pay for, they don't need or use anyway.

[1] Assuming your business model allows the expense outlay you need for that threat model.


Nope. 3-2-1 strategy: 3 copies, 2 different media, 1 offsite. Now try to delete files from the media in my safe. Only I have a key.

Sure, your threat model may vary. But relying on the cloud alone for your backups is simply not enough. If you split access to your AWS backup and your DC backup between two different people, you've mitigated that threat model. If you only have one backup location, that's going to be very hard.


All of these are questions asked and solved 10 years ago by bean counters whose only job is risk mitigation.

Every cloud provider has compliance locks which even the root user cannot disable, plus version history, and you can set up your own copy workflow from one storage container to a second, with delete/update access to the second held by two different people, or whatever.

You don't need to do any of it offsite.


Not sure I agree about the usefulness of different media.

Having had to restore databases from tapes and removable drives for a compliance/legal incident, we had a failure rate of >50% on the tapes and about 33% for the removable drives.

I came away not trusting any backup that wasn't online.


We have AWS backups, "offsite" backups on another cloud provider, and air-gapped backups in a disconnected hard drive in a safe.

The extra expense outlay for the 2 additional backups is approximately $50/month, so it's not going to break the bank.


Egress from AWS is not cheap.

At $50/month scale, a lot of things are possible. Most companies cannot store their data on a hard disk in a safe. If you can, then the cloud is a convenience for you, not a necessity; i.e. you are perfectly fine running your own storage stack for the most part.

My company is not very big (100ish employees) and we pay $200k+ to AWS for storage alone, and AWS is not even our primary cloud. If we had to do what you have, it would probably cost another $500k in bandwidth alone. Add running costs in another cloud, plus recurring bandwidth for transfers and retrieval from Glacier for older data, on top of that. [1]

Over 3 years that would easily be $1-1.5 million in net new expenses at our scale.

No sane business is going to sign off on +3x storage costs for a risk that cannot be easily modeled [2] and costs that cannot be priced into the product, just so one sysadmin can sleep better at night.

[1] Your hard-disk-in-a-safe third component is not a sensible discussion point at reasonable scale.

[2] That would be: probability of data loss with AWS * business cost of losing that data > cost of a secondary system.

Or: probability of a data availability event (like now) * business cost of that > cost of an active secondary system.

For almost no business in the world would either equation hold.

For example, even with $100B of revenue at risk, at six nines of durability the expected loss would be only $100,000 (100B * 0.000001) - a secondary system is much costlier than that.
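Spelling that expected-loss equation out with the figures above (six nines of durability implies a one-in-a-million chance of loss):

```python
# Expected loss = value at risk * probability of loss.
revenue_at_risk = 100e9          # $100B, the comment's example figure
p_loss = 1 - 0.999999            # six nines of durability -> ~1e-6
expected_loss = revenue_at_risk * p_loss
print(round(expected_loss))      # ~100000, i.e. $100k expected loss
```

Note this treats durability as an annual loss probability over the whole dataset, which is a simplification; the point is only the order of magnitude of the comparison.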


> My company is not very big(100ish employees)

I don't get how this is relevant at all; it's about how much data your company stores, not how many employees it has.

I've worked for a company with 5000 employees that stored less data (fewer data?) than my current employer that has less than 100.

> No sane business is going to sign off on +3x storage costs on a risk that cannot be easily modeled

Probably not, but for us the cost is about 0.1x our AWS storage costs, so it's a no-brainer.


There are completely independent risks that you are dealing with here. If you are a small company, there is a non-insignificant risk that your cloud account will be closed and it will be impossible to find out why or to fix it in a timely manner. There have been several cases that were only fixed after being escalated to the front page of Hacker News, and we haven't heard about the ones that didn't get enough upvotes to get our attention and were never fixed.

Also, what we saw on Dec 7th was that the complexity of Amazon's infrastructure introduces risks of downtime that simply cannot be fully mitigated by Amazon, or by any other single provider. More redundancy introduces more complexity at both the micro level and macro level.

It doesn't really cost that much to at least store replicated data in an independent cloud, particularly a low-cost one like Digital Ocean.


Back up on site and store tertiary copies in a cloud. Storing all backups in AWS wouldn't meet a lot of compliance requirements. Even multiple AZs in AWS would not pass muster, as there are single points of failure (API, auth, etc).


Whether you realize it or not, you believe in the Scapegoat Effect, and it's going to get you into a shitload of trouble some day.

Customers don't care if it's your fault or not; they only care that your stuff is broken. That safety blanket of having a vendor to blame might feel like it'll protect your job, but the fact is that there are many points in your career where there is one customer you can't afford to lose, for financial or political reasons, and if your lack of pessimistic thinking loses that customer, then you're boned. You might not be fired, but you'll be at the top of the list for a layoff round (and if the loss was financial, that will happen).

In IT, we pay someone else to clean our offices and restock supplies because it's not part of our core business. It's fine to let that go. If I work at a hotel or a restaurant, though, 'we' have our own people that clean the buildings and equipment. Because a hotel is a clean, dry building that people rent in increments of 24 hours. Similarly, a restaurant has to build up a core competency in cleanliness or the health department will shut them down. If we violate that social contract, we take it in the teeth, and then people legislate away our opportunities to cut those corners.

For the life of me I can't figure out why IT companies are running to AWS. This is the exact same sort of facilities management problem that physical businesses deal with internally.

I have saved myself and my teams from a few architectural blunders by asking the head of IT or Operations what they think of my solution. Sometimes the answer starts with, "nobody would ever deploy a solution that looked like that". Better to get that feedback in private rather than in a post-mortem or via veto in a launch meeting. But I have had less and less access to that sort of domain knowledge over the last decade, between Cloud Services and centralized, faceless IT at some bigger companies. It's a huge loss of wisdom, and I don't know that the consequences are entirely outweighed by the advantages.


Erm.

In some orgs, recreating lost data, code, deployment and more is literally hundreds of thousands of hours of work.

In a smaller org, the devastation can be just as stark. Losing hundreds of hours of work can be a death knell.

Anyone advocating placing an entire org's future on one provider is literally, completely incompetent.

It's the equiv of a home user thinking all their baby pics will be safe on google or facebook. It is just plain dumb.


Having an additional AWS account that some S3 bucket backs up to, with write-only permissions (no delete), in an account that is not used by anyone, seems like a good idea for this type of situation/concern.


I had this experience when I asked about S3 backup too (after a junior programmer deleted a directory in our S3 bucket...). The response from r/aws was "just don't let that happen" (or "use IAM roles").


FYI: at the latest re:Invent, AWS announced a preview of AWS Backup for S3 (right now in us-west-2 only).

Relevant blog post, https://aws.amazon.com/blogs/aws/preview-aws-backup-adds-sup...


> I asked what if we delete data from app by mistake? They told me we need to be careful not to do that.

Ah, the Vinnie Boombatz treatment.


Maybe they are getting tired of arrogant older programmers assuming they cannot possibly be wrong. God forbid a 25-year-old might actually have a good idea (and I am far removed from my 20s).

Maybe having S3 redundancy wasn't the most important thing to be tackled? Does your company really need that complexity? Are you so big and such an important service that you cannot possibly risk going down or losing data?


You really chose to die on “backups are for old people” as a hill?


I'm not sure how you got "backups are for old people" from my post. My point is that there are two sides to this. Perhaps the data being stored on S3 _was_ backup data and this engineer was proposing replicating it to GCP. That's probably not the highest priority for most companies. Maybe the OP was right and the other engineers were wrong. Who knows.

In my experience, the kind of person that argues about "arrogant 25 year olds that know everything" is the kind of person that only sees their side of a discussion and refuses to understand the whole context. Maybe OP was in the right, maybe they weren't. But the fact that they are focusing on age and making ad hominem attacks is a red flag in my book.


I've most definitely been in numerous places where arrogant 25-year-olds with CS degrees, but not smart enough to make it to FAANG, think they know what they are talking about when they don't. Not every 25-year-old is an idiot, but many, especially in tech, think they are smarter than they are because they're paid these obscene amounts of money.


I'd love to know what someone works on where the risk of losing data is not worth one or two days of engineering work.


But that's just it; you can't even have that discussion if the response to "hey, should we be backing up beyond S3 redundancy?" is "No. Why would we? S3 is infallible"


Sure you can. As the experienced engineer in that setting it is a great opportunity to teach the less experienced engineers. For example, "I have seen data loss on S3 at my last job. If X, Y, or Z happen then we will lose data. Is this data we can lose? And actually, it is pretty easy to replicate - I think we could get this done in a day or two."

It's also possible the response was "That's an excellent point! I think we should put that on the backlog. Since this data is already a backup of our DB data, I think we should focus on getting the feature out rather than replicating to GCP."

Those are two plausible conversations. Instead, what we have is "these arrogant 25 year olds that have 1-2 years of experience and know it all." That's a red flag to me.


>"Maybe they are getting tired of arrogant older programmers..."

And this is of course valid reason to ignore basic data preservation approaches.

Myself I am an old fart and I realize that I am too independent / cautious. But I see way too many young programmers who just read sales pitch and honestly believe that once data is on Amazon/Azure/Google it is automatically safe, their apps are automatically scalable, etc. etc.


Yes - the point of that line was to be ridiculous. Age has nothing to do with it. Anyone at any age can have good ideas and bad ideas. There are some really incredible _older_ and highly experienced engineers out there. But there are others that think that experience means they are never wrong. What is important is your past experience, your understanding of the problem and its context, and how you work with your team.

And again, my point isn't that you never need backups. My point is that it is entirely plausible that at that point in time backups from S3 weren't a priority.


Sounds like the kind of short-term thinking that leads to companies being completely wiped out by ransomware. Who needs backups anyway?


It's not a lot of complexity.

Add object versioning for your bucket (1 click) and mirror/sync your bucket to another bucket (a few more clicks).

Yes, your S3 costs will double, but usually they're peanuts compared to all the other costs, anyway.

Debating it takes longer than configuring it.


As I understand it, Aeonflux was talking about redundant backups outside of S3, which are much more complex.


Cron-run SFTP from s3:// to digitalOcean://?
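On that idea: DigitalOcean Spaces exposes an S3-compatible API, so a scheduled copy can use plain S3 clients rather than SFTP. A hedged sketch - the bucket names and endpoint below are hypothetical, and the live client setup is left in comments:

```python
# Sketch of a cron-driven cross-provider mirror. The function only copies,
# never deletes, so a bad source can't wipe the mirror.

def mirror(src, dst, src_bucket, dst_bucket):
    """Copy every object from src_bucket to dst_bucket using S3-style clients."""
    paginator = src.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket):
        for obj in page.get("Contents", []):
            body = src.get_object(Bucket=src_bucket, Key=obj["Key"])["Body"]
            dst.upload_fileobj(body, dst_bucket, obj["Key"])

# Run from cron with real clients, e.g. (names/endpoint are examples):
#   import boto3
#   aws = boto3.client("s3")
#   spaces = boto3.client("s3",
#       endpoint_url="https://nyc3.digitaloceanspaces.com")
#   mirror(aws, spaces, "example-prod-bucket", "example-offsite-mirror")
```

A full-copy loop like this is only sensible for modest buckets; at scale you would compare keys/ETags first or use the providers' replication features instead.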


Would you put the one and only copy of your family photo album up on AWS, where AWS going down would mean losing it? Because your customers' data is more important than that.


AWS going down means I've lost it, or that I've temporarily lost access to it? Those are two very different scenarios. Of course S3 could lose data - a quick Google search shows it has happened to at least one account. My guess is it's rare enough that it seems like a reasonable decision not to prioritize backing up your S3 data. I'm not saying "never ever back up S3 data", only that it seems reasonable to argue it's not the most important thing our team should be working on at this moment.

I have my family photos on a RAIDed NAS. It took me years to get that setup simply because there were higher priority things in my life. I never once thought "ahh I don't need backups of our data" I just had more important things to do.


Losing data usually means losing customers - usually more customers than just the ones whose data you lost.


I suppose the caveat is you have to have customers to lose them :) We don't know what the data is or the size of the company.


Next time, also mention that it might be a problem to get a consistent backup across microservices...


AWS has had at least one documented incident where a region had an S3 failure that was not recoverable. They lost about 2% of all data. That might not sound like much but if you have a lot of data, partial restoration of that data doesn't necessarily leave your system in a functional state. If it loses my compiled CSS files I might be able to redeploy my app to fix it. Then again if I'm a SaaS company and that file was generated in part from user input, it might be more difficult to reconstruct that data.


Which incident is this? I can’t find it online. The closest I can recall is when they lost some number of EBS volumes. We were affected by that, but ran snapshots (to s3) to recover the affected servers.


Sorry, when was this? Please provide a citation.


If you don't have the spec, then you likely don't have a deadline, or a "project", really -- and something like this would be the wrong choice for an approach to follow.

I'd say leaving the overall roadmap (which is all this produces, at the end of the day, if you ignore the estimation piece) fuzzy and allowing the team to work that out with users/subject matter experts is the right approach, imo.


Well phrased, thank you! I didn't think of it in this way, but yes, that kind of phasing makes sense.


That's really tough, and I'm sorry you had to deal with it.

I was lucky in that I was dealing (in both cases where I've run similar flows to this) with above-board exec teams who wanted the best quality information I could give them - even assuming that estimates are just assumptions - even if it meant having some tough conversations about scope or headcount.


Like I said, it was early in my career. I now know how to handle this - ask more questions and work out what the real objective is. Also refuse to let people shout at me ;)


Insightful, thank you! The entire project management industry doesn't have that great a "hit rate" -- consider the budget overruns for the last few Olympics, or for Crossrail.

I'm just not sure why software projects are "special" -- if you can avoid it being a project and instead make it ongoing OpEx like, for example, GDS managed for the UK in 2016, then great, you've sidestepped that, but until the entire PM industry discovers how to improve overall project management techniques, I don't see why we'd consider our industry "above" them.


> instead make it ongoing OpEx

Every non-tech startup founder who's approached me with "how long will it take to build an MVP for my startup idea?" - I've answered with this. Development is an ongoing cost, a process, not a one-off capex cost.

I recommend they go the other way. Start with "how much can you afford to pay a dev team sustainably?", then work out how many devs that buys, then work out how long your MVP will take to build based on their estimates (and estimates are not deadlines).

Not quite the same as #no_estimates (which I also try to argue for whenever possible), but close.


It's not special, except that for whatever reason people insist on treating random data as gospel.

How long until we cure retinoblastoma? Everyone understands there is no way to produce a meaningful timeline. There are too many inter-related unknowns - various causes, various treatment modalities, varying funding on various fundamental and applied research, no real idea if the final answer is gene therapy, nano-something, chemo, etc.

I used to develop software for a defense contractor, and it was pretty waterfall-y. But we built risks in, to an extent. Not by multiplying Sally by 1.3x, Joe by 2.7x or whatever, but you'd chart it all out, showing interconnections (x depends on Y, which depends on Z and Q, which...). And then roughly figure out risks of each of those sub-tasks going long.

The idea NOT being that you then just multiply by a weight and, ta-da, you have an accurate schedule. The idea is that you have now identified the particularly risky chains of events, and now you at least have a chance of managing risk. Every day/week you'd have a risk assessment meeting. Where are we on X, Y, Z? What can we do to get X back on track? Can't? Okay, can we re-jigger the dependencies, or is this a hard slip? And so on. I've never seen this done on the commercial side, and it seems like people are flying blind as a result.

"Waterfall is terrible," you reply. Sure. But when you are building an airplane, ya kinda need the 1553B/ARINC bus installed before you install and test the avionics. You can't attach the engines if the wings haven't shown up. You can't redesign the fuselage after the wing builder has started building the wings (in general). These are hard, unavoidable dependencies, and changes are often extremely, even destructively, expensive (hence the endless mind-numbing meetings arguing about change control).

It is just (IMO) not an unsolved problem, but an unsolvable one. Too many unknowns result in unpredictability. Your only bet is to manage the risks, adjust as necessary, and accept that some things are just unknowable. Agile does that in one way, sophisticated waterfall in another.


I'd say that software is "special" because it's ephemeral, meaning there's incredibly few limitations on the possibilities (or changes to requirements mid-project) when compared to projects involving physical items. It /can/ be managed like a physical engineering project, but the cost and time ramp up so severely that it's not practical for most situations.


A project plan is only a point-in-time prediction of a project's end date. It should be continuously updated as new facts become known (scope changes, additional complexity, unknown risks materialising).

Obviously, as you progress through your project, your predicted end date should become more and more accurate.

A project plan, though, is a prediction of the future. I've not seen anyone that can predict the future perfectly. That's not to say that you shouldn't do it, but defective project management creates a project plan on Day 0 and then tries to bend reality to meet the plan.

*Obviously, all of the above is caveated with reasonableness - you do try to bend reality a certain amount to meet your plan, and you try to keep to your plan as much as possible.


You can't really do that with the Olympics, though, as there is a hard deadline. Once you have a fixed date, either quality or cost is going to have to give.


I think I would have popped a monocle too, had that been the real example and estimate from a team!

I tried to think up an accessible example that didn't require too much context on the part of the reader, so obviously, as you correctly point out, all the numbers are made up, and I'm trying to use it solely to demonstrate the workflow :)


> all the numbers are made up

Well... have you actually applied this process successfully? If so, wouldn't you have some actual numbers to point to from a past project? Names and details changed a bit to protect the innocent, of course.


I have, yes - or I'd feel a bit like a charlatan writing about it!

The problem with the real world examples is the business domain, which was hyper complex and the specific "pieces" of work I described wouldn't have been easy to grasp for most not familiar with the esoteric side of fintech that the project took place in.

So I went with a simpler, albeit contrived and more accessible example.


Were they monocle-popping results?


Author here. I've been lucky enough in my career to hold some senior positions, and I thought I'd give a step-by-step on an approach I took with an "enterprise-scale" software project, and how I stole some techniques from university project management courses to meet with some success. Happy to answer any questions :)


I'm curious that you mention project management courses from university. How much benefit have you found them to provide in practice?

Asking b/c I've taken two courses in dev process or project management in my academic career, and neither provided substantial value or benefit to how I've led projects professionally.


In the early stages of my career, or in startup life? Absolutely not. Very little relevance; XP/SCRUM were both covered together in a single 50 minute lecture, the rest of the PM aspects were tackling paperwork-generation methodologies like the "Rational Unified Process" and "Dynamic Systems Development Model", both of which I feel like would be _hell_ if I actually had to work within.

However, there were techniques (like critical path analysis) that as I've got more senior, started working at larger companies, and started stepping down the senior leadership path, I've started to see some applicability to. Not direct applications - they still need taking with a massive pinch of salt, and modifying for modern learnings in industry, but they do start to provide some value, even if it's just learning what the grey-hairs in the exec are used to seeing :)
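For illustration, critical path analysis boils down to finding the longest path through the task dependency graph: the critical path is the chain of tasks whose durations sum to the project length. A minimal sketch (task names and durations here are made up, not from any real project):

```python
# Critical-path sketch: each task maps to (duration in days, prerequisites).
from functools import lru_cache

tasks = {
    "design":    (3, []),
    "backend":   (5, ["design"]),
    "frontend":  (4, ["design"]),
    "integrate": (2, ["backend", "frontend"]),
}

@lru_cache(maxsize=None)
def earliest_finish(name):
    # A task finishes its own duration after the latest of its prerequisites.
    duration, deps = tasks[name]
    return duration + max((earliest_finish(d) for d in deps), default=0)

# Project length = longest (critical) path through the dependency graph.
project_days = max(earliest_finish(t) for t in tasks)
print(project_days)  # design -> backend -> integrate = 3 + 5 + 2 = 10
```

Tasks off the critical path ("frontend" here) have slack; delaying them a little doesn't move the end date, which is the main insight the technique gives you.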


Fully agree. There are huge nuggets of wisdom in old-school engineering. For example, PERT is something most people don't know about, but it's simple to apply in the real world: get three estimates - optimistic, pessimistic, and realistic - then double the realistic, add the optimistic and pessimistic, and divide by four. It's basically a center-weighted estimate, but it forces you to think about the corner-cases and what could go wrong.

Lots of other old school stuff that we "just turning grey" people need to translate for the new kids.
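As a sketch, the weighting described above looks like this (note the classic PERT formula weights the realistic/most-likely estimate by four and divides by six; the parent describes a simpler divide-by-four variant):

```python
# Center-weighted estimate as described above: (O + 2R + P) / 4.
# Classic PERT instead uses (O + 4M + P) / 6.
def pert_estimate(optimistic, realistic, pessimistic):
    return (optimistic + 2 * realistic + pessimistic) / 4

# e.g. a task guessed at 3 days best-case, 5 days realistic, 11 days worst-case:
print(pert_estimate(3, 5, 11))  # 6.0 - the pessimistic tail pulls it above 5
```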


How are deadline dates assigned? Is the deadline exactly the same as the estimated completion date?

Realistic estimates aren't padded, but they still have significant probability of being inaccurate. After all, they're estimates, not information from the future transmitted to the past.


This is where the ideal meets the annoying reality of The Enterprise (tm).

I can't talk in too much detail, but in general, the deadline date was fixed through commercial contracts signed at a high enough level that engineering didn't have sight of them. The concept and commercial case was sound, but the implementation hadn't been worked out yet, when a date was set.

My strong preference would be for estimation to come first, of course, before a deadline is picked (and even then, only picked if it is really a necessary deadline), which is then based on reality, and also include some slack for unintended discoveries.


I had this "Enterprise sets the deadlines" experience just yesterday.

I delivered a spreadsheet with detailed information of how long it would take to develop each feature of the new module they wanted. It totalled around 3 months of development.

Fancy suit folks told me 3 months wouldn't do because sales promised it would be ready in 2 months. Then I was asked where could shortcuts be taken.

In the end we had to cut features that will certainly upset our client given their initial expectation.


Thanks for the article. Curious how you ensure dev outputs match up with other (external) deadlines, for example, the sales/marketing teams firing up a promotion of your new functionality.

How do you incentivise devs to hit those timeframes?


Our paths crossed on the engineering team of a certain letterbox flower company, for a short time at least. Good to see you doing well Tom, nice article.


Thanks! And I couldn't possibly comment on which company that could be...

Hope you're doing well too!


Do you think there is a difference in how US and UK tech companies approach project planning?


I've never worked for a US tech company, so I wouldn't know, sorry!


What tools and methods do you use for creating, sharing, and modifying project charts?


Depends how much of a perfectionist I'm feeling. For the initial development, a sharpie, index cards, and a whiteboard wall - or Lucidchart, because it's basically an in-browser whiteboard/drawing tool.

Once the project is ongoing and you'll need to account for changes, I've either done it manually (which takes an age) or handed it off to PMs to oversee using MS Project, Airtable with a custom-authored set of actions, or Primavera.


I did read the article.

I understand the desire to "do something", but the certification and testing process for something like this is how you find out if this actually improves patient outcomes or not. Sometimes "doing nothing" in a certain area gives a better outcome.

I'm glad they're advising doctors to use caution.

I have some architectural concerns about how it's built (a browser's javascript engine is not reliable enough to be used as a safety-critical alerting engine!), but I can see how it's attractive.


It feels like you've lost the forest for the trees.


I apologise for this comment, you've done some great coding, but this scares the shit out of me.

There's a reason medical certifications are so hard to get, and medical software is so expensive.

You're storing patient information in postgres. What certifications do you have to assert that the patient data is stored securely, in line with your government guidelines on patient/medical data? There's a damn good reason this is the "holy grail" of information security certifications.

You've got critical alerting built into the browser window using JavaScript.

This "alerting" is the kind of critical thing that sometimes needs *immediate" intervention, or someone could die. What happens if your browser experiences a JavaScript error blocking processing? And your alerts don't fire?

What happens if they fire too often and you get "alert fatigue" because they're not tuned correctly or in line with the other alerts available at the bedside/nursing station?

How much testing have you done to correctly assert that you're interpreting the HL7 or other specs correctly? And aren't misinterpreting data for some conditions or types of individual?

The "throw things together quickly" startup mentality might (I stress might!) Be okay where it's the difference between nothing at all and something that can save lives, in a country like Sri Lanka, during a global pandemic, fine.

But afterwards, this is so much junk without serious thought and time put into certifying it.

Medical, Aerospace -- really, any safety critical industry where your code working or not could mean someone is seriously injured or dies as a result -- is an industry that needs disruption, but that disruption should happen slowly, carefully, and safely.


> We created this software on a request from healthcare staff

If this is some small-town hospital in Sri Lanka, the choice is between an unaffordable certified solution and not having any monitoring. If medical software vendors didn't bleed them dry, they wouldn't go this route.

> disruption should happen slowly, carefully, and safely

Disruption always happens this way - same way Uber broke existing laws. Yes, a few people will die. But this isn't surprising when the alternative is even worse.


> the choice is between an unaffordable certified solution and not having any monitoring.

No, this isn't _necessarily_ the choice. Without a "false sense of security" that an imperfect monitoring system might instil, you have nurses and doctors actually doing rounds and checking their patients.

> Disruption always happens this way - same way Uber broke existing laws. Yes, a few people will die. But this isn't surprising when the alternative is even worse.

This is an absolutely horrible viewpoint to have. People dying because of "disruption" so a few companies can make a few more dollars is _never_ acceptable.


It's funny-sad watching my fellow tech people debate civics and public policy and talk about how often "Something must be done, this is 'something', so we will do it" exhibits itself. Everyone nods or cheers as if we have some leg to stand on.

When it comes to solving technical problems? We are ever so happy to do exactly the same thing.

Any solution is better than no solution. Except when no solution causes people to stop trying to delegate an important responsibility. Which is quite frequently.

A crap solution crowds the problem space. If a better solution is possible, it now has to defend itself against the incumbent. Explain why it is more expensive, why people should be bothered to switch.

If you can't do something well, then for pity's sake let someone else try. Log away every cost of not doing it at all and then when you can justify doing it well, build your pitch.


I think we can only ascertain whether this is a good or bad thing if there were data on the number of valid abnormalities caught by this system versus having nurses and doctors do rounds. We also have to take into consideration that they may run out of money for disposable protective gear, or even have the amount of protective gear available for purchase drastically reduced. From his disclaimer in the post, it also seems like they're using this on top of their typical monitoring, so that the staff can have insight in between visits.


Indeed. It’s like rubber gloves during this pandemic. People think once they’re on they’re protected - you only gain increased protection if you know what you’re doing.


I die a little every time a store employee wearing gloves gives me change from the register. This is not better.


You are the one who brought up “disruption”. In this particular case, someone created a free/affordable solution for the hospital. I am not sure how you can read “make a few more dollars”


Yes, in the worst case people could die; that is disruption, that's the reality. The sooner you accept that, the better, or you're going to have a hard time.


I had the same thought. But I think it's more complex than that.

Always consider the alternative. This could be a hospital in a remote part of a third-world country. Maybe they're understaffed. How are they currently handling the task of gathering information from monitoring devices and reacting to alarms?

Maybe their nursing staff has to run from bed to bed to check patients' vital signs and device alarms. Emergencies would frequently be missed because they are understaffed and checking is irregular. Now, you could introduce software which provides centralized monitoring. If it's introduced on top of the existing activities (i.e., running from bed to bed), it leads to a net benefit - you catch emergencies earlier and consequences of malfunctions are less severe. But if it's introduced to replace the existing activities, it may lead to patient harm.

Sure, it's self-coded, browser-based and buggy - but you always need to weigh risks with benefits, and those depend on usage context.

Of course, in most western countries, this would be completely illegal. But these are also the countries in which medical software looks like it's from the 90s, with catastrophic usability and missing features.

We need to ask ourselves: Right now, we heavily prioritize patient safety over innovation - but have we got that balance right? What are patients missing out on if we could just bring a few more of the latest advances in technology to their bedside?

You know, not machine learning, the blockchain or the internet of things. Rather things like browser-based applications which "just work" and have great usability.

Note: I'm a physician, software developer and consultant for medical software certification :)


> Maybe their nursing staff has to run from bed to bed to check patients' vital signs and device alarms.

It feels to me like the management has misunderstood the cost of the software vs not having the software. It feels like they're saying "this software is expensive, and doing nothing is free" when they should be saying "having all these healthcare professionals spending time putting on and taking off PPE to check patients is costing us this much per year".

As you probably know, an ICU will go through 30 sets of PPE per patient per day. That's a lot of time putting stuff on and taking it off.


Sure, but there are plenty of technologies that are applicable to safety-critical systems or are safety-critical adjacent which are freely available. There are MCUs, application boards, RTOSs, programming languages, compiler toolchains, network stacks, parsers, etc. available which are the same as, or close to, those which would commonly be sourced and deployed in a safety-critical context.

So, why not use those to build the "something is better than nothing" solution?


Availability.

Just availability.

This was a quick and dirty hack to improve access to patient data done with what was on-hand, for a constrained deployment using specific known devices. They didn't have anyone with knowledge on using any of the tech you mentioned, some of which requires spending months setting up unless you have practical experience in delivering on the platforms. Just getting a more safety-minded setup for a MCU using free software can be a harrowing experience.

And they don't have the money to just contract it out or pay for the commercial grade stuff.

They did what they could with what they had, with explicit mention that it's not good on safety and security - but it brings some benefit now.

Here in Poland, a few weeks into lockdown, nobody asked for certifications on volunteer made PPE parts anymore. Because a shoddy PPE with no certification was still better than none.


You’re right. There is no way this would be deployed in a UK hospital as it stands. It might be some of the most dangerous ideas encapsulated in code I’ve ever seen. I disagree with standards like DCB0129, but they’re there for a reason. This would not pass.


The UK also has a lot more resources to work with, even with the Conservative government trying to break NHS financing for the last decade.


This is the comment that was in my thoughts and I failed to write it.

I really hope they 1. open source it, 2. continue to work on this throughout the crisis and get it to a state where it's actually suitable for critical care, and then 3. work on achieving the relevant certification.

It sounds (just guessing) like the device vendor sells their own software separately, and is unwilling to budge on price during this time, forcing an already stretched hospital to look for new solutions.


The perfect is the enemy of the good.

This could likely be "good enough" for those that have no other options if open sourced.


That’s one of the dumbest platitudes ever deployed to deflect criticism, and I wish people would use it correctly.

“This thing has absolutely no evidence of reliability or safety in a critical environment” is not criticizing it for being less-than-perfect. It’s criticizing it for being possibly inferior to the status quo.

Here’s one simple example:

Staff gowning up for routine rounds are much more careful, and safe, than staff rushing into an emergency code. If this thing throws up even the occasional false alarm, its cost to staff (in exposure) could easily and massively outweigh any reduced rounding requirements.

That’s not “oh, well that’s not perfect.” That’s “oh, that might be worse, masquerading as better.”

“Perfect is the enemy of the good” is a wildly irrelevant comment.


FTA:

> The deadly virus can infect you with a very small mistake. As healthcare workers, our frontline has to wander around the isolation wards to check vital signs of a patient from time to time. This task involves disposing of the protective gear after a visit. All just to check some reading on a device.

> A request from health authorities reached us to develop a remote monitoring system for isolation wards. There are expensive softwares to remotely monitor them. But Sri Lanka might not be that rich to spend such amount of money.

I think you're wrong in this case.

edit: formatting


I think you're misunderstanding the critique of the parent... In the software world we often tend to interpret "The perfect is the enemy of the good" as "If it's the only software solution, it must certainly be a good one." But sometimes there are non-software solutions that are even better suited to solve the problem - engineering-wise, those MUST(!) also be taken into account.

What makes you think the team covered enough edge-cases to be "good enough" software? Do you think the presentation in a single blog post is enough information about a system to determine its quality and reliability?


> If it's the only software solution it most certainly must be a good one.

We have different interpretations. For me, TPITEOTG means:

Choose one: a solution that works well but is clearly not perfect, or no solution at all.

> Do you think the presentation in a single blog post is enough information about a system to determine its quality and reliability?

Epilogue FTA:

> We created this software on a request from healthcare staffs. It is not a commercial application. Even with this system, we strongly suggest doctors to visit their patients, take real measurements.

> As this software was developed fast due to prevailing pandemic situation, we released it with the most urgent feature monitoring. We tested this for long run, with multiple devices as well. So far it worked out well.

> It does not indicate this is perfect, we are working on improvements and fixing bugs until its very stable.

> Thus we have adviced doctors to use this with CAUTION

Many of the complaints in the OP were specious for the situation in play:

> You're storing patient information in postgres. What certifications do you have to assert that the patient data is stored securely, in line with your government guidelines on patient/medical data? There's a damn good reason this is the "holy grail" of information security certifications.

This is monitoring data from dying patients in a third-world country. Do you really think that they should have spent a couple of months making sure hackers couldn't access patients' vitals before putting it into use?

> You've got critical alerting built into the browser window using JavaScript.

Yes, because that is the language of the UI toolkit they are using.

> This "alerting" is the kind of critical thing that sometimes needs immediate" intervention, or someone could die. What happens if your browser experiences a JavaScript error blocking processing? And your alerts don't fire?

The alternative appeared to be that those alerts might not be noticed anyway because they might not have the staff to gown up and go into each room frequently enough.

> What happens if they fire too often and you get "alert fatigue" because they're not tuned correctly or in line with the other alerts available at the bedside/nursing station?

What happens if the device in the room fires too often?

> How much testing have you done to correctly assert that you're interpreting the HL7 or other specs correctly? And aren't misinterpreting data for some conditions or types of individual?

They seemed to find that it was accurate enough for the crisis at hand.

> The "throw things together quickly" startup mentality might (I stress might!) Be okay where it's the difference between nothing at all and something that can save lives, in a country like Sri Lanka, during a global pandemic, fine.

Whelp, here comes a "not perfect but good enough to use" part.

> <further hand wringing on future concerns irrelevant to the situation under discussion>


You gotta start from something. That is progress. You make improvements over time. Sure, in the worst case people can die; that's something you have to accept.

