Being on-call/carrying the pager for a complex and unstable distributed system is a great way to understand how to build a good one. Building a distributed system isn't hard per-se. It's almost always the non-determinism and the associated difficulty in debugging errors that's a problem.
This has been my experience. The pieces are simple, but being able to understand what is going on in the whole system is not. Detailed, high-quality structured logging, sent to a central location that can be queried (e.g. Elasticsearch / Kibana), is very important. I have come across several experienced developers who really don't like the idea of verbose logging, but in my experience it has been critically important for dealing with a mesh of microservices.
"How does this even get invoked" and "Where would I go to find these logs" are the two questions whose answers should be obvious or obviously documented. If they are not, there is a big problem with that particular system.
My current company for whatever reason does not value logging at all, to the point where for lots of our systems there just aren't logs, or nobody has ever bothered to look for them. It's pretty astonishing to me, as someone who has always treated logging as one of the highest-value mechanisms for increasing traceability.
Prior to moving to software development, I worked 5 years in Support and consequently I have a murderous hatred of anyone writing software that is hard to debug and logging is a huge part of that.
I was actually greatly inspired by a product I used to support which had such detailed logs that you could troubleshoot almost any issue from the (rather huge) log dump. I try to do the same now.
I really wish all developers had to do support at the beginning of their careers so that they understand how to build things that the ops team can work with. DevOps/SREs are one approach to this, but I'd really like to see it be more widespread.
For any new work that I do, logging will obviously be at the top of my priority list. Unfortunately, the systems that need the logging the most are the systems that I can only touch when they go down.
It's basically what happens when technically incompetent and inexperienced people wear management hats. If I was running a firm, the engineers would dictate to management, not the other way around.
Yep, you never know which logs you need until you need them. If there's an economic factor coming into play (because proper telemetry will cost a noticeable amount of money), you can always choose to sample the more verbose logs.
Honest Q: Isn't that where tracers like AWS X-Ray and Google Dapper come in? And somewhere down the line, solutions like Envoy/Istio try to make the whole ordeal manageable?
You can implement distributed tracing with logs if every player in the system is propagating the headers appropriately. Tools like New Relic APM and X-Ray will mostly take care of that bit for you.
Service mesh data planes have a wider scope, but they do make it much easier to implement distributed tracing (I think Istio includes this out of the box).
In the end though these tools can’t replace app logs because they only let you reconstruct what happens in between services, not internal to them.
Nothing cloud is relevant when your production network has no outbound access to the internet due to PCI. ;)
Not sure about the other two (will check them out), but we have apps in 4 programming languages across several teams. Structured (JSON) logging is the easiest way of sending data to one place from literally everywhere.
> Building a distributed system isn't hard per-se.
I think this may be a bit overstated.
It's certainly true that most of the algorithms, etc. are -- if not necessarily simple -- at least understandable/understood generally. IME the problem is really all of the engineering around the algorithmic stuff.
One thing which I don't think is really well understood yet is different levels of Consistency. There are a lot of trade-offs to be made here, but generally I find that it's really hard to help people understand what those trade-offs are and how they could impact the UX and business.
(After that there's things like how to handle configuration changes safely, how to properly dispose of nodes, etc. etc.)
What you're saying is true, but I think a big problem in general is that while bespoke problems often require bespoke solutions, there's a lot of common problems out there and using a common solution goes a long way. They make the pitfalls and limitations obvious, help greatly in staff turnover and growing your team, and make things more intuitive and consistent.
I see a lot of systems that work pretty well when the author maintains it, but someone else jumps in and has no idea what server some part is running on, where the credentials might be, or where the log file is.
This is why I roll my eyes when I see resumes, especially from senior engineers, that show a job hop every year or two. They may never have deployed their system or, even if they did, may not have had enough support time to learn how supportable their system is, how it performs, how modular it is when requirements change, etc. Resume looks great, though.
I have to dispute this. A few months is sufficient to appreciate the stated concerns. A lot can happen in a few months. I am proof of the same. I think under two months is too short, however.
> Building a distributed system isn't hard per-se.
Well, we do have some ugly proofs that building a proper distributed system is impossible, leading to many trade-offs that upset one customer or another at times... There aren't that many areas in computer science which are as difficult as building reliable, scalable distributed systems.
I think successfully doing this could be a good way, but I don’t think just carrying the pager would teach you much. I think unless you are able to understand and debug the system and handle alerts effectively, this experience would just leave you frustrated.
As you can see, over the last 3 years, we've rewritten large parts of our infrastructure in golang. While we still use python for a lot of things, we felt that the type safety and concurrency primitives in go were a much better fit for writing some of our core services.
We try to avoid microservices wherever possible. If we're adding something new, it typically starts off as part of the service being deployed - either as a container within the pod (we use kubernetes) or as a library that the code can use. If something grows big enough in a way that it can't scale with the service it's running with, we split it into a separate service. The opposite is also true - if a service that we run no longer warrants a separate deployment, we make it a container or a library. We use gRPC for most communication and interfaces for anything that crosses package boundaries. Both of these help with making the split/aggregation a lot easier to manage.
Interesting stats. Am I interpreting it correctly that the average python dev is doing 6 commits a month at 25 lines per commit? 150 lines in 160 hours?
Averages are a little misleading because pretty much everyone touches python, but only about half the engineering team uses it as their primary programming language. I don't have the median number of commits or lines per commit handy - but that would be a better approximation of developer productivity.
I've just updated the post to make it clearer that I'm only talking about services that I used. GKE definitely looks promising, but I haven't used it so I can't give an opinion on it.
I will add from my personal experience that GKE has come a long way, and it is probably the best managed K8s experience now. Still has some warts - a lot of the stuff you'd expect out of the box is still beta, or preview. And until this year, I wouldn't have used GKE for any significant installs, at least not without a really good support contract.
His use of b.N is correct. The code you have is simply multiplying N by 100 with the inner for loop - so your times are 100x of what each "concat" operation (+,WriteString) takes.
You are also allocating a new string/buffer/builder for every run - which is not useful if you want to just benchmark concat.
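For reference, a minimal sketch of what I mean. This is not the code from the post being discussed; the function names and the two input strings are made up. Each benchmark does exactly one concat operation per iteration, lets `b.N` alone control the iteration count, and allocates the buffer once outside the loop.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// Package-level variables so the compiler cannot fold the concatenation
// at compile time or discard the result.
var (
	left  = "hello, "
	right = "world"
	sink  string
)

// BenchmarkPlus performs one `+` concatenation per iteration.
// There is no inner loop: the framework picks b.N by itself.
func BenchmarkPlus(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = left + right
	}
}

// BenchmarkBuffer performs the same concatenation through a bytes.Buffer
// that is allocated once; Reset keeps the underlying storage.
func BenchmarkBuffer(b *testing.B) {
	var buf bytes.Buffer
	for i := 0; i < b.N; i++ {
		buf.Reset()
		buf.WriteString(left)
		buf.WriteString(right)
	}
	sink = buf.String()
}

func main() {
	// In a _test.go file you would just run `go test -bench=.`.
	// testing.Init registers the testing flags when running outside `go test`.
	testing.Init()
	fmt.Println("plus:  ", testing.Benchmark(BenchmarkPlus))
	fmt.Println("buffer:", testing.Benchmark(BenchmarkBuffer))
}
```

With an extra inner loop of 100, the reported ns/op would be for 100 operations rather than one, which is where the 100x comes from.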
I really hope they use some of this money to better screen dashers and improve customer support.
We used to order lunch on Doordash once a week till things got so bad that we decided never to use Doordash again. Orders would regularly have missing or incorrect items. Some dashers would leave the food in the lobby, text me, and leave. Delivery was hardly ever on time and there was no way to get in touch with a human in customer care except through some unlisted numbers. We've switched to Caviar (https://www.trycaviar.com) since and have no complaints.
That's like saying if everyone maxes out their health insurance, insurance companies will go out of business. In any given year, not everyone is going to use the ~$8000 allocated to them. Also, healthcare in India is cheap. $8000 goes a long way.
> healthcare in India is cheap. $8000 goes a long way.
Yeah I think ₹5 lakh is excessive. I have a ₹2 lakh cover for me and my wife and unless we both are diagnosed with bone cancer simultaneously we are in no danger of maxing it out.
[IANAL] "Computer programmers may no longer be eligible for H-1B visas" -
This is absurd and incorrect. The USCIS memorandum is rescinding a document dated December 2000 because it doesn't want the Nebraska Service Center (NSC) to use the old guidelines for issuing H-1B visas for computer related positions. The NSC started adjudicating H-1B applications last year after a hiatus of around 10 years.
I don't quite understand the memorandum [1] yet either, but it does say [2]:
> the fact that a person may be employed as a computer programmer and may use information technology skills and knowledge to help an enterprise achieve its goals in the course of his or her job is not sufficient to establish the position as a specialty occupation.
So, calling the thesis of the article "absurd and incorrect" is probably too early.
[IANAL] New guidelines as to what constitutes a specialty occupation are already in place at the other centers handling H-1B applications. The only thing this memo is saying is "Hey, someone issued you folks a memorandum in December 2000 that shouldn't be used anymore because things have changed. We're rescinding it so that no one uses it accidentally."
Moreover, if USCIS thought this would affect already-filed applications, I think they would've made a clear blog post stating it as opposed to dumping it in a hard-to-read memorandum. (I'm subscribed to their newsletter which I've observed sends out all pertinent news with no delays).
This is not the same as the new suspension. The one in 2015 only affected extension of stay applications (if you want to extend your visa beyond 6 years). Typically people start the extension of stay process well in advance (USCIS lets you start at most 6 months before expiry).
The current one affects anyone trying to file a new petition (including those that are cap-exempt). If someone wants to switch employers they cannot use premium processing, which affects their ability to travel internationally and/or move jobs.
The link in pavanky's post spells it out in plain English:
> Starting May 26, 2015, USCIS will temporarily suspend premium processing for all H-1B extension of stay petitions until July 27, 2015. During this time frame, petitioners will not be able to file Form I-907, Request for Premium Processing Service, for a Form I-129, Petition for a Nonimmigrant Worker, requesting an extension of the stay for an H-1B nonimmigrant.
In summary: yes, the 2015 suspension was just for extensions. No, it wasn't for all applications.
This is not against the law. If your visa is current, you can start working as soon as you get the receipt notice for the new H-1B application: https://www.murthy.com/2011/06/27/h1b-faqs/