The architecture of Uber’s API gateway (uber.com)
148 points by zerop on May 22, 2021 | 43 comments



I think you will find that Uber engineering is just like any other place - a lot of silly mistakes while people learn on the company dime.

This entry, for example, mentions how they avoided goroutines as a "performance concern" without any data to back that up. It's bush league to think you can do it better yourself.

https://eng.uber.com/go-geofence-highest-query-per-second-se...


That post you linked is the one that got heavily criticised by a Bing Maps engineer for being under-engineered[1].

1. https://medium.com/@buckhx/unwinding-uber-s-most-efficient-s...


While the linked post may be right, I completely agree with the Uber engineers that it sounds too complicated. I can understand the double polygon search without thinking about it.

I also understand the double polygon search with bounding box optimization, so I’m not sure why that wasn’t used.
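For reference, a minimal sketch of that optimization in Go (my own hypothetical code, not from either post): precompute each polygon's axis-aligned bounding box and only run the full point-in-polygon test when the cheap box check passes.

    package geofence

    // Point is a longitude/latitude pair.
    type Point struct{ X, Y float64 }

    // Fence is a polygon plus its precomputed axis-aligned bounding box.
    type Fence struct {
        Poly                   []Point
        MinX, MinY, MaxX, MaxY float64
    }

    // Contains rejects points outside the bounding box first (four cheap
    // comparisons), then runs the standard ray-casting point-in-polygon
    // test only on candidates that survive.
    func (f *Fence) Contains(p Point) bool {
        if p.X < f.MinX || p.X > f.MaxX || p.Y < f.MinY || p.Y > f.MaxY {
            return false
        }
        inside := false
        for i, j := 0, len(f.Poly)-1; i < len(f.Poly); j, i = i, i+1 {
            a, b := f.Poly[i], f.Poly[j]
            if (a.Y > p.Y) != (b.Y > p.Y) &&
                p.X < (b.X-a.X)*(p.Y-a.Y)/(b.Y-a.Y)+a.X {
                inside = !inside
            }
        }
        return inside
    }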


If that’s the case, it’s weird that they would take the time to write a blog post about how great their algorithm is while effectively saying that they didn’t understand the other approaches and that they were too hard.


This is probably harsh, but the description of their API gateway sounds like the description of CORBA from the late 90s. If you're just getting started and need something simple but powerful, go with OpenResty: IMO it gives you all the benefits and it's super fast and lightweight. You can get advanced with cookie sharing/signing, or just do simple logging to statsd. It's really good if you're starting out, to get operational experience running nginx, statsd, etc.


For anybody looking at OpenResty, it's also worthwhile to take a look at Kong, which is the largest OpenResty-based application and already has the right abstractions in place for API management: https://github.com/Kong/kong


The people defining CORBA faced many of the same issues we face with microservices today.


After working with Distributed COM, I'd go as far as to say that people building DCOM software were doing microservices, way before that term was known or popular - microservices at much finer granularity (per object), and all the bells & whistles people build startups off of today - like load balancing, service discovery, strong auth, etc. - were already built into the platform and properly integrated.

Alas, it was ahead of its time; the legacy of C makes it hard to work with.


Eh, but stateful and synchronous and chatty. I don't think DCOM has too many good ideas, beyond what it shares with CORBA - mostly, an IDL. It's good to have an IDL if you're going to be serious about heterogeneous implementations.


CORBA never scaled to the level that Uber needs to. Pre-pandemic, you’re talking hundreds of thousands to millions of requests per second globally. Also you never saw a CORBA project with thousands of engineers checking in code multiple times a day. All these are factors too.

People on HN always think “all it’s doing is matching a rider with a driver, why is it so complicated? I can write it in a weekend!” Sure, but it won’t scale.


I doubt they have millions of driver-to-rider matching requests per second. A lot of people use Uber, but they don't need a ride every second.


At peak I can see it being low single digit millions per second. A single client will do more than 1 rps; there's a lot going on per client.


This GUI must have pretty advanced versioning and rollback support, otherwise I can see one user borking the whole API with a bad change and nowhere to check what happened.


It did not. The deployment system at Uber was a f*ing nightmare.

Things would fail, rollback, and then the logs would have their errors truncated or something. I wasted so many days deploying botched releases from coworkers.

And the use of Phabricator at Uber was a nightmare. LLVM uses it correctly; IDK what Uber did, but it was a PITA to do much of anything.

Pair that with the siloed off teams where "every team is its own startup" mentality and you have constant fighting, power grabs, being blocked all the time, etc.


Look at it positively. At least the rollback is working. Our version of rollback is to redeploy the old version and hope that the database migrations will work with the old one.


That was absolutely a problem at Uber, too.


Former Uber employee who worked on this system here. It maps changes onto git changes (technically Phabricator diffs), and the system opens the diff for you.


Just to add, the changes are just config changes at this point and not code changes.

On one user taking down all APIs: the system has the ability to skip unmountable APIs, and we catch bad changes during user interaction with tons of tests and validations.


You only change one endpoint at a time. Fixes are rollforward since you would be reverting changes to other endpoints if you rolled back.


Enjoying the article so far, but some things make it a bit hard to read: 1) code snippets are screenshots, and 2) some links point to an internal Google Docs page.


oops.. thanks for sharing this. we will get that corrected.


It's a decent overview, but you could easily do a whole blog series on each category of the gateway that they touch on. Is there a deeper dive for each part? Just the AuthNZ could get very complicated depending on requirements (they have multiple implementations, which causes a mini crisis of what to rely on for what, and how).


I worked at Uber for a few years.

This was a terrible layer. Everyone hated it when I was there. People ended up building logic directly into the API gateway because it was so difficult to use.

I am so glad to never have to look at RTAPI again.


This article is about the newer Edge Gateway and doesn't mention anything about RTAPI. When did you leave Uber?


RTAPI is the internal name. Why would it be mentioned here?

I left too late. The engineering in that company was abysmal.


RTAPI is mentioned in the previous blog post [1]; it sounds like the new edge gateway supersedes it.

[1] https://eng.uber.com/gatewayuberapi/


Nice find. I stand corrected.


One of the authors here, happy to answer questions.


Did you consider using existing API Gateway solutions, either OSS or commercial? Why did you decide to build your own?

Given that Uber is an engineering heavy and tech-centric organization, why did you choose to do configuration through the UI? Why not configuration and infrastructure as code?


The issues that you had with Go: "Language naming conventions like ID, HTTP, and reserved keywords in Go (but not in Thrift) created failures that exposed the internal implementation details to the end users."

How did you go about solving those?

Did y'all work with the Go team to resolve the other issues you stumbled upon?


Since the final artifact was generated, we were able to work around it by annotating the Thrift with alternate field names.
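To illustrate the problem (hypothetical names; not the exact annotation syntax): a Thrift field spelled id comes out as ID under Go naming conventions, so the external wire name has to be pinned explicitly or the Go-side rename leaks to clients. Conceptually the generated code ends up looking like:

    // Hypothetical illustration: Thrift field "id" becomes Go's "ID",
    // while the tag pins the external field name seen by end users.
    type TripRequest struct {
        ID string `json:"id"`
    }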


Just make sure to not accidentally add an extra space in the Thrift annotation, or else it will globally bring down upfront pricing


If you guys could easily start over, would you have still picked Go given the problems you encountered? Or would Java have been a more likely candidate?


Why not just use OOP Scala? It's so much cleaner and more readable.


I tried starting an "OOP Scala" project. The entire Scala ecosystem, at least the parts that seem good and active, seems to be FP-oriented.

If you simply want a better Java, I’d try Kotlin. Or Java 15.


Were existing API management solutions like Mulesoft considered?

And while on the reuse topic, will Uber open source this?


Side question: anyone know of a gateway solution able to do blue/green deployments?


Envoy-based gateways such as Istio's ingress gateway or Ambassador use traffic shadowing to achieve this. Traefik also has shadowing.
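At its core (a generic sketch with hypothetical upstream names, not any particular gateway's API), blue/green cutover reduces to weighted selection between two upstream pools:

    package router

    import "math/rand"

    // pickUpstream is a minimal sketch of weighted blue/green routing.
    // greenWeight is the fraction of traffic sent to the new ("green")
    // deployment; ramp it from 0.0 to 1.0 to cut over, or back to 0.0
    // to roll back.
    func pickUpstream(greenWeight float64) string {
        if rand.Float64() < greenWeight {
            return "http://green.svc.internal"
        }
        return "http://blue.svc.internal"
    }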


[flagged]


This has nothing at all to do with the article.


I disagree. The article is about Uber's API. My comment is about how Uber's API clearly doesn't have robust support for requests that did not fully execute. Based on my experience, it appeared as though one system had processed my request to schedule the trip, but the rest of the system did not. This caused my account to essentially be soft-locked by a trip that was not visible through the app (and by extension their API). I then added my anecdotal experience with their lackluster support.


Which technologies were used to implement it? Framework? Language?


> At the time of development of the gateway, our language choices were Go and Java. Our previous generation was in Node.js. While that was a very suitable language for building an IO-heavy gateway layer, we decided to align with the languages supported by the language platform teams at Uber. Go provided significant performance improvements. The lack of generics resulted in a significant amount of generated code during build time to a point where we were hitting the limits of the Go linker. We had to turn off the symbol table and debug information during the binary compilation. Language naming conventions like ID, HTTP, and reserved keywords in Go (but not in Thrift) created failures that exposed the internal implementation details to the end users.
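(For reference, "turn off the symbol table and debug information" maps to the standard Go linker flags -s, which omits the symbol table, and -w, which omits DWARF debug info; presumably something along the lines of:)

    go build -ldflags="-s -w" ./...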


One of the authors of this blog post here. We had to do a lot of other scaling optimizations, like generating code only for the IDL elements used rather than one big fat IDL, replacing generated ser/deser code with dynamic code generation, etc. We will share details in a further blog post.



