Meta developer tools: Working at scale (fb.com)
245 points by ot on June 28, 2023 | 219 comments



I think Meta's tooling is inferior to industry standard. I actually took a survey while I was there, and that wasn't the majority opinion, but frankly I think most outsiders would absolutely agree.

Things like Eden were a great idea, but the tools had all sorts of issues they gloss over (a virtual file system can be really slow if you have a ton of small files), and dev environments would randomly fail a lot. Really the only tool they had that nobody disliked was their log-searching thing (can't remember what it's called), but it was still lightyears behind something like Splunk.

It was my conclusion that "Wow, you have 15 people working on a dev tool compared to a public company with 100 building the industry-standard version over 10 years with actual product managers and UI experts. No wonder ours looks like crap... wouldn't it be cheaper just to take 0.01% of your salary and buy standard dev tools?"

I guess "Clunky" is the word I'm looking for. "Blow it away and make a new one" was a phrase that happened with some regularity for dev-envs, repo-checkouts, etc. And iirc restarting your dev box took like >30min.

---

Side note - the other strange thing was that some of these tools people agreed were terrible (restarts taking 30 min), so you'd expect thousands of engineers to be swarming any system with any UI bug, edge case, or whatever. But it just didn't work that way.


For what it's worth, as a Meta employee, I generally disagree with this comment. I mean, calling it 'inferior' is just laughable. It's way better than what open source projects have access to and it's way better than Amazon internal tooling (is that 'industry standard' enough?).

The sapling workflow is better than git. I was skeptical when I joined, but between no branches and excellent stack and merge support, it's just a better, more intuitive workflow.

Eden is fast. Crazy fast, in fact, if you look at the size of the monorepo. The comment about small files is odd to me. Source code is small files. The entire monorepo is just small files.

Now if we are going to complain about something, it would be the custom Android environment we've got. Android Studio integration is actually clunky, bordering on non-existent. Compared to writing vanilla Android code, which I had done for years before joining, Meta folks wrapped almost every API, so all your pre-existing Android knowledge is useless. I've never been so unproductive when writing Android code.


Fastmail had great developer tooling. Give yourself about two minutes from an issued command via Slack or CLI and you'd have a brand-new tagged developer box of the live system: a self-contained infrastructure-in-a-box of everything from the IMAP server to the user interface. With another command you could get as many copies as you wanted for whatever development branch, etc.

The entire system replicated the real system (not the user data, but the software) and you could spin up test addresses and accounts and all sorts of things trivially. Beautiful!


> get as many copies for whatever development branch

Why would someone need more than one copy of a dev/feature branch?


Because it's free? You could have a beta server spun up on the fly to do some QA while your dev server is still kicking. It's just spinning up servers off of git tags.


> Eden is fast. Crazy fast, in fact, if you look at the size of the monorepo.

You’re measuring the wrong thing. The user actions are slow, usually taking several seconds to complete. No one gives a shit how big the repo is. They care how long their operation takes.


There isn't a tool that can do this job faster. git takes a few seconds once you're at half a million commits. Facebook has many repos checked in in their entirety that are bigger than that.


That’s like saying cargo ships are fast because they carry lots of stuff. It’s still slow in absolute terms to get my stuff from China.


Compared to what? Not having a monorepo? Sure, but you run into different issues with that approach. Eden is the best solution there is for a monorepo of this size.


Yeah compared to not having a monorepo and buying more SSDs. The separate repos are much nicer to work in, but harder for the infra folks to manage. I miss having dataswarm as a separate repo. I will miss IG’s.

I understand the tradeoff. That doesn’t mean it’s “fast”. It’s slow but the other alternatives are slower.


I agree. The tooling is not as good as Google's (expected; Google's internal IDEs are incredible). But it's definitely better than the rest.


Having also written Android code at Meta, I 100% agree.


What's the short version of why Meta did this, wrapping so much up in this internal nonstandard format?


I'm guessing it's the same reason everyone else does it: they want to abstract away the platform. To some people, platform intricacies are uninteresting compared to the problem they are solving, so their abstractions aim to express their problem domain and hide everything else.

For example, for apps you may like to deal with abstractions that can express navigation and pages. They are the same across web, iOS, and Android. Unlike Activity, Fragment, View (is that still a thing? I stopped doing Android 10 years ago lol) and some slightly different set of abstractions on iOS.


Often, performance.

When I was at Meta, if you could measure it, you could ship it and mark that as a win in your performance review. So a lot of projects got shipped off of A/B tested metric wins.


>and it's way better than Amazon internal tooling (is that 'industry standard' enough?).

It's slowly changing but I wouldn't consider Amazon's tooling to be industry standard by any definition.


>> Amazon internal tooling (is that 'industry standard' enough?).

I'm not sure, maybe?

When I say industry-standard tools I mean "Best you can purchase." So I mean a UI as good as github, a platform as good as AWS (meta's doesn't hold a candle to AWS), full-text log searching as good as splunk, chat as fast/searchable as slack (workplace doesn't hold a candle), video chat as fast/clear/clean as Zoom (workplace doesn't hold a candle)

Some problems are harder at scale (feature toggle interactions just get harder with size), but some of them are just a mess of their own making (restarting taking 30 min, butterfly rules having 20+ minute delays, Eden not being optimized to work with Buck, which needs to read tons of files out of tons of directories).


GitHub's developer workflow is a joke compared with the ticketing, CI, and code review tooling at Meta. I found the tools at Meta made it a lot smoother and faster to write and review code. Also, their automated flaky test detection, suppression, & ticket issuance/resolution is no joke. One of my favorite features. Their feature flagging system was also on point. Also, Workplace is a fantastic replacement for a wiki that surfaces relevant & interesting content to you. Their chat tools are way better than Google Chat / Microsoft Teams (not sure how it stacks up against Slack since I'm not a huge user of it).

I agree the observability tools I interfaced with at Meta were subpar a few years ago, but I think you're being ungenerous to take that and extend it to the coding tooling specifically.


>Their feature flagging system was also on point.

Which one... there are like 3 (or more)? The ones I worked with were the single biggest cause of SEVs. I wrote some tooling around trying to make it better, but it honestly needed to be completely replaced.

>Workplace is a fantastic replacement of Wiki that surfaces relevant & interesting content to you.

Yea? I found it horrible to have to read through post-after-post-after-post to try and keep up-to-date with what's going on. Not my way of ingesting information.

>Their chat tools are way better than Google Chat / Microsoft Teams (not sure how it stacks up against Slack since I'm not a huge user of them).

100% disagree... telling my dog to bark at the neighbor because I want something from them is better than Workplace chat. I was constantly switching between the internal beta and production to try to get the features I wanted along with something stable. Go ahead and send me a link to that message in a chat... oh yea, you can't :-|


Phabricator w/ "arc & jf" were my bae. Wish they would fully open source all of it or expose a public version of it to compete against GitHub.


Suppressing flaky tests is a little terrifying. Google would just retry them a few times, which is slow but only delays requesting a code review.


It determines flakiness in aggregate across all builds. It also does automatic retries, but it does so for the purposes of marking a run as flaky, which it then remembers. And it's your team's responsibility to make your tests stable. If they're not stable, they're not going to fail builds, which is the correct response (a ticket is filed & it's your team's job to keep those tickets under control). The neat thing is also that if you fix the test and it's no longer flaky, it gets automatically added back in & the ticket is closed (in case you forgot).

https://engineering.fb.com/2020/12/10/developer-tools/probab...
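
To make the mechanism concrete, here is a toy sketch of the general idea in Python (my own illustration, not Meta's actual PFS; the class name, window size, and thresholds are made up). It aggregates pass/fail results per test across recent runs, treats mixed results as flaky (the test stops blocking builds and a ticket would be filed), and treats a test that fails every time as a real regression that should still block.

    from collections import defaultdict

    # Toy sketch of the idea described above (not Meta's actual system):
    # aggregate pass/fail results across many runs, call a test "flaky" if it
    # shows mixed results in its recent history, and stop letting it block builds.

    class FlakinessTracker:
        def __init__(self, window=50, min_runs=10):
            self.results = defaultdict(list)  # test name -> recent pass/fail bools
            self.window = window              # how many recent runs to keep
            self.min_runs = min_runs          # don't judge on too little data

        def record(self, test_name, passed):
            runs = self.results[test_name]
            runs.append(passed)
            if len(runs) > self.window:
                runs.pop(0)

        def is_flaky(self, test_name):
            runs = self.results[test_name]
            # Mixed results in the recent window: sometimes passes, sometimes fails.
            return len(runs) >= self.min_runs and 0 < runs.count(False) < len(runs)

        def should_block_build(self, test_name, passed):
            if passed:
                return False
            # A known-flaky test files a ticket instead of failing the build;
            # a test that only ever fails is a regression and still blocks.
            return not self.is_flaky(test_name)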


If a test goes from “flaky” to “never succeeds,” I absolutely want to fail builds. To me it seems pretty common for large integration tests to be flaky, and I don’t know that it’s always feasible to avoid without mocking so much that the test becomes meaningless.


What makes you think that’s not the case?


Maybe I misinterpreted “flaky test detection, suppression” and “not going to fail builds.” You retry until a confirmed success before merging?


I think flaky tests are always retried, and if they start failing 100% of the time then it's marked as a pure regression. But they don't block merges if they fail (i.e. it might be done only after the merge). I'm not 100% sure. But generally yes - the culture is to incentivize people to fix their flaky tests and it's one of the things EMs are judged on.

> Ultimately, our goal with PFS is not to assert that any test is 100 percent reliable, because that’s not realistic. Our goal is to simply assert that a test is sufficiently reliable and provide a scale to illustrate which tests are less reliable than they should be.


>> GitHub's developer workflow is a joke compared with the ticketing, CI, and code review tooling at Meta.

??


Completely consistent and unified UX and text editing (GitHub has this, Atlassian doesn't). All text editing is rich and multi-modal, basically what you see in Facebook posts (i.e. you can add a bunch of images easily, it all previews immediately), with programmer-friendly extensions (e.g. code formatting, markdown with live preview, etc).

The ticketing system had a rich tagging system (which GitHub has). The notifications system was completely unified and extremely useful - Workplace activity, relevant ticket updates, PRs to review, reviews, etc - all relevant and timely. From what I've seen, GitHub tends towards just being a useless uncoordinated firehose.

Finally, code reviews. Having a code review for every commit is a bit to get used to, but their tooling makes it super easy to post a stack (i.e. PR). Each commit in the PR can be independently reviewed & merged. This means you can land improvements progressively as code reviews complete (i.e. short cleanup commits go in faster while larger commits are reviewed longer). This also encourages people to size commits appropriately & to keep them standalone.


I've worked at Google and Meta (née FB). They are inferior.


No company I've worked at after FB/Meta ships or works, even at a non-eng level, with the same velocity. I always attributed that to their internal tools, since most of the other companies seem to be using the same crap. Slack is straight up painful compared to their chat system. And don't get me started on how good the task tool is compared to literally any other ticketing system out there. Everything is behind their intern tool, and usually built to all just work together without friction.


>Slack is straight up painful compared to their chat system.

As someone who vastly prefers Slack over all its competitors, I'm very interested in hearing how Meta's compares. What makes it better than Slack for you?


As a chat system, Workplace is inferior to Slack. Discovery and organisation of chats are hard, especially group chats that would normally be a channel.

However, the combination of the "newsgroup" and chat makes it kinda better than Slack on its own.

Would I use Workplace at a new startup? Probably not. I would use Slack though.


Yea, it could be that FB's internal tooling connected to Workchat is what made it so great. I think channels are a nasty way to handle topics of conversation compared to posts on a forum-type place. Leaning into searchability and all that was great.

I’ve seen sub-500-employee companies where Slack kicks ass, but it seems once it goes over 1k it’s like glhf managing channels and bots.

Butterfly bot integration to workplace was really something.

Workplace is def overkill below ~600 users though.


I also think Meta's Workchat was better than Slack. Maybe it's because at my new company Slack has to play the role of both Workplace and Workchat, whereas at Meta they have two separate things for that.


Moving to a company using a hodgepodge of constantly rotating SaaS tools has been hell. I’m always thinking to myself “why can’t they just do this?”. I think I’ve been both spoiled and broken permanently.

On the other hand, building all these connections between the disparate apps is basically my job, so it’s got its positives.


Slack tries to satisfy a number of requirements, all through a chat interface. 1:1 chatting, group chatting, team/project discussions, company updates, etc. I think Slack is as good or superior to Workplace just considering chat functionality, but as a tool for inter-company collaboration Workplace is much better.

Once your company reaches a sufficient size (maybe 300+ employees?), Slack channels just become so clunky and tedious for trying to keep up with projects and discussions. Using Slack keyword notifications helps cut through the noise, but I really think applying Facebook's ranking/Groups expertise to work-related content has been the best solution I've seen so far. You're no longer inundated with unread channels that may or may not have messages relevant to you, Workplace just surfaces the relevant content (and obviously there's chat/mentions for more pressing issues).


Don't believe the hype -- I think Workchat probably beats Slack for speed in the web client, but the desktop/Android clients, which I use regularly, are truly sluggish and constantly refuse to load new messages if I'm not actively using them, forcing me to restart them in order to do so.


Oh yea, web comparison for sure, but I also don’t see a difference between slack desktop and web. They’re both electron clients though iirc, the desktop client just wasn’t given much attention because web worked great.


slack is incredibly, unbearably slow


Meta uses workchat which is just a reskinned Facebook Messenger. It’s been a while since I’ve used it but I remember it being a lot less glitchy than slack on iOS (desktop is another story).


Code review on stacked diffs was awesome for bigger features.


The openness of code, visibility, diffs. It was perfection. Something broke in my env suddenly? Oh, I just checked recently pushed diffs that affect my realm. Hey, there it is, security pushed something weird. I'll just revert the part that affects me and tag them. No meeting, maybe a SEV for visibility and review, maybe not. Easy peasy.


It's not the tools, it's the culture. Meta cut red tape, especially in its early years. I know there is more "process" now, but they empowered teams and individuals to just make decisions and move on, and the ecosystem around everything supports this.

They'd be just as fast with Slack or whatever, for the most part.

EDIT: as someone posted below:

> The openness of code, visibility, diffs. It was perfection. Something broke in my env suddenly? Oh, I just checked recently pushed diffs that affect my realm. Hey, there it is, security pushed something weird. I'll just revert the part that affects me and tag them. No meeting, maybe a SEV for visibility and review, maybe not. Easy peasy.

This is the kind of empowerment developers are given and expected to handle (both the explicit ability to revert previously committed code and the implicit responsibility that the team whose code was overridden must deal with it once they're notified, rather than push back).


And just to add to that last part: moving fast was so damn cool. That shared understanding of responsibilities and freedom to make changes. In most cases, of course, they were added as reviewers to the revert diff, and in some cases they knew it could adversely affect us and they'd tag us for our perspective before pushing, but the openness of it all. The freedom to bring me in at a later stage or for me to do the same to them.

These days at different companies it’s all planning and hashing out all those details before implementing, and then guess what? Surprises happen anyway. And in the end we become master planners and never actually build anything. Those blueprints sure do look great though.


100% agree. The culture is a huge factor as well, but in some of the environments that I encountered before and after, that culture was there, but the tools didn’t allow the same cohesion.

To your point, other experiences have shown me the ego that rears its ugly head when trying to move that way in an env that didn't have that culture. At FB there was a lot of candor, but in other environments I feel like I'm going to hurt someone's feelings in code review, or I'm watching what I say even in Slack messages. People take work too seriously in some companies. It's so rewarding to have fun with it.

They used to say “Nothing at FB is someone else’s problem”

I hope they still do.


In part this is only possible at Meta because incremental, measured improvements would raise all boats. You didn't have separate people being evaluated on how well a project was planned (product managers), how well it was executed (engineering managers), and how well it was built (engineers/ICs). All three of these things can be extremely orthogonal to each other, and human nature makes it so more often than a well-intentioned person even realizes.

Meta cut through that by focusing on measurable improvements.

The downside is it can create silos and a negatively competitive environment, i.e. teams not sharing resources or credit etc. because they are only looking out for themselves at the end of the day, because any measurement you don't capture is one you can't claim as yours.

I also argue it can breed short-term thinking. Meta even had special teams, from what I understand, that were "exempt" from the typical metrics-driven review cycles because they'd create the wrong incentives.

I think the general ethos can be really powerful though, but I'd peg it to collaboration and value driven measurements (and value is loose here, I'm not strictly thinking monetary) rather than strict "user based metrics".


Shipping faster should not be the only metric; the number of SEVs should be given much higher weight. The attitude of most engineers is to show impact, and if there is a SEV, so much the better, as it will show even more impact. This slows other people down and causes a gradual decline in actually shipping things that matter - quality over quantity.


Oh yea, I elaborated more in another comment. But the ease of finding documentation related to code, SEVs, anything wrong. If someone pushes something and breaks something else in my env, I can use the amazing tools like diff to see what was pushed recently that affects my realm, quickly track it down, and oftentimes alter it to fix my problem with no more than a message to the original author in a comment on the new diff.

Early career this is pretty huge for growth. I was there until 2019 though, so not sure how it is now.

My point was that the cohesion of all these internal tools makes information discovery frictionless. Cross-pollination across functional spaces in the business is like butter because everything is built by Meta and behind their intern tool. The other companies I've worked at just don't have this. They lack the eng capacity to build it, unfortunately.


> No company I've worked at after FB/Meta ships or works even at a non-eng level with the same velocity.

This is a curious thing to see.

As a mere user, and speaking purely about FB not any other Meta IP…

Other than unbreaking things that break when the OS and browsers change under you, has Facebook shipped even one thing in the last five years?

Don't get me wrong, I know it takes a lot of effort to stand still — forgetting the Red Queen effect was Musk's obvious mistake with Twitter even before stuff broke — but Facebook seems completely unchanged.


I don’t work for Meta, but you can see their product announcements here: https://about.fb.com/news/category/product-news/


Most of the open source AI things are Meta. React, GraphQL, PyTorch, RocksDB, Docusaurus, Prophet, and a whole ton of internal tools that aren't public knowledge. Full disclosure, I haven't worked there since 2019, so not sure since then.


None of that looks like it's about the Facebook website itself?

But I guess that implies you didn't mean the website itself when you wrote "FB" in the comment I replied to? :)

(One of the most well known consumer brands shipping tooling rather than stuff the consumers use directly is, ah, apt for a company called "meta").


When I worked there it was still called Facebook. I’m still not used to calling it Meta haha. My bad.


Anyone who's used both, how does it compare to Google's internal tooling?


I’ve got coworkers who went back and forth. Their thoughts were (in 2019) that Facebook had cooler things going on, but less mature. Nicer to use, but less stable in some cases.

I personally loved things breaking at times. The monorepo and all the tools were so open that it enabled me to follow along and try my own fixes in some cases (sometimes being the one to fix it!).

At that point in my career, that kind of exposure was like a rocket ship for personal growth. Others shared similar sentiments.


For me as a customer/user, Facebook seems to have been the same for 5 or more years. So that velocity doesn't translate to user experience, unfortunately.


Yea I was on the internal side, not outward facing production apps. They have and are building their own versions of ENTIRE companies for internal use. It’s a marvel to see. But the blue app? I can’t speak to that one on the internal side, but I’d agree with you there as a customer.


I can't really compare to Meta, but I will confess that "industry standard" tooling has been very disappointing for me. Trying to get a build setup that automatically pulls in dependency changes once a week is something I'm still not entirely sure how to do in a way that everyone agrees with. Seems most tools bake that into a code commit with a lock file nowadays. But even that is amusingly recent. Years ago, you had to mirror any repository system locally and learn how to set that up. Probably getting it wrong in the process.

Mentioning splunk, I'm assuming you are using more paid industry tools. I suspect that opens things up a bit more, but realize that most of the industry doesn't use those due to prohibitive pricing. And I don't even know what the appropriate paid tool for building would be.

We seem remarkably primed to just hate whatever tooling we have at our disposal, in ways that baffle me.


I remember when Cisco mentioned potential acquisition of Splunk for $20B, the jokes about whether it was to buy the company or to pay the running Splunk bill wrote themselves.


I know splunk is expensive, especially if you don't have somebody who is actively monitoring your spend, but sumo is a pretty good alternative. It's actually something I look for in the interview.

Regardless, many companies underspend by an order of magnitude on dev tools. If a tool makes you 1% more efficient, it's worth them putting 1% of your annual salary into it (multiplied by the number of engineers at the company who need the tool). Soon you'll see that a tool costing hundreds of thousands a year to license is often a steal.

So you may say "oh, Splunk is overkill and an unfair expectation", but in their CI/CD system, if your job failed (a frequent occurrence) and you wanted to search for something in there, you'd manually DOWNLOAD a 300 meg file to your machine and grep it for errors.

Basically a FAANG engineer's time costs $3 a minute minimum, so if a query could save 1 minute then the break-even cost is paying up to $3 to run that query.
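
A back-of-the-envelope version of that arithmetic: the $3/minute and 1% figures come from the comments above, while the salary and headcount numbers below are made-up assumptions just to show how the break-even works.

    # Back-of-the-envelope sketch. The $3/minute and 1% figures are the
    # commenter's; the salary and headcount below are illustrative assumptions.

    ENGINEER_COST_PER_MINUTE = 3.0   # "$3 a minute minimum"
    ANNUAL_SALARY = 300_000          # assumed fully loaded cost per engineer
    NUM_ENGINEERS = 1_000            # assumed number of engineers using the tool
    EFFICIENCY_GAIN = 0.01           # "1% more efficient"

    # Break-even annual license spend if the tool really delivers a 1% gain.
    break_even_license = ANNUAL_SALARY * EFFICIENCY_GAIN * NUM_ENGINEERS
    print(f"Break-even annual spend: ${break_even_license:,.0f}")  # $3,000,000

    # Per-query framing: a query that saves one engineer-minute is worth up to $3.
    minutes_saved = 1
    print(f"Break-even cost per query: ${ENGINEER_COST_PER_MINUTE * minutes_saved:.2f}")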


> If a tool makes you 1% more efficient, it's worth them putting 1% of your annual salary into it

Hear me, nameless internet stranger, that you might learn from my mistakes: This is not true. Efficiency is only worth money if it increases profits - concretely this means your efficiency gain must result in the following:

1. Delivering features faster

2. Delivering features with meaningfully higher quality

3. Delivering the same features with lower headcount costs

(2) is hard to measure, so you can generally only sell on (1) and (3). During the 80s the business world learned the hard way you can throw away a lot of money on useless efficiency. If that interests you I recommend "The Goal" by Goldratt for a fictionalized account of those learnings.

* There's also a subtler point that corporate finances may mean that even if efficiency is perfectly captured, a 1% efficiency increase may only be worth 0.1% of your salary.


100% this... profit, or it's not worth it


Even in a FAANG, most money spent on log storage is fully wasted. The article yesterday about monitoring being a pain was spot on.

More amusing is when you see someone build a giant Elasticsearch pipeline to a Kibana dashboard so that they can get what would have been table-stakes metrics if they had used the normal service templates. Without a ridiculously large bill.

Or folks who think they can work around the high-cardinality traps of making a metric out of everything, assuming that if you can make a system that works for the testing environment, of course it will work when you open the floodgates.

I've seen people argue that sampling shouldn't be used, "because you could miss data?" Reservoir sampling is a thing, for very real reasons.
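
For anyone who hasn't run into it, here is a minimal sketch of reservoir sampling (the textbook Algorithm R), which keeps a uniform random sample of a stream without buffering the whole thing. This is a generic illustration, not tied to any particular metrics pipeline.

    import random

    # Minimal reservoir sampling (Algorithm R): keep a uniform random sample of
    # k items from a stream of unknown length, using only O(k) memory.

    def reservoir_sample(stream, k):
        reservoir = []
        for i, item in enumerate(stream):
            if i < k:
                reservoir.append(item)  # fill the reservoir first
            else:
                # Replace an existing element with probability k / (i + 1),
                # which keeps every item seen so far equally likely to be kept.
                j = random.randint(0, i)
                if j < k:
                    reservoir[j] = item
        return reservoir

    # e.g. sample 100 log lines out of a million without buffering them all:
    sample = reservoir_sample((f"log line {n}" for n in range(1_000_000)), 100)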

At any rate, I don't mean to just yell about splunk, from all I've heard it is nice. I am annoyed that folks seem to ignore the OLAP and OLTP divide, such that they think your metrics system should somehow be optimized for both. At the same time.

Pulling back to builds, though. What, exactly, is the gold standard in industry? I have yet to see it. Python builds, in particular, strike me as not good.


Splunk can save money, as long as you’re intelligent about processing data. I can think of dozens of security and operational incidents where splunk saved the day.

I’ve been through a few savings exercises where everything is going to move to open source dashboards, etc. Once the guy who understands how it all works disappears, it migrates back to Splunk.

One security incident response will pay for the entire splunk infrastructure.


This isn't a great argument... sure, Splunk "saved the day" because it caught that really bad security thing... but what other tools, at half the cost, could have also "saved the day"? If most of them could have, then it's not really worth the price... right?


Sure. I can save a ton of money by fixing my own car too - if I have the ability to execute.

If you can, that's seriously awesome. The point is, many can't.


FB's log shipping and parsing is much better than "industry"; however, we ship billions of logs to get a metric, because the internal Graphite service isn't easy to use. And Unidash really sucks compared to Grafana for non-SQL-based metric discovery and prototyping.


As a Meta employee as well, working in the data analytics / engineering space, I'm finding the tooling to be of a pretty high standard, actually.

Though we arguably rely on a lot of Apache products, whatever we use that's internal-only is great to work with. Daiq** recently started supporting notebooks, which has been a game changer for us as well as for the teams we work with. Phabricator is great as well, and makes shipping stuff super easy. The only one I find meh is Ben**, the internal notebook solution, especially compared to Google Colab. But the rest has simply been a joy to work with.

For those interested, a former DE made this nice repo that maps internal tools against "real world" products: https://github.com/thijsessens/xmeta2external


Is there a reason you censored parts of tool names? Your link lists the uncensored version.


I would say it’s quite uneven: most tools were better than the state of the art when they were introduced.

For many, the world has moved on a lot since, and those tools feel obsolete but are so embedded in practices that it's unthinkable to apply better approaches. One example of that is tracking data sources: the legacy tool is slow and dysfunctional, and even fairly straightforward questions time out because you have to load so much metadata. That's because no one really uses it: most of that information is carried by the data engineers who built the system, and they are overwhelmed with questions that could be answered with a good system, but they derive power over analysts from it. They benefit from it through comments they copy in their reviews, so fixing it isn't anyone's priority.

Others have evolved because the internal demand pushed things forward. PMs want Deltoid to work, so that system has moved to be state-of-the-art or interestingly unique in many ways: scaling, not implementing MCC, integration with the metric definitions.


My current role exposed me for the first time to amounts and varieties of data (in type and in how/where it's stored) that are difficult to learn organically. I wonder how a FAANG does that at "scale".

> most of that information is carried by the Data engineers who have built the system, and they are over-whelmed with questions that could be answer with a good system, but derive power over analysts.

Is that the kind of question you're talking about here?


Strongly not my experience. The tooling at meta was so good compared to my previous 5 jobs that I left with changed opinions about the tradeoffs of investing in tooling.


Yeah, that's fair. I think another thing is that a lot of these internal tooling projects date back 5-10 years when open source alternatives may not have been viable yet. For example, my company uses an in-house written time series database, which probably made sense at the time because when the project started, Prometheus wasn't 1.0 yet. Now it's reaching growing pains as we've scaled, and at this point it's a bit of a sunk cost fallacy to not migrate off of it.

Another thing is that sometimes it's truly impossible to use one-size fits all for some tools if your company scale is large enough. I've heard from friends at Amazon that the dev experience and tools are totally different in the hardware space than it is for those working in the retail or AWS orgs.


> For example, my company uses an in-house written time series database, which probably made sense at the time because when the project started, Prometheus wasn't 1.0 yet.

That's why big orgs are now open-sourcing their tools. But it's not easy to clean up all the "too internal" stuff from a tool.


It's interesting to hear the contrast between Meta and my experience talking with ex-Googlers who complain that open-source or industry standard infrastructure is far inferior to what they were used to at Google. Why such a big chasm between the internal tool quality at Meta and Google?


I think GP's opinion is fairly contrarian. I would say the in-house tools at Meta are substantially better than typical open source or commercial tools used at smaller companies.


FWIW I am a Meta-to-Google transplant and I feel the opposite way. There’s a lot of internal tooling and functionality I miss dearly. Most of all, Workplace.


I do really like Workplace, I think it provides a much better information architecture than docs/slack alone.

It’s possible to run a really disciplined Slack where updates go in a read-only channel and people broadcast their work appropriately - but it’s really hard. Most people default to non-discoverable DMs and then stream-of-thought in public channels, which are both bad for discovery in their own way.

Posts end up being a bit more considered than a slack message, but still lightweight, and more discoverable than docs.

Shame Google nixed their “Google plus for workplace”, I would love to see this workflow used more widely.


FWIW I did the opposite leap and felt the opposite way - I really disliked Workplace's stream of seemingly randomly sorted information compared to e-mail lists I could filter, control, and search better. I really dislike Workplace as a store of institutional knowledge. The chat was definitely way better than Hangouts Chat though.

I also felt like a lot of the developer tools looked nicer than their google counterparts at the surface but had major reliability problems under the hood where you would need to do a lot of turn it off and on again style operations.


Phabricator & Scuba are far better than anything I've used anywhere else.


I REALLY miss Phabricator, Scuba and (gasp) Tasks. Would love to have those back in industry. Far better than Github, Superset (?) and of course the dreadful Jira I have to use these days.


I was very impressed by Scuba's ability to ingest and search data... and very unimpressed by the interface. I found it terribly unintuitive, and it often didn't get you what you wanted without having to run multiple queries.

Phabricator was pretty sweet... and the internal version of mercurial was a dream!


I don't know anything about tooling at Meta, but I have appreciated custom-built tooling at all the jobs I have worked at so far -- much smaller companies ranging from about 100 to 5000 people.

Some of this is just having worked in the industry for a while. Guess what, 15 years ago "Put all your code in a web service running on AWS" was not nearly as slam-dunk a proposition as it is now; and tools optimized for on-prem were faster and more reliable.

The other thing is that you don't always have to reinvent things to the point that Meta does. You can just wrap around an Open Source project, or even contribute to it. This is what all of my previous jobs did -- make custom tooling from industry-standard OSS building blocks; and contribute back when it made sense for the larger community.

I think Meta was actually trying to do this with Mercurial a while ago [1]; you probably have a much better idea than us as to why that didn't work out.

----------------------------------------

[1] https://engineering.fb.com/2014/01/07/core-data/scaling-merc...


https://sapling-scm.com/

It did work out.


I know that, I meant the specific reasons why Sapling diverged from standard Mercurial. https://sapling-scm.com/docs/introduction/differences-hg/ lays out some of the differences; but not why they exist.

One guess that I can make from Sapling docs is that perhaps FB/Meta needed an ability to pull in Git repos where needed; and I can see Mercurial devs not being super enthusiastic about that being a first-class workflow in Mercurial itself.


> I know that, I meant the specific reasons why Sapling diverged from standard Mercurial

The needs of a large internal code base are often quite different than what open source projects need.

Another issue is that tools like Git and Mercurial need to be compatible with positively ancient repositories and can't really break backwards compatibility, whereas within Meta it is easier to move faster and deal with breaking format changes, since developers' checked-out repos on their laptops can be silently upgraded: the machines are well managed with Chef and can be upgraded behind the scenes. In the outside world, you can't assume people will be running the latest version and can't upgrade them.

Finally, Google is also moving away from Mercurial. Perhaps the JJ developers want to respond as to why (and check out Jujutsu -- it's a novel and very interesting system y'all should check out).


> Finally, Google is also moving away from Mercurial. Perhaps the JJ developers want to respond as to why (and check out Jujutsu -- it's a novel and very interesting system y'all should check out).

Sure. I presented about that at Git Merge 2022. https://github.com/martinvonz/jj#disclaimer has links to the slides and the recording from there. I'll summarize the problems we have with Mercurial here:

1. Performance/scalability. That's partly because Python is slow. Both the Mercurial project and Meta have rewritten many parts of it in C and Rust as a result. Maybe more importantly, there are many assumptions in Mercurial's design that don't scale well. We have extensions for downloading only a slice of the repo. We slice it in both file space and in version space. However, it can get very expensive to change afterwards. For example, checking out an old revision that the user hasn't previously downloaded is very slow (it requires rewriting all local revisions after that point).

2. Consistency. Mercurial was designed for local file systems, so when we store repos in our distributed file system, we run into write races that can corrupt repos.

3. Integrations. We integrate with Mercurial by running the `hg` binary and parsing the output. That's unnecessarily complicated and slow.

We also see several opportunities by switching to jj (in addition to hopefully fixing the problems above):

1. Simpler workflows. Things like: working-copy commit (no "dirty working copy" errors, for example), undo, first-class conflicts (no interrupted rebases, for example). See the GitHub project for details.

2. Cloud-based repos. The repos will be stored in a database instead of being stored in files on top of a distributed file system. That makes them much easier for our server to work with, and it opens up for many kinds of integrations that were not feasible before.

3. Simpler architecture. We designed jj from the beginning to be easy to integrate with our internal systems, so there should be far fewer workarounds.

4. Simpler code base. You can typically add a command without worrying about concurrent commands, a dirty working copy, or conflicts. An example I like to mention is how I spent about two weeks trying to implement a command for amending into an ancestor commit in Mercurial. Then I implemented a more powerful version of that (can move changes from any commit to any other commit) in an hour in jj.


(A bit late, but better than never) Hey thanks, this is exactly the sort of detail I was looking for -- appreciate the link and clear reply!


Yeah, I've worked at another large tech company that had accumulated a lot of in-house tooling, with a lot of invested SMEs who had been there a long time and missed that the state of the art out in the world had passed them by.


I left FB in the early-mid 2010s and can attest to this. It took a year or so to adjust to the "real world." Things like, no, you don't have a fancy scheduler; you get SSH, ansible, and systemd timers. Docker, Jenkins, and Kubernetes (RIP Mesos) were all new to me when other devs/devops folks had years of experience.


Like a few others already commented, I also disagree. Was at FB for 7 years, and now almost 2 years at a place with the “industry standard” things like slack / datadog / github / sourcegraph / (20+ other tools all disconnected from the rest and all behind SSO).

As a backend engineer, the dev experience was just incomparably better at FB. Some things I miss most are probably Phabricator / stacked-diff workflows, Buck dep management with everything in a monorepo, deep integration across all the tooling, etc.


As an ex-meta mobile engineer, I 100% agree that I found Eden to be impossible to use productively. I always had to pre-warm the cache, pretty much defeating its stated goals. Maybe it has gotten better since I left a year ago, but I had 2 TB of local storage, and mercurial's sparse checkouts were more than enough for me and far more reliable.

However, all the other tooling around code was just superior in every way. I miss Phabricator, and Mercurial and the way it seamlessly integrates into a team. I also miss all of the command line helpers that let me manage all of it.

It used to be the case that legitimate teams could form around and focus just on building internal tools for other teams if it improved engineering velocity. Not sure if that's still the case in this new era of layoff-happy Meta, but it was definitely true when I was there.


Eden definitely works better on a dev server and sucks on a laptop, especially ARM-based Macs (it's slowly getting better but still requires occasional reboots). +1 on the coding tools being great.


Please reach out to the Eden team about your M1 issues; they'd love to hear from you - from what I've heard, it works better on M1s.


As a former employee who has worked at many other companies over a decade-plus and started a few startups, Meta's tooling is one of the best if not the best. It's so good that SaaS companies came out of Facebook simply by replicating their internal tooling, like Asana, Scuba, etc.


I thought Asana were Xooglers?


Nope, started by Dustin Moskovitz, who was even in "The Social Network" movie.


The problem with Meta's approach to tooling, from a very high-level viewpoint, is that they virtualize everything. They abstract all common tasks like building, testing, or running by building an entirely new system that adds an abstraction layer.

Abstraction layers tend to be slow.

What they should focus on is plumbing. Take what exists and connect it in a smart way.

Take React: applying functional programming and spitting out HTML was a great idea. HTML existed before React. Then they also implemented a virtual DOM, which was the unnecessary part.

Or react-native: rendering UIs in a functional way is great, but stuffing JavaScript into everything is not necessary.


Maybe it's not necessary anymore, but wasn't the virtual DOM made for performance reasons since updating the DOM used to be really really slow?


Virtual DOM in any scenario will always be slower than updating the DOM directly. The virtual DOM will always be overhead. The issue is that if you are manually updating the DOM yourself, you can easily shoot yourself in the foot by doing so in a way that is less performant, or that breaks things like scroll position, focus management, etc. It's also normally more tedious to do manually. This is where the virtual DOM came in: you just declaratively update the JSX and it handles the updating itself.

But nowadays there are plenty of solutions that do not use a virtual DOM (or that use a virtual DOM implementation that is just plain faster) but still allow you to use declarative rendering, basically the best of both worlds. They are just not as popular, or for some people not as interesting as React is. Or they are content with React's performance already.
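
To make the tradeoff concrete, here is a tiny sketch of the virtual-DOM idea (written in Python purely for illustration; this is not React's actual reconciliation algorithm, and the helper names are made up). The framework rebuilds a cheap in-memory tree, diffs old vs. new, and applies only the resulting patches to the real, expensive DOM; the diff step is exactly the overhead being discussed.

    # Illustrative sketch only: cheap virtual nodes, a diff, and a patch list.

    def h(tag, props=None, children=()):
        """Build a virtual node: a plain dict, cheap to create and throw away."""
        return {"tag": tag, "props": props or {}, "children": list(children)}

    def diff(old, new, path="root"):
        """Return the patch operations needed to turn `old` into `new`."""
        if old is None:
            return [("CREATE", path, new)]
        if new is None:
            return [("REMOVE", path)]
        # Text nodes: plain strings are compared directly.
        if not isinstance(old, dict) or not isinstance(new, dict):
            return [] if old == new else [("REPLACE", path, new)]
        if old["tag"] != new["tag"]:
            return [("REPLACE", path, new)]
        patches = []
        if old["props"] != new["props"]:
            patches.append(("SET_PROPS", path, new["props"]))
        for i in range(max(len(old["children"]), len(new["children"]))):
            o = old["children"][i] if i < len(old["children"]) else None
            n = new["children"][i] if i < len(new["children"]) else None
            patches.extend(diff(o, n, f"{path}/{i}"))
        return patches

    before = h("ul", children=[h("li", {"class": "done"}, ["buy milk"])])
    after = h("ul", children=[h("li", {"class": "todo"}, ["buy milk"]),
                              h("li", {"class": "todo"}, ["walk dog"])])
    print(diff(before, after))
    # [('SET_PROPS', 'root/0', {'class': 'todo'}), ('CREATE', 'root/1', <new li node>)]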


Every tool Meta has, Google also has. Google open-sources theirs, so the industry gravitates towards those, but Meta keeps theirs internal. So is Meta's tooling better than industry/Google open source tools? Probably not. The only tool I was impressed by was Scuba, but Meta basically built it out of gold (thousands of machines with TBs of RAM).


>I think Meta's tooling is inferior to industry standard.

s/Meta's tooling/Meta/g

s/inferior/very inferior/g


Sapling looks quite cool! I've used git extensively in my career and consider myself to have a slightly-more-advanced-than-typical understanding of how to use it, just based on conversations with colleagues. However, one thing that's always been very limiting with git has been stack-based PR reviews and, as they mentioned, amending deep commits. It's not impossible, but it makes it awkward enough that I usually avoid it if possible.

Curious if anyone has used Sapling after lots of time using git. Is it the future?


Some recent tools have started to come out to fix Git's unfriendly UX.

Meta's Sapling (1) is definitely one of them. But there are also `jj` (2) and `git-branchless` (3). These tools target a narrower set of workflows where there is one main branch inside a big repo and everything else is a short-lived branch/topic that can be treated as an ephemeral stack of patches, constantly being uprooted / rebased on top of the main branch to derive the final result.

If that's the workflow you use daily, then you should give these tools a try.

(1): https://github.com/facebook/sapling (2): https://github.com/martinvonz/jj/ (3): https://github.com/arxanas/git-branchless


pijul for the purists https://pijul.org/


When we first switched from git to hg internally I was real cranky. Git was the clear industry winner, and hg was adding a whole bunch of churn for no good reason.

Now, hg is amazing and when I leave I will be extremely sad to go back to git. The UX is well thought out, with commands mapping to operations (how do you undo a commit? hg uncommit vs git playing around with the reflog).

Amending deep commits is pretty good -- it's still tricky, and absorb works on some pretty limited heuristics. But the tooling for moving around in a stack is solid, and the general UX for interacting with the stack is way better: hg histedit edits history, while hg rebase moves the stack around.


You can always use Sapling. Many ex-Meta people are still contributing to it on GitHub and use it for their Git projects.


The main point against hg is that the tooling is dying out as git becomes even more dominant. But I do quite like it.


I'm one of the earliest GitHub users, have run FOSS projects, etc. Sapling is absolutely excellent. If anything, the 'sl web' UI alone, which can do rebases/commits, is worth giving it a shot. 'sl web' makes Git rebase look like the dark ages.

The UX just has lots and lots of polish in small ways and it has a nice amount of good features. It has fewer verbs than Git, but it's still a bit different from Mercurial. Having a built in 'undo' command that basically always works is nice.

I have replaced Git with Sapling (and a similar-but-not-the-same system, Jujutsu) in most of my own personal workflows at this time. If that's a good enough endorsement for you, then I suggest trying it out. You might be surprised.


Probably not, if only for reasons of inertia. Git plus third-party review tools (like GitHub) is more than "good enough" for most purposes.

I used to work at FB, and while sapling is quite nice to use in practice, without the internal version of Phabricator to do code review (and, in all likelihood, mononoke), I don't think I'd pick it up again.


ReviewStack adds the missing stack and versioning Phab UX on top of GitHub APIs. I've been enjoying using it outside Meta for some OSS stuff, and I fixed a limitation of it recently in https://reviewstack.dev/facebook/sapling/pull/656


GitHub's gotten so much worse since Microsoft bought it. The drop in quality compared to when I used it in school is remarkable.


Can you provide some specific examples?


The token system used instead of passwords now was horribly explained. I had to look at several online resources to figure out how to actually use it.

I've noticed that the system seems to repeatedly throttle me where it never did before. Logging in often results in being given an error until I try it enough times.

Sometimes attempting to pull recent changes from a repo will fail, and tell me that I pulled the recent changes even though I am stuck on an old commit. I tried several things to fix this and the only thing that seems to work is doing a hard reset on the latest commit id. I never noticed github doing this before.


Thanks for the clarification.

I agree with you on the token system. It was a pain to figure out.

I've actually run into issues with throttling and stale fetches before, but it was due to a company proxy intercepting all requests. So effectively our entire org was hitting GitHub through a small number of proxy gateway IPs. Happened with AWS APIs too -- major throttling issues with terraform that seemed to be triggered based on the shared IPs.

Might be a longshot, but I'd check to make sure you aren't going through some proxy or VPN because I don't think your experience is typical.


This kind of stuff puts me off from wanting to work at FB. If I got really good at working with these tools (or fell in love with this tooling), I wouldn't really be able to go back to working "in the real world"


For me the barrier to working at Facebook is the fact their product is causing so much harm in our world, from elections to the environment to mental health to the breakdown of social capital.

I know I’ll be downvoted for taking an ethical rather than financial or technological position, but ethics matter. Especially in technology and finances.


It's also doing a lot of good. It's in the nature of any broadcasting and communications platform.


Qualify "good".


"drunk driving may kill a lot of people, but it also helps a lot of people get to work on time, so, it;s impossible to say if its bad or not,"


Lmao that is ridiculously unfair since you're starting with a near-universally agreed bad thing.

Better analogy augmenting yours: just cars. Accidents between cars or cars and pedestrians kill lots of people every year.

This works with every tech. Phones have made it easier to coordinate crimes. Printing press and even televisions made mass propaganda possible. There are always tradeoffs.


"driving helps a ton of people get around, but sometimes people drive drunk and kill other people, so it's definitely unforgivingly bad"


Bad enough you won't catch me working for a car company. Excellent example really.


The problem with Facebook isn't that individual users sometimes do bad things on its platform.


As much as I dislike Meta, this is a very unfair comparison.

Facebook has done a lot of harm to the world, but it also has done plenty of good.

Aside from that, your analogy is incredibly weird. Do people drive drunk to... work? Who does that?


Never actually saw the skip-level parent post quantify harm vs. good, and yet...


Ethics matters, but not everyone thinks that Facebook causes "so much harm in our world". At least, not more than other corporations.


It's true. Exiting FB's incredible in-house information & developer tools culture is for many people one of the hardest things about leaving. Ex-fb groups maintain lists of potential "in the wild" replacements for each tool but very few are up to the task (pun intended).


It doesn't sound different than working anywhere else for an extended time. You get used to a certain way of doing things.


I still miss tasks, five plus years later.


> If I got really good at working with these tools (or fell in love with this tooling), I wouldn't really be able to go back to working "in the real world"

SWEs need to learn new tools all the time. This shouldn't be a concern. Besides, it's not like "the real world" only uses a single tool. Each company has its own tweaks, even those using open source stuff.


Big tech has _way_ better tools than what most companies use. At Amazon so many concerns were taken care of for you like:

* How do I set up a new project?

* How do I build code?

* How do I pull internal dependencies?

* How do I publish artifacts?

Every company makes a horrible copy of these systems with off-the-shelf tools. At Amazon there were certainly flaws in the internal tools, but overall it was much better than the "real world".


I'm not quite sure it's true that those tools are better than "the real world". There are many options for each of the steps you mentioned, old systems mixed with new solutions, so developers have to learn all of them and onboard onto each of the systems they decide to use. While they provide some templates and scripts here and there to solve these problems, most of them are not production-ready and don't provide a holistic experience. I worked for smaller companies before, and the platform teams did a much better job of providing a unified experience across those steps.


You are assuming those tools are completely unique and that there aren't any open source alternatives, which is not the case.


Isn't this the Google curse? Companies hire an engineer who was at Google and suddenly find the Google engineer is building out all the Google internal infrastructure within the new company and because You Aren't Google, it sucks.


Most (all?) of these are open source and can be used outside of Meta infra. https://github.com/facebook/sapling


> Mononoke is the server-side component of Sapling SCM.

> While it is used in production within Meta, it currently does not build in an

> open source context and is not yet supported for external usage.


You can still use sapling with existing Git repos


Unpopular opinion.

Maybe this is a feature, not a bug? They want to filter out some resume-driven developers.


… lock developers into the ecosystem and prevent them from leaving, since their knowledge is not so valuable elsewhere.

A bit cynical, but maybe not so unpopular opinion.

Meta as business is all about locking users to the ecosystem.


Same thing applies to most government jobs. The tech is so out of date / niche that you become un-hire-able with those skills.


Unless you land a job at some consulting firm which offers special knowledge about all these!


No see, what happens is you leave then go to some other company and complain about how shitty their tools are then build a crappy half-baked version of whatever you had at $BigTechCorp and when shit hits the fan you boomerang back to $BigTechCorp for a sweet promo and raise.


I avoid hiring people coming from places like this for similar reasons. It extends past tools into libraries and frameworks, databases and other middleware, developer and business workflows. Only knowing proprietary stuff is a handicap. Assuming that proprietary stuff is superior because the big ad companies prefer it is even worse.


Archive link for people who block Facebook - https://web.archive.org/web/20230628131034/https://engineeri...


I know git is complex, and the UX is sometimes messy, but after really, really learning it (shoutout to the GitHub training folks), I've never had any problems that couldn't be solved. I understand the desire to simplify some things, and their log looks way better than git's, but I wish they'd contribute back instead of rolling their own entire VCS.

The one thing that is super exciting to me is the stacked pull request support. Using GitHub for this kind of workflow is enormously painful. Conversations constantly get outdated, and it's nearly impossible to track whether comments have been addressed.

I know they're working on an improved UX/experience there, but it seems like it'll be a good long while, especially for enterprise server customers.


> I've never had any problems that couldn't be solved.

Stockholm syndrome. I've used Git for 15 years, early GitHub user, etc. Yes, you can solve many of these things, but until recently even things like "I am changing patch 2 in a series of 5 and need to rebase the following 3" were ridiculously painful. This is a common workflow many people like (including the Linux kernel devs) and Git was bad at it.

Git submodules. I'm not even going to go into this, they're so bad. That's a problem I wish Git had never "Solved" to spare us the burden.

There are tons of minor nits in Git all over the place. "Solving" something is completely different from actually having something that can be easily used for your team. There's no amount of contributing Facebook could have done to fix Git, because they'd be turning Git into something else that it fundamentally is not. And it doesn't matter if you have a trillion dollars, it's often not practical to just overhaul someone else's whole project when these goals don't align.


> Stockholm syndrome.

Is an intellectually dishonest fantasy invented for the sole purpose of discrediting and distracting from criticism of the actions of the inventor of the phrase. It should never be ascribed as the source of a position you want to argue against, unless your intent is to signal that your own position lacks a reasonable argument and you are just choosing to character-assassinate the opposition to cover for that.


[flagged]


That comment is neither chat-generated nor baseless: https://www.idiva.com/health-wellness/mental-health/why-the-... https://www.themarysue.com/viral-tweet-exposes-sexist-origin...

"Assume good faith. Please don't post shallow dismissals." https://news.ycombinator.com/newsguidelines.html Please try abiding by that, thank you.


I know git makes you shove toothpicks under your fingernails in order to let you use the keyboard, but after really, really learning to do this, I've never found it a blocker for my daily work. I understand the desire to simplify things, but I wish they'd contribute back ways of making people more comfortable with the toothpicks rather than just removing them and starting from scratch.

I am not a fan of FB, but they tried - you can find them on the git mailing list, where they got told they were doing it wrong for things like "scaling" or "productivity". Which is always ironic, since basically nobody in open source generates or uses any real data about productivity; it's all just gut feelings about users.


I think the biggest issue is that back around 2011/2012 ish, when Facebook devs went to git core devs and asked how they could get git to scale to the size of their predicted monorepo, the response was roughly "no, shard it".

git falls over and dies really, really badly when the repo gets stupidly large.

There's an article alluding to the discussion here: https://engineering.fb.com/2014/01/07/core-data/scaling-merc..., but I can't find the original thread on the git mailing list.


Maybe this is the mailing list you were searching for?[0]

[0] https://web.archive.org/web/20210119051414/http://git.661346...


Ah thanks! Yes I think that was it.


"Contribute back to our piece of crap that is 99% antagonistic to your use case" is not realistic. No amount of third party contributions to git will relieve git of its opinions about how development workflow should be done, and those opinions are not shared with every organization.


Surprised to see them use Phabricator (I know it came out of there, but I basically already forgot it existed)

I used it briefly but couldn't get most people to adopt it widely enough.


Their internal Phabricator is quite different from the open source version. When I worked there (around 2017), it was definitely miles ahead of anything I'd used before. A lot of how it works is probably... controversial, I guess, but it suited my mental model quite well.


I used Phab at a job and it was a complete mess. Stacked Diffs only sort-of worked, CI integration was bad, notifications were so noisy everyone tuned them out, etc.

Talking with a Meta person, it sounds like Phab really needs Mercurial to work well, at least for Stacked Diffs, because you need to be able to identify commits independently of their location in history to properly maintain the Stacked Diff associations.
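Concretely (and as I understand it), open-source Phabricator handles that with a stable identifier: `arc diff` writes a trailer into the commit message, so however many times the commit is amended or rebased it can still be matched back to the same revision. Something like (revision number and URL made up):

    Add rate limiting to the API gateway

    Summary: cap per-client QPS at the edge.

    Test Plan: unit tests plus a canary push.

    Differential Revision: https://phabricator.example.com/D12345

Mercurial (with changeset evolution) and Sapling additionally record predecessor/successor links when you amend, which is what makes tracking a whole stack of such revisions workable.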


I wonder whether they'll continue using their in-house Phabricator or choose to support Phorge (https://we.phorge.it/) now that open-source Phabricator is no longer supported.


Internal Phabricator isn't even really Phabricator anymore. In fact, the name changed to just Diffs. I'm pretty sure its entire codebase has been rewritten, at the very least into Hack from PHP. They have a custom API; not Conduit. Etc.

So, no. They won't support Phorge. They really never supported open-source Phabricator after Evan left and made it his own with Phacility.


I used Phabricator at a previous (non-Meta) company. It was better than GitHub in some ways but worse in others. Overall I didn’t mind it


ReviewStack is the thing now; it adds some parts of the Phab UX on top of the GitHub APIs. Been enjoying using it for OSS stuff outside Meta, and fixed a limitation of it recently; https://reviewstack.dev/facebook/sapling/pull/656 shows what it looks like with the versioning available.


I am genuinely curious about wasabi, the Python LSP they announced on Meta Open Source but which is not available anywhere. Would love to try that out; there is not enough competition in the Python LSP space, and it would foster new development: https://developers.facebook.com/blog/post/2022/07/18/enablin...


In the “Offline + Online Processing” section, are they talking about an external service that needs to run alongside the processing that occurs on your machine? Or am I misunderstanding that?


The linked article has a section about IDEs that this one does not touch. I was surprised they are locking themselves into one tool with their workflow.

I'm a bit worried about the dominance of VS Code. I can't stand the editor, and its popularity just grows and grows.


They aren't locking themselves in. It just makes sense to support fewer IDEs rather than more. You can move faster if you don't have to duplicate your work between VS Code, IntelliJ, Android Studio, Emacs, Vim, etc. Nothing is stopping developers from using ed, the standard editor, if they wanted to.


What are you using and why?


IntelliJ. To name a few features I've not been able to get in VS Code, or have had a worse experience with:

- debugger

- refactoring (extracting functions and variables, renaming across entire project etc.)

- context aware selection (expand selection from cursor logically)

- stack trace / error parsing

- comparing anything (diff selection with clipboard for example)

- DB schema support even in SQL formatted as strings (e.g. when using psycopg) directly from DB by just connecting to a DB

- find in files (I've never seen a VS Code user have an easy time finding stuff)

Etc.


If Meta acquires a new company codebase, do they just move it into their monorepo immediately?


Context: I was part of a company acquired by FB in 2019 and worked there until Dec 2022. It really depends, based on how independent the acquired company needs to be, and how useful it would be to share code components & engineering resources with the main monorepo.

In our case, up until Dec 2022, parts of our pre-acquisition monorepo were still separate, but gradually components and workflows, such as tests and reviews, were moved into the main repo. As far as I'm aware, we didn't even start merging into the main monorepo until more than a year after we were acquired, though of course there were exploratory efforts before that.

In general, I'm pro-monorepo; it makes sense to be able to update multiple interconnected components in lockstep. For the startup, we were still in research mode, so it was less urgent to spend eng/sci/TPM time integrating with anything on the FB monorepo side... until it was.


I just love talking about build systems, editors and developer tooling. Does anyone have resources on how different companies do it?


Google describes theirs in their software engineering at Google book. It’s available for free online


Off topic, but why haven't they migrated these kind of sites to the meta.com domain? I mean, these are Meta tools, right? Not FB tools? Unless... the rename was more of an exercise in liability obfuscation than it was an indication of any kind of reorganization...


If anyone is interested in an open source VCS that's trying to solve similar problems to these internal tools check out https://jamhub.dev.

(I am the author)


Best tool: macros. You can easily drop gifs and memes into just about anything.


Sapling looks interesting. Git has a horrible user experience.


Might as well release the Tupperware/Twine part and complete the picture :)


Efficiency!

- Where are you going?

- I'm going on vacation.

- Have you finished your project?

- Not yet. Just submitted the diff.

[^_^]


Someone is going to read this and start retooling their five-person developer organisation because "Facebook uses it".

It's funny that the Sapling command is "sl"; that's going to conflict with installations of "steam locomotive".


lol I've suffered under that before... except it was an ex-Googler forcing bazel on us, all of a 10 person dev team working on a codebase that was probably less than 15,000 LOC across three or four packages that just _had_ to be packaged into a monorepo.


I worked with a guy who loved to use "Microsoft does it" as justification. Likewise, my suggestions such as "maybe a consistent naming convention for these components would be sensible" were met with (literally) "hmm I haven't seen MS suggest this". That was my shortest developer gig.


Once it was set up, was it really that painful? I've worked with Buck, and for day-to-day usage it didn't really affect much.


I've worked for both Facebook and Google so can make informed comments on this with two exceptions: Buck2 came after I left and I'm honestly not sure what sapling is. Is it some Mercurial-like re-implementation a bit like how Google's Piper is a re-implementation of Perforce?

The tl;dr is that Google's developer tooling and infrastructure is superior in almost every way. Examples:

- When I started at FB we used Nuclide, an internal fork of the Atom editor. While I was there it was replaced by VS Code. It's better, but honestly they should've built their tooling off of JetBrains products. JetBrains makes IDEs. VS Code is a text editor like vim or emacs. There's a massive difference;

- Buck should've been killed and replaced by Bazel. I can't speak to Buck2 but this seems like a pointless investment;

- Thrift should be killed and replaced with gRPC/Protobuf. Same deal;

- FB's code search is just grep. It's literally called BigGrep. Grep can get you pretty far but it's just not the same as something with semantic understanding. Google has codesearch, which does understand code, and it's miles ahead. This has all sorts of weird side effects too, like Hack code at FB can't use namespaces or type aliasing because then grep wouldn't be able to find it. When there were name conflicts you'd sometimes be forced to rename something to get something to compile;

- Tupperware (FB's container system) is a pale shadow of Borg;

- Pushing www code at FB is a very good experience overall. You commit something and it'll get pushed to production possibly within an hour or two or, at busier times, it might take until the next day. This requires no release process or manual build. It's basically automatic; Google's build and release process tends to be way more onerous;

- The big Achilles heel in FB's www code is that it is one giant binary. There's no dependency declaration at all. This means there's an automatic system to detect if your change affects other things, and that process often fails. This leads to trunk getting broken. A lot.

- Because of the above problem there is a system to determine which tests to run for a given commit. This is partly about which components are affected, but also longer-running tests aren't run on commit, and often those are the tests that would've caught the problem. There is no way to say "if this file is modified, run this test" (see the sketch after this list). That's a huge problem;

- FB has a consistent system for running experiments and having features behind flags (i.e. Gatekeeper). This wasn't the case when I was at Google. It may well have changed;

- Creating a UI for an internal tool or a new page is incredibly easy at FB. There are standard components with the correct styling for everything. If you want to write an internal tool, you can start at 9am and have it in production by noon if it's not terribly complicated;

- The build system for C++ at FB is, well, trash. For Buck (and Bazel), the build system creates a DAG of the build artifacts to decide what to build. FB C++ might take 2 minutes just to load the DAG before it builds anything. This is essentially instant at Google because a lot of infrastructure has been built to solve this problem. This is a combination of SrcFS and ObjFS. Incremental builds at FB to run tests don't really work as a workflow;

- All non-www builds at FB are local builds. Nothing at Google (on Google3 at least) is built locally, including mobile apps. This is way faster because of build artifact caching and you have beefier build machines.

- There tend to be fewer choices as to what to use for FB code (e.g. storage systems). I consider this largely a good thing. You will typically find 5 different ways of doing anything at Google and then need to consider why. You will often find different teams solving the same problem in slightly different ways or even the exact same way.

- There are people at FB who work on system-wide refactors (e.g. web security, storage). These people can often only commit their diffs, which might touch thousands of files, on weekends.

- A lot of generated code is committed at FB that isn't at Google. This exacerbates the previous problem. FB has a ton of partially and completely generated files, which means a change to the generating code has a massive effect. At Google, for example, the protobuf generated code is generated at build time and isn't in the repo.

There's probably more but that's what comes to mind.
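On the "if this file is modified, run this test" point above: with a Bazel-style build graph you can at least approximate it from the command line by asking for the reverse dependencies of the changed file (target paths here are hypothetical):

    # which test targets transitively depend on this source file?
    bazel query 'kind(".*_test rule", rdeps(//..., //foo/bar:parser.cc))'

    # run exactly those tests
    bazel test $(bazel query 'kind(".*_test rule", rdeps(//..., //foo/bar:parser.cc))')

Roughly speaking, a target determinator is a productionized version of that query, plus policy about which long-running suites to defer.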


Buck2 is IMO much better than even Bazel is from a design POV, because it actually cleanly separates all user rules from the build engine, and has a coherent modern design around a sound theoretical basis. Neil, one of the leads and author of this post, has written many build systems, so it's not like he's unaware of Bazel; his taxonomy of build systems and ones like Bazel in "Build Systems a la Carte" is worth reading even for Bazel users. It also has a snappy UX and is fast on the command line, which Bazel still lags at a bit. Bazel has all the mindshare, though, and a lot of good features and libraries. But Buck2 is in a different league from Buck1 completely, and in many ways even from Bazel, IMO.

Even then, it's not like rewriting a billion lines of BUCK files to use Bazel was ever practical. Realistically, any solution had to have a direct migration path from Buck1 without rewriting everything. I don't even work there; this is just pretty obvious from the design constraints and from talking with the dev team. Frankly, I'm pretty impressed they were able to both meet the goals they had (migration from Buck1, better performance, more extensible) and jam-pack the thing full of good design decisions and features like they have. It's good work.

(I am a pretty happy, non-Facebook user of Buck2 already, FWIW.)


I hope Buck2 is better. Like I said, I have no experience with it.

My issue is more organizational. Meta, as a company, in my experience, does not put in sufficient investment to build a mature, robust open source project, with few exceptions. Even things like the www test infra, which were core to the company when I was there, had ~1 FTE SWE. That's not open source, but you get my point. Buck (and Thrift) seem to have been woefully underinvested in for years. Thrift was originally an intern project at a time when Stubby (Google's protobuf-based RPC) was not open source.


That's fair, and I agree, because honestly almost no companies IMO actually know how to build robust FOSS projects outside their own needs, so I wouldn't put that totally on one place. It pretty much comes down to the engineers, in my experience, and how much they understand the whole thing. It's a big problem, actually. But I know what you mean: Buck2 is definitely a Meta-first project right now, since they're still gearing up to replace everything. They've been pretty receptive to me, at least, but it probably helps that Neil has a bunch of FOSS experience (and we have a bit of rapport with each other, so that helps), and there are many more people than just him on the job who all seem to want it to succeed! I think a lot of the technical decisions will help it grow better than Buck1 did, too.


But Meta has had a lot of cool open source projects over the years: React, Cassandra, GraphQL, etc.


What about pytorch?


Everyone reading this should assume it's talking about FB around 2014 or so, I don't know how it could otherwise be so wrong.

> FB's code search is just grep. It's literally called BigGrep.

This hasn't been true for a long time, codesearch at FB is more complex than just grep and has some semantic understanding. Here's some discussion here when some of this infrastructure was open sourced: https://news.ycombinator.com/item?id=28365880

> FB C++ might take 2 minutes just to load the DAG before it builds anything.

Yeah, this is pretty terrible, but is mostly a description that doesn't apply to buck2.

> This is essentially instant at Google because a lot of infrastructure has been built to solve this problem. This is a combination of SrcFS and ObjFS.

SrcFS and ObjFS aren't what solve this problem at Google. And to the extent that they do, FB's sapling and buck integration do the same.

> All non-www builds at FB are local builds. Nothing at Google (on Google3 at least) is built locally, including mobile apps. This is way faster because of build artifact caching and you have beefier build machines.

This is wrong; since at least 2015 FB's build system has had build artifact caching, and you can see in Buck's git history that they've had remote execution just like Google (in fact, using the same RE API as Bazel) since like 5 years ago.


As for Glean, you concede "some semantic understanding". All I know is that as of 2 years ago I couldn't do "Find Usages" of a particular C++ method, whereas I could in Google cs 6+ years ago.

I did spend way more time doing www than C++, though, and Hack was specifically written to facilitate regex searches, like being able to search for "SomeClassName::someFunctionName" to find usages, as well as the other examples of prohibiting namespacing and type aliasing (in Hack). Google cs doesn't have that constraint.

> SrcFS and ObjFS aren't what solve this problem at Google

It's a mix. You can't just pull out one piece of the Google dev infra because it's all connected. In a P4 client, you'd list some paths that were "local" allowing local modifications. Forge, SrcFS, ObjFS, TAP and Sponge are all pieces of this puzzle.

> This is wrong; since at least 2015 FB's build system has had build artifact caching

First, there's more to FB builds than infra C++, most notably iOS and Android, which are all built locally. It's why iOS/Android engineers have big, chunky machines like the iMac Pro or the trash can. If this has changed, it's a fairly recent change. Google builds mobile apps on Forge. There are literally racks of Mac Minis to build iOS. With this you get build artifact caching like you do with, say, Google3 Java or C++.

Second, "local" requires some further explanation. Typically, things are built on a devserver (although this was transitioning to on-demand VMs, for which the Google equivalent was CitC). But a devserver build required a full checkout and build with artifact caching. It could then be incremental until an hg pull forced a larger rebuild. There were sparse checkouts but I think the support was pretty limited and only worked on certain infra projects.

Either way, the whole FB C++ build experience was fairly primitive and even worse for iOS/Android.


> Typically, things are built on a devserver (although this was transitioning to on-demand VMs, for which the Google equivalent was CitC)

I skipped over that parenthetical on first read, but my God, it's not even wrong. I mean, CitC is excellent and all, but describing those as equivalent just makes me wonder if you held some non-technical role and are just playing telephone from people who better understand things. I'm just imagining a conversation where you ask how Facebook handles some specific thing that's handled by X at Google (or the reverse), get an answer on one side that involves CitC and on the other that involves on-demand, and decide that that specific problem represents the entirety of those two ecosystems.

I guess one of the difficulties I have here is that a lot of what you've described across a bunch of Google-Facebook comparisons is just so factually wrong that I can't ignore the problems with a direct reading of what you are saying. At a deeper level, I think I'd totally accept Google's CitC and Facebook's on-demand as philosophically similar, in that solutions to seemingly unrelated things fit into the structure enabled by those systems. Again, though, I think crediting you with trying to discuss things in those terms would be too generous, given that that equivalence is no more true of on-demand VMs than it is of devservers.


> First, there's more to FB builds than infra C++, most notably iOS and Android, which are all built locally

Again, iOS and Android have had the remote artifact caching you mention as so important for many years now. Android has had remote execution for years (I think even before the infra C++ you mention).

> But a devserver build required a full checkout

This is not true with sapling, which had been used extensively for years.


> the whole FB C++ build experience was fairly primitive and even worse for iOS/Android

Flatly untrue: on iOS much of the infrastructure was well ahead of what anyone else had aside from Google. It worked so well that a tooling team responsible for upgrading to a new version of Xcode soon after its release declared victory and took credit for the compiler upgrade, as this had been an issue in past years.

They hadn't realized that a compiler team had been quietly running their builds company-wide for nearly two years and fixing compiler bugs on the bleeding edge of clang/llvm the whole time, so by the time the compiler was branched for Xcode and released from Apple, the fixes were already complete, open-sourced, and merged fully upstream.


> A lot of generated code is committed at FB that isn't at Google. This exacerbates the previous problem. FB has a ton of partially and completely generated files, which means a change to the generating code has a massive effect. At Google, for example, the protobuf generated code is generated at build time and isn't in the repo.

What's nice about this scheme at Google is all manner of generated code is indexed, so you can navigate up and down the caller-callee graph between artisanal code and generated code. Really works well.


A lot of this used to be true but is no longer so.

Buck2 is much faster than Buck and pretty great for the set of problems it's trying to solve. (I'm skeptical that the corner that Meta has painted itself into is good, but assuming that that can't change, buck2 is great.)

Remote builds are common now, from what I've heard.

Sapling is the name for Meta's fork of Mercurial. Piper is not a reimplementation of Perforce, by the way. The closest equivalent that Meta has to Piper is their source control server Mononoke.


> Piper is not a reimplementation of Perforce, by the way.

Yes, it is. You are probably confusing Piper, the internal name for the VCS, with Google Piper [1], which is a completely separate and unrelated project. That project seems a closer match to Mononoke.

This was a common problem at Google, actually. There would be an internal name, but once released the product would take on a different name. Sometimes that name would conflict with a different internal name, which then made internal searches impossible. I forget the specifics, but Buzz became an external product while also being an internal product name for something different.

Dart (the language) was internally known as Dash prior to release. IIRC Google Dash was an external thing for advertisers or something like that.

So when I say Piper I really do mean the internal rewrite of Perforce.

[1]: https://cloud.google.com/customers/piper


That link is not to a Google project named Piper. It's to a company named Piper that uses Google Cloud.


I'm not confusing anything. Saying that Piper is a reimplementation of Perforce is like saying that Git is a reimplementation of CVS.


I've been out for a couple of years, but...

> I'm honestly not sure what sapling is

My rough understanding: arcanist + git evolved into arcanist + hg, hg became a frontend to cHg and then the whole edenfs/sapling stuff started to replace that to optimize sparse checkout workflows.

> - Buck should've been killed and replaced by Bazel. I can't speak to Buck2 but this seems like a pointless investment;

Bazel's open-source version remained significantly less capable than Buck for a long time; Buck migrated towards Skylark just like Blaze, and cleaned things up greatly.

> - Thrift should be killed and replaced with gRPC/Protobuf. Same deal;

In general, agree with the former, but that's because protobufs and especially flatbuffers were already used in many places for many many years.

> FB's code search is just grep

There was also a less popular semantic search that people didn't use nearly as much.

> There's no dependency declaration at all

Strictly speaking this is not true, it was just not usually needed.

> There is no way to say "if this file is modified, run this test".

You needed to update the target determinator, but I do think this was possible.


> FB's code search is just grep. It's literally called BigGrep. Grep can get you pretty far but it's just not the same as something with semantic understanding. Google has codesearch, which does understand code, and it's miles ahead.

Pedantically, codesearch is also grep. But codesearch calls out to Kythe (nee Grok) which has a semantic graph of the code.

But.... internally everyone thinks the Kythe team is just codesearch anyways, so that's about right lol.

Kythe is partially open source, but critically a lot of the postprocessing to get it to work at massive monorepo scale is not, so FB would have a bunch of work to do to replicate it.

Also we don't have a PHP indexer, because nobody's written one.


How many engineers did Meta lay off in the last 12 months? Is there a developer tool for laying off developers?


Fewer than some other companies. More than some others. But yeah, there are internal tools for layoffs. How else would you turn off access to hundreds of laptops at once?


I stopped reading when I got to the part where they made their own CVS.

This is a solved problem, and even if it isn’t, there’s entire communities dedicated to it. There’s literally no reason why Facebook needs to be in the business of reinventing the wheel from scratch beyond some dude trying to show “impact”. It’s a distraction from core business problems.


> This is a solved problem.

Hosting a gigantic monorepo for 25K concurrent users is so far from a solved problem.


[flagged]


I disliked a lot of things about working at Meta, the monorepo was not one... it was truly amazing, and very well done.


For a moment, let's assume they do abandon the monorepo. What solution would you recommend for managing code dependencies and coordinating releases between thousands of teams (at a modest 5 repos per team) - git tags?


Coordinating releases across teams is not a unique problem. In fact, every large software organization solves this problem. They don't usually do it in an assbackwards way due to institutional blindness.


You're deflecting. What solution would you recommend, since you disapprove of a monorepo as a solution to this problem we both agree exists?

If you're not simultaneously updating the code and all its references (i.e. a monorepo), you will need a version dependency graph system (integrated with your build system). I've yet to encounter one such tool that isn't awful to use[1]; monorepos are an improvement when you grow beyond a couple dozen repos. Git submodules aren't a good solution either. If you're familiar with a decent tool/workflow that is not "institutionally blind", I'd love to learn more about it.

1. Gradle, Android's "repo", home-grown git-submodule-based build systems.
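To make the pain concrete, "bump a shared library" with plain git submodules looks roughly like this in every consuming repo (repo names and the sha are made up):

    # one-time: pin the dependency to a commit
    git submodule add https://example.com/acme/libcore.git third_party/libcore
    git commit -m "pin libcore"

    # every release of libcore, in every consumer, in dependency order:
    git -C third_party/libcore fetch
    git -C third_party/libcore checkout <new-sha>
    git add third_party/libcore
    git commit -m "bump libcore to <new-sha>"

Multiply by N repos, with CI runs and review in between, and the appeal of updating everything in one atomic commit becomes obvious.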


What version control system do you consider to have solved it?


That's a very uninformed opinion. Nobody could say that unless they are capable of actually coming up with a solution that can handle version control at Meta's scale.


>they made their own CVS.

It is a fork of Mercurial, and there wasn't a community dedicated to making it scale to the size Meta was reaching, hence why they invested resources in making Mercurial scale.


You might be surprised to learn that it was not and still is not a solved problem for companies like Meta. Here is an earlier write-up about this topic from Meta: https://engineering.fb.com/2014/01/07/core-data/scaling-merc...


Also keep in mind they've been in hypergrowth mode since 2010 or earlier. Regardless of whether Git is the clear solution today, it definitely wasn't at that time. So it makes sense they invested in building their own tooling.


And now it's 2023, and they're inventing the wheel again.


Not a very robust analogy. There are tons of different wheel designs out there. Some are good for racing and suck for rain. Some are great in the snow but are noisy and inefficient on the highway. Etc... Maybe their needs are so special they need a wheel design that doesn't commercially exist?


Someone really needs to reinvent that analogy.


You might be surprised to learn that they're solving the wrong problem.


I don't think you fully understand how big their codebase is and how many different teams are working on it at any given time.

There are very fundamental differences between a monorepo and a bunch of repos for each "service" or whatever. Lots of tradeoffs. I've worked with both, and I can see the reasons for huge monorepos. They make a lot of things that were previously hard much simpler... The tradeoff is that your tooling needs to be able to scale with the growth of the company. And for a company the size of FB, dedicating an entire team to improving the tooling for their monorepo is well worth it.


Sit down. I worked at FB.


One year in 2014/2015?

Saying that Meta "reinvented CVS" means you either know very little about source control or you are just being purposely misleading. Neither is a very good look.



