Using Rust in Mercurial (mercurial-scm.org)
341 points by oblio on Dec 4, 2017 | 147 comments



So my understanding is the hg developers are planning to rewrite a large part of hg in Rust. If so, this is an unfortunate blow to Python, because I often hear Python developers cite hg (and I do as well) as one of the largest Python-based applications (regardless of the C extensions). I would certainly feel sad if that is the case.

I studied hg quite in depth for a semester as an undergraduate, when I was implementing "bitbucket" myself. To be really honest, the codebase was easy to navigate, and function names were pretty consistent with the actual hg commands/internal spec. While it is probably hard to write true unit tests for the code itself (you'd have to monkeypatch like crazy using mock), which suggests the functions contain a lot of code, overall the codebase quality was pretty good for complex software. I just had to learn the variable name abbreviations, get used to them, and refer back to the Hg paper.

Anyhow, whatever the decision is, I'd learn a bit of Rust to help out :-)

P.S. I am still trying to find an answer to this: if FB uses Hg, then what about their git code?


> rewrite a large part of hg in Rust

That's not really the right story. The motivating factor is that we'd like to write /less C/, not /less Python/. It's likely in the long-term that parts of the code that are performance sensitive will move out of Python, but we've been doing that for years, and we just keep finding new things to optimize as the scale of Mercurial repositories keeps increasing.


Really a question to indygreg, but just curious to know if you considered using D with pyd? What are your thoughts?


> If so, this is an unfortunate blow to Python

As both a Rust and Python user, I don't necessarily agree. This isn't a rewrite to purge Python; it's clear from the post that Mercurial values Python's flexibility, and wants to bend over backwards to ensure as little of that flexibility is lost in whatever transition may occur. And Python's credentials would be secure with or without Mercurial being written in it. Frankly, I think that swapping out critical bits for a low-level language is a great way to scale a dynamic-language codebase without completely sacrificing the usual ergonomics of dynamism.


I think the article makes it clear that a 'purge' of Python is desired:

> In addition to performance concerns, Python is also hindering us because it is a dynamic programming language. Mercurial is a large project by Python standards. Large projects are harder to maintain. Using a statically typed programming language that finds bugs at compile time will enable us to make wide-sweeping changes more fearlessly. This will improve Mercurial's development velocity.


"Desired End State

hg is a Rust binary that embeds and uses a Python interpreter when appropriate (hg is a Python script today)"

Clearly not a complete purge.


The end state described is one where Mercurial is written in Rust so that large scale changes can be made without touching any Python. 3rd party extensions may still be permitted in Python.

If that's not what the authors intend I'd suggest they rewrite the document to make clear what the quote above means to describe.


Basically, Mercurial is migrating away from Python as its primary and core programming language. You wouldn't say Git is written in Python even if Git had a little Python code (I actually don't know, never read Git code), would you?

The Python community agrees that to combat performance problems, Python developers would write C extensions and/or compile code with Cython before considering a language migration. You can no longer say Hg is written in Python. You can't cite that in any of your conversations unless you say "was written in Python". This migration shows that while Python's flexibility and dynamism are great, especially in the context of development velocity, Python is no longer Hg's first-class programming language of choice whenever possible. There is a limit as to how much you can get from Python. That's a blow, a temporary crying moment. I recognize this is cynical or entirely an ego thing. Again, to be clear, I am not criticizing the decision to move to Rust, because I have equal respect for Rust, but it is nonetheless sad to see another Python project moving away.

Sometimes improvements to a programming language stem from the limitations observed by a popular project (and equally by the large number of users).

P.S. On the other hand, moving the CPython dev workflow from Hg to Git was a blow to Hg.


> Basically, Mercurial is migrating away from Python as its primary and core programming language. You wouldn't say Git is written in Python even if Git had a little Python code (I actually don't know, never read Git code), would you?

Not necessarily. It would show that you can prototype an application in Python and, once you have a stable product, have a migration path to Rust for optimization. That might be even a bit more convincing than the prospect of starting a new project in a non-GC lang.


> The Python community agrees that to combat performance problems, Python developers would write C extensions and/or compile code with Cython before considering a language migration.

Not sure if this is what they're doing here, but you can also write extensions in Rust instead of C. If that were what's happening here, I don't think that it would say anything more about Python than writing a C extension would (although it might say something about C vs. Rust)


You know I think hacker news and reddit can be exasperating because only people who disagree reply.. so nobody ever seems to agree with you or understand you. Just wanted to stop in to let you know that I 100% agree this is a sad moment for a python fan or dev. As a general rust fan (although in this instance I'm a rust hater), this is a nice moment, but I 100% see how this sucks. It's as deflating as if a large rust project ditched rust, or Mozilla maybe said "actually rust was a waste of time, let's go do Go or C++".. truly a sad moment. I'm with you on that!

For me, if I received a comment like the one I just wrote, I may feel slightly vindicated and ok about myself, for those of you who wonder why I wrote this comment and are not satisfied with the first sentence as explanation. I'm sure there'll be others who still think this was a stupid comment.. but hopefully someone out there understands where I'm coming from and sees the value in what I'm saying.


> If so, this is an unfortunate blow to Python

I doubt this will have much impact - bazillions of other things use python


> this is an unfortunate blow to Python

Just as unfortunate as how Python turned its back on Mercurial


You are referring to https://www.python.org/dev/peps/pep-0512 aren't you?


Such a petty and childish thing to be upset by. If Mercurial is so great, then surely nobody would switch to git. Mercurial's extensibility has served some people, but in large part people prefer to work with git today. If that's what constitutes a betrayal, I don't see how there was much of a relationship to betray.

Mercurial, Bazaar, Darcs, CVS, and even Subversion are all now barriers to entry for a project. Git has won the vast majority of mindshare and good will by being straightforward, ubiquitous, and well supported.


People are switching to git because of popularity and GitHub, regardless of whether it's technologically superior. The better technology doesn't always win; often it's "just" the more popular one.


Sounds like a cop-out to me. GitHub and GitLab are strengths of Git, just as L3 networks, Verizon, CloudFlare, AWS, and Akamai are all strengths of TCP.


> Git has won the vast majority of mindshare and good will by being straightforward, ubiquitous, and well supported.

Thanks for a good laugh. The sheer volume of "No the REALLY right way to think about git's totally brain damaged use of source control concepts. This time we mean it." tutorials shows that "ubiquitous" is true but "straightforward" and "well supported" are a lie.

But you are correct. The vast majority of coders have decided on git. I will happily sit over here on Mercurial while you try to figure out the arcane git command for something that I was done with in Mercurial hours ago.

The Mercurial->git gateways are now quite good and I can do my development in Mercurial and transfer it to git at infrequent intervals. It's not like git people want to see all my intermediate commits anyway--that's the whole point of "rebase" after all.


> while you try to figure out the arcane git command for something that I was done with in Mercurial hours ago

I sense a lot of resentment in this statement, but to be honest I have no idea what people are talking about when it comes to their gripes against Git. Little goes wrong, and when anything does, fixing it is a reflog or a reset away. I could understand being frustrated trying to jump right into it, especially with the interface as of the mid-late oughts, but it's fairly straightforward these days. The defaults make sense, and the built-in tools are consistent and plentiful. I doubt I could compose a correct filter-branch from memory, but I don't even think I could find an example for Mercurial, and I doubt the manpage would be as helpful.


> fixing it is a reflog or a reset away

Only for people who have internalized what GP called "git's totally brain damaged use of source control concepts." When you don't understand the structure of the underlying database that Git uses, the error messages that Git returns might as well be in Greek for all the help they provide. Mercurial has mostly the same concepts as Git but the developers have taken the time to translate the error messages that are returned to users into words that actually explain what the problem is and how you might go about fixing it.

In my experience with Git, there are two ways to be productive with it. The first is to go read Git source code and basically understand exactly how each command you issue is changing the data in your repository. I eventually resorted to this strategy the umpteenth time that I managed to get my repo into some bizarre state that I'd never seen before. The other solution is to ensure that the entire team sticks to a very precise subset of what Git provides and be pretty militant about limiting everyone to those blessed incantations. Git flow is pretty useful for this since adherence to the steps generally keeps a Git repo in working order. I generally try to follow this second strategy since following the first means that you become the "Git guy" at work that everyone turns to when they've managed to get in over their heads.

Also, mostly unrelated, but when I work in Git, I miss hg serve. Being able to pull changes from/to colleagues using zeroconf without needing to set them up as a remote was really useful. I know Git has instaweb, but it's an incomplete solution and just another example of where using Git just has a lot more friction than the more pleasant hg experience.


> When you don't understand the structure of the underlying database that Git uses

For me this is totally exactly true for Mercurial. I find git's underlying model way easier to understand. Mercurial with its 3 different branching representations (branches, bookmarks, patch queues..) and "state tracking" is really quite a bit more complicated and I do not find it is easier to internalize; quite the opposite, as a git user I have found it extremely difficult to contribute to hg-based projects for this reason. For me, git is just a tree and pointers to nodes on that tree. I find it hard to see how it is difficult to understand.
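
The "tree and pointers" mental model can be sketched in a few lines of Python. This is a toy illustration only, not how git is implemented: the `Repo` and `Commit` classes and all method names are hypothetical, and merges, remotes, and the staging area are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """A node in the history graph; it only knows its parents."""
    id: str
    parents: tuple = ()


class Repo:
    """Toy model: commits are nodes, a branch is a named pointer to one node."""

    def __init__(self):
        self.commits = {}    # commit id -> Commit
        self.branches = {}   # branch name -> commit id (the "pointer")
        self.head = "main"   # name of the branch currently checked out

    def commit(self, commit_id):
        # The new commit's parent is whatever the current branch points at.
        tip = self.branches.get(self.head)
        parents = (tip,) if tip is not None else ()
        self.commits[commit_id] = Commit(commit_id, parents)
        # Committing just moves the current branch pointer forward.
        self.branches[self.head] = commit_id
        return commit_id

    def branch(self, name):
        # Creating a branch copies a pointer; no commits are copied.
        if self.head not in self.branches:
            raise ValueError("cannot branch before the first commit")
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        if name not in self.branches:
            raise KeyError(name)
        self.head = name

    def log(self, name):
        # Walk first parents from the branch pointer back to the root.
        history = []
        current = self.branches.get(name)
        while current is not None:
            history.append(current)
            parents = self.commits[current].parents
            current = parents[0] if parents else None
        return history


repo = Repo()
repo.commit("a")
repo.commit("b")
repo.branch("feature")
repo.checkout("feature")
repo.commit("c")
print(repo.log("main"))     # history reachable from the "main" pointer
print(repo.log("feature"))  # history reachable from the "feature" pointer
```

In this toy model, creating a branch costs one dictionary entry, which is roughly the intuition the comment describes.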


Branches are, well branches, bookmarks are pointers, and patch queues are quilt integrated into Mercurial, and not something you need to worry about anyway. And if you need to, it's just a staging environment for commits before you publish them to the immutable branch.

I have now, for 3 teams, migrated them to Mercurial from Git, and all now have zero issues with their DVCS. Every now and then, they would write and ask for help with some issue with Git they need solved.

It seems the average developer does not want to learn the intricate workings of Git to properly use it, and just wants a tool that does what they think it will and will not shoot them in the foot if they issue the wrong command.


My point is that you can make the exact same argument in the other direction, so that argument is pointless. Hg users seem to take it for granted that it's so much easier to use and understand but I simply don't find it so. It is not an objective point of view.

Case in point,

> Branches are, well branches, bookmarks are pointers,

you say this as if it is obvious and means something. As a git user, "pointers" are branches, and "branches are branches" means nothing at all. Contrary to claims of Hg users, to understand how branches work I need to spend some time understanding the internal representation of branches and experimenting with toy local and remote repositories to make sure I don't step on my own foot in a real project. Just like I did when I was learning git. The claim that it is somehow more natural and requires less learning is unfounded imho.

> just want a tool that does what you think it will, and will not shoot them in the foot if they issue the wrong command.

I'm sure I'm not the only person who has shot themselves in the foot using Hg. Adding files and changes I didn't intend, pushing them, having to go to the repository site and remove that tip, trying to delete the branch locally to redo things, giving up and re-cloning the repository. These are all learning steps that one goes through using Hg. Just like git.


Take a whiteboard, draw the commits as blobs, and draw arrows between nodes and parents. That is all there is to it. You need to do this for Git as well. This is not an argument for or against. But calling Mercurial's branching model confusing compared to Git is dishonest IMO.

The major deciding point between the two is that Mercurial sees history as an immutable truth. Git does not, and actively encourages changing it. This is also the reason behind Mercurial's "push/pull it all" and Git's "push/pull what I say". I believe this is the major point of conflict between the two camps. Once you master Git, you get a tool. Mercurial actively discourages this history rewriting, and many from the Git camp get frustrated that they can no longer easily manipulate what is in the repository.

If all you ever do is branch, commit, and merge, then either is fine. If you want the ability to modify history, use Git. If you explicitly do not want to modify history, Mercurial is the better choice. And this is the crux of my point of view. If you modify history in either system, you may end up shooting yourself in the foot. But it's significantly harder to do in Mercurial, exactly because the system encourages not doing it.


> The major deciding point between the two is that Mercurial sees history as an immutable truth.

That view is quite outdated[1]. It's true for _published_ history though, but that's it. Couldn't help but note Mercurial did it right, _again_ :)

1: https://www.mercurial-scm.org/doc/evolution/


I was only talking about commits in a publishing repository. What happens on a developer's machine, and between developers as draft changesets, is quite a different story. We use that a lot and I really like it. Git has the concept of removing commits remotely and garbage collection.


I guess you haven't read the link posted. Changeset evolution is not about draft commits that a developer changes locally. It's about actually changing the public, pushed history in a way that nothing breaks for other developers who already based commits on those that get altered.


Um, that's not quite true: https://www.mercurial-scm.org/doc/evolution/sharing.html#pub...

The limits imposed by phase are not going anywhere; you still cannot edit published history.


Publishing repository by definition contains public changesets. You cannot alter history for them without force. Again, what happens on and between developers can and should be trimmed before it's pushed to the publishing repository.
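
The rule described here (draft changesets may be rewritten, public ones may not without force) can be illustrated with a toy model. This is a sketch under stated assumptions, not Mercurial's implementation: the `Changeset` class, `push_to_publishing_repo`, `amend`, and the `force` parameter are hypothetical names; only the phase names "public" and "draft" come from Mercurial.

```python
from dataclasses import dataclass

PUBLIC, DRAFT = "public", "draft"


class ImmutableHistoryError(Exception):
    """Raised when a rewrite targets a public changeset without force."""


@dataclass
class Changeset:
    node: str
    phase: str = DRAFT  # new local commits start out as drafts


def push_to_publishing_repo(changesets):
    # Assumption: pushing to a publishing repository makes changesets public.
    for cs in changesets:
        cs.phase = PUBLIC


def amend(cs, new_node, force=False):
    # Rewriting is only allowed while the changeset is still a draft,
    # unless the caller explicitly forces it.
    if cs.phase == PUBLIC and not force:
        raise ImmutableHistoryError(f"{cs.node} is public; refusing to rewrite")
    return Changeset(new_node, DRAFT)


work = Changeset("abc123")
work = amend(work, "def456")      # fine: still a draft
push_to_publishing_repo([work])   # now public
try:
    amend(work, "789abc")
except ImmutableHistoryError as err:
    print("refused:", err)
```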


It's easy to understand, I just don't get why I need it for anything but Git development. I mean I was perfectly able to be productive with Mercurial without a single shred of knowledge of its internals. True, I've learned them anyway, but because I _wanted_ to, not because I _had_ to in order to be able to do anything beyond these 5 commands mentioned in every git tutorial or understand what git output actually means. And, oh, these internals actually are leaking abstractions, multiple episodes. "You had one job" ©


> It's easy to understand, I just don't get why I need it for anything but Git development.

I dunno man.. I don't particularly want to implement a macro language compiler these days, but it sure helps to understand how one works when working with the C preprocessor. (Just a dumb example..)

I mean, are you actually arguing that it should be unnecessary to understand the underlying models that your tools use to do what they do?


> When you don't understand the structure of the underlying database that Git uses,...

The directed acyclic graph of commit objects, their associated trees and blobs? After all these years and git tutorials, you would have to be trying really hard to avoid coming across an easy git intro. If someone is a developer, I'd say it's past time to learn it.
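
For readers who have not seen it spelled out, here is a rough Python sketch of how blobs, trees, and commits become content-addressed objects. It assumes git's default SHA-1 object format and handles only flat trees of regular files; the helper names and the author line are made up for illustration.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    # An object id is the SHA-1 of "<kind> <size>\0" followed by the body.
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    return git_object_id("blob", content)


def tree_id(entries: dict) -> str:
    # Simplification: regular files only (mode 100644), no subdirectories.
    # Each entry is "<mode> <name>\0" followed by the raw 20-byte object id.
    body = b"".join(
        b"100644 " + name.encode() + b"\0" + bytes.fromhex(oid)
        for name, oid in sorted(entries.items())
    )
    return git_object_id("tree", body)


def commit_id(tree: str, parents: list, message: str, who: str) -> str:
    # A commit names one tree and zero or more parent commits,
    # which is what makes the history a directed acyclic graph.
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return git_object_id("commit", "\n".join(lines).encode())


who = "Example Dev <dev@example.com> 1512345600 +0000"  # hypothetical identity
readme = blob_id(b"hello\n")
root = tree_id({"README": readme})
first = commit_id(root, [], "initial commit\n", who)
second = commit_id(root, [first], "second commit\n", who)
print(readme, root, first, second, sep="\n")
```

Because a commit's id covers its parent ids, changing any ancestor changes every descendant id, which is why rewriting history produces new commits rather than editing old ones.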


The directed acyclic graph is fine.

But that is not really the structure once we start having more than one git repository interacting with remotes. And then there is the staging area and the stash. Tracked, untracked, and ignored files. There's just a lot of state, and a lot of duplication, where the same result can be achieved in various ways.

Git is a disaster for version control. It excels for one particular use case, and user type. Because of this it has become ubiquitous, and now we need to teach it to everyone.

Part of my work is getting scientists to use version control. I kid you not, there are many here who follow the advice: Before you try to put it into git, make a backup.

And that's sane. Because who the hell knows what will happen once you push the commit and then some changes are merged in, or even worse, the merge fails, etc...

The unwillingness of a large number of HN commentators to see the weaknesses of git is astonishing to me.


> Part of my work is getting scientists to use version control. I kid you not, there are many here who follow the advice: Before you try to put it into git, make a backup.

I know some that won't delete code. If they need to change something, they leave the previous code commented out. They don't merge, either. All changes are manually applied to every branch. They don't trust SVN, despite having used it for years.

I don't know why you think it would be better with hg. You can lead a horse to water, but you can't make him drink.


> The directed acyclic graph is fine.

> But that is not really the structure once we start having....

Well, yes, the DAG of commits with trees and blobs attached to them is still the structure of the underlying database, which is what you were originally talking about. But OK, you want to talk about the complexity that arises from git being distributed and having a staging area. That's totally valid, and it's what simplified wrappers like gitless are here to solve. Maybe try those?

> Git is a disaster for version control. It excels for one particular use case, and user type.

So, because git is excellent at its intended use case, it is a disaster for version control in general? OK...

> Part of my work is getting scientists to use version control.

Well, maybe your scientists should be using some other version control tool if they are having so much difficulty with git?


I am a big fan of gitless. It shows how things could be better with the same underlying structure.

But the standard git interface cannot realistically be avoided either. Whether it's the IntelliJ or Atom git plugin, they are all closely modelled on the git commands.

We also have a considerable number of people working on Windows.

The disaster is that because the open source community has entirely embraced git, git is no longer optional. We could try to teach people two tools (pushing git off until later), and it's something we have considered, but that has obvious downsides, too.

It's still something we will investigate, but we are in an extremely resource strapped environment. Time, IT staff, etc...

On a technical note, a single DAG that you append commits to is vastly different from a number of DAGs (technically a directed acyclic forest) that interact in non-trivial ways. So no, the DAG is not the underlying database, it is just a part of it.

Failing to make this distinction is how we end up with millions of tutorials explaining the easy part, and plenty of smart users that still end up with messed up repos.
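
To make the "several interacting DAGs" point concrete, here is a small Python sketch of comparing what a local tip and a remote tip can each reach. It is an illustration with invented names (`ancestors`, `ahead_behind`), not a description of git internals, and it assumes both histories fit in one parent map.

```python
def ancestors(parents: dict, tip: str) -> set:
    """Return every commit reachable from tip, including tip itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


def ahead_behind(parents: dict, local_tip: str, remote_tip: str) -> tuple:
    """Count commits only the local side has, and only the remote side has."""
    local = ancestors(parents, local_tip)
    remote = ancestors(parents, remote_tip)
    return len(local - remote), len(remote - local)


# Shared history a <- b, then the two repositories diverge:
# local added c on top of b, the remote added d on top of b.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"]}
print(ahead_behind(parents, "c", "d"))
```

The extra state the comment complains about comes from this bookkeeping: each repository has its own view of the tips, and a push or pull has to reconcile the two views.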


Yes, the distributed part is hard to wrap your head around. But IMHO, hiding it is even worse because you'll always be wondering what exactly is going on. Whereas with git you just check the status and the current DAG to figure out where you are.


But it's not so easy to just check the status of the repo I am pushing and pulling from. Is there a UI that will show me side by side the two DAGs that are being coordinated through push/pull?


Congratulations. You just hit the nail on the head.

"Developer"

Source control should be usable by people other than developers.

And, to be blunt, if the end user has to be a developer to understand a source control system, the source control system is the problem--not the end user.


> Source control should be usable by people other than developers.

Sure, but does that mean that git needs to be usable by everybody? I don't see how that follows. There are plenty of source control tools, they range in ease of use from beginner to advanced. Git was developed to be a source control system for software developers. If you try to shoehorn it into every use case, how are you going to make anyone happy?


> but to be honest I have no idea what people are talking about when it comes to their gripes against Git. Little goes wrong, and when anything does, fixing it is a reflog or a reset away.

I will be happy to call you the next time one of my devs does something that causes git to wedge itself into one of those states. It happens about every 2 weeks--and I suspect that it happens more frequently than that.

Normally I can unwedge git after reading approximately 10 git pages all claiming to fix my problem (they don't but they probably point me at the brain-damaged command I need). Most of the time ... about once every 3 months everything fails and we have to pull a backup.

Fortunately, the devs have now been brainwashed to rsync the repository on at least a daily basis and always before they do anything besides a basic checkout or commit. But, hey, they're using git, and it's what everybody else uses.

As someone who uses Mercurial and does some pretty hairy stuff with it, I have never put the system into a wedged state. NEVER. The only time I saw a Mercurial repo get wedged was when the underlying filesystem barfed on itself.

Yeah, I'm a touch sore about this because I regularly get to fix problems that WOULD NOT EXIST if the team was in Mercurial instead of git. However, as a manager who likes to think of himself as "good", I cannot force tools on a team unless they actually do measurable damage.

I also get sore with the "Well, if you only understood the git model..." I DO understand the git model--much to my horror. The problem is that while a newbie can learn the Mercurial model in about 60 seconds and be productive, the git model takes about 60 months to learn and they will still regularly footgun themselves.


So you've repeated your assertion that "people break git" and that fixing it "is hard".

It seems surprising that so many people, familiar with daily use, manage to keep breaking things in your environment, but others here don't.

Have you written bug reports? Or documented the nature of these broken states?

90% of the git users probably just run a handful of commands. ("git status", "git add", "git diff", "git commit", and creating/merging branches). Those commands seem unlikely to cause widespread problems, but perhaps you're doing something else?


Just out of curiosity, which states do your developers put git in? I’ve worked with people that were learning git and the worst they’ve done is committing files with conflicts after a merge or putting in files that should be ignored. Oh, and one guy who force pushed to a repo and made the CI system refuse to pull the changes until I reset the repo there.



Decided, or followed the crowd? The vast majority of them would probably defend their choice tooth and nail (seen that lots of times).


Yes, git won.

This does not mean that mercurial is not highly ergonomic, equally powerful, with a very ordered and dynamic codebase, and a sane community.

Edit: and I keep wondering what the panorama would have been like if we had hghub instead of github.


There is, Bitbucket was exclusively Hg before Bitbucket saw Git's popularity. This is my opinion: Bitbucket is like the new Sourceforge. I'd be interested to see a survey on Hg and Git usage today.


IIRC from when I looked ~2 years ago, > 80% of new repositories on bitbucket were using git. (and that's a conservative estimate, because I'm not entirely sure whether it clearly was over 90% or not, but it might (likely) have been).


> I keep wondering what the panorama would have been like if we had HgHub instead of GitHub.

Yeah, I think that the Linux process and jargon both came along for the ride with Git, so it would probably be even more different than you think.


git won because of GitHub and StackOverflow.


> what about their git code?

Could you elaborate?


FB has git repositories on GitHub, for example, React.


You don't necessarily go to github just to use git, but for the community, advertisement, ...

Also facebook is a big company, and like all bigger companies many people in it prefer different stuff.


I am not sure if I wasn't clear.

I have never worked for FB, so I don't know how FB handles this internally. But my understanding is that FB uses Hg for its main repository. However, FB has developed open source projects, most of which are published on GitHub, so those development teams work with Git. The question is, how do they reconcile the fact that some dev teams are context switching between hg and git? How common is that?

Personally, I would allow one DVCS only, so my internal development platform doesn't have to support both Git and Hg (who knows, maybe FB also has teams using SVN for good reasons).


People at Facebook who use both internal repos and external ones (like me) need to know both hg and git. It's not too hard; the two provide similar functionality and many of our hg extensions (https://bitbucket.org/facebook/hg-experimental) make hg more git-like in some ways.

Most of our open source projects also use an in-house tool "fbshipit" https://github.com/facebook/fbshipit to automatically export internal commits to open source so if you make changes that happen to touch an open source project but it's not your primary focus then you can work entirely in the internal (hg) repo and your changes get copied out automatically.


Do you go through a process to clean those commits for open source?

It sounds really interesting - I assume the tool is mostly just for marshalling commits (or parts of commits?) that touch external projects, but I would also assume that someone then needs to do the work to get those changes into a publishable state. Organising commits into a consistent set of changes, meaningful commit messages for the target project, removing any vestiges of internal content.

Is there a team who manage all of those aspects, or is it down to the individual developers liaising with the open source projects?


We don't. We err on the side of moving faster here and asking people not to put internal-only info in commit messages that might be exported. We don't reorganize commits since it's usually more valuable to export them as they were developed/committed internally. No one looks at every commit before they get published to our GitHub repos though, other than the regular code review that happens on every internal commit.


Makes a lot of sense.

What happens in the case of a series of commits that belong together? Is there a way of making sure they are applied to the project correctly (say in a branch or something)?

I guess the point is moot if internally you are more-or-less working on the full tree of the external project. In that case, the commits would already be organised however they needed to be.

The alternative would be that the external project is mixed in with an internal one, and you make commits against both in the same tree. Lots of potential edge cases around refactored commits etc, and I would guess the tool assumes you follow a certain set of workflow guidelines (like not force pushing a new set of commits) so those edge cases may not be that important.


External projects that use fbshipit each live in a subdirectory of an internal project. We don't use branches or merge commits, and commits are exported in order.


How does the other side work?

Do third party changes get brought into the external project, and how do they get brought back into the internal project?

For example, would a third party contributor need to rebase their work when pushing in order to make history linear? Do you ever get situations where different internal projects are trying to use fbshipit on the same external project, at the same time?


> Do third party changes get brought into the external project, and how do they get brought back into the internal project?

There's a feature of fbshipit called "importit" that allows for importing an external pull request from GitHub and reversing the path mapping in order to create an internal commit. That's how we land all changes on projects that use fbshipit. The internal repo is the source of truth and an employee always needs to land that commit internally, then it gets synced back out automatically and the PR closed.
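
The path mapping in both directions can be pictured with a short Python sketch. To be clear, this is not fbshipit's code or API; the function names and the `internal/myproject` directory are invented, and the only assumption taken from the thread is that each external project lives in a subdirectory of an internal repo.

```python
from typing import Optional


def export_path(internal_path: str, project_dir: str) -> Optional[str]:
    """Map an internal repo path to its path in the external repo.

    Returns None for files outside the project's subdirectory,
    which therefore never leave the internal repo.
    """
    prefix = project_dir.rstrip("/") + "/"
    if internal_path.startswith(prefix):
        return internal_path[len(prefix):]
    return None


def import_path(external_path: str, project_dir: str) -> str:
    """Reverse the mapping, e.g. for a pull request landed internally."""
    return project_dir.rstrip("/") + "/" + external_path


def export_commit(changed_files: dict, project_dir: str) -> dict:
    """Keep only the changes that touch the open source project."""
    exported = {}
    for path, content in changed_files.items():
        external = export_path(path, project_dir)
        if external is not None:
            exported[external] = content
    return exported


internal_commit = {
    "internal/myproject/src/index.js": "new code",
    "internal/secret_service/config.py": "not exported",
}
print(export_commit(internal_commit, "internal/myproject"))
print(import_path("src/index.js", "internal/myproject"))
```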

> For example, would a third party contributor need to rebase their work when pushing in order to make history linear?

We typically end up squashing each PR into one commit.

> Do you ever get situations where different internal projects are trying to use fbshipit on the same external project, at the same time?

Each external project lives in only one internal repo and that is the source of truth. Occasionally we have scripts to automatically copy files between internal repos when pieces need to be shared but we try to avoid it. We prefer merging repos together and making larger single repos. Most of our developers work in one of two large repos – one for all our web products, and one for all our backend services and mobile apps.


That's really cool, thanks for taking the time to go through the details.

It's always interesting to see how other people manage projects and collaboration.


I'm at FB and switch between the two pretty regularly. Of course I'd prefer to just use one, but I haven't found this is a problem in practice.


Any large code hosting company will allow multiple client protocols for check-in and checkout.

You can even use SVN to do github checkouts https://help.github.com/articles/support-for-subversion-clie...


I don't work at facebook. But there is stuff like this:

https://hg-git.github.io/

So in the end it doesn't really matter.


I'm very happy to hear about any and all blows to Python! Especially ones caused by Rust.


Initial reaction to headline was it sounded like (a) no more python, and (b) this is a decided future direction.

Instead, it sounds like this is a proof-of-concept for flipping the main 'hg' command from being python + C extensions, to instead being a rust binary with an embedded python interpreter. Part of the rationale appears to be performance, but also smoothing out cross platform experience, especially on Windows.

Pulling out some related snippets:

-----

While Python is still a critical component of Mercurial and will be for the indefinite future, I'd like Mercurial to pivot away from being pictured as a "Python application" and move towards being a "generic/system application." In other words, Python is just an implementation detail.

-----

Desired End State

hg is a Rust binary that embeds and uses a Python interpreter when appropriate (hg is a Python script today). Python code seamlessly calls out to functionality implemented in Rust. Fully self-contained Mercurial distributions are available (Python is an implementation detail / Mercurial sufficiently independent from other Python presence on system)

-----

"Standalone Mercurial" is a generic term given to a distribution of Mercurial that is standalone and has minimal dependencies on the host (typically just the C runtime library). Instead, most of Mercurial's dependencies are included in the distribution. This includes a Python interpreter.

-----

This patch should be considered early alpha and RFC quality.


What are these quotes from? Sounds like it contains some info the original post doesn't.


The submitted article was to the hg wiki.

I also included some quotes from the related code revision in Phabricator:

https://phab.mercurial-scm.org/D1581


... and now looks like the HN headline was improved to be more representative of the content compared to the earlier headline.


what does embed python interpreter mean? are they actually writing a python interpreter using rust so they can write python code and compile to rust?


Python has a concept of "extending" and also "embedding". It looks like they are looking at embedding[0], which enables you to use the normal CPython interpreter from within another program. (So no, not writing a new Python interpreter in Rust).

Sample snippet from python docs:

-----

So if you are embedding Python, you are providing your own main program. One of the things this main program has to do is initialize the Python interpreter. At the very least, you have to call the function Py_Initialize(). There are optional calls to pass command line arguments to Python. Then later you can call the interpreter from any part of the application.

There are several different ways to call the interpreter: you can pass a string containing Python statements to PyRun_SimpleString(), <...etc..>

-----

[0] https://docs.python.org/3/extending/embedding.html


If interested, you can see their work-in-progress main.rs in the related code revision[0], which includes their Rust code calling down to the C function Py_Initialize() to spin up the now-embedded CPython interpreter that is living "inside" a Rust program:

    unsafe {
        Py_Initialize();
        PySys_SetArgv(args.len() as c_int,
                      argv.as_ptr());
        PyEval_InitThreads();
        let _thread_state = PyEval_SaveThread();
    }
----

[0] https://phab.mercurial-scm.org/D1581#change-t24aVkGEJ5Xh


> what does embed python interpreter mean?

Embedding CPython to script/extend larger applications (much like Lua) is a first-class use case of CPython and well-supported: https://docs.python.org/3/extending/embedding.html.

For instance Civ IV used Python as its scripting language (though I believe Civ V switched to Lua).


Leaving aside the actual project at hand, this is a great example of a well thought out project plan. There is a clear rationale, clear end state, a bunch of known problems to tackle, a front loading of risk, and it delivers incremental value along the way.


I was wondering how serious this was. I don't know a lot about how mercurial is developed, but https://twitter.com/indygreg/status/937527180292014080

https://gregoryszorc.com/work.html

> I am a significant contributor to the Mercurial open source version control system.

> I serve on the Mercurial Steering Committee, which is the governance group for the Mercurial Project. I also have reviewing privileges, allowing me to accept incoming patches for incorporation in the project.

So not sure, but it seems like at least one person on the team is into it?


It looks like in their last sprint meeting (end of Sept[0][1]) there was a lot of planning and talk about moving parts of mercurial to rust. From the history of sprints[2] it sounds like facebook first started experimenting with some mercurial implementations in rust and may be one of the big contributors spearheading this (though I also see indygreg and durin42 around the phabricator project giving mercurial advice). I'm a fan of both rust and mercurial so this is exciting news to hear.

[0] https://www.mercurial-scm.org/wiki/4.4sprint#Rust

[1] https://public.etherpad-mozilla.org/p/sprint-hg4.4-NOSPAMREM... (remove everything after-and-including the last hyphen, I left it in since it seems like they don't want a direct link that's easily scraped?)

[2] https://www.mercurial-scm.org/wiki/4.0sprint


Augie Fackler seems surprised, but not necessarily negatively, they're a pretty big contributor (top 5~10 by commits).


This is a proposal not a plan. Right now it’s 100% vaporware.


Yeah, totally; I was wondering if it's a proposal that came from the team themselves, or some random person.


There's pretty broad consensus that we'd like to do Rust and not C for future extension work, but the current plan is that a pure-Python hg will also be something we support. The "rust binary that embeds Python" approach looks like a straightforward win on Windows, where we have a native .exe that embeds Python anyway. I'm not sure if that'll make sense on non-Windows, but we'll see.

I've done some poking around with milksnake, which seems extremely promising for writing native speedups.


Neat, thanks!

As always, happy to answer questions and work through questions and problems. Don't hesitate to get in touch!


Do you know what the current status is for Hg on PyPy? (I guess that besides portability that was also a motivation for keeping the pure python bits around.)


Last I knew it worked fine, but hg doesn't tend to run long enough for the JIT to warm up enough and be an unambiguous win. Fijal complimented us on how good our C was.

I think it does work if you do chg with a commandserver in pypy.


How about reaching to GPS? You both are employed by Mozilla.


I did not notice that! Maybe I will.


> The nice things we want to do in native code are complicated to implement in C because cross-platform C is hard. The standard library is inadequate compared to modern languages. While modern versions of C++ are nice, we still support Python 2.7 and thus need to build with MSVC 2008 on Windows. It doesn't have any of the nice features that modern versions of C++ have. Things like introducing a thread pool in our current C code would be hard. But with Rust, that support is in the standard library and "just works." Having Rust's standard library is a pretty compelling advantage over C/C++ for any project, not just Mercurial.

Sounds like the main reason for rust is that Python has a weird dep on a fixed MSVC version.
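
On the thread-pool point in the quoted passage: as far as I know, Rust's standard library does not ship a ready-made pool type, but its threads, channels, and mutexes are enough to build a small one portably. A minimal sketch, assuming a hypothetical helper `square_all` written only for this illustration (it is not Mercurial code):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Run `jobs` on `workers` threads and return the results in job order.
/// `square_all` is a hypothetical helper made up for this sketch.
fn square_all(jobs: Vec<u64>, workers: usize) -> Vec<u64> {
    let (job_tx, job_rx) = mpsc::channel::<(usize, u64)>();
    let (res_tx, res_rx) = mpsc::channel::<(usize, u64)>();
    // All workers share one receiving end, guarded by a mutex.
    let job_rx = Arc::new(Mutex::new(job_rx));

    let mut handles = Vec::new();
    for _ in 0..workers {
        let job_rx = Arc::clone(&job_rx);
        let res_tx = res_tx.clone();
        handles.push(thread::spawn(move || loop {
            // The lock is held only while taking one job off the queue.
            let next = job_rx.lock().unwrap().recv();
            match next {
                Ok((i, n)) => res_tx.send((i, n * n)).unwrap(),
                Err(_) => break, // queue closed and drained
            }
        }));
    }
    drop(res_tx); // only the workers hold result senders now

    let count = jobs.len();
    for (i, n) in jobs.into_iter().enumerate() {
        job_tx.send((i, n)).unwrap();
    }
    drop(job_tx); // closing the queue lets idle workers exit

    // The result channel closes once every worker has finished.
    let mut out = vec![0; count];
    for (i, sq) in res_rx {
        out[i] = sq;
    }
    for h in handles {
        h.join().unwrap();
    }
    out
}

fn main() {
    println!("{:?}", square_all(vec![1, 2, 3, 4], 2));
}
```

The same source builds unchanged on Windows, Linux, and macOS, which is the portability argument the quote is making.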


There was also this point that was phrased as a comparison to python but I think works equally as well for choosing rust over c:

> In addition to performance concerns, Python is also hindering us because it is a dynamic programming language. Mercurial is a large project by Python standards. Large projects are harder to maintain. Using a statically typed programming language that finds bugs at compile time will enable us to make wide-sweeping changes more fearlessly. This will improve Mercurial's development velocity.


Which they mention they'd still have to implement workarounds for if they adopt Rust, so I'm not sure I understand that selling point.


Switching to Rust removes a lot of papercuts from dealing with MSVC. These things could be solved in many other ways, so it's not exactly the reason, just something you also get from adopting Rust.

For C programs Windows happens to be the odd one out for lots of things. It's annoying with its unloved C compiler, missing unix-ish headers and tools, string encoding pains, very different packaging story, etc.

So I think the motivation here is they'll solve just that one CRT problem now, and will have fewer Windows problems to worry about later.

I've done that for pngquant. I can build things with Cargo rather than explain to users that `make` may be `mingw32_make`. I can parse options with Rust's getopts, rather than getting bugreports that Visual Studio can't find `getopt.h`. These aren't hard problems. I could have solved all of them, but I don't have to!


It sounds more like they don't want to be stuck using an old version of C++ for writing their fast-path, and would prefer to use Rust, because of the nice features it provides.


For those that don't want to deal with the Python startup time, the Mercurial team already has an attempt at fixing this with a tool called CHg[0]. It is a C binary that interacts with the Mercurial CommandServer[1], which is just a long running version of Mercurial CLI that you can interact with over a pipe.

Using Rust as the primary client would simplify this a lot, but is a lot more work than what CHg accomplished.

[0] https://www.mercurial-scm.org/wiki/CHg

[1] https://www.mercurial-scm.org/wiki/CommandServer


This is already explained in the linked article.


I noticed this after the fact. Thanks for pointing it out.

I had previously looked into understanding how mercurial worked and what options there were for different frontends or backends. (And what it would take to write my own backend just to understand a very basic flow of a clone or pull). It's a lot of work for sure and the mercurial team (at least mpm a few years back) preferred one true implementation.


I suddenly remember FbExperiment on building Mercurial server using Rust https://github.com/facebookexperimental/mononoke

And yes.. it's for extension modules (maybe just the beginning)


Perhaps childish and trite but I do like the use of the word “oxidation” for porting to Rust



"Rewrite everything in Rust, exactly the same way but in Rust it solve all problem." - some Rust programmer


how would they deal with hg extensions written in Python?


By embedding[0] the interpreter and loading the extensions in there.

[0] https://docs.python.org/3/extending/embedding.html


built in python interpreter.


It’s not being rewritten in Rust. They want to use rust instead of C for extension modules.


We've updated the submission title, which was “Mercurial (hg and C extensions) being rewritten in Rust” to that of the article. It's surprisingly easy to mislead, which is one of the reasons we ask submitters to use the original title and not to editorialize.


One of us misread the article. I understood it to say that they want to rewrite the core of the Python code to Rust so that they don't have to wait for the Python interpreter to start every time someone uses the hg command. The Rust binary will be able to call Python code, which is basically a complete reversal of what they have now.


> One of us misread the article. I understood it to say that they want to rewrite the core of the Python code to Rust so that they don't have to wait for the Python interpreter to start every time someone uses the hg command.

It would still use the Python interpreter internally but they want to skip it for some functionality like the basic command line interface. So they would embed the Python interpreter in their new distribution. However the Python code still just calls out to other Rust code.


> However the Python code still just calls out to other Rust code.

this will probably be hard. calling back and forth is always harder than calling in a single direction.


This is how I understood it as well.


I read it as they want it to be possible for some commands to be written in pure rust, but that they expect to maintain the majority of their current Python code.


If it's already possible to write C extensions, then it's already possible to write Rust extensions, so that would be a non-news.


I think the idea is that in order to avoid Python interpreter startup time on commands that are primarily written in Rust, they also want to have the main function written in Rust, with Python interpreter startup being an optional codepath (that a majority of the codebase still depends on).
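
A minimal sketch of that layout, assuming a hypothetical split where a couple of commands are handled natively and everything else falls through to the embedded interpreter. The names `Route` and `route_command` and the list of native commands are made up for illustration and are not taken from Mercurial's code; the Python branch is only a placeholder for where interpreter startup (e.g. `Py_Initialize()`) would happen.

```rust
use std::env;

/// Where a command is handled. Hypothetical names, for illustration only.
#[derive(Debug, PartialEq)]
enum Route {
    /// Handled entirely in Rust; the Python interpreter is never started.
    Native,
    /// Needs the embedded Python interpreter.
    Python,
}

/// Decide how to handle a command. The set of native commands here is an
/// assumption invented for this sketch.
fn route_command(cmd: &str) -> Route {
    match cmd {
        "version" | "root" => Route::Native,
        _ => Route::Python,
    }
}

fn main() {
    let cmd = env::args().nth(1).unwrap_or_else(|| "help".to_string());
    match route_command(&cmd) {
        Route::Native => println!("{}: handled in Rust, no interpreter startup", cmd),
        Route::Python => println!("{}: would initialize the embedded Python here", cmd),
    }
}
```

The point of the design is that the startup cost is only paid on the `Route::Python` branch, while most of the codebase can still live behind it.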


> Desired End State: hg is a Rust binary that embeds and uses a Python interpreter when appropriate (hg is a Python script today)

I'd say it seems like they want to also flip the ratio of the core around, so it's Rust for the central parts and Python only as needed (whereas today it's Python for the core and C for certain performance critical things).


I'm not sure they want to flip the ratio as much as the architecture. The entry point would be Rust, but the majority of the code would still be Python.


I quoted this elsewhere but this part makes it sound like they are intending to move over large parts of the implementation into rust in the long run, migrating away from python due to the size of the project becoming more difficult to maintain

> In addition to performance concerns, Python is also hindering us because it is a dynamic programming language. Mercurial is a large project by Python standards. Large projects are harder to maintain. Using a statically typed programming language that finds bugs at compile time will enable us to make wide-sweeping changes more fearlessly. This will improve Mercurial's development velocity.


No, they want to flip the relationship of Python and native code; instead of hg being Python with C extensions, they want it to be Rust with an embedded Python interpreter. It's not a total rewrite in Rust, but it's also a more significant change than just replacing C extensions with Rust extensions while maintaining a Python core. Quoting the source material on the “Desired End State”: “hg is a Rust binary that embeds and uses a Python interpreter when appropriate (hg is a Python script today)”


From the page:

"Desired End State: hg is a Rust binary that embeds and uses a Python interpreter when appropriate (hg is a Python script today)"


Could we get a mod edit on the title here? The article itself is called "Using Rust in Mercurial".


really wonder why not Go.

original code was python with lots of C. the same can be done with Go, while keeping much of the same philosophy of python.

This change will probably alienate most of the contributors since rust and python or C are worlds apart.


I would wonder why Go if Rust is available.

Rust isn't hard. The only somewhat hard thing in Rust is writing with the borrow checker, and honestly, if you want to seriously write in C, you really need to go through that experience and understand it enough to feel somewhat comfortable with it.

And even if that would really be a thing, giving up compiler checks in order to allow more low quality code to be contributed doesn't really seem like a good trade off to me.


Rust is certainly harder than Go. There's no need to dance around the issue, that doesn't help anyone.

Further, Mercurial is already 100% on the train of "giving up compiler checks," though "low quality code" is hardly a fair characterization of why.


In isolation Go is probably easier to write than Rust. However in terms of embedding/interop Rust is certainly much easier.

Last I looked there wasn't a way to pin a pointer in Go and they explicitly forbid passing around opaque handles so you're already constrained in how you can interop.

Secondly you now have a whole nother runtime to hoist up instead of a simple C FFI. I hit this just recently with Python+Rust. Needed to optimize an inner loop. I could just drop down to a single Rust fn, write it and be on my way. No need to spin up a whole runtime just for that inner function.

The parent also hit it on the head. If you're going to be writing C you already need to understand lifetimes deeply, Rust just codifies that in a way that lets you catch it at compile time.
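
To make the "single Rust fn" case concrete, here is a minimal sketch of an inner loop exposed over the C ABI. The function name `sum_squares` is hypothetical and chosen for illustration; how it gets loaded from Python (ctypes, cffi, or a binding crate) is left open here.

```rust
use std::slice;

/// Sum of squares over a buffer owned by the caller (for example Python).
/// `sum_squares` is a hypothetical example, not part of any real library.
///
/// # Safety
/// `data` must point to `len` valid, initialized `f64` values
/// (or be null, in which case 0.0 is returned).
#[no_mangle]
pub unsafe extern "C" fn sum_squares(data: *const f64, len: usize) -> f64 {
    if data.is_null() || len == 0 {
        return 0.0;
    }
    // Borrow the caller's memory for the duration of the call; nothing is
    // copied and ownership never crosses the FFI boundary.
    let values = slice::from_raw_parts(data, len);
    values.iter().map(|v| v * v).sum()
}

fn main() {
    // Calling it from Rust directly, just to exercise the function.
    let xs = [1.0_f64, 2.0, 3.0];
    let total = unsafe { sum_squares(xs.as_ptr(), xs.len()) };
    println!("{}", total);
}
```

Built as a `cdylib`, a function like this is just another C symbol to the host process: there is no second runtime or garbage collector to start.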


Certainly, those are exactly the reasons I would use to argue for Rust. I just don't like seeing people minimize its learning curve just to make a point.


> If you're going to be writing C you already need to understand lifetimes deeply,

Deeply? Unlike complicated languages like for example GC-languages, C only has 2 lifetimes: block scope for stack variables, forever (until explicitly free'd) for heap malloc'ed memory.

You don't have to twist your head to know in which of N possible ways the data you get from a function was allocated, you don't have to twist your head to know if you need to free it manually, or semi-manually, or hope that it gets freed automatically at some point.

You don't have to care about stack variables, you just have to be organised about heap variables, and that's it. No complicated concept, not many different paradigms. You malloc() something to get some memory when you need it, you free() it when you don't need it any more.

Also you don't have to twist your head to know if you can access some function argument, if it was passed by copy or by reference or by a third mean, does it involve heavy data copying or not, what does it mean concerning access, etc. Everything is passed by value (copy) and you can only pass simple objects (plus structures), for anything else you pass the value of the address of the object, end of story.


> C only has 2 lifetimes: block scope for stack variables, forever (until explicitly free'd) for heap malloc'ed memory.

To the contrary, C has just as many lifetimes as Rust. They're just not explicit. "free() it when you don't need it any more" is a complex problem and C doesn't really help you solve it at all.

> Everything is passed by value (copy) and you can only pass simple objects (plus structures), for anything else you pass the value of the address of the object, end of story.

That's how Rust works as well.
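
A small sketch of what that looks like in practice. The types and function names (`Point`, `shifted`, `shift_in_place`, `consume`) are invented for illustration. The last function shows the "free() it when you don't need it any more" decision being encoded in the signature rather than left to convention.

```rust
/// Plain data; `Copy` makes pass-by-value a bitwise copy, as with a C struct.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Receives its own copy; the caller's value is untouched.
fn shifted(mut p: Point) -> Point {
    p.x += 1;
    p
}

/// Receives the address of the caller's value (like `Point *` in C)
/// and mutates it in place.
fn shift_in_place(p: &mut Point) {
    p.x += 1;
}

/// Takes ownership of a heap allocation. The buffer is freed when `v` goes
/// out of scope at the end of this function, so the caller cannot use it again.
fn consume(v: Vec<u8>) -> usize {
    v.len()
}

fn main() {
    let a = Point { x: 0, y: 0 };
    let b = shifted(a); // `a` is copied, so it is still usable below
    println!("{:?} {:?}", a, b);

    let mut c = a;
    shift_in_place(&mut c); // the address is passed explicitly
    println!("{:?}", c);

    let buf = vec![1u8, 2, 3];
    let n = consume(buf); // `buf` is moved; using it after this line would not compile
    println!("{}", n);
}
```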


I don't think go is interpreted in the way python is


>Rust is certainly harder than Go. There's no need to dance around the issue, that doesn't help anyone.

It's also faster, much more expressive, and much more flexible (e.g. for loading as a C-compatible lib) so there's that.


I have no idea as to the particulars here. I'm involved with a project that is working with Go code from another language.

Go doesn't allow you to share pointers to Go objects with other languages/runtimes. So whatever else the pros and cons of each, working with Go from another language can be painful.


Correct me if I'm wrong, but I think Go lets you share pointers with other languages/runtimes. I think Go doesn't allow you to pass ownership of the pointer to another language, and you need to make sure the pointer is no longer used by the other language by the time Go cleans it up, but I think you can (for example) pass a Go pointer into C. And assuming I'm right about that point, I think this approach is also roughly what was outlined in the Oxidation Plan (Python won't own resources allocated from Rust and vice versa).


> I think Go doesn't allow you to pass ownership of the pointer to another language, and you need to make sure the pointer is no longer used by the other language by the time Go cleans it up, but I think you can (for example) pass a Go pointer into C.

Yes, you can, but there are more constraints. E.g. you are also not permitted to pass a pointer to Go memory that contains Go pointers. There is a detailed description here:

https://golang.org/cmd/cgo/#hdr-Passing_pointers

Also, it used to be the case that cgo calls were relatively expensive (definitely not something you want to do in a hot loop). I am not sure if this is still true, but it was when I was using Go daily for projects.


Rust is quite a lot harder than Go. While lots of practice makes it easier, it's still not trivial to reason about lifetimes/borrowing/moving, unlike a GC language (which isn't to say that GCs are strictly superior to other memory management schemes, mind you). There are lots of good reasons to choose Rust over Go (you have extreme performance or correctness requirements, for example), but "Rust is comparably easy to Go" is untrue (or as untrue as any user-friendliness assertions can be).

EDIT: I don't mean to convey that GCs are still trivial in a mixed-runtime context; only that Rust is generally harder than Go.


Except if you're embedding a runtime you invert the simplicity of a GC.

As soon as GC'd pointers start passing across an FFI boundary you either need to:

1. Pin them and track yourself, which then loses a lot of your advantage of having a GC since you've got non-deterministic seg-faults based on when GC runs and pulls objects out from underneath you. C# lets you do this, but it's really painful to get right.

2. Prevent any GC objects being passed (which is what Go does). This means that there's a lot of things you can't do (esp. with respect to callbacks and future events) that require a ton of gymnastics to do something that's trivial in C.


I absolutely agree with this. I didn't mean to convey otherwise.


One reason might be that it seems easier to mix languages with Rust vs. Go. The C interop is pretty sane and Go has a few more gotchas in that department.

The rest of the Rust v Go debate may be similar to the 1000 times people have had it before.


Nim is an even closer match to Python and interop is trivial.


My understanding is that Go would be a fine candidate if you wanted to rewrite the entire codebase in it, but not for writing compiled extensions that Python can dynamically link, and they want one language to do both of those things?


One thing that comes to mind is having to bridge two GC/runtimes in the same binary, where Rust is a drop in replacement for C in that regard (systems programming language)


As the plan seems to be to embed Python in Rust, rather than just writing extension modules in Rust, this shouldn't be as much of a concern.

The tricky part with two runtimes is (1) the system initialization and (2) the GC (if any) being able to control the initial stack frame. If you're just writing extension modules, either problem can be a challenge.

But the first problem goes away if you embed Python, as Python initialization is trivial for the host language when embedding. So does the second problem, as the host language is now in control of the initial stack frame.

The fact that a language is garbage-collected should not matter much; Python uses reference counting as its primary memory management mechanism, with a generational trial deletion approach for cycle collection (which does not require root scanning). This approach can generally coexist well with a tracing garbage collector (though some challenges remain, but those can be designed around).

That said, Go specifically may have problems (I'm speculating here) due to its green threads not being happy if Python does any blocking operations, and it's probably not safe to operate on Python objects concurrently in multiple threads. But that wouldn't necessarily be a problem for other languages.
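
For readers unfamiliar with reference counting, a rough analogy in Rust using `Rc`. This is not how CPython is implemented: `Rc` has no cycle collector, so the sketch breaks the parent/child cycle with a `Weak` back-pointer instead. `Node` and `link_counts` are hypothetical names made up for this example.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    name: &'static str,
    // Strong edge: contributes to the child's reference count.
    child: RefCell<Option<Rc<Node>>>,
    // Weak edge: does not contribute, so parent <-> child is not a leaking cycle.
    parent: RefCell<Weak<Node>>,
}

fn new_node(name: &'static str) -> Rc<Node> {
    Rc::new(Node {
        name,
        child: RefCell::new(None),
        parent: RefCell::new(Weak::new()),
    })
}

/// Link a parent and a child and report (parent strong count, child strong count).
fn link_counts() -> (usize, usize) {
    let parent = new_node("parent");
    let child = new_node("child");
    *parent.child.borrow_mut() = Some(Rc::clone(&child));
    *child.parent.borrow_mut() = Rc::downgrade(&parent);
    println!("linked {} -> {}", parent.name, child.name);
    (Rc::strong_count(&parent), Rc::strong_count(&child))
    // Both nodes are freed here, when the last strong references are dropped.
}

fn main() {
    let (p, c) = link_counts();
    println!("parent strong = {}, child strong = {}", p, c);
}
```

Because objects are freed as soon as their count reaches zero, no scan of the native stack is needed, which is why this style of memory management tends to be easier to host next to another runtime.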


Go isn't really a huge win for hg, and it doesn't support being used as a shared library, so we couldn't easily use it as a replacement for C for native extensions. main() has to be in Go, and even then you've got two GCs in the mix which is not great.


Using rust for a shared library implementation seems like it would mean a big improvement for mercurial's tooling in the long run. Right now the options are command line invocation or using the python library (effectively puts a requirement on implementation). Though I'm not sure the integrations I'm aware of[0] would benefit from a shared library.

[0] https://secure.phabricator.com/T9548


There are places where C and other places where programming C++ makes more sense; the same is true with Rust and Go.

When it comes to low-level, high performance, no overhead stuff and less unexpected behaviour C and Rust fit better.

Where high implementation speed and fast changing, experimental and highly modular architecture systems are more important C++ and Go are better.

For Mercurial and Git Rust or C makes more sense. Especially if you put experimental and fast changing stuff into a scripting language. IMO.


I'm curious, why is Go+Python easier than Rust+Python?


It is not.

Go alone is easier than Rust. But since Rust has no GC, it is easier to embed, especially with a language with a Garbage collector like Python.


Given a situation where (pre-rust) the usual choice would be to use C or C++, it's totally appropriate to choose Rust. That's one of Rust's express design goals.

Go is intended to make it very easy for junior devs to build concurrent network applications. That's its design goal.

It's not a zero-sum game. Choosing rust doesn't weaken go or vice versa.


Forget Rust and Go, why not ATS? You get C compatibility with dependent types!


I wonder why not Dlang. It has better C/C++ interop than Rust, C parts could be used/wrapped before they're rewritten.


Or Swift, or Nim.

Really, there are so many other languages that are both performant and more readable than Rust (I consider Rust readability to be on par with C++).


Outsider guess: they want to move away not only from Python syntax and runtime, but also from Python philosophy.


Last I read, any C calls have to be spun off into their own OS thread since the runtime is unable to inject soft context-switches into foreign code.

With heavy FFI usage, that could become costly...



