
Here is a proposal for a message resource format on top of MF2.0 - https://github.com/eemeli/message-resource-wg


https://messageformat.unicode.org/

Lmk if you have further questions!


The site behind that link gives answers to only 2 out of 6 questions. If your goal was to promote and teach, then you have failed. If your goal was to demoralise the HN readers and grind the conversation to a stop, then you have succeeded.


Definitely the former, apologies for making it confusing.

> What is the equivalent of xgettext.pl

There is no standard one, although people build their own. The general consensus is that source strings should not be inlined into code. The closest analogy is to "style" vs "class" in HTML/CSS - the clean separation of concerns comes from the "id" being the contract.

You can read more about it here: https://github.com/projectfluent/fluent/wiki/Fluent-vs-gette...

There are attempts to "merge" those two philosophies, by extracting and "generating" slugs as ids. Examples:
- https://formatjs.github.io/docs/getting-started/message-extr...
- https://lingui.dev/guides/message-extraction
- https://app.studyraid.com/en/read/15768/550728/setting-up-th...
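To make the "generated ids" idea concrete, here is a minimal sketch in Python. The function name `generate_id` and the slug-plus-hash scheme are assumptions made up for illustration; this is not the algorithm used by any of the tools linked above.

```python
import hashlib
import re


def generate_id(source: str, max_words: int = 4) -> str:
    """Derive a stable message id from a source string (hypothetical scheme)."""
    # Slug: the first few lowercase words, joined by hyphens.
    words = re.findall(r"[a-z0-9]+", source.lower())
    slug = "-".join(words[:max_words])
    # Short content hash, so strings sharing the same leading words get distinct ids.
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()[:6]
    return f"{slug}-{digest}" if slug else digest


# An "extraction" step would collect source strings and key them by generated id.
catalog = {generate_id(s): s for s in ["He answered.", "She answered."]}
```

In this sketch the id is a function of the source text, so any edit to the source string, even a punctuation fix, produces a new id.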

I'm fairly skeptical of this approach.

> the file extension for the main catalog file `.po`

In the MF1.0 world, the file format is JSON or XML. You encode id=>Message pairs. In the Fluent world there is a Fluent (FTL) file format. In MF2.0 the format itself is, again, message scoped. On top of it there's a proposal by Mozilla to create MessageResource - https://github.com/w3c/i18n-discuss/blob/gh-pages/explainers... and that may feed into DOM L10n - https://github.com/mozilla/explainers/blob/main/dom-localiza...

> the __ function?

See answer (1) above and the links to "generated ids".

> How does gender work (small example)?

MF 1.0:

```
{GENDER, select,
    male {He answered}
    female {She answered}
    other {They answered}
}
```

Fluent:

```
user-answered =
    { $gender ->
        [male] He answered.
        [female] She answered.
       *[other] They answered.
    }
```
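Both snippets express the same idea: pick the variant whose key matches the selector, and fall back to a default variant otherwise. A rough Python sketch of that selection logic follows; the `select` helper and the `USER_ANSWERED` table are hypothetical names for illustration, not part of MessageFormat or Fluent.

```python
def select(variants: dict[str, str], default: str, selector: str) -> str:
    """Return the variant matching `selector`, or the default variant."""
    # Unknown selector values fall through to the default, like `other` / `*[other]`.
    return variants.get(selector, variants[default])


# Hypothetical variant table mirroring the messages above.
USER_ANSWERED = {
    "male": "He answered.",
    "female": "She answered.",
    "other": "They answered.",
}

message = select(USER_ANSWERED, "other", "female")
```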

> How does layering pt_BR on pt_PT work?

MF does not prescribe fallback behavior. It is also more popular to treat each locale as "complete" and fill "gaps" at build time. So at runtime you have `pt-BR` which has pt-BR strings and missing ones "completed" from `pt` (parent locale).
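The build-time approach can be sketched as a plain dictionary merge. This is only an illustration under assumptions: the `complete_locale` helper, the message ids, and the sample strings are all made up here, and real build tooling will differ.

```python
def complete_locale(child: dict[str, str], parent: dict[str, str]) -> dict[str, str]:
    """Build a 'complete' catalog: the child's strings win, gaps come from the parent."""
    completed = dict(parent)   # start from the parent locale
    completed.update(child)    # overlay everything the child actually translated
    return completed


# Illustrative data only.
pt = {"hello": "Olá", "goodbye": "Adeus"}
pt_br = {"goodbye": "Tchau"}

# What would ship as the pt-BR bundle: no gaps left to resolve at runtime.
bundle = complete_locale(pt_br, pt)
```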

Fluent has a "resource manager" (simple one like this: https://github.com/projectfluent/fluent-rs/tree/main/fluent-... or more complex like Mozilla L10nRegistry), which can fallback at runtime, allowing for what we call "partial locales" which can roll out to production with gaps and the resource manager will fetch the fallback strings from the parent locale.
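For contrast, runtime fallback can be sketched as a lookup that walks a chain of locales until one of them has the message. The `ResourceManager` class below is a hypothetical toy, not the API of fluent-rs or L10nRegistry, and deriving the chain by truncating subtags is a simplifying assumption.

```python
from typing import Optional


class ResourceManager:
    """Toy runtime fallback over in-memory catalogs (hypothetical, for illustration)."""

    def __init__(self, resources: dict[str, dict[str, str]]):
        self.resources = resources

    def fallback_chain(self, locale: str) -> list[str]:
        # Assumption: parents are found by dropping trailing subtags,
        # e.g. "pt-BR" -> ["pt-BR", "pt"].
        parts = locale.split("-")
        return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]

    def get_message(self, locale: str, msg_id: str) -> Optional[str]:
        # Try the requested locale first, then each parent in turn.
        for candidate in self.fallback_chain(locale):
            message = self.resources.get(candidate, {}).get(msg_id)
            if message is not None:
                return message
        return None


# A "partial" pt-BR locale shipped with gaps; illustrative data only.
manager = ResourceManager({
    "pt": {"hello": "Olá", "goodbye": "Adeus"},
    "pt-BR": {"goodbye": "Tchau"},
})
```

Here `manager.get_message("pt-BR", "hello")` finds nothing in `pt-BR` and resolves the string from `pt` at lookup time, so the partial locale never has to be completed at build time.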

> What is a compelling reason to switch?

If you and your users are happy with gettext, none!

If either of those groups complain, there may be many: - https://github.com/projectfluent/fluent/wiki/Fluent-vs-gette...

Hope that helps!


Thank you, this was a good answer and it provided the necessary insight. We will include MessageFormat resp. its ecosystem into reevaluating which l10n system we should use at the next upcoming opportunity in the hopes that the missing parts will have arrived by then.


ICU4C and ICU4J have implementations. We also have a JS polyfill and will be working on ICU4X impl this quarter.


Yep. Mozilla is planning an auto converter from Fluent to MF2.0 once we stabilize it.


It is great to hear a confirmation, though the core of the question was more about when is that roughly forecast to happen rather than if. :)


We are targeting MF2.0 for inclusion in the JavaScript stdlib (ECMA-402). And later maybe with its own format into DOM for DOM L10n.


Correct. MF2.0 addresses all the challenges we identified during design of Fluent.


No, gettext scales very badly, both vertically (larger systems) and horizontally (locales with rich grammatical forms like declensions etc.)

We (authors of Fluent and collaborators on MessageFormat 2.0) wrote this explainer which you may find informative - https://github.com/projectfluent/fluent/wiki/Fluent-vs-gette...


Thanks, I'm a decades-long user of gettext from both developer and translator point of view, and have encountered several of the drawbacks to some extent.

It's very good, and has certainly been good enough for most practical purposes, but innovation needs to happen, and things can certainly get better. Thanks for your work in this direction!


Hi! Thank you for your critique!

> 1) “Your knight has killed a dragon with a crossbow”

We have a proposal for dynamic references to address this problem - https://github.com/projectfluent/fluent/issues/80 - it's non-trivial but I hope we'll see it solved in Fluent and/or in MessageFormat 2.

> 2) The parser is extremely sensitive

True. It's on purpose. We wanted to start strict and loosen later, rather than the opposite.

> 3) The input files mandate a weird arrangement of new lines for even the simplest branching

Same as above.

> 4) The documentation is too Spartan to know what happens in edge cases.

We're a small team :)

> It heralds itself to be the saviour of all i18n, but it’s literally worse than the mess that came before it.

I'm sorry to hear it doesn't work for you. I'm relieved that your criticism seems mostly subjective, except for one missing feature that no other l10n system has as of yet. We'll keep pushing, but if you encounter a better l10n system, please let me know! We're working on Unicode MessageFormat 2.0 based on Fluent and incorporating lessons learned.


I wish you guys the best, but I think you’re being a little self-congratulatory here.

The first feature is not optional - it has been a feature of i18n systems since the 1990s, possibly earlier. I’ve seen kludged-together in-house solutions that can do it without breaking a sweat. It is currently not feasible to use Fluent to localise any substantive, dynamic content in languages with case or gender - which is the main challenge an i18n package exists to solve. (I note the issue you link is five years old, dismisses the problem as not significant, and flat out states it is not being worked on.)

Translation files are generally made by translators, not programmers, and the fact that Fluent falls over in a slight breeze makes it difficult to imagine a translator being able to produce working Fluent files. This is not a “subjective” problem. Translators do not, and should not, work for free. Using Fluent adds considerable (and needless!) complexity and therefore expense.

As you point out, you’re working on a new data format, so it’s unclear why anyone should adopt (and pay for translations in) the current moribund format.

I genuinely do wish you guys the best, and I apologise if I spoke too bluntly above, but it is not merely a matter of personal opinion that Fluent is de facto still in alpha.


We're happy to announce ICU4X 1.2, containing a host of new features with a focus on text engines. The new ML-powered break iterators and HarfBuzz bindings enable developers to perform text layout on many platforms and resource-constrained environments.


Hi, just for context - this was a comparison of `unicode-normalizer` crate to ICU4C.

Since then @hsivonen from Mozilla wrote a new normalizer that recently got merged into ICU4X - https://github.com/unicode-org/icu4x/tree/main/components/no...

I don't have perf numbers yet, but I expect its performance to be at least comparable to ICU4C.

