
It's still surprising to have any generated things there. E.g. you could make the same case for keeping built binaries in Git as well.

Is there a reason why that type of file couldn't be better placed into an artifact repository, or just generated and consumed in CI as part of generating a final build output?




> It's still surprising to have any generated things there. E.g. you could make the same case for keeping built binaries in Git as well.

This is not surprising at all. In fact, it's quite standard to commit string translations. Just because you can run the code generation/string replacement step as part of the build does not mean it's a good idea to generate everything from scratch on every single build.

String translations hardly change once they are introduced, running the build step takes a significant amount of time, and if anything fails, your product can break in critical and hard-to-notice ways.


I'm not saying don't have the translations at all. I'm saying: 1) caching things in git in general is a bad idea; why is it not in this case? 2) these are not - to my understanding - the raw resource files, but rather machine-generated intermediate files. This is why it's about caching, rather than minimal source files.

Additionally, to respond to your comment, if string translations don't change much then it may be possible to push them out as an internal 3rd-party library, and then they're even quicker to build.


> I'm not saying don't have the translations at all. I'm saying: 1) caching things in git in general is a bad idea (...)

You're missing the point. Storing translated files is caching things in git, and it is not a bad idea. It's a standard practice that saves your neck.

You either place faith on a build step working deterministically when it was not designed to work like that, or you track your generated files in your version control system.

If you decide to put faith on your ability to run deterministic builds with a potentially non-deterministic system, you waste minutes with each build regenerating files that you could very well have checked out and in the process risk sneaking in hard to track bugs. Then you need to have internationalization test steps for each localization running as part of your integration tests to verify if your build worked, which consume even more resources.

Or... you stash them in git?

You use git to track changes, regardless of where they came from. Just because you place faith in some build step to always work deterministically that does not mean you are following a good practice and everyone else around you is wrong.


> You either place faith on a build step working deterministically when it was not designed to work like that

I'm sorry, what? Why would a build not work deterministically?

> If you decide to put faith on your ability to run deterministic builds with a potentially non-deterministic system

If your build is non-deterministic, how can you have any faith in the binaries it produces? You would have much larger problems in that case.

> You use git to track changes, regardless of where they came from

You probably don't want to do that if it is 70% of your codebase and slows down Git for all your developers.

> Then you need to have internationalization test steps for each localization running as part of your integration tests to verify if your build worked

I'm convinced you've never used a build system before. Your build should fail if required files are missing. Downloading translation files at build time from some artefact repository, rather than storing them in git, is how a lot of companies do it.
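To make "your build should fail if required files are missing" concrete, here is a minimal sketch of such a pre-build check. The `messages.{locale}.xlf` naming pattern and the function names are assumptions for illustration, not anything from the project under discussion.

```python
from pathlib import Path


def missing_translations(
    root: Path,
    locales: list[str],
    pattern: str = "messages.{locale}.xlf",  # hypothetical naming scheme
) -> list[str]:
    """Return the expected translation file names that are absent under root."""
    expected = [pattern.format(locale=locale) for locale in locales]
    return [name for name in expected if not (root / name).is_file()]


def check_or_fail(root: Path, locales: list[str]) -> None:
    """Abort the build with a non-zero exit status if any file is missing."""
    missing = missing_translations(root, locales)
    if missing:
        raise SystemExit(
            "build aborted, missing translation files: " + ", ".join(missing)
        )
```

A check like this only proves the files are present; it says nothing about whether their contents are correct.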


> I'm sorry, what? Why would a build not work deterministically?

Because they don't and never did?

Do you understand that build systems and individual tools were not designed to ensure deterministic behavior?

https://reproducible-builds.org/docs/deterministic-build-sys...

Anyone with any professional experience developing software can tell you countless war stories involving bugs that popped up when building the exact same project separate times. What leads you to believe that translations are any different? In fact, more often than not we see unexpected changes during translation update steps.

> If your build is non-deterministic, how can you have any faith in the binaries it produces?

First of all, builds are not deterministic by default.

To even come close to getting a deterministic build, you need to do all the legwork yourself, after doing all your homework.

Did you ever do any of this work? You didn't, did you? You're not looking and are instead just placing blind faith in stuff continuing to work by coincidence, aren't you?

> You probably don't want to do that (...)

Yes, I do. Anyone with their head on their shoulders wants to do that. It's either that or waste time tracking bugs that you allowed to go to production. Do you want to waste your time hunting down easily avoidable and hard to track bugs? Most of the professional world doesn't.


It is definitely possible to have determinism in a CI build step, and it's possible to have checks for it. If one needs determinism and a cache, they can store the files on S3 or some other place instead of git. Re-generating the files every time on the build isn't the only alternative. Instead of generate-and-commit, generate and upload. The difficulty is the same for developers.
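One possible shape for such a determinism check, as a rough sketch: run the generation step twice into separate directories and compare a digest of each output tree. The function names are made up for illustration, and this assumes the generated files are small enough to read into memory.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Hash relative paths and file contents in sorted order, so the digest
    does not depend on filesystem listing order or on where root lives."""
    digest = hashlib.sha256()
    files = sorted(
        (path.relative_to(root).as_posix(), path)
        for path in root.rglob("*")
        if path.is_file()
    )
    for rel, path in files:
        data = path.read_bytes()
        digest.update(rel.encode("utf-8") + b"\0")
        # Length prefix keeps adjacent files from blurring into each other.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def outputs_match(first_run: Path, second_run: Path) -> bool:
    """True when two generation runs produced byte-identical trees."""
    return tree_digest(first_run) == tree_digest(second_run)
```

If `outputs_match` returns False in CI, the generation step is not reproducible and the job can fail before anything gets uploaded.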

If one has to be more granular than that, and have versioning and verification against the repository, they can still store the multiple versions on another service and store the hashes on git. That said, I'm not a fan of this for translations (especially if you have lots of languages or lots of strings), since there's an advantage in decoupling the translation process from the development process.
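A sketch of the "hashes in git, files elsewhere" idea, assuming the files have already been downloaded into a local directory by whatever client the storage service provides (that part is left out). The manifest is a small JSON file you would commit; the function names and the manifest layout are assumptions.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_manifest(artifact_dir: Path, manifest: Path) -> None:
    """Record name -> SHA-256 for each file; this manifest is what goes in git."""
    entries = {
        path.name: sha256_of(path)
        for path in sorted(artifact_dir.iterdir())
        if path.is_file()
    }
    manifest.write_text(
        json.dumps(entries, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )


def verify_against_manifest(artifact_dir: Path, manifest: Path) -> list[str]:
    """Return names that are missing or whose hash differs from the manifest."""
    expected = json.loads(manifest.read_text(encoding="utf-8"))
    return [
        name
        for name, digest in sorted(expected.items())
        if not (artifact_dir / name).is_file()
        or sha256_of(artifact_dir / name) != digest
    ]
```

The build would call `verify_against_manifest` after the download and fail on a non-empty result, so the repository still pins exact contents without carrying the files themselves.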

The problem with storing those files on git is that it can cause more problems, including developer experience issues.

It depends on how much you're storing on git. Some CSS files? Fine. 70% of files of the project, like in this case, slowing down everyone's workflow? Definitely not.


> Just because you place faith in some build step to always work deterministically that does not mean you are following a good practice and everyone else around you is wrong.

You're also doing that everywhere else. How do you think anything works? Why do you think Git is deterministic somehow? Why more so than including some files in a build?


Just as an example, I ran into non-determinism when using JAXB to generate Java classes from XSD schema files. Running an Ant JAXB task to generate the classes from the same schema files would produce different class files each time. The class files were functionally the same, but the methods, the variable definitions, and so on would be reordered, possibly because some internal code used a Map rather than a List, so order was not guaranteed. In our case the schema files were in source control and the Java/class files were not; they were generated by the build, packaged into a jar, and published to our artifact repository.
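The suspected Map-vs-List cause is easy to illustrate outside JAXB. This is a Python analogy, not the JAXB code: iterating a set of strings can yield a different order from one process to the next because of hash randomization, so a generator that emits members in iteration order is not reproducible, while one that sorts first is. `render_class` is a made-up toy generator.

```python
def render_class(name: str, fields: set[str]) -> str:
    """Emit a toy class definition with its fields in a stable order.

    Iterating `fields` directly could vary between runs; sorting first
    makes the generated text depend only on the set's contents.
    """
    lines = [f"class {name}:"]
    for field in sorted(fields):
        lines.append(f"    {field} = None")
    return "\n".join(lines) + "\n"
```

Whether the real tool offers a way to force a stable order is a separate question; the sketch only shows why unordered collections in a generator lead to the reshuffling described above.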


> Is there a reason why that type of file couldn't be better placed into an artifact repository, or just generated and consumed in CI as part of generating a final build output?

No reason at all, but when you need the files during development, and testing, and CI, and in production, and you don't want those things to fail when your artefact repo or source of data is down, then putting the latest versions in git makes sense.

The cost of having them in the repo is a tiny bit more complexity in your git workflow and config. The benefit is being able to access those files everywhere you access the code. It seems like a no-brainer to me.


> place into an artifact repository

This adds yet another moving part to the system, and another place things can go wrong.

> generated and consumed in CI as part of generating a final build output

This can get quite slow, and on larger projects you have to expend a lot of effort to keep build times reasonable.

Also, if you're serving a library for public consumption, you generally don't want to add the burden of extra build steps for the user to follow before they can use it. If it can all be automated to the point of invisibility to the user that's fine, but often it can't.


Author here. The xlf files are translations that are coupled with the texts we set in the code, so they're not really generated; I admit that was misleading. What I wanted to get across is that they're not touched directly by engineers, but they're still created through our translation pipeline, where real humans translate them.



