
Why are these not compiler errors by default? Opting in to such important safety features seems like broken design.


One reason is locale-dependent format strings which are loaded from resource files.

Also, in personal projects, I almost always used custom wrapper functions for printf/fprintf/sprintf for various reasons, so that default wouldn’t be of much use, unless maybe I could enable it for the custom functions.


You can, with __attribute__((format(printf, 1, 2))). See: https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attrib...
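
A minimal sketch of what that looks like on a custom wrapper. The name log_msg and the "[log] " prefix are made up for illustration; the attribute itself is the GCC/Clang one mentioned above, where 1 is the position of the format string and 2 is the position of the first argument to check.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical logging wrapper. The attribute asks GCC/Clang to check
 * calls as if they were printf calls: argument 1 is the format string,
 * and the variadic arguments start at argument 2. */
#if defined(__GNUC__)
__attribute__((format(printf, 1, 2)))
#endif
int log_msg(const char *fmt, ...)
{
    va_list ap;
    int n;

    fputs("[log] ", stderr);
    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);   /* returns the number of characters written */
    va_end(ap);
    return n;
}
```

With -Wformat enabled (it is part of -Wall), a call such as log_msg("%s\n", 42) should then draw the same diagnostic a plain printf call would.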


At least for GCC/clang, you can mark your functions with the special format __attribute__.

For loading translated strings, I'm missing some library function to verify whether two format strings are argument-compatible.
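
I don't know of a standard function for this, so here is a rough sketch of what such a check could look like. Everything in it (format_signature, formats_compatible, the signature encoding) is hypothetical, and it makes simplifying assumptions: it rejects positional ($) arguments outright, treats %d and %i as the same, and lumps the unsigned and floating-point conversions into one class each.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append one character to out, failing if the buffer is full. */
static bool put(char *out, size_t outsz, size_t *len, char c)
{
    if (*len + 1 >= outsz)
        return false;
    out[(*len)++] = c;
    out[*len] = '\0';
    return true;
}

/* Build a signature such as "d;ls;*;f;" listing, in order, the arguments
 * that fmt consumes. Returns false for malformed or unsupported formats. */
static bool format_signature(const char *fmt, char *out, size_t outsz)
{
    size_t len = 0;

    if (outsz == 0)
        return false;
    out[0] = '\0';

    for (const char *s = fmt; *s; ) {
        char c;

        if (*s++ != '%')
            continue;
        if (*s == '%') { s++; continue; }            /* literal "%%" */

        while (*s && strchr("-+ #0", *s)) s++;       /* flags */

        if (*s == '*') {                             /* width taken from an int argument */
            if (!put(out, outsz, &len, '*') || !put(out, outsz, &len, ';'))
                return false;
            s++;
        } else {
            while (*s >= '0' && *s <= '9') s++;
        }
        if (*s == '$')
            return false;                            /* positional arguments: not handled */

        if (*s == '.') {                             /* precision */
            s++;
            if (*s == '*') {
                if (!put(out, outsz, &len, '*') || !put(out, outsz, &len, ';'))
                    return false;
                s++;
            } else {
                while (*s >= '0' && *s <= '9') s++;
            }
        }

        while (*s && strchr("hljztL", *s)) {         /* length modifier */
            if (!put(out, outsz, &len, *s))
                return false;
            s++;
        }

        switch (*s) {                                /* conversion, reduced to a class */
        case 'd': case 'i': c = 'd'; break;
        case 'u': case 'o': case 'x': case 'X': c = 'u'; break;
        case 'e': case 'E': case 'f': case 'F':
        case 'g': case 'G': case 'a': case 'A': c = 'f'; break;
        case 'c': case 's': case 'p': case 'n': c = *s; break;
        default: return false;                       /* unknown or truncated */
        }
        s++;
        if (!put(out, outsz, &len, c) || !put(out, outsz, &len, ';'))
            return false;
    }
    return true;
}

/* True if both formats parse and consume the same argument sequence. */
bool formats_compatible(const char *a, const char *b)
{
    char sa[128], sb[128];

    return format_signature(a, sa, sizeof sa)
        && format_signature(b, sb, sizeof sb)
        && strcmp(sa, sb) == 0;
}
```

A loader could call formats_compatible(original, translated) and fall back to the untranslated string when it returns false. Supporting $ would mean comparing the type at each argument number rather than the order of appearance.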


I've never seen locale-dependent format strings work well. The translators will change the formatting codes, and you can't change the order of the formatted arguments. You are much better off with some other mechanism for this.

(I have no recommendations. When I've seen this stuff done properly, on the occasions I've managed not to avoid doing it, it's always been using some in-house system.)


> you can't change the order of the formatted arguments.

You can with the $ syntax. Never seen it used though. Maybe it isn't very portable.


It is specified by POSIX, but not by ISO C (or C++). So most Unix(-like) systems support it. But the printf in Microsoft's C runtime doesn't. However, Microsoft does define an alternative printf function which does, _printf_p, so `#define printf _printf_p` will get past that.
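
For anyone who hasn't seen the $ syntax, a small sketch, assuming a POSIX printf implementation. The helper name format_copied and the message texts are invented for the example; the point is that the same call can serve two format strings that use the arguments in a different order.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: fmt would normally come from a translation file.
 * The arguments are always passed in the same order (count, then dir);
 * the format string decides where each one appears. */
int format_copied(char *buf, size_t size, const char *fmt,
                  int count, const char *dir)
{
    return snprintf(buf, size, fmt, count, dir);
}

/* Example formats (assumes POSIX positional-argument support):
 *   "%1$d files copied to %2$s"  with (3, "/tmp")
 *   "%2$s: %1$d files copied"    with (3, "/tmp")
 */
```

Note that once one conversion in a format string uses $, all of them have to.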

I think the real reason you rarely see it is that it is only used with internationalisation: if you translate the format string, the translator may need to reorder the parameters for a natural translation, given differences in word order in different languages. However, a lot of software isn't internationalised, or if it is, the internationalisation is in end-user facing text, which nowadays usually ends up in a GUI or web UI, so printf has less to do with it. And the kind of lower-level tools/components for which people still often use C are less likely to be internationalised, since they are targeted at a technical audience who are expected to be able to read some level of English.


_printf_p is pretty neat, thanks for the pointer. But I would bet that you will still find at least one %d gets turned into a %s.

I like printf format strings, but as a way of handling localizable strings I don't think they are the best.


You are 100% correct. printf, even with the argument numbering feature, is insufficient for high quality internationalisation.

A good example of this is pluralisation. We've all done things like:

    printf("%d file(s) copied\n", count);
which is acceptable but kind of ugly. Some people want to make it nicer:

    printf("%d file%s copied\n", count, count != 1 ? "s" : "");
Which is fine for English, but doesn't work at all for other languages. The problem is not just that the plural ending is something other than `s` – if it was just that, it wouldn't be too hard. The problem is that the `count != 1` bit only works for English. For example, while 0 is plural in English, in French it is singular. Many other languages are much more complex. The GNU gettext manual has a chapter which goes into this in great detail – https://www.gnu.org/software/gettext/manual/html_node/Plural...

printf() has zero hope of coping with this complexity. gettext provides a special function to handle this, ngettext(), which is passed the number as a separate argument, so it can select which plural form to use. And then the translated message files contain a header defining how many plural forms that language has, and the rules to choose which one to use. And for some languages it is crazy complex. Arabic is the most extreme, for which the manual gives this plural rule:

    Plural-Forms: nplurals=6; \
        plural=n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 \
        : n%100>=11 ? 4 : 5;
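
To make the rule concrete, here it is transcribed into plain C next to the English and French rules from the same manual (n != 1 and n > 1). The function names are mine; gettext evaluates these expressions itself from the catalogue header, so this is only an illustration of which index gets picked.

```c
/* Plural-form index for English: form 0 for exactly one, form 1 otherwise. */
unsigned english_plural_index(unsigned long n)
{
    return n != 1;
}

/* Plural-form index for French: 0 and 1 both take the singular form. */
unsigned french_plural_index(unsigned long n)
{
    return n > 1;
}

/* Plural-form index (0..5) for Arabic, following the rule quoted above. */
unsigned arabic_plural_index(unsigned long n)
{
    return n == 0 ? 0
         : n == 1 ? 1
         : n == 2 ? 2
         : (n % 100 >= 3 && n % 100 <= 10) ? 3
         : n % 100 >= 11 ? 4
         : 5;
}
```

So 0 files is plural in English (index 1) but singular in French (index 0), and in Arabic 3, 11 and 100 all select different forms.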


> One reason is locale-dependent format strings which are loaded from resource files.

Aren't those usually resolved to string literals by the preprocessor, such that the compiler could still emit a warning?


That might be possible if the "resource file" was processed at compile time. I've never seen a C toolchain that did it that way, though - I've only seen them read in at runtime. And at that point, the preprocessor can't save you.


Since the locale is only determined at runtime, and might even change at runtime, the format strings are usually dynamically loaded from text files, and are not in the form of string literals seen by the compiler.


The former is ideally resolved with __attribute__((format_arg)), the latter with __attribute__((format)).
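
A sketch of the format_arg half, assuming GCC or Clang. The lookup function tr is a made-up stand-in for something like gettext; the attribute says that the returned string is a format string derived from argument 1, so the compiler can check the literal passed in against the other arguments of the enclosing printf-style call.

```c
#include <stdio.h>

/* Hypothetical translation lookup. format_arg(1) marks argument 1 as a
 * format string whose translation is returned, which lets the compiler
 * check printf(tr("%d files\n"), count) against the original literal. */
#if defined(__GNUC__)
__attribute__((format_arg(1)))
#endif
const char *tr(const char *msgid)
{
    /* A real implementation would look msgid up in a message catalogue
     * at runtime; this stand-in returns it unchanged. */
    return msgid;
}
```

This only checks the untranslated literal, of course. Whether the translated string loaded at runtime still matches is a separate problem.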


In principle, they are not enabled by default because a C compiler must be able to compile standard C by default.

One practical reason I can think of is that not everyone compiles their own code.

You should most definitely look for and enable such flags in your own projects as they become available. (E.g. I was rooting for -Wlifetime, but it did not land for various reasons.)

But when you compile other people's code, breaking your local build doesn't help anyone. The best you can do is submit a bug report, which may or may not be ignored.


How could that really work? printf is a library function, not an intrinsic. A function named printf can do anything your heart desires.


Depends on the compiler, but you'd mark the printf function with something like: __attribute__((format(printf, 1, 2)))


It's a standard library function, meaning the compiler can assume that it follows the standard. Specifically for GCC [0]:

> The ISO C90 functions abort, abs, acos, asin, atan2, atan, calloc, ceil, cosh, cos, exit, exp, fabs, floor, fmod, fprintf, fputs, free, frexp, fscanf, isalnum, isalpha, iscntrl, isdigit, isgraph, islower, isprint, ispunct, isspace, isupper, isxdigit, tolower, toupper, labs, ldexp, log10, log, malloc, memchr, memcmp, memcpy, memset, modf, pow, printf, putchar, puts, realloc, scanf, sinh, sin, snprintf, sprintf, sqrt, sscanf, strcat, strchr, strcmp, strcpy, strcspn, strlen, strncat, strncmp, strncpy, strpbrk, strrchr, strspn, strstr, tanh, tan, vfprintf, vprintf and vsprintf are all recognized as built-in functions unless -fno-builtin is specified (or -fno-builtin-function is specified for an individual function).

Builtin here doesn't mean that GCC won't ever emit calls to library functions, only that it reserves the right not to and allows itself to make assumptions about how the functions work, including diagnosing misuse.

The library functions themselves might also be marked with __attribute__((format(...))), as the sibling comment notes, but that is not necessarily required for GCC to check the format strings.

[0] https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html


There's more than one compiler.



