
"format not a string literal" is one warning I always upgrade to an error. Dear reader: you should do this, too!


I don't like a lot of things in C++, but one thing worth praising in particular is std::format

std::format specifically only works for constant† format strings. That's not because a dynamic format can't be made to work (std::vformat is exactly that), but most of the time you don't want, and shouldn't use, a dynamic format, and the choice to refuse dynamic formats in std::format means fewer people end up shooting themselves in the foot.

Because it requires constant formats, std::format also gets to guarantee compile time errors. Too many or not enough arguments? Program won't build. Wrong types? Program won't build. This shifts some nasty errors hard left.

† Not necessarily a literal, any constant expression, so it just needs to have some concrete value when it's compiled.


Nice, I didn't know about https://wg21.link/P2216 .


Thanks! This prompted me to look up the flag to enable this. For GCC it’s:

  -Werror=format-security


The flag is -Wformat-nonliteral or -Wformat=2. -Wformat-security only includes a weaker variant that will warn if you pass a variable and no arguments to printf.
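
To make the difference concrete, here is a minimal sketch. The helper name print_untrusted is made up for illustration, and the comments paraphrase how GCC documents the two warnings.

```c
#include <stdio.h>

/* Hypothetical helper: writes an untrusted string verbatim.
 * Writing printf(msg) instead is the case -Wformat-security flags:
 * a non-literal format string with no further arguments. */
int print_untrusted(FILE *out, const char *msg)
{
    /* The format is a literal; msg is only ever treated as data. */
    return fprintf(out, "%s", msg);
}

/* -Wformat-nonliteral (included in -Wformat=2) additionally flags calls like
 *     printf(fmt, value);
 * where fmt is a variable, because the arguments cannot be checked against it. */
```

Even a hostile string such as "%s%n" is then printed as four plain characters instead of being interpreted.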


Why are these not compiler errors by default? Opting in to such important safety features seems like broken design.


One reason is locale-dependent format strings which are loaded from resource files.

Also, in personal projects, I almost always used custom wrapper functions for printf/fprintf/sprintf for various reasons, so that default wouldn’t be of much use, unless maybe I could enable it for the custom functions.


You can, with __attribute__((format(printf, 1, 2))). See: https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attrib...
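
A minimal sketch of such a wrapper, assuming GCC or Clang. The function name format_message is hypothetical; note that the two numbers change with the signature, since they give the position of the format string and of the first variadic argument.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper. Argument 3 is the format string and the values it
 * consumes start at argument 4, hence format(printf, 3, 4). */
#if defined(__GNUC__) || defined(__clang__)
__attribute__((format(printf, 3, 4)))
#endif
int format_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int written = vsnprintf(buf, size, fmt, args);  /* forwards the va_list */
    va_end(args);
    return written;
}
```

With -Wformat enabled, a call like format_message(buf, sizeof buf, "%s", 42) should then be diagnosed the same way a mismatched printf call is.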


At least for GCC/clang, you can mark your functions with special __attribute__ format.

For loading translated strings, I'm missing some library function to verify whether two format strings are argument-compatible.
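
I'm not aware of a standard function for this, so here is a rough sketch of what such a check could look like. Everything in it (formats_compatible, next_spec) is made up for illustration, and it is deliberately simplified: it compares length modifier plus conversion character strictly and in order, and it does not handle '*' widths or the POSIX n$ positions.

```c
#include <stdbool.h>
#include <string.h>

/* Advance *p past the next conversion and copy its length modifier plus
 * conversion character (e.g. "d", "ld", "s") into spec.
 * Returns false when no well-formed conversion is left. */
static bool next_spec(const char **p, char spec[4])
{
    const char *s = *p;
    while ((s = strchr(s, '%')) != NULL) {
        s++;
        if (*s == '%') { s++; continue; }            /* literal percent sign */
        s += strspn(s, "-+ #0");                     /* flags */
        s += strspn(s, "0123456789");                /* width */
        if (*s == '.') { s++; s += strspn(s, "0123456789"); }  /* precision */
        size_t len = strspn(s, "hljztL");            /* length modifier */
        if (len > 2 || s[len] == '\0')
            return false;                            /* malformed or truncated */
        memcpy(spec, s, len + 1);                    /* modifier + conversion */
        spec[len + 1] = '\0';
        *p = s + len + 1;
        return true;
    }
    return false;
}

/* True if both formats consume the same conversions in the same order. */
bool formats_compatible(const char *a, const char *b)
{
    char sa[4], sb[4];
    for (;;) {
        bool more_a = next_spec(&a, sa);
        bool more_b = next_spec(&b, sb);
        if (!more_a || !more_b)
            return more_a == more_b;   /* both must run out together */
        if (strcmp(sa, sb) != 0)
            return false;
    }
}
```

A loader could run this on the original and the translated string and fall back to the original when it returns false. The strict comparison rejects some harmless differences (%d versus %i, for instance), which seems like the safer default for untrusted translations.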


I've never seen locale-dependent format strings work well. The translators will change the formatting codes, and you can't change the order of the formatted arguments. You are much better off with some other mechanism for this.

(I have no recommendations. When I've seen this stuff done properly, on the occasions I've managed not to avoid doing it, it's always been using some in-house system.)


> you can't change the order of the formatted arguments.

You can with the $ syntax. Never seen it used though. Maybe it isn't very portable.


It is specified by POSIX, but not by ISO C (or C++). So most Unix(-like) systems support it. But the printf in Microsoft's C runtime doesn't. However, Microsoft does define an alternative printf function which does, _printf_p, so `#define printf _printf_p` will get past that.
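
A small sketch of the $ syntax, assuming a printf family that implements the POSIX extension (glibc does). The function names and the "translated" word order are invented for illustration.

```c
#include <stdio.h>

/* English word order: file first, then directory. */
int describe_en(char *buf, size_t size, const char *file, const char *dir)
{
    return snprintf(buf, size, "%1$s copied to %2$s", file, dir);
}

/* A hypothetical translation that wants the directory first. The call still
 * passes the arguments in the same order; only the format string changes. */
int describe_reordered(char *buf, size_t size, const char *file, const char *dir)
{
    return snprintf(buf, size, "%2$s <- %1$s", file, dir);
}
```

Because both formats are literals here, GCC can still check them, although -Wpedantic will point out that n$ is not ISO C.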

I think the real reason you rarely see it is that it is only used with internationalisation: the idea is that if you translate the format string, the translator may need to reorder the parameters for a natural translation, given differences in word order between languages. However, a lot of software isn't internationalised, or if it is, the internationalisation is in end-user facing text, which nowadays usually ends up in a GUI or web UI, so printf has less to do with it. And the kind of lower-level tools/components for which people still often use C are less likely to be internationalised, since they are targeted at a technical audience who are expected to be able to read some level of English.


_printf_p is pretty neat, thanks for the pointer. But I would bet that you will still find at least one %d gets turned into a %s.

I like printf format strings, but as a way of handling localizable strings I don't think they are the best.


You are 100% correct. printf, even with the argument numbering feature, is insufficient for high quality internationalisation.

A good example of this is pluralisation. We've all done things like:

    printf("%d file(s) copied\n", count);
which is acceptable but kind of ugly. Some people want to make it nicer:

    printf("%d file%s copied\n", count, count != 1 ? "s" : "");
Which is fine for English, but doesn't work at all for other languages. The problem is not just that the plural ending is something other than `s` – if it was just that, it wouldn't be too hard. The problem is that the `count != 1` bit only works for English. For example, while 0 is plural in English, in French it is singular. Many other languages are much more complex. The GNU gettext manual has a chapter which goes into this in great detail – https://www.gnu.org/software/gettext/manual/html_node/Plural...

printf() has zero hope of coping with this complexity. gettext provides a special function to handle this, ngettext(), which is passed the number as a separate argument, so it can select which plural form to use. And then the translated message files contain a header defining how many plural forms that language has, and the rules to choose which one to use. And for some languages it is crazy complex. Arabic is the most extreme, for which the manual gives this plural rule:

    Plural-Forms: nplurals=6; \
        plural=n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 \
        : n%100>=11 ? 4 : 5;
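
To see what the Arabic rule above selects, it can be written out as a plain C function. This is only an illustration (arabic_plural_form is a made-up name); gettext itself evaluates the expression from the catalog header at run time.

```c
/* Index (0..5) of the plural form chosen by the Arabic rule quoted above. */
unsigned arabic_plural_form(unsigned long n)
{
    return n == 0 ? 0
         : n == 1 ? 1
         : n == 2 ? 2
         : (n % 100 >= 3 && n % 100 <= 10) ? 3
         : n % 100 >= 11 ? 4
         : 5;
}
```

So 3 through 10 share one form, 11 through 99 another, and 100, 101 and 102 fall into the last one before 103 cycles back to the "3 to 10" form.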


> One reason is locale-dependent format strings which are loaded from resource files.

Aren't those usually resolved to string literals by preprocessor such that the compiler still could emit a warning?


That might be possible if the "resource file" was processed at compile time. I've never seen a C toolchain that did it that way, though - I've only seen them read in at runtime. And at that point, the preprocessor can't save you.


Since the locale is only determined at runtime, and might even change at runtime, the format strings are usually dynamically loaded from text files, and are not in the form of string literals seen by the compiler.


The former (translated format strings) is ideally resolved with __attribute__((format_arg)), the latter (custom wrapper functions) with __attribute__((format)).
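
A minimal sketch of the format_arg case, assuming GCC or Clang. The translate function stands in for something like gettext and is hypothetical; here it just returns its argument.

```c
#include <stdio.h>

/* Hypothetical translation lookup. format_arg(1) tells the compiler that the
 * return value is a format string derived from argument 1, so a call that
 * uses translate("...") as its format is checked against the English literal. */
#if defined(__GNUC__) || defined(__clang__)
__attribute__((format_arg(1)))
#endif
const char *translate(const char *msgid)
{
    /* Assumption: no catalog is loaded, so the message comes back unchanged. */
    return msgid;
}

int format_count(char *buf, size_t size, int count)
{
    /* Checked as if the literal had been passed to snprintf directly. */
    return snprintf(buf, size, translate("%d files copied"), count);
}
```

This only verifies the call against the original literal. Whether the translated string loaded at run time still matches is a separate problem.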


In principle, they are not enabled by default because a C compiler must be able to compile standard C by default.

One practical reason I can think of is because not everyone compiles their own code.

You should most definitely look for and enable such flags in your own projects as they become available. (E.g., I was rooting for -Wlifetime, but it did not land for various reasons.)

But when you compile other people's code, breaking your local build doesn't help anyone. The best you can do is submit a bug report, which may or may not be ignored.


How could that really work? printf is a library function, not an intrinsic. A function named printf can do anything your heart desires.


Depends on the compiler, but you'd mark the printf function with something like: __attribute__((format(printf, 1, 2)))


It's a standard library function meaning the compiler can assume that it follows the standard. Specifically for GCC [0]:

> The ISO C90 functions abort, abs, acos, asin, atan2, atan, calloc, ceil, cosh, cos, exit, exp, fabs, floor, fmod, fprintf, fputs, free, frexp, fscanf, isalnum, isalpha, iscntrl, isdigit, isgraph, islower, isprint, ispunct, isspace, isupper, isxdigit, tolower, toupper, labs, ldexp, log10, log, malloc, memchr, memcmp, memcpy, memset, modf, pow, printf, putchar, puts, realloc, scanf, sinh, sin, snprintf, sprintf, sqrt, sscanf, strcat, strchr, strcmp, strcpy, strcspn, strlen, strncat, strncmp, strncpy, strpbrk, strrchr, strspn, strstr, tanh, tan, vfprintf, vprintf and vsprintf are all recognized as built-in functions unless -fno-builtin is specified (or -fno-builtin-function is specified for an individual function).

Builtin here doesn't mean that GCC won't ever emit calls to library functions, only that it reserves the right not to and allows itself to make assumptions about how the functions work, including diagnosing misuse.

The library functions themselves might also be marked with __attribute__((format(...))), as the sibling comment notes, but that is not necessarily required for GCC to check the format strings.

[0] https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html


There's more than one compiler.



