
Fun fact about %n:

Mazda cars used to have a bug where they used printf(str) instead of printf("%s", str) and their media system would crash if you tried to play the "99% Invisible" podcast in them. All because the "% In" was parsed as a "%n" with some extra modifiers. https://99percentinvisible.org/episode/the-roman-mars-mazda-...
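A minimal sketch of the difference, assuming a hypothetical helper `format_title` that renders an untrusted title into a buffer. The title is passed only as data for `"%s"`, so a `%` inside it is never interpreted as a conversion:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats an untrusted title into buf.
   The title is an argument to "%s", never the format string itself. */
int format_title(char *buf, size_t size, const char *title)
{
    assert(buf != NULL && title != NULL);
    /* The buggy variant would be snprintf(buf, size, title);
       there, a '%' inside the title starts a conversion specification. */
    return snprintf(buf, size, "%s", title);
}
```

Called with "99% Invisible", this copies the text through unchanged.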




"format not a string literal" is one warning I always upgrade to an error. Dear reader: you should do this, too!


I don't like a lot of things in C++, but one thing worth praising in particular is std::format

std::format specifically only works for constant† format strings. That's not because a dynamic format can't be made to work (std::vformat is exactly that), but because most of the time you don't want and shouldn't use a dynamic format, and refusing dynamic formats in std::format means fewer people end up shooting themselves in the foot.

Because it requires constant formats, std::format also gets to guarantee compile time errors. Too many or not enough arguments? Program won't build. Wrong types? Program won't build. This shifts some nasty errors hard left.

† Not necessarily a literal, any constant expression, so it just needs to have some concrete value when it's compiled.


Nice, I didn't know about https://wg21.link/P2216 .


Thanks! This prompted me to look up the flag to enable this. For GCC it’s:

  -Werror=format-security


The flag is -Wformat-nonliteral or -Wformat=2. -Wformat-security only includes a weaker variant that will warn if you pass a variable and no arguments to printf.


Why are these not compiler errors by default? Opting in to such important safety features seems like broken design.


One reason is locale-dependent format strings which are loaded from resource files.

Also, in personal projects, I almost always used custom wrapper functions for printf/fprintf/sprintf for various reasons, so that default wouldn’t be of much use, unless maybe I could enable it for the custom functions.


You can, with __attribute__((format(printf, 1, 2))). See: https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attrib...
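As a rough sketch of how that looks on a custom wrapper (the name `log_msg` is made up for illustration; the attribute is the GCC/Clang one mentioned above):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical logging wrapper. The attribute says that argument 1 is a
   printf-style format and that the variadic arguments start at position 2,
   so -Wformat checks call sites the same way it checks printf itself. */
__attribute__((format(printf, 1, 2)))
int log_msg(const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(fmt != NULL);
    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);   /* forward to the real formatter */
    va_end(ap);
    return n;
}
```

With this in place, a call such as `log_msg("%s\n", 42)` should be diagnosed under -Wformat, just as the equivalent printf call would be.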


At least for GCC/clang, you can mark your functions with special __attribute__ format.

For loading translated strings, I'm missing some library function to verify whether two format strings are argument-compatible.
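I'm not aware of a standard function for this either. A deliberately conservative sketch of what one could look like, with hypothetical names `arg_signature` and `formats_compatible`: it reduces each format to the list of length modifiers and conversion characters and compares the lists. It assumes plain (non-`$`) conversions, and it treats e.g. %d and %i as different, so it can reject pairs that would actually be fine:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Writes a normalized argument signature of fmt into out, e.g.
   "%-5d of %s" becomes "d,s,". Returns false if out is too small
   or the format ends in the middle of a conversion. */
static bool arg_signature(const char *fmt, char *out, size_t size)
{
    size_t n = 0;

    assert(fmt != NULL && out != NULL && size > 0);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;                       /* "%%" consumes no argument */
        /* Flags, width, precision: only '*' consumes an (int) argument. */
        while (*p != '\0' && strchr("-+ #0.*123456789", *p) != NULL) {
            if (*p == '*') {
                if (n + 2 >= size)
                    return false;
                out[n++] = '*';
                out[n++] = ',';
            }
            p++;
        }
        /* Length modifiers are part of the expected argument type. */
        while (*p != '\0' && strchr("hlLqjzt", *p) != NULL) {
            if (n + 1 >= size)
                return false;
            out[n++] = *p++;
        }
        if (*p == '\0')
            return false;                   /* truncated conversion */
        if (n + 2 >= size)
            return false;
        out[n++] = *p;                      /* the conversion character */
        out[n++] = ',';
    }
    out[n] = '\0';
    return true;
}

/* True if both formats expect the same argument list, in the same order. */
bool formats_compatible(const char *a, const char *b)
{
    char sa[128], sb[128];

    assert(a != NULL && b != NULL);
    return arg_signature(a, sa, sizeof sa)
        && arg_signature(b, sb, sizeof sb)
        && strcmp(sa, sb) == 0;
}
```

The idea would be to run `formats_compatible(original, translated)` when loading a catalog and fall back to the untranslated string on a mismatch.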


I've never seen locale-dependent format strings work well. The translators will change the formatting codes, and you can't change the order of the formatted arguments. You are much better off with some other mechanism for this.

(I have no recommendations. When I've seen this stuff done properly, on the occasions I've managed not to avoid doing it, it's always been using some in-house system.)


> you can't change the order of the formatted arguments.

You can with the $ syntax. Never seen it used though. Maybe it isn't very portable.


It is specified by POSIX, but not by ISO C (or C++). So most Unix(-like) systems support it. But the printf in Microsoft's C runtime doesn't. However, Microsoft does define an alternative printf function which does, _printf_p, so `#define printf _printf_p` will get past that.

I think the real reason you rarely see it is that it is only used with internationalisation: the idea being that if you translate the format string, the translator may need to reorder the parameters for a natural translation, given differences in word order between languages. However, a lot of software isn't internationalised, or if it is, the internationalisation is in end-user facing text, which nowadays usually ends up in a GUI or web UI, so printf has less to do with it. And the kind of lower-level tools/components for which people still often use C are less likely to be internationalised, since they are targeted at a technical audience who are expected to be able to read some level of English.
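For anyone who hasn't seen the `$` syntax, a small sketch assuming a POSIX printf (it is not ISO C, as noted above). The two format strings and the helper `format_copied` are invented for illustration; both formats consume the same arguments but print them in a different order:

```c
#include <assert.h>
#include <stdio.h>

/* Two hypothetical "translations" of one message. Both take a string
   followed by an int; %1$ and %2$ select the argument by position. */
static const char *const FMT_NAME_FIRST  = "%1$s copied %2$d files";
static const char *const FMT_COUNT_FIRST = "%2$d files copied by %1$s";

/* Formats the message with whichever translation was selected at runtime.
   Note that fmt is not a literal here, which is exactly the situation
   -Wformat-nonliteral complains about. */
int format_copied(char *buf, size_t size, const char *fmt,
                  const char *user, int count)
{
    assert(buf != NULL && fmt != NULL && user != NULL);
    return snprintf(buf, size, fmt, user, count);
}
```

The caller always passes `(user, count)` in the same order, and the translator decides where each one appears.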


_printf_p is pretty neat, thanks for the pointer. But I would bet that you will still find at least one %d gets turned into a %s.

I like printf format strings, but as a way of handling localizable strings I don't think they are the best.


You are 100% correct. printf, even with the argument numbering feature, is insufficient for high quality internationalisation.

A good example of this is pluralisation. We've all done things like:

    printf("%d file(s) copied\n", count);
which is acceptable but kind of ugly. Some people want to make it nicer:

    printf("%d file%s copied\n", count, count != 1 ? "s" : "");
Which is fine for English, but doesn't work at all for other languages. The problem is not just that the plural ending is something other than `s` – if it was just that, it wouldn't be too hard. The problem is that the `count != 1` bit only works for English. For example, while 0 is plural in English, in French it is singular. Many other languages are much more complex. The GNU gettext manual has a chapter which goes into this in great detail – https://www.gnu.org/software/gettext/manual/html_node/Plural...

printf() has zero hope of coping with this complexity. gettext provides a special function to handle this, ngettext(), which is passed the number as a separate argument, so it can select which plural form to use. And then the translated message files contain a header defining how many plural forms that language has, and the rules to choose which one to use. And for some languages it is crazy complex. Arabic is the most extreme, for which the manual gives this plural rule:

    Plural-Forms: nplurals=6; \
        plural=n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 \
        : n%100>=11 ? 4 : 5;
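To make that rule concrete, here it is transcribed as a C function. This is only a sketch: the names `arabic_plural_index` and `pick_plural` are made up, and real code would let ngettext() evaluate the rule from the message catalog rather than hard-coding it:

```c
#include <assert.h>
#include <stddef.h>

/* The Arabic Plural-Forms expression from the gettext manual, as C.
   Returns the index (0..5) of the plural form to use for n. */
unsigned arabic_plural_index(unsigned long n)
{
    return n == 0 ? 0
         : n == 1 ? 1
         : n == 2 ? 2
         : (n % 100 >= 3 && n % 100 <= 10) ? 3
         : (n % 100 >= 11) ? 4
         : 5;
}

/* Hypothetical helper: picks one of six translated forms for n. */
const char *pick_plural(const char *const forms[6], unsigned long n)
{
    assert(forms != NULL);
    return forms[arabic_plural_index(n)];
}
```

Note how 100, 101 and 102 all land in the last form, while 103 goes back to form 3: a `count != 1` test has no way to express that.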


> One reason is locale-dependent format strings which are loaded from resource files.

Aren't those usually resolved to string literals by the preprocessor, such that the compiler could still emit a warning?


That might be possible if the "resource file" was processed at compile time. I've never seen a C toolchain that did it that way, though - I've only seen them read in at runtime. And at that point, the preprocessor can't save you.


Since the locale is only determined at runtime, and might even change at runtime, the format strings are usually dynamically loaded from text files, and are not in the form of string literals seen by the compiler.


The former is ideally resolved with __attribute__((format_arg)), the latter with __attribute__((format)).
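A sketch of the format_arg half, assuming GCC or Clang. `translate` is a hypothetical stand-in for a gettext-style lookup; the attribute says its result is a (possibly translated) version of argument 1, so the compiler can check the caller's arguments against the literal msgid:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical translation lookup. format_arg(1) marks the return value
   as derived from the format string passed as argument 1, so a call like
   printf(translate("%d files\n"), n) is still checked against the literal. */
__attribute__((format_arg(1)))
const char *translate(const char *msgid)
{
    assert(msgid != NULL);
    /* Placeholder: a real implementation would consult a message catalog. */
    return msgid;
}
```

This only verifies the call site against the untranslated string. Whether the translated string loaded at runtime still matches is a separate problem.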


In principle, they are not enabled by default because a C compiler must be able to compile standard C by default.

One practical reason I can think of is because not everyone compiles their own code.

You should most definitely look for and enable such flags in your own projects as they become available. (e.g. I was rooting for -Wlifetime, but it did not land for various reasons.)

But when you compile other people's code, breaking your local build doesn't help anyone. The best you can do is submit a bug report, which may or may not be ignored.


How could that really work? printf is a library function, not an intrinsic. A function named printf can do anything your heart desires.


Depends on the compiler, but you'd mark the printf function with something like: __attribute__((format(printf, 1, 2)))


It's a standard library function meaning the compiler can assume that it follows the standard. Specifically for GCC [0]:

> The ISO C90 functions abort, abs, acos, asin, atan2, atan, calloc, ceil, cosh, cos, exit, exp, fabs, floor, fmod, fprintf, fputs, free, frexp, fscanf, isalnum, isalpha, iscntrl, isdigit, isgraph, islower, isprint, ispunct, isspace, isupper, isxdigit, tolower, toupper, labs, ldexp, log10, log, malloc, memchr, memcmp, memcpy, memset, modf, pow, printf, putchar, puts, realloc, scanf, sinh, sin, snprintf, sprintf, sqrt, sscanf, strcat, strchr, strcmp, strcpy, strcspn, strlen, strncat, strncmp, strncpy, strpbrk, strrchr, strspn, strstr, tanh, tan, vfprintf, vprintf and vsprintf are all recognized as built-in functions unless -fno-builtin is specified (or -fno-builtin-function is specified for an individual function).

Builtin here doesn't mean that GCC won't ever emit calls to library functions, only that it reserves the right not to and allows itself to make assumptions about how the functions work, including diagnosing misuse.

The library functions themselves might also be marked with __attribute__((format(...))) as the sibling comment notes, but that is not necessarily required for GCC to check the format strings.

[0] https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html


There's more than one compiler.


Fun fact about %n:

The %n functionality also makes printf accidentally Turing-complete even with a well-formed set of arguments. A game of tic-tac-toe written in the format string is a winner of the 27th IOCCC.

- sez wiki.

A not so fun fact:

Because the %n format is inherently insecure, it's disabled by default.

- MSVC reference.
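For reference, a tiny sketch of what %n actually does, assuming a C library where it is enabled (glibc, for instance; per the quote above, MSVC disables it by default). The helper name `count_before_marker` is made up:

```c
#include <assert.h>
#include <stdio.h>

/* %n prints nothing. It stores the number of characters produced so far
   through the int pointer supplied as its argument. That write through a
   caller-supplied pointer is what makes it dangerous when an attacker
   controls the format string. */
int count_before_marker(char *buf, size_t size, const char *word)
{
    int written = -1;

    assert(buf != NULL && word != NULL);
    snprintf(buf, size, "%s%n!", word, &written);
    return written;
}
```

Here the format is a literal and the pointer is ours, so it is harmless. With printf(str) and a hostile str, the pointer comes from whatever happens to be in the argument slot.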


>The %n functionality also makes printf accidentally Turing-complete

No it doesn't. Printf has no way to loop so it's not Turing complete. Even if you did what the IOCCC entry did with putting it into a loop it still wouldn't be Turing complete as it would not have an infinite memory.


Nothing real is "Turing complete" if it requires infinite memory. That's a property only abstract machines can have. In common parlance, something is Turing complete if it can compute arbitrary programs that can be computed with M bits of memory, where M is arbitrary but finite.


So in common parlance finite state automata are Turing complete? That definition doesn't make any sense.



I'd say they're Turing complete (in common parlance) if they can reasonably be viewed as a Turing-complete system that's been hobbled with an arbitrary memory limitation. FSAs generally can't be viewed this way as you can't just "add more memory" to an FSA. By way of contrast, consider a pushdown automaton with two stacks. While any physically real implementation of such a device will necessarily have some kind of limit on the size of the stacks, you can easily see how the device would behave if this limit were somehow removed.

It's definitely a bit fuzzy. I'm sure lots of philosophy papers have been written on when exactly it is or isn't appropriate to consider a finite computational system as a finite approximation to a Turing-complete system. In realistic everyday cases, however, it's usually clear enough what should and shouldn't count as such.


>as you can't just "add more memory" to an FSA.

Adding another state to an FSA adds more memory.

There is no difference between a hobbled Turing machine and an FSA. Turing machines aren't a useful concept in the real world and that is okay.


>Adding another state to a FSA adds more memory.

Yes, but it also changes the state transition logic. You can't just 'add 100 more states' to an FSA in the same way that you can 'add 100 more stack slots' to a bounded pushdown automaton.

As I said previously, these are somewhat fuzzy distinctions, and I'm not saying that they're easy to make mathematically precise. They do however seem clear enough in most cases of practical interest. There are many real-world computing systems that would be Turing-complete if they had unbounded memory. There are others that are not Turing-complete for more fundamental reasons than memory limitations. Again, I acknowledge that 'more fundamental' is not a mathematically precise concept.


Well, I'm pretty sure all existing digital computers are finite state automata, so they are not, strictly speaking, Turing complete. But that doesn't make any sense.


Strictly speaking, there are essentially no programming languages that are even theoretically Turing-complete, because they can only address a bounded amount of memory. For example, in C `sizeof(void*)` must be a well-defined, finite integer. But that definition is not useful in practice.


There are plenty of Turing complete languages. For example JavaScript is Turing complete.


Even if that was true, any definition of Turing completeness that includes JavaScript and excludes C is worse than useless in practice. It's useless for communication, it's useless for education, it's useless for reasoning about capabilities. There's simply no place for such a definition in a civilized society.


Turing machines themselves are a useless concept in our society. Since C is lower level and tied more to the physical hardware it makes sense that it is not Turing complete because Turing machines are not applicable to the real world. Computers do not work anything like an infinite tape. I've never seen a practical program have to implement a Turing machine.


My laptop with its finite RAM isn't Turing Complete? Wow.


Also, iPhones had an RCE using a Wi-Fi name that contained %s. https://thehackernews.com/2021/07/turns-out-that-low-risk-io...


%@.


This is one of those annoying little problems that is easily picked up by the vet command (https://pkg.go.dev/cmd/vet) when writing Go code. There are, of course, many linters that do the same thing in C, but it's nice to have an authoritative one built in as part of the official Go toolchain, so everyone's code undergoes the same basic checks.



