Shrug. It doesn't fall over. I've done it, the OpenBSD team has done it, and DJB has done it. Maybe something is wrong with your implementation that I can help you with?
OpenBSD takes a fairly minimalist approach, which is vaguely described here: http://www.freebsdforums.org/forums/showthread.php?threadid=... They basically replace the unsafe functions with things that are easier to use. Their idea is that it isn't the format of the C-string that causes security issues (null-terminated string), it's the poorly defined functions (with weird corner cases that are hard to get right). It's worked well for their use cases.
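To make the "easier to use" point concrete, here is a minimal sketch of the semantics that strlcpy()-style functions provide. The names `bounded_copy` and `copy_fits` are made up for illustration and this is not OpenBSD's source; it only shows the contract: always NUL-terminate, and return the length the caller wanted so that truncation is a simple comparison.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper with strlcpy-like semantics: copies at most
 * dstsize-1 bytes, always NUL-terminates when dstsize > 0, and returns
 * strlen(src) so the caller can detect truncation. */
size_t bounded_copy(char *dst, const char *src, size_t dstsize)
{
    size_t srclen = strlen(src);
    if (dstsize > 0) {
        size_t n = (srclen < dstsize - 1) ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* Returns 1 if src fit in dst completely, 0 if it was truncated. */
int copy_fits(char *dst, const char *src, size_t dstsize)
{
    return bounded_copy(dst, src, dstsize) < dstsize;
}
```

Compare that with strncpy(), which may leave the destination unterminated, and the "weird corner cases" complaint becomes clearer.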
DJB did something similar in qmail. I don't recall the details, but you can look at the source code as easily as I can, and it eliminated security problems.
When I'm working in Java, I find that most of my string parsing uses the split() function. This is a pain in C, because even if you had a split() function you'd need to deal with memory allocations. Most of these are solved with a memory pool. In my own library, I also added runtime, grammar-based parsing functionality. So to parse a CSV line you might do something like this:
char *g = " S -> WORD | WORD , S;"
          "WORD -> [^,]";
results = parsegram(g, inputString);
Grammar parsing + memory pools makes string parsing in C easier than in Java. The biggest difficulty with this kind of library is that, to do it right, you need to be something of a Unicode expert, and that's tough.
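For the split()-plus-memory-pool half of that, a rough sketch might look like the following. The `pool` type and `pool_split` function are hypothetical names, not the library described above, and the pool here is just a bump allocator over caller-provided storage, which is one of several ways to build one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bump-allocating pool: everything is released at once by
 * resetting `used` to 0, so individual fields are never freed. */
typedef struct {
    char  *base;
    size_t cap;
    size_t used;
} pool;

static char *pool_alloc(pool *p, size_t n)
{
    if (n > p->cap - p->used)
        return NULL;
    char *r = p->base + p->used;
    p->used += n;
    return r;
}

/* Splits `input` on `sep` (which must not be '\0'), copying each field
 * into the pool. Stores the field pointers in `out` and returns the
 * number of fields, or -1 if the pool or `out` (capacity `max`) is too
 * small. */
int pool_split(pool *p, const char *input, char sep, char **out, int max)
{
    int count = 0;
    const char *start = input;
    for (;;) {
        const char *end = strchr(start, sep);
        size_t len = end ? (size_t)(end - start) : strlen(start);
        if (count == max)
            return -1;
        char *field = pool_alloc(p, len + 1);
        if (!field)
            return -1;
        memcpy(field, start, len);
        field[len] = '\0';
        out[count++] = field;
        if (!end)
            return count;
        start = end + 1;
    }
}
```

The caller never frees an individual field; setting `used` back to 0 after handling a line recycles the whole pool in one step.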
Here's roughly what that would look like using Bernstein's C string library (which was used in more than just qmail).
#include "stralloc.h"
...
static stralloc s, t;
...
if (!stralloc_ready(&s, 0)) die_nomem();
if (!stralloc_copys(&t, "hello")) die_nomem();
if (!stralloc_copy(&t, &s)) die_nomem();
if (!stralloc_cat(&t, &s)) die_nomem();
if (!stralloc_copy(&t, &s)) die_nomem();
if (!stralloc_cat(&t, &s)) die_nomem();
if (!stralloc_cat(&t, &s)) die_nomem();
if (!stralloc_copys(&t, "hello")) die_nomem();
if (!stralloc_cat(&t, &s)) die_nomem();
if (!stralloc_copy(&t, &s)) die_nomem();
if (!stralloc_cats(&t, "hello")) die_nomem();
if (!stralloc_copys(&t, "hello")) die_nomem();
if (!stralloc_cats(&t, "world")) die_nomem();
Yes, that does work. But it's not without problems, not the least of which is that it's just not attractive to look at. For example, concatenating "hello" and "world" allocates memory, when it should instead give you a "helloworld" string literal. In fact, simply initializing `s` with a string literal needlessly allocates memory, and that's antithetical to performance. Calling die_nomem() leaks memory if it does anything but terminate the program. All those tests for memory exhaustion are tedious.
> Even such a simple use case is fraught with major problems:
>
> 1. who allocates needed memory?
>
> 2. who frees it?
That's also a major feature. It allows people to write systems that are resilient in the face of tight memory limitations. It's not cool when a language forces string operations to allocate & duplicate memory willy-nilly.
> 3. can the compiler constant fold cat("hello","world") ? Does the result wind up allocating memory anyway?
I fail to see how that's a major problem. Why are you concatenating string literals? How common is that?
> 4. what about the lack of function overloading to handle the permutations?
I consider lack of overloading to be a feature. Overloading is one of the things that are way too easily abused, and it makes code auditing harder than it needs to be. Please just type out the different function names so I can see exactly what is going to be called when I read the code. Or use the sprintf family of variadic functions.
It's the opposite. I've seen lots of code written in C that pretends to be out of memory safe. I've never once seen such a program that actually is out of memory safe. Invariably the codepaths triggered by malloc returning null are never exercised.
With a GC and exceptions you can theoretically be quite resistant to OOM conditions, not that anyone really cares.
> I've never once seen such a program that actually is out of memory safe. Invariably the codepaths triggered by malloc returning null are never exercised.
sqlite takes care to correctly deal with out of memory conditions. It has explicit tests for that code too. See section 3.1, Out-Of-Memory Testing, of [1].
Now I found my first program that actually tests it properly :)
I knew you had to systematically drive the code through every OOM codepath to even have a shot at doing that in an unmanaged language. Sadly a lot of C code is written by people who think:
if ((ptr = malloc(sizeof(struct foo))) == NULL)
    return -1;
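Driving code through every OOM path can be sketched roughly like this: route allocations through a wrapper that can be told to fail the Nth call, then rerun the code under test with N = 0, 1, 2, ... until a run completes without hitting the injected failure. All names here (`test_malloc`, `pair_new`, `exercise_oom_paths`) are hypothetical, and this is only a toy in the spirit of that kind of testing, not any real project's harness.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical test hook: when fail_after >= 0, that many allocations
 * succeed and the next one returns NULL (once). */
static int fail_after = -1;

static void *test_malloc(size_t n)
{
    if (fail_after == 0) {
        fail_after = -1;          /* fail once, then behave normally */
        return NULL;
    }
    if (fail_after > 0)
        fail_after--;
    return malloc(n);
}

typedef struct { char *first; char *second; } pair;

static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = test_malloc(n);
    if (copy)
        memcpy(copy, s, n);
    return copy;
}

void pair_free(pair *p)
{
    if (!p)
        return;
    free(p->first);
    free(p->second);
    free(p);
}

/* Code under test: three allocations, so three failure points. */
pair *pair_new(const char *a, const char *b)
{
    pair *p = test_malloc(sizeof *p);
    if (!p)
        return NULL;
    p->first = dup_string(a);
    p->second = p->first ? dup_string(b) : NULL;
    if (!p->second) {             /* covers failure of either copy */
        pair_free(p);
        return NULL;
    }
    return p;
}

/* Fails each allocation in turn. Returns the number of failure points
 * exercised, or -1 if any run misbehaved. */
int exercise_oom_paths(void)
{
    for (int k = 0; ; k++) {
        fail_after = k;
        pair *p = pair_new("hello", "world");
        int injected = (fail_after == -1);
        fail_after = -1;
        if (!injected) {          /* no failure hit: must have succeeded */
            int ok = (p != NULL);
            pair_free(p);
            return ok ? k : -1;
        }
        if (p != NULL) {          /* a failed run must return NULL */
            pair_free(p);
            return -1;
        }
    }
}
```

This only checks return values. Catching leaks on the failure paths would additionally need a leak checker or an allocation counter in the wrapper.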
One of the things with tight memory systems is that you don't use malloc to begin with, if you can avoid it. C gives you the option.
When you're concatenating strings, you already have storage for those strings. Maybe you can re-use that storage. Maybe you have a static buffer. Maybe you have a fixed size buffer on the stack and the stack use is bounded.
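As a small illustration of the fixed-size-buffer case, here is a sketch that joins two strings into storage the caller already owns, with no heap involved. `concat_into` is a made-up name; the only library call is standard snprintf().

```c
#include <stddef.h>
#include <stdio.h>

/* Joins a and b into caller-provided storage. Returns 0 on success, or
 * -1 if the result did not fit or snprintf reported an error. On a
 * "did not fit" failure with dstsize > 0, dst holds a truncated but
 * still NUL-terminated string. */
int concat_into(char *dst, size_t dstsize, const char *a, const char *b)
{
    int n = snprintf(dst, dstsize, "%s%s", a, b);
    if (n < 0 || (size_t)n >= dstsize)
        return -1;
    return 0;
}
```

With something like `char buf[64];` on the stack, `concat_into(buf, sizeof buf, "hello", "world")` never touches malloc, and the worst-case memory use is known up front.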
A language that forces you into making redundant duplicates onto the heap is terrible in these situations.
And yes there are programs that try to deal with failing mallocs. Again, C gives you the option.
Very, very few C programs can handle running out of disk space. This includes the operating system(s). Get close to filling up the disk, and try various things.
Just recently, I was having a lot of trouble with Windows Update hanging. I finally noticed that free disk space was low. Freed up more space, and WU started working again.
For fun, try:
#include <stdio.h>
int main() { printf("hello world\n"); return 0; }
and redirect stdout to a file on a device that is full. Amazingly, it succeeds!
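For contrast, a version that would notice the failure has to check the write and, because stdio usually buffers, the flush as well. A minimal sketch, with `say_hello` as a hypothetical helper name:

```c
#include <stdio.h>

/* Writes the greeting and reports whether it actually reached the
 * stream. Returns 0 on success, -1 on any write error. */
int say_hello(FILE *out)
{
    if (fputs("hello world\n", out) == EOF)
        return -1;
    /* Output is usually buffered, so an error such as a full device
     * often only shows up when the buffer is flushed. */
    if (fflush(out) == EOF)
        return -1;
    return ferror(out) ? -1 : 0;
}
```

A main() that returns a nonzero status when `say_hello(stdout)` fails would at least let the calling shell see that the output was lost.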
I assume you're referring to OpenBSD here. They didn't use snprintf(); they used asprintf(), which solves the problem of who should allocate (but not who should free).
"That means that we have been going through the tree cleaning out all calls to sprintf(), strcpy(), and strcat(). Instead, these things are being rewritten to use asprintf(), snprintf(), strlcpy(), and strlcat()."
These functions will take care of buffer-size checking, and reallocation if necessary. For cases where you need to interface with pre-existing libraries, you can return a cstring(). Make it a function/macro to enable you to change the struct definition in the future:
#define ktCstr(x) ((x)->str)
then you can pass it into write() or whatever you need:
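A minimal sketch of that idea, assuming a hypothetical `ktStr` struct. Only the names `ktCstr` and `ktStrcat` come from this thread; the struct layout, the `ktStrcat` signature, and `ktFree` are guesses made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-tracking string. Start with `ktStr k = {0};`. */
typedef struct {
    char  *str;   /* NUL-terminated once non-NULL */
    size_t len;   /* bytes in use, not counting the NUL */
    size_t cap;   /* bytes allocated */
} ktStr;

#define ktCstr(x) ((x)->str)

/* Appends s, growing the buffer as needed, so nothing is truncated.
 * Returns 0 on success, -1 if allocation fails (the string is left
 * unchanged). s must not point into k's own buffer. */
int ktStrcat(ktStr *k, const char *s)
{
    size_t add = strlen(s);
    size_t need = k->len + add + 1;
    if (need > k->cap) {
        size_t newcap = k->cap ? k->cap : 16;
        while (newcap < need)
            newcap *= 2;
        char *p = realloc(k->str, newcap);
        if (!p)
            return -1;
        k->str = p;
        k->cap = newcap;
    }
    memcpy(k->str + k->len, s, add + 1);   /* copies the NUL too */
    k->len += add;
    return 0;
}

void ktFree(ktStr *k)
{
    free(k->str);
    k->str = NULL;
    k->len = k->cap = 0;
}
```

With this layout, a call such as `write(fd, ktCstr(&k), k.len)` passes an explicit length, so the callee does not have to rediscover it by scanning for the terminator.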
... and end up with silent truncation unless you happen to always remember to use only C library functions with explicit length arguments (and which do not assume NUL-terminated strings).
Look, I get that there is a place for C, but string manipulation is absurdly bad and error-prone.
Hi! I can't imagine how you understood what I wrote. I specifically said to not use those C library string functions.
I fully admitted that string manipulation is absurdly bad and error-prone, then built on that by showing a way to make it better. Use ktStrcat() instead of strcat(), then you don't have to worry about truncation. Use ktSprintf() instead of snprintf(), then you don't have to worry about truncation. I wish you had understood.
Yes, I agree. If everyone would just avoid those C stdlib functions everything would be peachy. :)
I was agreeing with you, but just adding caveats. :)
Well, except... some problems surface when interfacing with "things" (libraries, OS'es) written by other people... and there's no escaping those problems, fundamentally. It's C.
Of course UTF-8 was invented with the express purpose of being "C-compatible", but... what happens if you have a string with a NUL in it and you pass that to the POSIX (I think?) printf function as an argument for a "%s" format string? Well, it gets truncated. Did you mean for that to happen, or didn't you? Who knows? That's the problem.
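The truncation is easy to demonstrate. In this sketch (helper names are made up), the same 7-byte buffer with an embedded NUL goes through "%s" formatting and through a length-counted copy:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Formats `data` with "%s" into `out` and returns how many bytes the
 * format produced. "%s" stops at the first NUL it sees. */
size_t format_with_percent_s(char *out, size_t outsize, const char *data)
{
    int n = snprintf(out, outsize, "%s", data);
    return n < 0 ? 0 : (size_t)n;
}

/* Copies exactly `len` bytes (capped at outsize), NULs included, and
 * returns the number of bytes copied. */
size_t copy_counted(char *out, size_t outsize, const char *data, size_t len)
{
    if (len > outsize)
        len = outsize;
    memcpy(out, data, len);
    return len;
}
```

Given `const char msg[] = "abc\0def";`, the first function reports 3 bytes and the second reports 7 when passed `sizeof msg - 1`. Nothing in the "%s" path signals that the other four bytes were dropped.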
Honestly, I'm not trying to win "internet points" or something. It's just that C, as I'm trying to point out, is a bad language for almost everything that's required of a "user-facing" language these days. Write the thing in C#, Java, O'Caml, Qt[1], or Haskell, or whatever... but please don't think you need to write in a sort of weird approximation of the old PDP.
[1] Yeah, yeah, not a language, but it's at least an ecosystem that seems to be moderately successful.