Forget the typo. This is the slowest, most naive implementation of memset possible.
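For anyone who doesn't want to chase the changeset: the shape of the code under discussion is roughly this (my reconstruction, not the literal diff):

    #include <stddef.h>

    void *memset(void *dst, int c, size_t n)   /* 'c' is accepted... */
    {
        char *q   = dst;
        char *end = (char *) dst + n;

        while (q < end)
            *q++ = 0;   /* ...and then ignored: every byte becomes 0 */

        return dst;
    }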
Is this actual shipping code? I've followed the changeset up to the top and it looks like it's in platform/bootable/bootloader/legacy. Hopefully this isn't actually called anywhere, but the review date says May 13th.
What's the point of optimizing the Java VM when functions like memset, malloc, memcmp, memcpy, memchr, etc. haven't even been optimized for the platform?
EDIT: I understand that 95% of your time will be spent writing in a language like Ruby, Python, JavaScript or a C-based language. But dear god, please add Agner Fog's Optimization Manuals to your reading list.
http://www.agner.org/optimize/
Because in boot code you generally don't care about speed, but size is a very important factor. I've personally had to rewrite standard library functions to be intentionally small and stupid just so a piece of boot code could fit into ROM. It's not fun, but there is a certain satisfaction when you rewrite putchar and think, I just saved 10k of code!
Practicality. Generally your stdlib is part of your cross compiler install. So now you'd need 2 versions of your stdlib--the small version and the speedy version. So now you aren't using -lc (which is usually built-in to the linker) for your boot code but you are for your regular firmware. Now your makefile is more complex. And you are running a modified standard library, so it's a pain to upgrade. Blah blah blah, the list goes on.
Of course you could do all that, but it's so much easier to just override the 5 functions you actually use in your code. Because of the way linkers work and the way standard libraries are designed, if you define your own version of a library function then the linker won't pull in the library version--effectively choosing your local version over the one in the library. That keeps the standard library clean of patches and keeps the small, tight code near the project that actually needs it.
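A sketch of what that looks like in practice (the UART address is made up; the point is that defining the symbols locally means the linker never pulls the big versions out of -lc):

    /* boot_overrides.c -- deliberately small and stupid */
    #include <stddef.h>

    #define UART_TX (*(volatile unsigned char *) 0x10000000)  /* hypothetical MMIO register */

    void *memset(void *dst, int c, size_t n)
    {
        unsigned char *p = dst;
        while (n--)
            *p++ = (unsigned char) c;   /* one byte at a time, on purpose */
        return dst;
    }

    int putchar(int ch)
    {
        UART_TX = (unsigned char) ch;
        return ch;
    }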
Compilers long ago memorized the code for memset (and memcpy and the rest) and substitute optimized algorithms, partly as a way of padding their benchmark stats for marketing purposes. So no worries, this code should result in kick-butt optimized assembler. Except for the obvious bug.
Optimized for what? Speed? While I can picture that happening, it has been my experience that this is not the case. The memory squishing optimizations I'm talking about require dumping the assembly output and I've never seen my stupid memcpy() routine suddenly turn into optimized-for-speed code. I'd be pretty upset if that were to happen...
Given that it is in "platform/bootable/bootloader/legacy", I'm not sure there is much point getting all worked up about performance -- presumably something much more efficient would be used for the vast majority of cases.
I haven't updated my source tree in a long time, but the ARM devices were using bionic/libc/arch-arm/bionic/memset.S. This is pure ARM assembly optimized for the pipeline and cache.
My ARM assembly is a little rusty, but it looks like they implement memset and bzero, the latter just calling memset with R1=0.
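In C terms, what the parent describes amounts to:

    #include <string.h>

    /* the assembly entry point just shuffles arguments (R1 <- 0)
       and falls through into memset */
    void bzero(void *p, size_t n)
    {
        memset(p, 0, n);
    }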
Don't underestimate the compiler optimizations. GCC 4.3.1 on i386 with -O3 generates pretty involved optimized code that actually stores whole aligned 32-bit words (and I strongly suspect that this generated code is faster than rep stosb).
If I understand hoelle's point, it is that you shouldn't need to fill the buffer with a constant value in the first place. Why do people do it? The only time you generally need to prefill something is if you're using some algorithm whose initial state includes a prefilled array. It's hard to find a place where a byte array needs to be memset.
If you need an array zeroed on first use, why didn't you use calloc? The only reason memset has for existing is when you need to reinitialize an existing array to zero (see the snippet below). If things are so tight that you're wiping out an existing array and reusing it, which usually involves being deliberately less abstract than you'd naturally be, then, well, that's a pretty unusual situation.
(While we're on memset, I'd like to point out that the weirdest thing is that while memset initializes value byte by byte, the places where you do need a prefilled array are always places where the array is an array of larger values.)
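To make the calloc-vs-memset distinction concrete (a hypothetical snippet):

    #include <stdlib.h>
    #include <string.h>

    void example(size_t n)
    {
        /* zeroed on first use: calloc, no memset required */
        long *counts = calloc(n, sizeof *counts);
        if (counts == NULL)
            return;

        /* ... use counts, then reset the existing buffer for reuse --
           the one job memset actually has */
        memset(counts, 0, n * sizeof *counts);
        free(counts);
    }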
Sometimes you manage your own memory, so you can't use calloc. Sometimes you're zeroing out stack allocated memory, such as the common practice when doing socket programming in C.
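The socket case looks like this, for instance (the port is made up):

    #include <string.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>

    void init_addr(struct sockaddr_in *addr)
    {
        memset(addr, 0, sizeof *addr);        /* clear padding and unused fields */
        addr->sin_family      = AF_INET;
        addr->sin_port        = htons(8080);  /* hypothetical port */
        addr->sin_addr.s_addr = htonl(INADDR_ANY);
    }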
Setting memory you own to a known value is just good, defensive programming. If I screw up - and in C, you're going to screw up - it's good to see a known value rather than unknown values.
I took it as saying you should remember how much of the buffer you have actually used. If you only read what you write, it doesn't matter whether the rest of the buffer is filled with \0 or not, so you can skip the memset.
With today's CPUs, particularly with branch prediction and caches, loop unrolling is not always better.
In fact I bet the performance would be worse, especially because you have to worry about setting areas of memory that are not a multiple of the loop unroll size. (I guess you could use Duff's device, sketched below.)
I actually tested it while ignoring that, and the unrolled version took 34.551 seconds vs 34.239 for the regular version (but those numbers are meaningless, since the variation in time between runs is greater than the difference in runtime between versions).
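For reference, Duff's device is an unrolled fill whose switch jumps into the middle of the loop to absorb counts that aren't a multiple of the unroll factor:

    #include <stddef.h>

    void fill8(char *p, char c, size_t n)
    {
        if (n == 0)
            return;
        size_t iters = (n + 7) / 8;
        switch (n % 8) {            /* jump into the unrolled body */
        case 0: do { *p++ = c;
        case 7:      *p++ = c;
        case 6:      *p++ = c;
        case 5:      *p++ = c;
        case 4:      *p++ = c;
        case 3:      *p++ = c;
        case 2:      *p++ = c;
        case 1:      *p++ = c;
                } while (--iters > 0);
        }
    }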
I would imagine writing the native register size bytes per iteration would be a win, with a little cleanup for the remaining bytes. For a 32-bit architecture, you'd typically be doing 1/4 the iterations.
Far more involved than that. There are alignment issues. It's several machine cycles faster to do an aligned store of a large scalar, e.g. 32-bit.
So a responsible implementation would do a head check for alignment, store 1, 2 or 3 bytes efficiently, then do aligned stores of 32-bit values (or 64-bit or whatever native size is appropriate), then a tail check for the small change. Or better yet, switch (p & 3) and handle each case with unrolled code.
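Roughly that, in C (a sketch assuming a 32-bit native word, and glossing over the strict-aliasing fine print):

    #include <stddef.h>
    #include <stdint.h>

    void *memset32(void *dst, int c, size_t n)
    {
        unsigned char *p = dst;
        uint32_t word = (uint8_t) c * 0x01010101u;   /* replicate the fill byte */

        /* head: 1-3 byte stores until p is 4-byte aligned */
        while (n && ((uintptr_t) p & 3)) { *p++ = (unsigned char) c; n--; }

        /* body: aligned 32-bit stores */
        for (; n >= 4; n -= 4, p += 4)
            *(uint32_t *) p = word;

        /* tail: the small change */
        while (n--)
            *p++ = (unsigned char) c;

        return dst;
    }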
Many modern platforms contain logic that can be (ab)used to do memset() directly in hardware without processor involvement (i.e. in parallel with other code). But this tends to require so much setup and I/O overhead that it simply isn't worthwhile to do.
Integrating memset/bzero logic into the memory array itself would increase its price drastically (but in some special cases it is done, generally when the memory array already contains some other expensive non-memory logic).
Aren't there processor-specific assembler versions of memset for common chips? There are usually assembler versions (for example using REP MOVS on Intel, which is light years faster) or even unrolled C versions (typically setting 8 or 16 bytes per iteration) that are used. I suspect this might be "fallback code" that doesn't actually get executed on any chips that anybody really uses.
For this case, the required x86 opcode would be "rep stosX" (just write), not "rep movsX" (copy). Anyway, on x86 processors "rep movsX" was faster from the 8086 to the 80386, but since the 80486 it has been faster to unroll in assembly to reduce the jump penalty (as "rep movsX" doesn't unroll, its conditional jump becomes a penalty). On CPUs with hardware prefetch support -e.g. SSE2-, you could still speed it up a bit more by fetching future cache lines before they're needed, avoiding pipeline stalls.
The shown example, which just writes to RAM, is probably fast enough (read cases are harder to optimize), although a bit of unrolling could reduce the jump penalty impact (especially on non-superscalar or in-order superscalar CPUs -the typical ARM included in handheld devices-).
While we're jumping down the architecture pedant rabbit hole: a simple loop like that will be trivially predicted, so the branch will be basically free. In addition, hardware prefetchers do a much better job at predicting linear memory access than manual prefetch instructions. On Core 2, iirc, if you have 2 or 3 L2 misses at fixed offsets either direction from each other, the hardware will automatically begin prefetching memory so it's there when you need it. The problem with manual prefetch instructions is their high latency. They're best for hinting to the processor that you're about to make an unpredictable load.
The thing that shocks me about this bug is that it should have been caught when the compiler issued a warning about an unused function parameter. Either the warning was not enabled, or the programmer just ignored it; either of those options indicates a pretty severe lack of professionalism.
"Pretty severe" lack of professionalism? It's not unprofessional not to turn the warnings all the way up and sift through each and every one. Some of us work code inherited from other people with different coding guidelines, and some of the warnings will generate loads of messages for code that is not only technically correct but stylistically correct as well. The unused parameter warning is one of the biggest offenders in this area. And if you have a hundred warnings for unused parameters, which ones do you pay attention to?
Another potential issue is that the code in the patch might almost never be called in the first place. Most libc implementations (three that I know of, where I've examined the "memset" source) have a "memset" like that as a fallback only for platforms without an assembly version.
I'd say the problem is that there are several different ways this error could have been detected sooner: warnings, test, static analysis tools, and review, but none of them worked. But again, it's also possible that the code is never called because assembly versions are always available.
| The unused parameter warning is one of the biggest offenders in this area. And if you have a hundred warnings for unused parameters, which ones do you pay attention to?
Go look at Google code search, which includes vast quantities of code written by professional teams and (often later) open sourced. Very little of it casts expressions to (void). When I started at Arbor, I got rid of all the (void) casts. What an eyesore.
Very little C code is const-correct either. Most of the C devs I know make fun of const-correctness.
It's only an eyesore if you don't acknowledge the benefits that strict warnings can bring to the table.
| Most of the C devs I know make fun of const-correctness.
I for one don't mind putting in a tiny bit of upfront effort so that my compiler can double-check my work. Do the C devs you know also make fun of assert?
If assertions disappear in certain optimized builds, then what do you do, or recommend doing, to deal with warnings about unused parameters or values, if you don't believe that casting to void has value?
I don't understand how the question you ask even follows from the premise you put forward. The fact that memset is probably always dead code here is why it's not really a bug. The fact that warnings about unused return values are useless 100-1000 times for every time they're potentially useful is why I don't litter my code with pointless casts.
If what you say is true, then you have to either put up with the warnings, compile at a level that doesn't have the warnings, or figure out a hack around them - I don't like any of those options.
I'm not as dogmatic as some others are in the thread. I just think it's a good idea to be warned when a variable is unused, enough that I'll mark my very few that might be #ifdef'ed out. If you're using a compiler that doesn't let you mark variables as such... well that sucks. You do what you can.
So what if the code doesn't get called? That doesn't prevent it from being built. Considering that memset() is part of libc, which is a shared library, presumably it's going to get compiled regardless of whether it's called.
As a point of pride, none of them were created by me, but it is what it is. Some of them come from since-deprecated functions and nobody wants to go back and mess with old code; some of them come from sloppiness; some of them come from Java's collection handling.
And yet this codebase gets a lot of work accomplished anyway.
The "unused parameter" error is one of the most useless of C compiler errors. There are dozens of perfectly valid reasons why a parameter would go unused, so the occasional error just gets lost in the noise.
It's more helpful in C++, which lets you omit a parameter's name to indicate that it's intentionally unused. But then you're using C++; not worth the tradeoff.
It's not hard to type a statement like "(void)v;" to explicitly say, "v is intentionally unused." I fail to see the drawback of this approach. Of course you should only do this when (1) you can't remove the parameter from the function's prototype and (2) there is a good reason for it to be unused. But once you develop that habit, the compiler can help you catch bugs that are otherwise easy to miss. Sounds like a worthwhile tradeoff to me.
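For example (the names are hypothetical):

    /* the API fixes the callback signature; this handler happens
       not to need the user-data pointer */
    static int on_tick(int elapsed_ms, void *user_data)
    {
        (void) user_data;   /* intentionally unused; silences the warning */
        return elapsed_ms > 1000;
    }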
Then you get a "statement with no effect" warning, which is usually more serious. I'll take a bogus warning in real code over a bogus warning in contrived code any day.
I've never seen a compiler issue a warning about "(void)foo;". Even compilers which issue 'statement with no effect' warnings normally recognize deliberate casts to void as a way of saying "shut up, I know what I'm doing".
1. An API-defined callback signature where you need to use (for example) the first and third parameters, and not the second.
2. A public API function that has become deprecated and the previous behavior is emulated in a way that doesn't require all the parameters.
3. Stub/wrapper functions.
4. Init/cleanup functions for modules written to a plugin API, when all parameters passed are not needed in all cases.
5. Language binding closure/thunk functions.
Most of the uses, I think, boil down to backwards compat and fixed callback/plugin interfaces. In those cases I think an __attribute__((unused)) or void cast aren't a huge burden. I usually turn on -Wunused-parameter just in case.
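The GCC spelling, for reference (a hypothetical plugin hook):

    struct app;                            /* opaque type from the plugin API */
    void register_hooks(struct app *app);  /* hypothetical helper */

    void plugin_init(struct app *app, int flags __attribute__((unused)))
    {
        register_hooks(app);   /* 'flags' exists only to match the interface */
    }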
Huh? Any halfway decent software shop will have a policy of compiling with all warnings enabled, and fixing all sources of warnings (with only very rare exceptions). It is extremely unprofessional to disable or ignore warnings such as the "unused function parameter" warning. If you disagree, I respectfully submit that I would never let you near a C project that I was working on.
| Any halfway decent software shop will have a policy of compiling with all warnings enabled
FWIW, your definition of "halfway decent" would exclude most software shops in the world.
| It is extremely unprofessional to disable or ignore warnings such as the "unused function parameter" warning
Unused function parameter warnings are often noise (e.g., due to #ifdefs, unimplemented APIs, backward compatible APIs, etc). I agree that code should compile w/o warnings, but the benefits of tracking down and squelching unused function argument warnings are pretty marginal in my experience.
All I can say is that in 15 years (fuck.) I've never worked on a team that would have caught a bug like this in dead code. -Wall has been the gold standard on the teams I've worked on.
I would probably agree that most software shops in the world are not halfway decent.
The benefits of tracking down and squelching such warnings would have very likely caused this bug to never exist. The point is that you make "compiles without warnings" a policy, and you build a habit of maintaining that policy -- after a while it begins to take almost no effort at all.
The unused function parameter warning is usually one of the most annoying and least useful. I routinely compile with "-Wall -Wno-unused" because I don't want to litter my code with casts to void.
You have to know that -Wextra exists, which I for one didn't until just now when I ran a sample program. All my projects just compiled with -Wall, which doesn't catch that error.
Also, unless every programmer on the project is being scrupulous about finding and correcting such errors, it quickly becomes line noise, and a lot of it completely unimportant stuff.
-Wextra is total insanity. I just compiled some C code I've been working on with that flag, and got the following: "warning: signed and unsigned type in conditional expression." I was testing if a variable I declared as an integer was less than 0. To my knowledge, every number literal in C is assumed to be of a signed type unless followed by a 'u'. But now I know better. 0 is the exception.
Yeah, that's why every programmer on the project has to be scrupulous about finding and correcting such errors. That's professionalism 101 -- if your team is not keeping the build warning free, things like this slip through.
Things like what, a blatantly obvious bug that would crash virtually every C program running on the system were it not for the fact that it was dead code?
This is practically a documentation bug.
[edit]
I kind of take it back, after mahmud's bzero() joke... out of oniguruma, Python, hexfiend, Apache, nginx, openssl, redis, regexkit, saxon, memcached, subversion, and valgrind, this hypothetical bug would have broken Python, redis, subversion, several apache modules, and memcached. The contents of my codebase/3p directory, FWIW.
It was. That's why this bug hasn't been caught before: the buggy code was never actually compiled as part of a production build. Real builds always use the version optimized for the actual hardware platform.
memset sets a block of memory to a given value. It is often used to initialize an uninitialized block of memory to 0. So often, in fact, that this bug where it always initialized the memory to 0 survived so long.
| So often, in fact, that this bug where it always initialized the memory to 0 survived so long.
Given that the bug went undetected for so long, I think you could almost reasonably claim that calling memset with a nonzero parameter is a "corner case" ;)
It is a "corner case," or close to it, but that's not why it wasn't caught here. I don't think this code is ever called, as explained above. It's default code in case there isn't a platform specific version, but there pretty much always is a platform specific version.
memset's job is to fill a block of memory with a given byte. The most common fill byte is 0, to initialize or otherwise clear a buffer. It looks like it is so common, in fact, that a programmer accidentally made Android's memset _always_ fill with 0, no matter what byte was passed in, and presumably no one noticed for a while. (I can't figure out when this patch was introduced.)
Memset (according to the C++ reference) is supposed to: "Sets the first num bytes of the block of memory pointed by ptr to the specified value (interpreted as an unsigned char)."
So, give it a pointer to some memory, give it a value and the number of bytes you want to set, and it'll go wild. In this case, memset was just blindly dropping zeroes into the memory block.
The function was always setting the memory to 0 instead of what was passed in. I think most uses of memset are to initialize memory to 0, but any other use would have had unexpected results.
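A two-line test would have tripped over it (hypothetical, of course):

    #include <assert.h>
    #include <string.h>

    int main(void)
    {
        char buf[8];
        memset(buf, 0x5A, sizeof buf);
        assert(buf[0] == 0x5A);   /* a correct memset passes; the buggy
                                     version leaves buf all zero */
        return 0;
    }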
No, this is The Daily WTF. "Oh look, someone wrote some really stupid code!" Who cares? How does it enrich us to know that someone mis-implemented memset() in a bootloader that possibly never even calls it, or if it does, probably passes zero as the argument anyway? (Or is dead code because on any arch anyone uses, memset() is implemented in asm.)