Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Consider the idiomatic way of interating backward through a array:

  for(i=n; i-- > 0 ;)
    { /* operate on a[i] */ }
converting i--; to a statement at the start of block makes it less clear that it's part of the iteration idiom rather than a ad hoc adjustment that's specific to this particular logic. There are other examples, but they're either more involved or statementification is less obviously wrong.


Hmm, I think `for (i = n - 1; i >= 0; --i)` is way clearer and maybe more common?

edit: Ah unsigned underflow. :O


Yeah, so then you write

    for (size_t i = n-1; i < n; --i) { /* operate on a[i] */ }
It works fine (unsigned overflow is well defined) but it's even less clear.


It seems sensible to always just use signed values for indices. Indices are difference types, which should include negative values so that you can subtract two indices and get a sane delta. The range of signed values seems 'big enough.'


> Indices are difference types

Umm, no? Indices are ordinals[0], forming the canonical/nominal well-ordering of a collection such as a array.

> an ordinal number, or ordinal, is one generalization of the concept of a natural number that is used to describe a way to arrange a (possibly infinite) collection of objects in order, one after another. [...] Ordinal numbers are thus the "labels" needed to arrange collections of objects in order.

0: https://en.wikipedia.org/wiki/Ordinal_number


In C an index is a difference that you add to a pointer to get a pointer. `a[i]` is `*(a + i)`. Given two indices `i` and `j`, you want `i - j` to be such that `a[j + (i - j)]` is `a[i]`, and it then makes sense to me that `i - j` is signed. The expression works out whether they are signed or unsigned, but just in terms of their interpretation on the part of a user (eg. "oh this is 2 elements before bc. it says -2") or so that comparisons like `i < j` are equivalent to `i - j < 0` and so on. That's why it's always made sense to me to use `ptrdiff_t` (or just `int`) for an index, vs. using `size_t`.


ptrdiff_t exists for subtraction between pointers that produce negative values. But how many times have you ever needed to subtract p and q where p represents an array element at a higher index than q? For that matter, how many times have you ever needed to add a negative integer to a pointer?

In C an object can be larger than PTRDIFF_MAX, a real possibility in modern 32-bit environments. (Some libc's have been modified to fail malloc invocations that large, but mmap can suffice.) Because pointer subtraction is represented as ptrdiff_t, the expression &a[n] - a could produce undefined behavior where n is > PTRDIFF_MAX. But a + n is well defined behavior for all positive n (signed or unsigned) as long as the size of a is >= n.

There's an asymmetry between pointer-pointer arithmetic and pointer-integer arithmetic; they behave differently and have different semantics. Pointers are a powerful concept, but like most powerful concepts the abstraction can leak and produce aberrations. I realize opinions vary on whether to prefer signed vs unsigned indices and object sizes (IME, the camps tend to split into C vs C++ programers), but the choice shouldn't be predicated on the semantics of C pointers because those semantics alone don't favor one over the other.


Negative offset is used often to access fields in parent struct having pointer to a field only. For example, to implement garbage collection or string type.


That should be:

  Parent* p = container_of(field,Parent,pa_somefield);
  access(p->pa_otherfield);
You'd usually define container_of using subtraction (not negative offset per se):

  #define container_of(FIELD,TYPE,MEMB) ({ \
   const typeof( ((TYPE*)0)->MEMB )* _mptr = (FIELD); \
   (TYPE*)( (char*)_mptr - __builtin_offsetof(TYPE,MEMB) ); \
   })
but you shouldn't actually be using that directly, because thats what the macro is for.


But p - 2 is not the same as p + -2, and it's not clear in your example whether the former suffices or the latter is required. I can definitely imagine examples where the latter is required--certainly C clearly accommodates this usage--but IME it's nonetheless a rare scenario and not something that could, alone, justify always using signed offsets. Pointers are intrinsically neither signed nor unsigned; it's how you use them that matters.


The reason we are in this subthread is my message about being bitten by unsigned underflow when iterating backwards using an unsigned `i`.


The C language "de facto" uses size_t for indexing and ptrdiff_t for differences, or the rare case where you have a negative index.


size_t is unsigned? Since when?


It always has been. C89, 4.1.5[1]:

> The type are [...] size_t which is the unsigned integral type of the result of the sizeof operator

(Emphasis mine.)

1. https://port70.net/~nsz/c/c89/c89-draft.html#4.1.5


Couldn't find an online version of the C standard with links to parts of it, but here's one for C++: http://eel.is/c++draft/support.types#layout-3

> The type size_­t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object ([expr.sizeof]).


Yes. ssize_t is signed.


ssize_t should never be used.

It's not guaranteed to have a full negative range, only to be able to represent -1.

Use ptrdiff_t as a signed size type.


Out of curiosity, do you know of any implementations where ssize_t has that kind of range limitation?


Nope. But I do know of at least one implementation where it's not present at all—msvcrt. ssize_t isn't specified in the c standard, it's part of posix. ptrdiff_t is standard.


That's the idiomatic way? Cool. The more straightforward-looking way,

    for(i = n-1; i >= 0; i--)
        { /* operate on a[i] */ }
breaks if i is unsigned, like a size_t.


Yep. That why it's a idiom, rather than a obvious-way-of-doing-it-that-anyone-competent-would-use.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: