Consider the idiomatic way of iterating backward through an array: for(i=n; i-- ...

nikki93 · on Sept 2, 2020

Hmm, I think `for (i = n - 1; i >= 0; --i)` is way clearer and maybe more common?

edit: Ah unsigned underflow. :O

ramshorns · on Sept 2, 2020

Yeah, so then you write

    for (size_t i = n-1; i < n; --i) { /* operate on a[i] */ }

It works fine (unsigned overflow is well defined) but it's even less clear.
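To make the wraparound behavior concrete, here is a minimal sketch of that loop next to a commonly seen countdown spelling. The function names `reverse_copy_wrap` and `reverse_copy_countdown` are made up for illustration, and the countdown form is not necessarily the exact loop the thread title had in mind.

```c
#include <stddef.h>

/* Copies a[0..n-1] into out in reverse order using the wraparound form.
   When i is 0, --i wraps to SIZE_MAX, which is not < n, so the loop ends.
   For n == 0, n - 1 is already SIZE_MAX and the body never runs. */
void reverse_copy_wrap(const int *a, size_t n, int *out)
{
    size_t k = 0;
    for (size_t i = n - 1; i < n; --i)
        out[k++] = a[i];
}

/* Same result with a countdown form: compare i against 0 first, then
   decrement, so the body sees n-1, n-2, ..., 0. */
void reverse_copy_countdown(const int *a, size_t n, int *out)
{
    size_t k = 0;
    for (size_t i = n; i-- > 0; )
        out[k++] = a[i];
}
```

Both versions should visit the same indices in the same order; the difference is only in which one a reader finds easier to verify.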

nikki93 · on Sept 2, 2020

It seems sensible to always just use signed values for indices. Indices are difference types, which should include negative values so that you can subtract two indices and get a sane delta. The range of signed values seems 'big enough.'

a1369209993 · on Sept 2, 2020

> Indices are difference types

Umm, no? Indices are ordinals[0], forming the canonical/nominal well-ordering of a collection such as an array.

> an ordinal number, or ordinal, is one generalization of the concept of a natural number that is used to describe a way to arrange a (possibly infinite) collection of objects in order, one after another. [...] Ordinal numbers are thus the "labels" needed to arrange collections of objects in order.

0: https://en.wikipedia.org/wiki/Ordinal_number

nikki93 · on Sept 2, 2020

In C an index is a difference that you add to a pointer to get a pointer. `a[i]` is `*(a + i)`. Given two indices `i` and `j`, you want `i - j` to be such that `a[j + (i - j)]` is `a[i]`, and it then makes sense to me that `i - j` is signed. The expression works out whether they are signed or unsigned, but just in terms of their interpretation on the part of a user (eg. "oh this is 2 elements before bc. it says -2") or so that comparisons like `i < j` are equivalent to `i - j < 0` and so on. That's why it's always made sense to me to use `ptrdiff_t` (or just `int`) for an index, vs. using `size_t`.
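A small sketch of the point about `i - j`, assuming hypothetical helper names (`delta_unsigned`, `delta_signed`, `via_delta_unsigned`) that are not from the thread. The indexing expression works out either way, but only the signed delta reads as "2 elements before".

```c
#include <stddef.h>

/* Difference of two indices held as size_t. The subtraction is done
   modulo SIZE_MAX + 1, so i < j yields a very large positive value. */
size_t delta_unsigned(size_t i, size_t j)
{
    return i - j;
}

/* Difference of two indices held as ptrdiff_t: a plain signed delta. */
ptrdiff_t delta_signed(ptrdiff_t i, ptrdiff_t j)
{
    return i - j;
}

/* a[j + (i - j)] still reaches a[i] with unsigned arithmetic, because
   the wraparound in (i - j) cancels when j is added back. */
int via_delta_unsigned(const int *a, size_t i, size_t j)
{
    return a[j + (i - j)];
}
```

With `i = 3` and `j = 5`, the signed version gives `-2`, while the unsigned version gives the same bit pattern interpreted as a huge positive number.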

wahern · on Sept 3, 2020

ptrdiff_t exists for subtraction between pointers that produce negative values. But how many times have you ever needed to subtract p and q where p represents an array element at a higher index than q? For that matter, how many times have you ever needed to add a negative integer to a pointer?

In C an object can be larger than PTRDIFF_MAX, a real possibility in modern 32-bit environments. (Some libc's have been modified to fail malloc invocations that large, but mmap can suffice.) Because pointer subtraction is represented as ptrdiff_t, the expression &a[n] - a could produce undefined behavior where n is > PTRDIFF_MAX. But a + n is well defined behavior for all positive n (signed or unsigned) as long as the size of a is >= n.

There's an asymmetry between pointer-pointer arithmetic and pointer-integer arithmetic; they behave differently and have different semantics. Pointers are a powerful concept, but like most powerful concepts the abstraction can leak and produce aberrations. I realize opinions vary on whether to prefer signed vs unsigned indices and object sizes (IME, the camps tend to split into C vs C++ programmers), but the choice shouldn't be predicated on the semantics of C pointers because those semantics alone don't favor one over the other.

drran · on Sept 3, 2020

Negative offsets are often used to access fields of a parent struct when you only have a pointer to one of its fields. For example, to implement garbage collection or a string type.

a1369209993 · on Sept 3, 2020

That should be:

  Parent* p = container_of(field,Parent,pa_somefield);
  access(p->pa_otherfield);

You'd usually define container_of using subtraction (not negative offset per se):

  #define container_of(FIELD,TYPE,MEMB) ({ \
   const typeof( ((TYPE*)0)->MEMB )* _mptr = (FIELD); \
   (TYPE*)( (char*)_mptr - __builtin_offsetof(TYPE,MEMB) ); \
   })

but you shouldn't actually be using that directly, because that's what the macro is for.
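For anyone who wants to try it, here is a minimal sketch using only standard C (`offsetof` from `<stddef.h>`) instead of the GNU extensions above, so it drops the type check on the field pointer. `struct Parent`, its layout, and `other_from_field` are assumptions made up for the example.

```c
#include <stddef.h>

/* Hypothetical parent type; the member names follow the snippet above. */
struct Parent {
    int    pa_otherfield;
    double pa_somefield;
};

/* Portable variant: step back from the member's address by the member's
   offset to get the address of the enclosing struct. Unlike the GNU
   version, this does not verify that FIELD has the member's type. */
#define CONTAINER_OF(FIELD, TYPE, MEMB) \
    ((TYPE *)((char *)(FIELD) - offsetof(TYPE, MEMB)))

/* Given only a pointer to pa_somefield, read a sibling field. */
int other_from_field(double *field)
{
    struct Parent *p = CONTAINER_OF(field, struct Parent, pa_somefield);
    return p->pa_otherfield;
}
```

Note that the arithmetic is a subtraction of an unsigned `size_t` offset from a `char *`, not the addition of a negative integer.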

wahern · on Sept 3, 2020

But p - 2 is not the same as p + -2, and it's not clear in your example whether the former suffices or the latter is required. I can definitely imagine examples where the latter is required--certainly C clearly accommodates this usage--but IME it's nonetheless a rare scenario and not something that could, alone, justify always using signed offsets. Pointers are intrinsically neither signed nor unsigned; it's how you use them that matters.

nikki93 · on Sept 4, 2020

The reason we are in this subthread is my message about being bitten by unsigned underflow when iterating backwards using an unsigned `i`.

saagarjha · on Sept 2, 2020

The C language "de facto" uses size_t for indexing and ptrdiff_t for differences, or the rare case where you have a negative index.

andi999 · on Sept 2, 2020

size_t is unsigned? Since when?

moonchild · on Sept 3, 2020

It always has been. C89, 4.1.5[1]:

> The types are [...] size_t which is the *unsigned* integral type of the result of the sizeof operator

(Emphasis mine.)

1. https://port70.net/~nsz/c/c89/c89-draft.html#4.1.5

nikki93 · on Sept 2, 2020

Couldn't find an online version of the C standard with links to parts of it, but here's one for C++: http://eel.is/c++draft/support.types#layout-3

> The type size_t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object ([expr.sizeof]).

swinglock · on Sept 3, 2020

Yes. ssize_t is signed.

moonchild · on Sept 3, 2020

ssize_t should never be used.

It's not guaranteed to have a full negative range, only to be able to represent -1.

Use ptrdiff_t as a signed size type.

AbacusAvenger · on Sept 3, 2020

Out of curiosity, do you know of any implementations where ssize_t has that kind of range limitation?

moonchild · on Sept 3, 2020

Nope. But I do know of at least one implementation where it's not present at all—msvcrt. ssize_t isn't specified in the c standard, it's part of posix. ptrdiff_t is standard.

ramshorns · on Sept 2, 2020

That's the idiomatic way? Cool. The more straightforward-looking way,

    for(i = n-1; i >= 0; i--)
        { /* operate on a[i] */ }

breaks if i is unsigned, like a size_t.
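A quick sketch of why it breaks, and of the signed variant that does terminate. The function names are made up for illustration, and using `ptrdiff_t` for the signed index is just one option discussed in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Decrementing an unsigned zero wraps to the maximum value, so a
   condition like `i >= 0` can never become false for a size_t.
   Returns 1 if the wrapped value equals SIZE_MAX. */
int wraps_to_max(void)
{
    size_t i = 0;
    --i;
    return i == SIZE_MAX;
}

/* With a signed index the same loop shape terminates, because i can
   actually reach -1 and fail the `i >= 0` test. */
long sum_backward_signed(const int *a, ptrdiff_t n)
{
    long sum = 0;
    for (ptrdiff_t i = n - 1; i >= 0; i--)
        sum += a[i];
    return sum;
}
```

Many compilers can warn about the unsigned version, since the comparison is always true, but that depends on the warning flags in use.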

a1369209993 · on Sept 2, 2020

Yep. That's why it's an idiom, rather than an obvious-way-of-doing-it-that-anyone-competent-would-use.