What the heck is an xrange?

sltkr · on June 19, 2012

Nitpicking, but this statement doesn't work with large numbers (which xrange() is supposed to handle correctly):

    self._len = int(ceil(float(stop - start) / step))

Python supports arbitrary-length integers; you can't just cast those to (fixed-length) floating point numbers without losing precision. It's better to use integer division here, for example:

    self._len = (stop - start)//step + bool((stop - start)%step)

(A variant like (stop - start + step - 1)//step works only for positive numbers; I guess it could work if you put it in the earlier if-clause and put a corresponding assignment with +1 at the end in the other branch.)
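A minimal sketch comparing the two length computations. The function names `len_float` and `len_int` are made up for illustration and are not taken from the post's code; the sketch also does not clamp empty ranges, where the result would be negative.

```python
from math import ceil

def len_float(start, stop, step):
    # Float-based version: the difference is squeezed into a double,
    # so very large values get rounded before the division.
    return int(ceil(float(stop - start) / step))

def len_int(start, stop, step):
    # Integer-only ceiling division: floor quotient, plus one if
    # there is a nonzero remainder. Works for negative steps too.
    diff = stop - start
    return diff // step + bool(diff % step)

big = 10**18
print(len_int(1, big, 1))    # exact: 10**18 - 1
print(len_float(1, big, 1))  # off by one because of float rounding
```

For small arguments both functions agree; they only diverge once the difference no longer fits exactly in a double.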

dcrosta · on June 19, 2012

Great catch. If you want to submit a pull request on github you can get the credit -- github.com/dcrosta/xrange

graue · on June 19, 2012

A related nitpick: since Python's integers are arbitrary size, technically operations like - and / are not constant time, so your claim of an implementation "with constant-time and constant-space operations" is not true.

However, I understand if that complaint is just too nitpicky for you to want to mention it. Great post btw.

sltkr · on June 19, 2012

Thanks for the consideration, but I don't really use github, so you go ahead and take the credit yourself.

dcrosta · on June 19, 2012

OK, thanks. Updated in github and on the blog.

dcrosta · on June 19, 2012

Huh -- CPython 2.x doesn't let you create an xrange with values past 2^63-1. CPython 3.x does.

sltkr · on June 19, 2012

Even 63-bit integers aren't all representable as IEEE double-precision floating point values (which Python uses), since those have a 53-bit mantissa.

For example, int(float(10¹⁸ - 1)) != 10¹⁸ - 1, but xrange(1, 10¹⁸) is perfectly valid (even in Python 2).
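A short check of where the precision runs out, assuming a platform where Python floats are IEEE 754 doubles (true for CPython on common hardware). The variable names are illustrative only.

```python
import sys

# Number of significand bits in a Python float on this platform.
mantissa_bits = sys.float_info.mant_dig

# 2**53 itself survives the round trip through float...
exact = int(float(2**53))

# ...but 2**53 + 1 does not: it is rounded to a neighbouring value.
rounded = int(float(2**53 + 1))

# The 10**18 - 1 case from the comment rounds up to 10**18.
big = int(float(10**18 - 1))

print(mantissa_bits, exact == 2**53, rounded == 2**53 + 1, big == 10**18)
```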

edit: how do I type two consecutive asterisks on Hacker News? Backslash doesn't seem to work as an escape character.

sp332 · on June 19, 2012

If you put two spaces at the beginning of a line, you'll get a monospaced "literal" mode.

  For example, int(float(10**18 - 1)) != 10**18 - 1,
  but xrange(1, 10**18) is perfectly valid (even in Python 2).

heretohelp · on June 19, 2012

\\\\

Edit: failure. That's two stymied people :(

ori_b · on June 19, 2012

You can also cast to long, which doesn't lose precision.

eliben · on June 19, 2012

Nice.

As a side-note, here's a related article on implementing a Python generator "for real" using the C API: http://eli.thegreenplace.net/2012/04/05/implementing-a-gener...

dbecker · on June 19, 2012

I didn't expect to learn much when I started reading this, but I was wrong. Excellent post. Thanks.

leetrout · on June 19, 2012

I had a _very_ similar question during my Google interview. In the course of the day I was tasked with implementing a generator (although answering with `(x for x in foo)` got a smile, I did have to build a class), and later in the day I was asked how a sequence manager can maintain constant time.

This is an excellent post and every Python hacker should read it. Kudos to the author.

mercuryrising · on June 19, 2012

This is cool. I know this isn't how they actually do it, but seeing it in a language I can understand (Python) and not in C is really pleasing.

You can do tons of examples, but until you figure out how the machine works inside, you'll be completely lost (you could get it with examples, but you won't necessarily know WHY you got it, so you'll be useless in helping other people learn).

gus_massa · on June 19, 2012

Does anyone know how this is implemented in Pypy?

brian_cloutier · on June 19, 2012

https://bitbucket.org/pypy/pypy/src/default/pypy/annotation/...

kingkilr · on June 19, 2012

https://bitbucket.org/pypy/pypy/src/default/pypy/module/__bu...

lloeki · on June 19, 2012

Not worth a pull request, but personally I'd replace:

        if len(args) == 1:
            start, stop, step = 0, args[0], 1
        elif len(args) == 2:
            start, stop, step = args[0], args[1], 1
        elif len(args) == 3:
            start, stop, step = args
        else:
            raise TypeError('xrange() requires 1-3 int arguments')

with:

        map = [
                lambda args: (0, args[0], 1),
                lambda args: (args[0], args[1], 1),
                lambda args: args,
              ]
        try:
            start, stop, step = map[len(args)](args)
        except IndexError:
            raise TypeError('xrange() requires 1-3 int arguments')

It's more DRY, and it conveys the intent better.

I would not make such a change to the if step block since its pattern feels noticeably different: "open" checks fit well in an if/else, whereas a bunch of equalities fits a dispatch map better (plus you can actually modify the map at runtime).

zokier · on June 19, 2012

That imho seems like overcomplicating a relatively straightforward piece of code without noticeable improvement. In the original code the intent is clear from the first line ("check the number of arguments"), but in your version I have to parse 7 lines of code before I find that out. Also, in your code I'm required to keep larger, more complicated state (the map array) in my head when reading it. Your version also looks like it would perform worse than the original. Imho code that looks like it performs suboptimally is code that looks ugly, even if the performance difference in reality would be negligible.

sateesh · on June 19, 2012

To me your version is less clear than the explicit if calls:

* The name 'map' is a poor choice for a variable (it shadows the Python builtin function map)

* Your version has an off-by-one error: it doesn't give correct results when called with a single-element list or with a three-element list:

  >>>mymap = [
                lambda args: (0, args[0], 1),
                lambda args: (args[0], args[1], 1),
                lambda args: args,
              ]

  >>>args = [5]

  >>>mymap[len(args)](args)
    IndexError 
    Traceback (most recent call last)
    ....
   # This shouldn't be the case, a list with a single
   # element is a valid input

  >>>args = [1,6,1]

  >>>mymap[len(args)](args)
    IndexError   Traceback (most recent call last)
    ...

   # This shouldn't be the case, a list with three
   # elements is a valid input
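One way to avoid the offset, sketched under the assumption that the dispatch idea is kept at all: key a dict by the argument count instead of indexing a zero-based list. The helper name `parse_args` and the variable `dispatch` are hypothetical and not part of the post's code.

```python
def parse_args(*args):
    # Hypothetical helper: map the number of arguments directly to a
    # handler, so len(args) == 1 selects the one-argument form.
    dispatch = {
        1: lambda a: (0, a[0], 1),
        2: lambda a: (a[0], a[1], 1),
        3: lambda a: tuple(a),
    }
    try:
        # Only the lookup is guarded, so an error raised inside a
        # handler is not silently turned into a TypeError.
        handler = dispatch[len(args)]
    except KeyError:
        raise TypeError('xrange() requires 1-3 int arguments')
    return handler(args)

print(parse_args(5))        # (0, 5, 1)
print(parse_args(1, 6))     # (1, 6, 1)
print(parse_args(1, 6, 2))  # (1, 6, 2)
```

Whether this reads better than the plain if/elif chain is a matter of taste; it only shows that the indexing bug is not inherent to the dispatch approach.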

saintfiends · on June 19, 2012

If only lambdas allowed statements :(.

  >>>mymap = [
               lambda args: raise IndexError,
               lambda args: (0, args[0], 1),
               lambda args: (args[0], args[1], 1),
               lambda args: args,
             ]

That aside, this approach would be much slower too.
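Since `raise` is a statement and cannot appear in a lambda body, the list above is a syntax error as written. A possible workaround, shown only as a sketch, is to put a small named function in slot 0. The names `_fail`, `handlers`, and `unpack_args` are hypothetical.

```python
def _fail(args):
    # A named function can raise where a lambda cannot.
    raise TypeError('xrange() requires 1-3 int arguments')

# Index 0 is the "no arguments" case, so the list positions now line
# up with len(args).
handlers = [
    _fail,
    lambda args: (0, args[0], 1),
    lambda args: (args[0], args[1], 1),
    lambda args: tuple(args),
]

def unpack_args(*args):
    # Too many arguments fall outside the list; treat them like zero.
    if len(args) >= len(handlers):
        return _fail(args)
    return handlers[len(args)](args)

print(unpack_args(5))        # (0, 5, 1)
print(unpack_args(1, 6, 2))  # (1, 6, 2)
```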

d0mine · on June 19, 2012

Answers to http://stackoverflow.com/questions/1482480/xrange2100-overfl... contain several implementations of xrange() in pure Python

dcrosta · on June 19, 2012

Thanks, hadn't seen this.

ufo · on June 19, 2012

I cried a little bit on the inside when he said that the iterator should not be implemented using generators, etc. Writing iterators by hand forces one to turn the iteration code inside out, and it can be a painful experience if the logic is anything nontrivial.

andreasvc · on June 19, 2012

But there's a good motivation here: (x)range objects are supposed to be index-able, while a generator (expression) cannot go back once an element has been consumed.
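To illustrate the difference, here is a deliberately simplified stand-in for an indexable range next to a generator expression. `SimpleRange` is a hypothetical class, not the post's implementation: it assumes a positive step and supports neither negative indices nor slicing.

```python
class SimpleRange(object):
    # Hypothetical, simplified range: positive step only.
    def __init__(self, start, stop, step=1):
        self.start, self.step = start, step
        diff = stop - start
        # Integer ceiling division, clamped to zero for empty ranges.
        self._len = max(0, diff // step + bool(diff % step))

    def __len__(self):
        return self._len

    def __getitem__(self, index):
        # Each element is computed from its index; nothing is stored.
        if not 0 <= index < self._len:
            raise IndexError('index out of range')
        return self.start + index * self.step

r = SimpleRange(0, 10, 3)
print(r[2], r[0], len(r))  # any element, in any order, as often as needed

gen = (x for x in range(0, 10, 3))
first = next(gen)          # consumes 0; a generator cannot be rewound
rest = list(gen)           # only the remaining elements are left
print(first, rest)
```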

exDM69 · on June 19, 2012

I feel that this kind of combination of lazy evaluation for sequences and eager evaluation for imperative code hits a particular sweet spot. Python and Clojure have very nice lazy sequences.