Hacker News | delifue's comments

Its main benefit is to avoid having extra data structure (like hash map) to find the missing or duplicate, using O(n) time and O(1) space.
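
For context, here is a minimal sketch of the XOR trick for the missing-number case (my own illustration, assuming A holds the values 1..n with exactly one missing):

    # minimal sketch, assuming A holds 1..n with exactly one value missing
    def find_missing(A, n):
        result = 0
        for value in range(1, n + 1):
            result ^= value
        for value in A:
            result ^= value
        return result  # every present value cancels itself out, leaving the missing one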

No, again, that's not my point. The code from the article is O(2n) when it could be O(n). I know we're not supposed to care about constant factors, but I've lived in a world where not micro optimizing the ever loving shit out of my software could potentially make people throw up, so this sort of stuff kind of stands out to me.

The code in the article is written in Python, whose only for loop is for-each. It is 2N XOR operations, regardless of whether you use one or two loops.

I probably would have written it with a single loop, using the `enumerate` iterator adapter. But in Python, two loops is almost certainly more efficient.


You can loop over the range and then do result ^= i ^ A[i]. If adapters are slow you don't need them here.
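
Something like this, as a sketch, assuming A has n elements so the index-based access lines up:

    result = 0
    for i in range(1, n + 1):
        result ^= i ^ A[i - 1]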

Having only one loop gives you a better memory access pattern, because it's 2 XOR operations in between each memory access. Two loops is the same number of instructions, but it spends one loop ignoring memory and then another loop doing rapid-fire memory accesses. In python there's enough overhead that it's unlikely to matter. But in a faster language running on a busy machine that could make a real difference.


for i in range(n - 1):

Is pretty much a standard for loop. Between that and

for n in numbers:

You can do pretty much the same things as in a more conventional language.

You could also solve it pretty simply like this:

    expected_sum = n * (n + 1) // 2   # integer division; plain / would give a float
    missing_num = expected_sum - sum(numbers)

This only iterates the list once. This would probably be my solution if I was given this task in an interview.
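
A quick sanity check of that idea with made-up values (assuming numbers holds 1..n with one value missing):

    numbers = [1, 2, 4, 5]                     # 3 is missing from 1..5
    n = 5
    expected_sum = n * (n + 1) // 2
    missing_num = expected_sum - sum(numbers)
    assert missing_num == 3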


Real-world performance will depend on how much of that N fits in cache, and in which cache it fits (L1, L2, L3). Once loaded, you may not pay much cost to access each value a second time.

Doing 2 loops over the data means you have to do a full pass over the data twice. If your N doesn’t fit in L3 then you’re going to load N twice instead of once. Loading twice, even out of L1, is still slower than never loading twice at all.

Exactly. And there's also the fact that sometimes the data stream we're processing is unbounded and ephemeral. For example, reading values from a physical sensor. It may not match up to this specific example, but the point remains that a single "loop" over a data set might be all you get, so pack as much into that loop as you can.

O(2n) doesn't exist. The whole point of big O is that you ignore such "trivial" things as what factor comes before the n.

Did I not say that?

>we end up performing two loops instead of one, all because sticking three operations in one statement is "yucky"

You seem to believe that "O(2n)"

  for value in range(1, n + 1):
    result ^= value
  for value in A:
    result ^= value
is slower than "O(n)"

  for value in range(1, n + 1):
    result ^= value
    result ^= A[value-1]
simply because the latter has one fewer "for" loop. Am I misunderstanding you? If not, why would this matter for speed?

Unless both loops get unrolled, it's ever so slightly slower due to checking the loop-termination condition twice as often, plus potentially a cache miss at the start of the second loop.

None of this is as straightforward as it seems.

A "for" loop in Python isn't particularly cheap. It compiles to some static overhead to set up the iterator, then each loop iteration compiles to the "FOR_ITER" opcode, a "STORE_FAST" opcode to assign the iteration value to a variable, and the body of the loop.

"FOR_ITER" calls the "__next__()" method of the iterator (which is on top of the interpreter object stack), catches the StopIteration exception to know when to terminate the loop (by jumping past the loop body), and stores the iterator value to the top of the stack. What "__next__()" does is totally opaque - we don't know what kind of object A is - but since we've added the overhead of a function call already it wouldn't matter if it was a super tight bit of machine code, we're already paying a (relatively) hefty runtime cost.

A particularly bad implementation of "__next__()" for some custom iterable collection might be so stupid as to walk through the collection until it reaches the current index's item and return that, so "for value in A" could in fact be O(n^2).

Plus, "result ^= A[value-1]" is substantially more work than "result ^= value", so even just on the loop bodies the two examples aren't very similar at all. Evaluating "A[value-1]" may wind up calling a "__getitem__()" method on A.

If A is, say, a linked list or binary tree, iterating it is very cheap but indexing it is O(n), so the second loop might be O(n^2) where the first is O(n).
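
To make that concrete, here's a toy singly linked list (purely illustrative, names are mine) where iteration is cheap but __getitem__ walks from the head each time:

    class ToyLinkedList:
        # illustrative only: iterating is O(1) per step, indexing is O(n)
        class Node:
            def __init__(self, value, next=None):
                self.value, self.next = value, next

        def __init__(self, values):
            self.head = None
            for v in reversed(list(values)):
                self.head = self.Node(v, self.head)

        def __iter__(self):
            node = self.head
            while node is not None:
                yield node.value
                node = node.next

        def __getitem__(self, index):
            node = self.head
            for _ in range(index):   # walks from the head on every call
                node = node.next
            return node.value

With something like this, "for value in A" is linear overall, but "result ^= A[value - 1]" inside a loop makes the whole thing quadratic.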

So maybe we should be a bit more Pythonic, and do:

    for i, value in enumerate(A, 1):   # start at 1 to match range(1, n + 1)
        result ^= i
        result ^= value
One loop, no indexing of A! But we've not actually saved anything: the __next__() method of enumerate's iterator will increment its index then call the __next__() method of A's iterator, (approximately) the same work as if we'd done two FOR_ITER, one for an index and one for A.

Why would this matter for speed? I don't know. Unless 'n' is pretty big, a human won't even notice the execution time of any of this code.


Even assuming Python's foreach loop in these cases gets optimized down to a very bare for loop, the operations being performed are dominated by the looping logic itself, because the loop body is so simple.

Each iteration of a for loop performs one index update and one termination comparison. For a simple body that is just an XOR, that's the difference between performing 5 operations per element (update, exit check, read array, XOR with value, XOR with index) in the one-loop case versus 7 (update, exit, read array, XOR with value, then update, exit, XOR with index) in the two-loop case. So we're looking at roughly a 29% savings in operations.

It gets worse if the looping construct does not optimize down to a raw, basic for loop and instead builds some kind of lazy iterator generalized for all the kinds of collections it could iterate over.

The smaller the loop body, the higher the gains from optimizing the looping construct itself.


You said the code from the article is O(2n) when it could be O(n), but those are the same thing.

You did, but it might not be an effective strategy to mention asymptotic complexity to help forward your argument that one linear implementation is faster than another.

Whether it's a win in Python to use one or two loops isn't so clear, as a lot is hidden behind complex opcodes and opaque iterator implementations. Empirical testing might help, but a new interpreter version could change your results.
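
If someone does want to measure it, a rough timeit sketch along these lines would do (sizes and names are made up; results will vary by interpreter version):

    import timeit

    def two_loops(A, n):
        result = 0
        for value in range(1, n + 1):
            result ^= value
        for value in A:
            result ^= value
        return result

    def one_loop(A, n):
        result = 0
        for value in range(1, n + 1):
            result ^= value
            result ^= A[value - 1]
        return result

    n = 1_000_000
    A = list(range(1, n + 1))
    print(timeit.timeit(lambda: two_loops(A, n), number=10))
    print(timeit.timeit(lambda: one_loop(A, n), number=10))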

In any case, if we want to nitpick over performance we should be insisting on a parallel implementation to take advantage of the gobs of cores CPUs now have, but now we're on a micro-optimisation crusade and are ignoring the whole point of the article.


Go keeps declining to add syntax sugar for if err != nil: https://go.dev/blog/error-syntax

But a language problem can often be partially solved by the IDE. IDEs can already generate if err != nil branches, and GoLand can fold them. https://github.com/golang/vscode-go/issues/2311


The hardware's (GPU's) architectural limitations may slow research more than PyTorch does. See "The Hardware Lottery": https://hardwarelottery.github.io/

I've been trying to get BitGrid [1] a hardware lottery ticket for a decade now.

[1] https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...


"Software 2.0" reminds me of "web 3.0".


Luckily it’s nothing like that


Vibe coding is easy. Vibe debugging is hard.


The fifth point is the same as in 'Ask for no, don’t ask for yes'

https://www.mooreds.com/wordpress/archives/3518

https://news.ycombinator.com/item?id=43144611


If you use Box to refer to the parent, then the parent cannot own the child (unless you use something like Arc<Mutex<>>).


The "if I then go and code in Python, Java or C#, pretty much all objects have the overhead of an Arc" is not accurate. Rust Arc involves atomic operation and its preformance can greatly degrade when the reference count is being mutated by many threads. See https://pkolaczk.github.io/server-slower-than-a-laptop/

Java, C# and Go don't use atomic reference counting and don't have such overhead.


Not allowing taking references avoids interior pointers. Interior pointers make memory safety harder and also make GC harder.


If I keep removing one element from the front and adding one element at the back, a normal ring-buffer deque involves no copying, but this will keep copying elements into the empty space, so its performance could be much worse than a deque's if the queue is large.

