The problem with this approach is that it requires the system doing randomizatio...

hruk · 2025-01-14T03:20:16 1736824816

I don't know, all of these are pretty surmountable. We've done dynamic pricing with contextual multi-armed bandits, in which each context gets a single decision per time block and gross profit is summed up at the end of each block and used to reward the agent.

That being said, I agree that MABs are poor for experimentation (they produce biased estimates that depend on somewhat hard-to-quantify properties of your policy). But they're not for experimentation! They're for optimizing a target metric.

empiko · 2025-01-14T13:57:06 1736863026

Surmountable, yes, but in practice it is often just too much hassle. If you are doing tons of these tests you can probably afford to invest in the infrastructure for this, but otherwise AB is just so much easier to deploy that it does not really matter to you that you will have a slightly ineffective algo out there for a few days. The interpretation of the results is also easier as you don't have to worry about time sensitivity of the collected data.

hinkley · 2025-01-14T18:36:15 1736879775

You do know Amazon got sued and lost for showing different prices to different users? That kind of price discrimination is illegal in the US. Related to actual discrimination.

I think Uber gets away with it because it’s time and location based, not person based. Of course if someone starts pointing out that segregation by neighborhoods is still a thing, they might lose their shiny toys.

hruk · 2025-01-15T14:55:54 1736952954

Indeed, we are well aware.

taion · 2025-01-15T01:07:45 1736903265

You can do that, but now you have a runtime dependency on your analytics system, right? This can be reasonable for a one-off experimentation system but it's not likely you'll be able to do all of your experimentation this way.

hruk · 2025-01-15T15:05:17 1736953517

No, you definitely have to pick your battles. Something that you want to continuously optimize over time makes a lot more sense than something where it's reasonable to test and the commit to a path forever.

jacob019 · 2025-01-14T17:17:25 1736875045

Hey, I'd love to hear more about dynamic pricing with contextual multi-armed bandits. If you're willing to share your experience, you can find my email on my profile.

ivalm · 2025-01-13T21:50:53 1736805053

You can assign multiarm bandit trials on a lazy per user basis.

So first time user touches feature A they are assigned to some trial arm T_A and then all subsequent interactions keep them in that trial arm until the trial finishes.

kridsdale1 · 2025-01-13T22:34:38 1736807678

The systems I’ve use pre-allocate users effectively randomly an arm by hashing their user id or equivalent.

ivalm · 2025-01-14T03:10:07 1736824207

To make sure user id U doesn’t always end up in eg control group it’s useful to concatenate the id with experiment uuid.

ryan-duve · 2025-01-13T23:34:13 1736811253

How do you handle different users having different numbers of trials when calculating the "click through rate" described in the article?

s1mplicissimus · 2025-01-13T23:28:08 1736810888

careful when doing that though! i've seen some big eyes when people assumed IDs to be uniform randomly distributed and suddenly their "test group" was 15% instead of the intended 1%. better generate a truely random value using your languages favorite crypto functions and be able to work with it without fear of busting production

np_tedious · 2025-01-14T02:52:52 1736823172

The user ID is non uniform after hash and mod? How?

lern_too_spel · 2025-01-14T03:22:09 1736824929

If you mod by anything other than a power of two, it won't be. https://lemire.me/blog/2019/06/06/nearly-divisionless-random...

np_tedious · 2025-01-15T08:26:09 1736929569

That article is mostly about speed. The following seems like the one thing that might be relevant:

> Naively, you could take the random integer and compute the remainder of the division by the size of the interval. It works because the remainder of the division by D is always smaller than D. Yet it introduces a statistical bias

That's all it says. Is the point here just that 2^31 % 17 is not zero, so 1,2,3 are potentially happening slightly more than 15,16? If so, this is not terribly important

lern_too_spel · 2025-01-16T00:48:10 1736988490

> If so, this is not terribly important

It is not uniformly random, which is the whole point.

> That article is mostly about speed

The article is about how to actually achieve uniform random at high speed. Just doing mod is faster but does not satisfy the uniform random requirement.

np_tedious · 2025-01-24T12:37:36 1737722256

If your number of AB testing combos cohorts is fewer then 100 then yeah this passes for being uniform

lern_too_spel · 2025-02-05T06:49:47 1738738187

It doesn't, mathematically. It might be good enough for some cases, but it is not good enough for cases that actually require uniformity.

s1mplicissimus · 2025-01-16T19:27:41 1737055661

additional to the other excellent comments they will become non-uniform once you start deleting records. that will break all hopes you might have had in modulo and percentages being reliable partitions because the "holes" in your ID space could be maximally bad for whatever usecase you thought up.

hinkley · 2025-01-14T18:41:15 1736880075

Just make sure you do the hash right so you don’t end up with cursed user IDs like EverQuest.