
Summary: if you run an experiment where you try to rush users to convert, and you only run the experiment for a short time, it will look great even though it might be lossy overall, because you're capturing a larger proportion of conversions in the experiment group.

You can also run into this sort of problem with user learning effects, where initially a large change in the UI can give a large change in behavior due to novelty, but then it wears off over time. Running experiments longer helps a lot in both cases.
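Here's a toy sketch of the rush-to-convert effect (the numbers are made up, not from the article): a "pushy" variant that converts fewer people overall but converts them faster looks like a winner in a one-week window and a loser over two months.

    import random

    random.seed(0)

    def simulate(n, p_convert, mean_lag_days):
        """Per-user lag from visit to conversion, or None if the user never converts."""
        return [random.expovariate(1.0 / mean_lag_days) if random.random() < p_convert else None
                for _ in range(n)]

    def observed_rate(lags, window_days):
        """Conversion rate as seen by an experiment that only runs for window_days."""
        return sum(lag is not None and lag <= window_days for lag in lags) / len(lags)

    n = 100_000
    control = simulate(n, p_convert=0.10, mean_lag_days=10)  # converts more people, slowly
    variant = simulate(n, p_convert=0.08, mean_lag_days=2)   # "rushed": fewer people, faster

    for window in (7, 60):
        print(f"{window:>2}-day window: control={observed_rate(control, window):.3f}  "
              f"variant={observed_rate(variant, window):.3f}")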




This is a good summary because the math is hugely distracting from the basic realities.

You need to have basic intuition for what might be happening; the math is just a formality and, frankly, is unnecessary beyond a very, very simple calculation.

You have to actually 'think' about behaviour a bit if you want to get it right, that's the hard part.

If you have something reasonable, then the conversions/control numbers can be worked out into a probability of success very quickly, and even without that, just looking at them will give you a good idea of whether it worked or not.

The maths is a shiny lure for technical people; it gets us all excited as though there is some kind of truth behind it.


Your summary is incorrect.

Rather, these are simulated data for a fictitious company. The author is demonstrating a scenario in which a purely frequentist approach to A/B testing can result in erroneous conclusions, whereas a Bayesian approach will avoid that error. The broad conclusions are (as noted explicitly at the end of the article):

- The data generating process should dictate the analysis technique(s)

- lagged response variables require special handling

- Stan propaganda ;) but also :(

It would be cool to understand the weaknesses or risks of erroneous conclusions for the Bayesian approach in this or similar scenarios. In other words, is it truly a risk-free trade-off to switch from a frequentist technique to a Bayesian technique, or are we simply swapping one set of risks for another?

tl;dr The author's point is not to make a general claim about the aggressiveness of CTAs.


While I am generally in favor of applying Bayesian approaches, that's overkill for this problem. In their (fictitious) example, the key problem is that they ran their test for too short a time. They already know that the typical lag from visit to conversion on their site is longer than a week, which means that if they want to learn the effect on conversions, a week isn't enough data.

While it is possible to make some progress on this issue with careful math, simply running the test longer is a far more effective and robust approach.
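To put a rough number on "longer" (as a sketch only: I'm assuming an exponential lag with a mean the analyst would already know from historical data, not a figure from the article):

    import math

    mean_lag_days = 10  # assumed mean visit-to-conversion lag
    for target in (0.50, 0.90, 0.95, 0.99):
        # With an exponential lag, the fraction of eventual conversions seen after t days is 1 - exp(-t/mean).
        days = -mean_lag_days * math.log(1 - target)
        print(f"to observe {target:.0%} of eventual conversions, run ~{days:.0f} days")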


I'm no statistician, but don't you have the same problem however long you run it? Giving even more time for slow conversions to amass?

Also, you and the GP are calling the example fictitious, but it seems to be based on 'real traffic logs' via https://dl.acm.org/doi/10.1145/2623330.2623634


We're taking the author at his word:

> "Let us consider the following fictitious example in which Larry the analyst of the internet company Nozama"

Nozama is Amazon backwards.


> - The data generating process should dictate the analysis technique(s)

And to expand on this, the data generating process is not about a statistical distribution or any other theoretical construct. Only in the frequentist world do you start by assuming a generating process (for the null hypothesis, specifically).

The data generating process in this case is living, breathing humans doing things humans do.


The data generating process is the random assignment of people to experiment groups.

The potential outcomes are fixed: if a person is assigned to one group the outcome is x1; if another, x2. No assumption is made about these potential outcomes. They are not considered random, unless the Population Average Treatment Effect is being estimated. And even in that case, no distribution is assumed. It certainly is not Gaussian for example.

Under random assignment, the observed treatment effect is unbiased for the Sample Average Treatment Effect. So again, the data generating process of interest to the analyst is random assignment.
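A minimal sketch of that framing (my own toy numbers, not the article's): the potential outcomes are fixed, the only randomness is the assignment of people to arms, and the difference-in-means averaged over many re-randomizations recovers the Sample Average Treatment Effect.

    import random

    random.seed(1)
    n = 1000
    # Fixed potential outcomes, no distributional assumption: y0 if in control, y1 if treated.
    y0 = [random.random() < 0.10 for _ in range(n)]
    y1 = [y or random.random() < 0.02 for y in y0]
    sate = (sum(y1) - sum(y0)) / n  # Sample Average Treatment Effect (unknowable in practice)

    def one_experiment():
        """The only randomness the analyst relies on: who gets assigned to treatment."""
        idx = list(range(n))
        random.shuffle(idx)
        treated = set(idx[: n // 2])
        t_mean = sum(y1[i] for i in treated) / (n // 2)
        c_mean = sum(y0[i] for i in range(n) if i not in treated) / (n - n // 2)
        return t_mean - c_mean

    estimates = [one_experiment() for _ in range(2000)]
    print(f"SATE = {sate:.4f}")
    print(f"mean difference-in-means over re-randomizations = {sum(estimates) / len(estimates):.4f}")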


Assuming you're able to actually achieve truly random participation in the various arms you're trialing, you're right.

And it's my fault for not thinking of that as a possibility. Colour me jaded after experiencing very many bad attempts at randomization that actually suffer from Simpson's paradox in various ways!


You're absolutely correct, proper A/B testing has many engineering challenges!


Wouldn't it be better to run the experiment longer _and_ discard the data from the initial few weeks?


This could make the entire org/company run and innovate much slower. Ideally you can build better models that predict long term conversion from short term data. These models can be refined with long term experiments.
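As a hypothetical sketch of what such a model could look like (the feature names, data, and model choice are all illustrative assumptions, not from the thread): fit a classifier on past long-running experiments that maps early signals to eventual conversion, then use it to score new short experiments.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Historical users from past long-running tests.
    # Features visible in the first days: [added_to_cart_24h, returned_within_3d, pages_viewed_day1]
    X_hist = np.array([[1, 1, 8], [0, 0, 2], [1, 0, 5], [0, 1, 3], [1, 1, 12], [0, 0, 1]])
    y_hist = np.array([1, 0, 1, 0, 1, 0])  # converted within 60 days

    surrogate = LogisticRegression().fit(X_hist, y_hist)

    # Score users from a new one-week experiment to estimate long-horizon conversion per arm.
    X_new = np.array([[1, 0, 4], [0, 1, 6]])
    print(surrogate.predict_proba(X_new)[:, 1])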


The current state of browser tracking prevention also means that you're unlikely to identify conversions from the same user who saw your experiment after a week, or sometimes even 24 hours.


Yes, browser tracking prevention is one of those things that seems like a good idea at first but likely makes the internet slightly worse overall.

Sites can only optimize for what they can see and we've made it so they can only see short-term engagement.

Another is all the annoying cookie popups as a result of GDPR.


You haven't convinced me that preventing browser tracking is making the internet "slightly worse overall".

If sites are having trouble converting me, perhaps it's not me that's the problem.


The issue is that most sites can no longer tell if they are converting you.


It's not obvious to me that that is a problem for me, or that it makes the internet worse.


The popups are a result of tracking, not GDPR. Websites without tracking don't need to have them.

It's somewhat amusing that the overlap of garbage content farms and sites with annoying consent popups is almost perfect. I wonder if it could be used for search engine ranking.


I don't get this summary. How are you capturing a larger part of conversations?


Conversions, not conversations, if that helps?


Also it might be hard to ensure you aren't externalising "cost"



