Surprised no one said this yet, so I'll bite the bullet.
I don't think A/B testing is a good idea at all for the long term.
Seems like a recipe for having your software slowly evolve into a giant heap of dark patterns. When a metric becomes a target, it ceases to be a good metric.
More or less, it tells you the "cost" of removing an accidental dark pattern. For example, we had three paid plans and a free plan. The button for the free plan was under the paid plans, front and center ... unless you had the screen/resolution that most of our users (the non-devs/designers) had.
So, at the users' most common resolution, the button sat just below the fold.
This was an accident, though some of our users called us out for it -- suggesting we'd removed the free plan altogether.
So, we a/b tested moving the button to the top.
Moving it up would REALLY hurt the bottom line, and the buried button explained some of the growth we'd experienced. Removing the "dark pattern" would have meant laying off some people.
I think you can guess which option was chosen, and it's still in place.
When I left that company it had grown massive and the product was full of dark patterns… I mean bugs. Seriously, they were tracked as bugs that no one could fix without severe consequences. No one put them there on purpose. When you have hundreds of devs working on the same dozen files (onboarding/payments/etc.) there are bound to be bad merges (when a git merge results in valid but incorrect code), misunderstandings of requirements, etc.
Good multivariate testing and (statistically significant) data don't do that. They show you lots of ways to improve your UX, and whether your guesses at improving UX actually work. Example from TFA:
> more people signed up using Google and Github, overall sign-ups didn't increase, and nor did activation
Less friction on login for the user, zero gain in conversions, and they shipped it anyway. That's not a dark pattern.
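For concreteness, here's a minimal sketch (Python, made-up numbers) of the kind of significance check I mean before calling a variant a "win": a plain two-proportion z-test on sign-up rates. The traffic counts and the 0.05 threshold are illustrative assumptions, not anything from TFA.

```python
# Minimal sketch: is the difference in sign-up rate between control (A)
# and variant (B) statistically significant? Two-proportion z-test.
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conversions_a / n_a
    p_b = conversions_b / n_b
    p_pooled = (conversions_a + conversions_b) / (n_a + n_b)  # rate under H0
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical traffic: 10,000 users per arm, 500 vs 540 sign-ups.
z, p = two_proportion_z_test(500, 10_000, 540, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")  # only treat the variant as better if p < 0.05
```

The point isn't the particular test; it's that "the variant looked better" without this kind of check is exactly how noise gets shipped as a "win".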
If you're intentionally trying to make dark patterns it will help with that too I guess; the same way a hammer can build a house, or tear it down, depending on use.
I often see this argument, and although I can happily accept that the examples given in its defence make sense, I never see an argument that this multivariate approach solves the problem in general rather than merely ameliorating some of the worst cases (I suppose I'm open to the idea that it could at least get it from "worse than the disease" to "actually useful in moderation").
Fundamentally, if you pick some number of metrics, you're always leaving some number of possible metrics "dark", right? Is there some objective method of deciding which metrics should be chosen, and which shouldn't?
Rolled out some tests to streamline cancelling subscriptions in response to user feedback, with Marketing's begrudging approval.
Short term, predictably, we saw an increase in cancellations, then a decrease and an eventual levelling out. Long term we continued to see an increase in subscriptions after rollout, and we focused on more important questions like "how do we provide a good product that a user doesn't want to cancel?"
And how do you determine that? I'm not trying to be coy here, I genuinely don't understand.
Because you're not testing for patterns; what you test is some measurable metric(s) you want to maximise (or minimise), right? So how can you determine which metrics lead to dark patterns, without just using them and seeing if dark patterns emerge? And how do you spot these dark patterns if, by their very nature, they're undetectable by the metrics you chose to test first?
The "patterns" in dark patterns doesn't mean they're an emergent property of the system. You test whether a change improves a metric in A/B tests. You avoid accidental dark patterns in the change like you avoid bugs that cause accidental data loss in the change: you think carefully about what you're doing, maybe a reviewer looks it over, and so on. This isn't perfect, but nothing is.
What they're describing is a serious problem in modern product innovation, so maybe it is you who should take it seriously, aye?
Let's rephrase: if we are not to test for changes in user behaviour that give positive signal to progressive innovation, then what should we do? And how should we avoid the loudest voices in a room full of whiteboards creating a product that biases towards the needs of tech company and startup employees?
What if it's at an airport queue, where they are testing how to improve queue times and whether having the queue be in a straight line or in zig-zag makes it faster for passing security checks?
Should the passengers sign an agreement before being "experimented on", and having them be split in two groups, where one stays in a straight line and one in zig-zag?