This has been my experience as well. Even using a small subset of a playbook via tags can take a long time, especially if you're doing a run in serial. One of our deployments that only affects six servers takes fifteen minutes.
This can be mitigated somewhat by putting Ansible on the target machine, downloading all the necessary files to that machine, and then running Ansible locally... but that seems awfully fragile to me.
I am much more interested in Salt's ZeroMQ path these days. It seems to scale better, at least on paper and in my few small tests.
I'd be interested in hearing how this stacks up against simply running the tasks via shell scripts, because the time to install packages/do other tasks is orders of magnitude higher than the connection overhead. Things will always be slow when doing `serial: 1`, so I'd definitely recommend a canary setup where you run a play with a small serial batch followed by a play with no serial limitation - that'll speed things up considerably.
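For what it's worth, the canary idea above could look roughly like this. It's only a sketch: the `webservers` group and the `deploy_app` role are hypothetical names, and it assumes the group has at least two hosts.

```yaml
# Play 1: canary. Run against the first host in the group only.
# If it fails, every host in this play has failed, so the playbook stops here.
- name: Canary deploy
  hosts: webservers[0]      # hypothetical inventory group
  serial: 1
  roles:
    - deploy_app            # hypothetical role containing the deploy tasks

# Play 2: everyone else, with no serial limit, so hosts run in parallel
# (up to the configured number of forks).
- name: Full deploy
  hosts: webservers[1:]
  roles:
    - deploy_app
```

Only the canary host pays the one-at-a-time cost; the rest of the fleet is handled in a single parallel pass.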
Finally, with ControlPersist and pipelining enabled, Ansible is as fast as, if not faster than, ZeroMQ or our own accelerated mode (which we will be deprecating at some future point, once older SSH installs are less common).
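If you want to try that combination, one way to switch it on is through connection variables, for example in a `group_vars/all.yml` file. Treat this as a sketch rather than recommended settings: the file location and the 60 second persist window are assumptions, and the same options can also be set in `ansible.cfg`.

```yaml
# group_vars/all.yml (assumed location)

# Reuse one SSH connection per host instead of reconnecting for every task.
# ControlPersist needs a reasonably recent OpenSSH client on the control machine.
ansible_ssh_args: "-o ControlMaster=auto -o ControlPersist=60s"

# Send modules over the existing SSH session instead of copying them to the
# target first. When using sudo, this requires `requiretty` to be disabled
# in sudoers on the targets.
ansible_ssh_pipelining: true
```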