My life revolves around Ansible and has for the last year and a half. Ansible is a great tool, but unfortunately it has a bunch of shortcomings, and some of them are now becoming major issues.
- Its dependence on Python 2.x. Unfortunately, all the newer OSes have Python 3.x installed by default. While I have a role for installing Python 2 manually, it isn't super nice.
- My main devops project consists of 11 different environments, and it manages nearly 2k servers. Trying to write environment-agnostic roles is an absolute nightmare. Furthermore, my group_vars, host_vars, and dynamic inventories have turned into spaghetti. Why? Because of how confusing variable precedence becomes once you get into an extremely complex hierarchy of group_vars, which can randomly break.
- It is really hard to handle cases where playbooks/roles fail and deal with it cleanly. Blocks have helped a bit, but there is no way to grab the exception that caused the playbook to fail.
- The YAML syntax becomes messy fast. Jinja2 is powerful, but there are times I wish you could just shove a single line of Python into places to prevent the huge mess of with_items, register, set_fact, etc.
- AWS modules are a very mixed bag. A lot of them are missing features, and there are some very weird issues that apparently boto is the cause of (i.e. they should move to boto3). Even then I need to do a huge amount of boilerplate to launch EC2 instances.
- There is no way to share a common group_vars between inventories (all is specific to each inventory), which leads to a lot of duplication. (Yes, the symlink trick does work, but it causes a bunch of other issues.)
I still love Ansible and I would pick it over Chef/Puppet. Salt is also another decent one that has made a lot of progress lately.
That's interesting - I just received a request for ShutIt to work with Python3.
Regarding failure, ShutIt drops to a shell on error (if there's a tty), so you can correct manually before continuing (pause_point). These can also be triggered directly for debugging purposes.
I don't like YAML for anything complicated. The bash scripts I saw embedded in YAML felt wrong. Why not just use shell?
Env-agnosticism (eg 'install' packages rather than yum/apt) was built in because it was created in a bi-OS env (ubuntu and CentOS).
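To make the idea concrete, here is a minimal sketch of how a distro-agnostic 'install' could be modelled. The `INSTALL_COMMANDS` table and the `install_command` helper are hypothetical names made up for this illustration; this is not ShutIt's actual implementation, just one plausible shape for it.

```python
# Hypothetical mapping from distro name to its package-install command.
# Illustration only: this is not taken from ShutIt's source.
INSTALL_COMMANDS = {
    "ubuntu": "apt-get install -y {package}",
    "debian": "apt-get install -y {package}",
    "centos": "yum install -y {package}",
}


def install_command(distro: str, package: str) -> str:
    """Return the shell command that installs `package` on `distro`."""
    try:
        template = INSTALL_COMMANDS[distro.lower()]
    except KeyError:
        raise ValueError(f"unsupported distro: {distro}") from None
    return template.format(package=package)


# The caller asks for 'git' and never mentions yum or apt directly.
for distro in ("ubuntu", "centos"):
    print(distro, "->", install_command(distro, "git"))
```

The point of the design is that the script author names the package once and the per-distro detail lives in a single table.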
We rely heavily on Ansible and your description seems very accurate. I worry we have too much invested at this point to consider anything else. Still would pick it over Chef/Puppet.
Salt does look interesting but it was too young when we started. I don't know if we would be trading Ansible's shortcomings for Salt's.
As someone who has previously used Chef and is currently using Ansible, I can understand those pain points. For me it's not as much of a problem as I have 10x fewer servers, but I have heard a lot of complaints about Ansible scaling. I definitely miss being able to just use Ruby in a cookbook, but it's usually very rare that I can't find a reasonable workaround.
Usually I describe Ansible as the tool for people who don't come from a programming background and won't be administrating a huge number of servers. Otherwise, if you need scale then use Chef (as why would you want to have to deal with scaling AND Puppet's terrible DSL).
Regarding variable precedence, it's pretty simple to grok in 2.x+, and well-documented now (it was a serious nightmare, and a very valid complaint... but isn't as much anymore).
For sharing common group_vars, what I do is have my inventory pass an environment-aware variable, which I include in the playbook with something like:
Unfortunately the above way will load the vars at a higher precedence than group_vars. Also, if you have multiple groups with the same var but differing values, you will need to start loading multiple files (or you can use |default with an alias to the value in group_vars, which leads to even more spaghetti).
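The precedence problem can be modelled roughly in Python. This is a deliberately simplified sketch, not how Ansible resolves variables internally: it only shows that a layer loaded at higher precedence shadows group_vars, and what the `|default` plus alias workaround amounts to. The variable names (`http_port`, `group_http_port`) and both helper functions are made up for the illustration.

```python
from collections import ChainMap


def resolve(file_vars: dict, group_vars: dict) -> dict:
    """Merge two layers; the explicitly loaded file wins over group_vars."""
    # ChainMap looks keys up left to right, so file_vars shadows group_vars.
    return dict(ChainMap(file_vars, group_vars))


def port_with_alias(group_vars: dict, default: int = 8080) -> int:
    """Mimic a `{{ group_http_port | default(8080) }}` style alias."""
    # The loaded file defers to an aliased group variable when it exists.
    return group_vars.get("group_http_port", default)


group_vars = {"http_port": 80, "db_host": "db.internal"}
file_vars = {"http_port": 8080}

merged = resolve(file_vars, group_vars)
print(merged["http_port"])  # the group-level value 80 is shadowed
print(port_with_alias({"group_http_port": 80}))
print(port_with_alias({}))
```

Every variable that needs a per-group override ends up with a second, aliased name, which is where the extra spaghetti comes from.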
Azeralthefallen hit the nail on the head for the most part, and I wasn't managing anywhere near as many servers as s/he was. I had come from Salt prior to Ansible and there was a lot to miss.
Since configuration management isn't a typical part of my day job, this is appealing - I have used Chef and Ansible a little, but haven't needed them badly enough to really choose one, which makes ShutIt attractive to me, especially recording sessions. This may be a 'careful what you wish for' but I am curious if you think it would be possible to parse the shell history file on a box to produce a recording retroactively? It would probably have to be interactive, press return to add this command or space to ignore it etc., but it would provide for backing up everything you'd done to a server, which I have certainly wished I had an easy way to do (short of manually poring over my history and cherry-picking the things that mattered).
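As a rough sketch of what that retroactive recording could look like, the snippet below filters history lines through a keep/skip decision and emits a plain shell script. Everything here is an assumption for illustration: the function names are invented, the output is an ordinary shell script rather than any ShutIt format, and the only file-format detail relied on is that bash writes `#<epoch>` timestamp lines into the history file when HISTTIMEFORMAT is set.

```python
from typing import Callable, Iterable, List


def select_commands(history_lines: Iterable[str],
                    keep: Callable[[str], bool]) -> List[str]:
    """Return the history entries for which `keep` says yes, in order."""
    selected = []
    for raw in history_lines:
        command = raw.strip()
        # Skip blank lines and bash's '#<epoch>' timestamp lines.
        if not command or command.startswith("#"):
            continue
        if keep(command):
            selected.append(command)
    return selected


def ask_on_terminal(command: str) -> bool:
    """Interactive decision: bare Enter keeps the command, anything else skips it."""
    answer = input(f"keep `{command}`? [Enter = keep, anything else = skip] ")
    return answer == ""


def to_script(commands: Iterable[str]) -> str:
    """Render the kept commands as a shell script that stops on first error."""
    return "\n".join(["#!/bin/sh", "set -e", *commands]) + "\n"
```

For interactive use you would open the history file (for example `~/.bash_history`), pass its lines to `select_commands` with `ask_on_terminal` as the `keep` argument, and write the result of `to_script` to disk. Passing the decision in as a function also makes it easy to swap in a non-interactive filter.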
Does this provide automatic subcommands? For example if I write a module/script can I just do `shutit <custom command here>` from within any directory?
The use case is a heroku/docker type cli without needing to mess with the Python args module and setting up a bash alias
Just curious how you decide the line between what is handled by ShutIt and what's handled in the shell?
For example, from the example on the home page:
# Ensure git is installed. This handles different distros gracefully.
shutit.install('git')
Awesome and very powerful abstraction! Vs the next lines...
# If the directory does not exist, we create it
if not shutit.file_exists('/opt/shutit',directory=True):
    shutit.send('mkdir /opt/shutit')
Why not just:
shutit.send('mkdir -p /opt/shutit')
There are a few other examples where I was wondering how you decided to do some operations in ShutIt vs delegating to shell. In fact, some entire examples look simpler to me in shell that you could just send. Does using the ShutIt native commands make things more testable?
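For comparison, the two approaches can be sketched in plain Python using only the standard library (this deliberately avoids the ShutIt API; the two helper names are made up). The check-then-create form mirrors the `file_exists` plus `mkdir` pair, and `os.makedirs(..., exist_ok=True)` is the closest equivalent of `mkdir -p`.

```python
import os
import tempfile


def ensure_dir_checked(path: str) -> None:
    """Check-then-create, like file_exists followed by a plain mkdir."""
    if not os.path.isdir(path):
        # Raises FileNotFoundError if a parent directory is missing.
        os.mkdir(path)


def ensure_dir_idempotent(path: str) -> None:
    """Rough equivalent of `mkdir -p`: creates parents, fine if it exists."""
    os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "opt", "shutit")
    ensure_dir_idempotent(target)  # creates 'opt' and 'shutit'
    ensure_dir_idempotent(target)  # second call is a no-op
    print(os.path.isdir(target))
```

The idempotent form is shorter and also copes with missing parent directories, which the checked form does not.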
I am a firm believer that example code that serves as a reference should illustrate the absolute best practices. "mkdir -p" is the best practice to create a dir that may or may not already exist, therefore this is a bad example (you are teaching some devs who don't know "mkdir -p" a bad practice). If you want to demonstrate the file_exists function, pick a better example.
This is the same reason why I dislike, for example, seeing the documentation of a library showing how to perform actions X, Y, and Z, but a footnote says "by the way, there is no error checking in this example". Well then, why use this code as an example?
Anyway, sorry for the rant. ShutIt does seem interesting :)
I know what you mean, but I couldn't think of a better example that would not have involved explaining something else that would have distracted from the point. I wasn't comfortable doing it, but at the same time, couldn't think of anything better.
Actually, I have similar problems when teaching - I'm writing a git course at the moment, and it's hard to illustrate concepts in a realistic way without hopelessly artificial repos.
I wish some other question was asked more: "for configuration management, why Ansible with its atrocious YAML-encoded programming language over CFEngine or Puppet?"
I agree that YAML's a mess, but not requiring a client-side agent is a pretty big win in my experience, and to be honest none of the configuration management systems I've used has had a particularly comfortable specification language.
Oh, but Ansible does require a client-side agent. It just happens that the
agent is the same one as you use for interactive purposes, which is another
big no-go in the long run (when you let your servers' users configure SSH
server to some extent, and then you need to make sure they don't cut out your
configuration management tool).
Also, a configuration management tool should be, apart from downloading
configuration from somewhere, independent from any external server/service.
As a (somewhat) seasoned sysadmin, I tell you that mixing responsibilities of
services is a Bad Idea[tm]. You wouldn't put disk access into your HTTP header
handling code, would you? So why do you think overloading an interactive shell
with configuration management is a good idea? It doesn't even work well,
considering that you need to manage SSH keys, which quickly gets ugly when the
number of hosts grows, and then there's the need to re-send configuration for
each host that happened to be down or otherwise unreachable at the time of
issuing it the first time.
You need a) an interactive shell, b) a configuration management tool (which
works in scheduled batch mode and in "pull" style, not "push" as in Ansible),
and c) a system for running ad-hoc commands synchronously (this one is "push"
style). Yes, all three. Yes, separate, not a weird mix of them. Been there,
done that.
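The re-send problem with push-style tools can be sketched as follows. This is a hypothetical model, not Ansible's behaviour or API: `push_config` and the `send` callback are invented names, and an unreachable host is represented by `send` raising `ConnectionError`.

```python
from typing import Callable, Iterable, Optional, Set


def push_config(hosts: Iterable[str],
                send: Callable[[str], None],
                pending: Optional[Set[str]] = None) -> Set[str]:
    """Push to every host once; return the hosts that still need the config.

    `pending` carries over hosts that were unreachable on an earlier run,
    so the operator has to keep this set around and retry later.
    """
    targets = set(hosts) | set(pending or ())
    still_pending = set()
    for host in sorted(targets):
        try:
            send(host)
        except ConnectionError:
            # Host is down or unreachable: remember it for the next run.
            still_pending.add(host)
    return still_pending
```

In a pull-style setup this bookkeeping largely disappears, because each host fetches its configuration on its own schedule once it is back up.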
Nobody's saying that Puppet or CFEngine isn't the best solution for a lot of cases, only that there are also cases where Ansible, or something like it, is a better choice.
You seem to have a serious down on the whole concept behind Ansible, and I'm curious why that is. Have you had a bad experience with it? If so, I'd like to hear about it, in order to better inform my own analysis of the tool.
> You seem to have a serious down on the whole concept behind Ansible, and I'm curious why that is.
No, not really. Not on the high-level idea(s). Heck, I've written myself (or
at least deployed already-written) tools for what Ansible tries to do:
configuration management, synchronous mass running an ad-hoc command on a set
of machines, and running a specific, parametrized procedure on a single host.
I recognize all these scenarios are important and I want to have tools for
each of those, but let the tools do these jobs robustly.
What bothers me is the design decisions and architecture of Ansible (and other
tools sharing them). The decisions seem sound from far away, but they start
leaking once you look closely. A good general tool would be robust for pretty
much all the use cases; Ansible is brittle (breaks when you misconfigure
sshd_config a little; breaks when you accidentally overwrite SSH keys; can't
work if you don't have an account with a working shell and $HOME; breaks apart
when some of the hosts are down; and so on).
Ansible apparently started as a script that automated somebody's workflow
(which is a good thing) -- and stayed that way, but now is marketed as
a general tool (which it is not, but it resembles one closely enough to trick
many of its users into thinking it is).
It all boils down to Ansible still being a quick hack for somebody's specific
workflow and me hating buzz around half-assed tools.
> Have you had a bad experience with it? If so, I'd like to hear about it, in order to better inform my own analysis of the tool.
I have so many objections against Ansible (some I mentioned above) that it
actually makes more sense to write a separate article about it, but this will
take time.
> Ansible apparently started as a script that automated somebody's workflow (which is a good thing) -- and stayed that way, but now is marketed as a general tool (which it is not, but it resembles one closely enough to trick many of its users into thinking it is).
I think that's a key critique of many of these things. They start off solving one problem, then eventually become 'enterprise', and the design trade-offs get exposed. This is what I was alluding to above with my trade-offs comment.
Well, CFEngine (especially 3.x, a third rewrite of the same basic idea) does
not show these trade-offs. It was designed from the ground up as a general
tool based on previous experiences with wide deployments, it (its codebase)
did not start as a script for solving some specific problem in some specific
workflow.
Puppet has a similar history, I believe, as a tool that was written to replace
cfengine (2.x) without its shortcomings.
A client-side agent is great for some orgs (esp. those that need control), but I wanted something that would be more general-purpose (hence using (p)expect).
Who, aside from you, is giving anyone shit here? I'm not saying ShutIt shouldn't exist; I'm asking what, if anything, makes learning it a worthwhile use of my time. Should the answer to that question be "nothing", it's still no judgment on ShutIt's existence; it's just a piece of information useful to people who calibrate their expenditure of attention perhaps differently from the way in which you do.
I've used Fabric a lot, and its "cousin" invoke. I wish I'd seen Fabric listed in the requirements.txt, because I feel like it's got a nice API, has been around a while, and is probably a bit more battle-tested...
Another related tool I'm working on is:
https://github.com/ianmiell/shutitfile
Why this over Ansible? It's a long story :)
Mainly because not everyone is a great or even decent programmer, so I wanted an easy way in for them for the situation I was in. Hence automation for everyone. But I don't think this is a serious competitor for Ansible.
Happy to answer questions as they arise.