Ansible is declarative. It will do what you tell it to do. You tell it to copy a file, and it will either copy it or fail. You tell it to start a service, and it will start it or fail. So why would we need to validate that this works? It either does or it doesn’t, right? Well, in our experience it is not that simple. The environments, services and applications that Ansible configures can be complicated affairs, and it is great to be able to run tests on them to verify they are working correctly. Here at Poppulo we place a large emphasis on automated testing of our application code, and we decided that our infrastructure code deserved the same level of attention. When you think automation you immediately think Ansible, and luckily for us it has some simple yet powerful modules that are well suited to the job of testing your infrastructure.
Similar to asserts in any other testing framework you have ever used, Ansible's assert module checks whether a condition is met; if it is not, the task fails.
Typically we use it to check that a command we ran (like creating a test znode in Zookeeper) ran successfully.
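As a rough illustration of that pattern, the tasks below create a test znode, register the result and assert on it. This is only a sketch: the zkCli.sh path, the server address, the znode name and the exact "Created ..." text the CLI prints are assumptions, so adjust them to your own installation.

```yaml
# Sketch only: paths, address and znode name are hypothetical.
- name: Create a test znode in Zookeeper
  command: /opt/zookeeper/bin/zkCli.sh -server localhost:2181 create /ansible_validation test
  register: znode_create
  changed_when: false

# The assert fails the play if either condition is false.
- name: Assert that the znode was created
  assert:
    that:
      - znode_create.rc == 0
      # Assumes the CLI reports "Created <path>" on success.
      - "'Created /ansible_validation' in (znode_create.stdout + znode_create.stderr)"

# Clean up so the validation can be run again later.
- name: Remove the test znode
  command: /opt/zookeeper/bin/zkCli.sh -server localhost:2181 delete /ansible_validation
  changed_when: false
```

Checking the output as well as the return code is a deliberate choice here, since a command can exit cleanly without having done what you expected.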
The fail module is similar to assert, but it fails the task with a custom message whenever the condition you attach to it is met. Here we are validating our Cassandra cluster by running a stress test on it and making sure there are no errors.
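A minimal sketch of what such a check might look like follows. The cassandra_host variable, the number of operations and the way errors are detected (a non-zero return code or the word "Exception" on stderr) are assumptions for illustration, not our exact tasks.

```yaml
# Sketch only: cassandra_host is a hypothetical variable.
- name: Run a short write stress test against the cluster
  command: cassandra-stress write n=10000 -node {{ cassandra_host }}
  register: stress_result
  changed_when: false
  # Do not fail here; let the fail task below report the problem.
  failed_when: false

- name: Fail with a message if the stress test reported errors
  fail:
    msg: "Cassandra stress test failed on {{ inventory_hostname }}, rc={{ stress_result.rc }}"
  # Assumed error signal: non-zero exit or an exception in stderr.
  when: stress_result.rc != 0 or 'Exception' in stress_result.stderr
```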
The wait_for module is vital for testing all things networking. It tries to connect to a host on a specified port, with a configurable delay and timeout, and keeps retrying until the timeout expires. If it cannot connect, the task fails. Simple!
Normally when we encounter issues (a lot of our infrastructure is distributed), the first question asked is: can the applications talk to each other? Before we implemented validation playbooks, this meant trying to telnet from one machine to another to check whether ports were open, and using tools like netcat and tcpdump if we needed to go deeper. With wait_for we now have these tests ready to go, so if there is an issue we can run the validation playbook to rule out (or in) any networking problems.
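A connectivity check of this kind can be as small as the task below. It is a sketch under assumptions: zookeeper_host is a hypothetical variable, and 2181 is used only because it is the usual Zookeeper client port.

```yaml
# Sketch only: zookeeper_host is a hypothetical variable.
- name: Check that the Zookeeper client port is reachable from this host
  wait_for:
    host: "{{ zookeeper_host }}"
    port: 2181
    delay: 2        # seconds to wait before the first attempt
    timeout: 30     # give up and fail the task after this many seconds
    state: started  # succeed as soon as the port accepts connections
```

Because the task runs on whichever hosts the play targets, it tests the path between the machines that actually need to talk to each other, which is what the manual telnet check used to do.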
Here is a gist of a sample validation playbook (including the source of the test app for those who are interested) where I am testing our Spark cluster. Spark is relatively new to us here at Poppulo, so in getting to grips with it we had to troubleshoot all sorts of issues (e.g. workers not registering with the master). We added test cases to cover each of the issues we encountered.
The tasks in this playbook check that the various Spark UIs are available, check cluster connectivity and finally run a very simple test app (to make sure that Spark is fine under the hood). It may seem like common sense to have this, but it gives us great confidence in any Spark cluster we provision.
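For readers who cannot see the gist, the first two kinds of check might be sketched roughly as follows. This is not the gist itself: the spark_master group name is hypothetical, and ports 8080 and 7077 are assumed because they are the defaults for a standalone Spark master. The test app step is left out, since it depends on the application being submitted.

```yaml
# Sketch only: group name and ports are assumptions.
- hosts: spark_master
  gather_facts: false
  tasks:
    - name: Check that the Spark master UI responds
      uri:
        url: "http://{{ inventory_hostname }}:8080"
        status_code: 200

    - name: Check that the master port accepts connections
      wait_for:
        host: "{{ inventory_hostname }}"
        port: 7077
        timeout: 30
```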
Using the Ansible modules described above, we have ended up developing a set of validation playbooks that live alongside each of our higher-level playbooks. These give us the confidence to change and improve our infrastructure code, knowing that any negative impact of those changes will be caught early.
Personally, I have found that you spend a lot of time up front when investigating new pieces of technology. You not only figure out how to get it running and what configuration works, but then manually go in and verify your success. With Ansible it is trivial to encode this verification knowledge as a set of tasks in a playbook. As your experience with a technology grows and you inevitably troubleshoot issues and make changes, you can also add validation tasks to capture these issues, so that the person following you does not have to go through the same pain. You are now saying not only "here is how you install and configure it", but also "here is how I expect it to behave and integrate with the surrounding environment".