Using OpenStack-Ansible to performance test rolling upgrades

I’ve spent the last year or so dabbling with ways to provide consistent performance results for keystone. In addition to that, the keystone community has been working to implement rolling upgrade support. Getting both of these tested and into the gate would be a huge step forward for developers and deployers.

Today I hopped into IRC and Jesse from the OpenStack-Ansible team passed me a review that used OpenStack-Ansible to performance test keystone during a rolling upgrade… Since that pretty much qualifies as one of the coolest reviews someone has ever handed me, I couldn’t wait to test it out. I was able to do everything on a fresh Ubuntu 16.04 virtual machine with 8 GB of memory and 8 VCPUs, which I brought up to speed following the initial steps in the OpenStack-Ansible AIO guide.

Next I made sure I had pip and tox available, as well as tmux as a personal preference. Luckily, the OpenStack-Ansible team does a good job of managing binary dependencies in tree, which makes getting fresh installs up and off the ground virtually headache-free. Since the patch was still in review at the time of this writing, I went ahead and checked it out from Gerrit.

From here, the os_keystone role should be able to set up the infrastructure and environment. Another nice thing about the various roles in OpenStack-Ansible is that they provide isolated tox environments, much like the ones you would use for building docs, linting syntax, or running tests against a specific version of Python. In this case, there happens to be one dedicated to upgrades. Behind the scenes it prepares the infrastructure, installs LXC, orchestrates multiple installations of the most recent stable keystone release isolated into separate containers (which plays a crucial role in achieving rolling upgrades), installs the latest keystone source code from master, and performs a rolling upgrade (whew!). Lucky for us, we only have to run one tox command.

The first time I ran tox locally, I did get one failure caused by the absence of libpq-dev while installing the requirements for os_tempest.

Other folks were seeing the same thing, but only locally. For some reason the gate was not hitting this specific issue (maybe it was using wheels?). There is a patch up for review to fix it. With that applied, I reran tox and everything passed.

Not only did the rolling upgrade succeed according to os_keystone’s functional tests, but the performance test results were reported as well. There were 2527 total requests during the execution of the upgrade, 10 of which resulted in an error (this could probably use some tweaking to see whether adjusting node rotation timing with HAProxy mitigates those failures).

Next Steps

Propose a rolling upgrade keystone gate job

Now that we have a consistent way to test rolling upgrades while running a performance script, we can start folding this into other gate jobs. It would be awesome to leverage this work to test every proposed patch, ensuring keystone not only stays performant but also maintains our commitment to delivering rolling upgrades.

Build out the performance script

The performance script is just Python that gets fed into Locust. The current version is really simple and only focuses on authenticating for a token and validating it. Locust is flexible enough to let writers add new test cases and even assign different call percentages to different operations (e.g. authenticate for a token 30% of the time and validate 70% of the time). Since it’s all Python making API calls, Locust test cases are really just functional API tests. This makes it easy to propose patches that add more scenarios as we move forward, increasing our rolling upgrade test coverage. From the output we should be able to inspect which calls failed, just like we saw today with the 10 authentication/validation failures.
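To make that concrete, here is a minimal sketch of what such a script could look like, written against a recent Locust release and keystone’s standard v3 token APIs. The credentials, wait times, and 30/70 task weighting are illustrative assumptions on my part, not the actual script from the review.

```python
from locust import HttpUser, task, between

# Illustrative credentials only -- a real script would pull these from the
# deployment's configuration rather than hard-coding them.
AUTH_BODY = {
    "auth": {
        "identity": {
            "methods": ["password"],
            "password": {
                "user": {
                    "name": "admin",
                    "domain": {"id": "default"},
                    "password": "super-secret",
                }
            }
        }
    }
}


class KeystoneTokenUser(HttpUser):
    """Authenticate for keystone tokens and validate them via the v3 API."""

    wait_time = between(0.5, 2)

    def on_start(self):
        # Grab an initial token so the validate task has something to check.
        self.token = self._authenticate()

    def _authenticate(self):
        response = self.client.post(
            "/v3/auth/tokens", json=AUTH_BODY, name="POST /v3/auth/tokens"
        )
        # Keystone returns the token ID in the X-Subject-Token header.
        return response.headers.get("X-Subject-Token")

    @task(3)  # roughly 30% of requests
    def authenticate(self):
        self.token = self._authenticate()

    @task(7)  # roughly 70% of requests
    def validate(self):
        self.client.get(
            "/v3/auth/tokens",
            headers={"X-Auth-Token": self.token, "X-Subject-Token": self.token},
            name="GET /v3/auth/tokens",
        )
```

Pointing Locust at the deployment’s keystone endpoint while the upgrade playbooks run (something like locust -f keystone_tokens.py --host http://localhost:5000, where the file name is whatever we call the script) is roughly the workflow the review automates for us.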

Publish performance results

With this running as part of the gate, it would be a waste not to stash or archive the results from each run (especially if two separate projects are running it). We could even look into running it on dedicated hardware somewhere, similar to the performance testing project I was experimenting with last year. The OSIC Performance Bot would technically become a first-class gate job (and we could retire the first iteration of it!). All the results could be stuffed away somewhere and made available for people to write tools that analyze them. I’d personally like to revamp our keystone performance site to continuously update according to the performance results from the latest master patch. Maybe we could even work some sort of performance view into OpenStack Health.
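As a rough idea of what those analysis tools could consume, here is a small sketch that assumes each gate run archives a one-file JSON summary (a made-up layout, not an existing artifact) and rolls them up into a single timeline a dashboard could plot:

```python
import json
from pathlib import Path

# Hypothetical layout: one JSON summary archived per gate run, e.g.
#   results/2017-03-01T12-00-00.json -> {"total": 2527, "failures": 10}
RESULTS_DIR = Path("results")


def build_timeline(results_dir=RESULTS_DIR):
    """Collect per-run summaries into a single list for plotting over time."""
    timeline = []
    for path in sorted(results_dir.glob("*.json")):
        summary = json.loads(path.read_text())
        timeline.append({
            "run": path.stem,
            "total_requests": summary.get("total", 0),
            "failures": summary.get("failures", 0),
        })
    return timeline


if __name__ == "__main__":
    print(json.dumps(build_timeline(), indent=2))
```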

The final bit that helps seal the deal is that we get all of this at the cost of a single virtual machine. Since OpenStack-Ansible uses containers to isolate services, we can feel confident testing rolling upgrades while consuming only minimal gate resources.

I’m looking forward to doing a follow-up post as we hopefully start incorporating this into our gate.