After some recent work with systemd I’ve realized it’s power and I can come clean that I am a fan. I realize that there are multitudes of posts out there with arguments both for and against systemd but let’s look at some nice ways to have systemd provide us with (but not limited to) pt-kill-as-a-service.
This brief post introduces you to a systemd unit file and how we can leverage them to enable pt-kill at startup, have it start after mysqld and ensure that MySQL is running by using the mysql service as a dependency of pt-kill. By using systemd to handle this we don’t have to complicate matters by ‘monitoring the monitor’ using hacky shell scripts, cron or utilities like monit.
So then, a quick primer on systemd, because lets face it, we’ve all been avoiding it. Systemd is not new but it made recent headlines in the Linux world due to some of the major distros announcing their intentions to migrate upcoming releases to systemd.
What is it? Well due to it’s depth it is best described as a suite of management daemons, libraries and tools that will replace the traditional init scripts. So essentially remember how you start a service, mount a volume or read the system logs…well start forgetting all of that because systemd is disrupting this space. With systemd comes some really neat tricks for administering your machines and I’m really only beginning to see the tip of this iceberg. There is admittedly a lot to learn with systemd but this should serve as pragmatic entrée.
Systemd what? When did this happen?
|Linux distribution||Date released as default|
|Arch Linux||000000002012-10-01-0000October 2012|
|CoreOS||000000002013-10-01-0000October 2013 (v94.0.0)|
|Debian||000000002015-04-01-0000April 2015 (v8 aka jessie)|
|Fedora||000000002011-05-01-0000May 2011 (v15)|
|Mageia||000000002012-05-01-0000May 2012 (v2.0)|
|openSUSE||000000002012-09-01-0000September 2012 (v12.2)|
|Red Hat Enterprise Linux||000000002014-06-01-0000June 2014 (v7.0)|
|SUSE Linux Enterprise Server||000000002014-10-01-0000October 2014 (v12)|
|Ubuntu||000000002015-04-01-0000April 2015 (v15.04)|
Lennart Poettering, the name frequently attached with systemd is seeking to modernize the most fundamental process(es) of the Linux startup system, bringing the paradigms of modern computing; concurrency, parallelism and efficiency. The dependency tree of processes and services is more intuitive and the structure of the underlying startup scripts are unified. I feel that the direction proposed by systemd is an evolutionary one which promotes consistency within the startup scripts enabling conventions that can be easier understood by a broader audience.
Systemd and Percona Toolkit
This post aims to show that we can rely on systemd to handle processes such as pt-kill, pt-stalk, and other daemonized scripts that we like to have running perpetually, are fired at startup and can be reinstated after failure.
The scenario is this; I want pt-kill to drop all sleeping connections from a certain application user, lets call them, ‘caffeinebob’, because they never close connections. Due to various reasons we can’t make changes in the application so we’re employing Percona Toolkit favourite, pt-kill, to do this for us. For convenience we want this result to persist across server restarts. In the olden days we might have some cron job that fires a shell script in combination with a sentinal file to ensure it’s running. I’m pretty sure that this kitty could be skinned many ways.
The systemd Unit File
After some research and testing, the below unit file will play nicely on a Centos 7 node with systemd at it’s core. In this example I am running Percona Server 5.6 installed using Percona’s yum repo with the mysql.service unit file generated at installation. I suspect that there could be some systemd deviation with other MySQL variants however, this configuration is working for me.
[Unit] Description = pt-kill caffeinebob After=syslog.target mysql.service Requires=mysql.service [Service] Type = simple PIDFile = /var/run/ptkill.pid ExecStart = /usr/bin/pt-kill --daemonize --pid=/var/run/ptkill.pid --interval=5 --defaults-file=/root/.my.cnf --log=/var/log/ptkill.log --match-user caffeinebob --busy-time 10 --kill --print Restart=on-abort [Install] WantedBy=multi-user.target
Let’s examine the above and see what we’re working with. Systemd unit files have various biologies. The example above is a simple Service unit file. This means we are enacting a process controlled and supervised by systemd. The significance of the
After directive is that this service will not attempt startup until after syslog.target and mysql.service have been called. The
Required directive is makes ptkill.service dependant on the mysql.service startup being successful.
The next part, the [Service] grouping, details the actions to be taken by the service. The
Type can be one of many but as it’s a simple call to a script I’ve used the
simple type. We are describing the command and the handling of it. The
ExecStart is evidently the pt-kill command that we would usually run from the shell prompt or from within a shell script. This is a very corse example because we can opt to parameterize the command with the assistance of an Environment file. Note the use of the
Restart directive, used so that systemd can handle a reaction should a failure occur that interrupts the process.
Finally under the [Install] grouping we’re telling systemd that this service should startup on a multi user system, and could be thought of as runlevel 2 or 3 (Multiuser mode).
So providing that we’ve got all the relevant paths, users and dependencies in place, once you reboot your host, mysql.service should in order, initiate mysqld and when that dependency is met, systemd will initiate pt-kill with our desired parameters to cull connections that meet the criteria stipulated in our configuration. This means you rely on systemd to manage pt-kill for you and you don’t necessarily need to remember to start this or similar processes when you restart you node.
Start up & enable
[moore@localhost ~]$ sudo systemctl start ptkill.service [moore@localhost ~]$ sudo systemctl enable ptkill.service
No feedback but no errors so we can check the status of the service
[moore@localhost ~]$ sudo systemctl status ptkill -l ptkill.service - keep pt-kill persistent across restarts Loaded: loaded (/etc/systemd/system/ptkill.service; enabled) Active: active (running) since Wed 2015-08-12 02:39:13 BST; 1h 19min ago Main PID: 2628 (perl) CGroup: /system.slice/ptkill.service └─2628 perl /usr/bin/pt-kill --daemonize --pid=/var/run/ptkill.pid --interval=5 --defaults-file=/root/.my.cnf --log=/var/log/ptkill.log --match-user caffeinebob --busy-time 10 --kill --print
Perfect we can also instruct systemd to disable this and|or stop our service when the application is changed and caffeinebob close() all those open connections.
[moore@localhost ~]$ sudo systemctl stop ptkill.service [moore@localhost ~]$ sudo systemctl disable ptkill.service
Now after successful implementation we see that our process is running delightfully;
[moore@localhost ~]$ ps -ef | grep pt-kill root 2547 1 0 02:37 ? 00:00:00 perl /usr/bin/pt-kill --daemonize --pid=/var/run/ptkill.pid --interval=5 --defaults-file=/root/.my.cnf --log=/var/log/ptkill.log --match-user caffeinebob --busy-time 10 --kill --print
Catch me if I fall
Lets issue a kill signal to the process and observe it’s behaviour using
[moore@localhost ~]$ sudo kill -SEGV 2547
This will write similar entries into the system log;
[moore@localhost ~]$ sudo journalctl -xn -f Aug 12 02:39:13 localhost.localdomain sudo: moore : TTY=pts/1 ; PWD=/home/moore ; USER=root ; COMMAND=/bin/kill -SEGV 2547 Aug 12 02:39:13 localhost.localdomain systemd: ptkill.service: main process exited, code=killed, status=11/SEGV Aug 12 02:39:13 localhost.localdomain systemd: Unit ptkill.service entered failed state. Aug 12 02:39:13 localhost.localdomain systemd: ptkill.service holdoff time over, scheduling restart. Aug 12 02:39:13 localhost.localdomain systemd: Stopping keep pt-kill persistent across restarts... -- Subject: Unit ptkill.service has begun shutting down -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit ptkill.service has begun shutting down. Aug 12 02:39:13 localhost.localdomain systemd: Starting keep pt-kill persistent across restarts... -- Subject: Unit ptkill.service has begun with start-up -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit ptkill.service has begun starting up. Aug 12 02:39:13 localhost.localdomain systemd: Started keep pt-kill persistent across restarts. -- Subject: Unit ptkill.service has finished start-up -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit ptkill.service has finished starting up. -- -- The start-up result is done.
Pt-kill flaps after the kill signal but systemd has been instructed to restart on failure so we don’t see
caffeinebob saturate our processlist with sleeping connections.
Another bonus with this workflow is use within orchestration. Any standardized unit files can be propagated to your fleet of hosts with tools such as Ansible, Chef, Puppet or Saltstack.
I’d love to hear from the pragmatists from the systemd world to understand if this approach can be improved or whether there are any flaws in this example unit file that would require addressing. This is very much a new-school of thought for me and feedback is both welcome and encouraged.
Thank you for your time, happy systemd-ing.