Daemontools and runit
Tired of PID files, needing root access, and writing init scripts just
to have your UNIX apps start when your server boots? Want a simpler,
better alternative that will also restart them if they crash? If so,
this is an introduction to process supervision with runit/daemontools.
Classic init scripts, e.g. /etc/init.d/apache, are widely used for
starting processes at system boot time, when they are executed by init.
Sadly, init scripts are cumbersome and error-prone to write, they must
typically be edited and run as root, and the processes they launch do
not get restarted automatically if they crash.
In an alternative scheme called "process supervision", each important
process is looked after by a tiny supervising process, which deals with
starting and stopping the important process on request, and re-starting
it when it exits unexpectedly. Those supervising processes can in turn
be supervised by other supervising processes.
Dan Bernstein wrote the process supervision toolkit, "daemontools",
which is a set of small, reliable programs that cooperate in the
UNIX tradition to manage process supervision trees.
Runit is a more conveniently licensed and more actively maintained
reimplementation of daemontools, written by Gerrit Pape.
Here I’ll use runit; however, the ideas are the same for other
daemontools-like projects (there are several).
Service directories and scripts
In runit parlance a "service" is simply a directory containing a script
named "run".
There are just two key programs in runit. Firstly, runsv supervises the
process for an individual service. Service directories themselves sit
inside a containing directory, and the runsvdir program supervises that
directory, running one child runsv process for the service in each
subdirectory. A typical choice is to start an instance of runsvdir
which supervises services in subdirectories of /var/service/.
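As a minimal sketch of the layout described above, the following builds
a throwaway service directory under a temporary location instead of
/var/service/. The service name "hello" and its heartbeat loop are made up
for illustration; only the directory-plus-run-script shape comes from runit.

```shell
#!/bin/sh
# Build a scratch tree so nothing under /var/service/ is touched.
svroot=$(mktemp -d)
mkdir -p "$svroot/hello"

# The "run" script is the whole service definition.
# runsv starts it and starts it again whenever it exits.
cat > "$svroot/hello/run" <<'EOF'
#!/bin/sh
exec 2>&1
# Hypothetical workload: print a heartbeat every 5 seconds.
exec sh -c 'while :; do echo "hello is alive"; sleep 5; done'
EOF
chmod +x "$svroot/hello/run"

echo "service directory created at $svroot/hello"
# To supervise every subdirectory of $svroot, one would then run:
#   runsvdir "$svroot"
```

The run script uses exec so that the supervised process replaces the shell
and runsv ends up watching the real workload rather than a wrapper.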
If SERVICE_DIR/log/ exists, runsv will supervise two services, the main
service and its log service, and will connect the stdout of the main
service to the stdin of the log service.
This is primarily used for logging.
You can debug an individual service by running its SERVICE_DIR/run script.
In this case, its stdout and stderr go to your terminal.
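A rough illustration of that debugging step, using a made-up service in
a scratch directory (the name "demo" and its one-line run script are
assumptions, not part of the examples in this directory):

```shell
#!/bin/sh
# Hypothetical service in a scratch directory, used only to show the debugging step.
svc=$(mktemp -d)/demo
mkdir -p "$svc"
cat > "$svc/run" <<'EOF'
#!/bin/sh
echo "demo service starting"
EOF
chmod +x "$svc/run"

# Run the script by hand from inside the service directory, as runsv would.
# stdout and stderr would normally land on the terminal; they are captured
# here only so the result can be printed and inspected.
output=$(cd "$svc" && ./run 2>&1)
echo "$output"
```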
You can also run "runsv SERVICE_DIR", which runs both the service
and its logger service (SERVICE_DIR/log/run) if a logger service exists.
In that case, the output goes to the logger instead of the terminal.
"runsvdir /var/service" merely runs "runsv SERVICE_DIR" for every subdirectory
of /var/service.
This directory contains some examples of services:
Runs a getty on <tty>. (run script looks at $PWD and extracts suffix
after "_" as tty name). Create copies (or symlinks) of this directory
with different names to run many gettys on many ttys.
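One way a run script could pull the tty name out of its directory name
is plain parameter expansion. This is a sketch, not necessarily how the
getty run script here does it; the function name is invented.

```shell
#!/bin/sh
# Print the part of a directory name that follows the last "_".
# For a service directory named getty_tty1 this prints "tty1".
suffix_after_underscore() {
    printf '%s\n' "${1##*_}"
}

# Inside a run script the argument would be "$PWD".
suffix_after_underscore "/var/service/getty_tty1"
```

Because "##*_" strips the longest prefix ending in "_", underscores earlier
in the path (as in var_service) do not confuse it.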
Runs gpm, the cut and paste utility and mouse server for text consoles.
Runs inetd. This is an example of a service with log. Log service
writes timestamped, rotated log data to /var/log/service/inetd/*
using "svlogd -tt". p_log and w_log scripts demonstrate how you can
"page log" and "watch log".
Other services which have logs handle them in the same way.
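For illustration, a log/run script along these lines would produce the
behaviour described for inetd. It is written into a scratch directory here;
the exact contents of the real log/run script are an assumption, only
"svlogd -tt" and the log location are taken from the text above.

```shell
#!/bin/sh
# Scratch copy of a service with a logger; the real one would live in
# /var/service/inetd.
svc=$(mktemp -d)/inetd
mkdir -p "$svc/log"

# log/run receives the main service's stdout on its stdin.
cat > "$svc/log/run" <<'EOF'
#!/bin/sh
logdir=/var/log/service/inetd
mkdir -p "$logdir"
# -tt prefixes each line with a human-readable UTC timestamp;
# svlogd rotates the "current" file inside $logdir on its own.
exec svlogd -tt "$logdir"
EOF
chmod +x "$svc/log/run"

echo "logger script written to $svc/log/run"
```

Since svlogd keeps the newest data in a file named "current", a "watch log"
helper can be as small as "tail -f /var/log/service/inetd/current". Whether
w_log is written exactly that way is a guess.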
Runs nmeter '%t %c ....' with output to /dev/tty9. This gives you
a 1-second sampling of server load and health on a dedicated text console.
In many cases, network configuration makes it necessary to run several daemons:
dhcp, zeroconf, ppp, openvpn and such. They need to be controlled,
and in many cases you also want to babysit them.
They present a case where different services need to control (start, stop,
restart) each other.
Controls a udhcpc instance which provides a DHCP-assigned IP
address on the interface named "if". Copy/rename this directory as needed to run
udhcpc on other interfaces (var_service/dhcp_if/run script uses the _foo suffix
of the parent directory as the interface name).
When IP address is obtained or lost, var_service/dhcp_if/dhcp_handler is run.
It saves new config data to /var/run/service/fw/dhcp_if.ipconf and (re)starts
/var/service/fw service. This example can be used as a template for other
dynamic network link services (ppp/vpn/zcip).
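The handler's job can be sketched roughly as follows. udhcpc passes the
event name as the first argument and the lease data in environment
variables such as ip, subnet, router and dns. The name=value file format
and the function names below are hypothetical, not the format the real
dhcp_handler writes.

```shell
#!/bin/sh
# Write the lease data to a file, one name=value per line.
# The file is written under a temporary name first so readers never
# see a half-written config.
save_ipconf() {
    out=$1
    {
        echo "ip=${ip:-}"
        echo "subnet=${subnet:-}"
        echo "router=${router:-}"
        echo "dns=${dns:-}"
    } > "$out.tmp" && mv "$out.tmp" "$out"
}

# React to a udhcpc event: save the config on bound/renew, drop it on deconfig.
handle_event() {
    event=$1
    out=$2
    case "$event" in
        bound|renew) save_ipconf "$out" ;;
        deconfig)    rm -f "$out" ;;
    esac
    # A real handler would now ask runsv to (re)run the fw service:
    #   sv u /var/service/fw
}

# Dry run against a scratch file, with documentation-range addresses.
conf=$(mktemp -d)/dhcp_if.ipconf
ip=192.0.2.10 subnet=255.255.255.0 router=192.0.2.1 dns=192.0.2.1
handle_event bound "$conf"
cat "$conf"
```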
This is an example of a service which has a "finish" script. If downed ("sv d"),
"finish" is executed. For this service, it removes the DHCP address from
the interface. This is useful when ifplugd detects that the link is dead
(cable is no longer attached anywhere) and downs us: keeping DHCP-configured
addresses on the interface would make the kernel still try to use it.
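A finish script for this purpose might boil down to something like the
sketch below. It assumes the interface name is the directory suffix, and it
flushes every address on the interface, which is broader than removing only
the DHCP one. IP_CMD is an invented override so the sketch can be dry-run
without root.

```shell
#!/bin/sh
# Remove all addresses from the interface named by the directory suffix,
# e.g. dhcp_eth0 -> eth0. IP_CMD defaults to the real "ip" tool.
remove_addresses() {
    dir=$1
    iface=${dir##*_}
    ${IP_CMD:-ip} addr flush dev "$iface"
}

# In a finish script the call would be: remove_addresses "$PWD"
# Dry run: substitute echo for ip to see the command that would be issued.
IP_CMD=echo remove_addresses /var/service/dhcp_eth0
```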
Zeroconf IP service: assigns a 169.254.x.y/16 address to interface "if".
This allows talking to other devices on a network without a DHCP server
(if they also assign 169.254 addresses to themselves).
Watches link status of interface "if". Downs and ups /var/service/dhcp_if
service accordingly. In effect, it allows you to unplug/plug-to-different-network
and have your IP properly re-negotiated at once.
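The link-watching logic amounts to mapping a link event onto an sv
command. The sketch below assumes the handler receives the interface name
and then "up" or "down" as arguments, and it only prints the command it
would run; the function name is made up.

```shell
#!/bin/sh
# Map a link event to the sv command that should follow it.
# Arguments are assumed to be: interface name, then "up" or "down".
link_event_command() {
    iface=$1
    state=$2
    case "$state" in
        up)   echo "sv u /var/service/dhcp_$iface" ;;
        down) echo "sv d /var/service/dhcp_$iface" ;;
        *)    return 1 ;;
    esac
}

# Cable pulled on eth0: the DHCP service for it should be downed.
link_event_command eth0 down
```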
Uses var_service/dhcp_if's data to determine router IP. Pings it.
If ping fails, restarts /var/service/dhcp_if service.
Basically, an example of watchdog service for networks which are not reliable
and need babysitting.
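A watchdog of this kind reduces to "probe, and restart on failure".
In this sketch the router address is passed in as an argument rather than
read from the dhcp_if data, and PROBE is an invented override so the logic
can be exercised without sending real pings.

```shell
#!/bin/sh
# Probe the router once and report what the watchdog should do.
# PROBE defaults to a single ping with a 2-second timeout.
check_router() {
    router=$1
    if ${PROBE:-ping -c 1 -W 2} "$router" >/dev/null 2>&1; then
        echo "ok"
    else
        # A real watchdog would restart the DHCP service here, e.g.
        #   sv t /var/service/dhcp_if
        # (TERM is sent, and runsv starts the service again if it is wanted up).
        echo "restart"
    fi
}

# Dry run with a probe that always fails.
PROBE=false check_router 192.0.2.1
```

A real run script would call this in a loop with a sleep between probes.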
Wireless supplicant (wifi association and encryption daemon) service for
"Firewall" script, although it is tasked with much more than setting up firewall.
It is responsible for all aspects of network configuration.
This is an example of *one-shot* service.
It reconfigures network based on current known state of ALL interfaces.
Uses conf/*.ipconf (static config) and /var/run/service/fw/*.ipconf
(dynamic config from dhcp/ppp/vpn/etc) to determine what to do.
One-shot-ness of this service means that it shuts itself off after single run.
IOW: it is not a constantly running daemon sort of thing.
It starts, it configures the network, it shuts down, all done
(unlike infamous NetworkManagers which sit in RAM forever).
However, any dhcp/ppp/vpn or similar service can restart it anytime
it senses a change in network configuration.
This even works while fw service runs: if dhcp signals fw to (re)start
while fw runs, fw will not stop after its execution, but will re-execute once,
picking up dhcp's new configuration.
This is achieved very simply by having
# Make ourself one-shot
sv o .
at the very beginning of fw/run script, not at the end.
Therefore, any "sv u /var/service/fw" command by any other
script "undoes" the o(ne-shot) command if fw still runs, thus
runsv will rerun it; or start it in a normal way if fw is not running.
This mechanism is the reason why fw is a service, not just a script.
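To see why the placement of the one-shot command matters, here is a toy
model of the interaction in plain shell. It does not call runsv or sv at
all: the "want" variable stands in for the state runsv tracks, and the
function and variable names are invented for the illustration.

```shell
#!/bin/sh
# Toy model: "want" is the desired state the supervisor tracks.
want=up
runs=0

fw_run() {
    # Stands in for "sv o ." as the first command of fw/run.
    want=once
    runs=$((runs + 1))
    # On the first pass only, pretend dhcp issued "sv u" while fw was
    # still running, which flips the desired state back to "up".
    [ "$runs" -eq 1 ] && want=up
    return 0
}

# Simulated supervisor: rerun the service for as long as it is wanted up.
while [ "$want" = up ]; do
    fw_run
done

echo "fw ran $runs times"
```

The second pass sees no further "sv u", so the state stays at "once" and
the loop ends. Had the one-shot command been placed at the end of the
script instead, it would have overwritten the request made by dhcp and the
new configuration would have been missed.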
System administrators are expected to edit fw/run script, since
network configuration needs are likely to be very complex and different
for non-trivial installations.
Examples of typical network daemons.