SIGTERM and PID 1: Why does a container linger after receiving a SIGTERM?
tl;dr: PID 1 is special on Linux. It is "unkillable", meaning that signals which would normally terminate a process with no handler installed do not terminate it. In other words, PID 1 must handle SIGTERM explicitly for the usual semantics to apply. I keep rediscovering this with containers...
Yesterday I spent way too much time trying to understand why a process (flexo) didn't die instantaneously when I tried to kill it.
The process was running in a container in a pod in k3s, and it would take up to 30s (the default value of terminationGracePeriodSeconds) before it died.
SIGTERM behaviors
In my recollection, the behavior for SIGTERM on Linux could be described as follows:
- If there is no handler for the signal, the process is killed by the kernel.
- If there's a handler, the process is free to do whatever it wants with the signal (i.e. it could do nothing at all).
- A process can also ignore the signal (by setting the handler to SIG_IGN in a call to sigaction).
- The process can also block the signal, in which case the signal becomes 'pending' and is delivered when it is unblocked.
There's a crucial piece missing in this list...
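The first three cases are easy to reproduce from a shell. Below is a minimal sketch, assuming bash on Linux: `probe` is a helper name I made up, and it uses bash's `trap` builtin as a stand-in for calling sigaction directly. It starts a child shell with a given disposition, sends it SIGTERM, and then looks at the exit status to see which signal actually ended it.

```shell
#!/bin/bash
# probe SETUP: start a child bash that runs SETUP and then idles,
# send it SIGTERM, and report whether SIGTERM terminated it.
# (`probe` is a hypothetical helper name used only for this illustration.)
probe() {
    local setup=$1 pid status
    bash -c "$setup; while :; do sleep 0.1; done" &
    pid=$!
    sleep 0.5                        # give the child time to run its setup
    kill -TERM "$pid"
    sleep 0.5
    kill -KILL "$pid" 2>/dev/null    # clean up if SIGTERM did not kill it
    wait "$pid" 2>/dev/null
    status=$?
    # A process killed by signal N is reported by the shell as status 128+N.
    case $status in
        143) echo "terminated by SIGTERM" ;;   # 128 + 15
        137) echo "survived SIGTERM" ;;        # 128 + 9: only SIGKILL ended it
        *)   echo "unexpected status $status" ;;
    esac
}

echo "no handler: $(probe ':')"
echo "ignored:    $(probe "trap '' TERM")"
echo "handled:    $(probe "trap ':' TERM")"
```

On a regular process, only the first case should report termination by SIGTERM; the ignored and handled cases should survive until the SIGKILL.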
Observing signal handling for a process
All of these cases can be observed by looking at the /proc/<pid>/status file:
...
SigPnd: 0000000000000000 # pending, thread
ShdPnd: 0000000000000000 # pending, process-wide (shared)
SigBlk: 0000000000000000 # blocked signals
SigIgn: 0000000000001000 # ignored signals
SigCgt: 0000000180000440 # caught/handled signals
...
The format of this file is described in proc(5).
These are hexadecimal signal masks, which can be converted to a human-readable format with the following snippet:
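#!/bin/bash
# Decode the signal masks found in /proc/<pid>/status into signal names.
# Adapted from https://stackoverflow.com/a/61365083
pid=${1:?Missing pid}
grep -E '(Sig|Shd)(Pnd|Blk|Ign|Cgt)' "/proc/$pid/status" | while read -r name mask; do
    # Convert the hexadecimal mask to binary (bc wants uppercase hex digits).
    bin=$(echo "ibase=16; obase=2; ${mask^^}" | bc)
    echo -n "$name $mask $bin "
    i=1
    # Walk the bits from the least significant one: bit i-1 stands for signal i.
    # The binary value is treated as a string to avoid integer overflow.
    while [[ -n $bin ]]; do
        if [[ ${bin: -1} == 1 ]]; then
            kill -l "$i" | tr '\n' ' '
        fi
        bin=${bin::-1}
        i=$((i + 1))
    done
    echo
done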
Output:
SigPnd: 0000000000000000 0
ShdPnd: 0000000000000000 0
SigBlk: 0000000000000000 0
SigIgn: 0000000000001000 1000000000000 PIPE
SigCgt: 0000000180000440 110000000000000000000010001000000 BUS SEGV
In my case (shown above), only SIGPIPE was ignored and SIGTERM didn't have a handler installed.
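When only one signal matters, decoding the whole mask is more than needed. Here is a small sketch that tests a single bit, assuming bash and the Linux signal numbering (SIGTERM is 15); `mask_has_signal` is a name I made up for the example.

```shell
#!/bin/bash
# mask_has_signal MASK SIGNUM: succeed if signal number SIGNUM is set in the
# hexadecimal MASK taken from /proc/<pid>/status.
# (`mask_has_signal` is a hypothetical helper, not part of any tool.)
mask_has_signal() {
    local mask=$1 signum=$2
    # Bit (signum - 1) of the mask corresponds to signal number signum.
    (( (16#$mask >> (signum - 1)) & 1 ))
}

sigcgt=0000000180000440    # the SigCgt value shown in the output above
if mask_has_signal "$sigcgt" 15; then    # 15 is SIGTERM on Linux
    echo "SIGTERM has a handler"
else
    echo "SIGTERM has no handler"        # this branch is taken for the mask above
fi
```

For a live process, the mask can be read with something like `awk '/^SigCgt:/ {print $2}' /proc/<pid>/status` and passed as the first argument.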
I tried to observe the behavior with strace, perf record, gdb, etc., and I could see the signal being delivered, but the process didn't die.
Oh, also: if the process was started under strace (strace prog as opposed to strace -p <pid-of-prog>), the process would die when it received the signal. Very amusing.
Pid 1 is special
After staring at the traces for a while, I remembered that I had debugged this in a former professional life, like 5-6 years ago:
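PID 1 is special on Linux: it is unkillable, meaning that it doesn't get killed by signals which would terminate regular processes. The kernel only delivers a signal to PID 1 if PID 1 has installed a handler for it; otherwise the signal is dropped. The exceptions are SIGKILL and SIGSTOP sent from outside the container's PID namespace, which is why the process did eventually die once the grace period expired.
So, in a container, the first process that is started really must install a handler for SIGTERM, otherwise it will stick around.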
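If the program itself cannot be changed, one option is a small entrypoint script that becomes PID 1 and installs the handler on its behalf. The following is only a sketch under that assumption, not what I ended up doing: `forward_sigterm` is a made-up name, and the script assumes bash is available in the image.

```shell
#!/bin/bash
# forward_sigterm CMD [ARGS...]: run CMD as a child and relay SIGTERM to it.
# Because the shell installs a handler with `trap`, SIGTERM is delivered to it
# even when it runs as PID 1. The child is not PID 1, so the default action
# applies to it. (`forward_sigterm` is a hypothetical helper name.)
forward_sigterm() {
    local status
    got_term=0
    "$@" &
    child=$!
    trap 'got_term=1; kill -TERM "$child" 2>/dev/null' TERM
    wait "$child"; status=$?
    # A trapped signal interrupts `wait`; wait again for the real exit status.
    if (( got_term )); then
        wait "$child"; status=$?
    fi
    trap - TERM
    return "$status"
}

# When used as a container entrypoint, run the command given as arguments.
if (( $# > 0 )); then
    forward_sigterm "$@"
    exit $?
fi
```

With this as the entrypoint, the wrapper receives the SIGTERM, passes it on, and exits with the child's status instead of waiting out the grace period.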
Solution
In my case I enabled shareProcessNamespace, which makes the pause process PID 1 in the pod.
This process is very simple: it reaps child processes so they don't linger as zombies, and it handles SIGTERM properly.
With that change in place, deleting the pod causes the main process of every container in the pod to receive a SIGTERM. pause has a handler and exits quickly; flexo doesn't have a handler, but it is no longer PID 1, so it gets killed, since that's the default action for the SIGTERM signal.
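For reference, this is roughly where the setting goes in a pod manifest. It is a minimal sketch: the pod name, container name, and image are placeholders I made up, and only the shareProcessNamespace line is the actual change.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: flexo                      # placeholder name
spec:
  shareProcessNamespace: true      # pause becomes PID 1 for every container in the pod
  containers:
    - name: flexo                  # placeholder name
      image: registry.example/flexo:latest   # placeholder image
```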
Docker
For processes running under Docker, the easiest solution is to use the --init flag, which will run tini as the init (PID 1) process in the container.
Docker compose
For docker-compose, the same behavior can be enabled by activating the init parameter:
services:
  web:
    image: alpine:latest
    init: true