Blog posts for tags/recipe

How to always execute exit functions in Python

...or why atexit.register() and signal.signal() are evil

UPDATE (2016-02-13): this recipe no longer handles SIGINT, SIGQUIT and SIGABRT as aliases for "application exit" because that was a bad idea. It only handles SIGTERM. It also no longer supports Windows, because the signal.signal() implementation there is too different from POSIX.

Many people erroneously think that any function registered via the atexit module is guaranteed to be executed when the program terminates. You may have noticed this is not the case when, for example, you daemonize your app in production and then try to stop or restart it: the cleanup functions are not executed. This is because functions registered with the atexit module are not called when the program is killed by a signal:

import atexit, os, signal

@atexit.register
def cleanup():
    print("on exit")  # XXX this never gets printed

os.kill(os.getpid(), signal.SIGTERM)

Note that the same thing would happen if we used a "finally" clause instead of atexit.register(). It turns out the correct way to make sure the exit function is always called when a signal is received is to register it via signal.signal(). That has a drawback though: if a third-party module has already registered a function for that signal (SIGTERM or whatever), your new function will overwrite the old one:

import os, signal

def old(*args):
    print("old")  # XXX this never gets printed

def new(*args):
    print("new")

signal.signal(signal.SIGTERM, old)
signal.signal(signal.SIGTERM, new)
os.kill(os.getpid(), signal.SIGTERM)

Also, we would still have to use atexit.register() so that the function is also called on "clean" interpreter exit, and take into account signals other than SIGTERM which would cause the process to terminate. This recipe attempts to address all these issues so that:

the exit function is always executed on SIGTERM and on "clean" interpreter exit.
any exit function(s) previously registered via atexit.register() or signal.signal() will be executed as well (after the new one).
It must be noted that the exit function will never be executed in case of SIGKILL, SIGSTOP or os._exit().
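The claims above about atexit and "finally" can be checked with a small sketch like the following. It is not part of the recipe: it simply spawns a child interpreter (the CHILD script is made up for illustration) which sends SIGTERM to itself, then inspects what the child printed. It assumes a POSIX system and Python 3.5+ for subprocess.run().

```python
import signal
import subprocess
import sys
import textwrap

# Hypothetical child script: it registers an atexit function, enters a
# try/finally block and then kills itself with SIGTERM.
CHILD = textwrap.dedent("""
    import atexit, os, signal

    @atexit.register
    def cleanup():
        print("atexit called")

    try:
        os.kill(os.getpid(), signal.SIGTERM)
    finally:
        print("finally called")
""")

proc = subprocess.run([sys.executable, "-c", CHILD],
                      stdout=subprocess.PIPE, universal_newlines=True)

# With the default SIGTERM disposition neither message should show up,
# and the return code should be the negated signal number.
print("child output: %r" % proc.stdout)
print("child return code: %s" % proc.returncode)
```

If SIGTERM is ignored in the parent environment the child inherits that disposition, in which case both messages would be printed instead.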

The code

"""
Function / decorator which tries very hard to register a function to
be executed at interpreter exit.

Author: Giampaolo Rodola'
License: MIT
"""

from __future__ import print_function
import atexit
import os
import functools
import signal
import sys


_registered_exit_funs = set()
_executed_exit_funs = set()


def register_exit_fun(fun=None, signals=[signal.SIGTERM],
                      logfun=lambda s: print(s, file=sys.stderr)):
    """Register a function which will be executed on "normal"
    interpreter exit or in case one of the `signals` is received
    by this process (differently from atexit.register()).
    Also, it makes sure to execute any other function which was
    previously registered via signal.signal(). If any, it will be
    executed after our own `fun`.

    Functions which were already registered or executed via this
    function will be ignored.

    Note: there's no way to escape SIGKILL, SIGSTOP or os._exit(0)
    so don't bother trying.

    You can use this either as a function or as a decorator:

        @register_exit_fun
        def cleanup():
            pass

        # ...or

        register_exit_fun(cleanup)

    Note about Windows: I tested this some time ago and it didn't
    work exactly the same as on UNIX. I haven't tested it since
    then, so it may not work on Windows.

    Parameters:

    - fun: a callable
    - signals: a list of signals for which this function will be
      executed (default SIGTERM)
    - logfun: a logging function which is called when a signal is
      received. Default: print to standard error. May be set to
      None if no logging is desired.
    """
    def stringify_sig(signum):
        if sys.version_info < (3, 5):
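            # Skip SIG_DFL / SIG_IGN: they are handler constants, not
            # signal names, and their values clash with real signals.
            smap = dict([(getattr(signal, x), x) for x in dir(signal)
                         if x.startswith('SIG') and
                         not x.startswith('SIG_')])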
            return smap.get(signum, signum)
        else:
            return signum

    def fun_wrapper():
        if fun not in _executed_exit_funs:
            try:
                fun()
            finally:
                _executed_exit_funs.add(fun)

    def signal_wrapper(signum=None, frame=None):
        if signum is not None:
            if logfun is not None:
                logfun("signal {} received by process with PID {}".format(
                    stringify_sig(signum), os.getpid()))
        fun_wrapper()
        # Only return the original signal this process was hit with
        # in case fun returns with no errors, otherwise process will
        # return with sig 1.
        if signum is not None:
            if signum == signal.SIGINT:
                raise KeyboardInterrupt
            # XXX - should we do the same for SIGTERM / SystemExit?
            sys.exit(signum)

    def register_fun(fun, signals):
        if not callable(fun):
            raise TypeError("{!r} is not callable".format(fun))
        set([fun])  # raise exc if obj is not hash-able

        signals = set(signals)
        for sig in signals:
            # Register function for this signal and pop() the previously
            # registered one (if any). This can either be a callable,
            # SIG_IGN (ignore signal) or SIG_DFL (perform default action
            # for signal).
            old_handler = signal.signal(sig, signal_wrapper)
            if old_handler not in (signal.SIG_DFL, signal.SIG_IGN):
                # ...just for extra safety.
                if not callable(old_handler):
                    continue
                # This is needed otherwise we'll get a KeyboardInterrupt
                # traceback on interpreter exit, even if the process exited
                # with sig 0.
                if (sig == signal.SIGINT and
                        old_handler is signal.default_int_handler):
                    continue
                # There was a function which was already registered for this
                # signal. Register it again so it will get executed (after our
                # new fun).
                if old_handler not in _registered_exit_funs:
                    atexit.register(old_handler)
                    _registered_exit_funs.add(old_handler)

        # This further registration will be executed in case of clean
        # interpreter exit (no signals received).
        if fun not in _registered_exit_funs or not signals:
            atexit.register(fun_wrapper)
            _registered_exit_funs.add(fun)

    # This piece of machinery handles 3 usage cases. register_exit_fun()
    # used as:
    # - a function
    # - a decorator without parentheses
    # - a decorator with parentheses
    if fun is None:
        # Decorator with parentheses: register the decorated function
        # and hand it back unchanged.
        def outer(fun):
            register_fun(fun, signals)
            return fun
        return outer
    else:
        register_fun(fun, signals)
        return fun

Usage

As a function:

def cleanup():
    print("cleanup")

register_exit_fun(cleanup)

As a decorator:

@register_exit_fun
def cleanup():
    print("cleanup")

Unit tests

This recipe is hosted on ActiveState and has a full set of unit tests. It works with Python 2 and 3.

Notes about Windows

On Windows signals are only partially supported, meaning a function which was previously registered via signal.signal() will be executed only on interpreter exit, but not if the process receives a signal. Apparently this is a limitation of either Windows or the signal module.

Because of how differently signal.signal() behaves on Windows, this code is UNIX only; see BPO-26350.

Proposal for stdlib inclusion

The fact that the atexit module does not handle signals and that signal.signal() overwrites previously registered handlers is unfortunate. It is also confusing because it is not immediately clear which one you are supposed to use (and it turns out you're supposed to use both). Most of the time you have no idea (or don't care) that you're overwriting another exit function. As a user, I would just want to execute an exit function, no matter what, possibly without messing with whatever a module I've previously imported has done with signal.signal(). To me this suggests there could be space for something like atexit.register_w_signals.
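To make the "without messing with previously registered handlers" part more concrete, here is a minimal sketch of handler chaining. The add_signal_handler() helper is hypothetical (it is neither part of the stdlib nor of the recipe above) and the demo uses SIGUSR1 rather than SIGTERM only so that the process keeps running. POSIX only.

```python
import os
import signal
import time


def add_signal_handler(signum, new_handler):
    """Hypothetical helper: install new_handler for signum and keep
    calling the handler that was installed before it (if any).
    """
    old_handler = signal.getsignal(signum)

    def chained(signum, frame):
        new_handler(signum, frame)
        # SIG_DFL, SIG_IGN and None (handler not installed from
        # Python) cannot be called, so they are skipped.
        if (old_handler not in (signal.SIG_DFL, signal.SIG_IGN, None)
                and callable(old_handler)):
            old_handler(signum, frame)

    signal.signal(signum, chained)
    return chained


calls = []
# Pretend a third-party module registered this one first.
signal.signal(signal.SIGUSR1, lambda signum, frame: calls.append("old"))
# Plain signal.signal() would drop "old"; the helper keeps it.
add_signal_handler(signal.SIGUSR1, lambda signum, frame: calls.append("new"))

os.kill(os.getpid(), signal.SIGUSR1)
time.sleep(0.1)  # give the interpreter a chance to run the handlers
print(calls)
```

The new handler runs first and the old one afterwards, which mirrors the ordering the recipe aims for.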

External discussions

Python and sendfile

sendfile(2) is a UNIX system call which provides a "zero-copy" way of copying data from one file descriptor (a file) to another (a socket). Because this copying is done entirely within the kernel, sendfile(2) is more efficient than the combination of file.read() and socket.send(), which requires transferring data to and from user space. This copying of the data twice imposes some performance and resource penalties which the sendfile(2) syscall avoids; it also results in a single system call (and thus only one context switch), rather than the series of read(2) / write(2) system calls (each system call requiring a context switch) used internally for the data copying. A more exhaustive explanation of how sendfile(2) works is available here, but the long story short is that sending a file with sendfile() is usually twice as fast as using plain socket.send(). Typical applications which can benefit from using sendfile() are FTP and HTTP servers.
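The difference between the two approaches can be sketched as follows. This is only an illustration, assuming Linux and Python 3.3+ (for os.sendfile()); the helper names are made up, a socketpair stands in for a real network connection, and the payload is kept small so that it fits in the socket buffer. It shows the two code paths, it does not measure their speed.

```python
import os
import socket
import tempfile

PAYLOAD = b"x" * 4096  # small enough to fit in the socket buffer


def send_with_read_send(sock, f, blocksize=1024):
    """User space copy: each chunk goes kernel -> Python -> kernel."""
    total = 0
    while True:
        data = f.read(blocksize)
        if not data:
            return total  # EOF
        sock.sendall(data)
        total += len(data)


def send_with_sendfile(sock, f):
    """In-kernel copy: the file data never enters the Python process."""
    size = os.fstat(f.fileno()).st_size
    offset = 0
    while offset < size:
        sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
        if sent == 0:
            break  # EOF
        offset += sent
    return offset


def recv_exactly(sock, n):
    """Read n bytes from sock (or less if the peer closes early)."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            break
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


results = {}
with tempfile.TemporaryFile() as f:
    f.write(PAYLOAD)
    f.flush()
    for name, sender in (("read/send", send_with_read_send),
                         ("sendfile", send_with_sendfile)):
        f.seek(0)
        a, b = socket.socketpair()
        with a, b:
            sent = sender(a, f)
            # Store how many bytes were sent and whether they match.
            results[name] = (sent, recv_exactly(b, sent) == PAYLOAD)

print(results)
```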

socket.sendfile()

I recently contributed a patch for Python's socket module which adds a high-level socket.sendfile() method (see full discussion at BPO-17552). socket.sendfile() will transmit a file until EOF is reached by attempting to use os.sendfile(), if available, else it falls back on using plain socket.send(). Internally, it takes care of handling socket timeouts and provides two optional parameters to move the file offset or to send only a limited number of bytes. I came up with this idea because getting all of that right is a bit tricky, so a generic wrapper seemed to be convenient to have. socket.sendfile() will make its appearance in Python 3.5.
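As a quick illustration of the two optional parameters, here is a minimal sketch assuming Python 3.5+ on a POSIX system. A socketpair is used in place of a real connection and the payload is made up for the example.

```python
import socket
import tempfile

PAYLOAD = b"0123456789"

with tempfile.TemporaryFile() as f:  # opened in binary mode
    f.write(PAYLOAD)
    f.flush()
    a, b = socket.socketpair()  # blocking SOCK_STREAM sockets
    with a, b:
        # Skip the first 2 bytes of the file, then send 5 bytes only.
        sent = a.sendfile(f, offset=2, count=5)
        received = b.recv(16)

print(sent, received)
```

The return value is the total number of bytes sent, so here it should equal count as long as the file is big enough.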

sendfile and Python

sendfile(2) made its first appearance in the Python stdlib kind of late: Python 3.3. It was contributed by Ross Lagerwall and me in BPO-10882. Since the patch didn't make it into Python 2.x and I wanted to use sendfile() in pyftpdlib, I later decided to release it as a standalone module working with older (2.5+) Python versions (see pysendfile project). Starting with version 3.5, Python will hopefully start using sendfile() more extensively, in detail:

BPO-13563: ftplib
BPO-13559: httplib
asyncio: there are some plans for this even though no actual patch yet, see discussion and BDFL involvement.

Also, Windows provides something similar to sendfile(2): TransmitFile. Now that socket.sendfile() is in place it seems natural to add support for it as well (see BPO-21721).

Backport to Python 2.6 and 2.7

For those of you who are interested in using socket.sendfile() with older Python 2.6 and 2.7 versions, here's a backport. It requires the pysendfile module to be installed. Full code including tests is hosted here.

#!/usr/bin/env python

"""
This is a backport of socket.sendfile() for Python 2.6 and 2.7.
socket.sendfile() will be included in Python 3.5:
http://bugs.python.org/issue17552
Usage:

>>> import socket
>>> file = open("somefile.bin", "rb")
>>> sock = socket.create_connection(("localhost", 8021))
>>> sendfile(sock, file)
42319283
>>>
"""

import errno
import io
import os
import select
import socket
try:
    memoryview  # py 2.7 only
except NameError:
    memoryview = lambda x: x

if os.name == 'posix':
    import sendfile as pysendfile  # requires "pip install pysendfile"
else:
    pysendfile = None


_RETRY = frozenset((errno.EAGAIN, errno.EALREADY, errno.EWOULDBLOCK,
                    errno.EINPROGRESS))


class _GiveupOnSendfile(Exception):
    pass


if pysendfile is not None:

    def _sendfile_use_sendfile(sock, file, offset=0, count=None):
        _check_sendfile_params(sock, file, offset, count)
        sockno = sock.fileno()
        try:
            fileno = file.fileno()
        except (AttributeError, io.UnsupportedOperation) as err:
            raise _GiveupOnSendfile(err)  # not a regular file
        try:
            fsize = os.fstat(fileno).st_size
        except OSError as err:
            raise _GiveupOnSendfile(err)  # not a regular file
        if not fsize:
            return 0  # empty file
        blocksize = fsize if not count else count

        timeout = sock.gettimeout()
        if timeout == 0:
            raise ValueError("non-blocking sockets are not supported")
        # poll/select have the advantage of not requiring any
        # extra file descriptor, contrarily to epoll/kqueue
        # (also, they require a single syscall).
        if hasattr(select, 'poll'):
            if timeout is not None:
                timeout *= 1000
            pollster = select.poll()
            pollster.register(sockno, select.POLLOUT)

            def wait_for_fd():
                if pollster.poll(timeout) == []:
                    raise socket.timeout('timed out')
        else:
            # call select() once in order to solicit ValueError in
            # case we run out of fds
            try:
                select.select([], [sockno], [], 0)
            except ValueError as err:
                raise _GiveupOnSendfile(err)

            def wait_for_fd():
                fds = select.select([], [sockno], [], timeout)
                if fds == ([], [], []):
                    raise socket.timeout('timed out')

        total_sent = 0
        # localize variable access to minimize overhead
        os_sendfile = pysendfile.sendfile
        try:
            while True:
                if timeout:
                    wait_for_fd()
                if count:
                    blocksize = count - total_sent
                    if blocksize <= 0:
                        break
                try:
                    sent = os_sendfile(sockno, fileno, offset, blocksize)
                except OSError as err:
                    if err.errno in _RETRY:
                        # Block until the socket is ready to send some
                        # data; avoids hogging CPU resources.
                        wait_for_fd()
                    else:
                        if total_sent == 0:
                            # We can get here for different reasons, the main
                            # one being 'file' is not a regular mmap(2)-like
                            # file, in which case we'll fall back on using
                            # plain send().
                            raise _GiveupOnSendfile(err)
                        raise  # re-raise keeping the original traceback
                else:
                    if sent == 0:
                        break  # EOF
                    offset += sent
                    total_sent += sent
            return total_sent
        finally:
            if total_sent > 0 and hasattr(file, 'seek'):
                file.seek(offset)
else:
    def _sendfile_use_sendfile(sock, file, offset=0, count=None):
        raise _GiveupOnSendfile(
            "sendfile() not available on this platform")


def _sendfile_use_send(sock, file, offset=0, count=None):
    _check_sendfile_params(sock, file, offset, count)
    if sock.gettimeout() == 0:
        raise ValueError("non-blocking sockets are not supported")
    if offset:
        file.seek(offset)
    blocksize = min(count, 8192) if count else 8192
    total_sent = 0
    # localize variable access to minimize overhead
    file_read = file.read
    sock_send = sock.send
    try:
        while True:
            if count:
                blocksize = min(count - total_sent, blocksize)
                if blocksize <= 0:
                    break
            data = memoryview(file_read(blocksize))
            if not data:
                break  # EOF
            while True:
                try:
                    sent = sock_send(data)
                # On Python 2 socket.error is not a subclass of OSError.
                except socket.error as err:
                    if err.errno in _RETRY:
                        continue
                    raise
                else:
                    total_sent += sent
                    if sent < len(data):
                        data = data[sent:]
                    else:
                        break
        return total_sent
    finally:
        if total_sent > 0 and hasattr(file, 'seek'):
            file.seek(offset + total_sent)


def _check_sendfile_params(sock, file, offset, count):
    if 'b' not in getattr(file, 'mode', 'b'):
        raise ValueError("file should be opened in binary mode")
    if not sock.type & socket.SOCK_STREAM:
        raise ValueError("only SOCK_STREAM type sockets are supported")
    if count is not None:
        if not isinstance(count, int):
            raise TypeError(
                "count must be a positive integer (got %s)" % repr(count))
        if count <= 0:
            raise ValueError(
                "count must be a positive integer (got %s)" % repr(count))


def sendfile(sock, file, offset=0, count=None):
    """sendfile(sock, file[, offset[, count]]) -> sent

    Send a *file* over a connected socket *sock* until EOF is
    reached by using high-performance sendfile(2) and return the
    total number of bytes which were sent.
    *file* must be a regular file object opened in binary mode.
    If sendfile() is not available (e.g. Windows) or file is
    not a regular file socket.send() will be used instead.
    *offset* tells from where to start reading the file.
    If specified, *count* is the total number of bytes to transmit
    as opposed to sending the file until EOF is reached.
    File position is updated on return or also in case of error in
    which case file.tell() can be used to figure out the number of
    bytes which were sent.
    The socket must be of SOCK_STREAM type.
    Non-blocking sockets are not supported.
    """
    try:
        return _sendfile_use_sendfile(sock, file, offset, count)
    except _GiveupOnSendfile:
        return _sendfile_use_send(sock, file, offset, count)

Giampaolo Rodola Python enthusiast, core developer, psutil author
