Blog posts for tags/network

  1. Recognize connection errors

    Lately I've been dealing with an asynchronous TCP client app which sends messages to a remote server. Some of these messages are important, and cannot get lost. Because the connection may drop at any time, I had to implement a mechanism to resend the message once the client reconnects. As such, I needed a way to identify what constitutes a connection error.

    Python provides a builtin ConnectionError exception precisely for this purpose, but it turns out it's not enough. After observing logs in production, I found some errors that were not related to the socket connection per se, but rather to the system connectivity, like ENETUNREACH ("network unreachable") or ENETDOWN ("network down"). It's interesting to note how this distinction is reflected in the UNIX errno code prefixes: ECONN* (connection errors) vs. ENET* (network errors). I've noticed ENET* errors usually occur on a DHCP renewal, or more in general when the Wi-Fi signal is weak or absent. Because this code runs on a cleaning robot which constantly moves around the house, connection can become unstable when the robot gets far from the Wi-Fi Access Point, so it's pretty common to bump into errors like these:

    File "/usr/lib/python3.7/ssl.py", line 934, in send
        return self._sslobj.write(data)
    OSError: [Errno 101] Network is unreachable
    
    File "/usr/lib/python3.7/socket.py", line 222, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    socket.gaierror: [Errno -3] Temporary failure in name resolution
    
    File "/usr/lib/python3.7/ssl.py", line 934, in send
        return self._sslobj.write(data)
    BrokenPipeError: [Errno 32] Broken pipe
    
    File "/usr/lib/python3.7/ssl.py", line 934, in send
        return self._sslobj.write(data)
    socket.timeout: The write operation timed out
    

    Production logs also revealed a considerable amount of SSL-related errors. I was uncertain what to do about those. The app is supposed to gracefully handle them, so theoretically they should represent a bug. Still, they are unequivocally related to the connection stream, and represent a failed attempt to send data, so we want to retry it. Example of logs I found:

    File "/usr/lib/python3.7/ssl.py", line 934, in send
        return self._sslobj.write(data)
    ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF)
    
    File "/usr/lib/python3.7/ssl.py", line 934, in send
        return self._sslobj.write(data)
    ssl.SSLError: [SSL: BAD_LENGTH] bad length
    

    Looking at production logs revealed what sort of brutal, rough and tumble place the Internet is, and how a network app must be ready to handle all sorts of unexpected error conditions which hardly show up during testing. To handle all of these cases I came up with this solution which I think is worth sharing, as it's generic enough to be reused in similar situations. If needed, this can be easily extended to include specific exceptions of third party libraries, like requests.exceptions.ConnectionError.

    import errno, socket, ssl
    
    # Network errors, usually related to DHCP or wpa_supplicant (Wi-Fi).
    NETWORK_ERRNOS = frozenset((
        errno.ENETUNREACH,  # "Network is unreachable"
        errno.ENETDOWN,  # "Network is down"
        errno.ENETRESET,  # "Network dropped connection on reset"
        errno.ENONET,  # "Machine is not on the network"
    ))
    
    def is_connection_err(exc):
        """Return True if an exception is connection-related."""
        if isinstance(exc, ConnectionError):
            # https://docs.python.org/3/library/exceptions.html#ConnectionError
            # ConnectionError includes:
            # * BrokenPipeError (EPIPE, ESHUTDOWN)
            # * ConnectionAbortedError (ECONNABORTED)
            # * ConnectionRefusedError (ECONNREFUSED)
            # * ConnectionResetError (ECONNRESET)
            return True
        if isinstance(exc, socket.gaierror):
            # failed DNS resolution on connect()
            return True
        if isinstance(exc, (socket.timeout, TimeoutError)):
            # timeout on connect(), recv(), send()
            return True
        if isinstance(exc, OSError):
            # ENOTCONN == "Transport endpoint is not connected"
            return (exc.errno in NETWORK_ERRNOS) or (exc.errno == errno.ENOTCONN)
        if isinstance(exc, ssl.SSLError):
            # Let's consider any SSL error a connection error. Usually this is:
            # * ssl.SSLZeroReturnError: "TLS/SSL connection has been closed"
            # * ssl.SSLError: [SSL: BAD_LENGTH]
            return True
        return False
    

    To use it:

    try:
        sock.sendall(b"hello there")
    except Exception as err:
        if is_connection_err(err):
            schedule_on_reconnect(lambda: sock.sendall(b"hello there"))
        raise
    
  2. Python and sendfile

    sendfile(2) is a UNIX system call which provides a "zero-copy" way of copying data from one file descriptor (a file) to another (a socket). Because this copying is done entirely within the kernel, sendfile(2) is more efficient than the combination of file.read() and socket.send(), which requires transferring data to and from user space. This copying of the data twice imposes some performance and resource penalties which sendfile(2) syscall avoids; it also results in a single system call (and thus only one context switch), rather than the series of read(2) / write(2) system calls (each system call requiring a context switch) used internally for the data copying. A more exhaustive explanation of how sendfile(2) works is available here, but long story short is that sending a file with sendfile() is usually twice as fast than using plain socket.send(). Typical applications which can benefit from using sendfile() are FTP and HTTP servers.

    socket.sendfile()

    I recently contributed a patch for Python's socket module which adds a high-level socket.sendfile() method (see full discussion at BPO-17552). socket.sendfile() will transmit a file until EOF is reached by attempting to use os.sendfile(), if available, else it falls back on using plain socket.send(). Internally, it takes care of handling socket timeouts and provides two optional parameters to move the file offset or to send only a limited amount of bytes. I came up with this idea because getting all of that right is a bit tricky, so a generic wrapper seemed to be convenient to have. socket.sendfile() will make its appearance in Python 3.5.

    sendfile and Python

    sendfile(2) made its first appearance into the Python stdlib kind of late: Python 3.3. It was contributed by Ross Lagerwall and me in BPO-10882. Since the patch didn't make it into python 2.X and I wanted to use sendfile() in pyftpdlib I later decided to release it as a stand alone module working with older (2.5+) Python versions (see pysendfile project). Starting with version 3.5, Python will hopefully start using sendfile() more extensively, in details:

    Also, Windows provides something similar to sendfile(2): TransmitFile. Now that socket.sendfile() is in place it seems natural to add support for it as well (see BPO-21721).

    Backport to Python 2.6 and 2.7

    For those of you who are interested in using socket.sendfile() with older Python 2.6 and 2.7 versions here's a backport. It requires pysendfile module to be installed. Full code including tests is hosted here.

    #!/usr/bin/env python
    
    """
    This is a backport of socket.sendfile() for Python 2.6 and 2.7.
    socket.sendfile() will be included in Python 3.5:
    http://bugs.python.org/issue17552
    Usage:
    
    >>> import socket
    >>> file = open("somefile.bin", "rb")
    >>> sock = socket.create_connection(("localhost", 8021))
    >>> sendfile(sock, file)
    42319283
    >>>
    """
    
    import errno
    import io
    import os
    import select
    import socket
    try:
        memoryview  # py 2.7 only
    except NameError:
        memoryview = lambda x: x
    
    if os.name == 'posix':
        import sendfile as pysendfile  # requires "pip install pysendfile"
    else:
        pysendfile = None
    
    
    _RETRY = frozenset((errno.EAGAIN, errno.EALREADY, errno.EWOULDBLOCK,
                        errno.EINPROGRESS))
    
    
    class _GiveupOnSendfile(Exception):
        pass
    
    
    if pysendfile is not None:
    
        def _sendfile_use_sendfile(sock, file, offset=0, count=None):
            _check_sendfile_params(sock, file, offset, count)
            sockno = sock.fileno()
            try:
                fileno = file.fileno()
            except (AttributeError, io.UnsupportedOperation) as err:
                raise _GiveupOnSendfile(err)  # not a regular file
            try:
                fsize = os.fstat(fileno).st_size
            except OSError:
                raise _GiveupOnSendfile(err)  # not a regular file
            if not fsize:
                return 0  # empty file
            blocksize = fsize if not count else count
    
            timeout = sock.gettimeout()
            if timeout == 0:
                raise ValueError("non-blocking sockets are not supported")
            # poll/select have the advantage of not requiring any
            # extra file descriptor, contrarily to epoll/kqueue
            # (also, they require a single syscall).
            if hasattr(select, 'poll'):
                if timeout is not None:
                    timeout *= 1000
                pollster = select.poll()
                pollster.register(sockno, select.POLLOUT)
    
                def wait_for_fd():
                    if pollster.poll(timeout) == []:
                        raise socket._socket.timeout('timed out')
            else:
                # call select() once in order to solicit ValueError in
                # case we run out of fds
                try:
                    select.select([], [sockno], [], 0)
                except ValueError:
                    raise _GiveupOnSendfile(err)
    
                def wait_for_fd():
                    fds = select.select([], [sockno], [], timeout)
                    if fds == ([], [], []):
                        raise socket._socket.timeout('timed out')
    
            total_sent = 0
            # localize variable access to minimize overhead
            os_sendfile = pysendfile.sendfile
            try:
                while True:
                    if timeout:
                        wait_for_fd()
                    if count:
                        blocksize = count - total_sent
                        if blocksize <= 0:
                            break
                    try:
                        sent = os_sendfile(sockno, fileno, offset, blocksize)
                    except OSError as err:
                        if err.errno in _RETRY:
                            # Block until the socket is ready to send some
                            # data; avoids hogging CPU resources.
                            wait_for_fd()
                        else:
                            if total_sent == 0:
                                # We can get here for different reasons, the main
                                # one being 'file' is not a regular mmap(2)-like
                                # file, in which case we'll fall back on using
                                # plain send().
                                raise _GiveupOnSendfile(err)
                            raise err
                    else:
                        if sent == 0:
                            break  # EOF
                        offset += sent
                        total_sent += sent
                return total_sent
            finally:
                if total_sent > 0 and hasattr(file, 'seek'):
                    file.seek(offset)
    else:
        def _sendfile_use_sendfile(sock, file, offset=0, count=None):
            raise _GiveupOnSendfile(
                "sendfile() not available on this platform")
    
    
    def _sendfile_use_send(sock, file, offset=0, count=None):
        _check_sendfile_params(sock, file, offset, count)
        if sock.gettimeout() == 0:
            raise ValueError("non-blocking sockets are not supported")
        if offset:
            file.seek(offset)
        blocksize = min(count, 8192) if count else 8192
        total_sent = 0
        # localize variable access to minimize overhead
        file_read = file.read
        sock_send = sock.send
        try:
            while True:
                if count:
                    blocksize = min(count - total_sent, blocksize)
                    if blocksize <= 0:
                        break
                data = memoryview(file_read(blocksize))
                if not data:
                    break  # EOF
                while True:
                    try:
                        sent = sock_send(data)
                    except OSError as err:
                        if err.errno in _RETRY:
                            continue
                        raise
                    else:
                        total_sent += sent
                        if sent < len(data):
                            data = data[sent:]
                        else:
                            break
            return total_sent
        finally:
            if total_sent > 0 and hasattr(file, 'seek'):
                file.seek(offset + total_sent)
    
    
    def _check_sendfile_params(sock, file, offset, count):
        if 'b' not in getattr(file, 'mode', 'b'):
            raise ValueError("file should be opened in binary mode")
        if not sock.type & socket.SOCK_STREAM:
            raise ValueError("only SOCK_STREAM type sockets are supported")
        if count is not None:
            if not isinstance(count, int):
                raise TypeError(
                    "count must be a positive integer (got %s)" % repr(count))
            if count <= 0:
                raise ValueError(
                    "count must be a positive integer (got %s)" % repr(count))
    
    
    def sendfile(sock, file, offset=0, count=None):
        """sendfile(sock, file[, offset[, count]]) -> sent
    
        Send a *file* over a connected socket *sock* until EOF is
        reached by using high-performance sendfile(2) and return the
        total number of bytes which were sent.
        *file* must be a regular file object opened in binary mode.
        If sendfile() is not available (e.g. Windows) or file is
        not a regular file socket.send() will be used instead.
        *offset* tells from where to start reading the file.
        If specified, *count* is the total number of bytes to transmit
        as opposed to sending the file until EOF is reached.
        File position is updated on return or also in case of error in
        which case file.tell() can be used to figure out the number of
        bytes which were sent.
        The socket must be of SOCK_STREAM type.
        Non-blocking sockets are not supported.
        """
        try:
            return _sendfile_use_sendfile(sock, file, offset, count)
        except _GiveupOnSendfile:
            return _sendfile_use_send(sock, file, offset, count)
    
  3. Reimplementing netstat in Python

    psutil 2.1.0 is out and with it I finally managed to implement something I've been wanting to have for a long time: netstat-like functionalities (see ticket). Similarly to "netstat -antp" on UNIX you can now list system-wide connections in pure python and also determine what process (PID) is using a particular connection:

    >>> import psutil
    >>> from pprint import pprint as pp
    >>> pp(psutil.net_connections())
    [sconn(fd=-1, family=2, type=1, laddr=('127.0.0.1', 587), raddr=(), status='LISTEN', pid=None),
     sconn(fd=-1, family=2, type=1, laddr=('127.0.0.1', 6379), raddr=(), status='LISTEN', pid=None),
     sconn(fd=-1, family=2, type=1, laddr=('127.0.1.1', 53), raddr=(), status='LISTEN', pid=None),
     sconn(fd=-1, family=2, type=1, laddr=('10.0.3.1', 53), raddr=(), status='LISTEN', pid=None),
     sconn(fd=-1, family=2, type=1, laddr=('127.0.0.1', 631), raddr=(), status='LISTEN', pid=None),
     sconn(fd=-1, family=2, type=1, laddr=('127.0.0.1', 25), raddr=(), status='LISTEN', pid=None),
     sconn(fd=-1, family=2, type=1, laddr=('0.0.0.0', 3389), raddr=(), status='LISTEN', pid=None),
     sconn(fd=17, family=2, type=1, laddr=('127.0.0.1', 34785), raddr=(), status='LISTEN', pid=3591),
     sconn(fd=15, family=2, type=1, laddr=('127.0.0.1', 56359), raddr=(), status='LISTEN', pid=3591),
     sconn(fd=-1, family=10, type=2, laddr=('::', 56720), raddr=(), status='NONE', pid=None)]
    >>>
    

    This is yet another functionality which can be used for monitoring purposes. For example, say you want to make sure your HTTP server is running on port 80, you can do something like this:

    import psutil
    
    def check_listening_port(port):
        """Return True if the given TCP port is busy and in LISTEN mode."""
        for conn in psutil.net_connections(kind='tcp'):
            if conn.laddr[1] == port and conn.status == psutil.CONN_LISTEN:
                return True
        return False
    
    print(check_listening_port(80))
    

    Netstat in pure python

    Here it is, in 65 lines of code: netstat.py. Pretty neat right? ;-)

    Implementation(s)

    As always, each platform required its own, different, implementation. Luckily for some platforms (OSX, Windows) I was able to reuse and customize some code from the existing Process.connections() implementation which was already in place. For those of you who are interested in knowing how this was done here's the source code references:

    Hopefully this will help whoever needs to do this into another language. The only platform where this is sort of clunky is OSX, which does not expose anything to list all system-wide sockets in a single shot, so you're forced to query each process. That means you'll need root privileges otherwise you'll get an access denied error. For what it's worth, I took a look at lsof and it has the same limitation and netstat runs with SUID. Well, I guess this is it. I'll leave you with some docs. For the next one I'm planning on working on a couple of other network-related functionalities: "ifconfig" and NIC speeds. But that's for another time...

Social

Feeds