Blog posts for tags/python

  1. Detect memory leaks of C extensions with psutil and psleak

    Memory leaks in Python are often straightforward to diagnose: look at RSS, track Python object counts, follow reference graphs. But leaks inside C extension modules are another story. Traditional memory metrics such as RSS and VMS frequently fail to reveal them because Python's memory allocator sits above the platform's native heap (see pymalloc). If something in an extension calls malloc() without a corresponding free(), that memory often won't show up where you expect it. You have a leak, and you don't know it.

    psutil 7.2.0 introduces two new APIs for C heap introspection, designed specifically to catch these kinds of native leaks. They give you a window directly into the underlying platform allocator (e.g. glibc's malloc), letting you track how much memory the C layer is actually consuming.

    These C functions bypass Python entirely. They don't reflect Python object memory, arenas, pools, or anything managed by pymalloc. Instead, they examine the allocator that C extensions actually use. If your RSS is flat but your C heap usage climbs, you now have a way to see it.

    Why native heap introspection matters

    Many Python projects rely on C extensions: psutil, NumPy, pandas, PIL, lxml, psycopg, PyTorch, custom in-house modules, etc. Even CPython itself implements many of its standard library modules in C. If any of these components mishandle memory at the C level, you get a leak that:

    • Doesn't show up in Python reference counts (sys.getrefcount).
    • Doesn't show up in the tracemalloc module.
    • Doesn't show up in Python's gc stats.
    • Often doesn't show up in RSS, VMS or USS due to allocator caching, especially for small objects. This can happen, for example, when you forget to Py_DECREF a Python object.

    psutil's new functions solve this by inspecting platform-native allocator state, in a manner similar to Valgrind.

    heap_info(): direct allocator statistics

    heap_info() exposes the following metrics:

    • heap_used: total number of bytes currently allocated via malloc() (small allocations).
    • mmap_used: total number of bytes currently allocated via mmap() or via large malloc() allocations.
    • heap_count: (Windows only) number of private heaps created via HeapCreate().

    Example:

    >>> import psutil
    >>> psutil.heap_info()
    pheap(heap_used=5177792, mmap_used=819200)
    

    Reference for what contributes to each field:

    | Platform       | Allocation type                                                           | Field affected |
    | -------------- | ------------------------------------------------------------------------- | -------------- |
    | UNIX / Windows | small malloc() ≤128 KB without free()                                      | heap_used      |
    | UNIX / Windows | large malloc() >128 KB without free(), or mmap() without munmap() (UNIX)   | mmap_used      |
    | Windows        | HeapAlloc() without HeapFree()                                             | heap_used      |
    | Windows        | VirtualAlloc() without VirtualFree()                                       | mmap_used      |
    | Windows        | HeapCreate() without HeapDestroy()                                         | heap_count     |

    heap_trim(): returning unused heap memory

    heap_trim() provides a cross-platform way to request that the underlying allocator free any unused memory it's holding in the heap (typically small malloc() allocations).

    In practice, modern allocators rarely comply, so this is not a general-purpose memory-reduction tool and won't meaningfully shrink RSS in real programs. Its primary value is in leak detection tools.

    Calling heap_trim() before taking measurements helps reduce allocator noise, giving you a cleaner baseline so that changes in heap_used come from the code you're testing, not from internal allocator caching or fragmentation.

    Real-world use: finding a C extension leak

    The workflow is simple:

    1. Take a baseline snapshot of the heap.
    2. Call the C extension hundreds of times.
    3. Take another snapshot.
    4. Compare.

    import psutil
    
    psutil.heap_trim()  # reduce noise
    
    before = psutil.heap_info()
    for _ in range(200):
        my_cext_function()
    after = psutil.heap_info()
    
    print("delta heap_used =", after.heap_used - before.heap_used)
    print("delta mmap_used =", after.mmap_used - before.mmap_used)
    

    If heap_used or mmap_used values increase consistently, you've found a native leak.
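
    To sanity-check the approach, you can simulate a native leak from pure Python with ctypes. The following is just an illustrative sketch (Linux/macOS only): it calls libc's malloc() directly and never frees, so heap_used should climb by roughly the amount allocated:

    import ctypes

    import psutil

    libc = ctypes.CDLL(None)  # handle to the C library (POSIX only)

    psutil.heap_trim()
    before = psutil.heap_info()
    for _ in range(1000):
        libc.malloc(4096)  # deliberately leaked: no matching free()
    after = psutil.heap_info()

    print("delta heap_used =", after.heap_used - before.heap_used)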

    To reduce false positives, repeat the test multiple times, increasing the number of calls on each retry. This approach helps distinguish real leaks from random noise or transient allocations.
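
    Here is a minimal sketch of that retry strategy, reusing the hypothetical my_cext_function from the example above: run the workload with a growing number of calls and flag a leak only if the measured heap growth keeps increasing run after run:

    import psutil

    def looks_leaky(fun, runs=5, base_calls=200, step=100):
        """Return True if heap usage keeps growing across runs of increasing size."""
        deltas = []
        ncalls = base_calls
        for _ in range(runs):
            psutil.heap_trim()  # reduce allocator noise before each run
            before = psutil.heap_info().heap_used
            for _ in range(ncalls):
                fun()
            deltas.append(psutil.heap_info().heap_used - before)
            ncalls += step
        # a real leak grows with the number of calls; random noise does not
        return all(b > a for a, b in zip(deltas, deltas[1:]))

    print(looks_leaky(my_cext_function))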

    A new tool: psleak

    The strategy described above is exactly what I implemented in a new PyPI package, which I called psleak. It runs the target function repeatedly, trims the allocator before each run, and tracks differences across retries. Memory that grows consistently after several runs is flagged as a leak.

    A minimal test suite looks like this:

      from psleak import MemoryLeakTestCase
    
      class TestLeaks(MemoryLeakTestCase):
          def test_fun(self):
              self.execute(some_c_function)
    

    If the function leaks memory, the test will fail with a descriptive exception:

    psleak.MemoryLeakError: memory kept increasing after 10 runs
    Run # 1: heap=+388160  | uss=+356352  | rss=+327680  | (calls= 200, avg/call=+1940)
    Run # 2: heap=+584848  | uss=+614400  | rss=+491520  | (calls= 300, avg/call=+1949)
    Run # 3: heap=+778320  | uss=+782336  | rss=+819200  | (calls= 400, avg/call=+1945)
    Run # 4: heap=+970512  | uss=+1032192 | rss=+1146880 | (calls= 500, avg/call=+1941)
    Run # 5: heap=+1169024 | uss=+1171456 | rss=+1146880 | (calls= 600, avg/call=+1948)
    Run # 6: heap=+1357360 | uss=+1413120 | rss=+1310720 | (calls= 700, avg/call=+1939)
    Run # 7: heap=+1552336 | uss=+1634304 | rss=+1638400 | (calls= 800, avg/call=+1940)
    Run # 8: heap=+1752032 | uss=+1781760 | rss=+1802240 | (calls= 900, avg/call=+1946)
    Run # 9: heap=+1945056 | uss=+2031616 | rss=+2129920 | (calls=1000, avg/call=+1945)
    Run #10: heap=+2140624 | uss=+2179072 | rss=+2293760 | (calls=1100, avg/call=+1946)
    

    Psleak is now part of the psutil test suite, to make sure that the C code does not leak memory. All psutil APIs are tested (see test_memleaks.py), making it a de facto regression-testing tool.

    It's worth noting that without inspecting heap metrics, missing calls such as Py_CLEAR and Py_DECREF often go unnoticed, because they don't affect RSS, VMS or USS, something I confirmed by experimentally commenting them out. Monitoring the heap is therefore essential to reliably detect memory leaks in Python C extensions.

    Under the hood

    For those interested in the implementation details:

    • Linux: uses glibc's mallinfo2() to report uordblks (heap allocations) and hblkhd (mmap-backed blocks).
    • Windows: enumerates heaps and aggregates HeapAlloc / VirtualAlloc usage.
    • macOS: uses malloc zone statistics.
    • BSD: uses jemalloc's arena and stats interfaces.

    Summary

    psutil 7.2.0 fills a long-standing observability gap: native-level memory leaks in C extensions are now visible directly from Python. You now have a simple method to test C extensions for leaks. This turns psutil into not just a monitoring library, but a practical debugging tool for Python projects that rely on native C extension modules.

    To make leak detection practical, I created psleak, a regression-testing framework designed to integrate into Python unit tests.

    References

    Discussion

  2. Wheels for free-threaded Python now available in psutil

    With the release of psutil 7.1.2, wheels for free-threaded Python are now available. This milestone was achieved largely through a community effort, as several internal refactorings to the C code were required to make it possible (see issue #2565). Many of these changes were contributed by Lysandros Nikolaou. Thanks to him for the effort and for bearing with me in code reviews! ;-)

    What is free-threaded Python?

    Free-threaded Python (available since Python 3.13) refers to Python builds that are compiled with the GIL (Global Interpreter Lock) disabled, allowing true parallel execution of Python bytecodes across multiple threads. This is particularly beneficial for CPU-bound applications, as it enables better utilization of multi-core processors.
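
    As a quick check, you can verify whether your interpreter is a free-threaded build, and whether the GIL is actually disabled at runtime. A minimal sketch (sys._is_gil_enabled() exists since Python 3.13):

    import sys
    import sysconfig

    # 1 on free-threaded builds, 0 (or None) on regular builds
    print("free-threaded build:", bool(sysconfig.get_config_var("Py_GIL_DISABLED")))

    # even on free-threaded builds the GIL may be re-enabled at runtime
    # (e.g. via the PYTHON_GIL=1 environment variable)
    if hasattr(sys, "_is_gil_enabled"):
        print("GIL enabled at runtime:", sys._is_gil_enabled())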

    The state of free-threaded wheels

    According to Hugo van Kemenade's free-threaded wheels tracker, the adoption of free-threaded wheels among the top 360 most-downloaded PyPI packages with C extensions is still limited. Only 128 out of these 360 packages provide wheels compiled for free-threaded Python, meaning they can run on Python builds with the GIL disabled. This shows that, while progress has been made, most popular packages with C extensions still do not offer ready-made wheels for free-threaded Python.

    What it means for users

    When a library author provides a wheel, users can install a pre-compiled binary package without having to build it from source. This is especially important for packages with C extensions, like psutil, which is largely written in C. Such packages often have complex build requirements and require installing a C compiler. On Windows, that means installing Visual Studio or the Build Tools, which can take several gigabytes and significant setup effort. Providing wheels spares users this hassle, makes installation far simpler, and is effectively essential for the users of that package. You basically pip install psutil and you're done.

    What it means for library authors

    Currently, universal wheels for free-threaded Python do not exist. Each wheel must be built for a specific Python version, so right now authors must create separate wheels for Python 3.13 and 3.14. That already means distributing a lot of files:

    psutil-7.1.2-cp313-cp313t-macosx_10_13_x86_64.whl
    psutil-7.1.2-cp313-cp313t-macosx_11_0_arm64.whl
    psutil-7.1.2-cp313-cp313t-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl
    psutil-7.1.2-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl
    psutil-7.1.2-cp313-cp313t-win_amd64.whl
    psutil-7.1.2-cp313-cp313t-win_arm64.whl
    psutil-7.1.2-cp314-cp314t-macosx_10_15_x86_64.whl
    psutil-7.1.2-cp314-cp314t-macosx_11_0_arm64.whl
    psutil-7.1.2-cp314-cp314t-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl
    psutil-7.1.2-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl
    psutil-7.1.2-cp314-cp314t-win_amd64.whl
    psutil-7.1.2-cp314-cp314t-win_arm64.whl
    

    This also multiplies CI jobs and slows down the test matrix (see build.yml). A true universal wheel would greatly reduce this overhead, allowing a single wheel to support multiple Python versions and platforms. Hopefully, Python 3.15 will simplify this process. Two competing proposals, PEP 803 and PEP 809, aim to standardize wheel naming and metadata to allow producing a single wheel that covers multiple Python versions. That would drastically reduce distribution complexity for library authors, and it's fair to say it's essential for free-threaded CPython to truly succeed.

    How to install free-threaded psutil

    You can now install psutil for free-threaded Python directly via pip:

    pip install psutil --only-binary=:all:
    

    This ensures you get the pre-compiled wheels without triggering a source build.

    External links

  3. Speedup pytest startup

    Preface: the migration to pytest

    Last year, 17 years after the inception of the project, I decided to start adopting pytest in psutil (see psutil/#2446). The advantages over unittest are numerous, but the two I cared about most are:

    • Being able to use bare assert statements instead of unittest's self.assert*() APIs.
    • The excellent pytest-xdist extension, which lets you run tests in parallel, basically for free.

    Beyond that, I don't rely on any pytest-specific features in the code, like fixtures or conftest.py. I still organize tests in classes, with each one inheriting from unittest.TestCase. Why?

    • I like unittest's self.addCleanup too much to give it up (see some usages). I find it superior to fixtures. Less magical and more explicit.
    • I want users to be able to test their psutil installation in production environments where pytest might not be installed. To accommodate this, I created a minimal "fake" pytest class that emulates essential features like pytest.raises, @pytest.skip etc. (see PR-2456). A bare-bones sketch of the idea is shown below.
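
    For illustration only, such a shim might look roughly like this (a sketch under my own naming, not psutil's actual code; see PR-2456 for the real thing):

    import contextlib
    import unittest

    class FakePytest:
        """Minimal stand-in for the pytest API, used when pytest is missing."""

        @staticmethod
        def skip(reason=""):
            raise unittest.SkipTest(reason)

        @staticmethod
        @contextlib.contextmanager
        def raises(exc):
            """Emulate pytest.raises() as a context manager."""
            try:
                yield
            except exc:
                return
            raise AssertionError("%s was not raised" % exc.__name__)

    pytest = FakePytest()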

    But that's a separate topic. What I want to focus on here is one of pytest's most frustrating aspects: slow startup times.

    pytest invocation is slow

    To measure pytest's startup time, let's run a very simple test where execution time won't significantly affect the results:

    $ time python3 -m pytest --no-header psutil/tests/test_misc.py::TestMisc::test_version
    ============================= test session starts =============================
    collected 1 item
    psutil/tests/test_misc.py::TestMisc::test_version PASSED
    ============================== 1 passed in 0.05s ==============================
    
    real    0m0,427s
    user    0m0,375s
    sys     0m0,051s
    

    0,427s. Almost half a second. That's excessive for something I frequently execute during development. For comparison, running the same test with unittest:

    $ time python3 -m unittest psutil.tests.test_misc.TestMisc.test_version
    ----------------------------------------------------------------------
    Ran 1 test in 0.000s
    OK
    
    real    0m0,204s
    user    0m0,169s
    sys     0m0,035s
    

    0,204 secs, meaning unittest is roughly twice as fast as pytest. But why?

    Where is time being spent?

    A significant portion of pytest's overhead comes from import time:

    $ time python3 -c "import pytest"
    real    0m0,151s
    user    0m0,135s
    sys     0m0,016s
    
    $ time python3 -c "import unittest"
    real    0m0,065s
    user    0m0,055s
    sys     0m0,010s
    

    There's nothing I can do about that. For the record, psutil import timing is:

    $ time python3 -c "import psutil"
    real    0m0,056s
    user    0m0,050s
    sys     0m0,006s
    

    Disable plugin auto loading

    After some research, I discovered that pytest automatically loads all plugins installed on the system, even if they aren't used. Here's how to list them (output is cut):

    $ pytest --trace-config --collect-only
    ...
    active plugins:
        ...
        setupplan           : ~/.local/lib/python3.12/site-packages/_pytest/setupplan.py
        stepwise            : ~/.local/lib/python3.12/site-packages/_pytest/stepwise.py
        warnings            : ~/.local/lib/python3.12/site-packages/_pytest/warnings.py
        logging             : ~/.local/lib/python3.12/site-packages/_pytest/logging.py
        reports             : ~/.local/lib/python3.12/site-packages/_pytest/reports.py
        python_path         : ~/.local/lib/python3.12/site-packages/_pytest/python_path.py
        unraisableexception : ~/.local/lib/python3.12/site-packages/_pytest/unraisableexception.py
        threadexception     : ~/.local/lib/python3.12/site-packages/_pytest/threadexception.py
        faulthandler        : ~/.local/lib/python3.12/site-packages/_pytest/faulthandler.py
        instafail           : ~/.local/lib/python3.12/site-packages/pytest_instafail.py
        anyio               : ~/.local/lib/python3.12/site-packages/anyio/pytest_plugin.py
        pytest_cov          : ~/.local/lib/python3.12/site-packages/pytest_cov/plugin.py
        subtests            : ~/.local/lib/python3.12/site-packages/pytest_subtests/plugin.py
        xdist               : ~/.local/lib/python3.12/site-packages/xdist/plugin.py
        xdist.looponfail    : ~/.local/lib/python3.12/site-packages/xdist/looponfail.py
        ...
    

    It turns out the PYTEST_DISABLE_PLUGIN_AUTOLOAD environment variable can be used to disable this behavior. By running PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest --trace-config --collect-only again, I can see that the following plugins disappeared:

    anyio
    pytest_cov
    pytest_instafail
    pytest_subtests
    xdist
    xdist.looponfail
    

    Now let's run the test again by using PYTEST_DISABLE_PLUGIN_AUTOLOAD:

    $ time PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python3 -m pytest --no-header psutil/tests/test_misc.py::TestMisc::test_version
    ============================= test session starts =============================
    collected 1 item
    psutil/tests/test_misc.py::TestMisc::test_version PASSED
    ============================== 1 passed in 0.05s ==============================
    
    real    0m0,285s
    user    0m0,267s
    sys     0m0,040s
    

    We went from 0,427 secs to 0,285 secs, a ~33% improvement. Not bad. We now need to selectively enable only the plugins we actually use, via the -p CLI option. The plugins used by psutil are pytest-instafail and pytest-subtests (we'll think about pytest-xdist later):

    $ time PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python3 -m pytest -p instafail -p subtests --no-header psutil/tests/test_misc.py::TestMisc::test_version
    ========================================================= test session starts =========================================================
    collected 1 item
    psutil/tests/test_misc.py::TestMisc::test_version PASSED
    ========================================================== 1 passed in 0.05s ==========================================================
    real    0m0,320s
    user    0m0,283s
    sys     0m0,037s
    

    Time went up, from 0,285 secs to 0,320s. Quite a slowdown, but still better than the initial 0,427s. Now, let's add pytest-xdist to the mix:

    $ time PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python3 -m pytest -p instafail -p subtests -p xdist --no-header psutil/tests/test_misc.py::TestMisc::test_version
    ========================================================= test session starts =========================================================
    collected 1 item
    psutil/tests/test_misc.py::TestMisc::test_version PASSED
    ========================================================== 1 passed in 0.05s ==========================================================
    
    real    0m0,369s
    user    0m0,286s
    sys     0m0,049s
    

    We went from 0,320s to 0,369s. Not a big increase, but it's still a pity to pay the price when NOT running tests in parallel.

    Handling pytest-xdist

    If we disable pytest-xdist, psutil tests still run, but we get a warning:

    psutil/tests/test_testutils.py:367
      ~/svn/psutil/psutil/tests/test_testutils.py:367: PytestUnknownMarkWarning: Unknown pytest.mark.xdist_group - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
        @pytest.mark.xdist_group(name="serial")
    

    This warning appears for methods that are intended to run serially, namely those decorated with @pytest.mark.xdist_group(name="serial"). However, since pytest-xdist is now disabled, the decorator no longer exists. To address this, I implemented the following solution in psutil/tests/__init__.py:

    import pytest, functools, os
    
    PYTEST_PARALLEL = "PYTEST_XDIST_WORKER" in os.environ  # True if running parallel tests
    
    if not PYTEST_PARALLEL:
        def fake_xdist_group(*_args, **_kwargs):
            """Mimics `@pytest.mark.xdist_group` decorator. No-op: it just
            calls the test method or return the decorated class."""
            def wrapper(obj):
                @functools.wraps(obj)
                def inner(*args, **kwargs):
                    return obj(*args, **kwargs)
    
                return obj if isinstance(obj, type) else inner
    
            return wrapper
    
        pytest.mark.xdist_group = fake_xdist_group  # monkey patch
    

    With this in place the warning disappears when running tests serially. To run tests in parallel, we'll manually enable xdist:

    $ python3 -m pytest -p xdist -n auto --dist loadgroup
    

    Disable some default plugins

    pytest also loads quite a few plugins by default (see the output of pytest --trace-config --collect-only). I tried to disable some of them with:

    pytest -p no:junitxml -p no:doctest -p no:nose -p no:pastebin
    

    ...but that didn't make much of a difference.

    Optimizing test collection time

    By default, pytest searches the entire project directory for tests, adding unnecessary overhead. In pyproject.toml you can tell pytest where test files are located, and to only consider test_*.py files:

    [tool.pytest.ini_options]
    testpaths = ["psutil/tests/"]
    python_files = ["test_*.py"]
    

    With this I saved another 0.03 seconds. Before:

    $ python3 -m pytest --collect-only
    ...
    ======================== 685 tests collected in 0.20s =========================
    

    After:

    $ python3 -m pytest --collect-only
    ...
    ======================== 685 tests collected in 0.17s =========================
    

    Putting it all together

    With these small optimizations, I managed to reduce pytest startup time by ~0.12 seconds, bringing it down from 0.42 to roughly 0.30 seconds. While this improvement is negligible for full test runs, it makes a noticeable difference (~28% faster) when repeatedly running individual tests from the command line, something I do frequently during development. The final result is visible in PR-2538.

    Other links which may be useful

  4. psutil: drop Python 2.7 support

    About dropping Python 2.7 support in psutil, 3 years ago I stated:

    Not a chance, for many years to come. [Python 2.7] currently represents 7-10% of total downloads, meaning around 70k / 100k downloads per day.

    Only 3 years later, and to my surprise, downloads for Python 2.7 dropped to 0.36%! As such, as of psutil 7.0.0, I finally decided to drop support for Python 2.7!

    The numbers

    These are downloads per month:

    $ pypinfo --percent psutil pyversion
    Served from cache: False
    Data processed: 4.65 GiB
    Data billed: 4.65 GiB
    Estimated cost: $0.03
    
    | python_version | percent | download_count |
    | -------------- | ------- | -------------- |
    | 3.10           |  23.84% |     26,354,506 |
    | 3.8            |  18.87% |     20,862,015 |
    | 3.7            |  17.38% |     19,217,960 |
    | 3.9            |  17.00% |     18,798,843 |
    | 3.11           |  13.63% |     15,066,706 |
    | 3.12           |   7.01% |      7,754,751 |
    | 3.13           |   1.15% |      1,267,008 |
    | 3.6            |   0.73% |        803,189 |
    | 2.7            |   0.36% |        402,111 |
    | 3.5            |   0.03% |         28,656 |
    | Total          |         |    110,555,745 |
    

    According to pypistats.org, Python 2.7 downloads represent 0.28% of the total, around 15,000 downloads per day.

    The pain

    Maintaining 2.7 support in psutil had become increasingly difficult, but still possible. For example, I could still run tests by using old PyPI backports. GitHub Actions could still be tweaked to run tests and produce 2.7 wheels on Linux and macOS. Not on Windows though, for which I had to use a separate service (Appveyor). Still, the hacks in psutil's source code necessary to support Python 2.7 piled up over the years and became quite substantial. Some disadvantages that come to mind:

    • Having to maintain a Python compatibility layer like psutil/_compat.py. This translated into extra code and extra imports.
    • The C compatibility layer to differentiate between Python 2 and 3 (#if PY_MAJOR_VERSION == 2, etc.).
    • Dealing with the string vs. unicode differences, both in Python and in C.
    • Inability to use modern language features, especially f-strings.
    • Inability to freely use enums, which created differences in how constants were exposed in the API.
    • Having to install a specific version of pip and other (outdated) deps.
    • Relying on the third-party Appveyor CI service to run tests and produce 2.7 wheels.
    • Running 4 extra CI jobs on every commit (Linux, macOS, Windows 32-bit, Windows 64-bit) making the CI slower and more subject to failures (we have quite a bit of flaky tests).
    • The distribution of 7 wheels specific to Python 2.7. For example, in the previous release I had to upload:
    psutil-6.1.1-cp27-cp27m-macosx_10_9_x86_64.whl
    psutil-6.1.1-cp27-none-win32.whl
    psutil-6.1.1-cp27-none-win_amd64.whl
    psutil-6.1.1-cp27-cp27m-manylinux2010_i686.whl
    psutil-6.1.1-cp27-cp27m-manylinux2010_x86_64.whl
    psutil-6.1.1-cp27-cp27mu-manylinux2010_i686.whl
    psutil-6.1.1-cp27-cp27mu-manylinux2010_x86_64.whl
    

    The removal

    The removal was done in PR-2841, which removed around 1500 lines of code (nice!). It felt liberating. In the doc I still promised that the 6.1.* series will keep supporting Python 2.7 and will receive critical bug fixes only (no new features); it will be maintained in a dedicated python2 branch. I explicitly kept the setup.py script compatible with Python 2.7 in terms of syntax, so that, when the tarball is fetched from PyPI, pip install psutil emits an informative error message. The user trying to install psutil on Python 2.7 will see:

    $ pip2 install psutil
    As of version 7.0.0 psutil no longer supports Python 2.7.
    Latest version supporting Python 2.7 is psutil 6.1.X.
    Install it with: "pip2 install psutil==6.1.*".
    

    As the informative message states, users that are still on Python 2.7 can still use psutil with:

    pip2 install psutil==6.1.*
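
    For reference, a guard of this kind boils down to a few lines of Python 2-compatible syntax at the top of setup.py. This is a hypothetical sketch, not psutil's actual code:

    import sys

    if sys.version_info[0] < 3:
        sys.exit(
            "As of version 7.0.0 psutil no longer supports Python 2.7.\n"
            "Latest version supporting Python 2.7 is psutil 6.1.X.\n"
            'Install it with: "pip2 install psutil==6.1.*".'
        )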
    

    Related tickets

  5. Recognize connection errors

    Lately I've been dealing with an asynchronous TCP client app which sends messages to a remote server. Some of these messages are important, and cannot get lost. Because the connection may drop at any time, I had to implement a mechanism to resend the message once the client reconnects. As such, I needed a way to identify what constitutes a connection error.

    Python provides a built-in ConnectionError exception precisely for this purpose, but it turns out it's not enough. After observing logs in production, I found some errors that were not related to the socket connection per se, but rather to the system connectivity, like ENETUNREACH ("network unreachable") or ENETDOWN ("network down"). It's interesting to note how this distinction is reflected in the UNIX errno code prefixes: ECONN* (connection errors) vs. ENET* (network errors). I've noticed ENET* errors usually occur on a DHCP renewal or, more generally, when the Wi-Fi signal is weak or absent. Because this code runs on a cleaning robot which constantly moves around the house, the connection can become unstable when the robot gets far from the Wi-Fi Access Point, so it's pretty common to bump into errors like these:

    File "/usr/lib/python3.7/ssl.py", line 934, in send
        return self._sslobj.write(data)
    OSError: [Errno 101] Network is unreachable
    
    File "/usr/lib/python3.7/socket.py", line 222, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    socket.gaierror: [Errno -3] Temporary failure in name resolution
    
    File "/usr/lib/python3.7/ssl.py", line 934, in send
        return self._sslobj.write(data)
    BrokenPipeError: [Errno 32] Broken pipe
    
    File "/usr/lib/python3.7/ssl.py", line 934, in send
        return self._sslobj.write(data)
    socket.timeout: The write operation timed out
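
    As a side note, the errno module can translate the numeric codes shown in these tracebacks back to their symbolic names (values are platform-dependent; these are Linux's):

    >>> import errno
    >>> errno.errorcode[101]
    'ENETUNREACH'
    >>> errno.ENETUNREACH
    101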
    

    Production logs also revealed a considerable number of SSL-related errors. I was uncertain what to do about those. The app is supposed to handle them gracefully, so in theory they indicate a bug. Still, they are unequivocally related to the connection stream and represent a failed attempt to send data, so we want to retry. Examples of logs I found:

    File "/usr/lib/python3.7/ssl.py", line 934, in send
        return self._sslobj.write(data)
    ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF)
    
    File "/usr/lib/python3.7/ssl.py", line 934, in send
        return self._sslobj.write(data)
    ssl.SSLError: [SSL: BAD_LENGTH] bad length
    

    Looking at production logs revealed what a brutal, rough-and-tumble place the Internet is, and how a network app must be ready to handle all sorts of unexpected error conditions which hardly ever show up during testing. To handle all of these cases, I came up with the following solution, which I think is worth sharing, as it's generic enough to be reused in similar situations. If needed, it can easily be extended to include specific exceptions of third-party libraries, like requests.exceptions.ConnectionError.

    import errno, socket, ssl
    
    # Network errors, usually related to DHCP or wpa_supplicant (Wi-Fi).
    NETWORK_ERRNOS = frozenset((
        errno.ENETUNREACH,  # "Network is unreachable"
        errno.ENETDOWN,  # "Network is down"
        errno.ENETRESET,  # "Network dropped connection on reset"
        errno.ENONET,  # "Machine is not on the network"
    ))
    
    def is_connection_err(exc):
        """Return True if an exception is connection-related."""
        if isinstance(exc, ConnectionError):
            # https://docs.python.org/3/library/exceptions.html#ConnectionError
            # ConnectionError includes:
            # * BrokenPipeError (EPIPE, ESHUTDOWN)
            # * ConnectionAbortedError (ECONNABORTED)
            # * ConnectionRefusedError (ECONNREFUSED)
            # * ConnectionResetError (ECONNRESET)
            return True
        if isinstance(exc, socket.gaierror):
            # failed DNS resolution on connect()
            return True
        if isinstance(exc, (socket.timeout, TimeoutError)):
            # timeout on connect(), recv(), send()
            return True
        if isinstance(exc, ssl.SSLError):
            # Check this before the generic OSError branch below, since
            # ssl.SSLError is a subclass of OSError.
            # Let's consider any SSL error a connection error. Usually this is:
            # * ssl.SSLZeroReturnError: "TLS/SSL connection has been closed"
            # * ssl.SSLError: [SSL: BAD_LENGTH]
            return True
        if isinstance(exc, OSError):
            # ENOTCONN == "Transport endpoint is not connected"
            return (exc.errno in NETWORK_ERRNOS) or (exc.errno == errno.ENOTCONN)
        return False
    

    To use it:

    try:
        sock.sendall(b"hello there")
    except Exception as err:
        if is_connection_err(err):
            schedule_on_reconnect(lambda: sock.sendall(b"hello there"))
        raise
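
    A quick interactive sanity check of the classifier (illustrative only):

    >>> import errno, ssl
    >>> is_connection_err(ConnectionResetError())
    True
    >>> is_connection_err(OSError(errno.ENETDOWN, "Network is down"))
    True
    >>> is_connection_err(ssl.SSLZeroReturnError())
    True
    >>> is_connection_err(ValueError("unrelated"))
    False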
    
