  1. Speedup pytest startup

    Preface: the migration to pytest

    Last year, 17 years after the project's inception, I decided to start adopting pytest in psutil (see psutil/#2446). The advantages over unittest are numerous, but the two I cared about most are:

    • Being able to use base assert statements instead of unittest's self.assert*() APIs.
    • The excellent pytest-xdist extension, which lets you run tests in parallel, basically for free.

    Beyond that, I don't rely on any pytest-specific features in the code, like fixtures or conftest.py. I still organize tests in classes, with each one inheriting from unittest.TestCase. Why?

    • I like unittest's self.addCleanup too much to give it up (see some usages). I find it superior to fixtures: less magical and more explicit.
    • I want users to be able to test their psutil installation in production environments where pytest might not be installed. To accommodate this, I created a minimal "fake" pytest class that emulates essential features like pytest.raises, @pytest.skip etc. (see PR-2456).

    But that's a separate topic. What I want to focus on here is one of pytest's most frustrating aspects: slow startup times.

    pytest invocation is slow

    To measure pytest's startup time, let's run a very simple test where execution time won't significantly affect the results:

    $ time python3 -m pytest --no-header psutil/tests/test_misc.py::TestMisc::test_version
    ============================= test session starts =============================
    collected 1 item
    psutil/tests/test_misc.py::TestMisc::test_version PASSED
    ============================== 1 passed in 0.05s ==============================
    
    real    0m0,427s
    user    0m0,375s
    sys     0m0,051s
    

    0.427s: almost half a second. That's excessive for something I run frequently during development. For comparison, here is the same test run with unittest:

    $ time python3 -m unittest psutil.tests.test_misc.TestMisc.test_version
    ----------------------------------------------------------------------
    Ran 1 test in 0.000s
    OK
    
    real    0m0,204s
    user    0m0,169s
    sys     0m0,035s
    

    0.204s, meaning unittest is roughly twice as fast as pytest. But why?
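    A single `time` run is noisy, so it can help to repeat each command a few times and keep the fastest run. Below is a minimal sketch of that. `best_of` is a hypothetical helper, not part of psutil or pytest.

```python
import subprocess
import sys
import time


def best_of(cmd, runs=5):
    """Run `cmd` `runs` times and return the fastest wall-clock time (seconds)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output: only the duration of the whole process matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        timings.append(time.perf_counter() - start)
    return min(timings)
```

    For example, run it from the psutil source tree with `best_of([sys.executable, "-m", "pytest", "--no-header", "psutil/tests/test_misc.py::TestMisc::test_version"])`. Taking the minimum rather than the mean filters out runs that were slowed by unrelated system activity.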

    Where is time being spent?

    A significant portion of pytest's overhead comes from import time:

    $ time python3 -c "import pytest"
    real    0m0,151s
    user    0m0,135s
    sys     0m0,016s
    
    $ time python3 -c "import unittest"
    real    0m0,065s
    user    0m0,055s
    sys     0m0,010s
    

    There's nothing I can do about that. For the record, psutil import timing is:

    $ time python3 -c "import psutil"
    real    0m0,056s
    user    0m0,050s
    sys     0m0,006s
    
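    To see which modules account for that import time, CPython (3.7+) has the `-X importtime` flag. It prints one line per imported module to stderr, with self and cumulative times in microseconds. The sketch below parses that output and returns the slowest imports. The function name is hypothetical.

```python
import subprocess
import sys


def slowest_imports(module, top=10):
    """Return the `top` (cumulative_us, module_name) pairs for `import module`."""
    # Each stderr line looks like:
    # "import time: self [us] | cumulative | imported package"
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", "import " + module],
        capture_output=True, text=True, check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3 or not fields[1].strip().isdigit():
            continue  # skip the header line
        rows.append((int(fields[1]), fields[2].strip()))
    # Largest cumulative time first.
    return sorted(rows, reverse=True)[:top]
```

    For example, `slowest_imports("pytest")` lists the sub-imports that dominate pytest's own import cost.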

    Disable plugin auto loading

    After some research, I discovered that pytest automatically loads all plugins installed on the system, even if they aren't used. Here's how to list them (output truncated):

    $ pytest --trace-config --collect-only
    ...
    active plugins:
        ...
        setupplan           : ~/.local/lib/python3.12/site-packages/_pytest/setupplan.py
        stepwise            : ~/.local/lib/python3.12/site-packages/_pytest/stepwise.py
        warnings            : ~/.local/lib/python3.12/site-packages/_pytest/warnings.py
        logging             : ~/.local/lib/python3.12/site-packages/_pytest/logging.py
        reports             : ~/.local/lib/python3.12/site-packages/_pytest/reports.py
        python_path         : ~/.local/lib/python3.12/site-packages/_pytest/python_path.py
        unraisableexception : ~/.local/lib/python3.12/site-packages/_pytest/unraisableexception.py
        threadexception     : ~/.local/lib/python3.12/site-packages/_pytest/threadexception.py
        faulthandler        : ~/.local/lib/python3.12/site-packages/_pytest/faulthandler.py
        instafail           : ~/.local/lib/python3.12/site-packages/pytest_instafail.py
        anyio               : ~/.local/lib/python3.12/site-packages/anyio/pytest_plugin.py
        pytest_cov          : ~/.local/lib/python3.12/site-packages/pytest_cov/plugin.py
        subtests            : ~/.local/lib/python3.12/site-packages/pytest_subtests/plugin.py
        xdist               : ~/.local/lib/python3.12/site-packages/xdist/plugin.py
        xdist.looponfail    : ~/.local/lib/python3.12/site-packages/xdist/looponfail.py
        ...
    

    It turns out the PYTEST_DISABLE_PLUGIN_AUTOLOAD environment variable can be used to disable them. Running PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest --trace-config --collect-only again shows that the following plugins disappeared:

    anyio
    pytest_cov
    pytest_instafail
    pytest_subtests
    xdist
    xdist.looponfail
    

    Now let's run the test again by using PYTEST_DISABLE_PLUGIN_AUTOLOAD:

    $ time PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python3 -m pytest --no-header psutil/tests/test_misc.py::TestMisc::test_version
    ============================= test session starts =============================
    collected 1 item
    psutil/tests/test_misc.py::TestMisc::test_version PASSED
    ============================== 1 passed in 0.05s ==============================
    
    real    0m0,285s
    user    0m0,267s
    sys     0m0,040s
    

    We went from 0.427s to 0.285s, a ~33% improvement. Not bad. We now need to selectively enable only the plugins we actually use, via the -p CLI option. The plugins used by psutil are pytest-instafail and pytest-subtests (we'll deal with pytest-xdist later):

    $ time PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python3 -m pytest -p instafail -p subtests --no-header psutil/tests/test_misc.py::TestMisc::test_version
    ========================================================= test session starts =========================================================
    collected 1 item
    psutil/tests/test_misc.py::TestMisc::test_version PASSED
    ========================================================== 1 passed in 0.05s ==========================================================
    real    0m0,320s
    user    0m0,283s
    sys     0m0,037s
    

    Time went back up, from 0.285s to 0.320s. Quite a slowdown, but still better than the initial 0.427s. Now, let's add pytest-xdist to the mix:

    $ time PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python3 -m pytest -p instafail -p subtests -p xdist --no-header psutil/tests/test_misc.py::TestMisc::test_version
    ========================================================= test session starts =========================================================
    collected 1 item
    psutil/tests/test_misc.py::TestMisc::test_version PASSED
    ========================================================== 1 passed in 0.05s ==========================================================
    
    real    0m0,369s
    user    0m0,286s
    sys     0m0,049s
    

    Now we went from 0.320s to 0.369s. Not much, but it's still a pity to pay that price when NOT running tests in parallel.

    Handling pytest-xdist

    If we disable pytest-xdist, psutil tests still run, but we get a warning:

    psutil/tests/test_testutils.py:367
      ~/svn/psutil/psutil/tests/test_testutils.py:367: PytestUnknownMarkWarning: Unknown pytest.mark.xdist_group - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
        @pytest.mark.xdist_group(name="serial")
    

    This warning appears for test methods that are meant to run serially, i.e. those decorated with @pytest.mark.xdist_group(name="serial"). Since pytest-xdist is now disabled, nothing registers the xdist_group mark, so pytest treats it as unknown. To address this, I implemented the following solution in psutil/tests/__init__.py:

    import functools
    import os
    
    import pytest
    
    PYTEST_PARALLEL = "PYTEST_XDIST_WORKER" in os.environ  # True if running parallel tests
    
    if not PYTEST_PARALLEL:
        def fake_xdist_group(*_args, **_kwargs):
            """Mimics `@pytest.mark.xdist_group` decorator. No-op: it just
            calls the test method or returns the decorated class."""
            def wrapper(obj):
                @functools.wraps(obj)
                def inner(*args, **kwargs):
                    return obj(*args, **kwargs)
    
                return obj if isinstance(obj, type) else inner
    
            return wrapper
    
        pytest.mark.xdist_group = fake_xdist_group  # monkey patch
    

    With this in place, the warning disappears when running tests serially. To run tests in parallel, we enable xdist manually:

    $ python3 -m pytest -p xdist -n auto --dist loadgroup
    
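    Typing these flags every time is error-prone. Below is a hypothetical wrapper (not part of psutil) that builds the command line described above. It always turns autoloading off and always enables instafail and subtests. It adds xdist only for parallel runs. The plugin names are the ones used in the commands above.

```python
import os
import sys

# Plugins the test suite always needs; everything else stays disabled.
BASE_PLUGINS = ["instafail", "subtests"]


def pytest_command(args, parallel=False):
    """Build (argv, env) for a pytest run with plugin autoloading disabled."""
    env = dict(os.environ, PYTEST_DISABLE_PLUGIN_AUTOLOAD="1")
    argv = [sys.executable, "-m", "pytest"]
    for name in BASE_PLUGINS:
        argv += ["-p", name]
    if parallel:
        # Pay pytest-xdist's startup cost only when actually running in parallel.
        argv += ["-p", "xdist", "-n", "auto", "--dist", "loadgroup"]
    argv += list(args)
    return argv, env
```

    Usage would be something like `cmd, env = pytest_command(["psutil/tests/"], parallel=True)` followed by `subprocess.call(cmd, env=env)`.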

    Disable some default plugins

    pytest also loads quite a few plugins by default (see the output of pytest --trace-config --collect-only). I tried disabling some of them with:

    pytest -p no:junitxml -p no:doctest -p no:nose -p no:pastebin
    

    ...but that didn't make much of a difference.

    Optimizing test collection time

    By default, pytest searches the whole directory tree it's invoked from for tests, which adds unnecessary overhead. In pyproject.toml you can tell pytest where the test files are located:

    [tool.pytest.ini_options]
    testpaths = ["psutil/tests/"]
    

    With this, I saved another 0.03 seconds. Before:

    $ python3 -m pytest --collect-only
    ...
    ======================== 685 tests collected in 0.20s =========================
    

    After:

    $ python3 -m pytest --collect-only
    ...
    ======================== 685 tests collected in 0.17s =========================
    

    Putting it all together

    With these small optimizations, I managed to reduce pytest startup time by ~0.12 seconds, from 0.42 down to ~0.30 seconds. While this is insignificant for full test runs, it makes a noticeable difference (~28% faster) when repeatedly running individual tests from the command line, which I do frequently during development. The final result is visible in PR-2538.
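    The "~28% faster" figure is a relative reduction in wall time, (before - after) / before. The same numbers can also be read as a speedup ratio, which makes the two notions easy to tell apart:

```python
def relative_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# Startup time before/after the optimizations, in seconds (numbers from above).
before, after = 0.42, 0.42 - 0.12
print("%.1f%% less time" % relative_reduction(before, after))  # 28.6% less time
print("%.2fx faster" % (before / after))  # 1.40x faster
```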

  2. A Brave / Chrome extension to reorder new tabs

    While browsing, I almost always keep three tabs open: Gmail, Slack, and Microsoft Teams (for work). I find it convenient to keep them always in the same positions (1, 2, and 3), so I can quickly switch to them with keyboard shortcuts (alt+1, alt+2, alt+3). The Gmail tab stays in place, but Slack and Teams frequently move, because I use them for work: I close them in the evening and reopen them the next morning, and each time I have to reposition them manually.
    
    To automate this, I looked for a Brave (Chrome) extension but couldn't find one that fully met my needs. So I decided to spend a little time writing my own. It turns out it's incredibly easy. Here's how you can create a simple Brave extension; the same process should also work for Chrome.

    • Put these 2 files in a folder:

    manifest.json:

    {
        "manifest_version": 3,
        "name": "Tab repositioner",
        "version": "1.0",
        "description": "Set position of new tabs",
        "permissions": ["tabs"],
        "background": {
            "service_worker": "background.js"
        }
    }
    

    background.js:

    function move_tab(tab, index) {
        if (tab.index != index) {
            console.log(`moving tab '${tab.url}' to position ${index}`);
            chrome.tabs.move(tab.id, { index: index });
        }
    }
    
    function move_new_tabs(changeInfo, tab) {
        if ((changeInfo.status === "loading" || changeInfo.status === "complete") && tab.url) {
            if (tab.url.includes("mail.google.com")) { move_tab(tab, 0); }
            if (tab.url.includes("app.slack.com")) { move_tab(tab, 1); }
            if (tab.url.includes("teams.microsoft.com")) { move_tab(tab, 2); }
        }
    }
    
    chrome.tabs.onUpdated.addListener((tabId, changeInfo, tab) => {
        move_new_tabs(changeInfo, tab);
    });
    
    • Go to brave://extensions/ (chrome://extensions/ for Chrome).
    • Enable Developer Mode (top right).
    • Click Load unpacked and select your folder.
    • To observe debug messages printed in the console, click on Service Worker.
    • After editing background.js, click the refresh icon to apply the changes.
  3. psutil: drop Python 2.7 support

    Three years ago, about dropping Python 2.7 support in psutil, I stated:

    Not a chance, for many years to come. [Python 2.7] currently represents 7-10% of total downloads, meaning around 70k / 100k downloads per day.

    Only 3 years later, and to my surprise, downloads for Python 2.7 dropped to 0.36%! As such, as of psutil 7.0.0, I finally decided to drop support for Python 2.7!

    The numbers

    These are downloads per month:

    $ pypinfo --percent psutil pyversion
    Served from cache: False
    Data processed: 4.65 GiB
    Data billed: 4.65 GiB
    Estimated cost: $0.03
    
    | python_version | percent | download_count |
    | -------------- | ------- | -------------- |
    | 3.10           |  23.84% |     26,354,506 |
    | 3.8            |  18.87% |     20,862,015 |
    | 3.7            |  17.38% |     19,217,960 |
    | 3.9            |  17.00% |     18,798,843 |
    | 3.11           |  13.63% |     15,066,706 |
    | 3.12           |   7.01% |      7,754,751 |
    | 3.13           |   1.15% |      1,267,008 |
    | 3.6            |   0.73% |        803,189 |
    | 2.7            |   0.36% |        402,111 |
    | 3.5            |   0.03% |         28,656 |
    | Total          |         |    110,555,745 |
    

    According to pypistats.org, Python 2.7 downloads represent 0.28% of the total, around 15,000 downloads per day.
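    The pypinfo figures above are monthly. Below is a quick sanity check of the table, plus a rough per-day figure for Python 2.7 that assumes a 30-day month. It lands in the same ballpark as the pypistats.org number.

```python
# Monthly download counts from the pypinfo table above.
downloads = {
    "3.10": 26_354_506, "3.8": 20_862_015, "3.7": 19_217_960,
    "3.9": 18_798_843, "3.11": 15_066_706, "3.12": 7_754_751,
    "3.13": 1_267_008, "3.6": 803_189, "2.7": 402_111, "3.5": 28_656,
}

total = sum(downloads.values())
share_27 = downloads["2.7"] / total * 100
per_day_27 = downloads["2.7"] / 30  # assumption: a 30-day month

print(f"total: {total:,}")                # total: 110,555,745
print(f"2.7 share: {share_27:.2f}%")      # 2.7 share: 0.36%
print(f"2.7 per day: ~{per_day_27:,.0f}")  # 2.7 per day: ~13,404
```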

    The pain

    Maintaining 2.7 support in psutil had become increasingly difficult, but still possible. E.g. I could still run tests by using old PyPI backports. GitHub Actions could still be tweaked to run tests and produce 2.7 wheels on Linux and macOS, but not on Windows, for which I had to use a separate service (Appveyor). Still, the number of hacks in psutil's source code needed to support Python 2.7 had piled up over the years and become quite large. Some disadvantages that come to mind:

    • Having to maintain a Python compatibility layer like psutil/_compat.py. This translated into extra code and extra imports.
    • The C compatibility layer to differentiate between Python 2 and 3 (#if PY_MAJOR_VERSION >= 3, etc.).
    • Dealing with the string vs. unicode differences, both in Python and in C.
    • Inability to use modern language features, especially f-strings.
    • Inability to freely use enums, which meant CONSTANTS were exposed differently in the API depending on the Python version.
    • Having to install a specific version of pip and other (outdated) deps.
    • Relying on the third-party Appveyor CI service to run tests and produce 2.7 wheels.
    • Running 4 extra CI jobs on every commit (Linux, macOS, Windows 32-bit, Windows 64-bit), making the CI slower and more prone to failures (we have quite a few flaky tests).
    • The distribution of 7 wheels specific to Python 2.7. E.g. in the previous release I had to upload:
    psutil-6.1.1-cp27-cp27m-macosx_10_9_x86_64.whl
    psutil-6.1.1-cp27-none-win32.whl
    psutil-6.1.1-cp27-none-win_amd64.whl
    psutil-6.1.1-cp27-cp27m-manylinux2010_i686.whl
    psutil-6.1.1-cp27-cp27m-manylinux2010_x86_64.whl
    psutil-6.1.1-cp27-cp27mu-manylinux2010_i686.whl
    psutil-6.1.1-cp27-cp27mu-manylinux2010_x86_64.whl
    
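    The seven files differ only in their tags. Per PEP 427, a wheel filename ends with `-{python tag}-{abi tag}-{platform tag}.whl`, so the tags can be split off from the right. `cp27m` and `cp27mu` are the two CPython 2.7 ABIs (narrow and wide Unicode builds). That is why Linux needed four wheels: two ABIs times two architectures. `none` means the wheel declares no specific ABI. Below is a small sketch; `wheel_tags` is a hypothetical helper.

```python
from collections import Counter


def wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename.

    PEP 427: {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    so the last three dash-separated fields are always the tags.
    """
    stem = filename[:-len(".whl")]
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


WHEELS_27 = [
    "psutil-6.1.1-cp27-cp27m-macosx_10_9_x86_64.whl",
    "psutil-6.1.1-cp27-none-win32.whl",
    "psutil-6.1.1-cp27-none-win_amd64.whl",
    "psutil-6.1.1-cp27-cp27m-manylinux2010_i686.whl",
    "psutil-6.1.1-cp27-cp27m-manylinux2010_x86_64.whl",
    "psutil-6.1.1-cp27-cp27mu-manylinux2010_i686.whl",
    "psutil-6.1.1-cp27-cp27mu-manylinux2010_x86_64.whl",
]

# How many wheels were needed per ABI tag.
abi_counts = Counter(wheel_tags(w)[1] for w in WHEELS_27)
print(dict(abi_counts))
```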

    The removal

    The removal was done in PR-2841, which removed around 1500 lines of code (nice!). It felt liberating. In doing so, I still promised in the docs that the 6.1.* series will keep supporting Python 2.7 and will receive critical bug fixes only (no new features). It will be maintained in a dedicated python2 branch. I explicitly kept the setup.py script syntax-compatible with Python 2.7, so that when the tarball is fetched from PyPI, pip install psutil emits an informative error message. A user trying to install psutil on Python 2.7 will see:

    $ pip2 install psutil
    As of version 7.0.0 psutil no longer supports Python 2.7.
    Latest version supporting Python 2.7 is psutil 6.1.X.
    Install it with: "pip2 install psutil==6.1.*".
    

    As the message states, users still on Python 2.7 can keep using psutil with:

    pip2 install psutil==6.1.*
    
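    The trick works because Python 2.7 must first be able to *parse* setup.py before it can print anything. A single f-string or type hint would turn the helpful message into a SyntaxError. Below is a minimal sketch of such a guard, written in Python-2-compatible syntax. It assumes nothing about psutil's actual setup.py: the function name is hypothetical, and only the message text comes from above.

```python
import sys


def unsupported_python_message(version_info):
    """Return an error message if running on Python 2, else None.

    Deliberately uses only Python-2-compatible syntax (no f-strings,
    no annotations) so that Python 2.7 can parse this file.
    """
    if version_info[0] == 2:
        return (
            "As of version 7.0.0 psutil no longer supports Python 2.7.\n"
            "Latest version supporting Python 2.7 is psutil 6.1.X.\n"
            'Install it with: "pip2 install psutil==6.1.*".'
        )
    return None


msg = unsupported_python_message(sys.version_info)
if msg is not None:
    sys.exit(msg)  # prints msg to stderr and exits with status 1
```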

  4. Recognize connection errors

    Lately I've been dealing with an asynchronous TCP client app which sends messages to a remote server. Some of these messages are important, and cannot get lost. Because the connection may drop at any time, I had to implement a mechanism to resend the message once the client reconnects. As such, I needed a way to identify what constitutes a connection error.

    Python provides a built-in ConnectionError exception precisely for this purpose, but it turns out it's not enough. After looking at production logs, I found some errors that were not related to the socket connection per se, but rather to system connectivity, like ENETUNREACH ("network unreachable") or ENETDOWN ("network down"). It's interesting to note how this distinction is reflected in the UNIX errno prefixes: ECONN* (connection errors) vs. ENET* (network errors). I've noticed ENET* errors usually occur on a DHCP renewal, or more generally when the Wi-Fi signal is weak or absent. Because this code runs on a cleaning robot which constantly moves around the house, the connection can become unstable when the robot gets far from the Wi-Fi access point, so it's pretty common to bump into errors like these:

    File "/usr/lib/python3.7/ssl.py", line 934, in send
        return self._sslobj.write(data)
    OSError: [Errno 101] Network is unreachable
    
    File "/usr/lib/python3.7/socket.py", line 222, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    socket.gaierror: [Errno -3] Temporary failure in name resolution
    
    File "/usr/lib/python3.7/ssl.py", line 934, in send
        return self._sslobj.write(data)
    BrokenPipeError: [Errno 32] Broken pipe
    
    File "/usr/lib/python3.7/ssl.py", line 934, in send
        return self._sslobj.write(data)
    socket.timeout: The write operation timed out
    

    Production logs also revealed a considerable number of SSL-related errors. I was uncertain what to do about those. In theory the app is supposed to handle them gracefully, so they may indicate a bug. Still, they are unequivocally related to the connection stream and represent a failed attempt to send data, so we want to retry it. Examples of logs I found:

    File "/usr/lib/python3.7/ssl.py", line 934, in send
        return self._sslobj.write(data)
    ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF)
    
    File "/usr/lib/python3.7/ssl.py", line 934, in send
        return self._sslobj.write(data)
    ssl.SSLError: [SSL: BAD_LENGTH] bad length
    

    Looking at production logs revealed what a brutal, rough-and-tumble place the Internet is, and how a network app must be ready to handle all sorts of unexpected error conditions which hardly ever show up during testing. To handle all of these cases I came up with the solution below, which I think is worth sharing, as it's generic enough to be reused in similar situations. If needed, it can easily be extended to include exceptions from third-party libraries, like requests.exceptions.ConnectionError.

    import errno
    import socket
    import ssl
    
    # Network errors, usually related to DHCP or wpa_supplicant (Wi-Fi).
    NETWORK_ERRNOS = frozenset((
        errno.ENETUNREACH,  # "Network is unreachable"
        errno.ENETDOWN,  # "Network is down"
        errno.ENETRESET,  # "Network dropped connection on reset"
        # "Machine is not on the network". ENONET is Linux-specific: where it's
        # missing, fall back to ENETUNREACH (already in the set).
        getattr(errno, "ENONET", errno.ENETUNREACH),
    ))
    
    def is_connection_err(exc):
        """Return True if an exception is connection-related."""
        if isinstance(exc, ConnectionError):
            # https://docs.python.org/3/library/exceptions.html#ConnectionError
            # ConnectionError includes:
            # * BrokenPipeError (EPIPE, ESHUTDOWN)
            # * ConnectionAbortedError (ECONNABORTED)
            # * ConnectionRefusedError (ECONNREFUSED)
            # * ConnectionResetError (ECONNRESET)
            return True
        if isinstance(exc, socket.gaierror):
            # failed DNS resolution on connect()
            return True
        if isinstance(exc, (socket.timeout, TimeoutError)):
            # timeout on connect(), recv(), send()
            return True
        if isinstance(exc, ssl.SSLError):
            # Must come before the generic OSError check below, because
            # ssl.SSLError is a subclass of OSError. Let's consider any SSL
            # error a connection error. Usually this is:
            # * ssl.SSLZeroReturnError: "TLS/SSL connection has been closed"
            # * ssl.SSLError: [SSL: BAD_LENGTH]
            return True
        if isinstance(exc, OSError):
            # ENOTCONN == "Transport endpoint is not connected"
            return (exc.errno in NETWORK_ERRNOS) or (exc.errno == errno.ENOTCONN)
        return False
    
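    The order of the isinstance() checks in a function like this matters. ConnectionError, socket.gaierror, socket.timeout and ssl.SSLError (with its subclasses) all inherit from OSError. A broad `isinstance(exc, OSError)` branch placed before a more specific check will therefore swallow the exceptions that check was meant to handle. Below is a quick way to confirm the hierarchy; on Python 3 every entry is True.

```python
import socket
import ssl

# Map each exception class name to whether it inherits from OSError.
results = {
    cls.__name__: issubclass(cls, OSError)
    for cls in (ConnectionError, BrokenPipeError, socket.gaierror,
                socket.timeout, ssl.SSLError, ssl.SSLZeroReturnError)
}
for name, is_oserror in results.items():
    print("%-20s OSError subclass: %s" % (name, is_oserror))
```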

    To use it:

    try:
        sock.sendall(b"hello there")
    except Exception as err:
        if is_connection_err(err):
            schedule_on_reconnect(lambda: sock.sendall(b"hello there"))
        raise
    
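    The usage above leaves `schedule_on_reconnect` undefined. Below is one possible shape for the resend mechanism described at the beginning, as a hypothetical sketch. `ResendQueue`, its `send` callable and its `is_retryable` predicate are assumptions; `is_retryable` could be the is_connection_err() function above. Unlike the snippet above, it swallows retryable errors instead of re-raising them. Messages are kept in order and flushed on reconnect.

```python
import collections


class ResendQueue:
    """Queue messages that failed with a retryable error; resend on reconnect."""

    def __init__(self, send, is_retryable):
        self._send = send                  # function that writes to the socket
        self._is_retryable = is_retryable  # e.g. is_connection_err
        self._pending = collections.deque()

    def send(self, msg):
        try:
            self._send(msg)
        except Exception as err:
            if not self._is_retryable(err):
                raise
            self._pending.append(msg)  # keep it for the next reconnect

    def on_reconnect(self):
        """Flush pending messages in order; stop at the first new failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except Exception as err:
                if not self._is_retryable(err):
                    raise
                return  # still disconnected: leave the rest queued
            self._pending.popleft()  # drop only after a successful send

    def __len__(self):
        return len(self._pending)
```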
  5. Sublime Text: remember cursor position plugin

    My editor of choice for Python development is Sublime Text, and has been for a very long time (10 years). It's fast, minimalist and straight to the point, which is why I've always resisted the temptation to switch to more advanced and modern IDEs such as PyCharm or VS Code, which admittedly have superior auto-completion and refactoring tools.

    There is a very simple feature I've always missed in ST: the ability to "remember" (save) the cursor position when a file is closed. The only plugin promising to do that is called BufferScroll, but for some reason it stopped working for me at some point. I spent a considerable amount of time googling for an alternative but, to my surprise, I couldn't find any plugin implementing such a simple feature. So today I decided to bite the bullet and implement it myself, by writing my first ST plugin, which I paste below.

    What it does is this:

    • every time a file is closed, save the cursor position (row and column) to a JSON file
    • if that same file is re-opened, restore the cursor at that position

    What's neat about ST plugins is that they are just Python files, which you install by copying them into ST's config directory. On Linux you can copy the script below into:

    ~/.config/sublime-text-3/Packages/User/cursor_positions.py

    ...and it will work out of the box. This is exactly the kind of minimalism which I love about ST, and which I've always missed in other IDEs.

    # cursor_positions.py
    
    """
    A plugin for SublimeText which saves (remembers) cursor position when
    a file is closed.
    Install it by copying this file in ~/.config/sublime-text-3/Packages/User/
    directory (Linux).
    
    Author: Giampaolo Rodola'
    License: MIT
    """
    
    import datetime
    import json
    import os
    import tempfile
    import threading
    
    import sublime
    import sublime_plugin
    
    
    SUBLIME_ROOT = os.path.realpath(os.path.join(sublime.packages_path(), '..'))
    SESSION_FILE = os.path.join(
        SUBLIME_ROOT, "Local", "cursor_positions.session.json")
    # when reading the session file on startup, we'll remove entries
    # older than X days
    RM_FILE_OLDER_THAN_DAYS = 180
    
    
    def log(*args):
        print("    %s: " % os.path.basename(__file__), end="")
        print(*args)
    
    
    class Session:
    
        def __init__(self):
            self._lock = threading.Lock()
            os.makedirs(os.path.dirname(SESSION_FILE), exist_ok=True)
            self.prune_old_entries()
    
        # --- file
    
        def read_session_file(self):
            try:
                with self._lock:
                    with open(SESSION_FILE, "r") as f:
                        return json.load(f)
            except (FileNotFoundError, json.decoder.JSONDecodeError):
                return {}
    
        def write_session_file(self, d):
            # Use the same FS so that the move operation is atomic:
            # https://stackoverflow.com/a/18706666
            with tempfile.NamedTemporaryFile(
                    "wt", delete=False, dir=os.path.dirname(SESSION_FILE)) as f:
                f.write(json.dumps(d, indent=4, sort_keys=True))
            with self._lock:
                os.replace(f.name, SESSION_FILE)  # unlike os.rename(), also overwrites on Windows
    
        def prune_old_entries(self):
            old = self.read_session_file()
            new = old.copy()
            now = datetime.datetime.now()
            for file, entry in old.items():
                tstamp = entry["last_update"]
                last_update = datetime.datetime.strptime(
                    tstamp, '%Y-%m-%d %H:%M:%S.%f')
                delta_days = (now - last_update).days
                if delta_days > RM_FILE_OLDER_THAN_DAYS:
                    log("removing old saved file %r" % file)
                    del new[file]
            if new != old:
                self.write_session_file(new)
    
        # --- operations
    
        def add_entry(self, file, x, y):
            d = self.read_session_file()
            d[file] = dict(
                x=x,
                y=y,
                # strftime() always includes microseconds, as strptime() expects
                last_update=datetime.datetime.now().strftime(
                    '%Y-%m-%d %H:%M:%S.%f'),
            )
            self.write_session_file(d)
    
        def load_entry(self, file):
            d = self.read_session_file()
            try:
                return d[file]
            except KeyError:
                return None
    
    
    session = Session()
    
    
    class Events(sublime_plugin.EventListener):
    
        # --- utils
    
        @staticmethod
        def get_cursor_pos(view):
            x, y = view.rowcol(view.sel()[0].begin())
            return x, y
    
        @staticmethod
        def set_cursor_pos(view, x, y):
            pt = view.text_point(x, y)
            view.sel().clear()
            view.sel().add(sublime.Region(pt))
            view.show(pt)
    
        def save_cursor_position(self, view):
            file_name = view.file_name()
            if file_name is None:
                return  # unsaved buffer, not backed by a file on disk
            log("saving cursor position for %s" % file_name)
            x, y = self.get_cursor_pos(view)
            session.add_entry(file_name, x, y)
    
        def load_cursor_position(self, view):
            entry = session.load_entry(view.file_name())
            if entry:
                self.set_cursor_pos(view, entry["x"], entry["y"])
    
        # --- callbacks
    
        def on_close(self, view):
            # called when a file is closed
            self.save_cursor_position(view)
    
        def on_load(self, view):
            # called when a file is opened
            self.load_cursor_position(view)
    
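    The `sublime` module exists only inside Sublime Text, so the session-file logic above can't be exercised from a plain Python shell. Below is a minimal sketch of the same two ideas, atomic JSON writes and pruning entries by age, with no Sublime dependency (e.g. for unit tests). The function names are hypothetical. Two details differ from a naive version. It uses os.replace(), which overwrites an existing destination on every OS, whereas os.rename() fails on Windows if the file exists. It also writes timestamps with strftime(), because str(datetime) drops the microseconds when they are zero, and strptime('%f') would then fail.

```python
import datetime
import json
import os
import tempfile

TSTAMP_FMT = "%Y-%m-%d %H:%M:%S.%f"


def save_json_atomically(path, data):
    """Write `data` as JSON to `path` without leaving a half-written file."""
    # Temp file in the same directory => same filesystem => atomic replace.
    with tempfile.NamedTemporaryFile(
            "wt", delete=False, dir=os.path.dirname(path)) as f:
        json.dump(data, f, indent=4, sort_keys=True)
    os.replace(f.name, path)


def prune(entries, now, max_days):
    """Return a copy of `entries` without those older than `max_days`."""
    kept = {}
    for name, entry in entries.items():
        last_update = datetime.datetime.strptime(entry["last_update"], TSTAMP_FMT)
        if (now - last_update).days <= max_days:
            kept[name] = entry
    return kept
```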
