Blog posts for tags/python-core

  1. From Python 3.3 to today: ending 15 years of subprocess polling

    One of the less fun aspects of process management on POSIX systems is waiting for a process to terminate. The standard library's subprocess module has relied on a busy-loop polling approach since the timeout parameter was added to subprocess.Popen.wait() in Python 3.3, around 15 years ago (see source). And psutil's Process.wait() method uses exactly the same technique (see source).

    The logic is straightforward: check whether the process has exited using non-blocking waitpid(WNOHANG), sleep briefly, check again, sleep a bit longer, and so on.

    import os, time

    def wait_busy(pid, timeout):
        end = time.monotonic() + timeout
        interval = 0.0001  # start by sleeping 0.1 ms
        while time.monotonic() < end:
            # non-blocking check: waitpid() returns (0, 0) while the child runs
            pid_done, _ = os.waitpid(pid, os.WNOHANG)
            if pid_done:
                return
            time.sleep(interval)
            interval = min(interval * 2, 0.04)  # exponential backoff, 40 ms cap
        raise TimeoutError
    

    In this blog post I'll show how I finally addressed this long-standing inefficiency, first in psutil and, most excitingly, directly in CPython's standard library subprocess module.

    The problem with busy-polling

    • CPU wake-ups: even with exponential backoff (starting at 0.1ms, capping at 40ms), the system constantly wakes up to check process status, wasting CPU cycles and draining batteries.
    • Latency: there's always a gap between when a process actually terminates and when you detect it.
    • Scalability: monitoring many processes simultaneously magnifies all of the above.
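    To put a rough number on those wake-ups, here is a small simulation of the backoff schedule described above. This is a sketch that mirrors the wait_busy() loop from earlier, counting iterations instead of actually sleeping:

```python
def count_wakeups(timeout, start=0.0001, cap=0.04):
    """Count how many times the busy-loop wakes up before the timeout,
    assuming the process never exits (worst case)."""
    elapsed = 0.0
    interval = start
    wakeups = 0
    while elapsed < timeout:
        elapsed += interval  # pretend we slept for `interval` seconds
        wakeups += 1
        interval = min(interval * 2, cap)
    return wakeups

print(count_wakeups(10))  # 258
```

    For a 10 second wait that's 258 wake-ups, which matches almost exactly the voluntary context switch count measured later in this post.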

    Event-driven waiting

    All POSIX systems provide at least one mechanism to be notified when a file descriptor becomes ready: the select(), poll(), epoll() (Linux) and kqueue() (BSD / macOS) system calls. Until recently, I believed they could only be used with file descriptors referencing sockets, pipes, etc., but it turns out they can also be used to wait for events on process PIDs!

    Linux

    In 2019, Linux 5.3 introduced a new syscall, pidfd_open(), which was exposed in Python 3.9 as os.pidfd_open(). It returns a file descriptor referencing a process PID. The interesting thing is that this file descriptor can be used in conjunction with select(), poll() or epoll() to effectively wait until the process exits. E.g. by using poll():

    import os, select

    def wait_pidfd(pid, timeout):
        pidfd = os.pidfd_open(pid)
        try:
            poller = select.poll()
            # the pidfd becomes readable once the process terminates
            poller.register(pidfd, select.POLLIN)
            # block until the process exits or the timeout expires
            events = poller.poll(timeout * 1000)  # poll() wants milliseconds
            if events:
                return
            raise TimeoutError
        finally:
            os.close(pidfd)  # avoid leaking the file descriptor
    

    This approach has zero busy-looping. The kernel wakes us up exactly when the process terminates or when the timeout expires if the PID is still alive.

    I chose poll() over select() because select() has a historical file descriptor limit (FD_SETSIZE), which typically caps it at 1024 file descriptors per process (this reminded me of BPO-1685000).

    I chose poll() over epoll() because it does not require creating an additional file descriptor. It also needs only a single syscall, which should make it a bit more efficient when monitoring a single FD rather than many.

    macOS and BSD

    BSD-derived systems (including macOS) provide the kqueue() syscall. It's conceptually similar to select(), poll() and epoll(), but more powerful (e.g. it can also handle regular files). kqueue() can be passed a PID directly, and it will return once the PID disappears or the timeout expires:

    import select

    def wait_kqueue(pid, timeout):
        kq = select.kqueue()
        try:
            kev = select.kevent(
                pid,
                filter=select.KQ_FILTER_PROC,   # watch a process (by PID)...
                flags=select.KQ_EV_ADD | select.KQ_EV_ONESHOT,
                fflags=select.KQ_NOTE_EXIT,     # ...for its exit event
            )
            # block until the process exits or the timeout expires
            events = kq.control([kev], 1, timeout)
            if events:
                return
            raise TimeoutError
        finally:
            kq.close()  # avoid leaking the kqueue file descriptor
    

    Windows

    On Windows, neither psutil nor the subprocess module busy-loops, thanks to the WaitForSingleObject routine. This means Windows has effectively had event-driven process waiting from the start, so there was nothing to do on that front.

    Graceful fallbacks

    Both pidfd_open() and kqueue() can fail for different reasons: for example, with EMFILE if the process runs out of file descriptors (the default limit is usually 1024), or with EACCES / EPERM if the syscall was explicitly blocked at the system level by the sysadmin (e.g. via seccomp). In all cases, psutil silently falls back to the traditional busy-loop polling approach rather than raising an exception.
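    A minimal sketch of what this fast path with fallback can look like on Linux (hypothetical helper names; the real psutil implementation differs in the details):

```python
import os, select, time

def wait_process(pid, timeout):
    """Wait for a process, preferring the event-driven path."""
    try:
        # may raise OSError (EMFILE, EACCES, EPERM, ...); AttributeError
        # covers non-Linux platforms and Python < 3.9
        pidfd = os.pidfd_open(pid)
    except (AttributeError, OSError):
        return wait_busy(pid, timeout)  # graceful fallback
    try:
        poller = select.poll()
        poller.register(pidfd, select.POLLIN)
        if not poller.poll(timeout * 1000):  # block until exit or timeout
            raise TimeoutError
    finally:
        os.close(pidfd)

def wait_busy(pid, timeout):
    """Traditional busy-loop fallback (same technique as shown earlier)."""
    end = time.monotonic() + timeout
    interval = 0.0001
    while time.monotonic() < end:
        if os.waitpid(pid, os.WNOHANG)[0]:
            return
        time.sleep(interval)
        interval = min(interval * 2, 0.04)
    raise TimeoutError
```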

    This fast-path-with-fallback approach is similar in spirit to BPO-33671, where I sped up shutil.copyfile() by using zero-copy system calls back in 2018. There, the more efficient os.sendfile() is attempted first and, if it fails (e.g. on network filesystems), we fall back to the traditional read() / write() approach to copy regular files.

    Measurement

    As a simple experiment, here's a program which waits on itself for 10 seconds without terminating:

    # test.py
    import psutil, os
    try:
        psutil.Process(os.getpid()).wait(timeout=10)
    except psutil.TimeoutExpired:
        pass
    

    We can measure the CPU context switching using /usr/bin/time -v. Before the patch (the busy-loop):

    $ /usr/bin/time -v python3 test.py 2>&1 | grep context
        Voluntary context switches: 258
        Involuntary context switches: 4
    

    After the patch (the event-driven approach):

    $ /usr/bin/time -v python3 test.py 2>&1 | grep context
        Voluntary context switches: 2
        Involuntary context switches: 1
    

    This shows that instead of spinning in userspace, the process blocks in poll() / kqueue(), and is woken up only when the kernel notifies it, resulting in just a few CPU context switches.

    Sleeping state

    It's also interesting to note that waiting via poll() (or kqueue()) puts the process into the exact same sleeping state as a plain time.sleep call. From the kernel's perspective, both are interruptible sleeps: the process is de-scheduled, consumes zero CPU, and sits quietly in kernel space.

    The "S+" state shown below by ps means that the process "sleeps in foreground".

    • time.sleep:
    $ (python3 -c 'import time; time.sleep(10)' & pid=$!; sleep 0.3; ps -o pid,stat,comm -p $pid) && fg &>/dev/null
        PID STAT COMMAND
     491573 S+   python3
    
    • select.poll:
    $ (python3 -c 'import os,select; fd = os.pidfd_open(os.getpid(),0); p = select.poll(); p.register(fd,select.POLLIN); p.poll(10_000)' & pid=$!; sleep 0.3; ps -o pid,stat,comm -p $pid) && fg &>/dev/null
        PID STAT COMMAND
     491748 S+   python3
    

    CPython contribution

    After landing the psutil implementation (PR-2706), I took the extra step and submitted a matching pull request against CPython's subprocess module: cpython/PR-144047.

    I'm especially proud of this one: this is the third time in psutil's 17+ year history that a feature developed in psutil made its way upstream into the Python standard library.

    • The first was back in 2010, when Process.nice() inspired os.getpriority() and os.setpriority(), see BPO-10784. Landed in Python 3.3.
    • The second was back in 2011, when psutil.disk_usage() inspired shutil.disk_usage(), see python-ideas ML proposal. Landed in Python 3.3.

    Funny thing: 15 years ago, Python 3.3 added the timeout parameter to subprocess.Popen.wait (see commit). That's probably where I took inspiration when I first added the timeout parameter to psutil's Process.wait() around the same time (see commit). Now, 15 years later, I'm contributing back a similar improvement for that very same timeout parameter. The circle is complete.

  2. Wheels for free-threaded Python now available

    With the release of psutil 7.1.2, wheels for free-threaded Python are now available. This milestone was achieved largely through a community effort, as several internal refactorings to the C code were required to make it possible (see #2565). Many of these changes were contributed by Lysandros Nikolaou. Thanks to him for the effort and for bearing with me in code reviews! ;-)

    What is free-threaded Python?

    Free-threaded Python (available since Python 3.13) refers to Python builds that are compiled with the GIL (Global Interpreter Lock) disabled, allowing true parallel execution of Python bytecodes across multiple threads. This is particularly beneficial for CPU-bound applications, as it enables better utilization of multi-core processors.
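    As a quick aside, here's how you can check from Python whether you're on such a build. Note that sys._is_gil_enabled() exists only on Python 3.13+, hence the defensive getattr():

```python
import sys, sysconfig

# Was the interpreter *built* with free-threading support?
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# Is the GIL actually disabled at runtime? On Python < 3.13
# sys._is_gil_enabled() doesn't exist and the GIL is always enabled.
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()

print(f"free-threaded build: {free_threaded_build}, GIL enabled: {gil_enabled}")
```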

    The state of free-threaded wheels

    According to Hugo van Kemenade's free-threaded wheels tracker, the adoption of free-threaded wheels among the top 360 most-downloaded PyPI packages with C extensions is still limited. Only 128 out of these 360 packages provide wheels compiled for free-threaded Python, meaning they can run on Python builds with the GIL disabled. This shows that, while progress has been made, most popular packages with C extensions still do not offer ready-made wheels for free-threaded Python.

    What it means for users

    When a library author provides a wheel, users can install a pre-compiled binary package without having to build it from source. This is especially important for packages with C extensions, like psutil, which is largely written in C. Such packages often have complex build requirements and require installing a C compiler. On Windows, that means installing Visual Studio or the Build Tools, which can take several gigabytes and a significant setup effort. Providing wheels spares users this hassle and makes installation far simpler; for such packages it's effectively essential. You basically pip install psutil and you're done.

    What it means for library authors

    Currently, universal wheels for free-threaded Python do not exist: each wheel must be built for a specific Python version, so right now authors must create separate wheels for Python 3.13 and 3.14. That already means distributing a lot of files:

    psutil-7.1.2-cp313-cp313t-macosx_10_13_x86_64.whl
    psutil-7.1.2-cp313-cp313t-macosx_11_0_arm64.whl
    psutil-7.1.2-cp313-cp313t-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl
    psutil-7.1.2-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl
    psutil-7.1.2-cp313-cp313t-win_amd64.whl
    psutil-7.1.2-cp313-cp313t-win_arm64.whl
    psutil-7.1.2-cp314-cp314t-macosx_10_15_x86_64.whl
    psutil-7.1.2-cp314-cp314t-macosx_11_0_arm64.whl
    psutil-7.1.2-cp314-cp314t-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl
    psutil-7.1.2-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl
    psutil-7.1.2-cp314-cp314t-win_amd64.whl
    psutil-7.1.2-cp314-cp314t-win_arm64.whl
    
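    Each of those filenames follows the wheel naming convention standardized by PEP 427: {name}-{version}-{python tag}-{abi tag}-{platform tag}.whl, where a trailing t in the ABI tag (cp313t, cp314t) marks a free-threaded build. A small sketch of pulling a name apart (parse_wheel_name() is a hypothetical helper; it ignores the optional build-tag component):

```python
def parse_wheel_name(filename):
    """Split a wheel filename into its PEP 427 components."""
    name, version, pytag, abitag, plat = filename[:-len(".whl")].split("-")
    return {"name": name, "version": version, "python": pytag,
            "abi": abitag, "platform": plat,
            # free-threaded ABI tags end with "t", e.g. "cp313t"
            "free_threaded": abitag.endswith("t")}

info = parse_wheel_name("psutil-7.1.2-cp313-cp313t-win_amd64.whl")
print(info["abi"], info["free_threaded"])  # cp313t True
```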

    This also multiplies CI jobs and slows down the test matrix (see build.yml). A true universal wheel would greatly reduce this overhead, allowing a single wheel to support multiple Python versions and platforms. Hopefully, Python 3.15 will simplify this process. Two competing proposals, PEP 803 and PEP 809, aim to standardize wheel naming and metadata to allow producing a single wheel that covers multiple Python versions. That would drastically reduce distribution complexity for library authors, and it's fair to say it's essential for free-threaded CPython to truly succeed.

    How to install free-threaded psutil

    You can now install psutil for free-threaded Python directly via pip:

    pip install psutil --only-binary=:all:
    

    This ensures you get the pre-compiled wheels without triggering a source build.


  3. Fixing Unicode across Python 2 and 3

    This one took a while. Adding proper Unicode support to psutil took four months of auditing, design decisions, and rewriting nearly every API that returned a string. The full journey is documented in #1040, and what follows is a summary.

    This can serve as a case study for any Python library with a C extension that needs to support both Python 2 and Python 3, as it will encounter the exact same set of problems.

    What was broken

    psutil has many APIs that return strings, several of which misbehaved when it came to Unicode. There were three distinct problems (#1040). Each API could:

    • A: raise a decoding error for non-ASCII strings (Python 3).
    • B: return unicode instead of str (Python 2).
    • C: return incorrect / invalid encoded data for non-ASCII strings (both).

    Process.memory_maps() hit all three on various OSes. disk_partitions() raised decoding errors on every UNIX except Linux. Windows service methods leaked unicode into Python 2 return values. The C extension had accumulated years of ad-hoc encode/decode decisions, with no single rule covering all of them.

    It was a mess.

    Filesystem or locale encoding?

    The first problem was that the C extension used two different approaches when decoding and returning strings: PyUnicode_DecodeFSDefault (filesystem encoding) for path-like APIs, and PyUnicode_DecodeLocale (user locale) for non-path strings like Process.username().

    It appeared clear that I had to use PyUnicode_DecodeFSDefault for all filesystem-related APIs like Process.exe() and Process.open_files().

    It was less clear, though, when to use PyUnicode_DecodeLocale.

    After some back and forth, I decided to use a single encoding for all APIs: the filesystem encoding (PyUnicode_DecodeFSDefault). This makes the encoding choice an implementation detail of psutil, not something the user has to care about.
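    For reference, the pure Python counterpart of PyUnicode_DecodeFSDefault is os.fsdecode(), which decodes bytes using sys.getfilesystemencoding() and the interpreter's filesystem error handler:

```python
import os, sys

# The encoding PyUnicode_DecodeFSDefault uses under the hood:
print(sys.getfilesystemencoding())  # e.g. 'utf-8' on modern Linux and macOS

# os.fsdecode() is the Python-level counterpart: it decodes bytes with the
# filesystem encoding and the filesystem error handler, and it's a no-op
# when passed a str.
assert os.fsdecode(b"cafe") == "cafe"
```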

    Error handling

    The second question was what to do when a string cannot be correctly decoded (because it's invalid, corrupted, or otherwise malformed). On Python 3 + UNIX the natural choice was 'surrogateescape', which is also the default for PyUnicode_DecodeFSDefault. On Windows the default is 'surrogatepass' since Python 3.6 (PEP 529), and 'replace' before that.
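    The difference between the two error handlers is easy to demonstrate on an invalid UTF-8 byte: 'surrogateescape' smuggles the bad byte through as a lone surrogate (so the original bytes can be recovered later), while 'replace' substitutes U+FFFD and loses information:

```python
bad = b"caf\xe9"  # "café" in Latin-1; the lone \xe9 byte is invalid UTF-8

esc = bad.decode("utf-8", errors="surrogateescape")
rep = bad.decode("utf-8", errors="replace")
print(ascii(esc))  # 'caf\udce9' -- bad byte preserved as a lone surrogate
print(ascii(rep))  # 'caf\ufffd' -- bad byte replaced by U+FFFD, info lost

# surrogateescape round-trips: the original bytes can be recovered
assert esc.encode("utf-8", errors="surrogateescape") == bad
```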

    And here the trouble begins: Python 2 is different. To correctly handle all kinds of strings on Python 2 we would have to return unicode instead of str, but I didn't want to do that, nor have APIs return two different types depending on the circumstances.

    Since unicode support is already broken in Python 2 and its stdlib (see bpo-18695), I was happy to always return str, use 'replace' as the error handler, and simply consider unicode support in psutil + Python 2 broken.

    Final behavior

    Starting from 5.3.0, psutil behaves consistently across all APIs that return a string. The rules are intentionally simple, even if the underlying implementation is not.

    The notes below apply to any method returning a string such as Process.exe() or Process.cwd(), including non-filesystem-related methods such as Process.username():

    • all strings are encoded using the OS filesystem encoding (PyUnicode_DecodeFSDefault), which varies depending on the platform you're on (e.g. 'UTF-8' on Linux, 'mbcs' on Windows).

    • no API call is supposed to crash with UnicodeDecodeError.

    • in case of badly encoded data returned by the OS, the following error handlers are used to replace the bad characters in the string:

      • Python 2: 'replace'.
      • Python 3: 'surrogateescape' on POSIX, 'replace' on Windows.
    • on Python 2 all APIs return bytes (str type), never unicode.

    • on Python 2 you can go back to unicode by doing:

      >>> unicode(proc.exe(), sys.getdefaultencoding(), errors="replace")
      

    The full journey was implemented in PR-1052, and shipped in 5.3.0 (see the changelog).
