Blog posts for tags/compatibility

  1. Letting go of Python 2.7

    About dropping Python 2.7 support in psutil, 3 years ago I stated (#2014):

    Not a chance, for many years to come. [Python 2.7] currently represents 7-10% of total downloads, meaning around 70k / 100k downloads per day.

    Only 3 years later, and to my surprise, downloads for Python 2.7 dropped to 0.36%! As such, as of psutil 7.0.0, I finally decided to drop support for Python 2.7!

    The numbers

    These are downloads per month:

    $ pypinfo --percent psutil pyversion
    Served from cache: False
    Data processed: 4.65 GiB
    Data billed: 4.65 GiB
    Estimated cost: $0.03
    
    | python_version | percent | download_count |
    | -------------- | ------- | -------------- |
    | 3.10           |  23.84% |     26,354,506 |
    | 3.8            |  18.87% |     20,862,015 |
    | 3.7            |  17.38% |     19,217,960 |
    | 3.9            |  17.00% |     18,798,843 |
    | 3.11           |  13.63% |     15,066,706 |
    | 3.12           |   7.01% |      7,754,751 |
    | 3.13           |   1.15% |      1,267,008 |
    | 3.6            |   0.73% |        803,189 |
    | 2.7            |   0.36% |        402,111 |
    | 3.5            |   0.03% |         28,656 |
    | Total          |         |    110,555,745 |
    

    According to pypistats.org Python 2.7 downloads represent 0.28% of the total, around 15,000 downloads per day.

    The pain

    Keeping 2.7 alive had become increasingly difficult, but still possible: tests ran via old PyPI backports and a tweaked GitHub Actions workflow on Linux and macOS, plus a separate third-party service (Appveyor) for Windows. But the workarounds in the source kept piling up:

    • A Python compatibility layer (psutil/_compat.py) plus #if PY_MAJOR_VERSION >= 3 branches in C, with constant str-vs-unicode juggling on both sides.
    • No f-strings, and no free use of enum for constants (which ended up with a different API shape than on Python 3).
    • An outdated pip and outdated deps.
    • 4 extra CI jobs per commit (Linux, macOS, Windows 32-bit and 64-bit), making the pipeline slower and flakier.
    • 7 wheels specific to Python 2.7 to ship on every release:
    psutil-6.1.1-cp27-cp27m-macosx_10_9_x86_64.whl
    psutil-6.1.1-cp27-none-win32.whl
    psutil-6.1.1-cp27-none-win_amd64.whl
    psutil-6.1.1-cp27-cp27m-manylinux2010_i686.whl
    psutil-6.1.1-cp27-cp27m-manylinux2010_x86_64.whl
    psutil-6.1.1-cp27-cp27mu-manylinux2010_i686.whl
    psutil-6.1.1-cp27-cp27mu-manylinux2010_x86_64.whl
    

    The removal

    The removal was done in PR-2481, which dropped around 1500 lines of code (nice!). It felt liberating. The documentation still promises that the 6.1.* series will keep supporting Python 2.7 and will receive critical bug fixes only (no new features). It will be maintained in a dedicated python2 branch.

    I explicitly kept the setup.py script syntax-compatible with Python 2.7 so that, when the tarball is fetched from PyPI, pip install psutil emits an informative error message. A user trying to install psutil on Python 2.7 will see:

    $ pip2 install psutil
    As of version 7.0.0 psutil no longer supports Python 2.7.
    Latest version supporting Python 2.7 is psutil 6.1.X.
    Install it with: "pip2 install psutil==6.1.*".
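
    A guard like this only works if the script itself still parses under Python 2.7. The following is a minimal sketch of the idea, not the actual psutil setup.py: the function name and structure are hypothetical, and only the message text is taken from the output above.

```python
import sys


def py27_error(version_info=sys.version_info):
    """Return an error message on Python 2, or None on Python 3.

    Hypothetical helper: it avoids f-strings and any other Python 3 only
    syntax, so that Python 2.7 can still parse and run it.
    """
    if version_info[0] == 2:
        return (
            "As of version 7.0.0 psutil no longer supports Python 2.7.\n"
            "Latest version supporting Python 2.7 is psutil 6.1.X.\n"
            'Install it with: "pip2 install psutil==6.1.*".'
        )
    return None


if __name__ == "__main__":
    msg = py27_error()
    if msg is not None:
        # sys.exit() with a string prints it to stderr and exits with status 1.
        sys.exit(msg)
```

    The point of the design is that the check runs before anything Python 3 specific is imported or parsed; a single f-string earlier in the file would turn the friendly message into a SyntaxError.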
    
  2. Announcing psutil 5.6.0

    psutil 5.6.0 is out. Highlights: a new Process.parents() method, several important Windows improvements, and the removal of Process.memory_maps() on macOS.

    Process parents()

    The new method returns the parents of a process as a list of Process instances. If no parents are known, an empty list is returned.

    >>> import psutil
    >>> p = psutil.Process(5312)
    >>> p.parents()
    [psutil.Process(pid=4699, name='bash', started='09:06:44'),
     psutil.Process(pid=4689, name='gnome-terminal-server', started='09:06:44'),
     psutil.Process(pid=1, name='systemd', started='05:56:55')]
    

    Nothing fundamentally new here, since this is a convenience wrapper around Process.parent(), but it's still nice to have it built in. It pairs well with Process.children() when working with process trees. The idea was proposed by Ghislain Le Meur.
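
    To show what "convenience wrapper" means here, this is a rough sketch of how the same list can be built by calling parent() repeatedly. It is not the psutil implementation; FakeProcess is a hypothetical stand-in so the example runs without psutil installed, and it only assumes that parent() returns None when there is no parent.

```python
def parents(proc):
    """Return the ancestors of proc, nearest first."""
    result = []
    parent = proc.parent()
    while parent is not None:
        result.append(parent)
        parent = parent.parent()
    return result


class FakeProcess:
    """Hypothetical stand-in for psutil.Process, for illustration only."""

    def __init__(self, pid, parent=None):
        self.pid = pid
        self._parent = parent

    def parent(self):
        return self._parent


init = FakeProcess(1)
bash = FakeProcess(4699, parent=init)
child = FakeProcess(5312, parent=bash)

chain = [p.pid for p in parents(child)]  # nearest ancestor first
```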

    Windows

    Certain Windows APIs that need to be dynamically loaded from DLLs are now loaded only once at startup, instead of on every function call. This makes some operations 50% to 100% faster; see benchmarks in PR-1422.

    Process.suspend() and Process.resume() previously iterated over all process threads via CreateToolhelp32Snapshot(), which was unorthodox and broke when the process had been suspended by Process Hacker. They now call the undocumented NtSuspendProcess() / NtResumeProcess() NT APIs, same as Process Hacker and Sysinternals tools. Discussed in #1379, implemented in PR-1435.

    SE DEBUG is a privilege bit set on the Python process at startup so psutil can query processes owned by other users (Administrator, Local System), meaning fewer AccessDenied exceptions for low-PID processes. The code setting it had presumably been broken for years and is now finally fixed in PR-1429.

    Removal of Process.memory_maps() on macOS

    Process.memory_maps() is gone on macOS (#1291). The underlying Apple API would randomly raise EINVAL or segfault the host process, and no amount of reverse-engineering produced a safe fix. So I removed it. This is covered in a separate post.

    Improved exceptions

    One problem that affected psutil maintenance over the years was receiving bug reports whose tracebacks did not indicate which syscall had actually failed. This was especially painful on Windows, where a single routine may invoke multiple Windows APIs. Now the OSError (or WindowsError) exception includes the syscall from which the error originated. See PR-1428.

    Other changes

    See the changelog.

  3. Removing Process.memory_maps() on macOS

    This is part of the psutil 5.6.0 release (see the full release notes).

    As of 5.6.0, Process.memory_maps() is no longer defined on macOS.

    The bug

    #1291: on macOS, Process.memory_maps() would either raise OSError: [Errno 22] Invalid argument or segfault the whole Python process! Both triggered from code as simple as psutil.Process().as_dict(), since Process.as_dict() iterates every attribute, and Process.memory_maps() is one of them.

    The root cause was inside Apple's undocumented proc_regionfilename() syscall. On some memory regions it returns EINVAL. On others it takes the process down. Which regions? Nobody figured it out. Arnon Yaari (@wiggin15) did most of the investigation: he wrote a standalone C reproducer and walked me through what he'd tried.

    In PR-1436 I attempted a fix by reverse-engineering vmmap(1) but it didn't work. The fundamental problem is that vmmap is closed source and proc_regionfilename is undocumented. Neither my virtualized macOS (10.11.6) nor Travis CI (10.12.1) could reproduce the bug, which reproduced reliably only on 10.14.3.

    Why remove outright

    While removing the C code I noticed that the macOS unit test had been disabled long ago, presumably by me after recurring flaky Travis runs. This means the method had been broken on some macOS versions for far longer than the 2018 bug report suggested.

    Deprecating for a cycle didn't help either: raising AccessDenied breaks code that relied on a successful return, returning an empty list does the same silently, and leaving the method in place doesn't stop the segfault. Basically there was no sane solution, so since 5.6 is a major version I decided to just remove Process.memory_maps() for good.

    On macOS it never supported other processes anyway. Calling it on any PID other than the current one (or its children) raised AccessDenied, even as root.

    If someone finds a Mach API path that works, the method can return. Nobody has found one so far.

  4. Fixing Unicode across Python 2 and 3

    This one took a while. Adding proper Unicode support to psutil took four months of auditing, design decisions, and rewriting nearly every API that returned a string. The full journey is documented in #1040, and what follows is a summary.

    This can serve as a case study for any Python library with a C extension that needs to support both Python 2 and Python 3, as it will encounter the exact same set of problems.

    What was broken

    psutil has many APIs that return a string, and a lot of them misbehaved when it came to Unicode. There were three distinct problems (#1040). Each API could:

    • A: raise a decoding error for non-ASCII strings (Python 3).
    • B: return unicode instead of str (Python 2).
    • C: return incorrect / invalid encoded data for non-ASCII strings (both).

    Process.memory_maps() hit all three on various OSes. disk_partitions() raised decoding errors on every UNIX except Linux. Windows service methods leaked unicode into Python 2 return values. The C extension had accumulated years of ad-hoc encode/decode decisions, with no single rule covering all of them.

    It was a mess.

    Filesystem or locale encoding?

    The first problem was that the C extension used two approaches for decoding and returning a string: PyUnicode_DecodeFSDefault (filesystem encoding) for path-like APIs, and PyUnicode_DecodeLocale (user locale) for non-path strings like Process.username().

    It appeared clear that I had to use PyUnicode_DecodeFSDefault for all filesystem-related APIs like Process.exe() and Process.open_files().

    It was less clear, though, when to use PyUnicode_DecodeLocale.

    After some back and forth, I decided to use a single encoding for all APIs: the filesystem encoding (PyUnicode_DecodeFSDefault). This makes the encoding choice an implementation detail of psutil, not something the user has to care about.

    Error handling

    The second question was what to do when a string cannot be decoded correctly (because it is invalid, corrupted or whatever). On Python 3 + UNIX the natural choice was 'surrogateescape', which is also the default for PyUnicode_DecodeFSDefault. On Windows the default is 'surrogatepass' (Python 3.6) or 'replace' as per PEP 529.

    And here come the troubles: Python 2 is different. To correctly handle all kinds of strings on Python 2 we should return unicode instead of str, but I didn't want to do that, nor have APIs which return two different types depending on the circumstance.

    Since unicode support is already broken in Python 2 and its stdlib (see bpo-18695), I was happy to always return str, use 'replace' as the error handler, and simply consider unicode support in psutil + Python 2 broken.
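
    The practical difference between the two error handlers is easy to see on Python 3 with plain bytes.decode(). This is only an illustration: the file name is made up, and UTF-8 is hardcoded here instead of the real filesystem encoding.

```python
# A Latin-1 encoded name: byte 0xE9 is not valid UTF-8 in this position.
raw = b"caf\xe9.txt"

# 'replace' substitutes U+FFFD: always printable, but the original byte is lost.
replaced = raw.decode("utf-8", errors="replace")

# 'surrogateescape' smuggles the bad byte through as a lone surrogate (U+DCE9),
# so the exact original bytes can be recovered later.
escaped = raw.decode("utf-8", errors="surrogateescape")
roundtrip = escaped.encode("utf-8", errors="surrogateescape")
```

    With 'surrogateescape' the string can be handed back to the OS unchanged (for example to open the file), which is not possible once 'replace' has discarded the byte.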

    Final behavior

    Starting from 5.3.0, psutil behaves consistently across all APIs that return a string. The rules are intentionally simple, even if the underlying implementation is not.

    The notes below apply to any method returning a string such as Process.exe() or Process.cwd(), including non-filesystem-related methods such as Process.username():

    • all strings are encoded using the OS filesystem encoding (PyUnicode_DecodeFSDefault), which varies depending on the platform you're on (e.g. 'UTF-8' on Linux, 'mbcs' on Windows).

    • no API call is supposed to crash with UnicodeDecodeError.

    • in case of badly encoded data returned by the OS, the following error handlers are used to replace the bad characters in the string:

      • Python 2: 'replace'.
      • Python 3: 'surrogateescape' on POSIX, 'replace' on Windows.
    • on Python 2 all APIs return bytes (str type), never unicode.

    • on Python 2 you can go back to unicode by doing:

      >>> unicode(proc.exe(), sys.getdefaultencoding(), errors="replace")
      

    The change was implemented in PR-1052 and shipped in 5.3.0 (see the changelog).

  5. Real process memory and environ in Python

    psutil 4.0.0 is out, with some interesting news about process memory metrics. I'll get straight to the point and describe what's new.

    "Real" process memory info

    Determining how much memory a process really uses is not an easy matter. RSS (Resident Set Size), which most people rely on, is misleading because it includes both memory unique to the process and memory shared with others. What's more interesting for profiling is the memory that would be freed if the process were terminated right now. In the Linux world this is called USS (Unique Set Size), the major feature introduced in psutil 4.0.0 (not only for Linux but also for Windows and macOS).

    USS memory

    The USS (Unique Set Size) is the memory unique to a process, that would be freed if the process were terminated right now. On Linux it can be determined by parsing the "private" blocks in /proc/PID/smaps. The Firefox team pushed this further and got it working on macOS and Windows too.

    >>> psutil.Process().memory_full_info()
    pfullmem(rss=101990, vms=521888, shared=38804, text=28200, lib=0, data=59672, dirty=0, uss=81623, pss=91788, swap=0)
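
    On Linux, the "private" blocks correspond to the Private_Clean and Private_Dirty fields that /proc/PID/smaps reports for each mapping. The following is a simplified sketch that assumes USS is the sum of those two fields; the function name and the sample text are made up, and the real implementation may account for more.

```python
def uss_from_smaps(smaps_text):
    """Sum the Private_Clean and Private_Dirty fields and return bytes."""
    total_kb = 0
    for line in smaps_text.splitlines():
        if line.startswith(("Private_Clean:", "Private_Dirty:")):
            # Lines look like: "Private_Dirty:        12 kB"
            total_kb += int(line.split()[1])
    return total_kb * 1024


# Made-up excerpt for a single mapping, for illustration only.
SAMPLE = """\
Rss:                 100 kB
Pss:                  60 kB
Shared_Clean:         80 kB
Private_Clean:         8 kB
Private_Dirty:        12 kB
Swap:                  0 kB
"""

uss = uss_from_smaps(SAMPLE)  # (8 + 12) kB expressed in bytes
```

    For a real process the text would come from reading /proc/PID/smaps, which typically requires owning the process or being root.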
    

    PSS and swap

    On Linux there are two additional metrics that can also be determined via /proc/PID/smaps: PSS and swap.

    pss, aka "Proportional Set Size", represents the amount of memory shared with other processes, accounted so that the amount is divided evenly between the processes that share it. I.e. if a process has 10 MBs all to itself (USS) and 10 MBs shared with another process, its PSS will be 15 MBs.
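
    The arithmetic behind that example can be written down in a few lines. This is a simplified model with a hypothetical function name; the kernel does the accounting per page, not per region.

```python
def pss(private_mb, shared_regions):
    """Proportional Set Size under a simplified model.

    private_mb: memory unique to the process (its USS).
    shared_regions: iterable of (size_mb, sharers), where sharers counts
    every process mapping the region, including this one.
    """
    return private_mb + sum(size / sharers for size, sharers in shared_regions)


# 10 MB private, plus 10 MB shared between 2 processes.
example = pss(10, [(10, 2)])
```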

    swap is simply the amount of memory that has been swapped out to disk.

    With Process.memory_full_info() it is possible to implement a tool like procsmem.py, similar to smem on Linux, which provides a list of processes sorted by uss. It's interesting to see how rss differs from uss:

    ~/svn/psutil$ ./scripts/procsmem.py
    PID     User    Cmdline                            USS     PSS    Swap     RSS
    ==============================================================================
    ...
    3986    giampao /usr/bin/python3 /usr/bin/indi   15.3M   16.6M      0B   25.6M
    3906    giampao /usr/lib/ibus/ibus-ui-gtk3       17.6M   18.1M      0B   26.7M
    3991    giampao python /usr/bin/hp-systray -x    19.0M   23.3M      0B   40.7M
    3830    giampao /usr/bin/ibus-daemon --daemoni   19.0M   19.0M      0B   21.4M
    20529   giampao /opt/sublime_text/plugin_host    19.9M   20.1M      0B   22.0M
    3990    giampao nautilus -n                      20.6M   29.9M      0B   50.2M
    3898    giampao /usr/lib/unity/unity-panel-ser   27.1M   27.9M      0B   37.7M
    4176    giampao /usr/lib/evolution/evolution-c   35.7M   36.2M      0B   41.5M
    20712   giampao /usr/bin/python -B /home/giamp   45.6M   45.9M      0B   49.4M
    3880    giampao /usr/lib/x86_64-linux-gnu/hud/   51.6M   52.7M      0B   61.3M
    20513   giampao /opt/sublime_text/sublime_text   65.8M   73.0M      0B   87.9M
    3976    giampao compiz                          115.0M  117.0M      0B  130.9M
    32486   giampao skype                           145.1M  147.5M      0B  149.6M
    

    Implementation

    To get these values (uss, pss and swap) we need to walk the whole process address space. This usually requires higher privileges and is considerably slower than Process.memory_info(), which is probably why tools like ps and top show RSS/VMS instead of USS. A big thanks goes to the Mozilla team for figuring this out on Windows and macOS, and to Eric Rahm who put the psutil PRs together (see PR-744, PR-745 and PR-746). If you don't use Python and want to port the code to another language, those PRs are the place to look.

    Memory type percent

    After reorganizing the process memory APIs (PR-744), I added a new memtype parameter to Process.memory_percent(). You can now compare a specific memory type (not only RSS) against the total physical memory. E.g.

    >>> psutil.Process().memory_percent(memtype='pss')
    0.06877466326787016
    

    Process environ

    The second biggest improvement in psutil 4.0.0 is the ability to read a process's environment variables. This opens up interesting possibilities for process recognition and monitoring. For instance, you can start a process with a custom environment variable, then iterate over all processes to find the one of interest:

    import psutil
    for p in psutil.process_iter():
        try:
            env = p.environ()
        except psutil.Error:
            pass
        else:
            if 'MYAPP' in env:
                ...
    

    Process environ was a long-standing issue (#52, from 2009) that I gave up on because the Windows implementation only worked for the current process. Frank Benkstein solved that (PR-747), and it now works on Linux, Windows and macOS for all processes (you may still hit AccessDenied for processes owned by another user):

    >>> import psutil
    >>> from pprint import pprint as pp
    >>> pp(psutil.Process().environ())
    {...
     'CLUTTER_IM_MODULE': 'xim',
     'COLORTERM': 'gnome-terminal',
     'COMPIZ_BIN_PATH': '/usr/bin/',
     'HOME': '/home/giampaolo',
     'PWD': '/home/giampaolo/svn/psutil',
      }
    >>>
    

    Note that the resulting dict usually doesn't reflect changes made after the process started (e.g. os.environ['MYAPP'] = '1'). Again, for anyone porting this to other languages, PR-747 is a starting point.

    Extended disk IO stats

    psutil.disk_io_counters() now reports additional metrics on Linux and FreeBSD:

    • busy_time: the time spent doing actual I/Os (in milliseconds).
    • read_merged_count and write_merged_count (Linux only): the number of merged reads and writes (see the iostats doc).

    These give a better picture of actual disk utilization (#756), similar to the iostat command on Linux.
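
    As an illustration of how busy_time relates to utilization, two samples taken some interval apart can be turned into a percentage, roughly in the spirit of iostat's %util column. The helper name and the sample numbers below are hypothetical.

```python
def disk_utilization(busy_ms_before, busy_ms_after, interval_s):
    """Percentage of the interval the disk spent doing actual I/O."""
    busy_ms = busy_ms_after - busy_ms_before
    return 100.0 * busy_ms / (interval_s * 1000.0)


# Two made-up busy_time readings (milliseconds) taken 1 second apart.
util = disk_utilization(120000, 120250, 1.0)
```

    In practice the two readings would be the busy_time field of psutil.disk_io_counters(), sampled interval_s seconds apart.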

    OS constants

    Given the growing number of platform-specific metrics, I added a set of constants to tell which platform you're on: psutil.LINUX, psutil.WINDOWS, etc.

    Other fixes

    The complete list of changes is available in the changelog.

    Porting code

    Since 4.0.0 is a major version, I took the chance to (lightly) change / break some APIs.

    • Process.memory_info() no longer returns just an (rss, vms) namedtuple. It returns a variable-length namedtuple that varies by platform (rss and vms are always present, even on Windows). Essentially the same result as the old Process.memory_info_ex(). This shouldn't break your code unless you were doing rss, vms = p.memory_info().
    • Process.memory_info_ex() is deprecated. It still works as an alias for Process.memory_info(), issuing a DeprecationWarning.
    • psutil.disk_io_counters() on NetBSD and OpenBSD no longer returns write_count and read_count because the kernel doesn't provide them (we were returning the busy time instead). Should be a small issue given NetBSD and OpenBSD support is very recent.
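
    The unpacking problem mentioned in the first point can be reproduced without psutil. In this sketch a plain namedtuple stands in for the return value of Process.memory_info(); the field names follow the Linux layout and the values are illustrative.

```python
from collections import namedtuple

# Stand-in for the platform-dependent result, for illustration only.
pmem = namedtuple("pmem", "rss vms shared text lib data dirty")
mem = pmem(rss=101990, vms=521888, shared=38804, text=28200,
           lib=0, data=59672, dirty=0)

try:
    rss, vms = mem  # the pre-4.0.0 idiom
    unpacked = True
except ValueError:  # too many values to unpack
    unpacked = False

# Accessing fields by name works no matter how many fields the platform adds.
rss, vms = mem.rss, mem.vms
```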
