Blog posts for tags/new-api

  1. Making psutil twice as fast

    Starting from psutil 5.0.0 you can query multiple Process fields around twice as fast as before (see #799 and Process.oneshot() doc). It took 7 months, 108 commits, and a massive refactoring of psutil internals (PR-937), and I think it's one of the best improvements ever shipped in a psutil release.

    The problem

    How process information is retrieved varies by OS. Sometimes it means reading a file in /proc (Linux), other times calling C (Windows, BSD, macOS, SunOS), and the mechanism differs on every platform. psutil abstracts this away: you call Process.name() without worrying about what happens under the hood or which OS you're on.

    Internally, multiple pieces of process info (e.g. Process.name(), Process.ppid(), Process.uids(), Process.create_time()) are often fetched by the same syscall. On Linux, for instance, reading /proc/PID/stat yields the process name, terminal, CPU times, creation time, status and parent PID, but each method returns only one of those values and discards the rest. As a result, on Linux this code reads /proc/PID/stat 6 times:

    >>> import psutil
    >>> p = psutil.Process()
    >>> p.name()
    >>> p.cpu_times()
    >>> p.create_time()
    >>> p.ppid()
    >>> p.status()
    >>> p.terminal()
    

    On BSD most process metrics can be fetched with a single sysctl(), yet psutil was invoking it for each process method (e.g. see here and here).
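
    To make the waste concrete, here is a minimal sketch (not psutil's code) of the alternative on Linux: read /proc/PID/stat once and pull several fields out of that single read. Field positions follow the proc(5) man page; parse_proc_stat and read_proc_stat are hypothetical names. The name is taken between the first "(" and the last ")" because a process name may itself contain spaces or parentheses.

```python
def parse_proc_stat(data):
    """Split one read of /proc/PID/stat into several named fields."""
    pid_part, _, rest = data.partition(" (")
    # Use the *last* ')' so names like "my (weird) name" survive intact.
    name, _, tail = rest.rpartition(")")
    fields = tail.split()
    # After the name, fields are positional (proc(5) field N -> index N-3):
    # [0] state, [1] ppid, [4] tty_nr, [11] utime, [12] stime, [19] starttime
    return {
        "pid": int(pid_part),
        "name": name,
        "status": fields[0],
        "ppid": int(fields[1]),
        "tty_nr": int(fields[4]),
        "utime_ticks": int(fields[11]),
        "stime_ticks": int(fields[12]),
        "starttime_ticks": int(fields[19]),
    }


def read_proc_stat(pid="self"):
    # One open()/read() serves every field above (Linux only).
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_stat(f.read())
```

    Calling read_proc_stat() once gives name, status, parent PID, terminal and CPU/start times for the calling process, instead of six separate reads.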

    Do it in one shot

    It's clear that this approach is inefficient, especially in tools like top or htop, where process info is continuously fetched in a loop. psutil 5.0.0 introduces a new Process.oneshot() context manager. Inside it, the internal routine runs once (in the example, on the first Process.name() call) and the other values are cached. Subsequent calls sharing the same internal routine (read /proc/PID/stat, call sysctl() or whatever) return the cached value. The code above can now be rewritten like this, and on Linux it runs 2.4 times faster:

    >>> import psutil
    >>> p = psutil.Process()
    >>> with p.oneshot():
    ...     p.name()
    ...     p.cpu_times()
    ...     p.create_time()
    ...     p.ppid()
    ...     p.status()
    ...     p.terminal()
    

    Implementation

    One great thing about psutil's design is its abstraction: the code is divided into 3 "layers". Layer 1 is the main Process class (Python), which exposes the high-level API. Layer 2 is the OS-specific Python module, a thin wrapper on top of the OS-specific C extension module (layer 3).

    Because the code was organized this way (modular), the refactoring was reasonably smooth. I first refactored those C functions that collect multiple pieces of info and grouped them into a single function (e.g. see BSD implementation). Then I wrote a decorator that enables the cache only when requested (when entering the context manager), and decorated the "grouped functions" with it. The caching mechanism is controlled by the Process.oneshot() context manager, which is the only thing exposed to the end user. Here's the decorator:

    import functools


    def memoize_when_activated(fun):
        """A memoize decorator which is disabled by default. It can be
        activated and deactivated on request.
        """
        @functools.wraps(fun)
        def wrapper(self):
            if not wrapper.cache_activated:
                return fun(self)
            else:
                try:
                    ret = cache[fun]
                except KeyError:
                    ret = cache[fun] = fun(self)
                return ret

        def cache_activate():
            """Activate cache."""
            wrapper.cache_activated = True

        def cache_deactivate():
            """Deactivate and clear cache."""
            wrapper.cache_activated = False
            cache.clear()

        # Note: the cache and the on/off flag live on the decorated function,
        # so they are shared by every instance of the class that uses it.
        cache = {}
        wrapper.cache_activated = False
        wrapper.cache_activate = cache_activate
        wrapper.cache_deactivate = cache_deactivate
        return wrapper
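
    To see how the pieces fit together, here is a self-contained toy that pairs a condensed copy of the decorator with a oneshot() context manager built on contextlib.contextmanager. ToyProcess, its reads counter and _parse_stat are hypothetical stand-ins, not psutil code. It shows three accessors costing a single "read" inside the block and one read each outside it. As in the decorator above, the cache lives on the function object, so it is shared by all instances.

```python
import contextlib
import functools


def memoize_when_activated(fun):
    """Condensed copy of the decorator above."""
    @functools.wraps(fun)
    def wrapper(self):
        if not wrapper.cache_activated:
            return fun(self)
        try:
            return cache[fun]
        except KeyError:
            ret = cache[fun] = fun(self)
            return ret

    def cache_activate():
        wrapper.cache_activated = True

    def cache_deactivate():
        wrapper.cache_activated = False
        cache.clear()

    cache = {}
    wrapper.cache_activated = False
    wrapper.cache_activate = cache_activate
    wrapper.cache_deactivate = cache_deactivate
    return wrapper


class ToyProcess:
    """Hypothetical stand-in for psutil.Process that counts 'syscalls'."""

    def __init__(self):
        self.reads = 0

    @memoize_when_activated
    def _parse_stat(self):
        self.reads += 1  # pretend this reads /proc/PID/stat
        return {"name": "python", "ppid": 1, "status": "running"}

    def name(self):
        return self._parse_stat()["name"]

    def ppid(self):
        return self._parse_stat()["ppid"]

    def status(self):
        return self._parse_stat()["status"]

    @contextlib.contextmanager
    def oneshot(self):
        # Enable the cache on entry; disable and clear it on exit,
        # even if the body raises.
        type(self)._parse_stat.cache_activate()
        try:
            yield
        finally:
            type(self)._parse_stat.cache_deactivate()
```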
    

    To measure the speedup I wrote a benchmark script (well, two actually), and kept tuning until I was sure the change actually made psutil faster. The scripts report the speedup for calling all the "grouped" methods together (best-case scenario).

    Linux: +2.56x speedup

    The Linux implementation is mostly Python, reading files in /proc. These files typically expose multiple pieces of info per process; /proc/PID/stat and /proc/PID/status are the perfect example. We aggregate them into three groups. See the relevant code here.

    Windows: from +1.9x to +6.5x speedup

    Windows is an interesting one. For a process owned by our user, we group only Process.num_threads(), Process.num_ctx_switches() and Process.num_handles(), for a +1.9x speedup if we access those methods in one shot.

    Windows is special though, because certain methods have a dual implementation (#304): a "fast method" is tried first, but if the process is owned by another user it fails with AccessDenied. psutil then falls back to a second, "slower" method (see here for example).

    It's slower because it iterates over all PIDs, but unlike the "plain" Windows APIs it can still retrieve multiple pieces of information in one shot: number of threads, context switches, handles, CPU times, create time, and I/O counters.

    That's why querying processes owned by other users results in an impressive +6.5x speedup.
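
    A rough sketch of that dual-implementation shape follows. Every name in it (AccessDenied, OWNED_PIDS, fast_num_threads, slow_process_table) is a hypothetical stand-in, not psutil's Windows code: a fast per-field call that fails for other users' processes, and a slow whole-table walk that returns many fields per PID at once, which is what makes it worth caching.

```python
class AccessDenied(Exception):
    """Hypothetical stand-in for psutil.AccessDenied."""


OWNED_PIDS = {100, 101}  # assumption: PIDs owned by the current user


def fast_num_threads(pid):
    # Hypothetical "fast" API: only works for processes we own.
    if pid not in OWNED_PIDS:
        raise AccessDenied(pid)
    return 4


def slow_process_table():
    # Hypothetical "slow" API: one walk over *all* processes, returning
    # several fields per PID in one go.
    return {
        100: {"num_threads": 4, "num_handles": 50, "ctx_switches": 900},
        200: {"num_threads": 8, "num_handles": 120, "ctx_switches": 3000},
    }


def num_threads(pid):
    try:
        return fast_num_threads(pid)
    except AccessDenied:
        # Fall back to the expensive walk. Since it yields every field for
        # the PID, a oneshot-style cache can serve the other methods too.
        return slow_process_table()[pid]["num_threads"]
```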

    macOS: +1.92x speedup

    On macOS we can get 2 groups of information. With sysctl() we get the parent PID, uids, gids, terminal, create time and name. With proc_info() we get CPU times (for PIDs owned by another user), memory metrics and context switches. Not bad.

    BSD: +2.18x speedup

    On BSD we gather tons of process info just by calling sysctl() (see implementation): process name, ppid, status, uids, gids, I/O counters, CPU and create times, terminal and context switches.

    SunOS: +1.37x speedup

    SunOS is like Linux (it reads files in /proc), but the code is in C. Here too, we group different metrics together (see here and here).


  2. Windows services support

    psutil 4.2.0 is out. The highlight of this release is the support for Windows services (executables that run at system startup, similar to UNIX init scripts):

    >>> import psutil
    >>> list(psutil.win_service_iter())
    [<WindowsService(name='AeLookupSvc', display_name='Application Experience') at 38850096>,
     <WindowsService(name='ALG', display_name='Application Layer Gateway Service') at 38850128>,
     <WindowsService(name='APNMCP', display_name='Ask Update Service') at 38850160>,
     <WindowsService(name='AppIDSvc', display_name='Application Identity') at 38850192>,
     ...]
    >>> s = psutil.win_service_get('alg')
    >>> s.as_dict()
    {'binpath': 'C:\\Windows\\System32\\alg.exe',
     'description': 'Provides support for 3rd party protocol plug-ins for Internet Connection Sharing',
     'display_name': 'Application Layer Gateway Service',
     'name': 'alg',
     'pid': None,
     'start_type': 'manual',
     'status': 'stopped',
     'username': 'NT AUTHORITY\\LocalService'}
    

    I decided to do this mainly because I find pywin32 APIs too low-level. Having something like this in psutil makes it easier to discover and monitor services. The code was implemented in PR-803. The API for querying a service is similar to psutil.Process: you get a reference to a service object by its name (which is unique for every service) and then use methods like WindowsService.name() and WindowsService.status():

    >>> s = psutil.win_service_get('alg')
    >>> s.name()
    'alg'
    >>> s.status()
    'stopped'
    

    Initially I thought about providing a full set of APIs to handle all aspects of service management, including start(), stop(), restart(), install(), uninstall() and modify(). However, I soon realized I would have ended up reimplementing what pywin32 already provides, at the cost of overcrowding the psutil API (see my reasoning here). I think psutil really focuses on monitoring, not on installing and modifying system components, especially something as critical as a Windows service.

    Considerations about Windows services

    Typically, a Windows service is an executable (.exe) that runs at system startup and continues running in the background. It is roughly the equivalent of a UNIX init script. All services are controlled by a "manager", which keeps track of their status and metadata (e.g. description, startup type). It is interesting to note that since (most) services are bound to an executable (and hence a process) you can reference them via their process PID:

    >>> s = psutil.win_service_get('sshd')
    >>> s
    <WindowsService(name='sshd', display_name='Open SSH server') at 38853046>
    >>> s.pid()
    1865
    >>> p = psutil.Process(1865)
    >>> p
    <psutil.Process(pid=1865, name='sshd.exe') at 140461487781328>
    >>> p.exe()
    'C:\\CygWin\\bin\\sshd'
    

    Other improvements

    psutil 4.2.0 comes with 2 other enhancements for Linux:

    • psutil.virtual_memory() returns a new shared field, the same value reported by the free command-line utility.
    • I changed how /proc/{pid}/status is parsed: instead of reading it line by line, psutil now uses a regular expression. Here are the speedups:
      • Process.ppid() ~20% faster.
      • Process.status() ~28% faster.
      • Process.name() ~25% faster.
      • Process.num_threads() ~20% faster (on Python 3 only; on Python 2 it's slightly slower, presumably because the re module was improved in Python 3).
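
    A sketch of the regex approach, assuming the usual "Key:<TAB>value" layout of /proc/PID/status. The single alternation pattern and parse_status are illustrative, not the exact regex psutil uses: one findall() pass picks out only the lines of interest instead of splitting and inspecting every line in Python.

```python
import re

# Matches only the keys we care about, one per line.
STATUS_FIELDS = re.compile(r"^(Name|State|PPid|Threads):[ \t]+(.*)$", re.MULTILINE)


def parse_status(text):
    """Extract a few /proc/PID/status fields in one regex pass."""
    found = dict(STATUS_FIELDS.findall(text))
    return {
        "name": found["Name"],
        "status": found["State"].split()[0],  # e.g. "S (sleeping)" -> "S"
        "ppid": int(found["PPid"]),
        "num_threads": int(found["Threads"]),
    }
```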


  3. Real process memory and environ in Python

    psutil 4.0.0 is out, with some interesting news about process memory metrics. I'll get straight to the point and describe what's new.

    "Real" process memory info

    Determining how much memory a process really uses is not an easy matter (see this and this). RSS (Resident Set Size), which most people rely on, is misleading because it includes both memory unique to the process and memory shared with others. What's more interesting for profiling is the memory that would be freed if the process were terminated right now. In the Linux world this is called USS (Unique Set Size), the major feature introduced in psutil 4.0.0 (not only for Linux but also for Windows and macOS).

    USS memory

    The USS (Unique Set Size) is the memory unique to a process, which would be freed if the process were terminated right now. On Linux it can be determined by summing the "private" fields (Private_Clean and Private_Dirty) in /proc/PID/smaps. The Firefox team pushed this further and got it working on macOS and Windows too.

    >>> psutil.Process().memory_full_info()
    pfullmem(rss=101990, vms=521888, shared=38804, text=28200, lib=0, data=59672, dirty=0, uss=81623, pss=91788, swap=0)
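
    A minimal sketch of the Linux side, assuming the standard smaps field names (Private_Clean, Private_Dirty, Pss, Swap, all reported in kB). smaps_totals is a hypothetical helper, not psutil's function; it also totals the Pss and Swap lines discussed in the next section. On a live system you would feed it the contents of /proc/PID/smaps, which may require privileges for other users' processes.

```python
def smaps_totals(text):
    """Sum USS, PSS and swap (in bytes) over every mapping in smaps text.

    USS = Private_Clean + Private_Dirty. Exact key matching means lines
    such as "SwapPss" or "Pss_Anon" are not double counted.
    """
    totals = {"uss": 0, "pss": 0, "swap": 0}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("Private_Clean", "Private_Dirty"):
            totals["uss"] += int(value.split()[0]) * 1024
        elif key == "Pss":
            totals["pss"] += int(value.split()[0]) * 1024
        elif key == "Swap":
            totals["swap"] += int(value.split()[0]) * 1024
    return totals
```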
    

    PSS and swap

    On Linux there are two additional metrics that can also be determined via /proc/PID/smaps: PSS and swap.

    pss, aka "Proportional Set Size", is the memory unique to the process plus its share of the memory shared with other processes, where each shared amount is divided evenly between the processes that share it. I.e. if a process has 10 MB all to itself (USS) and 10 MB shared with one other process, its PSS will be 15 MB.
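
    The arithmetic in that example, as a simplified model: each shared region is treated as uniformly shared by a fixed number of processes (the kernel actually does this accounting page by page). pss_bytes is a hypothetical helper.

```python
def pss_bytes(private, shared_regions):
    """Private memory plus an even share of each shared region.

    shared_regions: iterable of (size_in_bytes, num_sharing_processes).
    """
    return private + sum(size / sharers for size, sharers in shared_regions)


MB = 1024 * 1024
# 10 MB private + 10 MB shared with one other process.
print(pss_bytes(10 * MB, [(10 * MB, 2)]) / MB)  # 15.0
```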

    swap is simply the amount of memory that has been swapped out to disk. With Process.memory_full_info() it is possible to implement a tool like procsmem.py, similar to smem on Linux, which provides a list of processes sorted by uss. It's interesting to see how rss differs from uss:

    ~/svn/psutil$ ./scripts/procsmem.py
    PID     User    Cmdline                            USS     PSS    Swap     RSS
    ==============================================================================
    ...
    3986    giampao /usr/bin/python3 /usr/bin/indi   15.3M   16.6M      0B   25.6M
    3906    giampao /usr/lib/ibus/ibus-ui-gtk3       17.6M   18.1M      0B   26.7M
    3991    giampao python /usr/bin/hp-systray -x    19.0M   23.3M      0B   40.7M
    3830    giampao /usr/bin/ibus-daemon --daemoni   19.0M   19.0M      0B   21.4M
    20529   giampao /opt/sublime_text/plugin_host    19.9M   20.1M      0B   22.0M
    3990    giampao nautilus -n                      20.6M   29.9M      0B   50.2M
    3898    giampao /usr/lib/unity/unity-panel-ser   27.1M   27.9M      0B   37.7M
    4176    giampao /usr/lib/evolution/evolution-c   35.7M   36.2M      0B   41.5M
    20712   giampao /usr/bin/python -B /home/giamp   45.6M   45.9M      0B   49.4M
    3880    giampao /usr/lib/x86_64-linux-gnu/hud/   51.6M   52.7M      0B   61.3M
    20513   giampao /opt/sublime_text/sublime_text   65.8M   73.0M      0B   87.9M
    3976    giampao compiz                          115.0M  117.0M      0B  130.9M
    32486   giampao skype                           145.1M  147.5M      0B  149.6M
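
    The procsmem.py script itself isn't reproduced here. As a hypothetical sketch of just its presentation step, this sorts per-process totals by USS and prints human-readable sizes in the style of the table above; the input dicts are assumed to hold byte counts such as those from Process.memory_full_info().

```python
def bytes2human(n):
    """Format a byte count like the table above (e.g. '15.3M', '0B')."""
    symbols = ("K", "M", "G", "T")
    # Try the largest unit first: T = 2**40, G = 2**30, ...
    for i, s in reversed(list(enumerate(symbols, start=1))):
        unit = 1 << (i * 10)
        if n >= unit:
            return f"{n / unit:.1f}{s}"
    return f"{n}B"


def format_rows(procs):
    """procs: dicts with pid, uss, pss, swap, rss (bytes).

    Returns one text line per process, sorted by USS ascending.
    """
    lines = []
    for p in sorted(procs, key=lambda p: p["uss"]):
        lines.append("%-7s %7s %7s %7s %7s" % (
            p["pid"], bytes2human(p["uss"]), bytes2human(p["pss"]),
            bytes2human(p["swap"]), bytes2human(p["rss"])))
    return lines
```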
    

    Implementation

    To get these values (uss, pss and swap) we need to walk the whole process address space. This usually requires higher privileges and is considerably slower than Process.memory_info(), which is probably why tools like ps and top show RSS/VMS instead of USS. A big thanks goes to the Mozilla team for figuring this out on Windows and macOS, and to Eric Rahm who put the psutil PRs together (see PR-744, PR-745 and PR-746). If you don't use Python and want to port the code to another language, here are the interesting parts:

    Memory type percent

    After reorganizing the process memory APIs (PR-744), I added a new memtype parameter to Process.memory_percent(). You can now compare a specific memory type (not only RSS) against the total physical memory. E.g.

    >>> psutil.Process().memory_percent(memtype='pss')
    0.06877466326787016
    

    Process environ

    The second biggest improvement in psutil 4.0.0 is the ability to read a process's environment variables. This opens up interesting possibilities for process recognition and monitoring. For instance, you can start a process with a custom environment variable, then iterate over all processes to find the one of interest:

    import psutil
    for p in psutil.process_iter():
        try:
            env = p.environ()
        except psutil.Error:
            pass
        else:
            if 'MYAPP' in env:
                ...
    

    Process environ was a long-standing issue (#52, from 2009) that I gave up on because the Windows implementation only worked for the current process. Frank Benkstein solved that (PR-747), and it now works on Linux, Windows and macOS for all processes (you may still hit AccessDenied for processes owned by another user):

    >>> import psutil
    >>> from pprint import pprint as pp
    >>> pp(psutil.Process().environ())
    {...
     'CLUTTER_IM_MODULE': 'xim',
     'COLORTERM': 'gnome-terminal',
     'COMPIZ_BIN_PATH': '/usr/bin/',
     'HOME': '/home/giampaolo',
     'PWD': '/home/giampaolo/svn/psutil',
      }
    >>>
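
    On Linux, the raw data behind environ() lives in /proc/PID/environ: a sequence of NUL-terminated KEY=VALUE entries. A hypothetical parser for that format (a sketch, not psutil's implementation) could look like this; values may themselves contain "=", so only the first one separates key from value.

```python
def parse_environ(raw):
    """Turn the NUL-separated KEY=VALUE bytes of /proc/PID/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # the trailing NUL produces an empty last item
        key, sep, value = entry.partition(b"=")
        if sep:  # skip malformed entries without '='
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env
```

    For the current process you could pass it the bytes read from /proc/self/environ opened in "rb" mode.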
    

    Note that the resulting dict usually doesn't reflect changes made after the process started (e.g. os.environ['MYAPP'] = '1'). Again, for anyone porting this to other languages, here are the interesting parts:

    Extended disk IO stats

    psutil.disk_io_counters() now reports additional metrics on Linux and FreeBSD:

    • busy_time: the time spent doing actual I/Os (in milliseconds).
    • read_merged_count and write_merged_count (Linux only): the number of merged reads and writes (see the iostats doc).

    These give a better picture of actual disk utilization (#756), similar to the iostat command on Linux.
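
    busy_time is cumulative, so utilization comes from the difference between two samples. A minimal sketch in the spirit of iostat's %util, assuming you read busy_time (milliseconds) from psutil.disk_io_counters() twice, interval_s seconds apart; disk_utilization is a hypothetical helper.

```python
def disk_utilization(busy_ms_before, busy_ms_after, interval_s):
    """Percentage of wall-clock time the disk was busy between two samples."""
    elapsed_ms = interval_s * 1000.0
    return 100.0 * (busy_ms_after - busy_ms_before) / elapsed_ms
```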

    OS constants

    Given the growing number of platform-specific metrics, I added a set of constants to tell which platform you're on: psutil.LINUX, psutil.WINDOWS, etc.

    Other fixes

    The complete list of changes is available in the changelog.

    Porting code

    Since 4.0.0 is a major version, I took the chance to make a few (mostly minor) backward-incompatible API changes.

    • Process.memory_info() no longer returns just an (rss, vms) namedtuple. It returns a variable-length namedtuple that varies by platform (rss and vms are always present, even on Windows). Essentially the same result as the old Process.memory_info_ex(). This shouldn't break your code unless you were doing rss, vms = p.memory_info().
    • Process.memory_info_ex() is deprecated. It still works as an alias for Process.memory_info(), issuing a DeprecationWarning.
    • psutil.disk_io_counters() on NetBSD and OpenBSD no longer returns write_count and read_count because the kernel doesn't provide them (we were returning the busy time instead). This should be a minor issue given that NetBSD and OpenBSD support is very recent.
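
    The safe porting pattern for memory_info() is to access fields by name rather than unpack by position. A self-contained illustration with hypothetical namedtuples standing in for the old and new return types (the extra field names are illustrative only):

```python
from collections import namedtuple

# Hypothetical stand-ins for the old 2-field and new variable-length tuples.
OldMem = namedtuple("OldMem", ["rss", "vms"])
NewMem = namedtuple("NewMem", ["rss", "vms", "shared", "text", "lib", "data", "dirty"])


def rss_and_vms(mem):
    # Attribute access works with both shapes.
    return mem.rss, mem.vms


old = OldMem(100, 200)
new = NewMem(100, 200, 30, 40, 0, 50, 0)
assert rss_and_vms(old) == rss_and_vms(new) == (100, 200)

try:
    rss, vms = new  # the positional pattern that breaks with more fields
except ValueError as exc:
    print("unpacking failed:", exc)
```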


  4. Reimplementing ifconfig in Python

    Here we are. It's been a long time since my last blog post and my last psutil release. The reason? I've been travelling! I mean... a lot. I've spent 3 months in Berlin, 3 weeks in Japan and 2 months in New York City. While I was there I finally had the chance to meet my friend Jay Loden in person. We originally started working on psutil together 7 years ago.

    Back then I didn't know any C (and I'm still a terrible C developer), so he was crucial in developing the initial psutil skeleton, including macOS and Windows support. Needless to say, this release builds on that work.

    net_if_addrs()

    We're now able to list network interface addresses similarly to the ifconfig command on UNIX:

    >>> import psutil
    >>> from pprint import pprint
    >>> pprint(psutil.net_if_addrs())
    {'ethernet0': [snic(family=<AddressFamily.AF_INET: 2>,
                        address='10.0.0.4',
                        netmask='255.0.0.0',
                        broadcast='10.255.255.255'),
                   snic(family=<AddressFamily.AF_PACKET: 17>,
                        address='9c:eb:e8:0b:05:1f',
                        netmask=None,
                        broadcast='ff:ff:ff:ff:ff:ff')],
     'localhost': [snic(family=<AddressFamily.AF_INET: 2>,
                        address='127.0.0.1',
                        netmask='255.0.0.0',
                        broadcast='127.0.0.1'),
                   snic(family=<AddressFamily.AF_PACKET: 17>,
                        address='00:00:00:00:00:00',
                        netmask=None,
                        broadcast='00:00:00:00:00:00')]}
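
    A common follow-up is keeping only one address family. A minimal sketch that works on any dict shaped like the output above; the snic namedtuple here is a local stand-in with the same field names, and ipv4_addresses is hypothetical. The AddressFamily values shown above compare equal to socket.AF_INET and friends, so the filter also works on real net_if_addrs() output.

```python
import socket
from collections import namedtuple

# Local stand-in with the same fields as the snic tuples shown above.
snic = namedtuple("snic", ["family", "address", "netmask", "broadcast"])


def ipv4_addresses(if_addrs):
    """Map each NIC name to its IPv4 addresses."""
    return {
        name: [a.address for a in addrs if a.family == socket.AF_INET]
        for name, addrs in if_addrs.items()
    }
```

    With psutil installed, ipv4_addresses(psutil.net_if_addrs()) would give something like {'ethernet0': ['10.0.0.4'], 'localhost': ['127.0.0.1']} for the machine above.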
    

    This is limited to the AF_INET (IPv4), AF_INET6 (IPv6) and AF_LINK (link-layer/MAC address, reported as AF_PACKET on Linux) address families. If you want something more powerful (e.g. AF_BLUETOOTH) you can take a look at the netifaces extension. If you want to see how this is implemented, here's the code for POSIX and Windows:

    net_if_stats()

    This new function returns status information about each network interface card (whether it's up, its duplex mode, speed and MTU):

    >>> import psutil
    >>> from pprint import pprint
    >>> pprint(psutil.net_if_stats())
    {'ethernet0': snicstats(isup=True,
                            duplex=<NicDuplex.NIC_DUPLEX_FULL: 2>,
                            speed=100,
                            mtu=1500),
     'localhost': snicstats(isup=True,
                            duplex=<NicDuplex.NIC_DUPLEX_UNKNOWN: 0>,
                            speed=0,
                            mtu=65536)}
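
    For monitoring you often only care about interfaces that are up. A minimal sketch over a dict shaped like the output above; snicstats is a local stand-in with the same field names, active_nics is hypothetical, and speed is assumed to be in Mbit/s with 0 meaning "unknown" (as for localhost above).

```python
from collections import namedtuple

# Local stand-in with the same fields as the snicstats tuples shown above.
snicstats = namedtuple("snicstats", ["isup", "duplex", "speed", "mtu"])


def active_nics(if_stats, min_speed=0):
    """Sorted names of interfaces that are up and at least min_speed fast."""
    return sorted(
        name for name, st in if_stats.items()
        if st.isup and st.speed >= min_speed
    )
```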
    

    The implementation on each platform:

    Also in 3.0

    Beyond the network-interface APIs, psutil 3.0 ships a few other notable changes.

    Several integer/string constants (IOPRIO_CLASS_*, NIC_DUPLEX_*, *_PRIORITY_CLASS) are now enums on Python 3.4+.

    Support for zombie processes on UNIX was broken and has been fixed; that's covered in a separate post.

    Removal of deprecated APIs

    All aliases deprecated in the psutil 2.0 porting guide (January 2014) are gone. For the full list see the changelog.

    Final words

    I must say I'm pretty satisfied with how psutil is evolving and with the enjoyment I still get every time I work on it. It now gets almost 800,000 downloads a month, which is quite remarkable for a Python library.

    At this point, I consider psutil almost "complete" feature-wise, meaning I'm starting to run out of ideas for what to add next (see TODO). Going forward, development will likely focus on supporting more exotic platforms (OpenBSD #562, NetBSD PR-557, Android #355).

    There have also been discussions on the python-ideas mailing list about including psutil in the Python stdlib, but even if that happens, it's still a long way off, as it would require a significant time investment that I currently don't have.

