Giampaolo Rodola Python enthusiast, core developer, psutil author

Blog posts for tags/new-api

Making psutil twice as fast
Featured 06 Nov 2016 Tags: psutil, python, performance, new-api, release

Starting from psutil 5.0.0 you can query multiple Process fields around twice as fast as before (see #799 and Process.oneshot() doc). It took 7 months, 108 commits, and a massive refactoring of psutil internals (PR-937), and I think it's one of the best improvements ever shipped in a psutil release.
The problem¶

How process information is retrieved varies by OS. Sometimes it means reading a file in /proc (Linux), other times calling a C API (Windows, BSD, macOS, SunOS), but it's always done differently. Psutil abstracts this away: you call Process.name() without worrying about what happens under the hood or which OS you're on.

Internally, multiple pieces of process info (e.g. Process.name(), Process.ppid(), Process.uids(), Process.create_time()) are fetched by the same syscall. On Linux we read /proc/PID/stat to get the process name, terminal, CPU times, creation time, status and parent PID, but only one value is returned: the others are discarded. On Linux this code reads /proc/PID/stat 6 times:
```
>>> import psutil
>>> p = psutil.Process()
>>> p.name()
>>> p.cpu_times()
>>> p.create_time()
>>> p.ppid()
>>> p.status()
>>> p.terminal()
```
On BSD most process metrics can be fetched with a single sysctl(), yet psutil was invoking it for each process method (e.g. see here and here).
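To make the Linux case concrete, here is a minimal sketch of how a single /proc/PID/stat line already carries several of the values listed above. The field positions follow the proc(5) man page; the function name and the sample line are made up for illustration and are not psutil's actual parser.

```python
def parse_proc_stat(line):
    """Split one /proc/PID/stat line into a few of its fields."""
    # The name is wrapped in parentheses and may contain spaces, so split
    # on the LAST ")" instead of splitting everything on whitespace.
    head, _, tail = line.rpartition(")")
    pid, _, name = head.partition(" (")
    rest = tail.split()
    # proc(5) numbers fields from 1; rest[0] is field 3 (state).
    return {
        "pid": int(pid),
        "name": name,
        "status": rest[0],
        "ppid": int(rest[1]),
        "tty_nr": int(rest[4]),
        "utime_ticks": int(rest[11]),
        "stime_ticks": int(rest[12]),
        "starttime_ticks": int(rest[19]),
    }


# Made-up sample line, not taken from a real system.
SAMPLE = ("4242 (my app) S 1 4242 4242 34816 4242 4194304 100 0 0 0 "
          "7 3 0 0 20 0 1 0 123456 1000000 250 18446744073709551615")

info = parse_proc_stat(SAMPLE)
print(info["name"], info["ppid"], info["status"])
```

One read and one parse yield the name, status, parent PID, terminal, CPU times and start time, which is why returning a single value per call wastes work.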
Do it in one shot¶

It's clear that this approach is inefficient, especially in tools like top or htop, where process info is continuously fetched in a loop. psutil 5.0.0 introduces a new Process.oneshot() context manager. Inside it, the internal routine runs once (in the example, on the first Process.name() call) and the other values are cached. Subsequent calls sharing the same internal routine (read /proc/PID/stat, call sysctl() or whatever) return the cached value. The code above can now be rewritten like this, and on Linux it runs 2.4 times faster:
```
>>> import psutil
>>> p = psutil.Process()
>>> with p.oneshot():
...     p.name()
...     p.cpu_times()
...     p.create_time()
...     p.ppid()
...     p.status()
...     p.terminal()
```
Implementation¶

One great thing about psutil's design is its abstraction. It is divided into 3 "layers". Layer 1 is represented by the main Process class (Python), which exposes the high-level API. Layer 2 is the OS-specific Python module, which is a thin wrapper on top of the OS-specific C extension module (layer 3).

Because the code was organized this way (modular), the refactoring was reasonably smooth. I first refactored those C functions that collect multiple pieces of info and grouped them into a single function (e.g. see BSD implementation). Then I wrote a decorator that enables the cache only when requested (when entering the context manager), and decorated the "grouped functions" with it. The caching mechanism is controlled by the Process.oneshot() context manager, which is the only thing exposed to the end user. Here's the decorator:
```
import functools


def memoize_when_activated(fun):
    """A memoize decorator which is disabled by default. It can be
    activated and deactivated on request.
    """
    @functools.wraps(fun)
    def wrapper(self):
        if not wrapper.cache_activated:
            # Cache is off: always call the real function.
            return fun(self)
        else:
            try:
                ret = cache[fun]
            except KeyError:
                ret = cache[fun] = fun(self)
            return ret

    def cache_activate():
        """Activate cache."""
        wrapper.cache_activated = True

    def cache_deactivate():
        """Deactivate and clear cache."""
        wrapper.cache_activated = False
        cache.clear()

    cache = {}
    wrapper.cache_activated = False
    wrapper.cache_activate = cache_activate
    wrapper.cache_deactivate = cache_deactivate
    return wrapper
```
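As an illustration of how a context manager can drive this decorator, here is a minimal sketch. `ToyProcess`, `_read_stat` and the `reads` counter are hypothetical stand-ins, not psutil's real classes; the decorator is repeated only so the snippet runs on its own.

```python
import contextlib
import functools


def memoize_when_activated(fun):
    # Same decorator as above, repeated so this snippet is self-contained.
    @functools.wraps(fun)
    def wrapper(self):
        if not wrapper.cache_activated:
            return fun(self)
        try:
            return cache[fun]
        except KeyError:
            ret = cache[fun] = fun(self)
            return ret

    def cache_activate():
        wrapper.cache_activated = True

    def cache_deactivate():
        wrapper.cache_activated = False
        cache.clear()

    cache = {}
    wrapper.cache_activated = False
    wrapper.cache_activate = cache_activate
    wrapper.cache_deactivate = cache_deactivate
    return wrapper


class ToyProcess:
    """Hypothetical stand-in for an OS-specific Process class."""

    def __init__(self):
        self.reads = 0  # how many times the expensive routine ran

    @memoize_when_activated
    def _read_stat(self):
        self.reads += 1  # stands in for reading /proc/PID/stat
        return {"name": "python", "ppid": 1}

    def name(self):
        return self._read_stat()["name"]

    def ppid(self):
        return self._read_stat()["ppid"]

    @contextlib.contextmanager
    def oneshot(self):
        # Turn the cache on for the duration of the block, then clear it.
        self._read_stat.cache_activate()
        try:
            yield
        finally:
            self._read_stat.cache_deactivate()


p = ToyProcess()
p.name()
p.ppid()
without = p.reads            # two calls, two "reads"
with p.oneshot():
    p.name()
    p.ppid()
print(without, p.reads - without)
```

Note that in this sketch the cache is keyed by function, so it is shared by every instance of the class; that is fine for a single process object but worth keeping in mind.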
To measure the speedup I wrote a benchmark script (well, two actually), and kept tuning until I was sure the change actually made psutil faster. The scripts report the speedup for calling all the "grouped" methods together (best-case scenario).
Linux: +2.56x speedup¶

The Linux implementation is mostly Python, reading files in /proc. These files typically expose multiple pieces of info per process; /proc/PID/stat and /proc/PID/status are the perfect example. We aggregate them into three groups. See the relevant code here.

Windows: from +1.9x to +6.5x speedup¶

Windows is an interesting one. For a process owned by our user, we group only Process.num_threads(), Process.num_ctx_switches() and Process.num_handles(), for a +1.9x speedup if we access those methods in one shot.

Windows is special though, because certain methods have a dual implementation (#304): a "fast method" is tried first, but if the process is owned by another user it fails with AccessDenied. psutil then falls back to a second, "slower" method (see here for example).

It's slower because it iterates over all PIDs, but unlike the "plain" Windows APIs it can still retrieve multiple pieces of information in one shot: number of threads, context switches, handles, CPU times, create time, and I/O counters.

That's why querying processes owned by other users results in an impressive +6.5x speedup.
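The dual implementation boils down to a try/fallback pattern. The sketch below shows the shape of it; every function, PID and value here is hypothetical and merely stands in for the real Windows calls.

```python
class AccessDenied(Exception):
    """Stand-in for the error raised by the fast path."""


def fast_num_handles(pid):
    # Hypothetical fast path: pretend only PIDs below 1000 belong to us.
    if pid >= 1000:
        raise AccessDenied(pid)
    return 42


def slow_process_table():
    # Hypothetical slow path: one pass over all PIDs, many fields per PID.
    return {
        1500: {"num_handles": 7, "num_threads": 3, "create_time": 1478390400.0},
    }


def num_handles(pid):
    """Try the fast method first, fall back to the slow one if denied."""
    try:
        return fast_num_handles(pid)
    except AccessDenied:
        return slow_process_table()[pid]["num_handles"]


print(num_handles(10), num_handles(1500))
```

Because the slow path returns many fields per call, caching its result pays off much more than caching the fast path does.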

macOS: +1.92x speedup¶

On macOS we can get 2 groups of information. With sysctl() we get process parent PID, uids, gids, terminal, create time, name. With proc_info() we get CPU times (for PIDs owned by another user), memory metrics and ctx switches. Not bad.

BSD: +2.18x speedup¶

On BSD we gather tons of process info just by calling sysctl() (see implementation): process name, ppid, status, uids, gids, IO counters, CPU and create times, terminal and ctx switches.

SunOS: +1.37x speedup¶

SunOS is like Linux (it reads files in /proc), but the code is in C. Here too, we group different metrics together (see here and here).
Windows services support
15 May 2016 Tags: psutil, python, windows, new-api, release

New psutil 4.2.0 is out. The highlight of this release is the support for Windows services (executables that run at system startup, similar to UNIX init scripts):
```
>>> import psutil
>>> list(psutil.win_service_iter())
[<WindowsService(name='AeLookupSvc', display_name='Application Experience') at 38850096>,
 <WindowsService(name='ALG', display_name='Application Layer Gateway Service') at 38850128>,
 <WindowsService(name='APNMCP', display_name='Ask Update Service') at 38850160>,
 <WindowsService(name='AppIDSvc', display_name='Application Identity') at 38850192>,
 ...]
>>> s = psutil.win_service_get('alg')
>>> s.as_dict()
{'binpath': 'C:\\Windows\\System32\\alg.exe',
 'description': 'Provides support for 3rd party protocol plug-ins for Internet Connection Sharing',
 'display_name': 'Application Layer Gateway Service',
 'name': 'alg',
 'pid': None,
 'start_type': 'manual',
 'status': 'stopped',
 'username': 'NT AUTHORITY\\LocalService'}
```
I decided to do this mainly because I find the pywin32 APIs too low-level. Having something like this in psutil makes it easier to discover and monitor services. The code was implemented in PR-803. The API for querying a service is similar to psutil.Process. You can get a reference to a service object by using its name (which is unique for every service) and then use methods like WindowsService.name() and WindowsService.status():
```
>>> s = psutil.win_service_get('alg')
>>> s.name()
'alg'
>>> s.status()
'stopped'
```
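For monitoring, a common need is to list services in a given state. Here is a minimal sketch that relies only on the as_dict() keys shown above ('name' and 'status'); `services_by_status` and `StubService` are hypothetical helpers so the snippet runs on any OS.

```python
def services_by_status(services, status="running"):
    """Return the sorted names of services whose status matches."""
    return sorted(
        info["name"]
        for info in (s.as_dict() for s in services)
        if info["status"] == status
    )


class StubService:
    """Hypothetical stand-in for a WindowsService object."""

    def __init__(self, name, status):
        self._info = {"name": name, "status": status}

    def as_dict(self):
        return dict(self._info)


stubs = [StubService("alg", "stopped"), StubService("sshd", "running")]
print(services_by_status(stubs))
```

On Windows you would presumably pass `psutil.win_service_iter()` instead of the stub list.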
Initially I thought about providing a full set of APIs to handle all aspects of service management, including start(), stop(), restart(), install(), uninstall() and modify(). However, I soon realized I would have ended up reimplementing what pywin32 already provides, at the cost of overcrowding the psutil API (see my reasoning here). I think psutil really focuses on monitoring, not on installing and modifying system components, especially something as critical as a Windows service.
Considerations about Windows services¶

Typically, a Windows service is an executable (.exe) that runs at system startup and continues running in the background. It is roughly the equivalent of a UNIX init script. All services are controlled by a "manager", which keeps track of their status and metadata (e.g. description, startup type). It is interesting to note that since (most) services are bound to an executable (and hence a process) you can reference them via their process PID:
```
>>> s = psutil.win_service_get('sshd')
>>> s
<WindowsService(name='sshd', display_name='Open SSH server') at 38853046>
>>> s.pid()
1865
>>> p = psutil.Process(1865)
>>> p
<psutil.Process(pid=1865, name='sshd.exe') at 140461487781328>
>>> p.exe()
'C:\\CygWin\\bin\\sshd'
```
Other improvements¶

psutil 4.2.0 comes with 2 other enhancements for Linux:
- psutil.virtual_memory() returns a new shared memory field. This is the same value reported by the free command-line utility.
- I changed how /proc is parsed: instead of reading /proc/{pid}/status line by line, psutil now uses a regular expression. Here are the speedups:
  - Process.ppid() is ~20% faster.
  - Process.status() is ~28% faster.
  - Process.name() is ~25% faster.
  - Process.num_threads() is ~20% faster (on Python 3 only; on Python 2 it's a bit slower, so I suppose the re module received some improvements).
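To illustrate the two parsing strategies, here is a minimal sketch that extracts the parent PID from text laid out like /proc/{pid}/status. The sample text and both function names are made up; this is not psutil's actual code, and the relative speed depends on the Python version.

```python
import re

# Made-up excerpt in the layout of /proc/PID/status.
STATUS_TEXT = (
    "Name:\tpython3\n"
    "State:\tS (sleeping)\n"
    "Tgid:\t4242\n"
    "PPid:\t1\n"
    "Threads:\t4\n"
)


def ppid_line_by_line(text):
    """Old approach: scan every line until the right prefix shows up."""
    for line in text.splitlines():
        if line.startswith("PPid:"):
            return int(line.split()[1])
    raise ValueError("PPid not found")


# Compiled once at import time and reused for every process.
_PPID_RE = re.compile(r"^PPid:\t(\d+)$", re.MULTILINE)


def ppid_regex(text):
    """New approach: let the regex engine find the field in one search."""
    match = _PPID_RE.search(text)
    if match is None:
        raise ValueError("PPid not found")
    return int(match.group(1))


print(ppid_line_by_line(STATUS_TEXT), ppid_regex(STATUS_TEXT))
```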
Real process memory and environ in Python
17 Feb 2016 Tags: psutil, python, memory, new-api, compatibility, release, community, linux

psutil 4.0.0 is out, with some interesting news about process memory metrics. I'll get straight to the point and describe what's new.

"Real" process memory info¶

Determining how much memory a process really uses is not an easy matter (see this and this). RSS (Resident Set Size), which most people rely on, is misleading because it includes both memory unique to the process and memory shared with others. What's more interesting for profiling is the memory that would be freed if the process were terminated right now. In the Linux world this is called USS (Unique Set Size), the major feature introduced in psutil 4.0.0 (not only for Linux but also for Windows and macOS).
USS memory¶

The USS (Unique Set Size) is the memory unique to a process, i.e. the memory that would be freed if the process were terminated right now. On Linux it can be determined by parsing the "private" blocks in /proc/PID/smaps. The Firefox team pushed this further and got it working on macOS and Windows too.
```
>>> psutil.Process().memory_full_info()
pfullmem(rss=101990, vms=521888, shared=38804, text=28200, lib=0, data=59672, dirty=0, uss=81623, pss=91788, swap=0)
```
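As a rough illustration of the Linux approach, the sketch below sums the "Private_*" entries of smaps-style text. The sample text and the `uss_from_smaps` name are made up; real smaps files contain many more fields per mapping, and this is not psutil's actual implementation.

```python
import re

# Made-up excerpt in the layout of /proc/PID/smaps (two mappings).
SMAPS_TEXT = """\
00400000-0040b000 r-xp 00000000 08:01 1234 /usr/bin/example
Rss:                 100 kB
Private_Clean:        12 kB
Private_Dirty:         8 kB
7f0000000000-7f0000021000 rw-p 00000000 00:00 0
Rss:                  40 kB
Private_Clean:         0 kB
Private_Dirty:        40 kB
"""

_PRIVATE_RE = re.compile(r"^Private_\w+:\s+(\d+) kB$", re.MULTILINE)


def uss_from_smaps(text):
    """Sum every Private_* entry across all mappings, in bytes."""
    return sum(int(kb) for kb in _PRIVATE_RE.findall(text)) * 1024


print(uss_from_smaps(SMAPS_TEXT))
```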
PSS and swap¶

On Linux there are two additional metrics that can also be determined via /proc/PID/smaps: PSS and swap.

pss, aka "Proportional Set Size", is the process's private memory plus its proportional share of the memory it shares with other processes: each shared amount is divided evenly between the processes that share it. E.g. if a process has 10 MB all to itself (USS) and 10 MB shared with one other process, its PSS is 15 MB.
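The arithmetic of that example can be written down in a couple of lines. The `pss` helper below is purely illustrative (it takes the private size plus a list of hypothetical `(size, number_of_sharers)` pairs) and has nothing to do with how the kernel computes the value.

```python
def pss(private_bytes, shared_regions):
    """Private memory plus each shared region divided by its sharer count."""
    return private_bytes + sum(size / sharers for size, sharers in shared_regions)


MB = 1024 * 1024
# 10 MB private, plus 10 MB shared between 2 processes.
value = pss(10 * MB, [(10 * MB, 2)])
print(value / MB)
```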

swap is simply the amount of memory that has been swapped out to disk. With Process.memory_full_info() it is possible to implement a tool like procsmem.py, similar to smem on Linux, which provides a list of processes sorted by uss. It's interesting to see how rss differs from uss:
```
~/svn/psutil$ ./scripts/procsmem.py
PID     User    Cmdline                            USS     PSS    Swap     RSS
==============================================================================
...
3986    giampao /usr/bin/python3 /usr/bin/indi   15.3M   16.6M      0B   25.6M
3906    giampao /usr/lib/ibus/ibus-ui-gtk3       17.6M   18.1M      0B   26.7M
3991    giampao python /usr/bin/hp-systray -x    19.0M   23.3M      0B   40.7M
3830    giampao /usr/bin/ibus-daemon --daemoni   19.0M   19.0M      0B   21.4M
20529   giampao /opt/sublime_text/plugin_host    19.9M   20.1M      0B   22.0M
3990    giampao nautilus -n                      20.6M   29.9M      0B   50.2M
3898    giampao /usr/lib/unity/unity-panel-ser   27.1M   27.9M      0B   37.7M
4176    giampao /usr/lib/evolution/evolution-c   35.7M   36.2M      0B   41.5M
20712   giampao /usr/bin/python -B /home/giamp   45.6M   45.9M      0B   49.4M
3880    giampao /usr/lib/x86_64-linux-gnu/hud/   51.6M   52.7M      0B   61.3M
20513   giampao /opt/sublime_text/sublime_text   65.8M   73.0M      0B   87.9M
3976    giampao compiz                          115.0M  117.0M      0B  130.9M
32486   giampao skype                           145.1M  147.5M      0B  149.6M
```
Implementation¶

To get these values (uss, pss and swap) we need to walk the whole process address space. This usually requires higher privileges and is considerably slower than Process.memory_info(), which is probably why tools like ps and top show RSS/VMS instead of USS. A big thanks goes to the Mozilla team for figuring this out on Windows and macOS, and to Eric Rahm who put the psutil PRs together (see PR-744, PR-745 and PR-746). If you don't use Python and want to port the code to another language, here are the interesting parts:
- Linux
- macOS
- Windows
Memory type percent¶

After reorganizing the process memory APIs (PR-744), I added a new memtype parameter to Process.memory_percent(). You can now compare a specific memory type (not only RSS) against the total physical memory. E.g.
```
>>> psutil.Process().memory_percent(memtype='pss')
0.06877466326787016
```
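Conceptually, the memtype parameter just selects which field of the memory namedtuple is divided by total physical memory. A minimal sketch, assuming a trimmed-down hypothetical `pfullmem` tuple and a standalone `memory_percent` function rather than the real method:

```python
from collections import namedtuple

# Trimmed, hypothetical version of the namedtuple shown earlier.
pfullmem = namedtuple("pfullmem", ["rss", "vms", "uss", "pss"])


def memory_percent(meminfo, total_phymem, memtype="rss"):
    """Return the chosen memory field as a percentage of physical memory."""
    if memtype not in meminfo._fields:
        raise ValueError("invalid memtype %r; valid types are %r"
                         % (memtype, meminfo._fields))
    return getattr(meminfo, memtype) * 100 / total_phymem


sample = pfullmem(rss=200, vms=400, uss=50, pss=100)
print(memory_percent(sample, 1000, memtype="pss"))
```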
Process environ¶

The second biggest improvement in psutil 4.0.0 is the ability to read a process's environment variables. This opens up interesting possibilities for process recognition and monitoring. For instance, you can start a process with a custom environment variable, then iterate over all processes to find the one of interest:
```
import psutil

for p in psutil.process_iter():
    try:
        env = p.environ()
    except psutil.Error:
        pass
    else:
        if 'MYAPP' in env:
            ...
```
Process environ was a long-standing issue (#52, from 2009) that I gave up on because the Windows implementation only worked for the current process. Frank Benkstein solved that (PR-747), and it now works on Linux, Windows and macOS for all processes (you may still hit AccessDenied for processes owned by another user):
```
>>> import psutil
>>> from pprint import pprint as pp
>>> pp(psutil.Process().environ())
{...
 'CLUTTER_IM_MODULE': 'xim',
 'COLORTERM': 'gnome-terminal',
 'COMPIZ_BIN_PATH': '/usr/bin/',
 'HOME': '/home/giampaolo',
 'PWD': '/home/giampaolo/svn/psutil',
 }
>>>
```
Note that the resulting dict usually doesn't reflect changes made after the process started (e.g. os.environ['MYAPP'] = '1'). Again, for anyone porting this to other languages, here are the interesting parts:
- Linux
- macOS
- Windows
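For the Linux case specifically, /proc/PID/environ is a sequence of NUL-separated KEY=VALUE entries, so a port mostly comes down to splitting bytes. The sketch below works on a made-up blob; `parse_environ` is a hypothetical name, and the macOS and Windows implementations are a different story.

```python
def parse_environ(blob):
    """Turn a NUL-separated KEY=VALUE byte string into a dict."""
    env = {}
    for entry in blob.split(b"\0"):
        if not entry:
            continue  # trailing NUL leaves an empty chunk
        key, sep, value = entry.partition(b"=")
        if sep:  # skip malformed entries without "="
            env[key.decode("utf-8", "surrogateescape")] = value.decode(
                "utf-8", "surrogateescape")
    return env


# Made-up blob in the layout of /proc/PID/environ.
SAMPLE = b"HOME=/home/giampaolo\0MYAPP=1\0PATH=/usr/bin:/bin\0"
env = parse_environ(SAMPLE)
print("MYAPP" in env, env["HOME"])
```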
Extended disk IO stats¶

psutil.disk_io_counters() now reports additional metrics on Linux and FreeBSD:
- busy_time: the time spent doing actual I/Os (in milliseconds).
- read_merged_count and write_merged_count (Linux only): the number of merged reads and writes (see the iostats doc).
These give a better picture of actual disk utilization (#756), similar to the iostat command on Linux.
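One way to turn busy_time into a utilization figure is to sample it twice and divide the difference by the elapsed wall-clock time. The helper below is a sketch under that assumption; `disk_utilization` and the sample numbers are made up, and it is not claimed to match iostat's exact computation.

```python
def disk_utilization(busy_before_ms, busy_after_ms, interval_s):
    """Percentage of the interval the disk spent doing I/O."""
    busy_delta_ms = busy_after_ms - busy_before_ms
    percent = busy_delta_ms / (interval_s * 1000.0) * 100.0
    return min(100.0, percent)  # clamp small timing overshoots


# Two hypothetical busy_time samples taken 1 second apart.
print(disk_utilization(1000, 1250, 1.0))
```

In real code the two samples would come from two calls to psutil.disk_io_counters() on Linux or FreeBSD, separated by a sleep.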
OS constants¶

Given the growing number of platform-specific metrics, I added a set of constants to tell which platform you're on: psutil.LINUX, psutil.WINDOWS, etc.
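Boolean platform flags like these can be derived from the standard library alone. The sketch below shows one plausible way to define and use them; the local names and the `memory_backend` helper are hypothetical, and psutil's own definitions may differ.

```python
import os
import sys

# Hypothetical local equivalents of platform flags.
POSIX = os.name == "posix"
WINDOWS = os.name == "nt"
LINUX = sys.platform.startswith("linux")
MACOS = sys.platform.startswith("darwin")


def memory_backend():
    """Pick a (made-up) strategy name based on the current platform."""
    if LINUX:
        return "parse /proc/PID/smaps"
    if WINDOWS or MACOS:
        return "walk the address space via C APIs"
    return "unsupported"


print(memory_backend())
```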

Other fixes¶

The complete list of changes is available in the changelog.
Porting code¶

Since 4.0.0 is a major version, I took the chance to (lightly) change / break some APIs.
- Process.memory_info() no longer returns just an (rss, vms) namedtuple. It returns a variable-length namedtuple that varies by platform (rss and vms are always present, even on Windows). This is essentially the same result as the old Process.memory_info_ex(). It shouldn't break your code unless you were doing rss, vms = p.memory_info().
- Process.memory_info_ex() is deprecated. It still works as an alias for Process.memory_info(), but it issues a DeprecationWarning.
- psutil.disk_io_counters() on NetBSD and OpenBSD no longer returns write_count and read_count because the kernel doesn't provide them (we were returning the busy time instead). This should be a minor issue given that NetBSD and OpenBSD support is very recent.
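To see why tuple unpacking is the one pattern that breaks, here is a small sketch using plain namedtuples as stand-ins for the old and new return values. The field list mirrors the Linux output shown earlier in this post; the variable names are made up.

```python
from collections import namedtuple

# Stand-ins for the old and new return types (Linux field names).
pmem_old = namedtuple("pmem", ["rss", "vms"])
pmem_new = namedtuple(
    "pmem", ["rss", "vms", "shared", "text", "lib", "data", "dirty"])

old = pmem_old(101990, 521888)
new = pmem_new(101990, 521888, 38804, 28200, 0, 59672, 0)

rss, vms = old  # fine with the old 2-field tuple

try:
    rss, vms = new  # breaks: the tuple now has more than 2 fields
except ValueError as exc:
    unpack_error = str(exc)

# Portable alternatives that work with both shapes:
rss, vms = new.rss, new.vms
first_two = new[:2]
print(unpack_error, rss, vms, first_two)
```

Attribute access is the safest choice since it doesn't depend on field order or tuple length.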
Reimplementing ifconfig in Python
Featured 13 Jun 2015 Tags: psutil, python, personal, new-api, compatibility, release

Here we are. It's been a long time since my last blog post and my last psutil release. The reason? I've been travelling! I mean... a lot. I've spent 3 months in Berlin, 3 weeks in Japan and 2 months in New York City. While I was there I finally had the chance to meet my friend Jay Loden in person. We originally started working on psutil together 7 years ago.

Back then I didn't know any C (and I'm still a terrible C developer), so he was crucial in developing the initial psutil skeleton, including macOS and Windows support. Needless to say, this release builds on that work.
net_if_addrs()¶

We're now able to list network interface addresses similarly to the ifconfig command on UNIX:
```
>>> import psutil
>>> from pprint import pprint
>>> pprint(psutil.net_if_addrs())
{'ethernet0': [snic(family=<AddressFamily.AF_INET: 2>,
                    address='10.0.0.4',
                    netmask='255.0.0.0',
                    broadcast='10.255.255.255'),
               snic(family=<AddressFamily.AF_PACKET: 17>,
                    address='9c:eb:e8:0b:05:1f',
                    netmask=None,
                    broadcast='ff:ff:ff:ff:ff:ff')],
 'localhost': [snic(family=<AddressFamily.AF_INET: 2>,
                    address='127.0.0.1',
                    netmask='255.0.0.0',
                    broadcast='127.0.0.1'),
               snic(family=<AddressFamily.AF_PACKET: 17>,
                    address='00:00:00:00:00:00',
                    netmask=None,
                    broadcast='00:00:00:00:00:00')]}
```
This is limited to the AF_INET (IPv4), AF_INET6 (IPv6) and AF_LINK (Ethernet; it appears as AF_PACKET on Linux, as in the output above) address families. If you want something more powerful (e.g. AF_BLUETOOTH) you can take a look at the netifaces extension. If you want to see how this is implemented, here's the code for POSIX and Windows:
- POSIX
- Windows
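Since the post's title promises an ifconfig lookalike, here is a minimal sketch of formatting such a dict into ifconfig-style text. It uses a hypothetical `snic` namedtuple and sample data with the same field names as the output above, so it runs without psutil; `format_addrs` is a made-up helper.

```python
import socket
from collections import namedtuple

# Hypothetical stand-in with the same fields as the tuples shown above.
snic = namedtuple("snic", ["family", "address", "netmask", "broadcast"])


def format_addrs(addrs):
    """Render a {nic_name: [snic, ...]} mapping as indented text."""
    lines = []
    for nic in sorted(addrs):
        lines.append(nic + ":")
        for addr in addrs[nic]:
            # Enum families have a .name; plain ints fall back to str().
            family = getattr(addr.family, "name", str(addr.family))
            parts = ["%s %s" % (family, addr.address)]
            if addr.netmask:
                parts.append("netmask " + addr.netmask)
            if addr.broadcast:
                parts.append("broadcast " + addr.broadcast)
            lines.append("    " + "  ".join(parts))
    return "\n".join(lines)


SAMPLE = {
    "ethernet0": [
        snic(socket.AF_INET, "10.0.0.4", "255.0.0.0", "10.255.255.255"),
    ],
}
print(format_addrs(SAMPLE))
```

With psutil installed, passing `psutil.net_if_addrs()` instead of `SAMPLE` should work the same way, since only the four attribute names are used.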
net_if_stats()¶

This new function returns information about network interface cards:
```
>>> import psutil
>>> from pprint import pprint
>>> pprint(psutil.net_if_stats())
{'ethernet0': snicstats(isup=True,
                        duplex=<NicDuplex.NIC_DUPLEX_FULL: 2>,
                        speed=100,
                        mtu=1500),
 'localhost': snicstats(isup=True,
                        duplex=<NicDuplex.NIC_DUPLEX_UNKNOWN: 0>,
                        speed=0,
                        mtu=65536)}
```
The implementation on each platform:
- Windows
- Linux
- macOS & FreeBSD
- SunOS
Also in 3.0¶

Beyond the network-interface APIs, psutil 3.0 ships a few other notable changes.

Several integer/string constants (IOPRIO_CLASS_*, NIC_DUPLEX_*, *_PRIORITY_CLASS) now return enum values on Python 3.4+.
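The point of using enums here is nicer reprs without breaking code that compares against plain integers. A minimal sketch with a local `NicDuplex` class: the FULL and UNKNOWN values come from the output shown earlier, while HALF = 1 is an assumption made for illustration.

```python
import enum


class NicDuplex(enum.IntEnum):
    NIC_DUPLEX_UNKNOWN = 0
    NIC_DUPLEX_HALF = 1   # assumed value, not shown in the output above
    NIC_DUPLEX_FULL = 2


duplex = NicDuplex.NIC_DUPLEX_FULL

# Old code comparing against the raw integer keeps working...
still_equal = (duplex == 2)
# ...while new code gets a readable name and can look members up by value.
looked_up = NicDuplex(0)
print(still_equal, duplex.name, looked_up.name)
```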

Support for zombie processes on UNIX was broken and has been fixed. This is covered in a separate post.

Removal of deprecated APIs¶

All aliases deprecated in the psutil 2.0 porting guide (January 2014) are gone. For the full list see the changelog.

Final words¶

I must say I'm pretty satisfied with how psutil is evolving and with the enjoyment I still get every time I work on it. It now gets almost 800,000 downloads a month, which is quite remarkable for a Python library.

At this point, I consider psutil almost "complete" feature-wise, meaning I'm starting to run out of ideas for what to add next (see TODO). Going forward, development will likely focus on supporting more exotic platforms (OpenBSD #562, NetBSD PR-557, Android #355).

There have also been discussions on the python-ideas mailing list about including psutil in the Python stdlib, but even if that happens, it's still a long way off, as it would require a significant time investment that I currently don't have.
