Proper zombie process handling
13 Jun 2015 Tags: psutil, python, api-design, compatibility

This is part of the psutil 3.0 release (see the full release notes).

Support for zombie processes was broken on every platform except Linux and Windows (the latter does not have zombies at all). The full story is in #428.

The problem

Say you create a zombie process and instantiate a Process for it:
import os
import time

import psutil


def create_zombie():
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child exits immediately and becomes the zombie
    time.sleep(1)  # parent lets the child exit but does NOT call wait()
    return pid


pid = create_zombie()
p = psutil.Process(pid)
Up until psutil 2.X, every time you tried to query it you'd get a NoSuchProcess exception:
>>> p.name()
  File "psutil/__init__.py", line 374, in _init
    raise NoSuchProcess(pid, None, msg)
psutil.NoSuchProcess: no process found with pid 123
This was misleading, because the PID technically still existed:
>>> psutil.pid_exists(p.pid)
True
Depending on the platform, some process information could still be retrieved:
>>> p.cmdline()
['python']
Worst of all, psutil.process_iter() didn't return zombies at all. That was a real problem, because identifying them is a legitimate use case: a zombie usually indicates a bug where a parent process spawns a child, kills it, but never calls wait() to reap it.
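
The "PID still exists" part of the story can be reproduced with the standard library alone. What follows is a minimal sketch, assuming a POSIX system: spawn_zombie() and pid_is_alive() are hypothetical helper names (not psutil APIs), and the short sleep is just a crude way to let the child terminate before the check.

```python
import os
import time


def spawn_zombie():
    """Fork a child that exits at once; the parent does not reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    time.sleep(0.2)  # parent: give the child time to terminate
    return pid


def pid_is_alive(pid):
    """Return True if the kernel still knows about this PID."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the PID exists but belongs to another user
    return True


pid = spawn_zombie()
before = pid_is_alive(pid)  # unreaped child: PID is still in the process table
os.waitpid(pid, 0)          # reap the child
after = pid_is_alive(pid)   # checked again once the child has been reaped
print(before, after)
```

The zombie only disappears from the process table once the parent calls one of the wait() functions, which is why a "no such process" error for a PID that still exists is confusing.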
What changed
- A new ZombieProcess exception is raised whenever a process cannot be queried because it is a zombie.
- It replaces NoSuchProcess, which was incorrect and misleading.
- ZombieProcess inherits from NoSuchProcess, so existing code keeps working.
- psutil.process_iter() now correctly includes zombie processes, so you can reliably identify them:
import psutil

zombies = []
for p in psutil.process_iter():
    try:
        if p.status() == psutil.STATUS_ZOMBIE:
            zombies.append(p)
    except psutil.NoSuchProcess:
        pass
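
Because of the inheritance relationship, code written for psutil 2.X that only catches NoSuchProcess keeps working, while newer code can single out zombies by catching the subclass first. The sketch below illustrates the pattern with stand-in exception classes defined locally: they are not psutil's real classes, and query() is a hypothetical function made up for the example.

```python
class NoSuchProcess(Exception):
    """Stand-in for psutil.NoSuchProcess."""


class ZombieProcess(NoSuchProcess):
    """Stand-in for psutil.ZombieProcess (a NoSuchProcess subclass)."""


def query(state):
    """Hypothetical query whose outcome depends on the process state."""
    if state == "zombie":
        raise ZombieProcess("process still exists but it's a zombie")
    if state == "gone":
        raise NoSuchProcess("no process found")
    return "ok"


def old_style(state):
    # Pre-3.0 code: a single handler covers both cases.
    try:
        return query(state)
    except NoSuchProcess:
        return "skipped"


def new_style(state):
    # 3.0-aware code: the subclass has to be caught first.
    try:
        return query(state)
    except ZombieProcess:
        return "zombie"
    except NoSuchProcess:
        return "gone"
```

If the two handlers in new_style() were swapped, the ZombieProcess branch would never run, since the NoSuchProcess handler would match first.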
Windows wheels available in psutil 2.1.2
21 Sep 2014 Tags: psutil, python, wheels, release, personal

psutil 2.1.2 is out. This release has been cooking for a while because I've been travelling for the past 3 months between Spain, Japan and Germany. Hopefully I will be staying in Berlin for a while now, so I will have more time to dedicate to the project.

The main new "feature" of this release is that, in addition to the exe files, Windows users can now also benefit from Python wheels (full story is here), which are available on PyPI. Frankly I don't know much about the new wheels packaging system, but long story short, Windows users can now install psutil via pip and therefore also include it as a dependency in requirements.txt.

Other than this, 2.1.2 can basically be considered a bug-fix release. It includes some important fixes, among which:
- #506: restored Python 2.4 compatibility
- #340: Process.get_open_files() no longer hangs on Windows (this was a very old and high-priority issue)
- #501: disk_io_counters() could return negative values on Windows
- #504: (Linux) RPM packages couldn't be built via setup.py
The list of all fixes can be found here. For the next release I plan to drop support for Python 2.4 and 2.5 and hopefully add network interfaces information similarly to ifconfig.

Python and sendfile

sendfile(2) is a UNIX system call which provides a "zero-copy" way of copying data from one file descriptor (a file) to another (a socket). Because the copying is done entirely within the kernel, sendfile(2) is more efficient than the combination of file.read() and socket.send(), which has to transfer the data to user space and back. Copying the data twice imposes performance and resource penalties which sendfile(2) avoids. It also results in a single system call (and thus only one context switch), rather than the series of read(2) / write(2) system calls, each requiring its own context switch, otherwise used for the data copying. A more exhaustive explanation of how sendfile(2) works is available here, but long story short, sending a file with sendfile() is usually twice as fast as using plain socket.send(). Typical applications which can benefit from sendfile() are FTP and HTTP servers.
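
To make the difference concrete, here is a minimal sketch contrasting the two approaches. It assumes Linux and Python 3.3+ (where os.sendfile() is available). The helper names, the temporary file and the local socket pair are illustrative only, and the payload is kept small so that it fits in the socket buffer without a concurrent reader.

```python
import os
import socket
import tempfile

PAYLOAD = b"x" * 4096


def copy_with_send(sock, f):
    """Classic approach: data travels kernel -> user space -> kernel."""
    total = 0
    while True:
        chunk = f.read(1024)
        if not chunk:
            return total  # EOF
        sock.sendall(chunk)
        total += len(chunk)


def copy_with_sendfile(sock, f):
    """sendfile(2): the kernel copies straight from the file to the socket."""
    size = os.fstat(f.fileno()).st_size
    offset = 0
    while offset < size:
        sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
        if sent == 0:
            break  # EOF
        offset += sent
    return offset


def recv_exactly(sock, n):
    """Read n bytes from the socket (fewer only if the peer closes)."""
    buf = b""
    while len(buf) < n:
        data = sock.recv(n - len(buf))
        if not data:
            break
        buf += data
    return buf


with tempfile.TemporaryFile() as f:
    f.write(PAYLOAD)
    f.flush()
    a, b = socket.socketpair()
    with a, b:
        f.seek(0)
        sent_plain = copy_with_send(a, f)
        got_plain = recv_exactly(b, sent_plain)
        sent_zero_copy = copy_with_sendfile(a, f)
        got_zero_copy = recv_exactly(b, sent_zero_copy)
```

Both helpers deliver the same bytes. The difference is that copy_with_send() pulls every chunk into a Python bytes object first, whereas copy_with_sendfile() never touches the data in user space.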

socket.sendfile()

I recently contributed a patch for Python's socket module which adds a high-level socket.sendfile() method (see full discussion at BPO-17552). socket.sendfile() transmits a file until EOF is reached by attempting to use os.sendfile() if available, otherwise it falls back on plain socket.send(). Internally, it takes care of handling socket timeouts and provides two optional parameters to move the file offset or to send only a limited number of bytes. I came up with this idea because getting all of that right is a bit tricky, so a generic wrapper seemed useful. socket.sendfile() will make its appearance in Python 3.5.
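
As a rough illustration of the two optional parameters, the sketch below sends a slice of a temporary file over a local socket pair. It assumes Python 3.5+ and a POSIX system; the file contents, offset and count are made up for the example.

```python
import socket
import tempfile

DATA = bytes(range(256)) * 16  # 4096 bytes of sample content

with tempfile.TemporaryFile() as f:
    f.write(DATA)
    f.flush()
    sender, receiver = socket.socketpair()
    with sender, receiver:
        # Skip the first 1024 bytes, then send at most 2048 bytes.
        sent = sender.sendfile(f, offset=1024, count=2048)
        position = f.tell()  # the file position is updated on return
        received = b""
        while len(received) < sent:
            chunk = receiver.recv(sent - len(received))
            if not chunk:
                break  # peer closed the connection
            received += chunk
```

The return value is the total number of bytes sent, so together with the updated file position the caller can tell how far the transfer got.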

sendfile and Python

sendfile(2) made its first appearance in the Python stdlib kind of late: Python 3.3. It was contributed by Ross Lagerwall and me in BPO-10882. Since the patch didn't make it into Python 2.X and I wanted to use sendfile() in pyftpdlib (code.google.com/p/pyftpdlib/issues/detail?id=152), I later decided to release it as a standalone module working with older (2.5+) Python versions (see the pysendfile project). Starting with version 3.5, Python will hopefully start using sendfile() more extensively, namely in:

- BPO-13563: ftplib
- BPO-13559: httplib
- asyncio: there are some plans for this even though no actual patch yet, see discussion and BDFL involvement.

Also, Windows provides something similar to sendfile(2): TransmitFile. Now that socket.sendfile() is in place it seems natural to add support for it as well (see BPO-21721).

Backport to Python 2.6 and 2.7

For those of you who are interested in using socket.sendfile() with the older Python 2.6 and 2.7 versions, here's a backport. It requires the pysendfile module to be installed. Full code including tests is hosted here.

#!/usr/bin/env python

"""
This is a backport of socket.sendfile() for Python 2.6 and 2.7.
socket.sendfile() will be included in Python 3.5:
http://bugs.python.org/issue17552
Usage:

>>> import socket
>>> file = open("somefile.bin", "rb")
>>> sock = socket.create_connection(("localhost", 8021))
>>> sendfile(sock, file)
42319283
>>>
"""

import errno
import io
import os
import select
import socket
try:
    memoryview  # py 2.7 only
except NameError:
    memoryview = lambda x: x

if os.name == 'posix':
    import sendfile as pysendfile  # requires "pip install pysendfile"
else:
    pysendfile = None


_RETRY = frozenset((errno.EAGAIN, errno.EALREADY, errno.EWOULDBLOCK,
                    errno.EINPROGRESS))


class _GiveupOnSendfile(Exception):
    pass


if pysendfile is not None:

    def _sendfile_use_sendfile(sock, file, offset=0, count=None):
        _check_sendfile_params(sock, file, offset, count)
        sockno = sock.fileno()
        try:
            fileno = file.fileno()
        except (AttributeError, io.UnsupportedOperation) as err:
            raise _GiveupOnSendfile(err)  # not a regular file
        try:
            fsize = os.fstat(fileno).st_size
        except OSError as err:
            raise _GiveupOnSendfile(err)  # not a regular file
        if not fsize:
            return 0  # empty file
        blocksize = fsize if not count else count

        timeout = sock.gettimeout()
        if timeout == 0:
            raise ValueError("non-blocking sockets are not supported")
        # poll/select have the advantage of not requiring any
        # extra file descriptor, contrarily to epoll/kqueue
        # (also, they require a single syscall).
        if hasattr(select, 'poll'):
            if timeout is not None:
                timeout *= 1000
            pollster = select.poll()
            pollster.register(sockno, select.POLLOUT)

            def wait_for_fd():
                if pollster.poll(timeout) == []:
                    raise socket._socket.timeout('timed out')
        else:
            # call select() once in order to solicit ValueError in
            # case we run out of fds
            try:
                select.select([], [sockno], [], 0)
            except ValueError as err:
                raise _GiveupOnSendfile(err)

            def wait_for_fd():
                fds = select.select([], [sockno], [], timeout)
                if fds == ([], [], []):
                    raise socket._socket.timeout('timed out')

        total_sent = 0
        # localize variable access to minimize overhead
        os_sendfile = pysendfile.sendfile
        try:
            while True:
                if timeout:
                    wait_for_fd()
                if count:
                    blocksize = count - total_sent
                    if blocksize <= 0:
                        break
                try:
                    sent = os_sendfile(sockno, fileno, offset, blocksize)
                except OSError as err:
                    if err.errno in _RETRY:
                        # Block until the socket is ready to send some
                        # data; avoids hogging CPU resources.
                        wait_for_fd()
                    else:
                        if total_sent == 0:
                            # We can get here for different reasons, the main
                            # one being 'file' is not a regular mmap(2)-like
                            # file, in which case we'll fall back on using
                            # plain send().
                            raise _GiveupOnSendfile(err)
                        raise  # re-raise preserving the original traceback
                else:
                    if sent == 0:
                        break  # EOF
                    offset += sent
                    total_sent += sent
            return total_sent
        finally:
            if total_sent > 0 and hasattr(file, 'seek'):
                file.seek(offset)
else:
    def _sendfile_use_sendfile(sock, file, offset=0, count=None):
        raise _GiveupOnSendfile(
            "sendfile() not available on this platform")


def _sendfile_use_send(sock, file, offset=0, count=None):
    _check_sendfile_params(sock, file, offset, count)
    if sock.gettimeout() == 0:
        raise ValueError("non-blocking sockets are not supported")
    if offset:
        file.seek(offset)
    blocksize = min(count, 8192) if count else 8192
    total_sent = 0
    # localize variable access to minimize overhead
    file_read = file.read
    sock_send = sock.send
    try:
        while True:
            if count:
                blocksize = min(count - total_sent, blocksize)
                if blocksize <= 0:
                    break
            data = memoryview(file_read(blocksize))
            if not data:
                break  # EOF
            while True:
                try:
                    sent = sock_send(data)
                # socket.error is not an OSError subclass on Python 2
                except (OSError, socket.error) as err:
                    if err.errno in _RETRY:
                        continue
                    raise
                else:
                    total_sent += sent
                    if sent < len(data):
                        data = data[sent:]
                    else:
                        break
        return total_sent
    finally:
        if total_sent > 0 and hasattr(file, 'seek'):
            file.seek(offset + total_sent)


def _check_sendfile_params(sock, file, offset, count):
    if 'b' not in getattr(file, 'mode', 'b'):
        raise ValueError("file should be opened in binary mode")
    if not sock.type & socket.SOCK_STREAM:
        raise ValueError("only SOCK_STREAM type sockets are supported")
    if count is not None:
        if not isinstance(count, int):
            raise TypeError(
                "count must be a positive integer (got %s)" % repr(count))
        if count <= 0:
            raise ValueError(
                "count must be a positive integer (got %s)" % repr(count))


def sendfile(sock, file, offset=0, count=None):
    """sendfile(sock, file[, offset[, count]]) -> sent

    Send a *file* over a connected socket *sock* until EOF is
    reached by using high-performance sendfile(2) and return the
    total number of bytes which were sent.
    *file* must be a regular file object opened in binary mode.
    If sendfile() is not available (e.g. Windows) or file is
    not a regular file socket.send() will be used instead.
    *offset* tells from where to start reading the file.
    If specified, *count* is the total number of bytes to transmit
    as opposed to sending the file until EOF is reached.
    File position is updated on return or also in case of error in
    which case file.tell() can be used to figure out the number of
    bytes which were sent.
    The socket must be of SOCK_STREAM type.
    Non-blocking sockets are not supported.
    """
    try:
        return _sendfile_use_sendfile(sock, file, offset, count)
    except _GiveupOnSendfile:
        return _sendfile_use_send(sock, file, offset, count)

Goodbye Google Code, I am moving to GitHub
26 May 2014 Tags: rant

8 years ago I started hosting my first open source project (pyftpdlib, code.google.com/p/pyftpdlib) on Google Code, and I later ended up also hosting psutil (code.google.com/p/psutil) and pysendfile (code.google.com/p/pysendfile) there. Back then GC had just been released and, similarly to other Google products, I appreciated the clean and minimalistic interface, the excellent bug tracker and the freedom to choose between different revision control systems (SVN, Git and Mercurial, which is my favourite one).

Unfortunately, as the years passed Google completely lost interest in maintaining GC, to the point that GC can now basically be considered an abandoned project. If you take a look at the GC bug tracker (code.google.com/p/support/issues/list) you can see literally hundreds of issues which have been open for years, even some apparently easy ones such as #60 (code.google.com/p/support/issues/detail?id=60) and #919 (code.google.com/p/support/issues/detail?id=919). The lack of interest from Google is absolutely astonishing and it is the main reason why I ultimately decided to change. After at least a couple of years of thinking about migrating to GitHub I finally bit the bullet, and as of today psutil is hosted on GitHub (update: now also pyftpdlib and pysendfile).
What I will miss the most about GC

First of all I must say that despite the unfortunate situation of GC I'm also sad about abandoning it. It started as a really great hosting platform, and it still has some peculiar aspects which I know I will miss. In order of importance:
- The bug tracker (code.google.com/p/psutil/issues/list): it is much more powerful than GitHub's, especially thanks to the extremely customizable labeling system, which is pure gold, the excellent search and the grid view (code.google.com/p/psutil/issues/list?can=2&q=&colspec=ID+Summary+Type+Opsys+Status+Priority+Opened+Owner&groupby=&sort=&x=&y=&cells=tiles&mode=grid). The GC bug tracker seriously kicks some ass, so kudos to whoever was behind its design! By comparison the GitHub bug tracker is too minimalistic and it has no good way to order issues or list them in a more compact form. I'm totally gonna miss the psutil bug tracker.
- Mercurial: I'm a big fan of Mercurial and I consider it way more pleasant to work with than Git. I don't know exactly why Git ended up being so much more used than Mercurial (probably because of GitHub?) but I'm sure that many other guys like me who know both systems will agree that Mercurial is simply so much easier to use. Unfortunately once you decide to stick with GitHub you have no other choice. Mercurial, I'm gonna miss you too!
- GC layout: it is much simpler than GitHub's! Everything is easy to find, even for a non-geek person. The home page alone is perfect to summarize what the project is about and doesn't have tens of icons all over the place. The GitHub layout is more complicated and needs some time to get used to, even for a programmer. If these projects (psutil and others) weren't about programming I wouldn't have chosen GitHub because it's "not for the masses".
What I appreciate about GitHub
- Travis integration: there's this totally awesome free continuous integration service called Travis which, given a configuration file like this, will automatically run tests on multiple Python versions every time a commit is pushed. They recently added OSX support and Windows support is on the way. This way I will finally be able to quickly test psutil on Linux, OSX and Windows without using virtualized systems, except for FreeBSD and Solaris! To me this is like the ultimate Christmas gift and I couldn't ask for any better. Note: as of today Travis only works with GitHub.
- Forks and pull requests: honestly I'm not a big fan of them (yet?), probably because I'm used to the python-dev development workflow, which consists of uploading patches on the bug tracker and reviewing them (see this for example). Nevertheless, to my understanding most people use pull requests in order to contribute to open source projects, so basically this is a service I'm glad to offer to my users, who hopefully will be able to contribute back more easily. GC has a cloning system (code.google.com/p/psutil/source/clones) but it isn't anywhere near GitHub's and I'm not even sure how it works (never cared).
- The "social" side of GitHub, including the fact that you can "star" developers and receive notifications about their activity, was another big incentive for migrating. The personal landing page collecting all your contributions to different projects is absolutely cool. GC had something similar but they stupidly removed it (code.google.com/p/support/issues/detail?id=24324) all of a sudden and never reintroduced it. A lot of people were angry but again, they didn't care. Actually this was the feature I appreciated the most about GC after the bug tracker, and that is when I seriously started thinking about flipping off GC for good.
- The enormous user base: the fact that GitHub is the most used code hosting platform out there will hopefully help me and my projects have a little more visibility. Also, in many job interviews I've been asked what my GitHub profile was, so it seems GitHub also became a factor in getting jobs.
- Gists: gists are "a simple way to share code snippets and pastes with others. All gists are Git repositories, so they are automatically versioned, forkable and usable from Git". Seriously, they are beautiful. In order to share my code snippets I've always used ActiveState but I think I will eventually migrate them as well in order to have everything in one place.
- The fact that if you mention an issue number as part of your commit message that specific issue will automatically be updated. As I said, the GC bug tracker is superior in basically any aspect, but since I always took care of updating issues by mentioning the specific cset which fixed them (see for example code.google.com/p/psutil/issues/detail?id=463), having this little extra feature will save me some time.
- SSH keys: using Mercurial on GC means using password based authentication. Incredibly, they still do not support SSH key based auth. Simply "git push"ing without entering any password when I'm not on my laptop is nice, and of course, it is also much more secure.
Migration

For those of you who are interested in knowing how I did it, here goes. To move the issues from the GC bug tracker to GitHub's I used this tool. I managed to preserve the issue IDs but unfortunately not the real owners nor the real issue dates, which kind of sucks. To migrate the code from Mercurial to Git I just used this. The Mercurial -> Git transition was perfect and I also managed to preserve the original Mercurial named branches and tags, which for me was crucial. In conclusion, psutil is a five-year-old, medium-sized project with hundreds of issues: the transition in this case is definitely possible but not painless, so if you plan on migrating, the sooner you do it the better.
Announcing psutil 2.0
10 Mar 2014 Tags: psutil, python, api-design, compatibility, release

psutil 2.0 is out. This is a major rewrite and reorganization of both the Python and C extension modules. It cost me four months of work and more than 22,000 changed lines (the diff against the old 1.2.1). Many of the changes are not backward compatible; I'm sure this will cause some pain, but I think it's for the better and needed to be done.

API changes

I already wrote a detailed blog post about this, so use that as the official reference on how to port your code.

RST documentation

I've never been happy with the old doc hosted on Google Code. The markup language provided by Google is pretty limited, plus it's not under revision control. The new doc is more detailed, uses reStructuredText as the markup language, lives in the same code repository as psutil, and is hosted on the excellent Read the Docs: http://psutil.readthedocs.org/
Physical CPUs count

You're now able to distinguish between logical and physical CPUs. The full story is in #427.
>>> psutil.cpu_count()  # logical
4
>>> psutil.cpu_count(logical=False)  # physical cores only
2

Process instances are hashable

psutil.Process instances can now be compared for equality and used in sets and dicts. The most useful application is diffing process snapshots:
>>> before = set(psutil.process_iter())
>>> # ... some time passes ...
>>> after = set(psutil.process_iter())
>>> new_procs = after - before  # processes spawned in between
Equality is not just PID-based. It also includes the process creation time, so a Process whose PID got reused by the kernel won't be mistaken for the original. The full story is in #452.
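
The idea can be sketched with a small stand-in class. This is not psutil's actual implementation, just a hypothetical ProcSnapshot whose identity is the (PID, creation time) pair, which is enough to show why a recycled PID doesn't compare equal to the original process. The PID and timestamps are made up.

```python
class ProcSnapshot:
    """Hypothetical stand-in for a process handle (not psutil.Process)."""

    def __init__(self, pid, create_time):
        self.pid = pid
        self.create_time = create_time
        # Identity is the pair, not the PID alone.
        self._ident = (pid, create_time)

    def __eq__(self, other):
        if not isinstance(other, ProcSnapshot):
            return NotImplemented
        return self._ident == other._ident

    def __hash__(self):
        return hash(self._ident)


original = ProcSnapshot(pid=1234, create_time=1394400000.0)
same = ProcSnapshot(pid=1234, create_time=1394400000.0)
recycled = ProcSnapshot(pid=1234, create_time=1394403600.0)  # PID reused later

before = {original}
after = {same, recycled}
new_procs = after - before  # only the process that reused the PID is left
```

Defining __hash__ on the same pair used by __eq__ is what makes the set difference behave: two handles for the same process collapse into one element, while the recycled PID stays distinct.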
Speedups
- #477: Process.cpu_percent() is about 30% faster.
- #478: (Linux) almost all APIs are about 30% faster on Python 3.X.
Other improvements and bugfixes
- #424: published Windows installers for Python 3.X 64-bit.
- #447: the psutil.wait_procs() timeout parameter is now optional.
- #459: a Makefile is now available for running tests and other repetitive tasks (also on Windows).
- #463: the interval parameter of the cpu_percent* functions now defaults to 0.0, because the previous default was a common source of slowdowns.
- #340: (Windows) Process.open_files() no longer hangs.
- #448: (Windows) fixed a memory leak affecting Process.children() and Process.ppid().
- #461: namedtuples are now pickle-able.
- #474: (Windows) Process.cpu_percent() is no longer capped at 100%.

Giampaolo Rodola Python enthusiast, core developer, psutil author
