1. Proper zombie process handling

    This is part of the psutil 3.0 release (see the full release notes).

    Support for zombie processes was broken on every platform except Linux and Windows (Windows does not have zombies). The full story is in #428.

    The problem

    Say you create a zombie process and instantiate a Process for it:

    import os
    import psutil

    def create_zombie():
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child exits immediately
        # the parent never calls wait(), so the dead child stays a zombie
        return pid

    pid = create_zombie()
    p = psutil.Process(pid)
    

    Up until psutil 2.X, every time you tried to query it you'd get a NoSuchProcess exception:

    >>> p.name()
      File "psutil/__init__.py", line 374, in _init
        raise NoSuchProcess(pid, None, msg)
    psutil.NoSuchProcess: no process found with pid 123
    

    This was misleading, because the PID technically still existed:

    >>> psutil.pid_exists(p.pid)
    True
    

    Depending on the platform, some process information could still be retrieved:

    >>> p.cmdline()
    ['python']
    

    Worst of all, psutil.process_iter() didn't return zombies at all. That was a real problem, because identifying them is a legitimate use case: a zombie usually indicates a bug where a parent process spawns a child, kills it, but never calls wait() to reap it.
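    The missing wait() call is easy to show with the standard library alone. The following is a minimal sketch, assuming a POSIX platform; the helper name spawn_and_reap is made up for illustration and is not part of psutil. Calling os.waitpid() in the parent is what removes the zombie entry from the process table.

```python
import os


def spawn_and_reap():
    """Fork a child that exits at once, then reap it from the parent."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    # Parent: until waitpid() is called the dead child stays a zombie.
    # waitpid() collects its exit status and frees the process table entry.
    reaped_pid, status = os.waitpid(pid, 0)
    return pid, reaped_pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child, reaped, exit_code = spawn_and_reap()
    print(child == reaped, exit_code)
```

    Leaving out the os.waitpid() call, as in create_zombie() above, is exactly the kind of bug a zombie usually points to.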

    What changed

    • A new ZombieProcess exception is raised whenever a process cannot be queried because it is a zombie.
    • It replaces NoSuchProcess, which was incorrect and misleading.
    • ZombieProcess inherits from NoSuchProcess, so existing code keeps working.
    • psutil.process_iter() now correctly includes zombie processes, so you can reliably identify them:

    import psutil
    
    zombies = []
    for p in psutil.process_iter():
        try:
            if p.status() == psutil.STATUS_ZOMBIE:
                zombies.append(p)
        except psutil.NoSuchProcess:
            pass
    
  2. Windows wheels available in psutil 2.1.2

    psutil 2.1.2 is out. This release has been cooking for a while because I've been travelling between Spain, Japan and Germany for the past 3 months. Hopefully I will be staying in Berlin for a while now, so I will have more time to dedicate to the project.

    The main new "feature" of this release is that, in addition to the exe files, Windows users can now also benefit from Python wheels (full story is here), which are available on PyPI. Frankly I don't know much about the new wheel packaging system, but the long story short is that Windows users can now install psutil via pip and therefore also list it as a dependency in requirements.txt. Other than this, 2.1.2 can basically be considered a bug-fix release. It includes some important fixes, among which:

    • #506: restored Python 2.4 compatibility
    • #340: Process.get_open_files() no longer hangs on Windows (this was a very old and high-priority issue)
    • #501: (Windows) disk_io_counters() could return negative values
    • #504: (Linux) couldn't build RPM packages via setup.py

    The list of all fixes can be found here. For the next release I plan to drop support for Python 2.4 and 2.5 and hopefully add network interface information similar to what ifconfig provides.

  3. Python and sendfile

    sendfile(2) is a UNIX system call which provides a "zero-copy" way of copying data from one file descriptor (a file) to another (a socket). Because the copying is done entirely within the kernel, sendfile(2) is more efficient than combining file.read() and socket.send(), which has to transfer the data to and from user space. Copying the data twice imposes performance and resource penalties which sendfile(2) avoids. It also needs a single system call (and thus only one context switch) rather than a series of read(2) / write(2) calls, each requiring a context switch of its own.

    A more exhaustive explanation of how sendfile(2) works is available here, but the long story short is that sending a file with sendfile() is usually twice as fast as using plain socket.send(). Typical applications which can benefit from sendfile() are FTP and HTTP servers.
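    As a rough illustration of the low-level call, here is a minimal sketch that pushes a small temporary file through os.sendfile() (Python 3.3+) into one end of a socket pair and reads it back from the other. It assumes Linux, where sendfile(2) accepts any socket as the output descriptor; the helper name send_with_sendfile is made up for this example, and the payload is kept small so that it fits in the socket buffer without a concurrent reader.

```python
import os
import socket
import tempfile


def send_with_sendfile(payload):
    """Send `payload` from a temp file to a socket using os.sendfile()."""
    sender, receiver = socket.socketpair()  # two connected stream sockets
    with sender, receiver, tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()  # make sure the bytes are visible through the descriptor
        offset = 0
        while offset < len(payload):
            # The kernel copies straight from the file to the socket.
            sent = os.sendfile(sender.fileno(), f.fileno(),
                               offset, len(payload) - offset)
            if sent == 0:
                break  # EOF
            offset += sent
        sender.shutdown(socket.SHUT_WR)  # signal EOF to the receiver
        chunks = []
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    data = b"x" * 4096
    print(send_with_sendfile(data) == data)
```

    Note that the file contents never pass through a Python bytes object on the sending side, which is where the savings over file.read() plus socket.send() come from.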

    socket.sendfile()

    I recently contributed a patch for Python's socket module which adds a high-level socket.sendfile() method (see full discussion at BPO-17552). socket.sendfile() transmits a file until EOF is reached by attempting to use os.sendfile() if available, otherwise it falls back on plain socket.send(). Internally, it takes care of handling socket timeouts and provides two optional parameters to move the file offset or to send only a limited number of bytes. I came up with this idea because getting all of that right is a bit tricky, so a generic wrapper seemed useful. socket.sendfile() will make its appearance in Python 3.5.
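    A short sketch of how the two optional parameters can be used, assuming Python 3.5+ on a POSIX system. The helper name send_slice and the sample data are made up for illustration; a socket pair stands in for a real network connection.

```python
import socket
import tempfile


def send_slice(data, offset, count):
    """Send `count` bytes of `data`, starting at `offset`, via socket.sendfile()."""
    sender, receiver = socket.socketpair()  # blocking SOCK_STREAM sockets
    with sender, receiver, tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        # Returns the total number of bytes sent.
        sent = sender.sendfile(f, offset=offset, count=count)
        sender.shutdown(socket.SHUT_WR)  # signal EOF to the receiver
        received = b""
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            received += chunk
    return sent, received


if __name__ == "__main__":
    print(send_slice(b"hello sendfile world", offset=6, count=8))
```

    With offset=6 and count=8 only the word "sendfile" goes over the wire; leaving both parameters at their defaults sends the whole file.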

    sendfile and Python

    sendfile(2) made its first appearance in the Python stdlib kind of late: Python 3.3. It was contributed by Ross Lagerwall and me in BPO-10882. Since the patch didn't make it into Python 2.X and I wanted to use sendfile() in pyftpdlib (code.google.com/p/pyftpdlib/issues/detail?id=152), I later decided to release it as a standalone module working with older (2.5+) Python versions (see the pysendfile project). Starting with version 3.5, Python will hopefully start using sendfile() more extensively.

    Also, Windows provides something similar to sendfile(2): TransmitFile. Now that socket.sendfile() is in place it seems natural to add support for it as well (see BPO-21721).

    Backport to Python 2.6 and 2.7

    For those of you who are interested in using socket.sendfile() with Python 2.6 and 2.7, here's a backport. It requires the pysendfile module to be installed. Full code including tests is hosted here.

    #!/usr/bin/env python
    
    """
    This is a backport of socket.sendfile() for Python 2.6 and 2.7.
    socket.sendfile() will be included in Python 3.5:
    http://bugs.python.org/issue17552
    Usage:
    
    >>> import socket
    >>> file = open("somefile.bin", "rb")
    >>> sock = socket.create_connection(("localhost", 8021))
    >>> sendfile(sock, file)
    42319283
    >>>
    """
    
    import errno
    import io
    import os
    import select
    import socket
    try:
        memoryview  # py 2.7 only
    except NameError:
        memoryview = lambda x: x
    
    if os.name == 'posix':
        import sendfile as pysendfile  # requires "pip install pysendfile"
    else:
        pysendfile = None
    
    
    _RETRY = frozenset((errno.EAGAIN, errno.EALREADY, errno.EWOULDBLOCK,
                        errno.EINPROGRESS))
    
    
    class _GiveupOnSendfile(Exception):
        pass
    
    
    if pysendfile is not None:
    
        def _sendfile_use_sendfile(sock, file, offset=0, count=None):
            _check_sendfile_params(sock, file, offset, count)
            sockno = sock.fileno()
            try:
                fileno = file.fileno()
            except (AttributeError, io.UnsupportedOperation) as err:
                raise _GiveupOnSendfile(err)  # not a regular file
            try:
                fsize = os.fstat(fileno).st_size
            except OSError as err:
                raise _GiveupOnSendfile(err)  # not a regular file
            if not fsize:
                return 0  # empty file
            blocksize = fsize if not count else count
    
            timeout = sock.gettimeout()
            if timeout == 0:
                raise ValueError("non-blocking sockets are not supported")
            # poll/select have the advantage of not requiring any
            # extra file descriptor, contrarily to epoll/kqueue
            # (also, they require a single syscall).
            if hasattr(select, 'poll'):
                if timeout is not None:
                    timeout *= 1000
                pollster = select.poll()
                pollster.register(sockno, select.POLLOUT)
    
                def wait_for_fd():
                    if pollster.poll(timeout) == []:
                        raise socket._socket.timeout('timed out')
            else:
                # call select() once in order to solicit ValueError in
                # case we run out of fds
                try:
                    select.select([], [sockno], [], 0)
                except ValueError as err:
                    raise _GiveupOnSendfile(err)
    
                def wait_for_fd():
                    fds = select.select([], [sockno], [], timeout)
                    if fds == ([], [], []):
                        raise socket._socket.timeout('timed out')
    
            total_sent = 0
            # localize variable access to minimize overhead
            os_sendfile = pysendfile.sendfile
            try:
                while True:
                    if timeout:
                        wait_for_fd()
                    if count:
                        blocksize = count - total_sent
                        if blocksize <= 0:
                            break
                    try:
                        sent = os_sendfile(sockno, fileno, offset, blocksize)
                    except OSError as err:
                        if err.errno in _RETRY:
                            # Block until the socket is ready to send some
                            # data; avoids hogging CPU resources.
                            wait_for_fd()
                        else:
                            if total_sent == 0:
                                # We can get here for different reasons, the main
                                # one being 'file' is not a regular mmap(2)-like
                                # file, in which case we'll fall back on using
                                # plain send().
                                raise _GiveupOnSendfile(err)
                            raise err
                    else:
                        if sent == 0:
                            break  # EOF
                        offset += sent
                        total_sent += sent
                return total_sent
            finally:
                if total_sent > 0 and hasattr(file, 'seek'):
                    file.seek(offset)
    else:
        def _sendfile_use_sendfile(sock, file, offset=0, count=None):
            raise _GiveupOnSendfile(
                "sendfile() not available on this platform")
    
    
    def _sendfile_use_send(sock, file, offset=0, count=None):
        _check_sendfile_params(sock, file, offset, count)
        if sock.gettimeout() == 0:
            raise ValueError("non-blocking sockets are not supported")
        if offset:
            file.seek(offset)
        blocksize = min(count, 8192) if count else 8192
        total_sent = 0
        # localize variable access to minimize overhead
        file_read = file.read
        sock_send = sock.send
        try:
            while True:
                if count:
                    blocksize = min(count - total_sent, blocksize)
                    if blocksize <= 0:
                        break
                data = memoryview(file_read(blocksize))
                if not data:
                    break  # EOF
                while True:
                    try:
                        sent = sock_send(data)
                    # on Python 2 socket.error is not an OSError subclass
                    except socket.error as err:
                        if err.errno in _RETRY:
                            continue
                        raise
                    else:
                        total_sent += sent
                        if sent < len(data):
                            data = data[sent:]
                        else:
                            break
            return total_sent
        finally:
            if total_sent > 0 and hasattr(file, 'seek'):
                file.seek(offset + total_sent)
    
    
    def _check_sendfile_params(sock, file, offset, count):
        if 'b' not in getattr(file, 'mode', 'b'):
            raise ValueError("file should be opened in binary mode")
        if not sock.type & socket.SOCK_STREAM:
            raise ValueError("only SOCK_STREAM type sockets are supported")
        if count is not None:
            if not isinstance(count, int):
                raise TypeError(
                    "count must be a positive integer (got %s)" % repr(count))
            if count <= 0:
                raise ValueError(
                    "count must be a positive integer (got %s)" % repr(count))
    
    
    def sendfile(sock, file, offset=0, count=None):
        """sendfile(sock, file[, offset[, count]]) -> sent
    
        Send a *file* over a connected socket *sock* until EOF is
        reached by using high-performance sendfile(2) and return the
        total number of bytes which were sent.
        *file* must be a regular file object opened in binary mode.
        If sendfile() is not available (e.g. Windows) or file is
        not a regular file socket.send() will be used instead.
        *offset* tells from where to start reading the file.
        If specified, *count* is the total number of bytes to transmit
        as opposed to sending the file until EOF is reached.
        File position is updated on return or also in case of error in
        which case file.tell() can be used to figure out the number of
        bytes which were sent.
        The socket must be of SOCK_STREAM type.
        Non-blocking sockets are not supported.
        """
        try:
            return _sendfile_use_sendfile(sock, file, offset, count)
        except _GiveupOnSendfile:
            return _sendfile_use_send(sock, file, offset, count)
    
  4. Goodbye Google Code, I am moving to GitHub

    Eight years ago I started hosting my first open source project (pyftpdlib, code.google.com/p/pyftpdlib) on Google Code (GC), and I later ended up also hosting psutil (code.google.com/p/psutil) and pysendfile (code.google.com/p/pysendfile) there. Back then GC had just been released and, as with other Google products, I appreciated the clean and minimalistic interface, the excellent bug tracker and the freedom to choose between different revision control systems (SVN, GIT and Mercurial, which is my favourite one). Unfortunately, as the years passed Google completely lost interest in maintaining GC, to the point that it can now basically be considered an abandoned project. If you take a look at the GC bug tracker (code.google.com/p/support/issues/list) you can see literally hundreds of issues which have been open for years, even some apparently easy ones such as #60 (code.google.com/p/support/issues/detail?id=60) and #919 (code.google.com/p/support/issues/detail?id=919). The lack of interest from Google is absolutely astonishing and it is the main reason why I ultimately decided to move. After at least a couple of years of thinking about migrating to github I finally bit the bullet, and as of today psutil is hosted on github (update: now also pyftpdlib and pysendfile).

    What I will miss the most about GC

    First of all I must say that despite the unfortunate situation of GC I'm also sad to abandon it. It started as a really great hosting platform, and it still has some distinctive features which I know I will miss. In order of importance:

    • The bug tracker (code.google.com/p/psutil/issues/list): it is much more powerful than github's, especially for the extremely customizable labeling system which is pure gold, the excellent searching system and the grid view (code.google.com/p/psutil/issues/list?can=2&q=&colspec=ID+Summary+Type+Opsys+Status+Priority+Opened+Owner&groupby=&sort=&x=&y=&cells=tiles&mode=grid). GC bug tracker seriously kicks some ass so kudos to whoever was behind its design! By comparison github bug tracker is too minimalistic and it has no good way to order issues or list them in a more compact form. I'm totally gonna miss psutil bug tracker.
    • Mercurial: I'm a big fan of Mercurial and I consider it way more pleasant to work with compared to GIT. I don't know exactly why GIT ended up being so much more used than Mercurial (probably because of github?) but I'm sure that many other guys like me who know both systems will agree that Mercurial is simply so much easier to use. Unfortunately once you decide to stick with github you have no other choice. Mercurial, I'm gonna miss you too!
    • GC layout: it is much simpler than github's! Everything is easy to find, even for a non-geek person. The home page alone is perfect to summarize what the project is about and doesn't have tens of icons all over the place. github layout is more complicated and needs some time to get used to, even for a programmer. If these projects (psutil and others) weren't about programming I wouldn't have chosen github because it's "not for the masses".

    What I appreciate about github

    • Travis integration: there's this totally awesome free continuous integration service called Travis which given a configuration file like this will automatically run tests on multiple python versions every time a commit is pushed. They recently added OSX support and Windows support is on the way. This way I will finally be able to quickly test psutil on Linux, OSX and Windows without using virtualized systems except for FreeBSD and Solaris! To me this is like the ultimate Christmas gift and I couldn't ask for any better. Note: as of today Travis only works with github.
    • forks and pull requests: honestly I'm not a big fan of them (yet?), probably because I'm used to the python-dev development workflow consisting of uploading patches on the bug tracker and reviewing them (see this for example). Nevertheless to my understanding most people use pull requests in order to contribute to open source projects so basically this is a service I'm glad to offer to my users who hopefully will be able to contribute back more easily. GC has a cloning system (code.google.com/p/psutil/source/clones) but isn't anywhere near github's and I'm not even sure how it works (never cared).
    • the "social" side of github including the fact that you can "star" developers and receive notifications about their activity was another big incentive for migrating. The personal landing page collecting all your contributions to different projects is absolutely cool. GC had something similar but they stupidly removed it (code.google.com/p/support/issues/detail?id=24324) all of a sudden and never reintroduced it. A lot of people were angry but again, they didn't care. Actually this was the feature I appreciated the most about GC after the bug tracker and that is when I seriously started thinking about flipping off GC for good.
    • the enormous user base: the fact that github is the most used code hosting platform out there will hopefully help me and my projects have a little more visibility. Also, in many job interviews I've been asked what my github profile was, so it seems github also became a factor in getting jobs.
    • gists: gists are "a simple way to share code snippets and pastes with others. All gists are Git repositories, so they are automatically versioned, forkable and usable from Git". Seriously, they are beautiful. In order to share my code snippets I've always used ActiveState but I think I will eventually migrate them as well in order to have everything in one place.
    • the fact that if you mention an issue number as part of your commit message that specific issue will automatically be updated. As I said GC bug tracker is superior in basically any aspect but since I always took care of updating issues by mentioning the specific cset which fixed them (see for example code.google.com/p/psutil/issues/detail?id=463), having this little extra feature will save me some time.
    • SSH keys: using Mercurial on GC means using password based authentication. Incredibly they still do not support SSH key based auth. Simply "git push"ing without entering any password when I'm not on my laptop is nice, and of course, it is also much more secure.

    Migration

    For those of you who are interested in knowing how I did it, here goes. To move the issues from the GC bug tracker to github's I used this tool. I managed to preserve the issue IDs but unfortunately not the real owners nor the real issue dates, which kind of sucks. To migrate the code from Mercurial to GIT I just used this. The Mercurial -> GIT transition was perfect and I also managed to preserve the original Mercurial named branches and tags, which for me was crucial. In conclusion, psutil is a five-year-old, medium-sized project with hundreds of issues: the transition in this case is definitely possible but not painless, so if you plan on migrating, the sooner you do it the better.

  5. Announcing psutil 2.0

    psutil 2.0 is out. This is a major rewrite and reorganization of both the Python and C extension modules. It cost me four months of work and a diff of more than 22,000 lines against the old 1.2.1. Many of the changes are not backward compatible; I'm sure this will cause some pain, but I think it's for the better and needed to be done.

    API changes

    I already wrote a detailed blog post about this, so use that as the official reference on how to port your code.

    RST documentation

    I've never been happy with the old doc hosted on Google Code. The markup language provided by Google is pretty limited, plus it's not under revision control. The new doc is more detailed, uses reStructuredText as the markup language, lives in the same code repository as psutil, and is hosted on the excellent Read the Docs: http://psutil.readthedocs.org/

    Physical CPUs count

    You're now able to distinguish between logical and physical CPUs. The full story is in #427.

    >>> psutil.cpu_count()  # logical
    4
    >>> psutil.cpu_count(logical=False)  # physical cores only
    2
    

    Process instances are hashable

    psutil.Process instances can now be compared for equality and used in sets and dicts. The most useful application is diffing process snapshots:

    >>> before = set(psutil.process_iter())
    >>> # ... some time passes ...
    >>> after = set(psutil.process_iter())
    >>> new_procs = after - before  # processes spawned in between
    

    Equality is not just PID-based. It also includes the process creation time, so a Process whose PID got reused by the kernel won't be mistaken for the original. The full story is in #452.
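    The idea behind this kind of equality can be sketched with a stand-in class. ProcIdent and new_processes below are hypothetical names used only for illustration; this is not psutil's actual implementation, just the general technique of hashing and comparing on the (PID, creation time) pair.

```python
class ProcIdent:
    """Hypothetical stand-in for a process handle (not psutil.Process)."""

    def __init__(self, pid, create_time):
        self.pid = pid
        self.create_time = create_time

    def _key(self):
        # Identity is the PID *and* the creation time, so a reused PID
        # with a different start time is a different process.
        return (self.pid, self.create_time)

    def __eq__(self, other):
        if not isinstance(other, ProcIdent):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())

    def __repr__(self):
        return "ProcIdent(pid=%r, create_time=%r)" % self._key()


def new_processes(before, after):
    """Return the processes present in `after` but not in `before`."""
    return set(after) - set(before)


if __name__ == "__main__":
    before = [ProcIdent(10, 100.0), ProcIdent(11, 101.0)]
    # PID 11 was reused by a process that started later.
    after = [ProcIdent(10, 100.0), ProcIdent(11, 250.0)]
    print(new_processes(before, after))
```

    In the example the second snapshot reuses PID 11 with a later creation time, so it shows up as a new process instead of being confused with the old one.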

    Speedups

    • #477: Process.cpu_percent() is about 30% faster.
    • #478: (Linux) almost all APIs are about 30% faster on Python 3.X.

    Other improvements and bugfixes

    • #424: published Windows installers for Python 3.X 64-bit.
    • #447: the psutil.wait_procs() timeout parameter is now optional.
    • #459: a Makefile is now available for running tests and other repetitive tasks (also on Windows).
    • #463: the timeout parameter of cpu_percent* functions defaults to 0.0, because the previous default was a common source of slowdowns.
    • #340: (Windows) Process.open_files() no longer hangs.
    • #448: (Windows) fixed a memory leak affecting Process.children() and Process.ppid().
    • #461: namedtuples are now pickle-able.
    • #474: (Windows) Process.cpu_percent() is no longer capped at 100%.
