Blog posts for tags/python

Windows wheels available in psutil 2.1.2
21 Sep 2014 Tags: psutil, python, wheels, release, personal

psutil 2.1.2 is out. This release has been cooking for a while because I've been travelling for the past 3 months between Spain, Japan and Germany. Hopefully I will be staying in Berlin for a while now, so I will have more time to dedicate to the project. The main new "feature" of this release is that, in addition to the exe files, Windows users can now also benefit from Python wheels (full story is here), which are available on PyPI. Frankly, I don't know much about the new wheels packaging system, but the long story short is that Windows users can now install psutil via pip and therefore also list it as a dependency in requirements.txt. Other than this, 2.1.2 can basically be considered a bug-fix release, including some important fixes, amongst which:
- #506: restored Python 2.4 compatibility
- #340: Process.get_open_files() no longer hangs on Windows (this was a very old and high-priority issue)
- #501: fixed disk_io_counters() possibly returning negative values on Windows
- #504: (Linux) fixed building RPM packages via setup.py
The list of all fixes can be found here. For the next release I plan to drop support for Python 2.4 and 2.5 and hopefully add network interface information, similar to what ifconfig provides.

Python and sendfile

sendfile(2) is a UNIX system call which provides a "zero-copy" way of copying data from one file descriptor (a file) to another (a socket). Because the copying is done entirely within the kernel, sendfile(2) is more efficient than the combination of file.read() and socket.send(), which has to transfer the data from the kernel to user space and then back again. Copying the data twice imposes performance and resource penalties which sendfile(2) avoids. It also needs far fewer system calls: a single sendfile(2) call can transmit what would otherwise take a series of read(2) / write(2) calls, each requiring its own context switch. A more exhaustive explanation of how sendfile(2) works is available here, but the long story short is that sending a file with sendfile() is usually about twice as fast as using plain socket.send(). Typical applications which can benefit from sendfile() are FTP and HTTP servers.
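
To make the difference concrete, here is a minimal sketch of the two approaches, assuming Python 3.3+ on a UNIX system and a blocking socket. The helper names send_with_read_loop() and send_with_sendfile() are made up for this example; only os.sendfile(), file.read() and socket.sendall() are real APIs.

```python
import os


def send_with_read_loop(sock, file, blocksize=8192):
    """Classic approach: every chunk is copied into user space and back."""
    total = 0
    while True:
        data = file.read(blocksize)   # kernel -> user space copy
        if not data:
            break  # EOF
        sock.sendall(data)            # user space -> kernel copy
        total += len(data)
    return total


def send_with_sendfile(sock, file, blocksize=65536):
    """Zero-copy approach: the kernel moves the data between descriptors."""
    offset = 0
    while True:
        # os.sendfile(out_fd, in_fd, offset, count) returns the number
        # of bytes sent, or 0 once 'offset' has reached the end of file.
        sent = os.sendfile(sock.fileno(), file.fileno(), offset, blocksize)
        if sent == 0:
            break  # EOF
        offset += sent
    return offset
```

Both functions return the number of bytes transmitted. The second one never sees the file content from Python at all, which is where the savings come from.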

socket.sendfile()

I recently contributed a patch for Python's socket module which adds a high-level socket.sendfile() method (see full discussion at BPO-17552). socket.sendfile() transmits a file until EOF is reached by attempting to use os.sendfile(), if available; otherwise it falls back to using plain socket.send(). Internally, it takes care of handling socket timeouts and provides two optional parameters to move the file offset or to send only a limited number of bytes. I came up with this idea because getting all of that right is a bit tricky, so a generic wrapper seemed useful. socket.sendfile() will make its appearance in Python 3.5.
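
For illustration, this is roughly how the new method is meant to be used on Python 3.5+, including the two optional parameters. The helper name send_slice() and the use of socket.socketpair() to obtain a connected socket are choices made for this example only.

```python
import socket
import tempfile


def send_slice(sock, path, offset=0, count=None):
    """Send (part of) the file at 'path' over a connected stream socket."""
    with open(path, "rb") as f:  # the file must be opened in binary mode
        # Uses os.sendfile() when possible, else falls back on send().
        return sock.sendfile(f, offset=offset, count=count)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(suffix=".bin") as tmp:
        tmp.write(b"0123456789")
        tmp.flush()
        sender, receiver = socket.socketpair()
        with sender, receiver:
            # Skip the first 2 bytes, then send the next 5 only.
            sent = send_slice(sender, tmp.name, offset=2, count=5)
            print(sent, receiver.recv(1024))
```

The return value is the total number of bytes sent, so the caller can check it against the expected size.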

sendfile and Python

sendfile(2) made its first appearance in the Python stdlib rather late, in Python 3.3. It was contributed by Ross Lagerwall and me in BPO-10882. Since the patch didn't make it into Python 2.X and I wanted to use sendfile() in pyftpdlib (code.google.com/p/pyftpdlib/issues/detail?id=152), I later decided to release it as a standalone module working with older (2.5+) Python versions (see the pysendfile project). Starting with version 3.5, Python will hopefully start using sendfile() more extensively, specifically in:

- BPO-13563: ftplib
- BPO-13559: httplib
- asyncio: there are some plans for this, even though there is no actual patch yet; see the discussion and BDFL involvement.

Also, Windows provides something similar to sendfile(2): TransmitFile. Now that socket.sendfile() is in place it seems natural to add support for it as well (see BPO-21721).

Backport to Python 2.6 and 2.7

For those of you who are interested in using socket.sendfile() with older Python 2.6 and 2.7 versions, here's a backport. It requires the pysendfile module to be installed. Full code including tests is hosted here.

#!/usr/bin/env python

"""
This is a backport of socket.sendfile() for Python 2.6 and 2.7.
socket.sendfile() will be included in Python 3.5:
http://bugs.python.org/issue17552
Usage:

>>> import socket
>>> file = open("somefile.bin", "rb")
>>> sock = socket.create_connection(("localhost", 8021))
>>> sendfile(sock, file)
42319283
>>>
"""

import errno
import io
import os
import select
import socket
try:
    memoryview  # py 2.7 only
except NameError:
    memoryview = lambda x: x

if os.name == 'posix':
    import sendfile as pysendfile  # requires "pip install pysendfile"
else:
    pysendfile = None


_RETRY = frozenset((errno.EAGAIN, errno.EALREADY, errno.EWOULDBLOCK,
                    errno.EINPROGRESS))


class _GiveupOnSendfile(Exception):
    pass


if pysendfile is not None:

    def _sendfile_use_sendfile(sock, file, offset=0, count=None):
        _check_sendfile_params(sock, file, offset, count)
        sockno = sock.fileno()
        try:
            fileno = file.fileno()
        except (AttributeError, io.UnsupportedOperation) as err:
            raise _GiveupOnSendfile(err)  # not a regular file
        try:
            fsize = os.fstat(fileno).st_size
        except OSError as err:
            raise _GiveupOnSendfile(err)  # not a regular file
        if not fsize:
            return 0  # empty file
        blocksize = fsize if not count else count

        timeout = sock.gettimeout()
        if timeout == 0:
            raise ValueError("non-blocking sockets are not supported")
        # poll/select have the advantage of not requiring any
        # extra file descriptor, contrarily to epoll/kqueue
        # (also, they require a single syscall).
        if hasattr(select, 'poll'):
            if timeout is not None:
                timeout *= 1000
            pollster = select.poll()
            pollster.register(sockno, select.POLLOUT)

            def wait_for_fd():
                if pollster.poll(timeout) == []:
                    raise socket.timeout('timed out')
        else:
            # call select() once in order to solicit ValueError in
            # case we run out of fds
            try:
                select.select([], [sockno], [], 0)
            except ValueError as err:
                raise _GiveupOnSendfile(err)

            def wait_for_fd():
                fds = select.select([], [sockno], [], timeout)
                if fds == ([], [], []):
                    raise socket.timeout('timed out')

        total_sent = 0
        # localize variable access to minimize overhead
        os_sendfile = pysendfile.sendfile
        try:
            while True:
                if timeout:
                    wait_for_fd()
                if count:
                    blocksize = count - total_sent
                    if blocksize <= 0:
                        break
                try:
                    sent = os_sendfile(sockno, fileno, offset, blocksize)
                except OSError as err:
                    if err.errno in _RETRY:
                        # Block until the socket is ready to send some
                        # data; avoids hogging CPU resources.
                        wait_for_fd()
                    else:
                        if total_sent == 0:
                            # We can get here for different reasons, the main
                            # one being 'file' is not a regular mmap(2)-like
                            # file, in which case we'll fall back on using
                            # plain send().
                            raise _GiveupOnSendfile(err)
                        raise err
                else:
                    if sent == 0:
                        break  # EOF
                    offset += sent
                    total_sent += sent
            return total_sent
        finally:
            if total_sent > 0 and hasattr(file, 'seek'):
                file.seek(offset)
else:
    def _sendfile_use_sendfile(sock, file, offset=0, count=None):
        raise _GiveupOnSendfile(
            "sendfile() not available on this platform")


def _sendfile_use_send(sock, file, offset=0, count=None):
    _check_sendfile_params(sock, file, offset, count)
    if sock.gettimeout() == 0:
        raise ValueError("non-blocking sockets are not supported")
    if offset:
        file.seek(offset)
    blocksize = min(count, 8192) if count else 8192
    total_sent = 0
    # localize variable access to minimize overhead
    file_read = file.read
    sock_send = sock.send
    try:
        while True:
            if count:
                blocksize = min(count - total_sent, blocksize)
                if blocksize <= 0:
                    break
            data = memoryview(file_read(blocksize))
            if not data:
                break  # EOF
            while True:
                try:
                    sent = sock_send(data)
                except socket.error as err:  # not an OSError subclass on Python 2
                    if err.errno in _RETRY:
                        continue
                    raise
                else:
                    total_sent += sent
                    if sent < len(data):
                        data = data[sent:]
                    else:
                        break
        return total_sent
    finally:
        if total_sent > 0 and hasattr(file, 'seek'):
            file.seek(offset + total_sent)


def _check_sendfile_params(sock, file, offset, count):
    if 'b' not in getattr(file, 'mode', 'b'):
        raise ValueError("file should be opened in binary mode")
    if not sock.type & socket.SOCK_STREAM:
        raise ValueError("only SOCK_STREAM type sockets are supported")
    if count is not None:
        if not isinstance(count, int):
            raise TypeError(
                "count must be a positive integer (got %s)" % repr(count))
        if count <= 0:
            raise ValueError(
                "count must be a positive integer (got %s)" % repr(count))


def sendfile(sock, file, offset=0, count=None):
    """sendfile(sock, file[, offset[, count]]) -> sent

    Send a *file* over a connected socket *sock* until EOF is
    reached by using high-performance sendfile(2) and return the
    total number of bytes which were sent.
    *file* must be a regular file object opened in binary mode.
    If sendfile() is not available (e.g. Windows) or file is
    not a regular file socket.send() will be used instead.
    *offset* tells from where to start reading the file.
    If specified, *count* is the total number of bytes to transmit
    as opposed to sending the file until EOF is reached.
    File position is updated on return or also in case of error in
    which case file.tell() can be used to figure out the number of
    bytes which were sent.
    The socket must be of SOCK_STREAM type.
    Non-blocking sockets are not supported.
    """
    try:
        return _sendfile_use_sendfile(sock, file, offset, count)
    except _GiveupOnSendfile:
        return _sendfile_use_send(sock, file, offset, count)

Announcing psutil 2.0
10 Mar 2014 Tags: psutil, python, api-design, compatibility, release

psutil 2.0 is out. This is a major rewrite and reorganization of both the Python and C extension modules. It cost me four months of work and a diff of more than 22,000 lines against the old 1.2.1. Many of the changes are not backward compatible; I'm sure this will cause some pain, but I think it's for the better and needed to be done.

API changes

I already wrote a detailed blog post about this, so use that as the official reference on how to port your code.

RST documentation

I've never been happy with the old doc hosted on Google Code. The markup language provided by Google is pretty limited, plus it's not under revision control. The new doc is more detailed, uses reStructuredText as the markup language, lives in the same code repository as psutil, and is hosted on the excellent Read the Docs: http://psutil.readthedocs.org/

Physical CPUs count

You're now able to distinguish between logical and physical CPUs. The full story is in #427.

>>> psutil.cpu_count()  # logical
4
>>> psutil.cpu_count(logical=False)  # physical cores only
2

Process instances are hashable

psutil.Process instances can now be compared for equality and used in sets and dicts. The most useful application is diffing process snapshots:

>>> before = set(psutil.process_iter())
>>> # ... some time passes ...
>>> after = set(psutil.process_iter())
>>> new_procs = after - before  # processes spawned in between

Equality is not just PID-based. It also includes the process creation time, so a Process whose PID got reused by the kernel won't be mistaken for the original. The full story is in #452.
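
The idea can be sketched in a few lines. What follows is not psutil's actual code, just a stand-in class (ProcessId is a made-up name) showing why hashing on the (PID, creation time) pair keeps a recycled PID distinct from the original process.

```python
class ProcessId(object):
    """Hypothetical stand-in for a process handle; not psutil's class."""

    def __init__(self, pid, create_time):
        self.pid = pid
        self.create_time = create_time
        # The identity is the pair: a PID alone can be recycled.
        self._ident = (pid, create_time)

    def __eq__(self, other):
        if not isinstance(other, ProcessId):
            return NotImplemented
        return self._ident == other._ident

    def __ne__(self, other):  # needed on Python 2 only
        return not self == other

    def __hash__(self):
        return hash(self._ident)


before = {ProcessId(1, 100.0), ProcessId(42, 250.0)}
# PID 42 exited and the kernel handed the same PID to a new process.
after = {ProcessId(1, 100.0), ProcessId(42, 900.0)}
new_procs = after - before
print([(p.pid, p.create_time) for p in new_procs])  # [(42, 900.0)]
```

With a PID-only comparison the set difference above would be empty and the new process would go unnoticed.
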

Speedups

- #477: Process.cpu_percent() is about 30% faster.
- #478: (Linux) almost all APIs are about 30% faster on Python 3.X.

Other improvements and bugfixes

- #424: published Windows installers for Python 3.X 64-bit.
- #447: the psutil.wait_procs() timeout parameter is now optional.
- #459: a Makefile is now available for running tests and other repetitive tasks (also on Windows).
- #463: the timeout parameter of cpu_percent* functions defaults to 0.0, because the previous default was a common source of slowdowns.
- #340: (Windows) Process.open_files() no longer hangs.
- #448: (Windows) fixed a memory leak affecting Process.children() and Process.ppid().
- #461: namedtuples are now pickle-able.
- #474: (Windows) Process.cpu_percent() is no longer capped at 100%.

Reimplementing netstat in Python
10 Mar 2014 Tags: python, network

psutil 2.1.0 is out, and with it I finally managed to implement something I've been wanting to have for a long time: netstat-like functionality (see ticket code.google.com/p/psutil/issues/detail?id=387). Similar to "netstat -antp" on UNIX, you can now list system-wide connections in pure Python and also determine which process (PID) is using a particular connection:
```
>>> import psutil
>>> from pprint import pprint as pp
>>> pp(psutil.net_connections())
[sconn(fd=-1, family=2, type=1, laddr=('127.0.0.1', 587), raddr=(), status='LISTEN', pid=None),
 sconn(fd=-1, family=2, type=1, laddr=('127.0.0.1', 6379), raddr=(), status='LISTEN', pid=None),
 sconn(fd=-1, family=2, type=1, laddr=('127.0.1.1', 53), raddr=(), status='LISTEN', pid=None),
 sconn(fd=-1, family=2, type=1, laddr=('10.0.3.1', 53), raddr=(), status='LISTEN', pid=None),
 sconn(fd=-1, family=2, type=1, laddr=('127.0.0.1', 631), raddr=(), status='LISTEN', pid=None),
 sconn(fd=-1, family=2, type=1, laddr=('127.0.0.1', 25), raddr=(), status='LISTEN', pid=None),
 sconn(fd=-1, family=2, type=1, laddr=('0.0.0.0', 3389), raddr=(), status='LISTEN', pid=None),
 sconn(fd=17, family=2, type=1, laddr=('127.0.0.1', 34785), raddr=(), status='LISTEN', pid=3591),
 sconn(fd=15, family=2, type=1, laddr=('127.0.0.1', 56359), raddr=(), status='LISTEN', pid=3591),
 sconn(fd=-1, family=10, type=2, laddr=('::', 56720), raddr=(), status='NONE', pid=None)]
>>>
```
Another monitoring use case: say you want to make sure your HTTP server is running on port 80; you can do something like this:
```
import psutil

def check_listening_port(port):
    """Return True if the given TCP port is busy and in LISTEN mode."""
    for conn in psutil.net_connections(kind='tcp'):
        if conn.laddr[1] == port and conn.status == psutil.CONN_LISTEN:
            return True
    return False

print(check_listening_port(80))
```

Netstat in pure Python

Here it is, in 65 lines of code: netstat.py. Pretty neat, right? ;-)

Implementation(s)

As always, each platform required its own, different implementation. Luckily, for some platforms (OSX, Windows) I was able to reuse and customize some code from the existing Process.connections() implementation, which was already in place. For those of you who are interested in knowing how this was done, here are the source code references:
- Linux
- Windows
- FreeBSD
- Solaris
- OSX

Hopefully this will help whoever needs to do this in another language. The only platform where this is sort of clunky is OSX, which does not expose anything to list all system-wide sockets in a single shot, so you're forced to query each process. That means you'll need root privileges, otherwise you'll get an access denied error. For what it's worth, I took a look at lsof and it has the same limitation. netstat runs with SUID.

Well, I guess this is it. I'll leave you with some docs. For the next release I'm planning to work on a couple of other network-related features: "ifconfig" (code.google.com/p/psutil/issues/detail?id=376) and NIC speeds (code.google.com/p/psutil/issues/detail?id=250). But that's for another time...
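
For readers porting this to another language, one detail of a Linux approach may be worth sketching. My understanding is that on Linux the kernel exposes its socket tables as text files such as /proc/net/tcp, with addresses printed in hexadecimal; the post itself does not spell this out, so treat the following as an assumption-laden sketch rather than psutil's code. decode_ipv4() is a made-up name, and a little-endian host (x86, most ARM) is assumed.

```python
import socket
import struct


def decode_ipv4(addr):
    """Turn a /proc/net/tcp style 'HEXIP:HEXPORT' pair into (ip, port).

    Assumes a little-endian host, where the kernel prints the 32-bit
    address with its bytes in reverse order (127.0.0.1 -> '0100007F').
    """
    ip_hex, port_hex = addr.split(":")
    # Re-pack as little-endian to restore network byte order.
    packed = struct.pack("<I", int(ip_hex, 16))
    return socket.inet_ntoa(packed), int(port_hex, 16)


print(decode_ipv4("0100007F:0277"))  # ('127.0.0.1', 631)
```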

Porting your code to psutil 2.0

This blog post is going to be about psutil 2.0, a major release in which I decided to reorganize the existing API for the sake of consistency. At the time of writing, psutil 2.0 is still under development, and the intent of this blog post is to serve as an official reference that describes how you should port your existing code base. In doing so, I will also explain why I decided to make these changes. Even though many APIs will still be available as aliases pointing to the newer ones, the overall changes are numerous and many of them are not backward compatible. I'm sure many people will be sorely bitten, but I think this is for the better and it needed to be done, hopefully for the first and last time.

Module constants turned into functions

What changed

| Old name | Replacement |
|---|---|
| `psutil.BOOT_TIME` | `psutil.boot_time()` |
| `psutil.NUM_CPUS` | `psutil.cpu_count()` |
| `psutil.TOTAL_PHYMEM` | `psutil.virtual_memory().total` |

Why I did it

I already talked about this more extensively in the previous Making constants part of your API is evil blog post. In short: other than introducing unnecessary slowdowns, calculating a module-level constant at import time is dangerous because if something goes wrong the whole app will crash. Also, the represented values may be subject to change (think about the system clock), but the constant cannot be updated. Thanks to this hack, accessing the old constants still works and produces a DeprecationWarning.
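
The general idea of keeping an old constant name alive while computing its value lazily can be sketched as follows. This is a simplified illustration, not the hack linked above: DeprecatedConstants and the fake boot_time() are hypothetical names used only here.

```python
import time
import warnings


def boot_time():
    """Stand-in for a real system query; returns a fake timestamp."""
    return time.time() - 3600.0


class DeprecatedConstants(object):
    """Resolve old constant names lazily, warning on every access."""

    _replacements = {"BOOT_TIME": boot_time}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            func = self._replacements[name]
        except KeyError:
            raise AttributeError(name)
        warnings.warn(
            "%s is deprecated; use %s() instead" % (name, func.__name__),
            DeprecationWarning, stacklevel=2)
        return func()  # computed at access time, not at import time


compat = DeprecatedConstants()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = compat.BOOT_TIME
print(caught[0].category.__name__, caught[0].message)
```

Because the value is computed on access, a failure in the underlying query surfaces where the name is used instead of breaking the import.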

Renamed module functions

What changed

| Old name | Replacement |
|---|---|
| `psutil.get_boot_time()` | `psutil.boot_time()` |
| `psutil.get_pid_list()` | `psutil.pids()` |
| `psutil.get_users()` | `psutil.users()` |

Why I did it

They were the only module-level functions with a get_ prefix. None of the others had one.

Renamed Process class methods

All methods lost their get_ and set_ prefixes. A single method can now be used for both getting and setting (if a value is passed). Assuming p = psutil.Process():

| Old name | Replacement |
|---|---|
| `p.get_children()` | `p.children()` |
| `p.get_connections()` | `p.connections()` |
| `p.get_cpu_affinity()` | `p.cpu_affinity()` |
| `p.get_cpu_percent()` | `p.cpu_percent()` |
| `p.get_cpu_times()` | `p.cpu_times()` |
| `p.get_io_counters()` | `p.io_counters()` |
| `p.get_ionice()` | `p.ionice()` |
| `p.get_memory_info()` | `p.memory_info()` |
| `p.get_ext_memory_info()` | `p.memory_info_ex()` |
| `p.get_memory_maps()` | `p.memory_maps()` |
| `p.get_memory_percent()` | `p.memory_percent()` |
| `p.get_nice()` | `p.nice()` |
| `p.get_num_ctx_switches()` | `p.num_ctx_switches()` |
| `p.get_num_fds()` | `p.num_fds()` |
| `p.get_num_threads()` | `p.num_threads()` |
| `p.get_open_files()` | `p.open_files()` |
| `p.get_rlimit()` | `p.rlimit()` |
| `p.get_threads()` | `p.threads()` |
| `p.getcwd()` | `p.cwd()` |

...as for set_* methods:

| Old name | Replacement |
|---|---|
| `p.set_cpu_affinity()` | `p.cpu_affinity(cpus)` |
| `p.set_ionice()` | `p.ionice(ioclass, value=None)` |
| `p.set_nice()` | `p.nice(value)` |
| `p.set_rlimit()` | `p.rlimit(resource, limits=None)` |

Why I did it

I wanted to be consistent with system-wide module-level functions, which have no get_ prefix. After I got rid of the get_ prefixes, removing set_ too seemed natural and helped reduce the number of methods.
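
The "one method for both getting and setting" pattern boils down to an optional argument. Here is a minimal sketch with a made-up FakeProcess class that stores the value in memory instead of talking to the OS:

```python
class FakeProcess(object):
    """Hypothetical stand-in; a real implementation would query the OS."""

    def __init__(self):
        self._nice = 0

    def nice(self, value=None):
        """Get the niceness, or set it if 'value' is passed."""
        if value is None:
            return self._nice
        if not isinstance(value, int):
            raise TypeError("value must be an integer")
        self._nice = value


p = FakeProcess()
print(p.nice())  # 0
p.nice(10)
print(p.nice())  # 10
```

Using None as the "no value passed" marker works here because None is never a legitimate niceness.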

Process properties are now methods

What changed

Assuming p = psutil.Process():

| Old name | Replacement |
|---|---|
| `p.cmdline` | `p.cmdline()` |
| `p.create_time` | `p.create_time()` |
| `p.exe` | `p.exe()` |
| `p.gids` | `p.gids()` |
| `p.name` | `p.name()` |
| `p.parent` | `p.parent()` |
| `p.ppid` | `p.ppid()` |
| `p.status` | `p.status()` |
| `p.uids` | `p.uids()` |
| `p.username` | `p.username()` |

Why I did it

Different reasons:

- Having a mixed API that uses both properties and methods for no particular reason is confusing and messy, because you don't know whether to use () or not.
- A property is usually expected not to perform heavy computations internally, whereas psutil invokes a function every time it is accessed. This has two drawbacks:
  - You may get an exception just by accessing the property (e.g. p.name may raise NoSuchProcess or AccessDenied).
  - You may erroneously think properties are cached, but this is true only for name, exe, and create_time.

CPU percent intervals

What changed

The timeout parameter of cpu_percent* functions (named interval in the function signatures) now defaults to 0.0 instead of 0.1. The functions affected are:

- Process.cpu_percent()
- psutil.cpu_percent()
- psutil.cpu_times_percent()

Why I did it

I originally set 0.1 as the default timeout because you need to wait some time in order to get a meaningful percent value. Having an API that "sleeps" by default is risky, though, because it's easy to forget it does so. That is particularly problematic when calling Process.cpu_percent() for all processes: it's very easy to forget to specify interval=0, resulting in dramatic slowdowns that are hard to spot. For example, with the old default this code snippet sleeps 0.1 seconds per process, so it might take several seconds to complete depending on the number of active processes:

>>> # this will be slow
>>> for p in psutil.process_iter():
...    print(p.cpu_percent())

Migration strategy

Except for Process properties (name, exe, cmdline, etc.), all the old APIs are still available as aliases pointing to the newer names and raising DeprecationWarning. psutil will be very clear on what you should use instead of the deprecated API, as long as you start the interpreter with the -Wd option. This will enable deprecation warnings, which were silenced in Python 2.7 (IMHO, from a developer standpoint this was a bad decision).

giampaolo@ubuntu:/tmp$ python -Wd
Python 2.7.3 (default, Sep 26 2013, 20:03:06)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import psutil
>>> psutil.get_pid_list()
__main__:1: DeprecationWarning: psutil.get_pid_list is deprecated; use psutil.pids() instead
[1, 2, 3, 6, 7, 13, ...]
>>>
>>>
>>> p = psutil.Process()
>>> p.get_cpu_times()
__main__:1: DeprecationWarning: get_cpu_times() is deprecated; use cpu_times() instead
pcputimes(user=0.08, system=0.03)
>>>
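
A sketch of how such aliases can be built, and of how a test run can escalate the warnings into hard failures so that no deprecated call goes unnoticed. The deprecated_alias() helper and the Process stand-in below are illustrative names, not psutil's internals.

```python
import warnings


def deprecated_alias(old_name, new_name):
    """Return a method that warns, then delegates to 'new_name'."""
    def alias(self, *args, **kwargs):
        warnings.warn(
            "%s() is deprecated; use %s() instead" % (old_name, new_name),
            DeprecationWarning, stacklevel=2)
        return getattr(self, new_name)(*args, **kwargs)
    alias.__name__ = old_name
    return alias


class Process(object):
    """Hypothetical stand-in returning canned values."""

    def cpu_times(self):
        return (0.08, 0.03)

    get_cpu_times = deprecated_alias("get_cpu_times", "cpu_times")


# In a test run, escalate the warning to an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        Process().get_cpu_times()
    except DeprecationWarning as exc:
        print("deprecated call found: %s" % exc)
```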

If you have a solid test suite, you can run tests and fix the warnings one by one. As for the Process properties that were turned into methods, it's more difficult because, whereas psutil 1.2.1 returns the actual value, psutil 2.0.0 returns the bound method:

# psutil 1.2.1
>>> psutil.Process().name
'python'
>>>

# psutil 2.0.0
>>> psutil.Process().name
<bound method Process.name of psutil.Process(pid=19816, name='python') at 139845631328144>
>>>

What I would recommend, if you want to drop support for 1.2.1, is to grep for ".name", ".exe", etc. and just replace them with ".exe()" and ".name()" one by one. If, on the other hand, you want to write code that works with both versions, I see two possibilities:

#1 check version info, like this:

>>> PSUTIL2 = psutil.version_info >= (2, 0)
>>> p = psutil.Process()
>>> name = p.name() if PSUTIL2 else p.name
>>> exe = p.exe() if PSUTIL2 else p.exe

#2 get rid of all ".name", ".exe" occurrences you have in your code and use Process.as_dict() instead:

>>> p = psutil.Process()
>>> pinfo = p.as_dict(attrs=["name", "exe"])
>>> pinfo
{'exe': '/usr/bin/python2.7', 'name': 'python'}
>>> name = pinfo['name']
>>> exe = pinfo['exe']
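
A third variation on the same theme, offered only as a sketch: a small helper that calls the attribute if it turns out to be callable. pget() and the two stand-in classes are hypothetical names used for illustration.

```python
def pget(proc, attr):
    """Return proc.<attr>, calling it first if it is a method (2.0 style)."""
    value = getattr(proc, attr)
    return value() if callable(value) else value


class OldStyleProcess(object):   # mimics psutil 1.2.1: plain attribute
    name = "python"


class NewStyleProcess(object):   # mimics psutil 2.0: method
    def name(self):
        return "python"


print(pget(OldStyleProcess(), "name"), pget(NewStyleProcess(), "name"))
```

This only works as long as the attribute values themselves are never callable, which holds for strings, numbers and tuples.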

New features introduced in 2.0.0

psutil 2.0.0 is not only about code breakage. I also had the chance to integrate a bunch of interesting features.

#427: you're now able to distinguish between the number of logical and physical CPUs:

>>> psutil.cpu_count()  # logical
4
>>> psutil.cpu_count(logical=False)  # physical cores only
2

#452: Process instances are now hashable and can be checked for equality. That means you can use Process objects with sets (finally!).
#447: the timeout parameter of psutil.wait_procs() is now optional.
#461: functions returning namedtuples are now picklable.
#459: a Makefile is now available to automate repetitive tasks such as build, install, running tests, etc. There's also a make.bat for Windows.
Introduced the unittest2 module as a requirement for running tests.

Giampaolo Rodola Python enthusiast, core developer, psutil author
