Newsletter

Enter your email address to receive blog updates:

  1. Reimplementing netstat in Python

    psutil 2.1.0 is out and with it I finally managed to implement something I've been wanting to have for a long time: netstat-like functionalities (see ticket). Similarly to "netstat -antp" on UNIX you can now list system-wide connections in pure python and also determine what process (PID) is using a particular connection:

    >>> import psutil
    >>> from pprint import pprint as pp
    >>> pp(psutil.net_connections())
    [sconn(fd=-1, family=2, type=1, laddr=('127.0.0.1', 587), raddr=(), status='LISTEN', pid=None),
     sconn(fd=-1, family=2, type=1, laddr=('127.0.0.1', 6379), raddr=(), status='LISTEN', pid=None),
     sconn(fd=-1, family=2, type=1, laddr=('127.0.1.1', 53), raddr=(), status='LISTEN', pid=None),
     sconn(fd=-1, family=2, type=1, laddr=('10.0.3.1', 53), raddr=(), status='LISTEN', pid=None),
     sconn(fd=-1, family=2, type=1, laddr=('127.0.0.1', 631), raddr=(), status='LISTEN', pid=None),
     sconn(fd=-1, family=2, type=1, laddr=('127.0.0.1', 25), raddr=(), status='LISTEN', pid=None),
     sconn(fd=-1, family=2, type=1, laddr=('0.0.0.0', 3389), raddr=(), status='LISTEN', pid=None),
     sconn(fd=17, family=2, type=1, laddr=('127.0.0.1', 34785), raddr=(), status='LISTEN', pid=3591),
     sconn(fd=15, family=2, type=1, laddr=('127.0.0.1', 56359), raddr=(), status='LISTEN', pid=3591),
     sconn(fd=-1, family=10, type=2, laddr=('::', 56720), raddr=(), status='NONE', pid=None)]
    >>>
    

    This is yet another functionality which can be used for monitoring purposes. For example, say you want to make sure your HTTP server is running on port 80, you can do something like this:

    import psutil
    
    def check_listening_port(port):
        """Return True if the given TCP port is busy and in LISTEN mode."""
        for conn in psutil.net_connections(kind='tcp'):
            if conn.laddr[1] == port and conn.status == psutil.CONN_LISTEN:
                return True
        return False
    
    print(check_listening_port(80))
    

    Netstat in pure python

    Here it is, in 65 lines of code: netstat.py. Pretty neat right? ;-)

    Implementation(s)

    As always, each platform required its own, different, implementation. Luckily for some platforms (OSX, Windows) I was able to reuse and customize some code from the existing Process.connections() implementation which was already in place. For those of you who are interested in knowing how this was done here's the source code references:

    Hopefully this will help whoever needs to do this into another language. The only platform where this is sort of clunky is OSX, which does not expose anything to list all system-wide sockets in a single shot, so you're forced to query each process. That means you'll need root privileges otherwise you'll get an access denied error. For what it's worth, I took a look at lsof and it has the same limitation and netstat runs with SUID. Well, I guess this is it. I'll leave you with some docs. For the next one I'm planning on working on a couple of other network-related functionalities: "ifconfig" and NIC speeds. But that's for another time...

  2. psutil 2.0 API redesign

    This my second blog post is going to be about psutil 2.0, a major release in which I decided to reorganize the existing API for the sake of consistency. At the time of writing psutil 2.0 is still under development and the intent of this blog post is to serve as an official reference which describes how you should port your existent code base. In doing so I will also explain why I decided to make these changes. Despite many APIs will still be available as aliases pointing to the newer ones, the overall changes are numerous and many of them are not backward compatible. I'm sure many people will be sorely bitten but I think this is for the better and it needed to be done, hopefully for the first and last time. OK, here we go now.

    Module constants turned into functions

    What changed

    Old name Replacement
    psutil.BOOT_TIME psutil.boot_time()
    psutil.NUM_CPUS psutil.cpu_count()
    psutil.TOTAL_PHYMEM psutil.virtual_memory().total

    Why I did it

    I already talked about this more extensively in this blog post. In short: other than introducing unnecessary slowdowns, calculating a module level constant at import time is dangerous because in case of error the whole app will crash. Also, the represented values may be subject to change (think about the system clock) but the constant cannot be updated. Thanks to this hack accessing the old constants still works and produces a DeprecationWarning.

    Module functions renamings

    What changed

    Old name Replacement
    psutil.get_boot_time() psutil.boot_time()
    psutil.get_pid_list() psutil.pids()
    psutil.get_users() psutil.users()

    Why I did it

    They were the only module level functions having a get_ prefix. All others do not.

    Process class' methods renamings

    All methods lost their get_ and set_ prefixes. A single method can now be used for both getting and setting (if a value is passed). Assuming p = psutil.Process():

    Old name Replacement
    p.get_children() p.children()
    p.get_connections() p.connections()
    p.get_cpu_affinity() p.cpu_affinity()
    p.get_cpu_percent() p.cpu_percent()
    p.get_cpu_times() p.cpu_times()
    p.get_io_counters() p.io_counters()
    p.get_ionice() p.ionice()
    p.get_memory_info() p.memory_info()
    p.get_ext_memory_info() p.memory_info_ex()
    p.get_memory_maps() p.memory_maps()
    p.get_memory_percent() p.memory_percent()
    p.get_nice() p.nice()
    p.get_num_ctx_switches() p.num_ctx_switches()
    p.get_num_fds() p.num_fds()
    p.get_num_threads() p.num_threads()
    p.get_open_files() p.open_files()
    p.get_rlimit() p.rlimit()
    p.get_threads() p.threads()
    p.getcwd() p.cwd()

    ...as for set_* methods:

    Old name Replacement
    p.set_cpu_affinity() p.cpu_affinity(cpus)
    p.set_ionice() p.ionice(ioclass, value=None)
    p.set_nice() p.nice(value)
    p.set_rlimit() p.rlimit(resource, limits=None)

    Why I did it

    I wanted to be consistent with system-wide module level functions which have no get_ prefix. After I got rid of get_ prefixes removing also set_ seemed natural and helped diminish the number of methods.

    Process properties are now methods

    What changed

    Assuming p = psutil.Process():

    Old name Replacement
    p.cmdline p.cmdline()
    p.create_time p.create_time()
    p.exe p.exe()
    p.gids p.gids()
    p.name p.name()
    p.parent p.parent()
    p.ppid p.ppid()
    p.status p.status()
    p.uids p.uids()
    p.username p.username()

    Why I did it

    Different reasons:

    • Having a mixed API which uses both properties and methods for no particular reason is confusing and messy as you don't know whether to use "()" or not (see here).
    • It is usually expected from a property to not perform many computations internally whereas psutil actually invokes a function every time it is accessed. This has two drawbacks: * you may get an exception just by accessing the property (e.g. "p.name" may raise NoSuchProcess or AccessDenied) * you may erroneously think properties are cached but this is true only for name, exe, and create_time.

    CPU percent intervals

    What changed

    The timeout parameter of cpu_percent* functions now defaults to 0.0 instead of 0.1. The functions affected are:

    • psutil.Process.cpu_percent()
    • psutil.cpu_percent()
    • psutil.cpu_times_percent()

    Why I did it

    I originally set 0.1 as the default timeout because in order to get a meaningful percent value you need to wait some time. Having an API which "sleeps" by default is risky though, because it's easy to forget it does so. That is particularly problematic when calling cpu_percent() for all processes: it's very easy to forget about specifying timeout=0 resulting in dramatic slowdowns which are hard to spot. For example, this code snippet might take different seconds to complete depending on the number of active processes:

    >>> # this will be slow
    >>> for p in psutil.process_iter():
    ...    print(p.cpu_percent())
    

    Migration strategy

    Except for Process properties (name, exe, cmdline, etc.) all the old APIs are still available as aliases pointing to the newer names and raising DeprecationWarning. psutil will be very clear on what you should use instead of the deprecated API as long you start the interpreter with the "-Wd" option. This will enable deprecation warnings which were silenced in Python 2.7 (IMHO, from a developer standpoint this was a bad decision).

    giampaolo@ubuntu:/tmp$ python -Wd
    Python 2.7.3 (default, Sep 26 2013, 20:03:06)
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import psutil
    >>> psutil.get_pid_list()
    __main__:1: DeprecationWarning: psutil.get_pid_list is deprecated; use psutil.pids() instead
    [1, 2, 3, 6, 7, 13, ...]
    >>>
    >>>
    >>> p = psutil.Process()
    >>> p.get_cpu_times()
    __main__:1: DeprecationWarning: get_cpu_times() is deprecated; use cpu_times() instead
    pcputimes(user=0.08, system=0.03)
    >>>
    

    If you have a solid test suite you can run tests and fix the warnings one by one. As for the the Process properties which were turned into methods it's more difficult because whereas psutil 1.2.1 returns the actual value, psutil 2.0.0 will return the bound method:

    # psutil 1.2.1
    >>> psutil.Process().name
    'python'
    >>>
    
    # psutil 2.0.0
    >>> psutil.Process().name
    <bound method Process.name of psutil.Process(pid=19816, name='python') at 139845631328144>
    >>>
    

    What I would recommend if you want to drop support with 1.2.1 is to grep for ".name", ".exe" etc. and just replace them with ".exe()" and ".name()" one by one. If on the other hand you want to write a code which works with both versions I see two possibilities:

    • #1 check version info, like this:
    >>> PSUTIL2 = psutil.version_info >= (2, 0)
    >>> p = psutil.Process()
    >>> name = p.name() if PSUTIL2 else p.name
    >>> exe = p.exe() if PSUTIL2 else p.exe
    
    • #2 get rid of all ".name", ".exe" occurrences you have in your code and use as_dict() instead:
    >>> p = psutil.Process()
    >>> pinfo = p.as_dict(attrs=["name", "exe"])
    >>> pinfo
    {'exe': '/usr/bin/python2.7', 'name': 'python'}
    >>> name = pinfo['name']
    >>> exe = pinfo['exe']
    

    New features introduced in 2.0.0

    Ok, enough with the bad news. =) psutil 2.0.0 is not only about code breakage. I also had the chance to integrate a bunch of interesting features.

    • #427: you're now able to distinguish between the number of logical and physical CPUs:
    >>> psutil.cpu_count()  # logical
    4
    >>> psutil.cpu_count(logical=False)  # physical cores only
    2
    
    • #452: process classes are now hashable and can be checked for equality. That means you can use Process objects with sets (finally!).
    • #447: psutil.wait_procs() "timeout" parameter is now optional
    • #461: functions returning namedtuples are now pickle-able
    • #459: a Makefile is now available to automatize repetitive tasks such as build, install, running tests etc. There's also a make.bat for Windows.
    • introduced unittest2 module as a requirement for running tests
  3. Making constants part of your API is evil

    One of the initial features which were included in psutil since day one (5 years ago) were system's boot time, number of CPUs and total physical memory. These metrics have one thing in common: they are (apparently) not supposed to change over time. That is why we (me and Jay) decided that exposing them as module constants calculated at import time was the way to go.

    >>> import psutil
    >>> psutil.NUM_CPUS
    2
    >>> psutil.BOOT_TIME  # as seconds since the epoch
    1387579049.799092
    >>> psutil.TOTAL_PHYMEM
    8374120448
    

    5 years later I regret that decision and I'm going to explain you why you don't want to do the same mistake.

    A constant should not change

    When we think of 'constants', our expectations are that they should not change over time. It may be obvious, but before thinking about introducing a constant be absolutely sure the value it represents is going to remain the same. Now, back then we thought these 3 metrics were not supposed to change, at least until the system is rebooted. Well, we were wrong: it turns out 2 of them actually can. Apparently virtualized systems can change physical installed memory at runtime (see here and here) and system boot time can easily be altered every time you update the system clock. In both of these cases, of course, the constants will not reflect the updated values.

    Doing things at import time is dangerous

    That's because import time usually means startup time and if something goes wrong the whole application will crash. In general the only reason for a module to crash at import time is because of a missing dependancy or because it's not supposed to run on that platform in the first place. Now, here's a couple of bug reports which were collected over time: issue 188, issue 313. The inconvenience was so severe that I had to release different fixed versions ASAP and the fix consisted of a stinky workaround. That's when I started thinking about getting rid of those constants once and for all and introduce functions instead. But how to do that without breaking everybody's code?

    Backward compatibility matters

    Now here's the crucial part: every time you deliver a library to someone else you just cannot remove an API all of the sudden, especially if they are 3 and have been around since day one. It should first be deprecated, possibly turned into an alias pointing to a newer API and finally be removed after 1 or 2 major releases. Also, you want the deprecated API to explicitly raise a DeprecationWarning informing the user he's relying on something which will eventually be removed. With a module constant you cannot do any of that. What you would need is a module property.

    Module properties

    One of the greatest things about Python is that it's so dynamic that it lets you do all sort of nasty things with objects, including injecting names into modules (which are also objects) and make them behave like actual class properties! For this I have to thank Josiah Carlson who came up with this very simple yet effective solution:

    class ModuleWrapper(object):
    
        def __repr__(self):
            return repr(self._module)
        __str__ = __repr__
    
        @property
        def NUM_CPUS(self):
            msg = "NUM_CPUS constant is deprecated; use cpu_count() instead"
            warnings.warn(msg, category=DeprecationWarning, stacklevel=2)
            return cpu_count()
    
        @property
        def BOOT_TIME(self):
            msg = "BOOT_TIME constant is deprecated; " \
                  "use get_boot_time() instead"
            warnings.warn(msg, category=DeprecationWarning, stacklevel=2)
            return get_boot_time()
    
        @property
        def TOTAL_PHYMEM(self):
            msg = "TOTAL_PHYMEM constant is deprecated; " \
                  "use virtual_memory().total instead"
            warnings.warn(msg, category=DeprecationWarning, stacklevel=2)
            return virtual_memory().total
    
    mod = ModuleWrapper()
    mod.__dict__ = globals()
    mod._module = sys.modules[__name__]
    sys.modules[__name__] = mod
    

    You can put this at the bottom of your module and you'll have fully working module constants (tested on Python from 2.4 to 3.4)!

    EDIT: the only reason I applied this hack is to turn the old constants into aliases pointing to the newly introduced functions and produce a deprecation warning. That aside I can't think of a case where the use of a module property would be justified.

Social

Feeds