Blog posts for tags/memory

  1. Detecting memory leaks in C extensions with psutil and psleak

    Memory leaks in Python are usually straightforward to diagnose. Just look at RSS, track Python object counts, follow reference graphs, etc. But leaks inside C extension modules are another story. Traditional memory metrics such as RSS and VMS fail to reveal them because Python's memory allocator (pymalloc) sits above the platform's native heap. If something in an extension calls malloc() without a corresponding free(), that memory often won't show up in RSS / VMS. You have a leak, and you don't know.

    psutil 7.2.0 introduces two new APIs for C heap introspection, designed specifically to catch these kinds of native leaks. They give you a window directly into the underlying platform allocator (e.g. glibc's malloc), letting you track how much memory the C layer actually allocates. If your RSS is flat but your C heap usage climbs, you now have a way to see it.

    Why native heap introspection matters

    Many Python projects rely on C extensions: psutil, NumPy, pandas, PIL, lxml, psycopg, PyTorch, custom in-house modules, etc. Even CPython itself implements many of its standard library modules in C. If any of these components mishandle memory at the C level, you get a leak that doesn't show up in:

    • Python reference counts (sys.getrefcount).
    • tracemalloc module.
    • Python's gc stats.
    • RSS, VMS or USS due to allocator caching, especially for small objects. This can happen, for example, when you forget to Py_DECREF a Python object.

    psutil's new functions let you query the allocator (e.g. glibc) directly, returning low-level metrics from the platform's native heap.

    heap_info(): direct allocator statistics

    psutil.heap_info() exposes the following metrics:

    • heap_used: total number of bytes currently allocated via malloc() (small allocations).
    • mmap_used: total number of bytes currently allocated via mmap() or via large malloc() allocations.
    • heap_count: (Windows only) number of private heaps created via HeapCreate().

    Example:

    >>> import psutil
    >>> psutil.heap_info()
    pheap(heap_used=5177792, mmap_used=819200)
    

    Reference for what contributes to each field:

    Platform        Allocation type                                                           Field affected
    UNIX / Windows  small malloc() ≤128 KB without free()                                     heap_used
    UNIX / Windows  large malloc() >128 KB without free(), or mmap() without munmap() (UNIX)  mmap_used
    Windows         HeapAlloc() without HeapFree()                                            heap_used
    Windows         VirtualAlloc() without VirtualFree()                                      mmap_used
    Windows         HeapCreate() without HeapDestroy()                                        heap_count

    heap_trim(): returning unused heap memory

    psutil.heap_trim() provides a cross-platform way to request that the underlying allocator free any unused memory it's holding in the heap (typically small malloc() allocations).

    In practice, modern allocators rarely comply, so this is not a general-purpose memory-reduction tool and won't meaningfully shrink RSS in real programs. Its primary value is in leak detection tools. Calling psutil.heap_trim() before taking measurements helps reduce allocator noise, giving you a cleaner baseline so that changes in heap_used come from the code you're testing, not from internal allocator caching or fragmentation.

    Real-world use: finding a C extension leak

    The workflow is simple:

    1. Take a baseline snapshot of the heap.
    2. Call the C extension hundreds of times.
    3. Take another snapshot.
    4. Compare.

    import psutil
    
    psutil.heap_trim()  # reduce noise
    
    before = psutil.heap_info()
    for _ in range(200):
        my_cext_function()
    after = psutil.heap_info()
    
    print("delta heap_used =", after.heap_used - before.heap_used)
    print("delta mmap_used =", after.mmap_used - before.mmap_used)
    

    If heap_used or mmap_used values increase consistently, you've found a native leak.

    To reduce false positives, repeat the test multiple times, increasing the number of calls on each retry. This approach helps distinguish real leaks from random noise or transient allocations.
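
    The retry strategy described above can be sketched roughly as follows. This is only an illustration: find_leak and its parameters are hypothetical names, and the "grew on every run" rule is a simplifying assumption. With psutil 7.2.0 or later, measure could be lambda: psutil.heap_info().heap_used and trim could be psutil.heap_trim; here a fake counter stands in so the example runs anywhere.

```python
def find_leak(fun, measure, trim=lambda: None, calls=200, step=100, retries=10):
    """Return [(calls, delta_bytes), ...] if memory grew on every run, else None.

    fun     -- the callable under test
    measure -- returns the current memory usage in bytes
    trim    -- asks the allocator to release cached memory (noise reduction)
    """
    deltas = []
    for _run in range(retries):
        trim()
        before = measure()
        for _call in range(calls):
            fun()
        delta = measure() - before
        if delta <= 0:
            return None  # memory stayed flat at least once: not a steady leak
        deltas.append((calls, delta))
        calls += step  # more calls on each retry, as suggested above
    return deltas


# Fake "heap" so the sketch is runnable without a real C extension.
leaked = []


def leaky():
    leaked.append(bytearray(16))  # simulates 16 bytes that are never released


def fake_measure():
    return len(leaked) * 16


leak_report = find_leak(leaky, fake_measure, calls=10, step=5, retries=3)
clean_report = find_leak(lambda: None, fake_measure, calls=10, retries=3)
```

    A growing delta that scales with the number of calls is what separates a real leak from a one-off allocation, which is why each retry uses more calls than the previous one.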

    A new tool: psleak

    The strategy described above is exactly what I implemented in a new PyPI package, which I called psleak. It runs the target function repeatedly, trims the allocator before each run, and tracks differences across retries. Memory that grows consistently after several runs is flagged as a leak.

    A minimal test suite looks like this:

    from psleak import MemoryLeakTestCase
    
    class TestLeaks(MemoryLeakTestCase):
        def test_fun(self):
            self.execute(some_c_function)
    

    If the function leaks memory, the test will fail with a descriptive exception:

    psleak.MemoryLeakError: memory kept increasing after 10 runs
    Run # 1: heap=+388160  | uss=+356352  | rss=+327680  | (calls= 200, avg/call=+1940)
    Run # 2: heap=+584848  | uss=+614400  | rss=+491520  | (calls= 300, avg/call=+1949)
    Run # 3: heap=+778320  | uss=+782336  | rss=+819200  | (calls= 400, avg/call=+1945)
    Run # 4: heap=+970512  | uss=+1032192 | rss=+1146880 | (calls= 500, avg/call=+1941)
    Run # 5: heap=+1169024 | uss=+1171456 | rss=+1146880 | (calls= 600, avg/call=+1948)
    Run # 6: heap=+1357360 | uss=+1413120 | rss=+1310720 | (calls= 700, avg/call=+1939)
    Run # 7: heap=+1552336 | uss=+1634304 | rss=+1638400 | (calls= 800, avg/call=+1940)
    Run # 8: heap=+1752032 | uss=+1781760 | rss=+1802240 | (calls= 900, avg/call=+1946)
    Run # 9: heap=+1945056 | uss=+2031616 | rss=+2129920 | (calls=1000, avg/call=+1945)
    Run #10: heap=+2140624 | uss=+2179072 | rss=+2293760 | (calls=1100, avg/call=+1946)
    

    psleak is now part of the psutil test suite. All psutil APIs are tested (see test_memleaks.py), making it a de facto regression-testing tool.

    It's worth noting that without inspecting heap metrics, missing calls in the C code such as Py_CLEAR and Py_DECREF often go unnoticed, because they don't affect RSS, VMS, and USS. I confirmed this by commenting them out. Monitoring the heap is therefore essential to reliably detect memory leaks in Python C extensions.

    Under the hood

    For those interested in seeing how I did this in terms of code:

    • Linux: uses glibc's mallinfo2() to report uordblks (heap allocations) and hblkhd (mmap-backed blocks).
    • Windows: enumerates heaps and aggregates HeapAlloc / VirtualAlloc usage.
    • macOS: uses malloc zone statistics.
    • BSD: uses jemalloc's arena and stats interfaces.
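
    To get a feel for the Linux case, here is a rough ctypes sketch that reads the same two mallinfo2() fields from Python. It is not psutil's code (psutil does this in C); Mallinfo2 and glibc_heap_stats are illustrative names, and the function simply returns None where mallinfo2() is missing (non-glibc systems, or glibc older than 2.33).

```python
import ctypes


class Mallinfo2(ctypes.Structure):
    # Field order follows glibc's struct mallinfo2; every field is a size_t.
    _fields_ = [
        (name, ctypes.c_size_t)
        for name in (
            "arena", "ordblks", "smblks", "hblks", "hblkhd",
            "usmblks", "fsmblks", "uordblks", "fordblks", "keepcost",
        )
    ]


def glibc_heap_stats():
    """Return (uordblks, hblkhd) in bytes, or None if mallinfo2() is unavailable."""
    try:
        libc = ctypes.CDLL(None)    # symbols already loaded in this process
        mallinfo2 = libc.mallinfo2  # only present on glibc >= 2.33
    except (OSError, TypeError, AttributeError):
        return None
    mallinfo2.restype = Mallinfo2
    mallinfo2.argtypes = []
    info = mallinfo2()
    return info.uordblks, info.hblkhd


stats = glibc_heap_stats()
```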

    References

    • psleak, the new memory leak testing framework.
    • PR-2692, the implementation.
    • #1275, the original proposal from 8 years earlier.

  2. Improved Linux memory metrics

    OK, another psutil release. The headline of 4.4.0 is more accurate memory metrics on Linux, plus a pile of macOS fixes I'd been sitting on for years.

    Linux virtual memory

    People had been complaining for a while that virtual_memory() didn't match what free reported on Linux (#862, #685, #538). I finally dug into it (#887, PR-890) and, funny enough, it turns out free itself was doing it wrong until about two years ago, when somebody got tired of everyone guessing and moved the calculation into the kernel.

    Starting with Linux 3.14, /proc/meminfo has a MemAvailable field. That's what psutil now uses, so the available / used fields match free exactly. On older kernels (< 3.14) psutil falls back to the same formula that kernel commit introduced.

    free's source code also inspired a fix that prevents available memory from overflowing total memory on LXC containers.
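
    As a rough illustration of the idea (not psutil's actual code), the sketch below parses /proc/meminfo text, picks up MemAvailable when the kernel provides it, and keeps the result from exceeding MemTotal. The function names are hypothetical, the min() clamp is just one simple way to avoid the overflow, and the fallback formula for older kernels is deliberately left out.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a {field: bytes} dict."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024  # sizes are reported in kB
        fields[name.strip()] = value
    return fields


def available_memory(fields):
    """Return MemAvailable capped at MemTotal, or None if the field is missing."""
    avail = fields.get("MemAvailable")
    if avail is None:
        return None  # kernel < 3.14: a fallback estimate would go here
    return min(avail, fields["MemTotal"])


SAMPLE = """\
MemTotal:       16314340 kB
MemFree:         1203948 kB
MemAvailable:    9876543 kB
HugePages_Total:       0
"""

meminfo = parse_meminfo(SAMPLE)
available = available_memory(meminfo)
```

    On a Linux box the same functions can be fed the real file, e.g. parse_meminfo(open("/proc/meminfo").read()).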

    macOS fixes

    For years I did psutil's macOS development on an old 10.5 install emulated via VirtualBox, running iDeneb (a hacked macOS). I finally got access to a more recent version (El Capitan) via VirtualBox + Vagrant, and could address a pile of long-standing bugs:

    • #514: Process.memory_maps() segfault (critical).
    • #783: Process.status() could return "running" for zombie processes.
    • #908: several methods could mask the real error for high-privileged PIDs, raising NoSuchProcess / AccessDenied instead of OSError / RuntimeError.
    • #909: Process.open_files() and Process.connections() could raise OSError with no exception set when the process was gone.
    • #916: fixed many compilation warnings.

    NIC netmask on Windows

    Small but nice: net_if_addrs() on Windows now returns the netmask too.

    Improved procinfo.py

    scripts/procinfo.py is my kitchen-sink demo script. I taught it a bunch of new tricks, so it now dumps pretty much everything psutil knows about a process:

    $ python scripts/procinfo.py
    pid           4600
    name          chrome
    parent        4554 (bash)
    exe           /opt/google/chrome/chrome
    cwd           /home/giampaolo
    cmdline       /opt/google/chrome/chrome
    started       2016-09-19 11:12
    cpu-tspent    27:27.68
    cpu-times     user=8914.32, system=3530.59,
                  children_user=1.46, children_system=1.31
    cpu-affinity  [0, 1, 2, 3, 4, 5, 6, 7]
    memory        rss=520.5M, vms=1.9G, shared=132.6M, text=95.0M, lib=0B,
                  data=816.5M, dirty=0B
    memory %      3.26
    user          giampaolo
    uids          real=1000, effective=1000, saved=1000
    terminal      /dev/pts/2
    status        sleeping
    nice          0
    ionice        class=IOPriority.IOPRIO_CLASS_NONE, value=0
    num-threads   47
    num-fds       379
    I/O           read_count=96.6M, write_count=80.7M,
                  read_bytes=293.2M, write_bytes=24.5G
    ctx-switches  voluntary=30426463, involuntary=460108
    children      PID    NAME
                  4605   cat
                  4606   cat
                  4609   chrome
                  4669   chrome
    open-files    PATH
                  /opt/google/chrome/icudtl.dat
                  /opt/google/chrome/snapshot_blob.bin
                  /opt/google/chrome/natives_blob.bin
                  /opt/google/chrome/chrome_100_percent.pak
                  [...]
    connections   PROTO LOCAL ADDR            REMOTE ADDR               STATUS
                  UDP   10.0.0.3:3693         *:*                       NONE
                  TCP   10.0.0.3:55102        172.217.22.14:443         ESTABLISHED
                  UDP   10.0.0.3:35172        *:*                       NONE
                  TCP   10.0.0.3:32922        172.217.16.163:443        ESTABLISHED
                  UDP   :::5353               *:*                       NONE
                  UDP   10.0.0.3:59925        *:*                       NONE
    threads       TID              USER          SYSTEM
                  11795             0.7            1.35
                  11796            0.68            1.37
                  15887            0.74            0.03
                  19055            0.77            0.01
                  [...]
                  total=47
    res-limits    RLIMIT                     SOFT       HARD
                  virtualmem             infinity   infinity
                  coredumpsize                  0   infinity
                  cputime                infinity   infinity
                  datasize               infinity   infinity
                  filesize               infinity   infinity
                  locks                  infinity   infinity
                  memlock                   65536      65536
                  msgqueue                 819200     819200
                  nice                          0          0
                  openfiles                  8192      65536
                  maxprocesses              63304      63304
                  rss                    infinity   infinity
                  realtimeprio                  0          0
                  rtimesched             infinity   infinity
                  sigspending               63304      63304
                  stack                   8388608   infinity
    mem-maps      RSS      PATH
                  381.4M   [anon]
                  62.8M    /opt/google/chrome/chrome
                  15.8M    /home/giampaolo/.config/google-chrome/Default/History
                  6.6M     /home/giampaolo/.config/google-chrome/Default/Favicons
                  [...]
    

    Other changes

    The full list is in the changelog.

  3. Real process memory and environ in Python

    psutil 4.0.0 is out, with some interesting news about process memory metrics. I'll get straight to the point and describe what's new.

    "Real" process memory info

    Determining how much memory a process really uses is not an easy matter. RSS (Resident Set Size), which most people rely on, is misleading because it includes both memory unique to the process and memory shared with others. What's more interesting for profiling is the memory that would be freed if the process were terminated right now. In the Linux world this is called USS (Unique Set Size), the major feature introduced in psutil 4.0.0 (not only for Linux but also for Windows and macOS).

    USS memory

    The USS (Unique Set Size) is the memory unique to a process, that would be freed if the process were terminated right now. On Linux it can be determined by parsing the "private" blocks in /proc/PID/smaps. The Firefox team pushed this further and got it working on macOS and Windows too.

    >>> psutil.Process().memory_full_info()
    pfullmem(rss=101990, vms=521888, shared=38804, text=28200, lib=0, data=59672, dirty=0, uss=81623, pss=91788, swap=0)
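
    For the Linux case, parsing the "private" blocks can be sketched like this. It is a simplified illustration, not psutil's implementation: uss_from_smaps is a hypothetical helper that only sums the Private_Clean and Private_Dirty entries, and the sample text is made up.

```python
def uss_from_smaps(text):
    """Sum the Private_Clean and Private_Dirty entries of a smaps dump, in bytes."""
    total_kb = 0
    for line in text.splitlines():
        if line.startswith(("Private_Clean:", "Private_Dirty:")):
            total_kb += int(line.split()[1])  # second column is the size in kB
    return total_kb * 1024


# Made-up excerpt in the /proc/PID/smaps layout: two mappings.
SAMPLE = """\
00400000-004f1000 r-xp 00000000 08:01 131 /usr/bin/python3
Rss:                 800 kB
Pss:                 400 kB
Shared_Clean:        600 kB
Private_Clean:       150 kB
Private_Dirty:        50 kB
7f0000000000-7f0000021000 rw-p 00000000 00:00 0
Rss:                 132 kB
Pss:                 132 kB
Private_Clean:         0 kB
Private_Dirty:       132 kB
"""

uss = uss_from_smaps(SAMPLE)  # (150 + 50 + 0 + 132) kB
```

    On Linux, uss_from_smaps(open("/proc/self/smaps").read()) gives a figure for the current process.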
    

    PSS and swap

    On Linux there are two additional metrics that can also be determined via /proc/PID/smaps: PSS and swap.

    pss, aka "Proportional Set Size", is the memory private to a process plus its share of the memory shared with other processes, where each shared region is divided evenly between the processes that share it. E.g. if a process has 10 MB all to itself (USS) and 10 MB shared with one other process, its PSS will be 10 + 10 / 2 = 15 MB.
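
    The arithmetic in the example above, written out as a tiny sketch (proportional_set_size is a hypothetical helper, not a psutil function):

```python
def proportional_set_size(private, shared):
    """Compute PSS in bytes.

    private -- bytes used only by this process
    shared  -- iterable of (size_in_bytes, number_of_sharing_processes) pairs
    """
    return private + sum(size / sharers for size, sharers in shared)


MB = 1024 * 1024
# 10 MB private plus 10 MB shared between two processes.
pss = proportional_set_size(10 * MB, [(10 * MB, 2)])
```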

    swap is simply the amount of memory that has been swapped out to disk.

    With Process.memory_full_info() it is possible to implement a tool like procsmem.py, similar to smem on Linux, which provides a list of processes sorted by uss. It's interesting to see how rss differs from uss:

    ~/svn/psutil$ ./scripts/procsmem.py
    PID     User    Cmdline                            USS     PSS    Swap     RSS
    ==============================================================================
    ...
    3986    giampao /usr/bin/python3 /usr/bin/indi   15.3M   16.6M      0B   25.6M
    3906    giampao /usr/lib/ibus/ibus-ui-gtk3       17.6M   18.1M      0B   26.7M
    3991    giampao python /usr/bin/hp-systray -x    19.0M   23.3M      0B   40.7M
    3830    giampao /usr/bin/ibus-daemon --daemoni   19.0M   19.0M      0B   21.4M
    20529   giampao /opt/sublime_text/plugin_host    19.9M   20.1M      0B   22.0M
    3990    giampao nautilus -n                      20.6M   29.9M      0B   50.2M
    3898    giampao /usr/lib/unity/unity-panel-ser   27.1M   27.9M      0B   37.7M
    4176    giampao /usr/lib/evolution/evolution-c   35.7M   36.2M      0B   41.5M
    20712   giampao /usr/bin/python -B /home/giamp   45.6M   45.9M      0B   49.4M
    3880    giampao /usr/lib/x86_64-linux-gnu/hud/   51.6M   52.7M      0B   61.3M
    20513   giampao /opt/sublime_text/sublime_text   65.8M   73.0M      0B   87.9M
    3976    giampao compiz                          115.0M  117.0M      0B  130.9M
    32486   giampao skype                           145.1M  147.5M      0B  149.6M
    

    Implementation

    To get these values (uss, pss and swap) we need to walk the whole process address space. This usually requires higher privileges and is considerably slower than Process.memory_info(), which is probably why tools like ps and top show RSS/VMS instead of USS. A big thanks goes to the Mozilla team for figuring this out on Windows and macOS, and to Eric Rahm who put the psutil PRs together (see PR-744, PR-745 and PR-746). If you don't use Python and want to port the code to another language, here are the interesting parts:

    Memory type percent

    After reorganizing the process memory APIs (PR-744), I added a new memtype parameter to Process.memory_percent(). You can now compare a specific memory type (not only RSS) against the total physical memory. E.g.

    >>> psutil.Process().memory_percent(memtype='pss')
    0.06877466326787016
    

    Process environ

    The second biggest improvement in psutil 4.0.0 is the ability to read a process's environment variables. This opens up interesting possibilities for process recognition and monitoring. For instance, you can start a process with a custom environment variable, then iterate over all processes to find the one of interest:

    import psutil
    for p in psutil.process_iter():
        try:
            env = p.environ()
        except psutil.Error:
            pass
        else:
            if 'MYAPP' in env:
                ...
    

    Process environ was a long-standing issue (#52, from 2009) that I gave up on because the Windows implementation only worked for the current process. Frank Benkstein solved that (PR-747), and it now works on Linux, Windows and macOS for all processes (you may still hit AccessDenied for processes owned by another user):

    >>> import psutil
    >>> from pprint import pprint as pp
    >>> pp(psutil.Process().environ())
    {...
     'CLUTTER_IM_MODULE': 'xim',
     'COLORTERM': 'gnome-terminal',
     'COMPIZ_BIN_PATH': '/usr/bin/',
     'HOME': '/home/giampaolo',
     'PWD': '/home/giampaolo/svn/psutil',
      }
    >>>
    

    Note that the resulting dict usually doesn't reflect changes made after the process started (e.g. os.environ['MYAPP'] = '1'). Again, for anyone porting this to other languages, here are the interesting parts:

    Extended disk IO stats

    psutil.disk_io_counters() now reports additional metrics on Linux and FreeBSD:

    • busy_time: the time spent doing actual I/Os (in milliseconds).
    • read_merged_count and write_merged_count (Linux only): the number of merged reads and writes (see the iostats doc).

    These give a better picture of actual disk utilization (#756), similar to the iostat command on Linux.
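
    As an illustration of how busy_time can be turned into a utilization figure, here is a minimal sketch. disk_utilization is a hypothetical helper; it assumes you sample psutil.disk_io_counters().busy_time twice and know the interval between the two samples.

```python
def disk_utilization(busy_before_ms, busy_after_ms, interval_s):
    """Percentage of the interval the disk spent doing I/O, capped at 100."""
    busy_ms = busy_after_ms - busy_before_ms
    return min(100.0, busy_ms / (interval_s * 1000.0) * 100.0)


# The disk was busy for 500 ms during a 1 second interval.
util = disk_utilization(1000, 1500, 1.0)
```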

    OS constants

    Given the growing number of platform-specific metrics, I added a set of constants to tell which platform you're on: psutil.LINUX, psutil.WINDOWS, etc.

    Other fixes

    The complete list of changes is available in the changelog.

    Porting code

    Since 4.0.0 is a major version, I took the chance to (lightly) change / break some APIs.

    • Process.memory_info() no longer returns just an (rss, vms) namedtuple. It returns a variable-length namedtuple that varies by platform (rss and vms are always present, even on Windows). Essentially the same result as the old Process.memory_info_ex(). This shouldn't break your code unless you were doing rss, vms = p.memory_info().
    • Process.memory_info_ex() is deprecated. It still works as an alias for Process.memory_info(), issuing a DeprecationWarning.
    • psutil.disk_io_counters() on NetBSD and OpenBSD no longer returns write_count and read_count because the kernel doesn't provide them (we were returning the busy time instead). Should be a small issue given NetBSD and OpenBSD support is very recent.
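
    To make the first point concrete, here is a small sketch of what breaks and how to write it portably. The pmem namedtuple below is only a stand-in for whatever Process.memory_info() returns on your platform; the field list and values are borrowed from the Linux example earlier in this post.

```python
from collections import namedtuple

# Stand-in for the platform-dependent result of Process.memory_info().
pmem = namedtuple("pmem", ["rss", "vms", "shared", "text", "lib", "data", "dirty"])
mem = pmem(rss=101990, vms=521888, shared=38804, text=28200, lib=0, data=59672, dirty=0)

# Old style: fails with "too many values to unpack" once extra fields appear.
try:
    rss, vms = mem
except ValueError:
    unpack_failed = True
else:
    unpack_failed = False

# Portable style: access by name, which works regardless of the tuple length.
rss, vms = mem.rss, mem.vms
```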
