Introduction: Why Python Memory Leak Detection Matters in Long‑Running Services
When I first started running Python daemons in production, I assumed the garbage collector would “just handle” memory for me. After a few weeks of slowly growing RSS on a critical API, followed by an out-of-memory restart in the middle of peak traffic, I learned the hard way that Python memory leak detection is not optional for long‑running services.
In Python, leaks often come from subtle reference cycles, unbounded caches, global lists that are never cleared, or background tasks that keep accumulating data. These issues may not show up in quick tests or short-lived scripts, but in a service that runs for days or weeks, even a tiny leak per request eventually snowballs into gigabytes of wasted RAM.
For backend services and daemons, this has three big consequences:
- Reliability: Leaks can cause latency spikes, crash loops, and unpredictable restarts that hurt SLAs and user experience.
- Scalability: A single leaky process may force you to run more instances or smaller workloads per host, masking the real problem instead of fixing it.
- Cost: Extra memory usage translates directly into higher cloud bills and more powerful (and expensive) machines than you actually need.
In my experience, the teams that treat Python memory leak detection as a routine part of their performance engineering end up with services that are easier to operate, cheaper to run, and far less prone to 3 a.m. incidents. The rest of this article focuses on practical, low-friction techniques you can apply to catch leaks early and keep your long‑running Python services healthy.
1. Use tracemalloc for Precise Python Memory Leak Detection
When I need precise Python memory leak detection without pulling in heavy external tools, I start with the standard library’s tracemalloc module. It lets me capture memory snapshots, compare them over time, and see exactly which lines of code are responsible for new allocations.
Enable tracemalloc and capture snapshots
In a long‑running service, I usually enable tracemalloc early in the process, then periodically capture snapshots around suspicious operations or on a timer. Here is a minimal pattern I’ve used in production-like environments:
```python
import time
import tracemalloc

tracemalloc.start(25)  # keep up to 25 frames of traceback per allocation

def handle_request():
    # Your normal request handling logic
    data = [x for x in range(10000)]
    return sum(data)

def debug_memory_growth():
    snapshot1 = tracemalloc.take_snapshot()
    for _ in range(1000):
        handle_request()
    snapshot2 = tracemalloc.take_snapshot()
    top_stats = snapshot2.compare_to(snapshot1, 'lineno')
    print("Top 5 memory changes:")
    for stat in top_stats[:5]:
        print(stat)

if __name__ == "__main__":
    while True:
        debug_memory_growth()
        time.sleep(60)
```
This kind of loop helped me confirm that memory was creeping up only when a particular handler ran, which narrowed down the leak to one module instead of the whole codebase.
Compare snapshots and interpret results
The real power of tracemalloc is in comparing snapshots. By grouping statistics by filename, line number, or traceback, I can see where memory growth is concentrated rather than guessing.
```python
import tracemalloc

tracemalloc.start()

# Run warm-up workload here
snapshot_before = tracemalloc.take_snapshot()

# Run leak-suspect workload here (handle_request as defined above)
for _ in range(5000):
    handle_request()

snapshot_after = tracemalloc.take_snapshot()

# Group by traceback to see call paths, not just single lines
top_stats = snapshot_after.compare_to(snapshot_before, 'traceback')
for stat in top_stats[:3]:
    print("Leak candidate:")
    for line in stat.traceback.format():
        print(line)
    print(f"Size diff: {stat.size_diff / 1024:.1f} KiB, Count diff: {stat.count_diff}")
    print("-" * 40)
```
When I run this against a staging workload that mimics production traffic, I look for patterns: lines that keep allocating more objects across runs, or call paths that never show negative size_diff (which would indicate memory being freed). If the same traceback shows up at the top repeatedly, that is usually my leak hotspot.
One practical tip: tracemalloc.start(nframe) with a higher nframe gives more context (deeper tracebacks) at the cost of overhead. In my experience, a depth between 10 and 25 is a good balance for most services.
For more advanced workflows, I often dump snapshot statistics to logs or a file and visualize which modules grow over time with a small analysis script; the tracemalloc chapter of the Python standard library documentation covers snapshot filtering and comparison in depth.
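That dump-and-analyze-later workflow can be sketched with tracemalloc's built-in `Snapshot.dump()` and `Snapshot.load()`; the file name here is just an example, and in a real service the dump would run on a timer or signal handler:

```python
import tracemalloc

tracemalloc.start()

# Simulate a workload that allocates memory after tracing started
data = [str(i) for i in range(1000)]

# Persist the snapshot so it can be analyzed offline, outside the service
snapshot = tracemalloc.take_snapshot()
snapshot.dump("memory_snapshot.bin")

# Later (or in a separate analysis script), reload and inspect it
loaded = tracemalloc.Snapshot.load("memory_snapshot.bin")
top_stats = loaded.statistics("filename")
for stat in top_stats[:5]:
    print(stat)
```

Because the snapshot file is self-contained, I can pull it off a production host and diff it against older snapshots on my laptop without keeping the service under load.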
2. Profiling Long‑Running Services with objgraph and reference cycle analysis
While tracemalloc helps me see where memory grows, tools like objgraph help me understand what is growing and why it is not being collected. In my experience, many stubborn Python memory leaks in long‑running services come from reference cycles or object graphs that keep each other alive.
Spotting suspicious object growth with objgraph
The first thing I do with objgraph is compare object counts over time. This is especially useful in a staging service where I can trigger a realistic workload and then sample the heap.
```python
import gc
import time

import objgraph

def log_top_growth():
    gc.collect()
    objgraph.show_growth(limit=10)

if __name__ == "__main__":
    # Warm up the service (handle_request as defined earlier)
    for _ in range(1000):
        handle_request()
    while True:
        print("=== Object growth snapshot ===")
        log_top_growth()
        time.sleep(60)
```
When I ran this pattern on a leaky background worker, I saw certain custom classes (like TaskContext and JobResult) steadily increasing in count even when the queue was idle. That was the clue that those objects were stuck in memory, likely due to reference cycles or lingering global references.
Drilling into reference cycles and backreferences
Once I know which types are growing, I use objgraph to inspect backreferences and find what is holding them in memory. This is where I usually uncover cycles between closures, callbacks, and caches.
```python
import objgraph

# Suppose we suspect LeakProneClass is leaking
from myapp.models import LeakProneClass

# objgraph.by_type() returns a list, so take the first live instance
leaky_obj = objgraph.by_type("LeakProneClass")[0]

# Show why that instance is still alive
objgraph.show_backrefs(
    [leaky_obj],
    max_depth=5,
    extra_ignore=[id(__builtins__)],
    filename="leak_backrefs.png",
)
```
This generates a graph image showing how references lead back to globals, singletons, or long‑lived data structures. In one real case, I found that a log callback captured self in a closure, forming a cycle with an in‑memory registry. Breaking that cycle (by using weak references or decoupling the callback) immediately stabilized memory usage.
For deeper Python memory leak detection, I sometimes combine objgraph with manual gc.get_objects() inspection and periodic snapshots, then analyze which reference chains keep growing across intervals; the objgraph documentation describes more of these heap-exploration helpers.
3. Monitoring memory usage in production with psutil and metrics
Most of the nasty Python memory leaks I have dealt with only became obvious after many hours or days in production. That is why I always pair local Python memory leak detection tools with continuous memory monitoring using psutil and my metrics stack.
Exporting process memory metrics with psutil
With psutil, I can expose key memory stats like RSS, VMS, and percent of system memory from inside the service. Here is a simple pattern I have used to feed a Prometheus endpoint:
```python
import time

import psutil

process = psutil.Process()

def collect_memory_stats():
    mem = process.memory_info()
    return {
        "rss_bytes": mem.rss,
        "vms_bytes": mem.vms,
        "memory_percent": process.memory_percent(),
    }

if __name__ == "__main__":
    while True:
        stats = collect_memory_stats()
        # Export to your metrics system here
        print(stats)
        time.sleep(10)
```
In my services, I wire these values into existing metrics reporters (Prometheus, StatsD, or cloud-specific agents) so I can graph memory over time per instance.
Using metrics to detect slow leaks
Once memory metrics are in the monitoring system, the goal is to spot slow, monotonic growth. I typically:
- Plot RSS over days for each instance and look for steady upward trends that do not reset after low traffic periods.
- Correlate memory with request rate or job volume; leaks often show up as higher “memory per request” over time.
- Set alerts on slope (rate of change), not just absolute memory, to catch leaks early.
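The slope-on-memory idea can be sketched with a plain least-squares fit over (timestamp, RSS) samples; in practice the samples would come from your metrics store, and the alert threshold below is an arbitrary example:

```python
def rss_growth_rate(samples):
    """Least-squares slope of (timestamp_s, rss_bytes) samples, in bytes/second."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den

# Example: RSS sampled every 60 s, growing ~1 KiB per sample (a slow leak)
samples = [(i * 60, 100_000_000 + i * 1024) for i in range(10)]
rate = rss_growth_rate(samples)

# Alert on sustained growth rate, not absolute RSS
LEAK_THRESHOLD_BPS = 10  # arbitrary example threshold
if rate > LEAK_THRESHOLD_BPS:
    print(f"Possible leak: RSS growing at {rate:.1f} B/s")
```

A rate-based alert like this fires on a leak that adds only a few megabytes per hour, long before any absolute-memory threshold would trip.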
That combination of psutil-based metrics plus dashboards has saved me from multiple 3 a.m. outages by surfacing leaks while they were still small enough to investigate calmly.
4. Tuning Python’s garbage collector for fewer leaks and pauses
After I’ve used other Python memory leak detection techniques to narrow down a problem, I often end up looking at the garbage collector (GC). CPython’s cyclic GC can both hide leaks (by leaving objects in gc.garbage) and introduce latency spikes if it runs at the wrong time, so it is worth understanding and tuning.
Inspecting GC statistics and adjusting thresholds
The built-in gc module lets me see how often collections happen and how many objects are involved. In long‑running services, I sometimes log these stats periodically to understand GC behavior under load.
```python
import gc
import time

def log_gc_stats():
    stats = gc.get_stats()
    thresholds = gc.get_threshold()
    counts = gc.get_count()
    print("GC stats:", stats)
    print("Thresholds:", thresholds, "Counts:", counts)

if __name__ == "__main__":
    gc.enable()
    while True:
        log_gc_stats()
        time.sleep(60)
```
When I see frequent major collections with very few objects reclaimed, I take that as a sign to raise thresholds slightly to reduce pauses. Conversely, if memory drifts upward and collections are rare, I lower thresholds so cycles are found sooner. A small, measured change (for example, tweaking generation 0 and 1 thresholds) is usually safer than turning GC off entirely.
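Such a measured tweak uses the stdlib `gc.set_threshold()`; the doubling below is purely illustrative, and the right values depend on your allocation patterns, so I always change them under load testing rather than guessing:

```python
import gc

# Defaults are typically (700, 10, 10)
gen0, gen1, gen2 = gc.get_threshold()

# Raise the generation-0 threshold to collect less often (fewer pauses)
# on allocation-heavy services; lower it instead if memory drifts upward
# between collections.
gc.set_threshold(gen0 * 2, gen1, gen2)

print(gc.get_threshold())
```

Because thresholds are process-global, I set them once at startup and record the values in a metric or log line so later investigations know which configuration was active.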
Dealing with __del__ methods and uncollectable cycles
One sneaky source of leaks is objects that define __del__ and participate in reference cycles. Before Python 3.4 (PEP 442), CPython could not safely finalize these, so they ended up in gc.garbage and stayed in memory. On modern versions such cycles are usually collected, and gc.garbage is normally empty unless gc.DEBUG_SAVEALL is set or a C extension creates truly uncollectable objects, but I still routinely check it whenever I chase a stubborn leak.
```python
import gc

# Force a full collection across all generations
gc.collect()

if gc.garbage:
    print("Uncollectable objects:", len(gc.garbage))
    for obj in gc.garbage[:10]:
        print(type(obj), getattr(obj, "__dict__", None))
```
In one service, this quick inspection showed several custom classes with __del__ methods tangled in cycles. Refactoring them to use context managers or explicit close() methods, and avoiding complex logic in __del__, immediately reduced long‑term memory growth and GC noise; the gc module documentation covers finalization and uncollectable objects in more detail.
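The refactor pattern looks roughly like this; `Connection` is a hypothetical resource holder standing in for the real classes, with cleanup moved out of __del__ and into an explicit close() that the context-manager protocol calls deterministically:

```python
class Connection:
    """Illustrative resource holder: cleanup is explicit, never in __del__."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Release the underlying resource explicitly and idempotently.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions from the with-block

with Connection() as conn:
    pass  # use the connection here

print(conn.closed)  # True
```

With cleanup tied to `with` blocks (or an explicit close() in a finally clause), object lifetime no longer depends on when, or whether, the garbage collector gets around to the instance.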
5. Isolating leaks with worker processes and restart strategies
Even with solid Python memory leak detection, I have worked on systems where a third‑party library or legacy code still leaked a bit. In those cases, architectural patterns like short‑lived workers and controlled restarts turned a hard problem into a manageable one.
Using worker processes to contain memory growth
The core idea is simple: keep the master process lean and move risky work into separate worker processes. If a worker leaks, you just replace it. This is the same model used by Gunicorn, Celery, and many job runners.
```python
import multiprocessing as mp
import time

import psutil

MAX_RSS_MB = 500

def worker_main():
    while True:
        handle_request()  # your real work here

def monitor_worker(proc):
    p = psutil.Process(proc.pid)
    while proc.is_alive():
        if p.memory_info().rss > MAX_RSS_MB * 1024 * 1024:
            proc.terminate()
            proc.join()
            break
        time.sleep(5)

if __name__ == "__main__":
    while True:
        w = mp.Process(target=worker_main)
        w.start()
        monitor_worker(w)
```
When I first added a simple RSS limit like this around a leaky image-processing pipeline, the result was dramatic: instead of slowly exhausting the host, workers stayed within a predictable memory envelope by being restarted before they got too big.
Graceful restarts and pre‑fork models
To avoid dropping traffic, I prefer graceful restarts: start a fresh worker, wait until it is healthy, then drain and stop the old one. Pre‑fork models (a master that forks children up front) also help by sharing read‑only memory and reducing startup overhead.
In practice, I combine this with:
- Max requests per worker: Restart after N requests or jobs, even if memory is fine, to cap worst‑case leaks.
- Staggered restarts: Never restart all workers at once; roll them one by one.
- Health checks: Only route work to workers that have passed warm‑up and basic self‑tests.
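The max-requests idea from the list above can be sketched as a worker loop that exits after a fixed budget of jobs, leaving the supervisor (like the monitor loop earlier) to spawn a replacement; `get_job`, `handle_request`, and the budget are placeholders for your real queue and handler:

```python
MAX_REQUESTS = 10_000  # restart budget per worker; tune for your workload

def worker_loop(get_job, handle_request, max_requests=MAX_REQUESTS):
    """Process jobs until the request budget is exhausted, then stop.

    The supervising process detects the worker's exit and starts a fresh
    one, which caps how far any slow leak can grow inside a single worker.
    """
    handled = 0
    while handled < max_requests:
        job = get_job()
        if job is None:          # queue drained; shut down gracefully
            break
        handle_request(job)
        handled += 1
    return handled  # a real worker would call sys.exit(0) here

# Tiny demonstration with an in-memory queue and a budget of 10
jobs = iter(range(25))
count = worker_loop(lambda: next(jobs, None), lambda job: None, max_requests=10)
print(count)  # 10
```

Because the exit happens at a job boundary rather than mid-request, this combines naturally with the staggered-restart and health-check points above: the worker finishes cleanly, and the scheduler only routes new work to its warmed-up replacement.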
This doesn’t replace proper debugging, but in my experience it provides a robust safety net when you are dealing with unavoidable or slow‑burn leaks in complex production systems.
Conclusion: Building a Python Memory Leak Detection Playbook
Over time I’ve learned that effective Python memory leak detection is less about a single magic tool and more about having a clear, repeatable playbook. For me, that starts with tracemalloc to pinpoint leaking code paths, then objgraph and GC inspection to understand which objects and cycles are actually stuck. From there, I lean on psutil-based metrics to watch memory in production and, when necessary, protect the system with worker isolation and restart strategies.
A practical workflow I keep in my notes looks like this: reproduce the leak in a staging or load-test environment, use tracemalloc snapshots to find hotspots, inspect object growth and reference cycles, check GC behavior (especially uncollectable objects and __del__), and finally validate fixes under real traffic with live memory dashboards. Turning that into a simple checklist has saved me hours during incidents, and I recommend every team running long‑lived Python services maintain their own leak‑debugging playbook and refine it after each incident.

Hi, I’m Cary Huang — a tech enthusiast based in Canada. I’ve spent years working with complex production systems and open-source software. Through TechBuddies.io, my team and I share practical engineering insights, curate relevant tech news, and recommend useful tools and products to help developers learn and work more effectively.
