Unlock Your Arsenals: GDB Debugging Essentials with PostgreSQL

As a seasoned C/C++ developer, I’ve always considered GDB (the GNU Debugger) my “best friend” in software development because of its indispensable role in development and debugging. With this powerful tool, a developer can:

  1. Find and fix issues in a C/C++ program, such as segmentation faults and logical errors.
  2. Trace and understand the execution flow, variables, memory contents, signals, and system calls of your C/C++ program.

Much of my time with GDB revolves around the second point: revealing how a complex piece of software works before I can confidently add or enhance features. In fact, GDB debugging has helped me significantly in understanding the internals and architecture of the PostgreSQL database, and it kick-started my PostgreSQL journey.

In this blog, I will use PostgreSQL 16 as a practical example to demonstrate some of the powerful capabilities of GDB in an Ubuntu 18.04 command-line environment. I will also demonstrate how I use GDB debugging to learn about PostgreSQL’s internal execution logic.

Enable Debug and Disable Optimization

The software needs to be compiled in debug mode to include the debug symbols that GDB relies on. I also recommend turning off compiler optimization so that GDB can trace the execution flow and print variables correctly. With optimization on, some variables may be reported as “optimized out” and some lines may appear to be skipped, which makes a GDB session confusing.

On PostgreSQL 16, you can enable debug symbols, disable optimization, and build a debug version of the software with either of the approaches below:

With the traditional ./configure script and Makefile:

CFLAGS=-O0 ./configure --prefix=$PWD/mypg --enable-debug
make
make install

With the Meson build system (if you have it set up):

meson setup build --prefix=$PWD/mypg -Dbuildtype=debug -Doptimization=0
ninja -C build
ninja -C build install

Attach GDB to a Running Program

PostgreSQL is multi-process software that is normally started by the pg_ctl front-end application. pg_ctl launches the postmaster, which in turn starts other processes, each with its own role. In this blog, we will attach GDB to a backend process, which is responsible for serving a psql client’s SQL queries.

# initialize a new database cluster in the debugdb directory
mypg/bin/initdb -D debugdb
# start the server on debugdb
mypg/bin/pg_ctl -D debugdb -l logfile start
# connect as the current system user to the default database, postgres
mypg/bin/psql -d postgres

In a separate terminal, we can list all of the running PostgreSQL processes with the ps -ef command.

ps -ef | grep postgres
caryh    3269353       1  0 18:13 ?        00:00:00 /home/caryh/postgres/mydb/bin/postgres -D debugdb
caryh    3269354 3269353  0 18:13 ?        00:00:00 postgres: checkpointer
caryh    3269355 3269353  0 18:13 ?        00:00:00 postgres: background writer
caryh    3269357 3269353  0 18:13 ?        00:00:00 postgres: walwriter
caryh    3269358 3269353  0 18:13 ?        00:00:00 postgres: autovacuum launcher
caryh    3269359 3269353  0 18:13 ?        00:00:00 postgres: logical replication launcher
caryh    3271568 3176393  0 18:15 pts/0    00:00:00 psql -d postgres -p 5432
caryh    3271569 3269353  0 18:15 ?        00:00:00 postgres: caryh postgres [local] idle      # this is the backend process serving sql
caryh    3271762 3238353  0 18:15 pts/1    00:00:00 grep --color=auto postgres

The backend process serving the psql client has the description postgres: caryh postgres [local] idle and the process ID 3271569 (running SELECT pg_backend_pid(); in the psql session returns the same ID). GDB needs this ID to attach to the process. During attach, GDB loads debug symbols for the backend process and for all of the shared libraries it uses. Some third-party libraries ship debug symbols and some do not. If you trace into a shared library without debug symbols, GDB cannot show source lines, variables, or a detailed back trace for that code. This is okay, because we will mostly be tracing within PostgreSQL code anyway.

The process is suspended as soon as GDB attaches to it, and it waits for further GDB commands.

# start GDB here
sudo gdb mypg/bin/postgres
(gdb) attach 3271569

# GDB loads debug symbols while attaching
Attaching to process 3271569
Reading symbols from /home/caryh/postgres/mydb/bin/postgres...done.
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.27.so...done.
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.27.so...done.
Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.27.so...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2...(no debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libicuuc.so.60...(no debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libicui18n.so.60...(no debugging symbols found)...done.
...
...
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug/.build-id/1f/06001733b9be9478b105faf0dac6bdf36c85de.debug...done.
[Thread debugging using libthread_db enabled]
Reading symbols from /usr/lib/x86_64-linux-gnu/libffi.so.6...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libnss_files-2.27.so...done.

# program suspends here upon GDB attach
0x00007fa18ed71907 in epoll_wait (epfd=7, events=0x556537600ad0, maxevents=1, timeout=-1)
    at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.

# GDB awaits your command
(gdb)

Start a Program with GDB Debugging

It is also possible to start a program under GDB. This is mostly useful for short-lived or single-process programs. For example, pg_waldump is a front-end tool provided by PostgreSQL that dumps WAL records as text and then exits. Because it does not keep running, it is better to start pg_waldump from GDB than to attach to it:

# use GDB to start pg_waldump program with arguments
sudo gdb --args mypg/bin/pg_waldump debugdb/pg_wal/000000010000000000000001
Reading symbols from mypg/bin/pg_waldump...done.
# we normally would specify some break points before "run". (See next section)
(gdb) b main
# use run to start pg_waldump
(gdb) run

Set Up GDB for Tracing

Break Point

A break point is a location in the program’s source code where GDB suspends execution. It is set with the b command, normally at one of these places:

  • just before a suspicious function or piece of code (if you are troubleshooting a bug).
  • right at a particular feature, to trace backward or forward from there (if you are learning how a large piece of software executes).

For example, to learn how PostgreSQL handles an INSERT operation, I would set my break point at the heap_insert() function. There are several ways to set a break point; below are the ones I use most often.

Set a break point at the heap_insert() function:

(gdb) b heap_insert

Suspend the program at line 1836 of source file heapam.c:

(gdb) b heapam.c:1836

Suspend the program at the heap_insert() function conditionally (here, only when the target relation is not a system catalog):

(gdb) b heap_insert if IsCatalogRelation(relation) != 1

Once a break point is set, we can tell GDB to continue program execution with the c command. If you need to set more break points or run other GDB commands, press Ctrl-C to interrupt the program and get the GDB prompt back.

# continue the program
(gdb) c 
Continuing.
# get the GDB prompt again - It will suspend the program at current execution location
^C
Program received signal SIGINT, Interrupt.
0x00007fa18ed71907 in epoll_wait (epfd=7, events=0x556537600ad0, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      in ../sysdeps/unix/sysv/linux/epoll_wait.c
(gdb)

Back Trace

We can trigger the break point created previously with a simple insert such as INSERT INTO mytable VALUES (1, 'fff'). Once it is hit, we can get the back trace with the bt command.

Breakpoint 1, heap_insert (relation=0x7f11005e7280, tup=0x562602e1d878, cid=0, options=0, bistate=0x0) at heapam.c:1828
(gdb) bt
#0  heap_insert (relation=0x7f11005e7280, tup=0x562602e1d878, cid=0, options=0, bistate=0x0) at heapam.c:1828
#1  0x0000562600ba3eb7 in heapam_tuple_insert (relation=0x7f11005e7280, slot=0x562602e1d770, cid=0, options=0, bistate=0x0) at heapam_handler.c:252
#2  0x0000562600e20c5e in table_tuple_insert (rel=0x7f11005e7280, slot=0x562602e1d770, cid=0, options=0, bistate=0x0) at ../../../src/include/access/tableam.h:1400
#3  0x0000562600e22976 in ExecInsert (context=0x7ffdca3e51f0, resultRelInfo=0x562602e1cc40, slot=0x562602e1d770, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1133
#4  0x0000562600e268fe in ExecModifyTable (pstate=0x562602e1ca38) at nodeModifyTable.c:3790
#5  0x0000562600deb512 in ExecProcNodeFirst (node=0x562602e1ca38) at execProcnode.c:464
#6  0x0000562600ddfb64 in ExecProcNode (node=0x562602e1ca38) at ../../../src/include/executor/executor.h:273
#7  0x0000562600de2327 in ExecutePlan (estate=0x562602e1c7f0, planstate=0x562602e1ca38, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x562602e01230, execute_once=true)
    at execMain.c:1670
#8  0x0000562600de00d6 in standard_ExecutorRun (queryDesc=0x562602e318c0, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:365
#9  0x0000562600ddff6d in ExecutorRun (queryDesc=0x562602e318c0, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:309
#10 0x000056260105d7a7 in ProcessQuery (plan=0x562602e010e0, sourceText=0x562602d39220 "insert into mytable values(1, 'fff');", params=0x0, queryEnv=0x0, dest=0x562602e01230, qc=0x7ffdca3e5650) at pquery.c:160
#11 0x000056260105f0c3 in PortalRunMulti (portal=0x562602daeec0, isTopLevel=true, setHoldSnapshot=false, dest=0x562602e01230, altdest=0x562602e01230, qc=0x7ffdca3e5650) at pquery.c:1277
#12 0x000056260105e68e in PortalRun (portal=0x562602daeec0, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x562602e01230, altdest=0x562602e01230, qc=0x7ffdca3e5650) at pquery.c:791
#13 0x0000562601057d78 in exec_simple_query (query_string=0x562602d39220 "insert into mytable values(1, 'fff');") at postgres.c:1274
#14 0x000056260105c789 in PostgresMain (dbname=0x562602d33c50 "postgres", username=0x562602d6ee50 "caryh") at postgres.c:4637
#15 0x0000562600f9a479 in BackendRun (port=0x562602d624b0) at postmaster.c:4464
#16 0x0000562600f99d69 in BackendStartup (port=0x562602d624b0) at postmaster.c:4192
#17 0x0000562600f9655e in ServerLoop () at postmaster.c:1782
#18 0x0000562600f95e59 in PostmasterMain (argc=3, argv=0x562602d31ba0) at postmaster.c:1466
#19 0x0000562600e5ff1f in main (argc=3, argv=0x562602d31ba0) at main.c:198

The back trace is extremely useful because it shows exactly how the program reached your break point: every function in the call chain, with its source file and line number, all the way back to program start.

#0 is the break point we have set, and #19 (main) is the outermost frame, where the program started. The back trace gives us a direction for learning how PostgreSQL works, because we know exactly which functions to study.
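When a back trace is long, I find it handy to boil it down to a reading list of functions and source locations. Below is a minimal Python sketch of that idea, assuming the bt output has been saved as text. The helper name parse_backtrace and the regular expression are my own illustration, not part of GDB, and they only handle frames printed on a single line.

```python
import re

# Matches single-line frames such as:
#   #0  heap_insert (relation=...) at heapam.c:1828
#   #1  0x0000562600ba3eb7 in heapam_tuple_insert (...) at heapam_handler.c:252
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+(?:0x[0-9a-f]+ in )?(?P<func>\w+) \(.*\) "
    r"at (?P<file>\S+):(?P<line>\d+)$"
)


def parse_backtrace(text):
    """Return a list of (frame number, function, file, line) tuples."""
    frames = []
    for raw in text.splitlines():
        match = FRAME_RE.match(raw.strip())
        if match:
            frames.append(
                (int(match["num"]), match["func"], match["file"], int(match["line"]))
            )
    return frames


# The first three frames of the back trace shown above.
SAMPLE = """\
#0  heap_insert (relation=0x7f11005e7280, tup=0x562602e1d878, cid=0, options=0, bistate=0x0) at heapam.c:1828
#1  0x0000562600ba3eb7 in heapam_tuple_insert (relation=0x7f11005e7280, slot=0x562602e1d770, cid=0, options=0, bistate=0x0) at heapam_handler.c:252
#2  0x0000562600e20c5e in table_tuple_insert (rel=0x7f11005e7280, slot=0x562602e1d770, cid=0, options=0, bistate=0x0) at ../../../src/include/access/tableam.h:1400
"""

if __name__ == "__main__":
    # Print the outermost caller first, the break point last.
    for num, func, file, line in reversed(parse_backtrace(SAMPLE)):
        print(f"#{num:<2} {func} ({file}:{line})")
```

Reading the result from the outermost frame inward gives the call order that leads to heap_insert. Frames that GDB wraps onto a second line (like #7 above) would need to be joined before parsing.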

Frame

With the program suspended, we can examine local variables of the current function (frame #0) as well as variables at global scope. However, we cannot directly examine local variables that belong to other frames. To do so, we need to select that frame with the f command. For example, to print the local variable cmdtagname declared inside the function exec_simple_query (#13), we can:

(gdb) f 13
#13 0x000055a848310d78 in exec_simple_query (query_string=0x55a849a47380 "insert into mytable values(1, 'fff');") at postgres.c:1274
1274                    (void) PortalRun(portal,
(gdb) p cmdtagname
$1 = 0x55a848679294 "INSERT"
(gdb)

Please note that selecting frame 13 does not move or rewind execution: the program is still suspended at the break point in frame #0. The frame command simply lets you take a peek at the state of each level in the stack trace.

Manage Break Points

To list all break points created so far, use the info b command:

(gdb) info b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x000055a847e4e9f2 in heap_insert at heapam.c:1828
        breakpoint already hit 1 time
2       breakpoint     keep y   0x000055a847e4ea14 in heap_insert at heapam.c:1836
3       breakpoint     keep y   0x000055a847e4e9f2 in heap_insert at heapam.c:1828
        stop only if IsCatalogRelation(relation) != 1

We can disable, enable, or delete break points:

# disable break point 2 & 3
disable 2 3
# enable break point 3
enable 3
# delete break point 2
delete 2

Observe a Break Point

Print Variable or Structure

You can print any variable or structure within the scope of the current frame, including local and global ones. For example, the function heap_insert has a relation structure passed in, and we can use the print (p) command to examine it. The set print pretty command makes GDB print structures in a more readable, indented form. Please note that since the structure is passed in as a pointer, you have to put the dereference operator * in front of it to view its contents; otherwise GDB prints only the address held by the pointer.

You can also use the -> or . operator to access a member of the structure, including nested structures.

Continuing.
Breakpoint 1, heap_insert (relation=0x7f3c1d6ca280, tup=0x5653ffa489d8, cid=0, options=0, bistate=0x0)
    at heapam.c:1828
1828    {
# print relation pointer address
(gdb) p relation
$1 = (Relation) 0x7f3c1d6ca280
# dereference relation pointer
(gdb) p *relation
$2 = {rd_locator = {spcOid = 1663, dbOid = 5, relNumber = 16388}, rd_smgr = 0x0, rd_refcnt = 1, rd_backend = -1,
  rd_islocaltemp = false, rd_isnailed = false, rd_isvalid = true, rd_indexvalid = false, rd_statvalid = false,
  rd_createSubid = 0, rd_newRelfilelocatorSubid = 0, rd_firstRelfilelocatorSubid = 0, rd_droppedSubid = 0,
  rd_rel = 0x7f3c1d6ca488, rd_att = 0x7f3c1d6ca590, rd_id = 16388, rd_lockInfo = {lockRelId = {relId = 16388,
      dbId = 5}}, rd_rules = 0x0, rd_rulescxt = 0x0, trigdesc = 0x0, rd_rsdesc = 0x0, rd_fkeylist = 0x0,
  rd_fkeyvalid = false, rd_partkey = 0x0, rd_partkeycxt = 0x0, rd_partdesc = 0x0, rd_pdcxt = 0x0,
  rd_partdesc_nodetached = 0x0, rd_pddcxt = 0x0, rd_partdesc_nodetached_xmin = 0, rd_partcheck = 0x0,
  rd_partcheckvalid = false, rd_partcheckcxt = 0x0, rd_indexlist = 0x0, rd_pkindex = 0, rd_replidindex = 0,
  rd_statlist = 0x0, rd_attrsvalid = false, rd_keyattr = 0x0, rd_pkattr = 0x0, rd_idattr = 0x0,
  rd_hotblockingattr = 0x0, rd_summarizedattr = 0x0, rd_pubdesc = 0x0, rd_options = 0x0, rd_amhandler = 3,
  rd_tableam = 0x5653ff75aaa0 <heapam_methods>, rd_index = 0x0, rd_indextuple = 0x0, rd_indexcxt = 0x0,
  rd_indam = 0x0, rd_opfamily = 0x0, rd_opcintype = 0x0, rd_support = 0x0, rd_supportinfo = 0x0,
  rd_indoption = 0x0, rd_indexprs = 0x0, rd_indpred = 0x0, rd_exclops = 0x0, rd_exclprocs = 0x0,
  rd_exclstrats = 0x0, rd_indcollation = 0x0, rd_opcoptions = 0x0, rd_amcache = 0x0, rd_fdwroutine = 0x0,
  rd_toastoid = 0, pgstat_enabled = true, pgstat_info = 0x0}
# dereference particular member within relation
(gdb) set print pretty
(gdb) p *relation->rd_rel
$3 = {
  oid = 16388,
  relname = {
    data = "mytable", '\000' <repeats 56 times>
  },
  relnamespace = 2200,
  reltype = 16390,
  reloftype = 0,
  relowner = 10,
  relam = 2,
...

Examine Memory Block

In addition to print, we can examine a block of memory with the examine (x) command. For example, we can first print the HeapTuple structure (which represents a row of data) to learn the length of the tuple from its t_len field, and then use the x command to dump that many bytes starting at its t_data pointer. Examine is better than print here because t_data points to a tuple header followed by variable-length user data, and that user data is not described by any structure field that print could show.

The following example shows that with the examine command, I can easily see the string content from the insert command (insert into mytable values(1, 'fff')). The trailing 0x66 0x66 0x66 bytes correspond to the input data ‘fff’.

# learn the length of the tuple (t_len, which is 32)
(gdb) p *tup
$4 = {
  t_len = 32,
  t_self = {
    ip_blkid = {
      bi_hi = 65535,
      bi_lo = 65535
    },
    ip_posid = 0
  },
  t_tableOid = 16388,
  t_data = 0x5653ffa489f0
}
# examine the memory contents of tup->t_data
(gdb) x/32bx tup->t_data
0x5653ffa489f0: 0x80    0x00    0x00    0x00    0xff    0xff    0xff    0xff
0x5653ffa489f8: 0xc9    0x08    0x00    0x00    0xff    0xff    0xff    0xff
0x5653ffa48a00: 0x00    0x00    0x02    0x00    0x02    0x00    0x18    0x00
0x5653ffa48a08: 0x01    0x00    0x00    0x00    0x09    0x66    0x66    0x66
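To make the dump less opaque, here is a rough Python sketch that decodes the same 32 bytes. It rests on several assumptions of mine that the session above does not state: a little-endian x86-64 machine, the PostgreSQL 16 tuple header layout (12 bytes of t_choice, 6 bytes of t_ctid, then t_infomask2, t_infomask and t_hoff), and a mytable whose two columns are an integer and a short text value. The function name decode_tuple is made up for this illustration.

```python
import struct

# The 32 bytes printed by "x/32bx tup->t_data" above.
RAW = bytes([
    0x80, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
    0xc9, 0x08, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
    0x00, 0x00, 0x02, 0x00, 0x02, 0x00, 0x18, 0x00,
    0x01, 0x00, 0x00, 0x00, 0x09, 0x66, 0x66, 0x66,
])


def decode_tuple(raw):
    """Decode a freshly formed heap tuple (assumed layout, see text)."""
    # At this point the first word appears to hold a 4-byte varlena
    # length, stored shifted left by two bits on little-endian machines.
    (first_word,) = struct.unpack_from("<I", raw, 0)
    datum_len = first_word >> 2

    # Offset 18 = 12 bytes of t_choice + 6 bytes of t_ctid.
    infomask2, infomask, hoff = struct.unpack_from("<HHB", raw, 18)
    natts = infomask2 & 0x07FF  # low 11 bits hold the attribute count

    # User data starts t_hoff bytes into the tuple.
    data = raw[hoff:]
    (col1,) = struct.unpack_from("<i", data, 0)  # assumed 4-byte integer

    # Assumed short varlena: 1-byte header with the low bit set, and the
    # total length (header included) in the upper 7 bits.
    header = data[4]
    if not header & 0x01:
        raise ValueError("expected a 1-byte varlena header")
    total = (header >> 1) & 0x7F
    col2 = data[5:4 + total].decode("ascii")

    return {
        "datum_len": datum_len,
        "natts": natts,
        "t_infomask": infomask,
        "t_hoff": hoff,
        "col1": col1,
        "col2": col2,
    }


if __name__ == "__main__":
    print(decode_tuple(RAW))
```

Under those assumptions, the header ends at byte 24 (0x18), the next four bytes are the integer 1, and the 0x09 byte is the one-byte length header in front of the three 0x66 bytes.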

Print Source Code

You can use the list (l) command to list the source code around the current frame. This command seems handy at first, but I rarely use it. It is better to view the entire source code in your favourite IDE (for example, Eclipse or VSCode) with proper syntax highlighting.

List around current frame:

(gdb) l
1823     * reflected into *tup.
1824     */
1825    void
1826    heap_insert(Relation relation, HeapTuple tup, CommandId cid,
1827                            int options, BulkInsertState bistate)
1828    {
1829            TransactionId xid = GetCurrentTransactionId();
1830            HeapTuple       heaptup;
1831            Buffer          buffer;
1832            Buffer          vmbuffer = InvalidBuffer;

List around line 2000:

(gdb) list 2000
1995            pgstat_count_heap_insert(relation, 1);
1996
1997            /*
1998             * If heaptup is a private copy, release it.  Don't forget to copy t_self
1999             * back to the caller's image, too.
2000             */
2001            if (heaptup != tup)
2002            {
2003                    tup->t_self = heaptup->t_self;
2004                    heap_freetuple(heaptup);

Execution Control

Next and Step

Next (n) and step (s) are two of the most commonly used commands for controlling program execution, and they are extremely handy for debugging and tracing. You can also use the print (p) command to observe variable values as you advance through the program.

Breakpoint 1, heap_insert (relation=0x7f3c1d6a5c60, tup=0x5653ffa4a4a8, cid=0, options=0, bistate=0x0) at heapam.c:1828
1828	{
# use the "n" command to execute one line of code. 
(gdb) n
1829		TransactionId xid = GetCurrentTransactionId();
# Continue hitting "enter" key to run the same command as last
(gdb)      
1832		Buffer		vmbuffer = InvalidBuffer;
(gdb) 
1833		bool		all_visible_cleared = false;
(gdb) 
1845		heaptup = heap_prepare_insert(relation, tup, xid, cid, options);
# when a function is reached, use the "s" command to step into the function
(gdb) s     
heap_prepare_insert (relation=0x7f3c1d6a5c60, tup=0x5653ffa4a4a8, xid=751, cid=0, options=0) at heapam.c:2024
2024		if (IsParallelWorker())
# then use the "n" command to execute within this function
(gdb) n    
2029		tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
(gdb) 
2030		tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
(gdb) 
2031		tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
# if "n" is used on a line that calls a function, GDB executes the whole call and stops at the next line
(gdb)      
2032		HeapTupleHeaderSetXmin(tup->t_data, xid);
(gdb) 
2033		if (options & HEAP_INSERT_FROZEN)
(gdb) 
2036		HeapTupleHeaderSetCmin(tup->t_data, cid);
(gdb) 
2037		HeapTupleHeaderSetXmax(tup->t_data, 0); /* for cleanliness */
(gdb) 
2038		tup->t_tableOid = RelationGetRelid(relation);
(gdb) 
2044		if (relation->rd_rel->relkind != RELKIND_RELATION &&
(gdb) 
2051		else if (HeapTupleHasExternal(tup) || tup->t_len > TOAST_TUPLE_THRESHOLD)
(gdb) 
2054			return tup;
(gdb) 
2055	}
(gdb) 
# once the stepped function returns, GDB takes you to the next line back at the caller
heap_insert (relation=0x7f3c1d6a5c60, tup=0x5653ffa4a4a8, cid=0, options=0, bistate=0x0) at heapam.c:1851
1851		buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
(gdb) 
1871		CheckForSerializableConflictIn(relation, NULL, InvalidBlockNumber);
(gdb) 
1874		START_CRIT_SECTION();
(gdb)

Continue Program

Instead of using the n or s commands to execute the program one line at a time, you can use the continue (c) command. Execution continues until the next break point is hit, the program is interrupted by a signal (for example, when you press Ctrl-C), or the program exits.

# use the "c" command to continue the execution of program
(gdb) c    
Continuing.

Follow a Fork

The current process may spawn a new process to handle some task by calling fork(). In this case, GDB by default stays with the parent process and lets the child run without debugging it. What if we would like GDB to trace into the child process instead of the parent? The set follow-fork-mode child command does exactly that.
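The choice GDB has to make at a fork is easier to picture with a tiny program. The Python sketch below is not PostgreSQL code, and the function name fork_once is made up for illustration. It forks once, and the parent and child then take different branches depending on what fork() returned; follow-fork-mode decides which of those two branches the debugger stays with.

```python
import os


def fork_once():
    """Fork once; return (parent_pid, child_pid, message sent by the child)."""
    read_fd, write_fd = os.pipe()
    parent_pid = os.getpid()

    pid = os.fork()
    if pid == 0:
        # Child branch: fork() returned 0 in the new process.
        os.close(read_fd)
        text = f"child {os.getpid()} of {os.getppid()}"
        os.write(write_fd, text.encode())
        os.close(write_fd)
        os._exit(0)  # exit right away without running the parent's cleanup

    # Parent branch: fork() returned the PID of the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        message = pipe.read()  # returns once the child closes its end
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return parent_pid, pid, message


if __name__ == "__main__":
    parent_pid, child_pid, message = fork_once()
    print(f"parent {parent_pid} forked {child_pid}; child said: {message}")
```

With the default mode, a debugger would keep following the parent branch; with follow-fork-mode child, it would continue in the branch where fork() returned 0.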

To demonstrate this on PostgreSQL, we need to attach GDB to the postmaster process, which is responsible for spawning new processes for various purposes. We will pre-populate a table with 1 million records and run SELECT COUNT(*) on it; by default, this causes the postmaster to spawn parallel workers that help execute the query faster.

ps -ef | grep postgres
caryh    3269353       1  0 18:13 ?        00:00:00 /home/caryh/postgres/mydb/bin/postgres -D debugdb   # this is postmaster process
caryh    3269354 3269353  0 18:13 ?        00:00:00 postgres: checkpointer
caryh    3269355 3269353  0 18:13 ?        00:00:00 postgres: background writer
caryh    3269357 3269353  0 18:13 ?        00:00:00 postgres: walwriter
caryh    3269358 3269353  0 18:13 ?        00:00:00 postgres: autovacuum launcher
caryh    3269359 3269353  0 18:13 ?        00:00:00 postgres: logical replication launcher
caryh    3271568 3176393  0 18:15 pts/0    00:00:00 psql -d postgres -p 5432
caryh    3271569 3269353  0 18:15 ?        00:00:00 postgres: caryh postgres [local] idle      # this is the backend process serving sql
caryh    3271762 3238353  0 18:15 pts/1    00:00:00 grep --color=auto postgres

So, let’s populate these records from the existing psql prompt. If your GDB session is still attached to the backend process, exit GDB with the quit command first so that it does not keep stopping at the heap_insert break point. Once the rows are inserted, we can use explain analyze to confirm that PostgreSQL spawns parallel workers to execute the SELECT COUNT(*) query.

postgres=# insert into mytable values(generate_series(1,1000000), 'fff');
INSERT 0 1000000
postgres=# explain analyze select count(*) from mytable;
                                        QUERY PLAN                                                                 
------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=8352.17..8352.18 rows=1 width=8) (actual time=4083.773..4084.069 rows=1 loops=1)
   ->  Gather  (cost=8351.95..8352.16 rows=2 width=8) (actual time=4083.737..4084.045 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=7351.95..7351.96 rows=1 width=8) (actual time=1470.671..1470.673 rows=1 loops=3)
               ->  Parallel Seq Scan on mytable  (cost=0.00..6766.56 rows=234156 width=0) (actual time=0.064..1433.957 rows=333336 loops=3)
 Planning Time: 0.583 ms
 Execution Time: 4084.143 ms
(8 rows)

Now, we can start another GDB session and attach it to PID 3269353, and execute the following actions in order:

  • (gdb) set a break point at fork_process
  • (gdb) continue
  • (psql) run SELECT COUNT(*) FROM mytable;
  • (gdb) continue again if interrupted by receipt of a signal (more on this later)
  • (gdb) the fork_process break point is hit
  • (gdb) set follow-fork-mode child
  • (gdb) set a break point at ParallelWorkerMain
  • (gdb) continue

(gdb) attach 3269353
Attaching to program: /home/caryh/highgo/git/postgres.community/highgo/bin/postgres, process 3269353
Reading symbols from /usr/lib/x86_64-linux-gnu/libssl.so.1.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libz.so.1...(no debugging symbols found)...done.
...
0x00007f3c28d0d907 in epoll_wait (epfd=10, events=0x5653ff965910, maxevents=3, timeout=60000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30	../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
# set break point at fork_process
(gdb) b fork_process
Breakpoint 1 at 0x5653fef9c308: file fork_process.c, line 33.
(gdb) c
Continuing.

Program received signal SIGUSR1, User defined signal 1.
0x00007f3c28d0d907 in epoll_wait (epfd=10, events=0x5653ff965910, maxevents=3, timeout=60000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30	in ../sysdeps/unix/sysv/linux/epoll_wait.c
(gdb) c
Continuing.

Breakpoint 1, fork_process () at fork_process.c:33
33	{
# set follow-fork-mode to child 
(gdb) set follow-fork-mode child
(gdb) b ParallelWorkerMain
Breakpoint 2 at 0x5653fec06128: file parallel.c, line 1263.
(gdb) c
Continuing.

# a new process with PID 3279353 is spawned and GDB is now attached to it
[New process 3279353]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Switching to Thread 0x7f3c2ad7b240 (LWP 3279353)]

# GDB is now at ParallelWorkerMain on the child process fork
Thread 2.1 "postgres" hit Breakpoint 2, ParallelWorkerMain (main_arg=385426468) at parallel.c:1263
1263	{
(gdb) n
1291		InitializingParallelWorker = true;
(gdb) l
1286		char	   *session_dsm_handle_space;
1287		Snapshot	tsnapshot;
1288		Snapshot	asnapshot;
1289	
1290		/* Set flag to indicate that we're initializing a parallel worker. */
1291		InitializingParallelWorker = true;
1292	
1293		/* Establish signal handlers. */
1294		pqsignal(SIGTERM, die);
1295		BackgroundWorkerUnblockSignals();

After the steps above, GDB is attached to the child process with PID 3279353, and we can continue to trace the behavior of the parallel worker from here. We can also see this worker in the ps -ef output.

ps -ef | grep postgres
caryh    3269353       1  0 18:13 ?        00:00:00 /home/caryh/postgres/mydb/bin/postgres -D debugdb   # this is postmaster process
caryh    3269354 3269353  0 18:13 ?        00:00:00 postgres: checkpointer
caryh    3269355 3269353  0 18:13 ?        00:00:00 postgres: background writer
caryh    3269357 3269353  0 18:13 ?        00:00:00 postgres: walwriter
caryh    3269358 3269353  0 18:13 ?        00:00:00 postgres: autovacuum launcher
caryh    3269359 3269353  0 18:13 ?        00:00:00 postgres: logical replication launcher
caryh    3271568 3176393  0 18:15 pts/0    00:00:00 psql -d postgres -p 5432
caryh    3271569 3269353  0 18:15 ?        00:00:00 postgres: caryh postgres [local] idle      # this is the backend process serving sql
caryh    3271762 3238353  0 18:15 pts/1    00:00:00 grep --color=auto postgres
caryh    3279353 3269353  0 11:18 ?        00:00:00 postgres: parallel worker for PID 3271569  # this is the parallel worker 

POSIX Signals

PostgreSQL relies on POSIX signals to notify other processes that they should perform some on-demand task. For example, a backend process may send a SIGUSR1 signal to the postmaster to ask it to spawn a parallel worker that helps process the user’s SELECT COUNT(*) request. By default, signals like this interrupt the GDB session, which lets you trace into the program’s signal handler code.
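As a rough picture of the mechanism only (this is not how PostgreSQL implements it, and the names on_sigusr1 and notify_self are invented for this sketch), the Python snippet below installs a SIGUSR1 handler and then sends that signal to its own process. The handler is the kind of code a debugger gets the chance to stop in when it intercepts the signal.

```python
import os
import signal
import time

received = []


def on_sigusr1(signum, frame):
    # Runs when the process receives SIGUSR1; just record the signal name.
    received.append(signal.Signals(signum).name)


def notify_self(timeout=2.0):
    """Install the handler, send SIGUSR1 to this process, wait for delivery."""
    received.clear()
    signal.signal(signal.SIGUSR1, on_sigusr1)  # must run in the main thread
    os.kill(os.getpid(), signal.SIGUSR1)

    # Give the interpreter a moment to run the handler.
    deadline = time.monotonic() + timeout
    while not received and time.monotonic() < deadline:
        time.sleep(0.01)
    return list(received)


if __name__ == "__main__":
    print(notify_self())
```

In PostgreSQL the sender and the receiver are different processes, but the flow is the same: one side calls kill() with SIGUSR1, and the other side runs its handler.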

GDB can be configured to stop on or to ignore specific signals, depending on your needs. To see the list of signals and how GDB handles each of them, use the info signals command, where:

  • stop: whether GDB should suspend the program when the signal arrives
  • print: whether GDB should print a message when the signal arrives
  • pass to program: whether GDB should forward the signal to the program

Please note that by default, SIGINT (Ctrl-C) and SIGTRAP (used for break points) are not forwarded to the program once GDB receives them, because GDB uses them itself. If you configure GDB to ignore SIGINT or SIGTRAP, GDB may no longer be able to suspend the program at a break point or when you press Ctrl-C. GDB asks for confirmation if you try.

(gdb) info signals
Signal        Stop	Print	Pass to program	Description
SIGHUP        Yes	Yes	Yes		Hangup
SIGINT        Yes	Yes	No		Interrupt
SIGQUIT       Yes	Yes	Yes		Quit
SIGILL        Yes	Yes	Yes		Illegal instruction
SIGTRAP       Yes	Yes	No		Trace/breakpoint trap
SIGABRT       Yes	Yes	Yes		Aborted
SIGEMT        Yes	Yes	Yes		Emulation trap
SIGFPE        Yes	Yes	Yes		Arithmetic exception
SIGKILL       Yes	Yes	Yes		Killed
SIGBUS        Yes	Yes	Yes		Bus error
SIGSEGV       Yes	Yes	Yes		Segmentation fault
SIGSYS        Yes	Yes	Yes		Bad system call
SIGPIPE       Yes	Yes	Yes		Broken pipe
SIGALRM       No	No	Yes		Alarm clock
SIGTERM       Yes	Yes	Yes		Terminated
SIGURG        No	No	Yes		Urgent I/O condition
SIGSTOP       Yes	Yes	Yes		Stopped (signal)
SIGTSTP       Yes	Yes	Yes		Stopped (user)
SIGCONT       Yes	Yes	Yes		Continued
SIGCHLD       No	No	Yes		Child status changed
...
(gdb) handle SIGINT nostop noprint nopass
SIGINT is used by the debugger.
Are you sure you want to change it? (y or n) n
Not confirmed, unchanged.

For all other signals, you are free to configure how GDB behaves when it receives them.

(gdb) handle SIGPIPE nostop noprint pass
Signal        Stop	Print	Pass to program	Description
SIGPIPE       No	No	Yes		Broken pipe
(gdb) handle SIGUSR1 nostop noprint pass
Signal        Stop	Print	Pass to program	Description
SIGUSR1       No	No	Yes		User defined signal 1
(gdb) handle SIGUSR1 stop print nopass
Signal        Stop	Print	Pass to program	Description
SIGUSR1       Yes	Yes	No		User defined signal 1

GDB Debugging Summary

GDB is a powerful tool with a rich set of features. This blog covers some of the most useful commands for debugging or tracing a program with GDB. Refer to the table below for the GDB commands mentioned today:

GDB Command               Description
list (l)                  print source code around the current location
backtrace (bt)            print the back trace to the current break point / location
next (n)                  execute one line of code
step (s)                  step into a function
frame (f)                 change the current frame to one of the frames printed by bt
continue (c)              continue program execution until the next break point, a signal is received, or the program quits
break (b)                 set a break point at a source code location or function name, with or without a condition
info b                    show information about all created break points
delete x                  delete a break point, where x is the ID of a break point shown by info b
print (p)                 print the value or address of a variable or reference
examine (x)               print a block of memory in a user-specified format
run (r)                   start the program under GDB
set follow-fork-mode x    tell GDB which process to follow when the current process forks; x can be parent or child, and the default is parent
info signals              show how GDB handles each kind of signal
handle w x y z            configure how GDB handles a signal, where
                            w = name of a signal listed by info signals
                            x = stop or nostop
                            y = print or noprint
                            z = pass or nopass
quit                      quit GDB
