Skip to content
Home » All Posts » Explore Practical PostgreSQL Substring Use Cases with Examples

Explore Practical PostgreSQL Substring Use Cases with Examples

postgresql substring

If you work with strings a lot, you most likely will use substring function quite frequently for your data manipulation tasks. PostgreSQL provides a very versatile substring function that can handle a lot of string manipulation scenarios. In this blog, we’ll dive into real examples to show you how to make the most of substrings in PostgreSQL 16.

PostgreSQL Substr With Index and Offset Values

Index (one-based) and offset are the simplest ways to extract a substring. Useful for strings with fixed lengths. PostgreSQL provides several methods to specify index bounds.

Set Index Bounds as Arguments

Start index and offset values can be provided to substring function as argument 2 and 3 where offset is an optional argument. For example:

Returns the substring starting at index 1 to 4:

SELECT substring('techbuddies', 1, 4);
 substring
-----------
 tech
(1 row)

Returns the substring from index 5 to the end:

SELECT substring('techbuddies', 5);
 substring
-----------
 buddies
(1 row)

Set Index Bounds with FROM and FOR Clauses

The index bounds can also be set using FROM and FOR clauses; Both are optional of each other, meaning that you can specify FROM without FOR or FOR without FROM. Otherwise they work the same way as arguments. It is also possible to use LENGTH() operator to return the length of the input string, which is handy if you need to compute an index from the end of the string. For example:

Extracts the substring starting at index 1 to 4.

SELECT substring('techbuddies' FROM 1 FOR 4);
 substring
-----------
 tech
(1 row)

Returns the first 5 characters as substring:

SELECT substring('techbuddies.io' FOR 5);
 substring
-----------
 techb
(1 row)

Extracts the last 7 characters from the string and grabs the first 4 as substring

SELECT substring('techbuddies.io' FROM LENGTH('techbuddies.io') - 6 for 4);
 substring
-----------
 dies
(1 row)

PostgreSQL Substr With Regular Expressions

PostgreSQL supports substring extraction using POSIX regular expressions metacharacters. A metacharacter is a special character or character sequence that have specific meanings and functions to define patterns to match and manipulate text. The following list outlines the most commonly used metacharacters:

  • . (Dot): Matches any single character except a newline character. For example, the pattern b.t would match “bat,” “bit,” “bot,” etc.
  • * (Asterisk): Matches zero or more occurrences of the preceding character or subexpression. For example, ab*c would match “ac,” “abc,” “abbc,” etc.
  • + (Plus): Matches one or more occurrences of the preceding character or subexpression. For example, ab+c would match “abc,” “abbc,” “abbbc,” etc.
  • ? (Question Mark): Matches zero or one occurrence of the preceding character or subexpression. For example, ab?c would match “ac” and “abc.”
  • [] (Character Class): Defines a character class and matches any single character from the specified set. For example, [aeiou] would match any vowel character.
  • [^] (Negation): Defines a negated character class and matches any single character not in the specified set. For example, [^0-9] would match any non-digit character.
  • () (Grouping): Groups characters or subexpressions together and allows you to apply quantifiers to the entire group. For example, (ab)+ would match “ab,” “abab,” “ababab,” etc.
  • | (Alternation): Specifies alternation between two or more patterns. For example, cat|dog would match “cat” or “dog.”
  • ^ (Caret): Matches the start of a line or string. For example, ^abc would match “abc” at the start of a line.
  • $ (Dollar Sign): Matches the end of a line or string. For example, abc$ would match “abc” at the end of a line.

Match Words with One or More Dots

Extracts a substring starting with b plus 10 characters following it:

SELECT substring('techbuddies.io', 'b.........');
 substring
------------
 buddies.io
(1 row)

Matches a substring starting with t and ending with h inclusive with arbitrary number of characters in between:

SELECT substring('techbuddies.io', 't.*h');
 substring
-----------
 tech
(1 row)

Use * and + Operators to Match Word Repetition

Extracts a substring that matches techbud with zero or more repetition of d following after. Returns full string even if input string has only one d in techbuddies.io.

select substring('techbudies.io', 'techbudd*.*');
   substring
---------------
 techbudies.io
(1 row)

Returns a substring that matches techbud with one or more repetition of d following after. Returns NULL because input string has only one d in techbudies.io.

select substring('techbudies.io', 'techbudd+.*');
 substring
-----------

(1 row)

Set Match Constraints with $ and ^ Operators

Prints the full string if the string starts with tech

select substring('techbuddies.io', '^tech.*');
    substring
-----------------
 techbuddies.io
(1 row)

Returns io plus 8 chars before if the string ends with io.

select substring('techbuddies.io', '........io$');
 substring
------------
 buddies.io
(1 row)

Set Character Constraints with [abcd] and [^abcd] Operators

[abcd] and [^abcd] can match or un-match any character or range of characters (separated by a hyphen) within the square brackets. It also support digits.

Return substring starting at the first lowercase or uppercase letters and stops until the next non-letter character.

select substring('techbuddies.io', '[a-zA-Z]+');
  substring
-------------
 techbuddies
(1 row)

Return substring starting at the first non-letter character and stops until the next the next character.

postgres=# select substring('techbuddies.io', '[^a-zA-Z]+');
 substring
-----------
 .
(1 row)

Chain Multiple Match Conditions with | Operator

The | operator can be used to define a conditional match utilizing other operators discussed so far. Please note that the first or longest match of the string is always returned regardless the order of conditions separated by |.

Return substring if the string contains either tech or buddies

select substring('techbuddies.io', '(tech|buddies)');
 substring
-----------
 tech
(1 row)

Return first digit-character as substring or if the string matches one or more occurrence of bud or return substring starting at the first lowercase or uppercase letters and ends before a non-letter character is met.

select substring('techbuddies.io', '^[1-9]|bud+|tech|[a-z]+');
  substring
-------------
 techbuddies
(1 row)

Extract a Sub Expression with () Operator

The () brackets can be used to extract a sub-expression to fine-tune substring output. This is very handy in many substring scenarios.

Without brackets, the function returns the whole string:

select substring('techbuddies.io', '^tech.*dies');
  substring
-------------
 techbuddies
(1 row)

With brackets, the function returns the substring inside the brackets:

select substring('techbuddies.io', '^tech(.*)dies');
 substring
-----------
 bud
(1 row)

Other Examples

Extract the website name from a URL string:

select substring('https://techbuddies.io/home', '://(.*)/');
   substring
----------------
 techbuddies.io
(1 row)

Match a value after certain character:

select substring('a=techbuddies.io', '=(.*)buddies');
 substring
-----------
 tech
(1 row)

Obtain the value from a JSON key-value pair:

select substring('"key":"value"', '"key":"(.*)"');
 substring
-----------
 value
(1 row)

PostgreSQL Substr With Regular Expressions Class-Shorthand Escapes

PostgreSQL also supports substring extraction using class-shorthand escapes, which are very similar to common regular expressions described above but using special escape character sequence to represent. The tables below are taken directly from PostgreSQL documentation to showcase all supported escape character sequences.

EscapeDescription
\dmatches any digit, like [[:digit:]]
\smatches any whitespace character, like [[:space:]]
\wmatches any word character, like [[:word:]]
\Dmatches any non-digit, like [^[:digit:]]
\Smatches any non-whitespace character, like [^[:space:]]
\Wmatches any non-word character, like [^[:word:]]
Table 9.21. Regular Expression Class-Shorthand Escapes
EscapeDescription
\Amatches only at the beginning of the string (see Section 9.7.3.5 for how this differs from ^)
\mmatches only at the beginning of a word
\Mmatches only at the end of a word
\ymatches only at the beginning or end of a word
\Ymatches only at a point that is not the beginning or end of a word
\Zmatches only at the end of the string (see Section 9.7.3.5 for how this differs from $)
Table 9.22. Regular Expression Constraint Escapes

Match with Character Constraint Escapes

Extracts the first word character sequence and stops until the next non-word character (such as a space or a digit):

SELECT substring('welcome 123 techbuddies.io 456' FROM '(\w+)');
 substring
-----------
 welcome
(1 row)

Extracts the first digit character sequence and stops until the next non-digit character (such as a space or a word):

SELECT substring('welcome 123 techbuddies.io 456' FROM '(\d+)');
 substring
-----------
 123
(1 row)

Un-Match with Character Constraint Escapes

Returns the first non-digit character sequence as substring

SELECT substring('welcome 123 techbuddies.io 456' FROM '(\D+)');
 substring
-----------
 welcome
(1 row)

Returns the first non-word and non-space character sequence as substring.

SELECT substring('welcome 123 techbuddies.io 456' FROM '(\W\S+)');
 substring
-----------
  123
(1 row)

Summary

PostgreSQL substring is a simple but yet versatile function that can save you a lot of development efforts on string manipulations. I hope the examples used in this blog can provide you with the necessary information to simply your string manipulation efforts.

Reference

Tags:

8 Comment on this post

  1. Pingback: Cary Huang: Explore Practical PostgreSQL Substring Use Cases With Examples - RSSFeedsCloud

  2. You missed an important quantifier in your example : the brackets {} quantifier. The example that you gave of matching words with one or more dots:

    SELECT substring(‘techbuddies.io’, ‘b………’);

    would normally be more succinctly expressed as SELECT substring(‘techbuddies.io’, ‘b.{10}’);

    That is much less error-prone than stringing a bunch of dots together.

    Your next example states:

    Matches a substring starting with t and ending with h inclusive with exactly 2 characters in between:
    SELECT substring(‘techbuddies.io’, ‘t.*h’);

    That is just wrong. The asterick, as stated in the documentation you posted, matches zero or more characters. Perhaps you meant SELECT substring(‘techbuddies.io’, ‘t..h’);

    1. Thank you for pointing out. You are right, SELECT substring(‘techbuddies.io’, ‘b.{10}’); is a much better way to do the substring. Yes, this example “SELECT substring(‘techbuddies.io’, ‘t.*h’);” does not match the description. I changed the example but forgot to update the description :p

    2. I ran “SELECT substring(‘techbuddies.io’, ‘b.{10}’);” on PostgreSQL 16 but it returns empty instead. Am I missing something here?
      I also tried “SELECT substring(‘techbuddies.io’, strpos(‘techbuddies.io’, ‘b’), 10);” which returns the same thing, but the command is quite lengthy…

  3. I’m not convinced this is 100% correct:
    “Matches a substring starting with t and ending with h inclusive with exactly 2 characters in between:”
    SELECT substring(‘techbuddies.io’, ‘t.*h’);
    substring
    ———–
    tech
    (1 row)

    According to regex101.com, .* will match any character, 0 to unlimited times. See here for examples: https://regex101.com/r/FBTZLy/1

Join the conversation

Your email address will not be published. Required fields are marked *