• Home

  • Custom Ecommerce
  • Application Development
  • Database Consulting
  • Cloud Hosting
  • Systems Integration
  • Legacy Business Systems
  • Security & Compliance
  • GIS

  • Expertise

  • About Us
  • Our Team
  • Clients
  • Blog
  • Careers

  • VisionPort

  • Contact
  • Our Blog

    Ongoing observations by End Point Dev people

    Find Text in Any Column of a PostgreSQL Table

    Josh Tolley

    By Josh Tolley
    February 6, 2023

    Three tiers of power lines lead from the left side of the image down towards a power pole directly in the bottom center. The pole sits just higher than trees lining the bottom of the image, haloed by a circular cloud, lit orange by the sunset, spreading to the edges of the image to fill it.

    It’s not uncommon for me to want to find a particular text snippet in a PostgreSQL database. Easy enough, you might say. After all, that’s what databases are for: You feed them a bunch of information, ask them questions in the form of a query, and they give you the answer. So just write a query, right?

    Well, maybe.

    SQL stands for “Structured Query Language”, and the fact that it’s “structured” means not only that the database abides by some defined structure, but that your queries do, too, which implies that you know at the time you’re writing the query where in the structure you want to look. And that’s where the problem arises. What if I know “Kilroy” is somewhere in a table, but I don’t know what column to look in to find him? How do I write that query?

    The first answer I came up with to that question depends on pg_dump: dump the contents of a table, search the results with grep, and there you have it.

    $ pg_dump -t person mydb | grep -i kilroy
    633132  F       NH      \N      Cristen212      J    Kilroy44        1983-09-28 00:00:00     \N      t       \N      \N      \N      \N      F       USA  \N       \N      \N      \N      \N      \N      \N      \N
    

    Here I knew I wanted to look in the person table, so I specified that in the call to pg_dump, but this works as well when I want to search the entire database:

    $ pg_dump  mydb | grep -i kilroy
    716565  public  person  18185   {"session_user": "josh"}        {"id": 633132}  {"last_name": "Feeney44"}       {"last_name": "Kilroy44"}       U       123349  2023-02-03 11:19:40.587883-062023-02-03 11:19:40.587883-06    2023-02-03 11:19:40.592103-06   josh    \N      \N
    633132  F       NH      \N      Cristen212      J    Kilroy44        1983-09-28 00:00:00     \N      t       \N      \N      \N      \N      F       USA  \N       \N      \N      \N      \N      \N      \N      \N
    

    I can switch pg_dump to --inserts mode and thus see the table these entries were found in:

    $ pg_dump --inserts mydb | grep -i kilroy
    INSERT INTO audit.modification_log VALUES (716565, 'public', 'person', 18185, '{"session_user": "josh"}', '{"id": 633132}', '{"last_name": "Feeney44"}', '{"last_name": "Kilroy44"}', 'U', 123349, '2023-02-03 11:19:40.587883-06', '2023-02-03 11:19:40.587883-06', '2023-02-03 11:19:40.592103-06', 'josh', NULL, NULL);
    INSERT INTO public.person VALUES (633132, 'F', 'NH', NULL, 'Cristen212', 'J', 'Kilroy44', '1983-09-28 00:00:00', NULL, true, NULL, NULL, NULL, NULL, 'F', 'USA', NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);
    

    But pg_dump is considerably slower with --inserts turned on (about 200% slower, in my rudimentary testing on this one database), so perhaps I can get fancier with grep and achieve the same thing:

    $ pg_dump mydb | grep -ie kilroy -e '^COPY' | grep -B1 -i kilroy
    COPY audit.modification_log (id, table_schema, table_name, table_relid, app_user_info, id_columns, old_data, new_data, operation, transaction_id, ts_transaction, ts_statement, ts_clock, session_user_name, client_addr, query_text) FROM stdin;
    716565  public  person  18185   {"session_user": "josh"}        {"id": 633132}  {"last_name": "Feeney44"}       {"last_name": "Kilroy44"}       U       123349  2023-02-03 11:19:40.587883-062023-02-03 11:19:40.587883-06    2023-02-03 11:19:40.592103-06   josh    \N      \N
    --
    COPY public.person (id, birth_gender, ethnicity, language, first_name, middle_name, last_name, birth_date, date_of_death, live, days_old_no_birthday, from_pregnant_id, fatal_condition_id, death_status, current_gender, country_of_birth, people_id, entity_id, birth_gender_id, ethnicity_id, primary_language_id, age_type_id, approximate_age_no_birthday, updated_at) FROM stdin;
    633132  F       NH      \N      Cristen212      J    Kilroy44        1983-09-28 00:00:00     \N      t       \N      \N      \N      \N      F       USA  \N       \N      \N      \N      \N      \N      \N      \N
    

    But the other day I realized I could easily search every a single table without leaving the database, thanks to \copy:

    mydb=# \copy person to program 'grep -i kilroy';
    COPY 117
    633132  F       NH      \N      Cristen212      J    Kilroy44        1983-09-28 00:00:00     \N      t       \N      \N      \N      \N      F       USA  \N       \N      \N      \N      \N      \N      \N      \N
    

    \copy requires copying the entire table to the client side, which wasn’t a concern in my instance but could be a problem for some folks, and you need to be a superuser to use server-side COPY ... TO PROGRAM, so there are some limitations to this approach.

    I started thinking about this problem fresh off an encounter with row types, and realized we could do the whole thing in SQL:

    mydb=# select * from person where person::text ~* 'kilroy';
       id   | birth_gender | ethnicity | language | first_name |             middle_name              | last_name |     birth_date      | date_of_death | live | days_old_no_birthday | from_pregnant_id | fatal_condition_id | death_status | current_gender | country_of_birth | people_id | entity_id | birth_gender_id | ethnicity_id | primary_language_id | age_type_id | approximate_age_no_birthday | updated_at
    --------+--------------+-----------+----------+------------+--------------------------------------+-----------+---------------------+---------------+------+----------------------+------------------+--------------------+--------------+----------------+------------------+-----------+-----------+-----------------+--------------+---------------------+-------------+-----------------------------+------------
     633132 | F            | NH        |          | Cristen212 | J | Kilroy44  | 1983-09-28 00:00:00 |               | t    |                      |                  |                    |              | F              | USA              |           |           |                 |              |                     |             |                             |
    (1 row)
    

    This is my new favorite way of solving this problem, when I know what table I’m looking in but don’t know the column. In cases where I don’t know either, there’s always this monstrosity:

    mydb=# select 'select ' || quote_literal(relname) || ' as relname, f.*
    from ' || oid::regclass || ' f where f::text ~* ''kilroy''' from pg_class
    where relkind = 'r' and relnamespace = 'public'::regnamespace; \gexec
    

    This will work only in psql, as it depends on the \gexec command. It composes one query for each table, and then executes them in turn. It produces rather a lot of probably useless output you’ll need to sort through, and if you have (as I do) some tables with a large number of columns, I recommend trying it with \x turned on, to avoid having to page through quite so many results. But it did work, to find the entry I was looking for:

    ... Lots of output
    (0 rows)
    
    -[ RECORD 1 ]---------------+-------------------------------------
    relname                     | person
    id                          | 633132
    birth_gender                | F
    ethnicity                   | NH
    language                    |
    first_name                  | Cristen212
    middle_name                 | J
    last_name                   | Kilroy44
    birth_date                  | 1983-09-28 00:00:00
    date_of_death               |
    live                        | t
    days_old_no_birthday        |
    from_pregnant_id            |
    fatal_condition_id          |
    death_status                |
    current_gender              | F
    country_of_birth            | USA
    people_id                   |
    entity_id                   |
    birth_gender_id             |
    ethnicity_id                |
    primary_language_id         |
    age_type_id                 |
    approximate_age_no_birthday |
    updated_at                  |
    
    (0 rows)
    ... Lots more output
    

    If I’m willing to expand beyond one-liner solutions, I can get psql to filter that useless output for me:

    mydb=# \o | grep -i kilroy | grep -v select
    mydb=# select 'select ' || quote_literal(relname) || ' as rname, f.* from ' || oid::regclass || ' f where f::text ~* ''kilroy'''
    from pg_class where relkind = 'r' and relnamespace = 'public'::regnamespace; \gexec
    mydb=# \o
     person | 633132 | F            | NH        |          | Cristen212 | J | Kilroy44  | 1983-09-28 00:00:00 |               | t    |                      |                  |                    |              | F              | USA              |           |           |                 |              |                     |             |                             |
    

    When I did this, I had to use a final \o to set the output mode back to normal, before I saw results.

    Do you have a different technique to solve this problem? I’m sure there are many other possibilities out there! Comment below!

    postgres data-processing database


    Comments