Celery tasks with psycopg: ProgrammingError the last operation didn't produce a result


  
Karim - Sept. 23, 2024


I'm working on a project that has:

  1. A PostgreSQL 16.2 database
  2. A Python 3.12 backend using psycopg 3.2.1 and psycopg_pool 3.2.2.
  3. Celery for handling asynchronous tasks.

The Celery tasks use the database pool through the following code:

import os
from psycopg_pool import ConnectionPool
from contextlib import contextmanager

PG_USERNAME = os.getenv('PG_USERNAME')
if not PG_USERNAME:
    raise ValueError("PG_USERNAME is not set")

PG_PASSWORD = os.getenv('PG_PASSWORD')
if not PG_PASSWORD:
    raise ValueError("PG_PASSWORD is not set")

PG_HOST = os.getenv('PG_HOST')
if not PG_HOST:
    raise ValueError("PG_HOST is not set")

PG_PORT = os.getenv('PG_PORT')
if not PG_PORT:
    raise ValueError("PG_PORT is not set")

# Options used to prevent closed connections
# conn_options = f"-c statement_timeout=1800000 -c tcp_keepalives_idle=30 -c tcp_keepalives_interval=30"
conninfo = f'host={PG_HOST} port={PG_PORT} dbname=postgres user={PG_USERNAME} password={PG_PASSWORD}'
connection_pool = ConnectionPool(
    min_size=4,
    max_size=100,
    conninfo=conninfo,
    check=ConnectionPool.check_connection,
    # kwargs={"options": conn_options},  # ConnectionPool has no 'options' argument; extra connect() arguments go in kwargs
)


@contextmanager
def get_db_conn():
    conn = connection_pool.getconn()
    try:
        yield conn
    finally:
        connection_pool.putconn(conn)
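To make the checkout semantics of `get_db_conn()` concrete, here is a minimal, dependency-free sketch. `ToyPool` is a hypothetical stand-in for `psycopg_pool.ConnectionPool` that hands out plain strings instead of real connections. It only illustrates that, inside one process, a checked-out connection is not handed to anyone else until `putconn()` returns it, and that the `finally` clause returns it even when the body raises.

```python
import threading
from contextlib import contextmanager


class ToyPool:
    """Hypothetical stand-in for psycopg_pool.ConnectionPool (strings instead of connections)."""

    def __init__(self, size):
        self._idle = [f"conn-{i}" for i in range(size)]
        self._lock = threading.Lock()

    def getconn(self):
        # Remove a connection from the idle list: no other caller in this process can get it now.
        # (A real pool would wait for a free connection here instead of raising IndexError.)
        with self._lock:
            return self._idle.pop()

    def putconn(self, conn):
        # Make the connection available to other callers again
        with self._lock:
            self._idle.append(conn)

    def idle_count(self):
        with self._lock:
            return len(self._idle)


toy_pool = ToyPool(4)


@contextmanager
def get_db_conn():
    # Same shape as the helper above: check out, yield, always give the connection back
    conn = toy_pool.getconn()
    try:
        yield conn
    finally:
        toy_pool.putconn(conn)
```

Two overlapping `with get_db_conn()` blocks receive different connections, and the idle count goes back to 4 afterwards, even if the body raised. Note that this guarantee only covers callers sharing the same in-memory pool object.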

An example Celery task looks like this:

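import traceback

import psycopg
from psycopg.rows import dict_row
from psycopg.types.json import Jsonb

# app (the Celery instance), logger and get_db_conn are defined elsewhere in the project

@app.task(bind=True)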
def example_task(self, id):
    with get_db_conn() as conn:
        try:
            with conn.cursor(row_factory=dict_row) as cursor:
                test = None
                cursor.execute('SELECT * FROM test WHERE id = %s', (id,))
                try:
                    test = cursor.fetchone()
                except psycopg.errors.ProgrammingError:
                    logger.warning(f'Test log msg')
                    conn.rollback()
                    return
                
                cursor.execute("UPDATE test SET status = 'running' WHERE id = %s", (id,))
                conn.commit()
                
                # Some processing... (defines `properties`, used in the final UPDATE)

                # Fetch another resource needed
                cursor.execute('SELECT * FROM test WHERE id = %s', (test['resource_id'],))
                cursor.fetchone()

                # Update the entry with the result
                cursor.execute("""
                    UPDATE test
                    SET status = 'done', properties = %s
                    WHERE id = %s
                """, (Jsonb(properties),  id))
                conn.commit()
        except Exception as e:
            logger.exception(f'Error: {e}')
            conn.rollback()
            with conn.cursor(row_factory=dict_row) as cursor:
                # Update status to error with exception information
                cursor.execute("""
                    UPDATE test
                    SET status = 'error', error = %s
                    WHERE id = %s
                """, (Jsonb({'error': str(e), 'stacktrace': traceback.format_exc()}), id))
                conn.commit()

The code works most of the time, but sometimes, when several tasks of the same type are launched at the same time, the second fetchone() call fails with psycopg.ProgrammingError: the last operation didn't produce a result.

Meanwhile, on the database side I see the following warning: WARNING: there is already a transaction in progress

I suspect the problem lies in the way I'm handling connections, but I can't find where.

As far as I know, once get_db_conn() has checked out a connection, that connection is not available to other tasks. In theory, then, no two tasks can use the same connection, so there should be no transaction already in progress when the second fetchone() call runs.


The resource exists, as every other task can access it, so that's not the problem.
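The reasoning above holds within a single process; whether it also holds across processes is worth checking. Assumption (not confirmed by the question): the workers use Celery's default prefork pool, and the module that creates `connection_pool` is imported in the parent process before the children are forked. In that case every child starts with its own copy of the same pool object, including any connections it may already have opened, so two children could check out what is really the same server session. The sketch below uses only the standard library (POSIX `os.fork`) and a hypothetical `ToyPool` to show the effect, and contrasts it with a hypothetical `get_pool()` that lazily creates a pool per process.

```python
import os


class ToyPool:
    """Hypothetical stand-in for a connection pool; each 'connection' is labelled with the PID that opened it."""

    def __init__(self, size):
        self._idle = [f"pid{os.getpid()}-conn{i}" for i in range(size)]

    def getconn(self):
        return self._idle.pop()


# Created at import time, i.e. in the parent, before any fork (like the module-level connection_pool)
shared_pool = ToyPool(4)

_pool = None
_pool_pid = None


def get_pool():
    """Return a pool owned by the current process, creating a fresh one after a fork."""
    global _pool, _pool_pid
    if _pool is None or _pool_pid != os.getpid():
        _pool = ToyPool(4)
        _pool_pid = os.getpid()
    return _pool


def checkouts_in_children(pick_pool, n_children=2):
    """Fork n_children processes; each checks out one connection and reports its label through a pipe."""
    read_fd, write_fd = os.pipe()
    pids = []
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:  # child process
            os.close(read_fd)
            conn = pick_pool().getconn()
            os.write(write_fd, (conn + "\n").encode())
            os._exit(0)  # leave immediately without running the parent's remaining code
        pids.append(pid)
    os.close(write_fd)
    for pid in pids:
        os.waitpid(pid, 0)
    with os.fdopen(read_fd) as reader:
        return sorted(line.strip() for line in reader)
```

With `checkouts_in_children(lambda: shared_pool)` both children report the same label, opened under the parent's PID; with `checkouts_in_children(get_pool)` each child reports a connection opened under its own PID. In the real application the equivalent of `get_pool()` would usually create the `ConnectionPool` inside each child, for example from a handler connected to Celery's `worker_process_init` signal. Whether this is actually the cause here remains to be verified, e.g. by logging `os.getpid()` together with `conn.info.backend_pid` in the task: two worker PIDs reporting the same backend PID at the same time would point to a shared session.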

Comments (1)
@MontaF 2 months ago

If the main target row of test and the additional row selected through its test.resource_id foreign key must not be shared between workers, lock them. Otherwise, concurrent workers are likely bumping into each other: they take on the processing of the same row and alter its fields, and the fields of the row it's associated with through resource_id, at unpredictable points between the steps of this operation.
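In PostgreSQL, locking would typically mean `SELECT ... FOR UPDATE` inside the task's transaction. A related approach is to claim the row with a conditional UPDATE and check how many rows it changed, so only one worker moves it to 'running'. The sketch below illustrates that claim idea with the standard-library `sqlite3` module as a stand-in database (not PostgreSQL). The table layout is assumed from the question's queries, and the 'pending' status value is hypothetical. The two sequential calls stand in for two workers; the guarantee comes from each UPDATE re-checking `status` as a single statement.

```python
import sqlite3


def make_db():
    # In-memory stand-in for the question's `test` table (columns assumed from its queries)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, status TEXT, resource_id INTEGER)")
    conn.execute("INSERT INTO test (id, status, resource_id) VALUES (1, 'pending', 2)")
    conn.execute("INSERT INTO test (id, status, resource_id) VALUES (2, 'done', NULL)")
    conn.commit()
    return conn


def claim(conn, row_id):
    """Move a row from 'pending' to 'running'; return True only for the caller that changed it."""
    cur = conn.execute(
        "UPDATE test SET status = 'running' WHERE id = ? AND status = 'pending'",
        (row_id,),
    )
    conn.commit()
    # rowcount is 1 if this call claimed the row, 0 if it was already claimed or not pending
    return cur.rowcount == 1
```

Calling `claim(db, 1)` twice returns `True` and then `False`, so a second worker would skip the row instead of processing it again. In PostgreSQL the same statement can add `RETURNING *` so the winning worker gets the row back in one round trip, and `SELECT ... FOR UPDATE SKIP LOCKED` is the usual alternative when workers pick rows from a queue.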
