Write a Python program that connects to a PostgreSQL database using host, port, user, password, and schema name. The program should:
- Loop over each day between a given `BEGIN_DATE` and `END_DATE` (formatted as `YYYYMMDD`). Each date represents a value for the `calc_dt` column.
- For each day:
  - Identify all tables in the given schema that contain a column named `calc_dt`.
  - For each matching table:
    - Check if it has a primary key column that is a single `bigint` with an auto-increment default (PostgreSQL sequence). If so, sort exports by `calc_dt` and this primary key. If not, sort by `calc_dt` only.
    - Select all rows from the table where `calc_dt = current_date` (the day currently being processed in the loop, not PostgreSQL's `CURRENT_DATE`), and export to a Parquet file named `{table}_{calc_dt}.parquet`.
- Use `snappy` compression for a good balance of speed and file size.
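
The requirements above could be sketched roughly as follows. This is a minimal outline under several assumptions that the specification does not state: the `psycopg2` and `pyarrow` packages are used, `calc_dt` is a `DATE` column, both end dates are inclusive, days with no rows produce no file, and "auto-increment" means a `nextval(...)` column default (identity columns are not detected). The function names (`iter_days`, `sortable_pk`, `build_query`, `export_range`) and the `conn_args` and `out_dir` parameters are hypothetical.

```python
from datetime import datetime, timedelta

# Base tables in the schema that have a calc_dt column.
TABLES_SQL = """
SELECT c.table_name
FROM information_schema.columns c
JOIN information_schema.tables t
  ON t.table_schema = c.table_schema AND t.table_name = c.table_name
WHERE c.table_schema = %s
  AND c.column_name = 'calc_dt'
  AND t.table_type = 'BASE TABLE'
ORDER BY c.table_name
"""

# Primary-key columns of one table, with data type and default expression.
PK_SQL = """
SELECT kcu.column_name, col.data_type, col.column_default
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON kcu.constraint_schema = tc.constraint_schema
 AND kcu.constraint_name = tc.constraint_name
JOIN information_schema.columns col
  ON col.table_schema = kcu.table_schema
 AND col.table_name = kcu.table_name
 AND col.column_name = kcu.column_name
WHERE tc.constraint_type = 'PRIMARY KEY'
  AND tc.table_schema = %s
  AND tc.table_name = %s
"""


def iter_days(begin, end):
    """Yield each date from begin to end (YYYYMMDD strings), both inclusive."""
    day = datetime.strptime(begin, "%Y%m%d").date()
    last = datetime.strptime(end, "%Y%m%d").date()
    while day <= last:
        yield day
        day += timedelta(days=1)


def quote_ident(name):
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def sortable_pk(pk_rows):
    """Return the key column if the PK is one bigint with a nextval default."""
    if len(pk_rows) != 1:
        return None  # no primary key, or a composite one
    column, data_type, default = pk_rows[0]
    if data_type == "bigint" and (default or "").startswith("nextval("):
        return column
    return None


def build_query(schema, table, pk):
    """Build the per-day SELECT, ordered by calc_dt and optionally the key."""
    order = "calc_dt" + (f", {quote_ident(pk)}" if pk else "")
    return (f"SELECT * FROM {quote_ident(schema)}.{quote_ident(table)} "
            f"WHERE calc_dt = %s ORDER BY {order}")


def export_range(conn_args, schema, begin, end, out_dir="."):
    """Export one snappy-compressed Parquet file per table per day."""
    # Third-party imports are kept local so the helpers above work without them.
    import os
    import psycopg2
    import pyarrow as pa
    import pyarrow.parquet as pq

    conn = psycopg2.connect(**conn_args)
    try:
        with conn.cursor() as cur:
            for day in iter_days(begin, end):
                cur.execute(TABLES_SQL, (schema,))
                tables = [row[0] for row in cur.fetchall()]
                for table in tables:
                    cur.execute(PK_SQL, (schema, table))
                    pk = sortable_pk(cur.fetchall())
                    cur.execute(build_query(schema, table, pk), (day,))
                    rows = cur.fetchall()
                    if not rows:
                        continue  # assumption: skip days without data
                    names = [d[0] for d in cur.description]
                    data = {n: [r[i] for r in rows] for i, n in enumerate(names)}
                    path = os.path.join(out_dir, f"{table}_{day:%Y%m%d}.parquet")
                    pq.write_table(pa.table(data), path, compression="snappy")
    finally:
        conn.close()
```

A call might look like `export_range({"host": "localhost", "port": 5432, "user": "me", "password": "secret", "dbname": "mydb"}, "public", "20240101", "20240131")`; the `dbname` entry is an extra assumption, since the requirements list only host, port, user, password, and schema. The sketch loads each table-day fully into memory with `fetchall()` and lets `pyarrow` infer column types, so very large tables would likely need a server-side cursor and batched writes. The table and key lookups follow the order given in the requirements, although they could be done once before the day loop.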