Assume this dataset comes from an ingestion job:
| order_id | country | product | amount |
|---|---|---|---|
| 1 | US | Book | 120 |
| 2 | IN | Pen | 20 |
| 3 | US | Book | 300 |
| 4 | IN | Pencil | 10 |
| 5 | US | Book | 150 |
Schema:
order_id INT
country STRING
product STRING
amount INT
Example:
INSERT INTO TABLE orders
SELECT * FROM staging_orders;
There is no row-by-row insert into Parquet.
Instead:
- Execution engine (Spark / Hive / Impala) runs a distributed job
- Each task produces one or more Parquet files
- Files are immutable and written once
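As an illustration, here is a minimal single-process sketch of what one task effectively does when it writes its slice of rows, using pyarrow (the file name `orders.parquet` is just an assumption used by the later sketches):

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Build the example table in memory (what a task has buffered).
orders = pa.table({
    "order_id": pa.array([1, 2, 3, 4, 5], type=pa.int32()),
    "country":  ["US", "IN", "US", "IN", "US"],
    "product":  ["Book", "Pen", "Book", "Pencil", "Book"],
    "amount":   pa.array([120, 20, 300, 10, 150], type=pa.int32()),
})

# One call writes one complete, immutable Parquet file.
pq.write_table(orders, "orders.parquet", compression="snappy")
```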
Execution engine:
- Reads rows
- Buffers them in memory
- Groups them into row groups
Typical row group size:
128 MB (default)
Our example is tiny, so assume:
Row Group 1 contains all 5 rows
Row Group = horizontal partition
Row Group 1:
(1, US, Book, 120)
(2, IN, Pen, 20)
(3, US, Book, 300)
(4, IN, Pencil, 10)
(5, US, Book, 150)
Row groups are independent units for:
- Parallelism
- Skipping data
- Compression
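A sketch of how the writer's row-group size can be controlled (pyarrow's `row_group_size` counts rows, not bytes; `orders.parquet` is the file from the sketch above):

```python
import pyarrow.parquet as pq

orders = pq.read_table("orders.parquet")

# Force tiny row groups of 2 rows each: the 5 rows land in 3 row groups
# instead of the single row group the default (~128 MB worth) would produce.
pq.write_table(orders, "orders_small_rg.parquet", row_group_size=2)

pf = pq.ParquetFile("orders_small_rg.parquet")
print(pf.metadata.num_row_groups)          # 3
print(pf.metadata.row_group(0).num_rows)   # 2
```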
Inside each row group, data is split by column.
Row Group 1
├── order_id column chunk
├── country column chunk
├── product column chunk
└── amount column chunk
order_id → [1, 2, 3, 4, 5]
country → [US, IN, US, IN, US]
product → [Book, Pen, Book, Pencil, Book]
amount → [120, 20, 300, 10, 150]
Each column chunk is written contiguously on disk.
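This per-row-group, per-column layout is visible in the file metadata; a sketch, again assuming the `orders.parquet` file written earlier:

```python
import pyarrow.parquet as pq

pf = pq.ParquetFile("orders.parquet")
rg = pf.metadata.row_group(0)
for i in range(rg.num_columns):
    col = rg.column(i)                     # one column chunk
    print(col.path_in_schema,              # order_id, country, product, amount
          col.file_offset,                 # where the chunk starts in the file
          col.total_compressed_size)       # contiguous bytes on disk
```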
Column chunks are further split into pages (default ~1MB).
Example: country column chunk
Page 1:
US, IN, US, IN, US
Each page contains:
- Encoded values
- A small page header
Dictionary-encoded columns store their dictionary in a separate dictionary page at the start of the column chunk, and every page is compressed before it is written.
country:
Dictionary:
0 → US
1 → IN
Data:
[0, 1, 0, 1, 0]
product:
Dictionary:
0 → Book
1 → Pen
2 → Pencil
Data:
[0, 1, 0, 2, 0]
amount:
[120, 20, 300, 10, 150]
→ Plain, delta-encoded, or dictionary-encoded (with RLE/bit-packed indices), whichever the writer finds smallest
Encoding dramatically reduces size.
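The chosen encodings can be inspected per column chunk; a sketch (the exact encoding names vary by writer):

```python
import pyarrow.parquet as pq

pf = pq.ParquetFile("orders.parquet")
country = pf.metadata.row_group(0).column(1)   # 'country' column chunk
print(country.encodings)                # e.g. ('RLE_DICTIONARY', 'PLAIN', 'RLE')
print(country.dictionary_page_offset)   # set when a dictionary page was written
```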
Encoded pages are compressed:
Common codecs:
- Snappy (default)
- GZIP
- ZSTD
Result:
Encoded + compressed byte stream
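The codec is a write-time choice; a sketch comparing codecs on the same table (sizes will vary with the data and library version):

```python
import os
import pyarrow.parquet as pq

orders = pq.read_table("orders.parquet")

for codec in ("snappy", "gzip", "zstd"):
    path = f"orders.{codec}.parquet"
    pq.write_table(orders, path, compression=codec)
    print(codec, os.path.getsize(path), "bytes")
```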
Each page header and column chunk also records metadata:
- Number of values
- Encoding type
- Compressed size
- min / max values
- null count
- value count
Example:
amount:
min = 10
max = 300
Each row group's metadata records:
- Total rows
- Total size
- Column chunk offsets
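Both levels of metadata are readable without touching the data pages; a sketch:

```python
import pyarrow.parquet as pq

pf = pq.ParquetFile("orders.parquet")
rg = pf.metadata.row_group(0)
print(rg.num_rows, rg.total_byte_size)    # row-group totals

amount = rg.column(3)                     # 'amount' column chunk
stats = amount.statistics
print(stats.min, stats.max,               # 10, 300
      stats.null_count)                   # 0
```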
Parquet writes metadata at the end of the file.
Footer contains:
- Full schema
- Row group locations
- Column chunk offsets
- Statistics
Why is the footer at the end? → The writer doesn't know the offsets until the data has been written.
The resulting file layout:
[Magic Bytes]
[Row Group 1]
[order_id column chunk]
[country column chunk]
[product column chunk]
[amount column chunk]
[Footer Metadata]
[Magic Bytes]
This entire file is then split into HDFS blocks.
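Because everything a reader needs lives in the footer, one small read at the end of the file yields the schema and all offsets; a sketch with the `orders.parquet` file from the earlier sketches:

```python
import pyarrow.parquet as pq

pf = pq.ParquetFile("orders.parquet")
meta = pf.metadata                  # parsed from the footer only
print(meta.num_rows, meta.num_row_groups, meta.serialized_size)
print(pf.schema_arrow)              # full schema, no data pages read yet
```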
Now run a query:
SELECT order_id, amount
FROM orders
WHERE country = 'US'
AND amount > 100;
The engine first reads the footer to get:
- Schema
- Row group metadata
Check statistics:
country.min = IN
country.max = US
amount.min = 10
amount.max = 300
'US' falls inside [IN, US] and 300 > 100, so the row group cannot be skipped → read it
Only the required columns are read:
order_id
amount
country (needed only to evaluate the filter)
❌ product column never read
country = 'US':
- Dictionary scan
- Page-level filtering
amount > 100:
- Page skipped if max <= 100
The engine then:
- Decompresses pages
- Decodes values
- Applies the filters
- Reconstructs rows
Result:
(1, 120)
(3, 300)
(5, 150)
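The same selective read can be reproduced outside a SQL engine; a sketch with pyarrow's reader, which applies the column projection and pushes the predicates down to row-group and page statistics:

```python
import pyarrow.parquet as pq

result = pq.read_table(
    "orders.parquet",
    columns=["order_id", "amount"],                          # projection
    filters=[("country", "=", "US"), ("amount", ">", 100)],  # pushdown
)
print(result.to_pylist())
# [{'order_id': 1, 'amount': 120}, {'order_id': 3, 'amount': 300},
#  {'order_id': 5, 'amount': 150}]
```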
Parquet files are immutable, so updates and deletes work differently:
✔ Append new Parquet files
❌ No in-place modification
Instead:
- Rewrite files
- Partition overwrite
- Use Iceberg / Delta / Hudi
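A minimal sketch of the "rewrite, don't modify" pattern with plain pyarrow (table formats like Iceberg/Delta/Hudi automate this bookkeeping):

```python
import pyarrow.compute as pc
import pyarrow.parquet as pq

# "Delete" order 2 by writing a new file without it; the old file is
# replaced or retired, never patched in place.
orders = pq.read_table("orders.parquet")
kept = orders.filter(pc.not_equal(orders["order_id"], 2))
pq.write_table(kept, "orders.v2.parquet")
```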
In a real system, the table is a directory of part files:
/orders/
part-00000.snappy.parquet
part-00001.snappy.parquet
part-00002.snappy.parquet
Each file:
- Independently readable
- Independently skippable
- Parallelizable
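A sketch of reading such a directory as one logical table (the /orders/ path is the example layout above):

```python
import pyarrow.dataset as ds

# Each part file's footer is consulted independently, so files can be
# skipped via statistics and scanned in parallel.
orders = ds.dataset("/orders/", format="parquet")
table = orders.to_table(
    columns=["order_id", "amount"],
    filter=(ds.field("country") == "US") & (ds.field("amount") > 100),
)
```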
Parquet stores columns inside row groups, encoded and compressed, with rich metadata at the end so query engines can avoid reading most of the data.