@BexTuychiev
Created February 19, 2026 21:12
Gemini 3 Flash agentic vision - Section 6: Multi-Step Extraction and Plotting (full output)
============================================================
AGENTIC VISION - MULTI-STEP EXTRACTION
============================================================
[THINK] **My Approach to Data Extraction and Analysis from the Image**
Okay, so I'm looking at this image, and the goal is to pull some data from a table, specifically the first 10 rows. There are several tables in this image, so let's start by clarifying what we're dealing with. The image contains tax tables, and it looks like there are a few options. I see a "Single or Married Filing Separately" table, and a "Head of Household" table. I'll focus on one, for simplicity and clarity. The one at the top
[CODE]
import PIL.Image
import PIL.ImageDraw
# Load the image to get its dimensions
img = PIL.Image.open('input_file_0.jpeg')
width, height = img.size
# Define regions of interest to find the table
# The table is roughly in the top half of the image.
# Let's zoom in on the "Single or Married Filing Separately" table.
# It starts from roughly (340, 160) to (1000, 600) in normalized coordinates.
# Let's do a wider crop first to identify headers and rows better.
objs = [
    {'box_2d': [0, 200, 600, 800], 'label': 'top_table_area'},
    {'box_2d': [160, 360, 480, 850], 'label': 'single_filing_table'},
    {'box_2d': [430, 70, 720, 560], 'label': 'head_of_household_table'}
]
# Just crop the Single filing table for now as it's very clear.
# Let's zoom in more to read values.
# The table headers are around y=180-350.
# Rows are on the left.
def get_crop(box, img_w, img_h):
    ymin, xmin, ymax, xmax = box
    return img.crop((xmin * img_w / 1000, ymin * img_h / 1000, xmax * img_w / 1000, ymax * img_h / 1000))
crop1 = get_crop([160, 360, 480, 850], width, height)
crop1.save('single_filing_table.png')
# Also crop the headers and first few rows specifically
crop2 = get_crop([160, 360, 350, 850], width, height) # Headers and top rows
crop2.save('table_zoom.png')
print(f'Image size: {width}x{height}')
[RESULT] (Outcome.OUTCOME_OK)
Image size: 2250x1500
[IMAGE] image/png, 571,849 bytes
[IMAGE] image/png, 929,077 bytes
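[NOTE] The `get_crop` helper in the step above treats each `box_2d` as `[ymin, xmin, ymax, xmax]` on a 0-1000 grid and scales it by the image size. A minimal standalone sketch of that conversion follows, without PIL. The name `to_pixel_box`, the clamping, the integer truncation, and the empty-box check are assumptions added for illustration and are not part of the logged code.

```python
def to_pixel_box(box_2d, img_w, img_h, scale=1000):
    """Convert [ymin, xmin, ymax, xmax] on a 0..scale grid to a
    (left, upper, right, lower) pixel tuple, the order PIL's crop expects."""
    ymin, xmin, ymax, xmax = box_2d

    def clamp(value, limit):
        # Truncate to whole pixels and keep the result inside the image.
        return max(0, min(limit, int(value)))

    left = clamp(xmin * img_w / scale, img_w)
    upper = clamp(ymin * img_h / scale, img_h)
    right = clamp(xmax * img_w / scale, img_w)
    lower = clamp(ymax * img_h / scale, img_h)
    if right <= left or lower <= upper:
        raise ValueError(f"empty box after conversion: {box_2d}")
    return (left, upper, right, lower)


# Image size reported in the log above: 2250x1500.
print(to_pixel_box([160, 360, 480, 850], 2250, 1500))
```

Note the axis order: the box lists y before x, while the pixel tuple lists x before y, so swapping the two is an easy mistake when writing crops by hand.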
[THINK] **Analyzing the "Single or Married Filing Separately" Table**
Okay, so I'm looking at this crop, `single_filing_table.png`, and it's clear this is a portion of the "Single or Married Filing Separately" table. My initial thought is to identify the structure, the rows and columns. I see the column headers immediately: "$30,000 - 39,999", "$40,000 - 49,999", and so on. I need to figure out the rows, which represent the "Higher Paying Job Annual Taxable Wage & Salary."
Right, I need to find the fu
[CODE]
# Crop the left side and top of the first table to get headers
crop_top_left = get_crop([100, 50, 500, 400], width, height)
crop_top_left.save('table_top_left.png')
# The "Single or Married Filing Separately" table columns start at some point.
# Let's see the headers for that section.
# The table is roughly from row $0 to something.
# Let's get the headers for the columns as well.
crop_cols = get_crop([100, 250, 250, 850], width, height)
crop_cols.save('table_cols.png')
[IMAGE] image/png, 554,079 bytes
[IMAGE] image/png, 736,870 bytes
[THINK] **Analyzing the Tax Table: Decoding the "Single or Married Filing Separately" Section**
Okay, so I'm looking at this tax table, specifically the section labeled "Single or Married Filing Separately." My initial goal is to extract some data, and it looks like the user wants data from the first 10 rows and a few columns. I'll need to figure out the table's structure. First, I noted that the "Single or Married Filing Separately" label acts as a header for multiple columns, defining a specific sect
[CODE]
# Zoom into the first 10 rows and first 4 columns of the Single table
crop_data = get_crop([160, 250, 400, 550], width, height)
crop_data.save('data_top_rows.png')
[IMAGE] image/png, 475,103 bytes
[THINK] **Data Extraction and Visualization Plan**
Okay, so I need to extract data from what looks like a table representing some kind of financial or statistical information. The prompt specifies "the first 10 rows," which is straightforward enough, but I need to make sure I'm grabbing the right headers and data. First, I'll identify the column headers. The images are a bit fragmented, but I can see the column headers "$0 - 9,999", "$10,000 - 19,999", and so on.
Next, I need the corresponding row lab
[CODE]
import matplotlib.pyplot as plt
import pandas as pd
data = [
    {"Higher Paying Job Wage": "$0 - 9,999", "$0 - 9,999": 0, "$10,000 - 19,999": 940, "$20,000 - 29,999": 1020, "$30,000 - 39,999": 1020},
    {"Higher Paying Job Wage": "$10,000 - 19,999", "$0 - 9,999": 940, "$10,000 - 19,999": 1540, "$20,000 - 29,999": 1620, "$30,000 - 39,999": 2020},
    {"Higher Paying Job Wage": "$20,000 - 29,999", "$0 - 9,999": 1020, "$10,000 - 19,999": 1620, "$20,000 - 29,999": 2100, "$30,000 - 39,999": 3100},
    {"Higher Paying Job Wage": "$30,000 - 39,999", "$0 - 9,999": 1020, "$10,000 - 19,999": 2020, "$20,000 - 29,999": 3100, "$30,000 - 39,999": 4100},
    {"Higher Paying Job Wage": "$40,000 - 59,999", "$0 - 9,999": 1870, "$10,000 - 19,999": 3470, "$20,000 - 29,999": 4550, "$30,000 - 39,999": 5550},
    {"Higher Paying Job Wage": "$60,000 - 79,999", "$0 - 9,999": 1870, "$10,000 - 19,999": 3470, "$20,000 - 29,999": 4690, "$30,000 - 39,999": 5890},
    {"Higher Paying Job Wage": "$80,000 - 99,999", "$0 - 9,999": 2000, "$10,000 - 19,999": 3810, "$20,000 - 29,999": 5090, "$30,000 - 39,999": 6290},
    {"Higher Paying Job Wage": "$100,000 - 124,999", "$0 - 9,999": 2040, "$10,000 - 19,999": 3840, "$20,000 - 29,999": 5120, "$30,000 - 39,999": 6320},
    {"Higher Paying Job Wage": "$125,000 - 149,999", "$0 - 9,999": 2040, "$10,000 - 19,999": 3840, "$20,000 - 29,999": 5120, "$30,000 - 39,999": 6910},
    {"Higher Paying Job Wage": "$150,000 - 174,999", "$0 - 9,999": 2220, "$10,000 - 19,999": 4830, "$20,000 - 29,999": 6910, "$30,000 - 39,999": 8910}
]
df = pd.DataFrame(data)
# Print as a structured format (table)
print(df.to_string())
# Create a bar chart
df.set_index("Higher Paying Job Wage").plot(kind='bar', figsize=(12, 6))
plt.title('Tax Table: Single or Married Filing Separately')
plt.ylabel('Tax Amount')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.savefig('tax_bar_chart.png')
# Highest and lowest
# Sum of each row to compare? Or just absolute highest/lowest in the data?
# Let's say which row has the maximum and minimum total tax amount among these columns.
df['Total'] = df.iloc[:, 1:].sum(axis=1)
highest_row = df.loc[df['Total'].idxmax(), 'Higher Paying Job Wage']
lowest_row = df.loc[df['Total'].idxmin(), 'Higher Paying Job Wage']
print(f"Row with highest total values in these columns: {highest_row}")
print(f"Row with lowest total values in these columns: {lowest_row}")
[RESULT] (Outcome.OUTCOME_OK)
   Higher Paying Job Wage  $0 - 9,999  $10,000 - 19,999  $20,000 - 29,999  $30,000 - 39,999
0              $0 - 9,999           0               940              1020              1020
1        $10,000 - 19,999         940              1540              1620              2020
2        $20,000 - 29,999        1020              1620              2100              3100
3        $30,000 - 39,999        1020              2020              3100              4100
4        $40,000 - 59,999        1870              3470              4550              5550
5        $60,000 - 79,999        1870              3470              4690              5890
6        $80,000 - 99,999        2000              3810              5090              6290
7      $100,000 - 124,999        2040              3840              5120              6320
8      $125,000 - 149,999        2040              3840              5120              6910
9      $150,000 - 174,999        2220              4830              6910              8910
Row with highest total values in these columns: $150,000 - 174,999
Row with lowest total values in these columns: $0 - 9,999
[IMAGE] image/png, 52,545 bytes
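[NOTE] The step above ranks rows by their summed values, which is not the same question as "which single cell is largest or smallest". A dependency-free sketch that computes both from the values printed in the result is given here as a cross-check. The names `row_totals` and `extreme_cell` are hypothetical helpers, not part of the logged run.

```python
# Values copied from the extracted table in the log (first 10 rows, 4 columns).
columns = ["$0 - 9,999", "$10,000 - 19,999", "$20,000 - 29,999", "$30,000 - 39,999"]
table = {
    "$0 - 9,999": [0, 940, 1020, 1020],
    "$10,000 - 19,999": [940, 1540, 1620, 2020],
    "$20,000 - 29,999": [1020, 1620, 2100, 3100],
    "$30,000 - 39,999": [1020, 2020, 3100, 4100],
    "$40,000 - 59,999": [1870, 3470, 4550, 5550],
    "$60,000 - 79,999": [1870, 3470, 4690, 5890],
    "$80,000 - 99,999": [2000, 3810, 5090, 6290],
    "$100,000 - 124,999": [2040, 3840, 5120, 6320],
    "$125,000 - 149,999": [2040, 3840, 5120, 6910],
    "$150,000 - 174,999": [2220, 4830, 6910, 8910],
}


def row_totals(rows):
    """Sum each row, mirroring the 'Total' column built in the logged code."""
    return {label: sum(values) for label, values in rows.items()}


def extreme_cell(rows, column_names, pick=max):
    """Return (value, row_label, column_label) for the largest cell,
    or the smallest one when pick=min."""
    cells = [
        (value, label, column_names[i])
        for label, values in rows.items()
        for i, value in enumerate(values)
    ]
    return pick(cells, key=lambda cell: cell[0])


totals = row_totals(table)
print("Highest row total:", max(totals, key=totals.get))
print("Lowest row total:", min(totals, key=totals.get))
print("Largest cell:", extreme_cell(table, columns))
print("Smallest cell:", extreme_cell(table, columns, pick=min))
```

For this subset the two approaches agree: the row with the highest total also holds the largest single cell, and the row with the lowest total holds the smallest. That need not hold for other tables, so it helps to state which of the two a summary refers to.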
[THINK] **My Analysis of the Data**
Okay, so I've pulled the data. I have the first 10 rows and 4 columns from that "Single or Married Filing Separately" table. The column headers are clear – thankfully, that's one less thing to worry about. I've already generated the bar chart, so that's done. I've identified the rows with the maximum and minimum total values, and now it's time to craft the final response.
Before I finalize everything, I should double-check those headers to ensure accuracy, and make
[ANSWER] The table extracted from the image is the **"Single or Married Filing Separately"** tax table, which calculates additional tax based on the annual taxable wage and salary of both a higher-paying job (rows) and a lower-paying job (columns).
### Extracted Data (First 10 Rows, First 4 Columns)
| Higher Paying Job Wage | $0 - 9,999 | $10,000 - 19,999 | $20,000 - 29,999 | $30,000 - 39,999 |
| :--- | :--- | :--- | :--- | :--- |
| **$0 - 9,999** | $0 | $940 | $1,020 | $1,020 |
| **$10,000 - 19,999** | $940 | $1,540 | $1,620 | $2,020 |
| **$20,000 - 29,999** | $1,020 | $1,620 | $2,100 | $3,100 |
| **$30,000 - 39,999** | $1,020 | $2,020 | $3,100 | $4,100 |
| **$40,000 - 59,999** | $1,870 | $3,470 | $4,550 | $5,550 |
| **$60,000 - 79,999** | $1,870 | $3,470 | $4,690 | $5,890 |
| **$80,000 - 99,999** | $2,000 | $3,810 | $5,090 | $6,290 |
| **$100,000 - 124,999** | $2,040 | $3,840 | $5,120 | $6,320 |
| **$125,000 - 149,999** | $2,040 | $3,840 | $5,120 | $6,910 |
| **$150,000 - 174,999** | $2,220 | $4,830 | $6,910 | $8,910 |
### Summary of Values
* **Highest Value:** The row for the **$150,000 - 174,999** wage bracket contains the highest values in this subset, with a peak of **$8,910** at the intersection with the $30,000 - 39,999 lower-paying job column.
* **Lowest Value:** The row for the **$0 - 9,999** wage bracket contains the lowest values, starting at **$0** for the lowest combined income category.
A bar chart comparing these values across the 10 rows and 4 columns has been generated and is displayed below.
![Tax Bar Chart](tax_bar_chart.png)
Saving generated images...
Saved: section6_chart_1.png
Saved: section6_chart_2.png
Saved: section6_chart_3.png
Saved: section6_chart_4.png
Saved: section6_chart_5.png
Saved: section6_chart_6.png
Done!