Hongbo Miao hongbo-miao
uv run python -m olmocr.pipeline ./localworkspace --markdown --pdfs olmocr-sample.pdf --model allenai/olmOCR-7B-0225-preview-FP8
INFO:olmocr.check:pdftoppm is installed and working.
2025-06-17 21:14:55,004 - __main__ - INFO - Got --pdfs argument, going to add to the work queue
2025-06-17 21:14:55,004 - __main__ - INFO - Loading file at olmocr-sample.pdf as PDF document
2025-06-17 21:14:55,004 - __main__ - INFO - Found 1 total pdf paths to add
Sampling PDFs to calculate optimal length: 100%|██████████████████████████████████████████████| 1/1 [00:00<00:00, 552.46it/s]
2025-06-17 21:14:55,007 - __main__ - INFO - Calculated items_per_group: 166 based on average pages per PDF: 3.00
INFO:olmocr.work_queue:Found 1 total paths
INFO:olmocr.work_queue:0 new paths to add to the workspace
2025-06-17 21:14:55,163 - __main__ - INFO - Starting pipeline with PID 3059549
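The `Calculated items_per_group: 166` line above can be reproduced with simple arithmetic; a minimal sketch, assuming the pipeline targets roughly 500 pages per work group (the 500-page target is an assumption for illustration, not stated in the log):

```python
def items_per_group(avg_pages_per_pdf: float, target_pages: int = 500) -> int:
    # Assumed target of ~500 pages per work group: 500 / 3.00 -> 166,
    # matching the "items_per_group: 166 ... average pages per PDF: 3.00" log line.
    return max(1, int(target_pages / avg_pages_per_pdf))

print(items_per_group(3.00))  # 166
```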
INFO:olmocr.check:pdftoppm is installed and working.
2025-06-17 14:58:03,862 - __main__ - INFO - Got --pdfs argument, going to add to the work queue
2025-06-17 14:58:03,862 - __main__ - INFO - Loading file at olmocr-sample.pdf as PDF document
2025-06-17 14:58:03,862 - __main__ - INFO - Found 1 total pdf paths to add
Sampling PDFs to calculate optimal length: 100%|██████████████████████████████████████████████| 1/1 [00:00<00:00, 337.81it/s]
2025-06-17 14:58:03,866 - __main__ - INFO - Calculated items_per_group: 166 based on average pages per PDF: 3.00
INFO:olmocr.work_queue:Found 1 total paths
INFO:olmocr.work_queue:0 new paths to add to the workspace
2025-06-17 14:58:03,963 - __main__ - INFO - Starting pipeline with PID 2452147
2025-06-17 14:58:03,963 - __main__ - INFO - Downloading model with hugging face 'allenai/olmOCR-7B-0225-
INFO:olmocr.check:pdftoppm is installed and working.
2025-06-17 14:46:27,683 - __main__ - INFO - Got --pdfs argument, going to add to the work queue
2025-06-17 14:46:27,683 - __main__ - INFO - Loading file at olmocr-sample.pdf as PDF document
2025-06-17 14:46:27,683 - __main__ - INFO - Found 1 total pdf paths to add
Sampling PDFs to calculate optimal length: 100%|██████████████████████████████████████████████| 1/1 [00:00<00:00, 491.42it/s]
2025-06-17 14:46:27,686 - __main__ - INFO - Calculated items_per_group: 166 based on average pages per PDF: 3.00
INFO:olmocr.work_queue:Found 1 total paths
INFO:olmocr.work_queue:0 new paths to add to the workspace
2025-06-17 14:46:27,796 - __main__ - INFO - Starting pipeline with PID 2425548
2025-06-17 14:46:27,796 - __main__ - INFO - Downloading model with hugging face 'allenai/olmOCR-7B-0225-
@hongbo-miao
hongbo-miao / gist:fe51beaa5faa2477ddb72c42e1914d96
Last active June 14, 2025 06:35
olmOCR log when run on an NVIDIA GeForce RTX 5090 GPU
root@2fdffe8b8e20:~# python -m olmocr.pipeline ./localworkspace --markdown --pdfs olmocr-sample.pdf
INFO:olmocr.check:pdftoppm is installed and working.
2025-06-14 06:27:39,378 - __main__ - INFO - Got --pdfs argument, going to add to the work queue
2025-06-14 06:27:39,378 - __main__ - INFO - Loading file at olmocr-sample.pdf as PDF document
2025-06-14 06:27:39,378 - __main__ - INFO - Found 1 total pdf paths to add
Sampling PDFs to calculate optimal length: 100%|██████████████████████████████████████████████| 1/1 [00:00<00:00, 530.66it/s]
2025-06-14 06:27:39,381 - __main__ - INFO - Calculated items_per_group: 166 based on average pages per PDF: 3.00
INFO:olmocr.work_queue:Found 1 total paths
INFO:olmocr.work_queue:0 new paths to add to the workspace
/usr/local/lib/python3.11/dist-packages/torch/cuda/__init__.py:235: UserWarning:
@hongbo-miao
hongbo-miao / uv.lock
Created January 31, 2025 09:48
mineru uv.lock
version = 1
requires-python = ">=3.12.0, <3.13"
resolution-markers = [
"platform_system == 'Windows' and sys_platform == 'win32'",
"platform_system == 'Windows' and sys_platform != 'win32'",
"platform_machine == 'aarch64' and platform_system == 'Linux'",
"platform_machine != 'aarch64' and platform_system == 'Linux'",
"platform_machine == 'arm64' and platform_system == 'Darwin'",
"platform_machine != 'arm64' and platform_system == 'Darwin'",
"platform_system != 'Darwin' and platform_system != 'Linux' and platform_system != 'Windows' and sys_platform == 'win32'",
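The `resolution-markers` entries above are PEP 508 environment markers that uv uses to pin separate resolutions per platform. A minimal sketch of how such markers are evaluated against a platform environment (real resolvers implement the full PEP 508 grammar, e.g. via the `packaging` library; this toy version handles only `==`/`!=` clauses joined by `and`):

```python
def marker_matches(marker: str, env: dict) -> bool:
    # Toy evaluator: each clause is `key == 'value'` or `key != 'value'`,
    # joined by " and "; quotes around the value are stripped.
    for clause in marker.split(" and "):
        key, op, raw = clause.split(None, 2)
        value = raw.strip("'\"")
        matched = (env.get(key) == value) if op == "==" else (env.get(key) != value)
        if not matched:
            return False
    return True

linux_arm = {"platform_system": "Linux", "platform_machine": "aarch64"}
print(marker_matches("platform_machine == 'aarch64' and platform_system == 'Linux'", linux_arm))  # True
print(marker_matches("platform_machine != 'aarch64' and platform_system == 'Linux'", linux_arm))  # False
```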
ts=2025-01-30T08:59:35.298859187Z level=info "boringcrypto enabled"=false
ts=2025-01-30T08:59:35.29786316Z level=info source=/go/pkg/mod/github.com/!kim!machine!gun/[email protected]/memlimit/memlimit.go:170 msg="memory is not limited, skipping" package=github.com/KimMachineGun/automemlimit/memlimit
ts=2025-01-30T08:59:35.298890774Z level=info msg="no peer discovery configured: both join and discover peers are empty" service=cluster
ts=2025-01-30T08:59:35.298894656Z level=info msg="running usage stats reporter"
ts=2025-01-30T08:59:35.298896901Z level=warn msg="this stdlib function is deprecated; please refer to the documentation for updated usage and alternatives" controller_path=/ controller_id="" function=env
ts=2025-01-30T08:59:35.2989009Z level=warn msg="this stdlib function is deprecated; please refer to the documentation for updated usage and alternatives" controller_path=/ controller_id="" function=env
ts=2025-01-30T08:59:35.298903588Z level=info msg="starting complete graph evaluation" controller_pa
[file truncated]
DEBUG: Using RE2 regex engine
DEBUG: Parsing configs
DEBUG: Checking for config file in /runner/renovate/job_config.json
DEBUG: Detected config in env RENOVATE_CONFIG
{
"config": {
"extends": [
"mergeConfidence:all-badges"
],
@hongbo-miao
hongbo-miao / gist:b10b9785997e6078b9290cb30af5ccf2
Last active October 15, 2024 21:39
LiteLLM log for Continue
21:18:26 - LiteLLM Proxy:DEBUG: litellm_pre_call_utils.py:195 - Request Headers: Headers({'host': 'litellm.example.com', 'user-agent': 'node-fetch', 'content-length': '3883', 'accept': '*/*', 'accept-encoding': 'gzip, deflate, br', 'api-key': 'anything', 'authorization': 'Bearer anything', 'content-type': 'application/json', 'x-forwarded-for': '172.31.191.224', 'x-forwarded-host': 'litellm.example.com', 'x-forwarded-port': '443', 'x-forwarded-proto': 'https', 'x-forwarded-server': 'horizon-traefik-7765cbd49c-cm5n6', 'x-real-ip': '172.31.191.224'})
21:18:26 - LiteLLM Proxy:DEBUG: litellm_pre_call_utils.py:201 - receiving data: {'model': 'claude-3-5-sonnet', 'max_tokens': 2048, 'temperature': 0.01, 'stream': True, 'stop': ['</COMPLETION>', '\n\n', '\r\n\r\n', '/src/', '#- coding: utf-8', '```', '\ndef', '\nclass', '\n"""#'], 'prompt': 'You are a HOLE FILLER. You are provided with a file containing holes, formatted as \'{{HOLE_NAME}}\'. Your TASK is to complete with a string to replace this hol
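The request above passes a list of `stop` sequences (`</COMPLETION>`, `\n\n`, `\ndef`, ` ``` `, ...) that cut the streamed completion off at hole boundaries. A minimal sketch of how a client or proxy might apply such stop strings to accumulated text (illustrative only, not LiteLLM's actual implementation):

```python
def apply_stops(text: str, stops: list[str]) -> str:
    # Truncate at the earliest occurrence of any stop sequence;
    # if no stop sequence appears, the text is returned unchanged.
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

completion = "x = fill_hole()\ndef next_function():"
print(apply_stops(completion, ["</COMPLETION>", "\ndef", "```"]))  # x = fill_hole()
```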
@hongbo-miao
hongbo-miao / gist:03b3bb1dd9585d185611e4b848123df6
Created October 7, 2024 23:32
LiteLLM bug: Conversation blocks and tool result blocks cannot be provided in the same turn.
[file truncated]
22:34:16 - LiteLLM Proxy:DEBUG: proxy_server.py:3113 - Request received by LiteLLM:
{
"model": "claude-3-opus",
"messages": [
{
"role": "system",
@hongbo-miao
hongbo-miao / gist:8577107aba2db2cff0b577cede63e12b
Created October 4, 2024 04:23
LiteLLM error log: Conversation blocks and tool result blocks cannot be provided in the same turn.
[file truncated]
04:20:27 - LiteLLM Proxy:DEBUG: proxy_server.py:3122 - Request received by LiteLLM:
{
"model": "claude-3-opus",
"messages": [
{
"role": "system",