Since v0.8.0 (Jan 16, 2026): 154 commits, 130 non-merge, 150 files changed, ~26K lines added.
Checked the full commit messages (subject + body), not just diffs. The main developer (unclecode) uses a clear signal:
- Detailed commit messages (multi-line body with explanation, rationale, usage examples) = intended for release notes
- Short one-liner commits (no body, terse description) = keep quiet / internal
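
The signal above can be applied mechanically. The following is a minimal sketch, assuming a commit counts as "detailed" when its body has at least three non-empty lines. The threshold, the function name `classify_commit`, and the two labels are illustrative choices, not something taken from the repository.

```python
def classify_commit(message: str, min_body_lines: int = 3) -> str:
    """Label a full commit message (subject + body) for release triage.

    Assumption: a body of at least `min_body_lines` non-empty lines means
    the commit was written with release notes in mind.
    """
    lines = message.strip().splitlines()
    # Everything after the subject line is the body; ignore blank separators.
    body = [line for line in lines[1:] if line.strip()]
    return "release-notes" if len(body) >= min_body_lines else "quiet"


detailed = (
    "feat: add deep crawl cancellation\n"
    "\n"
    "Adds cancel() to the strategies.\n"
    "Adds a should_cancel callback.\n"
    "Covers all three strategies.\n"
)
one_liner = "fix: add wait_for_images option"

print(classify_commit(detailed))   # subject plus a three-line body
print(classify_commit(one_liner))  # subject only
```

With this threshold, commits that carry only a one- or two-line body land in the quiet bucket, which matches how the "short body" fixes are grouped with the one-liners further down.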
These all have substantial commit bodies explaining what, why, and how:
| # | Feature | Commit | Message Length | Evidence |
|---|---|---|---|---|
| 1 | Anti-bot detection + retry + fallback | 72b546c | ~20 lines | Full explanation of 3-tier detection, proxy escalation loop, fallback system |
| 2 | Proxy escalation chain + ProxyConfig.DIRECT + crawl_stats | 8752072, 8795539 | ~10 lines each | Explains list-based proxy_config, DIRECT sentinel, stats tracking |
| 3 | Shadow DOM flattening + js_code reordering | 8576331 | ~15 lines | Detailed: recursive serializer, slot projections, shadow-scoped styles, pipeline reorder, new js_code_before_wait |
| 4 | Deep crawl cancellation | f6897d1 | ~10 lines | Explains cancel(), should_cancel callback, all 3 strategies |
| 5 | Source/sibling selector in extraction | d267c65 | ~20 lines | Full explanation: why needed (Hacker News layout), how it works, 4 subclasses, schema validation, LLM prompts updated |
| 6 | Config defaults API | 13a4148 | ~8 lines | Explains set_defaults/get_defaults/reset_defaults, deep copy isolation |
| 7 | Consent popup removal | 3fc7730 | ~5 lines | Explains remove_consent_popups flag + from_kwargs fix |
| 8 | avoid_ads / avoid_css resource filtering | c0912f7 | ~8 lines | Explains opt-in flags, pool release lifecycle, ad/tracker domain blocking |
| 9 | Browser recycling + memory-saving mode | c046918, 3401dd1, 2060c7e | ~30+ lines across 3 commits | Version-based recycling, deadlock fix, 3 root causes explained in detail |
| 10 | query_llm_config for adaptive crawler | 0a45c10 | ~15 lines | Explains two incompatible API call types problem, fallback chain, backward compat |
| 11 | GFM table compliance (leading/trailing pipes) | c70ab31 | ~15 lines | Before/after examples, 6 specific changes listed, 13 unit tests |
| 12 | Anti-bot Tier 3 structural integrity | 13048a1 | ~12 lines | Signal scoring system, size thresholds, 4 structural signals |
| 13 | Token usage tracking in schema generation | c9cb016 | ~10 lines | Explains 5 internal LLM calls problem, TokenUsage accumulator, backward compat |
| 14 | simulate_user fix (ArrowDown → mouse movement) | c854e2b | ~10 lines | Explains how it destroyed content on SPAs, replacement approach |
| 15 | Cascading context crash fix (#1768) | 1a9f68d | ~12 lines | Flag-based dedup, explains root cause of unbounded script accumulation |
| 16 | Browser recycling deadlock fix (#1640) | 2060c7e | ~20 lines | 3 separate bugs explained with fix for each |
| 17 | Critical RCE fix (Docker /crawl) | 0104db6 | ~25 lines | 2 CVEs, 6 affected endpoints, breaking changes documented |
| 18 | Batch fix: 10 issues | 3a75dd3 | ~12 lines | Each of 10 issues explained in one line |
| 19 | Batch community PRs (#1622, #1786, etc.) | 7c0cc3e | ~15 lines | Bug fixes, security, CI, features; each explained |
| 20 | from_serializable_dict narrowing (Docker) | 71a6526 | ~15 lines | Detailed root cause, fix approach, RCE protection preserved |
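
Row 6 mentions "deep copy isolation" for the config defaults API. The sketch below shows what that property means in general. `DefaultsRegistry` is a hypothetical stand-in written for this note; it is not Crawl4AI's class, and the real signatures of `set_defaults`, `get_defaults`, and `reset_defaults` are not given here, so only the three method names come from the commit summary.

```python
import copy
from typing import Any, Dict


class DefaultsRegistry:
    """Hypothetical illustration of a defaults API with deep copy isolation."""

    def __init__(self, builtin: Dict[str, Any]) -> None:
        # Copy on the way in so later edits to the caller's dict cannot leak in.
        self._builtin = copy.deepcopy(builtin)
        self._overrides: Dict[str, Any] = {}

    def set_defaults(self, **overrides: Any) -> None:
        # Store copies so the caller's mutable values stay independent.
        self._overrides.update(copy.deepcopy(overrides))

    def get_defaults(self) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate the stored defaults.
        return copy.deepcopy({**self._builtin, **self._overrides})

    def reset_defaults(self) -> None:
        # Drop every override and fall back to the built-in values.
        self._overrides.clear()


registry = DefaultsRegistry({"headless": True, "extra_args": []})
registry.set_defaults(extra_args=["--no-sandbox"])

snapshot = registry.get_defaults()
snapshot["extra_args"].append("--mutated-by-caller")  # must not leak back

after_mutation = registry.get_defaults()
registry.reset_defaults()
after_reset = registry.get_defaults()
```

Whether the release notes need an example at this level depends on how the real API exposes these calls, which should be checked against commit 13a4148 before publishing.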
These have one-line messages with no body — developer clearly didn't want attention on them:
| # | Feature | Commit | Message | Why quiet |
|---|---|---|---|---|
| A | force_viewport_screenshot | cee79a8 | feat: add force viewport screenshot (no body) | Community PR, simple flag |
| B | device_scale_factor | b54c200 | feat: make device_scale_factor configurable via BrowserConfig (no body) | Community PR, simple flag |
| C | redirected_status_code | merged via 37a49c5 | Short merge message | Community PR |
| D | Cloud CLI module | ef226f5 | Add: Cloud CLI module for profile management (no body) | Not ready for public, depends on cloud API |
| E | CycloneDX SBOM | 55de32d | Add CycloneDX SBOM and generation script (no body) | Compliance artifact, not user-facing |
| F | Stats dashboard | cbd36b7 | Short message | Internal/marketing |
| G | link_preview_timeout | c73aa27 | fix: make link_preview_timeout configurable (no body) | Community PR (Br1an67) |
| H | wait_for_images | d229bee | fix: add wait_for_images option (no body) | Community PR (Br1an67) |
| I | score_threshold for BestFirst | 3795910 | Short message with issue ref only | Simple param addition |
| J | Parallel URL processing | 2d5e530 | Add support for parallel URL processing (no body) | Internal optimization |
| K | Windows crawler monitor | 1029815 | One-liner (no body) | Community PR (Br1an67) |
| L | scroll_delay fix | cd81e3c | One-liner | Minor fix |
| M | Schema generation prompt improvement | 4fb02f8 | Short | Internal LLM prompt tweak |
| # | Fix | Issue | Commit | Why feature |
|---|---|---|---|---|
| 1 | Fix proxy auth ERR_INVALID_AUTH_CREDENTIALS | #1281 | 1d6efb6 | Detailed: 4 bullet points explaining dict-to-ProxyConfig, JSON serialization, context proxy |
| 2 | Fix `<base>` tag in html2text | #1721 | 2016d66 | Detailed explanation of HTML standard compliance |
| 3 | Fix bs4 deprecation warning | #1077 | 9a0585c | Includes the actual warning message |
| 4 | Guard against None LLM content | #1788 | b138c94 | Detailed: max_tokens edge case, finish_reason propagation |
| 5 | Fix nested brackets in LINK_PATTERN | #1790 | 669b466 | Before/after examples, embedded images, Wikipedia URLs |
| 6 | Preserve class/id in cleaned_html | #1782 | 500d047 | Explains user need for CSS class access |
| 7 | Fix is_external_url port comparison | #1783 | 2048862 | Explains localhost:8000 edge case |
| 8 | Prevent AdaptiveCrawler external domain crawling | #1805 | 78434ea | Root cause explained |
| 9 | Fix MediaItem crash on "100%", "auto" | #1635 | 0273b27 | Pydantic BeforeValidator solution |
| 10 | MCP bridge httpx timeout | #1769 | 0e9b677 | 3 changes explained, HTTP 504 surfacing |
| 11 | Strip markdown fences in force_json_response | — | bd0f6e1 | Explains LLM wrapping problem |
| 12 | Anti-bot false positive on browser JSON | — | 11b4576 | 3 separate fixes, detailed |
| 13 | 3 bug fixes (#1487, #1512, #1666) | — | 55956a8 | Each issue explained |
| 14 | Docs modernization | #1770 | 04e83aa | 11 specific doc fixes listed |
| 15 | XSS prevention via DOMParser | #1796 | in 7c0cc3e | Security fix in batch PR |
| 16 | Require api_token for /token endpoint | #1795 | in 7c0cc3e | Security fix in batch PR |
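
For fix 11, the "LLM wrapping problem" is that a model asked for JSON often returns it wrapped in a Markdown code fence, which then fails to parse. The following is a generic sketch of the idea, not the code from commit bd0f6e1; `strip_markdown_fences` is a hypothetical helper name chosen for illustration.

```python
import json
import re

FENCE = "`" * 3  # built programmatically to avoid a literal fence in this example

# Matches an optional language tag after the opening fence, then captures the payload.
_FENCED = re.compile(
    r"^\s*" + FENCE + r"[A-Za-z0-9_-]*\s*\n(.*?)\n?\s*" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_markdown_fences(text: str) -> str:
    """Return the payload inside a single surrounding code fence, if present."""
    match = _FENCED.match(text)
    return match.group(1).strip() if match else text.strip()


wrapped = FENCE + "json\n" + '{"title": "Example", "score": 3}' + "\n" + FENCE
plain = '{"title": "Example", "score": 3}'

parsed_wrapped = json.loads(strip_markdown_fences(wrapped))
parsed_plain = json.loads(strip_markdown_fences(plain))
```

Unfenced input passes through unchanged apart from whitespace trimming, so the helper is safe to apply to every response.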
| # | Fix | Issue | Commit |
|---|---|---|---|
| 17 | Fix script tag removal losing adjacent text | #1364 | 660d701 — one-liner, no body |
| 18 | Fix VersionManager ignoring env var | #1296 | 83b323f — one-liner |
| 19 | Fix deep-crawl CLI first page only | #1667 | 220a224 — one-liner |
| 20 | UTF-8 encoding CLI output | #1789 | 91330ef — one-liner (Br1an67) |
| 21 | UnicodeEncodeError in URL seeder | #1784 | e47e810 — one-liner (Br1an67) |
| 22 | Redis TTL expiry | #1730 | 761664d — one-liner |
| 23 | Docker health endpoint version | #1686 | fe1c1cb — one-liner |
| 24 | Replace tf-playwright-stealth | — | 624dfe7 — short body |
| 25 | Allow local embeddings | — | 2a04fc3 — short body |
| 26 | GoogleSearchCrawler script.js packaging | — | ef8f0c6 — short body |
| 27 | Default logger init | — | 232f007 — short body |
| 28 | Duplicate PROMPT_EXTRACT_BLOCKS | — | ffd3fac — one-liner |
| 29 | Re-raise in MemoryAdaptiveDispatcher | — | 6ea0e38 — one-liner |
| 30 | FilterChain.add_filter tuple | — | cfa7308 — one-liner |
| 31 | chardet.detect in thread executor | #1751 | 4298e26 — one-liner |
| 32 | total_score for failed head extraction | — | 094242d — one-liner |
| 33 | return in finally block | — | 87f57f1 — one-liner |
| 34 | can_process_url normalized URL | — | 43738c9 — one-liner |
| 35 | redirected_url raw HTML | — | 418bfcf — one-liner |
| 36 | Fix proxy auth persistent contexts | — | 112f44a — one-liner |
| 37 | Fix proxy escalation re-raise | — | 45d8e14 — one-liner |
| 38 | Fix fallback fetch never return None | — | ccd24aa — one-liner |
| 39 | Fix page reuse race condition | — | 9b52c14 — one-liner |
| 40 | URL Seeder Common Crawl | #1746 | 694ba44 — one-liner |
| 41 | newline before code fence html2text | #462 | 697c2b2 — one-liner |
| 42 | from_serializable_dict Docker | #1797 | 71a6526 — actually detailed (feature above) |
- Anti-bot detection + proxy escalation + fallback system
- Shadow DOM flattening + js_code pipeline reorder
- Browser recycling + memory-saving mode
- Deep crawl cancellation
- Source/sibling selector in JSON extraction
- Config defaults API (set_defaults/get_defaults/reset_defaults)
- Consent popup removal (40+ CMP platforms)
- avoid_ads/avoid_css resource filtering
- GFM table compliance fix
- query_llm_config for adaptive crawler
- force_viewport_screenshot, device_scale_factor, redirected_status_code
- wait_for_images, score_threshold, link_preview_timeout
- Token usage tracking in schema generation
- Cloud CLI module (not ready)
- CycloneDX SBOM (internal)
- Stats dashboard (marketing)
- Parallel URL processing (internal optimization)
- Schema prompt improvements (internal)
- Critical RCE fix in Docker /crawl (deserialization + eval)
- XSS prevention in iframe processing (DOMParser)
- API token required for /token endpoint
- Stealth improvements (sec-ch-ua sync, WebGL)
- ~16 worth explaining in detail (root cause described in commit)
- ~26 just list as "fixed #XXXX" one-liners
- Cloud CLI (#D): Is the cloud API ready? Ship, hide, or remove from release?
- Shadow DOM (#3): Is `flatten_shadow_dom` the final API name?
- Anti-bot (#1): Prominently market or keep low-profile?
- Version number: v0.8.1 (patch) or v0.9.0 (minor)? This is a LOT of changes.
- Token usage tracking (#13): Commit `c9cb016` has a detailed message, but the code may have been refactored. Verify?
- Breaking changes: `cleaned_html` now preserves class/id attrs, Docker hooks default to disabled, file:// URLs blocked on Docker API. Document these?
- `js_code_before_wait`: New parameter from Shadow DOM commit. Feature or implementation detail?