@ntohidi
Last active March 11, 2026 09:42
c4ai; 0.8.5 release

Release Assessment — v0.8.5

Since v0.8.0 (Jan 16, 2026): 154 commits, 130 non-merge, 150 files changed, ~26K lines added.


How This Was Assessed

Checked the full commit messages (subject + body), not just diffs. The main developer (unclecode) uses a clear signal:

  • Detailed commit messages (multi-line body with explanation, rationale, usage examples) = intended for release notes
  • Short one-liner commits (no body, terse description) = keep quiet / internal
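The signal above can be approximated mechanically. The following is a minimal sketch, assuming each commit message is available as a single string (for example from `git log --format=%B`). The function name `classify_commit` and the three-line body threshold are illustrative assumptions, not something taken from the repository; the actual assessment was done by reading the messages.

```python
def classify_commit(message: str, min_body_lines: int = 3) -> str:
    """Label a commit 'feature' (detailed body) or 'quiet' (terse one-liner).

    The threshold is an assumption for illustration; tune it by hand.
    """
    lines = message.strip().splitlines()
    # Everything after the subject line counts as body; blank lines are ignored.
    body = [ln for ln in lines[1:] if ln.strip()]
    return "feature" if len(body) >= min_body_lines else "quiet"
```

A line-count cutoff is only a first pass: commits with a "short body" sit near the boundary and still need a human look.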

Features — FEATURE IN RELEASE NOTES (detailed commit messages)

These all have substantial commit bodies explaining what, why, and how:

| # | Feature | Commit | Message Length | Evidence |
|---|---|---|---|---|
| 1 | Anti-bot detection + retry + fallback | 72b546c | ~20 lines | Full explanation of 3-tier detection, proxy escalation loop, fallback system |
| 2 | Proxy escalation chain + ProxyConfig.DIRECT + crawl_stats | 8752072, 8795539 | ~10 lines each | Explains list-based proxy_config, DIRECT sentinel, stats tracking |
| 3 | Shadow DOM flattening + js_code reordering | 8576331 | ~15 lines | Detailed: recursive serializer, slot projections, shadow-scoped styles, pipeline reorder, new js_code_before_wait |
| 4 | Deep crawl cancellation | f6897d1 | ~10 lines | Explains cancel(), should_cancel callback, all 3 strategies |
| 5 | Source/sibling selector in extraction | d267c65 | ~20 lines | Full explanation: why needed (Hacker News layout), how it works, 4 subclasses, schema validation, LLM prompts updated |
| 6 | Config defaults API | 13a4148 | ~8 lines | Explains set_defaults/get_defaults/reset_defaults, deep copy isolation |
| 7 | Consent popup removal | 3fc7730 | ~5 lines | Explains remove_consent_popups flag + from_kwargs fix |
| 8 | avoid_ads / avoid_css resource filtering | c0912f7 | ~8 lines | Explains opt-in flags, pool release lifecycle, ad/tracker domain blocking |
| 9 | Browser recycling + memory-saving mode | c046918, 3401dd1, 2060c7e | ~30+ lines across 3 commits | Version-based recycling, deadlock fix, 3 root causes explained in detail |
| 10 | query_llm_config for adaptive crawler | 0a45c10 | ~15 lines | Explains two incompatible API call types problem, fallback chain, backward compat |
| 11 | GFM table compliance (leading/trailing pipes) | c70ab31 | ~15 lines | Before/after examples, 6 specific changes listed, 13 unit tests |
| 12 | Anti-bot Tier 3 structural integrity | 13048a1 | ~12 lines | Signal scoring system, size thresholds, 4 structural signals |
| 13 | Token usage tracking in schema generation | c9cb016 | ~10 lines | Explains 5 internal LLM calls problem, TokenUsage accumulator, backward compat |
| 14 | simulate_user fix (ArrowDown → mouse movement) | c854e2b | ~10 lines | Explains how it destroyed content on SPAs, replacement approach |
| 15 | Cascading context crash fix (#1768) | 1a9f68d | ~12 lines | Flag-based dedup, explains root cause of unbounded script accumulation |
| 16 | Browser recycling deadlock fix (#1640) | 2060c7e | ~20 lines | 3 separate bugs explained with fix for each |
| 17 | Critical RCE fix (Docker /crawl) | 0104db6 | ~25 lines | 2 CVEs, 6 affected endpoints, breaking changes documented |
| 18 | Batch fix: 10 issues | 3a75dd3 | ~12 lines | Each of 10 issues explained in one line |
| 19 | Batch community PRs (#1622, #1786, etc.) | 7c0cc3e | ~15 lines | Bug fixes, security, CI, features; each explained |
| 20 | from_serializable_dict narrowing (Docker) | 71a6526 | ~15 lines | Detailed root cause, fix approach, RCE protection preserved |

Features — DO NOT FEATURE (short/terse commit messages)

These have one-line messages with no body — developer clearly didn't want attention on them:

| # | Feature | Commit | Message | Why quiet |
|---|---|---|---|---|
| A | force_viewport_screenshot | cee79a8 | feat: add force viewport screenshot (no body) | Community PR, simple flag |
| B | device_scale_factor | b54c200 | feat: make device_scale_factor configurable via BrowserConfig (no body) | Community PR, simple flag |
| C | redirected_status_code | merged via 37a49c5 | Short merge message | Community PR |
| D | Cloud CLI module | ef226f5 | Add: Cloud CLI module for profile management (no body) | Not ready for public, depends on cloud API |
| E | CycloneDX SBOM | 55de32d | Add CycloneDX SBOM and generation script (no body) | Compliance artifact, not user-facing |
| F | Stats dashboard | cbd36b7 | Short message | Internal/marketing |
| G | link_preview_timeout | c73aa27 | fix: make link_preview_timeout configurable (no body) | Community PR (Br1an67) |
| H | wait_for_images | d229bee | fix: add wait_for_images option (no body) | Community PR (Br1an67) |
| I | score_threshold for BestFirst | 3795910 | Short message with issue ref only | Simple param addition |
| J | Parallel URL processing | 2d5e530 | Add support for parallel URL processing (no body) | Internal optimization |
| K | Windows crawler monitor | 1029815 | One-liner (no body) | Community PR (Br1an67) |
| L | scroll_delay fix | cd81e3c | One-liner | Minor fix |
| M | Schema generation prompt improvement | 4fb02f8 | Short | Internal LLM prompt tweak |

Bug Fixes — Categorized by Commit Message Detail

MENTION in release notes (detailed messages explaining root cause):

| # | Fix | Issue | Commit | Why feature |
|---|---|---|---|---|
| 1 | Fix proxy auth ERR_INVALID_AUTH_CREDENTIALS | #1281 | 1d6efb6 | Detailed: 4 bullet points explaining dict-to-ProxyConfig, JSON serialization, context proxy |
| 2 | Fix <base> tag in html2text | #1721 | 2016d66 | Detailed explanation of HTML standard compliance |
| 3 | Fix bs4 deprecation warning | #1077 | 9a0585c | Includes the actual warning message |
| 4 | Guard against None LLM content | #1788 | b138c94 | Detailed: max_tokens edge case, finish_reason propagation |
| 5 | Fix nested brackets in LINK_PATTERN | #1790 | 669b466 | Before/after examples, embedded images, Wikipedia URLs |
| 6 | Preserve class/id in cleaned_html | #1782 | 500d047 | Explains user need for CSS class access |
| 7 | Fix is_external_url port comparison | #1783 | 2048862 | Explains localhost:8000 edge case |
| 8 | Prevent AdaptiveCrawler external domain crawling | #1805 | 78434ea | Root cause explained |
| 9 | Fix MediaItem crash on "100%", "auto" | #1635 | 0273b27 | Pydantic BeforeValidator solution |
| 10 | MCP bridge httpx timeout | #1769 | 0e9b677 | 3 changes explained, HTTP 504 surfacing |
| 11 | Strip markdown fences in force_json_response | | bd0f6e1 | Explains LLM wrapping problem |
| 12 | Anti-bot false positive on browser JSON | | 11b4576 | 3 separate fixes, detailed |
| 13 | 3 bug fixes (#1487, #1512, #1666) | | 55956a8 | Each issue explained |
| 14 | Docs modernization | #1770 | 04e83aa | 11 specific doc fixes listed |
| 15 | XSS prevention via DOMParser | #1796 | in 7c0cc3e | Security fix in batch PR |
| 16 | Require api_token for /token endpoint | #1795 | in 7c0cc3e | Security fix in batch PR |

JUST LIST (short messages, routine fixes):

| # | Fix | Issue | Commit |
|---|---|---|---|
| 17 | Fix script tag removal losing adjacent text | #1364 | 660d701 (one-liner, no body) |
| 18 | Fix VersionManager ignoring env var | #1296 | 83b323f (one-liner) |
| 19 | Fix deep-crawl CLI first page only | #1667 | 220a224 (one-liner) |
| 20 | UTF-8 encoding CLI output | #1789 | 91330ef (one-liner, Br1an67) |
| 21 | UnicodeEncodeError in URL seeder | #1784 | e47e810 (one-liner, Br1an67) |
| 22 | Redis TTL expiry | #1730 | 761664d (one-liner) |
| 23 | Docker health endpoint version | #1686 | fe1c1cb (one-liner) |
| 24 | Replace tf-playwright-stealth | | 624dfe7 (short body) |
| 25 | Allow local embeddings | | 2a04fc3 (short body) |
| 26 | GoogleSearchCrawler script.js packaging | | ef8f0c6 (short body) |
| 27 | Default logger init | | 232f007 (short body) |
| 28 | Duplicate PROMPT_EXTRACT_BLOCKS | | ffd3fac (one-liner) |
| 29 | Re-raise in MemoryAdaptiveDispatcher | | 6ea0e38 (one-liner) |
| 30 | FilterChain.add_filter tuple | | cfa7308 (one-liner) |
| 31 | chardet.detect in thread executor | #1751 | 4298e26 (one-liner) |
| 32 | total_score for failed head extraction | | 094242d (one-liner) |
| 33 | return in finally block | | 87f57f1 (one-liner) |
| 34 | can_process_url normalized URL | | 43738c9 (one-liner) |
| 35 | redirected_url raw HTML | | 418bfcf (one-liner) |
| 36 | Fix proxy auth persistent contexts | | 112f44a (one-liner) |
| 37 | Fix proxy escalation re-raise | | 45d8e14 (one-liner) |
| 38 | Fix fallback fetch never return None | | ccd24aa (one-liner) |
| 39 | Fix page reuse race condition | | 9b52c14 (one-liner) |
| 40 | URL Seeder Common Crawl | #1746 | 694ba44 (one-liner) |
| 41 | newline before code fence html2text | #462 | 697c2b2 (one-liner) |
| 42 | from_serializable_dict Docker | #1797 | 71a6526 (actually detailed, feature above) |

Summary: What Goes in Release Notes

Headline Features (detailed commit messages = developer wanted these featured):

  1. Anti-bot detection + proxy escalation + fallback system
  2. Shadow DOM flattening + js_code pipeline reorder
  3. Browser recycling + memory-saving mode
  4. Deep crawl cancellation
  5. Source/sibling selector in JSON extraction
  6. Config defaults API (set_defaults/get_defaults/reset_defaults)
  7. Consent popup removal (40+ CMP platforms)
  8. avoid_ads/avoid_css resource filtering
  9. GFM table compliance fix
  10. query_llm_config for adaptive crawler

Mention Briefly (community PRs, small additions):

  • force_viewport_screenshot, device_scale_factor, redirected_status_code
  • wait_for_images, score_threshold, link_preview_timeout
  • Token usage tracking in schema generation

Don't Mention:

  • Cloud CLI module (not ready)
  • CycloneDX SBOM (internal)
  • Stats dashboard (marketing)
  • Parallel URL processing (internal optimization)
  • Schema prompt improvements (internal)

Security (always mention):

  • Critical RCE fix in Docker /crawl (deserialization + eval)
  • XSS prevention in iframe processing (DOMParser)
  • API token required for /token endpoint
  • Stealth improvements (sec-ch-ua sync, WebGL)

Bug Fixes:

  • ~16 worth explaining in detail (root cause described in commit)
  • ~26 just list as "fixed #XXXX" one-liners
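The terse entries could be generated from commit subjects rather than typed by hand. A minimal sketch, assuming subjects carry issue references in the usual `#1234` form; the helper name `fixed_line` and the output wording are illustrative assumptions:

```python
import re

ISSUE_REF = re.compile(r"#(\d+)")


def fixed_line(subject: str) -> str:
    """Render a commit subject as a terse changelog entry."""
    issues = ISSUE_REF.findall(subject)
    if issues:
        refs = ", ".join(f"#{n}" for n in issues)
        return f"- Fixed {refs}"
    # No issue reference: fall back to the subject itself.
    return f"- {subject.strip()}"
```

Subjects without an issue number (about half of the routine fixes listed above) fall through to the plain subject line.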

Questions for the Main Developer

  1. Cloud CLI (#D) — Is the cloud API ready? Ship, hide, or remove from release?
  2. Shadow DOM (#3) — Is flatten_shadow_dom the final API name?
  3. Anti-bot (#1) — Prominently market or keep low-profile?
  4. Version number — v0.8.1 (patch) or v0.9.0 (minor)? This is a LOT of changes.
  5. Token usage tracking (#13) — Commit c9cb016 has detailed message but code may have been refactored. Verify?
  6. Breaking changes: cleaned_html now preserves class/id attrs, Docker hooks default to disabled, file:// URLs blocked on Docker API. Document these?
  7. js_code_before_wait — New parameter from Shadow DOM commit. Feature or implementation detail?