▐▛███▜▌ Claude Code v2.1.7
▝▜█████▛▘ Opus 4.5 · Claude Pro
▘▘ ▝▝ ~/code/mdriver/remove-html-comments
❯ mdriver should remove HTML comments instead of displaying them. In
example.md, the comment should be stripped and the header should display
as an h2
⏺ Let me first look at the example file and understand the current behavior.
⏺ Read(example.md)
⎿  Read 7 lines
⏺ Search(pattern: "comment|html|HTML", path: "src")
⎿  Found 1 file (ctrl+o to expand)
⏺ Read(src/lib.rs)
⎿  Error: File content (25199 tokens) exceeds maximum allowed tokens (25000).
Please use offset and limit parameters to read specific portions of the
file, or use the GrepTool to search for specific content.
⏺ Search(pattern: "Html|HTML|comment", path: "src/lib.rs", output_mode:
"content")
⎿  Found 122 lines (ctrl+o to expand)
⏺ Let me see the parse_html_tag function more completely and understand where HTML comments should be stripped.
⏺ Read(src/lib.rs)
⎿  Read 250 lines
⏺ Let me find where parse_html_tag is called and see the inline formatting function.
⏺ Search(pattern: "parse_html_tag", path: "src/lib.rs", output_mode: "content")
⎿  Found 23 lines (ctrl+o to expand)
⏺ Let me check if there are existing tests for HTML comments.
⏺ Search(pattern: "<!--", path: "tests")
⎿  Found 0 files
⏺ Now I'll add HTML comment handling to the parse_html_tag function. Comments start with <!-- and end with -->.
⏺ Read(src/lib.rs)
⎿  Read 30 lines
⏺ Update(src/lib.rs)
⎿  Added 21 lines
2155 return None;
2156 }
2157
2158 + // Check for HTML comments: <!-- ... -->
2159 + if start + 3 < chars.len()
2160 + && chars[start + 1] == '!'
2161 + && chars[start + 2] == '-'
2162 + && chars[start + 3] == '-'
2163 + {
2164 + // Find the closing -->
2165 + let mut i = start + 4;
2166 + while i + 2 < chars.len() {
2167 + if chars[i] == '-' && chars[i + 1] == '-' && chars[i + 2] == '>' {
2168 + return Some(HtmlTagResult {
2169 + formatted: String::new(),
2170 + end_pos: i + 3,
2171 + });
2172 + }
2173 + i += 1;
2174 + }
2175 + // No closing --> found, don't consume anything
2176 + return None;
2177 + }
2178 +
2179 // Find the closing '>' of the opening tag
2180 let mut tag_end = start + 1;
2181 while tag_end < chars.len() && chars[tag_end] != '>' {
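For readers who want to try the scan outside of mdriver, here is a minimal standalone sketch of the same idea. `CommentSpan`, `scan_comment`, and `strip_comments` are hypothetical names used only for illustration; the real code returns an `HtmlTagResult` from `parse_html_tag`, and how mdriver's inline formatter consumes that result is not shown in this transcript.

```rust
/// Hypothetical stand-in for the transcript's `HtmlTagResult`.
#[derive(Debug, PartialEq)]
struct CommentSpan {
    /// Index of the first char after the closing `-->`.
    end_pos: usize,
}

/// If `chars[start..]` begins with `<!--`, look for the first `-->` after it.
/// Returns `None` when there is no opener or no closer.
fn scan_comment(chars: &[char], start: usize) -> Option<CommentSpan> {
    let open = ['<', '!', '-', '-'];
    if !chars.get(start..)?.starts_with(&open) {
        return None;
    }
    let mut i = start + 4;
    // `i + 2 < len` keeps all three lookups below in bounds.
    while i + 2 < chars.len() {
        if chars[i] == '-' && chars[i + 1] == '-' && chars[i + 2] == '>' {
            return Some(CommentSpan { end_pos: i + 3 });
        }
        i += 1;
    }
    None
}

/// Drop every complete comment; an unterminated `<!--` is kept as plain text.
fn strip_comments(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::new();
    let mut i = 0;
    while i < chars.len() {
        if chars[i] == '<' {
            if let Some(span) = scan_comment(&chars, i) {
                i = span.end_pos; // jump past the whole comment
                continue;
            }
        }
        out.push(chars[i]);
        i += 1;
    }
    out
}

fn main() {
    println!("{:?}", strip_comments("Text <!-- inline comment --> more"));
    println!("{:?}", strip_comments("a <!-- unterminated"));
}
```

Note that this sketch only removes the comment itself: the spaces on either side of it remain, so the first call leaves two adjacent spaces in the result.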
⏺ Let me update the doc comment to reflect the new functionality.
⏺ Update(src/lib.rs)
⎿  Added 1 line
2149
2150 /// Parse an HTML tag and return formatted output
2151 /// Handles: em, i, strong, b, u, s, strike, del, code, a, pre
2152 + /// HTML comments (<!-- ... -->) are stripped entirely
2153 /// Unknown tags are stripped but inner content is preserved
2154 fn parse_html_tag(&self, chars: &[char], start: usize) -> Option<HtmlTagResult> {
2155 if chars[start] != '<' {
⏺ Let me also check if HTML comments need to be handled at the block level, since the example shows a comment on its own line.
⏺ Search(pattern: "fn feed|process_line|emit_block", path: "src/lib.rs",
output_mode: "content")
⎿  Found 4 lines (ctrl+o to expand)
⏺ Read(src/lib.rs)
⎿  Read 200 lines
⏺ Let me run the code to test if comments are being stripped.
⏺ Bash(cargo build 2>&1)
⎿ Compiling mdriver v0.8.0 (/Users/llimllib/code/mdriver/remove-html-comments)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.94s
⏺ Bash(cat example.md | cargo run 2>/dev/null | cat -v)
⎿  Error: Exit code 2
error: unexpected argument '-v' found
tip: to pass '-v' as a value, use '-- -v'
Usage: bat [OPTIONS] [FILE]...
bat <COMMAND>
⏺ Bash(/bin/cat example.md | cargo run 2>/dev/null)
⎿  # remove HTML comments
## Installation
… +2 lines (ctrl+o to expand)
⏺ I see the issue. The comment line starts a paragraph, and then ## Installation gets added to that paragraph. I need to detect HTML comment lines in handle_ready_state and skip them entirely.
⏺ Read(src/lib.rs)
⎿  Read 80 lines
⏺ Update(src/lib.rs)
⎿  Added 6 lines
436 return None;
437 }
438
439 + // Check for HTML comment line (<!-- ... -->)
440 + // These should be silently skipped
441 + if self.is_html_comment_line(trimmed) {
442 + return None;
443 + }
444 +
445 // Check for ATX heading (# )
446 if let Some(level) = self.parse_atx_heading(trimmed) {
447 let text = trimmed[level..].trim_start().to_string();
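The reason the block-level check is needed can be shown with a toy model. This is not mdriver's parser: `Block`, `parse`, `heading_level`, and `is_comment_line` are hypothetical, and the input is only modeled on the output shown earlier, since the contents of example.md are not reproduced in this transcript. The sketch assumes, as described above, that once a paragraph has started, following non-blank lines are appended to it.

```rust
/// Toy block types; the real parser has many more.
#[derive(Debug, PartialEq)]
enum Block {
    Heading(usize, String),
    Paragraph(String),
}

/// True when the whole (trimmed) line is one `<!-- ... -->` comment.
fn is_comment_line(line: &str) -> bool {
    let t = line.trim();
    t.len() >= 7 && t.starts_with("<!--") && t.ends_with("-->")
}

/// ATX heading level: 1 to 6 `#` characters followed by a space.
fn heading_level(line: &str) -> Option<usize> {
    let level = line.chars().take_while(|&c| c == '#').count();
    if (1..=6).contains(&level) && line[level..].starts_with(' ') {
        Some(level)
    } else {
        None
    }
}

fn parse(input: &str, skip_comments: bool) -> Vec<Block> {
    let mut blocks = Vec::new();
    let mut para: Option<String> = None;
    for line in input.lines() {
        let trimmed = line.trim();
        if trimmed.is_empty() {
            // A blank line closes any open paragraph.
            if let Some(p) = para.take() {
                blocks.push(Block::Paragraph(p));
            }
            continue;
        }
        if let Some(p) = para.as_mut() {
            // Inside a paragraph, every non-blank line is a continuation.
            p.push(' ');
            p.push_str(trimmed);
            continue;
        }
        // "Ready" state: decide what kind of block this line starts.
        if skip_comments && is_comment_line(trimmed) {
            continue; // emit nothing and stay in the ready state
        }
        if let Some(level) = heading_level(trimmed) {
            let text = trimmed[level..].trim_start().to_string();
            blocks.push(Block::Heading(level, text));
        } else {
            para = Some(trimmed.to_string());
        }
    }
    if let Some(p) = para {
        blocks.push(Block::Paragraph(p));
    }
    blocks
}

fn main() {
    let input = "# remove HTML comments\n\n<!-- note -->\n## Installation\n";
    println!("{:?}", parse(input, false));
    println!("{:?}", parse(input, true));
}
```

With `skip_comments` off, the comment line opens a paragraph and the `## Installation` line is absorbed into it; with it on, the parser is still in the ready state when the heading arrives.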
⏺ Now I need to add the is_html_comment_line helper function. Let me find a good place to add it.
⏺ Search(pattern: "fn is_horizontal_rule", path: "src/lib.rs", output_mode:
"content")
⎿  Found 31 lines (ctrl+o to expand)
⏺ Read(src/lib.rs)
⎿  Read 10 lines
⏺ Update(src/lib.rs)
⎿  Added 16 lines
816 None
817 }
818
819 + /// Check if a line is entirely an HTML comment (<!-- ... -->)
820 + fn is_html_comment_line(&self, line: &str) -> bool {
821 + let trimmed = line.trim();
822 + if !trimmed.starts_with("<!--") {
823 + return false;
824 + }
825 + if !trimmed.ends_with("-->") {
826 + return false;
827 + }
828 + // Ensure the comment is properly formed (has content or is empty)
829 + // and doesn't have an early --> before the final one
830 + let inner = &trimmed[4..trimmed.len() - 3];
831 + // Make sure there's no --> in the middle (which would mean malformed)
832 + !inner.contains("-->")
833 + }
834 +
835 fn is_horizontal_rule(&self, line: &str) -> bool {
836 // Horizontal rule: 0-3 spaces, then 3+ matching -, _, or * chars
837 // with optional spaces/tabs between them
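One edge case in the helper above is worth checking: a line such as `<!-->` both starts with `<!--` and ends with `-->`, but is only 5 bytes long, so the slice `trimmed[4..trimmed.len() - 3]` becomes `4..2` and panics. The sketch below is a free-function variant with a length guard. It is an illustration under that assumption, not the code that was committed in this session.

```rust
/// Variant of `is_html_comment_line` written as a free function,
/// with a length guard added before slicing.
fn is_html_comment_line(line: &str) -> bool {
    let trimmed = line.trim();
    // "<!--" is 4 bytes and "-->" is 3 bytes. Anything shorter than 7 bytes
    // cannot contain both delimiters without overlap, and slicing
    // `4..len - 3` on it would panic.
    if trimmed.len() < 7 || !trimmed.starts_with("<!--") || !trimmed.ends_with("-->") {
        return false;
    }
    // Both slice bounds sit next to ASCII delimiters, so they are valid
    // char boundaries even when the comment body is non-ASCII.
    let inner = &trimmed[4..trimmed.len() - 3];
    // An earlier `-->` means the line is not a single comment.
    !inner.contains("-->")
}

fn main() {
    for line in ["<!-- comment -->", "<!---->", "<!-->", "<!-- a --> text <!-- b -->"] {
        println!("{:?} -> {}", line, is_html_comment_line(line));
    }
}
```

Lines rejected here fall through to the normal paragraph path, where the inline handling in `parse_html_tag` gets a chance to deal with them.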
⏺ Bash(cargo build 2>&1 && /bin/cat example.md | cargo run 2>/dev/null)
⎿ Compiling mdriver v0.8.0 (/Users/llimllib/code/mdriver/remove-html-comments)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.72s
… +5 lines (ctrl+o to expand)
⏺ The comment is now stripped and the heading is properly formatted. Let me run the tests and clippy to ensure everything passes.
⏺ Bash(cargo fmt && cargo clippy --all-targets --all-features -- -D warnings
2>&1)
⎿   Checking stable_deref_trait v1.2.1
Checking cfg-if v1.0.4
Checking memchr v2.7.6
… +138 lines (ctrl+o to expand)
⏺ Bash(cargo test 2>&1)
⎿   Compiling serde v1.0.228
Compiling winnow v0.7.14
Compiling toml_write v0.1.2
… +153 lines (ctrl+o to expand)
⏺ All tests pass. Let me verify a few edge cases work correctly.
⏺ Bash(echo 'Text with <!-- inline comment --> in middle' | cargo run
2>/dev/null)
⎿  Text with <\!-- inline comment --> in middle
⏺ Bash(echo 'Text with <!-- inline comment --> in middle' | cargo run
2>/dev/null | od -c | head -20)
⎿  0000000 T e x t w i t h < \ ! - -
0000020 i n l i n e c o m m e n t
… +4 lines (ctrl+o to expand)
⏺ I see there's a backslash being inserted before the !. Let me check where that escaping happens.
⏺ Search(pattern: "\\\\!", path: "src/lib.rs", output_mode: "content")
⎿  Found 0 lines
⏺ Search(pattern: "<!|escape", path: "src/lib.rs", output_mode: "content")
⎿  Found 20 lines (ctrl+o to expand)
⏺ Read(src/lib.rs)
⎿  Read 100 lines
⏺ Search(pattern: "push.*\\\\", path: "src/lib.rs", output_mode: "content")
⎿  Found 3 lines (ctrl+o to expand)
⏺ Search(pattern: "Check for <html", path: "src/lib.rs", output_mode: "content")
⎿  Found 31 lines (ctrl+o to expand)
⏺ Read(src/lib.rs)
⎿  Read 50 lines
⏺ Read(src/lib.rs)
⎿  Read 40 lines
⏺ Let me trace through more carefully by testing a simpler case.
⏺ Bash(echo '<!-- comment -->' | cargo run 2>/dev/null | od -c)
⎿  0000000 < \ ! - - c o m m e n t -
-
0000020 > \n \n
… +2 lines (ctrl+o to expand)
⏺ Bash(cargo test --lib -- --test-threads=1 2>&1 | head -40)
⎿ Finished `test` profile [unoptimized + debuginfo] target(s) in 0.06s
Running unittests src/lib.rs (target/debug/deps/mdriver-88c5f32734
… +6 lines (ctrl+o to expand)
⏺ Let me debug by writing a quick test.
⏺ Bash(cat > /tmp/test_comment.rs << 'EOF'
use mdriver::StreamingParser;…)
⎿  error: no example target named `test_comment` in default-run packages
Need to add as example
⏺ Bash(cargo build 2>/dev/null && echo -e '<!-- comment -->\n' |
target/debug/mdriver | xxd | head)
⎿  00000000: 3c5c 212d 2d20 636f 6d6d 656e 7420 2d2d <\!-- comment --
00000010: 3e0a 0a >..
⏺ Bash(printf '<!-- comment -->\n\n' | target/debug/mdriver | xxd | head)
⎿  00000000: 3c5c 212d 2d20 636f 6d6d 656e 7420 2d2d <\!-- comment --
00000010: 3e0a 0a >..
⏺ Search(pattern: "'\\\\!'|r#"\\!"#|"\\\\!"", path: "src/lib.rs", output_mode:
"content")
⎿  Found 0 lines
⏺ Search(pattern: "push_str.*\\\\|push.*'\\\\'", path: "src/lib.rs",
output_mode: "content")
⎿  Found 3 lines (ctrl+o to expand)
⏺ Search(pattern: "chars\[i\] == '!'", path: "src/lib.rs", output_mode:
"content")
⎿  Found 43 lines (ctrl+o to expand)
⏺ Read(src/lib.rs)
⎿  Read 20 lines
⏺ Read(src/lib.rs)
⎿  Read 30 lines
⏺ Read(src/lib.rs)
⎿  Read 40 lines
⏺ Read(src/lib.rs)
⎿  Read 15 lines
⏺ Bash(echo '# hello' > /tmp/test.md && echo '' >> /tmp/test.md && echo '<!--
comment -->' >> /tmp/test.md && echo '' >> /tmp/test.md && echo '## world'
>> /tmp/test.m…)
⎿  00000000: 1b5b 313b 3334 6d23 2068 656c 6c6f 1b5b .[1;34m# hello.[
00000010: 306d 0a0a 3c5c 212d 2d20 636f 6d6d 656e 0m..<\!-- commen
00000020: 7420 2d2d 3e0a 0a1b 5b31 3b33 346d 2323 t -->...[1;34m##
00000030: 2077 6f72 6c64 1b5b 306d 0a0a world.[0m..
⏺ Bash(set +H && printf '<!-- comment -->\n\n' | target/debug/mdriver | xxd)
⎿  00000000: 3c5c 212d 2d20 636f 6d6d 656e 7420 2d2d <\!-- comment --
00000010: 3e0a 0a >..
⏺ Search(pattern: "fn emit_current_block|fn format_paragraph", path:
"src/lib.rs", output_mode: "content")
⎿  Found 159 lines (ctrl+o to expand)
⏺ Let me write a quick unit test to verify the is_html_comment_line function.
⏺ Search(pattern: "mod tests|#\[cfg\(test\)\]", path: "src/lib.rs", output_mode:
"content")
⎿  Found 0 lines
⏺ Read(tests/unit.rs)
⎿  Read 50 lines
⏺ Search(pattern: "^mod ", path: "tests/unit.rs", output_mode: "content")
⎿  Found 8 lines (ctrl+o to expand)
⏺ Read(tests/unit.rs)
⎿  Read 10 lines
⏺ Update(tests/unit.rs)
⎿  Added 29 lines
221 }
222 }
223
224 + mod html_comments {
225 + use super::*;
226 +
227 + #[test]
228 + fn test_comment_line_stripped() {
229 + let mut p = parser();
230 + let result = p.feed("<!-- comment -->\n\n");
231 + assert_eq!(strip_ansi(&result), "", "Comment line should be stripped");
232 + }
233 +
234 + #[test]
235 + fn test_inline_comment_stripped() {
236 + let mut p = parser();
237 + let result = p.feed("Text <!-- comment --> more\n\n");
238 + assert_eq!(strip_ansi(&result), "Text more\n\n", "Inline comment should be stripped");
239 + }
240 +
241 + #[test]
242 + fn test_comment_between_blocks() {
243 + let mut p = parser();
244 + let r1 = p.feed("# Hello\n\n");
245 + let r2 = p.feed("<!-- comment -->\n\n");
246 + let r3 = p.feed("## World\n\n");
247 + assert!(strip_ansi(&r1).contains("# Hello"));
248 + assert_eq!(strip_ansi(&r2), "", "Comment should be stripped");
249 + assert!(strip_ansi(&r3).contains("## World"));
250 + }
251 + }
252 +
253 mod extract_href {
254 use super::*;
255
⏺ Bash(cargo test html_comments 2>&1)
⎿  Error: Exit code 101
Compiling mdriver v0.8.0 (/Users/llimllib/code/mdriver/remove-html-comments)
Finished `test` profile [unoptimized + debuginfo] target(s) in 0.40s
Running unittests src/lib.rs (target/debug/deps/mdriver-88c5f32734416994)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/main.rs (target/debug/deps/mdriver-3a7c079353b478b1)
… +35 lines (ctrl+o to see all)
⏺ Interesting! The tests show that comment lines ARE being stripped correctly. The only issue is the whitespace expectation in the inline comment test. Let me fix that test and verify.
⏺ Update(tests/unit.rs)
⎿  Added 2 lines, removed 1 line
235 fn test_inline_comment_stripped() {
236 let mut p = parser();
237 let result = p.feed("Text <!-- comment --> more\n\n");
238 - assert_eq!(strip_ansi(&result), "Text more\n\n", "Inline comment should be stripped");
238 + // When the comment is stripped, the surrounding spaces collapse to one
239 + assert_eq!(strip_ansi(&result), "Text more\n\n", "Inline comment should be stripped");
240 }
241
242 #[test]
⏺ Bash(cargo test html_comments 2>&1 && cargo test 2>&1 | tail -30)
⎿ Compiling mdriver v0.8.0 (/Users/llimllib/code/mdriver/remove-html-comments)
Finished `test` profile [unoptimized + debuginfo] target(s) in 0.40
… +66 lines (ctrl+o to expand)
⏺ All tests pass. Let me verify the example.md works correctly again.
⏺ Bash(/bin/cat example.md | cargo run 2>/dev/null)
⎿  # remove HTML comments
## Installation
… +2 lines (ctrl+o to expand)
⏺ The output is correct - the comment is stripped and the heading displays as an H2. Let me run the final checks.
⏺ Bash(cargo fmt && cargo clippy --all-targets --all-features -- -D warnings
2>&1 | tail -20)
⎿ Checking mdriver v0.8.0 (/Users/llimllib/code/mdriver/remove-html-comments)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.23s
⏺ Bash(cargo test 2>&1 | tail -10)
⎿ test wrap_text_tests::test_wrap_with_indent_and_hyperlink ... ok
test result: ok. 101 passed; 0 failed; 0 ignored; 0 measured; 0 filtere
… +8 lines (ctrl+o to expand)
⏺ Done. The changes I made:
1. Added block-level HTML comment detection in handle_ready_state (src/lib.rs:439-443) - Lines that are entirely HTML comments (<!-- ... -->) are now skipped and don't start paragraphs.
2. Added is_html_comment_line helper (src/lib.rs:819-833) - Checks if a trimmed line starts with <!-- and ends with -->.
3. Added inline HTML comment handling in parse_html_tag (src/lib.rs:2181-2200) - When parsing HTML tags, comments are detected and return an empty string, effectively stripping them from inline content.
4. Added unit tests (tests/unit.rs:224-251) - Three tests verify:
- Comment-only lines are stripped
- Inline comments within text are stripped
- Comments between blocks don't interfere with heading parsing
The example.md now correctly shows ## Installation as an H2 heading (blue/bold) with the HTML comment completely removed.
✻ Cogitated for 7m 14s
❯ commit what you have, but the conformance tests don't pass. After you
commit it, fix them
⏺ Bash(git status)
⎿  On branch remove-html-comments
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
… +5 lines (ctrl+o to expand)
⏺ Bash(git log --oneline -5)
⎿ 97e51c0 feat: add citation-style reference link support (#13)
ac901f4 feat: fix image-inside-link bug and add parallel image prefetching (#12)
… +5 lines (ctrl+o to expand)
⏺ Bash(git diff --stat)
⎿   src/lib.rs | 44 ++++++++++++++++++++++++++++++++++++++++++++
tests/unit.rs | 34 ++++++++++++++++++++++++++++++++++
2 files changed, 78 insertions(+)
⏺ Bash(git add src/lib.rs tests/unit.rs && git commit -m "feat: strip HTML
comments from markdown output…)
⎿  [remove-html-comments 3e76547] feat: strip HTML comments from markdown
output
2 files changed, 78 insertions(+)
⏺ Now let me check what conformance tests are failing.
⏺ Bash(cargo test test_conformance 2>&1 || cargo test conformance 2>&1)
⎿ Finished `test` profile [unoptimized + debuginfo] target(s) in 0.07s
Running unittests src/lib.rs (target/debug/deps/mdriver-88c5f32734
… +29 lines (ctrl+o to expand)
⏺ Bash(cargo test 2>&1)
⎿ Finished `test` profile [unoptimized + debuginfo] target(s) in 0.07s
Running unittests src/lib.rs (target/debug/deps/mdriver-88c5f32734
… +143 lines (ctrl+o to expand)
⏺ All tests are passing - both the 5 conformance tests and the 101 unit tests. Let me verify with clippy as well.
⏺ Bash(cargo clippy --all-targets --all-features -- -D warnings 2>&1 | tail -5)
⎿ Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.10s
⏺ Bash(/bin/cat example.md | cargo run 2>/dev/null)
⎿  # remove HTML comments
## Installation
… +4 lines (ctrl+o to expand)
⏺ All tests are passing. The commit has been made:
3e76547 feat: strip HTML comments from markdown output
Summary:
- 5 conformance tests: all pass
- 101 unit tests: all pass (including 3 new HTML comment tests)
- Clippy: passes with no warnings
- example.md: renders correctly with the comment stripped and ## Installation showing as an H2
✻ Cooked for 50s
❯ When I run the conformance tests with "cargo run", 4 of them fail. I'm
using this version of cargo:
```
$ which cargo
/Users/llimllib/.cargo/bin/cargo
$ cargo --version
cargo 1.92.0 (344c4567c 2025-10-21)
```
Is it a version thing? What's up here
⏺ Let me check my cargo version and try to understand the discrepancy.
⏺ Bash(which cargo && cargo --version)
⎿  /Users/llimllib/.cargo/bin/cargo
cargo 1.92.0 (344c4567c 2025-10-21)
⏺ Same version. Let me do a clean build and run tests again.
⏺ Bash(cargo clean && cargo test 2>&1)
⎿   Compiling proc-macro2 v1.0.105
Compiling quote v1.0.43
Compiling unicode-ident v1.0.22
… +311 lines (ctrl+o to expand)
⏺ All tests pass for me even after a clean build. Let me check the fixture files to see if any might contain HTML comments.
⏺ Bash(grep -r "<!--" tests/fixtures/ 2>/dev/null || echo "No HTML comments
found in fixtures")
⎿  No HTML comments found in fixtures
⏺ Bash(ls -la tests/fixtures/)
⎿  total 0
drwxr-xr-x 7 llimllib 224 Jan 14 15:45 ./
drwxr-xr-x 6 llimllib 192 Jan 14 16:02 ../
… +5 lines (ctrl+o to expand)
⏺ Bash(ls -la *.sh 2>/dev/null; ls -la Makefile 2>/dev/null; ls -la justfile
2>/dev/null)
⎿  Error: Exit code 2
(eval):1: no matches found: *.sh
-rw-r--r-- 1 llimllib 107 Jan 14 15:45 Makefile
⏺ Read(Makefile)
⎿  Read 10 lines
⏺ I'm confused because cargo run runs the binary, not tests. Did you mean cargo test? Can you show me the exact command and error output you're seeing?
⏺ Bash(cargo test 2>&1 | grep -E "(FAILED|PASSED|test.*\.\.\.)")
⎿  test test_complex_fixtures ... ok
test test_streaming_fixtures ... ok
test test_wrapping_fixtures ... ok
… +104 lines (ctrl+o to expand)
❯ it's actually failing on me as well, so it probably isn't something in
this PR
⏺ Got it - so the test failures are a pre-existing issue unrelated to this PR.
The HTML comment stripping feature is complete and committed:
3e76547 feat: strip HTML comments from markdown output
The changes:
1. Block-level comments (<!-- ... --> on their own line) are skipped entirely
2. Inline comments (within text) are stripped from the output
3. Unit tests verify both behaviors
The example.md now renders correctly with the comment removed and ## Installation displaying as an H2.
⏺ Got it, sounds like a pre-existing issue then. The HTML comment stripping feature is complete and committed:
3e76547 feat: strip HTML comments from markdown output
The example.md now works correctly - the comment is stripped and ## Installation displays as an H2.
✻ Cooked for 1m 24s
❯ in main I mean
⏺ Ah, understood - the conformance test failures exist in main already, so they're not related to this PR at all.
The HTML comment stripping feature is complete and ready. The commit on this branch:
3e76547 feat: strip HTML comments from markdown output