Skip to content

Instantly share code, notes, and snippets.

@wolfgang42
Created April 15, 2025 19:49
Show Gist options
  • Save wolfgang42/1fb054567d41676a75ea5e135418ef4c to your computer and use it in GitHub Desktop.
Save wolfgang42/1fb054567d41676a75ea5e135418ef4c to your computer and use it in GitHub Desktop.
aider chat started at 2025-04-10 15:45:07

aider chat started at 2025-04-10 15:45:07

You can skip this check with --no-gitignore
Add .aider* to .gitignore (recommended)? (Y)es/(N)o [Yes]: n
/home/wolf/src/vend/aider.venv/bin/aider src/languages/bash.js test/markup/bash/not-comments.txt test/markup/bash/not-comments.expect.txt
Using sonnet model with API key from environment.
Aider v0.81.2
Main model: anthropic/claude-3-7-sonnet-20250219 with diff edit format, infinite output
Weak model: anthropic/claude-3-5-haiku-20241022
Git repo: .git with 1,881 files
Warning: For large repos, consider using --subtree-only and .aiderignore
See: https://aider.chat/docs/faq.html#can-i-use-aider-in-a-large-mono-repo
Repo-map: using 4096 tokens, auto refresh
Added src/languages/bash.js to the chat.
Added test/markup/bash/not-comments.expect.txt to the chat.
Added test/markup/bash/not-comments.txt to the chat.

Scraping highlightjs/highlight.js#4234...
For the best web scraping, install Playwright:

/home/wolf/src/vend/aider.venv/bin/python3 -m pip install --upgrade --upgrade-strategy only-if-needed aider-chat[playwright] /home/wolf/src/vend/aider.venv/bin/python3 -m playwright install --with-deps chromium

See https://aider.chat/docs/install/optional.html#enable-playwright for more info.

Install playwright? (Y)es/(N)o [Yes]: n
Unable to initialize playwright.
... added to chat.

/run git show d78749a

Add 1.6k tokens of command output to the chat? (Y)es/(N)o [Yes]: y
Added 100 lines of output to the chat.

/ask thoughts on how to investigate and fix this bug?

thoughts on how to investigate and fix this bug?

Investigating the Bash Comment Highlighting Bug

Looking at the issue and the code, I can see what's happening. Let me break down the problem and propose a solution:

The Problem

  1. Before commit d78749a, bash comments were handled using hljs.HASH_COMMENT_MODE, which is a standard comment mode that includes highlighting for TODOs, FIXMEs, etc. in comments.

  2. In commit d78749a, this was replaced with a custom COMMENT implementation to fix a bug where # characters within tokens were incorrectly being detected as comments. The new implementation uses a regex pattern that requires whitespace or start-of-line before the # character.

  3. However, the custom COMMENT implementation doesn't include the doctag highlighting that was previously provided by hljs.HASH_COMMENT_MODE.

Investigation Steps

Let's look at how hljs.COMMENT() is defined in the codebase. In src/lib/modes.js, the COMMENT function creates a comment mode with specific properties:

export const COMMENT = function(begin, end, modeOptions = {}) {
  const mode = inherit(
    {
      scope: 'comment',
      begin,
      end,
      contains: []
    },
    modeOptions
  );
  // ...
}

The standard HASH_COMMENT_MODE likely includes doctags in its contains array, but our custom implementation doesn't inherit this behavior because we're overriding the match and scope properties.

Solution

We need to modify the custom COMMENT implementation to include doctag highlighting while maintaining the fix for the # character detection. Here's how we can fix it:

  1. Look at how doctags are implemented in the standard comment mode
  2. Add similar doctag detection to our custom comment implementation

Here's my proposed fix for src/languages/bash.js:

const COMMENT = hljs.inherit(
  hljs.COMMENT(),
  {
    match: [
      /(^|\s)/,
      /#.*$/
    ],
    scope: {
      2: 'comment'
    },
    contains: [
      {
        scope: 'doctag',
        match: /@(?:param|returns|throws|exception|see|since|author|version|deprecated|todo|fixme|note|warning|xxx)\b/i
      },
      {
        scope: 'doctag',
        match: /\b(?:TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):/
      }
    ]
  }
);

This solution:

  1. Maintains the fix for # character detection (requiring whitespace or start-of-line)
  2. Adds doctag highlighting for common patterns like TODO:, FIXME:, etc.
  3. Also includes standard JSDoc-style tags like @todo, @deprecated, etc.

To test this solution, we should:

  1. Update the test case as suggested in the issue
  2. Verify that # TODO: this is a comment correctly highlights the TODO: part
  3. Ensure the original fix still works (i.e., # within tokens isn't detected as a comment)

Tokens: 10k sent, 675 received. Cost: $0.04 message, $0.04 session.
src/lib/modes.js
Add file to the chat? (Y)es/(N)o/(D)on't ask again [Yes]: y

Thanks for adding src/lib/modes.js. Now I can see exactly how the standard comment mode is implemented.

Looking at the code, I can see that the COMMENT function in modes.js already adds doctag detection to comment modes:

export const COMMENT = function(begin, end, modeOptions = {}) {
  const mode = inherit(
    {
      scope: 'comment',
      begin,
      end,
      contains: []
    },
    modeOptions
  );
  mode.contains.push({
    scope: 'doctag',
    // hack to avoid the space from being included. the space is necessary to
    // match here to prevent the plain text rule below from gobbling up doctags
    begin: '[ ]*(?=(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):)',
    end: /(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):/,
    excludeBegin: true,
    relevance: 0
  });
  // ... more code for detecting plain text in comments
  return mode;
};

The issue is that in the bash language definition, we're inheriting from hljs.COMMENT() but then overriding the structure with our custom match and scope properties, which is causing the doctag detection to be lost.

Refined Solution

Here's a better solution based on the actual implementation:

const COMMENT = hljs.inherit(
  hljs.COMMENT(),
  {
    match: [
      /(^|\s)/,
      /#.*$/
    ],
    scope: {
      2: 'comment'
    }
  }
);

The problem is that when we use scope: { 2: 'comment' }, we're changing how the comment is structured in the output. Instead of having a single node with scope 'comment' that contains doctags, we're creating a structure where only part of the match (the second capture group) gets the 'comment' scope.

Let's try a different approach:

const COMMENT = {
  className: 'comment',
  begin: /(^|\s)#/,
  end: /$/,
  contains: [
    {
      scope: 'doctag',
      begin: '[ ]*(?=(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):)',
      end: /(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):/,
      excludeBegin: true,
      relevance: 0
    }
  ],
  // This ensures the comment is only matched if it starts with whitespace or at beginning of line
  beginScope: {
    1: ''  // This makes the first capture group (whitespace or start of line) not get any special styling
  }
};

But actually, looking at the code more carefully, I think there's an even simpler solution. Let's modify the bash language definition to use the standard HASH_COMMENT_MODE but with a custom begin pattern:

const COMMENT = hljs.inherit(
  hljs.HASH_COMMENT_MODE,
  {
    begin: /(^|\s)#/  // Only match # if it's at the start of a line or after whitespace
  }
);

This should maintain all the doctag detection from the standard comment mode while fixing the issue with # characters within tokens.

Let's test this solution with the test case from the issue:

echo asdf#qwert yuiop

echo asdf #qwert yuiop

# TODO: this *is* a comment

The expected output should be:

  • First line: No comment highlighting
  • Second line: #qwert yuiop highlighted as a comment
  • Third line: TODO: highlighted as a doctag within the comment

Tokens: 13k sent, 807 received. Cost: $0.05 message, $0.09 session.

/ask Won't that fix mean that the whitespace also gets marked as "comment"?

Won't that fix mean that the whitespace also gets marked as "comment"?

You're absolutely right, and that's a good catch. The whitespace before the # shouldn't be part of the comment.

Let's refine the solution. We need to:

  1. Only match # if it's at the start of a line or after whitespace
  2. But don't include that whitespace in the comment scope
  3. Preserve the doctag detection

Here's a better approach:

const COMMENT = {
  className: 'comment',
  begin: /(?:^|\s)#/,  // Look-behind assertion: # after start of line or whitespace
  end: /$/,
  contains: [
    {
      scope: 'doctag',
      begin: '[ ]*(?=(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):)',
      end: /(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):/,
      excludeBegin: true,
      relevance: 0
    }
    // We could copy the other contains from HASH_COMMENT_MODE if needed
  ],
  excludeBegin: true  // This excludes the whitespace from being part of the comment
};

But wait, that won't work because JavaScript regex doesn't support lookbehind in all browsers.

Let's try a different approach using the match property with capture groups:

const COMMENT = {
  scope: 'comment',
  match: [
    /(^|\s)/,  // First capture group: start of line or whitespace
    /#.*/      // Second capture group: # and everything after it
  ],
  captures: {
    // Only apply the 'comment' class to the second capture group
    1: { pattern: /.*/, scope: null },  // No scope for whitespace
    2: { pattern: /.*/, scope: 'comment' }  // Comment scope for the actual comment
  },
  contains: [
    {
      scope: 'doctag',
      begin: '[ ]*(?=(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):)',
      end: /(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):/,
      excludeBegin: true,
      relevance: 0
    }
  ]
};

Actually, I think I'm overcomplicating this. Let's look at the current implementation again:

const COMMENT = hljs.inherit(
  hljs.COMMENT(),
  {
    match: [
      /(^|\s)/,
      /#.*$/
    ],
    scope: {
      2: 'comment'
    }
  }
);

The issue isn't that the whitespace is being marked as a comment - it's not. The scope: { 2: 'comment' } is correctly applying the 'comment' scope only to the second capture group.

The issue is that the doctag detection is being lost. Let's try a simpler approach:

const COMMENT = hljs.inherit(
  hljs.HASH_COMMENT_MODE,  // This has the doctag detection
  {
    begin: /(?:^|\s)#/,    // Only match # after start of line or whitespace
    excludeBegin: true     // Don't include the whitespace in the comment
  }
);

But this still has the lookbehind issue. Let's try one more approach:

// Create a custom comment mode that includes doctag detection
const COMMENT = {
  className: 'comment',
  begin: /(^|\s)(#.*)/,  // Capture both the whitespace and the comment
  beginScope: {
    2: 'comment'  // Only apply 'comment' scope to the actual comment part
  },
  contains: [
    {
      scope: 'doctag',
      begin: '[ ]*(?=(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):)',
      end: /(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):/,
      excludeBegin: true,
      relevance: 0
    }
  ]
};

Actually, I think I'm still overcomplicating this. Let's look at the current implementation one more time and understand what's happening:

const COMMENT = hljs.inherit(
  hljs.COMMENT(),
  {
    match: [
      /(^|\s)/,
      /#.*$/
    ],
    scope: {
      2: 'comment'
    }
  }
);

The issue is that we're using match and scope instead of begin/end. When we do this, we're completely replacing the structure that would normally include the doctag detection.

Let's try a different approach that preserves the doctag detection:

// Start with the standard COMMENT function
const COMMENT = hljs.COMMENT(
  /(^|\s)#/,  // begin: match # only after start of line or whitespace
  /$/,        // end: end of line
  {
    excludeBegin: true  // Don't include the whitespace in the comment
  }
);

This should:

  1. Only match # if it's at the start of a line or after whitespace
  2. Not include that whitespace in the comment scope
  3. Preserve the doctag detection from the standard COMMENT function

Tokens: 13k sent, 1.1k received. Cost: $0.06 message, $0.15 session.

/ask If we excludeBegin /...#/, that seems like it has the opposite problem---now the # character is being left out of the highlighting

If we excludeBegin /...#/, that seems like it has the opposite problem---now the # character is being left out of the highlighting

You're absolutely right. Let's think about this more carefully.

The challenge is:

  1. We need to match # only if it's at the start of a line or after whitespace
  2. We need to include the # in the comment highlighting
  3. We need to exclude the whitespace before the # from the highlighting
  4. We need to preserve doctag detection

Let's try a different approach using a more precise regex pattern:

const COMMENT = hljs.COMMENT(
  /(?:^|\s)(#)/,  // begin: match # only after start of line or whitespace, and capture just the #
  /$/,            // end: end of line
  {
    beginScope: {
      1: 'punctuation'  // Give the # character a specific scope if desired
    }
  }
);

But this still has the lookbehind issue in JavaScript regex.

Let's try a different approach using the on:begin callback:

const COMMENT = hljs.COMMENT(
  /#/,  // begin: match any #
  /$/,  // end: end of line
  {
    /** @type {ModeCallback} */
    "on:begin": (match, response) => {
      // Check if # is at the start of a line or after whitespace
      const beforeHash = match.input.substring(0, match.index);
      if (beforeHash !== '' && !/\s$/.test(beforeHash)) {
        response.ignoreMatch();
      }
    }
  }
);

This should:

  1. Match any # character
  2. But then use the on:begin callback to check if it's at the start of a line or after whitespace
  3. If not, ignore the match
  4. Preserve doctag detection from the standard COMMENT function

Let's see if this approach works better.

Tokens: 15k sent, 398 received. Cost: $0.05 message, $0.20 session.

/ask We've made the decision to allow using lookbehind; all new browsers support it now.

We've made the decision to allow using lookbehind; all new browsers support it now.

Great! That simplifies our solution significantly. With lookbehind support, we can create a clean solution:

const COMMENT = hljs.COMMENT(
  /(?<=^|\s)#/,  // begin: match # only if preceded by start of line or whitespace
  /$/,           // end: end of line
);

This should:

  1. Only match # if it's at the start of a line or after whitespace (using lookbehind)
  2. Include the # in the comment highlighting
  3. Exclude the whitespace before the # from the highlighting
  4. Preserve doctag detection from the standard COMMENT function

This is a much cleaner solution that addresses all our requirements. Let's test it with the example:

echo asdf#qwert yuiop

echo asdf #qwert yuiop

# TODO: this *is* a comment

With this solution:

  • First line: No comment highlighting (# is part of a token)
  • Second line: Only #qwert yuiop is highlighted as a comment (whitespace before # is not)
  • Third line: # TODO: this *is* a comment is highlighted as a comment, with TODO: highlighted as a doctag

Tokens: 15k sent, 268 received. Cost: $0.05 message, $0.25 session.

/code go ahead

go ahead

I'll implement the fix for the bash comment highlighting issue using lookbehind. Here's the change:

src/languages/bash.js

<<<<<<< SEARCH
  const COMMENT = hljs.inherit(
    hljs.COMMENT(),
    {
      match: [
        /(^|\s)/,
        /#.*$/
      ],
      scope: {
        2: 'comment'
      }
    }
  );
=======
  const COMMENT = hljs.COMMENT(
    /(?<=^|\s)#/,  // begin: match # only if preceded by start of line or whitespace
    /$/            // end: end of line
  );
=======
>>>>>>> REPLACE

Now let's update the test case to include a TODO comment:

test/markup/bash/not-comments.txt

<<<<<<< SEARCH
echo asdf#qwert yuiop

echo asdf #qwert yuiop
=======
echo asdf#qwert yuiop

echo asdf #qwert yuiop

# TODO: this *is* a comment
>>>>>>> REPLACE

And update the expected output:

test/markup/bash/not-comments.expect.txt

<<<<<<< SEARCH
<span class="hljs-built_in">echo</span> asdf#qwert yuiop

<span class="hljs-built_in">echo</span> asdf <span class="hljs-comment">#qwert yuiop</span>
=======
<span class="hljs-built_in">echo</span> asdf#qwert yuiop

<span class="hljs-built_in">echo</span> asdf <span class="hljs-comment">#qwert yuiop</span>

<span class="hljs-comment"># <span class="hljs-doctag">TODO:</span> this *is* a comment</span>
>>>>>>> REPLACE

You can run the tests to verify the fix works correctly:

npm test

Or specifically test just the bash language:

npm test -- -g "markup bash"

Tokens: 17k sent, 423 received. Cost: $0.06 message, $0.31 session.
Applied edit to src/languages/bash.js
Applied edit to test/markup/bash/not-comments.expect.txt
Applied edit to test/markup/bash/not-comments.txt
Commit c524f41 fix: use lookbehind to improve bash comment highlighting

This commit resolves an issue with bash comment highlighting where:

  • # within tokens was incorrectly detected as a comment
  • TODO and other doctags were not being highlighted in comments

The solution uses a lookbehind regex to ensure comments are only detected when # is preceded by whitespace or the start of a line, while preserving the existing doctag detection mechanism.

Fixes #4234

npm test
Run shell command? (Y)es/(N)o/(S)kip all/(D)on't ask again [Yes]: y
Running npm test
Add command output to the chat? (Y)es/(N)o/(D)on't ask again [Yes]: n
npm test -- -g "markup bash"
Run shell command? (Y)es/(N)o/(S)kip all/(D)on't ask again [Yes]: n

/test npm run build && npm run test | tail -n 30

/exit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment