You can skip this check with --no-gitignore
Add .aider* to .gitignore (recommended)? (Y)es/(N)o [Yes]: n
/home/wolf/src/vend/aider.venv/bin/aider src/languages/bash.js test/markup/bash/not-comments.txt test/markup/bash/not-comments.expect.txt
Using sonnet model with API key from environment.
Aider v0.81.2
Main model: anthropic/claude-3-7-sonnet-20250219 with diff edit format, infinite output
Weak model: anthropic/claude-3-5-haiku-20241022
Git repo: .git with 1,881 files
Warning: For large repos, consider using --subtree-only and .aiderignore
See: https://aider.chat/docs/faq.html#can-i-use-aider-in-a-large-mono-repo
Repo-map: using 4096 tokens, auto refresh
Added src/languages/bash.js to the chat.
Added test/markup/bash/not-comments.expect.txt to the chat.
Added test/markup/bash/not-comments.txt to the chat.
Scraping highlightjs/highlight.js#4234...
For the best web scraping, install Playwright:
/home/wolf/src/vend/aider.venv/bin/python3 -m pip install --upgrade --upgrade-strategy only-if-needed aider-chat[playwright] /home/wolf/src/vend/aider.venv/bin/python3 -m playwright install --with-deps chromium
See https://aider.chat/docs/install/optional.html#enable-playwright for more info.
Install playwright? (Y)es/(N)o [Yes]: n
Unable to initialize playwright.
... added to chat.
Add 1.6k tokens of command output to the chat? (Y)es/(N)o [Yes]: y
Added 100 lines of output to the chat.
Looking at the issue and the code, I can see what's happening. Let me break down the problem and propose a solution:
-
Before commit
d78749a
, bash comments were handled usinghljs.HASH_COMMENT_MODE
, which is a standard comment mode that includes highlighting for TODOs, FIXMEs, etc. in comments. -
In commit
d78749a
, this was replaced with a customCOMMENT
implementation to fix a bug where#
characters within tokens were incorrectly being detected as comments. The new implementation uses a regex pattern that requires whitespace or start-of-line before the#
character. -
However, the custom
COMMENT
implementation doesn't include the doctag highlighting that was previously provided byhljs.HASH_COMMENT_MODE
.
Let's look at how hljs.COMMENT()
is defined in the codebase. In src/lib/modes.js
, the COMMENT
function creates a comment mode with specific properties:
export const COMMENT = function(begin, end, modeOptions = {}) {
const mode = inherit(
{
scope: 'comment',
begin,
end,
contains: []
},
modeOptions
);
// ...
}
The standard HASH_COMMENT_MODE
likely includes doctags in its contains
array, but our custom implementation doesn't inherit this behavior because we're overriding the match
and scope
properties.
We need to modify the custom COMMENT
implementation to include doctag highlighting while maintaining the fix for the #
character detection. Here's how we can fix it:
- Look at how doctags are implemented in the standard comment mode
- Add similar doctag detection to our custom comment implementation
Here's my proposed fix for src/languages/bash.js
:
const COMMENT = hljs.inherit(
hljs.COMMENT(),
{
match: [
/(^|\s)/,
/#.*$/
],
scope: {
2: 'comment'
},
contains: [
{
scope: 'doctag',
match: /@(?:param|returns|throws|exception|see|since|author|version|deprecated|todo|fixme|note|warning|xxx)\b/i
},
{
scope: 'doctag',
match: /\b(?:TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):/
}
]
}
);
This solution:
- Maintains the fix for
#
character detection (requiring whitespace or start-of-line) - Adds doctag highlighting for common patterns like
TODO:
,FIXME:
, etc. - Also includes standard JSDoc-style tags like
@todo
,@deprecated
, etc.
To test this solution, we should:
- Update the test case as suggested in the issue
- Verify that
# TODO: this is a comment
correctly highlights theTODO:
part - Ensure the original fix still works (i.e.,
#
within tokens isn't detected as a comment)
Tokens: 10k sent, 675 received. Cost: $0.04 message, $0.04 session.
src/lib/modes.js
Add file to the chat? (Y)es/(N)o/(D)on't ask again [Yes]: y
Thanks for adding src/lib/modes.js
. Now I can see exactly how the standard comment mode is implemented.
Looking at the code, I can see that the COMMENT
function in modes.js
already adds doctag detection to comment modes:
export const COMMENT = function(begin, end, modeOptions = {}) {
const mode = inherit(
{
scope: 'comment',
begin,
end,
contains: []
},
modeOptions
);
mode.contains.push({
scope: 'doctag',
// hack to avoid the space from being included. the space is necessary to
// match here to prevent the plain text rule below from gobbling up doctags
begin: '[ ]*(?=(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):)',
end: /(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):/,
excludeBegin: true,
relevance: 0
});
// ... more code for detecting plain text in comments
return mode;
};
The issue is that in the bash language definition, we're inheriting from hljs.COMMENT()
but then overriding the structure with our custom match
and scope
properties, which is causing the doctag detection to be lost.
Here's a better solution based on the actual implementation:
const COMMENT = hljs.inherit(
hljs.COMMENT(),
{
match: [
/(^|\s)/,
/#.*$/
],
scope: {
2: 'comment'
}
}
);
The problem is that when we use scope: { 2: 'comment' }
, we're changing how the comment is structured in the output. Instead of having a single node with scope 'comment' that contains doctags, we're creating a structure where only part of the match (the second capture group) gets the 'comment' scope.
Let's try a different approach:
const COMMENT = {
className: 'comment',
begin: /(^|\s)#/,
end: /$/,
contains: [
{
scope: 'doctag',
begin: '[ ]*(?=(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):)',
end: /(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):/,
excludeBegin: true,
relevance: 0
}
],
// This ensures the comment is only matched if it starts with whitespace or at beginning of line
beginScope: {
1: '' // This makes the first capture group (whitespace or start of line) not get any special styling
}
};
But actually, looking at the code more carefully, I think there's an even simpler solution. Let's modify the bash language definition to use the standard HASH_COMMENT_MODE
but with a custom begin
pattern:
const COMMENT = hljs.inherit(
hljs.HASH_COMMENT_MODE,
{
begin: /(^|\s)#/ // Only match # if it's at the start of a line or after whitespace
}
);
This should maintain all the doctag detection from the standard comment mode while fixing the issue with #
characters within tokens.
Let's test this solution with the test case from the issue:
echo asdf#qwert yuiop
echo asdf #qwert yuiop
# TODO: this *is* a comment
The expected output should be:
- First line: No comment highlighting
- Second line:
#qwert yuiop
highlighted as a comment - Third line:
TODO:
highlighted as a doctag within the comment
Tokens: 13k sent, 807 received. Cost: $0.05 message, $0.09 session.
You're absolutely right, and that's a good catch. The whitespace before the #
shouldn't be part of the comment.
Let's refine the solution. We need to:
- Only match
#
if it's at the start of a line or after whitespace - But don't include that whitespace in the comment scope
- Preserve the doctag detection
Here's a better approach:
const COMMENT = {
className: 'comment',
begin: /(?:^|\s)#/, // Look-behind assertion: # after start of line or whitespace
end: /$/,
contains: [
{
scope: 'doctag',
begin: '[ ]*(?=(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):)',
end: /(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):/,
excludeBegin: true,
relevance: 0
}
// We could copy the other contains from HASH_COMMENT_MODE if needed
],
excludeBegin: true // This excludes the whitespace from being part of the comment
};
But wait, that won't work because JavaScript regex doesn't support lookbehind in all browsers.
Let's try a different approach using the match
property with capture groups:
const COMMENT = {
scope: 'comment',
match: [
/(^|\s)/, // First capture group: start of line or whitespace
/#.*/ // Second capture group: # and everything after it
],
captures: {
// Only apply the 'comment' class to the second capture group
1: { pattern: /.*/, scope: null }, // No scope for whitespace
2: { pattern: /.*/, scope: 'comment' } // Comment scope for the actual comment
},
contains: [
{
scope: 'doctag',
begin: '[ ]*(?=(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):)',
end: /(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):/,
excludeBegin: true,
relevance: 0
}
]
};
Actually, I think I'm overcomplicating this. Let's look at the current implementation again:
const COMMENT = hljs.inherit(
hljs.COMMENT(),
{
match: [
/(^|\s)/,
/#.*$/
],
scope: {
2: 'comment'
}
}
);
The issue isn't that the whitespace is being marked as a comment - it's not. The scope: { 2: 'comment' }
is correctly applying the 'comment' scope only to the second capture group.
The issue is that the doctag detection is being lost. Let's try a simpler approach:
const COMMENT = hljs.inherit(
hljs.HASH_COMMENT_MODE, // This has the doctag detection
{
begin: /(?:^|\s)#/, // Only match # after start of line or whitespace
excludeBegin: true // Don't include the whitespace in the comment
}
);
But this still has the lookbehind issue. Let's try one more approach:
// Create a custom comment mode that includes doctag detection
const COMMENT = {
className: 'comment',
begin: /(^|\s)(#.*)/, // Capture both the whitespace and the comment
beginScope: {
2: 'comment' // Only apply 'comment' scope to the actual comment part
},
contains: [
{
scope: 'doctag',
begin: '[ ]*(?=(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):)',
end: /(TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):/,
excludeBegin: true,
relevance: 0
}
]
};
Actually, I think I'm still overcomplicating this. Let's look at the current implementation one more time and understand what's happening:
const COMMENT = hljs.inherit(
hljs.COMMENT(),
{
match: [
/(^|\s)/,
/#.*$/
],
scope: {
2: 'comment'
}
}
);
The issue is that we're using match
and scope
instead of begin
/end
. When we do this, we're completely replacing the structure that would normally include the doctag detection.
Let's try a different approach that preserves the doctag detection:
// Start with the standard COMMENT function
const COMMENT = hljs.COMMENT(
/(^|\s)#/, // begin: match # only after start of line or whitespace
/$/, // end: end of line
{
excludeBegin: true // Don't include the whitespace in the comment
}
);
This should:
- Only match
#
if it's at the start of a line or after whitespace - Not include that whitespace in the comment scope
- Preserve the doctag detection from the standard COMMENT function
Tokens: 13k sent, 1.1k received. Cost: $0.06 message, $0.15 session.
/ask If we excludeBegin /...#/, that seems like it has the opposite problem---now the # character is being left out of the highlighting
If we excludeBegin /...#/, that seems like it has the opposite problem---now the # character is being left out of the highlighting
You're absolutely right. Let's think about this more carefully.
The challenge is:
- We need to match
#
only if it's at the start of a line or after whitespace - We need to include the
#
in the comment highlighting - We need to exclude the whitespace before the
#
from the highlighting - We need to preserve doctag detection
Let's try a different approach using a more precise regex pattern:
const COMMENT = hljs.COMMENT(
/(?:^|\s)(#)/, // begin: match # only after start of line or whitespace, and capture just the #
/$/, // end: end of line
{
beginScope: {
1: 'punctuation' // Give the # character a specific scope if desired
}
}
);
But this still has the lookbehind issue in JavaScript regex.
Let's try a different approach using the on:begin
callback:
const COMMENT = hljs.COMMENT(
/#/, // begin: match any #
/$/, // end: end of line
{
/** @type {ModeCallback} */
"on:begin": (match, response) => {
// Check if # is at the start of a line or after whitespace
const beforeHash = match.input.substring(0, match.index);
if (beforeHash !== '' && !/\s$/.test(beforeHash)) {
response.ignoreMatch();
}
}
}
);
This should:
- Match any
#
character - But then use the
on:begin
callback to check if it's at the start of a line or after whitespace - If not, ignore the match
- Preserve doctag detection from the standard COMMENT function
Let's see if this approach works better.
Tokens: 15k sent, 398 received. Cost: $0.05 message, $0.20 session.
Great! That simplifies our solution significantly. With lookbehind support, we can create a clean solution:
const COMMENT = hljs.COMMENT(
/(?<=^|\s)#/, // begin: match # only if preceded by start of line or whitespace
/$/, // end: end of line
);
This should:
- Only match
#
if it's at the start of a line or after whitespace (using lookbehind) - Include the
#
in the comment highlighting - Exclude the whitespace before the
#
from the highlighting - Preserve doctag detection from the standard COMMENT function
This is a much cleaner solution that addresses all our requirements. Let's test it with the example:
echo asdf#qwert yuiop
echo asdf #qwert yuiop
# TODO: this *is* a comment
With this solution:
- First line: No comment highlighting (# is part of a token)
- Second line: Only
#qwert yuiop
is highlighted as a comment (whitespace before # is not) - Third line:
# TODO: this *is* a comment
is highlighted as a comment, withTODO:
highlighted as a doctag
Tokens: 15k sent, 268 received. Cost: $0.05 message, $0.25 session.
I'll implement the fix for the bash comment highlighting issue using lookbehind. Here's the change:
src/languages/bash.js
<<<<<<< SEARCH
const COMMENT = hljs.inherit(
hljs.COMMENT(),
{
match: [
/(^|\s)/,
/#.*$/
],
scope: {
2: 'comment'
}
}
);
=======
const COMMENT = hljs.COMMENT(
/(?<=^|\s)#/, // begin: match # only if preceded by start of line or whitespace
/$/ // end: end of line
);
=======
>>>>>>> REPLACE
Now let's update the test case to include a TODO comment:
test/markup/bash/not-comments.txt
<<<<<<< SEARCH
echo asdf#qwert yuiop
echo asdf #qwert yuiop
=======
echo asdf#qwert yuiop
echo asdf #qwert yuiop
# TODO: this *is* a comment
>>>>>>> REPLACE
And update the expected output:
test/markup/bash/not-comments.expect.txt
<<<<<<< SEARCH
<span class="hljs-built_in">echo</span> asdf#qwert yuiop
<span class="hljs-built_in">echo</span> asdf <span class="hljs-comment">#qwert yuiop</span>
=======
<span class="hljs-built_in">echo</span> asdf#qwert yuiop
<span class="hljs-built_in">echo</span> asdf <span class="hljs-comment">#qwert yuiop</span>
<span class="hljs-comment"># <span class="hljs-doctag">TODO:</span> this *is* a comment</span>
>>>>>>> REPLACE
You can run the tests to verify the fix works correctly:
npm test
Or specifically test just the bash language:
npm test -- -g "markup bash"
Tokens: 17k sent, 423 received. Cost: $0.06 message, $0.31 session.
Applied edit to src/languages/bash.js
Applied edit to test/markup/bash/not-comments.expect.txt
Applied edit to test/markup/bash/not-comments.txt
Commit c524f41 fix: use lookbehind to improve bash comment highlighting
This commit resolves an issue with bash comment highlighting where:
#
within tokens was incorrectly detected as a comment- TODO and other doctags were not being highlighted in comments
The solution uses a lookbehind regex to ensure comments are only detected when #
is preceded by whitespace or the start of a line, while preserving the existing doctag detection mechanism.
Fixes #4234
npm test
Run shell command? (Y)es/(N)o/(S)kip all/(D)on't ask again [Yes]: y
Running npm test
Add command output to the chat? (Y)es/(N)o/(D)on't ask again [Yes]: n
npm test -- -g "markup bash"
Run shell command? (Y)es/(N)o/(S)kip all/(D)on't ask again [Yes]: n