Current Heredoc Rules

Individual rules exist for the following scopes:

Definition	Indented?	Quoted?	Scope	Notes
<<"HTML"	No	Double	text.html.embedded.perl	-
<<"XML"	No	Double	text.xml.embedded.perl	-
<<"CSS"	No	Double	text.css.embedded.perl	-
<<"JAVASCRIPT"	No	Double	text.js.embedded.perl	-
<<"SQL"	No	Double	source.sql.embedded.perl	-
<<"POSTSCRIPT"	No	Double	text.postscript.embedded.perl	-
<<"OTHER"	No	Double	string.unquoted.heredoc.doublequote.perl	This rule assigns `$self` to the incorrect capture.
<<'HTML'	No	Single	text.html.embedded.perl	-
<<'XML'	No	Single	text.xml.embedded.perl	-
<<'CSS'	No	Single	text.css.embedded.perl	-
<<'JAVASCRIPT'	No	Single	text.js.embedded.perl	-
<<'SQL'	No	Single	source.sql.embedded.perl	-
<<'POSTSCRIPT'	No	Single	text.postscript.embedded.perl	-
<<'OTHER'	No	Single	string.unquoted.heredoc.quote.perl	This rule assigns `$self` to the incorrect capture.
<<\:::	No	Single	string.unquoted.heredoc.quote.perl	This rule assigns `$self` to the incorrect capture.
<<`OTHER`	No	Backticks	string.unquoted.heredoc.backtick.perl	This rule assigns `$self` to the incorrect capture.
<<HTML	No	None	text.html.embedded.perl	-
<<XML	No	None	text.xml.embedded.perl	-
<<CSS	No	None	text.css.embedded.perl	This rule is not present.
<<JAVASCRIPT	No	None	source.js.embedded.perl	This scope uses "source.js" while its siblings use "text.js". Why?
<<SQL	No	None	source.sql.embedded.perl	-
<<POSTSCRIPT	No	None	source.postscript.embedded.perl	This scope uses "source.postscript" while its siblings use "text.postscript". Why?
<<OTHER	No	None	string.unquoted.heredoc.doublequote.perl	-

Easy Wins

Fix the 4 rules that assign `$self` to the incorrect capture. `$self` should always be applied to the `(.*)` capture, i.e. the rest of the first line after the heredoc opener. (E.g. in `my @vals = (<<HTML, 1);`, `$self` should be applied to the `, 1);` portion of the text.)
Add the missing unquoted embedded CSS case.
Confirm whether the scope name differences among siblings (i.e. text vs. source) are intended, or whether one of the two is in error.
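The first item can be illustrated by counting capture groups. The sketch below uses Python's `re` as a stand-in for the grammar's regex engine (both number groups by opening parenthesis, left to right) and the two double-quoted patterns quoted elsewhere in these notes, written with literal `<<`. It is an assumption, not something the notes state, that the four affected rules keep `$self` under capture key 4; the helper name `trailing_group` is made up for the example.

```python
import re

# Groups are numbered by opening parenthesis, left to right.
NAMED = re.compile(r'(((<<) *"HTML"))(.*)\n?')      # fixed delimiter
OTHER = re.compile(r'(((<<) *"([^"]*)"))(.*)\n?')   # any delimiter

def trailing_group(pattern, line):
    """Return (index, text) of the last group, i.e. the (.*) remainder."""
    m = pattern.search(line)
    assert m is not None, "line contains no matching heredoc opener"
    return pattern.groups, m.group(pattern.groups)

print(trailing_group(NAMED, 'my @vals = (<<"HTML", 1);'))   # (4, ', 1);')
print(trailing_group(OTHER, 'my @vals = (<<"OTHER", 1);'))  # (5, ', 1);')
```

In the "OTHER" pattern the extra `([^"]*)` group takes index 4, so a rule copied from a named sibling without renumbering would apply `$self` to the delimiter name rather than to the rest of the line.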

Potential Wins

There are currently 22 rules used to handle heredocs. It would be nice if they could be refactored into fewer rules.

There are three main parsing approaches: double-quoted, single-quoted, and bare. It would be nice if all double-quoted rules could be unified into one, all single-quoted rules into one, and all bare rules into one. This would require being able to set the contentName attribute dynamically based on a begin capture.
To make the existing rules also work for indented heredocs without duplicating each rule and making a small tweak, we'd need conditional expressions to work, e.g. `<<(~)?HTML(?(1)\s*)HTML` would allow spaces between the two HTMLs only if the ~ was present. Conditionals are apparently supported, yet this doesn't seem to work. Can it be done?
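To make the intended semantics of the conditional concrete, here is a minimal sketch in Python's `re`, which supports the `(?(1)...)` form. It is only a stand-in: it is not the engine the grammar runs on, and it folds opener, body, and terminator into a single pattern, whereas a grammar rule splits them across begin and end. It therefore shows what the conditional is meant to do, not whether a grammar can do it. The names `HEREDOC` and `matches` are made up for the example.

```python
import re

# (?(1)[ \t]*) permits leading blanks before the terminator only when
# group 1 (the "~") took part in the match.
HEREDOC = re.compile(
    r'<<(~)?HTML\b.*\n'     # opener line; group 1 is the optional "~"
    r'(?:.*\n)*?'           # body lines, matched lazily
    r'(?(1)[ \t]*)HTML$',   # terminator, indentable only after "<<~"
    re.MULTILINE,
)

def matches(source):
    """True if `source` contains a complete HTML heredoc under this model."""
    return HEREDOC.search(source) is not None

print(matches('print <<HTML;\n<p>hi</p>\nHTML\n'))        # True
print(matches('print <<~HTML;\n  <p>hi</p>\n  HTML\n'))   # True
print(matches('print <<HTML;\n  <p>hi</p>\n  HTML\n'))    # False
```

The third case fails because the terminator is indented but the opener has no `~`, which is the behavior the conditional is supposed to buy without a second rule.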

Unresolved Issues

Why do the following approaches diverge?

    # Double-quoted other (<<"OTHER")
    (((&lt;&lt;) *"([^"]*)"))(.*)\n?

    # Single-quoted other and craziness
    # <<'OTHER'
    (((&lt;&lt;) *'([^']*)'))(.*)\n?
    # <<\:::
    (((&lt;&lt;) *\\((?![=\d\$\( ])[^;,'"`\s\)]*)))(.*)\n?

    # Un-quoted other (<<OTHER)
    (((&lt;&lt;) *((?![=\d\$\( ])[^;,'"`\s\)]*)))(.*)\n?
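One visible difference is the negative lookahead `(?![=\d\$\( ])` that only the escaped and unquoted patterns carry. The sketch below runs the unquoted pattern through Python's `re` (a stand-in for the grammar's engine, with `<<` written literally) to show which inputs the lookahead rejects. Reading this as a guard against shift operators is an interpretation, not something these notes confirm; `bare_delimiter` is a made-up helper name.

```python
import re

# The unquoted pattern from above, written with literal << characters.
BARE = re.compile(r'''(((<<) *((?![=\d\$\( ])[^;,'"`\s\)]*)))(.*)\n?''')

def bare_delimiter(line):
    """Return the bare heredoc delimiter found in `line`, or None."""
    m = BARE.search(line)
    return m.group(4) if m else None

for line in ['print <<EOT;', '$x = 1 << 3;', '$x <<= 2;', '$x = $y <<$z;']:
    print(repr(line), '->', bare_delimiter(line))
```

Only the first line yields a delimiter (`EOT`); the other three, which look like a left shift, a shift-assignment, and a shift by a variable, are rejected. The quoted patterns have no such lookahead and rely on the opening quote alone.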

To correctly identify an indented heredoc, we should check that the whitespace preceding the end terminator also appears, character for character, at the beginning of every line within the heredoc. If it does not, we should not match the heredoc. For example:

my @sql_and_bind = (<<~SQL, $id);
	SELECT a, b, c
	FROM the_table
	WHERE id = ?
	SQL

is a valid indented heredoc, but the following is not:

my @sql_and_bind = (<<~SQL, $id);
	SELECT a, b, c
	FROM the_table
	WHERE id = ?
		SQL

and neither is:

my @sql_and_bind = (<<~SQL, $id);
	SELECT a, b, c
	FROM the_table
        WHERE id = ?
	SQL

Why is this last case invalid? Because the line for the WHERE clause is indented with eight spaces, while the end terminator is indented with a tab. Yeah, Perl is that picky. Is this level of parsing possible with TextMate grammars?
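Stated procedurally, the check looks like the following Python sketch. It only restates the rule described above so the three examples can be verified mechanically; it says nothing about whether a grammar can express it. The function name is made up, and the exemption for empty lines is an assumption, since these notes do not cover that case.

```python
def indented_heredoc_is_valid(body_lines, terminator_line, delimiter):
    """Every body line must begin with the exact whitespace that
    precedes the closing delimiter (tabs and spaces are not equated)."""
    stripped = terminator_line.lstrip(' \t')
    if stripped != delimiter:
        return False  # not a terminator line for this heredoc
    indent = terminator_line[: len(terminator_line) - len(stripped)]
    # Assumption: empty lines are exempt from the check.
    return all(line == '' or line.startswith(indent) for line in body_lines)

body = ['\tSELECT a, b, c', '\tFROM the_table', '\tWHERE id = ?']
print(indented_heredoc_is_valid(body, '\tSQL', 'SQL'))      # True
print(indented_heredoc_is_valid(body, '\t\tSQL', 'SQL'))    # False
mixed = body[:2] + ['        WHERE id = ?']
print(indented_heredoc_is_valid(mixed, '\tSQL', 'SQL'))     # False
```

The difficulty for a grammar is that the required prefix is only known once the terminator line has been seen, after all the body lines it constrains.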

An Existing Heredoc Rule

<dict>
  <key>begin</key>
  <string>(((&lt;&lt;) *"HTML"))(.*)\n?</string>
  <key>captures</key>
  <dict>
    <key>1</key>
    <dict>
      <key>name</key>
      <string>punctuation.definition.string.perl</string>
    </dict>
    <key>2</key>
    <dict>
      <key>name</key>
      <string>string.unquoted.heredoc.doublequote.perl</string>
    </dict>
    <key>3</key>
    <dict>
      <key>name</key>
      <string>punctuation.definition.heredoc.perl</string>
    </dict>
    <key>4</key>
    <dict>
      <key>patterns</key>
      <array>
        <dict>
          <key>include</key>
          <string>$self</string>
        </dict>
      </array>
    </dict>
  </dict>
  <key>contentName</key>
  <string>text.html.embedded.perl</string>
  <key>end</key>
  <string>(^HTML$)</string>
  <key>patterns</key>
  <array>
    <dict>
      <key>include</key>
      <string>#escaped_char</string>
    </dict>
    <dict>
      <key>include</key>
      <string>#variable</string>
    </dict>
    <dict>
      <key>include</key>
      <string>text.html.basic</string>
    </dict>
  </array>
</dict>

PatrickCronin/textmate-grammar-perl-indented-heredocs.md
