
Links

  1. The existing Perl 5/7 grammar
  2. The existing heredocs portion
  3. Docs for TextMate Language Grammars

Current Heredoc Rules

Individual rules exist for the following scopes

| Definition | Indented? | Quoted? | Scope | Notes |
|---|---|---|---|---|
| <<"HTML" | No | Double | text.html.embedded.perl | - |
| <<"XML" | No | Double | text.xml.embedded.perl | - |
| <<"CSS" | No | Double | text.css.embedded.perl | - |
| <<"JAVASCRIPT" | No | Double | text.js.embedded.perl | - |
| <<"SQL" | No | Double | source.sql.embedded.perl | - |
| <<"POSTSCRIPT" | No | Double | text.postscript.embedded.perl | - |
| <<"OTHER" | No | Double | string.unquoted.heredoc.doublequote.perl | This rule assigns $self to the incorrect capture. |
| <<'HTML' | No | Single | text.html.embedded.perl | - |
| <<'XML' | No | Single | text.xml.embedded.perl | - |
| <<'CSS' | No | Single | text.css.embedded.perl | - |
| <<'JAVASCRIPT' | No | Single | text.js.embedded.perl | - |
| <<'SQL' | No | Single | source.sql.embedded.perl | - |
| <<'POSTSCRIPT' | No | Single | text.postscript.embedded.perl | - |
| <<'OTHER' | No | Single | string.unquoted.heredoc.quote.perl | This rule assigns $self to the incorrect capture. |
| <<\::: | No | Single | string.unquoted.heredoc.quote.perl | This rule assigns $self to the incorrect capture. |
| <<`OTHER` | No | Backticks | string.unquoted.heredoc.backtick.perl | This rule assigns $self to the incorrect capture. |
| <<HTML | No | None | text.html.embedded.perl | - |
| <<XML | No | None | text.xml.embedded.perl | - |
| <<CSS | No | None | text.css.embedded.perl | This rule is not present. |
| <<JAVASCRIPT | No | None | source.js.embedded.perl | This scope uses "source.js" while its siblings use "text.js". Why? |
| <<SQL | No | None | source.sql.embedded.perl | - |
| <<POSTSCRIPT | No | None | source.postscript.embedded.perl | This scope uses "source.postscript" while its siblings use "text.postscript". Why? |
| <<OTHER | No | None | string.unquoted.heredoc.doublequote.perl | - |

Easy Wins

  • Fix the 4 rules that assign $self to the incorrect capture. $self should always be applied to the (.*) group, which captures the rest of the first line after the heredoc opener. (E.g. in my @vals = (<<HTML, 1);, $self should be applied to the , 1); portion of the text.)
  • Add the missing unquoted embedded CSS case.
  • Confirm whether the scope-name differences among siblings (text vs. source) are intended, or whether one of the two is an error.
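
As a sketch of the first fix: in the double-quoted OTHER pattern quoted under Unresolved Issues, the extra ([^"]*) group for the terminator name shifts the trailing (.*) from capture 4 to capture 5. The fragment below assumes the affected rules currently carry the $self include on capture 4, as the named rules do; that is an inference from the patterns shown here, not something checked against the grammar file. Only the begin pattern and the relevant capture are shown.

```xml
<dict>
  <key>begin</key>
  <string>(((&lt;&lt;) *"([^"]*)"))(.*)\n?</string>
  <key>captures</key>
  <dict>
    <!-- Captures 1-3 stay as in the named rules; capture 4 is the terminator name. -->
    <!-- Assumption: $self belongs on capture 5, the (.*) rest-of-line group. -->
    <key>5</key>
    <dict>
      <key>patterns</key>
      <array>
        <dict>
          <key>include</key>
          <string>$self</string>
        </dict>
      </array>
    </dict>
  </dict>
</dict>
```

The same group counting applies to the <<'OTHER' and <<\::: patterns, where the terminator name is also capture 4 and (.*) is capture 5.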

Potential Wins

There are currently 22 rules used to handle heredocs. It would be nice if they could be refactored into fewer rules.

  • There are three main parsing approaches: double-quoted, single-quoted, and bare. It would be nice if all double-quoted rules could be unified into one, all single-quoted rules into one, and all bare rules into one. This would require being able to set the contentName attribute dynamically based on a begin capture.
  • To make the existing rules also work for indented heredocs without duplicating each rule and making a small tweak, we'd need conditional expressions to work. For example, <<(~)?HTML(?(1)\s*)HTML would allow whitespace between the two HTMLs only if the ~ was present. Although this appears to be possible, it doesn't seem to work. Can it be done?
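
For comparison, the duplicate-and-tweak approach that the second bullet hopes to avoid might look roughly like this for double-quoted HTML. This is only a sketch: the <<~ begin pattern and the whitespace-tolerant end pattern are assumptions, not rules from the existing grammar, and the captures and patterns keys (omitted here) would be copied unchanged from the non-indented rule.

```xml
<dict>
  <!-- Hypothetical indented variant: "~" added after the heredoc operator. -->
  <key>begin</key>
  <string>(((&lt;&lt;~) *"HTML"))(.*)\n?</string>
  <key>contentName</key>
  <string>text.html.embedded.perl</string>
  <!-- Allows leading spaces or tabs before the terminator. It does not check -->
  <!-- that the body lines share the terminator's indentation. -->
  <key>end</key>
  <string>(^[ \t]*HTML$)</string>
</dict>
```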

Unresolved Issues

  • Why do the following approaches diverge?
    # Double-quoted other (<<"OTHER")
    (((&lt;&lt;) *"([^"]*)"))(.*)\n?

    # Single-quoted other and craziness
    # <<'OTHER'
    (((&lt;&lt;) *'([^']*)'))(.*)\n?
    # <<\:::
    (((&lt;&lt;) *\\((?![=\d\$\( ])[^;,'"`\s\)]*)))(.*)\n?

    # Un-quoted other (<<OTHER)
    (((&lt;&lt;) *((?![=\d\$\( ])[^;,'"`\s\)]*)))(.*)\n?
  • To correctly identify an indented heredoc, we should check that the exact whitespace preceding the end terminator also appears at the beginning of every line within the heredoc. If it doesn't, the rule should not match. For example:
my @sql_and_bind = (<<~SQL, $id);
	SELECT a, b, c
	FROM the_table
	WHERE id = ?
	SQL

is a valid indented heredoc, but the following is not:

my @sql_and_bind = (<<~SQL, $id);
	SELECT a, b, c
	FROM the_table
	WHERE id = ?
		SQL

and neither is:

my @sql_and_bind = (<<~SQL, $id);
	SELECT a, b, c
	FROM the_table
        WHERE id = ?
	SQL

Why is this last case invalid? Because the line for the WHERE clause is indented with eight spaces, while the end terminator is indented with a tab. Yeah, Perl is that picky. Is this level of parsing possible with TextMate grammars?

An Existing Heredoc Rule

<dict>
  <key>begin</key>
  <string>(((&lt;&lt;) *"HTML"))(.*)\n?</string>
  <key>captures</key>
  <dict>
    <key>1</key>
    <dict>
      <key>name</key>
      <string>punctuation.definition.string.perl</string>
    </dict>
    <key>2</key>
    <dict>
      <key>name</key>
      <string>string.unquoted.heredoc.doublequote.perl</string>
    </dict>
    <key>3</key>
    <dict>
      <key>name</key>
      <string>punctuation.definition.heredoc.perl</string>
    </dict>
    <key>4</key>
    <dict>
      <key>patterns</key>
      <array>
        <dict>
          <key>include</key>
          <string>$self</string>
        </dict>
      </array>
    </dict>
  </dict>
  <key>contentName</key>
  <string>text.html.embedded.perl</string>
  <key>end</key>
  <string>(^HTML$)</string>
  <key>patterns</key>
  <array>
    <dict>
      <key>include</key>
      <string>#escaped_char</string>
    </dict>
    <dict>
      <key>include</key>
      <string>#variable</string>
    </dict>
    <dict>
      <key>include</key>
      <string>text.html.basic</string>
    </dict>
  </array>
</dict>
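
Modeled on the rule above, the missing unquoted CSS rule listed under Easy Wins might look something like the following. Treat it as a sketch: the begin pattern, the scope name on capture 2, and the source.css include (the name CSS grammars commonly register under) are assumptions patterned on the sibling rules, not copied from the grammar. Unquoted heredocs interpolate like double-quoted ones, so the #escaped_char and #variable includes are kept.

```xml
<dict>
  <key>begin</key>
  <string>(((&lt;&lt;) *CSS))(.*)\n?</string>
  <key>captures</key>
  <dict>
    <key>1</key>
    <dict>
      <key>name</key>
      <string>punctuation.definition.string.perl</string>
    </dict>
    <key>2</key>
    <dict>
      <key>name</key>
      <!-- Assumption: same scope name the table lists for unquoted OTHER. -->
      <string>string.unquoted.heredoc.doublequote.perl</string>
    </dict>
    <key>3</key>
    <dict>
      <key>name</key>
      <string>punctuation.definition.heredoc.perl</string>
    </dict>
    <!-- Capture 4 is the (.*) rest-of-line group, so $self goes here. -->
    <key>4</key>
    <dict>
      <key>patterns</key>
      <array>
        <dict>
          <key>include</key>
          <string>$self</string>
        </dict>
      </array>
    </dict>
  </dict>
  <key>contentName</key>
  <string>text.css.embedded.perl</string>
  <key>end</key>
  <string>(^CSS$)</string>
  <key>patterns</key>
  <array>
    <dict>
      <key>include</key>
      <string>#escaped_char</string>
    </dict>
    <dict>
      <key>include</key>
      <string>#variable</string>
    </dict>
    <dict>
      <key>include</key>
      <!-- Assumption: the CSS grammar is registered as source.css. -->
      <string>source.css</string>
    </dict>
  </array>
</dict>
```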