@matthewfranglen
Created August 12, 2025 08:33
Lucene Query Syntax in Lark
start: clause_default
clause_default: clause_or+
clause_or: clause_and (or_ clause_and)*
clause_and: clause_not (and_ clause_not)*
clause_not: clause_basic (not_ clause_basic)*
clause_basic: modifier? LPAREN clause_default RPAREN term_modifier?
    | atom
atom: modifier? field multi_value term_modifier?
    | modifier? field? value term_modifier?
field: TERM_NORMAL COLON
value: range_term
    | normal
    | truncated
    | quoted
    | quoted_truncated
    | QMARK
    | anything
    | STAR
anything: STAR COLON STAR
two_sided_range_term: (LBRACK|LCURLY) range_value (TO? range_value)? (RBRACK|RCURLY)
range_term: two_sided_range_term
range_value: truncated
    | quoted
    | quoted_truncated
    | date
    | normal
    | STAR
multi_value: LPAREN clause_default RPAREN
normal: TERM_NORMAL
    | NUMBER
truncated: TERM_TRUNCATED
quoted_truncated: PHRASE_ANYTHING
quoted: PHRASE
modifier: PLUS
    | MINUS
term_modifier: boost fuzzy?
    | fuzzy boost?
boost: CARAT NUMBER?
fuzzy: TILDE NUMBER?
not_: AND NOT
    | NOT
and_: AND
or_: OR
date: DIGIT DIGIT? DATE_SEPARATOR DIGIT DIGIT? DATE_SEPARATOR DIGIT DIGIT (DIGIT DIGIT)?
LPAREN: "("
RPAREN: ")"
LBRACK: "["
RBRACK: "]"
COLON: ":"
PLUS: "+"
MINUS: /-|!/
STAR: "*"
QMARK: /\?+/
LCURLY: "{"
RCURLY: "}"
CARAT: "^"
TILDE: "~"
TO: "TO"
DIGIT: /\d/
AND: /(?i:and)|&&?/
OR: /(?i:or)|\|\|?/
NOT: /(?i:not)/
NUMBER: /\d+(\.\d+)?/
DATE_SEPARATOR: ("/"|"-"|".")
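// ESC_CHAR, TERM_START_CHAR and TERM_CHAR are not referenced by any rule; TERM_NORMAL and TERM_TRUNCATED repeat their patterns inline.
ESC_CHAR: /\\./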
TERM_START_CHAR: /(\\.|[^ \t\n\r\u3000'"()\[\]{}+\-!:~^?*\\])/
TERM_CHAR: /(\\.|[^ \t\n\r\u3000'"()\[\]{}!:~^?*\\])/
TERM_NORMAL: /(\\.|[^ \t\n\r\u3000'"()\[\]{}+\-!:~^?*\\])(\\.|[^ \t\n\r\u3000'"()\[\]{}!:~^?*\\])*/
TERM_TRUNCATED: /(?:\*|\?+)(?:(?:\\.|[^ \t\n\r\u3000'"()\[\]{}!:~^?*\\])+(?:\?+|\*))+((?:\\.|[^ \t\n\r\u3000'"()\[\]{}!:~^?*\\])*)|(?:\\.|[^ \t\n\r\u3000'"()\[\]{}!:~^?*\\])(?:(?:\\.|[^ \t\n\r\u3000'"()\[\]{}!:~^?*\\])*(?:\?+|\*))+((?:\\.|[^ \t\n\r\u3000'"()\[\]{}!:~^?*\\])*)|(?:\*|\?+)(?:\\.|[^ \t\n\r\u3000'"()\[\]{}!:~^?*\\])+/
PHRASE: /"(\\.|[^"\\?*])+"/
PHRASE_ANYTHING: /"(\\.|[^"\\])+"/
WS: /[ \t\r\n\u3000]+/
%ignore WS
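
The operator and phrase terminals above are easiest to understand by trying their patterns directly. The following is a minimal sketch using only Python's standard `re` module, with the patterns copied from the `AND`, `OR`, `NOT`, `PHRASE` and `PHRASE_ANYTHING` terminals. The `TERMINALS` table and the `matching_terminals` helper are illustrative names, not part of the grammar. The sketch checks each pattern in isolation, so it says nothing about how Lark resolves overlaps between terminals when lexing a full query.

```python
import re

# Patterns copied verbatim from the grammar's terminals of the same name.
TERMINALS = {
    "AND": re.compile(r"(?i:and)|&&?"),
    "OR": re.compile(r"(?i:or)|\|\|?"),
    "NOT": re.compile(r"(?i:not)"),
    "PHRASE": re.compile(r'"(\\.|[^"\\?*])+"'),
    "PHRASE_ANYTHING": re.compile(r'"(\\.|[^"\\])+"'),
}


def matching_terminals(text):
    """Return the names of the terminals whose pattern matches all of `text`."""
    return [name for name, pattern in TERMINALS.items() if pattern.fullmatch(text)]


if __name__ == "__main__":
    # Hypothetical sample inputs, chosen to exercise each terminal.
    for sample in ["AND", "and", "&", "||", "Not", '"hello world"', '"hel*"']:
        print(repr(sample), matching_terminals(sample))
```

Two things stand out from the patterns: the keyword operators are case-insensitive and also accept the symbolic forms `&`, `&&`, `|` and `||`, and a quoted phrase containing an unescaped `*` or `?` matches only `PHRASE_ANYTHING`, which is how the grammar tells `quoted` apart from `quoted_truncated`.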
@matthewfranglen

This grammar was created by asking GPT-5 to convert the Lucene query syntax from ANTLR to Lark. The ANTLR version is available here.

GPT-5 made an error in the `date` rule, which I corrected manually.
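
To make the shape of the `date` rule concrete, here is a rough Python sketch that expresses it as a single regular expression: one or two digits, a separator (`/`, `-` or `.`), one or two digits, a separator, then a two- or four-digit year. `DATE_PATTERN` and `looks_like_date` are illustrative names, not part of the grammar. The sketch is an approximation: the Lark rule is built from separate `DIGIT` and `DATE_SEPARATOR` tokens and the grammar ignores whitespace, so the parser may accept spacing that this regex rejects.

```python
import re

# Approximation of the grammar's rule:
# date: DIGIT DIGIT? DATE_SEPARATOR DIGIT DIGIT? DATE_SEPARATOR DIGIT DIGIT (DIGIT DIGIT)?
DATE_PATTERN = re.compile(r"\d\d?[/\-.]\d\d?[/\-.]\d\d(?:\d\d)?")


def looks_like_date(text):
    """Return True if `text` has the digit/separator shape of the date rule."""
    return DATE_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    # Hypothetical sample inputs.
    for sample in ["12/08/2025", "1.2.25", "12-8/2025", "2025-08-12", "1/2/202"]:
        print(sample, looks_like_date(sample))
```

Note that the rule does not require both separators to be the same character, and that a year-first form such as `2025-08-12` does not fit it, because the first component allows at most two digits.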
