Created August 12, 2025 08:33
Lucene Query Syntax in Lark
start: clause_default
clause_default: clause_or+
clause_or: clause_and (or_ clause_and)*
clause_and: clause_not (and_ clause_not)*
clause_not: clause_basic (not_ clause_basic)*
clause_basic: modifier? LPAREN clause_default RPAREN term_modifier?
    | atom
atom: modifier? field multi_value term_modifier?
    | modifier? field? value term_modifier?
field: TERM_NORMAL COLON
value: range_term
    | normal
    | truncated
    | quoted
    | quoted_truncated
    | QMARK
    | anything
    | STAR
anything: STAR COLON STAR
two_sided_range_term: (LBRACK|LCURLY) range_value (TO? range_value)? (RBRACK|RCURLY)
range_term: two_sided_range_term
range_value: truncated
    | quoted
    | quoted_truncated
    | date
    | normal
    | STAR
multi_value: LPAREN clause_default RPAREN
normal: TERM_NORMAL
    | NUMBER
truncated: TERM_TRUNCATED
quoted_truncated: PHRASE_ANYTHING
quoted: PHRASE
modifier: PLUS
    | MINUS
term_modifier: boost fuzzy?
    | fuzzy boost?
boost: CARAT NUMBER?
fuzzy: TILDE NUMBER?
not_: AND NOT
    | NOT
and_: AND
or_: OR
date: DIGIT DIGIT? DATE_SEPARATOR DIGIT DIGIT? DATE_SEPARATOR DIGIT DIGIT (DIGIT DIGIT)?
LPAREN: "("
RPAREN: ")"
LBRACK: "["
RBRACK: "]"
COLON: ":"
PLUS: "+"
MINUS: /-|!/
STAR: "*"
QMARK: /\?+/
LCURLY: "{"
RCURLY: "}"
CARAT: "^"
TILDE: "~"
TO: "TO"
DIGIT: /\d/
AND: /(?i:and)|&&?/
OR: /(?i:or)|\|\|?/
NOT: /(?i:not)/
NUMBER: /\d+(\.\d+)?/
DATE_SEPARATOR: ("/"|"-"|".")
ESC_CHAR: /\\./
TERM_START_CHAR: /(\\.|[^ \t\n\r\u3000'"()\[\]{}+\-!:~^?*\\])/
TERM_CHAR: /(\\.|[^ \t\n\r\u3000'"()\[\]{}!:~^?*\\])/
TERM_NORMAL: /(\\.|[^ \t\n\r\u3000'"()\[\]{}+\-!:~^?*\\])(\\.|[^ \t\n\r\u3000'"()\[\]{}!:~^?*\\])*/
TERM_TRUNCATED: /(?:\*|\?+)(?:(?:\\.|[^ \t\n\r\u3000'"()\[\]{}!:~^?*\\])+(?:\?+|\*))+((?:\\.|[^ \t\n\r\u3000'"()\[\]{}!:~^?*\\])*)|(?:\\.|[^ \t\n\r\u3000'"()\[\]{}!:~^?*\\])(?:(?:\\.|[^ \t\n\r\u3000'"()\[\]{}!:~^?*\\])*(?:\?+|\*))+((?:\\.|[^ \t\n\r\u3000'"()\[\]{}!:~^?*\\])*)|(?:\*|\?+)(?:\\.|[^ \t\n\r\u3000'"()\[\]{}!:~^?*\\])+/
PHRASE: /"(\\.|[^"\\?*])+"/
PHRASE_ANYTHING: /"(\\.|[^"\\])+"/
WS: /[ \t\r\n\u3000]+/
%ignore WS
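
The terminal definitions above are ordinary regular expressions, so a few of them can be checked in isolation with Python's standard `re` module. The following is a minimal sketch, not part of the grammar: the patterns are copied from the AND, OR, NOT, NUMBER, PHRASE and PHRASE_ANYTHING terminals, while the `matches` helper is a hypothetical name introduced only for this illustration. It tests each pattern against a whole string and does not reproduce how Lark's lexer chooses between competing terminals.

```python
import re

# Patterns copied from the grammar's terminal definitions.
AND = re.compile(r"(?i:and)|&&?")
OR = re.compile(r"(?i:or)|\|\|?")
NOT = re.compile(r"(?i:not)")
NUMBER = re.compile(r"\d+(\.\d+)?")
PHRASE = re.compile(r'"(\\.|[^"\\?*])+"')
PHRASE_ANYTHING = re.compile(r'"(\\.|[^"\\])+"')


def matches(pattern: re.Pattern, text: str) -> bool:
    """Hypothetical helper: True when the whole string is one instance of the terminal."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    # Operators are case-insensitive words or their symbolic forms.
    print(matches(AND, "And"), matches(AND, "&&"), matches(OR, "|"))
    # PHRASE rejects unescaped wildcards; PHRASE_ANYTHING accepts them.
    print(matches(PHRASE, '"hel*o"'), matches(PHRASE_ANYTHING, '"hel*o"'))
```

Under these patterns a quoted string containing an unescaped `*` or `?` can only be a PHRASE_ANYTHING, which is how the grammar separates `quoted` from `quoted_truncated`.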
This grammar was created by asking GPT-5 to convert the Lucene query syntax from ANTLR to Lark. The ANTLR version is available here.
GPT-5 made an error in the date rule, which I corrected manually.
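
To make the shape of the `date` rule easier to see, here is a rough regex approximation of it in Python. This is a sketch under stated assumptions rather than the grammar's actual behaviour: `DATE_LIKE` and `looks_like_date` are hypothetical names, the pattern ignores the whitespace that the grammar skips between tokens, and it says nothing about how Lark resolves DIGIT against NUMBER when lexing.

```python
import re

# Hypothetical approximation of:
#   date: DIGIT DIGIT? DATE_SEPARATOR DIGIT DIGIT? DATE_SEPARATOR DIGIT DIGIT (DIGIT DIGIT)?
# One or two digits, a separator, one or two digits, a separator,
# then a two- or four-digit year. Separators are "/", "-" or ".".
DATE_LIKE = re.compile(r"\d\d?[/\-.]\d\d?[/\-.]\d\d(?:\d\d)?")


def looks_like_date(text: str) -> bool:
    """Hypothetical helper: True when the whole string has the date rule's shape."""
    return DATE_LIKE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["12/08/2025", "1.2.25", "2025-08-12", "1/2/3"]:
        print(sample, looks_like_date(sample))
```

As in the rule itself, the two separators do not have to be the same character, and a year-first form such as `2025-08-12` does not fit the shape.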