Created December 23, 2011 04:48
Difference in regex quantifier parsing between Java and .NET
So I discovered this interesting difference in how Java and .NET parse regex quantifiers by throwing myself against a brick wall (Twitter's regex library https://github.com/twitter/twitter-text-java/blob/master/src/com/twitter/Regex.java).
In Java, if you do
[[a-z]\-]*
it actually matches zero or more characters that are each either in "a-z" or "-". Java seems to automatically include everything inside any child square brackets.
In .NET,
[[a-z]\-]*
only matches zero or more of "-", and just one of [a-z].
So the solution is to remove the inner brackets.
[a-z\-]*
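The Java side of this can be checked directly. Below is a minimal sketch, assuming the standard java.util.regex package, whose documentation describes a bracket nested inside a character class as a union. The class name NestedClassDemo and the helper fullMatch are made up for illustration, and the sketch does not exercise .NET at all.

```java
import java.util.regex.Pattern;

public class NestedClassDemo {
    // Hypothetical helper: true only if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String nested = "[[a-z]\\-]*"; // the original pattern, as a Java string literal
        String flat = "[a-z\\-]*";     // the same pattern with the inner brackets removed

        // Java reads the nested [a-z] as a union with "-", so both forms
        // accept a run of lowercase letters and dashes.
        System.out.println(fullMatch(nested, "foo-bar")); // true
        System.out.println(fullMatch(flat, "foo-bar"));   // true

        // "]" is not a member of the class in Java, so this input is rejected.
        System.out.println(fullMatch(nested, "a-]"));     // false
    }
}
```

Since the flattened form behaves the same in this check, it is the safer spelling when the same pattern has to be shared across regex engines.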
Hey Long Zheng, here is my take, together with my friend's (https://github.com/nicolasavru), on the regex:
[[a-z]-]*
Break it down:
[ ] --> it's a character class; what did you put inside it? '[a-z'. It will match a '[' OR any lowercase letter. Then it will match a single dash followed by zero or more ']'.
For this regex to match, you will need EXACTLY ONE of '[' or a lowercase letter, followed by EXACTLY ONE dash, followed by ZERO OR MORE ']'.
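That reading can be spelled out unambiguously by escaping the brackets that are meant literally. Here is a minimal sketch in Java, assuming java.util.regex; the class name LiteralBracketDemo, the helper fullMatch, and the escaped pattern are illustrative and not taken from the gist. The escapes matter in Java, because its documentation treats an unescaped '[' inside a class as the start of a nested class (a union).

```java
import java.util.regex.Pattern;

public class LiteralBracketDemo {
    // Hypothetical helper: true only if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // One of '[' or a lowercase letter, then one dash, then zero or more ']'.
        String literal = "[\\[a-z]-\\]*";

        System.out.println(fullMatch(literal, "a-"));      // true
        System.out.println(fullMatch(literal, "[-]]"));    // true
        System.out.println(fullMatch(literal, "foo-bar")); // false: only one leading character is allowed
    }
}
```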
[a-z-]*
The "solution" you had will match ZERO OR MORE characters, each a lowercase letter OR a dash.
In the end, the question is: what exactly were you looking for? You might have been confused about what exactly is getting matched, because my friend and I believe that Java and .NET match the regex the same way.