Last active
May 14, 2021 19:13
hashtag regex in Python
import re

# The first group is non-capturing; it only ensures we are at the beginning of
# the string or have whitespace before the hashtag (we don't want to capture
# URL anchors). Without the fullwidth hash mark (U+FF03), hashtags in Asian
# languages would be tough to match.
hashtag_re = re.compile(r"(?:^|\s)[#\uff03](\w+)", re.UNICODE)
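A minimal usage sketch for the hashtag pattern follows. The helper name `find_hashtags` and the sample text are illustrative assumptions, not part of the original gist; the pattern is repeated here, with the fullwidth number sign written as `\uff03`, so the snippet runs on its own.

```python
import re

# Copy of the hashtag pattern; \uff03 is the fullwidth number sign.
hashtag_re = re.compile(r"(?:^|\s)[#\uff03](\w+)", re.UNICODE)


def find_hashtags(text):
    """Hypothetical helper: return tag names without the leading hash mark."""
    return hashtag_re.findall(text)


# The URL fragment is skipped because its '#' is not preceded by whitespace.
sample = "#python is fun, see http://example.com/#anchor and \uff03日本語 too"
print(find_hashtags(sample))  # ['python', '日本語']
```

Because the pattern requires whitespace or the start of the string before the hash mark, a tag glued to preceding text (for example `word#tag`) is deliberately not matched.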
import re

# Similar to hashtag.py, this regex finds username mentions with a very
# permissive pattern, suitable for MediaWiki/Wikipedia usernames, which can
# include Unicode symbols and punctuation (almost anything but whitespace and a
# few punctuation marks). The fullwidth at sign (U+FF20) is accepted as well.
mention_re = re.compile(r"(?:^|\s)[@\uff20]([^\s#<>\[\]|{}]+)", re.UNICODE)
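A short usage sketch for the mention pattern, under the assumption that the second accepted prefix is the fullwidth at sign (U+FF20). The helper name `find_mentions` and the sample text are hypothetical. Note that the permissive character class also captures trailing punctuation such as a comma, so callers may want to strip it afterwards.

```python
import re

# Copy of the mention pattern; \uff20 is assumed to be the fullwidth at sign.
mention_re = re.compile(r"(?:^|\s)[@\uff20]([^\s#<>\[\]|{}]+)", re.UNICODE)


def find_mentions(text):
    """Hypothetical helper: return usernames without the leading at sign."""
    return mention_re.findall(text)


# The e-mail address is skipped because its '@' is not preceded by whitespace.
sample = "@Jimbo_Wales thanks! cc @Some.User, mail a@example.com"
print(find_mentions(sample))  # ['Jimbo_Wales', 'Some.User,']
```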
I tried but it did not work?