@jgomo3
Created May 10, 2025 13:18
Demo of regular expressions with Unicode support: split a text into words
(def split-into-words (partial re-seq #"\w+"))
(comment
;; split-into-words, implemented with the trivial \w+ regular
;; expression, works fine in English:
(split-into-words "Have a nice day.")
;; => ("Have" "a" "nice" "day")
;; But it fails with other languages, such as Spanish, because by
;; default \w matches only ASCII letters, digits, and underscore. In the
;; following example, the word "día" is split into "d" and "a":
(split-into-words "Que tenga un buen día")
;; => ("Que" "tenga" "un" "buen" "d" "a")
)
(def split-into-words (partial re-seq #"\p{IsAlphabetic}+"))
(comment
;; Now split-into-words, implemented with the Unicode-aware
;; \p{IsAlphabetic} property, works correctly with other languages
;; such as Spanish:
(split-into-words "Que tenga un buen día")
;; => ("Que" "tenga" "un" "buen" "día")
)
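;; A possible alternative, sketched here as an assumption rather than
;; part of the original demo: on the JVM, the embedded flag (?U) turns on
;; java.util.regex.Pattern/UNICODE_CHARACTER_CLASS, so \w itself becomes
;; Unicode-aware. Unlike \p{IsAlphabetic}+, this variant also matches
;; digits and underscores. The name split-into-words-unicode is
;; hypothetical, chosen to avoid redefining split-into-words.
(def split-into-words-unicode (partial re-seq #"(?U)\w+"))
(comment
(split-into-words-unicode "Que tenga un buen día")
;; => ("Que" "tenga" "un" "buen" "día")
)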
;; Demo of regular expressions with Unicode support.
;; Split a text into words.
(def split-into-words (partial re-seq #"\w+"))
(comment
;; split-into-words, implemented with the trivial \w+ regular
;; expression, works fine in English:
(split-into-words "Have a nice day.")
;; => ("Have" "a" "nice" "day")
;; But it fails with other languages, such as Spanish. In the following
;; example, the word "día" is split into "d" and "a":
(split-into-words "Que tenga un buen día")
;; => ("Que" "tenga" "un" "buen" "d" "a")
)
(def split-into-words (partial re-seq #"\p{IsAlphabetic}+"))
(comment
;; Now, by contrast, split-into-words, implemented with a regular
;; expression that uses Unicode support, works correctly with
;; languages such as Spanish:
(split-into-words "Que tenga un buen día")
;; => ("Que" "tenga" "un" "buen" "día")
)