Created
August 8, 2014 08:19
-
-
Save richardpascual/7595a3f92bdb3c09168e to your computer and use it in GitHub Desktop.
How to remove unwanted non-alphanumeric characters from an input string
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- my-regexp-workshop (Oracle database table) | |
-- This is how to make the supporing schema tables. The autosequenced column value/trigger combination is optional. | |
CREATE SEQUENCE "MY_REGEXP_WORKSHOP_SEQ" MINVALUE 1 MAXVALUE 9999999999999999999999999999 INCREMENT BY 1 START WITH 21 CACHE 20 NOORDER NOCYCLE | |
/ | |
CREATE TABLE "MY_REGEXP_WORKSHOP" | |
( "RECORD_ID" NUMBER NOT NULL ENABLE, | |
"CUSTOMER_NAME" VARCHAR2(50), | |
"INPUT_VALUE" VARCHAR2(50), | |
CONSTRAINT "MY_REGEXP_WORKSHOP_PK" PRIMARY KEY ("RECORD_ID") | |
USING INDEX ENABLE | |
) | |
/ | |
CREATE OR REPLACE TRIGGER "BI_MY_REGEXP_WORKSHOP" | |
before insert on "MY_REGEXP_WORKSHOP" | |
for each row | |
begin | |
if :NEW."RECORD_ID" is null then | |
select "MY_REGEXP_WORKSHOP_SEQ".nextval into :NEW."RECORD_ID" from sys.dual; | |
end if; | |
end; | |
/ | |
ALTER TRIGGER "BI_MY_REGEXP_WORKSHOP" ENABLE | |
/ |
We can make this file beautiful and searchable if this error is corrected: It looks like row 2 should actually have 1 column, instead of 3 in line 1.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- Some sample test data intended to test out the regexp functionality. | |
"RECORD_ID","CUSTOMER_NAME","INPUT_VALUE" | |
"1","HA=?<>::RRI**+S","" | |
"2","GO(RDON)","" | |
"3","<SAMUEL>","" | |
"5","M%OR?RIS^^","" | |
"6","[Q]&UIX-OT//E@","" | |
"7","SA$B,RINA!!!","" | |
"4","A#SHTON*&^%$","" |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- remove-str: using Oracle PL/SQL Regular Expressions | |
-- This used to be hard with older versions of Oracle RDBMS. Now REGEXP recipes seem to simplify it all. | |
-- model implementation borrowed from the following source: | |
-- https://community.oracle.com/message/2231311#2231311 | |
select regexp_replace('This is a test $%&^*&*'||chr(10)||'*"ú%bsfd.', '( *[[:punct:]])', '') from dual | |
-- applying the concept to a full table: MY_REGEXP_WORKSHOP: | |
select record_id, customer_name, | |
regexp_replace(customer_name, '( *[[:punct:]])', '') as converted_string | |
from my_regexp_workshop | |
order by record_id asc; | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Data-Scrubbing Text Inputs with Oracle:
ORACLE-11g
(and later)remove-str
Used to be the name of the custom function I developed to strip off unwanted symbols, or non-alphanumeric characters from data values processed throughSQL
orPL/SQL
driven processes.Using Oracle PL/SQL Regular Expressions
This used to be hard with older versions of the Oracle RDBMS. Now
REGEXP
recipes seem to simplify it all. Oracle official references on theREGEXP
function reveal lots of great Oracle Derived and more standard POSIX Flavored regular expression recipes. Find or build the ones that work the best for you and share!