joragupra’s gists

joragupra / add_stop_words.py

Created March 25, 2018 20:16

Use stop words for better classification

	from sklearn.feature_extraction.text import TfidfVectorizer

	# Spanish function words that carry little signal for classification
	prepositions = ['a','ante','bajo','cabe','con','contra','de','desde','en','entre','hacia','hasta','para','por','según','sin','so','sobre','tras']
	prep_alike = ['durante','mediante','excepto','salvo','incluso','más','menos']
	adverbs = ['no','si','sí']
	articles = ['el','la','los','las','un','una','unos','unas','este','esta','estos','estas','aquel','aquella','aquellos','aquellas']
	aux_verbs = ['he','has','ha','hemos','habéis','han','había','habías','habíamos','habíais','habían']

	# Ignore all of the words above when building the tf-idf vocabulary
	tfid = TfidfVectorizer(stop_words=prepositions+prep_alike+adverbs+articles+aux_verbs)
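
To make the effect of a stop-word list concrete, here is a minimal sketch in plain Python rather than scikit-learn. The `tokenize` helper and the shortened `STOP_WORDS` set are hypothetical names introduced only for this illustration; the regular expression mirrors the default token pattern of `TfidfVectorizer` (word tokens of two or more characters), and the filtering step is a simplification of what the vectorizer does internally.

```python
import re

# Hypothetical, shortened stop-word list; the snippet above builds a longer one.
STOP_WORDS = {"el", "la", "de", "en", "un", "una", "no", "ha"}

def tokenize(text, stop_words=frozenset()):
    """Lowercase the text, keep word tokens of 2+ characters, drop stop words."""
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    return [t for t in tokens if t not in stop_words]

sentence = "El gato no ha salido de la casa en un mes"
print(tokenize(sentence))              # all 11 tokens survive
print(tokenize(sentence, STOP_WORDS))  # ['gato', 'salido', 'casa', 'mes']
```

Only the content words remain as features, so documents are compared on the terms that are more likely to distinguish one class from another.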

joragupra / check_text_classificator.py

Created March 25, 2018 20:15

Check accuracy of new text classifier

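	# Load a held-out set of documents and vectorize it with the already fitted tf-idf model
	test = read_all_documents('examples2')
	X_test = tfid.transform(test['docs'])
	y_test = test['labels']
	pred = clf.predict(X_test)

	# score() returns the mean accuracy on the given test data and labels
	print('accuracy score %0.3f' % clf.score(X_test, y_test))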
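
As a rough illustration of the number printed above, mean accuracy is the fraction of test documents whose predicted label equals the true label. The `accuracy` function and the label lists below are hypothetical and exist only for this sketch.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels, only for illustration.
y_true = ["sports", "politics", "sports", "economy"]
y_pred = ["sports", "politics", "economy", "economy"]
print("accuracy score %0.3f" % accuracy(y_true, y_pred))  # 3 of 4 correct
```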

joragupra / kmeans_tfidf.py

Created March 25, 2018 20:14

Train a k-nearest neighbors classifier to classify texts

	from sklearn.neighbors import KNeighborsClassifier

	clf = KNeighborsClassifier(n_neighbors=3)
	clf.fit(X_train, y_train)
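
`KNeighborsClassifier` is a k-nearest neighbors classifier: a new document gets the label that is most common among its closest training documents. The sketch below shows that idea in plain Python with Euclidean distance and a simple majority vote. The `knn_predict` helper and the two-dimensional vectors are assumptions made for illustration and stand in for real tf-idf rows.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Euclidean distance from x to every training vector, paired with its label.
    distances = sorted(
        (math.dist(row, x), label) for row, label in zip(X_train, y_train)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-dimensional feature vectors standing in for tf-idf rows.
X_train = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
y_train = ["sports", "sports", "politics", "politics", "politics"]

print(knn_predict(X_train, y_train, [0.85, 0.15]))  # sports
print(knn_predict(X_train, y_train, [0.1, 0.8]))    # politics
```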

joragupra / tf_idf_creation.py

Created March 25, 2018 20:12

Create tf-idf matrix for text classification

	from sklearn.feature_extraction.text import TfidfVectorizer

	# Learn the vocabulary and idf weights from the training documents
	tfid = TfidfVectorizer()
	X_train = tfid.fit_transform(documents)
	y_train = labels
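
For readers who want to see what a tf-idf matrix contains, here is a small plain-Python sketch. It uses raw term counts, a smoothed idf of `ln((1 + n) / (1 + df)) + 1`, and L2 normalization per document, which to my understanding corresponds to the documented defaults of `TfidfVectorizer`. Tokenization here is a plain whitespace split, so the output is not guaranteed to match the library on real text. The `tfidf` function and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, rows); rows[i][j] is the weight of term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for tokens in tokenized for t in set(tokens))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        rows.append([w / norm for w in row])  # L2-normalize each document
    return vocab, rows

# Hypothetical documents, only for illustration.
docs = ["gol partido gol", "partido elecciones", "elecciones gobierno"]
vocab, rows = tfidf(docs)
print(vocab)
for row in rows:
    print([round(w, 3) for w in row])
```

In the first document, `gol` appears twice and in no other document, so it receives a higher weight than `partido`, which is shared with the second document.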

joragupra / execute_read_all_documents.py

Created March 25, 2018 20:11

Create documents and labels for text classification

	data = read_all_documents('examples')
	documents = data['docs']
	labels = data['labels']

joragupra / read_all_documents.py

Created March 25, 2018 20:08

Read documents for text classification

	import os

	def read_all_documents(root):
	    labels = []
	    docs = []
	    # Walk the tree under root; each file is one document
	    for r, dirs, files in os.walk(root):
	        for file in files:
	            with open(os.path.join(r, file), "r") as f:
	                docs.append(f.read())
	            # The label is the file's directory path relative to root
	            labels.append(r.replace(root, ''))
	    return dict([('docs', docs), ('labels', labels)])
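
The function above implies a directory layout with one subfolder per class. The sketch below builds such a layout in a temporary directory and reads it back. The folder names, file names, and texts are hypothetical, and the function is repeated only so the example runs on its own.

```python
import os
import tempfile

def read_all_documents(root):
    """Same logic as the function above, repeated so this sketch is self-contained."""
    labels = []
    docs = []
    for r, dirs, files in os.walk(root):
        for file in files:
            with open(os.path.join(r, file), "r") as f:
                docs.append(f.read())
            labels.append(r.replace(root, ''))
    return dict([('docs', docs), ('labels', labels)])

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one subfolder per class, one text file per document.
    for label, text in [("sports", "gol partido"), ("politics", "elecciones gobierno")]:
        os.makedirs(os.path.join(root, label))
        with open(os.path.join(root, label, "doc1.txt"), "w") as f:
            f.write(text)
    data = read_all_documents(root)

print(sorted(zip(data["labels"], data["docs"])))
```

Note that each label keeps the leading path separator (for example `/sports` on Linux), because only the `root` prefix is stripped from the directory path.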

joragupra / master.xml

Created July 5, 2016 06:37

Delete address columns from customer table

	<changeSet id="customer-005" author="joragupra">

	    <comment>Delete columns for address information from customer table.</comment>

	    <dropColumn tableName="customer" columnName="street_name"/>
	    <dropColumn tableName="customer" columnName="street_number"/>
	    <dropColumn tableName="customer" columnName="postal_code"/>
	    <dropColumn tableName="customer" columnName="city"/>
	    <dropColumn tableName="customer" columnName="address_since"/>

joragupra / Customer.java

Created July 5, 2016 06:36

Remove address information fields from Customer class

	public class Customer {

	    @Id
	    @GeneratedValue
	    private Long id;
	    @Column(name = "first_name")
	    private String firstName;
	    @Column(name = "last_name")
	    private String lastName;
	    @OneToMany(cascade = CascadeType.ALL)

joragupra / Customer.java

Created July 5, 2016 06:32

Use address history as primary source when retrieving address information

	public class Customer {

	...

	    public Address currentAddress() {
	        return addressHistory().stream().sorted(comparing(Address::addressSince).reversed()).findFirst().get();
	    }

	...
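
The selection rule in `currentAddress()` (take the entry with the latest `addressSince`) can be sketched in Python as follows. The `Address` dataclass and its fields are hypothetical, since the original `Address` class is not shown. Unlike the Java version, which throws when the history is empty, this sketch returns `None` in that case.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class Address:
    # Hypothetical fields; the original Address class is not shown.
    city: str
    address_since: date

def current_address(history: List[Address]) -> Optional[Address]:
    """Return the address with the latest address_since, or None if there is none."""
    return max(history, key=lambda a: a.address_since, default=None)

history = [
    Address("Sevilla", date(2010, 1, 1)),
    Address("Madrid", date(2015, 6, 1)),
]
print(current_address(history).city)  # Madrid
print(current_address([]))            # None
```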

joragupra / address_migration_2.sql

Last active July 5, 2016 06:52

Update address in address table with data from customer table

	WITH caddresses_not_updated AS (SELECT c.* FROM customer c LEFT JOIN address a ON a.customer_id = c.id
	WHERE (c.street_name IS NOT NULL OR c.street_number IS NOT NULL OR c.postal_code IS NOT NULL OR c.city IS NOT NULL)
	AND a.id IS NOT NULL AND NOT exists(SELECT * FROM address a2 WHERE a2.customer_id = c.id AND a2.address_since > a.address_since)
	AND c.address_since > a.address_since)
	INSERT INTO address (
	id,
	street_name,
	street_number,
	postal_code,
	city,

Jorge Agudo Praena joragupra