Alok Nayak aloknayak29
aloknayak29 / ft_wiki_preproc.py
Created July 29, 2017 08:00 — forked from bittlingmayer/ft_wiki_preproc.py
fastText pre-trained vectors preprocessing
# See https://github.com/facebookresearch/fastText/blob/master/get-wikimedia.sh
#
# From https://github.com/facebookresearch/fastText/issues/161:
#
# We now have a script called 'get-wikimedia.sh', that you can use to download and
# process a recent wikipedia dump of any language. This script applies the preprocessing
# we used to create the published word vectors.
#
# The parameters we used to build the word vectors are the default skip-gram settings,
# except with a dimensionality of 300 as indicated on the top of the list of word