#!/bin/sh
host=localhost:9200

# Recreate the test index with a single shard and no replicas
curl -X DELETE "${host}/test"
curl -X PUT "${host}/test" -d '{
  "settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 0 }}
}'

# Wait (up to 5s) for the cluster to report green status
curl -X GET "${host}/_cluster/health?wait_for_status=green&pretty=1&timeout=5s"

# Map "file" as an attachment: store the title, and store the file content
# with term vectors (positions and offsets)
curl -X PUT "${host}/test/attachment/_mapping" -d '{
  "attachment" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "fields" : {
          "title" : { "store" : "yes" },
          "file" : { "term_vector":"with_positions_offsets", "store":"yes" }
        }
      }
    }
  }
}'

# Download a sample PDF (-C - resumes a partial download, -O keeps the remote file name).
# Note that curl only follows redirects when -L is given.
curl -C - -O http://www.intersil.com/data/fn/fn6742.pdf

# Base64-encode the whole PDF as one unbroken string (-0777 reads the file in one go,
# the empty second argument suppresses line breaks) and wrap it in a JSON document
coded=$(perl -MMIME::Base64 -0777 -ne 'print encode_base64($_, "")' fn6742.pdf)
json="{\"file\":\"${coded}\"}"
echo "$json" > json.file

# Index the document, then refresh so it becomes searchable
curl -X POST "${host}/test/attachment/" -d @json.file
echo
curl -XPOST "${host}/_refresh"

# Search the extracted text and highlight matches in the file content
curl "${host}/_search?pretty=true" -d '{
  "fields" : ["title"],
  "query" : {
    "query_string" : {
      "query" : "amplifier"
    }
  },
  "highlight" : {
    "fields" : {
      "file" : {}
    }
  }
}'

#
# The following is output of the last search query:
#
#{
#  "took" : 6,
#  "timed_out" : false,
#  "_shards" : {
#    "total" : 1,
#    "successful" : 1,
#    "failed" : 0
#  },
#  "hits" : {
#    "total" : 1,
#    "max_score" : 0.005872132,
#    "hits" : [ {
#      "_index" : "test",
#      "_type" : "attachment",
#      "_id" : "UUaHJ6CfTOC3T2I4Kj_pXg",
#      "_score" : 0.005872132,
#      "fields" : {
#        "file.title" : "ISL99201"
#      },
#      "highlight" : {
#        "file" : [ "\nMono <em>Amplifier</em> • Filterless Class D with Efficiency > 86% at 400mW\nThe ISL99201 is a fully integrat", "\nmono <em>amplifier</em>. It is designed to maximize performance for \nmobile phone applications. The applicat" ]
#      }
#    } ]
#  }
#}
This helped me a ton. Thanks! I made a similar gist using Python - inspired by this one. https://gist.github.com/stevehanson/7461706
The script downloads an empty PDF file, because curl does not follow the redirect from http://www.intersil.com/data/fn/fn6742.pdf to http://www.intersil.com/content/dam/Intersil/documents/fn67/fn6742.pdf.
The script prints no error or warning, but the Elasticsearch query then returns an empty result set, which can leave you wondering for a long time what went wrong.
So, to get a successful result, edit the download line of the script (the curl -C - -O command, line 27 of the original gist) so that it points at the new URL, or use wget instead of curl, since wget follows the redirect by default. Adding the -L option to curl also makes it follow redirects.
I forked this gist with the aforementioned correction here:
https://gist.github.com/zipang/6fe4ee9b821b5e454962
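For anyone who wants to stay with curl, here is a minimal sketch of the same fix, assuming a curl that supports -f and -L and a head that supports -c. The helper names is_pdf and fetch_pdf are made up for this example and are not part of the original gist or of the fork. The idea is to follow the redirect and to fail loudly when the downloaded file is empty or is not a PDF, instead of silently indexing nothing.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original gist.

# is_pdf FILE: succeeds only if FILE is non-empty and starts with the
# PDF magic bytes "%PDF" (an empty file or an HTML page fails the check).
is_pdf() {
    [ -s "$1" ] && [ "$(head -c 4 "$1")" = "%PDF" ]
}

# fetch_pdf URL FILE: download URL to FILE, then verify the result.
#   -f  fail on HTTP errors instead of saving the error page
#   -L  follow redirects
#   -o  write to the given file name
fetch_pdf() {
    curl -f -L -o "$2" "$1" && is_pdf "$2"
}

# Usage (needs network access, so it is left commented out here):
# fetch_pdf http://www.intersil.com/data/fn/fn6742.pdf fn6742.pdf || {
#     echo "download failed or file is not a PDF" >&2
#     exit 1
# }
```

With a check like this in place, a broken download stops the script before the indexing step, so an empty search result no longer has to be traced back to the download.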
This helped me so much with custom attachment type support implementation for django-haystack!!!
Thank you very much Lukas!