Created
November 27, 2018 08:29
Enable OAI and import records in DSpace
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# DSpace interoperability with OAI\n", | |
"\n", | |
"DSpace can act both as a data provider and as a consumer of data from other repositories. Several built-in technologies make this possible:\n", | |
"1. **SWORD** (Simple Web-service Offering Repository Deposit) is enabled by ensuring that the sword webapp is available in the [dspace]/webapps directory. It allows content to be deposited into the repository remotely.\n", | |
"2. **REST** (Representational State Transfer) is a programming interface (API) that allows developers to build other applications that create, read, update and delete objects (communities, collections and documents) in DSpace.\n", | |
"3. **OAI-PMH** (Open Archives Initiative - Protocol for Metadata Harvesting) is a widely used technology for metadata exchange between digital repositories.\n", | |
"\n", | |
"**A Data Provider** is a repository that has exposed its metadata for harvesting via the OAI-PMH protocol.\n", | |
"\n", | |
"**A Service Provider** is a platform that harvests metadata from a data provider and makes it available to users, usually via a searchable web interface.\n", | |
"\n", | |
"DSpace is fully compliant with OAI-PMH. It acts as a data provider and also as a service provider at the same time." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Enabling OAI\n", | |
"OAI is enabled in DSpace when the oai webapp is copied to the [webapps]/oai directory. However, your existing records must be indexed for them to be discoverable by external harvesters.\n", | |
"\n", | |
"Ensure that solr.server and the other relevant settings in the [dspace]/config/local.cfg file are correctly configured:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"solr.server = http://localhost/solr" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"dspace.hostname = my-university.ac.ke" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"dspace.baseUrl = http://my-university.ac.ke" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"dspace.name = My University Name" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Then, in [dspace]/config/modules/oai.cfg, ensure the following are correctly set:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"oai.url = http://my-university.ac.ke/oai" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"oai.solr.url = http://localhost/solr/oai" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"oai.identifier.prefix = my-university.ac.ke" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Confirm that the **description.xml** file in [dspace]/config/crosswalks/oai/description.xml has the correct values.\n", | |
"\n", | |
"The **Repository identifier** should be the same value as the **hostname** in your repository URL, otherwise some harvesters will not harvest your metadata correctly." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"nano [dspace]/config/crosswalks/oai/description.xml" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"To test whether OAI is correctly set up, open this URL on your repository:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"http://[repository-URL]/oai/request?verb=Identify" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"You should see output similar to what's on this page http://erepository.mku.ac.ke/oai/request?verb=Identify" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"If you change values in [dspace]/config/local.cfg and still don't see the changes, delete the cached file in the [dspace]/var/oai/requests/ directory to clear the cache." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"sudo rm [dspace]/var/oai/requests/cmVxdWVzdElkZW50aWZ5bnVsbG51bGxudWxsbnVsbG51bGxudWxs" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Import Records into the Index\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Existing records must be imported into the index before they can be reached by external harvesters. The dspace executable in [dspace]/bin/dspace is used to import records into the oai index." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"sudo [dspace]/bin/dspace oai import -v -o" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"It has the following switch options:\n", | |
"1. **-v** - Verbose, i.e. print progress messages\n", | |
"2. **-o** - Optimize the index after importing\n", | |
"3. **-c** - Clear the existing index and import everything afresh" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"After this command has run, records will be listed when you access the following URL on your repository:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"http://[my-university.ac.ke]/oai/request?verb=ListRecords&metadataPrefix=oai_dc" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Updating the index\n", | |
"The index needs to be updated each time new records are added to the repository. This can be achieved using a scheduled cronjob. The cronjob should be executed as a user who has permission to write to the index, usually the Tomcat user:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"sudo crontab -e -u tomcat8" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Then add this line to the bottom of the file:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"0 0 * * * [dspace]/bin/dspace oai import -o > /dev/null" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This executes the program every day at midnight.\n", | |
"\n", | |
"We have excluded the -v switch because this is an automated process and there's no need to print the output." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Bash", | |
"language": "bash", | |
"name": "bash" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
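The checks described in the notebook (opening the `Identify` and `ListRecords` URLs, and confirming that the repository identifier matches the hostname) can also be scripted. The following is a minimal sketch, not part of DSpace: the function names `oai_request_url`, `oai_repository_identifier` and `oai_identifier_matches` are hypothetical helpers, and the sketch assumes the `Identify` response carries an un-prefixed `<repositoryIdentifier>` element, as configured in description.xml.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions for checking an OAI-PMH endpoint.
# These names are illustrative only; they are not DSpace commands.

# Build an OAI-PMH request URL from a repository base URL, a verb,
# and an optional metadataPrefix (e.g. oai_dc).
oai_request_url() {
  local base="${1%/}"   # strip one trailing slash, if present
  local verb="$2"
  local prefix="${3:-}"
  local url="${base}/oai/request?verb=${verb}"
  if [ -n "$prefix" ]; then
    url="${url}&metadataPrefix=${prefix}"
  fi
  printf '%s\n' "$url"
}

# Read an Identify response on stdin and print the first
# <repositoryIdentifier> value found (assumes no namespace prefix).
oai_repository_identifier() {
  sed -n 's:.*<repositoryIdentifier>\([^<]*\)</repositoryIdentifier>.*:\1:p' | head -n 1
}

# Succeed (exit status 0) when the identifier in the Identify response
# read from stdin equals the expected hostname given as $1.
oai_identifier_matches() {
  local expected="$1"
  local actual
  actual="$(oai_repository_identifier)"
  [ "$actual" = "$expected" ]
}
```

Assuming `curl` is installed and the placeholder hostname is replaced with your own, a check might look like this: `curl -s "$(oai_request_url http://my-university.ac.ke Identify)" | oai_identifier_matches my-university.ac.ke && echo "identifier matches hostname"`.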