The objective of this project is to build a simple autocomplete feature based on data extracted from Wikipedia.
See the online demo.
The HTML/JavaScript part is based on the jQuery UI autocomplete widget, and the server script is based on the Silex microframework. Wikipedia knowledge is distilled by DBpedia and hosted by LinkedData.Center.
Because it is not always easy to understand the power of SPARQL and Semantic Web technologies in day-to-day programming, I provide a simple example that solves a very general and frequent problem: autocompleting an input field by selecting data from a large dataset.
Suppose that you want to write an autocomplete script to help a user type the name of any river in the world into a form, and suppose that you want it available in several languages. You face a big problem: populating and maintaining the large dataset needed to drive the script.
Here is where the Semantic Web does its magic: you can use DBpedia to access the full "wisdom of the crowd" contained in Wikipedia and use it to get a list of all rivers, with labels in any language!
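To make this concrete, here is a minimal sketch, in the same PHP/curl style as this project's server script, of what such a lookup against the public DBpedia SPARQL endpoint could look like. The query, the endpoint URL and the language code are illustrative, not code taken from this repository:

```php
<?php
// Illustrative sketch (not part of this project): fetch river names
// directly from the public DBpedia SPARQL endpoint with PHP and curl.
$query = <<<SPARQL
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
    ?river a dbo:River ;
           rdfs:label ?label .
    FILTER ( lang(?label) = "it" )  # swap "it" for any language code
}
LIMIT 10
SPARQL;

$url = 'http://dbpedia.org/sparql?' . http_build_query(array(
    'query'  => $query,
    'format' => 'application/sparql-results+json',
));

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

$data = json_decode($response, true);
foreach ($data['results']['bindings'] as $row) {
    echo $row['label']['value'], PHP_EOL;  // one river name per line
}
```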
DBpedia is a great public service, but unfortunately it does not guarantee an SLA: sometimes the service is down for maintenance, and you cannot predict when that will happen. This is not acceptable if you need to build a solid application directly on top of such a service.
A reasonable solution is to copy the data you need from DBpedia to your own knowledge base system, so you can safely use it in your application.
This is where the LinkedData.Center service plays its role. It allows you to quickly create and host a knowledge base populated from linked open data sources, from private data, or from any combination of both. LinkedData.Center exposes a dedicated, password-protected SPARQL endpoint fully compliant with the latest W3C Semantic Web standards. You can create data mashups, apply rules and inference, and use many other features. Last but not least, LinkedData.Center keeps your knowledge base aligned with its data sources, re-indexing when needed.
This project is composed of an HTML/JavaScript page, a server script and a knowledge base configuration.
The HTML page is a standard implementation of the [jQuery UI remote autocomplete](http://jqueryui.com/autocomplete#remote) JavaScript widget.
The server script is an API interface to the dataset; by default it connects to the http://pub.linkeddata.center/demo/sparql endpoint. You can use your own LinkedData.Center instance (a free tier is available) simply by changing the credentials in the api code or by setting the LDC_ENDPOINT, LDC_USER and LDC_PASSWORD environment variables.
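As an illustration, honouring those variables in PHP could look like the following sketch; the fallback values here are placeholders, and the actual handling lives in the api source:

```php
<?php
// Hypothetical sketch of reading the connection settings; check the
// actual api source (pub/label/index.php) for the real variable handling.
$endpoint = getenv('LDC_ENDPOINT') ?: 'http://pub.linkeddata.center/demo/sparql';
$user     = getenv('LDC_USER')     ?: 'your-user';      // placeholder
$password = getenv('LDC_PASSWORD') ?: 'your-password';  // placeholder
```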
The knowledge base is populated starting from a Knowledge Exchange Engine Schema (KEES) file (find it in pub/kees.ttl). This file is the core of the project. Through this configuration, the LinkedData.Center ingestion engine caches all the needed DBpedia data, managing updates and DBpedia server failures. If you want to use your own knowledge base instance, you just need to add the following lines to your graph-db configuration:
```
@prefix kees: <http://linkeddata.center/kees/v1#> .
[] kees:includes <http://autocomplete.linkeddata.center/kees.ttl> .
```
In a production environment, do not link the master branch; use the preferred tagged version instead, e.g.:

```
[] kees:includes <http://linkeddata.center/project/autocomplete/1.0.1/pub/kees.ttl> .
```
For more information about how to populate a knowledge base, please refer to the LinkedData.Center knowledge base configuration handbook.
These instructions allow you to install and test the project on your local workstation using some simple virtualization technologies:
- install Vagrant and VirtualBox on your workstation.
- clone this project into a directory on your workstation and cd into it
- open a shell and type the command `vagrant up`; a new virtual machine with all the needed tools will be ready and running in a few minutes
- point your browser to http://localhost:8080/demo
- to destroy your virtual machine, just type `vagrant destroy`
You should get locally the same results available on the demo site.
- Publish the project on a web server that supports PHP 5 (with the curl extension).
The provisioning script contained in the Vagrantfile gives an idea of a complete API installation on an Ubuntu 14.04 box.
## The server side script

[jQuery UI remote autocomplete](http://jqueryui.com/autocomplete#remote) requires a server-side script. The script that searches labels in Wikipedia is provided in the pub/label/index.php file. Here is the script usage:
```
http://your_endpoint_path/api?term=<text>[&list=10][&lang=en][&class=Automobile|River|Mammal]
```
Mandatory parameters:
- term: filter for autocompletion; search is enabled if you provide at least two characters.
Optional parameters:
- list: maximum number of items returned. Default 10, max 100, min 1
- lang: preferred language as a two-character ISO 639-1 code. Default is en (English).
- class: the name of the DBpedia class. This example supports Automobile (default), River and Mammal.
For example, the resource:

```
http://localhost:8080/demo/api?term=ri&list=3&lang=en&class=River
```

will return something like:

```
[ "River Garavogue", "River Oykel", "River Afan" ]
```
The HTML source with all the required JavaScript is contained in the pub/index.html file.
Please note that you can extend this approach to query any data in billions of linked data sources (public or private) in just three steps:
- add the required dataset to the abox list in your LinkedData.Center endpoint;
- start a learn job;
- create your domain-specific API to let your application access the data (see the sketch below)
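As a sketch of that last step, a minimal domain-specific API built with Silex (the microframework this project's server script is based on) could look like this; the route, the parameter handling and the queryLabels() helper are hypothetical:

```php
<?php
// Hypothetical sketch of a domain-specific API endpoint built with Silex.
require_once __DIR__ . '/vendor/autoload.php';

use Symfony\Component\HttpFoundation\Request;

$app = new Silex\Application();

$app->get('/api', function (Request $request) use ($app) {
    $term = $request->get('term', '');
    if (strlen($term) < 2) {
        return $app->json(array());  // search is enabled from two characters on
    }
    // queryLabels() stands in for your SPARQL lookup against the
    // LinkedData.Center endpoint; it is not a function of this project.
    return $app->json(queryLabels($term));
});

$app->run();
```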
To improve performance, you can add a cache to the server-side script.
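One possibility is a naive file-based cache in front of the SPARQL lookup, as in this sketch; the TTL, the cache location and the queryLabels() stand-in are all hypothetical:

```php
<?php
// Hypothetical sketch of a file-based response cache for the api script.
$cacheFile = sys_get_temp_dir() . '/autocomplete_' . md5($_SERVER['QUERY_STRING']) . '.json';
$ttl = 3600;  // arbitrary choice: keep cached answers for one hour

if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttl) {
    $json = file_get_contents($cacheFile);    // fresh enough: reuse the cached answer
} else {
    $json = json_encode(queryLabels($_GET));  // queryLabels() stands in for the real SPARQL lookup
    file_put_contents($cacheFile, $json);
}

header('Content-Type: application/json');
echo $json;
```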
Where possible, I will try to provide support for this project; feel free to open an issue and I'll do my best to help.
I have to thank a lot of awesome open source projects suggested by the BOTK architecture:
- BOTK packages.
- Rest by Alexandre Gomes Gaigalas
- Mimeparse by Joe Gregorio
- Guzzle by Michael Dowling
- EasyRDF by Nicholas J Humfrey
- Composer by Nils Adermann, Jordi Boggiano
And, of course, the PHP and jQuery communities.
This project is licensed under the MIT license; see the LICENSE file.