This is the source code for the little app we created that allows people to browse Google Summer of Code (GSoC) projects.
If you are curious about how we implemented it, the main components are summarized below.
- Type-ahead suggestion is done via CORS-enabled Ajax queries to DBpedia Lookup. This API takes a phrase and searches the DBpedia knowledge base for its possible meanings. Once you pick one of those meanings, we store its unique identifier (URI) from DBpedia. The client-side JavaScript uses the AutoSuggest jQuery plugin by Drew Wilson.
- Suggestion of related concepts is done via DBpedia's wikiPageLinks dataset and DBpedia Spotlight's notion of resource relatedness. For each URI you selected in the previous step, we find all concepts linked to it via DBpedia properties, and we add any other concepts that DBpedia Spotlight considers "topically similar". The wikiPageLinks dataset was loaded into a Virtuoso triple store to provide the "expand" functionality.
- Retrieval and ranking are done via queries over annotated projects stored in an ElasticSearch server. The projects were annotated with the DBpedia Spotlight Web service.
- Results are displayed by the DataTables jQuery plugin.
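The concept-expansion step in the list above can be sketched in Python as follows. This is a minimal illustration, not the code this app uses: it assumes the wikiPageLinks triples in Virtuoso use the predicate http://dbpedia.org/ontology/wikiPageWikiLink, looks for links in either direction, and takes the Spotlight-related concepts as an already-fetched input. The helper names `linked_concepts_query` and `expand` are hypothetical.

```python
from typing import Iterable

# Assumption: the wikiPageLinks triples use this DBpedia ontology predicate.
WIKI_LINK = "http://dbpedia.org/ontology/wikiPageWikiLink"


def linked_concepts_query(uri: str) -> str:
    """Build a SPARQL query for resources linked to or from `uri`."""
    return (
        "SELECT DISTINCT ?c WHERE { "
        f"{{ <{uri}> <{WIKI_LINK}> ?c }} UNION {{ ?c <{WIKI_LINK}> <{uri}> }} "
        "}"
    )


def expand(selected: Iterable[str],
           linked: dict[str, set[str]],
           related: dict[str, set[str]]) -> set[str]:
    """Return the selected URIs plus their linked and related concepts.

    `linked` maps a URI to concepts found in the triple store and `related`
    maps it to concepts DBpedia Spotlight considers topically similar.
    """
    result: set[str] = set()
    for uri in selected:
        result.add(uri)
        result |= linked.get(uri, set())
        result |= related.get(uri, set())
    return result
```

The query string would be sent to the Virtuoso SPARQL endpoint, and the URIs it returns for each selection passed to `expand` as `linked`.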
DBpedia and DBpedia Spotlight have been selected as an organization for GSoC 2013. If you have project ideas involving DBpedia or DBpedia Spotlight, please let us know, for example through our discussion list at SourceForge.net.
This demo relies on three Web services.
DBpedia Lookup returns tags in the DBpedia knowledge base that match a given string. For example, the query below searches for places matching "berlin":
curl "http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryClass=place&QueryString=berlin"
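The same request can be built from Python. The sketch below reproduces the curl URL and parses a JSON reply; it assumes the reply has the shape `{"results": [{"label": ..., "uri": ...}]}`, which is what we would expect when asking Lookup for JSON (for example with an `Accept: application/json` header), since the default reply is XML. Check this against the service before relying on it. `lookup_url` and `parse_results` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode

LOOKUP = "http://lookup.dbpedia.org/api/search.asmx/KeywordSearch"


def lookup_url(query: str, query_class: str = "") -> str:
    """Build a KeywordSearch URL, optionally restricted to a DBpedia class."""
    params = []
    if query_class:
        params.append(("QueryClass", query_class))
    params.append(("QueryString", query))
    return LOOKUP + "?" + urlencode(params)


def parse_results(body: str) -> list[tuple[str, str]]:
    """Extract (label, URI) pairs from a JSON reply of the assumed shape."""
    data = json.loads(body)
    return [(r["label"], r["uri"]) for r in data.get("results", [])]
```

For instance, `lookup_url("berlin", "place")` yields exactly the URL used in the curl command above.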
DBpedia Spotlight models DBpedia "tags" based on their distributional similarity, so we can use its service to find related tags.
Testing the deployed demo
curl -H "Accept: application/json" "http://spotlight.dbpedia.org/related/?uri=Berlin"
Getting the code
https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Installation
Starting the server
mvn scala:run -DmainClass="org.dbpedia.spotlight.web.rest.RelatedResources"
Using the server
curl -H "Accept: application/json" "http://localhost:2222/related/?uri=Berlin"
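For scripting, the curl commands above can be mirrored with Python's standard library. This sketch only builds the request (the `related_request` helper is hypothetical) and makes no assumption about the response format; pass the result to `urllib.request.urlopen` to send it. The two base URLs are the deployed and local addresses used above.

```python
from urllib.parse import urlencode
from urllib.request import Request

DEPLOYED = "http://spotlight.dbpedia.org"
LOCAL = "http://localhost:2222"


def related_request(uri: str, base: str = LOCAL) -> Request:
    """Build a GET request to the /related/ endpoint that asks for JSON."""
    url = f"{base}/related/?{urlencode({'uri': uri})}"
    # Equivalent to curl -H "Accept: application/json".
    return Request(url, headers={"Accept": "application/json"})
```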
We use an ElasticSearch server to query data about the GSoC projects.
Once you have started an ElasticSearch server, you can index the data with the index-gsoc-searcher-data.py script. Its input is the output of the extract-gsoc-searcher-data.py script.
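The internals of index-gsoc-searcher-data.py are not described here. As an illustration only, indexing of this kind is commonly done through ElasticSearch's `_bulk` API; the sketch below turns a list of project dictionaries into a bulk request body for the `gsoc2013` index and `d` type that appear in the check URL below. The helper name and document fields are hypothetical, and the `_type` entry applies only to older ElasticSearch versions that still have mapping types.

```python
import json
from typing import Iterable, Mapping


def bulk_body(docs: Iterable[Mapping], index: str = "gsoc2013",
              doc_type: str = "d") -> str:
    """Serialize documents as an ElasticSearch _bulk request body."""
    lines = []
    for doc in docs:
        # Each document is preceded by an action line naming its index and type;
        # no _id is given, so ElasticSearch generates one.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(dict(doc)))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)
```

The body would then be POSTed to http://localhost:9200/_bulk (newer versions also expect a `Content-Type: application/x-ndjson` header).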
Then you can check whether your development setup is working by visiting
http://localhost:9200/gsoc2013/d/_search?q=*:*
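Beyond this match-all check, the retrieval and ranking step could be expressed as a `bool` query with one `should` clause per concept, so that projects matching more concepts score higher. The sketch assumes a hypothetical `annotations` field holding each project's DBpedia URIs, indexed without analysis so that `term` queries match whole URIs, and gives the user's own selections twice the boost of expanded concepts. Those weights are illustrative, not the app's.

```python
def ranking_query(selected: list[str], expanded: list[str],
                  field: str = "annotations", size: int = 50) -> dict:
    """Build a search body where each concept is an optional (should) match."""
    picked = set(selected)
    # The user's own selections weigh twice as much as expanded concepts.
    should = [{"term": {field: {"value": uri, "boost": 2.0}}} for uri in selected]
    should += [{"term": {field: {"value": uri, "boost": 1.0}}}
               for uri in expanded if uri not in picked]
    # With only should clauses, a project must match at least one of them.
    return {"size": size, "query": {"bool": {"should": should}}}
```

The resulting dictionary would be serialized with `json.dumps` and sent as the body of a search request to http://localhost:9200/gsoc2013/d/_search.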
The data indexed in the ElasticSearch server was created by running the extract-gsoc-searcher-data.py script on the CSV listing of GSoC organizations, which can be retrieved at http://www.google-melange.com/gsoc/accepted_orgs/google/gsoc2013.
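The extraction step could combine each CSV row with DBpedia Spotlight annotations of its text, since the overview above says the projects were annotated with Spotlight. The sketch below is hypothetical in its helper names and its `annotations` output field, and it assumes Spotlight's JSON annotation reply carries a `Resources` list of objects with an `@URI` key (the list is missing when nothing is found). Column names come from whatever header line the CSV file has.

```python
import csv
import io
from typing import Iterator


def read_orgs(csv_text: str) -> Iterator[dict]:
    """Yield one dict per CSV row, keyed by the header line's column names."""
    yield from csv.DictReader(io.StringIO(csv_text))


def spotlight_uris(annotation: dict) -> list[str]:
    """Return distinct resource URIs from a Spotlight JSON reply, in order."""
    uris: list[str] = []
    # Assumed reply shape: {"Resources": [{"@URI": ...}, ...]}; the key is
    # absent when Spotlight finds no entities.
    for resource in annotation.get("Resources", []):
        if resource["@URI"] not in uris:
            uris.append(resource["@URI"])
    return uris


def make_doc(row: dict, annotation: dict) -> dict:
    """Merge a CSV row with its annotations (hypothetical "annotations" field)."""
    return {**row, "annotations": spotlight_uris(annotation)}
```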