TBFY/search-API: explore large multilingual document collections through their semantics

Basic Overview

Explore collections of multilingual public procurement data through a RESTful API:

/documents : list of existing documents
/documents/{id} : details of a document
/documents/{id}/items : similar documents

Or search for documents similar to a given text:

/items : similar documents

Quick Start

A Swagger-based API is available online at:
http://tbfy.librairy.linkeddata.es/search-api
Get the list of available documents, and filter by language or source, using /documents:
http://tbfy.librairy.linkeddata.es/search-api/documents
Get the content and additional information of a document through /documents/{id}:
http://tbfy.librairy.linkeddata.es/search-api/documents/jrc32002D0996-en
Obtain similar documents, regardless of language, through /documents/{id}/items:
http://tbfy.librairy.linkeddata.es/search-api/documents/jrc32002D0996-en/items
To obtain only documents in Spanish, add lang=es to the query string:
http://tbfy.librairy.linkeddata.es/search-api/documents/jrc32002D0996-en/items?lang=es
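The GET requests above can also be scripted. The following is a minimal Python sketch using only the standard library. The helper names (`build_url`, `list_documents_url`, `document_url`, `similar_documents_url`, `get_json`) are hypothetical, the `source` query parameter name for /documents is an assumption (only `lang` appears in the examples above), and the sketch assumes the service answers with JSON.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Base URL taken from the Quick Start above.
BASE_URL = "http://tbfy.librairy.linkeddata.es/search-api"


def build_url(path: str, **params: Optional[str]) -> str:
    """Join BASE_URL with an endpoint path and URL-encode the query parameters that are set."""
    query = {key: value for key, value in params.items() if value is not None}
    url = f"{BASE_URL}{path}"
    return f"{url}?{urllib.parse.urlencode(query)}" if query else url


def list_documents_url(lang: Optional[str] = None, source: Optional[str] = None) -> str:
    # Assumption: /documents filters are passed as `lang` and `source` query parameters.
    return build_url("/documents", lang=lang, source=source)


def document_url(doc_id: str) -> str:
    # Details of a single document: /documents/{id}
    return build_url(f"/documents/{urllib.parse.quote(doc_id)}")


def similar_documents_url(doc_id: str, lang: Optional[str] = None) -> str:
    # Similar documents: /documents/{id}/items; lang="es" keeps only Spanish results.
    return build_url(f"/documents/{urllib.parse.quote(doc_id)}/items", lang=lang)


def get_json(url: str, timeout: float = 30.0):
    """Perform a GET request and decode the JSON response (its structure is not documented here)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `get_json(similar_documents_url("jrc32002D0996-en", lang="es"))` would fetch the Spanish documents similar to jrc32002D0996-en. Inspect the returned JSON before relying on specific fields, since its shape is not described in this README.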

You can also search for documents similar to a free text. Make an HTTP POST request to /items with a JSON body like this:

{
  "size": 10,
  "source": "jrc",
  "text": "Council Directive 9343EEC on the hygiene of foodstuffs as regards the transport of bulk liquid oils and fats by seaText with EEA relevance."
}

To obtain only documents in Spanish, add "lang": "es" to the JSON:

{
  "size": 10,
  "source": "jrc",
  "text": "Council Directive 9343EEC on the hygiene of foodstuffs as regards the transport of bulk liquid oils and fats by seaText with EEA relevance.",
  "lang":"es"
}
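The same free-text search can be sent from Python. This is a minimal sketch, assuming the /items endpoint lives under the Quick Start base URL and returns JSON; `build_items_request` and `search_similar` are hypothetical helper names. Building the request separately from sending it lets you inspect the body without network access.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the free-text endpoint is /items under the Quick Start base URL.
ITEMS_URL = "http://tbfy.librairy.linkeddata.es/search-api/items"


def build_items_request(text: str, size: int = 10, source: Optional[str] = None,
                        lang: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a POST request whose JSON body mirrors the examples above."""
    body = {"size": size, "text": text}
    if source is not None:
        body["source"] = source  # e.g. "jrc"
    if lang is not None:
        body["lang"] = lang  # e.g. "es" to keep only Spanish documents
    return urllib.request.Request(
        ITEMS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_similar(text: str, **options) -> object:
    """Send the request built by build_items_request and decode the JSON response."""
    request = build_items_request(text, **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A call such as `search_similar("Council Directive ...", size=10, source="jrc", lang="es")` corresponds to the second JSON example above.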

Index Documents

Download the latest data dump available at Zenodo:
https://doi.org/10.5281/zenodo.3783736
Unzip it, for example into /tmp. One folder is created per month.
Download the indexing script. It is implemented in Python but can easily be ported to other languages:
http://tbfy.librairy.linkeddata.es/search-api/src/main/python/index-tenders.py
Edit it to set the root directory containing the documents, for example /tmp:
```
main('/tmp/20*')
```
The directories to be indexed can be filtered directly in the path by using * wildcards.
Run it! That's it.
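This README does not show how index-tenders.py expands the root path. Assuming it behaves like Python's `glob` module, which the `*` syntax suggests, the sketch below lists which month folders a pattern would select before you run the script. `folders_to_index` is a hypothetical helper, not part of the script.

```python
import glob
import os
from typing import List


def folders_to_index(pattern: str) -> List[str]:
    """Expand a wildcard path such as '/tmp/20*' into the matching directories, sorted."""
    # glob treats * as any run of characters within one path component,
    # so '/tmp/20*' matches every entry in /tmp whose name starts with "20".
    # Plain files that happen to match are skipped; only directories are kept.
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))
```

If the monthly folders are named with a year prefix (hypothetical names such as 2019-01), `folders_to_index('/tmp/2019*')` would narrow indexing to a single year, while `/tmp/20*` selects all of them.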

More info here

Latest Stable Release

This tool is part of the librAIry ecosystem and requires librAIry-API for deployment.

It can be started as a service via docker-compose.yml:

Or through Maven dependencies:

Add the JitPack repository to your build file

    <repositories>
        <repository>
            <id>jitpack.io</id>
            <url>https://jitpack.io</url>
        </repository>
    </repositories>

Add the dependency

    <dependency>
        <groupId>com.github.TBFY</groupId>
        <artifactId>search-API</artifactId>
        <version>last-stable-release-version</version>
    </dependency>

Contributing

Please take a look at our contributing guidelines if you're interested in helping!
