Demo console app for indexing windows files with elasticsearch

What is it?

A simple test program showing how to use TikaOnDotNet to extract metadata from files on a windows file system, and index them into Elasticsearch.

OK?

TikaOnDotNet is a .NET port/wrapper of Apache Tika, using IKVM to magically transform java code into a .net dll. Using TikaOnDotNet, we don't have to use the mapper attachments plugin, but can read text content from files before indexing. It is also easy to retrieve other metadata if you should need that. Communicating with Elasticsearch is done using the wonderful NEST library.

How can I use it?

Clone the repo, and open the program in Visual Studio. Running the program will read all documents from your "my documents" folder, and index into Elasticsearch. If you want to search the files you indexed, you can for instance use Sense ( automagically installed by the below-mentioned ansible vagrant box at http://localhost:5601/app/sense Or, if no sense available, you should see something on this url http://localhost:9200/files/_search See below for a sample query with highlighting.

The program can even read input arguments - the folder to read files from, and the indexname to use in Elasticsearch

But why doesn't it...?

Hey, this is a simple demo. Emphasis on simple. If you need something, do it yourself. ##Requirements ####Windows This is a windows program ####Visual Studio You most likely want to run this program in visual Studio. I used version 2013 ####Elasticsearch Requires elasticsearch running on localhost at port 9200. Tested with Elasticsearch 2.0, but it likely runs with even older versions.

(you can install elasticsearch in a VM using https://github.com/comperiosearch/vagrant-elk-box-ansible))

Sample query with highlighting of body contents

Excluding the body from showing up in the search results, as it is often too large to be of any real value in this context.

GET /files/_search
{
  "_source": {
    "exclude": "body"
  },
  "query": {
    "match": {
      "body": "fish"
    }
  },
  "highlight": {
    "fields": {
      "body": {}
    }
  }
}

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
testingtika		testingtika
.gitignore		.gitignore
.travis.yml		.travis.yml
readme.md		readme.md
testingtika.sln		testingtika.sln

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Demo console app for indexing windows files with elasticsearch

What is it?

OK?

How can I use it?

But why doesn't it...?

Sample query with highlighting of body contents

About

Releases

Packages

Languages

babadofar/TikaNestDemo

Folders and files

Latest commit

History

Repository files navigation

Demo console app for indexing windows files with elasticsearch

What is it?

OK?

How can I use it?

But why doesn't it...?

Sample query with highlighting of body contents

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages