Skip to content

walid-shalaby/patents_toolbox

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

patents toolbox

collection of apps to work with patents data

  1. Topic Crawler

  • Used to create a DB of resilience and sustainability (R&S) related patents.
  • Reads list of topics related to R&S from DB then search for related patents at lucene index or DB then write retrieved patents data to destination DB
  • Usage: java -jar topic_crawler.jar --topics-db "path" --topic-lookup-mode title|abstract|title_abstract --patents-lookup-mode db|index --patents-src "path"
  • Dependencies:
    • Apache lucene4.6.0: lucene-core-4.6.0.jar, lucene-analyzers-common-4.6.0.jar, lucene-queries-4.6.0.jar, lucene-queryparser-4.6.0.jar
    • Included: sqlite-jdbc-3.7.2.jar
  1. Patent Indexer

  • Used to create a lucene index of patents records stored in DB (currently tested on R&S patents).
  • Reads list of patents from DB then dump them to destination lucene index
  • Commandline must include DB columns and Index fields mappings, below is a list of supported DB columns names:
    • id
    • title
    • abstract
    • description
    • claims
    • class
    • subclass
    • further_classification
    • state
    • assignee
    • publication_date
    • city
  • Usage: java -jar patent_indexer.jar --patents-src "path" --patents-index "path" [--preprocess stem] [--analyzer "lucene-analyzer-class"] --DBcolunm-Indexfield-tuples {(DBcolumn1:IndexField1),(DBcolumn2:IndexField2),...}
  • Example Usage: --patents-src /home/user/patents.db --patents-index /home/user/patents-index --analyzer org.apache.lucene.analysis.core.WhitespaceAnalyzer --DBcolunm-Indexfield-tuples "{(id:patent_id),(title:title),(abstract:abstract),(description:description),(claims:claims),(class:class),(subclass:subclass),(further_classification:further_classification),(state:state),(assignee:assignee),(publication_date:publication_date),(city:city)}"
  • Dependencies:
    • Apache lucene4.6.0: lucene-core-4.6.0.jar, lucene-analyzers-common-4.6.0.jar, lucene-queries-4.6.0.jar, lucene-queryparser-4.6.0.jar
    • Included: sqlite-jdbc-3.7.2.jar
  1. Concept Searcher

  • Used to create a coincidence DB table of concepts and patents in which these concepts appear (currently lookup in patent title and/or abstract).
  • The list of concepts are stored in a DB table and search proceeds by forming a search query to a lucene index containing patents data for each concept. The hit list of each concept is then written to the conincidence DB table.
  • Usage: java -jar concept_searcher.jar --patents-index "path-to-lucene-index" --concepts-db "path-to-concepts-db" --concepts-table "concepts-table-name" --concepts-field "concepts-field-name" --target-table "target-coincidence-table" --target-patent-field "patent-field-id" --target-concept-field "concept-field-id" [--analyzer "lucene-analyzer-class"] --target-Index-fields "{IndexField1,IndexField2,...}"
  • Example Usage: java -jar concept_searcher.jar --patents-index /home/user/patents --concepts-db /home/user/concepts.db --concepts-table wiki_concepts --concepts-field name --target-table wiki_concepts_patents --target-patent-field innovation_id --target-concept-field concept_id --analyzer org.apache.lucene.analysis.core.WhitespaceAnalyzer --target-Index-fields "{title,abstract}"
  • Dependencies:
    • Apache lucene4.6.0: lucene-core-4.6.0.jar, lucene-analyzers-common-4.6.0.jar, lucene-queries-4.6.0.jar, lucene-queryparser-4.6.0.jar
    • Included: sqlite-jdbc-3.7.2.jar

About

collection of apps to work with patents data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages