Flush zombie graphs #51

Draft: wants to merge 6 commits into base `develop`

35 changes: 35 additions & 0 deletions README.md
@@ -0,0 +1,35 @@
# ncbo_cron

## Run the ncbo_cron daemon

To run it, use the `bin/ncbo_cron` command.

Running this command without options runs the jobs according to the settings defined in the NcboCron config file, falling back to the defaults in [ncbo_cron/lib/ncbo_cron/config.rb](https://github.com/ncbo/ncbo_cron/blob/master/lib/ncbo_cron/config.rb).

Command-line arguments can override some of these settings.

Here is an example that runs the flush-old-graphs job every 3 hours and disables the automatic pull of new submissions:

```
bin/ncbo_cron --flush-old-graphs "0 */3 * * *" --disable-pull
```

WARNING: 4store blocks and becomes unresponsive for the duration of a graph deletion, which can cause a temporary outage on high-traffic sites.
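The schedule is given in cron syntax (five fields: minute, hour, day of month, month, day of week), so `"0 */3 * * *"` fires at minute 0 of every third hour. As a minimal illustration only (not the parser ncbo_cron itself uses), this Ruby sketch expands a single field; it supports `*`, single values, `a-b` ranges, comma lists and `/n` steps, and nothing else:

```ruby
# Illustrative helper (hypothetical): expand one cron field, e.g. the hour
# field "*/3", into the sorted list of values it matches within [min, max].
def expand_cron_field(field, min, max)
  field.split(",").flat_map do |part|
    range, step = part.split("/", 2)
    step = step ? Integer(step) : 1
    if range == "*"
      lo, hi = min, max
    else
      lo, hi = range.split("-", 2).map { |v| Integer(v) }
      hi ||= lo # a single value is a one-element range
    end
    raise ArgumentError, "#{part} is outside #{min}-#{max}" if lo < min || hi > max
    (lo..hi).step(step).to_a
  end.uniq.sort
end

minute, hour = "0 */3 * * *".split.first(2)
p expand_cron_field(minute, 0, 59) # => [0]
p expand_cron_field(hour, 0, 23)   # => [0, 3, 6, 9, 12, 15, 18, 21]
```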

By default, it runs as a daemon.

It does not run as a daemon when one of the following options is used:

* `console` (open a pry console)
* `view_queue` (view the queue of jobs waiting to be processed)
* `queue_submission` (add a submission to the submission processing queue)
* `kill` (stop the ncbo_cron daemon)

## Stop the ncbo_cron daemon

The PID of the ncbo_cron process is stored in `/var/run/ncbo_cron/ncbo_cron.pid`.

To stop the ncbo_cron daemon:
```
bin/ncbo_cron -k
```
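Stopping the daemon comes down to signaling the process whose PID is stored in that file. The following Ruby sketch shows the idea with two hypothetical helpers, assuming the daemon exits cleanly on `SIGTERM`; it is not necessarily what `-k` does internally:

```ruby
# Hypothetical helpers, not part of ncbo_cron.

# Read the PID written by the daemon (e.g. /var/run/ncbo_cron/ncbo_cron.pid).
def read_pid(pid_file)
  Integer(File.read(pid_file).strip)
end

# Send a signal (TERM by default) to the PID stored in pid_file and return it.
# Process.kill raises Errno::ESRCH if no such process exists and
# Errno::EPERM if the current user may not signal it.
def stop_from_pid_file(pid_file, signal: "TERM")
  pid = read_pid(pid_file)
  Process.kill(signal, pid)
  pid
end
```

For example, `stop_from_pid_file("/var/run/ncbo_cron/ncbo_cron.pid")`, run as a user allowed to signal the daemon.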
5 changes: 4 additions & 1 deletion bin/ncbo_cron
@@ -123,6 +123,9 @@ opt_parser = OptionParser.new do |opts|
opts.on("-f", "--flush-old-graphs SCHED", String, "cron schedule to delete class graphs of archive submissions", "(default: #{options[:cron_flush]})") do |c|
options[:cron_flush] = c
end
opts.on("--remove-zombie-graphs", "flush class graphs from deleted ontology submissions", "(default: #{options[:remove_zombie_graphs]})") do |v|
options[:remove_zombie_graphs] = true
end
opts.on("-w", "--warm-long-queries SCHED", String, "cron schedule to warmup long time running queries", "(default: #{options[:cron_warmq]})") do |c|
options[:cron_warmq] = c
end
@@ -289,7 +292,7 @@ runner.execute do |opts|
logger.info "Logging flush details to #{flush_log_path}"; logger.flush
t0 = Time.now
parser = NcboCron::Models::OntologySubmissionParser.new
-flush_onts = parser.process_flush_classes(flush_logger)
+flush_onts = parser.process_flush_classes(flush_logger, flush_options[:remove_zombie_graphs])
logger.info "Flushed #{flush_onts.length} submissions in #{Time.now - t0} sec."; logger.flush
logger.info "Finished flush"; logger.flush
end
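The `--remove-zombie-graphs` switch takes no argument, so OptionParser sets the option to `true` only when the flag is present; otherwise the value from the settings (default `false`) is kept and is later passed to `process_flush_classes` through `flush_options`. A self-contained sketch of that flow; `parse_flush_options` and its `defaults` hash are hypothetical stand-ins for the script's own option setup, and the default schedule is a placeholder:

```ruby
require "optparse"

# Hypothetical stand-in for the option handling in bin/ncbo_cron; in the real
# script the defaults come from the NcboCron settings (schedule is a placeholder).
def parse_flush_options(argv, defaults = { cron_flush: "0 0 * * *", remove_zombie_graphs: false })
  options = defaults.dup
  OptionParser.new do |opts|
    opts.on("-f", "--flush-old-graphs SCHED", String,
            "cron schedule to delete class graphs of archive submissions",
            "(default: #{options[:cron_flush]})") do |c|
      options[:cron_flush] = c
    end
    # Switch without an argument: present => true, absent => default is kept.
    opts.on("--remove-zombie-graphs",
            "flush class graphs from deleted ontology submissions",
            "(default: #{options[:remove_zombie_graphs]})") do
      options[:remove_zombie_graphs] = true
    end
  end.parse!(argv)
  options
end

parsed = parse_flush_options(["--flush-old-graphs", "0 */3 * * *", "--remove-zombie-graphs"])
puts parsed[:remove_zombie_graphs] # prints "true"
```

The flush job then forwards the value with `parser.process_flush_classes(flush_logger, flush_options[:remove_zombie_graphs])`.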
6 changes: 6 additions & 0 deletions config/config.rb.sample
@@ -32,15 +32,21 @@ Annotator.config do |config|
end

NcboCron.config do |config|
# see https://github.com/ncbo/ncbo_cron/blob/master/lib/ncbo_cron/config.rb for default config
config.redis_host ||= "localhost"
config.redis_port ||= 6379
config.search_index_all_url = "http://localhost:8983/solr/term_search_core2"
config.property_search_index_all_url = "http://localhost:8983/solr/prop_search_core2"

# Ontologies Report config
config.enable_ontologies_report = true
config.ontology_report_path = "./test/reports/ontologies_report.json"

# Remove graphs from deleted ontologies when running process_flush_classes
config.remove_zombie_graphs = false

# Google Analytics config
config.enable_ontology_analytics = false
config.analytics_service_account_email_address = "123456789999-sikipho0wk8q0atflrmw62dj4kpwoj3c@developer.gserviceaccount.com"
config.analytics_path_to_key_file = "config/bioportal-analytics.p12"
config.analytics_profile_id = "ga:1234567"
2 changes: 2 additions & 0 deletions lib/ncbo_cron/config.rb
@@ -28,6 +28,8 @@ def config(&block)
@settings.enable_processing ||= true
@settings.enable_pull ||= true
@settings.enable_flush ||= true
# Don't remove graphs from deleted ontologies by default when flushing classes
@settings.remove_zombie_graphs ||= false
@settings.enable_warmq ||= true
@settings.enable_mapping_counts ||= true
# enable ontology analytics
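`||=` assigns only when the current value is `nil` or `false`. Assuming, as these lines suggest, that `NcboCron.config` yields the user's block before filling in the defaults, an explicit `remove_zombie_graphs = true` in the config file is kept and an unset value becomes `false`. The same operator means a setting defaulted with `||= true` (such as `enable_flush`) cannot be switched off by assigning `false`. A minimal sketch, with a hypothetical `Settings` struct and `configure` method standing in for `NcboCron.config`:

```ruby
# Hypothetical stand-in for NcboCron.config: yield to the user's block first,
# then fill in defaults with ||= (assigns only when the value is nil or false).
Settings = Struct.new(:remove_zombie_graphs, :enable_flush)

def configure
  settings = Settings.new
  yield settings if block_given?
  settings.remove_zombie_graphs ||= false
  settings.enable_flush ||= true
  settings
end

defaults = configure
puts defaults.remove_zombie_graphs.inspect # prints "false"

custom = configure do |c|
  c.remove_zombie_graphs = true
  c.enable_flush = false # overwritten later: false ||= true yields true
end
puts custom.remove_zombie_graphs.inspect # prints "true"
puts custom.enable_flush.inspect         # prints "true"
```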
9 changes: 8 additions & 1 deletion lib/ncbo_cron/ontology_submission_parser.rb
@@ -79,6 +79,7 @@ def get_prefixed_id(id)
"#{IDPREFIX}#{id}"
end

# Zombie graphs are submission graphs from ontologies that have been deleted
def zombie_classes_graphs
query = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o }}"
class_graphs = []
@@ -98,7 +99,7 @@ def zombie_classes_graphs
zombies
end

-def process_flush_classes(logger)
+def process_flush_classes(logger, remove_zombie_graphs=false)
onts = LinkedData::Models::Ontology.where.include(:acronym,:summaryOnly).all
status_archived = LinkedData::Models::SubmissionStatus.find("ARCHIVED").first
deleted = []
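The body of `zombie_classes_graphs` is collapsed in this diff, but the idea it implements is a set difference: every named graph in the store (from `SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o }}`) minus the graphs that still belong to an existing submission. The Ruby sketch below shows only that idea on plain strings; the helper name and the `/submissions/` URI pattern used to recognize submission graphs are assumptions for illustration, not taken from the PR:

```ruby
require "set"

# Illustrative only (hypothetical helper): graphs that look like submission
# graphs but do not correspond to any submission that still exists.
def find_zombie_graphs(all_graphs, live_submission_graphs)
  live = live_submission_graphs.to_set
  all_graphs.select { |g| g.include?("/submissions/") && !live.include?(g) }
end

all_graphs = [
  "http://data.example.org/ontologies/A/submissions/1",
  "http://data.example.org/ontologies/B/submissions/3",
  "http://data.example.org/metadata"
]
live = ["http://data.example.org/ontologies/A/submissions/1"]
p find_zombie_graphs(all_graphs, live)
# => ["http://data.example.org/ontologies/B/submissions/3"]
```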
@@ -141,6 +142,12 @@ def process_flush_classes(logger)

zombie_classes_graphs.each do |zg|
logger.info("Zombie class graph #{zg}"); logger.flush
# Zombie graphs are only deleted when remove_zombie_graphs is true (default: false)
if remove_zombie_graphs == true
Goo.sparql_data_client.delete_graph(RDF::URI.new(zg))
logger.info "DELETED #{zg} graph"
deleted << zg
end
end

logger.info("finish process_flush_classes"); logger.flush
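With the default `remove_zombie_graphs = false`, the loop above only logs zombie graphs; deletion happens only when the flag is `true`. The sketch below reproduces that control flow with a hypothetical `FakeClient` that records deletions instead of calling `Goo.sparql_data_client.delete_graph` (which, in the diff, receives an `RDF::URI` rather than a string):

```ruby
require "logger"
require "stringio"

# Hypothetical stand-in for Goo.sparql_data_client: records graphs instead of deleting them.
class FakeClient
  attr_reader :deleted

  def initialize
    @deleted = []
  end

  def delete_graph(graph)
    @deleted << graph
  end
end

# Mirrors the loop in process_flush_classes: always log, delete only when enabled.
def flush_zombies(zombies, client, logger, remove_zombie_graphs = false)
  deleted = []
  zombies.each do |zg|
    logger.info("Zombie class graph #{zg}")
    next unless remove_zombie_graphs == true
    client.delete_graph(zg)
    logger.info("DELETED #{zg} graph")
    deleted << zg
  end
  deleted
end

log = Logger.new(StringIO.new)
client = FakeClient.new
flush_zombies(["g1", "g2"], client, log)       # default: logs only
flush_zombies(["g1", "g2"], client, log, true) # deletes both graphs
puts client.deleted.inspect # prints ["g1", "g2"]
```

Keeping deletion opt-in matters here because, as the README warns, 4store blocks while a graph is being deleted.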