Releases: DigitalNZ/supplejack_worker

v2.7.3

13 Nov 21:22
d075d97

Configure parsable logs on uat/staging/prod

v2.7.2

19 Jul 02:00
508420a

Update SJ Common

Add Elastic APM to the Worker

29 May 03:28
7817384

Add configuration options for Elastic APM on the worker.

v2.7.0

06 May 22:38
ff6495f

Add new Parser DSL pre_process_block

This optional block lets you manipulate the response data from your harvest source before it is handed on to the rest of the parser as normal. It can be used for any kind of pre-processing or data clean-up, but was initially designed to rationalise verbose feeds that mention items multiple times, keeping only the latest mention of each item to be harvested.

JSON example

pre_process_block do |rest_client_response|
  # Parse the RestClient::Response body (in this feed, a JSON array of item hashes)
  items = JSON.parse(rest_client_response.body)

  # Sort newest first, then uniq so that only the latest mention of each item is kept
  items = items.sort do |item_a, item_b|
    # 'updated_at' specifies the date to sort on
    Date.parse(item_b['updated_at']) <=> Date.parse(item_a['updated_at'])
  end.uniq { |item| item['audio_id'] } # 'audio_id' is the unique item ID to rationalise on

  # Convert back to JSON
  json = items.to_json

  # Return a new RestClient::Response containing the deduplicated JSON
  RestClient::Response.create(json, rest_client_response.net_http_res, rest_client_response.request)
end
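
The sort-then-uniq step can also be tried outside the worker. The following is a minimal sketch in plain Ruby, assuming a feed shaped like the one in the JSON example; the `latest_mentions` helper and the sample data are hypothetical and are not part of the Parser DSL.

```ruby
require 'json'
require 'date'

# Hypothetical sample feed: item 1 is mentioned twice.
feed = [
  { 'audio_id' => 1, 'updated_at' => '2019-01-01', 'title' => 'old' },
  { 'audio_id' => 2, 'updated_at' => '2019-02-01', 'title' => 'only' },
  { 'audio_id' => 1, 'updated_at' => '2019-03-01', 'title' => 'new' }
].to_json

# Hypothetical helper: keep only the most recently updated mention of each item.
# Sorting newest first matters because Array#uniq keeps the first occurrence.
def latest_mentions(json, id_key:, date_key:)
  JSON.parse(json)
      .sort_by { |item| Date.parse(item[date_key]) }
      .reverse
      .uniq { |item| item[id_key] }
end

result = latest_mentions(feed, id_key: 'audio_id', date_key: 'updated_at')
puts result.map { |item| item['title'] }.inspect
```

Because `uniq` keeps the first element it sees for each key, the newest-first ordering is what makes the surviving mention the latest one.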

XML example

pre_process_block do |rest_client_response|
  # Convert RestClient::Response to Nokogiri Document
  doc = Nokogiri::XML(rest_client_response.body) { |config| config.options = Nokogiri::XML::ParseOptions::NOBLANKS }

  # Select the node that contains all items
  items_node = doc.at_xpath('//dnz-export')

  # Sort by the "date" field, newest first
  sorted = items_node.children.sort_by do |item|
    item.children.find { |child| child.name == 'date' }.text
  end.reverse!

  # uniq keeps only the latest mention of each item, based on that item's unique ID (the "key" field)
  uniq = sorted.uniq do |item|
    item.children.find { |child| child.name == 'key' }.text
  end

  # Replace all children with the deduplicated items
  items_node.children.remove
  uniq.each { |n| items_node << n }

  # Return a new rest response
  RestClient::Response.create(doc.to_xml, rest_client_response.net_http_res, rest_client_response.request)
end

Separate Redis Queues and Job Details Page Improvements

28 Apr 20:25
3172311

The Redis queue configuration is now explicitly numbered to avoid possible conflicts. The Supplejack Job details page has also been improved, notably to include the updated_at time of the last updated record ID.

Add Additional Indices to Abstract Job model

03 Apr 01:47
119eb3d

This change adds additional indices to the Abstract Job model, on created_at, updated_at and parser_id.

504 Harvest Timeout Improvements

03 Apr 19:40
0bfa66e

Adds logging and increases the default timeout periods to reduce the likelihood of 504 timeouts when running Supplejack harvests.

Remove Mongo Authentication

25 Mar 02:09
30e1995

Remove the authentication requirement on Mongo. You can now connect to Mongo using just a username and password, rather than needing to pass credentials and tokens to access the database.

Fix Enrichment Worker Airbrake error

20 Mar 03:22
4d2b611

A Rails logger call in the enrichment_worker worker was referencing a variable outside of its scope, so the #real method was being called on nil. Each time the logger was called, it raised an exception, which was then sent to Airbrake. Because this worker is called many times, this resulted in a flood of errors being sent to Airbrake. Chaos ensued.
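
As an illustration of this class of bug (not the actual fix in this release), here is a minimal sketch of building a log line defensively so that a nil value cannot raise inside the logging call. The `Record` struct and the `enrichment_log_line` helper are hypothetical names invented for this example and do not come from the worker.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for a record being enriched; not the worker's real model.
Record = Struct.new(:id)

# Hypothetical helper: the safe navigation operator (&.) returns nil instead of
# raising NoMethodError when `record` is nil, so logging can never crash the job.
def enrichment_log_line(record)
  "Enriching record #{record&.id || 'unknown'}"
end

io = StringIO.new
logger = Logger.new(io)
logger.info(enrichment_log_line(Record.new(42)))
logger.info(enrichment_log_line(nil)) # no exception; logs "unknown"
puts io.string
```

Keeping the message construction in a small method also makes the nil case easy to cover with a unit test.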

v2.5.6

06 Mar 02:19
ff16a44

Update SJ common gem and include AWS S3 gem