Releases · DigitalNZ/supplejack_common

Add new Parser DSL `pre_process_block`

This optional block allows manipulation of the response data from your harvest source, before it is handed on to the rest of the parser as per normal. It could be used for any type of pre-processing data clean up requirements but was initially designed to rationalise verbose feeds that mentioned items multiple times, keeping only the latest mention to be harvested.

JSON example

pre_process_block do |rest_client_response|
  # Convert RestClient::Response to Hash
  hash = JSON.parse(rest_client_response.body)

  # Sort and uniq the data will result in only the latest of each item
  hash = hash.sort do |item_a, item_b|
    # 'updated_at' specifies the date to sort on
    Date.parse(item_b['updated_at']) <=> Date.parse(item_a['updated_at'])
  end
    .uniq { |item| item['audio_id'] } # 'audio_id' specifies the unique item ID to rationalise with

  # Convert back to JSON
  json = hash.to_json

  # Return a new RestClient::Response with the new mutated JSON
  RestClient::Response.create(json, rest_client_response.net_http_res, rest_client_response.request)
end

XML example

pre_process_block do |rest_client_response|
  # Convert RestClient::Response to Nokogiri Document
  doc = Nokogiri::XML(rest_client_response.body) { |config|    config.options = Nokogiri::XML::ParseOptions::NOBLANKS }

  # Select node that contains all items
  items_node  = doc.at_xpath('//dnz-export')

  # Sorting by the "date" field
  sorted = items_node.children.sort_by do |item|
    item.children.find { |child| child.name == 'date' }.text
  end.reverse!

  # uniq will keep only the latest mention of each item based on the unique ID of that item (specified in "key")
  uniq = sorted.uniq do |item|
    item.children.find { |child| child.name == 'key' } .text
  end

  # Replace all children with new values
  items_node.children.remove
  uniq.each{ |n| items_node << n }

  # Return a new rest response
  RestClient::Response.create(doc.to_xml, rest_client_response.net_http_res, rest_client_response.request)
end

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add new Parser DSL `pre_process_block`

RESTful enrichments

Tokenised pagination for JSON parser scripts

Releases: DigitalNZ/supplejack_common

v2.7.0

Add new Parser DSL pre_process_block

Include AWS S3 SDK Gem

Allow Harvesting via a Proxy

Scroll Harvest

Fix for 404 breaking the harvest

Pagination Bug Fix

v2.3.0

RESTful enrichments

Rubocop for code quality

XML Tokenised pagination

v2.1.0

Tokenised pagination for JSON parser scripts

Add new Parser DSL `pre_process_block`