Releases: DigitalNZ/supplejack_common

v2.7.0

06 May 22:19
3000360

Add new Parser DSL pre_process_block

This optional block lets you manipulate the response data from your harvest source before it is handed on to the rest of the parser as normal. It can be used for any kind of pre-processing or data clean-up, but it was initially designed to rationalise verbose feeds that mention the same item multiple times, keeping only the latest mention of each item for harvesting.

JSON example

pre_process_block do |rest_client_response|
  # Convert RestClient::Response to Hash
  hash = JSON.parse(rest_client_response.body)

  # Sort newest first, then uniq, so that only the latest mention of each item is kept
  hash = hash.sort do |item_a, item_b|
    # 'updated_at' specifies the date to sort on
    Date.parse(item_b['updated_at']) <=> Date.parse(item_a['updated_at'])
  end
    .uniq { |item| item['audio_id'] } # 'audio_id' specifies the unique item ID to rationalise with

  # Convert back to JSON
  json = hash.to_json

  # Return a new RestClient::Response with the new mutated JSON
  RestClient::Response.create(json, rest_client_response.net_http_res, rest_client_response.request)
end
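
The sort-then-uniq step can be tried outside a parser script. The following is a minimal sketch, assuming the feed is a JSON array of objects. The helper name latest_mentions and the sample data are hypothetical, and it uses sort_by with reverse in place of the sort block above.

```ruby
require 'json'
require 'date'

# Hypothetical helper (not part of supplejack_common): keeps only the
# most recently updated mention of each item in a JSON array.
def latest_mentions(json_body, id_key: 'audio_id', date_key: 'updated_at')
  JSON.parse(json_body)
      .sort_by { |item| Date.parse(item[date_key]) } # oldest first
      .reverse                                       # newest first
      .uniq { |item| item[id_key] }                  # uniq keeps the first (newest) mention
end

# Made-up sample feed in which item 1 is mentioned twice.
sample_feed = [
  { 'audio_id' => 1, 'title' => 'first draft',  'updated_at' => '2019-01-01' },
  { 'audio_id' => 2, 'title' => 'only mention', 'updated_at' => '2019-02-01' },
  { 'audio_id' => 1, 'title' => 'final',        'updated_at' => '2019-03-01' }
].to_json

result = latest_mentions(sample_feed)
puts result.map { |item| item['title'] }.inspect
```

Array#uniq keeps the first element it sees for each key, which is why the data has to be sorted newest first before calling it.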

XML example

pre_process_block do |rest_client_response|
  # Convert RestClient::Response to Nokogiri Document
  doc = Nokogiri::XML(rest_client_response.body) do |config|
    config.options = Nokogiri::XML::ParseOptions::NOBLANKS
  end

  # Select node that contains all items
  items_node = doc.at_xpath('//dnz-export')

  # Sorting by the "date" field
  sorted = items_node.children.sort_by do |item|
    item.children.find { |child| child.name == 'date' }.text
  end.reverse!

  # uniq will keep only the latest mention of each item based on the unique ID of that item (specified in "key")
  uniq = sorted.uniq do |item|
    item.children.find { |child| child.name == 'key' }.text
  end

  # Replace all children with new values
  items_node.children.remove
  uniq.each { |n| items_node << n }

  # Return a new rest response
  RestClient::Response.create(doc.to_xml, rest_client_response.net_http_res, rest_client_response.request)
end

Include AWS S3 SDK Gem

06 Mar 00:47
b598e75

Includes the AWS S3 SDK Gem as a dependency.

Allow Harvesting via a Proxy

16 Jan 03:11
9ad9fae

This release adds support for passing proxy <url> in your Parser Script.

Scroll Harvest

19 Dec 01:13
1133bd4

Added support for harvesting from an Elasticsearch Scroll API endpoint.

See the docs for how to use it:

http://digitalnz.github.io/supplejack/manager/parser-dsl-domain-specific-language.html

Fix for 404 breaking the harvest

26 Nov 22:07
9b35a04

Fixed an issue where a harvest would break if it hit a 404. The harvest now continues.

Pagination Bug Fix

19 Aug 22:37
43140f8

Altered the condition expression in the PaginatedCollection service to be a less than (<) rather than a less than or equal to (<=).
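
The release note does not say what the condition compares. As a generic illustration only, and not the PaginatedCollection code, the sketch below assumes a loop that advances while the current page is compared against a total page count, and shows how the inclusive comparison asks for one page too many. The method name pages_requested and its parameters are hypothetical.

```ruby
# Hypothetical illustration of an off-by-one in a pagination loop.
# Returns the list of page numbers that would be requested.
def pages_requested(total_pages, inclusive:)
  page = 1
  requested = [page]
  loop do
    # inclusive: true models `page <= total_pages`, false models `page < total_pages`
    more = inclusive ? page <= total_pages : page < total_pages
    break unless more

    page += 1
    requested << page
  end
  requested
end

puts pages_requested(3, inclusive: true).inspect  # requests a fourth page that does not exist
puts pages_requested(3, inclusive: false).inspect # stops at the last real page
```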

v2.3.0

16 Mar 00:11
d7bf6f1

RESTful enrichments

  • Updated the interactions with SupplejackApi::Record to be done via REST API rather than the database.

Rubocop for code quality

27 Feb 20:39

Better code management

XML Tokenised pagination

14 Feb 19:55
c859abf

This change enables parsing of XML and OAI APIs that use tokenised pagination.

Also, tokenised pagination for all types is now enabled with type: 'token'. The old behaviour of setting tokenised: true has been removed.

v2.1.0

25 Jan 00:08
a561b87

Tokenised pagination for JSON parser scripts

  • The paginate parser DSL can now paginate through tokenised APIs.