Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluate Javascript for Enrichments #92

Merged
merged 2 commits into from
Oct 20, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Gemfile
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ git_source(:github) { |repo| "https://github.com/#{repo}.git" }
ruby '3.2.5'

# Bundle edge Rails instead: gem 'rails', github: 'rails/rails', branch: 'main'
gem 'rails', '~> 7.1.3.3'
gem 'rails', '~> 7.1.4.1'

# The original asset pipeline for Rails [https://github.com/rails/sprockets-rails]
gem 'sprockets-rails'
Expand Down
140 changes: 70 additions & 70 deletions Gemfile.lock
Original file line number Diff line number Diff line change
Expand Up @@ -13,73 +13,73 @@ GIT
GEM
remote: https://rubygems.org/
specs:
actioncable (7.1.3.4)
actionpack (= 7.1.3.4)
activesupport (= 7.1.3.4)
actioncable (7.1.4.1)
actionpack (= 7.1.4.1)
activesupport (= 7.1.4.1)
nio4r (~> 2.0)
websocket-driver (>= 0.6.1)
zeitwerk (~> 2.6)
actionmailbox (7.1.3.4)
actionpack (= 7.1.3.4)
activejob (= 7.1.3.4)
activerecord (= 7.1.3.4)
activestorage (= 7.1.3.4)
activesupport (= 7.1.3.4)
actionmailbox (7.1.4.1)
actionpack (= 7.1.4.1)
activejob (= 7.1.4.1)
activerecord (= 7.1.4.1)
activestorage (= 7.1.4.1)
activesupport (= 7.1.4.1)
mail (>= 2.7.1)
net-imap
net-pop
net-smtp
actionmailer (7.1.3.4)
actionpack (= 7.1.3.4)
actionview (= 7.1.3.4)
activejob (= 7.1.3.4)
activesupport (= 7.1.3.4)
actionmailer (7.1.4.1)
actionpack (= 7.1.4.1)
actionview (= 7.1.4.1)
activejob (= 7.1.4.1)
activesupport (= 7.1.4.1)
mail (~> 2.5, >= 2.5.4)
net-imap
net-pop
net-smtp
rails-dom-testing (~> 2.2)
actionpack (7.1.3.4)
actionview (= 7.1.3.4)
activesupport (= 7.1.3.4)
actionpack (7.1.4.1)
actionview (= 7.1.4.1)
activesupport (= 7.1.4.1)
nokogiri (>= 1.8.5)
racc
rack (>= 2.2.4)
rack-session (>= 1.0.1)
rack-test (>= 0.6.3)
rails-dom-testing (~> 2.2)
rails-html-sanitizer (~> 1.6)
actiontext (7.1.3.4)
actionpack (= 7.1.3.4)
activerecord (= 7.1.3.4)
activestorage (= 7.1.3.4)
activesupport (= 7.1.3.4)
actiontext (7.1.4.1)
actionpack (= 7.1.4.1)
activerecord (= 7.1.4.1)
activestorage (= 7.1.4.1)
activesupport (= 7.1.4.1)
globalid (>= 0.6.0)
nokogiri (>= 1.8.5)
actionview (7.1.3.4)
activesupport (= 7.1.3.4)
actionview (7.1.4.1)
activesupport (= 7.1.4.1)
builder (~> 3.1)
erubi (~> 1.11)
rails-dom-testing (~> 2.2)
rails-html-sanitizer (~> 1.6)
activejob (7.1.3.4)
activesupport (= 7.1.3.4)
activejob (7.1.4.1)
activesupport (= 7.1.4.1)
globalid (>= 0.3.6)
activemodel (7.1.3.4)
activesupport (= 7.1.3.4)
activerecord (7.1.3.4)
activemodel (= 7.1.3.4)
activesupport (= 7.1.3.4)
activemodel (7.1.4.1)
activesupport (= 7.1.4.1)
activerecord (7.1.4.1)
activemodel (= 7.1.4.1)
activesupport (= 7.1.4.1)
timeout (>= 0.4.0)
activerecord-nulldb-adapter (1.0.1)
activerecord (>= 5.2.0, < 7.2)
activestorage (7.1.3.4)
actionpack (= 7.1.3.4)
activejob (= 7.1.3.4)
activerecord (= 7.1.3.4)
activesupport (= 7.1.3.4)
activestorage (7.1.4.1)
actionpack (= 7.1.4.1)
activejob (= 7.1.4.1)
activerecord (= 7.1.4.1)
activesupport (= 7.1.4.1)
marcel (~> 1.0)
activesupport (7.1.3.4)
activesupport (7.1.4.1)
base64
bigdecimal
concurrent-ruby (~> 1.0, >= 1.0.2)
Expand Down Expand Up @@ -152,7 +152,7 @@ GEM
railties (>= 4.1.0)
responders
warden (~> 1.2.3)
devise-two-factor (5.0.0)
devise-two-factor (6.0.0)
activesupport (~> 7.0)
devise (~> 4.0)
railties (~> 7.0)
Expand Down Expand Up @@ -213,10 +213,10 @@ GEM
http-cookie (1.0.5)
domain_name (~> 0.5)
http-form_data (2.3.0)
i18n (1.14.5)
i18n (1.14.6)
concurrent-ruby (~> 1.0)
io-console (0.7.2)
irb (1.14.0)
irb (1.14.1)
rdoc (>= 4.0.0)
reline (>= 0.4.2)
jaro_winkler (1.5.6)
Expand Down Expand Up @@ -262,13 +262,13 @@ GEM
mime-types-data (3.2024.0507)
mini_mime (1.1.5)
minitar (0.9)
minitest (5.24.1)
minitest (5.25.1)
multi_json (1.15.0)
mutex_m (0.2.0)
mysql2 (0.5.6)
net-http (0.4.1)
uri
net-imap (0.4.11)
net-imap (0.5.0)
date
net-protocol
net-pop (0.1.2)
Expand All @@ -279,13 +279,13 @@ GEM
net-protocol
netrc (0.11.0)
nio4r (2.7.3)
nokogiri (1.16.6-aarch64-linux)
nokogiri (1.16.7-aarch64-linux)
racc (~> 1.4)
nokogiri (1.16.6-arm64-darwin)
nokogiri (1.16.7-arm64-darwin)
racc (~> 1.4)
nokogiri (1.16.6-x86_64-darwin)
nokogiri (1.16.7-x86_64-darwin)
racc (~> 1.4)
nokogiri (1.16.6-x86_64-linux)
nokogiri (1.16.7-x86_64-linux)
racc (~> 1.4)
optparse (0.5.0)
orm_adapter (0.5.0)
Expand All @@ -302,11 +302,11 @@ GEM
psych (5.1.2)
stringio
public_suffix (5.0.5)
puma (6.4.2)
puma (6.4.3)
nio4r (~> 2.0)
raabro (1.4.0)
racc (1.8.0)
rack (3.1.7)
racc (1.8.1)
rack (3.1.8)
rack-mini-profiler (3.3.1)
rack (>= 1.2.0)
rack-proxy (0.7.7)
Expand All @@ -318,20 +318,20 @@ GEM
rackup (2.1.0)
rack (>= 3)
webrick (~> 1.8)
rails (7.1.3.4)
actioncable (= 7.1.3.4)
actionmailbox (= 7.1.3.4)
actionmailer (= 7.1.3.4)
actionpack (= 7.1.3.4)
actiontext (= 7.1.3.4)
actionview (= 7.1.3.4)
activejob (= 7.1.3.4)
activemodel (= 7.1.3.4)
activerecord (= 7.1.3.4)
activestorage (= 7.1.3.4)
activesupport (= 7.1.3.4)
rails (7.1.4.1)
actioncable (= 7.1.4.1)
actionmailbox (= 7.1.4.1)
actionmailer (= 7.1.4.1)
actionpack (= 7.1.4.1)
actiontext (= 7.1.4.1)
actionview (= 7.1.4.1)
activejob (= 7.1.4.1)
activemodel (= 7.1.4.1)
activerecord (= 7.1.4.1)
activestorage (= 7.1.4.1)
activesupport (= 7.1.4.1)
bundler (>= 1.15.0)
railties (= 7.1.3.4)
railties (= 7.1.4.1)
rails-controller-testing (1.0.5)
actionpack (>= 5.0.1.rc1)
actionview (>= 5.0.1.rc1)
Expand All @@ -343,9 +343,9 @@ GEM
rails-html-sanitizer (1.6.0)
loofah (~> 2.21)
nokogiri (~> 1.14)
railties (7.1.3.4)
actionpack (= 7.1.3.4)
activesupport (= 7.1.3.4)
railties (7.1.4.1)
actionpack (= 7.1.4.1)
activesupport (= 7.1.4.1)
irb
rackup (>= 1.0.0)
rake (>= 12.2)
Expand All @@ -358,7 +358,7 @@ GEM
redis-client (0.22.1)
connection_pool
regexp_parser (2.9.1)
reline (0.5.9)
reline (0.5.10)
io-console (~> 0.5)
responders (3.1.1)
actionpack (>= 5.2)
Expand Down Expand Up @@ -461,7 +461,7 @@ GEM
sprockets (>= 3.0.0)
stringio (3.1.1)
strscan (3.1.0)
thor (1.3.1)
thor (1.3.2)
timeout (0.4.1)
tzinfo (2.0.6)
concurrent-ruby (~> 1.0)
Expand All @@ -480,7 +480,7 @@ GEM
addressable (>= 2.8.0)
crack (>= 0.3.2)
hashdiff (>= 0.4.0, < 2.0.0)
webrick (1.8.1)
webrick (1.8.2)
websocket (1.2.11)
websocket-driver (0.7.6)
websocket-extensions (>= 0.1.0)
Expand All @@ -489,7 +489,7 @@ GEM
nokogiri (~> 1.8)
yard (0.9.36)
yomu (0.1.5)
zeitwerk (2.6.13)
zeitwerk (2.7.1)
zlib (3.1.1)

PLATFORMS
Expand Down Expand Up @@ -527,7 +527,7 @@ DEPENDENCIES
pry-byebug
puma (~> 6.0)
rack-mini-profiler
rails (~> 7.1.3.3)
rails (~> 7.1.4.1)
rails-controller-testing
retriable
rqrcode
Expand Down
12 changes: 12 additions & 0 deletions app/supplejack/extraction/enrichment_extraction.rb
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,18 @@ def initialize(request, record, page = 1, extraction_folder = nil)
@extraction_folder = extraction_folder
end

def extract
::Retriable.retriable do
@document = if @extraction_definition.evaluate_javascript?
Extraction::JavascriptRequest.new(url:, params:).get
else
Extraction::Request.new(url:, params:, headers:, method: http_method).send(http_method)
end
end
rescue StandardError => e
::Sidekiq.logger.info "Extraction error: #{e}" if defined?(Sidekiq)
end

def valid?
url.exclude?('evaluation-error')
end
Expand Down
19 changes: 19 additions & 0 deletions spec/supplejack/extraction/enrichment_extraction_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,25 @@
end
end

context 'when the extraction requires JavaScript' do
let(:ed) { create(:extraction_definition, :enrichment, destination:, extraction_jobs: [extraction_job], base_url: "file://#{Rails.root.join('spec/stub_responses')}", evaluate_javascript: true) }
let!(:parameter) { create(:parameter, content: "javascript_example.html", kind: 'slug', request: request_two, content_type: 'static') }

context 'when the extraction is successful' do
it 'evaluates the JavaScript and saves the HTML as a document' do
document = subject.extract

document_html = Nokogiri::HTML(document.body).xpath('//body').to_html
expect(document_html).to include('This heading is rendered with JavaScript')
end

it 'returns a successful status code' do
document = subject.extract
expect(document.status).to eq 200
end
end
end

context 'when record extraction fails' do
before do
subject
Expand Down
Loading