Add import cli #23

Open
10 tasks
kptdobe opened this issue Jun 21, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@kptdobe
Contributor

kptdobe commented Jun 21, 2022

With the browser version of the importer (see https://github.com/kptdobe/helix-importer-tools/), you start by writing an import.js file to handle your import, and you can then run the import on a large number of URLs. In some cases, however, the browser version is impossible to use (browser memory issues, a very large number of pages to import). A CLI version would make it possible to automate and parallelise imports, and to support restarts after a failure or a temporary loss of connection.

You should be able to run something like helix-importer --project https://github.com/hlxsites/bamboohr-website --urls ./urls.json --from 250 --to 400 --target ~/import/bamboo/docx, which would run the import with the bamboohr-website/tools/importer/import.js transformation file, use the URLs in the urls.json file, process only the entries from index 250 to 400, and store the docx files in the target folder.
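As a rough illustration of the --from/--to behaviour, here is a minimal sketch of how a run could pick its slice of the URL list. The helper name selectUrls is hypothetical, and treating both indexes as inclusive is an assumption (it matches a default end index of urls.length-1). The example.com URLs are placeholders.

```javascript
// Hypothetical helper, not part of any existing tool.
// Assumption: `from` and `to` are both inclusive, zero-based indexes.
function selectUrls(urls, from = 0, to = urls.length - 1) {
  if (!Number.isInteger(from) || !Number.isInteger(to)) {
    throw new TypeError('from and to must be integers');
  }
  // Clamp the range to the bounds of the list.
  const start = Math.max(0, from);
  const end = Math.min(urls.length - 1, to);
  // slice() excludes its end index, hence the + 1.
  return start > end ? [] : urls.slice(start, end + 1);
}

// Placeholder data standing in for the content of urls.json.
const urls = Array.from({ length: 500 }, (_, i) => `https://example.com/page-${i}`);
const batch = selectUrls(urls, 250, 400);
console.log(batch.length); // 151
```

With inclusive bounds, a follow-up run would start at --from 401 so that no entry is processed twice.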

Here is a non-exhaustive list of possible options:

  • --project: git url to project containing the transformation file
  • --transformationFile: path to local transformation file
  • --type md|docx: store either md or docx (default = docx)
  • --urls: URL or local path to a JSON object containing the list of URLs
  • --from: start index in the list of urls (default = 0)
  • --to: end index in the list of urls (default = urls.length-1)
  • --target: path to the local folder in which to store the output, including the report file
  • --javascript enabled|disabled: enable or disable JS. Enabling it would require running a headless browser (together with the hlx import proxy...), whereas when it is disabled, the import can run directly against the page HTML (default = disabled)
  • --pageLoadTimeout: time in ms to wait after page load before running the import. This allows waiting for JS execution, which could be slow (default = 100)
  • --disableProxy: disable the use of the proxy host (https://localhost:3001) and go directly to the URL host (default = false)
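To make the option list more concrete, here is a minimal sketch of how these options and their defaults could be parsed. Nothing here is an existing implementation: parseOptions and the lookup tables are hypothetical names, treating --disableProxy as a value-less flag is an assumption, and a real CLI would more likely rely on an argument-parsing library than on a hand-rolled loop.

```javascript
// Defaults as stated in the option list; `to` is left undefined here and
// would be resolved to urls.length - 1 once the URL list has been loaded.
const DEFAULTS = {
  type: 'docx',
  from: 0,
  to: undefined,
  javascript: 'disabled',
  pageLoadTimeout: 100,
  disableProxy: false,
};

const KNOWN = new Set([
  'project', 'transformationFile', 'type', 'urls', 'from', 'to',
  'target', 'javascript', 'pageLoadTimeout', 'disableProxy',
]);
const NUMERIC = new Set(['from', 'to', 'pageLoadTimeout']);
const FLAGS = new Set(['disableProxy']); // assumption: flag takes no value
const CHOICES = { type: ['md', 'docx'], javascript: ['enabled', 'disabled'] };

// Hypothetical parser: turns ['--from', '250', ...] into an options object.
function parseOptions(argv) {
  const options = { ...DEFAULTS };
  for (let i = 0; i < argv.length; i += 1) {
    const arg = argv[i];
    const name = arg.startsWith('--') ? arg.slice(2) : '';
    if (!KNOWN.has(name)) throw new Error(`Unknown option: ${arg}`);
    if (FLAGS.has(name)) {
      options[name] = true;
      continue;
    }
    const value = argv[i + 1];
    if (value === undefined || value.startsWith('--')) {
      throw new Error(`Missing value for ${arg}`);
    }
    i += 1; // the value has been consumed
    if (NUMERIC.has(name)) {
      const n = Number(value);
      if (!Number.isInteger(n) || n < 0) {
        throw new Error(`${arg} expects a non-negative integer`);
      }
      options[name] = n;
    } else if (CHOICES[name] && !CHOICES[name].includes(value)) {
      throw new Error(`${arg} expects one of: ${CHOICES[name].join(', ')}`);
    } else {
      options[name] = value;
    }
  }
  return options;
}

const options = parseOptions([
  '--project', 'https://github.com/hlxsites/bamboohr-website',
  '--urls', './urls.json',
  '--from', '250',
  '--to', '400',
  '--target', '~/import/bamboo/docx',
]);
console.log(options.type, options.from, options.to); // docx 250 400
```

In an actual Node.js entry point, the array passed to parseOptions would be process.argv.slice(2).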

The import process would produce the same report file as the browser tool.

@kptdobe added the enhancement label Jun 21, 2022
@hsaginor

If anyone is interested, I started working on something similar: https://github.com/headwirecom/franklin-importer-tools.
Its features are still limited, but I have already used it for some imports.
