diff --git a/CHANGELOG.md b/CHANGELOG.md
index 181db35..81764d5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,10 @@
+## [2.0.4](https://github.com/armand1m/papercut/compare/v2.0.3...v2.0.4) (2021-11-15)
+
+
+### Bug Fixes
+
+* make jsdom and pino peer dependencies ([5aabad2](https://github.com/armand1m/papercut/commit/5aabad246c45127f9a3f5b23f18e1aa407410704))
+
## [2.0.3](https://github.com/armand1m/papercut/compare/v2.0.2...v2.0.3) (2021-11-15)
diff --git a/docs/index.html b/docs/index.html
index d98a6f4..cc5485f 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -2,8 +2,14 @@
Papercut
+
+Papercut is a scraping/crawling library for Node.js, written in TypeScript.
-It provides a type-safe and small foundation that makes it fairly easy to scrape webpages with confidence.
+
Papercut provides a small type-safe and tested foundation that makes it fairly easy to scrape webpages with confidence.
Create an empty project with yarn:
mkdir papercut-demo
cd papercut-demo
yarn init -y
-Add papercut:
-yarn add @armand1m/papercut
+Add papercut and the needed peer dependencies:
+yarn add @armand1m/papercut jsdom pino
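The docs below explain that the result object mirrors the schema of the selector map, where each field is a function receiving papercut's selector utilities. A self-contained sketch of that contract (the `SelectorUtilities` stub here is hypothetical and merely stands in for papercut's real utilities; no network or jsdom involved):

```typescript
// Hypothetical stub of papercut's documented selector utilities (illustration only).
type SelectorUtilities = {
  text: (selector: string) => string;
  href: (selector: string) => string;
};

// A selector map: each field receives the utilities and returns its scraped value.
const selectors = {
  title: (utils: SelectorUtilities) => utils.text(".title"),
  link: (utils: SelectorUtilities) => utils.href("a.title"),
};

// Stub utilities standing in for a real scraped node.
const stub: SelectorUtilities = {
  text: () => "Hello world",
  href: () => "/post/1",
};

// The result object matches the schema of the selectors.
const result = {
  title: selectors.title(stub),
  link: selectors.link(stub),
};
```

In real use the utilities are created by papercut from each scraped DOM node; the stub only illustrates the calling convention.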
diff --git a/docs/interfaces/CreateRunnerProps.html b/docs/interfaces/CreateRunnerProps.html
index 8d5aa4d..90ce9bf 100644
--- a/docs/interfaces/CreateRunnerProps.html
+++ b/docs/interfaces/CreateRunnerProps.html
@@ -1,6 +1,6 @@
-CreateRunnerProps | @armand1m/papercut Interface CreateRunnerProps
@armand1m/papercut Interface CreateRunnerProps
Hierarchy
Index
Properties
Properties
logger
A pino.Logger instance.
-options
options
The scraper options. Use this to tweak log, cache and concurrency settings.
Generated using TypeDoc
@armand1m/papercut Interface GeosearchResult
Hierarchy
Index
Properties
Properties
latitude
longitude
Generated using TypeDoc
@armand1m/papercut Interface RunProps<T, B>
Type parameters
T: SelectorMap
B: boolean
Hierarchy
Index
Properties
Properties
base Url
The base URL to start scraping from.
This page will be fetched, parsed and mounted in a virtual JSDOM instance.
-Optional pagination
Optional pagination
Optional pagination feature.
If enabled and configured, this will make papercut fetch, parse, mount and scrape multiple pages based
@@ -9,14 +9,14 @@
As long as you have a way to fetch the last page number from the page you're scraping, and use it as a query param in the page url, you should be fine.
-selectors
selectors
The selectors to be used during the scraping process.
The result object will match the schema of the selectors.
-strict
strict
If enabled, this will make Papercut scrape the page in strict mode. This means that in case a selector function fails, the entire scraping will be halted with an error.
When enabled, the result types will not expect undefined values.
-target
target
The DOM selector for the target nodes to be scraped.
Generated using TypeDoc
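The effect of the `strict` flag on result types, as described above, can be sketched at the type level. The names below are illustrative models of the documented behavior, not papercut's actual internal definitions:

```typescript
// Illustrative-only types modeling the documented behavior of `strict`.
type SelectorFn = (utils: unknown) => string;
type SelectorMap = Record<string, SelectorFn>;

// With strict enabled, every selector result is required;
// otherwise each field may also be undefined.
type ScrapeResultType<T extends SelectorMap, Strict extends boolean> = {
  [K in keyof T]: Strict extends true
    ? ReturnType<T[K]>
    : ReturnType<T[K]> | undefined;
};

const selectors = {
  name: (_utils: unknown) => "Acme",
};

// strict: true  -> { name: string }
const strictResult: ScrapeResultType<typeof selectors, true> = { name: "Acme" };
// strict: false -> { name: string | undefined }
const looseResult: ScrapeResultType<typeof selectors, false> = { name: undefined };
```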
@armand1m/papercut Interface ScrapeProps<T, B>
Type parameters
T: SelectorMap
B: boolean
Hierarchy
Index
Properties
Properties
document
logger
options
selectors
strict
target
Generated using TypeDoc
@armand1m/papercut Interface ScraperOptions
Hierarchy
Index
Properties
Properties
cache
Enables HTML payload caching on disk. Keep in mind that papercut will not clear the cache for you. When enabling this, it's your responsibility to deal with cache invalidation.
false
-concurrency
concurrency
Concurrency settings.
Type declaration
node: number
Amount of concurrent promises for node scraping.
@@ -14,7 +14,7 @@
selector: number
Amount of concurrent promises for selector scraping.
2
-log
log
Enables writing pino logs to the stdout.
process.env.DEBUG === "true"
Generated using TypeDoc
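Put together, an object matching the ScraperOptions fields above might look like this. Only the `selector` concurrency default of 2 and the `log` default are stated in the docs; the `node` value here is illustrative:

```typescript
// Hypothetical ScraperOptions-shaped object; field names come from the docs above.
const options = {
  // Disk caching of HTML payloads; papercut never invalidates it for you.
  cache: false,
  concurrency: {
    node: 2, // concurrent promises for node scraping (illustrative value)
    selector: 2, // concurrent promises for selector scraping (documented default)
  },
  // Pino logs to stdout; the documented default follows the DEBUG env var.
  log: process.env.DEBUG === "true",
};
```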
@armand1m/papercut Interface ScraperProps
Hierarchy
Index
Properties
Properties
name
The scraper name. This will be used only for logging purposes.
-Optional options
Optional options
The scraper options. Use this to tweak log, cache and concurrency settings.
Generated using TypeDoc
@armand1m/papercut
Index
Interfaces
Type aliases
Functions
Type aliases
Scrape Result Type
Type parameters
T: SelectorMap
B: boolean
Scraper
Selector Function
Type declaration
Function to be used when scraping the target node for specific data.
-Parameters
utils: SelectorUtilities
self: SelectorMap
Returns any
Selector Map
Parameters
utils: SelectorUtilities
self: SelectorMap
Returns any
Selector Map
Map of selector functions.
This type is meant to be checked with an extended type, as users are going to implement a derived version of this for custom scrapers.
-Selector Utilities
Functions
Const create Runner
Selector Utilities
Functions
Const create Runner
Creates a runner instance.
This method is called by the createScraper function, but can also be externally used if needed to use an
@@ -33,7 +33,7 @@
Parameters
props: RunProps<T, B>
The scraping runner properties and selectors.
Returns Promise<ScrapeResultType<T, B>[]>
result Type-safe scraping results based on the given selectors and strict mode.
-Const create Scraper
Const create Scraper
Creates a new scraper runner.
This method is papercut's entrypoint. It will create a Scraper struct containing a runner that you can tweak
@@ -63,7 +63,7 @@
Parameters
props: RunProps<T, B>
The scraping runner properties and selectors.
Returns Promise<ScrapeResultType<T, B>[]>
result Type-safe scraping results based on the given selectors and strict mode.
-Const create Selector Utilities
Const create Selector Utilities
This method creates the selector utilities provided to every selector function given to the scrape method.
These utilities are meant to make the experience of
@@ -74,7 +74,7 @@
fallback of an empty string, in case it fails to find the element or a specific property.
At the same time, you also have direct access to the element from selector functions if needed for more complex tasks.
-Parameters
element: Element
Returns { all: (selector: string) => { asArray: Element[]; nodes: NodeListOf<Element> }; attr: (selector: string, attribute: string) => string; className: (selector: string) => string; createWindow: (htmlContent: string) => { close: () => void; document: Document; window: DOMWindow }; element: Element; fetchPage: (url: string) => Promise<string>; geosearch: (q: string, limit?: number) => Promise<GeosearchResult>; href: (selector: string) => string; mapNodeListToArray: (nodeList: NodeList) => Element[]; src: (selector: string) => string; text: (selector: string) => string }
all: (selector: string) => { asArray: Element[]; nodes: NodeListOf<Element> }
Parameters
selector: string
Returns { asArray: Element[]; nodes: NodeListOf<Element> }
as Array: Element[]
nodes: NodeListOf<Element>
attr: (selector: string, attribute: string) => string
Parameters
selector: string
attribute: string
Returns string
class Name: (selector: string) => string
Parameters
selector: string
Returns string
create Window: (htmlContent: string) => { close: () => void; document: Document; window: DOMWindow }
Parameters
htmlContent: string
Returns { close: () => void; document: Document; window: DOMWindow }
close: () => void
Returns void
document: Document
window: DOMWindow
element: Element
fetch Page: (url: string) => Promise<string>
Parameters
url: string
Returns Promise<string>
geosearch: (q: string, limit?: number) => Promise<GeosearchResult>
Parameters
q: string
limit: number = 1
Returns Promise<GeosearchResult>
href: (selector: string) => string
Parameters
selector: string
Returns string
map Node List To Array: (nodeList: NodeList) => Element[]
Parameters
nodeList: NodeList
Returns Element[]
src: (selector: string) => string
Parameters
selector: string
Returns string
text: (selector: string) => string
Parameters
selector: string
Returns string
Const geosearch
Parameters
q: string
limit: number = 1
Returns Promise<GeosearchResult>
scrape
the scrape function
this function will select all target nodes from the given document and spawn promise pools for
diff --git a/package.json b/package.json
index cabe29f..77549f6 100644
--- a/package.json
+++ b/package.json
@@ -1,5 +1,5 @@
 {
-  "version": "2.0.3",
+  "version": "2.0.4",
   "license": "MIT",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",