Node.js module

As a Node module dependency, the engine exposes a JavaScript API that can be called in your own code.

Classes

SourceDocument
- new SourceDocument(params)

Functions

extract(sourceDocument) ⇒ Promise.<string>
launchHeadlessBrowser() ⇒ Promise.<puppeteer.Browser>
stopHeadlessBrowser() ⇒ Promise.<void>
fetch(params) ⇒ Promise.<{mimeType: string, content: (string|Buffer)}>

SourceDocument

Kind: global class

`new SourceDocument(params)`

Represents a source document containing web content and metadata for extraction.
Includes the document location, selectors for content inclusion/exclusion,
content filters, raw content data, and MIME type information.

| Param | Type | Description |
| --- | --- | --- |
| params | `object` | The source document parameters |
| params.location | `string` | The URL location of the document |
| params.executeClientScripts | `boolean` | Whether to execute client-side scripts |
| params.contentSelectors | `string` \| `object` \| `Array` | CSS selectors for content to include |
| params.insignificantContentSelectors | `string` \| `object` \| `Array` | CSS selectors for content to exclude |
| params.filters | `Array` | Array of filters to apply |
| params.content | `string` | The document content |
| params.mimeType | `string` | The MIME type of the content |
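
As a minimal sketch of how these parameters fit together, the snippet below assembles a `params` object for a hypothetical terms page. The helper name `buildSourceDocumentParams`, the URL, the selectors, and the default values are illustrative assumptions; only the property names come from the table above.

```javascript
// Hypothetical helper: assembles the `params` object documented above.
// Only the property names come from the reference; the defaults are assumptions.
function buildSourceDocumentParams({
  location,
  contentSelectors = 'body',
  insignificantContentSelectors = [],
  filters = [],
  executeClientScripts = false,
}) {
  if (typeof location !== 'string' || location.length === 0) {
    throw new TypeError('params.location must be a non-empty URL string');
  }

  return { location, executeClientScripts, contentSelectors, insignificantContentSelectors, filters };
}

const params = buildSourceDocumentParams({
  location: 'https://example.com/terms', // hypothetical URL
  contentSelectors: 'main',
  insignificantContentSelectors: ['nav', 'footer'],
});

console.log(params);

// With the engine loaded, the object would be passed to the constructor:
// const sourceDocument = new SourceDocument(params);
```

Keeping the parameters in a plain object until the last moment makes them easy to log or validate before the `SourceDocument` is created.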

`extract(sourceDocument)` ⇒ `Promise.<string>`

Extracts content from a source document and converts it to Markdown.

Kind: global function
Returns: Promise.<string> - Promise that resolves, once the content has been extracted and converted, to a string containing the extracted content in Markdown format

| Param | Type | Description |
| --- | --- | --- |
| sourceDocument | `SourceDocument` | Source document from which to extract content, see SourceDocument |
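
The following sketch shows one way `extract` might be called on HTML that is already in memory. `SourceDocument` and `extract` are passed in as arguments because this reference does not name the package to import; the function name `extractFromHtml` and the selectors are hypothetical.

```javascript
// Sketch only: `SourceDocument` and `extract` are injected rather than imported,
// since the engine's package name is not given in this reference.
async function extractFromHtml({ SourceDocument, extract }, html, location) {
  const sourceDocument = new SourceDocument({
    location, // URL the HTML was obtained from
    contentSelectors: 'main', // assumption: the relevant content sits in <main>
    insignificantContentSelectors: ['nav', 'footer'], // assumption: elements to drop
    content: html,
    mimeType: 'text/html',
  });

  // Resolves to the extracted content as a Markdown string.
  return extract(sourceDocument);
}

// Possible usage, with `engine` standing for the loaded module:
// const markdown = await extractFromHtml(engine, '<main><h1>Terms</h1></main>', 'https://example.com/terms');
```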

`launchHeadlessBrowser()` ⇒ `Promise.<puppeteer.Browser>`

Launches a headless browser instance using Puppeteer, or returns the existing instance if one is already running.

Kind: global function
Returns: Promise.<puppeteer.Browser> - The Puppeteer browser instance.

`stopHeadlessBrowser()` ⇒ `Promise.<void>`

Stops the headless browser instance if one is running. If no instance exists, it does nothing.

Kind: global function
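
A minimal sketch of pairing `launchHeadlessBrowser` and `stopHeadlessBrowser`, so that the browser is stopped even when the work in between throws. The wrapper name `withHeadlessBrowser` is hypothetical, and the two engine functions are injected rather than imported.

```javascript
// Sketch only: runs `task` between launch and stop, and always stops the browser.
async function withHeadlessBrowser({ launchHeadlessBrowser, stopHeadlessBrowser }, task) {
  const browser = await launchHeadlessBrowser();

  try {
    // `task` receives the Puppeteer browser instance and may return a value.
    return await task(browser);
  } finally {
    // Documented as a no-op when no instance is running, so this call is safe here.
    await stopHeadlessBrowser();
  }
}

// Possible usage, with `engine` standing for the loaded module:
// await withHeadlessBrowser(engine, () => engine.fetch({ url: 'https://example.com/terms', executeClientScripts: true }));
```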

`fetch(params)` ⇒ `Promise.<{mimeType: string, content: (string|Buffer)}>`

Fetches a resource from the network and returns a promise that is fulfilled once the response is available.

Kind: global function
Returns: Promise.<{mimeType: string, content: (string|Buffer)}> - Promise containing the fetched resource’s MIME type and content

| Param | Type | Description |
| --- | --- | --- |
| params | `object` | Fetcher parameters |
| params.url | `string` | URL of the resource you want to fetch |
| [params.executeClientScripts] | `boolean` | Enable execution of client scripts. When set to `true`, this property loads the page in a headless browser to load all assets and execute client scripts before returning its content |
| [params.cssSelectors] | `string` \| `Array` | List of CSS selectors to await when loading the resource in a headless browser. Can be a CSS selector or an array of CSS selectors. Only relevant when `executeClientScripts` is enabled |
| [params.config] | `object` | Fetcher configuration |
| [params.config.navigationTimeout] | `number` | Maximum time (in milliseconds) to wait before considering the fetch failed |
| [params.config.language] | `string` | Language (in ISO 639-1 format) to be passed in request headers |
| [params.config.waitForElementsTimeout] | `number` | Maximum time (in milliseconds) to wait for selectors to exist on page before considering the fetch failed. Only relevant when `executeClientScripts` is enabled |
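
To illustrate how the optional parameters relate to each other, the sketch below builds a `params` object for `fetch` and only includes the browser-specific options when `executeClientScripts` is enabled. The helper name `buildFetchParams`, the URL, and every default value (timeouts, language) are assumptions chosen for the example, not the engine's actual defaults.

```javascript
// Hypothetical helper: builds the `params` object accepted by `fetch`.
// Property names follow the table above; the default values are assumptions.
function buildFetchParams(url, options = {}) {
  const {
    executeClientScripts = false,
    cssSelectors = [],
    language = 'en',
    navigationTimeout = 30000,
    waitForElementsTimeout = 10000,
  } = options;

  const fetchParams = { url, executeClientScripts, config: { navigationTimeout, language } };

  // These two options are only relevant when the page is loaded in a headless browser.
  if (executeClientScripts) {
    fetchParams.cssSelectors = cssSelectors;
    fetchParams.config.waitForElementsTimeout = waitForElementsTimeout;
  }

  return fetchParams;
}

const renderedPageParams = buildFetchParams('https://example.com/terms', {
  executeClientScripts: true,
  cssSelectors: ['main'],
});

console.log(renderedPageParams);

// With the engine loaded, the result could then be destructured:
// const { mimeType, content } = await fetch(renderedPageParams);
```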
