Service declaration

This reference documentation details all available properties that can be specified in a service’s declaration file.

The examples given throughout this reference can be seen in context in the declaration files of the Demo collection.

Properties

name string required

The name of the service.

Example: Open Terms Archive

terms object of objects required

Map of terms associated with a service, where keys are standardized term types (e.g., “Privacy Policy”, “Terms of Service”), and values are term objects containing the configuration for fetching and processing each document, as detailed in the Terms declaration section.

To facilitate cross-service comparisons and ensure consistency, a standardized list of term types is maintained in a dedicated repository.

Note that the terms type may differ from the exact name used by the service, but it should match the underlying commitment. For example, some providers use “Terms and Conditions” or “Terms of Use” for what others call “Terms of Service”.

Example:

"terms": {
    "Terms of Service": {
        "fetch": "https://opencollective.com/tos",
        "select": ".markdown"
    },
    "Privacy Policy": {
        "fetch": "https://opencollective.com/privacypolicy",
        "select": ".markdown"
    }
}

Reference: Terms Types.
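
To show how the two top-level properties fit together, here is a minimal sketch of a complete declaration file. The name and URL are taken from the examples in this reference; the "main" selector is an assumption made for illustration and may not match the actual page markup.

```json
{
  "name": "Open Terms Archive",
  "terms": {
    "Privacy Policy": {
      "fetch": "https://opentermsarchive.org/en/privacy-policy",
      "select": "main"
    }
  }
}
```

Each additional terms type would be added as another key under "terms".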

Terms declaration

fetch uri required

The URL where the terms document can be downloaded.

Example: https://opentermsarchive.org/en/privacy-policy

select string, object or array required for HTML documents

The way to select the parts of the document to extract. Can be:

a CSS selector string. See the CSS Selectors specification
a range selector object. See the range selector section
an array of those

Example:

As a direct CSS selector:

"select": "#article-contents"

As a range selector object:

"select": {
    "startBefore": "h1",
    "endBefore": "#toc-heading"
}

As an array of those:

"select": [
    "#article-contents",
    {
        "startBefore": "h1",
        "endBefore": "#toc-heading"
    }
]

executeClientScripts boolean

Boolean flag to execute client-side JavaScript before accessing content.

When enabled, the page is loaded in a headless browser so that client-side scripts run and dynamic content loads, which is necessary when JavaScript modifies or loads content after the initial page load. If undefined, the engine automatically balances performance and tracking success rate: it defaults to not executing scripts and escalates to a headless browser if the page fails to load.

Default: undefined

Example: true
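
In context, the flag sits alongside the other properties of a terms declaration. The following is a sketch only: the URL and selector are hypothetical and stand for a page whose terms are rendered by JavaScript.

```json
{
  "fetch": "https://example.com/terms",
  "select": "#terms-content",
  "executeClientScripts": true
}
```

Leaving the property out altogether keeps the automatic behavior described above.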

remove string, object or array

The way to remove the parts of the document that are not part of the terms and can be considered noise. Can be:

a CSS selector string. See the CSS Selectors specification
a range selector object. See the range selector section
an array of those

Example:

As a direct CSS selector:

"remove": ".nav, .breadcrumb"

As a range selector object:

"remove": {
    "startBefore": ".nav",
    "endBefore": ".breadcrumb"
}

As an array of those:

"remove": [
    ".nav, .breadcrumb",
    {
        "startBefore": "#contact-us",
        "endBefore": "#footer"
    }
]
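
"select" and "remove" are typically declared together in the same terms declaration. As an illustrative sketch that reuses the selectors from the examples above with a hypothetical URL, the following keeps the article body and strips navigation elements:

```json
{
  "fetch": "https://example.com/terms",
  "select": "#article-contents",
  "remove": ".nav, .breadcrumb"
}
```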

filter array of strings

Array of filter function names to apply. Functions are executed in the order in which they appear in the array. See the Filters section for more information.

Example: ["filterName1", "filterName2"]
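
As a sketch of how this looks inside a terms declaration, the example below lists two filters. The URL, the selector and both filter names are hypothetical; real names must correspond to filter functions defined as described in the Filters section.

```json
{
  "fetch": "https://example.com/terms",
  "select": "#terms-content",
  "filter": ["removeSharingLinks", "cleanUpUrls"]
}
```

Since functions run in array order, "removeSharingLinks" would be applied before "cleanUpUrls" here.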

combine array of objects

An array of terms declaration objects that will be combined into a single terms document. Each object in the array can contain all the same properties as a regular terms declaration (except “combine”).

Properties shared across all source documents (any combination of “select”, “remove”, “filter” and “executeClientScripts”) can be factored out by declaring them once at the root level of the terms declaration.

Example:

"combine": [
    {
        "fetch": "https://example.com/terms/part1",
        "select": "#main-content",
        "remove": ".ads"
    },
    {
        "fetch": "https://example.com/terms/part2",
        "select": "#main-content",
        "remove": ".ads"
    }
]
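
Because both source documents in this example use the same "select" and "remove" values, those properties could instead be declared once at the root level. A sketch of the equivalent factored form, based on the description above:

```json
{
  "select": "#main-content",
  "remove": ".ads",
  "combine": [
    { "fetch": "https://example.com/terms/part1" },
    { "fetch": "https://example.com/terms/part2" }
  ]
}
```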

Range selector

startBefore CSS selector either startBefore or startAfter is required

The CSS selector for the element before which the range starts.

Example: #privacy-eea

startAfter CSS selector either startBefore or startAfter is required

The CSS selector for the element after which the range starts.

Example: #privacy-eea

endBefore CSS selector either endBefore or endAfter is required

The CSS selector for the element before which the range ends.

Example: footer

endAfter CSS selector either endBefore or endAfter is required

The CSS selector for the element after which the range ends.

Example: footer

Example

To capture content starting from and including a privacy section up until but excluding the footer:

{
  "startBefore": "#privacy-section",
  "endBefore": "footer"
}
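
The other two boundaries work the same way. As a sketch with hypothetical selectors, and going by the property descriptions above, the following range would start just after a table of contents (excluding it) and end just after a final section (including it):

```json
{
  "startAfter": "#table-of-contents",
  "endAfter": "#final-section"
}
```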