This reference documentation details all available properties that can be specified in a service’s declaration file.
The examples given throughout this reference can be seen in context in the declaration files of the Demo collection.
name
Type: string
Required: yes
The name of the service.
Example: "name": "Open Terms Archive"
terms
Type: object of objects
Required: yes
Map of the terms associated with the service. Keys are standardized terms types (e.g., “Privacy Policy”, “Terms of Service”). Values are terms declaration objects that contain the configuration for fetching and processing each document, as detailed in the Terms declaration section.
To facilitate cross-service comparisons and ensure consistency, a standardized list of terms types is maintained in a dedicated repository.
Note that the terms type may differ from the exact name the service gives its document, but it should match the underlying commitment. For example, some providers call “Terms and Conditions” or “Terms of Use” what others call “Terms of Service”.
"terms": {
  "Terms of Service": {
    "fetch": "https://opencollective.com/tos",
    "select": ".markdown"
  },
  "Privacy Policy": {
    "fetch": "https://opencollective.com/privacypolicy",
    "select": ".markdown"
  }
}
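In a declaration file, name and terms sit together at the root of a single JSON object. The sketch below assembles the examples above into a complete file. It assumes "Open Collective" as the service name, which is inferred from the URLs and is not stated in this reference:

```json
{
  "name": "Open Collective",
  "terms": {
    "Terms of Service": {
      "fetch": "https://opencollective.com/tos",
      "select": ".markdown"
    },
    "Privacy Policy": {
      "fetch": "https://opencollective.com/privacypolicy",
      "select": ".markdown"
    }
  }
}
```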
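Terms declaration
Each value of the terms map is a terms declaration, which can contain the following properties.
fetch
Type: uri
Required: yes
The URL of the document to fetch.
Example: "fetch": "https://opentermsarchive.org/en/privacy-policy"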
select
Type: string, object or array
Required: for HTML documents
The way to select the parts of the document to extract. Can be:
As a direct CSS selector:
"select": "#article-contents"
As a range selector object:
"select": {
  "startBefore": "h1",
  "endBefore": "#toc-heading"
}
As an array of those:
"select": [
  "#article-contents",
  {
    "startBefore": "h1",
    "endBefore": "#toc-heading"
  }
]
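Because select is only required for HTML documents, a terms declaration for another format, such as a PDF, may consist of fetch alone. Here is a minimal sketch, assuming a hypothetical PDF URL:

```json
"Terms of Service": {
  "fetch": "https://example.com/terms.pdf"
}
```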
executeClientScripts
Type: boolean
Required: no
Boolean flag to execute client-side JavaScript before accessing the content.
When enabled, the page is loaded in a headless browser so that client-side scripts run and dynamic content loads. This is necessary when JavaScript modifies or loads content after the initial page load.
Example: "executeClientScripts": true
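The flag is declared alongside the other properties of a terms declaration. The sketch below is for a page whose content is loaded by JavaScript. It assumes a hypothetical URL and reuses the #article-contents selector from the select examples:

```json
"Terms of Service": {
  "fetch": "https://example.com/terms",
  "select": "#article-contents",
  "executeClientScripts": true
}
```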
remove
Type: string, object or array
Required: no
Specifies the parts of the document to remove because they are not part of the terms and can be considered noise. Can be:
As a direct CSS selector:
"remove": ".nav, .breadcrumb"
As a range selector object:
"remove": {
  "startBefore": ".nav",
  "endBefore": ".breadcrumb"
}
As an array of those:
"remove": [
  ".nav, .breadcrumb",
  {
    "startBefore": "#contact-us",
    "endBefore": "#footer"
  }
]
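select and remove can be declared together in the same terms declaration: one identifies the parts to extract, the other the noise to drop. The sketch below reuses the selectors from the examples above, with a hypothetical URL:

```json
"Privacy Policy": {
  "fetch": "https://example.com/privacy",
  "select": "#article-contents",
  "remove": [
    ".nav, .breadcrumb",
    {
      "startBefore": "#contact-us",
      "endBefore": "#footer"
    }
  ]
}
```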
filter
Type: array of strings
Required: no
Names of the filters to apply to the document.
Example: "filter": ["filterName1", "filterName2"]
combine
Type: array of objects
Required: no
An array of terms declaration objects whose documents are combined into a single terms document. Each object in the array can contain the same properties as a regular terms declaration, except “combine”.
Properties shared by all source documents (any combination of “select”, “remove”, “filter” and “executeClientScripts”) can be factored out by declaring them once at the root level of the terms declaration.
"combine": [
  {
    "fetch": "https://example.com/terms/part1",
    "select": "#main-content",
    "remove": ".ads"
  },
  {
    "fetch": "https://example.com/terms/part2",
    "select": "#main-content",
    "remove": ".ads"
  }
]
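Both source documents above share the same select and remove values. Those properties can therefore be factored out to the root level of the terms declaration, as described above. The sketch below shows the equivalent declaration, assuming "Terms of Service" as the terms type key:

```json
"Terms of Service": {
  "combine": [
    { "fetch": "https://example.com/terms/part1" },
    { "fetch": "https://example.com/terms/part2" }
  ],
  "select": "#main-content",
  "remove": ".ads"
}
```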
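Range selector
A range selector object selects the content between a start element and an end element, each identified by a CSS selector.
startBefore
Type: CSS selector
Required: either startBefore or startAfter
The range starts at the selected element, which is included.
Example: "startBefore": "#privacy-eea"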
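startAfter
Type: CSS selector
Required: either startBefore or startAfter
The range starts right after the selected element, which is excluded.
Example: "startAfter": "#privacy-eea"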
endBefore
Type: CSS selector
Required: either endBefore or endAfter
The range ends right before the selected element, which is excluded.
Example: "endBefore": "footer"
endAfter
Type: CSS selector
Required: either endBefore or endAfter
The range ends at the selected element, which is included.
Example: "endAfter": "footer"
For example, to capture content starting from and including a privacy section, up to but excluding the footer:
{
  "startBefore": "#privacy-section",
  "endBefore": "footer"
}
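The complementary startAfter and endAfter properties work the same way. For example, the range below captures the content that follows the #privacy-eea element (excluded) up to and including the footer. It reuses the selectors from the property examples above:

```json
{
  "startAfter": "#privacy-eea",
  "endAfter": "footer"
}
```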