Data Catalog Vocabulary (DCAT) Connector

The Data Catalog Vocabulary (DCAT) is a standard developed by the W3C (World Wide Web Consortium) to facilitate interoperability between data catalogs published on the web. It enables the discovery and reuse of datasets by providing a common vocabulary for describing datasets and data catalogs.

See the full DCAT specification here: https://www.w3.org/TR/vocab-dcat-3/

How to index DCAT:

DCAT content can be indexed by creating a DCAT Repository. The connector creates entries for the following DCAT elements:

  • Catalog: Represents a data catalog, which is a collection of datasets. It includes metadata about the catalog itself.

  • Dataset: Represents a collection of data, published or curated by a single agent, and available for access or download.

  • Distribution: Represents an accessible form of a dataset, such as a downloadable file, an API, or a web service.
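The three elements nest: a Catalog lists its Datasets through the `dcat:dataset` property, and each Dataset lists its Distributions through `dcat:distribution`. The following Python sketch walks such a structure and produces one entry per element. It is an illustration only, not the connector's implementation: it assumes the catalog has already been loaded as compact JSON-LD with `dcat:`/`dct:` prefixed keys, and `Entry` and `collect_entries` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Entry:
    """One hypothetical index entry derived from a DCAT element."""
    kind: str              # "Catalog", "Dataset" or "Distribution"
    title: str
    parent: Optional[str]  # title of the enclosing element, None for the Catalog


def _as_list(value: Any) -> List[Any]:
    # JSON-LD allows a single object or an array for a multi-valued property.
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def collect_entries(catalog: Dict[str, Any]) -> List[Entry]:
    """Flatten Catalog -> Dataset -> Distribution into a list of entries.

    Assumes compact JSON-LD keys such as "dct:title" and "dcat:dataset".
    """
    entries: List[Entry] = []
    cat_title = catalog.get("dct:title", "")
    entries.append(Entry("Catalog", cat_title, None))
    for ds in _as_list(catalog.get("dcat:dataset")):
        ds_title = ds.get("dct:title", "")
        entries.append(Entry("Dataset", ds_title, cat_title))
        for dist in _as_list(ds.get("dcat:distribution")):
            # Fall back to the download or access URL when there is no title.
            dist_title = (dist.get("dct:title")
                          or dist.get("dcat:downloadURL")
                          or dist.get("dcat:accessURL", ""))
            entries.append(Entry("Distribution", dist_title, ds_title))
    return entries


# Hypothetical example catalog.
example = {
    "@type": "dcat:Catalog",
    "dct:title": "City Open Data",
    "dcat:dataset": [
        {
            "@type": "dcat:Dataset",
            "dct:title": "Air quality 2023",
            "dcat:distribution": [
                {"@type": "dcat:Distribution", "dct:title": "CSV export",
                 "dcat:downloadURL": "https://example.org/air-2023.csv"},
                {"@type": "dcat:Distribution",
                 "dcat:accessURL": "https://example.org/api/air"},
            ],
        },
        {
            "@type": "dcat:Dataset",
            "dct:title": "Bus stops",
            # A single Distribution given as an object rather than an array.
            "dcat:distribution": {"@type": "dcat:Distribution",
                                  "dcat:downloadURL": "https://example.org/stops.geojson"},
        },
    ],
}

if __name__ == "__main__":
    for e in collect_entries(example):
        print(f"{e.kind:<12} {e.title}  (in: {e.parent})")
```

Real DCAT feeds are often published as Turtle, RDF/XML or expanded JSON-LD, so a production parser would normally go through an RDF library rather than reading prefixed keys directly.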

The DCAT Repository provides the following parameters:

Advanced parameters:

  • File Size Limit - Maximum size, in megabytes, of a file that HQ can download and further extract.

  • Index Distributions - Whether to index each DCAT Distribution object (e.g. linked files). If unchecked, these files are not extracted and only the DCAT Dataset objects are indexed.
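The way these two advanced parameters interact can be sketched as a small decision function. This is a hypothetical illustration, not HQ's code: `RepositorySettings`, `DistributionPlan` and `plan_distribution` are invented names; it assumes the file size in bytes is known before download (for example from `dcat:byteSize`); it treats one megabyte as 1024 × 1024 bytes; and it assumes a Distribution whose file exceeds the limit is still indexed as an entry, with only the download and extraction skipped.

```python
from dataclasses import dataclass

# Assumption: "megabytes" is interpreted as MiB (1024 * 1024 bytes).
BYTES_PER_MB = 1024 * 1024


@dataclass
class RepositorySettings:
    """Hypothetical mirror of the two advanced parameters."""
    file_size_limit_mb: float   # File Size Limit
    index_distributions: bool   # Index Distributions


@dataclass
class DistributionPlan:
    index_entry: bool    # create an index entry for the Distribution
    extract_file: bool   # download the linked file and extract its content


def plan_distribution(settings: RepositorySettings, size_bytes: int) -> DistributionPlan:
    """Decide what to do with one Distribution under the given settings."""
    if not settings.index_distributions:
        # Unchecked: Distributions are neither indexed nor extracted;
        # only Dataset objects end up in the index.
        return DistributionPlan(index_entry=False, extract_file=False)
    within_limit = size_bytes <= settings.file_size_limit_mb * BYTES_PER_MB
    # Assumption: an oversized file is skipped, but the Distribution entry remains.
    return DistributionPlan(index_entry=True, extract_file=within_limit)


if __name__ == "__main__":
    settings = RepositorySettings(file_size_limit_mb=50, index_distributions=True)
    print(plan_distribution(settings, 10 * BYTES_PER_MB))   # within the limit
    print(plan_distribution(settings, 200 * BYTES_PER_MB))  # over the limit
```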