Insitu dataset
Describes a dataset to be imported by insitu
Properties
maintainer
(string): The person in charge of updating or fixing the dataset as needed, along with their email address.
headers_http
(object): The headers to use when fetching the source file.
downloaded_filename
(string): The filename of the downloaded file.
expected
(object): Describes the expected source data file. Cannot contain additional properties.
format
(string): The expected data file format. Must be one of: ['xls', 'xlsx', 'xlsm', 'csv', 'dbf', 'json', 'ndjson', 'zipshapefile'].
name
(string): The expected pattern the name of the file should match.
sheet
(string): If the source file is a spreadsheet, which sheet contains the data.
sheets
(array): If the source file is a spreadsheet, which sheets contain the data.
delimiter
(string): The delimiter used in the source CSV file.
encoding
(string): The encoding of the source file (if applicable).
headerless
(boolean): Reference columns by number (instead of by name in a header row).
flatten
(boolean): Flatten all JSON fields (default: true).
skip_rows
(integer): Number of rows to skip in the file before reading the data.
mode
(string): How the data should be written in the database. Must be one of: ['replace', 'append'].
table
(string): The name of the target table in the database.
dependencies
(array): Other datasets that must be loaded before this one.
- Items (string)
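Because `dependencies` lists datasets that must be loaded first, a loader needs to process datasets in topological order. The following is a minimal sketch. It assumes that dependencies are referenced by dataset name and that descriptions are plain dicts. The `load_order` helper and the dataset names are hypothetical and not part of insitu. It uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


def load_order(datasets: dict[str, dict]) -> list[str]:
    """Return dataset names so that each one comes after its dependencies.

    `datasets` maps a (hypothetical) dataset name to its parsed description;
    only the `dependencies` key is read here.
    """
    graph = {
        name: set(desc.get("dependencies", []))
        for name, desc in datasets.items()
    }
    # static_order() raises graphlib.CycleError if dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


datasets = {
    "communes": {},
    "population": {"dependencies": ["communes"]},
    "densite": {"dependencies": ["communes", "population"]},
}
print(load_order(datasets))  # ['communes', 'population', 'densite']
```

A circular `dependencies` declaration surfaces as a `CycleError` rather than an infinite loop. This seems to be the behaviour a loader would want.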
last_modified_date
(object): The date the data was last modified.
source
(string): The source of truth for the last modified date. Must be one of: ['filename', 'file', 'zip', 'grist', 'http_header', 'download'].
params
(object): Extra parameters, depending on the chosen source.
group
(string): Name of the regexp group to pick from expected.name when using the filename source (default: modified).
header
(string): Name of the HTTP header to use when using the http_header source (default: Last-Modified).
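The `filename` and `http_header` sources can be sketched as follows. This assumes that `expected.name` is a Python regular expression containing a named group (by default `modified`), and that the header value is a standard HTTP date. The helper names `modified_from_filename` and `modified_from_http_header`, and the sample filename, are hypothetical. Converting the extracted date text into a date object is left out.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def modified_from_filename(filename: str, name_pattern: str,
                           group: str = "modified") -> Optional[str]:
    """'filename' source: read the date text from a named group of expected.name."""
    match = re.search(name_pattern, filename)
    return match.group(group) if match else None


def modified_from_http_header(headers: Mapping[str, str],
                              header: str = "Last-Modified") -> datetime:
    """'http_header' source: parse an HTTP date header into a datetime."""
    # For dates ending in "GMT", parsedate_to_datetime returns an aware UTC datetime.
    return parsedate_to_datetime(headers[header])


print(modified_from_filename(
    "population_2023-06-30.csv",
    r"population_(?P<modified>\d{4}-\d{2}-\d{2})\.csv",
))  # 2023-06-30
```

A plain dict is used for the headers here for simplicity. Real HTTP header lookups should be case-insensitive.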
columns
(array): Describes the columns that will be created in the target table.
- Items (object): Cannot contain additional properties.
source
(['integer', 'string']): The column number or name from which to get the data.
regex
(string): Regular expression used to extract data from the column values.
display
(string): The human-friendly display name of the column.
grist_col_ref
(integer): The Grist reference of the column.
db
(object): Schema information for the database. Cannot contain additional properties.
column
(string): The name of the column in the database.
type
(string): The datatype of the column in the database.
validator
(['object', 'string']): A validator function for the column. Cannot contain additional properties.
path
(string): The Python path to the validator function.
params
(object): Parameters for the validator function.
validators
(array): An ordered list of validator functions for the column.
- Items (['object', 'string'])
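A `validator` is either a Python path string or an object with `path` and `params`. A `validators` list applies several of them in order. The sketch below shows one way to resolve such entries with `importlib`. It rests on three assumptions: each validator takes the cell value as its first argument, `params` are passed as keyword arguments, and the validator returns the (possibly cleaned) value. The helpers `resolve_validator` and `run_validators` are hypothetical. The example uses builtins only so that it runs anywhere.

```python
import importlib
from typing import Any, Callable, Union

ValidatorSpec = Union[str, dict]


def resolve_validator(spec: ValidatorSpec) -> Callable[[Any], Any]:
    """Turn one `validator` entry into a callable taking the cell value."""
    if isinstance(spec, str):
        path, params = spec, {}
    else:
        path, params = spec["path"], spec.get("params", {})
    # "package.module.func" -> import "package.module", then fetch "func".
    module_name, _, func_name = path.rpartition(".")
    func = getattr(importlib.import_module(module_name), func_name)
    return lambda value: func(value, **params)


def run_validators(value: Any, specs: list[ValidatorSpec]) -> Any:
    """Apply `validators` in order, feeding each one the previous result."""
    for spec in specs:
        value = resolve_validator(spec)(value)
    return value


print(run_validators(-3.14159, [
    "builtins.abs",
    {"path": "builtins.round", "params": {"ndigits": 2}},
]))  # 3.14
```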
as
(string): A shortcut for commonly used data columns (defined in datasets/__init__.py). Must be one of: ['insee_com', 'insee_dep', 'insee_reg', 'insee_epci', 'insee_collectivite', 'annee'].
missing_values
(array): The values that mean the data is missing (NULL).
- Items (string)
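Here is a rough sketch of how `missing_values` and `regex` could combine when a raw cell is read. Declared missing markers become `None` (NULL), and the regex then extracts part of the value. Several behaviours are assumptions, not documented insitu behaviour: the first capturing group is kept when the regex has one, the whole match is kept otherwise, and a non-matching value becomes `None`. The `clean_value` helper and the sample values are hypothetical.

```python
import re
from typing import Optional


def clean_value(raw: str, missing_values: list[str],
                regex: Optional[str] = None) -> Optional[str]:
    """Map declared missing markers to None, then optionally extract with regex."""
    if raw in missing_values:
        return None
    if regex is None:
        return raw
    match = re.search(regex, raw)
    if match is None:
        return None  # assumption: unmatched values are treated as missing
    # Keep the first group if the pattern defines one, else the whole match.
    return match.group(1) if match.re.groups else match.group(0)


print(clean_value("75056 - Paris", ["NA", "n/a"], r"^(\d{5})"))  # 75056
print(clean_value("NA", ["NA", "n/a"], r"^(\d{5})"))  # None
```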
skip_values
(array): The values to ignore and skip in the column.
keep_values
(array): The values to keep in the column.
replacement
(string): New column name, built from the variables extracted by regex.
nullable
(boolean): Whether the value can be null.
source_pattern
(string): A pattern to match in source columns.
format_with_row_values
(string): Replace the value using this template, formatted with the values of other cells in the same row.
melt
(object): Melt the columns matched by source_pattern into multiple rows.
variable
(object): The 'variable' destination column.
column
(string): The name of the column in the database.
type
(string): The datatype of the column in the database.
value
(string): The value of the column.
value
(object): The 'value' destination column.
column
(string): The name of the column in the database.
type
(string): The datatype of the column in the database.
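Melting turns a wide row, with one column per year, per category and so on, into long rows. Each matched column contributes one row with a 'variable' and a 'value'. Below is a stdlib-only sketch of that reshaping for a single row, in the spirit of `pandas.melt`. The helper `melt_row`, its `id_columns` argument and the rule "use the first capturing group of `source_pattern` as the variable value, else the column name" are all hypothetical. The sample numbers are placeholders.

```python
import re
from typing import Any


def melt_row(row: dict[str, Any], source_pattern: str, id_columns: list[str],
             variable_column: str, value_column: str) -> list[dict[str, Any]]:
    """Emit one output row per source column whose name matches source_pattern."""
    pattern = re.compile(source_pattern)
    kept = {col: row[col] for col in id_columns}
    melted = []
    for name, cell in row.items():
        match = pattern.fullmatch(name)
        if match is None:
            continue
        # Hypothetical rule: a capturing group supplies the 'variable' value.
        variable = match.group(1) if pattern.groups else name
        melted.append({**kept, variable_column: variable, value_column: cell})
    return melted


row = {"insee_com": "75056", "pop_2019": 100, "pop_2020": 120}
print(melt_row(row, r"pop_(\d{4})", ["insee_com"], "annee", "population"))
# [{'insee_com': '75056', 'annee': '2019', 'population': 100},
#  {'insee_com': '75056', 'annee': '2020', 'population': 120}]
```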
nosource
(boolean)
indicateurs
geot
(string): Identifier for this dataset in the GeoT export (unused).
description
(string): A human-friendly description of the dataset.
keywords
(array): Keywords to help users find the dataset.
- Items (string)
producer
(object): Information about the producer of the data. Cannot contain additional properties.
name
(string): Name of the data producer.
url
(string): The URL on the producer's website where the data can be found.
contributors
(array): A list of contributors to the dataset.
- Items (object): Cannot contain additional properties.
title
(string): Name of the contributor.
email
(string): Email of the contributor.
path
(string): The URL on the contributor's website where the data can be found.
role
(string): The role of the contributor. Must be one of: ['author', 'publisher', 'maintainer', 'wrangler', 'contributor']. Default: contributor.
organization
(string): The organization of the contributor.
licenses
(array): The license(s) under which the package is provided.
- Items (object)
name
(string): Open Definition license ID.
title
(string): A human-readable title.
path
(string): Either a URL or a relative POSIX path.
expires
(string): Duration after which the dataset is considered expired (follows the data.gouv.fr naming convention). Must be one of: ['stable', 'unknown', 'punctual', 'continuous', 'hourly', 'fourTimesADay', 'threeTimesADay', 'semidaily', 'daily', 'fourTimesAWeek', 'threeTimesAWeek', 'semiweekly', 'weekly', 'biweekly', 'threeTimesAMonth', 'semimonthly', 'monthly', 'bimonthly', 'quarterly', 'threeTimesAYear', 'semiannual', 'annual', 'biennial', 'triennial', 'quinquennial', 'irregular'].
datagouv
(object): A standard set of information about the data.gouv.fr source. Cannot contain additional properties.
id
(string): The dataset id on data.gouv.fr.
resources
(object): The file ids used to build the stable URLs on data.gouv.fr.
json
(string)
csv
(string)
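The resource ids under `datagouv.resources` serve to build stable download URLs. The sketch below assumes data.gouv.fr's `https://www.data.gouv.fr/fr/datasets/r/<resource id>` redirect pattern. It is not taken from insitu's code. The `stable_urls` helper and the all-zero resource id are placeholders.

```python
# Assumed data.gouv.fr stable-URL pattern; it redirects to the resource's current file.
STABLE_URL = "https://www.data.gouv.fr/fr/datasets/r/{resource_id}"


def stable_urls(datagouv: dict) -> dict[str, str]:
    """Map each format in `datagouv.resources` (e.g. 'csv', 'json') to a stable URL."""
    resources = datagouv.get("resources", {})
    return {fmt: STABLE_URL.format(resource_id=rid) for fmt, rid in resources.items()}


print(stable_urls({"resources": {"csv": "00000000-0000-0000-0000-000000000000"}}))
# {'csv': 'https://www.data.gouv.fr/fr/datasets/r/00000000-0000-0000-0000-000000000000'}
```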
overwrite-description
(boolean): Enable overwriting of the description on data.gouv.fr (default: true).
source
(string): The URL of the source data file, for automatic fetching by insitu. Must not require login.
visibility
(string): The visibility of the dataset (default: public). Must be one of: ['public', 'private'].
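The fetching-related properties are `source`, `headers_http` and `downloaded_filename`. The sketch below shows how they could fit together with the standard library alone. It is an illustration, not insitu's actual downloader. The `fetch_source` helper, the 60-second timeout and the fallback to the last URL path segment when `downloaded_filename` is absent are assumptions.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def fetch_source(dataset: dict, dest_dir: Path) -> Path:
    """Download dataset['source'] into dest_dir and return the saved path."""
    url = dataset["source"]
    # headers_http is sent as-is; no credentials are added, since the
    # source URL must not require login.
    request = urllib.request.Request(url, headers=dataset.get("headers_http", {}))
    # Assumed fallback: name the file after the last segment of the URL path.
    filename = dataset.get("downloaded_filename") or PurePosixPath(urlparse(url).path).name
    target = Path(dest_dir) / filename
    with urllib.request.urlopen(request, timeout=60) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Because `urllib` also understands `file://` URLs, the function can be exercised locally without network access.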