Insitu dataset

Describes a dataset to be imported by insitu

Properties

  • maintainer (string): The person in charge of updating or fixing the dataset as needed, along with their email address.
  • headers_http (object): The headers to use when fetching the source file.
  • downloaded_filename (string): The filename of the downloaded file.
  • expected (object): Describes the expected source data file. Cannot contain additional properties.
  • format (string): The expected data file format. Must be one of: ['xls', 'xlsx', 'xlsm', 'csv', 'dbf', 'json', 'ndjson', 'zipshapefile'].
  • name (string): The expected pattern the name of the file should match.
  • sheet (string): If the source file is a spreadsheet, which sheet contains the data.
  • sheets (array): If the source file is a spreadsheet, which sheets contain the data.
  • delimiter (string): The delimiter used in the source csv file.
  • encoding (string): The encoding of the source file (if applicable).
  • headerless (boolean): Reference columns by number (instead of names in a header row).
  • flatten (boolean): Flatten all JSON fields (default is true).
  • skip_rows (integer): Number of rows to skip in the file before reading the data.
  • mode (string): How the data should be written in the database. Must be one of: ['replace', 'append'].
  • table (string): The name of the target table in the database.
  • dependencies (array): Other datasets that must be loaded before this one.
  • Items (string)
  • last_modified_date (object): The date the data was last modified.
  • source (string): The source of truth for the last modified date. Must be one of: ['filename', 'file', 'zip', 'grist', 'http_header', 'download'].
  • params (object): Extra parameters, based on the chosen source.
    • group (string): Name of the regexp group to pick from expected.name, when using the filename source (default: modified).
    • header (string): Name of the HTTP header to use, when using the http_header source (default: Last-Modified).
  • columns (array): Describes the columns that will be created in the target table.
  • Items (object): Cannot contain additional properties.
    • source (['integer', 'string']): The column number or name in which to get the data.
    • regex (string): Regular expression used to extract data from the column values.
    • display (string): The human-friendly display name of the column.
    • grist_col_ref (integer): The Grist reference of the column.
    • db (object): Schema information for the database. Cannot contain additional properties.
    • column (string): The name of the column in the database.
    • type (string): The datatype of the column in the database.
    • validator (['object', 'string']): A validator function for the column. Cannot contain additional properties.
    • path (string): The python path to the validator function.
    • params (object): Parameters for the validator function.
    • validators (array): An ordered list of validator functions for the column.
    • Items (['object', 'string'])
    • as (string): A shortcut for often used data columns (defined in datasets/__init__.py). Must be one of: ['insee_com', 'insee_dep', 'insee_reg', 'insee_epci', 'insee_collectivite', 'annee'].
    • missing_values (array): The values that mean the data is missing (NULL).
    • Items (string)
    • skip_values (array): The values to ignore and skip in the column.
    • keep_values (array): The values to keep in the column.
    • replacement (string): New column name based on extracted variables in regex.
    • nullable (boolean): Specify if the value can be null.
    • source_pattern (string): A pattern to match in source columns.
    • format_with_row_values (string): Replace the value using this template, formatted with the values of other cells in the same row.
    • melt (object): Melt the columns defined by source_pattern into multiple rows.
    • variable (object): The 'variable' destination column.
      • column (string): The name of the column in the database.
      • type (string): The datatype of the column in the database.
      • value (string): The value of the column.
    • value (object): The 'value' destination column.
      • column (string): The name of the column in the database.
      • type (string): The datatype of the column in the database.
    • nosource (boolean)
  • indicateurs
  • geot (string): Identifier for this dataset in the GeoT export (unused).
  • description (string): A human-friendly description of the dataset.
  • keywords (array): Keywords to help users find the dataset.
  • Items (string)
  • producer (object): Information about the producer of the data. Cannot contain additional properties.
  • name (string): Name of the data producer.
  • url (string): The url on the producer's website where you can find the data.
  • contributors (array): A list of contributors to the dataset.
  • Items (object): Cannot contain additional properties.
    • title (string): Name of the contributor.
    • email (string): Email of the contributor.
    • path (string): The url on the contributor's website where you can find the data.
    • role (string): The role of the contributor. Must be one of: ['author', 'publisher', 'maintainer', 'wrangler', 'contributor']. Default: contributor.
    • organization (string): The organization of the contributor.
  • licenses (array): The license(s) under which the package is provided.
  • Items (object)
    • name (string): Open Definition license ID.
    • title (string): A human-readable title.
    • path (string): Either a URL or a relative POSIX path.
  • expires (string): Duration after which the dataset is considered expired (follows data.gouv naming convention). Must be one of: ['stable', 'unknown', 'punctual', 'continuous', 'hourly', 'fourTimesADay', 'threeTimesADay', 'semidaily', 'daily', 'fourTimesAWeek', 'threeTimesAWeek', 'semiweekly', 'weekly', 'biweekly', 'threeTimesAMonth', 'semimonthly', 'monthly', 'bimonthly', 'quarterly', 'threeTimesAYear', 'semiannual', 'annual', 'biennial', 'triennial', 'quinquennial', 'irregular'].
  • datagouv (object): A standard set of information about the data.gouv.fr source. Cannot contain additional properties.
  • id (string): The dataset id on data.gouv.fr.
  • resources (object): The files ids used to build the stable URL on data.gouv.fr.
    • json (string)
    • csv (string)
  • overwrite-description (boolean): Whether to overwrite the description on data.gouv.fr (default: true).
  • source (string): The URL of the source data file, for automatic fetching by insitu. Must not require login.
  • visibility (string): The visibility of the dataset (default: public). Must be one of: ['public', 'private'].
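
To make the properties above more concrete, here is a minimal sketch of a dataset description written as a Python dict, together with two small helpers. Everything in it is an assumption made for illustration: the values (maintainer, URL, file pattern, table and column names, the `text` datatype), the nesting of `format`, `name`, `delimiter` and `encoding` under `expected`, and the helper functions `check_enums` and `modified_from_filename`, which are hypothetical and not part of insitu. The second helper shows one plausible reading of the `filename` source for `last_modified_date`: the named regexp group (default `modified`) is picked from `expected.name`. Whether insitu matches the whole filename or only searches within it is not stated here, so the sketch uses a full match.

```python
import re

# Hypothetical dataset description. Property names come from the list above;
# every value, and the nesting under "expected", is an assumption.
dataset = {
    "maintainer": "Jane Doe <jane.doe@example.org>",
    "source": "https://example.org/data/population_2024-01-15.csv",
    "expected": {
        "format": "csv",
        # The named group "modified" carries the last-modified date.
        "name": r"population_(?P<modified>\d{4}-\d{2}-\d{2})\.csv",
        "delimiter": ";",
        "encoding": "utf-8",
    },
    "mode": "replace",
    "table": "population",
    "last_modified_date": {"source": "filename", "params": {"group": "modified"}},
    "columns": [
        {"source": "population", "db": {"column": "population", "type": "text"}},
    ],
    "visibility": "public",
}

# Allowed values copied from the "Must be one of" lists above.
ALLOWED_FORMATS = {"xls", "xlsx", "xlsm", "csv", "dbf", "json", "ndjson", "zipshapefile"}
ALLOWED_MODES = {"replace", "append"}
ALLOWED_VISIBILITY = {"public", "private"}


def check_enums(description):
    """Return a list of error messages for enumerated values that are not allowed."""
    errors = []
    fmt = description.get("expected", {}).get("format")
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        errors.append(f"unknown format: {fmt}")
    mode = description.get("mode")
    if mode is not None and mode not in ALLOWED_MODES:
        errors.append(f"unknown mode: {mode}")
    visibility = description.get("visibility", "public")
    if visibility not in ALLOWED_VISIBILITY:
        errors.append(f"unknown visibility: {visibility}")
    return errors


def modified_from_filename(description, filename):
    """Return the raw text of the configured regexp group, or None if no match."""
    pattern = description["expected"]["name"]
    params = description["last_modified_date"].get("params", {})
    group = params.get("group", "modified")  # documented default group name
    match = re.fullmatch(pattern, filename)
    if match is None:
        return None
    return match.group(group)
```

With this description, `modified_from_filename(dataset, "population_2024-01-15.csv")` returns the string `"2024-01-15"`; turning that string into a date is left out because the expected date format is not documented above.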