Abstract Datasource Info
For each Collection Provider, it must have its own DatasourceInfo class that extends from the AbstractDatasourceInfo. The abstract class holds the standardised data source info. The API doesn’t directly interact with the data source info; it is mainly used by the get_data function of the collection provider.
Each data source defined under the collection_providers is instantiated as the corresponding data source info class when loaded. All data sources are stored in the datasources dictionary of the collection provider.
The attributes of Abstract Datasource Info class are:
data_cols: a list of column names(in string) used toget_data, default to: [‘*’] , which means all columns.exclude_data_cols: a list of column names(in string) that are excluded fromget_data, default to: [].datetime_col: a string of column name to specify the datetime column of the datasource. default toNone.zone_groups: A dictionary to map the refinement level to the column name that stores the zone IDs.nodata_mapping: A dictionary to map nodata value for a data type. Uses in zone data retrieval withzarr+zipreturns values for NA. Default tonp.nan.
Abstract Collection providers
To implement a collection provider, users need to provide initialisation of the datasources variable and the implementation of the interfaces listed below:
Variable:
datasources: a dictionary with key equals to thedatasource_idthat map to the correspodning datasource info class.
Interfaces:
get_data: implementation of the data query from the datasetget_datadictionary: implementation of getting the data dictionary (column names and data types) from the dataset, for the tiles JSON response.
Class initialisation
The collection_providers must initialise the datasources dictionary of the class with the datasources configuration from the collection_providers table. Users can reference the full example here.
For example, the ParquetDatasourceInfo and the datasources configuration:
class ParquetDatasourceInfo(AbstractDatasourceInfo):
filepath: str = ""
id_col: str = ""
conn: duckdb.DuckDBPyConnection = None
"hytruck_local": {
"filepath": "~/file_path/igeo7_4-10.parquet",
"id_col": "cell_ids",
"data_cols": ["stations_band_1", "pipelines_band_1","pipelines_band_2"],
"exclude_data_cols": ["geometry"],
"nodata_mapping": {"uint8": 255}
}
Parameters for get_data
The pydggsapi creates collection provider objects at the beginning, and data sources that share the same provider will use the same object instance. The get_data function accepts five parameters:
zoneIds: A list of zone IDs for retrieving the data at specific zonesres: refinement leveldatasource_id: defined in each collection providers to retrieve the corresponding data source info class, which is used to perform queries.cql_filter: a pygeofilter parser object for CQL2 query handling.include_datetime: a boolean to indicate if thecql_filterconsist of datetime query
Temporal data source:
The implementation should check if the datetime_col is set inside the data source info if include_datetime is True. Then, it should map the zone_datetime_placeholder (defined inside schemas/ogc_dggs/dggrs_zones.py) to the column name specified by datetime_col using the fieldmapping from pygeofilter.
Parameters for get_datadictionary
The pydggsapi passes in the getdata_params to this function, so developers can reuse the parameters to determine which data source should be used to provide the data dictionary.