pipeline.src.read_query ======================= .. py:module:: pipeline.src.read_query Functions --------- .. autoapisummary:: pipeline.src.read_query.read_saved_query pipeline.src.read_query.read_query pipeline.src.read_query.read_table Module Contents --------------- .. py:function:: read_saved_query(sql_filepath: Union[str, pathlib.Path], *, db: str = None, con: Union[sqlalchemy.engine.Connection, sqlalchemy.engine.Engine] = None, chunksize: Union[None, str] = None, params: Union[dict, None] = None, backend: str = 'pandas', geom_col: str = 'geom', crs: Union[int, None] = None, parse_dates: Union[list, dict, None] = None, return_pyarrow_dtypes: bool = False, **kwargs) -> Union[pandas.DataFrame, geopandas.GeoDataFrame] Run saved SQLquery on a database. Supported databases : - 'ocan' : OCAN database - 'fmc': FMC database - 'monitorfish_remote': Monitorfish database - 'monitorfish_local': Monitorfish PostGIS database hosted in CNSP - 'monitorenv_remote': Monitorfish database - 'cacem_local' : CACEM PostGIS database hosted in CNSP - 'data_warehouse' : Monitorfish/Monitorenv/RapportNav Data Warehouse Database credentials must be present in the environement. :param sql_filepath: path to .sql file, starting from the saved queries folder. example : 'ocan/nav_fr_peche.sql' :type sql_filepath: str :param db: Database name. Possible values : 'ocan', 'fmc', 'monitorfish_remote', 'monitorfish_local'. If `db` is None, `con` must be passed. :type db: str, optional :param con: `sqlalchemy.engine.Connection` or `sqlalchemy.engine.Engine` object. Mandatory if no `db` is given. Ignored if `db` is given. :type con: Union[Connection, Engine], optional :param chunksize: If specified, return an iterator where `chunksize` is the number of rows to include in each chunk. Defaults to None. :type chunksize: Union[None, str], optional :param params: Parameters to pass to execute method. Defaults to None. :type params: Union[dict, None], optional :param backend: 'pandas' to run a SQL query and return a `pandas.DataFrame` or 'geopandas' to run a PostGIS query and return a `geopandas.GeoDataFrame`. Defaults to 'pandas'. :type backend: str, optional :param geom_col: column name to convert to shapely geometries when `backend` is 'geopandas'. Ignored when `backend` is 'pandas'. Defaults to 'geom'. :type geom_col: str, optional :param crs: CRS to use for the returned GeoDataFrame; if not set, tries to determine CRS from the SRID associated with the first geometry in the database, and assigns that to all geometries. Ignored when `backend` is 'pandas'. Defaults to None. :type crs: Union[None, str], optional :param parse_dates: - List of column names to parse as dates. - Dict of ``{column_name: format string}`` where format string is strftime compatible in case of parsing string times or is one of (D, s, ns, ms, us) in case of parsing integer timestamps. - Dict of ``{column_name: arg dict}``, where the arg dict corresponds to the keyword arguments of :func:`pandas.to_datetime` :type parse_dates: Union[list, dict, None], optional :param return_pyarrow_dtypes: If `True`, and `db` is `"data_warehouse"`, results are returned as a pandas DataFrame of `pyarrow` dtypes. Ignored if `db` is not `"data_warehouse"`. Defaults to `False`. :type return_pyarrow_dtypes: bool, optional :param kwargs: passed to pd.read_sql or gpd.read_postgis :returns: Query results :rtype: Union[pd.DataFrame, gpd.DataFrame] .. py:function:: read_query(query: str | sqlalchemy.Selectable | sqlalchemy.TextClause, *, db: str = None, con: Union[sqlalchemy.engine.Connection, sqlalchemy.engine.Engine] = None, chunksize: Union[None, str] = None, params: Union[dict, None] = None, backend: str = 'pandas', geom_col: str = 'geom', crs: Union[int, None] = None, parse_dates: Union[list, dict, None] = None, return_pyarrow_dtypes: bool = False, **kwargs) -> Union[pandas.DataFrame, geopandas.GeoDataFrame] Run SQLquery on a database. Supported databases : - 'ocan' : OCAN database - 'fmc': FMC database - 'monitorfish_remote': Monitorfish database - 'monitorfish_local': Monitorfish PostGIS database hosted in CNSP - 'monitorenv_remote': Monitorenv database - 'cacem_local' : CACEM PostGIS database hosted in CNSP - 'data_warehouse' : Monitorfish/Monitorenv/RapportNav Data Warehouse Database credentials must be present in the environement. :param query: Query to execute (must be a string if querying data warehouse). :type query: str | Selectable | TextClause :param db: Database name. Possible values : 'ocan', 'fmc', 'monitorfish_remote', 'monitorfish_local', 'monitorenv_remote', 'cacem_local'. If `db` is None, `con` must be passed. :type db: str, optional :param con: `sqlalchemy.engine.Connection` or `sqlalchemy.engine.Engine` object. Mandatory if no `db` is given. Ignored if `db` is given. :type con: Union[Connection, Engine], optional :param chunksize: If specified, return an iterator where `chunksize` is the number of rows to include in each chunk. Defaults to None. :type chunksize: Union[None, str], optional :param params: Parameters to pass to execute method. Defaults to None. :type params: Union[dict, None], optional :param backend: 'pandas' to run a SQL query and return a `pandas.DataFrame` or 'geopandas' to run a PostGIS query and return a `geopandas.GeoDataFrame`. Defaults to 'pandas'. :type backend: str, optional :param geom_col: column name to convert to shapely geometries when `backend` is 'geopandas'. Ignored when `backend` is 'pandas'. Defaults to 'geom'. :type geom_col: str, optional :param crs: CRS to use for the returned GeoDataFrame; if not set, tries to determine CRS from the SRID associated with the first geometry in the database, and assigns that to all geometries. Ignored when `backend` is 'pandas'. Defaults to None. :type crs: Union[None, str], optional :param parse_dates: - List of column names to parse as dates. - Dict of ``{column_name: format string}`` where format string is strftime compatible in case of parsing string times or is one of (D, s, ns, ms, us) in case of parsing integer timestamps. - Dict of ``{column_name: arg dict}``, where the arg dict corresponds to the keyword arguments of :func:`pandas.to_datetime` :type parse_dates: Union[list, dict, None], optional :param return_pyarrow_dtypes: If `True`, and `db` is `"data_warehouse"`, results are returned as a pandas DataFrame of `pyarrow` dtypes. Ignored if `db` is not `"data_warehouse"`. Defaults to `False`. :type return_pyarrow_dtypes: bool, optional :param kwargs: passed to pd.read_sql or gpd.read_postgis :returns: Query results :rtype: Union[pd.DataFrame, gpd.DataFrame] .. py:function:: read_table(db: str, schema: str, table_name: str) Loads database table into pandas Dataframe. Supported databases : - 'ocan' : OCAN database - 'fmc': FMC database - 'monitorfish_remote': Monitorfish database - 'monitorfish_local': Monitorfish PostGIS database hosted in CNSP - 'monitorenv_remote': Monitorenv database - 'cacem_local' : CACEM PostGIS database hosted in CNSP :param db: Database name. Possible values : 'ocan', 'fmc', 'monitorfish_remote', 'monitorfish_local', 'monitorenv_remote', 'cacem_local'. :type db: str :param schema: Schema name :type schema: str :param table_name: Table name :type table_name: str :returns: Dataframe containing the entire table :rtype: pd.DataFrame