pipeline.src.read_query
=======================

.. py:module:: pipeline.src.read_query


Functions
---------

.. autoapisummary::

   pipeline.src.read_query.read_saved_query
   pipeline.src.read_query.read_query
   pipeline.src.read_query.read_table


Module Contents
---------------

.. py:function:: read_saved_query(sql_filepath: Union[str, pathlib.Path], *, db: str = None, con: Union[sqlalchemy.engine.Connection, sqlalchemy.engine.Engine] = None, chunksize: Union[None, str] = None, params: Union[dict, None] = None, backend: str = 'pandas', geom_col: str = 'geom', crs: Union[int, None] = None, parse_dates: Union[list, dict, None] = None, return_pyarrow_dtypes: bool = False, **kwargs) -> Union[pandas.DataFrame, geopandas.GeoDataFrame]

   Run saved SQLquery on a database. Supported databases :
     - 'ocan' : OCAN database
     - 'fmc': FMC database
     - 'monitorfish_remote': Monitorfish database
     - 'monitorfish_local': Monitorfish PostGIS database hosted in CNSP
     - 'monitorenv_remote': Monitorfish database
     - 'cacem_local' : CACEM PostGIS database hosted in CNSP
     - 'data_warehouse' : Monitorfish/Monitorenv/RapportNav Data Warehouse

   Database credentials must be present in the environement.

   :param sql_filepath: path to .sql file, starting from the saved queries folder.
                        example : 'ocan/nav_fr_peche.sql'
   :type sql_filepath: str
   :param db: Database name. Possible values :
              'ocan', 'fmc', 'monitorfish_remote', 'monitorfish_local'. If `db` is None,
              `con` must be passed.
   :type db: str, optional
   :param con: `sqlalchemy.engine.Connection` or
               `sqlalchemy.engine.Engine` object. Mandatory if no `db` is given. Ignored if
               `db` is given.
   :type con: Union[Connection, Engine], optional
   :param chunksize: If specified, return an iterator where
                     `chunksize` is the number of rows to include in each chunk. Defaults to
                     None.
   :type chunksize: Union[None, str], optional
   :param params: Parameters to pass to execute method.
                  Defaults to None.
   :type params: Union[dict, None], optional
   :param backend: 'pandas' to run a SQL query and return a
                   `pandas.DataFrame` or 'geopandas' to run a PostGIS query and return a
                   `geopandas.GeoDataFrame`. Defaults to 'pandas'.
   :type backend: str, optional
   :param geom_col: column name to convert to shapely geometries when
                    `backend` is 'geopandas'. Ignored when `backend` is 'pandas'. Defaults to
                    'geom'.
   :type geom_col: str, optional
   :param crs: CRS to use for the returned GeoDataFrame;
               if not set, tries to determine CRS from the SRID associated with the first
               geometry in the database, and assigns that to all geometries. Ignored when
               `backend` is 'pandas'. Defaults to None.
   :type crs: Union[None, str], optional
   :param parse_dates:
                       - List of column names to parse as dates.
                       - Dict of ``{column_name: format string}`` where format string is
                         strftime compatible in case of parsing string times or is one of
                         (D, s, ns, ms, us) in case of parsing integer timestamps.
                       - Dict of ``{column_name: arg dict}``, where the arg dict corresponds
                         to the keyword arguments of :func:`pandas.to_datetime`
   :type parse_dates: Union[list, dict, None], optional
   :param return_pyarrow_dtypes: If `True`, and `db` is
                                 `"data_warehouse"`, results are returned as a pandas DataFrame of `pyarrow`
                                 dtypes. Ignored if `db` is not `"data_warehouse"`. Defaults to `False`.
   :type return_pyarrow_dtypes: bool, optional
   :param kwargs: passed to pd.read_sql or gpd.read_postgis

   :returns: Query results
   :rtype: Union[pd.DataFrame, gpd.DataFrame]


.. py:function:: read_query(query: str | sqlalchemy.Selectable | sqlalchemy.TextClause, *, db: str = None, con: Union[sqlalchemy.engine.Connection, sqlalchemy.engine.Engine] = None, chunksize: Union[None, str] = None, params: Union[dict, None] = None, backend: str = 'pandas', geom_col: str = 'geom', crs: Union[int, None] = None, parse_dates: Union[list, dict, None] = None, return_pyarrow_dtypes: bool = False, **kwargs) -> Union[pandas.DataFrame, geopandas.GeoDataFrame]

   Run SQLquery on a database. Supported databases :

     - 'ocan' : OCAN database
     - 'fmc': FMC database
     - 'monitorfish_remote': Monitorfish database
     - 'monitorfish_local': Monitorfish PostGIS database hosted in CNSP
     - 'monitorenv_remote': Monitorenv database
     - 'cacem_local' : CACEM PostGIS database hosted in CNSP
     - 'data_warehouse' : Monitorfish/Monitorenv/RapportNav Data Warehouse

   Database credentials must be present in the environement.

   :param query: Query to execute (must be a string if
                 querying data warehouse).
   :type query: str | Selectable | TextClause
   :param db: Database name. Possible values :
              'ocan', 'fmc', 'monitorfish_remote', 'monitorfish_local',
              'monitorenv_remote', 'cacem_local'. If `db` is None, `con` must be passed.
   :type db: str, optional
   :param con: `sqlalchemy.engine.Connection` or
               `sqlalchemy.engine.Engine` object. Mandatory if no `db` is given. Ignored if
               `db` is given.
   :type con: Union[Connection, Engine], optional
   :param chunksize: If specified, return an iterator where
                     `chunksize` is the number of rows to include in each chunk. Defaults to
                     None.
   :type chunksize: Union[None, str], optional
   :param params: Parameters to pass to execute method.
                  Defaults to None.
   :type params: Union[dict, None], optional
   :param backend: 'pandas' to run a SQL query and return a
                   `pandas.DataFrame` or 'geopandas' to run a PostGIS query and return a
                   `geopandas.GeoDataFrame`. Defaults to 'pandas'.
   :type backend: str, optional
   :param geom_col: column name to convert to shapely geometries when
                    `backend` is 'geopandas'. Ignored when `backend` is 'pandas'. Defaults to
                    'geom'.
   :type geom_col: str, optional
   :param crs: CRS to use for the returned GeoDataFrame;
               if not set, tries to determine CRS from the SRID associated with the first
               geometry in the database, and assigns that to all geometries. Ignored when
               `backend` is 'pandas'. Defaults to None.
   :type crs: Union[None, str], optional
   :param parse_dates:
                       - List of column names to parse as dates.
                       - Dict of ``{column_name: format string}`` where format string is
                         strftime compatible in case of parsing string times or is one of
                         (D, s, ns, ms, us) in case of parsing integer timestamps.
                       - Dict of ``{column_name: arg dict}``, where the arg dict corresponds
                         to the keyword arguments of :func:`pandas.to_datetime`
   :type parse_dates: Union[list, dict, None], optional
   :param return_pyarrow_dtypes: If `True`, and `db` is
                                 `"data_warehouse"`, results are returned as a pandas DataFrame of `pyarrow`
                                 dtypes. Ignored if `db` is not `"data_warehouse"`. Defaults to `False`.
   :type return_pyarrow_dtypes: bool, optional
   :param kwargs: passed to pd.read_sql or gpd.read_postgis

   :returns: Query results
   :rtype: Union[pd.DataFrame, gpd.DataFrame]


.. py:function:: read_table(db: str, schema: str, table_name: str)

   Loads database table into pandas Dataframe. Supported databases :

     - 'ocan' : OCAN database
     - 'fmc': FMC database
     - 'monitorfish_remote': Monitorfish database
     - 'monitorfish_local': Monitorfish PostGIS database hosted in CNSP
     - 'monitorenv_remote': Monitorenv database
     - 'cacem_local' : CACEM PostGIS database hosted in CNSP

   :param db: Database name. Possible values :
              'ocan', 'fmc', 'monitorfish_remote', 'monitorfish_local',
              'monitorenv_remote', 'cacem_local'.
   :type db: str
   :param schema: Schema name
   :type schema: str
   :param table_name: Table name
   :type table_name: str

   :returns: Dataframe containing the entire table
   :rtype: pd.DataFrame