pipeline.src.flows.control_anteriority
======================================

.. py:module:: pipeline.src.flows.control_anteriority


Attributes
----------

.. autoapisummary::

   pipeline.src.flows.control_anteriority.control_rate_bins
   pipeline.src.flows.control_anteriority.control_rate_bins_risk_factors
   pipeline.src.flows.control_anteriority.infraction_rate_bins


Functions
---------

.. autoapisummary::

   pipeline.src.flows.control_anteriority.compute_control_dates_coefficients
   pipeline.src.flows.control_anteriority.compute_control_ranks_coefficients
   pipeline.src.flows.control_anteriority.extract_last_years_controls
   pipeline.src.flows.control_anteriority.extract_fishing_infraction_natinfs
   pipeline.src.flows.control_anteriority.extract_vessels_most_recent_control
   pipeline.src.flows.control_anteriority.transform_vessels_most_recent_control
   pipeline.src.flows.control_anteriority.compute_control_rate_risk_factors
   pipeline.src.flows.control_anteriority.compute_infraction_rate_risk_factors
   pipeline.src.flows.control_anteriority.compute_control_statistics
   pipeline.src.flows.control_anteriority.merge
   pipeline.src.flows.control_anteriority.load_control_anteriority
   pipeline.src.flows.control_anteriority.control_anteriority_flow


Module Contents
---------------

.. py:data:: control_rate_bins

.. py:data:: control_rate_bins_risk_factors

.. py:data:: infraction_rate_bins

.. py:function:: compute_control_dates_coefficients(control_dates: pandas.Series, from_date: datetime.datetime, to_date: datetime.datetime) -> pandas.Series

   For each date in ``control_dates``, computes a coefficient determined by its
   distance from ``from_date`` relative to the distance between ``from_date``
   and ``to_date``. Dates before ``from_date`` or after ``to_date`` get a
   coefficient of 0.

   :param control_dates: Series of ``datetime.datetime``
   :type control_dates: pd.Series
   :param from_date: Start of time interval considered
   :type from_date: datetime
   :param to_date: End of time interval considered
   :type to_date: datetime
   :returns: coefficients between 0 and 1, one per date in ``control_dates``
   :rtype: pd.Series
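
   The coefficient rule can be sketched as a linear interpolation between
   ``from_date`` (coefficient 0) and ``to_date`` (coefficient 1), with dates
   outside the interval set to 0. This is a minimal sketch inferred from the
   examples, not necessarily the module's implementation, and
   ``date_coefficients_sketch`` is a hypothetical name:

   .. code-block:: python

      from datetime import datetime

      import pandas as pd


      def date_coefficients_sketch(
          control_dates: pd.Series, from_date: datetime, to_date: datetime
      ) -> pd.Series:
          # Hypothetical helper (assumption): linear interpolation inferred
          # from the documented examples, not the module's own code.
          # Fraction of the [from_date, to_date] interval elapsed at each date
          coefficients = (control_dates - from_date) / (to_date - from_date)
          # Dates before from_date or after to_date get a coefficient of 0
          in_interval = (control_dates >= from_date) & (control_dates <= to_date)
          return coefficients.where(in_interval, 0.0)

   With ``from_date = datetime(2021, 1, 1)`` and ``to_date = datetime(2023, 1, 1)``,
   ``datetime(2022, 1, 1)`` lies 365 days into a 730-day interval, hence a
   coefficient of 0.5.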

   .. rubric:: Examples

   >>> import pandas as pd
   >>> from datetime import datetime
   >>> from_date = datetime(2021, 1, 1)
   >>> to_date = datetime(2023, 1, 1)
   >>> dates = pd.Series([
   ...     datetime(2019, 6, 5),
   ...     datetime(2021, 1, 1),
   ...     datetime(2022, 1, 1),
   ...     datetime(2025, 5, 2)
   ... ])
   >>> compute_control_dates_coefficients(
   ...     dates, from_date=from_date, to_date=to_date
   ... )
   0    0.0
   1    0.0
   2    0.5
   3    0.0
   dtype: float64


.. py:function:: compute_control_ranks_coefficients(control_ranks: numpy.array) -> numpy.array

   Given a ``numpy.array`` of integers representing the rank of vessels'
   controls over time, returns the coefficients with which each control must
   be taken into account in the risk factor.

   The input array represents the controls of several vessels. For each
   vessel, controls are sorted from most to least recent and ranked (1, 2,
   3...). The output is an array with the same shape which contains
   coefficients defined as:

   * 1.0 for controls of rank 1
   * 0.9 for controls of rank 2
   * ...
   * 0.1 for controls of rank 10
   * 0.0 for controls of rank 11+

   :param control_ranks: 1D-array of integers >= 1
   :type control_ranks: np.array
   :returns: array with the same shape and with coefficients between 1 (for controls of rank 1) and 0 (for controls of rank >= 11).
   :rtype: np.array

   .. rubric:: Examples

   >>> import numpy as np
   >>> ranks = np.array([1, 4, 2, 2, 12, 2, 4])
   >>> compute_control_ranks_coefficients(ranks)
   array([1. , 0.7, 0.9, 0.9, 0. , 0.9, 0.7])


.. py:function:: extract_last_years_controls(years: int) -> pandas.DataFrame

   Extracts controls data of the last ``years`` years for all vessels.

   :param years: number of years of controls data to extract
   :type years: int
   :returns: all vessels' controls data for the last ``years`` years.
   :rtype: pd.DataFrame


.. py:function:: extract_fishing_infraction_natinfs() -> set

   Extracts all ``natinf_code`` of ``infractions`` related to fishing
   non-compliance (safety non-compliance infractions are excluded).

   :returns: Set of ``natinf_code`` of infractions related to fishing
   :rtype: set
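
The coefficient scale documented for ``compute_control_ranks_coefficients``
decreases by 0.1 per rank, i.e. ``max(0, (11 - rank) / 10)``. A minimal NumPy
sketch of that scale follows; ``rank_coefficients_sketch`` is a hypothetical
name, not a function of this module:

.. code-block:: python

   import numpy as np


   def rank_coefficients_sketch(control_ranks: np.ndarray) -> np.ndarray:
       # Hypothetical helper: rank 1 -> 1.0, rank 2 -> 0.9, ..., rank 10 -> 0.1
       coefficients = (11 - control_ranks) / 10
       # Ranks 11 and above would give negative values: clip them to 0
       return np.clip(coefficients, 0.0, 1.0)

Applied to ``np.array([1, 4, 2, 2, 12, 2, 4])``, it yields the same
coefficients as the documented example: 1.0, 0.7, 0.9, 0.9, 0.0, 0.9, 0.7.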

.. py:function:: extract_vessels_most_recent_control(years: int) -> pandas.DataFrame

   Extracts data about the most recent control of each vessel.

   :param years: number of years of controls data to consider
   :type years: int
   :returns: DataFrame containing the most recent control of each vessel within the last ``years`` years
   :rtype: pd.DataFrame


.. py:function:: transform_vessels_most_recent_control(controls: pandas.DataFrame) -> pandas.DataFrame


.. py:function:: compute_control_rate_risk_factors(controls: pandas.DataFrame) -> pandas.DataFrame

   Given controls data covering at least the last 3 years, computes the
   control rate risk factor of each vessel.

   The idea is that vessels that were controlled less often over the past 3
   years, and that have not been controlled for a certain time, have a higher
   priority of control than vessels that were controlled many times over the
   past 3 years and were controlled recently.

   :param controls: ``pd.DataFrame`` of controls data on the last 3+ years
   :type controls: pd.DataFrame
   :returns: for each vessel, the control rate component of its risk factor.
   :rtype: pd.DataFrame


.. py:function:: compute_infraction_rate_risk_factors(controls: pandas.DataFrame, fishing_infraction_natinfs: set) -> pandas.DataFrame

   Given control results data of vessels, computes the infraction rate risk
   factor of each vessel.

   The idea is that vessels which committed infractions in the past have a
   higher priority of control than vessels that were in order.

   Only infractions related to fishing non-compliance are taken into account;
   safety non-compliance events are not. If a vessel was controlled more than
   10 times, only the 10 most recent control results are taken into account.

   :param controls: control results data
   :type controls: pd.DataFrame
   :param fishing_infraction_natinfs: set of infraction NATINF codes related to fishing non-compliance.
   :type fishing_infraction_natinfs: set
   :returns: for each vessel, the infraction rate component of its risk factor.
   :rtype: pd.DataFrame
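
   One plausible way to combine these rules is to weight each control by the
   same rank coefficients as ``compute_control_ranks_coefficients`` (which
   fall to 0 after rank 10) and to take the weighted share of controls that
   revealed a fishing infraction. This is a hypothetical sketch, not the
   module's implementation: the name ``infraction_rate_sketch`` and the
   columns ``control_datetime_utc`` and ``infraction_natinfs`` are
   assumptions, and the binning into a risk factor (``infraction_rate_bins``)
   is left out.

   .. code-block:: python

      import pandas as pd


      def infraction_rate_sketch(
          controls: pd.DataFrame, fishing_infraction_natinfs: set
      ) -> pd.Series:
          # Hypothetical columns: vessel_id, control_datetime_utc,
          # infraction_natinfs (list of NATINF codes found during the control)
          controls = controls.sort_values("control_datetime_utc", ascending=False)
          # Rank each vessel's controls from most to least recent (1, 2, 3...)
          rank = controls.groupby("vessel_id").cumcount() + 1
          # Same scale as compute_control_ranks_coefficients: 0 from rank 11 on,
          # so only the 10 most recent controls count
          coefficient = ((11 - rank) / 10).clip(lower=0.0, upper=1.0)
          # A control counts only if it revealed a fishing-related infraction
          has_fishing_infraction = controls["infraction_natinfs"].apply(
              lambda natinfs: bool(set(natinfs) & fishing_infraction_natinfs)
          )
          weighted = coefficient * has_fishing_infraction.astype(float)
          # Coefficient-weighted share of recent controls with a fishing infraction
          vessel_ids = controls["vessel_id"]
          return (
              weighted.groupby(vessel_ids).sum() / coefficient.groupby(vessel_ids).sum()
          ).rename("infraction_rate")

   The denominator is never 0, since each vessel's most recent control has
   rank 1 and therefore a coefficient of 1.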

.. py:function:: compute_control_statistics(controls: pandas.DataFrame) -> pandas.DataFrame

   Computes control statistics per vessel.

   :param controls: controls data, output of ``extract_last_years_controls``
   :type controls: pd.DataFrame
   :returns: control statistics per vessel
   :rtype: pd.DataFrame


.. py:function:: merge(control_rate_risk_factors: pandas.DataFrame, infraction_rate_risk_factor: pandas.DataFrame, last_controls: pandas.DataFrame, control_statistics: pandas.DataFrame) -> pandas.DataFrame

   Merges the input ``pd.DataFrame`` objects to produce the output of the
   flow. The join is performed on ``vessel_id``.

   :param control_rate_risk_factors: output of ``compute_control_rate_risk_factors`` task
   :type control_rate_risk_factors: pd.DataFrame
   :param infraction_rate_risk_factor: output of ``compute_infraction_rate_risk_factors`` task
   :type infraction_rate_risk_factor: pd.DataFrame
   :param last_controls: output of ``get_last_controls`` task
   :type last_controls: pd.DataFrame
   :param control_statistics: control statistics per vessel
   :type control_statistics: pd.DataFrame
   :returns: join of the 4 input ``pd.DataFrame``
   :rtype: pd.DataFrame


.. py:function:: load_control_anteriority(control_anteriority: pandas.DataFrame)

   Loads the output of the ``merge`` task into the ``control_anteriority`` table.

   :param control_anteriority: output of ``merge`` task.
   :type control_anteriority: pd.DataFrame


.. py:function:: control_anteriority_flow(number_years: int = 5)
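
The join performed by ``merge`` can be sketched by folding ``pd.merge`` over
the input frames on ``vessel_id``. This is a minimal sketch: ``merge_sketch``
is a hypothetical name, and the outer join is an assumption, since the join
type is not documented here.

.. code-block:: python

   from functools import reduce

   import pandas as pd


   def merge_sketch(*dataframes: pd.DataFrame) -> pd.DataFrame:
       # Hypothetical helper: successively join every input on vessel_id.
       # how="outer" (an assumption) keeps vessels missing from some inputs.
       return reduce(
           lambda left, right: pd.merge(left, right, on="vessel_id", how="outer"),
           dataframes,
       )

Called with the four inputs of ``merge``, this returns one row per
``vessel_id`` present in any input, provided ``vessel_id`` is unique within
each input; duplicated keys would multiply rows.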