pipeline.src.flows.control_anteriority

Attributes

control_rate_bins

control_rate_bins_risk_factors

infraction_rate_bins

Functions

compute_control_dates_coefficients(→ pandas.Series)

For each date in control_dates, computes a coefficient determined by its distance from from_date relative to the distance between from_date and to_date.

compute_control_ranks_coefficients(→ numpy.array)

Given a numpy.array of integers representing the rank of vessels' controls over time, returns the coefficients with which they are weighted in the risk factor.

extract_last_years_controls(→ pandas.DataFrame)

Extracts controls data of all vessels over a given number of past years.

extract_fishing_infraction_natinfs(→ set)

Extracts all natinf_code of infractions related to fishing non-compliance (safety non-compliance events are excluded).

extract_vessels_most_recent_control(→ pandas.DataFrame)

Extracts data about the most recent control of each vessel.

transform_vessels_most_recent_control(→ pandas.DataFrame)

compute_control_rate_risk_factors(→ pandas.DataFrame)

Given controls data on 3+ years, computes the control rate risk factor of each vessel.

compute_infraction_rate_risk_factors(→ pandas.DataFrame)

Given control results data of vessels, computes the infraction rate risk factor of each vessel.

compute_control_statistics(→ pandas.DataFrame)

Computes control statistics per vessel.

merge(→ pandas.DataFrame)

Merges the input DataFrames to produce the output of the flow. The join is performed on vessel_id.

load_control_anteriority(control_anteriority)

Loads the output of the merge task into the control_anteriority table.

control_anteriority_flow([number_years])

Module Contents

pipeline.src.flows.control_anteriority.control_rate_bins
pipeline.src.flows.control_anteriority.control_rate_bins_risk_factors
pipeline.src.flows.control_anteriority.infraction_rate_bins
pipeline.src.flows.control_anteriority.compute_control_dates_coefficients(control_dates: pandas.Series, from_date: datetime.datetime, to_date: datetime.datetime) → pandas.Series

For each date in control_dates, computes a coefficient determined by its distance from from_date relative to the distance between from_date and to_date. As the example below shows, dates outside the interval get a coefficient of 0.

Parameters:
  • control_dates (pd.Series) – Series of datetime.datetime

  • from_date (datetime) – Start of the time interval considered

  • to_date (datetime) – End of the time interval considered

Returns:

coefficient of each date in control_dates

Return type:

pd.Series

Examples

>>> import pandas as pd
>>> from datetime import datetime
>>> from_date = datetime(2021, 1, 1)
>>> to_date = datetime(2023, 1, 1)
>>> dates = pd.Series([
...     datetime(2019, 6, 5),
...     datetime(2021, 1, 1),
...     datetime(2022, 1, 1),
...     datetime(2025, 5, 2),
... ])
>>> compute_control_dates_coefficients(
...     dates,
...     from_date=from_date,
...     to_date=to_date,
... )
0    0.0
1    0.0
2    0.5
3    0.0
dtype: float64
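A minimal sketch of this computation, assuming the coefficient is (date - from_date) / (to_date - from_date) for dates inside [from_date, to_date] and 0.0 outside, which reproduces the example above. The function name control_dates_coefficients_sketch is hypothetical, and the treatment of a date exactly equal to to_date is a guess, not taken from the module.

```python
from datetime import datetime

import pandas as pd


def control_dates_coefficients_sketch(
    control_dates: pd.Series, from_date: datetime, to_date: datetime
) -> pd.Series:
    """Hypothetical re-implementation, not the module's actual code."""
    dates = pd.to_datetime(control_dates)
    start, end = pd.Timestamp(from_date), pd.Timestamp(to_date)
    # Fraction of the [start, end] interval elapsed at each control date
    coefficients = (dates - start) / (end - start)
    # Assumption: dates outside the interval get a coefficient of 0.0
    in_interval = (dates >= start) & (dates <= end)
    return coefficients.where(in_interval, 0.0)


dates = pd.Series([
    datetime(2019, 6, 5),
    datetime(2021, 1, 1),
    datetime(2022, 1, 1),
    datetime(2025, 5, 2),
])
print(control_dates_coefficients_sketch(dates, datetime(2021, 1, 1), datetime(2023, 1, 1)).tolist())
# [0.0, 0.0, 0.5, 0.0]
```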
pipeline.src.flows.control_anteriority.compute_control_ranks_coefficients(control_ranks: numpy.array) → numpy.array

Given a numpy.array of integers representing the rank of vessels' controls over time, returns the coefficients with which they are weighted in the risk factor.

The input array represents the controls of several vessels. For each vessel, controls are sorted from most to least recent and ranked (1, 2, 3…).

The output is an array with the same shape which contains coefficients defined as:

  • 1.0 for controls of rank 1

  • 0.9 for controls of rank 2

  • … decreasing by 0.1 for each additional rank …

  • 0.1 for controls of rank 10

  • 0.0 for controls of rank 11+

Parameters:

control_ranks (np.array) – 1D-array of integers >= 1

Returns:

array with the same shape and with coefficients between 1 (for controls of rank 1) and 0 (for controls of rank >= 11).

Return type:

np.array

Examples

>>> import numpy as np
>>> ranks = np.array([1, 4, 2, 2, 12, 2, 4])
>>> compute_control_ranks_coefficients(ranks)
array([1. , 0.7, 0.9, 0.9, 0. , 0.9, 0.7])
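The coefficients listed above are consistent with max(0, (11 - rank) / 10). A minimal NumPy sketch under that assumption; the function name control_ranks_coefficients_sketch is hypothetical and the module may compute the values differently.

```python
import numpy as np


def control_ranks_coefficients_sketch(control_ranks: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: linear decay of 0.1 per rank, floored at 0."""
    ranks = np.asarray(control_ranks, dtype=float)
    # Rank 1 -> 1.0, rank 2 -> 0.9, ..., rank 10 -> 0.1, rank 11+ -> 0.0
    return np.maximum((11 - ranks) / 10, 0.0)


ranks = np.array([1, 4, 2, 2, 12, 2, 4])
print(control_ranks_coefficients_sketch(ranks).tolist())
# [1.0, 0.7, 0.9, 0.9, 0.0, 0.9, 0.7]
```

Using np.maximum rather than a loop keeps the computation vectorized over the ranks of all vessels at once.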
pipeline.src.flows.control_anteriority.extract_last_years_controls(years: int) → pandas.DataFrame

Extracts controls data of all vessels over a given number of past years.

Parameters:

years (int) – number of past years to consider

Returns:

all vessels’ controls data over that period.

Return type:

pd.DataFrame

pipeline.src.flows.control_anteriority.extract_fishing_infraction_natinfs() → set

Extracts all natinf_code of infractions related to fishing non-compliance (safety non-compliance events are excluded).

Returns:

Set of natinf_code values of infractions related to fishing

Return type:

set

pipeline.src.flows.control_anteriority.extract_vessels_most_recent_control(years: int) → pandas.DataFrame

Extracts data about the most recent control of each vessel.

Returns:

DataFrame containing the most recent control of each vessel within the number of past years given by years

Return type:

pd.DataFrame
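In pandas terms, keeping the most recent control of each vessel amounts to sorting by control date and keeping the first row per vessel. A minimal sketch on an already-loaded DataFrame; the column name control_datetime and the function name most_recent_control_sketch are assumptions, and the module itself may perform this selection in its SQL query instead.

```python
import pandas as pd


def most_recent_control_sketch(controls: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: one row per vessel_id, the latest control."""
    return (
        controls.sort_values("control_datetime", ascending=False)
        # After sorting, the first row of each vessel is its latest control
        .drop_duplicates(subset="vessel_id", keep="first")
        .sort_values("vessel_id")
        .reset_index(drop=True)
    )


controls = pd.DataFrame({
    "vessel_id": [1, 1, 2],
    "control_datetime": pd.to_datetime(["2022-03-01", "2023-06-15", "2021-09-30"]),
})
print(most_recent_control_sketch(controls))
```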

pipeline.src.flows.control_anteriority.transform_vessels_most_recent_control(controls: pandas.DataFrame) → pandas.DataFrame
pipeline.src.flows.control_anteriority.compute_control_rate_risk_factors(controls: pandas.DataFrame) → pandas.DataFrame

Given controls data on 3+ years, computes the control rate risk factor of each vessel.

The idea is that vessels that were controlled less often over the past 3 years, and not for a certain time, should have a higher control priority than vessels that were controlled many times over the past 3 years, including recently.

Parameters:

controls (pd.DataFrame) – pd.DataFrame of controls data on the last 3+ years

Returns:

for each vessel, the component of the risk factor related to its control rate.

Return type:

pd.DataFrame
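The docstring does not give the formula, but the module attributes control_rate_bins and control_rate_bins_risk_factors suggest a binning step. A minimal sketch of one plausible approach, assuming each control already carries a recency weight in a hypothetical coefficient column (for instance from compute_control_dates_coefficients): the weights are summed per vessel and mapped to a risk factor with pandas.cut. The bin edges and risk values below are made-up placeholders, not the module's actual attributes, and the function name is hypothetical.

```python
import pandas as pd

# Hypothetical bin edges and risk values, NOT the module's actual
# control_rate_bins / control_rate_bins_risk_factors attributes.
BINS = [-float("inf"), 0.5, 1.5, float("inf")]
RISK_FACTORS = [4.0, 2.5, 1.0]


def control_rate_risk_sketch(controls: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: fewer and older controls -> higher risk factor.

    Assumes a 'coefficient' column holding a recency weight per control,
    close to 1 for recent controls and 0 for old ones.
    """
    weighted = controls.groupby("vessel_id")["coefficient"].sum()
    # Map each vessel's recency-weighted control count to a risk factor
    risk = pd.cut(weighted, bins=BINS, labels=RISK_FACTORS).astype(float)
    return pd.DataFrame({
        "weighted_controls": weighted,
        "control_rate_risk_factor": risk,
    }).reset_index()


controls = pd.DataFrame({
    "vessel_id": ["A", "A", "B", "B"],
    "coefficient": [0.0, 0.2, 0.9, 0.8],
})
print(control_rate_risk_sketch(controls))
```

Here vessel A (two old controls) lands in the highest-risk bin and vessel B (two recent controls) in the lowest, matching the intent described above.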

pipeline.src.flows.control_anteriority.compute_infraction_rate_risk_factors(controls: pandas.DataFrame, fishing_infraction_natinfs: set) → pandas.DataFrame

Given control results data of vessels, computes the infraction rate risk factor of each vessel.

The idea is that vessels that committed infractions in the past have a higher control priority than vessels that were compliant.

Only infractions related to fishing non-compliance are taken into account; safety non-compliance events are ignored.

If a vessel was controlled more than 10 times, only the 10 most recent control results are taken into account.

Parameters:
  • controls (pd.DataFrame) – control results data

  • fishing_infraction_natinfs (set) – set of natinf codes of infractions related to fishing non-compliance, as returned by extract_fishing_infraction_natinfs.

Returns:

for each vessel, the component of the risk factor related to its infraction rate.

Return type:

pd.DataFrame
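A minimal sketch of a recency-weighted fishing infraction rate, assuming controls are ranked per vessel from most to least recent and weighted like the documented compute_control_ranks_coefficients values (so controls of rank 11+ get weight 0 and only the 10 most recent count). The column names control_datetime and infraction_natinfs, the natinf codes in the example, and the function name are all hypothetical; the module's actual formula and its use of infraction_rate_bins may differ.

```python
import numpy as np
import pandas as pd


def infraction_rate_sketch(controls: pd.DataFrame, fishing_infraction_natinfs: set) -> pd.DataFrame:
    """Hypothetical sketch of a recency-weighted fishing infraction rate."""
    df = controls.sort_values("control_datetime", ascending=False).copy()
    # Rank controls per vessel: 1 = most recent
    df["rank"] = df.groupby("vessel_id").cumcount() + 1
    # Ranks 11+ get weight 0, so only the 10 most recent controls count
    df["weight"] = np.maximum((11 - df["rank"]) / 10, 0.0)
    # A control counts as an infraction if any of its natinfs is fishing-related
    df["fishing_infraction"] = df["infraction_natinfs"].apply(
        lambda natinfs: bool(set(natinfs) & fishing_infraction_natinfs)
    )
    df["weighted_infraction"] = df["weight"] * df["fishing_infraction"]
    sums = df.groupby("vessel_id")[["weighted_infraction", "weight"]].sum()
    rate = sums["weighted_infraction"] / sums["weight"]
    return rate.rename("infraction_rate").reset_index()


controls = pd.DataFrame({
    "vessel_id": ["A", "A", "B"],
    "control_datetime": pd.to_datetime(["2023-05-01", "2022-01-01", "2023-02-01"]),
    # Made-up natinf codes for illustration only
    "infraction_natinfs": [[27689], [], [1234]],
})
print(infraction_rate_sketch(controls, {27689}))
```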

pipeline.src.flows.control_anteriority.compute_control_statistics(controls: pandas.DataFrame) → pandas.DataFrame

Computes control statistics per vessel.

Parameters:

controls (pd.DataFrame) – Controls data, output of extract_last_years_controls

Returns:

control statistics per vessel

Return type:

pd.DataFrame

pipeline.src.flows.control_anteriority.merge(control_rate_risk_factors: pandas.DataFrame, infraction_rate_risk_factor: pandas.DataFrame, last_controls: pandas.DataFrame, control_statistics: pandas.DataFrame) → pandas.DataFrame

Merges the input DataFrames to produce the output of the flow. The join is performed on vessel_id.

Parameters:
  • control_rate_risk_factors (pd.DataFrame) – output of compute_control_rate_risk_factors task

  • infraction_rate_risk_factor (pd.DataFrame) – output of compute_infraction_rate_risk_factors task

  • last_controls (pd.DataFrame) – output of get_last_controls task

  • control_statistics (pd.DataFrame) – output of compute_control_statistics task

Returns:

join of the 4 input DataFrames

Return type:

pd.DataFrame
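A minimal sketch of joining several DataFrames on vessel_id with successive pandas merges. The join type is not documented; how="outer" below is an assumption that keeps vessels present in any input, and the function name and example columns are hypothetical.

```python
from functools import reduce

import pandas as pd


def merge_sketch(*frames: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: successive joins of all inputs on vessel_id."""
    # how="outer" is an assumption: it keeps vessels present in any input
    return reduce(
        lambda left, right: left.merge(right, on="vessel_id", how="outer"),
        frames,
    )


control_rate = pd.DataFrame({"vessel_id": [1, 2], "control_rate_risk_factor": [4.0, 1.0]})
infraction_rate = pd.DataFrame({"vessel_id": [1, 3], "infraction_rate": [0.5, 0.0]})
print(merge_sketch(control_rate, infraction_rate))
```

With an inner join instead, only vessels present in every input would remain, which would drop, for example, vessels without any recent control.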

pipeline.src.flows.control_anteriority.load_control_anteriority(control_anteriority: pandas.DataFrame)

Loads the output of the merge task into the control_anteriority table.

Parameters:

control_anteriority (pd.DataFrame) – output of the merge task.

pipeline.src.flows.control_anteriority.control_anteriority_flow(number_years: int = 5)