pipeline.src.flows.control_anteriority
Module Contents
- pipeline.src.flows.control_anteriority.compute_control_dates_coefficients(control_dates: pandas.Series, from_date: datetime.datetime, to_date: datetime.datetime) → pandas.Series
For each date in control_dates, computes a coefficient determined by its distance from from_date relative to the distance between from_date and to_date.
- Parameters:
control_dates (pd.Series) – Series of datetime.datetime
from_date (datetime) – Start of the time interval considered
to_date (datetime) – End of the time interval considered
- Returns:
Series of coefficients, one per date in control_dates
- Return type:
pd.Series
Examples
>>> import pandas as pd
>>> from datetime import datetime
>>> from_date = datetime(2021, 1, 1)
>>> to_date = datetime(2023, 1, 1)
>>> dates = pd.Series([
...     datetime(2019, 6, 5),
...     datetime(2021, 1, 1),
...     datetime(2022, 1, 1),
...     datetime(2025, 5, 2)
... ])
>>> compute_control_dates_coefficients(
...     dates, from_date=from_date, to_date=to_date
... )
0    0.0
1    0.0
2    0.5
3    0.0
dtype: float64
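The doctest output above is consistent with a linear ramp across the interval that drops to zero outside it. The following is a minimal sketch under that assumption, not the module's actual code: the name `sketch_dates_coefficients` is hypothetical, and the value returned exactly at to_date is a guess, since the docstring does not state it.

```python
from datetime import datetime

import pandas as pd


def sketch_dates_coefficients(
    control_dates: pd.Series, from_date: datetime, to_date: datetime
) -> pd.Series:
    """Hypothetical re-implementation, inferred from the documented example only."""
    # Fraction of the interval elapsed at each control date (may be < 0 or > 1).
    elapsed = (control_dates - from_date) / (to_date - from_date)
    # Assumption: dates outside [from_date, to_date] get a coefficient of 0.
    inside = (control_dates >= from_date) & (control_dates <= to_date)
    return elapsed.where(inside, 0.0)


dates = pd.Series(
    [
        datetime(2019, 6, 5),
        datetime(2021, 1, 1),
        datetime(2022, 1, 1),
        datetime(2025, 5, 2),
    ]
)
coefficients = sketch_dates_coefficients(
    dates, from_date=datetime(2021, 1, 1), to_date=datetime(2023, 1, 1)
)
```

With these inputs the sketch reproduces the four coefficients of the documented example (0.0, 0.0, 0.5, 0.0).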
- pipeline.src.flows.control_anteriority.compute_control_ranks_coefficients(control_ranks: numpy.array) → numpy.array
Given a numpy.array of integers representing the rank of vessels' controls over time, returns the corresponding coefficients with which they must be taken into account in the risk factor.
The input array represents the controls of several vessels. For each vessel, controls are sorted from most to least recent and ranked (1, 2, 3…).
The output is an array with the same shape which contains coefficients defined as:
1.0 for controls of rank 1
0.9 for controls of rank 2
…
0.1 for controls of rank 10
0.0 for controls of rank 11+
- Parameters:
control_ranks (np.array) – 1D-array of integers >= 1
- Returns:
array with the same shape and with coefficients between 1 (for controls of rank 1) and 0 (for controls of rank >= 11).
- Return type:
np.array
Examples
>>> import numpy as np
>>> ranks = np.array([1, 4, 2, 2, 12, 2, 4])
>>> compute_control_ranks_coefficients(ranks)
np.array([1.0, 0.7, 0.9, 0.9, 0.0, 0.9, 0.7])
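The rank-to-coefficient table above (1.0 for rank 1, decreasing by 0.1 per rank, 0.0 from rank 11 onwards) can be expressed as a single clipped linear formula. This is a sketch of one way to obtain those values, with a hypothetical function name. It is not necessarily how the module computes them.

```python
import numpy as np


def sketch_ranks_coefficients(control_ranks: np.ndarray) -> np.ndarray:
    """Hypothetical helper mapping ranks >= 1 to the documented coefficients."""
    # (11 - rank) / 10 yields 1.0 for rank 1 down to 0.1 for rank 10;
    # clipping at 0 sets every rank of 11 or more to 0.0.
    return np.clip((11 - control_ranks) / 10, 0.0, 1.0)


ranks = np.array([1, 4, 2, 2, 12, 2, 4])
coefficients = sketch_ranks_coefficients(ranks)
```

Dividing integers by 10, rather than subtracting multiples of 0.1, avoids accumulating floating-point error in the intermediate values.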
- pipeline.src.flows.control_anteriority.extract_last_years_controls(years: int) → pandas.DataFrame
Extracts controls data of the last 5 years for all vessels.
- Returns:
all vessels’ controls data for the last 5 years.
- Return type:
pd.DataFrame
- pipeline.src.flows.control_anteriority.extract_fishing_infraction_natinfs() → set
Extracts all natinf_code of infractions related to fishing non-compliance (safety non-compliance events are excluded).
- Returns:
Set of infractions natinf_codes related to fishing
- Return type:
set
- pipeline.src.flows.control_anteriority.extract_vessels_most_recent_control(years: int) → pandas.DataFrame
Extracts data about the most recent control of each vessel.
- Returns:
DataFrame containing the most recent control of each vessel within the last 5 years
- Return type:
pd.DataFrame
- pipeline.src.flows.control_anteriority.transform_vessels_most_recent_control(controls: pandas.DataFrame) → pandas.DataFrame
- pipeline.src.flows.control_anteriority.compute_control_rate_risk_factors(controls: pandas.DataFrame) → pandas.DataFrame
Given controls data on 3+ years, computes the control rate risk factor of each vessel.
The idea is that vessels that have been controlled less over the past 3 years and that have not been controlled for a certain time have a higher priority of control than vessels that were controlled many times over the past 3 years and that were controlled recently.
- Parameters:
controls (pd.DataFrame) – pd.DataFrame of controls data on the last 3+ years
- Returns:
for each vessel, the component of the risk factor related to its control rate.
- Return type:
pd.DataFrame
- pipeline.src.flows.control_anteriority.compute_infraction_rate_risk_factors(controls: pandas.DataFrame, fishing_infraction_natinfs: set) → pandas.DataFrame
Given control results data of vessels, computes the infraction rate risk factor of each vessel.
The idea is that vessels which committed infractions in the past have a higher priority of control than vessels that were in order.
Only violations related to fishing non-compliance are taken into account. Safety non-compliance events are not taken into account.
If a vessel was controlled more than 10 times, only the 10 most recent control results are taken into account.
- Parameters:
controls (pd.DataFrame) – control results data
fishing_infraction_natinfs (set) – set of infractions natinfs related to fishing non-compliance.
- Returns:
for each vessel, the component of the risk factor related to its infraction rate.
- Return type:
pd.DataFrame
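The two filtering rules described above (keep only the 10 most recent controls per vessel, count only fishing-related natinfs) can be illustrated as follows. This sketch does not compute the risk factor itself, whose formula is not given here. The function name and the columns `control_datetime_utc` and `natinf_codes` are hypothetical; only `vessel_id` appears elsewhere in this module's documentation.

```python
import pandas as pd


def sketch_recent_fishing_infractions(
    controls: pd.DataFrame, fishing_infraction_natinfs: set, max_controls: int = 10
) -> pd.Series:
    """Hypothetical helper: fishing infractions in each vessel's latest controls."""
    # Rank each vessel's controls from most recent (1) to oldest.
    ranked = controls.sort_values("control_datetime_utc", ascending=False).copy()
    ranked["rank"] = ranked.groupby("vessel_id").cumcount() + 1
    # Keep only the most recent controls of each vessel.
    recent = ranked[ranked["rank"] <= max_controls].copy()
    # Count only the natinf codes that belong to the fishing-related set.
    recent["n_fishing_infractions"] = recent["natinf_codes"].apply(
        lambda codes: len(set(codes) & fishing_infraction_natinfs)
    )
    return recent.groupby("vessel_id")["n_fishing_infractions"].sum()


controls = pd.DataFrame(
    {
        "vessel_id": [1, 1, 1, 2],
        "control_datetime_utc": pd.to_datetime(
            ["2021-01-01", "2022-01-01", "2023-01-01", "2022-06-01"]
        ),
        "natinf_codes": [[100], [100, 200], [300], []],
    }
)
# max_controls=2 keeps the example small; the documented limit is 10.
counts = sketch_recent_fishing_infractions(controls, {100}, max_controls=2)
```

In this toy data, the oldest control of vessel 1 is ignored because it falls outside the two most recent controls, and code 200 is ignored because it is not in the fishing-related set.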
- pipeline.src.flows.control_anteriority.compute_control_statistics(controls: pandas.DataFrame) → pandas.DataFrame
Computes control statistics per vessel.
- Parameters:
controls (pd.DataFrame) – Controls data, output of extract_last_years_controls
- Returns:
control statistics per vessel
- Return type:
pd.DataFrame
- pipeline.src.flows.control_anteriority.merge(control_rate_risk_factors: pandas.DataFrame, infraction_rate_risk_factor: pandas.DataFrame, last_controls: pandas.DataFrame, control_statistics: pandas.DataFrame) → pandas.DataFrame
Merge of pd.DataFrame objects to produce the output of the flow. The join is performed on vessel_id.
- Parameters:
control_rate_risk_factors (pd.DataFrame) – output of the control rate risk factors task
infraction_rate_risk_factor (pd.DataFrame) – output of the compute_infraction_rate_risk_factors task
last_controls (pd.DataFrame) – output of the get_last_controls task
- Returns:
join of the input pd.DataFrame objects
- Return type:
pd.DataFrame
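A join of several per-vessel tables on vessel_id can be sketched with pandas as shown below. The join type is an assumption (an outer join is used here so that a vessel missing from one input is still kept), and the function name and all column names other than `vessel_id` are hypothetical.

```python
from functools import reduce

import pandas as pd


def sketch_merge(*frames: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper joining any number of per-vessel tables on vessel_id."""
    # Assumption: outer join, so vessels absent from one input get NaN there.
    return reduce(
        lambda left, right: pd.merge(left, right, on="vessel_id", how="outer"),
        frames,
    )


control_rate = pd.DataFrame(
    {"vessel_id": [1, 2], "control_rate_risk_factor": [1.0, 4.0]}
)
infraction_rate = pd.DataFrame(
    {"vessel_id": [1], "infraction_rate_risk_factor": [2.0]}
)
last_controls = pd.DataFrame(
    {"vessel_id": [2], "last_control_datetime_utc": [pd.Timestamp("2023-01-01")]}
)
merged = sketch_merge(control_rate, infraction_rate, last_controls)
```

The result has one row per vessel and one column per input attribute, with missing values where a vessel does not appear in a given input.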