pipeline.src.flows.control_anteriority

Attributes

`control_rate_bins`
`control_rate_bins_risk_factors`
`infraction_rate_bins`

Functions

`compute_control_dates_coefficients`(→ pandas.Series)	For each date in `control_dates`, computes a coefficient determined by its
`compute_control_ranks_coefficients`(→ numpy.array)	Given a `numpy.array` of integers representing the rank of vessels controls
`extract_last_years_controls`(→ pandas.DataFrame)	Extracts controls data of the last 5 years for all vessels.
`extract_fishing_infraction_natinfs`(→ set)	Extracts all `natinf_code` of `infractions` related to fishing non-compliance
`extract_vessels_most_recent_control`(→ pandas.DataFrame)	Extracts data about the most recent control of each vessel.
`transform_vessels_most_recent_control`(→ pandas.DataFrame)
`compute_control_rate_risk_factors`(→ pandas.DataFrame)	Given controls data on 3+ years, computes the control rate risk factor of each
`compute_infraction_rate_risk_factors`(→ pandas.DataFrame)	Given control results data of vessels, computes the
`compute_control_statistics`(→ pandas.DataFrame)	Computes control statistics per vessel.
`merge`(→ pandas.DataFrame)	Merge of `pd.DataFrame` to produce output of the flow. The join is performed
`load_control_anteriority`(control_anteriority)	Load the output of `merge` task into `control_anteriority` table.
`control_anteriority_flow`([number_years])

Module Contents

pipeline.src.flows.control_anteriority.control_rate_bins[source]

pipeline.src.flows.control_anteriority.control_rate_bins_risk_factors[source]

pipeline.src.flows.control_anteriority.infraction_rate_bins[source]

pipeline.src.flows.control_anteriority.compute_control_dates_coefficients(control_dates: pandas.Series, from_date: datetime.datetime, to_date: datetime.datetime) → pandas.Series[source]

For each date in control_dates, computes a coefficient determined by its distance from from_date relative to the distance between from_date and to_date.

Parameters:

control_dates (pd.Series) – Series of datetime.datetime
from_date (datetime) – Start of time interval considered
to_date (datetime) – End of time interval considered

Returns:

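Series of coefficients, one per input date. Judging from the example below, a date inside the interval gets its distance from from_date divided by the length of the interval, and a date outside the interval gets 0.0.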

Return type:

pd.Series

Examples

>>> import pandas as pd
>>> from datetime import datetime
>>> from_date = datetime(2021, 1, 1)
>>> to_date = datetime(2023, 1, 1)
>>> dates = pd.Series([
...     datetime(2019, 6, 5),
...     datetime(2021, 1, 1),
...     datetime(2022, 1, 1),
...     datetime(2025, 5, 2),
... ])
>>> compute_control_dates_coefficients(
...     dates,
...     from_date=from_date,
...     to_date=to_date,
... )
0    0.0
1    0.0
2    0.5
3    0.0
dtype: float64
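
The behaviour shown in the example above can be reproduced with a few pandas operations. The following is only a sketch, not the module's implementation: the helper name `sketch_dates_coefficients` is hypothetical, and treating both interval bounds as inclusive is an assumption (the example does not show what happens exactly at to_date).

```python
from datetime import datetime

import pandas as pd


def sketch_dates_coefficients(
    control_dates: pd.Series, from_date: datetime, to_date: datetime
) -> pd.Series:
    """Hypothetical re-implementation, for illustration only."""
    # Relative position of each date between from_date (0.0) and to_date (1.0).
    coefficients = (control_dates - from_date) / (to_date - from_date)
    # Assumption: bounds are inclusive; dates outside the interval get 0.0.
    in_interval = control_dates.between(from_date, to_date)
    return coefficients.where(in_interval, 0.0)


from_date = datetime(2021, 1, 1)
to_date = datetime(2023, 1, 1)
dates = pd.Series([
    datetime(2019, 6, 5),
    datetime(2021, 1, 1),
    datetime(2022, 1, 1),
    datetime(2025, 5, 2),
])
print(sketch_dates_coefficients(dates, from_date, to_date).tolist())
# [0.0, 0.0, 0.5, 0.0]
```

Under this reading, more recent controls weigh more: a control dated exactly halfway through the interval gets 0.5.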

pipeline.src.flows.control_anteriority.compute_control_ranks_coefficients(control_ranks: numpy.array) → numpy.array[source]

Given a numpy.array of integers representing the rank of vessels' controls over time, returns the coefficients with which these controls must be weighted in the risk factor.

The input array represents the controls of several vessels. For each vessel, controls are sorted from most to least recent and ranked (1, 2, 3…).

The output is an array with the same shape which contains coefficients defined as:

1.0 for controls of rank 1

0.9 for controls of rank 2

…

0.1 for controls of rank 10

0.0 for controls of rank 11+

Parameters:: control_ranks (np.array) – 1D-array of integers >= 1
Returns:: array with the same shape and with coefficients between 1 (for controls of rank 1) and 0 (for controls of rank >= 11).
Return type:: np.array

Examples

>>> import numpy as np
>>> ranks = np.array([1, 4, 2, 2, 12, 2, 4])
>>> compute_control_ranks_coefficients(ranks)
np.array([1.0, 0.7, 0.9, 0.9, 0.0, 0.9, 0.7])
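
The rank-to-coefficient mapping described above is linear with a floor at zero, so it can be written as a single vectorized expression. This is a sketch under that assumption, not the module's code; the function name `sketch_ranks_coefficients` is hypothetical.

```python
import numpy as np


def sketch_ranks_coefficients(control_ranks: np.ndarray) -> np.ndarray:
    """Hypothetical re-implementation, for illustration only."""
    # Rank 1 -> 1.0, rank 2 -> 0.9, ..., rank 10 -> 0.1, rank 11+ -> 0.0.
    return np.clip((11 - control_ranks) / 10, 0.0, 1.0)


ranks = np.array([1, 4, 2, 2, 12, 2, 4])
print(sketch_ranks_coefficients(ranks).tolist())
# [1.0, 0.7, 0.9, 0.9, 0.0, 0.9, 0.7]
```

Dividing the integer `11 - rank` by 10 avoids the rounding noise that repeated subtraction of 0.1 would introduce.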

pipeline.src.flows.control_anteriority.extract_last_years_controls(years: int) → pandas.DataFrame[source]

Extracts controls data of the last 5 years for all vessels.

Returns:: all vessels’ controls data for the last 5 years.
Return type:: pd.DataFrame

pipeline.src.flows.control_anteriority.extract_fishing_infraction_natinfs() → set[source]

Extracts all natinf_code of infractions related to fishing non-compliance (safety non-compliance events are excluded).

Returns:: Set of natinf_code values of infractions related to fishing non-compliance
Return type:: set

pipeline.src.flows.control_anteriority.extract_vessels_most_recent_control(years: int) → pandas.DataFrame[source]

Extracts data about the most recent control of each vessel.

Returns:: DataFrame containing the most recent control of each vessel within the last 5 years
Return type:: pd.DataFrame

pipeline.src.flows.control_anteriority.transform_vessels_most_recent_control(controls: pandas.DataFrame) → pandas.DataFrame[source]

pipeline.src.flows.control_anteriority.compute_control_rate_risk_factors(controls: pandas.DataFrame) → pandas.DataFrame[source]

Given controls data on 3+ years, computes the control rate risk factor of each vessel.

The idea is that vessels that have been controlled less over the past 3 years and that have not been controlled for a certain time have a higher priority of control than vessels that were controlled many times over the past 3 years and that were controlled recently.

Parameters:: controls (pd.DataFrame) – pd.DataFrame of controls data on the last 3+ years
Returns:: for each vessel, the component of the risk factor related to its control rate.
Return type:: pd.DataFrame

pipeline.src.flows.control_anteriority.compute_infraction_rate_risk_factors(controls: pandas.DataFrame, fishing_infraction_natinfs: set) → pandas.DataFrame[source]

Given control results data of vessels, computes the infraction rate risk factor of each vessel.

The idea is that vessels which committed infractions in the past have a higher priority of control than vessels that were in order.

Only violations related to fishing non-compliance are taken into account. Safety non-compliance events are not taken into account.

If a vessel was controlled more than 10 times, only the 10 most recent control results are taken into account.

Parameters:

controls (pd.DataFrame) – control results data
fishing_infraction_natinfs (set) – set of natinf_code values of infractions related to fishing non-compliance.

Returns:

for each vessel, the component of the risk factor related to its infraction rate.

Return type:

pd.DataFrame
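
The "10 most recent controls" rule stated above amounts to ranking each vessel's controls from most to least recent and dropping ranks above 10. A minimal sketch of that ranking step follows. The column names `vessel_id` and `control_datetime_utc` and the sample rows are assumptions made for illustration; the actual schema of `controls` is not documented here.

```python
import pandas as pd

# Hypothetical sample data; real column names may differ.
controls = pd.DataFrame({
    "vessel_id": [1, 1, 1, 2],
    "control_datetime_utc": pd.to_datetime(
        ["2021-03-01", "2022-06-15", "2020-01-10", "2022-02-02"]
    ),
})

# Rank controls per vessel: 1 = most recent, 2 = second most recent, ...
controls["control_rank"] = (
    controls.sort_values("control_datetime_utc", ascending=False)
    .groupby("vessel_id")
    .cumcount()
    + 1
)

# Keep only the 10 most recent controls of each vessel.
recent_controls = controls[controls["control_rank"] <= 10]
print(controls["control_rank"].tolist())
# [2, 1, 3, 1]
```

Ranks computed this way have the shape expected by compute_control_ranks_coefficients (1D integers >= 1), although whether the module chains the two steps like this is not stated.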

pipeline.src.flows.control_anteriority.compute_control_statistics(controls: pandas.DataFrame) → pandas.DataFrame[source]

Computes control statistics per vessel.

Parameters:: controls (pd.DataFrame) – Controls data, output of extract_last_years_controls

Returns:: control statistics per vessel
Return type:: pd.DataFrame

pipeline.src.flows.control_anteriority.merge(control_rate_risk_factors: pandas.DataFrame, infraction_rate_risk_factor: pandas.DataFrame, last_controls: pandas.DataFrame, control_statistics: pandas.DataFrame) → pandas.DataFrame[source]

Merges the input pd.DataFrame objects to produce the output of the flow. The join is performed on vessel_id.

Parameters:

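control_rate_risk_factors (pd.DataFrame) – output of compute_control_rate_risk_factors task
infraction_rate_risk_factor (pd.DataFrame) – output of compute_infraction_rate_risk_factors task
last_controls (pd.DataFrame) – output of get_last_controls task
control_statistics (pd.DataFrame) – output of compute_control_statistics task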

Returns:

join of the 4 input pd.DataFrame

Return type:

pd.DataFrame
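
A join of several DataFrames on vessel_id can be sketched by folding `pd.merge` over the inputs. This is an illustration only: the outer join type, the non-key column names, and the sample values are all assumptions, since the documentation states only that the join key is vessel_id.

```python
from functools import reduce

import pandas as pd

# Hypothetical inputs; only the vessel_id key comes from the documentation.
control_rate_risk_factors = pd.DataFrame(
    {"vessel_id": [1, 2], "control_rate_risk_factor": [1.0, 2.5]}
)
infraction_rate_risk_factor = pd.DataFrame(
    {"vessel_id": [1, 3], "infraction_rate_risk_factor": [1.0, 4.0]}
)
last_controls = pd.DataFrame(
    {
        "vessel_id": [2],
        "last_control_datetime_utc": pd.to_datetime(["2022-06-15"]),
    }
)
control_statistics = pd.DataFrame(
    {"vessel_id": [1, 2, 3], "number_controls": [4, 1, 2]}
)

# Assumption: an outer join, so a vessel missing from one input is kept.
merged = reduce(
    lambda left, right: pd.merge(left, right, on="vessel_id", how="outer"),
    [
        control_rate_risk_factors,
        infraction_rate_risk_factor,
        last_controls,
        control_statistics,
    ],
)
print(merged.shape)
# (3, 5)
```

With an outer join, vessels absent from an input get missing values in that input's columns; an inner join would instead drop them.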

pipeline.src.flows.control_anteriority.load_control_anteriority(control_anteriority: pandas.DataFrame)[source]

Loads the output of the merge task into the control_anteriority table.

Parameters:: control_anteriority (pd.DataFrame) – output of merge task.

pipeline.src.flows.control_anteriority.control_anteriority_flow(number_years: int = 5)[source]