pipeline.src.shared_tasks.vessels

Functions

`add_vessel_id`(→ pandas.DataFrame)	Adds a vessel_id column to the input DataFrame by matching it against the vessels table.
`add_vessels_columns`(→ pandas.DataFrame)	Adds the indicated columns to the input vessels DataFrame.

Module Contents

pipeline.src.shared_tasks.vessels.add_vessel_id(vessels: pandas.DataFrame, vessels_table: sqlalchemy.Table) → pandas.DataFrame[source]

Adds a vessel_id column to the input DataFrame by:

querying all vessels in the vessels table that have a matching cfr, ircs or external_immatriculation

matching the found vessels to the input vessels using the merge_vessel_id helper.

Parameters:

vessels (pd.DataFrame) – DataFrame of vessels. Must have columns cfr, ircs and external_immatriculation.
vessels_table (Table) – SQLAlchemy Table of vessels.

Returns:

Same as input with an added vessel_id column.

Return type:

pd.DataFrame
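
The two steps described above (select candidate vessels sharing an identifier, then match them to the input rows) can be illustrated with a minimal pandas-only sketch. This is not the module's implementation: the in-memory `vessels_reference` DataFrame stands in for the vessels table, the `id` column name is an assumption, and the rule "first matching identifier wins, in the order cfr, ircs, external_immatriculation" is a hypothetical substitute for whatever the merge_vessel_id helper actually does.

```python
import pandas as pd

IDENTIFIERS = ["cfr", "ircs", "external_immatriculation"]

# Hypothetical stand-in for the rows of the vessels table (column "id" is assumed).
vessels_reference = pd.DataFrame({
    "id": [1, 2, 3],
    "cfr": ["FRA000000001", "FRA000000002", None],
    "ircs": ["FABC", None, "FXYZ"],
    "external_immatriculation": ["AB123", "CD456", "EF789"],
})

# Input vessels: must carry the three identifier columns.
vessels = pd.DataFrame({
    "cfr": ["FRA000000001", None, "FRA999999999"],
    "ircs": [None, "FXYZ", None],
    "external_immatriculation": [None, None, None],
})


def sketch_add_vessel_id(vessels, vessels_reference):
    # Step 1: keep reference vessels that share at least one identifier
    # with the input (the real function does this with a database query).
    mask = pd.Series(False, index=vessels_reference.index)
    for col in IDENTIFIERS:
        mask |= vessels_reference[col].isin(vessels[col].dropna())
    candidates = vessels_reference[mask]

    # Step 2 (assumed rule): the first identifier that matches wins.
    vessel_id = pd.Series(float("nan"), index=vessels.index)
    for col in IDENTIFIERS:
        lookup = (
            candidates.dropna(subset=[col])
            .drop_duplicates(col)
            .set_index(col)["id"]
        )
        vessel_id = vessel_id.fillna(vessels[col].map(lookup))

    result = vessels.copy()
    result["vessel_id"] = vessel_id
    return result


result = sketch_add_vessel_id(vessels, vessels_reference)
print(result)
```

In this sketch, rows with no match keep a missing vessel_id, which is why the column ends up as floating point. How the real function represents unmatched vessels is not stated in this page.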

pipeline.src.shared_tasks.vessels.add_vessels_columns(vessels: pandas.DataFrame, vessels_table: sqlalchemy.Table, vessels_columns_to_add: list = None, districts_table: sqlalchemy.Table = None, districts_columns_to_add: list = None) → pandas.DataFrame[source]

Adds the indicated columns to the input vessels DataFrame.

Parameters:

vessels (pd.DataFrame) – DataFrame of vessels. Must have a vessel_id column.
vessels_table (Table) – vessels table.
vessels_columns_to_add (list, optional) – List of columns from the vessels table to add. Defaults to None.
districts_table (Table, optional) – districts table. Must be supplied if districts_columns_to_add is given. Defaults to None.
districts_columns_to_add (list, optional) – List of columns from the districts table to add. Defaults to None.

Returns:

Input DataFrame with added columns.

Return type:

pd.DataFrame
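
As a rough illustration of the contract above, the following pandas-only sketch adds requested columns with left joins on vessel_id. It is an assumption-laden stand-in, not the module's code: the reference DataFrames replace the SQLAlchemy tables, and the `id` and `district_code` join keys, the sample column names, and the `ValueError` raised when the districts table is missing are all hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the vessels and districts tables.
vessels_reference = pd.DataFrame({
    "id": [1, 2, 3],
    "vessel_name": ["ALPHA", "BRAVO", "CHARLIE"],
    "district_code": ["XX", "YY", "XX"],
})
districts_reference = pd.DataFrame({
    "district_code": ["XX", "YY"],
    "district": ["District X", "District Y"],
})

# Input vessels: must carry a vessel_id column.
vessels = pd.DataFrame({"vessel_id": [1, 2, 4], "trip": ["A", "B", "C"]})


def sketch_add_vessels_columns(
    vessels,
    vessels_reference,
    vessels_columns_to_add=None,
    districts_reference=None,
    districts_columns_to_add=None,
):
    vessels_columns_to_add = list(vessels_columns_to_add or [])
    districts_columns_to_add = list(districts_columns_to_add or [])
    if districts_columns_to_add and districts_reference is None:
        # Mirrors the documented requirement; the error type is an assumption.
        raise ValueError("districts table is required to add district columns")

    # Assumed join key: the vessels table "id" corresponds to vessel_id.
    extra = vessels_reference.rename(columns={"id": "vessel_id"})
    keep = ["vessel_id"] + vessels_columns_to_add
    if districts_columns_to_add:
        # Assumed join key between vessels and districts: district_code.
        extra = extra.merge(
            districts_reference[["district_code"] + districts_columns_to_add],
            on="district_code",
            how="left",
        )
        keep += districts_columns_to_add

    # A left join keeps every input row, matched or not.
    return vessels.merge(extra[keep], on="vessel_id", how="left")


result = sketch_add_vessels_columns(
    vessels,
    vessels_reference,
    vessels_columns_to_add=["vessel_name"],
    districts_reference=districts_reference,
    districts_columns_to_add=["district"],
)
print(result)
```

The left join is chosen here so that input rows whose vessel_id has no counterpart are preserved with missing values in the added columns. Whether the real function behaves the same way for unmatched rows is not documented on this page.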