pipeline.src.shared_tasks.vessels

Functions

add_vessel_id(→ pandas.DataFrame)

Adds a vessel_id column to the input DataFrame by matching its rows against the vessels table.

add_vessels_columns(→ pandas.DataFrame)

Adds the indicated columns to the input vessels DataFrame.

Module Contents

pipeline.src.shared_tasks.vessels.add_vessel_id(vessels: pandas.DataFrame, vessels_table: sqlalchemy.Table) → pandas.DataFrame

Adds a vessel_id column to the input DataFrame by:

  • querying all vessels in the vessels table that have a matching cfr, ircs or external_immatriculation

  • matching the found vessels to the input vessels using the merge_vessel_id helper.

Parameters:
  • vessels (pd.DataFrame) – DataFrame of vessels. Must have columns cfr, ircs and external_immatriculation.

  • vessels_table (Table) – sqlalchemy Table of vessels.

Returns:

Same as input with an added vessel_id column.

Return type:

pd.DataFrame
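
The matching step described above can be illustrated with a pandas-only sketch. It is not the real merge_vessel_id helper, whose behaviour is not documented here. The function name match_vessel_id_sketch, the id column of the reference DataFrame, and the priority order cfr, then ircs, then external_immatriculation are all assumptions made for illustration.

```python
import pandas as pd

# Identifier columns, tried in this (assumed) order of priority.
ID_COLUMNS = ["cfr", "ircs", "external_immatriculation"]


def match_vessel_id_sketch(vessels: pd.DataFrame, found: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the matching step, not the pipeline's helper.

    `found` plays the role of the rows returned by the vessels table query
    and is assumed to carry an `id` column plus the three identifiers.
    """
    vessel_id = pd.Series(float("nan"), index=vessels.index)
    for column in ID_COLUMNS:
        # Lookup identifier -> id, ignoring missing and duplicated identifiers.
        lookup = (
            found.dropna(subset=[column])
            .drop_duplicates(subset=[column])
            .set_index(column)["id"]
        )
        # Keep ids already matched on an earlier column, fill the remaining gaps.
        vessel_id = vessel_id.fillna(vessels[column].map(lookup))
    result = vessels.copy()
    result["vessel_id"] = vessel_id.astype("Int64")  # nullable integer column
    return result


found = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "cfr": ["FRA001", "FRA002", None],
        "ircs": ["AAA", None, "CCC"],
        "external_immatriculation": ["X1", "X2", "X3"],
    }
)
vessels = pd.DataFrame(
    {
        "cfr": ["FRA001", None, None, "ZZZ"],
        "ircs": [None, "CCC", None, None],
        "external_immatriculation": [None, None, "X2", None],
    }
)
matched = match_vessel_id_sketch(vessels, found)
```

In this sketch a vessel that matches on none of the three identifiers keeps a missing vessel_id rather than being dropped, so the output has the same rows as the input.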

pipeline.src.shared_tasks.vessels.add_vessels_columns(vessels: pandas.DataFrame, vessels_table: sqlalchemy.Table, vessels_columns_to_add: list = None, districts_table: sqlalchemy.Table = None, districts_columns_to_add: list = None) → pandas.DataFrame

Adds the indicated columns to the input vessels DataFrame.

Parameters:
  • vessels (pd.DataFrame) – DataFrame of vessels. Must have vessel_id column.

  • vessels_table (Table) – vessels table.

  • vessels_columns_to_add (list, optional) – List of columns from the vessels table to add. Defaults to None.

  • districts_table (Table, optional) – districts table. Must be supplied if districts_columns_to_add is given. Defaults to None.

  • districts_columns_to_add (list, optional) – List of columns from the districts table to add. Defaults to None.

Returns:

Input DataFrame with added columns.

Return type:

pd.DataFrame
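
The parameter contract above can be sketched with in-memory DataFrames in place of the sqlalchemy tables. This is an illustration under stated assumptions, not the pipeline's implementation: the function name add_columns_sketch, the id column of the vessels reference data, and the district_code join key between vessels and districts are all hypothetical.

```python
import pandas as pd


def add_columns_sketch(
    vessels: pd.DataFrame,
    vessels_ref: pd.DataFrame,
    vessels_columns_to_add: list = None,
    districts_ref: pd.DataFrame = None,
    districts_columns_to_add: list = None,
) -> pd.DataFrame:
    """Hypothetical pandas analogue of add_vessels_columns."""
    # Mirror the documented rule: district columns require a districts table.
    if districts_columns_to_add and districts_ref is None:
        raise ValueError(
            "districts_ref must be supplied when districts_columns_to_add is given"
        )
    vessels_columns_to_add = vessels_columns_to_add or []
    districts_columns_to_add = districts_columns_to_add or []

    # Assumption: the reference data identifies vessels by an `id` column.
    extra = vessels_ref.rename(columns={"id": "vessel_id"})
    if districts_columns_to_add:
        # Assumption: vessels and districts share a `district_code` column.
        extra = extra.merge(districts_ref, on="district_code", how="left")

    wanted = ["vessel_id", *vessels_columns_to_add, *districts_columns_to_add]
    # Left join so that every input row is kept, matched or not.
    return vessels.merge(extra[wanted], on="vessel_id", how="left")


vessels = pd.DataFrame({"vessel_id": [1, 2, 99]})
vessels_ref = pd.DataFrame(
    {"id": [1, 2], "vessel_name": ["A", "B"], "district_code": ["BR", "LO"]}
)
districts_ref = pd.DataFrame(
    {"district_code": ["BR", "LO"], "district": ["Brest", "Lorient"]}
)
enriched = add_columns_sketch(
    vessels, vessels_ref, ["vessel_name"], districts_ref, ["district"]
)
```

The left join keeps input rows whose vessel_id has no counterpart in the reference data and leaves their added columns empty.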