pipeline.src.flows.anchorages

Classes

PortLocation

PortsVPTree

Vantage Point Tree to efficiently find the nearest port from a given Position(lat, lon).

Functions

extract_ports(→ pandas.DataFrame)

Extracts ports locode, name, latitude and longitude from processed.ports. This table therefore needs to be filled before using this function.

extract_control_ports_locodes()

Returns the set of distinct port locodes where at least one control was done.

extract_ers_ports_locodes(→ Set[str])

Returns the set of distinct port locodes used at least once in an ERS DEP, PNO or LAN message.

extract_ais_anchorage_coordinates(→ pandas.DataFrame)

Returns a DataFrame with latitude, longitude columns corresponding to S2 cells identified as docks in AIS global positions.

extract_vms_static_positions(→ pandas.DataFrame)

Reads a local file of VMS positions that have zero speed.

extract_manual_anchorages_coordinates(→ pandas.DataFrame)

get_anchorage_h3_cells(→ pandas.DataFrame)

Bins input positions into h3 cells of the given resolution and filters said h3 cells to keep only the ones that appear at least number_signals_threshold times in the dataset.

get_anchorage_h3_cells_rings(→ pandas.DataFrame)

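Unites the sets of h3 cells corresponding to anchorage locations of vessels in AIS and VMS data, together with the manually added cells, then adds two “rings” of cells around them.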

get_ports_locations(→ List[PortLocation])

Transforms a DataFrame into a list of PortLocation objects.

get_anchorages_closest_port(→ pandas.DataFrame)

unite_ports_locodes(→ Set[str])

Unites sets of port locodes.

get_active_ports(→ pandas.DataFrame)

merge_closest_port_closest_active_port(→ pandas.DataFrame)

Merges anchorages closest port and closest active port.

load_processed_anchorages(anchorages)

Load anchorages to processed.anchorages

anchorages_compute_flow([h3_resolution, ...])

Flow to compute anchorages and attribute cells to ports

extract_datagouv_anchorages(→ pandas.DataFrame)

Downloads anchorages csv file, returns the result as a pandas DataFrame.

load_anchorages_to_monitorfish(anchorages)

Loads anchorages data to monitorfish database.

anchorages_flow()

Main anchorages flow - extract from data.gouv.fr and load to database

Module Contents

class pipeline.src.flows.anchorages.PortLocation
locode: str
port_name: str
latitude: float
longitude: float

class pipeline.src.flows.anchorages.PortsVPTree(ports_locations: List[PortLocation])

Bases: vptree.VPTree

Vantage Point Tree to efficiently find the nearest port from a given Position(lat, lon).

If there are p ports in the tree, searching for the port that is closest to a given Position has an average complexity of O(log p).

get_nearest_port(pos: src.helpers.spatial.Position) → dict

Returns the distance (in meters) and locode of the PortLocation that is closest to the input Position.

Parameters:

pos (Position) – Position instance

Returns:

dict with nearest_port_distance and nearest_port_locode keys.

Return type:

dict
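To make the lookup concrete, here is a minimal pure-Python sketch of a vantage-point tree over port locations. It does not use the vptree package that PortsVPTree subclasses: SimpleVPTree, haversine_m, the choice of the haversine distance and the sample ports are all assumptions made for illustration. Only the two keys of the returned dict follow the description above.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, in meters


@dataclass(frozen=True)
class PortLocation:
    locode: str
    port_name: str
    latitude: float
    longitude: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


Best = Tuple[float, Optional[PortLocation]]


class SimpleVPTree:
    """Hypothetical stand-in for PortsVPTree (requires a non-empty list of ports)."""

    def __init__(self, ports: List[PortLocation]):
        ports = list(ports)
        self.vantage = ports.pop()  # any point can serve as the vantage point
        self.radius = 0.0
        self.inside: Optional["SimpleVPTree"] = None
        self.outside: Optional["SimpleVPTree"] = None
        if not ports:
            return
        v = self.vantage
        dists = [haversine_m(v.latitude, v.longitude, p.latitude, p.longitude) for p in ports]
        # Split the remaining ports around the median distance to the vantage point.
        self.radius = sorted(dists)[len(dists) // 2]
        inner = [p for p, d in zip(ports, dists) if d < self.radius]
        outer = [p for p, d in zip(ports, dists) if d >= self.radius]
        if inner:
            self.inside = SimpleVPTree(inner)
        if outer:
            self.outside = SimpleVPTree(outer)

    def _nearest(self, lat: float, lon: float, best: Best) -> Best:
        d = haversine_m(lat, lon, self.vantage.latitude, self.vantage.longitude)
        if d < best[0]:
            best = (d, self.vantage)
        # Visit the side that contains the query first.
        if d < self.radius:
            near, far = self.inside, self.outside
        else:
            near, far = self.outside, self.inside
        if near is not None:
            best = near._nearest(lat, lon, best)
        # By the triangle inequality, the other side can only hold a closer
        # port if the query lies within best[0] of the splitting radius.
        if far is not None and abs(d - self.radius) <= best[0]:
            best = far._nearest(lat, lon, best)
        return best

    def get_nearest_port(self, lat: float, lon: float) -> dict:
        distance, port = self._nearest(lat, lon, (math.inf, None))
        return {"nearest_port_distance": distance, "nearest_port_locode": port.locode}


# Illustrative data only: approximate coordinates of four French ports.
ports = [
    PortLocation("FRBES", "Brest", 48.39, -4.49),
    PortLocation("FRLRT", "Lorient", 47.75, -3.36),
    PortLocation("FRMRS", "Marseille", 43.30, 5.37),
    PortLocation("FRLEH", "Le Havre", 49.49, 0.11),
]
tree = SimpleVPTree(ports)
result = tree.get_nearest_port(47.70, -3.40)  # a position close to Lorient
print(result["nearest_port_locode"])
```

The pruning test in _nearest is what lets a balanced tree skip most ports on a typical query, instead of computing the distance to every port as a linear scan would.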

pipeline.src.flows.anchorages.extract_ports() → pandas.DataFrame

Extracts ports locode, name, latitude and longitude from processed.ports. This table therefore needs to be filled before using this function.

Returns:

DataFrame of ports with locode, port_name, longitude and latitude columns.

Return type:

pd.DataFrame

pipeline.src.flows.anchorages.extract_control_ports_locodes()

Returns the set of distinct port locodes where at least one control was done.

Returns:

set of port locodes

Return type:

Set[str]

pipeline.src.flows.anchorages.extract_ers_ports_locodes() → Set[str]

Returns the set of distinct port locodes used at least once in an ERS DEP, PNO or LAN message.

Returns:

set of port locodes

Return type:

Set[str]

pipeline.src.flows.anchorages.extract_ais_anchorage_coordinates() → pandas.DataFrame

Returns a DataFrame with latitude, longitude columns corresponding to S2 cells identified as docks in AIS global positions.

pipeline.src.flows.anchorages.extract_vms_static_positions(parquet_file_relative_path) → pandas.DataFrame

Reads a local file of VMS positions that have zero speed.

Returns:

DataFrame with latitude and longitude columns.

Return type:

pd.DataFrame

pipeline.src.flows.anchorages.extract_manual_anchorages_coordinates() → pandas.DataFrame

pipeline.src.flows.anchorages.get_anchorage_h3_cells(static_positions: pandas.DataFrame, h3_resolution: int = 9, number_signals_threshold: int = 100) → pandas.DataFrame

Bins input positions into h3 cells of the given resolution and filters said h3 cells to keep only the ones that appear at least number_signals_threshold times in the dataset.

Parameters:
  • static_positions (pd.DataFrame) – DataFrame with latitude and longitude columns

  • h3_resolution (int) – h3 resolution to use

  • number_signals_threshold (int) – number of occurrences below which h3 cells are filtered out
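The bin-then-threshold logic described above can be sketched without the h3 library. In this sketch a square latitude/longitude grid stands in for h3 cells, and plain (lat, lon) tuples stand in for the DataFrame. The names grid_cell and frequent_cells, the cell size and the sample positions are assumptions made for illustration, not the pipeline's code.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]


def grid_cell(lat: float, lon: float, cell_size_deg: float = 0.01) -> Cell:
    """Stand-in for an h3 index: the square grid cell that contains (lat, lon)."""
    return (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))


def frequent_cells(
    positions: Iterable[Tuple[float, float]],
    number_signals_threshold: int = 100,
) -> Dict[Cell, int]:
    """Count positions per cell and keep the cells seen at least `number_signals_threshold` times."""
    counts = Counter(grid_cell(lat, lon) for lat, lon in positions)
    return {cell: n for cell, n in counts.items() if n >= number_signals_threshold}


# Three static positions fall in the same cell, one falls elsewhere.
positions = [
    (47.7301, -3.3701),
    (47.7305, -3.3705),
    (47.7309, -3.3702),
    (48.3951, -4.4951),
]
busy = frequent_cells(positions, number_signals_threshold=3)
print(busy)
```

With a threshold of 3, only the cell holding the three nearby positions survives; the isolated position is treated as noise rather than as an anchorage.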

pipeline.src.flows.anchorages.get_anchorage_h3_cells_rings(ais_anchorage_h3_cells: Set[str], vms_anchorage_h3_cells: Set[str], manual_anchorage_h3_cells: Set[str]) → pandas.DataFrame

Unites the sets of h3 cells corresponding to anchorage locations of vessels in AIS and VMS data, together with the manually added cells, then adds two “rings” of cells around them. Returns the result as a DataFrame containing the indices, latitude and longitude of the cells, as well as whether each cell was present in the original cells (ring 0) or was added in rings 1 and 2, which surround the initial cells.

Parameters:
  • ais_anchorage_h3_cells (Set[str]) – set of indices of h3 cells where vessels anchor (AIS data)

  • vms_anchorage_h3_cells (Set[str]) – set of indices of h3 cells where vessels anchor (VMS data)

  • manual_anchorage_h3_cells (Set[str]) – set of additional indices of h3 cells

Returns:

DataFrame of h3 cells with 2 levels of rings added

Return type:

pd.DataFrame
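The union-then-rings step can be illustrated with a small breadth-first expansion. This is a sketch under stated assumptions: cells are (x, y) pairs on a square grid with 8 neighbors each, as a stand-in for h3 hexagons, and square_neighbors and add_rings are hypothetical names that do not come from the module.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Cell = Tuple[int, int]


def square_neighbors(cell: Cell) -> List[Cell]:
    """Stand-in for h3 adjacency: the 8 cells surrounding `cell` on a square grid."""
    x, y = cell
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def add_rings(cell_sets: Iterable[Set[Cell]], n_rings: int = 2) -> Dict[Cell, int]:
    """Map each cell to its ring number: 0 for original cells, 1..n_rings for added ones."""
    rings = {cell: 0 for cell in set().union(*cell_sets)}
    frontier = set(rings)
    for ring in range(1, n_rings + 1):
        next_frontier = set()
        for cell in frontier:
            for neighbor in square_neighbors(cell):
                if neighbor not in rings:  # keep the smallest ring number already assigned
                    rings[neighbor] = ring
                    next_frontier.add(neighbor)
        frontier = next_frontier
    return rings


ais_cells = {(0, 0)}
vms_cells = {(0, 0), (10, 10)}  # (0, 0) appears in both sources
manual_cells = {(20, 20)}
rings = add_rings([ais_cells, vms_cells, manual_cells])
print(Counter(rings.values()))
```

Because a cell keeps the first ring number it receives, a cell that is both an original cell of one source and a neighbor of another stays in ring 0.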

pipeline.src.flows.anchorages.get_ports_locations(ports: pandas.DataFrame) → List[PortLocation]

Transforms a DataFrame into a list of PortLocation objects.

Parameters:

ports (pd.DataFrame) – DataFrame with columns matching the fields of a PortLocation object.

Returns:

List[PortLocation]
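A minimal sketch of this conversion, assuming PortLocation is a dataclass with the four fields listed above. To keep the example free of dependencies, a list of dicts stands in for the DataFrame (the shape that DataFrame.to_dict(orient="records") produces), and ports_records_to_locations is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PortLocation:
    locode: str
    port_name: str
    latitude: float
    longitude: float


def ports_records_to_locations(records: List[Dict]) -> List[PortLocation]:
    """Build one PortLocation per record; keys must match the dataclass fields."""
    return [PortLocation(**record) for record in records]


# Illustrative records with approximate coordinates.
records = [
    {"locode": "FRBES", "port_name": "Brest", "latitude": 48.39, "longitude": -4.49},
    {"locode": "FRLRT", "port_name": "Lorient", "latitude": 47.75, "longitude": -3.36},
]
locations = ports_records_to_locations(records)
print(locations[0])
```

A record with a missing or extra key raises a TypeError, which is why the input columns have to match the fields of PortLocation.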

pipeline.src.flows.anchorages.get_anchorages_closest_port(anchorage_h3_cells_rings: pandas.DataFrame, ports_locations: List[PortLocation]) → pandas.DataFrame

pipeline.src.flows.anchorages.unite_ports_locodes(ers_ports_locode: Set[str], control_ports_locodes: Set[str]) → Set[str]

Unites sets of port locodes.

Parameters:
  • ers_ports_locode (Set[str]) – set of the locodes of ports used in ERS

  • control_ports_locodes (Set[str]) – set of the locodes of ports used in controls

Returns:

union of the two input sets

Return type:

Set[str]
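The union itself is a plain set operation. The sketch below also shows one plausible use of the result, keeping only the ports whose locode is in the united set. That second step is an assumption about how the set of active locodes is consumed, and unite_locodes and keep_active_ports are hypothetical names.

```python
from typing import Dict, List, Set


def unite_locodes(ers_ports_locodes: Set[str], control_ports_locodes: Set[str]) -> Set[str]:
    """Union of the two input sets; neither input is modified."""
    return ers_ports_locodes | control_ports_locodes


def keep_active_ports(ports: List[Dict], active_ports_locodes: Set[str]) -> List[Dict]:
    """Assumed filtering step: keep the ports whose locode is in the active set."""
    return [port for port in ports if port["locode"] in active_ports_locodes]


ers = {"FRBES", "FRLRT"}
controls = {"FRLRT", "FRMRS"}
active = unite_locodes(ers, controls)

all_ports = [{"locode": code} for code in ("FRBES", "FRLEH", "FRLRT", "FRMRS")]
active_ports = keep_active_ports(all_ports, active)
print(sorted(active))
```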

pipeline.src.flows.anchorages.get_active_ports(ports: pandas.DataFrame, active_ports_locodes: Set[str]) → pandas.DataFrame

pipeline.src.flows.anchorages.merge_closest_port_closest_active_port(anchorages_closest_port: pandas.DataFrame, anchorages_closest_active_port: pandas.DataFrame) → pandas.DataFrame

Merges anchorages closest port and closest active port.

pipeline.src.flows.anchorages.load_processed_anchorages(anchorages: pandas.DataFrame)

Load anchorages to processed.anchorages

pipeline.src.flows.anchorages.anchorages_compute_flow(h3_resolution: int = ANCHORAGES_H3_CELL_RESOLUTION, number_signals_threshold: int = 100, static_vms_positions_file_path: str = 'data/raw/anchorages/static_vms_positions_2021_03_to_10.parquet')

Flow to compute anchorages and attribute cells to ports

pipeline.src.flows.anchorages.extract_datagouv_anchorages(anchorages_url: str, proxies: dict) → pandas.DataFrame

Downloads anchorages csv file, returns the result as a pandas DataFrame.

Parameters:
  • anchorages_url (str) – url to download the data from.

  • proxies (dict) – dict with http_proxy and https_proxy settings to use for the download

Returns:

anchorages data

Return type:

pd.DataFrame

pipeline.src.flows.anchorages.load_anchorages_to_monitorfish(anchorages: pandas.DataFrame)

Loads anchorages data to monitorfish database.

Parameters:

anchorages (pd.DataFrame) – anchorages data

pipeline.src.flows.anchorages.anchorages_flow()

Main anchorages flow - extract from data.gouv.fr and load to database