pipeline.src.flows.anchorages
Classes
Vantage Point Tree to efficiently find the nearest port from a given Position(lat, lon).
Functions
Extracts ports locode, name, latitude and longitude from processed.ports. This table therefore needs to be filled before using this function.
Returns the set of distinct port locodes where at least one control was done.
Returns the set of distinct port locodes used at least once in an ERS DEP, PNO or LAN message.
Returns a DataFrame with latitude, longitude columns corresponding to S2 cells identified as docks in AIS global positions.
Reads a local file of VMS positions that have zero speed.
Bins input positions into h3 cells of the given resolution and keeps only the cells that appear at least number_signals_threshold times in the dataset.
Unites the sets of h3 cells corresponding to anchorage locations of vessels in AIS and VMS data, then adds two “rings” of cells around them.
Transforms a DataFrame into a list of PortLocation objects.
Unites sets of port locodes.
Merges anchorages closest port and closest active port.
Loads anchorages into processed.anchorages.
Flow to compute anchorages and attribute cells to ports.
Downloads anchorages csv file, returns the result as a pandas DataFrame.
Loads anchorages data to monitorfish database.
Main anchorages flow - extract from data.gouv.fr and load to database.
Module Contents
- class pipeline.src.flows.anchorages.PortsVPTree(ports_locations: List[PortLocation])
Bases: vptree.VPTree
Vantage Point Tree to efficiently find the nearest port from a given Position(lat, lon).
If there are p ports in the tree, searching for the port closest to a given Position has complexity O(log(p)).
- get_nearest_port(pos: src.helpers.spatial.Position) → dict
Returns the distance (in meters) and locode of the PortLocation that is closest to the input Position.
- Parameters:
pos (Position) – Position instance
- Returns:
dict with nearest_port_distance and nearest_port_locode keys.
- Return type:
dict
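As a reference for the shape of the returned dict, here is a minimal sketch that finds the nearest port with a plain linear scan and a haversine distance. It is not the VP-tree search itself (a linear scan costs O(p) per query), and the Position and PortLocation tuples, the haversine_m helper, the function name and the sample coordinates are all hypothetical stand-ins rather than the project's own code or data.

```python
import math
from typing import List, NamedTuple


class Position(NamedTuple):
    """Hypothetical stand-in for src.helpers.spatial.Position."""
    latitude: float
    longitude: float


class PortLocation(NamedTuple):
    """Hypothetical stand-in for the project's PortLocation."""
    locode: str
    latitude: float
    longitude: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a sphere of radius 6,371 km."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * earth_radius_m * math.asin(math.sqrt(h))


def get_nearest_port_linear(pos: Position, ports: List[PortLocation]) -> dict:
    """Linear-scan baseline returning the same keys as get_nearest_port."""
    def distance_to(port: PortLocation) -> float:
        return haversine_m(pos.latitude, pos.longitude, port.latitude, port.longitude)

    nearest = min(ports, key=distance_to)
    return {
        "nearest_port_distance": distance_to(nearest),
        "nearest_port_locode": nearest.locode,
    }


# Illustrative ports with approximate coordinates (assumed, not project data).
sample_ports = [
    PortLocation("FRBES", 48.39, -4.49),
    PortLocation("FRMRS", 43.30, 5.37),
    PortLocation("FRLEH", 49.49, 0.11),
]
result = get_nearest_port_linear(Position(48.40, -4.50), sample_ports)
print(result["nearest_port_locode"], round(result["nearest_port_distance"]))
```

A baseline like this can be useful for checking a tree-based search: both should agree on the locode for any query position.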
- pipeline.src.flows.anchorages.extract_ports() → pandas.DataFrame
Extracts ports locode, name, latitude and longitude from processed.ports. This table therefore needs to be filled before using this function.
- Returns:
DataFrame of ports with locode, port_name, longitude and latitude columns.
- Return type:
pd.DataFrame
- pipeline.src.flows.anchorages.extract_control_ports_locodes()
Returns the set of distinct port locodes where at least one control was done.
- Returns:
set of port locodes
- Return type:
Set[str]
- pipeline.src.flows.anchorages.extract_ers_ports_locodes() → Set[str]
Returns the set of distinct port locodes used at least once in an ERS DEP, PNO or LAN message.
- Returns:
set of port locodes
- Return type:
Set[str]
- pipeline.src.flows.anchorages.extract_ais_anchorage_coordinates() → pandas.DataFrame
Returns a DataFrame with latitude, longitude columns corresponding to S2 cells identified as docks in AIS global positions.
- pipeline.src.flows.anchorages.extract_vms_static_positions(parquet_file_relative_path) → pandas.DataFrame
Reads a local file of VMS positions that have zero speed.
- Returns:
DataFrame with latitude and longitude columns.
- Return type:
pd.DataFrame
- pipeline.src.flows.anchorages.get_anchorage_h3_cells(static_positions: pandas.DataFrame, h3_resolution: int = 9, number_signals_threshold: int = 100) → pandas.DataFrame
Bins input positions into h3 cells of the given resolution and keeps only the cells that appear at least number_signals_threshold times in the dataset.
- Parameters:
static_positions (pd.DataFrame) – DataFrame with latitude and longitude columns
h3_resolution (int) – h3 resolution to use
number_signals_threshold (int) – number of occurrences below which h3 cells are filtered out
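The bin-then-threshold step can be sketched without the h3 library. In the sketch below, toy_cell_index is a hypothetical stand-in for an h3 index (it uses a square latitude/longitude grid, not hexagons), and frequent_cells is an illustrative name rather than the project's function. Only the counting and filtering logic is meant to mirror the description above.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def toy_cell_index(lat: float, lon: float, cell_size_deg: float = 0.01) -> str:
    """Hypothetical stand-in for an h3 index: id of a square lat/lon grid cell."""
    row = math.floor(lat / cell_size_deg)
    col = math.floor(lon / cell_size_deg)
    return f"{row}_{col}"


def frequent_cells(
    positions: Iterable[Tuple[float, float]], number_signals_threshold: int
) -> Dict[str, int]:
    """Count positions per cell, keep cells seen at least `threshold` times."""
    counts = Counter(toy_cell_index(lat, lon) for lat, lon in positions)
    return {
        cell: n for cell, n in counts.items() if n >= number_signals_threshold
    }


# Three static positions fall in one cell, a fourth lies far away.
static_positions = [(48.3851, -4.4951)] * 3 + [(43.2951, 5.3751)]
print(frequent_cells(static_positions, number_signals_threshold=2))
```

Raising the threshold trades recall for precision: isolated zero-speed signals (for example a vessel drifting at sea) are dropped, while cells that accumulate many signals are kept as anchorage candidates.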
- pipeline.src.flows.anchorages.get_anchorage_h3_cells_rings(ais_anchorage_h3_cells: Set[str], vms_anchorage_h3_cells: Set[str], manual_anchorage_h3_cells: Set[str]) → pandas.DataFrame
Unites the sets of h3 cells corresponding to anchorage locations of vessels in AIS and VMS data, together with any additional manually supplied cells, then adds two “rings” of cells around them. Returns the result as a DataFrame containing the index, latitude and longitude of each cell, as well as whether the cell was present in the original cells (ring 0) or was added in ring 1 or ring 2 around the initial cells.
- Parameters:
ais_anchorage_h3_cells (Set[str]) – set of indices of h3 cells where vessels anchor (AIS data)
vms_anchorage_h3_cells (Set[str]) – set of indices of h3 cells where vessels anchor (VMS data)
manual_anchorage_h3_cells (Set[str]) – set of additional indices of h3 cells
- Returns:
DataFrame of h3 cells with 2 levels of rings added
- Return type:
pd.DataFrame
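The ring labelling described above can be illustrated on a toy hexagonal grid. This sketch assumes axial (q, r) integer coordinates instead of real h3 indices, and the neighbors and cells_with_rings helpers are hypothetical. With h3, the library's own neighbor or ring functions would take the place of neighbors.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]  # axial (q, r) coordinates on a toy hexagonal grid

# The six neighbor offsets of a hexagon in axial coordinates.
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def neighbors(cell: Cell) -> Set[Cell]:
    """Return the six cells adjacent to `cell`."""
    q, r = cell
    return {(q + dq, r + dr) for dq, dr in HEX_DIRECTIONS}


def cells_with_rings(*cell_sets: Set[Cell], n_rings: int = 2) -> Dict[Cell, int]:
    """Unite the input sets (ring 0), then add `n_rings` rings around them."""
    ring_of: Dict[Cell, int] = {cell: 0 for cell in set().union(*cell_sets)}
    frontier = set(ring_of)
    for ring in range(1, n_rings + 1):
        # Cells adjacent to the previous ring that have no ring number yet.
        candidates = {n for cell in frontier for n in neighbors(cell)}
        new_cells = candidates - set(ring_of)
        for cell in new_cells:
            ring_of[cell] = ring
        frontier = new_cells
    return ring_of


ais_cells = {(0, 0)}
vms_cells = {(0, 0), (1, 0)}
manual_cells: Set[Cell] = set()
rings = cells_with_rings(ais_cells, vms_cells, manual_cells)
print(sorted(rings.items(), key=lambda item: item[1])[:3])
```

Labelling each cell with the smallest ring in which it appears keeps ring 0 reserved for cells actually observed in the data, so padded cells can be told apart later.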
- pipeline.src.flows.anchorages.get_ports_locations(ports: pandas.DataFrame) → List[PortLocation]
Transforms a DataFrame into a list of PortLocation objects.
- Parameters:
ports (pd.DataFrame) – DataFrame with columns matching the fields of a PortLocation object.
- Returns:
List[PortLocation]
- pipeline.src.flows.anchorages.get_anchorages_closest_port(anchorage_h3_cells_rings: pandas.DataFrame, ports_locations: List[PortLocation]) → pandas.DataFrame
- pipeline.src.flows.anchorages.unite_ports_locodes(ers_ports_locode: Set[str], control_ports_locodes: Set[str]) → Set[str]
Unites sets of port locodes.
- Parameters:
ers_ports_locode (Set[str]) – set of the locodes of ports used in ERS
control_ports_locodes (Set[str]) – set of the locodes of ports used in controls
- Returns:
union of the two input sets
- Return type:
Set[str]
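The union itself is a one-line set operation. The sketch below also shows one way the resulting set could be used to keep only "active" ports. That filtering step is an assumption about downstream use, not documented behavior, and the function names and sample records are hypothetical.

```python
from typing import Dict, List, Set


def unite_locodes_sketch(ers_locodes: Set[str], control_locodes: Set[str]) -> Set[str]:
    """Union of the locodes seen in ERS messages and in controls."""
    return ers_locodes | control_locodes


def filter_active_ports_sketch(
    ports: List[Dict[str, str]], active_locodes: Set[str]
) -> List[Dict[str, str]]:
    """Assumed use of the union: keep ports whose locode is in the active set."""
    return [port for port in ports if port["locode"] in active_locodes]


# Illustrative records only (not project data).
ers = {"FRBES", "FRLEH"}
controls = {"FRLEH", "FRMRS"}
ports = [
    {"locode": "FRBES", "port_name": "Brest"},
    {"locode": "FRCQF", "port_name": "Calais"},
]
active = unite_locodes_sketch(ers, controls)
print(sorted(active))
print(filter_active_ports_sketch(ports, active))
```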
- pipeline.src.flows.anchorages.get_active_ports(ports: pandas.DataFrame, active_ports_locodes: Set[str]) → pandas.DataFrame
- pipeline.src.flows.anchorages.merge_closest_port_closest_active_port(anchorages_closest_port: pandas.DataFrame, anchorages_closest_active_port: pandas.DataFrame) → pandas.DataFrame
Merges each anchorage's closest port with its closest active port.
- pipeline.src.flows.anchorages.load_processed_anchorages(anchorages: pandas.DataFrame)
Loads anchorages into processed.anchorages.
- pipeline.src.flows.anchorages.anchorages_compute_flow(h3_resolution: int = ANCHORAGES_H3_CELL_RESOLUTION, number_signals_threshold: int = 100, static_vms_positions_file_path: str = 'data/raw/anchorages/static_vms_positions_2021_03_to_10.parquet')
Flow to compute anchorages and attribute cells to ports.
- pipeline.src.flows.anchorages.extract_datagouv_anchorages(anchorages_url: str, proxies: dict) → pandas.DataFrame
Downloads anchorages csv file, returns the result as a pandas DataFrame.
- Parameters:
anchorages_url (str) – url to download the data from.
proxies (dict) – dict with http_proxy and https_proxy settings to use for the download
- Returns:
anchorages data
- Return type:
pd.DataFrame