pipeline.src.helpers.spatial

Classes

Position

PositionRepresentation

Representation of a position with latitude and longitude in human readable format.

Functions

coordinate_to_dms(→ Tuple[int, float, int, int])

Takes a coordinate and returns the corresponding degrees, minutes_decimal, minutes and seconds.

position_to_position_representation(...)

Converts a Position to a PositionRepresentation in the designated representation_type.

to_multipolygon(→ shapely.geometry.MultiPolygon)

Returns a MultiPolygon of the input Polygon or MultiPolygon geometry.

estimate_current_position(→ Tuple[float, float])

Estimate the current position of a vessel based on its last position, course and speed.

get_h3_indices(→ pandas.Series)

Returns a Series with the same index as the input DataFrame and values equal to the h3 index of each row's latitude and longitude.

get_k_ring_of_h3_cells(→ Set[str])

Takes a list-like sequence of h3 cells and an integer k, returns the set of h3 cells that belong to the k-ring of at least one input cell.

point_dist(→ float)

Computes the spherical distance between two Position objects in meters.

get_step_distances(→ numpy.array)

Compute the distance between successive positions (rows). The DataFrame must have latitude and longitude columns.

compute_movement_metrics(→ pandas.DataFrame)

Takes a pandas DataFrame of successive vessel positions and adds speed, distance and time between successive positions.

detect_fishing_activity(→ pandas.DataFrame)

Detects fishing activity from positions of a vessel.

enrich_positions(→ pandas.DataFrame)

Applies compute_movement_metrics and detect_fishing_activity successively.

geocode([query_string, country_code_iso2, backend])

Return latitude, longitude for input location from a query string or from one or more keyword arguments.

geocode_google([address])

Return latitude, longitude for input location from a query string, with optional filtering on keyword arguments.

Module Contents

class pipeline.src.helpers.spatial.Position
latitude: float
longitude: float
class pipeline.src.helpers.spatial.PositionRepresentation

Representation of a position with latitude and longitude in human readable format.

latitude: str
longitude: str
pipeline.src.helpers.spatial.coordinate_to_dms(coord: float) → Tuple[int, float, int, int]

Takes a coordinate and returns the corresponding degrees, minutes_decimal, minutes and seconds. The sign is not taken into account: only positive values are returned.

Parameters:

coord (float) – latitude or longitude coordinate value

Returns:

degrees, minutes_decimal, minutes, seconds

Return type:

Tuple[int, float, int, int]

Examples

>>> coordinate_to_dms(45.123)
(45, 7.379999999999853, 7, 23)
>>> coordinate_to_dms(-45.123)
(45, 7.379999999999853, 7, 23)
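
The decomposition described above can be sketched with the standard library only. This is an illustration, not the module's source: the name coordinate_to_dms_sketch is hypothetical, and rounding the seconds to the nearest integer is inferred from the documented example (23 seconds for 45.123).

```python
from typing import Tuple


def coordinate_to_dms_sketch(coord: float) -> Tuple[int, float, int, int]:
    """Split a coordinate into degrees, decimal minutes, minutes and seconds."""
    coord = abs(coord)  # the sign is dropped, as the documentation states
    degrees = int(coord)
    minutes_decimal = (coord - degrees) * 60
    minutes = int(minutes_decimal)
    # Assumption: seconds are rounded to the nearest whole second.
    seconds = round((minutes_decimal - minutes) * 60)
    return degrees, minutes_decimal, minutes, seconds


degrees, minutes_decimal, minutes, seconds = coordinate_to_dms_sketch(-45.123)
print(degrees, minutes, seconds)
```

Note that with this rounding rule the seconds value can reach 60 for inputs just below a whole minute; whether the real function normalizes that case is not stated in the documentation.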
pipeline.src.helpers.spatial.position_to_position_representation(p: Position, representation_type: str = 'DMS') → PositionRepresentation

Converts a Position to a PositionRepresentation in the designated representation_type.

Parameters:
  • p (Position) – input Position

  • representation_type (str) – “DMS” or “DMD”. Defaults to “DMS”.

Returns:

PositionRepresentation

Raises:

ValueError

if:

  • lat is greater than 90.0 or less than -90.0

  • lon is greater than 180.0 or less than -180.0

  • representation_type is not ‘DMD’ or ‘DMS’.
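
The error conditions listed above can be sketched as a standalone check. This is a minimal illustration under stated assumptions: the Position dataclass is a stand-in for the module's class, and check_position_arguments is a hypothetical helper name, not part of the documented API.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Stand-in for pipeline.src.helpers.spatial.Position (assumption)."""
    latitude: float
    longitude: float


def check_position_arguments(p: Position, representation_type: str = "DMS") -> None:
    """Raise ValueError for out-of-range coordinates or an unknown type."""
    if not -90.0 <= p.latitude <= 90.0:
        raise ValueError(f"latitude out of range: {p.latitude}")
    if not -180.0 <= p.longitude <= 180.0:
        raise ValueError(f"longitude out of range: {p.longitude}")
    if representation_type not in ("DMS", "DMD"):
        raise ValueError(f"unknown representation_type: {representation_type!r}")


check_position_arguments(Position(45.123, -1.5), "DMD")  # passes silently
```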

pipeline.src.helpers.spatial.to_multipolygon(p: shapely.geometry.Polygon | shapely.geometry.MultiPolygon) → shapely.geometry.MultiPolygon

Returns a MultiPolygon of the input Polygon or MultiPolygon geometry.

pipeline.src.helpers.spatial.estimate_current_position(last_latitude: float, last_longitude: float, course: float, speed: float, hours_since_last_position: float, max_hours_since_last_position: float = 2.0, on_error: str = 'ignore') → Tuple[float, float]

Estimate the current position of a vessel based on its last position, course and speed. If the last position is older than max_hours_since_last_position, or is in the future (i.e. hours_since_last_position is negative), returns None.

Parameters:
  • last_latitude (float) – last known latitude of vessel

  • last_longitude (float) – last known longitude of vessel

  • course (float) – last known course of vessel in degrees

  • speed (float) – last known speed of vessel in knots

  • hours_since_last_position (float) – time since last known position of vessel, in hours

  • max_hours_since_last_position (float) – maximum time in hours since last position, after which the estimation is not performed (returns None instead). Defaults to 2.0.

  • on_error (str) – ‘ignore’ or ‘raise’

Returns:

estimated current latitude, estimated current longitude

Return type:

Tuple[float, float]
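
The dead-reckoning idea behind this function can be sketched as follows. This is not the module's implementation: it assumes a spherical Earth with a mean radius of 6,371,008.8 m, the name estimate_position_sketch is hypothetical, and the on_error parameter is left out.

```python
import math
from typing import Optional, Tuple

EARTH_RADIUS_M = 6_371_008.8       # mean Earth radius (assumption of this sketch)
METERS_PER_NAUTICAL_MILE = 1852.0  # 1 knot = 1 nautical mile per hour


def estimate_position_sketch(
    last_latitude: float,
    last_longitude: float,
    course: float,
    speed: float,
    hours_since_last_position: float,
    max_hours_since_last_position: float = 2.0,
) -> Optional[Tuple[float, float]]:
    """Project the last position along `course` at `speed` knots."""
    # Positions that are too old, or in the future, are not extrapolated.
    if not 0 <= hours_since_last_position <= max_hours_since_last_position:
        return None
    distance_m = speed * hours_since_last_position * METERS_PER_NAUTICAL_MILE
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians
    phi1 = math.radians(last_latitude)
    lam1 = math.radians(last_longitude)
    theta = math.radians(course)
    phi2 = math.asin(
        math.sin(phi1) * math.cos(delta)
        + math.cos(phi1) * math.sin(delta) * math.cos(theta)
    )
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2),
    )
    # Normalize the longitude to the [-180, 180) range.
    longitude = (math.degrees(lam2) + 540.0) % 360.0 - 180.0
    return math.degrees(phi2), longitude


# A vessel heading due north at 60 knots moves about one degree of latitude per hour.
print(estimate_position_sketch(0.0, 0.0, course=0.0, speed=60.0, hours_since_last_position=1.0))
```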

pipeline.src.helpers.spatial.get_h3_indices(df: pandas.DataFrame, lat: str = 'latitude', lon: str = 'longitude', resolution: int = 12) → pandas.Series

Returns a Series with the same index as the input DataFrame and values equal to the h3 index corresponding to the latitude and longitude of the indicated columns of the DataFrame.

Parameters:
  • df (pd.DataFrame) – DataFrame with latitude and longitude coordinates in 2 of its columns

  • lat (str) – name of the column containing latitudes. Defaults to “latitude”.

  • lon (str) – name of the column containing longitudes. Defaults to “longitude”.

  • resolution (int) – h3 resolution of the h3 cells to output.

Returns:

h3 cells indices

Return type:

pd.Series

pipeline.src.helpers.spatial.get_k_ring_of_h3_cells(h3_sequence: Iterable[str], k: int) → Set[str]

Takes a list-like sequence of h3 cells and an integer k, returns the set of h3 cells that belong to the k-ring of at least one of the h3 cells in the input sequence.

Parameters:
  • h3_sequence (sequence) – sequence of h3 cells

  • k (int) – number of rings to add around the input cells

Returns:

set of h3 cells belonging to the k-ring of at least one of the h3 cells in the input sequence

Return type:

Set[str]
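
The "k-ring of at least one input cell" logic is a set union over per-cell neighborhoods. The sketch below shows that union with the neighborhood function passed in as an argument, so it runs without h3 installed. The names union_of_k_rings and toy_k_ring are hypothetical, and the toy grid is a one-dimensional line of integers rather than real h3 cells. With the h3 library, the per-cell neighborhood would presumably come from its k-ring function (k_ring in h3-py 3.x, grid_disk in 4.x).

```python
from typing import Callable, Hashable, Iterable, Set, TypeVar

Cell = TypeVar("Cell", bound=Hashable)


def union_of_k_rings(
    cells: Iterable[Cell],
    k: int,
    k_ring: Callable[[Cell, int], Iterable[Cell]],
) -> Set[Cell]:
    """Return every cell lying in the k-ring of at least one input cell."""
    result: Set[Cell] = set()
    for cell in cells:
        result.update(k_ring(cell, k))
    return result


def toy_k_ring(cell: int, k: int) -> Set[int]:
    """Toy neighborhood on an integer line: all cells within distance k."""
    return set(range(cell - k, cell + k + 1))


print(sorted(union_of_k_rings([0, 5], 1, toy_k_ring)))
```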

pipeline.src.helpers.spatial.point_dist(position1: Position, position2: Position) → float

Computes the spherical distance between two Position objects in meters.

Parameters:
  • position1 (Position) – first input Position

  • position2 (Position) – second input Position

Returns:

distance in meters between the two input Positions

Return type:

float
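
A spherical distance of this kind is commonly computed with the haversine formula. The sketch below is an illustration only: the documentation does not say which formula or Earth radius the module uses, so the haversine formula, the mean radius of 6,371,008.8 m, the stand-in Position dataclass and the name haversine_m are all assumptions.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (assumption of this sketch)


@dataclass
class Position:
    """Stand-in for pipeline.src.helpers.spatial.Position (assumption)."""
    latitude: float
    longitude: float


def haversine_m(position1: Position, position2: Position) -> float:
    """Great-circle distance in meters between two positions on a sphere."""
    phi1 = math.radians(position1.latitude)
    phi2 = math.radians(position2.latitude)
    dphi = phi2 - phi1
    dlam = math.radians(position2.longitude - position1.longitude)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the square root slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


# One degree of longitude along the equator is roughly 111.2 km.
print(haversine_m(Position(0.0, 0.0), Position(0.0, 1.0)))
```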

pipeline.src.helpers.spatial.get_step_distances(df: pandas.DataFrame, lat: str = 'latitude', lon: str = 'longitude', how: str = 'backward', unit: str = 'm') → numpy.array

Compute the distance between successive positions (rows). The DataFrame must have latitude and longitude columns. Returns a numpy array with the same length as the input DataFrame and distances as values.

Parameters:
  • df (pd.DataFrame) – DataFrame with latitude and longitude columns

  • lat (str) – column name containing latitudes

  • lon (str) – column name containing longitudes

  • how (str) – if ‘forward’, computes the interval between each position and the next one. If ‘backward’, computes the interval between each position and the previous one.

  • unit (str) – the distance unit (passed to h3.great_circle_distance). Defaults to ‘m’.

Returns:

array of distances between the successive positions.

Return type:

np.array
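
The difference between the ‘backward’ and ‘forward’ modes is only in how the step distances are aligned with the rows. The sketch below illustrates this on plain Python lists instead of a DataFrame. It is not the module's code: step_distances_sketch is a hypothetical name, the haversine helper stands in for h3.great_circle_distance, and filling the row that has no neighbor with NaN is an assumption (the documentation only states that the output has the same length as the input).

```python
import math
from typing import List, Sequence, Tuple

LatLon = Tuple[float, float]


def _haversine_m(a: LatLon, b: LatLon, radius_m: float = 6_371_008.8) -> float:
    """Great-circle distance in meters between two (lat, lon) pairs."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dlam = math.radians(b[1] - a[1])
    h = math.sin((phi2 - phi1) / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_m * math.asin(min(1.0, math.sqrt(h)))


def step_distances_sketch(points: Sequence[LatLon], how: str = "backward") -> List[float]:
    """Distance between successive points, aligned with the input rows."""
    if how not in ("backward", "forward"):
        raise ValueError("how must be 'backward' or 'forward'")
    if not points:
        return []
    steps = [_haversine_m(a, b) for a, b in zip(points, points[1:])]
    nan = float("nan")
    # 'backward': row i holds the distance from row i-1, so the first row is NaN.
    # 'forward': row i holds the distance to row i+1, so the last row is NaN.
    return [nan] + steps if how == "backward" else steps + [nan]


track = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0)]
print(step_distances_sketch(track, how="backward"))
print(step_distances_sketch(track, how="forward"))
```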

pipeline.src.helpers.spatial.compute_movement_metrics(positions: pandas.DataFrame, lat: str = 'latitude', lon: str = 'longitude', datetime_column: str = 'datetime_utc', is_at_port_column: str = 'is_at_port', time_emitting_at_sea_column: str = 'time_emitting_at_sea') → pandas.DataFrame

Takes a pandas DataFrame with:

  • latitude and longitude columns (float dtypes)

  • a column indicating the date and time of the position (datetime dtype)

  • a column indicating whether the vessel is at port (boolean dtype)

  • a column indicating how long the vessel has been continuously emitting at sea in hours (float dtype)

whose rows represent successive positions of a vessel, assumed to be sorted in ascending chronological order.

Returns a pandas DataFrame with the same index and columns, with:

  • speed, distance and time between successive positions as additional computed features in new columns

  • values for time_emitting_at_sea_column computed and updated - so if the input contained any NULL values, they will be computed and filled in.

Parameters:
  • positions (pd.DataFrame) – DataFrame representing a vessel route

  • lat (str) – column name of latitude values. May not contain null values.

  • lon (str) – column name of longitude values. May not contain null values.

  • datetime_column (str) – column name of datetime values. May not contain null values.

  • is_at_port_column (str) – column indicating whether the vessel is at port. May not contain null values.

  • time_emitting_at_sea_column (str) – name of the column indicating how long the vessel has been continuously emitting at sea, in hours. May contain null values.

Returns:

the same DataFrame, plus added columns with the computed features

Return type:

pd.DataFrame
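
The time and speed features can be derived from timestamps and step distances as sketched below, on plain lists rather than a DataFrame. This is an illustration under assumptions: movement_metrics_sketch is a hypothetical name, speeds are expressed in knots (consistent with the units documented for detect_fishing_activity), the first row is filled with NaN because it has no previous position, and the update of the time-emitting-at-sea column is not covered.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

METERS_PER_NAUTICAL_MILE = 1852.0


def movement_metrics_sketch(
    datetimes: Sequence[datetime],
    step_distances_m: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (hours since previous position, average speed in knots) per row.

    step_distances_m[i] is the distance in meters from row i-1 to row i;
    its first value is ignored.
    """
    nan = float("nan")
    hours: List[float] = []
    speeds: List[float] = []
    for i, (t, distance_m) in enumerate(zip(datetimes, step_distances_m)):
        if i == 0:
            hours.append(nan)
            speeds.append(nan)
            continue
        elapsed_h = (t - datetimes[i - 1]).total_seconds() / 3600.0
        hours.append(elapsed_h)
        # Avoid dividing by zero when two positions share a timestamp.
        speeds.append(distance_m / METERS_PER_NAUTICAL_MILE / elapsed_h if elapsed_h > 0 else nan)
    return hours, speeds


t0 = datetime(2021, 1, 1, 12, 0)
times = [t0, t0 + timedelta(hours=1), t0 + timedelta(hours=1, minutes=30)]
print(movement_metrics_sketch(times, [float("nan"), 18520.0, 9260.0]))
```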

pipeline.src.helpers.spatial.detect_fishing_activity(positions: pandas.DataFrame, minimum_minutes_of_emission_at_sea: int, is_at_port_column: str = 'is_at_port', average_speed_column: str = 'average_speed', time_emitting_at_sea_column: str = 'time_emitting_at_sea', minimum_consecutive_positions: int = 3, min_fishing_speed_threshold: float = 0.1, max_fishing_speed_threshold: float = 4.5, return_floats: bool = False) → pandas.DataFrame

Detects fishing activity from positions of a vessel.

Rows of the input DataFrame represent successive positions of the analyzed vessel, assumed to be sorted in ascending chronological order.

The DataFrame must have columns indicating:

  1. whether the position is at port

  2. the average speed between each position and the previous one, in knots

A vessel is considered to be fishing if its average speed remains above min_fishing_speed_threshold and below max_fishing_speed_threshold for at least minimum_consecutive_positions consecutive positions outside a port, after at least minimum_minutes_of_emission_at_sea minutes of uninterrupted VMS emission outside a port.

Parameters:
  • positions (pd.DataFrame) – DataFrame representing successive positions of a vessel, assumed to be sorted by ascending datetime

  • minimum_minutes_of_emission_at_sea (int) – the minimum time a vessel is required to emit continuously at sea in order to be considered as in fishing activity, in minutes. This avoids detecting fishing activity when vessels leave ports.

  • is_at_port_column (str) – name of the column containing boolean values for whether a position is at port or not

  • average_speed_column (str) – name of the column containing average speed values (distance from previous position divided by time since the last position), in knots

  • time_emitting_at_sea_column (str) – name of the column containing the duration (in hours) for which the vessel has been continuously emitting at sea outside ports.

  • minimum_consecutive_positions (int) – minimum number of consecutive positions with a speed between the two fishing speed thresholds to consider that a vessel is fishing

  • min_fishing_speed_threshold (float) – speed below which a vessel is considered to be stopped

  • max_fishing_speed_threshold (float) – speed above which a vessel is considered to be in transit

  • return_floats (bool) – if True, return float dtypes with 1.0 representing True, 0.0 representing False and np.nan for null values. If False (the default), the return dtype is object and values are True, False and np.nan, which is more explicit and natural but slower.

Returns:

copy of the input DataFrame with the added boolean column “is_fishing”

Return type:

pd.DataFrame
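
The detection rule described above can be sketched on plain lists as follows. This is a reading of the documented rule, not the module's code, and several details are assumptions: the name detect_fishing_sketch is hypothetical, the speed bounds are treated as strict, every position of a qualifying run is flagged, and missing speeds (NaN) simply fail the speed test.

```python
from typing import List, Sequence


def detect_fishing_sketch(
    average_speeds: Sequence[float],
    is_at_port: Sequence[bool],
    hours_emitting_at_sea: Sequence[float],
    minimum_minutes_of_emission_at_sea: int,
    minimum_consecutive_positions: int = 3,
    min_fishing_speed_threshold: float = 0.1,
    max_fishing_speed_threshold: float = 4.5,
) -> List[bool]:
    """Flag positions belonging to a long enough run of fishing-speed positions."""
    # Step 1: per-position test (at sea, speed in range, emitting long enough).
    candidates = [
        (not at_port)
        and (min_fishing_speed_threshold < speed < max_fishing_speed_threshold)
        and (hours * 60 >= minimum_minutes_of_emission_at_sea)
        for speed, at_port, hours in zip(average_speeds, is_at_port, hours_emitting_at_sea)
    ]
    # Step 2: keep only runs of at least `minimum_consecutive_positions` candidates.
    is_fishing = [False] * len(candidates)
    run_start = None
    for i, ok in enumerate(candidates + [False]):  # trailing False closes the last run
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            if i - run_start >= minimum_consecutive_positions:
                for j in range(run_start, i):
                    is_fishing[j] = True
            run_start = None
    return is_fishing


speeds = [8.0, 2.0, 2.5, 3.0, 9.0, 2.0, 2.0]
print(detect_fishing_sketch(speeds, [False] * 7, [5.0] * 7, minimum_minutes_of_emission_at_sea=60))
```

In this example the three consecutive slow positions are flagged, while the final two are not, because that run is shorter than minimum_consecutive_positions.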

pipeline.src.helpers.spatial.enrich_positions(positions: pandas.DataFrame, minimum_minutes_of_emission_at_sea: int, lat: str = 'latitude', lon: str = 'longitude', datetime_column: str = 'datetime_utc', is_at_port_column: str = 'is_at_port', time_emitting_at_sea_column: str = 'time_emitting_at_sea', minimum_consecutive_positions: int = 3, min_fishing_speed_threshold: float = 0.1, max_fishing_speed_threshold: float = 4.5, return_floats: bool = False) → pandas.DataFrame

Applies compute_movement_metrics and detect_fishing_activity successively.

See these two functions for help.

pipeline.src.helpers.spatial.geocode(query_string=None, country_code_iso2=None, backend: str = 'Nominatim', **kwargs)

Return latitude, longitude for input location from a query string or from one or more of the following keyword arguments:

  • street

  • city

  • county

  • state

  • country

  • postalcode

pipeline.src.helpers.spatial.geocode_google(address=None, **kwargs)

Return latitude, longitude for input location from a query string, with optional filtering on one or more of the following keyword arguments:

  • postal_code

  • country (country name or country code ISO2)

  • route

  • locality

  • administrative_area

If address is not given, at least one kwarg must be given.