pipeline.src.helpers.spatial
Classes
PositionRepresentation | Representation of a position with latitude and longitude in human readable format.
Functions
coordinate_to_dms | Takes a coordinate and returns the corresponding degrees, minutes_decimal, minutes and seconds.
position_to_position_representation | Converts a Position to a PositionRepresentation in the designated representation_type.
to_multipolygon | Returns a MultiPolygon of the input Polygon or MultiPolygon geometry.
estimate_current_position | Estimates the current position of a vessel based on its last position, course and speed.
get_h3_indices | Returns a Series with the same index as the input DataFrame and values equal to the h3 index of each row's latitude and longitude.
get_k_ring_of_h3_cells | Takes a list-like sequence of h3 cells and an integer k, returns the set of h3 cells in the k-ring of at least one input cell.
point_dist | Computes the spherical distance between two Position objects in meters.
get_step_distances | Computes the distance between successive positions (rows).
compute_movement_metrics | Takes a pandas DataFrame of successive vessel positions and adds speed, distance and time between positions.
detect_fishing_activity | Detects fishing activity from positions of a vessel.
enrich_positions | Applies compute_movement_metrics and detect_fishing_activity successively.
geocode | Returns latitude, longitude for input location from a query string or from one or more keyword arguments.
geocode_google | Returns latitude, longitude for input location from a query string, with optional filtering on keyword arguments.
Module Contents
- class pipeline.src.helpers.spatial.PositionRepresentation
Representation of a position with latitude and longitude in human readable format.
- pipeline.src.helpers.spatial.coordinate_to_dms(coord: float) → Tuple[int, float, int, int]
Takes a coordinate and returns the corresponding degrees, minutes_decimal, minutes and seconds. The sign is not taken into account: only positive values are returned.
- Parameters:
coord (float) – latitude or longitude coordinate value
- Returns:
degrees, minutes_decimal, minutes, seconds
- Return type:
Tuple[int, float, int, int]
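The decomposition described above can be sketched in a few lines. This is a minimal illustration, not the module's source: the name coordinate_to_dms_sketch is hypothetical, and rounding seconds to the nearest integer is an assumption inferred from the documented outputs.

```python
from typing import Tuple


def coordinate_to_dms_sketch(coord: float) -> Tuple[int, float, int, int]:
    """Split a coordinate into degrees, decimal minutes, minutes and seconds."""
    coord = abs(coord)  # the sign is dropped, as the description states
    degrees = int(coord)
    minutes_decimal = (coord - degrees) * 60
    minutes = int(minutes_decimal)
    # Assumption: seconds are rounded to the nearest integer. This can yield 60
    # for values just below a full minute; the sketch does not carry it over.
    seconds = round((minutes_decimal - minutes) * 60)
    return degrees, minutes_decimal, minutes, seconds
```

Because the sign is discarded, callers that need a hemisphere (N/S, E/W) have to derive it from the original value before calling the function.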
Examples
>>> coordinate_to_dms(45.123)
(45, 7.379999999999853, 7, 23)
>>> coordinate_to_dms(-45.123)
(45, 7.379999999999853, 7, 23)
- pipeline.src.helpers.spatial.position_to_position_representation(p: Position, representation_type: str = 'DMS') → PositionRepresentation
Converts a Position to a PositionRepresentation in the designated representation_type.
- Parameters:
p (Position) – input Position
representation_type (str) – “DMS” or “DMD”. Defaults to “DMS”.
- Returns:
PositionRepresentation
- Raises:
ValueError –
if:
lat is greater than 90.0 or less than -90.0
lon is greater than 180.0 or less than -180.0
representation_type is not ‘DMD’ or ‘DMS’.
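The validation rules and the two representation types can be illustrated as follows. This is only a sketch under stated assumptions: the Position tuple, the function name and the output string layout are hypothetical, since the fields of Position and PositionRepresentation are not documented here.

```python
from typing import NamedTuple, Tuple


class Position(NamedTuple):
    """Hypothetical stand-in for the module's Position class."""
    latitude: float
    longitude: float


def _format_coordinate(coord: float, positive: str, negative: str, kind: str) -> str:
    hemisphere = positive if coord >= 0 else negative
    value = abs(coord)
    degrees = int(value)
    minutes_decimal = (value - degrees) * 60
    if kind == "DMD":  # degrees and decimal minutes
        return f"{degrees}° {minutes_decimal:06.3f}′ {hemisphere}"
    minutes = int(minutes_decimal)  # "DMS": degrees, minutes, seconds
    seconds = round((minutes_decimal - minutes) * 60)
    return f"{degrees}° {minutes:02d}′ {seconds:02d}″ {hemisphere}"


def represent_position(p: Position, representation_type: str = "DMS") -> Tuple[str, str]:
    """Return (latitude, longitude) strings; the layout is an assumption."""
    if representation_type not in ("DMS", "DMD"):
        raise ValueError("representation_type must be 'DMS' or 'DMD'")
    if not -90.0 <= p.latitude <= 90.0:
        raise ValueError("latitude must be between -90.0 and 90.0")
    if not -180.0 <= p.longitude <= 180.0:
        raise ValueError("longitude must be between -180.0 and 180.0")
    return (
        _format_coordinate(p.latitude, "N", "S", representation_type),
        _format_coordinate(p.longitude, "E", "W", representation_type),
    )
```

Validating before formatting keeps the error cases listed under Raises in one place, so the formatting helper can assume in-range input.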
- pipeline.src.helpers.spatial.to_multipolygon(p: shapely.geometry.Polygon | shapely.geometry.MultiPolygon) → shapely.geometry.MultiPolygon
Returns a MultiPolygon of the input Polygon or MultiPolygon geometry.
- pipeline.src.helpers.spatial.estimate_current_position(last_latitude: float, last_longitude: float, course: float, speed: float, hours_since_last_position: float, max_hours_since_last_position: float = 2.0, on_error: str = 'ignore') → Tuple[float, float]
Estimate the current position of a vessel based on its last position, course and speed. If the last position is older than max_hours_since_last_position, or is in the future (i.e. hours_since_last_position is negative), returns None.
- Parameters:
last_latitude (float) – last known latitude of vessel
last_longitude (float) – last known longitude of vessel
course (float) – last known course of vessel in degrees
speed (float) – last known speed of vessel in knots
hours_since_last_position (float) – time since last known position of vessel, in hours
max_hours_since_last_position (float) – maximum time in hours since last position, after which the estimation is not performed (returns None instead). Defaults to 2.0.
on_error (str) – ‘ignore’ or ‘raise’
- Returns:
estimated current latitude, estimated current longitude
- Return type:
Tuple[float, float]
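A dead-reckoning estimate of this kind can be sketched with the standard great-circle destination formula. The code below is an illustration under assumptions, not the module's implementation: the function name, the spherical Earth radius and the longitude normalisation are choices made for this sketch, and the on_error parameter is left out.

```python
import math
from typing import Optional, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; an assumption of this sketch
METERS_PER_NAUTICAL_MILE = 1852.0


def dead_reckon(
    last_latitude: float,
    last_longitude: float,
    course: float,
    speed: float,
    hours_since_last_position: float,
    max_hours_since_last_position: float = 2.0,
) -> Optional[Tuple[float, float]]:
    """Project the last position along the course at constant speed (knots)."""
    # Positions in the future or older than the limit are not extrapolated.
    if not 0 <= hours_since_last_position <= max_hours_since_last_position:
        return None
    distance_m = speed * hours_since_last_position * METERS_PER_NAUTICAL_MILE
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians
    theta = math.radians(course)
    lat1 = math.radians(last_latitude)
    lon1 = math.radians(last_longitude)
    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(theta)
    )
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    longitude = (math.degrees(lon2) + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return math.degrees(lat2), longitude
```

A vessel heading due north at 60 knots for one hour covers 60 nautical miles, which is close to one degree of latitude, so the sketch is easy to sanity-check by hand.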
- pipeline.src.helpers.spatial.get_h3_indices(df: pandas.DataFrame, lat: str = 'latitude', lon: str = 'longitude', resolution: int = 12) → pandas.Series
Returns a Series with the same index as the input DataFrame and values equal to the h3 index corresponding to the latitude and longitude in the indicated columns of the DataFrame.
- Parameters:
df (pd.DataFrame) – DataFrame with latitude and longitude coordinates in 2 of its columns
lat (str) – name of the column containing latitudes. Defaults to “latitude”.
lon (str) – name of the column containing longitudes. Defaults to “longitude”.
resolution (int) – h3 resolution of the h3 cells to output.
- Returns:
h3 cells indices
- Return type:
pd.Series
- pipeline.src.helpers.spatial.get_k_ring_of_h3_cells(h3_sequence: Iterable[str], k: int) → Set[str]
Takes a list-like sequence of h3 cells and an integer k, returns the set of h3 cells that belong to the k-ring of at least one of the h3 cells in the input sequence.
- Parameters:
h3_sequence (sequence) – sequence of h3 cells
k (int) – number of rings to add around the input cells
- Returns:
set of h3 cells belonging to the k-ring of at least one of the h3 cells in the input sequence
- Return type:
Set[str]
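The "k-ring of at least one input cell" semantics amount to a set union over per-cell k-rings. The sketch below shows that pattern with an injected neighbourhood function; both k_ring_union and toy_k_ring are hypothetical names, and the toy function works on a one-dimensional grid of integer labels rather than on real h3 cells. In practice the callable would be the k-ring function of the h3 library, whose name depends on the installed version.

```python
from typing import Callable, Iterable, Set


def k_ring_union(
    cells: Iterable[str],
    k: int,
    k_ring: Callable[[str, int], Iterable[str]],
) -> Set[str]:
    """Union of the k-rings of every cell in the input sequence."""
    result: Set[str] = set()
    for cell in cells:
        result |= set(k_ring(cell, k))
    return result


def toy_k_ring(cell: str, k: int) -> Set[str]:
    """Toy neighbourhood on a 1-D grid, used only to make the sketch runnable."""
    center = int(cell)
    return {str(i) for i in range(center - k, center + k + 1)}
```

Returning a set means overlapping rings are deduplicated automatically, which matches the documented return type.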
- pipeline.src.helpers.spatial.point_dist(position1: Position, position2: Position) → float
Computes the spherical distance between two Position objects in meters.
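A spherical distance in meters can be computed with the haversine formula, sketched below. This is a stand-in for illustration: the Position tuple and its field names are hypothetical, the Earth radius is an assumed mean value, and the module itself may delegate to a library function instead.

```python
import math
from typing import NamedTuple


class Position(NamedTuple):
    """Hypothetical stand-in for the module's Position class."""
    latitude: float
    longitude: float


def haversine_m(p1: Position, p2: Position, radius_m: float = 6_371_008.8) -> float:
    """Great-circle distance between two positions on a sphere, in meters."""
    phi1 = math.radians(p1.latitude)
    phi2 = math.radians(p2.latitude)
    d_phi = phi2 - phi1
    d_lambda = math.radians(p2.longitude - p1.longitude)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * radius_m * math.asin(min(1.0, math.sqrt(a)))
```

One degree of longitude along the equator is roughly 111.2 km with this radius, which gives a quick plausibility check.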
- pipeline.src.helpers.spatial.get_step_distances(df: pandas.DataFrame, lat: str = 'latitude', lon: str = 'longitude', how: str = 'backward', unit: str = 'm') → numpy.array
Compute the distance between successive positions (rows). The DataFrame must have latitude and longitude columns. Returns a numpy array with the same length as the input DataFrame and distances as values.
- Parameters:
df (pd.DataFrame) – DataFrame with latitude and longitude columns
lat (str) – column name containing latitudes
lon (str) – column name containing longitudes
how (str) – if ‘forward’, computes the interval between each position and the next one. If ‘backward’, computes the interval between each position and the previous one. Defaults to ‘backward’.
unit (str) – the distance unit (passed to h3.great_circle_distance). Defaults to ‘m’.
- Returns:
array of distances between the successive positions.
- Return type:
np.array
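The difference between the ‘forward’ and ‘backward’ modes is where the undefined value lands: the first row has no previous position and the last row has no next one. The plain-Python sketch below illustrates this on a list of (latitude, longitude) pairs. It is not the pandas implementation; the function names are hypothetical, and filling the undefined slot with NaN is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def _haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a sphere of assumed mean radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin((phi2 - phi1) / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * 6_371_008.8 * math.asin(min(1.0, math.sqrt(a)))


def step_distances(
    coords: Sequence[Tuple[float, float]], how: str = "backward"
) -> List[float]:
    """Distances between successive positions, same length as the input."""
    if how not in ("backward", "forward"):
        raise ValueError("how must be 'backward' or 'forward'")
    if not coords:
        return []
    steps = [_haversine_m(*a, *b) for a, b in zip(coords, coords[1:])]
    if how == "backward":
        return [math.nan] + steps  # first row has no previous position
    return steps + [math.nan]  # last row has no next position
```

Both modes contain the same distances, shifted by one row, so the choice only matters when the result is aligned with other per-row columns.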
- pipeline.src.helpers.spatial.compute_movement_metrics(positions: pandas.DataFrame, lat: str = 'latitude', lon: str = 'longitude', datetime_column: str = 'datetime_utc', is_at_port_column: str = 'is_at_port', time_emitting_at_sea_column: str = 'time_emitting_at_sea') → pandas.DataFrame
Takes a pandas DataFrame with:
latitude and longitude columns (float dtypes)
a column indicating the date and time of the position (datetime dtype)
a column indicating whether the vessel is at port (boolean dtype)
a column indicating how long the vessel has been continuously emitting at sea in hours (float dtype)
whose rows represent successive positions of a vessel, assumed to be sorted in ascending chronological order.
Returns a pandas DataFrame with the same index and columns, with:
speed, distance and time between successive positions as additional computed features in new columns
values for time_emitting_at_sea_column computed and updated, so any NULL values in the input are computed and filled in.
- Parameters:
positions (pd.DataFrame) – DataFrame representing a vessel route
lat (str) – column name of latitude values. May not contain null values.
lon (str) – column name of longitude values. May not contain null values.
datetime_column (str) – column name of datetime values. May not contain null values.
is_at_port_column (str) – column indicating whether the vessel is at port. May not contain null values.
time_emitting_at_sea_column (str) – column indicating how long the vessel has been continuously emitting at sea, in hours. May contain null values.
- Returns:
the same DataFrame, plus added columns with the computed features
- Return type:
pd.DataFrame
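The speed feature follows from the other two: distance since the previous position divided by the time since the previous position. The sketch below shows that arithmetic in plain Python, converting meters to knots. The function name is hypothetical, and using None for the first row and for non-positive time gaps is an assumption of this sketch rather than documented behaviour.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence

METERS_PER_NAUTICAL_MILE = 1852.0


def average_speeds_knots(
    datetimes: Sequence[datetime], backward_distances_m: Sequence[float]
) -> List[Optional[float]]:
    """Average speed between each position and the previous one, in knots.

    backward_distances_m[i] is the distance from position i - 1 to position i;
    its first element is ignored because the first row has no predecessor.
    """
    if not datetimes:
        return []
    speeds: List[Optional[float]] = [None]
    for i in range(1, len(datetimes)):
        hours = (datetimes[i] - datetimes[i - 1]).total_seconds() / 3600.0
        if hours <= 0:
            speeds.append(None)  # duplicate or unordered timestamps
        else:
            nautical_miles = backward_distances_m[i] / METERS_PER_NAUTICAL_MILE
            speeds.append(nautical_miles / hours)
    return speeds
```

For example, 9260 m covered in one hour is 5 nautical miles per hour, that is 5 knots.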
- pipeline.src.helpers.spatial.detect_fishing_activity(positions: pandas.DataFrame, minimum_minutes_of_emission_at_sea: int, is_at_port_column: str = 'is_at_port', average_speed_column: str = 'average_speed', time_emitting_at_sea_column: str = 'time_emitting_at_sea', minimum_consecutive_positions: int = 3, min_fishing_speed_threshold: float = 0.1, max_fishing_speed_threshold: float = 4.5, return_floats: bool = False) → pandas.DataFrame
Detects fishing activity from positions of a vessel.
Rows of the input DataFrame represent successive positions of the analyzed vessel, assumed to be sorted in ascending chronological order.
The DataFrame must have columns indicating:
whether the position is at port
the average speed between each position and the previous one, in knots
how long the vessel has been continuously emitting at sea, in hours
A vessel will be considered to be fishing if its average speed remains above the min_fishing_speed_threshold and below the max_fishing_speed_threshold for a minimum of minimum_consecutive_positions positions outside a port and after at least minimum_minutes_of_emission_at_sea minutes of uninterrupted VMS emission outside of a port.
- Parameters:
positions (pd.DataFrame) – DataFrame representing successive positions of a vessel, assumed to be sorted by ascending datetime
minimum_minutes_of_emission_at_sea (int) – the minimum time a vessel is required to emit continuously at sea in order to be considered as in fishing activity, in minutes. This avoids detecting fishing activity when vessels leave ports.
is_at_port_column (str) – name of the column containing boolean values for whether a position is at port or not
average_speed_column (str) – name of the column containing average speed values (distance from previous position divided by time since the last position), in knots
time_emitting_at_sea_column (str) – name of the column containing the duration (in hours) for which the vessel has been continuously emitting at sea outside ports.
minimum_consecutive_positions (int) – minimum number of consecutive positions within the fishing speed thresholds to consider that a vessel is fishing
min_fishing_speed_threshold (float) – speed below which a vessel is considered to be stopped
max_fishing_speed_threshold (float) – speed above which a vessel is considered to be in transit
return_floats (bool) – if True, return float dtypes with 1.0 representing True, 0.0 representing False and np.nan for null values. If False (the default), the return dtype is object and values are True, False and np.nan, which is more explicit and natural but slower.
- Returns:
copy of the input DataFrame with the added boolean column “is_fishing”
- Return type:
pd.DataFrame
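The detection rule can be read as a run-length test over per-position conditions. The plain-Python sketch below is one possible reading, not the module's pandas implementation: the function name is hypothetical, and flagging every position of a qualifying run (rather than, say, only positions after the run is established) is an assumption.

```python
from typing import List, Optional, Sequence


def detect_fishing_sketch(
    average_speeds: Sequence[Optional[float]],
    is_at_port: Sequence[bool],
    hours_emitting_at_sea: Sequence[float],
    minimum_minutes_of_emission_at_sea: int,
    minimum_consecutive_positions: int = 3,
    min_fishing_speed_threshold: float = 0.1,
    max_fishing_speed_threshold: float = 4.5,
) -> List[bool]:
    """Flag positions that belong to a long enough run of fishing-like speeds."""
    # Per-position condition: at sea, speed strictly inside the thresholds,
    # and emitting at sea for long enough (hours converted to minutes).
    candidates = [
        speed is not None
        and not at_port
        and min_fishing_speed_threshold < speed < max_fishing_speed_threshold
        and hours * 60 >= minimum_minutes_of_emission_at_sea
        for speed, at_port, hours in zip(
            average_speeds, is_at_port, hours_emitting_at_sea
        )
    ]
    is_fishing = [False] * len(candidates)
    run_start = None
    # A trailing False sentinel closes a run that reaches the last position.
    for i, candidate in enumerate(candidates + [False]):
        if candidate and run_start is None:
            run_start = i
        elif not candidate and run_start is not None:
            if i - run_start >= minimum_consecutive_positions:
                for j in range(run_start, i):
                    is_fishing[j] = True
            run_start = None
    return is_fishing
```

Requiring a minimum run length filters out isolated slow positions, for instance a vessel briefly slowing down while in transit.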
- pipeline.src.helpers.spatial.enrich_positions(positions: pandas.DataFrame, minimum_minutes_of_emission_at_sea: int, lat: str = 'latitude', lon: str = 'longitude', datetime_column: str = 'datetime_utc', is_at_port_column: str = 'is_at_port', time_emitting_at_sea_column: str = 'time_emitting_at_sea', minimum_consecutive_positions: int = 3, min_fishing_speed_threshold: float = 0.1, max_fishing_speed_threshold: float = 4.5, return_floats: bool = False) → pandas.DataFrame
Applies compute_movement_metrics and detect_fishing_activity successively.
See these two functions for help.
- pipeline.src.helpers.spatial.geocode(query_string=None, country_code_iso2=None, backend: str = 'Nominatim', **kwargs)
Returns latitude, longitude for input location from a query string or from one or more of the following keyword arguments:
street
city
county
state
country
postalcode
- pipeline.src.helpers.spatial.geocode_google(address=None, **kwargs)
Returns latitude, longitude for input location from a query string, with optional filtering on one or more of the following keyword arguments:
postal_code
country (country name or country code ISO2)
route
locality
administrative_area
If address is not given, at least one kwarg must be given.