Using Pandas and GeoPandas

A GeoPandasAdapter is provided in the package here-geopandas-adapter to ease working with the Pandas and GeoPandas libraries. Once imported, instantiated and enabled in the Platform, many read and write functions of the HERE Data SDK for Python accept and return pd.DataFrame, pd.Series, gpd.GeoDataFrame and gpd.GeoSeries in place of Python list and dict objects.

Enabling the Adapter

The HERE GeoPandas Adapter can be applied in any of three ways:

  • to all read/write operations
  • on a per-catalog basis
  • on a per-function-call basis

Below we illustrate these three options.

To have the adapter apply to all catalogs and other entities created through a Platform object, you can specify adapter when instantiating that Platform object:

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())

It's also possible to enable the adapter only for selected catalogs, specifying it in the corresponding get_catalog call:

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform()
adapter = GeoPandasAdapter()

# These catalogs use the adapter
weather_eu = platform.get_catalog('hrn:here:data::olp-here:live-weather-eu', adapter=adapter)
weather_na = platform.get_catalog('hrn:here:data::olp-here:live-weather-na', adapter=adapter)

# This catalog does not
sdii = platform.get_catalog("hrn:here:data::olp-here:olp-sdii-sample-berlin-2")

Lastly, it's also possible to specify the adapter for individual function calls:

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform()
adapter = GeoPandasAdapter()

weather_na = platform.get_catalog('hrn:here:data::olp-here:live-weather-na')
live_layer = weather_na.get_layer('latest-data')

# This function uses the adapter
weather_df = live_layer.read_partitions([75477, 75648, 75391, 75562], adapter=adapter, record_path="weather_condition_tile")

# This function does not
weather_msgs = live_layer.read_partitions([75477, 75648, 75391, 75562])

Read to DataFrame

To read data and metadata from versioned, volatile, index, stream and interactive map layers, please familiarize yourself first with the read functions described in the corresponding section of this user guide.

All the standard parameters of get_partitions_metadata, read_partitions, get_stream_metadata, read_stream, get_features, iter_features are supported, in addition to adapter-specific parameters that are forwarded to the adapter and its data decoder.

When reading and decoding data, adapter-specific parameters are passed to the pd.read_csv, pd.read_parquet and similar Pandas functions that perform the actual decoding of each single partition. You can use them to fine-tune the decoding of single partitions, including how to handle the (Geo)DataFrame index, if present in the data. The GeoPandasAdapter then assembles the output of all partitions into a single DataFrame. When reading multiple partitions at once, the partition name is saved in a partition_id column to distinguish data read from one partition from data read from another. The actual name of the partition_id column can be configured in the GeoPandasAdapter constructor, together with other parameters that fine-tune decoding of specific formats, such as content following a Protocol Buffers schema. For more information on supported content types and exact parameters, please see the documentation of GeoPandasDecoder.
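
For example, for a layer whose content type is CSV, decoding keyword arguments are forwarded to pd.read_csv. The following is a minimal sketch: the catalog HRN, layer ID and partition names are placeholders, and a CSV-encoded versioned layer is assumed.

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())

# Placeholder catalog and layer: any versioned layer with CSV content type
csv_layer = platform.get_catalog("hrn:here:data::example:my-catalog").get_layer("my-csv-layer")

# sep and dtype are forwarded to pd.read_csv, which decodes each single partition
csv_df = csv_layer.read_partitions(partition_ids=["p1", "p2"], sep=";", dtype={"id": str})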

In case decode=False is passed to read_partitions or read_stream, no decoding takes place, the adapter is not used, and a plain Python collection or iterator containing bytes is returned.
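
For instance, continuing the sketch above, the raw, undecoded bytes of the same partitions can be fetched as follows:

# decode=False bypasses the adapter: the result is a plain Python
# collection of raw bytes per partition, not a (Geo)DataFrame
raw_partitions = csv_layer.read_partitions(partition_ids=["p1", "p2"], decode=False)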

Get Partitions Data and Metadata from Versioned Layer in a DataFrame

Use get_partitions_metadata to obtain partitions metadata. When the GeoPandasAdapter is enabled, a pd.DataFrame is returned instead of a list or dict as shown in the example below.

Example: getting versioned metadata in a DataFrame

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
sdii_catalog = platform.get_catalog("hrn:here:data::olp-here:olp-sdii-sample-berlin-2")
versioned_layer = sdii_catalog.get_layer("sample-versioned-layer")

partitions_df = versioned_layer.get_partitions_metadata([377894434, 377894435, 377894440, 377894441])

Partitions metadata are returned in a DataFrame that is not indexed.

id data_handle checksum data_size crc
0 377894434 e2eefcae-e695-4f98-8a55-6881ca1ef52d 7697
1 377894435 da494218-e5b9-4538-9860-624864a718a7 11963
2 377894440 ef395fe1-51b4-4909-bd3c-3883d88d66b3 569494
3 377894441 a5e1f634-7fbb-43f6-bbdb-7e91edc67879 342066

Use read_partitions to fetch and decode the data. When the GeoPandasAdapter is enabled, a pd.DataFrame or a gpd.GeoDataFrame, depending on the content, is returned instead of a list or dict as shown in the example below.

Example: reading versioned data in a DataFrame

partitions_df = versioned_layer.read_partitions(partition_ids=[377894434, 377894435])

Partitions data are returned in a single pd.DataFrame or gpd.GeoDataFrame that is not indexed. Data of multiple partitions are all included in the same output, with a partition_id column added to disambiguate. The names of the columns depend on the content type, schema and actual content of the layer. If no partition_ids are provided, the whole layer is read.

This specific example reads content encoded in Protobuf format.

partition_id tileId messages refs
0 377894434 377894434 [{'messageId': 'ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64', 'message': {'envelope': {'version': '1.0', 'submitter': 'Probe Ro []
1 377894435 377894435 [{'messageId': '4418dfe4-091e-41fe-bb21-49d6524442af', 'message': {'envelope': {'version': '1.0', 'submitter': 'Probe Ro []

(text truncated for clarity)

Depending on the content type and actual schema, the returned DataFrame may be directly usable or may require further manipulation to bring it into a usable form. CSV, GeoJSON, Parquet and schemaless content types are automatically decoded and converted to the best possible format for the user. For example, GeoJSON is decoded into a gpd.GeoDataFrame. Protobuf-encoded data usually has nested, composite and repeated fields, which decode to lists, dictionaries, and other complex data structures.

The documentation of GeoPandasDecoder illustrates the parameters that can be used to fine-tune the decoding and improve the resulting output for every content type, in particular for Protobuf-encoded data. A very common one is the record_path parameter: when specified, only content at that path is decoded. If the field at the given path happens to be a repeated field, the function returns multiple rows per partition. Dictionaries are also unpacked automatically into multiple columns, when possible.

Continuing the example above, we read again the same partitions, specifying the record_path parameter and selecting only some columns for clarity:

columns = ["messageId", "message.envelope.transientVehicleUUID", "message.path.positionEstimate", "metadata.receivedTime"]

messages_df = versioned_layer.read_partitions(partition_ids=[377894434, 377894435], record_path="messages", columns=columns)

results in:

partition_id messageId message.envelope.transientVehicleUUID message.path.positionEstimate metadata.receivedTime
0 377894434 ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 [{'timeStampUTC_ms': '1506403044000', 'positionTyp 1507151512491
1 377894434 eaa76f08-ed02-4893-b524-9bde9296b9f9 eaa76f08-ed02-4893-b524-9bde9296b9f9 [{'timeStampUTC_ms': '1506402922000', 'positionTyp 1507151512491
2 377894434 a86fb17f-27a6-4e47-b2fb-77ec61000625 a86fb17f-27a6-4e47-b2fb-77ec61000625 [{'timeStampUTC_ms': '1506403015000', 'positionTyp 1507151512491
3 377894434 79bba846-b804-4026-a980-7d4045e7a493 79bba846-b804-4026-a980-7d4045e7a493 [{'timeStampUTC_ms': '1506403037000', 'positionTyp 1507151512491
4 377894434 cc71d131-e8ed-4269-b1d1-d9c4c3108408 cc71d131-e8ed-4269-b1d1-d9c4c3108408 [{'timeStampUTC_ms': '1506402944000', 'positionTyp 1507151512492

(text and rows truncated for clarity)

The partition_id column is always added automatically after decoding.

The column message.path.positionEstimate contains a list that can be further processed, turning the DataFrame from one row per message into one row per position estimate:

from here.geopandas_adapter.utils.dataframe import unpack_columns

estimates_df = messages_df[["messageId", "message.path.positionEstimate"]].explode("message.path.positionEstimate")
estimates_df = unpack_columns(estimates_df, "message.path.positionEstimate", keep_prefix=False)

results in:

messageId timeStampUTC_ms positionType longitude_deg latitude_deg horizontalAccuracy_m heading_deg speed_mps mapMatchedLinkID mapMatchedLinkIDOffset_m
0 ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 1506403044000 RAW_GPS 13.3611 52.5099 0 90.8589 16 175536727 0
0 ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 1506403046000 RAW_GPS 13.3616 52.5099 0 91.4001 16 175536727 32
0 ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 1506403048000 RAW_GPS 13.3621 52.5098 0 91.5694 16 175536727 64
0 ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 1506403050000 RAW_GPS 13.3625 52.5098 0 91.5694 16 175536727 92.1063
1 eaa76f08-ed02-4893-b524-9bde9296b9f9 1506402922000 RAW_GPS 13.3731 52.5092 0 85.7321 16 180105322 0

(columns and rows truncated for clarity)

Get Partitions Data and Metadata from Volatile Layer in a DataFrame

Use get_partitions_metadata to obtain partitions metadata. When the GeoPandasAdapter is enabled, a pd.DataFrame is returned instead of a list or dict as shown in the example below.

Example: getting volatile metadata in a DataFrame

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
weather_catalog = platform.get_catalog('hrn:here:data::olp-here:live-weather-eu')
volatile_layer = weather_catalog.get_layer('latest-data')

partitions_df = volatile_layer.get_partitions_metadata(partition_ids=[81150, 81151])

Partitions metadata are returned in a DataFrame that is not indexed.

id data_handle checksum data_size crc
0 81150 81150
1 81151 81151

Use read_partitions to fetch and decode the data. When the GeoPandasAdapter is enabled, a pd.DataFrame or a gpd.GeoDataFrame, depending on the content, is returned instead of a list or dict as shown in the example below.

Note

Volatile metadata and underlying data can occasionally be out of sync. When this occurs, metadata may indicate that data exists in a given partition even though no data currently resides there. If you call read_partitions and one or more of the requested partitions do not exist or contain no data, no rows are added to the returned DataFrame for those partitions. This can result in an empty DataFrame being returned.
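
A defensive check can simply test the result for emptiness; a sketch, assuming partitions_df was obtained from read_partitions as in the example below:

# No rows are added for partitions that currently hold no data, so the
# result may be empty even though metadata listed the partitions
if partitions_df.empty:
    print("Requested partitions currently contain no data")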

Example: reading volatile data in a DataFrame

partitions_df = volatile_layer.read_partitions(partition_ids=[81150, 81151], record_path="weather_condition_tile")

Partitions data are returned in a single pd.DataFrame or gpd.GeoDataFrame that is not indexed. Data of multiple partitions are all included in the same output, with a partition_id column added to disambiguate. The names of the columns depend on the content type, schema and actual content of the layer. If no partition_ids are provided, the whole layer is read.

This specific example reads content encoded in Protobuf format.

columns = ["tile_id",
           "center_point_geohash",
           "air_temperature.value",
           "dew_point_temperature.value",
           "humidity.value",
           "air_pressure.value",
           "visibility.value",
           "iop.value",
           "wind_velocity.value",
           "wind_velocity.direction",
           "precipitation_type.precipitation_type"]

partitions_df = volatile_layer.read_partitions(partition_ids=[81150, 81151], record_path="weather_condition_tile", columns=columns)

In this example we select only some columns obtained from the Protobuf repeated field weather_condition_tile, resulting in:

partition_id tile_id center_point_geohash air_temperature.value dew_point_temperature.value humidity.value air_pressure.value visibility.value iop.value wind_velocity.value wind_velocity.direction precipitation_type.precipitation_type
0 81150 332391761 g7ybnf00 4.83 2 82.09 1003.09 9.99 0 33.5 22.81 NONE
1 81150 332391760 g7ybn600 4.84 2 82.04 1003.08 9.99 0 33.47 22.73 NONE
2 81150 332391767 g7ybpy00 4.8 2 82.26 1003.12 9.99 0 33.62 23.09 NONE
3 81150 332391765 g7ybpf00 4.81 2 82.18 1003.11 9.99 0 33.57 22.97 NONE
4 81150 332391764 g7ybp600 4.82 2 82.14 1003.1 9.99 0 33.53 22.89 NONE

(rows truncated for clarity)

Get Partitions Data and Metadata from Index Layer in a DataFrame

Use get_partitions_metadata to obtain partitions metadata. When the GeoPandasAdapter is enabled, a pd.DataFrame is returned instead of a list or dict as shown in the example below.

Example: getting index metadata in a DataFrame

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
sdii_catalog = platform.get_catalog("hrn:here:data::olp-here:olp-sdii-sample-berlin-2")
index_layer = sdii_catalog.get_layer("sample-index-layer")

partitions_df = index_layer.get_partitions_metadata(query="hour_from=ge=10")

Partitions metadata are returned in a DataFrame that is not indexed. The data handle is used in place of the partition id, since the index layer doesn't have a proper identifier for partitions.

id data_handle checksum data_size crc
0 1d63cfb6-5b79-455a-8fda-1503b99253e3 1d63cfb6-5b79-455a-8fda-1503b99253e3 0353f45622ac843ccabbc8af4ce6739d5baf171a 290391
1 1f9c8d0a-2519-4cd8-af4a-0fd0fa16b047 1f9c8d0a-2519-4cd8-af4a-0fd0fa16b047 1a1472a4de647291da7498407b59a2011af6c25c 113261
2 2f9c978d-b6bc-4889-b7d4-a47849fb6a17 2f9c978d-b6bc-4889-b7d4-a47849fb6a17 74b94f931c3bda3a7500eadaf34506445c0a10ba 356674
3 2fed9456-7275-4786-b600-0c4865854b79 2fed9456-7275-4786-b600-0c4865854b79 ad68c63881bfeae3635d64270df4e13202049f54 115175
4 3b0c053b-8988-4621-92d7-9daf65e7d4a7 3b0c053b-8988-4621-92d7-9daf65e7d4a7 e7aca6afb0a37ed46d9e11a8c2ed73afa9eae1d0 114945

Use read_partitions to fetch and decode the data. When the GeoPandasAdapter is enabled, a pd.DataFrame or a gpd.GeoDataFrame, depending on the content, is returned instead of a list or dict as shown in the example below. If no partition_ids are provided, the whole layer is read.

Example: reading index data in a DataFrame

partitions_df = index_layer.read_partitions(query="hour_from=ge=10")

Partitions data are returned in a single pd.DataFrame or gpd.GeoDataFrame that is not indexed. Data of multiple partitions are all included in the same output, with a partition_id column added to disambiguate. The names of the columns depend on the content type, schema and actual content of the layer. The data handle is used in place of the partition id, since the index layer doesn't have a proper identifier for partitions.

partition_id envelope path pathEvents pathMedia
0 1d63cfb6-5b79-455a-8fda-1503b99253e3 {'version': '1.0', 'submitter': 'Probe Route Simul {'positionEstimate': array([{'timeStampUTC_ms': 15 {'vehicleStatus': None, 'vehicleDynamics': None, '
1 1d63cfb6-5b79-455a-8fda-1503b99253e3 {'version': '1.0', 'submitter': 'Probe Route Simul {'positionEstimate': array([{'timeStampUTC_ms': 15 {'vehicleStatus': None, 'vehicleDynamics': None, '
2 1d63cfb6-5b79-455a-8fda-1503b99253e3 {'version': '1.0', 'submitter': 'Probe Route Simul {'positionEstimate': array([{'timeStampUTC_ms': 15 {'vehicleStatus': None, 'vehicleDynamics': None, '
3 1d63cfb6-5b79-455a-8fda-1503b99253e3 {'version': '1.0', 'submitter': 'Probe Route Simul {'positionEstimate': array([{'timeStampUTC_ms': 15 {'vehicleStatus': None, 'vehicleDynamics': None, '
4 1d63cfb6-5b79-455a-8fda-1503b99253e3 {'version': '1.0', 'submitter': 'Probe Route Simul {'positionEstimate': array([{'timeStampUTC_ms': 15 {'vehicleStatus': None, 'vehicleDynamics': None, '

(text and rows truncated for clarity)

In this specific example, as demonstrated for other layer types and described in detail in the section Manipulate DataFrames and GeoDataFrames, it's convenient to use the unpack_columns function to further unpack the dictionaries into proper columns:

from here.geopandas_adapter.utils import dataframe

columns = ["partition_id", "pathEvents"]

events_df = dataframe.unpack_columns(partitions_df[columns], ["pathEvents"], keep_prefix=False)

resulting in:

partition_id vehicleStatus vehicleDynamics signRecognition laneBoundaryRecognition exceptionalVehicleState proprietaryInfo environmentStatus
0 1d63cfb6-5b79-455a-8fda-1503b99253e3 [{'timeStampUTC_ms': 1506402914000, 'positionOffse
1 1d63cfb6-5b79-455a-8fda-1503b99253e3 [{'timeStampUTC_ms': 1506403395000, 'positionOffse
2 1d63cfb6-5b79-455a-8fda-1503b99253e3 [{'timeStampUTC_ms': 1506403082000, 'positionOffse
3 1d63cfb6-5b79-455a-8fda-1503b99253e3 None
4 1d63cfb6-5b79-455a-8fda-1503b99253e3 [{'timeStampUTC_ms': 1506403131000, 'positionOffse

(text, columns and rows truncated for clarity)

Get Partitions Data and Metadata from Stream Layer in a DataFrame

Use get_stream_metadata to consume partitions metadata from a stream subscription. When the GeoPandasAdapter is enabled, a pd.DataFrame is returned instead of a list or dict as shown in the example below.

Example: getting stream metadata in a DataFrame

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
sdii_catalog = platform.get_catalog("hrn:here:data::olp-here:olp-sdii-sample-berlin-2")
stream_layer = sdii_catalog.get_layer("sample-streaming-layer")

with stream_layer.subscribe() as subscription: 
    partitions_df = stream_layer.get_stream_metadata(subscription=subscription)

Partitions metadata (stream messages) are returned in a DataFrame that is not indexed. Data can be inlined, as in this example, or stored via the Blob API if too large.

id data_handle data_size data checksum crc timestamp kafka_partition kafka_offset
0 c755c5f5-3e01-4398-a3cd-f9a99393b5b4 b'\nB\n\x031.0\x12\x15Probe Route Simula c5b9d6040e7cb1ca805f20e26e3c5e3f818d3cc59b9f637c443b9b7b90018fa0 2021-11-26 14:00:52.695000 3 18856435
1 b69f5967-1408-44d9-9f2a-6e6fd4ec274a b'\nB\n\x031.0\x12\x15Probe Route Simula bff2e955dff1d35c0a52916aafce8200ebf876c8055204b56d688929fae4ff70 2021-11-26 14:00:57.833000 3 18856436
2 14eb5324-1c3b-44dc-8632-47cfa1dc051e b'\nB\n\x031.0\x12\x15Probe Route Simula 2463cf999a2d97d991adef6af957ed34a3902a1619b3b6f447c4f61c2dd162b6 2021-11-26 14:01:01.933000 3 18856437
3 03c70b04-1f15-46a2-8745-15793cac4eb5 b'\nB\n\x031.0\x12\x15Probe Route Simula ee4432e0d4a6d52727ab4c1ea38d61672172b30dd90598f3f9b7d082a601f3ab 2021-11-26 14:01:05.037000 3 18856438
4 2ba84d9e-a4fd-44b5-980b-8db2f04d80b6 b'\nB\n\x031.0\x12\x15Probe Route Simula be4406f678f4ae882fe85e153f62ebab55270772dea094eae49a11358c6dd222 2021-11-26 14:01:11.253000 3 18856439

(text and rows truncated for clarity)

Use read_stream to consume, fetch and decode the data from a stream subscription. When the GeoPandasAdapter is enabled, a pd.DataFrame or a gpd.GeoDataFrame, depending on the content, is returned instead of a list or dict as shown in the example below.

Example: reading stream data in a DataFrame

In this example we show how adapter-specific parameters, such as record_path, can be used to customize the decoding. We're interested in only a selection of the properties of the data.

This specific example reads content encoded in Protobuf format.

with stream_layer.subscribe() as subscription:
    columns = ["timeStampUTC_ms",
               "latitude_deg",
               "longitude_deg",
               "heading_deg",
               "speed_mps"]

    partitions_df = stream_layer.read_stream(subscription=subscription, record_path="path.positionEstimate", columns=columns)

Partitions data are returned in a single pd.DataFrame or gpd.GeoDataFrame that is not indexed. Data of multiple partitions are all included in the same output, with a partition_id column added to disambiguate. The names of the columns depend on the content type, schema and actual content of the layer.

partition_id partition_timestamp timeStampUTC_ms latitude_deg longitude_deg heading_deg speed_mps
0 ae93f978-777a-4afe-ab08-993162ef934a 2021-11-26 13:56:18.727000 1637934814720 52.5263 13.3499 276.471 16
1 ae93f978-777a-4afe-ab08-993162ef934a 2021-11-26 13:56:18.727000 1637934816720 52.5263 13.3496 268.154 16
2 ae93f978-777a-4afe-ab08-993162ef934a 2021-11-26 13:56:18.727000 1637934818720 52.5263 13.3491 268.179 16
3 ae93f978-777a-4afe-ab08-993162ef934a 2021-11-26 13:56:18.727000 1637934820720 52.5263 13.3486 268.946 16
4 ae93f978-777a-4afe-ab08-993162ef934a 2021-11-26 13:56:18.727000 1637934822720 52.5263 13.3482 269.345 16

(rows truncated for clarity)

Get Features from Interactive Map Layer in a GeoDataFrame

Use search_features to retrieve features from an interactive map layer. When the GeoPandasAdapter is enabled, a gpd.GeoDataFrame is returned instead of a list or dict, as shown in the example below.

The layer supports other functions, such as get_features and spatial_search, that query and retrieve features from the layer. These functions return a GeoDataFrame as well; a brief get_features sketch follows the first example below.

When running in Jupyter notebooks, a GeoDataFrame enables effortless visual inspection of the features over a map, as demonstrated by using the HERE Inspector in the examples below.

Example: reading features in a GeoDataFrame

In this example we retrieve the districts (Bezirk) of Berlin from a sample catalog and a sample interactive map layer.

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())

sample_catalog = platform.get_catalog("hrn:here:data::olp-here:here-geojson-samples")
iml_layer = sample_catalog.get_layer("berlin-interactivemap")

features_gdf = iml_layer.search_features()

search_features without parameters returns all the content, resulting in:

geometry Bez BezName @ns:com:here:xyz
pjB2hRwTpsW2ZAoP MULTIPOLYGON Z (((13.429401 52.508571 0, 13.429028 01 Mitte {'createdAt': 1629098476655, 'updatedAt': 1629098476655}
bzuUAjSSniAlAza3 MULTIPOLYGON Z (((13.491453 52.488265 0, 13.490708 02 Friedrichshain-Kreuzberg {'createdAt': 1629098476655, 'updatedAt': 1629098476655}
p6PdohLKy98613Yh MULTIPOLYGON Z (((13.523023 52.645034 0, 13.522967 03 Pankow {'createdAt': 1629098476655, 'updatedAt': 1629098476655}
rBPLWN1rBqpn3e48 MULTIPOLYGON Z (((13.34142 52.504867 0, 13.341344 04 Charlottenburg-Wilmersdorf {'createdAt': 1629098476655, 'updatedAt': 1629098476655}
Jawrgifeu6bFL4SE MULTIPOLYGON Z (((13.282182 52.53405 0, 13.282092 05 Spandau {'createdAt': 1629098476655, 'updatedAt': 1629098476655}

(text and rows truncated for clarity)

It's also possible to specify search parameters, as in the following case:

features_gdf = iml_layer.search_features(params={"p.BezName": "Pankow"}, force_2d=True)

resulting in the selection of just one district and the removal of the z-level from the coordinates:

geometry Bez BezName @ns:com:here:xyz
p6PdohLKy98613Yh MULTIPOLYGON (((13.523023 52.645034, 13.522967 52. 03 Pankow {'createdAt': 1629098476655, 'updatedAt': 1629098476655}

(text truncated for clarity)

The result can be rendered directly on a map when running in a Jupyter notebook, for example using the HERE Inspector:

from here.inspector import inspect
from here.inspector.styles import Color

inspect(features_gdf, "Districts of Berlin", style=Color.BLUE)
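
As mentioned above, individual features can also be fetched by identifier with get_features. A minimal sketch, where the feature ID is taken from the sample output above and the parameter name is indicative, not authoritative:

# Hypothetical call: fetch selected features by ID, returned as a gpd.GeoDataFrame
district_gdf = iml_layer.get_features(feature_ids=["pjB2hRwTpsW2ZAoP"])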

Example: geospatial search of features in a GeoDataFrame

In this example we query the districts of Berlin within a 1000 m distance from a city landmark, the Zoologischer Garten railway station, located at the coordinates given in the query.

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())

sample_catalog = platform.get_catalog("hrn:here:data::olp-here:here-geojson-samples")
iml_layer = sample_catalog.get_layer("berlin-interactivemap")

features_gdf = iml_layer.spatial_search(lng=13.33474, lat=52.50686, radius=1000)

resulting in:

geometry Bez BezName @ns:com:here:xyz
pjB2hRwTpsW2ZAoP MULTIPOLYGON Z (((13.429401 52.508571 0, 13.429028 01 Mitte {'createdAt': 1629098476655, 'updatedAt': 1629098476655}
rBPLWN1rBqpn3e48 MULTIPOLYGON Z (((13.34142 52.504867 0, 13.341344 04 Charlottenburg-Wilmersdorf {'createdAt': 1629098476655, 'updatedAt': 1629098476655}
jLrIE0BxQ6vj5U2a MULTIPOLYGON Z (((13.427455 52.38578 0, 13.426965 07 Tempelhof-Schöneberg {'createdAt': 1629098476655, 'updatedAt': 1629098476655}

The result can be rendered directly in a Jupyter notebook using:

from here.inspector import inspect
from here.inspector.styles import Color

inspect(features_gdf, "Districts within 1000m from Berlin Zoologischer Garten railway station", style=Color.RED)

Write DataFrame to Layer

To write data and metadata to versioned, volatile, index, stream and interactive map layers, please familiarize yourself first with the write functions described in the corresponding section of this user guide.

For content types supported by the GeoPandas Adapter (see the documentation of GeoPandasEncoder for the complete list), the contents of a DataFrame or GeoDataFrame can be encoded and written to a layer with a single function call. For content types not supported, you will need to pass encode=False and take care of the encoding yourself.

All the standard parameters of set_partitions_metadata, write_partitions, append_stream_metadata, write_stream, write_features, update_features, delete_features are supported, in addition to adapter-specific parameters that are forwarded to this adapter and its data encoder.

When writing and encoding data, the GeoPandasAdapter splits the (Geo)DataFrame to be written into partitions according to the partition_id column. Each group of rows is then encoded and stored as a standalone partition. Rows with no partition identifier set are discarded. Adapter-specific parameters are passed to the DataFrame.to_csv, DataFrame.to_parquet and similar functions that perform the actual encoding of each single partition. You can use them to fine-tune the encoding of single partitions, including how to handle the (Geo)DataFrame index. For more information on supported content types and exact parameters, please see the documentation of GeoPandasEncoder.
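
For example, the following sketch writes a small DataFrame to a versioned layer. The catalog HRN and layer ID are placeholders, a CSV content type is assumed, and the exact write_partitions signature is described in the write section of this guide:

import pandas as pd

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())

# Placeholder catalog and layer: any writable layer with CSV content type
csv_layer = platform.get_catalog("hrn:here:data::example:my-catalog").get_layer("my-csv-layer")

# Rows are grouped by the partition_id column: this DataFrame produces two partitions
df = pd.DataFrame({
    "partition_id": ["p1", "p1", "p2"],
    "id": [1, 2, 3],
    "value": [0.5, 1.5, 2.5],
})

# index=False is forwarded to DataFrame.to_csv, which encodes each single partition
csv_layer.write_partitions(df, index=False)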

In case encode=False is passed to write_partitions or write_stream, the adapter is not used and no encoding takes place: a plain Python collection containing bytes, not a (Geo)DataFrame, must be passed.
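
A sketch of the unencoded variant, reusing the placeholder layer above; the exact shape of the accepted collection is documented in the write section of this guide:

# With encode=False the adapter is bypassed: pass raw bytes per partition,
# here as a mapping of partition id to already-encoded content
csv_layer.write_partitions({"p1": b"id;value\n1;0.5\n2;1.5\n"}, encode=False)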

Further write examples are symmetric to the read examples shown above.

Manipulate DataFrames and GeoDataFrames

The commonly used Pandas and GeoPandas libraries are well documented, and many examples showing how to use them to perform data analysis and manipulation are publicly available. Generally, data is in a tabular representation where each cell of the table contains one value with a defined data type (numeric, string, or other basic type).

Map data and, in general, data stored in a catalog can sometimes be highly structured and follow a complex, nested schema. Dealing with this complexity in Pandas can be difficult. Therefore, the HERE Data SDK for Python includes in the here-geopandas-adapter package utility functions to perform repetitive tasks and manipulate complex DataFrames, in particular DataFrames with columns that contain dictionaries instead of single values.

Unpacking Series and DataFrames

Pandas provides the explode function to turn objects of type list contained in a column into multiple rows. Similarly, the HERE Data SDK for Python provides the unpack and unpack_columns functions to turn single columns containing dict objects into multiple columns. These are convenience functions to unpack data structures that sometimes result from reading data from catalogs or working with complex data models.

unpack is applied to a Series containing dict objects and returns a DataFrame. unpack_columns is applied to a DataFrame to replace one or more columns that contain dict objects with multiple columns, one for each field of the dictionaries. Unpacking is recursive, to deal easily with deeply nested data structures.

Example: unpacking a DataFrame column that contains dictionaries

Given the example DataFrame df, derived from structured objects:

import pandas as pd

berlin = {
    "name": "Berlin",
    "location": {
        "longitude": 13.408333,
        "latitude": 52.518611,
        "country": { "name": "Deutschland", "code": "DE" }
    },
    "zip_codes": { "min": 10115, "max": 14199 },
    "population": 3664088
}

paris = {
    "name": "Paris",
    "location": {
        "longitude": 2.351667,
        "latitude": 48.856667,
        "country": { "name": "France", "code": "FR" }
    },
    "zip_codes": { "min": 75001, "max": 75020 },
    "population": 2175601
}

df = pd.DataFrame([berlin, paris])

resulting in:

name location zip_codes population
0 Berlin {'longitude': 13.408333, 'latitude': 52.518611, 'country': {'name': 'Deutschland', 'code': 'DE'}} {'min': 10115, 'max': 14199} 3664088
1 Paris {'longitude': 2.351667, 'latitude': 48.856667, 'country': {'name': 'France', 'code': 'FR'}} {'min': 75001, 'max': 75020} 2175601

We can unpack the columns location and zip_codes, which contain dictionaries that would otherwise be difficult to operate on. Unpacking is recursive and also unpacks nested dictionaries, for example country contained in location.

from here.geopandas_adapter.utils.dataframe import unpack_columns

unpacked_df = unpack_columns(df, columns=["location", "zip_codes"])

resulting in:

name location.longitude location.latitude location.country.name location.country.code zip_codes.min zip_codes.max population
0 Berlin 13.4083 52.5186 Deutschland DE 10115 14199 3664088
1 Paris 2.35167 48.8567 France FR 75001 75020 2175601
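
The unpack function works analogously on a single Series; a sketch, assuming unpack is importable from the same module as unpack_columns:

from here.geopandas_adapter.utils.dataframe import unpack

# Turns a Series of dict objects into a DataFrame, one column per key
location_df = unpack(df["location"])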

Replacing a column with one or more columns

The function replace_column can be used to replace a single column of a DataFrame with one or more columns of another DataFrame.

Example: replacing one column with multiple columns

Given the example DataFrames df and df2:

import pandas as pd

df = pd.DataFrame({
    "col_A": [11, 31, 41],
    "col_B": [12, 32, 42],
    "col_C": [14, 34, 42]
}, index = [1, 3, 4])

df2 = pd.DataFrame({
    "col_Bx": [110, 130, 140],
    "col_By": [115, 135, 145]
}, index = [1, 3, 4])

resulting in:

col_A col_B col_C
1 11 12 14
3 31 32 34
4 41 42 42

and:

col_Bx col_By
1 110 115
3 130 135
4 140 145

We can replace col_B with col_Bx and col_By:

from here.geopandas_adapter.utils.dataframe import replace_column

replaced_df = replace_column(df, "col_B", df2)

resulting in:

col_A col_Bx col_By col_C
1 11 110 115 14
3 31 130 135 34
4 41 140 145 42

Adding and removing prefixes to column names

The functions prefix_columns and unprefix_columns are used to add or remove a prefix from the names of selected columns of a DataFrame. A separator . is added between the prefix and column names.

This is useful to group related columns of a DataFrame under a common prefix (prefix_columns) or to remove a lengthy, verbose prefix present in multiple columns (unprefix_columns), obtaining a derived DataFrame that is more comfortable to work with.

Example: prefixing columns with common prefix

Given the example DataFrame df:

import pandas as pd

df = pd.DataFrame({
    "name": ["Sarah", "Vivek", "Marco"],
    "age": [41, 29, 35],
    "house_nr": ["1492", "34-35", "48A"],
    "road": ["SE 36th Ave", "Seshadri Road", "Via Giosuè Carducci"],
    "city": ["Portland", "Bengaluru", "Milan"],
    "zip": [97214, 560009, 20123],
    "state": ["OR", "KA", pd.NA],
    "country": ["US", "IN", "IT"],
})

resulting in:

name age house_nr road city zip state country
0 Sarah 41 1492 SE 36th Ave Portland 97214 OR US
1 Vivek 29 34-35 Seshadri Road Bengaluru 560009 KA IN
2 Marco 35 48A Via Giosuè Carducci Milan 20123 IT

We can group columns that are part of the address, prefixing them with address:

from here.geopandas_adapter.utils.dataframe import prefix_columns

prefixed_df = prefix_columns(df, "address", ["house_nr", "road", "city", "zip", "country", "state"])

resulting in:

name age address.house_nr address.road address.city address.zip address.state address.country
0 Sarah 41 1492 SE 36th Ave Portland 97214 OR US
1 Vivek 29 34-35 Seshadri Road Bengaluru 560009 KA IN
2 Marco 35 48A Via Giosuè Carducci Milan 20123 IT

Example: removing a common prefix

Continuing the example above, we can remove the address prefix and obtain the original DataFrame:

from here.geopandas_adapter.utils.dataframe import unprefix_columns

unprefixed_df = unprefix_columns(prefixed_df, "address")

resulting in:

name age house_nr road city zip state country
0 Sarah 41 1492 SE 36th Ave Portland 97214 OR US
1 Vivek 29 34-35 Seshadri Road Bengaluru 560009 KA IN
2 Marco 35 48A Via Giosuè Carducci Milan 20123 IT

results matching ""

    No results matching ""