The Data SDK for Python - Analytics enables exploration, analysis and visualization of the data in your own environment using Jupyter, Python and Spark. Use this SDK to experiment and design data processes and AI before going to production on the platform.
Because production pipelines do not support Python, any Python logic must be re-implemented in Spark or Flink before it goes to production.
The Data SDK for Python is delivered using Conda, a package management service that makes it easy to find, access, store and share public notebooks, environments, and conda and PyPI packages. Conda is popular among Python data practitioners because it makes it easy to keep environments and packages up to date. It also lets users switch between environments without disturbing the compatibility of each environment's dependencies.
After installing conda for the first time, restart your terminal before executing any conda command. This avoids a common issue where the conda command is not recognized the first time.
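One way to confirm that the restarted terminal can see conda is to look it up on the PATH. The following is a minimal sketch using only the Python standard library; the helper name find_conda is illustrative and not part of the SDK.

```python
import shutil


def find_conda():
    """Return the full path of the conda executable, or None if it is not on PATH."""
    return shutil.which("conda")


conda_path = find_conda()
if conda_path is None:
    # Typical right after a first-time install, before the terminal is restarted.
    print("conda was not found on PATH; restart the terminal and try again.")
else:
    print(f"conda found at {conda_path}")
```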
To begin, download the installation script.
To confirm that your credentials are in place and that the environment is ready for the SDK, run the following command:
python sdk_setup.py -v
(Windows only) Start the terminal as an administrator to avoid errors caused by insufficient rights or privileges.
To install the SDK, use the following commands:
To install the latest version of the SDK in the default environment:
python sdk_setup.py -i
To install a specific version of the SDK in the default environment:
python sdk_setup.py -i <sdk_version>
To install the latest version of the SDK in a specified environment:
python sdk_setup.py -i -n <yourenvname>
To install a specific version of the SDK in a specified environment:
python sdk_setup.py -i <sdk_version> -n <yourenvname>
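The four install variants above differ only in whether a version and an environment name are supplied. As a rough sketch, the helper below assembles the matching command line from those two optional values. The function name build_setup_command and the example values "1.12" and "my-env" are illustrative assumptions; only the -i and -n flags come from the commands shown above.

```python
def build_setup_command(sdk_version=None, env_name=None, script="sdk_setup.py"):
    """Assemble the install command as a list of arguments.

    sdk_version and env_name are optional; leaving both out yields the
    command that installs the latest version in the default environment.
    """
    command = ["python", script, "-i"]
    if sdk_version:
        # A specific version follows the -i flag directly.
        command.append(sdk_version)
    if env_name:
        # A target environment is passed with -n.
        command.extend(["-n", env_name])
    return command


# Print the four variants; "1.12" and "my-env" are placeholder values.
for version, env in [(None, None), ("1.12", None), (None, "my-env"), ("1.12", "my-env")]:
    print(" ".join(build_setup_command(version, env)))
```

The list form can be handed to subprocess.run unchanged, which avoids shell quoting issues if the environment name contains unusual characters.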
Activate the conda environment:
conda activate olp-sdk-for-python-1.12-env
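To check from Python that the activation took effect, you can read the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated. This is a small sketch; the helper names are illustrative, and the expected name assumes the default environment shown above rather than a custom one passed with -n.

```python
import os

# Default environment name from the activation command above.
EXPECTED_ENV = "olp-sdk-for-python-1.12-env"


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def is_sdk_env_active(environ=None, expected=EXPECTED_ENV):
    """Return True when the active conda environment matches the expected name."""
    return active_conda_env(environ) == expected


if is_sdk_env_active():
    print(f"{EXPECTED_ENV} is active.")
else:
    print(f"Active environment is {active_conda_env()!r}, expected {EXPECTED_ENV!r}.")
```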
Go to your home directory and start Jupyter:
jupyter notebook --NotebookApp.iopub_data_rate_limit=1000000000 --ip=0.0.0.0
jupyter notebook --NotebookApp.iopub_data_rate_limit=1000000000
Change to your home directory first (cd ~/) so that you can access all the files in your home folder when Jupyter starts.
To stop Jupyter, press CTRL + C in the same console where you started it.
If you work with the JupyterLab "desktop" instead of the "classic" Jupyter notebooks, use this command to start Jupyter:
jupyter lab --NotebookApp.iopub_data_rate_limit=1000000000 --ip=0.0.0.0
jupyter lab --NotebookApp.iopub_data_rate_limit=1000000000
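The Jupyter start commands above come in two variants that differ only in the --ip=0.0.0.0 option, which makes the server listen on all network interfaces instead of only the local one. The iopub_data_rate_limit option raises the limit on how much output data a notebook may send per second. The sketch below rebuilds the four command lines from those two choices; the function name jupyter_command and its parameters are illustrative, not part of the SDK or of Jupyter.

```python
def jupyter_command(frontend="notebook", all_interfaces=False, rate_limit=1000000000):
    """Assemble a Jupyter start command as a list of arguments.

    frontend: "notebook" for classic notebooks or "lab" for JupyterLab.
    all_interfaces: when True, add --ip=0.0.0.0 so the server listens on
        every network interface rather than only on localhost.
    rate_limit: value for NotebookApp.iopub_data_rate_limit.
    """
    if frontend not in ("notebook", "lab"):
        raise ValueError("frontend must be 'notebook' or 'lab'")
    command = ["jupyter", frontend, f"--NotebookApp.iopub_data_rate_limit={rate_limit}"]
    if all_interfaces:
        command.append("--ip=0.0.0.0")
    return command


# Print the same four commands that appear above.
for frontend in ("notebook", "lab"):
    for all_interfaces in (True, False):
        print(" ".join(jupyter_command(frontend, all_interfaces)))
```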
With JupyterLab you will benefit from installing a few additional JupyterLab extensions. These will either render files in some frequently used formats (e.g. HTML or GeoJSON) or some computed output (like Leaflet map cells) directly inside JupyterLab:
jupyter labextension install @mflevine/jupyterlab_html
jupyter labextension install @jupyterlab/geojson-extension
jupyter labextension install jupyter-leaflet
jupyter labextension install @jupyter-widgets/jupyterlab-manager
You might also be able to install these inside JupyterLab using its interactive Extension Manager.
Explore the Data SDK for Python API reference by opening the HTML docs at the locations below:
$HOME/olp-sdk-for-python-1.12/documentation/Data SDK for Python API Reference.html.
%USERPROFILE%\olp-sdk-for-python-1.12\documentation\Data SDK for Python API Reference.html.
We recommend opening this documentation directly in Chrome or Firefox rather than in Jupyter or Internet Explorer.
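Both locations above are the same path relative to the user's home directory, so a single lookup can resolve it on any platform. The following sketch uses pathlib and the standard webbrowser module. The helper name api_reference_path is illustrative, and webbrowser.open uses the system default browser, which may not be Chrome or Firefox.

```python
from pathlib import Path
import webbrowser


def api_reference_path(home=None):
    """Return the expected location of the API reference HTML file.

    home defaults to the current user's home directory, which corresponds
    to $HOME or %USERPROFILE% depending on the platform.
    """
    home = Path(home) if home is not None else Path.home()
    return (
        home
        / "olp-sdk-for-python-1.12"
        / "documentation"
        / "Data SDK for Python API Reference.html"
    )


doc_path = api_reference_path()
if doc_path.exists():
    # Opens in the system default browser.
    webbrowser.open(doc_path.as_uri())
else:
    print(f"API reference not found at {doc_path}")
```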
The tutorial notebooks included with the SDK are located in the folder:
We recommend reading the Getting Started notebook to get an overview of all of the tutorial notebooks:
Thank you for choosing the HERE Data SDK for Python. After the setup, kindly consider filling out this short 1-minute survey to help us improve the setup experience.