Bring your data into the HERE platform using an object store layer

Objectives: Understand how to bring your own data into the HERE platform using an object store layer.

Complexity: Beginner

Time to complete: 30 min

Dependencies: Organize your work in projects

Source code: Download

The example in this tutorial demonstrates how you can bring your data into the HERE platform using the Object store layer.

The Object store layer is a distributed, highly durable key/value store that additionally supports listing its keys. You can find more information about the available layer types in the Data Service Documentation.

This tutorial includes the following steps:

  1. Create a catalog with an object store layer.
  2. Upload a single file from the local file system to the Object store layer, using the OLP Command Line Interface (CLI).
  3. Upload multiple files from the local file system to the Object store layer using Apache Hadoop.

In preparation, you must create a catalog containing an Object store layer.

Create catalog

Follow the steps outlined in Organize your work in projects, using the OLP Command Line Interface (CLI).

Create a file named bring-your-data.json with the following content, and replace {{YOUR_CATALOG_ID}} with an identifier of your choice.

{
  "id": "{{YOUR_CATALOG_ID}}",
  "name": "Tutorial for copying your data from local file system to the Object Store layer",
  "summary": "Tutorial for copying your data from local file system to the Object Store layer",
  "description": "Tutorial for copying your data from local file system to the Object Store layer",
  "tags": ["Hadoop FS Support", "Object store"],
  "layers": [
    {
      "id": "bring-your-data-layer",
      "name": "bring-your-data-layer",
      "summary": "Simulated data.",
      "description": "Simulated data to demonstrate usability of Object store layer",
      "tags": ["Hadoop FS Support", "Object store"],
      "layerType": "objectstore",
      "volume": {
        "volumeType": "durable"
      }
    }
  ]
}

Then run the following command, replacing {{YOUR_CATALOG_ID}} with the same identifier:

olp catalog create {{YOUR_CATALOG_ID}} \
    "Tutorial for copying data from local file system to the object store layer" \
    --config bring-your-data.json
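
The create command returns the HRN of the new catalog; note it down, because the following steps refer to it as {{YOUR_CATALOG_HRN}}. As a quick sanity check, you can display the catalog configuration with the OLP CLI:

olp catalog show {{YOUR_CATALOG_HRN}}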

Note

If a billing tag is required in your realm, update the config file by adding the "billingTags": ["YOUR_BILLING_TAG"] property to the layer section.
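
For example, the layer entry with a billing tag would look like the following sketch, where YOUR_BILLING_TAG is a placeholder for a tag valid in your realm:

{
  "id": "bring-your-data-layer",
  "layerType": "objectstore",
  "volume": {
    "volumeType": "durable"
  },
  "billingTags": ["YOUR_BILLING_TAG"]
}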

Upload a single file to the Object store layer using the OLP CLI

You can upload a single file from your local file system to the Object store layer using the OLP Command Line Interface (CLI).
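
The upload command in the next step reads a local file named test-data-cli/test-file. If you do not have one yet, you can create a small sample file first; the content below is arbitrary and serves only as an example:

# create a local directory and a small sample file to upload
mkdir -p test-data-cli
echo "sample payload for the Object store tutorial" > test-data-cli/test-file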

1. Upload file

Replace {{YOUR_CATALOG_HRN}} in the following command with the HRN you received from the create catalog step, and then run it:

olp catalog layer object put {{YOUR_CATALOG_HRN}} bring-your-data-layer --key test-file --data test-data-cli/test-file

The previous command uploads your local file test-data-cli/test-file to the Object store layer under the key test-file.

2. List uploaded file

You can verify your upload by listing the keys in the layer. Replace {{YOUR_CATALOG_HRN}} with the HRN you received from the create catalog step, and then run the following command:

olp catalog layer object list {{YOUR_CATALOG_HRN}} bring-your-data-layer

3. Get contents of uploaded file

To retrieve the data you uploaded to the Object store layer, replace {{YOUR_CATALOG_HRN}} with the HRN you received from the create catalog step, and then run the following command:

olp catalog layer object get {{YOUR_CATALOG_HRN}} bring-your-data-layer --key test-file --data test-data-cli/test-file

Upload multiple files to the Object store layer using Apache Hadoop

You can upload multiple files from your local file system to the Object store layer in parallel, in a distributed manner, using Apache Hadoop.

You must export the JAVA_HOME variable to your environment before you can run Apache Hadoop commands.
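
For example, on a Linux system with OpenJDK 8, the export might look like the following; the exact path is an assumption and depends on your Java installation:

# point JAVA_HOME at your JDK installation (adjust the path for your system)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64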

Follow these steps to upload multiple files using Apache Hadoop:

1. Export HADOOP_VERSION variable

You must export the HADOOP_VERSION variable to your environment. For this tutorial, run the following command:

export HADOOP_VERSION=2.7.7

2. Download Apache Hadoop

You must download Apache Hadoop. Run the following command to download the appropriate version of Apache Hadoop:

wget -c https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz

3. Extract Apache Hadoop

You must extract the tarball which you downloaded in the previous step, as follows:

Linux:

tar xzf hadoop-${HADOOP_VERSION}.tar.gz

Windows:

export MSYS=winsymlinks:lnk
tar xzf hadoop-${HADOOP_VERSION}.tar.gz

4. Download Hadoop FS support Jar

You must download the Hadoop FS Support assembly jar provided by the Data Client Library by running the following command:

mvn dependency:copy -Dartifact=com.here.platform.data.client:hadoop-fs-support_2.12:LATEST:jar:assembly -DoutputDirectory=hadoop-${HADOOP_VERSION}/share/hadoop/common/lib/
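
To confirm that the assembly jar is now on Hadoop's classpath, you can list the target directory:

# the Hadoop FS Support assembly jar should appear in Hadoop's common lib directory
ls hadoop-${HADOOP_VERSION}/share/hadoop/common/lib/ | grep hadoop-fs-support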

5. Upload data using Apache Hadoop
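
This step uploads a local directory named test-data-hadoop that contains three files. If you do not have such a directory yet, you can create one with arbitrary sample content:

# create a local directory with three small sample files
mkdir -p test-data-hadoop
for i in 1 2 3; do echo "sample data $i" > test-data-hadoop/test-file-$i; done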

Replace {{YOUR_CATALOG_HRN}} in the following command with the catalog HRN you received from the create catalog step, and then run it:

./hadoop-${HADOOP_VERSION}/bin/hadoop distcp test-data-hadoop blobfs://{{YOUR_CATALOG_HRN}}:bring-your-data-layer/test-data-hadoop

The previous command uploads the test-data-hadoop directory, which contains three files: test-file-1, test-file-2, and test-file-3. These files are uploaded to the Object store layer using Apache Hadoop.

6. List the uploaded files using Apache Hadoop

You can verify the upload by listing the uploaded files using Apache Hadoop. Run the following command after replacing {{YOUR_CATALOG_HRN}} with the HRN you received from the create catalog step:

./hadoop-${HADOOP_VERSION}/bin/hadoop fs -ls blobfs://{{YOUR_CATALOG_HRN}}:bring-your-data-layer/test-data-hadoop/

The previous command lists all the files in the test-data-hadoop directory from the Object store layer.
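
You can also read an individual object back through the same Hadoop file system interface. For example, the standard hadoop fs -cat command prints the contents of one uploaded file:

./hadoop-${HADOOP_VERSION}/bin/hadoop fs -cat blobfs://{{YOUR_CATALOG_HRN}}:bring-your-data-layer/test-data-hadoop/test-file-1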

Copying data from other storage systems to the Object store layer

You can copy your data from any Apache Hadoop compatible storage system, such as Amazon S3 or Azure Blob Storage, to the Object store layer using the same steps; only the source path changes from the local file system to the remote storage.
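
For example, a DistCp run from Amazon S3 might look like the following sketch. It assumes that the Hadoop S3A connector (hadoop-aws) and your S3 credentials are configured; your-bucket/source-data is a placeholder path:

# copy a directory from Amazon S3 into the Object store layer
./hadoop-${HADOOP_VERSION}/bin/hadoop distcp s3a://your-bucket/source-data blobfs://{{YOUR_CATALOG_HRN}}:bring-your-data-layer/source-data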

You can find more details on using Apache Hadoop to copy data from one storage system to another in DistCp and Object Stores.

Further information

For additional details on the topics covered in this tutorial, see the Data Service Documentation and the Apache Hadoop guide DistCp and Object Stores referenced above.
