The Data Processing Library components are configured using the Typesafe Config library, which supports multiple configuration file formats. The recommended format for manual editing is HOCON, a simplified JSON syntax that adds features such as file includes and references to environment variables.
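For example, a HOCON file can include other files and substitute environment variables. The file name and environment variable below are illustrative, not part of the library:

```hocon
// Include another configuration file; "overrides.conf" is a hypothetical name.
include "overrides.conf"

here.platform.data-processing.driver {
  appName = "DataProcessing"
  // Optional substitution: overrides appName only if APP_NAME is set.
  appName = ${?APP_NAME}
}
```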
You can find the configuration parameters in the `reference.conf` file in the batch-core resources. The snippet below shows the contents of this file, which documents the available configuration parameters and their default values where applicable. Some parameters are commented out because they must always be defined at the application level.
```hocon
here.platform.data-processing.driver {
  appName = "DataProcessing"
  parallelRetrievers = 4
  parallelUploads = 4
  uniquePartitionLimitInBytes = 1024
  numCommitPartitions = 85
  disableIncremental = false
  disableCommitIntegrityCheck = false

  sparkStorageLevels {
    default = "MEMORY_AND_DISK_SER"
    catalogQueries = "MEMORY_AND_DISK_SER"
    publishedPayloads = "MEMORY_AND_DISK_SER"
    persistedState = "MEMORY_AND_DISK_SER"
  }

  state {
    partitions = 10
    layer = "state"
  }
}

here.platform.data-processing.executors {
  reftree {
    parallelResolves = 10
  }
  compilein {
    threads = 10
    sorting = false
  }
  compileout {
    threads = 10
    sorting = false
  }
  debug {
    collectStageErrors = false
  }
  partitionKeyFilters = [
  ]
}

here.platform.data-processing.spark {
  serializer = "org.apache.spark.serializer.KryoSerializer"
  kryo.registrationRequired = false
  kryo.registrator = "com.here.platform.data.processing.spark.KryoRegistrator"
  kryo.referenceTracking = true
  rdd.compress = true
  ui.showConsoleProgress = true
}

here.platform.data-processing.deltasets {
  default {
    intermediateStorageLevel = "MEMORY_AND_DISK_SER"
    validationLevel = "SAFETY"
    threads = 1
    sorting = false
    incremental = true
  }
  partitionKeyFilters = [
  ]
}

akka {
  loggers = ["akka.event.slf4j.Slf4jLogger"]
}

here.platform.data-processing.compiler {
}
```
Compiler-specific defaults should be defined in the `application.conf` file, as part of the compiler's resources. You can override the Data Processing Library's driver parameters with values that you specify on the command line.
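For example, an `application.conf` in the compiler's resources might override some of the driver defaults shown above. The values here are illustrative:

```hocon
// application.conf — overrides the reference.conf defaults.
here.platform.data-processing.driver {
  appName = "MyCompiler"  // illustrative application name
  parallelUploads = 8     // raise the default of 4
}
```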
The Typesafe Config library enables you to override the compiler's configuration in various ways, such as:

- Use the `-Dconfig.file` or `-Dconfig.url` parameters to replace the whole `application.conf` file in the classpath with another file.
- Use `-Dhere.platform.data-processing.xxx=value` to modify individual parameters via Java system properties.
- Use `-Dconfig.trace=loads` to debug configuration overrides. This option prints the files parsed and outputs the final configuration to stdout.
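Put together, an invocation that overrides an individual parameter might look like the following. The JAR name and the use of `spark-submit` are placeholders, not a prescribed deployment:

```
# Hypothetical invocation; my-compiler.jar is a placeholder.
spark-submit \
  --driver-java-options "-Dhere.platform.data-processing.executors.compilein.threads=20 -Dconfig.trace=loads" \
  my-compiler.jar
```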
To define your own configuration parameters, add them to the `here.platform.data-processing.compiler` section in `application.conf`. The values of these parameters can be loaded into an instance of a compiler-specific case class `T` via the `compilerConfig[T]` method of the `CompleteConfig` instance passed to `setupDriver`. Consider caching the result if you are going to call the method repeatedly.
By default, the configuration is loaded when a `DriverContext` is built, and can be accessed from the context. To load the configuration in your application without a `DriverContext`, or to customize the configuration when the `DriverContext` is built, use the `CompleteConfig` factory methods. Additional configuration parameters can be provided as HOCON `key=value` strings:
Scala:

```scala
val defaultConfig = CompleteConfig()

val defaultConfigWithOverrides = CompleteConfig(
  Seq("here.platform.data-processing.executors.compilein.threads=20",
      "here.platform.data-processing.executors.compileout.threads=10"))
```
Java:

```java
CompleteConfig defaultConfiguration = CompleteConfig.load();

CompleteConfig defaultConfigurationWithOverrides =
    CompleteConfig.load(
        Arrays.asList(
            "here.platform.data-processing.executors.compilein.threads=20",
            "here.platform.data-processing.executors.compileout.threads=10"));
```