Compilation Patterns

One key feature of the Data Processing Library is its set of compilation patterns. These patterns guide you in implementing incremental distributed compilers. Each task executes one compiler, which can consist either of your code alone or of your code combined with one of the provided patterns.

There are two types of patterns:

  • Functional patterns: provide specific interfaces that guide you to develop the compiler in a precisely defined way. In these patterns, Spark is hidden inside the pattern implementation and the compiler focuses on the business logic. The Data Processing Library takes care of the distributed processing and incremental compilation details.
  • Spark RDD-based patterns: expose Spark RDDs, allowing the compiler implementation to perform parallel operations on data and metadata using Spark, such as join, cogroup, filter, or map. In these patterns, the interfaces are less rigid, and you may need to actively support incremental compilation yourself.
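To make the distinction concrete, the sketch below shows what a functional-pattern compiler might look like: Spark is nowhere in sight, and the compiler is a pure per-tile function. The trait and object names here are hypothetical illustrations, not the library's actual API.

```scala
// Hypothetical sketch of a functional-pattern compiler interface.
// The real library's trait names and signatures differ; this only
// illustrates the idea that the compiler is pure business logic.
trait TileCompiler[In, Out] {
  // Pure per-tile transformation: no Spark, no distribution concerns.
  def compile(tileKey: String, input: In): Out
}

// A minimal DirectCompiler-style example: one input tile yields one
// output tile, here by simply uppercasing the payload.
object UppercaseCompiler extends TileCompiler[String, String] {
  def compile(tileKey: String, input: String): String = input.toUpperCase
}
```

Because the function is pure and per-tile, the library can decide which tiles to recompile incrementally and how to distribute the work, without the compiler's involvement.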

Table 1: Compilation Patterns Overview

Compiler Class           Incremental Processing   References to Other Tiles   Global Algorithms   Functional or RDD-Based   Complexity
DirectCompiler           Yes                      No                          No                  Functional                Simple
MapGroupCompiler         Yes                      No                          No                  Functional                Simple
RefTreeCompiler          Yes                      Yes                         No                  Functional                Medium
NonIncrementalCompiler   No                       Yes                         Yes                 RDD                       Simple
DepCompiler              Partially                No                          Yes                 RDD                       Medium
IncrementalDepCompiler   Yes                      No                          Yes                 RDD                       Complex
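The RDD-based compilers in the table above work with Spark operations such as join, filter, and map. Their semantics can be illustrated with plain Scala collections standing in for RDDs; the tile keys and payload sizes below are made-up example data, not part of the library.

```scala
// Illustrative sketch only: plain Scala collections stand in for Spark
// RDDs, whose join/filter/map operations have the same semantics on
// key-value pairs. All data here is hypothetical.
val metadata = Seq(("tile-1", "v2"), ("tile-2", "v1"), ("tile-3", "v2"))
val payloads = Seq(("tile-1", 1024), ("tile-3", 2048))

// A join by tile key, as an RDD-based compiler might do with
// metadataRDD.join(payloadRDD):
val joined = for {
  (key, version) <- metadata
  (k, size)      <- payloads
  if k == key
} yield (key, (version, size))

// A filter + map, e.g. keeping only the keys of tiles at version "v2":
val updated = metadata.filter(_._2 == "v2").map(_._1)
```

With these patterns the compiler author drives the parallel operations directly, which is more flexible but also means incremental behavior must be handled explicitly where the pattern does not provide it.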

Note: Where possible, use functional patterns rather than Spark RDD-based patterns.
