Compilation Patterns

One key feature of the Data Processing Library is its set of compilation patterns. These patterns guide you in implementing incremental distributed compilers. Each task executes one compiler, which can consist either of your code alone or of your code combined with one of the provided patterns.

There are two types of patterns:

  • Functional patterns: provide specific interfaces that guide you to develop the compiler in a precisely defined way. In these patterns, Spark is hidden inside the pattern implementation and the compiler focuses on the business logic. The Data Processing Library takes care of the distributed processing and incremental compilation details.
  • Spark RDD-based patterns: expose Spark RDDs, allowing the compiler implementation to perform parallel operations on data and metadata using Spark, such as join, cogroup, filter, or map. In these patterns, the interfaces are less rigid, and you may need to actively support incremental compilation yourself.
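To make the distinction concrete, the sketch below shows what a functional-pattern compiler might look like: Spark is nowhere in sight, and the compiler is a pure per-tile function. The trait and object names here are hypothetical illustrations, not the library's actual API.

```scala
// Hypothetical sketch of a functional-pattern compiler interface.
// The real library's trait names and signatures differ; this only
// illustrates the idea that the compiler is pure business logic.
trait TileCompiler[In, Out] {
  // Pure per-tile transformation: no Spark, no distribution concerns.
  def compile(tileKey: String, input: In): Out
}

// A minimal DirectCompiler-style example: one input tile yields one
// output tile, here by simply uppercasing the payload.
object UppercaseCompiler extends TileCompiler[String, String] {
  def compile(tileKey: String, input: String): String = input.toUpperCase
}
```

Because the function is pure and per-tile, the library can decide which tiles to recompile incrementally and how to distribute the work, without the compiler's involvement.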

Table 1: Compilation Patterns Overview

Compiler Class           Incremental Processing   References to Other Tiles   Global Algorithms   Functional or RDD-Based   Complexity
DirectCompiler           Yes                      No                          No                  Functional                Simple
MapGroupCompiler         Yes                      No                          No                  Functional                Simple
RefTreeCompiler          Yes                      Yes                         No                  Functional                Medium
NonIncrementalCompiler   No                       Yes                         Yes                 RDD                       Simple
DepCompiler              Partially                No                          Yes                 RDD                       Medium
IncrementalDepCompiler   Yes                      No                          Yes                 RDD                       Complex
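The RDD-based compilers in the table above work with Spark operations such as join, filter, and map. Their semantics can be illustrated with plain Scala collections standing in for RDDs; the tile keys and payload sizes below are made-up example data, not part of the library.

```scala
// Illustrative sketch only: plain Scala collections stand in for Spark
// RDDs, whose join/filter/map operations have the same semantics on
// key-value pairs. All data here is hypothetical.
val metadata = Seq(("tile-1", "v2"), ("tile-2", "v1"), ("tile-3", "v2"))
val payloads = Seq(("tile-1", 1024), ("tile-3", 2048))

// A join by tile key, as an RDD-based compiler might do with
// metadataRDD.join(payloadRDD):
val joined = for {
  (key, version) <- metadata
  (k, size)      <- payloads
  if k == key
} yield (key, (version, size))

// A filter + map, e.g. keeping only the keys of tiles at version "v2":
val updated = metadata.filter(_._2 == "v2").map(_._1)
```

With these patterns the compiler author drives the parallel operations directly, which is more flexible but also means incremental behavior must be handled explicitly where the pattern does not provide it.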

Note: Where possible, use functional patterns rather than Spark RDD-based patterns.
