For both implementations let's assume that we have already implemented the following method:
which, as you have probably guessed, converts a line from a file to a domain object - either a valid or an invalid reading. Since Monix uses its Tasks instead of the plain Scala Futures, the signature is going to be slightly different in the Monix version:
Anyway, the concept of asynchronously creating a Reading remains unchanged.
Before we dig into the implementation of the actual building blocks, let's have a quick overview of the APIs. In general, a stream processing pipeline consists of a data producer, a consumer, and an arbitrary number of intermediate processing stages.
Let's also assume we have a couple of settings loaded from a configuration file:
importDirectory - the directory where our data files live,
linesToSkip - the number of initial lines to skip in every file,
concurrentFiles - the number of files that we want to process in parallel,
concurrentWrites - the number of parallel writes to the database,
nonIOParallelism - the parallelism level for non-IO operations (like in-memory computations).
In Akka Streams, the processing pipeline (the graph) consists of three types of elements: a Source (the producer), a Sink (the consumer), and Flows - the processing stages.
Using those components, you define your graph, which is nothing more than a recipe for processing your data - it doesn't do any computations so far. To actually execute the pipeline, you need to materialize the graph, i.e. convert it to a runnable form. In order to do it, you need a so-called materializer which optimizes the graph definition and actually runs it. Therefore, the definition of the graph is completely decoupled from the way of running it, which, in theory, lets you use any materializer to run the pipeline. However, the built-in ActorMaterializer is actually the status quo, so chances are you won't be using any other implementation.
When you look carefully at the type parameters of the components, you will notice that each of them, apart from the respective input/output types, has a mysterious Mat type. It refers to the so-called materialized value, which is a value that is accessible from outside the graph (as opposed to the input/output types which are internal to the communication between the graph stages). In this series we're not going to use the materialized values, so we're going to use a special type parameter - NotUsed - which is nothing more than a unified representation of Scala's Unit and Java's Void
In Monix, the API is defined in terms of an Observable (the producer), a Consumer, and Transformers - the processing stages. And here a Transformer is just a type alias for a function that transforms an Observable:
Contrary to Akka Streams, the pipeline definition in Monix is not clearly decoupled from the runtime environment, i.e there is no way to provide something like the Akka Streams materializer to influence the way the processing gets executed. Moreover, Monix doesn't introduce anything like the Akka Streams' materialized value - which, overall, makes the API a bit more straightforward.