Wednesday, May 4, 2016

Akka.NET Streams vs Hopac vs AsyncSeq

Akka.NET Streams is a port of its Scala/Java counterpart and is intended for executing complex data processing graphs, optionally in parallel and even distributed. Its semantics are quite different from Hopac's, and it would be wrong to compare them feature-by-feature, but it's still interesting to benchmark them in a scenario that both support well: read the lines of a file asynchronously, filter them by a regex with a controlled degree of parallelism, normalize the lines with a simple string manipulation algorithm (also in parallel), and finally count the lines.
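As a rough, language-neutral illustration of the shape of this benchmark (not the F# code used for the measurements), the pipeline can be sketched in Python with asyncio. The regex, the `normalize` function, the in-memory input standing in for an asynchronously read file, and the helper names `read_lines`, `map_parallel`, `drop_none` and `count_matching` are all hypothetical. A non-match is represented by an optional value (`None`) that is dropped before normalization, and each parallel stage keeps at most `dop` calls in flight.

```python
import asyncio
import collections
import re
from typing import AsyncIterator, Callable, Iterable, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")

PATTERN = re.compile(r"^[A-Z]")  # hypothetical regex: lines starting with a capital letter


async def read_lines(lines: Iterable[str]) -> AsyncIterator[str]:
    # Stand-in for reading a file asynchronously: yields lines one at a time,
    # handing control back to the event loop between them.
    for line in lines:
        await asyncio.sleep(0)
        yield line


async def map_parallel(source: AsyncIterator[T], f: Callable[[T], U], dop: int) -> AsyncIterator[U]:
    # Applies f in worker threads with at most `dop` calls in flight,
    # yielding results in the original order.
    pending: "collections.deque[asyncio.Future[U]]" = collections.deque()
    async for item in source:
        pending.append(asyncio.ensure_future(asyncio.to_thread(f, item)))
        if len(pending) >= dop:
            yield await pending.popleft()
    while pending:
        yield await pending.popleft()


def match(line: str) -> Optional[str]:
    # Option-style result: the line itself on a match, None otherwise.
    return line if PATTERN.search(line) else None


def normalize(line: str) -> str:
    # Hypothetical normalization: collapse runs of whitespace and lower-case.
    return " ".join(line.split()).lower()


async def drop_none(source: AsyncIterator[Optional[T]]) -> AsyncIterator[T]:
    # Filters out the "no match" values produced by `match`.
    async for item in source:
        if item is not None:
            yield item


async def count_matching(lines: Iterable[str], dop: int = 4) -> int:
    matched = drop_none(map_parallel(read_lines(lines), match, dop))
    normalized = map_parallel(matched, normalize, dop)
    count = 0
    async for _ in normalized:
        count += 1
    return count
```

For example, `asyncio.run(count_matching(["Alpha  beta", "gamma", "Delta"], dop=2))` returns 2. Because of CPython's GIL, `asyncio.to_thread` only bounds concurrency here and gives no real CPU parallelism, so the sketch shows the structure of the pipeline, not its performance.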

First, Akka.NET:

Note that I have to use the empty string to indicate that the regular expression does not match. I should use `option`, of course (just like I do in the Hopac snippet below), but Akka.NET Streams is strict about what its combinators like `Map` or `Filter` are allowed to return. In particular, you cannot return `null`: doing so makes Akka.NET throw an exception. In F#, expressions like `fun x -> printfn "%O" x` and `fun x -> None` return `()` and `None` respectively, and both are represented as `null` at runtime, so you have to be very careful when `Map`ping and `Filter`ing (and, in fact, when using any combinator) with side-effecting functions or functions returning `Option`s (just don't do either).

Now, Hopac:

And finally AsyncSeq:


The number of allocations is roughly the same for Hopac and Akka.NET, but it's an order of magnitude larger for AsyncSeq.

Conclusions

  • Use Hopac if you need the best performance available on .NET, or if you need to implement arbitrarily complex concurrent scenarios.
  • Akka.NET is quite fast and has a full-blown graph definition DSL, so it's great for implementing complex stream processing that can run on a cluster of nodes. However, it has a typical "fluent" C#-oriented API, so you need to write a thin layer over it to make it usable from F#.
  • AsyncSeq has the most "F#-friendly" API: it's just a combination of two computation expressions that every F# programmer knows, Async and Seq.

Update 6 May, 2016


Marc Piechura suggested a way to exclude the materialization phase from the benchmark; here is the modified code:

It turns out it takes Akka.NET about 3 seconds to materialize the graph.
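The idea behind excluding materialization (Marc's suggestion in the comments is `Source.FromTask` fed by a `TaskCompletionSource`) can be sketched by analogy in Python: the source waits on a gate future, so all one-off setup is done before the timer starts, and only the flow of elements is measured. This is not Akka.NET code; `materialize`, with its blocking `time.sleep`, is a hypothetical stand-in for the real materialization cost, and the other names are illustrative as well.

```python
import asyncio
import time
from typing import AsyncIterator, Iterable, Tuple


async def gated_source(gate: "asyncio.Future[None]", items: Iterable[int]) -> AsyncIterator[int]:
    # Emits nothing until `gate` completes, much like a source built from a
    # TaskCompletionSource's task.
    await gate
    for item in items:
        yield item


async def count_items(source: AsyncIterator[int]) -> int:
    n = 0
    async for _ in source:
        n += 1
    return n


def materialize(gate: "asyncio.Future[None]", items: Iterable[int], setup_seconds: float) -> "asyncio.Task[int]":
    # Hypothetical stand-in for graph materialization: a blocking, one-off setup
    # cost, after which the pipeline is scheduled and will wait on the gate.
    # Must be called from inside a running event loop.
    time.sleep(setup_seconds)
    return asyncio.ensure_future(count_items(gated_source(gate, items)))


async def benchmark(items: Iterable[int], setup_seconds: float = 0.2) -> Tuple[int, float, float]:
    gate = asyncio.get_running_loop().create_future()
    t0 = time.perf_counter()
    running = materialize(gate, items, setup_seconds)
    t1 = time.perf_counter()   # setup done: timing of the actual processing starts here
    gate.set_result(None)      # release the stream, like calling SetResult on the TCS
    count = await running
    t2 = time.perf_counter()
    return count, t1 - t0, t2 - t1
```

`benchmark` returns the element count, the setup time and the processing time separately, so a heavy but one-off setup no longer distorts the throughput figure.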

Thanks to Vesa Karvonen for help with fixing the Hopac version and to Lev Gorodinski for fixing AsyncSeq performance (initially it was awfully slow).

11 comments:

Bartosz Sypytkowski said...

That's great to read! :) Concerning Akka Streams, there is still huge room for improvement (particularly improvements specific to the capabilities of the .NET platform), so I hope these numbers will get better over time.

Also, I think it's a good idea to run the benchmark without counting the materialization phase: that's the phase in which the stream definition gets materialized into an executable flow. I know it exists in Akka.Streams, and I guess Hopac has a similar step. While it can be quite heavy, it's only done once and doesn't mean much for long-running streams, whose processing speed is the point of the benchmark.

Vasily said...

Thanks.

I strongly believe that Hopac does not have a materialization phase. Hopac streams are just lazy lists of promises, see https://github.com/Hopac/Hopac/blob/master/Libs/Hopac/Stream.fs#L27-L29
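To make "lazy list of promises" concrete, here is a minimal Python sketch in which an `asyncio.Future` plays the role of a Hopac promise. It mirrors the idea only, not Hopac's actual `Stream` type (see the linked source for that), and `from_iterable`, `to_list` and `demo` are hypothetical names. Because each cell is a promise that, once completed, keeps its value, a stream is an immutable value: there is nothing to materialize, and several consumers can traverse the same stream independently.

```python
import asyncio
from typing import Any, Iterable, List, Set, Tuple

# A stream is a promise (here an asyncio.Future) of a cell:
# None plays the role of Nil, and (head, tail_stream) the role of Cons.
_producers: Set["asyncio.Task[None]"] = set()  # keeps producer tasks alive


def from_iterable(items: Iterable[Any]) -> "asyncio.Future[Any]":
    # Builds a stream whose cells are completed one by one by a producer task.
    # Must be called from inside a running event loop.
    loop = asyncio.get_running_loop()
    first = loop.create_future()

    async def produce() -> None:
        current = first
        for item in items:
            await asyncio.sleep(0)          # simulate asynchronous production
            nxt = loop.create_future()
            current.set_result((item, nxt))
            current = nxt
        current.set_result(None)            # end of stream

    task = loop.create_task(produce())
    _producers.add(task)
    task.add_done_callback(_producers.discard)
    return first


async def to_list(stream: "asyncio.Future[Any]") -> List[Any]:
    # Walks the stream by awaiting each promise in turn; awaiting an already
    # completed future simply returns its stored result.
    out: List[Any] = []
    cell = await stream
    while cell is not None:
        head, stream = cell
        out.append(head)
        cell = await stream
    return out


async def demo() -> Tuple[List[Any], List[Any]]:
    s = from_iterable([1, 2, 3])
    # Both consumers traverse the same immutable stream; each sees every element.
    first, second = await asyncio.gather(to_list(s), to_list(s))
    return first, second
```

`asyncio.run(demo())` gives `([1, 2, 3], [1, 2, 3])`. A queue or channel would instead split the elements between the two consumers.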

So it's worth removing Akka.NET Streams' materialization phase from the benchmark, but I'm afraid I don't know how: it seems to be impossible to materialize a graph as a separate step without executing it immediately.

Marc Piechura said...

The easiest way would probably be if you use Source.FromTask in combination with a TaskCompletionSource.
https://github.com/akkadotnet/akka.net/blob/akka-streams/src/core/Akka.Streams/Dsl/Source.cs#L292

Vesa Karvonen said...

"It has a far more comprehensive semantics than Hopac and it's wrong to compare them feature-by-feature, Hopac would obviously lose"

It is not quite so clear cut. :)

The semantics of Hopac streams, or choice streams, which are just a kind of lazy list of promises, are fundamentally different from those of Akka streams and many other stream implementations such as Rx. Neither Akka nor Rx style streams, for example, are strictly more expressive than Hopac streams, or vice versa.

Consider that all combinators on ordinary lazy lists can be implemented for and are meaningful on Hopac streams. So, you can, for example, implement a lazy backwards fold over a Hopac stream and you could use that to implement a large number of other stream combinators. This kind of composability and expressiveness is not available in most other stream formulations.
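A lazy backwards fold can be illustrated on plain lazy lists in Python, with a thunk standing in for a Hopac promise (so this sketch is synchronous, unlike Hopac streams). The names `fold_back`, `map_lazy`, `take_while`, `exists`, `naturals` and `to_list` are hypothetical. The key point is that the folding function receives the rest of the fold as a thunk, so it decides whether the remainder is ever evaluated, which is why these combinators work even on an infinite list.

```python
from typing import Any, Callable, Optional, Tuple

# A lazy list is either None (empty) or a pair (head, thunk producing the rest).
LazyList = Optional[Tuple[Any, Callable[[], "LazyList"]]]


def naturals(start: int = 0) -> LazyList:
    # Infinite lazy list start, start + 1, ...
    return (start, lambda: naturals(start + 1))


def fold_back(f: Callable[[Any, Callable[[], Any]], Any], xs: LazyList, z: Any) -> Any:
    # Lazy right fold: f gets the head and a thunk for folding the rest,
    # so f alone decides whether (and when) the remainder is evaluated.
    if xs is None:
        return z
    head, tail = xs
    return f(head, lambda: fold_back(f, tail(), z))


def map_lazy(f: Callable[[Any], Any], xs: LazyList) -> LazyList:
    # The "rest of the fold" thunk is exactly the tail of the mapped list.
    return fold_back(lambda x, rest: (f(x), rest), xs, None)


def take_while(p: Callable[[Any], bool], xs: LazyList) -> LazyList:
    return fold_back(lambda x, rest: (x, rest) if p(x) else None, xs, None)


def exists(p: Callable[[Any], bool], xs: LazyList) -> bool:
    # Short-circuits: rest() is only forced when p(x) is false.
    return fold_back(lambda x, rest: p(x) or rest(), xs, False)


def to_list(xs: LazyList) -> list:
    out = []
    while xs is not None:
        head, tail = xs
        out.append(head)
        xs = tail()
    return out
```

For example, `to_list(take_while(lambda x: x < 5, map_lazy(lambda x: x * x, naturals())))` evaluates only the first four squares of an infinite list and returns `[0, 1, 4]`.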

Vasily said...

@Marc Piechura: I'm afraid I don't understand how to produce a stream of lines using Source.FromTask; it seems to emit a single element when the task completes.

Vasily said...

@Vesa Karvonen: Thanks for the clarification!

Marc Piechura said...

@Vasily Sorry for the confusion, I was referring to Bartosz's comment and the fact that Akka.Streams has a materialization phase.
I have created a little gist which shows a possible solution to remove this phase from the benchmark.

https://gist.github.com/Silv3rcircl3/2138bc4abf1e9ca8a20865b4643bdda7

Vasily said...

@Marc Piechura: Thanks! I updated the blog.

Vasily said...

Vesa Karvonen found a couple of bugs in the Hopac benchmark and provided a fixed version. I updated the blog.

Vasily said...

Lev Gorodinski fixed AsyncSeq performance, and I retested everything and amended the blog post accordingly.