Saturday, January 10, 2015

Parallel reduce: Hopac, Asyncs, Tasks and Scala's Futures

Tuomas Hietanen posted a parallel reduce function that uses TPL Tasks. I found it interesting to compare performance of this function with analogues implemented using F# Asyncs, Hopac Jobs and Scala Futures.
The author uses noop long-running reduce function to show that it's really run in parallel. In this blog post we are benchmarking another aspect of the implementations: how much extra cost is introduced by a particular parallezation mechanism (library) itself.

We translate the original code almost as is to Tasks and Hopac:


And Scala's Futures:


The results (Core i5, 4 cores):

  • Sequential List.reduce: Real: 00:00:00.014, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0 
  • Tasks: Real: 00:00:01.790, CPU: 00:00:05.678, GC gen0: 36, gen1: 10, gen2: 1 
  • Hopac: Real: 00:00:00.514, CPU: 00:00:01.482, GC gen0: 27, gen1: 2, gen2: 1 
  • Asyncs: Real: 00:00:37.872, CPU: 00:01:48.405, GC gen0: 90, gen1: 29, gen2: 4
  • Scala Futures: 4.8 seconds

(Hopac - 3.4 times faster, Asyncs - 21.1 times slower, Scala - 1.8 times slower)

Hopac is ~3.5 times faster than TPL. What's wrong with Asyncs? I don't know. Maybe they are not intended for highly concurrent scenarios. Or my code may not be the most efficient. Any ideas, guys?

Let's test the leaders on larger arrays:


(Hopac is 3.37 times faster, Scala is 1.5 times slower)


(Hopac is 5.25 times faster, Scala is 1.05 times slower)

5 comments:

Thorium said...
This comment has been removed by the author.
Thorium said...

.NET Tasks and Async are more for IO-bound operations than for CPU-bound; for concurrency rather than parallelism. So maybe it is not the fastest thing, while it is quite easy. If you open async-tasks dll method with ILDASM / ILSpy / Reflector, you notice that it will cause some overhead.

For high-parallelism use e.g. Array.Parallel.

Thorium said...

Hmm... With thread-blocking Sleep(30) in the aggregate function I tried to simulate that what if your combinator is more complex than just (+)
E.g. for array with just 5000 items:

Array.reduce (fun c -> fun x -> System.Threading.Thread.Sleep(30);x+c) a
Real: 00:02:36.232

reduceParallelTasks (fun c -> fun x -> System.Threading.Thread.Sleep(30);x+c) a
Real: 00:00:06.730

reduceParallelAsync (fun c -> fun x -> System.Threading.Thread.Sleep(30);x+c) a
Real: 00:00:05.361

reduceParallelHopac (fun c -> fun x -> System.Threading.Thread.Sleep(30);x+c) a
Real: 00:00:20.032

Vesa Karvonen said...

The main reason for the existence of the Job monad is to make it possible to not block a native thread. So what your test does is that explicitly does the one thing, blocks the thread running the job, that you should not do.

Vasily said...

Try reduceParallelHopac (fun c -> fun x -> Timer.Global.timeOutMillis 30;x+c) a