Fibonacci: Hopac vs Async vs TPL Tasks on .NET and Mono

Hopac claims that its Jobs are much more lightweight that F# Asyncs. There are many benchmarks on Hopac github repository, but I wanted to make a simple and straightforward benchmark and what could be simpler that parallel Fibonacci algorithm? :) (actually there's a more comprehensive  benchmark in the Hopac repository itself, see Fibonacci.fs)

Sequential Fibonacci function is usually defined as

let rec fib n =
if n < 2L then n
else fib (n - 1L) + fib (n - 2L)
view raw sfib.fs hosted with ❤ by GitHub
So write a parallel version in Hopac where each step is performed in a Job and all these Jobs are (potentially) run in Parallel by Hopac's scheduler

let rec hfib n = Job.delay <| fun () ->
if n < 2L then
Job.result n
else
hfib (n-2L) <*> hfib (n-1L) |>> fun (x, y) ->
x + y
view raw simple_hfib.fs hosted with ❤ by GitHub
An equivalent parallel algorithm written using F# Asyncs

let rec afib n = async {
if n < 2L then
return n
else
let! n2a = afib (n-2L) |> Async.StartChild
let! n1 = afib (n-1L)
let! n2 = n2a
return n2 + n1
}
view raw simple_afib.fs hosted with ❤ by GitHub
...and using TPL Tasks

let rec tfib n =
if n < 2L then n
else
let n2t = Task.Factory.StartNew (fun _ -> tfib (n-2L))
let n1 = tfib (n-1L)
n2t.Result + n1
All three functions create *a lot* of parallel jobs/asyncs/tasks. For example, for calculating fib (34) they create ~14 million of jobs (this is why Fibonacci was chose for this test). To make them work efficiently we will use the sequential version of fib for small N, then switch to parallel version

let rec fib n =
if n < 2L then n
else fib (n - 1L) + fib (n - 2L)
let rec hfib (n, level) = Job.delay <| fun () ->
if n < 2L then
Job.result n
elif n < level then Job.result (fib n)
else
hfib (n-2L, level) <*> hfib (n-1L, level) |>> fun (x, y) ->
x + y
let rec afib (n, level) = async {
if n < 2L then
return n
elif n < level then return fib n
else
let! n2a = afib (n-2L, level) |> Async.StartChild
let! n1 = afib (n-1L, level)
let! n2 = n2a
return n2 + n1
}
let rec tfib (n, level) =
if n < 2L then n
elif n < level then fib n
else
let n2t = Task.Factory.StartNew (fun _ -> tfib (n-2L, level))
let n1 = tfib (n-1L, level)
n2t.Result + n1
view raw h_a_fibs.fs hosted with ❤ by GitHub
Now we can run both of the function with different "level"s in order to find on which value the functions starts to perform good (x-axis: level, y-axis: time (ms),  blue line: the sequential function, orange line: hopac/async/tasks function):

Hopac
Async
Tasks


Hopac reaches performance equivalent to the sequential implementation at level = 9, Async - at level = 17 and Tasks at level = 11.

If we modify the code so we can count how many jobs/asyncs are created during the calculation

let hfib_jobs = ref 0L
let rec hfib (n, level) = Job.delay <| fun () ->
hfib_jobs := !hfib_jobs + 1L
if n < 2L then
Job.result n
elif n < level then Job.result (fib n)
else
hfib (n-2L, level) <*> hfib (n-1L, level) |>> fun (x, y) ->
x + y
let afib_asyncs = ref 0L
let rec afib (n, level) = async {
afib_asyncs := !afib_asyncs + 1L
if n < 2L then
return n
elif n < level then return fib n
else
let! n2a = afib (n-2L, level) |> Async.StartChild
let! n1 = afib (n-1L, level)
let! n2 = n2a
return n2 + n1
}
let tfib_tasks = ref 0L
let rec tfib (n, level) =
if n < 2L then n
elif n < level then fib n
else
let n2t = Task.Factory.StartNew (fun _ ->
tfib_tasks := !tfib_tasks + 1L
tfib (n-2L, level))
let n1 = tfib (n-1L, level)
n2t.Result + n1

We get the following results (n = 42): 

* Sequential, Real: 00:00:01.849, CPU: 00:00:01.840, GC gen0: 0, gen1: 0, gen2: 0
* Hopac (level = 9) jobs: 28761996, Real: 00:00:01.700, CPU: 00:00:05.600, GC gen0: 89, gen1: 1, gen2: 0
* Async (level = 17) asyncs: 605898, Real: 00:00:01.515, CPU: 00:00:04.804, GC gen0: 4, gen1: 2, gen2: 0
* Tasks (level = 11) tasks: 5675789, Real: 00:00:01.813, CPU: 00:00:06.302, GC gen0: 18, gen1: 0, gen2: 0

So, Hopac was able to create and processed ~47x more jobs than Async and ~5x more jobs than Tasks. Hopac is impressive and F# Asyncs are frustrating.  

PS: Rewriting the async version without async computation explicit expression, like this

let rec acfib (n, level) =
if n < 2L then async.Return n
elif n < level then async.Return (fib n)
else
let n2aa = acfib (n-2L, level) |> Async.StartChild
async.Bind (n2aa, (fun n2a ->
async.Bind(acfib (n-1L, level), fun n1 ->
async.Bind (n2a, fun n2 -> async.Return (n2 + n1)))))
does not improve performance at all. 

Running on Mono (Ubuntu 14.10 x64, mono 3.10)

* Sequential, Real: 00:00:02.637, CPU: 00:00:02.636, GC gen0: 0, gen1: 0
* Hopac (level = 17) jobs: 629133Real: 00:00:02.447, CPU: 00:00:06.106, GC gen0: 26, gen1: 1
* Async (level = 21) asyncs: 92375Real: 00:00:02.845, CPU: 00:00:05.590, GC gen0: 86, gen1: 3
* Tasks (level = 33) tasks: 143Real: 00:00:14.111, CPU: 00:00:03.782, GC gen0: 0, gen1: 0

Hopac can handle ~6.8x more jobs than F# Async. I'm not sure if F# asyncs performs very well on Mono or it's because everything works extremely slowly there. What about TPL, it's obviously broken on Mono (official Hopac Fibonacci benchmark does not even run TPL version on mono: Fibonacci.fs#L233).

Update 10.09.2017 - use BenchmarkDotNet


n = 30, level = 15

 Method |     Mean |     Error |    StdDev |
------- |---------:|----------:|----------:|
    Fib | 8.208 ms | 0.0432 ms | 0.0383 ms |
   HFib | 1.860 ms | 0.0045 ms | 0.0042 ms |
   AFib | 4.921 ms | 0.0330 ms | 0.0292 ms |
   TFib | 2.229 ms | 0.0184 ms | 0.0172 ms |

n = 20, level = 0

 Method |         Mean |      Error |       StdDev |
------- |-------------:|-----------:|-------------:|
    Fib |     68.21 us |   1.258 us |     1.177 us |
   HFib |    356.21 us |   7.180 us |    11.595 us |
   AFib | 31,815.44 us | 636.249 us | 1,524.413 us |
   TFib |  1,623.25 us |  32.206 us |    33.073 us |

Comments

Thanks a lot, I replicated your tests using a core i9-9900k from 2019, and dotnet 6.0, and it seems like nowadays TPL is quite faster than Hopac. Can you confirm these findings (indeed with server GC enabled)?

I prepared a complete file here:

https://github.com/vincenzoml/VoxLogicA/blob/gpu-new/testHopacVsTPL.fs

Popular posts from this blog

Regular expressions: Rust vs F# vs Scala

Hash maps: Rust, F#, D, Go, Scala

Haskell: performance