vaskir's blog

How the Go runtime makes concurrent code simple

2021-12-22T18:08:00.002+03:00

Five years ago I showed a way to fight the "all threads busy" problem when writing Hopac code. The same problem exists on the JVM as well. The root cause is the cooperative concurrency model: asyncs, futures, etc. are scheduled to run on a thread pool, and a pool thread switches to another async only when the current one yields. So if an async is blocked by an IO call, or is just executing a long computation, its OS thread cannot do anything else: the scheduler is unable to "pause" such an async and run another one. So asynchronous code on .NET, the JVM, Rust, etc. is inherently unsafe in this respect: there is no guarantee that all existing asyncs make progress.

Things are different in the Go runtime: the scheduler is preemptive, so it can "pause" a goroutine at (almost) any point, not only at function calls or in loops. This makes writing concurrent code dead easy: there are no "async" functions, lambdas or blocks of code; every function and every call is the same, whether it's executed in the "main" thread or in a goroutine.

Let's look at the code mentioned above, rewritten in Go:

No let!, await, async, do!, StartAsTask, match! and so on. Need to run some code concurrently? Add the go keyword. Done.
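To see the preemption in action, here is a small sketch (not the code from this post). With runtime.GOMAXPROCS(1) a single logical processor runs every goroutine, and several goroutines spin in tight loops that never block or yield. A purely cooperative scheduler would never get back to the sleeping main goroutine; the Go scheduler preempts the spinners, so it resumes. It assumes Go 1.19+ (for atomic.Bool and atomic.Int64) on a platform with asynchronous preemption (added in Go 1.14); with GODEBUG=asyncpreemptoff=1 it may hang. busyLoop and mainStillRuns are made-up names.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// busyLoop burns CPU and never blocks, sleeps or calls a yielding function.
// Under a purely cooperative scheduler it would monopolize its thread.
func busyLoop(stop *atomic.Bool, iterations *atomic.Int64) {
	for !stop.Load() {
		iterations.Add(1)
	}
}

// mainStillRuns starts n busy goroutines while only one P (logical processor)
// exists, parks the calling goroutine, and reports whether the busy loops ran.
// Returning at all means the caller was scheduled again despite the spinners.
func mainStillRuns(n int) bool {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var stop atomic.Bool
	var iterations atomic.Int64
	for i := 0; i < n; i++ {
		go busyLoop(&stop, &iterations)
	}
	time.Sleep(50 * time.Millisecond) // hand the only P to the busy loops
	stop.Store(true)                  // reachable only because we were preempted back in
	return iterations.Load() > 0
}

func main() {
	fmt.Println("busy loops ran and main resumed:", mainStillRuns(4))
}
```

Note that busyLoop contains nothing resembling await or a yield: fairness comes entirely from the runtime.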

Load balancing: Rancher vs Swarm

2016-08-13T18:28:00.002+03:00

Rancher has a built-in load balancer (HAProxy). Let's compare its performance with Docker Swarm's. I will use 3 identical nodes:

192GB RAM
28-core i5 Xeon
1 Gbit LAN
CentOS 7
Docker 1.12
Rancher 1.1.2

I will benchmark a hello-world HTTP server written in Scala with Akka-HTTP and Spray JSON serialization (I don't think the choice matters, though); the sources are on GitHub. I will use the Apache ab benchmark tool.
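The Scala sources are on GitHub and not reproduced here. For anyone who wants a comparable target in another language, here is a minimal Go sketch of a hello-world JSON endpoint using only net/http and encoding/json. The Person type and its {"name": ...} shape are assumptions (the post doesn't show the service's JSON); the /person/ route and port 29001 come from the ab command in this post.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"strings"
)

// Person is a hypothetical response shape for GET /person/{name}.
type Person struct {
	Name string `json:"name"`
}

// personJSON extracts {name} from a /person/{name} path and marshals it.
// It reports false for paths that don't match.
func personJSON(path string) ([]byte, bool) {
	if !strings.HasPrefix(path, "/person/") {
		return nil, false
	}
	name := strings.TrimPrefix(path, "/person/")
	if name == "" || strings.Contains(name, "/") {
		return nil, false
	}
	body, err := json.Marshal(Person{Name: name})
	return body, err == nil
}

func personHandler(w http.ResponseWriter, r *http.Request) {
	body, ok := personJSON(r.URL.Path)
	if !ok {
		http.NotFound(w, r)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_, _ = w.Write(body) // a failed write means the client went away
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/person/", personHandler)
	log.Fatal(http.ListenAndServe(":29001", mux))
}
```

Keeping the JSON building in personJSON makes it testable without starting a server; net/http keeps connections alive by default, which matches ab's -k mode.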

As a baseline, I exposed the web server port outside the container and ran the following command:

ab -n 100000 -c 20 -k http://1.1.1.1:29001/person/kot

It shows 22400 requests per second. I'm not sure whether that's a good result for Akka-HTTP, considering that some servers written in C can handle hundreds of thousands of requests per second, but that's not the topic of this blog post. (With 100 concurrent connections (-c 100) it shows ~50k req/sec. I don't know whether that number is good enough either :) )

Next I created a so-called Rancher "stack" containing our web server and a built-in load balancer:

Now let's run the same benchmark against the load balancer, increasing the number of akka-http containers one by one:

Containers	Req/sec
1	755
2	1490
3	2200
4	3110
5	3990
6	4560
7	4745
8	4828

OK, it looks like HAProxy introduces a lot of overhead. Let's see how well Swarm's internal load balancer handles the same load. After initializing Swarm on all nodes, create a service:

docker service create --name akka-http --publish 32020:29001/tcp private-repository:5000/akkahttp1:1.11

Check that the container is running:

$ docker service ps akka-http

0yb99vo9btmx3t1wluvd0fgo6 akka-http.1 1.1.1.1:5000/akkahttp1:1.11 xxxx Running Running 3 minutes ago

OK, great. Now I will scale our service up one container at a time, running the same benchmark as I go:

docker service scale akka-http=2

docker service scale akka-http=3

...

docker service scale akka-http=8

Results:

Containers	Req/sec
1	19200
2	19700
3	18700
4	18700
5	18300
6	18800
7	17900
8	18300

Much better! The Swarm LB does introduce a little overhead, but it's totally tolerable. What's more, if I run the benchmark against the node where the single container is running, the Swarm LB shows exactly the same performance as the directly exposed web server (22400 req/sec in my case).

To make this blog post less boring, I added a nice picture :)

I also ran JMeter (8 parallel threads): direct: 6700 req/sec; Swarm: 6500 (1 container) to 5600 (8 containers); Rancher: 835 (1 container) to 2400 (8 containers). That's roughly consistent with the ab results.

Running computationally intensive code outside of the Hopac scheduler

2016-06-11T22:48:00.000+03:00

Hopac uses a bounded pool of worker threads whose size equals the number of CPU cores (by default). The danger of this design is that all the threads can end up busy doing CPU-intensive work, so that no other Hopac jobs can proceed. A good solution is to run such CPU-bound computations on the standard .NET thread pool, freeing the Hopac pool for more useful work. I found nice code in one of the older Hopac GitHub discussions which schedules an ordinary function on the ThreadPool and represents the result as a Hopac job.

Here is a test with explanations:
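For comparison, here is a rough Go analogue of the same pattern; it's a sketch, not the Hopac code. CPU-bound functions are submitted to a separate bounded pool, and the caller gets back a channel that yields the result, playing the role of the Hopac job. cpuPool, newCPUPool, submit and sumTo are hypothetical names, and Go 1.18+ is assumed for generics. Go needs this less than Hopac does, because its scheduler preempts long computations, but a bounded pool still caps how many heavy computations run at once.

```go
package main

import (
	"fmt"
	"runtime"
)

// cpuPool is a separate, bounded pool for CPU-heavy work (hypothetical type).
type cpuPool struct {
	slots chan struct{} // counting semaphore: at most cap(slots) tasks run at once
}

// newCPUPool creates a pool; size must be at least 1.
func newCPUPool(size int) *cpuPool {
	return &cpuPool{slots: make(chan struct{}, size)}
}

// submit schedules f on the pool and returns a channel that receives f's
// result exactly once, a rough stand-in for "the result as a Hopac job".
func submit[T any](p *cpuPool, f func() T) <-chan T {
	result := make(chan T, 1) // buffered: the worker never blocks on send
	go func() {
		p.slots <- struct{}{}        // acquire a slot (waits while the pool is full)
		defer func() { <-p.slots }() // release it when f returns
		result <- f()
	}()
	return result
}

// sumTo is a stand-in CPU-bound function.
func sumTo(n int) int {
	s := 0
	for i := 1; i <= n; i++ {
		s += i
	}
	return s
}

func main() {
	pool := newCPUPool(runtime.NumCPU())
	a := submit(pool, func() int { return sumTo(1_000_000) })
	b := submit(pool, func() int { return sumTo(2_000_000) })
	fmt.Println(<-a, <-b) // 500000500000 2000001000000
}
```

submit still starts one goroutine per task; only the number of computations running at the same time is bounded.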

Upcoming F# struct tuples: are they always faster?

2016-05-27T20:21:00.001+03:00

Don Syme has been working on struct tuples for the F# language. Let's see whether they are faster than the "old" (heap-allocated) tuples in a simple scenario: returning a tuple from a function. The code is very simple:

Decompiled code in Release configuration:

All we need to change to switch to struct tuples is to add the "struct" keyword in front of the constructor and the pattern:

Decompiled code in Release configuration:

I don't know about you, but I was surprised by these results: the performance is roughly the same. GC is not a bottleneck, as no objects were promoted to generation 1.

Conclusions:

Using struct tuples as a faster or "GC-friendly" alternative for returning multiple values from functions does not make sense.
Building in Release mode erases heap-allocated tuples, but not struct tuples.
Building in Release mode inlines the "foo" function, which makes the code 10x faster.
You can fearlessly allocate tens of millions of short-lived objects per second; performance will be great.

Hash maps: Rust, F#, D, Go, Scala

2016-05-22T19:00:00.000+03:00

Let's compare the performance of the hash map implementations in Rust, .NET, D (LDC) and Go.
Rust:
F#:
As you can see, Rust is about 17% slower on insertions and about 21% slower on lookups.

Update

As @GolDDranks suggested on Twitter, since Rust 1.7 it's possible to use custom hashers in HashMap. Let's try it:
Yes, it's significantly faster: insertions are only 5% slower than in the .NET implementation, and lookups are 32% *faster*! Great.

Update: D added

LDC x64 on Windows
It's very slow at insertions and quite fast at lookups.

Update: Go added

Update: Scala added

Compared to Scala, all the other languages look equally fast :) What's worse, Scala loaded all four CPU cores at almost 100% during the test, while the others used roughly a single core. My guess is that the JVM allocated so many objects (each boxed Int is an object, BTW) that 3/4 of the CPU time was spent on garbage collection. However, I'm a Scala/JVM noob, so I may simply have written the whole benchmark the wrong way. Scala developers, please review the code and explain why it's so slow (the full IDEA/SBT project is here). Thanks!
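The Go code from the update isn't reproduced here. To illustrate the kind of benchmark involved, here is a minimal Go sketch that times insertions and then lookups of sequential int keys in the built-in map. The key count and the decision not to pre-size the map are assumptions rather than the post's settings, so its numbers aren't comparable with the results in this post.

```go
package main

import (
	"fmt"
	"time"
)

// benchMap inserts n int keys into a Go map, then looks each of them up,
// timing both phases. It returns the number of successful lookups so the
// lookup loop can't be optimized away.
func benchMap(n int) (insert, lookup time.Duration, found int) {
	m := make(map[int]int) // deliberately not pre-sized: growth cost is included
	start := time.Now()
	for i := 0; i < n; i++ {
		m[i] = i
	}
	insert = time.Since(start)

	start = time.Now()
	for i := 0; i < n; i++ {
		if _, ok := m[i]; ok {
			found++
		}
	}
	lookup = time.Since(start)
	return insert, lookup, found
}

func main() {
	const n = 1_000_000 // hypothetical size
	ins, look, found := benchMap(n)
	fmt.Printf("insert: %v, lookup: %v, found: %d\n", ins, look, found)
}
```

Unlike Rust's HashMap, Go's built-in map doesn't let you plug in a custom hasher, so there is no equivalent of the hasher tweak from the first update.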

Akka.NET Streams vs Hopac vs AsyncSeq

2016-05-04T14:30:00.001+03:00

Akka.NET Streams is a port of its Scala/Java counterpart and is intended for executing complex data processing graphs, optionally in parallel and even distributed across nodes. Its semantics are quite different from Hopac's, and it would be wrong to compare them feature by feature, but it's still interesting to benchmark them in a scenario which both support well: read the lines of a file asynchronously, filter them by a regex with a controlled degree of parallelism, then normalize the lines with a simple string manipulation algorithm, also in parallel, and finally count the lines.

First, Akka.NET:

Note that I have to use the empty string to indicate that the regular expression does not match. I should use `option`, of course (just like I do in the Hopac snippet below), but Akka.NET Streams is strict about what its combinators like `Map` or `Filter` are allowed to return. In particular, you cannot return `null`: doing so makes Akka.NET unhappy and it will throw an exception at you. In F#, expressions like `fun x -> printfn "%O" x` and `fun x -> None` return `()` and `None` respectively, both of which are represented as `null` at runtime, so you have to be very careful when `Map`ping and `Filter`ing (and, actually, when using any combinator) with side-effecting functions or functions returning `Option`s (just don't do either).

Now, Hopac:

And finally AsyncSeq:

The number of allocations is roughly the same for Hopac and Akka, but an order of magnitude larger for AsyncSeq.

Conclusions

Use Hopac if you need the best performance available on .NET, or if you need to implement arbitrarily complex concurrent scenarios.
Akka.NET is quite fast and has a full-blown graph definition DSL, so it's great for implementing complex stream processing that can run on a cluster of nodes. However, it has a typical "fluent" C#-targeted API, so you have to write a thin layer over it to make it usable from F#.
AsyncSeq has the most F#-friendly API: it's just a combination of two computation expressions every F# programmer knows, Async and Seq.

Update 6 May, 2016

Marc Piechura suggested a way to exclude the materialization phase from the benchmark; here is the modified code:

It turns out it takes Akka.NET about 3 seconds to materialize the graph.

Thanks to Vesa Karvonen for helping fix the Hopac version and to Lev Gorodinski for fixing AsyncSeq performance (initially it was awfully slow).

Regular expressions: Rust vs F# vs Scala

2015-09-24T18:45:00.002+03:00

Let's implement the following task: read the first 10M lines from a text file in the following format:

then find all lines containing the Microsoft namespace and format the type names the usual way, like "Microsoft.Win32.IAssemblyEnum".

First, F#:

Now Rust:

After several launches the file was cached by the OS and neither implementation was IO-bound any more. The F# one took 29 seconds and 31 MB of RAM at peak; the Rust one took 11 seconds and 18 MB.

The Rust code is twice as long as the F# one, but it handles all possible errors explicitly: no surprises at runtime at all. The F# code may throw exceptions (of what kinds? Nobody knows). It's possible to wrap all calls to the .NET Framework with `Choice.attempt (fun _ -> ...)`, then define custom error types for the regex-related code, for the IO code and a top-level one, but the code would be even longer than Rust's, hard to read, and it would still give no guarantee that we catch all possible exceptions.

Update 4 Jan 2016: Scala added:

OK, it turns out that regex performance may depend on whether the pattern is case sensitive or not. What's worse, I tested F# with a case-insensitive pattern, but Rust with a case-sensitive one. Anyway, as I've recently upgraded my machine (i5-750 => i7-4790K), I've rerun the F# and Rust versions in both regex modes and added Scala to the mix. First, case-sensitive mode:

F# (F# 4.0, .NET 4.6.1) - 4.8 secs
Scala (2.11.7, Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_65) - 3.5 secs
Rust (1.7.0-nightly (bfb4212ee 2016-01-01)) - 5.9 secs

Now, case insensitive:

F# (F# 4.0, .NET 4.6.1) - 15.5 secs
Scala (2.11.7, Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_65) - 3.2 secs
Rust (1.7.0-nightly (bfb4212ee 2016-01-01)) - 6.1 secs

Although case-sensitive patterns perform roughly the same on all the platforms, it's quite surprising that Rust is not the winner.

Scala is faster in case-insensitive mode (?), Rust is slightly slower, and now the question: what's wrong with the .NET implementation? In case-insensitive mode it's more than 3 times slower than in case-sensitive mode, and roughly 2.5 to 5 times slower than the others.

Update 4 Jan 2016: D added.

regex - 10.6 s (DMD), 7.8 s (LDC)
ctRegex! - 6.9 s (DMD), 6.6 s (LDC)

Update 6 Jan 2016: Elixir added:

It takes 56 seconds to finish.

Update 6 Jan 2016: Haskell added:

It takes 20 seconds.

Update 7 Jan 2016: Nemerle added:

It takes 3.8 seconds (case sensitive) and 7.1 seconds (case insensitive).

Update 8 Jan 2016: Nemerle PEG added:

It takes 4.1 seconds.

All results so far:

Language	Case sensitive (s)	Case insensitive (s)
F#	4.80	15.50
Scala	3.50	3.20
Rust	5.90	6.10
DMD	6.90
LDC	6.60
Elixir	56.00
Haskell	20.00
Nemerle	3.80	7.10
Nemerle PEG	4.10	4.20

Update 3 Dec 2017: Rust's regex crate updated and the code cleaned up; F# run on .NET Core 2.0:

F#

Rust

The Rust version is now 2x faster; on .NET Core, F# is slower in case-sensitive mode and faster in case-insensitive mode:

All results so far:

Language	Case sensitive (s)	Case insensitive (s)
F#	6.57	9.56
Scala	3.50	3.20
Rust	2.97	3.02
DMD	6.90
LDC	6.60
Elixir	56.00
Haskell	20.00
Nemerle	3.80	7.10
Nemerle PEG	4.10	4.20

I removed Elixir from the chart, as it's a clear outsider and its result makes the chart harder to read:

Elixir: first look

2015-07-18T13:34:00.003+03:00

I don't have a clear impression of the Elixir language yet. I don't like its Ruby-like syntax, but I do like that it has a pipe operator and macros. So, Fibonacci:

It executes in about 13 seconds, which is on par with Erlang (even faster, for an unknown reason); no surprises here.

D (GDC) - 0.990
C# - 1.26
D (DMD) - 1.3
C++ - 1.33
F# - 1.38
Nemerle - 1.45
Rust - 1.66
Go - 2.38
Haskell - 2.8
Clojure - 9
Elixir - 13
Erlang - 17
Ruby - 60
Python - 120

SHA1 compile time checked literals: F# vs Nemerle vs D

2015-06-22T16:36:00.000+03:00

I've always been interested in metaprogramming. Sooner or later I start to feel constrained in a language without it. F# is a really nice language, but I'm afraid I'd have got bored with it if it didn't have Type Providers, for example. Why is metaprogramming so important? Because it allows changing a language without hacking the compiler. It makes possible things which seemed impossible to implement.

I deal with cryptographic hashes a lot at work, nothing rocket science, just MD5, SHA-1 and so on. And I write tons of tests where such hashes are used in the form of string literals, like this:

The problem with this code is that the compiler cannot guarantee that the hex string in the last line represents a valid SHA-1. If it does not, the test will fail at runtime for a reason it was not intended to test.

OK, now we can formulate our task: provide a language construct that enforces, at compile time, that a string literal is a valid hexadecimal SHA-1. We will explore how much work is required to implement such a simple feature in F#, Nemerle and D. It's also interesting to see how good the development workflow is in each of these languages: IDE integration, error reporting and the testing cycle.

F#

Type Providers are the only way to check (at compile time) that a string is valid hex and that its length is exactly 40 characters (SHA-1 is a 20-byte hash). Actually, I've written this type provider before. The interesting part looks like this:

It includes caching, and the `HexParser` module is not shown, but those details are not important here. It's simple, and it generates a Value property which directly returns a byte array created at compile time.

Error reporting:

Nemerle

Nemerle has full-fledged macros, which are strictly more powerful than F#'s Type Providers. Let's see whether they allow solving the task elegantly:

Error reporting:

D

The code does not use anything unusual and does not manipulate the AST. Just plain D code. Very elegant. Note that the template is defined in the same file where it's used. Contrast this with F# and Nemerle, where you have to put your Type Provider / macros into a dedicated assembly.

Error reporting:

The error is reported in the template itself, though, not at the instantiation point.

Performance

I added 1000 usages of the TP, the macro and the template and measured compilation time.

F# - 5 seconds
Nemerle - 2 seconds
D - the compiler crashes with "Error: out of memory" after 1 minute of work.

Fib: C++, C# and GDC

2015-06-20T17:20:00.000+03:00

As a reference implementation, I added a C++ one:

Its execution time is 1.33 seconds, which, surprisingly, is not the best result so far.

A C# version:

Also, I compiled this D code with the GDC compiler and it executed in 990 ms, which is the best result:

D (GDC) - 0.990
C# - 1.26
D (DMD) - 1.3
C++ - 1.33
F# - 1.38
Nemerle - 1.45
Rust - 1.66
Go - 2.38
Haskell - 2.8
Clojure - 9
Erlang - 17
Ruby - 60
Python - 120

Unfortunately, I have not managed to compile the D code with the LDC compiler; it returns the following error:

Building: DFib (Release)

Performing main compilation...

Current dictionary: d:\git\DFib\DFib

D:\ldc2-0.15.2-beta1-win64-msvc\bin\ldc2.exe -O3 -release "main.d" "-od=obj\Release" "-of=d:\git\DFib\DFib\bin\Release\DFib.exe"

LINK : fatal error LNK1181: cannot open input file 'kernel32.lib'

Error: C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\link.exe failed with status: 1181

Exit code 1181

Composing custom error types in F#

2015-05-16T21:20:00.000+03:00

I strongly believe that we should keep code as referentially transparent as possible. Unfortunately, the F# language does not encourage programmers to use the Either monad to deal with errors. The common practice in the community is the exception-based approach that's common in the rest of the (imperative) .NET world. In my experience, almost all bugs found in production are caused by unhandled exceptions.

The problem

In our project we've been using the Either monad for error handling with great success for about two years. ExtCore is a great library that makes dealing with Either, Reader, State and other monads, and their combinations, really easy. Consider typical error handling code which makes use of the Choice computation expression from ExtCore:

The code is a bit hairy because of the explicit error mapping. We could introduce an operator as a synonym for Choice.mapError, like <!>, which makes the code a bit cleaner:

(actually, that's the approach we use in our team).

Rust composable errors

I was completely happy until today, when I read the "Error Handling in Rust" article and found out how elegantly errors are composed using the From trait. By implementing it for an error type, you enable the try! macro to convert lower-level errors into it automatically, which eliminates error mapping completely. I encourage you to read that article: it explains good error handling in general, and it's totally applicable to F#.

Porting to F#

Unfortunately, neither F# nor .NET supports static interface members, so we cannot just introduce IError with a static member From: 'a -> 'this, like we can in Rust. But in F# we can use statically resolved type parameters to get the result we need. The idea is that each "higher-level" error type defines a bunch of static methods, each of which converts some lower-level error type to one of the error type's cases:

Now we can write a generic function which can create any higher-level error type that defines From methods:

Now we can rewrite our processFile function without explicit mapping to concrete error cases:

Great. But it's still not as clean as it could be. The remaining bit is to modify the Choice computation expression builder so that it does the same implicit conversion in its Bind method (it's ExtCore's ChoiceBuilder as is, but without the For and While methods):

The CE now requires all errors to be convertible to its main error type, including the error type itself, so we have to add one more static From method to the Error type, and finally we can remove all the noise from our processFile function:

Go: fib

2015-05-04T22:38:00.000+03:00

Go code is relatively low-level, since Go does not have a "foreach over range" syntax construct:

The result is not that impressive for a systems language: 2.38 seconds, slower than Rust but faster than Haskell:

C# - 1.26
D (DMD) - 1.3
F# - 1.38
Nemerle - 1.45
Rust - 1.66
Go - 2.38
Haskell - 2.8
Clojure - 9
Erlang - 17
Ruby - 60
Python - 120
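The Go source isn't reproduced here, so here is a minimal sketch of the naive doubly recursive Fibonacci these benchmarks are typically based on. The fib(0) = 0, fib(1) = 1 convention and the argument 35 are assumptions; the post doesn't say which n produced 2.38 seconds, so this sketch's timing isn't comparable with the list.

```go
package main

import (
	"fmt"
	"time"
)

// fib is the naive recursive Fibonacci used as a micro-benchmark:
// exponential time, no memoization, on purpose.
func fib(n int) int {
	if n < 2 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

func main() {
	const n = 35 // hypothetical argument
	start := time.Now()
	r := fib(n)
	fmt.Printf("fib(%d) = %d in %v\n", n, r, time.Since(start))
}
```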

Computing cryptography hashes: Rust, F#, D and Scala

2015-04-11T17:40:00.001+03:00

Let's compare how fast Rust, D and F# (.NET, actually) are at computing cryptographic hashes, namely MD5, SHA1, SHA256 and SHA512. For Rust, we're going to use the rust-crypto crate:

Results:

MD5 - 3.39s
SHA1 - 2.89s
SHA256 - 6.97s
SHA512 - 4.47s

Now the F# code:

Results (.NET 4.5, VS 2013, F# 3.1):

MD5CryptoServiceProvider - 2.32s (32% faster)
SHA1CryptoServiceProvider - 2.92s (1% slower)
SHA256Managed - 16.50s (236% slower)
SHA256CryptoServiceProvider - 11.50s (164% slower)
SHA256Cng - 11.71s (168% slower)
SHA512Managed - 61.04s (1365% slower)
SHA512CryptoServiceProvider - 21.88s (489% slower)
SHA512Cng - 22.19s (496% slower)

(.NET 4.6, VS 2015, F# 4.0):

MD5CryptoServiceProvider - 2.55s
SHA1CryptoServiceProvider - 2.89s
SHA256Managed - 17.01s
SHA256CryptoServiceProvider - 8.74s
SHA256Cng - 8.75s
SHA512Managed - 23.42s
SHA512CryptoServiceProvider - 5.81s
SHA512Cng - 5.79s

DMD

MD5 - 16.05s (470% slower)
SHA1 - 2.35s (19% faster)
SHA256 - 47.96s (690% slower (!))
SHA512 - 61.47s (1375% slower (!))

LDC2

MD5 - 2.18s (55% faster)
SHA1 - 2.88s (same)
SHA256 - 6.79s (3% faster)
SHA512 - 4.6s (3% slower)

GDC

MD5 - 2.43s (29% faster)
SHA1 - 2.84s (2% faster)
SHA256 - 12.62s (45% slower)
SHA512 - 8.56s (48% slower)

Scala:

MD5 - 4.2s (23% slower)
SHA1 - 6.09s (110% slower)
SHA256 - 9.96s (42% slower)
SHA512 - 7.32s (63% slower)

Interesting things:

Rust and D (LDC2) show very close results. D is significantly faster on MD5, so it's the winner!
D (DMD) performs very badly on all algorithms except SHA1, where it wins.
SHA512Managed .NET class is very slow. Do not use it.
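For readers who want to try one more language, here is a minimal Go sketch of the same measurement using the standard crypto/md5, crypto/sha1, crypto/sha256 and crypto/sha512 packages. The 1 MiB buffer and 100 rounds are assumptions (the post doesn't state the input size), so the timings it prints aren't comparable with the tables in this post; hexDigest is there to sanity-check each hasher against the standard "abc" test vectors.

```go
package main

import (
	"crypto/md5"
	"crypto/sha1"
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"hash"
	"time"
)

// hashers maps the algorithm names used in the post to Go constructors.
var hashers = map[string]func() hash.Hash{
	"MD5":    md5.New,
	"SHA1":   sha1.New,
	"SHA256": sha256.New,
	"SHA512": sha512.New,
}

// hexDigest hashes data once and returns the lowercase hex digest.
func hexDigest(name string, data []byte) string {
	h := hashers[name]()
	h.Write(data) // hash.Hash.Write never returns an error
	return hex.EncodeToString(h.Sum(nil))
}

// timeHash feeds `rounds` copies of buf through one hasher and times it.
func timeHash(name string, buf []byte, rounds int) time.Duration {
	h := hashers[name]()
	start := time.Now()
	for i := 0; i < rounds; i++ {
		h.Write(buf)
	}
	h.Sum(nil)
	return time.Since(start)
}

func main() {
	buf := make([]byte, 1<<20) // 1 MiB of zeros (hypothetical input)
	for _, name := range []string{"MD5", "SHA1", "SHA256", "SHA512"} {
		fmt.Printf("%s - %v\n", name, timeHash(name, buf, 100))
	}
}
```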

Rust: fib

2015-03-29T21:05:00.000+03:00

Rust is an interesting language. It's not a primitive one like Go, where we don't have ADTs, pattern matching or generics (but we do have nils). And it's advertised as a safe and performant systems language. Today is the very first day I'm looking at it. Let's "smoke" test it with Fibonacci :)

Debug: 3.44 seconds, release: 1.66 seconds. This is not very impressive, but pretty fast indeed.

C# - 1.26
D (DMD) - 1.3
F# - 1.38
Nemerle - 1.45
Rust - 1.66
Haskell - 2.8
Clojure - 9
Erlang - 17
Ruby - 60
Python - 120

It will be very interesting to see how it behaves in the concurrent Fibonacci test.

The compiler is quite slow: it takes 2-3 seconds to build this tiny program.

Parallel reduce: Hopac, Asyncs, Tasks and Scala's Futures

2015-01-10T17:11:00.000+03:00

Tuomas Hietanen posted a parallel reduce function that uses TPL Tasks. I found it interesting to compare the performance of this function with analogues implemented using F# Asyncs, Hopac Jobs and Scala Futures.

The author uses a no-op long-running reduce function to show that it really runs in parallel. In this blog post we benchmark another aspect of the implementations: how much overhead a particular parallelization mechanism (library) introduces by itself.

We translate the original code almost as is to Tasks and Hopac:

And Scala's Futures:

The results (Core i5, 4 cores):

Sequential List.reduce: Real: 00:00:00.014, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
Tasks: Real: 00:00:01.790, CPU: 00:00:05.678, GC gen0: 36, gen1: 10, gen2: 1
Hopac: Real: 00:00:00.514, CPU: 00:00:01.482, GC gen0: 27, gen1: 2, gen2: 1
Asyncs: Real: 00:00:37.872, CPU: 00:01:48.405, GC gen0: 90, gen1: 29, gen2: 4
Scala Futures: 4.8 seconds

(Hopac - 3.4 times faster, Asyncs - 21.1 times slower, Scala - 1.8 times slower)

Hopac is ~3.5 times faster than TPL. What's wrong with Asyncs? I don't know. Maybe they are not intended for highly concurrent scenarios. Or my code may not be the most efficient. Any ideas, guys?

Let's test the leaders on larger arrays:

(Hopac is 3.37 times faster, Scala is 1.5 times slower)

(Hopac is 5.25 times faster, Scala is 1.05 times slower)
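For comparison with the Tasks, Hopac and Scala versions, here is a minimal Go sketch of the same divide-and-conquer reduce: the left half is reduced in a new goroutine while the current goroutine reduces the right half. The cutoff parameter (sequential reduction for short slices) is an assumption added to keep goroutine overhead bounded; the original function doesn't necessarily have one. Go 1.18+ is assumed for generics.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelReduce combines xs with the associative function f. It reduces the
// left half in a new goroutine and the right half in the current one; slices
// no longer than cutoff are reduced sequentially. xs must be non-empty.
func parallelReduce[T any](xs []T, f func(T, T) T, cutoff int) T {
	if len(xs) <= cutoff || len(xs) < 2 {
		acc := xs[0]
		for _, x := range xs[1:] {
			acc = f(acc, x)
		}
		return acc
	}
	mid := len(xs) / 2
	var left T
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		left = parallelReduce(xs[:mid], f, cutoff)
	}()
	right := parallelReduce(xs[mid:], f, cutoff)
	wg.Wait()
	return f(left, right) // left before right: order-preserving
}

func main() {
	xs := make([]int, 1_000_000)
	for i := range xs {
		xs[i] = i + 1
	}
	fmt.Println(parallelReduce(xs, func(a, b int) int { return a + b }, 10_000)) // 500000500000
}
```

Because the left result is always combined before the right one, the sketch is correct for any associative function, commutative or not (string concatenation, for example).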

Fibonacci: Hopac vs Async vs TPL Tasks on .NET and Mono

2015-01-07T20:51:00.002+03:00

Hopac claims that its Jobs are much more lightweight than F# Asyncs. There are many benchmarks in the Hopac GitHub repository, but I wanted a simple and straightforward benchmark, and what could be simpler than a parallel Fibonacci algorithm? :) (Actually, there's a more comprehensive benchmark in the Hopac repository itself, see Fibonacci.fs.)

The sequential Fibonacci function is usually defined as

So let's write a parallel version in Hopac, where each step is performed in a Job and all these Jobs are (potentially) run in parallel by Hopac's scheduler:

An equivalent parallel algorithm written using F# Asyncs:

...and using TPL Tasks:

All three functions create *a lot* of parallel jobs/asyncs/tasks. For example, to calculate fib (34) they create ~14 million jobs (this is why Fibonacci was chosen for this test). To make them work efficiently, we will use the sequential version of fib for small N and switch to the parallel version for larger N.
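The same threshold idea can be sketched in Go: at or below the level, call the sequential fib; above it, compute fib(n-1) in a new goroutine and fib(n-2) in the current one, counting spawned goroutines the way the post counts jobs/asyncs/tasks. This is a minimal sketch, not a port of the Hopac, Async or TPL code. The exact meaning of level here (sequential when n <= level) is an assumption, and Go 1.19+ is assumed for atomic.Int64.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// fib is the sequential version used below the threshold.
func fib(n int) int {
	if n < 2 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

// spawned counts goroutines started by pfib.
var spawned atomic.Int64

// pfib computes fib(n), forking fib(n-1) into a goroutine while n > level.
func pfib(n, level int) int {
	if n <= level || n < 2 {
		return fib(n)
	}
	spawned.Add(1)
	ch := make(chan int, 1) // buffered: the child never blocks on send
	go func() { ch <- pfib(n-1, level) }()
	b := pfib(n-2, level)
	return <-ch + b
}

func main() {
	spawned.Store(0)
	r := pfib(30, 15)
	fmt.Println(r, "goroutines:", spawned.Load()) // r is 832040
}
```

Raising level trades parallelism for fewer, larger units of work, which is the same trade-off the level sweeps in this post explore.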

Now we can run both functions with different "level"s in order to find the value at which the parallel functions start to perform well (x-axis: level, y-axis: time (ms), blue line: the sequential function, orange line: the Hopac/Async/Tasks function):

Hopac

Async

Tasks

Hopac reaches performance equivalent to the sequential implementation at level = 9, Async at level = 17 and Tasks at level = 11.

If we modify the code so that it counts how many jobs/asyncs/tasks are created during the calculation:

We get the following results (n = 42):

* Sequential, Real: 00:00:01.849, CPU: 00:00:01.840, GC gen0: 0, gen1: 0, gen2: 0

* Hopac (level = 9) jobs: 28761996, Real: 00:00:01.700, CPU: 00:00:05.600, GC gen0: 89, gen1: 1, gen2: 0

* Async (level = 17) asyncs: 605898, Real: 00:00:01.515, CPU: 00:00:04.804, GC gen0: 4, gen1: 2, gen2: 0

* Tasks (level = 11) tasks: 5675789, Real: 00:00:01.813, CPU: 00:00:06.302, GC gen0: 18, gen1: 0, gen2: 0

So Hopac was able to create and process ~47x more jobs than Async and ~5x more jobs than Tasks. Hopac is impressive and F# Asyncs are frustrating.

PS: Rewriting the async version without an explicit async computation expression, like this

does not improve performance at all.

Running on Mono (Ubuntu 14.10 x64, mono 3.10)

* Sequential, Real: 00:00:02.637, CPU: 00:00:02.636, GC gen0: 0, gen1: 0

* Hopac (level = 17) jobs: 629133, Real: 00:00:02.447, CPU: 00:00:06.106, GC gen0: 26, gen1: 1

* Async (level = 21) asyncs: 92375, Real: 00:00:02.845, CPU: 00:00:05.590, GC gen0: 86, gen1: 3

* Tasks (level = 33) tasks: 143, Real: 00:00:14.111, CPU: 00:00:03.782, GC gen0: 0, gen1: 0

Hopac can handle ~6.8x more jobs than F# Async. I'm not sure whether F# Asyncs perform very well on Mono or whether everything just works extremely slowly there. As for TPL, it's obviously broken on Mono (the official Hopac Fibonacci benchmark does not even run the TPL version on Mono: Fibonacci.fs#L233).

Update 10.09.2017 - use BenchmarkDotNet

n = 30, level = 15

 Method |     Mean |     Error |    StdDev |
------- |---------:|----------:|----------:|
    Fib | 8.208 ms | 0.0432 ms | 0.0383 ms |
   HFib | 1.860 ms | 0.0045 ms | 0.0042 ms |
   AFib | 4.921 ms | 0.0330 ms | 0.0292 ms |
   TFib | 2.229 ms | 0.0184 ms | 0.0172 ms |

n = 20, level = 0

 Method |         Mean |      Error |       StdDev |
------- |-------------:|-----------:|-------------:|
    Fib |     68.21 us |   1.258 us |     1.177 us |
   HFib |    356.21 us |   7.180 us |    11.595 us |
   AFib | 31,815.44 us | 636.249 us | 1,524.413 us |
   TFib |  1,623.25 us |  32.206 us |    33.073 us |

D

2014-10-25T17:44:00.002+04:00

I have to say, D is a very interesting language with several unique features. What about performance? How does it compare to VM-based languages?

It took 3.6 seconds in debug mode and 1.3 seconds in release mode, which is on par with F#.

C# - 1.26
D (DMD) - 1.3
F# - 1.38
Nemerle - 1.45
Haskell - 2.8
Clojure - 9
Erlang - 17
Ruby - 60
Python - 120

Scala pros and cons from F# dev view

2014-08-15T16:15:00.002+04:00

I recently started learning Scala (about 2 weeks ago). Here are the pros and cons so far (note: I haven't written any serious code yet; I've just read "Scala for the Impatient" and am now reading "Programming in Scala" by Odersky):

Pros

Passing an unevaluated block as an argument ("by-name arguments"; allows building better DSLs)
Macros!
There are a few libraries written with macros which would be impossible to implement in a language like C# or F# (MacWire etc.)
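Go has no by-name parameters, but passing a func() T is the explicit equivalent: the caller wraps the argument in a function, and the callee evaluates it only if, and each time, it calls it. A minimal sketch; logIf and expensiveMessage are hypothetical names. The difference from Scala is visible at the call site: the caller must pass a function value instead of a plain expression.

```go
package main

import "fmt"

// logIf takes its message "by name": msg is a function, so the possibly
// expensive string is only built when cond is true.
func logIf(cond bool, msg func() string) {
	if cond {
		fmt.Println(msg())
	}
}

var evaluations int // counts how often the message was actually built

func expensiveMessage() string {
	evaluations++
	return "expensive details"
}

func main() {
	logIf(false, expensiveMessage) // not evaluated
	logIf(true, expensiveMessage)  // evaluated once, prints "expensive details"
	fmt.Println("evaluations:", evaluations) // evaluations: 1
}
```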

Cons

No non-nullable types (this is a huge one)
Not whitespace sensitive (curly braces everywhere)
No type inference for arguments and, sometimes, for result type (signatures involving generic types may be really hairy)
No compiler warning on an implicitly discarded expression value: in possibly wrong code like arr.map(x => x * 2); arr.map(x => x * 3), the result of the first map is silently discarded. In contrast, F# forces us to write arr |> Array.map (fun x -> x * 2) |> ignore; arr |> Array.map (fun x -> x * 3).

STM revisited: add Scala

2014-08-13T22:19:00.002+04:00

Some time ago I compared the performance of the F# and Haskell STM implementations. Today I'm adding Scala to the mix:

So, it's more than two times slower than Haskell, but more than 3 times faster than F#. Interesting.

Duck typing in Scala

2014-08-10T11:57:00.001+04:00

Scala keeps surprising me. It has a lot of the good stuff that F# has. For example, I believed that statically resolved type parameters were unique to F#. It turns out this is not the case: Scala has an equivalent feature called "structural types", and its syntax is nicer than that of F#'s statically resolved type parameters. However, it uses reflection under the hood, while F# uses inlining to instantiate functions in place at compile time. I expected this approach to have awful performance characteristics. Surprisingly, that's not the case:

The static call took about 0.5 seconds; the "duck typing" call took about 4 seconds. That's only 8-10 times slower than the static call. Frankly, I expected something like a 100x-1000x degradation. So it's not as fast as the equivalent feature in F#, but it's certainly useful in practice in many circumstances.
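Go draws the same line in a different place. Its interfaces are structural (a type satisfies an interface just by having the methods), but satisfaction is checked at compile time and calls go through the interface's method table. The reflect package lets you look a method up by name at run time, which is roughly what Scala's structural types do. A minimal sketch of both styles; Quacker, Duck, Robot, viaInterface and viaReflection are hypothetical names, and no timings are claimed.

```go
package main

import (
	"fmt"
	"reflect"
)

// Quacker is satisfied structurally: any type with Quack() string
// implements it, without an "implements" declaration.
type Quacker interface {
	Quack() string
}

type Duck struct{}

func (Duck) Quack() string { return "quack" }

type Robot struct{ Model string }

func (r Robot) Quack() string { return r.Model + " says quack" }

// viaInterface is checked at compile time and dispatched via the method table.
func viaInterface(q Quacker) string { return q.Quack() }

// viaReflection looks Quack up by name at run time and reports false if the
// value has no suitable method.
func viaReflection(v any) (string, bool) {
	m := reflect.ValueOf(v).MethodByName("Quack")
	if !m.IsValid() || m.Type().NumIn() != 0 || m.Type().NumOut() != 1 {
		return "", false
	}
	s, ok := m.Call(nil)[0].Interface().(string)
	return s, ok
}

func main() {
	fmt.Println(viaInterface(Duck{}), "/", viaInterface(Robot{Model: "R2"}))
	s, _ := viaReflection(Robot{Model: "R2"})
	fmt.Println(s)
	_, ok := viaReflection(42) // int has no Quack method
	fmt.Println(ok)
}
```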

STM: F# vs Haskell

2013-09-22T20:57:00.000+04:00

STM is a very nice concurrent programming model used extensively in Haskell and Clojure. There's an F# implementation in the FSharpx library.

Today I'm going to test the performance of both the Haskell and the F# STMs. The test is very simple: read a couple of TVars, check their equality, then write them back incremented by 1; repeat a million times.

First, the Haskell code:

So it took about 170 ms. OK, now F#: it took about 1.6 seconds, which is an order of magnitude slower than the Haskell result. It's rather frustrating.

Grouping consecutive integers in F# and Haskell

2013-09-01T18:38:00.001+04:00

Let's look at what F# and Haskell can offer when reimplementing the "grouping consecutive integers in C#" algorithm.

F#:

Haskell:

To my current taste, the Haskell version is more readable thanks to the awesome separate type signature and the lovely Erlang-style pattern matching on function arguments.

Haskell: performance

2013-08-24T18:44:00.001+04:00

It turned out that it wasn't Haskell that was slow in my previous Fibonacci test; it was my lack of Haskell knowledge. Haskell has two integral types: Int, a fixed-size integer (with bounds -2147483648..2147483647 on a 32-bit GHC), and Integer, which is unbounded. Both types are instances of the Integral type class, so our fib function can be defined in terms of this type class:
OK, now we can test the performance of the function by instantiating it with the two concrete types. Integer first:
We get our previous result, ~15 seconds, which is rather slow. Now let's test it with the Int type:
Whoa! The Int version is more than 5 times faster than the Integer one! And with a result of 2.8 seconds, Haskell takes third place in our small rating :) The current list (in seconds):

C# - 1.26
F# - 1.38
Nemerle - 1.45
Haskell - 2.8
Clojure - 9
Erlang - 17
Ruby - 60
Python - 120

Beginning Haskell

2013-08-17T22:05:00.002+04:00

OK, it's been about a year since I started learning and using F#, and now it's time to learn Haskell. As usual, I started with the naive Fibonacci function. Its performance is not fantastic: it's actually ~4 times slower than F#. That's OK for now.

The Either monad: we can finally get rid of exceptions

2013-06-16T12:52:00.000+04:00

Jessica Kerr recently wrote on a very interesting topic: what is the best way to get rid of exceptions and the mess they introduce into the program flow.
Simply put, she gently introduced the Either monad in the Scala language.
Although I find Scala to be a very advanced OO language with reasonably good FP support, I don't think the code Jessica used in her post can convince anybody to abandon the habitual exception-handling flow in favor of data flow. Here's why: Scala is too verbose.
But this is not the case with more advanced functional languages like OCaml, Haskell or F#. Let's take a look at what the latter can offer:

The code is much clearer thanks to monadic function composition (>=>) and the amazing type inference.
Conclusion: if you do use monads, use their full power.
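Go has no monads or >=> operator, and idiomatic Go returns (value, error) pairs checked with if err != nil. Still, since Go 1.18 generics allow a minimal sketch of an Either-like Result type and of Kleisli composition. Result, Ok, Fail, Compose, parse and positive are hypothetical names, not a standard library API. Go can't define custom operators, so composition is a plain function call.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Result is a minimal Either-like type: Err == nil means Value is valid.
type Result[T any] struct {
	Value T
	Err   error
}

func Ok[T any](v T) Result[T]         { return Result[T]{Value: v} }
func Fail[T any](err error) Result[T] { return Result[T]{Err: err} }

// Compose is Kleisli composition (>=>): run f and, only if it succeeds,
// feed its value to g. A failure short-circuits the rest of the chain.
func Compose[A, B, C any](f func(A) Result[B], g func(B) Result[C]) func(A) Result[C] {
	return func(a A) Result[C] {
		rb := f(a)
		if rb.Err != nil {
			return Fail[C](rb.Err)
		}
		return g(rb.Value)
	}
}

// parse and positive are hypothetical steps in a validation chain.
func parse(s string) Result[int] {
	n, err := strconv.Atoi(s)
	if err != nil {
		return Fail[int](err)
	}
	return Ok(n)
}

func positive(n int) Result[int] {
	if n <= 0 {
		return Fail[int](errors.New("not positive"))
	}
	return Ok(n)
}

func main() {
	parsePositive := Compose(parse, positive)
	fmt.Println(parsePositive("42").Value, parsePositive("-1").Err, parsePositive("x").Err != nil)
	// 42 not positive true
}
```

Type inference fills in A, B and C from parse and positive, but the verbosity lands in the step definitions instead, which is roughly the trade-off described above for Scala.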