Posts

How Go runtime makes concurrent code simple

5 years ago I showed a way to fight with "all-threads-busy" problem while writing Hopac code . The same problem exists on JVM as well. The root is in the cooperative concurrency model, which is based on thread pools, async/futures/etc. are scheduled to run on them. So, if an async is blocked by an IO call of if it's just executing some computation, the OS thread is not able to do anything else: the scheduler is unable to "pause" such a blocked async and execute anothe one. So, asynchronous code on .NET, JVM, Rust, etc. is inherently unsafe: it's not guaranteed that all existing asyncs progress. The things are different in Go runtime: the scheduler is preemptive, so it can "pause" goroutines at any point: not only at function calls, at loops, but on just any execution point (almost). This makes writing concurrent code dead easy: there's no "async" functions, lambdas or blocks of code, every function or call are the same, being them ex

Load balancing: Rancher vs Swarm

Image
Rancher has a load balancer built it (HAProxy). Let's compare its performance vs Docker Swarm one. I will use 3 identical nodes: 192GB RAM 28-cores i5 Xeon 1GBit LAN CentOS 7 Docker 1.12 Rancher 1.1.2 I will benchmark against a hello world HTTP server written in Scala with Akka-HTTP and Spray JSON serialization (I don't think it matters though), sources are on GitHub . I will use Apache AB benchmark tool. As a baseline, I exposed the web server port outside the container and run the following command: ab -n 100000 -c 20 -k http://1.1.1.1:29001/person/kot It shows 22400 requests per second. I'm not sure whether it's a great result for Akka-HTTP, considering that some services written in C can handle hundreds of thousands requests per second, but it's not the main topic of this blog post (I ran the test with 100 concurrent connections (-c 100), and it shows ~50k req/sec. I don't know if this number is good enough either :) ) Now I creat

Running computational intensive code outside of Hopac scheduler

Hopac uses a bounded pool of worker threads, number of which is equal to number of CPU cores (by default). A dangerous thing about this design is that a situation is possible where all the threads are busy doing some CPU intensive work and no other Hopac jobs can proceed. A good solution for this is running such a CPU bound computations on the standard .NET thread pool, freeing Hopac pool for more intelligent work. I found a nice code in one of the older Hopac GitHub discussions which schedules a ordinary function on ThreadPool and represents the result as a Hopac job. Here is a test with explanations:

Upcoming F# struct tuples: are they always faster?

Image
Don Syme  has been working  on struct tuples for F# language. Let's see if they are more performant than "old" (heap allocated) tuples in simple scenario: returning tuple from function. The code is very simple: Decompiled code in Release configuration: Everything we need to change to switch to struct tuples, is adding "struct" keyword in front of constructor and pattern matching: Decompiled code in Release configuration: I don't know about you, but I was surprised with those results. The performance roughly the same. GC is not a bottleneck as no objects were promoted to generation 1. Conclusions: Using struct tuples as a faster or "GC-friendly" alternative to return multiple values from functions does not make sense. Building in release mode erases away heap allocated tuples, but not struct tuples. Building in release mode inlines the "foo" function, which makes the code 10x faster. You can fearlessly allo

Hash maps: Rust, F#, D, Go, Scala

Image
Let's compare performance of hashmap implementation in Rust, .NET, D (LDC) and Go. Rust: F#: As you can see, Rust is slower at about 17% on insersions and at about 21% on lookups. Update As @GolDDranks suggested on Twitter , since Rust 1.7 it's possible to use custom hashers in HashMap. Let's try it: Yes, it's significantly faster: additions is only 5% slower than .NET implementation, and lookups are 32% *faster*! Great. Update: D added LDC x64 on windows It's very slow at insertions and quite fast on lookups. Update: Go added Update: Scala added Compared to Scala all the other languages looks equally fast :) What's worse, Scala loaded all four CPU cores at almost 100% during the test, while others used roughly single core. My guess is that JVM allocated so many objects (each Int is an object, BTW), that 3/4 of CPU time was spend for garbage collecting. However, I'm a Scala/JVM noob, so I just could write the whole b

Akka.NET Streams vs Hopac vs AsyncSeq

Image
Akka.NET Streams is a port of its Scala/Java counterpart and intended to execute complex data processing graphs, optionally in parallel and even distributed. It has quite different semantics compared to Hopac's one and it's wrong to compare them feature-by-feature, but it's still interesting to benchmark them in a scenario which both of them supports well: read lines of a file asynchronously, filter them by a regex in controlled degree of parallelism, then normalize the lines with a simple string manipulation algorithm, also in parallel, then count the number of lines. Firts, Akka.NET: Note that I have to use the empty string as indication that the regular expression does not match. I should use `option ` of course (just like I do in the Hopac snippet below), but Akka.NET Streams is strict about what is allowed to be returned by its combinators like `Map` or `Filter`, in particular, you cannot return `null`, doing so makes Akka.NET unhappy and it will throw exception

Regular expressions: Rust vs F# vs Scala

Image
Let's implement the following task: read first 10M lines from a text file of the following format: then find all lines containing Microsoft namespace in them, and format the type names the usual way, like "Microsoft.Win32.IAssemblyEnum". First, F#: Now Rust: After several launches the file was cached by the OS and both implementations became non IO-bound. F# one took 29 seconds and 31MB of RAM at peak; Rust - 11 seconds and 18MB. The Rust code is as twice as long as F# one, but it's handling all possible errors explicitly - no surprises at runtime at all. The F# code may throw some exceptions (who knows what kind of them? Nobody). It's possible to wrap all calls to .NET framework with `Choice.attempt (fun _ -> ...)`, then define custom Error types for regex related code, for IO one and a top-level one, and the code'd be even longer then Rust's, hard to read and it would still give no guarantee that we catch all possible exceptions. Up