Locks provide synchronization but can introduce significant performance problems when used poorly. One common place to find both locks and performance problems is in HTTP handlers. In particular, it is easy to inadvertently lock around network I/O. To understand what this means, it helps to look at an example. For this post, we will be using Go.
To do that, we are going to build a small HTTP server that reports the number of requests it has received. All the code for this post may be found here.
A server that reports a number of requests might look like this:
package main

import (
    "fmt"
    "log"
    "net/http"
    "strings"
    "sync"
)

const (
    payloadBytes = 1024 * 1024
)

var (
    mu    sync.Mutex
    count int
)

func main() {
    // Register the handler and start the server.
    http.HandleFunc("/", root)
    log.Fatal(http.ListenAndServe(":8080", nil))
}

// BAD: Don't do this.
func root(w http.ResponseWriter, r *http.Request) {
    mu.Lock()
    defer mu.Unlock()
    count++
    msg := []byte(strings.Repeat(fmt.Sprintf("%d", count), payloadBytes))
    w.Write(msg)
}
The root handler uses the common pattern of locking and then unlocking with a defer statement at the top of the function. Next, while still holding the lock, the handler increments count, creates a payload by repeating the count variable payloadBytes times, and finally writes the payload to the http.ResponseWriter.
To the untrained eye, this handler may look perfectly correct. In fact, there is a significant performance problem. The handler holds the lock around network I/O, which will cause the handler to execute only as fast as the slowest client.
To see this problem firsthand, we need to simulate a slow client reader. In fact, it is partly because some clients are so slow that configuring timeouts is an absolute necessity for any Go HTTP server exposed directly on the open internet. The simulation can be tricky, though, on account of the ways the kernel buffers writes to and reads from TCP sockets. Let's say we create a client which initiates a GET request but never reads any data from the socket (see here). Will this be enough to cause the server to block on w.Write?
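In outline, such a client might look like the following. This is a minimal sketch, not the exact code from the repo; it writes a raw request to the server above and then blocks forever:
package main

import (
    "fmt"
    "net"
)

func main() {
    fmt.Println("dialing")
    conn, err := net.Dial("tcp", "localhost:8080")
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    fmt.Println("sending GET request")
    // Write a raw HTTP GET request directly to the socket.
    fmt.Fprint(conn, "GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")

    fmt.Println("blocking and never reading")
    // Block forever without ever reading from conn, so the server's
    // response bytes pile up in the kernel's buffers.
    select {}
}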
Because the kernel buffers reads and writes, we won't see any slowdown, at least until the buffer is full. So to observe the slowdown, we need to make sure every write fills the buffer. There are two ways to do this: 1) tune the kernel, or 2) write a large number of bytes each time.
Tuning the kernel is itself a fascinating subject. There is the proc directory, there is documentation on all the network-related parameters, and there are multiple tutorials on host tuning. But for our purposes, we will take the easy route and simply write a megabyte of data into the socket, which overwhelms the TCP buffers on a vanilla Darwin (v17.4) kernel. Note: to run this demo yourself, you may have to adjust the number of bytes to ensure the buffers are filled.
Now if we start the server, we can use the slow client to observe how fast clients are forced to wait behind it. Again, the slow client code is here.
First, confirm a request is handled quickly with:
curl localhost:8080/
# Output:
# numerous 1's without any meaningful delay
Now, this time run the slow client first:
# Assuming $GOPATH/src/github.com/gobuildit/gobuildit/lock directory
go run client/main.go
# Output:
dialing
sending GET request
blocking and never reading
With the slow client connected to the server, now try to run a "fast" client:
curl localhost:8080/
# Hangs
We see firsthand how our locking strategy inadvertently blocks faster clients. If we return to our handler and think about our use of the lock, this behavior will make sense.
func root(w http.ResponseWriter, r *http.Request) {
    mu.Lock()
    defer mu.Unlock()
    // ...
}
By locking at the top of the function and adding a deferred call to unlock, we are holding the lock for the duration of the handler. This includes manipulation of shared state, a read of that shared state, and a write over the network. And herein lies the problem. Network I/O is inherently unpredictable. Granted, we may configure timeouts to protect our server from excessively long calls, but we cannot say that all network I/O will complete within a fixed time period.
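Those server-side timeouts live on the http.Server itself. A minimal sketch, assuming the usual log and time imports, with purely illustrative values:
srv := &http.Server{
    Addr:         ":8080",
    ReadTimeout:  5 * time.Second,   // bound time spent reading a request
    WriteTimeout: 10 * time.Second,  // bound time spent writing a response
    IdleTimeout:  120 * time.Second, // bound keep-alive idle time
}
log.Fatal(srv.ListenAndServe())
Even with a WriteTimeout, though, the handler above would hold the lock until the blocked write is finally abandoned. Timeouts bound the damage; they do not fix the locking problem.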
The key takeaway is not to lock around I/O. In this case, locking around the write provides no value whatsoever; it only leaves our program susceptible to an unreliable network and to any slow clients. In effect, we are ceding partial control of our program's synchronization.
Let's rewrite the handler to lock around just the critical section.
// GOOD: Keep the critical section as small as possible and don't lock around
// I/O.
func root(w http.ResponseWriter, r *http.Request) {
    mu.Lock()
    count++
    current := count
    mu.Unlock()
    msg := []byte(strings.Repeat(fmt.Sprintf("%d", current), payloadBytes))
    w.Write(msg)
}
To see the difference, try testing with a slow client and a regular client.
Again, start the slow client:
# Assuming $GOPATH/src/github.com/gobuildit/gobuildit/lock directory
go run client/main.go
Now, use curl to send a request:
curl localhost:8080/
Observe how the curl client immediately returns with the expected request count.
Granted, this example is contrived and much simpler than typical production code. Nonetheless, I hope it illustrates the importance of thinking carefully about the scope of one's locks. Although there are always exceptions to the rule, in most cases a lock's scope should not include I/O. Finally, for a simple synchronized counter like this one, one would be wise to skip the mutex entirely and consider the functions in the sync/atomic package, as sketched below.
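For instance, a minimal sketch of the handler using an atomic counter; note that count becomes an int64, the sync/atomic import replaces sync, and the mutex disappears:
var count int64

func root(w http.ResponseWriter, r *http.Request) {
    // Atomically increment and read the counter; no lock required.
    current := atomic.AddInt64(&count, 1)
    msg := []byte(strings.Repeat(fmt.Sprintf("%d", current), payloadBytes))
    w.Write(msg)
}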
Further Reading
- Dancing with Go’s Mutexes
- The complete guide to Go net/http timeouts
- Fallacies of distributed computing
- Documentation for /proc/sys/net/
- Host Tuning
- Tune Network Stack (Buffers Size) To Increase Networking Performance
- Resource Acquisition Is Initialization
Thanks to Jean de Klerk for reading an early draft of this post.