Many Go developers are familiar with the dictum, "Never start a goroutine without knowing how it will stop." And yet, it remains incredibly easy to leak goroutines. Let’s look at one common way to leak a goroutine and how to fix it.
To do that, we are going to build a library with a custom map type whose keys expire after a configured duration. We will call the library ttl, and it will have an API that looks like this:
// Create a map with a TTL of 5 minutes
m := ttl.NewMap(5*time.Minute)
// Set a key
m.Set("my-key", []byte("my-value"))
// Read a key
v, ok := m.Get("my-key")
// "my-value"
fmt.Println(string(v))
// true, key is present
fmt.Println(ok)
// ... more than 5 minutes later
v, ok = m.Get("my-key")
// no value here
fmt.Println(string(v) == "")
// false, key has expired
fmt.Println(ok)
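The post never shows the Map struct itself, so here is a rough sketch of the minimum it would need: the data and expiration fields the constructor below populates, plus a mutex, which is an assumption about how Set, Get, and the worker goroutine coordinate. The expiringValue type it references is defined a little further down.

// A sketch of the underlying type. The field names match the constructor
// shown below; the mutex is an assumption about how concurrent access
// between callers and the worker goroutine is handled.
type Map struct {
    mu         sync.Mutex
    data       map[string]expiringValue
    expiration time.Duration
    done       chan struct{} // added later in the post, when we fix the leak
}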
To ensure keys expire, we start a worker goroutine in the NewMap function:
func NewMap(expiration time.Duration) *Map {
    m := &Map{
        data:       make(map[string]expiringValue),
        expiration: expiration,
    }

    // start a worker goroutine
    go func() {
        for range time.Tick(expiration) {
            m.removeExpired()
        }
    }()

    return m
}
The worker goroutine will wake up every configured duration and invoke a method on the map to remove any expired keys. This means Set will need to record each key’s entry time, which is why the data field holds an expiringValue type that associates the actual value with an expiration time:
type expiringValue struct {
    expiration time.Time
    data       []byte // the actual value
}
To the untrained eye, the invocation of the worker goroutine may seem fine. And if this wasn’t a post about leaking goroutines, it would be incredibly easy to scan over the lines without raising an eyebrow. Nonetheless, we leak a goroutine inside the constructor. The question is, how?
Let’s walk through a typical lifecycle of a Map. First, a caller creates an instance of the Map. As soon as the instance is created, a worker goroutine is running. Next, the caller might make any number of calls to Set and Get. Eventually, though, the caller will finish using the Map instance and release all references to it. At that point, the garbage collector would normally be able to collect the instance’s memory. However, the worker goroutine is still running and is still holding a reference to the Map instance. Since there are no explicit calls to stop the worker, we have leaked a goroutine and have leaked the instance’s memory as well.
Let’s make the problem especially obvious. To do that, we will use the runtime package to view statistics about the memory allocator and the number of goroutines running at a particular moment in time.
func main() {
    go func() {
        var stats runtime.MemStats
        for {
            runtime.ReadMemStats(&stats)
            fmt.Printf("HeapAlloc = %d\n", stats.HeapAlloc)
            fmt.Printf("NumGoroutine = %d\n", runtime.NumGoroutine())
            time.Sleep(5 * time.Second)
        }
    }()

    for {
        work()
    }
}
func work() {
    m := ttl.NewMap(5 * time.Minute)
    m.Set("my-key", []byte("my-value"))

    if _, ok := m.Get("my-key"); !ok {
        panic("no value present")
    }

    // m goes out of scope
}
It doesn’t take long to see that the heap allocations and the number of goroutines are growing much, much too fast.
HeapAlloc = 76960
NumGoroutine = 18
HeapAlloc = 2014278208
NumGoroutine = 1447847
HeapAlloc = 3932578560
NumGoroutine = 2832416
HeapAlloc = 5926163224
NumGoroutine = 4322524
So now it’s clear we need to stop that goroutine. Currently, the Map API provides no way to shut down the worker goroutine. It would be nice to avoid any API changes and still stop the worker goroutine when the caller is done with the Map instance. But only the caller will know when they are done.
A common pattern to solve this problem is to implement the io.Closer interface. When a caller is done with the Map, they can call Close to tell the Map to stop its worker goroutine.
func (m *Map) Close() error {
    close(m.done)
    return nil
}
The invocation of the worker goroutine in our constructor now looks like this:
func NewMap(expiration time.Duration) *Map {
    m := &Map{
        data:       make(map[string]expiringValue),
        expiration: expiration,
        done:       make(chan struct{}),
    }

    // start a worker goroutine
    go func() {
        ticker := time.NewTicker(expiration)
        defer ticker.Stop()

        for {
            select {
            case <-ticker.C:
                m.removeExpired()
            case <-m.done:
                return
            }
        }
    }()

    return m
}
Now the worker goroutine includes a select statement which checks the done channel in addition to the ticker’s channel. Note, we have swapped out time.Tick as well, as it provides no means for a clean shutdown and will also leak.
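With Close in place, the work function from our demo also needs to call it before returning; otherwise nothing has changed. Something along these lines:

func work() {
    m := ttl.NewMap(5 * time.Minute)
    // stop the worker goroutine when we're done with the map
    defer m.Close()

    m.Set("my-key", []byte("my-value"))

    if _, ok := m.Get("my-key"); !ok {
        panic("no value present")
    }
}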
After making the changes, here is what our simplistic profiling looks like:
HeapAlloc = 72464
NumGoroutine = 6
HeapAlloc = 5175200
NumGoroutine = 59
HeapAlloc = 5495008
NumGoroutine = 35
HeapAlloc = 9171136
NumGoroutine = 240
HeapAlloc = 8347120
NumGoroutine = 53
The numbers are hardly small, which is a result of work being invoked in a tight loop. More importantly, though, we no longer have the massive growth in the number of goroutines or heap allocations. And that’s what we’re after. Note, the final code for this post may be found here.
If anything, this post provides an obvious example of why knowing when a goroutine will stop is so important. As a secondary conclusion, we might say that monitoring the number of goroutines in an application is just as important. Such a monitor provides a warning system if a goroutine leak sneaks into the codebase. It’s also worth keeping in mind that sometimes goroutine leaks take days if not weeks to manifest in an application. And so it’s worth having monitors for both shorter and longer timespans.
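As one illustration (not from the original post), a simple way to make the goroutine count visible to such a monitor is to publish it with the standard library’s expvar package, which an external system can then scrape from /debug/vars:

package main

import (
    "expvar"
    "log"
    "net/http"
    "runtime"
)

func main() {
    // expose the current goroutine count at /debug/vars so an external
    // monitor can track it over both short and long timespans
    expvar.Publish("goroutines", expvar.Func(func() interface{} {
        return runtime.NumGoroutine()
    }))

    log.Fatal(http.ListenAndServe(":8080", nil))
}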
Thanks to Jean de Klerk and Jason Keene who read drafts of this post.