Configuring sql.DB for Better Performance

There are a lot of good tutorials which talk about Go's sql.DB type and how to use it to execute SQL database queries and statements. But most of them gloss over the SetMaxOpenConns(), SetMaxIdleConns() and SetConnMaxLifetime() methods, which you can use to configure the behaviour of sql.DB and alter its performance.

In this post I'd like to explain exactly what these settings do and demonstrate the (positive and negative) impact that they can have.

Open and idle connections

I'll begin with a little background.

A sql.DB object is a pool of many database connections which contains both 'in-use' and 'idle' connections. A connection is marked as in-use when you are using it to perform a database task, such as executing a SQL statement or querying rows. When the task is complete the connection is marked as idle.

When you instruct sql.DB to perform a database task, it will first check if any idle connections are already available in the pool. If one is available, then Go will reuse the existing connection and mark it as in-use for the duration of the task. If there are no idle connections in the pool when you need one, then Go will create a new connection instead.
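You can observe the state of the pool for yourself via the db.Stats() method, which returns a sql.DBStats struct containing counters for the pool. Here's a minimal sketch (assuming a db pool like the ones initialised in the examples below; the InUse and Idle fields require Go 1.11 or newer):

// Log the total number of open connections in the pool, along with a
// breakdown of how many are in-use versus idle.
stats := db.Stats()
log.Printf("open: %d (in-use: %d, idle: %d)", stats.OpenConnections, stats.InUse, stats.Idle)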

The SetMaxOpenConns method

By default there's no limit on the number of connections that can be open (in-use + idle) at the same time. But you can set your own limit via the SetMaxOpenConns() method like so:

// Initialise a new connection pool (this assumes that the pq driver
// has been registered with the import _ "github.com/lib/pq")
db, err := sql.Open("postgres", "postgres://user:pass@localhost/db")
if err != nil {
    log.Fatal(err)
}

// Set the maximum number of concurrently open connections (in-use +
// idle) to 5. Setting this to less than or equal to 0 will mean there
// is no maximum limit (which is also the default setting).
db.SetMaxOpenConns(5)

In this example code the pool now has a maximum limit of 5 open connections in total. If all 5 connections are already in use and another one is needed, then the application will be forced to wait until one of the 5 is freed up and becomes available again.
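It's worth knowing that this wait doesn't have to be unbounded. Since Go 1.8 you can use the context-aware variants like db.ExecContext(), and the context deadline is honoured while waiting for a free connection too. Here's a rough sketch (the books table is made up purely for illustration):

// Abandon the task if no connection becomes free (and the statement
// hasn't completed) within 3 seconds.
ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
defer cancel()

_, err := db.ExecContext(ctx, "INSERT INTO books (title) VALUES ($1)", "Go 101")
if err != nil {
    log.Fatal(err)
}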

To illustrate the impact of changing MaxOpenConns I ran a benchmark test with the maximum open connections set to 1, 2, 5, 10 and unlimited. The benchmark executes parallel INSERT statements against a PostgreSQL database and you can find the code in this gist. Here are the results:

BenchmarkMaxOpenConns1-8                 500       3129633 ns/op         478 B/op         10 allocs/op
BenchmarkMaxOpenConns2-8                1000       2181641 ns/op         470 B/op         10 allocs/op
BenchmarkMaxOpenConns5-8                2000        859654 ns/op         493 B/op         10 allocs/op
BenchmarkMaxOpenConns10-8               2000        545394 ns/op         510 B/op         10 allocs/op
BenchmarkMaxOpenConnsUnlimited-8        2000        531030 ns/op         479 B/op          9 allocs/op
PASS

For this benchmark we can see that the more open connections that are allowed, the less time is taken to perform the INSERT on the database (3129633 ns/op with 1 open connection compared to 531030 ns/op for unlimited connections, roughly 6 times quicker). This is because the more open connections there are, the less time (on average) the benchmark code needs to wait for an in-use connection to be freed up and become available again.
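For reference, each of the benchmarks in this post follows the same basic pattern as the sketch below. This is a simplification rather than the exact code from the gist, and the books table and connection string are placeholders:

func BenchmarkMaxOpenConns1(b *testing.B) {
    db, err := sql.Open("postgres", "postgres://user:pass@localhost/db")
    if err != nil {
        b.Fatal(err)
    }
    defer db.Close()
    db.SetMaxOpenConns(1)

    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            // Execute the INSERT statements in parallel across the pool.
            _, err := db.Exec("INSERT INTO books (title) VALUES ($1)", "test")
            if err != nil {
                b.Error(err)
                return
            }
        }
    })
}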

The SetMaxIdleConns method

By default sql.DB allows a maximum of 2 idle connections to be retained in the connection pool. You can change this via the SetMaxIdleConns() method like so:

// Initialise a new connection pool
db, err := sql.Open("postgres", "postgres://user:pass@localhost/db")
if err != nil {
    log.Fatal(err)
}

// Set the maximum number of concurrently idle connections to 5. Setting this
// to less than or equal to 0 will mean that no idle connections are retained.
db.SetMaxIdleConns(5)

In theory, allowing a higher number of idle connections in the pool should improve performance, because it makes it less likely that a new connection will need to be established from scratch, which saves both time and resources.

Let's take a look at the same benchmark with the maximum number of idle connections set to none, 1, 2, 5 and 10 (and an unlimited number of open connections):

BenchmarkMaxIdleConnsNone-8          300       4567245 ns/op       58174 B/op        625 allocs/op
BenchmarkMaxIdleConns1-8            2000        568765 ns/op        2596 B/op         32 allocs/op
BenchmarkMaxIdleConns2-8            2000        529359 ns/op         596 B/op         11 allocs/op
BenchmarkMaxIdleConns5-8            2000        506207 ns/op         451 B/op          9 allocs/op
BenchmarkMaxIdleConns10-8           2000        501639 ns/op         450 B/op          9 allocs/op
PASS

When MaxIdleConns is set to none, a new connection has to be created from scratch for each INSERT, and we can see from the benchmarks that the average runtime and memory usage are comparatively high.

Allowing just 1 idle connection to be retained and reused makes a massive difference to this particular benchmark — it cuts the average runtime by about 8 times and reduces memory usage by about 20 times. Going on to increase the size of the idle connection pool makes the performance even better, although the improvements are less pronounced.

So should you maintain a large idle connection pool? The answer is: it depends on the application.

It's important to realise that keeping an idle connection alive comes at a cost — it takes up memory which can otherwise be used for both your application and the database.

It's also possible that if a connection is idle for too long then it may become unusable. For example, MySQL's wait_timeout setting will automatically close any connections that haven't been used for 8 hours (by default).

When this happens sql.DB handles it gracefully: bad connections are automatically retried twice before giving up, at which point Go removes the connection from the pool and creates a new one. Setting MaxIdleConns too high may therefore result in connections becoming unusable, and in more resources being used than with a smaller idle connection pool where fewer connections are used more frequently. Really, you only want to keep a connection idle if you're likely to be using it again soon.

One last thing to point out is that MaxIdleConns should always be less than or equal to MaxOpenConns. Go enforces this and will automatically reduce MaxIdleConns if necessary. This Stack Overflow post nicely describes why:

There is no point in ever having any more idle connections than the maximum allowed open connections, because if you could instantaneously grab all the allowed open connections, the remaining idle connections would always remain idle. It's like having a bridge with four lanes, but only ever allowing three vehicles to drive across it at once.
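For example, in the snippet below the effective idle connection limit is 3, not 4:

db.SetMaxOpenConns(3)

// Go caps MaxIdleConns at the MaxOpenConns limit, so this call behaves
// the same as db.SetMaxIdleConns(3) would.
db.SetMaxIdleConns(4)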

The SetConnMaxLifetime method

Let's now take a look at the SetConnMaxLifetime() method which sets the maximum length of time that a connection can be reused for. This can be useful if your SQL database also implements a maximum connection lifetime or if — for example — you want to facilitate gracefully swapping databases behind a load balancer.

You use it like this:

// Initialise a new connection pool
db, err := sql.Open("postgres", "postgres://user:pass@localhost/db")
if err != nil {
    log.Fatal(err)
}

// Set the maximum lifetime of a connection to 1 hour. Setting it to 0
// means that there is no maximum lifetime and the connection is reused
// forever (which is the default behaviour).
db.SetConnMaxLifetime(time.Hour)

In this example all our connections will 'expire' 1 hour after they were first created, and cannot be reused after they've expired. But note:

In theory, the shorter ConnMaxLifetime is, the more often connections will expire and, consequently, the more often they will need to be created from scratch.

To illustrate this I ran the benchmarks with ConnMaxLifetime set to 100ms, 200ms, 500ms, 1000ms and unlimited (reused forever), with the default settings of unlimited open connections and 2 idle connections. These time periods are obviously much, much shorter than you'd use in most applications but they help illustrate the behaviour well.

BenchmarkConnMaxLifetime100-8               2000        637902 ns/op        2770 B/op         34 allocs/op
BenchmarkConnMaxLifetime200-8               2000        576053 ns/op        1612 B/op         21 allocs/op
BenchmarkConnMaxLifetime500-8               2000        558297 ns/op         913 B/op         14 allocs/op
BenchmarkConnMaxLifetime1000-8              2000        543601 ns/op         740 B/op         12 allocs/op
BenchmarkConnMaxLifetimeUnlimited-8         3000        532789 ns/op         412 B/op          9 allocs/op
PASS

In these particular benchmarks we can see that memory usage was more than 6 times greater with a 100ms lifetime compared to an unlimited lifetime, and the average runtime for each INSERT was also slightly longer.

Exceeding connection limits

Lastly, this article wouldn't be complete without mentioning what happens if you exceed a hard limit on the number of database connections.

As an illustration, I'll change my postgresql.conf file so only a total of 5 connections are permitted (the default is 100)...

max_connections = 5

And then rerun the benchmark test with unlimited open connections...

BenchmarkMaxOpenConnsUnlimited-8    --- FAIL: BenchmarkMaxOpenConnsUnlimited-8
    main_test.go:14: pq: sorry, too many clients already
    main_test.go:14: pq: sorry, too many clients already
    main_test.go:14: pq: sorry, too many clients already
FAIL

As soon as the hard limit of 5 connections is hit, my database driver (pq) immediately returns a 'sorry, too many clients already' error message instead of completing the INSERT.
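As an aside, if you wanted to detect this specific failure in your application code, the pq driver returns a *pq.Error value which exposes the underlying PostgreSQL error code. A hedged sketch (53300 is PostgreSQL's too_many_connections code, and the books table is again a placeholder):

_, err := db.Exec("INSERT INTO books (title) VALUES ($1)", "Go 101")
if pqErr, ok := err.(*pq.Error); ok && pqErr.Code == "53300" {
    // The server's max_connections limit has been reached.
    log.Printf("too many connections: %s", pqErr.Message)
}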

To prevent this error we need to limit the total number of connections (in-use + idle) that sql.DB can open to comfortably below 5. Like so:

// Initialise a new connection pool
db, err := sql.Open("postgres", "postgres://user:pass@localhost/db")
if err != nil {
    log.Fatal(err)
}

// Set the maximum number of open connections (in-use + idle) to 2, of
// which at most 1 can be retained idle. In total sql.DB will never
// create more than 2 connections at a time.
db.SetMaxOpenConns(2)
db.SetMaxIdleConns(1)

Now there will only ever be a maximum of 2 connections open at any one time, comfortably below the hard limit of 5, and the benchmark should run without any errors.

But doing this comes with a big caveat: when the open connection limit is reached, any new database tasks that your application needs to execute will be forced to wait until a connection becomes free.

For some applications that behaviour might be fine, but for others it might not. For example, in a web application it could arguably be better to immediately log the error and send a 500 Internal Server Error to the user, rather than have their HTTP request hang and potentially time out while waiting for a free connection.
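One way to get that fail-fast behaviour is to pair the connection limits with a per-request timeout, so that a request which can't obtain a connection promptly fails with a 500 response instead of hanging. A rough sketch (the handler, the books table and the package-level db variable are all made up for illustration):

func createBook(w http.ResponseWriter, r *http.Request) {
    // Give up if we can't obtain a free connection (and complete the
    // INSERT) within 1 second.
    ctx, cancel := context.WithTimeout(r.Context(), time.Second)
    defer cancel()

    _, err := db.ExecContext(ctx, "INSERT INTO books (title) VALUES ($1)", "Go 101")
    if err != nil {
        log.Print(err)
        http.Error(w, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
        return
    }

    w.WriteHeader(http.StatusCreated)
}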

In summary