GCP Cloud NAT + Golang’s HTTP client = 🤯
Created: Oct 14, 2020
It simply does not become boring in the tech area. Yesterday we had a funny problem with an actually easy task:
Execute 10.006.525 PUT requests with different parameters against an API
Easy right? I thought so too. But the sequential execution of 10 million
requests would certainly take some time, and I like to build things and not
watch them run. Let’s do it in Golang and use some goroutines for that. Said and
done, and a few moments later I had my program running on my local machine.
go build
and scp
the whole thing to a worker virtual machine running in the
Google Cloud and execute:
INFO[2020-10-13T15:32:51Z] Request #1 finished
INFO[2020-10-13T15:32:51Z] Request #2 finished
-- snip --
INFO[2020-10-13T15:32:51Z] Request #62 finished
INFO[2020-10-13T15:32:51Z] Request #63 finished
INFO[2020-10-13T15:32:52Z] Request #64 finished
…
hmm why did it stop after 64 request? I checked my code, checked it again but i could not find anything unusual. If it is not the developer’s fault, it must be the bloody infrastructure. As always …
So i reached out to David Gleich, one of my company’s Google Cloud experts, and asked him for advise. The first thing we did was to build a simplified dummy without any bells and whistles to rule out an issue with the goroutines:
package main
import (
"log"
"net/http"
)
func main() {
cnt := 0
client := &http.Client{}
for {
cnt++
resp, err := client.Get("https://example.com/")
if err != nil {
log.Printf("Error on request #%d: %s", cnt, err)
continue
}
resp.Body.Close()
log.Printf("Request #%d finished", cnt)
}
}
But we ran into the same problem:
INFO[2020-10-13T15:48:51Z] Request #1 finished
INFO[2020-10-13T15:48:51Z] Request #2 finished
-- snip --
INFO[2020-10-13T15:48:51Z] Request #62 finished
INFO[2020-10-13T15:48:51Z] Request #63 finished
INFO[2020-10-13T15:48:52Z] Request #64 finished
Do you use Cloud NAT with this machine? David asked. Yes, because this virtual machine does not need to be reachable from the outside and therefore does not have an external IP address, but needs Cloud NAT to reach the Internet.
OK lets check the Cloud NAT setup:
64, that sounds familiar. The tooltip describes
Minimum ports per VM instances
as:
By default, a VM instance has 64 ports, which means it can open 64 connections to the same destination IP and port. A VM instance can open up to 64,000 connections across all protocols.
Aha big brain time. Lets change this Cloud NAT setting to 128.
INFO[2020-10-13T16:05:51Z] Request #1 finished
INFO[2020-10-13T16:05:51Z] Request #2 finished
-- snip --
INFO[2020-10-13T16:05:58Z] Request #126 finished
INFO[2020-10-13T16:05:58Z] Request #127 finished
INFO[2020-10-13T16:05:58Z] Request #128 finished
So it seems that there is some kind of correlation 😲
Bloody infrastructure, i told you … But why does the program not close or
reuse the TCP connection? We already had the usual resp.Body.Close()
. After
some try and error we found this Stack Overflow post: Why is Go HTTPS Client not
reusing connections? and there Matt Self answered:
The default client does reuse connections.
Be sure to close the response.
Callers should close resp.Body when done reading from it. If resp.Body is not closed, the Client’s underlying RoundTripper (typically Transport) may not be able to re-use a persistent TCP connection to the server for a subsequent “keep-alive” request.
Additionally, I’ve found that I also needed to read until the response was complete before calling Close().
e.g.
res, _ := client.Do(req) io.Copy(ioutil.Discard, res.Body) res.Body.Close()
Let’s give it a try:
package main
import (
"io"
"io/ioutil"
"log"
"net/http"
)
func main() {
cnt := 0
client := &http.Client{}
for {
cnt++
resp, err := client.Get("https://example.com/")
if err != nil {
log.Printf("Error on request #%d: %s", cnt, err)
continue
}
io.Copy(ioutil.Discard, resp.Body)
resp.Body.Close()
log.Printf("Request #%d finished", cnt)
}
}
go build
, scp
and fingers crossed:
INFO[2020-10-13T16:31:51Z] Request #1 finished
INFO[2020-10-13T16:31:51Z] Request #2 finished
-- snip --
INFO[2020-10-13T16:31:51Z] Request #126 finished
INFO[2020-10-13T16:31:51Z] Request #127 finished
INFO[2020-10-13T16:31:51Z] Request #128 finished
INFO[2020-10-13T16:31:51Z] Request #129 finished
INFO[2020-10-13T16:31:51Z] Request #130 finished
INFO[2020-10-13T16:31:51Z] Request #131 finished
-- and they lived happily ever after --
Very nice.
We could have stopped right there, but that is a little unsatisfactory. I
assumed that the almighty resp.Body.Close()
takes care of everything and
looking at Golang’s code it looks like it already does. So the new question
is:
Why did it not work in our setup?
Right now David Gleich and I are digging a little deeper and hopefully find something on our way through the debugger. I will update this article as soon as there is something new. If you have any idea please drop a comment below.
TL;DR
You need to fully read the body of the response before calling Close()
.
resp, _ := client.Do(req)
io.Copy(ioutil.Discard, resp.Body)
resp.Body.Close()
Source: Why is Go HTTPS Client not reusing connections? by Matt Self.
Disclaimer
- Part of this material might be labeled beta or even alpha by Google. These products or features are in a pre-release state and might change or have limited support. For more information, see the product launch stages. I will do my best to keep this guide up to date when something changes.
- It’s the year 2020 and actually I should not have to mention it. Please do not blindly copy & paste stuff from the Internet into your shell.