Part of our application retrieves files from an internal storage system that behaves similarly to AWS S3. The system has four heads and is fronted by a network load balancer. Clients of the storage system, named Repono, retrieve authentication tickets from our vending system. Repono then does an authorization look up and if everything is OK, the resource (aka “the file”) is handed out over HTTPS. Normally, our performance of this system is really fantastic and looks like this chart:

Response time as recorded by our internal monitoring system
About a month ago, the performance of this system really changed. We went from routine performance under 1 second to sporadic responses clocking in over 10 seconds.

A different monitoring system using ELK measuring the response time of our middleware to Repono
This problem did not reveal itself easily. The major client of Repono was recently upgraded to a newer version of Microsoft .NET. One step we took was to revert the change and downgrade the version of .NET. For around 20 to 30 minutes, the performance was looking good, and then it reverted to the lumpy 10 second performance.
What else could have changed? This was not easy to see. The HTTPS traffic did not allow easy inspection and alternative testing with our go to baseline client, cURL could not reproduce the results. The problem was clearly affecting production and we couldn’t easily change out our middleware tools in .NET for a set of shell based testing scripts using cURL.
We needed to see inside the HTTP and TCP conversation of the .NET HTTP client, but the normal amount of debugging and logging did not record the details.
Thanks to the big’ol Internet, we stumbled on this helpful article from Mike Hadlow to get to the level of information we wanted out of .NET and the HTTP client.The only problem is that this logging doesn’t include time stamps and there was too much traffic to keep up and detangle out of the audit logs. Adjusting the load balancer, we could force most of the traffic through patched software with new logging. This greatly reduced the volume of data and we tried to watch in real time. This is what we saw
1 System.Net Verbose: 0 : [5156] ConnectStream#43140910::Close()
2 System.Net Verbose: 0 : [5156] Exiting ConnectStream#43140910::Close()
3 System.Net Verbose: 0 : [5156] Exiting MyWebClient#1109663::DownloadData() -> Byte[]#50053605
4 System.Net Verbose: 0 : [5156] MyWebClientWithHead#30034512::DownloadData(https://REDACTED/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED#1553812164)
5 System.Net Verbose: 0 : [5156] MyWebClientWithHead#30034512::DownloadData(https://REDACTED/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED#1553812164)
6 System.Net Verbose: 0 : [5156] WebRequest::Create(https://REDACTED/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED)
7 System.Net Verbose: 0 : [5156] HttpWebRequest#45035759::HttpWebRequest(https://REDACTED/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED#1553812164)
8 System.Net Verbose: 0 : [5156] Exiting HttpWebRequest#45035759::HttpWebRequest()
9 System.Net Verbose: 0 : [5156] Exiting WebRequest::Create() -> HttpWebRequest#45035759
10 System.Net Verbose: 0 : [5156] HttpWebRequest#45035759::GetResponse()
11 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with ServicePoint#17057466
12 System.Net Information: 0 : [5156] Associating Connection#31630514 with HttpWebRequest#45035759
13 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with ConnectStream#66385004
14 System.Net Information: 0 : [5156] HttpWebRequest#45035759 - Request: HEAD /v1/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED HTTP/1.1
15 System.Net Information: 0 : [5156] ConnectStream#66385004 - Sending headers
16 {
17 x-auth-user: REDACTED
18 x-auth-expiry: 2015-08-16T01:22:43Z
19 x-auth-ticket: REDACTED
20 Accept: text/xml
21 Content-Type: text/xml
22 User-Agent: REDACTED/version-2.4501/GetReponoAuthURL/gaggleid:REDACTED
23 Host: REDACTED
24 }.
25 System.Net Error: 0 : [5156] Exception in the HttpWebRequest#45035759:: - The underlying connection was closed: A connection that was expected to be kept alive was closed by the server.
26 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with ServicePoint#17057466
27 System.Net Information: 0 : [5156] Associating Connection#14267127 with HttpWebRequest#45035759
28 System.Net Information: 0 : [5156] Connection#14267127 - Created connection from REDACTED:56478 to REDACTED:443.
29 System.Net Information: 0 : [5156] TlsStream#34867337::.ctor(host=REDACTED, #certs=0)
30 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with ConnectStream#62287651
31 System.Net Information: 0 : [5156] HttpWebRequest#45035759 - Request: HEAD /v1/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED HTTP/1.1
32 System.Net Information: 0 : [5156] SecureChannel#50704654::.ctor(hostname=REDACTED, #clientCertificates=0, encryptionPolicy=RequireEncryption)
33 System.Net Information: 0 : [5156] SecureChannel#50704654 - Left with 0 client certificates to choose from.
34 System.Net Information: 0 : [5156] Using the cached credential handle.
35 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = (null), targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation)
36 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffer length=0, Out-Buffer length=159, returned code=ContinueNeeded).
37 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation)
38 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=0, returned code=ContinueNeeded).
39 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation)
40 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=0, returned code=ContinueNeeded).
41 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation)
42 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=0, returned code=ContinueNeeded).
43 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation)
44 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=134, returned code=ContinueNeeded).
45 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation)
46 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=0, returned code=ContinueNeeded).
47 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation)
48 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=0, returned code=OK).
49 System.Net Information: 0 : [5156] Remote certificate: [Version]
50 V3
51 REDACTED
52 [Signature Algorithm]
53 sha256RSA(1.2.840.113549.1.1.11)
54 [Public Key]
55 Algorithm: RSA
56 Length: 2048
57 Key Blob: REDACTED
58 System.Net Information: 0 : [5156] ProcessAuthentication(Protocol=Tls, Cipher=Aes128 128 bit strength, Hash=Sha1 160 bit strength, Key Exchange=44550 256 bit strength).
59 System.Net Information: 0 : [5156] ConnectStream#62287651 - Sending headers
60 {
61 x-auth-user: REDACTED
62 x-auth-expiry: 2015-08-16T01:22:43Z
63 x-auth-salt: REDACTED
64 x-auth-ticket: REDACTED
65 x-auth-dev-ticket: REDACTED
66 Accept: text/xml
67 Content-Type: text/xml
68 User-Agent: REDACTED/version-2.4501/GetReponoAuthURL/gaggleid:REDACTED
69 Host: REDACTED
70 }.
71 System.Net Information: 0 : [5156] Connection#14267127 - Received status line: Version=1.1, StatusCode=200, StatusDescription=OK.
72 System.Net Information: 0 : [5156] Connection#14267127 - Received headers
73 {
74 X-Timestamp: 1413849343.99091
75 X-Preauth-Uri: /repono/3dfc19b30213a7fd7297e1fb32815b95d15e7187a91e13c84b1c423c/REDACTED?token=PREA_49cd6e580ce3460948610e05e1dba031adf5bc19b098a2a98226cf4a
76 X-Trans-Id: tx72179a9abf4a4d7e8cc65-0055cfe147
77 Accept-Ranges: bytes
78 Content-Length: 5378129
79 Content-Type: application/pdf
80 Date: Sun, 16 Aug 2015 01:03:03 GMT
81 ETag: 40fb56bde55a37911d253debfa002005
82 Last-Modified: Mon, 20 Oct 2014 23:55:44 GMT
83 }.
84 System.Net Information: 0 : [5156] ConnectStream#10452251::ConnectStream(Buffered 0 bytes.)
85 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with ConnectStream#10452251
86 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with HttpWebResponse#24816868
87 System.Net Verbose: 0 : [5156] Exiting HttpWebRequest#45035759::GetResponse() -> HttpWebResponse#24816868
88 System.Net Verbose: 0 : [5156] HttpWebResponse#24816868::GetResponseStream()
89 System.Net Information: 0 : [5156] ContentLength=5378129
90 System.Net Verbose: 0 : [5156] Exiting HttpWebResponse#24816868::GetResponseStream() -> ConnectStream#10452251
91 System.Net Verbose: 0 : [5156] ConnectStream#10452251::Read()
92 System.Net Verbose: 0 : [5156] Exiting ConnectStream#10452251::Read() -> 0#0
93 System.Net Verbose: 0 : [5156] ConnectStream#10452251::Close()
If you see it right away, contact me, I would like to hire you on to the team. For those of you that don’t immediately see it, tune your eyes to line 25 in the output.
Here’s what is going on. The .NET client is expecting to reuse the TCP socket and when it doesn’t get an answer, it’s waiting for 10 seconds and then establishes a new HTTPS session. Once we had this isolated, we patched the .NET code to close the connection after retrieving the file and tear down the TCP session. There could be a lot of discussion about what sessions and HTTP mean at the TCP level, but we could not adjust the Repono side and we could adjust our .NET side, so that is the approach we used.
This correction is very important, but it doesn’t easily answer what changed? Through a lot more work in the troubleshooting domain, we noticed that the load balancer had changed behavior specifically around long running sessions. Patching the load balancer and our .NET code has put us in a better place.