The Not So Small Stuff

Part of our application retrieves files from an internal storage system that behaves similarly to AWS S3. The system has four heads and is fronted by a network load balancer. Clients of the storage system, named Repono, retrieve authentication tickets from our vending system.  Repono then does an authorization look up and if everything is OK, the resource (aka “the file”) is handed out over HTTPS. Normally, our performance of this system is really fantastic and looks like this chart:

Response Time

Response time as recorded by our internal monitoring system

About a month ago, the performance of this system really changed. We went from routine performance under 1 second to sporadic responses clocking in over 10 seconds.

Uh Oh!

A different monitoring system using ELK measuring the response time of our middleware to Repono

This problem did not reveal itself easily. The major client of Repono was recently upgraded to a newer version of Microsoft .NET. One step we took was to revert the change and downgrade the version of .NET. For around 20 to 30 minutes, the performance was looking good, and then it reverted to the lumpy 10 second performance.

What else could have changed? This was not easy to see. The HTTPS traffic did not allow easy inspection and alternative testing with our go to baseline client, cURL could not reproduce the results. The problem was clearly affecting production and we couldn’t easily change out our middleware tools in .NET for a set of shell based testing scripts using cURL.

We needed to see inside the HTTP and TCP conversation of the .NET HTTP client, but the normal amount of debugging and logging did not record the details.

Thanks to the big’ol Internet, we stumbled on this helpful article from Mike Hadlow to get to the level of information we wanted out of .NET and the HTTP client.The only problem is that this logging doesn’t include time stamps and there was too much traffic to keep up and detangle out of the audit logs. Adjusting the load balancer, we could force most of the traffic through patched software with new logging. This greatly reduced the volume of data and we tried to watch in real time. This is what we saw

 1 System.Net Verbose: 0 : [5156] ConnectStream#43140910::Close()
 2 System.Net Verbose: 0 : [5156] Exiting ConnectStream#43140910::Close()
 3 System.Net Verbose: 0 : [5156] Exiting MyWebClient#1109663::DownloadData() -> Byte[]#50053605
 4 System.Net Verbose: 0 : [5156] MyWebClientWithHead#30034512::DownloadData(https://REDACTED/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED#1553812164)
 5 System.Net Verbose: 0 : [5156] MyWebClientWithHead#30034512::DownloadData(https://REDACTED/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED#1553812164)
 6 System.Net Verbose: 0 : [5156] WebRequest::Create(https://REDACTED/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED)
 7 System.Net Verbose: 0 : [5156] HttpWebRequest#45035759::HttpWebRequest(https://REDACTED/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED#1553812164)
 8 System.Net Verbose: 0 : [5156] Exiting HttpWebRequest#45035759::HttpWebRequest()
 9 System.Net Verbose: 0 : [5156] Exiting WebRequest::Create() -> HttpWebRequest#45035759
 10 System.Net Verbose: 0 : [5156] HttpWebRequest#45035759::GetResponse()
 11 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with ServicePoint#17057466
 12 System.Net Information: 0 : [5156] Associating Connection#31630514 with HttpWebRequest#45035759
 13 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with ConnectStream#66385004
 14 System.Net Information: 0 : [5156] HttpWebRequest#45035759 - Request: HEAD /v1/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED HTTP/1.1
 
 15 System.Net Information: 0 : [5156] ConnectStream#66385004 - Sending headers
 16 {
 17 x-auth-user: REDACTED
 18 x-auth-expiry: 2015-08-16T01:22:43Z
 19 x-auth-ticket: REDACTED
 20 Accept: text/xml
 21 Content-Type: text/xml
 22 User-Agent: REDACTED/version-2.4501/GetReponoAuthURL/gaggleid:REDACTED
 23 Host: REDACTED
 24 }.
 25 System.Net Error: 0 : [5156] Exception in the HttpWebRequest#45035759:: - The underlying connection was closed: A connection that was expected to be kept alive was closed by the server.
 26 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with ServicePoint#17057466
 27 System.Net Information: 0 : [5156] Associating Connection#14267127 with HttpWebRequest#45035759
 28 System.Net Information: 0 : [5156] Connection#14267127 - Created connection from REDACTED:56478 to REDACTED:443.
 29 System.Net Information: 0 : [5156] TlsStream#34867337::.ctor(host=REDACTED, #certs=0)
 30 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with ConnectStream#62287651
 31 System.Net Information: 0 : [5156] HttpWebRequest#45035759 - Request: HEAD /v1/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED HTTP/1.1
 32 System.Net Information: 0 : [5156] SecureChannel#50704654::.ctor(hostname=REDACTED, #clientCertificates=0, encryptionPolicy=RequireEncryption)
 33 System.Net Information: 0 : [5156] SecureChannel#50704654 - Left with 0 client certificates to choose from.
 34 System.Net Information: 0 : [5156] Using the cached credential handle.
 35 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = (null), targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation)
 36 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffer length=0, Out-Buffer length=159, returned code=ContinueNeeded).
 37 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation)
 38 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=0, returned code=ContinueNeeded).
 39 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation)
 40 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=0, returned code=ContinueNeeded).
 41 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation)
 42 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=0, returned code=ContinueNeeded).
 43 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation)
 44 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=134, returned code=ContinueNeeded).
 45 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation)
 46 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=0, returned code=ContinueNeeded).
 47 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation)
 48 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=0, returned code=OK).
 49 System.Net Information: 0 : [5156] Remote certificate: [Version]
 50 V3
 51 REDACTED
 
 52 [Signature Algorithm]
 53 sha256RSA(1.2.840.113549.1.1.11)
 
 54 [Public Key]
 55 Algorithm: RSA
 56 Length: 2048
 57 Key Blob: REDACTED
 58 System.Net Information: 0 : [5156] ProcessAuthentication(Protocol=Tls, Cipher=Aes128 128 bit strength, Hash=Sha1 160 bit strength, Key Exchange=44550 256 bit strength).
 
 59 System.Net Information: 0 : [5156] ConnectStream#62287651 - Sending headers
 60 {
 61 x-auth-user: REDACTED
 62 x-auth-expiry: 2015-08-16T01:22:43Z
 63 x-auth-salt: REDACTED
 64 x-auth-ticket: REDACTED
 65 x-auth-dev-ticket: REDACTED
 66 Accept: text/xml
 67 Content-Type: text/xml
 68 User-Agent: REDACTED/version-2.4501/GetReponoAuthURL/gaggleid:REDACTED
 69 Host: REDACTED
 70 }.
 71 System.Net Information: 0 : [5156] Connection#14267127 - Received status line: Version=1.1, StatusCode=200, StatusDescription=OK.
 72 System.Net Information: 0 : [5156] Connection#14267127 - Received headers
 73 {
 74 X-Timestamp: 1413849343.99091
 75 X-Preauth-Uri: /repono/3dfc19b30213a7fd7297e1fb32815b95d15e7187a91e13c84b1c423c/REDACTED?token=PREA_49cd6e580ce3460948610e05e1dba031adf5bc19b098a2a98226cf4a
 76 X-Trans-Id: tx72179a9abf4a4d7e8cc65-0055cfe147
 77 Accept-Ranges: bytes
 78 Content-Length: 5378129
 79 Content-Type: application/pdf
 80 Date: Sun, 16 Aug 2015 01:03:03 GMT
 81 ETag: 40fb56bde55a37911d253debfa002005
 82 Last-Modified: Mon, 20 Oct 2014 23:55:44 GMT
 83 }.
 84 System.Net Information: 0 : [5156] ConnectStream#10452251::ConnectStream(Buffered 0 bytes.)
 85 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with ConnectStream#10452251
 86 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with HttpWebResponse#24816868
 87 System.Net Verbose: 0 : [5156] Exiting HttpWebRequest#45035759::GetResponse() -> HttpWebResponse#24816868
 88 System.Net Verbose: 0 : [5156] HttpWebResponse#24816868::GetResponseStream()
 89 System.Net Information: 0 : [5156] ContentLength=5378129
 90 System.Net Verbose: 0 : [5156] Exiting HttpWebResponse#24816868::GetResponseStream() -> ConnectStream#10452251
 91 System.Net Verbose: 0 : [5156] ConnectStream#10452251::Read()
 92 System.Net Verbose: 0 : [5156] Exiting ConnectStream#10452251::Read() -> 0#0
 93 System.Net Verbose: 0 : [5156] ConnectStream#10452251::Close()

If you see it right away, contact me, I would like to hire you on to the team. For those of you that don’t immediately see it, tune your eyes to line 25 in the output.

Here’s what is going on. The .NET client is expecting to reuse the TCP socket and when it doesn’t get an answer, it’s waiting for 10 seconds and then establishes a new HTTPS session. Once we had this isolated, we patched the .NET code to close the connection after retrieving the file and tear down the TCP session. There could be a lot of discussion about what sessions and HTTP mean at the TCP level, but we could not adjust the Repono side and we could adjust our .NET side, so that is the approach we used.

This correction is very important, but it doesn’t easily answer what changed? Through a lot more work in the troubleshooting domain, we noticed that the load balancer had changed behavior specifically around long running sessions.  Patching the load balancer and our .NET code has put us in a better place.