Part of our application retrieves files from an internal storage system that behaves similarly to AWS S3. The system has four heads and is fronted by a network load balancer. Clients of the storage system, named Repono, retrieve authentication tickets from our vending system. Repono then does an authorization look up and if everything is OK, the resource (aka “the file”) is handed out over HTTPS. Normally, our performance of this system is really fantastic and looks like this chart:
About a month ago, the performance of this system really changed. We went from routine performance under 1 second to sporadic responses clocking in over 10 seconds.
This problem did not reveal itself easily. The major client of Repono was recently upgraded to a newer version of Microsoft .NET. One step we took was to revert the change and downgrade the version of .NET. For around 20 to 30 minutes, the performance was looking good, and then it reverted to the lumpy 10 second performance.
What else could have changed? This was not easy to see. The HTTPS traffic did not allow easy inspection and alternative testing with our go to baseline client, cURL could not reproduce the results. The problem was clearly affecting production and we couldn’t easily change out our middleware tools in .NET for a set of shell based testing scripts using cURL.
We needed to see inside the HTTP and TCP conversation of the .NET HTTP client, but the normal amount of debugging and logging did not record the details.
Thanks to the big’ol Internet, we stumbled on this helpful article from Mike Hadlow to get to the level of information we wanted out of .NET and the HTTP client.The only problem is that this logging doesn’t include time stamps and there was too much traffic to keep up and detangle out of the audit logs. Adjusting the load balancer, we could force most of the traffic through patched software with new logging. This greatly reduced the volume of data and we tried to watch in real time. This is what we saw
1 System.Net Verbose: 0 : [5156] ConnectStream#43140910::Close() 2 System.Net Verbose: 0 : [5156] Exiting ConnectStream#43140910::Close() 3 System.Net Verbose: 0 : [5156] Exiting MyWebClient#1109663::DownloadData() -> Byte[]#50053605 4 System.Net Verbose: 0 : [5156] MyWebClientWithHead#30034512::DownloadData(https://REDACTED/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED#1553812164) 5 System.Net Verbose: 0 : [5156] MyWebClientWithHead#30034512::DownloadData(https://REDACTED/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED#1553812164) 6 System.Net Verbose: 0 : [5156] WebRequest::Create(https://REDACTED/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED) 7 System.Net Verbose: 0 : [5156] HttpWebRequest#45035759::HttpWebRequest(https://REDACTED/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED#1553812164) 8 System.Net Verbose: 0 : [5156] Exiting HttpWebRequest#45035759::HttpWebRequest() 9 System.Net Verbose: 0 : [5156] Exiting WebRequest::Create() -> HttpWebRequest#45035759 10 System.Net Verbose: 0 : [5156] HttpWebRequest#45035759::GetResponse() 11 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with ServicePoint#17057466 12 System.Net Information: 0 : [5156] Associating Connection#31630514 with HttpWebRequest#45035759 13 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with ConnectStream#66385004 14 System.Net Information: 0 : [5156] HttpWebRequest#45035759 - Request: HEAD /v1/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED HTTP/1.1 15 System.Net Information: 0 : [5156] ConnectStream#66385004 - Sending headers 16 { 17 x-auth-user: REDACTED 18 x-auth-expiry: 2015-08-16T01:22:43Z 19 x-auth-ticket: REDACTED 20 Accept: text/xml 21 Content-Type: text/xml 22 User-Agent: REDACTED/version-2.4501/GetReponoAuthURL/gaggleid:REDACTED 23 Host: REDACTED 24 }. 25 System.Net Error: 0 : [5156] Exception in the HttpWebRequest#45035759:: - The underlying connection was closed: A connection that was expected to be kept alive was closed by the server. 26 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with ServicePoint#17057466 27 System.Net Information: 0 : [5156] Associating Connection#14267127 with HttpWebRequest#45035759 28 System.Net Information: 0 : [5156] Connection#14267127 - Created connection from REDACTED:56478 to REDACTED:443. 29 System.Net Information: 0 : [5156] TlsStream#34867337::.ctor(host=REDACTED, #certs=0) 30 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with ConnectStream#62287651 31 System.Net Information: 0 : [5156] HttpWebRequest#45035759 - Request: HEAD /v1/repono/REDACTED/REDACTED/n/1/6/REDACTED/REDACTED HTTP/1.1 32 System.Net Information: 0 : [5156] SecureChannel#50704654::.ctor(hostname=REDACTED, #clientCertificates=0, encryptionPolicy=RequireEncryption) 33 System.Net Information: 0 : [5156] SecureChannel#50704654 - Left with 0 client certificates to choose from. 34 System.Net Information: 0 : [5156] Using the cached credential handle. 35 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = (null), targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation) 36 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffer length=0, Out-Buffer length=159, returned code=ContinueNeeded). 37 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation) 38 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=0, returned code=ContinueNeeded). 39 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation) 40 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=0, returned code=ContinueNeeded). 41 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation) 42 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=0, returned code=ContinueNeeded). 43 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation) 44 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=134, returned code=ContinueNeeded). 45 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation) 46 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=0, returned code=ContinueNeeded). 47 System.Net Information: 0 : [5156] InitializeSecurityContext(credential = System.Net.SafeFreeCredential_SECURITY, context = 2496950:2dabe60, targetName = REDACTED, inFlags = ReplayDetect, SequenceDetect, Confidentiality, AllocateMemory, InitManualCredValidation) 48 System.Net Information: 0 : [5156] InitializeSecurityContext(In-Buffers count=2, Out-Buffer length=0, returned code=OK). 49 System.Net Information: 0 : [5156] Remote certificate: [Version] 50 V3 51 REDACTED 52 [Signature Algorithm] 53 sha256RSA(1.2.840.113549.1.1.11) 54 [Public Key] 55 Algorithm: RSA 56 Length: 2048 57 Key Blob: REDACTED 58 System.Net Information: 0 : [5156] ProcessAuthentication(Protocol=Tls, Cipher=Aes128 128 bit strength, Hash=Sha1 160 bit strength, Key Exchange=44550 256 bit strength). 59 System.Net Information: 0 : [5156] ConnectStream#62287651 - Sending headers 60 { 61 x-auth-user: REDACTED 62 x-auth-expiry: 2015-08-16T01:22:43Z 63 x-auth-salt: REDACTED 64 x-auth-ticket: REDACTED 65 x-auth-dev-ticket: REDACTED 66 Accept: text/xml 67 Content-Type: text/xml 68 User-Agent: REDACTED/version-2.4501/GetReponoAuthURL/gaggleid:REDACTED 69 Host: REDACTED 70 }. 71 System.Net Information: 0 : [5156] Connection#14267127 - Received status line: Version=1.1, StatusCode=200, StatusDescription=OK. 72 System.Net Information: 0 : [5156] Connection#14267127 - Received headers 73 { 74 X-Timestamp: 1413849343.99091 75 X-Preauth-Uri: /repono/3dfc19b30213a7fd7297e1fb32815b95d15e7187a91e13c84b1c423c/REDACTED?token=PREA_49cd6e580ce3460948610e05e1dba031adf5bc19b098a2a98226cf4a 76 X-Trans-Id: tx72179a9abf4a4d7e8cc65-0055cfe147 77 Accept-Ranges: bytes 78 Content-Length: 5378129 79 Content-Type: application/pdf 80 Date: Sun, 16 Aug 2015 01:03:03 GMT 81 ETag: 40fb56bde55a37911d253debfa002005 82 Last-Modified: Mon, 20 Oct 2014 23:55:44 GMT 83 }. 84 System.Net Information: 0 : [5156] ConnectStream#10452251::ConnectStream(Buffered 0 bytes.) 85 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with ConnectStream#10452251 86 System.Net Information: 0 : [5156] Associating HttpWebRequest#45035759 with HttpWebResponse#24816868 87 System.Net Verbose: 0 : [5156] Exiting HttpWebRequest#45035759::GetResponse() -> HttpWebResponse#24816868 88 System.Net Verbose: 0 : [5156] HttpWebResponse#24816868::GetResponseStream() 89 System.Net Information: 0 : [5156] ContentLength=5378129 90 System.Net Verbose: 0 : [5156] Exiting HttpWebResponse#24816868::GetResponseStream() -> ConnectStream#10452251 91 System.Net Verbose: 0 : [5156] ConnectStream#10452251::Read() 92 System.Net Verbose: 0 : [5156] Exiting ConnectStream#10452251::Read() -> 0#0 93 System.Net Verbose: 0 : [5156] ConnectStream#10452251::Close()
If you see it right away, contact me, I would like to hire you on to the team. For those of you that don’t immediately see it, tune your eyes to line 25 in the output.
Here’s what is going on. The .NET client is expecting to reuse the TCP socket and when it doesn’t get an answer, it’s waiting for 10 seconds and then establishes a new HTTPS session. Once we had this isolated, we patched the .NET code to close the connection after retrieving the file and tear down the TCP session. There could be a lot of discussion about what sessions and HTTP mean at the TCP level, but we could not adjust the Repono side and we could adjust our .NET side, so that is the approach we used.
This correction is very important, but it doesn’t easily answer what changed? Through a lot more work in the troubleshooting domain, we noticed that the load balancer had changed behavior specifically around long running sessions. Patching the load balancer and our .NET code has put us in a better place.