All IT pros need to understand TCP windowing

High-bandwidth replication over long distances, whether to a hot site or the cloud, requires a solid grasp of TCP to steer clear of bottlenecks

Following Hurricane Sandy, let's say you've been asked to set up replication to a disaster recovery site. Your company has chosen to back up its core operations located in Boston with space in a colocation center in Chicago, about a thousand miles away. You've done the math and determined that you'll need a 500Mbps circuit to handle the amount of data necessary to replicate and maintain recovery-point SLAs.

As you get your Chicago site and connectivity lit up, you decide to test out your connection. First, a ping shows a round-trip time of 25ms, not horrible for such a long link (at least 11ms of which is simple light lag). Next, you decide to make sure you're getting the bandwidth you're paying for. You fire up your laptop and FTP a large file to a Windows 2003 management server on the other side of the link. As soon as the transfer finishes, you know something's wrong: your massive 500Mbps link is pushing only about 21Mbps.
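The light-lag figure is simple arithmetic: divide the round-trip distance by the speed of light. The sketch below assumes the article's round figure of 1,000 miles each way over a straight path; the function name `light_lag_ms` and the fiber refractive index of about 1.47 are illustrative assumptions, not measured values for this link.

```python
# Minimal sketch: the propagation-delay floor for one round trip.
# Assumption: a 1,000-mile one-way path (the article's round figure).

SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
KM_PER_MILE = 1.609344


def light_lag_ms(one_way_miles: float, refractive_index: float = 1.0) -> float:
    """Round-trip propagation delay in milliseconds.

    refractive_index=1.0 gives the vacuum lower bound; roughly 1.47
    approximates typical optical fiber (an assumption; it varies by cable).
    """
    round_trip_km = 2 * one_way_miles * KM_PER_MILE
    speed_km_s = SPEED_OF_LIGHT_KM_S / refractive_index
    return round_trip_km / speed_km_s * 1000


print(f"vacuum: {light_lag_ms(1000):.1f} ms")        # about 10.7 ms
print(f"fiber:  {light_lag_ms(1000, 1.47):.1f} ms")  # about 15.8 ms
```

The vacuum result is the "at least 11ms" floor. Light in glass is slower and real fiber routes are longer than a straight line, which plausibly accounts for much of the rest of the measured 25ms.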

Do you know what's wrong with this picture? If not, keep reading, because this problem has probably affected you before without your realizing it. If you decide to move to the cloud or implement this kind of replication, it's likely to strike again.

First, understand that the answer is related to Transmission Control Protocol (TCP), one of the two main transport protocols that most applications use to communicate over the Internet. (The other is User Datagram Protocol, or UDP.) What matters here is that TCP has built-in congestion and packet-loss detection capabilities, whereas UDP does not.

That detection makes TCP a great choice when you need to transfer data reliably and in order. UDP is a good choice when you're sending small or time-sensitive pieces of data and don't care precisely what order they arrive in or whether some are lost in transit, or when the application has its own means of dealing with those events. TCP is used for protocols like HTTP, FTP, most kinds of IP-based SAN replication, and Windows file sharing (SMB), while UDP is commonly used for DNS, VoIP, and some remote-display protocols like PCoIP.

TCP's reliability introduces throughput limitations

TCP ensures that no data is lost by building a stateful connection from the client to the server. Whenever data is sent from one to the other, the receiving station acknowledges that the data has been received. This allows TCP to detect that a packet has been lost -- ensuring that the sending side knows to resend it.
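To make the acknowledge-and-resend idea concrete, here is a toy stop-and-wait sender over a simulated lossy network. It is not real TCP: the functions `send_reliably` and `receiver` are hypothetical, the "network" simply drops whichever transmission numbers you list, and only data packets (never acknowledgments) are lost.

```python
# Toy stop-and-wait sender: illustrates acknowledge/resend, NOT real TCP.
# The dropped transmissions are a hypothetical, fixed pattern so runs repeat.


def receiver(seq, payload, delivered):
    """Store the payload and return an acknowledgment for its sequence number."""
    delivered.append(payload)
    return seq


def send_reliably(packets, lost_transmissions):
    """Send each packet, resending until it is acknowledged.

    lost_transmissions: set of 0-based transmission numbers the simulated
    network drops. Returns (payloads delivered, total transmissions).
    """
    delivered = []
    transmissions = 0
    for seq, payload in enumerate(packets):
        acked = False
        while not acked:
            lost = transmissions in lost_transmissions
            transmissions += 1
            if lost:
                continue  # no acknowledgment comes back: time out and resend
            acked = receiver(seq, payload, delivered) == seq
    return delivered, transmissions


data, sent = send_reliably(["a", "b", "c"], lost_transmissions={1})
print(data, sent)  # ['a', 'b', 'c'] 4
```

Packet "b" is lost once and resent, so three packets take four transmissions. Note that the sender never sends the next packet until the previous one is acknowledged, which is exactly the behavior that causes trouble on a long link.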

This is great from a reliability standpoint, but it presents a potential performance problem: If the sending station has to wait for the receiving station to acknowledge every packet it sends, performance could be dramatically reduced. In my Boston-Chicago example, the laptop would have to wait 25ms every time it sent a packet with a 1,460-byte payload, resulting in a throughput of only about 467Kbps (1,460 bytes × 8 bits ÷ 0.025 seconds), less than half a megabit per second.
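The stop-and-wait arithmetic fits in one line of code. This sketch, with the hypothetical helper `stop_and_wait_bps`, assumes the sender transmits exactly one 1,460-byte payload per 25ms round trip and ignores the time spent putting each packet on the wire.

```python
def stop_and_wait_bps(payload_bytes: int, rtt_s: float) -> float:
    """Throughput if the sender waits one full round trip per packet.

    Ignores serialization time, so it slightly overstates what a real
    link would achieve.
    """
    return payload_bytes * 8 / rtt_s


print(f"{stop_and_wait_bps(1460, 0.025) / 1e3:.0f} Kbps")  # 467 Kbps
```

Because the round-trip time sits in the denominator, doubling the distance (and therefore the RTT) halves the throughput, no matter how fast the circuit is.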

TCP windowing is the answer

Fortunately, TCP has a way to work around this problem. Instead of waiting for an acknowledgment after every packet, the sending station is allowed to have a certain amount of unacknowledged data in flight at once; that limit, advertised by the receiving station, is called the TCP window. If the TCP window were set at 64KB, the sending station could send up to 64KB worth of packets before it had to stop and wait for an acknowledgment from the receiving station, reducing the slowdown caused by the packet acknowledgments.
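Since at most one window's worth of data can be in flight per round trip, the window divided by the RTT caps throughput. The sketch below assumes a 64KB window of 65,535 bytes and the 25ms RTT from the example; the helper names are hypothetical. The second function just rearranges the same formula to estimate how large a window would be needed to fill the 500Mbps circuit.

```python
def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on TCP throughput: one full window per round trip."""
    return window_bytes * 8 / rtt_s


def window_needed_bytes(target_bps: float, rtt_s: float) -> float:
    """Window needed to keep a link of target_bps busy (bandwidth x delay)."""
    return target_bps * rtt_s / 8


RTT = 0.025  # 25 ms, the Boston-Chicago round trip from the example
print(f"{window_limited_bps(65_535, RTT) / 1e6:.1f} Mbps")  # 21.0 Mbps
print(f"{window_needed_bytes(500e6, RTT):,.0f} bytes")      # 1,562,500 bytes
```

The first result lines up with the roughly 21Mbps the FTP test produced, which is consistent with a 64KB window being the bottleneck; the second suggests the link would need a window of about 1.5MB to run at full speed.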
