Thursday, September 24, 2009

16. Understanding TCP Incast Throughput Collapse in Datacenter Networks

The workloads and the fast, low-latency links of data centers differ somewhat from what TCP was originally designed for. The infamous TCP incast collapse is a major problem in modern data centers. This paper analyzes the dynamics of incast and presents experimental results for several proposed solutions.

The authors approach incast from three perspectives:
  1. the problem is not limited to particular network environments
  2. the causes of the problem should be identified and its symptoms should be predictable
  3. modifications to the TCP stack should solve it

Method: synchronized read requests for 100 blocks of data from a set of 1-48 storage servers

Things that do NOT help: TCP Reno, New Reno, SACK

Things that do help:
  • reducing the minimum value of the RTO (but randomizing the minimum and initial RTO doesn't work, which is counterintuitive),
  • fine-grained OS timers for sub-millisecond RTO,
  • randomizing RTO,
  • disabling TCP delayed ACKs when possible.
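
To see why the minimum RTO matters so much here, this is a rough sketch of the standard RTO estimator (the smoothed RTT and variance update from RFC 6298), clamped at a configurable floor. It is not the paper's code. The clock-granularity term is left out for simplicity, and the 100 microsecond RTT samples and the 200 ms floor (a common Linux default) are my own illustrative assumptions.

```python
def rto_estimate(rtt_samples, min_rto, alpha=1 / 8, beta=1 / 4, k=4):
    """Return the RTO in seconds after processing the RTT samples.

    Follows the SRTT/RTTVAR update rules of RFC 6298, without the
    clock-granularity term, then clamps the result at min_rto.
    """
    if not rtt_samples:
        raise ValueError("need at least one RTT sample")
    srtt = None
    rttvar = None
    for r in rtt_samples:
        if srtt is None:
            # First measurement initializes both estimators.
            srtt = r
            rttvar = r / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT.
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
    return max(min_rto, srtt + k * rttvar)


# Hypothetical datacenter RTTs of about 100 microseconds.
samples = [100e-6] * 20
clamped = rto_estimate(samples, min_rto=0.200)    # floor dominates
unclamped = rto_estimate(samples, min_rto=0.0)    # estimator alone
```

With these inputs the clamped RTO stays at 200 ms, while the estimator on its own would settle well under a millisecond. That gap is the room that a lower minimum RTO and finer-grained timers can recover.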

Comment: I like the way they survey all the current solutions, but where is the analysis of the results from all these different working methods for the TCP incast problem?

Wednesday, September 23, 2009

15. Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication

This paper deals with the TCP incast problem, which occurs when inbound data overflows the switch buffers, by using high-resolution timers for TCP time-outs. When a client sends queries to multiple end nodes, it cannot make forward progress until it has received responses from every server; the article refers to these as barrier-synchronized requests.

TCP incast collapse:
  • high-bandwidth, low-latency networks with small buffers in the switches
  • clients issue barrier-sync requests in parallel
  • servers respond with small amount of data per request
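
A back-of-the-envelope sketch of why one time-out hurts so much under these conditions. The numbers are my own assumptions, not taken from the paper: a 1 MB block, a 1 Gbps link, and a single 200 ms retransmission time-out during which the barrier-synchronized client just waits.

```python
def goodput_mbps(block_bytes, link_gbps, stall_s=0.0):
    """Goodput in Mbps for one block, given an idle stall in seconds.

    The transfer itself is assumed to run at full link speed; the
    stall models time lost waiting for a retransmission time-out.
    """
    bits = block_bytes * 8
    transfer_s = bits / (link_gbps * 1e9)
    return bits / (transfer_s + stall_s) / 1e6


no_loss = goodput_mbps(1_000_000, 1.0)                   # full link speed
one_timeout = goodput_mbps(1_000_000, 1.0, stall_s=0.2)  # one 200 ms stall
```

Under these assumptions the transfer takes 8 ms on its own, so a single 200 ms stall drops goodput from about 1000 Mbps to about 38 Mbps, even though only one server was delayed.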

By enabling microsecond-granularity retransmission time-outs (RTO), the authors aim to solve the incast problem commonly seen in data centers:
  1. modify the TCP implementation to use high-resolution kernel timers, with timeout = 2^backoff * (RTO + rand(0.5) * RTO)
  2. prevent TCP incast collapse for up to 47 concurrent senders
  3. recovery in data centers does not affect performance
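
The time-out formula in item 1 can be sketched in a few lines. This is only my reading of the formula, not the authors' kernel code: I assume rand(0.5) means a uniform draw from [0, 0.5], and the function name and the 1 ms base RTO are made up for illustration.

```python
import random


def retransmission_timeout(rto, backoff, rng=random):
    """Randomized time-out: 2^backoff * (RTO + rand(0.5) * RTO).

    rand(0.5) is interpreted as a uniform draw from [0, 0.5], which
    is an assumption. The jitter spreads the senders' retransmissions
    out in time so they do not all fire at the same instant.
    """
    jitter = rng.uniform(0.0, 0.5) * rto
    return (2 ** backoff) * (rto + jitter)


# Hypothetical 1 ms base RTO across successive backoff rounds.
rng = random.Random(42)
timeouts = [retransmission_timeout(0.001, b, rng) for b in range(4)]
```

Each value falls between 2^backoff * RTO and 1.5 times that, so the exponential backoff is preserved while senders that timed out together become unlikely to retransmit together.
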
Comment: This is a different approach to solving the problems in data centers. The other papers we read strove for backward compatibility, while this one modifies the TCP stack. Since a data center is usually confined to a fixed location and normally managed by a single company, I guess it is a feasible way to solve TCP incast collapse.