Thursday, September 24, 2009

16. Understanding TCP Incast Throughput Collapse in Datacenter Networks

The unique workloads and the fast, low-latency connections of data centers differ somewhat from what TCP was originally designed for. The infamous TCP incast collapse is a major problem in modern data centers. This paper analyzes the dynamics of incast and presents experimental results comparing different solutions.

The authors approach incast from three perspectives:
  1. the problem is not limited to particular network environments
  2. its causes should be identified and its symptoms should be predictable
  3. it should be solved by modifications to the TCP stack
Method: a client issues synchronized read requests for 100 blocks of data from a set of 1-48 storage servers.
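
To make the workload concrete, here is a minimal sketch of how a synchronized read behaves. It assumes each block is striped across the servers and that the client asks for the next block only once every fragment of the current one has arrived. The function names and the timings are made up for illustration; they are not taken from the paper.

```python
def block_completion_time(fragment_times_ms):
    """A block is complete only when the slowest server's fragment arrives."""
    return max(fragment_times_ms)


def total_transfer_time(per_block_fragment_times_ms):
    """Blocks are fetched one after another, so their completion times add up."""
    return sum(block_completion_time(t) for t in per_block_fragment_times_ms)


# Hypothetical fragment arrival times (ms): 3 blocks striped over 4 servers.
timings_ms = [
    [1.0, 1.1, 0.9, 1.0],
    [1.0, 1.0, 200.0, 1.1],  # one server stalls on a retransmission timeout
    [0.9, 1.0, 1.0, 1.0],
]

total_ms = total_transfer_time(timings_ms)
print(f"total transfer time: {total_ms:.1f} ms")
```

In this toy model a single stalled server holds up the whole block, so one slow fragment dominates the total time even though the other servers finished in about a millisecond.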

Things that do NOT help: TCP Reno, New Reno, SACK

Things that do help:
  • reducing the minimum value of the RTO (but randomizing the minimum and initial RTO doesn't work, which is counterintuitive),
  • fine-grained OS timers for sub-millisecond RTO,
  • randomizing RTO,
  • disabling TCP delayed ACKs when possible.
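
A rough back-of-the-envelope sketch of why the minimum RTO matters so much: if a block transfer stalls on a timeout, the link sits idle for at least the minimum RTO. The numbers below (a 1 Gbps link, a 256 KB block, one timeout per block, and 200 ms as the starting minimum RTO, which is the usual Linux default) are my own assumptions for illustration, not measurements from the paper, and the model ignores everything except the idle time.

```python
def goodput_mbps(block_bytes, link_bps, rto_min_s, timeouts_per_block):
    """Goodput when each block pays for some number of RTO-long idle periods."""
    ideal_s = block_bytes * 8 / link_bps            # time on an unstalled link
    stalled_s = ideal_s + timeouts_per_block * rto_min_s
    return block_bytes * 8 / stalled_s / 1e6


BLOCK_BYTES = 256 * 1024   # assumed block size
LINK_BPS = 1e9             # assumed 1 Gbps link

# Sweep the minimum RTO from 200 ms down to 200 microseconds.
results = {
    rto_ms: goodput_mbps(BLOCK_BYTES, LINK_BPS, rto_ms / 1000, timeouts_per_block=1)
    for rto_ms in (200, 20, 2, 0.2)
}

for rto_ms, mbps in results.items():
    print(f"min RTO {rto_ms:>5} ms -> about {mbps:.0f} Mbps")
```

Under these assumptions the block itself needs only about 2 ms on the wire, so a 200 ms timeout wastes roughly 99% of the link, while a sub-millisecond timeout costs very little. That is consistent with the fine-grained timer suggestion in the list above.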

Comment: I like the way they survey all the current solutions, but where are the analysis results comparing these different working fixes for the TCP incast problem?
