Saturday, September 19, 2009

14. VL2: A Scalable and Flexible Data Center Network

According to the authors, today's data centers prevent high utilization in several ways:
  • Not providing enough capacity between the servers.
  • Congestion and high resource utilization occur in some servers while others remain lightly loaded.
  • When one service experiences a heavy traffic load, all other hosts in the same subtree suffer collateral damage.
  • VLANs cannot be migrated easily.
By measuring network traffic, flow distributions, traffic matrices, and failure characteristics, the authors introduce Virtual Layer 2 (VL2) to address these issues. Design:
  • Valiant Load Balancing (VLB) to spread traffic randomly over multiple paths (see the sketch after this list)
  • Based on IP
  • Ability to host any service on any server, separating server names from locators.
  • Replaces ARP broadcasts with queries to the VL2 directory system
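
To make the random-spreading idea concrete, here is a minimal Python sketch of Valiant Load Balancing as summarized above: each flow is bounced off a randomly chosen intermediate switch so traffic is spread over many paths regardless of the traffic matrix. The switch names and flow tuple are hypothetical placeholders, not details from the paper.

```python
import hashlib

# Toy sketch of Valiant Load Balancing (VLB): every flow takes a two-hop
# path through a randomly chosen intermediate switch, which spreads load
# evenly no matter what the traffic matrix looks like.
# The switch names below are hypothetical, not from the paper.

INTERMEDIATE_SWITCHES = ["int-1", "int-2", "int-3", "int-4"]

def pick_intermediate(flow_5tuple):
    """Choose an intermediate switch pseudo-randomly but consistently per flow.

    Hashing the 5-tuple keeps all packets of one flow on one path
    (avoiding reordering) while different flows land on different
    intermediates, approximating random traffic spreading.
    """
    digest = hashlib.md5(repr(flow_5tuple).encode()).hexdigest()
    return INTERMEDIATE_SWITCHES[int(digest, 16) % len(INTERMEDIATE_SWITCHES)]

def vlb_path(flow_5tuple, src_tor, dst_tor):
    """Return the VLB path: source ToR -> random intermediate -> destination ToR."""
    return [src_tor, pick_intermediate(flow_5tuple), dst_tor]

if __name__ == "__main__":
    flow = ("10.0.0.1", "10.0.1.9", 6, 51512, 80)  # src IP, dst IP, proto, sport, dport
    print(vlb_path(flow, "tor-3", "tor-17"))
```
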
Comments:
  1. The last two papers are both IP-based to achieve backward compatibility, and both replace ARP broadcasts with queries to a specific server. This seems very promising in today's economy: high utilization with existing hardware.
  2. It's good to see a paper comparing 144-port and 24-port switches from a cost-effectiveness perspective. It's very practical and yet hardly mentioned in "research", which reminds me of my master's thesis proposal, when my advisors laughed at me for mentioning how much money could be saved if the technique were implemented. I guess after working at Microsoft Research, the authors live in a more practical world than most of us bookworms, haha.

13. PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

With the rapid growth of the Internet, modern computing trends point toward data centers with multi-core processors and end-host virtualization, according to the paper. In view of the shortcomings of SEATTLE and TRILL, the authors propose PortLand to create a plug-and-play-friendly environment for data center system administrators.

PortLand deals with the following issues successfully:
  • Virtual machines (VMs) may migrate from one physical machine to another without needing to change their IP addresses.
  • No switch configuration needed before deployment.
  • Any end host can communicate with other hosts in the data center along any physical path.
  • No forwarding loops (as SEATTLE may have when the number of hosts grows).
  • Rapid failure detection.
Design of PortLand: layer 2 routing, forwarding, and addressing for data centers
  • Fabric manager: maintains a centralized view of the network topology and assists with ARP resolution
  • Positional Pseudo MAC Address (PMAC): encodes the location of the host in the topology (see the sketch after this list)
  • Proxy-based ARP: intercepts IP-to-MAC (ARP) requests and forwards them to the fabric manager, avoiding broadcast storms
  • Distributed location discovery: switches run a Location Discovery Protocol (LDP), so their positions can be established without manual configuration
  • Provably loop-free forwarding: after LDP, switches update their forwarding tables
  • Fault-tolerant routing: switches inform the fabric manager of failures; the fabric manager updates its view and notifies all affected switches
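
To illustrate the PMAC idea mentioned in the list above, here is a small Python sketch that packs a host's topology coordinates (pod, position, port, VM id) into a 48-bit MAC-style address and unpacks them again. The 16/8/8/16-bit field widths are my recollection of the paper's layout and should be treated as an assumption.

```python
# Minimal sketch of a positional pseudo MAC (PMAC), assuming the
# pod.position.port.vmid layout (16/8/8/16 bits). Field widths are an
# assumption on my part, not a verified spec.

def encode_pmac(pod, position, port, vmid):
    """Pack topology coordinates into a 48-bit MAC-style address string."""
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    raw = value.to_bytes(6, "big")
    return ":".join(f"{b:02x}" for b in raw)

def decode_pmac(pmac):
    """Recover (pod, position, port, vmid) from a PMAC string."""
    value = int(pmac.replace(":", ""), 16)
    return ((value >> 32) & 0xFFFF, (value >> 24) & 0xFF,
            (value >> 16) & 0xFF, value & 0xFFFF)

if __name__ == "__main__":
    pmac = encode_pmac(pod=2, position=5, port=1, vmid=3)
    print(pmac)               # 00:02:05:01:00:03
    print(decode_pmac(pmac))  # (2, 5, 1, 3)
```
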
Comment: The idea of a plug-and-play data center network sounds great, but their evaluation seems to have been performed locally. Is there any way it can also be used for 'cloud computing'?

Wednesday, September 16, 2009

12. Detailed Diagnosis in Enterprise Networks

Drawing on experience with small enterprise networks, the authors developed a diagnostic system that scales to large networks by analyzing the joint behavior of pairs of components in the past and estimating the impact of current events.
  • existing diagnostic systems: either require extensive domain knowledge or sacrifice detail
  • NetMedic:
    • frames detailed diagnosis as an inference problem
    • estimates when two components in the network are impacting each other without knowing how they interact (see the sketch below)
    • captures the state of the network using many variables
    • application agnostic
By analyzing a month of logs from small enterprises, with about 450,000 entries, the authors establish a classification of the problems. Component states are captured and a dependency graph is generated to determine which components impact which. NetMedic is later deployed on a large enterprise network, where it works successfully, placing the true culprit at or near the top of its ranked list for about 80% of the problems.
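
The history-based inference above can be sketched roughly as follows: to judge whether component A is impacting component B right now, look for past time windows in which A's state was similar to its current state and check how B behaved then. The similarity and scoring functions below are my own simplifications, not NetMedic's actual formulas.

```python
import math

# Toy sketch of history-based impact estimation: if B's state in the
# past windows where A looked like it does now resembles B's current
# (bad) state, we infer A is likely impacting B.

def distance(state_a, state_b):
    """Euclidean distance between two state vectors (dicts of variable -> value)."""
    keys = set(state_a) | set(state_b)
    return math.sqrt(sum((state_a.get(k, 0.0) - state_b.get(k, 0.0)) ** 2 for k in keys))

def impact_score(history_a, history_b, now_a, now_b, k=3):
    """Score in (0, 1]: higher when B's past behavior, in windows where A
    resembled its current state, is close to B's current state."""
    # Rank past windows by how similar A's state was to A's current state.
    ranked = sorted(range(len(history_a)), key=lambda t: distance(history_a[t], now_a))
    nearest = ranked[:k]
    # In those windows, how close was B to its current state?
    avg = sum(distance(history_b[t], now_b) for t in nearest) / len(nearest)
    return 1.0 / (1.0 + avg)  # closer past behavior of B -> higher inferred impact

if __name__ == "__main__":
    # Two components, each with a per-window state of a few variables.
    hist_app = [{"cpu": 5, "errors": 0}, {"cpu": 80, "errors": 9}, {"cpu": 10, "errors": 1}]
    hist_srv = [{"latency": 10}, {"latency": 300}, {"latency": 12}]
    now_app, now_srv = {"cpu": 85, "errors": 10}, {"latency": 290}
    print(impact_score(hist_app, hist_srv, now_app, now_srv, k=1))
```
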

Comment: I remember that a couple of years ago, NetMedic was THE hip network diagnosis software, like Norton SystemWorks. Is it still popular these days? It's interesting to see that they "trained" the software on a small enterprise network and then applied it to large networks.

Tuesday, September 15, 2009

11. Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises

The authors try to build a protocol that integrates the scalability of IP with the simplicity of Ethernet.

  • Advantages of Ethernet: persistent MAC addresses; bridges automatically build forwarding tables (ARP, DHCP); plug-and-play
  • Disadvantages of Ethernet: initial setup relies on broadcasting, which consumes resources and raises security and privacy issues
  • Pros of IP: shortest-path routing
  • Cons of IP: subnetting wastes address space; Virtual LANs (VLANs) are inefficient over large spanning trees
SEATTLE (Scalable Ethernet Architecture for Large Enterprises):
  • one-hop, network-layer DHT: stores the location of each end-host in a distributed directory (see the sketch after this list)
  • traffic-driven location resolution and caching: switches cache responses to queries and include location info in ARP replies
  • scalable, prompt cache-update protocol: based on unicast instead of broadcast/timeout to update caches
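
A minimal sketch of the one-hop DHT idea, assuming a consistent-hash ring over the live switches: every switch can hash a host's key (e.g., its MAC address) to the same "resolver" switch, which stores that host's location, so lookups need no broadcast. The ring construction and key format are my own illustration, not SEATTLE's exact scheme.

```python
import bisect
import hashlib

# Toy one-hop DHT: switches are placed on a consistent-hash ring; the
# resolver for a key is the first switch clockwise from the key's hash.
# In the real design the switch list comes from a link-state protocol.

def _h(value):
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

class OneHopDHT:
    def __init__(self, switches):
        # Sorted ring of (hash, switch) pairs, identical at every switch.
        self.ring = sorted((_h(s), s) for s in switches)

    def resolver(self, key):
        """Return the switch responsible for `key` (first ring point clockwise)."""
        hashes = [h for h, _ in self.ring]
        idx = bisect.bisect_right(hashes, _h(key)) % len(self.ring)
        return self.ring[idx][1]

if __name__ == "__main__":
    dht = OneHopDHT(["switch-A", "switch-B", "switch-C", "switch-D"])
    # The access switch of host 00:1a:2b:3c:4d:5e publishes (MAC -> location)
    # to this resolver; any other switch computes the same resolver to look it up.
    print(dht.resolver("00:1a:2b:3c:4d:5e"))
```
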
Simulation: the evaluation measures cache eviction timeouts, forwarding table size, path stability, switch failure rates, and host mobility rates.

Comment: In view of the high human-error rate described in the previous reading, "BGP Misconfiguration," SEATTLE's idea of making the network "plug-and-play-able" seems very convenient and fool-proof, while at the same time it allows network administrators to customize network transport. Since this is a 2008 paper, I wonder how it works in the real world?