Wednesday, October 7, 2009

22. A Performance Comparison of Multi-Hop Wireless Ad hoc Network Routing Protocols

This paper describes wireless networks without fixed access points; this kind of point-to-point connectivity is commonly referred to as an ad hoc network. Four design choices were compared, namely DSDV, TORA, DSR, and AODV. An extended version of the ns-2 network simulator was used to test these different designs.

Each node in this infrastructureless network shares a single physical channel, and the power level at which a packet is received is compared against two values, the carrier sense threshold and the (higher) receive threshold:
  • at or above the receive threshold: the packet is handed to the MAC layer
  • between the two thresholds: the packet is passed up marked as a packet in error
  • below the carrier sense threshold: the packet is discarded as noise
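
A minimal sketch of that reception decision; the threshold constants below are placeholders, since the simulator derives the real values from its radio propagation model:

    # Placeholder threshold values (Watts); ns-2 computes these from the radio model.
    CARRIER_SENSE_THRESHOLD = 1.0e-11
    RECEIVE_THRESHOLD = 4.0e-10   # higher than the carrier sense threshold

    def handle_incoming(packet, rx_power):
        """Decide what happens to a packet based on the power it was received with."""
        if rx_power < CARRIER_SENSE_THRESHOLD:
            return "discarded as noise"
        elif rx_power < RECEIVE_THRESHOLD:
            packet["error"] = True            # detectable, but not decodable
            return "passed to MAC, marked as a packet in error"
        else:
            return "passed to MAC"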

The Distributed Coordination Function (DCF) of IEEE 802.11 is implemented to model the medium access control. It is similar to MACA and MACAW, which use physical carrier sense and virtual carrier sense to reduce collisions.

The four protocols were compared under changing topology, to see how packet delivery reacts to it:
  • Destination-Sequenced Distance Vector (DSDV): hop-by-hop distance vector; all nodes periodically broadcast routing updates.
  • Temporally-Ordered Routing Algorithm (TORA): "link reversal" routing that discovers routes on demand; sometimes longer routes are used to avoid discovering new routes.
  • Dynamic Source Routing (DSR): each packet carries the complete list of nodes it passes through.
  • Ad Hoc On-Demand Distance Vector (AODV): a combination of DSR and DSDV. It combines 1) DSR's route discovery and route maintenance with 2) DSDV's hop-by-hop routing, sequence numbers, and periodic beacons.

21. A High-Throughput Path Metric for Multi-Hop Wireless Routing

This paper tries to solve the connectivity problems of multi-hop wireless networks, like the previous paper on the Roofnet network. But instead of putting the antennas on rooftops, this time they are placed inside a university building, 29 nodes in total. The authors don't really mention how they chose the locations of their nodes, however, and they make it clear that mobility of these wireless access points is not taken into consideration.

The factor they change in the 802.11b network this time is the hop-count metric used by routing protocols. They replace it with an Expected Transmission Count (ETX) metric, which incorporates the ideas of:
  • masking transmission errors, so that all but the worst connections appear loss-free
  • finding a path with the fewest expected number of (re)transmissions
  • predicting packet loss ratios in both directions of each link, so asymmetrical link speeds are taken into consideration
  • penalizing interference between hops on multi-hop network
ETX = 1 / (forward delivery ratio * reverse delivery ratio), with two assumptions:
  1. the protocol uses link-layer retransmissions, as 802.11b does
  2. the radios use a fixed transmit power
The fact that ETX does not avoid congested links keeps it free from the oscillations of load-adaptive link delays. It also prefers paths of fewer than 3 hops even when 4+ hop paths have better raw throughput, which helps avoid wireless interference between hops.
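
A minimal sketch of how ETX can be computed from measured delivery ratios and summed along a path (my own toy code, not the authors'):

    def link_etx(forward_ratio, reverse_ratio):
        """ETX of one link: expected number of transmissions (including retries)
        needed to deliver a packet and get its link-layer ACK back."""
        return 1.0 / (forward_ratio * reverse_ratio)

    def path_etx(links):
        """The path metric is simply the sum of the per-link ETX values."""
        return sum(link_etx(df, dr) for df, dr in links)

    # Example: two good hops beat one bad hop.
    print(path_etx([(0.5, 0.5)]))                 # 1 lossy hop  -> ETX = 4.0
    print(path_etx([(0.9, 0.9), (0.9, 0.9)]))     # 2 clean hops -> ETX ~ 2.5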

Wednesday, September 30, 2009

20. Architecture and Evaluation of an Unplanned 802.11b Mesh Network

This paper describes an experimental 37-node wireless network in Cambridge, Massachusetts. Volunteer users install an antenna on their roof, and some also connect it to a home broadband connection, acting as gateways. Average throughput of the Roofnet network is 627 Kbps over an average of about 3 hops.

Roofnet uses its own addressing scheme on top of IP addresses. A dedicated routing protocol, Srcr, was also introduced to automatically maintain the routing tables among the wireless nodes. Since most of the links in between are rather weak, the routing has to find high-throughput paths by itself; the metric used for this is the Estimated Transmission Time (ETT).

Roofnet also uses its own algorithm, SampleRate, to choose among the 1, 2, 5.5, and 11 Mbps bit rates of 802.11b. The decision is based on the measured delivery of actual data rather than on periodic probes.
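
A much-simplified sketch of a SampleRate-style decision: pick the bit rate whose recently measured per-packet delivery time (including retries) is lowest, instead of trusting probe packets. This is my paraphrase of the idea, and the numbers are invented:

    # Hypothetical measured time (ms) to deliver one packet at each 802.11b rate.
    measured_tx_time_ms = {1: 12.0, 2: 6.5, 5.5: 3.1, 11: 4.8}

    def pick_bit_rate(stats):
        """Choose the rate with the lowest measured per-packet delivery time.
        A high raw bit rate (11 Mbps here) can lose if it needs many retries."""
        return min(stats, key=stats.get)

    print(pick_bit_rate(measured_tx_time_ms))   # -> 5.5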

Comments:
  • The beauty of this system is that you don't need elaborate planning ahead of time about where to put the wireless hops. The antennas are omni-directional, and the software is self-configuring.
  • I'm not sure if this is ever popular in the States, but there's this thing called FON, which basically gives you an almost-free wireless AP if you agree to share your existing Internet access, whether charging your users or not. And when you are out of your own FON AP, you can access others' depending on your sharing mode. This is/was very, very, very popular in Spain. Would it be possible to apply similar techniques to the FON firmware and give it a try?

19. Modeling Wireless Links for Transport Protocols

This paper provides a lot of background knowledge on wireless links and reviews several protocol solutions. As stated in previous papers, wireless networks have different bandwidths, lossy links, channel allocation delays, and asymmetric data flows. Ad hoc and sensor wireless networks are not discussed here; the focus is mainly on wireless unicast transport protocols.

The main reasons why TCP doesn't work well over wireless links are:
  • packet loss is interpreted as network congestion; there is no separate header field to signal a lossy link
  • high round-trip time (RTT)
But in order to stay interoperable with wired LANs, it is still beneficial to use the same TCP stack. Some of the usual error assumptions about wireless links also turn out to be wrong. For example, over poor GSM connections one would expect frequent TCP timeouts, but real-world measurements show otherwise, because the delay pattern is mostly jitter in packet arrivals, which inflates the TCP retransmit timer.

Several models are suggested so that researchers can simulate these links more accurately (a sketch of two of them follows this list):
  • Error losses and corruption: dropping packets probabilistically, on a time basis, per packet, or per bit
  • Delay variation: suspending data transmission for a while.
  • Packet reordering: swapping two packets in the queue at a given time
  • On-demand resource allocation: adding an additional delay when packets arrive at the queue
  • Bandwidth variation: changing the link bandwidth, e.g. CDMA2000 varying between 9.6 and 163 Kbps on time scales of 0.02-20 s.
  • Asymmetry in bandwidth and latency: configuring the uplink and downlink with different bandwidth and latency parameters.
  • Queue management: drop-tail for WLAN, RED for satellite (a large buffer causes high queuing delay and poor loss recovery; a small one, low link utilization and frequent packet drops)
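
Here is the sketch promised above, covering two of these models, probabilistic per-packet loss and delay variation; the function names and parameters are my own:

    import random

    def lossy_link(packets, per_packet_loss=0.02):
        """Error losses: drop each packet independently with a fixed probability."""
        return [p for p in packets if random.random() > per_packet_loss]

    def delay_variation(base_delay_ms, suspend_prob=0.01, suspend_ms=1000):
        """Delay variation: occasionally suspend transmission (e.g. during a handover),
        which shows up as a sudden spike on top of the base link delay."""
        extra = suspend_ms if random.random() < suspend_prob else 0
        return base_delay_ms + extra

    delays = [delay_variation(120) for _ in range(1000)]   # mostly 120 ms, a few 1120 ms spikes
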
Comment: This paper provides abundant background knowledge on wireless networks and ways of simulating their iffy link environments. It is also very interesting to learn that handovers happen about once per minute while driving in an urban area.

Monday, September 28, 2009

18. A Comparison of Mechanisms for Improving TCP Performance over Wireless Links

In the TCP protocol, packet drops are mostly interpreted as network congestion, but on wireless links they can result from iffy link conditions. TCP doesn't tell the difference between the two: it shrinks its transmission window before retransmitting packets, and the effective link speed goes from bad to worse.

Two approaches:
  • hide non-congestion losses from the TCP sender, making the lossy link appear as a high-quality link with reduced bandwidth: AIRMAIL, Indirect-TCP, Snoop protocols
  • make the sender aware of the wireless hops, so that packet losses that are not caused by congestion don't make it give up bandwidth
Several schemes were proposed and are compared in this paper, including
  • end-to-end retransmissions: selective ACKs (SACKs), Explicit Loss Notifications (ELN)
  • split-TCP connections: hide wireless link from sender
  • link-layer proposals: hide link losses from TCP sender by local retransmissions and forward error correction on wireless links
The benchmark measures throughput and goodput over the wired and wireless links.

Comment: Would it be possible to apply XCP on wireless links? Since it has an explicit congestion-level field, packet drops on a lossy link could be distinguished from congestion.

17. MACAW: A Media Access Protocol for Wireless LAN's

Wireless networks are a shared medium with limited resources. WLAN protocols are either token-based or multiple-access; the protocols discussed in this paper focus on multiple access, including Carrier Sense Multiple Access (CSMA), Multiple Access Collision Avoidance (MACA), and the authors' own modification of MACA, called MACAW.

CSMA:
  • avoid collisions by testing the signal strength nearby
  • collisions occur at receiver side, not sender
MACA:
  • A Request-to-Send (RTS) packet is sent before the transmission starts, and the receiver responds with a Clear-to-Send (CTS) packet.
  • Upon overhearing an RTS or CTS, all other stations defer for the proposed data transmission length included in the packet.
  • avoids collisions at receiver, not sender
  • Binary exponential backoff (BEB): backoff time is doubled after every collision, and reduced to min after every successful RTS-CTS exchange (unfair)
MACAW:
  • add a backoff counter in the packet header to achieve fairness
  • gentler adjustment after a collision: multiplicative increase by 1.5 and linear decrease by 1 (Multiplicative Increase, Linear Decrease, MILD); see the backoff sketch after this list
  • multiple stream model to be fair among uplinks and downlinks
  • basic RTS-CTS-DS-DATA-ACK message exchange (similar to what 802.11 later adopted)
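
The backoff sketch promised above, contrasting BEB with MACAW's MILD adjustment; the min/max bounds are placeholders:

    BACKOFF_MIN, BACKOFF_MAX = 20, 2000   # placeholder bounds, in slot times

    def beb(backoff, collided):
        """Binary exponential backoff: double on collision, reset to the minimum on
        success. The reset is what makes it unfair to stations that keep losing."""
        return min(backoff * 2, BACKOFF_MAX) if collided else BACKOFF_MIN

    def mild(backoff, collided):
        """MACAW's MILD: multiply by 1.5 on collision, decrease by 1 on success,
        so the congestion estimate decays gently instead of being thrown away."""
        if collided:
            return min(int(backoff * 1.5), BACKOFF_MAX)
        return max(backoff - 1, BACKOFF_MIN)
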
Comment: The idea of adding an ACK to the RTS-CTS exchange is great, and the emphasis on high throughput with fair allocation fits where today's networks have gone. I just wish they had more real-world experiments to prove that MACAW is, indeed, better than MACA.

Thursday, September 24, 2009

16. Understanding TCP Incast Throughput Collapse in Datacenter Networks

The unique workloads and the fast, low-latency connections of data centers differ somewhat from what TCP was designed for in the first place. The infamous TCP incast collapse is a major problem in modern data centers. This paper analyzes the dynamics of incast and provides experimental results for several different solutions.

The authors approach the incast from three perspectives,
  1. it is not limited to particular network environments
  2. causes of the problem should be identified and symptoms should be predictable
  3. modifications to TCP stack to solve it
Method: synchronized reads requesting 100 blocks of data from a set of 1-48 storage servers

Things that do NOT help: TCP Reno, New Reno, SACK

Things do:
  • reducing min value of RTO (but randomizing the minimum and initial RTO doesn't work, which is against intuition),
  • fine-grained OS timers for sub-millisecond RTO,
  • randomizing RTO,
  • disabling TCP delayed ACKs when possible.

Comment: I like the way they survey all the current solutions, but where are the analysis results showing how each of these working methods attacks the TCP incast problem?

Wednesday, September 23, 2009

15. Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication

This paper deals with the TCP incast problem, in which inbound data overflows the switch buffers, by using high-resolution timers for TCP timeouts. When a client has queries outstanding at multiple servers, it cannot make forward progress until it receives a response from every server; this is referred to as barrier-synchronized requests in the article.

TCP incast collapse:
  • high-bandwidth, low-latency networks with small buffers in the switches
  • clients issue barrier-sync requests in parallel
  • servers respond with small amount of data per request

By enabling microsecond-granularity retransmission timeouts (RTO), the authors intend to solve the incast problem commonly seen in data centers. They:
  1. modify the TCP implementation to use high-resolution kernel timers, with timeout = (RTO + rand(0.5) * RTO) * 2^backoff (see the sketch after this list)
  2. prevent TCP incast collapse for up to 47 concurrent senders
  3. show that the faster recovery in data centers does not hurt performance
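
The sketch promised above of that timeout computation, as I read it; the variable names and example values are mine:

    import random

    def retransmission_timeout(rto_us, backoff):
        """Microsecond-granularity RTO with exponential backoff plus a random term
        of up to 0.5 * RTO to de-synchronize the retransmitting senders."""
        return (rto_us + random.uniform(0, 0.5) * rto_us) * (2 ** backoff)

    # e.g. a 1 ms base RTO after two consecutive timeouts:
    print(retransmission_timeout(1000, backoff=2))   # somewhere between 4000 and 6000 us
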
Comment: This is a different approach to solving the problems in data centers. The other papers we read strove for backward compatibility, while this one modifies the TCP stack. Since data centers are usually confined to a fixed location and managed by a single company, I guess it is a feasible way to solve TCP incast collapse.

Saturday, September 19, 2009

14. VL2: A Scalable and Flexible Data Center Network

According to the authors, today's data centers prevent high utilization in several ways:
  • Not providing enough capacity between the servers.
  • Congestion and high resource utilization occurs in some of the servers while others are rather lightly-loaded.
  • When one service has a traffic load, all other hosts in the same tree/branch suffer collateral damage.
  • Lack of easy migration of VLANs.
By measuring network traffic, flow distributions, traffic matrices, and failure characteristics, the authors introduce Virtual Layer 2 (VL2) to address the above issues. Design (a toy sketch of the last two points follows this list):
  • Valiant Load Balancing does random traffic spreading over multiple paths
  • Based on IP
  • Ability to host any service on any server, separating server names from locators.
  • Replaces ARP broadcasts with queries to a VL2 directory
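
A toy sketch of the last two points, as promised: an application address is resolved through a directory lookup instead of an ARP broadcast, and Valiant Load Balancing bounces each flow off a random intermediate switch. All names and addresses are invented for illustration:

    import random

    # Hypothetical directory: application address (AA) -> locator address (LA).
    directory = {"10.0.0.5": "20.1.1.7", "10.0.0.6": "20.1.2.9"}
    intermediate_switches = ["int-1", "int-2", "int-3", "int-4"]

    def resolve(aa):
        """Replace an ARP-style broadcast with a unicast lookup to the directory."""
        return directory[aa]

    def pick_intermediate(flow_id):
        """Valiant Load Balancing: bounce the flow off a randomly chosen
        intermediate switch to spread traffic over all available paths."""
        return random.choice(intermediate_switches)

    la = resolve("10.0.0.5")
    via = pick_intermediate(("10.0.0.6", "10.0.0.5", 80))
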
Comments:
  1. The last two papers are both IP-based to achieve backward compatibility, and both replace ARP broadcasts with queries to a specific server. That seems very promising in today's economy: better utilization with existing hardware.
  2. It's good to see a paper addressing the 144-port and 24-port switches from a cost-effectiveness perspective. It's very practical and yet hardly mentioned in "research", which kind of reminds me of my master's thesis proposal, when my advisors laughed at my mentioning how much could be saved if the technique were implemented. I guess after working for Microsoft Research, the authors live in a more practical world than most of us bookworms, haha.

13. PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

With the rapid growth of the Internet, modern computing trends point to data centers with multi-core processors and end-host virtualization, according to the paper. In view of the shortcomings of SEATTLE and TRILL, the authors propose PortLand in order to make a plug-and-play-friendly environment for data center system administrators.

PortLand deals with the following issues successfully:
  • Virtual Machines (VM) may migrate from one physical machine to another, and they don't need to change their IP addresses
  • No switch configuration needed before deployment.
  • Any end host can communicate with other hosts in the data center along any physical path.
  • No forwarding loops (which SEATTLE can have when the number of hosts grows).
  • Rapid failure detection.
Design of PortLand: layer-2 routing, forwarding, and addressing for the data center (a PMAC sketch follows this list):
  • Fabric manager: centralized view of the network topology, assists with ARP
  • Positional Pseudo MAC Address (PMAC): encodes the location of the host in the topology
  • Proxy-based ARP: edge switches intercept IP-to-MAC queries and forward them to the fabric manager, avoiding broadcast storms
  • Distributed location discovery: a Location Discovery Protocol (LDP) lets switch positions be established without manual configuration.
  • Provably loop-free forwarding: after LDP, switches update their forwarding tables.
  • Fault-tolerant routing: switches inform the fabric manager of failures; the fabric manager updates its state and informs all affected switches
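
The PMAC sketch promised above. The field layout (pod, position, port, vmid) follows my reading of the paper, so treat the exact bit widths as an assumption:

    def encode_pmac(pod, position, port, vmid):
        """Pack pod(16) : position(8) : port(8) : vmid(16) into a 48-bit PMAC."""
        value = (pod << 32) | (position << 24) | (port << 16) | vmid
        return ":".join(f"{(value >> s) & 0xFF:02x}" for s in range(40, -8, -8))

    # A VM behind port 3 of the edge switch at position 2 in pod 5:
    print(encode_pmac(pod=5, position=2, port=3, vmid=1))   # 00:05:02:03:00:01
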
Comment: The idea of a plug-and-play data center network sounds great, but their simulation seemed to be performed locally. Is there any way it can also be used on 'cloud computing'?

Wednesday, September 16, 2009

12. Detailed Diagnosis in Enterprise Networks

From experience with small enterprise networks, the authors developed a diagnostic system that scales to large networks by analyzing the joint historical behavior of pairs of components and estimating the impact of current events.
  • existing diagnostic systems: either lack detail or require extensive application-specific knowledge
  • NetMedic:
    • frames detailed diagnosis as an inference problem
    • estimates when two components in the network are impacting each other without knowing how they interact
    • captures the state of the network using many variables
    • application agnostic
By analyzing a month of logs from small enterprises, about 450,000 entries, the authors establish a classification of the problems. Component states are captured and a dependency graph is generated to determine which component impacts which. The system is then applied to a larger enterprise network, where it works well, identifying about 80% of the problems within its top-10 list of causes.

Comment: I remember a couple of years ago, NetMedic was THE hip network diagnosis software, like Norton SystemWorks. Is it still popular these days? It's interesting to see that they "trained" the software on a small Ethernet network and then applied it to large networks.

Tuesday, September 15, 2009

11. Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises

The authors try to build a protocol that integrates the scalability of IP and simplicity of Ethernet.

  • Advantages of Ethernet: persistent MAC addresses; bridges automatically learn where hosts are and build their forwarding tables (ARP, DHCP); plug-and-play
  • Disadvantages of Ethernet: initial setup involves broadcasting, which consumes resources and raises security and privacy issues
  • Pros of IP: shortest-path routing
  • Cons of IP: subnetting wastes address space; Virtual LANs (VLANs) are inefficient with large spanning trees
SEATTLE (Scalable Ethernet Architecture for Large Enterprises); a lookup sketch follows this list:
  • one-hop, network-layer DHT: a distributed directory that stores the location of each end host
  • traffic-driven location resolution and caching: switches cache responses to queries and include location info in ARP replies
  • scalable, prompt cache-update protocol: based on unicast, instead of broadcast/timeout, to update caches
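
The lookup sketch promised above: every switch hashes a host's MAC address onto a ring of switches and asks the single resolver switch that owns that point, which is the one-hop DHT. Switch names are invented:

    import hashlib

    switches = ["sw-a", "sw-b", "sw-c", "sw-d"]

    def h(key):
        """Map a string onto the hash ring."""
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def resolver_for(mac):
        """The switch whose hash follows hash(mac) on the ring stores, and answers
        queries about, that host's current location."""
        ring = sorted(switches, key=h)
        target = h(mac)
        for sw in ring:
            if h(sw) >= target:
                return sw
        return ring[0]   # wrap around

    print(resolver_for("00:1b:63:84:45:e6"))
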
Simulation: measures cache eviction timeout, forwarding table size, path stability, switch failure rate, and host mobility rate.

Comment: In view of the high human-error rate in the earlier reading "BGP Misconfigurations," SEATTLE's idea of making the network "plug-and-play-able" seems very convenient and fool-proof, while at the same time it still lets network administrators customize network transport. Since this is a 2008 paper, I wonder how it works in the real world?

Saturday, September 12, 2009

10. Congestion Control for High Bandwidth-Delay Product Networks

This paper proposes a new protocol to replace TCP. The new protocol, eXplicit Control Protocol (XCP), addresses the issues of TCP by carrying an explicit congestion-level field in the packet header, going a step beyond the one-bit Explicit Congestion Notification (ECN) proposal. XCP does not maintain per-flow state in routers and thus requires fewer CPU cycles in busy routers.

TCP doesn't react well to high-bandwidth, high-latency links such as satellite and wireless links. Under such circumstances TCP tends to be unstable and to oscillate. For example, TCP increases its window by only one packet per RTT to probe for more bandwidth, so it can take thousands of RTTs to make good use of a high-speed link.

Robustness to congestion should be independent of the number of flows, according to the authors, and packet loss shouldn't be the signal of network congestion. Thus, XCP decouples utilization control from fairness control: the utilization (efficiency) controller adjusts its aggressiveness according to the spare bandwidth in the network and the feedback delay, while fairness is handled per flow in the protocol itself rather than with per-flow state in routers. Furthermore, XCP can distinguish packet drops caused by link errors from congestion, which is quite useful on wireless connections.
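
A rough sketch of the efficiency-controller half of that decoupling, as I understand it: the aggregate feedback grows with spare bandwidth and shrinks with the persistent queue, and the fairness controller then divides it among flows. The constants are the stability values quoted in the paper, but the code itself is my paraphrase:

    ALPHA, BETA = 0.4, 0.226   # stability constants from the paper

    def aggregate_feedback(capacity, input_rate, avg_rtt, persistent_queue):
        """Efficiency controller: desired change (in bytes) of aggregate traffic over
        the next control interval. Positive when there is spare bandwidth, negative
        when a standing queue has built up."""
        spare = capacity - input_rate
        return ALPHA * avg_rtt * spare - BETA * persistent_queue

    # 100 Mbps link (in bytes/s), 80 Mbps of traffic, 50 ms avg RTT, 30 KB queue:
    phi = aggregate_feedback(12.5e6, 10e6, 0.05, 30e3)
    print(phi)   # positive -> tell senders, via their congestion headers, to speed up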

Comments:
  1. I like the fact that in XCP, packet drops caused by congestion are highly uncommon, and that drops caused by link errors are separated from network congestion. These two factors should be viewed independently by the protocol and acted upon accordingly.
  2. Like everything new in our lives, backward compatibility is a big issue. XCP is just like IPv6: everyone knows about it, but no one seems to be using it. Would it be possible to implement the XCP concepts one level up in the OSI model in order to be compatible with TCP?

9. Random Early Detection Gateways for Congestion Avoidance

This is yet another algorithm for congestion avoidance. Random Early Detection (RED) works by computing an average queue size and then dropping a packet, or setting a "congestion bit" in its header, with a probability proportional to that connection's share of the bandwidth, in order to reach a certain fairness for all the end nodes on the network. It is designed to work with TCP or any other protocol with built-in congestion control.

Traditionally, a network wouldn't know it was congested until a packet had been dropped, and this is a problem because by then the response time has already increased greatly. The RED algorithm tries to slow the traffic down before it reaches the maximum capacity (the cliff) while maintaining a high throughput.

The most effective place to detect congestion is the gateway itself, according to this paper, because it can distinguish between transmission delay and queueing delay. RED gateways monitor the average queue size and either mark the congestion bit or drop a packet to signal the sender's connection, depending on whether the packets are marked "essential" or "optional".

RED calculates the average queue size in the gateway with a low-pass filter, an exponentially weighted moving average, and compares it to a minimum and a maximum threshold. If the average is greater than the maximum, all arriving packets are marked/dropped; if it is less than the minimum, none are. If the value falls between the two thresholds, the probability of marking/dropping a packet is a function of the average queue size.
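
A minimal sketch of that calculation (RED's additional count-based adjustment between drops is left out):

    MIN_TH, MAX_TH = 5, 15      # thresholds in packets (illustrative values)
    MAX_P = 0.02                # marking probability when avg reaches MAX_TH
    WEIGHT = 0.002              # weight of the low-pass filter

    def update_average(avg, current_queue, w=WEIGHT):
        """Exponentially weighted moving average of the queue size."""
        return (1 - w) * avg + w * current_queue

    def mark_probability(avg):
        """0 below MIN_TH, 1 above MAX_TH, linear in between."""
        if avg < MIN_TH:
            return 0.0
        if avg > MAX_TH:
            return 1.0
        return MAX_P * (avg - MIN_TH) / (MAX_TH - MIN_TH)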

Comment: It makes perfect sense that they drop/tag a packet based on a specific host's share of the connection bandwidth. But how do we define a "host"? Is it one TCP connection, one application, one port, or one network interface card?

Wednesday, September 9, 2009

8. Core-Stateless Fair Queueing: Achieving Approximately Fair Bandwidth Allocations in High Speed Networks

There are several Fair Queueing (FQ) mechanisms for routers, such as the one in the previous paper and Flow Random Early Drop (FRED). These congestion controls require the routers to maintain state and operate on a per-flow basis, and the authors suggest that it is the complexity of this per-packet processing that has prevented wide adoption.

Fair Queueing can be very effective for congestion control, but if it were implemented in every router it would be very complicated and costly. In response to this concern, the authors distinguish core routers from edge routers, giving the former no per-flow state and a FIFO packet scheduling system, while leaving the latter to do the complex Fair Queueing. This is called Core-Stateless Fair Queueing (CSFQ) in this paper.

CSFQ uses a distributed algorithm in which core routers do not maintain per-flow state; instead the flow-rate information is carried in a label in each packet's header, written by the edge routers. The core routers then perform FIFO queueing with probabilistic dropping on input. A fluid model algorithm is first demonstrated for an idealized bit-by-bit version and later extended to a practical packet-by-packet version.
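
A sketch of the core-router decision as I understand it: each packet carries its flow's arrival-rate label from the edge, and the core drops it with whatever probability trims that flow down to the link's current fair share:

    import random

    def forward(packet_label_rate, fair_share_rate):
        """Probabilistically drop so that a flow labeled with rate r gets through at
        roughly min(r, fair_share); no per-flow state is needed in the core."""
        drop_prob = max(0.0, 1.0 - fair_share_rate / packet_label_rate)
        return random.random() >= drop_prob   # True = forward, False = drop

    # A flow sending at 4 Mbps when the fair share is 1 Mbps keeps about 25% of its packets.
    kept = sum(forward(4e6, 1e6) for _ in range(10000)) / 10000
    print(kept)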

Simulations compare CSFQ with four other schemes, namely FIFO, RED (Random Early Detection, which drops packets long before the buffer is full), FRED (Flow Random Early Drop, which drops based on per-flow state), and DRR (Deficit Round Robin, a weighted FQ), on single congested links, multiple congested links, and the coexistence of different adaptation schemes. The results show that CSFQ achieves relatively fair bandwidth allocation: far better than FIFO or RED and comparable to FRED, even though FRED requires per-packet flow classification while CSFQ does not.

The last part of this article explains why fair allocations are important and introduces punishment for malicious end nodes that use their own TCP stack to grab a larger share of the network's data flow.

The traditional approaches to achieving fair allocations are very complicated to implement at high speed in every router, since each router has to analyze the current flows and packet headers to reach a decision. CSFQ should be more scalable as the number of routers increases.

Comments: It is understandable to utilize CSFQ in Tier-1 peering, since it would simplify processing on routers that are obviously the "cores". But do we consider Tier-2 and Tier-3 ISPs to be edges or cores?

7. Analysis and Simulation of a Fair Queueing Algorithm

Congestion control can be divided into two aspects: 1) the source varies the rate at which it sends data, 2) the gateway implements routing and queueing algorithms. This paper focuses on the latter.

Queueing algorithms can be evaluated on three quantities: bandwidth, promptness, and buffer space. The most common queueing algorithm used today is first-come-first-served (FCFS). It simply discards any incoming packets when its buffer is full, end nodes can manipulate their implementations to take advantage of the network bandwidth, and it lacks promptness for Telnet-like applications.

In order to achieve fairness throughout the network, Nagle proposed a fair queueing (FQ) algorithm in which gateways maintain separate queues for each end node and let the data flows take turns transmitting packets (round-robin). The main problem with that FQ algorithm is that it is fair in terms of packet counts but doesn't take packet length into consideration. Thus, the authors emulate a bit-by-bit round-robin in their own FQ algorithm and incorporate a buffer allocation policy into it.
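
A toy sketch of byte-aware fair queueing in that spirit: each arriving packet gets a finish tag that grows with its length, and the gateway always sends the packet with the smallest tag. The bookkeeping is simplified relative to the paper (no round-number clock, no promptness term), so treat it as an illustration rather than the authors' exact algorithm:

    import heapq

    class FairQueue:
        """Simplified byte-aware fair queueing across flows at one gateway."""

        def __init__(self):
            self.last_finish = {}    # per-flow finish tag of its most recent packet
            self.heap = []           # (finish_tag, flow, packet_bytes)
            self.virtual_time = 0.0

        def enqueue(self, flow, packet_bytes):
            start = max(self.last_finish.get(flow, 0.0), self.virtual_time)
            finish = start + packet_bytes          # long packets finish "later"
            self.last_finish[flow] = finish
            heapq.heappush(self.heap, (finish, flow, packet_bytes))

        def dequeue(self):
            finish, flow, packet_bytes = heapq.heappop(self.heap)
            self.virtual_time = finish
            return flow, packet_bytes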

Simulations of FCFS and FQ at the packet level were performed with different combinations of flow-control and queueing algorithms. The modified FQ rewards well-behaved end nodes and provides good bandwidth and promptness.

Comments: This paper sort of answers the question I had in mind when I read the two papers of the last session: it factors in malicious end nodes and smart queueing, instead of simply dividing the bottleneck bandwidth into equal amounts for each through-traffic connection.

Monday, September 7, 2009

6. Congestion Avoidance and Control

The Internet had its first congestion collapse back in 1986, between Lawrence Berkeley Laboratory and U.C. Berkeley. This paper points out that the causes lay in the transport protocol implementations rather than in the protocols themselves, and offers detailed solutions, the rationale behind them, and simple code to solve the problem.

The idea of conservation of packets was introduced for a connection in equilibrium, in which a new packet isn't put into the network until an old one leaves. There are three ways packet conservation can fail, each with its own solution:
  1. The connection doesn't get to equilibrium: if packets are sent only in response to an ACK, the sender's packet spacing will match the packet spacing on the slowest link in the path; this is called self-clocking. A slow-start algorithm gets the clock started by sending a single packet and then opening the window as ACKs arrive.
  2. A sender injects a new packet before an old one has exited: a good round-trip time estimator is the single most important feature of any protocol implementation on a heavily loaded network. The authors develop a method to estimate the variation as well, and the resulting retransmit timer improves on the one derived from RFC 793.
  3. Equilibrium can't be reached because of resource limits along the path: if the timers are in good shape, a timeout indicates a lost packet rather than a broken timer. Packet loss is more likely to mean that the network is congested and there is insufficient buffer space along the path, so a congestion-avoidance strategy should be adopted: multiplicative decrease (cut the window in half on timeout), additive increase (grow cwnd slightly on each ACK of new data), and never sending more than min(advertised window, cwnd). These take only a few lines of code; a sketch follows this list.
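
The sketch promised above, combining an RTT estimator that tracks the mean deviation, slow start, and additive increase with multiplicative decrease. The constants are the usual textbook values rather than a transcription of the paper's code:

    def update_rtt(srtt, rttvar, sample, g=0.125, h=0.25):
        """Estimator in the paper's spirit: smoothed RTT plus mean deviation.
        The retransmit timer becomes srtt + 4 * rttvar, not a fixed multiple of srtt."""
        err = sample - srtt
        srtt += g * err
        rttvar += h * (abs(err) - rttvar)
        return srtt, rttvar, srtt + 4 * rttvar    # (new srtt, new rttvar, RTO)

    def on_ack(cwnd, ssthresh):
        """Slow start below ssthresh, additive increase (about +1 segment per RTT) above."""
        return cwnd + 1 if cwnd < ssthresh else cwnd + 1.0 / cwnd

    def on_timeout(cwnd):
        """Multiplicative decrease: cut the window in half."""
        return max(cwnd / 2.0, 1.0)

    # The sender never keeps more than min(receiver's advertised window, cwnd) outstanding.
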
While the end nodes can control their own data flows, there should also be some congestion detection on the gateway side to ensure fair sharing of the network capacity before it reaches its maximum.


Comment: It is good to read a self-explanatory paper that gives you all the detailed explanations behind each step while still keeping the code simple. Cal Rocks! haha. Seriously though, it wasn't easy to identify the problems that caused network congestion, particularly from the protocol implementation perspective. And I really liked the figures, which include packet size, timing and bandwidth, all in one 2D diagram.

Saturday, September 5, 2009

5. Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks

This paper reviews the algorithms used in avoiding network congestions and offers several suggestions throughout the article.

As fiber optics and gigabit local area networks become more popular, link speeds can vary from several gigabits per second (Gbps) down to 56 Kbps dial-ups. These various types of networks co-exist, so there has to be some sort of flow/congestion control to achieve the maximum possible speed while keeping collisions and dropped packets to a minimum.

Traditionally, congestion control schemes kick in when congestion occurs near the maximum possible throughput (the cliff), but the response time starts to increase significantly, while throughput remains about the same, at an earlier point (the knee). In this paper, operating around the knee is called congestion avoidance, as opposed to congestion control, which keeps the network working just before the cliff.

Congestion avoidance uses the same kind of increase/decrease algorithm that congestion control uses. There is a binary feedback signal, a bit in the packet header, that simply tells whether the network is busy (1) or not busy (0). The abstract model assumes that all users on the network receive this information and act on it accordingly.

User data flow demands can either go to a centralized decision-making manager or be decided at decentralized locations, which is what this research is based on. The success of congestion avoidance is measured in four aspects:
  • efficiency: the closer the total load to the knee, the more efficient
  • fairness: maxmin fairness criterion, bottleneck sharing within the same group
  • distributedness: awareness of the congestion on the user end
  • convergence: time taken to reach a relatively stable state (knee)
Congestion avoidance is achieved through network monitoring and asking users to increase or decrease their load appropriately via the binary signal. The experimental results show that the decrease of data flow must be multiplicative, and that additive increase with multiplicative decrease is the optimal policy; the small simulation below illustrates why.
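
A small simulation of why additive increase with multiplicative decrease converges to a fair split: two users with very different starting rates end up nearly equal after a handful of busy signals. The parameters are arbitrary:

    def aimd(x1, x2, capacity=100.0, add=1.0, mult=0.5, rounds=200):
        """Both users add a constant while the binary feedback says 'not busy' (total
        load under capacity) and multiply by mult when it says 'busy'."""
        for _ in range(rounds):
            if x1 + x2 < capacity:
                x1, x2 = x1 + add, x2 + add          # additive increase
            else:
                x1, x2 = x1 * mult, x2 * mult        # multiplicative decrease
        return x1, x2

    print(aimd(5.0, 80.0))   # the two rates end up very close to each other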

Thoughts:
  1. Instead of sending a single binary bit about the congestion status of the network, would it be possible to have, say, a multi-bit status for the level of congestion? e.g. 00 for extremely congested (every user stops sending new packets for a random amount of time), 01 for severely congested, 10 for mildly congested, and 11 for light workload (please send packets faster). This way the user side could decide whether to slow down a little or a lot.
  2. As we discussed in the first week, some applications require reliable data transmission, while others are more latency-critical. Maybe this extra magic bit(s) could also indicate how fast/reliably the flow wants to be handled?

Wednesday, September 2, 2009

4. Understanding BGP Misconfigurations

This paper is a quantitative study of Border Gateway Protocol (BGP) misconfigurations, analyzing configuration errors over a 3-week period. The authors conclude that most errors could be prevented through better router design or protocol revision, even though only 1 in every 25 incidents actually affects connectivity.

BGP is a policy-based protocol in which autonomous systems (ASes) receive and send routing information to hosts and neighbors. The authors put misconfigurations into two categories: origin misconfiguration and export misconfiguration. An example of the former is injecting a wrong prefix into the global BGP tables; of the latter, a router exporting a route that shouldn't propagate.

The effects of misconfiguration are discussed in three respects: routing load, connectivity disruption, and policy violation. New routes lasting less than a day are treated as potential misconfigurations, and external vantage points plus email surveys of administrators are used to check whether they actually caused any of the above three effects. Selection bias is addressed by the authors, and by comparing the results of the email surveys and the vantage points, it is safely kept to a minimum.

The physical links and relationships between ASes are usually commercial secrets, so Gao's algorithm is used to estimate the actual connectivity. It is based on three rules: 1) all valid AS-path routing information goes downward (e.g. from provider to customer), 2) an AS path can have at most one peer-to-peer edge, 3) ASes with more neighbors are more likely to be providers.

Results: routes equivalent to 0.2-1.0% of the global BGP table size suffer from misconfiguration each day, which means up to 3 in 4 new route announcements per day are misconfigurations. Connectivity was surprisingly solid, being affected in only 4% of the misconfigured announcements, or 13% of the misconfiguration incidents. The extra load on the routers reaches more than 10% of the total update load up to 2% of the time, and in an extreme case it went above 60% of the update load averaged over 15 minutes.

The paper investigates the causes of misconfiguration and classifies the human errors into slips (errors in the execution of a correct plan, e.g. typos) and mistakes (errors in the plan itself, e.g. logic bugs). Given how much human operator error rates affect other systems, as seen in the phone network, aircraft, and bank databases, the authors suggest that BGP router configuration is far from error-proof because of its command-based editing and the lack of high-level languages and checking. Protocol revisions/extensions are suggested, and actively alerting operators about wrong routing information is also proposed.

Afterthoughts:
  1. This is a carefully examined paper in which almost all statistical and bias errors are carefully considered, which makes the results very convincing that BGP is far from perfect.
  2. It's interesting to see that they conducted this experiment after Christmas, over the New Year. There's no explanation of why they specifically chose this period despite their careful, scientific setup. Is it probable that there are fewer (or more) routing updates during this period?

3. Interdomain Internet Routing

This paper is about how routing information is exchanged between service providers, using the Border Gateway Protocol (BGP), as well as the failures and shortcomings of the routing system.

In a more realistic view of the Internet structure, Internet Service Providers (ISPs) fall into three categories: Tier-3 (small, local), Tier-2 (large, region-wide), and Tier-1 (really huge, "default-free," only about 10 as of 2009). Routing information is exchanged between these autonomous systems (ASes), each assigned a unique 16-bit AS number, using BGP, while within an AS, Interior Gateway Protocols (IGPs) are used. The difference between BGP and IGPs is that the former implements autonomous routing policies, whereas the latter are usually concerned with optimizing a path metric.

There are two kinds of inter-AS relationships, "transit" and "peering". "Transit" is the customer-ISP data traffic, in which the customer is usually charged for the service. "Peering" is ISP-ISP traffic, which may not involve financial settlement if the traffic ratio is not highly asymmetric (e.g. 4:1).

Reasons for peering relationships: 1) Tier-1 peering acts as an alliance among Tier-1 providers of a certain size; peering between Tier-1 ISPs ensures that they have default-free routing information to all Internet destinations. 2) Tier-2 and Tier-3 ISPs usually have to pay their upper tiers for traffic, but if they can set up a new link between themselves carrying roughly symmetric traffic, they save money and improve end-to-end performance.

ISPs charge customers for transit links and filter the routes they import and export in order to make or save money through peering. Since routing traffic to customers always generates income, ISPs give customer routes top priority (local preference) and try to avoid distant peering exits that cost them money (see the sketch below).
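
A toy sketch of that ranking: among routes to the same prefix, prefer one learned from a customer, then from a peer, and only then from a provider, which is exactly where the money argument points. The data structures are invented for illustration:

    # Higher local preference wins; the values themselves are arbitrary.
    LOCAL_PREF = {"customer": 300, "peer": 200, "provider": 100}

    def best_route(routes):
        """routes: list of (neighbor_relationship, as_path). Prefer revenue-generating
        customer routes, and break ties with the shorter AS path."""
        return max(routes, key=lambda r: (LOCAL_PREF[r[0]], -len(r[1])))

    routes = [("provider", ["AS701", "AS7018", "AS99"]),
              ("peer",     ["AS3356", "AS99"]),
              ("customer", ["AS65001", "AS99"])]
    print(best_route(routes))   # the customer-learned route wins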

BGP sessions can be divided into two types, eBGP (between ASes), and iBGP (within ASes). They use the same protocol for different purposes. Two important goals have to be met when disseminating the routes, namely, loop-free forwarding, and complete visibility.

The lack of origin authentication weakens BGP by letting network administrators hijack routes by mistake (the YouTube example) or for profit (the spam example). A solution called Secure BGP (S-BGP) has been proposed, but it is not yet deployed for a variety of reasons.

Comment:
  1. It is interesting to learn how the money actually flows for Internet service, from customers to ISPs to upper-tier ISPs, since everything is interconnected in some way.
  2. A bit of googling shows that S-BGP uses a Public Key Infrastructure (PKI) to authenticate routing updates. It would make sense for S-BGP to replace BGP a.s.a.p., but this paper says one of the reasons it isn't deployed is the existing routing table errors! :-O

Monday, August 31, 2009

2. The Design Philosophy of the DARPA Internet Protocols

  • 1. This paper attempts to capture some of the early reasoning which shaped the Internet protocols developed by DARPA.
  • datagram / connectionless service, no emphasis in the first paper
  • 2. Fundamental goal: the top-level goal for DARPA was multiplexed utilization of existing interconnected networks (ARPANET <-> ARPA packet radio network) (LANs had not yet emerged)
  • technique selected: packet switching, instead of circuit switching.
  • Top level assumption: connected by Internet packet switches, also called gateways.
  • 3. Second level goals: clearer definitions of "effective": (priority)
    • 1) communication must continue despite loss of networks or gateways, (military)
    • 2) support multiple services,
    • 3) accommodate a variety of networks,
    • 4) permit distributed management,
    • 5) cost effective,
    • 6) permit host attachment easily,
    • 7) resources used must be accountable
  • 4. Survivability in the face of failure: the transport layer provides no facility to communicate failures to the client; synchronization is never lost unless there is no physical path at all
  • state info of the on-going conversation must be protected from loss; if the lower layers lose this info, they won't be able to tell whether data were lost
  • 5. Types of service: the architecture should support a variety of types of service at the transport level, distinguished by speed, latency, reliability, etc.
  • the traditional type of service is bi-directional reliable delivery of data, the "virtual circuit" service; remote login and file transfer were the first services provided in the Internet architecture, using TCP (with different delay / throughput requirements)
  • it's impossible to fold all services into one protocol, e.g. 1) XNET (cross-Internet debugger) shouldn't require reliable delivery, 2) live voice chat smooths out the delay and replaces a missing packet with silence
  • TCP provides reliable sequenced data streams; IP provides basic building block, the datagram (UDP: User Datagram Protocol)
  • 6. Varieties of networks: long haul (ARPANET, X.25), short haul (Ethernet, ringnet), broadcast satellite nets (DARPA Atlantic Satellite Network), packet radio networks, etc.
  • flexibility achieved by making a minimum set of assumptions about the functions the net will provide. packet or datagram.
  • 7. Other goals: distributed management, gateways.
  • Problems of Internet today: lack of sufficient tools for distributed management, esp routing.
  • headers of Internet packets are long (40 bytes). e.g. remote login packets, only one byte of data with 40 bytes of header.
  • problem of host-resident mechanisms: a poor implementation may hurt both the network and the host
  • accountability: the Internet contains few tools for accounting for packet flows
  • 8. Architecture and implementation: the architecture tried very hard not to limit the services the Internet could provide.
  • 9. Datagrams: 1) eliminate the need for connection state, i.e. Internet can be reconstituted after a failure, 2) provides a basic building block
  • 10. TCP: window management, port address structure. The original ARPANET provided flow control on both bytes and packets; TCP regulates only bytes. Reasons: 1) to permit the insertion of control info, 2) to permit a TCP packet to be broken up into smaller packets if necessary, 3) to permit a number of small packets to be gathered into one larger packet
  • EOL / Push changes
  • 11. Conclusion: the datagram serves the most important goals well, but not so well for some goals further down the priority list; there may be a better building block than the datagram for newer generations of architecture.

1. End-to-End Arguments in System Design

  • end-to-end argument def: functions placed at low levels of a system may be redundant or of little value compared with the cost of providing them at that low level
  • authors suggest moving functions upward in a layered system
  • possible problems for end-to-end transmission: hardware faults, software mistakes in buffering and copying, processor or memory transient errors, packet drops or duplicates
  • e.g. file transfer program: A sends a file to B; B calculates the checksum of its received copy and sends it back to A for comparison (a toy version appears at the end of these notes)
  • communications level proposal of file transfer to achieve a reliable data transmissions: selective redundancy in packet checksums, sequence number checking, internal retry mechanisms
  • the authors suggest that the comm-level proposals have "no effect on inevitability or correctness of the outcome", and that the file transfer program must supply the reliability guarantee itself
  • e.g. application programmers knew there was a packet checksum on the gateways, so they didn't include any checksums at the program level. A byte-pair interchange occurred in one gateway roughly once every million bytes; files were corrupted and manual comparison had to be done.
  • performance aspects: the authors suggest that the lower levels need not provide perfect reliability, since the application level has to check anyway, and that it's probably not important to strive for a negligible error rate at any point below the application level. If error-checksum functions were performed at the network communication level, 1) applications that don't need them would pay for them anyway, 2) the low level may not have enough information and so cannot do the job efficiently
  • **thoughts**: detect bit-error rate in the communications level and decides if there's any need for checksum?
  • delivery guarantees: low level may be wasting its effort providing a function that the app level will implement anyways (ARPANET example: knowing for sure that the msg was delivered to the target isn't very important)
  • secure transmission: 1) low level encryption/decryption must be trusted to manage the keys, 2) data will be in the clear and vulnerable, 3) authenticity must still be checked by the app. Encryption in low levels may be called for if necessary.
  • **thoughts**: so why not call for the checksums in communications layer when it is needed? leave the decision to the application designer
  • duplicate message suppression: app level duplicates look different to comm system, so it cannot suppress them
  • guaranteeing FIFO message delivery: an independent mechanism at a higher level must control the ordering of actions
  • transaction management: applying end-to-end argument to SWALLOW distributed data storage system. Low-level does not suppress duplicate messages since 1) object identifier and version info suffices to detect duplicates, 2) effect of a duplicate read request is only to generate a duplicate response, which is easily discarded. Thus, data comm protocol is significantly simplified.
  • Identifying the ends: for live voice chat and live video streams, mild packet loss is acceptable, so low-level checksums aren't important; for voicemail, short delays are fine, so low-level reliability measures are okay
  • Conclusions from the authors: awareness of end-to-end arguments can help designers resist the temptation to put more functions into the lower layers than necessary, and the end-to-end argument adds substance to discussions about the proper layering of communication protocols.
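
To make the file-transfer example above concrete, here is a toy end-to-end check in the spirit of the argument: whatever the lower layers do, host A only trusts the transfer after comparing checksums computed at the two ends. Function names are mine:

    import hashlib

    def send_file(data: bytes):
        """Host A keeps an end-to-end checksum of what it meant to send."""
        return hashlib.sha256(data).hexdigest(), data   # (expected checksum, bytes on the wire)

    def receive_and_ack(received: bytes):
        """Host B checksums what it actually received and returns that to A."""
        return hashlib.sha256(received).hexdigest()

    expected, wire_bytes = send_file(b"important payload")
    # ... wire_bytes crosses any number of gateways, buffers, and disks ...
    if receive_and_ack(wire_bytes) != expected:
        print("retry the transfer")    # reliability enforced at the ends, not below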