Overhead Calcs

===========================================================================================================
Modified from: http://sd.wareonearth.com/~phil/net/overhead/
http://www.packetmischief.ca/network/protocol_overhead.html
===========================================================================================================
Ethernet

Ethernet frame format:
6 byte dest addr
6 byte src addr
[4 byte optional 1qtunnel Tag] (Juniper ERX)
[2 byte optional 1qtunnel Tag] (Cisco)
[4 byte optional 802.1q VLAN Tag] (per tag, so stacked SVLAN+VLAN = 2 tags)
2 byte length/type
46-1500 byte data (payload)
4 byte CRC

Juniper ERX (default type: 0x9100)
# 8100—Specifies Ethertype value 0x8100, as defined in IEEE Standard 802.1q
# 88a8—Specifies Ethertype value 0x88a8, as defined in draft IEEE Standard 802.1ad
# 9100—Specifies Ethertype value 0x9100, which is the default

Ethernet overhead bytes:
12 gap + 8 preamble + 14 header + 4 trailer = 38 bytes/packet w/o 802.1q
12 gap + 8 preamble + 18 header + 4 trailer = 42 bytes/packet with 802.1q
12 gap + 8 preamble + 20 header + 4 trailer = 44 bytes/packet with 1qtunnel (Cisco)
12 gap + 8 preamble + 22 header + 4 trailer = 48 bytes/packet with 1qtunnel (Juniper)
12 gap + 8 preamble + 26 header + 4 trailer = 52 bytes/packet with 1qtunnel SVLAN tags (Juniper)
12 gap + 8 preamble + 22 header + 4 trailer + 8 pppoe = 54 bytes/packet with 1qtunnel (Juniper) & PPPoE

Ethernet Payload data rates are thus:
1500/(38+1500) = 97.5293 % w/o 802.1q tags (1.03)
1500/(42+1500) = 97.2763 % with 802.1q tags (1.03)
1500/(44+1500) = 97.1503 % with 1qtunnel tags (Cisco) (1.03)
1500/(48+1500) = 96.8992 % with 1qtunnel tags (Juniper) (1.03)
1500/(52+1500) = 96.6494 % with 1qtunnel SVLANtags (Juniper) (1.03)
1500/(54+1500) = 96.5251 % with 1qtunnel tags (Juniper) & PPPoE (1.04)
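
As a quick check of the ratios above, a minimal Python sketch (the overhead values are the per-packet byte counts tabulated above; the names and function are just for illustration):

# Ethernet payload efficiency for a 1500-byte IP payload, per framing variant.
# Overhead = interframe gap + preamble + L2 header + CRC (bytes/packet), as listed above.
OVERHEADS = {
    "no 802.1q": 38,
    "802.1q": 42,
    "1qtunnel (Cisco)": 44,
    "1qtunnel (Juniper)": 48,
}

def efficiency(payload, overhead):
    return 100.0 * payload / (payload + overhead)

for name, oh in OVERHEADS.items():
    print(f"{name:20s} {efficiency(1500, oh):.4f} %")   # 97.5293, 97.2763, 97.1503, 96.8992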

JunOSe MTU CALCS:
=======================
(Everything)
—————————–
IP MTU 1500 — 4470
—————————–

Type        IntMTU (IP MTU 1500)   OH Bytes   IntMTU (IP MTU 4470)
-------------------------------------------------------------------
Ethernet    1518                   18         4488
VLAN        1522                   22         4492
SVLAN       1526                   26         4496

JunOS MTU CALCS:
=======================
(Everything but CRC)
—————————–
IP MTU 1500 — 4470
—————————–

Type        IntMTU (IP MTU 1500)   OH Bytes   IntMTU (IP MTU 4470)
-------------------------------------------------------------------
Ethernet    1514                   14         4484
VLAN        1518                   18         4488
SVLAN       1522                   22         4492

Cisco MTU CALCS:
=======================
(Nothing beyond the IP MTU; no need to account for VLAN tags)
—————————–
IP MTU 1500 — 4470
—————————–

Type        IntMTU (IP MTU 1500)   OH Bytes   IntMTU (IP MTU 4470)
-------------------------------------------------------------------
Ethernet    1500                   0          4470
VLAN        1504                   4          4474
SVLAN       1508                   8          4478
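
The three vendor tables reduce to one rule: configured interface MTU = IP MTU plus whatever layer-2 bytes that OS counts against the MTU (JunosE counts header + CRC + tags, JunOS counts header + tags, Cisco IOS counts only the extra tag bytes). A minimal Python sketch of that rule, with the byte counts taken from the tables above:

# Interface MTU = IP MTU + the L2 bytes each OS counts against the MTU
# (JunosE: header + CRC + tags, JunOS: header + tags, Cisco IOS: tags only).
L2_BYTES = {
    "JunosE": {"Ethernet": 18, "VLAN": 22, "SVLAN": 26},
    "JunOS":  {"Ethernet": 14, "VLAN": 18, "SVLAN": 22},
    "Cisco":  {"Ethernet": 0,  "VLAN": 4,  "SVLAN": 8},
}

for ip_mtu in (1500, 4470):
    for osname, encaps in L2_BYTES.items():
        for encap, extra in encaps.items():
            print(f"{osname:7s} {encap:9s} IP MTU {ip_mtu} -> interface MTU {ip_mtu + extra}")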

TCP over Ethernet:
Assuming no header compression (e.g. not PPP)
Add 20 IPv4 header or 40 IPv6 header (no options)
Add 20 TCP header
Add 12 bytes optional TCP timestamps
Max TCP Payload data rates over ethernet are thus:
(1500-40)/(38+1500) = 94.9285 % IPv4, minimal headers (1.02)
(1500-52)/(38+1500) = 94.1482 % IPv4, TCP timestamps (1.07)
(1500-52)/(42+1500) = 93.9040 % 802.1q, IPv4, TCP timestamps (1.07)
(1500-52)/(44+1500) = 93.7824 % 1qtunnel, IPv4, TCP timestamps (Csco)(1.07)
(1500-52)/(48+1500) = 93.5401 % 1qtunnel, IPv4, TCP timestamps (Jnpr)(1.07)
(1500-60)/(38+1500) = 93.6281 % IPv6, minimal headers (1.07)
(1500-72)/(38+1500) = 92.8479 % IPv6, TCP timestamps (1.08)
(1500-72)/(42+1500) = 92.6070 % 802.1q, IPv6, TCP timestamps (1.08)
(1500-72)/(44+1500) = 92.4871 % 1qtunnel, IPv6, TCP timestamps (Csco)(1.09)
(1500-72)/(48+1500) = 92.2481 % 1qtunnel, IPv6, TCP timestamps (Jnpr)(1.09)

UDP over Ethernet:
Add 20 IPv4 header or 40 IPv6 header (no options)
Add 8 UDP header
Max UDP Payload data rates over ethernet are thus:
(1500-28)/(38+1500) = 95.7087 % IPv4 (1.05)
(1500-28)/(42+1500) = 95.4604 % 802.1q, IPv4 (1.05)
(1500-28)/(44+1500) = 95.3368 % 1qtunnel, IPv4 (Csco) (1.05)
(1500-28)/(48+1500) = 95.0905 % 1qtunnel, IPv4 (Jnpr) (1.06)
(1500-48)/(38+1500) = 94.4083 % IPv6 (1.06)
(1500-48)/(42+1500) = 94.1634 % 802.1q, IPv6 (1.07)
(1500-48)/(44+1500) = 94.0415 % 1qtunnel, IPv6 (1.07)
(1500-48)/(48+1500) = 93.7985 % 1qtunnel, IPv6 (1.07)
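
The TCP and UDP figures above are all the same ratio with the L3/L4 header bytes subtracted from the 1500-byte payload first; a small Python sketch (header sizes as listed above, function name just for illustration):

# Goodput = (MTU - L3/L4 headers) / (MTU + per-packet Ethernet overhead)
def goodput(mtu, l3l4_headers, eth_overhead):
    return 100.0 * (mtu - l3l4_headers) / (mtu + eth_overhead)

print(goodput(1500, 20 + 20, 38))       # IPv4 + minimal TCP, matches the 94.9285 % row
print(goodput(1500, 20 + 20 + 12, 38))  # IPv4 + TCP timestamps, matches 94.1482 %
print(goodput(1500, 20 + 8, 38))        # IPv4 + UDP, matches 95.7087 %
print(goodput(1500, 40 + 8, 42))        # IPv6 + UDP over 802.1q, matches 94.1634 %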

Effective Bit Rate of native Ethernet link Table
=================================================
Assuming:
* Preamble +SFD = 8B
* Interframe Gap (IFG) = 12B
* Total overhead = 20B

Frame Size (B) FE (Mbps) GE (Mbps)
———————————————–
64 76.19 761.90
256 92.75 927.54
512 96.24 962.41
1024 98.08 980.84
1300 98.48 984.85
1518 98.70 987.00
1522 98.70 987.03
9600 99.79 997.92
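
Each row of the table is just frame / (frame + 20) times the line rate; a one-loop Python sketch reproducing it:

# Effective data rate = frame / (frame + preamble+SFD + IFG) x line rate
OVERHEAD_BYTES = 8 + 12   # preamble+SFD and interframe gap

for frame in (64, 256, 512, 1024, 1300, 1518, 1522, 9600):
    eff = frame / (frame + OVERHEAD_BYTES)
    print(f"{frame:5d}  FE {eff * 100:6.2f} Mbps   GE {eff * 1000:7.2f} Mbps")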

Notes:
48-bit (6 byte) ethernet address have a 24-bit “Organizationally Unique Identifier” (OUI) assigned by IEEE + a 24-bit number assigned by the vendor.
The minimum ethernet payload (data field) is 46 bytes which makes a 64 byte ethernet packet including header and CRC.
The maximum ethernet payload (data field) is 1500 bytes which makes a 1518 byte ethernet packet including header and CRC. When 802.1q added an optional 4-byte VLAN Tag Header, they extended the allowed maximum frame size to 1522 bytes (22 byte header+CRC).
The bit speed of 100 Mbps ethernet on the wire/fiber is actually 125 Mbps due to 4B/5B encoding. Every four data bits gets mapped to one of 16 5-bit symbols. This leaves 16 non-data symbols. This encoding came from FDDI.
The original Ethernet II spec had a two byte type field which 802.3 changed to a length field, and later a length/type field depending on use: values 1536 and over are types, under 1536 lengths.

===========================================================================================================
===========================================================================================================
Gigabit Ethernet with Jumbo Frames
Gigabit ethernet is exactly 10 times faster than 100 Mbps ethernet, so for standard 1500 byte frames, the numbers above all apply, multiplied by 10. Many GigE devices however allow “jumbo frames” larger than 1500 bytes, the most common figure being 9000 bytes. For 9000 byte jumbo frames, potential GigE throughput becomes (from Bill Fink, the author of nuttcp):

Theoretical maximum TCP throughput on GigE using jumbo frames:

(9000-20-20-12)/(9000+14+4+7+1+12)*1000000000/1000000 = 990.042 Mbps

where the numerator is the usable payload (9000 MTU - 20 IP header - 20 TCP header - 12 TCP options for timestamps) and the denominator is the bytes on the wire per frame (9000 MTU + 14 Ethernet header + 4 FCS + 7 preamble + 1 Start Frame Delimiter (SFD) + 12 InterFrame Gap). The InterFrame Gap (IFG), aka InterPacket Gap (IPG), is a minimum of 96 bit times from the last bit of the FCS to the first bit of the preamble.

Theoretical maximum UDP throughput on GigE using jumbo frames:

(9000-20-8)/(9000+14+4+7+1+12)*1000000000/1000000 = 992.697 Mbps

Theoretical maximum TCP throughput on GigE without using jumbo frames:

(1500-20-20-12)/(1500+14+4+7+1+12)*1000000000/1000000 = 941.482 Mbps

Theoretical maximum UDP throughput on GigE without using jumbo frames:

(1500-20-8)/(1500+14+4+7+1+12)*1000000000/1000000 = 957.087 Mbps
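
All four figures come from the same expression, parameterized by MTU and L3/L4 header bytes; a minimal Python sketch:

# Theoretical max payload throughput on GigE:
# (MTU - headers) / (MTU + 14 Ethernet header + 4 FCS + 7 preamble + 1 SFD + 12 IFG) x 1 Gbps
def gige_mbps(mtu, headers):
    return (mtu - headers) / (mtu + 14 + 4 + 7 + 1 + 12) * 1_000_000_000 / 1_000_000

print(gige_mbps(9000, 20 + 20 + 12))  # TCP w/ timestamps, jumbo frames    -> 990.042
print(gige_mbps(9000, 20 + 8))        # UDP, jumbo frames                  -> 992.697
print(gige_mbps(1500, 20 + 20 + 12))  # TCP w/ timestamps, standard frames -> 941.482
print(gige_mbps(1500, 20 + 8))        # UDP, standard frames               -> 957.087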

===========================================================================================================
===========================================================================================================
ATM
————————– DS3 ——————————
Line Rate 44.736 Mbps
PLCP Payload 40.704 (avail to ATM)
ATM Payload 36.864 (avail to AAL)
MTU=576 MTU=9180 MTU=65527
AAL5 Payload 34.501 36.752 36.845 (avail to LLC/SNAP)
LLC/SNAP Payload 34.028 36.720 36.841 (avail to IP)
IP Payload 32.847 36.640 36.830 (avail to transport)
UDP Payload 32.374 36.608 36.825 (avail to application)
TCP Payload 31.665 36.560 36.818 (avail to application)

————————– OC-3c ——————————
Line Rate 155.520 Mbps
SONET Payload 149.760 (avail to ATM)
ATM Payload 135.632 (avail to AAL)
MTU=576 MTU=9180 MTU=65527
AAL5 Payload 126.937 135.220 135.563 (avail to LLC/SNAP)
LLC/SNAP Payload 125.198 135.102 135.547 (avail to IP)
IP Payload 120.851 134.808 135.506 (avail to transport)
UDP Payload 119.112 134.690 135.489 (avail to application)
TCP Payload 116.504 134.513 135.464 (avail to application)

————————– OC-12c —————————–
Line Rate 622.080 Mbps
SONET Payload 600.768 (avail to ATM)
ATM Payload 544.092 (avail to AAL)
MTU=576 MTU=9180 MTU=65527
AAL5 Payload 509.214 542.439 543.818 (avail to LLC/SNAP)
LLC/SNAP Payload 502.239 541.966 543.752 (avail to IP)
IP Payload 484.800 540.786 543.586 (avail to transport)
UDP Payload 477.824 540.313 543.519 (avail to application)
TCP Payload 467.361 539.605 543.420 (avail to application)

Notes:
DS3 and SONET frames are 125 usec long (8000/sec).
PLCP packs 12 ATM cells per DS3 frame, for 96 kc/s (8000×12).
An STS-3c frame (OC3c) is 2430 bytes long (270 bytes x 9 rows), 90 of which are consumed by SONET overhead (9 bytes x 9 rows section and line overhead and 1 byte x 9 rows path overhead), 2340 bytes are payload (260 bytes x 9 rows). The payload is called the Synchronous Payload Envelope (SPE).
An STS-12c frame (OC12c) is 9720 bytes long, 333 of which are SONET overhead, 9387 bytes are payload (SPE). Note that this is slightly larger than four STS-3c SPE’s (4×2340=9360), the advantage of “concatenated” OC12c vs. OC12.
ATM cells are 53 bytes long: 5 header and 48 payload.
AAL5 adds an 8 byte trailer in the last 8 bytes of the last cell, padding in front of the trailer if necessary. This results in 0-47 bytes of padding in an AAL5 frame. In the worst case, you have seven bytes of padding in one cell, and 40 bytes of padding plus the 8 byte AAL5 trailer in the following cell.
RFC1483 defines two types of protocol encapsulation in AAL5
LLC/SNAP – adds an 8 byte header containing LLC (3 bytes), OUI (3 bytes), and PID/EtherType (2 bytes)
VC-mux – adds no additional bytes by sending only a single protocol type per VC
IPv4 usually adds 20 bytes. IPv6 would add 40 bytes. Plus any options but assumed zero here.
UDP adds an 8 byte header. (ICMP is also an 8 byte header)
TCP adds a 20 byte header plus any options. A common option on high performance flows is timestamps which consume an additional 12 bytes per packet.
On the physical layer (single pt-to-pt hop), one out of every 27 cells is an OAM cell. The above calculations don’t take that into account, but that’s another 3.7% reduction!

We should add calculations for ping packets and 1500 byte packets.

So what is the largest packet that we can fit in a single ATM cell? If you are using AAL5, you have a 40 byte payload to work with. For IPv4, you could have a 20 byte header + a 20 byte IP payload. A UDP or ICMP payload could be up to 12 bytes (both use 8 bytes after the IP header). So a “ping -s8” through “ping -s12” should fit in one ATM cell and still give you a round trip time.
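
As a rough sketch of the cell arithmetic described in the notes: an AAL5 frame is the IP packet plus the 8-byte trailer (plus 8 bytes of LLC/SNAP if used), padded up to a multiple of 48 bytes, and each 48-byte chunk rides in a 53-byte cell. The function name and example sizes here are illustrative only:

import math

# Cells needed, and payload efficiency, for one IP packet over AAL5 (LLC/SNAP by default).
def aal5_cells(ip_packet_len, llc_snap=8, trailer=8):
    aal5_frame = ip_packet_len + llc_snap + trailer   # padded up to a multiple of 48
    cells = math.ceil(aal5_frame / 48)
    return cells, 100.0 * ip_packet_len / (cells * 53)

for size in (40, 576, 1500, 9180):
    cells, eff = aal5_cells(size)
    print(f"{size:5d}-byte IP packet -> {cells:4d} cells, {eff:.1f} % of cell bytes carry IP")

With VC-mux instead of LLC/SNAP (llc_snap=0), a 40-byte IP packet plus the 8-byte trailer exactly fills one 48-byte cell payload, which is the “ping -s8” through “ping -s12” case above.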

===========================================================================================================
===========================================================================================================
Packet Over SONET (POS)
Packet over SONET (POS) uses PPP with HDLC to frame IP packets. These add a five byte header and a four byte trailer under normal circumstances. No padding is required, except for any possible idle time between packets. Byte stuffing is used (see notes below) which can expand the length of the POS frame.
Flag Byte (0x7e)
Address Byte (0xff = all stations)
Control Byte (0x03 = Unnumbered Information)
Protocol – 2 bytes, 1 byte if compressed +
Payload – 0-MRU bytes | PPP part
Padding – 0+ bytes +
Frame Check Sequence (FCS) – 4 bytes (2 in limited cases)
Flag Byte (0x7e)
[Interframe fill or next Address]

HDLC has no set frame size limit, nor does PPP specify the payload size; you just keep reading until you see a Flag byte. PPP however specifies that the Maximum Receive Unit (MRU) default is 1500 bytes and that other sizes can be negotiated using LCP. These LCP messages have a 16-bit length field, so a properly negotiated maximum payload would be 65535 bytes. [It would be possible to configure a sender/receiver pair to go beyond 65535 and simply not negotiate a size with LCP. No one does this however.] Most POS hardware seems to have a 4470 or 9180 byte MRU.
So we get:

————————– OC-3c ——————————
Line Rate 155.520 Mbps
SONET Payload 149.760 (avail to POS)
POS Payload *** to do *** (avail to IP)
etc.

————————– OC-12c —————————–
Line Rate 622.080 Mbps
SONET Payload 600.768 (avail to POS)
MTU=1500 MTU=9000
POS Payload (no stuff) 597.185 600.168 (avail to IP) 9 overhead
POS Payload (rnd stuff) 592.583 595.520 20.71875 overhead
POS Payload (max stuff) 299.486 300.234 1509 overhead

~TCP Payload w/ts rnd 572.040 592.079

Notes:

Only one flag byte is required between frames, i.e. the flag byte that ends one frame can also begin the next.
It is possible for the HDLC Address and Control fields to be “compressed”, i.e. non-existent. This is negotiated by PPP’s Link Control Protocol (LCP). The RFCs, however, recommend that they be present on high speed links and POS.
The protocol field can be compressed to one byte (negotiated by LCP), but this is also discouraged on high speed links and POS.
IP -> PPP -> FCS generation -> Byte stuffing -> Scrambling -> SONET/SDH framing
The Frame Check Sequence (FCS) for POS should be 32-bits. RFC2615 allows for 16-bits (the PPP default) only when required for backward compatibility, and only on OC3c. Even on OC3c 32-bit is recommended. The FCS length is configured, not negotiated. The FCS-32 uses the exponents x**0, 1, 2, 4, 5, 7, 8, 10, 11, 12, 16, 22, 23, 26, 32.
Byte stuffing escapes any Flag (0x7e) and Escape (0x7d) bytes by inserting an Escape byte and xoring the original byte with 0x20. [PPP can also escape negotiated control characters but this is not used in POS.] Byte stuffing can at worst double the payload size (e.g. data of all 0x7e). For uniform random data one in every 128 bytes would be stuffed, for an overhead of 0.775%. (A worked example of the resulting payload rates follows these notes.)
The stuffed data is then scrambled with 1+x**43 (the same used for ATM) to prevent certain data patterns from interfering with SONET.
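
A minimal sketch of the OC-12c POS payload rows above, assuming 9 bytes of PPP/HDLC overhead per frame and the stuffing rates from the byte-stuffing note (the constant and function names are just for this example):

# OC-12c POS payload rate for a given MTU:
# usable = MTU / (MTU + overhead) x SPE rate, where overhead is the PPP/HDLC
# bytes per frame plus the expected byte-stuffing expansion of the payload.
SPE_MBPS = 600.768   # SONET payload available to POS on OC-12c (from the table above)

def pos_payload_mbps(mtu, stuff_fraction=0.0, hdlc_bytes=9):
    overhead = hdlc_bytes + mtu * stuff_fraction
    return mtu / (mtu + overhead) * SPE_MBPS

for mtu in (1500, 9000):
    print(mtu,
          round(pos_payload_mbps(mtu), 3),           # no stuffing
          round(pos_payload_mbps(mtu, 1 / 128), 3),  # uniform random data: 1 byte in 128 stuffed
          round(pos_payload_mbps(mtu, 1.0), 3))      # worst case: every payload byte stuffed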
References:
RFC1661 The Point-to-Point Protocol (PPP), July 1994
RFC1662 PPP in HDLC-like Framing, July 1994
RFC2615 PPP over SONET/SDH, June 1999

POS with Frame Relay encapsulation
Frame Relay (FR) encapsulation can be used on POS instead of HDLC/PPP. There are no RFCs about Frame Relay over SONET, nor does the Multiprotocol over Frame Relay RFC1490 discuss SONET or POS, but Cisco started doing this and others have followed.
References:

RFC2427 Multiprotocol Interconnect over Frame Relay, September 1998

===========================================================================================================
===========================================================================================================
Multi Protocol Label Switching (MPLS)
Multi-Protocol Label Switching (MPLS) adds four bytes to every frame. As described in RFC3032 the 32-bit label includes:

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Label
| Label | Exp |S| TTL | Stack
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Entry

Label: Label Value, 20 bits
Exp: Experimental Use, 3 bits
S: Bottom of Stack, 1 bit
TTL: Time to Live, 8 bits
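
For illustration, one label stack entry packs and unpacks like this (a sketch; field widths per the RFC3032 layout above, function names just for this example):

# Pack/unpack one MPLS label stack entry: Label(20) | Exp(3) | S(1) | TTL(8)
def pack_label(label, exp, s, ttl):
    return (label << 12) | (exp << 9) | (s << 8) | ttl

def unpack_label(entry):
    return entry >> 12, (entry >> 9) & 0x7, (entry >> 8) & 0x1, entry & 0xFF

entry = pack_label(label=100, exp=0, s=1, ttl=64)
print(hex(entry), unpack_label(entry))   # 0x64140 (100, 0, 1, 64)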

——————————————————————————–

===========================================================================================================
===========================================================================================================
Serial Lines (T1,T3)
To do
DS-3 is specified as 44.736 Mbps +/- 20 parts per million (ppm). So one DS-3 can vary from another by up to 1789 bps.
Bit-stuffing is used to accommodate rate mismatches as you mux up the DS-n hierarchy.

===========================================================================================================
http://www.rawether.net/support/KB06300101.htm
===========================================================================================================

Network Performance Notes
Part 1 – Theoretical Estimates

Network Performance

These notes discuss two aspects of network performance:

* TCP Performance – The ability of the network to transport TCP user data.

* NDIS Performance – The ability of an NDIS network monitor to collect packets.

This is a theoretical discussion that can only provide estimates of performance. These estimates are but one part of the design and development of NDIS packet monitor software. It should be understood that:

Experimentation, instrumentation and measurement are the only way to actually understand the performance of computer systems.

TCP Performance

TCP performance is a measure of the ability of a network to transport TCP user data. The performance calculations account for data link and IP/TCP protocol overhead associated with transporting TCP user data for various packet sizes.

These estimates are adapted from Section 24 of TCP/IP Illustrated Volume 1 – The Protocols, by W. Richard Stevens. Published 1994 by Addison-Wesley Publishing, Inc. They are extended to include information of interest to developers of network software for the Windows platform.

The bulk of the performance calculations are based on operation on a 10Mbps Ethernet LAN. Simple scaling is used in the summary to extrapolate to 100Mbps FastEthernet and 1000Mbps Gigabit Ethernet LANs.

In calculating network performance it is convenient to represent inter-packet gap in terms of the time it takes to transmit one “real” network byte. For example, the Ethernet inter-packet gap is 9.6 microseconds, which is the time it takes to transmit 12 “real” network bytes. Where appropriate these “virtual” bytes are enclosed in parentheses “()”.

NDIS Performance

NDIS Performance is a measure of the required capability for an NDIS-based packet monitor to collect packets on the network – theoretically without loss. There are two (2) parameters of interest:

* Packet Data Rate – The number of bytes-per-second that the NDIS monitor must collect.
* Packet Frequency – The number of packets-per-second that the NDIS monitor must support.

In calculating NDIS performance understand that NDIS does not provide a way to observe the 8-byte Ethernet preamble nor the 4-byte Ethernet CRC. Of course, NDIS does not directly observe the inter-packet gap as received bytes.

These notes provide an estimate of the performance of a hypothetical loss-less packet monitor. No consideration is given to the demands of application or driver packet data processing, bandwidth of logging facilities, network adapter capabilities, etc.

Performance Summary

The Tables below provide a summary of the performance estimates:

* 10Mbps Ethernet Performance Summary
* 100Mbps FastEthernet Performance Summary
* 1000Mbps Gigabit Ethernet Performance Summary

10Mbps Ethernet Performance Summary

                        TCP Throughput   NDIS Throughput   Packet Frequency
                        bytes/sec        bytes/sec         Pkt/Sec
----------------------------------------------------------------------------
MTU 1500, MSS 1460      1,183,667        1,229,658         848
MTU 576, MSS 536        1,088,763        1,200,450         2,065
Small Packet – SS 256   956,205          1,159,655         3,764
Runt Packet             89,286           892,857           14,880

100Mbps FastEthernet Performance Summary

                        TCP Throughput   NDIS Throughput   Packet Frequency
                        bytes/sec        bytes/sec         Pkt/Sec
----------------------------------------------------------------------------
MTU 1500, MSS 1460      11,836,670       12,296,580        8,480
MTU 576, MSS 536        10,887,630       12,004,500        20,650
Small Packet – SS 256   9,562,050        11,596,550        37,640
Runt Packet             892,860          8,928,570         148,800

1000Mbps Gigabit Ethernet Performance Summary

                        TCP Throughput   NDIS Throughput   Packet Frequency
                        bytes/sec        bytes/sec         Pkt/Sec
----------------------------------------------------------------------------
MTU 1500, MSS 1460      118,366,700      122,965,800       84,800
MTU 576, MSS 536        108,876,300      120,045,000       206,500
Small Packet – SS 256   95,620,500       115,965,500       376,400
Runt Packet             8,928,600        89,285,700        1,488,000

The detailed calculations used to produce these estimates are provided below. This is just simple arithmetic…

* MTU 1500, MSS 1460 Performance Calculations
* MTU 576, MSS 536 Performance Calculations
* Small Packet – SS 256 Performance Calculations
* Runt Packet Performance Calculations

Conclusions

There are two fairly obvious conclusions that can be distilled from the performance estimates:

1. NDIS Throughput Requirement – This varies so little that the NDIS throughput design goal is basically the theoretical performance of the network media.

2. Runt Packet Frequency Support Requirement – The packet frequency of minimum size (“runt”) packets places significant demands on the design and implementation of a packet monitor.

The runt packet frequency is particularly intimidating. On the slowest 10Mbps LAN the runt packet frequency is 14,880 packets/second. If each runt packet is lifted from kernel-space to user-space, then there would be 14,880 kernel-user transitions per second. Even if this could be achieved on high-performance systems it is clear that a goal of an NDIS packet monitor would be to reduce the number of these transitions.

                           NDIS Throughput   Runt Packet
                           Requirement       Frequency
                           bytes/sec         packets/sec
---------------------------------------------------------
10Mbps Ethernet            1,250,000         14,880
100Mbps FastEthernet       12,500,000        148,800
1000Mbps Gigabit Ethernet  125,000,000       1,488,000

Performance For MTU 1500, MSS 1460

This is the performance on a 10Mbps local area network.
Field                                Data     ACK
                                     #bytes   #bytes
------------------------------------------------------
Ethernet preamble                    8        8
Ethernet destination address         6        6
Ethernet source address              6        6
Ethernet type field                  2        2
IP Header                            20       20
TCP Header                           20       20
User data                            1460     0
Pad (to Ethernet minimum)            0        6
Ethernet CRC                         4        4
Interpacket gap (9.6 microsecond)    (12)     (12)
------------------------------------------------------
Total                                1538     84
TCP Throughput

If the TCP window is opened to its maximum size (65535, not using the window scale option), this allows a window of 44 1460-byte segments. If the receiver sends an ACK every 22nd segment the throughput calculation becomes:

throughput = (22 x 1460 bytes) / (22 x 1538 bytes + 84 bytes) x 10,000,000 bits/sec / 8 bits/byte
           = 1,183,667 bytes/sec

NDIS Throughput

Under the same conditions an NDIS-based network monitor would observe 1514 bytes of each TCP data packet and 60 bytes of each ACK packet. The corresponding NDIS throughput would be:

NDIS throughput = (22 x 1514 bytes + 60 bytes) / (22 x 1538 bytes + 84 bytes) x 10,000,000 bits/sec / 8 bits/byte
                = 1,229,658 bytes/sec

Here is an estimate of the packet frequency that NDIS would observe during an interval where the 22 TCP data packets and the ACK were exchanged:

NDIS packet throughput = (22 + 1) / (22 x 1538 bytes + 84 bytes) x 10,000,000 bits/sec / 8 bits/byte
                       = 848 packets/sec
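
The three MTU 1500 figures (and the corresponding rows of the summary tables) fall out of one small calculation; a Python sketch using the byte counts from the field table above:

# 10Mbps Ethernet, MSS 1460: 22 data packets (1538 wire bytes each, 1514 visible to
# NDIS) are followed by one ACK (84 wire bytes, 60 visible to NDIS).
WIRE_DATA, WIRE_ACK = 1538, 84
NDIS_DATA, NDIS_ACK = 1514, 60
SEGS, MSS = 22, 1460
BYTES_PER_SEC = 10_000_000 / 8

wire_per_window = SEGS * WIRE_DATA + WIRE_ACK
print(round(SEGS * MSS / wire_per_window * BYTES_PER_SEC))                     # 1,183,667 TCP
print(round((SEGS * NDIS_DATA + NDIS_ACK) / wire_per_window * BYTES_PER_SEC))  # 1,229,658 NDIS
print(round((SEGS + 1) / wire_per_window * BYTES_PER_SEC))                     # 848 packets/sec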

Performance for MTU 576, MSS 536

This is the performance observed when accessing the Internet using a 10Mbps local area network where the remote MTU is throttled to the traditional 576-byte limit.
Field                                Data     ACK
                                     #bytes   #bytes
------------------------------------------------------
Ethernet preamble                    8        8
Ethernet destination address         6        6
Ethernet source address              6        6
Ethernet type field                  2        2
IP Header                            20       20
TCP Header                           20       20
User data                            536      0
Pad (to Ethernet minimum)            0        6
Ethernet CRC                         4        4
Interpacket gap (9.6 microsecond)    (12)     (12)
------------------------------------------------------
Total                                614      84
TCP Throughput

If the TCP window is opened to its maximum size (65535, not using the window scale option), this allows a window of 122 536-byte segments. If the receiver sends an ACK every 61st segment the throughput calculation becomes:

throughput = (61 x 536 bytes) / (61 x 614 bytes + 84 bytes) x 10,000,000 bits/sec / 8 bits/byte
           = 1,088,763 bytes/sec

NDIS Throughput

Under the same conditions an NDIS-based network monitor would observe 590 bytes of each TCP data packet and 60 bytes of each ACK packet. The corresponding NDIS throughput would be:

NDIS throughput = (61 x 590 bytes + 60 bytes) / (61 x 614 bytes + 84 bytes) x 10,000,000 bits/sec / 8 bits/byte
                = 1,200,450 bytes/sec

Here is an estimate of the packet frequency that NDIS would observe during an interval where the 61 TCP data packets and the ACK were exchanged:

NDIS packet throughput = (61 + 1) / (61 x 614 bytes + 84 bytes) x 10,000,000 bits/sec / 8 bits/byte
                       = 2,065 packets/sec

Performance for “Small” Packet

This is the performance for a continuous stream of “small” (256-byte SS) packets on a 10Mbps local area network.
Field                                Data     ACK
                                     #bytes   #bytes
------------------------------------------------------
Ethernet preamble                    8        8
Ethernet destination address         6        6
Ethernet source address              6        6
Ethernet type field                  2        2
IP Header                            20       20
TCP Header                           20       20
User data                            256      0
Pad (to Ethernet minimum)            0        6
Ethernet CRC                         4        4
Interpacket gap (9.6 microsecond)    (12)     (12)
------------------------------------------------------
Total                                334      84
TCP Throughput

If the TCP window is opened to its maximum size (65535, not using the window scale option), this allows a window of 256 256-byte segments. If the receiver sends an ACK every 128th segment the throughput calculation becomes:

throughput = (128 x 256 bytes) / (128 x 334 bytes + 84 bytes) x 10,000,000 bits/sec / 8 bits/byte
           = 956,205 bytes/sec

NDIS Throughput

Under the same conditions an NDIS-based network monitor would observe 310 bytes of each TCP data packet and 60 bytes of each ACK packet. The corresponding NDIS throughput would be:

NDIS throughput = (128 x 310 bytes + 60 bytes) / (128 x 334 bytes + 84 bytes) x 10,000,000 bits/sec / 8 bits/byte
                = 1,159,655 bytes/sec

Here is an estimate of the packet frequency that NDIS would observe during an interval where the 128 TCP data packets and the ACK were exchanged:

NDIS packet throughput = (128 + 1) / (128 x 334 bytes + 84 bytes) x 10,000,000 bits/sec / 8 bits/byte
                       = 3,764 packets/sec

Performance For Minimum Size (“Runt”) Packet

This is the performance for a continuous stream of minimum size (“runt”) packets on a 10Mbps local area network. This condition produces the maximum packet frequency.
Field                                #bytes
---------------------------------------------
Ethernet preamble                    8
Ethernet destination address         6
Ethernet source address              6
Ethernet type field                  2
IP Header                            20
TCP Header                           20
User data                            6
Pad (to Ethernet minimum)            0
Ethernet CRC                         4
Interpacket gap (9.6 microsecond)    (12)
---------------------------------------------
Total                                84

TCP Throughput

With a continuous stream of runt packets, each carrying 6 bytes of user data, the throughput calculation becomes:

throughput = 6 bytes / 84 bytes x 10,000,000 bits/sec / 8 bits/byte
           = 89,286 bytes/sec

NDIS Throughput

An NDIS-based network monitor would observe 60 bytes of each runt TCP data packet:

NDIS throughput = 60 bytes / 84 bytes x 10,000,000 bits/sec / 8 bits/byte
                = 892,857 bytes/sec

Here is an estimate of the packet frequency that NDIS would observe when runt packets are sent continuously:

NDIS packet throughput = 1 packet / 84 bytes x 10,000,000 bits/sec / 8 bits/byte
                       = 14,880 packets/sec

Note the “magic number” 14,880 packets-per-second! You may see this number in the specification for routers and other network products as the “forward rate”. Achieving this rate means that the product can forward a continuous flood of runt packets without loss.
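
That figure is simply the line rate divided by the minimum 84-byte wire footprint of a frame; a quick check, which also gives the FastEthernet and Gigabit rows (the summary tables above round/scale these):

# Minimum wire footprint: 8 preamble + 64 minimum frame + 12 IFG = 84 bytes.
for rate_bps in (10_000_000, 100_000_000, 1_000_000_000):
    print(rate_bps, "bps ->", int(rate_bps / 8 / 84), "packets/sec")   # 14880, 148809, 1488095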


===========================================================================================================
Latency and Throughput
From: http://www.stuartcheshire.org/rants/Latency.html
===========================================================================================================

It’s the Latency, Stupid

Stuart Cheshire, May 1996.

(Revised periodically)

Copyright © Stuart Cheshire 1996-2001

Years ago David Cheriton at Stanford taught me something that seemed very obvious at the time — that if you have a network link with low bandwidth then it’s an easy matter of putting several in parallel to make a combined link with higher bandwidth, but if you have a network link with bad latency then no amount of money can turn any number of them into a link with good latency.

It’s now many years later, and this obvious fact seems lost on most of the companies making networking hardware and software for the home. I think it’s time it was explained again in writing.
Fact One: Making more bandwidth is easy.

Imagine you live in a world where the only network connection you can get to your house is a 33kbit/sec modem running over a telephone line. Imagine that this is not enough for your needs. You have a problem.

The solution is easy. You can get two telephone lines, and use them together in parallel, giving you a total of 66kbit/sec. If you need even more you can get ten telephone lines, giving you 330kbit/sec. Sure, it’s expensive, and having ten modems in a pile is inconvenient, and you may have to write your own networking software to share the data evenly between the ten lines, but if it was important enough to you, you could get it done.

It may not be cheap, but at least it’s possible.

People with ISDN lines can already do this. It’s called “bonding” and it uses two 56 (or 64) kbit/sec ISDN channels in parallel to give you a combined throughput of 112 (or 128) kbit/sec.
Fact Two: Once you have bad latency you’re stuck with it.

If you want to transfer a large file over your modem it might take several seconds, or even minutes. The less data you send, the less time it takes, but there’s a limit. No matter how small the amount of data, for any particular network device there’s always a minimum time that you can never beat. That’s called the latency of the device. For a typical Ethernet connection the latency is usually about 0.3ms (milliseconds — thousandths of a second). For a typical modem link the latency is usually about 100ms, about 300 times worse than Ethernet.

If you wanted to send ten characters over your 33kbit/sec modem link you might think the total transmission time would be:

80 bits / 33000 bits per second = 2.4ms.

but it isn’t. It takes 102.4ms because of the 100ms latency introduced by the modems at each end of the link.

If you want to send a large amount of data, say 100K, then that takes 25 seconds, and the 100ms latency isn’t very noticeable, but if you want to send a smaller amount of data, say 100 bytes, then the latency is more than the transmission time.

Why would you care about this? Why do small pieces of data matter? For most end-users it’s the time it takes to transfer big files that annoys them, not small files, so they don’t even think about latency when buying products. In fact if you look at the boxes modems come in, they proudly proclaim “14.4 kbps”, “28.8 kbps” and “33.6 kbps”, but they don’t mention the latency anywhere. What most end-users don’t know is that in the process of transferring those big files their computers have to send back and forth hundreds of little control messages, so the performance of small data packets directly affects the performance of everything else they do on the network.

Now, imagine the same scenario as before. You live in a world where the only network connection you can get to your house is a modem running over a telephone line. Your modem has a latency of 100ms, but you’re doing something that needs lower latency. Maybe you’re trying to do computer audio over the net. 100ms may not sound like very much, but it’s enough to cause a noticeable delay and echo in voice communications, which makes conversation difficult. Maybe you’re trying to play an interactive game over the net. The game only sends tiny amounts of data, but that 100ms delay is making the interactivity of the game decidedly sluggish.

What can you do about this?

Nothing.

You can compress the data, but it doesn’t help. It was already small to start with, and that 100ms latency is still there.

You can get 80 phone lines in parallel, and send one single bit over each phone line, but that 100ms latency is still there.

Once you’ve got yourself a device with bad latency there’s absolutely nothing you can do about it (except throw out the device and get something else).
Fact Three: Current consumer devices have appallingly bad latency.

A typical Ethernet card has a latency less than 1ms. The Internet backbone as a whole also has very good latency. Here’s a real-world example:

* The distance from Stanford to Boston is 4320km.
* The speed of light in vacuum is 300 x 10^6 m/s.
* The speed of light in fibre is roughly 66% of the speed of light in vacuum.
* The speed of light in fibre is 300 x 10^6 m/s * 0.66 = 200 x 10^6 m/s.
* The one-way delay to Boston is 4320 km / 200 x 10^6 m/s = 21.6ms.
* The round-trip time to Boston and back is 43.2ms.
* The current ping time from Stanford to Boston over today’s Internet is about 85ms:

[cheshire@nitro]$ ping -c 1 lcs.mit.edu
PING lcs.mit.edu (18.26.0.36): 56 data bytes
64 bytes from 18.26.0.36: icmp_seq=0 ttl=238 time=84.5 ms

* So: the hardware of the Internet can currently achieve within a factor of two of the speed of light.

So the Internet is doing pretty well. It may get better with time, but we know it can never beat the speed of light. In other words, that 85ms round-trip time to Boston might reduce a bit, but it’s never going to beat 43ms. The speed’s going to get a bit better, but it’s not going to double. We’re already within a factor of two of the theoretical optimum. I think that’s pretty good. Not many technologies can make that claim.

Compare this with a modem. Suppose you’re 18km from your ISP (Internet Service Provider). At the speed of light in fibre (or the speed of electricity in copper, which is about the same) the latency should be:

18000 / (180 x 10^6 m/s) = 0.1ms

The latency over your modem is actually over 100ms. Modems are currently operating at a level that’s 1000 times worse than the speed of light. They have a long way to go before they get close to what the rest of the Internet is achieving.

Of course no modem link is ever going to have a latency of 0.1ms. I’m not expecting that. The important issue is the total end-to-end transmission delay for a packet — the time from the moment the software makes the call to send the packet to the moment the last bit of the packet arrives at the destination and the packet is delivered to the software at the receiving end. The total end-to-end transmission delay is made up of fixed latency (including the speed-of-light propagation delay), plus the transmission time. For a 36 byte packet the transmission time is 10ms (the time it takes to send 288 bits at a rate of 28800 bits per second). When the actual transmission time is about 10ms, working to make the latency 0.1ms would be silly. All that’s needed is that the latency should not be so huge that it completely overshadows the transmission time. For a modem that has a transmission rate of 28.8kb/s, a sensible latency target to aim for is about 5ms.
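
Put as a formula, total delay = fixed latency + packet size / link rate; a tiny Python sketch of the 36-byte example (the 100ms and 5ms latencies are the figures from the text):

# Total one-way delay for a small packet = fixed latency + serialization time.
def delay_ms(packet_bytes, link_bps, latency_ms):
    return latency_ms + packet_bytes * 8 / link_bps * 1000

print(delay_ms(36, 28_800, 100))  # typical modem latency:        110.0 ms
print(delay_ms(36, 28_800, 5))    # the 5ms target from the text:  15.0 ms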
Fact Four: Making limited bandwidth go further is easy.

If you know you have limited bandwidth, there are many techniques you can use to reduce the problem.
Compression

If you know you have limited bandwidth, compression is one easy solution.

You can apply general purpose compression, such as gzip, to the data.

Even better, you can apply data-specific compression, because that gets much higher compression ratios. For example, still pictures can be compressed with JPEG, Wavelet compression, or GIF. Moving pictures can be compressed with MPEG, Motion JPEG, Cinepak, or one of the other QuickTime codecs. Audio can be compressed with uLaw, and English text files can be compressed with dictionary-based compression algorithms.

All of these compression techniques trade off use of CPU power in exchange for lower bandwidth requirements. There’s no equivalent way to trade off use of extra CPU power to make up for poor latency.

All modern modems have compression algorithms built-in. Unfortunately, having your modem do compression is nowhere near as good as having your computer do it. Your computer has a powerful, expensive, fast processor in it. Your modem has a feeble, cheap, slow processor in it. There’s no way your modem can compress data as well or as quickly as your computer can. In addition, in order to compress data, your modem has to hold on to the data until it has a block that’s big enough to compress effectively. That adds latency, and once added, there’s no way for you to get rid of latency. Also, the modem doesn’t know what kind of data you are sending, so it can’t use the superior data-specific compression algorithms. In fact, since most images and sounds on Web pages are compressed already, the modem’s attempts to compress the data a second time are futile, and just add more latency without giving any benefit.

This is not to say that having a modem do compression never helps. In the case where the host software at the endpoints is not very smart, and doesn’t compress its data appropriately, then the modem’s own compression can compensate somewhat for that deficiency and improve throughput. The point is that modem compression only helps dumb software, and it actually hurts smart software by adding extra delay. For someone planning to write dumb software this is no problem. For anyone planning to write smart software this should be a big cause for concern.
Bandwidth-conscious code

Another way to cope with limited bandwidth is to write programs that take care not to waste bandwidth.

For example, to reduce packet size, wherever possible Bolo uses bytes instead of 16-bit or 32-bit words.

For many kinds of interactive software like games, it’s not important to carry a lot of data. What’s important is that when the little bits of data are delivered, they are delivered quickly. Bolo was originally developed running over serial ports at 4800 bps and could support 8 players that way. Over 28.8 modems it can barely support 2 players with acceptable response time. Why? A direct-connect serial port at 4800 bps has a transmission delay of 2ms per byte, and a latency that is also 2ms. To deliver a typical ten byte Bolo packet takes 22ms. A 28800 bps modem has transmission delay of 0.28ms per byte, but a latency of 100ms, 50 times worse than the 4800 bps serial connection. Over the 28.8 modem, it takes 103ms to deliver a ten byte packet.
Send less data

A third way to cope with limited bandwidth is simply to send less data.

If you don’t have enough bandwidth to send high resolution pictures, you can use lower resolution.

If you don’t have enough bandwidth to send colour images, you can send black and white images, or send images with dramatically reduced colour detail (which is actually what NTSC television does).

If you don’t have enough bandwidth to send 30 frames per second, you can send 15fps, or 5fps, or fewer.

Of course these tradeoffs are not pleasant, but they are possible. You can either choose to pay more money to run multiple circuits in parallel for more bandwidth, or you can choose to send less data to stay within the limited bandwidth you have available.

If the latency is not good enough to meet your needs you don’t have the same option. Running multiple circuits in parallel won’t make your latency any better, and sending less data won’t improve it either.
Caching

One of the most effective techniques throughout all areas of computer science is caching, and that is just as true in networking.

If you visit a web site, your Web browser can keep a copy of the text and images on your computer’s hard disk. If you visit the same site again, all your Web browser has to do is check that the copies it has stored are up to date — i.e. check that the copies on the Web server haven’t been changed since the date and time the previous copies were downloaded and cached on the local disk.

Checking the date and time a file was last modified is a tiny request to send across the network. This kind of request is so small that the throughput of your modem makes no difference — latency is all that matters.

Recently companies have started providing CDROMs of entire Web sites to speed Web browsing. When browsing these Web sites, all the Web browser has to do is check the modification date of each file it accesses to make sure that the copy on the CDROM is up to date. It only has to download files that have changed since the CDROM was made. Since most of the large files on a Web site are images, and since images on a Web site change far less frequently than the HTML text files, in most cases very little data has to be transferred.

Since for the most part the Web browser is only doing small modification date queries to the Web server, the performance the user experiences is entirely dominated by the latency of the connection, and the throughput is virtually irrelevant.
Another analogy

Even smart people have trouble fully grasping the implications of these latency issues. It’s subtle stuff.

The Cable TV industry is hyping “cable modems” right now, claiming that they’re “1000 times ‘faster’ than a telephone modem.” Given the lack of public awareness of the importance of latency, I wouldn’t be in the least surprised if many of them have latency that is just as bad, or maybe even worse, than telephone modems. (The results from some early prototype cable modems, however, look quite promising. Let’s hope the production ones are as good.)

Another device in a similar position is the DirecPC satellite dish, which is supposed to be “14 times faster than a 28.8 modem”. Is it really? Here are some excerpts of what Lawrence Magid had to say about it in his article in the San Jose Mercury News (2nd February 1997):

The system is expensive, requires a relatively elaborate installation and configuration and, in the end, doesn’t necessarily speed up your access to the World Wide Web.

I set up two nearly identical PCs side by side. One was connected to the Net at 28.8kbps and the other with DirecPC. In most cases the satellite system displayed Web pages a bit faster than the one with a modem, but not by much.

In some cases, the modem-equipped PC was faster, especially with sites that don’t have a great deal of graphics.

Alluring as its promise may be, DirecPC for now doesn’t offer spectacular advantages for normal Web surfing, even though it does carry a spectacular price.

Do we see a pattern starting to emerge yet?

Part of the problem here is misleading use of the word “faster”.

Would you say that a Boeing 747 is three times “faster” than a Boeing 737? Of course not. They both cruise at around 500 miles per hour. The difference is that the 747 carries 500 passengers whereas the 737 only carries 150. The Boeing 747 is three times bigger than the Boeing 737, not faster.

Now, if you wanted to go from New York to London, the Boeing 747 is not going to get you there three times faster. It will take just as long as the 737.

In fact, if you were really in a hurry to get to London quickly, you’d take Concorde, which cruises around 1350 miles per hour. It only seats 100 passengers though, so it’s actually the smallest of the three. Size and speed are not the same thing.

On the other hand, if you had to transport 1500 people and you only had one aeroplane to do it, the 747 could do it in three trips where the 737 would take ten, so you might say the Boeing 747 can transport large numbers of people three times faster than a Boeing 737, but you would never say that a Boeing 747 is three times faster than a Boeing 737.

That’s the problem with communications devices today. Manufacturers say “speed” when they mean “capacity”. The other problem is that as far as the end-user is concerned, the thing they want to do is transfer large files quicker. It may seem to make sense that a high-capacity slow link might be the best thing for the job. What the end-user doesn’t see is that in order to manage that file transfer, their computer is sending dozens of little control messages back and forth. The thing that makes computer communication different from television is interactivity, and interactivity depends on all those little back-and-forth messages.

The phrase “high-capacity slow link” that I used above probably looked very odd to you. Even to me it looks odd. We’ve been used to wrong thinking for so long that correct thinking looks odd now. How can a high-capacity link be a slow link? High-capacity means fast, right? It’s odd how that’s not true in other areas. If someone talks about a “high-capacity” oil tanker, do you immediately assume it’s a very fast ship? I doubt it. If someone talks about a “large-capacity” truck, do you immediately assume it’s faster than a small sports car?

We have to start making that distinction again in communications. When someone tells us that a modem has a speed of 28.8 kbit/sec we have to remember that 28.8 kbit/sec is its capacity, not its speed. Speed is a measure of distance divided by time, and ‘bits’ is not a measure of distance.

I don’t know how communications came to be this way. Everyone knows that when you buy a hard disk you should check what its seek time is. The maximum transfer rate is something you might also be concerned with, but the seek time is definitely more important. Why does no one think to ask what a modem’s ‘seek time’ is? The latency is exactly the same thing. It’s the minimum time between asking for a piece of data and getting it, just like the seek time of a disk, and it’s just as important.
Lessons to learn

ISDN has a latency of about 10ms. Its throughput may be twice that of a modem, but its latency is ten times better, and that’s the key reason why browsing the web over an ISDN link feels so much better than over a modem. If you have the option of ISDN, and a good ISP that supports it, and it is not too expensive in your area, then get it.

One of the reasons that telephone modems have such poor latency is that they don’t know what you’re doing with your computer. An external modem is usually connected through a serial port. It has no idea what you are doing, or why. All it sees is an unstructured stream of bytes coming down the serial port.

Ironically, the Apple Geoport telecom adapter, which has suffered so much criticism, may offer an answer to this problem. The Apple Geoport telecom adapter connects your computer to a telephone line, but it’s not a modem. All of the functions of a modem are performed by software running on the Mac. The main reason for all the criticism is that running this extra software takes up memory and slows down the Mac, but it could also offer an advantage that no external modem could ever match. Because when you use the Geoport adapter the modem software is running on the same CPU as your TCP/IP software and your Web browser, it could know exactly what you are doing. When your Web browser sends a TCP packet, there’s no need for the Geoport modem software to mimic the behaviour of current modems. It could take that packet, encode it, and start sending it over the telephone line immediately, with almost zero latency.

Sending 36 bytes of data, a typical game-sized packet, over an Apple Geoport telecom adapter running at 28.8kb/s could take as little as 10ms, making it as fast as ISDN, and ten times faster than the current best modem you can buy. For less than the price of a typical modem the Geoport telecom adapter would give you Web browsing performance close to that of ISDN. Even better, all the people who already own Apple Geoport telecom adapters wouldn’t need to buy anything at all — they’d just need a software upgrade. Even better, Microsoft wouldn’t be able to just copy it for Windows like they do with everything else they see on the Mac, because Wintel clones don’t have anything like a Geoport for Microsoft to use. What a PR triumph for Apple that would be! It really would show that Apple is the company that understands the Internet. I know that in practice there would be other factors that prevent us from getting the delay all the way down to 10ms, but I’m confident that we could get a long way towards that goal.

So far Apple has shown no interest in making use of this opportunity.
Bandwidth Still Matters

Having said all this, you should not conclude that I believe that bandwidth is unimportant. It is very important, but in a way that most people do not think of. Bandwidth is important not only for its own sake, but also for its effect on overall latency. As I said above, the important issue is the total end-to-end transmission delay for a packet.

Many people believe that having a private 64kb/sec ISDN connection is just as good, or even better than having a 1/150 share of a 10Mb/sec Ethernet. Telephone companies argue that ISDN is just as good as new technologies like cable modems, because while cable modems have much higher bandwidth, that bandwidth is shared between lots of users, so the average works out the same. This idea, that you can average packets as if they were a fluid in a pipe, is flawed, as the following example will show:

Say we have a game where the state of the virtual world amounts to 40K of data. We have a game server, and in this simple example, the game server transmits the entire current game state to the player once every 10 seconds. That’s 40K every 10 seconds, or an average of 4K/sec, or 32kb/sec. That’s only half the capacity of a 64kb/sec ISDN line, and 150 users doing this on an Ethernet is only half the capacity of the Ethernet. So far so good. Both links are running at only 50% capacity, so the performance should be the same, right? Wrong. On the Ethernet, when the server sends the 40K to a player, the player can receive that data as little as 32ms later (320kb / 10Mb/sec). If the server is not the only machine sending packets on the Ethernet, then there could be contention for the shared medium, but even in that case the average delay before the player receives the data is only 64ms. On the ISDN line, when the server sends the 40K to a player, the player receives that data 5 seconds later (320kb / 64kb/sec). In both cases the users have the same average bandwidth, but the actual performance is very different. In the Ethernet case the player receives the data almost instantly, but in the ISDN case, by the time the player gets the game information it is already 5 seconds out of date.
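
A small sketch of that example, computing how long the 40K burst takes to arrive once the server sends it (320 kbits, as in the text; the labels are just for illustration):

# Time to deliver a 40K burst (320 kbits) once the server sends it.
BURST_BITS = 320_000

for name, bps in (("10Mb/s Ethernet", 10_000_000), ("64kb/s ISDN", 64_000)):
    print(f"{name}: {BURST_BITS / bps:.3f} seconds")   # 0.032 s vs 5.000 s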

The standard mistake is to assume that a 40K chunk every ten seconds and a uniform rate of 4K/second are the same thing. They’re not. If they were then ISDN, ATM, and all the other telephone company schemes would be good ideas. The telephone companies assume that all communications are like the flow of fluid in a pipe. You just tell them the rate of flow you need, and they tell you how big the pipe has to be. Audio streams, like voice, are like the flow of fluid in a pipe, but computer data is not. Computer data comes in lumps. The standard mistake is to say that if I want to send 60K of data once per minute, that’s exactly the same as sending 1K per second. It’s not. A 1K per second connection may be sufficient *capacity* to carry the amount of data you’re sending, but that doesn’t mean it will deliver the 60K lump of data in a timely fashion. It won’t. By the time the lump finishes arriving, it will be one minute old. Just because you don’t send data very often doesn’t mean you want it delivered late. You may only write to your aunt once a year, but that doesn’t mean that on the occasions when you do write her a letter you’d like it to take a year to be delivered.

The conclusion here is obvious. If you’re given the choice between a low bandwidth private connection, or a small share of a larger bandwidth connection, take the small share.

Again, this is painfully obvious outside the computer world. If a politician said they would build either a large shared freeway, or a million tiny separate private footpaths, one reserved for each citizen, which would you vote for?
Survey

A lot of people have sent me e-mail disputing what I’ve said here. A lot of people have sent me e-mail simply asserting that their modem isn’t slow at all, and the slow performance they see is due to the rest of the Internet being slow, not their modem link.

To try to get to the truth of the matter, I’m going to do a small-scale survey. If you think your modem has low latency, please try an experiment for me. Run a “traceroute” to some destination a little way away. On the West coast of the US lcs.mit.edu might be a good host to trace to. From the East coast of the US you can trace to core-gateway.stanford.edu. In other places, pick a host of your choice (or use one of those two if you like).

On Unix, you can run a trace by typing “traceroute <hostname>” (if you have traceroute installed). On the Mac, get Peter Lewis’s Mac TCP Watcher and click the “Trace” button. On Windows ’95, you have to open a DOS window and type a command like in Unix, except on Windows ’95 the “traceroute” command is called “TRACERT”. Jack Richard wrote a good article about traceroute for Boardwatch Magazine.

When you get your trace, send it to me, along with any other relevant information, like what brand of modem you’re using, what capacity of modem (14.4/28.8/33k/64k ISDN, etc.), whether it is internal or external, what speed serial port (if applicable), who your Internet Service Provider is, etc.

I’ll collect results and see if any interesting patterns emerge. If any particular brands of modems and/or ISPs turn out to have good latency, I’ll report that.

To start things off, here’s my trace:

Name: Stuart Cheshire
Modem: No modem (Quadra 700 built-in Ethernet)
ISP: BBN (Bolt, Beranek and Newman)

Hop Min Avg Max IP Name
1 3/3 0.003 0.003 0.004 36.186.0.1 jenkins-gateway.stanford.edu
2 3/3 0.003 0.006 0.013 171.64.1.161 core-gateway.stanford.edu
3 3/3 0.004 0.004 0.004 171.64.1.34 sunet-gateway.stanford.edu
4 3/3 0.003 0.003 0.004 198.31.10.3 su-pr1.bbnplanet.net
5 3/3 0.004 0.004 0.005 4.0.1.89 paloalto-br1.bbnplanet.net
6 2/3 0.006 0.006 0.007 4.0.1.62 oakland-br1.bbnplanet.net
7 3/3 0.036 0.036 0.037 4.0.1.134 denver-br1.bbnplanet.net
8 3/3 0.036 0.160 0.406 4.0.1.190 denver-br2.bbnplanet.net
9 3/3 0.056 0.058 0.059 4.0.1.130 chicago1-br1.bbnplanet.net
10 3/3 0.056 0.058 0.059 4.0.1.194 chicago1-br2.bbnplanet.net
11 3/3 0.076 0.077 0.078 4.0.1.126 boston1-br1.bbnplanet.net
12 3/3 0.076 0.076 0.076 4.0.1.182 boston1-br2.bbnplanet.net
13 3/3 0.077 0.077 0.078 4.0.1.158 cambridge1-br2.bbnplanet.net
14 3/3 0.080 0.081 0.083 199.94.205.1 cambridge1-cr1.bbnplanet.net
15 3/3 0.080 0.145 0.212 192.233.149.202 cambridge2-cr2.bbnplanet.net
16 3/3 0.079 0.081 0.084 192.233.33.3 ihtfp.mit.edu
17 3/3 0.083 0.096 0.104 18.168.0.6 b24-rtr-fddi.mit.edu
18 3/3 0.082 0.082 0.084 18.10.0.1 radole.lcs.mit.edu
19 3/3 0.082 0.085 0.089 18.26.0.36 mintaka.lcs.mit.edu

You can see it took my Mac (Quadra 700 running Open Transport) 3ms to get to jenkins-gateway. This is not particularly fast. With a good Ethernet interface it would be less than 1ms. From there, it took 1ms to get to paloalto-br1 (near to Stanford) and another 2ms to get to oakland-br1 (across the bay from San Francisco).

From oakland-br1 to denver-br1 took 30ms, from denver-br1 to chicago1-br1 took 20ms, and from chicago1-br1 to boston1-br1 took another 20ms.

The last stretch from boston1-br1 to mintaka.lcs.mit.edu took another 6ms.

So to summarise where the time’s going, there’s 6ms spent at each end, and 70ms spent on the long-haul getting across the country. Remember those are round-trip times — the one-way times are half as much.

Now, let’s find out what the breakdown looks like when we try the same experiment with a modem. Send in your results! Hopefully we’ll find at least one brand of modem that has good latency.

Note: October 1997. Now that I’ve got a decent collection of results, please only send me your results if they’re a lot faster (or slower) than what’s already on the list. Also, please send me results only for consumer technologies. If your company has a T-1 Internet connection, or if you are a student in University housing with a connection even faster than that, then it’s not a great surprise to find that your connection has good latency. My goal here is to find what consumer technologies are available that offer good latency.
Are we there yet?

The good news is that since I first wrote this rant I’ve started to see a shift in awareness in the industry. Here are a couple of examples:

From Red Herring, June 1997, page 83, Luc Hatlestad wrote:

Matthew George is the vice president of technology at Engage… To Mr George, latency issues are more about modems than about network bandwidth. “Network latency in and of itself is not material to game playing; at least 70 to 90 percent of latency problems we see are due to the end points: the modems,” he says.

From MacWeek, 12th May 1997, page 31, Larry Stevens wrote about the new 56k modems:

Greg Palen, director of multimedia at Galzay Marketing Group, a digital communications, prepress and marketing company in Kansas City, Kan., is one of those taking a wait-and-see attitude. “We can always use more bandwidth, but modem speed is not the primary issue at this point. The main issue is latency.”

Some modem makers are finally starting to care about latency. One major modem manufacturer has contacted me, and we’ve been investigating where the time goes. It seems that there is room for improvement, but unfortunately modems will never be able to match ISDN. The problem is that over a telephone line, electrical signals get “blurred” out. In order to decode just one single bit, a 33.6kb/s modem needs to take not just a single reading of the voltage on the phone line at that instant, but that single reading plus another 79 like it, spaced 1/6000 of a second apart. A mathematical function of those 80 readings gives the actual result. This process is called “line equalization”. Better line equalization allows higher data rates, but the more “taps” the equalizer has the more delay it adds. The V.34 standard also specifies particular scrambling and descrambling of the data, which also take time. According to this company, the theoretical best round-trip delay for a 14.4kb/s modem (no compression or error recovery) should be 40ms, and for a 33.6kb/s modem 64ms. The irony here is that as the capacity goes up, the best-case latency gets worse instead of better. For a small packet, it would be faster for your modem to send it at 9.6kb/s than at 33.6kb/s!

I don’t know what the theoretical optimum for a 56kb/s modem is. The sample rate with these is 16000 times per second (62.5us between samples) but I don’t know how many taps the equalizer has.
Further Reading

* Latency and the Quest for Interactivity
A white paper by me, giving a slightly different slant on the same issues.

* Stuart’s Law of Networkdynamics
For every Network Service there’s an equal and opposite Network Disservice.

* The importance of latency in telecommunications network
A white paper by Daniel Kohn comparing LEO (Low Earth Orbit) and GEO (Geostationary Earth Orbit) satellites.

* Dynamic Server Selection in the Internet
A paper by Mark E. Crovella and Robert L. Carter which studies delays measured across the Internet. They find that the median round-trip delay to 5262 “randomly chosen” Internet servers is 125ms. This means that if you’re using a 250ms modem to connect to the Internet, then your median delay would be 375ms. The modem would be causing 2/3 of the total delay, and the entire rest of the path across the Internet all added together (16 hops, on average) would be responsible for only 1/3.

* Cox Communications’s Cable Modem FAQ
Cox is part of the @Home Network.
(Yes, the punctuation is right. “Cox Communications” is not a plural; it’s a proper noun — it’s the name of the company, so the possessive form uses an apostrophe followed by an “s”.)

* Modem Latencies
Another survey of modem latencies, aimed at game players.

* Lag City
A whole web site devoted to the problems of playing Quake over modems. Here’s one snippet from the site:
“As it is now, how well you do in a Quake deathmatch depends way too much on your connection quality. This can make a match completely unfair, as those who have ISDN, T1’s, and better quality ppp’s will dominate a game.”

* Brookline Software
has a web page reporting on Apple Geoport Telecom adapters, and an excellent software utility called SerialSpeed 230 which reduces latency by enabling the computer to communicate with the modem at 230kb/sec, even when using older software that otherwise would not be aware of rates higher than 57kb/sec.

* Stanislav Shalunov
has an article discussing network delay.

Page maintained by Stuart Cheshire

===========================================================================================================
Modified from: http://www.soldierx.com/books/tcp-ip-illustrated/tcp_fut.htm
===========================================================================================================

TCP Performance

Published numbers in the mid-1980s showed TCP throughput on an Ethernet to be around 100,000 to 200,000 bytes per second. (Section 17.5 of [Stevens 1990] gives these references.) A lot has changed since then. It is now common for off-the-shelf hardware (workstations and faster personal computers) to deliver 800,000 bytes or more per second.

It is a worthwhile exercise to calculate the theoretical maximum throughput we could see with TCP on a 10 Mbits/sec Ethernet [Warnock 1991]. We show the basics for this calculation in Figure 24.9. This figure shows the total number of bytes exchanged for a full-sized data segment and an ACK.

————————————————
Field Data ACK
#bytes #bytes
———————————————–
Ethernet preamble 8 8
Ethernet destination address 6 6
Ethernet source address 6 6
Ethernet type field 2 2
IP header 20 20
TCP header 20 20
user data 1460 0
pad (to Ethernet minimum) 0 6
Ethernet CRC 4 4
interpacket gap (9.6 microsec) 12 12
————————————————
total 1538 84
————————————————

Figure 24.9 Field sizes for Ethernet theoretical maximum throughput calculation.

We must account for all the overhead: the preamble, the PAD bytes that are added to the acknowledgment, the CRC, and the minimum interpacket gap (9.6 microseconds, which equals 12 bytes at 10 Mbits/sec).

We first assume the sender transmits two back-to-back full-sized data segments, and then the receiver sends an ACK for these two segments. The maximum throughput (user data) is then

throughput = (2 x 1460 bytes) / (2 x 1538 + 84 bytes) x 10,000,000 bits/sec / 8 bits/byte
           = 1,155,063 bytes/sec

If the TCP window is opened to its maximum size (65535, not using the window scale option), this allows a window of 44 1460-byte segments. If the receiver sends an ACK every 22nd segment the calculation becomes

throughput = (22 x 1460 bytes) / (22 x 1538 + 84 bytes) x 10,000,000 bits/sec / 8 bits/byte
           = 1,183,667 bytes/sec

This is the theoretical limit, and makes certain assumptions: an ACK sent by the receiver doesn’t collide on the Ethernet with one of the sender’s segments; the sender can transmit two segments with the minimum Ethernet spacing; and the receiver can generate the ACK within the minimum Ethernet spacing. Despite the optimism in these numbers, [Warnock 1991] measured a sustained rate of 1,075,000 bytes/sec on an Ethernet, with a standard multiuser workstation (albeit a fast workstation), which is about 90% of the theoretical value.

Moving to faster networks, such as FDDI (100 Mbits/sec), [Schryver 1993] indicates that three commercial vendors have demonstrated TCP over FDDI between 80 and 98 Mbits/sec. When even greater bandwidth is available, [Borman 1992] reports up to 781 Mbits/sec between two Cray Y-MP computers over an 800 Mbits/sec HIPPI channel.

