Wednesday, July 9, 2008

Bell Canada Justification of DSL Throttling (2 of 3)

As promised in part 1, here is my take on the likelihood that there is congestion within Bell Canada's network. I will start with an elementary discussion about networks, so if that is too basic for you, you'll want to skip much of this. Once again, here's the link to CRTC's web site where you can find references to the Bell Canada documents which I refer to in this post, and here's the link to the actual documents.

Cans and strings. Any modern communications network works the same way as a child's play phone from days past. The cans may be iPhones or a top-of-the-line Optical Ethernet router, and the strings may be copper wire, coaxial cable or free-space photons, but it is still much the same. Like any carrier, Bell Canada chooses from an assortment of cans and strings to create networks, and we can analyze its network on that basis.

On the economics of networks, strings typically cost more than cans. While copper and glass are cheap enough, acquiring the rights-of-way, burying or stringing the cable, and dealing with environmental degradation and accidents out in the wild make them costly. While some of those cans carry a high price tag, their long-run costs can be reasonable, and the carriers can depreciate cans faster than strings, which is good for the balance sheet.

In addition to basic costs, there is the impact of scaling. The greater the revenue base that a can or string supports, the cheaper it effectively becomes. As an example, consider a hypothetical string that costs $1,000 to buy and bury. If it supports one subscriber (me or you) that's $1,000/subscriber. If it supports 100 subscribers, that's $10/subscriber. And if it supports 10,000 subscribers, that's $0.10/subscriber. I have shown this per subscriber, although the more important calculation is recurring revenue stream per asset dollar. This will become important later (part 3).
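For anyone who wants to play with the numbers, here is a minimal sketch of that per-subscriber arithmetic. The function name and the even split across subscribers are my own simplifications for illustration; real carrier accounting is far messier.

```python
def cost_per_subscriber(asset_cost, subscribers):
    """Spread a one-time asset cost evenly over the subscribers it serves."""
    return asset_cost / subscribers


STRING_COST = 1000.0  # the hypothetical $1,000 string from the example

for subscribers in (1, 100, 10_000):
    share = cost_per_subscriber(STRING_COST, subscribers)
    print(f"{subscribers:>6} subscribers: ${share:,.2f} per subscriber")
```

The same function works for a can as well as a string; only the asset cost and the subscriber count change.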

Next, please have a look at this table from the June 19th Bell Canada disclosure (note: I tried to upload it but couldn't figure out how to do it properly with Blogger).

Ignore the percentages for now. Notice however that the reported congestion declines over time, except at the DSLAM. This would imply they have been investing in their core network at a rate above the growth in subscriber traffic. Recall what I said above about the costs of cans and strings and their subscriber base coverage. This is the easy part of network investment since there are usually no trenches to dig, so it's not surprising to see them active there.

The DSLAM congestion is fairly steady over time, implying their investment is running at about the same pace as subscriber traffic growth. They don't say but we can safely assume the DSLAM congestion measurement is on the network side of the 'can', not on the subscriber side. Here's the network model diagram from the May 15th Bell Canada disclosure for your reference:

Since Bell Canada says the problem is at the DSLAM and their reported congestion figures support this, and to me it seems superficially credible, I will strictly focus on the DSLAM for the following discussion.

A modern DSLAM does many jobs, many of them having nothing whatever to do with subscriber data traffic. Let's briefly look at what a DSLAM really does and why this new type of network 'can' came to be. This is worthwhile since there appears to be some misunderstanding of what it is (see here for one example of a truly poor description). Let's start at your residence.

A pair of twisted copper wires (string) runs from your DSL modem and analogue phones (cans) out of the house, down the street, all the way to a DSLAM (can) typically located in a Bell Canada building. That building is known as a central office. There are lots of cans of all types in that climate-controlled environment, along with one end of many, many strings, and usually some Bell Canada employees keeping it all running smoothly. The DSLAM in particular might also be located elsewhere in the neighbourhood, such as in a secure vault or leased space in a commercial building.

Some historical perspective may be helpful at this point. Remotely located cans, often called remotes, predate DSL by many years. They were used to improve network economics. Imagine running all that copper wire from every home and business back to one of many central offices. That's a lot of wire and trenching or stringing along poles, and it is expensive. By terminating the wire at a can closer to the subscriber, money can be saved. Now only a few strings are needed from the can to the central office, removing all that copper for part of the run. In the early days those cans were simple multiplexors, with analogue baseband voice on one side and digital multiplexed carrier systems (T1 or T3) on the other side. There would be one channel (DS-0 at 64,000 bps) per subscriber, or more with ADPCM. Later, the remotes featured switching capability so that fewer DS-0 channels were needed, by only requiring a DS-0 while the subscriber used the phone.

Then came DSL. Now those remotes became obsolete, yet the economics of networking remained. A new type of remote was needed, one with subscriber-side ports that handled both frequency-multiplexed baseband analogue voice and DSL data. I will spare you the history of standards battles and the countless variants of DSL and DSLAM that came and went. Eventually things settled down and all the vendors' equipment now (mostly) gets along. You may remember a similar battle that raged more recently around Wi-Fi 802.11 and its many variants.

Okay, we're getting closer now; just a little more on the basics that some of you can continue to skip over.

The network side of the DSLAM supports copper and optical carrier systems, though these are primarily optical nowadays. The capacities look like this, based on a 64,000 bps DS-0 channel:
  • T1: 1.5 Mbps - 24 channels (or DS-1)
  • T3: 45 Mbps - 672 channels (28xT1 or DS-3)
  • OC-3: 155 Mbps - 2016 channels (3xT3)
  • OC-12: 622 Mbps - 8064 channels (12xT3 or 4xOC-3)
  • and so on
This bandwidth is split between PCM telephony and DSL data (there can be more than just these, but we'll keep it simple). The bandwidth dedicated to telephony cannot be used for data, and vice versa. They only mix when analogue telephony is replaced by VoIP over a common DSL or similar shared digital string to the home (that isn't how it is done today, and likely won't be for some years to come). The split between telephony and DSL could be made right down at the DS-0 level, but is more likely made at the DS-1 or DS-3 channelized rates.
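The channel counts in that list are just the hierarchy multiplied out: 24 DS-0s per T1, 28 T1s per T3, and so on. Here is a small sketch that derives them. The constant and function names are mine, and the Mbps it prints is the DS-0 payload only, which comes out a little below the nominal line rate because framing overhead is not counted.

```python
DS0_BPS = 64_000   # one PCM voice channel
DS0_PER_T1 = 24    # DS-0 channels in a T1 (DS-1)
T1_PER_T3 = 28     # T1s in a T3 (DS-3)


def ds0_channels(t3_count):
    """Number of DS-0 channels carried by a link equivalent to t3_count T3s."""
    return t3_count * T1_PER_T3 * DS0_PER_T1


# Each carrier expressed as a multiple of T3, as in the list above.
carriers = {"T3": 1, "OC-3": 3, "OC-12": 12}

for name, t3s in carriers.items():
    channels = ds0_channels(t3s)
    payload_mbps = channels * DS0_BPS / 1e6
    print(f"{name:>5}: {channels:>5} channels, {payload_mbps:.1f} Mbps of DS-0 payload")
```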

The carrier systems in today's Bell Canada network use ATM. Technically this isn't too significant despite being highlighted by Bell as an issue. When an OC-3, for example, utilizes ATM or Optical Ethernet (OE), it's still a 155 Mbps string, though the data and voice streams mix differently. This would alter the utilization of the available capacity, but frankly not by much; ATM and Ethernet have link-layer overheads that limit the payload. Ethernet when used in a point-to-point application like this can achieve 90% utilization of the available capacity (minus that overhead).
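To put rough numbers on that link-layer overhead, here is a back-of-envelope sketch. It assumes a standard 53-byte ATM cell carrying 48 bytes of payload, and a full-size 1500-byte Ethernet frame with the usual preamble, header, frame check and inter-frame gap. It ignores SONET framing, AAL5 padding and anything above the link layer, so treat the results as indicative only.

```python
ATM_CELL_BYTES = 53      # 5-byte header + 48-byte payload
ATM_PAYLOAD_BYTES = 48

# Per-frame Ethernet overhead: preamble+SFD (8), MAC header (14),
# frame check sequence (4), inter-frame gap (12).
ETH_OVERHEAD_BYTES = 8 + 14 + 4 + 12


def atm_efficiency():
    """Fraction of ATM link bytes that carry payload (cell level only)."""
    return ATM_PAYLOAD_BYTES / ATM_CELL_BYTES


def ethernet_efficiency(payload_bytes=1500):
    """Fraction of Ethernet link bytes that carry payload for a given frame size."""
    return payload_bytes / (payload_bytes + ETH_OVERHEAD_BYTES)


print(f"ATM cells:           {atm_efficiency():.1%}")
print(f"Ethernet, 1500-byte: {ethernet_efficiency():.1%}")
print(f"Ethernet, 46-byte:   {ethernet_efficiency(46):.1%}")
```

With large frames Ethernet comes out several percentage points ahead of ATM, but with small frames its fixed per-frame overhead dominates, so the comparison depends on the traffic mix.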

When Bell points out that adding an OE string now is a problem for them (see page 10 of their May 15th disclosure), what they mean is that years ago they made a large financial commitment to ATM. They don't want to keep investing in ATM equipment by adding more ATM ports on the DSLAMs and ATM switches to meet increasing DSL traffic demand. However, they can't easily add OE ports if their present DSLAMs don't support OE. This makes the expense of adding capacity to alleviate DSLAM-centred congestion a short-term financial burden. This is a real problem, not an invention. Whether you feel sympathy for their plight is purely subjective.

But let's get back to congestion. On the ATM connections from the DSLAMs to the edge ATM switches, they say they take 15-minute snapshots of utilization and congestion. They never define congestion well enough for anyone to independently assess their subsequent claims, though they imply it's due to buffer overflows during traffic bursts, which are well known to occur statistically with multi-sourced data traffic. For example, if the average data flow is 1 Mbps during a measurement interval, shorter-term fluctuations well above (and below!) this rate occur. Statistically, the probability of these events and their duration both decrease with higher burst rates.

To accommodate these bursts, average link utilization is kept below the maximum link capacity. This can be compensated for somewhat with larger buffers, which are themselves an expense. For the subscriber, larger buffers mean increased latency during bursts, and higher link utilization means a higher rate of lost packets due to buffer overflows. There is no right answer - it's a matter of engineering the network to selected objectives. That is, forecast your load rate and decide how much traffic you are willing to see delayed or dropped. These numbers will always be non-zero.
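The trade-off between utilization, buffer size and loss can be illustrated with the textbook M/M/1/K queue. This is only a sketch under a big assumption: Poisson arrivals and exponential service times, which real DSL traffic does not follow (it is burstier), and I have no idea what model Bell actually uses. The function name and the sample utilizations and buffer sizes are mine.

```python
def mm1k_loss_probability(utilization, system_size):
    """Probability that an arriving packet finds an M/M/1/K system full.

    utilization: offered load divided by link capacity (rho)
    system_size: K, packets the system can hold (buffer plus the one in service)
    """
    rho, k = utilization, system_size
    if rho == 1.0:
        return 1.0 / (k + 1)  # limiting case of the general formula
    return (1.0 - rho) * rho**k / (1.0 - rho ** (k + 1))


for k in (10, 50):
    for rho in (0.5, 0.7, 0.9):
        loss = mm1k_loss_probability(rho, k)
        print(f"buffer K={k:>2}, utilization {rho:.0%}: loss probability {loss:.2e}")
```

Loss rises with utilization and falls as the buffer grows, and it is never exactly zero. What the formula doesn't show is the price of the bigger buffer: packets that would have been dropped now wait in the queue instead.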

For a given (average) subscriber traffic load at the time of day total traffic peaks, the lower the bandwidth of the DSLAM ATM or OE link, the lower the average utilization must be kept to achieve a fixed congestion objective. Traffic statistics determine this (if you don't believe me, you'll have to go learn some traffic engineering on your own). Bell Canada says as much (page 3 of their May 15th disclosure). Since I don't know their congestion objectives, I can't say if these numbers are correct; however, they 'smell' right to me.
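If you'd rather see that than take my word for it, here is a rough sketch using a normal approximation. It assumes subscribers are independent, each with the same mean and standard deviation of traffic, and that the link is sized so the aggregate exceeds capacity only when it is more than z standard deviations above its mean. All the parameter values are invented for illustration; none of them come from Bell's disclosures.

```python
import math


def allowable_utilization(link_mbps, sub_mean_mbps, sub_std_mbps, z=3.0):
    """Highest average utilization that meets the congestion objective.

    Solves N*mean + z*std*sqrt(N) = link capacity for the subscriber
    count N (treated as continuous), then returns N*mean / capacity.
    """
    m, s, c = sub_mean_mbps, sub_std_mbps, link_mbps
    # Quadratic in x = sqrt(N): m*x^2 + z*s*x - c = 0, take the positive root.
    x = (-z * s + math.sqrt((z * s) ** 2 + 4.0 * m * c)) / (2.0 * m)
    return m * x * x / c


# Hypothetical per-subscriber traffic: 0.1 Mbps mean, 0.5 Mbps std deviation.
for name, mbps in (("T3", 45), ("OC-3", 155), ("OC-12", 622)):
    u = allowable_utilization(mbps, 0.1, 0.5)
    print(f"{name:>5} ({mbps} Mbps): keep average utilization below about {u:.0%}")
```

The bigger the pipe, the more subscribers share it and the more their bursts average out, so the big pipe can safely run at a higher average utilization than the small one.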

Okay, enough typing. If you got this far, thanks for sticking with me. I really can't say more about Bell's claims of congestion without them disclosing more detailed data. They won't do this willingly and the CRTC is very unlikely to order it. In my opinion there is a legitimate competitive risk, and I mostly agree with their withholding the data. So all I've accomplished after all this typing is to help readers understand the nature of the issue perhaps a little better than before.

Before I end, let me mention a couple more things about DSLAMs that may be of interest. First, DSLAMs are not only deployed outside the central office to save the cost of running lots of wires the full distance from homes to the central office. DSL speeds are a struggle between James Clerk Maxwell's electromagnetic theory and advanced technology. You can't fight the laws of physics. Keeping the copper run as short as possible increases the achievable data rate. DSL utilizes frequencies up into the MHz range whereas analogue voice is only up to a few kHz. Second, Bell Canada uses DSLAMs to, in part, break its dependency on Nortel. When DSL was new, Nortel produced DSL-capable equipment (Universal Edge) that plugged into the backplane of their telephony switches. However, that would extend vendor lock-in, which any customer, including Bell, prefers to avoid. Using DSLAMs with standard interfaces keeps the competitive pressures high, and equipment costs lower. This is smart business on the part of Bell Canada.

And just for the record, I am not now nor have I ever been employed by Bell Canada, nor have I accessed or used non-public information to do my analysis. Use this material at your own risk.

Back later with part 3 of this series where I'll talk a bit about business incentives for Bell to invest (or not) in increasing capacity to support DSL traffic.
