ABSTRACT
After writing my first paper on AT&T's approach to Unified Communications, I came across the company's Global Network Operations Center (GNOC). This paper picks up where the last paragraph of that paper left off: it describes the GNOC, its association with AT&T's Network Disaster Recovery (NDR) team, and the team's track record of responding to natural disasters and other unpredictable service interruptions around the country.
This paper addresses AT&T's Business Continuity Plan (BCP) and Disaster Recovery Plan (DRP): the company's planned response in the event of a total network outage, its plans for recovery, and the resources it has allocated in preparation for such a disaster. It also weighs active against passive fault management and asks which is fiscally more important: devoting resources to preventing small outages, or reserving money for High Impact, Low Probability disasters, which can unexpectedly put a company out of business.
Most readers are familiar with AT&T as a giant of the big data era. Beyond providing over 300 million people in the Americas with cellular service, AT&T is among the foremost providers of Unified Communications for Fortune 500 companies. Its Global Network Operations Center (GNOC) in Bedminster, New Jersey is among the most cutting-edge in the world, with a wall of 200+ displays that would make NASA jealous. As the heart of AT&T’s data, voice, and mobility networks, the GNOC in Bedminster serves as the linchpin of its Unified Communications. From here, staff monitor and control the ebb and flow of cyberspace, rerouting data to avoid major collisions and querying servers and backbones to maintain the company's aggressive Service Level Agreements (SLAs).
But what happens if the GNOC were to fall, or any of the other Network Operations Centers (NOCs) that revolve around it? Does AT&T have a plan? In a 2004 press release, AT&T asserted (AT&T, 2004): “Industry studies over the past decade estimate that 80 to 90 percent of businesses without well-conceived disaster recovery plans go out of business in two to five years after a major disaster. Yet according to an AT&T commissioned study completed by Digital Research, Inc., in 2002, about 25 percent of companies do not have a disaster recovery plan in place, and almost 20 percent of those companies with plans have not tested them for five years.”
Every company is susceptible to disaster and failure, but many refuse to acknowledge it and have not created a Business Continuity Plan (BCP) to implement when such an incident occurs. That is not the case with AT&T. Operating out of the GNOC, the Network Disaster Recovery (NDR) team carries out AT&T’s Disaster Recovery Plan (DRP), deploying on a national scale to ensure that AT&T's extensive coverage is not compromised, to maintain the aforementioned SLAs, and to keep America covered and in touch. Fmlink reports (Fmlink, 2004): “Since 1990, the NDR team has been activated 12 times in response to disasters, including restoring service after south Florida’s devastating Hurricane Andrew in 1992, the Northridge, California earthquake in 1994, and tornadoes in Oklahoma in 1999. In 2001, the team mobilized to provide recovery services following the tragic attacks on the World Trade Center towers in New York.”
The NDR's DRP is a clear example of passive fault management: it reacts to High Impact, Low Probability disasters nationwide, and it does so by sticking to the script. AT&T describes the basics as follows (AT&T, 2013): “AT&T’s Network Disaster Recovery plan has three primary goals: to route non-involved telecommunications traffic around an affected area, to give the affected area communications access to the rest of the world, [and] to recover communications service to a normal condition as quickly as possible through restoration and repair.” AT&T is plainly committed to preserving network operations. As the first company certified under the Department of Homeland Security's Private Sector Preparedness Program (PS-Prep), AT&T has committed substantial resources to ensuring that its GNOC, and its NOCs around the country, stay connected. As Georgia CEO reported in 2016 (Georgia CEO, 2016): “No one knows when the next tropical storm or hurricane will hit the coastline.
But, AT&T is prepared with one of the nation’s largest and most advanced disaster programs. It invested more than $600 million in the Network Disaster Recovery program. And an arsenal of equipment is ready for deployment, including more than 300 technology and equipment trailers that can be quickly deployed, making it one of the nation’s largest and most advanced disaster programs.
AT&T conducts readiness drills and simulations year-round to keep networks and our people ready to respond at a moment’s notice. NDR will complete its 75th full-field recovery exercise this year. The AT&T Global Network Operations Center monitors our networks 24/7. Since forming in 1991, the NDR has responded to more than 70 events in the U.S.”
AT&T's secret, it seems, is sticking to the script. Arduous drilling and consistent planning, preparing, and upgrading have made AT&T a premier authority in Unified Communications and central data management, with one of the most up-to-date NOCs in the world and one of the most thoroughly exercised DRPs on record. This is for good reason: the amount of information AT&T manages makes it a middleman of the internet. According to Government Technology (Towns, 2012): “They’re tracking the daily movement of nearly 30 petabytes of data — including 1.4 billion voice calls and 5 billion text messages — as it travels through nearly 1 million miles of fiber-optic cable, tens of thousands of cell sites, and countless switches and routers.
Although the GNOC is crucial for keeping vital communications running during high-profile disasters like Hurricane Irene or the Japan earthquake and tsunami, network managers spend 80 percent of their time fixing small problems before they snowball into something bigger.”
That is the real kicker. At the heart of the disaster recovery capital of cyberspace, a company that keeps more than 300 trailers on standby, has invested more than $600 million in disaster response, and has run some 75 full-field recovery exercises still has its network managers spend 80 percent of their time on active fault management. No statistic says more clearly that active fault management is king, and that any Network Operations Center should make preventive maintenance its primary function.
REFERENCES
[1] AT&T (2004, February 26). “AT&T Stages Network Disaster Recovery Drill in Seattle” Retrieved June 11, 2016, from http://www.prnewswire.com/news-releases/att-stages-network-disaster-recovery-drill-in-seattle-71842842.html
[2] Fmlink (2004, March 5). “AT&T Stages Network Disaster Recovery Drill in Seattle” Retrieved June 11, 2016, from http://fmlink.com/articles/att-stages-network-disaster-recovery-drill-in-seattle/
[3] AT&T (2013, February 14). “Network Disaster Recovery” Retrieved June 11, 2016, from http://www.corp.att.com/ndr/
[4] Georgia CEO (2016, June 3). “AT&T Prepared to Keep Customers Connected During Hurricane Season” Retrieved June 11, 2016, from http://metroatlantaceo.com/news/2016/06/t-prepared-keep-customers-connected-during-hurricane-season/
[5] Towns, Steve (2012, January 31). “See How the World's Largest Telcom Company Manages its Network” Retrieved June 11, 2016, from http://www.govtech.com/featured/Worlds-Largest-Telcom-Manages-Network-PHOTOSVIDEO.html