Transforming or Disrupting Data Centers?

Artificial intelligence (AI) is creeping into nearly every aspect of our daily lives. For network architects and operators, AI can significantly reduce the time and expertise required to design, build, and manage the day-to-day operations of a network, including networks built specifically for data centers and clouds. Everything from automation and optimization to security and predictive analytics can now be accomplished with AI, as long as the proper preparations are made in advance.

In this report, you’ll learn about the evolution of AI-powered data center networks and the top AI features that can be used today. We’ll also describe how efficiency and performance benefits can be gained, along with some tips on avoiding potential AI pitfalls. Finally, we’ll look at a few use cases where AI-powered data center networks are used today.

The Evolution from Traditional to Intelligent Networking

Traditional data center networks have long been built and managed using manual processes. Each network component used to be configured individually through the CLI and managed on a hop-by-hop basis. Over time, with the adoption of software-defined network orchestration platforms, data center network equipment has come to be centrally managed in many cases, ensuring that network and security policies are uniform from one end of the fabric to the other. In most cases, however, the configurations themselves must still be manually created and pushed to network switches and routers by network operators.

More recently, general IT automation tools like Red Hat Ansible and network orchestration solutions like Cisco ACI, HPE/Juniper Apstra, and VMware NSX Data Center have grown popular for managing data center fabrics. These tools allow for centralized automation through scripts and playbooks.
However, a great deal of manual setup and configuration is still needed, which in turn requires a highly skilled network practitioner.

That leads us to our current evolution, where artificial intelligence is taking over many of the manual processes that have traditionally been performed by humans. Once AI learns about applications, data flows, and their priorities within the data center, it can be used to automatically configure, adjust, and tune networks based on best-practice standards and intent-based networking (IBN) methodologies. This allows businesses to design, build, and operate complex data center network fabrics with more speed and precision and with less reliance on highly skilled technical talent.

Core Benefits of Integrating AI into Data Center Build-outs and Operations

While the benefits of integrating AI into data center build-outs and operations vary by use case, seven core benefits that almost anyone can realize are:

Network design: With only a small amount of manually entered information, including hardware asset tags, IP address schemes, and the desired data center architecture (e.g., 3-stage or 5-stage Clos), artificial intelligence can create an auto-provisioned digital twin network design with the full configuration of the data center fabric underlay, built to best-practice standards.

Intent-based network configuration: Once a network underlay is built, AI can help NetOps teams simplify how their network policies and services are built. Using intent-based methodologies, operators specify the policies they need and where they are required on the network. With this information, AI automatically generates the required configuration commands and applies them to the network devices that need them. Not only does this save time, but it also reduces the risk of human error and ensures that policies are uniform from end to end.

Traffic optimization: AI can analyze network device health and performance data from switch/router logs and telemetry.
After forming a baseline, AI can automatically adjust quality of service (QoS), load balancing, data path, and other settings to improve critical application traffic flow.

Predictive and proactive maintenance: The recently coined term “self-healing network” describes a network in which AI uses device health information to predict when hardware and software components are causing suboptimal performance. The orchestration platform can predict potential failures and notify NetOps teams so the failures can be attended to proactively. Doing so can significantly reduce the likelihood of long-term outages.

Granular security monitoring: AIOps is a term used to describe how AI can monitor and learn what is considered normal network traffic behavior. Once that baseline is established, the system can set thresholds. When traffic behavior veers above or below those thresholds, the system flags the deviation as a potential security threat and can auto-remediate it. For example, if a botnet were to infect a number of IoT systems within a data center and begin contacting a command-and-control server on the Internet, AI can detect this change in network communication behavior and quarantine the systems until the threat is resolved.

Dynamic scalability: When data center workloads begin to overrun a fabric, causing congestion, AI can adjust routing decisions in real time to bypass the problem area and distribute traffic more evenly.

Granular analytics and reporting: AI can be configured to monitor various usage trends, flow capacity, and network resource allocation. This information can then be displayed in easily understood reports, graphs, and charts. The system can also suggest how to adjust or add resource capacity in the data center fabric for improved optimization.

Key Components Powering AI-Driven Data Center Networks

IBN orchestration with integrated AI is a hardware- and software-based orchestration system deployed within the data center or in a remote cloud.
The platform can manage multiple data center network fabrics from a single centralized dashboard. The key components of an AI data center network are:

Hardware – AI for data centers must process vast amounts of logging and telemetry data coming from network hardware such as routers, switches, load balancers, and firewalls. To analyze this data, powerful CPUs, GPUs, and IPUs must be used, along with other AI accelerators designed for parallel processing.

Software – Required software includes various operating systems, libraries, and frameworks for deploying and managing AI applications.

Data collection – Software that defines how to collect relevant data center logs and telemetry data and how and where that data should be stored.

Training data – This data is collected from network devices and is organized in such a way that the AI software can analyze it to make predictable and accurate decisions.

Algorithms and training models – The types of algorithms and training models used to analyze a data center fabric training data set. AI for data centers uses customized training models that analyze network hardware health, performance, security, connected-device visibility, and other key health and performance indicators.

Feedback loop – A feedback loop brings in external feedback from new data, updated training models, and user interactions to help the artificial intelligence system improve its accuracy and ready it for any major network changes or expansion.

Network Infrastructure Tailored for AI Workloads

Unlike traditional applications or applications built for high-performance computing, AI workloads are few in number but massive in size. Because of this, they must be processed in parallel, which is typically accomplished using graphics processing units (GPUs). This means multiple traffic load distributions and East-West data flows will occur within the data center network as data moves in and out of large-scale GPU clusters.
Additionally, specialized network optimization policies must be configured on network ports that connect to high-speed storage arrays.

In many cases, specialized network protocols will be used to optimize AI workload transport across the network. These protocols support purpose-built performance features such as packet spraying, congestion mitigation, and low-latency packet delivery.

How AI Helps Enterprises Build a Responsive Environment

It takes time for artificial intelligence to learn a data center fabric and for network operators to fully trust its analysis, insights, and automated configuration responses. Eventually, however, the AI and human components will get past any major deficiencies and points of contention and work cohesively. At this point, AI can best assist in building a fully responsive data center network environment with capabilities far beyond what has been possible before.

According to the Uptime Institute, two-thirds of all infrastructure outages are due to human error. Breaking down the root causes of human fault, the most common are:

Failure to follow documented procedures
Inaccurate documentation or procedures
Data centers that were originally designed outside of best-practice standards
Failure to proactively identify hardware/software faults
Insufficient or overworked operations teams

Having artificial intelligence baked into data center network operations can eliminate most, if not all, of these issues.
AI can do so by providing the following:

Design and configuration changes that strictly follow best-practice standards
Identification and auto-remediation of configuration errors or conflicts that may cause unforeseen problems network-wide
Assurance that configuration changes align with business intent
Continuous and highly granular auditing of network traffic performance and security

Enhancing Efficiency and Performance

The area of data center networking where AI is perhaps most useful is network traffic efficiency and performance optimization. This is where predictive analytics and automated, dynamic network resource allocation come into play. What used to take countless hours of manual analysis and trial and error can now be accomplished in real time and without human intervention.

Streamlining Management with AI

AI combined with automation can greatly improve the efficiency of managing and optimizing data center networks. Ways that AI is often used to streamline network operations include:

Elimination of repetitive tasks through intelligent analysis and automation
Predictive analytics with actionable or automated insights
Automated network provisioning that strictly aligns with best-practice standards
Intelligent and automated troubleshooting with root cause analysis
Security threat detection with automated responses

Overcoming the Challenges of Transition

Transitioning from a manually controlled data center network with a host of disparate monitoring and automation tools to one that is controlled by AI can be a challenge in several ways. Here are two common examples of where AI transitions can stall if not properly planned for:

Addressing Increased Power and Bandwidth Demands of Using AI

AI that performs real-time analysis and automation requires significant parallel processing power to handle large and ever-growing sets of training data.
As such, the amount of power required to run CPUs, GPUs, TPUs, ASICs, and other accelerators is significantly higher than what was previously required for data center infrastructures. If AI is hosted within a private data center, careful consideration must be given to ensuring that sufficient power is available.

Additionally, AI must analyze vast amounts of training data in order to provide the level of intelligence required to operate a network. Training data sets are compiled and analyzed all at once, potentially creating a bottleneck that can disrupt business operations. AI architects must understand the level of bandwidth required and schedule the processing of AI workloads around business data flows.

Navigating the Complexities of AI Network Integration

To achieve a successful integration of AI into your data center network, a number of steps should be taken to prepare for this transformation. They include:

Define clear objectives and outcomes – Pinpoint your initial AI operations goals.

Understand what is required – Determine what is needed from both a hardware/power and a technical expertise perspective.

Know the limits of your existing network – Perform a network assessment to understand any limitations, bottlenecks, or areas of the data center fabric that need to be addressed prior to deploying AI.

Address security compliance – Formulate what security measures must be in place and how AI can use training data to meet those goals.

Define new NetOps roles – Formulate a plan around how NetOps team roles will change due to AI and automation, and prepare the necessary training for them.

Real-World Transformations and Case Studies

While AI-powered data center networks are still a relatively cutting-edge concept, a few leading organizations have taken their fabrics to the next level.
Let’s look at two examples of companies that have delivered successful integrations up to this point.

Success Stories of AI Adoption in Network Operations

Google: As a leader in this space, it’s no surprise that Google is leading the charge in automating data center fabrics using AI. The company has deployed AI to analyze network system logs and telemetry to identify performance issues, perform root cause analysis, and conduct predictive analysis.

Other aspects of the data center that Google is targeting with advanced AI include the management of power and energy consumption, the evaluation of security risk, and the delivery of precise cooling to network and compute hardware.

Netflix: Because Netflix relies on delivering real-time streaming video content to users around the world, the company deployed AI/ML to analyze network traffic patterns in real time and adjust network resources throughout its global content delivery network (CDN) to better predict user behavior and deliver the appropriate level of streaming throughput. Doing so allows the company to “right-size” its CDN infrastructure without having to overbuild to meet fluctuations in global streaming activity.

AI’s Role in Future-Proofing Data Centers Against Evolving Demands

The best part about AI is that it is incredibly scalable and can easily adapt to technological advancements. As AI analysis improves, AI architects can refine the type of data sent to AI in order to improve the accuracy of outcomes. Thus, future-proofing data centers against evolving demands becomes manageable, provided the ability to scale has been planned for in advance.
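
To make the baseline-and-threshold idea from the granular security monitoring benefit more concrete, here is a minimal Python sketch. It is an illustration only, not how any particular AIOps product works: the host names, the outbound-connections-per-minute metric, the sample values, and the three-standard-deviation threshold are all assumptions chosen for the example. A real system would learn from far richer telemetry.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Return (mean, standard deviation) for a list of historical readings."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, k=3.0):
    """True when value lies more than k standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat baseline: any change at all counts as a deviation.
        return value != mu
    return abs(value - mu) > k * sigma


def hosts_to_quarantine(history, current, k=3.0):
    """Compare each host's current reading with its own learned baseline."""
    flagged = []
    for host, samples in history.items():
        if host in current and is_anomalous(current[host], build_baseline(samples), k):
            flagged.append(host)
    return sorted(flagged)


# Hypothetical outbound connections per minute for two IoT hosts.
history = {
    "iot-cam-01": [12, 15, 11, 14, 13, 12, 16],
    "iot-cam-02": [10, 9, 11, 10, 12, 10, 11],
}
# iot-cam-02 suddenly opens far more connections than usual.
current = {"iot-cam-01": 14, "iot-cam-02": 240}

print(hosts_to_quarantine(history, current))  # ['iot-cam-02']
```

In this sketch each host is judged against its own history, so a device that is normally chatty is not flagged simply for being busier than its neighbors. The returned list stands in for the quarantine step; acting on it would be left to the orchestration platform.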
