Artificial intelligence and its impact on our data center

AI may help uncover new levels of efficiency, but the trade-off is a significant increase in bandwidth demand.

Filmmakers have a habit of introducing concepts that seem far from reality at the time but eventually make their way into our daily lives. In 1990, the Arnold Schwarzenegger film Total Recall showed us the “Johnny Cab”, a driverless vehicle that took passengers wherever they wanted to go. Now, most major car companies are investing millions of dollars to bring this technology to the public. And thanks to Back to the Future Part II, in which Marty McFly evaded the mob on a hoverboard, our children are now crashing into the furniture (and each other) on something similar to what we saw back in 1989.

Back in 1968 (some of us can still remember it), we were introduced to artificial intelligence (AI) through HAL 9000, the sentient computer on board the Discovery One spaceship in 2001: A Space Odyssey. HAL was capable of speech and facial recognition, natural language processing, lip reading, art appreciation, interpreting emotional behavior, automated reasoning and, of course, Hollywood’s favorite computer skill, playing chess.

Fast forward to the past few years and it is quickly apparent that AI has become an integral part of our daily lives. You can ask your smartphone about the weather at your next travel destination, your virtual assistant can play your favorite music, and your social media account will serve up news updates and advertisements based on your personal preferences. With no disrespect to the technology companies, this is AI 101.

But there is much more happening in the background that we may not realize is helping to improve, or even save, lives. Language translation, news feeds, facial recognition, more accurate diagnosis of complex diseases and accelerated drug discovery are just some of the applications for which companies are developing and deploying AI. According to Gartner’s forecast, the business value derived from artificial intelligence is expected to reach $3.9 trillion by 2022.

Thoughtful server

So how does AI affect the data center? Well, as early as 2014, Google deployed DeepMind AI (using machine learning, an application of AI) in one of its facilities. The result? It was able to consistently reduce the energy used for cooling by 40%, which equates to a 15% reduction in overall PUE overhead after accounting for electrical losses and other non-cooling inefficiencies. It also produced the lowest PUE the site had ever seen. Based on these significant savings, Google is looking to deploy the technology across its other sites and suggests other companies do the same.
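The relationship between a cooling saving and PUE overhead can be sketched with some simple arithmetic. The following is a minimal illustration, not Google's method: the load figures are hypothetical, chosen only so that cooling makes up 37.5% of the non-IT overhead, which is the share at which a 40% cooling saving works out to a 15% overhead reduction.

```python
def pue(it_kw: float, cooling_kw: float, other_overhead_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT power."""
    return (it_kw + cooling_kw + other_overhead_kw) / it_kw


def overhead_reduction(cooling_kw: float, other_overhead_kw: float,
                       cooling_saving: float) -> float:
    """Fractional drop in PUE overhead (all non-IT power) when cooling
    energy falls by `cooling_saving` (0.40 means 40%)."""
    before = cooling_kw + other_overhead_kw
    after = cooling_kw * (1.0 - cooling_saving) + other_overhead_kw
    return (before - after) / before


# Hypothetical site (assumed figures, not taken from the article):
# 1,000 kW of IT load, 75 kW of cooling, 125 kW of other overhead.
IT_KW, COOLING_KW, OTHER_KW = 1000.0, 75.0, 125.0

pue_before = pue(IT_KW, COOLING_KW, OTHER_KW)
pue_after = pue(IT_KW, COOLING_KW * (1.0 - 0.40), OTHER_KW)
saving = overhead_reduction(COOLING_KW, OTHER_KW, 0.40)

print(f"PUE before: {pue_before:.2f}, after: {pue_after:.2f}")
print(f"Overhead reduction: {saving:.0%}")
```

The point of the sketch is that the overhead saving depends on how much of the overhead is cooling in the first place, so a different site would see a different figure for the same 40% cooling saving.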

Facebook, whose mission is to “give people the power to build community and bring the world closer together,” outlines in its white paper Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective the hardware and software infrastructure that supports machine learning at global scale.

To give you an idea of how much computing power AI and ML require, Andrew Ng, chief scientist at Baidu’s Silicon Valley Lab, said that training one of Baidu’s Chinese speech recognition models requires not only 4TB of training data but also 20 exaflops of compute, or 20 billion billion math operations, across the entire training cycle.
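To put that figure in perspective, here is a rough back-of-the-envelope sketch of how long 20 billion billion operations would take on a small cluster. The cluster size and the sustained throughput per accelerator are hypothetical values picked for illustration; they do not describe Baidu's actual hardware.

```python
EXA = 10**18   # one billion billion
TERA = 10**12


def training_time_seconds(total_ops: float, accelerators: int,
                          sustained_tflops_each: float) -> float:
    """Time to execute `total_ops` operations on a cluster, given the
    sustained (not peak) throughput of each accelerator in TFLOP/s."""
    cluster_ops_per_second = accelerators * sustained_tflops_each * TERA
    return total_ops / cluster_ops_per_second


total_ops = 20 * EXA  # the training-cycle total quoted in the text

# Hypothetical cluster: 8 accelerators sustaining 10 TFLOP/s each.
seconds = training_time_seconds(total_ops, accelerators=8,
                                sustained_tflops_each=10.0)
print(f"{seconds:,.0f} seconds, or about {seconds / 86400:.1f} days")
```

Under these assumed numbers a single training run keeps eight machines busy for days, which is why the work is spread over many interconnected servers.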

But what about our data center infrastructure? How does AI affect the design and deployment of all the different sizes and shapes of facility that we are looking to build, lease or refresh to accommodate this innovative, cost-saving and life-saving technology?

ML can run on a single machine, but because of the sheer volume of data involved it usually runs across multiple machines, all interconnected to ensure continuous communication during training and data processing, with low latency and no interruption of service at our fingertips, screens or audio devices. As humans, our desire for more and more data drives exponential growth in the bandwidth needed to satisfy our simplest whims.

This bandwidth needs to be distributed within and across multiple facilities using more complex architectural designs, where spine-and-leaf networks no longer cut it. We are talking about super-spine and super-leaf networks that provide a highway for all of the complex algorithmic computing to flow between different devices and ultimately back to our receptors.

Technical deployment options for the data center

This is where fiber optics plays a key role in ensuring that the pictures or videos of your special (or silly) moments reach the world to be viewed, shared and commented on. Fiber has become the de facto transmission medium for our data center infrastructure, thanks to its high-speed and ultra-high-density capabilities compared with its copper cousin. As we move to higher network speeds, we are also introducing new complexity into the mix: which technology should we use?

Traditional three-tier networks use core, aggregation and edge switching to connect the different servers in the data center, where inter-server traffic travels north-south through active devices for servers to communicate with each other. Now, however, thanks in large part to the high compute requirements and interdependencies that AI and ML bring to the game, more of these networks are being implemented as two-tier spine-and-leaf networks, where servers talk to each other east-west because of the ultra-low-latency demands of the production and training networks.

Since the IEEE approved 40G and 100G in 2010, there have been many competing proprietary solutions that have somewhat clouded the judgment of users who are unsure which path to follow. To explain: before 40G and beyond, we had SR (short reach, multimode) and LR (long reach, single-mode). Both used a pair of fibers to transmit signals between two devices. No matter which device you used or which transceiver was installed in it, this was a simple data transaction over two fibers.

But when the IEEE approved solutions at 40G and beyond, alongside their competing proprietary siblings, the rules of the game changed. We are now looking at either two fibers using standards-approved or proprietary, non-interoperable WDM technology, or standards-approved, multi-source agreement (MSA) or engineered parallel optics using eight fibers (four transmit and four receive) or 20 fibers (10 transmit and 10 receive).

  • If you want to stay with standards-approved solutions and reduce optics costs because you don’t need the distance capability of single-mode fiber, you can choose multimode parallel optics, which also lets you break out higher-speed 40G or 100G switch ports into smaller 10G or 25G server ports. I cover this in more detail later in this article.
  • If you want to extend the life of your installed duplex fiber, don’t mind staying tied to your preferred hardware vendor with no interoperability, and have no need for longer distances, you can choose one of the multimode WDM solutions.

Now, I will tell you that most technology companies deploying AI at large scale are designing today’s and tomorrow’s networks with single-mode parallel optics. There are three simple reasons for this.

Cost and distance

The current market trend is that parallel optics solutions are developed and released first, with WDM solutions following a few years later, so the volumes of parallel optics are much higher, which drives down manufacturing costs. They also support shorter distances than the 2 km and 10 km WDM solutions, so you don’t need as many complex components to cool the lasers and to multiplex and demultiplex the signals at each end. Although we have seen the size and scale of these “hyperscale” facilities explode to the size of three or four football fields on large campuses, our own data shows that the average deployed length of single-mode fiber in these facilities has not exceeded 165 m. There is, therefore, no need to pay for more expensive WDM transceivers to drive distances they do not need to support.

Parallel single-mode also uses less power than the WDM variants. As we saw earlier in the Google example of power usage, anything that can reduce the single largest operating cost of the data center has to be a good thing.

Flexibility


One of the main advantages of deploying parallel optics is the ability to take a high-speed switch port, say 40G, and break it out into 4x10G server ports. Port breakout offers huge economies of scale because breaking out into lower-speed ports can reduce the number of electronics chassis or rack-mount units by as much as 3:1 (and data center real estate is not cheap). It also uses less power, which requires less cooling and further reduces energy costs; our data shows that this equates to a 30% saving on a single-mode solution. Transceiver vendors also confirm that a significant proportion of all parallel-optic transceivers shipped are deployed to take advantage of this port breakout capability.
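The chassis-count effect of port breakout can be sketched as a simple calculation. This is a minimal illustration with hypothetical switch sizes (a 48-port 10G switch versus a 36-port 40G switch broken out 4x10G); it ignores ports reserved for uplinks and is not meant to reproduce the 30% saving quoted above.

```python
import math


def switches_needed(server_ports: int, ports_per_switch: int,
                    breakout: int = 1) -> int:
    """Switches required to serve `server_ports` servers when each
    physical switch port fans out into `breakout` server-facing ports."""
    server_ports_per_switch = ports_per_switch * breakout
    return math.ceil(server_ports / server_ports_per_switch)


servers = 432  # hypothetical number of 10G server connections

# Native 10G ports: one server per switch port.
native = switches_needed(servers, ports_per_switch=48)

# 40G ports broken out into 4x10G: four servers per switch port.
with_breakout = switches_needed(servers, ports_per_switch=36, breakout=4)

print(f"Native 10G switches:      {native}")
print(f"40G switches with 4x10G:  {with_breakout}")
```

With these assumed sizes, the broken-out design needs a third of the chassis, and every chassis removed also removes its power draw and the cooling that goes with it.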

Simple and clear migration

The technology roadmaps of the major switch and transceiver vendors show a very clear and simple migration path for customers deploying parallel optics. As I mentioned, most technology companies follow this route, so when the optics become available and they migrate from 100G to 200G or 400G, their fiber infrastructure stays in place and no upgrade is required. Companies that decide to stay with a duplex, two-fiber infrastructure may find themselves wanting to upgrade beyond 100G only to discover that the WDM optics are not available within the timeframe of their migration plan.

Impact on data center design

From a connectivity perspective, these networks are heavily meshed fiber infrastructures designed to ensure that no server is more than two network hops from any other. However, the bandwidth demand is such that even the traditional 3:1 oversubscription ratio from the spine switch to the leaf switch is no longer sufficient; that ratio is now more typically used for distributed computing from the super-spines between different data halls.

Thanks to the significant increase in switch I/O speeds, network operators are striving for better utilization, higher efficiency and ultra-low latency by designing their systems with a 1:1 subscription ratio from spine to leaf, an expensive but necessary requirement in today’s AI environment.
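The difference between a 3:1 and a 1:1 design can be made concrete by comparing server-facing and spine-facing bandwidth at a leaf switch. The sketch below is illustrative only; the port counts and speeds (48x25G down, 100G uplinks) are assumptions, not figures from any particular deployment.

```python
import math


def oversubscription(down_ports: int, down_gbps: float,
                     up_ports: int, up_gbps: float) -> float:
    """Ratio of server-facing bandwidth to spine-facing bandwidth."""
    return (down_ports * down_gbps) / (up_ports * up_gbps)


def uplinks_required(down_ports: int, down_gbps: float,
                     up_gbps: float, target_ratio: float = 1.0) -> int:
    """Uplinks needed so the oversubscription ratio does not exceed
    `target_ratio` (1.0 means a non-blocking 1:1 design)."""
    return math.ceil(down_ports * down_gbps / (target_ratio * up_gbps))


# Hypothetical leaf: 48 x 25G server ports, 4 x 100G uplinks.
ratio = oversubscription(48, 25, 4, 100)
print(f"Oversubscription with 4 uplinks: {ratio:.0f}:1")

# The same leaf redesigned for 1:1.
print(f"100G uplinks needed for 1:1: {uplinks_required(48, 25, 100)}")
```

Under these assumptions, moving from 3:1 to 1:1 triples the number of uplinks, and with it the fiber count and spine port count, which is where the extra expense comes from.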

In addition, we are seeing another shift away from traditional data center design following Google’s recent announcement of its latest AI hardware, a custom ASIC called the Tensor Processing Unit (TPU 3.0), which, in its giant pod design, will be eight times more powerful than last year’s TPUs, at more than 100 petaflops. However, adding more computing power to the silicon also increases the amount of energy needed to drive it, and therefore the heat it produces. That is why the same announcement stated that Google is moving to liquid cooling to the chip: the heat generated by TPU 3.0 has exceeded the limits of its previous data center cooling solutions.

The conclusion

Artificial intelligence is the next wave of business innovation. The operational cost savings, additional revenue streams, simplified customer interactions and more efficient, data-driven ways of working that it brings are simply too attractive, not only to your CFO and shareholders but also to your customers. This was confirmed at a recent panel discussion, when the moderator talked about websites that use chatbots and claimed that if the chatbot was not efficient and customer-focused enough, he would drop the conversation and that company would never receive his business again.

Therefore, we must embrace this technology and use it to our advantage, which also means thinking differently about the design and implementation of the data center. As ASIC performance improves significantly, we will ultimately see an increase in I/O speeds, driving connectivity even deeper. Your data center will need ultra-efficient, high-fiber-count, ultra-low-latency, east-west spine-and-leaf networks that accommodate your day-to-day production traffic while supporting ML training in parallel, which conveniently brings me to wrap this up.

We have seen how the major technology companies have embraced AI and how deploying parallel single-mode has helped them achieve greater capital and operating cost savings than traditional duplex methods, which promise lower costs from day one. However, operating a data center starts on day two and continues to evolve, because our personal and professional habits and ways of communicating keep changing, increasing in speed and adding complexity. Installing the right cabling infrastructure solution now will allow your business to reap greater financial benefits from the outset, retain and attract more customers, and give your facility the flexibility to thrive, no matter what demands are placed on it.
