Emulex Blogs

Why do I need hardware offloads, I have CPUs to burn!

Posted March 7th, 2012 by Mark Jones

It wasn’t that long ago that enterprise x86 computing was performed on single processor cores running at just a few megahertz (MHz). Getting data in and out of the computer was an expensive consumer of processing resources. If you were serious about I/O, it made perfect sense to buy one of those fancy Host Bus Adapters (HBAs) that offloaded I/O protocol processing to specialized processors built just for that purpose, freeing the host processor for other general compute work. Since then, processor technology has marched forward at a tremendous pace: clock speeds have climbed from a few MHz to ~3 GHz, which is now the practical limit due to power and thermal constraints. Multithreading, multiple cores and larger processor caches have also been big news in computing, to the point where we can now pack a tremendous amount of compute power into a very small space in the data center.
Why do I need hardware offloads? I have CPUs to burn!
This week, Intel announced availability of its new Xeon E5-2600 processor family, the platform codenamed “Romley.” Top models will be offered by server manufacturers with 16 physical cores and a whole menu of other great technologies to improve performance and efficiency. So with all this new compute power, you may be thinking: “Why do I need hardware offloads? I have CPUs to burn!”

Wikipedia is the first place to look to throw water on that fire. Moore’s Law (1) famously predicts the long-term growth of compute power: roughly a doubling of processor performance every 18 months. Related to it is Wirth’s law, (2) which states that “software is getting slower more rapidly than hardware becomes faster,” and Gates’s law: “the speed of commercial software generally slows by 50% every 18 months.” So no matter how fast hardware gets, the data center will evolve to find a way to consume all of its resources through software.
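As a back-of-the-envelope illustration of what an 18-month doubling period implies (a sketch of the compound-growth arithmetic only; the 15-year span below is an illustrative assumption, not a figure from this post):

```python
# Compound-growth illustration of an 18-month doubling period,
# the figure quoted above for Moore's Law.

def growth_factor(years, doubling_period_years=1.5):
    """Performance multiple after `years`, doubling every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)

# Ten doublings take 15 years at one doubling per 18 months,
# for a roughly 1000x increase in that span:
print(growth_factor(15))  # 1024.0
```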

If you have worked in a data center during this technology march, you have noticed that compute power is being packed ever more densely: it is now possible to fit hundreds, if not a few thousand, cores into a single rack. This has shifted the data center problem from performance capacity to power and cooling capacity. It’s not about how many servers can fit in a room; it’s about the room’s maximum power and cooling capacity. You do not have to look too closely at Intel’s Xeon Processor E5-2600 product announcement to notice that much of what it promotes are features that deliver performance at efficient power levels and that lower power consumption when it is not needed. Turbo Boost Technology 2.0, for instance, raises CPU performance (and power draw) only when needed and reduces it otherwise. We have noticed in the lab that these power efficiency features have a significant effect on a server’s power consumption as measured at the AC power cord. In our Implementer’s Lab, we have measured a swing of roughly 110 watts at the power cord between a server at idle and one heavily loaded (~80% CPU usage).

Emulex HBAs and converged network adapters (CNAs) offload I/O with low-power processors purpose-built to process I/O protocols far more efficiently than a general-purpose system processor, and they complement the new power/performance efficiency features of the Xeon E5-2600 product family. By offloading protocol processing from the server’s operating software stack, we lower the CPU load significantly, which lets the CPU engage its power-saving strategies and results in far lower system power usage.

As an example, take a server running VMware ESXi 5 and compare realistic virtual machine (VM) I/O workloads to storage devices over a Fibre Channel over Ethernet (FCoE) network. You have a choice of running software FCoE over a 10Gb Ethernet (10GbE) network interface card (NIC) or using an Emulex CNA, which offloads the FCoE protocol processing. Our test used four VMs, each with an equal load to storage of 35k I/O transactions per VM. We measured both the CPU used on the hypervisor and the server’s AC input power, and found that the server used 53% of its overall CPU resources when running the I/O over software FCoE and just 23% when using the offload CNA. Saving 30% of a server’s CPU resources is enough to trigger its power-saving strategies, and this showed up in the input power measurements. At idle with no I/O workload running, the server drew 110 watts. Running the I/O over software FCoE, it drew 167 watts; running over our CNAs with hardware FCoE, it drew 129 watts. The server used 37 fewer watts at the same performance level, a significant power savings that adds up over time or when applied throughout the data center.
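To see how a per-server delta like that adds up, here is a rough sketch of the annualization arithmetic. The electricity rate, PUE (cooling overhead factor), and fleet size below are illustrative assumptions, not figures from this post; only the 37-watt delta comes from the measurement above.

```python
# Rough annualized savings from a ~37 W per-server power delta.
# Electricity rate, PUE, and server count are illustrative assumptions.

HOURS_PER_YEAR = 24 * 365  # 8760

def annual_savings_kwh(watts_saved, pue=1.8):
    """kWh saved per server per year; PUE folds in cooling overhead."""
    return watts_saved * pue * HOURS_PER_YEAR / 1000.0

def annual_savings_usd(watts_saved, servers, rate_per_kwh=0.10, pue=1.8):
    """Dollar savings per year across a fleet of servers."""
    return annual_savings_kwh(watts_saved, pue) * servers * rate_per_kwh

per_server = annual_savings_kwh(37)           # ~583 kWh per server per year
fleet = annual_savings_usd(37, servers=500)   # ~$29,000/yr for 500 servers
print(per_server, fleet)
```

Even with modest assumptions, a delta that looks small at the power cord becomes meaningful once multiplied by cooling overhead, hours in a year, and rack after rack of servers.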
Remember…it takes energy to cook!
So the next time you get a new super-fast server and you are tempted to burn some of its CPU cycles on running software FCoE or software iSCSI, remember…it takes energy to cook!

(1) Gordon E. Moore, 1965, periodically updated by Intel: http://en.wikipedia.org/wiki/Moore%27s_law
(2) Niklaus Wirth, 1995: http://en.wikipedia.org/wiki/Wirth%27s_law