
High Performance I/O: Conquering the Data Center’s Highest Peaks

Posted February 14th, 2014 by Mike Jochimsen

In my previous blog, “Great Migrations in IT – Cloud, Big Data and the Race for Web-Scale IT. It’s All About Business Agility!”, I talked about the efforts of enterprises to capture some of the cost and agility benefits inherent in the cloud service provider model. Today, I would like to focus on another topic: the need for high performance I/O in the data center, and what Emulex is doing to conquer the high peaks (performance requirements) that confront us in some of the world’s largest enterprise data centers (and increasingly, in much smaller data centers too).

In the world of mountain climbing, there is a list of mountains known as the Seven Summits: the highest peak on each of the seven continents. Very few people have conquered all seven of these peaks; in fact, as of the end of 2011, www.7summits.com listed 348 people known to have accomplished the feat.

To scale these peaks, one must possess a mix of traits, including strength, agility, adaptability, stamina and determination. These peaks, and the challenges they represent, remind me of the challenges many data centers face in meeting the ever-growing demands of their user populations. Explosive growth in the data available to the organization, which must be analyzed to provide real-time business intelligence, sounds like a Mount Everest-sized feat. Increasing the virtual machine density of servers to provide on-demand services to customers, while controlling both capital expenditures (CAPEX) and operational expenditures (OPEX), sounds like the effort it takes to scale Mount Aconcagua. And don’t even get me started on the high performance I/O it takes to meet my customers’ application performance needs on Mount McKinley. Okay, too far, but you get my meaning.

Recently, Emulex announced its new line of Ethernet Network Adapters and Converged Network Adapters (CNAs). This new line of products, the OCe14000 family, is based on the Emulex Engine (XE) 100 series of I/O controllers. A key feature of this controller is its ability to accelerate applications using Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE).

RoCE is a networking protocol and standard that directly addresses two key limitations of current compute and networking architectures: the overhead created by data copies between user (application) and kernel memory, and the latency introduced by the TCP/IP stack.

RDMA is fundamentally an accelerated I/O delivery mechanism. It introduces the concept of “zero-copy” data placement, which allows specially designed RDMA-capable Network Interface Cards (NICs), also called R-NICs, on both ends of a transaction to transfer data directly from the user memory of the source server to the user memory of the destination server, bypassing the operating system (OS) kernel.

Bypassing the kernel lets applications issue commands to the NIC without having to execute a kernel call. The RDMA request is issued from user space to the source (local) R-NIC and travels over the Ethernet network to the destination (remote) R-NIC without requiring any kernel involvement, reducing the number of context switches between kernel space and user space needed to handle network traffic.

Because the RDMA data transfer is performed by the DMA engine on the R-NIC, the CPU is not used for the memory movement, freeing it to perform other tasks, such as hosting more virtual workloads.
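To make these mechanics concrete, here is a minimal sketch, in C against the standard libibverbs API, of how an application posts a zero-copy RDMA write entirely from user space. It assumes the queue pair is already connected and that the remote buffer’s address and rkey (registered with remote-write access) have been exchanged out of band; error cleanup and completion handling are trimmed for brevity.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Post a zero-copy RDMA WRITE: the R-NIC's DMA engine moves len bytes
 * from our registered user buffer straight into the remote server's
 * registered user memory, with no intermediate data copy. */
int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp,
                       void *buf, size_t len,
                       uint64_t remote_addr, uint32_t rkey)
{
    /* Register (pin) the user buffer so the adapter can DMA from it.
     * This is the only step that involves the kernel, and a real
     * application would do it once up front, not per transfer. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,      /* source: local user memory */
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };

    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;  /* one-sided transfer */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;  /* destination user memory */
    wr.wr.rdma.rkey        = rkey;         /* remote registration key */

    /* Hand the work request to the adapter directly from user space:
     * no system call, no context switch into the kernel. */
    return ibv_post_send(qp, &wr, &bad_wr);
}
```

The application would then poll the completion queue (ibv_poll_cq) to learn when the adapter’s DMA engine has finished; the CPU is free for the duration of the transfer itself.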

RDMA reduces latency and improves throughput by bypassing the host TCP/IP stack, relying instead on high performance InfiniBand (IB) protocols at Layer 3 (L3) and above, in combination with industry-standard Ethernet at the Link Layer (L2) and Physical Layer (L1). RoCE leverages Converged Ethernet, also known as Data Center Bridging (DCB) or Converged Enhanced Ethernet, as a lossless networking medium. RoCE especially benefits from DCB Priority Flow Control (PFC) for lossless transmission, Enhanced Transmission Selection (ETS) for classes of service, and 802.1Qau Congestion Notification (QCN) for congestion avoidance. This bypass of the TCP/IP stack, in favor of IB protocols implemented in adapter hardware, is the fundamental differentiator and improvement of RoCE when compared with the older Internet Wide Area RDMA Protocol (iWARP), which carries RDMA over TCP.
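One practical consequence of this layering is visible in how a RoCE connection is set up: the application names its peer with an ordinary IP address and port, the librdmacm connection manager resolves that address to a route over the Ethernet fabric, and the resulting queue pair then runs the IB transport in adapter hardware. Below is a sketch against the standard librdmacm API; the host, port and queue depths are illustrative, and error cleanup is trimmed.

```c
#include <rdma/rdma_cma.h>
#include <string.h>

/* Connect to a RoCE peer. Note the addressing: an ordinary IP
 * address/port, because the IB transport rides on standard Ethernet. */
int roce_connect(const char *host, const char *port, struct rdma_cm_id **out)
{
    struct rdma_addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_port_space = RDMA_PS_TCP;      /* reliable-connected QP */

    if (rdma_getaddrinfo(host, port, &hints, &res))
        return -1;

    /* A NULL event channel puts the id in synchronous mode: each call
     * below blocks until the underlying step completes. */
    struct rdma_cm_id *id;
    if (rdma_create_id(NULL, &id, NULL, RDMA_PS_TCP))
        goto err;

    struct ibv_qp_init_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.cap.max_send_wr  = attr.cap.max_recv_wr  = 16;
    attr.cap.max_send_sge = attr.cap.max_recv_sge = 1;
    attr.qp_type = IBV_QPT_RC;

    if (rdma_resolve_addr(id, res->ai_src_addr, res->ai_dst_addr, 2000) ||
        rdma_resolve_route(id, 2000) ||
        rdma_create_qp(id, NULL, &attr))
        goto err;

    struct rdma_conn_param param;
    memset(&param, 0, sizeof(param));
    if (rdma_connect(id, &param))
        goto err;

    rdma_freeaddrinfo(res);
    *out = id;   /* id->qp is now ready for verbs such as the write above */
    return 0;
err:
    rdma_freeaddrinfo(res);
    return -1;
}
```

From here the data path never re-enters the TCP/IP stack; the DCB features described above simply ensure the underlying Ethernet fabric does not drop the frames the IB transport expects to arrive.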

In a simplified RoCE architecture with zero-copy data placement, the removal of the TCP/IP stack and of a data copy step reduces overall latency to deliver accelerated application performance.

As this feature is deployed in the coming months, we plan to provide explicit support for protocols such as Microsoft Windows Server’s Server Message Block (SMB) 3.0 Direct and Linux Network File System (NFS). These protocols represent use cases and applications in our customer base that are ripe for high performance I/O capabilities.

With Windows Server 2012 R2, Microsoft enabled Hyper-V live migration to leverage SMB Direct. This allows a virtual machine’s data to travel more efficiently to the destination server, effectively accelerating the act of migrating a virtual machine across the network. And thanks to the CPU efficiency of RDMA, this is accomplished without imposing a huge CPU tax on the server during the migration. According to Microsoft, additional SMB Direct acceleration capabilities include:

File storage for virtualization (Hyper-V™ over SMB). Hyper-V can store virtual machine (VM) files, such as configuration, virtual hard disk (VHD) files, and snapshots, in file shares over the SMB 3.0 protocol. This can be used for both standalone file servers and clustered file servers that use Hyper-V together with shared file storage for the cluster.

Microsoft SQL Server over SMB. SQL Server can store user database files on SMB file shares. Currently, this is supported with SQL Server 2008 R2 for stand-alone SQL servers. Upcoming versions of SQL Server will add support for clustered SQL servers and system databases.

Traditional storage for end user data. The SMB 3.0 protocol provides enhancements to the Information Worker (or client) workloads. These enhancements include reducing the application latencies experienced by branch office users when accessing data over wide area networks (WAN) and protecting data from eavesdropping attacks.

As you can see, traditional Microsoft shops have plenty of opportunities to leverage the high performance I/O capabilities of an R-NIC using SMB Direct.  Truly a McKinley-esque need.

Similar to Microsoft’s SMB, NFS is used as a standard access protocol by many Network Attached Storage (NAS) appliances to provide remote file sharing to multiple diverse clients.  Performance is a primary consideration when large files are moved around networks, hence the need for high performance I/O via RDMA.

Increasingly, NAS and NFS are also being considered for relational databases, such as Oracle, which have traditionally been housed on Storage Area Network (SAN)-connected arrays. With native RDMA support, NFS becomes a stronger option for Oracle databases. Due to their mission-critical nature, databases demand high performance, low latency storage, and that need increases the pressure on the storage and I/O community to provide this robust, high performance capability.

With NFS, the client (e.g., the Oracle database server) controls the file system, and the NAS device appears to it as local storage. Because of this, vendors have been able to optimize I/O for their specific needs, greatly enhancing performance. However, the one area that cannot be optimized by the application is the network.
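That transparency is worth underlining: because the NFS mount looks like any local directory, the database or application needs no RDMA-specific changes; whether the traffic underneath rides TCP or RDMA is a property of the mount, not of the code. A trivial sketch in C (the mount path and file name are hypothetical):

```c
#include <fcntl.h>
#include <unistd.h>

/* Read from a file on an NFS mount exactly as if it were local disk.
 * The transport beneath the mount (TCP or RDMA) is invisible at this
 * layer; the path below is hypothetical. */
ssize_t read_datafile(char *buf, size_t len)
{
    int fd = open("/mnt/nas/datafile.dbf", O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, len);
    close(fd);
    return n;
}
```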

Enter Emulex and our high performance I/O. By providing an enterprise-class solution that optimizes the balance between I/O operations per second (IOPS), throughput and latency, Emulex is building on our foundation of providing the highest performing, most reliable connectivity solutions for our customers. The Emulex Engine™ (XE) 100 controller, and the OCe14000 family of adapters built on it, establish a base camp for conquering the highest peaks.

When combined with features such as storage protocol offloads for FCoE and iSCSI and overlay network offloads for optimizing the efficiency of software-defined networking (SDN), the OCe14000 family of Ethernet Network Adapters will provide data centers with the tools necessary to scale their challenging peaks, much like mountain climbers use belay devices, crampons and pitons to traverse dangerous terrain.