Beginning with the Emulex 10.2 software release, all current Emulex OneConnect® OCe14000 adapters gained a new free feature: support for SMB Direct, which runs over RDMA over Converged Ethernet (RoCE).
Remote Direct Memory Access (RDMA) is direct memory access from the memory of one computer into that of another without involving either computer's operating system. The figure below illustrates this direct memory access in a very simplified way.
RDMA introduces the concept of “zero copy” by allowing the network adapters to transfer data directly to or from application memory, eliminating the need to copy data between application memory and the data buffers in the operating system. This is extremely beneficial in large data centers processing massive data in clusters, as it reduces CPU utilization and improves latency and throughput.
With Windows Server 2012 and Windows Server 2012 R2, Microsoft introduced an extension of Server Message Block (SMB) technology called “SMB Direct,” which supports the use of network adapters that are RDMA capable.
The Emulex OneConnect OCe14000 adapter enables SMB Direct with RoCE, a protocol that allows RDMA to run over an Ethernet network. SMB Direct improves file server performance with lower CPU utilization, increased throughput and lower latency. Below is a sample configuration that was implemented and tested in our lab environment.
The implementation and configuration of SMB Direct with the Emulex OCe14000 adapter can be broken down into the following steps for Windows Server 2012 and Windows Server 2012 R2:
- Set up the OCe14000 adapter with the proper firmware, driver, profile, VLAN and IP addresses on the host and client.
- Configure Cisco 5548P switches with virtual LANs (VLANs), priority groups, Quality of Service (QoS) and Priority Flow Control (PFC) for RoCE traffic.
- Enable and verify Network Direct and set NetDirect MTU on the host and client.
- Create shares on the host and mount the share on the client.
- Validate and verify the SMB connection by copying a file to the share.
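On the Windows side, the enable/verify and share steps above can be sketched with standard PowerShell cmdlets. This is a minimal sketch, not our exact lab procedure; the adapter name, share name, path, account and host name are illustrative placeholders:

```powershell
# Host and client: confirm Network Direct (RDMA) is globally enabled
Get-NetOffloadGlobalSetting | Select-Object NetworkDirect
Enable-NetAdapterRdma -Name "SLOT 2 Port 1"     # adapter name is an example
Get-NetAdapterRdma                              # Enabled should report True

# Host: create the share
New-SmbShare -Name "RoCEShare" -Path "C:\Shares\RoCE" -FullAccess "DOMAIN\User"

# Client: mount the share, copy a file, then confirm RDMA was used
net use Z: \\smbhost\RoCEShare
Copy-Item C:\testfile.bin Z:\
Get-SmbMultichannelConnection                   # check the RDMA capable/used columns
```

If `Get-SmbMultichannelConnection` shows the connection as RDMA capable but RDMA is not being used, re-check the PFC configuration end to end before suspecting the adapter.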
A switch that supports PFC can be used for a larger configuration. However, if no switch is available, you can connect two hosts back-to-back for evaluating the Emulex OneConnect OCe14000 adapter.
For a complete step-by-step overview of the hardware, software, configuration and verification components needed to successfully deploy SMB Direct, please refer to the Implementer’s Lab under Application Note and select “How to Configure SMB Direct with OCe14000.”
VMware recently introduced a new driver model, called native mode, in vSphere 5.5. VMware ESXi 5.5 therefore has two driver models: the legacy “vmklinux” model and the new “native” model. Moving forward, Emulex supports the native mode driver model for ESXi 5.5. The Emulex Fibre Channel (FC) and Fibre Channel over Ethernet (FCoE) storage protocols are supported by the inbox native mode “lpfc” driver, and the Emulex Ethernet (or Network Interface Card (NIC)) functionality has an inbox native mode driver called “elxnet.” The only Emulex driver that remains vmklinux-based as of this writing is the “be2iscsi” driver for iSCSI support.
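As a quick check of which driver model each function is actually using, the loaded modules can be listed from the ESXi shell. A sketch, using the module names mentioned above:

```shell
# List the Emulex modules loaded on this ESXi 5.5 host
esxcli system module list | grep -E 'elxnet|lpfc|be2iscsi'

# Details for a specific module, e.g. the native NIC driver
esxcli system module get -m elxnet
```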
Emulex OCe14000 family of Ethernet and Converged Network Adapters bring new levels of performance and efficiency
Posted July 8th, 2014 by Mark Jones
When we launched the OneConnect® OCe14000, our latest Ethernet and Converged Network Adapters (CNAs), we touched on a number of performance and data center efficiency claims that are significant enough to expand on. The design goal of the OCe14000 family of adapters was to take the next step beyond what we had already delivered with our three previous generations of adapters, by meeting the performance, efficiency and scalability needs of the data center, Web-scale computing and evolving cloud networks. We believe we delivered on those goals and can claim some very innovative performance benchmarks.
UPDATE as of 10/21/14: We have made some VMQ updates and as a result posted the new 10.2.413.1 certified NIC driver on Emulex.com. The links below are for Emulex branded customers only. Please read the release notes carefully for important implementation details. Should you have any questions or need assistance contact Emulex tech support here.
Windows 2012 R2 page: http://www.emulex.com/downloads/emulex/drivers/windows/windows-server-2012-r2/drivers/
Below are the driver and firmware combinations that should be used for our OEM products supplied by HP and IBM. Please read and follow the specific instructions supplied by the OEM. Should you have any questions or need assistance contact the OEM technical support.
HP Customers: NIC driver 10.2.413.1, FW 10.2.431.2
IBM Customers: NIC driver 10.2.413.1, FW 10.2.377.24
UPDATE as of 9/9/14: For HP customers using Emulex 10GbE adapters, HP has made publicly available the latest code that addresses the VM disconnect issue when VMQ is enabled among other enhancements. The download portal is currently located here: http://ow.ly/Bi7Yt. Please read, understand and follow the update documentation provided by HP and contact HP tech support for further information. Thank you for your continued patience.
UPDATE as of 8/4/14: We are pleased to inform you that the July 2014 Special Release for Windows Server 2012 and Windows Server 2012 R2 CNA Ethernet Driver is now available for Emulex branded (non OEM) OCe111xx model adapters. Please refer to this link to download the driver kit and firmware. Please read and follow the special instructions within the Release Notes. For non-Emulex branded adapters, please contact Emulex Tech Support here.
UPDATE AS OF 7/23/14: Emulex is in the process of rolling out updated Microsoft Windows 2012 and 2012 R2 VMQ solutions for our customers. Testing of a Windows WHCK certified NIC driver update will be completed in 1-2 weeks. This initial “hotfix” will be for Emulex branded OCe11102 and OCe11101 products and will include a required firmware update. As testing completes on hotfix solutions for additional product configurations, notices and links will be posted on this blog. Thanks for your continued patience.
The Emulex OneConnect OCe14000 family of 10Gb and 40Gb Ethernet (10GbE and 40GbE) Network Adapters and Converged Network Adapters (CNAs) are the first of their kind to be designed and optimized for Virtual Network Fabrics (VNFs). Key to this claim is Emulex Virtual Network ExcelerationTM (VNeX) technology which, among other things, restores the hardware offloads that are normally lost because of the encapsulation that takes place with VNFs. For a VMware environment that is utilizing a Virtual Extensible LAN (VXLAN) VirtualWire interface, most Network Interface Cards (NICs) will see a significant reduction in throughput performance from losing the NIC hardware offloads, and a loss of hypervisor CPU efficiency, because the hypervisor must now perform much of the computation the NIC would otherwise have done. The OneConnect OCe14000 adapters by default use VNeX to restore the offload processing in the hardware, thus providing non-virtual network levels of throughput and hypervisor CPU efficiency in VNF environments.
To prove this point, we set up a VXLAN working model using two VMware ESXi 5.5 host hypervisors and configured a VXLAN network connection between them. Each server hosted eight RHEL 6.3 guest virtual machines (VMs) with network access between the hypervisors using the VMware VirtualWire interface. As a network load generator, we used IXIA IxChariot to perform network performance tests between the VMs. We compared two test cases: one with the hardware offloads enabled on the OCe14000 (the default behavior) and another with a NIC that does not utilize hardware offloads for VXLAN.
You can see in chart 1 that the bi-directional throughput with hardware offloads is as much as 70 percent greater when compared to a NIC without the hardware offloads.
In chart 2, you can see the impact that hardware offloads have on hypervisor CPU utilization: the OCe14000 adapter with VNeX, which processes the offloads in hardware, reduces CPU utilization by as much as half compared to standard NICs used for VMware VirtualWire connections, increasing the number of VMs supported per server.
Why am I unable to add a VMware vSphere ESXi 4.x and 5.x host to Emulex OneCommand® Manager for Windows?
Posted October 8th, 2013 by Alex Amaya
I recently talked to a few customers and OEM partners who are fairly new to Emulex OneCommand Manager for Windows. As of this writing, we are using Emulex OneCommand Manager for Windows version 6.3. An issue that keeps popping up is adding a vSphere ESXi 5.1 host in order to manage Emulex adapters. The process seems fairly straightforward, but with all of the management tools out there it can at times be confusing.
The issue I hear is, “How do I add a vSphere ESXi host to OneCommand Manager? It does not appear in the table of contents for discovered hosts, or I get a host-unreachable error.” These problems can be overcome by using the correct login information on the ESXi host and/or the correct namespace. We hope this blog helps answer these questions.
When you start Emulex’s OneCommand Manager for Windows, it will show the hosts that were previously added, as shown in Figure 1. To add a new host, select Discovery -> TCP/IP -> Add Host…
Figure 1 shows four hosts in the managed host section in the Emulex OneCommand Manager application.
If you leave the default values for either option, “Add using default credentials” or “Add using specific CIM credentials,” you will see an error message similar to the figure below, stating the host is unreachable.
Figure 2. Host unreachable due to default and incorrect login information
In order to successfully add a VMware vSphere host to Emulex OneCommand Manager for Windows you need to know a few things:
- Protocol (http or https)
- Port (Default: 5988 or 5989)
- Host name or IP address of the host
- The root login name and password
The protocol to use will be either http or https for ESX hosts. For ESXi hosts the protocol will be https, since http access to sfcb is disabled by default. The default port numbers for http and https are 5988 and 5989, respectively; we will use https with port 5989. The root login name and password are the ones you entered during the initial install of VMware vSphere. Do not use the default credentials that appear automatically when you try to discover the hosts.
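Before adding the host, a quick reachability check of the CIM ports from the management station can save some head-scratching. A sketch; the host IP below is a placeholder:

```shell
# 5989 open means the host's CIM service is answering over https
nc -z -w 3 192.168.1.50 5988 && echo "5988 (http) open"
nc -z -w 3 192.168.1.50 5989 && echo "5989 (https) open"
```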
As for namespaces, there are two you need to be aware of, depending on the provider you use: inbox or out-of-box. For example, if the latest Emulex CIM provider and VMware adapter driver are installed on the host, you need to make sure to add the correct namespace for the out-of-box provider. Figure 3 demonstrates the namespace to use for the out-of-box provider.
Figure 3. The red outline shows the fields which are important for adding the host to the table of contents besides the IP address or host name.
When you use the inbox driver with ESX/ESXi 4.x, make sure the namespace field has the correct information. The table below, from the Emulex OneCommand Manager manual, shows the namespace and recommended provider to use.
Table 1. Namespaces Used for Providers
To find out which provider is being used, log in to the vSphere host and check the name of the installed provider package. If the name begins with deb_, it is the inbox provider; if it begins with cross_, it is the out-of-box provider.
The example below shows how to find the provider being used.
~ # esxupdate --vib-view query | grep emulex-cim-provider
deb_emulex-cim-provider_410.2.0.32.1-207424      installed   2010-04-01T07:00:00+00:00
cross_emulex-cim-provider_410.3.1.16.1235786     installed   2010-10-11T09:39:04.047082+00:00
Last, when the correct user name, password and namespace are used, OneCommand Manager displays a message window stating the host has been added successfully, as shown in Figure 4.
Figure 4 shows the new vSphere ESXi 5.1 host on the discovered host table of contents
Hopefully this clarifies the challenge of adding a VMware vSphere ESXi host with Emulex OneCommand Manager for Windows. If you have any questions, please feel free to reach out to us at firstname.lastname@example.org
By Steve Perkins
A few months back, the question of “tuning” the Emulex FreeBSD driver came up, and it took me back to the days when I would spend evenings and weekends “tuning” largely unroadworthy 1960s and 1970s British cars (I live in the UK, so it seemed like a good idea at the time!). It was a belief that this was “performance tuning,” but in reality, if the thing started without a push, it was a bonus. But it always felt like the hours spent tweaking timings, gapping spark plugs and balancing Skinner Unions (SU) carburettors with a variety of tubes and tuning gadgets were worth all the time and blood lost. Network card driver tuning – what could be more fun?
If we look at the traditional customer base for Emulex products, it has been the sort of enterprise-level data centres who use traditional operating systems (OS) from the likes of Red Hat, SuSE, Microsoft, VMware and OEM UNIX derivatives. These “paid for” OSes (money up front and continuing support) have formed the backbone of our IT world and have been the focus of our driver development for Fibre Channel and Ethernet products.
But the IT world is changing and we are seeing new dynamic types of customer who are willing and able to take open source software to build new data centres for the world of big data and cloud solutions. One way Emulex has responded to this is to increase OS support outside of the “usual suspects” to embrace not only the community versions of Red Hat (Centos) but also Debian, Ubuntu and FreeBSD.
FreeBSD is an interesting OS that is often seen as a less showy alternative to the myriad of Linux distributions. Just getting its head down and getting on with the job, FreeBSD is quietly running everything from small home routers through TVs, switches and storage systems to data centres, as well as being the basis for Apple’s OS X.
Emulex has Ethernet driver support for FreeBSD for our range of 10GbE OneConnect (OCe1110x) Network Interface Cards (NICs) available for download here. But do the drivers “just” work or do they “really” work? Have we kept the performance capabilities we have built into the OneConnect NICs for Linux, Windows, ESXi etc. for FreeBSD users, or are we just another NIC port? This was a question we’ve had a few times recently in EMEA from customers looking for FreeBSD support. Well, actually the conversation usually goes along the lines of:
Customer: We use FreeBSD. Do you have any drivers for your NICs?
Emulex: Yes we do, you can get them at ….
Customer: Are they any good?
Emulex: Yes, they’re good quality and dependable drivers.
Customer: No, really?
Emulex: Yes, we have embedded RISC CPUs which offload TCP/IP …
So fuelled with the customers’ healthy scepticism that we are simply paying lip service to FreeBSD and all they get is basic NIC support, we thought it would be useful to check out and document not only the installation and configuration of the Emulex OneConnect NICs, but demonstrate the sort of performance that could be achieved.
Setting up a test environment always risks an argument, as we are faced with an almost infinite world of network configurations. Are we looking for database performance in a data centre, setting up video streaming servers, running a cloud data centre? The list goes on. Accepting that we’re not going to build a model of the Internet in the humble lab of the Emulex UK office, we settled on the very simplest configuration of two servers connected back-to-back, so we could simply look at the raw capabilities of the OneConnect NICs with FreeBSD. Using the industry standard Netperf tool is a bit like using IOmeter for testing storage I/O. There is a view that it is irrelevant to a real-world application, just a trade show trick, but I believe that it is of real value to strip back the complexities of the whole ecosystem to the raw components of performance. If the basic connectivity is broken, Netperf (or IOmeter) very quickly shows that something is not set up correctly. For example, if you have plugged a NIC into a PCIe x1 slot by mistake, it is just never going to give you full 10GbE line rate. Drive your car in first gear and your journey is going to be long and loud.
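For reference, a minimal sketch of this kind of Netperf run and the result parsing; the IP address and the sample results line are placeholders, not our lab numbers:

```shell
# On the receiver:  netserver
# On the sender:    netperf -H 192.168.10.2 -t TCP_STREAM -l 30
#
# Netperf reports throughput in 10^6 bits/sec as the last field of the
# results line; here we parse a captured sample line into Gb/s.
sample="87380  65536  65536    30.00    9416.23"
gbps=$(printf '%s\n' "$sample" | awk '{printf "%.2f", $NF/1000}')
printf '%s Gb/s\n' "$gbps"
```

Anything well short of 9 Gb/s or so on back-to-back 10GbE hardware is usually a setup problem (slot width, MTU mismatch, offloads disabled) rather than a driver problem.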
This type of testing is important as we needed to understand the baseline capabilities before analysing the wider network environment in the final system. We aimed to have a very simple and repeatable setup that could be quickly replicated as a starting point for broader system tuning.
We produced an application note on the whole exercise at www.ImplementersLab.com which goes into detail on the installation of the driver, how the tests were done and the results. We’ve even included the script we used to run the tests so we could hit “go” and come back from lunch to a nice set of results.
I have already alluded to the “humble” nature of the Emulex UK labs so the tests were run on fairly average level systems – nothing exotic levered from Intel’s back door here! Even so, we could easily get very close to line rate transfers without any great effort (see chart)
Interestingly enough, the default Maximum Transmission Unit (MTU) size of 1500 bytes showed very respectable performance compared to enabling jumbo (9K) frames under TCP tests, although streaming with UDP really got value from the larger frames.
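A quick back-of-the-envelope calculation helps explain why. Even at the standard MTU, the per-frame header tax is small; the sketch below assumes 40 bytes of IPv4+TCP headers per frame and ignores Ethernet framing overhead:

```shell
# Payload fraction of each frame at standard vs jumbo MTU
for mtu in 1500 9000; do
  awk -v m="$mtu" 'BEGIN { printf "MTU %d: %.1f%% payload\n", m, (m - 40) * 100 / m }'
done
```

The payload fraction only rises from roughly 97 percent to 99.6 percent, so a well-offloaded TCP stream has little to gain, while per-packet-limited UDP streaming benefits more from simply sending fewer, larger frames.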
Performance tuning of a system is a subject that can generate long email trails, but we need to consider the difference between tuning driver parameters and the broader OS parameters that are typically more relevant to the final application. Fortunately, the Emulex driver defaults are optimised to maximise the use of hardware offload within the NIC CPU. Although the OneConnect NICs are theoretically capable of full TCP Offload Engine (TOE) operation, this has fallen out of favour, and so we use stateless TCP offloads to effectively grab the subroutines required in TCP/IP and process them in hardware. This approach still gives offload performance but, compared to a full TOE implementation, allows the OS and applications access to all layers of the stack without a rewrite of the OS kernel. These stateless offload functions, such as hardware checksum calculations, VLAN tagging and TCP Segmentation Offload, are managed using the good old ifconfig command on the NIC. Running “ifconfig -m” on a NIC port will show the capabilities and which ones are enabled. For example:
root@ELXUKBSD91:/root # ifconfig -m oce0
oce0: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 1500
media: Ethernet autoselect
status: no carrier
This is the Emulex oce driver default configuration, and we can see that all hardware offload options are enabled. Apart from adjusting the MTU, there is little point in changing things, as all you will do is load up your CPU by moving hardware-accelerated functions back onto the host system.
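If you do want to experiment, the stateless offloads can be toggled per interface with ifconfig. A sketch using standard FreeBSD option names and the oce0 interface from the output above:

```shell
ifconfig oce0 -tso -txcsum -rxcsum   # disable TSO and checksum offloads
ifconfig oce0 tso txcsum rxcsum      # re-enable them (the driver default)
ifconfig oce0 mtu 9000               # jumbo frames, if the switch allows it
```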
So that was easy. To the car analogy, these days my car just works. It is all computer controlled to run at optimum performance, the right idle speed, perfect timing, and it starts the first time! Somewhere in my garage, I have a collection of timing lights, Colortune plugs, tubes for balancing carbs etc. but evenings of frozen and skinned knuckles spent tinkering just to stand a chance of completing a journey are thankfully over. My car is German.
For more information on going beyond the basic performance validation of a system, there is some good background information here, based on work done by the developer of the Nginx web server, and on the FreeBSD wiki. This should all be considered “work in progress,” as we all know the world of IT never stops evolving; in the field of performance, the raw capabilities will be the building blocks for understanding the whole system.
To complete the analogy (in the style of a blog!), getting from A to B in the fastest time is not about “does the car start” (without a push) and “can I keep most of the cylinders firing to complete a journey”. The car works as it should do just as the NIC runs predictably and reliably, but journey times are about how I choose the best route. I have GPS to find routes and live traffic data to guide me through the crowded road network and it is these tools that make the difference. Happy tinkering!
Some of these products may not be available in the U.S. Please contact your supplier for more information.
I recently attended the VMware Professional Community vBrownBag for Latin America, where Larry Gonzalez (@virtualizecr) mentioned that I had been selected as a vExpert for 2013! The selections were announced on May 28, 2013.
And what is a vExpert? vExperts have demonstrated significant contributions to the virtualization community and a willingness to share their virtualization evangelism with others around the world. This could be in the form of books, blogs, online forums, and VMUGs; and privately with customers and VMware partners.
I am honored to have been selected for the first time for doing something I love to do, and that’s helping to contribute to an amazing and expanding VMware community. The VMware social media and community team, including VMware’s John Troyer and Cory Romero, selected 581 vExperts for 2013, and I would like to congratulate all of the other vExperts selected!
The vExperts list for 2013 can be found here.
Blog Series Part 2: Can the global advanced disk parameter Disk.DiskMaxIOSize make a difference with software or hardware FCoE adapters running large block I/O in VMware vSphere® 5.1?
Posted May 29th, 2013 by Alex Amaya
This blog is the second in a two-part series that examines Fibre Channel over Ethernet (FCoE) implementations with VMware vSphere 5.1 using VMware’s software FCoE and hardware FCoE adapters. These blogs are intended to share our findings regarding the relative performance of software and hardware FCoE adapters when working with large-block, sequential I/O, in particular the impact of the Disk.DiskMaxIOSize setting on storage performance. Keep in mind that your results will be different, as not all environments are the same. Testing should be done to experience the behavior in your own lab environment.
In previous lab tests with software FCoE and a few virtual machines (VMs), we encountered an unexpected drop in throughput (MB/s) starting at around 64K block I/O. Once we made a change to Disk.DiskMaxIOSize, we were able to improve throughput; however, we continued to see poor latency response times.
As the second part of our experiment, we installed and tested a supported converged network adapter (CNA) featuring hardware FCoE (offload) using a single port. We left the default setting of 32767 in the advanced parameter settings. After running the tests, we looked at I/O operations per second (IOPS), throughput, CPU utilization and latency.
We first looked at the IOPS and throughput measurements. The chart below shows a familiar sloping curve in which IOPS are high with smaller blocks, along with high CPU utilization on the VM. Both software FCoE and hardware FCoE had similar slopes, but hardware FCoE produced more I/O operations at smaller block sizes. Both offered similar IOPS performance at larger block sizes.
Figure 1 shows the results for hardware FCoE sequential I/O tests when we used the default setting for Disk.DiskMaxIOSize.
Figure 1. Hardware FCoE adapter I/Os with default Disk.DiskMaxIOSize setting
Next, we wanted to know whether hardware FCoE behaved differently from software FCoE in terms of throughput, especially since this is where the software FCoE implementation had demonstrated some problems. We were pleased to find that hardware FCoE throughput results for the VMs were close to line rate, at around 2300MB/sec, starting with the 32K block tests. The chart below shows a nice rising slope that flattens out at line rate. A single port with hardware FCoE running sequential I/O should give you about 1150MB/sec in each direction; that is where the adapter tops out, but in our testing with a 50/50 duplex (bidirectional) mix we were able to reach line rate at 2300MB/sec, as the targets were able to handle the throughput and the VM was able to keep up with the block tests.
Figure 2 shows throughput during the same hardware FCoE adapter test and, in particular, the drop-off that occurred before with larger block sizes is not seen.
Figure 2. Results for hardware FCoE throughput with default Disk.DiskMaxIOSize setting
To be fair, we looked at the latency times using esxtop on the host to see if there might be a concern. We looked at the device average read and write latencies (DAVG) at the different block sizes. Results are shown in Table 1, which provides average, rather than median, values.
For good information on storage performance in vSphere, check out this VMware vSphere Blog.
Table 1. Average latency values with default setting
| Block size | DAVG read | DAVG write |
|------------|-----------|------------|
| 256K       | 5 ms      | 6 ms       |
| 512K       | 11 ms     | 12 ms      |
| 1M         | 12 ms     | 13 ms      |
In this test case, we see latency times much lower than with software FCoE. With hardware FCoE we did not need to change any parameters, since the CNA yielded good results. In addition, according to VMware’s esxtop, core CPU utilization at the three block sizes averaged around 3 percent.
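For readers wanting to reproduce the latency measurements, the DAVG values can be captured with esxtop on the ESXi host; the interval and iteration counts below are illustrative:

```shell
# Batch mode: sample every 10 seconds, 30 iterations, to a CSV for analysis
esxtop -b -d 10 -n 30 > fcoe_run.csv

# Interactively: run esxtop, press 'u' for the disk device view, and watch
# the DAVG/rd and DAVG/wr columns while the block-size tests run
```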
Here are the key takeaways:
- Hardware FCoE with the correct driver and firmware can handle large I/O block requests, resulting in higher throughput and keeping device latency for both read and write at an acceptable range.
- With software FCoE, we needed to use Disk.DiskMaxIOSize to get past the throughput hurdle and it came with some latency challenges.
- With hardware FCoE, there was no need to change the parameter; the default setting, as VMware suggests, should already be tuned. Core CPU utilization with large block sizes averaged around 3 percent, as expected.
In conclusion, our testing was really to understand the difference between software and hardware FCoE adapters. We decided to do this test when we noticed a sudden drop in throughput at the larger block I/O size when using a software FCoE adapter. By the way, we performed the same test with a Microsoft Windows Server and experienced similar behavior. We found a few suggestions online from bloggers with an advanced parameter setting and checked to see if it would make a difference with software FCoE, which in our case it did. We wanted to compare it to a hardware FCoE adapter. The results that we’ve laid out in this blog series demonstrate the differences in throughput behavior for the adapters.
Overall, this testing should not keep you from your own internal tests, but encourage you to do them. The application and the storage array, which are only part of the infrastructure, will also have an impact on your results.