This study measured the difference in EUC session density between the Intel Xeon E5-2680v3 and Intel Xeon E5-2680v4 CPUs on Cisco UCS B200 M4 blade servers. The short version: for both XenApp (Hosted Shared Desktops) and XenDesktop (Hosted Virtual Desktops), the v4 processors delivered slightly more than 20% higher session density without an increase in power consumption.
Good news! Upgrading to the latest generation processors does not require the purchase of new servers. The Cisco UCS B200 M4 blades support both the Haswell and Broadwell architectures. Be sure to upgrade to UCS firmware 3.1(1g) or later before making hardware changes.
The components used for the study were Cisco® Unified Computing System (UCS) B200 M4 blade servers, Cisco Nexus 9000 series switches, a NetApp® AFF8080EX-A storage system, VMware vSphere ESXi 6.0 Update 1, Citrix Provisioning Services 7.7, and Citrix XenApp and XenDesktop 7.7 software.
To generate load in the environment, Login VSI 4.1.4 software from Login Consultants (www.loginvsi.com) was used to create desktop connections, simulate application workloads, and track application responsiveness. In this testing, the official Knowledge Worker workload (in benchmark mode) was used to simulate productivity tasks (Microsoft Office, Internet Explorer with HTML5 video, printing, and PDF viewing) for a typical knowledge worker.
Login VSI records response times during workload operations. Rising response times indicated that the system configuration was saturated and had reached its maximum user capacity. Comprehensive metrics were captured across the full virtual desktop lifecycle: user login and desktop acquisition (login/ramp-up), user workload execution (steady state), and user logoff. Performance monitoring scripts tracked resource consumption for infrastructure components.
Each test run was started from a fresh state after restarting the blade servers. To begin the testing, we took all desktops out of maintenance mode, started the virtual machines, and waited for them to register. The Login VSI launchers initiated desktop sessions and began user logins (the login/ramp-up phase). Once all users were logged in, the steady state portion of the test began in which Login VSI executed the application workload.
Test cases conducted:
- Testing single server scalability under a maximum recommended RDS load. The maximum recommended single server user density occurred when CPU utilization reached a maximum of 90-95%.
- Testing single server scalability under a maximum recommended VDI Non-Persistent load. The maximum recommended single server user density occurred when CPU utilization reached a maximum of 90-95%.
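The selection rule shared by both test cases can be sketched in a few lines of Python. This is only an illustration of the idea, not the tooling used in the study: the function name `recommended_density`, the `cpu_ceiling` parameter, and the sample numbers are all hypothetical, and the sample values are made up rather than taken from the test runs.

```python
def recommended_density(runs, cpu_ceiling=95.0):
    """Return the largest session count whose peak CPU utilization
    stays at or below cpu_ceiling, or None if no run qualifies.

    runs maps a session count to the peak CPU utilization (percent)
    observed for that run. The 95% default is the top of the 90-95%
    band named in the test cases.
    """
    within = [sessions for sessions, peak_cpu in runs.items()
              if peak_cpu <= cpu_ceiling]
    return max(within) if within else None


# Made-up sample values for illustration only (not measured data).
sample_runs = {100: 61.0, 120: 74.5, 140: 92.0, 160: 99.0}

print(recommended_density(sample_runs))
```

With these sample values, the 140-session run is the largest one that peaks inside the band, so it would be reported as the recommended maximum, while the 160-session run is rejected for exceeding the ceiling.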
The following test data summarizes the two test cases and the maximum RDS and VDI user densities achieved in each.
XenApp 7.7 (RDS) test case with Windows Server 2012 R2
We started by testing single server scalability for XenApp hosted shared desktop sessions (RDS) running the Login VSI 4.1.4 Knowledge Worker workload. A dedicated blade server ran eight VMs hosting Windows Server 2012 R2 with XenApp 7.7 sessions. This test determined that the recommended maximum density was 240 RDS sessions with the E5-2680v3 processors and 290 RDS sessions with the E5-2680v4 processors. The graphs below show the VSI results along with resource utilization metrics for the single server RDS Knowledge Worker workload.
A gain of roughly 20% (240 to 290 sessions) was measured from Haswell to Broadwell, with only a marginal difference in power consumption.
Figure 1: Single Server Scalability, XenApp 7.7 RDS, CPU Utilization with 2680v4
Figure 2: Single Server Scalability, XenApp 7.7 RDS, Power Utilization with 2680v4
Figure 3: Single Server Scalability, XenApp 7.7 RDS, CPU Utilization with 2680v3
Figure 4: Single Server Scalability, XenApp 7.7 RDS, Power Utilization with 2680v3
Figure 5: Single Server Scalability, XenApp 7.7 RDS, VSI v4.1 Density & Response Times
Figure 6: Single Server Scalability, XenApp 7.7 RDS, VSI Comparison Chart
XenDesktop 7.7 (VDI) test case with Windows 7 32-bit SP1
Next, we tested single server scalability for XenDesktop hosted virtual desktop sessions (VDI) running the Login VSI 4.1.4 Knowledge Worker workload. A dedicated blade server ran VMs hosting Windows 7 with XenDesktop 7.7 sessions. This test determined that the recommended maximum density was 195 VDI sessions with the E5-2680v3 processors and 235 VDI sessions with the E5-2680v4 processors. The graphs below show the VSI results along with resource utilization metrics for the single server VDI Knowledge Worker workload.
The user density increase from Haswell to Broadwell for the VDI workload closely tracks the RDS result, at roughly 20% (195 to 235 sessions). And, much like RDS, VDI testing yielded a negligible difference in power consumption between the two processors.
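For readers who want to check the arithmetic, here is a minimal sketch that computes the relative density gain from the session counts reported in the two test cases. The helper name `density_gain` and the `results` dictionary are just illustrative names, and the only inputs are the recommended maximum densities stated above.

```python
def density_gain(baseline_sessions, upgraded_sessions):
    """Return the relative gain of upgraded over baseline, in percent."""
    return (upgraded_sessions - baseline_sessions) / baseline_sessions * 100


# Recommended maximum densities from the two test cases:
# (E5-2680v3 sessions, E5-2680v4 sessions)
results = {
    "XenApp 7.7 (RDS)": (240, 290),
    "XenDesktop 7.7 (VDI)": (195, 235),
}

for name, (v3, v4) in results.items():
    print(f"{name}: {v3} -> {v4} sessions, +{density_gain(v3, v4):.1f}%")
```

Both workloads land just above 20%: about 20.8% for RDS and about 20.5% for VDI.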
Figure 7: Single Server Scalability, XenDesktop 7.7 VDI, CPU Utilization with 2680v4
Figure 8: Single Server Scalability, XenDesktop 7.7 VDI, Power Utilization with 2680v4
Figure 9: Single Server Scalability, XenDesktop 7.7 VDI, CPU Utilization with 2680v3
Figure 10: Single Server Scalability, XenDesktop 7.7 VDI, Power Utilization with 2680v3
Figure 11: Single Server Scalability, XenDesktop 7.7 VDI, VSI v4.1 Density & Response Times
Figure 12: Single Server Scalability, XenDesktop 7.7 VDI, VSI Comparison Chart
Intel E5-2600 v4 Processors
The powerful new Intel® Xeon® processor E5-2600 v4 product family offers versatility across diverse workloads. These processors are designed for architecting next-generation data centers running on software-defined infrastructure, supercharged for efficiency, performance, and agile services delivery across cloud-native and traditional applications. They support workloads for cloud, high-performance computing, networking, and storage.
Broadwell is Intel's fifth generation of Core-series processors and sets the bar for what today's CPU platforms can achieve. How much denser is Broadwell? Haswell uses 22nm transistors, whereas Broadwell uses 14nm. The first Core processors, back in 2006, used 65nm transistors, which are huge by comparison. A lot of progress was made in those eight years.
A significant characteristic of Broadwell is that its chips are 30% more efficient than Haswell, using 30% less power while providing better performance at the same relative clock speed. As this study found, the v4 processors have a positive impact on VDI and RDS workloads, increasing user scalability without additional power consumption.
The following table provides a feature comparison between the two tested processors.
The test results show how the latest generation Intel CPUs can expand and flex, allowing deployments to grow and support greater RDS and VDI workloads. The Cisco UCS B200 M4 blade servers offer high performance to support extraordinary RDS/VDI densities while maintaining datacenter OpEx cost.
Look for v4 processors to be the new standard in the EUC CVD solutions moving forward.
— Frank Anderson, Senior Solutions Architect, Cisco Systems, Inc. (@FrankCAnderson)