
The documentation is available for download in the following formats:

  • PDF – Adobe Reader
  • ePub – Available in various apps on iPhone, iPad, Android, Sony Reader, or Windows Phone
  • Mobi – Kindle device or Kindle app
  • HTML – Internet browser

 

Welcome to what I believe is easily the best EUC high-scale solution and publication ever released. In terms of best-of-breed components, architecture, validation, and guidance, this one has got it all.

FlexPod Datacenter with Cisco UCS B-Series and NetApp All Flash FAS is the underlying building block for this solution, simplifying deployments while supporting outstanding density and proven mixes of XenApp and XenDesktop workloads. While FlexPod provides a cookie-cutter solution, this CVD demonstrates that a turnkey solution can also be highly scalable. Deployments share a common architecture, component design, configuration procedures, and management. At the same time, configurations can scale and expand to accommodate greater user densities for hosted shared desktops (RDS) or hosted pooled virtual desktops (VDI).

The CVD describes a base 28-blade FlexPod with Cisco UCS B-Series configuration supporting 5,000 users (2,600 RDS, 1,200 Non-Persistent VDI, and 1,200 Persistent VDI users). Cisco UCS B200 M4 Blade Servers were added to the base configuration to support workload expansion and larger densities. All configurations followed a fault-tolerant N+1 design for infrastructure and RDS/VDI clusters. To size and validate workload combinations, we conducted single and multiple blade server scalability tests using Login VSI software. The complete FlexPod CVD documents the step-by-step procedures used to create the test environment and includes all of the test results, which are highlighted here.

  

Figure 1: Reference architecture components in the FlexPod EUC solution

Figure1_bluebk.png

Figure1b_bluebk.png

   

Key Solution Advantages

 

The CVD architecture offers significant benefits to enterprise business deployments:

 

  • Scalable desktop virtualization for the enterprise. Powerful Cisco UCS blade servers enable high user densities with the very best user experience. Twenty-eight blades provide support for 5,000 XenApp and XenDesktop users. The NetApp AFF8080EX-A features exceptionally low-latency flash storage over an end-to-end Ethernet fabric. 
  • Self-contained solution. The FlexPod architecture defines an entirely self-contained solution with the infrastructure needed to support a mix of up to 5,000 Citrix XenApp and XenDesktop users. The entire design fits in a single full-size data center rack: the four blade server chassis require 24 RU, the NetApp storage and shelves take 10 RU, and the switches occupy 4 RU (see the quick tally after this list). This conserves valuable data center rack space and simplifies deployments.
  • Fault-tolerant design. The architecture defines redundant infrastructure and workload VMs across multiple physical Cisco UCS blade servers, optimizing availability to keep users productive.
  • Easy to deploy and manage. UCS Manager can manage Cisco UCS servers in the FlexPod solution along with other Cisco UCS blade and rack servers in the management domain. Cisco UCS Performance Manager provides monitoring insights throughout the environment. Cisco UCS Central can extend management across Cisco UCS Manager domains and centralize management across multiple deployments.
  • Fully validated and proven solution. The CVD defines a reference architecture that has been tested under aggressive usage scenarios, including boot and login storms. Test requirements mandated that each configuration boot within 15 minutes and complete all logins within 48 minutes at the peak user densities tested. All test scenarios were repeated three times to ensure consistency and reliability.
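
As a quick tally of the rack space cited above, and assuming the standard 6 RU Cisco UCS blade chassis and a typical 42 RU data center rack, the components add up to well under a full rack:

  4 blade chassis × 6 RU = 24 RU
  NetApp storage and shelves = 10 RU
  Switches = 4 RU
  Total = 38 RU, leaving headroom in a 42 RU rack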

 

Testing Methodology

To generate load in the environment, we used Login VSI 4.1.4 software from Login Consultants (www.loginvsi.com) to initiate desktop connections, simulate application workloads, and track application responsiveness. The official Knowledge Worker workload, run in benchmark mode, simulated the office productivity tasks of a typical knowledge worker (Microsoft Office, Internet Explorer with Flash, printing, and PDF viewing).

Login VSI records response times during workload operations. Increased latencies in response times indicated when the system configuration was saturated and had reached maximum user capacity. During the testing, comprehensive metrics were captured during the full virtual desktop lifecycle: desktop boot-up, user login and desktop acquisition (login/ramp-up), user workload execution (steady state), and user logoff. Performance monitoring scripts tracked resource consumption for infrastructure components.

Login VSI 4.1.4 features updated workloads that reflect more realistic user workload patterns. In addition, the analyzer functionality has changed: the new Login VSI index, VSImax v4.1, uses a new calculation algorithm and a scale optimized for high densities, so it is no longer necessary to launch more sessions than VSImax.
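
To make the VSImax v4.1 concept concrete, here is a minimal Python sketch of a VSImax-style calculation. It is an illustration only, not Login VSI's exact algorithm: the way the baseline is derived, the 1,000 ms threshold offset, and the sample measurements are all simplifying assumptions.

  # Illustrative VSImax-style calculation (simplified; not Login VSI's exact algorithm).
  # response_times maps an active session count to the average VSI response time in ms.
  def estimate_vsimax(response_times, threshold_offset_ms=1000):
      """Return (baseline, threshold, vsimax), where vsimax is the highest session
      count whose response time stays at or below baseline + threshold_offset_ms."""
      counts = sorted(response_times)
      # Baseline: average response time of the lightest-loaded samples (assumption).
      lightest = [response_times[c] for c in counts[:5]]
      baseline = sum(lightest) / len(lightest)
      threshold = baseline + threshold_offset_ms
      vsimax = None
      for count in counts:
          if response_times[count] <= threshold:
              vsimax = count      # still responsive at this session count
          else:
              break               # saturation reached
      return baseline, threshold, vsimax

  # Hypothetical measurements (active sessions: average response time in ms).
  measurements = {50: 760, 100: 820, 150: 930, 200: 1150, 250: 1900, 300: 2600}
  print(estimate_vsimax(measurements))   # -> (1112.0, 2112.0, 250)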

Each test run was started from a fresh state after restarting the blade servers. To begin the testing, we took all desktops out of maintenance mode, started the virtual machines, and waited for them to register. The Login VSI launchers initiated desktop sessions and began user logins (the login/ramp-up phase). Once all users were logged in, the steady state portion of the test began in which Login VSI executed the application workload.  

Test metrics were gathered from the hypervisor, virtual desktops, storage, and load generation software to assess the overall success of each test cycle. A test cycle was considered passing only if all of the planned test users completed the ramp-up and steady state phases within the required time frames and all metrics remained within permissible thresholds. Three test runs were conducted for each test case to confirm that the results were relatively consistent.
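
As an illustration of how a single test cycle might be scored against these criteria, the short Python sketch below checks a run's summary metrics against the boot, login, and CPU thresholds cited in this document. The field names and the exact set of checks are assumptions for the example, not the CVD's actual scoring logic.

  # Hypothetical pass/fail evaluation for one test cycle (field names are illustrative).
  def cycle_passes(metrics):
      checks = {
          "boot_time":   metrics["boot_minutes"] <= 15,    # boot completes within 15 minutes
          "login_time":  metrics["login_minutes"] <= 48,   # all logins complete within 48 minutes
          "all_users":   metrics["users_completed"] == metrics["users_planned"],
          "cpu_ceiling": metrics["steady_state_cpu_pct"] <= 95,  # 90-95% ceiling from the single-server tests
      }
      failed = [name for name, ok in checks.items() if not ok]
      return len(failed) == 0, failed

  ok, failed = cycle_passes({
      "boot_minutes": 12,
      "login_minutes": 47,
      "users_planned": 2600,
      "users_completed": 2600,
      "steady_state_cpu_pct": 92,
  })
  print(ok, failed)   # -> True []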

Seven different test cases were conducted:

  1. Testing single server scalability under the maximum recommended RDS load. The maximum recommended single server user density was the point at which CPU utilization reached 90-95%.
  2. Testing single server scalability under the maximum recommended VDI Non-Persistent load. The maximum recommended density was the point at which processor utilization reached 90-95%.
  3. Testing single server scalability under the maximum recommended VDI Persistent load. Again, the maximum recommended density was the point at which processor utilization reached 90-95%.
  4. Cluster testing of multiple server scalability using a 12-blade configuration under a 2,600-user RDS workload.
  5. Cluster testing of multiple server scalability using a 7-blade configuration under a 1,200-user VDI Non-Persistent workload.
  6. Cluster testing of multiple server scalability using a 7-blade configuration under a 1,200-user VDI Persistent workload.
  7. Full scale testing of multiple server scalability using a 28-blade configuration under a mixed 5,000-user workload.

Test Configurations

 

Figure 2 shows the infrastructure and workload distribution, and Figure 3 shows the VM configurations for each of the seven test cases. In addition to the RDS and VDI workload VMs, infrastructure servers were defined to host the Cisco Nexus 1000V VSM, Cisco UCS Performance Manager, XenDesktop Delivery Controllers, Studio, StoreFront, Licensing, Director, Citrix Provisioning Services (PVS), SQL, Active Directory, DNS, DHCP, vCenter, and the NetApp Virtual Storage Console.

Figure 2: Infrastructure and workload distribution.

Figure2_bluebk.png

Figure 3: VM configurations for the seven test cases.

Figure3_bluebk.png

 

For the cluster and full scale tests, multiple workload and infrastructure VMs were hosted across more than one physical blade. To validate the design as production ready, N+1 server high availability was incorporated. 

 

Table 1 shows the VM definitions used for the RDS and VDI workload servers. Different virtual CPU (vCPU) configurations were tested first, and the best performance was achieved when CPU resources were not overcommitted. Optimal performance was observed when each RDS VM was configured with six vCPUs and 24GB of RAM, and each VDI VM was configured with two vCPUs and 1.7GB of RAM.

  Table 1: RDS and VDI VM configurations

Table1.png
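
The RDS sizing in Table 1 lines up with each blade's physical resources. Assuming the dual 12-core Intel Xeon E5-2680 v3 processors and 384GB of RAM per B200 M4 blade described in the findings below (24 cores, or 48 logical threads with Hyper-Threading enabled):

  8 RDS VMs × 6 vCPUs = 48 vCPUs, matching the blade's 48 logical threads (no CPU overcommit)
  8 RDS VMs × 24GB RAM = 192GB, half of the 384GB installed per blade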

This CVD used Citrix Provisioning Server 7.7. When planning a PVS deployment, design decisions must be made about PVS vDisk and write cache placement. In this CVD, the PVS write cache was placed in RAM with overflow to disk (NFS storage volumes), reducing the number of IOPS sent to storage. PVS vDisks were hosted on NetApp CIFS/SMB 3 shares, allowing the same vDisk to be shared among multiple PVS servers while providing resilience in the event of a storage node failover.

 

Main Findings

Figure 4 summarizes the seven test cases and the maximum RDS and VDI user densities achieved in each. The first three test cases examined single server scalability for RDS, VDI Non-Persistent, and VDI Persistent workloads, determining the recommended maximum density for each workload type on a Cisco UCS B200 M4 blade with dual Intel® Xeon® E5-2680 v3 processors and 384GB of RAM. The remaining four test cases analyzed the performance of cluster and full-scale workloads on multiple blade servers. This multiple-blade testing showed that the configurations could support the target user densities under simulated stress conditions (cold-start boot and simulated login storms).

 

Figure 4: Seven test cases were run to examine single server and multiple server scalability.

Figure4.png

Test Case 1: Single Server Scalability, RDS

 

We started by testing single server scalability for XenApp hosted shared desktop (RDS) sessions running the Login VSI 4.1.4 Knowledge Worker workload. A dedicated blade server ran eight VMs hosting Windows Server 2012 R2 with XenApp 7.7 sessions. This test determined that the recommended maximum density was 240 RDS sessions (30 sessions per XenApp VM). The graphs below show the 240-user VSI results along with resource utilization metrics for the single server RDS Knowledge Worker workload. All metrics, especially CPU utilization during steady state, stayed within requirements at this density. The full CVD contains additional performance metrics, including metrics for the various infrastructure VMs.

Figure 5: Single Server Scalability, XenApp 7.7 RDS, VSIMax v4.1 Density

Figure5.png

Figure 6: Single Server Scalability, XenApp 7.7 RDS, CPU Utilization

Figure6.png

Figure 7:  Single Server Scalability, XenApp 7.7 RDS, Memory Utilization

Figure7.png

Figure 8: Single Server Scalability, XenApp 7.7 RDS, Network Utilization

Figure8.png

Test Case 2: Single Server Scalability, VDI Non-Persistent

In the second test case, we tested single server scalability for XenDesktop Non-Persistent VDI running the Login VSI 4.1 Knowledge Worker workload. The recommended maximum density for a single blade server was 195 desktops running Microsoft Windows 7 (32-bit). The graphs below show the 195-seat VSI results for single server VDI along with resource utilization metrics. These metrics, especially CPU utilization during steady state, were well within permissible bounds.

Figure 9: Single Server Scalability, XenDesktop 7.7 VDI Non-Persistent, VSIMax v4.1 Density

Figure9.png

Figure 10: Single Server Scalability, XenDesktop 7.7 VDI Non-Persistent, CPU Utilization

Figure10.png

Figure 11: Single Server Scalability, XenDesktop 7.7 VDI Non-Persistent, Memory Utilization

Figure11.png

Figure 12: Single Server Scalability, XenDesktop 7.7 VDI Non-Persistent, Network Utilization

Figure12.png

 

Test Case 3: Single Server Scalability, VDI Persistent

In the third test case, we tested single server scalability for XenDesktop Persistent VDI running the Login VSI 4.1 Knowledge Worker workload. The recommended maximum density for a single blade server was again 195 desktops running Microsoft Windows 7 (32-bit). The graphs below show the 195-seat VSI results for single server VDI along with resource utilization metrics. These metrics, especially CPU utilization during steady state, were well within permissible bounds. From a compute scalability perspective, both persistent and non-persistent workloads achieve the same level of scalability, with host CPU being the gating factor.

Figure 13: Single Server Scalability, XenDesktop 7.7 VDI Persistent, VSIMax v4.1 Density

Figure13.png

Figure 14: Single Server Scalability, XenDesktop 7.7 VDI Persistent, CPU Utilization

Figure14.png

Figure 15: Single Server Scalability, XenDesktop 7.7 VDI Persistent, Memory Utilization

Figure15.png

Figure 16: Single Server Scalability, XenDesktop 7.7 VDI Persistent, Network Utilization

Figure16.png

Test Case 4: Cluster Scalability, RDS

In this cluster test case, 12 blade servers were used to support an RDS workload, with N+1 infrastructure and workload servers configured for fault tolerance. The 12-server configuration supported 2,600 RDS sessions on Windows Server 2012 R2. The graphs below show the VSI results for the RDS cluster workload along with total storage IOPS and latencies.
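
One way to read the N+1 sizing, assuming the 240-session single-server recommended maximum from Test Case 1 applies per blade: if one of the 12 blades is lost, the remaining 11 would carry roughly 2,600 / 11 ≈ 236 sessions each, still just under the 240-session maximum.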

Figure 17: RDS Cluster Scalability, 2600 Users, VSIMax v4.1 Density

Figure17.png

Figure 18: RDS Cluster Scalability, 2600 Users, Total IOPS and Latency

Figure18.png

Test Case 5: Cluster Scalability, VDI Non-Persistent

In this cluster test case, 7 blade servers were used to support a VDI Non-Persistent workload, with N+1 infrastructure and workload servers configured for fault tolerance. The 7-server configuration supported 1,200 VDI sessions on Windows 7. The graphs below show the VSI results for the VDI Non-Persistent cluster workload along with total storage IOPS and latencies.

Figure 19: VDI Non-Persistent Cluster Scalability, 1200 Users, VSIMax v4.1 Density

Figure19.png

Figure 20: VDI Non-Persistent Cluster Scalability, 1200 Users, Total IOPS and Latency

Figure20.png

Test Case 6: Cluster Scalability, VDI Persistent

In this cluster test case, 7 blade servers were used to support a VDI Persistent workload, with N+1 infrastructure and workload servers configured for fault tolerance. The 7-server configuration supported 1,200 VDI sessions on Windows 7. The graphs below show the VSI results for the VDI Persistent cluster workload along with total storage IOPS and latencies.

Figure 21: VDI Persistent Cluster Scalability, 1200 Users, VSIMax v4.1 Density

Figure21.png

Figure 22: VDI Persistent Cluster Scalability, 1200 Users, Total IOPS and Latency

Figure22.png

Test Case 7: Full Scale Mixed Scalability

In this full-scale test case, 28 blade servers were used to support a mixed RDS, VDI Non-Persistent, and VDI Persistent workload, with N+1 infrastructure and workload servers configured for fault tolerance. The 28-server configuration supported 5,000 mixed sessions. The graphs below show the VSI results for the mixed workload along with total storage IOPS and latencies.

Figure 23: Full Scale Mixed Scalability, 5000 Users, VSIMax v4.1 Density

Figure23.png

Figure 24: Full Scale Mixed Scalability, 5000 Users, Total IOPS and Latency per Volume Type

Figure24.png

Figure 25: Full Scale Mixed Scalability, 5000 Users, Total Read and Write IOPS

Figure25.png

Desktop Virtualization at High Scale

The test results show how easily FlexPod EUC configurations can accommodate very high densities while maintaining an excellent user experience, allowing enterprise deployments to support larger RDS and VDI workloads. The Cisco UCS B200 M4 blade servers deliver the performance needed for high RDS/VDI densities, while the NetApp all-flash storage system and 10GbE connectivity optimize storage-related operations.

 

In the full scale test case, the NetApp storage easily handled the load with average read and write latencies of less than 1ms. The SSDs in the AFF8080 significantly reduced response times during the boot and login phases. During steady state, the storage saw low IOPS on the RDS and VDI Non-Persistent datastores. The PVS write cache was configured to use RAM with overflow to disk, which complemented the storage system by reducing IOPS.

 

To see the full set of test results and learn more, you can access the full CVD here.

 

 

— Frank Anderson, Senior Solutions Architect, Cisco Systems, Inc. (@FrankCAnderson)