As a part of the Cisco Desktop Virtualization Solutions team, I spend a good portion of my time conducting performance tests to size EUC software configurations in conjunction with Cisco UCS hardware and products from our partners. For example, I recently released a large-scale FlexPod Datacenter solution comprising 5,000 seats of Citrix XenApp/XenDesktop on UCS B200 M4 blade servers and NetApp All-Flash storage. Learn more about that solution here.
Since Microsoft Office 2016 was released, I wondered how this new application stack might impact EUC user densities in comparison to densities under a Microsoft Office 2010 workload. To find out, I set up a series of performance tests using Citrix XenDesktop 7.7. I decided to combine testing objectives for determining the performance characterization of the Cisco UCS B200 M4 blade server running Windows 10 VDI workloads along with a Microsoft Office version comparison.
To eliminate as many variables in the environment as I could and make the comparison of Office 2010 and Office 2016 workloads as “vanilla” as possible, I configured the tests to use non-persistent desktops with Citrix Provisioning Services running Windows 10 and turned off Defender.
The user density results I saw for the various scenarios were certainly quantifiable. Microsoft Office 2016 proved to be more CPU-intensive than Microsoft Office 2010, resulting in a notable 9% decrease in user density. (Note that the systems under test had no GPU installed, so DirectX-enabled applications had to rely on the CPU for graphics rasterization.) After running tests to gauge host scalability and Login VSI results, I ran single session tests to collect additional performance data. The single session testing confirmed that overall CPU usage was higher for much of the Microsoft Office 2016 workload; other VDA metrics were analyzed as well. I also generated a video sequence that demonstrates side-by-side performance for the single session test, which helps to correlate the qualitative differences from the synthetic workload perspective with the quantitative performance differences of the VDA system. The video also helps visually illustrate the functional nature of the Login VSI 4.1 Knowledge Worker workload.
The components used for the study comprised: Cisco® Unified Computing System (UCS) B200 M4 blade servers, Cisco Nexus 9000 series switches, a NetApp® AFF8080EX-A storage system, VMware vSphere ESXi 6.0 Update 1, Citrix Provisioning Services 7.7, and Citrix XenApp and XenDesktop 7.7 software.
To generate load in the environment, Login VSI 4.1.4 software from Login Consultants (www.loginvsi.com) was used to generate desktop connections, simulate application workloads, and track application responsiveness. In this testing, the Knowledge Worker official workload (via benchmark mode) was used to simulate productivity tasks (Microsoft Office, Internet Explorer with HTML5 video, printing, and PDF viewing) for a typical knowledge worker.
Login VSI records response times during workload operations. Increased latencies in response times indicated when the system configuration was saturated and had reached maximum user capacity. During the testing, comprehensive metrics were captured during the full virtual desktop lifecycle: user login and desktop acquisition (login/ramp-up), user workload execution (steady state), and user logoff. Performance monitoring scripts tracked resource consumption for infrastructure components.
Each test run was started from a fresh state after restarting the blade servers. To begin the testing, we took all desktops out of maintenance mode, started the virtual machines, and waited for them to register. The Login VSI launchers initiated desktop sessions and began user logins (the login/ramp-up phase). Once all users were logged in, the steady state portion of the test began in which Login VSI executed the application workload.
Five different test cases were conducted:
- Testing single server scalability under a maximum recommended VDI load using Windows 10 and Office 2016. The maximum recommended single server user density occurred when CPU utilization reached a maximum of 90-95%.
- Testing single server scalability under a VDI load using Windows 10 and Office 2010 with the same number of sessions as the first test case. The objective was to determine the host resource variance between Office 2010 and Office 2016.
- Testing a single desktop session using Windows 10 and Office 2016 for a granular level examination.
- Testing a single desktop session using Windows 10 and Office 2010 for a granular level examination.
- Multiple server scalability testing using a 28-blade server configuration under a 4320-user workload.
Single Server Testing
To produce the Login VSI graphs that show user densities, I processed the Login VSI logs using VSI Analyzer. With the Windows 10 and Office 2016 workload, the recommended maximum load was reached at 180 users. The same number of sessions tested against Windows 10 with Office 2010 resulted in marginally lower VSI operation response times. Under the same test conditions, host CPU usage was about 9% lower under the Office 2010 workload than under the Office 2016 workload.
Login VSI results for 180x Windows 10 sessions with Microsoft Office 2016 workload
Login VSI results for 180x Windows 10 sessions with Microsoft Office 2010 workload
Login VSI results for two Windows 10 test runs with Microsoft Office 2016 compared to Office 2010
As Login VSI launches virtual desktop sessions and starts to execute the defined application workload, the percentage of CPU usage rises, as shown in the graph below. As the number of desktops reaches the recommended maximum load threshold, CPU usage tops out for the Office 2016 test run until Login VSI starts logging off the VMs. The graph shows how the test environment with an Office 2010 workload sustains the same user density while using 9% less host CPU resources (which roughly equates to an additional 15 sessions per server).
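The per-server session estimate can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes that host CPU usage scales linearly with session count, which is a simplifying assumption for illustration and not something the test data establishes; the function name is hypothetical. With the 180 sessions and 9% figures from the single server tests, it lands at about 16 sessions, in the same range as the rough figure of 15 quoted above.

```python
def extra_sessions(sessions: int, cpu_savings: float) -> float:
    """First-order estimate of how many sessions the freed CPU could host.

    Assumes host CPU scales linearly with session count (an illustrative
    simplification, not a measured property of the test environment).
    """
    return sessions * cpu_savings


# Figures taken from the single server tests: 180 sessions, 9% less host CPU.
estimate = extra_sessions(sessions=180, cpu_savings=0.09)
print(f"Estimated additional sessions per server: {estimate:.1f}")
```

A real sizing exercise would confirm the number with another test run, since CPU rarely scales perfectly linearly near saturation.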
Single Session Testing
I really wanted to understand what was behind such a dramatic difference in user density, and identify what resources were limiting performance for the Office 2016 applications. To do this, I ran single session tests, recording PerfMon data for each of the Microsoft Office applications—Word, Outlook, PowerPoint, and Excel—as they executed in the Login VSI workload. By recording performance data for each application process, I was able to get a more comprehensive view of resource usage overall and by application.
The following table averages the processor usage over the entire 48-minute test run, and the graphs below show percentages of processor time used for Office 2010 and Office 2016 applications, respectively. Although all Office 2016 applications are generally more CPU-intensive than their 2010 counterparts, some use noticeably more than others. In particular, Word (depicted in blue) shows the most intense and frequent spikes in CPU use, measuring more than four times the resource consumption of its 2010 counterpart. A distant second to Word is Excel (depicted in orange), with an observable difference of 61% higher CPU usage.
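For readers who want to produce similar per-application averages from their own PerfMon logs, here is a minimal sketch. It assumes the counters were exported to CSV with one `\Process(<name>)\% Processor Time` column per application; the host name `VDI01` and the sample values are made up for illustration and are not data from this study.

```python
import csv
import io
import re
from collections import defaultdict

# Matches counter paths such as \\HOST\Process(WINWORD)\% Processor Time
COUNTER_RE = re.compile(r"\\Process\(([^)]+)\)\\% Processor Time$")


def average_cpu_by_process(csv_text):
    """Return {process_name: mean % Processor Time} from PerfMon CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Map column index -> process name for the counters of interest.
    columns = {}
    for idx, name in enumerate(header):
        match = COUNTER_RE.search(name)
        if match:
            columns[idx] = match.group(1)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in reader:
        for idx, proc in columns.items():
            try:
                value = float(row[idx])
            except (IndexError, ValueError):
                continue  # skip blank or missing samples
            totals[proc] += value
            counts[proc] += 1
    return {proc: totals[proc] / counts[proc] for proc in counts}


def percent_higher(new, old):
    """Relative increase of new over old, expressed in percent."""
    return (new - old) / old * 100.0


# Hypothetical sample log (illustrative values only).
SAMPLE = r'''"(PDH-CSV 4.0) (Pacific Standard Time)(480)","\\VDI01\Process(WINWORD)\% Processor Time","\\VDI01\Process(EXCEL)\% Processor Time"
"03/01/2016 10:00:00.000","4.0","1.0"
"03/01/2016 10:00:15.000","8.0"," "
"03/01/2016 10:00:30.000","6.0","3.0"
'''

averages = average_cpu_by_process(SAMPLE)
for proc, mean in sorted(averages.items()):
    print(f"{proc}: {mean:.2f}% average processor time")
```

Running the same function over the Office 2010 and Office 2016 logs and passing the two averages to `percent_higher` gives a relative difference of the kind quoted for Excel above.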
Although CPU explains the differences in session density, I wanted to make sure that there were other adequate system resources available to both sets of applications. The graph below charts memory use for the two test cases, showing that on average the tested applications used about the same amount of RAM. The following table averages the memory usage over the entire 48-minute test run.
The graphs below show network bandwidth consumed. The Office 2016 test case consumed marginally higher network bandwidth. The following table averages the network usage over the entire 48-minute test run.
The graphs below show disk/storage usage. The Office 2016 test case consumed 11-17% more disk I/O. The following table averages the disk usage over the entire 48-minute test run.
Multiple Server Scalability Testing
In this multi-server test case, 28 blade servers were used to support a Windows 10 non-persistent workload in which N+1 infrastructure and workload servers were configured for fault tolerance. The 28-server configuration supported 4,320 total sessions. The graphs below show the VSI results for the mixed workload and total storage IOPS, throughput, and latencies.
Workload distribution for 4320 seats of Windows 10 with Microsoft Office 2016
Login VSI results for 4320 seats of Windows 10 with Microsoft Office 2016
NetApp 8080EX-A storage usage for 4320 seats of Windows 10 with Microsoft Office 2016
Comparing the User Experience (Video Results)
I recorded a side-by-side video sequence of the single session tests of Office 2010 and Office 2016 to show qualitative application performance differences from a user experience perspective. In the video, I compressed a full single test run of 48 minutes into a short 8½-minute movie. While the difference doesn’t seem that significant at first, the impact is more far-reaching when you consider performance beyond the resource consumption of this single session. When sites try to scale EUC workloads based on Office 2016, they will need to take the greater CPU demand and the user density differences into account.
Aside from the intended purposes of this content, the video also visually illustrates the VSI Knowledge Worker workload that is used across many EUC-based Cisco solutions.
Here's a second recorded video of Windows 10 and Office 2016 performance testing that illustrates CPU, memory, network, and storage usage from the viewpoint of a single session.
My testing showed a calculable performance difference between Microsoft Office 2010 and 2016 applications. The results here don’t particularly reflect the Citrix software used in the test rig, but instead represent the performance impact customers may experience in EUC environments with the Microsoft Office 2016 application stack. Another factor to consider is that when applications that use DirectX (like Microsoft Office 2013/2016 and Internet Explorer 10/11/Edge) don’t detect the presence of a physical GPU, they resort to using a software rasterizer, which consumes additional CPU cycles. For the purposes of this study, GPU hardware acceleration was disabled.
— Frank Anderson, Senior Solutions Architect, Cisco Systems, Inc. (@FrankCAnderson)