This is a follow-up to my previous post, which can be found here.
Disclaimer: the following tests were done only to show how easy it is to run bogus performance tests or showcase false performance numbers, and to demonstrate how Nutanix analytics can catch these unrealistic results. This does not represent in any shape or form the normal performance of Nutanix. Nor does it imply that Nutanix uses these techniques when publishing performance numbers. This is NOT a true or realistic benchmark and should NOT be interpreted as one.
Once again I am withholding some key information about the configuration and workload characteristics. I’ve chosen to do this to comply with the Nutanix EULA. It also prevents anyone from copying the tests, running them on a competing product and then claiming that my box is faster than yours. Since this is a bogus test, doing that would be silly, but you never know, there are plenty of crazy people to go around 🙂
Try to get a massive number of IOPS with next-to-impossible latency.
This test is dedicated to my manager, who said something along the lines of: “you couldn’t possibly get 100000 IOPS out of that box while having decent latency”. Well, I did 🙂
Screenshot from Prism VM summary, I/O as seen by Virtual Machines
I managed to get around 120000 IOPS with an average latency of 0,04 ms. I could have gotten more, but I got bored with cloning and launching test machines.
While 120 kIOPS is a fairly large number, it is nothing special. There are plenty of All Flash storage boxes that can easily do that or more at decent latency.
A latency of 0,04 ms is just ridiculously low. For comparison, typical All Flash storage systems are designed for an average latency below one millisecond (1,00 ms). The box I am using isn’t even All Flash; it is a hybrid with both SSD and HDD. Looking at these numbers, I could claim that this hybrid box has 96% lower latency, or 25 times lower latency, than an average All Flash storage system. Well, I am not claiming that, as the test is not a realistic representation of any real-world workload.
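The 96% and 25 times figures are simple ratios against the 1,00 ms All Flash reference. A minimal Python sketch of that arithmetic (the function name is just for illustration):

```python
def latency_comparison(measured_ms: float, reference_ms: float) -> tuple[float, float]:
    """Return (percent lower, times lower) of a measured latency versus a reference."""
    percent_lower = (1 - measured_ms / reference_ms) * 100  # 0,04 / 1,00 = 4 % of the reference
    times_lower = reference_ms / measured_ms                # 1,00 / 0,04 = 25
    return percent_lower, times_lower


pct, factor = latency_comparison(0.04, 1.00)
print(f"{pct:.0f}% lower, {factor:.0f} times lower")
```

The same function applied to the 0,24 ms sequential result later in this post gives 76% lower, or roughly 4,2 times lower.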
How did I achieve these numbers? Well, if you read the previous post, you already have some idea.
- The workload is 100% small-block random read on an empty file (lots of zeroes)
- To make sure the data fits in the DRAM cache, the working set was only 1 GB
- All the data gets served from DRAM/CPU, which has much lower latency than SSD or HDD
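The three points above boil down to where each read is answered from. The following is a toy Python model, not Stargate's actual read path. It assumes that a read is answered from a zero region first (my stand-in for “Estore Zero”), then from a DRAM cache, and only otherwise from SSD/HDD, and that the cache is large enough to hold the whole working set. All class and function names are hypothetical.

```python
import random
from collections import Counter
from enum import Enum


class ReadSource(Enum):
    ESTORE_ZERO = "Estore Zero"  # assumption: region holds only zeroes, answered without touching disk
    CACHE_DRAM = "Cache DRAM"    # block is already held in the in-memory cache
    DISK = "SSD/HDD"             # block must be fetched from a physical device


def classify_read(block: int, zero_blocks: set[int], dram_cache: set[int]) -> ReadSource:
    """Toy read path: zero check first, then the DRAM cache, otherwise disk."""
    if block in zero_blocks:
        return ReadSource.ESTORE_ZERO
    if block in dram_cache:
        return ReadSource.CACHE_DRAM
    return ReadSource.DISK


def simulate(num_reads: int, working_set_blocks: int, zero_fraction: float, seed: int = 0) -> Counter:
    """Random reads over a small working set; the cache is assumed big enough to hold all of it."""
    rng = random.Random(seed)
    zero_blocks = set(range(int(working_set_blocks * zero_fraction)))
    dram_cache: set[int] = set()
    counts: Counter = Counter()
    for _ in range(num_reads):
        block = rng.randrange(working_set_blocks)
        source = classify_read(block, zero_blocks, dram_cache)
        if source is ReadSource.DISK:
            dram_cache.add(block)  # read-through: later reads of this block are served from DRAM
        counts[source] += 1
    return counts


# Empty file (all zeroes) versus a fully written file, both with a tiny working set
print(simulate(100_000, 1_000, zero_fraction=1.0))
print(simulate(100_000, 1_000, zero_fraction=0.0))
```

With an empty file every read is a zero read. With a fully written but tiny working set, each block is read from disk at most once and every later read comes from DRAM.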
How can I prove these points with Nutanix statistics?
Screenshot from Prism Hardware summary, I/O as seen by hardware
Close to zero I/O hits the disks while the virtual machines are pushing close to 120 kIOPS. Metadata and configuration data are stored on disk, so there is some I/O related to those operations. Disk latency is higher than VM latency, another clear indication that the data is not served from SSD/HDD.
Let’s dig further. The screenshots below are from the Stargate statistics web page:
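The reasoning here can be written down as a simple heuristic: if only a tiny share of the VM-level I/O shows up at the disks, and the VMs see lower latency than the disks do, the data is most likely being served from memory. A minimal Python sketch; the 1% threshold and the function name are my own assumptions, and the disk-side numbers in the example are placeholders, not values read from the screenshot.

```python
from dataclasses import dataclass


@dataclass
class IoStats:
    iops: float
    avg_latency_ms: float


def likely_served_from_memory(vm: IoStats, disk: IoStats, max_disk_share: float = 0.01) -> bool:
    """Flag a result when almost no VM I/O reaches the disks and VM latency is below disk latency."""
    if vm.iops <= 0:
        return False
    disk_share = disk.iops / vm.iops  # fraction of VM-level I/O that shows up at the disks
    return disk_share <= max_disk_share and vm.avg_latency_ms < disk.avg_latency_ms


vm_side = IoStats(iops=120_000, avg_latency_ms=0.04)  # from the VM summary above
disk_side = IoStats(iops=200, avg_latency_ms=0.5)     # placeholder values, NOT from the screenshot
print(likely_served_from_memory(vm_side, disk_side))
```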
Working set size from Stargate: Read: just below 1 GB, Write: 0 MB
What about Stargate read buckets?
Not a single I/O hit SSD or HDD. Only Cache DRAM or Estore Zero is used.
We’ve already shown that there are no writes in the “Active Working Set”; let’s verify that with the Stargate Write Destination statistics.
Yep, no writes at all.
Try to generate plenty of big sequential I/O.
Screenshot from Prism VM summary, I/O as seen by Virtual Machines
Well, I managed to get quite a lot: over 5,5 GBps, which is more than ten times the throughput of the random workload test. 5,5 GBps is quite a large number, but when you combine it with an average latency of 0,24 ms, it is obvious that something is not quite right. Naturally, the latency is higher than in the random test, as the I/O size was much larger and large I/O takes more time to handle, even at the CPU or DRAM level. While latency is not typically important for sequential workloads, it is still about four times lower than the 1,00 ms average that typical All Flash arrays are designed for, and that figure applies to small random I/O; under a similar large-block load their latency would probably be much higher. Once again, this is a bogus test, not representative of any real-world workload.
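Throughput is IOPS multiplied by block size, so comparing the 5,5 GBps sequential result with the 120 kIOPS random test needs the random test's block size, which this post doesn't disclose. The sketch below assumes a hypothetical 4 KiB block (ASSUMED_RANDOM_BLOCK) and decimal gigabytes; under that assumption the ratio comes out at roughly 11 times, and with a larger block it shrinks accordingly.

```python
def throughput_gbps(iops: float, block_size_bytes: int) -> float:
    """Throughput in GB/s (decimal, 1 GB = 10**9 bytes) = IOPS x block size."""
    return iops * block_size_bytes / 1e9


ASSUMED_RANDOM_BLOCK = 4 * 1024  # hypothetical: the post does not state the block size
random_gbps = throughput_gbps(120_000, ASSUMED_RANDOM_BLOCK)
sequential_gbps = 5.5  # from the VM summary above

print(f"random test: {random_gbps:.2f} GB/s")
print(f"sequential / random: {sequential_gbps / random_gbps:.1f} times")
```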
Screenshot from Prism Hardware summary page, I/O as seen by hardware
The hardware statistics barely register any traffic to the disks. There is always some metadata to be written or read, so there is some traffic, but not much.
Let’s take one Vdisk as an example and get more info.
How can you find out whether the workload is random or sequential? From the Stargate web page:
Stargate recognizes 99% of the workload as sequential, so the recognition is not perfect. I guess it is based on access patterns, and some of the content or accesses matched a random pattern. I’ve hidden the per-VM I/O rates, as they are not relevant for my purposes.
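I don't know how Stargate actually classifies I/O. A common, simple rule is to call an I/O sequential when it starts exactly where the previous one ended. The hypothetical Python sketch below uses that rule and shows one possible way a “sequential” test lands just below 100%: when the test tool loops back to the start of a small file, each wrap-around is a jump that the rule counts as random.

```python
def sequential_fraction(ios: list[tuple[int, int]]) -> float:
    """ios: (offset_bytes, length_bytes) in arrival order for one vDisk.

    Hypothetical rule, not Stargate's: an I/O is sequential if it starts
    where the previous I/O ended. Returns the sequential share of transitions.
    """
    if len(ios) < 2:
        return 0.0
    sequential = 0
    for (prev_off, prev_len), (off, _) in zip(ios, ios[1:]):
        if off == prev_off + prev_len:
            sequential += 1
    return sequential / (len(ios) - 1)


MIB = 1024 * 1024
# Hypothetical stream: 300 reads of 1 MiB looping over a 100 MiB file
stream = [((i % 100) * MIB, MIB) for i in range(300)]
print(f"{sequential_fraction(stream):.1%} sequential")
```

In this stream two of the 299 transitions are wrap-arounds, so the sketch reports about 99,3% sequential.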
How about working set?
The test uses most of the allocated 1 GB: 940 MB, all of it by reads. There is no write activity.
How about read source?
No read I/Os are served from SSD or HDD; all access is from Cache DRAM or Estore Zero, so the data comes from either the CPU or DRAM.
How about writes?
No write operations show up, so this is a 100% read workload.
It is very easy to generate bogus, unrealistic performance figures with synthetic load generators, for both random and sequential workloads, either unintentionally or on purpose to claim a “hero” number. If it looks too good to be true, it probably isn’t true.
There are validated and audited test sets, like SPEC and SPC, that give a more reliable representation of storage system performance. However, there are ways to “cheat” in these tests as well. One typical way is to build “Lab Queens”: hardware/software configurations built purely for testing that no one in their right mind would use for any real work. Also, these tests were designed for traditional storage systems and may not work well for hyperconverged solutions, where the architecture is quite different. Features like data locality might give distorted results with these tests.
If you plan to do your own testing, using a real workload with real data will give the most realistic performance estimate. Since Nutanix scales performance linearly, you can take a small, well-chosen part of your workload for the tests and then multiply the results to estimate larger configurations. If real workloads are not an option, at least use data that is neither random nor full of zero-filled blocks. Also, turning off data reduction and avoidance features such as deduplication and compression will help you get more reliable results, especially if the test data was generated with testing tools and dedupes or compresses well instead of landing directly on disk.
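Before using a test file, it is worth checking how many of its blocks are pure zeroes and how well it compresses. A rough Python sketch: zlib is only a stand-in for whatever compression the storage system uses, the 4 KiB block size and 64 MiB sample size are arbitrary assumptions, and check_test_file is a hypothetical helper name.

```python
import os
import zlib


def zero_block_fraction(data: bytes, block_size: int = 4096) -> float:
    """Share of fixed-size blocks that contain nothing but zero bytes."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return 0.0
    return sum(1 for b in blocks if not any(b)) / len(blocks)


def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size; about 1.0 means incompressible."""
    if not data:
        return 1.0
    return len(data) / len(zlib.compress(data, 6))


def check_test_file(path: str, sample_bytes: int = 64 * 1024 * 1024) -> tuple[float, float]:
    """Sample the start of a test file and report (zero-block share, compression ratio)."""
    with open(path, "rb") as f:
        data = f.read(sample_bytes)
    return zero_block_fraction(data), compression_ratio(data)


if __name__ == "__main__":
    # Compare two synthetic 1 MiB buffers: zero-filled and random
    samples = {"zeroes": bytes(1 << 20), "random": os.urandom(1 << 20)}
    for name, buf in samples.items():
        print(f"{name:8} zero blocks {zero_block_fraction(buf):6.1%}  compression {compression_ratio(buf):8.1f}x")
```

Zero-filled data shows 100% zero blocks and an enormous compression ratio, while random data shows no zero blocks and a ratio slightly below 1; real data usually lands somewhere in between.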
Thanks for reading