vSAN Home Lab

Tim Borland
3 min read · Oct 20, 2018


I’ll just use a cheap Samsung prosumer SSD… what could go wrong?

This was my thought when initially ordering the cache disks for my hybrid vSAN home lab. An SSD is an SSD… is an SSD… right?

Wrong.

Here’s the rundown on my hardware when I started:

3x Dell R420 LFF servers
* 2x E5-2440 6-core CPU
* 64 GB DDR3 1600 MHz RAM
* 3x 600 GB 15K SAS HDD
* 1x 256 GB Samsung 840 Pro SSD
* Mellanox ConnectX-2 10 GbE
* LSI 9211-4i (flashed to IT mode)

My hardware was capable enough to run a Ceph cluster and KVM prior to the switch to VMware. But after spending a night at the Pittsburgh Little Hack setting up vSAN on my new cluster, things weren’t adding up.

Who needs SAS when you have telegraph wires?

Hardly any IOPS, minimal throughput, and enough latency to traverse the transatlantic cable from New York to London… round trip… two to three times! (https://www.thebroadcastbridge.com/content/entry/3988/trans-atlantic-network-latency-reduced)

It has to be the network, right? We can always blame the network!

Acceptable throughput.
Sub-millisecond latency.

Definitely not the network. At this point, I decided to just deal with it another day.

You should replace those consumer SSDs with some enterprise ones like the Intel DC S3610’s I used for my lab — Ariel

So after the Little Hack, I decided to take Ariel’s advice and buy new SSDs. Everything else seemed to be OK, and with Prime shipping plus the ability to return them, there was no reason not to try it.

I’ll be the first to admit I was skeptical that this would resolve my issue, until I changed one drive…

Replacement of SSD and subsequent rebuild of disk group 1

And then another…

Replacement of SSD and subsequent rebuild of disk group 2

And the final SSD.

Replacement of SSD and subsequent rebuild of disk group 3

Latency dropped each time I removed a node, replaced its SSD, and rebuilt the disk group. Each disk group rebuild took around 4 hours to restore approximately 500 GB of data.
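For a rough sense of scale, those numbers can be turned into an average rebuild rate. This is only a back-of-the-envelope sketch: it assumes the 4 hours and 500 GB apply to each of the three rebuilds, uses decimal units (1 GB = 1000 MB), and the function and constant names are made up for illustration.

```python
def rebuild_rate_mb_per_s(data_gb: float, hours: float) -> float:
    """Average rebuild rate in MB/s, using decimal units (1 GB = 1000 MB)."""
    return data_gb * 1000 / (hours * 3600)


DATA_GB = 500     # approximate data restored per rebuild (from the post)
HOURS = 4         # approximate duration of each rebuild (from the post)
DISK_GROUPS = 3   # one disk group per host

rate = rebuild_rate_mb_per_s(DATA_GB, HOURS)
total_hours = HOURS * DISK_GROUPS

print(f"{rate:.1f} MB/s per rebuild, {total_hours} hours total")
# prints "34.7 MB/s per rebuild, 12 hours total"
```

That is an average over the whole rebuild, so it says nothing about which component was the limiting factor at any given moment.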

What latency?

My vSAN cluster is now usable, and the latency lives only in my memory. Hopefully this helps someone else avoid a few hours of reconfiguration in their vSAN endeavors.

Simple advice like this is easy to come by at the Pittsburgh Little Hack. Details can be found at capozza.io or by reaching out to Carl Capozza @carlcapozza or Ariel Sanchez @arielsanchezmor.

Oh, and thanks to Ariel for pointing me in the right direction!
