I had a fun evening today.
I have an ESXi 6.0 dedicated server hosted with OVH in their BHS datacenter. Recently, it has been having a few issues with SNMP stopping and refusing to start unless I rebooted the host itself. Not great.
I did some googling and the latest VMware update seemed to solve the issue. I don’t have this box in vCenter, so it was down to the command line to apply the patch.
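For the record, applying a patch bundle from the ESXi shell looks roughly like this – a sketch, assuming the update zip has already been copied to a datastore (the path and filename are placeholders):

```shell
# Apply the update bundle sitting on a datastore (placeholder path)
esxcli software vib update -d /vmfs/volumes/datastore1/ESXi600-update.zip
```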
I applied the patch, rebooted, and then I couldn’t ping it – no access to it at all. Just my luck.
OVH have a great little thing they do: if a server reboots and doesn’t ping, they send a DC tech to look at it, and the first call is another reboot. I had no idea this was a thing and started to get really confused when I could see my server rebooting in the OVH manager. After a while, it then gets put into a rescue boot – the equivalent of an Ubuntu live CD. It didn’t really help me with figuring out what was wrong, though.
I hopped on to the #ovh channel on Freenode, which always has useful people around (it helped me a year ago with a pfSense pain-in-the-ass setup), and got chatting. One of the guys in there works at OVH and got in touch with the techs so they didn’t run off and reboot my server for me, allowing me to boot into VMware and work out the issue.
He also told me that, to stop the techs getting the call to go and bounce the server, disabling the monitoring in the OVH web manager will do the trick. A handy one to know.
With a booted server, I connected to the KVM and had a look. The management IP was there, so it hadn’t lost its config, but it couldn’t access anything. I checked the network adapter status and saw “disconnected” on both NICs. That made me think it was a cable, yet the rescue disk could reach outbound, so I knew it couldn’t have been.
I then had a slight moment of genius. I remembered that the NICs were 10Gb/s, while the ones presented to the hypervisor were only 1Gb/s. I pondered for a moment and remembered that when I made my new dvSwitch there were four NICs: two were 1Gb and two were 10Gb. The 10Gb ones weren’t showing in the hypervisor anymore.
Time to pop the thinking cap on.
I had a look at what PCI cards were installed in the system using the trusty lspci command. Inside the beast was an Intel X552 card. The only problem was that, for some reason, the ESXi patch had wiped out the drivers. Just what I needed.
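For anyone following along, the check was along these lines (the output will vary with your hardware):

```shell
# List PCI devices and pick out the Ethernet controllers
lspci | grep -i ethernet
```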
After a bit of googling, I found a driver for ESXi 6.0 [here]. You might need to sign in to download it. The next bit gets interesting.
Reboot back into the rescue image on the server and wget the zip file with the driver in it (I used one of my Apache2 boxes and wget to pull it from there). Just save it in /root/ for now.
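As a sketch, from the rescue image (the URL is a placeholder for wherever you host the file):

```shell
cd /root/
# Fetch the driver zip from a web server you control (placeholder URL)
wget http://example.com/drivers.zip
```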
Then run fdisk -l to see what drives/partitions you have. Choose a partition that doesn’t have “unknown” as the format type – those are the VMFS datastores and can’t be mounted easily.
I chose sda5 of my install.
Create a folder in /root/ called mount. Leave it empty for now.
Run the command:
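The command in question, assuming you picked sda5 as I did (adjust the partition to match your fdisk output):

```shell
# Mount the chosen partition onto the empty folder
mount /dev/sda5 /root/mount
cd /root/mount
```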
If you run ls, you will see some other files in there. These are crucial OS files and must not be touched.
Create a dir in here called drivers and move the zip file into it.
Reboot the server back into the ESXi OS. Once booted, press Alt+F1 to get to the command line and log in with the root credentials.
Run the command:
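The command here is a search for the zip across the mounted volumes (assuming the file is still called drivers.zip):

```shell
# Locate the driver zip now that ESXi has mounted the partition
find /vmfs/volumes/ -name "drivers.zip"
```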
This will then show you where that drivers.zip file is located after the OS has mounted the partition. In my case, it was within /vmfs/volumes/
Unzip the file and a .vib file will be extracted. This is the one you need.
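A sketch of the final steps – the .vib path below is illustrative, so use whatever was actually extracted on your volume:

```shell
# Extract the driver package
unzip drivers.zip
# Install the vib (use the full path to the extracted file)
esxcli software vib install -v /vmfs/volumes/<volume>/drivers/driver.vib
```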
It will then say “installed successfully”.
At this point, reboot again and the NICs will show up as before, with traffic flowing again. Not a bad way of fixing it, eh?
Just to help out, I had this issue with the following specs:
Server: Supermicro SuperServer
PCI card: Intel(R) Ethernet Connection X552/X557-AT 10GBASE-T
ESXi version before upgrade: 6.0.0
ESXi version after upgrade: 6.0.0 Update 2
Thanks to the guys at OVH for being so helpful! Really appreciated.