#Upgrading ONTAP simulator

worthy oar
#

Hi,

Just wondering: is it possible to upgrade the ONTAP Simulator with the image available on the NetApp Support site, or do you need to reinstall it from scratch to get the newer version?
The ONTAP Simulator 9.16.1 has been released.

twilit frigate
#

interesting question. I know that during EAPs, you get special image_v_9161X..._sim.tgz packages that should work for exactly that... but upgrading with a "normal" image? I will try that 🙂

#

looks promising

twilit frigate
#

yep, works fine

#

note that I was using an HA simulator (via Sean's scripts); for a single-node or non-HA 2-node simulator you might need to do a node image update, but that should of course also work

#

I had a bit of trouble after the first giveback: the node's ports were not in the correct broadcast domain, so the second takeover couldn't take place because the system couldn't move the LIFs off the node. I am not sure whether that issue already existed before or was caused by the upgrade.

coarse nimbus
#

I think you are also able to update ONTAP Select with a "normal" image, even though there is a dedicated image for Select... not sure what the difference is there...

worthy oar
# twilit frigate yep, works fine

We should make a section dedicated to test environments with the simulator. I've heard you even tried to set up a MetroCluster with the simulator

#

What do Sean's scripts do?

#

It's been almost a year since I started discovering the world of storage, and I'm clearly having fun with NetApp. The simulator really helps me as well

twilit frigate
# worthy oar What does sean's scripts do?

he made an Ansible script that sets up complete lab environments automatically, including Linux, Windows, ESX and simulators. Here is the relevant part for setting up an HA simulator (I adapted it into a PowerShell script since that's easier to run than Ansible IMHO).

#

I have not succeeded in deploying the simulator as a MetroCluster yet (nor as ASA r2, which should theoretically also be possible)

#

I don't know if @fluid escarp has been working on those two things recently though, but I hope MetroCluster (and ASA r2) configurations are still on the table 🙂

twilit frigate
#

should work, as long as Proxmox supports shared SCSI disks between VMs (that is a requirement for HA)

worthy oar
#

I'll give it a shot!

fluid escarp
#

Shared SCSI disks are the tricky part. While Proxmox does technically allow them, unlike VMware it has no mechanism to support SCSI-3 persistent reservations on virtual disks.

#

So you need a block device on the host to pass through for each virtual disk. I was able to get this working on shared storage using LUNs on an external ONTAP system, but I couldn't scale that approach.

#

But back to the original question about upgrading the sim: the things you may have to do are increasing RAM, increasing the root volume size, and increasing the upload jail (system services web file-uploads config). I'm going from memory, but it's something in that general direction.

#

I am working on ASA r2, but since mroot is on the boot disk, I need to build a new boot disk from scratch.

twilit frigate
#

yeah, it was a bit tricky since I had to reverse-engineer the part of Ansible that sends keypresses to a running VM via USB. That was fun, especially since I wanted to make it more general and usable with a German keyboard layout, which involved creating new character mappings etc.
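For anyone curious what such a character mapping can look like, here is a minimal Python sketch (not the actual script, which is PowerShell). It assumes keypresses are sent as USB HID keyboard usage IDs from the HID Usage Tables (Keyboard/Keypad page) and uses a US base layout with a German override where the physical Y and Z keys trade places. The names `char_to_usage` and `LAYOUT_OVERRIDES` are hypothetical, and only letters, digits, space and Enter are covered; German symbols, umlauts and AltGr combinations would need more entries.

```python
# HID usage IDs (Keyboard/Keypad page 0x07) for a US layout.
US_LETTERS = {chr(ord("a") + i): 0x04 + i for i in range(26)}  # a=0x04 ... z=0x1D
DIGITS = {str(d): 0x1E + d - 1 for d in range(1, 10)}          # 1=0x1E ... 9=0x26
DIGITS["0"] = 0x27
SPECIAL = {" ": 0x2C, "\n": 0x28}                              # space, Enter

# On a German (QWERTZ) layout, Y and Z sit on each other's US key positions.
LAYOUT_OVERRIDES = {
    "us": {},
    "de": {"y": 0x1D, "z": 0x1C},
}


def char_to_usage(ch: str, layout: str = "us") -> tuple[int, bool]:
    """Return (usage_id, needs_shift) for a single character."""
    shift = ch.isalpha() and ch.isupper()
    key = ch.lower()
    overrides = LAYOUT_OVERRIDES[layout]
    if key in overrides:
        return overrides[key], shift
    for table in (US_LETTERS, DIGITS, SPECIAL):
        if key in table:
            return table[key], shift
    raise ValueError(f"no mapping for {ch!r} in layout {layout!r}")


def string_to_usages(text: str, layout: str = "us") -> list[tuple[int, bool]]:
    """Translate a whole string into the key sequence to send."""
    return [char_to_usage(c, layout) for c in text]
```

Keeping the layout differences in a small override table means a new layout only has to list the keys that differ from the US base.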

#

also, adding and removing generic devices on a VM via the API is weird; you have to fiddle with the VirtualDeviceConfigSpec and all that. But I learned a lot in the process 🙂

#

the crucial part (which I would never have figured out myself) was how the virtual HA interconnect works with the NIC ID and the MAC address set in the loader. I guess MetroCluster also needs a similar crucial (but undocumented) "trick" and then it will just work, but I haven't had any luck so far

#

I also liked how you did the cluster add-node from BSD to get the APIPA IP address of the other node 🙂

#

but I'm thinking of ditching that and manually creating the cluster LIFs to get more control over the assigned IP addresses (for example, I like 169.254.<node#>.<lif#> because it looks less random)
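A minimal Python sketch of that addressing scheme, assuming node and LIF numbers start at 1 and two cluster LIFs per node. The `node<N>_clus<M>` keys are hypothetical labels for illustration, not ONTAP's LIF naming.

```python
import ipaddress


def cluster_lif_ip(node: int, lif: int) -> ipaddress.IPv4Address:
    """Build a link-local address of the form 169.254.<node>.<lif>."""
    # RFC 3927 reserves 169.254.0.x and 169.254.255.x for future use,
    # so the node octet stays within 1..254.
    if not 1 <= node <= 254:
        raise ValueError("node number must be in 1..254")
    # Avoid .0 and .255 in the last octet as well, to keep addresses unambiguous.
    if not 1 <= lif <= 254:
        raise ValueError("LIF number must be in 1..254")
    return ipaddress.IPv4Address(f"169.254.{node}.{lif}")


def cluster_lif_plan(nodes: int, lifs_per_node: int = 2) -> dict[str, str]:
    """Return a label -> address plan for every cluster LIF in the cluster."""
    plan = {}
    for n in range(1, nodes + 1):
        for l in range(1, lifs_per_node + 1):
            plan[f"node{n}_clus{l}"] = str(cluster_lif_ip(n, l))
    return plan
```

For a 2-node cluster this yields 169.254.1.1, 169.254.1.2, 169.254.2.1 and 169.254.2.2, so you can tell the owning node from the third octet.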

twilit frigate
#

if you ever figure out how to set up a MetroCluster simulator or ASA r2, let us know. I'd be really interested! Also, if you know a way of simulating NVRAM on a VMDK, that would make the simulator much more stable (currently it often crashes during vMotion or small storage outages, often trashing its WAFL beyond repair)

fluid escarp
#

Virtual NVRAM is backed by a VMDK, but there are a few operating modes. The default mode is panic: in this mode it's maintained in RAM on both partners to protect you from a panic. If you want full NVRAM emulation, set it to full.

#

You take a perf hit because writes to the NVRAM disk are now in the write path, so you'll want that on local NVMe if you can. But it's just a simulator, so maybe it's fine.

worthy oar
#

well, I guess I'll be stuck, as you can't even have multiple SCSI controllers, and it's limited to 30 disks

worthy oar
#

yup, Proxmox is too limited, sad

worthy oar
#

@fluid escarp Well, thanks for that (and Darkstar too, for sharing the link in the first place). Your vsim lab deployment works fine; I only had some issues and couldn't create disks in thin mode (I don't know if that is restricted in v8)

fluid escarp
#

Thin mode needs an NFS datastore. If you’ve got a vmfs datastore it needs thick.

#

@worthy oar

fluid escarp
#

In the sim it's one of the IDE virtual disks. In OtS it's either the boot disk or the first virtual NVMe disk, depending on the config.

worthy oar
#

Ah ok ok!

fluid escarp
#

Yeah, that seems to be present on Proxmox, but I haven't found a working config.

#

And the 30-disk limit seems to be their version of 640k

worthy oar
#

I wasn't even able to make the node recognize disks. Each time I booted (with all the same args as your labBuilder), I got "0 disk found"

#

I think "native" QEMU on Linux has more machine types, but that means building the QEMU args by hand, which is annoying x)

fluid escarp
#

PVE doesn't support the SCSI controller the vsim uses out of the box; I had to hack support for it back in. It was kludgy. I had some luck getting virtio working in a really recent ONTAP, but then got stuck on the missing persistent reservations.

fluid escarp
#

fiddling with bootargs to get virtio drivers to load. I'll see if I kept good notes...

#

bootarg.vm.run_vmtools=false
bootarg.vm.virtio_load=true

#

bootarg.vm.kvm=true
bootarg.vm.qemu=true

#

not sure which are real or really needed. I'm way off the reservation trying to get it to work.

worthy oar
#

alright, thanks!

#

I wonder where you can find all possible variables. I think listenv doesn't print everything? Only the declared env vars

fluid escarp
#

as far as I know no such list exists. 🤷‍♂️

worthy oar
#

argh, can't make it work (not trying PR for now)...
Tried this:

-device virtio-scsi,id=scsi0
-drive if=none,driver=raw,id=disk0,cache=none,aio=io_uring,detect-zeroes=on,file.filename=/var/lib/vz/images/100/vsim-disk-0.raw
-device scsi-hd,drive=disk0,bus=scsi0.0,channel=0,scsi-id=0,lun=0
-drive if=none,driver=raw,id=disk1,cache=none,aio=io_uring,detect-zeroes=on,file.filename=/var/lib/vz/images/100/vsim-disk-1.raw
-device scsi-hd,drive=disk1,bus=scsi0.0,channel=0,scsi-id=0,lun=1
-drive if=none,driver=raw,id=disk2,cache=none,aio=io_uring,detect-zeroes=on,file.filename=/var/lib/vz/images/100/vsim-disk-2.raw
-device scsi-hd,drive=disk2,bus=scsi0.0,channel=0,scsi-id=0,lun=2
-drive if=none,driver=raw,id=disk3,cache=none,aio=io_uring,detect-zeroes=on,file.filename=/var/lib/vz/images/100/vsim-disk-3.raw
-device scsi-hd,drive=disk3,bus=scsi0.0,channel=0,scsi-id=0,lun=3
-drive if=none,driver=raw,id=disk4,cache=none,aio=io_uring,detect-zeroes=on,file.filename=/var/lib/vz/images/100/vsim-disk-4.raw
-device scsi-hd,drive=disk4,bus=scsi0.0,channel=0,scsi-id=0,lun=4
-drive if=none,driver=raw,id=disk5,cache=none,aio=io_uring,detect-zeroes=on,file.filename=/var/lib/vz/images/100/vsim-disk-5.raw
-device scsi-hd,drive=disk5,bus=scsi0.0,channel=0,scsi-id=0,lun=5
#

(see the picture for the full list of disks)
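The repeated `-drive`/`-device` pairs above follow a fixed pattern, so they can be generated instead of typed by hand. A minimal Python sketch, assuming the same image path layout (`/var/lib/vz/images/<vmid>/vsim-disk-<n>.raw`) and a single virtio-scsi controller with one LUN per disk; `vsim_disk_args` is a hypothetical helper. It only reproduces these arguments; it does not add persistent-reservation support.

```python
import shlex


def vsim_disk_args(vmid: int, count: int,
                   image_dir: str = "/var/lib/vz/images",
                   controller: str = "scsi0") -> list[str]:
    """Build QEMU arguments that attach `count` raw disks as LUNs 0..count-1
    on a single virtio-scsi controller."""
    args = ["-device", f"virtio-scsi,id={controller}"]
    for i in range(count):
        path = f"{image_dir}/{vmid}/vsim-disk-{i}.raw"
        drive_opts = ",".join([
            "if=none", "driver=raw", f"id=disk{i}", "cache=none",
            "aio=io_uring", "detect-zeroes=on", f"file.filename={path}",
        ])
        device_opts = ",".join([
            "scsi-hd", f"drive=disk{i}", f"bus={controller}.0",
            "channel=0", "scsi-id=0", f"lun={i}",
        ])
        args += ["-drive", drive_opts, "-device", device_opts]
    return args


if __name__ == "__main__":
    # One shell-safe line, e.g. for pasting into the VM's custom QEMU args.
    print(shlex.join(vsim_disk_args(100, 6)))
```

Generating the list also makes it easy to change the disk count without hunting for copy-paste mistakes in the LUN numbers.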

fluid escarp
#

without PR, make sure bootarg.vm.ha=false, or it will fail all the disks when it can't get a reservation
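A small Python sketch of how a deployment script might render the loader `setenv` lines and enforce that rule. The function and its `has_persistent_reservations` flag are hypothetical, and the example bootargs are the virtio ones listed above (still unconfirmed as needed). Values containing commas or spaces are wrapped in double quotes.

```python
def render_setenv(bootargs: dict[str, str], has_persistent_reservations: bool) -> list[str]:
    """Render LOADER 'setenv' commands from a name -> value mapping.

    If the hypervisor cannot honour SCSI-3 persistent reservations, force
    bootarg.vm.ha=false so the disks are not failed for lack of a reservation.
    """
    args = dict(bootargs)  # copy; insertion order is preserved
    if not has_persistent_reservations:
        args["bootarg.vm.ha"] = "false"
    lines = []
    for name, value in args.items():
        if "," in value or " " in value:
            value = f'"{value}"'  # quote list-style values such as "sw,e0e,e0f"
        lines.append(f"setenv {name} {value}")
    return lines


if __name__ == "__main__":
    example = {
        "bootarg.vm.run_vmtools": "false",
        "bootarg.vm.virtio_load": "true",
        "bootarg.vm.kvm": "true",
        "bootarg.vm.qemu": "true",
    }
    for line in render_setenv(example, has_persistent_reservations=False):
        print(line)
```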

worthy oar
#

aaah

#

works better!

#

x) each time I get a bit further: now disk partitioning takes ages, and I think I'm getting errors because it can't find disk ownership after partitioning. I'll find out why

worthy oar
#

Only the first disk (0b.0) goes fast; the rest take ages. I've set up thick provisioning instead, but it doesn't change anything

fluid escarp
#

I think I did mine with whole disk, 44/4a

worthy oar
# fluid escarp I think I did mine with whole disk, 44/4a

yup, well, it gets further with that, but then I get:

Group name plex0/rg0 state NORMAL. 3 disks failed in the group.
Disk 0b.0 S/N [j2FGnqJot190I2Y-8NF0] UID [j2FGnqJot190I2Y-8NF0] error: cannot reach device, due to either loop or fabric error.
Disk 0b.1 S/N [j2FGnqJoO539Sydm8NF1] UID [j2FGnqJoO539Sydm8NF1] error: cannot reach device, due to either loop or fabric error.
Disk 0b.2 S/N [j2FGnqJoz7wQOI2u8cF2] UID [j2FGnqJoz7wQOI2u8cF2] error: cannot reach device, due to either loop or fabric error.

I'll take a rest, this is kinda hard 🤣
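When the console floods with lines like these, a few lines of Python can pull out the affected disks. A minimal sketch, assuming the format shown above (`Disk <name> S/N [...] UID [...] error: <text>`); other RAID messages may be formatted differently, and `failed_disks` is a hypothetical helper.

```python
import re

# Matches lines like: Disk 0b.0 S/N [xxx] UID [xxx] error: cannot reach device, ...
DISK_ERROR = re.compile(
    r"^Disk (?P<disk>\S+) S/N \[(?P<serial>[^\]]+)\] UID \[[^\]]+\] error: (?P<error>.+)$"
)


def failed_disks(console_log: str) -> dict[str, tuple[str, str]]:
    """Map disk name -> (serial number, error text) for each failed-disk line."""
    result: dict[str, tuple[str, str]] = {}
    for line in console_log.splitlines():
        match = DISK_ERROR.match(line.strip())
        if match:
            result[match["disk"]] = (match["serial"], match["error"])
    return result
```

Lines that don't match, such as the "Group name ..." summary, are simply skipped.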

fluid escarp
#

indeed 🙂

fluid escarp
#

same!

worthy oar
#

Gave up x) I can't get anything other than core dumps

twilit frigate
#

small update: I managed to get MetroCluster configured on a Simulator. Still not working very well (Switchover timeouts, fails, aggregates stay offline, etc.) but that might be due to the underlying disks being SAS and not (yet) SSD

#

@fluid escarp I keep getting messages kern.icl.sendspace too low, must be at least 16776252 and kern.icl.recvspace too low, must be at least 16776252... are those critical? is there a way (bootarg or similar) to increase those values?

fluid escarp
#

@twilit frigate Cool! I wonder if those are a function of system ram

twilit frigate
#

I used your config for the HA clusters as basis, so 8GB of RAM

#

I only added two more NICs, more disks and those 4 bootargs:

setenv bootarg.srm.virtual.adapter true
setenv bootarg.ip.if_conf "sw,e0e,e0f"
setenv bootarg.mcc.IP_mode true
setenv bootarg.mcc.is_enabled true

I'm not sure if the first one is required (what does that do in the first place?) but the others are definitely required. Setting up the MetroCluster still needs to be done manually after that (dr-group create, interface create, connection connect, aggr mirroring, etc)

fluid escarp
#

ICL appears to be the iSCSI Common Layer; something down in there is emitting the messages

twilit frigate
#

I guess it's not critical, I just want to figure out why I'm having trouble with the switchovers

fluid escarp
#

Could always try adding some RAM. See if it's better behaved with 16 GB

#

And 4 vCPUs

twilit frigate
#

good idea

rain shale
#

hmmm was there not a requirement for MCC-IP to have at least a certain number of cores?

#

I remember Niels talking about that

fluid escarp
#

I would be inclined to believe him 🙂

#

See icl_conn_start

twilit frigate
#

yeah, it's basically a sysctl tunable (are they called sysctls on BSD too?), so you can probably set them in the systemshell, but I wonder if there's a persistent way to do it via a bootloader arg or something
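As a sketch of the non-persistent route, the warnings can be turned into candidate `sysctl name=value` commands to try in the systemshell. This assumes the kern.icl values are writable at runtime, which nobody in the thread has confirmed, and anything set this way would not survive a reboot; `sysctl_commands` is a hypothetical helper.

```python
import re

# e.g. "kern.icl.sendspace too low, must be at least 16776252"
ICL_WARNING = re.compile(r"(kern\.icl\.\w+) too low, must be at least (\d+)")


def sysctl_commands(console_log: str) -> list[str]:
    """Return one 'sysctl name=value' command per distinct kern.icl warning,
    using the largest minimum reported for that name."""
    minimums: dict[str, int] = {}
    for name, minimum in ICL_WARNING.findall(console_log):
        minimums[name] = max(int(minimum), minimums.get(name, 0))
    return [f"sysctl {name}={value}" for name, value in minimums.items()]
```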

#

also I just realized that the different colors that you see in the boot log when the simulator boots are the different CPUs... with 4 CPUs, everything is so colorful 😄

fluid escarp
#

Probably shouldn't matter but you're already further along than I was.

twilit frigate
#

yeah, it looks much better now with 4 CPUs and 16 GB of RAM

worthy oar
#

If anyone has a complete set of arguments passed to QEMU for Proxmox + vSCSI disks, I'd be glad to test. Still getting core dumps KEKSad

fluid escarp
#

I started on an import script, I'll see if it's close enough to working to share.

worthy oar
#

thank you!

twilit frigate
#

@fluid escarp still no luck with MetroCluster switchover. Switchover basically works, but I keep losing one plex of my data aggregate (funnily enough, the plex on the side I'm switching over to), cl1 in my case. The event log is full of "diskown.ownerReservationMismatch: disk 0b.1 (S/N 6000c29576b23d8362cf65ef523b6aa4) is supposed to be owned by this node but has a persistent reservation placed by node cl2n1 (ID 4055836668)." messages for all the disks of that plex, and storage disk show shows the disks as Container-Type: unknown. So either there's still an issue with VMware and SCSI reservations not working properly, or something else is going wrong
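To see at a glance which node holds the stale reservations, those event messages can be grouped by reserving node. A minimal Python sketch, assuming each message is on its own line and worded exactly like the one quoted above; `reservations_by_node` is a hypothetical helper.

```python
import re
from collections import defaultdict

MISMATCH = re.compile(
    r"diskown\.ownerReservationMismatch: disk (?P<disk>\S+) "
    r"\(S/N (?P<serial>[^)]+)\).*?persistent reservation placed by node "
    r"(?P<node>\S+) \(ID (?P<node_id>\d+)\)"
)


def reservations_by_node(event_log: str) -> dict[str, list[str]]:
    """Group disks named in ownerReservationMismatch events by the node
    holding the reservation, listing each disk once in first-seen order."""
    grouped = defaultdict(list)
    for match in MISMATCH.finditer(event_log):
        disks = grouped[match["node"]]
        if match["disk"] not in disks:
            disks.append(match["disk"])
    return dict(grouped)
```

If every affected disk shows up under the partner-site node, that points at reservations not being released or cleared during switchover rather than at random disk errors.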

fluid escarp
#

Did you build a 4 node?

twilit frigate
#

so yeah, 4-node mcc

fluid escarp
#

You still got further than I did 😄

#

Disks shared by all 4 nodes, or only shared by the local partner?

twilit frigate
#

no, only locally between each HA pair

#

maybe I need to compare the environment to the one in Lab-on-Demand some more

fluid escarp
#

Did you ADP or whole disk?

twilit frigate
#

whole disk. but that's a good point, maybe I could try with ADP for a change

#

Now I'm stuck with node 2 having taken over node 1 while in switchover, and I can't do an HA giveback because "please do the heal phase for the aggregates first", and if I try to do that on node 2, it says "cannot do heal aggregates because you're in HA takeover, do a giveback first" 😄
fun times. But at least I got a bit further. Thanks for your help so far @fluid escarp

fluid escarp
#

I hope to find some time to replicate your setup once Q4 settles down.

wraith scroll
#

There are some MetroCluster Lab-on-Demand courses with sims that might give you some hints, e.g. I ran a sysconfig on one:

fluid escarp
#

For more convoluted setups, I went with ansible.
