sabre critical disk space issue


I’ve been trying for a while now to get the network storage service stable enough to use on the DB server, without much luck. This has been down to an incompatibility between the fibre channel HBAs and the central fabric. I’ve purchased some kit to replace the cards but it’s going to be a while before it can be deemed stable. In the meantime the disk usage on sabre has been knocking on 98% full and there’s no room to expand the virtual disk any further. So I think it’s got to the point where I need to do something drastic about it. As I see it there are 3 options.

1 – Migrate the sabre VM to another Xen box with more disk space spare.

We have oberon, which was purchased to expand the ganeti cluster that will absorb all the apache instances coming off scarab. Titania has space for quite a few VMs for the time being. We could always add oberon to the ganeti cluster and migrate sabre there, thus providing some data redundancy in the process. It will also free up all the “pizza boxes” (proteus, triton, nereid and galatea) for network storage development.

2 – Split the BOS db off onto Nereid. Nereid was purchased so that it can be the db service paired box for HA and load balancing. Due to the issues getting the network storage going it’s pretty much just been sitting there doing sweet FA. The advantage of this approach is that all the other dbs will not be throttled by BOS’s performance hunger. The downside is that it kinda makes it more difficult to build the db redundancy later.

3 – We accept an offer from SRCT to “borrow” a Sun direct attached storage array. This might allow us to use a VERY highly performing I/O backend (I mean this will seriously fly). However, it might in itself pose some setup issues and the unit isn’t currently under warranty so we can’t call on help if it should fail. So it could pose a bit of a risk. That said, the unit boasts a split backplane with 2 raid controllers, redundant hot swappable components, the works, and SRCT have more spares than you can shake a stick at. So if something fails we can replace it.

I’m thinking option 1 is the safest as we’re using known technology and will be the quickest to get up and running. I’d say we have about another month before this becomes a problem.
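For anyone wanting to sanity-check a "how long have we got" estimate like that, here is a minimal sketch. It assumes disk usage grows at a steady daily rate, which is a simplification. The function names and the numbers in the usage note are made up for illustration and are not measurements from sabre.

```python
import shutil


def percent_used(path="/"):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def days_until_full(total_bytes, used_bytes, growth_bytes_per_day):
    """Estimate days left before the disk fills, assuming linear growth.

    The growth rate is an assumption the caller has to supply, e.g. from
    comparing usage figures taken a week apart.
    """
    if growth_bytes_per_day <= 0:
        # Not growing (or shrinking): it never fills at this rate.
        return float("inf")
    free_bytes = max(total_bytes - used_bytes, 0)
    return free_bytes / growth_bytes_per_day
```

As a purely hypothetical example, a 1000 GB disk with 980 GB used that grows by about 1 GB a day gives `days_until_full(1000, 980, 1)`, i.e. roughly 20 days of headroom.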

The net effect of the move would be the same as (if not quicker than) the move from old sabre. The migration will mean about 15-20 mins of downtime. I propose that I do the move on the morning of the 14th of July. Please let me know if this is going to be a problem.

Edit – Adding 2 more suggestions (see comments).

4 – We move all existing services except BOS to a separate xen and leave BOS on the existing sabre. This would give the advantages of option 1 along with the possible performance headroom of option 2.

5 – Split and move to 2 separate xen guest instances on the ganeti cluster.