This was done on my HP P400, but the same method should work on a range of other cards with a similar setup, such as the P212, P410 and probably many others from this era. As ever, your mileage may vary!
Recently I was trying to copy one particularly large file across my network and realised the write speed on my server was dramatically lower than I would expect. The reported queue length in Resource Monitor was regularly hitting 10, which seemed excessive, and peak throughput to the 12-disk array was only about 30 MB/s in total.
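As an aside, if you'd rather not sit watching Resource Monitor, the same figures can be pulled from the standard PhysicalDisk performance counters; something along these lines in PowerShell should do it, though double-check the counter names on your own system:

Get-Counter '\PhysicalDisk(_Total)\Disk Write Bytes/sec' -Continuous
Get-Counter '\PhysicalDisk(_Total)\Current Disk Queue Length' -Continuous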
I was copying the file off my desktop, and this is where the weirdness started. I was sending from the desktop at 50 MB/s, which was slow, but on investigation that turned out to be correct for the drive it was coming from (though it did highlight that I was still using a drive about 13 years old that I should probably replace!). The copy would start fine, then after a while I’d see the following:
Error 0x8007003B: An unexpected network error occurred
I’m still not entirely sure what process is going on in the background here, but what I saw was steadily increasing RAM usage (presumably growing at roughly the 20 MB/s difference between the 50 MB/s coming in and the 30 MB/s being written out, though I didn’t verify it). I assume at some point this buffer hits a limit and the copy fails. Whatever the exact mechanism, this is just a symptom of the fault elsewhere, as indicated by such an awful write speed on the array.
I started investigating the array by finally installing the HP Array Configuration Utility (I probably should have done this ages ago, but I originally configured the array through the BIOS utility) and, on selecting the controller, sure enough:
Well, that’s fairly definitive. So I decided to open it up and have a look at the damage. For the P400 the battery pack is on a long cable and is usually mounted to a bracket on the chassis next to the card. On the DL180 it’s installed in, you have to remove three mounting screws and lift out the entire mounting frame and riser card section in one piece.
Here you can see the HP P400 card (made by LSI, but as far as I’m aware with no LSI-branded equivalent). Piggybacked on the card is the 512 MB cache card, and coming out of the right-hand end of it is the multi-coloured cable to the battery. This can be gently pulled out so the battery can be removed.
The battery module as it was installed in my server; apparently someone had previously put it back in 180 degrees the wrong way round, but never mind. Right away you can just about see the distortion in the casing caused by the battery failure.
A closer look at the battery pack shows how badly distorted the casing actually is, indicating a total failure of the cells inside – not terribly surprising for a 13-year-old battery! If you were doing this properly you’d just buy another complete battery unit and replace it, but for a RAID card this old most second-hand units are likely to be well used (I doubt new ones have been produced for a while), so it would be hit and miss anyway. Add to that the fact that the battery unit just uses four standard 1.2V NiMH cells (albeit unusually shaped ones), and I decided I could probably fix it with basic parts. The catch was that I found the problem late on December 23rd, so the challenge was to fix it with parts I could find locally before the Xmas break. Stand by for the hackery…
I started trying to get into the pack on the basis it was ruined anyway. First off, flip the pack over and unplug the cable from the battery pack. Technically this isn’t required, nor is the board removal below, but it makes things easier and takes the risk out of working on the rest of it.
Removing the cable reveals the management board that handles charging and reports status and health to the card, so we need to keep this. It is retained by a single clip in the recessed section on the left-hand edge. Push the clip to the left and lift it; you may need to press it with a tool of some sort as it’s quite small and firm.
Once unclipped as above, the board can simply be lifted out, because the battery contacts are just sprung against its underside.
Next up we need to get at the cells themselves. It’s a little hard to see, but it turns out the underside is the ‘removable’ section of the case (as in, the rest of it is a single moulded part). Mine was thin and brittle enough that it just broke up as I tried to take it out, but being the underside it’s not visible when reinstalled anyway.
I used one of these iPod cover pry tools I had handy, but since the cover broke up anyway a fine screwdriver would also do. Gently work round the cover, prying it up; the cover is siliconed to the centre of the cells as well, so just keep working at it – it comes off relatively easily.
Now we can see the cells for the first time. You can probably see how the battery contacts for the PCB form part of the pack. Gently work round the pack cutting/levering the silicone apart, then lift the cells out from the curved (non-PCB) end of the housing to prevent damage to the contacts. Hopefully it should look like this:
For anyone who wants to rebuild their pack with the correct new cells, they’re four Varta V500HT 1.2V/500mAh cells wired in series, but you would have to find a good way to replace the cell links because they’re welded on. Also, I could only find the cells on eBay, where they were £7.50 each.
So this is where my plan gets a bit more creative. I decided to just replace the pack with some basic off-the-shelf NiMH cells, so initially I tried to find a prebuilt pack of the right voltage (four 1.2V cells in series, so 4.8V nominal). These are commonly available on eBay and several other places with a pair of wires coming out; however, as mentioned earlier, this was now Xmas Eve, so anything I ordered wouldn’t arrive for ages due to the break. I realised I had some AA NiMH cells, which are also 1.2V, so four of those would also be correct, but I had no way of mounting them. After a brief search I found a local model shop that had suitable 4xAA holders in stock, and I was soon the owner of this.
Well worth a couple of quid. Next we have to make this connect to the management PCB. Now, if we were going totally hacky I’d have just soldered the two wires off the battery pack directly onto the contact pads on the back of the PCB and put a big bit of heat shrink over the board, but that seemed a step too dodgy… however tempting it was! What I actually did was cut the board contact pads from the pack, keeping a section of the metal from the cell side so a wire could be soldered on. The contact pads can then be slotted back into the original housing.
In the picture above I slotted the PCB back in to check it made contact and to help hold the pads in place during soldering. Next I drilled a small hole in the rear of the housing to run the wires through from the new pack, then soldered the wires on. The + and – are clearly marked on the housing and on the PCB, so check the polarity before soldering.
And I stuck a bit of hot glue on the contact points to stop them moving and a blob on the wire to keep it from straining the joints.
Now just put the AAs in the case. If you have a battery case like this one, remember to flick the switch to ON!
Now you just plug the cable back into the PCB and reinstall it in your computer. If you have the proper mount for it in your chassis it will clip straight back in, and it won’t even be obvious the back has been removed. If you used a longer piece of black two-core cable between the new pack and the old housing you could make it even neater. Alternatively, if you cut out the top of the housing you could probably fit a 2×2 AA holder sticking out through it with the aid of some silicone; it would still mount onto the standard bracket, even if it would be pretty ugly. Take your pick!
All done, now put the server back together and fire it up…
Now, on checking the HP Array Configuration Utility, the message has changed to the above. I left it to its own devices and after a while this warning cleared. Once caching was enabled again the speed increase was dramatic. The write speed on the array for large files (>1 GB, so larger than the cache) measured 234 MB/s on the server vs the 30 MB/s it was before, and smaller writes that fit in the cache are dramatically faster still. On reattempting the large copy that kicked all this off, it worked flawlessly first time. Monitoring Resource Monitor on the server, I could see that rather than a continuous write at 30 MB/s and high queue depths, the write speed was periodically jumping much higher and waiting for more data in between, and the queue depth stayed below 1. I also didn’t see the increasing RAM usage I’d seen previously. It’s also evident from the drive lights, which are mostly solid now with the occasional blink, where before they were blinking constantly.
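A couple of footnotes for anyone doing the same. If you prefer to keep an eye on the controller from a command line rather than the GUI, HP also ship a CLI version of the configuration utility (hpacucli on cards of this generation, ssacli on newer ones); assuming it’s installed, something like the following reports the controller, cache and battery status, though the slot number and exact syntax vary between versions:

hpacucli ctrl all show status
hpacucli ctrl slot=1 show

And if you want a cleaner before/after figure than a file copy gives, a synthetic sequential-write test is one option; Microsoft’s diskspd tool can do it, and the line below (a 10 GB test file, 30 seconds of 100% 1 MB sequential writes; the path and sizes are just illustrative) should show much the same difference with the cache off and on:

diskspd -c10G -d30 -w100 -b1M -o8 -t1 D:\testfile.dat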