Repairing A Failed P400 Raid Cache Battery

This was done on my HP P400 but the same method should work on a range of other cards with a similar setup such as the P212, P410 and probably many others from this era. As ever your mileage may vary!

Recently I was trying to copy one particularly large file across my network and realised the write speed on my server was dramatically reduced from what I would expect and the reported queue length in resource manager was regularly getting to 10 which seemed excessive. I was also seeing peak throughput to the 12 disk array of about 30 MB/s total.

I was copying the file off my desktop and this is where the weirdness started. I was sending from the desktop at 50MB/s which was slow but on investigation it checked out that that was correct for the drive it was coming from (though it did highlight I was still using one drive that was about 13 years old I should probably replace!). It would start fine and after a while I’d see the following:

Error 0x8007003B: An unexpected network error occurred

Error 0x8007003B: An unexpected network error occurred

I’m still not entirely sure what process is going on in the background here but what I saw was increasing ram usage (presumably at the 20MB/s difference between the sending and writing processes but I didn’t work it out). I assume at some point this buffer hits a limit and it fails. Whatever the exact process this is just a symptom of the fault elsewhere as indicated by having such an awful write speed on the array.

I started investigating the array by actually installing the HP array configuration utility (I probably should have done this ages ago but I actually configured the array through the BIOS utility originally) and on selecting the controller sure enough :

P400 Failed battery warning

Well that’s fairly definitive. So I decided to open it up and have a look at the damage. for the P400 the battery pack is on a long cable and is usually mounted to a bracket on the chassis next to the card. On the DL180 I have it installed into you have to remove three mounting screws and lift out the entire mounting frame and riser card section in one piece.

P400 card installed

Here you can see the HP P400 card (made by LSI but as far as I’m aware with no LSI branded equivalent). Piggybacked on the card is the 512MB cache card and coming out of the right end of it is the multi-coloured cable to the battery. This can be gently pulled out so the battery can be removed.

P400 position in DL180G6

The battery module as it was installed in my server, apparently someone had previously put it back in 180 degrees wrong but nevermind. Right away you can just about see the distortion in the casing caused by the battery failure.

P400 Cache Battery

A closer look at the battery pack shows how badly distorted the casing actually is indicating a total failure of the cells inside – not terribly surprising for a 13 year old battery! If you were doing this properly you’d just buy another one of this whole battery unit and replace it but I decided that for a raid card this old most second hand ones would likely be well used because I doubt they have produced new ones for a while so it’d be hit and miss anyway. Add to that the battery unit just uses four standard 1.2V NiMH cells (albeit uncommon shaped ones) I decided I could probably fix it with basic parts. The issue here was that I found this issue late on December 23rd so the challenge was to fix this with parts I could find locally before the Xmas break. Standby for the hackery…

I started trying to get into the pack on the basis it was ruined anyway. First off flip the pack over and unplug the cable from the battery pack. Technically this isn’t required nor is the board removal below but it makes it easier and takes the risk out when you’re working on the rest of it.

P400 Battery Cable connector
P400 Battery Cable connector removal

Removing the cable shows us the management board that handles charging status and health reporting to the card so we need to keep this. This is retained by a single clip in the recessed section on the left hand edge. Push the clip to the left and lift it, you may need to press it with a tool of some sort as it’s quite small and firm.

P400 Battery management PCB

Once unclipped as above the board can just be lifted out because the battery contacts are just sprung against the underside.

P400 Battery management PCB removed

Next up we need to try to get at the cells themselves. It’s a little hard to see but it turns out the underside is the ‘removable’ section of the case (as in the rest of it is a single moulded part) but mine was thin and brittle enough it just broke up as I tried to take it out but being the underside it’s not visible when reinstalled anyway.

P400 Battery enclosure cover removal

I used one of these iPod cover pry tools I had handy but since the cover broke up anyway a fine screwdriver would also do. Gently work round the cover prying it up but the cover is silicone-ed to the centre of the cells as well so just work on it – it comes of relatively easily.

P400 Battery cells exposed

Now we can see the cells for the first time. You can probably see the how the battery contacts for the PCB form part of the pack. Gently work round the pack cutting/levering the silicone apart then lift the cells out from the curved (non PCB) end of the housing to prevent damage to the contacts. Hopefully If should look like this:

P400 Battery cells removed

For anyone who wants to rebuilt their pack with the correct new cells they’re four Varta V500HT 1.2V/500mAh cells wired in series but you would have to find a good way to replace the cell links because they’re welded on. Also I could only find the cells on eBay where they were £7.50 each

So this is where my plan goes a bit more creative. I decided to just replace the the pack with some basic off the shelf NiMH cells so initially I tried to find a prebuilt pack of the right voltage. These are commonly available on eBay and several other places with a pair of wires coming out however as mentioned earlier this was now Xmas eve and so anything I ordered wouldn’t arrive for ages due to the break. I realised I had some AA NiMH cells which were also 1.2V so four of these would also be correct but I had no way of mounting them. After a brief search I found a local model shop who had suitable 4xAA holders in stock and I was the owner of this.

4xAA Battery  Case

Well worth a couple quid. Next we have to try to make this connect to the management PCB. Now if we were going totally hacky I’d have just soldered the two wires off the battery pack directly onto the contact pads on the back of the PCB and put a big bit of heat shrink over the board but that seemed a step too dodgy…however tempting it was! What I actually did was cut the board contact pads from the pack with a section of the metal from the cell side to allow a wire to be soldered on. The contact pad can them be slotted back into the original housing.

P400 Battery cell contacts reinstalled

In the picture above I slotted the PCB back in to check it made contact and to help hold the pads in place during soldering. Next I drilled a small hole into the rear of the housing to run the wires through from the other pack then soldered the wires on. The + and – are clearly marked on the housing and on the PCB so check this before soldering.

P400 Battery new pack connected

And I stuck a bit of hot glue on the contact points to stop them moving and a blob on the wire to keep it from straining the joints.

P400 Battery 3xAA complete

Now just put the AA’s in the case. If you have a battery case like this one, remember to flick the switch to ON!

Now you just plug the cable back into the PCB and install back into your computer. If you have the proper mount for it in your chassis it will just clip back in and not even be obvious the back has been removed. If you used a longer bit of black 2 core cable between the new pack and the old one you could make it even neater. Alternatively if you cut out the top of the housing you could probably fit a 2×2 AA housing sticking out through with the aid of some silicone but it’d still mount onto the standard bracket even if it would be pretty ugly. Take your pick!

P400 Battery with new 4xAA pack in the server

All done, now put the server back together and fire it up…

P400 Battery charging warning

Now on checking the HP Array configuration tool the message has changed to the above. I left it do its own devices and after a while this warning cleared. Once caching was enabled again the speed increase was dramatic. The write speed on the array for large files (>1Gb so larger than the cache) measured at 234 MB/s on the server Vs the 30 MB/s it was before. Smaller file writes which store to the cache are dramatically faster. On reattempting the large copy that kicked all this off it worked flawlessly first time. Monitoring resource manager on the server I could see that rather than a continuous write at 30 MB/s and high queue depths the write speed was periodically jumping much higher and waiting for more data in between, the queue depth was constantly at less than 1. I also didn’t see the increasing RAM usage I’d seen previously. It is also evident looking at the drive lights are mostly solid now with the occasional blink where they were constantly blinking.

Solving the DL180 G6 Fan Controller Problem

Some time ago I acquired a DL180-G6 rack mount server for very little money as these go for next to nothing on eBay in most configurations and one version has 12×3.5″ drive bays. Unfortunately there seems to be a bit of a problem with these systems where if you change any of the hardware (such as an unsupported raid card) or a whole variety of other things the on board fan controller defaults to high speed and it will sound like a jet engine. The fans are high flow high speed fans and so while they do work well they really scream at high speeds. Disconnecting the fans won’t help because if you do that the system will refuse to boot entirely.

The Fans

I decided that I needed to come up with a solution to this problem because well…how hard can it be?! Fundamentally these are standard 4 pin PWM type fans which have the following connections:

1 – GND
2 – +12V
3 – Tach
4 – PWM

Unfortunately the ones in the DL180G6 are a special case in that they have 6 pin connectors which follow a slightly different layout:

1 – +12V
2 – GND
3 – Blank
4 – Tach
5 – PWM
6 – Blank

The blank pins on the fan connector may seem a bit pointless at first but they’re like that because the motherboard support redundant pairs of fans in some configurations (I saw this on a Storageworks x1600g2 which shares the same main components). Due to this the motherboard needs a second tach (RPM) signal to monitor the second fan in the redundant pair and presumably due to the high power consumption of these fans a second +12V supply. This means you can either connect a single fan direct to the motherboard or a pair via the splitter which comes in the storage works (HP p/n  534358-001). The motherboard header looks like this:

1 – +12V-1
2 – GND
3 – +12V-2
4 – Tach – 1 (Inlet)
5 – PWM
6 – Tach – 2 (Outlet)

The good news is this means you only need one PWM per pair of fans in this configuration.

Arduino 25kHz PWM

At this point I decided I needed to build something I could fully control and so I decided to try to control it with an Arduino using the on board hardware PWM. This isn’t as easy as it sounds because PWM fans operate on a PWM control frequency of 25kHz which isn’t a standard configurable frequency on an Arduino IDE because it isn’t a direct division of the system clock. Thankfully there is a way round this. Timer 1 on the Arduino nano is a 16 bit timer and by directly manipulating the timer registers we can make the timer reset after a number of cycles so we can basically create a timer of any frequency we need. In this case  rather than being limited to prescalers (divide by 1,8,64,256,1024) from the main clock (16MHz) and 0-255 (0% – 100% duty) by using the 16 bit timer we can set a clock reset every 320 cycles which gives 25kHz at this clock frequency. I found some code which nicely did this on stack exchange and so just went with it.

https://arduino.stackexchange.com/questions/25609/set-pwm-frequency-to-25-khz

The clever bits are shown here:

    // Configure Timer 1 for PWM @ 25 kHz.
    TCCR1A = 0;           // undo the configuration done by...
    TCCR1B = 0;           // ...the Arduino core library
    TCNT1  = 0;           // reset timer
    TCCR1A = _BV(COM1A1)  // non-inverted PWM on ch. A
           | _BV(COM1B1)  // same on ch; B
           | _BV(WGM11);  // mode 10: ph. correct PWM, TOP = ICR1
    TCCR1B = _BV(WGM13)   // ditto
           | _BV(CS10);   // prescaler = 1
    ICR1   = 320;         // TOP = 320
void analogWrite25k(int pin, int value)
{
    switch (pin) {
        case 9:
            OCR1A = value;
            break;
        case 10:
            OCR1B = value;
            break;
        default:
            // no other pin will work
            break;
    }
}

The first bit sets up the timers to run at 25kHz as we need. The second bit is a special analogue write function to make sure we set the time right. We will use this function to update the timer value. We now have two PWM channels on pins 9 and 10 of our Arduino Nano.

Temperature Control

Next up I decided to add temperature control to these fans using a couple old DS18B20 temperature sensors I had lying about. These are good because multiple sensors can be daisy chained to a single data pin and the value doesn’t need scaling or anything unlike an analogue input pin and a thermistor. They cost a little more but for convenience they’re hard to argue with. They usually use +5V, Gnd and one signal pin. The signal pin needs a pullup resistor to work – this is normally specified as 4.7k. Technically these are a “1wire” device which can be run on parasitic power so the +5V isn’t actually required at the device end but if used in parasitic power mode you short it’s Vdd pin to ground. More information can be found here:

https://datasheets.maximintegrated.com/en/ds/DS18B20.pdf

There are specific Arduino libraries for using these devices so they’re really easy to use but they have one issue in this application. To do what I intended and have two separate fan zones we need to know which sensor is which. Thankfully included with the Onewire library is an example function called “DS18x20_Temperature” which will scan for devices on a specified pin. In the example this is called “ds” and it pin 10 by default but for my purposes this won’t work because we specifically need pin 9 and pin 10 for the two PWM outputs so this needs to be relocated somewhere else.

#include <OneWire.h>

// OneWire DS18S20, DS18B20, DS1822 Temperature Example
//
// http://www.pjrc.com/teensy/td_libs_OneWire.html
//
// The DallasTemperature library can do all this work for you!
// http://milesburton.com/Dallas_Temperature_Control_Library

OneWire ds(10); // on pin 10 (a 4.7K resistor is necessary)

void setup(void) {
Serial.begin(9600);
}

The code above isn’t the complete code for this example just an excerpt so you can see yours is the right one – use the one from the library.

Changing the pin to whatever you choose and running the example with serial monitor open you’ll get an output of the devices it finds. If you do this with one device connected at a time you can note down the device address for each. I marked them as #1 and #2 before hand to keep track of these later.

The Circuit

The next bit is bringing it all together electronically. We already looked at the Onewire sensors so these are really easy to connect up, I used two normal 3 pin PC fan connectors to allow these to be disconnectable for easy installation. The next bit is the fans. The PWM on fans are controlled using a pulse signal as described earlier but this isn’t a direct voltage control input. The fans need the PWM pin pulled to ground to control the signal but the output pins on the Arduino aren’t capable of absorbing the current required for direct connection so we need to add a transistor to do this for us. I went for a 2N7000 Mofset for each PWM channel because..well basically I had a bag full of them to hand and they’ll work at the output voltage of the Arduino. These will connect the fan PWM connection to ground when the Arduino output is driven high which unfortunately will reverse our PWM direction but that doesn’t matter because we can flip it back the right way in code. To correctly drive a a mosfet from an arduino you will need a gate drive resistor (to limit inrush to the Mosfet gate capacitance as this could damage the Arduino) and a gate pulldown resistor to make sure the mosfet fully turns off when the Arduino is no driving it. People will suggest all sorts of value for this but I went for a 220Ω for the gate drive and a 10kΩ for the pull down. The capacity of this Mosfet means we can use a single one for a large number of fans.

For simplicity I decided to use another of the eBay 6 pin fan connectors to connect up the cables for the fans. I intend to pass the power and tach signals directly to the motherboard so I’ll come up with a breakout cable to just isolate the PWM signal from the motherboard and allow us to feed in our own. Due to doing this we only actually need 4 pins (+12V, GND, PWM1 and PWM2) running up to the Arduino to control the fan but since I had them anyway why not!

Arduino Nano PWM Fan Breadboard

So this is the result the two connections for the DS18B20’s are on the right and the 6 pins for the fan/motherboard loom in the middle.

The Complete Code

This is where we start to bring everything together:

#include <DallasTemperature.h>
#include <OneWire.h>

// PWM output @ 25 kHz, only on pins 9 and 10.
// Output value should be between 0 and 320, inclusive.

int fanspeed1 = 25;
int fanspeed2 = 25;

float temp1;
float temp2;

#define onewirepin 3
OneWire oneWire(onewirepin);
DallasTemperature sensors(&oneWire);

DeviceAddress TempAdd1 = {0x28, 0x83, 0xC8, 0xE6, 0x3, 0x0, 0x0, 0x15};
DeviceAddress TempAdd2 = {0x28, 0x4A, 0xDE, 0xE6, 0x3, 0x0, 0x0, 0xFC};

void analogWrite25k(int pin, int value)
{
    switch (pin) {
        case 9:
            OCR1A = value;
            break;
        case 10:
            OCR1B = value;
            break;
        default:
            // no other pin will work
            break;
    }
}

void setup()
{

    Serial.begin(9600);    //Start diagnostic serial port
    
    // Configure Timer 1 for PWM @ 25 kHz.
    TCCR1A = 0;           // undo the configuration done by...
    TCCR1B = 0;           // ...the Arduino core library
    TCNT1  = 0;           // reset timer
    TCCR1A = _BV(COM1A1)  // non-inverted PWM on ch. A
           | _BV(COM1B1)  // same on ch; B
           | _BV(WGM11);  // mode 10: ph. correct PWM, TOP = ICR1
    TCCR1B = _BV(WGM13)   // ditto
           | _BV(CS10);   // prescaler = 1
    ICR1   = 320;         // TOP = 320

    // Set the PWM pins as output.
    pinMode( 9, OUTPUT);
    pinMode(10, OUTPUT);

    sensors.begin (); // Initialize the sensor and set resolution level
    sensors.setResolution(TempAdd1, 10);
    delay(1000);
    sensors.setResolution(TempAdd2, 10);
    delay(1000); 
}

void loop()
{

    sensors.requestTemperatures(); // Command all devices on bus to read temperature
    delay(1000);
    temp1 = sensors.getTempC(TempAdd1);             //Read Temp probe 1
    fanspeed1 = constrain(map((int)temp1,20,60,15,100),15,100);   
//Map fan speed for channel 1 to 25% at 30C to 100% at 60C and constrain the values to 25-100% range 
//(prevents fan from either going below 25% or trying to be set to above 100%)
    Serial.print ("Temp 1 is : ");
    Serial.print (temp1);
    Serial.print ("  Speed 1 is : ");
    Serial.println (fanspeed1);
    delay(1000);
    temp2 = sensors.getTempC(TempAdd2);            //Read Temp probe 2
    fanspeed2 = constrain(map((int)temp2,20,75,15,100),10,100);   
//Map fan speed to 25% at 30C to 100% at 75C and constrain the values to 25-100% range
    //(prevents fan from either going below 25% or trying to be set to above 100%)
    Serial.print ("Temp 2 is : ");
    Serial.print (temp2);
    Serial.print ("  Speed 2 is : ");
    Serial.println (fanspeed2);
    delay(1000);
  
    // Update Fan Speeds:
    analogWrite25k( 9, map(fanspeed1,0,100,320,0));         // Scale % fan speed from sensor one to actual PWM value for 25kHz output      
    analogWrite25k(10, map(fanspeed2,0,100,320,0));         // Scale % fan speed from sensor two to actual PWM value for 25kHz output   
    
   
}

Ok, it may not be perfect but it works.  The main bits to note are as follows:

1. The two “DeviceAddress” fields, these are the ID’s for our two DS18B20’s that we identified earlier.

2.  This line – fanspeed1 = constrain(map((int)temp1,20,60,15,100),15,100); which maps a temperature (converted to a float) between 20°C and 60°C to a fan speed of between 15% and 100% which is then constrained to this range so a temperature of 10°C doesn’t cause the fan to be given a 0% demand. You may want to change these constraints to minimise fan noise but I was being cautious and wanted to be sure the fans didn’t even slow down too far that anything overheated or triggered a BIOS warning. You can also tweak the temp vs speed mapping to optimise for a specific volume or temperature.

Map and ConstrainThere is another line like this for fanspeed2 so we can set different scaling ranges for the two sets of fans. You might only need a lower fan speed to maintain a temperature through the raid card area for example.

3. This line : analogWrite25k( 9, map(fanspeed1,0,100,320,0));
It maps our 0-100% fan speed to a value between 320-0 respectively so that we correct the signal inversion caused by using the N channel Mosfet.

The Wiring Loom

This bit is actually really simple but takes ages, buy or borrow a proper crimp tool for fan connectors a soldering iron and a good selection of heat shrink to cover everything up.

With this loom you can actually wire it up to a standard PWM fan controller off eBay or whatever if you just want a quick solution without all the soldering for the breadboard bit since it gives you power, ground and the PWM signals. Just be aware ONLY PWM fan controllers (the 4 pin ones). By doubling up the tach lines you could also take out several fans if that’s what you want so run the tach line from one fan to multiple motherboard headers if thats what you want. Here’s just how to intercept the right signal from the fan headers the way I did it:

DL180G6 Fan Wiring

Here’s what we end up with:

DL180g6 Fan Breakout

The Result

Finished Arduino DL180G6 Fan Controller
As mentioned earlier this will also work with the redundant fan systems because it passes through the tach signal of the second fan in the pair and a single PWM controls both. If doing this be careful, different servers seem to have different fan inputs configured depending on exactly what they are and you need to make sure you connect up to the right ones or the server will error on fan failure. This will probably work OK with other similar servers but can’t guarantee anything so  your mileage may vary. The other good news is that if PWM fans don’t get a PWM signal they default to 100%.

With the settings I have the fans will spin up to 100% briefly as the Arduino powers up but then spin down a few seconds later- this is normal behaviour for these fans. When the system is idling the fans seem to sit at the 15% most of the time occasionally going a little faster and now the power supply is by far the loudest part of the server so I probably wouldn’t bother setting a speed below 15% but the system is drastically quieter overall. You probably still won’t want it in your bedroom but you won’t be able to hear it everywhere in your building any more!

Other developments

When I get a bit more time I’m hoping to wire the Arduino USB connection up to an internal USB header so I can monitor the temperatures and even reprogram it when the server is running to fine tune the settings in operation. Technically its also possible to write a program to monitor CPU load and ramp the fan accordingly but this might just be too much effort.

Hopefully this will help those of you out there who have been having problems with these servers. Good luck