Update 10/2/2006: My issues were probably never RAM related at all. They may have been due to bad capacitors on the motherboard. The first generation of G5 iMacs was subject to a bad batch of capacitors that were prone to bulge and leak, causing a whole host of issues including video and power outages, and in my case kernel panics. To read more about the conclusion of this 6-month saga, click here.

If there’s one thing I really didn’t want to have to deal with last week, it was “computer issues”. As life usually goes though, when you really don’t want things to go wrong, they usually do. Generally “computer issues” means troubleshooting networking or Windows problems with one of our testy homemade PCs, but this time I was having problems with the G5 iMac. My wife and I originally bought the iMac back in December of 2004 to use as our main desktop computer. A few months after the purchase I switched jobs and started working from home on the Mac full time. Since we bought it, the machine has run flawlessly. I leave it on in the office 24/7 and generally only have to restart it when I install OS updates. This is a far cry from my days of working on Windows machines, when I would consider myself lucky to go 3 days without having something freeze.

And that’s how things went. Day in and day out my Mac just ran. Until around the middle of last month. I came into the office, sat down and moved the mouse to wake up the computer. As the LCD faded in, there was a little gray box centered on the screen with my desktop and running applications shaded in the background like a Lightbox effect. In the box was a message (in several languages) that I needed to restart my computer. Having never seen this message before, I didn’t think much of it and promptly restarted the computer as the message said. As the computer started up, I heard the familiar OS X startup chime, which was followed by the display of the Apple logo and spinny widget, and then came the Gray Screen of Death:

[Image: Gray Screen of Death]

At this point, I was starting to get worried. The only thing I had backed up recently was my iPhoto archive and the book I’m working on. I needed to get back into my computer. I jumped over onto the PC to learn that both of these errors were known as kernel panics. From the linked article:

A kernel panic is a type of error that occurs when the core (kernel) of an operating system receives an instruction in an unexpected format, or that it fails to handle properly. A kernel panic may also follow when the operating system is not able to recover from a different type of error. A kernel panic can be caused by damaged or incompatible software or, more rarely, damaged or incompatible hardware.

Great… I thought I had escaped OS crashes when I started working on the Mac. I guess not. Everything I read about this problem points to software issues first, and then hardware issues. Luckily, there are a lot of great resources out there that explain how to troubleshoot Mac issues. Here are a few of my favorites:

In running through the troubleshooting steps in those guides, I thought I had it figured out. Apple’s Disk Utility reported a few errors when I tried to repair the volume. Disk Utility said that it repaired the issues, and the S.M.A.R.T. diagnostic status said the drive was OK. I was able to reboot just fine and thought I was done. That evening though, the machine crashed again. So I went back to Google and picked up where I left off. Each time I was able to eventually get the machine running again, only to have it crash within a day or two.

Eventually, I found a thread in a forum that suggested that RAM might be the issue. In researching this new theory, I found that the best way to test the RAM on a Mac isn’t the hardware checking software that comes with the computer, but a free little utility called memtest. When I tried to run memtest in single-user mode, the machine went into a kernel panic within about 3 minutes of me issuing the start command. I then removed one of my two 512MB DIMMs and ran the test again overnight. The next morning, the test had run successfully. So, I restarted the computer with only the one DIMM and as of this morning it’s been running for 2 days straight…and I’ve been working on my book all weekend.

I still need to run the test again with just the suspect DIMM to make sure it really is faulty, and run it once more with the good DIMM in the other slot to make sure that the slot isn’t bad. I have noticed that the computer is running a bit slower with only 512MB, but I think it’ll do until I can finish the testing process and pick up a new DIMM from Crucial. Hopefully this solution resolves my “computer issues” for a while.
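
For anyone who wants to keep track of the swap testing described above, here is a rough sketch of the elimination logic in Python. It has nothing to do with memtest itself. The DIMM and slot labels and the pass/fail results are made up for illustration, and it assumes that a single passing run is enough to clear both the DIMM and the slot involved, which may be optimistic for intermittent faults.

```python
def suspects(results):
    """Return (suspect_dimms, suspect_slots) from swap-test results.

    `results` maps a (dimm, slot) pair to True if the memory test
    passed with that DIMM alone in that slot, or False if it failed.
    Assumption: one passing run clears both the DIMM and the slot.
    """
    all_dimms, all_slots = set(), set()
    cleared_dimms, cleared_slots = set(), set()

    for (dimm, slot), passed in results.items():
        all_dimms.add(dimm)
        all_slots.add(slot)
        if passed:
            # A pass means neither this DIMM nor this slot caused a failure.
            cleared_dimms.add(dimm)
            cleared_slots.add(slot)

    # Anything that never took part in a passing run is still a suspect.
    return sorted(all_dimms - cleared_dimms), sorted(all_slots - cleared_slots)


# Hypothetical results for the three single-DIMM runs described above.
runs = {
    ("A", "slot1"): True,   # good DIMM alone: passed overnight
    ("B", "slot1"): False,  # removed DIMM alone: assumed to fail
    ("A", "slot2"): True,   # good DIMM in the other slot: assumed to pass
}
print(suspects(runs))
```

If the removed DIMM only ever gets tested in one slot and that slot never sees a passing run, both the DIMM and the slot stay on the suspect list, which is exactly why the extra run with the good DIMM in the other slot is worth doing.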