Windows appears to hate me

Short version: my PC is unstable when I run Microsoft Windows 8 Professional (64-bit), but the system is completely stable when I run Arch Linux off of another partition on the same SSD. This problem started out of the blue a few days ago; it does not correspond with any hardware or software modifications. The PC is a previous-generation i5 with an AMD 7870 "MYST" edition, assembled from components (most of which were purchased new this spring), and all relevant firmware has been updated.

When things go bad, they really go bad. If music is playing, it loops. If the display is on, it stops updating and the mouse becomes unresponsive. The keyboard LEDs don't toggle when I hit Caps Lock or Num Lock. Interestingly, the HD activity light stays solidly lit. The system does not respond to ping. The only recourse is a power cycle or the reset button.

Memory diagnostics are A-OK.

This is especially intriguing to me because it seems somewhat random. I've yet to have it actually die while playing a game; indeed, it seems to not occur while under heavy load.

I partially suspect there's some horrible issue with the SSD, a Crucial M4 256GB. This is speculation, but it's based on a few observations:

1) It never happens in Linux. Linux lives on the same disk, but since the disk is partitioned, perhaps there is some defective area of NAND that only the NTFS partition touches?

2) Usually when I have observed the system die during active use, it was performing I/O operations (Steam updating, downloading things, etc.).

3) This drive shipped with a pretty legendary and hilarious bug which caused it to fail when a SMART counter overflowed (after roughly 5000 hours of power-on time). This was fixed in a firmware update which I have applied, but that causes me to generally distrust the hardware.

So... I'm somewhat at a loss. Windows doesn't report anything interesting in its event log, so it's difficult for me to guess what might be happening. Has anybody seen this particular behavior in Microsoft Windows? Does it even make sense that it might be the SSD? In my experience when storage goes bad on Linux, a system will often keep running (to some extent for some time), but in this case everything seems completely hosed immediately, which makes me concerned it's something else.

I partially suspect there's some horrible issue with the SSD, a Crucial M4 256GB.

That's a reasonable suspicion. I had one fail on me quite recently, and the symptoms were fairly similar.

First thing: BACK UP your data; you could lose everything. I was actually in the process of backing up when my drive just stopped working. Fortunately, I had weekly system backups, so I didn't lose much, but I did lose a little.

Once you have a good backup, run a chkdsk on drive C, and see what it tells you.

OK, thanks for the info. I'm glad it sounds at least like a possibility, and I'll operate under this assumption for now. I ordered a new drive and I'll see if it fixes the issue (if not, I think it can be returned).

Can I just dd the contents of the existing drive over to the new one, or will Windows be cranky that it's different hardware? It looks like I'm giving up a few GB with the new drive, but that just means dd will fail when it reaches the end of the drive (that's where my Linux partition is, which I know how to deal with). I guess there are tools like Clonezilla for this purpose, but this is a one-off, so I'd think dd could get the job done.

So this is really weird. Yesterday I booted Arch and dd'd the entire drive to a file so I would have an image I could restore from if it totally crapped out. This is mainly for the NTFS partition. I know the partition I had booted Arch from would have been in an inconsistent state, but that is backed up with rdiff-backup, which is easy to restore from (I have no idea how one would do a bare-metal Windows restore from a file-based backup).
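
One way to sanity-check an image like this is to hash it and compare the result against a hash of the source. What follows is only a minimal Python sketch of that idea, not something that was run here: the function names are made up for illustration, and the comparison is only meaningful for a source that is not changing underneath you (an unmounted partition, say, not the one Linux is currently booted from).

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of everything readable from `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)  # read in 1 MiB pieces to bound memory use
            if not block:
                break
            digest.update(block)
    return digest.hexdigest()


def image_matches(source_path, image_path):
    """True if the source and the image hash to the same value."""
    return sha256_of(source_path) == sha256_of(image_path)
```

Calling `image_matches()` with a partition device and the path of the saved image (both hypothetical here) returns `True` only if every byte agrees. Reading a raw device this way generally needs root.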

At any rate, I did this and saved it to my file server. After that, I rebooted into Windows to play some Rogue Legacy, and I just left Windows running after I was done.

Well, guess what? No stability problems last night, which is the longest the system has remained stable since this issue first started. I am totally at a loss.

Can you run something like an HD Tune surface scan to check for bad sectors on an SSD? It works very well at finding bad areas of standard HDs when run from a boot disc or separate drive.

I've run fsck and chkdsk on the ext4 and NTFS volumes respectively, and I examined the SMART statistics, which are all totally fine. I should see if there is a better test to run; I will look into HD Tune. I believe there should also be a SMART self-test, which I should try.

In the process of running the dd, every single sector gets read, which is one reason I wanted to do it: to see if it might tickle the problem within Linux, where I might find more useful debug information. But that completed A-OK.
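
For anyone who wants that kind of full read pass without writing an image anywhere, here is a rough Python sketch. It is an assumption-laden illustration, not a replacement for a real surface-scan tool: the function name is hypothetical, it relies on `os.pread` (so Linux or another Unix), and reads may be served from the page cache rather than the drive if the data was touched recently.

```python
import os


def scan_for_read_errors(path, chunk_size=1024 * 1024):
    """Read `path` from start to end.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the starting
    offset of every chunk whose read raised an OSError.
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for regular files and,
        # on Linux, for block devices as well.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Note the failing chunk and skip past it instead of aborting.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of data
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

Unlike a plain dd, which stops at the first error unless told otherwise, this keeps going and reports where the failures were. A smaller `chunk_size` narrows down the bad region at the cost of speed.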

System is still up now apparently. I left it running Windows when I went in to work just to see whether it would stay alive for a while.

I hate computers.