Over the past week I've been tackling a problem with my PC constantly crashing, which makes it practically unusable. I've been consulting with fellow GWJ'er Wayfarer, who has been generously helping me out with some troubleshooting steps.
I was wondering if anyone else on the boards has any insights that could help me move forward on solving this problem.
- AMD Ryzen 5 5600X 3.7 GHz 6-Core Processor
- ARCTIC Freezer 34 eSports DUO CPU Cooler
- MSI MPG B550 GAMING PLUS
- Crucial Ballistix 16 GB (2 x 8 GB) DDR4-3600 CL16 Memory
- Team MP33 1 TB M.2-2280 NVME Solid State Drive
- RTX 3080 Founders Edition
- Lian Li Lancool II Mesh
- ADATA XPG CORE Reactor 750 W
- Windows 10 Home
All items were purchased in April 2020 except for the GPU, which a friend of mine bought at release and resold to me.
When the PC boots up, after a certain amount of time (ranging from 30 seconds to approx. 20 minutes) Windows will freeze for a few moments, then the system will crash to the BSOD. The error message on the BSOD varies, but the most common one I've seen is "WHEA Uncorrectable error", although I've seen "System service exception" on occasion.
While on the BSOD, it will attempt to write the Windows crash dump, but it will get stuck at 0%, time out after 2-3 minutes, and then automatically reboot into the BIOS.
In this state the BOOT LED indicator on the motherboard will then light up.
If I exit the BIOS and the system reboots, instead of going into Windows it returns to the BIOS with the BOOT LED indicator still lit up (as if it can't detect my NVMe SSD).
I can then do a hard power down, start the PC back up again, and it will return to Windows (until the next BSOD).
I can let it sit in the BIOS menu for a considerable amount of time and it does not crash out. At this time it seems like the symptoms occur only when it hits Windows.
I was trying to see if there was a particular trigger that causes the BSOD (e.g. doing specific actions), and initially I thought it was heavy reads/writes to the SSD. As an experiment, though, I let the computer sit on the Windows desktop after a reboot without touching anything, and it crashed regardless.
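To put numbers on how long the system stays up before each crash, a heartbeat log is one option. Below is a rough Python sketch, not something I've run on this machine: `write_heartbeat` and `session_lengths` are made-up names, and the 30-second gap threshold is an assumption. It also assumes the drive is still writable right up to the crash, which may not hold if the SSD drops out.

```python
import os
from datetime import datetime, timedelta


def write_heartbeat(path, now=None):
    """Append the current time to a log file and force it to disk."""
    now = now or datetime.now()
    with open(path, "a", encoding="utf-8") as f:
        f.write(now.isoformat() + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to flush so the line survives a crash


def session_lengths(timestamps, max_gap=timedelta(seconds=30)):
    """Split sorted heartbeats into sessions and return each length in seconds.

    A gap longer than max_gap between two heartbeats is treated as a
    crash/reboot (assumption: heartbeats are written every few seconds).
    The last entry may be the session that is still running.
    """
    lengths = []
    start = prev = None
    for ts in timestamps:
        if start is None:
            start = prev = ts
        elif ts - prev > max_gap:
            lengths.append((prev - start).total_seconds())
            start = prev = ts
        else:
            prev = ts
    if start is not None:
        lengths.append((prev - start).total_seconds())
    return lengths
```

The idea would be to call `write_heartbeat` every few seconds from a small loop started at login, then read the file back after a few crashes and feed the parsed timestamps to `session_lengths` to see whether the uptimes cluster around anything.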
Trying to find issues via software
- To rule out CPU overheating, I downloaded a temperature monitor and watched it before the next BSOD. Right before the BSOD it was at about 40°C, which from my understanding is well within tolerances.
- I then downloaded Teamgroup's S.M.A.R.T. tool to check the health of the NVMe SSD (which was frustrating, as the system kept BSOD'ing before I could download/start the application) and it came out as "healthy".
- I then ran "chkdsk c: /f" to look for errors, but Windows did not find any.
- I then ran the "Windows Memory Diagnostic" from Control Panel, which reported no errors.
- As a last resort to rule out any software issues with Windows drivers, I did a full reimage of the PC with a fresh install of Windows 10.
After a full reimage of Windows, the system seemingly worked, but after a few hours the symptoms began occurring again. This made me feel like the issue was hardware.
Replacing NVMe SSD, issue still occurs with slight change in behaviour
At this point I was fairly sure it was a bad NVMe SSD. I've never had to send in a computer component for RMA before, and when I looked up the process it was more of a headache than expected, with me having to cover a bunch of the costs of shipping it to Taiwan and back.
Not wanting to go weeks or months without a gaming PC, I went out and purchased a Samsung 970 Evo Plus NVMe and swapped out my old, presumably dead one.
After 4 days of smooth sailing, I thought I was out of the woods.
Unfortunately, last night it BSOD'd again and started to exhibit very similar symptoms (system crashing to BSOD in intervals of about 30 seconds to 15 minutes from boot).
The only difference is that when it hits the BSOD, rather than waiting at the BSOD log gathering screen stuck at 0%, it will immediately reboot the system and return to Windows. The system will then BSOD again within 10-20 seconds, and after the second reboot return to the BIOS. From what I've seen, the BOOT LED on the motherboard does not light up.
Lastly, I updated the BIOS of my motherboard to the latest version, but there was no change in behaviour.
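To keep track of what has and hasn't been ruled out so far, here is a small Python sketch of the elimination list. The statuses are only a summary of the steps described above, not new measurements, and "cleared" just means the test didn't point at that part. `Check`, `CHECKS` and `still_suspect` are made-up names for illustration.

```python
from typing import NamedTuple


class Check(NamedTuple):
    component: str
    test: str      # what was tried, as described in this post
    cleared: bool  # True if the test did not point at this component


# Summary of the troubleshooting so far; "cleared" is a judgment call,
# since a passing test only means that test found nothing.
CHECKS = [
    Check("CPU (thermals)", "temperature monitor, about 40C before BSOD", True),
    Check("NVMe SSD", "S.M.A.R.T. healthy, chkdsk clean, drive replaced", True),
    Check("RAM", "Windows Memory Diagnostic, no errors", True),
    Check("Windows / drivers", "fresh install of Windows 10", True),
    Check("BIOS version", "updated to latest, no change", True),
    Check("Motherboard", "not tested in the steps above", False),
    Check("PSU", "not tested in the steps above", False),
    Check("GPU", "not tested in the steps above", False),
]


def still_suspect(checks):
    """Return the components that no test has pointed away from yet."""
    return [c.component for c in checks if not c.cleared]


if __name__ == "__main__":
    for name in still_suspect(CHECKS):
        print(name)
```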
Potential Next Steps?
At this point, I'm starting to run out of potential troubleshooting steps I can think of.
My next guess is that the faulty component is the motherboard and not the NVMe SSD after all.
I'm thinking my next two options are:
- Research MSI's warranty procedures, hoping it's not a massive hassle/expense to get it RMA'd since it's within warranty, and then pray like hell that's the issue.
- Walk down to the computer component store a few blocks away from my condo and just straight up buy another motherboard... and then pray like hell that's the issue.
Does anyone have any other potential next steps that I haven't thought of? Are there any other ways I can pinpoint which component is the problem?
Thank you for your help!