Random crashes

Bhuna50

Author Level
Hoping @ubuysa is still around and able to assist please :D

Had the following machine for a couple of weeks now and everything was working brilliantly but being the greedy man I am I decided to upgrade and get two more RAM sticks to go in. Purchased via PCS and added and since then had issues, so have since removed the new sticks (basically, the RAM they sent is different type so not compatible with what I had in already even though purchased through the upgrade page - RMA being sorted).

Since then, though, when I play a game I get random restarts after about 10 minutes of play. I have therefore uploaded a zipped file with the bits and pieces you ask for with a BSOD, hoping you might be able to help identify what the problem might be. I am up to date with all Windows updates and optional drivers.

I am pretty sure there are no loose connections or wires or components since I opened the case and added/removed the RAM.

I am also sure the other RAM sticks are still seated correctly (having reseated them to check) and I also ran a MEMTest and no errors found.

3DMark runs a test OK and I'm not experiencing any temperature spikes, but the games I've tried (currently F1 Manager and also Death Stranding) have both crashed on me / reset the machine completely.

Have not experienced any other BSODs or resets that I'm aware of - can surf the internet for hours etc.

Thanks

Andy

Spec:

Case
CORSAIR 4000D RGB AIRFLOW TEMPERED GLASS GAMING CASE - WHITE
Processor (CPU)
AMD Ryzen 9 7900 12 Core CPU (4.0GHz-5.4GHz/76MB CACHE/AM5)
Motherboard
ASUS® TUF GAMING B650-PLUS WIFI (DDR5, USB 3.2, 6Gb/s)
Memory (RAM)
32GB Corsair VENGEANCE RGB DDR5 6000MHz (2 x 16GB)
Graphics Card
12GB NVIDIA GEFORCE RTX 4070 Ti - HDMI, DP, LHR
1st M.2 SSD Drive
1TB SOLIDIGM P44 PRO GEN 4 M.2 NVMe PCIe SSD (up to 7000MB/sR, 6500MB/sW)
1st M.2 SSD Drive
2TB SOLIDIGM P44 PRO GEN 4 M.2 NVMe PCIe SSD (up to 7000MB/sR, 6500MB/sW)
1st Storage Drive
2TB SEAGATE BARRACUDA SATA-III 3.5" HDD, 6GB/s, 7200RPM, 256MB CACHE
1st Storage Drive
2TB SEAGATE BARRACUDA SATA-III 3.5" HDD, 6GB/s, 7200RPM, 256MB CACHE
RAID
RAID 1 (MIRRORED VOLUME - 2 x same size & model HDD / SSD)
Power Supply
CORSAIR 1000W RMx SERIES™ - MODULAR 80 PLUS GOLD, ULTRA QUIET
Power Cable
1 x 1.5 Metre UK Power Cable (Kettle Lead, 1.0mm Core)
Processor Cooling
CORSAIR H100x RGB ELITE HIGH PERFORMANCE CPU COOLER
Thermal Paste
STANDARD THERMAL PASTE FOR SUFFICIENT COOLING
Sound Card
ONBOARD 6 CHANNEL (5.1) HIGH DEF AUDIO (AS STANDARD)
Network Card
ONBOARD 2.5Gbe LAN PORT
USB/Thunderbolt Options
MIN. 2 x USB 3.0 & 2 x USB 2.0 PORTS @ BACK PANEL + MIN. 2 FRONT PORTS
Operating System
Windows 11 Home 64 Bit - inc. Single Licence [KUK-00003]
Operating System Language
United Kingdom - English Language
Windows Recovery Media
Windows 10/11 Multi-Language Recovery Image - Unlimited Downloads from Online Account
Office Software
FREE 30 Day Trial of Microsoft 365® (Operating System Required)
Anti-Virus
NO ANTI-VIRUS SOFTWARE
Browser
Microsoft® Edge
Keyboard & Mouse
LOGITECH® MK540 WIRELESS KEYBOARD & MOUSE COMBO
Warranty
3 Year Platinum Warranty (3 Year Collect & Return, 3 Year Parts, 3 Year labour)
Delivery
SATURDAY DELIVERY TO UK MAINLAND (BEFORE 2PM)
Build Time
FAST TRACK 3 WORKING DAY DISPATCH
Welcome Book
PCSpecialist Welcome Book - United Kingdom & Republic of Ireland
Logo Branding
PCSpecialist Logo
 

ubuysa

The BSOD Doctor
Waaaa? Oh, hang on, I'm just lazing on a sunbed on the beach, it's 27C and sunny. The sea was glorious for my swim. Most of the tourists have gone now, so the beach belongs to the locals again! Eh? Oh, you don't want to know that? OK then...

On balance (and this one isn't clear-cut) I think that flaky RAM is the most likely cause of what you're seeing.

The one dump fails with an exception code of 0xC0000005 (a memory access violation), and in a Microsoft module (which tends to suggest a hardware cause). The 0xC0000005 exception doesn't always indicate bad RAM, it just indicates that the referenced page was invalid (not allocated, paged out, or bad) and the usual cause is a flaky third-party driver screwing up its memory pointers and pointing at the wrong memory location.

However, in your Application log there are a great many application error messages with either 0xC0000005 or 0xC0000409 exceptions. The 0xC0000409 exception is a stack buffer overrun, which means that a stack-based buffer pointer was pointing beyond the end of the buffer. That's either a third-party driver fouling up its buffer pointer or the buffer is in bad RAM.

There are way more of the application error messages (with 0xC0000005 or 0xC0000409 exceptions) than I would expect to see in a normal log, and they are generally for different executables. All that makes a strong case for RAM. That you've been messing with RAM is another big clue too, as I'm sure you realise.
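To put a number on "way more than I would expect", something like the following could tally the Application Error messages by exception code. It's only a rough sketch: it assumes the message text has already been exported from the Application log, and the sample messages and executable names below are made up for illustration, not taken from the uploaded logs.

```python
import re
from collections import Counter

# Names of the two NTSTATUS codes discussed above.
STATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}

APP_RE = re.compile(r"Faulting application name:\s*([^,]+)", re.IGNORECASE)
CODE_RE = re.compile(r"Exception code:\s*(0x[0-9a-fA-F]+)", re.IGNORECASE)


def tally_exceptions(messages):
    """Return (events per exception code, executables seen per exception code)."""
    counts = Counter()
    exes = {}
    for text in messages:
        app = APP_RE.search(text)
        code = CODE_RE.search(text)
        if not (app and code):
            continue  # not an Application Error style message
        value = int(code.group(1), 16)
        counts[value] += 1
        exes.setdefault(value, set()).add(app.group(1).strip().lower())
    return counts, exes


# Hypothetical sample data, for illustration only.
SAMPLE = [
    "Faulting application name: game.exe, version: 1.0.0.0\nException code: 0xc0000005",
    "Faulting application name: browser.exe, version: 2.0.0.0\nException code: 0xc0000005",
    "Faulting application name: launcher.exe, version: 3.0.0.0\nException code: 0xc0000409",
]

counts, exes = tally_exceptions(SAMPLE)
for value, n in counts.most_common():
    name = STATUS_NAMES.get(value, "unknown")
    print(f"0x{value:08X} {name}: {n} events across {len(exes[value])} executables")
```

Many events spread across many unrelated executables would point more towards hardware than towards one misbehaving program.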

If re-seating the existing RAM doesn't help, then remove the existing RAM, install just the new RAM, and see whether that stops the problems.

There is one other potential issue in the dump relating to the LAN adapter driver. There's a lot going on in this dump, but in general a networking operation was in progress: we see the Microsoft networking drivers called (ndis.sys, for example) and the third-party LAN adapter driver rtwlane601.sys. But there is also some sort of USB3 operation in progress: we see Microsoft USB drivers being called (USBXHCI.sys and UsbHub3.sys) and the Windows Driver Foundation root driver Wdf01000.sys. That one is called by any third-party driver written to use the WDF libraries, but we can't see the actual third-party driver. We do see the Windows HIDCLASS.sys and hidusb.sys drivers called, so this will be a human interface device (a mouse or keyboard), in which case we might suspect the third-party mouse or keyboard driver.

The one third-party driver that we do see (rtwlane601.sys) is dated July 2022...
Code:
1: kd> lmDvm rtwlane601
Browse full module list
start             end                 module name
fffff802`a6350000 fffff802`a6c16000   rtwlane601 T (no symbols)   
    Loaded symbol image file: rtwlane601.sys
    Image path: rtwlane601.sys
    Image name: rtwlane601.sys
    Browse all global symbols  functions  data
    Timestamp:        Mon Jul  4 06:35:11 2022 (62C25FEF)
    CheckSum:         008B8047
    ImageSize:        008C6000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:
You might want to see whether there is an update for that driver, but just because it's referenced in the dump doesn't mean it's automatically at fault. If there is a RAM issue, it can fail in any driver or module.
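For anyone wondering where the July 2022 date comes from, the hex value in brackets after the timestamp can be decoded by hand. This is a small sketch that assumes the value is a plain count of seconds since 1 January 1970 UTC (which it appears to be here, since it decodes to the same date the debugger prints); the helper names are my own.

```python
from datetime import datetime, timezone


def decode_pe_timestamp(hex_value):
    """Treat the hex timestamp as seconds since the Unix epoch (UTC)."""
    return datetime.fromtimestamp(int(hex_value, 16), tz=timezone.utc)


def parse_kd_address(text):
    """Convert a debugger address such as fffff802`a6350000 to an integer."""
    return int(text.replace("`", ""), 16)


built = decode_pe_timestamp("62C25FEF")
size = parse_kd_address("fffff802`a6c16000") - parse_kd_address("fffff802`a6350000")

print(built.isoformat())   # 2022-07-04T03:35:11+00:00
print(f"{size:08X}")       # 008C6000, matching ImageSize in the listing
```

The decoded time is 03:35:11 UTC, three hours behind the 06:35:11 in the listing, which would be consistent with the debugger displaying the timestamp in the local time zone of the machine doing the analysis.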
 

Bhuna50

Author Level
Thanks ubuysa

A few things for me to try and check. A thought on the USB 3 bit: I do leave a USB drive plugged in, so I will eject that and unplug it for a long test run too.

By the way. Always interested in nice beach visits. As a tourist of course I’m sure we will be back to Cyprus at some point 😂😂
 

Bhuna50

Author Level
Ok. I think I’m making progress.

Short of resorting to a clean install, I have now completed the following and think I’ve narrowed it down to the graphics card driver / hardware.

To rule out RAM issues I have tried both pairs of sticks and run Memtest on them - no errors.

I have also run Furmark but had no issues arise (but still think it’s graphics related).

I also ran Cinebench - no issues.

I took out the graphics card and reseated it. Noted that the supporting arm isn’t really that supportive, so adjusted that.

Ran my game - crash.

But when I now look at event viewer all the system warnings are around an Nvidia OpenGL driver issue / cannot be found / failure.

1698786164246.png

So now I’m on a clean install of the Nvidia driver, using the Clean install option from today's driver update.

If this fails not sure where to go from here other than a clean install which I’m trying to avoid so soon. Lol.

Also for information I cannot run 4 sticks at 6000. There are known issues still according to PCS reply I got so will be returning my two new sticks.
 

Bhuna50

Author Level
OK, not sure if I'm on a wild goose chase here or not, but after completely uninstalling via DDU and then letting Windows Update pick up the driver, I tried again - game reset again - this time though:

1698788309021.png


that warning says ran out of memory???!!!???
 

SpyderTracks

We love you Ukraine
Windows update can't configure graphics drivers, you have to install directly from Nvidia.
 

SpyderTracks

We love you Ukraine
Every time I do, Windows installs:
View attachment 39129

so it does do some form of install - I then run the NVIDIA driver update.
There's a specific process to run it, follow the guide here, you have to disconnect from the internet

 

Martinr36

MOST VALUED CONTRIBUTOR
Have you tried uninstalling the graphics driver using DDU?

 

Bhuna50

Author Level
There's a specific process to run it, follow the guide here, you have to disconnect from the internet

I was, so where would this be picking it up from? OK, I'm exhausted as I've been up since 3am, so will try in the morning. Thanks all.
 

ubuysa

The BSOD Doctor
On a slightly different tack, are there any relevant dumps in any of the folders under C:\Windows\LiveKernelEvents? Look especially in the C:\Windows\LiveKernelEvents\WATCHDOG folder. Upload any you find...
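If clicking through the subfolders is a pain, a short script along these lines could list whatever dump files are in there, newest first. It's a sketch only: the function name is my own, it assumes the dumps use the .dmp extension, and it will probably need to be run from an elevated (administrator) prompt to be allowed to read that folder.

```python
from datetime import datetime
from pathlib import Path


def find_dumps(root):
    """Return (path, size in bytes, modified time) for each .dmp file under root."""
    root = Path(root)
    if not root.is_dir():
        return []  # folder missing (or not on Windows): nothing to report
    found = []
    for path in root.rglob("*.dmp"):
        info = path.stat()
        found.append((path, info.st_size, datetime.fromtimestamp(info.st_mtime)))
    # Newest first, so the dumps closest to the latest crash are at the top.
    return sorted(found, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    for path, size, modified in find_dumps(r"C:\Windows\LiveKernelEvents"):
        print(f"{modified:%Y-%m-%d %H:%M:%S}  {size:>12,}  {path}")
```

Comparing the modified times against when the crashes happened shows quickly whether any of the dumps are relevant.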
 

ubuysa

The BSOD Doctor
Those two dumps are interesting. Live kernel event dumps are taken when a problem occurs but Windows is able to recover, usually by crashing just the offending address space. Both of these dumps relate to USB3 issues - something I saw in the earlier BSOD dump you uploaded. Which is REALLY interesting.

Both of the live kernel dumps are the same, they are both 0x144 BUGCODE_USB3_DRIVER, with an argument 1 value of 0x1020. That indicates that the problem was caused by the USB3 driver indicating a completion event that was not outstanding on the USB3 controller. We can see this in the failure bucket in the dump...
Code:
FAILURE_BUCKET_ID:  LKD_0x144_INVALID_TRANSFER_EVENT_PTR_ED_0_DUPLICATE_USBXHCI!TelemetryData_CreateReport
That indicates that there was a 'duplicate event' (flagging a completion event twice) in the TelemetryData_CreateReport function of the usbxhci.sys driver. That's a Windows driver (so it's not at fault) and it's part of the USB3 driver stack.

Also in the dump we can access the hardware identifiers for the specific device...
Code:
HARDWARE_ID:  VEN_1022&DEV_43F7&REV_0001
As far as I'm able to tell that's an AMD chipset device, probably USB3 related since it's involved here. Certainly the VEN_1022 is for AMD, but I can't pin down the DEV_43F7. Looking at all of the AMD devices, the 43xx series are mostly USB/SATA/PCI related.
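The hardware ID string itself is easy to pick apart. Here is a minimal sketch that splits it into vendor, device and revision; the function name and the small vendor table are my own (the table holds a few well-known PCI vendor IDs, not a full database), and the device ID is deliberately left unresolved because I couldn't pin it down.

```python
import re

HWID_RE = re.compile(
    r"VEN_([0-9A-F]{4})&DEV_([0-9A-F]{4})(?:&REV_([0-9A-F]{2,4}))?",
    re.IGNORECASE,
)

# A few well-known PCI vendor IDs; anything else is reported as unknown.
KNOWN_VENDORS = {
    0x1022: "AMD",
    0x10DE: "NVIDIA",
    0x10EC: "Realtek",
    0x8086: "Intel",
}


def parse_hardware_id(text):
    """Extract vendor, device and revision numbers from a HARDWARE_ID line."""
    match = HWID_RE.search(text)
    if match is None:
        return None
    vendor_id = int(match.group(1), 16)
    revision = int(match.group(3), 16) if match.group(3) else None
    return {
        "vendor_id": vendor_id,
        "vendor_name": KNOWN_VENDORS.get(vendor_id, "unknown"),
        "device_id": int(match.group(2), 16),
        "revision": revision,
    }


info = parse_hardware_id("HARDWARE_ID:  VEN_1022&DEV_43F7&REV_0001")
print(f"vendor 0x{info['vendor_id']:04X} ({info['vendor_name']}), "
      f"device 0x{info['device_id']:04X}, revision {info['revision']}")
```

The same VEN/DEV pair also appears on the Details tab (Hardware Ids) of a device in Device Manager, which is one way to match the ID to an actual device on the machine.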

I'm encouraged that all the dumps we have are USB related. That suggests a possible AMD chipset driver issue related to the USB3 ports.

I would use the AMD Drivers & Support tool to see whether there are any chipset driver updates available. It might also be worth talking to PCS to see whether there is a BIOS/AGESA update that relates to USB3 issues.
 

Bhuna50

Author Level
@ubuysa OK some further tests:

I unplugged all USB devices except the keyboard. Put the mouse onto the motherboard's BT so I could remove the BT USB dongle, and also removed the external drive and monitor USB connections from the desktop machine.

Ran the 3DMark stress test in windowed mode and it appeared to work until I pressed Escape - then the machine rebooted.

So I thought, let's clear all logs and do a run through. So, cleared logs, loaded up 3DMark, and ran the stress test - machine rebooted at 17:38:26 - so have now uploaded all logs to here:


However, I have noticed the last few crashes there have been no minidump files or anything new in Live Kernel Events folder.

I think it might be time for a CLEAN install, but will wait out for PCS response first to my message.

Best start making a list of the drivers / programs that the machine came with so I can have them all ready to install LOL :D. e.g. Armoury Crate, iCUE, etc :D.
 