#Major problems - new build crashing (experts needed)


frail locust
#

Alright. This is thread two because thread one is frankly too long at this point for anyone new to jump in and try to help.

I made a brand new build. It crashes frequently. (every thirty minutes or 1-2 hours depending on hardware config)
absolutely brand new. Windows 11 fresh install, all new hardware.
4090, 13900k, 64gb(4x16) trident ddr5 6000mhz cl40, gigabyte aorus z790 master, multiple Samsung 990 pro drives, single ironwolf pro 12tb. Corsair Ax1600 psu. Custom hard-line loop with oversized radiators.

All drivers up to date, BIOS is up to date (current version F5k, released December 27th).

Crashing began before the system was even set up (software still downloading). It crashed every 30 minutes. Crashes involve the screen freezing and the fans ramping up. No BSOD shown, no minidump file generated.

Tests and steps already taken:

  • crashes regardless of whether XMP is enabled.
  • disabling all turbo, energy saving, etc. on the CPU. Still crashing.
  • removed all RAM and tested one stick at a time. A single stick crashed after 1 hour and 30 minutes. Another single stick from a different package crashed after two hours. This suggests it is likely not a RAM issue. The fact that 1/4 of the RAM crashed approximately 1/4 as often is extremely strange.
  • memtest64 produced no errors. No crashes with all 4 sticks in slot over several hours.
  • furmark stress test showed crashing at usual interval without change. 4090 max temp during stress test: 60 degrees.
  • prime95 stress test showed crashing at usual interval without change. 13900k max temp during stress test: 72 degrees.

After these tests, I have since completely replaced the motherboard, ram, cpu, and the ssd that windows was on with same SKU components. So a new motherboard, new ram, new cpu, and new ssd for windows. THE CRASHING IS STILL HAPPENING.

The only components that have remained the same are: gpu, psu, gpu vertical bracket, fans, pump, rads, case.

#

I need serious expert help. The only thing I know for certain at this point, is that after 15 years of building my own computers, this is the absolute last time I will ever under any circumstances build a pc. The amount of frustration, and time wasted on this piece of garbage makes me want to take a sledgehammer to it. Nevermind that this is all happening with components that cost more than they ever have in the past. This build easily cost 3x what a "top of the line" would have cost 5 years ago.

fresh canopy
#

So this is an LGA 1700 build. This socket is known for deforming CPUs with the ILM because Intel was huffing Gasoline when they put this socket out and they know it.

#

Can you post a picture of your CPU heatspreader after removing the cooler? I want to see the thermal paste spread pattern.

frail locust
#

I can't believe that the ILM bending issue is causing massive instability on not one, but two unique sets of 13900k and z790 motherboard. Sure, maybe the ILM issue can be bad enough in some situations to cause this kind of problem, but what are the odds that it's going to happen two times in a row with 2 different sets of cpu/ram/mobo? If this was happening that frequently, then I'd be able to see other reports of my problem.

#

But right now I'm not seeing anything like what I'm experiencing.

fresh canopy
#

The odds are high. That's why those contact frame products exist.

frail locust
#

From everything I'm seeing, those exist largely for more common issues such as poor cooling. Even the Gamers Nexus video you linked had no mention of complete system instability.

#

Is it possible this is being caused or exacerbated by a cpu backplate that is incorrectly milled?

#

The EKWB backplate I am using has 4 holes drilled in it, presumably to allow the visible screw terminals to poke through, and for the backplate to only make contact with the frame, and not the screws in the frame.

#

This is not the case. The holes are drilled at the wrong position, so the backplate is making contact with all 4 screws in the cpu frame.

#

If this ilm issue was happening on its own and frequently causing this much instability, it wouldn't be generating third party products. It'd be generating class action lawsuits and the ftc investigating Intel in the litigious society that is America. It doesn't make sense that this would be common enough to experience it twice in a row with an all new set of parts unless there was a contributing factor.

#

Could this be the cause?

fresh canopy
#

No, that would not cause the issue. In GN's own piece they state that the ILM bending can cause memory detection and PCIe stability issues from improper mounting pressure in the socket.

#

Your CPU backplate is not going to cause these issues.

frail locust
#

Since I hadn't yet wiped the paste off the old cpu, I took a closer look at it.

#

The paste is even from right to left. The paste however appears to have had poor contact at the very top of the chip. Quite a bit different from what GN described, and also away from ram area.

#

Furthermore, I unplugged the pcie vertical mount bracket, so now the gpu is effectively unplugged from the system.

#

The system has now been running for about an hour without crash, significantly longer than any previous time.

#

This suggests to me it's either a gpu, pcie, or an issue with pcie traces on the cpu.

#

Could the vertical mount pcie extension strip be impacting it? I have used this bracket in a previous setup without issue, however that setup was pcie4.0 while this is obviously 5.0... It should only be the same metal strips in the extension, but maybe?

#

Worst case scenario for me is that this is a gpu hardware failure.

fresh canopy
#

Hmm. Try reinstalling the GPU without the extension if you can.

frail locust
#

Do you realize how difficult of a question you're asking?

fresh canopy
#

LOL I understand it's all hardlined and everything.

#

Do you not have any soft tubing for this test?

frail locust
#

It takes a solid 15 minutes just to drain this loop.

#

I probably have some somewhere, but I'm not sure I have any fittings.

fresh canopy
#

Do you have a hardware store nearby? You just need some brass barbs and zip ties.

frail locust
#

I just want to throw this fucking thing out the window at this point.

#

I'm never building any computer ever again. This is so fucking stupid.

fresh canopy
#

Did you put everything together and test it before you hardlined it?

frail locust
#

No because to do that I'd have to have purchased an air cooler or an aio purely for the purpose of testing it.

#

1700 is a new pattern that I don't have any coolers for besides the block I got for this one.

fresh canopy
#

What coolers do you have?

frail locust
#

The two air coolers I've retained are a cryorig H7 and an R1 universal.

fresh canopy
#

Ahh.

#

Well at the very least if I were that strapped I would have soft-tubed everything up first before doing a hardline run.

frail locust
#

The R1 needs proper hardware with its weight and size. Can't just zip tie it on.

#

OK, and how would I soft tube it outside the box anyways?

fresh canopy
#

You got eager and you're frustrated because you got eager and didn't test. It happens.

frail locust
#

Dude I literally did a full hardware replacement. Like how tf does that not fix it?

fresh canopy
#

Because obviously something you reused has an issue.

frail locust
#

I can't possibly understand how replacing the motherboard, cpu, ram, AND the OS ssd outright doesn't resolve all issues.

#

OK, but how?

#

If it's a gpu issue, why is it not generating a dump file when it crashes?

#

Like sure, fine. No bsod because gpu is dead and can't display it.

#

But where's the dump file?

#

The only way I see no dump file happening on a crash is if the cpu or mobo is so shot that windows can't manage to generate one.

fresh canopy
#

Or it fails in a way that the system is not able to catch itself and generate a dump. That does happen.
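
One way to confirm that no dump was written is to look in the default Windows locations directly. A minimal Python sketch, assuming the default paths (`C:\Windows\Minidump` for minidumps and `C:\Windows\MEMORY.DMP` for the full or kernel dump). `find_dump_files` is a hypothetical helper name, and a system configured with a different dump location would need different paths:

```python
from pathlib import Path

# Default Windows crash-dump locations (assumed; the dump path is configurable).
DEFAULT_DUMP_DIR = Path(r"C:\Windows\Minidump")
DEFAULT_FULL_DUMP = Path(r"C:\Windows\MEMORY.DMP")


def find_dump_files(dump_dir=DEFAULT_DUMP_DIR, full_dump=DEFAULT_FULL_DUMP):
    """Return a sorted list of crash-dump files that exist on disk."""
    found = []
    dump_dir = Path(dump_dir)
    if dump_dir.is_dir():
        # Minidumps are written as individual .dmp files in this folder.
        found.extend(p for p in dump_dir.iterdir() if p.suffix.lower() == ".dmp")
    full_dump = Path(full_dump)
    if full_dump.is_file():
        found.append(full_dump)
    return sorted(found)


if __name__ == "__main__":
    dumps = find_dump_files()
    if dumps:
        for path in dumps:
            print(path)
    else:
        print("No dump files found in the default locations.")
```

Reading the Minidump folder may require an elevated prompt. An empty result only shows that nothing was written to these paths; it does not say why.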

frail locust
#

Yeah but how does a gpu do that?

fresh canopy
#

The GPU PCIe lanes come directly from the CPU

frail locust
#

It's technically not a critical component. I get that the system can't fall back on the igpu in this setup, but even if it can no longer generate an image, it should be able to generate a file.

fresh canopy
#

Try going into BIOS and setting the primary slot speed to Gen 4.0

frail locust
#

Yes, but for the issue to crash the cpu, it'd have to be an issue with the cpu itself, no?

fresh canopy
frail locust
#

I'm not super familiar on exactly how much a gpu and cpu interface with each other, but I don't understand why a gpu hardware problem would nuke the cpu.

fresh canopy
#

Because the GPU connects directly to the CPU. There is no middle component.

frail locust
#

And in a repeatable fashion where over probably two dozen crashes not a single bsod or dump file gets generated.

fresh canopy
#

If you touch a tree, then you touch a tree. If you pick up a pole and touch the tree with the pole, then you are touching the pole that is touching the tree. Savvy?

frail locust
#

Can we glean anything from just how repeatable this crash is? With very similar timing every time?

fresh canopy
#

Yeah it may be the riser cable. Go into BIOS and MANUALLY set your PCIe primary slot speed to Gen 4.0

frail locust
#

Pretty much always crashes at 30 minutes +/- 5min

fresh canopy
#

See if the failure repeats.

frail locust
#

OK that'll be my next step. For now I'm going to keep it running another few hours hoping it'll crash without the gpu attached so that way I can rule out a gpu hardware failure.

fresh canopy
#

It could be an issue with PCIe autonegotiation on a riser that is not rated for PCIe gen 5.

frail locust
#

I really really really don't want to have to deal with the mess of replacing this gpu considering current return policies surrounding gpus, and the fact I put a water block on it.

fresh canopy
#

Where are you purchasing from?

#

Country?

frail locust
#

So my plan is another 2 hours of uptime. If no crashes, then I'm going to plug the gpu in as is and test again (since I've moved it around the house, possibly the outlet I was on before was the issue, and wires were overheating after a certain time).
Then I'll test with the riser removed after draining.

#

USA. Luckily the gpu is one of the only things that didn't come from Newegg.

#

Came from microcenter.

fresh canopy
#

Yeah. Then it won't matter as long as you can restore the GPU to its original shipping state.

#

The warranty is on the card and the shipped cooling solution.

frail locust
#

Sure, but technically their little tamper stickers are messed with.

#

Their little illegal tamper stickers.

fresh canopy
#

Doesn't matter. Warranties and returns cannot be voided for that reason.

#

They would have to prove that what you did after removing that sticker caused the device to fail.

#

The act of removing or breaking the seal is not enough to legally void warranty or return obligations.

frail locust
fresh canopy
#

That's why you involve your State's Attorney General's office. They typically handle consumer protections.

#

Corporations REALLY don't like state-paid lawyers.

#

Besides, most component manufacturers will accept the returns even if you have video evidence that you dropped it out of an airplane.

frail locust
#

I see some reviews on the riser cable. Several saying it works great on their 4090, and they probably aren't disabling pcie5.0 if their board uses it. I do see one that says using the cable resulted in resizable bar getting disabled on their board.

fresh canopy
#

I'm seeing posts that say riser cables should always match the Motherboard specification or exceed it.

frail locust
#

Just changed the bios setting for cpu pcie lane from auto to Gen 4. Didn't mess with the PCH lane, but that shouldn't matter for gpu.

#

Let's see what happens now.

#

Well I managed to generate a crash. And off schedule too.

#

Generated it by launching metro exodus. Figured that'd be a good stress test.

fresh canopy
#

Yeah. Definitely need to try without the riser then

frail locust
fresh canopy
#

Yeah.... that's still not good.

#

So it looks like the top half of the CPU is being pressed down.

frail locust
#

Similar to the last one. Poor top center contact.
It's a little thick, but that's as thin as I could get the kryonaut with the spreader available

#

You mean bottom half?

#

That's where all the paste has been squeezed out, and it's collected on the top half.

fresh canopy
#

No. The bottom half is making contact. That's why the paste spread is thin.

#

The top half is not, which is why you have that thick spread

frail locust
#

Yeah, so the top half is not being pressed down.

fresh canopy
#

No. The top half is being pressed down, more so than the bottom half.

frail locust
#

You mean by the ilm?

fresh canopy
#

Yeah.

frail locust
#

And shouldn't it be pretty even despite that so long as it's not bending? The way this water block is put on is not like past water blocks I've dealt with.

#

This uses spring pressure on all 4 sides, not screw pressure.

#

All 4 corners screwed in like this.

fresh canopy
#

Again it's not the waterblock. The mounting force of the waterblock is minuscule compared to the ILM.

frail locust
#

I know, but even if the cpu is canted, shouldn't the waterblock self level?

#

As in, level itself against the ihs?

fresh canopy
#

IT. IS. NOT. THE. WATERBLOCK.

#

Let me explain.

#

The ILM exerts dozens of pounds of pressure on the CPU.

frail locust
#

Like I understand the waterblock isn't what's causing the issue. If the inconsistent pressure is the cause of this issue, the waterblock has nothing to do with it.

fresh canopy
#

Inside the socket are 1700 little springs that push AGAINST that mounting pressure to ensure good electrical contact.

frail locust
#

I just don't understand why the thermal paste contact patch is uneven with this specific waterblock design, even if the ihs is canted.

fresh canopy
#

The ILM is exerting all the force on the LONG side of a rectangular component in only TWO locations in the middle.

#

So the pins that are NOT directly in this area are sometimes able to push the CPU substrate AWAY, causing poor or no contact for the pins.

#

Does that graphic make it more understandable?

#

Basically, the force from the ILM and the counterforce from the socket pins cause the CPU to deform.

frail locust
#

Yeah I understand it. I understand what's going on with the pins and ilm. My question was kind of unrelated haha, and not really at all relevant to the current problem.

#

I just looked hoping that microcenter would have these brackets, but they don't.

#

Is there any way to solve this issue without a third party bracket that's going to take another week+ to get here?

fresh canopy
#

Only way to be sure is by eliminating the riser cable as the problem.

frail locust
#

That'd still leave the possibility that the gpu is a failure.

fresh canopy
#

Potentially.

#

But not likely.

frail locust
#

And I mean the ilm issue. Is there any way to solve that without a bracket?

fresh canopy
#

Reliably? no.

#

Intel ADMITTED to me on Facebook that it is a faulty design

frail locust
#

That doesn't help me in the here and now lol.

fresh canopy
#

Nope

frail locust
#

And still, it seems that the issue isn't 100%. It doesn't always happen. Otherwise threads like this one of computers crashing every 30 minutes or more often would be absolutely everywhere.

#

And what are boutiques like maingear doing in relation to the issue? Are they just by default using third party brackets?

#

Could it be this specific cpu that's more vulnerable to it? Maybe this specific motherboard?

#

Honestly I've not been able to find a single other case similar to mine in my searching.

fresh canopy
#

There is no rhyme or reason to how each CPU responds.

frail locust
#

Yeah but again this isn't one cpu, this is two. And two motherboards.

fresh canopy
#

Most times the deformation is center-right in the IHS.

frail locust
#

Both of these cpus showed similar thermal paste patterns to one another.

fresh canopy
#

Maybe you just got a unicorn.

frail locust
#

Two unicorns then?

fresh canopy
#

Hey it could happen. XD

#

You got the exact same board both times, right?

frail locust
#

Yes. And the RMA rate for this board doesn't seem crazy.

#

The only issues I've really found at all on the thing are people having issues with the 10g Lan chip.

fresh canopy
#

So it's either going to be the GPU or the socket bend.

#

And the GPU might be caused by the riser

frail locust
#

Well, this riser and vertical bracket is going to be a giant pain to get out.

#

So I think I have a way to eliminate the riser as a cause.

#

I do have a waterblock 2080ti that I'll swap in.

#

With similar enough mounts to the 4090 that I won't have to mess with soft tubing or recut hard tube.

fresh canopy
#

And that would allow you to eliminate or confirm the 4090 as the problem

frail locust
#

Yes. The most important thing to do imo.

#

It took like two hours just to install this waterblock. And it's a very cool waterblock, that held the card very cool after 30 minutes of furmark stress, but I don't want to have to mess with it again.

#

It's one of EK's new front and back double blocks.

frail locust
#

Is it possible the 4090 could uniquely have issues with the riser cable?

#

Issues that wouldn't happen on the 2080ti due to lower bandwidth?

fresh canopy
#

Yes, as the 2080ti is a PCIe Gen 3 card and the 4090 is a gen 4.

#

What riser cable did you use?

frail locust
#

It's a lian li cable. Should be gen4.

#

When did Gen 4 get introduced? I know we're onto Gen 5 now.

fresh canopy
#

Gen 4 was a short-lived unicorn standard. Make sure the cable is Gen 4.

frail locust
#

I would have gotten this riser cable back in late ~2020, early 2021 probably.

#

I don't see anything indicating a generation stamped anywhere on it.

fresh canopy
#

Remove it from the equation and test a known-good device.

frail locust
#

I don't have anything known good that is Gen 4.

#

And lian li doesn't make things easy. There are both older pcie gen3 and newer pcie gen4 risers and brackets under this exact same SKU.

#

With no visual differences.

#

4090 is technically Gen 4. What happens if I force the motherboard to Gen 3 with it installed? Will there be any problems, or will it just throttle? (assuming riser is the issue)

fresh canopy
#

Ehh you might see some performance decreases.
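
For a rough sense of what forcing Gen 3 gives up, the theoretical link bandwidth can be worked out from the per-lane transfer rates. A minimal Python sketch, assuming an x16 link and counting only the 128b/130b line encoding (protocol overhead is ignored, so real throughput is lower). `link_bandwidth_gb_s` is a hypothetical helper name:

```python
# Per-lane raw transfer rate in GT/s for each PCIe generation.
TRANSFER_RATE_GT_S = {3: 8.0, 4: 16.0, 5: 32.0}

# Gen 3, 4 and 5 all use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gb_s(gen, lanes=16):
    """Approximate one-direction bandwidth of a PCIe link in GB/s."""
    bits_per_second = TRANSFER_RATE_GT_S[gen] * 1e9 * ENCODING_EFFICIENCY * lanes
    return bits_per_second / 8 / 1e9


for gen in (3, 4, 5):
    print(f"Gen {gen} x16: {link_bandwidth_gb_s(gen):.2f} GB/s")
# Gen 3 x16: 15.75 GB/s
# Gen 4 x16: 31.51 GB/s
# Gen 5 x16: 63.02 GB/s
```

Dropping from Gen 4 to Gen 3 halves the theoretical ceiling. How much of that shows up as lost performance depends on the workload.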

frail locust
#

So yeah just throttling then to test and see if the issue was the riser running Gen 4, or the gpu.

#

Because for clarification, I'm now an hour in with the 2080 without a crash.

fresh canopy
#

So it's definitely an issue with the RTX 4090.

#

But that still does not remove the riser from the root cause.

frail locust
#

Yeah

#

So I'll try the 4090 again with the motherboard forced to Gen 3 and see what happens.

#

If it doesn't crash with that, I guess I'll be ordering a lian li Gen 4 cable and concluding this cable is Gen 3.

frail locust
#

Currently at 3 hours+ uptime without any crashes with the 4090 and the motherboard set to Gen 3.

#

Safe to say it's the riser I think at this point.
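
The elimination so far can be laid out as a small table of riser tests and the link speed each one would have run at. A rough Python sketch, assuming that "auto" on this board allows up to Gen 5 and that the BIOS cap was still at Gen 4 during the 2080 Ti run (neither is confirmed in the thread). `negotiated_gen` and `crash_gens` are hypothetical helper names:

```python
# Highest PCIe generation each card supports (per the thread).
CARD_MAX_GEN = {"RTX 4090": 4, "RTX 2080 Ti": 3}

# (card, BIOS slot-speed cap, crashed?) for the tests run through the riser.
# Assumption: "auto" allows up to Gen 5 on this board.
# Assumption: the cap was still Gen 4 during the 2080 Ti run.
TESTS = [
    ("RTX 4090", 5, True),
    ("RTX 4090", 4, True),
    ("RTX 2080 Ti", 4, False),
    ("RTX 4090", 3, False),
]


def negotiated_gen(card, slot_cap):
    """A link trains no faster than the slower of the card and the slot cap."""
    return min(CARD_MAX_GEN[card], slot_cap)


def crash_gens(tests):
    """Split the negotiated link generations into crashing and stable sets."""
    crashed, stable = set(), set()
    for card, slot_cap, did_crash in tests:
        (crashed if did_crash else stable).add(negotiated_gen(card, slot_cap))
    return crashed, stable


crashed, stable = crash_gens(TESTS)
print("crashed at Gen:", sorted(crashed))  # crashed at Gen: [4]
print("stable at Gen:", sorted(stable))    # stable at Gen: [3]
```

Every crash lines up with a Gen 4 link over the riser and every stable run with Gen 3, which fits a riser that cannot carry Gen 4 signaling. It does not fully rule out the 4090 itself, since the card was never tested at Gen 4 without the riser.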

#

Steam is freezing occasionally between actions, but idk if that's just a result of having 380 games downloading.

#

And windows is extremely slow bringing up any setting tabs, but maybe that's just because it's not activated and they want to extort me for another key.

#

Nothing crashing though.

jade rapids
#

@frail locust I forget which YouTuber had a video on this issue but basically you have to buy a high quality riser or you'll have issues. Apparently even with a good one you'll lose a few percent of gaming performance.

frail locust
#

Iirc ltt did a video on them, and even used some with some absurd lengths. Their video was mostly related to length though I think.