• RyeMan@lemmy.world
      9 months ago

      That includes NVMe… I just spent two weeks troubleshooting constant random reboots on my newly built PC… It ended up being the M.2 port on the motherboard that was faulty, not even the drive itself. I’ve been building computers personally and professionally for over 20 years and that was a first for me. Everyone I talked to and every support forum insisted RAM or the power supply was the problem, but nope! Not this time!

      But the lesson here is, if you have a recurring problem that has no obvious cause… Test EVERYTHING. Start with the common stuff that fails and work your way down: Power Supply -> RAM -> CPU -> GPU -> HDDs -> SSDs -> USBs

      Tips for RAM: It’s usually best to first boot into a RAM testing tool like memtest86 and just let that do its thing. That alone is usually all you need to find out whether you have a memory issue. Sometimes, though, the results may not make sense. I’ve seen situations where a new stick of RAM fails at almost every block and it turned out to be the slots on the motherboard that were faulty. If the results seem a little fishy, remove all but one stick of RAM, put it in the first slot, and run another test. Then move that stick down to the next slot and repeat until all slots have been tested. You can also be extra thorough if needed and repeat the same test with the other sticks of RAM. That usually helps you tell whether it’s a motherboard issue or an issue with the stick of RAM.
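
      The stick-by-slot procedure above boils down to a small pass/fail grid. Here is a rough Python sketch of how you might read that grid. The function name, the stick and slot labels, and the results are all made up for illustration, and you would fill in the pass/fail values by hand from your own memtest86 runs.

```python
def isolate_ram_fault(results):
    """Read a grid of single-stick memory test runs.

    results maps (stick, slot) -> True if that run passed, False if it failed.
    Returns (suspect_sticks, suspect_slots).
    """
    sticks = sorted({stick for stick, _ in results})
    slots = sorted({slot for _, slot in results})

    def always_fails(index, name):
        # Collect every run that involved this stick (index 0) or slot (index 1).
        runs = [ok for key, ok in results.items() if key[index] == name]
        return bool(runs) and not any(runs)

    # A stick that fails in every slot it was tried in points at the stick.
    suspect_sticks = [s for s in sticks if always_fails(0, s)]
    # A slot that fails with every stick tried in it points at the motherboard.
    suspect_slots = [s for s in slots if always_fails(1, s)]
    return suspect_sticks, suspect_slots


# Hypothetical results: both sticks pass in slot1 and fail in slot2.
example = {
    ("stick_a", "slot1"): True,
    ("stick_a", "slot2"): False,
    ("stick_b", "slot1"): True,
    ("stick_b", "slot2"): False,
}
print(isolate_ram_fault(example))
```

      If you only test one stick and it fails everywhere, both lists come back non-empty and the result is ambiguous, which is exactly why repeating the test with a second stick is worth the time.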

      CPU/GPU: usually any old stress test will make any hardware issues apparent with these two.

      SSDs: these can be a little tricky to test, especially if you are booting from them, but in my case I found that completely removing the NVMe drive solved all my problems (well, a mobo RMA was the real fix). I couldn’t even boot into a live Linux USB without crashing and rebooting when my NVMe was plugged in. One not so obvious clue that the SSD was acting up was that event logs related to the crash were never written to the drive… Because I/O was outright failing.

      USBs: yes, USBs are on that list. One of the first significant computer issues I ever encountered came from a faulty USB hub that stopped my PC from even booting up. I took it to two different repair shops and both told me nothing was wrong with my computer, but every time I brought it back home and plugged everything back in… I couldn’t boot. It was pure luck that I figured out it was the USB hub. That was not a fun one.

      Now, I didn’t even add motherboards to the list because quite frankly I’m not sure how they rank. They are the absolute worst piece of hardware to troubleshoot, but luckily it’s pretty rare that they fail. There are so many connections and settings built into a motherboard that it quickly gets overwhelming trying to troubleshoot anything related to it. In my experience, if you have individually tested every bit of hardware and everything passes, most often it’s the motherboard that’s failing, especially if you have already ruled out software/firmware issues for sure. Motherboard issues aren’t always obvious, and boards can fail in very bizarre ways.

      And as a final bit of advice I’d like to throw out there from my years of experience in PC building… NEVER CHEAP OUT ON A POWER SUPPLY. It affects every single component in your PC, and when one fails it can get ugly. I bought a super cheap off-brand power supply one time and pushed that thing to the absolute limit, and when it failed it took down more than half of my PC with it: it fried my motherboard, CPU, and RAM. Additionally, the risk of fire is not zero when these things fail. Always use ONLY the cables provided for that power supply and nothing else. Those cables are made for that specific unit, both for the wattage it can supply and for its pinout. Modular cables aren’t standardized on the power supply end, so a cable from another unit can fit the socket and still put the wrong voltage on the wrong pin. Also, it’s good to get a power supply that’s rated for roughly 100+ watts more than what your PC needs. The headroom helps the power supply run more efficiently and tends to increase longevity due to less thermal wear.
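
      As a quick worked example of the 100+ watt headroom rule, here is a tiny Python sketch. The component names and wattages are placeholder numbers I made up, not measurements, so swap in the figures from your own parts’ spec sheets.

```python
def recommended_psu_watts(component_watts, headroom_watts=100):
    """Return (estimated load, minimum PSU rating with headroom), in watts."""
    load = sum(component_watts.values())
    return load, load + headroom_watts


# Placeholder wattages for a hypothetical build.
build = {
    "cpu": 125,
    "gpu": 220,
    "motherboard_ram_drives_fans": 75,
}

load, target = recommended_psu_watts(build)
print(f"Estimated load: {load} W, look for a PSU rated at {target} W or more")
```

      With these made-up numbers the estimated load is 420 W, so the rule of thumb points at a unit rated for at least 520 W.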

      • Jarix@lemmy.world
        9 months ago

        Just to add to this: if it’s a removable part and it works in a different machine, then either it’s a compatibility issue with the part or the problem is with what it’s plugged into.