
I installed two 980 Pro 2TB drives in a laptop in Nov 2022. A daily Robocopy batch script that backs up a folder from C: to D: would freeze a couple of times a week and lock up the D: drive. After a reboot, a drive check would find no errors and everything would work as normal. I've used the same script for years with no issues.

Since the firmware update last week, Robocopy has not frozen the drive at all.



The freeze/reboot/fine cycle seems to be a common one for SSDs acting poorly, running out of blocks they want to use internally, or memory or cache or something, or just hanging in their own firmware for whatever other reason.

One of my earlier forays into switching to SSDs, I installed Intel... I think it was 525, 535, something like that, 2.5 inch SATA drives in several different machines. Every one has failed by now with this similar mode of (in)operation. On my desktop where I had one, it would simply bluescreen, but then come back fine until eventually reading certain parts of the disk would just always cause it to hang and it had to be replaced. Failed SSDs like this are interesting because Windows (and to a lesser extent Linux) really aren't prepared for the disk to just hang, so trying to recover anything off them can be a challenge.

Just recently I found out the last one I had around, in a little headless desktop server, was the cause of my problems with it where it would partially hang after a couple days of uptime. Having finally gotten around to having it hooked up to a display, I was treated to a sea of red dmesg errors from the disk.

I think ultimately part of the problem was new power-saving features Intel had tried to add for these disks, which would cause them to write to themselves a large amount and just eat through their useful lifetime much faster than you'd assume.

In almost every case, I replaced these with, of course... Samsungs. Though I believe I've been lucky enough not to choose any of their bad ones.


> The freeze/reboot/fine cycle seems to be a common one for SSDs acting poorly, running out of blocks they want to use internally, or memory or cache or something, or just hanging in their own firmware for whatever other reason.

Given that it's two gen4 drives in a laptop being subjected to a moderately heavy sustained workload, I'd also suspect a thermal problem or maybe even power delivery. Those two slots are probably being fed off the same 3.3V regulator.

Since the firmware bug appears to have caused catastrophic write amplification, what may seem to the user to be only a modest and reasonable workload may be causing the drive that is the backup destination to be running at full tilt doing a ton of writes to the flash and causing the drive to hit its peak power consumption and heat output.


I always suspected that the laptop hardware might not be able to handle the second drive, as the secondary slot did not perform quite as well as the primary slot. Both slots are rated for PCIe 4, though.

It is strange though that after the firmware update there have been zero freezes.


Yeah I had an old intel ssd that would hang like this and I could never figure out wtf was going on


> the firmware update last week

Link to specific firmware version please?


The new firmware is version 5B2QGXA7, updated via Samsung Magician on Windows. I didn't make a note of earlier firmware versions. It's still too soon to know if the SSD freeze will recur.


How odd, I've got a 980 Pro 2TB that I've had since mid last year, and checking I'm already on that firmware version.


thank you


Could you provide that batch script please? Like in a GitHub Gist or something similar.


Sure...

@echo off

pause

robocopy "C:\Users\o\Desktop\2023" "D:\2023" /e /mir /np /v /tee /r:0 /w:0 /log+:"C:\Users\o\Desktop\log_robocopy.txt"

pause

@echo on


You don’t need to turn echo back on at the end of your batch file. That line is pointless.


And you don't need /e, since /mir is equivalent to /e /purge.
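A minimal sketch of the script with both suggestions applied (no `@echo on`, no `/e`), written in Python purely for illustration. The function names are hypothetical and the paths are whatever the caller passes in. The switches are the ones from the original script, and the exit-code check relies on Robocopy's documented bitmask, where values of 8 or more indicate a failure.

```python
import subprocess


def build_robocopy_command(source, destination, log_path):
    """Return the argument list for the mirror job, with the redundant /e dropped."""
    return [
        "robocopy", source, destination,
        "/mir",               # mirror the tree: equivalent to /e plus /purge
        "/np",                # do not show per-file progress percentages
        "/v",                 # verbose output
        "/tee",               # write to the console as well as the log
        "/r:0",               # no retries on failed copies
        "/w:0",               # no wait between retries
        f"/log+:{log_path}",  # append to the log file
    ]


def robocopy_succeeded(exit_code):
    """Robocopy exit codes are a bitmask; 8 or higher means something failed."""
    return 0 <= exit_code < 8


def run_backup(source, destination, log_path):
    """Run the job and report whether Robocopy considered it a success."""
    result = subprocess.run(build_robocopy_command(source, destination, log_path))
    return robocopy_succeeded(result.returncode)
```

Note that a plain `returncode == 0` check would be wrong here, since Robocopy returns 1 when files were copied successfully.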


Thanks. This script has been adapted over time, and I need to look up most of the switches these days to remind me what they do.


what does the SMART data for your drive say?

I'm morbidly curious how much it reports lifespan remaining for its internal write-wear-leveling system.


I've only really experienced drive locking and freezing that resolves on a reboot, which is concerning enough. I haven't experienced any endurance issues.

SMART data is as follows (both since 2022-11-25)

Primary drive C: 5.1 TBW
Model Name, Samsung SSD 980 PRO 2TB
Serial Number, S*****
Drive Type, NVMe

Result,Byte End,Byte Start,Description,Raw Data,Status
,0,0,Critical Warning,0,OK
,2,1,Temperature (K),320,OK
,3,3,Available Spare,100,OK
,4,4,Available Spare Threshold,10,OK
,5,5,Percentage Used,0,OK
,47,32,Data Units Read,6465577,OK
,63,48,Data Units Written,10998930,OK
,79,64,Host Read Commands,150273501,OK
,95,80,Host Write Commands,157439035,OK
,111,96,Controller Busy Time,1083,OK
,127,112,Power Cycles,199,OK
,143,128,Power On Hours,571,OK
,159,144,Unsafe Shutdowns,12,OK
,175,160,Media Errors,0,OK
,191,176,Number of Error Information Log Entries,0,OK
,195,192,Warning Composite Temperature Time,0,OK
,199,196,Critical Composite Temperature Time,0,OK
,201,200,Temperature Sensor 1,320,OK
,203,202,Temperature Sensor 2,328,OK
,205,204,Temperature Sensor 3,0,OK
,207,206,Temperature Sensor 4,0,OK
,209,208,Temperature Sensor 5,0,OK
,211,210,Temperature Sensor 6,0,OK
,213,212,Temperature Sensor 7,0,OK
,215,214,Temperature Sensor 8,0,OK

Secondary drive D: 2.9 TBW
Model Name, Samsung SSD 980 PRO 2TB
Serial Number, S*****
Drive Type, NVMe

Result,Byte End,Byte Start,Description,Raw Data,Status
,0,0,Critical Warning,0,OK
,2,1,Temperature (K),320,OK
,3,3,Available Spare,100,OK
,4,4,Available Spare Threshold,10,OK
,5,5,Percentage Used,0,OK
,47,32,Data Units Read,4919136,OK
,63,48,Data Units Written,6128916,OK
,79,64,Host Read Commands,164977799,OK
,95,80,Host Write Commands,94324034,OK
,111,96,Controller Busy Time,78,OK
,127,112,Power Cycles,199,OK
,143,128,Power On Hours,538,OK
,159,144,Unsafe Shutdowns,20,OK
,175,160,Media Errors,0,OK
,191,176,Number of Error Information Log Entries,0,OK
,195,192,Warning Composite Temperature Time,56,OK
,199,196,Critical Composite Temperature Time,0,OK
,201,200,Temperature Sensor 1,320,OK
,203,202,Temperature Sensor 2,323,OK
,205,204,Temperature Sensor 3,0,OK
,207,206,Temperature Sensor 4,0,OK
,209,208,Temperature Sensor 5,0,OK
,211,210,Temperature Sensor 6,0,OK
,213,212,Temperature Sensor 7,0,OK
,215,214,Temperature Sensor 8,0,OK
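For anyone reading the raw counters, here is a rough sketch of how they convert to more familiar units. It relies on the NVMe convention that one data unit is 1000 blocks of 512 bytes and that temperatures are reported in kelvin. The helper names are made up for illustration, and the observation that the quoted TBW figures line up with binary terabytes (TiB) is an inference from the numbers, not something the poster stated.

```python
NVME_DATA_UNIT_BYTES = 512 * 1000  # one NVMe data unit = 1000 x 512-byte blocks


def data_units_to_bytes(units):
    """Convert an NVMe 'Data Units Written/Read' counter to bytes."""
    return units * NVME_DATA_UNIT_BYTES


def bytes_to_tb(num_bytes):
    """Decimal terabytes (10**12 bytes)."""
    return num_bytes / 10**12


def bytes_to_tib(num_bytes):
    """Binary terabytes (2**40 bytes)."""
    return num_bytes / 2**40


def kelvin_to_celsius(kelvin):
    """Convert a SMART temperature reading from kelvin to degrees Celsius."""
    return kelvin - 273.15


# Counters copied from the SMART dumps above.
for label, units in (("C:", 10998930), ("D:", 6128916)):
    written = data_units_to_bytes(units)
    print(label, round(bytes_to_tb(written), 2), "TB,",
          round(bytes_to_tib(written), 2), "TiB written")

print("Composite temperature:", round(kelvin_to_celsius(320), 2), "C")
```

Under these assumptions the C: drive comes out at about 5.6 TB (5.1 TiB) written and the D: drive at about 3.1 TB (2.9 TiB), and the 320 K reading is roughly 47 °C.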



