[WD Caviar Green 3TB] Read errors on 3 new drives

Question

I bought 6 WD Caviar Green drives for a server to expand its backup space. 4 of them are meant to run in a software RAID5 array (Linux mdadm). After connecting them I partitioned 4 drives and started building the array: Unfortunately, the array did not build; the first drive was kicked out...

Łukasz_W · Answer

Post the SMART data for all the drives. The one posted above has one bad sector (attribute 197); for the repair procedure: Where were the drives bought? As new?

artaa · Answer

Return the WD Greens and buy at least Reds: or whatever the good Lord intended for RAID: WD G are for G...

vmario · Answer

The drives were bought at a large computer store that I have used many times. New, of course, factory-sealed in antistatic bags. I wonder whether there is any point in trying to get the sectors reallocated, because the question is whether it will all fall apart again in a few weeks... SMART data for all the drives except the one that failed first and has already been pulled from the server:

# smartctl -a /dev/sdb
smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen,

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD30EZRX-00MMMB0
Serial Number:    WD-WCAWZ2913825
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Jan 10 22:34:48 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (49380) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   153   146   021    Pre-fail  Always       -       9316
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       296
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       17
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       15
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       54
194 Temperature_Celsius     0x0022   126   120   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       277         -
# 2  Conveyance offline  Completed without error       00%       248         -
# 3  Short offline       Completed without error       00%       248         -
# 4  Conveyance offline  Completed without error       00%       248         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

# smartctl -a /dev/sdc
smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen,

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD30EZRX-00MMMB0
Serial Number:    WD-WCAWZ2913844
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Jan 10 22:35:14 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (50700) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   253   157   021    Pre-fail  Always       -       4341
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       221
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       17
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       14
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       110
194 Temperature_Celsius     0x0022   125   120   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       202         -
# 2  Conveyance offline  Completed without error       00%       173         -
# 3  Conveyance offline  Completed without error       00%       172         -
# 4  Short offline       Completed without error       00%       172         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

# smartctl -a /dev/sdd
smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen,

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD30EZRX-00MMMB0
Serial Number:    WD-WCAWZ2910393
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Jan 10 22:35:35 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                 (51000) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   253   157   021    Pre-fail  Always       -       4258
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       221
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       17
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       14
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       113
194 Temperature_Celsius     0x0022   124   120   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%       220         1189794616
# 2  Extended offline    Completed: read failure       90%       193         1189794616
# 3  Short offline       Completed: read failure       90%       193         1189794616
# 4  Conveyance offline  Completed without error       00%       173         -
# 5  Conveyance offline  Completed without error       00%       172         -
# 6  Short offline       Completed without error       00%       172         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

# smartctl -a /dev/sde
smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen,

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD30EZRX-00MMMB0
Serial Number:    WD-WCAWZ2911185
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Jan 10 22:35:57 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 113) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                 (51900) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   252   155   021    Pre-fail  Always       -       4375
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       221
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       17
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       14
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       115
194 Temperature_Celsius     0x0022   126   119   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       10%       201         1189795664
# 2  Conveyance offline  Completed without error       00%       173         -
# 3  Conveyance offline  Completed without error       00%       172         -
# 4  Short offline       Completed without error       00%       172         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

# smartctl -a /dev/sdg
smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen,

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD30EZRX-00MMMB0
Serial Number:    WD-WCAWZ2937677
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Jan 10 22:36:19 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (50160) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   153   153   021    Pre-fail  Always       -       9333
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       9
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       48
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       9
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       7
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       19
194 Temperature_Celsius     0x0022   126   120   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        28         -
# 2  Conveyance offline  Completed without error       00%         0         -
# 3  Short offline       Completed without error       00%         0         -
# 4  Conveyance offline  Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

So it seems. Besides, I have read that NCQ or some other power-saving invention in these drives gets them kicked out of RAID. The only question is why the Greens in my second server cause no problems, and why SMART reports errors as well. If I understand correctly, that means a physical fault? After a dozen or so hours of operation? That seems a bit much...
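With five long dumps it is easy to miss the one non-zero counter. The sketch below is only an illustration of how the attribute tables posted above could be screened automatically; the regular expression assumes the column layout printed by smartctl 5.40 in this thread, and the function names and the choice of watched attribute IDs (5, 196, 197, 198) are assumptions of this example, not part of smartmontools. The sample rows are copied from the /dev/sdd output.

```python
import re

# One row of the "Vendor Specific SMART Attributes" table, as laid out in the
# dumps in this thread: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, raw value.
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Return {attribute_id: (name, raw_value)} for every table row found."""
    attrs = {}
    for line in text.splitlines():
        match = ATTR_ROW.match(line)
        if match:
            attrs[int(match.group(1))] = (match.group(2), int(match.group(9)))
    return attrs


def suspicious(attrs, watch=(5, 196, 197, 198)):
    """Names of watched attributes whose raw value is non-zero.

    The watch list is a hypothetical default chosen for this example.
    """
    return [attrs[i][0] for i in watch if i in attrs and attrs[i][1] > 0]


# Rows copied from the /dev/sdd dump above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    print(suspicious(parse_attributes(SAMPLE)))
```

Feeding it the full text of each dump would single out /dev/sdd (one pending sector). Note that it would not flag /dev/sde, whose failed extended self-test shows up only in the self-test log, so the log still needs to be read separately.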

artaa · Answer

That's just how they are... they were never meant to be good drives... ;-)

Besides, what company allows drives that are not (at the very least) from the RE series to be used in a server RAID??
I hope that at least backups are made frequently to some array, etc.

vmario · Answer

I ran badblocks and scanned most of the drives. It took several days, and the upshot was that the repeated SMART self-tests no longer found the errors mentioned earlier. Only one drive turned out to have a single bad sector. I returned all 6 drives to the store under warranty. I got 5 of them back a few days later as healthy. 1 went to WD (presumably the one with the bad sector) and after a few more days was replaced with a new one.

A second attempt to build the array would probably have failed again had it not been for a post on Jakub Kasprzycki's blog. It turns out that WD Caviar Green drives have a special idle mode, IDLE3, which can kick in as often as every 8 seconds. This results in constant head parking, which in turn leads to rapid physical wear of the drive and, probably, to problems with the array. Fortunately, WD provides a program, WDIDLE3, that lets you change the settings of this mode. Using the Ultimate Boot CD, which includes DOS with, among other things, WDIDLE3, I increased the idle timeout to 300 seconds:

  wdidle3 /s300

I did this only to be on the safe side, in case the next command, which disables IDLE3 altogether, did not work:

  wdidle3 /d

Before these operations you could clearly observe that 8 seconds of idleness resulted in the heads being parked; afterwards, parking does not occur at all. The array built correctly and so far there have been no problems with it.

As a curiosity: an array built from similar WD Caviar Green models, running under a hardware controller, caused no problems, but the number of head parks on those drives exceeded 400,000 within a dozen or so months (the permissible value is reportedly 300,000)! Of course, I disabled IDLE3 on those drives too. A third set of WD Caviar Greens, running in a software array, for unknown reasons neither causes problems nor parks its heads constantly (about 30 to 40 parks over a few months).
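To put the parking figures in perspective, here is a rough back-of-the-envelope sketch. It assumes the 300,000-cycle limit quoted above (given in this thread as hearsay, not verified) and the 8-second IDLE3 timer; the function names are made up for this example. The inputs are the Load_Cycle_Count and Power_On_Hours raw values that smartctl reports.

```python
RATED_LOAD_CYCLES = 300_000  # figure quoted in this thread ("reportedly"), not verified
IDLE3_TIMEOUT_S = 8          # default IDLE3 timer described in the post above

# Upper bound: the heads cannot park more often than once per timeout period.
MAX_CYCLES_PER_HOUR = 3600 / IDLE3_TIMEOUT_S


def cycles_per_hour(load_cycles, power_on_hours):
    """Average head-parking rate observed so far."""
    if power_on_hours <= 0:
        raise ValueError("power_on_hours must be positive")
    return load_cycles / power_on_hours


def hours_to_limit(load_cycles, power_on_hours, limit=RATED_LOAD_CYCLES):
    """Hours until the limit is reached if the average rate stays the same."""
    rate = cycles_per_hour(load_cycles, power_on_hours)
    if rate == 0:
        return float("inf")
    return max(limit - load_cycles, 0) / rate


if __name__ == "__main__":
    # Worst case: a workload that touches the disk just after every park.
    print("fastest possible wear-out, hours:",
          round(RATED_LOAD_CYCLES / MAX_CYCLES_PER_HOUR, 1))
    # /dev/sdc from the earlier dump: 110 load cycles in 221 power-on hours.
    print("sdc rate, cycles/hour:", round(cycles_per_hour(110, 221), 2))
```

Under these assumptions, a drive that parked after every 8-second idle window would need at least about 667 hours (roughly four weeks) of such a pattern to reach 300,000 cycles, while the new drives in the earlier dumps were parking less than once per hour. Real rates depend entirely on the access pattern, which may be why the three arrays described above behaved so differently.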

zakk87 · Answer

I'm digging this thread up because a WD20EADS came my way a few days ago. I put it in a NAS, disabled head parking, and then decided to fill it using rsync. Up to 1.6 TB it went fine, then the transfer rate dropped to 10 MB/s, and after a few hours hdparm -t --direct was reporting speeds in KB/s. smartctl reported 1465 pending sectors.
I moved the drive to a PC and started a scan in MHDD from just before the sector indicated by smartctl. Interestingly, the process hung for a long while on a UNC error and there was nothing more I could do. After a restart I ran the ERASE command starting from the faulty sectors; the drive started scraping and then suddenly, bang. C5 (Current_Pending_Sector) dropped to zero, and C4 (Reallocated_Event_Count) did not change from 0 to any other value. Neither did 05 (Reallocated_Sector_Ct); there is only C8 (Multi_Zone_Error_Rate) = 1. Throughout the whole formatting process the speed hovered between 80 and 100 MB/s. I'll try to fill it with some data today, but I'll change the wdidle value to 300. On the WD forum I read that on Greens specifically you should not disable head parking completely, because the drive slows down dramatically, as it did for me.
https://community.wd.com/t/green-caviar-high-...ycle-cout-after-short-operation-time/13283/52

kaleron · Answer

Post the result of a full surface scan: the graph from Victoria plus any error messages.

zakk87 · Answer

A week ago SMART looked like this:

  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   187   142   021    Pre-fail  Always       -       7641
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1119
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4690
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       345
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       149
193 Load_Cycle_Count        0x0032   189   189   000    Old_age   Always       -       33429
194 Temperature_Celsius     0x0022   109   098   000    Old_age   Always       -       43
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   199   196   000    Old_age   Always       -       329
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

In the meantime I ran a badblocks check, which gave this result:

and SMART changed to:

  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   185   142   021    Pre-fail  Always       -       7750
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1121
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4948
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       345
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       149
193 Load_Cycle_Count        0x0032   189   189   000    Old_age   Always       -       33431
194 Temperature_Celsius     0x0022   108   098   000    Old_age   Always       -       44
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   196   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1
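Comparing the two tables by eye takes a moment, so here is a small sketch that diffs the raw values. The dictionaries are typed in by hand from the two snapshots above (only the attributes that matter for the comparison are included), and diff_raw is a name invented for this example.

```python
def diff_raw(before, after):
    """Return {name: (old, new)} for attributes whose raw value changed."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


# Raw values copied from the first snapshot ("a week ago").
BEFORE = {
    "Spin_Up_Time": 7641,
    "Start_Stop_Count": 1119,
    "Reallocated_Sector_Ct": 0,
    "Power_On_Hours": 4690,
    "Load_Cycle_Count": 33429,
    "Temperature_Celsius": 43,
    "Reallocated_Event_Count": 0,
    "Current_Pending_Sector": 329,
    "Multi_Zone_Error_Rate": 1,
}

# Raw values copied from the second snapshot (after the badblocks run).
AFTER = {
    "Spin_Up_Time": 7750,
    "Start_Stop_Count": 1121,
    "Reallocated_Sector_Ct": 0,
    "Power_On_Hours": 4948,
    "Load_Cycle_Count": 33431,
    "Temperature_Celsius": 44,
    "Reallocated_Event_Count": 0,
    "Current_Pending_Sector": 0,
    "Multi_Zone_Error_Rate": 1,
}

if __name__ == "__main__":
    for name, (old, new) in diff_raw(BEFORE, AFTER).items():
        print(f"{name}: {old} -> {new}")
```

The change that stands out is Current_Pending_Sector going from 329 to 0 while both reallocation counters stay at 0. That pattern is usually read as the sectors having been rewritten successfully in place rather than remapped to spares, though the tables alone cannot confirm it.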

kaleron · Answer

Am I supposed to guess that you gave 4 addresses of damaged sectors without stating the type of error?

BANANvanDYK · Answer

I have something similar with a Toshiba MQ01ABD050. After some time, read errors appear that make normal use impossible. Badblocks in read-only mode stopped being effective, because the bad-sector table ended up in a place where there was a damaged sector. Any method that involves writing to the surface (zeroing etc.) results in the drive being "repaired". When I ran badblocks in destructive mode, the drive turned out to be perfectly healthy, on every pattern.

kaleron · Answer

Take a look at the power supply, because that is probably where the cause lies.

BANANvanDYK · Answer

In a laptop there isn't really a way to "take a look" at the power supply. Previous hard drives did not have such problems, including the MQ01ABD100 that holds the old OS. There are no problems with the 5 V USB supply, since I use it to charge phones, Li-ion batteries and flashlights.

kaleron · Answer

Problem sectors that appear and disappear like this are most often the result of improper magnetization of the surface, which in turn usually stems from an unstable power supply.

helmud7543 · Answer

I can't give the model, but I recently had a 2 TB 3.5-inch Green that showed falling transfer rates and then read errors (fortunately only in SMART; it did not kill any data in the process) under prolonged, intensive random read/write load. I just don't know whether it was a matter of temperature (slightly above 55 °C; it would not go any higher) or of SMR (CrystalDiskInfo showed that it supports TRIM). A moment to catch its breath and the drive got its wings back, until the next long uninterrupted load. No pending sectors after the rest. No reallocated sectors. I was using almost all of the available space, so any relocation to healthy areas that hid the real location of the damage outside the LBA range would not have worked: there would have been nowhere left to move that much data. It was not a local slowdown. Once it ran out of breath, whatever you asked it to read, it coped like a corpse on its last legs. And after a rest it simply went back to high transfer rates and no errors. Much the same with some 3.5-inch 2 TB Seagate, probably also SMR. When I work out which drives these are, I'll put them under load and show what happens after a few dozen minutes of nonstop head activity.


kaleron · Answer

SMR is an additional complication in translating logical addresses to physical ones. First, the physical distances between addresses that are adjacent in logical addressing often become larger; second, logical addresses may (even if only temporarily) land in other physical locations. And dramatic slowdowns when writing large amounts of data are, in the case of SMR, also a design characteristic, not a fault.
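A deliberately crude toy model may help show why such a slowdown is a design characteristic. Everything in it is an assumption made for illustration: a fixed-size staging cache that absorbs writes at unit cost, and a penalty of rewriting one whole band for every write once the cache is full. It does not describe the firmware of any particular drive, and the function name and parameters are invented for this sketch.

```python
def write_costs(n_writes, cache_blocks, band_blocks):
    """Per-write cost (in arbitrary units) for a toy shingled drive.

    While the staging cache has room, a write costs 1 unit. Once it is
    full, every new write first forces one cached block to be flushed,
    which in this model means rewriting a whole band of band_blocks blocks.
    """
    costs = []
    cached = 0
    for _ in range(n_writes):
        if cached < cache_blocks:
            cached += 1
            costs.append(1)
        else:
            # Cache full: pay for the band rewrite plus the new write itself.
            costs.append(1 + band_blocks)
    return costs


if __name__ == "__main__":
    costs = write_costs(n_writes=8, cache_blocks=5, band_blocks=64)
    print(costs)
    print("slowdown factor once the cache is full:", costs[-1] / costs[0])
```

In this model short bursts of writes stay fast and only sustained writing hits the wall, which matches the general shape of the behaviour described above. A real drive also drains its cache during idle time, which this sketch leaves out and which would correspond to the recovery after a rest.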
