Aug 22 2007, 10:29 AM, Post #1
Master (Group: Members, Posts: 369, Joined: 12-March 02, Member No.: 1,620)
I've noticed high iowait on our box for the last few months and put in a ticket w/ The Planet to take a look. They ran a test to check for bad blocks and came back and told us the disk was fine.
My question is this: do bad blocks cause iowait?

Aug 22 2007, 03:59 PM, Post #2
SuperGeek (Group: Members, Posts: 1,481, Joined: 18-November 05, From: Lake Michigan, Member No.: 18,911)
What does the following look like?
smartctl -t long /dev/sda
smartctl -a /dev/sda

Aug 22 2007, 06:26 PM, Post #3
SuperGeek (Group: Members, Posts: 4,856, Joined: 23-May 03, Member No.: 7,754)
A badblocks test is one of the better ways to see whether a disk is failing or otherwise having trouble. During the test you will typically see one of three things: a very high load, errors on the screen, or a smooth run. If your server is already busy it may already have a high load, so load alone is not always a good indication. "Ideally", on a failing drive the test reports bad blocks.

You have to know what iowait is before you can really say whether bad blocks can cause it. The iowait you see in top is basically the share of time the CPU sits idle while waiting on outstanding disk I/O. Going on that, if a bad block on the OS disk forces the drive to re-read or skip over a sector, I/O operations are delayed, which can cause iowait to rise. So the short answer is YES. A slow disk is a good indication that something is up.

Smartctl is a good test but not all that trustworthy: even when it says a disk is not in the best condition, the disk is often fine for years. I also do not believe a bad smartctl result alone is enough for a hardware swap.

What does iostat show when you have the high load? If your disk is in fact doing a lot of I/O, the disk may be fine and just busy.

John W
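To make the iowait figure discussed in this post concrete, here is a minimal sketch of how such a percentage can be derived from two samples of the aggregate `cpu` line in `/proc/stat` on Linux. The function name `iowait_pct` is made up for illustration, and the field layout (user, nice, system, idle, iowait, irq, softirq, steal) is assumed to match the running kernel; top and iostat may round or sample differently.

```shell
#!/bin/sh
# iowait_pct (hypothetical helper): percentage of CPU time spent in iowait
# between two samples of the aggregate "cpu" line from /proc/stat.
# Assumed counters after the "cpu" label:
#   user nice system idle iowait irq softirq steal
iowait_pct() {
    # $1 = earlier "cpu ..." line, $2 = later "cpu ..." line
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Sum user..steal only; later guest fields are already part of user.
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            t[NR] = total
            w[NR] = $6      # iowait is the 5th counter, so field 6
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w[2] - w[1]) / dt
        }'
}

# Usage on a live box, sampling 5 seconds apart:
#   a=$(grep "^cpu " /proc/stat); sleep 5; b=$(grep "^cpu " /proc/stat)
#   iowait_pct "$a" "$b"
```

A high value only says that the CPU was idle while I/O was outstanding. It does not by itself distinguish a failing disk from a healthy disk that is simply busy, which is why the per-device iostat numbers are still worth looking at.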

Aug 22 2007, 07:31 PM, Post #4
SuperGeek (Group: Admin, Posts: 1,242, Joined: 18-May 07, From: Dallas, Tx, Member No.: 48,459)
Also, check your swap usage. Heavy swapping might be an indicator that you need to consider more memory.

As far as SMART goes, it's an OK indicator. I have a 20GB drive at home that's been pending failure for 4 years now according to SMART. An offline (manufacturer) test is the best indicator of whether the drive is failing, since it takes out factors such as load. That said, I've personally authorized HDD replacements based on smartctl and iowait alone.

In any case, back your data up!

Tomy Durden
Manager - Office of Change Management
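For the swap check suggested above, a small sketch that reads swap in use from `/proc/meminfo` on Linux. The function name `swap_used_kb` and its optional file argument are assumptions made for illustration; the argument exists only so the function can be pointed at a saved copy of the file.

```shell
#!/bin/sh
# swap_used_kb (hypothetical helper): print swap currently in use, in kB,
# from a meminfo-format file. $1 defaults to /proc/meminfo.
swap_used_kb() {
    awk '
        $1 == "SwapTotal:" { total = $2 }
        $1 == "SwapFree:"  { free = $2 }
        END { print total - free }
    ' "${1:-/proc/meminfo}"
}

# Usage:
#   swap_used_kb
```

Swap that is merely allocated is less telling than swap that is actively moving. The si and so columns of `vmstat 5` show pages being swapped in and out while the load is high.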

Aug 23 2007, 02:19 PM, Post #5
Master (Group: Members, Posts: 369, Joined: 12-March 02, Member No.: 1,620)
The iostat command is what I've been using to determine there's an iowait issue. It gets up to 90% at times when the server is busy.

I'm extremely doubtful their techs ran a badblocks test while monitoring the system the way you explained above. Given the response to our ticket, I'm certain they simply ran a badblocks test and then came back and said all is well.

I appreciate the responses. I recently disabled MailScanner and SpamAssassin, which were sources of high load, and replaced them with milters. I'll keep an eye on the box and see if that helps any.

Also, our data is backed up several times a week.

Aug 23 2007, 02:59 PM, Post #6
SuperGeek (Group: Members, Posts: 1,481, Joined: 18-November 05, From: Lake Michigan, Member No.: 18,911)
The one big (and possibly obvious, by the name) drawback of the offline test is that it takes the server totally offline for several hours in order to perform it (at least when I had one done this spring.)

Aug 23 2007, 05:40 PM, Post #7
SuperGeek (Group: The Planet Staff, Posts: 1,696, Joined: 27-December 05, Member No.: 19,248)
Well, if the system is attempting to write to or read from a bad sector, it will cause I/O wait. If a badblocks scan is run through e2fsck -c, it will mark bad blocks as... well... bad... and the filesystem will not use those blocks. So if the iowait is gone after a badblocks scan, that was your problem. The only drawback is that if you are experiencing high iowait, a badblocks test could take a LONG time.

As Tomy said, drive replacements have been made based solely on smartctl, as iffy as its results are. It is just better to be safe than sorry. So if you are getting bad smartctl responses, you might as well back up your data and get the drive replaced.

Good luck,
James

James Jhurani
Managed Hosting
http://www.theplanet.com

Aug 31 2007, 11:50 AM, Post #8
Enlightened (Group: The Planet Staff, Posts: 68, Joined: 17-March 05, From: Houston, Texas, Member No.: 16,174)
I tend to recommend checking for bad blocks BEFORE creating the filesystem if possible, with:
mke2fs -c /dev/sdbX

Optionally, you may also want to boot from a Linux live CD and run a non-destructive read-write test with:
e2fsck -cc /dev/sdbX

I hope this helps!

Houston DataCenter Operations, Level 2 Technician
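Booting from a live CD matters because the read-write scan should run against an unmounted filesystem. A minimal sketch of a guard for that follows. `is_mounted` and `safe_scan` are hypothetical helpers, not part of e2fsprogs, and the wrapper only prints the command it would run. The match is on the exact device name as listed in `/proc/mounts`, so a root filesystem listed under another name (for example `/dev/root`) would be missed.

```shell
#!/bin/sh
# is_mounted (hypothetical helper): succeed if device $1 appears as a mount
# source in the mounts table $2 (defaults to /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# safe_scan (hypothetical wrapper): refuse to start the read-write scan on a
# mounted device; otherwise print the command instead of running it.
safe_scan() {
    if is_mounted "$1" "${2:-/proc/mounts}"; then
        echo "refusing: $1 is mounted" >&2
        return 1
    fi
    echo "would run: e2fsck -cc $1"
}

# Usage (sdbX is the placeholder from the post above):
#   safe_scan /dev/sdbX
```

Printing rather than executing keeps the sketch harmless. Replacing the final echo with the real command is a deliberate step to take only once the device name has been double-checked.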