> Bad Blocks & IOWAIT, HD Problems w/ Our Server
CyberSEAL
post Aug 22 2007, 10:29 AM
Post #1


Master
***

Group: Members
Posts: 369
Joined: 12-March 02
Member No.: 1,620



I've noticed high iowait on our box for the last few months and put in a ticket w/ The Planet to take a look. They ran a test to check for bad blocks and came back and told us the disk was fine.

My question is this: Do bad blocks cause iowait?
 
Jeff
post Aug 22 2007, 03:59 PM
Post #2


SuperGeek
****

Group: Members
Posts: 1,481
Joined: 18-November 05
From: Lake Michigan
Member No.: 18,911



What does the following look like?

smartctl -t long /dev/sda

smartctl -a /dev/sda


--------------------
 
eth00
post Aug 22 2007, 06:26 PM
Post #3


SuperGeek
****

Group: Members
Posts: 4,856
Joined: 23-May 03
Member No.: 7,754



A badblocks test is one of the better ways to see if a disk is failing or otherwise having trouble. During the test you will usually see one of three things: a very high load, errors on the screen, or a smooth run. If your server is already busy it may already have a high load, so that is not always a good indication. "Ideally" on a failing drive it reports bad blocks.

You have to know what iowait is before you can really say whether bad blocks can cause it. The iowait you see in top is basically the share of CPU time spent idle while waiting for the disk to respond. Going on that, if you have a bad block on the OS disk that is causing the drive to re-read or skip over it, you are going to have a delay in the I/O operations, which may cause the iowait to rise. So the short answer is YES. A slow disk is a good indication that something is up.
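
For anyone who wants to see where that iowait figure comes from, here is a rough Python sketch that computes it from two snapshots of the aggregate "cpu" line in /proc/stat on Linux. The function names and the sample numbers are made up for illustration, and this is only an approximation of what top reports, not its actual implementation.

```python
import time


def parse_cpu_line(line):
    """Return the numeric counters from the aggregate 'cpu' line of /proc/stat."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' lines."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    # Counter order: user nice system idle iowait irq softirq steal.
    # Guest counters (if present) are already included in user/nice, so skip them.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


def sample_iowait(interval=5.0):
    """Take two snapshots 'interval' seconds apart and return the iowait percentage."""
    first = read_cpu_line()
    time.sleep(interval)
    return iowait_percent(first, read_cpu_line())
```

Calling sample_iowait() while the box is slow gives a number you can compare against what top and iostat show.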

Smartctl is a good test but really is not all that trustworthy; even when it says a disk is not in the best condition, the disk is often fine for years. I also do not believe a bad smartctl result alone is enough for a hardware swap.

What does

iostat


show when you have the high load? If your disk is in fact doing a lot of IO the disk may be fine and just busy.
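
To tell "busy" apart from "slow", one option is to look at how much of the time the device had I/O in flight. The sketch below estimates that from two snapshots of /proc/diskstats on Linux, using the "time spent doing I/Os" counter (the 13th whitespace-separated field, in milliseconds). The helper names and the sample values are hypothetical, and it is meant as an illustration rather than a replacement for iostat.

```python
def io_ticks_ms(diskstats_text, device):
    """Return milliseconds spent doing I/O for 'device' from /proc/diskstats text."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major minor name, then the per-device counters.
        if len(fields) >= 13 and fields[2] == device:
            return int(fields[12])
    raise KeyError(device)


def busy_percent(before_text, after_text, device, elapsed_s):
    """Percentage of the elapsed time during which the device had I/O in progress."""
    delta_ms = io_ticks_ms(after_text, device) - io_ticks_ms(before_text, device)
    return 100.0 * delta_ms / (elapsed_s * 1000.0)


def read_diskstats(path="/proc/diskstats"):
    """Read the current contents of /proc/diskstats."""
    with open(path) as f:
        return f.read()
```

Take one read_diskstats() snapshot, wait a known number of seconds, take another, and pass both to busy_percent() with the device name (for example "sda"). A device that is close to 100% busy while moving very little data is more suspicious than one that is simply saturated with traffic.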


--------------------
John W
My personal website with many free security and linux how-to's!
Tss -- Live Support! Tweaking, Securing, 24x7 Service Monitoring, Monthly Management, Migrations, Restores, Optimization, LoadBalancer Configuration, Mysql Clusters, Custom Configurations, Consulting. English And Spanish Support!
We do it all @ TotalServerSolutions
 
Tomy Durden
post Aug 22 2007, 07:31 PM
Post #4


SuperGeek

Group: Admin
Posts: 1,242
Joined: 18-May 07
From: Dallas, Tx
Member No.: 48,459



QUOTE (eth00 @ Aug 22 2007, 07:26 PM) *
A badblocks test is one of the better ways to see if a disk is failing or otherwise having trouble. During the test you will usually see one of three things: a very high load, errors on the screen, or a smooth run. If your server is already busy it may already have a high load, so that is not always a good indication. "Ideally" on a failing drive it reports bad blocks.

You have to know what iowait is before you can really say whether bad blocks can cause it. The iowait you see in top is basically the share of CPU time spent idle while waiting for the disk to respond. Going on that, if you have a bad block on the OS disk that is causing the drive to re-read or skip over it, you are going to have a delay in the I/O operations, which may cause the iowait to rise. So the short answer is YES. A slow disk is a good indication that something is up.

Smartctl is a good test but really is not all that trustworthy; even when it says a disk is not in the best condition, the disk is often fine for years. I also do not believe a bad smartctl result alone is enough for a hardware swap.

What does

iostat
show when you have the high load? If your disk is in fact doing a lot of IO the disk may be fine and just busy.


Also, check your swap usage. Might be an indicator that you need to consider more memory.
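
A quick way to put a number on swap usage is to read SwapTotal and SwapFree out of /proc/meminfo. Here is a small Python sketch of that; the function name is hypothetical, and the values are assumed to be in kB as /proc/meminfo reports them on Linux.

```python
def swap_used_percent(meminfo_text):
    """Return the percentage of swap in use, given the text of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # The first token after the colon is the number (kB for memory sizes).
            values[key.strip()] = int(parts[0])
    total = values.get("SwapTotal", 0)
    free = values.get("SwapFree", 0)
    if total == 0:
        return 0.0  # no swap configured
    return 100.0 * (total - free) / total


def read_meminfo(path="/proc/meminfo"):
    """Read the current contents of /proc/meminfo."""
    with open(path) as f:
        return f.read()
```

A high percentage on its own only says swap has been used at some point. If the number keeps climbing while iowait is high, memory pressure is a likely contributor to the disk load.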

As far as SMART goes, it's an OK indicator. I have a 20GB drive at home that's been pending failure for 4 years now according to SMART. An offline (manufacturer) test is the best indicator of whether the drive is failing; it takes out factors such as load. I've personally authorized HDD replacements based on smartctl and iowait alone.

In any case.. back your data up!


--------------------
Tomy Durden
Manager - Office of Change Management
 
CyberSEAL
post Aug 23 2007, 02:19 PM
Post #5


Master
***

Group: Members
Posts: 369
Joined: 12-March 02
Member No.: 1,620



QUOTE (eth00 @ Aug 23 2007, 12:26 AM) *
A badblocks test is one of the better ways to see if a disk is failing or otherwise having trouble. During the test you will usually see one of three things: a very high load, errors on the screen, or a smooth run. If your server is already busy it may already have a high load, so that is not always a good indication. "Ideally" on a failing drive it reports bad blocks.

You have to know what iowait is before you can really say whether bad blocks can cause it. The iowait you see in top is basically the share of CPU time spent idle while waiting for the disk to respond. Going on that, if you have a bad block on the OS disk that is causing the drive to re-read or skip over it, you are going to have a delay in the I/O operations, which may cause the iowait to rise. So the short answer is YES. A slow disk is a good indication that something is up.


The iostat command is what I've been using to determine there's an iowait issue. It gets up to 90% at times when the server is busy. I'm extremely doubtful their techs ran a badblocks test and monitored the system using the troubleshooting method you explained above. Given the response to our ticket, I'm certain they simply ran a badblocks test, and then came back and said all is well.

I appreciate the responses. I recently disabled mailscanner and spamassassin, which were sources of high load, and replaced them w/ milters. I'll keep an eye on the box and see if that helps any. Also, our data is backed up several times a week.
 
Jeff
post Aug 23 2007, 02:59 PM
Post #6


SuperGeek
****

Group: Members
Posts: 1,481
Joined: 18-November 05
From: Lake Michigan
Member No.: 18,911



The one big (and possibly obvious, by the name) drawback of the offline test is that it takes the server totally offline for several hours in order to perform it (at least when I had one done this spring.)


--------------------
 
James Jhurani
post Aug 23 2007, 05:40 PM
Post #7


SuperGeek

Group: The Planet Staff
Posts: 1,696
Joined: 27-December 05
Member No.: 19,248



Well, if the system is attempting to write to or read from a bad sector, it will cause i/o wait. If badblocks is run through e2fsck -c (so the results get recorded in the filesystem), it will mark bad blocks as... well... bad... And the system will not use those blocks. So if after a badblocks scan, the iowait is gone.. that was your problem. The only drawback is if you are experiencing high iowait, a badblocks test could take a LONG time.

As Tomy said, drive replacements have been made based solely on smartctl, as iffy as its results are... It is just better to be safe than sorry. So if you are getting bad smartctl responses, you might as well back up your data, and get the drive replaced.

good luck,
-James


--------------------
"The average person thinks he isn't." -- Father Larry Lorenzoni


James Jhurani
Managed Hosting
http://www.theplanet.com
 
jbyers
post Aug 31 2007, 11:50 AM
Post #8


Enlightened

Group: The Planet Staff
Posts: 68
Joined: 17-March 05
From: Houston, Texas
Member No.: 16,174



I tend to recommend checking for badblocks BEFORE creating the filesystem if possible with mke2fs -c /dev/sdbX

Optionally, you may also want to boot from a Linux LIVE CD and run a 'non destructive' read-write test with 'e2fsck -cc /dev/sdbX'

I hope this helps!


--------------------
Houston DataCenter Operations, Level 2 Technician