Full Version: Named/BIND failed on both of my servers
pr0tahn
I was troubleshooting one server and just could NOT find the cause. Then I noticed named was also failing on the other, and I knew something very bad was going on.

I just found this on CPanel's forums:

QUOTE
hi

since the release update last night
named is down on all my servers

rndc: connect failed: connection refused

Starting named: [FAILED]
root@ns23 [~]#


Jul 21 08:16:48 ns23 named[7330]: using 1 CPU
Jul 21 08:16:48 ns23 named[7330]: loading configuration: failure
Jul 21 08:16:48 ns23 named[7330]: exiting (due to fatal error)
Jul 21 08:16:48 ns23 named: named startup failed
Jul 21 08:22:54 ns23 named[7570]: starting BIND 9.2.4 -u named
Jul 21 08:22:54 ns23 named[7570]: using 1 CPU
Jul 21 08:22:54 ns23 named[7570]: loading configuration: failure
Jul 21 08:22:54 ns23 named[7570]: exiting (due to fatal error)
Jul 21 08:22:54 ns23 named: named startup failed


I don't know what the config problem is...

thx


QUOTE
Yes, we are also seeing this on all servers running RHEL3.

CPanel needs to check this issue immediately.


Anyone else?

If they hosed everyone that's going to be horrifying.
pr0tahn
I just talked to a tech who got me fixed up right away (thanks!). Apparently there was a bad update.

He said something about changing my resolv.conf to point to a server other than myself so 'up2date -u bind-libs' could run. Then he changed resolv.conf back.
Quibbler
I have this problem also.

The fix should be posted here, people are going to come looking for it!

Quibbler
Quibbler
The solution for me was to:

Back up /etc/resolv.conf.

Add to the top of resolv.conf the IP of a working ThePlanet nameserver. I used:

CODE
nameserver 70.87.7.70


Save resolv.conf

Then run

CODE
up2date -u bind-libs


If it completes the update successfully:

CODE
service named restart


Problem fixed. Use at your own risk!
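
For anyone who would rather script the resolv.conf juggling, here is a rough sketch of the backup, prepend and restore steps. The function names `prepend_nameserver` and `restore_resolv` are made up for illustration, and the `.bak` suffix is just an assumption about where you want the backup kept.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any tool): temporarily put another
# resolver first in a resolv.conf-style file, then put the original back.

prepend_nameserver() {
    file=$1
    ip=$2
    # Keep a copy of the original next to the file.
    cp -p "$file" "$file.bak" || return 1
    tmp=$(mktemp) || return 1
    # New nameserver line first, then the original contents.
    { printf 'nameserver %s\n' "$ip"; cat "$file.bak"; } > "$tmp" || return 1
    # Overwrite in place so ownership and permissions stay as they were.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

restore_resolv() {
    file=$1
    # Only restore if a backup made by prepend_nameserver exists.
    [ -f "$file.bak" ] && cp -p "$file.bak" "$file"
}
```

Usage would look something like `prepend_nameserver /etc/resolv.conf RESOLVER_IP`, then the update and restart, then `restore_resolv /etc/resolv.conf`. RESOLVER_IP is a placeholder for whatever working resolver you have access to.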

Quibbler
spiff06
Thanks! That worked.

How do I prevent this from occurring again? Is there a ServerMatrix service for DNS redundancy, similar to DynDNS?
nsusa
This worked for me, too. Just found out about the fix after it got fixed. Had submitted a ticket and Support got me back up in less than 5 minutes!!!!!

Thanks to ThePlanet/Servermatrix :)
pr0tahn
QUOTE (spiff06)
Thanks!  That worked.

How do I prevent this from occurring again?  Is there a ServerMatrix service for DNS redundancy, similar to DynDNS?


Good question. I was in a full-on panic and started signing up for easydns (20/yr) because I could not figure it out.
martynas
This should be sticky, as I am experiencing (well, was experiencing) the same problem. The funny thing was I was under a brute force attack just before this update happened, so my log looked pretty scary...

Thanks for the fix.
timbernet
Wow - I had this too...

The fix is not working for me, but I also updated cPanel and did some other things thinking that cPanel just fsck'ed itself - again...
timbernet
QUOTE (timbernet)
Wow - I had this too...  

The fix is not working for me, but I also updated cPanel and did some other things thinking that cPanel just fsck'ed itself - again...


Fixed - in my attempts to fix it, I changed the permissions and forgot to change them back - mea culpa
James Erickson
This wasn't only a cpanel issue, but yes, cpanel servers were affected. I will get with our internal services guys to see if we can't find a way to prevent this from happening again.

On a side note, I would highly recommend keeping at least 1 external resolver in your /etc/resolv.conf to ensure that you can resolve hosts if your named server goes down. This way, 'updates' can be pushed out at a later time. If you do not have an external resolver listed, and your named service is down, you will also not be able to receive updates from the RHN servers.
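
As a rough illustration of that check, the sketch below reports whether a resolv.conf-style file lists at least one nameserver that is not the machine itself. The function name `has_external_resolver` is hypothetical, and treating 127.x.x.x, ::1 and any addresses you pass in as "myself" is an assumption about your setup.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if FILE lists at least one
# nameserver that is neither loopback nor one of the given own IPs.
# Usage: has_external_resolver FILE [OWN_IP ...]
has_external_resolver() {
    file=$1
    shift
    awk -v own="$*" '
        BEGIN {
            # Build a lookup table of the addresses that count as "myself".
            n = split(own, a, " ")
            for (i = 1; i <= n; i++) mine[a[i]] = 1
        }
        $1 == "nameserver" && $2 !~ /^127\./ && $2 != "::1" && !($2 in mine) {
            found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$file"
}
```

Something like `has_external_resolver /etc/resolv.conf YOUR_SERVER_IP || echo "no external resolver listed"` could go in a cron job, with YOUR_SERVER_IP standing in for the box's own address.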
TheUniverses
QUOTE (spiff06)
Thanks!  That worked.

How do I prevent this from occurring again?  Is there a ServerMatrix service for DNS redundancy, similar to DynDNS?


You can sign up for rollernet.us or something similar, which is free.
jamesn
QUOTE (jerickson)
This wasn't only a cpanel issue, but yes, cpanel servers were affected. I will get with our internal services guys to see if we can't find a way to prevent this from happening again.


The issue is really twofold, and while non-CPanel users might have seen it, that's somewhat of an edge case.

The bind rpm has a dependency list as such:

CODE
/bin/bash
/bin/sh
/bin/sh
/bin/sh
/bin/sh
/bin/sh
/bin/sh
/bin/usleep
bind-utils
chkconfig
config(bind) = 20:9.2.4-14_EL3
fileutils
glibc >= 2.2
grep
libc.so.6
libc.so.6(GLIBC_2.0)
libc.so.6(GLIBC_2.1)
libc.so.6(GLIBC_2.1.1)
libc.so.6(GLIBC_2.1.3)
libc.so.6(GLIBC_2.3)
libcrypto.so.4
libdns.so.16
libisc.so.7
libisccc.so.0
libisccfg.so.0
liblwres.so.1
libnsl.so.1
libpthread.so.0
libpthread.so.0(GLIBC_2.0)
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(PartialHardlinkSets) <= 4.0.4-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
sed
shadow-utils
textutils


The dependency list looks OK on the surface, but notice that bind-utils is a dependency while none of the other bind packages are. For bind-libs, the bind rpm lists the files that bind-libs PROVIDES as dependencies, but not the package itself.

Here's where the problem pops up: the files provided by bind-libs have been updated, but the version numbers on the libraries have not changed. This means that the old bind-libs package still satisfies the dependencies for libdns, libisc, libisccfg, etc. Unfortunately, the new set of libraries breaks binary compatibility with the bind package (something which is not supposed to happen). Both packages have to be updated at the same time.

If you run, say, 'up2date -u bind', up2date will update bind for you. It will almost certainly update bind-utils as well, but bind-libs will not be updated. This is because of the way the dependencies are built: the bind-utils dependency forces a package version check, so up2date sees that the installed version of bind-utils is older than the available version and updates it. There is no bind-libs dependency, so there is never a version check. Running 'up2date -u' with no package names works around this problem by forcing up2date to do a version check on all of the installed packages, so it sees the old bind-libs package and updates it accordingly.
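
One way to spot this kind of half-updated state is to compare the installed versions of the bind packages. The sketch below is only that, a sketch: `bind_version_mismatches` is a made-up helper name, and it assumes all the bind sub-packages are supposed to carry the same version-release string as bind itself.

```shell
#!/bin/sh
# Hypothetical helper: reads "name version-release" lines on stdin and
# prints every package whose version-release differs from that of "bind".
# Exit status: 0 = all match, 1 = mismatch found, 2 = no "bind" line seen.
bind_version_mismatches() {
    awk '
        # Skip anything that is not exactly "name version", for example
        # the "package X is not installed" message from rpm.
        NF == 2 { ver[$1] = $2 }
        END {
            if (!("bind" in ver)) exit 2
            bad = 0
            for (p in ver) {
                if (ver[p] != ver["bind"]) { print p, ver[p]; bad = 1 }
            }
            exit bad
        }
    '
}
```

It could be fed with `rpm -q --queryformat '%{NAME} %{VERSION}-%{RELEASE}\n' bind bind-libs bind-utils bind-devel | bind_version_mismatches`. Any package it prints is out of step with bind.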

Now take a look at how CPanel runs up2date via /scripts/sysup (YMMV, I grabbed this from ps(1) output on a test system):

CODE
up2date --nox -i bind bind-devel bind-utils bzip2 expect freetype freetype-devel gcc gd gd-devel gd-progs gd-utils gnupg libgd1 libgd1-devel libmysqlclient10-dev lynx openssh openssh-clients openssh-server openssl openssl-devel openssl-misc perl-CPAN pine sharutils ucd-snmp ucd-snmp-devel ucd-snmp-utils wget XFree86-devel XFree86-libs


Oops. CPanel specified bind, bind-devel and bind-utils. No bind-libs, so no version check on that package is ever done, and since the libraries already exist, the dependency is considered solved.

We believe that we have implemented a good workaround: we've "scheduled" all 4 of the bind packages for every RHEL3 system that has bind installed on it. That should force up2date to update bind properly the next time the system checks in (once every couple of hours IIRC). That ought to get a working bind-libs out to all the systems that need it, and cause the other systems to update all of the bind packages, hopefully in the right order.
Stefaans
Knowing that things like this happen from time to time (blame it on cPanel if you want), I have disabled automatic updates on all our servers. I'd rather sit down once a month and initiate all the updates myself ;)

When I read about this issue today, I thought I would be proactive and do the Bind update manually, before cPanel does it for me one of these days. Little did I know that the instructions above work for fixing the problem, rather than being a complete set of update instructions in themselves. I ended up with a mismatch of RPM versions and Bind not wanting to run because of that.

I received the following errors:
CODE
zone version.bind/CH: has 0 SOA records
zone version.bind/CH: has no NS records
view.c:347: REQUIRE((&view->references)->refs > 0) failed


Should you wish to run the Bind update yourself and do not have the cPanel-induced problem yet, I suggest the following:

1) Make sure you have a valid setup in your /etc/resolv.conf as explained above

2) Update both Bind and its libraries:
CODE
up2date -u bind
up2date -u bind-libs


I hope that helps someone in the same situation.
Stefaans
QUOTE (Quibbler)
add to the top of resolv.conf the ip of a working theplanet nameserver, I used :-  
QUOTE
nameserver 70.87.7.70

With that as the first nameserver entry, Exim suddenly gave us "unroutable domain" errors. I moved the main server IP back into the first position, and all was fine again.

Does this mean the TP name server 70.87.7.70 does not allow recursion for our servers?
optic
QUOTE (Stefaans)
With that as the first nameserver entry, Exim suddenly gave us "unroutable domain" errors. I moved the main server IP back into the first position, and all was fine again.

Does this mean the TP name server 70.87.7.70 does not allow recursion for our servers?


I had the same problem. I too have moved it to the bottom of the list; we'll see how it goes.
APC Hosting
Thanks, this allowed me to fix my server before my techs responded!

Is there any way to ping the status of the named service?

I have monitoring scripts that ping http, smtp, pop3 etc. What port does named run on? Or how do I find out?

Cheers

Andrew
Paul
DNS uses port 53...
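
If you want the monitor to confirm that named actually answers rather than just probing the port, a sketch along these lines might do. `named_answers` is a made-up function name, and it assumes `dig` (from bind-utils) is installed and that you query a zone the server is supposed to answer for.

```shell
#!/bin/sh
# Hypothetical check: succeed only if SERVER returns a NOERROR reply
# to an SOA query for ZONE.
# Usage: named_answers SERVER ZONE
named_answers() {
    server=$1
    zone=$2
    # +time and +tries keep the check short; dig's own exit status is
    # not enough, so look for the NOERROR status in the reply header.
    dig +time=2 +tries=1 "@$server" "$zone" SOA 2>/dev/null |
        grep -q 'status: NOERROR'
}
```

For example, `named_answers 127.0.0.1 yourdomain.example || echo "named is not answering"`, where yourdomain.example is a placeholder for one of your own zones.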
APC Hosting
I thought that, and that's the port I use to monitor DNS, but I got no failure notifications. :shock:

Time to investigate

Thanks

Andrew
spiff06
QUOTE (jester)
QUOTE (spiff06)
Is there a ServerMatrix service for DNS redundancy, similar to DynDNS?


You can sign up for rollernet.us or something similar, which is free.

Kewl.

Browsed around, looks like there are many others. Couldn't find a comparison matrix between all these Secondary DNS services, though...

Could you give me pointers as to criteria for choosing a 2DNS server?

Thanks :D
jamesn
QUOTE (Stefaans)
Does this mean the TP name server 70.87.7.70 does not allow recursion for our servers?


That's an authoritative name server. It does not do recursion.
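
If you want to check a server yourself before putting it in resolv.conf, the reply header from dig tells you: a server that offers recursion sets the "ra" (recursion available) flag. A small sketch, with `offers_recursion` being a made-up helper name that reads dig output on stdin:

```shell
#!/bin/sh
# Hypothetical helper: reads dig output on stdin and succeeds only if
# the ";; flags:" header line contains the "ra" flag.
offers_recursion() {
    awk '
        /^;; flags:/ {
            sub(/^;; flags:/, "")   # drop the label
            sub(/;.*/, "")          # drop everything after the flag list
            n = split($0, f, " ")
            for (i = 1; i <= n; i++) if (f[i] == "ra") found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Something like `dig @SERVER_IP example.com | offers_recursion || echo "no recursion offered"` would flag an authoritative-only server, with SERVER_IP standing in for the address you are considering.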