DO NOT FEAR: EVERYONE WHO STILL RUNS AN OLD ENSIM SERVER SEEMS TO HIT THIS PROBLEM SOONER OR LATER.
In my case, I didn't even discover this problem until 5 months after it happened (obviously an old server). You may be tempted to get a new server, which isn't a bad idea, but in the meantime you can still fix this Ensim one.
The problem is that Ensim relies on PostgreSQL, which has apparently crashed because of spontaneous database corruption that damaged the index for your Webppliance database.
NOTE: These instructions assume your PostgreSQL is located at /var/lib/pgsql
Before you follow the lengthy instructions below, check whether your problem is simply that PostgreSQL isn't running. First try this command:
CODE
# pg_ctl '-D' '/var/lib/pgsql/data' start
If it starts, you may be able to log into Ensim Webppliance and already see your sites listed!
If not, then please continue to follow these steps...
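Before going through the steps, it can help to have a repeatable way of asking "is PostgreSQL up?" instead of eyeballing ps output. The following is only a sketch: the PGDATA and PG_CTL variables and both function names are my own, and it assumes your pg_ctl has a "status" mode that returns a non-zero exit code when the server is down. Check that on your version before trusting it.

```shell
# Sketch only: PGDATA, PG_CTL and the function names are illustrative.
PGDATA="${PGDATA:-/var/lib/pgsql/data}"
PG_CTL="${PG_CTL:-pg_ctl}"

# Succeeds (exit 0) when pg_ctl reports a running postmaster for $PGDATA.
# Assumption: "pg_ctl status" exits non-zero when the server is down.
pg_is_running() {
    "$PG_CTL" status -D "$PGDATA" >/dev/null 2>&1
}

# Start the server with pg_ctl only if it is not already up.
start_if_stopped() {
    if pg_is_running; then
        echo "PostgreSQL is already running"
    else
        echo "PostgreSQL is not running, starting it"
        "$PG_CTL" -D "$PGDATA" start
    fi
}
```

After pasting the functions into a root shell, running start_if_stopped does the same thing as the command above, but leaves an already running server alone.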
1. Make sure no postgres services are running
CODE
# ps auxw | grep post
2. Stop PostgreSQL using the pg_ctl utility (DO NOT EVER USE "KILL" TO STOP PostgreSQL).
CODE
# pg_ctl '-D' '/var/lib/pgsql/data' stop
***IF AND ONLY IF THIS DOESN'T WORK, THEN USE: # service postgresql stop
3. Look for the pid file and delete it.
CODE
# find / -name postmaster.pid
# rm -f /var/lib/pgsql/data/postmaster.pid
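Deleting the pid file is only safe when the process it names is really gone. Here is a rough sketch of a check you could run first. It relies on the first line of postmaster.pid being the postmaster's process ID; the function names are mine, not anything shipped with PostgreSQL or Ensim.

```shell
# Print the PID stored on the first line of a postmaster.pid file.
pid_from_file() {
    head -n 1 "$1" | tr -d ' '
}

# Succeeds if the pid file exists but no process with that PID is alive.
# Run as root: kill -0 also fails for live processes owned by another user.
# Caveat: a live PID could belong to an unrelated process (PID reuse).
pid_file_is_stale() {
    [ -f "$1" ] || return 1
    pid=$(pid_from_file "$1")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # not a plain number, leave the file alone
    esac
    if kill -0 "$pid" 2>/dev/null; then
        return 1                   # process is alive, the file is not stale
    fi
    return 0
}

# Remove the pid file only when it is stale.
remove_stale_pid_file() {
    if pid_file_is_stale "$1"; then
        rm -f "$1" && echo "removed stale $1"
    else
        echo "nothing removed: $1 is missing or its process is alive" >&2
        return 1
    fi
}
```

You would call it with the path that the find command reported, for example remove_stale_pid_file /var/lib/pgsql/data/postmaster.pid.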
4. Backup your PostgreSQL databases.
CODE
# cp -dpR /var/lib/pgsql/data /var/lib/pgsql/data_BAK
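Since you may end up repeating this procedure, a backup that never overwrites an earlier one is safer. The sketch below wraps the same cp command in a hypothetical helper (backup_pgdata is my name for it) that adds a timestamp to the backup directory and does a crude sanity check by comparing the number of files. It assumes GNU cp, as on the Red Hat style systems Ensim runs on.

```shell
# Copy data directory $1 to "$1_BAK_<timestamp>" and print the new path.
backup_pgdata() {
    src=$1
    dest="${src}_BAK_$(date +%Y%m%d%H%M%S)"
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    # Same flags as the step above: keep symlinks, permissions, timestamps.
    cp -dpR "$src" "$dest" || return 1
    # Crude check: both trees should contain the same number of entries.
    src_count=$(find "$src" | wc -l)
    dest_count=$(find "$dest" | wc -l)
    if [ "$src_count" -ne "$dest_count" ]; then
        echo "backup incomplete: $src_count entries vs $dest_count" >&2
        return 1
    fi
    echo "$dest"
}
```

Run it as backup_pgdata /var/lib/pgsql/data while PostgreSQL is stopped, and note the path it prints so you know which copy to restore later.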
5. Try to start PostgreSQL using pg_ctl.
CODE
# pg_ctl '-D' '/var/lib/pgsql/data' start
This should output something like:
CODE
DEBUG: database system was interrupted at 2006-04-11 00:05:07 EDT
DEBUG: ReadRecord: bad rmgr data CRC in record at 1/1636292512
DEBUG: Invalid primary checkPoint record
DEBUG: ReadRecord: bad rmgr data CRC in record at 1/1636275772
DEBUG: Invalid secondary checkPoint record
FATAL 2: Unable to locate a valid CheckPoint record
The "ReadRecord" messages point to the culprit: PostgreSQL cannot find a valid checkpoint record in its write-ahead log, so it refuses to start.
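If you saved the startup messages to a file, you can check for this specific failure instead of reading the output by eye. A minimal sketch follows; checkpoint_is_corrupt is a made-up name, and the only thing it looks for is the FATAL line shown above.

```shell
# Succeed if the given file (or stdin, when no file is given) contains
# the "no valid checkpoint" failure from the sample output above.
checkpoint_is_corrupt() {
    grep -q 'Unable to locate a valid CheckPoint record' "$@"
}
```

For example, checkpoint_is_corrupt /tmp/pg_start.log && echo "continue with the next step", where /tmp/pg_start.log is wherever you pasted or redirected the messages. If the function fails, your startup problem is probably a different one and resetting the log may not be the right fix.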
7. Force a reset of the write-ahead log and other control information of the PostgreSQL database cluster. This data is important, and resetting it can lose recent transactions, but it is the only way to recover.
CODE
# /usr/lib/pgsql/contrib/pg_resetxlog/pg_resetxlog -f /var/lib/pgsql/data
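Because the reset cannot be undone, it is worth checking two things before you run it: that the backup from step 4 really exists, and that no postmaster.pid is left in the data directory. The helper below is just a sketch of those two checks (ready_for_reset is a hypothetical name); it does not run pg_resetxlog itself.

```shell
# Succeed only if it looks safe to reset data directory $1:
# backup directory $2 exists and no postmaster.pid remains in $1.
ready_for_reset() {
    datadir=$1
    backup=$2
    if [ ! -d "$backup" ]; then
        echo "no backup at $backup, go back to step 4" >&2
        return 1
    fi
    if [ -e "$datadir/postmaster.pid" ]; then
        echo "postmaster.pid still present, PostgreSQL may be running" >&2
        return 1
    fi
    return 0
}
```

Usage would look like: ready_for_reset /var/lib/pgsql/data /var/lib/pgsql/data_BAK && echo "ok to run pg_resetxlog".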
8. Start PostgreSQL
CODE
# pg_ctl '-D' '/var/lib/pgsql/data' start
***DO NOT START IT USING: # service postgresql start
It should say something like this:
CODE
postmaster successfully started
bash-2.05$ DEBUG: database system was shut down at 2006-09-16 10:10:13 EDT
DEBUG: CheckPoint record at (1, 1660944392)
DEBUG: Redo record at (1, 1660944392); Undo record at (1, 1660944392); Shutdown TRUE
DEBUG: NextTransactionId: 844523; NextOid: 932552
DEBUG: database system is in production state
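The line to look for is "database system is in production state". If your startup messages go to a file (this is an assumption: for example, some pg_ctl versions accept a -l option to name a log file), a small polling loop can wait for that line for you. This is a sketch with an invented function name, and it expects a fresh log file for each start attempt so that old FATAL lines do not confuse it.

```shell
# Poll log file $1 until the startup success message appears.
# Gives up after $2 seconds (default 30) or as soon as a FATAL line shows up.
wait_for_production() {
    logfile=$1
    timeout=${2:-30}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if grep -q 'database system is in production state' "$logfile" 2>/dev/null; then
            return 0
        fi
        if grep -q '^FATAL' "$logfile" 2>/dev/null; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

Something like wait_for_production /tmp/pg_start.log 60 || echo "startup failed, see step 9" would then tell you whether to continue with the re-index step.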
9. If it did not start successfully in Step 8, you may need to re-index the Webppliance db.
CODE
# postgres -D /var/lib/pgsql/data -O -P appldb
This should get you into the database, displaying a "backend>" prompt, so you can re-index the Webppliance db:
CODE
> reindex database appldb;
> q
*** The "q" is supposed to quit, but it didn't work for me, so I had to close my SSH session and log in again. (In the standalone backend, pressing Ctrl-D to send end-of-file normally ends the session.)
10. Vacuum the database (apparently it is good to do this periodically, but it may take a while the first time you do it).
CODE
# vacuumdb -d appldb -U postgres -z -v
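Since vacuuming is something to repeat periodically, you may want a small wrapper that keeps a record of each run. The sketch below is only an illustration: the VACUUMDB and VACUUM_LOG variables, the log path and the function name are all my own choices. It uses the same vacuumdb options as above, minus -v to keep the log short.

```shell
# Sketch only: variable names and the log path are illustrative.
VACUUMDB="${VACUUMDB:-vacuumdb}"
VACUUM_LOG="${VACUUM_LOG:-/var/log/appldb_vacuum.log}"

# Vacuum and analyze appldb, appending a dated record to $VACUUM_LOG.
# Returns the exit status of vacuumdb.
vacuum_appldb() {
    {
        echo "=== vacuum started $(date) ==="
        "$VACUUMDB" -d appldb -U postgres -z && status=0 || status=$?
        echo "=== vacuum finished with status $status ==="
    } >>"$VACUUM_LOG" 2>&1
    return "$status"
}
```

Saved in a script that calls vacuum_appldb at the end, this could be run by hand or from cron (say, weekly during a quiet hour), and the log shows when the last successful vacuum happened.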
*** WEBPPLIANCE SHOULD NOW WORK AND DISPLAY YOUR SITES AGAIN!!!
*** IF WEBPPLIANCE DOES NOT DISPLAY YOUR SITES, RESTORE THE BACKUP "data" AND RE-TRY FROM STEP 4. YOU MIGHT HAVE ACCIDENTALLY USED "kill" or "service postgresql start/stop" WHICH SEEMS TO CAUSE MORE PROBLEMS, SO BE SURE YOU ONLY USE THE pg_ctl COMMAND. OTHERWISE, YOUR SERVER MAY SUDDENLY CRASH AND KEEP CRASHING, FORCING YOU TO REBOOT FROM "EV1 SERVER COMMAND" AND STOP PostgreSQL UNTIL YOU FIX THE REAL PROBLEM. DO NOT GET DISCOURAGED AND DUMP YOUR SERVER AS I NEARLY DID. JUST REPEAT THE PROCESS ABOVE AND IT SHOULD WORK.
If these steps don't solve it for you, here are a few other links to check:
http://forums.ev1servers.net/showthread.php?t=46485
http://forums.ev1servers.net/showthread.php?t=45187
http://forums.ev1servers.net/showthread.php?t=63341
http://forums.ev1servers.net/showthread.php?t=46511
http://forums.ev1servers.net/showthread.php?t=26918
http://www.psoft.net/HSdocumentation/sysad...s_database.html