Full Version: Global SA Bayes under Ensim4
Catalyst
* 200410022320-5: Changed: I messed up and put "auto_learn 1" down there instead of "bayes_auto_learn 1"
* 200412041544-5: This HOWTO is for 2.6x+. If you keep coming up with errors, look at "man Mail::SpamAssassin::Conf" for more info about local.cf configurations.
* 200505011219-4: 2.6x+ "may" not support SQL (depends on the distro you ended up with), but there's no reason, at this point, for anyone to be using 2.6x. Use SA 3.x+

Since a whole bunch of people have asked, and I keep getting pinned in Chat about it ...


1. Create a database called "spamassassin"

If you're having issues doing that, then try:

CODE
mkdir /var/lib/mysql/spamassassin
chown mysql:root /var/lib/mysql/spamassassin


2. Open up phpMyAdmin, and go to the "spamassassin" database and run the following as a Query in the SQL tab:

CODE
CREATE TABLE bayes_expire (
  id int(11) NOT NULL default '0',
  runtime int(11) NOT NULL default '0',
  KEY bayes_expire_idx1 (id)
) TYPE=MyISAM;

CREATE TABLE bayes_global_vars (
  variable varchar(30) NOT NULL default '',
  value varchar(200) NOT NULL default '',
  PRIMARY KEY (variable)
) TYPE=MyISAM;

INSERT INTO bayes_global_vars VALUES ('VERSION','3');

CREATE TABLE bayes_seen (
  id int(11) NOT NULL default '0',
  msgid varchar(200) binary NOT NULL default '',
  flag char(1) NOT NULL default '',
  PRIMARY KEY (id,msgid)
) TYPE=MyISAM;

CREATE TABLE bayes_token (
  id int(11) NOT NULL default '0',
  token char(5) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  atime int(11) NOT NULL default '0',
  PRIMARY KEY (id, token)
) TYPE=MyISAM;

CREATE TABLE bayes_vars (
  id int(11) NOT NULL AUTO_INCREMENT,
  username varchar(200) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  token_count int(11) NOT NULL default '0',
  last_expire int(11) NOT NULL default '0',
  last_atime_delta int(11) NOT NULL default '0',
  last_expire_reduce int(11) NOT NULL default '0',
  oldest_token_age int(11) NOT NULL default '2147483647',
  newest_token_age int(11) NOT NULL default '0',
  PRIMARY KEY (id),
  UNIQUE bayes_vars_idx1 (username)
) TYPE=MyISAM;
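
On newer MySQL servers the TYPE= table option no longer exists (it was deprecated in favour of ENGINE= and removed in MySQL 5.5), so the statements above may be rejected there with a syntax error. A minimal sketch of a workaround, assuming you saved the schema to a file first; convert_table_type and the bayes_mysql.sql filename are made-up names for illustration:

```shell
#!/bin/sh
# Rewrite the legacy ") TYPE=<engine>;" table option to ") ENGINE=<engine>;".
# Reads SQL on stdin and writes the converted SQL to stdout.
# convert_table_type is a hypothetical helper, not part of MySQL or SpamAssassin.
convert_table_type() {
    sed 's/) TYPE=\([A-Za-z]*\);/) ENGINE=\1;/'
}
```

Something like "convert_table_type < bayes_mysql.sql | mysql -u root -p spamassassin" would then load the converted schema. Lines that do not end a CREATE TABLE statement pass through unchanged.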


3. Still in phpMyAdmin, click "Home" on the left. Click "Privileges" on the right. Scroll down a bit and click "Add a New User."

CODE
Username:  spamassassin
Host:  localhost
Password:  spamassassin


Do not assign any privileges, and hit "Go."

On the next screen, under the "Database-specific privileges" section, set:

CODE
Add privileges on the following database:  spamassassin


On the next screen, allow "Select, Insert, Update and Delete" only, and click Go.

Now you've created a user "spamassassin" with the password "spamassassin" and the only permissions the "spamassassin" user has are to make entries, changes and deletions in the "spamassassin" database.
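
If you don't have phpMyAdmin handy, roughly the same user can be created from the shell. This is a sketch, not what the clicks above literally run: it assumes the GRANT ... IDENTIFIED BY form accepted by MySQL 4.x/5.x servers, and grant_sql is a hypothetical helper that only prints the statement so you can read it before running it.

```shell
#!/bin/sh
# Print a GRANT statement that gives the Bayes user only the four
# privileges chosen in step 3. Argument: client host (default localhost).
# grant_sql is a hypothetical helper; the user, password and database
# names are the ones used in this HOWTO.
grant_sql() {
    host=${1:-localhost}
    printf "GRANT SELECT, INSERT, UPDATE, DELETE ON spamassassin.* TO 'spamassassin'@'%s' IDENTIFIED BY 'spamassassin';\n" "$host"
}
```

Pipe it into the client as an admin user, e.g. "grant_sql | mysql -u root -p". Passing an address, as in "grant_sql 192.0.2.10" (a placeholder IP), prints the grant for a remote SpamAssassin box instead of localhost.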

4. Next ... edit /etc/mail/spamassassin/local.cf and make the following changes to the Bayes section --- these will be all you'll need:

CODE
# Enable the Bayes system
bayes_store_module      Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn           DBI:mysql:spamassassin:localhost
bayes_sql_username      spamassassin
bayes_sql_password      spamassassin
use_bayes               1
bayes_auto_learn        1


Make sure you don't have bayes_path or any other leftover Bayes storage settings in local.cf --- the lines above are all you need.
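
A quick way to check for a leftover bayes_path is to grep for it. A minimal sketch; find_bayes_path is a made-up helper name, and it only looks at active (uncommented) lines:

```shell
#!/bin/sh
# List active bayes_path lines in a SpamAssassin config file, with
# line numbers. Commented-out lines are ignored. Exit status 0 means
# a leftover was found, 1 means the file is clean.
# find_bayes_path is a hypothetical helper name.
find_bayes_path() {
    grep -n '^[[:space:]]*bayes_path' "$1"
}
```

Run it against every copy of local.cf you edit, e.g. "find_bayes_path /etc/mail/spamassassin/local.cf". No output means nothing is pointing SpamAssassin back at the DB files.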

5. Make the same change in /home/virtual/FILESYSTEMTEMPLATE/spam_filter/etc/mail/spamassassin/local.cf.

6. Type "service spamassassin restart" (or "service spamd restart" if that comes back with an error).

7. Do your Post Maintenance:

CODE
/usr/local/sbin/set_pre_maintenance
/usr/local/sbin/set_maintenance
/usr/local/sbin/set_post_maintenance
/sbin/service webppliance restart


...and now you have Global Bayes.

Go ahead and import your Spam and Ham with sa-learn, and you're ready to go.


If you wanna use a centralized database on one server, and have all your others connect to it, just repeat Step 3 and use the IP of the other system in place of "localhost."

Of course, you'd have to open the firewall to your other server ... Incoming Rule from the other server's IP address on port 3306. Not a problem.

The only real issue with this is that Domain Users can see the username and password to the Spamassassin database. Its permissions are explicit ... and you can always back it up using "mysqldump -uspamassassin -pspamassassin --add-drop-table -a spamassassin > 2004xxxx-spamassassin.dump" and restore it just as easily. To me, it's a quick and dirty way to get global Bayes, with an acceptable risk that someone on the local system might wanna be a jerk and mess their mail up ... ;-)
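
To avoid typing the date by hand, the dump name can be generated. A small sketch; dump_name is a hypothetical helper that just follows the YYYYMMDD-spamassassin.dump pattern from the example above:

```shell
#!/bin/sh
# Build a dated dump filename in the same YYYYMMDD-spamassassin.dump
# style as the mysqldump example. dump_name is a hypothetical helper.
dump_name() {
    printf '%s-spamassassin.dump\n' "$(date '+%Y%m%d')"
}
```

Usage would look like: mysqldump -uspamassassin -pspamassassin --add-drop-table -a spamassassin > "$(dump_name)". Note that the restore has to be done as a MySQL user that can drop and create tables, since the spamassassin user from step 3 was only given Select, Insert, Update and Delete.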
Bluesax
Hi,
Have followed instructions to the letter but still am having the below errors. Any help would greatly be appreciated


Nov 1 23:58:05 ozzie-423-p4 spamd[18520]: connection from localhost.localdomain [127.0.0.1] at port 46997
Nov 1 23:58:05 ozzie-423-p4 spamd[18520]: handle_user: unable to find user 'band@ghostroad.com.au'!
Nov 1 23:58:05 ozzie-423-p4 spamd[18520]: Still running as root: user not specified with -u, not found, or set to root. Fall back to nobody.
Nov 1 23:58:05 ozzie-423-p4 spamd[18520]: Use of uninitialized value in pattern match (m//) at /usr/bin/spamd line 1047, line 4.
Nov 1 23:58:05 ozzie-423-p4 spamd[18520]: Use of uninitialized value in pattern match (m//) at /usr/bin/spamd line 1048, line 4.
Nov 1 23:58:05 ozzie-423-p4 spamd[18520]: Use of uninitialized value in concatenation (.) or string at /usr/bin/spamd line 1050, line 4.
Nov 1 23:58:05 ozzie-423-p4 spamd[18520]: Use of uninitialized value in concatenation (.) or string at /usr/bin/spamd line 1050, line 4.
Nov 1 23:58:05 ozzie-423-p4 spamd[18520]: Use of uninitialized value in scalar assignment at /usr/bin/spamd line 1051, line 4.
Nov 1 23:58:05 ozzie-423-p4 spamd[18520]: fatal: setuid to nobody failed
Nov 1 23:58:05 ozzie-423-p4 spamd[18520]: error: Died at /usr/bin/spamd line 1054, line 4._ , continuing
Catalyst
Your problems have nothing at all to do with this HOWTO.
Bluesax
Sorry - still getting used to this forum. My apologies.
Any help would be greatly appreciated - where should I post?
Catalyst
QUOTE
Originally posted by Bluesax
Sorry - still getting used to this forum. My apologies.
Any help would be greatly appreciated - where should I post?


BTW ... your problem is most probably caused by people screwing up the case. Procmail invokes SA using user@hostname, so if they sent the mail to User@HostName.Tld the user doesn't exist, and you get those errors.
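
One way around the case problem is to lower-case the address before it is used as the SpamAssassin username. A sketch only; normalize_rcpt is a made-up helper, and where you hook it in depends on how your procmailrc calls spamc:

```shell
#!/bin/sh
# Lower-case a recipient address so User@HostName.Tld and
# user@hostname.tld map to the same account name.
# normalize_rcpt is a hypothetical helper name.
normalize_rcpt() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}
```

For example, normalize_rcpt 'User@HostName.Tld' prints user@hostname.tld, which could then be handed to spamc with its -u option.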
abubin
would appreciate if you could explain a little on what your method does and why it need to use a database for bayes.

Thanks and a great howto.
Catalyst
QUOTE
Originally posted by abubin
would appreciate if you could explain a little on what your method does and why it need to use a database for bayes.

Thanks and a great howto.


Because under the normal virtual filesystem, you're not able to get back to the bayes databases without hardlinking to them from each virtual site. It's also exceptionally faster than the dbfile method, and allows you to share the same Bayes database across multiple servers.
abubin
oh yeah...each virtual domain now has its own bayes. One more question: with autolearn ON, does this mean the database will grow without limit? This would cause problems when you have a domain where there is heavy spamming going on. I have a domain that gets around 50000 mails each day, and 95% of them are spam. That's why I previously had bayes autolearn set to OFF. Instead I used mailwatch to manually feed it daily spam and ham. There's some manual work, but it works better with each day. But since mailwatch doesn't work 100% with ensimized SA, this method would have to go.

Another method that I see people were using is that they schedule bayes database to be reset periodically. That would make sure that the database doesn't grow too big.

What do you think of those methods and any comment on them?

Thanks.
Catalyst
QUOTE
Originally posted by abubin
oh yeah...each virtual domain now has its own bayes. One more question: with autolearn ON, does this mean the database will grow without limit? This would cause problems when you have a domain where there is heavy spamming going on. I have a domain that gets around 50000 mails each day, and 95% of them are spam. That's why I previously had bayes autolearn set to OFF. Instead I used mailwatch to manually feed it daily spam and ham. There's some manual work, but it works better with each day. But since mailwatch doesn't work 100% with ensimized SA, this method would have to go.

Another method that I see people were using is that they schedule bayes database to be reset periodically. That would make sure that the database doesn't grow too big.

What do you think of those methods and any comment on them?

Thanks.


These questions are beyond the scope of this HOWTO. ;-)

Any Bayes database will grow to the default maximum size unless you set one yourself. However, a SQL query against a 1 Gig database is much faster than reading a 1 Gig bayes_toks file --- dbfile is slow, and has to load the entire file before it can process, which'll cause a crapload of disk swapping.

If you're worried about the size of your database, you can always change local.cf (both copies)...

CODE
bayes_expiry_max_db_size           150000
bayes_auto_expire                  1


Another nice one ...

CODE
bayes_auto_learn_threshold_nonspam      -12.1
bayes_auto_learn_threshold_spam         12.0


It defaults to 0.1 for nonspam, which is *really* bad, actually -- plenty of regular spam scores negative under the default spamassassin rules, and anything that scores under the nonspam threshold gets auto-learned as ham. -12.1/12.0 is a nice bell curve.
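
To see what the two thresholds do to a given score, here is a simplified model of the decision. It is a sketch only: autolearn_decision is a made-up helper, the defaults are the -12.1/12.0 values above, and real SpamAssassin applies extra conditions before it learns anything, which are not modelled here.

```shell
#!/bin/sh
# Simplified model of the auto-learn decision: scores below the
# nonspam threshold are learned as ham, scores above the spam
# threshold are learned as spam, everything in between is left alone.
# Arguments: score [nonspam_threshold] [spam_threshold]
# autolearn_decision is a hypothetical helper name.
autolearn_decision() {
    awk -v s="$1" -v ham="${2:--12.1}" -v spam="${3:-12.0}" 'BEGIN {
        if (s + 0 < ham + 0)       print "ham"
        else if (s + 0 > spam + 0) print "spam"
        else                       print "none"
    }'
}
```

With the settings above, a message scoring 0.05 is left alone ("autolearn_decision 0.05" prints none). With the stock 0.1 nonspam threshold the same message would be learned as ham ("autolearn_decision 0.05 0.1 12.0" prints ham), which is the problem described.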
abubin
sorry to bother you again but can you tell me where to find explanations for those commands?

I have searched the spamassassin site:

http://spamassassin.apache.org/full/3.0.x/...assin_Conf.html

but i can't find commands for bayes_expiry_scan_count.
JamesC
hmmm seems this does not work with Ensim 4.02.

I get these errors when I try sa-learn

CODE
debug: Failed to parse line in SpamAssassin configuration, skipping: bayes_store_module      Mail::SpamAssassin::BayesStore::SQL
debug: Failed to parse line in SpamAssassin configuration, skipping: bayes_sql_dsn           DBI:mysql:spamassassin:localhost
debug: Failed to parse line in SpamAssassin configuration, skipping: bayes_sql_username      spamassassin
debug: Failed to parse line in SpamAssassin configuration, skipping: bayes_sql_password      spamassassin
debug: Failed to parse line in SpamAssassin configuration, skipping: bayes_auto_learn        1
Catalyst
It has nothing to do with your Ensim version --- you have to be using SpamAssassin version 2.60 at a bare minimum (i.e. one that allows for Bayes SQL storage).
JamesC
QUOTE
Originally posted by Catalyst
It has nothing to do with your Ensim version --- You have to be using version SpamAssassin 2.60 at a bare minimum (ie. one that allows for Bayes SQL storage).


Thank you Catalyst for the response. I figured that out and I am upgrading now ;)

Then I will redo the changes :)
Catalyst
QUOTE
Originally posted by JamesC
Thank you Catalyst for the response. I figured that out and I am upgrading now ;)

Then I will redo the changes :)


3.01 is a beautiful thing.
splicesite
i followed the howto but it seems like the mysql database isnt being used.... spamassassin is still updating the bayes_toks files in ./spamassassin

how can I check if it is properly using the database for all bayesian updates? that is, how can i check if this howto worked properly
doug357
QUOTE
Originally posted by splicesite
i followed the howto but it seems like the mysql database isnt being used.... spamassassin is still updating the bayes_toks files in ./spamassassin

how can I check if it is properly using the database for all bayesian updates?  that is, how can i check if this howto worked properly


Make sure you don't have any bayes_path or any other stuff in the local.cf
Catalyst
QUOTE
Originally posted by splicesite
i followed the howto but it seems like the mysql database isnt being used.... spamassassin is still updating the bayes_toks files in ./spamassassin


*nod* What he said..

QUOTE
how can I check if it is properly using the database for all bayesian updates?  that is, how can i check if this howto worked properly


Look in the database...
splicesite
ya,
yes i had looked in the DB, it wasnt updating the db and there were no refs to bayes_path in the local.cf

anyway that was 2.6, i upgraded to version 3 and now the DB seems to be working properly

thanks for the help tho
webexceed
My turn. :D

Well I have SpamAssassin 2.64 installed from source...followed the how to in this forum somewhere.

Then did the instructions exactly as above and the database is not being used at all.

/etc/mail/spamassassin/local.cf and /home/virtual/FILESYSTEMTEMPLATE/spam_filter/etc/mail/spamassassin/local.cf

are the exact same:

CODE
bayes_store_module      Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn   DBI:mysql:spamassassin:localhost
bayes_sql_username      spamassassin
bayes_sql_password      myrealpasswordhere
use_bayes       1
bayes_auto_learn        1


As root I run:

CODE
sa-learn --mbox --spam Junk E-mail
Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/PerMsgStatus.pm line 1274.
Learned from 703 message(s) (1164 message(s) examined).


I then look in the database and it's empty except for the 1 entry that was put there during setup.

I also checked and the sites all seem to have the correct local.cf as well.

Ensim 4.0.2-7 (RHEL) with all patches/fixes.

I'm guessing there is a configuration overriding this somewhere? I'm out of guesses on where to look. Anyone get this working with SA2.64???

I don't want to upgrade to 3.x until Ensim gets their act together to make it compatible. I hear 4.0.3 is coming out this week, but it sounds like it's the Fedora version.
hooter
@Catalyst, if you're still reading, great how-to and thanks!

I've got the mysql db setup and have changed spamassassin configuration to point to the db.

My question is this:
Could I set up a "junk" email account at my support site for example junk@mysupportsite.com - then I would like to have the users on this box who receive non-tagged spam emails that have slipped through, to forward that email to that "junk" account. Then I would just have a cron job which runs sa-learn on that junk mbox hourly/daily etc.

Will this work? Are there any dire implications/consequences?

Thanks and any information is greatly appreciated.
Catalyst
[weird dupe, weird that it got here anyway]
Catalyst
QUOTE
Originally posted by hooter
@Catalyst, if you're still reading, great how-to and thanks!

I've got the mysql db setup and have changed spamassassin configuration to point to the db.

My question is this:
Could I set up a "junk" email account at my support site for example junk@mysupportsite.com - then I would like to have the users on this box who receive non-tagged spam emails that have slipped through,  to forward that email to that "junk" account.  Then I would just have a cron job which runs sa-learn on that junk mbox hourly/daily etc.

Will this work? Are there any dire implications/consequences?

Thanks and any information is greatly appreciated.


Yep ... easy ... just create the account ... I usually make an abuse@ account ... but, yeah ... Tell them to email me full emails (attachments instead of a simple forward), then drop them in a Monthly folder ... A quick and dirty cron.hourly script

CODE
#!/bin/sh
# Learn this month's reported spam (the folder is named YYYYMM)
su -l root -s /bin/sh -c "sa-learn --spam --mbox --file /home/virtual/domain.tld/home/abuse/mail/Spam/`date '+%Y%m'` >> /dev/null"
# Learn the reported ham folder
su -l root -s /bin/sh -c "sa-learn --ham --mbox --file /home/virtual/domain.tld/home/abuse/mail/Ham >> /dev/null"


Neat stuff.

But ... also ... I do things quite differently than Ensim wants, as their way caused way too much load --- I host some pretty high-profile sites, and process *way* more mail than most people. Spam is quarantined, and the database is pretty up-to-date (and huge). I dunno ... I'm just weird, I guess. I don't wanna divulge too much, either. *grin*
hooter
Catalyst, were you trying to answer? ;)
Catalyst
QUOTE
Originally posted by webexceed
CODE
sa-learn --mbox --spam Junk E-mail

Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/PerMsgStatus.pm line 1274.

Learned from 703 message(s) (1164 message(s) examined).



What's the file name you're trying to learn? You've got spaces and stuff in there ... but ... I'd say from the error there's a bad UTF-8 encoded message (or the file isn't in mbox format, or there's a bad EOL character, or...).

Try using a filename without a "-" in there ... sometimes that does screw PERL up.

As for why the database is still empty, I dunno ... When I did this howto, I was using 2.64 and it worked perfectly. Perhaps your install doesn't contain some of the PERL packages you need, like Perl-DBI-Mysql, etc. ?
webexceed
QUOTE
Originally posted by Catalyst
What's the file name you're trying to learn? You've got spaces and stuff in there ... but ... I'd say from the error there's a bad UTF-8 encoded message (or the file isn't in mbox format, or there's a bad EOL character, or...).  

Try using a filename without a "-" in there ... sometimes that does screw PERL up.

As for why the database is still empty, I dunno ... When I did this howto, I was using 2.64 and it worked perfectly.  Perhaps your install doesn't contain some of the PERL packages you need, like Perl-DBI-Mysql, etc. ?


Thanks for the reply!

Ya, that box is called "Junk Email" (As outlook creates when using IMAP). I'm thinking the same thing...a bad message in there somewhere. I think I'll empty it out and start recollecting spam, actually, I'll let Outlook do it for me, it seems to do a very good job at knowing what is spam and what isn't.

If I was missing something like Perl-DBI-Mysql wouldn't it complain about that? I'm going to check on that. Ensim says they will have a patch out "by the end of next week" which will allow an upgrade to SpamAssassin 3, so I'll see what that does. I've seen a few people say the db wouldn't work with 2.6x but that it suddenly started working with 3. We'll see what happens.

Oh, and thanks for the how-to also! :D
Catalyst
QUOTE
Originally posted by webexceed
Thanks for the reply!

Ya, that box is called "Junk Email" (As outlook creates when using IMAP).  I'm thinking the same thing...a bad message in there somewhere.  I think I'll empty it out and start recollecting spam, actually, I'll let Outlook do it for me, it seems to do a very good job at knowing what is spam and what isn't.

If I was missing something like Perl-DBI-Mysql wouldn't it complain about that? I'm going to check on that.  Ensim says they will have a patch out "by the end of next week" which will allow an upgrade to SpamAssassin 3, so I'll see what that does.  I've seen a few people say the db wouldn't work with 2.6x but that it suddenly started working with 3.  We'll see what happens.

Oh, and thanks for the how-to also!  :D


Do a "spamassassin -D --lint < test.message" with your favourite test.message ... look for the lines beginning with "debug: bayes:" and see what they say.
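
To cut the debug output down to the interesting lines, a filter like this can help. A minimal sketch; bayes_debug_filter is a made-up helper, and it assumes the debug lines look like the ones posted in this thread:

```shell
#!/bin/sh
# Keep only the lines that matter for Bayes troubleshooting from
# "spamassassin -D --lint" output read on stdin: the "debug: bayes:"
# lines, plus any config lines the parser refused.
# bayes_debug_filter is a hypothetical helper name.
bayes_debug_filter() {
    grep -E '^debug: bayes:|Failed to parse line'
}
```

Debug output normally goes to stderr, so redirect it: spamassassin -D --lint < test.message 2>&1 | bayes_debug_filter. Going by the reports in this thread, "tie-ing to DB file" lines mean it is still using the DB files, and "Failed to parse line" on the bayes_sql_* settings means that install does not know the SQL options at all.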
webexceed
QUOTE
Originally posted by Catalyst
Do a "spamassassin -D --lint < test.message" with your favourite test.message ... look for the lines beginning with "debug: bayes:" and see what they say.


Well I see this:

CODE
Failed to parse line in SpamAssassin configuration, skipping: bayes_store_module  Mail::SpamAssassin::BayesStore::SQL
Failed to parse line in SpamAssassin configuration, skipping: bayes_sql_dsn  DBI:mysql:spamassassin:localhost
Failed to parse line in SpamAssassin configuration, skipping: bayes_sql_username  spamassassin
Failed to parse line in SpamAssassin configuration, skipping: bayes_sql_password  myspamassassingpass


I'm at a loss to explain why it can't parse those lines. I tried it with a tab, and with a space, etc..

Further on down in the output I have:

CODE
debug: using "/root/.spamassassin" for user state dir
debug: bayes: 20981 tie-ing to DB file R/O /root/.spamassassin/bayes_toks
debug: bayes: 20981 tie-ing to DB file R/O /root/.spamassassin/bayes_seen
debug: bayes: found bayes db version 2
debug: bayes: Not available for scanning, only 0 ham(s) in Bayes DB < 200


Going to post this and go checkout my /etc/mail/spamassassin/local.cf file some more since it seems to be the source of my trouble. ???

Adding this for info:

#spamassassin -V
SpamAssassin version 2.64
Catalyst
QUOTE
Originally posted by webexceed
Well I see this:

CODE
Failed to parse line in SpamAssassin configuration, skipping: bayes_store_module  Mail::SpamAssassin::BayesStore::SQL

Failed to parse line in SpamAssassin configuration, skipping: bayes_sql_dsn  DBI:mysql:spamassassin:localhost

Failed to parse line in SpamAssassin configuration, skipping: bayes_sql_username  spamassassin

Failed to parse line in SpamAssassin configuration, skipping: bayes_sql_password  myspamassassingpass


I'm at a loss to explain why it can't parse those lines.  I tried it with a tab, and with a space, etc..  


Ok, here's a thought ... How did you upgrade to 2.64? Ensim, Source, 3rd party RPM or up2date? I've been trying to figure out why it works for some (on 2.63+) but not for others... Must be the difference in distributions.
webexceed
QUOTE
Originally posted by Catalyst
Ok, here's a thought ... How did you upgrade to 2.64?  Ensim, Source, 3rd party RPM or up2date?  I've been trying to figure out why it works for some (on 2.63+) but not for others... Must be the difference in distributions.


I had followed the how-to here:

http://forum.ev1servers.net/showthread.php...&threadid=42806

That is from Source, I just substituted spamassassin 2.64 for the 2.63 described in those instructions.

Hmm, something I just noticed:

# rpm -q spamassassin
spamassassin-2.55-3.4

I don't think it would have any effect on my problem, but I'm certainly no expert. I guess when I installed from source it never removed the RPM entry??
sirrion
is there anyway to test this and make sure the bayes portion is working?
webexceed
Yes, you can go into phpMyAdmin and login as the spamassassin user (or root) and look in the database. If it's filling up it's working.

I upgraded to Spamassassin 3.02 last night and my bayes sql database suddenly started working. I never did get it going with 2.64.

I think I was missing some modules somewhere.
paslax
Just in case anyone hasn't figured this one out...

Step 1 should be

mkdir /var/lib/mysql/spamassassin
chown mysql:root /var/lib/mysql/spamassassin
Catalyst
QUOTE
Originally posted by paslax
Just in case anyone hasn't figured this one out...

Step 1 should be

mkdir /var/lib/mysql/spamassassin
chown mysql:root /var/lib/mysql/spamassassin


thpft. Thanks. ;-) I do that in the shell all the time, too. Worse yet, I sometimes f&$* up and a whole Directory structure gets owned by 755:root. *snicker*
paslax
FWIW, I have also had the parsing problems described above.

rpm -qa spamassassin:
spamassassin-2.64-1

spamassassin -V
SpamAssassin version 2.64

In the man page, I don't have settings for "bayes_*", but I do have similar settings for "user_scores_*". Not much help though...
hooter
@Catalyst,

I just had to come back and report what a great how-to/mod this is. I've only been kicking myself that I hadn't done this sooner;)

Now with centralized bayes repository, auto-training cron script, and "Report Spam" address for customers that allows them to merely re-direct a questionable email, my spam management has become a virtual hands-off task now.

Quick question though, in this config line:
CODE
bayes_expiry_max_db_size           150000
bayes_auto_expire                  1


Can I assume that 150000 is in MBs therefore = 1.5 GIGs?
And that the bayes_auto_expire means that once reaching that size, oldest entries will be overwritten by newest ones?

Thanks again Catalyst
Catalyst
QUOTE
Originally posted by hooter
@Catalyst,

I just had to come back and report what a great how-to/mod this is.  I've only been kicking myself that I hadn't done this sooner;)  

Now with centralized bayes repository, auto-training cron script, and "Report Spam" address for customers that allows them to merely re-direct a questionable email, my spam management has become a virtual hands-off task now.

Quick question though, in this config line:
CODE
bayes_expiry_max_db_size           150000

bayes_auto_expire                  1


Can I assume that 150000 is in MBs therefore = 1.5 GIGs?
And that the bayes_auto_expire means that once reaching that size, oldest entries will be overwritten by newest ones?


Thanks. :-)

No, actually, it's the number of ham/spam tokens it knows. I use it considerably higher, myself, but it's a good starting point for a busy server. Change according to your needs. :-)
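
As for what expiry does once the limit is hit: going by the Mail::SpamAssassin::Conf man page for 3.x, an expiry run keeps either 75% of bayes_expiry_max_db_size or 100,000 tokens, whichever is larger. A small sketch of that arithmetic; expiry_keep_target is a made-up helper name:

```shell
#!/bin/sh
# Number of tokens kept after an expiry run, following the rule in
# the Mail::SpamAssassin::Conf man page: 75% of
# bayes_expiry_max_db_size or 100000 tokens, whichever is larger.
# expiry_keep_target is a hypothetical helper name.
expiry_keep_target() {
    max=$1
    keep=$(( max * 75 / 100 ))
    if [ "$keep" -lt 100000 ]; then
        keep=100000
    fi
    printf '%s\n' "$keep"
}
```

So "expiry_keep_target 150000" prints 112500. To see how many tokens each user currently has, query the token_count column of the bayes_vars table created in step 2.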
hooter
Excellent! Thanks!
ozric
Nice and easy HowTO Catalyst. I found this thread from searching google for SA with SQL.

Walking thru your directions were clear and I got it working the first time. I noticed that every user (user@domain.tld) has their own Bayes bucket to play with. I don't think I want that for my server. I have 4 virtual domains on my server (vpopmail) and would like to have a Bayes per domain instead of user. It does no good when a user has 4 forwards/aliases that point to an actual account. That gives a total of 5 Bayes databases for 1 actual account.

Before the SQL set-up it was a site-wide Bayes set-up, now it's on the other end of the spectrum, per user. How can I set this up so that there is a Bayes bucket for each virtual domain within MySQL? vpopmail, qmail, clamd & simscan.

Thanks for your time.
Catalyst
QUOTE (ozric)
Walking thru your directions were clear and I got it working the first time.  I noticed that every user (user@domain.tld) has their own Bayes bucket to play with.  I don't think I want that for my server.  I have 4 virtual domains on my server (vpopmail) and would like to have a Bayes per domain instead of user.  It does no good when a user has 4 forwards/aliases that point to an actual account.  That gives a total of 5 Bayes databases for 1 actual account.

Before the SQL set-up it was a site-wide Bayes set-up, now it's on the other end of the spectrum, per user.  How can I set this up so that there is a Bayes bucket for each virtual domain within MySQL?  vpopmail, qmail, clamd & simscan.


Actually, none of that changed --- all this mod did was change the storage method from a Flatfile database to a SQL database. Now that you're able to SEE the structure of the Bayes database, kinda makes you wonder about the Bayes implementation, doesn't it? *grin*

I'd rather not go into it here, but take a look at the man page for Mail::SpamAssassin::Conf / bayes_sql_override_username and you'll find what you're looking for.
JoshFink
Hi.. Quick question, and hopefully someone can help.

I've read through all the posts but I'm not sure how to do what I want.

I have a domain that I set up on my server. I want to have everyone on the server forward all their spam there as attachments.

The path is

/home/virtual/site4/fst/var/spool/mail/fmadmin

If I have everyone forward all of their spam as attachments to this account and then do an

sa-learn --spam --mbox /home/virtual/site4/fst/var/spool/mail/fmadmin

This should work right?

Is there a way that I can set up the server so that any spam that is KNOWN spam (a high score, I would assume) would just get deleted and not even make it into a users mailbox?

Right now all the mail is getting tagged as [SPAM] but the user is still getting the email just now with the [SPAM] subject.

Thanks for the help, I'm still trying to understand all of this.

Josh
almartin
QUOTE (JamesC)
hmmm seems this does not work with Ensim 4.02.

I get these errors when I try  sa-learn

CODE
debug: Failed to parse line in SpamAssassin configuration, skipping: bayes_store_module      Mail::SpamAssassin::BayesStore::SQL

debug: Failed to parse line in SpamAssassin configuration, skipping: bayes_sql_dsn           DBI:mysql:spamassassin:localhost

debug: Failed to parse line in SpamAssassin configuration, skipping: bayes_sql_username      spamassassin

debug: Failed to parse line in SpamAssassin configuration, skipping: bayes_sql_password      spamassassin

debug: Failed to parse line in SpamAssassin configuration, skipping: bayes_auto_learn        1


I am running Ensim 4.0.3 with spamassassin 2.63 and have followed this thread's initial instructions on setting up a database for storing bayes scores.

I am getting these same errors when trying to run sa-learn. Where am I going wrong?

Does Ensim have the DBI perl module already installed? Would that be the problem?

Al
Catalyst
QUOTE (JoshFink)
This should work right?


Yep, will work fine.

QUOTE
Is there a way that I can set up the server so that any spam that is KNOWN spam (a high score, I would assume) would just get deleted and not even make it into a users mailbox?


Yeah, make a procmail rule. A little hint, tho ... Don't re-write the subject, and do the Procmail rule to search for ^X-Spam-Status: (Yes|YES|yes) ... Like ...

CODE
:0H
* ^X-Spam-Status: (Yes|YES|yes).*
! user@domain.tld


Just be sure you *don't* use that as a global rule on the domain.tld you're using it on, or you'll create an endless loop and the mail delivery will fail ... You'll have to modify that instance of /etc/procmailrc to drop it in the local folder you mentioned.
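
Before putting a rule like that live, it can be worth checking the pattern against a saved message from the shell. A minimal sketch; is_tagged_spam is a made-up helper that mimics the recipe's header-only match with sed and grep, it is not procmail itself:

```shell
#!/bin/sh
# Exit 0 if the message on stdin carries an X-Spam-Status header
# starting with Yes/YES/yes. Only the header block (everything up to
# the first empty line) is examined, which is what the H flag in the
# recipe asks procmail to do.
# is_tagged_spam is a hypothetical helper name.
is_tagged_spam() {
    sed '/^$/q' | grep -E '^X-Spam-Status: (Yes|YES|yes)' >/dev/null
}
```

For example: is_tagged_spam < test.message && echo "would be forwarded". A matching line in the body alone does not count, since the body is cut off before grep sees it.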
Catalyst
QUOTE (almartin)
I am running Ensim 4.0.3 with spamassassin 2.63 and have followed this thread's initial instructions on setting up a database for storing bayes scores.

I am getting these same errors when trying to run sa-learn. Where am I going wrong?

Does Ensim have the DBI perl module already installed? Would that be the problem?


3.0.2 is out for 4.0.3 --- use it.

Apparently, some versions of 2.6x had SQL and others didn't. It was removed from the source trees at spamassassin.org as well ... so ... do the upgrade from Ensim.
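
A quick check of which side of that line an install is on. This is a sketch that only looks at the major version number printed by "spamassassin -V", on the assumption (from the reports in this thread) that 3.x is where SQL Bayes works reliably; sa_major_ok is a made-up helper name:

```shell
#!/bin/sh
# Exit 0 if the "SpamAssassin version X.Y..." line on stdin reports
# major version 3 or later, the range this thread found reliable for
# SQL Bayes storage. sa_major_ok is a hypothetical helper name.
sa_major_ok() {
    major=$(sed -n 's/^SpamAssassin version \([0-9][0-9]*\)\..*/\1/p' | head -n 1)
    [ -n "$major" ] && [ "$major" -ge 3 ]
}
```

Usage: spamassassin -V | sa_major_ok || echo "upgrade first". If you installed from source, compare against "rpm -q spamassassin" as well, since the RPM database may still list an older package.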
almartin
I've upgraded to version 3.0 now and everything seemed to be fine.

I initially ran sa-learn on over 200 messages and everything worked.

Now however when I try running sa-learn it hangs on 'initialising learner'

Here is my /etc/mail/spamassassin/local.cf file

# These values can be overridden by editing ~/.spamassassin/user_prefs.cf
# (see spamassassin(1) for details)

# These should be safe assumptions and allow for simple visual sifting
# without risking lost emails.

required_hits 1
report_safe 0
rewrite_header Subject +++SPAM+++

# Enable the Bayes system
bayes_store_module Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn DBI:mysql:spamassassin:localhost
bayes_sql_username spamassassin
bayes_sql_password spamassassin
use_bayes 1
bayes_auto_learn 1

#White list
whitelist_from *@client17.email-bureau.co.uk
whitelist_from *@f-secure.com
etc ..........

debug of sa-learn

debug: SpamAssassin version 3.0.2
debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/usr/local/sbin', keeping.
debug: PATH included '/usr/local/bin', keeping.
debug: PATH included '/sbin', keeping.
debug: PATH included '/bin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/usr/X11R6/bin', keeping.
debug: PATH included '/root/bin', which doesn't exist, dropping.
debug: Final PATH set to: /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin
debug: using "/etc/mail/spamassassin/init.pre" for site rules init.pre
debug: config: read file /etc/mail/spamassassin/init.pre
debug: using "/usr/share/spamassassin" for default rules dir
debug: config: read file /usr/share/spamassassin/10_misc.cf
debug: config: read file /usr/share/spamassassin/20_anti_ratware.cf
debug: config: read file /usr/share/spamassassin/20_body_tests.cf
debug: config: read file /usr/share/spamassassin/20_compensate.cf
debug: config: read file /usr/share/spamassassin/20_dnsbl_tests.cf
debug: config: read file /usr/share/spamassassin/20_drugs.cf
debug: config: read file /usr/share/spamassassin/20_fake_helo_tests.cf
debug: config: read file /usr/share/spamassassin/20_head_tests.cf
debug: config: read file /usr/share/spamassassin/20_html_tests.cf
debug: config: read file /usr/share/spamassassin/20_meta_tests.cf
debug: config: read file /usr/share/spamassassin/20_phrases.cf
debug: config: read file /usr/share/spamassassin/20_porn.cf
debug: config: read file /usr/share/spamassassin/20_ratware.cf
debug: config: read file /usr/share/spamassassin/20_uri_tests.cf
debug: config: read file /usr/share/spamassassin/23_bayes.cf
debug: config: read file /usr/share/spamassassin/25_body_tests_es.cf
debug: config: read file /usr/share/spamassassin/25_hashcash.cf
debug: config: read file /usr/share/spamassassin/25_spf.cf
debug: config: read file /usr/share/spamassassin/25_uribl.cf
debug: config: read file /usr/share/spamassassin/30_text_de.cf
debug: config: read file /usr/share/spamassassin/30_text_fr.cf
debug: config: read file /usr/share/spamassassin/30_text_nl.cf
debug: config: read file /usr/share/spamassassin/30_text_pl.cf
debug: config: read file /usr/share/spamassassin/50_scores.cf
debug: config: read file /usr/share/spamassassin/60_whitelist.cf
debug: using "/etc/mail/spamassassin" for site rules dir
debug: config: read file /etc/mail/spamassassin/local.cf
debug: using "/root/.spamassassin/user_prefs" for user prefs file
debug: plugin: loading Mail::SpamAssassin::Plugin::URIDNSBL from @INC
debug: plugin: registered Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x9687794)
debug: plugin: loading Mail::SpamAssassin::Plugin::Hashcash from @INC
debug: plugin: registered Mail::SpamAssassin::Plugin::Hashcash=HASH(0x9d5e4e4)
debug: plugin: loading Mail::SpamAssassin::Plugin::SPF from @INC
debug: plugin: registered Mail::SpamAssassin::Plugin::SPF=HASH(0x9d2d838)
debug: plugin: Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x9687794) implements 'parse_config'
debug: plugin: Mail::SpamAssassin::Plugin::Hashcash=HASH(0x9d5e4e4) implements 'parse_config'
debug: bayes: Using username: root
debug: bayes: Database connection established
debug: bayes: found bayes db version 3
debug: bayes: Using userid: 2
debug: bayes: Not available for scanning, only 1 spam(s) in Bayes DB < 200
debug: Score set 0 chosen.
debug: Initialising learner

***point of hang***
almartin
I added --showdots to the command and sa-learn no longer hangs.

Something else I just realised: if you don't add --mbox when learning from a mail account's spool file, it only learns the first message.

Seems to be working fab now! :)
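
To put the two flags together, here is a minimal sketch that counts the messages in an mbox file before handing it to sa-learn, so you can tell whether more than one message was actually picked up. The helper count_mbox_messages and the MBOX variable are made up for this example, and the count is only approximate since it simply counts "From " separator lines.

```shell
#!/bin/sh
# Hypothetical helper: count messages in an mbox file by counting the
# "From " separator lines that start each message.
count_mbox_messages() {
    grep -c '^From ' "$1"
}

# MBOX is a placeholder; point it at a real spool file before running.
MBOX=${MBOX:-/path/to/spool/file}

if [ -r "$MBOX" ] && command -v sa-learn >/dev/null 2>&1; then
    echo "Messages in $MBOX: $(count_mbox_messages "$MBOX")"
    # --mbox: treat the file as an mbox holding many messages
    # --showdots: print progress dots while learning
    sa-learn --spam --mbox --showdots "$MBOX"
    # Dump the Bayes counters (nspam/nham) to confirm they went up.
    sa-learn --dump magic
fi
```

The debug output above shows Bayes is not used for scanning while the spam count is below 200, so the counters from --dump magic are worth watching while you train.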
CompuKid101
I would like to know how I could get my SA/MS setup back to how it was after following the SA/MS/FProt HOWTO for 3.1. I want to set a required score of 6 in MailScanner.conf, but it gets overridden by the SpamAssassin conf file in /home/virtual/FILE... I liked how MS controlled SA completely. And what is with the default required score of 1 in Ensim 4?! When I was on 3.1, I set it to 7.5 in MS.conf and it was fine! Did MS scale it or something?
CompuKid101
QUOTE (JoshFink)
Hi.. Quick question, and hopefully someone can help.

I've read through all the posts but I'm not sure how to do what I want.

I have a domain that I set up on my server. I want to have everyone on the server forward all their spam there as attachments.

The path is

/home/virtual/site4/fst/var/spool/mail/fmadmin

If I have everyone forward all of their attachments to this account and then do an

sa-learn --spam --mbox /home/virtual/site4/fst/var/spool/mail/fmadmin

This should work right?

Careful!!! Forwarding the mail could change the headers and increase the chances of ham being marked as spam if SA learns that those headers are associated with "spam".
Catalyst
QUOTE (CompuKid101)
Careful!!! Forwarding the mail could change the headers and increase the chances of ham being marked as spam if SA learns that those headers are associated with "spam".


No it won't. SA doesn't learn header information (other than the subject) as a Bayes token.
CompuKid101
QUOTE (Catalyst)
No it won't. SA doesn't learn header information (other than the subject) as a Bayes token.

Like I said, it would work great! ;)