/*
* $Id: spamdbm.cpp 30630 2009-05-05 01:31:01Z bga $
*
* This is a BeOS program for classifying e-mail messages as spam (unwanted
* junk mail) or as genuine mail using a Bayesian statistical approach. There
* is also a Mail Daemon Replacement add-on to filter mail using the
* classification statistics collected earlier.
*
* See also http://www.paulgraham.com/spam.html for a good writeup and
* http://www.tuxedo.org/~esr/bogofilter/ for another implementation.
* And more recently, Gary Robinson's write up of his improved algorithm
* at http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html
* which gives a better spread in spam ratios and slightly fewer
* misclassifications.
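*
* As a rough illustration of the kind of calculation Robinson describes (a
* sketch only, not this program's code), the block below smooths a per-word
* spam ratio toward a prior and then combines the word ratios with geometric
* means. The function names SmoothedWordRatio and CombineRobinson are
* hypothetical, and the parameters s (strength of the prior) and x (assumed
* ratio for an unseen word) follow the names in Robinson's write up; the
* constants g_RobinsonS and g_RobinsonX defined later in this file presumably
* play those roles. It assumes every ratio is strictly between 0 and 1, which
* the smoothing step guarantees when s is positive and x is between 0 and 1.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Robinson's smoothed per-word ratio: pulls the raw ratio toward the prior x
// when the word has only been seen in a few messages. n is the number of
// messages containing the word, s is the weight given to the prior.
double SmoothedWordRatio(double rawRatio, double n, double s, double x)
{
  return (s * x + n * rawRatio) / (s + n);
}

// Combine per-word ratios into one score between 0 (genuine) and 1 (spam).
// Geometric means are computed through logarithms to avoid underflow when
// a message has many words.
double CombineRobinson(const std::vector<double> &ratios)
{
  assert(!ratios.empty());
  double logOneMinus = 0.0;  // Sum of log(1 - f) over all words.
  double logRatio = 0.0;     // Sum of log(f) over all words.
  for (double f : ratios) {
    logOneMinus += std::log(1.0 - f);
    logRatio += std::log(f);
  }
  const double count = static_cast<double>(ratios.size());
  const double spamminess = 1.0 - std::exp(logOneMinus / count);
  const double genuineness = 1.0 - std::exp(logRatio / count);
  const double balance =
    (spamminess - genuineness) / (spamminess + genuineness);
  return (1.0 + balance) / 2.0;
}
```

* With this formulation a message whose words all sit at 0.5 scores 0.5, and
* the score moves toward 1.0 as the word ratios move toward spam.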
*
* Note that this uses the AGMS vacation coding style, not the OpenTracker one.
* That means no tabs, indents are two spaces, m_ is the prefix for member
* variables, g_ is the prefix for global names, C style comments, constants
* are in all capital letters and most other things are mixed case, it's word
* wrapped to fit in 79 characters per line to make proofreading on paper
* easier, and functions are listed in reverse dependency order so that forward
* declarations (function prototypes with no code) aren't needed.
*
* The Original Design:
* There is a spam database (just a file listing words and number of times they
* were used in spam and non-spam messages) that a BeMailDaemon input filter
* will use when scanning email. It will mark the mail with the spam
* probability (an attribute, optionally a mail header field) and optionally do
* something if the probability exceeds a user defined level (delete message,
* change subject, file in a different folder). Or should that be a different
* filter? Outside the mail system, the probability can be used in queries to
* find spam.
*
* A second user application will be used to update the database. Besides
* showing you the current list of words, you can drag and drop files to mark
* them as spam or non-spam (a balanced binary tree is used internally to make
* word storage fast). It will add a second attribute to the files to show how
* they have been classified by the user (and won't update the database if you
* accidentally try to classify a file again). Besides drag and drop, there
* will be a command line interface and a message passing interface. BeMail
* (or other programs) will then communicate via messages to tell it when the
* user marks a message as spam or not (via having separate delete spam /
* delete genuine mail buttons and a menu item or two).
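*
* The word database idea above can be sketched roughly as follows. This is
* an illustration under assumptions, not the program's actual data
* structure: the names WordCounts, WordDatabase, AddMessage and
* RemoveMessage are hypothetical, and std::map stands in for the balanced
* binary tree (it is an ordered container that is typically implemented as
* one).

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical per-word statistics: how many spam and how many genuine
// messages contained the word.
struct WordCounts
{
  unsigned int spamCount = 0;
  unsigned int genuineCount = 0;
};

typedef std::map<std::string, WordCounts> WordDatabase;

// Add the distinct words of one message to the statistics for its class.
void AddMessage(WordDatabase &database, const std::set<std::string> &words,
  bool isSpam)
{
  for (const std::string &word : words) {
    WordCounts &counts = database[word];  // Creates a zeroed entry if new.
    if (isSpam)
      counts.spamCount++;
    else
      counts.genuineCount++;
  }
}

// Undo an earlier AddMessage, for example when the user reclassifies a
// message. Words that were never recorded are skipped.
void RemoveMessage(WordDatabase &database,
  const std::set<std::string> &words, bool wasSpam)
{
  for (const std::string &word : words) {
    WordDatabase::iterator found = database.find(word);
    if (found == database.end())
      continue;
    unsigned int &count =
      wasSpam ? found->second.spamCount : found->second.genuineCount;
    if (count > 0)
      count--;
  }
}
```

* Passing a set of distinct words means each message counts a word at most
* once, which matches the "number of messages containing the word" statistic
* that the scoring methods work from.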
*
* Plus lots of details, like the rename swap method to update the database
* file (so programs with the old file open aren't affected). A nice tab text
* format so you can open the database in a spreadsheet. Startup and shutdown
* control of the updater from BeMail. Automatic creation of the indices
* needed by the filter. MIME types for the database file. Icons for the app.
* System settings to enable tracker to display the new attributes when viewing
* e-mail (and maybe news articles if someone ever gets around to an NNTP as
* files reader). Documentation. Recursive directory traversal for the
* command line or directory drag and drop. Options for the updater to warn or
* ignore non-email files. Etc.
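*
* The rename swap method mentioned above can be sketched as below: write the
* new contents to a temporary file next to the database, then rename it over
* the old file. This is a minimal sketch, not the program's code. The
* function name ReplaceFileByRename and the ".new" suffix are hypothetical,
* and it assumes POSIX style rename() semantics, where renaming onto an
* existing file replaces it and programs that still have the old file open
* keep seeing the old contents.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Write newContents to a temporary file beside the target, then rename the
// temporary file over the target. Returns true on success.
bool ReplaceFileByRename(const std::string &path,
  const std::string &newContents)
{
  const std::string tempPath = path + ".new";
  {
    std::ofstream output(tempPath.c_str(),
      std::ios::binary | std::ios::trunc);
    if (!output)
      return false;
    output << newContents;
    output.flush();
    if (!output) {
      output.close();
      std::remove(tempPath.c_str());
      return false;
    }
  }  // The stream is closed here, before the rename.
  if (std::rename(tempPath.c_str(), path.c_str()) != 0) {
    std::remove(tempPath.c_str());
    return false;
  }
  return true;
}
```

* Because the target is only touched by the final rename, a failure while
* writing leaves the existing database file as it was.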
*
* The Actual Implementation:
* The spam database updates and the test for spam have been combined into one
* program which runs as a server. That way there won't be as long a delay
* when the e-mail system wants to check for spam, because the database is
* already loaded by the server and in memory. The MDR mail filter add-on
* simply sends scripting commands to the server (and starts it up if it isn't
* already running). The filter takes care of marking the messages when it
* gets the rating back from the server, and then the rest of the mail system
* rule chain can delete the message or otherwise manipulate it.
*
* Revision History (now manually updated due to SVN's philosophy)
* $Log: spamdbm.cpp,v $
* ------------------------------------------------------------------------
* r15195 | agmsmith | 2005-11-27 21:07:55 -0500 (Sun, 27 Nov 2005) | 4 lines
* Just a few minutes after checking in, I mentioned it to Japanese expert Koki
* and he suggested also including the Japanese comma. So before I forget to
* do it...
*
* ------------------------------------------------------------------------
* r15194 | agmsmith | 2005-11-27 20:37:13 -0500 (Sun, 27 Nov 2005) | 5 lines
* Truncate overly long URLs to the maximum word length. Convert Japanese
* periods to spaces so that more "words" are found. Fix UTF-8 comparison
* problems with tolower() incorrectly converting characters with the high bit
* set.
*
* r15098 | agmsmith | 2005-11-23 23:17:00 -0500 (Wed, 23 Nov 2005) | 5 lines
* Added better tokenization so that HTML is parsed and things like tags
* between letters of a word no longer hide that word. After testing, the
* result seems to be a tighter spread of ratings when done in full text plus
* header mode.
*
* Revision 1.10 2005/11/24 02:08:39 agmsmith
* Fixed up prefix codes, Z for things that are inside other things.
*
* Revision 1.9 2005/11/21 03:28:03 agmsmith
* Added a function for extracting URLs.
*
* Revision 1.8 2005/11/09 03:36:18 agmsmith
* Removed noframes detection (doesn't show up in e-mails). Now use
* just H for headers and Z for HTML tag junk.
*
* Revision 1.7 2005/10/24 00:00:08 agmsmith
* Adding HTML tag removal, which also affected the search function so it
* could search for single part things like .
*
* Revision 1.6 2005/10/17 01:55:08 agmsmith
* Remove HTML comments and a few other similar things.
*
* Revision 1.5 2005/10/16 18:35:36 agmsmith
* Under construction - looking into HTML not being in UTF-8.
*
* Revision 1.4 2005/10/11 01:51:21 agmsmith
* Starting on the tokenising passes. Still need to test Asian truncation.
*
* Revision 1.3 2005/10/06 11:54:07 agmsmith
* Not much.
*
* Revision 1.2 2005/09/12 01:49:37 agmsmith
* Enable case folding for the whole file tokenizer.
*
* r13961 | agmsmith | 2005-08-13 22:25:28 -0400 (Sat, 13 Aug 2005) | 2 lines
* Source code changes so that mboxtobemail now compiles and is in the build
* system.
*
* r13959 | agmsmith | 2005-08-13 22:05:27 -0400 (Sat, 13 Aug 2005) | 2 lines
* Rename the directory before doing anything else, otherwise svn dies badly.
*
* r13952 | agmsmith | 2005-08-13 15:31:42 -0400 (Sat, 13 Aug 2005) | 3 lines
* Added the resources and file type associations, changed the application
* signature and otherwise made the spam detection system work properly again.
*
* r13951 | agmsmith | 2005-08-13 11:40:01 -0400 (Sat, 13 Aug 2005) | 2 lines
* Had to do the file rename as a separate operation due to SVN limitations.
*
* r13950 | agmsmith | 2005-08-13 11:38:44 -0400 (Sat, 13 Aug 2005) | 3 lines
* Oops, "spamdb" is already used for a Unix package. And spamdatabase is
* already reserved by a domain name squatter. Use "spamdbm" instead.
*
* r13949 | agmsmith | 2005-08-13 11:17:52 -0400 (Sat, 13 Aug 2005) | 3 lines
* Renamed spamfilter to be the more meaningful spamdb (spam database) and
* moved it into its own source directory in preparation for adding resources.
*
* r13628 | agmsmith | 2005-07-10 20:11:29 -0400 (Sun, 10 Jul 2005) | 3 lines
* Updated keyword expansion to use SVN keywords. Also seeing if svn is
* working well enough for me to update files from BeOS R5.
*
* r11909 | axeld | 2005-03-18 19:09:19 -0500 (Fri, 18 Mar 2005) | 2 lines
* Moved bin/ directory out of apps/.
*
* r11769 | bonefish | 2005-03-17 03:30:54 -0500 (Thu, 17 Mar 2005) | 1 line
* Move trunk into respective module.
*
* r10362 | nwhitehorn | 2004-12-06 20:14:05 -0500 (Mon, 06 Dec 2004) | 2 lines
* Fixed the spam filter so it works correctly now.
*
* r9934 | nwhitehorn | 2004-11-11 21:55:05 -0500 (Thu, 11 Nov 2004) | 2 lines
* Added AGMS's excellent spam detection software. Still some weirdness with
* the configuration interface from E-mail prefs.
*
* Revision 1.2 2004/12/07 01:14:05 nwhitehorn
* Fixed the spam filter so it works correctly now.
*
* Revision 1.87 2004/09/20 15:57:26 nwhitehorn
* Mostly updated the tree to Be/Haiku style identifier naming conventions. I
* have a few more things to work out, mostly in mail_util.h, and then I'm
* proceeding to jamify the build system. Then we go into Haiku CVS.
*
* Revision 1.86 2003/07/26 16:47:46 agmsmith
* Bug - wasn't allowing double classification if the user had turned on
* the option to ignore the previous classification.
*
* Revision 1.85 2003/07/08 14:52:57 agmsmith
* Fix bug with classification choices dialog box coming up with weird
* sizes due to RefsReceived message coming in before ReadyToRun had
* finished setting up the default sizes of the controls.
*
* Revision 1.84 2003/07/04 19:59:29 agmsmith
* Now with a GUI option to let you declassify messages (set them back
* to uncertain, rather than spam or genuine). Required a BAlert
* replacement since BAlerts can't do four buttons.
*
* Revision 1.83 2003/07/03 20:40:36 agmsmith
* Added Uncertain option for declassifying messages.
*
* Revision 1.82 2003/06/16 14:57:13 agmsmith
* Detect spam which uses mislabeled text attachments, going by the file name
* extension.
*
* Revision 1.81 2003/04/08 20:27:04 agmsmith
* AGMSBayesianSpamServer now shuts down immediately and returns true if
* it is asked to quit by the registrar.
*
* Revision 1.80 2003/04/07 19:20:27 agmsmith
* Ooops, int64 doesn't exist, use long long instead.
*
* Revision 1.79 2003/04/07 19:05:22 agmsmith
* Now with Allen Brunson's atoll for PPC (you need the %lld, but that
* becomes %lld on other systems).
*
* Revision 1.78 2003/04/04 22:43:53 agmsmith
* Fixed up atoll PPC processor hack so it would actually work, was just
* returning zero which meant that it wouldn't load in the database file
* (read the size as zero).
*
* Revision 1.77 2003/01/22 03:19:48 agmsmith
* Don't convert words to lower case, the case is important for spam.
* Particularly sentences which start with exciting words, which you
* normally won't use at the start of a sentence (and thus capitalize).
*
* Revision 1.76 2002/12/18 02:29:22 agmsmith
* Add space for the Uncertain display in Tracker.
*
* Revision 1.75 2002/12/18 01:54:37 agmsmith
* Added uncertain sound effect.
*
* Revision 1.74 2002/12/13 23:53:12 agmsmith
* Minimize the window before opening it so that it doesn't flash on the
* screen in server mode. Also load the database when the window is
* displayed so that the user can see the words.
*
* Revision 1.73 2002/12/13 20:55:57 agmsmith
* Documentation.
*
* Revision 1.72 2002/12/13 20:26:11 agmsmith
* Fixed bug with adding messages in strings to database (was limited to
* messages at most 1K long). Also changed default server mode to true
* since that's what people use most.
*
* Revision 1.71 2002/12/11 22:37:30 agmsmith
* Added commands to train on spam and genuine e-mail messages passed
* in string arguments rather than via external files.
*
* Revision 1.70 2002/12/10 22:12:41 agmsmith
* Adding a message to the database now uses a BPositionIO rather than a
* file and file name (for future string rather than file additions). Also
* now re-evaluate a file after reclassifying it so that the user can see
* the new ratio. Also remove the [Spam 99.9%] subject prefix when doing
* a re-evaluation or classification (the number would be wrong).
*
* Revision 1.69 2002/12/10 01:46:04 agmsmith
* Added the Chi-Squared scoring method.
*
* Revision 1.68 2002/11/29 22:08:25 agmsmith
* Change default purge age to 2000 so that hitting the purge button
* doesn't erase stuff from the new sample database.
*
* Revision 1.67 2002/11/25 20:39:39 agmsmith
* Don't need to massage the MIME type since the mail library now does
* the lower case conversion and converts TEXT to text/plain too.
*
* Revision 1.66 2002/11/20 22:57:12 nwhitehorn
* PPC Compatibility Fixes
*
* Revision 1.65 2002/11/10 18:43:55 agmsmith
* Added a time delay to some quitting operations so that scripting commands
* from a second client (like a second e-mail account) will make the program
* abort the quit operation.
*
* Revision 1.64 2002/11/05 18:05:16 agmsmith
* Looked at Nathan's PPC changes (thanks!), modified style a bit.
*
* Revision 1.63 2002/11/04 03:30:22 nwhitehorn
* Now works (or compiles at least) on PowerPC. I'll get around to testing it
* later.
*
* Revision 1.62 2002/11/04 01:03:33 agmsmith
* Fixed warnings so it compiles under the bemaildaemon system.
*
* Revision 1.61 2002/11/03 23:00:37 agmsmith
* Added to the bemaildaemon project on SourceForge. Hmmmm, seems to switch to
* a new version if I commit and specify a message, but doesn't accept the
* message and puts up the text editor. Must be a bug where cvs eats the first
* option after "commit".
*
* Revision 1.60.1.1 2002/10/22 14:29:27 agmsmith
* Needed to recompile with the original Libmail.so from Beta/1 since
* the current library uses a different constructor, and thus wouldn't
* run when used with the old library.
*
* Revision 1.60 2002/10/21 16:41:27 agmsmith
* Return a special error code when no words are found in a message,
* so that messages without text/plain parts can be recognized as
* spam by the mail filter.
*
* Revision 1.59 2002/10/20 21:29:47 agmsmith
* Watch out for MIME types of "text", treat as text/plain.
*
* Revision 1.58 2002/10/20 18:29:07 agmsmith
* *** empty log message ***
*
* Revision 1.57 2002/10/20 18:25:02 agmsmith
* Fix case sensitivity in MIME type tests, and fix text/any test.
*
* Revision 1.56 2002/10/19 17:00:10 agmsmith
* Added the pop-up menu for the tokenize modes.
*
* Revision 1.55 2002/10/19 14:54:06 agmsmith
* Fudge MIME type of body text components so that they get
* treated as text.
*
* Revision 1.54 2002/10/19 00:56:37 agmsmith
* The parsing of e-mail messages seems to be working now, just need
* to add some user interface stuff for the tokenizing mode.
*
* Revision 1.53 2002/10/18 23:37:56 agmsmith
* More mail kit usage, can now decode headers, but more to do.
*
* Revision 1.52 2002/10/16 23:52:33 agmsmith
* Getting ready to add more tokenizing modes, exploring Mail Kit to break
* apart messages into components (and decode BASE64 and other encodings).
*
* Revision 1.51 2002/10/11 20:05:31 agmsmith
* Added installation of sound effect names, which the filter will use.
*
* Revision 1.50 2002/10/02 16:50:02 agmsmith
* Forgot to add credits to the algorithm inventors.
*
* Revision 1.49 2002/10/01 00:39:29 agmsmith
* Added drag and drop to evaluate files or to add them to the list.
*
* Revision 1.48 2002/09/30 19:44:17 agmsmith
* Switched to Gary Robinson's method, removed max spam/genuine word.
*
* Revision 1.47 2002/09/23 17:08:55 agmsmith
* Add an attribute with the spam ratio to files which have been evaluated.
*
* Revision 1.46 2002/09/23 02:50:32 agmsmith
* Fiddling with display width of e-mail attributes.
*
* Revision 1.45 2002/09/23 01:13:56 agmsmith
* Oops, bug in string evaluation scripting.
*
* Revision 1.44 2002/09/22 21:00:55 agmsmith
* Added EvaluateString so that the BeMail add-on can pass the info without
* having to create a temporary file.
*
* Revision 1.43 2002/09/20 19:56:02 agmsmith
* Added about box and button for estimating the spam ratio of a file.
*
* Revision 1.42 2002/09/20 01:22:26 agmsmith
* More testing, decide that an extreme ratio bias point of 0.5 is good.
*
* Revision 1.41 2002/09/19 21:17:12 agmsmith
* Changed a few names and proofread the program.
*
* Revision 1.40 2002/09/19 14:27:17 agmsmith
* Rearranged execution of commands, moving them to a separate looper
* rather than the BApplication, so that thousands of files could be
* processed without worrying about the message queue filling up.
*
* Revision 1.39 2002/09/18 18:47:16 agmsmith
* Stop flickering when the view is partially obscured, update cached
* values in all situations except when app is busy.
*
* Revision 1.38 2002/09/18 18:08:11 agmsmith
* Add a function for evaluating the spam ratio of a message.
*
* Revision 1.37 2002/09/16 01:30:16 agmsmith
* Added Get Oldest command.
*
* Revision 1.36 2002/09/16 00:47:52 agmsmith
* Change the display to counter-weigh the spam ratio by the number of
* messages.
*
* Revision 1.35 2002/09/15 20:49:35 agmsmith
* Scrolling improved, buttons, keys and mouse wheel added.
*
* Revision 1.34 2002/09/15 03:46:10 agmsmith
* Up and down buttons under construction.
*
* Revision 1.33 2002/09/15 02:09:21 agmsmith
* Took out scroll bar.
*
* Revision 1.32 2002/09/15 02:05:30 agmsmith
* Trying to add a scroll bar, but it isn't very useful.
*
* Revision 1.31 2002/09/14 23:06:28 agmsmith
* Now has live updates of the list of words.
*
* Revision 1.30 2002/09/14 19:53:11 agmsmith
* Now with a better display of the words.
*
* Revision 1.29 2002/09/13 21:33:54 agmsmith
* Now draws the words in the word display view, but still primitive.
*
* Revision 1.28 2002/09/13 19:28:02 agmsmith
* Added display of most genuine and most spamiest, fixed up cursor.
*
* Revision 1.27 2002/09/13 03:08:42 agmsmith
* Show current word and message counts, and a busy cursor.
*
* Revision 1.26 2002/09/13 00:00:08 agmsmith
* Fixed up some deadlock problems, now using asynchronous message replies.
*
* Revision 1.25 2002/09/12 17:56:58 agmsmith
* Keep track of words which are spamiest and genuinest.
*
* Revision 1.24 2002/09/12 01:57:10 agmsmith
* Added server mode.
*
* Revision 1.23 2002/09/11 23:30:45 agmsmith
* Added Purge button and ignore classification checkbox.
*
* Revision 1.22 2002/09/11 21:23:13 agmsmith
* Added bulk update choice, purge button, moved to a BView container
* for all the controls (so background colour could be set, and Pulse
* works normally for it too).
*
* Revision 1.21 2002/09/10 22:52:49 agmsmith
* You can now change the database name in the GUI.
*
* Revision 1.20 2002/09/09 14:20:42 agmsmith
* Now can have multiple backups, and implemented refs received.
*
* Revision 1.19 2002/09/07 19:14:56 agmsmith
* Added standard GUI measurement code.
*
* Revision 1.18 2002/09/06 21:03:03 agmsmith
* Rearranging code to avoid forward references when adding a window class.
*
* Revision 1.17 2002/09/06 02:54:00 agmsmith
* Added the ability to purge old words from the database.
*
* Revision 1.16 2002/09/05 00:46:03 agmsmith
* Now adds spam to the database!
*
* Revision 1.15 2002/09/04 20:32:15 agmsmith
* Read ahead a couple of letters to decode quoted-printable better.
*
* Revision 1.14 2002/09/04 03:10:03 agmsmith
* Can now tokenize (break into words) a text file.
*
* Revision 1.13 2002/09/03 21:50:54 agmsmith
* Count database command, set up MIME type for the database file.
*
* Revision 1.12 2002/09/03 19:55:54 agmsmith
* Added loading and saving the database.
*
* Revision 1.11 2002/09/02 03:35:33 agmsmith
* Create indices and set up attribute associations with the e-mail MIME type.
*
* Revision 1.10 2002/09/01 15:52:49 agmsmith
* Can now delete the database.
*
* Revision 1.9 2002/08/31 21:55:32 agmsmith
* Yet more scripting.
*
* Revision 1.8 2002/08/31 21:41:37 agmsmith
* Under construction, with example code to decode a B_REPLY.
*
* Revision 1.7 2002/08/30 19:29:06 agmsmith
* Combined loading and saving settings into one function.
*
* Revision 1.6 2002/08/30 02:01:10 agmsmith
* Working on loading and saving settings.
*
* Revision 1.5 2002/08/29 23:17:42 agmsmith
* More scripting.
*
* Revision 1.4 2002/08/28 00:40:52 agmsmith
* Scripting now seems to work, at least the messages flow properly.
*
* Revision 1.3 2002/08/25 21:51:44 agmsmith
* Getting the about text formatting right.
*
* Revision 1.2 2002/08/25 21:28:20 agmsmith
* Trying out the BeOS scripting system as a way of implementing the program.
*
* Revision 1.1 2002/08/24 02:27:51 agmsmith
* Initial revision
*/
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <iostream>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>
using namespace std;
#include <Alert.h>
#include <Application.h>
#include <Beep.h>
#include <Button.h>
#include <CheckBox.h>
#include <Cursor.h>
#include <Directory.h>
#include <Entry.h>
#include <File.h>
#include <FilePanel.h>
#include <FindDirectory.h>
#include <fs_index.h>
#include <fs_info.h>
#include <MenuBar.h>
#include <MenuItem.h>
#include <Message.h>
#include <MessageQueue.h>
#include <MessageRunner.h>
#include <Mime.h>
#include <NodeInfo.h>
#include <Path.h>
#include <Picture.h>
#include <PictureButton.h>
#include <Point.h>
#include <Polygon.h>
#include <PopUpMenu.h>
#include <PropertyInfo.h>
#include <RadioButton.h>
#include <Resources.h>
#include <Screen.h>
#include <ScrollBar.h>
#include <String.h>
#include <StringView.h>
#include <TextControl.h>
#include <View.h>
/* Mail Kit headers from the Mail Daemon Replacement (MDR) source
directory, available from http://sourceforge.net/projects/bemaildaemon/ */
#include <MailMessage.h>
#include <MailAttachment.h>
/*
* Global variables, and not-so-variable things too. Grouped by functionality.
*/
static float g_MarginBetweenControls;
static float g_LineOfTextHeight;
static float g_StringViewHeight;
static float g_ButtonHeight;
static float g_CheckBoxHeight;
static float g_RadioButtonHeight;
static float g_PopUpMenuHeight;
static float g_TextBoxHeight;
static const char *g_ABSAppSignature =
"application/x-vnd.agmsmith.spamdbm";
static const char *g_ABSDatabaseFileMIMEType =
"text/x-vnd.agmsmith.spam_probability_database";
static const char *g_DefaultDatabaseFileName =
"SpamDBM Database";
static const char *g_DatabaseRecognitionString =
"Spam Database File";
static const char *g_AttributeNameClassification = "MAIL:classification";
static const char *g_AttributeNameSpamRatio = "MAIL:ratio_spam";
static const char *g_BeepGenuine = "SpamFilter-Genuine";
static const char *g_BeepSpam = "SpamFilter-Spam";
static const char *g_BeepUncertain = "SpamFilter-Uncertain";
static const char *g_ClassifiedSpam = "Spam";
static const char *g_ClassifiedGenuine = "Genuine";
static const char *g_DataName = "data";
static const char *g_ResultName = "result";
static const char *g_SettingsDirectoryName = "Mail";
static const char *g_SettingsFileName = "SpamDBM Settings";
static const uint32 g_SettingsWhatCode = 'SDBM';
static const char *g_BackupSuffix = ".backup %d";
static const int g_MaxBackups = 10;
static const size_t g_MaxWordLength = 50;
static const int g_MaxInterestingWords = 150;
static const double g_RobinsonS = 0.45;
static const double g_RobinsonX = 0.5;
static bool g_CommandLineMode;
/* TRUE if the program was started from the command line (and thus should
exit after processing the command), FALSE if it is running with a graphical
user interface. */
static bool g_ServerMode;
/* When TRUE the program runs in server mode: error messages don't result in
pop-up dialog boxes, but you can still see them in stderr. Also the window
is minimized, if it exists. */
static int g_QuitCountdown = -1;
/* Number of pulse events to count down before the program quits. Negative
means stop counting. Zero means quit at the next pulse event. This is used
to keep the program alive for a short while after someone requests that it
quit, in case more scripting commands come in, which will stop the
countdown. Needed to handle the case where there are multiple e-mail
accounts all requesting spam identification, and one finishes first and
tells the server to quit. It also checks to see that there is no more work
to do before trying to quit. */
static volatile bool g_AppReadyToRunCompleted = false;
/* Messages such as RefsReceived can arrive before ReadyToRun has finished,
which can lead to initialisation problems (button heights not determined).
So wait for this to turn TRUE in code that might run early, like
RefsReceived. */
static class CommanderLooper *g_CommanderLooperPntr = NULL;
static BMessenger *g_CommanderMessenger = NULL;
/* A separate looper, and a messenger for reaching it, that handles commands
(arguments received, file references received), needed for avoiding deadlocks
which would happen if the BApplication sent a scripting message to itself. */
static BCursor *g_BusyCursor = NULL;
/* The busy cursor, set up during application startup. */
typedef enum PropertyNumbersEnum
{
PN_DATABASE_FILE = 0,
PN_SPAM,
PN_SPAM_STRING,
PN_GENUINE,
PN_GENUINE_STRING,
PN_UNCERTAIN,
PN_IGNORE_PREVIOUS_CLASSIFICATION,
PN_SERVER_MODE,
PN_FLUSH,
PN_PURGE_AGE,
PN_PURGE_POPULARITY,
PN_PURGE,
PN_OLDEST,
PN_EVALUATE,
PN_EVALUATE_STRING,
PN_RESET_TO_DEFAULTS,
PN_INSTALL_THINGS,
PN_TOKENIZE_MODE,
PN_SCORING_MODE,
PN_MAX
} PropertyNumbers;
static const char * g_PropertyNames [PN_MAX] =
{
"DatabaseFile",
"Spam",
"SpamString",
"Genuine",
"GenuineString",
"Uncertain",
"IgnorePreviousClassification",
"ServerMode",
"Flush",
"PurgeAge",
"PurgePopularity",
"Purge",
"Oldest",
"Evaluate",
"EvaluateString",
"ResetToDefaults",
"InstallThings",
"TokenizeMode",
"ScoringMode"
};
/* The scripting properties this program handles, listed in a format that the
scripting system can understand too. */
static struct property_info g_ScriptingPropertyList [] =
{
{g_PropertyNames[PN_DATABASE_FILE], {B_GET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Get the pathname of the current database file. "
"The default name is something like B_USER_SETTINGS_DIRECTORY / "
"Mail / SpamDBM Database", PN_DATABASE_FILE,
{}, {}, {}},
{g_PropertyNames[PN_DATABASE_FILE], {B_SET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Change the pathname of the database file to "
"use. It will automatically be converted to an absolute path name, "
"so make sure the parent directories exist before setting it. If it "
"doesn't exist, you'll have to use the create command next.",
PN_DATABASE_FILE, {}, {}, {}},
{g_PropertyNames[PN_DATABASE_FILE], {B_CREATE_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Creates a new empty database, will replace "
"the existing database file too.", PN_DATABASE_FILE, {}, {}, {}},
{g_PropertyNames[PN_DATABASE_FILE], {B_DELETE_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Deletes the database file and all backup copies "
"of that file too. Really only of use for uninstallers.",
PN_DATABASE_FILE, {}, {}, {}},
{g_PropertyNames[PN_DATABASE_FILE], {B_COUNT_PROPERTIES, 0},
{B_DIRECT_SPECIFIER, 0}, "Returns the number of words in the database.",
PN_DATABASE_FILE, {}, {}, {}},
{g_PropertyNames[PN_SPAM], {B_SET_PROPERTY, 0}, {B_DIRECT_SPECIFIER, 0},
"Adds the spam in the given file (specify full pathname to be safe) to "
"the database. The words in the files will be added to the list of words "
"in the database that identify spam messages. The files processed will "
"also have the attribute MAIL:classification added with a value of "
"\"Spam\" or \"Genuine\" as specified. They also have their spam ratio "
"attribute updated, as if you had also used the Evaluate command on "
"them. If they already have the MAIL:classification "
"attribute and it matches the new classification then they won't get "
"processed (and if it is different, they will get removed from the "
"statistics for the old class and added to the statistics for the new "
"one). You can turn off that behaviour with the "
"IgnorePreviousClassification property. The command line version lets "
"you specify more than one pathname.", PN_SPAM, {}, {}, {}},
{g_PropertyNames[PN_SPAM], {B_COUNT_PROPERTIES, 0}, {B_DIRECT_SPECIFIER, 0},
"Returns the number of spam messages in the database.", PN_SPAM,
{}, {}, {}},
{g_PropertyNames[PN_SPAM_STRING], {B_SET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Adds the spam in the given string (assumed to "
"be the text of a whole e-mail message, not just a file name) to the "
"database.", PN_SPAM_STRING, {}, {}, {}},
{g_PropertyNames[PN_GENUINE], {B_SET_PROPERTY, 0}, {B_DIRECT_SPECIFIER, 0},
"Similar to adding spam except that the message file is added to the "
"genuine statistics.", PN_GENUINE, {}, {}, {}},
{g_PropertyNames[PN_GENUINE], {B_COUNT_PROPERTIES, 0},
{B_DIRECT_SPECIFIER, 0}, "Returns the number of genuine messages in the "
"database.", PN_GENUINE, {}, {}, {}},
{g_PropertyNames[PN_GENUINE_STRING], {B_SET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Adds the genuine message in the given string "
"(assumed to be the text of a whole e-mail message, not just a file name) "
"to the database.", PN_GENUINE_STRING, {}, {}, {}},
{g_PropertyNames[PN_UNCERTAIN], {B_SET_PROPERTY, 0}, {B_DIRECT_SPECIFIER, 0},
"Similar to adding spam except that the message file is removed from the "
"database, undoing the previous classification. Obviously, it needs to "
"have been classified previously (using the file attributes) so it can "
"tell if it is removing spam or genuine words.", PN_UNCERTAIN, {}, {}, {}},
{g_PropertyNames[PN_IGNORE_PREVIOUS_CLASSIFICATION], {B_SET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "If set to true then the previous classification "
"(which was saved as an attribute of the e-mail message file) will be "
"ignored, so that you can add the message to the database again. If set "
"to false (the normal case), the attribute will be examined, and if the "
"message has already been classified as what you claim it is, nothing "
"will be done. If it was misclassified, then the message will be removed "
"from the statistics for the old class and added to the stats for the "
"new classification you have requested.",
PN_IGNORE_PREVIOUS_CLASSIFICATION, {}, {}, {}},
{g_PropertyNames[PN_IGNORE_PREVIOUS_CLASSIFICATION], {B_GET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Find out the current setting of the flag for "
"ignoring the previously recorded classification.",
PN_IGNORE_PREVIOUS_CLASSIFICATION, {}, {}, {}},
{g_PropertyNames[PN_SERVER_MODE], {B_SET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "If set to true then error messages get printed "
"to the standard error stream rather than showing up in an alert box. "
"It also starts up with the window minimized.", PN_SERVER_MODE,
{}, {}, {}},
{g_PropertyNames[PN_SERVER_MODE], {B_GET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Find out the setting of the server mode flag.",
PN_SERVER_MODE, {}, {}, {}},
{g_PropertyNames[PN_FLUSH], {B_EXECUTE_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Writes out the database file to disk, if it has "
"been updated in memory but hasn't been saved to disk. It will "
"automatically get written when the program exits, so this command is "
"mostly useful for server mode.", PN_FLUSH, {}, {}, {}},
{g_PropertyNames[PN_PURGE_AGE], {B_SET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Sets the old age limit. Words which haven't "
"been updated since this many message additions to the database may be "
"deleted when you do a purge. A good value is 1000, meaning that if a "
"word hasn't appeared in the last 1000 spam/genuine messages, it will "
"be forgotten. Zero will purge all words, 1 will purge words not in "
"the last message added to the database, 2 will purge words not in the "
"last two messages added, and so on. This is mostly useful for "
"removing those one time words which are often hunks of binary garbage, "
"not real words. This acts in combination with the popularity limit; "
"both conditions have to be valid before the word gets deleted.",
PN_PURGE_AGE, {}, {}, {}},
{g_PropertyNames[PN_PURGE_AGE], {B_GET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Gets the old age limit.", PN_PURGE_AGE,
{}, {}, {}},
{g_PropertyNames[PN_PURGE_POPULARITY], {B_SET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Sets the popularity limit. Words which aren't "
"this popular may be deleted when you do a purge. A good value is 5, "
"which means that the word is safe from purging if it has been seen in 6 "
"or more e-mail messages. If it's only in 5 or less, then it may get "
"purged. The extreme is zero, where only words that haven't been seen "
"in any message are deleted (usually means no words). This acts in "
"combination with the old age limit; both conditions have to be valid "
"before the word gets deleted.", PN_PURGE_POPULARITY, {}, {}, {}},
{g_PropertyNames[PN_PURGE_POPULARITY], {B_GET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Gets the purge popularity limit.",
PN_PURGE_POPULARITY, {}, {}, {}},
{g_PropertyNames[PN_PURGE], {B_EXECUTE_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Purges the old obsolete words from the "
"database, if they are old enough according to the age limit and also "
"unpopular enough according to the popularity limit.", PN_PURGE,
{}, {}, {}},
{g_PropertyNames[PN_OLDEST], {B_GET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Gets the age of the oldest message in the "
"database. It's relative to the beginning of time, so you need to do "
"(total messages - age - 1) to see how many messages ago it was added.",
PN_OLDEST, {}, {}, {}},
{g_PropertyNames[PN_EVALUATE], {B_SET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Evaluates a given file (by path name) to see "
"if it is spam or not. Returns the ratio of spam probability vs genuine "
"probability, 0.0 meaning completely genuine, 1.0 for completely spam. "
"Normally you should safely be able to consider it as spam if it is over "
"0.56 for the Robinson scoring method. For the ChiSquared method, the "
"numbers are near 0 for genuine, near 1 for spam, and anywhere in the "
"middle means it can't decide. The program attaches a MAIL:ratio_spam "
"attribute with the ratio as its "
"float32 value to the file. Also returns the top few interesting words "
"in \"words\" and the associated per-word probability ratios in "
"\"ratios\".", PN_EVALUATE, {}, {}, {}},
{g_PropertyNames[PN_EVALUATE_STRING], {B_SET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Like Evaluate, but rather than a file name, "
"the string argument contains the entire text of the message to be "
"evaluated.", PN_EVALUATE_STRING, {}, {}, {}},
{g_PropertyNames[PN_RESET_TO_DEFAULTS], {B_EXECUTE_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Resets all the configuration options to the "
"default values, including the database name.", PN_RESET_TO_DEFAULTS,
{}, {}, {}},
{g_PropertyNames[PN_INSTALL_THINGS], {B_EXECUTE_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Creates indices for the MAIL:classification and "
"MAIL:ratio_spam attributes on all volumes which support BeOS queries, "
"identifies them to the system as e-mail related attributes (modifies "
"the text/x-email MIME type), and sets up the new MIME type "
"(text/x-vnd.agmsmith.spam_probability_database) for the database file. "
"Also registers names for the sound effects used by the separate filter "
"program (use the installsound BeOS program or the Sounds preferences "
"program to associate sound files with the names).", PN_INSTALL_THINGS,
{}, {}, {}},
{g_PropertyNames[PN_TOKENIZE_MODE], {B_SET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Sets the method used for breaking up the "
"message into words. Use \"Whole\" for the whole file (also use it for "
"non-email files). The file isn't broken into parts; the whole thing is "
"converted into words, headers and attachments are just more raw data. "
"Well, not quite raw data since it converts quoted-printable codes "
"(equals sign followed by hex digits or end of line) to the equivalent "
"single characters. \"PlainText\" breaks the file into MIME components "
"and only looks at the ones which are of MIME type text/plain. "
"\"AnyText\" will look for words in all text/* things, including "
"text/html attachments. \"AllParts\" will decode all message components "
"and look for words in them, including binary attachments. "
"\"JustHeader\" will only look for words in the message header. "
"\"AllPartsAndHeader\", \"PlainTextAndHeader\" and \"AnyTextAndHeader\" "
"will also include the words from the message headers.", PN_TOKENIZE_MODE,
{}, {}, {}},
{g_PropertyNames[PN_TOKENIZE_MODE], {B_GET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Gets the method used for breaking up the "
"message into words.", PN_TOKENIZE_MODE, {}, {}, {}},
{g_PropertyNames[PN_SCORING_MODE], {B_SET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Sets the method used for combining the "
"probabilities of individual words into an overall score. "
"\"Robinson\" mode will use Gary Robinson's nth root of the product "
"method. It gives a nice range of values between 0 and 1 so you can "
"see shades of spaminess. The cutoff point between spam and genuine "
"varies depending on your database of words (0.56 was one point in "
"some experiments). \"ChiSquared\" mode will use chi-squared "
"statistics to evaluate the difference in probabilities that the lists "
"of word ratios are random. The result is very close to 0 for genuine "
"and very close to 1 for spam, and near the middle if it is uncertain.",
PN_SCORING_MODE, {}, {}, {}},
{g_PropertyNames[PN_SCORING_MODE], {B_GET_PROPERTY, 0},
{B_DIRECT_SPECIFIER, 0}, "Gets the method used for combining the "
"individual word ratios into an overall score.", PN_SCORING_MODE,
{}, {}, {}},
{ 0 }
};
typedef enum ScoringModeEnum
{
SM_ROBINSON = 0,
SM_CHISQUARED,
SM_MAX
} ScoringModes;
static const char * g_ScoringModeNames [SM_MAX] =
{
"Robinson",
"ChiSquared"
};
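/* Illustration only, not part of the program: a minimal sketch of the
"Robinson" scoring mode mentioned in the usage text (nth root of the product).
The formula follows Gary Robinson's published geometric mean combination as
best we understand it; that the program computes it exactly this way is an
assumption, and the name ExampleRobinsonScore is hypothetical. */

```cpp
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Combines per-word spam ratios (each strictly between 0 and 1) into one
score between 0 (genuine) and 1 (spam).  Logarithms are summed rather than
multiplying the ratios directly, so a long list can't underflow. */
static double ExampleRobinsonScore (const double *Ratios, size_t Count)
{
  double LogProduct = 0.0;        /* ln of the product of p. */
  double LogInverseProduct = 0.0; /* ln of the product of (1 - p). */
  size_t i;

  assert (Count > 0);
  for (i = 0; i < Count; i++)
  {
    assert (Ratios[i] > 0.0 && Ratios[i] < 1.0);
    LogProduct += log (Ratios[i]);
    LogInverseProduct += log (1.0 - Ratios[i]);
  }

  /* P measures spamminess, Q measures genuineness; each is one minus the
  nth root of a product. */
  double P = 1.0 - exp (LogInverseProduct / Count);
  double Q = 1.0 - exp (LogProduct / Count);

  /* Scale the difference from the range -1..+1 into the range 0..1. */
  return (1.0 + (P - Q) / (P + Q)) / 2.0;
}
```

/* A list of neutral words (all ratios 0.5) scores exactly in the middle, while
a list of strongly spam-related words scores close to 1. */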
typedef enum TokenizeModeEnum
{
TM_WHOLE = 0,
TM_PLAIN_TEXT,
TM_PLAIN_TEXT_HEADER,
TM_ANY_TEXT,
TM_ANY_TEXT_HEADER,
TM_ALL_PARTS,
TM_ALL_PARTS_HEADER,
TM_JUST_HEADER,
TM_MAX
} TokenizeModes;
static const char * g_TokenizeModeNames [TM_MAX] =
{
"All",
"Plain text",
"Plain text and header",
"Any text",
"Any text and header",
"All parts",
"All parts and header",
"Just header"
};
typedef enum ClassificationTypesEnum
{
CL_GENUINE = 0,
CL_SPAM,
CL_UNCERTAIN,
CL_MAX
} ClassificationTypes;
static const char * g_ClassificationTypeNames [CL_MAX] =
{
g_ClassifiedGenuine,
g_ClassifiedSpam,
"Uncertain"
};
static BPoint g_UpLinePoints [] =
{
BPoint (8, 2 * (1)),
BPoint (14, 2 * (6)),
BPoint (10, 2 * (6)),
BPoint (10, 2 * (13)),
BPoint (6, 2 * (13)),
BPoint (6, 2 * (6)),
BPoint (2, 2 * (6))
};
static BPoint g_DownLinePoints [] =
{
BPoint (8, 2 * (14-1)),
BPoint (14, 2 * (14-6)),
BPoint (10, 2 * (14-6)),
BPoint (10, 2 * (14-13)),
BPoint (6, 2 * (14-13)),
BPoint (6, 2 * (14-6)),
BPoint (2, 2 * (14-6))
};
static BPoint g_UpPagePoints [] =
{
BPoint (8, 2 * (1)),
BPoint (13, 2 * (6)),
BPoint (10, 2 * (6)),
BPoint (14, 2 * (10)),
BPoint (10, 2 * (10)),
BPoint (10, 2 * (13)),
BPoint (6, 2 * (13)),
BPoint (6, 2 * (10)),
BPoint (2, 2 * (10)),
BPoint (6, 2 * (6)),
BPoint (3, 2 * (6))
};
static BPoint g_DownPagePoints [] =
{
BPoint (8, 2 * (14-1)),
BPoint (13, 2 * (14-6)),
BPoint (10, 2 * (14-6)),
BPoint (14, 2 * (14-10)),
BPoint (10, 2 * (14-10)),
BPoint (10, 2 * (14-13)),
BPoint (6, 2 * (14-13)),
BPoint (6, 2 * (14-10)),
BPoint (2, 2 * (14-10)),
BPoint (6, 2 * (14-6)),
BPoint (3, 2 * (14-6))
};
/* If character code X has g_SpaceCharacters[X] set to true then it is a
space-like character. Character codes 128 and above are always non-space since
they are UTF-8 characters. Initialised in the ABSApp constructor. */
static bool g_SpaceCharacters [128];
/*
* Each word in the spam database gets one of these structures. The database
* has a string (the word) as the key and this structure as the value
* (statistics for that word).
*/
typedef struct StatisticsStruct
{
uint32 age;
/* Sequence number of the file which last added or updated this word in the
database, so that we can remove old words (haven't been seen in recent
spam). It's zero for the first file ever added (spam or genuine) to the
database, 1 for all words added or updated by the second file, etc. If a
later file updates an existing word, it gets the age of the later file. */
uint32 genuineCount;
uint32 spamCount;
} StatisticsRecord, *StatisticsPointer;
typedef map<string, StatisticsRecord> StatisticsMap;
/* A type name for our main data structure (word to statistics map), so that
we can more conveniently specify things that are derived from it, like
iterators. */
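/* Illustration only, not part of the program: a minimal sketch of how the age
and popularity limits described in the purge usage text could combine, using a
stand-alone copy of the statistics record. The names ExampleRecord,
ExampleIsPurgeable and ExamplePurge are hypothetical, and the exact comparisons
used by the real purge code are an assumption based on that usage text. */

```cpp
#include <assert.h>
#include <map>
#include <string>

typedef unsigned int ExampleUint32; /* Stand-in for the BeOS uint32 type. */

struct ExampleRecord
{
  ExampleUint32 age;          /* Sequence number of the last file using it. */
  ExampleUint32 genuineCount; /* Genuine messages containing the word. */
  ExampleUint32 spamCount;    /* Spam messages containing the word. */
};

typedef std::map<std::string, ExampleRecord> ExampleMap;

/* Returns true if the word is both old enough and unpopular enough. */
static bool ExampleIsPurgeable (const ExampleRecord &Record,
  ExampleUint32 TotalMessages, ExampleUint32 PurgeAge,
  ExampleUint32 PurgePopularity)
{
  assert (Record.age < TotalMessages);

  /* Ages count up from zero, so convert to "how many messages ago". */
  ExampleUint32 MessagesAgo = TotalMessages - Record.age - 1;
  ExampleUint32 Popularity = Record.genuineCount + Record.spamCount;

  return MessagesAgo >= PurgeAge && Popularity <= PurgePopularity;
}

/* Removes every purgeable word, returning how many were deleted. */
static int ExamplePurge (ExampleMap &Words, ExampleUint32 TotalMessages,
  ExampleUint32 PurgeAge, ExampleUint32 PurgePopularity)
{
  int Deleted = 0;
  ExampleMap::iterator Iter = Words.begin ();

  while (Iter != Words.end ())
  {
    if (ExampleIsPurgeable (Iter->second, TotalMessages, PurgeAge,
      PurgePopularity))
    {
      Words.erase (Iter++); /* Advance before the erased node goes away. */
      Deleted++;
    }
    else
      ++Iter;
  }
  return Deleted;
}
```

/* With 10 messages in total, a purge age of 1 and a popularity limit of 5, a
word last seen in message number 7 and found in 3 messages would be deleted,
while a word seen in the most recent message (number 9) would be kept. */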
/*
* An alert box asking how the user wants to mark messages. There are buttons
* for each classification category, and a checkbox to mark all remaining N
* messages the same way. And a cancel button. To use it, first create the
* ClassificationChoicesWindow, specifying the input arguments. Then call the
* Go method which will show the window, stuff the user's answer into your
* output arguments (class set to CL_MAX if the user cancels), and destroy the
* window. Implemented because BAlert only allows 3 buttons, max!
*/
class ClassificationChoicesWindow : public BWindow
{
public:
ClassificationChoicesWindow (BRect FrameRect,
const char *FileName, int NumberOfFiles);
virtual void MessageReceived (BMessage *MessagePntr);
void Go (bool *BulkModeSelectedPntr,
ClassificationTypes *ChoosenClassificationPntr);
static const uint32 MSG_CLASS_BUTTONS = 'ClB0';
static const uint32 MSG_CANCEL_BUTTON = 'Cncl';
static const uint32 MSG_BULK_CHECKBOX = 'BlkK';
private:
bool *m_BulkModeSelectedPntr;
ClassificationTypes *m_ChoosenClassificationPntr;
};
class ClassificationChoicesView : public BView
{
public:
ClassificationChoicesView (BRect FrameRect,
const char *FileName, int NumberOfFiles);
virtual void AttachedToWindow ();
virtual void GetPreferredSize (float *width, float *height);
private:
const char *m_FileName;
int m_NumberOfFiles;
float m_PreferredBottomY;
};
/*
* Due to deadlock problems with the BApplication posting scripting messages to
* itself, we need to add a second Looper. Its job is just to convert
* command line arguments and arguments from the Tracker (refs received) into a
* series of scripting commands sent to the main BApplication. It also prints
* out the replies received (to stdout for command line replies). An instance
* of this class will be created and run by the main() function, and shut down
* by it too.
*/
class CommanderLooper : public BLooper
{
public:
CommanderLooper ();
~CommanderLooper ();
virtual void MessageReceived (BMessage *MessagePntr);
void CommandArguments (int argc, char **argv);
void CommandReferences (BMessage *MessagePntr,
bool BulkMode = false,
ClassificationTypes BulkClassification = CL_GENUINE);
bool IsBusy ();
private:
void ProcessArgs (BMessage *MessagePntr);
void ProcessRefs (BMessage *MessagePntr);
static const uint32 MSG_COMMAND_ARGUMENTS = 'CArg';
static const uint32 MSG_COMMAND_FILE_REFS = 'CRef';
bool m_IsBusy;
};
/*
* This view contains the various buttons and other controls for setting
* configuration options and displaying the state of the database (but not the
* actual list of words). It will appear in the top half of the
* DatabaseWindow.
*/
class ControlsView : public BView
{
public:
ControlsView (BRect NewBounds);
~ControlsView ();
virtual void AttachedToWindow ();
virtual void FrameResized (float Width, float Height);
virtual void MessageReceived (BMessage *MessagePntr);
virtual void Pulse ();
private:
static const uint32 MSG_BROWSE_BUTTON = 'Brws';
static const uint32 MSG_DATABASE_NAME = 'DbNm';
static const uint32 MSG_ESTIMATE_BUTTON = 'Estm';
static const uint32 MSG_ESTIMATE_FILE_REFS = 'ERef';
static const uint32 MSG_IGNORE_CLASSIFICATION = 'IPCl';
static const uint32 MSG_PURGE_AGE = 'PuAg';
static const uint32 MSG_PURGE_BUTTON = 'Purg';
static const uint32 MSG_PURGE_POPULARITY = 'PuPo';
static const uint32 MSG_SERVER_MODE = 'SrvM';
void BrowseForDatabaseFile ();
void BrowseForFileToEstimate ();
void PollServerForChanges ();
BButton *m_AboutButtonPntr;
BButton *m_AddExampleButtonPntr;
BButton *m_BrowseButtonPntr;
BFilePanel *m_BrowseFilePanelPntr;
BButton *m_CreateDatabaseButtonPntr;
char m_DatabaseFileNameCachedValue [PATH_MAX];
BTextControl *m_DatabaseFileNameTextboxPntr;
bool m_DatabaseLoadDone;
BButton *m_EstimateSpamButtonPntr;
BFilePanel *m_EstimateSpamFilePanelPntr;
uint32 m_GenuineCountCachedValue;
BTextControl *m_GenuineCountTextboxPntr;
bool m_IgnorePreviousClassCachedValue;
BCheckBox *m_IgnorePreviousClassCheckboxPntr;
BButton *m_InstallThingsButtonPntr;
uint32 m_PurgeAgeCachedValue;
BTextControl *m_PurgeAgeTextboxPntr;
BButton *m_PurgeButtonPntr;
uint32 m_PurgePopularityCachedValue;
BTextControl *m_PurgePopularityTextboxPntr;
BButton *m_ResetToDefaultsButtonPntr;
ScoringModes m_ScoringModeCachedValue;
BMenuBar *m_ScoringModeMenuBarPntr;
BPopUpMenu *m_ScoringModePopUpMenuPntr;
bool m_ServerModeCachedValue;
BCheckBox *m_ServerModeCheckboxPntr;
uint32 m_SpamCountCachedValue;
BTextControl *m_SpamCountTextboxPntr;
bigtime_t m_TimeOfLastPoll;
TokenizeModes m_TokenizeModeCachedValue;
BMenuBar *m_TokenizeModeMenuBarPntr;
BPopUpMenu *m_TokenizeModePopUpMenuPntr;
uint32 m_WordCountCachedValue;
BTextControl *m_WordCountTextboxPntr;
};
static const uint32 MSG_LINE_DOWN = 'LnDn';
static const uint32 MSG_LINE_UP = 'LnUp';
static const uint32 MSG_PAGE_DOWN = 'PgDn';
static const uint32 MSG_PAGE_UP = 'PgUp';
/*
* This view contains the list of words. It displays as many as can fit in the
* view rectangle, starting at a specified word (so it can simulate scrolling).
* Usually it will appear in the bottom half of the DatabaseWindow.
*/
class WordsView : public BView
{
public:
WordsView (BRect NewBounds);
virtual void AttachedToWindow ();
virtual void Draw (BRect UpdateRect);
virtual void KeyDown (const char *BufferPntr, int32 NumBytes);
virtual void MakeFocus (bool Focused);
virtual void MessageReceived (BMessage *MessagePntr);
virtual void MouseDown (BPoint point);
virtual void Pulse ();
private:
void MoveTextUpOrDown (uint32 MovementType);
void RefsDroppedHere (BMessage *MessagePntr);
BPictureButton *m_ArrowLineDownPntr;
BPictureButton *m_ArrowLineUpPntr;
BPictureButton *m_ArrowPageDownPntr;
BPictureButton *m_ArrowPageUpPntr;
/* Buttons for scrolling the list of words, used instead of a regular scroll
bar. To make them less obvious, their background view colour needs to be
changed whenever the main view's colour changes. */
float m_AscentHeight;
/* The ascent of the font used for drawing the words: the height from the top
of the highest letter to the base line (which is near the middle bottom of
the letters, the line where you would align your writing of the text by
hand, all letters have part above, some also have descenders below this
line). */
rgb_color m_BackgroundColour;
uint32 m_CachedTotalGenuineMessages;
uint32 m_CachedTotalSpamMessages;
uint32 m_CachedWordCount;
/* Cached copies of the message totals and word count, which
reflect what's currently displayed. If they are different than the values
from the BApplication then the polling loop will try to redraw the display.
They get set to the values actually used during drawing when drawing is
successful. */
char m_FirstDisplayedWord [g_MaxWordLength + 1];
/* The word shown at the top of the view. Rather than using position
numbers (word[12345] for example), we use the word itself. The scroll
buttons set this to the next or previous word in the database. Typing by
the user when the view has the focus will also change this starting word.
*/
rgb_color m_FocusedColour;
/* The colour used to show that we have the keyboard focus (typing goes to
our view). */
bigtime_t m_LastTimeAKeyWasPressed;
/* Time of the most recent key press, used for detecting when
the user has stopped typing a batch of letters. */
float m_LineHeight;
/* Vertical distance from one line of text to the next.
Includes the height of the letters plus a bit of extra space for between
the lines (called leading). */
BFont m_TextFont;
float m_TextHeight;
/* Height of the text itself: the part of the letters above
the baseline and the part below. Doesn't include the sliver of space
between lines. */
rgb_color m_UnfocusedColour;
};
/*
* The BWindow class for this program. It displays the database in real time,
* and has various buttons and gadgets in the top half for changing settings
* (live changes, no OK button, and they reflect changes done by other programs
* using the server too). The bottom half is a scrolling view listing all the
* words in the database. A simple graphic blotch behind each word shows
* whether the word is strongly or weakly related to spam or genuine messages.
* Most operations go through the scripting message system, but it also peeks
* at the BApplication data for examining simple things and when redrawing the
* list of words.
*/
class DatabaseWindow : public BWindow
{
public:
DatabaseWindow ();
virtual void MessageReceived (BMessage *MessagePntr);
virtual bool QuitRequested ();
private:
ControlsView *m_ControlsViewPntr;
WordsView *m_WordsViewPntr;
};
/*
* ABSApp is the BApplication class for this program. This handles messages
* from the outside world (requests to load a database, or to add files to the
* collection). It responds to command line arguments (if you start up the
* program a second time, the system will just send the arguments to the
* existing running program). It responds to scripting messages. And it
* responds to messages from the window. Its thread does the main work of
* updating the database and reading / writing files.
*/
class ABSApp : public BApplication
{
public:
ABSApp ();
~ABSApp ();
virtual void AboutRequested ();
virtual void ArgvReceived (int32 argc, char **argv);
virtual status_t GetSupportedSuites (BMessage *MessagePntr);
virtual void MessageReceived (BMessage *MessagePntr);
virtual void Pulse ();
virtual bool QuitRequested ();
virtual void ReadyToRun ();
virtual void RefsReceived (BMessage *MessagePntr);
virtual BHandler *ResolveSpecifier (BMessage *MessagePntr, int32 Index,
BMessage *SpecifierMsgPntr, int32 SpecificationKind, const char *Property);
private:
status_t AddFileToDatabase (ClassificationTypes IsSpamOrWhat,
const char *FileName, char *ErrorMessage);
status_t AddPositionIOToDatabase (ClassificationTypes IsSpamOrWhat,
BPositionIO *MessageIOPntr, const char *OptionalFileName,
char *ErrorMessage);
status_t AddStringToDatabase (ClassificationTypes IsSpamOrWhat,
const char *String, char *ErrorMessage);
void AddWordsToSet (const char *InputString, size_t NumberOfBytes,
char PrefixCharacter, set<string> &WordSet);
status_t CreateDatabaseFile (char *ErrorMessage);
void DefaultSettings ();
status_t DeleteDatabaseFile (char *ErrorMessage);
status_t EvaluateFile (const char *PathName, BMessage *ReplyMessagePntr,
char *ErrorMessage);
status_t EvaluatePositionIO (BPositionIO *PositionIOPntr,
const char *OptionalFileName, BMessage *ReplyMessagePntr,
char *ErrorMessage);
status_t EvaluateString (const char *BufferPntr, ssize_t BufferSize,
BMessage *ReplyMessagePntr, char *ErrorMessage);
status_t GetWordsFromPositionIO (BPositionIO *PositionIOPntr,
const char *OptionalFileName, set<string> &WordSet, char *ErrorMessage);
status_t InstallThings (char *ErrorMessage);
status_t LoadDatabaseIfNeeded (char *ErrorMessage);
status_t LoadSaveDatabase (bool DoLoad, char *ErrorMessage);
public:
status_t LoadSaveSettings (bool DoLoad);
private:
status_t MakeBackup (char *ErrorMessage);
void MakeDatabaseEmpty ();
void ProcessScriptingMessage (BMessage *MessagePntr,
struct property_info *PropInfoPntr);
status_t PurgeOldWords (char *ErrorMessage);
status_t RecursivelyTokenizeMailComponent (
BMailComponent *ComponentPntr, const char *OptionalFileName,
set<string> &WordSet, char *ErrorMessage,
int RecursionLevel, int MaxRecursionLevel);
status_t SaveDatabaseIfNeeded (char *ErrorMessage);
status_t TokenizeParts (BPositionIO *PositionIOPntr,
const char *OptionalFileName, set<string> &WordSet, char *ErrorMessage);
status_t TokenizeWhole (BPositionIO *PositionIOPntr,
const char *OptionalFileName, set<string> &WordSet, char *ErrorMessage);
public:
/* The following members are public so that other threads can read them when
updating their displays, and to draw the words. However, the other threads
will lock the BApplication or use scripting commands if they want to make
changes. */
bool m_DatabaseHasChanged;
/* Set to TRUE when the in-memory database has
changed and is different from the on-disk database file. When the
application exits, the database will be written out if it has changed. */
BString m_DatabaseFileName;
bool m_IgnorePreviousClassification;
/* If TRUE then the previous classification of a message (stored in an
attribute on the message file) will be ignored, and the message will be
added to the requested spam/genuine list. If this is FALSE then the spam
won't be added to the list if it has already been classified as specified,
but if it was mis-classified, it will be removed from the old list and
added to the new list. */
uint32 m_OldestAge;
/* The age of the oldest word still in the
database. Mostly useful for scaling graphics representing age in the word
display. If the oldest word is no longer the oldest, this variable won't
get immediately updated since it would take a lot of effort to find the
next older age. Since it's only used for display, we'll let it be slightly
incorrect. The next database load or purge will fix it. */
uint32 m_PurgeAge;
/* When purging, words have to be at least this old to be eligible
for deletion. Age is measured as the number of e-mails added to the
database since the word was last updated in the database. Zero means all
words are old. */
uint32 m_PurgePopularity;
/* When purging, words have to be no more popular than this
popularity limit to be eligible for deletion. Popularity is measured as
the number of messages (spam and genuine) which have the word. Zero means
no words. */
ScoringModes m_ScoringMode;
/* Controls how the per-word ratios are combined into an overall score.
See the PN_SCORING_MODE comments for details. */
BPath m_SettingsDirectoryPath;
/* The directory used for storing the settings. It is determined once and
never changes after that. */
bool m_SettingsHaveChanged;
/* Set to TRUE when the settings are changed (different than the ones which
were loaded). When the application exits, the settings will be written out
if they have changed. */
double m_SmallestUseableDouble;
/* Values used in the probability calculations shouldn't be allowed to get
smaller than this because the double exponent range is close to being
exhausted. The IEEE STANDARD 754 floating-point arithmetic (used on the
Intel i8087 and later math processors) has 64 bit numbers with 53 bits of
mantissa, giving it an underflow starting at 0.5**1022 = 2.2e-308 where it
rounds off to the nearest multiple of 0.5**1074 = 4.9e-324. */
TokenizeModes m_TokenizeMode;
/* Controls how messages are broken up into words. See the
PN_TOKENIZE_MODE comments for details. */
uint32 m_TotalGenuineMessages;
uint32 m_TotalSpamMessages;
uint32 m_WordCount;
/* The number of words currently in the database. Stored separately as a
member variable to avoid having to call m_WordMap.size() all the time,
which other threads can't do while the database is being updated (but they
can look at the word count variable). */
StatisticsMap m_WordMap;
/* The in-memory database: the collection of words and their
associated statistics. When the database isn't in use, it is an empty
collection. You should lock the BApplication if you are using the word
collection (reading or writing) from another thread. */
};
/*
* Global utility function to display an error message and return. The message
* part describes the error, and if ErrorNumber is non-zero, gets a suffix of
* the form ", error code $X/D (standard description)" appended to it, with the
* number shown in both hexadecimal and decimal. If the message is NULL then it
* gets defaulted to "Something went wrong". The title part doesn't get
* displayed (no title bar in the dialog box, but you can see it in the
* debugger as the window thread name), and defaults to "SpamDBM Error Message"
* if you didn't specify one. If running in command line or server mode, the
* error gets printed to stderr rather than showing up in a dialog box.
*/
static void
DisplayErrorMessage (
const char *MessageString = NULL,
int ErrorNumber = 0,
const char *TitleString = NULL)
{
BAlert *AlertPntr;
char ErrorBuffer [PATH_MAX + 1500];
if (TitleString == NULL)
TitleString = "SpamDBM Error Message";
if (MessageString == NULL)
{
if (ErrorNumber == 0)
MessageString = "No error, no message, why bother?";
else
MessageString = "Something went wrong";
}
if (ErrorNumber != 0)
{
snprintf (ErrorBuffer, sizeof (ErrorBuffer),
"%s, error code $%X/%d (%s) has occurred.",
MessageString, ErrorNumber, ErrorNumber, strerror (ErrorNumber));
MessageString = ErrorBuffer;
}
if (g_CommandLineMode || g_ServerMode)
cerr << TitleString << ": " << MessageString << endl;
else
{
AlertPntr = new BAlert (TitleString, MessageString,
"Acknowledge", NULL, NULL, B_WIDTH_AS_USUAL, B_STOP_ALERT);
if (AlertPntr != NULL) {
AlertPntr->SetFlags(AlertPntr->Flags() | B_CLOSE_ON_ESCAPE);
AlertPntr->Go ();
}
}
}
/*
* Word wrap a long line of text into shorter 79 column lines and print the
* result on the given output stream.
*/
static void
WrapTextToStream (ostream& OutputStream, const char *TextPntr)
{
const int LineLength = 79;
char *StringPntr;
char TempString [LineLength+1];
TempString[LineLength] = 0;
while (*TextPntr != 0)
{
while (isspace ((unsigned char) *TextPntr))
TextPntr++;
if (*TextPntr == 0)
break;
strncpy (TempString, TextPntr, LineLength);
/* Advance StringPntr to the end of the copied text, partly to see how long
it is (rather than doing strlen). */
StringPntr = TempString;
while (*StringPntr != 0)
StringPntr++;
if (StringPntr - TempString < LineLength)
{
OutputStream << TempString << endl;
TextPntr += StringPntr - TempString;
continue;
}
while (StringPntr > TempString)
{
if (isspace ((unsigned char) *StringPntr))
break;
else
StringPntr--;
}
/* Back up over any more space characters before this one, in case there are
several spaces in a row. */
while (StringPntr > TempString &&
isspace ((unsigned char) StringPntr[-1]))
StringPntr--;
if (StringPntr == TempString)
{
OutputStream << TempString << endl;
TextPntr += strlen (TempString);
continue;
}
*StringPntr = 0;
OutputStream << TempString << endl;
TextPntr += StringPntr - TempString;
}
}
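/* Illustration only, not part of the program: a simplified stand-alone sketch
of the same word wrapping idea, working on a std::string and returning the
lines rather than printing them. The name ExampleWrapText is hypothetical, and
unlike the function above it lets a final piece of exactly LineLength
characters through without breaking it. */

```cpp
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string>
#include <vector>

static std::vector<std::string> ExampleWrapText (const std::string &Text,
  size_t LineLength)
{
  std::vector<std::string> Lines;
  size_t Pos = 0;

  assert (LineLength > 0);
  while (Pos < Text.size ())
  {
    /* Skip the spaces left over from the previous line break. */
    while (Pos < Text.size () && isspace ((unsigned char) Text[Pos]))
      Pos++;
    if (Pos >= Text.size ())
      break;

    /* If the rest of the text fits on one line, it is the last line. */
    if (Text.size () - Pos <= LineLength)
    {
      Lines.push_back (Text.substr (Pos));
      break;
    }

    /* Search backwards from the line limit for a space to break at. */
    size_t End = Pos + LineLength;
    while (End > Pos && !isspace ((unsigned char) Text[End]))
      End--;

    if (End == Pos)
    {
      /* No space found, so chop the overly long word at the limit. */
      Lines.push_back (Text.substr (Pos, LineLength));
      Pos += LineLength;
      continue;
    }

    /* Drop any run of spaces just before the break point. */
    size_t Stop = End;
    while (Stop > Pos && isspace ((unsigned char) Text[Stop - 1]))
      Stop--;
    Lines.push_back (Text.substr (Pos, Stop - Pos));
    Pos = End;
  }
  return Lines;
}
```

/* For example, wrapping "aaa bbb ccc" to 7 columns gives the two lines
"aaa bbb" and "ccc", and a word longer than the limit is chopped into pieces
of LineLength characters. */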
/*
* Print the usage info to the stream. Includes a list of all commands.
*/
ostream& PrintUsage (ostream& OutputStream);
ostream& PrintUsage (ostream& OutputStream)
{
struct property_info *PropInfoPntr;
OutputStream << "\nSpamDBM - A Spam Database Manager\n";
OutputStream << "Copyright © 2002 by Alexander G. M. Smith. ";
OutputStream << "Released to the public domain.\n\n";
WrapTextToStream (OutputStream, "Compiled on " __DATE__ " at " __TIME__
". $Id: spamdbm.cpp 30630 2009-05-05 01:31:01Z bga $ $HeadURL: http://svn.haiku-os.org/haiku/haiku/trunk/src/bin/mail_utils/spamdbm.cpp $");
OutputStream << "\n"
"This is a program for classifying e-mail messages as spam (junk mail which\n"
"you don't want to read) and regular genuine messages. It can learn what's\n"
"spam and what's genuine. You just give it a bunch of spam messages and a\n"
"bunch of non-spam ones. It uses them to make a list of the words from the\n"
"messages with the probability that each word is from a spam message or from\n"
"a genuine message. Later on, it can use those probabilities to classify\n"
"new messages as spam or not spam. If the classifier stops working well\n"
"(because the spammers have changed their writing style and vocabulary, or\n"
"your regular correspondents are writing like spammers), you can use this\n"
"program to update the list of words to identify the new messages\n"
"correctly.\n"
"\n"
"The original idea was from Paul Graham's algorithm, which has an excellent\n"
"writeup at: http://www.paulgraham.com/spam.html\n"
"\n"
"Gary Robinson came up with the improved algorithm, which you can read about at:\n"
"http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html\n"
"\n"
"Then he, Tim Peters and the SpamBayes mailing list developed the Chi-Squared\n"
"test, see http://mail.python.org/pipermail/spambayes/2002-October/001036.html\n"
"for one of the earlier messages leading from the central limit theorem to\n"
"the current chi-squared scoring method.\n"
"\n"
"Thanks go to Isaac Yonemoto for providing a better icon, which we can\n"
"unfortunately no longer use, since the Hormel company wants people to\n"
"avoid associating their meat product with junk e-mail.\n"
"\n"
"Tokenising code updated in 2005 to use some of the tricks that SpamBayes\n"
"uses to extract words from messages. In particular, HTML is now handled.\n"
"\n"
"Usage: Specify the operation as the first argument followed by more\n"
"information as appropriate. The program's configuration will affect the\n"
"actual operation (things like the name of the database file to use, or\n"
"whether it should allow non-email messages to be added). In command line\n"
"mode it will do the operation and exit. In GUI/server mode a command line\n"
"invocation will just send the command to the running server. You can also\n"
"use BeOS scripting (see the \"Hey\" command which you can get from\n"
"http://www.bebits.com/app/2042 ) to control the Spam server. And finally,\n"
"there's also a GUI interface which shows up if you start it without any\n"
"command line arguments.\n"
"\n"
"Commands:\n"
"\n"
"Quit\n"
"Stop the program. Useful if it's running as a server.\n"
"\n";
/* Go through the scripting property list and print each command, along with
the usage text. */
for (PropInfoPntr = g_ScriptingPropertyList + 0;
PropInfoPntr->name != 0;
PropInfoPntr++)
{
switch (PropInfoPntr->commands[0])
{
case B_GET_PROPERTY:
OutputStream << "Get " << PropInfoPntr->name << endl;
break;
case B_SET_PROPERTY:
OutputStream << "Set " << PropInfoPntr->name << " NewValue" << endl;
break;
case B_COUNT_PROPERTIES:
OutputStream << "Count " << PropInfoPntr->name << endl;
break;
case B_CREATE_PROPERTY:
OutputStream << "Create " << PropInfoPntr->name << endl;
break;
case B_DELETE_PROPERTY:
OutputStream << "Delete " << PropInfoPntr->name << endl;
break;
case B_EXECUTE_PROPERTY:
OutputStream << PropInfoPntr->name << endl;
break;
default:
OutputStream << "Buggy Command: " << PropInfoPntr->name << endl;
break;
}
WrapTextToStream (OutputStream, (char *)PropInfoPntr->usage);
OutputStream << endl;
}
return OutputStream;
}
/*
* A utility function to send a command to the application, will return after a
* short delay if the application is busy (doesn't wait for it to be executed).
* The reply from the application is also thrown away. It used to be an
* overloaded function, but the system couldn't distinguish between bool and
* int, so now it has slightly different names depending on the arguments.
*/
static void
SubmitCommand (BMessage& CommandMessage)
{
status_t ErrorCode;
ErrorCode = be_app_messenger.SendMessage (&CommandMessage,
be_app_messenger /* reply messenger, the reply gets thrown away */,
1000000 /* delivery timeout, in microseconds */);
if (ErrorCode != B_OK)
cerr << "SubmitCommand failed to send a command, code " <<
ErrorCode << " (" << strerror (ErrorCode) << ")." << endl;
}
static void
SubmitCommandString (
PropertyNumbers Property,
uint32 CommandCode,
const char *StringArgument = NULL)
{
BMessage CommandMessage (CommandCode);
if (Property < 0 || Property >= PN_MAX)
{
DisplayErrorMessage ("SubmitCommandString bug.");
return;
}
CommandMessage.AddSpecifier (g_PropertyNames [Property]);
if (StringArgument != NULL)
CommandMessage.AddString (g_DataName, StringArgument);
SubmitCommand (CommandMessage);
}
static void
SubmitCommandInt32 (
PropertyNumbers Property,
uint32 CommandCode,
int32 Int32Argument)
{
BMessage CommandMessage (CommandCode);
if (Property < 0 || Property >= PN_MAX)
{
DisplayErrorMessage ("SubmitCommandInt32 bug.");
return;
}
CommandMessage.AddSpecifier (g_PropertyNames [Property]);
CommandMessage.AddInt32 (g_DataName, Int32Argument);
SubmitCommand (CommandMessage);
}
static void
SubmitCommandBool (
PropertyNumbers Property,
uint32 CommandCode,
bool BoolArgument)
{
BMessage CommandMessage (CommandCode);
if (Property < 0 || Property >= PN_MAX)
{
DisplayErrorMessage ("SubmitCommandBool bug.");
return;
}
CommandMessage.AddSpecifier (g_PropertyNames [Property]);
CommandMessage.AddBool (g_DataName, BoolArgument);
SubmitCommand (CommandMessage);
}
/*
* A utility function which will estimate the spaminess of file(s), not
* callable from the application thread since it sends a scripting command to
* the application and waits for results. For each file there will be an entry
* reference in the message. For each of those, run it through the spam
* estimator and display a box with the results. This function is used both by
* the file requestor and by dragging and dropping into the middle of the words
* view.
*/
static void
EstimateRefFilesAndDisplay (BMessage *MessagePntr)
{
BAlert *AlertPntr;
BEntry Entry;
entry_ref EntryRef;
status_t ErrorCode;
int i, j;
BPath Path;
BMessage ReplyMessage;
BMessage ScriptingMessage;
const char *StringPntr;
float TempFloat;
int32 TempInt32;
char TempString [PATH_MAX + 1024 +
g_MaxInterestingWords * (g_MaxWordLength + 16)];
for (i = 0; MessagePntr->FindRef ("refs", i, &EntryRef) == B_OK; i++)
{
ErrorCode = Entry.SetTo (&EntryRef, true /* traverse symbolic links */);
if (ErrorCode != B_OK || !Entry.Exists () || Entry.GetPath (&Path) != B_OK)
continue;
ScriptingMessage.MakeEmpty ();
ScriptingMessage.what = B_SET_PROPERTY;
ScriptingMessage.AddSpecifier (g_PropertyNames[PN_EVALUATE]);
ScriptingMessage.AddString (g_DataName, Path.Path ());
if (be_app_messenger.SendMessage (&ScriptingMessage,&ReplyMessage) != B_OK)
break;
if (ReplyMessage.FindInt32 ("error", &TempInt32) != B_OK ||
TempInt32 != B_OK)
break;
ReplyMessage.FindFloat (g_ResultName, &TempFloat);
sprintf (TempString, "%f spam ratio for \"%s\".\nThe top words are:",
(double) TempFloat, Path.Path ());
for (j = 0; j < 20 /* limit the number of words shown */; j++)
{
if (ReplyMessage.FindString ("words", j, &StringPntr) != B_OK ||
ReplyMessage.FindFloat ("ratios", j, &TempFloat) != B_OK)
break;
sprintf (TempString + strlen (TempString), "\n%s / %f",
StringPntr, TempFloat);
}
if (j >= 20 && j < g_MaxInterestingWords)
sprintf (TempString + strlen (TempString), "\nAnd up to %d more words.",
g_MaxInterestingWords - j);
AlertPntr = new BAlert ("Estimate", TempString, "OK");
if (AlertPntr != NULL) {
AlertPntr->SetFlags(AlertPntr->Flags() | B_CLOSE_ON_ESCAPE);
AlertPntr->Go ();
}
}
}
/*
* A utility function from the http://sourceforge.net/projects/spambayes
* SpamBayes project. Return prob(chisq >= x2, with v degrees of freedom). It
* computes the probability that the chi-squared value (a kind of normalized
* error measurement), with v degrees of freedom, would be larger than a given
* number (x2; chi is the Greek letter X thus x2). So you can tell if the
* error is really unusual (the returned probability is near zero meaning that
* your measured error number is kind of large - actual chi-squared is rarely
* above that number merely due to random effects), or if it happens often
* (usually if the probability is over 5% then it's within 2 standard
* deviations - meaning that chi-squared goes over your number fairly often due
* merely to random effects). v must be even for this calculation to work.
*/
static double ChiSquaredProbability (double x2, int v)
{
int halfV = v / 2;
int i;
double m;
double sum;
double term;
if (v & 1)
return -1.0;
m = x2 / 2.0;
sum = term = exp (-m);
for (i = 1; i < halfV; i++)
{
term *= m / i;
sum += term;
}
/* With small x2 and large v, accumulated roundoff error, plus error in the
platform exp(), can cause this to spill a few ULP above 1.0. For example,
ChiSquaredProbability(100, 300) on my box has sum == 1.0 + 2.0**-52 at this
point. Returning a value even a teensy bit over 1.0 is no good. */
if (sum > 1.0)
return 1.0;
return sum;
}
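/* Usage sketch, not part of the original program. For v = 2 the series in
ChiSquaredProbability has a single term, so the result is just exp (-x2 / 2),
which makes a handy sanity check. The second function shows how the SpamBayes
notes quoted later in this file turn that probability into a spam score. That
SpamDBM combines things exactly this way is an assumption, and the names
SketchChiSquared and SketchCombine are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Same series as ChiSquaredProbability, restated so the sketch stands alone.
static double SketchChiSquared (double x2, int v)
{
  assert (v >= 0);
  if (v & 1)
    return -1.0; // Odd degrees of freedom are not supported.
  double m = x2 / 2.0;
  double term = std::exp (-m);
  double sum = term;
  for (int i = 1; i < v / 2; i++)
  {
    term *= m / i;
    sum += term;
  }
  return (sum > 1.0) ? 1.0 : sum;
}

// Combines per-word ratios (each strictly between 0 and 1) into one score:
// near 0.0 for genuine, near 1.0 for spam, 0.5 when it can't decide.
static double SketchCombine (const std::vector<double> &Ratios)
{
  if (Ratios.empty ())
    return 0.5;
  double LogSpam = 0.0;    // Sum of ln (ratio).
  double LogGenuine = 0.0; // Sum of ln (1 - ratio).
  for (size_t i = 0; i < Ratios.size (); i++)
  {
    LogSpam += std::log (Ratios[i]);
    LogGenuine += std::log (1.0 - Ratios[i]);
  }
  int v = 2 * (int) Ratios.size ();
  double S = 1.0 - SketchChiSquared (-2.0 * LogGenuine, v);
  double H = 1.0 - SketchChiSquared (-2.0 * LogSpam, v);
  return (S - H + 1.0) / 2.0;
}
```

Ratios that are all 0.5 should score 0.5 (undecided), while a handful of
ratios near 1.0 should score close to 1.0. */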
/*
* A utility function to remove the "[Spam 99.9%] " from in front of the
* MAIL:subject attribute of a file.
*/
static status_t RemoveSpamPrefixFromSubjectAttribute (BNode *BNodePntr)
{
status_t ErrorCode;
const char *MailSubjectName = "MAIL:subject";
char *StringPntr;
char SubjectString [2000];
ErrorCode = BNodePntr->ReadAttr (MailSubjectName,
B_STRING_TYPE, 0 , SubjectString,
sizeof (SubjectString) - 1);
if (ErrorCode <= 0)
return 0;
if (ErrorCode >= (int) sizeof (SubjectString) - 1)
return 0;
SubjectString [ErrorCode] = 0;
ErrorCode = 0;
if (strncmp (SubjectString, "[Spam ", 6) == 0)
{
for (StringPntr = SubjectString;
*StringPntr != 0 && *StringPntr != ']'; StringPntr++)
;
if (StringPntr[0] == ']' && StringPntr[1] == ' ')
{
ErrorCode = BNodePntr->RemoveAttr (MailSubjectName);
ErrorCode = BNodePntr->WriteAttr (MailSubjectName,
B_STRING_TYPE, 0 ,
StringPntr + 2, strlen (StringPntr + 2) + 1);
if (ErrorCode > 0)
ErrorCode = 0;
}
}
return ErrorCode;
}
/*
* The tokenizing functions. To make tokenization of the text easier to
* understand, it is broken up into several passes. Each pass goes over the
* text (can include NUL bytes) and extracts all the words it can recognise
* (can be none). The extracted words are added to the WordSet, with the
* PrefixCharacter prepended (zero if none) so we can distinguish between words
* found in headers and in the text body. It also modifies the input text
* buffer in-place to change the text that the next pass will see (blanking out
* words that it wants to delete, but not inserting much new text since the
* buffer can't be enlarged). They all return the number of bytes remaining in
* InputString after it has been modified to be input for the next pass.
* Returns zero if it has exhausted the possibility of getting more words, or
* if something goes wrong.
*/
static size_t TokenizerPassLowerCase (
char *BufferPntr,
size_t NumberOfBytes)
{
char *EndOfStringPntr;
EndOfStringPntr = BufferPntr + NumberOfBytes;
while (BufferPntr < EndOfStringPntr)
{
/* Convert only the ASCII upper case letters, leaving alone the UTF-8
characters that have the high bit set. */
if (*BufferPntr >= 'A' && *BufferPntr <= 'Z')
*BufferPntr = *BufferPntr + ('a' - 'A');
BufferPntr++;
}
return NumberOfBytes;
}
/* A helper that would be a nested procedure if C++ allowed them, but it
doesn't. Adds the given word to the set
of words, checking for maximum word length and prepending the prefix to the
word, which gets modified by this function to reflect the word actually added
to the set. */
static void
AddWordAndPrefixToSet (
string &Word,
const char *PrefixString,
set<string> &WordSet)
{
if (Word.empty ())
return;
if (Word.size () > g_MaxWordLength)
Word.resize (g_MaxWordLength);
Word.insert (0, PrefixString);
WordSet.insert (Word);
}
/* Finds URLs in the text and adds their components to the word set as
separate words. Doesn't affect the text in the buffer. Looks for
protocol://user:password@computer:port/path?query=key#anchor strings. Also
www.blah strings are detected and broken down. Doesn't do HREF="" strings
where the string has a relative path (no host computer name). Assumes the
input buffer is already in lower case. */
static size_t TokenizerPassExtractURLs (
char *BufferPntr,
size_t NumberOfBytes,
char PrefixCharacter,
set<string> &WordSet)
{
char *AtSignStringPntr;
char *HostStringPntr;
char *InputStringEndPntr;
char *InputStringPntr;
char *OptionsStringPntr;
char *PathStringPntr;
char PrefixString [2];
char *ProtocolStringPntr;
string Word;
InputStringPntr = BufferPntr;
InputStringEndPntr = BufferPntr + NumberOfBytes;
PrefixString [0] = PrefixCharacter;
PrefixString [1] = 0;
while (InputStringPntr < InputStringEndPntr - 4)
{
HostStringPntr = NULL;
if (memcmp (InputStringPntr, "www.", 4) == 0)
HostStringPntr = InputStringPntr;
else if (memcmp (InputStringPntr, "://", 3) == 0)
{
ProtocolStringPntr = InputStringPntr;
while (ProtocolStringPntr > BufferPntr &&
isalpha ((unsigned char) ProtocolStringPntr[-1]))
ProtocolStringPntr--;
Word.assign (ProtocolStringPntr,
(InputStringPntr - ProtocolStringPntr) + 1 );
AddWordAndPrefixToSet (Word, PrefixString, WordSet);
HostStringPntr = InputStringPntr + 3;
}
if (HostStringPntr == NULL)
{
InputStringPntr++;
continue;
}
/* Found the start of a host name. Scan the user and host part, which runs
until the next slash or space, like "user:password@computer:port". */
InputStringPntr = HostStringPntr;
AtSignStringPntr = NULL;
while (InputStringPntr < InputStringEndPntr &&
(*InputStringPntr != '/' && !isspace ((unsigned char) *InputStringPntr)))
{
if (*InputStringPntr == '@')
AtSignStringPntr = InputStringPntr;
InputStringPntr++;
}
if (AtSignStringPntr != NULL)
{
Word.assign (HostStringPntr,
AtSignStringPntr - HostStringPntr + 1 );
AddWordAndPrefixToSet (Word, PrefixString, WordSet);
HostStringPntr = AtSignStringPntr + 1;
}
Word.assign (HostStringPntr, InputStringPntr - HostStringPntr);
AddWordAndPrefixToSet (Word, PrefixString, WordSet);
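/* Now scan the path, which follows the host. Note where the options start,
marked by the first '?' or '#' separators (they're stored as separate
options). Stops at white space or a double quote mark. */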
PathStringPntr = InputStringPntr;
OptionsStringPntr = NULL;
while (InputStringPntr < InputStringEndPntr &&
(*InputStringPntr != '"' && !isspace ((unsigned char) *InputStringPntr)))
{
if (OptionsStringPntr == NULL &&
(*InputStringPntr == '?' || *InputStringPntr == '#'))
OptionsStringPntr = InputStringPntr;
InputStringPntr++;
}
if (OptionsStringPntr == NULL)
{
Word.assign (PathStringPntr, InputStringPntr - PathStringPntr);
AddWordAndPrefixToSet (Word, PrefixString, WordSet);
}
else
{
Word.assign (PathStringPntr, OptionsStringPntr - PathStringPntr);
AddWordAndPrefixToSet (Word, PrefixString, WordSet);
Word.assign (OptionsStringPntr, InputStringPntr - OptionsStringPntr);
AddWordAndPrefixToSet (Word, PrefixString, WordSet);
}
}
return NumberOfBytes;
}
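/* Truncates UTF-8 words that are too long, such as whole sentences in Asian
languages that don't put spaces between words, down to the first UTF-8
character in the word. */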
static size_t TokenizerPassTruncateLongAsianWords (
char *BufferPntr,
size_t NumberOfBytes)
{
char *EndOfStringPntr;
char *InputStringPntr;
int Letter;
char *OutputStringPntr;
char *StartOfInputLongUnicodeWord;
char *StartOfOutputLongUnicodeWord;
InputStringPntr = BufferPntr;
EndOfStringPntr = InputStringPntr + NumberOfBytes;
OutputStringPntr = InputStringPntr;
StartOfInputLongUnicodeWord = NULL;
StartOfOutputLongUnicodeWord = NULL;
/* Copy the text from the input to the output, in place. If we encounter
a sequence of UTF-8 characters that is too long then truncate it down to one
character and reset the output pointer to be after that character, thus
deleting the word. Replacing the deleted characters after it with spaces
won't work since we need to preserve the lack of space to handle those sneaky
HTML artificial word breakers. So that Thelongword<blah>ing becomes
"T<blah>ing" rather than "T <blah>ing", so the next step joins them up into
"Ting" rather than "T" and "ing". The first code in a UTF-8 character is
11xxxxxx and subsequent ones are 10xxxxxx. */
while (InputStringPntr < EndOfStringPntr)
{
Letter = (unsigned char) *InputStringPntr;
if (Letter < 128)
{
if (StartOfInputLongUnicodeWord != NULL)
{
if (InputStringPntr - StartOfInputLongUnicodeWord >
(int) g_MaxWordLength * 2)
{
/* The word is longer than twice g_MaxWordLength bytes. Truncate it
back down to the first UTF-8 character, so find out where the first
character ends (skip past the 10xxxxxx bytes), and rewind the output
pointer to be just after that (ignoring the rest of the long word in
effect). */
OutputStringPntr = StartOfOutputLongUnicodeWord + 1;
while (OutputStringPntr < InputStringPntr)
{
Letter = (unsigned char) *OutputStringPntr;
if (Letter < 128 || Letter >= 192)
break;
++OutputStringPntr;
}
}
StartOfInputLongUnicodeWord = NULL;
}
}
else if (Letter >= 192 && StartOfInputLongUnicodeWord == NULL)
{
/* Start of a UTF-8 word. Remember where it starts so that we can tell later
if this is a too long UTF-8 word, which is often a whole sentence in
Asian languages, since they sort of use a single character per word. */
StartOfInputLongUnicodeWord = InputStringPntr;
StartOfOutputLongUnicodeWord = OutputStringPntr;
}
*OutputStringPntr++ = *InputStringPntr++;
}
return OutputStringPntr - BufferPntr;
}
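/* Illustrative sketch, not part of the original program. The truncation
above relies on telling a UTF-8 lead byte (under 128, or 192 and up) from a
continuation byte (128 to 191). The hypothetical helper below measures the
first character of a buffer using the same test.

```cpp
#include <cassert>
#include <cstddef>

// Returns the number of bytes in the first UTF-8 character of Text: the
// lead byte plus any 10xxxxxx continuation bytes that follow it.
static size_t SketchFirstCharLength (const char *Text, size_t Size)
{
  assert (Text != NULL || Size == 0);
  if (Size == 0)
    return 0;
  size_t Length = 1;
  while (Length < Size)
  {
    unsigned char Letter = (unsigned char) Text[Length];
    if (Letter < 128 || Letter >= 192)
      break; // ASCII or the lead byte of the next character.
    Length++;
  }
  return Length;
}
```

For the three byte Japanese period followed by ASCII text this should report
a length of three, which is where the output pointer would be rewound to. */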
/* Breaks the text into words separated by white space and adds them to the
word set. The characters considered white space are defined by
g_SpaceCharacters. This
function is also used as a subroutine by other tokenizer functions when they
have a bunch of presumably plain text they want broken into words and added. */
static size_t TokenizerPassGetPlainWords (
char *BufferPntr,
size_t NumberOfBytes,
char PrefixCharacter,
set<string> &WordSet)
{
string AccumulatedWord;
char *EndOfStringPntr;
size_t Length;
int Letter;
if (NumberOfBytes <= 0)
return 0;
if (PrefixCharacter != 0)
AccumulatedWord = PrefixCharacter;
EndOfStringPntr = BufferPntr + NumberOfBytes;
while (true)
{
if (BufferPntr >= EndOfStringPntr)
Letter = EOF;
else
Letter = (unsigned char) *BufferPntr++;
/* See if the letter is a space, which ends the word. Some punctuation
like dashes and periods aren't considered as space. Note that codes above
127 are UTF-8 characters, which we consider non-space. */
if (Letter < 0 ||
(Letter < 128 && g_SpaceCharacters[Letter]))
{
while ((Length = AccumulatedWord.size()) > 0 &&
AccumulatedWord [Length-1] == '.')
AccumulatedWord.resize (Length - 1);
/* Trailing periods have been removed. Add the word to the set, ignoring
words which are too big (it's probably some binary encoded data). But
leave room for supercalifragilisticexpialidocious. According to one web
site, pneumonoultramicroscopicsilicovolcanoconiosis is the longest word
currently in English. Note that some uuencoded data was seen with a 60
character line length. */
if (PrefixCharacter != 0)
Length--;
if (Length > 0 && Length <= g_MaxWordLength)
WordSet.insert (AccumulatedWord);
/* Empty out the accumulated word, ready for the next one, and
start it off with the prefix character if any. */
if (PrefixCharacter != 0)
AccumulatedWord = PrefixCharacter;
else
AccumulatedWord.resize (0);
}
else
AccumulatedWord.append (1 , (char) Letter);
if (Letter < 0)
break;
}
return NumberOfBytes;
}
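/* Illustrative sketch, not part of the original program. It restates the
word splitting rule with std::string so it can be tried on its own: letters,
digits, apostrophe, dash, dollar sign, period and any byte over 127 belong
to a word, everything else is a space, and trailing periods are dropped.
SketchGetPlainWords is a hypothetical name, and the prefix character is left
out for brevity.

```cpp
#include <cassert>
#include <set>
#include <string>

static std::set<std::string> SketchGetPlainWords (
  const std::string &Text, size_t MaxWordLength)
{
  assert (MaxWordLength > 0);
  std::set<std::string> Words;
  std::string Word;
  for (size_t i = 0; i <= Text.size (); i++)
  {
    // Treat the end of the text like one more space.
    unsigned char Letter = (i < Text.size ()) ? Text[i] : ' ';
    bool IsWordCharacter = Letter >= 128 ||
      (Letter >= '0' && Letter <= '9') ||
      (Letter >= 'A' && Letter <= 'Z') ||
      (Letter >= 'a' && Letter <= 'z') ||
      Letter == '\'' || Letter == '-' || Letter == '$' || Letter == '.';
    if (IsWordCharacter)
    {
      Word += (char) Letter;
      continue;
    }
    // End of a word: strip trailing periods, then keep it if not too long.
    while (!Word.empty () && Word[Word.size () - 1] == '.')
      Word.resize (Word.size () - 1);
    if (!Word.empty () && Word.size () <= MaxWordLength)
      Words.insert (Word);
    Word.clear ();
  }
  return Words;
}
```

Keeping the period as a word character is what lets an IP address such as
10.0.0.1 survive as a single word while "now." is stored as "now". */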
/* Deletes things from the buffer that start with a given string and end with
a given end string, such as "<!--" and "-->" for HTML comment things. All the
text between the markers will be added to the word list before it gets deleted
from the buffer. The markers must be prepared in lower case and the buffer is
assumed to have already been converted to lower case. You can specify an empty
string for the end marker if you're just matching a string constant like
"&nbsp;", which you would put in the starting marker. This is a utility
function used by other tokenizer functions. */
static size_t TokenizerUtilRemoveStartEndThing (
char *BufferPntr,
size_t NumberOfBytes,
char PrefixCharacter,
set<string> &WordSet,
const char *ThingStartCode,
const char *ThingEndCode,
bool ReplaceWithSpace)
{
char *EndOfStringPntr;
bool FoundAndDeletedThing;
char *InputStringPntr;
char *OutputStringPntr;
int ThingEndLength;
char *ThingEndPntr;
int ThingStartLength;
InputStringPntr = BufferPntr;
EndOfStringPntr = InputStringPntr + NumberOfBytes;
OutputStringPntr = InputStringPntr;
ThingStartLength = strlen (ThingStartCode);
ThingEndLength = strlen (ThingEndCode);
if (ThingStartLength <= 0)
return NumberOfBytes;
while (InputStringPntr < EndOfStringPntr)
{
FoundAndDeletedThing = false;
if (EndOfStringPntr - InputStringPntr >=
ThingStartLength + ThingEndLength &&
*InputStringPntr == *ThingStartCode &&
memcmp (InputStringPntr, ThingStartCode, ThingStartLength) == 0)
{
/* Found the start marker, now look for the end marker. If the end is an
empty string, then we've found it right now! */
ThingEndPntr = InputStringPntr + ThingStartLength;
while (EndOfStringPntr - ThingEndPntr >= ThingEndLength)
{
if (ThingEndLength == 0 ||
(*ThingEndPntr == *ThingEndCode &&
memcmp (ThingEndPntr, ThingEndCode, ThingEndLength) == 0))
{
/* Found the end marker too. Add the words between the start
and end markers into the words list. */
TokenizerPassGetPlainWords (InputStringPntr + ThingStartLength,
ThingEndPntr - (InputStringPntr + ThingStartLength),
PrefixCharacter, WordSet);
/* Delete the thing, markers included, by moving the input
pointer to just after the ending tag. */
InputStringPntr = ThingEndPntr + ThingEndLength;
if (ReplaceWithSpace)
*OutputStringPntr++ = ' ';
FoundAndDeletedThing = true;
break;
}
ThingEndPntr++;
}
}
if (!FoundAndDeletedThing)
*OutputStringPntr++ = *InputStringPntr++;
}
return OutputStringPntr - BufferPntr;
}
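/* Illustrative sketch, not part of the original program. It shows the
start/end marker deletion on a std::string rather than in place, and leaves
out the step that adds the enclosed words to the word set. The name
SketchRemoveSpans is hypothetical.

```cpp
#include <cassert>
#include <string>

// Deletes every Start...End span from Text, optionally leaving one space in
// its place. A Start with no matching End after it is left untouched.
static std::string SketchRemoveSpans (const std::string &Text,
  const std::string &Start, const std::string &End, bool ReplaceWithSpace)
{
  assert (!Start.empty ());
  std::string Output;
  size_t Position = 0;
  while (Position < Text.size ())
  {
    size_t StartAt = Text.find (Start, Position);
    size_t EndAt = (StartAt == std::string::npos) ? std::string::npos :
      Text.find (End, StartAt + Start.size ());
    if (EndAt == std::string::npos)
    {
      Output.append (Text, Position, std::string::npos);
      break; // Nothing more to delete, keep the rest as it is.
    }
    Output.append (Text, Position, StartAt - Position);
    if (ReplaceWithSpace)
      Output += ' ';
    Position = EndAt + End.size ();
  }
  return Output;
}
```

An empty End matches immediately after Start, which is how a constant such
as "&nbsp;" gets turned into a single space. */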
static size_t TokenizerPassRemoveHTMLComments (
char *BufferPntr,
size_t NumberOfBytes,
char PrefixCharacter,
set<string> &WordSet)
{
return TokenizerUtilRemoveStartEndThing (BufferPntr, NumberOfBytes,
PrefixCharacter, WordSet, "<!--", "-->", false);
}
static size_t TokenizerPassRemoveHTMLStyle (
char *BufferPntr,
size_t NumberOfBytes,
char PrefixCharacter,
set<string> &WordSet)
{
return TokenizerUtilRemoveStartEndThing (BufferPntr, NumberOfBytes,
PrefixCharacter, WordSet,
"<style", "/style>", false );
}
/* Converts Japanese periods into spaces, so that the
start of the next sentence is recognised at least as the start of a very long
word. The Japanese comma also does the same job. */
static size_t TokenizerPassJapanesePeriodsToSpaces (
char *BufferPntr,
size_t NumberOfBytes,
char PrefixCharacter,
set<string> &WordSet)
{
size_t BytesRemaining = NumberOfBytes;
BytesRemaining = TokenizerUtilRemoveStartEndThing (BufferPntr,
BytesRemaining, PrefixCharacter, WordSet, "。" , "", true);
BytesRemaining = TokenizerUtilRemoveStartEndThing (BufferPntr,
BytesRemaining, PrefixCharacter, WordSet, "、" , "", true);
return BytesRemaining;
}
/* Removes HTML tags. The words inside each tag are added to the word set
before being deleted. <P>, <BR> and &nbsp; are replaced by spaces at this
stage while other HTML things get replaced by nothing. */
static size_t TokenizerPassRemoveHTMLTags (
char *BufferPntr,
size_t NumberOfBytes,
char PrefixCharacter,
set<string> &WordSet)
{
size_t BytesRemaining = NumberOfBytes;
BytesRemaining = TokenizerUtilRemoveStartEndThing (BufferPntr,
BytesRemaining, PrefixCharacter, WordSet, "&nbsp;", "", true);
BytesRemaining = TokenizerUtilRemoveStartEndThing (BufferPntr,
BytesRemaining, PrefixCharacter, WordSet, "<p", ">", true);
BytesRemaining = TokenizerUtilRemoveStartEndThing (BufferPntr,
BytesRemaining, PrefixCharacter, WordSet, "<br", ">", true);
BytesRemaining = TokenizerUtilRemoveStartEndThing (BufferPntr,
BytesRemaining, PrefixCharacter, WordSet, "<", ">", false);
return BytesRemaining;
}
/*
* Implementation of the ABSApp class, constructor, destructor and the rest of
* the member functions in mostly alphabetical order.
*/
ABSApp::ABSApp ()
: BApplication (g_ABSAppSignature),
m_DatabaseHasChanged (false),
m_SettingsHaveChanged (false)
{
status_t ErrorCode;
int HalvingCount;
int i;
const void *ResourceData;
size_t ResourceSize;
BResources *ResourcesPntr;
MakeDatabaseEmpty ();
/* Set up the pathname which identifies our settings directory. Note that
the actual settings are loaded later on (or set to defaults) by the main()
function, before this BApplication starts running. So we don't bother
initialising the other setting related variables here. */
ErrorCode =
find_directory (B_USER_SETTINGS_DIRECTORY, &m_SettingsDirectoryPath);
if (ErrorCode == B_OK)
ErrorCode = m_SettingsDirectoryPath.Append (g_SettingsDirectoryName);
if (ErrorCode != B_OK)
m_SettingsDirectoryPath.SetTo (".");
/* Set up the table which identifies which characters are spaces and which
are not. Spaces are all control characters and all punctuation except for:
apostrophe (so "it's" and possessive versions of words get stored), dash (for
hyphenated words), dollar sign (for cash amounts), period (for IP addresses,
we later remove trailing periods). */
memset (g_SpaceCharacters, 1, sizeof (g_SpaceCharacters));
g_SpaceCharacters['\''] = false;
g_SpaceCharacters['-'] = false;
g_SpaceCharacters['$'] = false;
g_SpaceCharacters['.'] = false;
for (i = '0'; i <= '9'; i++)
g_SpaceCharacters[i] = false;
for (i = 'A'; i <= 'Z'; i++)
g_SpaceCharacters[i] = false;
for (i = 'a'; i <= 'z'; i++)
g_SpaceCharacters[i] = false;
if ((ResourcesPntr = AppResources ()) != NULL && (ResourceData =
ResourcesPntr->LoadResource ('CURS', "Busy Cursor", &ResourceSize)) != NULL
&& ResourceSize >= 68 )
g_BusyCursor = new BCursor (ResourceData);
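/* Find the smallest usable double by halving 1.0 until it underflows to
zero, counting the number of halvings needed. */
m_SmallestUseableDouble = 1.0;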
HalvingCount = 0;
while (HalvingCount < 10000 && m_SmallestUseableDouble > 0.0)
{
HalvingCount++;
m_SmallestUseableDouble /= 2;
}
/* Recompute the number, but stop short of underflow by allowing for the
precision bits and a bit of extra margin for intermediate results in future
calculations. */
HalvingCount -= 50 + sizeof (double) * 8;
m_SmallestUseableDouble = 1.0;
while (HalvingCount > 0)
{
HalvingCount--;
m_SmallestUseableDouble /= 2;
}
}
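/* Illustrative sketch, not part of the original program. It repeats the
halving trick from the end of the constructor as a free function with the
hypothetical name SketchSmallestUseableDouble, so the result can be looked
at in isolation. The volatile qualifier is an addition of this sketch, meant
to keep the compiler from holding the value in a wider register.

```cpp
#include <cassert>
#include <cmath>

static double SketchSmallestUseableDouble ()
{
  // Count how many halvings it takes for 1.0 to underflow to zero.
  int HalvingCount = 0;
  volatile double Value = 1.0;
  while (HalvingCount < 10000 && Value > 0.0)
  {
    HalvingCount++;
    Value = Value / 2;
  }
  assert (Value == 0.0);

  // Back off by the number of bits in a double plus a margin of 50, then
  // rebuild the value as 2 to the power of minus the remaining count.
  HalvingCount -= 50 + (int) sizeof (double) * 8;
  return std::ldexp (1.0, -HalvingCount);
}
```

On IEEE 754 doubles the result should be a tiny but still positive number,
far below 1e-100, which leaves headroom before later multiplications reach
zero. */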
ABSApp::~ABSApp ()
{
status_t ErrorCode;
char ErrorMessage [PATH_MAX + 1024];
if (m_SettingsHaveChanged)
LoadSaveSettings (false );
if ((ErrorCode = SaveDatabaseIfNeeded (ErrorMessage)) != B_OK)
DisplayErrorMessage (ErrorMessage, ErrorCode, "Exiting Error");
delete g_BusyCursor;
g_BusyCursor = NULL;
}
void
ABSApp::AboutRequested ()
{
BAlert *AboutAlertPntr;
AboutAlertPntr = new BAlert ("About",
"SpamDBM - Spam Database Manager\n\n"
"This is a BeOS program for classifying e-mail messages as spam (unwanted \
junk mail) or as genuine mail using a Bayesian statistical approach. There \
is also a Mail Daemon Replacement add-on to filter mail using the \
classification statistics collected earlier.\n\n"
"Written by Alexander G. M. Smith, fall 2002.\n\n"
"The original idea was from Paul Graham's algorithm, which has an excellent \
writeup at: http://www.paulgraham.com/spam.html\n\n"
"Gary Robinson came up with the improved algorithm, which you can read about \
at: http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html\n\n"
"Mr. Robinson, Tim Peters and the SpamBayes mailing list people then \
developed the even better chi-squared scoring method.\n\n"
"Icon courtesy of Isaac Yonemoto, though it is no longer used since Hormel \
doesn't want their meat product associated with junk e-mail.\n\n"
"Tokenising code updated in 2005 to use some of the tricks that SpamBayes \
uses to extract words from messages. In particular, HTML is now handled.\n\n"
"Released to the public domain, with no warranty.\n"
"$Revision: 30630 $\n"
"Compiled on " __DATE__ " at " __TIME__ ".", "Done");
if (AboutAlertPntr != NULL)
{
AboutAlertPntr->SetFlags(AboutAlertPntr->Flags() | B_CLOSE_ON_ESCAPE);
AboutAlertPntr->Go ();
}
}
/* Adds all the words in the given file to the database as either a spam or
genuine message, or removes it from the database if you claim it is
CL_UNCERTAIN. Also resets the spam ratio attribute to show the effect of the
database change. */
status_t ABSApp::AddFileToDatabase (
ClassificationTypes IsSpamOrWhat,
const char *FileName,
char *ErrorMessage)
{
status_t ErrorCode;
BFile MessageFile;
BMessage TempBMessage;
ErrorCode = MessageFile.SetTo (FileName, B_READ_ONLY);
if (ErrorCode != B_OK)
{
sprintf (ErrorMessage, "Unable to open file \"%s\" for reading", FileName);
return ErrorCode;
}
ErrorCode = AddPositionIOToDatabase (IsSpamOrWhat,
&MessageFile, FileName, ErrorMessage);
MessageFile.Unset ();
if (ErrorCode != B_OK)
return ErrorCode;
return EvaluateFile (FileName, &TempBMessage, ErrorMessage);
}
/* The words in the message read from MessageIOPntr
will be added to the database (incrementing the count for the number of
messages using each word, either the spam or genuine count depending on
IsSpamOrWhat). It will remove the message (decrement the word counts) if you
specify CL_UNCERTAIN as the new classification. And if it switches from spam
to genuine or vice versa, it will do both - decrement the counts for the old
class and increment the counts for the new one. An attribute will be added to
MessageIOPntr (if it is a file) to record that it has been marked as Spam or
Genuine (so that it doesn't get added to the database a second time). If it is
being removed from the database, the classification attribute gets removed too.
If things go wrong, a non-zero error code will be returned and an explanation
written to ErrorMessage (assumed to be at least PATH_MAX + 1024 bytes long).
OptionalFileName is just used in the error message to identify the file to the
user. */
status_t ABSApp::AddPositionIOToDatabase (
ClassificationTypes IsSpamOrWhat,
BPositionIO *MessageIOPntr,
const char *OptionalFileName,
char *ErrorMessage)
{
BNode *BNodePntr;
char ClassificationString [NAME_MAX];
StatisticsMap::iterator DataIter;
status_t ErrorCode = 0;
pair<StatisticsMap::iterator,bool> InsertResult;
uint32 NewAge;
StatisticsRecord NewStatistics;
ClassificationTypes PreviousClassification;
StatisticsPointer StatisticsPntr;
set<string>::iterator WordEndIter;
set<string>::iterator WordIter;
set<string> WordSet;
NewAge = m_TotalGenuineMessages + m_TotalSpamMessages;
if (NewAge >= 0xFFFFFFF0UL)
{
sprintf (ErrorMessage,
"The database is full! There are %" B_PRIu32 " messages in "
"it and we can't add any more without overflowing the maximum integer "
"representation in 32 bits", NewAge);
return B_NO_MEMORY;
}
PreviousClassification = CL_UNCERTAIN;
BNodePntr = dynamic_cast<BNode *> (MessageIOPntr);
if (BNodePntr != NULL)
{
ErrorCode = BNodePntr->ReadAttr (g_AttributeNameClassification,
B_STRING_TYPE, 0 , ClassificationString,
sizeof (ClassificationString) - 1);
if (ErrorCode <= 0)
strcpy (ClassificationString, "none");
else
ClassificationString [ErrorCode] = 0;
if (strcasecmp (ClassificationString, g_ClassifiedSpam) == 0)
PreviousClassification = CL_SPAM;
else if (strcasecmp (ClassificationString, g_ClassifiedGenuine) == 0)
PreviousClassification = CL_GENUINE;
}
if (!m_IgnorePreviousClassification &&
PreviousClassification != CL_UNCERTAIN)
{
if (IsSpamOrWhat == PreviousClassification)
{
sprintf (ErrorMessage, "Ignoring file \"%s\" since it seems to have "
"already been classified as %s.", OptionalFileName,
g_ClassificationTypeNames [IsSpamOrWhat]);
}
else
{
sprintf (ErrorMessage, "Changing existing classification of file \"%s\" "
"from %s to %s.", OptionalFileName,
g_ClassificationTypeNames [PreviousClassification],
g_ClassificationTypeNames [IsSpamOrWhat]);
}
DisplayErrorMessage (ErrorMessage, 0, "Note");
}
if (!m_IgnorePreviousClassification &&
IsSpamOrWhat == PreviousClassification)
/* Nothing to do if it is already classified correctly and the user doesn't
want double classification. */
return B_OK;
ErrorCode = GetWordsFromPositionIO (MessageIOPntr, OptionalFileName,
WordSet, ErrorMessage);
if (ErrorCode != B_OK)
return ErrorCode;
/* Update the counts of classified messages, undoing the previous count if
reclassifying a message. */
m_DatabaseHasChanged = true;
if (!m_IgnorePreviousClassification &&
PreviousClassification == CL_SPAM && m_TotalSpamMessages > 0)
m_TotalSpamMessages--;
if (IsSpamOrWhat == CL_SPAM)
m_TotalSpamMessages++;
if (!m_IgnorePreviousClassification &&
PreviousClassification == CL_GENUINE && m_TotalGenuineMessages > 0)
m_TotalGenuineMessages--;
if (IsSpamOrWhat == CL_GENUINE)
m_TotalGenuineMessages++;
/* Mark the file with its new classification. Don't care if the attribute
operation fails. */
if (BNodePntr != NULL)
{
ErrorCode = BNodePntr->RemoveAttr (g_AttributeNameClassification);
if (IsSpamOrWhat != CL_UNCERTAIN)
{
strcpy (ClassificationString, g_ClassificationTypeNames [IsSpamOrWhat]);
ErrorCode = BNodePntr->WriteAttr (g_AttributeNameClassification,
B_STRING_TYPE, 0 ,
ClassificationString, strlen (ClassificationString) + 1);
}
}
/* Add the words to the database, incrementing or decrementing the counts
for each word as appropriate. */
WordEndIter = WordSet.end ();
for (WordIter = WordSet.begin (); WordIter != WordEndIter; WordIter++)
{
if ((DataIter = m_WordMap.find (*WordIter)) == m_WordMap.end ())
{
if (IsSpamOrWhat == CL_UNCERTAIN)
continue;
memset (&NewStatistics, 0, sizeof (NewStatistics));
InsertResult = m_WordMap.insert (
StatisticsMap::value_type (*WordIter, NewStatistics));
if (!InsertResult.second)
{
sprintf (ErrorMessage, "Failed to insert new database entry for "
"word \"%s\", while processing file \"%s\"",
WordIter->c_str (), OptionalFileName);
return B_NO_MEMORY;
}
DataIter = InsertResult.first;
m_WordCount++;
}
StatisticsPntr = &DataIter->second;
StatisticsPntr->age = NewAge;
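/* The oldest age statistic isn't updated here, since it would take a scan
of the whole database to
find the next older age. Since it's only used for display, we'll let it be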
slightly incorrect. The next database load or purge will fix it. */
if (IsSpamOrWhat == CL_SPAM)
StatisticsPntr->spamCount++;
if (IsSpamOrWhat == CL_GENUINE)
StatisticsPntr->genuineCount++;
if (!m_IgnorePreviousClassification &&
PreviousClassification == CL_SPAM && StatisticsPntr->spamCount > 0)
StatisticsPntr->spamCount--;
if (!m_IgnorePreviousClassification &&
PreviousClassification == CL_GENUINE && StatisticsPntr->genuineCount > 0)
StatisticsPntr->genuineCount--;
}
return B_OK;
}
/* Adds the words in the given string to the database as either a spam or
genuine message. */
status_t ABSApp::AddStringToDatabase (
ClassificationTypes IsSpamOrWhat,
const char *String,
char *ErrorMessage)
{
BMemoryIO MemoryIO (String, strlen (String));
return AddPositionIOToDatabase (IsSpamOrWhat, &MemoryIO,
"Memory Buffer" , ErrorMessage);
}
/* Find the words in the given text (running tokenizer passes which also
extract words from HTML), and add them to the set. Allow NULs in the text. If
the PrefixCharacter isn't zero then it is prepended to all words found (so you
can distinguish words as being from a header or from the body text). See also
TokenizeWhole which does something similar. */
void
ABSApp::AddWordsToSet (
const char *InputString,
size_t NumberOfBytes,
char PrefixCharacter,
set<string> &WordSet)
{
char *BufferPntr;
size_t CurrentSize;
int PassNumber;
/* Copy the text into a working buffer. It gets modified in place as HTML
fragments and other junk are deleted. */
BufferPntr = new char [NumberOfBytes];
if (BufferPntr == NULL)
return;
memcpy (BufferPntr, InputString, NumberOfBytes);
/* Run the tokenizer passes in order. Each pass may shrink the buffer
and may add words to the word set. */
CurrentSize = NumberOfBytes;
for (PassNumber = 1; PassNumber <= 8 && CurrentSize > 0 ; PassNumber++)
{
switch (PassNumber)
{
case 1:
CurrentSize = TokenizerPassLowerCase (BufferPntr, CurrentSize);
break;
case 2: CurrentSize = TokenizerPassJapanesePeriodsToSpaces (
BufferPntr, CurrentSize, PrefixCharacter, WordSet); break;
case 3: CurrentSize = TokenizerPassTruncateLongAsianWords (
BufferPntr, CurrentSize); break;
case 4: CurrentSize = TokenizerPassRemoveHTMLComments (
BufferPntr, CurrentSize, 'Z', WordSet); break;
case 5: CurrentSize = TokenizerPassRemoveHTMLStyle (
BufferPntr, CurrentSize, 'Z', WordSet); break;
case 6: CurrentSize = TokenizerPassExtractURLs (
BufferPntr, CurrentSize, 'Z', WordSet); break;
case 7: CurrentSize = TokenizerPassRemoveHTMLTags (
BufferPntr, CurrentSize, 'Z', WordSet); break;
case 8: CurrentSize = TokenizerPassGetPlainWords (
BufferPntr, CurrentSize, PrefixCharacter, WordSet); break;
default: break;
}
}
delete [] BufferPntr;
}
/* The user has provided command line arguments, either when the program was
first started or in a separate attempt to invoke the program (this
application's resource/attributes have the launch flags set to "single
launch", so the shell doesn't start the program but instead sends the
arguments to the already running instance). In either case, the command is
sent to an intermediary thread where it is asynchronously converted into one
or more scripting messages that are sent back to this BApplication. The
intermediary is needed since we can't recursively execute scripting messages
while processing a message (this ArgvReceived one). */
void
ABSApp::ArgvReceived (int32 argc, char **argv)
{
if (g_CommanderLooperPntr != NULL)
g_CommanderLooperPntr->CommandArguments (argc, argv);
}
/* Create a new empty database. Note that we have to write it out
immediately, otherwise other operations will see the empty database and then
try to load the file, and complain that it doesn't exist. Now they will see
the empty database and redundantly load the empty file. */
status_t ABSApp::CreateDatabaseFile (char *ErrorMessage)
{
MakeDatabaseEmpty ();
m_DatabaseHasChanged = true;
return SaveDatabaseIfNeeded (ErrorMessage);
}
/* Set the settings to the defaults. Needed in case there isn't a settings
file or it is obsolete. */
void
ABSApp::DefaultSettings ()
{
status_t ErrorCode;
BPath DatabasePath (m_SettingsDirectoryPath);
char TempString [PATH_MAX];
ErrorCode = DatabasePath.Append (g_DefaultDatabaseFileName);
if (ErrorCode != B_OK)
strcpy (TempString, g_DefaultDatabaseFileName);
else
strcpy (TempString, DatabasePath.Path ());
m_DatabaseFileName.SetTo (TempString);
m_IgnorePreviousClassification = true;
g_ServerMode = true;
m_PurgeAge = 2000;
m_PurgePopularity = 2;
m_ScoringMode = SM_CHISQUARED;
m_TokenizeMode = TM_ANY_TEXT_HEADER;
m_SettingsHaveChanged = true;
}
/* Deletes the database file and its backup files, empties the database in
memory, and
marks it as not changed so that it doesn't get written out when the program
exits. */
status_t ABSApp::DeleteDatabaseFile (char *ErrorMessage)
{
BEntry FileEntry;
status_t ErrorCode;
int i;
char TempString [PATH_MAX+20];
MakeDatabaseEmpty ();
m_DatabaseHasChanged = false;
for (i = 0; i < g_MaxBackups; i++)
{
strcpy (TempString, m_DatabaseFileName.String ());
sprintf (TempString + strlen (TempString), g_BackupSuffix, i);
ErrorCode = FileEntry.SetTo (TempString);
if (ErrorCode == B_OK)
FileEntry.Remove ();
}
strcpy (TempString, m_DatabaseFileName.String ());
ErrorCode = FileEntry.SetTo (TempString);
if (ErrorCode != B_OK)
{
sprintf (ErrorMessage, "While deleting, failed to make BEntry for "
"\"%s\" (does the directory exist?)", TempString);
return ErrorCode;
}
ErrorCode = FileEntry.Remove ();
if (ErrorCode != B_OK)
sprintf (ErrorMessage, "While deleting, failed to remove file "
"\"%s\"", TempString);
return ErrorCode;
}
/* Evaluates the given file for spaminess and tags it with the
resulting spam probability ratio. If it also has an e-mail subject attribute,
remove the [Spam 99.9%] prefix since the number usually changes. */
status_t ABSApp::EvaluateFile (
const char *PathName,
BMessage *ReplyMessagePntr,
char *ErrorMessage)
{
status_t ErrorCode;
float TempFloat;
BFile TextFile;
ErrorCode = TextFile.SetTo (PathName, B_READ_ONLY);
if (ErrorCode != B_OK)
{
sprintf (ErrorMessage, "Problems opening file \"%s\" for evaluating",
PathName);
return ErrorCode;
}
ErrorCode =
EvaluatePositionIO (&TextFile, PathName, ReplyMessagePntr, ErrorMessage);
if (ErrorCode == B_OK &&
ReplyMessagePntr->FindFloat (g_ResultName, &TempFloat) == B_OK)
{
TextFile.WriteAttr (g_AttributeNameSpamRatio, B_FLOAT_TYPE,
0 , &TempFloat, sizeof (TempFloat));
/* The subject may carry a spam prefix from an earlier evaluation. Just
blindly remove the prefix, which would have the wrong percentage. */
RemoveSpamPrefixFromSubjectAttribute (&TextFile);
}
return ErrorCode;
}
/* Evaluates the text read from the given BPositionIO
for spaminess. The output is added to the ReplyMessagePntr message, with the
probability ratio stored in "result" (0.0 means genuine and 1.0 means spam).
It also adds the most significant words (used in the ratio calculation) to the
array "words" and the associated per-word probability ratios in "ratios". If
it fails, an error code is returned and an error message written to the
ErrorMessage string (which is at least PATH_MAX + 1024 bytes long).
OptionalFileName is only used in the error message.

The math used for combining the individual word probabilities in my method is
based on Gary Robinson's method (formerly it was a variation of Paul Graham's
method) or the Chi-Squared method. Its input is the database of words that
has a count of the number of spam and number of genuine messages each word
appears in (doesn't matter if it appears more than once in a message, it still
counts as 1).

The spam word count is divided by the total number of spam e-mail messages
in the database to get the probability of spam and probability of genuineness
is similarly computed for a particular word. The spam probability is divided
by the sum of the spam and genuine probabilities to get the Raw Spam Ratio for
the word. It's nearer to 0.0 for genuine and nearer to 1.0 for spam, and can
be exactly zero or one too.

To avoid multiplying later results by zero, and to compensate for a lack of
data points, the Raw Spam Ratio is adjusted towards the 0.5 halfway point. The
0.5 is combined with the raw spam ratio, with a weight of 0.45 (determined to
be a good value by the "spambayes" mailing list tests) messages applied to the
half way point and a weight of the number of spam + genuine messages applied to
the raw spam ratio. This gives you the compensated spam ratio for the word.

The top N (150 was good in the spambayes tests) extreme words are selected by
the distance of each word's compensated spam ratio from 0.5. Then the ratios
of the words are combined.

The Gary Robinson combining (scoring) method gets one value from the Nth root
of the product of all the word ratios. The other is the Nth root of the
product of (1 - ratio) for all the words. The final result is the first value
divided by the sum of the two values. The Nth root helps spread the resulting
range of values more evenly between 0.0 and 1.0, otherwise the values all clump
together at 0 or 1. Also you can think of the Nth root as a kind of average
for products; it's like a generic word probability which when multiplied by
itself N times gives you the same result as the N separate actual word
probabilities multiplied together.

The Chi-Squared combining (scoring) method assumes that the spam word
probabilities are uniformly distributed and computes an error measurement
(called chi squared - see http://bmj.com/collections/statsbk/8.shtml for a good
tutorial) and then sees how likely that error value would be observed in
practice. If it's rare to observe, then the words are likely not just randomly
occurring and it's spammy. The same is done for genuine words. The two
resulting unlikelinesses are compared to see which is more unlikely; if neither
is, then the method says it can't decide. The SpamBayes notes (see the
classifier.py file in CVS in http://sourceforge.net/projects/spambayes) say:
"Across vectors of length n, containing random uniformly-distributed
probabilities, -2*sum(ln(p_i)) follows the chi-squared distribution with 2*n
degrees of freedom. This has been proven (in some appropriate sense) to be the
most sensitive possible test for rejecting the hypothesis that a vector of
probabilities is uniformly distributed. Gary Robinson's original scheme was
monotonic *with* this test, but skipped the details. Turns out that getting
closer to the theoretical roots gives a much sharper classification, with a
very small (in # of msgs), but also very broad (in range of scores), "middle
ground", where most of the mistakes live. In particular, this scheme seems
immune to all forms of "cancellation disease": if there are many strong ham
*and* spam clues, this reliably scores close to 0.5. Most other schemes are
extremely certain then -- and often wrong."
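The chi-squared scoring can be sketched as follows. The function names are
hypothetical, and the upper tail probability uses the closed-form series that
exists for an even number of degrees of freedom, which is an assumption made
for this example rather than a copy of this program's own routine. Ratios are
assumed to lie strictly between 0 and 1.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Probability that a chi-squared variable with an even number of degrees of
// freedom is at least ChiSquared (the upper tail).
double ChiSquaredUpperTail (double ChiSquared, int DegreesOfFreedom)
{
  double HalfChi = ChiSquared / 2.0;
  double Term = std::exp (-HalfChi);
  double Sum = Term;

  for (int i = 1; i < DegreesOfFreedom / 2; i++)
  {
    Term *= HalfChi / i;
    Sum += Term;
  }
  return std::min (Sum, 1.0); // Rounding can push the sum slightly over 1.
}

// Spam measure S uses 1-p, genuine measure H uses p, result is S-H scaled
// into the range 0 to 1.
double ChiSquaredScore (const std::vector<double> &Ratios)
{
  if (Ratios.empty ())
    return 0.5;

  double LogRatios = 0.0;  // Sum of ln (p).
  double LogInverse = 0.0; // Sum of ln (1 - p).
  for (std::size_t i = 0; i < Ratios.size (); i++)
  {
    LogRatios += std::log (Ratios[i]);
    LogInverse += std::log (1.0 - Ratios[i]);
  }

  int Degrees = 2 * (int) Ratios.size ();
  double Spam = 1.0 - ChiSquaredUpperTail (-2.0 * LogInverse, Degrees);
  double Genuine = 1.0 - ChiSquaredUpperTail (-2.0 * LogRatios, Degrees);
  return (Spam - Genuine + 1.0) / 2.0;
}
```

With several strong spam words the score lands near 1.0, and with matching
strong clues on both sides it stays near 0.5, as the quoted notes describe.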
I did a test with 448 example genuine messages including personal mail (some
with HTML attachments) and mailing lists, and 267 spam messages for 27471 words
total. Test messages were more recent messages in the same groups. Out of 100
test genuine messages, with Gary Robinson (0.56 cutoff limit), 1 (1%) was
falsely identified as spam and 8 of 73 (11%) spam messages were incorrectly
classified as genuine. With my variation of Paul Graham's scheme (0.90 cutoff)
I got 6 of 100 (6%) genuine messages incorrectly marked as spam and 2 of 73
(3%) spam messages were incorrectly classified as genuine. Pretty close, but
Robinson's values are more evenly spread out so you can tell just how spammy it
is by looking at the number. */
struct WordAndRatioStruct
{
double probabilityRatio;
const string *wordPntr;
bool operator() (
const WordAndRatioStruct &ItemA,
const WordAndRatioStruct &ItemB) const
{
return
(fabs (ItemA.probabilityRatio - 0.5) <
fabs (ItemB.probabilityRatio - 0.5));
};
};
status_t ABSApp::EvaluatePositionIO (
BPositionIO *PositionIOPntr,
const char *OptionalFileName,
BMessage *ReplyMessagePntr,
char *ErrorMessage)
{
StatisticsMap::iterator DataEndIter;
StatisticsMap::iterator DataIter;
status_t ErrorCode;
double GenuineProbability;
uint32 GenuineSpamSum;
int i;
priority_queue<
WordAndRatioStruct ,
vector<WordAndRatioStruct> ,
WordAndRatioStruct >
PriorityQueue;
double ProductGenuine;
double ProductLogGenuine;
double ProductLogSpam;
double ProductSpam;
double RawProbabilityRatio;
float ResultRatio;
double SpamProbability;
StatisticsPointer StatisticsPntr;
double TempDouble;
double TotalGenuine;
double TotalSpam;
WordAndRatioStruct WordAndRatio;
set<string>::iterator WordEndIter;
set<string>::iterator WordIter;
const WordAndRatioStruct *WordRatioPntr;
set<string> WordSet;
ErrorCode = GetWordsFromPositionIO (PositionIOPntr, OptionalFileName,
WordSet, ErrorMessage);
if (ErrorCode != B_OK)
return ErrorCode;
/* Keep the message totals as doubles for the later calculations (to avoid
the overhead of multiple conversions from integer to double), with extra
precautions to avoid divide by zero. */
if (m_TotalGenuineMessages <= 0)
TotalGenuine = 1.0;
else
TotalGenuine = m_TotalGenuineMessages;
if (m_TotalSpamMessages <= 0)
TotalSpam = 1.0;
else
TotalSpam = m_TotalSpamMessages;
/* Look up each word of the message in the database and work out its spam
ratio. The results are stored in a priority queue so that we can later find
the top g_MaxInterestingWords for doing the actual determination. */
WordEndIter = WordSet.end ();
DataEndIter = m_WordMap.end ();
for (WordIter = WordSet.begin (); WordIter != WordEndIter; WordIter++)
{
WordAndRatio.wordPntr = &(*WordIter);
if ((DataIter = m_WordMap.find (*WordIter)) != DataEndIter)
{
StatisticsPntr = &DataIter->second;
/* Find the fraction of spam messages containing the word, and likewise for
genuine. Then the raw probability ratio. */
SpamProbability = StatisticsPntr->spamCount / TotalSpam;
GenuineProbability = StatisticsPntr->genuineCount / TotalGenuine;
if (SpamProbability + GenuineProbability > 0)
RawProbabilityRatio =
SpamProbability / (SpamProbability + GenuineProbability);
else
RawProbabilityRatio = 0.5;
/* Adjust the raw ratio towards the halfway point to compensate for a lack of
data points, with a weight of 0.45 (g_RobinsonS). */
GenuineSpamSum =
StatisticsPntr->spamCount + StatisticsPntr->genuineCount;
WordAndRatio.probabilityRatio =
(g_RobinsonS * g_RobinsonX + GenuineSpamSum * RawProbabilityRatio) /
(g_RobinsonS + GenuineSpamSum);
}
else
WordAndRatio.probabilityRatio = g_RobinsonX;
PriorityQueue.push (WordAndRatio);
}
/* Multiply together the ratios of the most extreme words. To avoid numeric
underflow (doubles can only get as small as about 1E-300), logarithms are also
used. But avoid the logarithms (sum of logs of numbers is the same as the
product of numbers) as much as possible due to reduced accuracy and
slowness. */
ProductGenuine = 1.0;
ProductLogGenuine = 0.0;
ProductSpam = 1.0;
ProductLogSpam = 0.0;
for (i = 0;
i < g_MaxInterestingWords && !PriorityQueue.empty();
i++, PriorityQueue.pop())
{
WordRatioPntr = &PriorityQueue.top();
ProductSpam *= WordRatioPntr->probabilityRatio;
ProductGenuine *= 1.0 - WordRatioPntr->probabilityRatio;
/* See if the products are getting too small to fit in a double.
If they are, move the value into the logarithm storage part. */
if (ProductSpam < m_SmallestUseableDouble)
{
ProductLogSpam += log (ProductSpam);
ProductSpam = 1.0;
}
if (ProductGenuine < m_SmallestUseableDouble)
{
ProductLogGenuine += log (ProductGenuine);
ProductGenuine = 1.0;
}
ReplyMessagePntr->AddString ("words", WordRatioPntr->wordPntr->c_str ());
ReplyMessagePntr->AddFloat ("ratios", WordRatioPntr->probabilityRatio);
}
if (i > 0)
{
ProductLogSpam += log (ProductSpam);
ProductLogGenuine += log (ProductGenuine);
}
if (m_ScoringMode == SM_ROBINSON)
{
/* Take the Nth root of the products, where N is the number of words used.
This is easiest in logarithm form. */
if (i > 0)
{
ProductSpam = exp (ProductLogSpam / i);
ProductGenuine = exp (ProductLogGenuine / i);
ResultRatio = ProductSpam / (ProductGenuine + ProductSpam);
}
else
ResultRatio = g_RobinsonX;
}
else if (m_ScoringMode == SM_CHISQUARED)
{
/* From the SpamBayes notes: "We compute two chi-squared statistics, one
for ham and one for spam. The sum-of-the-logs business is more sensitive
to probs near 0 than to probs near 1, so the spam measure uses 1-p (so that
high-spamprob words have greatest effect), and the ham measure uses p
directly (so that lo-spamprob words have greatest effect)." That means we
just reversed the meaning of the previously calculated spam and genuine
products! Oh well. */
TempDouble = ProductLogSpam;
ProductLogSpam = ProductLogGenuine;
ProductLogGenuine = TempDouble;
if (i > 0)
{
ProductSpam =
1.0 - ChiSquaredProbability (-2.0 * ProductLogSpam, 2 * i);
ProductGenuine =
1.0 - ChiSquaredProbability (-2.0 * ProductLogGenuine, 2 * i);
/* The SpamBayes notes say: "How to combine these into a single spam
score? We originally used (S-H)/(S+H) scaled into [0., 1.], which equals
S/(S+H). A systematic problem is that we could end up being near-certain
a thing was (for example) spam, even if S was small, provided that H was
much smaller. Rob Hooft stared at these problems and invented the
measure we use now, the simpler S-H, scaled into [0., 1.]." */
ResultRatio = (ProductSpam - ProductGenuine + 1.0) / 2.0;
}
else
ResultRatio = 0.5;
}
else
{
strcpy (ErrorMessage, "Unknown scoring mode specified in settings");
return B_BAD_VALUE;
}
ReplyMessagePntr->AddFloat (g_ResultName, ResultRatio);
return B_OK;
}
status_t ABSApp::EvaluateString (
const char *BufferPntr,
ssize_t BufferSize,
BMessage *ReplyMessagePntr,
char *ErrorMessage)
{
BMemoryIO MemoryIO (BufferPntr, BufferSize);
return EvaluatePositionIO (&MemoryIO, "Memory Buffer",
ReplyMessagePntr, ErrorMessage);
}
/* Tell other programs which scripting suites this program supports. Try the
command: "hey application/x-vnd.agmsmith.spamdbm getsuites" to
see it in action (this program has to be already running for it to work). */
status_t ABSApp::GetSupportedSuites (BMessage *MessagePntr)
{
BPropertyInfo TempPropInfo (g_ScriptingPropertyList);
MessagePntr->AddString ("suites", "suite/x-vnd.agmsmith.spamdbm");
MessagePntr->AddFlat ("messages", &TempPropInfo);
return BApplication::GetSupportedSuites (MessagePntr);
}
/* Read the message from the PositionIO and split it into a set of words.
The file name is only there for error messages; it assumes you have already
opened the PositionIO to the right file. If things go wrong, a non-zero error
code will be returned and an explanation written to ErrorMessage (assumed to be
at least PATH_MAX + 1024 bytes long). */
status_t ABSApp::GetWordsFromPositionIO (
BPositionIO *PositionIOPntr,
const char *OptionalFileName,
set<string> &WordSet,
char *ErrorMessage)
{
status_t ErrorCode;
if (m_TokenizeMode == TM_WHOLE)
ErrorCode = TokenizeWhole (PositionIOPntr, OptionalFileName,
WordSet, ErrorMessage);
else
ErrorCode = TokenizeParts (PositionIOPntr, OptionalFileName,
WordSet, ErrorMessage);
if (ErrorCode == B_OK && WordSet.empty ())
{
/* Report an error for messages with
no words, a good indicator of spam which is pure HTML. */
sprintf (ErrorMessage, "No words were found in \"%s\"", OptionalFileName);
ErrorCode = ENOMSG;
}
return ErrorCode;
}
/* Set up indices for the classification attribute (string) and for
MAIL:ratio_spam (float) on all mounted disk volumes that support queries. Also
tell the system to make those attributes visible to the user (so they can see
them in Tracker) and associate them with e-mail messages. Also set up the
database file MIME type (provide a description and associate it with this
program so that it picks up the right icon). And register the names for our
sound effects. */
status_t ABSApp::InstallThings (char *ErrorMessage)
{
int32 Cookie;
dev_t DeviceID;
status_t ErrorCode = B_OK;
fs_info FSInfo;
int32 i;
int32 iClassification;
int32 iProbability;
int32 j;
index_info IndexInfo;
BMimeType MimeType;
BMessage Parameters;
const char *StringPntr;
bool TempBool;
int32 TempInt32;
/* Go through all mounted volumes and create the indices on each
one. Don't bother if the index exists or the device doesn't support indices
(actually queries). */
Cookie = 0;
while ((DeviceID = next_dev (&Cookie)) >= 0)
{
if (!fs_stat_dev (DeviceID, &FSInfo) && (FSInfo.flags & B_FS_HAS_QUERY))
{
if (fs_stat_index (DeviceID, g_AttributeNameClassification, &IndexInfo)
&& errno == B_ENTRY_NOT_FOUND)
{
if (fs_create_index (DeviceID, g_AttributeNameClassification,
B_STRING_TYPE, 0 ))
{
ErrorCode = errno;
sprintf (ErrorMessage, "Unable to make string index %s on "
"volume #%d, volume name \"%s\", file system type \"%s\", "
"on device \"%s\"", g_AttributeNameClassification,
(int) DeviceID, FSInfo.volume_name, FSInfo.fsh_name,
FSInfo.device_name);
}
}
if (fs_stat_index (DeviceID, g_AttributeNameSpamRatio,
&IndexInfo) && errno == B_ENTRY_NOT_FOUND)
{
if (fs_create_index (DeviceID, g_AttributeNameSpamRatio,
B_FLOAT_TYPE, 0 ))
{
ErrorCode = errno;
sprintf (ErrorMessage, "Unable to make float index %s on "
"volume #%d, volume name \"%s\", file system type \"%s\", "
"on device \"%s\"", g_AttributeNameSpamRatio,
(int) DeviceID, FSInfo.volume_name, FSInfo.fsh_name,
FSInfo.device_name);
}
}
}
}
if (ErrorCode != B_OK)
return ErrorCode;
/* Associate the classification and spam ratio attributes
with e-mail and make them visible to the user (but not editable). First need
to get the existing MIME settings, then add ours to them (otherwise the
existing ones get wiped out). */
ErrorCode = MimeType.SetTo ("text/x-email");
if (ErrorCode != B_OK || !MimeType.IsInstalled ())
{
sprintf (ErrorMessage, "No e-mail MIME type (%s) in the system, can't "
"update it to add our special attributes, and without e-mail this "
"program is useless!", MimeType.Type ());
if (ErrorCode == B_OK)
ErrorCode = -1;
return ErrorCode;
}
ErrorCode = MimeType.GetAttrInfo (&Parameters);
if (ErrorCode != B_OK)
{
sprintf (ErrorMessage, "Unable to retrieve list of attributes "
"associated with e-mail messages in the MIME database");
return ErrorCode;
}
for (i = 0, iClassification = -1, iProbability = -1;
i < 1000 && (iClassification < 0 || iProbability < 0);
i++)
{
ErrorCode = Parameters.FindString ("attr:name", i, &StringPntr);
if (ErrorCode != B_OK)
break;
if (strcmp (StringPntr, g_AttributeNameClassification) == 0)
iClassification = i;
else if (strcmp (StringPntr, g_AttributeNameSpamRatio) == 0)
iProbability = i;
}
/* Make sure every existing attribute has all of its related fields, then
update the MIME database with all the attributes that exist (so our new
additions don't show up at the wrong index). */
i--;
for (j = 0; j <= i; j++)
{
if (Parameters.FindString ("attr:public_name", j, &StringPntr) ==
B_BAD_INDEX)
{
if (Parameters.FindString ("attr:name", j, &StringPntr) != B_OK)
StringPntr = "None!";
Parameters.AddString ("attr:public_name", StringPntr);
}
}
while (Parameters.FindInt32 ("attr:type", i, &TempInt32) == B_BAD_INDEX)
Parameters.AddInt32 ("attr:type", B_STRING_TYPE);
while (Parameters.FindBool ("attr:viewable", i, &TempBool) == B_BAD_INDEX)
Parameters.AddBool ("attr:viewable", true);
while (Parameters.FindBool ("attr:editable", i, &TempBool) == B_BAD_INDEX)
Parameters.AddBool ("attr:editable", false);
while (Parameters.FindInt32 ("attr:width", i, &TempInt32) == B_BAD_INDEX)
Parameters.AddInt32 ("attr:width", 60);
while (Parameters.FindInt32 ("attr:alignment", i, &TempInt32) == B_BAD_INDEX)
Parameters.AddInt32 ("attr:alignment", B_ALIGN_LEFT);
while (Parameters.FindBool ("attr:extra", i, &TempBool) == B_BAD_INDEX)
Parameters.AddBool ("attr:extra", false);
if (iClassification < 0)
{
Parameters.AddString ("attr:name", g_AttributeNameClassification);
Parameters.AddString ("attr:public_name", "Classification Group");
Parameters.AddInt32 ("attr:type", B_STRING_TYPE);
Parameters.AddBool ("attr:viewable", true);
Parameters.AddBool ("attr:editable", false);
Parameters.AddInt32 ("attr:width", 45);
Parameters.AddInt32 ("attr:alignment", B_ALIGN_LEFT);
Parameters.AddBool ("attr:extra", false);
}
if (iProbability < 0)
{
Parameters.AddString ("attr:name", g_AttributeNameSpamRatio);
Parameters.AddString ("attr:public_name", "Spam/Genuine Estimate");
Parameters.AddInt32 ("attr:type", B_FLOAT_TYPE);
Parameters.AddBool ("attr:viewable", true);
Parameters.AddBool ("attr:editable", false);
Parameters.AddInt32 ("attr:width", 50);
Parameters.AddInt32 ("attr:alignment", B_ALIGN_LEFT);
Parameters.AddBool ("attr:extra", false);
}
if (iClassification < 0 || iProbability < 0)
{
ErrorCode = MimeType.SetAttrInfo (&Parameters);
if (ErrorCode != B_OK)
{
sprintf (ErrorMessage, "Unable to associate the classification "
"attributes with e-mail messages in the MIME database");
return ErrorCode;
}
}
sprintf (ErrorMessage, "Problems with setting up MIME type (%s) for "
"the database files", g_ABSDatabaseFileMIMEType);
ErrorCode = MimeType.SetTo (g_ABSDatabaseFileMIMEType);
if (ErrorCode != B_OK)
return ErrorCode;
MimeType.Delete ();
ErrorCode = MimeType.Install ();
if (ErrorCode != B_OK)
{
sprintf (ErrorMessage, "Failed to install MIME type (%s) in the system",
MimeType.Type ());
return ErrorCode;
}
MimeType.SetShortDescription ("Spam Database");
MimeType.SetLongDescription ("Bayesian Statistical Database for "
"Classifying Junk E-Mail");
sprintf (ErrorMessage, "1.0 ('%s')", g_DatabaseRecognitionString);
MimeType.SetSnifferRule (ErrorMessage);
MimeType.SetPreferredApp (g_ABSAppSignature);
/* Register the names of our sound effects. The user can associate
sound files with the names by using the Sounds preferences panel or the
installsound command. The MDR add-on filter will trigger these sounds. */
add_system_beep_event (g_BeepGenuine);
add_system_beep_event (g_BeepSpam);
add_system_beep_event (g_BeepUncertain);
return B_OK;
}
status_t ABSApp::LoadDatabaseIfNeeded (char *ErrorMessage)
{
if (m_WordMap.empty ())
return LoadSaveDatabase (true , ErrorMessage);
return B_OK;
}
/* Either load the database (DoLoad is TRUE) from the file
specified in the settings, or write (DoLoad is FALSE) the database to it. If
it doesn't exist (and its parent directories do exist) then it will be created
when saving. If it doesn't exist when loading, the in-memory database will be
set to an empty one and an error will be returned with an explanation put into
ErrorMessage (should be big enough for a path name and a couple of lines of
text).
The database file format is a UTF-8 text file (well, there could be some
latin-1 characters and other junk in there - it just copies the bytes from the
e-mail messages directly), with tab characters to separate fields (so that you
can also load it into a spreadsheet). The first line identifies the overall
file type. The second lists pairs of classifications plus the number of
messages in each class. Currently it is just Genuine and Spam, but for future
compatibility, that could be followed by more classification pairs. The
remaining lines each contain a word, the date it was last updated (actually
it's the number of messages in the database when the word was added, smaller
numbers mean it was updated longer ago), the genuine count and the spam count.
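As an illustration of the word line layout (word, tab, age, tab, genuine
count, tab, spam count), here is a minimal sketch of splitting one such line.
WordLine, NextField and ParseWordLine are hypothetical names, and missing
fields are simply read as zero.

```cpp
#include <cstdlib>
#include <string>

// One word line of the database file, as described above.
struct WordLine
{
  std::string word;
  unsigned long age;
  unsigned long genuineCount;
  unsigned long spamCount;
};

// Returns the tab-delimited field starting at Position and moves Position
// past the following tab (or to the end of the line if there is none).
static std::string NextField (
  const std::string &Line,
  std::string::size_type &Position)
{
  std::string::size_type Tab = Line.find ('\t', Position);
  if (Tab == std::string::npos)
    Tab = Line.size ();
  std::string Field = Line.substr (Position, Tab - Position);
  Position = (Tab < Line.size ()) ? Tab + 1 : Tab;
  return Field;
}

// Splits a line (without its trailing newline) into its four fields.
WordLine ParseWordLine (const std::string &Line)
{
  std::string::size_type Position = 0;
  WordLine Result;

  Result.word = NextField (Line, Position);
  Result.age = std::strtoul (NextField (Line, Position).c_str (), NULL, 10);
  Result.genuineCount =
    std::strtoul (NextField (Line, Position).c_str (), NULL, 10);
  Result.spamCount =
    std::strtoul (NextField (Line, Position).c_str (), NULL, 10);
  return Result;
}
```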
*/
status_t ABSApp::LoadSaveDatabase (bool DoLoad, char *ErrorMessage)
{
time_t CurrentTime;
FILE *DatabaseFile = NULL;
BNode DatabaseNode;
BNodeInfo DatabaseNodeInfo;
StatisticsMap::iterator DataIter;
StatisticsMap::iterator EndIter;
status_t ErrorCode;
int i;
pair<StatisticsMap::iterator,bool> InsertResult;
char LineString [10240];
StatisticsRecord Statistics;
const char *StringPntr;
char *TabPntr;
const char *WordPntr;
if (DoLoad)
{
MakeDatabaseEmpty ();
m_DatabaseHasChanged = false;
}
else
{
ErrorCode = MakeBackup (ErrorMessage);
if (ErrorCode != B_OK)
return ErrorCode;
}
DatabaseFile = fopen (m_DatabaseFileName.String (), DoLoad ? "rb" : "wb");
if (DatabaseFile == NULL)
{
ErrorCode = errno;
sprintf (ErrorMessage, "Can't open database file \"%s\" for %s",
m_DatabaseFileName.String (), DoLoad ? "reading" : "writing");
goto ErrorExit;
}
if (DoLoad)
{
sprintf (ErrorMessage, "Can't read first line of database file \"%s\", "
"expected it to start with \"%s\"",
m_DatabaseFileName.String (), g_DatabaseRecognitionString);
ErrorCode = -1;
if (fgets (LineString, sizeof (LineString), DatabaseFile) == NULL)
goto ErrorExit;
if (strncmp (LineString, g_DatabaseRecognitionString,
strlen (g_DatabaseRecognitionString)) != 0)
goto ErrorExit;
}
else
{
CurrentTime = time (NULL);
if (fprintf (DatabaseFile, "%s V1 (word, age, genuine count, spam count)\t"
"Written by SpamDBM $Revision: 30630 $\t"
"Compiled on " __DATE__ " at " __TIME__ "\tThis file saved on %s",
g_DatabaseRecognitionString, ctime (&CurrentTime)) <= 0)
{
ErrorCode = errno;
sprintf (ErrorMessage, "Problems when writing to database file \"%s\"",
m_DatabaseFileName.String ());
goto ErrorExit;
}
}
/* The second line lists the classifications and their totals. It is assumed
that the first two are Genuine and Spam. If there are others, they'll be
ignored and lost when the database is saved. */
if (DoLoad)
{
sprintf (ErrorMessage, "Can't read second line of database file \"%s\", "
"expected it to list classifications %s and %s along with their totals",
m_DatabaseFileName.String (), g_ClassifiedGenuine, g_ClassifiedSpam);
ErrorCode = B_BAD_VALUE;
if (fgets (LineString, sizeof (LineString), DatabaseFile) == NULL)
goto ErrorExit;
i = strlen (LineString);
if (i > 0 && LineString[i-1] == '\n')
LineString[i-1] = 0;
TabPntr = LineString;
for (StringPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
;
if (*TabPntr == '\t') *TabPntr++ = 0;
if (strncmp (StringPntr, "Classifications", 15) != 0)
goto ErrorExit;
for (StringPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
;
if (*TabPntr == '\t') *TabPntr++ = 0;
if (strcmp (StringPntr, g_ClassifiedGenuine) != 0)
goto ErrorExit;
for (StringPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
;
if (*TabPntr == '\t') *TabPntr++ = 0;
m_TotalGenuineMessages = atoll (StringPntr);
for (StringPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
;
if (*TabPntr == '\t') *TabPntr++ = 0;
if (strcmp (StringPntr, g_ClassifiedSpam) != 0)
goto ErrorExit;
for (StringPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
;
if (*TabPntr == '\t') *TabPntr++ = 0;
m_TotalSpamMessages = atoll (StringPntr);
}
else
{
fprintf (DatabaseFile,
"Classifications and total messages:\t%s\t%" B_PRIu32
"\t%s\t%" B_PRIu32 "\n",
g_ClassifiedGenuine, m_TotalGenuineMessages,
g_ClassifiedSpam, m_TotalSpamMessages);
}
/* The remaining lines list the words and their statistics. Each line
has a word, a tab, the time when the word was last changed in the database
(sequence number of message addition, starts at 0 and goes up by one for each
message added to the database), a tab then the number of messages in the
first class (genuine) that had that word, then a tab, then the number of
messages in the second class (spam) with that word, and so on. */
if (DoLoad)
{
while (!feof (DatabaseFile))
{
if (fgets (LineString, sizeof (LineString), DatabaseFile) == NULL)
{
ErrorCode = errno;
if (feof (DatabaseFile))
break;
if (ErrorCode == B_OK)
ErrorCode = -1;
sprintf (ErrorMessage, "Error while reading words and statistics "
"from database file \"%s\"", m_DatabaseFileName.String ());
goto ErrorExit;
}
i = strlen (LineString);
if (i > 0 && LineString[i-1] == '\n')
LineString[i-1] = 0;
TabPntr = LineString;
for (WordPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
;
if (*TabPntr == '\t') *TabPntr++ = 0;
for (StringPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
;
if (*TabPntr == '\t') *TabPntr++ = 0;
Statistics.age = atoll (StringPntr);
for (StringPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
;
if (*TabPntr == '\t') *TabPntr++ = 0;
Statistics.genuineCount = atoll (StringPntr);
for (StringPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
;
if (*TabPntr == '\t') *TabPntr++ = 0;
Statistics.spamCount = atoll (StringPntr);
/* Ignore empty words, overly long words and words with no counts
(avoids lots of length checking everywhere). */
if (WordPntr[0] == 0 || strlen (WordPntr) > g_MaxWordLength ||
(Statistics.genuineCount <= 0 && Statistics.spamCount <= 0))
continue;
InsertResult = m_WordMap.insert (
StatisticsMap::value_type (WordPntr, Statistics));
if (InsertResult.second == false)
{
ErrorCode = B_BAD_VALUE;
sprintf (ErrorMessage, "Error while inserting word \"%s\" from "
"database \"%s\", perhaps it is a duplicate",
WordPntr, m_DatabaseFileName.String ());
goto ErrorExit;
}
m_WordCount++;
if (Statistics.age < m_OldestAge)
m_OldestAge = Statistics.age;
}
}
else
{
EndIter = m_WordMap.end ();
for (DataIter = m_WordMap.begin (); DataIter != EndIter; DataIter++)
{
if (fprintf (DatabaseFile,
"%s\t%" B_PRIu32 "\t%" B_PRIu32 "\t%" B_PRIu32 "\n",
DataIter->first.c_str (), DataIter->second.age,
DataIter->second.genuineCount, DataIter->second.spamCount) <= 0)
{
ErrorCode = errno;
sprintf (ErrorMessage, "Error while writing word \"%s\" to "
"database \"%s\"",
DataIter->first.c_str(), m_DatabaseFileName.String ());
goto ErrorExit;
}
}
}
/* Set the file type so that the database file is associated with this program
and picks up the right icon. */
if (!DoLoad)
{
sprintf (ErrorMessage, "Unable to set attributes (file type) of database "
"file \"%s\"", m_DatabaseFileName.String ());
ErrorCode = DatabaseNode.SetTo (m_DatabaseFileName.String ());
if (ErrorCode != B_OK)
goto ErrorExit;
DatabaseNodeInfo.SetTo (&DatabaseNode);
ErrorCode = DatabaseNodeInfo.SetType (g_ABSDatabaseFileMIMEType);
if (ErrorCode != B_OK)
goto ErrorExit;
}
m_DatabaseHasChanged = false;
ErrorCode = B_OK;
ErrorExit:
if (DatabaseFile != NULL)
fclose (DatabaseFile);
return ErrorCode;
}
/* Either load the settings (DoLoad is TRUE) from the configuration file or
write them (DoLoad is FALSE) to it. The configuration file is a flattened
BMessage containing the various program settings. If it doesn't exist (or its
parent directories don't exist) then they will be created when saving. If it
doesn't exist when loading, the settings will be set to default values. */
status_t ABSApp::LoadSaveSettings (bool DoLoad)
{
status_t ErrorCode;
const char *NamePntr;
BMessage Settings;
BDirectory SettingsDirectory;
BFile SettingsFile;
const char *StringPntr;
bool TempBool;
int32 TempInt32;
char TempString [PATH_MAX + 100];
/* Start with the default values when loading, in case the file is an
older version of the settings file which doesn't have every field defined. */
if (DoLoad)
DefaultSettings ();
ErrorCode = SettingsDirectory.SetTo (m_SettingsDirectoryPath.Path ());
if (ErrorCode != B_OK)
{
if (DoLoad || ErrorCode != B_ENTRY_NOT_FOUND)
{
sprintf (TempString, "Can't find settings directory \"%s\"",
m_SettingsDirectoryPath.Path ());
goto ErrorExit;
}
ErrorCode = create_directory (m_SettingsDirectoryPath.Path (), 0755);
if (ErrorCode == B_OK)
ErrorCode = SettingsDirectory.SetTo (m_SettingsDirectoryPath.Path ());
if (ErrorCode != B_OK)
{
sprintf (TempString, "Can't create settings directory \"%s\"",
m_SettingsDirectoryPath.Path ());
goto ErrorExit;
}
}
ErrorCode = SettingsFile.SetTo (&SettingsDirectory, g_SettingsFileName,
DoLoad ? B_READ_ONLY : B_READ_WRITE | B_CREATE_FILE | B_ERASE_FILE);
if (ErrorCode != B_OK)
{
sprintf (TempString, "Can't open settings file \"%s\" in directory \"%s\" "
"for %s", g_SettingsFileName, m_SettingsDirectoryPath.Path(),
DoLoad ? "reading" : "writing");
goto ErrorExit;
}
if (DoLoad)
{
ErrorCode = Settings.Unflatten (&SettingsFile);
if (ErrorCode != 0 || Settings.what != g_SettingsWhatCode)
{
sprintf (TempString, "Corrupt data detected while reading settings "
"file \"%s\" in directory \"%s\", will revert to defaults",
g_SettingsFileName, m_SettingsDirectoryPath.Path());
goto ErrorExit;
}
}
/* Transfer the settings between the BMessage and our member
variables. For loading, if the setting isn't present, leave it at the
default value. Note that loading and saving are intermingled here to make
code maintenance easier (less chance of forgetting to update it if load and
save were separate functions). */
ErrorCode = B_OK;
NamePntr = "DatabaseFileName";
if (DoLoad)
{
if (Settings.FindString (NamePntr, &StringPntr) == B_OK)
m_DatabaseFileName.SetTo (StringPntr);
}
else if (ErrorCode == B_OK)
ErrorCode = Settings.AddString (NamePntr, m_DatabaseFileName);
NamePntr = "ServerMode";
if (DoLoad)
{
if (Settings.FindBool (NamePntr, &TempBool) == B_OK)
g_ServerMode = TempBool;
}
else if (ErrorCode == B_OK)
ErrorCode = Settings.AddBool (NamePntr, g_ServerMode);
NamePntr = "IgnorePreviousClassification";
if (DoLoad)
{
if (Settings.FindBool (NamePntr, &TempBool) == B_OK)
m_IgnorePreviousClassification = TempBool;
}
else if (ErrorCode == B_OK)
ErrorCode = Settings.AddBool (NamePntr, m_IgnorePreviousClassification);
NamePntr = "PurgeAge";
if (DoLoad)
{
if (Settings.FindInt32 (NamePntr, &TempInt32) == B_OK)
m_PurgeAge = TempInt32;
}
else if (ErrorCode == B_OK)
ErrorCode = Settings.AddInt32 (NamePntr, m_PurgeAge);
NamePntr = "PurgePopularity";
if (DoLoad)
{
if (Settings.FindInt32 (NamePntr, &TempInt32) == B_OK)
m_PurgePopularity = TempInt32;
}
else if (ErrorCode == B_OK)
ErrorCode = Settings.AddInt32 (NamePntr, m_PurgePopularity);
NamePntr = "ScoringMode";
if (DoLoad)
{
if (Settings.FindInt32 (NamePntr, &TempInt32) == B_OK)
m_ScoringMode = (ScoringModes) TempInt32;
if (m_ScoringMode < 0 || m_ScoringMode >= SM_MAX)
m_ScoringMode = (ScoringModes) 0;
}
else if (ErrorCode == B_OK)
ErrorCode = Settings.AddInt32 (NamePntr, m_ScoringMode);
NamePntr = "TokenizeMode";
if (DoLoad)
{
if (Settings.FindInt32 (NamePntr, &TempInt32) == B_OK)
m_TokenizeMode = (TokenizeModes) TempInt32;
if (m_TokenizeMode < 0 || m_TokenizeMode >= TM_MAX)
m_TokenizeMode = (TokenizeModes) 0;
}
else if (ErrorCode == B_OK)
ErrorCode = Settings.AddInt32 (NamePntr, m_TokenizeMode);
if (ErrorCode != B_OK)
{
strcpy (TempString, "Unable to stuff the program settings into a "
"temporary BMessage, settings not saved");
goto ErrorExit;
}
if (!DoLoad)
{
Settings.what = g_SettingsWhatCode;
ErrorCode = Settings.Flatten (&SettingsFile);
if (ErrorCode != 0)
{
sprintf (TempString, "Problems while writing settings file \"%s\" in "
"directory \"%s\"", g_SettingsFileName,
m_SettingsDirectoryPath.Path ());
goto ErrorExit;
}
}
m_SettingsHaveChanged = false;
return B_OK;
ErrorExit:
DisplayErrorMessage (TempString, ErrorCode, DoLoad ?
"Loading Settings Error" : "Saving Settings Error");
return ErrorCode;
}
void
ABSApp::MessageReceived (BMessage *MessagePntr)
{
const char *PropertyName;
struct property_info *PropInfoPntr;
int32 SpecifierIndex;
int32 SpecifierKind;
BMessage SpecifierMessage;
/* See if it is a scripting message for one of the properties or
the other operations this program supports. Pass on other scripting messages
to the inherited parent MessageReceived function (they're usually scripting
messages for the BApplication). */
switch (MessagePntr->what)
{
case B_GET_PROPERTY:
case B_SET_PROPERTY:
case B_COUNT_PROPERTIES:
case B_CREATE_PROPERTY:
case B_DELETE_PROPERTY:
case B_EXECUTE_PROPERTY:
if (MessagePntr->GetCurrentSpecifier (&SpecifierIndex, &SpecifierMessage,
&SpecifierKind, &PropertyName) == B_OK &&
SpecifierKind == B_DIRECT_SPECIFIER)
{
for (PropInfoPntr = g_ScriptingPropertyList + 0; true; PropInfoPntr++)
{
if (PropInfoPntr->name == 0)
break;
if (PropInfoPntr->commands[0] == MessagePntr->what &&
strcasecmp (PropInfoPntr->name, PropertyName) == 0)
{
ProcessScriptingMessage (MessagePntr, PropInfoPntr);
return;
}
}
}
break;
}
/* Pass the unprocessed message to the inherited function, maybe it knows
what to do. This includes replies to messages we sent ourselves. */
BApplication::MessageReceived (MessagePntr);
}
/* Rename the existing database file to a backup file name, potentially
replacing an older backup. If something goes wrong, returns an error code and
puts an explanation in ErrorMessage. */
status_t ABSApp::MakeBackup (char *ErrorMessage)
{
BEntry Entry;
status_t ErrorCode;
int i;
char LeafName [NAME_MAX];
char NewName [PATH_MAX+20];
char OldName [PATH_MAX+20];
ErrorCode = Entry.SetTo (m_DatabaseFileName.String ());
if (ErrorCode != B_OK)
{
sprintf (ErrorMessage, "While making backup, failed to make a BEntry for "
"\"%s\" (maybe the directory doesn't exist?)",
m_DatabaseFileName.String ());
return ErrorCode;
}
if (!Entry.Exists ())
return B_OK;
Entry.GetName (LeafName);
for (i = 0; i < g_MaxBackups - 1; i++)
{
strcpy (OldName, m_DatabaseFileName.String ());
sprintf (OldName + strlen (OldName), g_BackupSuffix, i);
Entry.SetTo (OldName);
if (!Entry.Exists ())
break;
}
for (i--; i >= 0; i--)
{
strcpy (OldName, m_DatabaseFileName.String ());
sprintf (OldName + strlen (OldName), g_BackupSuffix, i);
Entry.SetTo (OldName);
strcpy (NewName, LeafName);
sprintf (NewName + strlen (NewName), g_BackupSuffix, i + 1);
ErrorCode = Entry.Rename (NewName, true );
}
Entry.SetTo (m_DatabaseFileName.String ());
strcpy (NewName, LeafName);
sprintf (NewName + strlen (NewName), g_BackupSuffix, 0);
ErrorCode = Entry.Rename (NewName, true );
if (ErrorCode != B_OK)
sprintf (ErrorMessage, "While making backup, failed to rename "
"\"%s\" to \"%s\"", m_DatabaseFileName.String (), NewName);
return ErrorCode;
}
void
ABSApp::MakeDatabaseEmpty ()
{
m_WordMap.clear ();
m_WordCount = 0;
m_TotalGenuineMessages = 0;
m_TotalSpamMessages = 0;
m_OldestAge = (uint32) -1 ;
}
/* Process a scripting message and send back a reply. The reply has
several fields: "error" containing the numerical error code (0 for success),
"CommandText" with a text representation of the command, "result" with the
resulting data for a get or count command. If it isn't understood, then rather
than a B_REPLY kind of message, it will be a B_MESSAGE_NOT_UNDERSTOOD message
with an "error" number and a "message" string with a description. */
void
ABSApp::ProcessScriptingMessage (
BMessage *MessagePntr,
struct property_info *PropInfoPntr)
{
bool ArgumentBool = false;
bool ArgumentGotBool = false;
bool ArgumentGotInt32 = false;
bool ArgumentGotString = false;
int32 ArgumentInt32 = 0;
const char *ArgumentString = NULL;
BString CommandText;
status_t ErrorCode;
int i;
BMessage ReplyMessage (B_MESSAGE_NOT_UNDERSTOOD);
ssize_t StringBufferSize;
BMessage TempBMessage;
BPath TempPath;
char TempString [PATH_MAX + 1024];
if (g_QuitCountdown >= 0 && !g_CommandLineMode)
{
g_QuitCountdown = -1;
cerr << "Quit countdown aborted due to a scripting command arriving.\n";
}
if (g_BusyCursor != NULL)
SetCursor (g_BusyCursor);
ErrorCode = MessagePntr->FindData (g_DataName, B_STRING_TYPE,
(const void **) &ArgumentString, &StringBufferSize);
if (ErrorCode == B_OK)
{
if (PropInfoPntr->extra_data != PN_EVALUATE_STRING &&
PropInfoPntr->extra_data != PN_SPAM_STRING &&
PropInfoPntr->extra_data != PN_GENUINE_STRING &&
strlen (ArgumentString) >= PATH_MAX)
{
sprintf (TempString, "\"data\" string of a scripting message is too "
"long, for SET %s action", PropInfoPntr->name);
ErrorCode = B_NAME_TOO_LONG;
goto ErrorExit;
}
ArgumentGotString = true;
}
else if (MessagePntr->FindBool (g_DataName, &ArgumentBool) == B_OK)
ArgumentGotBool = true;
else if (MessagePntr->FindInt32 (g_DataName, &ArgumentInt32) == B_OK)
ArgumentGotInt32 = true;
switch (PropInfoPntr->commands[0])
{
case B_SET_PROPERTY:
CommandText.SetTo ("Set ");
break;
case B_GET_PROPERTY:
CommandText.SetTo ("Get ");
break;
case B_COUNT_PROPERTIES:
CommandText.SetTo ("Count ");
break;
case B_CREATE_PROPERTY:
CommandText.SetTo ("Create ");
break;
case B_DELETE_PROPERTY:
CommandText.SetTo ("Delete ");
break;
case B_EXECUTE_PROPERTY:
CommandText.SetTo ("Execute ");
break;
default:
sprintf (TempString, "Bug: scripting command for \"%s\" has an unknown "
"action code %d", PropInfoPntr->name,
(int) PropInfoPntr->commands[0]);
ErrorCode = -1;
goto ErrorExit;
}
CommandText.Append (PropInfoPntr->name);
if (ArgumentGotString)
{
CommandText.Append (" \"");
CommandText.Append (ArgumentString);
CommandText.Append ("\"");
}
if (ArgumentGotBool)
CommandText.Append (ArgumentBool ? " true" : " false");
if (ArgumentGotInt32)
{
sprintf (TempString, " %" B_PRId32, ArgumentInt32);
CommandText.Append (TempString);
}
/* Set up the reply message. The sender expects a reply in the
correct format, so it always returns a B_REPLY message. A readable version
of the command is also added to make debugging easier. */
ReplyMessage.what = B_REPLY;
ReplyMessage.AddString ("CommandText", CommandText);
sprintf (TempString, "Operation code %d (get, set, count, etc) "
"unsupported for property %s",
(int) PropInfoPntr->commands[0], PropInfoPntr->name);
ErrorCode = B_BAD_INDEX;
switch (PropInfoPntr->extra_data)
{
case PN_DATABASE_FILE:
switch (PropInfoPntr->commands[0])
{
case B_GET_PROPERTY:
ReplyMessage.AddString (g_ResultName, m_DatabaseFileName);
break;
case B_SET_PROPERTY:
if (!ArgumentGotString)
{
ErrorCode = B_BAD_TYPE;
sprintf (TempString, "You need to specify a string for the "
"SET %s command", PropInfoPntr->name);
goto ErrorExit;
}
ErrorCode = TempPath.SetTo (ArgumentString, NULL /* leaf */,
true /* normalize */);
if (ErrorCode != B_OK)
{
sprintf (TempString, "New database path name of \"%s\" is invalid "
"(parent directories must exist)", ArgumentString);
goto ErrorExit;
}
if ((ErrorCode = SaveDatabaseIfNeeded (TempString)) != B_OK)
goto ErrorExit;
MakeDatabaseEmpty ();
if (strlen (TempPath.Leaf ()) > NAME_MAX-strlen(g_BackupSuffix)-1)
{
/* Truncate the name so that there is enough space for the backup
extension. Approximately. */
strcpy (TempString, TempPath.Leaf ());
TempString [NAME_MAX - strlen (g_BackupSuffix) - 1] = 0;
TempPath.GetParent (&TempPath);
TempPath.Append (TempString);
}
m_DatabaseFileName.SetTo (TempPath.Path ());
m_SettingsHaveChanged = true;
break;
case B_CREATE_PROPERTY:
if ((ErrorCode = CreateDatabaseFile (TempString)) != B_OK)
goto ErrorExit;
break;
case B_DELETE_PROPERTY:
if ((ErrorCode = DeleteDatabaseFile (TempString)) != B_OK)
goto ErrorExit;
break;
case B_COUNT_PROPERTIES:
if ((ErrorCode = LoadDatabaseIfNeeded (TempString)) != B_OK)
goto ErrorExit;
ReplyMessage.AddInt32 (g_ResultName, m_WordCount);
break;
default:
goto ErrorExit;
}
break;
case PN_SPAM:
case PN_SPAM_STRING:
case PN_GENUINE:
case PN_GENUINE_STRING:
case PN_UNCERTAIN:
switch (PropInfoPntr->commands[0])
{
case B_COUNT_PROPERTIES:
if ((ErrorCode = LoadDatabaseIfNeeded (TempString)) != B_OK)
goto ErrorExit;
if (PropInfoPntr->extra_data == PN_SPAM ||
PropInfoPntr->extra_data == PN_SPAM_STRING)
ReplyMessage.AddInt32 (g_ResultName, m_TotalSpamMessages);
else
ReplyMessage.AddInt32 (g_ResultName, m_TotalGenuineMessages);
break;
case B_SET_PROPERTY:
if (!ArgumentGotString)
{
ErrorCode = B_BAD_TYPE;
sprintf (TempString, "You need to specify a string (%s) "
"for the SET %s command",
(PropInfoPntr->extra_data == PN_GENUINE_STRING ||
PropInfoPntr->extra_data == PN_SPAM_STRING)
? "text of the message to be added"
: "pathname of the file containing the text to be added",
PropInfoPntr->name);
goto ErrorExit;
}
if ((ErrorCode = LoadDatabaseIfNeeded (TempString)) != B_OK)
goto ErrorExit;
if (PropInfoPntr->extra_data == PN_GENUINE ||
PropInfoPntr->extra_data == PN_SPAM ||
PropInfoPntr->extra_data == PN_UNCERTAIN)
ErrorCode = AddFileToDatabase (
(PropInfoPntr->extra_data == PN_SPAM) ? CL_SPAM :
((PropInfoPntr->extra_data == PN_GENUINE) ? CL_GENUINE :
CL_UNCERTAIN),
ArgumentString, TempString );
else
ErrorCode = AddStringToDatabase (
(PropInfoPntr->extra_data == PN_SPAM_STRING) ?
CL_SPAM : CL_GENUINE,
ArgumentString, TempString );
if (ErrorCode != B_OK)
goto ErrorExit;
break;
default:
goto ErrorExit;
}
break;
case PN_IGNORE_PREVIOUS_CLASSIFICATION:
switch (PropInfoPntr->commands[0])
{
case B_GET_PROPERTY:
ReplyMessage.AddBool (g_ResultName, m_IgnorePreviousClassification);
break;
case B_SET_PROPERTY:
if (!ArgumentGotBool)
{
ErrorCode = B_BAD_TYPE;
sprintf (TempString, "You need to specify a boolean (true/yes, "
"false/no) for the SET %s command", PropInfoPntr->name);
goto ErrorExit;
}
m_IgnorePreviousClassification = ArgumentBool;
m_SettingsHaveChanged = true;
break;
default:
goto ErrorExit;
}
break;
case PN_SERVER_MODE:
switch (PropInfoPntr->commands[0])
{
case B_GET_PROPERTY:
ReplyMessage.AddBool (g_ResultName, g_ServerMode);
break;
case B_SET_PROPERTY:
if (!ArgumentGotBool)
{
ErrorCode = B_BAD_TYPE;
sprintf (TempString, "You need to specify a boolean (true/yes, "
"false/no) for the SET %s command", PropInfoPntr->name);
goto ErrorExit;
}
g_ServerMode = ArgumentBool;
m_SettingsHaveChanged = true;
break;
default:
goto ErrorExit;
}
break;
case PN_FLUSH:
if (PropInfoPntr->commands[0] == B_EXECUTE_PROPERTY &&
(ErrorCode = SaveDatabaseIfNeeded (TempString)) == B_OK)
break;
goto ErrorExit;
case PN_PURGE_AGE:
switch (PropInfoPntr->commands[0])
{
case B_GET_PROPERTY:
ReplyMessage.AddInt32 (g_ResultName, m_PurgeAge);
break;
case B_SET_PROPERTY:
if (!ArgumentGotInt32)
{
ErrorCode = B_BAD_TYPE;
sprintf (TempString, "You need to specify a 32 bit integer "
"for the SET %s command", PropInfoPntr->name);
goto ErrorExit;
}
m_PurgeAge = ArgumentInt32;
m_SettingsHaveChanged = true;
break;
default:
goto ErrorExit;
}
break;
case PN_PURGE_POPULARITY:
switch (PropInfoPntr->commands[0])
{
case B_GET_PROPERTY:
ReplyMessage.AddInt32 (g_ResultName, m_PurgePopularity);
break;
case B_SET_PROPERTY:
if (!ArgumentGotInt32)
{
ErrorCode = B_BAD_TYPE;
sprintf (TempString, "You need to specify a 32 bit integer "
"for the SET %s command", PropInfoPntr->name);
goto ErrorExit;
}
m_PurgePopularity = ArgumentInt32;
m_SettingsHaveChanged = true;
break;
default:
goto ErrorExit;
}
break;
case PN_PURGE:
if (PropInfoPntr->commands[0] == B_EXECUTE_PROPERTY &&
(ErrorCode = LoadDatabaseIfNeeded (TempString)) == B_OK &&
(ErrorCode = PurgeOldWords (TempString)) == B_OK)
break;
goto ErrorExit;
case PN_OLDEST:
if (PropInfoPntr->commands[0] == B_GET_PROPERTY &&
(ErrorCode = LoadDatabaseIfNeeded (TempString)) == B_OK)
{
ReplyMessage.AddInt32 (g_ResultName, m_OldestAge);
break;
}
goto ErrorExit;
case PN_EVALUATE:
case PN_EVALUATE_STRING:
if (PropInfoPntr->commands[0] == B_SET_PROPERTY)
{
if (!ArgumentGotString)
{
ErrorCode = B_BAD_TYPE;
sprintf (TempString, "You need to specify a string for the "
"SET %s command", PropInfoPntr->name);
goto ErrorExit;
}
if ((ErrorCode = LoadDatabaseIfNeeded (TempString)) == B_OK)
{
if (PropInfoPntr->extra_data == PN_EVALUATE)
{
if ((ErrorCode = EvaluateFile (ArgumentString, &ReplyMessage,
TempString)) == B_OK)
break;
}
else
{
if ((ErrorCode = EvaluateString (ArgumentString, StringBufferSize,
&ReplyMessage, TempString)) == B_OK)
break;
}
}
}
goto ErrorExit;
case PN_RESET_TO_DEFAULTS:
if (PropInfoPntr->commands[0] == B_EXECUTE_PROPERTY)
{
DefaultSettings ();
break;
}
goto ErrorExit;
case PN_INSTALL_THINGS:
if (PropInfoPntr->commands[0] == B_EXECUTE_PROPERTY &&
(ErrorCode = InstallThings (TempString)) == B_OK)
break;
goto ErrorExit;
case PN_SCORING_MODE:
switch (PropInfoPntr->commands[0])
{
case B_GET_PROPERTY:
ReplyMessage.AddString (g_ResultName,
g_ScoringModeNames[m_ScoringMode]);
break;
case B_SET_PROPERTY:
i = SM_MAX;
if (ArgumentGotString)
for (i = 0; i < SM_MAX; i++)
{
if (strcasecmp (ArgumentString, g_ScoringModeNames [i]) == 0)
{
m_ScoringMode = (ScoringModes) i;
m_SettingsHaveChanged = true;
break;
}
}
if (i >= SM_MAX)
{
ErrorCode = B_BAD_TYPE;
sprintf (TempString, "You used the unrecognized \"%s\" as "
"a scoring mode for the SET %s command. Should be one of: ",
ArgumentGotString ? ArgumentString : "not specified",
PropInfoPntr->name);
for (i = 0; i < SM_MAX; i++)
{
strcat (TempString, g_ScoringModeNames [i]);
if (i < SM_MAX - 1)
strcat (TempString, ", ");
}
goto ErrorExit;
}
break;
default:
goto ErrorExit;
}
break;
case PN_TOKENIZE_MODE:
switch (PropInfoPntr->commands[0])
{
case B_GET_PROPERTY:
ReplyMessage.AddString (g_ResultName,
g_TokenizeModeNames[m_TokenizeMode]);
break;
case B_SET_PROPERTY:
i = TM_MAX;
if (ArgumentGotString)
for (i = 0; i < TM_MAX; i++)
{
if (strcasecmp (ArgumentString, g_TokenizeModeNames [i]) == 0)
{
m_TokenizeMode = (TokenizeModes) i;
m_SettingsHaveChanged = true;
break;
}
}
if (i >= TM_MAX)
{
ErrorCode = B_BAD_TYPE;
sprintf (TempString, "You used the unrecognized \"%s\" as "
"a tokenize mode for the SET %s command. Should be one of: ",
ArgumentGotString ? ArgumentString : "not specified",
PropInfoPntr->name);
for (i = 0; i < TM_MAX; i++)
{
strcat (TempString, g_TokenizeModeNames [i]);
if (i < TM_MAX - 1)
strcat (TempString, ", ");
}
goto ErrorExit;
}
break;
default:
goto ErrorExit;
}
break;
default:
sprintf (TempString, "Bug! Unrecognized property identification "
"number %d (should be between 0 and %d). Fix the entry in "
"the g_ScriptingPropertyList array!",
(int) PropInfoPntr->extra_data, PN_MAX - 1);
goto ErrorExit;
}
ReplyMessage.AddInt32 ("error", B_OK);
ErrorCode = MessagePntr->SendReply (&ReplyMessage,
this /* reply handler */, 500000 /* microsecond timeout */);
if (ErrorCode != B_OK)
cerr << "ProcessScriptingMessage failed to send a reply message, code " <<
ErrorCode << " (" << strerror (ErrorCode) << ")" << " for " <<
CommandText.String () << endl;
SetCursor (B_CURSOR_SYSTEM_DEFAULT);
return;
ErrorExit:
ReplyMessage.AddInt32 ("error", ErrorCode);
ReplyMessage.AddString ("message", TempString);
DisplayErrorMessage (TempString, ErrorCode);
ErrorCode = MessagePntr->SendReply (&ReplyMessage,
this /* reply handler */, 500000 /* microsecond timeout */);
if (ErrorCode != B_OK)
cerr << "ProcessScriptingMessage failed to send an error message, code " <<
ErrorCode << " (" << strerror (ErrorCode) << ")" << " for " <<
CommandText.String () << endl;
SetCursor (B_CURSOR_SYSTEM_DEFAULT);
}
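/* Illustrative sketch only, not part of the program. The SET handlers for the
scoring mode and tokenize mode above compare the argument with a table of names
without regard to case, and list the valid names when nothing matches. The
version below shows that lookup on its own. ExampleModeNames, EXAMPLE_MODE_MAX,
FindModeIndex and DescribeModes are hypothetical names, and the table contents
are made up.

```cpp
#include <cstddef>
#include <string>
#include <strings.h>  // strcasecmp (POSIX)

// Hypothetical table standing in for a list such as g_ScoringModeNames.
static const char *ExampleModeNames [] = {"alpha", "beta"};
static const int EXAMPLE_MODE_MAX = 2;

// Returns the index of the matching name, or EXAMPLE_MODE_MAX if none match.
int FindModeIndex (const char *Name)
{
  int i = EXAMPLE_MODE_MAX;
  if (Name != NULL)
  {
    for (i = 0; i < EXAMPLE_MODE_MAX; i++)
    {
      if (strcasecmp (Name, ExampleModeNames [i]) == 0)
        break;
    }
  }
  return i;
}

// Builds the comma separated list of names used in the error message.
std::string DescribeModes ()
{
  std::string Result;
  for (int i = 0; i < EXAMPLE_MODE_MAX; i++)
  {
    Result += ExampleModeNames [i];
    if (i < EXAMPLE_MODE_MAX - 1)
      Result += ", ";
  }
  return Result;
}
```

Returning the table size for "not found" mirrors the way the handlers above
leave the loop index at SM_MAX or TM_MAX when no name matches. */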
/* When a quit request is received, we use a time delay to do the quit and make
sure there are no pending commands being processed by the auxiliary looper
which is sending us commands. Also, we have a countdown which can be
interrupted by an incoming scripting message in case one client tells us to
quit while another one is still using us (happens when you have two or more
e-mail accounts). But if the system is shutting down, quit immediately! */
void
ABSApp::Pulse ()
{
if (g_QuitCountdown == 0)
{
if (g_CommanderLooperPntr == NULL ||
!g_CommanderLooperPntr->IsBusy ())
PostMessage (B_QUIT_REQUESTED);
}
else if (g_QuitCountdown > 0)
{
cerr << "SpamDBM quitting in " << g_QuitCountdown << ".\n";
g_QuitCountdown--;
}
}
/* A quit request has arrived. If the quit countdown has reached zero,
allow the request, otherwise reject it (and start the countdown if it hasn't
been started). */
bool
ABSApp::QuitRequested ()
{
BMessage *QuitMessage;
team_info RemoteInfo;
BMessenger RemoteMessenger;
team_id RemoteTeam;
/* See if the quit is from the system shutting down (it is sent by
the registrar server), if so, quit immediately. */
QuitMessage = CurrentMessage ();
if (QuitMessage != NULL && QuitMessage->IsSourceRemote ())
{
RemoteMessenger = QuitMessage->ReturnAddress ();
RemoteTeam = RemoteMessenger.Team ();
if (get_team_info (RemoteTeam, &RemoteInfo) == B_OK &&
strstr (RemoteInfo.args, "registrar") != NULL)
g_QuitCountdown = 0;
}
if (g_QuitCountdown == 0)
return BApplication::QuitRequested ();
if (g_QuitCountdown < 0)
g_QuitCountdown = 5;
return false;
}
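/* Illustrative sketch only, not part of the program. Pulse, QuitRequested and
the start of ProcessScriptingMessage share one counter, g_QuitCountdown, and the
sketch below gathers their three rules in one place. QuitCountdownExample and
its members are hypothetical names. The sketch leaves out the command line mode
check and the registrar (system shutdown) shortcut.

```cpp
// Hypothetical stand-in for g_QuitCountdown and the code that touches it.
struct QuitCountdownExample
{
  int Countdown;    // Negative means no quit is pending.
  bool LooperBusy;  // Stands in for g_CommanderLooperPntr->IsBusy ().

  QuitCountdownExample () : Countdown (-1), LooperBusy (false) {}

  // Like QuitRequested: allow the quit only once the countdown has expired,
  // otherwise start the countdown if it isn't already running.
  bool QuitRequested ()
  {
    if (Countdown == 0)
      return true;
    if (Countdown < 0)
      Countdown = 5;
    return false;
  }

  // Like Pulse: returns true when a quit message should be posted this tick.
  bool Pulse ()
  {
    if (Countdown == 0)
      return !LooperBusy;
    if (Countdown > 0)
      Countdown--;
    return false;
  }

  // Like the start of ProcessScriptingMessage: new work cancels the quit.
  void ScriptingCommandArrived ()
  {
    if (Countdown >= 0)
      Countdown = -1;
  }
};
```

With the half second pulse rate set in ReadyToRun, a countdown of 5 means the
quit is posted a few seconds after the first request, unless another scripting
command arrives first. */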
/* Go through the database and delete words which are too old (time is
equivalent to the number of messages added to the database) and too unpopular
(words not used by many messages). Hopefully this will get rid of words which
are just hunks of binary or other garbage. The database has been loaded
elsewhere. */
status_t
ABSApp::PurgeOldWords (char *ErrorMessage)
{
uint32 CurrentTime;
StatisticsMap::iterator CurrentIter;
StatisticsMap::iterator EndIter;
StatisticsMap::iterator NextIter;
char TempString [80];
strcpy (ErrorMessage, "Purge can't fail");
CurrentTime = m_TotalGenuineMessages + m_TotalSpamMessages - 1;
m_OldestAge = (uint32) -1 /* largest value, so any word's age is lower */;
EndIter = m_WordMap.end ();
NextIter = m_WordMap.begin ();
while (NextIter != EndIter) {
CurrentIter = NextIter++;
if (CurrentTime - CurrentIter->second.age >= m_PurgeAge &&
CurrentIter->second.genuineCount + CurrentIter->second.spamCount <=
m_PurgePopularity) {
m_WordMap.erase (CurrentIter);
if (m_WordCount > 0)
m_WordCount--;
m_DatabaseHasChanged = true;
}
else
{
if (CurrentIter->second.age < m_OldestAge)
m_OldestAge = CurrentIter->second.age;
}
}
if (m_WordCount != m_WordMap.size ()) {
snprintf (TempString, sizeof (TempString), "Our word count of %" B_PRIu32
" doesn't match the size of the database, %lu", m_WordCount,
(unsigned long) m_WordMap.size ());
DisplayErrorMessage (TempString, -1, "Bug!");
m_WordCount = m_WordMap.size ();
}
return B_OK;
}
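/* Illustrative sketch only, not part of the program. PurgeOldWords erases map
entries while walking the map, which is safe only because the iterator is
advanced before the entry it pointed at is erased. The sketch below shows the
same pattern on a plain std::map. ExampleStats, ExampleMap and PurgeExample are
hypothetical names, and the statistics record is simplified.

```cpp
#include <cstddef>
#include <map>
#include <stdint.h>
#include <string>

// Hypothetical, simplified version of the per-word statistics.
struct ExampleStats
{
  uint32_t age;           // Message count when the word was last seen.
  uint32_t genuineCount;  // Genuine messages containing the word.
  uint32_t spamCount;     // Spam messages containing the word.
};

typedef std::map<std::string, ExampleStats> ExampleMap;

// Erases words that are both old and unpopular; returns how many were erased.
size_t PurgeExample (ExampleMap &Words, uint32_t CurrentTime,
  uint32_t PurgeAge, uint32_t PurgePopularity)
{
  size_t Erased = 0;
  ExampleMap::iterator NextIter = Words.begin ();
  while (NextIter != Words.end ())
  {
    // Step forward first; erasing a map entry invalidates only the iterator
    // that points at the erased entry.
    ExampleMap::iterator CurrentIter = NextIter++;
    const ExampleStats &Stats = CurrentIter->second;
    if (CurrentTime - Stats.age >= PurgeAge &&
      Stats.genuineCount + Stats.spamCount <= PurgePopularity)
    {
      Words.erase (CurrentIter);
      Erased++;
    }
  }
  return Erased;
}
```

A word has to fail both tests to be removed, so a rarely used word that
appeared recently survives, and so does an old word that many messages used. */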
void
ABSApp::ReadyToRun ()
{
DatabaseWindow *DatabaseWindowPntr;
float JunkFloat;
BButton *TempButtonPntr;
BCheckBox *TempCheckBoxPntr;
font_height TempFontHeight;
BMenuBar *TempMenuBarPntr;
BMenuItem *TempMenuItemPntr;
BPopUpMenu *TempPopUpMenuPntr;
BRadioButton *TempRadioButtonPntr;
BRect TempRect;
const char *TempString = "Testing My Things";
BStringView *TempStringViewPntr;
BTextControl *TempTextPntr;
BWindow *TempWindowPntr;
/* Find the size of the user's font, since it is used for laying
out controls and other GUI elements. Set the spacing between buttons and
other controls to the width of the letter "M" in the user's desired font. */
g_MarginBetweenControls = (int) be_plain_font->StringWidth ("M");
be_plain_font->GetHeight (&TempFontHeight);
g_LineOfTextHeight = ceilf (
TempFontHeight.ascent + TempFontHeight.descent + TempFontHeight.leading);
/* Find the heights of the controls, since they
vary based on the current font size. Make a temporary gadget, which is
attached to our window, then resize it to its preferred size so that it
accommodates the font size and other frills it needs. */
TempWindowPntr = new (std::nothrow) BWindow (BRect (10, 20, 200, 200),
"Temporary Window", B_DOCUMENT_WINDOW,
B_NO_WORKSPACE_ACTIVATION | B_ASYNCHRONOUS_CONTROLS);
if (TempWindowPntr == NULL) {
DisplayErrorMessage ("Unable to create temporary window for finding "
"sizes of controls.");
g_QuitCountdown = 0;
return;
}
TempRect = TempWindowPntr->Bounds ();
TempStringViewPntr = new (std::nothrow) BStringView (TempRect, TempString, TempString);
if (TempStringViewPntr != NULL) {
TempWindowPntr->Lock();
TempWindowPntr->AddChild (TempStringViewPntr);
TempStringViewPntr->GetPreferredSize (&JunkFloat, &g_StringViewHeight);
TempWindowPntr->RemoveChild (TempStringViewPntr);
TempWindowPntr->Unlock();
delete TempStringViewPntr;
}
/* Find the height of a button, which can differ from that of a text
control and can make life difficult. Make a temporary button, which
is attached to our window so that it resizes to accommodate the font size. */
TempButtonPntr = new (std::nothrow) BButton (TempRect, TempString, TempString, NULL);
if (TempButtonPntr != NULL) {
TempWindowPntr->Lock();
TempWindowPntr->AddChild (TempButtonPntr);
TempButtonPntr->GetPreferredSize (&JunkFloat, &g_ButtonHeight);
TempWindowPntr->RemoveChild (TempButtonPntr);
TempWindowPntr->Unlock();
delete TempButtonPntr;
}
TempTextPntr = new (std::nothrow) BTextControl (TempRect, TempString, NULL ,
TempString, NULL);
if (TempTextPntr != NULL) {
TempWindowPntr->Lock ();
TempWindowPntr->AddChild (TempTextPntr);
TempTextPntr->GetPreferredSize (&JunkFloat, &g_TextBoxHeight);
TempWindowPntr->RemoveChild (TempTextPntr);
TempWindowPntr->Unlock ();
delete TempTextPntr;
}
TempCheckBoxPntr = new (std::nothrow) BCheckBox (TempRect, TempString, TempString, NULL);
if (TempCheckBoxPntr != NULL) {
TempWindowPntr->Lock ();
TempWindowPntr->AddChild (TempCheckBoxPntr);
TempCheckBoxPntr->GetPreferredSize (&JunkFloat, &g_CheckBoxHeight);
TempWindowPntr->RemoveChild (TempCheckBoxPntr);
TempWindowPntr->Unlock ();
delete TempCheckBoxPntr;
}
TempRadioButtonPntr =
new (std::nothrow) BRadioButton (TempRect, TempString, TempString, NULL);
if (TempRadioButtonPntr != NULL) {
TempWindowPntr->Lock ();
TempWindowPntr->AddChild (TempRadioButtonPntr);
TempRadioButtonPntr->GetPreferredSize (&JunkFloat, &g_RadioButtonHeight);
TempWindowPntr->RemoveChild (TempRadioButtonPntr);
TempWindowPntr->Unlock ();
delete TempRadioButtonPntr;
}
TempMenuBarPntr = new (std::nothrow) BMenuBar (TempRect, TempString,
B_FOLLOW_LEFT | B_FOLLOW_TOP, B_ITEMS_IN_COLUMN,
true );
TempPopUpMenuPntr = new (std::nothrow) BPopUpMenu (TempString);
TempMenuItemPntr = new (std::nothrow) BMenuItem (TempString, new BMessage (12345), 'g');
if (TempMenuBarPntr != NULL && TempPopUpMenuPntr != NULL &&
TempMenuItemPntr != NULL)
{
TempPopUpMenuPntr->AddItem (TempMenuItemPntr);
TempMenuBarPntr->AddItem (TempPopUpMenuPntr);
TempWindowPntr->Lock ();
TempWindowPntr->AddChild (TempMenuBarPntr);
TempMenuBarPntr->GetPreferredSize (&JunkFloat, &g_PopUpMenuHeight);
TempWindowPntr->RemoveChild (TempMenuBarPntr);
TempWindowPntr->Unlock ();
delete TempMenuBarPntr;
}
TempWindowPntr->Lock ();
TempWindowPntr->Quit ();
SetPulseRate (500000);
if (g_CommandLineMode)
g_QuitCountdown = 0;
else
{
DatabaseWindowPntr = new (std::nothrow) DatabaseWindow ();
if (DatabaseWindowPntr == NULL) {
DisplayErrorMessage ("Unable to create window.");
g_QuitCountdown = 0;
} else {
DatabaseWindowPntr->Show ();
}
}
g_AppReadyToRunCompleted = true;
}
/* Given a mail component (body text, attachment, whatever), look for words in
it. If the tokenize mode specifies that it isn't one of the ones we are
looking for, just skip it. For container type components, recursively examine
their contents, up to the maximum depth specified. */
status_t
ABSApp::RecursivelyTokenizeMailComponent (
BMailComponent *ComponentPntr,
const char *OptionalFileName,
set<string> &WordSet,
char *ErrorMessage,
int RecursionLevel,
int MaxRecursionLevel)
{
char AttachmentName [B_FILE_NAME_LENGTH];
BMailAttachment *AttachmentPntr;
BMimeType ComponentMIMEType;
BMailContainer *ContainerPntr;
BMallocIO ContentsIO;
const char *ContentsBufferPntr;
size_t ContentsBufferSize;
status_t ErrorCode;
bool ExamineComponent;
const char *HeaderKeyPntr;
const char *HeaderValuePntr;
int i;
int j;
const char *NameExtension;
int NumComponents;
BMimeType TextAnyMIMEType ("text");
BMimeType TextPlainMIMEType ("text/plain");
if (ComponentPntr == NULL)
return B_OK;
/* Add the words in the header fields, which include things like the file
name of attachments, the encoding type, etc. */
if (m_TokenizeMode == TM_PLAIN_TEXT_HEADER ||
m_TokenizeMode == TM_ANY_TEXT_HEADER ||
m_TokenizeMode == TM_ALL_PARTS_HEADER ||
m_TokenizeMode == TM_JUST_HEADER)
{
for (i = 0; i < 1000; i++)
{
HeaderKeyPntr = ComponentPntr->HeaderAt (i);
if (HeaderKeyPntr == NULL)
break;
AddWordsToSet (HeaderKeyPntr, strlen (HeaderKeyPntr),
'H' , WordSet);
for (j = 0; j < 1000; j++)
{
HeaderValuePntr = ComponentPntr->HeaderField (HeaderKeyPntr, j);
if (HeaderValuePntr == NULL)
break;
AddWordsToSet (HeaderValuePntr, strlen (HeaderValuePntr),
'H', WordSet);
}
}
}
/* Get the MIME type of the component, to decide whether its contents are
worth examining for words. */
ErrorCode = ComponentPntr->MIMEType (&ComponentMIMEType);
if (ErrorCode != B_OK)
{
sprintf (ErrorMessage, "ABSApp::RecursivelyTokenizeMailComponent: "
"Unable to get MIME type at level %d in \"%s\"",
RecursionLevel, OptionalFileName);
return ErrorCode;
}
if (ComponentMIMEType.Type() == NULL)
{
/* No MIME type was specified. Treat the component as plain text if it is
the main body text, otherwise it would get ignored. */
if (NULL != dynamic_cast<BTextMailComponent *>(ComponentPntr))
ComponentMIMEType.SetType ("text/plain");
}
if (!TextAnyMIMEType.Contains (&ComponentMIMEType) &&
NULL != (AttachmentPntr = dynamic_cast<BMailAttachment *>(ComponentPntr)))
{
/* The MIME type isn't text, but the component could still be text sent as an
attachment (which is often base64 encoded). Use the file name extension to
see if it really is text. */
NameExtension = NULL;
if (AttachmentPntr->FileName (AttachmentName) >= 0)
NameExtension = strrchr (AttachmentName, '.');
if (NameExtension != NULL)
{
if (strcasecmp (NameExtension, ".txt") == 0)
ComponentMIMEType.SetType ("text/plain");
else if (strcasecmp (NameExtension, ".htm") == 0 ||
strcasecmp (NameExtension, ".html") == 0)
ComponentMIMEType.SetType ("text/html");
}
}
switch (m_TokenizeMode)
{
case TM_PLAIN_TEXT:
case TM_PLAIN_TEXT_HEADER:
ExamineComponent = TextPlainMIMEType.Contains (&ComponentMIMEType);
break;
case TM_ANY_TEXT:
case TM_ANY_TEXT_HEADER:
ExamineComponent = TextAnyMIMEType.Contains (&ComponentMIMEType);
break;
case TM_ALL_PARTS:
case TM_ALL_PARTS_HEADER:
ExamineComponent = true;
break;
default:
ExamineComponent = false;
break;
}
if (ExamineComponent)
{
/* Get the contents of the component. This is text (already converted
from whatever encoding was used) for text attachments. For other ones,
it's just the raw data, or perhaps decoded from base64 encoding. */
ContentsIO.SetBlockSize (16 * 1024);
ErrorCode = ComponentPntr->GetDecodedData (&ContentsIO);
if (ErrorCode == B_OK)
{
ContentsBufferPntr = (const char *) ContentsIO.Buffer ();
ContentsBufferSize = ContentsIO.BufferLength ();
if (ContentsBufferPntr != NULL )
AddWordsToSet (ContentsBufferPntr, ContentsBufferSize,
0 , WordSet);
}
}
if (RecursionLevel + 1 <= MaxRecursionLevel &&
NULL != (ContainerPntr = dynamic_cast<BMailContainer *>(ComponentPntr)))
{
NumComponents = ContainerPntr->CountComponents ();
for (i = 0; i < NumComponents; i++)
{
ComponentPntr = ContainerPntr->GetComponent (i);
ErrorCode = RecursivelyTokenizeMailComponent (ComponentPntr,
OptionalFileName, WordSet, ErrorMessage, RecursionLevel + 1,
MaxRecursionLevel);
if (ErrorCode != B_OK)
break;
}
}
return ErrorCode;
}
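/* Illustrative sketch only, not part of the program. When an attachment has a
non-text MIME type, the function above falls back on the file name extension to
decide whether it is really text. The sketch below shows that fallback as a
stand-alone helper. GuessTextTypeExample is a hypothetical name. Only the three
extensions tested above are recognised.

```cpp
#include <cstring>    // strrchr
#include <strings.h>  // strcasecmp (POSIX)

// Returns a guessed text MIME type for the file name, or NULL if the name has
// no extension or the extension is not one of the recognised ones.
const char *GuessTextTypeExample (const char *FileName)
{
  const char *Extension = strrchr (FileName, '.');
  if (Extension == NULL)
    return NULL;
  if (strcasecmp (Extension, ".txt") == 0)
    return "text/plain";
  if (strcasecmp (Extension, ".htm") == 0 ||
    strcasecmp (Extension, ".html") == 0)
    return "text/html";
  return NULL;
}
```

Using strrchr means only the last extension counts, so a name such as
"report.txt.zip" is not treated as text. */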
/* Files have been passed to the program, for example by drag and drop or via
Tracker's open-with menu item. If a file is a database type file, then change
the database file name to it. Otherwise, ask the user whether they want to
classify it as spam or non-spam. There will be at most around 100 files; BeOS
R5.0.3's Tracker crashes if it tries to pass on more than that many using Open
With... etc. The command is sent to an intermediary thread where it is
asynchronously converted into one or more scripting messages that are sent back
to this BApplication. The intermediary is needed since we can't recursively
execute scripting messages while processing a message (this RefsReceived
one). */
void
ABSApp::RefsReceived (BMessage *MessagePntr)
{
if (g_CommanderLooperPntr != NULL)
g_CommanderLooperPntr->CommandReferences (MessagePntr);
}
/* Decide which handler should get a scripting message. We take the ones
targeted at our database. */
BHandler * ABSApp::ResolveSpecifier (
BMessage *MessagePntr,
int32 Index,
BMessage *SpecifierMsgPntr,
int32 SpecificationKind,
const char *PropertyPntr)
{
int i;
if (SpecificationKind == B_DIRECT_SPECIFIER)
{
for (i = PN_MAX - 1; i >= 0; i--)
{
if (strcasecmp (PropertyPntr, g_PropertyNames [i]) == 0)
return this;
}
}
return BApplication::ResolveSpecifier (
MessagePntr, Index, SpecifierMsgPntr, SpecificationKind, PropertyPntr);
}
status_t ABSApp::SaveDatabaseIfNeeded (char *ErrorMessage)
{
if (m_DatabaseHasChanged)
return LoadSaveDatabase (false , ErrorMessage);
return B_OK;
}
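/* Tokenize an e-mail message read from the given stream (the file name is
optional and is only used in error messages). Break it into parts: header,
body and MIME components. Then add the words in the portions that match the
current tokenization settings to the set of words. */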
status_t ABSApp::TokenizeParts (
BPositionIO *PositionIOPntr,
const char *OptionalFileName,
set<string> &WordSet,
char *ErrorMessage)
{
status_t ErrorCode = B_OK;
BEmailMessage WholeEMail;
sprintf (ErrorMessage, "ABSApp::TokenizeParts: While getting e-mail "
"headers, had problems with \"%s\"", OptionalFileName);
ErrorCode = WholeEMail.SetToRFC822 (
PositionIOPntr ,
-1 , true );
if (ErrorCode < 0) goto ErrorExit;
ErrorCode = RecursivelyTokenizeMailComponent (&WholeEMail,
OptionalFileName, WordSet, ErrorMessage, 0 /* RecursionLevel */,
(m_TokenizeMode == TM_JUST_HEADER) ? 0 : 500 /* MaxRecursionLevel */);
ErrorExit:
return ErrorCode;
}
/* Add all the words in the given file or memory buffer to the supplied set.
The file doesn't have to be an e-mail message since it isn't parsed for e-mail
headers or MIME headers or anything. It blindly adds everything that looks
like a word, though it does convert quoted printable codes to the characters
they represent. See also AddWordsToSet which does something more advanced. */
status_t ABSApp::TokenizeWhole (
BPositionIO *PositionIOPntr,
const char *OptionalFileName,
set<string> &WordSet,
char *ErrorMessage)
{
string AccumulatedWord;
uint8 Buffer [16 * 1024];
uint8 *BufferCurrentPntr = Buffer + 0;
uint8 *BufferEndPntr = Buffer + 0;
const char *IOErrorString =
"TokenizeWhole: Error %ld while reading \"%s\"";
size_t Length;
int Letter = ' ';
char HexString [4];
int NextLetter = ' ';
int NextNextLetter = ' ';
/* Read the input through a buffer rather than one byte at a time.
BufferCurrentPntr is the position of the next character to be read. When it
reaches BufferEndPntr, it is time to fill the buffer again. */
#define ReadChar(CharVar) \
{ \
if (BufferCurrentPntr < BufferEndPntr) \
CharVar = *BufferCurrentPntr++; \
else /* Try to fill the buffer. */ \
{ \
ssize_t AmountRead; \
AmountRead = PositionIOPntr->Read (Buffer, sizeof (Buffer)); \
if (AmountRead < 0) \
{ \
sprintf (ErrorMessage, IOErrorString, AmountRead, OptionalFileName); \
return AmountRead; \
} \
else if (AmountRead == 0) \
CharVar = EOF; \
else \
{ \
BufferEndPntr = Buffer + AmountRead; \
BufferCurrentPntr = Buffer + 0; \
CharVar = *BufferCurrentPntr++; \
} \
} \
}
/* Read all the words in the input and add them to the set. A
set is used since we don't care how many times a word occurs. */
while (true)
{
/* Get the next character, with two characters of lookahead to detect quoted
printable characters (an equals sign followed by two hex digits or a new
line). Note that Letter can become EOF (-1) when end of file is reached. */
Letter = NextLetter;
NextLetter = NextNextLetter;
ReadChar (NextNextLetter);
/* Decode quoted printable codes first, so that the rest of the code just
sees an ordinary character. Or even nothing, if it is the hidden line
break combination. This may falsely corrupt stuff following an equals
sign, but usually won't. */
if (Letter == '=')
{
if ((NextLetter == '\r' && NextNextLetter == '\n') ||
(NextLetter == '\n' && NextNextLetter == '\r'))
{
ReadChar (NextLetter);
ReadChar (NextNextLetter);
continue;
}
if (NextLetter == '\n' || NextLetter == '\r')
{
NextLetter = NextNextLetter;
ReadChar (NextNextLetter);
continue;
}
if (NextNextLetter != EOF &&
isxdigit (NextLetter) && isxdigit (NextNextLetter))
{
HexString[0] = NextLetter;
HexString[1] = NextNextLetter;
HexString[2] = 0;
Letter = strtoul (HexString, NULL, 16 );
ReadChar (NextLetter);
ReadChar (NextNextLetter);
}
}
/* Convert to lower case to improve word matches. Of course this loses a
bit of information, such as MONEY vs Money, an indicator of spam. Well,
apparently that isn't all that useful a distinction, so do it. */
if (Letter >= 'A' && Letter <= 'Z')
Letter = Letter + ('a' - 'A');
/* See if it is a character we treat as a word separator: white space
and all punctuation except for: apostrophe (so "it's" and possessive
versions of words get stored), dash (for hyphenated words), dollar sign
(for cash amounts), period (for IP addresses, we later remove trailing
periods). Note that codes above 127 are UTF-8 characters, which we
consider non-space. */
if (Letter < 0 || (Letter < 128 && g_SpaceCharacters[Letter]))
{
while ((Length = AccumulatedWord.size()) > 0 &&
AccumulatedWord [Length-1] == '.')
AccumulatedWord.resize (Length - 1);
/* If there is an accumulated word, add it to the set. Ignore
words which are too big (it's probably some binary encoded data). But
leave room for supercalifragilisticexpialidocious. According to one web
site, pneumonoultramicroscopicsilicovolcanoconiosis is the longest word
currently in English. Note that some uuencoded data was seen with a 60
character line length. */
if (Length > 0 && Length <= g_MaxWordLength)
WordSet.insert (AccumulatedWord);
AccumulatedWord.resize (0);
}
else
AccumulatedWord.append (1 , (char) Letter);
/* Stop at end of file, now that the last word got processed. */
if (Letter == EOF)
break;
}
return B_OK;
}
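/* Illustrative sketch only, not part of the program. TokenizeWhole mixes three
steps in one loop: decoding quoted printable codes, converting to lower case,
and cutting the text into words. The sketch below does the same steps on a
std::string so they can be tried in isolation. DecodeQuotedPrintableExample,
IsSeparatorExample and SplitWordsExample are hypothetical names. The separator
rule (white space, control characters and punctuation other than apostrophe,
dash, dollar sign and period) is an assumption standing in for the
g_SpaceCharacters table, which is defined elsewhere.

```cpp
#include <cctype>
#include <cstdlib>
#include <cstring>
#include <set>
#include <string>

// Turns "=XX" into the byte with hex value XX and removes "=" followed by a
// line break (a soft line break). Anything else is copied unchanged.
std::string DecodeQuotedPrintableExample (const std::string &Input)
{
  std::string Output;
  size_t i = 0;
  while (i < Input.size ())
  {
    if (Input [i] == '=' && i + 1 < Input.size ())
    {
      char Next = Input [i + 1];
      char NextNext = (i + 2 < Input.size ()) ? Input [i + 2] : 0;
      if ((Next == '\r' && NextNext == '\n') ||
        (Next == '\n' && NextNext == '\r'))
      {
        i += 3;  // Skip the equals sign and a two character line break.
        continue;
      }
      if (Next == '\n' || Next == '\r')
      {
        i += 2;  // Skip the equals sign and a one character line break.
        continue;
      }
      if (isxdigit ((unsigned char) Next) &&
        isxdigit ((unsigned char) NextNext))
      {
        char HexString [3] = {Next, NextNext, 0};
        Output += (char) strtoul (HexString, NULL, 16);
        i += 3;
        continue;
      }
    }
    Output += Input [i++];
  }
  return Output;
}

// True for characters treated as word separators (assumed rule, see above).
static bool IsSeparatorExample (int Letter)
{
  if (Letter < 0)
    return true;   // End of input.
  if (Letter >= 128)
    return false;  // Part of a UTF-8 character, kept as word content.
  if (isspace (Letter) || iscntrl (Letter))
    return true;
  return ispunct (Letter) && strchr ("'-$.", Letter) == NULL;
}

// Splits Text into lower case words, trims trailing periods and drops words
// longer than MaxWordLength.
std::set<std::string> SplitWordsExample (const std::string &Text,
  size_t MaxWordLength)
{
  std::set<std::string> Words;
  std::string Word;
  for (size_t i = 0; i <= Text.size (); i++)
  {
    // Treat the end of the text as one final separator.
    int Letter = (i < Text.size ()) ? (unsigned char) Text [i] : -1;
    if (Letter >= 'A' && Letter <= 'Z')
      Letter += 'a' - 'A';
    if (IsSeparatorExample (Letter))
    {
      while (!Word.empty () && Word [Word.size () - 1] == '.')
        Word.resize (Word.size () - 1);
      if (!Word.empty () && Word.size () <= MaxWordLength)
        Words.insert (Word);
      Word.clear ();
    }
    else
      Word += (char) Letter;
  }
  return Words;
}
```

Decoding before splitting means an encoded space such as "=20" still ends a
word, which is why the real loop handles the equals sign first. */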
/*
* Implementation of the ClassificationChoicesWindow class, constructor,
* destructor and the rest of the member functions in mostly alphabetical
* order.
*/
ClassificationChoicesWindow::ClassificationChoicesWindow (
BRect FrameRect,
const char *FileName,
int NumberOfFiles)
: BWindow (FrameRect, "Classification Choices", B_TITLED_WINDOW,
B_NOT_ZOOMABLE | B_NOT_RESIZABLE | B_ASYNCHRONOUS_CONTROLS),
m_BulkModeSelectedPntr (NULL),
m_ChoosenClassificationPntr (NULL)
{
ClassificationChoicesView *SubViewPntr;
SubViewPntr = new ClassificationChoicesView (Bounds(),
FileName, NumberOfFiles);
AddChild (SubViewPntr);
SubViewPntr->ResizeToPreferred ();
ResizeTo (SubViewPntr->Frame().Width(), SubViewPntr->Frame().Height());
}
void
ClassificationChoicesWindow::MessageReceived (BMessage *MessagePntr)
{
BControl *ControlPntr;
if (MessagePntr->what >= MSG_CLASS_BUTTONS &&
MessagePntr->what < MSG_CLASS_BUTTONS + CL_MAX)
{
if (m_ChoosenClassificationPntr != NULL)
*m_ChoosenClassificationPntr =
(ClassificationTypes) (MessagePntr->what - MSG_CLASS_BUTTONS);
PostMessage (B_QUIT_REQUESTED);
return;
}
if (MessagePntr->what == MSG_BULK_CHECKBOX)
{
if (m_BulkModeSelectedPntr != NULL &&
MessagePntr->FindPointer ("source", (void **) &ControlPntr) == B_OK)
*m_BulkModeSelectedPntr = (ControlPntr->Value() == B_CONTROL_ON);
return;
}
if (MessagePntr->what == MSG_CANCEL_BUTTON)
{
PostMessage (B_QUIT_REQUESTED);
return;
}
BWindow::MessageReceived (MessagePntr);
}
void
ClassificationChoicesWindow::Go (
bool *BulkModeSelectedPntr,
ClassificationTypes *ChoosenClassificationPntr)
{
status_t ErrorCode = 0;
BView *MainViewPntr;
thread_id WindowThreadID;
m_BulkModeSelectedPntr = BulkModeSelectedPntr;
m_ChoosenClassificationPntr = ChoosenClassificationPntr;
if (m_ChoosenClassificationPntr != NULL)
*m_ChoosenClassificationPntr = CL_MAX;
Show ();
/* Move the window to the center of the screen
(have to wait for it to be showing). */
Lock ();
MainViewPntr = FindView ("ClassificationChoicesView");
if (MainViewPntr != NULL)
{
BRect TempRect;
BScreen TempScreen (this);
float X;
float Y;
TempRect = TempScreen.Frame ();
X = TempRect.Width() / 2;
Y = TempRect.Height() / 2;
TempRect = MainViewPntr->Frame();
X -= TempRect.Width() / 2;
Y -= TempRect.Height() / 2;
MoveTo (ceilf (X), ceilf (Y));
}
Unlock ();
WindowThreadID = Thread ();
if (WindowThreadID >= 0)
wait_for_thread (WindowThreadID, &ErrorCode);
}
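/* Illustrative sketch only, not part of the program. Go centers the window by
taking half of the screen size, subtracting half of the view size, and rounding
up to a whole pixel. The sketch below shows that arithmetic without any of the
window classes. ExamplePoint and CenterOnScreenExample are hypothetical names.

```cpp
#include <math.h>  // ceilf

// Hypothetical pair of coordinates for the top left corner of a window.
struct ExamplePoint
{
  float x;
  float y;
};

// Returns the top left corner that centers a view of the given size on a
// screen of the given size, rounded up to whole pixels.
ExamplePoint CenterOnScreenExample (float ScreenWidth, float ScreenHeight,
  float ViewWidth, float ViewHeight)
{
  ExamplePoint TopLeft;
  TopLeft.x = ceilf (ScreenWidth / 2 - ViewWidth / 2);
  TopLeft.y = ceilf (ScreenHeight / 2 - ViewHeight / 2);
  return TopLeft;
}
```

Rounding keeps the window on a pixel boundary when the difference between the
two sizes is odd. */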
/*
* Implementation of the ClassificationChoicesView class, constructor,
* destructor and the rest of the member functions in mostly alphabetical
* order.
*/
ClassificationChoicesView::ClassificationChoicesView (
BRect FrameRect,
const char *FileName,
int NumberOfFiles)
: BView (FrameRect, "ClassificationChoicesView",
B_FOLLOW_TOP | B_FOLLOW_LEFT, B_WILL_DRAW | B_NAVIGABLE_JUMP),
m_FileName (FileName),
m_NumberOfFiles (NumberOfFiles),
m_PreferredBottomY (ceilf (g_ButtonHeight * 10))
{
}
void
ClassificationChoicesView::AttachedToWindow ()
{
BButton *ButtonPntr;
BCheckBox *CheckBoxPntr;
ClassificationTypes Classification;
float Margin;
float RowHeight;
float RowTop;
BTextView *TextViewPntr;
BRect TempRect;
char TempString [2048];
BRect TextRect;
float X;
SetViewColor (ui_color (B_PANEL_BACKGROUND_COLOR));
RowHeight = g_ButtonHeight;
if (g_CheckBoxHeight > RowHeight)
RowHeight = g_CheckBoxHeight;
RowHeight = ceilf (RowHeight * 1.1);
TempRect = Bounds ();
RowTop = TempRect.top;
Margin = ceilf ((RowHeight - g_StringViewHeight) / 2);
TempRect = Bounds ();
TempRect.top = RowTop + Margin;
TextRect = TempRect;
TextRect.OffsetTo (0, 0);
TextRect.InsetBy (g_MarginBetweenControls, 2);
sprintf (TempString, "How do you want to classify the file named \"%s\"?",
m_FileName);
TextViewPntr = new BTextView (TempRect, "FileText", TextRect,
B_FOLLOW_TOP | B_FOLLOW_LEFT, B_WILL_DRAW | B_FULL_UPDATE_ON_RESIZE);
AddChild (TextViewPntr);
TextViewPntr->SetText (TempString);
TextViewPntr->MakeEditable (false);
TextViewPntr->SetViewColor (ui_color (B_PANEL_BACKGROUND_COLOR));
TextViewPntr->ResizeTo (TempRect.Width (),
3 + TextViewPntr->TextHeight (0, sizeof (TempString)));
RowTop = TextViewPntr->Frame().bottom + Margin;
Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
TempRect = Bounds ();
TempRect.top = RowTop + Margin;
X = Bounds().left + g_MarginBetweenControls;
for (Classification = (ClassificationTypes) 0; Classification < CL_MAX;
Classification = (ClassificationTypes) ((int) Classification + 1))
{
TempRect = Bounds ();
TempRect.top = RowTop + Margin;
TempRect.left = X;
sprintf (TempString, "%s Button",
g_ClassificationTypeNames [Classification]);
ButtonPntr = new BButton (TempRect, TempString,
g_ClassificationTypeNames [Classification], new BMessage (
ClassificationChoicesWindow::MSG_CLASS_BUTTONS + Classification));
AddChild (ButtonPntr);
ButtonPntr->ResizeToPreferred ();
X = ButtonPntr->Frame().right + 3 * g_MarginBetweenControls;
}
RowTop += ceilf (RowHeight * 1.2);
Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
TempRect = Bounds ();
TempRect.top = RowTop + Margin;
TempRect.left += g_MarginBetweenControls;
ButtonPntr = new BButton (TempRect, "Cancel Button",
"Cancel", new BMessage (ClassificationChoicesWindow::MSG_CANCEL_BUTTON));
AddChild (ButtonPntr);
ButtonPntr->ResizeToPreferred ();
X = ButtonPntr->Frame().right + g_MarginBetweenControls;
if (m_NumberOfFiles > 1)
{
Margin = ceilf ((RowHeight - g_CheckBoxHeight) / 2);
TempRect = Bounds ();
TempRect.top = RowTop + Margin;
TempRect.left = X;
sprintf (TempString, "Mark all %d remaining messages the same way.",
m_NumberOfFiles - 1);
CheckBoxPntr = new BCheckBox (TempRect, "BulkBox", TempString,
new BMessage (ClassificationChoicesWindow::MSG_BULK_CHECKBOX));
AddChild (CheckBoxPntr);
CheckBoxPntr->ResizeToPreferred ();
}
RowTop += RowHeight;
m_PreferredBottomY = RowTop;
}
void
ClassificationChoicesView::GetPreferredSize (float *width, float *height)
{
if (width != NULL)
*width = Bounds().Width();
if (height != NULL)
*height = m_PreferredBottomY;
}
/******************************************************************************
 * Implementation of the CommanderLooper class, constructor, destructor and the
 * rest of the member functions in mostly alphabetical order.
 */
CommanderLooper::CommanderLooper ()
: BLooper ("CommanderLooper", B_NORMAL_PRIORITY),
m_IsBusy (false)
{
}
CommanderLooper::~CommanderLooper ()
{
g_CommanderLooperPntr = NULL;
delete g_CommanderMessenger;
g_CommanderMessenger = NULL;
}
/* Handle command line arguments by packing them into a message and posting it
to this looper itself to do the work later. That way the caller can continue
doing whatever they're doing, particularly if it's the BApplication. */
void
CommanderLooper::CommandArguments (int argc, char **argv)
{
int i;
BMessage InternalMessage;
InternalMessage.what = MSG_COMMAND_ARGUMENTS;
for (i = 0; i < argc; i++)
InternalMessage.AddString ("arg", argv[i]);
PostMessage (&InternalMessage);
}
/* Handle file references by copying them out of the given message into a new
message to ourself (so that the original message can be returned to the caller,
and if it is Tracker, it can close the file handles it has open). Optionally
allow preset classification rather than asking the user (set BulkMode to TRUE
and specify the class with BulkClassification). */
void
CommanderLooper::CommandReferences (
BMessage *MessagePntr,
bool BulkMode,
ClassificationTypes BulkClassification)
{
entry_ref EntryRef;
int i;
BMessage InternalMessage;
InternalMessage.what = MSG_COMMAND_FILE_REFS;
for (i = 0; MessagePntr->FindRef ("refs", i, &EntryRef) == B_OK; i++)
InternalMessage.AddRef ("refs", &EntryRef);
InternalMessage.AddBool ("BulkMode", BulkMode);
InternalMessage.AddInt32 ("BulkClassification", BulkClassification);
PostMessage (&InternalMessage);
}
/* Lets other threads find out whether the CommanderLooper is currently
busy working on something. */
bool
CommanderLooper::IsBusy ()
{
if (m_IsBusy)
return true;
if (IsLocked () || !MessageQueue()->IsEmpty ())
return true;
return false;
}
void
CommanderLooper::MessageReceived (BMessage *MessagePntr)
{
m_IsBusy = true;
if (MessagePntr->what == MSG_COMMAND_ARGUMENTS)
ProcessArgs (MessagePntr);
else if (MessagePntr->what == MSG_COMMAND_FILE_REFS)
ProcessRefs (MessagePntr);
else
BLooper::MessageReceived (MessagePntr);
m_IsBusy = false;
}
/* Process the command line arguments: convert them into scripting
messages (possibly thousands) and send them to the BApplication synchronously
(so we can print the result). */
void
CommanderLooper::ProcessArgs (BMessage *MessagePntr)
{
int32 argc = 0;
const char **argv = NULL;
int ArgumentIndex;
uint32 CommandCode;
const char *CommandWord;
status_t ErrorCode;
const char *ErrorTitle = "ProcessArgs";
char *EndPntr;
int32 i;
BMessage ReplyMessage;
BMessage ScriptMessage;
struct property_info *PropInfoPntr;
const char *PropertyName;
bool TempBool;
float TempFloat;
int32 TempInt32;
const char *TempStringPntr;
type_code TypeCode;
const char *ValuePntr;
/* Copy the argument strings out of the message and into our argc and argv. */
ErrorCode = MessagePntr->GetInfo ("arg", &TypeCode, &argc);
if (ErrorCode != B_OK || TypeCode != B_STRING_TYPE)
{
DisplayErrorMessage ("Unable to find argument strings in message",
ErrorCode, ErrorTitle);
goto ErrorExit;
}
if (argc < 2)
{
cerr << PrintUsage;
DisplayErrorMessage ("You need to specify a command word, like GET, SET "
"and so on followed by a property, like DatabaseFile, and maybe "
"followed by a value of some sort", -1, ErrorTitle);
goto ErrorExit;
}
argv = (const char **) malloc (sizeof (char *) * argc);
if (argv == NULL)
{
DisplayErrorMessage ("Out of memory when allocating argv array",
ENOMEM, ErrorTitle);
goto ErrorExit;
}
for (i = 0; i < argc; i++)
{
if ((ErrorCode = MessagePntr->FindString ("arg", i, &argv[i])) != B_OK)
{
DisplayErrorMessage ("Unable to find argument in the BMessage",
ErrorCode, ErrorTitle);
goto ErrorExit;
}
}
CommandWord = argv[1];
if (strcasecmp (CommandWord, "quit") == 0)
{
g_QuitCountdown = 10;
goto ErrorExit;
}
if (strcasecmp (CommandWord, "set") == 0)
CommandCode = B_SET_PROPERTY;
else if (strcasecmp (CommandWord, "get") == 0)
CommandCode = B_GET_PROPERTY;
else if (strcasecmp (CommandWord, "count") == 0)
CommandCode = B_COUNT_PROPERTIES;
else if (strcasecmp (CommandWord, "create") == 0)
CommandCode = B_CREATE_PROPERTY;
else if (strcasecmp (CommandWord, "delete") == 0)
CommandCode = B_DELETE_PROPERTY;
else
CommandCode = B_EXECUTE_PROPERTY;
if (CommandCode == B_EXECUTE_PROPERTY)
{
PropertyName = CommandWord;
ArgumentIndex = 2;
}
else
{
if (CommandCode == B_SET_PROPERTY)
{
if (argc < 4)
{
cerr << PrintUsage;
DisplayErrorMessage ("SET commands require at least one "
"argument value after the property name", -1, ErrorTitle);
goto ErrorExit;
}
}
else
if (argc < 3)
{
cerr << PrintUsage;
DisplayErrorMessage ("You need to specify a property to act on",
-1, ErrorTitle);
goto ErrorExit;
}
PropertyName = argv[2];
ArgumentIndex = 3;
}
for (PropInfoPntr = g_ScriptingPropertyList + 0; true; PropInfoPntr++)
{
if (PropInfoPntr->name == 0)
{
cerr << PrintUsage;
DisplayErrorMessage ("The property specified isn't known or "
"doesn't support the requested action (usually means it is an "
"unknown command)", -1, ErrorTitle);
goto ErrorExit;
}
if (PropInfoPntr->commands[0] == CommandCode &&
strcasecmp (PropertyName, PropInfoPntr->name) == 0)
break;
}
/* Compose the scripting message for the command. If there are multiple
arguments, repeat the message for each single argument and just change the
data portion for each extra argument. Send the command and wait for a reply,
which we'll print out. */
ScriptMessage.MakeEmpty ();
ScriptMessage.what = CommandCode;
ScriptMessage.AddSpecifier (PropertyName);
while (true)
{
if (ArgumentIndex < argc)
{
ValuePntr = argv[ArgumentIndex];
if (strcasecmp (ValuePntr, "yes") == 0 ||
strcasecmp (ValuePntr, "true") == 0)
ScriptMessage.AddBool (g_DataName, true);
else if (strcasecmp (ValuePntr, "no") == 0 ||
strcasecmp (ValuePntr, "false") == 0)
ScriptMessage.AddBool (g_DataName, false);
else
{
i = strtol (ValuePntr, &EndPntr, 0);
if (*EndPntr == 0)
ScriptMessage.AddInt32 (g_DataName, i);
else
ScriptMessage.AddString (g_DataName, ValuePntr);
}
}
ErrorCode = be_app_messenger.SendMessage (&ScriptMessage, &ReplyMessage);
if (ErrorCode != B_OK)
{
DisplayErrorMessage ("Unable to send scripting command",
ErrorCode, ErrorTitle);
goto ErrorExit;
}
/* If the reply is from one of our commands, print the results to
standard output. */
if (ReplyMessage.FindString ("CommandText", &TempStringPntr) == B_OK)
{
TempInt32 = -1;
if (ReplyMessage.FindInt32 ("error", &TempInt32) == B_OK &&
TempInt32 == B_OK)
{
/* It worked; print the returned values for command line users to see. */
cout << "Result of command to " << TempStringPntr << " is:\t";
if (ReplyMessage.FindString (g_ResultName, &TempStringPntr) == B_OK)
cout << "\"" << TempStringPntr << "\"";
else if (ReplyMessage.FindInt32 (g_ResultName, &TempInt32) == B_OK)
cout << TempInt32;
else if (ReplyMessage.FindFloat (g_ResultName, &TempFloat) == B_OK)
cout << TempFloat;
else if (ReplyMessage.FindBool (g_ResultName, &TempBool) == B_OK)
cout << (TempBool ? "true" : "false");
else
cout << "just plain success";
if (ReplyMessage.FindInt32 ("count", &TempInt32) == B_OK)
cout << "\t(count " << TempInt32 << ")";
for (i = 0; (i < 50) &&
ReplyMessage.FindString ("words", i, &TempStringPntr) == B_OK &&
ReplyMessage.FindFloat ("ratios", i, &TempFloat) == B_OK;
i++)
{
if (i == 0)
cout << "\twith top words:\t";
else
cout << "\t";
cout << TempStringPntr << "/" << TempFloat;
}
cout << endl;
}
else
{
cout << "Failure of command " << TempStringPntr << ", error ";
cout << TempInt32 << " (" << strerror (TempInt32) << ")";
if (ReplyMessage.FindString ("message", &TempStringPntr) == B_OK)
cout << ", message: " << TempStringPntr;
cout << "." << endl;
}
}
ScriptMessage.RemoveName (g_DataName);
if (++ArgumentIndex >= argc)
break;
}
ErrorExit:
free (argv);
}
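/* Illustration only, not part of the program. ProcessArgs decides the type of
each command line value in a fixed order: the words yes, true, no and false
become booleans, text that strtol consumes completely becomes an int32, and
anything else stays a string. The sketch below reproduces that decision in
plain C++ so it can be tried without a BMessage. The names ArgumentKinds,
TypedArgument, ClassifyArgument and DemoClassifyArgument are hypothetical and
are assumptions made for this example, as is the use of nullptr (C++11).

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <strings.h>  // strcasecmp (POSIX)

// All names here are hypothetical and exist only for this illustration.
enum ArgumentKinds { AK_BOOL, AK_INT32, AK_STRING };

struct TypedArgument
{
  ArgumentKinds kind;
  bool boolValue;           // Meaningful when kind is AK_BOOL.
  long intValue;            // Meaningful when kind is AK_INT32.
  std::string stringValue;  // Always holds the original text.
};

// Decide how one command line value would be typed, using the same order of
// tests as ProcessArgs: boolean words first, then a whole-string integer, and
// finally a plain string.
TypedArgument ClassifyArgument (const char *ValuePntr)
{
  TypedArgument Result = { AK_STRING, false, 0, ValuePntr };
  char *EndPntr = nullptr;

  if (strcasecmp (ValuePntr, "yes") == 0 ||
  strcasecmp (ValuePntr, "true") == 0)
  {
    Result.kind = AK_BOOL;
    Result.boolValue = true;
  }
  else if (strcasecmp (ValuePntr, "no") == 0 ||
  strcasecmp (ValuePntr, "false") == 0)
  {
    Result.kind = AK_BOOL;
    Result.boolValue = false;
  }
  else
  {
    // Base 0 lets strtol accept decimal, octal (0...) and hex (0x...).
    long Number = strtol (ValuePntr, &EndPntr, 0);
    if (*EndPntr == 0)  // Nothing left over, so treat it as a number.
    {
      Result.kind = AK_INT32;
      Result.intValue = Number;
    }
  }
  return Result;
}

void DemoClassifyArgument ()
{
  assert (ClassifyArgument ("Yes").kind == AK_BOOL);
  assert (ClassifyArgument ("0x10").intValue == 16);
  assert (ClassifyArgument ("12abc").kind == AK_STRING);
}
```

One consequence worth knowing: an empty string leaves nothing unconsumed, so
under this rule it is typed as the integer 0 rather than as a string. */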
/* Process the given file references. If a reference is to a database
file, switch to using it as a database. Otherwise, treat them as text files
and add them to the database. Prompt the user for the spam or genuine or
uncertain (declassification) choice, with the option to bulk mark many files at
once. */
void
CommanderLooper::ProcessRefs (BMessage *MessagePntr)
{
bool BulkMode = false;
ClassificationTypes BulkClassification = CL_GENUINE;
ClassificationChoicesWindow *ChoiceWindowPntr;
BEntry Entry;
entry_ref EntryRef;
status_t ErrorCode;
const char *ErrorTitle = "CommanderLooper::ProcessRefs";
int32 NumberOfRefs = 0;
BPath Path;
int RefIndex;
BMessage ReplyMessage;
BMessage ScriptingMessage;
bool TempBool;
BFile TempFile;
int32 TempInt32;
char TempString [PATH_MAX + 1024];
type_code TypeCode;
TempInt32 = 0;
while (!g_AppReadyToRunCompleted && TempInt32++ < 10)
snooze (200000);
ErrorCode = MessagePntr->GetInfo ("refs", &TypeCode, &NumberOfRefs);
if (ErrorCode != B_OK || TypeCode != B_REF_TYPE || NumberOfRefs <= 0)
{
DisplayErrorMessage ("Unable to get refs from the message",
ErrorCode, ErrorTitle);
return;
}
if (MessagePntr->FindBool ("BulkMode", &TempBool) == B_OK)
BulkMode = TempBool;
if (MessagePntr->FindInt32 ("BulkClassification", &TempInt32) == B_OK &&
TempInt32 >= 0 && TempInt32 < CL_MAX)
BulkClassification = (ClassificationTypes) TempInt32;
for (RefIndex = 0;
MessagePntr->FindRef ("refs", RefIndex, &EntryRef) == B_OK;
RefIndex++)
{
ScriptingMessage.MakeEmpty ();
ScriptingMessage.what = 0;
ErrorCode = Entry.SetTo (&EntryRef, true );
if (ErrorCode != B_OK ||
((ErrorCode = B_ENTRY_NOT_FOUND) != 0 /* deliberate assignment; phrased this
way for mwcc -nwhitehorn */ && !Entry.Exists ()) ||
((ErrorCode = Entry.GetPath (&Path)) != B_OK))
{
DisplayErrorMessage ("Bad entry reference encountered, will skip it",
ErrorCode, ErrorTitle);
BulkMode = false;
continue;
}
/* See if the file is a database file by looking for the database recognition
text at the start of the file, in case someone has edited the file with a
spreadsheet or other tool and lost the MIME type. */
if (Entry.IsFile ())
{
ErrorCode = TempFile.SetTo (&Entry, B_READ_ONLY);
if (ErrorCode != B_OK)
{
sprintf (TempString, "Unable to open file \"%s\" for reading, will "
"skip it", Path.Path ());
DisplayErrorMessage (TempString, ErrorCode, ErrorTitle);
BulkMode = false;
continue;
}
if (TempFile.Read (TempString, strlen (g_DatabaseRecognitionString)) ==
(int) strlen (g_DatabaseRecognitionString) && strncmp (TempString,
g_DatabaseRecognitionString, strlen (g_DatabaseRecognitionString)) == 0)
{
ScriptingMessage.what = B_SET_PROPERTY;
ScriptingMessage.AddSpecifier (g_PropertyNames[PN_DATABASE_FILE]);
ScriptingMessage.AddString (g_DataName, Path.Path ());
}
TempFile.Unset ();
}
/* Not a database file, so treat it as a message file, which is
something to be marked spam or genuine. */
if (ScriptingMessage.what == 0)
{
if (!Entry.IsFile ())
{
sprintf (TempString, "\"%s\" is not a file, can't do anything with it",
Path.Path ());
DisplayErrorMessage (TempString, -1, ErrorTitle);
BulkMode = false;
continue;
}
if (!BulkMode)
{
ChoiceWindowPntr = new ClassificationChoicesWindow (
BRect (40, 40, 40 + 50 * g_MarginBetweenControls,
40 + g_ButtonHeight * 5), Path.Path (), NumberOfRefs - RefIndex);
ChoiceWindowPntr->Go (&BulkMode, &BulkClassification);
if (BulkClassification == CL_MAX)
break;
}
ScriptingMessage.what = B_SET_PROPERTY;
if (BulkClassification == CL_GENUINE)
ScriptingMessage.AddSpecifier (g_PropertyNames[PN_GENUINE]);
else if (BulkClassification == CL_SPAM)
ScriptingMessage.AddSpecifier (g_PropertyNames[PN_SPAM]);
else if (BulkClassification == CL_UNCERTAIN)
ScriptingMessage.AddSpecifier (g_PropertyNames[PN_UNCERTAIN]);
else
break;
ScriptingMessage.AddString (g_DataName, Path.Path ());
}
/* Send the scripting message to the BApplication and wait for the reply. The
BApplication will display any error messages for us. */
ErrorCode =
be_app_messenger.SendMessage (&ScriptingMessage, &ReplyMessage);
if (ErrorCode != B_OK)
{
DisplayErrorMessage ("Unable to send scripting command",
ErrorCode, ErrorTitle);
return;
}
/* If there was an error, go back to asking the user by turning off bulk
mode. The message will already have been displayed in an alert box, if
server mode is off. */
if (ReplyMessage.FindInt32 ("error", &TempInt32) != B_OK ||
TempInt32 != B_OK)
BulkMode = false;
}
}
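/* Illustration only, not part of the program. ProcessRefs recognises a
database file by reading as many bytes as the recognition string is long and
comparing them, treating a short read as "not a database". The sketch below
shows the same check on a standard stream instead of a BFile. The helper names
and the "EXAMPLE-DB" text are made up for this example; the real recognition
text is whatever g_DatabaseRecognitionString holds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not in spamdbm: true if Input begins with exactly the
// bytes of Signature. A short read counts as a mismatch.
bool StartsWithSignature (std::istream &Input, const std::string &Signature)
{
  if (Signature.empty ())
    return true;
  std::vector<char> Buffer (Signature.size ());
  Input.read (Buffer.data (), static_cast<std::streamsize> (Buffer.size ()));
  if (static_cast<std::size_t> (Input.gcount ()) != Signature.size ())
    return false;
  return std::memcmp (Buffer.data (), Signature.data (),
    Signature.size ()) == 0;
}

// Convenience wrapper so the check can be tried on in-memory text.
bool TextStartsWithSignature (const std::string &Text,
  const std::string &Signature)
{
  std::istringstream Input (Text);
  return StartsWithSignature (Input, Signature);
}

void DemoSignatureCheck ()
{
  const std::string Signature = "EXAMPLE-DB";  // Made-up recognition text.
  assert (TextStartsWithSignature ("EXAMPLE-DB v1\nword\t1\t2\n", Signature));
  assert (!TextStartsWithSignature ("EXA", Signature));
  assert (!TextStartsWithSignature ("From: someone\n", Signature));
}
```

Checking the leading bytes rather than the MIME type means a file that has
lost its attributes is still recognised, which is the case the comment in
ProcessRefs is worried about. */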
/******************************************************************************
 * Implementation of the ControlsView class, constructor, destructor and the
 * rest of the member functions in mostly alphabetical order.
 */
ControlsView::ControlsView (BRect NewBounds)
: BView (NewBounds, "ControlsView", B_FOLLOW_TOP | B_FOLLOW_LEFT_RIGHT,
B_WILL_DRAW | B_PULSE_NEEDED | B_NAVIGABLE_JUMP | B_FRAME_EVENTS),
m_AboutButtonPntr (NULL),
m_AddExampleButtonPntr (NULL),
m_BrowseButtonPntr (NULL),
m_BrowseFilePanelPntr (NULL),
m_CreateDatabaseButtonPntr (NULL),
m_DatabaseFileNameTextboxPntr (NULL),
m_DatabaseLoadDone (false),
m_EstimateSpamButtonPntr (NULL),
m_EstimateSpamFilePanelPntr (NULL),
m_GenuineCountTextboxPntr (NULL),
m_IgnorePreviousClassCheckboxPntr (NULL),
m_InstallThingsButtonPntr (NULL),
m_PurgeAgeTextboxPntr (NULL),
m_PurgeButtonPntr (NULL),
m_PurgePopularityTextboxPntr (NULL),
m_ResetToDefaultsButtonPntr (NULL),
m_ScoringModeMenuBarPntr (NULL),
m_ScoringModePopUpMenuPntr (NULL),
m_ServerModeCheckboxPntr (NULL),
m_SpamCountTextboxPntr (NULL),
m_TimeOfLastPoll (0),
m_TokenizeModeMenuBarPntr (NULL),
m_TokenizeModePopUpMenuPntr (NULL),
m_WordCountTextboxPntr (NULL)
{
}
ControlsView::~ControlsView ()
{
if (m_BrowseFilePanelPntr != NULL)
{
delete m_BrowseFilePanelPntr;
m_BrowseFilePanelPntr = NULL;
}
if (m_EstimateSpamFilePanelPntr != NULL)
{
delete m_EstimateSpamFilePanelPntr;
m_EstimateSpamFilePanelPntr = NULL;
}
}
void
ControlsView::AttachedToWindow ()
{
float BigPurgeButtonTop;
BMessage CommandMessage;
const char *EightDigitsString = " 12345678 ";
float Height;
float Margin;
float RowHeight;
float RowTop;
ScoringModes ScoringMode;
const char *StringPntr;
BMenuItem *TempMenuItemPntr;
BRect TempRect;
char TempString [PATH_MAX];
TokenizeModes TokenizeMode;
float Width;
float X;
SetViewColor (ui_color (B_PANEL_BACKGROUND_COLOR));
TempRect = Bounds ();
X = TempRect.right;
RowTop = TempRect.top;
RowHeight = g_ButtonHeight;
if (g_TextBoxHeight > RowHeight)
RowHeight = g_TextBoxHeight;
RowHeight = ceilf (RowHeight * 1.1);
/* Make the top row of controls (Create button, Browse button, file name box),
which are all database file related. */
Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
TempRect = Bounds ();
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_ButtonHeight;
CommandMessage.MakeEmpty ();
CommandMessage.what = B_CREATE_PROPERTY;
CommandMessage.AddSpecifier (g_PropertyNames[PN_DATABASE_FILE]);
m_CreateDatabaseButtonPntr = new BButton (TempRect, "Create Button",
"Create", new BMessage (CommandMessage), B_FOLLOW_RIGHT | B_FOLLOW_TOP);
if (m_CreateDatabaseButtonPntr == NULL) goto ErrorExit;
AddChild (m_CreateDatabaseButtonPntr);
m_CreateDatabaseButtonPntr->SetTarget (be_app);
m_CreateDatabaseButtonPntr->ResizeToPreferred ();
m_CreateDatabaseButtonPntr->GetPreferredSize (&Width, &Height);
m_CreateDatabaseButtonPntr->MoveTo (X - Width, TempRect.top);
X -= Width + g_MarginBetweenControls;
Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
TempRect = Bounds ();
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_ButtonHeight;
m_BrowseButtonPntr = new BButton (TempRect, "Browse Button",
"Browse…", new BMessage (MSG_BROWSE_BUTTON), B_FOLLOW_RIGHT | B_FOLLOW_TOP);
if (m_BrowseButtonPntr == NULL) goto ErrorExit;
AddChild (m_BrowseButtonPntr);
m_BrowseButtonPntr->SetTarget (this);
m_BrowseButtonPntr->ResizeToPreferred ();
m_BrowseButtonPntr->GetPreferredSize (&Width, &Height);
m_BrowseButtonPntr->MoveTo (X - Width, TempRect.top);
X -= Width + g_MarginBetweenControls;
Margin = ceilf ((RowHeight - g_TextBoxHeight) / 2);
TempRect = Bounds ();
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_TextBoxHeight;
TempRect.right = X;
StringPntr = "Word Database:";
strcpy (m_DatabaseFileNameCachedValue, "Unknown...");
m_DatabaseFileNameTextboxPntr = new BTextControl (TempRect,
"File Name",
StringPntr ,
m_DatabaseFileNameCachedValue ,
new BMessage (MSG_DATABASE_NAME),
B_FOLLOW_LEFT_RIGHT | B_FOLLOW_TOP,
B_WILL_DRAW | B_NAVIGABLE | B_NAVIGABLE_JUMP);
AddChild (m_DatabaseFileNameTextboxPntr);
m_DatabaseFileNameTextboxPntr->SetTarget (this);
m_DatabaseFileNameTextboxPntr->SetDivider (
be_plain_font->StringWidth (StringPntr) + g_MarginBetweenControls);
/* Second row: the purge age text box goes on the left side. There
is space to the right where the top half of the big purge button will go. */
RowTop += RowHeight ;
BigPurgeButtonTop = RowTop;
TempRect = Bounds ();
X = TempRect.left;
RowHeight = g_TextBoxHeight;
RowHeight = ceilf (RowHeight * 1.1);
StringPntr = "Number of occurrences needed to store a word:";
m_PurgeAgeCachedValue = 12345678;
Margin = ceilf ((RowHeight - g_TextBoxHeight) / 2);
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_TextBoxHeight;
TempRect.left = X;
TempRect.right = TempRect.left +
be_plain_font->StringWidth (StringPntr) +
be_plain_font->StringWidth (EightDigitsString) +
3 * g_MarginBetweenControls;
sprintf (TempString, "%d", (int) m_PurgeAgeCachedValue);
m_PurgeAgeTextboxPntr = new BTextControl (TempRect,
"Purge Age",
StringPntr ,
TempString ,
new BMessage (MSG_PURGE_AGE),
B_FOLLOW_LEFT | B_FOLLOW_TOP,
B_WILL_DRAW | B_NAVIGABLE);
AddChild (m_PurgeAgeTextboxPntr);
m_PurgeAgeTextboxPntr->SetTarget (this);
m_PurgeAgeTextboxPntr->SetDivider (
be_plain_font->StringWidth (StringPntr) + g_MarginBetweenControls);
/* Third row: the purge popularity text box, to the left of the big purge
button. */
RowTop += RowHeight ;
TempRect = Bounds ();
X = TempRect.left;
RowHeight = g_TextBoxHeight;
RowHeight = ceilf (RowHeight * 1.1);
StringPntr = "Number of messages to store words from:";
m_PurgePopularityCachedValue = 87654321;
Margin = ceilf ((RowHeight - g_TextBoxHeight) / 2);
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_TextBoxHeight;
TempRect.left = X;
TempRect.right = TempRect.left +
be_plain_font->StringWidth (StringPntr) +
be_plain_font->StringWidth (EightDigitsString) +
3 * g_MarginBetweenControls;
X = TempRect.right + g_MarginBetweenControls;
sprintf (TempString, "%d", (int) m_PurgePopularityCachedValue);
m_PurgePopularityTextboxPntr = new BTextControl (TempRect,
"Purge Popularity",
StringPntr ,
TempString ,
new BMessage (MSG_PURGE_POPULARITY),
B_FOLLOW_LEFT | B_FOLLOW_TOP,
B_WILL_DRAW | B_NAVIGABLE);
AddChild (m_PurgePopularityTextboxPntr);
m_PurgePopularityTextboxPntr->SetTarget (this);
m_PurgePopularityTextboxPntr->SetDivider (
be_plain_font->StringWidth (StringPntr) + g_MarginBetweenControls);
/* Make the big purge button, which sits beside the previous two rows
on the right side. Twice as tall as a regular button too. */
StringPntr = "Remove Old Words";
Margin = ceilf ((((RowTop + RowHeight) - BigPurgeButtonTop) -
2 * g_TextBoxHeight) / 2);
TempRect.top = BigPurgeButtonTop + Margin;
TempRect.bottom = TempRect.top + 2 * g_TextBoxHeight;
TempRect.left = X;
TempRect.right = X + ceilf (2 * be_plain_font->StringWidth (StringPntr));
CommandMessage.MakeEmpty ();
CommandMessage.what = B_EXECUTE_PROPERTY;
CommandMessage.AddSpecifier (g_PropertyNames[PN_PURGE]);
m_PurgeButtonPntr = new BButton (TempRect, "Purge Button",
StringPntr, new BMessage (CommandMessage), B_FOLLOW_LEFT | B_FOLLOW_TOP);
if (m_PurgeButtonPntr == NULL) goto ErrorExit;
m_PurgeButtonPntr->ResizeToPreferred();
AddChild (m_PurgeButtonPntr);
m_PurgeButtonPntr->SetTarget (be_app);
RowTop += RowHeight ;
TempRect = Bounds ();
X = TempRect.left;
RowHeight = g_CheckBoxHeight;
RowHeight = ceilf (RowHeight * 1.1);
StringPntr = "Allow Retraining on a Message";
m_IgnorePreviousClassCachedValue = false;
Margin = ceilf ((RowHeight - g_CheckBoxHeight) / 2);
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_CheckBoxHeight;
TempRect.left = X;
m_IgnorePreviousClassCheckboxPntr = new BCheckBox (TempRect,
"Ignore Check",
StringPntr,
new BMessage (MSG_IGNORE_CLASSIFICATION),
B_FOLLOW_TOP | B_FOLLOW_LEFT);
if (m_IgnorePreviousClassCheckboxPntr == NULL) goto ErrorExit;
AddChild (m_IgnorePreviousClassCheckboxPntr);
m_IgnorePreviousClassCheckboxPntr->SetTarget (this);
m_IgnorePreviousClassCheckboxPntr->ResizeToPreferred ();
m_IgnorePreviousClassCheckboxPntr->GetPreferredSize (&Width, &Height);
X += Width + g_MarginBetweenControls;
RowTop += RowHeight ;
TempRect = Bounds ();
RowHeight = g_CheckBoxHeight;
RowHeight = ceilf (RowHeight * 1.1);
StringPntr = "Print errors to Terminal";
m_ServerModeCachedValue = false;
Margin = ceilf ((RowHeight - g_CheckBoxHeight) / 2);
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_CheckBoxHeight;
m_ServerModeCheckboxPntr = new BCheckBox (TempRect,
"ServerMode Check",
StringPntr,
new BMessage (MSG_SERVER_MODE),
B_FOLLOW_TOP | B_FOLLOW_LEFT);
if (m_ServerModeCheckboxPntr == NULL) goto ErrorExit;
AddChild (m_ServerModeCheckboxPntr);
m_ServerModeCheckboxPntr->SetTarget (this);
m_ServerModeCheckboxPntr->ResizeToPreferred ();
m_ServerModeCheckboxPntr->GetPreferredSize (&Width, &Height);
/* Add the tokenize mode pop-up menu, which lists the name of each mode
and an explanation of what each mode does. */
RowTop += RowHeight ;
TempRect = Bounds ();
RowHeight = g_PopUpMenuHeight;
RowHeight = ceilf (RowHeight * 1.1);
Margin = ceilf ((RowHeight - g_PopUpMenuHeight) / 2);
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_PopUpMenuHeight;
m_TokenizeModeCachedValue = TM_MAX;
m_TokenizeModeMenuBarPntr = new BMenuBar (TempRect, "TokenizeModeMenuBar",
B_FOLLOW_LEFT_RIGHT | B_FOLLOW_TOP, B_ITEMS_IN_COLUMN,
false );
if (m_TokenizeModeMenuBarPntr == NULL) goto ErrorExit;
m_TokenizeModePopUpMenuPntr = new BPopUpMenu ("TokenizeModePopUpMenu");
if (m_TokenizeModePopUpMenuPntr == NULL) goto ErrorExit;
for (TokenizeMode = (TokenizeModes) 0;
TokenizeMode < TM_MAX;
TokenizeMode = (TokenizeModes) ((int) TokenizeMode + 1))
{
/* Each tokenize mode gets its own menu item. Selecting the item
will send a canned command to the application to switch to the appropriate
tokenize mode. An optional explanation of each mode is added to the mode
name string. */
CommandMessage.MakeEmpty ();
CommandMessage.what = B_SET_PROPERTY;
CommandMessage.AddSpecifier (g_PropertyNames[PN_TOKENIZE_MODE]);
CommandMessage.AddString (g_DataName, g_TokenizeModeNames[TokenizeMode]);
strcpy (TempString, g_TokenizeModeNames[TokenizeMode]);
switch (TokenizeMode)
{
case TM_WHOLE:
strcat (TempString, " - Scan everything");
break;
case TM_PLAIN_TEXT:
strcat (TempString, " - Scan e-mail body text except rich text");
break;
case TM_PLAIN_TEXT_HEADER:
strcat (TempString, " - Scan entire e-mail text except rich text");
break;
case TM_ANY_TEXT:
strcat (TempString, " - Scan e-mail body text and text attachments");
break;
case TM_ANY_TEXT_HEADER:
strcat (TempString, " - Scan entire e-mail text and text attachments (recommended)");
break;
case TM_ALL_PARTS:
strcat (TempString, " - Scan e-mail body and all attachments");
break;
case TM_ALL_PARTS_HEADER:
strcat (TempString, " - Scan all parts of the e-mail");
break;
case TM_JUST_HEADER:
strcat (TempString, " - Scan just the header (mail routing information)");
break;
default:
break;
}
TempMenuItemPntr =
new BMenuItem (TempString, new BMessage (CommandMessage));
if (TempMenuItemPntr == NULL) goto ErrorExit;
TempMenuItemPntr->SetTarget (be_app);
m_TokenizeModePopUpMenuPntr->AddItem (TempMenuItemPntr);
}
m_TokenizeModeMenuBarPntr->AddItem (m_TokenizeModePopUpMenuPntr);
AddChild (m_TokenizeModeMenuBarPntr);
/* Add the scoring mode pop-up menu, which lists the name of each mode
and an explanation of what each mode does. */
RowTop += RowHeight ;
TempRect = Bounds ();
RowHeight = g_PopUpMenuHeight;
RowHeight = ceilf (RowHeight * 1.1);
Margin = ceilf ((RowHeight - g_PopUpMenuHeight) / 2);
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_PopUpMenuHeight;
m_ScoringModeCachedValue = SM_MAX;
m_ScoringModeMenuBarPntr = new BMenuBar (TempRect, "ScoringModeMenuBar",
B_FOLLOW_LEFT_RIGHT | B_FOLLOW_TOP, B_ITEMS_IN_COLUMN,
false );
if (m_ScoringModeMenuBarPntr == NULL) goto ErrorExit;
m_ScoringModePopUpMenuPntr = new BPopUpMenu ("ScoringModePopUpMenu");
if (m_ScoringModePopUpMenuPntr == NULL) goto ErrorExit;
for (ScoringMode = (ScoringModes) 0;
ScoringMode < SM_MAX;
ScoringMode = (ScoringModes) ((int) ScoringMode + 1))
{
/* Each scoring mode gets its own menu item. Selecting the item
will send a canned command to the application to switch to the appropriate
scoring mode. An optional explanation of each mode is added to the mode
name string. */
CommandMessage.MakeEmpty ();
CommandMessage.what = B_SET_PROPERTY;
CommandMessage.AddSpecifier (g_PropertyNames[PN_SCORING_MODE]);
CommandMessage.AddString (g_DataName, g_ScoringModeNames[ScoringMode]);
strcpy (TempString, g_ScoringModeNames[ScoringMode]);
/* Earlier menu labels, superseded by the switch that follows:
switch (ScoringMode)
{
case SM_ROBINSON:
strcat (TempString, " - Learning Method 1: Naive Bayesian");
break;
case SM_CHISQUARED:
strcat (TempString, " - Learning Method 2: Chi-Squared");
break;
default:
break;
}
*/
switch (ScoringMode)
{
case SM_ROBINSON:
strcpy (TempString, "Learning method 1: Naive Bayesian");
break;
case SM_CHISQUARED:
strcpy (TempString, "Learning method 2: Chi-Squared");
break;
default:
break;
}
TempMenuItemPntr =
new BMenuItem (TempString, new BMessage (CommandMessage));
if (TempMenuItemPntr == NULL) goto ErrorExit;
TempMenuItemPntr->SetTarget (be_app);
m_ScoringModePopUpMenuPntr->AddItem (TempMenuItemPntr);
}
m_ScoringModeMenuBarPntr->AddItem (m_ScoringModePopUpMenuPntr);
AddChild (m_ScoringModeMenuBarPntr);
/* The next row has the install things button and the reset to defaults
button, one on the left and the other on the right. */
RowTop += RowHeight ;
TempRect = Bounds ();
RowHeight = g_ButtonHeight;
RowHeight = ceilf (RowHeight * 1.1);
Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_ButtonHeight;
CommandMessage.MakeEmpty ();
CommandMessage.what = B_EXECUTE_PROPERTY;
CommandMessage.AddSpecifier (g_PropertyNames[PN_INSTALL_THINGS]);
m_InstallThingsButtonPntr = new BButton (TempRect, "Install Button",
"Install spam types",
new BMessage (CommandMessage),
B_FOLLOW_LEFT | B_FOLLOW_TOP);
if (m_InstallThingsButtonPntr == NULL) goto ErrorExit;
AddChild (m_InstallThingsButtonPntr);
m_InstallThingsButtonPntr->SetTarget (be_app);
m_InstallThingsButtonPntr->ResizeToPreferred ();
Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
TempRect = Bounds ();
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_ButtonHeight;
CommandMessage.MakeEmpty ();
CommandMessage.what = B_EXECUTE_PROPERTY;
CommandMessage.AddSpecifier (g_PropertyNames[PN_RESET_TO_DEFAULTS]);
m_ResetToDefaultsButtonPntr = new BButton (TempRect, "Reset Button",
"Default settings", new BMessage (CommandMessage),
B_FOLLOW_RIGHT | B_FOLLOW_TOP);
if (m_ResetToDefaultsButtonPntr == NULL) goto ErrorExit;
AddChild (m_ResetToDefaultsButtonPntr);
m_ResetToDefaultsButtonPntr->SetTarget (be_app);
m_ResetToDefaultsButtonPntr->ResizeToPreferred ();
m_ResetToDefaultsButtonPntr->GetPreferredSize (&Width, &Height);
m_ResetToDefaultsButtonPntr->MoveTo (TempRect.right - Width, TempRect.top);
RowTop += RowHeight ;
TempRect = Bounds ();
X = TempRect.left;
RowHeight = g_ButtonHeight;
RowHeight = ceilf (RowHeight * 1.1);
Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_ButtonHeight;
TempRect.left = X;
m_EstimateSpamButtonPntr = new BButton (TempRect, "Estimate Button",
"Scan a message",
new BMessage (MSG_ESTIMATE_BUTTON),
B_FOLLOW_LEFT | B_FOLLOW_TOP);
if (m_EstimateSpamButtonPntr == NULL) goto ErrorExit;
AddChild (m_EstimateSpamButtonPntr);
m_EstimateSpamButtonPntr->SetTarget (this);
m_EstimateSpamButtonPntr->ResizeToPreferred ();
X = m_EstimateSpamButtonPntr->Frame().right + g_MarginBetweenControls;
/* The train on a message button. It sends the same message as the browse
button, but don't tell anyone that! */
Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_ButtonHeight;
TempRect.left = X;
m_AddExampleButtonPntr = new BButton (TempRect, "Example Button",
"Train spam filter on a message",
new BMessage (MSG_BROWSE_BUTTON),
B_FOLLOW_LEFT_RIGHT | B_FOLLOW_TOP,
B_WILL_DRAW | B_NAVIGABLE | B_FULL_UPDATE_ON_RESIZE);
if (m_AddExampleButtonPntr == NULL) goto ErrorExit;
AddChild (m_AddExampleButtonPntr);
m_AddExampleButtonPntr->SetTarget (this);
m_AddExampleButtonPntr->ResizeToPreferred ();
X = m_AddExampleButtonPntr->Frame().right + g_MarginBetweenControls;
Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
TempRect = Bounds ();
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_ButtonHeight;
TempRect.left = X;
m_AboutButtonPntr = new BButton (TempRect, "About Button",
"About…",
new BMessage (B_ABOUT_REQUESTED),
B_FOLLOW_RIGHT | B_FOLLOW_TOP);
if (m_AboutButtonPntr == NULL) goto ErrorExit;
AddChild (m_AboutButtonPntr);
m_AboutButtonPntr->SetTarget (be_app);
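/* The last row has the genuine, word and spam count text boxes, none of which
can be edited. Start with the genuine message count on the left. */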
RowTop += RowHeight ;
TempRect = Bounds ();
RowHeight = g_TextBoxHeight;
RowHeight = ceilf (RowHeight * 1.1);
StringPntr = "Genuine messages:";
m_GenuineCountCachedValue = 87654321;
sprintf (TempString, "%d", (int) m_GenuineCountCachedValue);
Margin = ceilf ((RowHeight - g_TextBoxHeight) / 2);
TempRect = Bounds ();
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_TextBoxHeight;
TempRect.right = TempRect.left +
be_plain_font->StringWidth (StringPntr) +
be_plain_font->StringWidth (TempString) +
3 * g_MarginBetweenControls;
m_GenuineCountTextboxPntr = new BTextControl (TempRect,
"Genuine count",
StringPntr ,
TempString ,
NULL ,
B_FOLLOW_LEFT | B_FOLLOW_TOP,
B_WILL_DRAW );
AddChild (m_GenuineCountTextboxPntr);
m_GenuineCountTextboxPntr->SetTarget (this);
m_GenuineCountTextboxPntr->SetDivider (
be_plain_font->StringWidth (StringPntr) + g_MarginBetweenControls);
m_GenuineCountTextboxPntr->SetEnabled (false);
StringPntr = "Word count:";
m_WordCountCachedValue = 87654321;
sprintf (TempString, "%d", (int) m_WordCountCachedValue);
Margin = ceilf ((RowHeight - g_TextBoxHeight) / 2);
TempRect = Bounds ();
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_TextBoxHeight;
Width = be_plain_font->StringWidth (StringPntr) +
be_plain_font->StringWidth (TempString) +
3 * g_MarginBetweenControls;
TempRect.left = ceilf ((TempRect.right - TempRect.left) / 2 - Width / 2);
TempRect.right = TempRect.left + Width;
m_WordCountTextboxPntr = new BTextControl (TempRect,
"Word count",
StringPntr ,
TempString ,
NULL ,
B_FOLLOW_H_CENTER | B_FOLLOW_TOP,
B_WILL_DRAW );
AddChild (m_WordCountTextboxPntr);
m_WordCountTextboxPntr->SetTarget (this);
m_WordCountTextboxPntr->SetDivider (
be_plain_font->StringWidth (StringPntr) + g_MarginBetweenControls);
m_WordCountTextboxPntr->SetEnabled (false);
StringPntr = "Spam messages:";
m_SpamCountCachedValue = 87654321;
sprintf (TempString, "%d", (int) m_SpamCountCachedValue);
Margin = ceilf ((RowHeight - g_TextBoxHeight) / 2);
TempRect = Bounds ();
TempRect.top = RowTop + Margin;
TempRect.bottom = TempRect.top + g_TextBoxHeight;
TempRect.left = TempRect.right -
be_plain_font->StringWidth (StringPntr) -
be_plain_font->StringWidth (TempString) -
3 * g_MarginBetweenControls;
m_SpamCountTextboxPntr = new BTextControl (TempRect,
"Spam count",
StringPntr ,
TempString ,
NULL ,
B_FOLLOW_RIGHT | B_FOLLOW_TOP,
B_WILL_DRAW );
AddChild (m_SpamCountTextboxPntr);
m_SpamCountTextboxPntr->SetTarget (this);
m_SpamCountTextboxPntr->SetDivider (
be_plain_font->StringWidth (StringPntr) + g_MarginBetweenControls);
m_SpamCountTextboxPntr->SetEnabled (false);
/* Resize this view so that it is just tall enough for the rows of controls and
buttons. */
RowTop += RowHeight ;
ResizeTo (Bounds().Width(), RowTop - Bounds().top + 1);
return;
ErrorExit:
DisplayErrorMessage ("Unable to initialise the controls view.");
}
void
ControlsView::BrowseForDatabaseFile ()
{
if (m_BrowseFilePanelPntr == NULL)
{
BEntry DirectoryEntry;
entry_ref DirectoryEntryRef;
BMessage GetDatabasePathCommand;
BMessage GetDatabasePathResult;
const char *StringPntr = NULL;
/* Create a new file panel. First set up the entry ref stuff so that the
file panel can open to show the initial directory (the one where the
database file currently is). Note that we have to create it after the
window and view are up and running, otherwise the BMessenger won't point to
a valid looper/handler. First find out the current database file name to
use as a starting point. */
GetDatabasePathCommand.what = B_GET_PROPERTY;
GetDatabasePathCommand.AddSpecifier (g_PropertyNames[PN_DATABASE_FILE]);
be_app_messenger.SendMessage (&GetDatabasePathCommand,
&GetDatabasePathResult, 5000000 ,
5000000 );
if (GetDatabasePathResult.FindString (g_ResultName, &StringPntr) != B_OK ||
DirectoryEntry.SetTo (StringPntr) != B_OK ||
DirectoryEntry.GetParent (&DirectoryEntry) != B_OK)
DirectoryEntry.SetTo (".");
if (DirectoryEntry.GetRef (&DirectoryEntryRef) != B_OK)
{
DisplayErrorMessage (
"Unable to set up the file requestor starting directory. Sorry.");
return;
}
m_BrowseFilePanelPntr = new BFilePanel (
B_OPEN_PANEL ,
&be_app_messenger ,
&DirectoryEntryRef ,
B_FILE_NODE,
true ,
NULL ,
NULL ,
false ,
true );
}
if (m_BrowseFilePanelPntr != NULL)
m_BrowseFilePanelPntr->Show ();
}
void
ControlsView::BrowseForFileToEstimate ()
{
if (m_EstimateSpamFilePanelPntr == NULL)
{
BEntry DirectoryEntry;
entry_ref DirectoryEntryRef;
status_t ErrorCode;
BMessenger MessengerToSelf (this);
BPath PathToMailDirectory;
/* Create a new file panel. First set up the entry ref stuff so that the
file panel can open to show the initial directory (the user's mail
directory). Note that we have to create the panel after the window and
view are up and running, otherwise the BMessenger won't point to a valid
looper/handler. */
ErrorCode = find_directory (B_USER_DIRECTORY, &PathToMailDirectory);
if (ErrorCode == B_OK)
{
PathToMailDirectory.Append ("mail");
ErrorCode = DirectoryEntry.SetTo (PathToMailDirectory.Path(),
true /* traverse symbolic links */);
if (ErrorCode != B_OK || !DirectoryEntry.Exists ())
{
find_directory (B_USER_DIRECTORY, &PathToMailDirectory);
ErrorCode = DirectoryEntry.SetTo (PathToMailDirectory.Path(), true);
}
}
if (ErrorCode != B_OK)
PathToMailDirectory.SetTo (".");
DirectoryEntry.SetTo (PathToMailDirectory.Path(), true);
if (DirectoryEntry.GetRef (&DirectoryEntryRef) != B_OK)
{
DisplayErrorMessage (
"Unable to set up the file requestor starting directory. Sorry.");
return;
}
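m_EstimateSpamFilePanelPntr = new BFilePanel (
B_OPEN_PANEL /* mode */,
&MessengerToSelf /* target for event messages */,
&DirectoryEntryRef /* starting directory */,
B_FILE_NODE /* node flavours */,
true /* allow multiple selection */,
new BMessage (MSG_ESTIMATE_FILE_REFS) /* message */,
NULL /* ref filter */,
false /* modal */,
true /* hide when done */);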
}
if (m_EstimateSpamFilePanelPntr != NULL)
m_EstimateSpamFilePanelPntr->Show ();
}
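/* The view has been resized. The pop-up menus have to be adjusted manually to
show the new size (the sub-items need to be resized too). Then make it redraw.
Well, actually just resetting the mark on the current item will resize it
properly, so invalidate the cached modes and let the next poll re-mark them. */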
void
ControlsView::FrameResized (float, float)
{
m_ScoringModeCachedValue = SM_MAX;
m_TokenizeModeCachedValue = TM_MAX;
}
void
ControlsView::MessageReceived (BMessage *MessagePntr)
{
BMessage CommandMessage;
bool TempBool;
uint32 TempUint32;
switch (MessagePntr->what)
{
case MSG_BROWSE_BUTTON:
BrowseForDatabaseFile ();
break;
case MSG_DATABASE_NAME:
if (strcmp (m_DatabaseFileNameCachedValue,
m_DatabaseFileNameTextboxPntr->Text ()) != 0)
SubmitCommandString (PN_DATABASE_FILE, B_SET_PROPERTY,
m_DatabaseFileNameTextboxPntr->Text ());
break;
case MSG_ESTIMATE_BUTTON:
BrowseForFileToEstimate ();
break;
case MSG_ESTIMATE_FILE_REFS:
EstimateRefFilesAndDisplay (MessagePntr);
break;
case MSG_IGNORE_CLASSIFICATION:
TempBool = (m_IgnorePreviousClassCheckboxPntr->Value() == B_CONTROL_ON);
if (m_IgnorePreviousClassCachedValue != TempBool)
SubmitCommandBool (PN_IGNORE_PREVIOUS_CLASSIFICATION,
B_SET_PROPERTY, TempBool);
break;
case MSG_PURGE_AGE:
TempUint32 = strtoul (m_PurgeAgeTextboxPntr->Text (), NULL, 10);
if (m_PurgeAgeCachedValue != TempUint32)
SubmitCommandInt32 (PN_PURGE_AGE, B_SET_PROPERTY, TempUint32);
break;
case MSG_PURGE_POPULARITY:
TempUint32 = strtoul (m_PurgePopularityTextboxPntr->Text (), NULL, 10);
if (m_PurgePopularityCachedValue != TempUint32)
SubmitCommandInt32 (PN_PURGE_POPULARITY, B_SET_PROPERTY, TempUint32);
break;
case MSG_SERVER_MODE:
TempBool = (m_ServerModeCheckboxPntr->Value() == B_CONTROL_ON);
if (m_ServerModeCachedValue != TempBool)
SubmitCommandBool (PN_SERVER_MODE, B_SET_PROPERTY, TempBool);
break;
default:
BView::MessageReceived (MessagePntr);
}
}
/* Check the server for changes in the state of the database. If there are
any changes, update the displayed values. Since this is a read only
examination of the server, we go directly to the application rather than
sending it messages. Also, when sending messages, we can't find out what it is
doing while it is busy with a batch of spam additions (all the spam add
commands will be in the queue ahead of our requests for info). Instead, we
lock the BApplication (so it isn't changing things while we're looking) and
retrieve our values. */
void
ControlsView::PollServerForChanges ()
{
ABSApp *MyAppPntr;
BMenuItem *TempMenuItemPntr;
char TempString [PATH_MAX];
BWindow *WindowPntr;
WindowPntr = Window ();
if (WindowPntr == NULL)
return;
/* Check the server mode flag. If server mode gets turned on, the window is
minimized. Similarly, if it gets turned off, the window is restored. Note that
the user can restore the window manually, even while still in server mode.
*/
if (g_ServerMode != m_ServerModeCachedValue &&
m_ServerModeCheckboxPntr != NULL)
{
m_ServerModeCachedValue = g_ServerMode;
m_ServerModeCheckboxPntr->SetValue (
m_ServerModeCachedValue ? B_CONTROL_ON : B_CONTROL_OFF);
WindowPntr->Minimize (m_ServerModeCachedValue);
}
if (WindowPntr->IsMinimized ())
return;
/* Ask the server to load the database, since there is little to display if
nothing is there. But only do it once, so the user doesn't get a lot of
invalid database messages if one doesn't exist yet. In server mode, we never
get this far so it is only loaded when the user wants to see something. */
if (!m_DatabaseLoadDone)
{
m_DatabaseLoadDone = true;
SubmitCommandString (PN_DATABASE_FILE, B_COUNT_PROPERTIES, "");
}
/* Simple numeric settings can be read directly from the BApplication
without having to lock it. This is useful for displaying the number of words
as it is changing. First up is the purge age setting. */
MyAppPntr = dynamic_cast<ABSApp *> (be_app);
if (MyAppPntr == NULL)
return;
if (MyAppPntr->m_PurgeAge != m_PurgeAgeCachedValue &&
m_PurgeAgeTextboxPntr != NULL)
{
m_PurgeAgeCachedValue = MyAppPntr->m_PurgeAge;
sprintf (TempString, "%" B_PRIu32, m_PurgeAgeCachedValue);
m_PurgeAgeTextboxPntr->SetText (TempString);
}
if (MyAppPntr->m_PurgePopularity != m_PurgePopularityCachedValue &&
m_PurgePopularityTextboxPntr != NULL)
{
m_PurgePopularityCachedValue = MyAppPntr->m_PurgePopularity;
sprintf (TempString, "%" B_PRIu32, m_PurgePopularityCachedValue);
m_PurgePopularityTextboxPntr->SetText (TempString);
}
if (MyAppPntr->m_IgnorePreviousClassification !=
m_IgnorePreviousClassCachedValue &&
m_IgnorePreviousClassCheckboxPntr != NULL)
{
m_IgnorePreviousClassCachedValue =
MyAppPntr->m_IgnorePreviousClassification;
m_IgnorePreviousClassCheckboxPntr->SetValue (
m_IgnorePreviousClassCachedValue ? B_CONTROL_ON : B_CONTROL_OFF);
}
if (MyAppPntr->m_TotalGenuineMessages != m_GenuineCountCachedValue &&
m_GenuineCountTextboxPntr != NULL)
{
m_GenuineCountCachedValue = MyAppPntr->m_TotalGenuineMessages;
sprintf (TempString, "%" B_PRIu32, m_GenuineCountCachedValue);
m_GenuineCountTextboxPntr->SetText (TempString);
}
if (MyAppPntr->m_TotalSpamMessages != m_SpamCountCachedValue &&
m_SpamCountTextboxPntr != NULL)
{
m_SpamCountCachedValue = MyAppPntr->m_TotalSpamMessages;
sprintf (TempString, "%" B_PRIu32, m_SpamCountCachedValue);
m_SpamCountTextboxPntr->SetText (TempString);
}
if (MyAppPntr->m_WordCount != m_WordCountCachedValue &&
m_WordCountTextboxPntr != NULL)
{
m_WordCountCachedValue = MyAppPntr->m_WordCount;
sprintf (TempString, "%" B_PRIu32, m_WordCountCachedValue);
m_WordCountTextboxPntr->SetText (TempString);
}
if (MyAppPntr->m_TokenizeMode != m_TokenizeModeCachedValue &&
m_TokenizeModePopUpMenuPntr != NULL)
{
m_TokenizeModeCachedValue = MyAppPntr->m_TokenizeMode;
TempMenuItemPntr =
m_TokenizeModePopUpMenuPntr->ItemAt ((int) m_TokenizeModeCachedValue);
if (TempMenuItemPntr != NULL)
TempMenuItemPntr->SetMarked (true);
}
if (MyAppPntr->m_ScoringMode != m_ScoringModeCachedValue &&
m_ScoringModePopUpMenuPntr != NULL)
{
m_ScoringModeCachedValue = MyAppPntr->m_ScoringMode;
TempMenuItemPntr =
m_ScoringModePopUpMenuPntr->ItemAt ((int) m_ScoringModeCachedValue);
if (TempMenuItemPntr != NULL)
TempMenuItemPntr->SetMarked (true);
}
/* Lock the application. This will stop it from processing any further
messages until we are done. Or if it is busy, the lock will fail. */
if (MyAppPntr->LockWithTimeout (100000) != B_OK)
return;
if (strcmp (MyAppPntr->m_DatabaseFileName.String (),
m_DatabaseFileNameCachedValue) != 0 &&
m_DatabaseFileNameTextboxPntr != NULL)
{
strcpy (m_DatabaseFileNameCachedValue,
MyAppPntr->m_DatabaseFileName.String ());
m_DatabaseFileNameTextboxPntr->SetText (m_DatabaseFileNameCachedValue);
WindowPntr->SetTitle (m_DatabaseFileNameCachedValue);
}
MyAppPntr->Unlock ();
}
void
ControlsView::Pulse ()
{
if (system_time () > m_TimeOfLastPoll + 200000)
{
PollServerForChanges ();
m_TimeOfLastPoll = system_time ();
}
}
/******************************************************************************
 * Implementation of the DatabaseWindow class, constructor, destructor and the
 * rest of the member functions in mostly alphabetical order.
 */
DatabaseWindow::DatabaseWindow ()
: BWindow (BRect (30, 30, 620, 400),
"Haiku spam filter server",
B_DOCUMENT_WINDOW, B_ASYNCHRONOUS_CONTROLS)
{
BRect TempRect;
m_ControlsViewPntr = new ControlsView (Bounds ());
if (m_ControlsViewPntr == NULL)
goto ErrorExit;
AddChild (m_ControlsViewPntr);
TempRect = Bounds ();
TempRect.top = m_ControlsViewPntr->Frame().bottom + 1;
m_WordsViewPntr = new WordsView (TempRect);
if (m_WordsViewPntr == NULL)
goto ErrorExit;
AddChild (m_WordsViewPntr);
/* Minimize the window if we are starting up in server mode. This is done
before the window is open so it doesn't flash onto the screen, and possibly
steal a keystroke or two. The ControlsView will further update the minimize
mode when it detects changes in the server mode. */
Minimize (g_ServerMode);
return;
ErrorExit:
DisplayErrorMessage ("Unable to initialise the window contents.");
}
void
DatabaseWindow::MessageReceived (BMessage *MessagePntr)
{
if (MessagePntr->what == B_MOUSE_WHEEL_CHANGED)
{
/* Pass the mouse wheel message down to the words view, since that's the only
one which does scrolling so we don't need to worry about whether it has
focus or not. */
if (m_WordsViewPntr != NULL)
m_WordsViewPntr->MessageReceived (MessagePntr);
}
else
BWindow::MessageReceived (MessagePntr);
}
bool
DatabaseWindow::QuitRequested ()
{
be_app->PostMessage (B_QUIT_REQUESTED);
return true;
}
/******************************************************************************
 * Implementation of the word display view.
 */
WordsView::WordsView (BRect NewBounds)
: BView (NewBounds, "WordsView", B_FOLLOW_ALL_SIDES,
B_WILL_DRAW | B_FULL_UPDATE_ON_RESIZE | B_NAVIGABLE | B_PULSE_NEEDED),
m_ArrowLineDownPntr (NULL),
m_ArrowLineUpPntr (NULL),
m_ArrowPageDownPntr (NULL),
m_ArrowPageUpPntr (NULL),
m_LastTimeAKeyWasPressed (0)
{
font_height TempFontHeight;
GetFont (&m_TextFont);
m_TextFont.SetSize (ceilf (m_TextFont.Size() * 1.1));
m_TextFont.GetHeight (&TempFontHeight);
SetFont (&m_TextFont);
m_LineHeight = ceilf (TempFontHeight.ascent +
TempFontHeight.descent + TempFontHeight.leading);
m_AscentHeight = ceilf (TempFontHeight.ascent);
m_TextHeight = ceilf (TempFontHeight.ascent +
TempFontHeight.descent);
m_FocusedColour.red = 255;
m_FocusedColour.green = 255;
m_FocusedColour.blue = 255;
m_FocusedColour.alpha = 255;
m_UnfocusedColour.red = 245;
m_UnfocusedColour.green = 245;
m_UnfocusedColour.blue = 255;
m_UnfocusedColour.alpha = 255;
m_BackgroundColour = m_UnfocusedColour;
SetViewColor (m_BackgroundColour);
SetLowColor (m_BackgroundColour);
SetHighColor (0, 0, 0);
strcpy (m_FirstDisplayedWord, "a");
}
void
WordsView::AttachedToWindow ()
{
BPolygon DownLinePolygon (g_DownLinePoints,
sizeof (g_DownLinePoints) /
sizeof (g_DownLinePoints[0]));
BPolygon DownPagePolygon (g_DownPagePoints,
sizeof (g_DownPagePoints) /
sizeof (g_DownPagePoints[0]));
BPolygon UpLinePolygon (g_UpLinePoints,
sizeof (g_UpLinePoints) /
sizeof (g_UpLinePoints[0]));
BPolygon UpPagePolygon (g_UpPagePoints,
sizeof (g_UpPagePoints) /
sizeof (g_UpPagePoints[0]));
BPicture TempOffPicture;
BPicture TempOnPicture;
BRect TempRect;
/* Create the arrow buttons for moving the word list forwards or
backwards a word or a page of words. They're the width of the scroll
bar area on the right, but twice as tall as usual, since there is no scroll
bar and that will make it easier to use them. First the up a line button. */
SetHighColor (0, 0, 0);
BeginPicture (&TempOffPicture);
FillPolygon (&UpLinePolygon);
SetHighColor (180, 180, 180);
StrokePolygon (&UpLinePolygon);
EndPicture ();
SetHighColor (128, 128, 128);
BeginPicture (&TempOnPicture);
FillPolygon (&UpLinePolygon);
EndPicture ();
TempRect = Bounds ();
TempRect.bottom = TempRect.top + 2 * B_H_SCROLL_BAR_HEIGHT;
TempRect.left = TempRect.right - B_V_SCROLL_BAR_WIDTH;
m_ArrowLineUpPntr = new BPictureButton (TempRect, "Up Line",
&TempOffPicture, &TempOnPicture,
new BMessage (MSG_LINE_UP), B_ONE_STATE_BUTTON,
B_FOLLOW_RIGHT | B_FOLLOW_TOP, B_WILL_DRAW | B_NAVIGABLE);
if (m_ArrowLineUpPntr == NULL) goto ErrorExit;
AddChild (m_ArrowLineUpPntr);
m_ArrowLineUpPntr->SetTarget (this);
SetHighColor (0, 0, 0);
BeginPicture (&TempOffPicture);
FillPolygon (&UpPagePolygon);
SetHighColor (180, 180, 180);
StrokePolygon (&UpPagePolygon);
EndPicture ();
SetHighColor (128, 128, 128);
BeginPicture (&TempOnPicture);
FillPolygon (&UpPagePolygon);
EndPicture ();
TempRect = Bounds ();
TempRect.top += 2 * B_H_SCROLL_BAR_HEIGHT + 1;
TempRect.bottom = TempRect.top + 2 * B_H_SCROLL_BAR_HEIGHT;
TempRect.left = TempRect.right - B_V_SCROLL_BAR_WIDTH;
m_ArrowPageUpPntr = new BPictureButton (TempRect, "Up Page",
&TempOffPicture, &TempOnPicture,
new BMessage (MSG_PAGE_UP), B_ONE_STATE_BUTTON,
B_FOLLOW_RIGHT | B_FOLLOW_TOP, B_WILL_DRAW | B_NAVIGABLE);
if (m_ArrowPageUpPntr == NULL) goto ErrorExit;
AddChild (m_ArrowPageUpPntr);
m_ArrowPageUpPntr->SetTarget (this);
SetHighColor (0, 0, 0);
BeginPicture (&TempOffPicture);
FillPolygon (&DownPagePolygon);
SetHighColor (180, 180, 180);
StrokePolygon (&DownPagePolygon);
EndPicture ();
SetHighColor (128, 128, 128);
BeginPicture (&TempOnPicture);
FillPolygon (&DownPagePolygon);
EndPicture ();
TempRect = Bounds ();
TempRect.bottom -= 3 * B_H_SCROLL_BAR_HEIGHT + 1;
TempRect.top = TempRect.bottom - 2 * B_H_SCROLL_BAR_HEIGHT;
TempRect.left = TempRect.right - B_V_SCROLL_BAR_WIDTH;
m_ArrowPageDownPntr = new BPictureButton (TempRect, "Down Page",
&TempOffPicture, &TempOnPicture,
new BMessage (MSG_PAGE_DOWN), B_ONE_STATE_BUTTON,
B_FOLLOW_RIGHT | B_FOLLOW_BOTTOM, B_WILL_DRAW | B_NAVIGABLE);
if (m_ArrowPageDownPntr == NULL) goto ErrorExit;
AddChild (m_ArrowPageDownPntr);
m_ArrowPageDownPntr->SetTarget (this);
SetHighColor (0, 0, 0);
BeginPicture (&TempOffPicture);
FillPolygon (&DownLinePolygon);
SetHighColor (180, 180, 180);
StrokePolygon (&DownLinePolygon);
EndPicture ();
SetHighColor (128, 128, 128);
BeginPicture (&TempOnPicture);
FillPolygon (&DownLinePolygon);
EndPicture ();
TempRect = Bounds ();
TempRect.bottom -= B_H_SCROLL_BAR_HEIGHT;
TempRect.top = TempRect.bottom - 2 * B_H_SCROLL_BAR_HEIGHT;
TempRect.left = TempRect.right - B_V_SCROLL_BAR_WIDTH;
m_ArrowLineDownPntr = new BPictureButton (TempRect, "Down Line",
&TempOffPicture, &TempOnPicture,
new BMessage (MSG_LINE_DOWN), B_ONE_STATE_BUTTON,
B_FOLLOW_RIGHT | B_FOLLOW_BOTTOM, B_WILL_DRAW | B_NAVIGABLE);
if (m_ArrowLineDownPntr == NULL) goto ErrorExit;
AddChild (m_ArrowLineDownPntr);
m_ArrowLineDownPntr->SetTarget (this);
return;
ErrorExit:
DisplayErrorMessage ("Problems while making view displaying the words.");
}
/* Draw the words starting with the designated first displayed word. This
requires looking at the database in the BApplication, which may or may not be
available (if it isn't, don't draw, a redraw will usually be requested by the
Pulse member function when it keeps on noticing that the stuff on the display
doesn't match the database). */
void
WordsView::Draw (BRect UpdateRect)
{
float AgeDifference;
float AgeProportion;
float CenterX;
float ColumnLeftCenterX;
float ColumnMiddleCenterX;
float ColumnRightCenterX;
float CompensatedRatio;
StatisticsMap::iterator DataIter;
StatisticsMap::iterator EndIter;
rgb_color FillColour;
float GenuineProportion;
uint32 GenuineSpamSum;
float HeightPixels;
float HeightProportion;
float LeftBounds;
ABSApp *MyAppPntr;
uint32 NewestAge;
uint32 OldestAge;
float OneFifthTotalGenuine;
float OneFifthTotalSpam;
double RawProbabilityRatio;
float RightBounds;
float SpamProportion;
StatisticsPointer StatisticsPntr;
BRect TempRect;
char TempString [PATH_MAX];
float TotalGenuineMessages = 1.0;
float TotalSpamMessages = 1.0;
float Width;
float Y;
/* Lock the application. This will stop it from processing any further
messages until we are done. Or if it is busy, the lock will fail. */
MyAppPntr = dynamic_cast<ABSApp *> (be_app);
if (MyAppPntr == NULL || MyAppPntr->LockWithTimeout (100000) != B_OK)
return;
if (MyAppPntr->m_TotalGenuineMessages > 0)
TotalGenuineMessages = MyAppPntr->m_TotalGenuineMessages;
OneFifthTotalGenuine = TotalGenuineMessages / 5;
if (MyAppPntr->m_TotalSpamMessages > 0)
TotalSpamMessages = MyAppPntr->m_TotalSpamMessages;
OneFifthTotalSpam = TotalSpamMessages / 5;
EndIter = MyAppPntr->m_WordMap.end ();
OldestAge = MyAppPntr->m_OldestAge;
NewestAge =
MyAppPntr->m_TotalGenuineMessages + MyAppPntr->m_TotalSpamMessages;
if (NewestAge == 0)
goto NormalExit;
NewestAge--;
AgeDifference = NewestAge - OldestAge;
LeftBounds = Bounds().left;
RightBounds = Bounds().right - B_V_SCROLL_BAR_WIDTH;
Width = RightBounds - LeftBounds;
FillColour.alpha = 255;
CenterX = ceilf (LeftBounds + Width * 0.5);
ColumnLeftCenterX = ceilf (LeftBounds + Width * 0.05);
ColumnMiddleCenterX = CenterX;
ColumnRightCenterX = ceilf (LeftBounds + Width * 0.95);
for (DataIter = MyAppPntr->m_WordMap.lower_bound (m_FirstDisplayedWord),
Y = Bounds().top;
DataIter != EndIter && Y < UpdateRect.bottom;
DataIter++, Y += m_LineHeight)
{
if (Y + m_LineHeight < UpdateRect.top)
continue;
/* Draw the colour bar behind the word. It shows the spamness or
genuineness of that particular word, plus the importance of the word and
the age of the word.
First calculate the compensated spam ratio (described elsewhere). It is
close to 0.0 for genuine words and close to 1.0 for pure spam. It is drawn
as a blue bar to the left of center if it is less than 0.5, and a red bar
on the right of center if it is greater than 0.5. At exactly 0.5 nothing
is drawn; the word is worthless as an indicator.
The height of the bar corresponds to the number of messages the word was
found in. Make the height proportional to the total of spam and genuine
messages for the word divided by the sum of the most extreme spam and
genuine counts in the database.
The saturation of the colour corresponds to the age of the word, with old
words being almost white rather than solid blue or red. */
StatisticsPntr = &DataIter->second;
SpamProportion = StatisticsPntr->spamCount / TotalSpamMessages;
GenuineProportion = StatisticsPntr->genuineCount / TotalGenuineMessages;
if (SpamProportion + GenuineProportion > 0.0f)
RawProbabilityRatio =
SpamProportion / (SpamProportion + GenuineProportion);
else
RawProbabilityRatio = g_RobinsonX;
/* The compensated ratio leans towards the default value (RobinsonX) when
the word has few data points, with a weight of 0.45 (RobinsonS). */
GenuineSpamSum =
StatisticsPntr->spamCount + StatisticsPntr->genuineCount;
CompensatedRatio =
(g_RobinsonS * g_RobinsonX + GenuineSpamSum * RawProbabilityRatio) /
(g_RobinsonS + GenuineSpamSum);
/* Scaling the bar height by the total number of messages didn't work well:
some words, like "From", show up in all messages, which made most other words
just appear as a thin line. I did a histogram plot of the sizes in my test
database, and figured that you get better coverage of 90% of the messages
if you use 1/5 of the total number as the count which gives you 100%
height. The other 10% get a full height bar, but most people wouldn't care
that they're super frequently used. */
HeightProportion = 0.5f * (StatisticsPntr->genuineCount /
OneFifthTotalGenuine + StatisticsPntr->spamCount / OneFifthTotalSpam);
if (HeightProportion > 1.0f)
HeightProportion = 1.0f;
HeightPixels = ceilf (HeightProportion * m_TextHeight);
if (AgeDifference <= 0.0f)
AgeProportion = 1.0;
else
AgeProportion = (StatisticsPntr->age - OldestAge) / AgeDifference;
TempRect.top = ceilf (Y + m_TextHeight / 2 - HeightPixels / 2);
TempRect.bottom = TempRect.top + HeightPixels;
if (CompensatedRatio < 0.5f)
{
TempRect.left = ceilf (
CenterX - 1.6f * (0.5f - CompensatedRatio) * (CenterX - LeftBounds));
TempRect.right = CenterX;
FillColour.red = 230 - (int) (AgeProportion * 230.0f);
FillColour.green = FillColour.red;
FillColour.blue = 255;
}
else
{
TempRect.left = CenterX;
TempRect.right = ceilf (
CenterX + 1.6f * (CompensatedRatio - 0.5f) * (RightBounds - CenterX));
FillColour.blue = 230 - (int) (AgeProportion * 230.0f);
FillColour.green = FillColour.blue;
FillColour.red = 255;
}
SetHighColor (FillColour);
SetDrawingMode (B_OP_COPY);
FillRect (TempRect);
/* Draw the text on top of the bar: the number of
genuine messages in the left 10% of the width, the word in the middle 80%,
and the number of spam messages using the word in the right 10%. */
SetHighColor (0, 0, 0);
SetDrawingMode (B_OP_OVER);
sprintf (TempString, "%" B_PRIu32, StatisticsPntr->genuineCount);
Width = m_TextFont.StringWidth (TempString);
MovePenTo (ceilf (ColumnLeftCenterX - Width / 2), Y + m_AscentHeight);
DrawString (TempString);
strcpy (TempString, DataIter->first.c_str ());
Width = m_TextFont.StringWidth (TempString);
MovePenTo (ceilf (ColumnMiddleCenterX - Width / 2), Y + m_AscentHeight);
DrawString (TempString);
sprintf (TempString, "%" B_PRIu32, StatisticsPntr->spamCount);
Width = m_TextFont.StringWidth (TempString);
MovePenTo (ceilf (ColumnRightCenterX - Width / 2), Y + m_AscentHeight);
DrawString (TempString);
}
/* Draw the starting word (the one typed in to select the first
displayed word) on the right, in the scroll bar margin, rotated 90 degrees to
fit between the page up and page down buttons. */
Width = m_TextFont.StringWidth (m_FirstDisplayedWord);
if (Width > 0)
{
TempRect = Bounds ();
TempRect.top += 4 * B_H_SCROLL_BAR_HEIGHT + 1;
TempRect.bottom -= 5 * B_H_SCROLL_BAR_HEIGHT + 1;
MovePenTo (TempRect.right - m_TextHeight + m_AscentHeight - 1,
ceilf ((TempRect.bottom + TempRect.top) / 2 + Width / 2));
m_TextFont.SetRotation (90);
SetFont (&m_TextFont, B_FONT_ROTATION);
DrawString (m_FirstDisplayedWord);
m_TextFont.SetRotation (0);
SetFont (&m_TextFont, B_FONT_ROTATION);
}
NormalExit:
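/* Remember the database statistics as of this redraw, so that Pulse can
notice when the database no longer matches what we have drawn. */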
m_CachedTotalGenuineMessages = MyAppPntr->m_TotalGenuineMessages;
m_CachedTotalSpamMessages = MyAppPntr->m_TotalSpamMessages;
m_CachedWordCount = MyAppPntr->m_WordCount;
MyAppPntr->Unlock ();
}
/* When the user presses keys, they select the first word to be displayed in
the view (it's the word at or lexicographically after the word typed in). The
keys are appended to the starting word, until the user stops typing for a
while, then the next key will be the first letter of a new starting word. */
void
WordsView::KeyDown (const char *BufferPntr, int32 NumBytes)
{
int32 CharLength;
bigtime_t CurrentTime;
char TempString [40];
CurrentTime = system_time ();
if (NumBytes < (int32) sizeof (TempString))
{
memcpy (TempString, BufferPntr, NumBytes);
TempString [NumBytes] = 0;
CharLength = strlen (TempString);
if (CharLength == 1 &&
(TempString[0] == B_UP_ARROW ||
TempString[0] == B_DOWN_ARROW ||
TempString[0] == B_PAGE_UP ||
TempString[0] == B_PAGE_DOWN))
{
MoveTextUpOrDown ((TempString[0] == B_UP_ARROW) ? MSG_LINE_UP :
((TempString[0] == B_DOWN_ARROW) ? MSG_LINE_DOWN :
((TempString[0] == B_PAGE_UP) ? MSG_PAGE_UP : MSG_PAGE_DOWN)));
}
else if (CharLength > 1 ||
(CharLength == 1 && 32 <= (uint8) TempString[0]))
{
/* Have a printable key or a multibyte character. Start a new word or add it
to the word and mark things for redisplay starting at the resulting word.
*/
if (CurrentTime - m_LastTimeAKeyWasPressed >= 1000000 /* microseconds */)
strcpy (m_FirstDisplayedWord, TempString);
else if (strlen (m_FirstDisplayedWord) + CharLength <= g_MaxWordLength)
strcat (m_FirstDisplayedWord, TempString);
Invalidate ();
}
}
m_LastTimeAKeyWasPressed = CurrentTime;
BView::KeyDown (BufferPntr, NumBytes);
}
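/* Illustration only, not part of the program: a minimal stand-alone sketch of
the typing rule used by KeyDown, where a pause starts a new word and quick
keystrokes extend the current one. The class TypedPrefix and its members are
hypothetical; times are in microseconds, as returned by system_time.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper which accumulates typed keys into a starting word.
class TypedPrefix
{
public:
  TypedPrefix (std::size_t MaxLength, long long PauseMicroseconds)
    : m_MaxLength (MaxLength),
    m_Pause (PauseMicroseconds),
    m_LastKeyTime (0)
  {
  }

  // Key is one character, possibly several bytes long in UTF-8.
  void AddKey (const std::string &Key, long long Now)
  {
    if (Now - m_LastKeyTime >= m_Pause)
      m_Word = Key;  // Long pause, so this key starts a new word.
    else if (m_Word.size () + Key.size () <= m_MaxLength)
      m_Word += Key; // Still typing, so extend the word if it fits.
    m_LastKeyTime = Now;
  }

  const std::string &Word () const
  {
    return m_Word;
  }

private:
  std::size_t m_MaxLength;
  long long m_Pause;
  long long m_LastKeyTime;
  std::string m_Word;
};
```

With a one second pause, keys typed at 5.0, 5.2 and 5.4 seconds build up a
three letter word, and a key typed at 7.0 seconds replaces it. */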
/* Change the background colour to show that we have the focus. When we have
it, keystrokes will select the word to be displayed at the top of the list. */
void
WordsView::MakeFocus (bool Focused)
{
if (Focused)
m_BackgroundColour = m_FocusedColour;
else
m_BackgroundColour = m_UnfocusedColour;
SetViewColor (m_BackgroundColour);
SetLowColor (m_BackgroundColour);
/* Also change the background colour of the picture buttons, since they
can't be made transparent. */
if (m_ArrowLineDownPntr != NULL)
{
m_ArrowLineDownPntr->SetViewColor (m_BackgroundColour);
m_ArrowLineDownPntr->Invalidate ();
}
if (m_ArrowLineUpPntr != NULL)
{
m_ArrowLineUpPntr->SetViewColor (m_BackgroundColour);
m_ArrowLineUpPntr->Invalidate ();
}
if (m_ArrowPageDownPntr != NULL)
{
m_ArrowPageDownPntr->SetViewColor (m_BackgroundColour);
m_ArrowPageDownPntr->Invalidate ();
}
if (m_ArrowPageUpPntr != NULL)
{
m_ArrowPageUpPntr->SetViewColor (m_BackgroundColour);
m_ArrowPageUpPntr->Invalidate ();
}
Invalidate ();
BView::MakeFocus (Focused);
}
void
WordsView::MessageReceived (BMessage *MessagePntr)
{
int32 CountFound;
float DeltaY;
type_code TypeFound;
switch (MessagePntr->what)
{
case B_MOUSE_WHEEL_CHANGED:
if (MessagePntr->FindFloat ("be:wheel_delta_y", &DeltaY) != B_OK)
break;
if (DeltaY < 0)
MoveTextUpOrDown (MSG_LINE_UP);
else if (DeltaY > 0)
MoveTextUpOrDown (MSG_LINE_DOWN);
break;
case MSG_LINE_DOWN:
case MSG_LINE_UP:
case MSG_PAGE_DOWN:
case MSG_PAGE_UP:
MoveTextUpOrDown (MessagePntr->what);
break;
case B_SIMPLE_DATA:
if (MessagePntr->GetInfo ("refs", &TypeFound, &CountFound) == B_OK &&
CountFound > 0 && TypeFound == B_REF_TYPE)
{
RefsDroppedHere (MessagePntr);
break;
}
/* Else fall through to the default case, in case it is something else
dropped that the system knows about. */
default:
BView::MessageReceived (MessagePntr);
}
}
void
WordsView::MouseDown (BPoint)
{
if (!IsFocus ())
MakeFocus (true);
}
void
WordsView::MoveTextUpOrDown (uint32 MovementType)
{
StatisticsMap::iterator DataIter;
int i;
ABSApp *MyAppPntr;
int PageSize;
/* Lock the application. This will stop it from processing any further
messages until we are done (we need to look at the word list directly). Or
if it is busy, the lock will fail. */
MyAppPntr = dynamic_cast<ABSApp *> (be_app);
if (MyAppPntr == NULL || MyAppPntr->LockWithTimeout (2000000) != B_OK)
return;
PageSize = (int) (Bounds().Height() / m_LineHeight - 1);
if (PageSize < 1)
PageSize = 1;
DataIter = MyAppPntr->m_WordMap.lower_bound (m_FirstDisplayedWord);
switch (MovementType)
{
case MSG_LINE_UP:
if (DataIter != MyAppPntr->m_WordMap.begin ())
DataIter--;
break;
case MSG_LINE_DOWN:
if (DataIter != MyAppPntr->m_WordMap.end ())
DataIter++;
break;
case MSG_PAGE_UP:
for (i = 0; i < PageSize; i++)
{
if (DataIter == MyAppPntr->m_WordMap.begin ())
break;
DataIter--;
}
break;
case MSG_PAGE_DOWN:
for (i = 0; i < PageSize; i++)
{
if (DataIter == MyAppPntr->m_WordMap.end ())
break;
DataIter++;
}
break;
}
if (DataIter != MyAppPntr->m_WordMap.end ())
strcpy (m_FirstDisplayedWord, DataIter->first.c_str ());
Invalidate ();
MyAppPntr->Unlock ();
}
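/* Illustration only, not part of the program: a minimal stand-alone sketch of
the scrolling done by MoveTextUpOrDown. It finds the word at or after the
starting word with lower_bound, steps the iterator while staying inside the
map, and keeps the old starting word if it steps past the end. WordMap and
ScrollWordList are hypothetical names, and the int value stands in for the
real statistics record.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, int> WordMap; // Hypothetical stand-in.

// Returns the new first displayed word after moving Steps entries from the
// word at or after StartWord. Negative steps move up, positive move down.
inline std::string
ScrollWordList (const WordMap &Words, const std::string &StartWord, int Steps)
{
  WordMap::const_iterator Iter = Words.lower_bound (StartWord);

  for (; Steps < 0 && Iter != Words.begin (); Steps++)
    --Iter;
  for (; Steps > 0 && Iter != Words.end (); Steps--)
    ++Iter;

  // Stepping past the last word leaves the starting word unchanged.
  if (Iter != Words.end ())
    return Iter->first;
  return StartWord;
}
```

A line movement corresponds to one step and a page movement to PageSize
steps. */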
/* This function periodically checks to see if the database has
changed. If the word list is different or the display has changed in some
other way, it will then try to refresh the display, repeating the attempt until
it gets successfully drawn. */
void
WordsView::Pulse ()
{
ABSApp *MyAppPntr;
MyAppPntr = dynamic_cast<ABSApp *> (be_app);
if (MyAppPntr == NULL)
return;
if (MyAppPntr->m_TotalGenuineMessages != m_CachedTotalGenuineMessages ||
MyAppPntr->m_TotalSpamMessages != m_CachedTotalSpamMessages ||
MyAppPntr->m_WordCount != m_CachedWordCount)
Invalidate ();
}
/* The user has dragged and dropped some file references on the words view. If
it is in the left third, add the file(s) as examples of genuine messages, right
third for spam messages and if it is in the middle third then evaluate the
file(s) for spaminess. */
void
WordsView::RefsDroppedHere (BMessage *MessagePntr)
{
float Left;
bool SpamExample = true;
float Third;
BPoint WhereDropped;
if (MessagePntr->FindPoint ("_drop_point_", &WhereDropped) != B_OK)
return;
ConvertFromScreen (&WhereDropped);
Third = Bounds().Width() / 3;
Left = Bounds().left;
if (WhereDropped.x < Left + Third)
SpamExample = false;
else if (WhereDropped.x < Left + 2 * Third)
{
EstimateRefFilesAndDisplay (MessagePntr);
return;
}
if (g_CommanderLooperPntr != NULL)
g_CommanderLooperPntr->CommandReferences (
MessagePntr, true , SpamExample ? CL_SPAM : CL_GENUINE);
}
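/* Illustration only, not part of the program: a minimal stand-alone sketch of
how RefsDroppedHere maps the horizontal drop position to an action. The
DropAction enumeration and ClassifyDropPoint are hypothetical names; X is
assumed to already be in view coordinates.

```cpp
// Hypothetical names for the three things a drop can do.
enum DropAction
{
  DROP_GENUINE,   // Left third: add as examples of genuine messages.
  DROP_ESTIMATE,  // Middle third: evaluate the files for spaminess.
  DROP_SPAM       // Right third: add as examples of spam messages.
};

inline DropAction
ClassifyDropPoint (float X, float Left, float Width)
{
  float Third = Width / 3;

  if (X < Left + Third)
    return DROP_GENUINE;
  if (X < Left + 2 * Third)
    return DROP_ESTIMATE;
  return DROP_SPAM;
}
```

For a view 300 units wide starting at 0, a drop at 50 is genuine, at 150 is
an estimate request and at 250 is spam. */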
/******************************************************************************
 * Finally, the main program which drives it all.
 */
int main (int argc, char**)
{
g_CommandLineMode = (argc > 1);
if (!g_CommandLineMode)
cout << PrintUsage;
g_CommanderLooperPntr = new CommanderLooper;
if (g_CommanderLooperPntr != NULL)
{
g_CommanderMessenger = new BMessenger (NULL, g_CommanderLooperPntr);
g_CommanderLooperPntr->Run ();
}
ABSApp MyApp;
if (MyApp.InitCheck () == 0)
{
MyApp.LoadSaveSettings (true );
MyApp.Run ();
}
if (g_CommanderLooperPntr != NULL)
{
g_CommanderLooperPntr->PostMessage (B_QUIT_REQUESTED);
snooze (100000);
}
cerr << "SpamDBM shutting down..." << endl;
return 0;
}