Setting up a search engine for a website and a Majordomo mailing list archive
I'd like to add a search function to my website. Since I also run a Majordomo-based mailing list associated with the site, I want it to search both the website and the mail archive.
Unfortunately I'm no programmer. My friend who runs the web server knows about configuration, scripts and so on, but I don't want to take up his time unnecessarily, which is why I'm researching all this first. So I'm hoping either for an easy-to-install solution that I can set up myself (I have FTP and Telnet access to the server, though only to my own user area, which I doubt will be of much use anyway, and I still remember common UNIX commands for moving around, file operations etc.), or to gather enough information beforehand (hence asking here) that he can install what's needed quickly and easily.
I've heard about Google and others providing free online site-search solutions, but I assume this also means accepting advertising, tracking and other nasties, which I'd rather avoid.
Also, I don't understand how those external search engines would be able to access the mailing list archive. The Majordomo mailing list system runs on the same server, and all messages are archived there as well, in what I believe is MBOX format. Several years ago I got the server owner to create a search function on the website for the mailing list archive. It's not very fast and the results aren't presented very attractively, but it works. I also got him to create a filter that replaces email addresses with "XXXXX" etc. in order to protect my subscribers' addresses (the whole header of each message is still shown, though, which means scrolling past it for every message). I'd like to have that filter refined as well, probably by removing the email headers except for the date, subject and sender name (and possibly anonymizing the sender name somehow too).
So, having said all that, what options do I have, and how do I implement them?
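A header-stripping, address-masking filter like the one described could be sketched in Python with the standard `mailbox` and `email` modules, assuming the archive really is a standard MBOX file. This is not the filter the server owner wrote; the function names, the "(name withheld)" placeholder and the address pattern are illustrative assumptions, and the pattern is deliberately rough rather than a full RFC 5322 matcher.

```python
import mailbox
import re
from email.utils import parseaddr

# Rough pattern for "something@domain.tld"; deliberately not RFC 5322-complete.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_addresses(text):
    """Replace anything that looks like an email address with XXXXX."""
    return EMAIL_RE.sub("XXXXX", text)


def text_body(msg):
    """Return the first text/plain part decoded to str, or "" if there is none."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            data = part.get_payload(decode=True) or b""
            try:
                return data.decode(part.get_content_charset() or "utf-8", "replace")
            except LookupError:  # unknown charset name in the header
                return data.decode("utf-8", "replace")
    return ""


def clean_message(msg):
    """Keep only date, subject and the sender's display name; mask addresses."""
    name, _address = parseaddr(str(msg.get("From", "")))
    return {
        "date": str(msg.get("Date", "")),
        "subject": mask_addresses(str(msg.get("Subject", ""))),
        # Never show the raw address; use a placeholder if there is no name.
        "sender": mask_addresses(name) if name else "(name withheld)",
        "body": mask_addresses(text_body(msg)),
    }


def clean_archive(path):
    """Yield a cleaned dict for each message in the MBOX file at `path`."""
    for msg in mailbox.mbox(path):
        yield clean_message(msg)
```

All other headers (Received, Message-ID, To and so on) are simply never copied into the result, which is what removes the need to scroll past them. Encoded subjects (`=?utf-8?...?=`) are left undecoded in this sketch.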
Google does crawl and index sites (including publicly accessible mailing list archives), so you could use a custom Google search page to search your own archive.
If you want to do anything more advanced, there are systems that will search and index mailboxes for you, so you could certainly get the data into something else, but this would involve a bit more development work to set up and get running.
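At its very simplest, a "search this site" box can send the visitor to an ordinary Google results page with the `site:` operator added to the query. The sketch below only builds such a URL; the domain is a placeholder, the visitor still lands on Google's normal results page (ads and tracking included), and only pages Google can reach publicly will be found. A Google custom search engine is configured separately in Google's own control panel and is not shown here.

```python
from urllib.parse import urlencode


def site_search_url(terms, site):
    """Build a Google web-search URL restricted to one site via the site: operator."""
    query = f"{terms} site:{site}"
    # urlencode quotes spaces as "+" and ":" as "%3A".
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("majordomo filter", "example.org"))
```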
I don't think I need anything more advanced than what Google Custom Search can offer, but I would like to avoid their advertising and their tracking of visitors to my site (their yearly-fee option isn't really something I want for my non-profit site).
Are there any ready-made solutions that can be installed on the server running my site and mailing list, either free or paid (a reasonable one-time fee is something I can possibly handle)? I'm sure the tech guy running the server can install it, but I'd rather make it a quick and easy installation for him than ask him to build something from scratch.
If you use a database for your page data, I would do something like the following:
1. Create tables to store the parsed emails in.
2. Pipe the incoming emails to a PHP script that parses them and writes the parsed data to the database.
3. Search with SQL using LIKE and return the result set to PHP.
4. PHP outputs the results (optionally filtering them based on the user's criteria) when the search is submitted.
On your site you could use AJAX to post the search keywords and criteria to PHP, which kicks off the search in step 3 above.
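The steps above could look roughly like this. The answer suggests PHP and an existing database; this sketch swaps in Python with its bundled SQLite module so that it is self-contained, and the table layout and function names are hypothetical. Step 4 (showing and filtering results on a page, possibly via AJAX) is left to the web layer.

```python
import email
import mailbox
import sqlite3
import sys


def create_tables(conn):
    """Step 1: one table for the parsed emails (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY,
               sent TEXT, sender TEXT, subject TEXT, body TEXT)"""
    )


def body_text(msg):
    """First text/plain part as str ("" if none); assumes UTF-8 if no charset."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            data = part.get_payload(decode=True) or b""
            try:
                return data.decode(part.get_content_charset() or "utf-8", "replace")
            except LookupError:
                return data.decode("utf-8", "replace")
    return ""


def store_message(conn, msg):
    """Step 2: write one parsed email to the database."""
    conn.execute(
        "INSERT INTO messages (sent, sender, subject, body) VALUES (?, ?, ?, ?)",
        (str(msg.get("Date", "")), str(msg.get("From", "")),
         str(msg.get("Subject", "")), body_text(msg)),
    )


def import_mbox(conn, path):
    """Load an existing MBOX archive in one go."""
    for msg in mailbox.mbox(path):
        store_message(conn, msg)
    conn.commit()


def store_from_stdin(db_path):
    """Step 2 when a mail alias pipes each new message to this script on stdin."""
    msg = email.message_from_binary_file(sys.stdin.buffer)
    conn = sqlite3.connect(db_path)
    try:
        create_tables(conn)
        store_message(conn, msg)
        conn.commit()
    finally:
        conn.close()


def search(conn, keyword):
    """Step 3: LIKE search; the keyword is bound as a parameter, never pasted into SQL."""
    pattern = f"%{keyword}%"
    return conn.execute(
        "SELECT sent, subject FROM messages"
        " WHERE subject LIKE ? OR body LIKE ? ORDER BY id",
        (pattern, pattern),
    ).fetchall()
```

SQLite's LIKE is case-insensitive for ASCII letters by default, and a `%` or `_` typed by the user acts as a wildcard unless you add an ESCAPE clause. Given the privacy concern in the question, you would probably mask addresses before storing the sender and body.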
Given that you've said you're not really a coder, this may be way too much.
I don't know of any ready-made solution that covers all of your needs above. Your needs are unfortunately very specific, so it is unlikely, no matter how hard you look, that you will find something you can just slot in. You may have to consider paying someone either to combine a few ready-made solutions or to set up something like the above.
Thanks for your suggestions.
I'm awaiting more info about the current server setup from my system admin so I can figure out exactly what I need, but I actually think all of it is simpler than I first anticipated.
I can actually access the email archive from the web already, so I don't need the search engine to do any local/internal searching as I first thought. That should make things easier, since it would just need to search via HTTP, right?
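Searching "via HTTP" generally means a crawler fetches the public pages, extracts their text and builds an inverted index (word to list of pages) that the search box then queries. The sketch below illustrates that general technique with Python's standard library only; it is not how FDSE or any particular product is implemented, the function names are hypothetical, it ignores robots.txt, and it keeps the index in memory. Because it indexes whatever the web view shows, any address masking has to be in place in the public archive pages.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

WORD_RE = re.compile(r"\w+")


class PageParser(HTMLParser):
    """Collect visible text and link targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text.append(data)


def index_page(index, url, html):
    """Add the page's words to the index (word -> set of URLs); return absolute links."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    for word in WORD_RE.findall(" ".join(parser.text).lower()):
        index[word].add(url)
    return [urljoin(url, href) for href in parser.links]


def crawl(start_url, max_pages=50):
    """Breadth-first crawl over HTTP that stays on the start URL's host."""
    index = defaultdict(set)
    host = urlparse(start_url).netloc
    queue, seen, fetched = deque([start_url]), {start_url}, 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            with urlopen(url, timeout=10) as resp:
                if "text/html" not in resp.headers.get("Content-Type", ""):
                    continue
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:  # URLError/HTTPError (e.g. 404) and timeouts
            continue
        fetched += 1
        for link in index_page(index, url, html):
            link = urldefrag(link).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = WORD_RE.findall(query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Usage would be along the lines of `search(crawl("http://www.example.org/archive/"), "meeting notes")`, where the start URL is a placeholder for the archive's index page. A real installation would rebuild the index on a schedule and save it to disk instead of crawling on every query.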
I don't know if this'll do it, but I found a CGI script called FDSE (Fluid Dynamics Search Engine) that has received very positive reviews and doesn't cost a fortune (US$40). It's from 2005 and is no longer supported, but then again I'm having a hard time finding installable search engines at all. Whatever I do find seems to date from the late '90s to around 2005/2006 and is no longer developed or supported.
Does this mean that everybody uses either Google's custom search (free with advertising, or paid without) or, for large corporations, custom-made search engines costing $$$$? Just about every site I come across has a search function, but I'm having trouble figuring out how they do it from looking at the HTML/CSS source.