Tuesday, December 5. 2006
PHPBB Inactive Member Removal Cron Job
A commenter on a previous article asked exactly how I delete inactive members from a PHPBB forum that I run, so I'll try to explain. This solution runs on Linux/Unix systems. I'm sure it could be done for Windows, but I'll leave the particulars to you.
It's really two separate steps. First, you need a script which will handle the deletion of inactive members. I called mine cron.php. It deletes all inactive PHPBB users who don't activate within 48 hours. It looks like this:

    #!/usr/bin/php -q
    <?php
    // cron job to delete inactive users older than 48 hours
    $db=mysql_connect('server','user','password');
    mysql_select_db('your_phpbb_database_here',$db);
    $strSQL="DELETE phpbb_users u, phpbb_user_group ug, " .
        "phpbb_groups AS g FROM phpbb_users u, " .
        "phpbb_user_group ug, phpbb_groups g WHERE " .
        "u.user_active=0 AND u.user_id>0 AND " .
        "u.user_id=ug.user_id AND ug.group_id=g.group_id " .
        "AND g.group_single_user=1 AND " .
        "FROM_UNIXTIME(u.user_regdate)<" .
        "DATE_SUB(NOW(),INTERVAL 2 DAY);";
    mysql_query($strSQL,$db) or die(mysql_error());
    mysql_close($db);
    ?>

You'll need to make sure /usr/bin/php points to the location of PHP on your system, and replace the MySQL server name, user, password, and database name with yours.
Now that you have a script, you need to tell the system to run it daily. You can do this with a cron job. If you have command line access to your website, you might be able to do this with "crontab -e". But my webhost has an administrative panel that lets you set up cron jobs on the web. If you can't set up a cron job, you could put the script into a web-accessible folder and periodically call its URL, either manually or through an automated process on your local PC.
This idea works great if the majority of your spam registrations don't activate their accounts. Usually they just want their spam links in your member list. But I'm finding that more and more spammers are activating and posting, so the real goal is still to stop spammers from registering in the first place. I'm experimenting with another method, which I'll post about when I see some results.
Update 2007-11-28: I replaced my original SQL statement with the SQL in comment #1 below, which I finally tested; it seems to work well.
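The cron.php script above uses the mysql_* functions, which were removed in PHP 7. What follows is only a rough sketch of the same cleanup written against mysqli, not a tested drop-in replacement. The helper names (inactive_cutoff, build_delete_sql) and the CRON_DB_* environment variables are made up for this example. It also assumes that user_regdate holds a Unix timestamp (which the FROM_UNIXTIME() call above implies), and it names only the aliases before FROM, which as I understand the MySQL manual is what newer MySQL versions expect in a multi-table DELETE.

```php
<?php
// Sketch only: helper names and CRON_DB_* variables are hypothetical,
// they are not part of PHPBB or of the original cron.php.

// Unix timestamp before which an unactivated registration counts as stale.
function inactive_cutoff(int $now, int $hours = 48): int
{
    return $now - $hours * 3600;
}

// Same multi-table DELETE as above, but comparing user_regdate directly
// against an integer cutoff instead of going through FROM_UNIXTIME().
function build_delete_sql(int $cutoff): string
{
    return "DELETE u, ug, g FROM phpbb_users u, "
         . "phpbb_user_group ug, phpbb_groups g WHERE "
         . "u.user_active=0 AND u.user_id>0 AND "
         . "u.user_id=ug.user_id AND ug.group_id=g.group_id "
         . "AND g.group_single_user=1 AND "
         . "u.user_regdate<" . $cutoff;
}

// Only touch the database when connection details are supplied.
if (getenv('CRON_DB_HOST') !== false) {
    $db = new mysqli(
        getenv('CRON_DB_HOST'),
        getenv('CRON_DB_USER'),
        getenv('CRON_DB_PASS'),
        getenv('CRON_DB_NAME')
    );
    if ($db->connect_error) {
        die($db->connect_error);
    }
    $db->query(build_delete_sql(inactive_cutoff(time()))) or die($db->error);
    $db->close();
}
```

If you do have "crontab -e" access, a line such as "0 3 * * * /path/to/cron.php" would run the script every day at 3 AM. The path is a placeholder, and the script needs to be executable.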
Wednesday, September 6. 2006
Splitting a large XML file on Linux 2.4
A client recently had a problem processing an XML file with his PHP script. "File too large" was the error, and the data file was over 2 gigabytes in size.
It turns out you can recompile PHP to deal with large files (see Requirements section here). But this blog entry made recompiling sound problematic: you get access to the large file but certain file functions break due to integer overflows. So I wanted to avoid this option.
The Linux command line tools seemed to be able to deal with it OK: stuff like less, head and tail were working on it. So I decided to try to break the XML file into parts. The split command worked but indiscriminately cut right through the middle of a data record. I considered using csplit, but as this file contained over 700,000 data records, I didn't want to deal with that many individual XML files.
I decided to write a Perl script to split the file into blocks of 100,000 records each. It didn't take long to put together and Perl's regular expression matching made handling the records easy on the small test data. For some reason I thought Perl would be OK with the large data file, but when I went to run the script, it too choked on the large file. I would have to recompile Perl to get around it. As the client's box is using a Red Hat package for Perl, I didn't want to mess with it.
Then I had an idea. Since the Linux command line tools were handling the file OK, I wondered if I could trick my script by feeding it one line at a time. Instead of looping over an opened file in the Perl script, I used a loop like this:

    while( $line = <STDIN> )
    {
        # do stuff
    }

Then I called the script like this:

    cat bigfile.xml | ./split.pl

And it worked!
Sunday, August 27. 2006
Drupal: An image module that uses filemanager module
I'm working on a personal project using Drupal. The image module allows you to let users post images to the site. However, it puts all of its files into a single directory, which is a performance concern if you want to have a lot of images on your site. The filemanager module has support for lots of files, but the image module was not written to use it.
So I decided to try my hand at Drupal coding. I started with image.module and overhauled it to work with filemanager.module. Along the way I found that Drupal 4.7 doesn't offer a good way to maintain certain data through the form preview, so I filed a bug. The workaround I came up with, which a prominent Drupal coder has also used, was to use the $_SESSION variable to maintain that data.
So anyway, here is the new image.module code. Please feel free to try it and let me know what I should fix. Some notes: the current version, 0.1, requires you to use only the original, preview, and thumbnail sizes, and also requires a database table called image_fm. I also could not come up with an elegant way to handle the case of a single user posting multiple images at the same time, since the $_SESSION variable used only allows for one per user at a time. This may be easy to fix, but I don't fully understand the Drupal form API yet.
Thursday, August 17. 2006
PHPBB Fake Members
It's becoming frustrating to be a PHPBB administrator, at least if you want to keep your memberlist clean. Form bots out there create fake users on your site in the hopes that your memberlist will show their spam URL. It's been an ongoing, and losing, battle to keep them out.
Update 2006-08-22: The fake users keep coming. So I came up with a cron job that runs this query once per day. It will remove inactive PHPBB users older than 48 hours. This gives time for the new users to properly activate.

    DELETE FROM phpbb_users
    WHERE user_active=0 AND user_id>0
      AND FROM_UNIXTIME(user_regdate)<DATE_SUB(NOW(),INTERVAL 2 DAY);

The user_id>0 part is to avoid deleting the Anonymous user, which has a user ID of -1 on my installation.
Thursday, July 27. 2006
Blogspam
My blog has been getting a mountain of trackback spam attempts lately. So far this week there have been over 4,500 POSTs to my blog trackback links (wasting over a half meg of bandwidth!). I turned off trackbacks last fall, but it doesn't stop the hordes of zombies and open proxies from trying. I'm still seriously considering writing my own blog software to deter spam traffic, simply by virtue of the forms and links being unique and unfamiliar to the automated spamming software out there. It would be interesting to see if the spam traffic would drop or if they would just keep pounding the site. If only I had the time!
On a positive note, I managed to get one open proxy closed. It was running inadvertently on a server of a municipal government here in the U.S. A message to their webmaster got the ball rolling. The other 99% of the time I bother to notify a domain owner of an open proxy or infected system, my message is ignored.
Wednesday, April 12. 2006
Integrating other sites with PHPBB 2.0.20
In a previous entry, I detailed how I used some code from PHPBB to integrate its session management with my existing website. The idea is to include just enough PHPBB stuff to get PHPBB sessions working, and nothing else. Due to some session code changes introduced in the PHPBB update to version 2.0.20, I had to change the code some. Here is how it looks now:

    define('IN_PHPBB', true);
    $phpbb_root_path = '/somepath/';
    include($phpbb_root_path . 'extension.inc');
    include($phpbb_root_path . 'config.'.$phpEx);
    $ip_sep = explode('.',$_SERVER['REMOTE_ADDR']);
    $user_ip=sprintf('%02x%02x%02x%02x', $ip_sep[0], $ip_sep[1], $ip_sep[2], $ip_sep[3]);
    include($phpbb_root_path . 'includes/constants.'.$phpEx);
    include($phpbb_root_path . 'includes/sessions.'.$phpEx);
    include($phpbb_root_path . 'includes/db.'.$phpEx);
    $strSQL = "SELECT config_name, config_value FROM " . CONFIG_TABLE .
        " WHERE config_name IN ('cookie_name', " .
        "'cookie_path', 'cookie_domain', 'cookie_secure', " .
        "'rand_seed', 'session_length');";
    if( !($result = $db->sql_query($strSQL)) )
    {
        die('Could not query config information');
    }
    while ( $row = $db->sql_fetchrow($result) )
    {
        $board_config[$row['config_name']] = $row['config_value'];
    }
    $userdata = array();
    $userdata = session_pagestart($user_ip, PAGE_INDEX);

In addition, I had to copy the dss_rand() function out of PHPBB's includes/functions.php file into my startup script. I think that's preferable to including the whole block of functions, but that's another option. You also have to modify the message_die() call inside dss_rand() because I'm not including that function. I just used PHP's die() function and only included the text of the error, not the PHPBB specific parameters.
Update 2006-07-17: This code is OK for PHPBB 2.0.21 also.
Tuesday, April 4. 2006
Figuring the Start of the Week with PHP
I have a time keeping utility written in PHP and MySQL. To make a query of the hours recorded so far in a given week, I needed to determine the start date of the week, in the MySQL format of 'YYYY-MM-DD HH:MM:SS'. I had been using this code:

    date('Y-m-d H:i:s', mktime(0, 0, 0, date('m'), date('d')-date('w'), date('Y')));

It was successfully giving me midnight on Sunday for my code, until the change to daylight saving time reared its ugly head (why can't we stay on DST all year?). This week that piece of code gave me '2006-04-01 23:00:00', so the rest of my code decided to include some time from Saturday. I worked through a couple of variants, but settled on this replacement:

    date('Y-m-d', mktime(1, 0, 0, date('m'), date('d')-date('w'), date('Y'))) . ' 00:00:00';

This code adds an hour to the time computation. So the errant Saturday at 11PM gets returned to Sunday at midnight. Other weeks will show 1AM, and the switch back to standard time might show 2AM. Since I always want midnight, the simplest thing seemed to be to drop the time off the date function output entirely and just set it to midnight in the string.
Monday, March 20. 2006
Inbox Spam Update
In my last entry on spam, I mentioned I would move to using POPFile as my mail junk filter. I turned off the ineffective Bayes filtering of SpamAssassin and just left it to toss blatant spam using its other rules. I also disabled Thunderbird's junk filter.
POPFile has done well. So far it has classified nearly 14,000 messages, 85% of which were spam. So that means my local system had to download nearly 12,000 spams before the POPFile classifier could junk them. It would be great to have a system like this on the server to prevent that wasted message downloading.
POPFile has been a bit too aggressive in classifying messages as spam. I still have to browse the junk folder now and then to make sure a legitimate message isn't there. On the other hand, very few spams actually show up in the inbox. I'm still contemplating moving to a whitelist and blocking everything else, but I'll save that battle for another day.
Wednesday, January 18. 2006
More Inbox Spam Woes
Shortly after I wrote my complaint about Thunderbird's junk filter last October, I decided to try to train SpamAssassin's Bayes filter. Using procmail, PHP, and MySQL, I built a crude interface to keep copies of my incoming email and let me classify them as OK or spam. After training SpamAssassin with over 7000 messages, I have to say I'm really disappointed. SpamAssassin still regularly misses spam, and Thunderbird frequently misses the spam that SpamAssassin misses.
(I added a similar (but prettier) interface to a web-mailbox that my client uses, and unfortunately the SpamAssassin Bayes feature has been ineffective for them too.)
POPFile is the only reliable blocker I've used, but it runs on the client and therefore you have to download all your mail, spam included, before POPFile processes it. I think the method I will try next is to turn off SpamAssassin's Bayes filter and just let it find the really obvious spam, then pass the rest on to the client for POPFile to sort out. Might as well turn off Thunderbird's junk filter at the same time.
In the meantime I keep pondering allowing only whitelisted addresses to send me mail. Everyone else could use my web contact form. I'd just have to be really careful that I whitelist all the websites I use.
Thursday, January 5. 2006
Banker's Rounding for PHP
Via Slashdot I found this detailed article regarding various rounding methods. There are a lot more ways to think about it besides just rounding up to the next highest digit on the fives.
The article reminded me of a problem I worked on last fall. PHP's standard rounding function, round(), always rounds the fives up. This was causing an upward creep in my calculations for hours worked in time clock data. I needed a way to minimize that creep.
I settled on a different method of rounding known as "banker's rounding." This method alternates the rounding of fives based on the even-/odd-ness of the preceding digit. So for example, a 3.5 rounds up to 4 and a 4.5 rounds down to 4.
I have created a PHP function to do this, and the source code is here: GPL version; BSD version. You give the bround function two parameters: first the value to round, and second how many decimal places to keep. So bround(3.55,0) produces 4 and bround(3.55,1) produces 3.6. I hope it's helpful to someone, and please don't hesitate to report bugs or a faster way to do this. (Normally I'm not a fan of using the Ternary Operator but in this case it keeps the function compact and is fairly straightforward.)
Update 2007-04-23: Someone going by the name of "Hitlers Pet Gerbil" replied to my method stating it was "slightly incorrect." I posted a reply to the comment on the PHP site, but for some reason it was deleted. Today I stumbled upon a copy of my response, which was written on October 6 of last year:
In reply to Mr. Pet Gerbil, I think you're wrong when you state "Your calculations are slightly incorrect." My calculations do take into account when the thousandths digit is a 5. I ran your function and mine side-by-side and got the same results.
Update 2007-10-01: Added BSD-style licensed version (see above).
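For readers who just want to see the rule in code, here is a minimal sketch of banker's rounding. It is not the bround() source linked above. The name bround_sketch and the 1e-9 tolerance used to detect a tie are assumptions made for this example. Newer PHP versions (5.3 and later) can also do this directly with round($value, $places, PHP_ROUND_HALF_EVEN).

```php
<?php
// Hypothetical sketch of round-half-to-even; not the author's bround().
function bround_sketch(float $value, int $places = 0): float
{
    $factor = pow(10, $places);
    $scaled = $value * $factor;
    $floor  = floor($scaled);
    $diff   = $scaled - $floor;
    $eps    = 1e-9; // tolerance for binary floating-point noise (an assumption)

    if (abs($diff - 0.5) < $eps) {
        // Exact tie: keep whichever neighbour is even.
        $rounded = (fmod($floor, 2.0) == 0.0) ? $floor : $floor + 1;
    } else {
        // Not a tie: ordinary round-to-nearest.
        $rounded = ($diff > 0.5) ? $floor + 1 : $floor;
    }
    return $rounded / $factor;
}

var_dump(bround_sketch(3.5, 0));  // float(4)
var_dump(bround_sketch(4.5, 0));  // float(4)
var_dump(bround_sketch(3.55, 1)); // float(3.6)
```

The tolerance matters because a value like 3.55 cannot be stored exactly in binary floating point, so a strict equality test against 0.5 could miss a tie that looks obvious on paper.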