Saturday, December 31, 2005

Inventory Tracking Refrigerator


Tracking inventory of groceries in the house and making weekly shopping lists is not a suitable task for a human being. Sure, one can do it, but it's repetitive, boring and inefficient. I want my fridge to do it for me.




If my fridge had a barcode scanner and a touchscreen, I could scan items as I use them or restock them. The fridge could keep track of what I have, what I use, how quickly I use certain items, etc.




When it comes time to make my grocery list, the fridge should be able to tell me what I've run out of by generating a shopping list. I could then download that shopping list into SplashShopper on my Palm Pilot, add any atypical items and go shopping.

Tuesday, December 20, 2005

Guards in Perl


As is trendy for Perl hackers these days, I've been learning the functional programming language Haskell. When I learned Standard ML in college, it impressed me with its brevity and fresh view of programming. However, the course ended with nary a practical application, and I left thinking that it was just a research language. A few years later, I tried my hand at Scheme while attempting to customize some reports in GnoTime. I quit after failing to get the templates to generate the content I wanted. Now that I've seen some useful applications written in Haskell, I've decided to give functional programming another shot.




I'm using Simon Thompson's Haskell: The Craft of Functional Programming as my text. I'm only through the first three chapters, but so far it has clearly explained the concepts and provided practical examples and exercises.




As I worked through some of the exercises with guards, I realized that I had been doing the same thing in Perl for a long time. Haskell's syntax is cleaner, with less repetition, but the concept is the same. It's quite obvious, but here is a sample function demonstrating guards in Haskell, followed by equivalent Perl 5 and Perl 6 versions for comparison. The function compares two integers and returns 1 if the first is greater than the second, 0 if they are equal and -1 otherwise. Ignore the fact that Perl's <=> operator already does this job.



Haskell



compare :: Int -> Int -> Int
compare a b
  | a > b     = 1
  | a == b    = 0
  | otherwise = -1


Perl 5



sub compare {
    my ($a, $b) = @_;
    return  1 if $a > $b;
    return  0 if $a == $b;
    return -1;
}


Perl 6



sub compare(Int $a, Int $b) returns Int {
    return  1 if $a > $b;
    return  0 if $a == $b;
    return -1;
}

Friday, December 09, 2005

PDA stylus sensor


I've had a Palm Tungsten T for about two and a half years. I generally adapt myself to the quirks of particular hardware, but the steps required to open the Tungsten and get it ready for input still annoy me.




  1. Remove the screen cover

  2. Slide the button pad downwards (or push the power button)

  3. Depress the stylus so that it will pop out

  4. Extract the stylus




Sometimes I can get the spring in the stylus to pop the stylus out of the slot. That eliminates one step, but it's still too many. When I can't get the stylus to jump for me, the extra extraction step becomes that much more annoying. I originally liked the slide-out button pad because it made the device more compact. Now I would trade the compact size for one less step when opening the device. I think that the proper solution is to add sensors to the Palm.




One approach is to put a sensor in the stylus well so that the Palm can determine when I have removed the stylus. That reduces the steps to




  1. Remove screen cover

  2. Remove stylus




Another approach is to attach a sensor to the screen cover. That reduces the open operation to the same steps as with a stylus sensor. Or, if I'm doing some quick finger tapping, there's only one step.




For my next PDA purchase, I'm going to put more weight on the "steps to open" factor. As an aside, I think PDAs should have many more built-in sensors. Some examples:




  • thermometer

  • barometer

  • accelerometer

  • GPS




Use cases for those sensors will have to wait for another blog post.

Wednesday, November 23, 2005

IPSEC connection between Mac OS X and SonicWALL TZ 170


I got a snazzy SonicWALL TZ 170 firewall from work. It works much better than my previous Linksys firewall/router (which regularly overheated during the summer months). The SonicWALL also supports IPsec VPN. Since I'm travelling for the Thanksgiving holiday, I decided to get the VPN working with my iBook.




Although VPN Tracker has a slick interface with good defaults for numerous firewalls, $90 is too steep for me. I tried VaporSec, but I got the error message Can't get file of folder "::private:tmp" of startup disk. (-1728) and was unable to resolve it.




To get the VPN working without any extra programs, the first step was to change the VPN settings on the SonicWALL, so I:




  • Chose the VPN tab from the left menu

  • Edited the configuration for GroupVPN

  • Under the General tab


    • Chose IKE using Preshared Secret

    • Entered a shared secret


  • Under the Proposals tab for Phase 1


    • Chose DH Group 2

    • Chose AES-128 as the encryption algorithm

    • Chose SHA1 as the authentication method

    • Set the Life Time to 600 seconds


  • Under the Proposals tab for Phase 2


    • Chose ESP as the protocol

    • Chose AES-128 as the encryption method

    • Chose SHA1 for the authentication method

    • Enabled Perfect Forward Secrecy

    • Chose Group 1 for the DH Group

    • Set the Life Time to 28800 seconds


  • On the Advanced tab


    • Disabled XAUTH authentication

    • Left everything else with the defaults


  • Left all the defaults on the Client tab

  • Clicked OK and enabled the GroupVPN




Mac OS X comes with support for IPsec using the KAME tools racoon and setkey. The man pages for "racoon.conf" and "racoon" along with this sample configuration file helped me come up with the following configuration files:




Configuration for home.conf (for racoon).




path pre_shared_key "/etc/racoon/psk.txt";

remote anonymous {
    lifetime time 24 hour;
    exchange_mode main, aggressive, base;

    proposal {
        encryption_algorithm aes 128;
        hash_algorithm sha1;
        authentication_method pre_shared_key;
        dh_group 2;
    }
}

sainfo anonymous {
    lifetime time 12 hour;
    pfs_group 1;
    encryption_algorithm aes 128;
    authentication_algorithm hmac_sha1;
    compression_algorithm deflate;
}



The file "/etc/racoon/psk.txt" contains a single line like this:




69.144.112.123 this is my secret



Configuration for home.spd (for setkey). In this configuration, 192.168.2.5 is the IP address of the laptop, 192.168.40.0/24 is my home network and 69.144.112.123 is the external IP address of my SonicWALL TZ 170.




spdadd 192.168.2.0/24 192.168.40.0/24 any -P out ipsec esp/tunnel/192.168.2.5-69.144.112.123/require;
spdadd 192.168.40.0/24 192.168.2.0/24 any -P in ipsec esp/tunnel/69.144.112.123-192.168.2.5/require;



This next part is the portion that I didn't find documented anywhere. Once I had the configuration files written, how do I make it "go"? The following worked for me (acting as root).




setkey -f home.spd
racoon -F -f home.conf



And then, in another Terminal.app window, ping 192.168.40.5. With -F, racoon stays in the foreground, which lets me easily stop the VPN when I'm done. The ping is what actually establishes the VPN connection. You don't really need it, since any other attempt to access the remote network has the same effect.

Friday, November 18, 2005

Notification when mutt Receives New Mail


After several months of using KDE's kmail to access my IMAP account at work, I gave up and decided instead to use the wonderful mail client mutt. I wanted to switch because using 29M of RAM for a mail program seems ridiculous to me, especially when mutt does just fine in under 5M. Additionally, I could never configure kmail's key bindings to be what I like (surely my failure there).




So I created a new .muttrc file to hold all the info about my IMAP mailboxes at work. A simple mutt -F work.muttrc starts up an instance to access my work accounts (I found this more suitable than using account-hook configurations). mutt works like a charm and handles S/MIME attachments even better than kmail does.




But I couldn't figure out a way for mutt to execute an arbitrary shell script when it finds that I have new mail in my box. It turns out that despite all of mutt's useful hooks and configuration options, it cannot do what I want. From the ridiculous and impolite comments on this thread, it seems the behavior is a design decision, not an oversight.




I wasn't about to go back to using kmail after having mutt set up so nicely. So, here's a patch that adds the functionality to mutt. The patch is against mutt version 1.5.10i and adds the configuration variable new_mail_notify. Set the variable to the path of a shell script and it will be run as soon as mutt notices that you have new mail in the current folder. The patch is a bit rough, but it's worked for me for about a week now.

Saturday, November 12, 2005

Apple simplicity


As I've experimented with my new iBook, I've come to the conclusion that Apple's design philosophy stems heavily from an emphasis on simplicity. A good example is the safe sleep feature found on the newer PowerBooks. Instead of offering separate hibernate and sleep modes as Windows and Linux do, Apple offers one mode which transforms into the other as necessary. Essentially, I don't want my computer to "sleep" or to "hibernate"; I want it to keep my session where I left it until I come back. Whether that's 5 minutes or 5 days, the computer should still do what I want. I shouldn't have to decide in advance how long I'll be gone.




Another example of this simplicity is the single-button mouse. I disagree with this simplification, since I efficiently use all three mouse buttons on my Linux machines (when I can't do the job from the keyboard, that is). Nevertheless, it seems that Apple determined that it was possible to get by with only one mouse button, so they did (at least until recently).




The layout of application screens is a third demonstration of simplicity in OS X. This article about KDE learning from OS X makes some interesting suggestions, particularly points two and three about simple toolbars. There is no need for a program to display 30 icons for features that one rarely uses (talk about iconic sprawl). True, one can customize the toolbars in most operating systems, but one shouldn't have to reclaim real estate from the toolbar monster.




Apple reminds me of our new house shortly after we moved in. The previous owners had filled the house with all manner of clutter. When they moved, we removed all the clutter they left behind. We put infrequently-used items in closets. Anything we used less than monthly, we moved to the storage shed. Anything we used less than yearly, we donated or threw away. When the previous owners came by to visit, they commented on how much we had done with the place. Of course, we hadn't done anything but throw out the superfluous junk. Apple seems to have applied the same principle to software design.




All this reminds me of the design philosophy behind 37 Signals: "Our products do less than the competition — intentionally." Or perhaps the philosophy of Unix utilities: "do few things and do them well." Overall, it's been a reminder to me to balance simplicity and functionality in my software design. At some level, all good solutions are simple and naturally apply themselves beyond the original problem domain.

Color in mutt from DarwinPorts is broken


Last night, I installed mutt-devel 1.5.11 from DarwinPorts. It installed and ran wonderfully, but the colors didn't work: all I saw was black and white. I tried everything I could think of (yes, my .muttrc had "color" directives). I tried to troubleshoot the problem through the archives of various mailing lists, but they offered no hints. I checked mutt -v to make sure that color support was enabled. Sure enough, +HAVE_COLOR +HAVE_START_COLOR were in the list. Still, mutt's color was broken.




This was my first attempt to get anything from DarwinPorts to work on Mac OS X, so I was a bit oblivious. As I further diagnosed the problem, I got annoyed that OS X vim also showed no colors. On top of that, ls -G showed no colors. Hmm, I see a trend.




Here's the history behind the problem. Last night, as I was getting mutt to work on my shiny new iBook, I noticed that colors disappeared when I ssh'd into my desktop or home server. Both of those machines run Debian testing ("etch"). In my .bashrc on those two machines, I examine $TERM before setting the colors. I only accepted "rxvt" and "xterm", but OS X's Terminal sets TERM to "xterm-color". I said to myself, "Self, just tell Terminal to identify itself as 'xterm' and colors will then work on your Debian machines." I listened to myself, made the change and the colors worked. What I didn't notice at that late hour was that the change also broke colors everywhere else.




So, to get mutt and ls and vim to show colors again, I just told Terminal to identify itself as 'xterm-color' again and everything worked. To get my Debian machines working again, I simply added 'xterm-color' to the list of acceptable terminal types.




The moral of the story is: change settings in the system you know best.

Thursday, November 10, 2005

I hate software ... Yeah! for software


Yesterday, I noticed that Debian testing ("etch") had upgraded KDE to version 3.4. Naturally, I ran apt-get update; apt-get upgrade and proceeded to get all the shiny new KDE toys along with a bunch of other updates. After restarting KDE, my sound was broken. I spent too much time this morning trying to get it working, and it still doesn't. Certainly, this is a result of my own ineptitude, but still, software is dumb.




A few minutes later, I was fooling with DCOP so that I could get the konversation IRC client to jump out of my system tray and show itself when I press Ctrl+Alt+K. dcop konversation main_window show did the trick. A few minutes after that, I had created a simple keyboard shortcut with KHotKeys which runs that cute little DCOP command to do exactly what I want (KHotKeys >> New Action >> Keyboard Shortcut - DCOP Call, then set the appropriate "DCOP Call Settings"). Software is cool.




I get my new iBook this afternoon. Hopefully software will still be cool then.

Wednesday, October 26, 2005

Broken konversation audio notifications


For quite some time, I have been trying to get the konversation IRC client to play sounds when certain events happen in IRC. KDE has a slick, unified notifications system but for some reason, I couldn't ever get the sound notifications to work.




Well, I finally found the answer. This follow-up to a bug report about failing KDE audio notifications provides the solution. In brief,




  1. rm ~/.kde/share/config/knotifyrc

  2. killall knotify




That's it. When you try your next audio notification through knotify, everything should work.

Sunday, October 23, 2005

Name Equality Classes


It's late and I'm not thinking clearly, but this notion has been running through my head all day. This is related to my earlier thoughts about matching people. Let's look at the name "Michael", its nicknames and their Metaphone encodings (I switched from NYSIIS to Metaphone since my last post, mostly because Metaphone gives shorter encodings and my testing indicated that they perform about the same).




  • Michael - MXL

  • Mike - MK

  • Mick - MK

  • Mickey - MK




We can see that a person named "Michael" could plausibly have two Metaphone encodings: MXL and MK. In some sense, names with either of those two encodings are equivalent. If I were searching a census for records about "Michael", I should consider a name encoded MK as a possible match (even if the written name is "Mack").
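To make the idea concrete, here is a minimal sketch of such an equivalence-class lookup (in Python rather than Perl). The nickname table and Metaphone codes are hardcoded from the example above; a real implementation would compute codes with a Metaphone library and load nicknames from a much larger table.

```python
# Hardcoded from the example above; a real implementation would use a
# Metaphone library and a full nickname table.
NICKNAMES = {"Michael": ["Mike", "Mick", "Mickey"]}
METAPHONE = {"Michael": "MXL", "Mike": "MK", "Mick": "MK", "Mickey": "MK"}

def code_class(canonical):
    """Return every Metaphone code a record for this person might carry."""
    names = [canonical] + NICKNAMES.get(canonical, [])
    return {METAPHONE[name] for name in names}

def plausible_match(canonical, recorded_code):
    """Could a name with this Metaphone code refer to `canonical`?"""
    return recorded_code in code_class(canonical)
```

So a census entry encoding to MK ("Mack", for instance) survives this first-pass filter for "Michael", while an entry encoding to, say, JN does not.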




This is the part that my mind won't wrap around at this late hour. How can I involve these equivalence classes in the calculation of probabilities? Somehow any name with a code in the equivalence class needs to be incorporated in the calculation, but I'm not sure exactly how.
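One way to start, sketched below with invented frequencies: treat the codes in the class as disjoint events, so the probability that a random record's name lands somewhere in the class is just the sum of the individual code frequencies. Whether that is the right quantity to feed into the match calculation is exactly the open question.

```python
# Invented code frequencies, for illustration only; real numbers would
# come from a census or name-distribution table.
CODE_FREQUENCY = {"MXL": 0.020, "MK": 0.015, "JN": 0.040}

def class_probability(codes):
    """P(a random record's name code falls in this equivalence class)."""
    return sum(CODE_FREQUENCY[code] for code in codes)
```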




As my wife and I discussed this idea, I described to her some of my notions for how genealogy software and genealogy databases (like Ancestry) should interact. I imagined that as I search Ancestry, I find a record that I think might be a match. I drag that entry from Ancestry's webpage and drop it on the person in my genealogy program that I think matches. After I drop the entry, a menu asks whether I want to merge the entry's information with the entry in my file (complete with source referencing, of course) or compare my entry with the Ancestry entry. I choose "Comparison" and my genealogy program fetches the relevant record from Ancestry, compares it with all the data I have on file and returns a short report. The report indicates the probability that this is a match, followed by a description of how it arrived at that number. I may then uncheck any factors in the calculation that seem irrelevant or incorrect and instantly see the newly calculated probability. Which reminds me: for the good of researchers the world over, Family Search and Ancestry should drink deeply of the web services Kool-Aid ©.




One of the benefits of being able to objectively calculate the probability that a given record matches the person I seek is that it allows a researcher to account for numerous factors outside his knowledge. For example, if I'm looking for a person named "John", it matters greatly whether I'm looking for a John born in 1880 (8.2% of male births) or one born in 2004 (0.78% of male births).




Of course, a tool like this is bound to cause trouble when people think that the computer can do all their genealogy. All they have to do is drop a bunch of links on the genealogy program and merge the high-scoring entries. Regardless, I think the technique could offer valuable help to honest family historians.





Thursday, October 20, 2005

Identifying People

Negative Example



Background




I've been developing a program to interface to a government-designed database (I know, that's my first problem). The data specification is poorly designed in several respects, but my current focus is the method of "unique identifiers" for persons entered into the database.




When this database was originally designed, the design team decided to use the combination of the person's first name, last name, gender and birthdate as the "unique identifier." To make things worse, they used only the first letter of the first name and the last letter of the last name. The system went live with this scheme. Of course, the "unique identifier" wouldn't be unique for twins John and Jacob Smith (JH being the name part of the identifier with genders and birthdates identical). Several months later, realizing the design failure, the data committee patched the system by adding the person's first service date to the "unique identifier."




This year, the data committee decided to redesign the data system. Some of the changes were positive, but the committee apparently didn't see the silliness of trying to patch the existing identifier scheme. The latest specification creates a unique identifier based on the following information about the person.




  • first letter of the first name

  • last letter of the last name

  • gender

  • birthdate

  • mother's maiden name

  • birth city

  • birth state

  • birth country




The motivation for using all these fields as the "unique identifier" is that an individual may enter the database multiple times through independent sources. The committee wants to combine the information from these sources so that everything known about the person is available at once.



Problems



Here are a few of the problems with this "unique identifier" system.


  • The birthday paradox promises us that this method of generating a "unique identifier" will fail. We cannot create a truly unique identifier from data which is non-unique. The best we can do is attempt to reduce the probability of a collision. However, as the number of persons in the database increases, this becomes increasingly difficult.


  • Neither the first letter of the first name nor the last letter of the last name will always be the same for a particular individual. For example, when spoken, the names "Aaron" and "Erin" are easily confused, resulting in different first letters (this problem could be partly remedied with a good phonetic coding system such as NYSIIS). Likewise, people often use middle names or nicknames at different times in their lives, which could produce different first letters (and phonetic codes).


  • As designed, this system requires every piece of information. If anything is missing, the person cannot be entered into the system. Anyone accustomed to working with data about people knows that information can always be missing.


  • The lengthy identifier is prone to data entry errors causing false duplicates.
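The birthday-paradox point in the first bullet can be made concrete with the standard calculation. This sketch (in Python; it assumes k equally likely identifier values, which is optimistic for real demographic data) shows how quickly a collision becomes likely:

```python
def collision_probability(n, k):
    """Probability that at least two of n people share one of k
    equally likely identifiers (the birthday-paradox bound)."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (k - i) / k
    return 1.0 - p_all_distinct
```

With the classic k = 365, only 23 people push the collision probability past 50%; identifiers built from names and birthdates are far less uniformly distributed than that, so collisions arrive even sooner.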




What Should Have Happened




So what should this design committee have done? Their system is bound to fail, and the "unique identifier" is bound to grow inexorably longer as the database matures. In my opinion, they should have separated the problem of assigning unique identifiers from the problem of matching duplicate individuals in the database.




Creating truly unique identifiers using a centralized system is an easy problem. We have examples such as IP addresses, domain names, MAC addresses, Social Security Numbers, credit card numbers, bank account numbers, etc. The design committee should have assigned each person a unique identifier when he or she was added to the database. They could also have distributed bundles of identifiers to each data entry location so that a connection to the central database is not always necessary during initial data entry.


Under this proposal, the data entry locations would not have to bother with the harder problem of matching duplicate individuals. Furthermore, a useful system such as the Luhn algorithm (Perl implementation) provides users of the unique identifiers with assurance that an identifier is acceptable. This can catch data entry errors before they cause trouble.
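For reference, here is a minimal sketch of the Luhn algorithm (in Python, standing in for the Perl implementation linked above): double every second digit from the right, subtract 9 from any result above 9, and require the total to be divisible by 10.

```python
def luhn_valid(number: str) -> bool:
    """True if the digit string passes the Luhn check."""
    total = 0
    # Walk the digits right to left, doubling every second one.
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

def luhn_check_digit(payload: str) -> int:
    """The digit to append so that payload + digit passes luhn_valid."""
    return next(d for d in range(10) if luhn_valid(payload + str(d)))
```

A single-digit typo or most adjacent-digit transpositions change the total, so the check catches them before the bad identifier enters the system.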



Matching People




Using only demographic characteristics, it can be quite difficult to determine whether two candidate records actually describe the same person. Anyone who has done genealogy research knows exactly how difficult and time-consuming the problem can be. Essentially, we can never be certain that the two candidates are the same person; we can only increase our confidence that they are. If you doubt that locating duplicate individuals is difficult, peruse a copy of Ancestry magazine for examples.



My Ideal




The difficulty of this task brings me to the real point of this article. Several times, I've thought it would be useful to have a Perl module which could look at the data for two individuals and provide a reliable estimate of the probability that they are the same person. The estimate would be based on statistical results and probability tables for various demographics.




I envision code something like this:




my $a = Person->new(
    given_name  => 'John',
    surname     => 'Smith',
    gender      => 'male',
    birth       => '1802',
    birth_place => 'Lexington, Kentucky',
    source      => 'handwritten',
);
my $b = Person->new(
    given_name  => 'John',
    surname     => 'Smyth',
    gender      => 'male',
    death       => '1864',
    death_place => 'Kentucky',
    source      => 'handwritten | typed',
);

printf "match probability %.1f\n", match( $a, $b );



The interface above is unimportant; the important part is that you give it the information you know and the code does all the hard work of calculating the probability that the two records describe the same individual.




Here's my rough idea of how the insides of match() would work:





  1. Notice the years 1802 and 1864 and only use data from approximately that time range.


  2. Notice the birth and death locations of Kentucky and only use data from that state or that region of the country.


  3. Calculate the probability that a male in Kentucky during the 1800s would have the first name John.


  4. Calculate the probability that a handwritten "Smith" is the same as a handwritten-then-typed "Smyth".


  5. Calculate the probability that a male born in 1802 would die in 1864.


  6. Combine the foregoing probabilities into a single probability.
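Step 6 is the slippery one. A minimal sketch (in Python), under the strong assumption that the factors are independent: express each factor as a likelihood ratio (how much more likely the observation is if the records match than if they don't), multiply the ratios into the prior odds, and convert back to a probability. All the numbers below are invented.

```python
def match_probability(prior, likelihood_ratios):
    """Naive-Bayes-style combination of independent evidence.

    prior             -- probability of a match before any evidence
    likelihood_ratios -- one P(obs | match) / P(obs | no match) per factor
    """
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)
```

For example, starting from even prior odds, a single factor that is three times likelier under a match raises the probability to 0.75.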



Hard Part




The hard part of implementing something like this would be acquiring the numerous statistical tables the algorithm requires. The simple case above requires at least the following tables:





  • Distribution of male names in Kentucky, 1800–1870


  • Distribution of male life expectancies in Kentucky, 1800–1870


  • Analysis of errors that occur during the manual and keyed transcription of names.




I think the compilation of these tables is feasible, if tedious and monumental. As more and more genealogical data is placed in computer systems around the world, compiling these tables becomes easier. For example, the United States Social Security Administration has data on names back to 1880.




A tool such as I describe here would be enormously valuable for family history researchers. However, it should also give you an idea of why the design committee at the beginning of this article was foolish to try to reduce such a difficult task to a simple database identifier.

Contextual::Return confusion


I had a short snippet of code using Contextual::Return, something like this:




use strict;
use warnings;

my $a = foo();
print "$a\n";

sub foo {
    return
        NUM { return 1234 }
        STR { return "foo" }
    ;
}



but when I ran it, the output showed 1234 instead of foo. No matter what I did, $a always took the value from the first context I specified.




But, ah ha, I didn't actually use Contextual::Return. Adding the appropriate use Contextual::Return; line to the top of my snippet produced the correct behavior.




You would laugh if you knew how much time I spent debugging that confusion. But shouldn't there be a warning message or something, since NUM and STR aren't defined? It seems like there should be, but there wasn't.

Wednesday, October 19, 2005

blogs

Hmm, they have blogs on computers now.