HW3 Addenda, 6.872/HST950 Spring 2003

Census Data

I have received a few complaints from people having trouble accessing the name frequency tables on Singapore.  These tables originally come from the U.S. Census Bureau, at the following address: http://www.census.gov/genealogy/names/.  The commercial company Hamrick provides an interesting interface that shows the geographical distribution of names in the U.S. at http://www.hamrick.com/names/.  I don't know whether it is possible (or permitted) to query their data from a high-volume matching program that would use these distributions in a probabilistic identification scheme.
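To make the idea of using name distributions in a probabilistic identification scheme concrete, here is a minimal Python sketch.  It assumes each table row has the form "NAME percent cumulative-percent rank", which you should check against the actual files.  The function names, the sample rows, and the population size are all hypothetical, and the calculation treats first and last names as independent and ignores that first-name tables may be split by sex.

```python
def parse_name_table(lines):
    """Parse rows of 'NAME percent cumulative_percent rank' into {NAME: fraction}.

    The four-column layout is an assumption about the table format.
    """
    freqs = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed rows
        name, percent = parts[0].upper(), float(parts[1])
        freqs[name] = percent / 100.0  # convert percent to a fraction
    return freqs


def expected_matches(first, last, first_freqs, last_freqs, population):
    """Expected number of people with this first/last name pair.

    Assumes first and last names are statistically independent, which is
    a simplification. Names missing from a table get frequency 0.
    """
    p_first = first_freqs.get(first.upper(), 0.0)
    p_last = last_freqs.get(last.upper(), 0.0)
    return population * p_first * p_last


# Illustrative rows only; these are NOT real census figures.
SAMPLE_LAST = ["EXAMPLELAST 1.000 1.000 1", "RARELAST 0.001 1.001 2"]
SAMPLE_FIRST = ["EXAMPLEFIRST 2.000 2.000 1"]

last_freqs = parse_name_table(SAMPLE_LAST)
first_freqs = parse_name_table(SAMPLE_FIRST)

# Hypothetical population of one million people.
common = expected_matches("Examplefirst", "Examplelast", first_freqs, last_freqs, 1_000_000)
rare = expected_matches("Examplefirst", "Rarelast", first_freqs, last_freqs, 1_000_000)
```

An expected count well below 1 suggests that the name pair alone would come close to identifying one person in a population of that size, while a large count suggests that many people share it.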

One way to access the data on Singapore is to extend the programs you built for HW2.  If you have a MySQL installation on your PC, you can connect remotely to Singapore's MySQL server as follows:

mysql -h singapore.lcs.mit.edu -u 6872 -p cwsscrubbed

The argument "-h <hostname>" makes the mysql client connect to a remote host rather than to your local machine.  The "-u 6872 -p" tells it to connect as our class's user and to prompt for a password, which happens right after you enter this line.  The final "cwsscrubbed" names the database to connect to.  The 6872 user has privileges for only some of the databases on this server.  (Note that for this to work from your command line, either your DOS PATH variable must include the ...\mysql\bin directory or you must first cd to that directory.)
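If you would rather start the client from a script than type the command by hand, the following Python sketch assembles the same command line and checks whether the mysql client can be found on your PATH.  The function names are hypothetical, and the sketch only builds and checks the command; it does not run it.

```python
import shutil


def build_mysql_command(host, user, database):
    """Return the argument list equivalent to: mysql -h HOST -u USER -p DATABASE.

    "-p" is passed on its own so the client prompts for the password;
    the database name follows as a separate argument.
    """
    return ["mysql", "-h", host, "-u", user, "-p", database]


def mysql_on_path():
    """Return True if an executable named 'mysql' is found on the PATH."""
    return shutil.which("mysql") is not None


command = build_mysql_command("singapore.lcs.mit.edu", "6872", "cwsscrubbed")
```

If mysql_on_path() returns True, passing command to subprocess.run should start the client and prompt for the password in your terminal; if it returns False, fix your PATH or cd to the mysql bin directory as described above.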

Approach to answering questions

Note that many of the questions in this homework ask you to design a method or to estimate a quantity about which none of us is likely to have actual knowledge.  Please give actual numerical estimates, to show that you have thought the problem through, and explain how you arrived at them.  We have no way to check the accuracy of the numbers themselves, so the explanation is what lets us see your design or deduction process.