Farm Guide
The farm machines are a set of 14 rack mounted servers in the CSAIL machine room, available for experiments. Warning: The data on these machines is not backed up. Use at your own risk. These machines are shared: please be courteous and use the reservation system. Use top or uptime to see if a machine is being used by someone else for something that may be computationally expensive. If so, please use a different machine.
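A rough sketch of how the load check could be scripted across all the machines. The function names and the 1.0 threshold are illustrative assumptions, not an existing farm tool; it assumes you can ssh to farm1 through farm14.

```shell
# Extract the 1-minute load average from a line of `uptime` output.
# uptime prints something like "... load average: 0.15, 0.10, 0.05".
load1() {
    sed -n 's/.*load average[s]*: *\([0-9.]*\).*/\1/p'
}

# Print "busy" if the load ($1) is at or above the threshold ($2,
# default 1.0, an arbitrary choice), otherwise print "idle".
classify_load() {
    awk -v l="$1" -v t="${2:-1.0}" \
        'BEGIN { if (l + 0 >= t + 0) print "busy"; else print "idle" }'
}

# Report the load on every farm machine (hypothetical helper).
check_farm() {
    for i in $(seq 1 14); do
        load=$(ssh "farm$i" uptime | load1)
        echo "farm$i: $load ($(classify_load "$load"))"
    done
}
```

Running check_farm prints one line per machine. A high load only suggests the machine is in use, so still look at top before assuming it is free.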
Mailing List
We use farmers@csail.mit.edu to communicate about the farm machines. Please get yourself added to this mailing list. See How to Add Someone to Mailing List for directions.
Disk Space
Each of the farm machines has the following configuration:
- Root partition: /dev/sda1 using ext3, occupying the entire disk (minus swap)
- Swap: /dev/sda2 (supposed to be 2x RAM = ~4 GB)
- /space: /dev/sdb1 using ext3, occupying the entire disk
In other words, if you need more disk space, put your files in /space.
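Before putting large files in /space it may help to check how much room is left. A minimal sketch using standard df; the helper name free_kb is made up for this example.

```shell
# Print the free space, in kilobytes, of the file system holding path $1.
# `df -Pk` gives POSIX-format output; the 4th column of the second line
# is the available space.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

For example, free_kb /space on a farm machine reports the space left on the second disk, and free_kb / reports the root partition.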
Installing Software
Please don't run make install as root to install software system-wide. Please use Ubuntu packages, since that will permit the software to be painlessly installed across all the machines, and to be removed/upgraded. If you need something specific for your research, please consider installing it into your home directory, to avoid cluttering up the systems.
To install software:
- Find the package name in the Ubuntu package archive
- Install the package on all the machines as root: for i in `seq 1 14`; do ssh farm$i apt-get install [package-name]; done
- Document the package that you wanted in the "important packages" section below, so we don't forget in the future.
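The install step above can be wrapped in a small function that reports which machines failed. This is only a sketch: the function name, the DRY_RUN variable, and the use of apt-get's -y flag (to skip the confirmation prompt) are assumptions added for this example.

```shell
# Install package $1 on farm1..farm14.
# Set DRY_RUN=1 to print the commands instead of running them.
install_everywhere() {
    pkg=$1
    for i in $(seq 1 14); do
        if [ -n "$DRY_RUN" ]; then
            echo "ssh root@farm$i apt-get -y install $pkg"
        else
            # Keep going if one machine fails, but say which one.
            ssh "root@farm$i" apt-get -y install "$pkg" ||
                echo "farm$i: install of $pkg failed" >&2
        fi
    done
}
```

Try DRY_RUN=1 install_everywhere vim-nox first to see what would be run.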
Using the Google Profiler
Google's documentation has more details, but these are the commands that work on the farm machines (and Ubuntu in general).
- Run the program with profiling enabled:
CPUPROFILE=out.prof LD_PRELOAD=/usr/lib/libprofiler.so.0 command
- Draw the profile as a pretty graphviz diagram (to a PS file):
google-pprof --ps <binary path> out.prof > out.ps
- View the out.ps PostScript file. The big boxes are where the program spends most of its time. Good luck.
Note: Mac OS X's Preview program doesn't like the resulting PostScript. Fix it by using: google-pprof --dot <binary path> out.prof | dot -Tps2 > out.ps
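The profiling commands above could be bundled into two helpers, sketched here. The function names and the PROFILER_LIB override are assumptions for this example; the library path and the google-pprof and dot invocations are the ones given above.

```shell
# Path to the profiler library; override if it lives elsewhere.
PROFILER_LIB=${PROFILER_LIB:-/usr/lib/libprofiler.so.0}

# profile_run OUT.prof COMMAND [ARGS...]
# Run COMMAND with CPU profiling enabled, writing the profile to OUT.prof.
profile_run() {
    out=$1
    shift
    CPUPROFILE="$out" LD_PRELOAD="$PROFILER_LIB" "$@"
}

# profile_to_ps BINARY OUT.prof
# Render the profile as PostScript next to the profile (OUT.ps),
# going through dot so that Preview on Mac OS X can open the result.
profile_to_ps() {
    google-pprof --dot "$1" "$2" | dot -Tps2 > "${2%.prof}.ps"
}
```

A typical session would be profile_run out.prof ./myprog followed by profile_to_ps ./myprog out.prof, where ./myprog stands for your own binary.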
Using AFS
The farm machines are configured to use AFS. To access your CSAIL AFS files:
- Get a Kerberos ticket: kinit [username]. Type your CSAIL password.
- Get a Kerberos ticket for AFS: aklog.
- Access your files: ls /afs/csail.mit.edu/u/[first letter of user name]/[user name] Example: ls /afs/csail.mit.edu/u/e/evanj
- You can also list your tickets: klist; tokens
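The AFS home directory path follows a fixed pattern, so it can be computed from the user name. A small sketch; the helper name afs_home is invented for this example.

```shell
# Print the CSAIL AFS home directory for user $1, following the
# /afs/csail.mit.edu/u/[first letter]/[user name] layout described above.
afs_home() {
    user=$1
    first=$(printf '%s' "$user" | cut -c1)
    echo "/afs/csail.mit.edu/u/$first/$user"
}
```

After kinit and aklog, something like ls "$(afs_home evanj)" lists that user's files.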
To access your Athena AFS account, first follow the above to get a Kerberos ticket for your CSAIL account. The following are adapted from the CSAIL cross-cell HOWTO:
- Create a cross-cell entry: aklog -cell athena.mit.edu. You will get a message like: created cross-cell entry for [username]@csail.mit.edu (Id 16383603) at athena.mit.edu
- Log in to an Athena machine and give your CSAIL account access to all the files in your home directory: cd; find . -name .snapshot -prune -o -type d -exec fs sa {} [username]@csail.mit.edu all \;
Reserving Machines
Please record reservations on the Farm Reservation page, to avoid conflicts with others. If you really need exclusive access to some of the machines for some reason (such as running careful performance tests), you can deny everyone else login access:
- Add the following line to the bottom of /etc/ssh/sshd_config: AllowUsers root [your user]
- Reload sshd's configuration: /etc/init.d/ssh reload (on Ubuntu the init script is named ssh, not sshd)
When you are done, be sure to remove your change to the config file and reload sshd again.
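Because it is easy to forget to undo the change, the edit and its removal could be scripted. The sketch below is one possible approach, with made-up function names; it takes the config file as an optional argument so it can be tried on a copy first. It only edits the file, so sshd still has to be reloaded after each call.

```shell
# reserve_machine USER [CONFIG]
# Append an AllowUsers line that restricts logins to root and USER.
reserve_machine() {
    conf=${2:-/etc/ssh/sshd_config}
    echo "AllowUsers root $1" >> "$conf"
}

# release_machine USER [CONFIG]
# Remove exactly the line that reserve_machine added for USER.
release_machine() {
    conf=${2:-/etc/ssh/sshd_config}
    # grep exits non-zero when nothing is left to print; that is fine here.
    grep -v -x -F "AllowUsers root $1" "$conf" > "$conf.tmp" || true
    # Write back in place so the file keeps its owner and permissions.
    cat "$conf.tmp" > "$conf"
    rm -f "$conf.tmp"
}
```

Keep a root shell open on the machine while testing, in case a mistake in the file locks you out.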
Important Packages
- am-utils: used for amd to automount NFS. At some point we might want to migrate to autofs, the in-kernel implementation.
- g++ gdb make gcc-4.2 valgrind subversion git-core mercurial manpages-dev manpages-posix-dev automake: Development tools, including GCC 4.2 for compiling older C code
- csh tcsh emacs: shells and editors that people use
- sun-java6-jdk ant ant-optional: Java development environment
- ntp: for time synchronization
- libxp6: needed for matlab
- python-psyco: much faster Python execution for long running programs
- libgoogle-perftools0: Includes Google's profiler as google-pprof
- vim-nox: fully functional version of vim, for those who use it to edit/write code. Can remove vim-tiny after installing this.
Hardware Details
farm1-4
- Dell PowerEdge 650
- 1x Intel(R) Pentium(R) 4 CPU 3.06GHz (with hyperthreading: two virtual CPUs)
- 2 GB RAM
- 2x 120 GB disks
- 2x Intel e1000 gigabit Ethernet (eth0 connected)
- BIOS: Revision A05 (except farm1, which is using A04)
farm5-15
- Dell PowerEdge SC1420
- 2x Intel(R) Xeon(TM) CPU 3.20GHz (with Hyperthreading: 4 virtual CPUs)
- 2 GB RAM
- 2x 160 GB disks
- 2x Intel e1000 gigabit Ethernet (eth0 connected)
- BIOS: Revision A03
Special Configurations
- farm2-4 have a RAID controller card in them. farm1 has this card removed. Besides that, their hardware is identical.
- farm2 has backups on it, on its second disk (/dev/sdb). These backups are mounted in /archive. As such, it does not have a second disk mounted on /space like the others. It still has the /space directory, to maintain compatibility with the matlab symlinks.
- farm5 and farm6 have backups located in /space/archive.
Upgrading Ubuntu
In Dec '09, I (DRKP) upgraded the farm machines from Ubuntu intrepid to karmic. Here are my notes on how I did it.
Unfortunately the upgrade process requires an interactive terminal, meaning you can't use tools like parallel-ssh. I dealt with this by firing up 14 gnome-terminals, sshing into each farm machine, and using keyboardcast to send keystrokes to each terminal at once. This also has the benefit that if one machine's configuration is slightly different, it can be manually controlled. And it's pretty cool to watch!
- Make a backup of anything we might want to save -- I used dump -0 -f - / | gzip - > /space/root.0.gz although this is probably overkill. It might not hurt to make a copy of /etc, though.
- aptitude -y install update-manager-core
- do-release-upgrade
You'll need to answer y to confirm the upgrade, and also deal with any prompts it gives you while running the apt upgrade. Probably the defaults are fine, but this will depend on what you're upgrading.
do-release-upgrade only upgrades one major version at a time (i.e. intrepid -> jaunty), so you'll have to repeat it if you want to upgrade more than one version. Upgrading multiple versions at once isn't supported.
Finally, techsquare had some packages installed that were preventing me from upgrading. I unilaterally removed them because they're not really needed on the farm machines. If you need to do the same: echo "I am aware that this is a very bad idea" | aptitude -y remove aliases.d eaps-config environment-modules exports.d hosts.allow.d hosts.d hosts.deny.d ncview sudoers.d syslog.conf.d udunits ~n^ts-.\*
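Since do-release-upgrade moves one release at a time, it can help to list the intermediate releases before starting. A small illustrative sketch; the function name and the hard-coded release list (Ubuntu releases from hardy to lucid, in order) are assumptions added here.

```shell
# Ubuntu releases in order, oldest first (8.04 through 10.04).
RELEASES="hardy intrepid jaunty karmic lucid"

# upgrade_path FROM TO
# Print the releases that do-release-upgrade has to step through,
# one run per release, to get from FROM to TO.
upgrade_path() {
    from=$1
    to=$2
    seen=0
    path=""
    for r in $RELEASES; do
        if [ "$seen" -eq 1 ]; then
            path="$path $r"
            [ "$r" = "$to" ] && break
        fi
        [ "$r" = "$from" ] && seen=1
    done
    echo $path
}
```

For the upgrade described above, upgrade_path intrepid karmic lists two releases, so do-release-upgrade has to be run twice on each machine.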
Installing Ubuntu Server
Installing Ubuntu on these machines is relatively straightforward.
- Back up important directories (I used the second disk /dev/sdb1): /etc /home /root
- Grab the Ubuntu Server CD from the Media Lab Ubuntu Mirror
- (Optional): I put Ubuntu on a USB key, instead of on a CD, by following the "flexible" directions in the Ubuntu install guide.
- Reboot. (Optional): For the USB key: Press F2 to enter bios; change Hard Drive boot order to put USB Flash first.
- Use the default partitioning on /dev/sda.
- Tell the installer to mount /dev/sdb1 as /space, but not partition it
- Install the default Ubuntu Server. Add the OpenSSH package, but no others.
- When logged in to the new system, edit /etc/network/interfaces to have the correct static IP address. Edit /etc/resolv.conf to have the right DNS server and search domain (copy from an existing machine)
- ifdown eth0; ifup eth0 to use the new configuration
- Edit /etc/apt/sources.list to use the media lab Ubuntu mirror, since that is faster: perl -pi -e 's/us.archive.ubuntu.com/ubuntu.media.mit.edu/' /etc/apt/sources.list
- apt-get update; apt-get upgrade
- With the base system installed, install the "important" packages above.
- Restore the SSH host keys: cp [backup location]/etc/ssh/ssh_host* /etc/ssh
- Fix automount to start at boot: update-rc.d am-utils defaults (Ubuntu bug filed: hopefully this will be unneeded in the future)
- Uncomment the line server pool.ntp.org in /etc/ntp.conf to get more accurate NTP synchronization.
- copy home directories from a backup
- Remove shadow passwords (needed for PMG stuff): pwunconv
- Copy passwd file to passwd.base. This gets used to produce the "real" passwd file, by adding users to it: cp /etc/passwd /etc/passwd.base
- Edit /etc/passwd.base to remove any user accounts. You can store any local passwords in /etc/passwd.local
- Install the PMG scripts and accounts:
cd /
curl http://pmg.csail.mit.edu/internal/new-pmg.tar.gz | tar xzf -
/usr/local/adm/bin/updatemachine
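The mirror substitution in the sources.list step above can also be done with sed, which makes it easy to try on a copy of the file first. This is a sketch, not the procedure that was actually used; the function name is made up, and the dots in the host name are escaped so they only match literal dots.

```shell
# use_mirror [FILE]
# Point apt at the Media Lab mirror. Defaults to /etc/apt/sources.list
# and leaves the previous version in FILE.bak.
use_mirror() {
    file=${1:-/etc/apt/sources.list}
    sed -i.bak 's/us\.archive\.ubuntu\.com/ubuntu.media.mit.edu/g' "$file"
}
```

Run apt-get update afterwards so the package lists are fetched from the new mirror.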
Installing AFS
The following procedure builds an AFS module package specific for the kernel being used on the system (located in /usr/src). This package can be used on other systems, provided that they have the same kernel. This can save a lot of time, so copying the package is highly recommended. Also see TIG's AFS on Ubuntu documentation, although these directions seem to work and are faster.
DRKP 2010/01/12: I think the best way to get the module installed nowadays is to install the openafs-modules-dkms package, which should automatically build the module from source and keep it up to date as the kernel gets upgraded. So installing it should avoid the need for the module-assistant steps below (although, of course, it will still take a few minutes to compile from source). YMMV, of course.
- (Optional): install the pre-built openafs kernel module package: dpkg -i packagefile.deb (Skip the module-assistant steps if you do this)
- sudo apt-get install openafs-krb5 openafs-client krb5-user module-assistant
- Accept the defaults. Set the AFS cell to csail.mit.edu
- sudo module-assistant prepare
- sudo module-assistant auto-install openafs
- sudo /etc/init.d/openafs-client restart
Installing Matlab
These directions are stolen from [1]. You also need to install libxp6 (see Important Packages).
mkdir /space/matlab
cp /space/backup*/space/matlab7.4/etc/license.dat /space/matlab
cd /space/backup*/space/matlab-download*
./install -t (probably don't need -t: -t only necessary when X is not available...)
a (accept license)
/space/matlab (where the install should go)
c (continue)
y (make links) (/usr/local/bin is fine)
y (begin installation)
matlab (test it minimally)
quit
Cloning an Ubuntu Server
Installing individual machines using the procedure above is somewhat reasonable, but when installing more than a few servers, you want to automate the task. Here is how I installed Ubuntu across all the servers:
- Install and configure one machine with Ubuntu, as it should be replicated across all the machines.
- Burn System Rescue CD on a CD or on a USB key (I put it on the same USB key as the Ubuntu installer, so I could choose which to boot)
- Remount root using a bind mount, to avoid cloning stuff from other file systems (such as devfs, proc, etc): mkdir /temproot; mount --bind / /temproot
- Create a tar archive containing all the files in the root filesystem: cd /temproot; tar cpf /tmp/image.tar .
- Unmount the bind mount: cd; umount /temproot; rm -rf /temproot
- Start an "image server": while true; do nc -l 12345 < /tmp/image.tar; done
- Boot the destination machine using the System Rescue CD. (if you boot with rescuecd docache at the syslinux prompt you will be able to eject the CD/unmount the disk, which can be useful)
- Configure the network: net-setup eth0
- Run the script to clone the image: ./clone_farm.sh
- When done, use CTRL-C to stop the "image server", and rm /tmp/image.tar.
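The heart of the procedure is a tar stream that preserves permissions and is unpacked on the other side. The sketch below shows the same idea with a local pipe standing in for nc, so it can be tried safely on scratch directories; the function names are invented for this example.

```shell
# pack_root DIR
# Write a tar stream of everything under DIR to stdout, keeping permissions.
pack_root() {
    (cd "$1" && tar cpf - .)
}

# unpack_root DIR
# Read a tar stream from stdin and extract it into DIR, keeping permissions.
unpack_root() {
    (cd "$1" && tar xpf -)
}

# On the real machines the two halves are joined by nc instead of a pipe:
#   source machine:       pack_root /temproot | nc -l 12345
#   destination machine:  nc farm13 12345 | unpack_root /mnt/disk
```

Streaming directly like this would avoid the temporary /tmp/image.tar, at the cost of re-reading the source disk for every clone, which is presumably why the steps above build the archive once.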
The clone script does all the main work. It will need to be customized, depending on what kind of configuration needs to be done. Look for TODO comments in the script:
#!/bin/sh
set -e

HOSTNAME=$1
if [ -z "$HOSTNAME" ]; then
    echo "missing hostname"
    exit 1
fi

# Look up the IP address for the host name. If the lookup times out,
# the 4th field of host's output is "out;", so treat that as a failure.
IP=`host -t A $1 | cut -d " " -f 4`
if [ -z "$IP" -o "$IP" = "out;" ]; then
    echo "missing ip"
    exit 1
fi

# Partition first hard drive
# TODO: Fix size for your disk size. Size is in megabytes
# Value for farm1-4: 110332
# Value for farm5-14: 148500
sfdisk /dev/sda -uM << EOF
0,110332,L,*
,,S
EOF

# Make file systems and mount root
mke2fs -j /dev/sda1
mkswap /dev/sda2
mkdir /mnt/disk
mount /dev/sda1 /mnt/disk
cd /mnt/disk

# Fetch and extract disk image
# TODO: Replace host name with location of your "image server"
nc farm13 12345 | tar xvf - --preserve

# Fix UUIDs in /etc/fstab and /boot/grub/menu.lst
UUID_ROOT=`vol_id --uuid /dev/sda1`
UUID_SWAP=`vol_id --uuid /dev/sda2`
UUID_SPACE=`vol_id --uuid /dev/sdb1`
# TODO: Replace these UUIDs with the UUIDs from the source system
perl -pi -e "s/d9539d01-2090-484e-8e70-067f40d6bd35/$UUID_ROOT/g;" etc/fstab boot/grub/menu.lst
perl -pi -e "s/26a0ceda-4a9f-4a6f-9744-57d389088dc1/$UUID_SWAP/g;" etc/fstab
perl -pi -e "s/21afaa7b-7cd9-403b-9cc4-794377a5b888/$UUID_SPACE/g;" etc/fstab

# Mount proc and dev to make grub work
mount -t proc none proc
mount -o bind /dev dev

# Install grub on the disk to make it bootable
chroot . grub-install /dev/sda
chroot . update-grub

# Fix hostname and /etc/network/interfaces
echo $HOSTNAME > etc/hostname
# TODO: Fix IP address to match your source machine
perl -pi -e "s/18.26.1.62/$IP/;" etc/network/interfaces

# Remove the record of the network interfaces so they get redetected
rm etc/udev/rules.d/70-persistent-net.rules

# Copy original SSH keys from /space
mkdir /mnt/space
mount /dev/sdb1 /mnt/space -o ro
cp /mnt/space/backup-200811*/etc/ssh/ssh_host_* etc/ssh
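The IP lookup in the script relies on the IP being the fourth space-separated field of host's output, and special-cases the "out;" that appears there when the lookup times out. A slightly stricter variant, sketched below with a made-up function name, only accepts lines of the form "NAME has address IP" and prints nothing otherwise, so a single empty-string check covers every failure. The host name in the usage note is illustrative.

```shell
# Read `host -t A NAME` output on stdin and print the first IPv4 address,
# or nothing if the lookup failed (not found, timeout, and so on).
parse_host_ip() {
    awk '$2 == "has" && $3 == "address" { print $4; exit }'
}
```

In the clone script this would be used as IP=`host -t A $1 | parse_host_ip`. A successful lookup produces a line like "farm1.csail.mit.edu has address 18.26.1.62", from which only the address is kept.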