Farm Guide

From Pmgwiki


The farm machines are a set of 14 rack-mounted servers in the CSAIL machine room, available for experiments. Warning: The data on these machines is not backed up. Use at your own risk. These machines are shared: please be courteous and use the reservation system. Before starting work, use top or uptime to check whether someone else is already running something computationally expensive on the machine. If so, please use a different machine.


Mailing List

We use farmers@csail.mit.edu to communicate about the farm machines. Please get yourself added to this mailing list. See How to Add Someone to Mailing List for directions.

Reserving Machines

Please record reservations on the Farm Reservation page, to avoid conflicts with others. If you really need exclusive access to some of the machines for some reason (such as running careful performance tests), you can deny everyone else login access:

  1. Add the following line to the bottom of /etc/ssh/sshd_config: AllowUsers root [your user]
  2. Reload sshd's configuration: /etc/init.d/sshd reload (on a stock Ubuntu install the init script is named /etc/init.d/ssh)

When you are done, be sure to remove your change to the config file and reload sshd again.
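A minimal sketch of the lock/unlock cycle above. Instead of hand-editing the line back out, it keeps a copy of sshd_config and restores that copy when you are done. SSHD_CONFIG and SSHD_RELOAD are hypothetical overrides (not part of this guide) that let you try the logic on a scratch file. The defaults are the path and reload command from the steps above.

```shell
#!/bin/sh
# Sketch: temporarily limit SSH logins to root plus one user, then undo it.
# SSHD_CONFIG / SSHD_RELOAD are hypothetical overrides for testing;
# the defaults are the file and reload command used in the steps above.
SSHD_CONFIG=${SSHD_CONFIG:-/etc/ssh/sshd_config}
SSHD_RELOAD=${SSHD_RELOAD:-"/etc/init.d/sshd reload"}

lock_farm_logins() {
    user="$1"
    if [ -z "$user" ]; then
        echo "usage: lock_farm_logins USER" >&2
        return 1
    fi
    # Keep an exact copy (mode and owner too) so unlocking restores it.
    cp -p "$SSHD_CONFIG" "$SSHD_CONFIG.reserved-bak" || return 1
    echo "AllowUsers root $user" >> "$SSHD_CONFIG"
    $SSHD_RELOAD
}

unlock_farm_logins() {
    # Put the saved config back and reload sshd again.
    if [ ! -f "$SSHD_CONFIG.reserved-bak" ]; then
        echo "no saved config found; nothing to undo" >&2
        return 1
    fi
    mv "$SSHD_CONFIG.reserved-bak" "$SSHD_CONFIG" || return 1
    $SSHD_RELOAD
}
```

Usage, as root: lock_farm_logins [your user] before the tests and unlock_farm_logins afterwards. Restoring the saved copy also discards any other sshd_config edits made during the reservation. That is usually what you want, but check first if someone else may have changed the file in the meantime.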

Disk Space

Each of the farm machines has the following configuration:

  • Root partition: /dev/sda1 using ext3, occupying the entire disk (minus swap)
  • Swap: /dev/sda2 (supposed to be 2x RAM = ~4 GB)
  • /space: /dev/sdb1 using ext3, occupying the entire disk

In other words, if you need more disk space, put your data in /space. (On farm2, /space is only a directory on the root disk; see Special Configurations.)
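A small sketch for checking where there is room before writing large files. It reads the POSIX output format of df (-P) in 1024-byte units (-k). On most farm machines, / and /space should report different mount points. On farm2, where /space is only a directory, both lines should show the root filesystem.

```shell
#!/bin/sh
# Sketch: compare free space on the root partition and on /space.

# Free space, in KB, on the filesystem holding the given path.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Mount point of the filesystem holding the given path.
mount_of() {
    df -Pk "$1" | awk 'NR == 2 { print $6 }'
}

report_space() {
    for dir in / /space; do
        [ -d "$dir" ] || continue
        echo "$dir is on $(mount_of "$dir"): $(free_kb "$dir") KB free"
    done
}
```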

Installing Software

Please don't run make install as root to install software system-wide. Instead, use Ubuntu packages: they can be installed painlessly across all the machines, and removed or upgraded later. If you need something specific for your research, please consider installing it into your home directory, to avoid cluttering up the systems.

To install software:

  1. Find the package name in the Ubuntu package archive
  2. Install the package on all the machines as root: for i in `seq 1 14`; do ssh farm$i apt-get install [package-name]; done
  3. Document the package that you wanted in the "important packages" section below, so we don't forget in the future.
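The loop in step 2 can be wrapped so that a machine that fails does not go unnoticed. This is a sketch, not an existing PMG tool. FARM_COUNT and FARM_SSH are hypothetical knobs. FARM_SSH can be pointed at a stub so you can try the loop without network access.

```shell
#!/bin/sh
# Sketch: run one command on farm1..farmN over ssh and report failures.
# FARM_COUNT / FARM_SSH are hypothetical overrides, not part of this guide.
FARM_COUNT=${FARM_COUNT:-14}
FARM_SSH=${FARM_SSH:-ssh}

# Print farm1 .. farm$FARM_COUNT, one per line.
farm_hosts() {
    i=1
    while [ "$i" -le "$FARM_COUNT" ]; do
        echo "farm$i"
        i=$((i + 1))
    done
}

# Run "$@" on every farm host; return 1 if any host failed.
on_all_farms() {
    failed=""
    for host in $(farm_hosts); do
        echo "== $host" >&2
        $FARM_SSH "$host" "$@" || failed="$failed $host"
    done
    if [ -n "$failed" ]; then
        echo "failed on:$failed" >&2
        return 1
    fi
}
```

Usage, as root: on_all_farms apt-get install -y [package-name]. The -y flag tells apt-get to answer yes itself, so it does not stop at the confirmation prompt on each of the 14 machines.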

Rebooting Remotely

farm5-14 all have a feature called IPMI, which can be used to reboot them and access BIOS messages remotely. They listen on a separate IP address: farmXXadm.csail.mit.edu, where XX is the machine id (e.g. farm5 = farm5adm). They speak IPMI v1.5, and the login username is root; see Dan Ports for the password (it's not the same as the usual one, since IPMI security is not great).

AFAIK, farm1-4 are too old to support IPMI, but I'd love to be proven wrong.

ipmitool (apt-get install ipmitool) supports about a billion options, but I've only found two of them to actually be useful:

Remote Power Cycling

ipmitool -I lan -H farm5adm -U root chassis power cycle

Serial Console Access

This can get you to the BIOS or bootloader configuration.

ipmitool -I lan -H farm5adm -U root isol activate

Then type <return>~. to exit.
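A sketch that maps a farm hostname to its IPMI address using the farmXXadm naming above. It refuses farm1-4, which have no IPMI. It then wraps the two ipmitool commands from this section unchanged. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: farmN -> farmNadm, only for the IPMI-capable farm5-farm14.
farm_adm_host() {
    n=${1#farm}
    case "$n" in
        ''|*[!0-9]*) echo "not a farm host: $1" >&2; return 1 ;;
    esac
    if [ "$n" -lt 5 ] || [ "$n" -gt 14 ]; then
        echo "$1 has no IPMI interface" >&2
        return 1
    fi
    echo "farm${n}adm"
}

# Remote power cycle (same ipmitool command as above).
farm_power_cycle() {
    adm=$(farm_adm_host "$1") || return 1
    ipmitool -I lan -H "$adm" -U root chassis power cycle
}

# Serial console (same ipmitool command as above); exit with <return>~.
farm_console() {
    adm=$(farm_adm_host "$1") || return 1
    ipmitool -I lan -H "$adm" -U root isol activate
}
```

Usage: farm_power_cycle farm7. Pass the short name (farm7), not the fully qualified one.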

Using the Google Profiler

Google's documentation has more details, but these are the commands that work on the farm machines (and Ubuntu in general).

  1. Run the program with profiling enabled: CPUPROFILE=out.prof LD_PRELOAD=/usr/lib/libprofiler.so.0 command
  2. Draw the profile as a pretty graphviz diagram (to a PS file): google-pprof --ps <binary path> out.prof > out.ps
  3. View the out.ps postscript file. The big boxes are where the program spends most of its time. Good luck.

Note: Mac OS X's Preview program doesn't like the resulting postscript. Fix it by using: google-pprof --dot <binary path> out.prof | dot -Tps2 > out.ps.
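A sketch wrapping the profiler steps into two shell functions. PROFILER_LIB defaults to the library path from step 1. Override it if your system installs libprofiler elsewhere. The rendering function uses the --dot | dot -Tps2 variant, which, per the note above, also opens in Preview.

```shell
#!/bin/sh
# Sketch: run a program under Google's CPU profiler and render the result.
PROFILER_LIB=${PROFILER_LIB:-/usr/lib/libprofiler.so.0}

# Step 1: profile_run OUT.prof COMMAND [ARGS...]
profile_run() {
    prof="$1"; shift
    # The profiler is enabled via the environment of the child process.
    CPUPROFILE="$prof" LD_PRELOAD="$PROFILER_LIB" "$@"
}

# Steps 2-3: profile_to_ps BINARY OUT.prof OUT.ps
profile_to_ps() {
    binary="$1"; prof="$2"; out="$3"
    google-pprof --dot "$binary" "$prof" | dot -Tps2 > "$out"
}
```

Usage: profile_run out.prof ./myprog input.txt, then profile_to_ps ./myprog out.prof out.ps (myprog and input.txt are placeholders).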

Using AFS

The farm machines are configured to use AFS. To access your CSAIL AFS files:

  1. Get a Kerberos ticket: kinit [username]. Type your CSAIL password.
  2. Get a Kerberos ticket for AFS: aklog.
  3. Access your files: ls /afs/csail.mit.edu/u/[first letter of user name]/[user name] Example: ls /afs/csail.mit.edu/u/e/evanj
  4. You can also list your Kerberos tickets and AFS tokens: klist; tokens
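A sketch of steps 1-3 as shell functions. csail_afs_home builds the /afs/csail.mit.edu/u/[first letter]/[user name] path from step 3, and afs_login runs kinit followed by aklog. Both names are hypothetical helpers, not installed tools.

```shell
#!/bin/sh
# Sketch: CSAIL AFS home path and login helpers.

# /afs/csail.mit.edu/u/<first letter>/<username>
csail_afs_home() {
    user="$1"
    first=$(printf '%s' "$user" | cut -c1)
    echo "/afs/csail.mit.edu/u/$first/$user"
}

# Kerberos ticket (prompts for the CSAIL password), then an AFS token.
afs_login() {
    kinit "$1" && aklog
}
```

Usage: afs_login evanj && ls "$(csail_afs_home evanj)"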

To access your Athena AFS account, first follow the above to get a Kerberos ticket for your CSAIL account. The following are adapted from the CSAIL cross-cell HOWTO:

  1. Create a cross-cell entry: aklog -cell athena.mit.edu. You will get a message like: created cross-cell entry for [username]@csail.mit.edu (Id 16383603) at athena.mit.edu
  2. Log in to an Athena machine and give your CSAIL account access to all the directories in your home directory, using the cross-cell name reported in step 1: cd; find . -name .snapshot -prune -o -type d -exec fs sa {} [username]@csail.mit.edu all \;
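The find command in step 2 as a reusable function, sketched under two assumptions. First, you pass the principal yourself, normally the cross-cell name that aklog printed in step 1. Second, FS_CMD is a hypothetical override that lets you dry-run the walk with echo before touching any ACLs.

```shell
#!/bin/sh
# Sketch: grant "all" rights to PRINCIPAL on every directory under the
# current one, skipping .snapshot. Run on an Athena machine.
FS_CMD=${FS_CMD:-fs}

grant_afs_access() {
    principal="$1"
    if [ -z "$principal" ]; then
        echo "usage: grant_afs_access PRINCIPAL" >&2
        return 1
    fi
    # AFS ACLs are per directory, so only directories are visited.
    find . -name .snapshot -prune -o -type d \
        -exec $FS_CMD sa {} "$principal" all \;
}
```

A dry run first is cheap: cd; FS_CMD=echo; grant_afs_access [username]@csail.mit.edu prints the fs commands instead of running them.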


Important Packages

  • am-utils: used for amd to automount NFS. At some point we might want to migrate to autofs, the in-kernel implementation.
  • g++ gdb make gcc-4.2 valgrind subversion git-core mercurial manpages-dev manpages-posix-dev automake: Development tools, including GCC 4.2 for compiling older C code
  • csh tcsh emacs: shells and editors that people use
  • sun-java6-jdk ant ant-optional: Java development environment
  • ntp: for time synchronization
  • libxp6: needed for matlab
  • python-psyco: much faster Python execution for long running programs
  • libgoogle-perftools0: Includes Google's profiler as google-pprof
  • vim-nox: fully functional version of vim, for those who use it to edit/write code. Can remove vim-tiny after installing this.
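The list above as a single variable, so step 12 of "Installing Ubuntu Server" can be one command. This is a sketch only. Several of these names (for example sun-java6-jdk, python-psyco and gcc-4.2) come from older Ubuntu releases and may no longer exist in the archive.

```shell
#!/bin/sh
# Sketch: the "important packages" from this page, one variable.
IMPORTANT_PACKAGES="am-utils
g++ gdb make gcc-4.2 valgrind subversion git-core mercurial
manpages-dev manpages-posix-dev automake
csh tcsh emacs
sun-java6-jdk ant ant-optional
ntp libxp6 python-psyco libgoogle-perftools0 vim-nox"

install_important_packages() {
    # Left unquoted on purpose: word splitting gives one argument per package.
    apt-get install -y $IMPORTANT_PACKAGES
}
```

After vim-nox is in, apt-get remove vim-tiny is optional.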

Hardware Details

farm1-4

  • Dell PowerEdge 650
  • 1x Intel(R) Pentium(R) 4 CPU 3.06GHz (with hyperthreading: two virtual CPUs)
  • 2 GB RAM
  • 2x 120 GB disks
  • 2x Intel e1000 gigabit Ethernet (eth0 connected)
  • BIOS: Revision A05 (except farm1, which is using A04)

farm5-14

  • Dell PowerEdge SC1420
  • 2x Intel(R) Xeon(TM) CPU 3.20GHz (with Hyperthreading: 4 virtual CPUs)
  • 2 GB RAM
  • 2x 160 GB disks
  • 2x Intel e1000 gigabit Ethernet (eth0 connected)
  • BIOS: Revision A03


Special Configurations

  • farm2-4 have a RAID controller card in them. farm1 has this card removed. Besides that, their hardware is identical.
  • farm2 has backups on its second disk (/dev/sdb), mounted in /archive. As a result, it does not have a second disk mounted on /space like the others. It still has the /space directory, to maintain compatibility with the matlab symlinks.
  • farm5 and farm6 have backups located in /space/archive.

Upgrading Ubuntu

In Dec '09, I (DRKP) upgraded the farm machines from Ubuntu intrepid to karmic. Here are my notes on how I did it.

Unfortunately the upgrade process requires an interactive terminal, meaning you can't use tools like parallel-ssh. I dealt with this by firing up 14 gnome-terminals, sshing into each farm machine, and using keyboardcast to send keystrokes to each terminal at once. This also has the benefit that if one machine's configuration is slightly different, it can be manually controlled. And it's pretty cool to watch!

  1. Make a backup of anything we might want to save -- I used dump -0 -f - / | gzip - > /space/root.0.gz although this is probably overkill. It might not hurt to make a copy of /etc, though.
  2. aptitude -y install update-manager-core
  3. do-release-upgrade

You'll need to answer y to confirm the upgrade, and also deal with any prompts it gives you while running the apt upgrade. Probably the defaults are fine, but this will depend on what you're upgrading.

do-release-upgrade only upgrades one release at a time (e.g. intrepid -> jaunty), so you'll have to repeat it to move up more than one release. Upgrading multiple releases at once isn't supported.

Finally, techsquare had some packages installed that were preventing me from upgrading. I unilaterally removed them because they're not really needed on the farm machine. If you need to do the same: echo "I am aware that this is a very bad idea" | aptitude -y remove aliases.d eaps-config environment-modules exports.d hosts.allow.d hosts.d hosts.deny.d ncview sudoers.d syslog.conf.d udunits ~n^ts-.\*
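For the lighter option in step 1 (a copy of /etc rather than a full dump), here is a sketch that archives one directory into a dated tarball. The destination defaults to nothing, so you pass it, for example /space. The file naming scheme is a hypothetical choice, not something from the original upgrade.

```shell
#!/bin/sh
# Sketch: backup_dir SRC DEST -> DEST/<name>-YYYYMMDD.tar.gz
backup_dir() {
    src="$1"; dest="$2"
    name=$(basename "$src")
    out="$dest/$name-$(date +%Y%m%d).tar.gz"
    # -C stores paths relative to SRC's parent (e.g. "etc/...", not "/etc/...").
    tar -czf "$out" -C "$(dirname "$src")" "$name" && echo "$out"
}
```

Usage, as root on each machine before do-release-upgrade: backup_dir /etc /space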

Installing Ubuntu Server

Installing Ubuntu on these machines is relatively straightforward.

  1. Back up important directories (I used the second disk /dev/sdb1): /etc /home /root
  2. Grab the Ubuntu Server CD from the Media Lab Ubuntu Mirror
  3. (Optional): I put Ubuntu on a USB key, instead of on a CD, by following the "flexible" directions in the Ubuntu install guide.
  4. Reboot. (Optional): For the USB key: Press F2 to enter the BIOS; change the hard drive boot order to put the USB flash drive first.
  5. Use the default partitioning on /dev/sda.
  6. Tell the installer to mount /dev/sdb1 as /space, but not partition it
  7. Install the default Ubuntu Server. Add the OpenSSH package, but no others.
  8. When logged in to the new system, edit /etc/network/interfaces to have the correct static IP address. Edit /etc/resolv.conf to have the right DNS server and search domain (copy from an existing machine)
  9. ifdown eth0; ifup eth0 to use the new configuration
  10. Edit /etc/apt/sources.list to use the media lab Ubuntu mirror, since that is faster: perl -pi -e 's/us.archive.ubuntu.com/ubuntu.media.mit.edu/' /etc/apt/sources.list
  11. apt-get update; apt-get upgrade
  12. With the base system installed, install the "important" packages above.
  13. Restore the SSH host keys: cp [backup location]/etc/ssh/ssh_host* /etc/ssh
  14. Fix automount to start at boot: update-rc.d am-utils defaults (Ubuntu bug filed: hopefully this will be unneeded in the future)
  15. Uncomment the line server pool.ntp.org in /etc/ntp.conf to get more accurate NTP synchronization.
  16. Copy home directories from the backup made in step 1.
  17. Remove shadow passwords (needed for PMG stuff): pwunconv
  18. Copy passwd file to passwd.base. This gets used to produce the "real" passwd file, by adding users to it: cp /etc/passwd /etc/passwd.base
  19. Edit /etc/passwd.base to remove any user accounts. You can store any local passwords in /etc/passwd.local
  20. Install the PMG scripts and accounts:
cd /
curl http://pmg.csail.mit.edu/internal/new-pmg.tar.gz | tar xzf -
/usr/local/adm/bin/updatemachine
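Step 19 can be started mechanically. The sketch below rests on an assumption: "user accounts" means UIDs 1000-65533, Ubuntu's default range for regular users, while system accounts and nobody (65534) are kept. If PMG accounts use a different UID range, adjust the test, and review the output before relying on it.

```shell
#!/bin/sh
# Sketch: print a passwd file without ordinary user accounts.
# ASSUMPTION: regular users have 1000 <= UID < 65534 (Ubuntu default).
strip_user_accounts() {
    awk -F: '$3 < 1000 || $3 >= 65534' "$1"
}
```

Usage, combining steps 18 and 19: strip_user_accounts /etc/passwd > /etc/passwd.base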


Installing AFS

The following procedure builds an AFS module package specific to the kernel being used on the system (located in /usr/src). This package can be used on other systems, provided that they have the same kernel. This can save a lot of time, so copying the package is highly recommended. Also see TIG's AFS on Ubuntu documentation, although these directions seem to work and are faster.

DRKP 2010/01/12: I think the best way to get the module installed nowadays is to install the openafs-modules-dkms package, which should automatically build the module from source and keep it up to date as the kernel gets upgraded. So installing it should avoid the need for the module-assistant steps below (although, of course, it will still take a few minutes to compile from source). YMMV, of course.

  1. (Optional): install the prebuilt openafs kernel module package: dpkg -i packagefile.deb (Skip the module-assistant steps if you do this)
  2. sudo apt-get install openafs-krb5 openafs-client krb5-user module-assistant
  3. Accept the defaults. Set the AFS cell to csail.mit.edu
  4. sudo module-assistant prepare
  5. sudo module-assistant auto-install openafs
  6. sudo /etc/init.d/openafs-client restart
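Because a prebuilt module package only works on a machine with the same kernel, a quick check before dpkg -i can save a confusing failure. This sketch assumes the file name has the form openafs-modules-[kernel version]_[version]_[arch].deb, which is how module-assistant usually names its output. Adjust the pattern if your package is named differently.

```shell
#!/bin/sh
# Sketch: succeed if DEB was built for KVER (default: the running kernel).
# ASSUMPTION: file name openafs-modules-<kernel version>_<version>_<arch>.deb
afs_deb_matches_kernel() {
    deb=$(basename "$1")
    kver=${2:-$(uname -r)}
    case "$deb" in
        "openafs-modules-${kver}_"*.deb) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: afs_deb_matches_kernel packagefile.deb && dpkg -i packagefile.deb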

Installing Matlab

These directions are stolen from [1]. You also need to install libxp6.

mkdir /space/matlab
cp /space/backup*/space/matlab7.4/etc/license.dat /space/matlab
cd /space/backup*/space/matlab-download*
./install -t
        (probably don't need -t: -t only necessary when X is not available...)
a       (accept license)
/space/matlab   (where the install should go)
c       (continue)
y       (make links)
        (/usr/local/bin is fine)
y       (begin installation)
matlab  (test it minimally)
quit

Cloning a Ubuntu Server

Installing individual machines using the procedure above is somewhat reasonable, but when installing more than a few servers, you want to automate the task. Here is how I installed Ubuntu across all the servers:

  1. Install and configure one machine with Ubuntu, as it should be replicated across all the machines.
  2. Burn System Rescue CD on a CD or on a USB key (I put it on the same USB key as the Ubuntu installer, so I could choose which to boot)
  3. Remount root using a bind mount, to avoid cloning stuff from other file systems (such as devfs, proc, etc): mkdir /temproot; mount --bind / /temproot
  4. Create a tar archive containing all the files in the root filesystem: cd /temproot; tar cpf /tmp/image.tar .
  5. Unmount the bind mount: cd; umount /temproot; rm -rf /temproot
  6. Start an "image server": while true; do nc -l 12345 < /tmp/image.tar; done
  7. Boot the destination machine using the System Rescue CD. (if you boot with rescuecd docache at the syslinux prompt you will be able to eject the CD/unmount the disk, which can be useful)
  8. Configure the network: net-setup eth0
  9. Run the script to clone the image, passing the destination machine's hostname (e.g. farm5): ./clone_farm.sh [hostname]
  10. When done, use CTRL-C to stop the "image server", and rm /tmp/image.tar.

The clone script does all the main work. It will need to be customized, depending on what kind of configuration needs to be done. Look for TODO comments in the script:

#!/bin/sh

set -e

HOSTNAME=$1
if [ -z "$HOSTNAME" ]; then
    echo "missing hostname"
    exit 1
fi

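# host prints "NAME has address A.B.C.D"; a failed lookup leaves some
# other word (such as "found:" or "out;") in field 4, so reject non-addresses
IP=`host -t A "$HOSTNAME" | head -n 1 | cut -d " " -f 4`
case "$IP" in
    ""|*[!0-9.]*)
        echo "missing ip"
        exit 1
        ;;
esac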

# Partition first hard drive
# TODO: Fix size for your disk size. Size is in megabytes
# Value for farm1-4: 110332
# Value for farm5-14: 148500
sfdisk /dev/sda -uM << EOF
0,110332,L,*
,,S
EOF

# Make file systems and mount root
mke2fs -j /dev/sda1
mkswap /dev/sda2
mkdir /mnt/disk
mount /dev/sda1 /mnt/disk
cd /mnt/disk

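# Fetch and extract disk image
# TODO: Replace host name with location of your "image server"
# --numeric-owner keeps the UIDs/GIDs stored in the image instead of
# remapping them through the rescue CD's own /etc/passwd and /etc/group
nc farm13 12345 | tar xvf - --preserve --numeric-owner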

# Fix UUIDs in /etc/fstab and /boot/grub/menu.lst
UUID_ROOT=`vol_id --uuid /dev/sda1`
UUID_SWAP=`vol_id --uuid /dev/sda2`
UUID_SPACE=`vol_id --uuid /dev/sdb1`

# TODO: Replace these UUIDs with the UUIDs from the source system
perl -pi -e "s/d9539d01-2090-484e-8e70-067f40d6bd35/$UUID_ROOT/g;" etc/fstab boot/grub/menu.lst
perl -pi -e "s/26a0ceda-4a9f-4a6f-9744-57d389088dc1/$UUID_SWAP/g;" etc/fstab
perl -pi -e "s/21afaa7b-7cd9-403b-9cc4-794377a5b888/$UUID_SPACE/g;" etc/fstab

# Mount proc and dev to make grub work
mount -t proc none proc
mount -o bind /dev dev

# Install grub on the disk to make it bootable
chroot . grub-install /dev/sda
chroot . update-grub

# Fix hostname and /etc/network/interfaces
echo $HOSTNAME > etc/hostname
# TODO: Fix IP address to match your source machine
perl -pi -e "s/18.26.1.62/$IP/;" etc/network/interfaces

# Remove the record of the network interfaces so they get redetected
# (-f: don't abort under set -e if the file is absent)
rm -f etc/udev/rules.d/70-persistent-net.rules

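# Copy original SSH keys from /space
# TODO: Adjust the backup path to wherever your SSH host keys were saved
mkdir /mnt/space
mount /dev/sdb1 /mnt/space -o ro
cp /mnt/space/backup-200811*/etc/ssh/ssh_host_* etc/ssh

# Release the mounts before rebooting into the new system
umount /mnt/space
umount dev
umount proc
cd /
umount /mnt/disk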
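Because the script rewrites UUIDs with blind perl substitutions, it is worth confirming afterwards that none of the source machine's UUIDs survived. This can happen if the TODO values were not updated. The sketch below is a hypothetical helper, meant to be run from /mnt/disk (or anywhere, with full paths) after the clone. It fails if any of the given UUIDs still appears in the file.

```shell
#!/bin/sh
# Sketch: check_no_old_uuids FILE UUID... ; returns 1 if any UUID remains.
check_no_old_uuids() {
    file="$1"; shift
    status=0
    for uuid in "$@"; do
        # -F: match the UUID as a fixed string, not a regex.
        if grep -qF "$uuid" "$file"; then
            echo "$file still contains $uuid" >&2
            status=1
        fi
    done
    return $status
}
```

Usage with the source UUIDs from the script: check_no_old_uuids etc/fstab d9539d01-2090-484e-8e70-067f40d6bd35 26a0ceda-4a9f-4a6f-9744-57d389088dc1 21afaa7b-7cd9-403b-9cc4-794377a5b888, and check_no_old_uuids boot/grub/menu.lst d9539d01-2090-484e-8e70-067f40d6bd35.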