Farm Guide
Revision as of 16:34, 13 January 2009
The farm machines are a set of 14 rack-mounted servers in the CSAIL machine room, available for experiments. Warning: The data on these machines is not backed up. Use at your own risk. These machines are shared: please be courteous. Before starting a job, run top or uptime to check whether someone else is already using the machine for something computationally expensive. If so, please use a different machine.
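The uptime check can be scripted across the whole farm. The following is a minimal sketch, not an official tool: the function names are made up for illustration, and it assumes the hosts are reachable as farm1 through farm14 over SSH.

```shell
# Extract the 1-minute load average from one line of `uptime` output on stdin.
one_minute_load() {
    sed -n 's/.*load average[s]*: *\([0-9.]*\).*/\1/p'
}

# Print the 1-minute load of each farm machine.
# Assumption: hosts are named farm1..farm14 and accept SSH logins.
farm_loads() {
    for i in $(seq 1 14); do
        load=$(ssh -n "farm$i" uptime | one_minute_load)
        echo "farm$i: ${load:-unreachable}"
    done
}
```

Run farm_loads and pick a machine whose load is close to zero.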
Disk Space
Each of the farm machines has the following configuration:
- Root partition: /dev/sda1 using ext3, occupying the entire disk (minus swap)
- Swap: /dev/sda2 (supposed to be 2x RAM = ~4 GB)
- /space: /dev/sdb1 using ext3, occupying the entire disk
In other words, if you need more disk space, put stuff in /space
Installing Software
Please don't run make install as root to install software system-wide. Please use Ubuntu packages, since that will permit the software to be painlessly installed across all the machines, and to be removed/upgraded. If you need something specific for your research, please consider installing it into your home directory, to avoid cluttering up the systems.
To install software:
- Find the package name in the Ubuntu package archive
- Install the package on all the machines as root: for i in `seq 1 14`; do ssh farm$i apt-get install [package-name]; done
- Document the package that you wanted in the "important packages" section below, so we don't forget in the future.
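The install loop from the steps above can be wrapped in a small helper with a dry-run mode, so you can see what would run before touching 14 machines. This is only a sketch: farm_hosts, install_everywhere, and the DRY_RUN variable are hypothetical names, and the -y flag (answer yes to apt-get prompts) is an addition that the original loop does not use.

```shell
# Assumption: the farm consists of hosts farm1 through farm14.
farm_hosts() {
    for i in $(seq 1 14); do
        echo "farm$i"
    done
}

# Install one package on every farm machine (run as root).
# With DRY_RUN=1 the commands are only printed, not executed.
install_everywhere() {
    package=$1
    if [ -z "$package" ]; then
        echo "usage: install_everywhere package-name" >&2
        return 1
    fi
    for host in $(farm_hosts); do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh $host apt-get install -y $package"
        else
            ssh -n "$host" apt-get install -y "$package"
        fi
    done
}
```

For example, DRY_RUN=1 install_everywhere ntp prints one ssh command per machine without running anything.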
Using AFS
The farm machines are configured to use AFS. To access your CSAIL AFS files:
- Get a Kerberos ticket: kinit [username]. Type your CSAIL password.
- Get a Kerberos ticket for AFS: aklog.
- Access your files: ls /afs/csail.mit.edu/u/[first letter of user name]/[user name] Example: ls /afs/csail.mit.edu/u/e/evanj
- You can also list your tickets: klist; tokens
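The home directory path follows a fixed pattern, so it can be computed from the user name. A minimal sketch (afs_home is a hypothetical helper name, not an existing command):

```shell
# Print the CSAIL AFS home directory for a user name:
# /afs/csail.mit.edu/u/<first letter>/<user name>
afs_home() {
    user=$1
    first=$(printf '%s' "$user" | cut -c1)
    echo "/afs/csail.mit.edu/u/$first/$user"
}
```

After kinit and aklog, ls "$(afs_home evanj)" lists the same directory as the example above.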
To access your Athena AFS account, first follow the above to get a Kerberos ticket for your CSAIL account. The following are adapted from the CSAIL cross-cell HOWTO:
- Create a cross-cell entry: aklog -cell athena.mit.edu. You will get a message like: created cross-cell entry for [username]@csail.mit.edu (Id 16383603) at athena.mit.edu
- Log in to an Athena machine and give your CSAIL account access to all the files in your home directory: cd; find . -name .snapshot -prune -o -type d -exec fs sa {} [username]@csail.mit.edu all \;
Reserving Machines
If you really need exclusive access to some of the machines for some reason (such as running careful performance tests), you can deny everyone else login access:
- Add the following line to the bottom of /etc/ssh/sshd_config: AllowUsers root [your user]
- Reload sshd's configuration: /etc/init.d/ssh reload (on Ubuntu the init script is named ssh, not sshd)
When you are done, be sure to remove your change to the config file and reload sshd again.
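Because it is easy to forget the cleanup, the edit and its removal can be scripted as a pair. The sketch below uses hypothetical helper names and takes the config file as an argument, so it can be tried on a copy first. Note that release_machine removes every line beginning with "AllowUsers root ", which assumes nobody else has added such a line.

```shell
# Append an AllowUsers line for the given user to an sshd_config file.
reserve_machine() {
    config=$1
    user=$2
    echo "AllowUsers root $user" >> "$config"
}

# Remove all lines starting with "AllowUsers root " from the file.
release_machine() {
    config=$1
    tmp=$(mktemp)
    # grep exits non-zero when nothing is left to print; that is not an error here.
    grep -v '^AllowUsers root ' "$config" > "$tmp" || true
    cat "$tmp" > "$config"   # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
}
```

Both helpers only edit the file; sshd still has to be reloaded after each one for the change to take effect.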
Important Packages
- am-utils: used for amd to automount NFS. At some point we might want to migrate to autofs, the in-kernel implementation.
- g++ gdb make gcc-4.2 valgrind subversion git-core manpages-dev automake: Development tools, including GCC 4.2 for compiling older C code
- csh tcsh emacs: shells and editors that people use
- sun-java6-jdk ant ant-optional: Java development environment
- ntp: for time synchronization
- libxp6: needed for matlab
- python-psyco: much faster Python execution for long running programs
- libgoogle-perftools0: Includes Google's profiler as google-pprof
Hardware Details
farm1-4
- Dell PowerEdge 650
- 1x Intel(R) Pentium(R) 4 CPU 3.06GHz (with hyperthreading: two virtual CPUs)
- 2 GB RAM
- 2x 120 GB disks
- 2x Intel e1000 gigabit Ethernet (eth0 connected)
- BIOS: Revision A05 (except farm1, which is using A04)
farm5-15
- Dell PowerEdge SC1420
- 2x Intel(R) Xeon(TM) CPU 3.20GHz (with Hyperthreading: 4 virtual CPUs)
- 2 GB RAM
- 2x 160 GB disks
- 2x Intel e1000 gigabit Ethernet (eth0 connected)
- BIOS: Revision A03
Special Configurations
- farm2-4 have a RAID controller card in them. farm1 has this card removed. Besides that, their hardware is identical.
- farm2 has backups on its second disk (/dev/sdb). These backups are mounted in /archive. As such, it does not have a second disk mounted on /space like the others. It still has the /space directory, to maintain compatibility with the matlab symlinks.
- farm6 has backups located in /space/archive.
Installing Ubuntu Server
Installing Ubuntu on these machines is relatively straightforward.
- Back up important directories (I used the second disk /dev/sdb1): /etc /home /root
- Grab the Ubuntu Server CD from the Media Lab Ubuntu Mirror
- (Optional): I put Ubuntu on a USB key, instead of on a CD, by following the "flexible" directions in the Ubuntu install guide.
- Reboot. (Optional): For the USB key: Press F2 to enter the BIOS and change the hard drive boot order to put USB Flash first.
- Use the default partitioning on /dev/sda.
- Tell the installer to mount /dev/sdb1 as /space, but not partition it
- Install the default Ubuntu Server. Add the OpenSSH package, but no others.
- When logged in to the new system, edit /etc/network/interfaces to have the correct static IP address. Edit /etc/resolv.conf to have the right DNS server and search domain (copy from an existing machine)
- ifdown eth0; ifup eth0 to use the new configuration
- Edit /etc/apt/sources.list to use the Media Lab Ubuntu mirror, since that is faster: perl -pi -e 's/us.archive.ubuntu.com/ubuntu.media.mit.edu/' /etc/apt/sources.list
- apt-get update; apt-get upgrade
- With the base system installed, install the "important" packages above.
- Restore the SSH host keys: cp [backup location]/etc/ssh/ssh_host* /etc/ssh
- Fix automount to start at boot: update-rc.d am-utils defaults (Ubuntu bug filed: hopefully this will be unneeded in the future)
- Uncomment the line server pool.ntp.org in /etc/ntp.conf to get more accurate NTP synchronization.
- Copy home directories from a backup.
- Remove shadow passwords (needed for PMG stuff): pwunconv
- Copy passwd file to passwd.base. This gets used to produce the "real" passwd file, by adding users to it: cp /etc/passwd /etc/passwd.base
- Edit /etc/passwd.base to remove any user accounts. You can store any local passwords in /etc/passwd.local
- Install the PMG scripts and accounts:
cd /
curl http://pmg.csail.mit.edu/internal/new-pmg.tar.gz | tar xzf -
/usr/local/adm/bin/updatemachine
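The mirror substitution in the steps above uses unescaped dots, which match any character. That is harmless for a stock sources.list, but a stricter variant is sketched below. It uses sed instead of perl and takes the file as an argument, so it can be tested on a copy. The function name is hypothetical.

```shell
# Replace the default US archive host with the Media Lab mirror.
# The dots in the pattern are escaped so they only match literal dots.
use_media_lab_mirror() {
    sources=${1:-/etc/apt/sources.list}
    tmp=$(mktemp)
    sed 's/us\.archive\.ubuntu\.com/ubuntu.media.mit.edu/g' "$sources" > "$tmp"
    cat "$tmp" > "$sources"   # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
}
```

Run apt-get update afterwards, as in the next step of the procedure.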
Installing AFS
The following procedure builds an AFS module package specific for the kernel being used on the system (located in /usr/src). This package can be used on other systems, provided that they have the same kernel. This can save a lot of time, so copying the package is highly recommended. Also see TIG's AFS on Ubuntu documentation, although these directions seem to work and are faster.
- (Optional): Install the pre-built openafs kernel module package: dpkg -i packagefile.deb (Skip the module-assistant steps if you do this)
- sudo apt-get install openafs-krb5 openafs-client krb5-user module-assistant
- Accept the defaults. Set the AFS cell to csail.mit.edu
- sudo module-assistant prepare
- sudo module-assistant auto-install openafs
- sudo /etc/init.d/openafs-client restart
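Since a pre-built module package only works on a machine with the same kernel, it is worth checking before running dpkg -i. The sketch below assumes the package file is named openafs-modules-KERNELVERSION_..., which is what module-assistant typically produces; verify this against your actual file name. The function name and the example file name are hypothetical.

```shell
# Succeed if the package file name matches the given kernel version.
# The kernel version defaults to the running kernel (uname -r).
# Assumption: the file is named openafs-modules-<kernel version>_<rest>.deb
package_matches_kernel() {
    package=$(basename "$1")
    kernel=${2:-$(uname -r)}
    case $package in
        openafs-modules-"$kernel"_*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: package_matches_kernel packagefile.deb && sudo dpkg -i packagefile.deb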
Installing Matlab
These directions are stolen from [1]. You also need to install libxp6 (see the important packages above).
mkdir /space/matlab
cp /space/backup*/space/matlab7.4/etc/license.dat /space/matlab
cd /space/backup*/space/matlab-download*
./install -t (probably don't need -t: -t only necessary when X is not available...)
a (accept license)
/space/matlab (where the install should go)
c (continue)
y (make links) (/usr/local/bin is fine)
y (begin installation)
matlab (test it minimally)
quit
Cloning an Ubuntu Server
Installing individual machines using the procedure above is somewhat reasonable, but when installing more than a few servers, you want to automate the task. Here is how I installed Ubuntu across all the servers:
- Install and configure one machine with Ubuntu, as it should be replicated across all the machines.
- Burn System Rescue CD to a CD or write it to a USB key (I put it on the same USB key as the Ubuntu installer, so I could choose which to boot)
- Remount root using a bind mount, to avoid cloning stuff from other file systems (such as devfs, proc, etc): mkdir /temproot; mount --bind / /temproot
- Create a tar archive containing all the files in the root filesystem: cd /temproot; tar cpf /tmp/image.tar .
- Unmount the bind mount: cd; umount /temproot; rm -rf /temproot
- Start an "image server": while true; do nc -l 12345 < /tmp/image.tar; done
- Boot the destination machine using the System Rescue CD. (if you boot with rescuecd docache at the syslinux prompt you will be able to eject the CD/unmount the disk, which can be useful)
- Configure the network: net-setup eth0
- Run the script to clone the image: ./clone_farm.sh
- When done, use CTRL-C to stop the "image server", and rm /tmp/image.tar.
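The heart of the procedure above is a tar stream: one tar writes the archive and another unpacks it. The sketch below shows the same pipeline on a single machine, with the network hop removed, which is a cheap way to test the idea on a scratch directory. In the real procedure, nc carries the stream between the two tar commands. The function name is hypothetical.

```shell
# Copy a directory tree through a tar pipe, preserving permissions (-p).
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst"
    (cd "$src" && tar cpf - .) | (cd "$dst" && tar xpf -)
}
```

Unlike the image server, this never stores the archive on disk, so it needs no cleanup step.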
The clone script does all the main work. It will need to be customized, depending on what kind of configuration needs to be done. Look for TODO comments in the script:
#!/bin/sh
set -e

HOSTNAME=$1
if [ -z "$HOSTNAME" ]; then
    echo "missing hostname"
    exit 1
fi

IP=`host -t A $1 | cut -d " " -f 4`
if [ -z "$IP" -o "$IP" = "out;" ]; then
    echo "missing ip"
    exit 1
fi

# Partition first hard drive
# TODO: Fix size for your disk size. Size is in megabytes
# Value for farm1-4: 110332
# Value for farm4-14: 148500
sfdisk /dev/sda -uM << EOF
0,110332,L,*
,,S
EOF

# Make file systems and mount root
mke2fs -j /dev/sda1
mkswap /dev/sda2
mkdir /mnt/disk
mount /dev/sda1 /mnt/disk
cd /mnt/disk

# Fetch and extract disk image
# TODO: Replace host name with location of your "image server"
nc farm13 12345 | tar xvf - --preserve

# Fix UUIDs in /etc/fstab and /boot/grub/menu.lst
UUID_ROOT=`vol_id --uuid /dev/sda1`
UUID_SWAP=`vol_id --uuid /dev/sda2`
UUID_SPACE=`vol_id --uuid /dev/sdb1`
# TODO: Replace these UUIDs with the UUIDs from the source system
perl -pi -e "s/d9539d01-2090-484e-8e70-067f40d6bd35/$UUID_ROOT/g;" etc/fstab boot/grub/menu.lst
perl -pi -e "s/26a0ceda-4a9f-4a6f-9744-57d389088dc1/$UUID_SWAP/g;" etc/fstab
perl -pi -e "s/21afaa7b-7cd9-403b-9cc4-794377a5b888/$UUID_SPACE/g;" etc/fstab

# Mount proc and dev to make grub work
mount -t proc none proc
mount -o bind /dev dev

# Install grub on the disk to make it bootable
chroot . grub-install /dev/sda
chroot . update-grub

# Fix hostname and /etc/network/interfaces
echo $HOSTNAME > etc/hostname
# TODO: Fix IP address to match your source machine
perl -pi -e "s/18.26.1.62/$IP/;" etc/network/interfaces

# Remove the record of the network interfaces so they get redetected
rm etc/udev/rules.d/70-persistent-net.rules

# Copy original SSH keys from /space
mkdir /mnt/space
mount /dev/sdb1 /mnt/space -o ro
cp /mnt/space/backup-200811*/etc/ssh/ssh_host_* etc/ssh
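The IP lookup in the script takes the fourth word of the host output, and the check for "out;" presumably catches a DNS timeout message, whose fourth word is "out;". A slightly more defensive variant is sketched below: it only accepts lines of the form "NAME has address IP" and prints nothing otherwise. The function name is hypothetical.

```shell
# Extract the address from `host -t A` output on stdin.
# Prints nothing unless a line looks like "NAME has address IP".
parse_a_record() {
    awk '$2 == "has" && $3 == "address" { print $4; exit }'
}
```

In the clone script this could be used as IP=`host -t A $1 | parse_a_record`, leaving only the empty-string check.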