LEXIS-NEXIS® Academic Universe - Document
Copyright 2001 The New York Times Company
The New York Times
February 13, 2001, Tuesday, Late Edition - Final
SECTION: Section F; Page 1; Column 1; Science Desk
LENGTH: 1542 words
HEADLINE: READING THE BOOK OF LIFE;
Grad Student Becomes Gene Effort's Unlikely Hero
BYLINE:
By NICHOLAS WADE
BODY:
A surprising hero helped the consortium of academic scientists decoding the
human
genome to avoid a drubbing by its rival, the Celera Genomics company. Scientists
throughout the world are now beating an electronic path to his Web site, where
they can analyze and download the human
genome sequence. He is a graduate student at the University of California at Santa
Cruz, and his name is not Clark but James Kent.
In four hectic weeks last spring, Mr. Kent wrote a computer program that the
consortium's leaders hadn't realized how much they needed, one that assembles
the 400,000 fragments of DNA they had decoded into a coherent sequence. Using
100 computers that his senior colleague, Dr. David Haussler, had persuaded the
university to buy for the purpose, Mr. Kent performed his first
assembly on the human
genome on June
22, just four days before Dr. Francis S. Collins, the consortium's informal
leader, and Dr. J. Craig Venter of Celera, announced at the White House on June
26 that each had assembled the human
genome.
Since Dr. Venter has now stated that Celera finished its computer
assembly just the day before, on the night of June 25, it turns out that Mr. Kent's
brilliant improvisation was the first
assembly of the human genome, even though one that had and still contains many gaps.
"Without Jim Kent, the
assembly of the genome into the golden path wouldn't have happened," said Dr. Collins, referring to the nickname for the GigAssembler, as the
program is known.
Dr. Haussler, a professor of computer science, described his student as a
superstar.
"He's unbelievable," Dr. Haussler said.
"This program represents an amount of work that would have taken a team of 5 or
10 programmers at least six months or a year.
"Jim in four weeks created the GigAssembler by working night and day," Dr. Haussler said.
"He had to ice his wrists at night because of the
fury with which he created this extraordinarily complex piece of code."
Having finished the GigAssembler, Mr. Kent then wrote another program known as
a browser that enables many other kinds of genomic information to be aligned in
tracks above the raw sequence of DNA bases, the chemical letters in which the
human hereditary information is encoded. This extra information is essential
for making sense of the DNA sequence and in particular for discovering where
the genes are.
How did a mere private see the weakness in the order of battle that the
consortium's generals had missed? Their competitor, Dr. Venter, had always
relied on heavy-duty computation to hedge the enormous risks he took in his
sequencing strategy. In setting up Celera he ordered a machine with vast
processing power and four terabytes of memory, creating the largest computer in
civilian use.
The consortium's DNA sequencing
centers had nothing to match it because their strategy did not seem to require
large-scale computation. The consortium breaks the human genome into large
pieces known as BAC's and works from a simple map that shows how one BAC
overlaps with the next in a tiling path that extends across the genome.
In December 1999, Dr. Eric S. Lander of the Whitehead Institute, who heads the
consortium's genome analysis group, approached Dr. Haussler for help finding
the genes in the human genome sequence, the first step in annotating it. Dr.
Haussler decided the genome could not be properly analyzed until the pieces
being generated by the consortium were in the right order.
The pieces of DNA were so small that their average size was less than that of
the average gene.
Most of the consortium's BAC's were unfinished, each one consisting of up to
80 different pieces of unknown orientation and of unknown order within each BAC
-- hardly worthy of being called a sequence, as Dr. Venter liked to observe of
his competitors' efforts.
Dr. Haussler believed there was enough information, some generated
inadvertently by the consortium and some available from other sources, to
orient and align the BAC fragments correctly.
He immediately started writing an
assembly program. He persuaded the university's chancellor, Dr. Marcia Greenwood, that
there was a chance to make a historic contribution and that she needed to
advance him $250,000 to buy 100 computers with Pentium III processors. It was a
previous chancellor of U.C. Santa Cruz, Dr. Robert L. Sinsheimer, who in 1985
held the first meeting to explore the idea of sequencing the human genome.
But progress was slow. In May 2000, when Mr. Kent sent an e-mail message to his colleague to ask how the
assembly program was going, Dr. Haussler said he had to reply,
"Jim, it's looking grim."
Mr. Kent sent an e-mail message back that said he felt he could write an
assembly program using a simpler strategy.
"I sent back a one word e-mail, 'Godspeed,'
" Dr. Haussler said.
Mr. Kent, who turned 41 last Saturday, is a graduate student in his second
career. In his first, which lasted more than 10 years, he ran a computer
animation programming business.
Then he decided to go back to school and become a biologist. He started
analyzing the DNA of C. elegans, the laboratory roundworm. When Dr. Haussler
accepted the human genome assignment, Mr. Kent started adapting his worm
programs to human DNA.
Mr.
Kent said he offered to write a human genome
assembly program for the consortium because of his concern that the genome would be
locked up by commercial patents if an assembled sequence was not made publicly
available for all scientists to work on.
"The U.S. Patent Office is, in my mind, very irresponsible in letting people
patent a discovery rather than an invention," he said.
"It's very upsetting. So we wanted to get a public set of genes out as soon as
possible."
After receiving his supervisor's
"Godspeed" message, Mr. Kent started work on May 22. By June 22 he had written the
GigAssembler together with subsidiary programs, in 10,000 lines of code, and
had completed his first
assembly of the human genome.
"The two hard things, one is the repeat structure," he said, referring to the numerous very similar sequences of DNA
letters in the human genome,
"and then for me, something that makes the engineering very challenging is that
you read DNA in two directions, so you are merging a lot of code that has eight
directions."
The program must decide which of DNA's two strands each of the fragments
belongs to because the sequences of letters in each strand run in opposite
directions.
Mr. Kent then started work on another program, called the U.C.S.C. browser,
which allows the assembled DNA sequence to be viewed in terms of its structural
features, like genes and chromosomes. The browser allows one to search 21
tracks of information, each aligned with the underlying DNA sequence. The extra
information helps to identify genes.
Although the consortium had deposited its BAC data in GenBank, a public DNA
database, the U.C.S.C. browser gave biologists their first opportunity to view
the assembled human genome unless they were
subscribers to Celera's database. On July 7 the browser was posted on the World
Wide Web. There was an overwhelming response. On that day the U.C.S.C. servers
put out half a terabyte of information, Dr. Haussler said, and the browser now
gets 20,000 hits a day from all over the world.
"Nothing crashed, we just kept putting it out," he said.
"People wrote back it was fantastic we had put it out with no intellectual
property requirements, a gift with no strings attached."
Mr. Kent has designed the browser so that other biologists can add tracks of
data to it, increasing the wealth of annotation.
"It's been a wonderful stone soup, where other people have contributed bits," he said.
The consortium now lists the U.C.S.C. browser as the principal source for
viewing its human genome sequence.
The
consortium's three major centers, with their large staffs of computational
biologists, did not write an
assembly program of their own because they had not needed one in their pilot project,
on the worm genome, and perhaps did not realize how useful such a program would
be. The National Center for Biotechnology Information completed an
assembly program only recently.
"It's easy for Venter to say of course you should have had this all planned out," Dr. Haussler said.
"But he had the ability to organize an industrial-scale effort. Given the
culture of the public project, it would have been very difficult to graft on an
autocratic new software project, and if you had assigned a team to build this
whole thing over several months it would have been a very difficult
proposition. So what Jim has done is miraculous in many ways. No one expected
anyone could come in and put this
together in four weeks."
An even more telling accolade comes from Dr. Venter, who was astonished that
the consortium managed to put the human genome sequence together, given what he
regarded as the poor quality of its data.
"They used every piece of information available," he said.
"It was really quite clever, given the quality of their data. So honestly, we
are impressed. We were truly amazed because we predicted, based on their raw
data, that it would be nonassemblable. So what Haussler did was he came in and
saved them. Haussler put it all together."
http://www.nytimes.com
GRAPHIC: Photos: James Kent, a graduate student at the University of California at
Santa Cruz, took only four weeks to write a computer program that helped
assemble the human genome. (Peter DaSilva for The New York Times)(pg. F1); Dr.
David Haussler with computers used in the genome project. (Peter DaSilva for
The New York Times)(pg. F4)
LOAD-DATE: February 13, 2001