Loading Lexical Data into MySQL Database
Many clinical natural language tools use the NLM's UMLS Lexical
Tools to perform at least part of their processing. For example, in
order to be able to use the Metathesaurus index files such as MRXNS_ENG
,
one must transform the source text into the form produced by the norm
program. These lexical variant
generator tools incorporate their own private copy of the
UMLS lexical databases. However, it is also useful to be able to use
these same data in one's own custom programs. I have built a load
script that will import the lexical tables whose structure I have
been able to figure out into a MySQL database. In my case, I use the
same local database that is loaded with the Metathesaurus and the
Semantic Network, both of which come with load scripts as they are
distributed/generated.
The load script is linked below as a file suitable for sourcing in
MySQL. It was developed and tested (on a Mac running OS X 10.6.8)
with the 2011AB distribution of the UMLS data, and to use it, you
will need to follow these steps:
- Download the UMLS Knowledge Sources. These may be found at http://www.nlm.nih.gov/research/umls/
by following the "Downloads" link and selecting UMLS.
There are eight files to download:
- 2011AB.CHK
- 2011AB.MD5
- 2011AB-1-meta.nlm
- 2011AB-2-meta.nlm
- 2011AB-otherks.nlm
- mmsys.zip
- Copyright_Notice.txt
- README.txt
- Unzip the file mmsys.zip. This sets up the Metamorphosys
program. I have found that the only fuss-free way to run it is
to move the other seven files into the folder
MMSYS
that was created by unzipping. When Metamorphosys starts
up, point its inputs to that folder and select a new output
destination, which I will assume is .../UMLS
.
Metamorphosys creates three subfolders there, named LEX, META,
and NET.
- I configured Metamorphosys to create the lexical tools,
generate load scripts for MySQL, and select the English-only
terms from its knowledge sources.
- I
cd
to the META folder, start mysql
with privileges that allow me to create new databases, create a
database for this new installation of UMLS, and grant privileges
for read-only access to a user name/password that I can share
with my colleagues if they have UMLS credentials. I invoke:
create database 2011ab default character set utf8;
grant select on 2011ab.* to 'xxx'@'%' identified by
'yyy'
;
use 2011ab;
Note: it is important to invoke mysql from the directory that
holds the sources to be imported because the load scripts
produced by Metamorphosys assume that; hence all the changes of
directory below. Substitute appropriate user name and password
for xxx and yyy.
- I invoke the following within
mysql
source mysql_tables.sql
source mysql_indexes.sql
Note: These will take hours to execute, even on fast machines.
- I exit
mysql
, cd
to the NET
folder.
- For some reason, Metamorphosys zips up the load scripts, so I
have to unzip LoadScripts.zip.
- I start mysql again, with write privileges, selecting the
2011ab database.
- In MySQL, invoke:
source LoadScripts/mysql_net_tables.sql
- I exit
mysql
again, and cd
to the
LEX directory.
- Download the file mysql_lex_tables.sql
and place it into this directory.
- I start mysql as before, and invoke:
source mysql_lex_tables.sql
For caveats on assumptions and changes I have made to the
lexical table structures compared with the UMLS
documentation, see the comments at the beginning of that
file.
Future changes to the UMLS distribution may necessitate alterations
to these instructions, but for now they have worked on both Mac/Snow
Leopard and Debian Linux. Please let
me know if you have any difficulties.