Loading Lexical Data into MySQL Database

Many clinical natural language tools use the NLM's UMLS Lexical Tools to perform at least part of their processing. For example, in order to be able to use the Metathesaurus index files such as MRXNS_ENG, one must transform the source text into the form produced by the norm program. These lexical variant generator tools incorporate their own private copy of the UMLS lexical databases. However, it is also useful to be able to use these same data in one's own custom programs. I have built a load script that will import the lexical tables whose structure I have been able to figure out into a MySQL database. In my case, I use the same local database that is loaded with the Metathesaurus and the Semantic Network, both of which come with load scripts as they are distributed/generated.
The load script is linked below as a file suitable for sourcing in MySQL. It was developed and tested (on a Mac running OS X 10.6.8) with the 2011AB distribution of the UMLS data, and to use it, you will need to follow these steps:
  1. Download the UMLS Knowledge Sources. These may be found at http://www.nlm.nih.gov/research/umls/ by following the "Downloads" link and selecting UMLS.  There are eight files to download:
  2. Unzip the file mmsys.zip. This sets up the Metamorphosys program. I have found that the only fuss-free way to run it is to move the other seven files into the folder MMSYS that was created by unzipping.  When Metamorphosys starts up, point its inputs to that folder and select a new output destination, which I will assume is .../UMLS. Metamorphosys creates three subfolders there, named LEX, META, and NET.
  3. I configured Metamorphosys to create the lexical tools, generate load scripts for MySQL, and select the English-only terms from its knowledge sources.
  4. I cd to the META folder, start mysql with privileges that allow me to create new databases, create a database for this new installation of UMLS, and grant privileges for read-only access to a user name/password that I can share with my colleagues if they have UMLS credentials. I invoke: Note: it is important to invoke mysql from the directory that holds the sources to be imported because the load scripts produced by Metamorphosys assume that; hence all the changes of directory below. Substitute appropriate user name and password for xxx and yyy.
  5. I invoke the following within mysql
    Note: These will take hours to execute, even on fast machines.
  6. I exit mysql, cd to the NET folder.
  7. For some reason, Metamorphosys zips up the load scripts, so I have to unzip LoadScripts.zip.
  8. I start mysql again, with write privileges, selecting the 2011ab database.
  9. In MySQL, invoke:
  10. I exit mysql again, and cd to the LEX directory.
  11. Download the file mysql_lex_tables.sql and place it into this directory.
  12. I start mysql as before, and invoke: For caveats on assumptions and changes I have made to the lexical table structures compared with the UMLS documentation, see the comments at the beginning of that file.
Future changes to the UMLS distribution may necessitate alterations to these instructions, but for now they have worked on both Mac/Snow Leopard and Debian Linux.  Please let me know if you have any difficulties.