REAPR: Realignment for Prediction of Structural Non-coding RNA

Documentation

Follow the Installation instructions to install a stand-alone version of REAPR and the required software packages Vienna RNA, RNAz, and LocARNA (plus, optionally, squid) in a GNU/Linux or other Unix-like environment. REAPR runs with Python v2.7 or higher (see Usage). Configure the locations of these packages for REAPR with
  python configure.py [-h] [--locarna-prefix LOCARNA_PREFIX]
                      [--rnaz-prefix RNAZ_PREFIX] [--alistat ALISTAT]
                      [--compalignp COMPALIGNP]
with options

-h  --help            Display help message.
    --locarna-prefix  Installation directory of the LocARNA package.
    --rnaz-prefix     Installation directory of the RNAz package.
    --alistat         Path to the alistat command from the squid package.
    --compalignp      Path to the compalignp command.
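If you set up REAPR from a script, the configure.py call can be assembled programmatically. The Python 3 sketch below is not part of REAPR: configure_command is a hypothetical helper that only builds the argument list from the flags shown above, the install paths are placeholders, and it assumes (by analogy with --alistat) that --compalignp takes the path to the compalignp executable. shlex.join requires Python 3.8 or later.

```python
import shlex
from typing import List, Optional


def configure_command(locarna_prefix: str, rnaz_prefix: str,
                      alistat: Optional[str] = None,
                      compalignp: Optional[str] = None) -> List[str]:
    """Return the argument list for REAPR's configure.py (nothing is executed)."""
    cmd = ["python", "configure.py",
           "--locarna-prefix", locarna_prefix,
           "--rnaz-prefix", rnaz_prefix]
    # alistat (from squid) and compalignp are optional extras.
    if alistat is not None:
        cmd += ["--alistat", alistat]
    # Assumption: --compalignp takes a path, like --alistat.
    if compalignp is not None:
        cmd += ["--compalignp", compalignp]
    return cmd


if __name__ == "__main__":
    # Hypothetical install locations; replace them with your own.
    cmd = configure_command("/opt/locarna", "/opt/rnaz",
                            alistat="/opt/squid/bin/alistat")
    print(shlex.join(cmd))
```

To actually run the configuration, pass the list to subprocess.run(cmd, check=True) from the REAPR directory.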

Usage

Formatting the WGA before running REAPR: REAPR takes a whole-genome alignment (WGA), which must be prepared as follows.
(1) The WGA must be in MAF format, with each alignment block in a separate MAF file.
(2) Create a tab-separated file with two columns. The first column lists the names of the WGA blocks, and the second column lists the location of the corresponding MAF alignment file.
(3) Create a file listing all species in the WGA, one per line. The species names must match those used in the MAF alignment files.
(4) Create a file containing a Newick-format tree of all species, without branch lengths. LocARNA uses this tree as the guide for progressive alignment.
Run REAPR by calling
  python REAPR.py [options]
with options
-h  --help            Show this help message and exit.
-a  --alignments      Space-separated list of WGA alignment block files.
-s  --species         Space-separated list of species in the WGA. Species
                      names must be the same as those used in the
                      alignment block files.
-g  --guide-tree      Species guide tree (in Newick format, without branch
                      lengths) for progressive alignment by LocARNA.
-o  --output-folder   Directory to write output files to. (Default: present
                      working directory)
-d  --delta           Space-separated list of realignment deviations.
                      (Default: 20)
-t  --threshold       Stability filter threshold. Windows whose mean MFE
                      z-score is above this threshold are filtered out.
                      (Default: -1)
-p  --processes       Number of cores to use for multiprocessing.
                      (Default: 1)
-r  --ram-disk        Location of a RAM disk for temporary files. Using a
                      RAM disk minimizes random access to disk storage and
                      is highly recommended, because REAPR writes many
                      small files. (Note: this is typically /dev/shm on
                      Ubuntu and other Linux systems.)
    --alistat         Compute sequence identities of the alignments using
                      alistat.
    --compalignp      Compute how much the realignment differs from the
                      original alignment using compalignp.
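The WGA formatting steps (1) to (4) above can be scripted when the WGA starts out as a single multi-block MAF file. The following Python 3 sketch is not part of REAPR: the helper names split_maf and check_guide_tree, the output names blocks.tab, species.txt and block<N>.maf, and the choice to repeat the "##maf version=1" header in every block file are all assumptions. It also assumes UCSC-style source names such as "hg19.chr1", where the species is the part before the first dot. The guide tree cannot be derived from the MAF, so you still have to supply its topology; check_guide_tree only rejects branch lengths and leaf names that do not match the species list, and it assumes the tree has no internal-node labels.

```python
import os
import re
from typing import List, Optional, Tuple


def split_maf(maf_path: str, out_dir: str,
              prefix: str = "block") -> Tuple[str, str]:
    """Split a multi-block MAF file into one MAF file per alignment block.

    Also writes a two-column, tab-separated index (block name, MAF path) and
    a newline-separated species list. Returns (index_path, species_path).
    """
    os.makedirs(out_dir, exist_ok=True)
    blocks: List[List[str]] = []            # each block is a list of lines
    current: Optional[List[str]] = None     # block currently being read
    with open(maf_path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:                  # a blank line ends a block
                if current:
                    blocks.append(current)
                current = None
            elif fields[0] == "a":          # an 'a' line starts a new block
                if current:
                    blocks.append(current)
                current = [line]
            elif current is not None and not line.startswith("#"):
                current.append(line)        # 's', 'i', 'e', 'q' lines
    if current:
        blocks.append(current)

    species: List[str] = []                 # in order of first appearance
    index_path = os.path.join(out_dir, "blocks.tab")
    with open(index_path, "w") as index:
        for number, lines in enumerate(blocks, start=1):
            name = f"{prefix}{number}"
            block_path = os.path.join(out_dir, name + ".maf")
            with open(block_path, "w") as out:
                out.write("##maf version=1\n")
                out.writelines(lines)
                out.write("\n")
            index.write(f"{name}\t{block_path}\n")
            for line in lines:
                fields = line.split()
                if fields[0] == "s":
                    # Assumes UCSC-style "species.chromosome" source names.
                    sp_name = fields[1].split(".")[0]
                    if sp_name not in species:
                        species.append(sp_name)

    species_path = os.path.join(out_dir, "species.txt")
    with open(species_path, "w") as out:
        out.write("\n".join(species) + "\n")
    return index_path, species_path


def check_guide_tree(tree: str, species: List[str]) -> None:
    """Raise ValueError if a Newick guide tree has branch lengths or wrong leaves."""
    if ":" in tree:
        raise ValueError("guide tree must not contain branch lengths")
    # Assumes no internal-node labels, so every name in the tree is a leaf.
    leaves = re.findall(r"[^(),;\s]+", tree)
    if sorted(leaves) != sorted(species):
        raise ValueError("guide-tree leaves do not match the species list")
```

A tree such as ((hg19,panTro2),mm9); passes the check for those three species and can then be written to its own file for --guide-tree.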

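The options above can also be assembled in a script. This Python 3 sketch is illustrative only: reapr_command and passes_stability_filter are hypothetical helpers, the file and species names are placeholders, and -a and -s are passed as space-separated lists exactly as their descriptions state. The threshold is written as --threshold=-1.0 so that the negative value cannot be mistaken for an option flag. passes_stability_filter restates the --threshold rule: a window is kept only if its mean MFE z-score is not above the threshold.

```python
import shlex
from typing import List, Optional, Sequence


def reapr_command(alignments: Sequence[str], species: Sequence[str],
                  guide_tree: str, output_folder: Optional[str] = None,
                  deltas: Sequence[int] = (20,), threshold: float = -1.0,
                  processes: int = 1, ram_disk: Optional[str] = None,
                  alistat: bool = False, compalignp: bool = False) -> List[str]:
    """Assemble (but do not run) a REAPR.py command line."""
    cmd = ["python", "REAPR.py",
           "--alignments", *alignments,
           "--species", *species,
           "--guide-tree", guide_tree,
           "--delta", *(str(d) for d in deltas),
           # The '=' form avoids ambiguity between a negative value and a flag.
           f"--threshold={threshold}",
           "--processes", str(processes)]
    if output_folder is not None:
        cmd += ["--output-folder", output_folder]
    if ram_disk is not None:
        cmd += ["--ram-disk", ram_disk]
    if alistat:
        cmd.append("--alistat")
    if compalignp:
        cmd.append("--compalignp")
    return cmd


def passes_stability_filter(mean_z: float, threshold: float = -1.0) -> bool:
    """Keep a window unless its mean MFE z-score is above the threshold."""
    return mean_z <= threshold


if __name__ == "__main__":
    # Hypothetical inputs; /dev/shm follows the note on --ram-disk.
    cmd = reapr_command(["block1.maf", "block2.maf"], ["hg19", "mm9", "panTro2"],
                        "tree.nwk", output_folder="reapr_out", deltas=[10, 20],
                        processes=4, ram_disk="/dev/shm", compalignp=True)
    print(shlex.join(cmd))
```

Building an argument list rather than a shell string lets subprocess.run(cmd, check=True) pass file names containing spaces without extra quoting.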
Output

REAPR generates the following files in the folder specified with --output-folder:
(1) A 'wga' folder containing the results of an RNAz screen on the original WGA.
(2) A table 'original_wga.tab' summarizing the RNAz screen on the original WGA.
(3) A 'loci' folder containing the results of realigning the loci and running an RNAz screen on the realignments.
(4) For every value given to --delta, a table 'locarna.g..tab' summarizing the RNAz screen on the realignments.
(5) A table 'summary.tab' summarizing the REAPR run. It includes the RNAz score of every locus based on its alignment in the original WGA and after realignment. If --alistat is specified, it also includes the sequence identities (computed by alistat) of the loci in the original WGA and after realignment; if --compalignp is specified, it also includes how much the realignment differs from the original alignment (computed by compalignp).
The first line of every table is a header describing each column.
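Because each table starts with a header line, it can be loaded with Python's standard csv module. This Python 3 sketch assumes the columns are tab-separated (suggested by the .tab extension, not stated explicitly above). read_reapr_table and improved_loci are hypothetical helpers, and improved_loci takes the two score column names as parameters because the exact header of summary.tab is not documented here; read them from the table's first line.

```python
import csv
from typing import Dict, List


def read_reapr_table(path: str) -> List[Dict[str, str]]:
    """Read a tab-separated REAPR table whose first line is the column header."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def improved_loci(rows: List[Dict[str, str]], original_col: str,
                  realigned_col: str) -> List[Dict[str, str]]:
    """Return rows whose value in realigned_col exceeds that in original_col.

    The column names are parameters because this sketch does not assume
    the exact header of summary.tab.
    """
    improved = []
    for row in rows:
        try:
            before = float(row[original_col])
            after = float(row[realigned_col])
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric entries
        if after > before:
            improved.append(row)
    return improved
```

For example, read_reapr_table("reapr_out/summary.tab") (a hypothetical path) returns one dictionary per locus, keyed by the header's column names.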