jbd2bindingevents.pl processes JBD output to give the locations of
discrete binding events. This is useful when JBD's output is uncertain:
the posterior probability of binding may be high at many nearby
locations because JBD can't identify a single binding position that
explains the data.
You'll need to experiment with thresholds to decide which events from JBD's output to use. In particular, many people want p-values or some equivalent. You can obtain these in several ways, including negative examples (a set of genes known not to be bound in some existing dataset) and randomized data (generating a dataset by scrambling the probes and then running JBD on the result).
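As a rough illustration of the randomized-data approach, the sketch below turns a set of null scores into an empirical p-value. It is not part of JBD or jbd2bindingevents.pl: the function name `empirical_pvalue`, the choice of event size as the score, and the null values themselves are all made up for illustration, and the add-one correction is just a common convention.

```python
from bisect import bisect_left

def empirical_pvalue(score, null_scores):
    """Fraction of null scores at least as large as `score`.

    Adds one to numerator and denominator so the estimate is never
    exactly zero (a common convention, assumed here).
    """
    ordered = sorted(null_scores)
    n_at_least = len(ordered) - bisect_left(ordered, score)
    return (n_at_least + 1) / (len(ordered) + 1)

# Hypothetical event sizes obtained by running the pipeline on scrambled probes.
null_sizes = [0.5, 1.2, 2.1, 0.8, 3.0, 1.7, 0.9, 2.6, 1.1]

# An observed size larger than every null size gets the smallest possible value.
print(empirical_pvalue(9.15705, null_sizes))
```

The smallest p-value you can report this way is 1 / (number of null scores + 1), so you would probably want many more randomized events than the nine shown here.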
jbd2bindingevents.pl accepts input on STDIN and generates output on
STDOUT:
jbd2bindingevents.pl < sc_gcn4.1.1.jbd
produces this output when JBD was run with the default parameters:
90 0.950 9.15705
68494 0.381 3.625875
115645 0.395 4.157376
141798 0.506 2.486367
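Judging by the values and the description below, the columns here are,
in order, the position of the event (the weighted center of the
region), the maximum posterior probability, and the size.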
jbd2bindingevents.pl first identifies regions of elevated posterior
probability. You can set this background level with the
posteriorbackground command-line argument; the default is 0.1.
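Within each contiguous region of elevated posterior probability,
jbd2bindingevents.pl computes the maximum posterior probability and the
sum of b_i * s_i (the "size"), where b_i is the posterior probability of
binding at position i and s_i is the binding strength there. It also
computes the weighted center of the region, where the weight of each
position is b_i * s_i.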
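The region computation described above can be sketched in a few lines. This is only an illustration of the idea, not the script's actual code: the function name `call_events`, the input as (position, b, s) tuples, and the strict "greater than background" test are all assumptions.

```python
def call_events(sites, background=0.1):
    """Group consecutive sites with posterior above `background` into events.

    `sites` is a list of (position, b, s) tuples sorted by position, where
    b is the posterior probability of binding and s the binding strength.
    Returns a list of (weighted_center, max_posterior, size) tuples.
    """
    events = []
    region = []

    def flush():
        # Summarize the current region, if any, and start a new one.
        if not region:
            return
        size = sum(b * s for _, b, s in region)
        max_post = max(b for _, b, _ in region)
        if size > 0:
            center = sum(p * b * s for p, b, s in region) / size
        else:
            center = region[0][0]  # fallback when every weight is zero
        events.append((center, max_post, size))
        region.clear()

    for pos, b, s in sites:
        if b > background:
            region.append((pos, b, s))
        else:
            flush()
    flush()
    return events

# Made-up input: two elevated regions separated by a low-posterior site.
sites = [(100, 0.05, 1.0), (110, 0.3, 2.0), (120, 0.6, 4.0),
         (130, 0.2, 1.0), (140, 0.02, 0.5), (150, 0.5, 2.0)]
events = call_events(sites)
for center, max_post, size in events:
    print(round(center, 2), max_post, round(size, 2))
```

In this toy input the first event spans positions 110 to 130, and its center is pulled toward 120 because that site carries most of the b_i * s_i weight.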
The maximum posterior probability in a region gives an indication of
JBD's confidence that there is binding in the region. The size of an
event combines the posterior probability and the binding strength, so
it reflects both JBD's confidence that there is binding in the region
and the IP ratios associated with the binding event. You can have
jbd2bindingevents.pl filter its output by setting posteriorthresh
and/or sizethresh on the command line. The default values are 0.2 and
2, which seem to be reasonable for most datasets.
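If you want to try stricter thresholds without rerunning the script, you can filter its output after the fact. The sketch below assumes three whitespace-separated columns per line (position, maximum posterior probability, size), which is inferred from the sample output rather than documented, and it assumes an event must pass both thresholds. The function name `filter_events` is made up.

```python
def filter_events(lines, posterior_thresh=0.2, size_thresh=2.0):
    """Keep events whose posterior and size both meet the thresholds.

    Each line is assumed to hold: position, max posterior, size.
    Lines that do not have exactly three fields are skipped.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue
        position = int(fields[0])
        posterior = float(fields[1])
        size = float(fields[2])
        if posterior >= posterior_thresh and size >= size_thresh:
            kept.append((position, posterior, size))
    return kept

# The sample events shown earlier, one per line.
rows = [
    "90 0.950 9.15705",
    "68494 0.381 3.625875",
    "115645 0.395 4.157376",
    "141798 0.506 2.486367",
]

strict = filter_events(rows, posterior_thresh=0.39, size_thresh=4.0)
print([position for position, _, _ in strict])  # [90, 115645]
```

With the default thresholds all four sample events pass, which is what you would expect given that the script had already filtered them at those values.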