jbd2bindingevents.pl

jbd2bindingevents.pl processes JBD output to give the locations of discrete binding events. This is useful when JBD's output is uncertain: the posterior probability of binding may be high at many nearby locations because JBD can't identify a single binding position that explains the data.

You'll need to experiment with thresholds to decide which parts of JBD's output to use. In particular, many people want p-values or some equivalent. You can obtain these in several ways, including negative examples (a set of genes known not to be bound in some existing dataset) and randomized data (generating a dataset by scrambling the probes and then running JBD on the result).
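One common way to turn randomized data into something like a p-value is an empirical estimate: compare an observed event's size against the sizes of events called on the scrambled dataset. The sketch below is an illustration under assumptions, not part of jbd2bindingevents.pl or JBD. The function name `empirical_p_value` and the idea of collecting null sizes into a list are hypothetical, and the add-one correction is a standard choice to avoid reporting a p-value of exactly zero.

```python
from typing import Sequence


def empirical_p_value(observed_size: float, null_sizes: Sequence[float]) -> float:
    """Estimate how often randomized data yields an event at least this large.

    null_sizes is assumed to hold event sizes obtained by running JBD and
    jbd2bindingevents.pl on a probe-scrambled dataset (hypothetical input).
    """
    # Count null events that are at least as large as the observed one.
    at_least_as_large = sum(1 for size in null_sizes if size >= observed_size)
    # Add-one correction: the estimate can never be exactly zero.
    return (at_least_as_large + 1) / (len(null_sizes) + 1)


# Made-up null sizes, for illustration only.
null_sizes = [1.0, 2.0, 3.0, 4.0]
p_value = empirical_p_value(3.5, null_sizes)
```

The same comparison could be made on the maximum posterior probability instead of the size. Which statistic is more informative depends on the dataset.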

jbd2bindingevents.pl accepts input on STDIN and generates output on STDOUT:

jbd2bindingevents.pl < sc_gcn4.1.1.jbd

produces this output when JBD was run with the default parameters:

90      0.950   9.15705
68494   0.381   3.625875
115645  0.395   4.157376
141798  0.506   2.486367

The columns here are:

position of the binding event
maximum posterior probability of binding associated with this binding event
"Size" of the binding event: the sum of the binding probability times the binding strength (b_i * s_i) across the bound region. Higher scores are associated with higher ratios and higher-confidence binding.
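For downstream analysis, the three columns can be read into a small record type. This is a minimal sketch that assumes the output is whitespace-separated with exactly three fields per line and integer positions, as in the sample above. The names `BindingEvent` and `parse_events` are illustrative and are not part of jbd2bindingevents.pl.

```python
from typing import List, NamedTuple


class BindingEvent(NamedTuple):
    position: int         # position of the binding event
    max_posterior: float  # maximum posterior probability of binding
    size: float           # sum of b_i * s_i across the bound region


def parse_events(text: str) -> List[BindingEvent]:
    """Parse jbd2bindingevents.pl output (assumed three whitespace-separated columns)."""
    events = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        events.append(BindingEvent(int(fields[0]), float(fields[1]), float(fields[2])))
    return events


# The sample output shown above.
sample = """\
90      0.950   9.15705
68494   0.381   3.625875
115645  0.395   4.157376
141798  0.506   2.486367
"""
events = parse_events(sample)
```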

jbd2bindingevents.pl first identifies regions of elevated posterior probability, meaning regions where the posterior probability exceeds a background level. You can set this background level with the posteriorbackground command-line argument; the default is 0.1.

Within each contiguous region of elevated posterior probability, jbd2bindingevents.pl computes the maximum posterior probability and the sum of b_i * s_i (the "size"). It also computes the weighted center of the region, where the weight at each position is b_i * s_i.
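The region-finding and summary steps described above can be sketched as follows. This illustrates the described logic and is not the script's actual code. It assumes the JBD output has already been parsed into (position, b_i, s_i) tuples sorted by position, and that "elevated" means strictly greater than the background level. The function names `call_events` and `summarize_region` are hypothetical.

```python
from typing import List, Tuple

Probe = Tuple[int, float, float]    # (position, b_i, s_i); assumed input layout
Event = Tuple[float, float, float]  # (weighted center, max posterior, size)


def summarize_region(region: List[Probe]) -> Event:
    """Compute the weighted center, maximum posterior, and size of one region."""
    max_posterior = max(b for _, b, _ in region)
    size = sum(b * s for _, b, s in region)
    if size > 0:
        # Center of the region, weighting each position by b_i * s_i.
        center = sum(pos * b * s for pos, b, s in region) / size
    else:
        # Degenerate case (all weights zero): fall back to the first position.
        center = float(region[0][0])
    return (center, max_posterior, size)


def call_events(probes: List[Probe], background: float = 0.1) -> List[Event]:
    """Group consecutive positions with posterior above background into events."""
    events = []
    region: List[Probe] = []
    for pos, b, s in probes:
        if b > background:
            region.append((pos, b, s))
        elif region:
            # Posterior dropped back to background, so close the current region.
            events.append(summarize_region(region))
            region = []
    if region:
        events.append(summarize_region(region))
    return events


# Made-up values, for illustration only.
probes = [(100, 0.05, 1.0), (110, 0.5, 2.0), (120, 0.9, 4.0), (130, 0.02, 1.0), (200, 0.3, 1.0)]
events = call_events(probes)
```

In this made-up example, the positions 110 and 120 form one region and position 200 forms another. The first center falls closer to 120 because that position carries the larger weight.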

The maximum posterior probability in a region indicates JBD's confidence that there is binding in the region. The size of an event combines the posterior probability and the binding strength, so it reflects both JBD's confidence that there is binding in the region and the IP ratios associated with the binding event. You can have jbd2bindingevents.pl filter its output by setting posteriorthresh and/or sizethresh on the command line. The default values are 0.2 and 2, respectively, which seem to be reasonable for most datasets.
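When experimenting with thresholds, it can be convenient to re-filter existing output without rerunning the script. The sketch below assumes an event is kept only if it meets both thresholds and that the comparison is inclusive. Neither detail is confirmed here, so check the script's behavior before relying on it. The function name `filter_events` is hypothetical.

```python
from typing import List, Tuple

Event = Tuple[int, float, float]  # (position, max posterior, size)


def filter_events(events: List[Event],
                  posterior_thresh: float = 0.2,
                  size_thresh: float = 2.0) -> List[Event]:
    """Keep events whose maximum posterior and size both reach the thresholds.

    Requiring both conditions (rather than either) is an assumption.
    """
    return [
        (pos, max_posterior, size)
        for pos, max_posterior, size in events
        if max_posterior >= posterior_thresh and size >= size_thresh
    ]


# The four events from the sample output above.
events = [
    (90, 0.950, 9.15705),
    (68494, 0.381, 3.625875),
    (115645, 0.395, 4.157376),
    (141798, 0.506, 2.486367),
]
default_kept = filter_events(events)
strict_kept = filter_events(events, posterior_thresh=0.39, size_thresh=3.0)
```

With the default values, all four sample events are kept. With the stricter illustrative thresholds, only the events at 90 and 115645 remain.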