Determining the Significance of Binding Events
Since JBD produces posterior probabilities rather than p-values, you need
some other way to obtain traditional p-values or false-positive rates.
There are two approaches. In both cases, you'll probably want to start by
filtering the output of jbd2bindingevents.pl based on the posterior
probability and the "size" of each event.
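As a starting point, here is a minimal sketch of such a filter. The column layout (region, position, probability, size, tab-delimited) is an assumption for illustration; check your actual jbd2bindingevents.pl output for the real column order, and the thresholds are placeholders.

```python
import csv

def filter_events(path, min_prob=0.9, min_size=1.0):
    """Yield binding events whose posterior probability and size both
    meet the given thresholds.

    ASSUMPTION: a tab-delimited file with one event per line and
    columns region, position, probability, size.  Adjust the indices
    below to match your jbd2bindingevents.pl output.
    """
    with open(path) as handle:
        for row in csv.reader(handle, delimiter="\t"):
            region = row[0]
            position = int(row[1])
            prob = float(row[2])
            size = float(row[3])
            if prob >= min_prob and size >= min_size:
                yield region, position, prob, size
```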
Use a set of known non-bound regions
If you have a set of genes or regions in which you know there should be
no binding events, you can count the binding events called in those
regions at some threshold and divide by the total number of calls at
that threshold to estimate a false-positive rate. Determining a p-value
for each binding event is harder because there are two axes on which you
can score each event (probability of binding and size). We've generally
used the size alone to compute p-values.
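The size-based approach above can be sketched as an empirical p-value computation: for each observed event, the p-value is the fraction of events in the known non-bound regions whose size is at least as large. This is a minimal sketch, not the lab's actual implementation; the add-one smoothing is a common convention to avoid p-values of exactly zero.

```python
from bisect import bisect_left

def empirical_p_values(observed_sizes, null_sizes):
    """Empirical p-value for each observed event: the fraction of
    events from known non-bound (null) regions with size >= the
    observed size.  Scores on the size axis alone."""
    null_sorted = sorted(null_sizes)
    n = len(null_sorted)
    # (# null sizes >= s + 1) / (n + 1): add-one smoothing so no
    # event gets a p-value of exactly zero
    return [(n - bisect_left(null_sorted, s) + 1) / (n + 1)
            for s in observed_sizes]

def false_positive_rate(null_sizes, threshold):
    """Fraction of null-region events that pass a size threshold."""
    return sum(s >= threshold for s in null_sizes) / len(null_sizes)
```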
Use randomized data
If you don't have a large enough set of regions that are known to
lack binding, you can run JBD on a randomized dataset. There are a few
things to consider.
- Randomize by probe or randomize everything: if you randomize
everything, then you break the correlation that you expect to see
between replicates. This will tend to yield fewer binding calls on the
randomized set, but it isn't a very fair comparison. A better technique
is to scramble the locations of the probes, keeping the observations for
each replicate together.
- Two axes: the same problem arises of having two axes (probability and
size) along which to compare discrete binding events.
- Randomize each dataset: it's probably not a good comparison to use
the randomized version of dataset A to get p-values for dataset B. Use
a randomized version of B instead.
- Frequent binding: in datasets with lots of binding, the randomized
data can produce stronger signals than you'd expect, which makes the
resulting p-values less reliable. With few binding events, the
randomization will split up the probes that are enriched surrounding
a binding event. With a lot of binding, though, it's more likely that
an enriched probe will end up next to another enriched probe in the
scrambled version, leading JBD to detect a strong binding signal. This
is a common pitfall of binding analyses based on a fixed idea of how much
binding there should be in a dataset.
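The distinction drawn in the first bullet, scrambling probe locations while keeping replicate observations together versus shuffling every observation independently, can be sketched as follows. The data layout (one tuple of replicate intensities per probe, in genomic order) is an assumption for illustration.

```python
import random

def scramble_by_probe(probe_rows, seed=0):
    """Fair randomization: permute whole probes, so each probe's
    replicate observations stay together and the cross-replicate
    correlation is preserved."""
    rng = random.Random(seed)
    rows = list(probe_rows)
    rng.shuffle(rows)
    return rows

def scramble_everything(probe_rows, seed=0):
    """Naive randomization: shuffle every observation independently,
    destroying the correlation between replicates (tends to yield
    too few binding calls on the randomized set)."""
    rng = random.Random(seed)
    flat = [v for row in probe_rows for v in row]
    rng.shuffle(flat)
    k = len(probe_rows[0]) if probe_rows else 0
    return [tuple(flat[i:i + k]) for i in range(0, len(flat), k)]
```

Either way, the scrambled dataset is then fed back through JBD to build the null distribution, and, per the third bullet, each dataset should be compared against its own scrambled version.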