Determining the Significance of Binding Events

Since JBD produces posterior probabilities rather than p-values, you need another way to obtain traditional p-values or false-positive rates. There are two approaches. In both cases, you'll probably want to approach the problem by filtering the JBD output based on the posterior probability and the "size" of each binding event.
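
As a rough illustration of that filtering step, here is a minimal Python sketch. The BindingEvent record, its field names, and the threshold values are all hypothetical; they are not JBD's actual output format, so adapt them to however you parse your own results.

```python
from typing import NamedTuple


class BindingEvent(NamedTuple):
    """Hypothetical record for one binding call (not JBD's real format)."""
    chrom: str
    position: int
    posterior: float  # posterior probability of binding
    size: float       # estimated strength ("size") of the event


def filter_events(events, min_posterior, min_size):
    """Keep only events that pass both the probability and size cutoffs."""
    return [
        e for e in events
        if e.posterior >= min_posterior and e.size >= min_size
    ]


# Made-up example calls, for illustration only.
events = [
    BindingEvent("chr1", 1000, 0.95, 3.2),
    BindingEvent("chr1", 5000, 0.40, 4.0),  # fails the probability cutoff
    BindingEvent("chr2", 2500, 0.99, 0.8),  # fails the size cutoff
]
kept = filter_events(events, min_posterior=0.9, min_size=1.0)
```

The same function can then be applied at a range of thresholds to the real calls and to whichever null set you choose below.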

Use a set of known non-bound regions

If you have a set of genes or regions in which you know there should be no binding events, you can simply determine the number of binding events in those regions at some threshold and divide to get a false positive rate at that threshold. Determining a p-value for each binding event is a little bit harder because there are two axes on which you can score each event (probability of binding and size). We've generally used the size alone to compute p-values.
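
The sketch below shows one way this could look in Python. It rests on two assumptions that are not stated above: the false positive rate is taken to be the fraction of known non-bound regions that contain at least one call at the chosen threshold, and the p-value for an event is an empirical one, computed from the sizes of the calls found in the non-bound regions with the usual add-one correction. The function names and the numbers are made up for illustration.

```python
from bisect import bisect_left


def false_positive_rate(calls_per_region):
    """Fraction of known non-bound regions with at least one call.

    calls_per_region[i] is the number of binding events called in
    region i at the chosen threshold (assumed denominator: regions).
    """
    if not calls_per_region:
        raise ValueError("need at least one known non-bound region")
    hits = sum(1 for n in calls_per_region if n > 0)
    return hits / len(calls_per_region)


def empirical_p_value(size, null_sizes_sorted):
    """Empirical p-value for an event of the given size.

    null_sizes_sorted holds the sizes of calls made in the non-bound
    regions, sorted ascending. The add-one correction keeps the
    p-value from ever being exactly zero.
    """
    n = len(null_sizes_sorted)
    n_at_least = n - bisect_left(null_sizes_sorted, size)
    return (n_at_least + 1) / (n + 1)


# Made-up counts for ten non-bound regions at one threshold.
calls_per_region = [0, 0, 1, 0, 2, 0, 0, 0, 0, 0]
fpr = false_positive_rate(calls_per_region)

# Made-up sizes of the calls seen in non-bound regions.
null_sizes = sorted([0.5, 0.7, 1.1, 1.4, 2.0, 2.2, 3.1, 0.9, 1.0])
p = empirical_p_value(2.1, null_sizes)
```

Scoring on size alone sidesteps the two-axis problem; if you prefer to score on posterior probability, the same empirical_p_value function works with the null probabilities in place of the null sizes.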

Use randomized data

If you don't have a large enough set of regions that are known to lack binding, you can run JBD on a randomized dataset. There are a few things to consider.

  1. Randomize by probe or randomize everything: if you randomize everything, then you break the correlation that you expect to see between replicates. This tends to yield fewer binding calls on the randomized set, so it isn't a very fair comparison. A better technique is to scramble the locations of the probes, keeping the observations for each replicate together.
  2. Two axes: the same problem arises here as with known non-bound regions. There are two axes (probability and size) along which you can compare discrete binding events.
  3. Randomize each dataset: it's probably not a good comparison to use the randomized version of dataset A to get p-values for dataset B. Use a randomized version of B instead.
  4. Frequent binding: in datasets with lots of binding, you'll see that the p-values may be lower than you'd expect. With few binding events, the randomization will split up the probes that are enriched surrounding a binding event. With a lot of binding, though, it's more likely that an enriched probe will end up next to another enriched probe in the scrambled version, leading JBD to detect a strong binding signal. This is a common pitfall of binding analyses based on a fixed idea of how much binding there should be in a dataset.
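
The difference between the two randomization schemes in item 1 can be sketched in Python as follows. This is only an illustration under assumed inputs: the data layout (a list of probe positions plus one tuple of replicate observations per probe) and the function names are hypothetical and are not part of JBD.

```python
import random


def scramble_probes(positions, observations, seed=None):
    """Scramble probe locations, keeping each probe's replicates together.

    observations[i] is the tuple of replicate values for probe i.
    Each tuple moves to a new position as a unit, so the correlation
    between replicates is preserved.
    """
    if len(positions) != len(observations):
        raise ValueError("positions and observations must be the same length")
    rng = random.Random(seed)
    shuffled = list(observations)
    rng.shuffle(shuffled)
    return list(zip(positions, shuffled))


def scramble_everything(positions, observations, seed=None):
    """Shuffle each replicate independently (breaks replicate correlation)."""
    if len(positions) != len(observations):
        raise ValueError("positions and observations must be the same length")
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*observations)]
    for col in columns:
        rng.shuffle(col)
    return list(zip(positions, zip(*columns)))


# Made-up data: five probes, two replicates each.
positions = [100, 160, 220, 280, 340]
observations = [(1.0, 1.1), (2.5, 2.4), (4.0, 3.8), (2.2, 2.6), (0.9, 1.0)]

by_probe = scramble_probes(positions, observations, seed=0)
everything = scramble_everything(positions, observations, seed=0)
```

Following item 3, the scrambled set should be built from the same dataset whose p-values you want, and then run through JBD with the same settings as the real data.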