JBD's output files

Each line of a JBD output file has five tab-delimited columns:
  1. position
  2. posterior probability of binding (pi_i)
  3. variance of the posterior probability of binding
  4. strength of binding (s_i)
  5. variance of the strength
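
As a minimal sketch of reading this format, the Python below parses each line into a named record. The names JbdRow and parse_jbd are hypothetical and not part of JBD. The code splits on any whitespace so that it accepts tabs as well as files whose tabs were converted to spaces.

```python
from typing import Iterable, Iterator, NamedTuple


class JbdRow(NamedTuple):
    """One line of JBD output (hypothetical container, not part of JBD)."""
    position: int
    posterior: float       # posterior probability of binding (pi_i)
    posterior_var: float   # variance of the posterior probability
    strength: float        # strength of binding (s_i)
    strength_var: float    # variance of the strength


def parse_jbd(lines: Iterable[str]) -> Iterator[JbdRow]:
    """Yield one JbdRow per non-blank line of JBD output."""
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()  # splits on tabs or runs of spaces
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(
                f"line {line_number}: expected 5 columns, got {len(fields)}"
            )
        yield JbdRow(int(fields[0]), *map(float, fields[1:]))
```

To use it on a file, pass an open file handle, for example `with open("sc_gcn4.1.1.jbd") as handle: rows = list(parse_jbd(handle))`.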

For example, in a mammalian genome we saw this region with no binding:

1182662 0.014   0.007   0.518   0.096
1182692 0.013   0.007   0.506   0.090
1182722 0.013   0.007   0.524   0.091
1182752 0.013   0.007   0.552   0.093
1182782 0.013   0.007   0.566   0.092
1182812 0.013   0.007   0.553   0.086
1182842 0.013   0.007   0.496   0.080
1182872 0.013   0.007   0.453   0.073
1182902 0.013   0.007   0.435   0.071
1182932 0.013   0.007   0.422   0.069

and this region with binding:

37688946        0.016   0.009   1.411   0.376
37688976        0.016   0.009   1.916   0.568
37689006        0.015   0.008   2.707   0.820
37689036        0.015   0.008   3.700   1.007
37689066        0.029   0.021   4.949   1.222
37689096        0.209   0.155   6.103   1.535
37689126        0.540   0.232   6.909   1.783
37689156        0.626   0.216   7.629   2.156
37689186        0.587   0.225   8.380   2.781
37689216        0.541   0.232   9.194   3.706
37689246        0.467   0.234   9.709   4.906
37689276        0.346   0.213   9.324   5.955
37689306        0.151   0.119   7.400   5.789
37689336        0.094   0.077   5.198   4.459
37689366        0.061   0.050   3.542   3.510
37689396        0.043   0.034   2.685   2.415
37689426        0.033   0.025   2.176   1.583

In general, you should ignore the strength output unless the posterior probability of binding is above background. A cutoff of 0.1 is safe in most cases, though in many cases you can go as low as 0.05. Wide areas of elevated posterior indicate either multiple binding events or uncertainty about the exact binding location.
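
To illustrate the cutoff, here is a rough Python sketch that finds runs of consecutive lines whose posterior is at or above a cutoff. The function name elevated_regions is hypothetical, and the sketch treats adjacent lines as adjacent positions without checking the spacing between them. It is only an illustration of applying the cutoff, not a description of how any JBD script works. The sample values are taken from the bound region shown above.

```python
from typing import Iterable, List, Tuple


def elevated_regions(
    rows: Iterable[Tuple[int, float]], cutoff: float = 0.1
) -> List[Tuple[int, int, float]]:
    """Return (start, end, max_posterior) for each run of rows at or above cutoff.

    rows holds (position, posterior) pairs in file order.
    """
    regions = []
    current = None  # [start, end, max_posterior] of the run being built
    for position, posterior in rows:
        if posterior >= cutoff:
            if current is None:
                current = [position, position, posterior]
            else:
                current[1] = position
                current[2] = max(current[2], posterior)
        elif current is not None:
            regions.append(tuple(current))
            current = None
    if current is not None:  # close a run that reaches the last row
        regions.append(tuple(current))
    return regions


# (position, posterior) pairs copied from the bound region above
example = [
    (37689066, 0.029), (37689096, 0.209), (37689126, 0.540),
    (37689156, 0.626), (37689186, 0.587), (37689216, 0.541),
    (37689246, 0.467), (37689276, 0.346), (37689306, 0.151),
    (37689336, 0.094),
]
print(elevated_regions(example))  # [(37689096, 37689306, 0.626)]
```

With the default cutoff of 0.1 the example yields a single region roughly 200 bases wide; lowering the cutoff to 0.05 extends it by one more line, to 37689336.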

To convert JBD's output to a discrete set of binding events, you can run jbd2bindingevents.pl on an output file:

jbd2bindingevents.pl < sc_gcn4.1.1.jbd > sc_gcn4.1.1.events

This produces three tab-delimited columns:

  1. position
  2. maximum posterior probability in the bound region
  3. "size": The sum of b_i * s_i over the bound region

We threshold on both the maximum posterior probability and the size. The maximum posterior probability roughly corresponds to confidence: how well the data fit the model of binding at the best position. The size roughly corresponds to the enrichment ratio: how strong the binding is here.
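
A two-threshold filter over the events file might look like the Python sketch below. The function name filter_events and its parameters are hypothetical, and no threshold values are built in because appropriate values depend on your data. The position column is kept as text, since this sketch makes no assumption about its exact format.

```python
from typing import Iterable, List, Tuple


def filter_events(
    lines: Iterable[str], min_posterior: float, min_size: float
) -> List[Tuple[str, float, float]]:
    """Keep events whose maximum posterior and size both meet their thresholds.

    Each input line has three columns: position, maximum posterior, size.
    """
    kept = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()  # splits on tabs or runs of spaces
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 columns, got {len(fields)}"
            )
        position = fields[0]
        max_posterior = float(fields[1])
        size = float(fields[2])
        if max_posterior >= min_posterior and size >= min_size:
            kept.append((position, max_posterior, size))
    return kept
```

To apply it to the file produced above, open sc_gcn4.1.1.events and pass the handle as lines, together with thresholds chosen for your experiment.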