ES_Sox2_2

Significant Events  : 8263
Insignificant Events: 2065
Filtered Events      : 4

Total positive sequences: 8263

K-mer         Cluster  Offset  Pos Hit  Neg Hit  HGP
---AACAAAGG   0        -4      837      32       -213.6
---AACAATAG   0        -4      625      19       -164.2
---AACAATGG   0        -4      585      20       -150.8
--GAACAAAG    0        -5      595      38       -134.7
----ACAATAGA  0        -3      514      16       -134.3
----ACAAAGGA  0        -3      571      36       -129.7
-AGAACAAT     0        -6      529      24       -129.5
---AACAAAAG   0        -4      588      54       -118.9
--GAACAATA    0        -5      440      15       -113.2
----ACAATGGG  0        -3      437      18       -108.8
--AAACAAAG    0        -5      540      50       -108.8
----ACAATGGC  0        -3      427      16       -108.2
--GAACAATG    0        -5      445      21       -107.9
CAGAACAA      0        -7      435      23       -102.9
----ACAAAGGG  0        -3      421      20       -101.9
CATAACAA      0        -7      382      11       -100.8
-AGAACAAA     0        -6      490      46       -98.1
----ACAAAGGC  0        -3      432      32       -93.5
---GACAAAGG   0        -4      411      26       -93.0
---GACAATGG   0        -4      368      18       -88.6

Explanation:

At the top are the counts of binding events called by GEM (round_1) or GPS (round_2).

The "Total positive sequences" line shows how many of the binding events are used for motif discovery. This number is limited by --k_seqs.

Next are the links to the k-mer set motif (KSM) file, k-mer alignment file, and PWM/PFM file.

Next, the 20 top-ranking k-mers are listed in the table.

  • K-mer: aligned k-mer
  • Cluster: the cluster ID of this k-mer, cluster 0 is the primary motif
  • Offset: the offset of this k-mer relative to the expected binding position (for the top k-mer, it is the 4th base of AACAAAGG). The k-mer position is defined as its leftmost base.
  • Pos Hit: Number of positive sequences containing this k-mer
  • Neg Hit: Number of negative sequences containing this k-mer
  • HGP: log10 of the hypergeometric p-value of this k-mer
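
As an illustration of how these columns can be consumed, here is a minimal Python sketch that parses one table row and converts HGP back to a p-value. The names KmerRow and parse_kmer_row are hypothetical, and the example row assumes the top k-mer reads as cluster 0, offset -4, 837 positive hits, 32 negative hits, HGP -213.6, written as whitespace-separated fields.

```python
from dataclasses import dataclass


@dataclass
class KmerRow:
    kmer: str      # aligned k-mer; leading '-' characters pad the alignment
    cluster: int   # cluster ID, 0 is the primary motif
    offset: int    # offset relative to the expected binding position
    pos_hit: int   # positive sequences containing the k-mer
    neg_hit: int   # negative sequences containing the k-mer
    hgp: float     # log10 of the hypergeometric p-value


def parse_kmer_row(line: str) -> KmerRow:
    """Parse one row, assuming six whitespace-separated fields (an assumption)."""
    kmer, cluster, offset, pos_hit, neg_hit, hgp = line.split()
    return KmerRow(kmer, int(cluster), int(offset),
                   int(pos_hit), int(neg_hit), float(hgp))


row = parse_kmer_row("---AACAAAGG 0 -4 837 32 -213.6")
bare_kmer = row.kmer.strip("-")   # k-mer without alignment padding
p_value = 10.0 ** row.hgp         # HGP is a log10 value, so undo the log
print(bare_kmer, row.offset, row.pos_hit, row.neg_hit, p_value)
```

Because HGP is stored as a log10 value, more negative numbers mean stronger enrichment, and comparing rows by HGP directly avoids working with extremely small floating-point p-values.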


Motif PWM    Motif spatial distribution (w.r.t. primary PWM)
rc
PWM: 7.91/13.02, hit=6483+/888-, hgp=1e-1841.8
rc
PWM: 7.96/12.92, hit=1058+/364-, hgp=1e-85.7
rc
PWM: 7.68/12.65, hit=467+/132-, hgp=1e-46.5
Explanation:

First are the motifs in PWM format, each corresponding to a k-mer set motif (KSM).

Below the motif logo (clicking on rc gives the reverse complement PWM logo) are:

  • Optimal PWM score / Maximum PWM score. The optimal score is the score threshold that gives the most significant p-value (hgp) when scanning the positive and negative sequences.
  • hit: Number of positive sequences containing a match to this PWM / Number of negative sequences containing a match to this PWM
  • hgp: Hypergeometric p-value of this PWM, computed from the pos/neg hit counts given the total numbers of positive and negative sequences
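
As a rough sketch of how an hgp value can be derived from hit counts, the Python snippet below evaluates the upper tail of the hypergeometric distribution in log10 space. The function names and the negative-set size are assumptions for illustration (the report does not state how many negative sequences were scanned), so the result is not expected to reproduce the reported hgp, and GEM's own implementation may differ.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log10_hypergeom_tail(pos_hit: int, neg_hit: int, n_pos: int, n_neg: int) -> float:
    """log10 of P(X >= pos_hit) when pos_hit + neg_hit matching sequences
    are drawn at random from n_pos positive and n_neg negative sequences."""
    total = n_pos + n_neg
    draws = pos_hit + neg_hit
    upper = min(n_pos, draws)
    denom = log_comb(total, draws)
    terms = [log_comb(n_pos, k) + log_comb(n_neg, draws - k) - denom
             for k in range(pos_hit, upper + 1)]
    peak = max(terms)  # log-sum-exp keeps the sum from underflowing
    log_sum = peak + math.log(sum(math.exp(t - peak) for t in terms))
    return log_sum / math.log(10)


# Tiny case that can be checked by hand: C(2,2) * C(2,0) / C(4,2) = 1/6
small = log10_hypergeom_tail(2, 0, 2, 2)

# Primary PWM counts from the report; n_neg = 8263 is an ASSUMED value.
log10_p = log10_hypergeom_tail(6483, 888, 8263, 8263)
print(small, log10_p, 10.0 ** log10_p)
```

Working in log space matters here: a p-value such as 1e-1841.8 is far below the smallest positive double, so converting it with 10.0 ** hgp yields 0.0.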

The motif spatial distribution plots show the position of each motif PWM match relative to the primary motif PWM match when both are present in the same positive sequence.
  • The primary PWM (anchoring motif) is at position 0. The relative position of the PWM of interest is plotted.
  • The color is blue when the two motifs are in the same orientation (as displayed in the motif logos). The color is red when the motifs are in opposite orientations.
  • The number pair gives a position and the count of secondary motif instances at that position. For example, in the second plot, (-7, 150) means that when the primary motif (Sox2) is at position 0, there are 150 instances of the secondary motif (Oct4, reverse complement orientation) at position -7.
  • The total number of co-occurrences of the two motifs is also reported.

Thus the second plot shows 722 co-occurrences of the Sox2 and Oct4 motifs, 150 of which have a motif offset of -7. This is consistent with the fact that Sox2 and Oct4 usually bind as a heterodimer.

Note: For the motif spatial distribution plot, GEM reports all motif instances, and a sequence may contain multiple instances on both strands. The PWM hit, in contrast, counts sequences (at most one hit per sequence). The total count in the motif spatial distribution plot may therefore be larger than the PWM hit count.
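
The difference between counting instances and counting sequences can be sketched in Python as follows. This is only an illustration under stated assumptions: the Hit type, the spatial_distribution function, and the toy coordinates are hypothetical, and the exact position convention GEM uses for a motif match is not specified here.

```python
from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    seq_id: str
    pos: int      # position of the motif match in the sequence
    strand: str   # '+' or '-'


def spatial_distribution(primary, secondary):
    """Count (relative position, same orientation) over every
    primary/secondary instance pair found in the same sequence."""
    by_seq = {}
    for hit in primary:
        by_seq.setdefault(hit.seq_id, []).append(hit)
    counts = Counter()
    for sec in secondary:
        for pri in by_seq.get(sec.seq_id, []):
            # Express the offset in the primary motif's own orientation.
            rel = sec.pos - pri.pos if pri.strand == "+" else pri.pos - sec.pos
            counts[(rel, sec.strand == pri.strand)] += 1
    return counts


# Toy data: seq2 carries two secondary instances, one on each strand.
primary = [Hit("seq1", 100, "+"), Hit("seq2", 50, "+")]
secondary = [Hit("seq1", 93, "-"), Hit("seq2", 43, "-"), Hit("seq2", 60, "+")]

counts = spatial_distribution(primary, secondary)
n_pairs = sum(counts.values())
n_seqs = len({h.seq_id for h in primary} & {h.seq_id for h in secondary})
print(counts, n_pairs, n_seqs)
```

In this toy example, two sequences contain both motifs but three instance pairs are counted, which mirrors why the plot total can exceed the per-sequence PWM hit count.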