edu.mit.nlp.segmenter.dp
Class DPDocument

java.lang.Object
  extended by edu.mit.nlp.segmenter.Document
      extended by edu.mit.nlp.segmenter.dp.DPDocument

public class DPDocument
extends Document

Extends Document with methods specific to the dynamic-programming (DP) implementation of Bayesian segmentation.


Field Summary
 boolean m_dcm
           
 boolean m_int_counts
           
 
Fields inherited from class edu.mit.nlp.segmenter.Document
m_words
 
Constructor Summary
DPDocument(double[][] sents, int N, boolean dcm)
           
 
Method Summary
 FastDigamma getDigamma()
           
 FastGamma getGamma()
           
static void main(String[] argv)
          Runs a basic unit test.
protected  void makeCumulCounts()
          Builds up the cumulative counts, a representation that facilitates fast computation later.
 double segDCMGradient(int start, int end, double prior)
          Compute the gradient of the log-likelihood for a segment under the DCM model.
 double segLL(int start, int end, double prior)
          Compute the log-likelihood of a segment.
protected  double segLLDCM(int start, int end, double prior)
          Compute the log-likelihood of a segment under the DCM model.
 double segLLExp(int start, int end, double logprior)
          Compute the log-likelihood of a segment, given the log of the prior.
 double segLLGradientExp(int start, int end, double logprior)
          Compute the gradient of the log-likelihood for a segment under the DCM model, given the log of the prior.
protected  double segLLMAP(int start, int end, double prior)
          Compute the log-likelihood of a segment under the MAP language model.
 double segMAPGradient(int start, int end, double prior)
          Compute the gradient of the log-likelihood for a segment under the MAP language model. Not implemented.
 void setDigamma(FastDigamma fastDigamma)
          If you have multiple documents, you might want to share the cache for the digamma function across all documents.
 void setGamma(FastGamma fastGamma)
          If you have multiple documents, you might want to share the cache for the gamma function across all documents.
 void setPrior(double prior)
           
 
Methods inherited from class edu.mit.nlp.segmenter.Document
D, D2, getSPs, getThetas, N, printDurs, setPDur, T
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_dcm

public boolean m_dcm

m_int_counts

public boolean m_int_counts
Constructor Detail

DPDocument

public DPDocument(double[][] sents,
                  int N,
                  boolean dcm)
Parameters:
sents - the sentences in the document
N - the number of segments. The behavior when the number of segments is unknown is not documented.
dcm - whether to use the DCM distribution (marginalizing over the language models). Setting this to false has not been tested recently.
Method Detail

setGamma

public void setGamma(FastGamma fastGamma)
If you have multiple documents, you might want to share the cache for the gamma function across all documents. This lets you tell it to use a specific FastGamma cache.

Parameters:
fastGamma - the caching FastGamma object
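
Sharing one cache means that values computed while scoring one document are reused by the others. The sketch below illustrates the idea only: `CachedLogGamma` and `SharedCacheDoc` are hypothetical stand-ins written for this example, and the real FastGamma API may look quite different.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical caching log-gamma helper; NOT the real FastGamma class. */
class CachedLogGamma {
    private final Map<Double, Double> cache = new HashMap<>();
    private int misses = 0;

    /** Returns log Gamma(x) for x > 0, computing each distinct x only once. */
    double logGamma(double x) {
        Double hit = cache.get(x);
        if (hit != null) return hit;
        misses++;
        double value = compute(x);
        cache.put(x, value);
        return value;
    }

    /** Number of values that had to be computed rather than looked up. */
    int misses() { return misses; }

    /** Stirling series, after shifting x upward so the series is accurate. */
    private static double compute(double x) {
        double shift = 0.0;
        while (x < 7.0) { shift -= Math.log(x); x += 1.0; }
        double inv = 1.0 / x, inv2 = inv * inv;
        double series = inv * (1.0 / 12 - inv2 * (1.0 / 360 - inv2 / 1260));
        return shift + (x - 0.5) * Math.log(x) - x + 0.5 * Math.log(2 * Math.PI) + series;
    }
}

/** Hypothetical document holder that mirrors the setGamma/getGamma pattern. */
class SharedCacheDoc {
    private CachedLogGamma gamma = new CachedLogGamma(); // private cache by default

    void setGamma(CachedLogGamma g) { this.gamma = g; }
    CachedLogGamma getGamma() { return gamma; }
    double logGammaOf(double x) { return gamma.logGamma(x); }

    public static void main(String[] args) {
        CachedLogGamma shared = new CachedLogGamma();
        SharedCacheDoc a = new SharedCacheDoc();
        SharedCacheDoc b = new SharedCacheDoc();
        a.setGamma(shared);
        b.setGamma(shared);
        a.logGammaOf(5.0);
        b.logGammaOf(5.0); // served from the shared cache
        System.out.println("values computed: " + shared.misses());
    }
}
```

With a shared cache, the second document's lookup of the same argument costs a map access instead of a fresh evaluation.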

getGamma

public FastGamma getGamma()
Returns:
the FastGamma caching gamma implementation used here

setDigamma

public void setDigamma(FastDigamma fastDigamma)
If you have multiple documents, you might want to share the cache for the digamma function across all documents. This lets you tell it to use a specific FastDigamma cache.

Parameters:
fastDigamma - the caching FastDigamma object

getDigamma

public FastDigamma getDigamma()
Returns:
the FastDigamma caching digamma implementation used here

setPrior

public void setPrior(double prior)
Parameters:
prior - the value of the symmetric Dirichlet prior

makeCumulCounts

protected void makeCumulCounts()
Builds up the cumulative counts, a representation that facilitates fast computation later.
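
As an illustration of why cumulative counts make later computation fast, the sketch below stores prefix sums of per-sentence word counts, so the counts for any segment come from one subtraction per word. The class name, the layout of the arrays, and the 0-based inclusive indexing are assumptions made for this example, not the actual DPDocument internals.

```java
/** Hypothetical sketch of cumulative word counts; not the DPDocument source. */
class CumulCountsSketch {
    /**
     * Assumes sents[t][w] is the count of word w in sentence t.
     * Returns cumul where cumul[t][w] is the total count of word w in
     * sentences 0..t-1, so row 0 is all zeros.
     */
    static double[][] build(double[][] sents) {
        int numSents = sents.length;
        int vocab = numSents == 0 ? 0 : sents[0].length;
        double[][] cumul = new double[numSents + 1][vocab];
        for (int t = 0; t < numSents; t++) {
            for (int w = 0; w < vocab; w++) {
                cumul[t + 1][w] = cumul[t][w] + sents[t][w];
            }
        }
        return cumul;
    }

    /** Count of word w in sentences start..end, both inclusive and 0-based. */
    static double segCount(double[][] cumul, int start, int end, int w) {
        return cumul[end + 1][w] - cumul[start][w];
    }

    public static void main(String[] args) {
        double[][] sents = { {1, 0}, {2, 1}, {0, 3} };
        double[][] cumul = build(sents);
        System.out.println(segCount(cumul, 1, 2, 1)); // word 1 in sentences 1..2
    }
}
```

Building the table costs one pass over the document; after that, each segment query is independent of the segment's length.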


segLLDCM

protected double segLLDCM(int start,
                          int end,
                          double prior)
Compute the log-likelihood of a segment under the DCM model.

Parameters:
start - the index of the first sentence in the segment
end - the index of the last sentence in the segment
prior - the symmetric Dirichlet prior to use
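
For orientation, the standard Dirichlet compound multinomial (DCM) log-likelihood of a word sequence with per-word counts n_w, total N, vocabulary size W, and symmetric prior a is log Gamma(W a) - log Gamma(N + W a) + sum over w of [log Gamma(n_w + a) - log Gamma(a)]. The sketch below evaluates that textbook expression on a plain count vector. It is an assumption that this method computes the same quantity; the class name and the `logGamma` helper are hypothetical and stand in for whatever FastGamma provides.

```java
/** Hypothetical sketch of a DCM segment log-likelihood; not the DPDocument source. */
class DcmLogLikelihoodSketch {
    /** log Gamma(x) for x > 0 via a shifted Stirling series. */
    static double logGamma(double x) {
        double shift = 0.0;
        while (x < 7.0) { shift -= Math.log(x); x += 1.0; }
        double inv = 1.0 / x, inv2 = inv * inv;
        double series = inv * (1.0 / 12 - inv2 * (1.0 / 360 - inv2 / 1260));
        return shift + (x - 0.5) * Math.log(x) - x + 0.5 * Math.log(2 * Math.PI) + series;
    }

    /**
     * counts[w] is the count of word w inside the segment; prior is the
     * symmetric Dirichlet parameter (must be positive). The multinomial
     * coefficient is omitted, so this scores the word sequence itself.
     */
    static double segLL(double[] counts, double prior) {
        int vocab = counts.length;
        double total = 0.0;
        double ll = 0.0;
        for (double c : counts) {
            total += c;
            ll += logGamma(c + prior) - logGamma(prior);
        }
        return ll + logGamma(vocab * prior) - logGamma(total + vocab * prior);
    }

    public static void main(String[] args) {
        double[] counts = {3, 0, 1};
        System.out.println(segLL(counts, 0.5));
    }
}
```

As a sanity check, a two-word vocabulary with prior 1 and a single observed word gives probability 1/2, and two observations of the same word give 1/3.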

segLLMAP

protected double segLLMAP(int start,
                          int end,
                          double prior)
Compute the log-likelihood of a segment under the MAP language model. This could be sped up by keeping caches of the log partitions and the log counts.

Parameters:
start - the index of the first sentence in the segment
end - the index of the last sentence in the segment
prior - the symmetric Dirichlet prior to use
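
The sketch below shows one plausible shape for a point-estimate segment score. It assumes the language model is the smoothed estimate theta_w = (n_w + prior) / (N + W * prior); the estimator this class actually uses is not documented here, so treat both the formula and the class name as assumptions. The log of the denominator plays the role of a log partition, which is the kind of quantity that could be cached.

```java
/** Hypothetical sketch of a point-estimate segment log-likelihood. */
class MapLogLikelihoodSketch {
    /**
     * counts[w] is the count of word w inside the segment; prior is the
     * symmetric Dirichlet parameter. ASSUMPTION: the language model is
     * theta_w = (counts[w] + prior) / (total + vocab * prior).
     */
    static double segLL(double[] counts, double prior) {
        int vocab = counts.length;
        double total = 0.0;
        for (double c : counts) total += c;
        // Shared across all words in the segment, hence worth caching.
        double logPartition = Math.log(total + vocab * prior);
        double ll = 0.0;
        for (double c : counts) {
            if (c > 0) {
                ll += c * (Math.log(c + prior) - logPartition);
            }
        }
        return ll;
    }

    public static void main(String[] args) {
        double[] counts = {3, 1};
        System.out.println(segLL(counts, 1.0));
    }
}
```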

segLL

public double segLL(int start,
                    int end,
                    double prior)
Compute the log-likelihood of a segment.

Parameters:
start - the index of the first sentence in the segment
end - the index of the last sentence in the segment
prior - the symmetric Dirichlet prior to use

segLLExp

public double segLLExp(int start,
                       int end,
                       double logprior)
Compute the log-likelihood of a segment, given the log of the prior.

Parameters:
start - the index of the first sentence in the segment
end - the index of the last sentence in the segment
logprior - the log of the symmetric Dirichlet prior to use

segDCMGradient

public double segDCMGradient(int start,
                             int end,
                             double prior)
Compute the gradient of the log-likelihood for a segment under the DCM model.

Parameters:
start - the index of the first sentence in the segment
end - the index of the last sentence in the segment
prior - the symmetric Dirichlet prior to use
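
Differentiating the standard DCM log-likelihood with respect to the symmetric prior a replaces each log-gamma term with a digamma term: W psi(W a) - W psi(N + W a) + sum over w of [psi(n_w + a) - psi(a)]. The sketch below evaluates that expression on a plain count vector. Whether this method returns exactly this quantity is an assumption; the class name and the `digamma` helper are hypothetical and stand in for whatever FastDigamma provides.

```java
/** Hypothetical sketch of the DCM gradient with respect to the prior. */
class DcmGradientSketch {
    /** Digamma psi(x) for x > 0 via recurrence plus an asymptotic series. */
    static double digamma(double x) {
        double shift = 0.0;
        while (x < 7.0) { shift -= 1.0 / x; x += 1.0; }
        double inv2 = 1.0 / (x * x);
        double series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252));
        return shift + Math.log(x) - 0.5 / x - series;
    }

    /**
     * Derivative of the DCM segment log-likelihood with respect to the
     * symmetric prior. counts[w] is the count of word w in the segment.
     */
    static double gradient(double[] counts, double prior) {
        int vocab = counts.length;
        double total = 0.0;
        double grad = 0.0;
        for (double c : counts) {
            total += c;
            grad += digamma(c + prior) - digamma(prior);
        }
        return grad + vocab * (digamma(vocab * prior) - digamma(total + vocab * prior));
    }

    public static void main(String[] args) {
        double[] counts = {3, 0, 1};
        System.out.println(gradient(counts, 0.5));
    }
}
```

For a two-word vocabulary with counts {2, 0}, the likelihood is (a + 1) / (2 (2a + 1)), so the derivative at a = 1 is 1/2 - 2/3 = -1/6, which gives a simple hand check.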

segMAPGradient

public double segMAPGradient(int start,
                             int end,
                             double prior)
Compute the gradient of the log-likelihood for a segment under the MAP language model. Not implemented.

Parameters:
start - the index of the first sentence in the segment
end - the index of the last sentence in the segment
prior - the symmetric Dirichlet prior to use

segLLGradientExp

public double segLLGradientExp(int start,
                               int end,
                               double logprior)
Compute the gradient of the log-likelihood for a segment under the DCM model, given the log of the prior.

Parameters:
start - the index of the first sentence in the segment
end - the index of the last sentence in the segment
logprior - the log of the symmetric Dirichlet prior to use
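
Working with the log of the prior keeps the prior positive during unconstrained optimization. If g(logprior) = f(exp(logprior)), the chain rule gives g'(logprior) = exp(logprior) * f'(exp(logprior)). The sketch below shows that reparameterization for arbitrary functions; it is an assumption that the Exp methods of this class follow the same pattern, and the class and method names are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch of scoring and differentiating in log-prior space. */
class LogPriorReparamSketch {
    /** Wraps f(prior) as g(logprior) = f(exp(logprior)). */
    static DoubleUnaryOperator inLogSpace(DoubleUnaryOperator f) {
        return logprior -> f.applyAsDouble(Math.exp(logprior));
    }

    /** Chain rule: g'(logprior) = exp(logprior) * f'(exp(logprior)). */
    static DoubleUnaryOperator gradientInLogSpace(DoubleUnaryOperator gradF) {
        return logprior -> {
            double prior = Math.exp(logprior);
            return prior * gradF.applyAsDouble(prior);
        };
    }

    public static void main(String[] args) {
        // Toy objective: DCM log-likelihood of counts {2, 0} with two words.
        DoubleUnaryOperator f = a -> Math.log((a + 1) / (2 * (2 * a + 1)));
        DoubleUnaryOperator df = a -> 1 / (a + 1) - 2 / (2 * a + 1);
        DoubleUnaryOperator g = inLogSpace(f);
        DoubleUnaryOperator dg = gradientInLogSpace(df);
        System.out.println(g.applyAsDouble(0.0) + " " + dg.applyAsDouble(0.0));
    }
}
```

An optimizer can then step freely on logprior, since any real value maps back to a valid positive prior.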

main

public static void main(String[] argv)
Runs a basic unit test.



Copyright © 2008 MIT. All Rights Reserved.