java.lang.Object
  edu.mit.nlp.segmenter.dp.DPSeg
public class DPSeg
This class implements the dynamic programming Bayesian segmentation, for both DCM and MAP language models.
Now with EM estimation of priors. Note that we use log-priors everywhere. The reason is that the log of the prior ranges over (-inf, inf), while the prior itself is restricted to (0, inf). Since my LBFGS engine doesn't take constraints, it's better to search in log space. This requires only a small modification to the gradient computation: by the chain rule, the gradient with respect to a log-prior is the gradient with respect to the prior multiplied by the prior itself.
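The log-space search described above can be illustrated with a small sketch. This is not code from DPSeg: the class `LogSpaceGradient`, the toy objective f(p) = log(p) - p, and the plain gradient ascent loop (standing in for LBFGS) are all assumptions made for illustration. The only point is the chain-rule step, which turns a gradient with respect to a positive prior p into a gradient with respect to log p.

```java
final class LogSpaceGradient {
    // Hypothetical objective over a positive prior p, maximized at p = 1.
    static double objective(double p) {
        return Math.log(p) - p;
    }

    // Gradient of the toy objective with respect to the prior itself.
    static double gradWrtPrior(double p) {
        return 1.0 / p - 1.0;
    }

    // Chain rule: with p = exp(logPrior), df/dlogPrior = (df/dp) * p.
    static double gradWrtLogPrior(double logPrior) {
        double p = Math.exp(logPrior);
        return gradWrtPrior(p) * p;
    }

    // Unconstrained gradient ascent on logPrior (a stand-in for LBFGS).
    // Any real value of logPrior maps back to a valid positive prior.
    static double optimizeLogPrior(double logPrior, double stepSize, int iterations) {
        for (int i = 0; i < iterations; i++) {
            logPrior += stepSize * gradWrtLogPrior(logPrior);
        }
        return logPrior;
    }

    public static void main(String[] args) {
        double logPrior = optimizeLogPrior(Math.log(5.0), 0.1, 500);
        System.out.println("estimated prior = " + Math.exp(logPrior));
    }
}
```

Because the optimizer only ever sees the log-prior, it needs no positivity constraint; exponentiating the result always yields a prior greater than zero.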
Nested Class Summary | |
---|---|
protected class |
DPSeg.PriorOptimizer
A class for LBFGS optimization of the priors |
Field Summary | |
---|---|
boolean |
m_debug
|
Constructor Summary | |
---|---|
DPSeg(DPDocument[][] docs,
int[][] truths)
|
Method Summary | |
---|---|
double[] |
computeGradient(double[] logpriors)
computes the gradient of the likelihood, across the whole dataset. |
protected double[] |
computePDur(int T,
double edur,
double log_dispersion)
|
double |
computeTotalLL(double[] logpriors)
computes the log-likelihood for the whole dataset. |
double[] |
getParams()
|
int[][] |
getResponses()
get the segmentations |
void |
printSegs()
|
SegResult[] |
segEM(double[] init_params)
segEM estimates the parameters using a form of hard EM: it computes the best segmentation given the current parameters, then does a gradient-based search for new parameters, and iterates. |
SegResult[] |
segment(double[] params)
segment each document in the dataset. |
protected SegResult[] |
segmentKnown(double[] params)
segment in the case that the number of segments per doc is known. same arguments as segment(double[]) |
protected SegResult[] |
segmentUnknown(double[] params)
segment in the case of an unknown number of segments. same arguments as segment(double[]) |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public boolean m_debug
Constructor Detail |
---|
public DPSeg(DPDocument[][] docs, int[][] truths)
docs
- The documents to segment. It's a 2D array for the multimodal segmentation case,
but if you're just doing text then it will be [N][1].
truths
- The ground truth segmentations. [N][], with each row being another array of ints.
I'd like to refactor so that this isn't necessary, but at the moment it is.
Method Detail |
---|
public SegResult[] segEM(double[] init_params)
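The alternation that segEM is described as performing (best segmentation under the current parameters, then a parameter update, repeated) can be sketched on a toy problem. Everything here is an assumption for illustration and none of it is DPSeg's code: the class `HardEmSketch`, the single-boundary model with one mean per segment, and the squared-error score (equivalent to a unit-variance Gaussian log-likelihood). The update step uses a closed-form mean rather than the gradient-based search that segEM is documented to use.

```java
final class HardEmSketch {
    // Hard "E-step": choose the one boundary b (1 <= b < n) that minimizes squared
    // error when x[0..b) is explained by mu[0] and x[b..n) by mu[1].
    // Assumes x has at least two elements.
    static int bestBoundary(double[] x, double[] mu) {
        int best = 1;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int b = 1; b < x.length; b++) {
            double cost = 0.0;
            for (int i = 0; i < x.length; i++) {
                double d = x[i] - (i < b ? mu[0] : mu[1]);
                cost += d * d;
            }
            if (cost < bestCost) {
                bestCost = cost;
                best = b;
            }
        }
        return best;
    }

    // "M-step": re-estimate each segment mean given the chosen boundary.
    // (Closed form here; a gradient-based search would go in this slot instead.)
    static void refit(double[] x, int b, double[] mu) {
        double left = 0.0;
        double right = 0.0;
        for (int i = 0; i < x.length; i++) {
            if (i < b) left += x[i]; else right += x[i];
        }
        mu[0] = left / b;
        mu[1] = right / (x.length - b);
    }

    // Alternate the two steps until the segmentation stops changing.
    // The means in mu are updated in place; the final boundary is returned.
    static int run(double[] x, double[] mu, int maxIters) {
        int boundary = -1;
        for (int iter = 0; iter < maxIters; iter++) {
            int b = bestBoundary(x, mu);
            if (b == boundary) break;  // segmentation unchanged: converged
            boundary = b;
            refit(x, boundary, mu);
        }
        return boundary;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9};
        double[] mu = {0.0, 3.0};  // arbitrary initial parameters
        int boundary = run(x, mu, 20);
        System.out.println("boundary=" + boundary + " means=" + mu[0] + ", " + mu[1]);
    }
}
```

Committing to a single best segmentation in each round, instead of averaging over all segmentations, is what makes this "hard" EM.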
public void printSegs()
protected double[] computePDur(int T, double edur, double log_dispersion)
public SegResult[] segment(double[] params)
params
- The (log) parameters. The last entry in the input array is the log of the dispersion
parameter for the duration distribution; the other entries are the logs of the priors
(one per modality).
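The parameter layout described above can be made concrete with a small helper. This is a sketch under stated assumptions: `ParamLayout` and its methods are hypothetical and not part of DPSeg, and the numeric values are arbitrary. It shows only the ordering given in the description, with the log-priors first and the log-dispersion last.

```java
import java.util.Arrays;

final class ParamLayout {
    // Builds [log prior_0, ..., log prior_{M-1}, log dispersion]
    // from positive per-modality priors and a positive dispersion.
    static double[] pack(double[] priors, double dispersion) {
        double[] params = new double[priors.length + 1];
        for (int i = 0; i < priors.length; i++) {
            params[i] = Math.log(priors[i]);
        }
        params[priors.length] = Math.log(dispersion);
        return params;
    }

    // Recovers the per-modality priors (every entry except the last).
    static double[] priors(double[] params) {
        double[] priors = new double[params.length - 1];
        for (int i = 0; i < priors.length; i++) {
            priors[i] = Math.exp(params[i]);
        }
        return priors;
    }

    // Recovers the dispersion (the last entry).
    static double dispersion(double[] params) {
        return Math.exp(params[params.length - 1]);
    }

    public static void main(String[] args) {
        // Two modalities with arbitrary priors, plus an arbitrary dispersion.
        double[] params = pack(new double[] {0.1, 0.5}, 10.0);
        System.out.println(Arrays.toString(params));
    }
}
```

For the text-only case there is a single modality, so the array would have two entries. This page does not state whether the init_params argument of segEM follows the same layout.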
protected SegResult[] segmentUnknown(double[] params)
See Also: segment(double[])
protected SegResult[] segmentKnown(double[] params)
See Also: segment(double[])
public double computeTotalLL(double[] logpriors)
public double[] computeGradient(double[] logpriors)
public int[][] getResponses()
public double[] getParams()