java.lang.Object
  edu.mit.nlp.segmenter.mcmc.CuCoSeg
public class CuCoSeg
Nested Class Summary | |
---|---|
protected class | CuCoSeg.PriorOptimizer: An L-BFGS optimizer to search the parameter space |
Constructor Summary | |
---|---|
CuCoSeg() | |
Method Summary | |
---|---|
void | addCountsForSentence(int doc, int t): Adds the counts for sentence t of document doc, using the segs[] variable. Complexity: O(K[doc] + N[doc][t]), where K[doc] is the number of segments in document doc and N[doc][t] is the number of words in sentence t. |
protected void | changeCountsForSentence(int doc, int t, int sign) |
protected double | computeCueLogProb(): Computes the log-likelihood of the cue-phrase counts. |
double | computeLogProb(): Computes the overall log probability. |
double | computeLogProb(int doc, int seg): Computes the portion of the log probability associated with a change to segment seg in document doc, considering the b-counts, o-counts, and i-counts for seg, seg-1, and seg+1 (where applicable). |
double | computeXtraProb() |
Empirical | getMoveProposal(int doc, int seg): Generates an empirical distribution over the moves of a given segmentation point. |
edu.mit.nlp.segmenter.mcmc.CuCoSeg.Unigram[] | getSortedUnigrams(LexMap lexmap, int[] b_counts, int[] non_b_counts) |
void | initialize(String config_filename): Performs whatever initialization is needed, using this config file. |
void | initSegs(String segfilename): Loads initial segmentation guesses from a file. |
protected double | minkaApprox(int[] counts): Sets the prior on the cue-phrase language model, using the approximation proposed by Minka in "Estimating a Dirichlet Distribution" (eq. 114). |
protected void | printStatus(PrintStream out, int i): Prints a status message. |
List[] | segmentTexts(MyTextWrapper[] texts, int[] K): Segments all the texts (a very long method). |
void | setDCMPrior(FastDCM dcm, double prior): Sets the symmetric prior on the DCM language models. |
void | setDebug(boolean debug): Sets the segmenter's debug flag. |
void | setPDurs(): Since durations are discrete, caches the probability of each duration length. |
void | subCountsForSentence(int doc, int t) |
void | updateCounts(int lambda_b): Updates the counts given a new lambda parameter. |
void | updateSegmentation(int doc, int seg, int amount): Updates the segmentation by moving segmentation point seg in document doc by amount; updates segs[] and all the counts. |
static boolean | validMove(List segpoints, int seg, int amount): Assesses whether a given move is valid (does not cross segment boundaries). |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public CuCoSeg()
Method Detail |
---|
public void initialize(String config_filename)
Specified by: initialize in interface Segmenter
Parameters:
config_filename - the path to the config file
public void setDebug(boolean debug)
Specified by: setDebug in interface Segmenter
public void initSegs(String segfilename)
Specified by: initSegs in interface InitializableSegmenter
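initSegs loads initial segmentation guesses from a file whose format this page does not describe. A minimal sketch of a parser for a hypothetical format, one line per document listing that document's segment end indices separated by whitespace; SegFileReader and parse are illustrative names, not CuCoSeg's:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of parsing initial segmentations.
 * HYPOTHETICAL format: one line per document, e.g. "4 9 15", listing segment end indices.
 * The format CuCoSeg.initSegs actually reads is not documented here.
 */
final class SegFileReader {
    private SegFileReader() {}

    static List<int[]> parse(String text) {
        List<int[]> docs = new ArrayList<>();
        for (String line : text.split("\\R")) {   // \R matches any line terminator
            line = line.trim();
            if (line.isEmpty()) continue;          // skip blank lines
            String[] fields = line.split("\\s+");
            int[] ends = new int[fields.length];
            for (int i = 0; i < fields.length; i++) {
                ends[i] = Integer.parseInt(fields[i]);
            }
            docs.add(ends);
        }
        return docs;
    }
}
```

To read an actual file, pass its contents, for example new String(Files.readAllBytes(Paths.get(segfilename)), StandardCharsets.UTF_8).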
public List[] segmentTexts(MyTextWrapper[] texts, int[] K)
Specified by: segmentTexts in interface Segmenter
Parameters:
texts - all the texts in the dataset
K - the number of segments per document
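The status message from printStatus reports A3 = f(.5), where f is the annealing function applied to the MCMC moves, but this page does not define f. A minimal sketch, assuming f(p) = p^(1/T) for a hypothetical temperature T applied to a Metropolis-Hastings acceptance ratio p; the class and method names are illustrative:

```java
import java.util.Random;

/**
 * Sketch of an annealed Metropolis-Hastings acceptance test.
 * ASSUMPTION: the annealing function is f(p) = p^(1/T) for temperature T;
 * the CuCoSeg documentation does not say which function it uses.
 */
final class AnnealedAcceptance {
    private AnnealedAcceptance() {}

    /** f(p) = p^(1/T): T = 1 leaves p unchanged; T < 1 makes unfavorable moves less likely. */
    static double anneal(double p, double temperature) {
        return Math.pow(p, 1.0 / temperature);
    }

    /**
     * Accepts with probability min(1, f(exp(logRatio))), where logRatio is
     * log p(proposed) - log p(current), plus any proposal-correction term.
     */
    static boolean accept(double logRatio, double temperature, Random rng) {
        // Work in log space: log f(r) = log(r) / T, which avoids overflow for large ratios.
        double logAccept = logRatio / temperature;
        if (logAccept >= 0.0) {
            return true;
        }
        return Math.log(rng.nextDouble()) < logAccept;
    }
}
```

Under this assumption, A3 = f(.5) = 0.5^(1/T) summarizes the current temperature: it equals 0.5 at T = 1 and approaches 0 as T approaches 0.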
protected void printStatus(PrintStream out, int i)
Prints a status message of the form: iteration LL [A1 A2 A3] [theta0 phi_b0 dispersion] Pk WD, where
A1 = number of moves accepted since the last message
A2 = proportion of moves accepted since the last message
A3 = f(.5), where f() is the annealing function
theta0 = symmetric Dirichlet prior on the language models
phi_b0 = symmetric Dirichlet prior on the cue phrases
dispersion = dispersion parameter on segment durations (not used)
Pk = a metric of segmentation quality
WD = another metric of segmentation quality
Parameters:
out - the PrintStream to write the message to
i - the iteration number
public edu.mit.nlp.segmenter.mcmc.CuCoSeg.Unigram[] getSortedUnigrams(LexMap lexmap, int[] b_counts, int[] non_b_counts)
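The Pk and WD fields in the printStatus message are segmentation error rates (lower is better). The sketch below implements Pk (Beeferman et al., 1999) and, assuming WD stands for WindowDiff (Pevzner and Hearst, 2002), that metric too. It represents a segmentation as one segment id per sentence, which is an assumption rather than CuCoSeg's internal representation:

```java
/**
 * Sketch of the Pk and WindowDiff segmentation metrics.
 * ASSUMPTION: a segmentation is one segment id per sentence, e.g. {0, 0, 1, 1, 1},
 * with ids forming contiguous runs.
 */
final class SegMetrics {
    private SegMetrics() {}

    /** Window size: half the mean reference segment length, rounded, and at least 1. */
    static int defaultK(int[] ref) {
        int segments = 1;
        for (int i = 1; i < ref.length; i++) {
            if (ref[i] != ref[i - 1]) segments++;
        }
        return Math.max(1, (int) Math.round(ref.length / (2.0 * segments)));
    }

    /** Fraction of sentence pairs k apart whose same-segment status differs. */
    static double pk(int[] ref, int[] hyp, int k) {
        int n = ref.length, errors = 0;
        for (int i = 0; i + k < n; i++) {
            boolean sameRef = ref[i] == ref[i + k];
            boolean sameHyp = hyp[i] == hyp[i + k];
            if (sameRef != sameHyp) errors++;
        }
        return n > k ? (double) errors / (n - k) : 0.0;
    }

    /** Fraction of windows (i, i+k] in which the number of boundaries differs. */
    static double windowDiff(int[] ref, int[] hyp, int k) {
        int n = ref.length, errors = 0;
        for (int i = 0; i + k < n; i++) {
            if (boundaries(ref, i, i + k) != boundaries(hyp, i, i + k)) errors++;
        }
        return n > k ? (double) errors / (n - k) : 0.0;
    }

    /** Counts positions j in (from, to] where the segment id changes. */
    private static int boundaries(int[] seg, int from, int to) {
        int count = 0;
        for (int j = from + 1; j <= to; j++) {
            if (seg[j] != seg[j - 1]) count++;
        }
        return count;
    }
}
```

For example, with reference {0, 0, 0, 1, 1, 1} and a hypothesis that puts every sentence in one segment, k = 2 and both metrics equal 0.5.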
public void setPDurs()
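setPDurs precomputes the probability of each discrete duration so the sampler can look it up rather than recompute it. This page does not say which duration distribution CuCoSeg uses (printStatus notes that the dispersion parameter is not used). A minimal sketch of such a cache, assuming a Poisson duration model with a hypothetical mean and maximum length:

```java
/**
 * Sketch of a cache of log-probabilities for discrete segment durations.
 * ASSUMPTION: durations follow a Poisson distribution with the given mean;
 * CuCoSeg's actual duration model is not documented here.
 */
final class DurationCache {
    private final double[] cache;

    DurationCache(double mean, int maxLength) {
        if (mean <= 0) throw new IllegalArgumentException("mean must be positive");
        cache = new double[maxLength + 1];
        // log P(0) = -mean; log P(d) = log P(d-1) + log(mean) - log(d)
        cache[0] = -mean;
        for (int d = 1; d <= maxLength; d++) {
            cache[d] = cache[d - 1] + Math.log(mean) - Math.log(d);
        }
    }

    /** O(1) lookup; lengths outside [0, maxLength] are treated as impossible. */
    double logProb(int length) {
        return (length >= 0 && length < cache.length) ? cache[length] : Double.NEGATIVE_INFINITY;
    }
}
```

Treating lengths past the cache as impossible is a simplification; sizing the cache to the longest document avoids it.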
public void setDCMPrior(FastDCM dcm, double prior)
Parameters:
dcm - the DCM cache
prior - the new prior
protected double minkaApprox(int[] counts)
public double computeXtraProb()
public double computeLogProb()
public double computeLogProb(int doc, int seg)
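computeLogProb(int doc, int seg) considers only the counts for seg, seg-1, and seg+1, because a change to one segment leaves the others untouched. A minimal sketch of that locality, assuming the log probability decomposes into independent per-segment terms supplied by a hypothetical scoring function:

```java
import java.util.function.IntToDoubleFunction;

/**
 * Sketch of a local log-probability over the segments affected by a change.
 * ASSUMPTION: the total log probability is a sum of per-segment terms
 * returned by segmentLogProb.
 */
final class LocalLogProb {
    private LocalLogProb() {}

    static double local(int seg, int numSegments, IntToDoubleFunction segmentLogProb) {
        double sum = 0.0;
        // Clip the window {seg-1, seg, seg+1} to valid segment indices.
        for (int s = Math.max(0, seg - 1); s <= Math.min(numSegments - 1, seg + 1); s++) {
            sum += segmentLogProb.applyAsDouble(s);
        }
        return sum;
    }
}
```

Under this assumption, the difference between the local sum before and after a move equals the change in the total, so the full computeLogProb() need not be re-run for each proposal.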
protected double computeCueLogProb()
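computeCueLogProb computes the log-likelihood of the cue-phrase counts, and setDCMPrior sets a symmetric prior on the DCM (Dirichlet compound multinomial) language models. Assuming the counts are scored with a DCM under a symmetric prior alpha (the exact form CuCoSeg uses is not shown on this page), the log-likelihood of one token sequence can be computed exactly from integer counts, with no log-gamma routine:

```java
/**
 * Sketch of the log-likelihood of one token sequence with word counts n_w under a
 * DCM with symmetric prior alpha over a vocabulary of size W = counts.length:
 *   sum_w [lgamma(n_w + alpha) - lgamma(alpha)] - [lgamma(N + W*alpha) - lgamma(W*alpha)]
 * For integer n, lgamma(n + a) - lgamma(a) = sum_{i<n} log(a + i), used below.
 */
final class DcmLogLik {
    private DcmLogLik() {}

    static double logLikelihood(int[] counts, double alpha) {
        int total = 0;
        double result = 0.0;
        for (int n : counts) {
            for (int i = 0; i < n; i++) result += Math.log(alpha + i);
            total += n;
        }
        double alphaSum = alpha * counts.length;
        for (int i = 0; i < total; i++) result -= Math.log(alphaSum + i);
        return result;
    }
}
```

For example, two distinct words with alpha = 1 over a two-word vocabulary give log(1/2 * 1/3) = -log 6.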
public static boolean validMove(List segpoints, int seg, int amount)
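validMove rejects moves that would cross a neighboring segment boundary. A minimal sketch, assuming segpoints holds each segment's exclusive end index in sentence order, the last entry is the fixed end of the document, and segments must stay non-empty; it takes List<Integer> where the original signature uses a raw List:

```java
import java.util.List;

/**
 * Sketch of a validity check for moving one segmentation point.
 * ASSUMPTIONS: segpoints are sorted exclusive end indices of the segments,
 * the last one is the document length and cannot move, and no segment may be empty.
 */
final class MoveCheck {
    private MoveCheck() {}

    static boolean validMove(List<Integer> segpoints, int seg, int amount) {
        if (seg < 0 || seg >= segpoints.size() - 1) return false; // the document end is fixed
        int target = segpoints.get(seg) + amount;
        int lower = (seg == 0 ? 0 : segpoints.get(seg - 1)) + 1;  // keep the left segment non-empty
        int upper = segpoints.get(seg + 1) - 1;                   // keep the right segment non-empty
        return target >= lower && target <= upper;
    }
}
```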
public Empirical getMoveProposal(int doc, int seg)
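getMoveProposal builds an empirical distribution over the moves available to one segmentation point. The sketch below shows one way to do that, assuming each candidate shift amount already has an unnormalized log-score; MoveProposal is a hypothetical stand-in for CuCoSeg's Empirical class:

```java
import java.util.Random;

/**
 * Sketch of an empirical proposal distribution over candidate moves.
 * ASSUMPTION: each candidate shift amount has an unnormalized log-score.
 */
final class MoveProposal {
    private final int[] amounts;
    private final double[] probs;

    MoveProposal(int[] amounts, double[] logScores) {
        if (amounts.length != logScores.length || amounts.length == 0) {
            throw new IllegalArgumentException("need one log-score per candidate amount");
        }
        this.amounts = amounts.clone();
        // Log-sum-exp normalization: subtract the max before exponentiating for stability.
        double max = Double.NEGATIVE_INFINITY;
        for (double s : logScores) max = Math.max(max, s);
        if (max == Double.NEGATIVE_INFINITY) {
            throw new IllegalArgumentException("all candidates have zero probability");
        }
        probs = new double[logScores.length];
        double z = 0.0;
        for (int i = 0; i < logScores.length; i++) {
            probs[i] = Math.exp(logScores[i] - max);
            z += probs[i];
        }
        for (int i = 0; i < probs.length; i++) probs[i] /= z;
    }

    double prob(int index) {
        return probs[index];
    }

    /** Draws a shift amount by inverse-CDF sampling. */
    int sample(Random rng) {
        double u = rng.nextDouble(), cumulative = 0.0;
        for (int i = 0; i < probs.length; i++) {
            cumulative += probs[i];
            if (u < cumulative) return amounts[i];
        }
        return amounts[amounts.length - 1]; // guards against rounding in the cumulative sum
    }
}
```

If proposals built this way are not symmetric, a Metropolis-Hastings sampler must include the ratio of the reverse to the forward proposal probability in its acceptance test.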
public void updateSegmentation(int doc, int seg, int amount)
public void addCountsForSentence(int doc, int t)
public void subCountsForSentence(int doc, int t)
protected void changeCountsForSentence(int doc, int t, int sign)
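addCountsForSentence and subCountsForSentence can share one routine, changeCountsForSentence, that applies a sign of +1 or -1. The sketch below mirrors the documented O(K + N) cost: an O(K) scan to find the sentence's segment, then O(N) updates for its words. The array layout (segCounts[segment][word], exclusive segment end indices) and the method parameters are assumptions:

```java
/**
 * Sketch of sign-based count bookkeeping for one sentence.
 * ASSUMPTIONS: segCounts[s][w] counts word w in segment s; segEnds holds the
 * exclusive end index of each segment; sentences are arrays of word ids.
 */
final class SegmentCounts {
    final int[][] segCounts;

    SegmentCounts(int numSegments, int vocabSize) {
        segCounts = new int[numSegments][vocabSize];
    }

    /** Linear scan, O(K): the first segment whose exclusive end exceeds sentence t. */
    static int segmentOf(int[] segEnds, int t) {
        for (int k = 0; k < segEnds.length; k++) {
            if (t < segEnds[k]) return k;
        }
        throw new IllegalArgumentException("sentence " + t + " lies past the last segment");
    }

    void changeCountsForSentence(int[] segEnds, int t, int[] words, int sign) {
        if (sign != 1 && sign != -1) throw new IllegalArgumentException("sign must be +1 or -1");
        int segment = segmentOf(segEnds, t);   // O(K)
        for (int w : words) {                  // O(N)
            segCounts[segment][w] += sign;
        }
    }

    void addCountsForSentence(int[] segEnds, int t, int[] words) {
        changeCountsForSentence(segEnds, t, words, +1);
    }

    void subCountsForSentence(int[] segEnds, int t, int[] words) {
        changeCountsForSentence(segEnds, t, words, -1);
    }
}
```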
public void updateCounts(int lambda_b)