Shi, Qinfeng; Altun, Yasemin; Smola, Alexander; Vishwanathan, S
2015-12-10
June 28-30
http://hdl.handle.net/1885/49701

In this paper, we study the problem of automatically segmenting written text into paragraphs. This is inherently a sequence labeling problem; however, previous approaches ignore this dependency. We propose a novel approach for automatic paragraph segmentation, namely training Semi-Markov models discriminatively using a Max-Margin method. This method allows us to model the sequential nature of the problem and to incorporate features of a whole paragraph, such as paragraph coherence, which cannot be used in previous models. Experimental evaluation on four text corpora shows improvement over the previous state-of-the-art method on this task.

Keywords: Experimental evaluation; Paragraph segmentation; Semi Markov model; Sequence Labeling; State of the art; Text corpora; Written texts; Computational linguistics; Markov processes; Natural language processing systems

Semi-Markov models for sequence segmentation
2007
2016-02-24
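
The abstract notes that a semi-Markov model can score a whole paragraph at once rather than one sentence at a time. A minimal sketch of the generic segment-level dynamic program that makes this possible is given below. It is an illustration under stated assumptions, not the authors' implementation: the function names `segment` and `toy_score`, the `max_len` cap, and the toy topic labels are all hypothetical, and the max-margin training step that would learn the scoring function is not shown.

```python
from typing import Callable, List, Tuple


def segment(
    n: int,
    score: Callable[[int, int], float],
    max_len: int,
) -> List[Tuple[int, int]]:
    """Split items 0..n-1 into contiguous segments with maximal total score.

    score(start, end) rates the whole segment [start, end), so it can use
    segment-level features. max_len caps the segment length (an assumption
    made here to bound the search).
    """
    best = [float("-inf")] * (n + 1)  # best[j]: best score for items 0..j-1
    back = [0] * (n + 1)              # back[j]: start of the last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            candidate = best[start] + score(start, end)
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start

    # Follow the back-pointers from the end to recover the segments.
    segments = []
    end = n
    while end > 0:
        start = back[end]
        segments.append((start, end))
        end = start
    segments.reverse()
    return segments


# Hypothetical toy data: one topic label per sentence.
topics = ["a", "a", "b", "b", "b", "c"]


def toy_score(start: int, end: int) -> float:
    """Stand-in for a learned coherence score: reward single-topic segments."""
    if len(set(topics[start:end])) == 1:
        return float((end - start) ** 2)
    return -1.0


print(segment(len(topics), toy_score, max_len=4))
# → [(0, 2), (2, 5), (5, 6)]
```

Because `score` sees the full span `[start, end)`, it can express properties of an entire paragraph, which a per-sentence labeling model cannot. In a trained system the toy function would be replaced by a learned, feature-based score.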