“We can infer another key advantage of Markov Models from this example. To express the probabilities of all possible states for any length of Markov Chain, we only need s^2 numbers stored, with ‘s’ representing the number of possible states.”— Chris Natale, chrisnatale.info
“[A] Markov Chain, an OMM [Observable Markov Model], is a way of estimating the probability of a future series of events using only the previous event for each item in the series as inference. If a data structure exhibits this behavior, we can say it possesses the Markov Property.”— Chris Natale, chrisnatale.info
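The two quotes above can be illustrated together: a first-order Markov chain over s states is fully specified by an s×s transition matrix (s² numbers), and the Markov property means the next state depends only on the current one. A minimal sketch, where the three weather states and their transition probabilities are invented for illustration:

```python
import random

# Hypothetical 3-state weather model; the probabilities are made up.
states = ["sunny", "rainy", "cloudy"]

# s = 3 states, so the whole model fits in s^2 = 9 numbers:
# transition[i][j] = P(next state = j | current state = i)
transition = [
    [0.7, 0.1, 0.2],  # from sunny
    [0.3, 0.4, 0.3],  # from rainy
    [0.4, 0.3, 0.3],  # from cloudy
]

def next_state(current, rng=random):
    """Sample the next state using only the current one (Markov property)."""
    i = states.index(current)
    return rng.choices(states, weights=transition[i])[0]

def sample_chain(start, length, rng=random):
    """Generate a chain of any length from the same s^2 stored numbers."""
    chain = [start]
    for _ in range(length - 1):
        chain.append(next_state(chain[-1], rng))
    return chain

print(sample_chain("sunny", 5))
```

Note that `sample_chain` can produce arbitrarily long sequences without storing anything beyond the 9-entry matrix, which is the storage advantage the first quote describes.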
“Use conventional classification algorithms to classify substrings of [a] document as ‘to be extracted’ or not.”— Christopher Manning, web.stanford.edu
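One way to read this: enumerate candidate substrings (e.g., short token spans) and hand each to a binary classifier. In the sketch below, `looks_like_extractable` is a hand-written stand-in for a trained classifier, not Manning's actual method; a real system would train a model on labeled spans.

```python
def candidate_spans(tokens, max_len=3):
    """Enumerate all token spans of a document up to max_len tokens."""
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            yield (start, end, tokens[start:end])

def looks_like_extractable(span_tokens):
    """Stand-in for a trained classifier: flag spans where every token
    starts with a capital letter. A real model (logistic regression,
    CRF, etc.) would replace this rule."""
    return all(t[0].isupper() for t in span_tokens)

tokens = "Chris Natale wrote about Markov models".split()
extracted = [(s, e, toks) for s, e, toks in candidate_spans(tokens)
             if looks_like_extractable(toks)]
```

Here `extracted` includes the span `["Chris", "Natale"]`; the classification framing stays the same whatever classifier is plugged in.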
“For these algorithms to work, we need to select some features from text. Some of the commonly used features for NER are unigram, left and right bigrams and trigrams, part of speech tags, whether the word is capitalized or not, is the first character capitalized or not, whether the word is surrounded…”— Jasneet Sabharwal, quora.com
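The features named in the quote can be packed into a per-token feature dictionary. The sketch below covers only a subset of the features actually listed (unigram, left/right bigrams, capitalization cues); POS tags are omitted to keep it dependency-free, and the `<S>`/`</S>` boundary markers are an illustrative convention, not part of the quoted source.

```python
def ner_features(tokens, i):
    """Build a feature dict for token i: unigram, neighboring bigrams,
    and capitalization features. (A POS tag from a tagger such as
    nltk.pos_tag would typically be added as another entry.)"""
    word = tokens[i]
    left = tokens[i - 1] if i > 0 else "<S>"              # start padding
    right = tokens[i + 1] if i < len(tokens) - 1 else "</S>"  # end padding
    return {
        "unigram": word,
        "left_bigram": f"{left} {word}",
        "right_bigram": f"{word} {right}",
        "first_char_capitalized": word[0].isupper(),
        "all_caps": word.isupper(),
    }

feats = ner_features("Jasneet Sabharwal answered on Quora".split(), 0)
```

Dictionaries like `feats` are then vectorized and fed to a conventional classifier, which is how the feature list in the quote connects to the classification framing in the previous one.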