
Co-occurrence rate networks: towards separate training for undirected graphical models

Zhu, Zhemin
Open Access
  • Published: 16 Oct 2015
  • Publisher: Universiteit Twente
  • Country: Netherlands
Abstract
Dependence is a universal phenomenon that can be observed everywhere. In machine learning, probabilistic graphical models (PGMs) represent dependence relations with graphs. PGMs find wide application in natural language processing (NLP), speech processing, computer vision, biomedicine, information retrieval, and other fields. Many traditional models, such as hidden Markov models (HMMs) and Kalman filters, can be put under the umbrella of PGMs. The central idea of PGMs is to decompose (factorize) a joint probability into a product of local factors. Learning, inference, and storage can be conducted efficiently over this factorized representation. In this thesis, we propose a n...
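To make the factorization idea concrete, here is the standard form it takes in common textbook notation (a minimal sketch; the thesis's own notation and definitions may differ). A Bayesian network decomposes the joint probability along the parent sets pa(X_i) of its directed graph; a Markov network decomposes it over the cliques C of its undirected graph, using potential functions \psi_C and a global normalizer Z:

P(x_1, \dots, x_n) = \prod_{i=1}^{n} P\bigl(x_i \mid \mathrm{pa}(x_i)\bigr) \qquad \text{(Bayesian network)}

P(x) = \frac{1}{Z} \prod_{C} \psi_C(x_C), \qquad Z = \sum_{x} \prod_{C} \psi_C(x_C) \qquad \text{(Markov network)}

Each factor involves only a small subset of the variables, which is what makes learning, inference, and storage tractable compared with handling the full joint distribution directly.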
Subjects
free text keywords: EWI-26925, IR-97338, METIS-311922
Download from
Universiteit Twente Repository
Doctoral thesis, 2015
Provider: NARCIS

1 Introduction
  1.1 Motivation
  1.2 Research Questions
  1.3 Contributions
  1.4 Thesis Structure

2 Probabilistic Graphical Models
  2.1 Motivation: the Decomposition Strategy
  2.2 Conditional Independence and Probability Factorization
  2.3 Bayesian Networks
  2.4 Markov Networks
  2.5 Inference
  2.6 Learning

3 Co-occurrence Rate Networks
  3.1 Co-occurrence Rate
  3.2 Examples
  3.3 The Hypertree Representation
  3.4 Co-occurrence Rate Networks
  3.5 Inference
  3.6 Learning

4 Two-step Separate Training for CRNs
  4.1 Maximum Likelihood Estimation of Co-occurrence Rate Networks
  4.2 Separate Models
  4.3 Consistency
  4.4 Summary

5 Experiments on Chain-structured CRNs
  5.1 Named Entity Recognition
  5.2 Part-of-speech Tagging
  5.3 Related Models
  5.4 CRNs are Immune to the Label Bias Problem
  5.5 Training and Decoding
  5.6 Experiments
  5.7 Summary

6 A Review of Open Relation Extraction
  6.1 Introduction
  6.2 Quality Metrics
  6.3 Technical Aspects

7 SimpleIE: a Simplification Model for Open Relation Extraction
  7.1 Motivated by Examples
  7.2 The Model: SimpleIE
  7.3 Wikipedia Dataset
  7.4 Experiments
  7.5 Noun Phrase Recognition
  7.6 Related Work on Sentence Simplification
  7.7 Summary

8 Conclusions and Future Work
  8.1 General Conclusions
  8.2 Research Questions Revisited
  8.3 Future Work

A Appendix
  A.1 Axioms of Probability
  A.2 Proof of Iℓ(G) ⇔ FBN(G)
  A.3 Proof of the Hammersley-Clifford Theorem
