TY - GEN
T1 - A topic-oriented syntactic component extraction model for social media
AU - Xu, Yanxiang
AU - Luo, Tiejian
AU - Xu, Guandong
AU - Pan, Rong
PY - 2012/1/1
Y1 - 2012/1/1
AB - Topic-oriented understanding is to extract information from various language instances, which reflects the characteristics or trends of semantic information related to the topic via statistical analysis. The syntax analysis and modeling is the basis of such work. Traditional syntactic formalization approaches widely used in natural language understanding could not be simply applied to the text modeling in the context of topic-oriented understanding. In this paper, we review the information extraction mode, and summarize its inherent relationship with the "Subject-Predicate" syntactic structure in Aryan language. And we propose a syntactic element extraction model based on the "topic-description" structure, which contains six kinds of core elements, satisfying the desired requirement for topic-oriented understanding. This paper also describes the model composition, the theoretical framework of understanding process, the extraction method of syntactic components, and the prototype system of generating syntax diagrams. The proposed model is evaluated on the Reuters 21578 and SocialCom2009 data sets, and the results show that the recall and precision of syntactic component extraction are up to 93.9% and 88%, respectively, which further justifies the feasibility of generating syntactic component through the word dependencies.
UR - http://www.scopus.com/inward/record.url?scp=84870751228&partnerID=8YFLogxK
U2 - 10.1007/978-94-007-5086-9_29
DO - 10.1007/978-94-007-5086-9_29
M3 - Article in proceeding
AN - SCOPUS:84870751228
SN - 978-94-007-5085-2
VL - 182
T3 - Lecture Notes in Electrical Engineering
SP - 221
EP - 229
BT - Human Centric Technology and Service in Smart Space
A2 - Park, James J.
A2 - Jin, Qun
A2 - Yeo, Martin Sang-Soo
A2 - Hu, Bin
PB - Springer
ER -