
menghsien

Meng-Hsien Shih

Recently Published

Exploring Words in CHILDES Corpus
In this paper we model word acquisition by identifying new words on the basis of words acquired earlier. Known words account for 61% of the words in 3-year-old utterances, and 15% of words can be identified based solely on the 2-year-old vocabulary. For acquisition at 2 years and 1 month, measured against the 2-year-old vocabulary, known words account for 67% of words, and 13% can be identified automatically. These results suggest that word acquisition may be based largely on previously acquired vocabulary.
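The known-word percentages above can be read as a coverage ratio: the share of word tokens in later utterances that already appear in an earlier vocabulary. The following is a minimal sketch under that assumption, not the paper's implementation; the function name `known_word_coverage` and the toy word lists are hypothetical and are not drawn from CHILDES.

```python
def known_word_coverage(earlier_vocab, later_tokens):
    """Return the fraction of later tokens already present in the earlier vocabulary."""
    vocab = {w.lower() for w in earlier_vocab}
    tokens = [t.lower() for t in later_tokens]
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in vocab)
    return known / len(tokens)


# Toy data, invented for illustration only (not taken from the corpus).
vocab_age_2 = ["mommy", "ball", "want", "more", "dog"]
tokens_age_3 = ["want", "more", "juice", "mommy", "dogs", "ball"]

coverage = known_word_coverage(vocab_age_2, tokens_age_3)
print(f"known-word coverage: {coverage:.1%}")
```

In this toy case "dogs" counts as unknown because the match is exact; how the paper identifies such new words from known ones is not specified in the abstract, so that step is left out here.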
A Corpus-based Sentiment Analysis of Sarcasm
Sarcastic expressions have long been a problem in sentiment classification. The purpose of our study is to use data acquired from Twitter to examine possible features of sarcasm in order to help improve the accuracy of sarcasm detection. Our data come from tweets with the hashtag #sarcasm, which indicates the speaker’s intention to be sarcastic. We tested the following features: (1) the sentiment expressed in sarcastic tweets is more positive, (2) original tweets and their “@To User” tweets have the same sentiment score, (3) the use of degree adverbs in sarcastic tweets, and (4) the high-frequency words used in sarcastic tweets. We also examine a pattern of sentiment change within a sarcastic expression. Based on the results, we carried out two sarcasm detection classification tasks and obtained a best performance of 83.3%, which shows that degree adverbs and the sentiment pattern within a sarcastic tweet play an important role in the sentiment analysis of sarcasm.
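Two of the features named above, degree adverbs and sentiment change within a tweet, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the mini-lexicons `POLARITY` and `DEGREE_ADVERBS`, the half-and-half split of the tweet, and the function names are hypothetical, and the abstract does not state how the study actually computed these features.

```python
# Hypothetical mini-lexicons; a real study would use a full sentiment resource.
POLARITY = {"love": 1, "great": 1, "wonderful": 1,
            "hate": -1, "stuck": -1, "broken": -1}
DEGREE_ADVERBS = {"so", "really", "very", "totally", "absolutely"}

PUNCTUATION = ".,!?#@\"'"


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    stripped = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in stripped if w]


def half_scores(tokens):
    """Sum lexicon polarity separately over the first and second half of the tweet."""
    mid = len(tokens) // 2
    first = sum(POLARITY.get(t, 0) for t in tokens[:mid])
    second = sum(POLARITY.get(t, 0) for t in tokens[mid:])
    return first, second


def features(text):
    """Count degree adverbs and flag a polarity flip between the two halves."""
    tokens = tokenize(text)
    first, second = half_scores(tokens)
    return {
        "degree_adverbs": sum(1 for t in tokens if t in DEGREE_ADVERBS),
        "sentiment_shift": first * second < 0,
    }


example = features("I really love being stuck in traffic #sarcasm")
print(example)
```

Feature dictionaries of this kind could then be passed to any standard classifier; the abstract does not say which classifier was used.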