Data Mining

In machine learning and natural language processing, topic modeling is a text-mining technique for discovering the latent "topics" hidden in a body of text. I mainly apply topic modeling to music mining and the analysis of political blogs.

Representative Work

[1] Wu, Q., Fokoue, E. (2018). Naive Dictionary On Musical Corpora: From Knowledge Representation To Pattern Recognition. arXiv:1811.12802 (Preprint) [link] 

[2] Wu, Q. (2018). Statistical Aspects of Music Mining: Naive Dictionary Representation. Thesis. RIT Scholar Works. [link]



Music Mining

Extensive studies have been conducted on both the musical scores and the audio tracks of Western classical music with the aim of learning and detecting the key in which a particular piece was played. Both Bayesian approaches and modern unsupervised learning via latent Dirichlet allocation have been used for such tasks. We venture beyond traditional Western classical music to explore other genres. Specifically, we employ Bayesian techniques and modern topic modeling methods for tasks such as automatic improvisation detection, genre identification, and key detection.
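For readers unfamiliar with latent Dirichlet allocation, the following is a minimal sketch of a collapsed Gibbs sampler for it, not the code used in the work cited above. It assumes each document is already encoded as a list of integer token ids; what a "token" is (a word in a text, or hypothetically a note symbol in a score) is left open. The function name lda_gibbs, the hyperparameter values, and the toy corpus are illustrative assumptions.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA on documents given as lists of token ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]              # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                            # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

# Toy corpus (made up): four short documents over a vocabulary of five token ids.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 2], [0, 1, 1, 0], [4, 3, 4, 3]]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, vocab_size=5)
```

The returned count matrices can be smoothed with alpha and beta to estimate per-document topic proportions and per-topic token distributions. Running the sampler with several seeds gives a rough sense of how stable those estimates are.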

More details can be found here: [link]

Text Mining

This ongoing project explores semantic structure through stemming and n-gramming combined with a range of topic model architectures. We apply different topic modeling approaches to a corpus of 2012 political blog posts and compare their performance. We build n-grams from the stemmed text after removing stop words, and study how much the topic modeling results improve by examining how the posterior probability distributions vary across different MCMC runs.
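The preprocessing order described above (remove stop words, stem, then form n-grams) can be sketched as follows. This is a simplified illustration rather than the project's actual pipeline: the stop-word list is a tiny made-up sample, naive_stem is a crude suffix stripper standing in for a real stemmer such as Porter's, and the example sentence is invented.

```python
# Tiny illustrative stop-word list; a real pipeline would use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "on", "in", "and", "is", "are", "to"}

def naive_stem(token):
    """Crude suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, strip punctuation, drop stop words, then stem what remains."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    return [naive_stem(t) for t in tokens if t and t not in STOP_WORDS]

def ngrams(tokens, n):
    """Join every run of n consecutive tokens into a single n-gram term."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

post = "The senators are debating the new tax cuts."
tokens = preprocess(post)
bigrams = ngrams(tokens, 2)
print(tokens)   # ['senator', 'debat', 'new', 'tax', 'cut']
print(bigrams)  # ['senator_debat', 'debat_new', 'new_tax', 'tax_cut']
```

Because stop words are removed before the n-grams are formed, a bigram such as "new_tax" can join words that were not adjacent in the raw text. That is a direct consequence of the ordering described above.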

More details can be found here: [link]

 


Other Research Work
