Using topicmodels package for analysis of topics in texts

benjibex (Tracy Keys) | April 9, 2017 | May 27, 2020

My vignette is about text mining and analysis. It uses the tm and topicmodels packages in R, together with Latent Dirichlet Allocation (LDA), to work out what a set of documents is about without having to read them all!

The vignette shows you how to create a Document-Term Matrix, then uses LDA to work out what key themes are present in a body of documents (called a corpus). Each document is then assigned to the topics, with a different probability for each topic.
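To give a rough feel for that pipeline, here is a minimal sketch in R. It is not the code from the vignette: the four toy documents, the choice of k = 2 topics and the seed are all made up for illustration. Only the function calls themselves (VCorpus, DocumentTermMatrix, LDA, terms, posterior) come from the tm and topicmodels packages.

```r
library(tm)
library(topicmodels)

# Toy corpus: four made-up documents (an assumption for this sketch,
# not data from the vignette)
docs <- c(
  "The cat chased the mouse around the kitchen",
  "Dogs and cats make friendly household pets",
  "The stock market fell as investors sold shares",
  "Investors watch interest rates and share prices"
)

corpus <- VCorpus(VectorSource(docs))

# Document-Term Matrix: one row per document, one column per term,
# each cell counting how often the term occurs in the document
dtm <- DocumentTermMatrix(
  corpus,
  control = list(tolower = TRUE, removePunctuation = TRUE, stopwords = TRUE)
)

# Fit LDA with k = 2 topics; k is something you have to choose yourself
fit <- LDA(dtm, k = 2, control = list(seed = 1234))

# Most likely terms for each topic
terms(fit, 3)

# Per-document topic probabilities: one row per document, rows sum to 1
doc_topics <- posterior(fit)$topics
round(doc_topics, 2)
```

With a corpus this tiny the topics will not mean much, but the shape of the output is the point: one row per document, one column per topic, and each row adding up to 1.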

This tool can help a user find a relevant document without having to search for it by name, or even knowing what it was written about!
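As a small sketch of how that lookup might work, suppose you already have a table of document-topic probabilities. The document names, the numbers and the rank_docs helper below are all hypothetical, typed in by hand for illustration; in practice the table would come from a fitted model.

```r
# Hypothetical document-topic probabilities (each row sums to 1)
doc_topics <- matrix(
  c(0.70, 0.20, 0.10,
    0.15, 0.05, 0.80,
    0.25, 0.60, 0.15),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("report_a", "memo_b", "notes_c"),
                  c("Topic1", "Topic2", "Topic3"))
)

# Hypothetical helper: rank documents by how strongly they belong
# to the topic you are interested in, strongest first
rank_docs <- function(doc_topics, topic) {
  sort(doc_topics[, topic], decreasing = TRUE)
}

rank_docs(doc_topics, "Topic3")
```

Here memo_b comes out on top for Topic3, so that is the document you would open first, without ever needing to know its name.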

Anyway, here is the link to my vignette:

http://rpubs.com/benjibex/266565

I hope you find it useful.

Tracy

Join the Conversation

3 Comments

Malcolm says:

April 9, 2017 at 10:21 pm

Hi Tracy,
Interesting blog and covers an area I have looked at before. I wonder if it is possible to do this with PDF files, as I was faced with this issue at work and resorted to VBA to get through thousands of documents, but was pulling out numerical data rather than text. Using R may well have been another option and one long rainy day I might give it a try.

John says:

April 10, 2017 at 11:00 am

Hi Tracy,
I found your vignette a great summary to start working with topic modeling. I am still getting my head around this, but your post cleared some confusion for me as to “why” and then “how” we do certain things to the data to achieve the Topic models.
I’m still not quite making the connection with the probability calculations and what they mean, but I know practice will solve this.
Good Topic! – pun intended.

1. Tracy Keys says:
  
  April 10, 2017 at 12:04 pm
  
  Hi John,
  It's basically just saying that a certain document is 30% about Topic 1, 20% about Topic 2, and so on, summing to 100%.
  I need to do more work to visualise that, but I basically ran out of time!
  TK
  