Text

Due by 11:59 PM on Thursday, May 28, 2020

Getting started

For this exercise, you’ll download some books from Project Gutenberg and visualize patterns in the words.

You should use an RStudio Project to keep your files well organized (either on your computer or on RStudio.cloud). Either create a new project for this exercise only, or make a project for all your work in this class.

To help you, I’ve created a skeleton R Markdown file with a template for this exercise, along with some helpful starter code. Download that here and include it in your project:

In the end, the structure of your project directory should look something like this:

your-project-name\
  13-exercise.Rmd
  your-project-name.Rproj

To check that you put everything in the right places, you can download and unzip this file, which contains everything in the correct structure:

The example from today’s session will be incredibly helpful for this exercise.

This can be as simple or as complex as you want. You don’t need to make your plots super fancy, but if you’re feeling brave, experiment with changing colors or modifying themes and theme elements.

You’ll need to insert your own code chunks where needed. Rather than typing them by hand (that’s tedious and you might miscount the number of backticks!), use the “Insert” button at the top of the editing window, or type ctrl + alt + i on Windows, or ⌘ + ⌥ + i on macOS.

Task 1: Reflection

Write your reflection for the day’s readings.

Task 2: Word frequencies

Download 4+ books by a single author from Project Gutenberg: Jane Austen, Victor Hugo, Emily Brontë, Lucy Maud Montgomery, Arthur Conan Doyle, Mark Twain, Henry David Thoreau, Fyodor Dostoyevsky, Leo Tolstoy. Anyone. Just make sure all the books are by the same author.
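As a rough sketch of the download step, assuming you use the gutenbergr package: the helper name `download_books` is made up for illustration, and the four Jane Austen IDs are from memory, so check them on the Project Gutenberg site before relying on them.

```r
# Project Gutenberg IDs for four Jane Austen novels (assumed; verify on
# gutenberg.org): Persuasion, Emma, Sense and Sensibility, Pride and Prejudice
austen_ids <- c(105, 158, 161, 1342)

# Hypothetical helper: downloads the books and keeps each book's title
# as a column, so words can later be grouped by book.
# Requires the gutenbergr package and an internet connection when called.
download_books <- function(ids) {
  gutenbergr::gutenberg_download(ids, meta_fields = "title")
}
```

Calling `books <- download_books(austen_ids)` should give one row per line of text, with `gutenberg_id`, `text`, and `title` columns. If the download fails, passing a different `mirror` argument to `gutenberg_download()` sometimes helps.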

Make these two plots and describe what each tells you about your author’s books:

  1. Top 10 most frequent words in each book
  2. Top 10 most unique words in each book (i.e. the words with the highest tf-idf)
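One possible way to build the data behind both plots with tidytext is sketched below. It uses a tiny made-up `books` table in place of real downloads, and the object names (`book_words`, `top_words`, `top_tf_idf`, `tf_idf_plot`) are only illustrative. Treat it as a starting point, not the required approach.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Toy stand-in for downloaded books (made-up text): one row per line
books <- tibble(
  title = c("Book A", "Book A", "Book B", "Book B"),
  text = c(
    "The whale swam past the ship",
    "The captain watched the whale",
    "The garden was full of roses",
    "She walked through the garden gate"
  )
)

# One row per word; unnest_tokens() lowercases and strips punctuation
book_words <- books %>%
  unnest_tokens(word, text)

# Plot 1 data: most frequent words per book, with stop words removed
top_words <- book_words %>%
  anti_join(stop_words, by = "word") %>%
  count(title, word, sort = TRUE) %>%
  group_by(title) %>%
  slice_max(n, n = 10, with_ties = FALSE) %>%
  ungroup()

# Plot 2 data: highest tf-idf words per book. Stop words are kept here
# because words shared by every book get an idf (and tf-idf) of zero.
top_tf_idf <- book_words %>%
  count(title, word) %>%
  bind_tf_idf(word, title, n) %>%
  group_by(title) %>%
  slice_max(tf_idf, n = 10, with_ties = FALSE) %>%
  ungroup()

# reorder_within() + scale_y_reordered() sort the bars inside each facet
tf_idf_plot <- ggplot(
  top_tf_idf,
  aes(x = tf_idf, y = reorder_within(word, tf_idf, title))
) +
  geom_col() +
  scale_y_reordered() +
  facet_wrap(vars(title), scales = "free_y") +
  labs(x = "tf-idf", y = NULL)
```

The same `ggplot()` pattern works for `top_words` if you swap `tf_idf` for `n`. With real books, replace the toy `books` table with your downloaded data.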

100% optional bonus fun tasks

If you want, do some other things with the text you’ve downloaded. Make a “he verbs vs. she verbs” plot. Tag the parts of speech and find the most common verbs or nouns. Try some sentiment analysis.
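For the sentiment analysis idea, a minimal sketch might look like the following. It assumes the "bing" lexicon that ships with tidytext and uses two made-up sentences in place of real book text; `sentiment_counts` is just an illustrative name.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for downloaded books (made-up text)
books <- tibble(
  title = c("Book A", "Book B"),
  text = c("A happy and wonderful day", "A terrible, miserable night")
)

# Keep only words that appear in the bing lexicon, then count
# positive and negative words in each book
sentiment_counts <- books %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(title, sentiment)
```

From here you could plot `n` by `sentiment` for each book, or split each book into chunks of lines to see how the balance shifts over the course of the story.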

Turning everything in

When you’re all done, click on the “Knit” button at the top of the editing window and create an HTML or Word version (or PDF if you’ve installed tinytex) of your document. Upload that file to iCollege.