Keywords: R, text processing, Obama, State of the Union Address
https://programminghistorian.org/lessons/basic-text-processing-in-r
A substantial amount of historical data is now available in the form of raw, digitized text. Common examples include letters, newspaper articles, personal notes, diary entries, legal documents and transcribed speeches. While some stand-alone software applications provide tools for analyzing text data, a programming language offers greater flexibility for analyzing a corpus of text documents. In this tutorial we guide users through the basics of text analysis in the R programming language. Our approach uses only a tokenizer, which parses text into elements such as words, phrases and sentences. By the end of the lesson users will be able to analyze high-level patterns in texts, apply stylometric methods over time and across authors, and use summary methods to describe items in a corpus.
All of these will be demonstrated on a dataset from the text of United States Presidential State of the Union Addresses.
Institution: | The Programming Historian |
Year of publication: | 2017 |
Language: | english |
Type: | Tutorial |
Audience: | historians, philologists, linguists |
Level: | intermediate |
Prerequisites: | |
Media: | text/html |
Objective: | use R to analyze high-level patterns in texts, apply stylometric methods over time and across authors, and use summary methods to describe items in a corpus |
Licence: | cc-by-4.0 |
Access: | open |
Creation date: | Wednesday, 9 August 2017 12:04:51 |
Last modified: | Sunday, 21 April 2024 10:44:13 |
BibTeX type: | @misc |
@misc(TeLeMaCo:385, author = "Arnold, Taylor and Tilton, Lauren", title = "{B}asic {T}ext {P}rocessing in {R}", year = "2017", url = "https://programminghistorian.org/lessons/basic-text-processing-in-r" )