The Linguistic Teaching resources Hub
Image © Paul Watson, Licence CC BY-NC-SA 2.0

Basic Text Processing in R

Taylor Arnold, Lauren Tilton

Keywords: R, text processing, Obama, State of the Union Address

https://programminghistorian.org/lessons/basic-text-processing-in-r

A substantial amount of historical data is now available in the form of raw, digitized text. Common examples include letters, newspaper articles, personal notes, diary entries, legal documents and transcribed speeches. While some stand-alone software applications provide tools for analyzing text data, a programming language offers more flexibility for analyzing a corpus of text documents. In this tutorial we guide users through the basics of text analysis in the R programming language. Our approach uses only a tokenizer, which parses text into elements such as words, phrases and sentences. By the end of the lesson users will be able to:

  • employ exploratory analyses to check for errors and detect high-level patterns;
  • apply basic stylometric methods over time and across authors;
  • approach document summarization to provide a high-level description of the elements in a corpus.

All of these will be demonstrated on a dataset from the text of United States Presidential State of the Union Addresses.
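
To give a rough idea of what a tokenizer-based workflow looks like, here is a minimal sketch in base R. It is not the lesson's own code: the sample text is invented for illustration, and the regular expressions are a simplifying assumption that only handles plain English letters and basic punctuation.

```r
# Hypothetical sample text; not taken from the State of the Union corpus.
text <- "The state of the union is strong. We will keep it strong."

# Split into sentences at whitespace that follows sentence-ending punctuation.
sentences <- unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE))

# Lowercase the text, then split into words on runs of non-letter characters.
words <- unlist(strsplit(tolower(text), "[^a-z]+"))
words <- words[words != ""]

# Count how often each word occurs, most frequent first.
word_counts <- sort(table(words), decreasing = TRUE)

print(sentences)
print(head(word_counts, 5))
```

A dedicated tokenizer package would likely handle apostrophes, numbers and non-ASCII characters more carefully than this sketch does; the point here is only to show how word and sentence tokens feed into simple frequency counts.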


Resource details

Institution: The Programming Historian
Year of publication: 2017
Language: English
Type: Tutorial
Audience: historians, philologists, linguists
Level: intermediate
Prerequisites: R Basics with Tabular Data
Media: text/html
Objective: use R to analyze high-level patterns in texts, apply stylometric methods over time and across authors, and use summary methods to describe items in a corpus
Licence: cc-by-4.0
Access: open
Creation date: Wednesday, 9 August 2017 12:04:51
Last modified: Sunday, 21 April 2024 10:44:13
BibTeX type: @misc
BibTeX entry:
@misc(TeLeMaCo:385,
author = "Arnold, Taylor and Tilton, Lauren",
title = "{B}asic {T}ext {P}rocessing in {R}",
year = "2017",
url = "https://programminghistorian.org/lessons/basic-text-processing-in-r"
)
  
  
