The Linguistic Teaching resources Hub
Image © Paul Watson, Licence CC BY-NC-SA 2.0

Python Programming for the Humanities

* Folgert Karsdorp * Maarten van Gompel * Matt Munson *

Keywords: python, NLP, text processing

https://github.com/sonofmun/FK-python-course/blob/master/README.md

Python is now widely used in many scientific domains and is readily accessible to scholars from the Humanities. It is an excellent choice for dealing with the textual data, linguistic as well as literary, that is so typical of the Humanities. In this book you will be thoroughly introduced to the language and taught to program basic algorithmic procedures. The book expects no prior experience with programming, although we hope to provide some interesting insights and skills for more advanced programmers as well. The book consists of 10 chapters. Chapter 5 and Chapter 6 are still in draft status and not ready for use.

Chapter 1 starts with the very basics, where we will try to whet your appetite. You will be asked to do many short quizzes to test whether you really understand the material. Chapter 2 will introduce you to the task of text processing. You will learn how to read files from your computer, how to clean them, and how to compute a frequency distribution over words. Chapter 3 deals with preprocessing text. You will learn some of the elementary tools to analyse your data. Chapter 4 is a more theoretical chapter that explains some of the basic programming principles, common practices, and where to find documentation.

In Chapter 5 things become more difficult. First, you will write a program to compute the readability of texts. Next, you will implement the basic algorithm behind authorship attribution. In Chapter 6 we will introduce you to the concept of Object Oriented Programming. You will implement a network structure with which you can analyze relations between people on Twitter.

From Chapter 7 onwards, we will start working on more realistic applications. In Chapter 7 we will work on systems for searching through collections of text. We introduce you to the field of Information Retrieval and build a simple information retrieval system. This chapter furthers your knowledge of Object Oriented Programming. In Chapter 8 we create a complete web application to search through your own library of PDF files. This will be our first real application ready for use by end users. The chapter introduces you to many modules available in the standard library as well as third-party modules. Chapter 9 will introduce you to some of the more advanced techniques used in automatic classification. We will implement a naive Bayes classifier, show you a number of evaluation metrics and strategies, and briefly address the question of parameter optimization. Chapter 10 focuses on hierarchical clustering, one of the important methods for unsupervised learning. We explain the basic methods for doing hierarchical clustering and create a simple implementation in Python.
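
To give a flavour of the kind of task described for Chapter 2 (cleaning a text and computing a frequency distribution over words), here is a minimal sketch. It is not taken from the course materials: the function names `clean` and `word_frequencies` and the sample sentence are assumptions made for illustration, and the text is given as a string rather than read from a file so that the example is self-contained.

```python
import string
from collections import Counter


def clean(text):
    """Lowercase the text and strip ASCII punctuation (a deliberately simple cleaning step)."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table)


def word_frequencies(text):
    """Return a Counter mapping each whitespace-separated word to its number of occurrences."""
    return Counter(clean(text).split())


# Hypothetical sample text; in practice this would be read from a file.
sample = "The cat sat on the mat. The mat was flat."
freqs = word_frequencies(sample)
print(freqs.most_common(2))  # [('the', 3), ('mat', 2)]
```

Using `collections.Counter` from the standard library keeps the counting step to a single line; a plain dictionary with a loop would work just as well and may be closer to what a beginner would write first.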

This document describes the installation procedure for all the software needed for the Python class. If you get stuck anywhere in the installation procedure, please do not hesitate to contact Folgert Karsdorp (folgert.karsdorp@meertens.knaw.nl).


Resource details

Institution: Meertens Institute
Year of publication: 2014
Language: English
Type: Course
Audience: Humanists, Linguists
Level: intermediate
Prerequisites: None

Media: application/x-ipynb
Objective: Teaching NLP in Python

Licence:
Access: open
Creation date: Monday, 13 May 2019 14:24:17
Last modified: Wednesday, 24 April 2024 06:38:03
BibTeX type: @misc
BibTeX entry:
@misc(TeLeMaCo:417,
author = "Karsdorp, Folgert and Gompel, Maarten van and Munson, Matt",
title = "{P}ython {P}rogramming for the {H}umanities",
year = "2014",
url = "https://github.com/sonofmun/FK-python-course/blob/master/README.md"
)
