The Linguistic Teaching resources Hub
Image © Paul Watson, Licence CC BY-NC-SA 2.0

Cleaning OCR’d text with Regular Expressions

Laura Turner O'Hara

Keywords: OCR, Regular Expressions, Python

https://programminghistorian.org/en/lessons/cleaning-ocrd-text-with-regular-expressions

This tutorial explains the problems that come with OCR'd texts, introduces regular expressions, and then presents Python's re module for working with regular expressions in Python. The functions re.search and re.sub are explained in detail through a concrete application.
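The lesson's own code is not reproduced here. As a minimal sketch of how the two functions named above can work together, assuming a hypothetical OCR'd line (not taken from the lesson) and a few illustrative error patterns chosen for this example, re.sub normalizes misread characters first and re.search then extracts a date:

```python
import re

# Hypothetical OCR output (not from the tutorial): "l" misread for "1",
# a year broken up by stray spaces, and a space before the final period.
OCR_LINE = "Meeting held Jan. l5, 1 8 4 5 ."

MONTHS = r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"


def clean_ocr_line(line: str) -> str:
    """Apply a few illustrative re.sub substitutions for common OCR errors."""
    # Assumption: a lowercase "l" directly before a digit is a misread "1".
    line = re.sub(r"l(?=\d)", "1", line)
    # Rejoin a four-digit number whose digits were split by single spaces.
    line = re.sub(r"\b(\d) (\d) (\d) (\d)\b", r"\1\2\3\4", line)
    # Drop whitespace that OCR inserted before punctuation.
    return re.sub(r"\s+([.,;:])", r"\1", line)


def extract_date(line: str):
    """Use re.search to find an abbreviated date such as 'Jan. 15, 1845'."""
    match = re.search(MONTHS + r"\.\s*(\d{1,2}),\s*(\d{4})", line)
    # re.search returns None when the pattern does not occur in the line.
    return match.groups() if match else None


cleaned = clean_ocr_line(OCR_LINE)
print(cleaned)                 # Meeting held Jan. 15, 1845.
print(extract_date(cleaned))   # ('Jan', '15', '1845')
print(extract_date(OCR_LINE))  # None: "l5" and "1 8 4 5" defeat the pattern
```

Running re.search on the raw line finds nothing, which is why cleaning with re.sub comes first. The "l before a digit" rule is only a heuristic; on real OCR output each substitution should be checked against the source images before being applied to a whole corpus.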


Resource details

Institution: The Programming Historian
Year of publication: 2013
Language: English
Type: Tutorial
Audience: linguists, philologists, corpus linguists, psycholinguists, humanists
Level: basic
Prerequisites:

Basic Python knowledge

Media: text/html
Objective:

Learn to use regular expressions with Python

Licence: CC-BY 4.0
Access: open
Creation date: Friday, 13 September 2019 15:58:02
Last modified: Saturday, 27 April 2024 00:08:11
BibTeX type: @misc
BibTeX entry:
@misc(TeLeMaCo:423,
author = "O'Hara, Laura Turner",
title = "{C}leaning {O}{C}{R}’d text with {R}egular {E}xpressions",
year = "2013",
url = "https://programminghistorian.org/en/lessons/cleaning-ocrd-text-with-regular-expressions"
)
