The Linguistic Teaching resources Hub
Image © Paul Watson, Licence CC BY-NC-SA 2.0

Using Gazetteers to Extract Sets of Keywords from Free-Flowing Texts

* Adam Crymble *

Keywords: gazetteer, keyword extraction, text processing, python

http://programminghistorian.org/lessons/extracting-keywords

This lesson is aimed at anyone who works with historical sources that are stored locally on their own computer and transcribed into mutable electronic text (e.g., .txt, .xml, .rtf, .md). It is particularly useful for people who want to identify subsets of documents containing one or more of a fairly large number of keywords. Typical uses include identifying a relevant subset for closer reading, or extracting and structuring the keywords in a format that can be used in another tool, for example as input for a mapping exercise.
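As a rough illustration of the idea described above (and not the lesson's own code), the following minimal Python sketch checks a handful of texts against a small gazetteer and keeps only the documents that contain at least one keyword. The function names, the sample gazetteer, and the sample documents are all hypothetical, and whole-word, case-insensitive matching is an assumption made for this sketch.

```python
import re


def find_keywords(text, gazetteer):
    """Return the gazetteer entries that occur in text as whole words or phrases."""
    lowered = text.lower()
    found = []
    for keyword in gazetteer:
        # \b anchors the match at word boundaries, so "Oxford" does not match "Oxfordshire".
        pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
        if re.search(pattern, lowered):
            found.append(keyword)
    return found


def matching_subset(documents, gazetteer):
    """Map each document name to its matched keywords, skipping documents with none."""
    subset = {}
    for name, text in documents.items():
        matches = find_keywords(text, gazetteer)
        if matches:
            subset[name] = matches
    return subset


# Hypothetical sample data, for illustration only.
gazetteer = ["Oxford", "Cambridge", "St Andrews"]
documents = {
    "record1.txt": "Born in Oxford, later moved to St Andrews.",
    "record2.txt": "No place of origin is recorded.",
}
result = matching_subset(documents, gazetteer)
print(result)
```

In practice the documents would be read from local files and the gazetteer from a separate list; the resulting mapping of document names to keywords could then be written out in whatever structure a downstream tool expects.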

Resource details

Institution: The Programming Historian
Year of publication: 2015
Language: English
Type: text for self-study
Audience: historians, philologists, linguists
Level: intermediate
Prerequisites: Some Python knowledge
Media: text/html
Objective:
Licence: CC-BY-2.0
Access: open
Creation date: Wednesday, 5 April 2017 16:26:54
Last modified: Thursday, 2 May 2024 09:05:24
BibTeX type: @misc
BibTeX entry:
@misc(TeLeMaCo:374,
author = "Crymble, Adam",
title = "{U}sing {G}azetteers to {E}xtract {S}ets of {K}eywords from {F}ree-{F}lowing {T}exts",
year = "2015",
url = "http://programminghistorian.org/lessons/extracting-keywords"
)
