DATA101_projects/final_project.ipynb

{
"cells": [
{
"cell_type": "markdown",
"id": "261018f3",
"metadata": {},
"source": [
"# Setup and Testing\n",
"\n",
"We load seven plain-text files into the notebook for processing, one per book of Julius Caesar's _Commentaries on the Gallic Wars_, written in Latin in the first century BC.\n",
"\n",
"## Prerequisites\n",
"\n",
"1. Python 3.7, 3.8, or 3.9 (this notebook was run with Python 3.9.12)\n",
"2. [The Classical Language Toolkit (CLTK)](https://docs.cltk.org/en/latest/index.html), version 1.1.1 in this notebook\n",
"\n",
"## First steps"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "89dbb849",
"metadata": {},
"outputs": [],
"source": [
"from cltk import NLP"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "94f75b4e",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"𐤀 CLTK version '1.1.1'.\n",
"Pipeline for language 'Latin' (ISO: 'lat'): `LatinNormalizeProcess`, `LatinStanzaProcess`, `LatinEmbeddingsProcess`, `StopsProcess`, `LatinLexiconProcess`.\n"
]
}
],
"source": [
"cltk_nlp = NLP(language=\"lat\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "682f52b5",
"metadata": {},
"outputs": [],
"source": [
"# read the first file, Gallic Wars Book 1, which is in the same directory as this notebook\n",
"\n",
"with open(\"gall1.txt\") as fo:\n",
" caesar_book1 = fo.read()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "4ed6afa1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae, aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur. Hi omnes lingua, institutis, legibus inter se differun'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# text snippet\n",
"\n",
"caesar_book1[:200]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "be84648b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Character count: 57955\n",
"Approximate token count: 8173\n"
]
}
],
"source": [
"# let's get some estimates\n",
"\n",
"print(\"Character count:\", len(caesar_book1))\n",
"print(\"Approximate token count:\", len(caesar_book1.split()))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "14c14bfa",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[<class 'cltk.alphabet.processes.LatinNormalizeProcess'>, <class 'cltk.dependency.processes.LatinStanzaProcess'>, <class 'cltk.embeddings.processes.LatinEmbeddingsProcess'>, <class 'cltk.stops.processes.StopsProcess'>]\n"
]
}
],
"source": [
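"# remove ``LatinLexiconProcess``, the last process in the pipeline, before running cltk_nlp.analyze because it is slow;\n",
"# pop(-1) removes whatever comes last, so run this cell only once\n",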
"\n",
"cltk_nlp.pipeline.processes.pop(-1)\n",
"print(cltk_nlp.pipeline.processes)"
]
},
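{
"cell_type": "markdown",
"id": "c0a7e3f1",
"metadata": {},
"source": [
"A possible alternative to `pop(-1)`, sketched under the assumption (taken from the list printed above) that `cltk_nlp.pipeline.processes` is a plain list of process classes: remove the slow process by class name rather than by position. Unlike `pop(-1)`, this is safe to rerun, because a second run finds nothing to remove instead of silently dropping `StopsProcess`. The constant `SLOW_PROCESSES` is a hypothetical name."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5d2b9e84",
"metadata": {},
"outputs": [],
"source": [
"# Assumption: cltk_nlp.pipeline.processes is a list of process classes, as printed above.\n",
"# Slice assignment edits that same list in place, so the pipeline object sees the change.\n",
"SLOW_PROCESSES = {\"LatinLexiconProcess\"}\n",
"\n",
"cltk_nlp.pipeline.processes[:] = [\n",
"    process for process in cltk_nlp.pipeline.processes\n",
"    if process.__name__ not in SLOW_PROCESSES\n",
"]\n",
"print([process.__name__ for process in cltk_nlp.pipeline.processes])"
]
},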
{
"cell_type": "code",
"execution_count": 9,
"id": "62f04261",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: total: 2min 56s\n",
"Wall time: 2min 16s\n"
]
}
],
"source": [
"# now run the NLP pipeline on the input text\n",
"# this takes about 60 s on my ThinkPad T460s; the run recorded below took 2 min 16 s of wall time\n",
"\n",
"%time cltk_doc = cltk_nlp.analyze(text=caesar_book1)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "f6234c1e",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"['Gallia',\n",
" 'est',\n",
" 'omnis',\n",
" 'divisa',\n",
" 'in',\n",
" 'partes',\n",
" 'tres',\n",
" ',',\n",
" 'quarum',\n",
" 'unam']"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# have a look at the first 10 tokens\n",
"\n",
"cltk_doc.tokens[:10] # note that punctuation is included here"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "c2e9aeb9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Gallia',\n",
" 'est',\n",
" 'omnis',\n",
" 'divisa',\n",
" 'in',\n",
" 'partes',\n",
" 'tres',\n",
" 'quarum',\n",
" 'unam',\n",
" 'incolunt']"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# let's remove punctuation\n",
"\n",
"caesar_tokens_no_punct = [token for token in cltk_doc.tokens if token not in ['.', ',', ':', ';']]\n",
"caesar_tokens_no_punct[:10]"
]
},
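{
"cell_type": "markdown",
"id": "9e41c6a2",
"metadata": {},
"source": [
"The filter above drops only `.`, `,`, `:` and `;`, so tokens such as `?`, `(`, `)`, `[` and `]` survive (they appear in the Book 1 word counts further down). Here is a minimal sketch of a broader filter that uses only the standard library's `string.punctuation`. It strips punctuation from both ends of each token and drops tokens that end up empty. One trade-off is that abbreviations such as `M.` lose their final period. The function name `clean_tokens` is hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f7d0b53",
"metadata": {},
"outputs": [],
"source": [
"import string\n",
"\n",
"def clean_tokens(tokens):\n",
"    \"\"\"Strip punctuation from both ends of each token; drop tokens that become empty.\"\"\"\n",
"    stripped = (token.strip(string.punctuation) for token in tokens)\n",
"    return [token for token in stripped if token]\n",
"\n",
"# sample tokens taken from Book 1; for the whole book use clean_tokens(cltk_doc.tokens)\n",
"clean_tokens(['Gallia', ',', '[et', ']', '?', 'M.', 'quarum'])  # -> ['Gallia', 'et', 'M', 'quarum']"
]
},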
{
"cell_type": "code",
"execution_count": 43,
"id": "264c2d64",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Gallia', 'sum', 'omnis', 'divisa', 'in', 'pars', 'tres', ',', 'qui', 'unus']"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# instead of tokens (the word forms as they appear), let's look at the dictionary headwords, or \"lemmata\"\n",
"\n",
"cltk_doc.lemmata[:10]"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "55e72be4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Gallia',\n",
" 'sum',\n",
" 'omnis',\n",
" 'divisa',\n",
" 'in',\n",
" 'pars',\n",
" 'tres',\n",
" 'qui',\n",
" 'unus',\n",
" 'incaleo']"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# remove the same punctuation marks from the lemmata\n",
"\n",
"caesar_lemmata_no_punct = [token for token in cltk_doc.lemmata if token not in ['.', ',', ':', ';']]\n",
"caesar_lemmata_no_punct[:10]"
]
},
{
"cell_type": "markdown",
"id": "65ddfce2",
"metadata": {},
"source": [
"# Book 1"
]
},
{
"cell_type": "markdown",
"id": "4c889c94",
"metadata": {},
"source": [
"## A cursory look at Book 1 reveals that the Germanic king Ariovistus is the enemy Caesar mentions most often. Exactly how many times does Ariovistus appear in Book 1?"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "cd0f7ee6",
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter"
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "afcb7bfc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Counter({'Gallia': 15,\n",
" 'sum': 223,\n",
" 'omnis': 66,\n",
" 'divisa': 2,\n",
" 'in': 177,\n",
" 'pars': 26,\n",
" 'tres': 7,\n",
" 'qui': 213,\n",
" 'unus': 23,\n",
" 'incaleo': 4,\n",
" 'Belgae': 3,\n",
" 'alius': 12,\n",
" 'Aquitani': 1,\n",
" 'tertius': 10,\n",
" 'ipse': 44,\n",
" 'lingua': 3,\n",
" 'Celtae': 1,\n",
" 'noster': 39,\n",
" 'Galli': 1,\n",
" 'appello': 9,\n",
" 'is': 269,\n",
" 'instituo': 3,\n",
" 'lex': 3,\n",
" 'inter': 17,\n",
" 'se': 162,\n",
" 'differo': 1,\n",
" 'Gallos': 4,\n",
" 'ab': 102,\n",
" 'Aquitanis': 1,\n",
" 'Garumna': 3,\n",
" 'flumen': 21,\n",
" 'Belgis': 1,\n",
" 'Matrona': 1,\n",
" 'et': 193,\n",
" 'Sequana': 1,\n",
" 'disco': 5,\n",
" 'fortis': 2,\n",
" 'propterea': 15,\n",
" 'quod': 82,\n",
" 'cultus': 2,\n",
" 'atque': 75,\n",
" 'humanitas': 2,\n",
" 'provinciae': 8,\n",
" 'longe': 7,\n",
" 'absum': 8,\n",
" 'minimeque': 1,\n",
" 'ad': 107,\n",
" 'mercator': 1,\n",
" 'saepe': 5,\n",
" 'commeo': 1,\n",
" 'effemino': 1,\n",
" 'animus': 10,\n",
" 'pertineo': 6,\n",
" 'importo': 1,\n",
" 'proximique': 1,\n",
" 'Germanis': 5,\n",
" 'trans': 7,\n",
" 'Rhenum': 15,\n",
" 'quicum': 3,\n",
" 'continenter': 2,\n",
" 'bellum': 29,\n",
" 'gero': 8,\n",
" 'Qua': 2,\n",
" 'de': 36,\n",
" 'causa': 25,\n",
" 'Helvetii': 23,\n",
" 'quoque': 1,\n",
" 'reliquus': 17,\n",
" 'virtute': 10,\n",
" 'praecedo': 1,\n",
" 'fere': 3,\n",
" 'cotidianus': 2,\n",
" 'proelium': 32,\n",
" 'cum': 80,\n",
" 'contendo': 18,\n",
" 'aut': 35,\n",
" 'suus': 104,\n",
" 'finis': 37,\n",
" 'prohibeo': 11,\n",
" 'obtineo': 7,\n",
" 'dico': 33,\n",
" 'initium': 3,\n",
" 'capio': 6,\n",
" 'Rhodano': 2,\n",
" 'contineo': 5,\n",
" 'Oceanum': 2,\n",
" 'Belgarum': 1,\n",
" 'attingo': 1,\n",
" 'etiam': 17,\n",
" 'Sequanis': 10,\n",
" 'Helvetiis': 11,\n",
" 'vergit': 1,\n",
" 'septentrio': 4,\n",
" 'exterior': 2,\n",
" 'Galliae': 22,\n",
" 'orior': 2,\n",
" 'inferus': 1,\n",
" 'Rheni': 3,\n",
" 'specto': 3,\n",
" 'sol': 2,\n",
" 'Aquitania': 1,\n",
" 'Pyrenaeos': 1,\n",
" 'mons': 20,\n",
" 'Hispaniam': 1,\n",
" 'occasum': 1,\n",
" 'Apud': 1,\n",
" 'Helvetios': 14,\n",
" 'nobilis': 6,\n",
" 'ditussus': 1,\n",
" 'Orgetorix': 8,\n",
" 'Is': 6,\n",
" 'M.': 7,\n",
" 'Messala': 2,\n",
" '[et': 1,\n",
" 'P.': 2,\n",
" ']': 12,\n",
" 'Pisone': 3,\n",
" 'consulus': 3,\n",
" 'regnum': 7,\n",
" 'cupiditas': 3,\n",
" 'induco': 2,\n",
" 'coniuratio': 1,\n",
" 'nobilitas': 2,\n",
" 'facio': 75,\n",
" 'civitati': 1,\n",
" 'persuatio': 2,\n",
" 'ut': 73,\n",
" 'copua': 5,\n",
" 'exeo': 4,\n",
" 'perfacilis': 1,\n",
" 'praesto': 2,\n",
" 'totus': 16,\n",
" 'imperium': 8,\n",
" 'potior': 3,\n",
" 'hic': 50,\n",
" 'faciliter': 3,\n",
" 'undique': 2,\n",
" 'locus': 32,\n",
" 'natura': 3,\n",
" 'ex': 59,\n",
" 'Rheno': 1,\n",
" 'latus': 1,\n",
" 'altus': 3,\n",
" 'ager': 16,\n",
" 'Helvetium': 1,\n",
" 'alter': 11,\n",
" 'Iura': 1,\n",
" 'Sequanos': 10,\n",
" 'lacus': 2,\n",
" 'Lemanno': 2,\n",
" 'provinciam': 11,\n",
" 'His': 4,\n",
" 'res': 71,\n",
" 'minus': 15,\n",
" 'lates': 1,\n",
" 'vagarentur': 1,\n",
" 'facile': 3,\n",
" 'finisimus': 3,\n",
" 'infero': 11,\n",
" 'possum': 65,\n",
" 'homo': 21,\n",
" 'bello': 1,\n",
" 'cupidus': 3,\n",
" 'magnus': 47,\n",
" 'dolor': 3,\n",
" 'adficio': 2,\n",
" 'Pro': 1,\n",
" 'multitudo': 11,\n",
" 'autem': 8,\n",
" 'pro': 14,\n",
" 'gloria': 1,\n",
" 'fortitudo': 1,\n",
" 'angustos': 1,\n",
" 'habeo': 45,\n",
" 'arbitror': 6,\n",
" 'longitudo': 1,\n",
" 'minius': 3,\n",
" 'passus': 11,\n",
" 'CCXL': 1,\n",
" 'latitudo': 1,\n",
" 'CLXXX': 1,\n",
" 'pateo': 2,\n",
" 'adduco': 12,\n",
" 'auctoritas': 4,\n",
" 'permo': 1,\n",
" 'constituo': 11,\n",
" 'proficiscor': 6,\n",
" 'comparo': 5,\n",
" 'iumentum': 1,\n",
" 'carrum': 1,\n",
" 'quam': 27,\n",
" 'numerus': 12,\n",
" 'coemo': 1,\n",
" 'sementis': 1,\n",
" 'itiner': 12,\n",
" 'copia': 23,\n",
" 'frumentum': 13,\n",
" 'suppeto': 2,\n",
" 'proximus': 8,\n",
" 'civitatibus': 1,\n",
" 'pax': 9,\n",
" 'amicitia': 11,\n",
" 'confirmo': 5,\n",
" 'Ad': 4,\n",
" 'conficio': 8,\n",
" 'biennium': 1,\n",
" 'satis': 10,\n",
" 'duco': 10,\n",
" 'annus': 7,\n",
" 'profectio': 2,\n",
" 'deligo': 6,\n",
" 'legatio': 4,\n",
" 'civitates': 6,\n",
" 'suscipio': 4,\n",
" 'In': 3,\n",
" 'persuadeo': 4,\n",
" 'Castico': 1,\n",
" 'Catamantaloedis': 1,\n",
" 'filius': 4,\n",
" 'Sequano': 1,\n",
" 'pater': 6,\n",
" 'multus': 21,\n",
" 'senatus': 8,\n",
" 'populus': 43,\n",
" 'Romani': 30,\n",
" 'amicus': 8,\n",
" 'civitate': 6,\n",
" 'occupo': 11,\n",
" 'ante': 11,\n",
" 'item': 4,\n",
" 'Dumnorigi': 2,\n",
" 'Haeduo': 1,\n",
" 'frater': 11,\n",
" 'Diviciaci': 4,\n",
" 'tempus': 14,\n",
" 'principatus': 4,\n",
" 'magne': 4,\n",
" 'plebs': 3,\n",
" 'accipio': 8,\n",
" 'idem': 19,\n",
" 'conor': 10,\n",
" 'filia': 3,\n",
" 'matrimonium': 2,\n",
" 'do': 20,\n",
" 'Perfacile': 1,\n",
" 'factus': 1,\n",
" 'ille': 18,\n",
" 'probo': 1,\n",
" 'perficio': 3,\n",
" 'civitatis': 5,\n",
" 'obtendo': 1,\n",
" 'non': 88,\n",
" 'dubius': 1,\n",
" 'quin': 6,\n",
" 'suusquis': 1,\n",
" 'exercitus': 27,\n",
" 'regna': 1,\n",
" 'concilio': 1,\n",
" 'Hac': 4,\n",
" 'oratio': 8,\n",
" 'fides': 6,\n",
" 'ius': 12,\n",
" 'iuro': 5,\n",
" 'per': 33,\n",
" 'potens': 1,\n",
" 'firmus': 2,\n",
" 'spero': 2,\n",
" 'indicium': 1,\n",
" 'enuntio': 7,\n",
" 'Moribus': 1,\n",
" 'vinculis': 1,\n",
" 'coego': 1,\n",
" 'damno': 1,\n",
" 'poena': 3,\n",
" 'sequor': 7,\n",
" 'oportet': 9,\n",
" 'ignis': 2,\n",
" 'cremoreo': 1,\n",
" 'Die': 1,\n",
" 'dictio': 1,\n",
" 'iudicium': 4,\n",
" 'familia': 2,\n",
" 'mile': 14,\n",
" 'decem': 1,\n",
" 'unusquisque': 1,\n",
" 'cogo': 5,\n",
" 'cliens': 2,\n",
" 'obaero': 1,\n",
" 'conduco': 1,\n",
" 'ne': 31,\n",
" 'eripuo': 1,\n",
" 'Cum': 8,\n",
" 'civitas': 4,\n",
" 'ob': 5,\n",
" 'incito': 1,\n",
" 'armis': 9,\n",
" 'exsequor': 1,\n",
" 'magistro': 3,\n",
" 'morior': 1,\n",
" 'neque': 50,\n",
" 'suspicium': 1,\n",
" 'mors': 3,\n",
" 'consciverit': 1,\n",
" 'Post': 1,\n",
" 'nihil': 6,\n",
" 'Ubi': 8,\n",
" 'iam': 6,\n",
" 'paro': 5,\n",
" 'oppida': 4,\n",
" 'duodecim': 1,\n",
" 'vicos': 2,\n",
" 'quadringo': 1,\n",
" 'privata': 1,\n",
" 'aedificium': 1,\n",
" 'incendo': 2,\n",
" 'praeter': 4,\n",
" 'porto': 1,\n",
" 'comburo': 1,\n",
" 'domus': 14,\n",
" 'reditio': 1,\n",
" 'spes': 10,\n",
" 'tollo': 4,\n",
" 'periculum': 10,\n",
" 'subeo': 2,\n",
" 'mensus': 1,\n",
" 'molitus': 1,\n",
" 'cibarius': 1,\n",
" 'quisque': 6,\n",
" 'effero': 1,\n",
" 'iubeo': 7,\n",
" 'Persuadent': 1,\n",
" 'Rauracis': 1,\n",
" 'Tulingis': 1,\n",
" 'Latobrigis': 1,\n",
" 'utor': 32,\n",
" 'consilium': 12,\n",
" 'oppis': 2,\n",
" 'vicisque': 1,\n",
" 'exsumo': 1,\n",
" 'Boiosque': 1,\n",
" 'incolo': 1,\n",
" 'Noricum': 1,\n",
" 'transisro': 1,\n",
" 'Noreiamque': 1,\n",
" 'oppugno': 4,\n",
" 'recipio': 10,\n",
" 'socius': 6,\n",
" 'ascisco': 1,\n",
" 'Erant': 1,\n",
" 'omnino': 7,\n",
" 'duo': 10,\n",
" 'itener': 5,\n",
" 'angustus': 1,\n",
" 'difficilis': 2,\n",
" 'Iuram': 2,\n",
" 'Rhodanum': 5,\n",
" 'vix': 1,\n",
" 'singulus': 5,\n",
" 'caro': 3,\n",
" 'impendo': 1,\n",
" 'perpaux': 2,\n",
" 'multo': 3,\n",
" 'expedio': 3,\n",
" 'Helvetiorum': 13,\n",
" 'Allobrogum': 4,\n",
" 'nuper': 3,\n",
" 'paco': 1,\n",
" 'Rhodanus': 2,\n",
" 'fluo': 2,\n",
" 'isqe': 1,\n",
" 'nullus': 7,\n",
" 'vado': 1,\n",
" 'transeo': 13,\n",
" 'Extremum': 1,\n",
" 'oppidus': 4,\n",
" 'proximusque': 1,\n",
" 'Genava': 1,\n",
" 'Ex': 4,\n",
" 'oppido': 5,\n",
" 'pons': 3,\n",
" 'Allobrogibus': 4,\n",
" 'vel': 7,\n",
" 'persuatuo': 1,\n",
" 'nondum': 2,\n",
" 'bonum': 2,\n",
" 'Romanum': 6,\n",
" 'viderentur': 2,\n",
" 'existimo': 12,\n",
" 'vi': 1,\n",
" 'cogco': 1,\n",
" 'eo': 7,\n",
" 'patior': 8,\n",
" 'Omnus': 1,\n",
" 'dies': 34,\n",
" 'ripa': 4,\n",
" 'Rhodani': 2,\n",
" 'conveniant': 1,\n",
" 'a.': 1,\n",
" 'd.': 1,\n",
" 'V.': 1,\n",
" 'Kal.': 1,\n",
" 'Apr.': 1,\n",
" 'L.': 7,\n",
" 'A.': 1,\n",
" 'Gabinio': 1,\n",
" 'Caesari': 12,\n",
" 'nuntio': 6,\n",
" 'iter': 16,\n",
" 'maturo': 2,\n",
" 'urbs': 2,\n",
" 'Galliam': 13,\n",
" 'ultimus': 4,\n",
" 'Genavam': 2,\n",
" 'pervenit': 4,\n",
" 'Provinciae': 1,\n",
" 'miles': 15,\n",
" 'impero': 9,\n",
" '(': 4,\n",
" 'legium': 2,\n",
" ')': 4,\n",
" 'rescindo': 1,\n",
" 'adventu': 5,\n",
" 'certus': 7,\n",
" 'legatum': 15,\n",
" 'mitto': 25,\n",
" 'Nammeius': 1,\n",
" 'Verucloetius': 1,\n",
" 'princeps': 6,\n",
" 'sine': 12,\n",
" 'ullus': 4,\n",
" 'maleficium': 3,\n",
" 'rogo': 5,\n",
" 'voluntate': 7,\n",
" 'licet': 8,\n",
" 'Caesar': 43,\n",
" 'memoria': 6,\n",
" 'teneo': 11,\n",
" 'Cassium': 3,\n",
" 'consul': 2,\n",
" 'occis': 2,\n",
" 'pulgo': 3,\n",
" 'sub': 7,\n",
" 'iugum': 3,\n",
" 'concedo': 4,\n",
" 'puto': 7,\n",
" 'inimicus': 2,\n",
" 'facultas': 5,\n",
" 'tempero': 2,\n",
" 'iniurius': 16,\n",
" 'Tamen': 1,\n",
" 'spatium': 5,\n",
" 'intercedo': 3,\n",
" 'dum': 2,\n",
" 'convenirent': 1,\n",
" 'respondeo': 8,\n",
" 'delibero': 1,\n",
" 'sumo': 4,\n",
" 'si': 43,\n",
" 'quis': 16,\n",
" 'vellent': 5,\n",
" 'April.': 1,\n",
" 'reverterentur': 1,\n",
" 'Interea': 1,\n",
" 'legio': 17,\n",
" 'miletis': 1,\n",
" 'provincia': 4,\n",
" 'convenerant': 1,\n",
" 'influo': 2,\n",
" 'Sequanorum': 7,\n",
" 'XVIIII': 1,\n",
" 'murus': 2,\n",
" 'altitudo': 3,\n",
" 'pes': 5,\n",
" 'sedecim': 1,\n",
" 'fossamque': 1,\n",
" 'perduco': 1,\n",
" 'Eo': 7,\n",
" 'opus': 6,\n",
" 'perfectus': 2,\n",
" 'praesidium': 6,\n",
" 'dispono': 1,\n",
" 'castella': 1,\n",
" 'communis': 5,\n",
" 'invito': 2,\n",
" 'venit': 2,\n",
" 'lego': 7,\n",
" 'reverterunt': 2,\n",
" 'nego': 1,\n",
" 'mos': 1,\n",
" 'exemplum': 2,\n",
" 'vim': 3,\n",
" 'ostendo': 5,\n",
" 'deiecto': 1,\n",
" 'navibus': 2,\n",
" 'iungo': 2,\n",
" 'ratio': 6,\n",
" 'complur': 2,\n",
" 'vadis': 1,\n",
" 'paruus': 6,\n",
" 'numquam': 2,\n",
" 'interdio': 1,\n",
" 'saepius': 2,\n",
" 'noctus': 1,\n",
" 'perrumpo': 1,\n",
" 'munitio': 3,\n",
" 'concursus': 1,\n",
" 'teles': 1,\n",
" 'repuldo': 1,\n",
" 'conatus': 1,\n",
" 'desisto': 4,\n",
" 'Relinquebatur': 1,\n",
" 'via': 1,\n",
" 'invitis': 1,\n",
" 'propter': 12,\n",
" 'angustia': 3,\n",
" 'spons': 2,\n",
" 'Dumnorigem': 4,\n",
" 'Haeduum': 1,\n",
" 'deprecator': 1,\n",
" 'impetro': 5,\n",
" 'Dumnorix': 2,\n",
" 'gratia': 13,\n",
" 'largitio': 1,\n",
" 'apud': 12,\n",
" 'novis': 1,\n",
" 'studeo': 1,\n",
" 'beneficium': 7,\n",
" 'obstriho': 1,\n",
" 'volebat': 2,\n",
" 'Itaque': 4,\n",
" 'Sequani': 3,\n",
" 'renuntio': 2,\n",
" 'Haeduorum': 10,\n",
" 'Santonum': 1,\n",
" 'Tolosatium': 1,\n",
" 'intellego': 13,\n",
" 'bellicosus': 1,\n",
" 'maximeque': 1,\n",
" 'frumentarius': 4,\n",
" 'finitimus': 2,\n",
" 'Ob': 2,\n",
" 'T.': 2,\n",
" 'Labienum': 3,\n",
" 'praefico': 1,\n",
" 'Italiam': 2,\n",
" 'dui': 1,\n",
" 'ibi': 4,\n",
" 'conscribo': 2,\n",
" 'circum': 3,\n",
" 'Aquileiam': 1,\n",
" 'hiemo': 1,\n",
" 'hibernus': 3,\n",
" 'educo': 4,\n",
" 'Alpes': 1,\n",
" 'quinque': 1,\n",
" 'Ibi': 3,\n",
" 'Ceutrones': 1,\n",
" 'Graioceli': 1,\n",
" 'Caturiges': 1,\n",
" 'superior': 5,\n",
" 'Compluribus': 1,\n",
" 'puldeo': 3,\n",
" 'Ocelus': 1,\n",
" 'citer': 3,\n",
" 'Vocontiorum': 1,\n",
" 'septimus': 2,\n",
" 'inde': 2,\n",
" 'Segusiavos': 1,\n",
" 'extra': 1,\n",
" 'primus': 13,\n",
" 'traduxo': 1,\n",
" 'pervenerant': 1,\n",
" 'iderque': 3,\n",
" 'populo': 2,\n",
" 'Haedui': 8,\n",
" 'defendo': 4,\n",
" 'Caesarem': 17,\n",
" 'auxilium': 12,\n",
" 'ita': 12,\n",
" 'Romano': 10,\n",
" 'mereo': 3,\n",
" 'paes': 2,\n",
" 'conspectus': 3,\n",
" 'vastari': 1,\n",
" 'liber': 5,\n",
" '[eorum': 1,\n",
" 'servitutem': 2,\n",
" 'abduco': 1,\n",
" 'expugno': 1,\n",
" 'debeo': 3,\n",
" 'Ambarri': 1,\n",
" 'necessarius': 3,\n",
" 'consanguineus': 2,\n",
" 'depopulo': 1,\n",
" 'hostius': 14,\n",
" 'Item': 4,\n",
" 'Allobroges': 1,\n",
" 'possessio': 3,\n",
" 'fuga': 10,\n",
" 'demonstro': 1,\n",
" 'solum': 5,\n",
" 'Quibus': 4,\n",
" 'exspecto': 3,\n",
" 'statuo': 5,\n",
" 'fortuna': 5,\n",
" 'socium': 1,\n",
" 'consumo': 1,\n",
" 'Santonos': 1,\n",
" 'pervenirent': 1,\n",
" 'Flumen': 1,\n",
" 'Arar': 1,\n",
" 'incredibilis': 2,\n",
" 'lenitas': 1,\n",
" 'oculus': 2,\n",
" 'uter': 1,\n",
" 'iudico': 6,\n",
" 'ratus': 1,\n",
" 'lintus': 2,\n",
" 'explorator': 5,\n",
" 'trado': 5,\n",
" 'quartus': 4,\n",
" 'vero': 3,\n",
" 'citra': 1,\n",
" 'Ararim': 1,\n",
" 'vigilia': 5,\n",
" 'castra': 40,\n",
" 'profectus': 3,\n",
" 'transero': 1,\n",
" 'Eos': 1,\n",
" 'impedio': 3,\n",
" 'inopino': 1,\n",
" 'adgredior': 2,\n",
" 'concido': 1,\n",
" 'mando': 4,\n",
" 'silvas': 1,\n",
" 'abdo': 2,\n",
" 'pagus': 2,\n",
" 'Tigurinus': 1,\n",
" 'nam': 3,\n",
" 'Helvetia': 1,\n",
" 'quattuor': 4,\n",
" 'Hic': 3,\n",
" 'exedo': 2,\n",
" 'interfecio': 2,\n",
" 'Ita': 5,\n",
" 'sive': 5,\n",
" 'casus': 1,\n",
" 'deus': 2,\n",
" 'immortalis': 2,\n",
" 'Helvetiae': 1,\n",
" 'insignus': 2,\n",
" 'calamitas': 5,\n",
" 'persolo': 1,\n",
" 'publicus': 5,\n",
" 'sed': 19,\n",
" 'privatas': 1,\n",
" 'ultus': 1,\n",
" 'socer': 1,\n",
" 'Pisonis': 1,\n",
" 'avum': 1,\n",
" 'Pisonem': 1,\n",
" 'Tigurini': 1,\n",
" 'Hoc': 3,\n",
" 'consequor': 3,\n",
" 'Arari': 3,\n",
" 'curo': 2,\n",
" 'traduco': 7,\n",
" 'repentinus': 1,\n",
" 'commo': 2,\n",
" 'XX': 3,\n",
" 'aegerrime': 1,\n",
" 'Divico': 2,\n",
" 'Cassiano': 1,\n",
" 'dux': 2,\n",
" 'Caesare': 8,\n",
" 'ago': 8,\n",
" 'Romanus': 2,\n",
" 'iturus': 1,\n",
" 'ubi': 4,\n",
" 'esse': 5,\n",
" 'voluisset': 2,\n",
" 'sin': 1,\n",
" 'persequor': 1,\n",
" 'perseveraret': 1,\n",
" 'reminiscor': 1,\n",
" 'veteris': 2,\n",
" 'incommodus': 1,\n",
" 'pristinus': 1,\n",
" 'virtutis': 2,\n",
" 'Quod': 15,\n",
" 'improviso': 1,\n",
" 'pag': 1,\n",
" 'adoro': 1,\n",
" 'ii': 3,\n",
" 'fero': 8,\n",
" 'magnopere': 2,\n",
" 'virtuti': 1,\n",
" 'tribuo': 2,\n",
" 'despiceo': 1,\n",
" 'ego': 3,\n",
" 'magis': 4,\n",
" 'dolus': 1,\n",
" 'insidius': 1,\n",
" 'nitor': 1,\n",
" 'Quare': 1,\n",
" 'committo': 10,\n",
" 'constitto': 2,\n",
" 'internecio': 1,\n",
" 'nomen': 4,\n",
" 'prodeo': 3,\n",
" 'dubitatio': 1,\n",
" 'commemoro': 2,\n",
" 'gravius': 5,\n",
" 'meritum': 1,\n",
" 'accido': 8,\n",
" 'aliqui': 4,\n",
" 'conscius': 1,\n",
" 'cavere': 1,\n",
" 'decipio': 1,\n",
" 'quare': 4,\n",
" 'timeo': 5,\n",
" 'contumelia': 1,\n",
" 'oblivisci': 1,\n",
" 'vellet': 6,\n",
" 'num': 1,\n",
" 'recentius': 1,\n",
" 'tempto': 2,\n",
" 'Haeduos': 10,\n",
" 'Ambarros': 1,\n",
" 'Allobrogas': 1,\n",
" 'vexassent': 1,\n",
" 'depono': 1,\n",
" '?': 9,\n",
" 'victoria': 3,\n",
" 'tam': 7,\n",
" 'insolenter': 1,\n",
" 'glorio': 1,\n",
" 'quodque': 2,\n",
" 'diu': 5,\n",
" 'impeno': 1,\n",
" 'toleo': 2,\n",
" 'admiror': 1,\n",
" 'Consuesse': 1,\n",
" 'enim': 3,\n",
" 'commutatio': 1,\n",
" 'doleo': 1,\n",
" 'scelus': 1,\n",
" 'ulcio': 1,\n",
" 'velint': 3,\n",
" 'secundus': 3,\n",
" 'interdum': 2,\n",
" 'diuturnus': 1,\n",
" 'impunitas': 1,\n",
" 'tamen': 8,\n",
" 'obsis': 17,\n",
" 'polliceo': 1,\n",
" 'Haeduis': 11,\n",
" 'sociisque': 2,\n",
" 'consuero': 3,\n",
" 'testis': 2,\n",
" 'responluo': 1,\n",
" 'discedo': 5,\n",
" 'Postero': 1,\n",
" 'movent': 1,\n",
" 'Idem': 1,\n",
" 'equitasque': 3,\n",
" 'mineus': 4,\n",
" 'praemitto': 2,\n",
" 'videant': 1,\n",
" 'hos': 14,\n",
" 'Qui': 5,\n",
" 'cupidae': 1,\n",
" 'nosus': 4,\n",
" 'agmen': 3,\n",
" 'insequor': 4,\n",
" 'alienus': 1,\n",
" 'equitas': 12,\n",
" 'paucus': 7,\n",
" 'cado': 1,\n",
" 'Quo': 1,\n",
" 'quingo': 2,\n",
" 'equis': 7,\n",
" 'tantus': 9,\n",
" 'equitus': 8,\n",
" 'propulo': 1,\n",
" 'audacius': 2,\n",
" 'subsisto': 1,\n",
" 'agmo': 2,\n",
" 'lacesso': 3,\n",
" 'coepi': 9,\n",
" 'praesentia': 1,\n",
" 'hoso': 4,\n",
" 'rapina': 1,\n",
" 'pabulatio': 1,\n",
" 'populatio': 1,\n",
" 'circiter': 10,\n",
" 'XV': 3,\n",
" 'ample': 7,\n",
" 'quinis': 1,\n",
" 'senis': 1,\n",
" 'milus': 5,\n",
" 'interseo': 1,\n",
" 'Interim': 2,\n",
" 'cognodo': 1,\n",
" 'publice': 1,\n",
" 'pollicio': 4,\n",
" 'flagito': 1,\n",
" 'Nam': 3,\n",
" 'frigus': 1,\n",
" '[': 3,\n",
" 'pono': 4,\n",
" 'modus': 7,\n",
" 'maturus': 2,\n",
" 'pabulus': 1,\n",
" 'quidem': 8,\n",
" 'subvexerat': 1,\n",
" 'averterant': 1,\n",
" 'nolo': 3,\n",
" 'Diem': 1,\n",
" 'confero': 9,\n",
" 'comporto': 1,\n",
" 'adsum': 3,\n",
" 'diutius': 3,\n",
" 'inster': 2,\n",
" 'metior': 2,\n",
" 'convocatis': 1,\n",
" 'principis': 1,\n",
" 'Diviciaco': 3,\n",
" 'Lisco': 1,\n",
" 'summus': 15,\n",
" 'praeo': 3,\n",
" 'vergobretum': 1,\n",
" 'creo': 1,\n",
" 'annuus': 1,\n",
" 'vitae': 1,\n",
" 'nex': 1,\n",
" 'potestas': 5,\n",
" 'graviter': 1,\n",
" 'accuso': 2,\n",
" 'necessario': 2,\n",
" 'propinquus': 2,\n",
" 'hostis': 5,\n",
" 'sublevetur': 1,\n",
" 'praesertim': 2,\n",
" 'prex': 2,\n",
" 'destituo': 1,\n",
" 'quer': 1,\n",
" 'Tum': 3,\n",
" 'demum': 3,\n",
" 'Liscus': 1,\n",
" 'Caesaris': 7,\n",
" 'antea': 3,\n",
" 'taceo': 2,\n",
" 'propono': 2,\n",
" 'valeat': 1,\n",
" 'privatim': 1,\n",
" 'plus': 4,\n",
" 'Hos': 1,\n",
" 'seditio': 1,\n",
" 'imprus': 1,\n",
" 'deterreo': 2,\n",
" 'Gallorum': 8,\n",
" 'Romanorum': 1,\n",
" 'perfero': 2,\n",
" 'dubito': 3,\n",
" '[debeant': 1,\n",
" 'supero': 6,\n",
" 'libertas': 2,\n",
" 'ero': 2,\n",
" 'Ab': 1,\n",
" 'coerceo': 1,\n",
" 'Quin': 1,\n",
" 'quanto': 1,\n",
" 'Lisci': 1,\n",
" 'designo': 1,\n",
" 'sentio': 2,\n",
" 'praesum': 5,\n",
" 'iacto': 2,\n",
" 'celeriter': 2,\n",
" 'concilium': 6,\n",
" 'dimitto': 3,\n",
" 'Liscum': 1,\n",
" 'retineo': 2,\n",
" 'Quaerit': 1,\n",
" 'solus': 4,\n",
" 'conventu': 1,\n",
" 'Dicit': 1,\n",
" 'libere': 1,\n",
" 'secretum': 2,\n",
" 'quaero': 7,\n",
" 'repereo': 1,\n",
" 'vera': 2,\n",
" 'audacium': 1,\n",
" 'liberalitas': 2,\n",
" 'novarum': 1,\n",
" 'Complures': 1,\n",
" 'portoria': 1,\n",
" 'resiquuque': 1,\n",
" 'vectigalia': 2,\n",
" 'parvo': 1,\n",
" 'pretium': 1,\n",
" 'redeoueo': 1,\n",
" 'contra': 3,\n",
" 'audeo': 4,\n",
" 'nemo': 5,\n",
" 'familiaris': 4,\n",
" 'augio': 1,\n",
" 'largior': 1,\n",
" 'equitatus': 1,\n",
" 'sumptus': 1,\n",
" 'semper': 1,\n",
" 'aliero': 1,\n",
" 'largiter': 1,\n",
" 'potentia': 2,\n",
" 'mater': 3,\n",
" 'Biturigibus': 1,\n",
" 'illic': 1,\n",
" 'conloco': 5,\n",
" 'uxor': 2,\n",
" 'sorus': 1,\n",
" 'nuptus': 1,\n",
" 'Favere': 1,\n",
" 'cupeo': 1,\n",
" 'adfinitas': 1,\n",
" 'odi': 1,\n",
" 'Romanos': 2,\n",
" 'deminuo': 2,\n",
" 'Diviciacus': 3,\n",
" 'antiquus': 2,\n",
" 'honor': 2,\n",
" 'restituo': 4,\n",
" 'Si': 6,\n",
" 'Romanis': 4,\n",
" 'venire': 6,\n",
" 'despero': 4,\n",
" 'Reperiebat': 1,\n",
" 'equester': 2,\n",
" 'adversum': 2,\n",
" 'Dumnorige': 2,\n",
" 'perterro': 5,\n",
" 'cognosco': 12,\n",
" 'suspicio': 3,\n",
" 'accedo': 3,\n",
" 'iniussus': 1,\n",
" 'inscio': 1,\n",
" 'magistratus': 1,\n",
" 'animadverteret': 1,\n",
" 'civitatem': 3,\n",
" 'animadvertere': 1,\n",
" 'repugno': 1,\n",
" 'studium': 1,\n",
" 'voluntatem': 2,\n",
" 'egregius': 2,\n",
" 'iustitia': 1,\n",
" 'temperantia': 1,\n",
" 'cognoverat': 1,\n",
" 'supplicium': 3,\n",
" 'offendo': 1,\n",
" 'verebatur': 1,\n",
" 'prius': 4,\n",
" 'quisquam': 3,\n",
" 'Diviciacum': 2,\n",
" 'vocari': 1,\n",
" 'interpretus': 1,\n",
" 'remoueo': 3,\n",
" 'C.': 6,\n",
" 'Valerium': 2,\n",
" 'Troucillum': 1,\n",
" 'conloquor': 2,\n",
" 'simul': 1,\n",
" 'commonefacio': 1,\n",
" '[Gallorum': 1,\n",
" 'separatim': 2,\n",
" 'Petit': 1,\n",
" 'hortor': 1,\n",
" 'offensio': 1,\n",
" 'lacrima': 2,\n",
" 'compleo': 2,\n",
" 'obsecro': 1,\n",
" 'scio': 4,\n",
" 'nec': 2,\n",
" 'quimquam': 1,\n",
" 'domeo': 1,\n",
" 'adulescentia': 1,\n",
" 'credo': 1,\n",
" 'nervis': 1,\n",
" 'minuo': 1,\n",
" 'paene': 1,\n",
" 'pernicies': 1,\n",
" 'Sese': 1,\n",
" 'amor': 1,\n",
" 'fraternus': 2,\n",
" 'existimatio': 1,\n",
" 'vulgi': 1,\n",
" 'commoveri': 2,\n",
" 'averterentur': 1,\n",
" 'Haec': 5,\n",
" 'verbis': 2,\n",
" 'fleo': 1,\n",
" 'peto': 10,\n",
" ...})"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# count how many times each lemma appears in Book 1 (a Counter is a dictionary mapping each item to its count)\n",
"\n",
"caesar_word_counts = Counter(caesar_lemmata_no_punct)\n",
"caesar_word_counts"
]
},
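{
"cell_type": "markdown",
"id": "71b8e4c9",
"metadata": {},
"source": [
"`Counter` also provides `most_common`, which ranks entries by frequency. Keeping only the capitalized entries gives a rough shortlist of proper names worth checking. This is only a heuristic, because sentence-initial words such as `Qua` or `His` are capitalized too. In the sketch below, the toy counter `demo_counts` stands in for `caesar_word_counts`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a63f15d0",
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"# toy data standing in for caesar_word_counts\n",
"demo_counts = Counter([\"sum\", \"qui\", \"sum\", \"Caesar\", \"is\", \"sum\", \"Caesar\"])\n",
"print(demo_counts.most_common(2))  # [('sum', 3), ('Caesar', 2)]\n",
"\n",
"# capitalized entries only: a rough proxy for proper names\n",
"capitalized = [(word, n) for word, n in demo_counts.most_common() if word[:1].isupper()]\n",
"print(capitalized)  # [('Caesar', 2)]\n",
"\n",
"# on the real data: [(w, n) for w, n in caesar_word_counts.most_common() if w[:1].isupper()][:10]"
]
},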
{
"cell_type": "code",
"execution_count": 93,
"id": "db365cc0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"20"
]
},
"execution_count": 93,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"caesar_word_counts['Ariovistus']"
]
},
{
"cell_type": "code",
"execution_count": 76,
"id": "3babdc48",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"caesar_word_counts['Ariovistum']"
]
},
{
"cell_type": "markdown",
"id": "74495760",
"metadata": {},
"source": [
"## The two cells above show that the lemmatizer does not reduce proper names to a single lemma: 'Ariovistus' and 'Ariovistum' are still counted separately. We'll have to count the form for every grammatical case"
]
},
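{
"cell_type": "markdown",
"id": "4c9a7b2e",
"metadata": {},
"source": [
"Another way to catch every case form at once is to search the raw text for the name's stem with Python's `re` module. This sketch assumes that no unrelated word begins with the chosen stem. It also matches forms with an enclitic attached, such as a hypothetical `Ariovistique`. The function `count_stem` and the sample sentence are illustrative only."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e8d36f10",
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"def count_stem(text, stem):\n",
"    \"\"\"Count words in text that begin with stem, e.g. every case form of a name.\"\"\"\n",
"    pattern = r\"\\b\" + re.escape(stem) + r\"\\w*\"\n",
"    return len(re.findall(pattern, text))\n",
"\n",
"sample = \"Ariovistus respondit; legati Ariovisti ad Ariovistum redierunt.\"\n",
"print(count_stem(sample, \"Ariovist\"))  # 3\n",
"\n",
"# on the full book: count_stem(caesar_book1, \"Ariovist\")"
]
},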
{
"cell_type": "code",
"execution_count": 84,
"id": "c73fffd1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"42\n"
]
}
],
"source": [
"# count how many times Ariovistus is named in Book 1 by summing every case form of \"Ariovistus\": nominative, vocative, accusative, genitive, dative, ablative\n",
"\n",
"\n",
"nom = caesar_word_counts['Ariovistus']\n",
"voc = caesar_word_counts['Arioviste']\n",
"acc = caesar_word_counts['Ariovistum']\n",
"gen = caesar_word_counts['Ariovisti']\n",
"abl = caesar_word_counts['Ariovisto'] # same as dative case\n",
"\n",
"print(nom + acc + voc + gen + abl)"
]
},
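{
"cell_type": "markdown",
"id": "0b5e92d7",
"metadata": {},
"source": [
"The same pattern of one variable per case is repeated for every book below. A small helper can sum the counts of a list of forms instead. It is hypothetical and not part of CLTK. Because the forms pass through `set()`, a form that is listed twice, such as a vocative identical to its nominative, is counted only once. A missing form simply counts as 0, since that is how `Counter` treats missing keys."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3a18c46",
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"def count_forms(counts, forms):\n",
"    \"\"\"Sum the counts of each distinct form in forms (missing forms count as 0).\"\"\"\n",
"    return sum(counts[form] for form in set(forms))\n",
"\n",
"# toy counter standing in for Counter(cltk_doc.tokens)\n",
"demo = Counter([\"Ambiorix\", \"Ambiorigem\", \"Ambiorix\", \"Ambiorigis\", \"et\"])\n",
"forms = [\"Ambiorix\", \"Ambiorix\", \"Ambiorigem\", \"Ambiorigis\", \"Ambiorigi\", \"Ambiorige\"]\n",
"print(count_forms(demo, forms))  # 4\n",
"\n",
"# e.g. count_forms(Counter(cltk_doc2.tokens), ['Diviciacus', 'Diviciace', 'Diviciacum', 'Diviciaci', 'Diviciaco'])"
]
},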
{
"cell_type": "markdown",
"id": "b9e7aee4",
"metadata": {},
"source": [
"# Book 2\n",
"## Let's apply the same processing to Book 2, simplifying the code as we go. This time the target is the Druid Diviciacus"
]
},
{
"cell_type": "code",
"execution_count": 100,
"id": "2eb5d063",
"metadata": {},
"outputs": [],
"source": [
"with open(\"gall2.txt\") as fo:\n",
" caesar_book2 = fo.read()\n",
"\n",
"cltk_doc2 = cltk_nlp.analyze(text=caesar_book2)"
]
},
{
"cell_type": "code",
"execution_count": 113,
"id": "bdf36811",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5\n"
]
}
],
"source": [
"caesar_word_counts = Counter(cltk_doc2.tokens)\n",
"nom = caesar_word_counts['Diviciacus']\n",
"voc = caesar_word_counts['Diviciace']\n",
"acc = caesar_word_counts['Diviciacum']\n",
"gen = caesar_word_counts['Diviciaci']\n",
"abl = caesar_word_counts['Diviciaco'] # same as dative case\n",
"print(nom + acc + voc + gen + abl)\n"
]
},
{
"cell_type": "markdown",
"id": "c4760968",
"metadata": {},
"source": [
"# Book 3\n",
"## Viridovix, the Gallic Chieftain"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "ccb05754",
"metadata": {},
"outputs": [],
"source": [
"with open(\"gall3.txt\") as fo:\n",
" caesar_book3 = fo.read()\n",
"\n",
"cltk_doc3 = cltk_nlp.analyze(text=caesar_book3)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "6b4366a4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5\n"
]
}
],
"source": [
"caesar_word_counts = Counter(cltk_doc3.tokens)\n",
"\n",
"nom = caesar_word_counts['Viridovix']\n",
"# voc = caesar_word_counts['Viridovix'] # same as nominative\n",
"acc = caesar_word_counts['Viridovigem']\n",
"gen = caesar_word_counts['Viridovigis']\n",
"dat = caesar_word_counts['Viridovigi']\n",
"abl = caesar_word_counts['Viridovige']\n",
"print(nom + acc + gen + dat + abl)  # voc is identical to nom, so it is already counted\n"
]
},
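{
"cell_type": "markdown",
"id": "6a2d4e9b",
"metadata": {},
"source": [
"Viridovix, Ambiorix and Vercingetorix are all third-declension names in `-ix` whose other cases are built on a stem in `-ig-` (Viridovig-, Ambiorig-, Vercingetorig-), as the hand-written lists for Books 3, 5, 6 and 7 show. Here is a sketch that generates those singular forms from the nominative. It assumes exactly that pattern and will produce wrong forms for names that decline differently. The function `rix_forms` is hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d71c03f8",
"metadata": {},
"outputs": [],
"source": [
"def rix_forms(nominative):\n",
"    \"\"\"Singular case forms of a Latin name in -ix with an -ig- stem (assumed pattern).\"\"\"\n",
"    if not nominative.endswith(\"ix\"):\n",
"        raise ValueError(f\"expected a name ending in -ix, got {nominative!r}\")\n",
"    stem = nominative[:-1] + \"g\"  # Viridovix -> Viridovig\n",
"    # nominative (= vocative), then accusative, genitive, dative, ablative\n",
"    return [nominative] + [stem + ending for ending in (\"em\", \"is\", \"i\", \"e\")]\n",
"\n",
"for name in [\"Viridovix\", \"Ambiorix\", \"Vercingetorix\"]:\n",
"    print(rix_forms(name))\n",
"\n",
"# e.g. sum(Counter(cltk_doc3.tokens)[form] for form in rix_forms('Viridovix'))"
]
},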
{
"cell_type": "markdown",
"id": "ff2a3a78",
"metadata": {},
"source": [
"# Book 4\n",
"## Ariovistus is mentioned again, but only once. On to Book 5"
]
},
{
"cell_type": "markdown",
"id": "c9feaf60",
"metadata": {},
"source": [
"# Book 5\n",
"## The Belgic King and Chieftain Ambiorix"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "db4432da",
"metadata": {},
"outputs": [],
"source": [
"with open(\"gall5.txt\") as fo:\n",
" caesar_book5 = fo.read()\n",
"\n",
"cltk_doc5 = cltk_nlp.analyze(text=caesar_book5)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "ff8b94d2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"20\n"
]
}
],
"source": [
"caesar_word_counts = Counter(cltk_doc5.tokens)\n",
"\n",
"nom = caesar_word_counts['Ambiorix']\n",
"# voc = caesar_word_counts['Ambiorix'] # same as nominative\n",
"acc = caesar_word_counts['Ambiorigem']\n",
"gen = caesar_word_counts['Ambiorigis']\n",
"dat = caesar_word_counts['Ambiorigi']\n",
"abl = caesar_word_counts['Ambiorige']\n",
"print(nom + acc + gen + dat + abl)  # voc is identical to nom, so it is already counted\n"
]
},
{
"cell_type": "markdown",
"id": "d484c577",
"metadata": {},
"source": [
"# Book 6\n",
"## Ambiorix, once more"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "8b23a458",
"metadata": {},
"outputs": [],
"source": [
"with open(\"gall6.txt\") as fo:\n",
" caesar_book6 = fo.read()\n",
"\n",
"cltk_doc6 = cltk_nlp.analyze(text=caesar_book6)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "65adfaf5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"18\n"
]
}
],
"source": [
"caesar_word_counts = Counter(cltk_doc6.tokens)\n",
"\n",
"nom = caesar_word_counts['Ambiorix']\n",
"# voc = caesar_word_counts['Ambiorix'] # same as nominative\n",
"acc = caesar_word_counts['Ambiorigem']\n",
"gen = caesar_word_counts['Ambiorigis']\n",
"dat = caesar_word_counts['Ambiorigi']\n",
"abl = caesar_word_counts['Ambiorige']\n",
"print(nom + acc + gen + dat + abl)  # voc is identical to nom, so it is already counted\n"
]
},
{
"cell_type": "markdown",
"id": "d90188c5",
"metadata": {},
"source": [
"# Book 7\n",
"## Vercingetorix, King and Chieftain of the Arverni and leader of the unified Gallic revolt against the Romans. Of all the antagonists, he is the one Caesar mentions most."
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "ca39d70e",
"metadata": {},
"outputs": [],
"source": [
"with open(\"gall7.txt\") as fo:\n",
" caesar_book7 = fo.read()\n",
"\n",
"cltk_doc7 = cltk_nlp.analyze(text=caesar_book7)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "204e8dae",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"46\n"
]
}
],
"source": [
"caesar_word_counts = Counter(cltk_doc7.tokens)\n",
"\n",
"nom = caesar_word_counts['Vercingetorix']\n",
"# voc = caesar_word_counts['Vercingetorix'] # same as nominative\n",
"acc = caesar_word_counts['Vercingetorigem']\n",
"gen = caesar_word_counts['Vercingetorigis']\n",
"dat = caesar_word_counts['Vercingetorigi']\n",
"abl = caesar_word_counts['Vercingetorige']\n",
"print(nom + acc + gen + dat + abl)  # voc is identical to nom, so it is already counted\n"
]
},
{
"cell_type": "markdown",
"id": "2d5f7bf8",
"metadata": {},
"source": [
"# Results"
]
},
{
"cell_type": "markdown",
"id": "6f58d6fe",
"metadata": {},
"source": [
"### For the seven books of _The Gallic Wars_, we knew ahead of time which leaders to look for in each book. The task has been to count how often each one is named and to infer from those counts his relative importance in Caesar's account of the campaigns. These particular counts could also have been found with a text editor's search function, since for proper names we ended up listing every case form by hand. Still, the Classical Language Toolkit suits this text because its tools take the morphology of Latin into account. For a long text, such as a novel, a text editor quickly becomes impractical, and a more powerful set of natural language processing tools is often required."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}