{ "cells": [ { "cell_type": "markdown", "id": "261018f3", "metadata": {}, "source": [ "# Setup and Testing\n", "\n", "We'll bring plain text files into the notebook for processing. There are 7 text files, written in Latin in the first century BC, comprising Julius Caesar's _Commentaries on the Gallic Wars_.\n", "\n", "## Prerequisites\n", "\n", "1. Python versions 3.7, 3.8, or 3.9\n", "2. __The Classical Language Toolkit (https://docs.cltk.org/en/latest/index.html)__\n", "\n", "## First steps" ] }, { "cell_type": "code", "execution_count": 1, "id": "89dbb849", "metadata": {}, "outputs": [], "source": [ "from cltk import NLP" ] }, { "cell_type": "code", "execution_count": 2, "id": "94f75b4e", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "‎𐤀 CLTK version '1.1.1'.\n", "Pipeline for language 'Latin' (ISO: 'lat'): `LatinNormalizeProcess`, `LatinStanzaProcess`, `LatinEmbeddingsProcess`, `StopsProcess`, `LatinLexiconProcess`.\n" ] } ], "source": [ "cltk_nlp = NLP(language=\"lat\")" ] }, { "cell_type": "code", "execution_count": 5, "id": "682f52b5", "metadata": {}, "outputs": [], "source": [ "# read the first file, Gallic Wars Book 1, which is in the same directory as this notebook\n", "\n", "with open(\"gall1.txt\") as fo:\n", " caesar_book1 = fo.read()" ] }, { "cell_type": "code", "execution_count": 6, "id": "4ed6afa1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae, aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur. 
Hi omnes lingua, institutis, legibus inter se differun'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# text snippet: the first 200 characters\n", "\n", "caesar_book1[:200]" ] }, { "cell_type": "code", "execution_count": 7, "id": "be84648b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Character count: 57955\n", "Approximate token count: 8173\n" ] } ], "source": [ "# let's get some estimates (splitting on whitespace only approximates the token count)\n", "\n", "print(\"Character count:\", len(caesar_book1))\n", "print(\"Approximate token count:\", len(caesar_book1.split()))" ] }, { "cell_type": "code", "execution_count": 4, "id": "14c14bfa", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[, , , ]\n" ] } ], "source": [ "# remove ``LatinLexiconProcess`` (the last process in the pipeline) before running cltk_nlp.analyze because it's slow\n", "\n", "cltk_nlp.pipeline.processes.pop(-1)\n", "print(cltk_nlp.pipeline.processes)" ] }, { "cell_type": "code", "execution_count": 9, "id": "62f04261", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: total: 2min 56s\n", "Wall time: 2min 16s\n" ] } ], "source": [ "# now run the NLP pipeline on the input text\n", "# the recorded run took roughly two minutes on my ThinkPad T460s\n", "\n", "%time cltk_doc = cltk_nlp.analyze(text=caesar_book1)" ] }, { "cell_type": "code", "execution_count": 41, "id": "f6234c1e", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "['Gallia',\n", " 'est',\n", " 'omnis',\n", " 'divisa',\n", " 'in',\n", " 'partes',\n", " 'tres',\n", " ',',\n", " 'quarum',\n", " 'unam']" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# have a look at the first 10 tokens\n", "\n", "cltk_doc.tokens[:10] # note that punctuation is included here" ] }, { "cell_type": "code", "execution_count": 47, "id": "c2e9aeb9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Gallia',\n", " 'est',\n", " 'omnis',\n", " 
'divisa',\n", " 'in',\n", " 'partes',\n", " 'tres',\n", " 'quarum',\n", " 'unam',\n", " 'incolunt']" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# let's remove punctuation from the tokens\n", "\n", "caesar_tokens_no_punct = [token for token in cltk_doc.tokens if token not in ['.', ',', ':', ';']]\n", "caesar_tokens_no_punct[:10]" ] }, { "cell_type": "code", "execution_count": 43, "id": "264c2d64", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Gallia', 'sum', 'omnis', 'divisa', 'in', 'pars', 'tres', ',', 'qui', 'unus']" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# instead of tokens (words), let's find the dictionary forms, or the \"lemmata\"\n", "\n", "cltk_doc.lemmata[:10]" ] }, { "cell_type": "code", "execution_count": 49, "id": "55e72be4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Gallia',\n", " 'sum',\n", " 'omnis',\n", " 'divisa',\n", " 'in',\n", " 'pars',\n", " 'tres',\n", " 'qui',\n", " 'unus',\n", " 'incaleo']" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# let's remove punctuation from the lemmata as well\n", "\n", "caesar_lemmata_no_punct = [token for token in cltk_doc.lemmata if token not in ['.', ',', ':', ';']]\n", "caesar_lemmata_no_punct[:10]" ] }, { "cell_type": "markdown", "id": "65ddfce2", "metadata": {}, "source": [ "# Book 1" ] }, { "cell_type": "markdown", "id": "4c889c94", "metadata": {}, "source": [ "## A cursory look at Book 1 reveals that the German King Ariovistus is the enemy Caesar mentions most often. Exactly how many times can we find Ariovistus in Book 1?" 
] }, { "cell_type": "code", "execution_count": 7, "id": "cd0f7ee6", "metadata": {}, "outputs": [], "source": [ "from collections import Counter" ] }, { "cell_type": "code", "execution_count": 60, "id": "afcb7bfc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Counter({'Gallia': 15,\n", " 'sum': 223,\n", " 'omnis': 66,\n", " 'divisa': 2,\n", " 'in': 177,\n", " 'pars': 26,\n", " 'tres': 7,\n", " 'qui': 213,\n", " 'unus': 23,\n", " 'incaleo': 4,\n", " 'Belgae': 3,\n", " 'alius': 12,\n", " 'Aquitani': 1,\n", " 'tertius': 10,\n", " 'ipse': 44,\n", " 'lingua': 3,\n", " 'Celtae': 1,\n", " 'noster': 39,\n", " 'Galli': 1,\n", " 'appello': 9,\n", " 'is': 269,\n", " 'instituo': 3,\n", " 'lex': 3,\n", " 'inter': 17,\n", " 'se': 162,\n", " 'differo': 1,\n", " 'Gallos': 4,\n", " 'ab': 102,\n", " 'Aquitanis': 1,\n", " 'Garumna': 3,\n", " 'flumen': 21,\n", " 'Belgis': 1,\n", " 'Matrona': 1,\n", " 'et': 193,\n", " 'Sequana': 1,\n", " 'disco': 5,\n", " 'fortis': 2,\n", " 'propterea': 15,\n", " 'quod': 82,\n", " 'cultus': 2,\n", " 'atque': 75,\n", " 'humanitas': 2,\n", " 'provinciae': 8,\n", " 'longe': 7,\n", " 'absum': 8,\n", " 'minimeque': 1,\n", " 'ad': 107,\n", " 'mercator': 1,\n", " 'saepe': 5,\n", " 'commeo': 1,\n", " 'effemino': 1,\n", " 'animus': 10,\n", " 'pertineo': 6,\n", " 'importo': 1,\n", " 'proximique': 1,\n", " 'Germanis': 5,\n", " 'trans': 7,\n", " 'Rhenum': 15,\n", " 'quicum': 3,\n", " 'continenter': 2,\n", " 'bellum': 29,\n", " 'gero': 8,\n", " 'Qua': 2,\n", " 'de': 36,\n", " 'causa': 25,\n", " 'Helvetii': 23,\n", " 'quoque': 1,\n", " 'reliquus': 17,\n", " 'virtute': 10,\n", " 'praecedo': 1,\n", " 'fere': 3,\n", " 'cotidianus': 2,\n", " 'proelium': 32,\n", " 'cum': 80,\n", " 'contendo': 18,\n", " 'aut': 35,\n", " 'suus': 104,\n", " 'finis': 37,\n", " 'prohibeo': 11,\n", " 'obtineo': 7,\n", " 'dico': 33,\n", " 'initium': 3,\n", " 'capio': 6,\n", " 'Rhodano': 2,\n", " 'contineo': 5,\n", " 'Oceanum': 2,\n", " 'Belgarum': 1,\n", " 'attingo': 1,\n", " 
'etiam': 17,\n", " 'Sequanis': 10,\n", " 'Helvetiis': 11,\n", " 'vergit': 1,\n", " 'septentrio': 4,\n", " 'exterior': 2,\n", " 'Galliae': 22,\n", " 'orior': 2,\n", " 'inferus': 1,\n", " 'Rheni': 3,\n", " 'specto': 3,\n", " 'sol': 2,\n", " 'Aquitania': 1,\n", " 'Pyrenaeos': 1,\n", " 'mons': 20,\n", " 'Hispaniam': 1,\n", " 'occasum': 1,\n", " 'Apud': 1,\n", " 'Helvetios': 14,\n", " 'nobilis': 6,\n", " 'ditussus': 1,\n", " 'Orgetorix': 8,\n", " 'Is': 6,\n", " 'M.': 7,\n", " 'Messala': 2,\n", " '[et': 1,\n", " 'P.': 2,\n", " ']': 12,\n", " 'Pisone': 3,\n", " 'consulus': 3,\n", " 'regnum': 7,\n", " 'cupiditas': 3,\n", " 'induco': 2,\n", " 'coniuratio': 1,\n", " 'nobilitas': 2,\n", " 'facio': 75,\n", " 'civitati': 1,\n", " 'persuatio': 2,\n", " 'ut': 73,\n", " 'copua': 5,\n", " 'exeo': 4,\n", " 'perfacilis': 1,\n", " 'praesto': 2,\n", " 'totus': 16,\n", " 'imperium': 8,\n", " 'potior': 3,\n", " 'hic': 50,\n", " 'faciliter': 3,\n", " 'undique': 2,\n", " 'locus': 32,\n", " 'natura': 3,\n", " 'ex': 59,\n", " 'Rheno': 1,\n", " 'latus': 1,\n", " 'altus': 3,\n", " 'ager': 16,\n", " 'Helvetium': 1,\n", " 'alter': 11,\n", " 'Iura': 1,\n", " 'Sequanos': 10,\n", " 'lacus': 2,\n", " 'Lemanno': 2,\n", " 'provinciam': 11,\n", " 'His': 4,\n", " 'res': 71,\n", " 'minus': 15,\n", " 'lates': 1,\n", " 'vagarentur': 1,\n", " 'facile': 3,\n", " 'finisimus': 3,\n", " 'infero': 11,\n", " 'possum': 65,\n", " 'homo': 21,\n", " 'bello': 1,\n", " 'cupidus': 3,\n", " 'magnus': 47,\n", " 'dolor': 3,\n", " 'adficio': 2,\n", " 'Pro': 1,\n", " 'multitudo': 11,\n", " 'autem': 8,\n", " 'pro': 14,\n", " 'gloria': 1,\n", " 'fortitudo': 1,\n", " 'angustos': 1,\n", " 'habeo': 45,\n", " 'arbitror': 6,\n", " 'longitudo': 1,\n", " 'minius': 3,\n", " 'passus': 11,\n", " 'CCXL': 1,\n", " 'latitudo': 1,\n", " 'CLXXX': 1,\n", " 'pateo': 2,\n", " 'adduco': 12,\n", " 'auctoritas': 4,\n", " 'permo': 1,\n", " 'constituo': 11,\n", " 'proficiscor': 6,\n", " 'comparo': 5,\n", " 'iumentum': 1,\n", " 'carrum': 1,\n", " 
'quam': 27,\n", " 'numerus': 12,\n", " 'coemo': 1,\n", " 'sementis': 1,\n", " 'itiner': 12,\n", " 'copia': 23,\n", " 'frumentum': 13,\n", " 'suppeto': 2,\n", " 'proximus': 8,\n", " 'civitatibus': 1,\n", " 'pax': 9,\n", " 'amicitia': 11,\n", " 'confirmo': 5,\n", " 'Ad': 4,\n", " 'conficio': 8,\n", " 'biennium': 1,\n", " 'satis': 10,\n", " 'duco': 10,\n", " 'annus': 7,\n", " 'profectio': 2,\n", " 'deligo': 6,\n", " 'legatio': 4,\n", " 'civitates': 6,\n", " 'suscipio': 4,\n", " 'In': 3,\n", " 'persuadeo': 4,\n", " 'Castico': 1,\n", " 'Catamantaloedis': 1,\n", " 'filius': 4,\n", " 'Sequano': 1,\n", " 'pater': 6,\n", " 'multus': 21,\n", " 'senatus': 8,\n", " 'populus': 43,\n", " 'Romani': 30,\n", " 'amicus': 8,\n", " 'civitate': 6,\n", " 'occupo': 11,\n", " 'ante': 11,\n", " 'item': 4,\n", " 'Dumnorigi': 2,\n", " 'Haeduo': 1,\n", " 'frater': 11,\n", " 'Diviciaci': 4,\n", " 'tempus': 14,\n", " 'principatus': 4,\n", " 'magne': 4,\n", " 'plebs': 3,\n", " 'accipio': 8,\n", " 'idem': 19,\n", " 'conor': 10,\n", " 'filia': 3,\n", " 'matrimonium': 2,\n", " 'do': 20,\n", " 'Perfacile': 1,\n", " 'factus': 1,\n", " 'ille': 18,\n", " 'probo': 1,\n", " 'perficio': 3,\n", " 'civitatis': 5,\n", " 'obtendo': 1,\n", " 'non': 88,\n", " 'dubius': 1,\n", " 'quin': 6,\n", " 'suusquis': 1,\n", " 'exercitus': 27,\n", " 'regna': 1,\n", " 'concilio': 1,\n", " 'Hac': 4,\n", " 'oratio': 8,\n", " 'fides': 6,\n", " 'ius': 12,\n", " 'iuro': 5,\n", " 'per': 33,\n", " 'potens': 1,\n", " 'firmus': 2,\n", " 'spero': 2,\n", " 'indicium': 1,\n", " 'enuntio': 7,\n", " 'Moribus': 1,\n", " 'vinculis': 1,\n", " 'coego': 1,\n", " 'damno': 1,\n", " 'poena': 3,\n", " 'sequor': 7,\n", " 'oportet': 9,\n", " 'ignis': 2,\n", " 'cremoreo': 1,\n", " 'Die': 1,\n", " 'dictio': 1,\n", " 'iudicium': 4,\n", " 'familia': 2,\n", " 'mile': 14,\n", " 'decem': 1,\n", " 'unusquisque': 1,\n", " 'cogo': 5,\n", " 'cliens': 2,\n", " 'obaero': 1,\n", " 'conduco': 1,\n", " 'ne': 31,\n", " 'eripuo': 1,\n", " 'Cum': 8,\n", " 'civitas': 
4,\n", " 'ob': 5,\n", " 'incito': 1,\n", " 'armis': 9,\n", " 'exsequor': 1,\n", " 'magistro': 3,\n", " 'morior': 1,\n", " 'neque': 50,\n", " 'suspicium': 1,\n", " 'mors': 3,\n", " 'consciverit': 1,\n", " 'Post': 1,\n", " 'nihil': 6,\n", " 'Ubi': 8,\n", " 'iam': 6,\n", " 'paro': 5,\n", " 'oppida': 4,\n", " 'duodecim': 1,\n", " 'vicos': 2,\n", " 'quadringo': 1,\n", " 'privata': 1,\n", " 'aedificium': 1,\n", " 'incendo': 2,\n", " 'praeter': 4,\n", " 'porto': 1,\n", " 'comburo': 1,\n", " 'domus': 14,\n", " 'reditio': 1,\n", " 'spes': 10,\n", " 'tollo': 4,\n", " 'periculum': 10,\n", " 'subeo': 2,\n", " 'mensus': 1,\n", " 'molitus': 1,\n", " 'cibarius': 1,\n", " 'quisque': 6,\n", " 'effero': 1,\n", " 'iubeo': 7,\n", " 'Persuadent': 1,\n", " 'Rauracis': 1,\n", " 'Tulingis': 1,\n", " 'Latobrigis': 1,\n", " 'utor': 32,\n", " 'consilium': 12,\n", " 'oppis': 2,\n", " 'vicisque': 1,\n", " 'exsumo': 1,\n", " 'Boiosque': 1,\n", " 'incolo': 1,\n", " 'Noricum': 1,\n", " 'transisro': 1,\n", " 'Noreiamque': 1,\n", " 'oppugno': 4,\n", " 'recipio': 10,\n", " 'socius': 6,\n", " 'ascisco': 1,\n", " 'Erant': 1,\n", " 'omnino': 7,\n", " 'duo': 10,\n", " 'itener': 5,\n", " 'angustus': 1,\n", " 'difficilis': 2,\n", " 'Iuram': 2,\n", " 'Rhodanum': 5,\n", " 'vix': 1,\n", " 'singulus': 5,\n", " 'caro': 3,\n", " 'impendo': 1,\n", " 'perpaux': 2,\n", " 'multo': 3,\n", " 'expedio': 3,\n", " 'Helvetiorum': 13,\n", " 'Allobrogum': 4,\n", " 'nuper': 3,\n", " 'paco': 1,\n", " 'Rhodanus': 2,\n", " 'fluo': 2,\n", " 'isqe': 1,\n", " 'nullus': 7,\n", " 'vado': 1,\n", " 'transeo': 13,\n", " 'Extremum': 1,\n", " 'oppidus': 4,\n", " 'proximusque': 1,\n", " 'Genava': 1,\n", " 'Ex': 4,\n", " 'oppido': 5,\n", " 'pons': 3,\n", " 'Allobrogibus': 4,\n", " 'vel': 7,\n", " 'persuatuo': 1,\n", " 'nondum': 2,\n", " 'bonum': 2,\n", " 'Romanum': 6,\n", " 'viderentur': 2,\n", " 'existimo': 12,\n", " 'vi': 1,\n", " 'cogco': 1,\n", " 'eo': 7,\n", " 'patior': 8,\n", " 'Omnus': 1,\n", " 'dies': 34,\n", " 'ripa': 4,\n", " 
'Rhodani': 2,\n", " 'conveniant': 1,\n", " 'a.': 1,\n", " 'd.': 1,\n", " 'V.': 1,\n", " 'Kal.': 1,\n", " 'Apr.': 1,\n", " 'L.': 7,\n", " 'A.': 1,\n", " 'Gabinio': 1,\n", " 'Caesari': 12,\n", " 'nuntio': 6,\n", " 'iter': 16,\n", " 'maturo': 2,\n", " 'urbs': 2,\n", " 'Galliam': 13,\n", " 'ultimus': 4,\n", " 'Genavam': 2,\n", " 'pervenit': 4,\n", " 'Provinciae': 1,\n", " 'miles': 15,\n", " 'impero': 9,\n", " '(': 4,\n", " 'legium': 2,\n", " ')': 4,\n", " 'rescindo': 1,\n", " 'adventu': 5,\n", " 'certus': 7,\n", " 'legatum': 15,\n", " 'mitto': 25,\n", " 'Nammeius': 1,\n", " 'Verucloetius': 1,\n", " 'princeps': 6,\n", " 'sine': 12,\n", " 'ullus': 4,\n", " 'maleficium': 3,\n", " 'rogo': 5,\n", " 'voluntate': 7,\n", " 'licet': 8,\n", " 'Caesar': 43,\n", " 'memoria': 6,\n", " 'teneo': 11,\n", " 'Cassium': 3,\n", " 'consul': 2,\n", " 'occis': 2,\n", " 'pulgo': 3,\n", " 'sub': 7,\n", " 'iugum': 3,\n", " 'concedo': 4,\n", " 'puto': 7,\n", " 'inimicus': 2,\n", " 'facultas': 5,\n", " 'tempero': 2,\n", " 'iniurius': 16,\n", " 'Tamen': 1,\n", " 'spatium': 5,\n", " 'intercedo': 3,\n", " 'dum': 2,\n", " 'convenirent': 1,\n", " 'respondeo': 8,\n", " 'delibero': 1,\n", " 'sumo': 4,\n", " 'si': 43,\n", " 'quis': 16,\n", " 'vellent': 5,\n", " 'April.': 1,\n", " 'reverterentur': 1,\n", " 'Interea': 1,\n", " 'legio': 17,\n", " 'miletis': 1,\n", " 'provincia': 4,\n", " 'convenerant': 1,\n", " 'influo': 2,\n", " 'Sequanorum': 7,\n", " 'XVIIII': 1,\n", " 'murus': 2,\n", " 'altitudo': 3,\n", " 'pes': 5,\n", " 'sedecim': 1,\n", " 'fossamque': 1,\n", " 'perduco': 1,\n", " 'Eo': 7,\n", " 'opus': 6,\n", " 'perfectus': 2,\n", " 'praesidium': 6,\n", " 'dispono': 1,\n", " 'castella': 1,\n", " 'communis': 5,\n", " 'invito': 2,\n", " 'venit': 2,\n", " 'lego': 7,\n", " 'reverterunt': 2,\n", " 'nego': 1,\n", " 'mos': 1,\n", " 'exemplum': 2,\n", " 'vim': 3,\n", " 'ostendo': 5,\n", " 'deiecto': 1,\n", " 'navibus': 2,\n", " 'iungo': 2,\n", " 'ratio': 6,\n", " 'complur': 2,\n", " 'vadis': 1,\n", " 
'paruus': 6,\n", " 'numquam': 2,\n", " 'interdio': 1,\n", " 'saepius': 2,\n", " 'noctus': 1,\n", " 'perrumpo': 1,\n", " 'munitio': 3,\n", " 'concursus': 1,\n", " 'teles': 1,\n", " 'repuldo': 1,\n", " 'conatus': 1,\n", " 'desisto': 4,\n", " 'Relinquebatur': 1,\n", " 'via': 1,\n", " 'invitis': 1,\n", " 'propter': 12,\n", " 'angustia': 3,\n", " 'spons': 2,\n", " 'Dumnorigem': 4,\n", " 'Haeduum': 1,\n", " 'deprecator': 1,\n", " 'impetro': 5,\n", " 'Dumnorix': 2,\n", " 'gratia': 13,\n", " 'largitio': 1,\n", " 'apud': 12,\n", " 'novis': 1,\n", " 'studeo': 1,\n", " 'beneficium': 7,\n", " 'obstriho': 1,\n", " 'volebat': 2,\n", " 'Itaque': 4,\n", " 'Sequani': 3,\n", " 'renuntio': 2,\n", " 'Haeduorum': 10,\n", " 'Santonum': 1,\n", " 'Tolosatium': 1,\n", " 'intellego': 13,\n", " 'bellicosus': 1,\n", " 'maximeque': 1,\n", " 'frumentarius': 4,\n", " 'finitimus': 2,\n", " 'Ob': 2,\n", " 'T.': 2,\n", " 'Labienum': 3,\n", " 'praefico': 1,\n", " 'Italiam': 2,\n", " 'dui': 1,\n", " 'ibi': 4,\n", " 'conscribo': 2,\n", " 'circum': 3,\n", " 'Aquileiam': 1,\n", " 'hiemo': 1,\n", " 'hibernus': 3,\n", " 'educo': 4,\n", " 'Alpes': 1,\n", " 'quinque': 1,\n", " 'Ibi': 3,\n", " 'Ceutrones': 1,\n", " 'Graioceli': 1,\n", " 'Caturiges': 1,\n", " 'superior': 5,\n", " 'Compluribus': 1,\n", " 'puldeo': 3,\n", " 'Ocelus': 1,\n", " 'citer': 3,\n", " 'Vocontiorum': 1,\n", " 'septimus': 2,\n", " 'inde': 2,\n", " 'Segusiavos': 1,\n", " 'extra': 1,\n", " 'primus': 13,\n", " 'traduxo': 1,\n", " 'pervenerant': 1,\n", " 'iderque': 3,\n", " 'populo': 2,\n", " 'Haedui': 8,\n", " 'defendo': 4,\n", " 'Caesarem': 17,\n", " 'auxilium': 12,\n", " 'ita': 12,\n", " 'Romano': 10,\n", " 'mereo': 3,\n", " 'paes': 2,\n", " 'conspectus': 3,\n", " 'vastari': 1,\n", " 'liber': 5,\n", " '[eorum': 1,\n", " 'servitutem': 2,\n", " 'abduco': 1,\n", " 'expugno': 1,\n", " 'debeo': 3,\n", " 'Ambarri': 1,\n", " 'necessarius': 3,\n", " 'consanguineus': 2,\n", " 'depopulo': 1,\n", " 'hostius': 14,\n", " 'Item': 4,\n", " 'Allobroges': 
1,\n", " 'possessio': 3,\n", " 'fuga': 10,\n", " 'demonstro': 1,\n", " 'solum': 5,\n", " 'Quibus': 4,\n", " 'exspecto': 3,\n", " 'statuo': 5,\n", " 'fortuna': 5,\n", " 'socium': 1,\n", " 'consumo': 1,\n", " 'Santonos': 1,\n", " 'pervenirent': 1,\n", " 'Flumen': 1,\n", " 'Arar': 1,\n", " 'incredibilis': 2,\n", " 'lenitas': 1,\n", " 'oculus': 2,\n", " 'uter': 1,\n", " 'iudico': 6,\n", " 'ratus': 1,\n", " 'lintus': 2,\n", " 'explorator': 5,\n", " 'trado': 5,\n", " 'quartus': 4,\n", " 'vero': 3,\n", " 'citra': 1,\n", " 'Ararim': 1,\n", " 'vigilia': 5,\n", " 'castra': 40,\n", " 'profectus': 3,\n", " 'transero': 1,\n", " 'Eos': 1,\n", " 'impedio': 3,\n", " 'inopino': 1,\n", " 'adgredior': 2,\n", " 'concido': 1,\n", " 'mando': 4,\n", " 'silvas': 1,\n", " 'abdo': 2,\n", " 'pagus': 2,\n", " 'Tigurinus': 1,\n", " 'nam': 3,\n", " 'Helvetia': 1,\n", " 'quattuor': 4,\n", " 'Hic': 3,\n", " 'exedo': 2,\n", " 'interfecio': 2,\n", " 'Ita': 5,\n", " 'sive': 5,\n", " 'casus': 1,\n", " 'deus': 2,\n", " 'immortalis': 2,\n", " 'Helvetiae': 1,\n", " 'insignus': 2,\n", " 'calamitas': 5,\n", " 'persolo': 1,\n", " 'publicus': 5,\n", " 'sed': 19,\n", " 'privatas': 1,\n", " 'ultus': 1,\n", " 'socer': 1,\n", " 'Pisonis': 1,\n", " 'avum': 1,\n", " 'Pisonem': 1,\n", " 'Tigurini': 1,\n", " 'Hoc': 3,\n", " 'consequor': 3,\n", " 'Arari': 3,\n", " 'curo': 2,\n", " 'traduco': 7,\n", " 'repentinus': 1,\n", " 'commo': 2,\n", " 'XX': 3,\n", " 'aegerrime': 1,\n", " 'Divico': 2,\n", " 'Cassiano': 1,\n", " 'dux': 2,\n", " 'Caesare': 8,\n", " 'ago': 8,\n", " 'Romanus': 2,\n", " 'iturus': 1,\n", " 'ubi': 4,\n", " 'esse': 5,\n", " 'voluisset': 2,\n", " 'sin': 1,\n", " 'persequor': 1,\n", " 'perseveraret': 1,\n", " 'reminiscor': 1,\n", " 'veteris': 2,\n", " 'incommodus': 1,\n", " 'pristinus': 1,\n", " 'virtutis': 2,\n", " 'Quod': 15,\n", " 'improviso': 1,\n", " 'pag': 1,\n", " 'adoro': 1,\n", " 'ii': 3,\n", " 'fero': 8,\n", " 'magnopere': 2,\n", " 'virtuti': 1,\n", " 'tribuo': 2,\n", " 'despiceo': 1,\n", " 
'ego': 3,\n", " 'magis': 4,\n", " 'dolus': 1,\n", " 'insidius': 1,\n", " 'nitor': 1,\n", " 'Quare': 1,\n", " 'committo': 10,\n", " 'constitto': 2,\n", " 'internecio': 1,\n", " 'nomen': 4,\n", " 'prodeo': 3,\n", " 'dubitatio': 1,\n", " 'commemoro': 2,\n", " 'gravius': 5,\n", " 'meritum': 1,\n", " 'accido': 8,\n", " 'aliqui': 4,\n", " 'conscius': 1,\n", " 'cavere': 1,\n", " 'decipio': 1,\n", " 'quare': 4,\n", " 'timeo': 5,\n", " 'contumelia': 1,\n", " 'oblivisci': 1,\n", " 'vellet': 6,\n", " 'num': 1,\n", " 'recentius': 1,\n", " 'tempto': 2,\n", " 'Haeduos': 10,\n", " 'Ambarros': 1,\n", " 'Allobrogas': 1,\n", " 'vexassent': 1,\n", " 'depono': 1,\n", " '?': 9,\n", " 'victoria': 3,\n", " 'tam': 7,\n", " 'insolenter': 1,\n", " 'glorio': 1,\n", " 'quodque': 2,\n", " 'diu': 5,\n", " 'impeno': 1,\n", " 'toleo': 2,\n", " 'admiror': 1,\n", " 'Consuesse': 1,\n", " 'enim': 3,\n", " 'commutatio': 1,\n", " 'doleo': 1,\n", " 'scelus': 1,\n", " 'ulcio': 1,\n", " 'velint': 3,\n", " 'secundus': 3,\n", " 'interdum': 2,\n", " 'diuturnus': 1,\n", " 'impunitas': 1,\n", " 'tamen': 8,\n", " 'obsis': 17,\n", " 'polliceo': 1,\n", " 'Haeduis': 11,\n", " 'sociisque': 2,\n", " 'consuero': 3,\n", " 'testis': 2,\n", " 'responluo': 1,\n", " 'discedo': 5,\n", " 'Postero': 1,\n", " 'movent': 1,\n", " 'Idem': 1,\n", " 'equitasque': 3,\n", " 'mineus': 4,\n", " 'praemitto': 2,\n", " 'videant': 1,\n", " 'hos': 14,\n", " 'Qui': 5,\n", " 'cupidae': 1,\n", " 'nosus': 4,\n", " 'agmen': 3,\n", " 'insequor': 4,\n", " 'alienus': 1,\n", " 'equitas': 12,\n", " 'paucus': 7,\n", " 'cado': 1,\n", " 'Quo': 1,\n", " 'quingo': 2,\n", " 'equis': 7,\n", " 'tantus': 9,\n", " 'equitus': 8,\n", " 'propulo': 1,\n", " 'audacius': 2,\n", " 'subsisto': 1,\n", " 'agmo': 2,\n", " 'lacesso': 3,\n", " 'coepi': 9,\n", " 'praesentia': 1,\n", " 'hoso': 4,\n", " 'rapina': 1,\n", " 'pabulatio': 1,\n", " 'populatio': 1,\n", " 'circiter': 10,\n", " 'XV': 3,\n", " 'ample': 7,\n", " 'quinis': 1,\n", " 'senis': 1,\n", " 'milus': 5,\n", " 
'interseo': 1,\n", " 'Interim': 2,\n", " 'cognodo': 1,\n", " 'publice': 1,\n", " 'pollicio': 4,\n", " 'flagito': 1,\n", " 'Nam': 3,\n", " 'frigus': 1,\n", " '[': 3,\n", " 'pono': 4,\n", " 'modus': 7,\n", " 'maturus': 2,\n", " 'pabulus': 1,\n", " 'quidem': 8,\n", " 'subvexerat': 1,\n", " 'averterant': 1,\n", " 'nolo': 3,\n", " 'Diem': 1,\n", " 'confero': 9,\n", " 'comporto': 1,\n", " 'adsum': 3,\n", " 'diutius': 3,\n", " 'inster': 2,\n", " 'metior': 2,\n", " 'convocatis': 1,\n", " 'principis': 1,\n", " 'Diviciaco': 3,\n", " 'Lisco': 1,\n", " 'summus': 15,\n", " 'praeo': 3,\n", " 'vergobretum': 1,\n", " 'creo': 1,\n", " 'annuus': 1,\n", " 'vitae': 1,\n", " 'nex': 1,\n", " 'potestas': 5,\n", " 'graviter': 1,\n", " 'accuso': 2,\n", " 'necessario': 2,\n", " 'propinquus': 2,\n", " 'hostis': 5,\n", " 'sublevetur': 1,\n", " 'praesertim': 2,\n", " 'prex': 2,\n", " 'destituo': 1,\n", " 'quer': 1,\n", " 'Tum': 3,\n", " 'demum': 3,\n", " 'Liscus': 1,\n", " 'Caesaris': 7,\n", " 'antea': 3,\n", " 'taceo': 2,\n", " 'propono': 2,\n", " 'valeat': 1,\n", " 'privatim': 1,\n", " 'plus': 4,\n", " 'Hos': 1,\n", " 'seditio': 1,\n", " 'imprus': 1,\n", " 'deterreo': 2,\n", " 'Gallorum': 8,\n", " 'Romanorum': 1,\n", " 'perfero': 2,\n", " 'dubito': 3,\n", " '[debeant': 1,\n", " 'supero': 6,\n", " 'libertas': 2,\n", " 'ero': 2,\n", " 'Ab': 1,\n", " 'coerceo': 1,\n", " 'Quin': 1,\n", " 'quanto': 1,\n", " 'Lisci': 1,\n", " 'designo': 1,\n", " 'sentio': 2,\n", " 'praesum': 5,\n", " 'iacto': 2,\n", " 'celeriter': 2,\n", " 'concilium': 6,\n", " 'dimitto': 3,\n", " 'Liscum': 1,\n", " 'retineo': 2,\n", " 'Quaerit': 1,\n", " 'solus': 4,\n", " 'conventu': 1,\n", " 'Dicit': 1,\n", " 'libere': 1,\n", " 'secretum': 2,\n", " 'quaero': 7,\n", " 'repereo': 1,\n", " 'vera': 2,\n", " 'audacium': 1,\n", " 'liberalitas': 2,\n", " 'novarum': 1,\n", " 'Complures': 1,\n", " 'portoria': 1,\n", " 'resiquuque': 1,\n", " 'vectigalia': 2,\n", " 'parvo': 1,\n", " 'pretium': 1,\n", " 'redeoueo': 1,\n", " 'contra': 3,\n", 
" 'audeo': 4,\n", " 'nemo': 5,\n", " 'familiaris': 4,\n", " 'augio': 1,\n", " 'largior': 1,\n", " 'equitatus': 1,\n", " 'sumptus': 1,\n", " 'semper': 1,\n", " 'aliero': 1,\n", " 'largiter': 1,\n", " 'potentia': 2,\n", " 'mater': 3,\n", " 'Biturigibus': 1,\n", " 'illic': 1,\n", " 'conloco': 5,\n", " 'uxor': 2,\n", " 'sorus': 1,\n", " 'nuptus': 1,\n", " 'Favere': 1,\n", " 'cupeo': 1,\n", " 'adfinitas': 1,\n", " 'odi': 1,\n", " 'Romanos': 2,\n", " 'deminuo': 2,\n", " 'Diviciacus': 3,\n", " 'antiquus': 2,\n", " 'honor': 2,\n", " 'restituo': 4,\n", " 'Si': 6,\n", " 'Romanis': 4,\n", " 'venire': 6,\n", " 'despero': 4,\n", " 'Reperiebat': 1,\n", " 'equester': 2,\n", " 'adversum': 2,\n", " 'Dumnorige': 2,\n", " 'perterro': 5,\n", " 'cognosco': 12,\n", " 'suspicio': 3,\n", " 'accedo': 3,\n", " 'iniussus': 1,\n", " 'inscio': 1,\n", " 'magistratus': 1,\n", " 'animadverteret': 1,\n", " 'civitatem': 3,\n", " 'animadvertere': 1,\n", " 'repugno': 1,\n", " 'studium': 1,\n", " 'voluntatem': 2,\n", " 'egregius': 2,\n", " 'iustitia': 1,\n", " 'temperantia': 1,\n", " 'cognoverat': 1,\n", " 'supplicium': 3,\n", " 'offendo': 1,\n", " 'verebatur': 1,\n", " 'prius': 4,\n", " 'quisquam': 3,\n", " 'Diviciacum': 2,\n", " 'vocari': 1,\n", " 'interpretus': 1,\n", " 'remoueo': 3,\n", " 'C.': 6,\n", " 'Valerium': 2,\n", " 'Troucillum': 1,\n", " 'conloquor': 2,\n", " 'simul': 1,\n", " 'commonefacio': 1,\n", " '[Gallorum': 1,\n", " 'separatim': 2,\n", " 'Petit': 1,\n", " 'hortor': 1,\n", " 'offensio': 1,\n", " 'lacrima': 2,\n", " 'compleo': 2,\n", " 'obsecro': 1,\n", " 'scio': 4,\n", " 'nec': 2,\n", " 'quimquam': 1,\n", " 'domeo': 1,\n", " 'adulescentia': 1,\n", " 'credo': 1,\n", " 'nervis': 1,\n", " 'minuo': 1,\n", " 'paene': 1,\n", " 'pernicies': 1,\n", " 'Sese': 1,\n", " 'amor': 1,\n", " 'fraternus': 2,\n", " 'existimatio': 1,\n", " 'vulgi': 1,\n", " 'commoveri': 2,\n", " 'averterentur': 1,\n", " 'Haec': 5,\n", " 'verbis': 2,\n", " 'fleo': 1,\n", " 'peto': 10,\n", " ...})" ] }, 
"execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# here is a dictionary of every word in the Book 1 along with how many times each word appears\n", "\n", "caesar_word_counts = Counter(caesar_lemmata_no_punct)\n", "caesar_word_counts" ] }, { "cell_type": "code", "execution_count": 93, "id": "db365cc0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "20" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "caesar_word_counts['Ariovistus']" ] }, { "cell_type": "code", "execution_count": 76, "id": "3babdc48", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "caesar_word_counts['Ariovistum']" ] }, { "cell_type": "markdown", "id": "74495760", "metadata": {}, "source": [ "## The above two lines show that the lemmatizer does not work for proper names. We'll have to search the text for every grammatical case" ] }, { "cell_type": "code", "execution_count": 84, "id": "c73fffd1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "42\n" ] } ], "source": [ "# show me how many times Ariovistus is named in this text, for every case of the word \"Ariovistus\", namely: nominative, vocative, accusative, genative, dative, ablative\n", "\n", "\n", "nom = caesar_word_counts['Ariovistus']\n", "voc = caesar_word_counts['Arioviste']\n", "acc = caesar_word_counts['Ariovistum']\n", "gen = caesar_word_counts['Ariovisti']\n", "abl = caesar_word_counts['Ariovisto'] # same as dative case\n", "\n", "print(nom + acc + voc + gen + abl)" ] }, { "cell_type": "markdown", "id": "b9e7aee4", "metadata": {}, "source": [ "# Book 2\n", "## Let's do the same processing on Book 2, simplifying the code as we go. 
We will choose another target, the Druid Diviciacus." ] }, { "cell_type": "code", "execution_count": 100, "id": "2eb5d063", "metadata": {}, "outputs": [], "source": [ "# the with-block closes the file on exit, so no explicit close() is needed\n", "with open(\"gall2.txt\") as fo:\n", " caesar_book2 = fo.read()\n", "\n", "cltk_doc2 = cltk_nlp.analyze(text=caesar_book2)" ] }, { "cell_type": "code", "execution_count": 113, "id": "bdf36811", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5\n" ] } ], "source": [ "# count the raw tokens this time, since proper names are not lemmatized anyway\n", "caesar_word_counts = Counter(cltk_doc2.tokens)\n", "nom = caesar_word_counts['Diviciacus']\n", "voc = caesar_word_counts['Diviciace']\n", "acc = caesar_word_counts['Diviciacum']\n", "gen = caesar_word_counts['Diviciaci']\n", "abl = caesar_word_counts['Diviciaco'] # the dative has the same form\n", "print(nom + acc + voc + gen + abl)\n" ] }, { "cell_type": "markdown", "id": "c4760968", "metadata": {}, "source": [ "# Book 3\n", "## Viridovix, the Gallic Chieftain" ] }, { "cell_type": "code", "execution_count": 18, "id": "ccb05754", "metadata": {}, "outputs": [], "source": [ "with open(\"gall3.txt\") as fo:\n", " caesar_book3 = fo.read()\n", "\n", "cltk_doc3 = cltk_nlp.analyze(text=caesar_book3)" ] }, { "cell_type": "code", "execution_count": 19, "id": "6b4366a4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5\n" ] } ], "source": [ "caesar_word_counts = Counter(cltk_doc3.tokens)\n", "\n", "# the vocative is identical to the nominative, so it is not counted separately\n", "nom = caesar_word_counts['Viridovix']\n", "acc = caesar_word_counts['Viridovigem']\n", "gen = caesar_word_counts['Viridovigis']\n", "dat = caesar_word_counts['Viridovigi']\n", "abl = caesar_word_counts['Viridovige']\n", "print(nom + acc + gen + dat + abl)\n" ] }, { "cell_type": "markdown", "id": "ff2a3a78", "metadata": {}, "source": [ "# Book 4\n", "## Ariovistus is mentioned again, but only once. 
On to Book 5." ] }, { "cell_type": "markdown", "id": "c9feaf60", "metadata": {}, "source": [ "# Book 5\n", "## The Belgic King and Chieftain Ambiorix" ] }, { "cell_type": "code", "execution_count": 15, "id": "db4432da", "metadata": {}, "outputs": [], "source": [ "with open(\"gall5.txt\") as fo:\n", " caesar_book5 = fo.read()\n", "\n", "cltk_doc5 = cltk_nlp.analyze(text=caesar_book5)" ] }, { "cell_type": "code", "execution_count": 17, "id": "ff8b94d2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "20\n" ] } ], "source": [ "caesar_word_counts = Counter(cltk_doc5.tokens)\n", "\n", "# the vocative is identical to the nominative, so it is not counted separately\n", "nom = caesar_word_counts['Ambiorix']\n", "acc = caesar_word_counts['Ambiorigem']\n", "gen = caesar_word_counts['Ambiorigis']\n", "dat = caesar_word_counts['Ambiorigi']\n", "abl = caesar_word_counts['Ambiorige']\n", "print(nom + acc + gen + dat + abl)\n" ] }, { "cell_type": "markdown", "id": "d484c577", "metadata": {}, "source": [ "# Book 6\n", "## Ambiorix, once more" ] }, { "cell_type": "code", "execution_count": 20, "id": "8b23a458", "metadata": {}, "outputs": [], "source": [ "with open(\"gall6.txt\") as fo:\n", " caesar_book6 = fo.read()\n", "\n", "cltk_doc6 = cltk_nlp.analyze(text=caesar_book6)" ] }, { "cell_type": "code", "execution_count": 21, "id": "65adfaf5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "18\n" ] } ], "source": [ "caesar_word_counts = Counter(cltk_doc6.tokens)\n", "\n", "# the vocative is identical to the nominative, so it is not counted separately\n", "nom = caesar_word_counts['Ambiorix']\n", "acc = caesar_word_counts['Ambiorigem']\n", "gen = caesar_word_counts['Ambiorigis']\n", "dat = caesar_word_counts['Ambiorigi']\n", "abl = caesar_word_counts['Ambiorige']\n", "print(nom + acc + gen + dat + abl)\n" ] }, { "cell_type": "markdown", "id": "d90188c5", "metadata": {}, "source": [ "# Book 7\n", "## Vercingetorix, King 
and Chieftain of the Arverni and leader of the unified Gallic revolt against the Romans. Of all the antagonists, he is mentioned most by Caesar." ] }, { "cell_type": "code", "execution_count": 24, "id": "ca39d70e", "metadata": {}, "outputs": [], "source": [ "with open(\"gall7.txt\") as fo:\n", " caesar_book7 = fo.read()\n", "\n", "cltk_doc7 = cltk_nlp.analyze(text=caesar_book7)" ] }, { "cell_type": "code", "execution_count": 26, "id": "204e8dae", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "46\n" ] } ], "source": [ "caesar_word_counts = Counter(cltk_doc7.tokens)\n", "\n", "# the vocative is identical to the nominative, so it is not counted separately\n", "nom = caesar_word_counts['Vercingetorix']\n", "acc = caesar_word_counts['Vercingetorigem']\n", "gen = caesar_word_counts['Vercingetorigis']\n", "dat = caesar_word_counts['Vercingetorigi']\n", "abl = caesar_word_counts['Vercingetorige']\n", "print(nom + acc + gen + dat + abl)\n" ] }, { "cell_type": "markdown", "id": "2d5f7bf8", "metadata": {}, "source": [ "# Results" ] }, { "cell_type": "markdown", "id": "6f58d6fe", "metadata": {}, "source": [ "### For these seven books of _The Gallic Wars_ we knew ahead of time which main foes Caesar mentions in each book. The task was to count the mentions in each text and to infer each foe's relative importance in the resistance to the Roman campaigns. The counts we arrived at here could also have been found with the search function of a text editor. But the methods provided by the Classical Language Toolkit suit this text because they take the morphology and syntax of the language into account. Moreover, a long text such as a novel might not be easy to handle in a text editor, and a more powerful set of natural language processing tools is often required." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }