National Film Registry Automation Guide
This guide explains how to automatically pull and set up data for National Film Registry (NFR) movies from any year.
Overview
The NFR automation system consists of:
- setup_nfr.py - Script to fetch LOC announcements and extract film data
- new_nfr.py - Script to create blog posts for NFR movies
- ollama - Local AI to help extract structured data from web pages
Quick Start
Basic Usage
# Setup data for a specific year
python3 scripts/setup_nfr.py 2023
# With a known URL
python3 scripts/setup_nfr.py 2015 --url "https://newsroom.loc.gov/news/..."
# Without ollama (basic extraction)
python3 scripts/setup_nfr.py 2022 --no-ollama
With Ollama (Recommended)
Ollama provides much better extraction of film descriptions from the LOC announcements.
# Default (uses ollama at 192.168.0.109:11434)
python3 scripts/setup_nfr.py 2023
# Custom ollama host
python3 scripts/setup_nfr.py 2023 --ollama-host http://localhost:11434
# Custom model
python3 scripts/setup_nfr.py 2023 --ollama-model llama3.2:latest
Setting Up Ollama
What is Ollama?
Ollama is a tool for running large language models locally. We use it to:
- Parse HTML content from LOC announcements
- Extract film titles, years, and descriptions
- Structure the data into Python dictionaries
Installing Ollama
Your server at 192.168.0.109 should already have ollama running. To verify:
curl http://192.168.0.109:11434/api/tags
If you need to install it locally:
# Linux (on macOS, install the app from https://ollama.com/download)
curl -fsSL https://ollama.com/install.sh | sh
# Start the server
ollama serve
# Pull a model
ollama pull llama3.2
Ollama Configuration
The script uses these environment variables:
# Set custom ollama host
export OLLAMA_HOST=http://192.168.0.109:11434
# Set custom model (default: llama3.2)
export OLLAMA_MODEL=llama3.2
# Then run the script
python3 scripts/setup_nfr.py 2023
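The guide does not say which setting wins when a flag and an environment variable are both present. The sketch below assumes the common order (flag, then environment variable, then default); `resolve_ollama_settings` is a hypothetical helper, not a function taken from setup_nfr.py.

```python
import argparse
import os

DEFAULT_HOST = "http://192.168.0.109:11434"  # default host named in this guide
DEFAULT_MODEL = "llama3.2"                   # default model named in this guide


def resolve_ollama_settings(argv, environ=None):
    """Return (host, model), assuming flag > environment variable > default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--ollama-host")
    parser.add_argument("--ollama-model")
    # Ignore everything else on the command line (year, --url, ...).
    args, _unknown = parser.parse_known_args(argv)
    host = args.ollama_host or environ.get("OLLAMA_HOST") or DEFAULT_HOST
    model = args.ollama_model or environ.get("OLLAMA_MODEL") or DEFAULT_MODEL
    return host.rstrip("/"), model
```

If the real script resolves these differently, its behavior takes precedence over this sketch.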
Testing Ollama Connection
Test if ollama is accessible:
# Test API endpoint
curl http://192.168.0.109:11434/api/tags
# Test generation
curl http://192.168.0.109:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Say hello",
"stream": false
}'
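The same connectivity check can be done from Python, which is handy before a long extraction run. This is a minimal sketch using only the standard library; `list_ollama_models` and `model_names` are hypothetical names, and the host default is the one used throughout this guide.

```python
import json
import urllib.error
import urllib.request


def model_names(tags_payload):
    """Extract model names from a decoded /api/tags response."""
    return [m["name"] for m in tags_payload.get("models", [])]


def list_ollama_models(host="http://192.168.0.109:11434", timeout=5):
    """Return the model names the server reports, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            return model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a reply that is not valid JSON.
        return None
```

A result of None points to a network or server problem; an empty list means the server is up but has no models pulled.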
How It Works
Step 1: Find the LOC Announcement
The script needs the URL of the Library of Congress announcement for your year. For example:
- 2024: https://newsroom.loc.gov/news/25-films-named-to-national-film-registry-for-preservation/s/55d5285d-916f-4105-b7d4-7fc3ba8664e3
- 2023: Search at https://newsroom.loc.gov/
- Older: Check https://blogs.loc.gov/now-see-hear/
You can provide the URL with --url; otherwise the script prompts for it.
Step 2: Fetch the Content
The script downloads the HTML content from the announcement page.
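As an illustration of this step, the sketch below downloads a page with the standard library and reduces it to visible text. The text-reduction step is an assumption added here to keep the model input small; the guide only says the HTML is downloaded. `fetch_announcement` and `html_to_text` are hypothetical names.

```python
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text fragments of an HTML document, one per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_announcement(url, timeout=30):
    """Download the announcement page and decode it to a string."""
    request = urllib.request.Request(url, headers={"User-Agent": "nfr-setup-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```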
Step 3: Extract Film Data
With ollama (recommended):
- Sends the HTML to ollama
- Asks it to extract all 25 films with titles, years, and descriptions
- Returns structured JSON data
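A rough sketch of what the ollama path could look like, assuming the script calls the /api/generate endpoint shown earlier with JSON output requested. The prompt wording, the `films` key, and the function names are illustrative assumptions, not the contents of setup_nfr.py.

```python
import json
import urllib.request

# Hypothetical prompt; doubled braces are literal braces after .format().
PROMPT_TEMPLATE = (
    "Extract every film named in this National Film Registry announcement. "
    'Reply with JSON of the form {{"films": [{{"title": str, "year": int, '
    '"description": str}}]}}.\n\n{content}'
)


def build_generate_payload(content, model="llama3.2"):
    """Build the request body for a non-streaming, JSON-formatted generation."""
    return {
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(content=content),
        "stream": False,
        "format": "json",
    }


def parse_films(generate_response):
    """Turn a decoded /api/generate reply into {title: {year, description}}."""
    data = json.loads(generate_response["response"])
    films = {}
    for film in data.get("films", []):
        films[film["title"]] = {
            "year": int(film["year"]),
            "description": film.get("description", ""),
        }
    return films


def extract_with_ollama(content, host="http://192.168.0.109:11434", model="llama3.2"):
    """Send the page content to ollama and return the parsed films."""
    body = json.dumps(build_generate_payload(content, model)).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=300) as resp:
        return parse_films(json.load(resp))
```

Model replies can still be malformed or incomplete, so `parse_films` may raise on bad output; the review step described later remains necessary.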
Without ollama (fallback):
- Uses regex patterns to find film titles and years
- May miss descriptions or get incomplete data
- Requires manual review and editing
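To make the fallback concrete, here is a minimal regex sketch. It assumes each film appears at the start of a line as a title followed by a four-digit year in parentheses; the actual patterns in setup_nfr.py are not shown in this guide and may differ. `extract_basic` is a hypothetical name.

```python
import re

# Assumed line shape: Title (1991)
TITLE_YEAR = re.compile(
    r"^[ \t]*(?P<title>[^()\n]+?)[ \t]*\((?P<year>(?:18|19|20)\d{2})\)",
    re.MULTILINE,
)


def extract_basic(text):
    """Return {title: {year, description}} with empty descriptions."""
    films = {}
    for match in TITLE_YEAR.finditer(text):
        title = match.group("title").strip(" \"'\u201c\u201d")
        # Keep the first occurrence if a title is mentioned more than once.
        films.setdefault(title, {"year": int(match.group("year")), "description": ""})
    return films
```

Descriptions are left empty on purpose, which is why this path needs manual editing afterwards.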
Step 4: Generate Python Dictionary
Creates a Python file like:
# 2023 National Film Registry inductees with LOC descriptions
# Source: https://newsroom.loc.gov/news/...
NFR_2023 = {
"Film Title": {
"year": 1999,
"description": "Selected for its groundbreaking..."
},
# ... more films
}
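One way such a file could be produced from the extracted data is sketched below. It uses `pprint.pformat` to emit a valid Python literal; `render_nfr_module` is a hypothetical helper, and the exact layout written by setup_nfr.py may differ.

```python
import pprint


def render_nfr_module(year, source_url, films):
    """Return Python source defining NFR_<year> with a header comment."""
    header = (
        f"# {year} National Film Registry inductees with LOC descriptions\n"
        f"# Source: {source_url}\n"
    )
    # sort_dicts=False keeps the films in announcement order.
    body = pprint.pformat(films, indent=4, width=100, sort_dicts=False)
    return f"{header}NFR_{year} = {body}\n"
```

Writing the returned string to scripts/nfr_data/nfr_YEAR.py yields a module that can be reviewed, edited, and imported.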
Step 5: Integration
The generated file is saved to scripts/nfr_data/nfr_YEAR.py. You can then:
- Review and edit the file
- Copy the dictionary into scripts/new_nfr.py
- Update the script to handle the new year
Complete Example
Let's set up 2023 NFR data:
# 1. Run the setup script
python3 scripts/setup_nfr.py 2023
# The script will prompt:
# > Please find the LOC announcement URL for 2023.
# > Enter the URL: https://newsroom.loc.gov/news/...
# 2. Script fetches and extracts (using ollama)
# ✓ Extracted 25 films
# Preview:
# 1. Terminator 2 (1991)
# Recognized for groundbreaking visual effects...
# ... and 24 more
# 3. Confirm and save
# Save this data? (Y/n): y
# ✓ Saved to scripts/nfr_data/nfr_2023.py
# 4. Review the generated file
cat scripts/nfr_data/nfr_2023.py
# 5. Copy the dictionary into new_nfr.py
# (You can do this manually or we can create a script to merge)
Directory Structure
scripts/
├── setup_nfr.py # Main automation script
├── new_nfr.py # Create blog posts
├── nfr_data/ # Generated NFR data files
│ ├── nfr_2023.py
│ ├── nfr_2024.py
│ └── ...
└── NFR_AUTOMATION.md # This file
Troubleshooting
Ollama Connection Errors
# Check if ollama is running
curl http://192.168.0.109:11434/api/tags
# Check network connectivity
ping 192.168.0.109
# Try with localhost if running locally
python3 scripts/setup_nfr.py 2023 --ollama-host http://localhost:11434
Extraction Problems
If extraction fails:
# Try without ollama first (gets basic structure)
python3 scripts/setup_nfr.py 2023 --no-ollama
# Then manually edit the descriptions in nfr_data/nfr_2023.py
Model Not Found
# On the ollama server, pull the model
ssh user@192.168.0.109
ollama pull llama3.2
# Or use a different model you have
python3 scripts/setup_nfr.py 2023 --ollama-model mistral
Finding LOC Announcements
Recent Years (2010-present)
Check the newsroom:
https://newsroom.loc.gov/
Search for "national film registry" + year
Older Years
Check the blog:
https://blogs.loc.gov/now-see-hear/
Or the registry page:
https://www.loc.gov/programs/national-film-preservation-board/film-registry/
Complete Registry List
For a complete list by year:
https://www.loc.gov/programs/national-film-preservation-board/film-registry/complete-national-film-registry-listing/
Advanced Usage
Custom Output Location
python3 scripts/setup_nfr.py 2023 \
--output /tmp/nfr_2023.py
Batch Processing Multiple Years
# Create a simple loop
for year in 2020 2021 2022 2023; do
python3 scripts/setup_nfr.py "$year"
done
Using Different AI Models
# Llama 3.2 (default, good balance)
python3 scripts/setup_nfr.py 2023 --ollama-model llama3.2
# Mistral (faster, less accurate)
python3 scripts/setup_nfr.py 2023 --ollama-model mistral
# Larger models for better extraction (slower, need more memory)
python3 scripts/setup_nfr.py 2023 --ollama-model llama3.1:70b
Integration with new_nfr.py
After generating NFR data, integrate it into new_nfr.py:
Option 1: Manual Copy
- Open scripts/nfr_data/nfr_2023.py
- Copy the NFR_2023 dictionary
- Add it to scripts/new_nfr.py after NFR_2024
- Update the create_nfr_post function to check NFR_2023 too
Option 2: Import (Future Enhancement)
# In new_nfr.py
from nfr_data.nfr_2023 import NFR_2023
from nfr_data.nfr_2024 import NFR_2024
NFR_DATA = {
2023: NFR_2023,
2024: NFR_2024,
}
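The import approach could also avoid one import line per year by discovering the generated files. The sketch below assumes the nfr_YEAR.py naming and NFR_YEAR variable convention described in this guide; `load_nfr_data` and `find_film` are hypothetical helpers, not part of new_nfr.py.

```python
import importlib.util
import re
from pathlib import Path


def load_nfr_data(data_dir):
    """Map each year to the NFR_<year> dictionary found in nfr_<year>.py."""
    nfr_data = {}
    for path in sorted(Path(data_dir).glob("nfr_*.py")):
        match = re.fullmatch(r"nfr_(\d{4})", path.stem)
        if not match:
            continue
        year = int(match.group(1))
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        nfr_data[year] = getattr(module, f"NFR_{year}")
    return nfr_data


def find_film(nfr_data, title):
    """Return (year, entry) for the first year that lists the title, else None."""
    for year, films in nfr_data.items():
        if title in films:
            return year, films[title]
    return None
```

With this in place, adding a year would only require dropping a new file into scripts/nfr_data/.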
Tips
- Always review the output - AI extraction is good but not perfect
- Keep source URLs - Add them to the generated dictionaries
- Check film counts - Should be 25 films per year
- Verify years - Make sure film years are in reasonable ranges
- Edit descriptions - Feel free to trim or rephrase for your blog
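The count and year checks above are easy to automate. This is a small sketch with assumed bounds: the lower bound of 1880 is a loose guess at "reasonable", and the upper bound relies on the registry's rule that films must be at least 10 years old. `validate_films` is a hypothetical helper.

```python
EARLIEST_YEAR = 1880   # assumption: loose lower bound for a plausible film year
EXPECTED_COUNT = 25    # films inducted per year, as stated in this guide
MIN_AGE_YEARS = 10     # registry eligibility: at least 10 years old


def validate_films(films, induction_year, expected_count=EXPECTED_COUNT):
    """Return a list of human-readable problems; empty means nothing was flagged."""
    problems = []
    if len(films) != expected_count:
        problems.append(f"expected {expected_count} films, found {len(films)}")
    latest_year = induction_year - MIN_AGE_YEARS
    for title, entry in films.items():
        year = entry.get("year")
        if not isinstance(year, int) or not (EARLIEST_YEAR <= year <= latest_year):
            problems.append(f"{title}: suspicious year {year!r}")
        if not entry.get("description", "").strip():
            problems.append(f"{title}: missing description")
    return problems
```

An empty result does not mean the data is correct, only that none of these simple checks fired; reading the descriptions is still worthwhile.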
Next Steps
- Generate data for years you want to cover
- Review and edit the descriptions
- Integrate into new_nfr.py
- Start creating blog posts with python3 scripts/new_nfr.py "Film Title"
Questions?
- Check if ollama is running: curl http://192.168.0.109:11434/api/tags
- Test the script with 2024 (known working): python3 scripts/setup_nfr.py 2024
- Use --no-ollama to see basic extraction
- Look at generated files in scripts/nfr_data/