National Film Registry Automation Guide

This guide explains how to automatically pull and set up data for National Film Registry movies from any year.

Overview

The NFR automation system consists of:

  1. setup_nfr.py - Script to fetch LOC announcements and extract film data
  2. new_nfr.py - Script to create blog posts for NFR movies
  3. ollama - Local AI to help extract structured data from web pages

Quick Start

Basic Usage

# Setup data for a specific year
python3 scripts/setup_nfr.py 2023

# With a known URL
python3 scripts/setup_nfr.py 2015 --url "https://newsroom.loc.gov/news/..."

# Without ollama (basic extraction)
python3 scripts/setup_nfr.py 2022 --no-ollama

With ollama, the script extracts film descriptions from the LOC announcements far more completely than the regex-only fallback. The commands below select the ollama host and model.

# Default (uses ollama at 192.168.0.109:11434)
python3 scripts/setup_nfr.py 2023

# Custom ollama host
python3 scripts/setup_nfr.py 2023 --ollama-host http://localhost:11434

# Custom model
python3 scripts/setup_nfr.py 2023 --ollama-model llama3.2:latest

Setting Up Ollama

What is Ollama?

Ollama is a tool for running large language models locally. We use it to:

  • Parse HTML content from LOC announcements
  • Extract film titles, years, and descriptions
  • Structure the data into Python dictionaries

Installing Ollama

Your server at 192.168.0.109 should already have ollama running. To verify:

curl http://192.168.0.109:11434/api/tags

If you need to install it locally:

# Linux (on macOS, download the app from https://ollama.com/download)
curl -fsSL https://ollama.com/install.sh | sh

# Start the server
ollama serve

# Pull a model
ollama pull llama3.2

Ollama Configuration

The script uses these environment variables:

# Set custom ollama host
export OLLAMA_HOST=http://192.168.0.109:11434

# Set custom model (default: llama3.2)
export OLLAMA_MODEL=llama3.2

# Then run the script
python3 scripts/setup_nfr.py 2023
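
How the script combines these variables with its command-line flags is not spelled out here. A minimal sketch, assuming flags take precedence over environment variables and that the defaults are the host and model named in this guide (the function name `resolve_ollama_config` is hypothetical):

```python
import os

# Defaults as named in this guide; adjust if your setup differs.
DEFAULT_HOST = "http://192.168.0.109:11434"
DEFAULT_MODEL = "llama3.2"


def resolve_ollama_config(cli_host=None, cli_model=None, env=None):
    """Return (host, model), preferring CLI values, then environment, then defaults.

    The precedence order is an assumption, not confirmed behavior of setup_nfr.py.
    """
    env = os.environ if env is None else env
    host = cli_host or env.get("OLLAMA_HOST") or DEFAULT_HOST
    model = cli_model or env.get("OLLAMA_MODEL") or DEFAULT_MODEL
    # Drop a trailing slash so "/api/..." can be appended safely.
    return host.rstrip("/"), model
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.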

Testing Ollama Connection

Test if ollama is accessible:

# Test API endpoint
curl http://192.168.0.109:11434/api/tags

# Test generation
curl http://192.168.0.109:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Say hello",
  "stream": false
}'
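
The same check can be sketched in Python, which is handy if you want the script to fail early with a clear message. This assumes the `/api/tags` response is a JSON object with a `models` list whose entries carry a `name` field; the helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def model_names(tags_payload):
    """Extract model names from a decoded /api/tags response."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def has_model(tags_payload, wanted):
    """True if `wanted` matches an installed model, with or without a tag."""
    for name in model_names(tags_payload):
        if name == wanted or name.split(":", 1)[0] == wanted:
            return True
    return False


def check_ollama(host, wanted, timeout=5):
    """Return (reachable, model_present); network errors are reported, not raised."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return False, False
    return True, has_model(payload, wanted)
```

For example, `check_ollama("http://192.168.0.109:11434", "llama3.2")` distinguishes an unreachable server from a reachable server that is missing the model.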

How It Works

Step 1: Find the LOC Announcement

The script needs the URL of the Library of Congress announcement for your year. You can provide it with --url; otherwise the script will prompt you for it.

Step 2: Fetch the Content

The script downloads the HTML content from the announcement page.
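
A minimal sketch of such a download using only the standard library. The function names and the user-agent string are illustrative assumptions; the actual script may use a different HTTP client.

```python
import urllib.request


def decode_html(raw, charset=None):
    """Decode response bytes, falling back to UTF-8 with replacement characters."""
    try:
        return raw.decode(charset or "utf-8")
    except (LookupError, UnicodeDecodeError):
        return raw.decode("utf-8", errors="replace")


def fetch_html(url, timeout=30):
    """Download a page and return its text."""
    # Some servers reject urllib's default user agent, so send an explicit one.
    request = urllib.request.Request(url, headers={"User-Agent": "nfr-setup-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return decode_html(resp.read(), resp.headers.get_content_charset())
```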

Step 3: Extract Film Data

With ollama (recommended):

  • Sends the HTML to ollama
  • Asks it to extract all 25 films with titles, years, and descriptions
  • Returns structured JSON data
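
The request and response handling for this path might look like the sketch below. It assumes the non-streaming `/api/generate` endpoint with `"format": "json"`, and a prompt and reply schema (`films`, `title`, `year`, `description`) that are invented for illustration; the prompt used by setup_nfr.py is not shown in this guide.

```python
import json


def build_generate_payload(model, page_html):
    """Build a non-streaming /api/generate request body that asks for JSON output."""
    prompt = (
        "Extract every film named in this National Film Registry announcement. "
        'Reply with a JSON object of the form {"films": [{"title": str, '
        '"year": int, "description": str}]}.\n\n' + page_html
    )
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def parse_generate_response(body):
    """Turn a decoded /api/generate response into {title: {year, description}}."""
    data = json.loads(body["response"])  # the model's reply is itself a JSON string
    films = {}
    for item in data.get("films", []):
        films[item["title"]] = {
            "year": int(item["year"]),
            "description": item.get("description", "").strip(),
        }
    return films
```

The payload would be sent as a JSON POST body to the ollama host; keeping payload construction and parsing separate from the network call makes both easy to test offline.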

Without ollama (fallback):

  • Uses regex patterns to find film titles and years
  • May miss descriptions or get incomplete data
  • Requires manual review and editing
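
To make the fallback's limits concrete, here is a rough sketch of regex extraction. The pattern is hypothetical and only recognizes text shaped like `Title (1999)`; it leaves descriptions empty, which is why manual review is needed.

```python
import re

# Hypothetical pattern: a capitalized or numeric start, then a 4-digit year in parentheses.
TITLE_YEAR = re.compile(r"([A-Z0-9][^()<>\n]{0,80}?)\s*\((1[89]\d{2}|20\d{2})\)")


def extract_titles(text):
    """Return {title: {"year": int, "description": ""}} for each 'Title (Year)' match."""
    films = {}
    for match in TITLE_YEAR.finditer(text):
        title = match.group(1).strip(" \t\"'")
        if title and title not in films:  # keep the first occurrence only
            films[title] = {"year": int(match.group(2)), "description": ""}
    return films
```

In running prose the pattern will also capture leading words such as "The film", so results from this path should be treated as a starting point only.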

Step 4: Generate Python Dictionary

Creates a Python file like:

# 2023 National Film Registry inductees with LOC descriptions
# Source: https://newsroom.loc.gov/news/...
NFR_2023 = {
    "Film Title": {
        "year": 1999,
        "description": "Selected for its groundbreaking..."
    },
    # ... more films
}
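
One way to produce a file of this shape is to let `pprint` render the dictionary, which handles quoting and escaping of titles and descriptions. This is a sketch with a hypothetical function name; the output is equivalent to the layout above but uses `pprint`'s own quoting and line wrapping.

```python
import pprint


def render_nfr_module(year, source_url, films):
    """Return Python source text that defines NFR_<year> from a films dictionary."""
    # sort_dicts=False keeps the films in announcement order (Python 3.8+).
    body = pprint.pformat(films, indent=4, width=100, sort_dicts=False)
    return (
        f"# {year} National Film Registry inductees with LOC descriptions\n"
        f"# Source: {source_url}\n"
        f"NFR_{year} = {body}\n"
    )
```

Writing the returned string to scripts/nfr_data/nfr_YEAR.py yields a module that can be imported or copied by hand.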

Step 5: Integration

The generated file is saved to scripts/nfr_data/nfr_YEAR.py. You can then:

  1. Review and edit the file
  2. Copy the dictionary into scripts/new_nfr.py
  3. Update the script to handle the new year

Complete Example

Let's set up 2023 NFR data:

# 1. Run the setup script
python3 scripts/setup_nfr.py 2023

# The script will prompt:
# > Please find the LOC announcement URL for 2023.
# > Enter the URL: https://newsroom.loc.gov/news/...

# 2. Script fetches and extracts (using ollama)
# ✓ Extracted 25 films
# Preview:
#   1. Terminator 2 (1991)
#      Recognized for groundbreaking visual effects...
#   ... and 24 more

# 3. Confirm and save
# Save this data? (Y/n): y
# ✓ Saved to scripts/nfr_data/nfr_2023.py

# 4. Review the generated file
cat scripts/nfr_data/nfr_2023.py

# 5. Copy the dictionary into new_nfr.py
# (You can do this manually or we can create a script to merge)

Directory Structure

scripts/
├── setup_nfr.py          # Main automation script
├── new_nfr.py            # Create blog posts
├── nfr_data/             # Generated NFR data files
│   ├── nfr_2023.py
│   ├── nfr_2024.py
│   └── ...
└── NFR_AUTOMATION.md     # This file

Troubleshooting

Ollama Connection Errors

# Check if ollama is running
curl http://192.168.0.109:11434/api/tags

# Check network connectivity
ping 192.168.0.109

# Try with localhost if running locally
python3 scripts/setup_nfr.py 2023 --ollama-host http://localhost:11434

Extraction Problems

If extraction fails:

# Try without ollama first (gets basic structure)
python3 scripts/setup_nfr.py 2023 --no-ollama

# Then manually edit the descriptions in nfr_data/nfr_2023.py

Model Not Found

# On the ollama server, pull the model
ssh user@192.168.0.109
ollama pull llama3.2

# Or use a different model you have
python3 scripts/setup_nfr.py 2023 --ollama-model mistral

Finding LOC Announcements

Recent Years (2010-present)

Check the newsroom:

https://newsroom.loc.gov/

Search for "national film registry" + year

Older Years

Check the blog:

https://blogs.loc.gov/now-see-hear/

Or the registry page:

https://www.loc.gov/programs/national-film-preservation-board/film-registry/

Complete Registry List

For a complete list by year:

https://www.loc.gov/programs/national-film-preservation-board/film-registry/complete-national-film-registry-listing/

Advanced Usage

Custom Output Location

python3 scripts/setup_nfr.py 2023 \
  --output /tmp/nfr_2023.py

Batch Processing Multiple Years

# Create a simple loop
for year in 2020 2021 2022 2023; do
  python3 scripts/setup_nfr.py "$year"
done

Using Different AI Models

# Llama 3.2 (default, good balance)
python3 scripts/setup_nfr.py 2023 --ollama-model llama3.2

# Mistral (alternative model; speed and accuracy depend on the variant you pull)
python3 scripts/setup_nfr.py 2023 --ollama-model mistral

# Larger models for better extraction (need much more memory)
python3 scripts/setup_nfr.py 2023 --ollama-model llama3.1:70b

Integration with new_nfr.py

After generating NFR data, integrate it into new_nfr.py:

Option 1: Manual Copy

  1. Open scripts/nfr_data/nfr_2023.py
  2. Copy the NFR_2023 dictionary
  3. Add it to scripts/new_nfr.py after NFR_2024
  4. Update the create_nfr_post function to check NFR_2023 too

Option 2: Import (Future Enhancement)

# In new_nfr.py
from nfr_data.nfr_2023 import NFR_2023
from nfr_data.nfr_2024 import NFR_2024

NFR_DATA = {
    2023: NFR_2023,
    2024: NFR_2024,
}
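
With a year-keyed mapping like this, a single lookup can replace per-year checks. A minimal sketch, assuming `NFR_DATA` maps induction year to a films dictionary; `find_film` is a hypothetical helper, not an existing function in new_nfr.py.

```python
def find_film(nfr_data, title):
    """Return (induction_year, entry) for a title, matching case-insensitively."""
    wanted = title.casefold().strip()
    for induction_year, films in sorted(nfr_data.items()):
        for name, entry in films.items():
            if name.casefold() == wanted:
                return induction_year, entry
    return None  # title not present in any loaded year
```

A function such as create_nfr_post could call `find_film(NFR_DATA, title)` once and report a clear error when it returns `None`.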

Tips

  1. Always review the output - AI extraction is good but not perfect
  2. Keep source URLs - Add them to the generated dictionaries
  3. Check film counts - Should be 25 films per year
  4. Verify years - Make sure film years are in reasonable ranges
  5. Edit descriptions - Feel free to trim or rephrase for your blog
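
Tips 3 and 4 can be automated with a small check run after extraction. This sketch assumes the dictionary layout shown earlier; the lower bound of 1880 is an arbitrary assumption, and the upper bound uses the registry's rule that films must be at least 10 years old.

```python
def validate_films(films, induction_year, expected_count=25, earliest=1880, min_age=10):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if len(films) != expected_count:
        problems.append(f"expected {expected_count} films, found {len(films)}")
    latest = induction_year - min_age
    for title, entry in films.items():
        year = entry.get("year")
        if not isinstance(year, int) or not earliest <= year <= latest:
            problems.append(f"{title}: suspicious year {year!r}")
        if not entry.get("description", "").strip():
            problems.append(f"{title}: missing description")
    return problems
```

Printing the returned list before saving gives a quick signal of whether the extraction needs manual cleanup.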

Next Steps

  1. Generate data for years you want to cover
  2. Review and edit the descriptions
  3. Integrate into new_nfr.py
  4. Start creating blog posts with python3 scripts/new_nfr.py "Film Title"

Questions?

  • Check if ollama is running: curl http://192.168.0.109:11434/api/tags
  • Test the script with 2024 (known working): python3 scripts/setup_nfr.py 2024
  • Use --no-ollama to see basic extraction
  • Look at generated files in scripts/nfr_data/