This project began in 2006, after I attended a talk on bioinformatics.
I had the idea of building an email client that would apply the
methods of bioinformatics to spam detection.
FindPatterns searches through its input and outputs the sequences that
are repeated. Because it is intended for text files, control characters
are ignored.
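A minimal sketch of a repeated-sequence search, in Python. The function name `find_repeats` and its parameters are hypothetical, chosen to mirror the -m and -l switches; it uses naive substring counting and is not necessarily the algorithm FindPatterns itself uses. It assumes "control characters" means code points 0-31 and 127, and it counts overlapping occurrences.

```python
from collections import Counter


def find_repeats(text, min_len=3, limit=4096):
    """Return {sequence: occurrences} for every sequence of at least min_len
    symbols that occurs two or more times (overlaps counted).
    limit=0 means no limit, mirroring the -l switch."""
    # Ignore control characters (assumed: code points 0-31 and 127).
    clean = "".join(ch for ch in text if ord(ch) >= 32 and ord(ch) != 127)

    repeats = {}
    length = max(min_len, 1)
    while length <= len(clean):
        # Count every window of the current length.
        counts = Counter(clean[i:i + length] for i in range(len(clean) - length + 1))
        found = {seq: n for seq, n in counts.items() if n >= 2}
        if not found:
            # If nothing of this length repeats, nothing longer can repeat.
            break
        for seq, n in found.items():
            if limit and len(repeats) >= limit:
                return repeats
            repeats[seq] = n
        length += 1
    return repeats


print(find_repeats("abcXabcYabc"))  # {'abc': 3}
```

Stopping as soon as no sequence of the current length repeats is safe, because any repeated sequence of length L+1 has a repeated prefix of length L. The worst case (for example, a long run of a single character) is far from linear, so this suits an illustration rather than large mailboxes.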
FindPatterns [filename] [-b] [-e] [-i] [-n] [-o] [-s] [-v] [-m<n>] [-l<n>] [-g<n>] [-?|h]
- filename: Attempt to read input from this file; otherwise, read from stdin.
- -b: Keep a buffer to count repeated matches (implied when -o is not set).
- -e: Echo the input.
- -i: Case-insensitive matching (not implemented).
- -n: Don't display matches at the end.
- -o: Output matches immediately as they are found.
- -s: Silent mode: plain output with no extra characters.
- -v: Print verbose comments while outputting.
- -g<n>: Set the memory buffer granularity to the closest power of two
  below <n> bytes (default: 1024).
- -l<n>: Set the match limit to <n> matches (default: 4096; 0 means no limit).
- -m<n>: Set the minimum match length to <n> symbols (default: 3).
- -?|h: Display this help screen and exit.
Appending a hyphen to a switch, as in -<s>-, turns switch <s> off.
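The switch syntax described above can be handled by a small parser. This is a hypothetical Python sketch, not FindPatterns' actual code: the names `parse_args` and `floor_pow2` are illustrative, -g is assumed to round down to the largest power of two not exceeding <n> (so the default of 1024 is unchanged), and "(!o -> b)" is read as "-b is implied whenever -o is off".

```python
BOOL_SWITCHES = set("beinosv")                 # on/off switches from the help text
NUMERIC_DEFAULTS = {"g": 1024, "l": 4096, "m": 3}


def floor_pow2(n):
    """Largest power of two <= n (assumed reading of the -g rule)."""
    return 1 << (n.bit_length() - 1) if n > 0 else 1


def parse_args(argv):
    """Parse FindPatterns-style switches into a dict (hypothetical sketch)."""
    opts = {s: False for s in BOOL_SWITCHES}
    opts.update(NUMERIC_DEFAULTS)
    opts["filename"] = None
    opts["help"] = False

    for arg in argv:
        if not arg.startswith("-"):
            opts["filename"] = arg              # bare word: the input file
            continue
        flag, rest = arg[1:2], arg[2:]
        if flag in ("?", "h"):
            opts["help"] = True
        elif flag in BOOL_SWITCHES and rest in ("", "-"):
            opts[flag] = rest == ""             # "-x" turns on, "-x-" turns off
        elif flag in NUMERIC_DEFAULTS and rest.isdecimal():
            opts[flag] = int(rest)              # "-m5", "-l0", "-g2048", ...
        else:
            raise ValueError(f"unrecognized switch: {arg}")

    opts["g"] = floor_pow2(opts["g"])
    if not opts["o"]:
        opts["b"] = True                        # assumed meaning of "(!o -> b)"
    return opts
```

For example, `parse_args(["inbox.txt", "-o", "-m5", "-e-"])` selects inbox.txt as input, turns on immediate output, sets the minimum match length to 5, and turns echo off, leaving the other settings at their defaults.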
Also included is KillSpam, a simple email client that takes the patterns
generated by FindPatterns and eliminates every email that contains a
matching pattern.
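KillSpam's filtering step might look like the following hypothetical Python sketch. The function names and the (kept, eliminated) return shape are illustrative assumptions; the sketch also assumes KillSpam strips control characters the same way FindPatterns does, so that a pattern can still match across a line break in an email.

```python
def strip_control(text):
    """Drop control characters (assumed: code points 0-31 and 127)."""
    return "".join(ch for ch in text if ord(ch) >= 32 and ord(ch) != 127)


def kill_spam(emails, patterns):
    """Split emails into (kept, eliminated): an email is eliminated if any
    pattern occurs in its control-character-stripped text."""
    kept, eliminated = [], []
    for email in emails:
        body = strip_control(email)
        if any(p in body for p in patterns):
            eliminated.append(email)
        else:
            kept.append(email)
    return kept, eliminated


kept, eliminated = kill_spam(["Buy cheap pills now", "Lunch at noon?"], ["cheap pills"])
print(kept)  # ['Lunch at noon?']
```

Returning the eliminated emails rather than discarding them makes it easy to review false positives before anything is deleted.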