SIM tests for lexical similarity between texts written in C, Java, Pascal,
Modula-2, Lisp, Miranda, and natural language.

It is used:
* to detect potentially duplicated code fragments in large software
  projects: in program text, in shell scripts, and in documentation.
* to detect plagiarism in software projects, educational and otherwise.
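
To give a rough idea of what "lexical similarity" means here, the sketch
below compares two fragments token by token rather than character by
character, so differences in whitespace do not matter. It is only an
illustration and is not SIM's actual algorithm; the names `tokenize` and
`longest_common_run` are hypothetical, and the tokenizer is a crude
stand-in for a real language-specific lexical analyzer.

```python
import re


def tokenize(text):
    """Split text into identifiers, numbers, and single non-space characters.

    Assumption: a very rough tokenizer, for illustration only.
    """
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", text)


def longest_common_run(a, b):
    """Return the length of the longest contiguous token run shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the run that ended at the previous token of both inputs.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Two fragments that differ only in layout yield identical token streams.
first = tokenize("for (i = 0; i < n; i++) sum += a[i];")
second = tokenize("for (i=0;i<n;i++)   sum+=a[i];")
print(longest_common_run(first, second), "of", len(first), "tokens match")
```

A long shared run of tokens between two files is the kind of evidence
that suggests duplicated or copied text, which a person can then inspect.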