SIM tests lexical similarity in texts written in C, Java, Pascal, Modula-2, Lisp, Miranda, and natural language. It is used:

* to detect potentially duplicated code fragments in large software projects, whether in program text, shell scripts, or documentation;
* to detect plagiarism in software projects, educational and otherwise.