New port: converters/py-webencodings: Character encoding aliases for legacy web content

PR:		226316
Submitted by:	Marcin Cieślak <saper@saper.info>
Approved by:	tcberner (mentor, implicit)
Yuri Victorovich 2018-03-04 09:10:41 +00:00
parent 3adfb2c604
commit 79735e8a8b
Notes: svn2git 2021-03-31 03:12:20 +00:00
svn path=/head/; revision=463559
4 changed files with 38 additions and 0 deletions


@@ -159,6 +159,7 @@
SUBDIR += py-iconv
SUBDIR += py-rencode
SUBDIR += py-unidecode
SUBDIR += py-webencodings
SUBDIR += py-zfec
SUBDIR += rcctools
SUBDIR += recode


@@ -0,0 +1,18 @@
# $FreeBSD$
PORTNAME= webencodings
DISTVERSION= 0.5.1
CATEGORIES= converters www python
MASTER_SITES= CHEESESHOP
PKGNAMEPREFIX= ${PYTHON_PKGNAMEPREFIX}
MAINTAINER= saper@saper.info
COMMENT= Character encoding aliases for legacy web content
LICENSE= BSD3CLAUSE
USES= python
USE_PYTHON= distutils autoplist
NO_ARCH= yes
.include <bsd.port.mk>


@@ -0,0 +1,3 @@
TIMESTAMP = 1520037325
SHA256 (webencodings-0.5.1.tar.gz) = b36a1c245f2d304965eb4e0a82848379241dc04b865afcc4aab16748587e1923
SIZE (webencodings-0.5.1.tar.gz) = 9721


@@ -0,0 +1,16 @@
In order to be compatible with legacy web content when interpreting
something like Content-Type: text/html; charset=latin1, tools need
to use a particular set of aliases for encoding labels as well as
some overriding rules.
For example, US-ASCII and iso-8859-1 on the web are actually aliases for
windows-1252, and a UTF-8 or UTF-16 BOM takes precedence over any other
encoding declaration.
The Encoding standard defines all such details so that implementations
do not have to reverse-engineer each other.
This module has encoding labels and BOM detection, but the actual
implementation for encoders and decoders is Python's.
WWW: https://github.com/SimonSapin/python-webencodings
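
The two rules the description mentions (legacy labels aliasing to windows-1252, and a BOM overriding the declared charset) can be sketched with the Python standard library alone. This is only an illustration, not the module's code: the names LABEL_ALIASES, BOMS, resolve_label and decode_web are hypothetical, and the alias table is a tiny excerpt of what the Encoding standard defines.

```python
import codecs

# Hypothetical, heavily abbreviated alias table; the Encoding standard
# defines many more labels than the few shown here.
LABEL_ALIASES = {
    "us-ascii": "windows-1252",
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "windows-1252": "windows-1252",
    "utf-8": "utf-8",
}

# BOMs to look for, each paired with the Python codec used to decode
# the bytes that follow it.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def resolve_label(label):
    """Map a charset label to an encoding name, or None if unknown."""
    return LABEL_ALIASES.get(label.strip().lower())


def decode_web(data, declared_label):
    """Decode bytes, letting a BOM override the declared charset."""
    for bom, name in BOMS:
        if data.startswith(bom):
            # Strip the BOM and ignore the declared label entirely.
            return data[len(bom):].decode(name), name
    name = resolve_label(declared_label)
    if name is None:
        raise LookupError("unknown encoding label: %r" % declared_label)
    # The actual decoding is done by Python's own codec.
    return data.decode(name), name
```

With this sketch, decode_web(b"\x80", "iso-8859-1") yields the euro sign, because the label resolves to windows-1252 rather than to strict ISO-8859-1, where byte 0x80 is a control character.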