Patchwork UTF-8 for PHP
Patchwork UTF-8 gives PHP developers extensive, portable and performant handling of UTF-8 and grapheme clusters.
It provides both:
- a portability layer for mbstring, iconv, and intl's Normalizer and grapheme_* functions,
- a UTF-8 grapheme-cluster-aware replica of native string functions.
It can also serve as a documentation source referencing the practical problems that arise when handling UTF-8 in PHP: Unicode concepts, related algorithms, bugs in PHP core, workarounds, etc.
Version 1.2 adds best-fit mappings for UTF-8 to Code Page approximations. It also adds Unicode filesystem access under Windows, preferably using wfio, with a COM-based fallback otherwise.
Portability
Unicode handling in PHP is best performed using a combination of mbstring, iconv, intl and pcre with the u flag enabled. But when an application is expected to run on many servers, you should be aware that these 4 extensions are not always enabled.
Patchwork UTF-8 provides pure PHP implementations for 3 of those 4 extensions. pcre compiled with Unicode support is required but is widely available.
The following set of portability-fallbacks allows an application to run on a
server even if one or more of those extensions are not enabled:
- utf8_encode, utf8_decode,
- mbstring: mb_check_encoding, mb_convert_case, mb_convert_encoding, mb_decode_mimeheader, mb_detect_encoding, mb_detect_order, mb_encode_mimeheader, mb_encoding_aliases, mb_get_info, mb_http_input, mb_http_output, mb_internal_encoding, mb_language, mb_list_encodings, mb_output_handler, mb_strlen, mb_strpos, mb_strrpos, mb_strtolower, mb_strtoupper, mb_stripos, mb_stristr, mb_strrchr, mb_strrichr, mb_strripos, mb_strstr, mb_strwidth, mb_substitute_character, mb_substr, mb_substr_count,
- iconv: iconv, iconv_mime_decode, iconv_mime_decode_headers, iconv_get_encoding, iconv_set_encoding, iconv_mime_encode, ob_iconv_handler, iconv_strlen, iconv_strpos, iconv_strrpos, iconv_substr,
- intl: Normalizer, grapheme_extract, grapheme_stripos, grapheme_stristr, grapheme_strlen, grapheme_strpos, grapheme_strripos, grapheme_strrpos, grapheme_strstr, grapheme_substr, normalizer_is_normalized, normalizer_normalize.
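To illustrate the idea behind such a fallback, here is a minimal sketch that counts code points with mbstring when it is loaded and with pcre otherwise. The helper name portable_strlen() is hypothetical and is not part of Patchwork UTF-8; the library's own fallbacks may work differently.

```php
<?php
// Hypothetical helper, not part of Patchwork UTF-8: counts code points,
// preferring mbstring and falling back to pcre with the u flag.
function portable_strlen(string $s): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, 'UTF-8');
    }

    // '/./su' matches one code point at a time, newlines included.
    // preg_match_all() returns false when the subject is not valid UTF-8.
    $count = preg_match_all('/./su', $s);
    if ($count === false) {
        throw new InvalidArgumentException('Input is not well-formed UTF-8');
    }

    return $count;
}

echo portable_strlen("na\u{00EF}ve"), "\n"; // 5 code points stored in 6 bytes
```

Note that the two branches differ on malformed input: the pcre branch rejects it, while mb_strlen() does not. A real portability layer has to reconcile such differences.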
Patchwork\Utf8
Grapheme clusters should always be considered when working with generic Unicode strings. The Patchwork\Utf8 class implements a nearly complete set of the native string functions that need UTF-8 grapheme cluster awareness. Function names, arguments and behavior carefully replicate native PHP string functions.
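To see why grapheme clusters matter, the following sketch counts the same string in three different units using only core PHP and pcre. The function name count_units() is made up for this illustration.

```php
<?php
// Hypothetical helper for illustration: measures one string in three units.
function count_units(string $s): array
{
    return [
        'bytes'       => strlen($s),                  // raw bytes
        'code_points' => preg_match_all('/./su', $s), // Unicode code points
        'graphemes'   => preg_match_all('/\X/u', $s), // extended grapheme clusters
    ];
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT: displayed as a single "é".
$units = count_units("e\u{0301}");

printf(
    "%d bytes, %d code points, %d grapheme cluster\n",
    $units['bytes'],
    $units['code_points'],
    $units['graphemes']
); // prints "3 bytes, 2 code points, 1 grapheme cluster"
```

A user sees one character here, so functions such as substr or strrev that cut between the two code points would split the accent from its base letter.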
Some more functions are also provided to help handle UTF-8 strings:
- filter(): normalizes to UTF-8 NFC, converting from CP-1252 when needed,
- isUtf8(): checks if a string contains well-formed UTF-8 data,
- toAscii(): generic UTF-8 to ASCII transliteration,
- strtocasefold(): Unicode transformation for caseless matching,
- strtonatfold(): generic case-sensitive transformation for collation matching,
- strwidth(): computes the width of a string when printed on a terminal,
- wrapPath(): Unicode filesystem access under Windows and other OSes.
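As an illustration of what a well-formedness check can look like, here is a common pcre-based idiom. It is only a sketch: looks_like_utf8() is a hypothetical name, and this is not necessarily how isUtf8() is implemented.

```php
<?php
// Hypothetical helper: returns true when $s is well-formed UTF-8.
function looks_like_utf8(string $s): bool
{
    // With the u flag, pcre validates the whole subject as UTF-8 before
    // matching. The empty pattern then matches at offset 0 and returns 1;
    // on malformed input preg_match() returns false instead.
    return preg_match('//u', $s) === 1;
}

var_dump(looks_like_utf8("caf\u{00E9}")); // bool(true)
var_dump(looks_like_utf8("caf\xE9"));     // bool(false): a lone 0xE9 byte is not valid UTF-8
```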
Mirrored string functions are: strlen, substr, strpos, stripos, strrpos, strripos, strstr, stristr, strrchr, strrichr, strtolower, strtoupper, wordwrap, chr, count_chars, ltrim, ord, rtrim, trim, str_ireplace, str_pad, str_shuffle, str_split, str_word_count, strcmp, strnatcmp, strcasecmp, strnatcasecmp, strncasecmp, strncmp, strcspn, strpbrk, strrev, strspn, strtr, substr_compare, substr_count, substr_replace, ucfirst, lcfirst, ucwords, number_format, utf8_encode, utf8_decode, json_decode, filter_input, filter_input_array.
Notably missing (but hard to replicate) are printf-family functions.
The implementation favors performance over full edge-case handling. It generally works on normalized UTF-8 strings and provides filters to obtain them.
As the Turkish locale requires special care, a Patchwork\TurkishUtf8 class is provided for working with this locale. It clones all the features of Patchwork\Utf8 but knows about the Turkish specifics.
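The best-known Turkish specific is the dotted and dotless "i": uppercase "I" lowercases to "ı" (U+0131), and "İ" (U+0130) lowercases to "i". The sketch below shows the kind of pre-mapping this requires. The function turkish_lower() is hypothetical and only illustrates the problem; it does not reflect how Patchwork\TurkishUtf8 is written.

```php
<?php
// Hypothetical helper: lowercases a string using the Turkish rules for I/İ.
function turkish_lower(string $s): string
{
    // Map the two Turkish-specific capitals before generic lowercasing.
    $s = strtr($s, [
        'I'        => "\u{0131}", // I -> ı (dotless i)
        "\u{0130}" => 'i',        // İ -> i
    ]);

    // Assumption: mbstring is preferred; strtolower() only covers ASCII letters.
    return function_exists('mb_strtolower')
        ? mb_strtolower($s, 'UTF-8')
        : strtolower($s);
}

echo turkish_lower("ISPARTA"), "\n";         // ısparta
echo turkish_lower("\u{0130}STANBUL"), "\n"; // istanbul
```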
Usage
The recommended way to install Patchwork UTF-8 is through Composer. Just create a composer.json file and run the php composer.phar install command to install it:
{
    "require": {
        "patchwork/utf8": "~1.2"
    }
}
Then, early in your bootstrap sequence, you have to configure your environment:
\Patchwork\Utf8\Bootup::initAll(); // Enables the portability layer and configures PHP for UTF-8
\Patchwork\Utf8\Bootup::filterRequestUri(); // Redirects to a UTF-8 encoded URL if it's not already the case
\Patchwork\Utf8\Bootup::filterRequestInputs(); // Normalizes HTTP inputs to UTF-8 NFC
Run phpunit to see the code in action.
Make sure that you are confident about using UTF-8 by reading Character Sets / Character Encoding Issues and Handling UTF-8 with PHP, or PHP et UTF-8 for French readers.
You should also get familiar with the concept of Unicode Normalization and Grapheme Clusters.
Do not blindly replace all use of PHP's string functions. Most of the time you will not need to, and you will be introducing a significant performance overhead to your application.
Screen your input on the outer perimeter so that only well-formed UTF-8 passes through. When dealing with badly formed UTF-8, you should not try to fix it (see Unicode Security Considerations). Instead, consider it as CP-1252 and use Patchwork\Utf8::utf8_encode() to get a UTF-8 string. Also, don't forget to choose one Unicode normalization form and stick to it. NFC is now the de facto standard. Patchwork\Utf8::filter() implements this behavior: it converts from CP-1252 and normalizes to NFC.
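The screening policy described above can be sketched with native extensions as follows. This is an approximation under stated assumptions, not the code behind Patchwork\Utf8::filter(): the function name to_utf8_nfc() is hypothetical, it relies on mbstring or iconv for the CP-1252 conversion, and it skips normalization when intl is not loaded.

```php
<?php
// Hypothetical helper: keeps well-formed UTF-8, reinterprets anything else
// as CP-1252, then normalizes to NFC when the intl extension is available.
function to_utf8_nfc(string $s): string
{
    if (preg_match('//u', $s) !== 1) {
        // Malformed input is not repaired; it is treated as CP-1252.
        if (function_exists('mb_convert_encoding')) {
            $s = mb_convert_encoding($s, 'UTF-8', 'Windows-1252');
        } elseif (function_exists('iconv')) {
            $converted = iconv('CP1252', 'UTF-8', $s);
            if ($converted === false) {
                throw new RuntimeException('CP-1252 conversion failed');
            }
            $s = $converted;
        } else {
            throw new RuntimeException('Neither mbstring nor iconv is available');
        }
    }

    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($s, Normalizer::FORM_C);
        if ($normalized !== false) {
            $s = $normalized;
        }
    }

    return $s;
}

// 0xE9 is "é" in CP-1252; the result is the two-byte UTF-8 sequence C3 A9.
echo bin2hex(to_utf8_nfc("caf\xE9")), "\n";
```

With Patchwork UTF-8 installed, the missing extensions would be covered by its portability layer, so the guards above are only needed in this standalone sketch.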
This library is incompatible with mbstring.func_overload and will not work if that php.ini setting is enabled.
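A bootstrap script could guard against this setting before initializing the library. The check below is a sketch with a hypothetical function name; the directive was removed in PHP 8.0, where ini_get() simply returns false for it.

```php
<?php
// Hypothetical helper: reports whether mbstring.func_overload is active.
function func_overload_enabled(): bool
{
    // ini_get() returns false when the directive does not exist,
    // which casts to 0 and is reported as "not enabled".
    return (int) ini_get('mbstring.func_overload') !== 0;
}

if (func_overload_enabled()) {
    trigger_error(
        'mbstring.func_overload is enabled; disable it in php.ini',
        E_USER_WARNING
    );
}
```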
Licensing
Patchwork\Utf8 is free software; you can redistribute it and/or modify it under the terms of the (at your option):
Unicode handling requires tedious work to implement and maintain in the long run. As such, contributions such as unit tests, bug reports, comments or patches licensed under both licenses are very welcome.
I hope many projects will adopt this code and together help solve the Unicode problem for PHP.