author | Minteck <contact@minteck.org> | 2021-12-21 16:52:28 +0100 |
---|---|---|
committer | Minteck <contact@minteck.org> | 2021-12-21 16:52:28 +0100 |
commit | 46e43f4bde4a35785b4997b81e86cd19f046b69b (patch) | |
tree | c53c2f826f777f9d6b2d249dab556feb72a6c3a6 /node_modules/wcwidth/docs/index.md | |
Commit
Diffstat (limited to 'node_modules/wcwidth/docs/index.md')
-rw-r--r-- | node_modules/wcwidth/docs/index.md | 65 |
1 files changed, 65 insertions, 0 deletions
diff --git a/node_modules/wcwidth/docs/index.md b/node_modules/wcwidth/docs/index.md
new file mode 100644
index 0000000..5c5126d
--- /dev/null
+++ b/node_modules/wcwidth/docs/index.md
@@ -0,0 +1,65 @@
+### Javascript porting of Markus Kuhn's wcwidth() implementation
+
+The following explanation comes from the original C implementation:
+
+This is an implementation of wcwidth() and wcswidth() (defined in
+IEEE Std 1002.1-2001) for Unicode.
+
+http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
+http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html
+
+In fixed-width output devices, Latin characters all occupy a single
+"cell" position of equal width, whereas ideographic CJK characters
+occupy two such cells. Interoperability between terminal-line
+applications and (teletype-style) character terminals using the
+UTF-8 encoding requires agreement on which character should advance
+the cursor by how many cell positions. No established formal
+standards exist at present on which Unicode character shall occupy
+how many cell positions on character terminals. These routines are
+a first attempt of defining such behavior based on simple rules
+applied to data provided by the Unicode Consortium.
+
+For some graphical characters, the Unicode standard explicitly
+defines a character-cell width via the definition of the East Asian
+FullWidth (F), Wide (W), Half-width (H), and Narrow (Na) classes.
+In all these cases, there is no ambiguity about which width a
+terminal shall use. For characters in the East Asian Ambiguous (A)
+class, the width choice depends purely on a preference of backward
+compatibility with either historic CJK or Western practice.
+Choosing single-width for these characters is easy to justify as
+the appropriate long-term solution, as the CJK practice of
+displaying these characters as double-width comes from historic
+implementation simplicity (8-bit encoded characters were displayed
+single-width and 16-bit ones double-width, even for Greek,
+Cyrillic, etc.) and not any typographic considerations.
+
+Much less clear is the choice of width for the Not East Asian
+(Neutral) class. Existing practice does not dictate a width for any
+of these characters. It would nevertheless make sense
+typographically to allocate two character cells to characters such
+as for instance EM SPACE or VOLUME INTEGRAL, which cannot be
+represented adequately with a single-width glyph. The following
+routines at present merely assign a single-cell width to all
+neutral characters, in the interest of simplicity. This is not
+entirely satisfactory and should be reconsidered before
+establishing a formal standard in this area. At the moment, the
+decision which Not East Asian (Neutral) characters should be
+represented by double-width glyphs cannot yet be answered by
+applying a simple rule from the Unicode database content. Setting
+up a proper standard for the behavior of UTF-8 character terminals
+will require a careful analysis not only of each Unicode character,
+but also of each presentation form, something the author of these
+routines has avoided to do so far.
+
+http://www.unicode.org/unicode/reports/tr11/
+
+Markus Kuhn -- 2007-05-26 (Unicode 5.0)
+
+Permission to use, copy, modify, and distribute this software
+for any purpose and without fee is hereby granted. The author
+disclaims all warranties with regard to this software.
+
+Latest version: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
+
+
+
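
The width rules described in the documentation added by this diff can be illustrated with a minimal JavaScript sketch. It is not the code of the `wcwidth` package or of Markus Kuhn's `wcwidth.c`. The names `WIDE_RANGES`, `sketchCharWidth`, and `sketchStringWidth` are hypothetical, and the range table is a small illustrative subset of the East Asian Wide (W) and Fullwidth (F) classes, not the full Unicode data. Control and combining characters, which a real implementation has to treat specially, are deliberately left out.

```javascript
// Illustrative subset of East Asian Wide (W) / Fullwidth (F) ranges.
// Assumption: this is NOT the complete table a real wcwidth() would use.
const WIDE_RANGES = [
  [0x1100, 0x115f], // Hangul Jamo initial consonants
  [0x4e00, 0x9fff], // CJK Unified Ideographs
  [0xac00, 0xd7a3], // Hangul Syllables
  [0xff01, 0xff60], // Fullwidth forms
];

// Hypothetical helper: number of terminal cells for one code point.
// Wide and Fullwidth characters take two cells. Everything else,
// including the Ambiguous (A) and Neutral classes, takes one cell,
// matching the single-width choice described in the text.
function sketchCharWidth(codePoint) {
  for (const [lo, hi] of WIDE_RANGES) {
    if (codePoint >= lo && codePoint <= hi) return 2;
  }
  return 1;
}

// Hypothetical helper: sum of cell widths over a string.
// for...of iterates by code point, so surrogate pairs count once.
function sketchStringWidth(str) {
  let total = 0;
  for (const ch of str) {
    total += sketchCharWidth(ch.codePointAt(0));
  }
  return total;
}

console.log(sketchStringWidth("abc"));    // 3
console.log(sketchStringWidth("日本語")); // 6
console.log(sketchCharWidth(0x03a9));     // 1 (GREEK CAPITAL LETTER OMEGA, Ambiguous)
console.log(sketchCharWidth(0x2003));     // 1 (EM SPACE, Neutral)
```

Treating Ambiguous and Neutral characters as one cell keeps the lookup to a single table of wide ranges, which is the simplicity trade-off the text itself calls not entirely satisfactory.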