|author||Denys Vlasenko <email@example.com>||2010-02-01 14:58:08 (GMT)|
|committer||Denys Vlasenko <firstname.lastname@example.org>||2010-02-01 14:58:08 (GMT)|
Signed-off-by: Denys Vlasenko <email@example.com>
1 files changed, 56 insertions, 0 deletions
diff --git a/docs/unicode.txt b/docs/unicode.txt
new file mode 100644
@@ -0,0 +1,56 @@
+ Unicode support in busybox
+There are several scenarios where we need to handle unicode
+ Shell input
+We want to correctly handle input of unicode characters.
+There are several problems with it. Just handling input
+as sequence of bytes would break any editing. This was fixed
+and now lineedit operates on the array of wchar_t's.
+But we also need to handle the following problematic moments:
+* It is unreasonable to expect that output device supports
+ _any_ unicode chars. Perhaps we need to avoid printing
+ those chars which are not supported by output device.
+ Examples: chars which are not present in the font,
+ chars which are not assigned in unicode,
+ combining chars (especially trying to combine bad pairs:
+ a_chinese_symbol + "combining grave accent" = ??!)
+* We need to account for the fact that unicode chars have
+ different widths: 0 for combining chars, 1 for usual,
+ 2 for ideograms (are there 3+ wide chars?).
+* Bidirectional handling. If user wants to echo a phrase
+ in Hebrew, he types: echo "srettel werbeH"
+This case is a bit similar to "shell input", but unlike shell,
+editors may encounder many more unexpected unicode sequences
+(try to load a random binry file...), and they need to preserve
+them, unlike shell which can afford to drop bogus input.
+ more, less
+ ls (multi-column display)
+ top, ps
+ Filename display (in error messages and elsewhere)
+TODO: write an email to Asmus Freytag (firstname.lastname@example.org),
+author of http://unicode.org/reports/tr11/