FreeCalypso > hg > freecalypso-tools
changeset 805:a43c5dc251dc
doc/User-phone-tools: new sms-pdu-decode backslash escapes
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Thu, 25 Mar 2021 05:10:43 +0000 |
parents | 30fbaa652ea5 |
children | 843850c526b7 |
files | doc/User-phone-tools |
diffstat | 1 files changed, 36 insertions(+), 4 deletions(-) |
--- a/doc/User-phone-tools Thu Mar 25 03:26:23 2021 +0000
+++ b/doc/User-phone-tools Thu Mar 25 05:10:43 2021 +0000
@@ -200,8 +200,7 @@
 
 By default, sms-pdu-decode only emits 7-bit ASCII characters in its output; any
 GSM7 or UCS-2 characters which fall outside of this plain ASCII repertoire are
-displayed as the '?' error character and the presence of such decoding errors
-is indicated in the Length: header. This conservative default behaviour can be
+converted into backslash escapes. This conservative default behaviour can be
 modified as follows:
 
 -e option extends the potential output character repertoire from 7-bit ASCII to
@@ -209,8 +208,41 @@
 i.e., are NOT encoded in UTF-8 - this option is intended for non-UTF-8
 environments.
 
--u option extends the potential output character repertoire to the entire Basic
-Multilingual Plane of Unicode, and changes the output encoding to UTF-8.
+-u option extends the potential output character repertoire to all of Unicode,
+and changes the output encoding to UTF-8.
+
+Regardless of whether the source message character set is GSM7 or UCS-2 and
+irrespective of -e or -u options, any backslash characters are always escaped
+as \\, and any CR characters are represented as \r. Additional backslash
+escape encodings depend on the source message character set:
+
+* If the source message character set is GSM7, the following additional
+  backslash escapes can be emitted:
+
+  - In the absence of -u option, the Euro currency symbol is converted to \E;
+
+  - Any GSM7 escape characters (0x1B) that aren't part of a valid escape
+    sequence for [\]^ or {|}~ or \E are represented as \e;
+
+  - Any GSM7 characters that either can't be represented in the output character
+    set (ASCII or ISO 8859-1) or are outright invalid per GSM 03.38 are
+    represented as \xX, where xX is the original GSM7 code point in 2-digit
+    hexadecimal form between 00 and 7F;
+
+  - Invalid GSM7 escape sequences are emitted as \e\xX.
+
+* If the source message character set is UCS-2, the following additional
+  backslash escapes can be emitted:
+
+  - Invalid UCS-2 characters falling onto control character code points are
+    emitted as \u00XX;
+
+  - UCS-2 characters that can't be represented in ASCII or ISO 8859-1 (when
+    running without -u option) are emitted as \uXXXX;
+
+  - If UTF-16 surrogate pairs are detected in the input, the encoded high-plane
+    Unicode character is reconstructed and emitted as \UXXXXXX in the absence
+    of -u option, or as the appropriate UTF-8 byte sequence with -u.
 
 -h option causes the user data portion of every message to be displayed as a raw
 hex dump; in the case of GSM7-encoded messages, this hex dump shows the
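
The UCS-2 escaping rules documented in this change can be illustrated with a short Python sketch. This is not the sms-pdu-decode source (which is not shown here); the function name `escape_ucs2` and its `mode` parameter are hypothetical, with "ascii", "latin1" and "utf8" standing in for the default, -e and -u behaviours. Several details are assumptions the documentation text does not settle: uppercase hex digits, LF passing through unescaped, the exact set of code points treated as control characters (0x00-0x1F, 0x7F-0x9F), and unpaired surrogates being emitted as \uXXXX.

```python
def escape_ucs2(units, mode="ascii"):
    """Render a list of 16-bit UCS-2 code units as text with backslash escapes.

    mode is a hypothetical stand-in for the tool's options:
    "ascii" (default), "latin1" (like -e) or "utf8" (like -u).
    In "utf8" mode the returned str still has to be encoded by the caller.
    """
    if mode not in ("ascii", "latin1", "utf8"):
        raise ValueError("unknown mode: %r" % (mode,))
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        i += 1
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i < len(units) and 0xDC00 <= units[i] <= 0xDFFF:
            # UTF-16 surrogate pair: reconstruct the high-plane code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
            out.append(chr(cp) if mode == "utf8" else "\\U%06X" % cp)
        elif u == 0x5C:
            out.append("\\\\")          # backslash is always escaped
        elif u == 0x0D:
            out.append("\\r")           # CR is always shown as \r
        elif u == 0x0A:
            out.append("\n")            # assumption: LF passes through
        elif u < 0x20 or 0x7F <= u <= 0x9F:
            out.append("\\u%04X" % u)   # control code points: \u00XX
        elif 0xD800 <= u <= 0xDFFF:
            out.append("\\u%04X" % u)   # assumption: unpaired surrogate
        elif u < 0x7F or mode == "utf8" or (mode == "latin1" and u <= 0xFF):
            out.append(chr(u))          # representable in the output set
        else:
            out.append("\\u%04X" % u)   # not representable: \uXXXX
    return "".join(out)
```

Under these assumptions, `escape_ucs2([0x20AC])` yields the six characters `\u20AC`, while the same input with `mode="utf8"` yields the Euro sign itself; the surrogate pair `[0xD83D, 0xDE00]` yields `\U01F600` in the default mode.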