view doc/User-phone-tools @ 805:a43c5dc251dc

doc/User-phone-tools: new sms-pdu-decode backslash escapes
author Mychaela Falconia <falcon@freecalypso.org>
date Thu, 25 Mar 2021 05:10:43 +0000
parents b5235f8240b9
children 8cf7d41f2821

FreeCalypso User Phone Tools are a new software addition to the FreeCalypso
family.  These tools are programs that run on a Unix host computer such as a
GNU/Linux PC or laptop and communicate with a FreeCalypso phone or modem via
the standard AT command interface, rather than any of the formerly proprietary
interfaces specific to TI's internal architecture.  The following tools are
currently available:

fcup-at		Issues an arbitrary AT command given on the command line.

fcup-settime	Issues AT+CCLK command to the target to set its clock to the
		host computer's notion of local time.

fcup-smdump	Retrieves a dump of SMS records (received, sent or stored
		messages) from the FC device's SMS storage (currently SIM
		storage; ME storage may be implemented in the future),
		optionally deleting them from the severely space-limited
		SIM/ME storage afterward.

fcup-smsend*	Tools for sending outgoing SMS from a host computer through a
		FreeCalypso phone or modem and/or writing such outgoing SMS
		into the FC device's SMS storage.

fcup-smwrite	Debug and development tool: writes arbitrary message records
		into the FC device's SMS storage (currently SIM storage) in any
		of the possible 4 states, with arbitrary incoming or outgoing
		SMS PDU content.

Because these tools communicate with the target via standards-defined AT
commands, in theory they ought to work with any AT-command-speaking 3GPP phone
or modem and not just our own FreeCalypso.  However, experience has shown that
in the case of the common proprietary implementations, practice does not match
theory: when I (Mychaela) tried these same AT commands against a random
off-the-shelf proprietary modem (Huawei E303 USB stick modem for 3G), the
following problems were seen:

* The essential AT+CMGL=4 command for retrieving the full set of SMS records
  from SIM storage in PDU mode appears to be broken: all I got was a hang.
  Its text mode counterpart AT+CMGL="ALL" produces incomplete output.

* Qualcomm/Huawei's implementation of the AT command interface does not allow
  AT+CSCS to be set to "HEX"; our fcup-smdump implementation uses this setting
  so that the phonebook names returned along with SMS PDUs in the +CMGL
  responses can be parsed reliably no matter what weird characters they might
  contain.

* Setting AT+CSCS to "8859-1" is not supported either; this setting is used by
  our fcup-smsend and fcup-smsendmult tools when sending in text mode.

* Sending outgoing SMS with fcup-smsend in PDU mode (which does not touch
  AT+CSCS) works in that the message goes out, but the tool complains afterward
  because the echo after the ^Z is different from what our tools expect.

Because of these quirks, our FC User Phone Tools officially work only with our
own FreeCalypso phones and modems, and are not expected to work against various
proprietary implementations.  Let us not forget that the broken and buggy nature
of the common proprietary implementations is the very reason why we need
FreeCalypso in the first place.

Target interface options
========================

Our fcup-* tools can communicate with the AT-command-speaking target in one of
two ways:

* The default is the standard AT command interface over a dedicated UART.  As
  of this writing, the only FreeCalypso device that provides a full-featured AT
  command interface of this kind is our FCDEV3B modem, but the ultimate goal of
  the project is to build our own end user phone handset (a Libre Dumbphone)
  that will also provide a full-featured AT command interface on its USB port
  via a built-in CP2102 or FT232R chip.

* As a dirty hack, one can run FreeCalypso GSM fw on some alien hw targets,
  currently Motorola C1xx and Pirelli DP-L10.  In this hacked-up configuration
  there is no dedicated UART available for a standard AT command interface, but
  there is a hack that allows a limited subset of AT commands to be passed over
  the RVTMUX binary packet interface provided by the running FreeCalypso GSM fw.
  Our fcup-* tools can work with this alternate target interface option and
  thereby support these crippled targets.

The AT-over-RVTMUX mechanism was originally invented back in 2015 as a
development aid, and was never intended for production use or to support any
kind of end user functionality.  One of the limitations of its original
incarnation was that the strings that are sent to ATI via this interface were
limited to 254 characters, whereas sending or writing SMS in hex format
requires longer strings.  As of early 2019, this limitation has been lifted:
our Magnetite and Selenite firmwares from 20190109 onward support an extended
version of our AT-over-RVTMUX hack that allows longer strings to be sent in
pieces, and the present version of our FC User Phone Tools suite will send the
strings it generates via this extended mechanism whenever they exceed the old
254 character limit.  The new mechanism works correctly starting with the
20190128 firmware release for modem products and the 20190129 fw release for
Mot C1xx phones, thus when the present version of FC User Phone Tools is used
to communicate with our current firmwares, both target interface options provide
equivalent functionality on all supported targets.

All fcup-* tools take the following common command line options for selecting
the AT command target interface:

-B baud		Valid only when -p is also given; selects a different baud rate
		than the default 115200 bps.

-n		Dry run debug mode with no target interface at all: the AT
		commands which would otherwise be sent to the target are simply
		printed on stdout.

-p ttyport	Names the serial port to be used to talk to the target.

-R		Use the AT-over-RVTMUX interface instead of the standard AT
		command interface over a dedicated UART.

-X program	Use the specified external program as the AT target
		communication back-end; read the source code for the details.

The -R and -p options interact as follows:

Neither -R	The standard dedicated AT command interface is used;
nor -p		FC_GSM_DEVICE= environment variable needs to be set
		to point to the serial port.

-p only		The standard dedicated AT command interface is used;
		the serial port is named with the -p option.

-R only		AT-over-RVTMUX interface is used; the fcup-* tool connects
		to an already running rvinterf process.

-R and -p	AT-over-RVTMUX interface is used; a new rvinterf process
		is launched to talk RVTMUX on the specified serial port.
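
The interaction table above can be sketched as a small decision function.
This is only an illustration in Python, not the code used by the fcup-* tools;
the function name and the returned tuple are hypothetical, and the -B, -n and
-X options are ignored here.

```python
import os

def select_interface(use_rvtmux, ttyport, env=None):
    """Return (interface, port, launch_rvinterf) per the -R / -p table.

    use_rvtmux stands for -R, ttyport for the argument of -p (or None).
    The tuple layout is made up for this sketch.
    """
    env = os.environ if env is None else env
    if use_rvtmux:
        if ttyport:
            # -R and -p: a new rvinterf process is launched on this port
            return ("rvtmux", ttyport, True)
        # -R only: connect to an already running rvinterf process
        return ("rvtmux", None, False)
    if ttyport:
        # -p only: dedicated AT command interface on the named port
        return ("uart", ttyport, False)
    # neither option: the port must come from FC_GSM_DEVICE
    port = env.get("FC_GSM_DEVICE")
    if not port:
        raise ValueError("no -p given and FC_GSM_DEVICE is not set")
    return ("uart", port, False)
```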

Retrieving and decoding stored SMS
==================================

As of this writing, our current FreeCalypso GSM firmware supports only SIM
storage for SMS, i.e., there is no working mechanism currently for storing SMS
records (received and sent messages) in the phone's or modem's own flash file
system.  The capacity of this SIM SMS storage is determined by the SIM issuer,
but it is typically quite limited, on the order of 20 to 30 messages.

The model adopted for FreeCalypso is that incoming (and possibly saved outgoing)
messages initially accumulate in the SIM storage as they come in, and then the
user periodically transfers them to her larger host computer, simultaneously
deleting them from the SIM storage to reclaim the limited space.  The retrieval
of stored SMS from FreeCalypso GSM devices is accomplished with our fcup-smdump
utility; like all SMS operations with the current tools+firmware combination,
this operation works exactly the same whether the FC GSM device offers a full-
featured AT command interface or only AT over RVTMUX.  SMS retrieval is always
done in PDU mode, and the output from fcup-smdump contains raw SMS PDUs in the
form of long hex strings.  A separate utility called sms-pdu-decode then does
what its name says.

The intended mode of usage is something like this:

fcup-smdump -d >> long-term-sms-log

The -d option to fcup-smdump tells it to delete the retrieved messages from the
SIM or future ME storage; this option should only be used when the output is
redirected into some kind of longer-term storage.  In the above model the file
named long-term-sms-log becomes what its name says as new messages retrieved
from the FC GSM device get added to it; the format will look like this:

Received message:
XXXXXX...

Received message:
XXXXXX...

Sent message:
XXXXXX...

Stored unsent message:
XXXXXX...

Received message:
XXXXXX...

Each of the "XXXXXX..." lines will be a long hex string giving an SMS PDU.  The
idea is that the complete record of all received and sent messages should be
stored on the user's big computer in raw PDU form, rather than decoded, and the
decoding utility sms-pdu-decode should be invoked by the user (with the message
log file as input) as needed for reading these messages.
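
For users who wish to post-process the log with their own scripts, the format
shown above is easy to parse.  The following Python sketch assumes that the log
contains nothing but label lines ending in a colon, each followed by one hex
string line, with blank lines in between; the function name is hypothetical and
real fcup-smdump output may carry details this sketch does not handle.

```python
def parse_sms_log(text):
    """Split an fcup-smdump style log into (label, pdu_hex) pairs."""
    records = []
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # blank separator line
        if line.endswith(":"):
            label = line[:-1]             # e.g. "Received message"
        elif label is not None:
            records.append((label, line)) # the hex PDU for that label
            label = None
    return records
```

Each returned pair holds the message state label and the untouched PDU hex
string, so nothing is lost relative to the raw log.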

The message decoding utility sms-pdu-decode does its best to decode and show
everything without dropping any bits: in addition to the actual decoded message
characters and the From/To address (the "end user" content of the message), it
decodes and shows the SC address, the first octet, the MR octet for outgoing
messages, PID and DCS octets, the SC timestamp or the validity period fields,
and the UDH bytes if present.  However, some bits can still be lost in the
decoding, which is why it is important to archive messages in the raw PDU form:

* Padding bits used to round the From/To address and septet-based user data to
  an octet boundary and to round any UDH to a septet boundary are not decoded.

* If the user data portion of the message is 8-bit or compressed data (per the
  DCS octet), it is shown as a raw hex dump, which is lossless, but if it is
  GSM7 or UCS-2 text (GSM 03.38 character encodings), the characters are
  converted to the user's character set (plain ASCII only by default) for
  display, and some characters may not be displayable.

Character sets and encodings
----------------------------

By default, sms-pdu-decode only emits 7-bit ASCII characters in its output; any
GSM7 or UCS-2 characters which fall outside of this plain ASCII repertoire are
converted into backslash escapes.  This conservative default behaviour can be
modified as follows:

-e option extends the potential output character repertoire from 7-bit ASCII to
8-bit ISO 8859-1.  Any 8859-1 high characters are emitted as single bytes,
i.e., are NOT encoded in UTF-8 - this option is intended for non-UTF-8
environments.

-u option extends the potential output character repertoire to all of Unicode,
and changes the output encoding to UTF-8.

Regardless of whether the source message character set is GSM7 or UCS-2 and
irrespective of -e or -u options, any backslash characters are always escaped
as \\, and any CR characters are represented as \r.  Additional backslash
escape encodings depend on the source message character set:

* If the source message character set is GSM7, the following additional
  backslash escapes can be emitted:

  - In the absence of -u option, the Euro currency symbol is converted to \E;

  - Any GSM7 escape characters (0x1B) that aren't part of a valid escape
    sequence for [\]^ or {|}~ or \E are represented as \e;

  - Any GSM7 characters that either can't be represented in the output character
    set (ASCII or ISO 8859-1) or are outright invalid per GSM 03.38 are
    represented as \xXX, where XX is the original GSM7 code point in 2-digit
    hexadecimal form between 00 and 7F;

  - Invalid GSM7 escape sequences are emitted as \e\xXX.

* If the source message character set is UCS-2, the following additional
  backslash escapes can be emitted:

  - Invalid UCS-2 characters falling onto control character code points are
    emitted as \u00XX;

  - UCS-2 characters that can't be represented in ASCII or ISO 8859-1 (when
    running without -u option) are emitted as \uXXXX;

  - If UTF-16 surrogate pairs are detected in the input, the encoded high-plane
    Unicode character is reconstructed and emitted as \UXXXXXX in the absence
    of -u option, or as the appropriate UTF-8 byte sequence with -u.
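
The UCS-2 rules above, for the default case of plain ASCII output (neither -e
nor -u), can be sketched in Python as follows.  This is not the code of
sms-pdu-decode: the function name is hypothetical, and three details are
assumptions of this sketch rather than documented behaviour, namely that
newlines pass through unescaped, that "control character code points" means
00-1F and 7F-9F, and that hex digits are printed in upper case.

```python
def escape_ucs2(units):
    """Escape a list of 16-bit UCS-2 code units for plain ASCII output."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        i += 1
        # A high surrogate followed by a low surrogate forms one
        # high-plane character, shown as \UXXXXXX.
        if (0xD800 <= u <= 0xDBFF and i < len(units)
                and 0xDC00 <= units[i] <= 0xDFFF):
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i] - 0xDC00)
            out.append("\\U%06X" % cp)
            i += 1
        elif u == 0x5C:
            out.append("\\\\")            # backslash is always escaped
        elif u == 0x0D:
            out.append("\\r")             # CR is always shown as \r
        elif u == 0x0A:
            out.append("\n")              # assumption: newline passes through
        elif u < 0x20 or 0x7F <= u <= 0x9F:
            out.append("\\u%04X" % u)     # control code points (assumed range)
        elif u < 0x7F:
            out.append(chr(u))            # printable ASCII
        else:
            out.append("\\u%04X" % u)     # not representable in ASCII
    return "".join(out)
```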

-h option causes the user data portion of every message to be displayed as a
raw hex dump; in the case of GSM7-encoded messages, this hex dump shows the
unpacked septets.
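
For readers unfamiliar with GSM7 packing, "unpacked septets" means the 7-bit
code points recovered from the packed octet stream, least significant bits
first.  A minimal Python sketch of this unpacking step follows; the function
name is hypothetical, and the sketch ignores the septet alignment padding that
follows a UDH.

```python
def unpack_septets(octets, nsept):
    """Unpack nsept 7-bit GSM7 code points from packed octets."""
    septets = []
    acc = 0        # bit accumulator, new octets enter above the old bits
    nbits = 0      # number of valid bits currently held in acc
    for b in octets:
        acc |= b << nbits
        nbits += 8
        while nbits >= 7 and len(septets) < nsept:
            septets.append(acc & 0x7F)
            acc >>= 7
            nbits -= 7
    return septets
```

For example, the five packed octets E8 32 9B FD 06 unpack to the septets
68 65 6C 6C 6F, which spell "hello".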

Composing and sending outgoing SMS
==================================

When used in the default PDU mode (which now works on all targets with our
current firmware and tools), the primary SMS sending/writing tool fcup-smsend
offers the following capabilities:

* Sending outgoing messages in either GSM7 or UCS-2 encoding;
* Sending either single or long (concatenated) SMS;
* Message body input in ASCII, ISO 8859-1 or UTF-8;
* Message body input either on the command line or on stdin;
* Any messages sent through this tool (single or concatenated) may be
  multiline, i.e., may contain embedded newlines;
* Messages sent in GSM7 encoding can contain ASCII characters [\]^ and {|}~
  - the tool is smart enough to do the necessary escape encoding.
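
The escape encoding mentioned in the last item works as follows: in the GSM
03.38 default alphabet each of these characters is sent as the escape septet
0x1B followed by a second septet, so it occupies two septets of the message.
The Python sketch below shows the idea for ASCII-only input; it is not the code
of fcup-smsend, and the function and table names are hypothetical.

```python
# Second septet following 0x1B for each escaped ASCII character.
GSM7_ESCAPES = {
    "^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F,
    "[": 0x3C, "~": 0x3D, "]": 0x3E, "|": 0x40,
}
# ASCII characters whose GSM7 code differs from their ASCII code.
GSM7_SPECIAL = {"@": 0x00, "$": 0x02, "_": 0x11}

def ascii_to_gsm7(text):
    """Convert ASCII text to a list of GSM7 septets (unpacked)."""
    septets = []
    for ch in text:
        if ch in GSM7_ESCAPES:
            septets.extend([0x1B, GSM7_ESCAPES[ch]])
        elif ch in GSM7_SPECIAL:
            septets.append(GSM7_SPECIAL[ch])
        elif (ch in "\n\r" or " " <= ch <= "?"
                or "A" <= ch <= "Z" or "a" <= ch <= "z"):
            septets.append(ord(ch))       # same code in ASCII and GSM7
        else:
            raise ValueError("not representable in GSM7: %r" % ch)
    return septets
```

Note that the backquote character has no GSM7 equivalent and is rejected by
this sketch.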

The default and preferred AT command interface mode for sending/writing SMS is
PDU mode, which works great when the GSM device provides a proper AT command
interface.  However, when a message of maximum or near-maximum length is being
submitted to the modem in PDU mode, the hex string that needs to be sent is
quite long, and at the time when our FC User Phone Tools were first designed
and written, our AT-over-RVTMUX mechanism could not handle such long strings.
Because we sought to have at least limited SMS sending and writing support for
crippled Motorola and Pirelli targets, we also implemented text mode support in
fcup-smsend and fcup-smsendmult, enabled with the -t option.  In this text (-t)
mode the following restrictions apply:

* Only single SMS can be sent, not concatenated;
* Only GSM7-encoded messages can be sent, not UCS-2;
* No multiline messages can be sent, i.e., no newlines in the message body;
* ASCII characters [\]^ and {|}~ won't be sent correctly - GSM 07.05 text mode
  drops them.

Now that we have extended our AT-over-RVTMUX mechanism to support longer strings
and gained full support for PDU mode on all targets, the above -t mode is no
longer necessary for any use case, as the default PDU mode is a proper superset
in functionality.  However, support for this -t mode has been retained, as
removing software functionality for no good reason is not the way of FOSS.

The invocation syntax is as follows:

fcup-smsend [options] dest-addr [message]

The destination address must be given on the command line; the address digits
may be optionally followed by a comma and an address type byte, either decimal
or hexadecimal with 0x prefix.  The default address type is 0x91 if the number
begins with a '+' or 0x81 otherwise.  If the message body is given on the
command line, it must be given as a single argument; if no message body argument
is given, the message body will be read from stdin.  Any trailing newlines are
stripped before SMS encoding.
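
The destination address rules can be sketched in Python as shown below.  This
is an illustration only: the function name is hypothetical, no digit validation
is done, and Python's int() with base 0 is slightly more permissive than
"decimal or hexadecimal with 0x prefix" (it also accepts 0o and 0b prefixes).

```python
def parse_dest_addr(arg):
    """Split 'digits[,type]' and apply the default address type."""
    digits, sep, type_str = arg.partition(",")
    if sep:
        addr_type = int(type_str, 0)      # decimal, or hex with 0x prefix
        if not 0 <= addr_type <= 0xFF:
            raise ValueError("address type must fit in one byte")
    elif digits.startswith("+"):
        addr_type = 0x91                  # default for numbers with '+'
    else:
        addr_type = 0x81                  # default otherwise
    return digits, addr_type
```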

The following options are supported, in addition to the common target interface
options listed earlier:

-c		Enables concatenated SMS.  A concatenated SMS is sent only if
		the message body exceeds 160 GSM7 or 70 UCS-2 characters;
		shorter messages are sent as plain single SMS whether or not
		-c is given.

-C refno	Enables concatenated SMS like -c, but also explicitly sets the
		concatenated SMS reference number to be used.  The number can
		be either decimal or hexadecimal with 0x prefix.

-q		Concatenated SMS quiet mode.  If -c is given without -q, the
		tool prints a message on stdout indicating whether the message
		was sent as single or concatenated, and in how many parts.
		-q suppresses this additional output.

-t		Use text mode instead of PDU mode on the AT command interface.
		This option is incompatible with -c and with -U, and introduces
		other restrictions listed above.

-u		By default, if the message body input contains any 8-bit
		characters, they are interpreted as ISO 8859-1.  With -u they
		are interpreted as UTF-8 instead.  This option is only relevant
		for GSM7 output encoding, and it is implemented by converting
		the input first from UTF-8 to 8859-1, and then from 8859-1 to
		GSM7 - thus all UTF-8 input characters must fall into the
		8859-1 repertoire, and it is not currently possible to send
		GSM7-encoded messages containing the few Greek letters or the
		Euro currency symbol allowed by GSM 03.38 encoding.

-U		Send message in UCS-2 encoding instead of GSM7.  Any 8-bit
		characters in the message body input are interpreted as UTF-8,
		and the entire Basic Multilingual Plane of Unicode is allowed.

-w		By default the outgoing message is sent out on the GSM network
		with the AT+CMGS command.  With this -w option, the message is
		first written into SIM or future ME SMS storage with AT+CMGW,
		then sent out on the GSM network with AT+CMSS.

-W		Write only, not send: the message is written into storage with
		AT+CMGW and no further action is taken.  The modem's +CMGW:
		responses with message storage indices are forwarded to stdout.
		With this option the destination address argument can be a null
		string or omitted altogether.
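
To illustrate the -c behaviour described in the option list above, here is a
Python sketch of the single-versus-concatenated decision.  The per-part
capacities of 153 GSM7 septets and 67 UCS-2 characters assume the usual
concatenation UDH with an 8-bit reference number; treating an overlong message
without -c as an error is likewise an assumption of this sketch, and the
function name is hypothetical.

```python
def plan_parts(length, ucs2=False, concat_enabled=False):
    """Return the number of SMS parts needed for a message.

    length is counted in GSM7 septets (escaped characters count as two)
    or in UCS-2 characters when ucs2 is true.
    """
    single_max, part_max = (70, 67) if ucs2 else (160, 153)
    if length <= single_max:
        return 1                          # plain SMS, with or without -c
    if not concat_enabled:
        raise ValueError("message too long for a single SMS")
    return -(-length // part_max)         # ceiling division
```

A real implementation must also avoid splitting a GSM7 escape pair across two
parts, which this sketch does not address.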

Concatenated SMS reference numbers
----------------------------------

Every concatenated SMS transmission needs a reference number, and this number
needs to increment from one concatenated SMS to the next, to help message
recipients sort out which is which.  If the reference number is not given
explicitly with -C, fcup-smsend creates (opens with O_RDWR|O_CREAT) a file
named .concat_sms_refno in the invoking user's $HOME directory; automatically
incrementing reference numbers are maintained in this file.  The initial seed
is an XOR of all bytes of the current time returned by gettimeofday(2),
followed by simple linear incrementing; these reference numbers do not need to
be random in any kind of cryptographically secure sense.
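
The seeding and incrementing scheme can be sketched in Python as follows.  The
sketch assumes an 8-bit reference number and models the time value as two
64-bit integers (seconds and microseconds); the actual byte layout seen by the
C code depends on the platform's struct timeval, and the format of the
.concat_sms_refno file is not covered here.  All function names are
hypothetical.

```python
import struct
import time

def xor_fold(data):
    """XOR all bytes of data together into a single byte value."""
    acc = 0
    for b in data:
        acc ^= b
    return acc

def initial_refno(now=None):
    """Seed value: XOR of all bytes of the current time."""
    now = time.time() if now is None else now
    sec = int(now)
    usec = int((now - sec) * 1000000)
    return xor_fold(struct.pack("<qq", sec, usec))

def next_refno(refno):
    """Simple linear increment, wrapping at 8 bits (assumed width)."""
    return (refno + 1) & 0xFF
```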

fcup-smsendmult
===============

As an alternative to sending concatenated SMS, one can use the fcup-smsendmult
utility to send several single (no UDH) messages in one batch.  This utility
supports both text and PDU modes (PDU mode is still the preferred default when
it can be used), and when PDU mode is used, it supports both GSM7 and UCS-2
output encodings just like fcup-smsend.  The messages to be sent are read from
stdin, and each input line produces a new message.

The entire batch of messages can be sent to a single recipient, or each message
in the batch can have its own individual destination address.  If the
destination address is given on the command line, each input line read from
stdin is just a message body; if no destination address is given on the command
line, each input line must have the following format:

<dest addr><white space><message body>
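
The two input modes can be illustrated with a short Python sketch.  The
function name is hypothetical, and the handling of malformed lines (raising an
error) is an assumption of this sketch, not documented behaviour of
fcup-smsendmult.

```python
def parse_batch_line(line, fixed_dest=None):
    """Return (dest_addr, body) for one input line.

    fixed_dest is the destination address from the command line,
    or None if each line carries its own address.
    """
    line = line.rstrip("\n")
    if fixed_dest is not None:
        return fixed_dest, line           # whole line is the message body
    parts = line.split(None, 1)           # split at first white space run
    if len(parts) != 2:
        raise ValueError("expected <dest addr><white space><message body>")
    return parts[0], parts[1]
```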

The -t, -u, -U, -w and -W command line options work the same way as in
fcup-smsend.

This fcup-smsendmult method of sending batched SMS was originally envisioned as
an alternative to concatenated SMS for crippled hw targets that couldn't support
sending SMS in PDU mode, but that limitation has now been lifted.  Because we
do not remove already-implemented functionality for no good reason, the tool
currently remains in search of new potential use cases.

fcup-smsendpdu
==============

This utility sends out SMS PDUs that have been prepared externally; it only
works in PDU mode - originally it was limited to high-end FreeCalypso hardware
with a full AT command interface, but now we've got PDU mode working on all
targets.  The PDUs to be sent out are read from stdin, one long hex string PDU
per line; one can send either a single message or a batch.  Because the
destination address and all content details are encoded in the PDU, the tool
does not care if the messages are going to the same recipient or to different
recipients, nor does it care if they constitute a concatenated SMS transmission
or not.  -w and -W options work the same way as in fcup-smsend and
fcup-smsendmult.

fcup-smwrite
============

This utility is a debug and development tool; it differs from fcup-smsendpdu in
the following ways:

* fcup-smsendpdu can send messages out with AT+CMGS, write them into memory
  with AT+CMGW, or do a write-then-send sequence (-w option) with AT+CMGW
  followed by AT+CMSS.  fcup-smwrite only issues AT+CMGW commands.

* fcup-smwrite passes a second argument to AT+CMGW that sets the message state
  to any of the possible 4 values; fcup-smsend* -W put them in the "stored
  unsent" state.

* The input to fcup-smsendpdu is just PDU hex strings; the input to
  fcup-smwrite needs to have the same format as fcup-smdump output in order to
  indicate what state each message should be written in.