view doc/Arch-design @ 2:b203ebebe9b3

doc/Arch-design: fill out sections 2.4.[2-5]
author Mychaela Falconia <falcon@freecalypso.org>
date Fri, 22 Dec 2023 06:38:16 +0000

Themyscira Wireless SMSC implementation
Architectural design specification

1. Purpose and scope of the software

The purpose of the present software project is to facilitate store-and-forward
SMS exchange among the following parties:

* Locally owned mobile telephone numbers (LOMTNs) that belong to Themyscira
  Wireless, with Short Message Service accessed either via the local GSM network
  (Osmocom-based) or via direct command line access to the SMSC;

* The outside world: the total set of all SMS-capable E.164 telephone numbers
  in the world, with whom our users must be able to freely exchange SMS just
  like users of any other cellular phone carrier in USA;

* USA-specific 5-digit and 6-digit short codes: these services aren't accessible
  from anywhere in the world, only from USA (each country has its own services
  of this type), but because we are located in USA, we must provide the same
  access to public services as any other cellular phone carrier;

* Any downstream parties who enter into an interconnection agreement with ThemWi
  for the purpose of sharing our SMS uplink to the outside world.

1.1. NANP specifics

The design of our SMSC makes the following assumptions that are specific to
North American Numbering Plan:

* All LOMTNs and all downstream peer MTNs are expected to be NANP numbers;
  any/all SMS source or destination numbers in country codes other than +1 are
  treated as belonging in the Outside World, accessible only via the SMPP
  "uplink" connection to our upstream SMS connectivity provider.

* The set of SMS destination numbers that can be sent to the upstream includes
  not only non-NANP and not-locally-known NANP E.164 numbers, but also any/all
  SMS short codes in USA-specific NXXXX or NXXXXX format.

* In the case of Mobile-Originated SMS from the local GSM network, if the
  user-entered destination number is not explicitly international (TON=1) and
  does not fit the format of a USA SMS short code, other USA-customary dialing
  formats are supported, as in 10-digit NPANXXXXXX or 11-digit 1NPANXXXXXX
  without '+' prefix.
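To make these formats concrete, here is a minimal C sketch of how an MO destination might be classified and normalized to the 10-digit NPANXXXXXX form. The enum and function names are hypothetical (this is not the themwi-nanp API). The input is assumed to be the bare address digits plus the TON value, and only the most basic NANP sanity check is shown (NPA and NXX both beginning with a digit 2-9), not full NANP validation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical classification of a user-entered destination */
enum dest_format {
	DEST_INVALID,
	DEST_INTL,	/* TON=1, country code other than +1 */
	DEST_NANP,	/* NANP number, normalized to NPANXXXXXX */
	DEST_SHORTCODE,	/* USA short code, NXXXX or NXXXXX */
};

static bool all_digits(const char *s)
{
	if (!*s)
		return false;
	for (; *s; s++)
		if (!isdigit((unsigned char) *s))
			return false;
	return true;
}

/* 'N' in NANP notation: any digit 2 through 9 */
static bool is_n(char c)
{
	return c >= '2' && c <= '9';
}

/* Minimal plausibility check only: NPA and NXX must start with N */
static bool nanp_plausible(const char *d10)
{
	return is_n(d10[0]) && is_n(d10[3]);
}

/* On DEST_NANP, the 10-digit form (plus NUL) is copied into nanp_out */
static enum dest_format classify_mo_dest(const char *digits, int ton,
					 char nanp_out[11])
{
	size_t len = strlen(digits);

	if (!all_digits(digits))
		return DEST_INVALID;
	if (ton == 1) {		/* explicitly international */
		if (digits[0] != '1')
			return DEST_INTL;
		if (len != 11 || !nanp_plausible(digits + 1))
			return DEST_INVALID;
		memcpy(nanp_out, digits + 1, 11);
		return DEST_NANP;
	}
	if ((len == 5 || len == 6) && is_n(digits[0]))
		return DEST_SHORTCODE;
	if (len == 10 && nanp_plausible(digits)) {
		memcpy(nanp_out, digits, 11);
		return DEST_NANP;
	}
	if (len == 11 && digits[0] == '1' && nanp_plausible(digits + 1)) {
		memcpy(nanp_out, digits + 1, 11);
		return DEST_NANP;
	}
	return DEST_INVALID;
}
```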

themwi-nanp software package is a strict dependency for themwi-smsc: themwi-nanp
utilities must be used to manage the database of locally owned NANP numbers,
and the present software uses themwi-nanp libraries to access that database.

1.2. Hierarchical arrangement of upstream and downstream peers

The telecom landscape in USA is such that anyone can obtain 10-digit telephone
numbers (TNs) very easily and very cheaply, but making them SMS-capable (able
to function as Mobile Telephone Numbers or MTNs) is much more difficult.
Suitably equipped providers such as Bandwidth.com are generally unwilling to
provide service directly to small customers, and we (Themyscira Wireless team)
were able to find only one company (Sopranica Telecom) who buys P2P SMS
interconnection service from Bandwidth and was willing to resell to us.

Suppose that many different ultra-small parties wish to set up their own indie
GSM networks in different parts of USA.  Each of these tiny fiefdoms can serve
as its own administration and get its own TNs from a provider such as BulkVS.
How would all of these tiny fiefdoms then add SMS capability?  The feedback we
got from Sopranica is that asking them to set up a sub-account on their
Bandwidth service for each microfiefdom would be too much work - hence San Diego
2G Association (the primary instance of Themyscira Wireless) will need to serve
as a third-level reseller, getting Bandwidth SMS interconnection service from
Sopranica and then further subletting it to other microfiefdoms.

Vertical hierarchy support in ThemWi-SMSC is designed for the use case just
described.  Each SMSC instance has a set of locally owned mobile TNs
(LOMTNs, owned by the local fiefdom operating this SMSC instance), a single
upstream SMPP link pointing up the hierarchy tree (toward the Outside World)
and any number of downstream SMPP links to downstream peers.  The total set of
phone numbers known to each SMSC instance is its own local set (themwi-nanp
database of locally owned TNs) plus the set of numbers assigned to downstream
peers - all other E.164 numbers everywhere in the world (plus all non-E.164 USA
SMS short codes) belong in the Outside World and are sent to the "uplink"
connection.  Messages are then routed as follows:

* Any SM originating from a local GSM subscriber can go to another GSM
  subscriber, to a known downstream peer or to the Outside World.

* Any SMs injected directly into the SMSC via local shell access are treated
  the same way as Mobile-Originated SMS from local GSM users - hence this
  mechanism can be used to send SMS to the local GSM network or to the
  Outside World.

* Any SM coming from the uplink connection can be addressed to a local GSM
  subscriber or to a downstream peer - but either way it must be a number known
  to this SMSC, otherwise something is badly misconfigured somewhere.

* Any SM coming from a downlink connection can go to a local GSM subscriber, to
  a different downstream peer or to the Outside World.
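The four routing rules above amount to a small permission matrix between message sources and destination classes. The following C sketch encodes it. The SME_CLASS_* names are borrowed from section 2.4.1, but their numeric values, the enum sm_source names and the peer-identity parameters are assumptions made for illustration; rejecting a downstream peer's message addressed to that same peer is one reading of "a different downstream peer" above.

```c
#include <assert.h>
#include <stdbool.h>

/* Destination classes named as in section 2.4.1; numeric values assumed */
enum sme_class {
	SME_CLASS_LOCAL,	/* write into PMS only */
	SME_CLASS_GSM,
	SME_CLASS_DOWNSTREAM,
	SME_CLASS_UPSTREAM,
};

/* Hypothetical names for the four ways an SM can enter the SMSC */
enum sm_source {
	SRC_GSM_MO,
	SRC_SHELL,	/* local shell, treated like GSM MO */
	SRC_UPLINK,
	SRC_DOWNLINK,
};

/*
 * src_peer and dest_peer identify downstream peers; they matter only
 * for SRC_DOWNLINK and SME_CLASS_DOWNSTREAM respectively.
 */
static bool route_permitted(enum sm_source src, int src_peer,
			    enum sme_class dest, int dest_peer)
{
	switch (src) {
	case SRC_GSM_MO:
	case SRC_SHELL:
		return true;
	case SRC_UPLINK:
		/* must be a number known to this SMSC: never back up */
		return dest != SME_CLASS_UPSTREAM;
	case SRC_DOWNLINK:
		/* local, a different downstream peer, or Outside World */
		return !(dest == SME_CLASS_DOWNSTREAM && dest_peer == src_peer);
	}
	return false;
}
```

The per-number uplink permission check of section 2.4.2 is a separate admission step and is not modeled here.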

1.2.1. Direction of SMPP connections

Despite the name "Short Message Peer to Peer", SMPP is an asymmetric client-
server protocol, not symmetric peer-to-peer.  Our primary, above-all-else
requirement when it comes to SMPP is to connect to the "big daddy" SMSC of
Bandwidth.com, the one that allows us to receive SMS from and send SMS to
anywhere in the Outside World.  BW requires that we connect to their SMSC server
in the role of an SMPP client and bind as a bidirectional transceiver - both
message directions then flow over this single long-lived TCP connection from our
client to their server.
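For readers new to SMPP, this C sketch shows what binding as a transceiver means on the wire, assuming SMPP v3.4: the client opens one TCP connection and sends a single bind_transceiver PDU (command_id 0x00000009), after which both message directions share that session. The function names and credentials are placeholders, not ThemWi-SMSC code; a real client would also wait for bind_transceiver_resp and handle enquire_link keepalives, neither of which is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SMPP_BIND_TRANSCEIVER	0x00000009u
#define SMPP_VERSION_34		0x34

/* SMPP integers are big-endian */
static void put_u32(uint8_t *p, uint32_t v)
{
	p[0] = v >> 24;
	p[1] = v >> 16;
	p[2] = v >> 8;
	p[3] = v;
}

/* Append a C-Octet String, including its NUL terminator */
static size_t put_cstr(uint8_t *p, const char *s)
{
	size_t n = strlen(s) + 1;

	memcpy(p, s, n);
	return n;
}

/*
 * Build a bind_transceiver PDU into buf.  98 octets suffice when the
 * strings respect their SMPP maximum lengths (16 + 16 + 9 + 13 + 3 + 41).
 * Returns the total PDU length.
 */
static size_t smpp_build_bind_trx(uint8_t *buf, uint32_t seq,
				  const char *system_id, const char *password,
				  const char *system_type)
{
	size_t len = 16;	/* skip header, filled in last */

	len += put_cstr(buf + len, system_id);
	len += put_cstr(buf + len, password);
	len += put_cstr(buf + len, system_type);
	buf[len++] = SMPP_VERSION_34;	/* interface_version */
	buf[len++] = 0;			/* addr_ton */
	buf[len++] = 0;			/* addr_npi */
	len += put_cstr(buf + len, "");	/* address_range */

	put_u32(buf, (uint32_t) len);			/* command_length */
	put_u32(buf + 4, SMPP_BIND_TRANSCEIVER);	/* command_id */
	put_u32(buf + 8, 0);				/* command_status */
	put_u32(buf + 12, seq);				/* sequence_number */
	return len;
}
```

For example, system_id "abc", password "pw" and an empty system_type produce a 28-octet PDU.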

This externally imposed requirement dictates the entire architectural design of
ThemWi-SMSC with respect to SMPP.  Each instance of ThemWi-SMSC can have a
single upstream peer to whom we connect in the role of an SMPP client, and it
can optionally act as an SMPP server accepting TCP connections from downstream
peers.  The master instance of ThemWi-SMSC at smsc.sandiego2g.org will point
its "upstream" link at Bandwidth.com SMPP server, using credentials given to us
by Sopranica, whereas other small fiefdoms who wish to join our service resale
tree will point the "upstream" link of their ThemWi-SMSC instances to
smsc.sandiego2g.org, and we (SD2G) will assign them authentication credentials
and manage their downstream number pools.

1.3. Possible use outside of originally intended North American use case

If your situation and/or interests do not match the very specific use case for
which the present software is designed (if you are located outside of North
America, and/or you have no interest in attaining SMS interconnection with the
national mobile telephony environment of whichever country you call home), you
can still play with the present implementation of GSM-oriented SMSC: the uplink
connection to the Outside World can be omitted, and if you don't have real TNs
(telephone numbers) in North American Numbering Plan (either because you are
outside of North America or because you are in NA but not interested in official
phone network interconnection), you can operate ThemWi-SMSC (plus the attached
Osmocom GSM network) with fake NANP numbers instead.

To be clear, this support for modes of usage outside of the primary design goals
of ThemWi-SMSC is intended only to facilitate "play" and evaluation (getting a
feel for what may be the first SMSC implementation connecting to Osmocom CNI
via GSUP), not for serious long-term usage.  If your actual desired use case is
an isolated GSM network with a totally ad hoc or "free" numbering plan (the
default which one gets with a "vanilla" installation of Osmocom CNI), or a GSM
network that is interconnected with the national mobile telephony environment
of some country other than USA, you need a different SMSC design that is
tailored for your numbering plan (free-form or non-USA national) that will be
different from NANP, and for local telecom environment quirks that will almost
certainly be different from those in USA.

If you like the general idea and overall design of ThemWi-SMSC, but require an
adaptation to a different numbering plan or a different telecom environment
(isolated or a national interconnect in some other country), you should be able
to take the present code base and modify just the numbering plan aspects,
producing a derivative-work SMSC for your different needs.

2. ThemWi-SMSC software architecture

2.1. Modularity of components

A complete deployment of ThemWi-SMSC, as in our own use case at Themyscira
Wireless, includes a local GSM network (Osmocom-based) and a connection to the
hierarchical SMPP tree that eventually leads to the Outside World SMS
connectivity provider at the top.  However, our software implementation will be
modular, divided into separate software components for:

* The internal core of the SMSC (one daemon process and some command line
  utilities);

* A pair of daemon processes devoted to the task of connecting the SMSC to the
  local Osmocom-based GSM network, to be omitted if you don't have one;

* A dedicated daemon process serving the SMPP link to the upstream peer, to be
  omitted if you have no upstream link;

* Another dedicated sw component serving downstream peer SMPP connections, one
  process instance per downstream peer, or none if you have no such peers.

This modularity allows the software to be used and (hopefully) appreciated
outside of its primary intended use case.  At one extreme, someone could have
an isolated Osmocom GSM network, modify it slightly to use MSISDNs that look
like (fake) NANP numbers, hook up ThemWi-SMSC and use this SMSC as a replacement
for the Osmocom-default one, paving the way for factoring the SMSC function out
of OsmoMSC.  At the other extreme, if someone is located in USA and wishes to
interconnect to the world of SMS through the chain of 3 resellers (Bandwidth
followed by Sopranica followed by San Diego 2G Association), they can run an
instance of ThemWi-SMSC without any GSM network at all.  (You will still need
Osmocom libraries, but no Osmocom processes and no hardware.)  In such a
deployment, all incoming SMS to your number(s) will be written into the
persistent store which you can read, and you can send outgoing SMS with a
command line utility.

2.2. Persistent message store

Every SM that passes through ThemWi-SMSC gets written into an append-only
persistent message store (PMS).  Because this store is append-only, no messages
are ever deleted - however, each message in PMS can be in one of two states:
active or historical.  An active SM is one for which the SMSC still needs to
make delivery attempts, either attempts at GSM MT delivery or attempts at
delivery to the appropriate upstream or downstream SMPP peer.  A historical SM
is one for which no further action will be taken by any component of our SMSC.
An SM can enter "historical" state in several ways:

* For some LOMTNs the act of writing incoming messages into PMS constitutes
  final delivery in itself, and no other delivery actions are needed.  In this
  case a newly entered SM is directly written into PMS in the "historical"
  state, without ever going through "active".

* For messages that need to be delivered to a GSM MS or to an SMPP peer, once
  that delivery has been made successfully, the message transitions from active
  to historical.

* In the case of failed deliveries (permanent error, or expiration time reached
  after repeated temporary failures), the failed message also transitions from
  active to historical.

The persistent message store is a simple binary file (/var/sms/pms.bin)
consisting of directly abutted 'struct sm_record' records.  Each message record
is exactly 256 bytes (see struct definition - we were able to fit everything we
needed under the 256 byte mark, and then padded the struct to perfect round
size), and this perfect power-of-2 record size makes it very easy to perform
operations such as binary search via mmap or stripping initial megabytes of
historical records - see subsequent sections for more detailed description.
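A sketch of what such a fixed 256-byte record could look like in C. Only the member names time_entry, addr_to_orig, dest_imsi, disposition, time_disch and dest_extra_info come from this document; every type, size and ordering, and all other members (time_expiry, src_class, ud and so on), are assumptions for illustration only. The static assertions capture the two properties the text relies on: exactly 256 bytes per record, and hence exactly 4096 records per megabyte.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL layout; see the real ThemWi-SMSC headers */
struct sm_record {
	uint64_t	time_entry;		/* when the SM entered PMS */
	uint64_t	time_expiry;		/* entry time + effective VP */
	uint64_t	time_disch;		/* 0 while still active */
	uint8_t		src_class;
	uint8_t		dest_class;
	uint8_t		disposition;		/* 0 while still active */
	uint8_t		pid;
	uint8_t		dcs;
	uint8_t		udl;
	uint8_t		addr_from[12];		/* encoding assumed */
	uint8_t		addr_to_orig[12];
	uint8_t		dest_imsi[8];		/* filled after lookup */
	uint8_t		dest_extra_info[16];
	uint8_t		ud[140];		/* TP-UD */
	uint8_t		pad[38];		/* round up to 256 bytes */
};

#define PMS_RECORDS_PER_MB	((1u << 20) / sizeof(struct sm_record))

static_assert(sizeof(struct sm_record) == 256, "record must be 256 bytes");
static_assert(PMS_RECORDS_PER_MB == 4096, "4096 records per megabyte");

/* Power-of-2 record size: index to file offset is a shift */
static inline uint64_t pms_offset(uint64_t idx)
{
	return idx << 8;	/* idx * 256 */
}
```

Fixed-width uint64_t timestamps rather than time_t would keep the on-disk layout independent of the platform's time_t size.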

PMS is append-only as already stated, but already-written records do not become
fully immutable until they become historical.  For as long as a given SM is in
the active state, themwi-smsc-core daemon can and will update that record in
pms.bin:

* For messages addressed to local GSM subscribers, dest_imsi will be filled
  when the MSISDN-to-IMSI lookup operation on the destination number succeeds;

* Upon discharge (successful delivery, permanent error or validity period
  expiration after temporary failures), themwi-smsc-core will transition the
  sm_record into historical state by filling disposition and time_disch struct
  members;

* Additional info may be written into dest_extra_info upon discharge, depending
  on the destination type and thus the mode of final delivery.
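The in-place update rule above can be sketched with pwrite(2) at offset index * 256, so records are rewritten where they sit and the file never shrinks or reorders. Purely for illustration, this sketch assumes that a zero disposition means "still active"; the reduced struct layout and the names pms_rewrite() and pms_discharge() are hypothetical. The same pms_rewrite() call would also serve for filling dest_imsi after a successful lookup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL reduced layout: only the members discussed above */
struct sm_record {
	uint64_t	time_entry;
	uint64_t	time_disch;		/* 0 while active (assumed) */
	uint8_t		disposition;		/* 0 while active (assumed) */
	uint8_t		dest_imsi[8];
	uint8_t		dest_extra_info[16];
	uint8_t		rest[215];		/* all other members, elided */
};
static_assert(sizeof(struct sm_record) == 256, "256-byte records");

static bool sm_is_historical(const struct sm_record *rec)
{
	return rec->disposition != 0;
}

/* Rewrite record number idx in place in pms.bin */
static int pms_rewrite(int fd, uint64_t idx, const struct sm_record *rec)
{
	ssize_t n = pwrite(fd, rec, sizeof *rec, (off_t) (idx * sizeof *rec));

	if (n != (ssize_t) sizeof *rec) {
		perror("pwrite to pms.bin");
		return -1;
	}
	return 0;
}

/* Discharge an active SM; its record becomes historical and immutable */
static int pms_discharge(int fd, uint64_t idx, struct sm_record *rec,
			 uint8_t disposition, uint64_t now,
			 const uint8_t *extra, size_t extra_len)
{
	assert(!sm_is_historical(rec));
	assert(disposition != 0);
	if (extra_len > sizeof rec->dest_extra_info)
		return -1;
	if (extra_len)
		memcpy(rec->dest_extra_info, extra, extra_len);
	rec->disposition = disposition;
	rec->time_disch = now;
	return pms_rewrite(fd, idx, rec);
}
```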

Once an sm_record transitions into historical state, it is then immutable for
archival purposes; archives of historical messages can be kept for years or even
decades, depending on local administration policy.

2.2.1. Historical megabyte count

Given the simple binary structure of the main PMS file, each megabyte (2**20
bytes) holds exactly 4096 messages.  It is envisioned that as a busy SMSC runs
for a long time, a significant number of historical messages will accumulate,
and the content of PMS may become many megabytes of historical messages followed
by some active SMs at the end.  When themwi-smsc-core daemon restarts, it has
to read the entire PMS in order to collect all still-active SMs.  Having to
read through many megabytes of historical SMs to get to active ones at the end
becomes unacceptable at large archive sizes, hence a mechanism is needed for
marking where the historical-only portion ends and the possibly-active portion
begins.

There will be an auxiliary file named historical-mb, containing a single ASCII
line giving the number of historical megabytes in pms.bin.  If this file reads
1, the first 4096 SM records are historical; if it reads 2, the first 8192 SM
records are historical, and so forth.  This auxiliary file will be used as
follows:

* Upon startup, themwi-smsc-core will read this historical-mb file and skip that
  many initial megabytes of pms.bin;

* At run time, themwi-smsc-core will track the index of the oldest still-active
  SM in PMS.  Whenever this index crosses a megabyte boundary, historical-mb
  will be updated.
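The historical-mb bookkeeping reduces to integer arithmetic on record indices. The C sketch below assumes that a missing historical-mb file means 0, and it updates the file by writing a temporary copy and renaming it over the original (atomic on POSIX filesystems). Both choices, and all function names, are assumptions rather than documented behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SM_RECORD_SIZE		256u
#define PMS_RECORDS_PER_MB	((1u << 20) / SM_RECORD_SIZE)	/* 4096 */

/* Byte offset in pms.bin at which the startup scan begins */
static uint64_t pms_scan_start(unsigned historical_mb)
{
	return (uint64_t) historical_mb << 20;
}

/*
 * Whole megabytes lying entirely before the oldest still-active SM.
 * With no active SMs at all, pass the total record count instead.
 */
static unsigned historical_mb_for(uint64_t oldest_active_idx)
{
	return (unsigned) (oldest_active_idx / PMS_RECORDS_PER_MB);
}

/* Read historical-mb; a missing or unparsable file counts as 0 (assumed) */
static unsigned read_historical_mb(const char *path)
{
	FILE *f = fopen(path, "r");
	unsigned mb = 0;

	if (!f)
		return 0;
	if (fscanf(f, "%u", &mb) != 1)
		mb = 0;
	fclose(f);
	return mb;
}

/* Replace historical-mb: write a temp file, then rename it over */
static int write_historical_mb(const char *path, unsigned mb)
{
	char tmp[4096];
	FILE *f;

	if (snprintf(tmp, sizeof tmp, "%s.new", path) >= (int) sizeof tmp)
		return -1;
	f = fopen(tmp, "w");
	if (!f)
		return -1;
	fprintf(f, "%u\n", mb);
	if (fclose(f) != 0)
		return -1;
	return rename(tmp, path) == 0 ? 0 : -1;
}
```

The daemon would call write_historical_mb() only when historical_mb_for() returns a larger value than the one last written.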

2.2.2. Offline storage

Even with the historical-mb mechanism of the previous section, the fact remains
that disk space on live servers is not infinite.  If the archive of historical
messages grows so big that it needs to be removed from the SMSC server to free
up disk space, one can carry out the following procedure:

* Temporarily stop themwi-smsc-core daemon at the level of runit or systemctl
  or whatever you are using - this operation will bring down the entire SMSC,
  so do it during a scheduled maintenance window;

* Use dd to split pms.bin into historical and active portions, where N is the
  number of historical megabytes given in historical-mb:

  dd if=pms.bin of=pms-hist.bin bs=1048576 count=N
  dd if=pms.bin of=pms-new.bin bs=1048576 skip=N

* Move pms-hist.bin to offline storage;

* Replace the long file with the shortened one:

  mv pms-new.bin pms.bin
  echo 0 > historical-mb

* Re-enable themwi-smsc-core and restart all other SMSC daemons.

2.2.3. themwi-smsc-dump reading tool

The program named themwi-smsc-dump will be a standalone command line utility
(fully static in its operation, not talking to any daemons or services) for
reading and parsing (decoding) pms.bin.  It will open pms.bin with O_RDONLY, do
a read-only mmap on it, and then access this PMS as a memory-mapped file.
Several different modes of operation will be provided:

* It will be possible to dump and decode the entire PMS, as needed during early
  debugging.

* It will be possible to specify a starting date/time at which the dump should
  begin.  As records are added in strict forward chronological order, it is
  possible to find a record nearest (by time_entry timestamp) to a given time
  point by binary search, very efficient on a memory-mapped file.

* Once the dump has a starting point (beginning of the file or a time point
  found by binary search), the tool can be told to dump till the end, display
  some count of messages, or run until a certain ending date/time is crossed.

* The tool can dump all message records in the selected range, or only those
  matching specific filters such as a particular source or destination type, or
  a specific phone number.
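The time-based starting point described above is a standard lower-bound binary search over the mapped array of records. This C sketch maps pms.bin read-only with mmap(2) and finds the first record whose time_entry is at or after the requested time. It assumes, hypothetically, that time_entry is an integer timestamp stored as the first member; the function names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Only time_entry matters here; the rest of the record is elided */
struct sm_record {
	uint64_t	time_entry;
	uint8_t		rest[248];
};
static_assert(sizeof(struct sm_record) == 256, "256-byte records");

/* Map pms.bin read-only; returns NULL on error or for an empty file */
static const struct sm_record *pms_map(const char *path, size_t *nrec)
{
	struct stat st;
	void *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0 || st.st_size < (off_t) sizeof(struct sm_record)) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping remains valid after close */
	if (p == MAP_FAILED)
		return NULL;
	*nrec = (size_t) st.st_size / sizeof(struct sm_record);
	return p;
}

/* Index of the first record with time_entry >= t, or nrec if none */
static size_t pms_find_time(const struct sm_record *pms, size_t nrec,
			    uint64_t t)
{
	size_t lo = 0, hi = nrec;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pms[mid].time_entry < t)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```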

The complexity described above is needed for the following reasons:

* One radical idea is to grant limited access (by way of a very strict wrapper)
  to themwi-smsc-dump to unprivileged users of the network served by the SMSC,
  i.e., to end users.  The idea is that each individual user should be able to
  give their ssh public key to the administrator of the community network, and
  then ssh into a special restricted service on the SMSC that does not grant
  any system shell access, but allows them to access services under their own
  phone number.  Such an empowered end user should be able to submit SMS from
  their own phone number using the power of a full-size computer (as opposed to
  very painful text entry on the numeric keypad of a traditional GSM phone),
  and to see a full log of all messages received by or sent from their own
  phone number.

* By the nature of her job, the administrator of the SMSC (and of the community
  GSM network to which this SMSC belongs) necessarily has access to every
  message that passes through the system, all metadata and actual content.
  While this access is technically necessary, an administrator who is worthy of
  her trusted position must not abuse this trust, and must do everything
  possible to avoid looking at users' private message content when it is not
  necessary to do so for technical troubleshooting reasons.  Toward this
  objective, themwi-smsc-dump must make it easy to look at only technically
  necessary information, without throwing unnecessary private info into the
  operator's eyeballs.
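The two motivations above suggest two small building blocks for themwi-smsc-dump: a per-user visibility filter, and an output mode that omits message content unless it is explicitly requested. The struct sm_view type and both function names are hypothetical; the actual options of themwi-smsc-dump are not specified in this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical decoded view of one PMS record */
struct sm_view {
	char from[21];		/* source number, as digits */
	char to[21];		/* destination number, as digits */
	unsigned pid, dcs;
	const char *text;	/* decoded message content */
};

/* Restricted per-user mode: only SMs sent from or to the user's number */
static bool visible_to_user(const struct sm_view *v, const char *user_number)
{
	return !strcmp(v->from, user_number) || !strcmp(v->to, user_number);
}

/* Metadata only, unless message content is explicitly requested */
static int format_record(char *buf, size_t size, const struct sm_view *v,
			 bool show_content)
{
	if (show_content)
		return snprintf(buf, size, "%s -> %s PID=%02X DCS=%02X: %s",
				v->from, v->to, v->pid, v->dcs, v->text);
	return snprintf(buf, size, "%s -> %s PID=%02X DCS=%02X",
			v->from, v->to, v->pid, v->dcs);
}
```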

2.3. themwi-smsc-core daemon operation

The core daemon (long-lived process) of ThemWi-SMSC is named themwi-smsc-core.
Aside from the read-only themwi-smsc-dump tool, themwi-smsc-core will be the only
software component that accesses pms.bin directly - all other components of
ThemWi-SMSC will connect to a UNIX domain local socket provided by
themwi-smsc-core.  In more detail, the core daemon will perform the following
functions:

* Read the potentially-active (not marked as historical-only) tail portion of
  PMS on startup, catch all still-active SMs and hold them in RAM-based data
  structures;

* Listen on a UNIX domain local socket of type SOCK_SEQPACKET, meaning
  connection- and message-oriented;

* Accept message submission (or entry) commands from other ThemWi-SMSC
  components connecting to this socket;

* Allow those socket-connecting SMSC components to register themselves as
  performing special roles (GSM network interface, IMSI resolver, uplink and
  downlink SMPP connection handlers), and send notification packets to those
  role-handlers when an active SM needs that type of processing;

* When these just-described role-handlers respond with success or failure of
  message handling, discharge the SM into historical state (either delivered or
  failed), or in one special case (successful completion of MSISDN-to-IMSI
  lookup) promote the SM from need-IMSI-lookup state into GSM-MT-delivery state.
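The listening side of the core daemon's local socket can be sketched with standard POSIX calls. The function name core_listen() is hypothetical, and the socket path is left to the caller because this document does not specify one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Create the listening SOCK_SEQPACKET socket, removing any stale
 * socket file left by a previous run.  Returns the fd, or -1 on error.
 */
static int core_listen(const char *path)
{
	struct sockaddr_un sa;
	int fd;

	if (strlen(path) >= sizeof sa.sun_path)
		return -1;
	memset(&sa, 0, sizeof sa);
	sa.sun_family = AF_UNIX;
	strcpy(sa.sun_path, path);

	fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
	if (fd < 0)
		return -1;
	unlink(path);
	if (bind(fd, (struct sockaddr *) &sa, sizeof sa) < 0 ||
	    listen(fd, 8) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Because SOCK_SEQPACKET preserves message boundaries, each recv() returns exactly one command packet, so neither side needs its own length-prefix framing; because the socket is connection-oriented, themwi-smsc-core still sees recv() return 0 when a role-handler disconnects.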

The key feature of themwi-smsc-core daemon is that it can stay up and running
even when all other ThemWi-SMSC daemon processes are shut down.  It won't be
particularly useful in this state, and won't be able to bring any outstanding
active SMs any closer toward delivery, but the key point is that dependency
graph arrows between sw components point in only one direction: the other
components depend on themwi-smsc-core, never the other way around.

2.4. Message entry paths

Every new SM enters the SMSC by way of one of our sw components making a local
socket connection to themwi-smsc-core and sending it a "submit new message"
command packet.  The following ThemWi-SMSC sw components will be able to enter
new SMs in this manner:

* A special command line utility named themwi-smsc-submit will perform just
  this function and nothing else;

* GSM network interface daemon themwi-smsc-gsmif will submit SMs received from
  GSM subscribers as MO messages;

* Upstream SMPP link handler themwi-smsc-uplink will submit SMs received from
  the upstream connection, i.e., from the outside world;

* Downstream SMPP link handlers will submit SMs received from downstream peers.

Most of the common processing functions, such as routing and validation steps,
will be performed by themwi-smsc-core.  Once all admission-time checks pass,
the new SM will be written into PMS, and if the destination is anything other
than write-into-PMS-only, the new active SM will also be added to the core
daemon's in-RAM data structures.  Further delivery steps will happen if and when
the appropriate role-handler connects to themwi-smsc-core and accepts messages
for processing.
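The message lifecycle implied by sections 2.2 through 2.4 can be summarized as a small state machine. The state and response names below are hypothetical. The sketch also assumes that a temporary failure leaves the SM in its current active state for a later retry, with validity-period expiry (not shown) eventually moving it to the failed state.

```c
#include <assert.h>
#include <stdbool.h>

/* Destination classes named as in section 2.4.1; numeric values assumed */
enum sme_class {
	SME_CLASS_LOCAL,
	SME_CLASS_GSM,
	SME_CLASS_DOWNSTREAM,
	SME_CLASS_UPSTREAM,
};

/* Hypothetical state names; the last two are historical */
enum sm_state {
	SMST_NEED_IMSI,		/* awaiting MSISDN-to-IMSI lookup */
	SMST_GSM_MT,		/* awaiting GSM MT delivery */
	SMST_SMPP_OUT,		/* awaiting delivery to an SMPP peer */
	SMST_DELIVERED,
	SMST_FAILED,
};

/* Hypothetical role-handler response codes */
enum sm_resp { RESP_OK, RESP_TEMP_FAIL, RESP_PERM_FAIL };

static bool sm_is_active(enum sm_state st)
{
	return st < SMST_DELIVERED;
}

/* State in which a newly admitted SM is written into PMS */
static enum sm_state sm_initial_state(enum sme_class dest, bool imsi_preset)
{
	switch (dest) {
	case SME_CLASS_LOCAL:
		return SMST_DELIVERED;	/* writing into PMS is final delivery */
	case SME_CLASS_GSM:
		/* IMSI may be prefilled by themwi-smsc-submit */
		return imsi_preset ? SMST_GSM_MT : SMST_NEED_IMSI;
	default:
		return SMST_SMPP_OUT;	/* upstream or downstream peer */
	}
}

/* Apply a role-handler's response to an active SM */
static enum sm_state sm_handle_response(enum sm_state st, enum sm_resp resp)
{
	assert(sm_is_active(st));
	switch (resp) {
	case RESP_TEMP_FAIL:
		return st;		/* retry later, until VP expiry */
	case RESP_PERM_FAIL:
		return SMST_FAILED;
	default:
		/* IMSI lookup success is the one non-discharging transition */
		return st == SMST_NEED_IMSI ? SMST_GSM_MT : SMST_DELIVERED;
	}
}
```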

2.4.1. Routing of Short Messages
 
For every incoming SM, themwi-smsc-core will apply routing based on the
destination address in addr_to_orig member of the submitted struct sm_record.
Referring to the general principles of section 1.1, this step is very specific
to the numbering plan (NANP) for which ThemWi-SMSC is designed.  The following
routing rules will be applied:

* If the destination number is international (TON=1) and the country code is
  anything other than +1, the destination is set to SME_CLASS_UPSTREAM.

* If the destination number is NANP, entered in international TON=1 format or
  in one of local-culture formats (10-digit NPANXXXXXX or 11-digit 1NPANXXXXXX,
  TON=0), NANP validation rules are applied and outright-invalid numbers are
  rejected.  The validated NANP number is looked up in themwi-nanp database of
  locally owned phone numbers; if the number is locally owned, the destination
  is either SME_CLASS_LOCAL or SME_CLASS_GSM, depending on how the number is
  assigned, or the message may be rejected if the locally-owned number is of a
  type that cannot receive SMS.  If there is no hit in the database of locally
  owned numbers, another number database gets a lookup, the one for numbers of
  downstream peers - a hit in that database will set the destination to
  SME_CLASS_DOWNSTREAM.  Finally, if the NANP destination number doesn't hit
  anywhere, the destination is SME_CLASS_UPSTREAM.

* If the destination number is a USA SMS short code of form NXXXX or NXXXXX,
  the destination is SME_CLASS_UPSTREAM.

* In the case of locally originated SMs only (coming from GSM MO or from
  themwi-smsc-submit command line utility), special 4-digit numbers may be
  defined in the number database of themwi-nanp that are meaningful only
  locally.  If one of those numbers matches, the destination is SME_CLASS_LOCAL
  or SME_CLASS_GSM according to the exact number type.

* If none of the above conditions match, the message is rejected as unroutable.
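The routing rules above can be sketched as a single C function. The number databases are replaced by simple arrays purely for illustration (the real lookups go through themwi-nanp and a downstream-peer number database whose interfaces are not described here). The input digits are assumed to be already checked to contain only decimal digits, locally owned NANP numbers are assumed to be stored as 10 digits, and NANP validation is reduced to checking that NPA and NXX begin with 2-9. All type and function names other than the SME_CLASS_* constants are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Destination classes named as in the text; numeric values assumed */
enum sme_class {
	SME_CLASS_LOCAL,
	SME_CLASS_GSM,
	SME_CLASS_DOWNSTREAM,
	SME_CLASS_UPSTREAM,
};
#define ROUTE_REJECT	(-1)

/* Hypothetical types of locally owned numbers */
enum num_type { NUM_UNKNOWN, NUM_LOCAL_PMS, NUM_LOCAL_GSM, NUM_NO_SMS };

struct local_number {
	const char *digits;	/* 10-digit NANP or 4-digit local number */
	enum num_type type;
};

/* Array-based stand-ins for the real number databases */
struct route_db {
	const struct local_number *local;
	size_t nlocal;
	const char *const *downstream;	/* 10-digit NANP numbers */
	size_t ndown;
};

static enum num_type db_local(const struct route_db *db, const char *digits)
{
	for (size_t i = 0; i < db->nlocal; i++)
		if (!strcmp(db->local[i].digits, digits))
			return db->local[i].type;
	return NUM_UNKNOWN;
}

static bool db_downstream(const struct route_db *db, const char *nanp10)
{
	for (size_t i = 0; i < db->ndown; i++)
		if (!strcmp(db->downstream[i], nanp10))
			return true;
	return false;
}

static bool is_n(char c)
{
	return c >= '2' && c <= '9';
}

static int local_class(enum num_type t)
{
	if (t == NUM_LOCAL_PMS)
		return SME_CLASS_LOCAL;
	if (t == NUM_LOCAL_GSM)
		return SME_CLASS_GSM;
	return ROUTE_REJECT;	/* unknown, or cannot receive SMS */
}

/* Returns an SME_CLASS_* value, or ROUTE_REJECT */
static int route_sm(const char *digits, int ton, bool locally_orig,
		    const struct route_db *db)
{
	size_t len = strlen(digits);
	const char *nanp = NULL;
	enum num_type t;

	if (len == 0)
		return ROUTE_REJECT;
	if (ton == 1) {
		if (digits[0] != '1')
			return SME_CLASS_UPSTREAM;	/* non-NANP country */
		if (len == 11)
			nanp = digits + 1;
	} else if (len == 10) {
		nanp = digits;
	} else if (len == 11 && digits[0] == '1') {
		nanp = digits + 1;
	} else if ((len == 5 || len == 6) && is_n(digits[0])) {
		return SME_CLASS_UPSTREAM;		/* USA short code */
	} else if (len == 4 && locally_orig) {
		return local_class(db_local(db, digits));
	}
	if (!nanp || !is_n(nanp[0]) || !is_n(nanp[3]))
		return ROUTE_REJECT;	/* unroutable or invalid NANP */
	t = db_local(db, nanp);
	if (t != NUM_UNKNOWN)
		return local_class(t);
	if (db_downstream(db, nanp))
		return SME_CLASS_DOWNSTREAM;
	return SME_CLASS_UPSTREAM;
}
```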

What is the difference between SME_CLASS_LOCAL and SME_CLASS_GSM destinations?
Answer: SME_CLASS_LOCAL means that writing the SM into PMS constitutes final
delivery, and nothing more needs to be done.  OTOH, destination of SME_CLASS_GSM
means that an MSISDN-to-IMSI lookup needs to be performed, followed by GSM MT
delivery.

There is one additional routing mode that is available only via
themwi-smsc-submit, or perhaps future specialized network sw components that
incorporate the same function: if a locally generated MT message needs to be
sent to a local GSM MS addressed by IMSI, with no destination phone number
existing at all, themwi-smsc-submit can instruct themwi-smsc-core to skip the
routing step, with the destination preset to SME_CLASS_GSM and dest_imsi
prefilled.

2.4.2. Permission to send to the uplink

Not every local phone number served by ThemWi-SMSC is allowed to send SMS to
our upstream interconnection point with Bandwidth.com SMPP server.  As explained
in section 1.2, our access to Bandwidth P2P SMS interconnection service is
through a reseller (Sopranica Telecom), and our arrangement is such that we
have to pay for each individual phone number for which P2P SMS interconnection
service is provided.  The economics of the situation are such that the total
set of NANP numbers (good for calls) we rent from BulkVS is greater than the
subset for which we enable outside SMS interconnection service through
Bandwidth+Sopranica.  Therefore, we have a flag in our themwi-nanp database of
locally owned numbers (NUMBER_FLAG_SMSPROV) which we set only on certain
numbers, those that are provisioned for outside SMS interconnection and which
are therefore allowed to send SMS to the outside world.  All other locally
owned phone numbers (those without this flag) can only exchange SMS within our
fiefdom, including our downstream peers.

For each newly submitted SM, themwi-smsc-core will make a routing determination
per the previous section, and if the destination is SME_CLASS_UPSTREAM, the
identity of the sender will be checked.  The sender will need to be a locally
owned number with upstream SMS permission bit set, otherwise the message is
rejected.
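A minimal C sketch of the uplink permission check, following the rule as stated above. The bit value given to NUMBER_FLAG_SMSPROV here is a placeholder (the real value is defined by themwi-nanp), and struct sender_info is a hypothetical summary of the sender's themwi-nanp lookup result.

```c
#include <assert.h>
#include <stdbool.h>

/* Destination classes named as in section 2.4.1; numeric values assumed */
enum sme_class {
	SME_CLASS_LOCAL,
	SME_CLASS_GSM,
	SME_CLASS_DOWNSTREAM,
	SME_CLASS_UPSTREAM,
};

/* PLACEHOLDER bit value; the real flag is defined by themwi-nanp */
#define NUMBER_FLAG_SMSPROV	0x01u

/* Hypothetical summary of the sender's themwi-nanp lookup */
struct sender_info {
	bool locally_owned;
	unsigned flags;		/* meaningful only if locally_owned */
};

/* Admission-time check, applied after the routing determination */
static bool uplink_permitted(enum sme_class dest,
			     const struct sender_info *snd)
{
	if (dest != SME_CLASS_UPSTREAM)
		return true;
	return snd->locally_owned && (snd->flags & NUMBER_FLAG_SMSPROV);
}
```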

2.4.3. PID and DCS constraints

Special codes in PID and DCS octets can invoke many special functions that go
far beyond ordinary human-to-human SMS: setting and clearing voice mail waiting
indication flags, SIM OTA communication, silent SMS etc.  While there are
legitimate use cases for all of these special services, and an SMSC
implementation should provide a way for duly authorized network components to
send such special SMS to local GSM subscribers, it would be irresponsible for
a public MNO to allow any Alice to send such SMS-encoded trojans to any Bob, or
to accept the same from Big Bad outside world and forward them directly to
unsuspecting local users.

The solution adopted for ThemWi-SMSC is that each sw component that accepts SMs
from untrusted parties will apply filtering rules to both PID and DCS octets.
In the case of messages originated from local GSM MS, themwi-smsc-gsmif will be
responsible for filtering PID and DCS, whereas in the case of messages coming
from the outside world, the responsibility falls on themwi-smsc-uplink instead.

The specific masks or ACLs of which PID and DCS codes should be accepted will
be configurable; the recommended default is to:

* allow any PID in 000xxxxx range (0x00 through 0x1F), but no others;

* allow DCS 0x00 (GSM7 text) and 0x08 (UCS-2 text), but no others.
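One way to make these filters configurable is a 256-bit bitmap per octet type, with one bit per possible value. The C sketch below, whose names are all hypothetical, loads the recommended defaults listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* One bit per possible octet value: 256 bits = 32 bytes */
struct octet_acl {
	uint8_t bits[32];
};

static void acl_allow(struct octet_acl *acl, unsigned lo, unsigned hi)
{
	for (unsigned v = lo; v <= hi && v < 256; v++)
		acl->bits[v >> 3] |= 1u << (v & 7);
}

static bool acl_check(const struct octet_acl *acl, uint8_t v)
{
	return acl->bits[v >> 3] & (1u << (v & 7));
}

struct pid_dcs_filter {
	struct octet_acl pid, dcs;
};

/* Recommended defaults */
static void filter_set_default(struct pid_dcs_filter *f)
{
	memset(f, 0, sizeof *f);
	acl_allow(&f->pid, 0x00, 0x1F);	/* PID 000xxxxx */
	acl_allow(&f->dcs, 0x00, 0x00);	/* GSM7 text */
	acl_allow(&f->dcs, 0x08, 0x08);	/* UCS-2 text */
}

static bool filter_accepts(const struct pid_dcs_filter *f,
			   uint8_t pid, uint8_t dcs)
{
	return acl_check(&f->pid, pid) && acl_check(&f->dcs, dcs);
}
```

With these defaults, PID 0x40 (Short Message Type 0, commonly used for silent SMS) and PID 0x7F (SIM data download, used for SIM OTA) are rejected, as are DCS 0x04 (8-bit data) and DCS 0xC8 (a message waiting indication group code).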

2.4.4. Validity period and expiry time

Given the store-and-forward nature of SMS, the amount of work spent trying to
deliver a message to a "difficult" destination must be bounded.  The standard
SMS architecture of GSM 03.40 provides the notion of a validity period,
optionally specified by message senders, as the mechanism for limiting the
lifetime of a message that cannot be delivered right away.

Message validity periods and expiry times will be handled as follows in
ThemWi-SMSC:

* At the socket interface from message-submitting components to
  themwi-smsc-core, the VP will always be communicated in relative form, as a
  count of seconds.  Special value 0 means that the source is not setting the
  VP, and a system-wide default needs to be applied.

* If themwi-smsc-gsmif receives an absolute-format VP from GSM MS, it will
  convert to relative seconds-from-present before submitting the SM to
  themwi-smsc-core.

* themwi-smsc-core will have two configurable settings with regard to message
  longevity: default VP and maximum VP.  The default VP setting will be applied
  when no VP is set at the message source: themwi-smsc-submit without explicit
  VP, MO SM from a GSM MS without VP setting, or a message from the outside
  world (SMPP never provides a VP on incoming messages).  OTOH, the maximum VP
  setting will serve as a cap in case a user did specify an explicit VP, but
  it is unreasonably long.
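Two small pieces of this VP handling, sketched in C with hypothetical function names: decoding the relative-format TP-VP octet defined in GSM 03.40 into seconds (which themwi-smsc-gsmif would presumably also need, alongside the absolute-format conversion the text mentions), and the default/maximum policy of themwi-smsc-core. Absolute-format conversion is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a relative-format TP-VP octet (GSM 03.40) into seconds */
static uint32_t tp_vp_relative_secs(uint8_t vp)
{
	if (vp <= 143)
		return (vp + 1) * 5u * 60;			/* 5-minute steps */
	if (vp <= 167)
		return 12u * 3600 + (vp - 143) * 30u * 60;	/* 30-minute steps */
	if (vp <= 196)
		return (vp - 166) * 86400u;			/* days */
	return (vp - 192) * 604800u;				/* weeks */
}

/* 0 means "not set by the source"; overly long VPs are capped */
static uint32_t effective_vp(uint32_t requested, uint32_t vp_default,
			     uint32_t vp_max)
{
	if (requested == 0)
		return vp_default;
	return requested > vp_max ? vp_max : requested;
}
```

For example, a TP-VP value of 167 decodes to 86400 seconds (24 hours), and the longest encodable relative period, 255, is 63 weeks.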

2.4.5. Duplicate message detection

One can easily envision various scenarios in which a duplicate copy is received
for an earlier message which is still active, i.e., still queued for delivery
to its destination.  Instead of adding such duplicates to the queue, it is
desirable to be able to detect and suppress them.  The details remain to be
worked out.

3. SMS communication via direct shell access

To be filled.

4. Interface to local Osmocom GSM network

GSUP and separate MSISDN-to-IMSI lookup, to be described.

5. SMPP connection handlers and outside-world SM exchange

To be filled.