view SIM-data-formats @ 72:2ac10b2cde4f
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Mon, 19 Jul 2021 05:01:15 +0000 |
FreeCalypso is developing a family of several different tools that operate on
SIM cards and user data (primarily phonebooks) stored in them, accessing the
same underlying data through various mechanisms:

* fc-simtool in our FC SIM tools suite operates on SIM cards inserted into a
  smart card "reader" device, without going through any kind of phone or other
  GSM device - most direct manipulation of SIM user data content.

* Our FC host tools suite features a new utility called fc-simint - it is a
  front end to fc-simtool that operates on SIM cards inserted into Calypso
  phones or FC modem boards, working on the same principle as fc-loadtool
  (suspending and bypassing the Calypso device's regular operational firmware),
  but operating on the device's SIM interface rather than its flash.

* We have a FreeCalypso User Phone Tools suite that communicates with FC modem
  boards and the future FC phone handset via AT commands.  We have plans to add
  phonebook manipulation commands to this suite (based on AT+CPBR and AT+CPBW),
  reading and writing phonebook data files in the same format as fc-simtool.

Because we have several different tools (some already written, others only
planned) that will need to read and write exactly the same data formats, and
because these tools will have to live in different source repositories (totally
different underlying hardware and system library requirements), the data format
specification needs to be global and independent of particular hw tools - it is
the present document.

GSM 03.38 / 23.038 string representation
========================================

The world of GSM does not use ASCII - in all places where ASCII strings would
appear in the world of ordinary computing, GSM uses its own different 7-bit
character set instead, defined in GSM TS 03.38 or 3GPP TS 23.038.

Many SIM card data files (including phonebooks) contain so-called alpha fields
in which GSM 03.38 (not ASCII!) characters are packed into 8-bit bytes, with
the high bit zeroed.
(These alpha fields also allow alternative UCS-2 encodings, distinguished by
the high bit being set - but we handle this case separately.)  Some other SIM
card data files (EF_PNN for example) contain GSM 03.38 7-bit text strings
packed into bytes like in SMS.

However, when we store text strings (such as phonebook contact names) that have
been read out of a SIM (or are intended to be written to a SIM) in UNIX text
files, or pass them around in command line arguments, we need an ASCII-based
representation of these text strings that are encoded in GSM7 in the actual
GSM/SIM world.  Furthermore, our ASCII representation needs to be 100% lossless
and well-defined.

Our function for lossless conversion of GSM 03.38 strings to ASCII operates as
follows:

* The output is always enclosed in double-quote characters, as in
  "text string".

* All GSM7 code points that map to characters that are also present in ASCII
  translate to these ASCII characters: for example, GSM7 code 0x00 becomes '@',
  and GSM7 code 0x02 becomes '$'.

* Any double-quote characters in the data are escaped with a backslash,
  becoming \"

* GSM7 escape sequences for ASCII characters [\]^ and {|}~ are recognized and
  converted to these ASCII characters; \ is then escaped in the output as \\

* GSM7 escape sequence for the Euro currency symbol is recognized and converted
  to \E

* GSM7 code points corresponding to CR and LF are represented as \r and \n

* GSM7 escape characters that are not part of a valid sequence for [\]^ or
  {|}~ (or for \E) are represented as \e

* All other GSM7 characters that cannot be represented in ASCII in any other
  way are represented as \xX escapes, where xX is a two-digit hexadecimal
  number in the range between 00 and 7F, inclusive.
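
The conversion rules listed above can be illustrated with a short Python
sketch.  This is not the code used in the FreeCalypso tools; the function and
table names are hypothetical, the input is assumed to be a sequence of GSM7
code points already stripped of any padding, and the use of uppercase hex
digits in \xX escapes is likewise an assumption.

```python
# Illustrative sketch only: names and structure are hypothetical, not taken
# from the FreeCalypso sources.

# GSM 03.38 extension table: code following the 0x1B escape -> ASCII char.
GSM7_EXT_TO_ASCII = {0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\",
                     0x3C: "[", 0x3D: "~", 0x3E: "]", 0x40: "|"}
GSM7_EURO = 0x65   # 0x1B 0x65 is the Euro currency symbol


def gsm7_code_to_ascii(code):
    """Return the ASCII character for a basic-table GSM7 code, or None."""
    if code == 0x00:
        return "@"
    if code == 0x02:
        return "$"
    if code == 0x11:
        return "_"
    if code == 0x24:
        return None    # GSM7 0x24 is the generic currency sign, not '$'
    if 0x20 <= code <= 0x3F or 0x41 <= code <= 0x5A or 0x61 <= code <= 0x7A:
        return chr(code)    # same code point in GSM7 and ASCII
    return None


def gsm7_to_quoted_ascii(data):
    """Convert GSM7 code points (one per byte) to the quoted ASCII form."""
    out = ['"']
    i = 0
    while i < len(data):
        code = data[i]
        i += 1
        if code > 0x7F:
            raise ValueError("not a GSM7 code point: 0x%02X" % code)
        if code == 0x1B:
            nxt = data[i] if i < len(data) else None
            if nxt in GSM7_EXT_TO_ASCII:
                ch = GSM7_EXT_TO_ASCII[nxt]
                out.append("\\\\" if ch == "\\" else ch)
                i += 1
            elif nxt == GSM7_EURO:
                out.append("\\E")
                i += 1
            else:
                out.append("\\e")   # lone escape; next code handled normally
            continue
        if code == 0x0A:
            out.append("\\n")
        elif code == 0x0D:
            out.append("\\r")
        else:
            ch = gsm7_code_to_ascii(code)
            if ch is None:
                out.append("\\%02X" % code)   # two-digit hex escape
            elif ch == '"':
                out.append('\\"')
            else:
                out.append(ch)
    out.append('"')
    return "".join(out)
```

Under these assumptions, feeding the sketch the GSM7 bytes
4D 69 63 68 04 6C 65 produces the quoted string "Mich\04le".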

The result of these rules is as follows:

* If the text item consists entirely of characters that exist in ASCII (the
  most common use case), it will appear naturally in ASCII, even if it contains
  characters like '@' and '$' that have different code points in GSM7, or
  characters in the [\]^ and {|}~ sets that require escaping in GSM7.

* Any text item containing weird characters will still be converted losslessly,
  so it can be written back into the SIM or decoded manually by a GSM7-knowing
  user, and the representation in data files and command output is always
  printable ASCII, nothing else.

* In cases where an occasional weird character appears in an otherwise
  ASCII-dominated string, it is easy to both mentally decode and manually enter
  such characters when necessary.  For example, if one of your SIM contacts is
  a lady named Michele who spells her name in the French way, with an accent
  grave on the first 'e' (non-ASCII character U+00E8), her name shall be
  entered as "Mich\04le", nicely preserving the needed non-ASCII character
  whose GSM 03.38 code point is 0x04.

When a string argument that is destined for conversion to GSM7 is parsed, our
input parser always interprets any backslash (\) characters as escapes; it
understands all of the same escape sequences which we emit in output:

\"      literal "
\\      literal \ (encoded in GSM 03.38 as another form of escape)
\E      Euro currency symbol (ditto)
\e      GSM 03.38 escape character 0x1B
\n      GSM 03.38 LF character 0x0A
\r      GSM 03.38 CR character 0x0D
\xX     GSM 03.38 code point xX, passed through literally

If the input contains ASCII characters which do not exist in GSM7 (` and all
control characters except \n and \r), it is an error.

If our ASCII-to-GSM7 conversion functions are given 8-bit input, such input is
interpreted as ISO 8859-1: any 8859-1 high characters that have GSM7
counterparts will be translated accordingly.  (Non-GSM7-mappable high
characters are an error just like non-GSM7-mappable ASCII chars.)
However, our output is always 7-bit ASCII only, using \xX escapes for GSM 03.38
characters that fall outside of ASCII.

Phonebook file format
=====================

fc-simtool pb-dump command displays SIM phonebook content on the terminal or
saves it in a file in the format defined here, and other tools such as
fc-simtool pb-restore and pb-update commands need to be able to read back the
same format losslessly.  The phonebook file format is hereby shown by way of
example:

#1: #646#,0x81 "Check Minutes"
#2: #674#,0x81 "Check Text Usage"
#3: #225#,0x81 "Check Balance"
#4: 8675309,0x81 "Jenny"
#5: 88211016401,0x91 "sysmoUSIM-SJS1 MSISDN"
#6: 44444,0x81 HEX 810B0893BEC03ABEBC209A9FA1A1
#7: *123#,0x81 ""
#8: 5551234,0x81 "HEX magic spells by Mich\04le"

The rules are as follows:

* Each line in the file format represents one phonebook record.

* The decimal number between the initial '#' and the following ':' is the
  record number in the phonebook, between 1 and 255 as in the SIM protocol
  READ RECORD and UPDATE RECORD commands.

* The phone number is always given without quotes, and consists only of digits
  and '*' and '#' characters - no '+' international symbol is allowed in this
  file format.

* The TON/NPI byte is required, is always given in hex as 0xXX (no other form
  allowed in this file format), and is separated from the phone number digit
  string by a comma.  Note how this byte usually equals 0x91 for international
  numbers (those entered with a '+' in typical UIs) or 0x81 otherwise.

* Either a quoted-string or a hex-string is always present at the end of each
  record, giving the alpha tag for the phonebook entry.  This field is
  mandatory in the file format; if there is no alpha tag (really meaning empty
  alpha tag), the line ends with empty quoted-string "".

* Quoted-strings for the alpha tag are used for either empty/null or
  GSM7-encoded alpha tags; hex-strings are used for UCS2-encoded alpha tags.

* The format of hex-string alpha tags is as shown in entry #6 in the example
  above - this example gives a contact name in Russian.  (Full decoding of this
  contact name is left as an exercise for adventurous readers - see ETSI TS
  102 221 Annex A and the Cyrillic block of Unicode.)

* Hex-strings can be used for any arbitrary bytes in the alpha tag, but are
  only needed for UCS-2 encodings.  Every possible GSM7 string can be
  represented in our quoted-string notation.

* The quoted-string (GSM 03.38) form of the alpha tag must always be quoted,
  even if quotes seem optional like in the "Jenny" example above (record #4).
  The absence of quotes is what allows the HEX keyword to be distinguished:
  compare and contrast records #6 and #8 in the example.

The above format applies when the almost-never-used CCP and EXT bytes in the
phonebook record both equal 0xFF, meaning not used.  In the unlikely case when
these fields are used, the following extra fields are added to the line-based
representation:

* If CCP != 0xFF, a "CCP=%u " field is inserted between the phone number and
  the alpha tag.

* If EXT != 0xFF, an "EXT=%u " field is inserted between the phone number and
  the alpha tag.

* If both CCP and EXT are present, the CCP= field appears before the EXT=
  field, same order as in the SIM binary record.
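
As an illustration of the line format described above, here is a minimal
Python sketch of a parser for one phonebook line.  It is not the parser used in
fc-simtool: the names PB_LINE_RE and parse_pb_line are hypothetical, the
separators between fields are assumed to be arbitrary whitespace, and a
non-empty phone number is assumed.  The quoted alpha tag is returned as raw
text with its backslash escapes left undecoded.

```python
import re

# Hypothetical pattern for one phonebook line; whitespace handling is an
# assumption, not something the format description spells out.
PB_LINE_RE = re.compile(
    r'#(\d+):\s*'                      # record number
    r'([0-9*#]+),(0x[0-9A-Fa-f]{2})\s+'  # phone number, TON/NPI byte
    r'(?:CCP=(\d+)\s+)?'               # optional CCP field
    r'(?:EXT=(\d+)\s+)?'               # optional EXT field, after CCP
    r'(?:HEX\s+((?:[0-9A-Fa-f]{2})+)'  # hex-string alpha tag, or
    r'|("(?:[^"\\]|\\.)*"))\s*$'       # quoted-string alpha tag
)


def parse_pb_line(line):
    """Split one phonebook line into its fields (sketch, see assumptions)."""
    m = PB_LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("malformed phonebook line: %r" % line)
    recno = int(m.group(1))
    if not 1 <= recno <= 255:
        raise ValueError("record number out of range: %d" % recno)
    return {
        "record": recno,
        "number": m.group(2),
        "ton_npi": int(m.group(3), 16),
        # 0xFF means "not used", matching the SIM binary record
        "ccp": int(m.group(4)) if m.group(4) is not None else 0xFF,
        "ext": int(m.group(5)) if m.group(5) is not None else 0xFF,
        # exactly one of the two alpha forms is present
        "alpha_hex": bytes.fromhex(m.group(6)) if m.group(6) is not None else None,
        "alpha_quoted": m.group(7),
    }
```

With this sketch, record #6 of the example yields a hex-string alpha tag,
while record #8 yields a quoted-string one even though its text begins with
the word HEX, because the opening double-quote is checked first.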