# HG changeset patch
# User Michael Spacefalcon <msokolov@ivan.Harhan.ORG>
# Date 1372555050 0
# Node ID 343b6b2f178b9bce8cbee4cd06bb3341792a7174
# Parent  d19b4e20ff9f4916795b73e0e895569b76835cfa
beginning of Mokopir-FFS verbal description

diff -r d19b4e20ff9f -r 343b6b2f178b mpffs/Description
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/mpffs/Description	Sun Jun 30 01:17:30 2013 +0000
@@ -0,0 +1,280 @@
+This is a description, based on reverse engineering, of the flash file system
+(FFS) implemented in Pirelli's original firmware for the DP-L10 GSM/WiFi dual
+mode mobile phone, and in the Closedmoko GTA0x modem firmware.  Not knowing the
+"proper" name for this FFS, and needing _some_ identifier to refer to it, I
+have named it Mokopir-FFS, from "Moko" and "Pirelli" - sometimes abbreviated
+further to MPFFS.
+
+(I have previously called the FFS in question MysteryFFS; but now that I've
+ successfully reverse-engineered it, it isn't such a mystery any more :-)
+
+At a high functional level, Mokopir-FFS presents the following features:
+
+* Has a directory tree structure like UNIX file systems;
+
+* The file system API that must be implemented inside the proprietary firmware
+  appears to use UNIX-style pathnames; doing strings on firmware images reveals
+  pathname strings like these:
+
+  /var/dbg/dar
+  /gsm/l3/rr_white_list
+  /gsm/l3/rr_medium_rxlev_thr
+  /gsm/l3/rr_upper_rxlev_thr
+  /gsm/l3/shield
+
+  Parsing the corresponding FFS image with tools included in the present
+  package has confirmed that the directory structure implied by these pathnames
+  does indeed exist in the FFS.
+
+* Absolutely no DOS-ish semantics seen anywhere: no 8.3 filenames and no
+  colon-separated device names (seen in the TSM30 file system source, for
+  example) are visible in the Closedmoko/Pirelli FFS.
+
+* File contents are stored uncompressed, but not necessarily contiguously: one
+  could probably store a file in FFS which is bigger than the flash sector
+  size, in which case it can never be contiguous in a writable FFS (see below),
+  and the firmware implementation seems to limit chunk sizes to a fairly small
+  number: on the Pirelli phones all largish files are divided into chunks of
+  8 KiB each, and on my GTA02 the largest observed chunk size is only 2 KiB.
+
+  The smaller files, like the IMEI and the firmware ID strings in my GTA02 FFS,
+  are contiguous.
+
+* The FFS structure is such that the length of "user" payload data stored in
+  each chunk (and consequently, in each file) can be known exactly in bytes,
+  with the files/chunks able to contain arbitrary binary data.  (This property
+  may seem obvious or trivial, as all familiar UNIX and DOS file systems have
+  it, but contrast with RT-11 for example.)
+
+* The flash file system is a writable one: the running firmware can create,
+  delete and overwrite files (and possibly directories too) in the live FFS;
+  thus the FFS design must be such that it allows these operations to be
+  performed within the physical constraints of NOR flash write operations.
+
+I have reverse-engineered this Mokopir-FFS on a read-only level.  What this means
+is that I, or anyone else who can read this document and the accompanying
+source for the listing/extraction utilities, can take a Mokopir-FFS image read
+out of a device and see/extract its full content: the complete directory tree
+and the exact binary byte content of all files contained therein.
+
+However, the knowledge possessed by the present hacker (and conveyed in this
+document and the accompanying source code) is NOT sufficient for constructing a
+valid Mokopir-FFS image "in vitro" given a tree of directories and files, or
+for making modifications to the file or directory content on an existing image
+and producing a content-modified image that is also valid; valid as in suitable
+for the original proprietary firmware to make its normal read and write
+operations without noticing anything amiss.
+
+Constructing "de novo" Mokopir-FFS images or modifying existing images in such
+a way that they remain 100% valid for all read and write operations of the
+original proprietary firmware would, at the very minimum, require an
+understanding of the meaning of *all* fields in the on-media FFS format.  Some
+of these fields are still left as "non-understood" for now though: a read-only
+implementation can get away with simply ignoring them, but a writer/generator
+would have to put *something* in those fields.
+
+As you read the "read-only" description of the Mokopir-FFS on-media format in
+the remainder of this document, it should become fairly obvious which pieces
+are missing before our understanding of this FFS can be elevated to a
+"writable" level.
+
+However, when it comes to writing new code to run on the two Calypso phones in
+question (Closedmoko and Pirelli), it seems, at least to the present hacker,
+that a read-only understanding of Mokopir-FFS should be sufficient:
+
+* In the case of Closedmoko GTA0x modems, the FFS is seen to contain the IMEI
+  and the RF calibration data.  The format of the former is obvious; the latter
+  not so much - but in any case, the information of interest is clearly of a
+  read-only nature.  It's difficult to tell (or rather, I haven't bothered to
+  experiment enough) whether the Closedmoko firmware does any writes to FFS or
+  if the FFS is treated as read-only outside of the production line environment,
+  but in any case, it seems to me that for any 3rd party replacement firmware,
+  the best strategy would be to treat the FFS as a read-only source of IMEI and
+  RF calibration data, and nothing more.
+
+* In the case of Pirelli phones, the FFS is used to store user data: sent and
+  received SMS (and MMS/email/whatever), call history, UI settings, pictures
+  taken with the camera, and whatever else.  It also stores a ton of files
+  which I can only presume were meant to be immutable except at the time of
+  firmware updates: graphics for the UI, ringtones, i18n UI strings, and even
+  "helper" firmware images for the WiFi and VoIP processors.  However, no IMEI
+  or RF calibration data are anywhere to be found in the FFS - instead this
+  information appears to be stored in the "factory block" at the end of the
+  flash (in its own sector) outside of the FFS.
+
+  Being able to parse FFS images extracted out of Pirelli phones "in vitro"
+  allows us to steal some of these helper files (UI artwork, ringtones,
+  WiFi/VoIP helpers), and some of these might even come in useful to firmware
+  replacement projects, but it seems to me that a replacement firmware would
+  be better off using its own FFS design for storing user data, and as to
+  retrieving the original IMEI and RF calibration data, the original FFS isn't
+  of any use for that anyway.
+
+=======================
+Moko/Pirelli FFS format
+=======================
+
+OK, now that I'm done with the introduction, we can get to the actual
+Mokopir-FFS format.
+
+* On the GTA0x modem (or at least on my GTA02; my sample size is 1) the FFS
+  occupies 7 flash sectors of 64 KiB each at offsets 0x380000 through 0x3E0000,
+  inclusive.
+
+(The 4 MiB NOR flash chip used by Closedmoko has an independent R/W bank
+ division between the first 3 MiB and the last 1 MiB.  The first 3 MiB are used
+ to hold the field-flashable closed firmware images distributed as *.m0 files;
+ the independent last megabyte holds the FFS, and thus the FW could be
+ implemented to do FFS writes while running from flash in the main bank.
+ Less than half of that last megabyte appears to be used for the FFS though;
+ the rest appears to be unused - blank flash observed.)
+
+* On the Pirelli the FFS occupies 18 sectors of 256 KiB each at offsets 0
+  through 0x440000 (inclusive) of the 2nd flash chip select, the one wired to
+  nCS3 on the Calypso.
+
+Each flash sector allocated to FFS begins with the following signature:
+
+00000000:  46 66 73 23 10 02 xx yy  zz FF FF FF FF FF FF FF  Ffs#............
+
+The bytes shown as xx and yy above serve a non-understood purpose; as a guess,
+they may hold some info for the flash wear leveling algorithm: in a "virgin"
+FFS image like that found in my GTA02 (which never had a SIM card in it and
+never made or received a call) or read out of a "virgin" Pirelli phone that
+hasn't seen any active use yet, both of these bytes are FFs, but when I look at
+FFS images read out of the Pirelli which I currently use as my everyday-use
+cellphone, I see other values in sectors which must have been erased and
+rewritten.  A read-only implementation can ignore these bytes, as mine does.
+
+The byte shown as zz is more important though, even to a read-only
+implementation.  The 3 values I've encountered in this byte so far are AB, BD
+and BF.  Per my current understanding, in a "healthy" FFS exactly one sector
+will have AB in its header, exactly one will have BF, and the rest will have
+BD.  The meanings are (or appear to be):
+
+AB: the sector holds a vital data structure which I have called the active
+    index block;
+BD: the sector holds regular data;
+BF: the sector is blank except for the header, and can be turned into a new
+    AB or BD.
+
+(Note that a flash program operation, which can turn 1s into 0s but not the
+ other way around, can turn BF into either AB or BD - but neither AB nor BD can
+ be turned into any other valid value.)
+
+In a "virgin" FFS image (as explained above) the first FFS sector is AB, the
+last one is BF, and the ones in between are BDs.
+
+An FFS read operation (a search for a given pathname, or a listing of all
+present directories and files) needs to start with locating the active index
+block - the FFS sector with AB in the header.  Following this header, which is
+treated as being 16 bytes long (almost everything in Mokopir-FFS is aligned on
+16-byte boundaries), the active index block contains a linear array of 16-byte
+records, each record describing an FFS object: directory, file or file
+continuation chunk.
+
+Here is my current understanding of the 16-byte index block record structure:
+
+2 bytes: Length of the described chunk in bytes
+1 byte:	 Purpose/meaning not understood, ignored by my current code
+1 byte:	 Object type
+2 bytes: Descendant pointer
+2 bytes: Sibling pointer
+4 bytes: Data pointer
+4 bytes: Purpose/meaning not understood, ignored by my current code
+
+(On the Calypso phones of interest, all multibyte fields are in the native
+ little-endian byte order of the ARM7TDMI processor.)
+
+The active index block gets filled with these records as objects are created;
+the first record goes right after the 'Ffs#'...AB header (padded to 16 bytes);
+the last record (at any given moment) is followed by blank flash for the
+remainder of the sector.  Records thus appear in the order in which they are
+created, which bears no direct relation to the directory tree structure.
+
+The objects, each described by a record in the index block, are organized into
+a tree structure by the descendant and sibling pointers, plus the object type
+indicator byte.  Let's start with the latter; the following objtype byte values
+have been observed:
+
+00: deleted object - a read-only implementation should ignore everything except
+    the descendant and sibling pointers.  (A write-capable implementation would
+    need more care - it would need a way of reclaiming dirty flash space taken
+    up by deleted/overwritten files.)
+
+E1: a special file - see the description of the /.journal file further down
+F1: a regular file (head chunk thereof)
+F2: a directory
+F4: file continuation chunk (explained below)
+
+Each record in the index block has an associated chunk in one of the data
+sectors; the index record contains fields giving the address and length of this
+chunk.  The length of a chunk is always a nonzero multiple of 16 bytes, and is
+stored (as a number in bytes) in the first 16-bit field of the 16-byte index
+entry.  The address of each chunk is given by the data pointer field of the
+index record, and it is reckoned in 16-byte units (hence 16-byte alignment is
+required) from the beginning of the FFS sector group in the flash address space.
+
+For objects of type F1 and F2 (regular files and directories) the just-described
+chunk begins with the name of the file or subdirectory as a NUL-terminated ASCII
+string.  This name is just for the current level of the directory tree, just
+like in UNIX directories, thus one will have chunk names like gsm, l3, eplmn
+etc, rather than /gsm/l3/eplmn.  One practical effect is that one can't readily
+see pathnames or any of the directory structure by looking at an FFS image as a
+raw hex dump; the structure is only revealed when one uses a parsing program
+like those which accompany this document.
+
+In the case of directories, the "chunk" part of the object contains only the
+name of the directory itself, padded with FFs to a 16-byte boundary.  For
+example, an FFS directory named /gsm would be represented by an object
+consisting of two flash writes: a 16-byte entry in the active index block, with
+the object type byte set to F2, and a corresponding 16-byte chunk in one of the
+data sectors, with the 16 bytes containing "gsm", a terminating NUL byte, and
+12 FF bytes to pad up to 16.  In the case of files, this name may be followed
+by the first chunk of file data content, as explained further down.
+
+In order to parse the FFS directory tree (whether the objective is to dump the
+whole thing recursively or to find a specific file given a pathname), one needs
+to first (well, after finding the active AB block) find the root directory node.
+The root directory object is similar to other directory objects: it has a type
+of F2, and an associated chunk of 16 bytes in one of the data sectors.  The
+latter contains the name of the root node: on the Pirelli it is "/", whereas on
+my GTA02 it is "/ffs-root".
+
+The astute reader should notice that it really makes no sense to store a name
+for the root node, and indeed, this name plays no part in the traversal of the
+directory tree given an absolute pathname.  Instead this name, or rather
+its first character, appears to be used for the purpose of locating the root
+node itself.  At first I had assumed that the index record for the root node is
+always the first record in the active index block right after the signature
+header - that is how it is in "virgin" FFS images, and also in some quite non-
+virgin ones I have pulled from my daily-use Pirelli.  Naturally my first version
+of the Mokopir-FFS (then called MysteryFFS) extraction utility expected the root
+node to always be at index #1.  But then I got some additional Pirelli phones,
+and discovered that in certain cases, index record #1 is a deleted object (the
+original root node which has been deleted), and the new active root node is
+somewhere in the middle of the index!
+
+Thus it appears that in order to find the active root node, one needs to scan
+the active index block linearly from the beginning (disregarding the tree
+structure pointers in this initial pass), looking for a non-deleted object of
+type F2 (a directory) whose corresponding name chunk sports a name beginning
+with the '/' character.  (Anyone who's been raised in UNIX will immediately
+know that the path separator character '/' is the only character other than NUL
+that's absolutely forbidden in the individual filenames - so this special
+"root node name" is the only case of a '/' character appearing in what would
+otherwise be a regular filename.)
+
+[What causes the root node to be somewhere other than at index #1?  I assume it
+ has to do with the dirty space reclamation / data movement algorithm.  In a
+ "virgin" FFS image the very first sector is the active index block, and the
+ following sector is the first to hold chunks, beginning with the name chunk of
+ the root node.  Now what happens if all data in that sector aside from the
+ root node name and some other mostly-static directory names becomes dirty,
+ i.e., belonging to deleted or overwritten files?  How would that flash space
+ get reclaimed?  I assume that the FFS firmware algorithm moves all still-active
+ chunks to a new flash sector, invalidating the old copies - turning the latter
+ into deleted objects.  The root node will be among them.  Then at some point
+ the active index block is going to fill up too, and will need to be rewritten
+ into a new sector - at which point the previously-deleted index entries are
+ omitted and the root node becomes #1 again...]