diff doc/Loadtools-performance @ 671:e66fafeeb377

doc/Loadtools-performance: new faster flash operations
author Mychaela Falconia <falcon@freecalypso.org>
date Sun, 08 Mar 2020 03:43:11 +0000
parents 8c6e7b7e701c
children f2a023c20653
--- a/doc/Loadtools-performance	Sun Mar 08 01:47:57 2020 +0000
+++ b/doc/Loadtools-performance	Sun Mar 08 03:43:11 2020 +0000
@@ -1,15 +1,20 @@
-Dumping and programming flash
-=============================
+Memory dump performance
+=======================
 
 Here are the expected run times for the flash dump2bin operation of dumping the
-entire flash content of a Calypso GSM device:
+entire flash content of a Calypso GSM device with the current version of
+fc-loadtool, which uses the new binary transfer protocol:
 
 Dump of 4 MiB flash (e.g., Openmoko GTA01/02 or Mot C139/140) at 115200 baud:
-12m53s
+6m4s
+
+The same 4 MiB flash dump at 812500 baud: 0m52s
 
-The same 4 MiB flash dump at 812500 baud: 1m50s
+Dump of 8 MiB flash (e.g., Mot C155/156) at 812500 baud: 1m44s
 
-Dump of 8 MiB flash (e.g., Mot C155/156) at 812500 baud: 3m40s
+These times are roughly a 2x improvement over all previous versions of
+fc-loadtool (prior to fc-host-tools-r13), which used a hex-based transfer
+protocol.
 
 Because of the architecture of fc-loadtool and its loadagent back-end, the run
 time of a flash dump operation depends only on the serial baud rate and the
@@ -21,34 +26,65 @@
 port hardware - this host system dependency exists because of the way these
 operations are implemented in our architecture.
 
+Flash programming operations
+============================
+
 Here are some examples of expected flash programming times, all obtained on the
 Mother's Slackware 14.2 host system:
 
 Flashing an Openmoko GTA02 modem (K5A3281CTM flash chip) with a new firmware
-image (2376448 bytes), using a PL2303 USB-serial cable at 115200 baud: 7m35s
+image (2376448 bytes), using a PL2303 USB-serial cable at 115200 baud: 0m19s to
+erase 37 sectors, 3m45s to program the image.
 
 Flashing the same OM GTA02 modem with the same fw image, using a CP2102
-USB-serial cable at 812500 baud: 1m52s
+USB-serial cable at 812500 baud: 0m19s to erase, 0m51s to program.
 
 Flashing a Magnetite hybrid fw image (2378084 bytes) into an FCDEV3B board
-(S71PL129N flash chip) via an FT2232D adapter at 812500 baud: 2m11s
+(S71PL129N flash chip) via an FT2232D adapter at 812500 baud: 0m24s to erase
+13 sectors (4 small and 9 large), 1m27s to program the image.
+
+Regardless of whether you execute these two steps separately or use one of our
+new flash e-program-{bin,m0,srec} commands, flash programming is always done in
+two steps: first the erase operation covering the needed range of sectors, then
+the actual programming operation that includes the data transfer.
 
-These times are just for the flash program-bin operation, not counting the
-flash erase which must be done first.  Flash erase times are determined
-entirely by physical processes inside the flash chip and are not affected by
-software design or the serial link: for each sector to be erased, fc-loadtool
-issues the sector erase command to the flash chip and then polls the chip for
-operation completion status; the polling is done over the serial link and thus
-may seem very slow, but the extra bit of latency added by the finite polling
-speed is still negligible compared to the time of the actual sector erase
-operation inside the flash chip.  In contrast, the execution time of a flash
-program-bin operation is a sum of 3 components:
+Flash erase times are determined entirely by physical processes inside the
+flash chip and thus should not be affected by software design or the serial
+link: for each sector to be erased, fc-loadtool issues the sector erase command
+to the flash chip and then polls the chip for operation completion status; the
+polling is done over the serial link and thus may seem very slow, but the extra
+bit of latency added by the finite polling speed is still negligible (at least
+on the Mother's Slackware system) compared to the time of the actual sector
+erase operation inside the flash chip.  One remaining flaw is that in our
+current implementation the issuance of each individual sector erase command to
+the flash chip takes 6 command-response exchanges between fc-loadtool and
+loadagent; on my Slackware host system this extra overhead is still negligible
+compared to the 0.5s or more for the actual erase operation time, but this
+overhead may become more significant on host systems with higher latency.
+
+After the erase operation, the execution time of the main flash programming
+operation is a sum of 3 components:
 
 * The time it takes for the bits to be transferred over the serial link;
 * The time it takes for the flash programming operation to complete on the
   target (physics inside the flash chip);
 * The overhead of command-response exchanges between fc-loadtool and loadagent.
 
+Because image data transfer takes place in this step, flash programming at
+812500 baud is faster than at 115200 baud, although the improvement is smaller
+than the 7x seen with flash dumps.  The present version of fc-loadtool
+also uses a new binary transfer protocol instead of the hex-based one used in
+previous versions (prior to fc-host-tools-r13); this change produces a 2x
+improvement for OM GTA02 flashing, but a smaller improvement for FCDEV3B
+flashing.
+
+Notice the difference in flash programming times between GTA02 and FCDEV3B: the
+fw image size is almost exactly the same, and any difference in latency between
+CP2102 and FT2232D is unlikely to produce such a significant time difference
+given our current 2048-byte transfer block size; thus the difference in physical
+flash program operation times between K5A3281CTM and S71PL129N flash chips seems
+to be the most likely explanation.
+
 Programming flash using program-m0 or program-srec
 ==================================================