comparison doc/Loadtools-performance @ 630:8c6e7b7e701c
doc/Loadtools-performance: updates for new program-m0 and setserial
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Sat, 29 Feb 2020 21:22:27 +0000 |
parents | 6824c4d55848 |
children | e66fafeeb377 |
629:0f70fe9395c4 | 630:8c6e7b7e701c |
---|---|
1 Dumping and programming flash | |
2 ============================= | |
3 | |
1 Here are the expected run times for the flash dump2bin operation of dumping the | 4 Here are the expected run times for the flash dump2bin operation of dumping the |
2 entire flash content of a Calypso GSM device: | 5 entire flash content of a Calypso GSM device: |
3 | 6 |
4 Dump of 4 MiB flash (e.g., Openmoko GTA01/02 or Mot C139/140) at 115200 baud: | 7 Dump of 4 MiB flash (e.g., Openmoko GTA01/02 or Mot C139/140) at 115200 baud: |
5 12m53s | 8 12m53s |
17 run times do depend on the host system and USB-serial adapter or other serial | 20 run times do depend on the host system and USB-serial adapter or other serial |
18 port hardware - this host system dependency exists because of the way these | 21 port hardware - this host system dependency exists because of the way these |
19 operations are implemented in our architecture. | 22 operations are implemented in our architecture. |
20 | 23 |
21 Here are some examples of expected flash programming times, all obtained on the | 24 Here are some examples of expected flash programming times, all obtained on the |
22 Mother's Slackware 14.2 host system, using the flash program-bin command as | 25 Mother's Slackware 14.2 host system: |
23 opposed to program-m0 or program-srec: | |
24 | 26 |
25 Flashing an Openmoko GTA02 modem (K5A3281CTM flash chip) with a new firmware | 27 Flashing an Openmoko GTA02 modem (K5A3281CTM flash chip) with a new firmware |
26 image (2376448 bytes), using a PL2303 USB-serial cable at 115200 baud: 7m35s | 28 image (2376448 bytes), using a PL2303 USB-serial cable at 115200 baud: 7m35s |
27 | 29 |
28 Flashing the same OM GTA02 modem with the same fw image, using a CP2102 | 30 Flashing the same OM GTA02 modem with the same fw image, using a CP2102 |
45 * The time it takes for the bits to be transferred over the serial link; | 47 * The time it takes for the bits to be transferred over the serial link; |
46 * The time it takes for the flash programming operation to complete on the | 48 * The time it takes for the flash programming operation to complete on the |
47 target (physics inside the flash chip); | 49 target (physics inside the flash chip); |
48 * The overhead of command-response exchanges between fc-loadtool and loadagent. | 50 * The overhead of command-response exchanges between fc-loadtool and loadagent. |
49 | 51 |
50 If you are starting out with a firmware image in m0 format, converting it to | 52 Programming flash using program-m0 or program-srec |
51 binary with mokosrec2bin (like our FC Magnetite build system always does) and | 53 ================================================== |
52 then flashing via program-bin is faster than flashing the original m0 image | |
53 directly via program-m0. Following the last example above of flashing a | |
54 Magnetite hybrid fw image into an FCDEV3B, the flashing operation via | |
55 program-bin took 2m11s; flashing the same image via program-m0 took 3m54s. | |
56 | 54 |
57 Flashing via program-bin is faster than program-m0 or program-srec because the | 55 Prior to fc-host-tools-r12 flash programming via flash program-m0 or |
58 program-bin operation uses a larger unit size internally. fc-loadtool | 56 program-srec commands was much slower than flash program-bin. The reason for |
59 implements all flash programming operations by sending AMFW or INFW commands to | 57 this performance discrepancy was that the original implementation of these |
60 loadagent; each AMFW or INFW command carries a string of 16-bit words to be | 58 commands from 2013 was very straightforward: they operated in one pass, reading |
61 programmed. Our program-bin operation programs 256 bytes at a time, i.e., | 59 the S-record image file, and as each individual S-record was read, it was turned |
62 sends one AMFW or INFW command per 256 bytes of image payload; our program-m0 | 60 into an AMFW or INFW command to loadagent. In the case of *.m0 files generated |
63 and program-srec operations program one S-record at a time, i.e., each S-record | 61 by TI's hex470 post-linker, each S-record carries 30 bytes of payload, thus the |
64 in the source image turns into its own AMFW or INFW command to loadagent. In | 62 flashing operation proceeded in 30-byte units, incurring the overhead of a |
65 the case of m0 images produced by TI's hex470 post-linker, each S-record carries | 63 command-response exchange for every 30 bytes. In contrast, our current flash |
66 30 bytes of payload, thus flashing that m0 image directly with program-m0 will | 64 program-bin implementation sends 256 bytes of payload per each AMFW or INFW |
67 proceed in 30-byte units, whereas converting it to binary and then flashing with | 65 command; this larger unit size decreases the overhead of command-response |
68 program-bin will proceed in 256-byte units. The smaller unit size slows down | 66 exchanges between fc-loadtool and loadagent. |
69 the overall operation by increasing the overhead of command-response exchanges. | |
70 | 67 |
71 XRAM loading via fc-xram is similar to flash program-m0 and program-srec in that | 68 Why do we need flash program-m0 and program-srec commands at all, why not |
72 fc-xram sends a separate ML command to loadagent for each S-record, thus the | 69 simply convert all SREC images to straight binary first and then program with |
73 total XRAM image loading time is not only the serial bit transfer time, but also | 70 flash program-bin? The reason is that S-record images can contain multiple |
74 the overhead of command-response exchanges between fc-xram and loadagent. Going | 71 discontiguous program regions with gaps in between. All of our current |
75 back to the same FC Magnetite fw image that can be flashed into an FCDEV3B in | 72 FreeCalypso firmwares built with TI's TMS470 toolchain contain a few small gaps |
76 2m11s via program-bin or in 3m54s via program-m0, doing an fc-xram load of that | 73 in the fwimage.m0 file, filled with 0xFF bytes when converted to straight binary |
77 same fw image (built as ramimage.srec) into the same FCDEV3B via the same | 74 with mokosrec2bin, but TI's own firmwares built for 8 MiB flash configurations |
78 FT2232D adapter at 812500 baud takes 2m54s - thus we can see that fc-xram | 75 often had much bigger gaps in them. |
79 loading is faster than flash program-m0 or program-srec, but slower than flash | 76 |
80 program-bin. | 77 As of fc-host-tools-r12 we finally have a more efficient solution for flashing |
78 discontiguous SREC images: our new implementation of flash program-m0 and | |
79 program-srec commands begins with a preliminary pass (pure host operation, no | |
80 target interaction) of reading the S-record image file; the payload bits are | |
81 written into a temporary binary file (automatically deleted afterward), while | |
82 the address and length of each discontiguous region are remembered internally. | |
83 Then the actual flash programming operation proceeds just like program-bin, | |
84 reading from the internal binary file and sending 256 bytes of payload at a time | |
85 to loadagent, but using the remembered knowledge of where the discontiguous | |
86 regions lie. | |
87 | |
88 XRAM loading via fc-xram | |
89 ======================== | |
90 | |
91 Our current fc-xram implementation is similar to the old 2013 implementation of | |
92 flash program-m0 and program-srec commands in that fc-xram sends a separate ML | |
93 command to loadagent for each S-record, thus the total XRAM image loading time | |
94 is not only the serial bit transfer time, but also the overhead of command- | |
95 response exchanges between fc-xram and loadagent. The flash programming times | |
96 listed above include flashing an FC Magnetite fw image into an FCDEV3B, which | |
97 took 2m11s; doing an fc-xram load of the same FC Magnetite fw image (built as | |
98 ramimage.srec) into the same FCDEV3B via the same FT2232D adapter at 812500 | |
99 baud takes 2m54s. | |
81 | 100 |
82 Why does XRAM loading take longer than flashing? Shouldn't it be faster because | 101 Why does XRAM loading take longer than flashing? Shouldn't it be faster because |
83 the flash programming step on the target is replaced with a simple memcpy()? | 102 the flash programming step on the target is replaced with a simple memcpy()? |
84 Answer: fc-xram is currently slower than flash program-bin because the latter | 103 Answer: fc-xram is currently slower than flash program operations because the |
85 sends 256 bytes at a time to loadagent, whereas fc-xram sends one S-record at a | 104 latter send 256 bytes at a time to loadagent, whereas fc-xram sends one |
86 time; the division of the image into S-records is determined by the tool that | 105 S-record at a time; the division of the image into S-records is determined by |
87 generates the SREC image, but TI's hex470 post-linker generates images with 30 | 106 the tool that generates the SREC image, but TI's hex470 post-linker generates |
88 bytes of payload per S-record. Having the operation proceed in smaller chunks | 107 images with 30 bytes of payload per S-record. Having the operation proceed in |
89 increases the overhead of command-response exchanges and thus increases the | 108 smaller chunks increases the overhead of command-response exchanges and thus |
90 overall time. | 109 increases the overall time. |
110 | |
111 Additional complication with FTDI adapters and newer Linux kernel versions | |
112 ========================================================================== | |
113 | |
114 If you are using an FTDI adapter and a Linux kernel version newer than early | |
115 2017 (the change was introduced between 4.10 and 4.11), then you have one | |
116 additional complication: a change was made to the ftdi_sio driver in the Linux | |
117 kernel that makes many loadtools operations (basically everything other than | |
118 flash dumps which are entirely target-driven) unbearably slow (much slower than | |
119 the Slackware 14.2 reference times given above) unless you execute a special | |
120 setserial command first. After you plug in your FTDI-based USB-serial cable or | |
121 connect the USB cable between your PC or laptop and your FTDI adapter board, | |
122 causing the corresponding ttyUSBx device to appear, execute the following | |
123 command: | |
124 | |
125 setserial /dev/ttyUSBx low_latency | |
126 | |
127 (Obviously change ttyUSBx to your actual ttyUSB number.) Execute this | |
128 setserial command before running fc-loadtool or fc-xram, and then hopefully you | |
129 should get performance that is comparable to what I get on classic Slackware. | |
130 I say "hopefully" because I am not able to test it myself - I refuse to run any | |
131 OS that can be categorized as "modern" - but field reports of performance on | |
132 non-Slackware systems running newer Linux kernels (4.11 or later) are welcome. |
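
The two-pass approach described in the new revision for flash program-m0 and program-srec can be sketched roughly as follows. This is only an illustrative Python sketch, not the actual fc-loadtool code (which is not shown in this comparison): the function names parse_srec_line, collect_regions and plan_chunks are hypothetical, the payload is kept in memory instead of a temporary file, data records are assumed to arrive in ascending address order, and any format details specific to TI's *.m0 files are ignored.

```python
def parse_srec_line(line):
    """Return (address, payload) for an S1/S2/S3 data record, else None."""
    line = line.strip()
    addr_len = {"S1": 2, "S2": 3, "S3": 4}.get(line[:2])
    if addr_len is None:
        return None  # header, count or termination record: no payload
    raw = bytes.fromhex(line[2:])
    count, body = raw[0], raw[1:]
    # The count byte covers address + data + checksum; the low byte of the
    # sum over count, address, data and checksum must come out as 0xFF.
    if count != len(body) or (sum(raw) & 0xFF) != 0xFF:
        raise ValueError("bad S-record: " + line)
    addr = int.from_bytes(body[:addr_len], "big")
    return addr, body[addr_len:-1]


def collect_regions(lines):
    """Pass 1 (host only): gather payload, remember each contiguous region."""
    payload = bytearray()
    regions = []  # list of [start_address, length]
    for line in lines:
        rec = parse_srec_line(line)
        if rec is None:
            continue
        addr, data = rec
        if regions and regions[-1][0] + regions[-1][1] == addr:
            regions[-1][1] += len(data)  # record continues the current region
        else:
            regions.append([addr, len(data)])  # gap: start a new region
        payload += data
    return bytes(payload), regions


def plan_chunks(regions, unit=256):
    """Pass 2: yield (target address, payload offset, length) per command."""
    offset = 0
    for start, length in regions:
        for pos in range(0, length, unit):
            yield start + pos, offset + pos, min(unit, length - pos)
        offset += length
```

Under these assumptions, each tuple produced by plan_chunks would correspond to one command-response exchange with loadagent, so a region made of many 30-byte S-records is sent in 256-byte units rather than one exchange per record, while gaps between regions are simply skipped.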