comparison doc/Loadtools-performance @ 680:89ed8b374bc0
doc/Loadtools-performance: finished updates for fc-host-tools-r13
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Mon, 09 Mar 2020 04:40:13 +0000 |
parents | f2a023c20653 |
children | 0815661d6e3e |
comparison
679:be641fa7b68d | 680:89ed8b374bc0 |
---|---|
79 flashing. | 79 flashing. |
80 | 80 |
81 Notice the difference in flash programming times between GTA02 and FCDEV3B: the | 81 Notice the difference in flash programming times between GTA02 and FCDEV3B: the |
82 fw image size is almost exactly the same, any difference in latency between | 82 fw image size is almost exactly the same, any difference in latency between |
83 CP2102 and FT2232D is less likely to produce such significant time difference | 83 CP2102 and FT2232D is less likely to produce such significant time difference |
84 given our current 2048 byte transfer block size, thus the difference in physical | 84 given our current 2048 byte transfer block size (in fact fc-xram transfer times |
85 flash program operation times between K5A3281CTM and S71PL129N flash chips seems | 85 suggest that FT2232D is faster), thus the difference in physical flash program |
86 to be the most likely explanation. | 86 operation times between K5A3281CTM and S71PL129N flash chips seems to be the |
87 most likely explanation. | |
88 | |
89 It also needs to be noted that in the current version of fc-loadtool there is | |
90 no difference in performance between flash program-bin, program-m0 and | |
91 program-srec operations: they all use the same binary protocol with 2048 byte | |
92 transfer block size. There is no coupling between source S-records and flash | |
93 programming operation blocks (2048-byte units) in the case of flash program-m0 | |
94 and program-srec: the new implementation of these commands prereads the entire | |
95 S-record image as a separate preparatory step on the host side, the bits to be | |
96 programmed are saved in a temporary binary file (automatically deleted | |
97 afterward), and the actual flash programming operation proceeds from this | |
98 internal binary source - but it knows about any discontiguous program regions | |
99 and skips the gaps properly. | |
87 | 100 |
88 XRAM loading via fc-xram | 101 XRAM loading via fc-xram |
89 ======================== | 102 ======================== |
90 | 103 |
91 Our current fc-xram implementation is similar to the old 2013 implementation of | 104 The new version of fc-xram as of fc-host-tools-r13 is dramatically faster than |
92 flash program-m0 and program-srec commands in that fc-xram sends a separate ML | 105 the original implementation from 2013, using a new binary transfer protocol. |
93 command to loadagent for each S-record, thus the total XRAM image loading time | 106 The speed increase comes from not only switching from hex to binary, but even |
94 is not only the serial bit transfer time, but also the overhead of command- | 107 more so from eliminating the command-response turnaround time on every S3 |
95 response exchanges between fc-xram and loadagent. The flash programming times | 108 record. The new XRAM loading times obtained on the Mother's Slackware 14.2 |
96 listed above include flashing an FC Magnetite fw image into an FCDEV3B, which | 109 host system are: |
97 took 2m11s; doing an fc-xram load of the same FC Magnetite fw image (built as | |
98 ramimage.srec) into the same FCDEV3B via the same FT2232D adapter at 812500 | |
99 baud takes 2m54s. | |
100 | 110 |
101 Why does XRAM loading take longer than flashing? Shouldn't it be faster because | 111 Pirelli DP-L10 with built-in CP2102 USB-serial chip, 812500 baud, loading |
102 the flash programming step on the target is replaced with a simple memcpy()? | 112 hybrid-vpm fw build, 49969 S3 records: 0m27s |
103 Answer: fc-xram is currently slower than flash program operations because the | 113 |
104 latter send 256 bytes at a time to loadagent, whereas fc-xram sends one | 114 FCDEV3B interfaced via FT2232D adapter, 812500 baud, loading hybrid fw build, |
105 S-record at a time; the division of the image into S-records is determined by | 115 78875 S3 records: 0m35s |
106 the tool that generates the SREC image, but TI's hex470 post-linker generates | 116 |
107 images with 30 bytes of payload per S-record. Having the operation proceed in | 117 With the previous version of fc-xram these two loads took 1m40s and 2m54s, |
108 smaller chunks increases the overhead of command-response exchanges and thus | 118 respectively. With the current version of loadtools XRAM loading is faster |
109 increases the overall time. | 119 than flash programming for the same fw image as one would naturally expect (the |
120 flash programming step on the target is replaced with a simple memcpy() | |
121 operation), but in the previous version XRAM loading was slower because of | |
122 massive command-response exchange overhead: there was a command-response | |
123 turnaround time incurred for every S3 record, typically carrying only 30 bytes | |
124 of payload. | |
110 | 125 |
111 Additional complication with FTDI adapters and newer Linux kernel versions | 126 Additional complication with FTDI adapters and newer Linux kernel versions |
112 ========================================================================== | 127 ========================================================================== |
113 | 128 |
114 If you are using an FTDI adapter and a Linux kernel version newer than early | 129 If you are using an FTDI adapter and a Linux kernel version newer than early |
115 2017 (the change was introduced between 4.10 and 4.11), then you have one | 130 2017 (the change was introduced between 4.10 and 4.11), then you have one |
116 additional complication: a change was made to the ftdi_sio driver in the Linux | 131 additional complication: a change was made to the ftdi_sio driver in the Linux |
117 kernel that makes many loadtools operations (basically everything other than | 132 kernel that made many loadtools operations (basically everything other than |
118 flash dumps which are entirely target-driven) unbearably slow (much slower than | 133 flash dumps which are entirely target-driven) unbearably slow, at least with |
119 the Slackware 14.2 reference times given above) unless you execute a special | 134 previous versions of loadtools that made many more command-response exchanges |
120 setserial command first. After you plug in your FTDI-based USB-serial cable or | 135 with loadagent for smaller transfer units and thus were much more sensitive to |
121 connect the USB cable between your PC or laptop and your FTDI adapter board, | 136 host system latency on these exchanges. We do not yet know if this FTDI |
122 causing the corresponding ttyUSBx device to appear, execute the following | 137 latency timer issue still has a significant negative impact or not with current |
123 command: | 138 loadtools, but if it does, the solution is to run a special setserial command. |
139 After you plug in your FTDI-based USB-serial cable or connect the USB cable | |
140 between your PC or laptop and your FTDI adapter board, causing the | |
141 corresponding ttyUSBx device to appear, execute the following command: | |
124 | 142 |
125 setserial /dev/ttyUSBx low_latency | 143 setserial /dev/ttyUSBx low_latency |
126 | 144 |
127 (Obviously change ttyUSBx to your actual ttyUSB number.) Execute this | 145 (Obviously change ttyUSBx to your actual ttyUSB number.) Execute this |
128 setserial command before running fc-loadtool or fc-xram, and then hopefully you | 146 setserial command before running fc-loadtool or fc-xram, and then hopefully you |
129 should get performance that is comparable to what I get on classic Slackware. | 147 should get performance that is comparable to what I get on classic Slackware. |
130 I say "hopefully" because I am not able to test it myself - I refuse to run any | 148 I say "hopefully" because I am not able to test it myself - I refuse to run any |
131 OS that can be categorized as "modern" - but field reports of performance on | 149 OS that can be categorized as "modern" - but field reports of performance on |
132 non-Slackware systems running newer Linux kernels (4.11 or later) are welcome. | 150 non-Slackware systems running newer Linux kernels (4.11 or later) are welcome, |
151 both with and without the low_latency setting. Please be sure to include your | |
152 Linux kernel version and your USB-serial adapter type in your report! |
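
As a rough sanity check on the fc-xram figures in the new revision, the sketch below compares each quoted load time with the minimum time needed just to move the payload bytes over the serial line. It is a back-of-envelope estimate, not part of loadtools. Two things are assumptions not stated in the document: 8N1 framing (10 bits on the wire per byte) and a uniform 30 payload bytes in every S3 record (the text only says "typically"). The function names are made up for this illustration.

```python
# Back-of-envelope check of the fc-xram load times quoted in the new revision.
# ASSUMPTIONS (not from the document): 8N1 serial framing, and exactly
# 30 payload bytes in every S3 record.

BAUD = 812500
BITS_PER_BYTE_ON_WIRE = 10    # assumed 8N1: start bit + 8 data bits + stop bit
PAYLOAD_PER_S3_RECORD = 30    # "typically" 30 per the text; assumed uniform


def parse_time(text):
    """Convert a time written like '2m54s' into seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


def wire_time(records):
    """Lower bound in seconds for sending the payload bytes alone."""
    payload_bytes = records * PAYLOAD_PER_S3_RECORD
    return payload_bytes * BITS_PER_BYTE_ON_WIRE / BAUD


# (target, S3 records, previous fc-xram time, new fc-xram time)
loads = [
    ("Pirelli DP-L10 / CP2102", 49969, "1m40s", "0m27s"),
    ("FCDEV3B / FT2232D", 78875, "2m54s", "0m35s"),
]

for target, records, old, new in loads:
    floor = wire_time(records)
    old_s, new_s = parse_time(old), parse_time(new)
    # Time per record beyond the binary payload floor in the old version;
    # this lumps together hex encoding cost and command-response turnaround.
    extra_ms = (old_s - floor) / records * 1000
    print(f"{target}: payload floor {floor:.1f}s, new {new_s}s, "
          f"old {old_s}s, old excess ~{extra_ms:.2f} ms per record")
```

Under these assumptions the payload floor comes out at roughly 18 s for the Pirelli load and 29 s for the FCDEV3B load, so the new 0m27s and 0m35s times sit fairly close to what the serial line itself allows, while the previous version spent on the order of a millisecond or two of extra time on every record.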