view doc/PCM8-conversions @ 459:b094bc07051a
doc/Codec-utils: document twamr-* addition
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Fri, 10 May 2024 19:50:29 +0000 |
What is the authoritatively correct, officially endorsed bidirectional
mapping between G.711 A-law and mu-law encodings on one side and 16-bit
2's complement linear PCM on the other side?  Surprisingly, there is no
official answer to this problem anywhere in the specs!  Instead the specs
provide the following partial answers:

* The G.711 spec itself provides one mapping from A-law code octets to
  linear numeric values in range [-4032,4032] and another mapping from
  mu-law code octets to linear numeric values in range [-8031,8031].  The
  output from each of these mappings is given in "pure mathematical" form,
  without specifying any bit-level encoding, and furthermore, mu-law
  decoder output in its pure "conceptual" form has both +0 and -0 values.
  (The same signed zero problem does not occur in A-law because it's a
  mid-riser code rather than mid-tread, and thus has no quantized values
  equal to 0.)

* If one takes the "pure mathematical" output from the spec-prescribed
  G.711 decoder and represents it in 2's complement form, squashing +0 and
  -0 outputs from the canonical mu-law decoder into "plain 0" at this
  step, the result is a 13 bits wide 2's complement value for A-law
  decoding and a 14 bits wide 2's complement value for mu-law.

* All GSM speech encoders take 13-bit 2's complement linear PCM samples as
  their input.  How should this 13-bit GSM codec input be derived from
  A-law or mu-law code octets?  GSM specs refer to ITU's G.726 spec for
  ADPCM - it just so happens that inside the ADPCM algorithm of G.726 (a
  totally unrelated codec of no relevance to GSM codec work outside of
  this reference) there is a pair of functions for expanding A-law and
  mu-law to linear PCM and compressing linear PCM back to A-law or mu-law.

* Following this obscure G.726 reference, we eventually conclude that in
  the case of A-law, GSM specs call for the obvious treatment: take the
  "natural" output from the canonical A-law decoder and represent it in
  2's complement form; the result is 13 bits wide, and that 13-bit 2's
  complement form is fed directly to the input of GSM speech encoders.
  However, in the case of mu-law the "natural" G.711 decoder output is one
  sign bit plus 13 bits of magnitude, requiring 14 bits in 2's complement
  representation - and none of the specs I could find says anything about
  exactly how this 14-bit input should be reduced to 13 bits for feeding
  to GSM speech encoders.  Canonical C implementations of all GSM speech
  encoders take their input in 16-bit words and clear the 3 least
  significant bits as their first step; if the 14-bit mu-law decoder
  output is represented in 16-bit words by padding 2 zero bits on the
  right and this output is then fed to GSM speech encoder functions, the
  end effect is that the least-significant bit of the 14-bit decoder
  output is simply cut off.  This form of mu-law-to-GSM transcoder
  implementation is consistent with TESTx-U.INP and TESTx-U.COD sequences
  provided in the GSM 06.54 package for EFR.

Based on the above considerations, we have our answer for how we should
convert from G.711 to 16-bit 2's complement linear PCM:

* For A-law, we emit the "natural" output in 13-bit 2's complement form
  and append 3 zero bits on the right; this transformation is fully
  lossless.

* For mu-law, we emit the "natural" output in 14-bit 2's complement form
  and append 2 zero bits on the right.  This transformation is almost
  lossless, with just one exception: the "pure" decoder's -0 output
  (resulting from PCMU octet 0x7F) is squashed to "plain 0", and will be
  re-emitted as PCMU octet 0xFF rather than 0x7F on subsequent re-encoding
  to G.711 PCMU.
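The two decoding rules above can be sketched in C as follows.  This is
only an illustration under stated assumptions, not the code of
dev/a2s-regen.c or dev/u2s-regen.c: the function names alaw_to_pcm16()
and ulaw_to_pcm16() are hypothetical, and the arithmetic is the usual
segment/step expansion of G.711 code octets, written so that the output
is the 13-bit (A-law) or 14-bit (mu-law) value left-justified in a 16-bit
word.

```c
#include <stdint.h>

/* Hypothetical helper: A-law octet to 16-bit linear PCM.
 * The 13-bit "natural" value gets 3 zero bits appended on the right. */
int16_t alaw_to_pcm16(uint8_t octet)
{
	unsigned ix = (octet ^ 0x55) & 0x7F;	/* undo even-bit inversion, drop sign */
	unsigned exp = ix >> 4;			/* segment number, 0..7 */
	unsigned mant = ix & 0x0F;		/* step within the segment */
	int mag;				/* magnitude in 13-bit units, 1..4032 */

	if (exp == 0)
		mag = 2 * mant + 1;
	else
		mag = (2 * mant + 33) << (exp - 1);
	mag <<= 3;				/* append 3 zero bits */
	return (octet & 0x80) ? mag : -mag;	/* sign bit set means positive */
}

/* Hypothetical helper: mu-law octet to 16-bit linear PCM.
 * The 14-bit "natural" value gets 2 zero bits appended on the right;
 * +0 (octet 0xFF) and -0 (octet 0x7F) both come out as plain 0. */
int16_t ulaw_to_pcm16(uint8_t octet)
{
	unsigned ix = ~octet & 0x7F;		/* undo bit inversion, drop sign */
	unsigned exp = ix >> 4;			/* segment number, 0..7 */
	unsigned mant = ix & 0x0F;		/* step within the segment */
	int mag;				/* magnitude in 14-bit units, 0..8031 */

	mag = (int)((2 * mant + 33) << exp) - 33;
	mag <<= 2;				/* append 2 zero bits */
	return (octet & 0x80) ? mag : -mag;	/* sign bit set means positive */
}
```

With this sketch the extreme A-law octets 0xAA and 0x2A decode to +32256
and -32256 (4032 times 8), the extreme mu-law octets 0x80 and 0x00 decode
to +32124 and -32124 (8031 times 4), and both 0xFF and 0x7F decode to 0.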
For anyone needing a G.711 to 16-bit linear PCM decoder, the present
package provides ready-made decoding tables (following the above rules) in
dev/a2s-regen.out and dev/u2s-regen.out, generated by dev/a2s-regen.c and
dev/u2s-regen.c programs.

Now for the opposite problem: what is the most correct way to compress
16-bit 2's complement linear PCM to A-law or mu-law?  In this direction
the official specs leave even more ambiguity than in the G.711 decoding
direction:

* The G.711 spec itself says: "The conversion to A-law or mu-law values
  from uniform PCM values corresponding to the decision values, is left to
  the individual equipment specification."  The specific implementation
  used in the guts of G.726 ADPCM codec is referred to only as a
  non-normative example.

* GSM specs likewise refer to this G.726 section 4.2.8 (for compression of
  13-bit speech decoder output to G.711) with language that suggests a
  non-normative example.

After painstakingly comparing the C implementation of G.726 in the ITU-T
G.191 STL against the language of G.726 spec itself and convincing myself
that they really do match, and then painstakingly comparing this approach
against the one implemented in the same G.191 STL for G.711 in
alaw_compress() and ulaw_compress() and against the table lookup method
implemented in libgsm/toast (my first reference, before I went down the
rabbit hole of tracking down official specs), I reached the following
conclusions:

* For A-law encoding all 3 parties (G.191 STL alaw_compress() function,
  G.726 "compress" block and toast_alaw.c) agree on the same mapping.  In
  this mapping only the most significant 12 bits of the 2's complement
  input word (equivalent to one sign bit and 11 bits of magnitude) are
  relevant, leading to the following two interesting properties:

  - the least-significant bit of GSM speech decoder output is always
    discarded when converting to A-law;

  - conversion can be easily implemented with a 4096-byte look-up table
    based on the upper 12 bits of input, exactly as was done in
    toast_alaw.c in the venerable libgsm source.

* Mu-law encoding is the real hair-raiser: if the input to the
  to-be-implemented encoder has 14 or more bits (including the most
  practical problem of 16-bit 2's complement input), there are no fewer
  than 3 different ways to implement this encoder!

Let us now look at the 3 different ways of encoding a 14-bit or 16-bit 2's
complement linear PCM input to G.711 mu-law.  In this analysis we shall
use 14-bit notation, with 2's complement inputs contained in the domain
[-8192,8191].  The differences between the 3 identified ways of mapping
from this domain to mu-law have to do with boundaries between quantization
intervals.  Tables 2a and 2b in the G.711 spec list all defined
quantization intervals and all decision values that mark boundaries
between them; here is a digested form of the beginning of the canonical
quantization table for either side of zero:

Quantization interval   Quantized value
(range of magnitudes)   (absolute)
---------------------------------------
0-1                     0
1-3                     2
3-5                     4
...
29-31                   30
31-35                   33
35-39                   37
...
91-95                   93
95-103                  99
...

This canonical quantization table is defined in terms of absolute values,
and is therefore fully symmetric around zero.

A careful look at the above table raises a question: which quantization
interval (and thus which PCMU octet output) should be selected if the
input value to the encoder has a magnitude exactly equal to one of the
threshold points, or decision values as they are officially called?  In
other words, what should the encoder do if the magnitude of the 14-bit
input value equals 1, 3, 5, ..., 31, 35, 39 etc?  The answer to this
question is where the 3 candidate mappings under our consideration differ:

* The "compress" function of G.726 operating in mu-law mode selects the
  higher (in absolute value) quantization interval at every decision value
  threshold, on both sides of zero: see Table 15/G.726.  PCMU octet 0x7F
  (meaning -0) will never be emitted by this version, and an input
  sequence of -3, -2, -1, 0, 1, 2, 3 will map to quantized values
  -4, -2, -2, 0, 2, 2, 4.

* The ulaw_compress() function in G.191 STL behaves like the G.726 version
  for positive values, but selects the smaller-absolute-value quantization
  interval for negative inputs.  Given the same input sequence as above,
  the output will correspond to quantized values -2, -2, -0, 0, 2, 2, 4.
  (Quantized value -0 is PCMU octet 0x7F.)

* The s2u[] table in toast_ulaw.c in libgsm source is flat-out wrong and
  should not be used or considered further (and because those authors did
  not include the source for whatever program they used to generate their
  broken s2u[] and u2s[] tables, we have no way to really analyze them),
  but one CAN construct a new table for the same function, using the upper
  13 bits of 16-bit 2's complement input to generate PCMU output - see our
  dev/s2u-regen.c program and its output table in dev/s2u-regen.out.  The
  resulting mapping is "mirrored" around zero compared to G.191 STL
  ulaw_compress(): for the same input sequence as in the previous two
  examples, the output will correspond to quantized values
  -4, -2, -2, 0, 0, 2, 2.  Just like the G.726 version, this look-up table
  version will never emit PCMU octet 0x7F for -0.
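The three behaviours at decision values can be reproduced with a small C
sketch operating on 14-bit inputs in [-8192,8191].  All function names
here are hypothetical, and none of this is the actual code of G.726, of
G.191 STL ulaw_compress() or of dev/s2u-regen.c; it is just one way to
obtain the three quantized sequences listed above.  The common helper
picks the higher interval whenever a magnitude equals a decision value,
and the three variants differ only in what magnitude they hand to it.

```c
#include <stdint.h>

/* Magnitude to 7-bit segment/step code, taking the higher interval
 * whenever the magnitude equals a decision value. */
static unsigned mag_to_code(unsigned mag)
{
	unsigned biased = mag + 33;
	unsigned exp = 0;

	if (biased > 8191)
		biased = 8191;		/* clip into the top interval */
	while ((biased >> (exp + 6)) != 0)
		exp++;			/* segment number, 0..7 */
	return (exp << 4) | ((biased >> (exp + 1)) & 0x0F);
}

/* PCMU octets carry inverted code bits; sign bit set means positive. */
static uint8_t make_octet(unsigned code, int negative)
{
	return (uint8_t)((~code & 0x7F) | (negative ? 0x00 : 0x80));
}

/* Higher interval at decision values on both sides of zero. */
uint8_t ulaw_enc_g726_style(int v)
{
	return make_octet(mag_to_code(v < 0 ? -v : v), v < 0);
}

/* Same for positive inputs, smaller interval for negative ones;
 * an input of -1 yields octet 0x7F (-0). */
uint8_t ulaw_enc_stl_style(int v)
{
	return make_octet(mag_to_code(v < 0 ? -v - 1 : v), v < 0);
}

/* Only the upper 13 bits matter: round odd inputs down (toward
 * negative infinity) to the next even value before encoding. */
uint8_t ulaw_enc_table_style(int v)
{
	int odd = (v % 2 != 0);

	return ulaw_enc_g726_style(v - odd);
}
```

In PCMU octets, quantized values 0, 2 and 4 are 0xFF, 0xFE and 0xFD, and
-0, -2 and -4 are 0x7F, 0x7E and 0x7D.  For inputs -3 through 3 the sketch
gives 7D 7E 7E FF FE FE FD for the G.726-style variant, 7E 7E 7F FF FE FE
FD for the STL-style variant and 7D 7E 7E FF FF FE FE for the table-style
variant.  For even inputs all three functions return the same octet.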
It is important to note that all GSM speech decoders produce 2's
complement outputs that are only 13 bits wide, not 14 - therefore, when
the input to the G.711 encoder comes from the output of a GSM speech
decoder, the difference between all 3 alternatives listed above is masked,
with all 3 producing identical output.

For production software, our (Themyscira) recommendation is to use look-up
tables (dev/s2a-regen.out and dev/s2u-regen.out) for both A-law and mu-law
encoding, using the upper 12 bits from 16-bit 2's complement input for
A-law encoding and the upper 13 bits for mu-law encoding.  For mu-law
encoding the resulting mapping is different from what G.191 STL
ulaw_compress() function produces, and many will consider that function to
be canon - but our approach exhibits the same key properties, just
mirrored around zero, and has the advantage of needing only the upper 13
bits.

Command line utilities
======================

As usual, the present Themyscira GSM codec libraries & utilities package
provides command line utilities for working with the subject of this
article: conversions between 16-bit linear PCM (the format read and
written by other tools in the present suite) and 8-bit PCM in G.711 A-law
or mu-law.  The following utilities are provided:

pcm16-to-alaw   These two utilities read 16-bit linear PCM in raw format
pcm16-to-ulaw   (BE byte order by default, or LE with -l option) and
                convert the recording into one byte per sample G.711
                format, with each program emitting its respective encoding
                law.  pcm16-to-alaw has only one mapping, but
                pcm16-to-ulaw supports two possible mappings: by default
                it applies the mapping of G.191 STL ulaw_compress(), or if
                you specify the -t option it applies the same mapping that
                would be produced by our recommended 13-bit look-up table
                method.

pcm8-to-pcm16   This utility reads a G.711 8-bit PCM recording (alaw or
                ulaw selected with a mandatory command line argument) from
                a "raw" G.711 file and converts it to 16-bit linear PCM.
                The output byte order is BE by default, or can be changed
                to LE with an extra command line qualifier.