diff doc/PCM8-conversions @ 235:0ee1a66c1846
doc/PCM8-conversions: beginning of document
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Mon, 08 May 2023 00:45:26 +0000 |
parents | |
children | 4c7d0dc1eecb |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/PCM8-conversions Mon May 08 00:45:26 2023 +0000
@@ -0,0 +1,103 @@
+What is the authoritatively correct, officially endorsed bidirectional mapping
+between G.711 A-law and mu-law encodings on one side and 16-bit 2's complement
+linear PCM on the other side? Surprisingly, there is no official answer to this
+problem anywhere in the specs! Instead the specs provide the following partial
+answers:
+
+* The G.711 spec itself provides one mapping from A-law code octets to linear
+  numeric values in range [-4032,4032] and another mapping from mu-law code
+  octets to linear numeric values in range [-8031,8031]. The output from each
+  of these mappings is given in "pure mathematical" form, without specifying any
+  bit-level encoding, and furthermore, mu-law decoder output in its pure
+  "conceptual" form has both +0 and -0 values. (The same signed zero problem
+  does not occur in A-law because it's a mid-riser code rather than mid-tread,
+  and thus has no quantized values equal to 0.)
+
+* If one takes the "pure mathematical" output from the spec-prescribed G.711
+  decoder and represents it in 2's complement form, squashing +0 and -0 outputs
+  from the canonical mu-law decoder into "plain 0" at this step, the result is
+  a 13 bits wide 2's complement value for A-law decoding and a 14 bits wide 2's
+  complement value for mu-law.
+
+* All GSM speech encoders take 13-bit 2's complement linear PCM samples as their
+  input. How should this 13-bit GSM codec input be derived from A-law or mu-law
+  code octets? GSM specs refer to ITU's G.726 spec for ADPCM - it just so
+  happens that inside the ADPCM algorithm of G.726 (a totally unrelated codec of
+  no relevance to GSM codec work outside of this reference) there is a pair of
+  functions for expanding A-law and mu-law to linear PCM and compressing linear
+  PCM back to A-law or mu-law.
+
+* Following this obscure G.726 reference, we eventually conclude that in the
+  case of A-law, GSM specs call for the obvious treatment: take the "natural"
+  output from the canonical A-law decoder, represent it in 2's complement form
+  (the result is 13 bits wide), and just feed that 13-bit 2's complement form to
+  the input of GSM speech encoders. However, in the case of mu-law the
+  "natural" G.711 decoder output is one sign bit plus 13 bits of magnitude,
+  requiring 14 bits in 2's complement representation - and none of the specs I
+  could find says anything about exactly how this 14-bit input should be reduced
+  to 13 bits for feeding to GSM speech encoders. Canonical C implementations
+  of all GSM speech encoders take their input in 16-bit words and clear the 3
+  least significant bits as their first step; if the 14-bit mu-law decoder
+  output is represented in 16-bit words by padding 2 zero bits on the right and
+  this output is then fed to GSM speech encoder functions, the end effect is
+  that the least-significant bit of the 14-bit decoder output is simply cut off.
+  This form of mu-law-to-GSM transcoder implementation is consistent with
+  TESTx-U.INP and TESTx-U.COD sequences provided in the GSM 06.54 package for
+  EFR.
+
+Based on the above considerations, we have our answer for how we should convert
+from G.711 to 16-bit 2's complement linear PCM:
+
+* For A-law, we emit the "natural" output in 13-bit 2's complement form and
+  append 3 zero bits on the right; this transformation is fully lossless.
+
+* For mu-law, we emit the "natural" output in 14-bit 2's complement form and
+  append 2 zero bits on the right. This transformation is almost lossless,
+  with just one exception: the "pure" decoder's -0 output (resulting from PCMU
+  octet 0x7F) is squashed to "plain 0", and will be re-emitted as PCMU octet
+  0xFF rather than 0x7F on subsequent re-encoding to G.711 PCMU.
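The two decoding rules above can be sketched in C as follows. This is only an illustration and is not taken from the gsm-codec-lib sources: the function names alaw_to_linear16() and ulaw_to_linear16() are hypothetical, and the segment arithmetic is the commonly used closed form for G.711 expansion, assumed here to match the G.711 tables (magnitudes 1..4032 for A-law and 0..8031 for mu-law).

```c
#include <stdint.h>

/* Hypothetical helper: A-law octet -> 16-bit linear PCM.
 * The 13-bit G.711 value is left-justified by appending 3 zero bits. */
int16_t alaw_to_linear16(uint8_t octet)
{
    unsigned code = octet ^ 0x55;        /* undo the even-bit inversion */
    unsigned exp  = (code >> 4) & 7;     /* segment number, 0..7 */
    unsigned mant = code & 0x0F;         /* step within the segment */
    int mag13;                           /* magnitude, 1..4032 */

    if (exp == 0)
        mag13 = (int)((mant << 1) | 1);
    else
        mag13 = (int)(((mant << 1) | 0x21) << (exp - 1));

    /* in A-law, bit 7 set means positive; multiply by 8 == append 3 zeros */
    return (int16_t)((octet & 0x80) ? mag13 * 8 : -(mag13 * 8));
}

/* Hypothetical helper: mu-law octet -> 16-bit linear PCM.
 * The 14-bit G.711 value is left-justified by appending 2 zero bits;
 * the decoder's +0 (0xFF) and -0 (0x7F) both come out as plain 0. */
int16_t ulaw_to_linear16(uint8_t octet)
{
    unsigned code = ~octet & 0xFF;       /* mu-law octets are fully inverted */
    unsigned exp  = (code >> 4) & 7;
    unsigned mant = code & 0x0F;
    int mag14 = (int)((((mant << 1) + 33) << exp) - 33);   /* 0..8031 */

    /* in the transmitted octet, bit 7 set means positive */
    return (int16_t)((octet & 0x80) ? mag14 * 4 : -(mag14 * 4));
}
```

Under these assumptions, A-law octet 0xD5 (the smallest positive value) decodes to 8, and mu-law octets 0xFF and 0x7F both decode to 0, which is the one lossy case described above.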
+
+For anyone needing a G.711 to 16-bit linear PCM decoder, the present package
+provides ready-made decoding tables (following the above rules) in
+dev/a2s-regen.out and dev/u2s-regen.out, generated by dev/a2s-regen.c and
+dev/u2s-regen.c programs.
+
+Now for the opposite problem: what is the most correct way to compress 16-bit
+2's complement linear PCM to A-law or mu-law? In this direction the official
+specs leave even more ambiguity than in the G.711 decoding direction:
+
+* The G.711 spec itself says: "The conversion to A-law or mu-law values from
+  uniform PCM values corresponding to the decision values, is left to the
+  individual equipment specification." The specific implementation used in the
+  guts of G.726 ADPCM codec is referred to only as a non-normative example.
+
+* GSM specs likewise refer to this G.726 section 4.2.8 (for compression of
+  13-bit speech decoder output to G.711) with language that suggests a
+  non-normative example.
+
+After painstakingly comparing the C implementation of G.726 in the ITU-T G.191
+STL against the language of G.726 spec itself and convincing myself that they
+really do match, and then painstakingly comparing this approach against the one
+implemented in the same G.191 STL for G.711 in alaw_compress() and
+ulaw_compress() and against the table lookup method implemented in libgsm/toast
+(my first reference, before I went down the rabbit hole of tracking down
+official specs), I reached the following conclusions:
+
+* For A-law encoding all 3 parties (G.191 STL alaw_compress() function, G.726
+  "compress" block and toast_alaw.c) agree on the same mapping. In this
+  mapping only the most significant 12 bits of the 2's complement input word
+  (equivalent to one sign bit and 11 bits of magnitude) are relevant, leading
+  to the following two interesting properties:
+
+  - the least-significant bit of GSM speech decoder output is always discarded
+    when converting to A-law;
+
+  - conversion can be easily implemented with a 4096-byte look-up table based
+    on the upper 12 bits of input, exactly as was done in toast_alaw.c in the
+    venerable libgsm source.
+
+* Mu-law encoding is the real hair-raiser: if the input to the to-be-implemented
+  encoder has 14 or more bits (including the most practical problem of 16-bit
+  2's complement input), there are no fewer than 3 different ways to implement
+  this encoder!
+
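Returning to the A-law conclusion above (only the upper 12 bits of the input matter), the mapping can be sketched in C as follows. This is an illustration under stated assumptions, not code from the gsm-codec-lib package or from libgsm: the names linear16_to_alaw() and build_alaw_table() are hypothetical, and the segment search is modeled on the commonly published structure of the G.191 STL alaw_compress() routine, so it should be checked against the STL source before being relied upon.

```c
#include <stdint.h>

/* Hypothetical helper: 16-bit linear PCM -> A-law octet.
 * Only the upper 12 bits of the input influence the result. */
uint8_t linear16_to_alaw(int16_t sample)
{
    int s = sample;
    /* 11-bit magnitude; negative inputs use 1's complement, so that
     * -1..-16 land in the same lowest negative interval */
    int ix = (s < 0 ? ~s : s) >> 4;

    if (ix > 15) {                  /* above the first (linear) segment */
        int exp = 1;
        while (ix > 31) {           /* find the segment number */
            ix >>= 1;
            exp++;
        }
        ix = (ix - 16) + (exp << 4);    /* drop leading 1, insert exponent */
    }
    if (s >= 0)
        ix |= 0x80;                 /* A-law sign bit: set means positive */
    return (uint8_t)(ix ^ 0x55);    /* even-bit inversion */
}

/* Hypothetical helper: fill a 4096-byte table indexed by the upper
 * 12 bits of the 16-bit input word, i.e. ((uint16_t) sample) >> 4. */
void build_alaw_table(uint8_t table[4096])
{
    int i;

    for (i = 0; i < 4096; i++) {
        /* reconstruct a representative signed sample for this index */
        int s = (i < 2048) ? (i << 4) : (i << 4) - 65536;
        table[i] = linear16_to_alaw((int16_t) s);
    }
}
```

With such a table in place, a caller would encode each sample as table[((uint16_t) sample) >> 4]; the low 4 bits of the 16-bit word (the 3 padding bits plus the least-significant bit of the 13-bit value) never reach the encoder.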