view doc/PCM8-conversions @ 235:0ee1a66c1846
doc/PCM8-conversions: beginning of document
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Mon, 08 May 2023 00:45:26 +0000 |
What is the authoritatively correct, officially endorsed bidirectional mapping
between G.711 A-law and mu-law encodings on one side and 16-bit 2's complement
linear PCM on the other side?  Surprisingly, there is no official answer to this
problem anywhere in the specs!  Instead the specs provide the following partial
answers:

* The G.711 spec itself provides one mapping from A-law code octets to linear
  numeric values in range [-4032,4032] and another mapping from mu-law code
  octets to linear numeric values in range [-8031,8031].  The output from each
  of these mappings is given in "pure mathematical" form, without specifying any
  bit-level encoding, and furthermore, mu-law decoder output in its pure
  "conceptual" form has both +0 and -0 values.  (The same signed zero problem
  does not occur in A-law because it's a mid-riser code rather than mid-tread,
  and thus has no quantized values equal to 0.)

* If one takes the "pure mathematical" output from the spec-prescribed G.711
  decoder and represents it in 2's complement form, squashing +0 and -0 outputs
  from the canonical mu-law decoder into "plain 0" at this step, the result is
  a 13 bits wide 2's complement value for A-law decoding and a 14 bits wide
  2's complement value for mu-law.

* All GSM speech encoders take 13-bit 2's complement linear PCM samples as their
  input.  How should this 13-bit GSM codec input be derived from A-law or mu-law
  code octets?  GSM specs refer to ITU's G.726 spec for ADPCM - it just so
  happens that inside the ADPCM algorithm of G.726 (a totally unrelated codec of
  no relevance to GSM codec work outside of this reference) there is a pair of
  functions for expanding A-law and mu-law to linear PCM and compressing linear
  PCM back to A-law or mu-law.

* Following this obscure G.726 reference, we eventually conclude that in the
  case of A-law, GSM specs call for the obvious treatment: take the "natural"
  output from the canonical A-law decoder, represent it in 2's complement form,
  the result is 13 bits wide, and just feed that 13-bit 2's complement form to
  the input of GSM speech encoders.  However, in the case of mu-law the
  "natural" G.711 decoder output is one sign bit plus 13 bits of magnitude,
  requiring 14 bits in 2's complement representation - and none of the specs
  I could find says anything about exactly how this 14-bit input should be
  reduced to 13 bits for feeding to GSM speech encoders.  Canonical C
  implementations of all GSM speech encoders take their input in 16-bit words
  and clear the 3 least significant bits as their first step; if the 14-bit
  mu-law decoder output is represented in 16-bit words by padding 2 zero bits
  on the right and this output is then fed to GSM speech encoder functions,
  the end effect is that the least-significant bit of the 14-bit decoder output
  is simply cut off.  This form of mu-law-to-GSM transcoder implementation is
  consistent with TESTx-U.INP and TESTx-U.COD sequences provided in the
  GSM 06.54 package for EFR.

Based on the above considerations, we have our answer for how we should convert
from G.711 to 16-bit 2's complement linear PCM:

* For A-law, we emit the "natural" output in 13-bit 2's complement form and
  append 3 zero bits on the right; this transformation is fully lossless.

* For mu-law, we emit the "natural" output in 14-bit 2's complement form and
  append 2 zero bits on the right.  This transformation is almost lossless,
  with just one exception: the "pure" decoder's -0 output (resulting from PCMU
  octet 0x7F) is squashed to "plain 0", and will be re-emitted as PCMU octet
  0xFF rather than 0x7F on subsequent re-encoding to G.711 PCMU.
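
The two decoding rules just stated can be sketched in C as follows.  This is
only an illustration under stated assumptions, not the code of dev/a2s-regen.c
or dev/u2s-regen.c: the function names are hypothetical, and the segment
arithmetic is the commonly used closed form of the G.711 decoding tables,
written here in the 13-bit (A-law) and 14-bit (mu-law) domains before the
zero bits are appended on the right.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: A-law octet -> 16-bit linear PCM (13 bits + 3 zero bits). */
int16_t alaw_to_linear16(uint8_t octet)
{
    unsigned a = octet ^ 0x55;      /* undo the even-bit inversion */
    unsigned seg = (a >> 4) & 7;    /* segment (chord) number */
    unsigned mant = a & 0x0F;       /* position within the segment */
    int mag;                        /* magnitude in 13-bit domain, 1..4032 */

    if (seg == 0)
        mag = (int)((mant << 1) + 1);
    else
        mag = (int)(((mant << 1) + 33) << (seg - 1));
    /* after the XOR, a set sign bit means a positive sample */
    return (int16_t)(((a & 0x80) ? mag : -mag) * 8);
}

/* Hypothetical name: mu-law octet -> 16-bit linear PCM (14 bits + 2 zero bits). */
int16_t ulaw_to_linear16(uint8_t octet)
{
    unsigned u = ~(unsigned)octet & 0xFF;   /* mu-law octets are sent inverted */
    unsigned seg = (u >> 4) & 7;
    unsigned mant = u & 0x0F;
    /* magnitude in 14-bit domain, 0..8031 */
    int mag = (int)((((mant << 1) + 33) << seg) - 33);

    /* after inversion, a set sign bit means a negative sample;
       +0 and -0 both come out as plain 0 */
    return (int16_t)(((u & 0x80) ? -mag : mag) * 4);
}

/* Spot checks against the ranges quoted in the text. */
void g711_decode_spot_checks(void)
{
    assert(alaw_to_linear16(0xAA) == 4032 * 8);
    assert(alaw_to_linear16(0x2A) == -4032 * 8);
    assert(ulaw_to_linear16(0x80) == 8031 * 4);
    assert(ulaw_to_linear16(0x00) == -8031 * 4);
    assert(ulaw_to_linear16(0xFF) == 0 && ulaw_to_linear16(0x7F) == 0);
}
```

Multiplying by 8 or 4 rather than left-shifting keeps the arithmetic well
defined for negative values.  Running either function over all 256 octets
would produce a full decoding table of the kind described in this document.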

For anyone needing a G.711 to 16-bit linear PCM decoder, the present package
provides ready-made decoding tables (following the above rules) in
dev/a2s-regen.out and dev/u2s-regen.out, generated by dev/a2s-regen.c and
dev/u2s-regen.c programs.

Now for the opposite problem: what is the most correct way to compress 16-bit
2's complement linear PCM to A-law or mu-law?  In this direction the official
specs leave even more ambiguity than in the G.711 decoding direction:

* The G.711 spec itself says: "The conversion to A-law or mu-law values from
  uniform PCM values corresponding to the decision values, is left to the
  individual equipment specification."  The specific implementation used in the
  guts of G.726 ADPCM codec is referred to only as a non-normative example.

* GSM specs likewise refer to this G.726 section 4.2.8 (for compression of
  13-bit speech decoder output to G.711) with language that suggests
  a non-normative example.

After painstakingly comparing the C implementation of G.726 in the ITU-T G.191
STL against the language of G.726 spec itself and convincing myself that they
really do match, and then painstakingly comparing this approach against the one
implemented in the same G.191 STL for G.711 in alaw_compress() and
ulaw_compress() and against the table lookup method implemented in libgsm/toast
(my first reference, before I went down the rabbit hole of tracking down
official specs), I reached the following conclusions:

* For A-law encoding all 3 parties (G.191 STL alaw_compress() function, G.726
  "compress" block and toast_alaw.c) agree on the same mapping.
  In this mapping only the most significant 12 bits of the 2's complement input
  word (equivalent to one sign bit and 11 bits of magnitude) are relevant,
  leading to the following two interesting properties:

  - the least-significant bit of GSM speech decoder output is always discarded
    when converting to A-law;

  - conversion can be easily implemented with a 4096-byte look-up table based
    on the upper 12 bits of input, exactly as was done in toast_alaw.c in the
    venerable libgsm source.

* Mu-law encoding is the real hair-raiser: if the input to the to-be-implemented
  encoder has 14 or more bits (including the most practical problem of 16-bit
  2's complement input), there are no less than 3 different ways to implement
  this encoder!