view doc/PCM8-conversions @ 408:8847c1740e78

libtwamr: integrate VAD1
author Mychaela Falconia <falcon@freecalypso.org>
date Tue, 07 May 2024 00:56:10 +0000
parents e4a4bf11f37c

What is the authoritatively correct, officially endorsed bidirectional mapping
between G.711 A-law and mu-law encodings on one side and 16-bit 2's complement
linear PCM on the other side?  Surprisingly, there is no official answer to this
problem anywhere in the specs!  Instead the specs provide the following partial
answers:

* The G.711 spec itself provides one mapping from A-law code octets to linear
  numeric values in range [-4032,4032] and another mapping from mu-law code
  octets to linear numeric values in range [-8031,8031].  The output from each
  of these mappings is given in "pure mathematical" form, without specifying any
  bit-level encoding, and furthermore, mu-law decoder output in its pure
  "conceptual" form has both +0 and -0 values.  (The same signed zero problem
  does not occur in A-law because it's a mid-riser code rather than mid-tread,
  and thus has no quantized values equal to 0.)

* If one takes the "pure mathematical" output from the spec-prescribed G.711
  decoder and represents it in 2's complement form, squashing +0 and -0 outputs
  from the canonical mu-law decoder into "plain 0" at this step, the result is
  a 13 bits wide 2's complement value for A-law decoding and a 14 bits wide 2's
  complement value for mu-law.

* All GSM speech encoders take 13-bit 2's complement linear PCM samples as their
  input.  How should this 13-bit GSM codec input be derived from A-law or mu-law
  code octets?  GSM specs refer to ITU's G.726 spec for ADPCM - it just so
  happens that inside the ADPCM algorithm of G.726 (a totally unrelated codec of
  no relevance to GSM codec work outside of this reference) there is a pair of
  functions for expanding A-law and mu-law to linear PCM and compressing linear
  PCM back to A-law or mu-law.

* Following this obscure G.726 reference, we eventually conclude that in the
  case of A-law, GSM specs call for the obvious treatment: take the "natural"
  output from the canonical A-law decoder, represent it in 2's complement form
  (the result is 13 bits wide), and feed that 13-bit 2's complement form to
  the input of GSM speech encoders.  However, in the case of mu-law the
  "natural" G.711 decoder output is one sign bit plus 13 bits of magnitude,
  requiring 14 bits in 2's complement representation - and none of the specs I
  could find says anything about exactly how this 14-bit input should be reduced
  to 13 bits for feeding to GSM speech encoders.  Canonical C implementations
  of all GSM speech encoders take their input in 16-bit words and clear the 3
  least significant bits as their first step; if the 14-bit mu-law decoder
  output is represented in 16-bit words by padding 2 zero bits on the right and
  this output is then fed to GSM speech encoder functions, the end effect is
  that the least-significant bit of the 14-bit decoder output is simply cut off.
  This form of mu-law-to-GSM transcoder implementation is consistent with
  TESTx-U.INP and TESTx-U.COD sequences provided in the GSM 06.54 package for
  EFR.
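
The effect of the 3-bit masking on a 14-bit mu-law decoder output can be
checked with a few lines of C.  This is only a sketch: the function names are
invented for illustration, and the code assumes the usual 2's complement
behaviour of the & operator on negative values.

```c
#include <stdint.h>

/* 14-bit mu-law decoder output placed in a 16-bit word,
   i.e. padded with 2 zero bits on the right */
int16_t pad14_to_16(int v14)
{
    return (int16_t)(v14 * 4);
}

/* First step of the GSM speech encoders as described above:
   clear the 3 least significant bits of the 16-bit input word.
   Assumes 2's complement representation of negative values. */
int16_t gsm_input_mask(int16_t s)
{
    return (int16_t)(s & ~7);
}

/* The 13-bit value that the encoder effectively sees; the division
   is exact because the low 3 bits have just been cleared */
int effective_13bit(int v14)
{
    return gsm_input_mask(pad14_to_16(v14)) / 8;
}
```

With these definitions effective_13bit(3) and effective_13bit(2) both give 1,
and effective_13bit(-1) gives -1: the least-significant bit of the 14-bit value
is dropped, rounding toward negative infinity.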

Based on the above considerations, we have our answer for how we should convert
from G.711 to 16-bit 2's complement linear PCM:

* For A-law, we emit the "natural" output in 13-bit 2's complement form and
  append 3 zero bits on the right; this transformation is fully lossless.

* For mu-law, we emit the "natural" output in 14-bit 2's complement form and
  append 2 zero bits on the right.  This transformation is almost lossless,
  with just one exception: the "pure" decoder's -0 output (resulting from PCMU
  octet 0x7F) is squashed to "plain 0", and will be re-emitted as PCMU octet
  0xFF rather than 0x7F on subsequent re-encoding to G.711 PCMU.

For anyone needing a G.711 to 16-bit linear PCM decoder, the present package
provides ready-made decoding tables (following the above rules) in
dev/a2s-regen.out and dev/u2s-regen.out, generated by dev/a2s-regen.c and
dev/u2s-regen.c programs.
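
For readers who prefer arithmetic to tables, the decoding rules above can be
sketched directly in C.  This is a minimal illustration and not the code of
dev/a2s-regen.c or dev/u2s-regen.c; the function names are invented, and the
segment arithmetic is the commonly used formulation of G.711 expansion.

```c
#include <stdint.h>

/* A-law octet to 16-bit linear PCM: the 13-bit G.711 value
   with 3 zero bits appended on the right */
int16_t alaw_to_pcm16(uint8_t octet)
{
    unsigned ix = (octet ^ 0x55) & 0x7F;  /* undo even-bit inversion, drop sign */
    unsigned exp = ix >> 4;               /* segment number 0..7 */
    unsigned mant = ix & 0x0F;            /* position within the segment */
    int mag;

    if (exp > 0)
        mant += 16;                       /* implicit leading 1 in segments 1..7 */
    mag = (int)((mant << 4) + 8);         /* midpoint of the interval */
    if (exp > 1)
        mag <<= (exp - 1);
    return (int16_t)((octet & 0x80) ? mag : -mag);
}

/* mu-law octet to 16-bit linear PCM: the 14-bit G.711 value
   with 2 zero bits appended on the right; -0 (octet 0x7F)
   comes out as plain 0 */
int16_t ulaw_to_pcm16(uint8_t octet)
{
    unsigned inv = (uint8_t)~octet;       /* mu-law octets are stored inverted */
    unsigned exp = (inv >> 4) & 7;
    unsigned mant = inv & 0x0F;
    int mag = (int)((((mant << 3) + 0x84) << exp) - 0x84);

    return (int16_t)((octet & 0x80) ? mag : -mag);
}
```

The extreme outputs are +/-32256 (4032 * 8) for A-law and +/-32124 (8031 * 4)
for mu-law, matching the ranges given in the G.711 spec.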

Now for the opposite problem: what is the most correct way to compress 16-bit
2's complement linear PCM to A-law or mu-law?  In this direction the official
specs leave even more ambiguity than in the G.711 decoding direction:

* The G.711 spec itself says: "The conversion to A-law or mu-law values from
  uniform PCM values corresponding to the decision values, is left to the
  individual equipment specification."  The specific implementation used in the
  guts of G.726 ADPCM codec is referred to only as a non-normative example.

* GSM specs likewise refer to this G.726 section 4.2.8 (for compression of
  13-bit speech decoder output to G.711) with language that suggests a
  non-normative example.

After painstakingly comparing the C implementation of G.726 in the ITU-T G.191
STL against the language of G.726 spec itself and convincing myself that they
really do match, and then painstakingly comparing this approach against the one
implemented in the same G.191 STL for G.711 in alaw_compress() and
ulaw_compress() and against the table lookup method implemented in libgsm/toast
(my first reference, before I went down the rabbit hole of tracking down
official specs), I reached the following conclusions:

* For A-law encoding all 3 parties (G.191 STL alaw_compress() function, G.726
  "compress" block and toast_alaw.c) agree on the same mapping.  In this
  mapping only the most significant 12 bits of the 2's complement input word
  (equivalent to one sign bit and 11 bits of magnitude) are relevant, leading
  to the following two interesting properties:

  - the least-significant bit of GSM speech decoder output is always discarded
    when converting to A-law;

  - conversion can be easily implemented with a 4096-byte look-up table based
    on the upper 12 bits of input, exactly as was done in toast_alaw.c in the
    venerable libgsm source.

* Mu-law encoding is the real hair-raiser: if the input to the to-be-implemented
  encoder has 14 or more bits (including the most practical problem of 16-bit
  2's complement input), there are no fewer than 3 different ways to implement
  this encoder!
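
The A-law case is simple enough to sketch in full.  The following C fragment
builds a 4096-byte table indexed by the upper 12 bits of the 16-bit input and
then encodes by look-up.  The table name, the function names and the table
layout are assumptions made for this illustration (they are not taken from
toast_alaw.c or from dev/s2a-regen.c); the segment search follows the same
logic as alaw_compress() in G.191 STL as described above.

```c
#include <stdint.h>

static uint8_t s2a_table[4096];           /* hypothetical table name */

/* Encode a 12-bit 2's complement value (-2048..2047) to A-law */
static uint8_t alaw_from_12bit(int v12)
{
    /* for negative inputs, -v12 - 1 is the 1's complement magnitude */
    int ix = v12 < 0 ? -v12 - 1 : v12;
    int exp = 0;

    if (ix > 15) {                        /* segments 1..7 */
        exp = 1;
        while (ix > 31) {
            ix >>= 1;
            exp++;
        }
        ix -= 16;                         /* drop the implicit leading 1 */
    }
    ix |= exp << 4;
    if (v12 >= 0)
        ix |= 0x80;                       /* sign bit set means positive */
    return (uint8_t)(ix ^ 0x55);          /* even-bit inversion */
}

/* Fill the table; index is the upper 12 bits taken as unsigned */
void s2a_table_init(void)
{
    for (int idx = 0; idx < 4096; idx++)
        s2a_table[idx] = alaw_from_12bit(idx < 2048 ? idx : idx - 4096);
}

/* Encode one 16-bit sample; the low 4 bits never matter */
uint8_t pcm16_to_alaw(int16_t s)
{
    return s2a_table[(uint16_t)s >> 4];
}
```

After calling s2a_table_init(), inputs 16 and 24 (13-bit values 2 and 3) yield
the same octet, illustrating the discarded least-significant bit of GSM speech
decoder output.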

Let us now look at the 3 different ways of encoding a 14-bit or 16-bit 2's
complement linear PCM input to G.711 mu-law.  In this analysis we shall use
14-bit notation, with 2's complement inputs contained in the domain
[-8192,8191].  The differences between the 3 identified ways of mapping from
this domain to mu-law have to do with boundaries between quantization intervals.
Tables 2a and 2b in the G.711 spec list all defined quantization intervals and
all decision values that mark boundaries between them; here is a digested form
of the beginning of the canonical quantization table for either side of zero:

Quantization interval	Quantized value
(range of magnitudes)	(absolute)
---------------------------------------
0-1			0
1-3			2
3-5			4
...
29-31			30
31-35			33
35-39			37
...
91-95			93
95-103			99
...

This canonical quantization table is defined in terms of absolute values, and
is therefore fully symmetric around zero.  A careful look at the above table
raises a question: which quantization interval (and thus which PCMU octet
output) should be selected if the input value to the encoder has a magnitude
exactly equal to one of the threshold points, or decision values as they are
officially called?  In other words, what should the encoder do if the magnitude
of the 14-bit input value equals 1, 3, 5, ..., 31, 35, 39 etc?  The answer to
this question is where the 3 candidate mappings under our consideration differ:

* The "compress" function of G.726 operating in mu-law mode selects the higher
  (in absolute value) quantization interval at every decision value threshold,
  on both sides of zero: see Table 15/G.726.  PCMU octet 0x7F (meaning -0) will
  never be emitted by this version, and an input sequence of -3, -2, -1, 0, 1,
  2, 3 will map to quantized values -4, -2, -2, 0, 2, 2, 4.

* The ulaw_compress() function in G.191 STL behaves like the G.726 version for
  positive values, but selects the smaller-absolute-value quantization interval
  for negative inputs.  Given the same input sequence as above, the output will
  correspond to quantized values -2, -2, -0, 0, 2, 2, 4.  (Quantized value -0
  is PCMU octet 0x7F.)

* The s2u[] table in toast_ulaw.c in libgsm source is flat-out wrong and should
  not be used or considered further (and because those authors did not include
  the source for whatever program they used to generate their broken s2u[] and
  u2s[] tables, we have no way to really analyze them), but one CAN construct a
  new table for the same function, using the upper 13 bits of 16-bit 2's
  complement input to generate PCMU output - see our dev/s2u-regen.c program
  and its output table in dev/s2u-regen.out.  The resulting mapping is
  "mirrored" around zero compared to G.191 STL ulaw_compress(): for the same
  input sequence as in the previous two examples, the output will correspond to
  quantized values -4, -2, -2, 0, 0, 2, 2.  Just like the G.726 version, this
  look-up table version will never emit PCMU octet 0x7F for -0.
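
The three behaviours can be put side by side in a small C sketch working in
the same 14-bit domain.  The function names are invented for illustration, and
the code is written to reproduce the behaviour described in this article; it
is not copied from G.726, from G.191 STL or from dev/s2u-regen.c.

```c
#include <stdint.h>

/* Quantize a non-negative 14-bit-domain magnitude to the 7-bit
   segment+mantissa field.  A magnitude exactly equal to a decision
   value falls into the higher interval. */
static unsigned ulaw_code7(int mag)
{
    int biased = mag + 33;
    int seg = 0;

    if (biased > 0x1FFF)
        biased = 0x1FFF;                  /* clip to the top interval */
    while ((biased >> (seg + 6)) != 0)
        seg++;
    return (unsigned)((seg << 4) | ((biased >> (seg + 1)) & 0x0F));
}

/* G.726-style: higher interval at every threshold, both sides of zero */
uint8_t ulaw_enc_g726_style(int v14)
{
    if (v14 < 0)
        return (uint8_t)(0x7F ^ ulaw_code7(-v14));
    return (uint8_t)(0xFF ^ ulaw_code7(v14));
}

/* G.191 STL-style: as above for v14 >= 0, but negative inputs use
   magnitude - 1, which selects the lower interval at a threshold */
uint8_t ulaw_enc_stl_style(int v14)
{
    if (v14 < 0)
        return (uint8_t)(0x7F ^ ulaw_code7(-v14 - 1));
    return (uint8_t)(0xFF ^ ulaw_code7(v14));
}

/* 13-bit table style: drop the least-significant bit first (assumes
   2's complement for negative values); the remaining even value can
   never equal a decision value */
uint8_t ulaw_enc_table13_style(int v14)
{
    return ulaw_enc_g726_style(v14 & ~1);
}
```

For the input sequence -3, -2, -1, 0, 1, 2, 3 the three functions return
octets decoding to the three sequences of quantized values listed above, and
only the STL-style function ever returns 0x7F.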

It is important to note that all GSM speech decoders produce 2's complement
outputs that are only 13 bits wide, not 14.  Such an output, padded to 14 bits,
is always even, whereas every decision value in the table above is odd -
therefore, when the input to the G.711 encoder comes from the output of a GSM
speech decoder, the input never lands on a decision value, the difference
between all 3 alternatives listed above is masked, and all 3 produce identical
output.

For production software, our (Themyscira) recommendation is to use look-up
tables (dev/s2a-regen.out and dev/s2u-regen.out) for both A-law and mu-law
encoding, using the upper 12 bits from 16-bit 2's complement input for A-law
encoding and the upper 13 bits for mu-law encoding.  For mu-law encoding the
resulting mapping is different from what G.191 STL ulaw_compress() function
produces, and many will consider that function to be canon - but our approach
exhibits the same key properties, just mirrored around zero, and has the
advantage of needing only the upper 13 bits.
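
As an illustration of the recommended approach for mu-law, here is a minimal
C sketch that builds an 8192-byte table indexed by the upper 13 bits of the
input and encodes by look-up.  The table name, function names and index
layout are assumptions made for this example and may differ from what
dev/s2u-regen.c and dev/s2u-regen.out actually use.

```c
#include <stdint.h>

static uint8_t s2u_table[8192];           /* hypothetical table name */

/* Encode a 13-bit 2's complement value (-4096..4095) to mu-law */
static uint8_t ulaw_from_13bit(int v13)
{
    int mag = v13 < 0 ? -v13 : v13;
    int biased = 2 * mag + 33;            /* 14-bit magnitude plus bias */
    int seg = 0;
    unsigned code7;

    if (biased > 0x1FFF)
        biased = 0x1FFF;                  /* clip to the top interval */
    while ((biased >> (seg + 6)) != 0)
        seg++;
    code7 = (unsigned)((seg << 4) | ((biased >> (seg + 1)) & 0x0F));
    /* mu-law octets are stored inverted; sign bit set means positive */
    return (uint8_t)((v13 < 0 ? 0x7F : 0xFF) ^ code7);
}

/* Fill the table; index is the upper 13 bits taken as unsigned */
void s2u_table_init(void)
{
    for (int idx = 0; idx < 8192; idx++)
        s2u_table[idx] = ulaw_from_13bit(idx < 4096 ? idx : idx - 8192);
}

/* Encode one 16-bit sample; the low 3 bits never matter */
uint8_t pcm16_to_ulaw_table(int16_t s)
{
    return s2u_table[(uint16_t)s >> 3];
}
```

After s2u_table_init(), 16-bit inputs 0 and 4 (14-bit values 0 and 1) both
encode to 0xFF, while -4 (14-bit value -1) encodes to 0x7E, in line with the
"mirrored" behaviour described earlier.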

Command line utilities
======================

As usual, the present Themyscira GSM codec libraries & utilities package
provides command line utilities for working with the subject of this article:
conversions between 16-bit linear PCM (the format read and written by other
tools in the present suite) and 8-bit PCM in G.711 A-law or mu-law.  The
following utilities are provided:

pcm16-to-alaw	These two utilities read 16-bit linear PCM in raw format (BE
pcm16-to-ulaw	byte order by default, or LE with -l option) and convert the
		recording into one byte per sample G.711 format, with each
		program emitting its respective encoding law.  pcm16-to-alaw
		has only one mapping, but pcm16-to-ulaw supports two possible
		mappings: by default it applies the mapping of G.191 STL
		ulaw_compress(), or if you specify the -t option it applies the
		same mapping that would be produced by our recommended 13-bit
		look-up table method.

pcm8-to-pcm16	This utility reads a G.711 8-bit PCM recording (alaw or ulaw
		selected with a mandatory command line argument) from a "raw"
		G.711 file and converts it to 16-bit linear PCM.  The output
		byte order is BE by default, or can be changed to LE with an
		extra command line qualifier.