Code-excited linear prediction

Code-excited linear prediction (CELP) is a speech coding algorithm originally proposed by M. R. Schroeder and B. S. Atal in 1985. At the time, it provided significantly better quality than existing low bit-rate algorithms, such as residual-excited linear prediction and linear predictive coding vocoders (e.g., FS-1015). Along with its variants, such as algebraic CELP, relaxed CELP, low-delay CELP and vector sum excited linear prediction, it is currently the most widely used speech coding algorithm. It is also used in MPEG-4 Audio speech coding. CELP is commonly used as a generic term for a class of algorithms and not for a particular codec.

Introduction

The CELP algorithm is based on four main ideas:

Using the source-filter model of speech production through linear prediction (LP) (see the textbook "speech coding algorithm");
Using an adaptive and a fixed codebook as the input (excitation) of the LP model;
Performing a search in closed-loop in a “perceptually weighted domain”.
Applying vector quantization (VQ)

The original algorithm as simulated in 1983 by Schroeder and Atal required 150 seconds to encode 1 second of speech when run on a Cray-1 supercomputer. Since then, more efficient ways of implementing the codebooks and improvements in computing capabilities have made it possible to run the algorithm in embedded devices, such as mobile phones.

CELP decoder

Figure 1: CELP decoder

Before exploring the complex encoding process of CELP we introduce the decoder here. Figure 1 describes a generic CELP decoder. The excitation is produced by summing the contributions from an adaptive (aka pitch) codebook and a stochastic (aka innovation or fixed) codebook:

e[n]=e_{a}[n]+e_{f}[n]\,

where $e_{{a}}[n]$ is the adaptive (pitch) codebook contribution and $e_{{f}}[n]$ is the stochastic (innovation or fixed) codebook contribution. The fixed codebook is a vector quantization dictionary that is (implicitly or explicitly) hard-coded into the codec. This codebook can be algebraic (ACELP) or be stored explicitly (e.g. Speex). The entries in the adaptive codebook consist of delayed versions of the excitation. This makes it possible to efficiently code periodic signals, such as voiced sounds.

The filter that shapes the excitation has an all-pole model of the form $1/A(z)$ , where $A(z)$ is called the prediction filter and is obtained using linear prediction (Levinson–Durbin algorithm). An all-pole filter is used because it is a good representation of the human vocal tract and because it is easy to compute.

CELP encoder

The main principle behind CELP is called Analysis-by-Synthesis (AbS) and means that the encoding (analysis) is performed by perceptually optimizing the decoded (synthesis) signal in a closed loop. In theory, the best CELP stream would be produced by trying all possible bit combinations and selecting the one that produces the best-sounding decoded signal. This is obviously not possible in practice for two reasons: the required complexity is beyond any currently available hardware and the “best sounding” selection criterion implies a human listener.

In order to achieve real-time encoding using limited computing resources, the CELP search is broken down into smaller, more manageable, sequential searches using a simple perceptual weighting function. Typically, the encoding is performed in the following order:

Linear Prediction Coefficients (LPC) are computed and quantized, usually as LSPs
The adaptive (pitch) codebook is searched and its contribution removed
The fixed (innovation) codebook is searched

Noise weighting

Most (if not all) modern audio codecs attempt to shape the coding noise so that it appears mostly in the frequency regions where the ear cannot detect it. For example, the ear is more tolerant to noise in parts of the spectrum that are louder and vice versa. That's why instead of minimizing the simple quadratic error, CELP minimizes the error for the perceptually weighted domain. The weighting filter W(z) is typically derived from the LPC filter by the use of bandwidth expansion:

W(z)={\frac {A(z/\gamma _{1})}{A(z/\gamma _{2})}}

where $\gamma _{1}>\gamma _{2}$ .

External links

This article is based on a paper presented at Linux.Conf.Au
Some parts based on the Speex codec manual
reference implementations of CELP 1016A (CELP 3.2a) and LPC 10e.
Linear Predictive Coding (LPC)

Selected readings

References

B.S. Atal, "The History of Linear Prediction," IEEE Signal Processing Magazine, vol. 23, no. 2, March 2006, pp. 154–161.
M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): high-quality speech at very low bit rates," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 10, pp. 937–940, 1985.

Data compression methods

Lossless

Entropy type	Unary Arithmetic Asymmetric Numeral Systems Golomb Huffman Adaptive Canonical Modified Range Shannon Shannon–Fano Shannon–Fano–Elias Tunstall Universal Exp-Golomb Fibonacci Gamma Levenshtein

Dictionary type	Byte pair encoding DEFLATE Snappy Lempel–Ziv LZ77 / LZ78 (LZ1 / LZ2) LZJB LZMA LZO LZRW LZS LZSS LZW LZWL LZX LZ4 Brotli Statistical

Other types	BWT CTW Delta DMC MTF PAQ PPM RLE

Audio

Concepts	Bit rate average (ABR) constant (CBR) variable (VBR) Companding Convolution Dynamic range Latency Nyquist–Shannon theorem Sampling Sound quality Speech coding Sub-band coding

Codec parts	A-law μ-law ACELP ADPCM CELP DPCM Fourier transform LPC LAR LSP MDCT Psychoacoustic model WLPC

Image

Concepts	Chroma subsampling Coding tree unit Color space Compression artifact Image resolution Macroblock Pixel PSNR Quantization Standard test image

Methods	Chain code DCT EZW Fractal KLT LP RLE SPIHT Wavelet

Video

Concepts	Bit rate average (ABR) constant (CBR) variable (VBR) Display resolution Frame Frame rate Frame types Interlace Video characteristics Video quality

Codec parts	Lapped transform DCT Deblocking filter Motion compensation

Theory

Compression formats
Compression software (codecs)

Multimedia compression and container formats

Video
compression

ISO/IEC	MJPEG Motion JPEG 2000 MPEG-1 MPEG-2 Part 2 MPEG-4 Part 2/ASP Part 10/AVC MPEG-H Part 2/HEVC

ITU-T	H.120 H.261 H.262 H.263 H.264 H.265

SMPTE	VC-1 VC-2 VC-3 VC-5

Others	Apple Video AV1 AVS Bink Cinepak Daala Dirac DV DVI FFV1 Huffyuv Indeo Lagarith Microsoft Video 1 MSU Lossless OMS Video Pixlet ProRes 422 ProRes 4444 QuickTime Animation Graphics RealVideo RTVideo SheerVideo Smacker Sorenson Video, Spark Theora Thor VP3 VP6 VP7 VP8 VP9 WMV XEB YULS

Audio
compression

ISO/IEC	MPEG-1 Layer III (MP3) MPEG-1 Layer II Multichannel MPEG-1 Layer I AAC HE-AAC AAC-LD MPEG Surround MPEG-4 ALS MPEG-4 SLS MPEG-4 DST MPEG-4 HVXC MPEG-4 CELP MPEG-D USAC MPEG-H 3D Audio

ITU-T	G.711 (A-law, µ-law) G.718 G.719 G.722 G.722.1 G.722.2 G.723 G.723.1 G.726 G.728 G.729 G.729.1

IETF	Opus iLBC

3GPP	AMR AMR-WB AMR-WB+ EVRC EVRC-B GSM-HR GSM-FR GSM-EFR

Others	ACELP AC-3 ALAC Asao ATRAC CELT Codec2 DRA DTS FLAC iSAC Monkey's Audio TTA True Audio MT9 Musepack OptimFROG OSQ QCELP RCELP RealAudio RTAudio SD2 SHN SILK Siren SMV Speex SVOPC TwinVQ VMR-WB Vorbis VSELP WavPack WMA MQA aptX

Image
compression

IEC, ISO, ITU-T, W3C, IETF	CCITT Group 4 GIF HEVC JBIG JBIG2 JPEG JPEG 2000 JPEG XR Lossless JPEG PNG TIFF TIFF/EP TIFF/IT

Others	APNG BPG DjVu EXR FLIF ICER MNG PGF QTVR WBMP WebP

Containers

ISO/IEC	MPEG-ES MPEG-PES MPEG-PS MPEG-TS ISO base media file format MPEG-4 Part 14 (MP4) Motion JPEG 2000 MPEG-21 Part 9 MPEG media transport

ITU-T	H.222.0 T.802

IETF	RTP

Others	3GP and 3G2 AMV ASF AIFF AVI AU BPG Bink Smacker BMP DivX Media Format EVO Flash Video GXF IFF M2TS Matroska WebM MXF Ogg QuickTime File Format RatDVD RealMedia RIFF WAV MOD and TOD VOB, IFO and BUP

Collaborations

See Compression methods for methods and Compression software for codecs

This article is issued from Wikipedia - version of the 11/30/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.