Analysis of Iteration Control for Turbo Decoders in Turbo Synchronization Applications

Wireless data transmission results in frequency and phase offsets of the signal at the receiver. In addition, the received symbols are corrupted by noise. Therefore, synchronization and channel coding are vital parts of every receiver in digital communication systems. By combining phase and frequency synchronization with an advanced iterative channel decoder (inner loop), e.g. a turbo decoder, in an iterative way (outer loop), the communications performance can be further increased. This principle is referred to as turbo synchronization. The energy consumption and the peak throughput of the system depend on the number of iterations of both loops. An advanced iteration control can decrease the mean number of iterations needed by detecting correctly decoded blocks. This leads to a considerable energy saving or to an increase in throughput. In this paper we present a new stopping criterion for decodable blocks for turbo decoding in conjunction with turbo synchronization. Furthermore, the implementation complexity of the turbo decoder is shown for a Xilinx FPGA.


Introduction
Synchronization and channel decoding are essential parts of every digital receiver for wireless communication. The task of synchronization is to present data to the channel decoder from which the negative influences of timing, frequency and phase offsets have been eliminated. Turbo codes (Berrou et al., 1993) are a well-known advanced channel coding scheme. Turbo decoding is an iterative algorithm based on the maximum a posteriori principle. In each iteration the turbo decoder produces extrinsic information on the systematic bits, which is used in the next iteration. Communications performance, throughput and power consumption of a turbo decoder depend on the number of iterations. It is therefore desirable to find criteria for iteration control that have no negative impact on the communications performance.
Correspondence to: T. Lehnigk-Emden (lehnigk@eit.uni-kl.de)
Synchronization is typically performed only once, before channel decoding. Joint iterative turbo decoding and synchronization is called turbo synchronization (Godtmann et al., 2006). Turbo synchronization allows the number of pilot symbols used for synchronization to be reduced and thus increases the user data rate for a given bandwidth. In turbo synchronization the turbo decoder must additionally calculate new a posteriori information on the parity bits for the next synchronization refinement. Since this information on the parity bits must be calculated in any case for turbo synchronization, it can also be used for iteration control. State-of-the-art iteration control methods for turbo decoders cannot check for a valid codeword without additional redundancy bits. Therefore they may also stop the decoder on an invalid codeword. Our novel iteration control checks whether the result of the current turbo decoding iteration represents a valid codeword, without requiring extra redundancy bits.
In this paper we apply turbo synchronization to the iterative synchronization of phase and frequency offset as well as to iteration control. In Sect. 2 we summarize turbo encoding and the principle of turbo decoding. In Sect. 3 we explain the iterative synchronization and its interrelation with the decoding algorithm. In Sect. 4 we categorize the state-of-the-art methods for iteration control and present our new iteration control method. In Sect. 5 we analyse the communications performance of turbo synchronization. We thoroughly compare our advanced iteration control with the state of the art, taking both communications performance and hardware performance into account. Communications performance is determined with bit-true models. Implementation complexities and results are given for Xilinx devices.

Turbo codes
With the introduction of binary turbo codes by Berrou in 1993 (Berrou et al., 1993), near-optimum error correction became possible. Owing to their error correction capabilities, binary and duo-binary turbo codes achieve low frame error rates (FER) at a low signal-to-noise ratio (SNR), outperforming the widely used convolutional codes. Because of this advantage, turbo codes are now part of a large number of communication standards.
Turbo codes in general consist of a serial or parallel concatenation of two codes, the so-called component codes, and an interleaver. While the first component code encodes the information in its original order, the second one receives the information in permuted order, see Fig. 1a. In all standards convolutional codes are used as component codes.
Decoding of turbo codes is an iterative process in which probabilistic information is exchanged between the component decoders (Berrou, 2003). Iterative decoding poses a big challenge for low-latency, high-throughput decoders.
A possible realization of a turbo decoder is shown in Fig. 1b. The two component decoders, which decode the two component codes, are connected via an interleaver and a deinterleaver. They use the log-likelihood ratios (LLR) of the systematic and parity information to compute the extrinsic information $\Lambda^{e1}_{d_k}$ and $\Lambda^{e2}_{d_k}$ on the information bits. The iterative exchange of $\Lambda^{e1}_{d_k}$ and $\Lambda^{e2}_{d_k}$ between the component decoders is referred to as the turbo principle. One (full) iteration is complete when decoder one and decoder two have each run once. If only one decoder has calculated new information, we call this one half iteration.
Both component decoders perform maximum a posteriori probability (MAP) decoding. However, the suboptimal Max-Log MAP algorithm with an extrinsic scaling factor (ESF) is more suitable for implementation. Compared with the optimal algorithm, the Max-Log MAP algorithm results in a performance loss below 0.2 dB (Robertson et al., 1995, 1997). Moreover, it was shown in Worm et al. (2000) that, when employed in turbo decoding, the Max-Log MAP algorithm does not require knowledge of the SNR.
The Max-Log MAP algorithm consists of a forward and a backward recursion. For each information or parity bit $d_k$ it computes the a posteriori probability (APP) LLRs $\Lambda^{s}$, $\Lambda^{p1}$ and $\Lambda^{p2}$.
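The difference between the optimal MAP recursion and its Max-Log approximation can be illustrated with a minimal sketch. It assumes the usual log-domain formulation, in which MAP combines path metrics with the Jacobian logarithm and Max-Log MAP drops the correction term. The function names and the ESF value of 0.75 are illustrative assumptions, not taken from the decoder described here.

```python
import math


def max_star(a: float, b: float) -> float:
    """Exact Jacobian logarithm log(exp(a) + exp(b)), as used by Log-MAP."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a: float, b: float) -> float:
    """Max-Log approximation: the correction term is dropped."""
    return max(a, b)


def scale_extrinsic(extrinsic, esf=0.75):
    """Scale extrinsic LLRs by an extrinsic scaling factor (ESF).

    The default 0.75 is an assumed, commonly quoted value; the ESF
    compensates for the over-optimistic Max-Log extrinsic values.
    """
    return [esf * e for e in extrinsic]


if __name__ == "__main__":
    # The approximation error is largest for equal metrics (log 2)
    # and vanishes when the metrics are far apart.
    for a, b in [(0.0, 0.0), (1.0, 3.0), (-5.0, 6.0)]:
        print(a, b, max_star(a, b) - max_log(a, b))
```

Because the Max-Log form involves only additions and comparisons, a common scaling of all metrics does not change the decisions, which is consistent with the SNR independence noted above.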

Turbo synchronization
Synchronization consists of estimating the unknown timing, frequency and phase offsets and eliminating the negative influences they introduce. We focus on frequency and phase synchronization of bursts with linear modulation (e.g. QPSK, 16-QAM) in conjunction with turbo decoding. We assume that gain control, timing synchronization and burst detection have already been carried out properly. The received sample sequence r is given in the complex baseband by Eq. (1). It has L elements, is based on modulation symbols s(l) with one sample per symbol and symbol duration T, and is disturbed by a noise sequence n. The frequency offset $f_o$ and the phase offset have to be estimated and corrected. Both are considered fixed during an estimation interval.
The synchronization is done in two main steps. First, a coarse synchronization is carried out. Afterwards, fine synchronization is performed iteratively, additionally using the tentative decoder decisions after each decoder iteration. The method used for coarse synchronization depends on the communications system at hand. The choice is influenced by the maximum frequency offset, the number of symbols per burst, and the number and placement of known symbols. We concentrate on the fine synchronization step, which is the one used for turbo synchronization.
Under the assumption that all symbols of the burst are known, the effect of the modulation by s(l) can be removed, which yields the unmodulated carrier of Eq. (2). Many algorithms exist to estimate the frequency and phase offset of such an unmodulated carrier (Meyr et al., 1998; Mengali and D'Andrea, 1997). Usually, however, the symbols of the burst are unknown, or only a few symbols that support burst detection or coarse synchronization are known. We therefore replace the transmitted symbol sequence s by an estimated symbol sequence $s_e$, which is provided by the turbo decoder. Furthermore, fine synchronization runs in parallel with the decoder iterations to avoid throughput degradation; thus a fast and simple algorithm is required. The fine estimate of frequency and phase offset is based on the average phase of the front part and of the rear part of the burst, after modulation removal with the estimated symbol sequence $s_e$, see Eq. (3). From the two phase values of Eq. (3) the estimate of the frequency offset is calculated, Eq. (4). The estimate of the phase offset is then calculated with the help of Eq. (4).
The first decoder iteration is based on the LLR values $\lambda^{s}_{d_k}$, $\lambda^{p1}_{d_k}$ and $\lambda^{p2}_{d_k}$ calculated from the symbols of the coarsely synchronized received sequence. For the iterative fine synchronization, an estimate of the transmitted symbols is used, which the turbo decoder produces after each iteration. This estimate is derived from the APP LLRs of the decoder. By default a turbo decoder computes APP LLR values $\Lambda^{s}$ for the systematic bits only. In turbo synchronization applications the decoder must additionally calculate the APP LLR values $\Lambda^{p1}$ and $\Lambda^{p2}$ for the parity bits.
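One possible form of the signal model and of the fine estimator described above is sketched below. It follows from the stated definitions (one sample per symbol, symbol duration T, front and rear half of the burst), but it is an assumption, not necessarily the exact form of Eqs. (1) to (5): the phase-offset symbol $\varphi_o$, the intermediate names $z$, $\varphi_1$, $\varphi_2$, the split at $L/2$ and the reference index $l = 0$ for the phase estimate are chosen here for illustration.

```latex
\begin{align}
  % received burst in complex baseband, l = 0, ..., L-1
  r(l) &= s(l)\, e^{\,j(2\pi f_o l T + \varphi_o)} + n(l) \\
  % modulation removal with known symbols; n'(l) is the rotated, scaled noise
  z(l) &= \frac{r(l)\, s^{*}(l)}{|s(l)|^{2}}
        = e^{\,j(2\pi f_o l T + \varphi_o)} + n'(l) \\
  % average phase of the front and rear half, using the estimated symbols s_e
  \varphi_1 &= \arg \sum_{l=0}^{L/2-1} r(l)\, s_e^{*}(l), \qquad
  \varphi_2 = \arg \sum_{l=L/2}^{L-1} r(l)\, s_e^{*}(l) \\
  % the centres of the two halves are L/2 symbols apart
  \hat{f}_o &= \frac{\varphi_2 - \varphi_1}{2\pi\, (L/2)\, T} \\
  % refer the front-half phase back from its centre (L/2 - 1)/2 to l = 0
  \hat{\varphi}_o &= \varphi_1 - 2\pi \hat{f}_o T\, \frac{L/2 - 1}{2}
\end{align}
```

In this form the difference $\varphi_2 - \varphi_1$ has to be taken modulo $2\pi$ into $(-\pi, \pi]$, so the residual frequency offset after coarse synchronization must satisfy $|f_o| < 1/(LT)$ for the estimate to be unambiguous.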
To reduce the effect of erroneous reference symbols on the fine synchronization, soft values are used for the reference symbols. The quadrature components of the estimated symbols are calculated by a tanh operation on the APP LLRs. From the tentative soft values of the systematic and parity bits the sequence $s_e$ is generated, and the fine synchronization described above can be carried out.
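The tanh operation can be sketched for QPSK as follows. The sketch assumes an LLR defined as ln(P(+1)/P(-1)) for antipodal bits, so that the expected bit value is tanh(LLR/2), and a mapping in which consecutive LLRs carry the in-phase and quadrature bit of one symbol. The function names and this bit-to-symbol mapping are illustrative assumptions.

```python
import math


def soft_bit(llr: float) -> float:
    """Expected value of an antipodal bit (+1/-1) given its APP LLR.

    Assumes llr = ln(P(bit = +1) / P(bit = -1)).
    """
    return math.tanh(llr / 2.0)


def soft_qpsk_symbols(app_llrs):
    """Map pairs of APP LLRs (I bit, Q bit) to soft QPSK reference symbols.

    Reliable bits give symbols near the unit circle; unreliable bits
    shrink the symbol towards zero, so it contributes less to the
    phase and frequency estimate.
    """
    if len(app_llrs) % 2:
        raise ValueError("QPSK needs an even number of LLRs")
    scale = 1.0 / math.sqrt(2.0)
    return [
        scale * complex(soft_bit(app_llrs[i]), soft_bit(app_llrs[i + 1]))
        for i in range(0, len(app_llrs), 2)
    ]


if __name__ == "__main__":
    # One confident symbol and one uncertain symbol.
    print(soft_qpsk_symbols([20.0, -20.0, 0.3, -0.1]))
```

For 16-QAM the same idea applies per quadrature component, with two bits contributing to each amplitude level.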
The received sequence r is corrected with the new estimates of frequency and phase offset. A new synchronized received sequence $\bar{r}$ is calculated after each full decoder iteration. Turbo decoding and fine synchronization run in parallel in our architecture. For the n-th decoder iteration, the LLR values $\bar{\lambda}^{s}$, $\bar{\lambda}^{p1}$ and $\bar{\lambda}^{p2}$ of the transmitted bits are calculated on the basis of the fine-synchronized sequence, which in turn used the APP LLR values of the (n-2)-th iteration. A discussion of how the update schedule of the LLR values affects performance can be found in Alles et al. (2007).
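The fine estimation and the subsequent correction of the received sequence can be sketched as follows. This is a minimal illustration of the front-half/rear-half average-phase idea, not the implemented estimator: the function names, the equal split of the burst and the reference of the phase estimate to sample index 0 are assumptions.

```python
import cmath
import math


def estimate_offsets(r, s_e, T):
    """Estimate frequency and phase offset from the average phase of the
    front and rear half of a burst (modulation removed with s_e)."""
    L = len(r)
    if L != len(s_e) or L < 2 or L % 2:
        raise ValueError("need two equally long sequences of even length")
    half = L // 2
    front = sum(r[l] * s_e[l].conjugate() for l in range(half))
    rear = sum(r[l] * s_e[l].conjugate() for l in range(half, L))
    phi1 = cmath.phase(front)
    # The phase of rear * conj(front) is the wrapped difference phi2 - phi1.
    dphi = cmath.phase(rear * front.conjugate())
    # The centres of the two halves are `half` symbols apart.
    f_hat = dphi / (2.0 * math.pi * half * T)
    # Refer the front-half phase back from its centre to sample index 0.
    phi_hat = phi1 - 2.0 * math.pi * f_hat * T * (half - 1) / 2.0
    return f_hat, phi_hat


def correct_sequence(r, f_hat, phi_hat, T):
    """De-rotate the received sequence with the current estimates."""
    return [
        x * cmath.exp(-1j * (2.0 * math.pi * f_hat * l * T + phi_hat))
        for l, x in enumerate(r)
    ]
```

Both functions need only one pass over the burst, which fits the requirement that fine synchronization run in parallel with the decoder without limiting throughput.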

Iteration control
Latency and energy consumption strongly depend on the number of iterations in the decoding process. Reducing the number of iterations is therefore an important issue. Instead of using a fixed number of iterations, the number of iterations should be controlled by an intelligent iteration control mechanism. An efficient iteration control detects correctly decoded blocks while decoding is still in progress. A good iteration control is thus key to optimizing energy consumption and reducing the mean latency. In Thul et al. (2002) it was shown that iteration control is the most efficient technique for saving energy in a turbo decoder system without sacrificing communications performance.

State-of-the-art
Cyclic Redundancy Check (CRC) rule. A separate error-detection code, such as a CRC, can be concatenated as an outer code with an inner turbo code in order to flag erroneously decoded sequences. The decoding process is stopped after iteration i whenever the syndrome of the CRC is zero. The false alarm rate, i.e. the probability of accepting an erroneous block as correct, is close to zero. One disadvantage of the CRC is the additional code rate loss. However, in many systems, such as UMTS LTE, a CRC is already available.
Hard decision rules. The idea behind the algorithms presented in Wu et al. (2000) and Shao et al. (1999) is to track the changes in the hard decisions of the extrinsic information, the APP information or the decoded bits over one or two iterations. The disadvantage of these rules is that they cannot decide whether the decoded word is a valid codeword.
Soft decision rules. Other approaches observe the soft LLR or APP values in the MAP decoders. From these values several metrics can be calculated, e.g. the mean LLR value. For more details the reader is referred to Wu et al. (2000), Hagenauer et al. (1996) and Matache et al. (2000). All soft decision rules have in common that thresholds have to be defined for aborting the decoding process. In addition, there is no guarantee that the decision is correct. The choice of the thresholds is always a trade-off between the false alarm rate and the detection rate of correctly decoded blocks.
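The hard and soft decision rules above can be illustrated with two strongly simplified sketches. They are not the exact algorithms of the cited papers: the function names, the sign convention (negative LLR means bit 1) and the threshold parameter are assumptions made for illustration.

```python
def hard_decisions(llrs):
    """Hard decisions on LLRs; assumes a negative LLR means bit 1."""
    return [1 if x < 0 else 0 for x in llrs]


def hard_decision_stop(prev_llrs, curr_llrs):
    """Simplified hard decision rule: stop when the hard decisions did
    not change between two successive iterations."""
    return hard_decisions(prev_llrs) == hard_decisions(curr_llrs)


def mean_llr_stop(llrs, threshold):
    """Simplified soft decision rule: stop when the mean LLR magnitude
    exceeds a threshold that must be tuned for the code and the SNR."""
    return sum(abs(x) for x in llrs) / len(llrs) > threshold


if __name__ == "__main__":
    prev = [3.1, -2.4, 0.9, -5.0]
    curr = [4.0, -3.3, 1.5, -6.2]
    # Both rules fire, but neither one proves that the decisions
    # form a valid codeword.
    print(hard_decision_stop(prev, curr), mean_llr_stop(curr, 3.0))
```

Both rules observe only the stability or reliability of the decisions, which is why they can also stop the decoder on a stable but invalid word.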

Re-encoding approach
Our approach is to check whether the currently decoded information represents a valid codeword. One way to realize this is to re-encode the hard-decided information bits and to compare the parity bits from the decoder with those from the encoder. If a valid codeword is detected, the decoder can be stopped.
The decoder calculates LLR values of the information and parity bits. The hard-decided information bits are passed to a turbo encoder, identical to the encoder at the transmitter side, for re-encoding. The encoder produces new parity bits, which are compared with the hard-decided parity bits calculated by the two MAP decoders. If there is no difference, the decoder has found a valid codeword. Further iterations are unnecessary and the decoder can be stopped immediately, see Fig. 2.
In an implementation the check can be done in parallel with the MAP decoding, so that the throughput of the MAP decoder is not degraded. While the MAP decoder is processing, the encoder encodes the systematic information and compares the result with the parity bits of the current MAP decoder. A valid codeword is detected if the systematic bits do not change within two successive half iterations and if the two sets of parity bits do not differ. In this case the decoder and synchronization loops can be stopped, and the system is ready for a new codeword.
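The re-encoding check can be sketched in software as follows. The component code (a memory-2 recursive systematic convolutional code with octal generators 5/7, without trellis termination), the interleaver passed as a permutation list, the sign convention of the LLRs and all function names are assumptions for illustration; the actual system uses the component codes and interleaver of its own turbo code.

```python
def rsc_parity(bits):
    """Parity sequence of an illustrative recursive systematic
    convolutional code: feedback 1 + D + D^2, feedforward 1 + D^2."""
    s1 = s2 = 0
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2          # feedback bit entering the register
        parity.append(a ^ s2)    # feedforward taps 1 and D^2
        s1, s2 = a, s1
    return parity


def hard(llrs):
    """Hard decisions on LLRs; assumes a negative LLR means bit 1."""
    return [1 if x < 0 else 0 for x in llrs]


def is_valid_codeword(llr_s, llr_p1, llr_p2, interleaver):
    """Re-encode the hard-decided information bits and compare the
    resulting parity bits with the hard-decided decoder parity bits."""
    info = hard(llr_s)
    p1 = rsc_parity(info)
    p2 = rsc_parity([info[i] for i in interleaver])
    return p1 == hard(llr_p1) and p2 == hard(llr_p2)


def should_stop(prev_info, llr_s, llr_p1, llr_p2, interleaver):
    """Stop if the systematic bits are unchanged since the previous
    half iteration and both parity sets match the re-encoded ones."""
    return prev_info == hard(llr_s) and is_valid_codeword(
        llr_s, llr_p1, llr_p2, interleaver
    )
```

Since the encoder consists only of a few XOR gates and registers, a check of this kind is cheap in hardware compared with the MAP decoders.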

Results
The new approach to iteration control is validated by software simulations with respect to communications performance. In addition, the implementation complexity of turbo synchronization with iteration control on a Xilinx FPGA is presented and compared with the solution without iteration control.

Communications performance
The new iteration control detects correctly decoded blocks and stops the decoder. This approach has no impact on the communications performance, as shown by the simulations in Fig. 3. The figure shows the results for block lengths of 1056 and 1136 information bits at code rates of 0.443 and 0.8, respectively. 16-QAM with Gray mapping is used for the lower-rate code and QPSK for the higher-rate code. Down to a frame error rate (FER) of $10^{-3}$ the iteration control has no effect on the communications performance; the curves are congruent. In addition, turbo synchronization gains up to 0.25 dB compared with initial synchronization only. An optimal stopping criterion for decodable blocks, which we call "magic-genie", stops the decoder as soon as the decoded information bits are correct. A comparison of the number of iterations needed by the "magic-genie" criterion and by the re-encoding approach is shown in Fig. 4 for the code with rate 0.443 and in Fig. 5 for the code with rate 0.8.
The re-encoding approach is close to the magic-genie criterion. The maximum difference is one half iteration. At low FER the criterion can save up to 80% of the iterations. This means that the energy consumption of the whole system can be decreased considerably or the peak throughput can be increased. In addition, 50% of the iterations can be saved at a frame error rate below $10^{-1}$ for the high-rate code. The turbo decoder with turbo synchronization operates at a lower SNR for the same FER than the decoder without turbo synchronization, see Fig. 3. It therefore needs up to two more half iterations on average. Moreover, the new criterion can be used as a quality indicator for the channel. Statistics collected over several blocks can serve as a rough channel estimate.

Hardware implementation
The implementation complexity is analysed on a Xilinx Virtex-5 FPGA. Three versions have been implemented. The first is the turbo decoder without the extra parity LLR calculation unit that is needed for turbo synchronization. The second includes the parity LLR calculation unit. The third extends the second with the new iteration control. The architecture of the turbo decoder is a state-of-the-art SMAP architecture with three recursion units running in parallel. The key parameters are summarized in Table 1.
For more details the reader is referred to May et al. (2007).
Table 2 compares the implementation complexity of the three versions. The extra parity calculation and the encoder for the iteration control require 57 additional slices, i.e. 4%. The hardware overhead is therefore negligible in this case.

Conclusions
In this paper we presented a new iteration control for turbo decoders in combination with turbo synchronization. With turbo synchronization a communications performance gain of up to 0.25 dB can be achieved for both the low-rate and the high-rate code. The stopping criterion is based on a re-encoding approach that detects a valid codeword during the decoding process. Stopping the decoding loop with this criterion saves up to 80% of the iterations, and the corresponding energy, without any loss in communications performance. The hardware overhead of the stopping criterion is below 5% of the whole turbo decoder.

Table 1. Key parameters of the turbo decoder.