Advances in Radio Science
Forward and backward RLS-DDCE processing in MIMO-OFDM spatial multiplexing receivers

In this paper we present a novel approach to frequency-domain channel estimation. Our proposal is based on the recursive least squares (RLS) algorithm combined with a decision process, known as decision-directed channel estimation (RLS-DDCE). The novelty and key concept of this technique is block-wise causal and anti-causal RLS processing, which yields two independent RLS processing runs along with the associated decisions. Because a low-density parity-check (LDPC) code is implemented, the receiver operates with soft information, which enables us to introduce a new modification of the Turbo principle as well as a simple addition of the a-posteriori log-likelihood ratios (LLRs). Although both approaches increase the computational complexity, the latter is less complex than the former. Simulation results show that these implementations outperform the simple RLS-DDCE algorithm, yielding lower bit error rates (BER) and more accurate channel information.


Introduction
A widespread modulation technique in today's communication systems is orthogonal frequency-division multiplexing (OFDM), which combines high spectral efficiency, robustness against inter-symbol interference and easy implementation using the fast Fourier transform (FFT). Combining OFDM with a multiple-input multiple-output (MIMO) system yields a MIMO-OFDM system, which offers higher spectral efficiency and link reliability (Bölcskei et al., 2002; Uysal et al., 2001).
Although the development of OFDM is already very advanced, considerable research potential remains in MIMO-OFDM. Today, MIMO systems can be found in wireless local area network access points, WiMAX and some 3GPP specifications.
Correspondence to: P. Beinschob (patric.beinschob@hsu-hamburg.de)
Especially under bad transmission conditions with small signal-to-noise ratios (SNR) or high mobility, there is considerable scope for improving the performance of MIMO-OFDM systems. High mobility involves a highly time-variant channel, which causes the spectral efficiency to decrease. One way to counteract this degradation is to obtain more accurate channel state information (CSI).
An especially important application area is vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication, where high relative velocities create a time-variant channel combined with low SNR due to communication channels obstructed by obstacles.
Receiver designs for MIMO-OFDM which make acceptable use of the available diversity are rare. Only a few studies focus on iterative receiver architectures (Akhtman and Hanzo, 2007b; Zhang et al., 2006; Liu et al., 2003; Sun et al., 2004), which exploit the Turbo principle with its iterative decoding structure. Even though they incur higher computational complexity, these receivers seem promising with regard to BER performance.
Since LDPC codes offer performance similar to Turbo codes, they are implemented in MIMO-OFDM systems as well (Lu et al., 2004). The combination of the Turbo principle and LDPC codes is also under research (Salari et al., 2007).
In this paper we propose a novel Turbo processing based on LDPC codes, where the information gain is obtained from a causal RLS-DDCE processing and an independent anti-causal processing. Two possible strategies are presented: simple information combining by summing the a-posteriori information, and Turbo processing by exchanging extrinsic information between the forward RLS-DDCE process and the backward branch.
The rest of the paper is organized as follows. The underlying system model and structure are presented in Sect. 2, followed by the description of the RLS-DDCE algorithm in Sect. 3. Our novel approach is presented in Sect. 4, with detailed information about the modified Turbo principle and the summation of a-posteriori LLRs in Sects. 4.1 and 4.2, respectively. Simulation results are presented in Sect. 5, followed by a conclusion in Sect. 6.
Published by Copernicus Publications on behalf of the URSI Landesausschuss in der Bundesrepublik Deutschland e.V.

System model and structure
The vector of received values $\mathbf{r}$ at time sample $m$ of a MIMO system is the superposition of the $L \cdot n_T$ previously sent samples and the current $n_T$ samples, where $L+1$ is the length of the sampled channel impulse response and $n_T$ is the number of transmit antennas. It is given by

$$\mathbf{r}[m] = \sum_{l=0}^{L} \mathbf{h}[l,m]\,\mathbf{s}[m-l] + \mathbf{w}[m], \qquad (1)$$

where $\mathbf{s}[m]$ denotes the current vector of symbols, one per transmit antenna, $\mathbf{w}$ is an independent, identically distributed (iid) additive white Gaussian noise term and $\mathbf{h}[l,m]$ is the MIMO channel matrix in the delay and time domain, indexed by $l$ and $m$, respectively. The previously sent samples are denoted by $\mathbf{s}[m-l]$ for $0 < l \le L$. For the simulations, the data symbols of the $K$ subcarriers are modulated by an inverse fast Fourier transform (IFFT). Every value of the resulting vectors that corresponds to a transmit antenna is then transmitted according to Eq. (1).
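The time-domain model described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name, the array layout and the assumption that samples before the start of the sequence are zero are ours, not part of the simulation chain used in the paper.

```python
import numpy as np

def mimo_channel_output(h, s, noise_std=0.0, rng=None):
    """Received vectors r[m] = sum_l h[l, m] @ s[m - l] + w[m].

    h: array of shape (L+1, M, n_R, n_T), time-variant channel taps h[l, m]
    s: array of shape (M, n_T), transmitted sample vectors
    """
    rng = np.random.default_rng() if rng is None else rng
    n_taps, n_samples, n_rx, _ = h.shape
    r = np.zeros((n_samples, n_rx), dtype=complex)
    for m in range(n_samples):
        for l in range(n_taps):
            if m - l >= 0:  # assumption: nothing was sent before m = 0
                r[m] += h[l, m] @ s[m - l]
    if noise_std > 0:
        # complex white Gaussian noise with total standard deviation noise_std
        w = rng.normal(size=r.shape) + 1j * rng.normal(size=r.shape)
        r += noise_std / np.sqrt(2) * w
    return r
```

With a two-tap channel whose taps are scaled identity matrices, a single transmitted vector appears at the receiver once directly and once as a delayed, attenuated echo.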
In the frequency domain, the system model of Eq. (1) can be described as

$$\mathbf{r}[n,k] = \mathbf{H}[n,k]\,\mathbf{s}[n,k] + \mathbf{w}[n,k], \qquad (2)$$

where $n$ denotes the time index of an OFDM symbol and $k$ its subcarrier index.
An overview of the implemented general system structure is given in Fig. 1. First, the bits for one OFDM transmission frame, arranged in vectors $\mathbf{u}_v$, $v = 1,\dots,n_c$, with $n_c$ being the number of codewords, are LDPC encoded (MacKay, 1999), implemented according to Richardson et al. (2001), using the coding matrix $\mathbf{G}$, Eq. (3). The vectors $\mathbf{u}_{v,c}$ contain the regular information bits $u_h$ and the parity bits $u_f$, $h = 1,\dots,n_s$, $f = 1,\dots,n_p$, where $n_s$ denotes the information length and $n_p$ the number of parity bits. Following the transmitter scheme, the systematic and parity bits are then interleaved and quadrature amplitude modulated, as shown by $\pi$ and $M$, respectively, in Fig. 1. At this stage the generated symbols are serial-to-parallel converted and MIMO encoded. In the underlying system this is spatial multiplexing, realised by multiplication with the identity matrix, resulting in vectors $\mathbf{s}[n,k]$, which are fed symbol-wise into the IFFT, one for each transmit antenna.
In the receiver, the superposed received signals are transformed back into the frequency domain by means of an FFT, resulting in the vectors $\mathbf{r}[n,k]$ of Eq. (2). Subsequently the RLS-DDCE algorithm is performed on each of the OFDM symbols. The demodulator produces soft information in the form of a-priori LLRs, $L(u_h)$, which are defined as the logarithm of the ratio of the probabilities of a bit $u_h$ being 0 or 1. These LLRs are LDPC decoded, producing the estimated a-posteriori LLRs $L(u_h|y)$ at the output of the LDPC decoder, Eq. (4), where $y$ is the sequence of received bits for one codeword.
Together with the estimated a-posteriori LLRs, the LDPC decoder produces the parity check sum (PCS), Eq. (5), where $\gamma$ is the syndrome vector and $\mathbf{A}$ the parity check matrix. The PCS is the Hamming distance of the syndrome vector from the null vector. If a codeword has a PCS of zero, it is assumed to be correct; to reproduce the transmitted symbols, the a-posteriori LLRs are hard-decided, and the decided bits are then encoded, interleaved and modulated. When the PCS is not zero, the soft information of the a-posteriori LLRs is used to estimate the transmitted symbols. The symbols obtained by either strategy are then used to calculate an estimate of the channel transfer function vectors $\mathbf{H}[n,k]$ via the RLS algorithm, which is explained further in Sect. 3.
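As an illustration of the parity check sum, the following sketch computes the syndrome of a hard-decided codeword over GF(2) and counts its non-zero entries. The function name and the small (7,4) Hamming parity check matrix are illustrative assumptions; the LDPC matrix used in the paper is far larger and is not reproduced here.

```python
def parity_check_sum(A, bits):
    """Return (syndrome, PCS) for hard-decided bits and parity check matrix A.

    The syndrome is A * bits (mod 2); the PCS is its Hamming weight, i.e. its
    Hamming distance from the all-zero vector.
    """
    syndrome = [sum(a * b for a, b in zip(row, bits)) % 2 for row in A]
    return syndrome, sum(syndrome)

# Toy (7,4) Hamming parity check matrix, used only for illustration.
A_TOY = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

valid = [1, 0, 0, 0, 1, 1, 0]      # satisfies all three parity checks
corrupted = [0, 0, 0, 0, 1, 1, 0]  # first bit flipped
print(parity_check_sum(A_TOY, valid))
print(parity_check_sum(A_TOY, corrupted))
```

A PCS of zero means that every parity check is satisfied. It does not by itself prove that the codeword is the transmitted one.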
The BER is calculated on the basis of the equalized received symbols. These symbols are therefore decoded, deinterleaved and hard-decided.
Figure 2 shows the structure of our novel approach to the channel estimation and is explained in Sect. 4.

RLS-DDCE
The RLS algorithm as described in Akhtman and Hanzo (2007a) is suitable for tracking a communication channel as it computes an estimate of the current channel matrix H[n,k] upon arrival of new received data r[n,k] and converges within just a few OFDM symbols.
The forgetting factor ξ applies an exponential weighting to the past transmitted signals in the current channel estimate. It can therefore be used to adapt to time-variant channel conditions.
To calculate the channel transfer function, the autocorrelation matrix of the transmitted signals, Eq. (6), and the cross-correlation matrix of the transmitted and received signals, Eq. (7), are needed. Both matrices are initialised with the identity matrix $\mathbf{I}$.
Following the derivation of the RLS algorithm, the solution for the channel factors ends up in the matrix form of the normal equation, which can be solved for the channel factors by inverting the autocorrelation matrix, Eq. (8). The explicit inversion of the autocorrelation matrix can be avoided by using the matrix inversion lemma, the description of which is omitted here. The rank deficiency arising from the summation of rank-1 matrices in Eqs. (6) and (7) is avoided by starting each transmission frame with a training sequence of known pilot symbols, as the matrices reach full rank after a few summations. These pilot symbols are known in the receiver, so the channel transfer function estimates $\mathbf{H}[n,k]$ can be calculated immediately. The symbols following the pilot symbols have to be decided in the receiver; for that reason the received symbols are equalized, decoded and detected before the channel factors are calculated.
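A minimal sketch of such an exponentially weighted estimator for a single subcarrier is given below. It assumes the model r = H s + w of Eq. (2), accumulates the two correlation matrices with the forgetting factor and solves the normal equation directly. The class name, the matrix orientation and the direct solve (instead of the matrix inversion lemma) are assumptions made for illustration and need not match the exact formulation of Akhtman and Hanzo (2007a).

```python
import numpy as np

class RlsChannelEstimator:
    """Exponentially weighted LS estimate of H in r = H s + w (one subcarrier)."""

    def __init__(self, n_rx, n_tx, forgetting=0.7):
        self.xi = forgetting
        # Both matrices start from the identity, as stated in the text.
        self.phi = np.eye(n_tx, dtype=complex)          # autocorrelation of s
        self.theta = np.eye(n_rx, n_tx, dtype=complex)  # cross-correlation of r and s

    def update(self, s, r):
        s = np.asarray(s, dtype=complex).reshape(-1, 1)
        r = np.asarray(r, dtype=complex).reshape(-1, 1)
        self.phi = self.xi * self.phi + s @ s.conj().T
        self.theta = self.xi * self.theta + r @ s.conj().T
        # Normal equation H * phi = theta, solved without forming phi^{-1}.
        return np.linalg.solve(self.phi.conj().T, self.theta.conj().T).conj().T

def demo(n_symbols=80, seed=0):
    """Track a static, noise-free 4x4 channel with known 4-QAM symbols."""
    rng = np.random.default_rng(seed)
    h_true = (rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))) / np.sqrt(2)
    est = RlsChannelEstimator(4, 4)
    for _ in range(n_symbols):
        s = (rng.choice([-1, 1], size=4) + 1j * rng.choice([-1, 1], size=4)) / np.sqrt(2)
        h_hat = est.update(s, h_true @ s)
    return h_true, h_hat
```

In the decision-directed phase the known symbols `s` would be replaced by the symbols decided in the receiver, which is where error propagation can enter.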

Forward and backward RLS processing
The RLS-DDCE algorithm provides the receiver with information on the channel and also yields estimates of the transmitted signals. Due to the uncertainty in the decision process it is prone to error propagation, which may lead to a large number of errors and entirely destroyed subcarriers.
To counter this, we propose to perform the RLS algorithm twice, in the causal and in the anti-causal direction, using block-wise processing of the received data, as shown in Fig. 3. The two runs are performed independently of each other, so that the additional information gained can be evaluated to eliminate errors and correct wrong decisions.
Figure 2 illustrates our proposed approach. On the left side the causal RLS processing is shown, which yields the a-posteriori LLRs $L_1(u_h|y)$ from the received symbols $\mathbf{r}[n,k]$ by soft demodulation and decoding, depicted by $C^{-1}$ in Eq. (9). The right part of the figure processes the received data anti-causally, Eq. (10), where the length of the pilot symbols is denoted by $N_P$ and the data length by $N_d$. In Eqs. (9) and (10), $\mathbf{H}^{\dagger}$ denotes the pseudo-inverse of the channel matrix and $\nu$ the anti-causal time index. The information gained in the form of a-posteriori LLRs, Eq. (4), is passed on for further evaluation. As a result of this post-evaluation of the first two RLS processing runs, and depending on the SNR and the actual channel conditions, some of the codewords in the receiver can be considered correct. These codewords provide additional reliable information for the final RLS processing and are not decided again. If the post-evaluation does not yield any correct codewords, the final RLS processing is performed using the information of either the causal or the anti-causal RLS processing.
www.adv-radio-sci.net/8/101/2010/ Adv. Radio Sci., 8, 101-107, 2010
An important part of the proposed approach is the use of soft information to calculate the transmitted symbols in case of incorrect decoding. In contrast to the normal RLS-DDCE algorithm, this soft information is additionally exploited to calculate the BER.
The underlying frame structure provides a training sequence only at the beginning of each frame, so the training sequence of the subsequent transmission frame can be used for the anti-causal RLS processing. Using the existing pilot overhead twice incurs no additional overhead cost but yields an additional gain in channel information. Figure 3 illustrates the double usage of the pilot symbols.
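The double use of the pilots can be pictured as two index sequences over one frame plus the preamble of the next frame. The sketch below is a hypothetical illustration: the function name, the indexing convention and the assumption that the anti-causal pass also traverses the next preamble in reverse order are ours.

```python
def processing_orders(n_pilot, n_data):
    """OFDM symbol indices visited by the causal and anti-causal RLS passes.

    Indices 0 .. n_pilot-1 are this frame's pilots, the following n_data
    indices are its data symbols, and the n_pilot indices after that are the
    pilots of the next frame.
    """
    frame_len = n_pilot + n_data
    # Causal pass: own pilots first, then the data symbols in time order.
    forward = list(range(0, frame_len))
    # Anti-causal pass: next frame's pilots first, then the data in reverse.
    backward = list(range(frame_len + n_pilot - 1, n_pilot - 1, -1))
    return forward, backward

print(processing_orders(2, 3))
```

Both passes cover the same data symbols, so each codeword ends up with two sets of soft decisions that were obtained from different channel-tracking histories.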

Modified Turbo principle for forward and backward RLS processing
For our purpose the original Turbo principle of Berrou et al. (1993) and Berrou and Glavieux (1996), explained in detail by Hanzo et al. (2002), is modified as presented in Fig. 4. We omit the encoding part of the original Turbo coding entirely and perform only the normal LDPC encoding as presented in Sect. 2, which leads to a combination of LDPC and Turbo decoding in the receiver.
On the receiver side we retain the Turbo decoding layout but change the inputs to the component decoders. As the RLS algorithm is processed in a causal and an anti-causal manner, the soft information of the received bits $y_h$ is available twice, as $y_{h,1}$ and $y_{h,2}$. The availability of two different inputs, which would be identical under perfect conditions, replaces the use of two different codes. The extrinsic information is created and exchanged in the same way as in the original Turbo principle, with $L_c$ denoting the channel reliability.
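For orientation, the textbook form of the extrinsic information in Turbo decoding subtracts the a-priori term and the channel term from the a-posteriori LLR. The sketch below shows only this generic form; the function name and argument layout are assumptions, and the exact expressions used for the causal and anti-causal branches may differ in detail.

```python
def extrinsic_llrs(l_posterior, l_apriori, y_soft, l_c):
    """Textbook extrinsic LLRs: a-posteriori minus a-priori minus channel term.

    l_posterior: a-posteriori LLRs from one component decoder
    l_apriori:   a-priori LLRs that were fed into that decoder
    y_soft:      soft channel values of the systematic bits
    l_c:         channel reliability
    """
    return [lp - la - l_c * y for lp, la, y in zip(l_posterior, l_apriori, y_soft)]

print(extrinsic_llrs([3.0, -2.0], [0.5, 0.0], [1.0, -0.5], 2.0))
```

Only this extrinsic part is handed to the other component decoder as new a-priori information, so that a decoder is not fed back what it already knew.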
Our proposed Turbo decoding is performed twice: once starting with the soft information from the causal RLS, followed by the anti-causal information with the corresponding extrinsic information, Eq. (12), and once in the reverse order, Eq. (13).
In our proposal we use the break criterion of Robertson (1994), which evaluates the variance of the a-posteriori LLRs. This is sensible because the criterion describes the change in the a-posteriori LLRs: the iteration stops when subsequent a-posteriori LLRs do not change by more than 0.03. We also include an additional break criterion in order to avoid unnecessary iterations. If the PCS of all codewords of either the causal or the anti-causal RLS is zero, which is mostly the case for high SNR, the Turbo decoding iteration does not start at all. The two break criteria are also shown in Fig. 4.
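The two break criteria can be sketched as follows. The text leaves some room for interpretation in the variance criterion; this sketch assumes that the variance of the a-posteriori LLRs is compared between two successive iterations and that the threshold of 0.03 applies to the change in that variance. Function names are hypothetical.

```python
from statistics import pvariance

def skip_turbo_iteration(pcs_causal, pcs_anticausal):
    """True if every codeword of either RLS pass already has a PCS of zero."""
    return all(p == 0 for p in pcs_causal) or all(p == 0 for p in pcs_anticausal)

def llrs_settled(prev_llrs, curr_llrs, threshold=0.03):
    """True if the LLR variance changed by at most `threshold` (assumed reading)."""
    return abs(pvariance(curr_llrs) - pvariance(prev_llrs)) <= threshold

# The iteration is not started if one direction is already error free ...
print(skip_turbo_iteration([0, 0], [1, 2]))
# ... and it is stopped once the LLRs no longer change noticeably.
print(llrs_settled([1.0, -1.0], [1.0, -1.0]))
```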
The comparatively large codeword distance of LDPC codes (MacKay, 1999) can be used to evaluate the performance of individual iteration steps. A PCS of zero is likely to correspond to an error-free codeword. In addition, simulations have shown that a small total PCS over all codewords is associated with a lower BER.
The total PCS is then used to decide between the outputs of the two iteration directions, Eqs. (14) and (15). The lower total PCS of the second component decoder determines the a-posteriori LLRs $L_{\max}(u_h|y)$ that are processed further, where $p_{v,\mathrm{fw}}$ and $p_{v,\mathrm{bw}}$ denote the PCS of codeword $v$ after the second LDPC decoder operating on the anti-causal and the causal data, respectively.
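The selection rule amounts to comparing two sums. In the sketch below the function name, the labels `"fw"` and `"bw"` and the tie-breaking in favour of the forward direction are illustrative assumptions.

```python
def select_output(llrs_fw, pcs_fw, llrs_bw, pcs_bw):
    """Pick the iteration direction whose total PCS over all codewords is lower.

    pcs_fw / pcs_bw hold one PCS value per codeword, taken after the second
    component decoder of the forward and backward iteration, respectively.
    Ties are resolved in favour of the forward direction (assumption).
    """
    if sum(pcs_fw) <= sum(pcs_bw):
        return "fw", llrs_fw
    return "bw", llrs_bw

direction, llrs = select_output([1.2, -0.4], [0, 3], [0.9, 2.1], [1, 0])
print(direction, llrs)
```

The returned label indicates which direction the subsequent final RLS pass would follow.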
After the iteration, the RLS algorithm is processed again in the causal or anti-causal direction, depending on the origin of $L_{\max}(u_h|y)$. Codewords with a PCS of zero are taken to be correct.

Summation of a-posteriori LLRs
Our second approach to improving the channel estimation is simply to sum the a-posteriori LLRs from the forward and backward RLS processing. Compared with the modified Turbo principle, this has lower computational complexity and still corrects a large number of errors.
The key idea is the availability of the soft information, which is represented by the sign and magnitude of the LLRs. From the LLRs, the probability of a bit being 1 or 0 can easily be computed. When both RLS processing runs result in the same decided bits, the summation of the LLRs does not change the decisions but enhances the reliability of the decision. When the RLS processing runs decide on different bits, the soft information with the larger magnitude determines the final bits.
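A small numerical sketch makes the effect of the summation concrete. It assumes the sign convention L = ln(P(bit = 0) / P(bit = 1)), so that a positive LLR favours bit 0; with the opposite convention the roles of 0 and 1 are swapped. All function names are hypothetical.

```python
import math

def prob_bit_zero(llr):
    """P(bit = 0) for L = ln(P(0) / P(1)), evaluated in a numerically safe way."""
    if llr >= 0:
        return 1.0 / (1.0 + math.exp(-llr))
    e = math.exp(llr)
    return e / (1.0 + e)

def combine_llrs(l_fw, l_bw):
    """Sum the a-posteriori LLRs of the forward and backward RLS processing."""
    return [a + b for a, b in zip(l_fw, l_bw)]

def hard_bits(llrs):
    """Hard decision under the assumed convention: non-negative LLR gives bit 0."""
    return [0 if l >= 0 else 1 for l in llrs]

l_fw = [2.0, -0.5, 1.0]
l_bw = [1.5, 3.0, -4.0]
l_sum = combine_llrs(l_fw, l_bw)
print(l_sum, hard_bits(l_sum))
```

In the first position both runs agree and the summed LLR is simply more reliable. In the second and third positions the runs disagree and the LLR with the larger magnitude decides the bit.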
The reliable codewords to conduct the final RLS processing are determined by comparing the hard decided bits, based on the a-posteriori LLRs after the summation, with the hard decided bits of the a-posteriori LLRs before the summation.A codeword is considered to be reliable when all a-posteriori LLRs of a certain codeword do not change due to the summation.To ensure the correctness of that codeword the total PCS for the codeword has to be zero, regardless of whether it is based on the causal or anti-causal RLS.
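The reliability test for one codeword can be sketched as below. The sketch assumes that "do not change due to the summation" means the hard decisions after the summation agree with those of both individual runs, and again uses the convention that a non-negative LLR maps to bit 0. The function names and the single `pcs` argument are illustrative.

```python
def hard_bits(llrs):
    """Hard decision: non-negative LLR gives bit 0 (assumed sign convention)."""
    return [0 if l >= 0 else 1 for l in llrs]

def is_reliable(l_fw, l_bw, pcs):
    """True if the summation changes no bit decision and the PCS is zero.

    l_fw, l_bw: a-posteriori LLRs of one codeword from the two RLS runs
    pcs:        parity check sum of that codeword
    """
    summed = [a + b for a, b in zip(l_fw, l_bw)]
    unchanged = hard_bits(summed) == hard_bits(l_fw) == hard_bits(l_bw)
    return unchanged and pcs == 0

print(is_reliable([2.0, 1.0], [1.0, 3.0], 0))
print(is_reliable([2.0, -1.0], [1.0, 3.0], 0))
```

When the hard decisions of both runs coincide, the syndrome and hence the PCS are the same for both, which is why a single PCS value suffices in this sketch.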

Simulation results
The simulation was performed on a 4×4 MIMO-OFDM system with the simulation parameters given in Table 1. For the modulation, 4-QAM was used, so that an OFDM symbol consisted of 1024 bits. The frame duration calculated from the parameters was 3.86 ms.
The forward, backward and final RLS processing used the simple zero forcing equalizer due to computational complexity and retention of soft information.Thereby the resulting BER was not as small as possible, but the operating principle and the improvements due to our new approach were demonstrated.
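Zero-forcing equalization on one subcarrier amounts to multiplying the received vector by the pseudo-inverse of the estimated channel matrix. The following sketch uses `numpy.linalg.pinv`; the function name and the example values are illustrative only.

```python
import numpy as np

def zero_forcing(h_est, r):
    """Zero-forcing estimate of the transmitted vector: pinv(H) @ r."""
    return np.linalg.pinv(h_est) @ r

# Noise-free 2x2 example with a known channel matrix.
H = np.array([[1.0, 0.5], [0.2j, 1.0]])
s = np.array([1 + 1j, -1 + 1j]) / np.sqrt(2)
print(zero_forcing(H, H @ s))
```

The equalizer removes the spatial interference completely but enhances the noise when the channel matrix is badly conditioned, which is consistent with the remark that the resulting BER was not as small as possible.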
Simulations were carried out for several velocities, ranging from 0 m/s, for comparison, to 25 m/s, suitable for a micro urban cell as simulated by the channel model. The SNR values of interest lay between 6 dB and 21 dB.
The forgetting factor ξ was set to 0.7, following Akhtman and Hanzo (2007a), so that the algorithm worked well over a large range of velocities.
Figure 5 compares the implemented algorithms for velocities of 1.67 m/s and 25 m/s. Over the entire SNR range the simple RLS-DDCE performed worst for both velocities. For smaller SNR, up to about 15 and 19 dB respectively, the summation of a-posteriori LLRs gave the best BER performance. The Turbo principle was slightly worse, but its performance improved for higher SNR values. For small SNR values the receiver did not yield correct codewords, so the performance gain over the simple RLS-DDCE algorithm was solely due to the extended soft information evaluation of the final RLS processing. In addition, the summation used the added a-posteriori LLRs for the BER evaluation in low-SNR regions, which explains its better performance for small SNR values. For the upper SNR range the Turbo principle worked better due to the larger number of correct codewords, which resulted in a smaller BER.
The simulation results in relation to the velocities are shown in Fig. 6. The RLS-DDCE algorithm again performed worst. For an SNR of 18 dB the summation was comparable to the modified Turbo principle for velocities above 12.5 m/s. For lower velocities, and for the entire velocity range at 21 dB, the modified Turbo principle performed better because of the available number of correct codewords.
In order to evaluate the channel estimation, the normalized mean squared error (NMSE) is applied.
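A common definition of the NMSE divides the summed squared estimation error by the summed squared magnitude of the true channel coefficients. The sketch below follows that common definition, which may differ in its normalisation or averaging from the expression used for the results here; the function name is hypothetical.

```python
import numpy as np

def nmse(h_true, h_est):
    """Normalized mean squared error between true and estimated channel.

    Both inputs may have any (identical) shape, for example
    (symbols, subcarriers, n_R, n_T); the sums run over all entries.
    """
    h_true = np.asarray(h_true)
    h_est = np.asarray(h_est)
    error_energy = np.sum(np.abs(h_true - h_est) ** 2)
    return float(error_energy / np.sum(np.abs(h_true) ** 2))

print(nmse([[2, 0], [0, 2]], [[1, 0], [0, 1]]))
```

A value of 0 corresponds to a perfect estimate, while a value of 1 means that the error carries as much energy as the channel itself.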
Figure 8 shows the iterative behavior of the implemented Turbo principle for an SNR of 18 dB and a velocity of 8.33 m/s. The figure shows how the number of codewords with a total PCS of zero increased with the number of iterations, for both the forward and the backward iteration. At the same time the number of wrong codewords, i.e. codewords with a PCS of zero that nevertheless contain bit errors, also rose. This was due to the LDPC decoder, which converged to wrong codewords as a result of the exchange of extrinsic information. The slope of the total PCS curve flattened with increasing iterations, so that the variance break criterion became active at some point and stopped the iteration. If the final RLS processing is to make better decisions with more reliable channel information, the number of iterations should not be too large, in order to avoid incorrect codewords. In addition, Fig. 8 shows the difference between the causal and anti-causal iteration directions, as can be seen from the starting values of the curves.

Conclusion
In this paper we have presented a novel approach to channel estimation for challenging time-variant channels, and its performance with respect to BER and NMSE has been evaluated over a large range of velocities. The modified Turbo principle, based on different input data for the component decoders, shows increased performance over the entire velocity range for larger SNR values. Especially at the upper limit of the velocity range, its performance is superior to that of the simple RLS-DDCE. At lower SNR values the performance is still better than that of the simple RLS-DDCE, though the applied linear equalizer prevents better performance. The summation of a-posteriori LLRs, in contrast, performs better for smaller SNR values, as the summation corrects a certain number of wrongly decided symbols. The performance gain in NMSE is comparable to that in BER, and the proposed approaches result in a better channel estimate.

Fig. 1. General structure of the underlying system, represented for a 4 × 4 system.

Fig. 2. Novel approach to forward and backward RLS-DDCE processing, implemented on the receiver side of the system depicted in Fig. 1.

Fig. 5. BER for different SNR at velocities of 1.67 m/s and 25 m/s.

Figure 7 shows the NMSE in relation to the velocities for an SNR of 18 and 21 dB. For 21 dB SNR the behaviour was the same as for the BER: the modified Turbo principle resulted in the lowest NMSE values. The results were different for 18 dB SNR, where the summation of a-posteriori LLRs gave the better channel estimate. The values at 1.67 m/s and 20.8 m/s, for 18 and 21 dB respectively, for the summation and the Turbo principle were outliers due to an insufficient number of channel realisations.

Fig. 8. Iterative behavior of the Turbo principle showing the number of correct codewords, wrong codewords and total PCS for the forward and backward Turbo iteration, at an SNR of 18 dB and 8.33 m/s.
Fig. 3. Frame structure for proposed MIMO-OFDM RLS-DDCE Forward and Backward Filtering with re-use of next and previous frame preambles.