A digital receiver signal strength detector for multi-standard low-IF receivers

This paper presents a receiver signal strength detector based on a discrete Fourier transform implementation. The energy detection algorithm has been designed and measured using a custom multi-standard transceiver ASIC with a low-IF receiver at 0.5, 1 and 2 MHz IF. The proposed implementation directly processes the single-bit ΔΣ modulator data and features a clear channel assessment for arbitrary modulation schemes without energy-consuming demodulation. Continuous monitoring of the derivative of the RSSI takes advantage of faster coefficient convergence for higher power levels and reduces computation time. A dynamic range of 65 dB has been achieved in FPGA-based measurements with a linearity error of less than 1.2 dB. Furthermore, synthesis results for an on-chip implementation in a 130 nm RF CMOS technology show an overall power consumption of 1.5 mW during calculation.


Introduction
Short-range wireless systems with low power consumption are increasingly used for communication in sensor networks. The communication of these networks usually takes place in three bands, which are also used by a wide range of common standards. WiFi, Bluetooth and Bluetooth LE are located in the 2.4 GHz band, as is IEEE 802.15.4 (2016), which specifies multiple physical layers not only for 2.4 GHz but also for 868 MHz and even for the 433 MHz ISM band. In the last decade all these bands have become increasingly occupied by various communication systems. Additionally, the trend in recent years is towards multi-mode, multi-standard transceivers which offer a wide range of functionality.
With the increased unlicensed use of these bands by various standards, listen-before-talk functionality is becoming more important to ensure reliable communication. These Clear Channel Assessments (CCA) can be run based on the energy detected in a channel. For standards like IEEE 802.15.4 (2016) a receiver energy detection (ED) is mandatory and a corresponding value between 0x00 and 0xFF has to be provided to the user. Furthermore, an accuracy of at least ±6 dB is required to fulfill the specification.
The low-IF receiver (RX) architecture with ΔΣ modulators (ΔΣM) is a suitable solution for multi-standard operation due to its high reconfigurability (L. Zhang et al., 2013) and high dynamic range (DR). The high DR relaxes the requirements of the analog front-end (Ho et al., 2011) by reducing the need for high gain as well as for fine tuning of the analog gain. Even coarse gain steps in the RF front-end ensure a sufficient signal level at the analog-to-digital converter (ADC) input, and the sensitivity specifications for a standard like IEEE 802.15.4 (2016) can be fulfilled. Accordingly, the automatic gain control (AGC) loop of the RX front-end alone does not need a high resolution and can easily be controlled from digital signal processing. One method to determine the receiver signal strength indicator (RSSI) value with a high resolution can be found in the digital AGC loop of the digital baseband (BB) filters. However, these BB filters have a comparatively high current consumption during operation and the digital AGC has a built-in hysteresis.
Common single-band transceiver architectures like the one presented by Raja et al. (2010) or even the Atmel AT86RF233 (2014) use limiting amplifiers (LA) in the RX front-end to limit the signal amplitude to the DR of the ADC. LAs offer the possibility of extracting the RSSI with rather low additional effort, as shown by Huang et al. (2000), but the reception of amplitude-modulated signals is not possible in this case. However, in low-IF RX architectures like the one presented by Zolfaghari et al. (2017), LAs are not needed. Consequently, an energy- and area-efficient solution for the ED that does not rely on limiting amplifiers is necessary.
This work presents a digital implementation for RSSI calculation based on a discrete Fourier transform algorithm without area-consuming digital multipliers. Furthermore, the RSSI quality is monitored during RSSI calculation to minimize the computation time. The algorithm directly processes the oversampled single-bit ΔΣM output data without previous digital filtering. Since the Fourier transform is a highly selective filter itself, down conversion, filtering, demodulation and decimation of the digital BB processing can be deactivated if the received data is dispensable. This is the case in CCA-ED operation, where it has to be ensured that the desired channel is not occupied. Hence, listen-before-talk functionality can be implemented in an energy-efficient way by deactivating digital system components with high power consumption.
The presented RSSI calculation has been designed and tested using synthesized VHDL code executed on an FPGA and connected to a custom triple-band multi-standard transceiver ASIC. An overview of the RX front-end used, the requirements for energy detection and the implementation is given in the following sections. Subsequently, the RSSI calculation algorithm is discussed in detail and solutions for an area-efficient implementation are given. Finally, the measurement setup is presented and the results are shown and compared to analog implementations of ED.

System overview
An overview of the relevant parts of the transceiver ASIC is shown as a block diagram of the RX architecture in Fig. 1. The front-end features a sliding-IF mixer for down conversion of the 2.4 GHz band to an intermediate frequency (IF) of 0.5 to 2 MHz depending on the channel bandwidth. The 1.6 GHz local oscillator (LO) signal of the frequency synthesizer is therefore divided by two and used in the second mixer stage. Furthermore, the divided LO signal can be connected to the LO input of the 868 MHz mixer or further divided by two for the 433 MHz mixer. In this way three bands can be supported with only a single phase-locked loop, which is additionally reused for the transmitter of the ASIC. After down conversion to the desired IF, a complex-valued quadrature bandpass filter is used for channel selection. Finally, the received signal is converted to the digital domain using a reconfigurable ΔΣM as presented in Saalfeld et al. (2016).

Energy detection fundamentals
The Fourier transform is a well-known and widely used method to analyze a signal in the frequency domain. In Fig. 2 the fast Fourier transform (FFT) of simulation data of the single-bit mode of the ΔΣM is shown for two scenarios. In one simulation a signal is present at 750 kHz. The corresponding frequency bin is marked as $S_{k,1}$. Another simulation has been performed with an input signal amplitude below the quantization noise floor. The corresponding frequency bin has been marked with $S_{k,0}$. It can be observed that for both scenarios the DC amplitude $S_0$ is in the same range. In order to determine an RSSI value using a Fourier transform it has to be taken into account that $S_k$ is proportional to the run length. Therefore, $S_0$ can be used to normalize $S_k$ to eliminate this proportionality.
The FFT proves to be an efficient algorithm to calculate the entire frequency spectrum (Ohm and Lüke, 2007). However, more lightweight implementations are available for single frequency bins. Implementing a discrete Fourier transform (DFT) as shown in Eq. (1) for single frequency bins with fixed values of $k$ is a well-suited solution for signal power detection. Hence, the absolute value of the complex Fourier coefficient $S_k$ represents the signal power in the $k$th frequency bin. Furthermore, each bin calculation can be seen as highly selective filtering. Therefore, a narrow-band filter for out-of-channel interferer reduction is not necessary.

$$S_k = \sum_{n=0}^{N-1} s(n)\, e^{-j 2\pi \frac{k}{N} n} \qquad (1)$$

Here $s(n)$ denotes the input samples and $N$ the number of samples.
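As an illustration of the single-bin calculation, the following minimal sketch evaluates one Fourier coefficient directly from its definition. The function name `dft_bin` and the test tone are assumptions made for this example and are not taken from the presented hardware.

```python
import cmath
import math

def dft_bin(samples, k, n_total):
    """Return S_k = sum_n s(n) * exp(-j*2*pi*k*n/N) for a single bin k."""
    acc = 0j
    for n, s in enumerate(samples[:n_total]):
        acc += s * cmath.exp(-2j * math.pi * k * n / n_total)
    return acc

# Hypothetical test signal: a real tone located exactly in bin 5 of N = 192.
N = 192
tone = [math.cos(2 * math.pi * 5 * n / N) for n in range(N)]

in_bin = abs(dft_bin(tone, 5, N))      # close to N/2 = 96
off_bin = abs(dft_bin(tone, 7, N))     # close to 0
print(in_bin, off_bin)
```

The second value illustrates the selectivity mentioned above: a tone that falls exactly on another bin contributes practically nothing to the observed bin.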

Goertzel filter
One approach to the DFT implementation is the Goertzel algorithm. The Goertzel algorithm transfer function is shown in Eq. (2) and is further explained by Oppenheim and Schafer (1999).
Expanding the transfer function with the complex conjugate term, it can be split into an IIR filter and an FIR filter as shown in Eq. (3).
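For reference, the textbook form of the Goertzel transfer function and its expansion with the complex conjugate term can be written as follows. The notation is an assumption based on the standard treatment in Oppenheim and Schafer (1999) and may differ in detail from the notation of Eqs. (2) and (3).

```latex
H_k(z) = \frac{1}{1 - e^{j 2\pi k/N} z^{-1}}
       = \underbrace{\left(1 - e^{-j 2\pi k/N} z^{-1}\right)}_{\text{FIR part}}
         \cdot
         \underbrace{\frac{1}{1 - 2\cos\!\left(2\pi k/N\right) z^{-1} + z^{-2}}}_{\text{IIR part}}
```

The denominator of the IIR part contains only the real coefficient $2\cos(2\pi k/N)$, while the complex factor appears only in the FIR part.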
Figure 3 shows the corresponding Z-transfer function block diagram. The IIR filter needs one real multiplication per cycle, while the remaining complex multiplication of the FIR filter has to be computed only once for every frequency bin.
The resulting algorithm works well with CPUs or DSPs which have a built-in multiplying unit with sufficient word length, because it reduces the clock cycles required for a complex multiplication to those of a single real multiplication. A VLSI implementation does not benefit in the same way from the reduced multiplication effort, since the complex multiplier still has to be implemented in hardware. Furthermore, the real multiplication requires considerable word length, since every error induced by quantization is multiplied again in the following cycles. The word length in this case is dominated by the word length of the constant cosine factor, which should be larger than the input word length to achieve the desired resolution.
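A minimal floating-point sketch of the Goertzel recursion makes the split between the per-sample real multiplication and the single final complex multiplication visible. The function name `goertzel` is an assumption for this example, and word-length effects are not modeled.

```python
import cmath
import math

def goertzel(samples, k):
    """Compute DFT bin k of `samples` with the second-order Goertzel recursion."""
    n_total = len(samples)
    theta = 2 * math.pi * k / n_total
    coeff = 2 * math.cos(theta)        # the constant cosine factor
    v1 = v2 = 0.0
    for s in samples:
        v0 = s + coeff * v1 - v2       # IIR part: one real multiplication per sample
        v2, v1 = v1, v0
    # FIR part, evaluated once after the last sample:
    # S_k = e^{j*theta} * v[N-1] - v[N-2]
    return cmath.exp(1j * theta) * v1 - v2

# Hypothetical single-bit style input (+1/-1 values).
x = [1, -1, 1, 1, -1, -1, 1, -1]
print(goertzel(x, 3))
```

Because `coeff * v1` feeds back into the next cycle, any rounding of `coeff` or of the state is carried forward, which is the word-length concern raised above.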

Discrete Fourier transform
The DFT can also be implemented directly to compute only a single frequency bin instead of the multiple bins that would be necessary for a spectral range. The DFT calculation for the $k$th frequency bin was already shown in Eq. (1). Figure 4 shows the corresponding Z-transform block diagram. It consists of an $e^{-j 2\pi \frac{k}{N} n}$ multiplication stage and an integrator. In contrast to the Goertzel algorithm, a time dependency introduced by the index $n$ exists for fixed values of the selected bin $k$ and the spectral resolution $N$.
However, the DFT works well with a ΔΣM because the high resolution is achieved with oversampling and noise shaping instead of a high number of bits. The higher sampling frequency barely has an effect on the VLSI implementation, while a higher number of bits would directly increase the physical size of the multiplier. Utilizing the single-bit data stream of the ΔΣM minimizes the size of the multiplier in the input path. Furthermore, the down conversion of the received signal from IF to DC is typically performed by the demodulator, as shown by Y. Zhang et al. (2013). Hence, the down conversion multiplication with $e^{-j\omega}$ should already be implemented by the DSP.

Hardware implementation
The presented architecture, depicted in Fig. 5, uses the single-bit ΔΣM samples. Since the values represent ±1, the complex multiplier can be implemented as a sign selection. A base-2 logarithm unit is included to allow linear mapping between the calculated RSSI and the signal power in dBm. Additionally, the common mode coefficient $S_0$ is calculated, as shown in Eq. (4). The absolute value of the relevant frequency bin, $|S_k|$, is normalized to $|S_0|$ to obtain an RSSI value independent of the run length.

RSSI
As seen in Eq. (5), the normalization is realized by subtracting the logarithmic values, eliminating the need for a divider unit. Continuous calculation allows a variable computation time, running as long as necessary but as short as possible.

DFT core processing
The calculation of the DFT is shown as the core processing part in the left section of Fig. 5. As stated in Sect. 3.2, the index value $n$ of the complex $e^{-j\omega}$ multiplication shown in Fig. 4 changes with every cycle. Therefore, a lookup table (LUT) has to be introduced. If the computation runs continuously, the sample count $N$ in Eq. (1) is time variant. Furthermore, the resulting spectral resolution, and therefore the width of the detected frequency bin, depends on $N$ as well. A higher $N$ results in narrower frequency bins and thus increases the signal-to-noise ratio (SNR) in this specific bin. During computation the observed bin is fixed with regard to the absolute frequency and is defined by the factor $k/N$, which is therefore constant over the run length. Hence the computation can run continuously and does not necessarily need a fixed duration. This increases the resolution, allowing the detection of signals with low power levels which would otherwise be indistinguishable from the noise. Consequently, a longer computation time enables the detection of small signals.
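The relation between run length and bin width can be made concrete with a short worked example. The sampling frequency and IF are the values quoted for the presented system; the three run lengths are arbitrary illustrative choices, and the final remark assumes that the noise around the observed bin is approximately white.

```latex
\Delta f = \frac{f_s}{N}, \qquad
f_k = \frac{k}{N}\, f_s, \qquad
\frac{k}{N} = \frac{f_\mathrm{IF}}{f_s}
            = \frac{0.5\,\mathrm{MHz}}{96\,\mathrm{MHz}} = \frac{1}{192}

N = 192,\   k = 1:   \quad \Delta f = 500\,\mathrm{kHz} \\
N = 1920,\  k = 10:  \quad \Delta f = 50\,\mathrm{kHz} \\
N = 19200,\ k = 100: \quad \Delta f = 5\,\mathrm{kHz}
```

In all three cases the bin center stays at 0.5 MHz. Under the white-noise assumption, the coherently accumulated signal power grows with $N^2$ and the noise power with $N$, so each doubling of $N$ improves the SNR in the bin by roughly 3 dB.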
Regarding the LUT values, the exponential term in Eq. (1) can be replaced using Euler's formula. As seen in Eq. (6), a cosine value as well as a cosine shifted by a quarter period are required.
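Written out, the replacement of the exponential term by Euler's formula reads as follows. This is the standard identity; the exact layout of Eq. (6) is an assumption.

```latex
e^{-j 2\pi \frac{k}{N} n}
  = \cos\!\left(2\pi \tfrac{k}{N} n\right) - j \sin\!\left(2\pi \tfrac{k}{N} n\right)
  = \cos\!\left(2\pi \tfrac{k}{N} n\right) - j \cos\!\left(2\pi \tfrac{k}{N} n - \tfrac{\pi}{2}\right)
```

The sine term is thus the same cosine read a quarter period later, so a single cosine table can serve both the real and the imaginary path.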
The cosine argument is periodic at $n = N/k$, where the largest $N$ in the presented system is set by the maximum ΔΣM clock frequency $f_s = 96$ MHz. The smallest necessary $k$ is set by the lowest IF $f_\mathrm{IF} = 0.5$ MHz and is the index of the relevant frequency bins for RSSI detection. This results in a cosine LUT length of $M = \frac{96 \times 10^6}{0.5 \times 10^6} = 192$ for a full period. However, storing a quarter period is sufficient to reconstruct the entire cosine, reducing the size of the LUT to 48 entries. Reconstruction is done by either negating the LUT value or reversing the access. The need to access the values in reverse as well as in regular order simultaneously requires individual multiplexer trees for the sine and cosine values.
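The quarter-period reconstruction can be sketched as follows. The names `COS_LUT`, `cos_from_lut` and `sin_from_lut` are hypothetical, the table holds floating-point instead of fixed-point values, and the separate handling of the zero crossing at the quarter-period boundary is an assumption of this sketch, since the text does not state how that sample is produced.

```python
import math

PERIOD = 192                  # f_s / f_IF = 96 MHz / 0.5 MHz
QUARTER = PERIOD // 4         # 48 stored entries
COS_LUT = [math.cos(2 * math.pi * i / PERIOD) for i in range(QUARTER)]

def cos_from_lut(n):
    """Reconstruct cos(2*pi*n/PERIOD) from the quarter-period table."""
    quadrant, r = divmod(n % PERIOD, QUARTER)
    if quadrant == 0:
        return COS_LUT[r]                 # regular access
    if quadrant == 2:
        return -COS_LUT[r]                # regular access, negated
    if r == 0:
        return 0.0                        # zero crossing, not stored (assumption)
    mirrored = COS_LUT[QUARTER - r]       # reversed access
    return -mirrored if quadrant == 1 else mirrored

def sin_from_lut(n):
    """The sine is the cosine shifted by a quarter period."""
    return cos_from_lut(n - QUARTER)

print(cos_from_lut(0), cos_from_lut(96), sin_from_lut(48))
```

While the cosine path reads the table in regular order, the sine path reads it in reversed order and vice versa, which is why both access directions are needed at the same time.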
The input is a complex value defined by the ΔΣM output values and has to be interpreted as $D_\mathrm{IF} = I - jQ$. The complex multiplier at the input is split into four real multiplications, where sin and cos represent the sine and cosine terms in Eq. (6). The resulting signal is interpreted as shown in Eq. (7). The sign selection of the cos and sin values depending on the input signal $D_\mathrm{IF}$ is shown in Table 1.
If the input from the ΔΣM is negative and the index $n$ requires the LUT value to be inverted, the resulting value will be positive. Therefore, the inversion is controlled by the exclusive disjunction and realized within the following accumulator. Reusing the accumulator adder requires no additional summation stage for the two's complement. This accumulator consists of an adder and a register to store the current complex Fourier coefficient. An additional pipeline register delays further processing of the Fourier coefficient until at least one full period of the exponential term in Eq. (1) has been calculated. Consequently, calculations with incomplete coefficients are prevented.

Table 1. LUT signal selection for single bit complex multiplication.
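The sign-selection idea can be sketched in software as follows. The function name `single_bit_bin` and the bit encoding (1 for +1, 0 for −1) are assumptions for this example, the cosine and sine values are computed directly instead of being read from a LUT, and the sketch is not a reproduction of Table 1.

```python
import cmath
import math

def single_bit_bin(i_bits, q_bits, k, n_total):
    """Accumulate S_k for D_IF = I - jQ with I, Q in {+1, -1}.

    Bits are given as 0/1, where 1 stands for +1 and 0 for -1 (assumed
    encoding). Every product is formed by sign selection only.
    """
    acc_re = acc_im = 0.0
    for n in range(n_total):
        c = math.cos(2 * math.pi * k * n / n_total)   # would come from the LUT
        s = math.sin(2 * math.pi * k * n / n_total)
        i_c = c if i_bits[n] else -c    # I * cos
        i_s = s if i_bits[n] else -s    # I * sin
        q_c = c if q_bits[n] else -c    # Q * cos
        q_s = s if q_bits[n] else -s    # Q * sin
        # (I - jQ)(cos - j sin) = (I cos - Q sin) - j (I sin + Q cos)
        acc_re += i_c - q_s
        acc_im -= i_s + q_c
    return complex(acc_re, acc_im)

# Hypothetical bit streams.
i_bits = [1, 0, 1, 1, 0, 0, 1, 0]
q_bits = [0, 0, 1, 0, 1, 1, 0, 1]
print(single_bit_bin(i_bits, q_bits, 3, 8))
```

In hardware, the decision to negate a LUT value would combine the input sign with the sign required by the LUT reconstruction through an XOR, so that two inversions cancel, as described above.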
For the detection of the DC value the coefficient $k$ in Eq. (6) is set to $k = 0$, as shown in Eq. (4). Therefore, the multiplication is not necessary and only an accumulation stage is needed.

Post-processing
The L1-norm is sufficient for RSSI detection. Eq. (8) shows the L1-norm of the $S_k$ signal. The absolute value is generated by checking the sign and, if necessary, negating the real or imaginary part of $S_k$ during the following summation.
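For orientation, the L1-norm of a complex coefficient and its relation to the Euclidean magnitude can be written as below. This is the general definition and bound, given here as a sketch; the exact notation of Eq. (8) is an assumption.

```latex
\lVert S_k \rVert_1 = \left|\operatorname{Re}\{S_k\}\right| + \left|\operatorname{Im}\{S_k\}\right|,
\qquad
|S_k| \;\le\; \lVert S_k \rVert_1 \;\le\; \sqrt{2}\,|S_k|
```

The L1-norm therefore needs only sign checks and one addition, at the cost of a phase-dependent overestimation of the magnitude by a factor of at most $\sqrt{2}$.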
For logarithm calculation the value is, as discussed by Mansour et al. (2014), divided into mantissa and exponent. As shown in Fig. 6, the logarithmic value of $X = m \cdot 2^e$ can be expressed as $\log_2(X) = e + \log_2(m)$. The exponent $e$ is calculated using a thermometer encoder, while the mantissa $m$ is generated by shifting the original value $X$ by $e$ bits. This is performed by a variable shift register, which requires one clock cycle per shift. The mantissa is then fed to a LUT with a size of 32 values that maps $m$ to $\log_2(m)$. By using a fixed-point format the exponent and the logarithmic mantissa can be concatenated, yielding the final logarithmic value. Small variations between consecutive $\log_2(S_k)$ values are attenuated using a moving average filter with a length of four. A signed subtracter finally calculates the RSSI value.
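The mantissa and exponent split can be sketched for positive integers as follows. The function name `log2_approx`, the use of the five bits below the leading one as LUT index, and the floating-point LUT contents are assumptions of this example; a hardware LUT would hold fixed-point values.

```python
import math

FRAC_BITS = 5                                   # 2**5 = 32 LUT entries
LOG_LUT = [math.log2(1 + i / 32) for i in range(32)]

def log2_approx(x):
    """Approximate log2(x) for a positive integer as e + LUT[mantissa bits]."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    e = x.bit_length() - 1                      # position of the leading one
    # Align the value so that the five bits below the leading one form the index.
    if e >= FRAC_BITS:
        idx = (x >> (e - FRAC_BITS)) & 31
    else:
        idx = (x << (FRAC_BITS - e)) & 31
    return e + LOG_LUT[idx]

for value in (1, 3, 1024, 3000):
    print(value, log2_approx(value), math.log2(value))
```

Because the mantissa is truncated to five fractional bits, the result never exceeds the exact logarithm and stays within $\log_2(1 + 1/32) \approx 0.044$ of it.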
An approximate derivative of the RSSI signal is obtained from the difference of the last two RSSI values. Once the RSSI has converged, the absolute value of this difference drops below a threshold and the RSSI calculation can be stopped. Furthermore, the current slope is output alongside the RSSI, providing a measure for RSSI quality estimation.
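The stopping criterion can be illustrated with a small sketch. The function name `run_until_converged`, the threshold value and the synthetic input sequence are assumptions; for brevity the moving average of length four is applied directly to the RSSI stream, which simplifies the processing order described above.

```python
from collections import deque

def run_until_converged(rssi_stream, threshold, window=4):
    """Return (rssi, slope, values_used) once |slope| drops below threshold.

    Returns None if the stream ends before convergence.
    """
    recent = deque(maxlen=window)
    previous = None
    for count, raw in enumerate(rssi_stream, start=1):
        recent.append(raw)
        if len(recent) < window:
            continue                      # moving average not yet filled
        smoothed = sum(recent) / window
        if previous is not None:
            slope = smoothed - previous   # difference of the last two values
            if abs(slope) < threshold:
                return smoothed, slope, count
        previous = smoothed
    return None

# Synthetic RSSI sequence settling towards 10 (hypothetical data).
stream = [10 + 8 * 0.5 ** i for i in range(20)]
print(run_until_converged(stream, threshold=0.05))
```

A sequence that settles faster crosses the threshold after fewer values, which mirrors the shorter run time for higher input power levels.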

Results
Figure 7 shows the RSSI value, the individual frequency bins as well as the derivative of the RSSI, called Slope. Using the existing ASIC and an FPGA, single-bit ΔΣM samples have been recorded and used for simulations. A high-level language description of the hardware was used for the simulations, ensuring early verification of the algorithm's functionality. Figure 7 depicts the development of the RSSI ($S_0 - S_k$), the DC value $S_0$ and the $S_k$ value as well as the slope shown in percent. The data is shown as an example for −70 and −50 dBm input power. It can be seen that the DC value $S_0$ is equal for both input powers, while $S_k$ settles to higher values with higher input power. Furthermore, the decreased settling time for increased input power values can be observed.
The design functionality has also been verified in measurements using VHDL code synthesized on a Xilinx Virtex 5 board. The design was tested using the existing transceiver ASIC in receive operation as presented in Sect. 2. The RF input signals were supplied by a signal generator. Figure 8 shows the RSSI measurement with a step size of 3 dB. Furthermore, the standard deviation over 10 runs is shown for each measured input signal power.
Additionally, the error of the measured RSSI towards an ideal regression is shown in dB with respect to the input power.It can be seen that for all input power measurements the RSSI error stays below 1.2 dB while the standard deviation increases for smaller input signals due to the decreased SNR.
In order to integrate the ED within the transceiver ASIC, a synthesis for a 130 nm RF CMOS technology has been evaluated. The measurement results and the worst-case synthesis report for power and area consumption are used for a comparison to analog LAs in Table 2. The DFT-based implementation shows a power consumption of $P_\mathrm{total} = 1.5$ mW during calculation with a small area of 39 000 µm². It has to be pointed out that for the analog LA-based implementations the LA is an essential system block of the RX architecture used, unlike in a low-IF architecture. Furthermore, an LA would need a separate ADC in order to provide a digital RSSI value. Additionally, the integration into a transceiver ASIC would allow the reuse of the down conversion to BB during demodulation, reducing the power consumption of the proposed implementation by 6 %. With regard to the DR and error, the DFT-based digital implementation shows an adequate performance.

Conclusions
A receiver signal strength detection algorithm based on a discrete Fourier transform has been presented. It evaluates the single-bit output data stream of a ΔΣ modulator without previous signal manipulation like digital filtering. Furthermore, the ED has been verified in measurements using a custom ASIC with a low-IF RX. It features a DR of 65 dB with an error of less than 1.2 dB and provides an input-power-dependent run time by evaluating the derivative of the RSSI signal. The synthesized hardware implementation for a 130 nm RF CMOS technology showed a small area of 39 000 µm² and a power consumption of less than 1.5 mW. Consequently, an efficient DFT-based implementation for ED has been shown which can directly be used as a low-effort solution for RSSI calculation in low-IF architectures with ΔΣM.

Figure 2. FFT of single-bit ΔΣM output data stream from schematic simulations.

Figure 4. Block diagram of the discrete Fourier transform.

Figure 7. Development of the RSSI during simulation over run time for an input power of −70 dBm (solid) and −50 dBm (dashed).

Table 2. Comparison of the proposed digital algorithm with analog limiting amplifier based implementations.