2000 Int'l Conf. on Signal Processing Applications and Technology,
Dallas TX, October 16-19, 2000.
Jeffrey O. Coleman, James J. Alter, Dan Scholnik
Naval Research Laboratory
In this tutorial design-example paper, we outline design approaches for an FPGA-based DSP system for IQ downconversion of a 400 MHz wide IF signal centered at 750 MHz and sampled at 1 GHz. Downconversion through sampling eliminates explicit quadrature carriers. An FPGA clock rate of only 125 MHz requires polyphase (parallel) processing. Such downconversion systems are important in radar, satellite communication, terrestrial microwave communication, and base stations for wireless communication.
Figure 1 illustrates the required signal-processing steps with spectral sketches. Signals have ``='' on the left, and sample rates are marked with triangular tics below the axis. Operations on signals are marked as spectral products (filters) or spectral convolutions and have input and output sample rates marked with tics above and below the axis respectively. (The derivation strategy and this notation are presented in . Another IQ-demodulator design is detailed in .)
Preprocessing comprises RF/IF filtering, sampling, and equalization of the RF/IF filters. Sampling aliases 100 MHz transition bands together and out of band. A ``beamforming sum'' follows in certain multi-channel systems only [3,4] (else ignore). Then actual IQ downconversion begins.
Downconversion is simple. A digital image-suppression filter removes signal components originally at negative frequencies, then decimation halves the sampling rate and thus the computation rate of the preceding filter. The original 750 MHz signal lobe is shifted down in frequency to become the complex baseband output signal. The shift is by half the decimated sample rate, so the complex exponential required is just $(-1)^n$, and the multiplication is just sign alternation.
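The decimate-then-alternate step can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes the image-suppression filter has already left a single complex tone near the aliased band center at -250 MHz, and the function name `downconvert` and the 30 MHz test offset are illustrative choices.

```python
import cmath
import math

FS = 1.0e9  # sampler rate from the text, in Hz


def downconvert(filtered):
    """Decimate by two, then shift by half the new rate via sign alternation."""
    decimated = filtered[::2]
    return [(-1) ** n * v for n, v in enumerate(decimated)]


# Stand-in for the image-suppression filter output (an assumption):
# one complex tone 30 MHz above the aliased band center at -250 MHz.
offset = 30.0e6
tone = [cmath.exp(2j * math.pi * (-250.0e6 + offset) * n / FS) for n in range(64)]
baseband = downconvert(tone)

# The phase advance per output sample gives the frequency at the 500 MHz rate.
step = cmath.phase(baseband[1] / baseband[0])
freq = step * (FS / 2) / (2 * math.pi)
```

Under these assumptions `freq` comes out at the 30 MHz offset, so the band center has moved to zero frequency with no multiplier beyond a sign change.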
Figures 2 and 3 show the responses of the actual filters, all referred to the sampler input.
The linear-phase, halfband image-suppression filter's coefficients were designed by starting with an imaginary-unit multiple of a length-14 equiripple Hilbert filter, zero interpolating by two, replacing the zero center coefficient by unity, and halving the result. This gave an impulse response of length 27, a unit-length real part, and an odd imaginary part with seven nonzero coefficients on either side of center. The image-suppression filter's passband was implied by its stopband, and the latter was just the Hilbert filter's 400 MHz passband at 250 MHz.
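The construction steps can be sketched as follows. This is an illustration only: truncated ideal Hilbert coefficients stand in for the equiripple design, whose actual values are not given here, and the sign of the imaginary unit is chosen so that positive frequencies are kept, which may be the opposite of the convention used for the actual filter. The names `image_suppression_filter` and `response` are hypothetical.

```python
import cmath
import math


def image_suppression_filter(hilbert):
    """Interpolate j*hilbert by two, set the center tap to one, and halve."""
    taps = []
    for c in hilbert:
        taps += [1j * c, 0j]          # zero interpolation by two
    taps.pop()                        # drop the trailing zero: length 2*14 - 1 = 27
    taps[len(taps) // 2] = 1 + 0j     # replace the zero center coefficient by unity
    return [0.5 * t for t in taps]    # halve the result


def response(taps, omega):
    """Frequency response with the time origin at the center tap."""
    center = len(taps) // 2
    return sum(t * cmath.exp(-1j * omega * (p - center)) for p, t in enumerate(taps))


# Placeholder coefficients (an assumption): ideal Hilbert taps at odd offsets
# -13, -11, ..., 13 of the interpolated filter, truncated rather than equiripple.
hilbert14 = [2.0 / (math.pi * m) for m in range(-13, 14, 2)]
h = image_suppression_filter(hilbert14)

passband_gain = abs(response(h, math.pi / 2))    # +250 MHz at the 1 GHz rate
stopband_gain = abs(response(h, -math.pi / 2))   # -250 MHz at the 1 GHz rate
```

With these placeholder taps the gain is near unity at one band center and near zero at its image; an equiripple design would trade the truncation ripple for a controlled stopband over the full 400 MHz.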
Combined passbands of the image-suppression filter, the RF/IF filters (measured), and the length-14 nonlinear-phase real FIR equalization filter approximate, by optimization of the latter, a pure delay with -37 dB of minimized rms error.
Table 1 represents polyphase (block) processing clocked at one eighth the sample rate. The x's represent the sample stream input so far to an FIR filter, and the y rows represent filter outputs. Numbers in the table are coefficient indices for required terms, so the second y row, for example, means . The even coefficient symmetry of this length-19 real filter gives it linear phase. Conjugate symmetry would do so for a complex filter: .
Either classic form in Fig. 4 can compute an output block from current and past input blocks using the Table 1 coefficient alignment. Computation is at eight-sample intervals, so each eight-sample delay is just a single clock tick, a one-register delay. Writing , with a residue modulo 8, permits output term to be realized by delaying , element of the input block, by clock ticks (register delays). Scaling by follows the register delays in the direct form but precedes them in the transposed.
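The index bookkeeping behind this block processing can be sketched in a few lines. The notation here is an assumption introduced for illustration, not the paper's own: for output phase `i` and coefficient index `k`, the needed input sample is element `row` of the input block received `ticks` clock ticks earlier, where `i - k = -8 * ticks + row`. The sketch mimics the direct form only in the sense that delays precede scaling, and it checks the result against an ordinary sample-rate FIR filter.

```python
BLOCK = 8  # samples handled per clock tick


def direct_fir(h, x):
    """Reference: ordinary FIR filter at the full sample rate."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def block_fir(h, x):
    """Same outputs, computed one eight-sample block per clock tick."""
    blocks = [x[i:i + BLOCK] for i in range(0, len(x), BLOCK)]
    y = []
    for m in range(len(blocks)):              # m counts clock ticks
        for i in range(BLOCK):                # i is the output's position in its block
            acc = 0
            for k, coeff in enumerate(h):
                q, row = divmod(i - k, BLOCK)  # row is the residue modulo 8
                ticks = -q                     # number of register delays
                if m - ticks >= 0:
                    acc += coeff * blocks[m - ticks][row]
            y.append(acc)
    return y


# Length-19 even-symmetric test filter and a 40-sample test input (arbitrary values).
h19 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
x40 = [(7 * n) % 11 - 5 for n in range(40)]
```

For a length-19 filter the largest delay is three ticks (coefficient index 18 at output phase 0), so three registers per input-block element suffice in this sketch.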
Showing all the terms of Table 1 would make Fig. 4 unreadable. The terms shown (indicated in blue), the same for each form, show the several available structures for shared coefficient scaling in a linear-phase filter to save computation. (The negation in certain symmetries is not shown.) When inputs to a coefficient-sharing structure in the direct form are from the same ``column'' (register delay) of delayed inputs, as for , the structure is the same in the transposed form. The term is a degenerate case. Likewise, when a scaled input in the transposed form drives outputs through sums in the same column, as with , the structure is the same in the direct form. Inputs from the same input-data row in the direct form, as for , become multiple outputs in the transposed form (hinting at duality). When coefficient-scaling inputs in the direct form share neither row nor column, as with , there is no shared structure in the transposed form. When the coefficient-scaling output in the transposed form drives sums sharing neither row nor column, as with , there is no shared structure in the direct form. The choice of a set of shared structures to realize a filter is seldom unique.
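The saving that motivates these shared structures is easiest to see in a single-rate setting. The sketch below is a simplification and does not reproduce the block structures of Fig. 4: the two inputs that share a coefficient are added first and scaled once, so a length-19 even-symmetric filter needs 10 multiplications per output instead of 19. The function names are hypothetical.

```python
def direct_fir(h, x):
    """Reference: one multiplication per coefficient per output."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def folded_fir(half, x):
    """Even-symmetric, odd-length FIR using one multiplication per distinct coefficient.

    half holds the first (N + 1) / 2 coefficients; the rest follow by symmetry.
    """
    n_taps = 2 * len(half) - 1
    mid = len(half) - 1

    def sample(n):
        return x[n] if 0 <= n < len(x) else 0

    y = []
    for n in range(len(x)):
        acc = half[mid] * sample(n - mid)          # unpaired center coefficient
        for k in range(mid):
            # Add the two inputs sharing coefficient k, then scale once.
            acc += half[k] * (sample(n - k) + sample(n - (n_taps - 1 - k)))
        y.append(acc)
    return y


half10 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]           # arbitrary test values
full19 = half10 + half10[-2::-1]                   # mirror to the full length-19 filter
x_test = [(5 * n) % 13 - 6 for n in range(50)]
```

In the block forms of Fig. 4 the same pre-add or post-fan-out idea applies, but whether a given pair can share hardware depends on the row and column alignment described above.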
Lookup tables (LUTs) are easily built as in Fig. 5 from the latched four-input logic blocks that are paired as ``slices'' in Xilinx's Virtex FPGAs. Other FPGAs are similar, so assume that LUTs, adders, and registers are available in all widths.
Figure 6 shows LUT scaling of data. Storing the required product, the bottom LUT would suffice for four-bit input. For wider inputs, four-bit pieces are scaled and results summed. Infinite-precision coefficients require different LUTs and rounded products, as in Fig. 7-(a), while finite-precision coefficients use identical LUTs and shifted output words as in Fig. 7-(b). Product output can be narrower if it is rounded along with the final sum (case not shown).
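The finite-precision case can be modeled in software as follows. This is a behavioral sketch under stated assumptions, not a hardware description: data are unsigned, the coefficient is an integer, and the names `make_lut` and `lut_scale` are illustrative. Each four-bit piece of the input addresses an identical 16-entry table of stored products, and the table outputs are shifted by the piece's bit position and summed.

```python
def make_lut(coeff):
    """16-entry table storing coeff times every possible four-bit value."""
    return [coeff * v for v in range(16)]


def lut_scale(lut, x, width):
    """Scale an unsigned width-bit word using one table lookup per four-bit piece."""
    total = 0
    for shift in range(0, width, 4):
        nibble = (x >> shift) & 0xF        # four-bit piece of the input word
        total += lut[nibble] << shift      # shifted output word, then summed
    return total


lut37 = make_lut(37)                       # 37 is an arbitrary example coefficient
product = lut_scale(lut37, 0xABC, 12)      # three lookups for a 12-bit word
```

Because the coefficient is exact, every piece can use the same table contents and only the output shift differs, which corresponds to the identical-LUT arrangement of Fig. 7-(b).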
Without linear-phase coefficient sharing in Fig. 4, each transposed-form sum adds scaled versions of some or all of the eight inputs. In Fig. 8 such a sum requires just one copy per input-word bit of each of two distinct LUTs. Summing these two LUT outputs and then shifting and adding those totals according to bit position is not shown.
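The arrangement just described can be modeled behaviorally. The sketch assumes unsigned input words and integer coefficients, and the function names are hypothetical. Two distinct 16-entry tables cover inputs 0 to 3 and inputs 4 to 7; at each bit position the tables are addressed by that bit of each of their four inputs, the two outputs are summed, and the totals are shifted by bit position and added.

```python
def make_da_lut(coeffs4):
    """Table entry addr holds the sum of the coefficients whose address bit is set."""
    return [sum(c for j, c in enumerate(coeffs4) if (addr >> j) & 1)
            for addr in range(16)]


def da_sum(coeffs, inputs, width):
    """Sum of eight scaled unsigned inputs using two LUTs per input-word bit."""
    lut_lo = make_da_lut(coeffs[:4])
    lut_hi = make_da_lut(coeffs[4:])
    total = 0
    for b in range(width):                 # one copy of each LUT per bit position
        addr_lo = sum(((inputs[j] >> b) & 1) << j for j in range(4))
        addr_hi = sum(((inputs[4 + j] >> b) & 1) << j for j in range(4))
        total += (lut_lo[addr_lo] + lut_hi[addr_hi]) << b
    return total


coeffs8 = [3, -5, 7, 11, -13, 17, 19, -23]             # arbitrary test coefficients
inputs8 = [513, 7, 1023, 0, 300, 85, 682, 341]         # arbitrary 10-bit test words
result = da_sum(coeffs8, inputs8, 10)
```

The table contents depend only on the coefficients, not on the bit position, which is why the same two tables are simply replicated across the input word.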
An LUT's inputs need not come from just one data word as in Fig. 6 nor from just one bit position as in Fig. 8. In Fig. 9, six 10-bit words are linearly combined with 15 LUTs that do neither. The one-to-one mapping of data-input bits to LUT inputs is in general arbitrary, but minimizing the range of bit positions driving a single LUT, as in Fig. 9, minimizes required LUT output width.
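A software model makes the arbitrariness of the mapping concrete. The grouping below is one possible assignment, chosen here as an assumption and not taken from Fig. 9: the 60 input bits of six unsigned 10-bit words are sorted by bit position and dealt out four at a time to 15 tables, so that no table sees bits more than one position apart. Each table stores sums relative to its lowest bit position, which keeps its output narrow.

```python
def make_groups(n_words, width, lut_inputs=4):
    """Assign every (bit, word) pair to a LUT, keeping bit positions close together."""
    pairs = [(b, w) for b in range(width) for w in range(n_words)]
    return [pairs[i:i + lut_inputs] for i in range(0, len(pairs), lut_inputs)]


def make_lut(coeffs, group):
    """Table for one group, stored relative to the group's lowest bit position."""
    base = min(b for b, _ in group)
    table = [sum(coeffs[w] << (b - base)
                 for j, (b, w) in enumerate(group) if (addr >> j) & 1)
             for addr in range(1 << len(group))]
    return base, table


def combine(coeffs, words, groups):
    """Linear combination of the words, one table lookup per group."""
    total = 0
    for group in groups:
        base, table = make_lut(coeffs, group)
        addr = sum(((words[w] >> b) & 1) << j for j, (b, w) in enumerate(group))
        total += table[addr] << base
    return total


coeffs6 = [5, -3, 9, 2, -7, 4]                 # arbitrary test coefficients
words6 = [1023, 512, 341, 0, 77, 900]          # arbitrary 10-bit test words
groups = make_groups(6, 10)
result = combine(coeffs6, words6, groups)
```

Any other one-to-one assignment of bits to table inputs gives the same total; only the table contents and the required output widths change.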
Sign considerations were omitted from this paper for simplicity, but they are straightforward. Scaling a complex input simply requires LUTs driven by both real and imaginary parts of the input data. In general twice as many input bits to the LUT system are needed. Complex outputs require LUTs of twice the width, with half the width dedicated to the real part of the output and half dedicated to the imaginary part.
While the full range of design considerations for a system such as this is beyond the scope of a short paper, we have outlined an approach and techniques by which a wideband IQ downconverter or similar high-speed, multirate DSP system can be implemented in an FPGA using lookup tables and a polyphase architecture.