2000 Int'l Conf. on Signal Processing Applications and Technology,
Dallas TX, October 16-19, 2000.
Jeffrey O. Coleman  James J. Alter  Dan Scholnik 

jeffc@alum.mit.edu  alter@radar.nrl.navy.mil  scholnik@nrl.navy.mil 
Naval Research Laboratory 
Washington, DC 
In this tutorial design-example paper, we outline design approaches for an FPGA-based DSP system for IQ downconversion of a 400 MHz wide IF signal centered at 750 MHz and sampled at 1 GHz. Downconversion through sampling eliminates explicit quadrature carriers. An FPGA clock rate of only 125 MHz requires polyphase (parallel) processing. Such downconversion systems are important in radar, satellite communication, terrestrial microwave communication, and base stations for wireless communication.
Figure 1 illustrates the required signal-processing steps with spectral sketches. Signals have "=" on the left, and sample rates are marked with triangular tics below the axis. Operations on signals are marked as spectral products (filters) or spectral convolutions and have input and output sample rates marked with tics above and below the axis, respectively. (The derivation strategy and this notation are presented in [1]. Another IQ-demodulator design is detailed in [2].)
Preprocessing comprises RF/IF filtering, sampling, and equalization of the RF/IF filters. Sampling aliases 100 MHz transition bands together and out of band. A "beamforming sum" follows in certain multichannel systems only [3,4] (otherwise ignore it). Then actual IQ downconversion begins.
Downconversion is simple. A digital image-suppression filter removes signal components originally at negative frequencies, then decimation halves the sampling rate and thus the computation rate of the preceding filter. The original 750 MHz signal lobe is shifted down in frequency to become the complex baseband output signal. The complex exponential required is just $(-1)^n$, so the multiplication is just sign alternation.
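The decimate-and-shift step can be checked numerically. The following is a minimal sketch, assuming the image-suppression filter has already left a single complex component; the 760 MHz test frequency and the helper names `complex_tone` and `decimate_and_shift` are illustrative choices, not taken from the paper.

```python
import cmath
import math

FS_MHZ = 1000.0  # sampler rate from the text: 1 GHz


def complex_tone(freq_mhz, count, fs_mhz):
    """Unit-amplitude complex exponential at freq_mhz, sampled at fs_mhz."""
    return [cmath.exp(2j * math.pi * freq_mhz / fs_mhz * n) for n in range(count)]


def decimate_and_shift(filtered):
    """Keep every second sample, then alternate signs (multiply by (-1)**m)."""
    kept = filtered[::2]
    return [v if m % 2 == 0 else -v for m, v in enumerate(kept)]


# A 760 MHz input component aliases to 760 - 1000 = -240 MHz after sampling.
filtered = complex_tone(760.0 - FS_MHZ, 64, FS_MHZ)
baseband = decimate_and_shift(filtered)

# At the 500 MHz output rate the same component sits 10 MHz above band center.
expected = complex_tone(10.0, 32, FS_MHZ / 2)
worst_error = max(abs(a - b) for a, b in zip(baseband, expected))
```

After decimation the retained lobe sits at half the new sample rate, which is why a shift by that amount reduces to sign alternation and needs no multipliers.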
Figures 2 and 3 show the responses of the actual filters, all referred to the sampler input.
The linear-phase, half-band image-suppression filter's coefficients were designed by starting with an imaginary-unit factor ($\pm j$, the sign depending on the Hilbert-filter convention) times a length-14 equiripple Hilbert filter, zero interpolating by two, replacing the zero center coefficient by unity, and halving the result. This gave an impulse response of length 27, a unit-length real part, and an odd imaginary part with seven nonzero coefficients on either side of center. The image-suppression filter's passband was implied by its stopband, and the latter was just the Hilbert filter's 400 MHz passband at 250 MHz.
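The construction steps above can be sketched as follows. Two things here are assumptions rather than the paper's design: the Hilbert prototype is a truncated ideal response (the paper used an equiripple design), and the sign of the imaginary unit is chosen so that the passband lands at minus one quarter of the sample rate, where a 750 MHz input aliases under 1 GHz sampling.

```python
import cmath
import math


def hilbert_prototype(length=14):
    """Placeholder even-length Hilbert taps: truncated ideal 1/(pi*m) at
    half-integer offsets m. This is NOT the equiripple design of the paper."""
    mid = (length - 1) / 2.0
    return [1.0 / (math.pi * (n - mid)) for n in range(length)]


def image_suppression_taps(hilbert, sign=-1):
    """Half-band complex filter from a real Hilbert prototype (sign is assumed)."""
    scaled = [sign * 1j * h for h in hilbert]      # imaginary and odd-symmetric
    stretched = []
    for i, c in enumerate(scaled):                 # zero-interpolate by two
        stretched.append(c)
        if i < len(scaled) - 1:
            stretched.append(0j)
    stretched[len(stretched) // 2] = 1.0 + 0j      # unity replaces the center zero
    return [0.5 * c for c in stretched]            # halve the result


def response(taps, cycles_per_sample):
    """Frequency response, taking the center tap as the time origin."""
    center = len(taps) // 2
    w = 2.0 * math.pi * cycles_per_sample
    return sum(c * cmath.exp(-1j * w * (n - center)) for n, c in enumerate(taps))


taps = image_suppression_taps(hilbert_prototype())
passband_gain = abs(response(taps, -0.25))   # -250 MHz at a 1 GHz sample rate
stopband_gain = abs(response(taps, +0.25))   # +250 MHz at a 1 GHz sample rate
```

With this crude prototype the stopband rejection is modest; an equiripple Hilbert design, as in the paper, would trade that ripple across the whole 400 MHz band.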
Combined passbands of the image-suppression filter, the RF/IF filters (measured), and the length-14 nonlinear-phase real FIR equalization filter approximate, by optimization of the latter, a pure delay [5] with 37 dB of minimized rms error.
Table 1 represents polyphase (block) processing clocked at one eighth the sample rate. The x's represent the sample stream input so far to an FIR filter, and the y rows represent filter outputs. Numbers in the table are coefficient indices for required terms, so each y row, the second for example, specifies one output as a sum of coefficient-scaled input samples. The even coefficient symmetry of this length-19 real filter gives it linear phase. Conjugate symmetry would do so for a complex filter.
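A coefficient-alignment table of this kind can be generated mechanically. The sketch below assumes zero-based coefficient indices and input offsets counted from the start of the current eight-sample block; those conventions, and the function names, are illustrative and may differ from the paper's Table 1.

```python
def alignment_table(num_taps=19, block=8):
    """For each output y_i of the current block, list which coefficient index
    multiplies each input offset (None where no term is needed)."""
    oldest = -(num_taps - 1)                  # oldest input offset any output needs
    columns = list(range(oldest, block))      # offsets 0..block-1 are the current block
    rows = []
    for i in range(block):
        # y_i = sum_k h[k] * x[i - k], so the input at offset c pairs with k = i - c.
        rows.append([i - c if 0 <= i - c < num_taps else None for c in columns])
    return columns, rows


def format_table(columns, rows):
    """Render the table as text, with '.' marking unused positions."""
    lines = ["x  " + " ".join(f"{c:>3d}" for c in columns)]
    for i, row in enumerate(rows):
        cells = " ".join(f"{k:>3d}" if k is not None else "  ." for k in row)
        lines.append(f"y{i} " + cells)
    return "\n".join(lines)


columns, rows = alignment_table()
table_text = format_table(columns, rows)
```

Each row holds the same run of 19 coefficient indices, slid one column to the right per output, which is what lets a single clock tick produce all eight outputs from registered input blocks.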

Either classic form in Fig. 4 can compute an output block from current and past input blocks using the Table 1 coefficient alignment. Computation is at eight-sample intervals, so each eight-sample delay is just a single clock tick, a one-register delay. Writing each required input-sample index as a multiple of 8 plus a residue modulo 8 permits the corresponding output term to be realized by delaying the element of the input block selected by the residue by a number of clock ticks (register delays) given by the multiple. Coefficient scaling follows the register delays in the direct form but precedes them in the transposed form.
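Both block forms can be modeled behaviorally and checked against a sample-at-a-time filter. This is a sketch under assumed conventions (zero-based indices, zero initial state, integer test data); the function names and the example coefficients are hypothetical, and the model ignores pipelining details of real hardware.

```python
def fir_reference(h, x):
    """Sample-at-a-time FIR with zero initial state: y[n] = sum_k h[k] * x[n-k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_block_direct(h, x, block=8):
    """Direct form: delay input blocks in registers, then scale and sum."""
    assert len(x) % block == 0
    depth = -(-(len(h) - 1) // block)             # ceiling: past blocks needed
    past = [[0] * block for _ in range(depth)]    # past[0] is one tick old
    out = []
    for start in range(0, len(x), block):
        blocks = [x[start:start + block]] + past  # blocks[d] is d ticks old
        for i in range(block):
            acc = 0
            for k, coeff in enumerate(h):
                q, r = divmod(i - k, block)       # i - k = block*q + r, with q <= 0
                acc += coeff * blocks[-q][r]      # element r, delayed -q ticks
            out.append(acc)
        past = blocks[:depth]
    return out


def fir_block_transposed(h, x, block=8):
    """Transposed form: scale each input first, then add into delayed sums."""
    assert len(x) % block == 0
    depth = (block - 1 + len(h) - 1) // block + 1
    pending = [[0] * block for _ in range(depth)]  # pending[q]: block q ticks ahead
    out = []
    for start in range(0, len(x), block):
        for r in range(block):
            for k, coeff in enumerate(h):
                q, i = divmod(r + k, block)        # output element i, q ticks later
                pending[q][i] += coeff * x[start + r]
        out.extend(pending.pop(0))                 # oldest block is now complete
        pending.append([0] * block)
    return out


h = [min(k, 18 - k) + 1 for k in range(19)]        # symmetric length-19 example
x = [(7 * n * n + 3 * n) % 11 - 5 for n in range(64)]
```

In both functions the `divmod` call is the index decomposition described above: the remainder picks the block element and the quotient counts register delays.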
Showing all the terms of Table 1 would make Fig. 4 unreadable. The terms shown (indicated in blue), the same for each form, illustrate the several structures available for sharing coefficient scaling in a linear-phase filter to save computation. (The negation in certain symmetries is not shown.) When inputs to a coefficient-sharing structure in the direct form are from the same "column" (register delay) of delayed inputs, the structure is the same in the transposed form. One term is a degenerate case. Likewise, when a scaled input in the transposed form drives outputs through sums in the same column, the structure is the same in the direct form. Inputs from the same input-data row in the direct form become multiple outputs in the transposed form (hinting at duality). When coefficient-scaling inputs in the direct form share neither row nor column, there is no shared structure in the transposed form. When the coefficient-scaling output in the transposed form drives sums sharing neither row nor column, there is no shared structure in the direct form. The choice of a set of shared structures to realize a filter is seldom unique.
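The saving from shared coefficient scaling is easiest to see in a sample-at-a-time filter, before block structure enters. The sketch below is a generic illustration with hypothetical names, not one of the specific structures of Fig. 4: it adds the two inputs that share a coefficient before scaling, and counts the scalings.

```python
def fir_plain(h, x):
    """Reference FIR with zero initial state, one scaling per coefficient."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_symmetric_shared(h, x):
    """Even-symmetric FIR: add paired inputs first, then scale once per pair."""
    taps = len(h)
    assert all(h[k] == h[taps - 1 - k] for k in range(taps)), "needs even symmetry"
    half = taps // 2

    def sample(n):
        return x[n] if 0 <= n < len(x) else 0

    out, scalings = [], 0
    for n in range(len(x)):
        acc = 0
        for k in range(half):
            acc += h[k] * (sample(n - k) + sample(n - (taps - 1 - k)))
            scalings += 1
        if taps % 2:                       # unpaired center coefficient
            acc += h[half] * sample(n - half)
            scalings += 1
        out.append(acc)
    return out, scalings


h = [min(k, 18 - k) + 1 for k in range(19)]   # symmetric length-19 example
x = [(5 * n + 2) % 13 - 6 for n in range(40)]
shared_out, scalings = fir_symmetric_shared(h, x)
```

For a length-19 filter this needs 10 scalings per output instead of 19; the block forms discussed above differ in which of those pairings can be shared.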
Lookup tables (LUTs) are easily built as in Fig. 5 from the latched four-input logic blocks that are paired as "slices" in Xilinx's Virtex FPGAs [6]. Other FPGAs are similar. So assume that LUTs, adders, and registers are available in all widths.
Figure 6 shows LUT scaling of data. Storing the required product, the bottom LUT would suffice for four-bit input. For wider inputs, four-bit pieces are scaled and results summed. Infinite-precision coefficients require different LUTs and rounded products, as in Fig. 7(a), while finite-precision coefficients use identical LUTs and shifted output words, as in Fig. 7(b). Product output can be narrower if it is rounded along with the final sum (case not shown).
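Both LUT scaling variants can be modeled in a few lines. This is a sketch assuming unsigned 12-bit data split into three four-bit pieces; the word width, the coefficient values, and the function names are illustrative assumptions, and sign handling is left out as in the text.

```python
def build_lut(coefficient, weight=1):
    """16-entry table: rounded product of the coefficient with each 4-bit value."""
    return [round(coefficient * value * weight) for value in range(16)]


def nibbles(word, pieces):
    """Split an unsigned word into four-bit pieces, least significant first."""
    return [(word >> (4 * i)) & 0xF for i in range(pieces)]


def scale_shared_lut(word, coefficient, pieces=3):
    """Integer coefficient: identical LUTs, outputs shifted by piece position."""
    lut = build_lut(coefficient)
    return sum(lut[nib] << (4 * i) for i, nib in enumerate(nibbles(word, pieces)))


def scale_distinct_luts(word, coefficient, pieces=3):
    """Real coefficient: each piece has its own LUT of separately rounded products."""
    luts = [build_lut(coefficient, 16 ** i) for i in range(pieces)]
    return sum(luts[i][nib] for i, nib in enumerate(nibbles(word, pieces)))


exact = scale_shared_lut(0xABC, 23)            # equals 23 * 0xABC
approx = scale_distinct_luts(0xABC, 0.7071)    # within rounding of 0.7071 * 0xABC
```

The shared-LUT version is exact because shifting commutes with an integer product, while the distinct-LUT version accumulates up to half a unit of rounding error per piece.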
Without linear-phase coefficient sharing in Fig. 4, each transposed-form sum adds scaled versions of some or all of the eight inputs. In Fig. 8 such a sum requires just one copy per input-word bit of each of two distinct LUTs. Summing these two LUT outputs and then shifting and adding those totals according to bit position is not shown.
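A bit-sliced sum of this kind can be sketched as below, including the shift-and-add step that the figure omits. The split of the eight inputs into two groups of four, the 10-bit unsigned word width, and the integer coefficients are assumptions made for illustration.

```python
def subset_sum_lut(coeffs):
    """16-entry LUT: entry addr holds the sum of coeffs[j] over set bits j of addr."""
    return [sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1)
            for addr in range(16)]


def bit_slice_address(words, bit):
    """Form a LUT address from the same bit position of each input word."""
    return sum(((w >> bit) & 1) << j for j, w in enumerate(words))


def transposed_sum(coeffs, words, width=10):
    """Sum of coeffs[j] * words[j] over eight unsigned inputs using two LUT types."""
    assert len(coeffs) == len(words) == 8
    lut_a = subset_sum_lut(coeffs[:4])     # serves inputs 0..3
    lut_b = subset_sum_lut(coeffs[4:])     # serves inputs 4..7
    total = 0
    for bit in range(width):               # one copy of each LUT per bit position
        pair = (lut_a[bit_slice_address(words[:4], bit)]
                + lut_b[bit_slice_address(words[4:], bit)])
        total += pair << bit               # weight by bit position
    return total


coeffs = [3, -5, 7, 11, -2, 13, 1, -9]
words = [513, 1023, 0, 77, 256, 900, 5, 640]
result = transposed_sum(coeffs, words)
```

Because every bit position uses the same two tables, the hardware cost is set by the word width rather than by the number of distinct products.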
An LUT's inputs need not come from just one data word as in Fig. 6 nor from just one bit position as in Fig. 8. In Fig. 9, six 10-bit words are linearly combined with 15 LUTs that do neither. The one-to-one mapping of data-input bits to LUT inputs is in general arbitrary, but minimizing the range of bit positions driving a single LUT, as in Fig. 9, minimizes required LUT output width.
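One way to model such a mixed mapping is sketched below. The particular assignment (sorting all word and bit pairs by bit position and grouping them in fours) is an assumption chosen to keep each LUT's bit-position range small; it is not claimed to match Fig. 9, and the coefficients are made up.

```python
def build_mixed_luts(coeffs, width=10, inputs_per_lut=4):
    """Assign every (bit, word) pair to exactly one LUT input and fill the tables."""
    pairs = sorted((bit, word) for word in range(len(coeffs)) for bit in range(width))
    luts = []
    for start in range(0, len(pairs), inputs_per_lut):
        group = pairs[start:start + inputs_per_lut]
        base = min(bit for bit, _ in group)       # common shift factored out
        table = [sum(coeffs[word] << (bit - base)
                     for j, (bit, word) in enumerate(group) if (addr >> j) & 1)
                 for addr in range(1 << len(group))]
        luts.append((group, base, table))
    return luts


def evaluate(luts, words):
    """Linear combination of unsigned words using only LUT reads, shifts, and adds."""
    total = 0
    for group, base, table in luts:
        addr = sum(((words[word] >> bit) & 1) << j
                   for j, (bit, word) in enumerate(group))
        total += table[addr] << base
    return total


coeffs = [5, -3, 9, 2, -7, 4]                     # six illustrative coefficients
words = [1023, 0, 511, 682, 341, 7]               # six unsigned 10-bit words
luts = build_mixed_luts(coeffs)
result = evaluate(luts, words)
bit_spans = [max(b for b, _ in g) - min(b for b, _ in g) for g, _, _ in luts]
```

Six 10-bit words supply 60 input bits, so four-input LUTs give the 15 tables mentioned in the text; here no table spans more than two adjacent bit positions, which bounds the extra output width each one needs.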
Sign considerations were omitted from this paper for simplicity, but they are straightforward. Scaling a complex input simply requires LUTs driven by both the real and imaginary parts of the input data, so in general twice as many input bits to the LUT system are needed. Complex outputs require LUTs of twice the width, with half the width dedicated to the real part of the output and half to the imaginary part.
While the full range of design considerations for a system such as this is beyond the scope of a short paper, we have outlined here an approach, methods, and techniques with which a wideband IQ downconverter or similar high-speed, multirate DSP system can be implemented in an FPGA using lookup-table approaches and a polyphase architecture.