2000 Int'l Conf. on Signal Processing Applications and Technology,
Dallas TX, October 16-19, 2000.
| Jeffrey O. Coleman | James J. Alter | Dan Scholnik |
|---|---|---|
| jeffc@alum.mit.edu | alter@radar.nrl.navy.mil | scholnik@nrl.navy.mil |
| Naval Research Laboratory |
| Washington, DC |
In this tutorial design-example paper, we outline design approaches for an FPGA-based DSP system for IQ downconversion of a 400 MHz wide IF signal centered at 750 MHz and sampled at 1 GHz. Downconversion through sampling eliminates explicit quadrature carriers. An FPGA clock rate of only 125 MHz requires polyphase (parallel) processing. Such downconversion systems are important in radar, satellite communication, terrestrial microwave communication, and base stations for wireless communication.
Figure 1 illustrates the required signal-processing steps with spectral sketches. Signals have "=" on the left, and sample rates are marked with triangular tics below the axis. Operations on signals are marked as spectral products (filters) or spectral convolutions and have input and output sample rates marked with tics above and below the axis respectively. (The derivation strategy and this notation are presented in [1]. Another IQ-demodulator design is detailed in [2].)
Preprocessing comprises RF/IF filtering, sampling, and equalization of the RF/IF filters. Sampling aliases the 100 MHz transition bands together and out of band. A "beamforming sum" follows in certain multi-channel systems only [3,4]; otherwise it can be ignored. Then the actual IQ downconversion begins.
Downconversion is simple. A digital image-suppression filter removes signal components originally at negative frequencies, then decimation halves the sampling rate and thus the computation rate of the preceding filter. The original 750 MHz signal lobe is then shifted down in frequency to become the complex baseband output signal. The complex exponential required for this shift takes only the values ±1, so the multiplication is just sign alternation.
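The sign-alternation claim can be checked numerically. The sketch below assumes the shift is applied at the decimated 500 MHz rate, where the lobe originally at 750 MHz sits at half the sample rate, so the required exponential is exp(jπn) = (-1)^n. The function names and the 10 MHz test offset are illustrative choices, not taken from the paper.

```python
import cmath
import math

FS_DECIMATED = 500e6  # assumed: sample rate after decimating 1 GHz by two


def complex_tone(freq_hz, n_samples, fs=FS_DECIMATED):
    """Complex exponential at freq_hz sampled at rate fs."""
    return [cmath.exp(2j * math.pi * freq_hz * n / fs) for n in range(n_samples)]


def shift_by_half_rate(samples):
    """Multiply by exp(j*pi*n) = (-1)**n: negate every odd-indexed sample."""
    return [-s if n % 2 else s for n, s in enumerate(samples)]


# Hypothetical test tone 10 MHz above the lobe center, which lies at
# -250 MHz (equivalently +250 MHz) once the sample rate is 500 MHz.
tone = complex_tone(-240e6, 64)
baseband = shift_by_half_rate(tone)
expected = complex_tone(10e6, 64)
max_error = max(abs(a - b) for a, b in zip(baseband, expected))
```

No multiplier is needed: the shift costs only a conditional negation of alternate samples.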
Figures 2 and 3 show the responses of the actual filters, all referred to the sampler input.
The linear-phase, halfband image-suppression filter's coefficients were designed by starting with a length-14 equiripple Hilbert filter scaled by an imaginary constant, zero interpolating by two, replacing the zero center coefficient by unity, and halving the result. This gave an impulse response of length 27, with a unit-length real part and an odd imaginary part with seven nonzero coefficients on either side of center. The image-suppression filter's passband was implied by its stopband, and the latter was just the Hilbert filter's 400 MHz passband at 250 MHz.
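The construction steps can be sketched as follows. This is only an illustration under stated assumptions: the Hilbert coefficients are a truncated ideal response standing in for the paper's equiripple design, and the scale factor of -j is an assumed sign convention chosen so that the stopband falls at +250 MHz (one quarter of the 1 GHz sample rate). The function names are hypothetical.

```python
import cmath
import math


def image_suppression_taps(hilbert_taps, scale=-1j):
    """Build complex halfband taps from an even-length real Hilbert filter."""
    assert len(hilbert_taps) % 2 == 0
    taps = []
    for i, c in enumerate(hilbert_taps):
        taps.append(scale * c)        # imaginary-scaled Hilbert coefficient
        if i < len(hilbert_taps) - 1:
            taps.append(0j)           # zero interpolation by two
    center = len(taps) // 2           # this slot holds an inserted zero
    taps[center] = 1 + 0j             # replace the zero center by unity
    return [t / 2 for t in taps]      # halve the result


def response(taps, omega):
    """Frequency response at omega (radians/sample), delay-compensated."""
    center = len(taps) // 2
    return sum(t * cmath.exp(-1j * omega * (p - center))
               for p, t in enumerate(taps))


# Stand-in (assumed) length-14 Hilbert filter: truncated ideal response.
hilbert = [1 / (math.pi * (n + 0.5)) for n in range(-7, 7)]
taps = image_suppression_taps(hilbert)
```

With this stand-in, the response is small near +π/2 and close to unity near -π/2, matching the halfband behavior described above; an equiripple design would give much better stopband rejection.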
Combined passbands of the image-suppression filter, the RF/IF filters (measured), and the length-14 nonlinear-phase real FIR equalization filter approximate, by optimization of the latter, a pure delay [5] with -37 dB of minimized rms error.
Table 1 represents polyphase (block) processing clocked at one eighth the sample rate. The x's represent the sample stream input so far to an FIR filter, and the y rows represent filter outputs. Numbers in the table are coefficient indices for required terms, so each y row lists the coefficients that weight the inputs contributing to that output. The even coefficient symmetry of this length-19 real filter gives it linear phase. Conjugate symmetry would do so for a complex filter.
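Block processing at one eighth the sample rate can be illustrated in software. The following is a minimal sketch, not the paper's hardware structure: each pass of the outer loop stands for one clock tick that consumes an eight-sample input block and produces an eight-sample output block from the current and past input blocks. The coefficient values and function names are hypothetical; only the length (19) and the even symmetry come from the text.

```python
BLOCK = 8  # samples per clock tick: 1 GHz sample rate / 125 MHz clock


def direct_fir(h, x):
    """Reference sample-rate FIR: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def block_fir(h, x):
    """Same filter computed one eight-sample block per clock tick."""
    assert len(x) % BLOCK == 0
    max_ticks = -(-(len(h) - 1) // BLOCK)  # ceil((len(h) - 1) / BLOCK)
    # history[q] is the input block that arrived q clock ticks ago.
    history = [[0] * BLOCK for _ in range(max_ticks + 1)]
    y = []
    for m in range(len(x) // BLOCK):
        history = [list(x[m * BLOCK:(m + 1) * BLOCK])] + history[:-1]
        for i in range(BLOCK):             # output element within the block
            acc = 0
            for k, coeff in enumerate(h):
                # i - k = BLOCK * q_neg + r with 0 <= r < BLOCK and q_neg <= 0
                q_neg, r = divmod(i - k, BLOCK)
                acc += coeff * history[-q_neg][r]
            y.append(acc)
    return y


# Hypothetical length-19 coefficients with even symmetry (linear phase).
h = [1, -2, 3, -4, 5, -6, 7, -8, 9, 10, 9, -8, 7, -6, 5, -4, 3, -2, 1]
x = [(7 * n * n + 3 * n) % 31 - 15 for n in range(48)]
```

The `divmod` call splits each sample offset into a whole number of clock ticks and a position within the block, which is what lets every delay in the structure be a register delay.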
Either classic form in Fig. 4 can compute an output block from current and past input blocks using the Table 1 coefficient alignment. Computation is at eight-sample intervals, so each eight-sample delay is just a single clock tick, a one-register delay. Splitting a sample index into a multiple of eight plus a residue modulo 8 permits the corresponding output term to be realized by delaying one element of the input block, the one selected by the residue, by a whole number of clock ticks (register delays). Scaling by the coefficient follows the register delays in the direct form but precedes them in the transposed form.
Showing all the terms of Table 1 would make Fig. 4 unreadable. The terms shown (indicated in blue), the same for each form, illustrate the several structures available for sharing coefficient scaling in a linear-phase filter to save computation. (The negation in certain symmetries is not shown.) When inputs to a coefficient-sharing structure in the direct form come from the same "column" (register delay) of delayed inputs, the structure is the same in the transposed form. One of the terms shown is a degenerate case. Likewise, when a scaled input in the transposed form drives outputs through sums in the same column, the structure is the same in the direct form. Inputs from the same input-data row in the direct form become multiple outputs in the transposed form (hinting at duality). When coefficient-scaling inputs in the direct form share neither row nor column, there is no shared structure in the transposed form. When the coefficient-scaling output in the transposed form drives sums sharing neither row nor column, there is no shared structure in the direct form. The choice of a set of shared structures to realize a filter is seldom unique.
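The saving from shared coefficient scaling can be seen in a small sample-rate sketch. It shows only the simplest direct-form sharing, in which the two inputs weighted by equal coefficients are added before a single scaling; it does not reproduce the block structures of Fig. 4. The function names and coefficient values are hypothetical.

```python
def fir_output_plain(h, x, n):
    """One output sample with one multiplication per coefficient."""
    return sum(h[k] * x[n - k] for k in range(len(h)))


def fir_output_shared(h, x, n):
    """Same output for even-symmetric h, pre-adding inputs that share a coefficient.

    Returns (output, number_of_multiplications). Requires n >= len(h) - 1.
    """
    length = len(h)
    acc = 0
    multiplies = 0
    for k in range(length // 2):
        acc += h[k] * (x[n - k] + x[n - (length - 1 - k)])
        multiplies += 1
    if length % 2:  # unpaired center coefficient of an odd-length filter
        acc += h[length // 2] * x[n - length // 2]
        multiplies += 1
    return acc, multiplies


# Hypothetical length-19 even-symmetric coefficients and test data.
h = [1, -2, 3, -4, 5, -6, 7, -8, 9, 10, 9, -8, 7, -6, 5, -4, 3, -2, 1]
x = [(5 * n * n + n) % 29 - 14 for n in range(40)]
```

For a length-19 filter this uses 10 multiplications per output instead of 19.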
Lookup tables (LUTs) are easily built as in Fig. 5 from the latched four-input logic blocks that are paired as "slices" in Xilinx's Virtex FPGAs [6]. Other FPGAs are similar, so we assume that LUTs, adders, and registers are available in all widths.
Figure 6 shows LUT scaling of data. Storing the required
product, the bottom LUT would suffice for four-bit input. For wider
inputs, four-bit pieces are scaled and results summed.
Infinite-precision coefficients require different LUTs and rounded
products, as in Fig. 7-(a), while finite-precision
coefficients use identical LUTs and shifted output words as in
Fig. 7-(b). Product output
can be narrower if it is
rounded along with the final sum (case not shown).
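The finite-precision case, with identical LUTs and shifted output words, can be sketched in a few lines. This assumes an integer coefficient and unsigned input data (sign handling is omitted, as in the paper); the function names and the coefficient value are illustrative.

```python
def make_scaling_lut(coeff):
    """16-entry table holding coeff * v for every four-bit value v."""
    return [coeff * v for v in range(16)]


def lut_scale(lut, x, width_bits):
    """Scale unsigned x by the LUT's coefficient, four bits at a time."""
    total = 0
    for shift in range(0, width_bits, 4):
        nibble = (x >> shift) & 0xF      # four-bit piece of the input word
        total += lut[nibble] << shift    # identical LUT, shifted output word
    return total


lut = make_scaling_lut(37)               # hypothetical integer coefficient
product = lut_scale(lut, 0xABC, 12)      # 12-bit input uses three lookups
```

In hardware the three lookups would be three copies of the same LUT operating in parallel, followed by an adder tree.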
Without linear-phase coefficient sharing in Fig. 4, each transposed-form sum adds scaled versions of some or all of the eight inputs. In Fig. 8 such a sum requires just one copy per input-word bit of each of two distinct LUTs. Summing these two LUT outputs and then shifting and adding those totals according to bit position is not shown.
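The bit-sliced sum described above is commonly known as distributed arithmetic, and it can be sketched as follows. The sketch assumes integer coefficients and unsigned inputs; the coefficient values and function names are hypothetical. Each of the two LUTs is addressed by the same bit position of four different inputs, and the shift-and-add step that Fig. 8 omits is included here.

```python
def make_group_lut(coeffs4):
    """16-entry table: sum of the coefficients selected by a four-bit address."""
    return [sum(c for b, c in enumerate(coeffs4) if (addr >> b) & 1)
            for addr in range(16)]


def bit_sliced_sum(coeffs8, inputs8, width_bits):
    """Compute sum_i coeffs8[i] * inputs8[i] using two distinct LUTs per bit."""
    lut_lo = make_group_lut(coeffs8[:4])   # serves inputs 0..3
    lut_hi = make_group_lut(coeffs8[4:])   # serves inputs 4..7
    total = 0
    for bit in range(width_bits):          # one copy of each LUT per bit
        addr_lo = sum(((inputs8[i] >> bit) & 1) << i for i in range(4))
        addr_hi = sum(((inputs8[4 + i] >> bit) & 1) << i for i in range(4))
        total += (lut_lo[addr_lo] + lut_hi[addr_hi]) << bit
    return total


coeffs = [3, -5, 7, 11, -13, 17, 19, -23]          # hypothetical
inputs = [1023, 0, 512, 341, 682, 1, 77, 900]      # unsigned 10-bit words
result = bit_sliced_sum(coeffs, inputs, 10)
```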
An LUT's inputs need not come from just one data word as in Fig. 6 nor from just one bit position as in Fig. 8. In Fig. 9, six 10-bit words are linearly combined with 15 LUTs that do neither. The one-to-one mapping of data-input bits to LUT inputs is in general arbitrary, but minimizing the range of bit positions driving a single LUT, as in Fig. 9, minimizes required LUT output width.
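A general bit-to-LUT mapping can be sketched in the same way. The mapping below is an assumed one and is not claimed to match Fig. 9: the 60 input bits of six 10-bit words are ordered by bit position and cut into 15 groups of four, so that no LUT sees more than two adjacent bit positions. Inputs are unsigned, coefficients are hypothetical integers, and each table entry here includes its bit weights directly, whereas hardware would store entries relative to the group's lowest bit position and shift the output.

```python
def build_luts(coeffs, groups):
    """One table per group of (word, bit) pairs; entries sum the weighted bits."""
    luts = []
    for group in groups:
        weights = [coeffs[word] * (1 << bit) for word, bit in group]
        luts.append([sum(w for j, w in enumerate(weights) if (addr >> j) & 1)
                     for addr in range(1 << len(group))])
    return luts


def lut_combine(luts, groups, words):
    """Linear combination of the words as a sum of one lookup per LUT."""
    total = 0
    for lut, group in zip(luts, groups):
        addr = sum(((words[word] >> bit) & 1) << j
                   for j, (word, bit) in enumerate(group))
        total += lut[addr]
    return total


N_WORDS, WIDTH = 6, 10
coeffs = [5, -3, 7, 2, -6, 4]                      # hypothetical
all_bits = [(word, bit) for bit in range(WIDTH) for word in range(N_WORDS)]
groups = [all_bits[i:i + 4] for i in range(0, len(all_bits), 4)]
luts = build_luts(coeffs, groups)
words = [1023, 0, 513, 77, 900, 256]
combined = lut_combine(luts, groups, words)
```

Any partition of the input bits into groups of four gives the same result; keeping each group's bit positions close together is what limits the range of values a table must hold.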
Sign considerations were omitted from this paper for simplicity, but they are straightforward. Scaling a complex input simply requires LUTs driven by both real and imaginary parts of the input data. In general twice as many input bits to the LUT system are needed. Complex outputs require LUTs of twice the width, with half the width dedicated to the real part of the output and half dedicated to the imaginary part.
While the full range of design considerations for a system such as this is beyond the scope of a short paper, we have outlined an approach and a set of techniques by which a wideband IQ downconverter or similar high-speed, multi-rate DSP system can be implemented in an FPGA using lookup tables and a polyphase architecture.