Block-Processing Approach to Fractional Sample Rate Conversion with Adjustable Timing

Alexandra Groth, Heinz G. Göckler
Digital Signal Processing Group
Ruhr-Universität Bochum
Universitätsstr. 150, D-44800 Bochum, Germany
groth@nt.ruhr-uni-bochum.de
ABSTRACT
In the past, various FIR block-processing structures (e.g. for fractional sample rate conversion) have been published.
However, the timing interval of all proposed structures is constrained to integer multiples of Ts=MTi (Ti: timing interval
of the input signal). Increasing Ts by a factor p ∈ ℕ concurrently raises the number of multipliers by the same factor p. In this
paper, two novel structures are proposed that enable a higher degree of freedom for the optimal choice of the timing
interval. As a result, a refined trade-off between timing interval (system clock) and hardware expenditure (chip area) is
possible.
INTRODUCTION
The reduction of power consumption in digital systems is of increasing interest, especially in satellite communications.
One common method is the power supply voltage scaling technique [1]. To this end, the system has to be clocked at a
multiple of the prescribed sampling interval. Hence, parallel processing is of prime importance for this purpose as well
as for overcoming technological speed constraints.
Fig. 1. Fractional sample rate converter (FSRC): input X(zi) at rate fi, L-fold upsampler, filter H(z) at rate Lfi=Mfo, M-fold downsampler, output Y(zo) at rate fo.
As a general example, the derivation of efficient block-processing structures for fractional sample rate converters
(FSRC) is revisited. The system theoretic approach is a cascade connection of an L-fold upsampler, an FIR-filter H(z) of
length N and an M-fold downsampler (Fig. 1). As a consequence, all filter operations have to be performed at the
highest rate Lfi=Mfo.
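The cascade of Fig. 1 can be written down literally as a reference model. The following is a minimal sketch under stated assumptions: the function name fsrc_cascade and the example coefficients are hypothetical and chosen only for illustration; the sketch makes no attempt at efficiency and performs every multiplication at the high rate Lfi=Mfo.

```python
def fsrc_cascade(x, h, L, M):
    """Fig. 1 taken literally: L-fold upsampling, FIR filtering, M-fold downsampling."""
    # L-fold upsampler: insert L-1 zeros after every input sample.
    up = []
    for sample in x:
        up.append(sample)
        up.extend([0] * (L - 1))
    # FIR filter H(z) of length N = len(h), running at the high rate L*fi = M*fo.
    filtered = []
    for t in range(len(up)):
        acc = 0
        for q, coeff in enumerate(h):
            if t - q >= 0:
                acc += coeff * up[t - q]
        filtered.append(acc)
    # M-fold downsampler: keep every M-th filter output.
    return filtered[::M]


x = [1, 2, 3, 4, 5, 6]
h = [1, 1, 1, 1]  # illustrative zero-order hold of length L, not a design from the paper
y = fsrc_cascade(x, h, L=4, M=3)
print(y)  # [1, 1, 2, 3, 4, 4, 5, 6]
```

Of the N products formed per high-rate sample, only those that meet a non-zero upsampler output contribute, and only every Mth filter output is kept; these are the superfluous operations that the block-processing structures avoid.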
Fig. 2. Block-processing structure operating with a timing interval pTs=pMTi: 1-to-pM blocking (input delay chain zi-1 with pM-fold downsamplers), pL × pM MIMO LTI system, pL-to-1 unblocking (pL-fold upsamplers with output delay chain zo-1).
Recently, various block-processing approaches to FSRC have been published [2-7]. As a result, efficient structures (Fig.
2) are known, where filtering is performed entirely at a fixed sub-Nyquist rate fs/p=fi/(pM). Hence, the timing interval
is increased by a factor pLM, while the number of multipliers rises to pN (Fig. 3).
Fig. 3. Hardware expenditure (ideal result, L=4 and M=3). Vertical axis: number of multipliers (N to 4N). Horizontal axis: timing interval (multiples of Ti, To and Ts). Curves: system theoretic approach, current block-processing structure, new structure 1, new structure 2.
Since integer multiples of Ts are often a poor approximation of the optimal timing interval dictated by the semiconductor
fabrication process, the aim of this contribution is to deduce structures which allow more flexibility in the choice of timing. As a
consequence, the number of required multipliers and adders can be reduced in compliance with the relationship between
timing interval and hardware expenditure (Fig. 3).
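To make the gain in flexibility concrete, the admissible timing intervals of the three families of structures can be enumerated. The sketch below relies only on the relations stated in the text (Ts=MTi and Lfi=Mfo, hence To=(M/L)Ti); the function names and the dictionary keys are hypothetical, and all intervals are expressed in units of Ti.

```python
from fractions import Fraction


def admissible_timing_intervals(L, M, max_in_Ti):
    """Timing intervals (in units of Ti) offered by each family, up to max_in_Ti."""
    Ti = Fraction(1)
    To = Fraction(M, L) * Ti  # follows from L*fi = M*fo
    Ts = M * Ti               # Ts = M*Ti as defined in the abstract

    def multiples(base):
        count = int(max_in_Ti / base)  # number of multiples not exceeding the limit
        return [k * base for k in range(1, count + 1)]

    return {
        "current (p*Ts)": multiples(Ts),
        "structure 1 (alpha*To)": multiples(To),
        "structure 2 (alpha*Ti)": multiples(Ti),
    }


def current_structure_multipliers(N, p):
    """Multiplier count of the known block-processing structure at timing interval p*Ts."""
    return p * N


intervals = admissible_timing_intervals(L=4, M=3, max_in_Ti=6)
for name, values in intervals.items():
    print(name, [str(v) for v in values])
```

For L=4 and M=3, the known structures offer only 3Ti and 6Ti within this range, whereas multiples of To and Ti provide a considerably finer grid.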
Fig. 4. Fractional sample rate converter operating at fo (M>1, N/L ∈ ℕ) [8]: input delay chain zi-1 with digital sample-and-hold, M/L resampling in each branch, and time-multiplexed coefficient sets [h0 hL-1 ... h1], [hL h2L-1 ... hL+1], ..., [hN-L hN-1 ... hN-L+1].
A previous proposal for an FSRC operating at the sample rate fo (Fig. 4) has been presented in [8] (a transposed version
operating at fi is given in [9]). Since this structure is not optimal with respect to write access to the digital sample-and-hold, and
since no versions with timing intervals αTo, α ∈ ℕ \ {1}, exist, efficient structures with timing intervals αTo and αTi are deduced in Section 3.
Beforehand, efficient interpolators and decimators are reviewed (Section 2).
EFFICIENT TIME-VARYING REALIZATION OF INTERPOLATORS AND DECIMATORS
Starting from the system theoretic approach, we derive efficient structures for an interpolator and a decimator, both
operating at the highest occurring sampling rate.
Interpolator
First, let us consider the system theoretic approach to interpolation by an integer factor L (Fig. 5a). Due to the L-fold
upsampling, only every Lth filter input sample gives rise to a non-zero multiplication. A structure avoiding these
superfluous multiplications, deduced from a polyphase interpolator [10], is depicted in Fig. 5b. By sharing (time-multiplexing) each multiplier between L coefficients, this structure requires ⌈N/L⌉ hardware multipliers and
⌈N/L⌉ − 1 adders, with all hardware being operated at the output sampling rate fo except for the ⌈N/L⌉ − 1 delays.
As a consequence, this interpolator represents a time-varying filter with the reduced time interval To for memory access.
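The time-multiplexing idea can be illustrated by a small behavioural model. This is a sketch in the spirit of Fig. 5b, not a transcription of it: the function names are hypothetical, the coefficient vector is zero-padded to a multiple of L, and the ceiling in the multiplier count is an assumption about the intended rounding.

```python
import math


def interpolate_naive(x, g, L):
    """System theoretic approach: L-fold upsampling followed by the full filter G."""
    up = []
    for sample in x:
        up.append(sample)
        up.extend([0] * (L - 1))
    return [sum(g[q] * up[t - q] for q in range(len(g)) if t - q >= 0)
            for t in range(len(up))]


def interpolate_time_multiplexed(x, g, L):
    """ceil(N/L) multipliers, each cycling through L coefficients per input sample."""
    taps = math.ceil(len(g) / L)                 # number of hardware multipliers
    g = list(g) + [0] * (taps * L - len(g))      # zero-pad to a multiple of L
    delay_line = [0] * taps                      # delay_line[v] holds x[i - v]
    y = []
    for sample in x:
        # The delay chain is shifted once per input sample (rate fi).
        delay_line = [sample] + delay_line[:-1]
        # L outputs are produced per input sample (rate fo); multiplier v
        # uses coefficient g[v*L + phase] in output phase `phase`.
        for phase in range(L):
            y.append(sum(g[v * L + phase] * delay_line[v] for v in range(taps)))
    return y


x = [1, 2, 3, 4, 5]
g = [1, 2, 3, 4, 5, 6, 7]
assert interpolate_time_multiplexed(x, g, L=3) == interpolate_naive(x, g, L=3)
```

Both functions deliver the same output sequence, but the second one forms only ⌈N/L⌉ products per output sample and never multiplies by an inserted zero.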
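Fig. 5. a) Interpolator (system theoretic approach): input X(zi), L-fold upsampler, filter G(zo), output Y(zo) at rate fo. b) Efficient realization operating at fo: input delay chain zi-1 and multipliers time-multiplexed between the coefficient groups g0 ... gL-1, gL ... g2L-1, g2L ... g3L-1, ..., up to gN-1.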
Starting from a transposed direct form realization of the filter G(zo), we obtain Fig. 5c. In contrast to the previous
realization, this structure is inefficient due to the (⌈N/L⌉ − 1)L delays operating at the high output sample frequency fo.
Fig. 5. c) Inefficient realization operating at fo: time-multiplexed coefficient groups g0 ... gL-1, gL ... g2L-1, ..., up to gN-1, with delays zo-L between the branches.
Decimator
The same considerations can be applied to decimation by an integer factor M. Hence, the final structure (Fig. 6b) still
operates at the input frequency fi while avoiding any superfluous operation. Note that the result is again a time-varying filter,
which requires only ⌈N/M⌉ hardware multipliers and ⌈N/M⌉ adders, but ⌈N/M⌉M − M + 1 delays. The structure
derived from the transposed direct form realization of H(zi) is depicted in Fig. 6c. Although it requires 2⌈N/M⌉ − 1
adders, it is less expensive because only 2⌈N/M⌉ − 1 delays are needed.
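A behavioural model of a multiply-and-accumulate decimator, in the spirit of the transposed variant of Fig. 6c, may look as follows. It is a sketch under stated assumptions: the function names are hypothetical, the coefficient vector is zero-padded to a multiple of M, and the output list simply plays the role of the accumulators and output-rate delays.

```python
import math


def decimate_naive(x, h, M):
    """System theoretic approach: full-rate filtering followed by M-fold downsampling."""
    full = [sum(h[q] * x[t - q] for q in range(len(h)) if t - q >= 0)
            for t in range(len(x))]
    return full[::M]


def decimate_accumulate(x, h, M):
    """ceil(N/M) multipliers fed directly by the input; products are accumulated."""
    taps = math.ceil(len(h) / M)                 # number of hardware multipliers
    h = list(h) + [0] * (taps * M - len(h))      # zero-pad to a multiple of M
    n_out = (len(x) + M - 1) // M                # same length as decimate_naive
    y = [0] * (n_out + taps)                     # accumulators, with head room
    for n, sample in enumerate(x):
        c = -(-n // M)                           # ceil(n / M)
        mu = c * M - n                           # coefficient phase, 0 <= mu < M
        for v in range(taps):
            # x[n] = x[(c+v)*M - (v*M + mu)] contributes to output c + v.
            y[c + v] += h[v * M + mu] * sample
    return y[:n_out]


x = list(range(1, 12))
h = [1, 2, 3, 4, 5, 6, 7]
assert decimate_accumulate(x, h, M=3) == decimate_naive(x, h, M=3)
```

Every input sample is multiplied by exactly ⌈N/M⌉ coefficients, so no filter output that would be discarded by the downsampler is ever computed.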
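Fig. 6. a) Decimator (system theoretic approach): input X(zi) at rate fi, filter H(zi), M-fold downsampler, output Y(zo). b) Realization operating at fi: delay chain zi-M, multipliers time-multiplexed between the coefficient groups h0 ... hM-1, hM ... h2M-1, h2M ... h3M-1, ..., up to hN-1, and an accumulator (reset value "0") delivering the output at rate fo.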
Fig. 6. c) Realization operating at fi with fewer delays: multipliers time-multiplexed between the coefficient groups h0 ... hM-1, hM ... h2M-1, ..., up to hN-1, each followed by an accumulator (reset value "0") read out at rate fo, with delays zo-1 between the branches.
EFFICIENT FSRC WITH ADJUSTABLE TIMING
The aim of this section is to derive an efficient FSRC-structure, operating with the timing interval Td=αTo (Structure 1)
or Td=αTi (Structure 2), respectively, as desired. To this end, a time domain polyphase decomposition is applied both to
the input and output signals of Fig. 1. As a result, we obtain a structure still comprising superfluous operations.
Eventually, those arithmetic operations are eliminated by substituting either the interpolators or decimators according to
Fig. 5 and Fig. 6, respectively. Without any loss of generality, we assume that L and M are coprime. As an example,
structure 1 is derived subsequently.
The output signal y(kTo) of the system depicted in Fig. 1 is related to the input signal x(nTi) by [10]
$$y(kT_o) = \sum_{n=-\infty}^{\left\lfloor k\frac{M}{L} - \frac{r}{L} \right\rfloor} x(nT_i)\, h(kMT - nLT - rT) \tag{1}$$
with r being the unknown additional delay necessary for causality. The final block-processing structure has to
comprise an input delay-chain decimator, which decomposes the input signal x(nTi) into polyphase components
(with F being the common divisor of α and L)
$$x(nT_i) = \sum_{m=0}^{\frac{\alpha M}{F}-1} \begin{cases} x\!\left(j\frac{\alpha M}{F}T_i - mT_i\right), & \text{for } j\frac{\alpha M}{F} - m = n \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
and reduces the sampling rate by deleting the zeros, and an output interpolator delay chain circuit, which interleaves the
polyphase components of the output signal y(kTo) according to
$$\sum_{l=0}^{\alpha-1} \begin{cases} y(i\alpha T_o + lT_o), & \text{for } i\alpha + l = k \\ 0, & \text{otherwise} \end{cases} = y(kT_o).$$
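The substitutions
$$k = i\alpha + l \quad \text{with } i \in \mathbb{Z},\ l \in \{0, \ldots, \alpha-1\}$$
and
$$n = j\frac{\alpha M}{F} - m \quad \text{with } j \in \mathbb{Z},\ m \in \left\{0, \ldots, \frac{\alpha M}{F}-1\right\} \tag{3}$$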
are introduced into (1). As a result, we obtain the polyphase component of the output signal
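$$y[i\alpha T_o + lT_o] = \sum_{m=0}^{\frac{\alpha M}{F}-1} \sum_{j=-\infty}^{\Psi} x\!\left[j\frac{\alpha M}{F}T_i - mT_i\right] h\!\left[\left(i - j\frac{L}{F}\right)\alpha MT + lMT + mLT - rT\right] \tag{4}$$
with
$$\Psi = \left\lfloor \frac{Fi}{L} + \frac{Fl}{\alpha L} + \frac{Fm}{\alpha M} - \frac{Fr}{\alpha ML} \right\rfloor.$$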
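The substitution
$$\beta = \left\lfloor \frac{\beta}{\alpha M} \right\rfloor \alpha M + (\beta)_{\alpha M} \quad \text{with } \beta = lM + mL - r \text{ and } (\beta)_{\alpha M} = \beta \text{ modulo } (\alpha M)$$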
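leads to
$$y[i\alpha T_o + lT_o] = \sum_{m=0}^{\frac{\alpha M}{F}-1} \sum_{j=-\infty}^{\Psi} x\!\left[j\frac{\alpha M}{F}T_i - mT_i\right] h\!\left[\left(i - j\frac{L}{F} + \left\lfloor \frac{lM + mL - r}{\alpha M} \right\rfloor\right)\alpha MT + (lM + mL - r)_{\alpha M}\,T\right]. \tag{5}$$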
By defining the polyphase components of type 1 [10]
$$y_l^{(p1,\alpha)}[iT_d] = y[i\alpha T_o + lT_o] \tag{6}$$
$$h_\mu^{(p1,\alpha M)}[\nu T_d] = h[\nu\alpha MT + \mu T] \tag{7}$$
(Td = αTo = α(M/L)Ti) and of type 3
$$x_m^{(p3,\frac{\alpha M}{F})}\!\left[j\frac{L}{F}T_d\right] = x\!\left[j\frac{\alpha M}{F}T_i - mT_i\right] \tag{8}$$
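Eq. (5) can compactly be rewritten as
$$y_l^{(p1,\alpha)}[iT_d] = \sum_{m=0}^{\frac{\alpha M}{F}-1} \sum_{j=-\infty}^{\Psi} x_m^{(p3,\frac{\alpha M}{F})}\!\left[j\frac{L}{F}T_d\right] h^{(p1,\alpha M)}_{(lM+mL-r)_{\alpha M}}\!\left[iT_d - j\frac{L}{F}T_d + \left\lfloor \frac{lM + mL - r}{\alpha M} \right\rfloor T_d\right]. \tag{9}$$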
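The index bookkeeping behind (3) to (9) can be checked numerically. The sketch below verifies, for a range of indices, that the argument of h in (1) equals the quotient/remainder form used in (5), and that the causal range of n in (1) coincides with j ≤ Ψ. The function name is hypothetical, F is taken as the greatest common divisor of α and L (an assumption; the text only speaks of a common divisor), and all time arguments are counted in units of T.

```python
from fractions import Fraction
from math import floor, gcd


def check_index_mapping(L, M, alpha, r, span=4):
    """Return True if substitutions (3) and the split of beta reproduce Eq. (1)."""
    F = gcd(alpha, L)          # assumption: greatest common divisor
    block = alpha * M // F     # input decomposition factor alpha*M/F
    for i in range(-span, span + 1):
        for l in range(alpha):
            k = i * alpha + l
            for m in range(block):
                beta = l * M + m * L - r
                # divmod floors the quotient, so the remainder lies in
                # [0, alpha*M) even for negative beta, as required by (5).
                quotient, remainder = divmod(beta, alpha * M)
                psi = floor(Fraction(F * i, L) + Fraction(F * l, alpha * L)
                            + Fraction(F * m, alpha * M)
                            - Fraction(F * r, alpha * M * L))
                for j in range(-span, span + 1):
                    n = j * block - m
                    arg_eq1 = k * M - n * L - r
                    arg_eq5 = (i - j * (L // F) + quotient) * alpha * M + remainder
                    if arg_eq1 != arg_eq5:
                        return False
                    # Upper summation limit: n*L <= k*M - r if and only if j <= Psi.
                    if (n * L <= k * M - r) != (j <= psi):
                        return False
    return True


print(check_index_mapping(L=4, M=3, alpha=2, r=6))  # True
```

Note that β=lM+mL−r is frequently negative, so the floor and the modulo operation have to be applied consistently; a truncating division would give a different and incorrect split.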
To guarantee causality, the following requirements have to be met:
1. The original overall system has to be causal [11], i.e.
$$h(kMT - nLT - rT)\,\Big|_{k\frac{M}{L} < n + \frac{r}{L}} = 0. \tag{10}$$
2. All subsystems have to be causal, i.e.
$$h^{(p1,\alpha M)}_{(lM+mL-r)_{\alpha M}}\!\left[iT_d - j\frac{L}{F}T_d + \left\lfloor \frac{lM + mL - r}{\alpha M} \right\rfloor T_d\right]\Bigg|_{i < j\frac{L}{F}} = 0 \quad \forall\, i, j. \tag{11}$$
Due to block processing, the sampling instants (i.e. the instants at which the output signal is calculated) are iαTo. Therefore, only input
signals up to j(αM/F)Ti ≤ iαTo (with j(αM/F)Ti being the sampling instants of the input delay-chain decimator circuit) can be
considered when calculating the output signal. As a consequence, the impulse response has to be zero for all times
j(αM/F)Ti > iαTo (or j(L/F)Td > iTd), i.e. for jL/F > i.
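Provided that the original FSRC is causal (condition 1), condition 2 can only be satisfied if, for (9),
$$\left. i - j\frac{L}{F} + \left\lfloor \frac{lM + mL - r}{\alpha M} \right\rfloor \right|_{i < j\frac{L}{F}} < 0 \quad \forall\, m, l \tag{12}$$
holds or
$$\left\lfloor \frac{lM + mL - r}{\alpha M} \right\rfloor \le 0 \quad \forall\, m, l, \tag{13}$$
respectively. Hence, this leads to
$$r \in \left\{ \frac{\alpha ML}{F} - L - M + 1, \ldots, \infty \right\}. \tag{14}$$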
In order to obtain the smallest additional delay, we choose
$$r = r_{\min} = \frac{\alpha ML}{F} - L - M + 1.$$
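The closed-form expression for the smallest additional delay can be cross-checked against condition (13) by brute force. In the following sketch the function names are hypothetical and F is again taken as the greatest common divisor of α and L, which is an assumption.

```python
from math import gcd


def r_min_formula(L, M, alpha):
    """Smallest additional delay according to the closed-form expression."""
    F = gcd(alpha, L)  # assumption: greatest common divisor
    return alpha * M * L // F - L - M + 1


def r_min_search(L, M, alpha):
    """Smallest r >= 0 with floor((l*M + m*L - r) / (alpha*M)) <= 0 for all l, m."""
    F = gcd(alpha, L)
    block = alpha * M // F
    r = 0
    # Integer division // floors in Python, matching the floor brackets in (13).
    while any((l * M + m * L - r) // (alpha * M) > 0
              for l in range(alpha) for m in range(block)):
        r += 1
    return r


for alpha in range(1, 7):
    assert r_min_search(4, 3, alpha) == r_min_formula(4, 3, alpha)
print([r_min_formula(4, 3, alpha) for alpha in range(1, 5)])  # [6, 6, 30, 6]
```

For L=4 and M=3, the additional delay stays at 6T whenever α shares a factor with L that keeps αL/F=4, whereas α=3 (coprime to L) raises it to 30T.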
Thus, the block processing structure can be described in the time domain by
$$y(kT_o) = \sum_{l=0}^{\alpha-1} \begin{cases} y_l^{(p1,\alpha)}(iT_d), & \text{for } i\alpha + l = k \\ 0, & \text{otherwise} \end{cases} \tag{15}$$
with ($\ast$ stands for convolution)
$$y_l^{(p1,\alpha)}[iT_d] = \sum_{m=0}^{\frac{\alpha M}{F}-1} x_m^{(p3,\frac{\alpha M}{F})}\!\left[i\frac{L}{F}T_d\right] \ast h^{(p1,\alpha M)}_{\left(lM + mL - \frac{\alpha ML}{F} + L + M - 1\right)_{\alpha M}}\!\left[iT_d + \left\lfloor \frac{lM + mL - \frac{\alpha ML}{F} + L + M - 1}{\alpha M} \right\rfloor T_d\right].$$
As can be seen from xm(p3,αM/F)[i(L/F)Td], the input signal x(nTi) is subjected to an (αM/F)-fold polyphase
decomposition and is then upsampled by L/F. After the filtering with
$$z_d^{\left\lfloor \frac{lM + mL - \frac{\alpha ML}{F} + L + M - 1}{\alpha M} \right\rfloor}\, H^{(p1,\alpha M)}_{\left(lM + mL - \frac{\alpha ML}{F} + L + M - 1\right)_{\alpha M}}(z_d)$$
the resulting α polyphase components yl(p1,α)[iTd] are interleaved. The resulting structure is depicted in Fig. 7. In order
to eliminate the superfluous operations caused by the (L/F)-fold upsampling, each upsampler with the subsequent FIR filter (interpolator) has to be realized according to Fig. 5.
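The time-domain description of structure 1 can be cross-checked against Eq. (1) with a behavioural model. The sketch below evaluates Eq. (9) with r=rmin branch by branch and compares the interleaved result with a direct evaluation of Eq. (1). All function names are hypothetical, F is taken as the greatest common divisor of α and L (an assumption), the input is assumed to be zero outside the given samples, and the redundant operations of Fig. 7 are deliberately not removed.

```python
from math import gcd


def fsrc_direct(x, h, L, M, r, n_out):
    """Direct evaluation of Eq. (1); h is zero outside 0..N-1, x outside its samples."""
    y = []
    for k in range(n_out):
        acc = 0
        for n, sample in enumerate(x):
            q = k * M - n * L - r
            if 0 <= q < len(h):
                acc += sample * h[q]
        y.append(acc)
    return y


def fsrc_block(x, h, L, M, alpha, n_out):
    """Eq. (9) with r = r_min: alpha outputs per timing interval Td = alpha*To."""
    F = gcd(alpha, L)                         # assumption: greatest common divisor
    block = alpha * M // F                    # input polyphase decomposition factor
    r = alpha * M * L // F - L - M + 1        # smallest additional delay r_min

    def x_at(n):
        return x[n] if 0 <= n < len(x) else 0

    def h_at(q):
        return h[q] if 0 <= q < len(h) else 0

    y = []
    for i in range(-(-n_out // alpha)):       # block index i, ceil(n_out / alpha) blocks
        for l in range(alpha):                # output polyphase component l
            acc = 0
            for m in range(block):            # input polyphase component m
                shift, mu = divmod(l * M + m * L - r, alpha * M)
                for j in range(len(x) // block + 2):
                    nu = i - j * (L // F) + shift
                    # x_m[j] = x[j*block - m]; h_mu[nu] = h[nu*alpha*M + mu].
                    acc += x_at(j * block - m) * h_at(nu * alpha * M + mu)
            y.append(acc)
    return y[:n_out], r


x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
h = [1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1]
for alpha in (1, 2, 3, 4):
    y_block, r = fsrc_block(x, h, L=4, M=3, alpha=alpha, n_out=24)
    assert y_block == fsrc_direct(x, h, L=4, M=3, r=r, n_out=24)
```

Since the shift ⌊(lM+mL−rmin)/(αM)⌋ is never positive, each branch filter only receives arguments that are negative, and hence yield zero, whenever jL/F > i, in agreement with condition (11).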
Similarly, a structure operating at Td=αTi can be derived. To this end, an α-fold time-domain polyphase decomposition
of the input signal and an (αL/F)-fold polyphase decomposition of the output signal are introduced into (1), where
r=(αML)/F-M-L+1 (F being the common factor of α and M). As a result, we obtain Fig. 9. Eventually, the superfluous
operations are eliminated by replacing the (M/F)-fold decimator according to Fig. 6.
The amount of required hardware (multipliers, adders and delays) is depicted in Fig. 8 for N=360, L=4 and M=3. In
most cases, it can be observed that structure 2 needs an unacceptably large number of adders and delays. As a
consequence, structure 1 should be preferred. In addition, it can be seen that for large timing intervals the gap between
the ideal and the real number of multipliers becomes larger if αM (αL, respectively) does not divide N. On the other hand, the
number of delays decreases even in these cases.
Fig. 7. FSRC with a timing interval Td=αTo (redundant arithmetic operations still remaining, α and L coprime): input delay chain zi-1 with αM-fold downsamplers (m=0, ..., αM-1), L-fold upsamplers, branch filters (interpolators) with indices ranging from (-αML+L+M-1) to (αM-1) modulo αM and the associated powers of zd, and output α-fold upsamplers with delay chain zo-1 (l=0, ..., α-1).
Fig. 8. Hardware expenditure (example: N=360, L=4 and M=3). Panels: number of adders, number of delay elements and number of multipliers, each versus the timing interval.
CONCLUSION
A novel systematic and rigorous derivation of block-processing structures operating with a timing interval of αTi or αTo
(α ∈ ℕ) has been given. As a consequence, we obtain efficient structures that enable more freedom for the choice of
the optimal timing interval and, hence, allow a trade-off between system clock and chip area.
Fig. 9. FSRC with a timing interval of Td=αTi (redundant arithmetic operations still remaining, α and M coprime): input delay chain zi-1 with α-fold downsamplers (m=0, ..., α-1), branch filters with indices ranging from (-αML+L+M-1) to (αL-1) modulo αL and the associated powers of zd, M-fold downsamplers (decimators), and output αL-fold upsamplers with delay chain zo-1 (l=0, ..., αL-1).
REFERENCES
[1] A.P. Chandrakasan and R.W. Brodersen, “Minimizing Power Consumption in Digital CMOS Circuits,” Proc. of
IEEE, vol. 83, no. 4, pp. 498-523, April 1995.
[2] C.C. Hsiao, “Filter Matrix for Rational Sampling Rate Conversions,” IEEE Int. Conf. Acoustics, Speech, Signal
Processing ICASSP '87, Dallas, pp. 2173-2176, 1987.
[3] P.P. Vaidyanathan, “Multirate Digital Filters, Filter Banks, Polyphase Networks, and Applications: A Tutorial,”
Proc. of IEEE, vol. 78, no. 1, pp. 56-93, January 1990.
[4] W.H. Yim, F.P. Coakley, and B.G. Evans, “Extended Polyphase Structures for Multirate DSP,” IEE Proc.-F, vol.
139, no. 4, pp. 273-277, August 1992.
[5] H.G. Göckler, G. Evangelista, and A. Groth, “Minimal Polyphase Implementation of Fractional Sample Rate
Conversion,” Signal Processing, vol. 81, no. 4, pp. 673-691, April 2001.
[6] A. Groth and H.G. Göckler, “Efficient Minimum Group Delay Block Processing Approach to Fractional Sample
Rate Conversion,” ISCAS '01, Sydney, Australia, vol. II, pp. 189-192, May 2001.
[7] A. Groth and H.G. Göckler, “Polyphase Implementation of Unrestricted Fractional Sample Rate Conversion,”
Internal Report, http://www.nt.ruhr-uni-bochum.de/lehrstuhl/mitarbeiter/alex.html, 2000.
[8] R. Crochiere and L. Rabiner, “Interpolation and Decimation of Digital Signals: A Tutorial Review,” Proc. IEEE,
vol. 69, pp. 300-331, March 1981.
[9] J. Webb, “Transposed FIR Filter Structure with Time-Varying Coefficients for Digital Data Resampling,” IEEE
Trans. on Signal Processing, vol. 48, pp. 2594-2600, September 2000.
[10] P.P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice Hall Signal Processing Series, 1993.
[11] H.W. Schüßler, Digitale Signalverarbeitung 1, Analyse diskreter Signale und Systeme, Springer Verlag, Berlin,
1994.