On 21 Oct 2008 00:04:49 GMT, Andrew Reilly
<andrew-newspost@[EMAIL PROTECTED]
> wrote:
>On Mon, 20 Oct 2008 14:21:09 -0700, Eric Jacobsen wrote:
>
>> I can go on and on, but I hope you're starting to get the drift. For
>> people who actually have to build things, being constrained by the
>> availability and details of the building blocks makes one look much
>> further than just academic papers and software library comparisons. All
>> that being said, I don't think I've personally plugged in an FHT
>> anywhere serious in at least fifteen years. As I said in my very
>> first post in this thread, memory and processing is cheap enough these
>> days that saving development time by plugging in a canned FFT is almost
>> always the way to go. Evidently the difference between you and me is
>> that I recognize that there are still those odd corner cases once in a
>> while where the tradeoffs can go the other way.
>
>I'm afraid that you still haven't made the case, at least to me.
I'm not really trying to make a case, believe it or not. I don't
really care what people use, and as I mentioned *I* haven't seen a
reason to personally use an FHT in a long time. I do admit to not
feeling a need to keep up with whatever details people are changing in
either FHTs or FFTs to try to make that last gasp of difference,
because:
a) unless the assumptions apply to the cases I'll encounter (and who
knows what those'll be) it may not matter
b) because any difference is likely to be small it'll probably be
easier to make it up somewhere else rather than replace a block that's
already developed and debugged (i.e., a canned transform) with new
development.
I've not yet encountered a situation where b) would kick in, although
I can certainly imagine cases where it might were I to be called on to
solve that particular problem.
> In
>fact, a couple of your comments, specifically about wasted space,
>irregularity of access pattern (i.e., address generator complexity) and
>the bit about no multiplications in the first two p***** makes me think
>that you haven't looked at an FFT algorithm in about as long as your FHT
>experience. I'm not in Steven's league, but none of the FFT code that
>I've written in the last several years has had much other than linear
>memory addressing, and hasn't used multiplies in the first two p*****.
>The first FFT machine that I encountered was indeed a VLSI device (the
>Austek A41102.) It achieved it's good performance for the day by
>streaming data in and out of DRAM chips with sequences of column-access
>cycles. That was pretty quickly overtaken by Motorola DSP chips, though.
>
>I suspect that a paraphrase of Steven's point (and, indeed my own
>interest) is not to say that there's anything *necessarily* worse about
>the FHT than the FFT (apart from the point that I make below), but that
>in any given instance, it would seem likely that a technique that makes
>the FHT faster could *also* be used to make the FFT faster. That's why
>he's keen to find particular examples and insights of such corner cases.
One only needs to look at the differences between the two transforms
to see where advantages could lie. Steven even pointed out that with
real-input FFTs many have complications that are a PITA. The
reference I cited mentioned that this was a key avenue in getting the
simplifications that led to their use of the FHT. There wasn't
enough detail in that paper to know exactly how those differences were
exploited, and it's been too long since I've built anything at that
level (because it's made more sense to use the canned stuff!) that I
just can't recall the tricks.
I'm not inclined to dig out old stuff and sort out what the heck I did
well over a decade ago, but it was clear to me that the differences in
the details were real enough to play a lot of tradeoffs in both
twiddle and data memory management depending on how one played the
rest of the tradeoffs (e.g., algorithm selection, memory architecture,
pipelining, etc., etc.). I know my lack of detailed recollection
doesn't count for much, but I don't know why people think that the two
transforms are so identical that there are no differences to exploit
when conditions allow. That's just not the case.
>Why are we computing a transform in the first place? Do we want to
>analyse the power spectrum, or do we want to perform convolution
>filtering, or some other purpose? It's been a while since I looked, but
>my memory of the FHT has it requiring a bunch of additional sum/
>difference operations on X_k and X_{N-k}, in order to perform either of
>those operations. That's not a particularly attractive addressing
>sequence, and a bunch of additional memory operations, to my eye.
>
>Cheers,
Actually, that's a reasonable case in point. If the transform is
done in-place in a linear memory (say, a length Nx1 memory for an N-pt
transform), and one only wants the magnitude of the output vector, if
the memory is dual-****t the magnitude-squared vector can be formed in
N cycles. The funky output symmetry that you mention, where the
equivalent FT real and imaginary parts are computed from the output
as:
re(FT(f)) = HT(f) + HT(N-f)
im(FT(f)) = HT(f) - HT(N-f)
actually provides a simple way to get mag-squared output (which is
close to what the OP was looking for, and is, actually what made me
think of this whole thing). If you want the mag-squared, it boils
down to:
re(FT(f))^2 + im(FT(f))^2 = 2 [ HT(f)^2 + HT(N-f)^2 ]
You can get the factor of two for free (in hardware, anyway), so
that's not a complication. With a dual-****t memory you can
simultaneously read HT(f) and HT(N-f) and progress through the
computations with two counters (or a counter and an adder or
subtractor). That's fairly simple and certainly not more complex
than the FT case.
Reading re(FT(f)) and im(FT(f)) is pretty simple, too, depending on
how the output data is arranged in the buffer. Splitting it into two
N/2 buffers is often workable and then only requires one counter for
the address generation, but one of the subtleties is evaluating how
the memories have to be architected. From a length-N real input to
either a length N/2 complex output, a length-N output with re() and
im() parts arranged somehow, or whatever. If one wants only one
buffer and the transform done in-place the tradeoffs may change from
having separate buffers for input and output and processing. A
multi-****t memory makes some of it easier (different pieces for either
case), but the differences also start to become more apparent.
The correlation and convolution operations are nearly the same as for
the FT case, except that (just like with the magnitude) one address is
incrementing while the other decrements. The operations are
otherwise essentially the same. Whether a particular implementation
finds it "better" to use a single, length-N memory, a single length-N
memory with the address space split, or two length-N/2 memories for
the buffering could move the tradeoff one way or the other. The
differences are very small otherwise.
Where one actually finds an advantage either way will likely depend on
the details of how one wants to manage the memory, how much memory is
available and how it is arranged, and how much addrress generation and
control hardware one can afford. How many lines/bits are twiddling
during the operation affects power consumption, and there are likely
differences there as well (how could there not be, they're not
perfectly identical).
The article I cited seemed to indicate that that author found a lot of
simplification in those sorts of tradeoffs with the FHT. YMMV if you
don't have the same libraries and assumptions as he did, but I do not
find that author's conclusions incredible. In that article for the
stand-alone solutions investigated (i.e., comparing a particular FHT
to a particular FFT) while the FFT had smaller area its latency and
power consumption were much higher than the FHT. So, again,
depending on what's im****tant to you the tradeoffs can go different
directions. There is no pat answer, and claiming that an FFT is
always better under all conditions and cir***stance (if anyone is
actually claiming that) would seem to me to be a stretch, especially
if that point of view is based on analyses that didn't take metrics
into account that are im****tant to an implementer at the time. Having
the sort of prescience and omniscience to cover all such cases just
seems incredibly implausible to me given that there are differences in
the two approaches and the sorts of things that can become im****tant
in an implementation are often pretty peculiar.
Again, if nobody is claiming that an FFT is always better than an FHT
under all cir***stances, then there's not really anything to argue
about. It's just an interesting discussion as far as I'm concerned.
Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org
Blog: http://www.dsprelated.com/blogs-1/hf/Eric_Jacobsen.php


|