Because CPU speed has increased far faster than memory speed,
memory latency now limits overall system performance.
We propose Partition Scheduling with Prefetching (PSP), a method
that combines loop pipelining with data prefetching.
In PSP, the iteration space is first divided
into regular partitions.
Then a two-part schedule, consisting of
an ALU part and a memory part, is generated and balanced to
achieve high throughput. The two parts are
executed simultaneously, so the remote memory latencies are
overlapped with computation. We study the optimal partition shape and size so that a
well-balanced overall schedule can be obtained. Experiments on DSP
benchmarks show that the proposed methodology consistently produces
optimal or near-optimal solutions.
Experiments show that the average schedule length obtained by PSP is
of that derived using list scheduling.
Two journal papers and five conference papers were published and submitted under this category.
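
The balancing idea behind the two-part schedule can be illustrated with a small sketch. This is a toy cost model, not the PSP algorithm itself: the class ToyCosts, the function names, the square partition shape, and the assumption that prefetch work grows with the partition boundary while ALU work grows with the partition area are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCosts:
    """Hypothetical per-partition cost parameters (not taken from the source)."""
    alu_cycles_per_iteration: int       # ALU cycles per iteration after pipelining
    prefetch_cycles: int                # cycles for one remote prefetch
    prefetches_per_boundary_point: int  # remote operands fetched per boundary point


def alu_part_length(width: int, height: int, costs: ToyCosts) -> int:
    # Assumption: ALU work is proportional to the number of iterations (area).
    return width * height * costs.alu_cycles_per_iteration


def memory_part_length(width: int, height: int, costs: ToyCosts) -> int:
    # Assumption: prefetch work is proportional to the boundary (width + height)
    # and prefetches are issued one after another on a single memory unit.
    boundary_points = width + height
    return (boundary_points
            * costs.prefetches_per_boundary_point
            * costs.prefetch_cycles)


def overlapped_length(width: int, height: int, costs: ToyCosts) -> int:
    # When both parts run simultaneously, the longer part sets the length.
    return max(alu_part_length(width, height, costs),
               memory_part_length(width, height, costs))


def serial_length(width: int, height: int, costs: ToyCosts) -> int:
    # Without overlap, computation waits for every prefetch to finish.
    return (alu_part_length(width, height, costs)
            + memory_part_length(width, height, costs))


def smallest_balanced_square(costs: ToyCosts, max_side: int = 1024):
    # Smallest square partition whose ALU part fully hides the memory part;
    # returns None if no side length up to max_side achieves this.
    for side in range(1, max_side + 1):
        if alu_part_length(side, side, costs) >= memory_part_length(side, side, costs):
            return side
    return None


if __name__ == "__main__":
    costs = ToyCosts(alu_cycles_per_iteration=2,
                     prefetch_cycles=10,
                     prefetches_per_boundary_point=1)
    side = smallest_balanced_square(costs)
    print("balanced side:", side)
    print("overlapped length:", overlapped_length(side, side, costs))
    print("serial length:", serial_length(side, side, costs))
```

Under these assumed costs, a partition that is too small leaves the memory part dominant, while a sufficiently large partition lets the ALU part hide the prefetch latency entirely; real partition shapes and cost functions would depend on the loop's dependence structure and the target architecture.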