In this tutorial you will learn:
1) how to hack makefiles
2) how to view the assembly code generated by gcc (works on any platform)
3) how to run assembly through the SPU timing tool and how to interpret the output of its static analysis (in terms of the effectiveness of software pipelining)
Install the spu_timing tool:
1. Mount 'CellSDK-Extras-Fedora_3.1.0.0.0.iso'. The package 'cell-spu-timing-3.1-2.i686.rpm' is in its x86/ folder.
2. Install it with the command 'rpm -i cell-spu-timing-3.1-2.i686.rpm'.
3. The 'spu_timing' tool is installed in the '/opt/cell/sdk/usr/bin/' folder.
4. Use 'spu-gcc -S <filename>.c' to generate the '.s' (assembly) file.
5. Use '/opt/cell/sdk/usr/bin/spu_timing <filename>.s' to generate the '.s.timing' file.
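Steps 4 and 5 can also be scripted. The sketch below is a hypothetical Python helper; `timing_commands` and `run_timing` are names chosen here, and the tool path is the one from step 3. It assumes two things:
- spu-gcc -S writes foo.s into the current directory, as gcc does.
- spu_timing writes foo.s.timing next to foo.s.

```python
import subprocess
from pathlib import Path

# Install location from step 3; adjust if your SDK lives elsewhere.
SPU_TIMING = "/opt/cell/sdk/usr/bin/spu_timing"


def timing_commands(c_file, tool=SPU_TIMING):
    """Return the commands for steps 4 and 5 as argument lists."""
    # gcc -S names the output after the source file and writes it to the cwd.
    asm_file = Path(c_file).stem + ".s"
    return [
        ["spu-gcc", "-S", c_file],  # step 4: foo.c -> foo.s
        [tool, asm_file],           # step 5: foo.s -> foo.s.timing
    ]


def run_timing(c_file, tool=SPU_TIMING):
    """Run both steps, stopping at the first failure; return the .timing name."""
    cmds = timing_commands(c_file, tool)
    for cmd in cmds:
        subprocess.run(cmd, check=True)
    return cmds[1][1] + ".timing"
```

On a machine with the SDK installed, run_timing("hello.c") would run both steps and return "hello.s.timing".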
Example session on the lab machine (cnode1):
1) Start X Window first.
2) ssh -Y liw34@cnode1
   or ssh mills.cas.mcmaster.ca first and then
   ssh -Y liw34@cnode1.cas.mcmaster.ca
3) cp -r ibm ~
4) spu-gcc -S hello.c
5) /opt/ibm/cell-sdk/prototype/bin/spu_timing hello.s
6) ls
7) more hello.s.timing
Explanation copied from the programming tutorial, section 3.7.2.1 Static Analysis of SPE Threads (pages 115-116):
The listing below shows an spu_timing static timing analysis for the inner loop of the SPE code illustrated in Section 3.6.3.3 on page 109, the Euler Particle-System Simulation example. This listing shows significant dependency stalls (indicated by the -) and poor dual-issue rates. The inner loop has an instruction mix of eight even-pipeline (pipe 0) instructions and ten odd-pipeline (pipe 1) instructions. Because the two pipelines are this evenly loaded, the mix itself allows frequent dual issue; the stalls come from data dependencies. Therefore, any program changes that minimize data dependencies will improve dual-issue rates and lower the cycles per instruction (CPI).
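The instruction mix alone puts a floor under the loop's cost. Each cycle can issue at most one even-pipeline and one odd-pipeline instruction. An iteration therefore needs at least max(even, odd) cycles even with no stalls at all. The sketch below computes that floor and the matching best-case CPI for the 8/10 mix quoted above. The function name `issue_bound` is chosen here, not taken from the tutorial.

```python
def issue_bound(n_even, n_odd):
    """Lower bound on issue cycles per iteration, and the best-case CPI.

    At most one pipe-0 and one pipe-1 instruction can issue per cycle,
    so the busier pipeline sets the floor; dependency stalls only add to it.
    """
    cycles = max(n_even, n_odd)
    cpi = cycles / (n_even + n_odd)
    return cycles, cpi


cycles, cpi = issue_bound(8, 10)
# prints: at least 10 cycles per iteration, best-case CPI 0.56
print(f"at least {cycles} cycles per iteration, best-case CPI {cpi:.2f}")
```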
.L19:
0D        78                                             a      $49,$8,$10
1D        789012                                         lqx    $51,$6,$9
0D         89                                            ila    $47,66051
1D         890123                                        lqx    $52,$6,$11
0           90                                           ai     $7,$7,-1
0            ----456789                                  fma    $50,$51,$12,$52
1                 -----012345                            stqx   $50,$6,$11
1                       123456                           lqx    $48,$8,$10
0D                       23                              ai     $8,$8,4
1D                       234567                          lqa    $44,ctx+16
1                         345678                         lqx    $43,$6,$9
1                          ---7890                       rotqby $46,$48,$49
1                              ---1234                   shufb  $45,$46,$46,$47
0                                  ---567890             fm     $42,$12,$45
0d                                     -----123456       fma    $41,$42,$44,$43
1d                                          ------789012 stqx   $41,$6,$9
0D                                                 89    ai     $6,$6,16
.L39:
1D                                                 8901  brnz   $7,.L19
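Each run of dashes in the listing is the gap between two cycles: when an instruction could have issued, and when its inputs are ready. The sketch assumes a producer's result is usable `latency` cycles after it issues, where the latency is the number of digits the producer shows. Under that assumption, the stall can be recomputed as below. The cycle numbers are an illustrative reading of the listing, counted from the leftmost timing column:
- lqx $52,$6,$11 issues in cycle 8 and shows six digits.
- fma $50,$51,$12,$52 could otherwise have issued in cycle 10.
- Together these give the four dashes in front of that fma.

```python
def stall_cycles(producer_issue, latency, earliest_issue):
    """Cycles a consumer waits for an operand (the dashes in a timing row).

    producer_issue: cycle in which the producing instruction issued
    latency:        cycles until its result can be used (digits it shows)
    earliest_issue: first cycle the consumer could issue if nothing stalled it
    """
    ready = producer_issue + latency
    return max(0, ready - earliest_issue)


print(stall_cycles(8, 6, 10))   # lqx $52 -> fma $50: 4
print(stall_cycles(27, 4, 28))  # rotqby $46 -> shufb: 3
print(stall_cycles(31, 4, 32))  # shufb $45 -> fm: 3
```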
The character columns in the above static-analysis listing have the following meanings:
Column 1: The first column shows the pipeline that issued an instruction. Pipeline 0 is represented by 0 in the first column and pipeline 1 is represented by 1.
Column 2: The second column can contain a D, d, or nothing.
- A D signifies that the two instructions listed in a row pair were successfully dual-issued.
- A d signifies that dual-issue was possible but did not occur because of dependencies; for example, operands being in flight.
- If there is no entry in the second column, dual-issue could not be performed because the issue rules were not satisfied. For example, an even-pipeline instruction was fetched from an odd LS address, or an odd-pipeline instruction was fetched from an even LS address. See Section 3.1.1.4 Pipelines and Dual-Issue Rules on page 61.
Column 3: The third column is always blank.
Columns 4 through 53: The next 50 columns represent clock cycles and are repeated as 0123456789 five times. A digit is displayed in these columns whenever the instruction executes during that clock cycle. Therefore, an <n>-cycle instruction will display <n> digits. Dependency stalls are flagged by a dash (-).
Columns 54 and beyond: The remaining entries on the row are the assembly-language instructions or assembler-line addresses (for example, .L19) of the program's assembly code.
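The column layout described above is regular enough to decode mechanically. The sketch below is a hypothetical helper, not a specification of the real spu_timing format, which may differ in detail. It assumes:
- column 1 holds the pipe, column 2 the flag, and column 3 is blank;
- cycle columns follow, then the instruction text;
- lines are plain ASCII with no tabs.

It does not hard-code the number of cycle columns. Instead, it treats everything up to the first character that is not a space, dash or digit as timing.

```python
from dataclasses import dataclass
from typing import Optional

TIMING_CHARS = set(" -0123456789")


@dataclass
class Row:
    pipe: int          # column 1: 0 = even pipeline, 1 = odd pipeline
    flag: str          # column 2: "D" dual-issued, "d" blocked by dependency, "" not eligible
    first_cycle: int   # cycle (column offset) of the first dash or digit
    stall_cycles: int  # number of dashes (dependency stalls)
    busy_cycles: int   # number of digits (cycles the instruction executes)
    text: str          # the assembly instruction

    @property
    def issue_cycle(self) -> int:
        """Cycle in which the instruction actually issued, after its stalls."""
        return self.first_cycle + self.stall_cycles


def parse_row(line: str) -> Optional[Row]:
    """Decode one instruction row; return None for labels and other lines."""
    if len(line) < 4 or line[0] not in "01" or line[2] != " ":
        return None
    # Timing columns start at column 4 (index 3) and run until the instruction.
    end = 3
    while end < len(line) and line[end] in TIMING_CHARS:
        end += 1
    marks = line[3:end].rstrip()
    if not marks.strip():
        return None
    return Row(
        pipe=int(line[0]),
        flag=line[1].strip(),
        first_cycle=len(marks) - len(marks.lstrip()),
        stall_cycles=marks.count("-"),
        busy_cycles=sum(ch.isdigit() for ch in marks),
        text=line[end:].strip(),
    )
```

In practice you would feed it each line of hello.s.timing, with the trailing newline removed, and skip the rows for which it returns None.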
Static-analysis timing files can be quickly interpreted by:
1) Scanning the columns of digits. Small slopes (more horizontal) are bad; large slopes (more vertical) are good, because a steep run means many instructions issue in few cycles.
2) Looking for instructions with dependencies (those with dashes in the listing).
3) Looking for instructions with poor dual-issue rates (either a d or nothing in column 2).
This information can be used to understand which areas of code are scheduled well and which are poorly scheduled.
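The three checks above can be turned into numbers. This sketch takes rows already decoded into (pipe, flag, first_cycle, stall_cycles) tuples. The tuples below are transcribed by hand from the listing, with first_cycle counted from the leftmost timing column. The sketch reports:
- the total stall cycles;
- how many rows were dual-issued;
- how many rows missed dual issue because of a dependency;
- the issue span.

The span divided by the instruction count is the CPI, the numeric counterpart of the slope in the first check. The tuple layout and the name `summarize` are assumptions made for this illustration.

```python
from collections import Counter

# (pipe, flag, first_cycle, stall_cycles) for each instruction of loop .L19
LOOP_ROWS = [
    (0, "D", 7, 0), (1, "D", 7, 0),      # a, lqx $51
    (0, "D", 8, 0), (1, "D", 8, 0),      # ila, lqx $52
    (0, "", 9, 0),                       # ai $7
    (0, "", 10, 4), (1, "", 15, 5),      # fma $50, stqx $50
    (1, "", 21, 0),                      # lqx $48
    (0, "D", 22, 0), (1, "D", 22, 0),    # ai $8, lqa
    (1, "", 23, 0),                      # lqx $43
    (1, "", 24, 3), (1, "", 28, 3),      # rotqby, shufb
    (0, "", 32, 3),                      # fm
    (0, "d", 36, 5), (1, "d", 41, 6),    # fma $41, stqx $41
    (0, "D", 48, 0), (1, "D", 48, 0),    # ai $6, brnz
]


def summarize(rows):
    """Simple schedule metrics for a list of decoded timing rows."""
    rows = list(rows)
    issue = [first + stalls for _, _, first, stalls in rows]
    flags = Counter(flag for _, flag, _, _ in rows)
    span = max(issue) - min(issue) + 1  # cycles from first to last issue
    return {
        "instructions": len(rows),
        "stalled": sum(1 for r in rows if r[3] > 0),
        "stall_cycles": sum(r[3] for r in rows),
        "dual_issued": flags["D"],
        "blocked_by_dependency": flags["d"],
        "not_eligible": flags[""],
        "issue_span": span,
        "cpi": span / len(rows),
    }
```

With these rows, summarize(LOOP_ROWS) reports an issue span of 42 cycles for 18 instructions, a CPI of about 2.33. If only the 8/10 instruction mix limited issue, the CPI would be about 0.56 (10 cycles for 18 instructions).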
About SPU_TIMING
If you are using a Bash shell, you can set SPU_TIMING as a shell variable with the command export SPU_TIMING=1. You can also pass SPU_TIMING to make and build the .s file with the following statement:
SPU_TIMING=1 make foo.s
This creates the timing file for foo.c. It sets the SPU_TIMING variable only in the sub-shell of the makefile: make generates foo.s and then invokes spu_timing on foo.s to produce a foo.s.timing file.
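The point of writing SPU_TIMING=1 make foo.s is that the variable lives only in make's environment, not in your shell. A script can get the same effect by passing a modified copy of its own environment to the child process, as in this sketch. The helper names `timing_env` and `make_timing` are hypothetical; the make target and the variable come from the text.

```python
import os
import subprocess


def timing_env(base=None):
    """Copy of the environment with SPU_TIMING=1; the original is not modified."""
    env = dict(os.environ if base is None else base)
    env["SPU_TIMING"] = "1"
    return env


def make_timing(target="foo.s"):
    """Equivalent of `SPU_TIMING=1 make foo.s` for the child process only.

    Raises subprocess.CalledProcessError if make fails.
    """
    subprocess.run(["make", target], env=timing_env(), check=True)
```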
Another way to invoke the performance tool is to enter one of the following statements at the command prompt, depending on which compiler generated that assembly:
spu_timing foo.s