We report Linpack benchmark results on the TSUBAME supercomputer, a
large-scale heterogeneous system equipped with NVIDIA Tesla GPUs
and ClearSpeed SIMD accelerators.
Using all 10,480 Opteron cores, 640 Xeon cores,
648 ClearSpeed accelerators and 624 NVIDIA Tesla GPUs,
we have achieved 87.01 TFlops, the second-highest Linpack
performance recorded by a heterogeneous system in the world.
This paper describes the careful tuning and load-balancing methods
required to achieve this performance.
However, since the theoretical peak performance is 163 TFlops, the efficiency
is only 53\%, which is lower than that of other systems.
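For reference, the efficiency quoted here is taken, as is customary for Linpack, to be the ratio of the measured performance $R_{\max}$ to the theoretical peak $R_{\mathrm{peak}}$ (this notation is assumed here, not defined elsewhere in the text):
\[
  \frac{R_{\max}}{R_{\mathrm{peak}}}
  = \frac{87.01\ \mathrm{TFlops}}{163\ \mathrm{TFlops}}
  \approx 0.534 \approx 53\%.
\]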
This paper also analyses the causes of this gap from the viewpoint
of system architecture.