Report on Computational Science and Engineering Support Activities at Daresbury Laboratory 1999/2000

Applications Performance: FLITE3D




M.F. Guest

CLRC Daresbury Laboratory, Daresbury, Warrington WA4 4AD.


FLITE3D is a finite-element code for solving the Euler equations governing airflow over whole aircraft. Parallelisation of FLITE3D for shared and distributed memory parallel systems has been undertaken as part of a collaboration between the Computational Engineering Group at Daresbury and the Sowerby Research Centre at British Aerospace. The code comprises a suite of modules for obtaining Euler solutions of the flow over complex configurations.


Work has been carried out on the parallelisation of the steady Euler flow solver using standard techniques of mesh-partitioning for a single-program multiple-data (SPMD) programming model implemented in Fortran 77 and C with message passing using a choice (selectable at compile time) of MPI or PVM. The flow solver now reads in the partitioned mesh and performs the necessary communications at the boundaries between sub-domains. Fields are gathered onto the master processor for output so that no changes are necessary in the post-processing stages. This also enables the flow solver to be stopped and restarted using a different number of processors, if necessary. Table 1 shows timings on the Cray T3E/1200E, IBM-SP/WH2-375 and both Pentium- and Alpha-Beowulf systems for two MPI-based FLITE3D benchmark studies, (i) a modest wing body benchmark using 298,244 elements, and (ii) the more demanding F18 benchmark using 3,444,350 elements.
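The SPMD scheme described above — partition the mesh, update each sub-domain using neighbouring (halo) values, and gather the fields onto the master for output — can be illustrated with a deliberately simplified sketch. FLITE3D itself is Fortran 77/C over unstructured 3-D meshes with MPI or PVM; the pure-Python fragment below only simulates the communication pattern on a 1-D field, and the helper names (`partition`, `jacobi_sweep`) are hypothetical, not taken from the code.

```python
# Illustrative sketch (NOT FLITE3D code): SPMD mesh partitioning on a 1-D
# field. Each "rank" owns a contiguous slice; updating the cells at a
# slice boundary needs the neighbour's edge value - the halo exchange.

def partition(n_cells, n_ranks):
    """Split n_cells into n_ranks near-equal contiguous sub-domains."""
    base, extra = divmod(n_cells, n_ranks)
    bounds, start = [], 0
    for r in range(n_ranks):
        size = base + (1 if r < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

def jacobi_sweep(field, bounds):
    """One relaxation sweep, done independently per sub-domain.

    Reading from the old field and writing to a fresh copy reproduces
    the semantics of all ranks sweeping in parallel after exchanging
    halo values at the sub-domain boundaries."""
    new = field[:]
    for start, end in bounds:                 # each pass = one "rank"
        for i in range(max(start, 1), min(end, len(field) - 1)):
            new[i] = 0.5 * (field[i - 1] + field[i + 1])  # uses halo data
    return new                                # "gathered" onto the master

field = [0.0] * 9 + [1.0]             # fixed boundary values 0 and 1
bounds = partition(len(field), 4)     # 4 sub-domains, like a 4-CPU run
for _ in range(200):
    field = jacobi_sweep(field, bounds)
```

Because every sweep reads only old values, the answer is independent of the number of sub-domains — which is also why the real solver can be stopped and restarted on a different processor count.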


Table 1: Time in wall-clock seconds for the FLITE3D benchmarks on the Cray T3E/1200E, IBM SP/WH2-375, and Pentium and Alpha Beowulf systems.

CPUs    Cray          IBM           Pentium       Alpha
        T3E/1200E     SP/WH2-375    Beowulf II    Beowulf III
                                    (LAM/MPI)     (QsNet)

wing body benchmark (298,244 elements)
  4        510         120.6          468            95
  8        280          68.1          247            50
 16        160          43.4          136            31
 32        100          30.9           79            20

F18 benchmark (3,444,350 elements)
  4       5040        1177           4630           950
  8       2620         581           2360           478
 16       1350         306           1260           263
 32        720         172            690           150
 64        415           -              -             -
128        250           -              -             -
These benchmarks provide further compelling evidence of the value of the Beowulf clusters, and of the limited performance of the Cray EV56 node. Focusing on the larger F18 benchmark, we see that although the Cray scales well (a speedup of 81 on 128 nodes, relative to an ideal single node extrapolated from the 4-CPU timing), the Pentium cluster outperforms the Cray T3E/1200E at all node counts; on 32 nodes Beowulf II delivers 104% of Cray T3E performance. This figure increases substantially on the more powerful CPUs of the IBM SP/WH2-375 and the Alpha cluster. The Linux Alpha Beowulf III outperforms the 32-node Cray T3E by a factor of 4.8, the 32-node Alpha time being significantly faster than that recorded on 128 nodes of the Cray. The relative performance of the IBM SP/WH2 is also impressive: while slower than the Alpha cluster, the 32-CPU SP timing is again significantly faster than that recorded on 128 nodes of the Cray. Although the code was originally developed for the Cray, these results strongly suggest that the individual node performance of the T3E is far from optimal.
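The figures quoted above follow directly from the F18 timings in Table 1; a short sketch of the arithmetic (times in seconds, transcribed from the table):

```python
# F18 benchmark wall-clock times (seconds) from Table 1.
cray_t3e = {4: 5040, 8: 2620, 16: 1350, 32: 720, 64: 415, 128: 250}
pentium  = {4: 4630, 8: 2360, 16: 1260, 32: 690}
alpha    = {4: 950,  8: 478,  16: 263,  32: 150}

# Speedup relative to an ideal single node, extrapolated from the
# 4-CPU timing: S(p) = 4 * T(4) / T(p).
cray_speedup_128 = 4 * cray_t3e[4] / cray_t3e[128]        # ~81

# The Pentium cluster is faster at every common node count...
pentium_faster = all(pentium[p] < cray_t3e[p] for p in pentium)

# ...and the 32-node Alpha beats even 128 nodes of the Cray.
alpha_factor = cray_t3e[32] / alpha[32]                   # 4.8
alpha_beats_128 = alpha[32] < cray_t3e[128]
```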



Applications Performance: SUMMARY




M.F. Guest

CLRC Daresbury Laboratory, Daresbury, Warrington, WA4 4AD.


We summarise the conclusions of the benchmarking exercise on the applications reported in a number of separate articles. Table 1 shows the percentage of a 32-node partition of the Cray T3E/1200E delivered by both the Pentium-based Beowulf II and the Alpha-based Beowulf III systems, i.e. 100 × T(32-node Cray T3E) / T(32-node Beowulf).
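Read concretely, the metric is just a ratio of 32-node wall-clock times, with values above 100% meaning the cluster was faster than the Cray. Using the F18 timings from the FLITE3D article as a worked example:

```python
def pct_of_cray(t_cray_32, t_cluster_32):
    """Percentage of 32-node Cray T3E/1200E performance delivered by a
    32-node cluster; >100% means the cluster beat the Cray."""
    return 100.0 * t_cray_32 / t_cluster_32

# F18 benchmark, 32 nodes: Cray 720 s, Pentium Beowulf II 690 s,
# Alpha Beowulf III 150 s - reproducing the FLITE3D row of Table 1.
flite3d_pentium = pct_of_cray(720, 690)   # ~104%
flite3d_alpha   = pct_of_cray(720, 150)   # 480%
```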


These figures suggest the following:




  1. In many of the applications the inexpensive Pentium-based system with a simple Fast Ethernet interconnect delivers a significant fraction of Cray T3E performance. While applications with extensive communication demands clearly exhibit inferior performance and scalability on the PC-based system (e.g. CASTEP, DL_POLY with bond constraints, direct-MP2 gradient calculations using GAMESS-UK), the delivered performance is at worst 34% of the Cray T3E/1200E. Many of the other applications show a much higher delivered level of performance; in a few cases the Beowulf cluster actually equals or exceeds Cray performance (e.g. CRYSTAL, Ewald-based DL_POLY, CHARMM, FLITE3D). In these cases it makes little sense to use the T3E for 32-node runs when equivalent performance is achieved by a solution costing a tiny fraction of the high-end machine.




  2. While there are a number of performance issues associated with the QsNet Linux Alpha cluster that are the subject of ongoing study (notably the limited memory bandwidth of the UP2000, and the effective utilisation of L2 cache, the so-called "page colouring" issue under Linux), initial results from the Alpha cluster are most encouraging. In all benchmarks the 32-CPU cluster exceeds the performance of 64 nodes of the Cray T3E/1200E (and that of the 32-CPU IBM SP Winterhawk-2). In optimal cases (those marked with a § in the table) the Linux cluster outperforms 128 nodes of the Cray T3E/1200E. This provides fairly compelling evidence that suitably configured Beowulf systems can provide not only highly cost-effective departmental, mid-range solutions, but can also match the performance of a significant fraction of a high-end MPP machine, again for a small fraction of the cost.




Table 1: Application performance: percentage of a 32-node partition of the Cray T3E/1200E achieved by the 32-node Pentium Beowulf II and the 32-node Alpha Beowulf III.

Code                     Pentium-III        Linux Alpha EV67/667
                         Beowulf II (%)     Beowulf III (%)

GAMESS-UK
  SCF                    53-69              202
  DFT                    65-85              255-326 (§)
  DFT (Jfit)             43-77              174-226
  DFT Gradient           90                 340 (§)
  MP2 Gradient           44                 -
  SCF Forces             80                 -

CRYSTAL                  145                349 (§)

NWChem (DFT Jfit)        56-77              296-322

REALC                    67                 -

DL_POLY
  Ewald-based            95-107             352-447 (§)
  bond constraints       34-56              143-260

CHARMM                   96                 318 (§)

CASTEP                   34                 -
CPMD                     62                 -

ANGUS                    60                 250
FLITE3D                  104                480 (§)






