From - Fri Sep 20 19:40:34 1996
Path: news.unizar.es!news.rediris.es!news.uoregon.edu!enews.sgi.com!news.corp.sgi.com!walter.cray.com!fido.asd.sgi.com!news
From: "Omar G. Stradella"
Newsgroups: comp.sys.sgi.misc
Subject: Re: Performance differences between f90 and cc
Date: Wed, 18 Sep 1996 11:26:17 -0400
Organization: Silicon Graphics, Inc.
Lines: 103
Message-ID: <32401499.41C6@boston.sgi.com>
References: <323FD7D3.41C6@na.uni-tuebingen.de>
NNTP-Posting-Host: arkham.boston.sgi.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 3.0C-SGI (X11; I; IRIX 6.2 IP22)

Add -OPT:alias=restrict to the cc compile line.

The problem is that the C compiler has no way of telling that the
memory regions pointed to by x and y do not overlap. In Fortran,
argument aliasing is forbidden by the standard.

Omar.

Eberhard Pasch wrote:
>
> We have an Indigo2 with an R8000 and Irix 6.2.
>
> The FORTRAN compiler seems to be much better than the
> C compiler. Please take a look at the following example.
> The code out of f90 is FOUR times faster than the code
> produced by the cc compiler.
> I've tried many of the cc optimization options without
> success.
>
> FORTRAN source
>
>       subroutine daxpy(n,da,dx,incx,dy,incy)
>
>       double precision dx(*),dy(*),da
>       integer i,incx,incy,n
>
>       do 10 i = 1,n,1
>          dy(i) = dy(i) + da*dx(i)
>    10 continue
>       return
>       end
>
> out of .s file: (f90 -S -c -O3)
>  #
>  # Pipelined loop line 6 steady state
>  #
>  #    4 unrollings before pipelining
>  #    6 cycles per 4 iterations
>  #    8 flops        ( 33% of peak) (madds count as 2)
>  #    4 flops        ( 33% of peak) (madds count as 1)
>  #    4 madds        ( 33% of peak)
>  #   12 mem refs     (100% of peak)
>  #    2 integer ops  ( 16% of peak)
>  #   18 instructions ( 75% of peak)
>  #    1 short trip threshold
>  #    7 ireg registers used.
>  #   11 fgr registers used.
>  #
>
> This version uses 100% of the processor performance.
>
> C source
>
> void daxpyc(int n, double a, double *x, double *y)
> {
>   int i;
>   for (i=0; i<n; i++)
>     y[i]=y[i]+a*x[i];
> }
>
> out of .s file: (cc -S -c -O3)
>  #
>  # Pipelined loop line 4 steady state
>  #
>  #    4 unrollings before pipelining
>  #   24 cycles per 4 iterations
>  #    8 flops        (  8% of peak) (madds count as 2)
>  #    4 flops        (  8% of peak) (madds count as 1)
>  #    4 madds        (  8% of peak)
>  #   12 mem refs     ( 25% of peak)
>  #    2 integer ops  (  4% of peak)
>  #   18 instructions ( 18% of peak)
>  #    1 short trip threshold
>  #    7 ireg registers used.
>  #    4 fgr registers used.
>  #
>  #    6 min cycles required for resources
>  #   24 cycles required for recurrence(s)
>  #   12 operations in largest recurrence
>  #
>
> This version uses only 25% of the processor.
>
> Has anyone else made this observation?
> What are the correct options for the cc compiler?
>
> Any help is appreciated.
>
> Eberhard Pasch
> --
> ************************************
> * Eberhard Pasch                   *
> * EMail: pasch@na.uni-tuebingen.de *
> ************************************

--
+-----------------------------------------------------------------------+
 Omar G. Stradella, Ph.D.                Supercomputing Applications
 Silicon Graphics, Inc.                  Computational Chemistry
 One Cabot Road, Hudson, MA 01749, USA   N 42 22'41" W 71 33'45"
 E-mail: omar@boston.sgi.com             Phone: +1-508-567-2258
 FAX: +1-508-562-4755
 http://www.sgi.com/ChemBio              http://reality.sgi.com/omar
+-------- Ph-nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn ---------+
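A minimal C sketch of the aliasing problem, assuming a C99 or later compiler. The "24 cycles required for recurrence(s)" line in the cc listing is most likely the cost of the overlap assumption described in the reply: if y could start one element after x, each iteration would read the value the previous iteration just stored, so the compiler has to keep loads and stores in order. C99's `restrict` qualifier states the same no-overlap promise in the source, per pointer, instead of on the compile line. daxpyc is the function from the post; daxpy_restrict, daxpy_loads_first and MODEL_MAX are hypothetical names used only for illustration, and daxpy_loads_first is a toy model of a reordered schedule, not actual compiler output.

```c
#include <assert.h>

/* The loop from the quoted post. Without more information the compiler
   must assume x and y may overlap, so a store to y[i] might change a
   value that a later iteration loads through x. */
void daxpyc(int n, double a, double *x, double *y)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] = y[i] + a * x[i];
}

/* C99 and later only. 'restrict' promises that no element written
   through y is also read through x during the call, the same guarantee
   Fortran gives for dummy arguments. Calling this with overlapping
   arrays is undefined behavior. */
void daxpy_restrict(int n, double a, const double *restrict x,
                    double *restrict y)
{
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Hypothetical model of an aggressive schedule (illustration only):
   load every x[i] and y[i] first, then do all the stores. It matches
   daxpyc only when x and y do not overlap. */
#define MODEL_MAX 8
void daxpy_loads_first(int n, double a, const double *x, double *y)
{
    double xs[MODEL_MAX], ys[MODEL_MAX];
    assert(n >= 0 && n <= MODEL_MAX); /* small fixed scratch buffers */
    for (int i = 0; i < n; i++) {
        xs[i] = x[i];
        ys[i] = y[i];
    }
    for (int i = 0; i < n; i++)
        y[i] = ys[i] + a * xs[i];
}
```

With buf = {1, 1, 1, 1, 1}, daxpyc(4, 1.0, buf, buf + 1) leaves {1, 2, 3, 4, 5}, while daxpy_loads_first on a fresh copy leaves {1, 2, 2, 2, 2}; on disjoint arrays all three functions agree. That difference is why the compiler cannot reorder the original loop on its own, and the undefined behavior of overlapping calls to daxpy_restrict is exactly the freedom the qualifier hands to the optimizer.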