performance - Matlab + CUDA slow in solving matrix-vector equation A*x=B




I am solving the equation A*x=b, where A is a matrix, b is a vector, and x is the unknown (answer) vector.

Hardware specs: Intel i7 3630QM (4 cores), NVIDIA GeForce GT 640M (384 CUDA cores).

Here's an example:

>> a=rand(5000);
>> b=rand(5000,1);
>> agpu=gpuArray(a);
>> bgpu=gpuArray(b);
>> tic;a\b;toc;
Elapsed time is 1.382281 seconds.
>> tic;agpu\bgpu;toc;
Elapsed time is 4.775395 seconds.

Somehow the GPU is much slower... why? It is also slower in fft, inv, and lu calculations, which should be related to matrix division.

However, the GPU is much faster at matrix multiplication (on the same data):

>> tic;a*b;toc;
Elapsed time is 0.014700 seconds.
>> tic;agpu*bgpu;toc;
Elapsed time is 0.000505 seconds.

The main question is: why is the GPU A\b (mldivide) so slow compared to the CPU?

Updated

Here are more results, where a, b (on the CPU) and aa, bb (on the GPU) are rand(5000):

>> tic;fft(a);toc;
Elapsed time is *0.117189* seconds.
>> tic;fft(aa);toc;
Elapsed time is 1.062969 seconds.
>> tic;fft(aa);toc;
Elapsed time is 0.542242 seconds.
>> tic;fft(aa);toc;
Elapsed time is *0.229773* seconds.
>> tic;fft(aa);toc;

The bold times are the stable times. The GPU is about twice as slow. By the way, why is the GPU even slower on the first two attempts? Is the code compiled twice at first?

In addition:

>> tic;sin(a);toc;
Elapsed time is *0.121008* seconds.
>> tic;sin(aa);toc;
Elapsed time is 0.020448 seconds.
>> tic;sin(aa);toc;
Elapsed time is 0.157209 seconds.
>> tic;sin(aa);toc;
Elapsed time is *0.000419* seconds.

After two calculations the GPU is incredibly faster at sin calculations.

So, still, why is the GPU so slow in matrix division, fft, and similar calculations, though it is so fast in matrix multiplication and trigonometry? Actually, the question should not be put like that... The GPU should be faster in all of these calculations, because MATLAB has released overloaded functions (mldivide, fft) for the GPU.

Could someone help me solve these issues, please? :)

Please read how MATLAB calculates the solution. It will help you understand why the GPU is slower.

I'll try to explain it in a few words.

A*x=b becomes L*(U*x)=b, where L*U=A and y=U*x.

So MATLAB first computes L*U (as far as I know this process as a whole cannot be done in parallel, although some of its steps can be, due to their nature). Then MATLAB solves L*y=b and finds y (this process cannot be done in parallel, since each step requires information from the previous one). Then MATLAB solves U*x=y and finds x (this process cannot be done in parallel either, since each step requires information from the previous one).
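The three steps above can be sketched on the CPU as follows. This is a minimal illustration, assuming a general dense square A for which a pivoted LU factorization is used; the matrix size, the diagonal shift, and the variable names pb and relErr are made up for the example. The explicit loops are only there to show the dependency between steps and are not how MATLAB implements the solves internally.

```matlab
% Small test system (size and values are arbitrary, for illustration only).
n = 5;
A = rand(n) + n*eye(n);   % diagonal shift keeps the example well conditioned
b = rand(n, 1);

% Step 1: factor with partial pivoting, so that P*A = L*U.
[L, U, P] = lu(A);
pb = P*b;

% Step 2: forward substitution for L*y = P*b.
% y(i) needs y(1), ..., y(i-1), so the rows are processed in order.
y = zeros(n, 1);
for i = 1:n
    y(i) = (pb(i) - L(i, 1:i-1) * y(1:i-1)) / L(i, i);
end

% Step 3: back substitution for U*x = y.
% x(i) needs x(i+1), ..., x(n), so the rows are processed in reverse order.
x = zeros(n, 1);
for i = n:-1:1
    x(i) = (y(i) - U(i, i+1:n) * x(i+1:n)) / U(i, i);
end

% Compare with the built-in solver.
relErr = norm(x - A\b) / norm(x);
fprintf('relative difference from A\\b: %g\n', relErr);
```

Within one row the dot product can still be computed in parallel, but row i cannot start until the rows it depends on are finished, which is the dependency described above.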

So the GPU clock is slower than the CPU clock, and since these processes cannot be done in parallel, the CPU is faster. And no, unless you come up with a better method (good luck!), the GPU will be slower except in some specific cases.
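To check this on your own hardware, note that gpuArray operations can run asynchronously, so tic/toc may stop the clock before the GPU has finished, and the first calls carry extra start-up overhead. A minimal sketch of a fairer comparison, assuming the Parallel Computing Toolbox and a supported GPU are available; the size n = 2000 is an arbitrary choice, smaller than the 5000 used in the question, to keep the run short:

```matlab
% Sketch only: assumes Parallel Computing Toolbox and a supported GPU.
n = 2000;
A = rand(n);
b = rand(n, 1);

% timeit calls the function several times and returns a typical time,
% which reduces the effect of first-call overhead.
tCpu = timeit(@() A\b);
fprintf('CPU A\\b: %.4f s\n', tCpu);

try
    Ag = gpuArray(A);
    bg = gpuArray(b);
    % gputimeit waits for the GPU to finish before stopping the clock.
    tGpu = gputimeit(@() Ag\bg);
    fprintf('GPU A\\b: %.4f s\n', tGpu);
catch err
    % No toolbox or no usable GPU: report it and carry on.
    fprintf('GPU timing skipped: %s\n', err.message);
end
```

If tic/toc is preferred, calling wait(gpuDevice) before toc should have a similar effect on the GPU side.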

performance matlab matrix cuda linear-algebra
