Performance - MATLAB + CUDA slow in solving matrix-vector equation A*x=B
I am calculating the equation a*x=b, where a is a matrix, b is a vector, and x is the unknown (answer) vector.
Hardware specs: Intel i7 3630QM (4 cores), NVIDIA GeForce GT 640M (384 CUDA cores).
Here's an example:
>> a=rand(5000);
>> b=rand(5000,1);
>> agpu=gpuArray(a);
>> bgpu=gpuArray(b);
>> tic;a\b;toc;
Elapsed time is 1.382281 seconds.
>> tic;agpu\bgpu;toc;
Elapsed time is 4.775395 seconds.
Somehow the GPU is much slower... Why? It is also slower in fft, inv, and lu calculations, which should be related to matrix division.
However, the GPU is much faster in matrix multiplication (on the same data):
>> tic;a*b;toc;
Elapsed time is 0.014700 seconds.
>> tic;agpu*bgpu;toc;
Elapsed time is 0.000505 seconds.
The main question is: why is the GPU a\b (mldivide) so slow compared to the CPU?
Updated
Here are more results when a, b (on the CPU) and aa, bb (on the GPU) are rand(5000):
>> tic;fft(a);toc;
Elapsed time is *0.117189* seconds.
>> tic;fft(aa);toc;
Elapsed time is 1.062969 seconds.
>> tic;fft(aa);toc;
Elapsed time is 0.542242 seconds.
>> tic;fft(aa);toc;
Elapsed time is *0.229773* seconds.
>> tic;fft(aa);toc;
The bold times are the stable times. The GPU is about twice as slow. By the way, why is the GPU even slower on the first two attempts? Is it compiled twice at first?
In addition:
>> tic;sin(a);toc;
Elapsed time is *0.121008* seconds.
>> tic;sin(aa);toc;
Elapsed time is 0.020448 seconds.
>> tic;sin(aa);toc;
Elapsed time is 0.157209 seconds.
>> tic;sin(aa);toc;
Elapsed time is *0.000419* seconds.
After two calculations the GPU is incredibly faster in sin calculations.
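The timings above wrap tic/toc directly around GPU calls. GPU operations in MATLAB may return before the device has finished its work, and the first call to a function can carry one-time initialisation costs, so numbers measured this way can be misleading in either direction. A minimal sketch of a more careful measurement, assuming Parallel Computing Toolbox with a supported CUDA device (the names tCpu and tGpu are illustrative, not from the original post):

```matlab
% Sketch: compare CPU and GPU fft with warm-up and synchronisation.
% Assumes Parallel Computing Toolbox and a supported CUDA GPU.
a  = rand(5000);
aa = gpuArray(a);            % transfer once, outside the timed region

fft(aa);                     % warm-up call absorbs one-time setup cost
wait(gpuDevice);             % block until the GPU has finished

tCpu = timeit(@() fft(a));       % repeated CPU measurement
tGpu = gputimeit(@() fft(aa));   % waits for the GPU before stopping the clock

fprintf('CPU: %.4f s, GPU: %.4f s\n', tCpu, tGpu);
```

The same pattern can be applied to a\b, sin, and a*b by swapping the function handle. If tic/toc is preferred, calling wait(gpuDevice) just before toc should give comparable results.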
So, still, why is the GPU so slow in matrix division, fft, and similar calculations, though it is so fast in matrix multiplication and trigonometry? Actually, the question should not be put like that... The GPU should be faster in all these calculations, because MATLAB has released overloaded functions (mldivide, fft) for the GPU.
Could you help me solve these issues, please? :)
Please read how MATLAB calculates the solutions. It will help you understand why the GPU is slower.
I'll try to explain it in a few words.
a*x=b becomes l*(u*x)=b with u*x=y, where l*u=a.
So MATLAB first computes l*u. (As far as I know, this process cannot be done fully in parallel; only some of its steps can be done in parallel, due to their nature.)
Then MATLAB solves l*y=b and finds y. (This process cannot be done in parallel, as each step requires information from the previous one.)
Then MATLAB solves u*x=y and finds x. (This process cannot be done in parallel either, as each step requires information from the previous one.)
So, as the GPU clock is slower than the CPU clock, and since these processes cannot be done in parallel, the CPU is faster. And no, unless you come up with an improved method (good luck!), the GPU will be slower except in some very specific cases.
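The three steps above can be sketched on a small CPU example. This is only an illustration of the dependency chain, not MATLAB's internal implementation: it assumes the three-output form of lu, which returns a permutation matrix P with P*A = L*U (row pivoting, which the short description above leaves out), and the explicit loops are hypothetical stand-ins for the triangular solves.

```matlab
% Sketch: solve A*x = b via LU factorisation and two triangular solves.
n = 4;
A = rand(n) + n*eye(n);      % diagonally dominant, so well conditioned
b = rand(n, 1);

[L, U, P] = lu(A);           % P*A = L*U

% Forward substitution: solve L*y = P*b.
% y(i) needs y(1), ..., y(i-1), so the rows are processed in order.
pb = P*b;
y = zeros(n, 1);
for i = 1:n
    y(i) = (pb(i) - L(i, 1:i-1) * y(1:i-1)) / L(i, i);
end

% Back substitution: solve U*x = y.
% x(i) needs x(i+1), ..., x(n), so the rows are processed in reverse order.
x = zeros(n, 1);
for i = n:-1:1
    x(i) = (y(i) - U(i, i+1:n) * x(i+1:n)) / U(i, i);
end

disp(norm(A*x - b));         % residual, should be close to zero
disp(norm(x - A\b));         % difference from mldivide, should be close to zero
```

Each loop iteration reads values written by earlier iterations, which is the sequential dependency the answer refers to. The dot product inside a single iteration is the only part of these two loops that can be spread across many cores.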
performance matlab matrix cuda linear-algebra