{"id":240,"date":"2011-09-11T01:05:23","date_gmt":"2011-09-11T05:05:23","guid":{"rendered":"http:\/\/zhanxw.com\/blog\/?p=240"},"modified":"2011-09-11T15:41:31","modified_gmt":"2011-09-11T19:41:31","slug":"%e5%8a%a0%e9%80%9fr%e7%9a%84%e7%9f%a9%e9%98%b5%e8%bf%90%e7%ae%97speed-up-r-matrix-computation","status":"publish","type":"post","link":"https:\/\/zhanxw.com\/blog\/2011\/09\/%e5%8a%a0%e9%80%9fr%e7%9a%84%e7%9f%a9%e9%98%b5%e8%bf%90%e7%ae%97speed-up-r-matrix-computation\/","title":{"rendered":"Speed up R matrix computation"},"content":{"rendered":"<p>Speed up R matrix computation with minimal effort.<\/p>\n<p>There are two ways to speed up R:<br \/>\n1. Use the Intel compiler<br \/>\n2. Use a faster matrix computation library<\/p>\n<p>I did not see a significant speedup with the first method, so this post covers the second, which sped up most matrix operations in the benchmarks below by at least 2x.<br \/>\nI used R-2.13.1 with GotoBLAS as the matrix library.<br \/>\nAccording to this thread,<br \/>\nhttp:\/\/r.789695.n4.nabble.com\/configure-can-t-find-dgemm-in-MKL10-td920212.html<br \/>\nGotoBLAS is faster than Intel MKL. Reportedly, GotoBLAS is also faster than ATLAS.<\/p>\n<p>The steps are as follows:<br \/>\n(1) Create a shell script:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nexport FFLAGS=&quot;-march=native -O3&quot;\r\nexport CFLAGS=&quot;-march=native -O3 -DMKL_ILP64&quot;\r\nexport CXXFLAGS=&quot;-march=native -O3 -DMKL_ILP64&quot;\r\nexport FCFLAGS=&quot;-march=native -O3&quot;\r\n\r\n.\/configure --enable-R-shlib --enable-BLAS-shlib --with-blas --with-lapack 
--prefix=\/net\/dumbo\/home\/zhanxw\/software\/Rmkl\r\n<\/pre>\n<p>Then build and install with make and make install.<br \/>\n(2) Download GotoBLAS and run &#8216;make&#8217; in its source directory; the resulting BLAS library file is named &#8216;libgoto2.so&#8217;.<br \/>\n(3) Create a symbolic link. The R installation directory, e.g. \/lib64\/R\/lib, already contains R&#8217;s default BLAS shared library, libRblas.so; replace it with a symbolic link to libgoto2.so.<\/p>\n<p>After these 3 steps, R uses GotoBLAS as its matrix library. On our server, the benchmark results are as follows:<br \/>\n# GCC + default BLAS<\/p>\n<pre class=\"brush: plain; highlight: [13,23]; title: ; notranslate\" title=\"\">\r\n   R Benchmark 2.5\r\n   ===============\r\nNumber of times each test is run__________________________:  3\r\n\r\n   I. Matrix calculation\r\n   ---------------------\r\nCreation, transp., deformation of a 2500x2500 matrix (sec):  0.764666666666666\r\n2400x2400 normal distributed random matrix ^1000____ (sec):  0.596666666666666\r\nSorting of 7,000,000 random values__________________ (sec):  0.833333333333333\r\n2800x2800 cross-product matrix (b = a' * a)_________ (sec):  4.425\r\nLinear regr. over a 3000x3000 matrix (c = a \\ b')___ (sec):  2.30366666666667\r\n                      --------------------------------------------\r\n                 Trimmed geom. mean (2 extremes eliminated):  1.13650194597564\r\n\r\n   II. 
Matrix functions\r\n   --------------------\r\nFFT over 2,400,000 random values____________________ (sec):  0.778666666666666\r\nEigenvalues of a 640x640 random matrix______________ (sec):  1.406\r\nDeterminant of a 2500x2500 random matrix____________ (sec):  2.28733333333334\r\nCholesky decomposition of a 3000x3000 matrix________ (sec):  2.02366666666667\r\nInverse of a 1600x1600 random matrix________________ (sec):  1.933\r\n                      --------------------------------------------\r\n                Trimmed geom. mean (2 extremes eliminated):  1.76516531172197\r\n\r\n   III. Programmation\r\n   ------------------\r\n3,500,000 Fibonacci numbers calculation (vector calc)(sec):  1.06166666666667\r\nCreation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.601666666666669\r\nGrand common divisors of 400,000 pairs (recursion)__ (sec):  2.56866666666667\r\nCreation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.757666666666661\r\nEscoufier's method on a 45x45 matrix (mixed)________ (sec):  0.595000000000013\r\n                      --------------------------------------------\r\n                Trimmed geom. mean (2 extremes eliminated):  0.785128552514896\r\n\r\n\r\nTotal time for all 15 tests_________________________ (sec):  22.9366666666667\r\nOverall mean (sum of I, II and III trimmed means\/3)_ (sec):  1.16349747864837\r\n                      --- End of test ---\r\n<\/pre>\n<p># GCC + GotoBLAS(GCC)<\/p>\n<pre class=\"brush: plain; highlight: [13,23]; title: ; notranslate\" title=\"\">\r\n  R Benchmark 2.5\r\n   ===============\r\nNumber of times each test is run__________________________:  3\r\n\r\n   I. 
Matrix calculation\r\n   ---------------------\r\nCreation, transp., deformation of a 2500x2500 matrix (sec):  0.776333333333333\r\n2400x2400 normal distributed random matrix ^1000____ (sec):  0.597\r\nSorting of 7,000,000 random values__________________ (sec):  0.838\r\n2800x2800 cross-product matrix (b = a' * a)_________ (sec):  0.376333333333333\r\nLinear regr. over a 3000x3000 matrix (c = a \\ b')___ (sec):  0.293\r\n                      --------------------------------------------\r\n                 Trimmed geom. mean (2 extremes eliminated):  0.558725402933605\r\n\r\n   II. Matrix functions\r\n   --------------------\r\nFFT over 2,400,000 random values____________________ (sec):  0.785666666666668\r\nEigenvalues of a 640x640 random matrix______________ (sec):  2.092\r\nDeterminant of a 2500x2500 random matrix____________ (sec):  0.303666666666667\r\nCholesky decomposition of a 3000x3000 matrix________ (sec):  0.292999999999999\r\nInverse of a 1600x1600 random matrix________________ (sec):  0.396333333333331\r\n                      --------------------------------------------\r\n                Trimmed geom. mean (2 extremes eliminated):  0.455580734019386\r\n\r\n   III. Programmation\r\n   ------------------\r\n3,500,000 Fibonacci numbers calculation (vector calc)(sec):  1.07166666666667\r\nCreation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.608999999999999\r\nGrand common divisors of 400,000 pairs (recursion)__ (sec):  2.848\r\nCreation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.675666666666665\r\nEscoufier's method on a 45x45 matrix (mixed)________ (sec):  0.591000000000001\r\n                      --------------------------------------------\r\n                Trimmed geom. 
mean (2 extremes eliminated):  0.761149272082565\r\n\r\n\r\nTotal time for all 15 tests_________________________ (sec):  12.5466666666667\r\nOverall mean (sum of I, II and III trimmed means\/3)_ (sec):  0.578643662905733\r\n                      --- End of test ---\r\n\r\n\r\n<\/pre>\n<p># ICC + built-in BLAS<\/p>\n<pre class=\"brush: plain; highlight: [13,23]; title: ; notranslate\" title=\"\">\r\n   R Benchmark 2.5\r\n   ===============\r\nNumber of times each test is run__________________________:  3\r\n\r\n   I. Matrix calculation\r\n   ---------------------\r\nCreation, transp., deformation of a 2500x2500 matrix (sec):  0.722333333333333\r\n2400x2400 normal distributed random matrix ^1000____ (sec):  0.398\r\nSorting of 7,000,000 random values__________________ (sec):  0.853333333333333\r\n2800x2800 cross-product matrix (b = a' * a)_________ (sec):  23.2723333333333 \r\nLinear regr. over a 3000x3000 matrix (c = a \\ b')___ (sec):  9.48066666666666 \r\n                      --------------------------------------------\r\n                 Trimmed geom. mean (2 extremes eliminated):  1.80121303632586 \r\n\r\n   II. Matrix functions\r\n   --------------------\r\nFFT over 2,400,000 random values____________________ (sec):  0.919666666666667 \r\nEigenvalues of a 640x640 random matrix______________ (sec):  1.01100000000001 \r\nDeterminant of a 2500x2500 random matrix____________ (sec):  4.84600000000001 \r\nCholesky decomposition of a 3000x3000 matrix________ (sec):  3.71033333333332 \r\nInverse of a 1600x1600 random matrix________________ (sec):  6.53100000000001 \r\n                      --------------------------------------------\r\n                Trimmed geom. mean (2 extremes eliminated):  2.62935462784594 \r\n\r\n   III. 
Programmation\r\n   ------------------\r\n3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.825333333333333 \r\nCreation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.588666666666654 \r\nGrand common divisors of 400,000 pairs (recursion)__ (sec):  2.65866666666667 \r\nCreation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.665000000000001 \r\nEscoufier's method on a 45x45 matrix (mixed)________ (sec):  0.55400000000003 \r\n                      --------------------------------------------\r\n                Trimmed geom. mean (2 extremes eliminated):  0.686183322572556 \r\n\r\n\r\nTotal time for all 15 tests_________________________ (sec):  57.0363333333334 \r\nOverall mean (sum of I, II and III trimmed means\/3)_ (sec):  1.4812151139281 \r\n                      --- End of test ---\r\n\r\n<\/pre>\n<p># ICC + GotoBLAS(ICC)<\/p>\n<pre class=\"brush: plain; highlight: [13,23]; title: ; notranslate\" title=\"\">\r\n   R Benchmark 2.5\r\n   ===============\r\nNumber of times each test is run__________________________:  3\r\n\r\n   I. Matrix calculation\r\n   ---------------------\r\nCreation, transp., deformation of a 2500x2500 matrix (sec):  0.738666666666667\r\n2400x2400 normal distributed random matrix ^1000____ (sec):  0.388000000000001\r\nSorting of 7,000,000 random values__________________ (sec):  0.857333333333333 \r\n2800x2800 cross-product matrix (b = a' * a)_________ (sec):  0.633333333333333 \r\nLinear regr. over a 3000x3000 matrix (c = a \\ b')___ (sec):  0.537666666666667 \r\n                      --------------------------------------------\r\n                 Trimmed geom. mean (2 extremes eliminated):  0.631245051729315 \r\n\r\n   II. 
Matrix functions\r\n   --------------------\r\nFFT over 2,400,000 random values____________________ (sec):  0.938333333333333 \r\nEigenvalues of a 640x640 random matrix______________ (sec):  5.53166666666667 \r\nDeterminant of a 2500x2500 random matrix____________ (sec):  0.957666666666666 \r\nCholesky decomposition of a 3000x3000 matrix________ (sec):  0.601000000000001 \r\nInverse of a 1600x1600 random matrix________________ (sec):  1.741 \r\n                      --------------------------------------------\r\n                Trimmed geom. mean (2 extremes eliminated):  1.16088739499808 \r\n\r\n   III. Programmation\r\n   ------------------\r\n3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.813 \r\nCreation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.591333333333334 \r\nGrand common divisors of 400,000 pairs (recursion)__ (sec):  2.663 \r\nCreation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.669333333333332 \r\nEscoufier's method on a 45x45 matrix (mixed)________ (sec):  4.883 \r\n                      --------------------------------------------\r\n                Trimmed geom. 
mean (2 extremes eliminated):  1.13162201708511 \r\n\r\n\r\nTotal time for all 15 tests_________________________ (sec):  22.5443333333333 \r\nOverall mean (sum of I, II and III trimmed means\/3)_ (sec):  0.939499363744844 \r\n                      --- End of test ---\r\n<\/pre>\n<p>Comparing the 4 combinations of GCC\/ICC with R&#8217;s built-in BLAS\/GotoBLAS, GCC + GotoBLAS is the fastest on our server.<\/p>\n<p>Note:<br \/>\nLAPACK is built on top of BLAS, so libRlapack.so does not need to change. You can verify this with &#8216;nm -g libRlapack.so&#8217;, which lists dgemm_ as &#8216;U&#8217; (undefined, meaning the function is not implemented in that file), while &#8216;ldd libRlapack.so&#8217; shows that it depends on libRblas.so.<\/p>\n<p>Other resources:<\/p>\n<ul>\n<a href=\"http:\/\/psyccomputing.blogspot.com\/2010\/04\/compiling-64-bit-r-2101-with-mkl-in.html\" title=\"Compiling 64-bit R 2.10 with the Intel compiler\">Compiling 64-bit R 2.10 with the Intel compiler<\/a>\n<\/ul>\n<ul>\n<a href=\"http:\/\/www.rd.dnc.ac.jp\/~otsu\/lecture\/RwithMKL.html\" title=\"Compiling R with Intel compiler version 11\">Compiling R with Intel compiler version 11<\/a>\n<\/ul>\n<ul>\n<a href=\"http:\/\/software.intel.com\/en-us\/articles\/intel-mkl-link-line-advisor\/\" title=\"How to link the Intel MKL library\">How to link the Intel MKL library<\/a>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Speed up R matrix computation with minimal effort. There are two ways to speed up R: 1. Use the Intel compiler 2. 
Use a faster matrix computation library. I did not see a significant speedup with the first method, so this post covers the second, which sped up most matrix operations in my benchmarks by at least 2x. I used R-2.13.1 with GotoBLAS as the matrix library. According to this thread, http:\/\/r.789695.n4.nabble.com\/configure-can-t-find-dgemm-in-MKL10-td920212.html GotoBLAS is faster than Intel MKL. Reportedly, GotoBLAS is also faster than ATLAS. The steps are as follows: (1) Create a shell script: export FFLAGS=&quot;-march=native -O3&quot; export CFLAGS=&quot;-march=native -O3 -DMKL_ILP64&quot; export CXXFLAGS=&quot;-march=native -O3 -DMKL_ILP64&quot; export FCFLAGS=&quot;-march=native -O3&quot; .\/configure --enable-R-shlib --enable-BLAS-shlib --with-blas --with-lapack --prefix=\/net\/dumbo\/home\/zhanxw\/software\/Rmkl Then build and install with make and make install. (2) Download GotoBLAS and run &#8216;make&#8217; in its source directory; the resulting BLAS library file is named &#8216;libgoto2.so&#8217;. (3) Create a symbolic link. The R installation directory, e.g. 
\/lib64\/R\/lib, already contains R&#8217;s default BLAS shared library, libRblas.so; replace it with a symbolic link to libgoto2.so. After these 3 steps, R uses GotoBLAS as its matrix library. On our server, the benchmark results are as follows: # GCC + default BLAS [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[64,59,65,62,45],"class_list":["post-240","post","type-post","status-publish","format-standard","hentry","category-sysadmin","tag-benchmark","tag-blas","tag-gotoblas","tag-icc","tag-r"],"_links":{"self":[{"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/posts\/240","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/comments?post=240"}],"version-history":[{"count":0,"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/posts\/240\/revisions"}],"wp:attachment":[{"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/media?parent=240"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/categories?post=240"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/tags?post=240"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}