{"id":250,"date":"2011-09-11T15:11:01","date_gmt":"2011-09-11T19:11:01","guid":{"rendered":"http:\/\/zhanxw.com\/blog\/?p=250"},"modified":"2011-10-05T22:27:31","modified_gmt":"2011-10-06T02:27:31","slug":"%e4%bd%bf%e7%94%a8intel-compiler-suite%e5%92%8cintel-mkl%e7%bc%96%e8%af%9164%e4%bd%8dr","status":"publish","type":"post","link":"https:\/\/zhanxw.com\/blog\/2011\/09\/%e4%bd%bf%e7%94%a8intel-compiler-suite%e5%92%8cintel-mkl%e7%bc%96%e8%af%9164%e4%bd%8dr\/","title":{"rendered":"Compiling 64-bit R with the Intel Compiler Suite and Intel MKL"},"content":{"rendered":"<p>Compiling 64bit R using Intel Compiler (icc\/ifort) and Intel Math Kernel Library (MKL).<br \/>\nWith the Intel compilers and Intel MKL we get the fastest R build (slightly faster than the <a href=\"http:\/\/zhanxw.com\/blog\/2011\/09\/%e5%8a%a0%e9%80%9fr%e7%9a%84%e7%9f%a9%e9%98%b5%e8%bf%90%e7%ae%97speed-up-r-matrix-computation\/\" title=\"Speeding up R matrix computation\"> R+GotoBlas <\/a> build described in the previous post).<\/p>\n<p>Download and install Intel Parallel Studio, which includes the Intel C compiler (icc), C++ compiler (icpc), and Fortran compiler (ifort):<br \/>\nhttp:\/\/software.intel.com\/en-us\/articles\/intel-parallel-studio-xe\/<\/p>\n<p>Download and install the Intel Math Kernel Library:<br \/>\nhttp:\/\/software.intel.com\/en-us\/articles\/intel-mkl\/<br \/>\nBoth of these Intel packages are free for non-commercial use.<\/p>\n<p>Then download the R source code:<br 
\/>\nhttp:\/\/cran.cnr.berkeley.edu\/<\/p>\n<p>After unpacking the R source, create a bash script in its top-level directory that specifies how to build it (should R itself be linked statically or built as a shared library? where should it be installed?).<br \/>\nThe relevant configure options are at the end of the script below; adjust them as needed.<br \/>\nNote: in my comparison, a dynamically linked BLAS library was no slower than a statically linked one; the advantage of dynamic linking is that you can easily swap in a different BLAS library.<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\n# Set up the Intel compiler and MKL environment (adjust the paths to your installation)\r\nsource \/home\/zhanxw\/intel\/composer_xe_2011_sp1.6.233\/bin\/iccvars.sh intel64\r\nsource \/home\/zhanxw\/intel\/composer_xe_2011_sp1.6.233\/bin\/ifortvars.sh intel64\r\nsource \/home\/zhanxw\/intel\/composer_xe_2011_sp1.6.233\/mkl\/bin\/mklvars.sh intel64\r\n\r\n# Compilers and optimization flags picked up by R's configure\r\nexport CC=icc\r\nexport CFLAGS=&quot;-O3 -wd188 -mieee-fp&quot;\r\nexport F77=ifort\r\nexport FFLAGS=&quot;-O3 -mieee-fp&quot;\r\nexport CXX=icpc\r\nexport CXXFLAGS=&quot;-O3&quot;\r\nexport FC=ifort\r\nexport FCFLAGS=&quot;-O3 -mieee-fp&quot;\r\nexport ICC_LIBS=\/home\/zhanxw\/intel\/composer_xe_2011_sp1.6.233\/compiler\/lib\/intel64\r\nexport IFC_LIBS=\/home\/zhanxw\/intel\/composer_xe_2011_sp1.6.233\/compiler\/lib\/intel64\r\nexport SHLIB_CXXLD=icpc\r\nexport SHLIB_CXXLDFLAGS=-shared\r\n\r\nMKL_LIB_PATH=\/home\/zhanxw\/intel\/composer_xe_2011_sp1.6.233\/mkl\/lib\/intel64\r\n# Prepend instead of overwriting, so the library paths set by the scripts above are kept\r\nexport LD_LIBRARY_PATH=$MKL_LIB_PATH:$LD_LIBRARY_PATH\r\n\r\n# Exported so that the thread count is visible to child processes\r\nexport OMP_NUM_THREADS=8\r\n\r\nexport 
LDFLAGS=&quot;-L${MKL_LIB_PATH},-Bdirect,--hash-style=both,-Wl,-O1 -L$ICC_LIBS -L$IFC_LIBS -L\/usr\/local\/lib&quot;\r\n\r\nexport SHLIB_LDFLAGS=&quot;-lpthread&quot;\r\nexport MAIN_LDFLAGS=&quot;-lpthread&quot;\r\n\r\nMKL=&quot;-L${MKL_LIB_PATH} -lmkl_blas95 -lmkl_lapack95  -Wl,--start-group -lmkl_intel -lmkl_intel_thread -lmkl_core -Wl,--end-group -openmp -lpthread&quot;\r\n\r\nOMP_NUM_THREADS=8\r\n\r\nMKL=&quot;-L${MKL_LIB_PATH} -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread&quot;\r\n#static linked library of R                                                                                                                                   \r\n#.\/configure --with-blas=&quot;$MKL&quot;  --with-lapack=&quot;$MKL&quot; --prefix=\/net\/dumbo\/home\/zhanxw\/software\/Rmkl                                                           \r\n\r\n# dynamic linked library of: R and BLAS                                                                                                                       \r\n#.\/configure --enable-R-shlib --enable-BLAS-shlib --with-blas=&quot;$MKL&quot;  --with-lapack=&quot;$MKL&quot; --prefix=\/net\/dumbo\/home\/zhanxw\/software\/Rmkl                      \r\n\r\n#dynamic linked library of: BLAS                                                                                                                              \r\n.\/configure --enable-BLAS-shlib --with-blas=&quot;$MKL&quot;  --with-lapack=&quot;$MKL&quot; --prefix=\/net\/dumbo\/home\/zhanxw\/software\/Rmkl\r\n<\/pre>\n<p>\u4e4b\u540e\u7528make; make install\u5373\u53ef\u3002<br \/>\n\u4f7f\u7528\u540c\u6837\u7684R-benchmark\u811a\u672c\uff0c\u7ed3\u679c\u5982\u4e0b\uff1a<br \/>\nIntel Compiler (ICC+Ifort) and Intel MKL<\/p>\n<pre class=\"brush: plain; highlight: [13,23]; title: ; notranslate\" title=\"\">\r\n   R Benchmark 2.5\r\n   ===============\r\nNumber of times each test is run__________________________:  3\r\n\r\n   I. 
Matrix calculation\r\n   ---------------------\r\nCreation, transp., deformation of a 2500x2500 matrix (sec):  0.719666666666667 \r\n2400x2400 normal distributed random matrix ^1000____ (sec):  0.394333333333333 \r\nSorting of 7,000,000 random values__________________ (sec):  0.861 \r\n2800x2800 cross-product matrix (b = a' * a)_________ (sec):  0.709 \r\nLinear regr. over a 3000x3000 matrix (c = a \\ b')___ (sec):  0.448 \r\n                      --------------------------------------------\r\n                 Trimmed geom. mean (2 extremes eliminated):  0.611437229773395 \r\n\r\n   II. Matrix functions\r\n   --------------------\r\nFFT over 2,400,000 random values____________________ (sec):  0.907666666666668 \r\nEigenvalues of a 640x640 random matrix______________ (sec):  0.613000000000001 \r\nDeterminant of a 2500x2500 random matrix____________ (sec):  0.493333333333333 \r\nCholesky decomposition of a 3000x3000 matrix________ (sec):  0.334333333333332 \r\nInverse of a 1600x1600 random matrix________________ (sec):  0.611666666666667 \r\n                      --------------------------------------------\r\n                Trimmed geom. mean (2 extremes eliminated):  0.569777440099831 \r\n\r\n   III. Programmation\r\n   ------------------\r\n3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.82 \r\nCreation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.535999999999999 \r\nGrand common divisors of 400,000 pairs (recursion)__ (sec):  2.64933333333334 \r\nCreation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.683666666666667 \r\nEscoufier's method on a 45x45 matrix (mixed)________ (sec):  0.828000000000003 \r\n                      --------------------------------------------\r\n                Trimmed geom. 
mean (2 extremes eliminated):  0.774276714018349 \r\n\r\n\r\nTotal time for all 15 tests_________________________ (sec):  11.609 \r\nOverall mean (sum of I, II and III trimmed means\/3)_ (sec):  0.646126830621363 \r\n                      --- End of test ---\r\n<\/pre>\n<p>Intel Compiler(ICC+Ifort) + GotoBlas2(Compiled by ICC\/Ifort)<\/p>\n<pre class=\"brush: plain; highlight: [13,23]; title: ; notranslate\" title=\"\">\r\n   R Benchmark 2.5\r\n   ===============\r\nNumber of times each test is run__________________________:  3\r\n\r\n   I. Matrix calculation\r\n   ---------------------\r\nCreation, transp., deformation of a 2500x2500 matrix (sec):  0.715333333333333\r\n2400x2400 normal distributed random matrix ^1000____ (sec):  0.41\r\nSorting of 7,000,000 random values__________________ (sec):  0.862666666666666\r\n2800x2800 cross-product matrix (b = a' * a)_________ (sec):  0.829333333333333\r\nLinear regr. over a 3000x3000 matrix (c = a \\ b')___ (sec):  0.554666666666667\r\n                      --------------------------------------------\r\n                 Trimmed geom. mean (2 extremes eliminated):  0.690382674196494\r\n\r\n   II. Matrix functions\r\n   --------------------\r\nFFT over 2,400,000 random values____________________ (sec):  0.922333333333332\r\nEigenvalues of a 640x640 random matrix______________ (sec):  0.681333333333333\r\nDeterminant of a 2500x2500 random matrix____________ (sec):  0.511666666666667\r\nCholesky decomposition of a 3000x3000 matrix________ (sec):  0.433333333333332\r\nInverse of a 1600x1600 random matrix________________ (sec):  0.594333333333331\r\n                      --------------------------------------------\r\n                Trimmed geom. mean (2 extremes eliminated):  0.591732764155743\r\n\r\n   III. 
Programmation\r\n   ------------------\r\n3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.835999999999999\r\nCreation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.545000000000002\r\nGrand common divisors of 400,000 pairs (recursion)__ (sec):  2.66133333333333\r\nCreation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.695666666666665\r\nEscoufier's method on a 45x45 matrix (mixed)________ (sec):  0.585000000000001\r\n                      --------------------------------------------\r\n                Trimmed geom. mean (2 extremes eliminated):  0.698105585240407\r\n\r\n\r\nTotal time for all 15 tests_________________________ (sec):  11.838\r\nOverall mean (sum of I, II and III trimmed means\/3)_ (sec):  0.658231817116501\r\n                      --- End of test ---\r\n<\/pre>\n<p>Common errors:<br \/>\nWhen configuring R, we pass --with-blas=&quot;$MKL&quot; to specify where Intel MKL is located (as other articles online do), but if the value of $MKL is wrong, R cannot link against MKL properly. Check the output of configure, or the file config.log, and make sure that the following checks all report yes:<br \/>\nchecking for dgemm_ in -L\/home\/zhanxw\/intel\/composer_xe_2011_sp1.6.233\/mkl\/lib\/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread&#8230; yes<br \/>\nchecking whether double complex BLAS can be used&#8230; yes<br \/>\nchecking whether the BLAS is complete&#8230; yes<\/p>\n<p>It is worth pointing out that, when linking against the Intel libraries, LP64 and 
ILP64 are different interfaces (LP64 uses 32-bit integers, ILP64 uses 64-bit integers). On my machine, mistakenly specifying ILP64, for example -lmkl_intel_ilp64, prevents R from using MKL, because the test program built against ILP64 crashes (in the configure script, this program is conftest).<\/p>\n<p>config.log is a very useful file: it records what configure found while probing the system. Reading it together with configure itself (which is essentially a shell script) helps determine whether R can link against the MKL library and, if not, why.<\/p>\n<p>In addition, when a shared BLAS library is used, R checks for zgeev_ and does not detect MKL. This result is intentional on R's part, because the dynamic MKL library also contains LAPACK. If you mind the resulting loss of speed, use static linking instead.<\/p>\n<p>Updated (2011-10-05):<\/p>\n<p>A similar idea, in PPT format:<\/p>\n<p><a href='http:\/\/zhanxw.com\/blog\/wp-content\/uploads\/2011\/09\/R_BLAS-Sachdeva.ppt'>R_BLAS-Sachdeva<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Compiling 64bit R using Intel Compiler (icc\/ifort) and Intel Math Kernel Library (MKL). 
With the Intel compilers and Intel MKL we get the fastest R build (slightly faster than the R+GotoBlas build described in the previous post). Download and install Intel Parallel Studio, which includes the Intel C compiler (icc), C++ compiler (icpc), and Fortran compiler (ifort): http:\/\/software.intel.com\/en-us\/articles\/intel-parallel-studio-xe\/ Download and install the Intel Math Kernel Library: http:\/\/software.intel.com\/en-us\/articles\/intel-mkl\/ Both of these Intel packages are free for non-commercial use. Then download the R source code: http:\/\/cran.cnr.berkeley.edu\/ After unpacking the R source, create a bash script in its top-level directory that specifies how to build it (should R itself be linked statically or built as a shared library? where should it be installed?). The relevant configure options are at the end of the script below; adjust them as needed. Note: in my comparison, a dynamically linked BLAS library was no slower than a statically linked one; the advantage of dynamic linking is that you can easily swap in a different BLAS library. source \/home\/zhanxw\/intel\/composer_xe_2011_sp1.6.233\/bin\/iccvars.sh intel64 source \/home\/zhanxw\/intel\/composer_xe_2011_sp1.6.233\/bin\/ifortvars.sh intel64 source \/home\/zhanxw\/intel\/composer_xe_2011_sp1.6.233\/mkl\/bin\/mklvars.sh intel64 export CC=icc export CFLAGS=&quot;-O3 -wd188 -mieee-fp&quot; 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[15,11],"tags":[64,62,63,60,61,45],"class_list":["post-250","post","type-post","status-publish","format-standard","hentry","category-statistics","category-sysadmin","tag-benchmark","tag-icc","tag-ifort","tag-intel","tag-mkl","tag-r"],"_links":{"self":[{"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/posts\/250","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/comments?post=250"}],"version-history":[{"count":0,"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/posts\/250\/revisions"}],"wp:attachment":[{"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/media?parent=250"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/categories?post=250"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhanxw.com\/blog\/wp-json\/wp\/v2\/tags?post=250"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}