使用Intel Compiler Suite和Intel MKL编译64位R

Compiling 64bit R using Intel Compiler (icc/ifort) and Intel Math Kernel Library (MKL).
通过Intel的编译器和Intel MKL,我们得到运行速度最快的R系统(比上一篇介绍的 R+GotoBlas 还快一点点)。

下载,安装Intel Parallel Studio,这个包括Intel C compiler (icc), C++ Compiler (icpc), Fortran compiler(ifort):
http://software.intel.com/en-us/articles/intel-parallel-studio-xe/

下载,安装Intel Math Kernel Library
http://software.intel.com/en-us/articles/intel-mkl/
Intel的这两个软件对于非商业用途是免费的。

然后需要下载R的源代码:
http://cran.cnr.berkeley.edu/

解压缩R之后,在其目录下建立bash文件来指定编译的方式(R本身是使用静态链接还是动态链接库?安装路径?)。
具体方式可以在这个脚本的末尾部分找到,大家可以自己按需要修改。
注:在我的比较下,使用动态链接的BLAS库与静态链接库相比不会损失速度;使用动态链接库的优点是可以方便的换用不同BLAS库。

source /home/zhanxw/intel/composer_xe_2011_sp1.6.233/bin/iccvars.sh intel64                                                                                   
source /home/zhanxw/intel/composer_xe_2011_sp1.6.233/bin/ifortvars.sh intel64
source /home/zhanxw/intel/composer_xe_2011_sp1.6.233/mkl/bin/mklvars.sh intel64

export CC=icc
export CFLAGS="-O3 -wd188 -mieee-fp"
export F77=ifort
export FFLAGS="-O3 -mieee-fp"
export CXX=icpc
export CXXFLAGS="-O3"
export FC=ifort
export FCFLAGS="-O3 -mieee-fp"
export ICC_LIBS=/home/zhanxw/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64
export IFC_LIBS=/home/zhanxw/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64
export SHLIB_CXXLD=icpc
export SHLIB_CXXLDFLAGS=-shared

MKL_LIB_PATH=/home/zhanxw/intel/composer_xe_2011_sp1.6.233/mkl/lib/intel64
export LD_LIBRARY_PATH=$MKL_LIB_PATH

OMP_NUM_THREADS=8

export LDFLAGS="-L${MKL_LIB_PATH},-Bdirect,--hash-style=both,-Wl,-O1 -L$ICC_LIBS -L$IFC_LIBS -L/usr/local/lib"

export SHLIB_LDFLAGS="-lpthread"
export MAIN_LDFLAGS="-lpthread"

MKL="-L${MKL_LIB_PATH} -lmkl_blas95 -lmkl_lapack95  -Wl,--start-group -lmkl_intel -lmkl_intel_thread -lmkl_core -Wl,--end-group -openmp -lpthread"

OMP_NUM_THREADS=8

MKL="-L${MKL_LIB_PATH} -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread"
#static linked library of R                                                                                                                                   
#./configure --with-blas="$MKL"  --with-lapack="$MKL" --prefix=/net/dumbo/home/zhanxw/software/Rmkl                                                           

# dynamic linked library of: R and BLAS                                                                                                                       
#./configure --enable-R-shlib --enable-BLAS-shlib --with-blas="$MKL"  --with-lapack="$MKL" --prefix=/net/dumbo/home/zhanxw/software/Rmkl                      

#dynamic linked library of: BLAS                                                                                                                              
./configure --enable-BLAS-shlib --with-blas="$MKL"  --with-lapack="$MKL" --prefix=/net/dumbo/home/zhanxw/software/Rmkl

之后用make; make install即可。
使用同样的R-benchmark脚本,结果如下:
Intel Compiler (ICC+Ifort) and Intel MKL

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  0.719666666666667 
2400x2400 normal distributed random matrix ^1000____ (sec):  0.394333333333333 
Sorting of 7,000,000 random values__________________ (sec):  0.861 
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  0.709 
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  0.448 
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  0.611437229773395 

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.907666666666668 
Eigenvalues of a 640x640 random matrix______________ (sec):  0.613000000000001 
Determinant of a 2500x2500 random matrix____________ (sec):  0.493333333333333 
Cholesky decomposition of a 3000x3000 matrix________ (sec):  0.334333333333332 
Inverse of a 1600x1600 random matrix________________ (sec):  0.611666666666667 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.569777440099831 

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.82 
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.535999999999999 
Grand common divisors of 400,000 pairs (recursion)__ (sec):  2.64933333333334 
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.683666666666667 
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.828000000000003 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.774276714018349 


Total time for all 15 tests_________________________ (sec):  11.609 
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  0.646126830621363 
                      --- End of test ---

Intel Compiler(ICC+Ifort) + GotoBlas2(Compiled by ICC/Ifort)

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  0.715333333333333
2400x2400 normal distributed random matrix ^1000____ (sec):  0.41
Sorting of 7,000,000 random values__________________ (sec):  0.862666666666666
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  0.829333333333333
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  0.554666666666667
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  0.690382674196494

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.922333333333332
Eigenvalues of a 640x640 random matrix______________ (sec):  0.681333333333333
Determinant of a 2500x2500 random matrix____________ (sec):  0.511666666666667
Cholesky decomposition of a 3000x3000 matrix________ (sec):  0.433333333333332
Inverse of a 1600x1600 random matrix________________ (sec):  0.594333333333331
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.591732764155743

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.835999999999999
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.545000000000002
Grand common divisors of 400,000 pairs (recursion)__ (sec):  2.66133333333333
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.695666666666665
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.585000000000001
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.698105585240407


Total time for all 15 tests_________________________ (sec):  11.838
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  0.658231817116501
                      --- End of test ---

常见的错误:
在编译R的时候,我们用–with-blas=”$MKL”来制定Intel MKL的位置 (网上其他文章的做法),但如果$MKL的值不正确,R无法正常链接MKL。我们需要检查configure的输出或者文件config.log,要确保这两项的检查都是yes:
checking for dgemm_ in -L/home/zhanxw/intel/composer_xe_2011_sp1.6.233/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread… yes
checking whether double complex BLAS can be used… yes
checking whether the BLAS is complete… yes

值得指出的是在链接Intel库时,LP64 和 ILP64是不同的。在我的机器上,错误的制定ILP64,例如-lmkl_intel_ilp64,会导致R无法使用MKL,因为使用ILP64编译的程序会crash(在configure脚本里,这个文件是conftest)

config.log是非常有用的文件,它包括的configure检查系统环境时相关信息,通过这个文件并结合configure(本质是一个shell script),可以帮助我们确定R是否可以,或者为什么不可以链接MKL库。

另外,使用shared BLAS库的时候R会检查zgeev_,并检查不到MKL,这个R“有意”的结果。因为动态的MKL库会包含LAPACK的信息。如果介意这方面的速度损失,可以使用静态链接的方式。

Updated (2011-10-05):

Similar idea in the PPT format:

R_BLAS-Sachdeva

加速R的矩阵运算(Speed up R matrix computation)

Speed up R matrix computation with smallest effort.

给R提速有两个方法:
1. 使用Intel compiler
2. 使用更快的矩阵运算库

其中我使用第一个方法并没有看到显著的速度提升,所以这里介绍第2种方法,保证矩阵运算至少提速2倍。
我使用的是R-2.13.1版本,矩阵库使用GotoBLAS。
根据下面这个链接,
http://r.789695.n4.nabble.com/configure-can-t-find-dgemm-in-MKL10-td920212.html
GotoBLAS比Intel MKL快。据说,GotoBLAS比ATLAS也要快。

具体步骤如下:
(1)建立一个shell 源文件:

export FFLAGS="-march=native -O3"
export CFLAGS="-march=native -O3 -DMKL_ILP64"
export CXXFLAGS="-march=native -O3 -DMKL_ILP64"
export FCFLAGS="-march=native -O3"

./configure --enable-R-shlib --enable-BLAS-shlib --with-blas --with-lapack --prefix=/net/dumbo/home/zhanxw/software/Rmkl                  

之后用make, make install安装。
(2)下载GotoBLAS,在源目录’make’即可,得到的BLAS库文件名是’libgoto2.so’
(3)建立符号链接。在R安装目录下e.g. /lib64/R/lib,已经有一个R默认的BLAS动态连接库libRblas.so,把这个改成链接到libgoto2.so的符号链接。

这3步之后,R就会使用GotoBLAS作为矩阵运算库。在我们的服务器上,benchmark结果如下:
# GCC + default BLAS

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  0.764666666666666
2400x2400 normal distributed random matrix ^1000____ (sec):  0.596666666666666
Sorting of 7,000,000 random values__________________ (sec):  0.833333333333333
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  4.425
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  2.30366666666667
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  1.13650194597564

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.778666666666666
Eigenvalues of a 640x640 random matrix______________ (sec):  1.406
Determinant of a 2500x2500 random matrix____________ (sec):  2.28733333333334
Cholesky decomposition of a 3000x3000 matrix________ (sec):  2.02366666666667
Inverse of a 1600x1600 random matrix________________ (sec):  1.933
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  1.76516531172197

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  1.06166666666667
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.601666666666669
Grand common divisors of 400,000 pairs (recursion)__ (sec):  2.56866666666667
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.757666666666661
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.595000000000013
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.785128552514896


Total time for all 15 tests_________________________ (sec):  22.9366666666667
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  1.16349747864837
                      --- End of test ---

# GCC + GotoBLAS(GCC)

  R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  0.776333333333333
2400x2400 normal distributed random matrix ^1000____ (sec):  0.597
Sorting of 7,000,000 random values__________________ (sec):  0.838
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  0.376333333333333
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  0.293
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  0.558725402933605

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.785666666666668
Eigenvalues of a 640x640 random matrix______________ (sec):  2.092
Determinant of a 2500x2500 random matrix____________ (sec):  0.303666666666667
Cholesky decomposition of a 3000x3000 matrix________ (sec):  0.292999999999999
Inverse of a 1600x1600 random matrix________________ (sec):  0.396333333333331
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.455580734019386

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  1.07166666666667
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.608999999999999
Grand common divisors of 400,000 pairs (recursion)__ (sec):  2.848
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.675666666666665
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.591000000000001
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.761149272082565


Total time for all 15 tests_________________________ (sec):  12.5466666666667
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  0.578643662905733
                      --- End of test ---


# ICC + build-in BLAS

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  0.722333333333333
2400x2400 normal distributed random matrix ^1000____ (sec):  0.398
Sorting of 7,000,000 random values__________________ (sec):  0.853333333333333
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  23.2723333333333 
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  9.48066666666666 
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  1.80121303632586 

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.919666666666667 
Eigenvalues of a 640x640 random matrix______________ (sec):  1.01100000000001 
Determinant of a 2500x2500 random matrix____________ (sec):  4.84600000000001 
Cholesky decomposition of a 3000x3000 matrix________ (sec):  3.71033333333332 
Inverse of a 1600x1600 random matrix________________ (sec):  6.53100000000001 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  2.62935462784594 

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.825333333333333 
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.588666666666654 
Grand common divisors of 400,000 pairs (recursion)__ (sec):  2.65866666666667 
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.665000000000001 
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.55400000000003 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.686183322572556 


Total time for all 15 tests_________________________ (sec):  57.0363333333334 
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  1.4812151139281 
                      --- End of test ---

# ICC + GotoBLAS(ICC)

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  0.738666666666667
2400x2400 normal distributed random matrix ^1000____ (sec):  0.388000000000001
Sorting of 7,000,000 random values__________________ (sec):  0.857333333333333 
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  0.633333333333333 
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  0.537666666666667 
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  0.631245051729315 

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.938333333333333 
Eigenvalues of a 640x640 random matrix______________ (sec):  5.53166666666667 
Determinant of a 2500x2500 random matrix____________ (sec):  0.957666666666666 
Cholesky decomposition of a 3000x3000 matrix________ (sec):  0.601000000000001 
Inverse of a 1600x1600 random matrix________________ (sec):  1.741 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  1.16088739499808 

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.813 
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.591333333333334 
Grand common divisors of 400,000 pairs (recursion)__ (sec):  2.663 
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.669333333333332 
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  4.883 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  1.13162201708511 


Total time for all 15 tests_________________________ (sec):  22.5443333333333 
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  0.939499363744844 
                      --- End of test ---

通过比较GCC/ICC 与 R自带的BLAS/GotoBLAS的4种组合,在我们的服务器系统下GCC+GotoBLAS最快。

注:
LAPACK是对BLAS的再次封装,因此我们不需要改变libRlapack.so。这一点可以通过’nm -g libRlapack.so’,查看dgemm_的定义为‘U’(说明这个函数没有在该文件中实现),而通过’ldd libRlapack.so’可以发现它会调用libRblas.so

其他资源:

R Graphics

Plot legend outside of the plot area

primitive method:

method1, hard code layout (
https://stat.ethz.ch/pipermail/r-help/2007-May/132466.html)

layout(matrix(c(2,1), byrow = T), height=c(2,10))
par(mar=c(5,3,0,2))
plot(rnorm(100))
grid(10,10)
plot.new()
par(mar=c(0,0,0,0))
plot.window(c(0,1), c(0,1))
lsize = legend(0.5, .5, "text", pch=’o’)
box("figure")
box(, lty = 2)
box("plot", col = "red")
points(rnorm(100), type="n")
# layout.show(2) : display the layout

method 2, refer to:

use xpd = NA options will not clip the legend outside of the plot area, however, you need to manual place the legend, that makes plot less pleasant. However, the folloing example works:

data(iris)
par(xpd = NA, mai=rep(.5,4), mfrow=c(2,2))
for(i in 1:4)
boxplot(iris[,i]~iris[,5], main=colnames(iris)[i], col=2:4, xaxt="n")
 
width <- 6 # width of the plot device in inches (it should be 7 inches by
# default but Windows does not implement this correctly)
leg <- legend(0, 0, legend=levels(iris$Species), fill=2:4, hor=TRUE, plot=FALSE)
xcoord <- grconvertX(width/2, "inches", "user")-leg$rect$w/2
ycoord <- grconvertY(0, "inches", "user")+leg$rect$h
 
legend(xcoord, ycoord, legend=levels(iris$Species), fill=2:4, bty="n", hor=TRUE)

package ggplot2:

Example:

> ggplot(dat,aes(x=x)) +
+ geom_histogram(aes(y=..density..,fill="Histogram"),binwidth=0.5) +
+ stat_function(fun = dnorm, aes(colour= "Density")) +
+ scale_x_continuous(‘x’, limits = c(-4, 4)) +
+ opts(title = "Histogram with Overlay") +
+ scale_fill_manual(name="",value="blue") +
+ scale_colour_manual(name="",value="red") +
+ scale_y_continuous(‘Frequency’)+
+ opts(legend.key=theme_rect(fill="white",colour="white"))+
+ opts(legend.background = theme_blank())

package Lattice:

Specify space = "bottom", or space = "right" in the auto.keys or key parameter

Example:

> xyplot( lat ~ long , data = quakes, key = list( points = list( col=c("orange","white","blue")), text = list( c("a","b","c") ), space="bottom"))
> xyplot( lat ~ long , data = quakes, key = list( points = list( col=c("orange","white","blue")), text = list( c("a","b","c") ), space="right"))
> barchart(yield ~ variety | site, data = barley,
groups = year, layout = c(1,6), stack = TRUE,
auto.key = list(space = "right"),
ylab = "Barley Yield (bushels/acre)",
scales = list(x = list(rot = 45)))

https://stat.ethz.ch/pipermail/r-help/2005-July/075953.html

Powered by Qumana

R代码除错 (How to debug R code)

R代码除错 (How to debug R code)
Tricks about how to debug R code

使用R的用户中很多人抱怨R的代码不好调试。对我来说,我觉得R至少比Perl好一点,因为至少R的说明档丰富,至少看的懂源码。好了,长话短说,R的界面很简单,没有Visual studio那么强大的调试器,也没有GDB那样灵活的调试命令(见 GDB 使用经验, GDB 使用经验(二)),我总结出来以下5种调试方法,用在不同的场合。当然话说回来,还是尽量写没有bug的代码,一劳永逸。

1. 传统调试函数
traceback(), debug(), trace(), browser(), recover()
traceback() 是在出错退出后,打印出调用堆栈的情况
debug() 是将断点设置在一个函数上,这个函数被调用的时候会变为单步执行,因此我们可以手动跟踪,只不过这里不如gdb灵活
trace() 等于是在函数中插入额外的调试代码,例如:trace(sum)在每次调用sum的时候打印出sum的参数;又比如
## arrange to call the browser on entering and exiting
## function f
trace(“f”, quote(browser(skipCalls=4)), exit = quote(browser(skipCalls=4)))
则表示使用browser()来调试,从第5次开始
browser():这个函数往往作为参数,被调用时用户可以检查变量。用户可以输入c表示继续,n表示下一条指令,Q表示退出
recover():和browser类似,也是被调用。不同在于用户可以选择不同的frame(堆栈深度)。

2. 更传统的调试函数print(),cat()
使用print()来打印每个变量调用时候的值;
更简单的情况可以用cat(),它的语法更简单,例如cat(“x=”, x)

3. 设置options(error=…)
我们希望出错的时候,R可以停止执行后续代码,并进入我们指定的调式模式。
在R的交互界面,可以设置:
options(error=recover)
在Rscript,即命令行方式,可以用下面的话把出错信息存储到文件:
options(error = quote({dump.frames(to.file=TRUE); q()}))

调试完毕,恢复初始设置时,可以用:
options(error = NULL)

这里举个例子吧( 出处):
错误的情景:

x <- 1:5
y <- x + rnorm(length(x),0,1)
f <- function(x,y) {
  y <- c(y,1)
  lm(y~x)
}

我们调试的时候,输入:

options(error=recover)

> f(x,y)
Error in model.frame.default(formula = y ~ x, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'x')

Enter a frame number, or 0 to exit   

1: f(x, y)
2: lm(y ~ x)
3: eval(mf, parent.frame())
4: eval(expr, envir, enclos)
5: model.frame(formula = y ~ x, drop.unused.levels = TRUE)
6: model.frame.default(formula = y ~ x, drop.unused.levels = TRUE)

Selection: 1
Called from: eval(expr, envir, enclos)
Browse[1]> x
[1] 1 2 3 4 5
Browse[1]> y
[1] 1.6591197 0.5939368 4.3371049 4.4754027 5.9862130 1.0000000

通过检查x和y的值就能发现问题了。

4. 设置断点 setBreakpoint()
从R 2.10开始,我们有了两个调试相关的函数findLineNum(), setBreakpoint()
有了断点,我们可以快速执行代码,直至有可能的错误部分(想想如果只有debug()则需要人工单步执行R语句,或者错误发生后recover(),我们需要反推到底是什么造成的错误)。这将大大提高我们除错的速度。
出处

这里举个例子展示如何在第3行设置断点:

x <- " f <- function(a, b) {
             if (a > b)  {
                 a
             } else {
                 b
             }
         }"


eval(parse(text=x))  # Normally you'd use source() to read a file...

findLineNum("<text>#3")   # <text> is a dummy filename used by parse(text=)

#This will print
#f step 2,3,2 in <environment: R_GlobalEnv>

#and you can use

setBreakpoint("<text>#3")

5. *apply 函数中如何调试:
用过R的都知道在循环中出错不容易。因为R处理循环很慢,我们往往不用for循环,而用sapply(), lapply()等等。这些函数出错的时候从来不会说是第几个循环变量出错的。对此,我们有如下方法:

使用try()函数, 出处
举个例子:

> x <- as.list(-2:2)
> x[[2]] <- "what?!?"
> ## using sapply
> sapply(x, function(x) 1/x)
Error in 1/x : non-numeric argument to binary operator
# 看看用try()函数怎么样?
> sapply(x, function(x) try(1/x))
Error in 1/x : non-numeric argument to binary operator
[1] "-0.5"                                                    
[2] "Error in 1/x : non-numeric argument to binary operator\n"
[3] "Inf"                                                     
[4] "1"                                                       
[5] "0.5"

或者第三方程序库也行:
出处
foreach(.verbose= TRUE) —— 这个我没试验出来,不过foreach仍然是个强大的工具
plyr(.inform=TRUE)
给个plyr库的例子:

> laply(x, function(x) 1/x, .inform = TRUE)

Error in 1/x : non-numeric argument to binary operator
Error: with piece 2: 
[1] "what?"

另外题外话,R里面执行install.packages()的时候,只有头一次可以选repo(镜像库)的位置,如果之后你还想选不同的镜像库怎么办?可以执行这个:
options(“repos”=c(CRAN=”@CRAN@”))

最后把参考过的网页列在下面:
【1】Getting the state of variables after an error occurs in R
【2】What is your favorite R debugging trick?
【3】Debugging lapply/sapply calls
【4】R script line numbers at error?

如何在Python中调用C/C++代码

如何在Python中调用C/C++代码
How to mix C/C++ code in Python

本文介绍一种手动的、简单的在Python中使用C/C++代码的方式。这个方法主要使用了ctypes模块。其他的混合Python,C/C++编程的方法还有Swig Boost.Python。前一种方法需要写一个接口文件(interface),而后一种需要使用庞大、深奥的boost类库,后两者适合可能适合更复杂的情况,这里只介绍第一种方法。

混合C/C++代码需要这几步:
1. 包装接口 C/C++ wrap functions up
2. 打包成共享库 Compiling C/C++ code and pack it to shared library
3. Python中导入共享库 Python imports shared library

先介绍一下北京,这里我的C++类GenomeSequence使用了模板(Template)和Memorymap,这是一个访问基因序列的类,比如如果一个生物序列是GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACT… 我们的类是gs,那么gs[0] = ‘G’, gs[1]=’A’ …. 摘录相关的函数如下:

class GenomeSequence : public genomeSequenceArray
{
public:
    /// Simple constructor - no implicit file open
    GenomeSequence();
    /// set the reference name that will be used in open()
    /// \param referenceFilename the name of the reference fasta file to open
    /// \return false for success, true otherwise
    ///
    /// \sa open()
    bool setReferenceName(std::string referenceFilename);
    /// return the number of bases represented in this reference
    /// \return count of bases
    genomeIndex_t   getNumberBases() const
    {
        return getElementCount();
    }
    inline char operator[](genomeIndex_t index) const
    {
        uint8_t val;
        if (index < getNumberBases())
        {
            if ((index&1)==0)
            {
                val = ((uint8_t *) data)[index>>1] & 0xf;
            }
            else
            {
                val = (((uint8_t *) data)[index>>1] & 0xf0) >> 4;
            }
        }
        else
        {
            val = baseNIndex;
        }
        val = isColorSpace() ? int2colorSpace[val] : int2base[val];
        return val;
    }
    /* ........... more codes omitted ................ */
}

但实际上这些细节并不重要,重要是如何包装,我们编写GenomeSequence_wrap.cpp文件,包括对上述4个函数的封装,源码如下:

#include "GenomeSequence.h"
#include <string>

extern "C"{
    GenomeSequence* GenomeSequence_new(){ return new GenomeSequence();}
    bool GenomeSequence_setReferenceName(GenomeSequence* gs, char* s) { 
        if (!gs) return false;
        std::string str = s;
        //printf("Loading %s ...\n", s);
        if (!gs->setReferenceName(str)){
            gs->open();
        } else {
            printf("Loading FAIL\n");
        }
        return (gs->setReferenceName(str));
    }
    void GenomeSequence_close(GenomeSequence* gs) {if (gs) gs->close();};
    int GenomeSequence_getNumBase(GenomeSequence* gs) {
        if (!gs) {
            printf("invalid gs\n");
            return -1;
        }
        return (gs->getNumberBases());
    }
    char GenomeSequence_getBase(GenomeSequence* gs, unsigned int i) { 
        if (gs) {
            return (*gs)[i];
        };
    };
}

第二步是编译,记住单个C/C++文件编译时使用-fPIC参数,最后打包的时候编译成共享库,摘录Makefile文件中片段如下:

lib:
	g++ -c -fPIC -I./lib GenomeSequence_wrap.c
	g++ -shared -Wl,-soname,libstatgen.so -o libstatgen.so  lib/*.o lib/samtools/k*.o lib/samtools/bgzf.o *.o

最后一步是在Python中写一个封装类,注意前两行引入ctypes库,之后就用这个库调用包装函数就行。
注意:我在GenomeSequence类的__getitem__中使用了如何扩展Python的容器类一文中介绍的一些技巧,这样可以更灵活的使用下标来访问数组中的元素。

from ctypes import cdll
lib = cdll.LoadLibrary("./libstatgen.so")

class GenomeSequence:
    def __init__ (self):
        self.obj = lib.GenomeSequence_new()
    def open(self, filename):
        lib.GenomeSequence_setReferenceName(self.obj, filename)
    def __len__ (self):
        return lib.GenomeSequence_getNumBase(self.obj)
    def __getitem__(self, key):
        if isinstance(key, int):
            return chr(lib.GenomeSequence_getBase(self.obj, key))
        elif isinstance(key, slice):
            return ''.join([self[x] for x in xrange(*key.indices(len(self)))])
        elif isinstance(key, tuple):
            return ''.join([self[i] for i in key])

    def at(self, i):
        return chr(lib.GenomeSequence_getBase(self.obj, i))
    def close(self):
        lib.GenomeSequence_close(self.obj)
    
if __name__ == '__main__':
    gs = GenomeSequence ()
    gs.open("/home/zhanxw/statgen/src/karma/test/phiX.fa");
    print len(gs)
    seq = [(gs.at(i)) for i in xrange(60)]
    print ''.join(seq)
    print gs[0:10],gs[20:30]
    print gs[0:10, 20:30]
    print gs[-10:]
    gs.close()
    print "DONE"

本文主要参考【1】。这里的方法基本重复了【1】中的步骤。写出本文中的代码在于进一步验证ctypes库可以灵活的处理C/C++和Python中的简单数据类型int, char*。

【1】Calling C/C++ from python?

如何扩展Python的容器类

如何扩展Python的容器类
How to extend Python container class (using some idiom)

本文假设已经有一个C++语言写的array类型的数据结构,可以用v.getBase(unsigned int i) 来得到v 数组在下标i的数值。我们想利用Python灵活的slice功能,比如1:10, 1:10:2, -10:-5等方式来指定不同的下标。这种灵活的下标在Python中可以有三种形式:

1. 整数: v[1]
2. slice 对象: v[1:10]
3. tuple 对象: v[1:10, 20:30]

这三种对象都会被传到__getitem__(self, key)的key参数中。通过参考【1】,【2】,我发现下面的代码可以简洁的处理上述所有情况:

 
class ContainerClass:
    def __getitem__(self, key):
        if isinstance(key, int):
            return chr(v.getBase(self.obj, key))
        elif isinstance(key, slice):
            return ''.join([self[x] for x in xrange(*key.indices(len(self)))])
        elif isinstance(key, tuple):
            return ''.join([self[i] for i in key])

注意:
这里只是代码片段。全部代码见另一片Blog:如何在Python中调用C/C++代码

【1】Python Data Model:
http://docs.python.org/reference/datamodel.html
【2】Python in a nut shell:
http://books.google.com/books?id=JnR9hQA3SncC&pg=PA110&lpg=PA110&dq=python+slice+object+idiom&source=bl&ots=Jb1XIv_71t&sig=-_NHkwycfC8yipkc4Tl_e4sruKc&hl=en&ei=uRXfTcr5Jsro0QHa4sG5Cg&sa=X&oi=book_result&ct=result&resnum=10&ved=0CF0Q6AEwCQ#v=onepage&q=python%20slice%20object%20idiom&f=false

如何检验一维数据的分布

本文介绍如何使用R软件来分析一维随机变量。分析的内容包括如何查找一维数据的分布类型,如何估计分布参数以及如何用假设检验来测试一维数据的分布类型。
How to find, fit, test the distribution of univariate variable in R?
我们经常见到一维随机变量,比如线性模型的响应,我们通常需要检验它是否是正态分布来决定模型中直接用Y还是用log(Y),或者其他的transformation。
本文主要参考【1】,我会介绍一些基本的方法,但建议读者参考原文获得更多的信息。

1. 画密度图,CDF图

直方图:history(x)
密度图:plot(density(x))
CDF图:plot(ecdf(x))

检查是否是正态分布:

z= (x-mean(x))/sd(x)
qqnorm(z)
abline(0,1)

类似的可以检查其他分布(先构造一个理论分布,再qqnorm)

x.wei <- rweibull(200, shape=2.1, scale=1.1)
x.teo <- rweibull(200, shape=2.1, scale=1.0)
qqplot(x.teo, x.wei)
abline(0,1)

http://www.statsoft.com/textbook/distribution-fitting/

2. 利用矩估计猜测分布类型
主要是standardize之后计算一二三四阶矩(moment),然后对比下面网页列举的常见分布,猜出到底是哪一种分布:
NIST 1.3.5.11. Measures of Skewness and Kurtosis

3. 估计分布参数
当我们知道分布类型后,可以估计分布参数,常见的有矩估计和最大似然估计。
矩估计相对简单,可以用mean,var函数计算,但可能不具有无偏的性质。
最大似然估计有
1) mle() 在 stats4 包里
2) fitdistr() 在 MASS 包里
1)的方法显然更基本,但能适用于各种分布,2)的方法使用简单,对Gamma, Weibull, Normal等分布只需要一个命令,例如:

fitdistr(x.norm,"normal") ## fitting gaussian pdf parameters 
mean	sd
9.9355373 2.0101691 
(0.1421404) (0.1005085)

4. 检查分布是否合适?
在做Goodness of fit tests之前,可以先画出直方图和理论密度分布图。
之后,可以利用卡方检验来做Goodness of fit tests。具体来讲:
i) 对于Poisson, binomial, negative binomail, 我们可以使用vcd包中的goodfit函数。
ii) 对于一般的分布,可以把变量归类,然后利用卡方检验公示计算观察到变量数量和理论值之间的差异,然后计算pvalue
iii) 对于一般的分布,也可以使用Kolmogorov-Smirnov test来做统计检验

对第三种情况举例如下:

> x.wei <- rweibull(n=200, shape=2.1, scale = 1.1)
> ks.test(x.wei, "pweibull", shape=2, scale= 1)

	One-sample Kolmogorov-Smirnov test

data:  x.wei 
D = 0.1042, p-value = 0.02591
alternative hypothesis: two-sided 

特别的,我们需要检查数据是否是正态分布。
最常用的是Shapiro-Wilk test:shapiro.test()
此外,R里面有一个package nortest,提供了另外5种检查正态分布的函数:
i) Shapiro-Francia test: sf.test()
ii) Anderson-Darling test: ad.test()
iii) Cramer-Von Mises test: cvm.test()
iv) Lilliefors test: lillie.test() 适用于小样本,参数未知的正态分布
v) pearson.test: pearson.test()
这5种test各有细致的差异,使用的时候需自己区分。

参考文献:
【1】 http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf

关于Fix Bug (读Coolshell Linux 2.6.39-rc3的一个插曲有感)

云风的buzz中提到了coolshell.cn的一篇文章Linux 2.6.39-rc3的一个插曲有感,看过之后,有两个感受:
(1)Linus 提出的 ‘think and analyze’的解决方式。
对于bug,从我解读Linus和Cloud的buzz回复来看,他们对于bug的方式都是知其然也知其所以然,即要么不去改bug,要么就把bug想清楚了从系统上改好。就像给一个人看病,要是治胃疼,不是说你少吃点,或者是来几片止疼药,而是调查清楚为什么得了胃病。要是长期不规律吃饭得的胃病,恐怕最好的解决方法是养成按时吃饭的习惯。这样做会增加改一个bug的时间。但从长远来看,我们有可能节省时间。比如:如果我们引入magic number之后,一个星期之后,又有类似的bug发现了,如果我们每个bug都是通过仔细思考,而且系统构建有足够的正交性的话,我们会很自然的想到,这个bug肯定发生在其他部位,而不太可能是刚修复bug的变体,这就减少了解决问题时的思考路径,节省了时间。这个时候,我们应该对自己说Good job,给自己多一点鼓励。
而有时,我们可能在别人的代码里发现了bug,这种情况下同样适用Linus的建议,即‘think and analyze’或者是’years and years of testing on thousands of machines’。要不能通过思考和分析解决问题,要么就用历史证明过的已经work的代码。这样即使不能fix 这个bug,也不会造成fix bug的假象,从而避免以后有的版本work,有的版本不work的现象。读通别人的代码并不简单,这个’think and analyze‘的过程也许要很多天,因为这个过程强迫我们用别人的思维结果来思考,而我们很难知道为什么别人这样做。个人认为,如果有足够的单元测试和高质量的文档,这个思考分析的时间并不是浪费,而是学习其他人编程思路和技巧的机会。同时,我们应该让老板知道,我们需要时间来读代码,我们在解决问题,但是的确需要时间。如果一个系统的核心部分很难读懂,而bug并不关键,不妨告诉自己不要太完美主义(想想iphone,没有多任务,电池续航有限,价格高,但仍然一个好产品)。

总而言之,Linus的观点就是每看到一个bug,问一下你确实了解这个bug的原理么?
如果Yes: (think and analyze之后),准备重组的理由,用别人可以理解的方式改(少用magic number)。
如果No: (‘years and years of testing on thousands of machines’) 回到以前测试好的版本,并且注明这个bug没有被fix。
这就是Linus认为的: It really is that simple. It’s _always_ that simple.

(2)对Yinghai Lu的评价
首先Yinghai Lu是一个kernel开发者,我认为他是有足够的相关经验的:很快发现了问题,提出了一种解决方法
陈皓认为“我没有想到Linux 内核组会有像Yinghai这样工作的方式,毕竟这是一个黑客级的开发团队。” 我认为言之过甚。
Yinghai 的 email 中说:” can you try following change ? it will push gart to 0x80000000″ (你能不能试试这个?)
而陈皓说“给出了个fix”,他认为Yinghai提出的是会加入到kernel code中的patch,我想这不一定是Yinghai的本意。

以我的理解来看,Linus一直负责code review,他希望code的质量高,有注释,不能随意fix bug (改一两个数)。因此他非常反感code中的magic number,以及只改code不管背后的逻辑;而Yinghai Lu做为开发者,首先想到的是找到问题的原因。他自己的电脑里没法重复这个bug,因此他需要把patch交给别人来测试。在不同的立场下,Linus生气 和 Yinghai的解释 就容易理解了。

我不清楚kernel开发的具体流程是怎样的。但我认为Yinghai 试图做的是在top down这个框架下做修补,他不想或者不能够对整个top down的机制做出修补,特别是这个机制下已经有垃圾代码的情况下(见HPA的回信)。在Linus看来,他是不希望看到top down这种新尝试(相对于bottom up)缺乏足够的调研支持的。至于未来Linux kernel的采用机制,将要由kernel 开发者们和Code reviewer之间做出协调,也要和以往的低质量代码做出协调。我会拭目以待。

使用Monit监视VPS上的服务

How to use Monit to monitor VPS services.

早就从Tianyicui的twitter上了解了monit这个服务,但是一直没用上。直到最近自己VPS上的Mysql没有自动启动,才发现需要一个自动化的工具来监视VPS上重要进程(Nginx, PHP5-FPM, MySQL)的状态。
简单来讲,借鉴Monit的文档和 Graham King 的Blog,我的Monit配置如下。

zhanxw@zhanxw:~$ cat /etc/monit/conf.d/zhanxw.monit 
set daemon 120
set idfile /var/.monit.id
set alert me@zhanxw.com
set mailserver smtp.gmail.com port 587 username "XXXX@XXXX.com" password "XXXX" using tlsv1 with timeout 30 seconds # this basically allows you to use gmail's smtp server.  tlsv1 gets things going in SSL.
set logfile /var/log/monit.log

set httpd port 2812 
    use address zhanxw.com
    allow 141.211.10.47
    allow 205.185.127.61
    allow @zhanxw
    #allow admin:Monit

check system zhanxw.com
    if memory usage > 75% then alert

check process nginx with pidfile "/var/run/nginx.pid"
    start = "/etc/init.d/nginx start"
    stop = "/etc/init.d/nginx stop"
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 2 then alert
    if failed port 8080 and protocol http then restart
    if 2 restarts within 3 cycles then timeout
    alert me@zhanxw.com

check process php5-fpm pidfile "/var/run/php5-fpm.pid"
    start = "/usr/sbin/service php5-fpm start"
    stop = "/usr/sbin/service php5-fpm stop"
    if failed host 127.0.0.1 port 9000 then restart
    if 2 restarts within 3 cycles then timeout
    alert me@zhanxw.com
    
check process mysqld with pidfile "/var/lib/mysql/zhanxw.pid"
    start = "/usr/sbin/service mysql start"
    stop = "/usr/sbin/service mysql stop"
    if failed host 127.0.0.1 port 3306 then restart
    if 2 restarts within 3 cycles then timeout
    alert me@zhanxw.com

check process varnish with pidfile "/var/run/varnishd.pid"
    start = "/usr/sbin/service varnish start"
    stop = "/usr/sbin/service varnish stop"
    if failed port 80 and protocol http then restart
    if 2 restarts within 3 cycles then timeout
    alert me@zhanxw.com

这个配置的优势在于可以把服务进程的变化发到我的gmail信箱;可以用和SSH相同的密码登录Monit管理界面;可以自动重启关键进程(Nginx, PHP5-FPM,MySQL,Varnish)。我没有监控SSHD,因为这个进程不是zhanxw.com blog的关键部分,即不和WordPress服务相关。

本文没有提到安装Monit,实际上在ubuntu系统上可以用 sudo apt-get install monit 安装。

对于有图形化要求的系统监视,上面的Blog提到了另一个Open source soluiton: Munin。有兴趣的可以移步到那里学习。