How to debug R code

Tricks for debugging R code

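Many R users complain that R code is hard to debug. Personally, I think R is at least a little better than Perl: its documentation is rich, and the source code is readable. Long story short, R's interface is simple. It has neither a debugger as powerful as Visual Studio's nor debugging commands as flexible as GDB's (see GDB usage notes, GDB usage notes (part 2)). I have collected the following 5 debugging approaches, each suited to a different situation. That said, the best plan is still to write bug-free code in the first place.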

1. Traditional debugging functions
traceback(), debug(), trace(), browser(), recover()
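traceback() prints the call stack after an error has aborted execution.
debug() sets a breakpoint on a function: the next time the function is called it runs one statement at a time, so we can follow it by hand, although this is less flexible than gdb.
trace() effectively inserts extra debugging code into a function. For example, trace(sum) prints the call, arguments included, every time sum is called. Another example: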
## arrange to call the browser on entering and exiting
## function f
trace("f", quote(browser(skipCalls=4)), exit = quote(browser(skipCalls=4)))
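This opens browser() every time f is entered and every time it exits. Note that skipCalls=4 does not postpone the browser until the 5th call: it only tells browser() how many calls to skip when reporting the calling context, so that the frames added by trace() itself are hidden.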
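browser(): this function is usually placed inside other code or passed as an argument (as above). When it is reached, the user can inspect variables. Type c to continue, n to run the next statement, and Q to quit.
recover(): similar to browser() in that it is also called from other code. The difference is that the user can choose which frame (call stack depth) to inspect.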
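
As a minimal sketch of how the debug() flag is typically switched on and off: the function safe_ratio() below is hypothetical and exists only for illustration. The interactive stepping itself is left as a comment, because it needs a live session.

```r
# Hypothetical function used only for illustration.
safe_ratio <- function(a, b) {
  if (b == 0) stop("b must be non-zero")
  a / b
}

debug(safe_ratio)                        # the next call will be single-stepped
flag_set <- isdebugged(safe_ratio)       # TRUE while the flag is set

# In an interactive session, calling safe_ratio(1, 2) now opens the browser:
#   n = next statement, c = continue, Q = quit.

undebug(safe_ratio)                      # clear the flag when finished
flag_cleared <- !isdebugged(safe_ratio)  # TRUE once the flag is gone
```

If the function only needs to be stepped through once, debugonce(safe_ratio) sets a flag that clears itself after the next call.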

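2. Even more traditional debugging functions: print(), cat()
Use print() to show the value of each variable at the point where it is called.
For simpler cases use cat(), whose syntax is lighter, e.g. cat("x=", x, "\n")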
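
A small sketch of print-style debugging inside a loop. The function running_total() is hypothetical, made up for this illustration; the cat() line is the only "debugging code".

```r
# Hypothetical function instrumented with a cat() call for debugging.
running_total <- function(values) {
  total <- 0
  for (i in seq_along(values)) {
    total <- total + values[i]
    # Print the loop index and the intermediate result on one line.
    cat("i =", i, "total =", total, "\n")
  }
  total
}

result <- running_total(c(1, 2, 3))
```

For objects more complex than a scalar, str(x) usually gives a more compact view than print(x), and message() can be used instead of cat() when the output should go to stderr rather than stdout.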

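3. Setting options(error=…)
We want R to stop running the subsequent code when an error occurs and to enter the debugging mode we specify.
In an interactive R session, set: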
options(error=recover)
With Rscript, i.e. when running from the command line, the following saves the error information to a file:
options(error = quote({dump.frames(to.file=TRUE); q()}))

When debugging is finished, restore the default with:
options(error = NULL)
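
The Rscript setting only writes the dump; it still has to be read back afterwards. Here is a hedged sketch of both halves. The helper names enable_post_mortem() and inspect_post_mortem() are hypothetical; the file name last.dump.rda is what dump.frames(to.file = TRUE) produces with its default arguments.

```r
# Hypothetical helper: install a post-mortem error handler for batch runs.
# Returns the previous setting so that it can be restored with options().
enable_post_mortem <- function() {
  options(error = quote({
    dump.frames(to.file = TRUE)  # writes last.dump.rda to the working directory
    q(status = 1)                # quit with a non-zero exit status
  }))
}

# Hypothetical helper: inspect a saved dump from an interactive session.
inspect_post_mortem <- function(path = "last.dump.rda") {
  dump_env <- new.env()
  load(path, envir = dump_env)   # restores the object named last.dump
  debugger(dump_env$last.dump)   # choose a frame, then inspect its variables
}
```

The script would call enable_post_mortem() near its top; after a failed run, starting R in the same directory and calling inspect_post_mortem() gives a frame menu much like the one recover() shows.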

Here is an example (source):
The failing code:

x <- 1:5
y <- x + rnorm(length(x),0,1)
f <- function(x,y) {
  y <- c(y,1)
  lm(y~x)
}

To debug it, we enter:

options(error=recover)

> f(x,y)
Error in model.frame.default(formula = y ~ x, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'x')

Enter a frame number, or 0 to exit   

1: f(x, y)
2: lm(y ~ x)
3: eval(mf, parent.frame())
4: eval(expr, envir, enclos)
5: model.frame(formula = y ~ x, drop.unused.levels = TRUE)
6: model.frame.default(formula = y ~ x, drop.unused.levels = TRUE)

Selection: 1
Called from: eval(expr, envir, enclos)
Browse[1]> x
[1] 1 2 3 4 5
Browse[1]> y
[1] 1.6591197 0.5939368 4.3371049 4.4754027 5.9862130 1.0000000

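Inspecting x and y reveals the problem: y has six elements while x has five, because f() appends a 1 to y before calling lm().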

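4. Setting breakpoints with setBreakpoint()
Since R 2.10 there are two debugging-related functions: findLineNum() and setBreakpoint().
With breakpoints we can run the code at full speed up to the part that is likely to be wrong. Compare this with debug() alone, where we have to single-step through the R statements by hand, or with recover() after an error, where we have to reason backwards to find the cause. Breakpoints make debugging much faster.
(source)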

Here is an example showing how to set a breakpoint at line 3:

x <- " f <- function(a, b) {
             if (a > b)  {
                 a
             } else {
                 b
             }
         }"


eval(parse(text=x))  # Normally you'd use source() to read a file...

findLineNum("<text>#3")   # <text> is a dummy filename used by parse(text=)

#This will print
#f step 2,3,2 in <environment: R_GlobalEnv>

#and you can use

setBreakpoint("<text>#3")
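
The same idea applied to a file read with source(), as a sketch. The function larger_of() and the temporary file are made up for illustration, and a custom tracer is passed to setBreakpoint() so that the example can also run in batch mode; leaving out the tracer argument gives the usual behaviour of stopping in browser().

```r
# Write a small (hypothetical) function to a temporary file so that its
# expressions carry file and line information.
src_path <- tempfile(fileext = ".R")
writeLines(c(
  "larger_of <- function(a, b) {",
  "  if (a > b) {",
  "    a",
  "  } else {",
  "    b",
  "  }",
  "}"
), src_path)

# keep.source = TRUE makes source() record line numbers even under Rscript.
source(src_path, keep.source = TRUE)

# Assumption: a custom tracer replaces the default browser() call, so the
# "breakpoint" on line 3 just prints a marker instead of pausing.
setBreakpoint(src_path, 3, verbose = FALSE,
              tracer = quote(cat("reached line 3\n")))

# Calling the function with a > b passes through line 3.
hit <- capture.output(result <- larger_of(2, 1))

# Remove the tracing code and the temporary file.
untrace("larger_of")
unlink(src_path)
```

Because setBreakpoint() is built on trace(), untrace() is enough to remove the breakpoint again.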

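5. Debugging inside the *apply functions:
Anyone who has used R knows that errors inside loops are hard to track down. Because for loops in R tend to be slow, we often use sapply(), lapply() and friends instead. When these functions fail, they never say which element of the input caused the error. Here is what we can do about it:

Use the try() function (source):
An example: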

> x <- as.list(-2:2)
> x[[2]] <- "what?!?"
> ## using sapply
> sapply(x, function(x) 1/x)
Error in 1/x : non-numeric argument to binary operator
# How about using try()?
> sapply(x, function(x) try(1/x))
Error in 1/x : non-numeric argument to binary operator
[1] "-0.5"                                                    
[2] "Error in 1/x : non-numeric argument to binary operator\n"
[3] "Inf"                                                     
[4] "1"                                                       
[5] "0.5"
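
Note that sapply() has coerced everything to character here, because the error message is a string. A hedged variant of the same idea keeps the results in a list with lapply() and asks which elements are of class "try-error"; the variable names results and failed are just for illustration.

```r
x <- as.list(-2:2)
x[[2]] <- "what?!?"

# silent = TRUE suppresses the error message; the failure is still recorded
# as an object of class "try-error" in the result list.
results <- lapply(x, function(el) try(1 / el, silent = TRUE))

# Indices of the elements whose computation failed.
failed <- which(vapply(results, inherits, logical(1), what = "try-error"))

# The offending inputs themselves.
bad_inputs <- x[failed]
```

The numeric results stay numeric this way, and failed tells us directly which iteration to look at.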

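Third-party packages can help too:
(source)
foreach(.verbose = TRUE): I could not get this one to work, though foreach is still a powerful tool
plyr functions with .inform = TRUE
An example with the plyr package: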

> laply(x, function(x) 1/x, .inform = TRUE)

Error in 1/x : non-numeric argument to binary operator
Error: with piece 2: 
[1] "what?!?"

As an aside: when you run install.packages() in R, you only get to choose the repo (mirror) the first time. What if you want to pick a different mirror later? Run this:
options("repos" = c(CRAN = "@CRAN@"))

Finally, the pages I consulted are listed below:
【1】Getting the state of variables after an error occurs in R
【2】What is your favorite R debugging trick?
【3】Debugging lapply/sapply calls
【4】R script line numbers at error?
