Ridge regression

Ridge regression 相关的资料

以前学Ridge regression,大概记住了如何求回归系数 \(\beta^{ridge}\),但有些细节不了解。我找到下面两个slide感觉又学到了新东西。

 

第一个讲义 【Link or RidgeRegression – Ryan Tibshirani

http://www.stat.cmu.edu/~ryantibs/datamining/lectures/16-modr1.pdf

Slide 6:Shrink之后的回归系数看起来什么样?

Slide 7:为什么要center和normalize predictors?

Slide 13:当真正的回归系数很大的时候,Ridge regression还有好的表现么?或者说当true coefficient很大的时候,应该向什么方向调整\(\lambda\)可以减少prediction error?

 

第二个讲义 【Link or RidgeRegression-Rudyregularization

http://www-stat.stanford.edu/~owen/courses/305/Rudyregularization.pdf

Slide 17:Ridge regression的回归系数的bias具体是多少?

Slide 23-24:Ridge regression和Principal component analysis (PCA)的关系是什么?

Slide 37:选择Ridge regression的tuning parameter时一般用cross validation。如果用GCV(Generalized cross validation)的话,可以有close form.

Exact Logistic Regression

We will briefly explain why we would be interested in implementing exact logistic regression, then provides C++ and R codes.

1. Why exact test?

Since we want to have a clear mind of how likely/unlikely the realization we observed. In the classic example of 2×2 table without covariates, especially the 2×2 table has very few (<5) occurrence, Fisher exact tests are often applied, and large sample theory cannot give an accurate estimation.

2. Why exact logistic regression?

Fisher’s exact test cannot applied to logistic model. For example, when we have covariates in the model, we want BOTH estimate the effect size and get its exact p-value. In this case, only exact logistic regression provides solution.The theoretical background is provided in reference [1].

My implementation:

Download: exactLogisticRegression.tar

1. I verified the results with SAS.
2. The speed is comparable to, or faster than SAS.

Cons:
1. Only 1 interested parameter conditioning on all other parameter is supported for now.
2. I have not implemented the confidence limit parts, as it’s a bit more tedious.

R binding : mypackage_1.0.tar

See mypackage/R/rcpp_hello_world.R, I wrote a R function to wrap the C++ function.

R binding is helped by RCpp package. It greatly reduced the workload of exchanging date (in the form of matrix, list, vector) between C++ and R. A quick tutorial can be found from RCpp homepage(http://dirk.eddelbuettel.com/code/rcpp.html). For experienced Rcpp user, the quick-ref documentation (http://dirk.eddelbuettel.com/code/rcpp/Rcpp-quickref.pdf) is helpful.

【1】 Exact Logistic Regression, Robert E. Derr, SAS Institute Inc., Cary, NC http://support.sas.com/rnd/app/da/new/daexactlogistic.html

【2】Rcpp: Rcpp: Seamless R and C++ Integration dirk.eddelbuettel.com/code/rcpp.html