May 262011
 

如何在Python中调用C/C++代码
How to mix C/C++ code in Python

本文介绍一种手动的、简单的在Python中使用C/C++代码的方式。这个方法主要使用了ctypes模块。其他的混合Python,C/C++编程的方法还有Swig Boost.Python。前一种方法需要写一个接口文件(interface),而后一种需要使用庞大、深奥的boost类库,后两者适合可能适合更复杂的情况,这里只介绍第一种方法。

混合C/C++代码需要这几步:
1. 包装接口 C/C++ wrap functions up
2. 打包成共享库 Compiling C/C++ code and pack it to shared library
3. Python中导入共享库 Python imports shared library

先介绍一下北京,这里我的C++类GenomeSequence使用了模板(Template)和Memorymap,这是一个访问基因序列的类,比如如果一个生物序列是GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACT… 我们的类是gs,那么gs[0] = ‘G’, gs[1]=’A’ …. 摘录相关的函数如下:

class GenomeSequence : public genomeSequenceArray
{
public:
    /// Simple constructor - no implicit file open
    GenomeSequence();
    /// set the reference name that will be used in open()
    /// \param referenceFilename the name of the reference fasta file to open
    /// \return false for success, true otherwise
    ///
    /// \sa open()
    bool setReferenceName(std::string referenceFilename);
    /// return the number of bases represented in this reference
    /// \return count of bases
    genomeIndex_t   getNumberBases() const
    {
        return getElementCount();
    }
    inline char operator[](genomeIndex_t index) const
    {
        uint8_t val;
        if (index < getNumberBases())
        {
            if ((index&1)==0)
            {
                val = ((uint8_t *) data)[index>>1] & 0xf;
            }
            else
            {
                val = (((uint8_t *) data)[index>>1] & 0xf0) >> 4;
            }
        }
        else
        {
            val = baseNIndex;
        }
        val = isColorSpace() ? int2colorSpace[val] : int2base[val];
        return val;
    }
    /* ........... more codes omitted ................ */
}

但实际上这些细节并不重要,重要是如何包装,我们编写GenomeSequence_wrap.cpp文件,包括对上述4个函数的封装,源码如下:

#include "GenomeSequence.h"
#include <string>

extern "C"{
    GenomeSequence* GenomeSequence_new(){ return new GenomeSequence();}
    bool GenomeSequence_setReferenceName(GenomeSequence* gs, char* s) { 
        if (!gs) return false;
        std::string str = s;
        //printf("Loading %s ...\n", s);
        if (!gs->setReferenceName(str)){
            gs->open();
        } else {
            printf("Loading FAIL\n");
        }
        return (gs->setReferenceName(str));
    }
    void GenomeSequence_close(GenomeSequence* gs) {if (gs) gs->close();};
    int GenomeSequence_getNumBase(GenomeSequence* gs) {
        if (!gs) {
            printf("invalid gs\n");
            return -1;
        }
        return (gs->getNumberBases());
    }
    char GenomeSequence_getBase(GenomeSequence* gs, unsigned int i) { 
        if (gs) {
            return (*gs)[i];
        };
    };
}

第二步是编译,记住单个C/C++文件编译时使用-fPIC参数,最后打包的时候编译成共享库,摘录Makefile文件中片段如下:

lib:
	g++ -c -fPIC -I./lib GenomeSequence_wrap.c
	g++ -shared -Wl,-soname,libstatgen.so -o libstatgen.so  lib/*.o lib/samtools/k*.o lib/samtools/bgzf.o *.o

最后一步是在Python中写一个封装类,注意前两行引入ctypes库,之后就用这个库调用包装函数就行。
注意:我在GenomeSequence类的__getitem__中使用了如何扩展Python的容器类一文中介绍的一些技巧,这样可以更灵活的使用下标来访问数组中的元素。

from ctypes import cdll
lib = cdll.LoadLibrary("./libstatgen.so")

class GenomeSequence:
    def __init__ (self):
        self.obj = lib.GenomeSequence_new()
    def open(self, filename):
        lib.GenomeSequence_setReferenceName(self.obj, filename)
    def __len__ (self):
        return lib.GenomeSequence_getNumBase(self.obj)
    def __getitem__(self, key):
        if isinstance(key, int):
            return chr(lib.GenomeSequence_getBase(self.obj, key))
        elif isinstance(key, slice):
            return ''.join([self[x] for x in xrange(*key.indices(len(self)))])
        elif isinstance(key, tuple):
            return ''.join([self[i] for i in key])

    def at(self, i):
        return chr(lib.GenomeSequence_getBase(self.obj, i))
    def close(self):
        lib.GenomeSequence_close(self.obj)
    
if __name__ == '__main__':
    gs = GenomeSequence ()
    gs.open("/home/zhanxw/statgen/src/karma/test/phiX.fa");
    print len(gs)
    seq = [(gs.at(i)) for i in xrange(60)]
    print ''.join(seq)
    print gs[0:10],gs[20:30]
    print gs[0:10, 20:30]
    print gs[-10:]
    gs.close()
    print "DONE"

本文主要参考【1】。这里的方法基本重复了【1】中的步骤。写出本文中的代码在于进一步验证ctypes库可以灵活的处理C/C++和Python中的简单数据类型int, char*。

【1】Calling C/C++ from python?

  One Response to “如何在Python中调用C/C++代码”

  1. I have noticed you don’t monetize your blog, don’t waste your traffic,
    you can earn additional bucks every month because you’ve got high quality content.
    If you want to know how to make extra bucks, search for: Mrdalekjd methods for $$$

 Leave a Reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

(required)

(required)

*