Extending IDC and IDAPython

Scripting in IDA Pro is very useful for automating tasks, writing scripts, or doing batch analysis. Nonetheless, script writers commonly face one problem: the scripting language lacks a function they need. In this blog post we are going to demonstrate how to extend both IDC and IDAPython to add new functions.

Extending IDC

Every IDC variable is represented by an idc_value_t object in the SDK. The IDC variable type is stored in the idc_value->vtype field and has one of the following values:

VT_LONG: Used to store 32-bit signed numbers (or 64-bit numbers if __EA64__ is set). Use idc_value->set_long() to set a value and idc_value->num to read it.
VT_STR2: Used to store arbitrary strings. Use idc_value->set_string() to set a value, idc_value->c_str() to get a C string, or idc_value->qstr() to get a qstring instance.
VT_INT64: Used to store 64-bit signed numbers. Use idc_value->i64 to read the value and idc_value->set_int64() to set it.
VT_OBJ: Used to represent objects. Such an IDC variable is created via VarObject(), and its attributes are managed by VarSetAttr() / VarGetAttr().
VT_PVOID: Used to store arbitrary untyped values. Use idc_value->pvoid to read the value and idc_value->set_pvoid() to set it.

Now that idc_value_t is covered, let us explain how to add a new IDC function in two steps:

Writing the IDC function callback
Registering the function

Writing the callback

The callback is defined as:

typedef error_t idaapi idc_func_t(idc_value_t *argv,idc_value_t *r);

The argv array is initialized by the IDC interpreter and contains the passed arguments. The callback sets the r argument to return a value to the IDC interpreter.

Registering an IDC function

The IDC function callback can be registered with the IDC interpreter using this function:

bool set_idc_func(const char *name, idc_func_t *fp, const char *args);

Where:

name: designates the IDC function name to be added
fp: IDC function callback (written in the previous step)
args: A zero-terminated character array containing the expected argument types (VT_xxx). For example, an IDC function that expects two numbers and one string as arguments has the following args value: {VT_LONG, VT_LONG, VT_STR2, 0}

When the plugin is about to unload (in its term() callback), it should unregister the IDC function by calling set_idc_func(name, NULL, NULL).

Example

For the sake of demonstration, let us add getenv() to IDC:

static const char idc_getenv_args[] = { VT_STR2, 0 };
static const char idc_getenv_name[] = "getenv";
static error_t idaapi idc_getenv(idc_value_t *argv, idc_value_t *res)
{
  char *env = getenv(argv[0].c_str());
  res->set_string(env == NULL ? "" : env);
  return eOk;
}
int idaapi init()
{
  set_idc_func(idc_getenv_name, idc_getenv, idc_getenv_args );
  return PLUGIN_KEEP;
}
void idaapi term(void)
{
  // Unregister
  set_idc_func(idc_getenv_name, NULL, NULL);
}

Extending IDAPython

It is possible to extend IDAPython in two ways:

Modifying the source code of IDAPython
Writing a plugin to extend Python scripting

While the former method requires basic knowledge of SWIG, both methods require some understanding of the Python C API. In this article, we will use the second method because it is more practical and does not require modifying IDAPython itself. This process involves three steps:

Initializing the Python C API
Writing the callback(s)
Registering the function(s)

If you’re new to the Python C API, you can still follow and understand the code used in this blog post without referring to this tutorial.

Writing the callback

Let us wrap the following function:

// ints.hpp
idaman ssize_t ida_export get_predef_insn_cmt(
        const insn_t &cmd,
        char *buf,
        size_t bufsize);

which is used to retrieve the predefined comment of a given instruction. We want to expose it as the following Python function:

def getpredef_insn_cmt(ea):
    """Return instruction comments
    @param ea: Instruction ea
    @return: None on failure or a string containing the instruction comment
    """
    pass

Here is the callback:

static PyObject *py_getpredef_insn_cmt(PyObject * /*self*/, PyObject *args)
{
  do
  {
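    // Parse arguments; on failure PyArg_ParseTuple() has already set a
    // Python exception, so return NULL to propagate it to the caller
    unsigned long ea;
    if (!PyArg_ParseTuple(args, "k", &ea))
      return NULL;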
    // Decode instruction
    if (decode_insn(ea) == 0)
      break;
    // Get comments
    char buf[MAXSTR], *p = buf;
    p += qsnprintf(buf, sizeof(buf), "%s: ", ph.instruc[cmd.itype].name);
    if (get_predef_insn_cmt(cmd, p, sizeof(buf) - (p - buf)) == -1 || *p == '\0')
      break;
    // Return comment as a string
    return PyString_FromString(buf);
  } while (false);
  Py_RETURN_NONE;
}

PyArg_ParseTuple() is used to parse the arguments. This function can be likened to sscanf(). For a description of the format string, please refer to the Py_BuildValue() documentation.
get_predef_insn_cmt() is used to fetch the comment, and PyString_FromString() is used to return a Python string object.
In case of failure, we use the Py_RETURN_NONE macro to return the Py_None value (None in Python).
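For readers less familiar with the C API, the control flow of the callback can be modeled in plain Python. This is only an illustrative sketch and not part of the original source code: the function name model_getpredef_insn_cmt and the decode and predef_cmt parameters are hypothetical stand-ins for the SDK calls decode_insn() and get_predef_insn_cmt().

```python
def model_getpredef_insn_cmt(ea, decode, predef_cmt):
    """Pure-Python model of the C callback's control flow.

    decode(ea)     -> mnemonic string, or None when nothing decodes at ea
    predef_cmt(ea) -> predefined comment string, or None/"" when there is none
    Both callables are hypothetical stand-ins for the SDK calls.
    """
    mnemonic = decode(ea)
    if mnemonic is None:
        # Corresponds to decode_insn() returning 0
        return None
    comment = predef_cmt(ea)
    if not comment:
        # Corresponds to get_predef_insn_cmt() failing or leaving an empty buffer
        return None
    # Same "name: comment" layout that the callback builds with qsnprintf()
    return "%s: %s" % (mnemonic, comment)
```

Every failure path ends in None, which mirrors the break statements that fall through to Py_RETURN_NONE in the C code.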

Registering the callback(s)

Unlike IDC callback registration, the Python C API allows you to register more than one function at once. This is done by describing all the callbacks in a PyMethodDef array:

static PyMethodDef py_methods[] =
{
  {"getpredef_insn_cmt",  py_getpredef_insn_cmt, METH_VARARGS, ""},
  {NULL, NULL, 0, NULL}
};

and then calling the registration function:

Py_InitModule("idapyext", py_methods);

Finally, to use this function, make sure you “import idapyext” before calling getpredef_insn_cmt(). The source code used in the blog post can be downloaded from here.
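As a rough usage sketch (not part of the downloadable source code), the snippet below wraps the new function in a small helper that handles the None result. The helper name describe_insn, the address, and the sample comment text are hypothetical. The lookup parameter exists only so that the sketch can also run outside IDA, where idapyext cannot be imported.

```python
def describe_insn(ea, lookup):
    """Format the predefined comment for ea, or a fallback when there is none.

    lookup is any callable with the same contract as
    idapyext.getpredef_insn_cmt: it returns a string or None.
    """
    cmt = lookup(ea)
    if cmt is None:
        return "%X: <no predefined comment>" % ea
    return "%X: %s" % (ea, cmt)


# Stand-in for idapyext.getpredef_insn_cmt with made-up data, so that the
# sketch runs outside IDA. Inside IDA, pass the real function instead.
fake_lookup = {0x401000: "mov: Move Data"}.get

print(describe_insn(0x401000, fake_lookup))
print(describe_insn(0x401005, fake_lookup))
```

Inside IDA, with the plugin loaded, the same helper would be called with idapyext.getpredef_insn_cmt as the lookup argument and the address of an instruction as ea.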
UPD 2014-04-24: please read http://www.hexblog.com/?p=788 for important changes in IDA v6.5
