This is a guest entry written by Sergejs Harlamovs from IKARUS Security Software GmbH. His views and opinions are his own and not those of Hex-Rays. Any technical or maintenance issues regarding the code herein should be directed to the author.
IdaClu: Finding clues without knowing what to seek
IdaClu, as the name suggests, is about "clusterization" and "finding the clues". The plugin offers a toolset to group functions based on various criteria. This makes it particularly valuable for analyzing large samples with minimal or no context.
The problem and the solution
When reverse engineering in IDA, identifying function groups is a common task. To understand how a specific program works, it's more effective to work with groups than to focus on each function separately. However, related functions do not always reference each other through xrefs, nor do they always end up in the same call stack. Sometimes the connection between the functions is weak, which makes building the group a non-trivial task.
IDA provides rich search functionality out of the box and there are some handy plugins that are pushing it even further. Finding common patterns in code might help partially with the discussed problem, but still requires a significant amount of time to identify the relevant parts.
A reasonable solution would be to have an automatic tool that highlights common points between functions… and to make it extensible… and to make it cross-compatible across different versions of IDA… and a helicopter available on a rooftop and a lot of explosions and sharks.
Some of the most prominent use cases include:
identifying functions with similar code structure using fuzzy hashing and a combination of the following methods:
- function byte pattern similarity
- function opcode similarity
- function pseudocode text similarity
identifying all immediates in the code, excluding those that can be interpreted as addresses, and finding the corresponding functions referencing these values
identifying frequently referenced VFT-offsets and the various argument variations received by corresponding functions
identifying most referenced strings and the functions referencing them
identifying functions implementing specific control-flow constructs
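To make the opcode-similarity use case a bit more tangible, here is a minimal sketch that runs outside of IDA and is not IdaClu's actual implementation. It assumes each function has already been reduced to a list of mnemonics. The sample addresses, the 0.8 threshold, and the helper names are made up for illustration, and `difflib.SequenceMatcher` from the standard library stands in for a real fuzzy hash.

```python
from difflib import SequenceMatcher


def similarity(ops_a, ops_b):
    """Return a 0..1 similarity ratio between two mnemonic sequences."""
    return SequenceMatcher(None, ops_a, ops_b).ratio()


def group_by_opcodes(func_opcodes, threshold=0.8):
    """Greedy grouping: a function joins the first group whose
    representative (its first member) is similar enough."""
    groups = []  # list of (representative_opcodes, [function addresses])
    for ea, ops in sorted(func_opcodes.items()):
        for rep, members in groups:
            if similarity(rep, ops) >= threshold:
                members.append(ea)
                break
        else:
            # No existing group matched, so this function starts a new one.
            groups.append((ops, [ea]))
    return {"group_%d" % i: members for i, (_, members) in enumerate(groups)}


# Hypothetical input: function address -> list of mnemonics.
sample = {
    0x401000: ["push", "mov", "sub", "call", "add", "pop", "ret"],
    0x401100: ["push", "mov", "sub", "call", "call", "add", "pop", "ret"],
    0x401200: ["xor", "ret"],
}
result = group_by_opcodes(sample)
```

The result is already a dictionary of lists of function addresses, which is the shape IdaClu tools are expected to return. A greedy pass like this depends on iteration order, so a real tool would likely want something more robust.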
The functions can be grouped differently based on the chosen algorithm (a "tool" in IdaClu terms). Each tool is represented by a corresponding button in the main GUI dialog's sidebar and is backed by a separate script in the IdaClu plugin subfolder.
Most tools are self-sufficient by design, but some may require user input for proper function grouping. In such cases, the tool button must be clicked twice: first to display input controls, and then again to submit input data and initiate grouping/clustering.
As of the time of this writing, the toolset consists of 18 grouping/clustering algorithms available out of the box, organized into sections for convenience. A detailed description of these tools is available on the GitHub repository page.
IdaClu offers several function labeling tools that allow flagging grouped functions in bulk. These tools are built on top of native and well-known IDA features:
- prefixing/renaming the functions
- highlighting functions with a color
- moving functions to a specific folder
Functions can have multiple labels applied, and there is a toggle for recursive mode. When switched on, all the functions down the tree that are referenced by currently selected functions will be labeled in the same way.
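Conceptually, recursive mode is a transitive walk over callees. The sketch below shows that idea on a hypothetical call graph kept in a plain dictionary. The `callees` mapping, the addresses, and the function name are invented for illustration, and IdaClu's own traversal may well differ.

```python
from collections import deque


def collect_recursive(selected, callees):
    """Return the selected functions plus everything reachable from them
    by following calls downwards (breadth-first)."""
    seen = set(selected)
    queue = deque(selected)
    while queue:
        ea = queue.popleft()
        for callee in callees.get(ea, ()):
            if callee not in seen:  # also protects against call cycles
                seen.add(callee)
                queue.append(callee)
    return sorted(seen)


# Hypothetical call graph: caller address -> list of callee addresses.
callees = {
    0x1000: [0x2000, 0x3000],
    0x2000: [0x3000],
    0x3000: [0x1000],  # cycle back to the root
    0x4000: [0x5000],  # unrelated subtree
}
to_label = collect_recursive([0x1000], callees)
```

Everything in `to_label` would then receive the same prefix, color, or folder as the originally selected function.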
The labeling tools activate when functions are grouped and when at least one function is selected in a tree/table view. In case something goes wrong or there's a need to alter a specific function name, it can all be done inside IdaClu without switching to the standard Functions subview. To clear prefix/folder labels, click the "CLEAR" button in the corresponding mode. To rename a specific function, right-click it and select the "Rename" option.
The grouping iteration count can be arbitrary. It makes sense to continue as long as new connections between the functions can be found and new conclusions about program functionality can be made. IdaClu considers all functions discovered by IDA as potential candidates for grouping. The filtering feature helps narrow down the scope and makes grouping more specific. To do so, it uses user-defined labels: prefixes, folders, and colors.
Labeled functions can form distinct sets for upcoming grouping iterations. There are as many filters as labeling tools. To focus on the payload of the analyzed sample and exclude library functions from grouping, set the prefix filter to the sub_ value. This is the only standard prefix.
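In essence, the prefix filter is a simple name check applied before grouping. Here is a minimal sketch of that check on a hypothetical list of (address, name) pairs; the helper name and the sample data are assumptions, not part of IdaClu.

```python
def filter_by_prefix(functions, prefix="sub_"):
    """Keep only the addresses of functions whose name starts with prefix."""
    return [ea for ea, name in functions if name.startswith(prefix)]


# Hypothetical function list: (start address, name).
functions = [
    (0x401000, "sub_401000"),
    (0x401050, "_memcpy"),
    (0x4010A0, "net_sub_4010A0"),  # user prefix applied earlier
    (0x401100, "sub_401100"),
]
payload = filter_by_prefix(functions)
```

Note that a function already labeled with a user prefix (such as `net_` above) no longer matches `sub_`, which is how earlier labels shape later grouping iterations.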
While many of us try to stick to the latest version of IDA, there are several reasons why some opt for older versions. For instance, not upgrading the .idb version to 7.x or 8.x may be necessary if some team members are using older versions. In rare cases, older versions might produce more consistent decompiled code. Regardless of the reason, many will agree that checking plugin compatibility with the installed IDA version can be inconvenient.
IdaClu aims to be as IDA-version agnostic as possible. Using a set of IdaPython and Qt-shims alone proved insufficient. To address this, the plugin introduces an object that stores the current IDA setup and supported feature state. This object can be passed to any plugin component, ensuring cross-compatibility. When there is a replacement for specific features, shims are utilized. Otherwise, these features, along with related UI controls and functionality, are excluded, ensuring graceful degradation.
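The environment object idea can be sketched roughly as follows. The class name, its fields, and the version threshold are hypothetical and do not reflect the real layout of IdaClu's descriptor. The sketch only shows how a shared feature-state object lets a component drop unsupported functionality instead of failing.

```python
class EnvDesc(object):
    """Hypothetical snapshot of the host setup and supported features."""

    def __init__(self, ida_version):
        self.ida_version = ida_version  # e.g. (7, 6)
        self.features = {
            "prefixes": True,
            "colors": True,
            # Illustrative threshold only, treat it as an assumption.
            "folders": ida_version >= (7, 5),
        }


def available_label_tools(env):
    """Return only the labeling tools the current setup can support,
    so the UI can hide the rest (graceful degradation)."""
    return sorted(name for name, ok in env.features.items() if ok)


old_setup = EnvDesc((6, 8))
new_setup = EnvDesc((7, 6))
old_tools = available_label_tools(old_setup)
new_tools = available_label_tools(new_setup)
```

Passing one such object to every component keeps version checks in a single place rather than scattering them across tools.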
IdaClu was designed to be highly extensible from the very beginning. There were three main reasons for this:
- Predicting all potential use cases for this plugin was challenging due to the diverse contexts of software reverse engineering. Many other tools/sub-plugins for IdaClu will likely appear in the future.
- Advanced users should be able to influence the plugin easily, without having to reverse-engineer it or wait for feature requests to be fulfilled by the author.
- Writing an IdaPython script is much simpler than creating a plugin and delving into the Qt framework. Community scripts could benefit from IdaClu's GUI if converted to corresponding tools/sub-plugins.
If any of the mentioned points resonate with you, there might be a reason to write a sub-plugin/tool for IdaClu. Any valid IdaPython script that returns a dictionary of lists, where each list element is a function address, is a good candidate for conversion.
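To illustrate the expected output shape, here is a small sketch that turns hypothetical (string, referencing function) pairs into a dictionary of lists. The input data and the helper name are invented; inside IDA the pairs would come from the script's own analysis.

```python
from collections import defaultdict


def group_by_string(string_refs):
    """Map each string to the sorted, de-duplicated list of function
    addresses that reference it."""
    groups = defaultdict(set)
    for text, func_ea in string_refs:
        groups[text].add(func_ea)
    return {text: sorted(eas) for text, eas in groups.items()}


# Hypothetical input: (string literal, address of the referencing function).
string_refs = [
    ("CreateFileW failed", 0x401000),
    ("CreateFileW failed", 0x401200),
    ("%s:%d", 0x401000),
    ("CreateFileW failed", 0x401000),  # duplicate reference
]
report = group_by_string(string_refs)
```

Each key becomes a group label, and each list holds the function addresses belonging to that group.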
It's super-easy to start:
- Get the existing IdaPython script or write a new simplistic one
- Add the following block at the start of the script:
    # arbitrary name that will appear on the corresponding button
    SCRIPT_NAME = '<script_name>'
    # 'func' or 'custom', depending on whether the script iterates over
    # functions or over some other data structures to produce the output
    SCRIPT_TYPE = 'func'
    # 'tree' is the only currently supported view, 'table' is to be added
    SCRIPT_VIEW = 'tree'
    # experimental feature, supports tuples of the form
    # ('<control_name>', '<control_type>', '<control_placeholder>')
    SCRIPT_ARGS = []
- Add/replace the main function header with one of the following prototypes:
    # Case #1: SCRIPT_TYPE == 'func':
    def get_data(func_gen=None, env_desc=None, plug_params=None):
        # 1. Iterate over pre-filtered functions via func_gen() generator
        # 2. Progress bar values are calculated automatically

    # Case #2: SCRIPT_TYPE == 'custom':
    def get_data(progress_callback=None, env_desc=None, plug_params=None):
        # 1. Iterate over custom data structures
        # 2. Use `progress_callback(<current_index>, <total_count>)` to report current progress
- Make sure the get_data() function returns a dictionary of lists, where each list element is a function address
- Navigate to the "plugin" sub-folder of the IdaClu plugin and place your script file under any of the existing group_x folders
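Putting the steps above together, a complete tool might look roughly like the sketch below, which groups functions into size buckets. Several things here are assumptions rather than documented behavior: that `func_gen()` yields function start addresses, that an empty list is a valid `SCRIPT_ARGS` value, and the bucket boundaries and helper names, which are made up. `idc.get_func_attr` and `idc.FUNCATTR_END` are the IDA 7.x+ names, so older versions would need the corresponding shim.

```python
import collections

try:
    import idc  # only importable inside IDA
except ImportError:
    idc = None  # lets the pure helpers below be exercised outside IDA

SCRIPT_NAME = 'Size Buckets'
SCRIPT_TYPE = 'func'
SCRIPT_VIEW = 'tree'
SCRIPT_ARGS = []  # assumption: no extra input controls needed


def func_size(func_ea):
    """Size in bytes of the function starting at func_ea (IDA 7.x+ API)."""
    return idc.get_func_attr(func_ea, idc.FUNCATTR_END) - func_ea


def bucket_name(size):
    """Map a size in bytes to a group label (boundaries are arbitrary)."""
    if size < 0x20:
        return 'tiny (< 0x20 bytes)'
    if size < 0x200:
        return 'medium (< 0x200 bytes)'
    return 'large'


def group_by_bucket(sized_funcs):
    """Build the dictionary of lists from (address, size) pairs."""
    report = collections.defaultdict(list)
    for func_ea, size in sized_funcs:
        report[bucket_name(size)].append(func_ea)
    return dict(report)


def get_data(func_gen=None, env_desc=None, plug_params=None):
    # Assumption: func_gen() yields the start address of each function.
    return group_by_bucket((ea, func_size(ea)) for ea in func_gen())
```

Keeping the grouping logic in `group_by_bucket()` separate from the IDA calls makes it easy to check the output shape without launching IDA.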
Now, launch IDA and IdaClu and see if it works. If you manage to make it work and are able to share it, please do so and contribute to the tool repository of IdaClu 😉