Hex-Rays: State-of-the-art binary code analysis tools

COM (Component Object Model) is the technology used by Microsoft (and others) to create and use reusable software components independently of any specific programming language or vendor. It relies on a stable, well-defined ABI that is mostly compatible with the Microsoft C++ ABI, which makes it easy to implement and use COM components in C++.

COM basics

COM components and their interfaces are identified by UUIDs (also called GUIDs): unique 128-bit IDs usually written as a string of five groups of hexadecimal digits. For example, {00000000-0000-0000-C000-000000000046} identifies IUnknown, the base interface that every COM-conforming component must implement.

Each COM interface provides a set of functions, much like a C++ class. At the binary level, this is represented by a structure of function pointers, commonly named <name>Vtbl, and the interface itself is a structure whose first member points to that table. For example, here is how IUnknown is laid out:

struct IUnknownVtbl
{
  HRESULT (__stdcall *QueryInterface)(IUnknown *This, const IID *const riid, void **ppvObject);
  ULONG (__stdcall *AddRef)(IUnknown *This);
  ULONG (__stdcall *Release)(IUnknown *This);
};

struct IUnknown
{
  struct IUnknownVtbl *lpVtbl;
};

IDA’s standard type libraries include most of the COM interfaces defined by the Windows SDKs, so you can import these structures from them. Here’s how to do it manually:

  1. Open the Structures window (Shift-F9);
  2. Choose “Add struct type…” from the context menu, or press Ins;
  3. Type the name of the interface and/or its vtable (e.g. IUnknownVtbl) and click OK. If the interface is known, it will be imported from the type library automatically. If you are not sure it is available, you can click “Add standard structure” and use incremental search (start typing the name) to check if it’s present in the list of available types.

Once imported, the struct can be used, for example, to label indirect calls performed using the interface pointer.

How do you know which interface is being used in the code? There are several ways to find out, but one common approach is to look at calls to the CoCreateInstance API. It returns a pointer to the interface identified by the interface ID (IID), which is a kind of GUID. You can check which IID is used, then search for it in the Windows SDK headers and hopefully find the interface name.

For example, consider this call:

.text:30961A4D push    eax                             ; ppv
.text:30961A4E push    offset riid                     ; riid
.text:30961A53 push    1                               ; dwClsContext
.text:30961A55 push    esi                             ; pUnkOuter
.text:30961A56 push    offset rclsid                   ; rclsid
.text:30961A5B mov     [ebp+ppv], esi
.text:30961A5E call    ds:CoCreateInstance

If we follow riid, we can see that IDA has nicely formatted it as an instance of the IID structure:

.text:30961C18 riid dd 0EC5EC8A9h                           ; Data1
.text:30961C18                                         ; DATA XREF: sub_30961A2E+20↑o
.text:30961C18                                         ; sub_30DECD76+1D↓o
.text:30961C18 dw 0C395h                               ; Data2
.text:30961C18 dw 4314h                                ; Data3
.text:30961C18 db 9Ch, 77h, 54h, 0D7h, 0A9h, 35h, 0FFh, 70h; Data4

In text form, this corresponds to EC5EC8A9-C395-4314-9C77-54D7A935FF70. Since converting from the struct representation by hand is awkward, a quicker way is to search the SDK headers for the Data1 value (EC5EC8A9) and see if you can find a match.

There is a match in wincodec.h:

    MIDL_INTERFACE("ec5ec8a9-c395-4314-9c77-54d7a935ff70")
    IWICImagingFactory : public IUnknown
    {
    public:
        virtual HRESULT STDMETHODCALLTYPE CreateDecoderFromFilename(
            /* [in] */ __RPC__in LPCWSTR wzFilename,
            /* [unique][in] */ __RPC__in_opt const GUID *pguidVendor,
            /* [in] */ DWORD dwDesiredAccess,
            /* [in] */ WICDecodeOptions metadataOptions,
            /* [retval][out] */ __RPC__deref_out_opt IWICBitmapDecoder **ppIDecoder) = 0;
        /* ... */
    };

Now that we know we’re dealing with IWICImagingFactory, we can import IWICImagingFactoryVtbl and use it to label the calls made later through the interface pointer stored in the ppv variable.

IDA then uses the type information of the structure’s function pointers to label the calls and propagate argument information.

While this process works, it is somewhat tedious and error-prone. Is there something better?

COM helper

IDA ships with a standard plugin that can automate parts of this process. If you invoke Edit > Plugins > COM Helper, it shows a short description of what it does.

Invoke the menu item again to re-enable the plugin. It is on by default for new databases, so normally you do not need to do this. With the plugin enabled, we can do the following:

  1. Undefine/delete the IID instance at riid.
  2. Redefine it as a GUID (Alt-Q, choose “GUID”).

If the GUID is known, the instance is renamed to CLSID_<name>, and the corresponding <name>Vtbl is imported into the database automatically (if it is available in the loaded type libraries). You can then use it to resolve the indirect calls made through the interface pointer.

Extending the known interface list

To detect known GUIDs on Windows, the COM Helper uses the registry (the HKLM\Software\Classes\Interface subtree). If the GUID is not found in the registry (or IDA is not running on Windows), it consults the file cfg/clsid.cfg in IDA’s install directory. This is a simple text file listing GUIDs and their corresponding names. If you are dealing with lesser-known interfaces, you can add their GUIDs to this file so that they are labeled nicely.