Igor’s tip of the week #13: String literals and custom encodings

Written by Igor Skochinsky | Oct 29, 2020

Most IDA users probably analyze software that uses English or another language written in the Latin alphabet. For such programs, the default encoding used for string literals (the OS system encoding on Windows, UTF-8 on Linux or macOS) is usually good enough. Occasionally, however, you may encounter a program that uses a language with a non-Latin script.

Unicode strings

If the program uses wide strings, it is usually enough to pick the corresponding “Unicode C-style” option when creating a string literal.

In general, Windows programs tend to use 16-bit wide strings (wchar_t is 16-bit), while Linux and macOS programs use 32-bit ones (wchar_t is 32-bit). That said, exceptions happen, so pick whichever width matches the specific binary you’re analyzing.
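To make the 16-bit versus 32-bit difference concrete, here is a small Python sketch (plain Python, not IDAPython) that encodes the same ASCII text both ways and applies a crude heuristic to tell the two layouts apart. `guess_wchar_width` is a hypothetical helper for illustration only: it assumes little-endian data and ASCII-range characters, and returns None for anything else, including most non-Latin text.

```python
from typing import Optional

# Illustrative only: the same ASCII text stored as 16-bit and 32-bit wide strings.
text = "Hi"
utf16 = text.encode("utf-16-le")  # 2 bytes per code unit (typical Windows wchar_t)
utf32 = text.encode("utf-32-le")  # 4 bytes per code unit (typical Linux/macOS wchar_t)

print(utf16.hex(" "))  # 48 00 69 00
print(utf32.hex(" "))  # 48 00 00 00 69 00 00 00


def guess_wchar_width(data: bytes) -> Optional[int]:
    """Hypothetical heuristic for little-endian wide strings holding ASCII text.

    Returns 4 if every 4-byte unit looks like (nonzero, 0, 0, 0), 2 if every
    2-byte unit looks like (nonzero, 0), and None otherwise (non-ASCII text,
    big-endian data, or not a wide string at all).
    """
    if data and len(data) % 4 == 0 and all(
        data[i] != 0 and data[i + 1:i + 4] == b"\x00\x00\x00"
        for i in range(0, len(data), 4)
    ):
        return 4
    if data and len(data) % 2 == 0 and all(
        data[i] != 0 and data[i + 1] == 0
        for i in range(0, len(data), 2)
    ):
        return 2
    return None


print(guess_wchar_width(utf16), guess_wchar_width(utf32))  # 2 4
```

The zero-byte pattern is only a hint: once the text contains characters outside ASCII, the high bytes are no longer zero and you have to judge the width from context, just as in IDA.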

Hint: you can use accelerators to quickly create specific string types, for example, Alt–A, U for 16-bit Unicode.

Custom encodings

There may be situations when the binary being analyzed uses an encoding different from the one picked by IDA, or even multiple mutually incompatible encodings in the same file. In that case you can set the encoding separately for individual string literals, or globally for all new strings.
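Why a wrong guess matters can be shown with a short Python sketch (an illustration, not how IDA decodes strings internally): the same six bytes are Russian text in Windows-1251, decode without error but to unrelated characters in the DOS codepage 866, and are not valid UTF-8 at all. The helper `decodes_as` is hypothetical; note that "decodes without error" does not mean "correct encoding".

```python
# Illustrative sketch: one byte sequence, three interpretations.
raw = "Привет".encode("cp1251")  # Russian "hello" in the Windows-1251 codepage


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is valid in `encoding` (valid does not mean correct)."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


intended = raw.decode("cp1251")  # the original Russian word
misread = raw.decode("cp866")    # decodes fine, but to unrelated characters

print(raw.hex(" "))                                  # cf f0 e8 e2 e5 f2
print(decodes_as(raw, "cp866"), intended != misread)  # True True
print(decodes_as(raw, "utf-8"))  # False: 0xCF 0xF0 is not a valid UTF-8 sequence
```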

Add a new encoding

To add a custom encoding to the default list (usually UTF-8, UTF-16LE and UTF-32LE):

  1. Options > String literals… (Alt–A);
  2. Click the button next to “Currently:”;
  3. In the context menu, choose “Insert…” (Ins);
  4. Specify the encoding name.

For the encoding name you can use:
  • Windows codepages (e.g. 866, CP932, windows-1251)
  • Well-known charset names (e.g. Shift-JIS, UTF-8, Big5)

On Linux or macOS, run iconv -l to see the available encodings.

Note: some encodings are not supported on all systems, so your IDB may become system-specific.
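Before typing a name into IDA, it can help to check how it resolves. The sketch below uses Python's own codec registry (`codecs.lookup`), which happens to accept all of the example names listed above and maps aliases such as windows-1251 to a canonical name. Treat it only as a rough local check: the names IDA accepts depend on the system (on Linux or macOS, the list printed by iconv -l), not on Python, so the two sets can differ.

```python
import codecs
from typing import Optional


def canonical_encoding(name: str) -> Optional[str]:
    """Return Python's canonical codec name for `name`, or None if Python lacks it."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


candidates = ["866", "CP932", "windows-1251", "Shift-JIS", "UTF-8", "Big5", "no-such-charset"]
for name in candidates:
    # Unknown names print None instead of raising.
    print(f"{name:16} -> {canonical_encoding(name)}")
```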

Use the encoding for a specific string literal
  1. Invoke Options > String literals… (Alt–A);
  2. Click the button next to “Currently:”;
  3. Select the encoding to use;
  4. Click the button for the desired string type (e.g. C-style) if creating a new literal, or just OK if modifying an existing one.
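What choosing an encoding for one literal means can be sketched in Python: a C-style literal is decoded from its start up to a zero terminator, and the chosen encoding decides how the bytes in between become characters. `read_c_string` is a hypothetical helper, not an IDA API. It assumes the terminator is one zero code unit (1 byte for 8-bit encodings such as cp1251 or Shift-JIS, 2 bytes for UTF-16, 4 for UTF-32), and the sample buffer mixes two encodings, as a binary with mutually incompatible encodings might.

```python
def read_c_string(buf: bytes, offset: int, encoding: str, unit: int = 1) -> str:
    """Decode a zero-terminated string starting at `offset` (hypothetical helper).

    `unit` is the code-unit size in bytes; the terminator is `unit` zero bytes
    aligned to a code-unit boundary. If no terminator is found, decoding stops
    at the end of the buffer.
    """
    terminator = b"\x00" * unit
    end = offset
    while end + unit <= len(buf) and buf[end:end + unit] != terminator:
        end += unit
    return buf[offset:end].decode(encoding)


# Two literals in one buffer, each needing its own encoding.
part1 = "Привет".encode("cp1251") + b"\x00"
part2 = "こんにちは".encode("shift_jis") + b"\x00"
blob = part1 + part2

first = read_c_string(blob, 0, "cp1251")                 # per-literal choice: cp1251
second = read_c_string(blob, len(part1), "shift_jis")    # per-literal choice: Shift-JIS
print(len(first), len(second))  # 6 5
```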

Set an encoding as default for all new string literals
  1. Invoke Options > String literals… (Alt–A);
  2. Click “Manage defaults”;
  3. Click the button next to “Default 8-bit” and select the encoding to use.

From now on, the A shortcut will create string literals with the new default encoding, but you can still override it on a case-by-case basis, as described above.
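The default-plus-override behaviour can be modelled with a tiny Python class. `LiteralTable` is purely hypothetical and does not try to reproduce how IDA stores encodings in the IDB: each literal records either an explicitly chosen encoding or, if none was given, the current default, much like pressing A versus picking an encoding in the String literals dialog.

```python
from typing import Dict, Optional, Tuple


class LiteralTable:
    """Hypothetical model of string literals that each carry an encoding."""

    def __init__(self, default_8bit: str = "utf-8") -> None:
        self.default_8bit = default_8bit  # plays the role of "Default 8-bit"
        self._literals: Dict[int, Tuple[bytes, str]] = {}

    def create(self, address: int, data: bytes, encoding: Optional[str] = None) -> None:
        # An explicit per-literal encoding wins; otherwise use the current default.
        self._literals[address] = (data, encoding or self.default_8bit)

    def text(self, address: int) -> str:
        data, encoding = self._literals[address]
        return data.decode(encoding)


table = LiteralTable(default_8bit="cp1251")
table.create(0x1000, "Привет".encode("cp1251"))                    # uses the default
table.create(0x2000, "こんにちは".encode("shift_jis"), "shift_jis")  # case-by-case override
```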