Most IDA users probably analyze software that uses English or another Latin-based alphabet. Thus the defaults used for string literals – the OS system encoding on Windows and UTF-8 on Linux or macOS – are usually good enough. However, occasionally you may encounter a program that uses another language.
If the program uses wide strings, it is usually enough to pick the corresponding “Unicode C-style” option when creating a string literal.
In general, Windows programs tend to use 16-bit wide strings (wchar_t is 16-bit) while Linux and Mac use 32-bit ones (wchar_t is 32-bit). That said, exceptions happen, and you can use either one depending on the specific binary you’re analyzing.
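To make the difference concrete, here is a minimal Python sketch (run outside IDA) that shows how the same text is laid out in memory with 16-bit and 32-bit code units. The helper name wide_bytes is hypothetical, and little-endian byte order is assumed, which matches the UTF-16LE and UTF-32LE defaults but may not hold for every binary.

```python
def wide_bytes(text, unit_bits):
    """Encode text as a zero-terminated little-endian wide string.

    unit_bits is 16 (typical Windows wchar_t) or 32 (typical Linux/macOS wchar_t).
    """
    codec = {16: "utf-16-le", 32: "utf-32-le"}[unit_bits]
    terminator = b"\x00" * (unit_bits // 8)  # one zero code unit
    return text.encode(codec) + terminator


# Same text, two layouts: compare these patterns with the hex view of your binary.
print(wide_bytes("IDA", 16).hex(" "))  # 49 00 44 00 41 00 00 00
print(wide_bytes("IDA", 32).hex(" "))  # 49 00 00 00 44 00 00 00 41 00 00 00 00 00 00 00
```

If the hex view shows one zero byte after each ASCII character, a 16-bit string type is likely the right choice; three zero bytes suggest a 32-bit one.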
Hint: you can use accelerators to quickly create specific string types, for example Alt-A, U for 16-bit Unicode.
There may be situations when the binary being analyzed uses an encoding different from the one picked by IDA, or even multiple mutually incompatible encodings in the same file. In that case you can set the encoding separately for individual string literals, or globally for all new strings.
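As an illustration of why the encoding matters, the following Python sketch (run outside IDA) decodes the same raw bytes under several candidate encodings. The helper decode_candidates and the sample string are assumptions made for this example; Python's codec names are used here and they may differ from the names IDA accepts.

```python
def decode_candidates(raw, encodings):
    """Try each encoding on raw bytes; map name -> decoded text, or None on failure."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None  # bytes are not valid in this encoding
    return results


# Bytes of a Russian greeting as stored by a program using Windows-1251.
raw = "Привет".encode("cp1251")

for name, text in decode_candidates(raw, ["utf-8", "cp1252", "cp1251"]).items():
    print(name, "->", text)
```

Only cp1251 yields readable text: utf-8 rejects the bytes outright, while cp1252 decodes them without error but produces mojibake. A wrong single-byte encoding often "succeeds" this way, so a clean decode alone does not prove the choice is correct.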
To add a custom encoding to the default list (usually UTF-8, UTF-16LE and UTF-32LE):
On Linux or macOS, run iconv -l to see the available encodings.
Note: some encodings are not supported on all systems, so your IDB may become system-specific.
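To check whether a given name is known to iconv on a particular machine, the list can be searched programmatically. The sketch below is only an assumption about a convenient workflow, not part of IDA: the helpers parse_iconv_list and is_available are hypothetical, and the sample text is illustrative rather than captured output. The exact layout of iconv -l differs between implementations (names separated by spaces or commas, sometimes with a trailing //), so the parser accepts all of these.

```python
import re


def parse_iconv_list(output):
    """Collect encoding names from `iconv -l` style text into an uppercase set."""
    names = set()
    for token in re.split(r"[\s,]+", output):
        token = token.rstrip("/")  # some implementations append "//" to each name
        if token:
            names.add(token.upper())
    return names


def is_available(name, output):
    """Case-insensitive check that name appears in the listed encodings."""
    return name.upper() in parse_iconv_list(output)


# Illustrative sample only; on a real system, obtain the text with e.g.
#   subprocess.run(["iconv", "-l"], capture_output=True, text=True).stdout
sample = "CP1251 MS-CYR WINDOWS-1251\nSHIFT_JIS, SJIS//\n"

print(is_available("windows-1251", sample))
print(is_available("KOI8-R", sample))
```

Running the same check on each system where the IDB will be opened is one way to spot an encoding that would make the database system-specific.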
From now on, the A shortcut will create string literals with the new default encoding, but you can still override it on a case-by-case basis, as described above.