DynamoRIO issue #3339 (Closed)
Issue created Jan 11, 2019 by Administrator (@root, Contributor)

Support using the original encoding template even after IR modifications

Created by: rlyerly

Decoding an instruction into DynamoRIO's IR format and then directly re-encoding back to machine code may change the machine-level encoding type. This happens even if the instruction is never modified using the IR APIs. Below is an example of dumping a function's instructions using DynamoRIO as a standalone decoder/re-encoder:

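byte *start, *end; // Start & end address of the memory-mapped copy of the function
byte *real; // Real virtual address of the function
byte *prev;
... (retrieve real, start & end) ...
while (start < end) {
  // instr_create() already returns an initialized instr_t, so no instr_init() is needed
  instr_t *instr = instr_create(GLOBAL_DCONTEXT);
  prev = start;
  // Decode the bytes at start as though they were located at real
  start = decode_from_copy(GLOBAL_DCONTEXT, start, real, instr);
  if (start == NULL) { // decode_from_copy() returns NULL for an invalid instruction
    instr_destroy(GLOBAL_DCONTEXT, instr);
    break;
  }
  std::cout << "Instruction size: " << instr_length(GLOBAL_DCONTEXT, instr) << std::endl;
  disassemble_with_bytes(GLOBAL_DCONTEXT, real, STDERR);
  // instr_destroy() frees the operands and the instr_t itself; instr_free() alone would leak it
  instr_destroy(GLOBAL_DCONTEXT, instr);
  real += start - prev;
}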

The while-loop decodes all instructions in a function (mapped into memory from an on-disk binary) and prints their sizes using instr_length(). Because the loop decodes with decode_from_copy(), the original raw bytes are not treated as valid for the instruction, so instr_length() has to re-encode the instruction to determine its size. This code produces the following output for a given binary:

Instruction size: 10
 48 8b 04 25 d0 35 92 mov    0x009235d0[8byte] -> %rax
 00

The instruction's original size is 8 bytes, but DynamoRIO's re-encoding process changes the machine-level encoding so that it is now 10 bytes. According to the discussion here, this is because DynamoRIO walks an encoding template from specialized to general encoding types and uses the first one that fits. In this particular situation, DynamoRIO found a more specialized encoding for the instruction than the one the compiler emitted. The instruction's change in size is a side effect of changing the encoding.

Being able to control the encoding type would give users more flexibility, especially in cases where the user needs explicit control. For example, when using DynamoRIO as a standalone decoder/re-encoder, users may not want the code size to change because that invalidates control flow targets. Exposing encoding controls may lead to finicky APIs, however, especially for encoding- or user-specified restrictions. For example, what if the user requests an encoding type that is not compatible with the instruction's operands?

As a step in that direction, DynamoRIO could expose an API that lets the user request the same encoding as the original instruction, e.g., instr_use_orig_encoding(instr_t *instr) or instr_encode(void *drcontext, instr_t *instr, byte *pc, bool orig_encoding). If the user changed the instruction or its operands in such a way that the original encoding is no longer valid, DynamoRIO could return an error code or nullptr to indicate that the encoding failed.
