You can do better than 2 bytes. Use the same epilogue, but store a copy of the "binary" at the top of the stack (where the stack pointer points) and start the instruction pointer 1 byte past the start of the binary. If the binary is literally the single byte 0x2A (i.e. 42), then the first instruction executed is the first instruction of the epilogue, which pops the "binary" into RDI, setting RDI to 42. There are probably some details around alignment, padding, and instruction choice in the loader to sort out to make that work "generically", but the strategy should work and gives you a 1-byte solution.
edit: Actually, just define your binary format so that the first byte is copied to the stack and all subsequent bytes are copied to text with the epilogue appended to it.
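A rough sketch of what such a loader would compute, modeled in Python rather than written as a real loader. The epilogue bytes are my assumption of one possible x86-64 Linux encoding (pop rdi; push 60; pop rax; syscall), and `load` and `EPILOGUE` are made-up names, not anything from the article:

```python
# Hypothetical epilogue appended by the loader (x86-64 Linux, assumed encoding):
#   pop rdi      ; 5F      exit status comes from the top of the stack
#   push 60      ; 6A 3C   60 = exit syscall number
#   pop rax      ; 58
#   syscall      ; 0F 05
EPILOGUE = bytes([0x5F, 0x6A, 0x3C, 0x58, 0x0F, 0x05])


def load(binary: bytes) -> tuple[bytes, bytes]:
    """Split a file in the hypothetical format into (stack_slot, text)."""
    if not binary:
        raise ValueError("this format needs at least one byte")
    # First byte goes to the stack, zero-extended to one 8-byte little-endian slot.
    stack_slot = binary[:1].ljust(8, b"\x00")
    # Every remaining byte is code; the loader appends the epilogue after it.
    text = binary[1:] + EPILOGUE
    return stack_slot, text


stack_slot, text = load(b"\x2a")
print(int.from_bytes(stack_slot, "little"), text.hex())  # 42 5f6a3c580f05
```

With the one-byte file 0x2A the text segment is just the epilogue, so execution starts at pop rdi and exits with status 42.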
edit: You could also define it so that the first byte is copied into the first argument register/RDI if you want to shrink loaded RAM usage to just 4 bytes of code and 1 byte of data.
This is of course assuming it is a "generic" binary format that is not literally just encoding the contents of the tiny program. Otherwise you could do 0 bytes: have the loader pre-fill RAX with 60 and RDI with 42 and insert a one-instruction epilogue consisting of just syscall. You could technically still call that a "generic" binary format, since any actual binary you attempt to load will just blow away those pre-filled GPR values.