PEH.EXE: Running Peh on DOS
February 2024
Introduction
Peh is a fixed-point cryptographic calculator and an arithmetic language created by Stanislav Datskovskiy (2017) in the Ada 2012 programming language (AXE Consultants 2012) using his Finite Field Arithmetic library (FFA). While the library itself is largely system-independent (the only system-dependent knobs being the CPU characteristics, e.g., the word width), the calculator calls into the operating system through standard C library functions for process control, command line parameters and I/O. Both FFA and Peh were linked against the GNAT Ada Runtime Library, which itself also communicates with the operating system using the C standard library.
The purpose of this article is to present a proof of concept of compiling a static Peh executable for a DOS-like operating system running on a 32‑bit x86 CPU (i386 or later) using the GNAT Ada compiler, a part of the GNU Compiler Collection (GCC), without linking against the C standard library. We will need to prepare a small Ada runtime required to run both FFA and Peh and replace the system-dependent bits with calls to our own operating system interfaces.
While DOS itself, running in 16‑bit real mode, cannot run 32‑bit applications, we can still execute them by putting the CPU into protected mode with a DOS extender and communicating with the operating system through the DOS Protected Mode Interface (DPMI) (DPMI Committee 1990). The extender we are going to use is Narech Koumar’s DOS/32 Advanced DOS Extender (DOS32A), as it is pretty compact and easy to modify to suit our needs (as we shall soon see).
Getting started
The author uses the triplet i686‑pc‑pe in the remainder of this article to identify the target in the compilation environment. The triplet itself can, of course, be changed to accommodate the local system conventions or to specify an older x86 CPU (e.g., an i486).
The entire process can be summarized in the following steps:
- assembling and linking a modified DOS32A capable of running programs compiled by our GCC,
- compiling a toolchain targeting i686‑pc‑pe,
- cross-compiling our own Ada runtime library for i686‑pc‑pe,
- cross-compiling Peh for i686‑pc‑pe,
- adding finishing touches.
Prerequisites:
- V version control system,
- Borland Turbo Assembler 4.0 (or equivalent, other versions might or might not work),
- source release of GNU binary utilities (binutils) (any version supporting Microsoft Windows should work, the author used versions 2.38–2.41 and did not encounter any problems),
- source release of GCC 4.9.4 (or any other version supporting Ada 2012, but the cross-compiler build process may be different),
- source release of FFA 199 (ch. 21A-Ter).
The author assumes the reader has experience with cross-compilation and has managed to successfully compile binutils and GCC in the past.
Extending the DOS extender
We will need the DOS32A source code, which has been mirrored in the following vpatch: dos32a-src.vpatch (seal).
The first obstacle in our way is the incompatibility of executable file formats: valid input for DOS32A is an LE, LX or LC executable. It seems the author of DOS32A had plans of adding support for the PE executable format, a format extensively supported by the entire GNU software development stack, but unfortunately the loader part has never been finished, the only thing left being a comment saying “Insert PE loader implementation here.” Without any further ado, we shall abide by this instruction.
The PE/COFF format (Microsoft Corporation 1999), used by the GNU toolchain, aims to be compatible with the format used on Microsoft Windows, so we can use the provided documentation to create a loading mechanism which will properly parse the file header and load all of the needed objects into memory. Microsoft Windows executables expect to be loaded at the address specified in the header; since we cannot guarantee that address, all of the absolute memory references will have to be listed in the appropriate section of the final executable so that we can fix them up. Fortunately, the entire memory allocation part has already been taken care of in the main LX executable loader, so our job is a little easier. The reader should get familiar with the source code of the loader to understand the exact process and the procedures we are going to use.
The reader should also get familiar with the LX format documentation (it can be found on the compact disk of the Arsenal Computer’s OS/2 Arsenal), because the original loader was written with regard to the terminology and conventions used therein.
The LX executable files are divided into objects. The DOS extender allocates memory for every object and loads the appropriate segments before applying fixups for all of the absolute memory references. It is quite similar to the PE/COFF file format, where we have sections and relocations. We can use the original loader code for loading up the segments (and its general idea for our main loader procedure), but we will have to locate and properly load the sections ourselves.
Press the following vpatch: dos32a-pecoff.vpatch (seal). The only file differing from the original source release is the loadpe.asm. We can analyze it below.
load_pe_app:
        mov _app_type,3
        call load_pe_header
        call verbose_showloadhdr
The first thing we do is load the PE file header and, in verbose mode, display the useful information from it. Fortunately, we are not going to modify the internals of the extender, so we can safely use the original display procedure, but we will have to write our own header loader. (Refer to the PE/COFF documentation while analyzing the code.)
load_pe_header:
        mov ecx,0F8h  ; size of a full 32-bit PE header
        mov edx,04h
        mov _err_code,3002h  ; "error in app file"
        call load_fs_block
        mov edx,_exec_start
        mov ax,fs:[0016h]  ; load characteristics
        and ax,0103h  ; check for:
        cmp ax,0102h  ; EXECUTABLE_IMAGE and 32BIT_MACHINE
        mov ax,3005h  ; and not RELOCS_STRIPPED
        jne file_error
We only want to load 32‑bit executables that still carry their relocations. We cannot force the loader to place our sections at the addresses the linker assumed, so we will have to relocate every absolute memory reference ourselves. Fortunately, our linker can emit a dedicated section listing all of those references (the base relocation table), so we can use this information later.
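The mask-and-compare test in the listing above can be sketched in Python for readers less used to assembly. This is only an illustration: the function name is_loadable is hypothetical, while the three flag values are the ones named in the listing's comments and defined in the PE/COFF specification.

```python
# Characteristics flags from the PE/COFF specification.
IMAGE_FILE_RELOCS_STRIPPED = 0x0001
IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002
IMAGE_FILE_32BIT_MACHINE = 0x0100


def is_loadable(characteristics: int) -> bool:
    """Hypothetical mirror of `and ax,0103h` followed by `cmp ax,0102h`."""
    mask = (IMAGE_FILE_RELOCS_STRIPPED
            | IMAGE_FILE_EXECUTABLE_IMAGE
            | IMAGE_FILE_32BIT_MACHINE)
    wanted = IMAGE_FILE_EXECUTABLE_IMAGE | IMAGE_FILE_32BIT_MACHINE
    # Both wanted bits must be set and RELOCS_STRIPPED must be clear;
    # every other bit is ignored.
    return (characteristics & mask) == wanted
```

Any other bits in the field are masked out, so only the three named flags influence the decision.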
        mov ax,fs:[0006h]  ; get number of sections
        mov cx,ax
        cmp ax,APP_MAXOBJECTS
        mov ax,4001h  ; "too many objects"
        ja file_error
        mov _app_num_objects,ecx
We do not want more sections than the maximum number of loadable objects.
        mov eax,0F8h  ; size of the optional header
        add eax,edx
        mov _app_off_objects,eax
        mov eax,fs:[0034h]  ; image base
        mov _app_tmp_addr1,eax
        mov eax,fs:[0060h]  ; stack size
        mov _app_tmp_addr2,eax
        mov eax,fs:[00A0h]  ; base relocation table virtual address
        mov _app_off_fixrectab,eax
        mov eax,fs:[00A4h]  ; base relocation table size
        mov _app_siz_fixrecstab,eax
        mov _app_eip_object,0  ; 0 = have not found it yet
        mov eax,fs:[0028h]  ; entry point address
        mov _app_eip,eax
        ret
This part is pretty self-explanatory. We load all of the offsets in our file containing information required later. We cannot determine which section contains the beginning of our code, so we set it to zero for now.
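For reference, the header fields read above can be collected in a short Python sketch. The offsets are the ones used in the listing, counted from the "PE\0\0" signature; the names FIELDS and parse_pe_header are hypothetical, and the signature check is an assumption of this sketch rather than something the listing performs at this point.

```python
import struct

# Offsets relative to the "PE\0\0" signature, as used in the listing.
FIELDS = {
    "number_of_sections": (0x06, "<H"),
    "characteristics": (0x16, "<H"),
    "entry_point": (0x28, "<I"),
    "image_base": (0x34, "<I"),
    "stack_size": (0x60, "<I"),
    "reloc_table_rva": (0xA0, "<I"),
    "reloc_table_size": (0xA4, "<I"),
}
PE_HEADER_SIZE = 0xF8  # signature + COFF header + 32-bit optional header


def parse_pe_header(header: bytes) -> dict:
    """Extract the fields the loader cares about (illustrative only)."""
    # Assumption of this sketch: reject anything that is not a full header.
    if len(header) < PE_HEADER_SIZE or header[:4] != b"PE\0\0":
        raise ValueError("not a full 32-bit PE header")
    return {name: struct.unpack_from(fmt, header, offset)[0]
            for name, (offset, fmt) in FIELDS.items()}
```

All values are little-endian, which is why every format string starts with "<".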
        mov ecx,1
@@1:    call load_pe_object
        call create_selector
        call verbose_showloadobj
        push dword 0  ; padding
        push edi  ; save address of the loaded object
        push ebp  ; save size of the loaded object
        push esi  ; save the virtual address
        inc cx
        cmp cx,word ptr _app_num_objects
        jbe @@1
After loading the header, it is time to load all of our sections. We load the object itself, create a selector for the extender and display information in the verbose mode. We also fill our stack with a simple data structure, for every object, containing the virtual address, the size of the object and its actual address.
load_pe_object:
        push ecx
        mov _err_code,3002h
        mov edx,_app_off_objects
        call seek_from_start
        mov ecx,28h  ; size of a section entry
        xor edx,edx
        call load_fs_block
        add _app_off_objects,eax
The loading part is pretty simple. First, we want to load the entry containing all the information about our section. We move the pointer with each loaded section.
        mov eax,fs:[0008h]  ; virtual size
        mov esi,fs:[000Ch]  ; virtual address
        mov edx,fs:[0014h]  ; pointer to raw data
        mov ecx,fs:[0024h]  ; characteristics
        call seek_from_start  ; head to section data
Here we load all the required information to dedicated registers and move our file pointer to the beginning of the section data.
        mov edx,2040h  ; 32-bit and preloaded
        test ecx,40000000h  ; readable-p
        jz @@nr
        or edx,0001h  ; set readable
@@nr:   test ecx,80000000h  ; writable-p
        jz @@nw
        or edx,0002h  ; set writable
@@nw:   test ecx,20000000h  ; executable-p
        jz @@ne
        or edx,0004h  ; set executable
@@ne:   push edx  ; save our characteristics
Here we must read the flags of our section and translate them to a format used by the DOS extender. As we can see in the code above, we do not have to check all of the flags, just the ones required to properly set up the objects.
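The flag translation above is a straightforward bit mapping. The following Python sketch restates it; the function name translate_flags is hypothetical, the section flag values come from the PE/COFF specification, and the extender-side constants are simply copied from the listing.

```python
# Section characteristics from the PE/COFF specification.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def translate_flags(pe_characteristics: int) -> int:
    """Map PE section flags to the object flags used in the listing."""
    flags = 0x2040  # "32-bit and preloaded", as in the listing
    if pe_characteristics & IMAGE_SCN_MEM_READ:
        flags |= 0x0001  # readable
    if pe_characteristics & IMAGE_SCN_MEM_WRITE:
        flags |= 0x0002  # writable
    if pe_characteristics & IMAGE_SCN_MEM_EXECUTE:
        flags |= 0x0004  # executable
    return flags
```

For instance, a typical code section with characteristics 60000020h (code, executable, readable) maps to 2045h.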
        test ecx,00000020h  ; check if section contains code
        jz @@skip
        cmp _app_eip_object,0  ; not 0 = already found it
        jnz @@skip
        cmp _app_eip,esi  ; EIP >= virtual address
        jb @@skip
        mov ecx,eax
        add ecx,esi
        cmp _app_eip,ecx  ; EIP < virtual address + virtual size
        jae @@skip
        mov ecx,[esp+4]
        mov _app_eip_object,ecx
        sub _app_eip,esi
This is how we get the proper pointer to the beginning of our code. If the section contains code, we check if we are still looking for the section containing the initial procedure. If we are, and if the entry point address from the header lies within the section’s virtual bounds, we set the appropriate variable to the current section and turn the entry point address into an offset within that section.
@@skip: mov ebx,eax  ; get physical size
        shr ebx,12  ; number of pages
        test eax,0FFFh  ; check for a tail
        jz @@1
        inc ebx  ; add one more page
@@1:    call alloc_block  ; allocate EAX memory block to EDI
        mov ecx,eax  ; ECX = bytes to read
        mov ebp,eax  ; EBP = preserve virtual size
        mov edx,edi  ; EDX = address to read to
        call fill_zero_pages  ; fill allocated memory with zeroes
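The entry point search described above can be sketched in Python as follows. The function name locate_entry and the tuple layout of its input are assumptions made for this illustration; the 1-based section numbering matches the object counter used by the loader.

```python
IMAGE_SCN_CNT_CODE = 0x00000020  # "section contains code" flag


def locate_entry(sections, entry_rva):
    """Find the code section holding entry_rva.

    `sections` is a sequence of (virtual_address, virtual_size,
    characteristics) tuples in file order (a hypothetical layout).
    Returns (1-based section number, offset within it) or None.
    """
    for number, (va, vsize, chars) in enumerate(sections, start=1):
        if not chars & IMAGE_SCN_CNT_CODE:
            continue  # only code sections are considered
        if va <= entry_rva < va + vsize:
            return number, entry_rva - va
    return None
```

As in the assembly, the first matching section wins and the entry point becomes an offset relative to the start of that section.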
We have to figure out how many memory blocks we need for our section and allocate them. Fortunately, we can use the original allocation procedure.
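The page count computed in the listing above is a ceiling division by the 4 KiB page size. A minimal Python sketch, with the hypothetical name pages_needed:

```python
def pages_needed(size: int) -> int:
    """Number of 4 KiB pages covering `size` bytes (shr 12, plus the tail)."""
    pages = size >> 12  # whole pages
    if size & 0xFFF:    # a partial page at the end needs one more
        pages += 1
    return pages
```

The same arithmetic is reused later for the stack object.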
        mov _err_code,3002h
        call load_gs_block  ; load object data
        pop edx  ; leave our characteristics on EDX
        pop ecx
        ret
Finally, we load up the section data, leaving the characteristics on the EDX register for the selector creation procedure.
The only thing missing is our stack. The LX file format contains a dedicated stack object, but there is no such section residing in the PE/COFF executable file. We did load the size of the stack, hinted by the file header, so we can just create one more fixed-size object.
        call create_stack_object
        call create_selector
        call verbose_showloadobj
The actual stack object creation should be pretty self-explanatory by now.
create_stack_object:
        push ecx
        mov eax,_app_tmp_addr2
        mov ebx,eax
        shr ebx,12
        test eax,0FFFh
        jz @@1
        inc ebx
@@1:    mov _app_esp_object,ecx
        mov _app_esp,eax
        call alloc_block
        mov ecx,eax
        mov ebp,eax
        mov edx,edi
        call fill_zero_pages
        mov edx,2103h  ; 32-bit, zeroed, rw
        pop ecx
        ret
The only thing left is relocating all of the addresses. First we must prepare the appropriate registers for the relocation procedure and find the section containing the relocations.
        mov ebp,esp  ; point to objects
        mov ebx,_app_num_objects
        dec bx
        shl bx,4  ; times 16 (the size of our record)
        call fix_reloc_offset
We do know the virtual address of the relocations section, so we can just iterate through our stack and find the appropriate object.
fix_reloc_offset:
        push ebx
        mov eax,_app_off_fixrectab
@@1:    cmp eax,[ebp+ebx+0]
        je @@done
        sub bx,10h
        jmp @@1
@@done: mov eax,[ebp+ebx+8]
        mov _app_off_fixrectab,eax
        pop ebx
        ret
Now we can relocate the addresses in all of our objects.
@@4:    call relocate_pe_object
        sub bx,10h  ; get next object
        jnc @@4
The relocation code is pretty complex, so it is going to be presented in one block.
relocate_pe_object:
        mov _err_code,4005h  ; "unrecognized fixup data"
        mov edx,_app_off_fixrectab  ; first base relocation block
        mov ecx,_app_siz_fixrecstab  ; size of all blocks
@@1:    test ecx,ecx  ; check if zero
        jz @@done  ; if zero, we're done
        ;; here we assume consistent object ordering, ascending and we also
        ;; assume there are no blocks starting before our first object
        mov eax,[ebp+ebx+0]  ; virtual address of our loaded object
        cmp eax,gs:[edx]  ; skip blocks below us
        ja @@next
        add eax,[ebp+ebx+4]  ; add size of our loaded object
        cmp eax,gs:[edx]  ; skip blocks above us
        jbe @@next
        lea esi,[edx+8]  ; ESI = address of first relocation
        mov edi,edx  ; EDI = address of current block
        add edi,gs:[edx+4]  ; EDI = address of next block
@@2:    cmp esi,edi  ; finish the block if at the end
        je @@next
        mov ax,gs:[esi]  ; get relocation
        and eax,0000F000h  ; relocation type mask
        jz @@skip  ; skip if base relocation type = 0
        cmp eax,00003000h
        jne file_errorm  ; eggog if base relocation type /= 3
        mov ax,gs:[esi]  ; get relocation (again)
        and eax,00000FFFh  ; clear relocation type
        add eax,gs:[edx]  ; add page address
        sub eax,[ebp+ebx+0]  ; sub virtual address of our object
        add eax,[ebp+ebx+8]  ; add actual address of our object
        push esi ecx edx ebx  ; save the registers
        mov esi,gs:[eax]  ; load the address to relocate
        sub esi,_app_tmp_addr1  ; sub image base
        xor ebx,ebx
        mov edx,_app_num_objects  ; for all our remote objects
        shl edx,4  ; times 16 (size of object structure)
@@3:    cmp ebx,edx  ; leave loop if we did all objects
        je @@oo
        ;; here we assume consistent remote object ordering, descending
        cmp esi,[ebp+ebx+0]  ; address equal or lower our page
        jae @@mm
        add ebx,10h  ; next object
        jmp @@3
@@mm:   mov ecx,_app_tmp_addr1  ; get image base
        sub gs:[eax],ecx  ; sub it from the address
        mov ecx,[ebp+ebx+0]  ; get virtual address of our object
        sub gs:[eax],ecx  ; sub it from the address
        mov ecx,[ebp+ebx+8]  ; get actual address of our object
        add gs:[eax],ecx  ; add it to the address
@@oo:   pop ebx edx ecx esi  ; restore the registers
@@skip: add esi,2  ; records are two octets long
        jmp @@2  ; next reloc
@@next: mov eax,gs:[edx+4]  ; get size of our block
        add edx,eax  ; move block pointer to next block
        sub ecx,eax  ; shrink size of all blocks
        jmp @@1  ; next block
@@done: ret
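To make the structure walked by the relocation routine above easier to follow, here is a hedged Python sketch of the same two ideas: iterating over the base relocation blocks and rebasing a single address. The names iter_relocations and rebase are hypothetical; the block layout (a 32-bit page address, a 32-bit block size, then 16-bit entries with the type in the top four bits) follows the PE/COFF specification, and the arithmetic in rebase mirrors the three instructions after the @@mm label.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, skipped
IMAGE_REL_BASED_HIGHLOW = 3   # full 32-bit address to patch


def iter_relocations(table: bytes):
    """Yield the virtual address of every HIGHLOW fixup in the table."""
    pos = 0
    while pos < len(table):
        page_rva, block_size = struct.unpack_from("<II", table, pos)
        if block_size < 8:
            raise ValueError("malformed relocation block")
        # Entries are two octets long and follow the 8-byte block header.
        for entry_pos in range(pos + 8, pos + block_size, 2):
            (entry,) = struct.unpack_from("<H", table, entry_pos)
            kind, offset = entry >> 12, entry & 0x0FFF
            if kind == IMAGE_REL_BASED_ABSOLUTE:
                continue
            if kind != IMAGE_REL_BASED_HIGHLOW:
                raise ValueError("unrecognized fixup data")
            yield page_rva + offset
        pos += block_size


def rebase(value: int, image_base: int, section_va: int,
           load_address: int) -> int:
    """Turn a link-time absolute address into an address in loaded memory."""
    return (value - image_base - section_va + load_address) & 0xFFFFFFFF
```

Here section_va and load_address describe the object that the patched address points into, not the object that contains the fixup; finding that target object is what the inner @@3 loop does in the assembly.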
After that, we can safely close the file, set the stack to the proper value, show some debug info in the verbose mode and finally enter our code.
        call close_exec
        mov esp,_sel_esp
        call verbose_showstartup
        jmp enter_32bit_code
The build process involves assembling two object files using Borland Turbo Assembler (TASM) and linking them together. Version 7.1 of DOS32A used a linker accompanying Watcom C/C++ compilers, but we can make do with Borland Turbo Linker, (hopefully) included in our distribution of TASM.
The author of DOS32A used a few macros not included in the final source release. We can find the file STDDEF.INC required for proper assembly in an older release (version 7.1). The file is reproduced verbatim below.
cr equ 0Dh,0Ah
cre equ 0Dh,0Ah,00h
bptr equ byte ptr
wptr equ word ptr
dptr equ dword ptr
offs equ offset
fptr equ far ptr
clr macro r1
        xor r1,r1
endm
rdtsc macro
        db 0Fh, 31h
endm
TRUE equ 1
FALSE equ 0
It is pretty self-explanatory. It should be saved into the LIB directory of your TASM installation (or any other directory set with an -i switch in the TASM configuration file or at the command line). The DOS32A 7.1 documentation also included a sample configuration file containing switches -r -ml -m -q -t and they must be either added to the local configuration file or must be specified at the command line.
Issue the following commands while in the top-level of the source directory.
tasm -dEXEC_TYPE=0 -c -la kernel.asm, obj\kernel.obj, obj\kernel.lst
tasm -dEXEC_TYPE=0 -c -la dos32a.asm, obj\dos32a.obj, obj\dos32a.lst
tlink /Tde /3 obj\dos32a.obj obj\kernel.obj, bin\dos32a.exe
The bin directory should now contain a brand new dos32a.exe executable. The reader is expected to read the DOS32A documentation to understand the way it is to be used.
Cross-compiling for i686‑pc‑pe
Though the binutils package contains a dedicated ‑*‑pe* support (and should be compiled the regular way, without any changes to the source tree), the GCC, unfortunately, does not. But we should be able to make do with what we have with only a few changes to the source tree.
The following instructions apply to GCC 4.9.4. The author managed to compile a cross-compiler using the following instructions on his machine and operating system, which does not use the GNU C Library and is incapable of running dynamically linked executables, among other oddities. As the GCC was heavily modified to run in such an environment, only the changes required to successfully create a cross compiler are listed. If the reader uses an environment which requires other changes to run GCC properly, or wants to use a different version of the package, following the instructions below may not produce a desirable result. They can still, however, be helpful in producing a required cross compiler.
First of all, we need to teach GCC to recognize our new target. Fortunately, it is so similar to the existing Microsoft Windows targets, we can reuse most of the files with only one exception. Navigate to ./gcc/config/i386 and make a copy of the cygming.h file (the author named the copy pe.h). The copy will be used in our target configuration. The only change we need to make is removing one of the lines in the TARGET_OS_CPP_BUILTINS definition. Remove the line containing the following text.
EXTRA_OS_CPP_BUILTINS (); \
Now, modify the ./gcc/config.gcc file. Analyze the structure of the case statement containing various targets and add the following definition.
i[34567]86-*-pe*)
        tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h dbxcoff.h newlib-stdint.h i386/pe.h"
        xm_file=i386/xm-cygwin.h
        tmake_file="${tmake_file} i386/t-cygming t-slibgcc"
        target_gtfiles="\$(srcdir)/config/i386/winnt.c"
        extra_options="${extra_options} i386/cygming.opt"
        extra_objs="winnt.o winnt-stubs.o"
        c_target_objs="${c_target_objs} msformat-c.o"
        cxx_target_objs="${cxx_target_objs} winnt-cxx.o msformat-c.o"
        ;;
You can compare the above definition with other Microsoft Windows targets (the obvious change being the inclusion of our pe.h file).
A similar change is required in the libgcc config. Modify the ./libgcc/config.host file. Analyze the structure, just like in the previous step, and add the following definition:
i[34567]86-*-pe*)
        tmake_file="$tmake_file t-libgcc-pic"
        ;;
As before, it is recommended to compare it with other definitions.
The above changes should be enough for the GCC to recognize the new target. The GCC can now be compiled for the i686‑pc‑pe target (with, of course, the Ada support). The author noticed that the build process does not pick the correct assembler in his environment, so he had to pass the --with-as parameter to the ./configure script (this may not be necessary in all environments).
After producing and installing a cross-compiler for i686‑pc‑pe, the reader is expected to properly configure the gprbuild system to recognize the i686‑pc‑pe target, as our runtime library, FFA, and Peh all use it as their default build system.
Weightless Ada Runtime for DPMI
The runtime we are going to use is Weightless Ada Runtime for DPMI (ward). It is a spiritual successor to ave1’s (2018) Zero Foot Print Ada runtime and even now still shares its build process. (Possibly subject to a change to remove the make(1) dependency.) The version linked in this article contains a simple and minimal runtime necessary to properly run Peh and FFA (without any guarantees of conforming to the Ada 2012 standard). The runtime itself is simple enough to be analyzed by the reader, provided they become familiar with the DPMI specification and the DOS API.
The runtime can be obtained by pressing the following vpatch: weightless-genesis.vpatch (seal).
The build process consists of issuing a simple make command in the top-level of the source directory (the reader should adjust the target triplet in the Makefile or pass it as an argument (TARGET) in case of using a different one than the author).
make
The result should be a static libada.a library in the lib directory.
Compiling Peh
Press the following vpatch: ffa_dpmi_poc.vpatch (seal).
Peh can be built the traditional way using a gprbuild command with the following arguments.
gprbuild --target=i686-pc-pe --RTS=/path/to/ward
(Replace /path/to/ward with the actual path on the reader’s system.)
The bin directory should now contain the Peh executable. The default knobs in binutils cause the file to appear without the required file extension, so the reader should add an .exe suffix (required by DOS32A and DOS-like systems in general) before continuing. We should now be able to run it in a DOS environment using DOS32A. Issue the following command.
dos32a peh
The result should be as expected and documented in the FFA series.
Finishing touches
While the result is satisfactory, running peh.exe directly results in a message telling us how Peh cannot be run in DOS mode. The most desired result is probably being able to run Peh just like any other program on our system. We can achieve that by replacing the stub portion of the PE/COFF executable file (i.e., a program displaying the above message) with another program that will run DOS32A with our executable as the parameter. Fortunately, a stub doing exactly that can be found in the stub32 directory of our DOS32A source release. Assembling it (and understanding the loading process of PE executables) is left as an exercise for the reader.
Unfortunately, neither ld(1) nor strip(1) supports replacing the PE stub (unlike Microsoft’s link.exe linker), so we will have to do the job ourselves. As we can see in the PE/COFF documentation, the only things that will have to be replaced, aside from the stub program in the beginning of the file, are the absolute addresses of the COFF sections. The author encourages the reader to analyze the documentation and write a program for replacing the stub and updating all of the section references, but is willing to share the following poorly-written Lisp program to demonstrate what must be done.
(defun replace-stub (old-executable new-stub new-executable)
  "Replace program stub in old-executable with new-stub.
Stores the result in new-executable. Stub must at least contain the header."
  ;; portable helper functions (little-endian)
  (flet ((read-word (stream)
           (+ (read-byte stream)
              (ash (read-byte stream) 8)))
         (read-double-word (stream)
           (+ (read-byte stream)
              (ash (read-byte stream) 8)
              (ash (read-byte stream) 16)
              (ash (read-byte stream) 24)))
         (write-word (word stream)
           (write-byte (logand word #xff) stream)
           (write-byte (logand (ash word -8) #xff) stream))
         (write-double-word (double-word stream)
           (write-byte (logand double-word #xff) stream)
           (write-byte (logand (ash double-word -8) #xff) stream)
           (write-byte (logand (ash double-word -16) #xff) stream)
           (write-byte (logand (ash double-word -24) #xff) stream)))
    (with-open-file (stub new-stub
                          :element-type '(unsigned-byte 8)
                          :direction :input)
      (let ((stub-length (file-length stub)))
        (when (< stub-length #x40)
          (error "Stub must be at least 64 bytes long."))
        (with-open-file (input old-executable
                               :element-type '(unsigned-byte 8)
                               :direction :input)
          (with-open-file (output new-executable
                                  :element-type '(unsigned-byte 8)
                                  :direction :output
                                  :if-exists :supersede
                                  :if-does-not-exist :create)
            ;; write stub up to e_lfanew
            (loop for i from 1 to #x3c
                  do (write-byte (read-byte stub) output))
            ;; write e_lfanew based on file length
            (write-double-word stub-length output)
            ;; write the rest of the stub
            (when (> stub-length #x40)
              (file-position stub #x40)
              (loop for i from #x41 to stub-length
                    do (write-byte (read-byte stub) output)))
            ;; read the original e_lfanew
            (file-position input #x3c)
            (let ((elfanew (read-double-word input))
                  (input-length (file-length input)))
              ;; position head at PE\0\0
              (file-position input elfanew)
              ;; copy 6 bytes
              (loop for i from 1 to 6
                    do (write-byte (read-byte input) output))
              ;; get number of sections
              (let ((number-of-sections (read-word input)))
                (write-word number-of-sections output)
                ;; copy 12 bytes
                (loop for i from 1 to 12
                      do (write-byte (read-byte input) output))
                ;; read size of optional header
                (let ((size-of-optional-header (read-word input)))
                  (write-word size-of-optional-header output)
                  ;; copy last 2 bytes of header and the entire optional header
                  (loop for i from -1 to size-of-optional-header
                        do (write-byte (read-byte input) output))
                  ;; for every section
                  (loop for s from 1 to number-of-sections do
                    ;; copy 20 bytes
                    (loop for i from 1 to 20
                          do (write-byte (read-byte input) output))
                    ;; read pointer to raw data
                    (let ((pointer-to-raw-data (read-double-word input)))
                      (when (not (zerop pointer-to-raw-data))
                        ;; shift it
                        (setf pointer-to-raw-data
                              (+ pointer-to-raw-data (- stub-length elfanew))))
                      (write-double-word pointer-to-raw-data output))
                    ;; copy 16 bytes
                    (loop for i from 1 to 16
                          do (write-byte (read-byte input) output)))))
              ;; copy the rest of the file
              (loop for i from (1+ (file-position input)) to input-length
                    do (write-byte (read-byte input) output)))))))))
Running the above program with the appropriate paths will create a new executable which should be able to run as expected on a computer system running DOS. Type the following command.
peh
The result should be as expected.
The DOS32A can itself be configured during its build process by the appropriate knobs in the dos32a.asm file or the environment variable (if the config by environment knob is on). The author likes to at least disable the copyright banner.
References
ave1. 2018. “GNAT and Zero Foot Print Runtimes,” Ave1 (February). http://dulap.xyz/pub/mirrors/ave1.org/2018/gnat-and-zero-foot-print-runtimes/trackback/index.html.
AXE Consultants. 2012. Ada Reference Manual: 2012 Edition. http://ada-auth.org/standards/12rm/RM-Final.pdf.
Datskovskiy, Stanislav [asciilifeform, pseud.]. 2017. “Finite Field Arithmetic,” Loper OS (December). http://www.loper-os.org/?p=1913.
DPMI Committee. 1990. DOS Protected Mode Interface (DPMI) Specification: Protected Mode API for DOS Extended Applications; Version 0.9. https://web.archive.org/web/20160405012113/http://tenberry.com/dpmi/01.html.
Microsoft Corporation. 1999. Microsoft Portable Executable and Common Object File Format Specification: Revision 6.0. http://www.osdever.net/documents/PECOFF.pdf.