ἅπαξ λεγόμενον

PEH.EXE: Running Peh on DOS


February 2024

Introduction

Peh is a fixed-point cryptographic calculator and an arithmetic language created by Stanislav Datskovskiy (2017) in the Ada 2012 programming language (AXE Consultants 2012) using his Finite Field Arithmetic library (FFA). While the library is largely system-independent (the only system-dependent knobs being CPU characteristics such as the word width), the calculator calls into the operating system through standard C library functions for process control, command-line parameters and I/O. Both FFA and Peh are linked against the GNAT Ada Runtime Library, which itself also communicates with the operating system through the C standard library.

The purpose of this article is to present a proof of concept of how to compile a static Peh executable for a DOS-like operating system running on a 32‑bit x86 CPU (i386 or later) using the GNAT Ada compiler, a part of the GNU Compiler Collection (GCC), without linking against the C standard library. We will need to prepare a small Ada runtime sufficient to run both FFA and Peh and replace the system-dependent bits with calls to our own operating system interfaces.

While DOS itself, running in 16‑bit real mode, cannot run 32‑bit applications, we can still execute them by putting the CPU into protected mode with a DOS extender and communicating with the operating system through the DOS Protected Mode Interface (DPMI) (DPMI Committee 1990). The extender we are going to use is Narech Koumar’s DOS/32 Advanced DOS Extender (DOS32A), as it is compact and easy to modify to suit our needs (as we shall soon see).

Getting started

The author uses the triplet i686‑pc‑pe in the remainder of this article to identify the target in the compilation environment. The triplet itself can, of course, be changed to accommodate local system conventions or to specify an older x86 CPU (e.g., an i486).

The entire process can be summarized in the following steps:

  1. assembling and linking a modified DOS32A capable of running programs compiled by our GCC,
  2. compiling a toolchain targeting i686‑pc‑pe,
  3. cross-compiling our own Ada runtime library for i686‑pc‑pe,
  4. cross-compiling Peh for i686‑pc‑pe,
  5. adding finishing touches.

Prerequisites:

  1. V version control system,
  2. Borland Turbo Assembler 4.0 (or equivalent, other versions might or might not work),
  3. source release of GNU binary utilities (binutils) (any version supporting Microsoft Windows should work, the author used versions 2.38–2.41 and did not encounter any problems),
  4. source release of GCC 4.9.4 (or any other version supporting Ada 2012, but the cross-compiler build process may be different),
  5. source release of FFA 199 (ch. 21A-Ter).

The author assumes the reader has experience with cross-compilation and has managed to successfully compile binutils and GCC in the past.

Extending the DOS extender

We will need the DOS32A source code, which has been mirrored in the following vpatch: dos32a-src.vpatch (seal).

The first obstacle in our way is the incompatibility of executable file formats: DOS32A accepts only LE, LX or LC executables as input. It seems the author of DOS32A planned to add support for the PE executable format, which is extensively supported by the entire GNU software development stack, but the loader was never finished; all that is left of it is a comment saying “Insert PE loader implementation here.” Without further ado, we shall abide by this instruction.

The PE/COFF format (Microsoft Corporation 1999), used by the GNU toolchain, aims to be compatible with the format used on Microsoft Windows, so we can use the provided documentation to create a loading mechanism that properly parses the file header and loads all of the needed objects into memory. Microsoft Windows executables expect to be loaded at the address specified in the header, so all of the absolute memory references have to be listed in the appropriate section of the final executable if we want to load it anywhere else. Fortunately, the entire memory allocation part is already taken care of in the main LX executable loader, so our job is a little easier. The reader should get familiar with the source code of the loader to understand the exact process and the procedures we are going to use.

The reader should also get familiar with the LX format documentation (it can be found on the compact disk of the Arsenal Computer’s OS/2 Arsenal), because the original loader was written with regard to the terminology and conventions used therein.

The LX executable files are divided into objects. The DOS extender allocates memory for every object and loads the appropriate segments before applying fixups for all of the absolute memory references. This is quite similar to the PE/COFF file format, where we have sections and relocations. We can use the original loader code for loading the segments (and its general idea for our main loader procedure), but we will have to locate and properly load the sections ourselves.

Press the following vpatch: dos32a-pecoff.vpatch (seal). The only file that differs from the original source release is loadpe.asm, which we analyze below.

load_pe_app:
        mov     _app_type,3
        call    load_pe_header
        call    verbose_showloadhdr

The first thing we do is load the PE file header and, in verbose mode, display the useful information. Since we are not going to modify the internals of the extender, we can safely use the original display procedure, but we will have to write our own header loader. (Refer to the PE/COFF documentation while analyzing the code.)

load_pe_header:
        mov     ecx,0F8h                ; size of a full 32-bit PE header
        mov     edx,04h
        mov     _err_code,3002h         ; "error in app file"

        call    load_fs_block
        mov     edx,_exec_start

        mov     ax,fs:[0016h]           ; load characteristics
        and     ax,0103h                ; check for:
        cmp     ax,0102h                ; EXECUTABLE_IMAGE and 32BIT_MACHINE
        mov     ax,3005h                ; and not RELOCS_STRIPPED
        jne     file_error

We only want to load 32‑bit executables that still carry their relocations. We cannot force the loader to place our sections at the desired addresses, so we have to relocate all absolute memory references ourselves. Fortunately, our linker can keep a section listing all of those references in the executable, so we can use this information later.
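
As a minimal sketch (in Python, not part of the vpatch), the same test can be restated outside the extender. The flag values come from the PE/COFF specification; the function names and the dummy header builder are hypothetical helpers introduced only for illustration.

```python
import struct

# Flag values from the PE/COFF specification (COFF header "Characteristics").
IMAGE_FILE_RELOCS_STRIPPED = 0x0001
IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002
IMAGE_FILE_32BIT_MACHINE = 0x0100


def is_loadable(pe_header: bytes) -> bool:
    """Apply the same mask-and-compare test as load_pe_header.

    pe_header is assumed to start at the PE signature, so the
    Characteristics word sits at offset 0x16, as in the listing.
    """
    (characteristics,) = struct.unpack_from("<H", pe_header, 0x16)
    mask = (IMAGE_FILE_RELOCS_STRIPPED
            | IMAGE_FILE_EXECUTABLE_IMAGE
            | IMAGE_FILE_32BIT_MACHINE)                               # 0x0103
    wanted = IMAGE_FILE_EXECUTABLE_IMAGE | IMAGE_FILE_32BIT_MACHINE   # 0x0102
    return (characteristics & mask) == wanted


def make_header(characteristics: int) -> bytes:
    """Hypothetical helper: a dummy 0xF8-byte header with only the flags set."""
    header = bytearray(0xF8)
    header[0:4] = b"PE\x00\x00"
    struct.pack_into("<H", header, 0x16, characteristics)
    return bytes(header)
```

Other bits in the Characteristics word are ignored, exactly as the `and ax,0103h` mask ignores them.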

        mov     ax,fs:[0006h]           ; get number of sections
        mov     cx,ax
        cmp     ax,APP_MAXOBJECTS
        mov     ax,4001h                ; "too many objects"
        ja      file_error
        mov     _app_num_objects,ecx

We do not want more sections than the maximum amount of loadable objects.

        mov     eax,0F8h                ; size of the full PE header (section table follows)
        add     eax,edx
        mov     _app_off_objects,eax

        mov     eax,fs:[0034h]          ; image base
        mov     _app_tmp_addr1,eax

        mov     eax,fs:[0060h]          ; stack size
        mov     _app_tmp_addr2,eax

        mov     eax,fs:[00A0h]          ; base relocation table virtual address
        mov     _app_off_fixrectab,eax

        mov     eax,fs:[00A4h]          ; base relocation table size
        mov     _app_siz_fixrecstab,eax

        mov     _app_eip_object,0       ; 0 = have not found it yet

        mov     eax,fs:[0028h]          ; entry point address
        mov     _app_eip,eax
        ret

This part is pretty self-explanatory. We save the header values we will need later. We cannot yet determine which section contains the beginning of our code, so we set the entry object to zero for now.
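
The offsets used above can be cross-checked against the PE/COFF documentation with a short Python sketch. It assumes a PE32 (32‑bit) header that starts at the PE signature; the `PeInfo` record and `read_pe_info` are hypothetical names, not part of DOS32A.

```python
import struct
from typing import NamedTuple


class PeInfo(NamedTuple):
    number_of_sections: int
    entry_point: int
    image_base: int
    stack_size: int
    reloc_rva: int
    reloc_size: int


def read_pe_info(pe_header: bytes) -> PeInfo:
    """Pick the fields read by load_pe_header out of a PE32 header.

    Offsets are relative to the PE signature and match the fs:[...]
    displacements in the listing.
    """
    def word(offset: int) -> int:
        return struct.unpack_from("<H", pe_header, offset)[0]

    def dword(offset: int) -> int:
        return struct.unpack_from("<I", pe_header, offset)[0]

    return PeInfo(
        number_of_sections=word(0x06),
        entry_point=dword(0x28),   # AddressOfEntryPoint
        image_base=dword(0x34),    # ImageBase
        stack_size=dword(0x60),    # SizeOfStackReserve
        reloc_rva=dword(0xA0),     # data directory entry 5, address
        reloc_size=dword(0xA4),    # data directory entry 5, size
    )
```

The section table begins right after these 0xF8 bytes, which is why the loader adds 0F8h to the header position to find the first section entry.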

        mov     ecx,1
@@1:    call    load_pe_object
        call    create_selector
        call    verbose_showloadobj
        push    dword 0                 ; padding
        push    edi                     ; save address of the loaded object
        push    ebp                     ; save size of the loaded object
        push    esi                     ; save the virtual address
        inc     cx
        cmp     cx,word ptr _app_num_objects
        jbe     @@1

After loading the header, it is time to load all of our sections. We load the object itself, create a selector for the extender and display information in the verbose mode. We also fill our stack with a simple data structure, for every object, containing the virtual address, the size of the object and its actual address.

load_pe_object:
        push    ecx

        mov     _err_code,3002h
        mov     edx,_app_off_objects
        call    seek_from_start
        mov     ecx,28h                 ; size of a section entry
        xor     edx,edx
        call    load_fs_block
        add     _app_off_objects,eax

The loading part is pretty simple. First, we want to load the entry containing all the information about our section. We move the pointer with each loaded section.

        mov     eax,fs:[0008h]          ; virtual size
        mov     esi,fs:[000Ch]          ; virtual address
        mov     edx,fs:[0014h]          ; pointer to raw data
        mov     ecx,fs:[0024h]          ; characteristics
        call    seek_from_start         ; head to section data

Here we load all the required information to dedicated registers and move our file pointer to the beginning of the section data.

        mov     edx,2040h               ; 32-bit and preloaded
        test    ecx,40000000h           ; readable-p
        jz      @@nr
        or      edx,0001h               ; set readable
@@nr:   test    ecx,80000000h           ; writable-p
        jz      @@nw
        or      edx,0002h               ; set writable
@@nw:   test    ecx,20000000h           ; executable-p
        jz      @@ne
        or      edx,0004h               ; set executable
@@ne:   push    edx                     ; save our characteristics

Here we must read the flags of our section and translate them to a format used by the DOS extender. As we can see in the code above, we do not have to check all of the flags, just the ones required to properly set up the objects.
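
The translation can be sketched in Python as follows. The section flag values are taken from the PE/COFF specification and the object flag values from the listing above; `object_flags` is a hypothetical name used only here.

```python
# Section flags from the PE/COFF specification.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def object_flags(characteristics: int) -> int:
    """Translate PE section characteristics to the extender's object flags."""
    flags = 0x2040                      # "32-bit and preloaded", as in the listing
    if characteristics & IMAGE_SCN_MEM_READ:
        flags |= 0x0001                 # readable
    if characteristics & IMAGE_SCN_MEM_WRITE:
        flags |= 0x0002                 # writable
    if characteristics & IMAGE_SCN_MEM_EXECUTE:
        flags |= 0x0004                 # executable
    return flags
```

For a typical code section (readable and executable) this yields 2045h, and for a typical data section (readable and writable) 2043h.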

        test    ecx,00000020h           ; check if section contains code
        jz      @@skip
        cmp     _app_eip_object,0       ; not 0 = already found it
        jnz     @@skip
        cmp     _app_eip,esi            ; EIP >= virtual address
        jb      @@skip
        mov     ecx,eax
        add     ecx,esi
        cmp     _app_eip,ecx            ; EIP < virtual address + virtual size
        jae     @@skip
        mov     ecx,[esp+4]
        mov     _app_eip_object,ecx
        sub     _app_eip,esi

This is how we get a proper pointer to the beginning of our code. If the section contains code, we check whether we are still looking for the section containing the initial procedure. If we are, and the entry point from the header lies within the section’s virtual bounds, we set the appropriate variable to the current section and turn the initial procedure address into an offset within that section.
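
The search can be illustrated with a small Python sketch. It assumes the sections are given as a list of (virtual address, virtual size, characteristics) tuples numbered from 1, like the loader's objects; `locate_entry` is a hypothetical name.

```python
from typing import List, Optional, Tuple

IMAGE_SCN_CNT_CODE = 0x00000020  # "section contains code", PE/COFF specification


def locate_entry(entry_rva: int,
                 sections: List[Tuple[int, int, int]]) -> Optional[Tuple[int, int]]:
    """Return (object number, offset within the object) for the entry point.

    The first code section whose virtual bounds contain entry_rva wins,
    mirroring the "not 0 = already found it" test in the listing.
    """
    for number, (address, size, characteristics) in enumerate(sections, start=1):
        if not characteristics & IMAGE_SCN_CNT_CODE:
            continue                    # only sections containing code qualify
        if address <= entry_rva < address + size:
            return number, entry_rva - address
    return None                         # nothing found: entry object stays 0
```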

@@skip: mov     ebx,eax                 ; get virtual size
        shr     ebx,12                  ; number of pages
        test    eax,0FFFh               ; check for a tail
        jz      @@1
        inc     ebx                     ; add one more page

@@1:    call    alloc_block             ; allocate EAX memory block to EDI
        mov     ecx,eax                 ; ECX = bytes to read
        mov     ebp,eax                 ; EBP = preserve virtual size
        mov     edx,edi                 ; EDX = address to read to
        call    fill_zero_pages         ; fill allocated memory with zeroes

We have to figure out how many 4 KiB pages our section needs and allocate them. Fortunately, we can use the original allocation procedure.
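
The page count computed in EBX is a plain round-up division by 4096. A minimal Python restatement (the function name is hypothetical):

```python
def pages_needed(size: int) -> int:
    """Number of 4096-byte pages covering size bytes (shift, then add a tail page)."""
    pages = size >> 12          # whole pages
    if size & 0xFFF:            # a partial page at the end needs one more
        pages += 1
    return pages
```

The same computation is reused later for the stack object.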

        mov     _err_code,3002h
        call    load_gs_block           ; load object data

        pop     edx                     ; leave our characteristics on EDX
        pop     ecx
        ret

Finally, we load up the section data, leaving the characteristics on the EDX register for the selector creation procedure.

The only thing missing is our stack. The LX file format contains a dedicated stack object, but there is no such section residing in the PE/COFF executable file. We did load the size of the stack, hinted by the file header, so we can just create one more fixed-size object.

        call    create_stack_object
        call    create_selector
        call    verbose_showloadobj

The actual stack object creation should be pretty self-explanatory by now.

create_stack_object:
        push    ecx
        mov     eax,_app_tmp_addr2
        mov     ebx,eax
        shr     ebx,12
        test    eax,0FFFh
        jz      @@1
        inc     ebx

@@1:    mov     _app_esp_object,ecx
        mov     _app_esp,eax

        call    alloc_block
        mov     ecx,eax
        mov     ebp,eax
        mov     edx,edi
        call    fill_zero_pages

        mov     edx,2103h               ; 32-bit, zeroed, rw
        pop     ecx
        ret

The only thing left is relocating all of the addresses. First we must prepare the appropriate registers for the relocation procedure and find the section containing the relocations.

        mov     ebp,esp                 ; point to objects
        mov     ebx,_app_num_objects
        dec     bx
        shl     bx,4                    ; times 16 (the size of our record)

        call    fix_reloc_offset

We do know the virtual address of the relocations section, so we can just iterate through our stack and find the appropriate object.

fix_reloc_offset:
        push    ebx
        mov     eax,_app_off_fixrectab
@@1:    cmp     eax,[ebp+ebx+0]
        je      @@done
        sub     bx,10h
        jmp     @@1
@@done: mov     eax,[ebp+ebx+8]
        mov     _app_off_fixrectab,eax
        pop     ebx
        ret

Now we can relocate the addresses in all of our objects.

@@4:    call    relocate_pe_object
        sub     bx,10h                  ; get next object
        jnc     @@4

The relocation code is pretty complex, so it is going to be presented in one block.

relocate_pe_object:
        mov     _err_code,4005h         ; "unrecognized fixup data"
        mov     edx,_app_off_fixrectab  ; first base relocation block
        mov     ecx,_app_siz_fixrecstab ; size of all blocks
@@1:    test    ecx,ecx                 ; check if zero
        jz      @@done                  ; if zero, we're done

        ;; here we assume consistent object ordering, ascending and we also
        ;; assume there are no blocks starting before our first object
        mov     eax,[ebp+ebx+0]         ; virtual address of our loaded object
        cmp     eax,gs:[edx]            ; skip blocks below us
        ja      @@next
        add     eax,[ebp+ebx+4]         ; add size of our loaded object
        cmp     eax,gs:[edx]            ; skip blocks above us
        jbe     @@next

        lea     esi,[edx+8]             ; ESI = address of first relocation
        mov     edi,edx                 ; EDI = address of current block
        add     edi,gs:[edx+4]          ; EDI = address of next block

@@2:    cmp     esi,edi                 ; finish the block if at the end
        je      @@next

        mov     ax,gs:[esi]             ; get relocation
        and     eax,0000F000h           ; relocation type mask
        jz      @@skip                  ; skip if base relocation type = 0
        cmp     eax,00003000h
        jne     file_errorm             ; eggog if base relocation type /= 3
        mov     ax,gs:[esi]             ; get relocation (again)
        and     eax,00000FFFh           ; clear relocation type
        add     eax,gs:[edx]            ; add page address
        sub     eax,[ebp+ebx+0]         ; sub virtual address of our object
        add     eax,[ebp+ebx+8]         ; add actual address of our object

        push    esi ecx edx ebx         ; save the registers

        mov     esi,gs:[eax]            ; load the address to relocate
        sub     esi,_app_tmp_addr1      ; sub image base

        xor     ebx,ebx
        mov     edx,_app_num_objects    ; for all of our remote objects
        shl     edx,4                   ; times 16 (size of object structure)
@@3:    cmp     ebx,edx                 ; leave loop if we did all objects
        je      @@oo

        ;; here we assume consistent remote object ordering, descending
        cmp     esi,[ebp+ebx+0]         ; target at or above this object's VA?
        jae     @@mm
        add     ebx,10h                 ; next object
        jmp     @@3

@@mm:   mov     ecx,_app_tmp_addr1      ; get image base
        sub     gs:[eax],ecx            ; sub it from the address
        mov     ecx,[ebp+ebx+0]         ; get virtual address of our object
        sub     gs:[eax],ecx            ; sub it from the address
        mov     ecx,[ebp+ebx+8]         ; get actual address of our object
        add     gs:[eax],ecx            ; add it to the address

@@oo:   pop     ebx edx ecx esi         ; restore the registers

@@skip: add     esi,2                   ; records are two octets long
        jmp     @@2                     ; next reloc

@@next: mov     eax,gs:[edx+4]          ; get size of our block
        add     edx,eax                 ; move block pointer to next block
        sub     ecx,eax                 ; shrink size of all blocks
        jmp     @@1                     ; next block
@@done: ret
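
For readers who prefer a higher-level view, here is a simplified Python model of the same idea. It is a sketch under stated assumptions, not a transcription of the assembly: linear memory is modelled as a bytearray, the loaded objects as a list of records instead of a stack, and the relocation table is walked once rather than once per object. All names are hypothetical.

```python
import struct
from typing import List, NamedTuple, Optional


class LoadedObject(NamedTuple):
    virtual_address: int    # RVA from the section header
    size: int               # virtual size
    actual_address: int     # where the loader really placed the object


IMAGE_REL_BASED_ABSOLUTE = 0    # padding entry, skipped
IMAGE_REL_BASED_HIGHLOW = 3     # 32-bit absolute address


def find_object(objects: List[LoadedObject], rva: int) -> Optional[LoadedObject]:
    """Return the object with the highest virtual address not above rva."""
    best = None
    for candidate in objects:
        if candidate.virtual_address <= rva:
            if best is None or candidate.virtual_address > best.virtual_address:
                best = candidate
    return best


def apply_relocations(memory: bytearray, objects: List[LoadedObject],
                      reloc_table: bytes, image_base: int) -> None:
    """Rewrite every HIGHLOW reference to point at the loaded objects."""
    position = 0
    while position < len(reloc_table):
        page_rva, block_size = struct.unpack_from("<II", reloc_table, position)
        if block_size < 8:
            raise ValueError("malformed relocation block")
        for entry_at in range(position + 8, position + block_size, 2):
            (entry,) = struct.unpack_from("<H", reloc_table, entry_at)
            kind, offset = entry >> 12, entry & 0x0FFF
            if kind == IMAGE_REL_BASED_ABSOLUTE:
                continue
            if kind != IMAGE_REL_BASED_HIGHLOW:
                raise ValueError("unrecognized fixup data")
            site_rva = page_rva + offset
            home = find_object(objects, site_rva)
            if home is None:
                raise ValueError("relocation outside of any object")
            site = home.actual_address + (site_rva - home.virtual_address)
            (value,) = struct.unpack_from("<I", memory, site)
            target_rva = value - image_base
            target = find_object(objects, target_rva)
            if target is None:
                continue        # no matching object: leave the value untouched
            fixed = target.actual_address + (target_rva - target.virtual_address)
            struct.pack_into("<I", memory, site, fixed & 0xFFFFFFFF)
        position += block_size
```

Each fixed value is the original address minus the image base and the target object's virtual address, plus the address at which that object was actually loaded, which is the same arithmetic as the three instructions following the `@@mm` label.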

After that, we can safely close the file, set the stack to the proper value, show some debug info in the verbose mode and finally enter our code.

        call    close_exec
        mov     esp,_sel_esp
        call    verbose_showstartup
        jmp     enter_32bit_code

The build process involves assembling two object files using Borland Turbo Assembler (TASM) and linking them together. Version 7.1 of DOS32A used a linker accompanying Watcom C/C++ compilers, but we can make do with Borland Turbo Linker, (hopefully) included in our distribution of TASM.

The author of DOS32A used a few macros not included in the final source release. We can find the file STDDEF.INC required for proper assembly in an older release (version 7.1). The file is reproduced verbatim below.


cr	equ 0Dh,0Ah
cre	equ 0Dh,0Ah,00h


bptr	equ byte ptr
wptr	equ word ptr
dptr	equ dword ptr
offs	equ offset
fptr	equ far ptr

clr	macro r1
	xor r1,r1
	endm

rdtsc	macro
	db 0Fh, 31h
	endm

TRUE	equ 1
FALSE	equ 0

It is pretty self-explanatory. It should be saved into the LIB directory of your TASM installation (or any other directory set with an -i switch in the TASM configuration file or at the command line). The DOS32A 7.1 documentation also included a sample configuration file containing switches -r -ml -m -q -t and they must be either added to the local configuration file or must be specified at the command line.

Issue the following commands while in the top-level of the source directory.

tasm -dEXEC_TYPE=0 -c -la kernel.asm, obj\kernel.obj, obj\kernel.lst
tasm -dEXEC_TYPE=0 -c -la dos32a.asm, obj\dos32a.obj, obj\dos32a.lst

tlink /Tde /3 obj\dos32a.obj obj\kernel.obj, bin\dos32a.exe

The bin directory should now contain a brand new dos32a.exe executable. The reader is expected to read the DOS32A documentation to understand the way it is to be used.

Cross-compiling for i686‑pc‑pe

Though the binutils package contains dedicated support for ‑*‑pe* targets (and should be compiled the regular way, without any changes to the source tree), GCC, unfortunately, does not. Still, we should be able to make do with only a few changes to its source tree.

The following instructions apply to GCC 4.9.4. The author managed to compile a cross-compiler using the following instructions on his machine and operating system, which does not use the GNU C Library and is incapable of running dynamically linked executables, among other oddities. As the GCC was heavily modified to run in such an environment, only the changes required to successfully create a cross compiler are listed. If the reader uses an environment which requires other changes to run GCC properly, or wants to use a different version of the package, following the instructions below may not produce a desirable result. They can still, however, be helpful in producing a required cross compiler.

First of all, we need to teach GCC to recognize our new target. Fortunately, it is so similar to the existing Microsoft Windows targets, we can reuse most of the files with only one exception. Navigate to ./gcc/config/i386 and make a copy of the cygming.h file (the author named the copy pe.h). The copy will be used in our target configuration. The only change we need to make is removing one of the lines in the TARGET_OS_CPP_BUILTINS definition. Remove the line containing the following text.

	EXTRA_OS_CPP_BUILTINS ();					\

Now, modify the ./gcc/config.gcc file. Analyze the structure of the case statement containing the various targets and add the following definition.

i[34567]86-*-pe*)
	tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h dbxcoff.h newlib-stdint.h i386/pe.h"
	xm_file=i386/xm-cygwin.h
	tmake_file="${tmake_file} i386/t-cygming t-slibgcc"
	target_gtfiles="\$(srcdir)/config/i386/winnt.c"
	extra_options="${extra_options} i386/cygming.opt"
	extra_objs="winnt.o winnt-stubs.o"
	c_target_objs="${c_target_objs} msformat-c.o"
	cxx_target_objs="${cxx_target_objs} winnt-cxx.o msformat-c.o"
	;;

You can compare the above definition with other Microsoft Windows targets (the obvious change being the inclusion of our pe.h file).
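
To see which triplets the new pattern is meant to catch, the shell glob can be approximated in Python with the standard fnmatch module. This is only a sketch: it assumes the triplet reaches the case statement in the form shown, while the real configure machinery canonicalizes it first, and `selects_pe_target` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# The pattern added to config.gcc above.
PE_TARGET_PATTERN = "i[34567]86-*-pe*"


def selects_pe_target(triplet: str) -> bool:
    """Approximate the shell `case` glob match with fnmatch."""
    return fnmatchcase(triplet, PE_TARGET_PATTERN)
```

Both i686-pc-pe and i486-pc-pe match, which is why the triplet can be changed to name an older CPU without touching the definition again.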

A similar change is required in the libgcc config. Modify the ./libgcc/config.host file. Analyze the structure, just like in the previous step, and add the following definition:

i[34567]86-*-pe*)
	tmake_file="$tmake_file t-libgcc-pic"
	;;

As before, it is recommended to compare it with other definitions.

The above changes should be enough for GCC to recognize the new target. GCC can now be compiled for the i686‑pc‑pe target (with, of course, Ada support). The author noticed that the build process did not set up the correct assembler in his environment, so he had to pass the --with-as parameter to the ./configure script (this may not be the case in all environments).

After producing and installing a cross-compiler for i686‑pc‑pe, the reader is expected to properly configure the gprbuild system to recognize the i686‑pc‑pe target, as our runtime library, FFA, and Peh all use it as their default build system.

Weightless Ada Runtime for DPMI

The runtime we are going to use is Weightless Ada Runtime for DPMI (ward). It is a spiritual successor to ave1’s (2018) Zero Foot Print Ada runtime and even now still shares its build process. (Possibly subject to a change to remove the make(1) dependency.) The version linked in this article contains a simple and minimal runtime necessary to properly run Peh and FFA (without any guarantees of conforming to the Ada 2012 standard). The runtime itself is simple enough to be analyzed by the reader, provided they become familiar with the DPMI specification and the DOS API.

The runtime can be obtained by pressing the following vpatch: weightless-genesis.vpatch (seal).

The build process consists of issuing a simple make command in the top level of the source directory. A reader using a different target triplet than the author should adjust it in the Makefile or pass it as an argument (TARGET).

make

The result should be a static libada.a library in the lib directory.

Compiling Peh

Press the following vpatch: ffa_dpmi_poc.vpatch (seal).

Peh can be built the traditional way using a gprbuild command with the following arguments.

gprbuild --target=i686-pc-pe --RTS=/path/to/ward

(Replace /path/to/ward with the actual path on the reader’s system.)

The bin directory should now contain the Peh executable. The default knobs in binutils cause the file to appear without the required file extension, so the reader should add an .exe suffix (required by DOS32A and DOS-like systems in general) before continuing. We should now be able to run it in a DOS environment using DOS32A. Issue the following command.

dos32a peh

The result should be as expected and documented in the FFA series.

Finishing touches

While the result is satisfactory, running peh.exe directly results in a message telling us that the program cannot be run in DOS mode. The most desired result is probably being able to run Peh just like any other program on our system. We can achieve that by replacing the stub portion of the PE/COFF executable file (i.e., the program displaying the above message) with another program that will run DOS32A with our executable as the parameter. Fortunately, a stub doing exactly that can be found in the stub32 directory of our DOS32A source release. Assembling it (and understanding the loading process of PE executables) is left as an exercise for the reader.

Unfortunately, neither ld(1) nor strip(1) supports replacing the PE stub (unlike Microsoft’s link.exe linker), so we will have to do the job ourselves. As we can see in the PE/COFF documentation, the only things that have to be replaced, aside from the stub program at the beginning of the file, are the absolute file offsets of the COFF sections. The author encourages the reader to analyze the documentation and write a program for replacing the stub and updating all of the section references, but is willing to share the following poorly-written Lisp program to demonstrate what must be done.

(defun replace-stub (old-executable new-stub new-executable)
  "Replace program stub in old-executable with new-stub.

Stores the result in new-executable.
Stub must at least contain the header."
  ;; portable helper functions (little-endian)
  (flet ((read-word (stream)
           (+ (read-byte stream)
              (ash (read-byte stream) 8)))
         (read-double-word (stream)
           (+ (read-byte stream)
              (ash (read-byte stream) 8)
              (ash (read-byte stream) 16)
              (ash (read-byte stream) 24)))
         (write-word (word stream)
           (write-byte (logand word #xff) stream)
           (write-byte (logand (ash word -8) #xff) stream))
         (write-double-word (double-word stream)
           (write-byte (logand double-word #xff) stream)
           (write-byte (logand (ash double-word -8) #xff) stream)
           (write-byte (logand (ash double-word -16) #xff) stream)
           (write-byte (logand (ash double-word -24) #xff) stream)))
    (with-open-file (stub new-stub
                          :element-type '(unsigned-byte 8)
                          :direction :input)
      (let ((stub-length (file-length stub)))
        (when (< stub-length #x40)
          (error "Stub must be at least 64 bytes long."))
        (with-open-file (input old-executable
                               :element-type '(unsigned-byte 8)
                               :direction :input)
          (with-open-file (output new-executable
                                  :element-type '(unsigned-byte 8)
                                  :direction :output
                                  :if-exists :supersede
                                  :if-does-not-exist :create)
            ;; write stub up to e_lfanew
            (loop for i from 1 to #x3c do
                 (write-byte (read-byte stub) output))
            ;; write e_lfanew based on file length
            (write-double-word stub-length output)
            ;; write the rest of the stub
            (when (> stub-length #x40)
              (file-position stub #x40)
              (loop for i from #x41 to stub-length do
                   (write-byte (read-byte stub) output)))
            ;; read the original e_lfanew
            (file-position input #x3c)
            (let ((elfanew (read-double-word input))
                  (input-length (file-length input)))
              ;; position head at PE\0\0
              (file-position input elfanew)
              ;; skip 6 bytes
              (loop for i from 1 to 6 do
                   (write-byte (read-byte input) output))
              ;; get number of sections
              (let ((number-of-sections (read-word input)))
                (write-word number-of-sections output)
                ;; skip 12 bytes
                (loop for i from 1 to 12 do
                     (write-byte (read-byte input) output))
                ;; read size of optional header
                (let ((size-of-optional-header (read-word input)))
                  (write-word size-of-optional-header output)
                  ;; skip last 2 bytes of header and the entire optional header
                  (loop for i from -1 to size-of-optional-header do
                       (write-byte (read-byte input) output))
                  ;; for every section
                  (loop for s from 1 to number-of-sections do
                     ;; skip 20 bytes
                       (loop for i from 1 to 20 do
                            (write-byte (read-byte input) output))
                     ;; read pointer to raw data
                       (let ((pointer-to-raw-data (read-double-word input)))
                         (when (not (zerop pointer-to-raw-data))
                           ;; shift it
                           (setf pointer-to-raw-data (+ pointer-to-raw-data
                                                        (- stub-length
                                                           elfanew))))
                         (write-double-word pointer-to-raw-data output))
                     ;; skip 16 bytes
                       (loop for i from 1 to 16 do
                            (write-byte (read-byte input) output)))))
              ;; skip the rest of the file
              (loop for i from (1+ (file-position input)) to input-length do
                   (write-byte (read-byte input) output)))))))))

Running the above program with the appropriate paths will create a new executable which should be able to run as expected on a computer system running DOS. Type the following command.

peh

The result should be as expected.
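
If the result is not as expected, a quick look at the rewritten headers can help narrow the problem down. The following Python sketch (a hypothetical helper, independent of the Lisp program above) checks that e_lfanew points at the PE signature and returns the raw data pointer of every section, so the values can be compared with the ones in the original executable.

```python
import struct
from typing import List


def section_raw_pointers(image: bytes) -> List[int]:
    """Return PointerToRawData of every section of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("e_lfanew does not point at a PE signature")
    (count,) = struct.unpack_from("<H", image, e_lfanew + 6)
    (optional_size,) = struct.unpack_from("<H", image, e_lfanew + 20)
    table = e_lfanew + 24 + optional_size        # first section header
    return [struct.unpack_from("<I", image, table + 40 * index + 20)[0]
            for index in range(count)]
```

Every non-zero pointer in the new file should differ from its counterpart in the old file by the same amount, namely the length of the new stub minus the original e_lfanew.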

The DOS32A can itself be configured during its build process by the appropriate knobs in the dos32a.asm file or the environment variable (if the config by environment knob is on). The author likes to at least disable the copyright banner.

References

ave1. 2018. “GNAT and Zero Foot Print Runtimes,” Ave1 (February). http://dulap.xyz/pub/mirrors/ave1.org/2018/gnat-and-zero-foot-print-runtimes/trackback/index.html.

AXE Consultants. 2012. Ada Reference Manual: 2012 Edition. http://ada-auth.org/standards/12rm/RM-Final.pdf.

Datskovskiy, Stanislav [asciilifeform, pseud.]. 2017. “Finite Field Arithmetic,” Loper OS (December). http://www.loper-os.org/?p=1913.

DPMI Committee. 1990. DOS Protected Mode Interface (DPMI) Specification: Protected Mode API for DOS Extended Applications; Version 0.9. https://web.archive.org/web/20160405012113/http://tenberry.com/dpmi/01.html.

Microsoft Corporation. 1999. Microsoft Portable Executable and Common Object File Format Specification: Revision 6.0. http://www.osdever.net/documents/PECOFF.pdf.