OS/360 Object File Format
The OS/360 Object File Format was the standard object module file format for the IBM DOS/360, OS/360 and VM/370,[1] Univac VS/9,[2] and Fujitsu BS2000[3] mainframe operating systems. In the 1990s, the format was extended with the XSD record type for the MVS operating system to support longer module names in the C programming language.[4] The format is still in use by the z/VSE operating system (the follow-on to DOS/360). On MVS (the follow-on to OS/360) and on z/VM (the follow-on to VM/370) it has been superseded by the GOFF file format. Because the MVS and z/VM loaders still handle the older format, some compilers continue to produce it instead of the newer GOFF format.[5]
Use
This format describes a compiled application's object code, which can be fed to a linkage editor to be made into an executable program, or run directly through an object module loader. It is created by the Assembler or by a programming language compiler. For the rest of this article, unless the difference between a language compiler and an assembler needs to be made explicit, the term "compile" includes "assemble" and "compiler" includes "assembler."
Weaknesses
This format was considered adequate when it was originally developed, around 1964. Over time, a number of weaknesses became apparent, among them:
- it supports only 8-byte long names (and typically there is a convention that the names are UPPER CASE only, and are restricted to certain symbols in the name, see the discussion below).
- alignment cannot be specified.
- a module that is pure data and is not executable cannot be specified.
- a reentrant module (as opposed to one merely read-only) cannot be specified.
- a subroutine (a routine that handles data only through arguments) cannot be distinguished from a function (a routine that returns data through a return value).
- a module designed so that it is movable (as opposed to merely reentrant) cannot be specified.
- address constants can't be identified as pointers (such as for access to a data structure) as opposed to, say, access to a table (that is not changed) or to a virtual method in a dynamic record.
- attributes cannot be assigned to external references (a reference is to code vs. a reference to data).
- there is no means for procedures or functions to check or validate argument types or to validate external structures.
- there is no means to declare an object, where part of the structure is data and part is code (methods that operate upon the data of the object).
- the SYM symbolic table is limited in the information it can provide.
These and other weaknesses caused this format to be superseded by the GOFF module file format. It was nevertheless a good choice for its time: it met the needs of the programming languages then in use, it worked, and it was simple to implement and to use. Simplicity mattered because machines of the period might have had as little as 128K of memory, with many running multiple concurrent or consecutive jobs in as little as 64K each while still performing useful work. Object orientation and concepts like virtual methods were decades in the future when the format was developed, and for simple programs it can still be adequate. The format also remains satisfactory for older programs that were never changed, or whose source code is unavailable so that the object files are the only part of the program remaining.
Note that the GOFF file format merely superseded this format and provides more information for a language compiler or the assembler; this format is still valid, may continue to be used, and was not deprecated. It has the advantage of being easy to create. A compiler for a language that can live with its restrictions (module names of at most 8 upper-case characters, and applications no larger than 2^24 bytes, or 16 megabytes, of code and data) can be written in any programming language that can write 80-byte fixed-format binary files, which is basically anything, including COBOL and FORTRAN, not just Assembler. In fact, the Australian Atomic Energy Commission's Pascal 8000 Compiler for the IBM 360/370, itself written in Pascal as a self-hosting compiler in 1978–1980, created its own object files directly without using the Assembler as an intermediate step.
Record Types
There are 6 different record types:
- ESD records define main programs, subroutines, functions, dummy sections, Fortran Common, and any module or routine that can be called by another module. They are used to define the program(s) or program segments that were compiled in this execution of the compiler, and external routines used by the program (such as exit() in C, CALL EXIT in Fortran; new() and dispose() in Pascal). ESD records should occur before any reference to an ESD symbol.
- TXT records contain the machine instructions or data which is held by the module.
- RLD records are used to relocate addresses. For example, a program referencing an address located 500 bytes inside the module, will internally store the address as 500, but when the module is loaded into memory it's bound to be located someplace else, so an RLD record informs the linkage editor or loader what addresses to change. Also, when a module references an external symbol, it will usually set the value of the symbol to zero, then include an RLD entry for that symbol to allow the loader or linkage editor to alter the address to the correct value.
- SYM records were added to provide additional information about a symbol, such as the type of data (character or numeric) and the size of the item.
- XSD records were added to provide additional information beyond that provided in the ESD record about public symbols such as procedures and functions, and to expand the size of a procedure or function name to more than 8 characters.
- END records indicate the end of a module, and optionally where the program is to begin execution.
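The relocation described for RLD records above can be sketched as follows. This is only an illustration, not any loader's actual code: the `relocate` helper, the 16-byte `image`, and the load address `0x20000` are all hypothetical.

```python
def relocate(image: bytearray, offset: int, length: int,
             base: int, subtract: bool = False) -> None:
    """Adjust one address constant inside a loaded module image in place."""
    stored = int.from_bytes(image[offset:offset + length], "big")
    adjusted = stored - base if subtract else stored + base
    adjusted %= 1 << (8 * length)  # wrap to the width of the constant
    image[offset:offset + length] = adjusted.to_bytes(length, "big")


# A module that refers to its own offset 500 stores 500 in the constant.
image = bytearray(16)
image[8:12] = (500).to_bytes(4, "big")

# Hypothetical address at which the loader placed the module.
load_address = 0x20000
relocate(image, offset=8, length=4, base=load_address)
relocated = int.from_bytes(image[8:12], "big")
```

For an external symbol the stored value would usually be zero, so the same addition leaves the constant holding the address of the external routine.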
Format
All records are exactly 80 bytes long; unused fields should be blank-filled. The first byte of every record is always the binary value 02 (X'02'). The next 3 bytes are always the record type. Character values are in EBCDIC. The remainder of each record's fields are dependent on the record type. By convention, if the module was named in the TITLE statement of an assembly language program (or the language compiler decides to give the module a name), its name appears left-justified in positions 73–80 of each record; if the name is shorter than 8 characters or no name was given, a sequence number (in characters, right justified with zero fill) fills the remainder of the field. In actual practice, the sequence number field may be blank or contain anything the language translator wants to put there, and is essentially a comment field.
The assembler (or the compiler, in the case of a high-level language such as C, COBOL, Fortran, Pascal, PL/I or RPG III) would create an ESD record for each subroutine, function, or program, and for Common Blocks in the case of Fortran programs. Additional ESD entries would be created for ENTRY statements (an alias for a module or an alternative entry point for a module), for additional subroutines, functions or Fortran named or blank COMMON blocks included as part of a compiled or assembled module, and for names of external subroutines and functions called by a module.
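The common layout described above can be checked with a few lines of Python. This is a minimal sketch: it assumes EBCDIC code page 037 (Python's `cp037` codec) for the character fields, and the function name `split_record` and the sample identifier `BASU0001` are made up for illustration.

```python
RECORD_LENGTH = 80
KNOWN_TYPES = {"ESD", "TXT", "RLD", "SYM", "XSD", "END"}


def split_record(record: bytes) -> dict:
    """Split one 80-byte object record into its common fields."""
    if len(record) != RECORD_LENGTH:
        raise ValueError("object records are exactly 80 bytes long")
    if record[0] != 0x02:
        raise ValueError("byte 1 must be X'02'")
    record_type = record[1:4].decode("cp037")  # bytes 2-4, EBCDIC
    if record_type not in KNOWN_TYPES:
        raise ValueError(f"unknown record type {record_type!r}")
    return {
        "type": record_type,
        "body": record[4:72],                    # bytes 5-72, type dependent
        "ident": record[72:80].decode("cp037"),  # bytes 73-80
    }


# Build a blank-filled sample END record (X'40' is the EBCDIC blank).
sample = bytearray(b"\x40" * RECORD_LENGTH)
sample[0] = 0x02
sample[1:4] = "END".encode("cp037")
sample[72:80] = "BASU0001".encode("cp037")
fields = split_record(bytes(sample))
```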
Note that there are two kinds of public symbol types, ESDID entries and LDID entries. ESDID entries are CSECTS and DSECTS (Programs, Procedures and Functions, and possibly Record or Structure declarations) and LDID entries are ENTRY statements (alternative or alias entry points to a CSECT or DSECT). The ESDID numbering space is separate from the LDID numbering space, and thus two different named symbols, one an ESDID and one an LDID can both have the binary value of 0001.
The program's executable object code and data would be stored in TXT records. Calls to other subroutines, functions or COMMON blocks are resolved through RLD records, which modify the address as stored in a TXT record to determine the complete address of the subroutine or function. Optionally, a language can provide symbolic reference information such as object names, data type information or debugging symbols through SYM records. The END record then indicates the end of an object module file and, optionally, the address at which the subroutine, function or program in this file should start, if that is not the first byte of the first routine. (Some routines have non-executable data preceding their actual code, or the first routine assembled or compiled is not the "main" program or "primary" module.) It has been reported that, because of the way older assemblers worked (circa 1968–1975), a program assembled faster if its data was placed before its code: once the assembler started to encounter instructions, it became much slower. Programmers would therefore write routines with the data and constants first, followed by the code. When assembling a program could take 30 minutes to an hour instead of a few seconds as now, this was a big difference.
Note that while not required, it is a convention that module and symbolic names are in all upper case, that the first character of a name field is a letter or one of the symbols @, # or $, and that subsequent characters of a name consist of those characters plus the digits 0 through 9. Older software may or may not correctly process object module files that use lower-case identifiers. Most programming languages other than Assembly cannot call modules whose names contain @, # or $ (notably Fortran, which is why its run-time library has a name with a # in it, so that it cannot conflict with any name chosen by a programmer). Most programs, subroutines and functions were therefore written to use only a letter for the first character and, if the name was longer than 1 character, only letters and digits for the 2nd through (up to) 8th character. (This choice not to use #, @ or $ does not apply to a "main" program written in Assembler or in any language that can use these identifiers; the program loader doesn't care what the name of the module is.) Also, modules written to be used as subroutines typically restricted themselves to 6 characters, as versions of Fortran before about 1978 cannot use subroutine or module names longer than 6 characters. The COBOL compiler typically discards the dash character if it appears in a program's PROGRAM-ID or in a CALL statement to an external module.
In the 1990s, a new record type, the XSD record, was added to extend this object module format to module names longer than 8 characters and to permit mixed-case names, as required by the C programming language.
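The naming conventions above can be expressed as three nested patterns. The sketch below is illustrative only; the category labels and the function name `classify_name` are invented here, and no tool is known to use them.

```python
import re

# Conventional: upper case, starts with a letter or @ # $, at most 8 characters.
CONVENTIONAL = re.compile(r"[A-Z@#$][A-Z0-9@#$]{0,7}")
# Callable from most high-level languages: letters and digits only.
HIGH_LEVEL = re.compile(r"[A-Z][A-Z0-9]{0,7}")
# Callable from Fortran versions before about 1978: at most 6 characters.
OLD_FORTRAN = re.compile(r"[A-Z][A-Z0-9]{0,5}")


def classify_name(name: str) -> str:
    """Return the strictest convention that an external name satisfies."""
    if OLD_FORTRAN.fullmatch(name):
        return "old-fortran"
    if HIGH_LEVEL.fullmatch(name):
        return "high-level"
    if CONVENTIONAL.fullmatch(name):
        return "assembler-only"
    return "unconventional"


labels = {name: classify_name(name)
          for name in ("EXIT", "BASURA12", "IBCOM#", "basura")}
```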
Prefix (Byte 1) | Type (Bytes 2–4) | Purpose | Address (Bytes 6–8), in binary if used | Size of info in bytes 17+ (Bytes 11–12) | Flag bits (XSD record) or blanks (Bytes 13–14) | Binary ESDID (Bytes 15–16) | Data (Bytes 17–72) | Ident (Bytes 73–80) |
---|---|---|---|---|---|---|---|---|
02 | ESD | Module Type | Blanks | Size used in bytes 17–64 | Blanks | Binary ESDID of first non-LD module symbol in bytes 17–64, or blank if all symbols on this record are LD | 1 to 3 16-byte Module Symbols in bytes 17–64 (see below); bytes 65–72 are blanks | Deck ID, sequence number, or both |
02 | TXT | Program or data | Relative address of data in bytes 17–72 of this record | Size used in bytes 17–72 | Blanks | ESDID | 1–56 bytes of data ("data" can be program instructions, program data, or both) | Deck ID, sequence number, or both |
02 | RLD | Relocatable information | Blanks | Size used in bytes 17–72 | Blanks | Blanks | 1 to 13 variable-length relocation entries (see table below) | Deck ID, sequence number, or both |
02 | SYM | Symbol Table information | Blanks | Size used in bytes 17–72 | Blanks | Blanks | Variable length symbol data (see table below) | Deck ID, sequence number, or both |
02 | XSD | Extended Symbol Information | Blanks | Size used in bytes 17–72 | Flag Bits (see table below) | LDID Identifier if XSD is for an LD; otherwise the ESDID | XSD Data (see table below) | Deck ID, sequence number, or both |
02 | END | End of module | Entry Address if specified or blanks | Blanks | Blanks | ESDID of entry address or blanks | End data (see table below) | Deck ID, sequence number, or both |
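As an illustration of the byte positions in the table above, the following sketch writes and reads back a TXT record. The helper names `build_txt` and `parse_txt` are hypothetical, and `cp037` is assumed as the EBCDIC code page for the character fields.

```python
def build_txt(address: int, esdid: int, data: bytes, ident: str = "") -> bytes:
    """Build one TXT record holding 1 to 56 bytes of code or data."""
    if not 1 <= len(data) <= 56:
        raise ValueError("a TXT record carries 1 to 56 bytes")
    rec = bytearray(b"\x40" * 80)               # blank-fill with EBCDIC blanks
    rec[0] = 0x02                               # byte 1
    rec[1:4] = "TXT".encode("cp037")            # bytes 2-4
    rec[5:8] = address.to_bytes(3, "big")       # bytes 6-8
    rec[10:12] = len(data).to_bytes(2, "big")   # bytes 11-12
    rec[14:16] = esdid.to_bytes(2, "big")       # bytes 15-16
    rec[16:16 + len(data)] = data               # bytes 17 onward
    rec[72:80] = ident.ljust(8)[:8].encode("cp037")
    return bytes(rec)


def parse_txt(rec: bytes) -> dict:
    """Extract the address, ESDID and data bytes from a TXT record."""
    size = int.from_bytes(rec[10:12], "big")
    return {
        "address": int.from_bytes(rec[5:8], "big"),
        "esdid": int.from_bytes(rec[14:16], "big"),
        "data": rec[16:16 + size],
    }


# X'07FE' is the instruction BR 14, placed at offset X'10' of ESDID 1.
txt = build_txt(0x000010, 1, bytes.fromhex("07FE"), "BASURA")
parsed = parse_txt(txt)
```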
Field | Size | Notes | ||
---|---|---|---|---|
Name | 8 | Identifies the Program, Function, Subroutine or FORTRAN COMMON Block (This will be blank for PC or blank COMMON or unnamed BLOCK DATA) | ||
Type | 1 | Value (Hex) | Module Type | What the Module is |
00 | SD | START, CSECT, DSECT (Dummy section; will not have any TXT records), or Fortran named BLOCK DATA module | ||
01 | LD | ENTRY (Label Definition within a previously identified CSECT) | ||
02 | ER | EXTRN (External Reference) | ||
04 | PC | Private Code (START or CSECT that has no name or Fortran unnamed BLOCK DATA module; this module cannot be called as a function or subroutine from another program) | ||
05 | CM | Fortran named or blank COMMON (Provides size only; CM records never have TXT records) | ||
06 | XD (PR) | External Dummy Section or Pseudo Register | ||
0A | WX | WXTRN (Weak Extern - An external routine that does not have to be present for the module to operate) | ||
0D | SD | Quad-Aligned START or CSECT | ||
0E | PC | Quad-Aligned Private Code | ||
0F | CM | Quad-Aligned Common (Provides size only; COMMON never has TXT records) | ||
Address | 3 | Binary starting address of this module; address of symbol within module for LD | ||
Flag | 1 | |||
Alignment in binary for XD; Blank for ER, LD or WX; for SD, CM, or PC use the following: | ||||
Bits | Value | Purpose | ||
0-1 | Not used | |||
2 | 0 | Use Bit 5 for value of RMODE | ||
1 | RMODE 64 Bits | |||
3 | 0 | Use Bits 6–7 for AMODE | ||
1 | AMODE 64 Bits | |||
4 | 0 | Module is Read/Write | ||
1 | RSECT (Module is Read-Only and is not self-modifying) | |||
5 | 0 | RMODE 24 bits | ||
1 | RMODE 31 bits or RMODE ANY | |||
6–7 | 00 | AMODE 24 bits | ||
01 | AMODE 24 bits | |||
10 | AMODE 31 Bits | |||
11 | AMODE ANY | |||
Size | 3 | Length in binary for PC, CM or SD; one blank followed by 2-byte Binary LDID for LD (and ESDID numbers are separate from LDID numbers; an ESDID can be numbered 0001 and it will be a different identifier from the LDID with number 0001); blanks for ER, XD, PR or WX. Note, a program compiler creating an SD record that does not know how long the module is going to be could leave this field blank, then specify the length of this module in the END record. |
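The 16-byte entries in the table above can be unpacked as in the sketch below. It assumes IBM's usual bit numbering, in which bit 0 is the most significant bit of the flag byte, and the `cp037` code page; the helper names and the sample symbols are hypothetical.

```python
ESD_TYPES = {0x00: "SD", 0x01: "LD", 0x02: "ER", 0x04: "PC", 0x05: "CM",
             0x06: "XD", 0x0A: "WX", 0x0D: "SD", 0x0E: "PC", 0x0F: "CM"}


def decode_section_flag(flag: int) -> dict:
    """Decode the flag byte of an SD, PC or CM entry (bit 0 assumed to be the MSB)."""
    rmode = "64" if flag & 0x20 else ("31" if flag & 0x04 else "24")
    amode = "64" if flag & 0x10 else ("24", "24", "31", "ANY")[flag & 0x03]
    return {"rmode": rmode, "amode": amode, "rsect": bool(flag & 0x08)}


def parse_esd_entries(record: bytes) -> list:
    """Return the 1 to 3 symbol entries carried in bytes 17-64 of an ESD record."""
    used = int.from_bytes(record[10:12], "big")  # bytes 11-12
    entries = []
    for start in range(16, 16 + used, 16):
        item = record[start:start + 16]
        entries.append({
            "name": item[0:8].decode("cp037").rstrip(),
            "type": ESD_TYPES.get(item[8], "unknown"),
            "address": int.from_bytes(item[9:12], "big"),
            "flag": item[12],
            "size_field": bytes(item[13:16]),
        })
    return entries


def esd_item(name, code, address=b"\x40" * 3, flag=0x40, size=b"\x40" * 3):
    """Build one 16-byte entry; fields default to EBCDIC blanks."""
    return (name.ljust(8).encode("cp037") + bytes([code]) + address
            + bytes([flag]) + size)


record = bytearray(b"\x40" * 80)
record[0] = 0x02
record[1:4] = "ESD".encode("cp037")
items = (esd_item("BASURA", 0x00, bytes(3), 0x06, (0x100).to_bytes(3, "big"))
         + esd_item("EXIT", 0x02))
record[10:12] = len(items).to_bytes(2, "big")
record[14:16] = (1).to_bytes(2, "big")  # ESDID of the first non-LD symbol
record[16:16 + len(items)] = items
entries = parse_esd_entries(bytes(record))
modes = decode_section_flag(entries[0]["flag"])
```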
Field | Size | Notes | ||
---|---|---|---|---|
Note that the first relocation entry of an RLD record must be 8 bytes long. If an entry's flag field has bit 7 set, the entry that follows reuses this entry's Relocation and Position values and is only 4 bytes long, containing just the Flag and Address fields. If that following entry also has bit 7 set, the same applies to the entry after it; once an entry has bit 7 clear, the next entry (if there is one in this record) uses the full 8 bytes.
To make this simple, presume for example a C program named basura calls the exit() function. | ||||
Relocation | 2 | Binary ESDID of the symbol to be relocated; this is the foreign symbol (exit) | ||
Position | 2 | Binary ESDID where the relocation is to be made; this is the module referencing the relocation symbol above (basura) | ||
Flag | 1 | |||
Bits | Meaning | |||
0 | Reserved | |||
1 | If 1, add 4 to address constant length value in bits 4–5 | |||
2-3 | Value | Address Constant type | ||
0 | A - External address, can be a data table or could be an external module | |||
1 | V | |||
2 | Q | |||
3 | CXD | |||
4-5 | Address Constant Length - 1 | |||
6 | Direction of relocation (0 to add; 1 to subtract) Subtraction is usually only used for A-type address constants | |||
7 | if 1, the Position and Relocation value fields of the entry following this one on this RLD record are the same as this one, and that entry is only 4 bytes in length. The last entry of an RLD record must clear this bit. | |||
Address | 3 | Absolute address in module of Position entry to be relocated. |
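The mix of 8-byte and 4-byte entries can be walked as in the sketch below. It assumes IBM's bit numbering (bit 0 is the most significant bit of the flag byte, so bit 7 is X'01'); the function name `parse_rld` and the sample ESDIDs are hypothetical.

```python
ADCON_TYPES = {0: "A", 1: "V", 2: "Q", 3: "CXD"}


def parse_rld(data: bytes) -> list:
    """Decode the relocation entries packed into the data area of an RLD record."""
    entries = []
    pos = 0
    relocation = position = None
    reuse_ids = False  # True when the previous entry had flag bit 7 set
    while pos < len(data):
        if not reuse_ids:
            relocation = int.from_bytes(data[pos:pos + 2], "big")
            position = int.from_bytes(data[pos + 2:pos + 4], "big")
            pos += 4
        flag = data[pos]
        address = int.from_bytes(data[pos + 1:pos + 4], "big")
        pos += 4
        length = ((flag >> 2) & 0b11) + 1   # bits 4-5 hold length - 1
        if flag & 0x40:                     # bit 1: add 4 to the length
            length += 4
        entries.append({
            "relocation": relocation,
            "position": position,
            "type": ADCON_TYPES[(flag >> 4) & 0b11],  # bits 2-3
            "length": length,
            "subtract": bool(flag & 0x02),            # bit 6
            "address": address,
        })
        reuse_ids = bool(flag & 0x01)                 # bit 7
    return entries


# An 8-byte entry, a 4-byte entry sharing its ESDIDs, then a new 8-byte entry.
rld_data = bytes.fromhex("00010002" "0D000010"
                         "0C000020"
                         "00030002" "1C000030")
rld_entries = parse_rld(rld_data)
```

Only the data area (the number of bytes given in bytes 11–12 of the record) should be passed in, so that trailing blanks are not read as entries.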
Field | Byte No. | Size | Notes | ||
---|---|---|---|---|---|
Note that for the symbol information, entries are packed one after another; only the first two fields are always present. Name field is omitted if bit 4 of Organization is 1; any later fields are also omitted for non-data items (bit 0 of Organization is 0). In the case of a Data Item (bit 0 of Organization is 1) only the Data Type and Length Fields will always be present, and the length field may be 1 or 2 bytes depending on data type. | |||||
Organization | 1 | 1 | |||
Bits | Value | Meaning | |||
0 | 0 | Non-Data Type | |||
1 | Data Type | ||||
(For non-datatype) | |||||
1-3 | 000 | Space | |||
001 | Control Section | ||||
010 | Dummy Control Section | ||||
011 | Common | ||||
100 | Machine Instruction | ||||
101 | CCW | ||||
(For datatype) | |||||
1 | 0 | No Multiplicity | |||
1 | Multiplicity (Indicates presence of M field) | ||||
2 | 0 | Independent (not a packed or zoned decimal constant) | |||
1 | Cluster (packed or zoned decimal constant) | ||||
3 | 0 | No Scaling | |||
1 | Scaling (indicates presence of S field) | ||||
Both datatype and non-datatype | |||||
4 | 0 | Has Name | |||
1 | No Name Provided | ||||
5-7 | Length of name - 1 | ||||
Address | 2 | 3 | Offset from start of Csect | ||
Name | 0-8 | If Bit 4 of byte 1 is 1, this field is not present, otherwise 1-8 bytes | |||
The following fields are present only for data items (Bit 0 of byte 1 is 1) | |||||
Data Type | 1 | Value in Hexadecimal | |||
00 | Character (C - Type; 2 byte length) | ||||
04 | Hexadecimal (X - Type; 2 byte length) | ||||
08 | Binary (B - Type; 2 byte length) | ||||
10 | F-type, 32-bit integer (1 byte length, typically 4) | ||||
14 | H-type, 16-bit integer (1 byte length, typically 2) | ||||
18 | E-type, 32-bit (single precision) floating point (1 byte length, typically 4) | ||||
1C | D-type, 64-bit (double precision) floating point (1 byte length, typically 8) | ||||
20 | A-type or Q-type 32-bit address or value (1 byte length, typically 4) | ||||
24 | Y-type, 16-bit address or value (1 byte length, typically 2) | ||||
28 | S-type (1 byte length) | ||||
2C | V-type, 32-bit external symbol (1 byte length, typically 4) | ||||
30 | P-type, variable length packed decimal (1 byte length) | ||||
34 | Z-type, variable length zoned decimal (1 byte length) | ||||
38 | L-type (1 byte length) | ||||
Length | 1 or 2 | Length - 1; 2 bytes for Character, Hexadecimal or Binary type (i.e. size from 1 to 32768 bytes); 1 byte for all other types (size from 1 to 256 bytes) | |||
Multiplicity | 0 or 3 | M field;3 byte repeat count or presumed value of 1 (not repeated) if this field is not present (bit 1 of Organization is 0) | |||
Scale | 0 or 2 | S field; 2 byte scale value (present only for F, H, E, D, L, P and Z type data) or scale is presumed to be 0 if this field is not present (bit 3 of Organization is 0) |
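Because SYM entries are variable in length, a reader has to interpret the Organization byte before it knows which fields follow. The sketch below follows the field order of the table above, assuming IBM's bit numbering (bit 0 is the most significant bit) and the `cp037` code page; the function name `parse_sym` and the sample symbols are hypothetical.

```python
TWO_BYTE_LENGTH_TYPES = {0x00, 0x04, 0x08}  # C, X and B type constants


def parse_sym(data: bytes) -> list:
    """Decode the packed symbol entries from the data area of a SYM record."""
    entries = []
    pos = 0
    while pos < len(data):
        org = data[pos]
        entry = {
            "is_data": bool(org & 0x80),                            # bit 0
            "address": int.from_bytes(data[pos + 1:pos + 4], "big"),
            "name": None,
        }
        pos += 4
        if not org & 0x08:                   # bit 4 clear: a name follows
            name_length = (org & 0x07) + 1   # bits 5-7 hold length - 1
            entry["name"] = data[pos:pos + name_length].decode("cp037")
            pos += name_length
        if entry["is_data"]:
            entry["data_type"] = data[pos]
            pos += 1
            width = 2 if entry["data_type"] in TWO_BYTE_LENGTH_TYPES else 1
            entry["length"] = int.from_bytes(data[pos:pos + width], "big") + 1
            pos += width
            if org & 0x40:                   # bit 1: M field present
                entry["multiplicity"] = int.from_bytes(data[pos:pos + 3], "big")
                pos += 3
            if org & 0x10:                   # bit 3: S field present
                entry["scale"] = int.from_bytes(data[pos:pos + 2], "big")
                pos += 2
        entries.append(entry)
    return entries


# A control section named BASURA, then a fullword (F-type) item named COUNT.
sym_data = (bytes([0x15]) + bytes(3) + "BASURA".encode("cp037")
            + bytes([0x84]) + (0x10).to_bytes(3, "big") + "COUNT".encode("cp037")
            + bytes([0x10, 0x03]))
symbols = parse_sym(sym_data)
```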
Field | Byte No. | Size | Notes | ||
---|---|---|---|---|---|
The XSD record type was added in the 1990s to allow MVS to support longer module names for the C compiler. | |||||
Flag Byte 1 | 13 | 1 | Bits 1-6 are used for XPLINK; Bit 7 is used for AMODE 64; neither is used by the binder; Bit 8 is always 0. | ||
Flag Byte 2 | 14 | 1 | Bit | Meaning | |
1 | Name may have multiple definitions | ||||
2 | Name is mangled | ||||
3 | Internal Linkage | ||||
4 | Template | ||||
5 | Concat | ||||
6 | Name eligible for import or export | ||||
7 | 1 if name is a function | ||||
8 | 1 if name was mapped (e.g. #pragma map) | ||||
Length | 17-20 | 4 | Length of the name | ||
Offset | 21-24 | 4 | Offset of first byte of name or substring of name (origin of 1) | ||
Type | 25 | 1 | Value (Hex) | Module Type | What the Module is |
00 | SD | START, CSECT, DSECT (Dummy section; will not have any TXT records), or Fortran named BLOCK DATA module | |||
01 | LD | ENTRY (Label Definition within a previously identified CSECT) | |||
02 | ER | EXTRN (External Reference) | |||
04 | PC | Private Code (START or CSECT that has no name or Fortran unnamed BLOCK DATA module; this module cannot be called as a function or subroutine from another program) | |||
05 | CM | Fortran named or blank COMMON (Provides size only; CM records never have TXT records) | |||
06 | XD (PR) | External Dummy Section or Pseudo Register | |||
0A | WX | WXTRN (Weak Extern - An external routine that does not have to be present for the module to operate) | |||
0B | UR | ||||
Address | 26-28 | 3 | 24-bit address of symbol | ||
Specification | 29 | 1 | Depends on Module Type (value of byte 25) | ||
Module Type | Value | Meaning | |||
LD ER CM Null or WX | Blank | Not used | |||
PR | Alignment Factor | ||||
00 | Byte Alignment | ||||
01 | Halfword Alignment | ||||
03 | Word Alignment | ||||
07 | Doubleword Alignment | ||||
SD or PC | Bits 1-2 | Not Used | |||
Bit 3 | If 1, RMODE 64; otherwise use value of bit 6 | ||||
Bit 4 | If 1, AMODE 64; otherwise use value of bits 7 and 8 | ||||
Bit 5 | 1 if RSECT | ||||
Bit 6 | 0=RMODE 24; 1=RMODE 31; (Ignored if bit 3 is 1) | ||||
Bits 7-8 | 00 or 01=AMODE 24; 10=AMODE 31; 11=AMODE any (Ignored if bit 4 is 1) | ||||
Length or Identifier | 30-32 | 3 | Value | Meaning | |
Zero | If length is specified on END record for types SD, PC, CM | ||||
Length | Control Section Length For types SD, PC, CM; Pseudo-Register length for type PR | ||||
Identifier | Identifier of SD entry containing name for type LD | ||||
Blank | If type is ER or WX | ||||
Name | 33-72 | Varies | Name or Substring of Name |
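The Length and Offset fields suggest that a name longer than the 40-byte Name field is carried as substrings on several XSD records. The sketch below rests on that reading of the table and should be treated as an assumption; it also assumes that every record repeats the total length. The helper names are hypothetical, and `cp037` is assumed for the characters.

```python
NAME_FIELD = 40  # bytes 33-72 hold the name or a substring of it


def xsd_name_record(total_length: int, offset: int, piece: bytes) -> bytes:
    """Build an XSD record with only the name-related fields filled in."""
    rec = bytearray(b"\x40" * 80)
    rec[0] = 0x02
    rec[1:4] = "XSD".encode("cp037")
    rec[16:20] = total_length.to_bytes(4, "big")  # bytes 17-20
    rec[20:24] = offset.to_bytes(4, "big")        # bytes 21-24, origin of 1
    rec[32:32 + len(piece)] = piece               # bytes 33 onward
    return bytes(rec)


def assemble_name(records: list) -> str:
    """Rebuild a long name from the substrings carried by XSD records."""
    total = int.from_bytes(records[0][16:20], "big")
    name = bytearray(total)
    for rec in records:
        start = int.from_bytes(rec[20:24], "big") - 1  # convert to origin 0
        count = min(NAME_FIELD, total - start)
        name[start:start + count] = rec[32:32 + count]
    return name.decode("cp037")


long_name = "a_mixed_Case_function_name_longer_than_forty_characters"
encoded = long_name.encode("cp037")
xsd_records = [xsd_name_record(len(encoded), i + 1, encoded[i:i + NAME_FIELD])
               for i in range(0, len(encoded), NAME_FIELD)]
rebuilt = assemble_name(xsd_records)
```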
Field | Byte No. | Size | Notes |
---|---|---|---|
A module ends with the data for the END record. | |||
Start ESDID | 15-16 | 2 | For Type 1 END record (see byte 33) Binary ESDID of starting point of this module if given; blanks if Type 2 END record or not given. Note that the starting ESDID of this module can be an external symbol. This allows a module to automatically start a run-time library or startup code instead of itself. (The language compiler would either always assign the main program of a program it compiled the same name or would create an alias "entry" name, either an SD or LD entry with a standard name which was always the same and to which it could transfer control to when it was ready to start the user's program.) |
Start Name | 17-24 | 8 | For Type 2 END record (see byte 33) name of starting point of this module if given; blanks if Type 1 END record or not given |
Module Size | 29-32 | 4 | Binary length of Module in bytes if not specified on the SD entry of the ESD record; byte 29 is zero if it was specified there; blanks if not provided. This allows a compiler to essentially be "one pass" and write the object code to an object file as it compiled it, then indicate the size of the module after it had compiled it, when it actually "knows" how large the program is. |
END Format or IDR count | 33 | 1 | EBCDIC digit character '1', '2' or blank; indicates the format type of the END record. Note that some versions of the assembler use this byte to indicate the number of Identifier (IDR) records (values in bytes 34–52, and in 53–71 if present) in this record. A blank may mean that one IDR is present or that none are, or may indicate a Type 1 END record. The identifier fields are essentially comment fields, so they could be used for anything (especially the Order Number field) or could be blank. |
Note that bytes 34–52 are collectively referred to as an IDR record. Bytes 53–71 are known as a secondary IDR record. They may either or both be blank. Also, unlike any other information fields which contain numbers, the values in an IDR are all text and none are expressed in binary. | |||
Order No. | 34-43 | 10 | Order number of the assembler or identifier of the compiler; may contain letters or digits |
Version | 44-45 | 2 | 2-Digit version number of assembler or compiler |
Revision | 46-47 | 2 | 2-Digit revision number of assembler or compiler |
Run Year | 48-49 | 2 | Last two digits of year this assembly was run or program was compiled e.g. 80 for 1980 or 2080. Given when the format was developed, 00 and numbers greater than 63 are probably in the 20th century (1964-2000) and all numbers above 0 and under 64 are probably in the 21st (2001-2063). |
Run Day | 50-52 | 3 | 3-Digit day of year this assembly was run or program was compiled, e.g. 001 for January 1; 032 for February 1, etc. Numbers later than 059 (which would always mean February 28) could be one day or the day after that date depending on whether the original program was compiled or assembled on a leap year; 060 would be either February 29 on a leap year, or March 1 on any other year. 365 would be December 30 or 31 depending on whether the year it was compiled was a leap year or not. 366 would always mean a module assembled or compiled on December 31 of a leap year. |
Additional Identifier | 53-71 | 19 | Essentially the same format as bytes 34–52 if included; otherwise blanks. |
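The text fields of an IDR, and the year heuristic given for the Run Year field, can be decoded as in the sketch below. The byte ranges are taken from the table above; the function names and the sample identifier `MYCOMPILER` are hypothetical, and `cp037` is assumed as the code page.

```python
from datetime import date, timedelta


def idr_date(yy: str, ddd: str) -> date:
    """Convert a 2-digit year and 3-digit day of year to a calendar date.

    Follows the heuristic above: 64-99 mean 1964-1999, 00-63 mean 2000-2063.
    """
    year = int(yy)
    year += 1900 if year >= 64 else 2000
    return date(year, 1, 1) + timedelta(days=int(ddd) - 1)


def parse_idr(record: bytes) -> dict:
    """Decode the primary IDR (bytes 34-52) of an END record."""
    text = record[33:52].decode("cp037")
    return {
        "order_no": text[0:10].rstrip(),  # bytes 34-43
        "version": text[10:12],           # bytes 44-45
        "revision": text[12:14],          # bytes 46-47
        "run_date": idr_date(text[14:16], text[16:19]),
    }


end_record = bytearray(b"\x40" * 80)
end_record[0] = 0x02
end_record[1:4] = "END".encode("cp037")
end_record[33:52] = "MYCOMPILER010680032".encode("cp037")
idr = parse_idr(bytes(end_record))
```

A real reader would first check for a blank IDR, since `int()` raises `ValueError` on blank year or day fields.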
References
- [1] OS/VS - VM/370 Assembler Programmer's Guide, GC33-4021-4, Fifth Edition, IBM, San Jose, CA (September 1982) http://www.textfiles.com/bitsavers/pdf/ibm/370/asm/GC33-4021-4_OS_VS_VM_370_Assembler_Programmers_Guide_Sep82.pdf
- [2] VS/9 Assembler Reference Manual: Programming Guide for the Univac 90/60, 90/70 and 90/80 series Mainframe Computers, Sperry Univac, Cinnaminson, NJ, 1978
- [3] ASSEMBH Reference Manual, U5223-J-Z125-3-7600, Fujitsu Technology Solutions GmbH, June 2010, http://manuals.ts.fujitsu.com/file/957/assh_bs.pdf (Retrieved August 7, 2013)
- [4] High Level Assembler for z/OS & z/VM & z/VSE Programmer’s Guide, Appendix C. Release 6, SC26-4941-05, IBM, San Jose, CA, July 2008 http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/download/asmp1020.pdf (Retrieved March 27, 2010)
- [5] IBM z/VSE System Control Statements: Version 5 Release 1, SC34-2637-00, IBM, 1984, 2011.
- OS/MVS Program Management: Advanced Facilities, SA22-7644-07, Eighth Edition, IBM, Poughkeepsie, NY, September 2007 http://publibz.boulder.ibm.com/epubs/pdf/iea2b270.pdf (Retrieved August 9, 2013)
- John R. Ehrman, How the Linkage Editor Works: A Tutorial on Object/Load Modules, Link Editors, Loaders, and What They Do for (and to) You, IBM Silicon Valley (Santa Teresa) Laboratory, San Jose, 1994, 2001 ftp://ftp.boulder.ibm.com/software/websphere/awdtools/hlasm/s8169a.pdf (Retrieved July 29, 2013)