The Mach-O file


In Mac OS X, almost all files that contain executable code, including applications, libraries, and kernel modules, use the Mach-O file format.

The Mach-O format was not originally developed by Apple; it was designed by the Open Software Foundation for the OSF/1 operating system (which is based on Mach) and adapted by Apple for the x86 architecture as part of the OpenStep project.

The Mach-O file format and Application Binary Interface (ABI) specifications describe how an executable should be loaded and launched by the kernel. They pass the following information to the operating system:

    * how the dynamic loader works,
    * how to load separate libraries,
    * how to organize a process’s address space,
    * where to find the entry point,
    * etc.

Since Mach-O is the main format for executable files in Mac OS X, let’s take a more detailed look at its structure.

Mach-O structure

Mach-O files can be roughly divided into three parts: the header, the load commands, and the segments, each of which may consist of several sections. The header and the load commands describe the file's main features, while the segments hold the actual bytes (code and data) that the load commands refer to.

The Header. The first four bytes of the header hold the so-called magic number, which identifies the file as either a 32-bit or a 64-bit Mach-O file. Because the magic number reads differently depending on endianness, it also reveals the byte order in which the rest of the file is stored. The header also specifies the CPU architecture for which the file was compiled, which lets the kernel ensure that a file is launched only on the platform it was built for. A binary file may sometimes contain code for more than one architecture. This format is known as a universal binary; in this case, the file starts with a fat header that precedes the per-architecture Mach-O images.
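The header checks described above can be sketched in a few lines of Python. This is a minimal illustration, not the kernel's actual logic: the function names classify_magic and read_header are hypothetical, while the magic values and field order follow the definitions in <mach-o/loader.h> and <mach-o/fat.h>. It handles only thin (single-architecture) images and merely recognizes a fat header without parsing it.

```python
import struct

# Magic values from <mach-o/loader.h> and <mach-o/fat.h>.
MH_MAGIC = 0xFEEDFACE     # 32-bit Mach-O
MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O
FAT_MAGIC = 0xCAFEBABE    # universal binary; the fat header is big-endian


def classify_magic(first4):
    """Return (kind, byte_order) for the first four bytes of a file.

    kind is 'mach-o-32', 'mach-o-64', 'fat' or None.
    byte_order is '<' (little-endian) or '>' (big-endian), as used by struct.
    (Hypothetical helper, for illustration only.)
    """
    if len(first4) < 4:
        return (None, None)
    # Try both byte orders: whichever one yields a known magic is the
    # byte order the rest of the file is stored in.
    for order in ("<", ">"):
        (magic,) = struct.unpack(order + "I", first4[:4])
        if magic == MH_MAGIC:
            return ("mach-o-32", order)
        if magic == MH_MAGIC_64:
            return ("mach-o-64", order)
    if struct.unpack(">I", first4[:4])[0] == FAT_MAGIC:
        return ("fat", ">")
    return (None, None)


def read_header(data):
    """Decode the fixed fields of a thin Mach-O header into a dict."""
    kind, order = classify_magic(data[:4])
    if kind not in ("mach-o-32", "mach-o-64"):
        raise ValueError("not a thin Mach-O image")
    # magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags
    # (the 64-bit header appends one reserved 32-bit field, ignored here).
    fields = struct.unpack_from(order + "IiiIIII", data, 0)
    return {
        "kind": kind,
        "byte_order": order,
        "cputype": fields[1],
        "filetype": fields[3],
        "ncmds": fields[4],
        "sizeofcmds": fields[5],
    }
```

In practice one would read the first 32 bytes of a file in binary mode and pass them to read_header; the returned cputype can then be compared against constants such as 0x01000007 (x86-64) from <mach/machine.h>.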

Load commands. The load commands area, which immediately follows the header, contains a list of commands that tell the kernel how to load the different segments of the file. These commands describe where each segment lies in the file, where it is mapped in memory and what access rights it has.
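To make the layout of the load commands area more concrete, here is a small Python sketch that walks it. It assumes a thin image already held in memory; the function name iter_load_commands and its parameters are hypothetical. It relies on two facts from <mach-o/loader.h>: every load command starts with two 32-bit fields, cmd and cmdsize, and the header records the number of commands (ncmds) and their total size (sizeofcmds).

```python
import struct

# sizeof(struct mach_header) and sizeof(struct mach_header_64)
HEADER_SIZE = {False: 28, True: 32}


def iter_load_commands(data, order="<", is_64=True):
    """Yield (cmd, cmdsize, offset) for each load command.

    data  : bytes of a thin Mach-O image, starting at its header
    order : '<' or '>' byte order, as used by struct
    is_64 : True for a 64-bit header, False for a 32-bit one
    (Hypothetical helper, for illustration only.)
    """
    # ncmds and sizeofcmds are the 5th and 6th 32-bit header fields.
    ncmds, sizeofcmds = struct.unpack_from(order + "II", data, 16)
    offset = HEADER_SIZE[is_64]
    end = offset + sizeofcmds
    for _ in range(ncmds):
        if offset + 8 > end or offset + 8 > len(data):
            raise ValueError("load command table is truncated")
        cmd, cmdsize = struct.unpack_from(order + "II", data, offset)
        if cmdsize < 8 or offset + cmdsize > end:
            raise ValueError("load command has an invalid cmdsize")
        yield cmd, cmdsize, offset
        # cmdsize covers the whole command, so it leads to the next one.
        offset += cmdsize
```

Each yielded cmd value identifies the kind of command; for example, 0x1 (LC_SEGMENT) and 0x19 (LC_SEGMENT_64) describe segments, and the offset can be used to decode the rest of that command.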

Segments and sections. Mach-O format executable files usually have five segments:

    * __PAGEZERO is located at virtual address zero and has no access rights at all (it cannot be read, written or executed), so any attempt to dereference a NULL pointer causes a fault. This segment occupies no space in the file on disk.
    * __TEXT contains code and data that can only be read or executed.
    * __DATA contains data that can be written to. This segment is marked as copy-on-write.
    * __OBJC contains data used by the Objective-C runtime.
    * __LINKEDIT contains data used by the dynamic linker, such as symbol and string tables.

The __TEXT and __DATA segments contain zero or more sections. Each section contains a certain kind of data, for example executable code, constants, strings, etc. That way, executable code and non-executable data can be stored within the same segment, but separately from each other.
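The relationship between a segment, its access rights and its sections can be illustrated with the following Python sketch, which decodes a single 64-bit segment command. The names parse_segment_64 and prot_string are hypothetical; the field layout is that of struct segment_command_64 and struct section_64 in <mach-o/loader.h>, and the protection bits (read = 1, write = 2, execute = 4) are the VM_PROT_* values. Only the 64-bit variant is handled.

```python
import struct

LC_SEGMENT_64 = 0x19

# Fields after cmd/cmdsize: segname, vmaddr, vmsize, fileoff, filesize,
# maxprot, initprot, nsects, flags
SEGMENT_64 = "16sQQQQiiII"
# sectname, segname, addr, size, offset, align, reloff, nreloc, flags,
# reserved1, reserved2, reserved3
SECTION_64 = "16s16sQQIIIIIIII"


def prot_string(prot):
    """Render VM_PROT_* bits as an 'rwx'-style string."""
    return "".join(ch if prot & bit else "-"
                   for ch, bit in (("r", 1), ("w", 2), ("x", 4)))


def _name(raw):
    # Names are 16-byte fields, NUL-padded when shorter than 16 characters.
    return raw.split(b"\x00", 1)[0].decode("ascii", "replace")


def parse_segment_64(data, offset, order="<"):
    """Decode the LC_SEGMENT_64 command that starts at `offset`.

    (Hypothetical helper, for illustration only.)
    """
    cmd, cmdsize = struct.unpack_from(order + "II", data, offset)
    if cmd != LC_SEGMENT_64:
        raise ValueError("not an LC_SEGMENT_64 command")
    seg_fmt = order + SEGMENT_64
    (segname, vmaddr, vmsize, fileoff, filesize,
     maxprot, initprot, nsects, flags) = struct.unpack_from(
        seg_fmt, data, offset + 8)
    # The section headers follow the segment command directly.
    pos = offset + 8 + struct.calcsize(seg_fmt)
    sect_fmt = order + SECTION_64
    sections = []
    for _ in range(nsects):
        fields = struct.unpack_from(sect_fmt, data, pos)
        sections.append(_name(fields[0]))
        pos += struct.calcsize(sect_fmt)
    return {
        "name": _name(segname),
        "vmaddr": vmaddr,
        "vmsize": vmsize,
        "filesize": filesize,
        "initprot": prot_string(initprot),
        "maxprot": prot_string(maxprot),
        "sections": sections,
    }
```

With this sketch, a typical __TEXT segment would be reported with an initial protection of "r-x", while __PAGEZERO would show "---" and a filesize of zero, matching the descriptions in the list above.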