MIPS O32 ABI - FR0 and FR1 Interlinking

From Dmz-portal



1. Introduction

MIPS ABIs have been adjusted many times as the architecture has evolved, resulting in the introduction of incompatible ABI variants. Current architectural changes lead us to review the state of the O32 ABI and evaluate whether the existing ABI can be made more compatible with current and future extensions.
The three primary reasons for extending the current O32 ABI are the introduction of the MSA ASE, the desire to exploit the FR=1 mode of current FPUs and the new MIPS32r6 architecture revision which only supports the ‘FR=1’ mode.

For the avoidance of doubt:

2. Acknowledgment

The proposals covered in this document have been constructed by Matthew Fortune, Ilie Garbacea and Rich Fuhler with architectural information from Ranganathan Sudhakar.

Matthew Fortune (matthew.fortune@imgtec.com)
Ilie Garbacea (ilie.garbacea@imgtec.com)
Rich Fuhler (rich.fuhler@imgtec.com)
Ranganathan Sudhakar (ranganathan.sudhakar@imgtec.com)

3. Aim

Define how to extend and transition the O32 ABI to allow code to run on an FPU with FR=1. The benefits of such a transition are:

  1. Enable the use of MSA within Android 32-bit
  2. Enable the use of MSA within Debian 32-bit
  3. Enable O32 binaries to run on FR=1 systems
  4. Eliminate the need for new libraries by requiring compatibility with the existing O32 ABI

A modeless O32 ABI extension will be defined to allow code to run on a MIPS processor in either FR=0 or FR=1 mode. This ABI extension is referred to as the O32 FPXX ABI.

A further O32 ABI extension will be defined to provide access to O32 FP64 features whilst also being able to interlink with and execute O32 FP32 code. This extension requires FR=1 as well as new hardware support to assist with executing the O32 FP32 code with FR=1. This ABI extension is referred to as the O32 FP64A ABI. The 'A' stands for Android, one of the driving forces behind maintaining compatibility between pre-built code and new hardware extensions.

4. Why is O32 not modeless today?

O32 is not modeless today partly because of architectural restrictions and partly due to calling convention extensions made when introducing the MIPS32r1 architecture.

Figure 1 shows an analysis of why the O32 ABI as implemented today is not modeless with respect to the FR bit along with notes explaining the information.

|                    |    MIPS I     | MIPS II | MIPS32r1 |    MIPS32r2    |
|Ld/st doubles       | SWC1,LWC1 (1) |          SDC1,LDC1                  |
|GP/FP move highpart |           MTC1,MFC1 (2)            | MTHC1,MFHC1 (3)| 
|Num doubles         |                16 – even registers                  |
|Num singles         |            16           |             32            |
|Callee-saved double |                          6                          |
|Callee-saved single |             6           |           12 (4)          |
|modeless            |    No (1,2)   |  No (2) | No (2,4) |     No (3,4)   |
|Can be modeless     |    No         |                 Yes                 |
Figure 1: Architecture and O32 calling convention issues preventing modeless

(1) The need to use SWC1 and LWC1 to load/store the high 32 bits of doubles means that hardware has no way to access the appropriate register based on the current FR mode.
(2) Using MTC1/MFC1 to access the high part of doubles, i.e. the odd-numbered registers, has exactly the same problem as (1).
(3) The MTHC1 and MFHC1 instructions do operate according to the current FR mode but are not currently used by GCC for MIPS32r2 O32. (They are used by GCC for MIPS32r2 O32 FP64.)
(4) Allowing odd-numbered single-precision registers to be callee-saved means that, on an FR=1 FPU, those registers are not preserved by the LDC1/SDC1 instructions used to save the corresponding even-numbered double. In FR=0, SDC1 $f20 also saves $f21; in FR=1, SDC1 $f20 does not save $f21.
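Note (4) can be made concrete with a small C model of the two register-file views. This is a sketch under simplifying assumptions, not a description of real hardware: `fpu_model`, `write_single`, `sdc1_model` and `sdc1_saves_odd_single` are hypothetical names, the FPU is modelled as plain arrays, and only data movement is modelled (no exceptions; the upper half left unpredictable by an FR=1 single-precision write is simply preserved).

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the FPU register file in the two FR modes.
 * FR=0: 32 x 32-bit registers; a double lives in an even/odd pair.
 * FR=1: 32 x 64-bit registers; a single lives in the low 32 bits. */
typedef struct {
    bool fr;            /* current FR mode */
    uint32_t r32[32];   /* storage used when fr == false */
    uint64_t r64[32];   /* storage used when fr == true  */
} fpu_model;

/* Write a single-precision value (raw bits) to $f<reg>. */
static void write_single(fpu_model *f, int reg, uint32_t bits)
{
    if (f->fr) /* FR=1: low half of the 64-bit register; upper half kept in this model */
        f->r64[reg] = (f->r64[reg] & 0xFFFFFFFF00000000ull) | bits;
    else
        f->r32[reg] = bits;
}

/* Model of "sdc1 $f<reg>": return the 64 bits that would be stored.
 * FR=0: $f<reg> supplies the low word and $f<reg+1> the high word.
 * FR=1: the whole 64-bit register $f<reg> is stored. */
static uint64_t sdc1_model(const fpu_model *f, int reg)
{
    if (f->fr)
        return f->r64[reg];
    return ((uint64_t)f->r32[reg + 1] << 32) | f->r32[reg];
}

/* Does saving $f20 with SDC1 also preserve the single held in $f21? */
static bool sdc1_saves_odd_single(int fr)
{
    fpu_model f = { .fr = fr != 0 };     /* all registers start at zero */
    write_single(&f, 21, 0x3F800000u);   /* 1.0f in $f21 */
    uint64_t saved = sdc1_model(&f, 20);
    return (uint32_t)(saved >> 32) == 0x3F800000u;
}
```

With FR=0, `sdc1_saves_odd_single(0)` is true because the saved 64 bits include `$f21`; with FR=1 it is false, which is why odd-numbered singles cannot simply be treated as callee-saved in a modeless ABI.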

5. Generating modeless code

With the exception of MIPS I, code for all MIPS architectures can be generated to run modeless. To achieve modeless operation, any instruction that moves data to or from a double-precision (64-bit) floating-point register must conform to the FR mode. Beginning with MIPS II, all ISAs will be forward-compatible with the O32 FPXX extensions.

A new command-line option, -mfpxx, is proposed alongside the existing -mfp32 and -mfp64 options; together they select one of three code generation modes:

| Option | Meaning                                                            |
| -mfp32 | Generate code that assumes it will only run on an FR=0 FPU         |
| -mfpxx | Generate code that can interlink with -mfp32 XOR -mfp64 and run on |
|        | either an FR=0 or FR=1 FPU                                         |
| -mfp64 | Generate code that assumes it will only run on an FR=1 FPU         |

The following sections explain special code generation rules required for -mfpxx.

5.1. MTC1, MFC1

The MTC1 and MFC1 instructions must not be used to access the high part of a double-precision register but can be used to access single-precision registers. It is also valid to access the low part of a double-precision register should that be useful. Any double or long long transfer between GP and FP double registers must go via memory. This restriction impacts the following scenarios:

  1. Moving double-precision arguments that are passed in register pairs ($4, $5) or ($6, $7) into a floating-point register. This should be relatively rare: $4, $5 are only used if the first argument is a double followed by varargs, and $6, $7 are used in the following prototypes:
    1. xxx foo(int[, int], double[, anything])
    2. xxx foo(float, float, double[, anything])
    3. xxx foo([double|float], double[, any], …)
  2. Floating point conversions between int or long long and float or double.
  3. Reloads caused by register allocation placing DImode values in FP registers under extreme register pressure. (This should be a non-issue.)

Using memory to move doubles/long long values between integer and FP registers is expected to be only slightly more costly than using the corresponding move instructions, because direct register transfers are themselves expensive, especially in high-end MIPS implementations.

Code generation changes to achieve modeless operation are disabled via the command line option: -mfp32. This will result in an object that has to be run in FR=0 mode.

5.2. MTHC1, MFHC1

Architectures that support MTHC1/MFHC1 should use these instructions to access the high part of double-precision registers instead of moving via memory.
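The restriction in 5.1 and the alternative in 5.2 can be checked with a similar C model. The names (`fpu`, `mtc1`, `mthc1`, `ldc1`, `read_double`, `move_ok`) are hypothetical helpers that mimic the effect of the corresponding instructions on a double destined for `$f12`; memory is modelled as two 32-bit words, low word first, so endianness is ignored.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical FPU model: the same register numbers interpreted per FR mode. */
typedef struct {
    bool fr;
    uint32_t r32[32];   /* FR=0 view */
    uint64_t r64[32];   /* FR=1 view */
} fpu;

/* MTC1: write the low 32 bits of $f<reg>. */
static void mtc1(fpu *f, uint32_t v, int reg)
{
    if (f->fr) f->r64[reg] = (f->r64[reg] & ~0xFFFFFFFFull) | v;
    else       f->r32[reg] = v;
}

/* MTHC1: write the high 32 bits of the double $f<reg> (mode-aware). */
static void mthc1(fpu *f, uint32_t v, int reg)
{
    if (f->fr) f->r64[reg] = (f->r64[reg] & 0xFFFFFFFFull) | ((uint64_t)v << 32);
    else       f->r32[reg + 1] = v;
}

/* LDC1 from a two-word stack slot: mem[0] = low word, mem[1] = high word. */
static void ldc1(fpu *f, const uint32_t mem[2], int reg)
{
    if (f->fr) f->r64[reg] = ((uint64_t)mem[1] << 32) | mem[0];
    else     { f->r32[reg] = mem[0]; f->r32[reg + 1] = mem[1]; }
}

static uint64_t read_double(const fpu *f, int reg)
{
    return f->fr ? f->r64[reg] : ((uint64_t)f->r32[reg + 1] << 32) | f->r32[reg];
}

enum strategy { VIA_MTC1_PAIR, VIA_MEMORY, VIA_MTHC1 };

/* Move a double held as two 32-bit GPR words into $f12 and report
 * whether $f12 ends up holding the correct 64-bit value. */
static bool move_ok(bool fr, enum strategy s, uint64_t value)
{
    fpu f = { .fr = fr };
    uint32_t lo = (uint32_t)value, hi = (uint32_t)(value >> 32);
    switch (s) {
    case VIA_MTC1_PAIR:                 /* FP32-only sequence */
        mtc1(&f, lo, 12); mtc1(&f, hi, 13);
        break;
    case VIA_MEMORY: {                  /* SW, SW, LDC1: modeless */
        uint32_t stack[2] = { lo, hi };
        ldc1(&f, stack, 12);
        break;
    }
    case VIA_MTHC1:                     /* MIPS32r2 and later: modeless */
        mtc1(&f, lo, 12); mthc1(&f, hi, 12);
        break;
    }
    return read_double(&f, 12) == value;
}
```

In this model the MTC1 pair only produces the right double with FR=0, while the memory route and the MTC1 plus MTHC1 route work in both modes, which matches the -mfpxx rules above.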

6. Creating a modeless calling convention

Code compiled to conform to the O32 ABI will be unable to run on an FR=1 FPU if it uses odd-numbered single-precision registers. The odd-numbered registers in the range 20-31 are expected to be callee-saved, but they are only preserved implicitly, as the upper half of the even-numbered double saved by SDC1. On an FR=1 FPU each register is a separate 64-bit unit, so this implicit save no longer happens.

See Appendix A for instruction traces showing the following calling conventions in action.

The current definition of O32 callee-saved registers is represented in Figure 2; this is referred to as O32 FP32 (-mfp32). S indicates that a register is callee-saved, either as a single register or as part of a double-precision (even) register.

|      | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
|32-bit|  S |  S |  S |  S |  S |  S |  S |  S |  S |  S |  S |  S |
             Figure 2: O32 FP32 callee-saved FP registers

The O32 FPXX (-mfpxx) callee-saved registers are represented in Figure 3. Sd indicates that the register is callee-saved and must be preserved as a double-precision value. ST indicates that the register behaves as for S but is also caller-saved:

|      | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
|32-bit| Sd | ST | Sd | ST | Sd | ST | Sd | ST | Sd | ST | Sd | ST |
    Figure 3: O32 FPXX callee-saved FP registers (FR=0 view)

The O32 FPXX (-mfpxx) callee-saved registers, as viewed on a system running in FR=1 mode, are represented in Figure 4. U indicates a register that is unusable in an O32 FPXX object, as only 16 doubles can exist. XT indicates the same behaviour as ST, except that in FR=1 mode the register may not actually be callee-saved, because the definition of ST allows it to be saved either as a single-precision register or as part of the corresponding double:

|      | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
| Hi32 | Sd |  U | Sd |  U | Sd |  U | Sd |  U | Sd |  U | Sd |  U |
| Lo32 | Sd | XT | Sd | XT | Sd | XT | Sd | XT | Sd | XT | Sd | XT |
    Figure 4: O32 FPXX callee-saved FP registers (FR=1 view)

The O32 FP64 (-mfp64) callee-saved registers are represented in Figure 5. T indicates that a register is caller-saved:

|      | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
| Hi32 | Sd |  T | Sd |  T | Sd |  T | Sd |  T | Sd |  T | Sd |  T |
| Lo32 | Sd |  T | Sd |  T | Sd |  T | Sd |  T | Sd |  T | Sd |  T |
          Figure 5: O32 FP64 callee-saved FP registers

From these descriptions it can be seen that:

7. Defining O32 FP64A

The motivation for the O32 FP64A ABI is to support mixing O32 FP32 code and a form of O32 FP64 while the transition from O32 FP32 to O32 FPXX is in progress. This ABI variant will be relatively short lived and is only required in environments where full compatibility with O32 FP32 is required.

7.1 O32 FP64A specification

O32 FP64A is almost identical to O32 FP64 except that it is forbidden to directly access odd-numbered single-precision registers. This means that the following instructions must not be used with odd-numbered floating-point registers:

Odd-numbered double-precision and wider registers can still be used. Double-precision data must be moved to and from odd-numbered registers via memory. MIPS32r6 floating-point instructions that act on floating-point condition code registers must not use odd-numbered registers.

7.2 O32 FP64A/FP32 and FRE mode

Please refer to Appendix C for details of all FPU register modes.

The new FRE hardware mode bridges the gap between using new FR=1 hardware features and maintaining full compatibility with pre-existing code that uses the O32 FP32 ABI.

The special feature of FRE mode is its handling of single-precision registers, in particular the odd-numbered ones. For FP32 code to execute correctly, a write to an odd-numbered single-precision register must update the upper 32 bits of the corresponding even-numbered double, and a write to an even-numbered double must update the odd single. FRE mode achieves this by redirecting reads and writes of odd-numbered single-precision registers to the upper 32 bits of the even double.

FP64A and FP32 can directly inter-call as the calling-convention only involves even numbered single or double precision registers and in FRE mode the even-numbered double-precision data is always in the same physical register. This also covers the FP32 requirement for odd-numbered singles to be considered callee-saved as they are part of the even-numbered double-precision register that will be preserved.

As FP32 and FPXX are inherently compatible, the fact that FP32 operates correctly in FRE mode means that FPXX will also operate correctly.
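The redirection described in this section can be sketched as follows, assuming (as the text describes) that the physical register file in FRE mode is the FR=1 one and that only single-precision accesses to odd-numbered registers are redirected. `fre_fpu` and the accessor functions are hypothetical names used for illustration; the sketch says nothing about how hardware or the kernel actually implements the redirection.

```c
#include <stdint.h>

/* Hypothetical FRE-mode model: 32 x 64-bit physical registers (FR=1 layout). */
typedef struct { uint64_t r64[32]; } fre_fpu;

/* Single-precision write: an even register maps to the low half of $f<reg>;
 * an odd register is redirected to the high half of $f<reg-1>.
 * (Upper bits of an even single write are preserved in this model.) */
static void fre_write_single(fre_fpu *f, int reg, uint32_t bits)
{
    if (reg & 1)
        f->r64[reg - 1] = (f->r64[reg - 1] & 0x00000000FFFFFFFFull) | ((uint64_t)bits << 32);
    else
        f->r64[reg] = (f->r64[reg] & 0xFFFFFFFF00000000ull) | bits;
}

/* Single-precision read, using the same redirection. */
static uint32_t fre_read_single(const fre_fpu *f, int reg)
{
    return (reg & 1) ? (uint32_t)(f->r64[reg - 1] >> 32) : (uint32_t)f->r64[reg];
}

/* Double-precision accesses are not redirected. */
static void fre_write_double(fre_fpu *f, int reg, uint64_t bits) { f->r64[reg] = bits; }
static uint64_t fre_read_double(const fre_fpu *f, int reg) { return f->r64[reg]; }
```

Writing a double to `$f20` and then reading the single `$f21` yields the upper 32 bits of that double, and writing `$f21` updates the upper half of `$f20`: the FR=0 behaviour that FP32 code relies on.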

8. Adding MSA

The MIPS SIMD Architecture (-mmsa) is defined to require FR=1 mode and assumes the register layout provided by that mode. In particular, it includes 32 128-bit registers, which overlay the 32 64-bit registers defined on an FR=1 mode FPU. MSA is compatible with the O32 FP64 ABI; the MSA callee-saved registers are represented in Figure 6:

|      | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
|      |  T |  T |  T |  T |  T |  T |  T |  T |  T |  T |  T |  T |
|      |  T |  T |  T |  T |  T |  T |  T |  T |  T |  T |  T |  T |
| Hi32 | Sd |  T | Sd |  T | Sd |  T | Sd |  T | Sd |  T | Sd |  T |
| Lo32 | Sd |  T | Sd |  T | Sd |  T | Sd |  T | Sd |  T | Sd |  T |
       Figure 6: O32 FP64 callee-saved registers including MSA

8.1. MSA O32 ABI Extension

There are no MSA-specific extensions to the O32 ABI. The calling convention remains identical to that of the base ABI, but MSA can only be deployed in conjunction with the O32 FP64 or O32 FP64A extensions.

9. Re-definition of the -mabi=32 -mfp64 ABI

The -mabi=32 -mfp64 ABI variant will change and become compliant with this specification. Based on all the information available, this ABI variant has not been used beyond experimental testing, so redefining it will not cause significant disruption. The pre-existing definition triggers the use of .gnu_attribute 4,4, which will continue to be reserved. Any module carrying this attribute can only be linked with other modules having the same attribute.

Changes to O32 FP64:

10. Assembler/Linker and ELF flags

FPXX, FP64 and FP64A objects require mechanisms to record the fact they use a non-default FP ABI. This information is recorded in two ways as code flows through the tools:

The following sections explain how the tools handle these flags.

10.1. ELF attributes, header and program flags

The new flags are defined as follows:

#include <stdint.h>

typedef struct
{
  /* Version of flags structure.  */
  uint16_t version;
  /* The level of the ISA: 1-5, 32, 64.  */
  uint8_t isa_level;
  /* The revision of ISA: 0 for MIPS V and below, 1-n otherwise.  */
  uint8_t isa_rev;
  /* The size of general purpose registers.  */
  uint8_t gpr_size;
  /* The size of co-processor 1 registers.  */
  uint8_t cpr1_size;
  /* The size of co-processor 2 registers.  */
  uint8_t cpr2_size;
  /* The floating-point ABI.  */
  uint8_t fp_abi;
  /* Processor-specific extension.  */
  uint32_t isa_ext;
  /* Mask of ASEs used.  */
  uint32_t ases;
  /* Mask of general flags.  */
  uint32_t flags1;
  uint32_t flags2;
} Elf_Internal_ABIFlags_v0;

/* Values for the xxx_size bytes of an ABI flags structure.  */

#define AFL_REG_NONE         0x00       /* No registers.  */
#define AFL_REG_32           0x01       /* 32-bit registers.  */
#define AFL_REG_64           0x02       /* 64-bit registers.  */
#define AFL_REG_128          0x03       /* 128-bit registers.  */

/* Masks for the ases word of an ABI flags structure.  */

#define AFL_ASE_DSP          0x00000001 /* DSP ASE.  */
#define AFL_ASE_DSPR2        0x00000002 /* DSP R2 ASE.  */
#define AFL_ASE_EVA          0x00000004 /* Enhanced VA Scheme.  */
#define AFL_ASE_MCU          0x00000008 /* MCU (MicroController) ASE.  */
#define AFL_ASE_MDMX         0x00000010 /* MDMX ASE.  */
#define AFL_ASE_MIPS3D       0x00000020 /* MIPS-3D ASE.  */
#define AFL_ASE_MT           0x00000040 /* MT ASE.  */
#define AFL_ASE_SMARTMIPS    0x00000080 /* SmartMIPS ASE.  */
#define AFL_ASE_VIRT         0x00000100 /* VZ ASE.  */
#define AFL_ASE_MSA          0x00000200 /* MSA ASE.  */
#define AFL_ASE_MIPS16       0x00000400 /* MIPS16 ASE.  */
#define AFL_ASE_MICROMIPS    0x00000800 /* MICROMIPS ASE.  */
#define AFL_ASE_XPA          0x00001000 /* XPA ASE.  */

/* Values for the isa_ext word of an ABI flags structure.  */

#define AFL_EXT_XLR           1  /* RMI Xlr instruction.  */
#define AFL_EXT_OCTEON2       2  /* Cavium Networks Octeon2.  */
#define AFL_EXT_OCTEONP       3  /* Cavium Networks OcteonP.  */
#define AFL_EXT_LOONGSON_3A   4  /* Loongson 3A.  */
#define AFL_EXT_OCTEON        5  /* Cavium Networks Octeon.  */
#define AFL_EXT_5900          6  /* MIPS R5900 instruction.  */
#define AFL_EXT_4650          7  /* MIPS R4650 instruction.  */
#define AFL_EXT_4010          8  /* LSI R4010 instruction.  */
#define AFL_EXT_4100          9  /* NEC VR4100 instruction.  */
#define AFL_EXT_3900         10  /* Toshiba R3900 instruction.  */
#define AFL_EXT_10000        11  /* MIPS R10000 instruction.  */
#define AFL_EXT_SB1          12  /* Broadcom SB-1 instruction.  */
#define AFL_EXT_4111         13  /* NEC VR4111/VR4181 instruction.  */
#define AFL_EXT_4120         14  /* NEC VR4120 instruction.  */
#define AFL_EXT_5400         15  /* NEC VR5400 instruction.  */
#define AFL_EXT_5500         16  /* NEC VR5500 instruction.  */
#define AFL_EXT_LOONGSON_2E  17  /* ST Microelectronics Loongson 2E.  */
#define AFL_EXT_LOONGSON_2F  18  /* ST Microelectronics Loongson 2F.  */

/* Masks for the flags1 word of an ABI flags structure.  */
#define AFL_FLAGS1_ODDSPREG   1  /* Uses odd single-precision registers.  */

10.2. Assembler

Assembler options:

| Option         | Meaning                                                |
| -mfp32         | Sets the mode to FP32. (default)                       |
| -mfpxx         | Sets the mode to FPXX.                                 |
| -mfp64         | Sets the mode to FP64.                                 |
| -modd-spreg    | Enable use of odd-numbered single-precision registers  |
|                | (if the architecture supports them)                    |
| -mno-odd-spreg | Disable use of odd-numbered single-precision registers |

Assembly directives:

| Directive          | Meaning                                                       |
| .module fp=32      | Equivalent to -mfp32                                          |
|                    | Implies gnu_attribute 4,1                                     |
| .module fp=xx      | Equivalent to -mfpxx                                          |
|                    | Implies gnu_attribute 4,5                                     |
| .module fp=64      | Equivalent to -mfp64                                          |
|                    | Implies gnu_attribute 4,6                                     |
| .module oddspreg   | Equivalent to -modd-spreg                                     |
| .module nooddspreg | Equivalent to -mno-odd-spreg                                  |
| .set fp=32         | Indicates the start of an FP32 region                         |
| .set fp=xx         | Indicates the start of an FPXX region                         |
| .set fp=64         | Indicates the start of an FP64 region                         |
| .set oddspreg      | Indicates the start of an odd-single region                   |
| .set nooddspreg    | Indicates the start of a no odd-single region                 |
| .gnu_attribute 4,0 | No floating point is present in the module (default)          |
| .gnu_attribute 4,1 | FP code in the module uses the FP32 ABI for a 32-bit ABI      |
| .gnu_attribute 4,1 | FP code in the module uses 64-bit registers for a 64-bit ABI  |
| .gnu_attribute 4,2 | FP code in the module only uses single precision ABI          |
| .gnu_attribute 4,3 | FP code in the module uses soft-float ABI                     |
| .gnu_attribute 4,4 | FP code in the module assumes an FPU with FR=1 and has 12     |
|                    | callee-saved doubles. Historic, no longer supported.          |
| .gnu_attribute 4,5 | FP code in the module uses the FPXX ABI                       |
| .gnu_attribute 4,6 | FP code in the module uses the FP64 ABI                       |
| .gnu_attribute 4,7 | FP code in the module uses the FP64A ABI                      |

The .module directive should be handled as if the equivalent setting were applied on the command line. .module directives must not be allowed after code has been generated and as such must come first in assembly text. The command line options, .module directive and .gnu_attribute directive can result in a gnu_attribute being added to a module. If multiple .gnu_attribute directives are seen in assembly text then the last one for each tag is taken to be the overall value.

If a .gnu_attribute 4 directive is seen in the assembly text then its value is used regardless of the effective command line options (i.e. the result of combining command line options with .module directives). An explicit .gnu_attribute must however be checked against the effective command line options and any inconsistencies reported via a warning. One exception is that an explicit .gnu_attribute 4,0 is exempt from any checks against the effective command line options. The FP ABI inferred for each set of effective command line options is given below.

If there is no explicit .gnu_attribute 4 then one is introduced by inferring the floating-point ABI from the effective command line options:

| Options                        | Value |
| -mabi=32 -mfp32                |   1   |
| -mabi=64 -mfp64                |   1   |
| -msingle-float                 |   2   |
| -msoft-float                   |   3   |
| -mabi=32 -mfpxx                |   5   |
| -mabi=32 -mfp64 -modd-spreg    |   6   |
| -mabi=32 -mfp64 -mno-odd-spreg |   7   |

For example, a file containing .module fp=xx that is assembled with -mabi=32 -mfp64 has the effective options -mabi=32 -mfpxx.
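The inference table can be expressed as a small C function. `struct effective_opts`, `infer_fp_attribute` and `apply_module_fp` are hypothetical names; the function only covers the rows listed in the table and returns -1 for anything else (for example N32), rather than guessing.

```c
#include <stdbool.h>

/* Hypothetical summary of the effective options, i.e. the command line
 * combined with any .module directives. */
enum float_kind { HARD_FLOAT, SINGLE_FLOAT, SOFT_FLOAT };
enum fp_mode    { FP32, FPXX, FP64 };

struct effective_opts {
    int abi;              /* 32 for O32, 64 for N64 */
    enum float_kind kind;
    enum fp_mode fp;
    bool odd_spreg;
};

/* Return the inferred .gnu_attribute 4 value, or -1 for a combination
 * that the table does not cover. */
static int infer_fp_attribute(const struct effective_opts *o)
{
    if (o->kind == SINGLE_FLOAT) return 2;
    if (o->kind == SOFT_FLOAT)   return 3;
    if (o->abi == 64 && o->fp == FP64) return 1;
    if (o->abi == 32) {
        switch (o->fp) {
        case FP32: return 1;
        case FPXX: return 5;
        case FP64: return o->odd_spreg ? 6 : 7;
        }
    }
    return -1;
}

/* A ".module fp=..." directive overrides the -mfp* command-line option. */
static void apply_module_fp(struct effective_opts *o, enum fp_mode fp)
{
    o->fp = fp;
}
```

For the example in the previous paragraph, starting from `{ 32, HARD_FLOAT, FP64, true }` and calling `apply_module_fp(&o, FPXX)` changes the inferred value from 6 to 5.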

10.2.1. .MIPS.abiflags

A new implicitly generated section will be present on all new modules. The section contains a versioned data structure which represents essential information to allow a program loader to determine the requirements of the application. ELF e_flags currently contain some of this information but space is limited and e_flags are not available after loading an application.

The structure is versioned to allow for future extensions. The initial version is 0.

10.2.2. GNU ifunc - work in progress please ignore

10.3. Linker

The mode resulting from combining any two objects using the O32 ABI is shown in Figure 7. The column and row headings relate to the suffix of the Val_GNU_MIPS_ABI_FP_* values. DOUBLE equates to the O32 FP32 extension.

|        | ANY    | DOUBLE | FPXX   | FP64A | FP64 |
| ANY    | ANY    | DOUBLE | FPXX   | FP64A | FP64 |
| DOUBLE | DOUBLE | DOUBLE | DOUBLE | err   | err  |
| FPXX   | FPXX   | DOUBLE | FPXX   | FP64A | FP64 |
| FP64A  | FP64A  | err    | FP64A  | FP64A | FP64 |
| FP64   | FP64   | err    | FP64   | FP64  | FP64 |
Figure 7: Combining FP modes 
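Figure 7 translates directly into a lookup table. The sketch below uses hypothetical names (`merge_table`, `merge_fp_abi`, `merge_all`); it folds the FP ABIs of all input objects and stops at the first conflict.

```c
/* Suffixes of the Val_GNU_MIPS_ABI_FP_* values used in Figure 7. */
enum fp_abi { ANY, DOUBLE, FPXX, FP64A, FP64, FP_ERR };

/* Figure 7, transcribed row by row; FP_ERR marks a link error. */
static const enum fp_abi merge_table[5][5] = {
    /*              ANY     DOUBLE  FPXX    FP64A   FP64   */
    /* ANY    */ { ANY,    DOUBLE, FPXX,   FP64A,  FP64   },
    /* DOUBLE */ { DOUBLE, DOUBLE, DOUBLE, FP_ERR, FP_ERR },
    /* FPXX   */ { FPXX,   DOUBLE, FPXX,   FP64A,  FP64   },
    /* FP64A  */ { FP64A,  FP_ERR, FP64A,  FP64A,  FP64   },
    /* FP64   */ { FP64,   FP_ERR, FP64,   FP64,   FP64   },
};

static enum fp_abi merge_fp_abi(enum fp_abi a, enum fp_abi b)
{
    if (a == FP_ERR || b == FP_ERR)
        return FP_ERR;
    return merge_table[a][b];
}

/* Fold the FP ABIs of all input objects, stopping at the first conflict. */
static enum fp_abi merge_all(const enum fp_abi *objs, int n)
{
    enum fp_abi result = ANY;
    for (int i = 0; i < n && result != FP_ERR; i++)
        result = merge_fp_abi(result, objs[i]);
    return result;
}
```

Because the table is symmetric and behaves like a join (ANY below FPXX, FPXX below DOUBLE and FP64A, FP64A below FP64, DOUBLE incompatible with both FP64 variants), the result does not depend on the order in which objects are merged; for example {FPXX, DOUBLE, ANY} merges to DOUBLE, while {DOUBLE, FPXX, FP64A} is a link error.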

A warning should be emitted (on request) showing which modules have forced the use of a specific mode, to help track down why an executable or shared library requires that mode. This warning only occurs if there is at least one O32 FPXX object and one mode-specific object, and it should be limited to reporting the first such object rather than all of them. The option for this is --warn-forced-fp-mode.

10.3.1 Handling .MIPS.abiflags

The new .MIPS.abiflags section is merged in a similar manner to other private flags, and consistency checks between e_flags and the .MIPS.abiflags content are also required. The fields are merged as follows: for register sizes, the maximum value is taken; for the FP ABI, the same rules are used as for merging gnu_attribute 4; ASEs are OR'd together; ISA extensions are updated to reflect the largest encompassing extension; ISA level and ISA revision follow the rules for merging architectures in e_flags; the flags1 bitmask is simply OR'd together; and flags2 bits are reserved, so any unknown bit being set should cause a link error. A single structure is emitted and tagged with a PT_MIPS_ABIFLAGS program header. This program header must be emitted immediately after the PT_INTERP header and displaces the PT_MIPS_REGINFO header that would otherwise follow PT_INTERP. The .MIPS.abiflags section is placed as the first read-only data section so that it is as close to the program headers as possible; this allows a program loader to read a fixed amount of data from an ELF file and, in general, get the .MIPS.abiflags content as part of that read.
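The merge rules above can be sketched in C as follows. This is a simplified, hypothetical implementation (`abiflags_v0`, `merge_fp_abi` and `merge_abiflags` are illustrative names): it encodes the FP ABI by its gnu_attribute 4 value and applies Figure 7, but it approximates the e_flags architecture-merging rules by taking the higher ISA level and revision, and it only accepts identical or absent ISA extensions instead of computing the largest encompassing one.

```c
#include <stdint.h>
#include <stdbool.h>

/* Same fields as Elf_Internal_ABIFlags_v0 in section 10.1. */
typedef struct {
    uint16_t version;
    uint8_t isa_level, isa_rev;
    uint8_t gpr_size, cpr1_size, cpr2_size;
    uint8_t fp_abi;
    uint32_t isa_ext, ases, flags1, flags2;
} abiflags_v0;

static uint8_t max_u8(uint8_t a, uint8_t b) { return a > b ? a : b; }

static bool is_o32_hard_fp(int v) { return v == 1 || v == 5 || v == 6 || v == 7; }

/* Figure 7 on gnu_attribute 4 values
 * (0 ANY, 1 DOUBLE, 5 FPXX, 6 FP64, 7 FP64A); -1 means a link error. */
static int merge_fp_abi(int a, int b)
{
    if (a == b) return a;
    if (a == 0) return b;
    if (b == 0) return a;
    if (!is_o32_hard_fp(a) || !is_o32_hard_fp(b)) return -1;
    if (a == 5) return b;                                  /* FPXX adopts the other mode */
    if (b == 5) return a;
    if ((a == 6 && b == 7) || (a == 7 && b == 6)) return 6; /* FP64A + FP64 */
    return -1;                                             /* DOUBLE with FP64/FP64A */
}

/* Merge 'in' into 'out'. Returns false on a link error. */
static bool merge_abiflags(abiflags_v0 *out, const abiflags_v0 *in)
{
    if (in->version != 0 || in->flags2 != 0)  /* every flags2 bit is reserved */
        return false;

    /* Simplification: take the higher (level, revision) pair. */
    if (in->isa_level > out->isa_level ||
        (in->isa_level == out->isa_level && in->isa_rev > out->isa_rev)) {
        out->isa_level = in->isa_level;
        out->isa_rev = in->isa_rev;
    }

    out->gpr_size  = max_u8(out->gpr_size,  in->gpr_size);
    out->cpr1_size = max_u8(out->cpr1_size, in->cpr1_size);
    out->cpr2_size = max_u8(out->cpr2_size, in->cpr2_size);

    int fp = merge_fp_abi(out->fp_abi, in->fp_abi);
    if (fp < 0)
        return false;
    out->fp_abi = (uint8_t)fp;

    /* Simplification: only identical or absent extensions are merged. */
    if (out->isa_ext == 0)
        out->isa_ext = in->isa_ext;
    else if (in->isa_ext != 0 && in->isa_ext != out->isa_ext)
        return false;

    out->ases   |= in->ases;
    out->flags1 |= in->flags1;
    return true;
}
```

On failure the partially updated `out` should be discarded; a real linker would also report which input caused the conflict.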

10.3.2 Handling link-incompatible ABI extensions

The FP64 and FP64A ABI extensions are link-incompatible with pre-existing objects. For pre-existing static linkers this is already guarded against, as the extensions are marked using .gnu_attribute 4 values which have been in use for a considerable time. However, pre-existing dynamic linkers do not read the GNU attributes and so cannot reject objects using these new ABIs. For this reason the ELF ABIVERSION is increased when using FP64 or FP64A: ABIVERSION 3 indicates the use of FP64 or FP64A, and any dynamic linker which supports ABIVERSION 3 will respond to the PT_MIPS_ABIFLAGS program header.

10.4. Program loader

Several new hardware capability bits are defined to support the new O32 ABI variants:

#define HWCAP_MIPS_R6   (1 << 0)  /* MIPSr6 or above; implies FR=1 is the only FR mode supported.  */
#define HWCAP_MIPS_MSA  (1 << 1)  /* MSA is supported.  */

10.4.1. Basic mode set-up

Pre-existing program loaders will assume that O32 executables need to run in FR=0 mode; for FP32 and FPXX executables this is acceptable. FP64 executables will be loaded and run but will execute incorrectly; this is an accepted failure.

New program loaders must acknowledge the PT_MIPS_ABIFLAGS header and, for O32, respond to the fp_abi field. The following tables show the mode in which the FPU should be set for each fp_abi value. The rows are named from the suffixes of the Val_GNU_MIPS_ABI_FP_* values.

Behaviour for O32 ABI on ISA<=MIPS32r5

| FPABI        | FPU enable | FR mode | FRE mode | Notes                                                                 |
| not present  | Yes        | 0       | 0        | No PT_MIPS_ABIFLAGS header is present                                 |
| ANY          | Yes        | 0       | 0        |                                                                       |
| DOUBLE       | Yes        | 0       | 0        |                                                                       |
| SOFT         | No         | 0       | 0        |                                                                       |
| FPXX         | Yes        | 0/1     | 0        | FR=1 is only permitted from MIPS32r2 onwards                          |
| FP64         | Yes        | 1       | 0        | Lack of FR=1 support should result in an error or full emulation      |
| FP64A        | Yes        | 1       | 0        | Lack of FR=1 support should result in an error or full emulation      |
| DOUBLE+FP64A | Yes        | 1       | 1        | See "Special handling for dynamic executables". Lack of FR=1 or FRE=1 |
|              |            |         |          | should result in an error or full emulation                           |

Behaviour for O32 ABI on ISA>=MIPS32r6

| FPABI        | FPU enable | FR mode | FRE mode | Notes                                                                 |
| ANY          | Yes        | 1       | 0        |                                                                       |
| not present  | Yes        | 1       | 1        | Lack of FRE support should result in an error or full emulation       |
| DOUBLE       | Yes        | 1       | 1        | Lack of FRE support should result in an error or full emulation       |
| SOFT         | No         | 1       | 0        |                                                                       |
| FPXX         | Yes        | 1       | 0        |                                                                       |
| FP64         | Yes        | 1       | 0        |                                                                       |
| FP64A        | Yes        | 1       | 0        |                                                                       |
| DOUBLE+FP64A | Yes        | 1       | 1        | See "Special handling for dynamic executables". Lack of FRE should    |
|              |            |         |          | result in an error or full emulation                                  |

Behaviour for N32/N64 ABI

| FPABI        | FPU enable | FR mode | FRE mode |
| ANY          | Yes        | 1       | 0        |
| not present  | Yes        | 1       | 0        |
| DOUBLE       | Yes        | 1       | 0        |
| SOFT         | No         | 0       | 0        |
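For the O32 rows, these tables can be read as a function from fp_abi and CPU revision to an FPU set-up. The sketch below is hypothetical: `o32_fpu_setup` and `want_fr1` are illustrative names, `want_fr1` stands in for whatever policy a loader uses to resolve the "0/1" entry for FPXX, and the checks for missing FR=1 or FRE support (which should lead to an error or full emulation) are left out.

```c
#include <stdbool.h>

enum loader_fp_abi { ABI_NOT_PRESENT, ABI_ANY, ABI_DOUBLE, ABI_SOFT,
                     ABI_FPXX, ABI_FP64, ABI_FP64A, ABI_DOUBLE_FP64A };

struct fpu_setup { bool enable; int fr; int fre; };

/* O32 rows of the two tables. 'isa_rev' is the MIPS32 revision of the CPU
 * (0 for pre-MIPS32 ISAs); 'want_fr1' is a hypothetical FPXX policy knob. */
static struct fpu_setup o32_fpu_setup(enum loader_fp_abi abi, int isa_rev, bool want_fr1)
{
    bool r6 = isa_rev >= 6;
    struct fpu_setup s = { true, r6 ? 1 : 0, 0 };  /* the ANY row */

    switch (abi) {
    case ABI_ANY:
        break;
    case ABI_SOFT:
        s.enable = false;
        break;
    case ABI_NOT_PRESENT:
    case ABI_DOUBLE:
        if (r6)
            s.fre = 1;          /* FP32 code on an FR=1-only CPU needs FRE */
        break;
    case ABI_FPXX:
        if (!r6)
            s.fr = (want_fr1 && isa_rev >= 2) ? 1 : 0;  /* FR=1 needs MIPS32r2+ */
        break;
    case ABI_FP64:
    case ABI_FP64A:
        s.fr = 1;
        break;
    case ABI_DOUBLE_FP64A:
        s.fr = 1;
        s.fre = 1;
        break;
    }
    return s;
}
```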

10.4.1. Special handling for dynamic executables

Dynamic executables have two components loaded by the program loader: the executable itself and its interpreter. The initial hardware mode required depends on the combined requirements of both the executable and the interpreter. Some combinations of executable and interpreter are simply not supported. Note that this matrix differs from the static linker compatibility matrix for DOUBLE+FP64A. The table is symmetric, i.e. it does not matter whether the rows are the executable and the columns the interpreter or the reverse.

Overall FPABI for combined executable and interpreter requirements
SOFT SOFT error error error SOFT
FP64 FP64 FP64 error

The mode requirements of the special multi-ABI combination shown as DOUBLE+FP64A are listed in the mode tables in the previous section.

10.4.3. GNU ifunc - work in progress please ignore

10.4.4. Co-ordinating mode requirements

Since program headers carry the mode requirements of an executable, these are already passed from the loader to the running process via the auxiliary vector.

10.5. Dynamic linker

The dynamic linker requires similar rules to the static linker but special handling is required if a process starts life without needing a specific mode. An O32 FPXX executable must of course begin in FR=0 or FR=1 mode as hardware requires a specific mode; as in the previous section FR=1 is recommended but no assumptions can be made. This raises the question of how the process can then make use of an FP32 shared library. From a correctness point of view, an O32 FPXX executable could run in FR=0 and could then directly link against an FP32 shared library. The problem is that the loader does not know what shared libraries will be used by an executable and hence cannot predict whether FR=0 or FR=1 is required overall. The dynamic linker therefore reserves the right to change the FR mode during loading of objects. An O32 FP64A executable has similar behaviour but instead of being able to cope with both FR=0 and FR=1 it requires FR=1 but can execute with FRE=0 or FRE=1.

To do this, the dynamic linker needs to track which of the four hard-float FP ABIs are present in the process at any given time and infer the required FPU mode from that. This information is carried in the program headers, which must be inspected at load time.

At any given time the current hardware mode must allow all loaded code to execute correctly. In other words, the hardware mode only changes when new libraries are loaded, never during normal execution.

The diagram below shows all the supported combinations of ABIs and the hardware modes that each combination requires. The edges of the graph represent adding or removing an ABI variant to/from the process. It is the task of the dynamic linker to:

  1. Allow or disallow the loading of a new object depending on both currently loaded objects and hardware support.
  2. Change the processor mode such that all currently loaded code can execute correctly.
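One way to meet both tasks is to intersect, over every loaded ABI, the set of modes in which that ABI can execute (the table in section 10.5.3) with the modes the hardware supports. The sketch below assumes that approach and is not the algorithm specified by this document: `pick_mode`, `mode_if_loaded`, the per-ABI object counts and the FR1, FR0, FRE preference order are all assumptions made for illustration, and the transition restriction affecting FPXX code that uses odd-numbered single-precision registers is ignored.

```c
#include <stdbool.h>

enum mode { MODE_FR0 = 1 << 0, MODE_FR1 = 1 << 1, MODE_FRE = 1 << 2 };

enum hard_abi { FP32, FPXX, FP64A, FP64, NUM_HARD_ABIS };

/* Modes each ABI can execute in (table in section 10.5.3). */
static const unsigned allowed_modes[NUM_HARD_ABIS] = {
    [FP32]  = MODE_FR0 | MODE_FRE,
    [FPXX]  = MODE_FR0 | MODE_FR1 | MODE_FRE,
    [FP64A] = MODE_FR1 | MODE_FRE,
    [FP64]  = MODE_FR1,
};

/* 'loaded' counts objects of each ABI in the process; 'hw_modes' is the
 * set of modes the CPU supports. Returns the mode to use, or 0 if no
 * single mode can run every loaded object. */
static unsigned pick_mode(const int loaded[NUM_HARD_ABIS], unsigned hw_modes)
{
    unsigned ok = hw_modes;
    for (int abi = 0; abi < NUM_HARD_ABIS; abi++)
        if (loaded[abi] > 0)
            ok &= allowed_modes[abi];

    /* Preference order chosen for this sketch only. */
    if (ok & MODE_FR1) return MODE_FR1;
    if (ok & MODE_FR0) return MODE_FR0;
    if (ok & MODE_FRE) return MODE_FRE;
    return 0;
}

/* Mode the process would need if one more object of ABI 'candidate' were
 * loaded; 0 means the object must be rejected. */
static unsigned mode_if_loaded(const int loaded[NUM_HARD_ABIS],
                               enum hard_abi candidate, unsigned hw_modes)
{
    int trial[NUM_HARD_ABIS];
    for (int i = 0; i < NUM_HARD_ABIS; i++)
        trial[i] = loaded[i];
    trial[candidate]++;
    return pick_mode(trial, hw_modes);
}
```

Under this preference order, a process containing only FPXX objects runs in FR=1, loading an FP32 library moves it to FR=0, adding an FP64A library then leaves FRE as the only option, and loading an FP64 library alongside the FP32 one would be refused.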


10.5.1. Loading shared libraries

Loading a new shared library into a process presents a new mode requirement. The dynamic linker's actions depend on the current mode requirement of all loaded objects, the mode the process is currently running in and the capabilities of the hardware:

The following algorithm determines if a candidate shared library can be loaded into the current process and sets up the correct hardware mode ready for it being loaded. This check should ideally be the last part of the decision about whether to load a given object but will not lead to corruption if it is run and the object does not end up being loaded. The algorithm operates without the need to explicitly track when an object is unloaded from a process.

To be completed.

10.5.2. Setting the FPU mode

An extension to the Linux prctl syscall is available to control the current FPU register mode. The current mode can be inspected and a new mode set. The prctl call also updates the register mode for all threads in the current thread-group. The definition of the new prctls is below:

#define PR_SET_FP_MODE 45
#define PR_GET_FP_MODE 46
#define PR_FP_MODE_FR  (1 << 0)
#define PR_FP_MODE_FRE (1 << 1)

prctl(PR_SET_FP_MODE, mode)

    Will switch to the FP mode specified by the mode (arg2) argument,
    which is some combination of PR_FP_MODE_* bits. All threads in the current
    thread group will switch modes prior to the prctl completing.

mode = prctl(PR_GET_FP_MODE)

    Will return the current FP mode.

RETURN: PR_SET_FP_MODE will return 0 on success. PR_GET_FP_MODE returns a non-negative value on success. Any failure will result in returning -1 and setting errno.

    EINVAL – arg2 was not a valid mode
    ENOTSUPP – arg2 was a valid setting but the current CPU does not support the mode.

On kernels that do not support the new prctl operations, -1 will be returned and errno set to EINVAL.
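A minimal Linux usage sketch of these prctls, assuming a libc whose <sys/prctl.h> declares prctl(); the fallback #defines repeat the values given above for older headers. `ensure_fp_mode` and `describe_fp_mode` are hypothetical helper names. On kernels or architectures without these operations the calls fail with -1 and errno set, as described above.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/prctl.h>

/* Fallback definitions matching the values above, for older headers. */
#ifndef PR_SET_FP_MODE
#define PR_SET_FP_MODE 45
#define PR_GET_FP_MODE 46
#endif
#ifndef PR_FP_MODE_FR
#define PR_FP_MODE_FR  (1 << 0)
#define PR_FP_MODE_FRE (1 << 1)
#endif

/* Switch the whole thread group to 'want' unless it is already in that mode.
 * Returns 0 on success, or -1 with errno set by prctl() on failure. */
static int ensure_fp_mode(int want)
{
    int cur = prctl(PR_GET_FP_MODE, 0UL, 0UL, 0UL, 0UL);
    if (cur < 0)
        return -1;
    if (cur == want)
        return 0;
    return prctl(PR_SET_FP_MODE, (unsigned long)want, 0UL, 0UL, 0UL);
}

/* Render a mode value for diagnostics, e.g. "FR=1 FRE=0". */
static void describe_fp_mode(int mode, char *buf, size_t len)
{
    snprintf(buf, len, "FR=%d FRE=%d",
             (mode & PR_FP_MODE_FR) ? 1 : 0,
             (mode & PR_FP_MODE_FRE) ? 1 : 0);
}
```

Checking the current mode first avoids a needless thread-group-wide switch when the process is already in the required mode.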

Other operating systems must provide an interface with similar semantics to support dynamic mode switching described in this document.

10.5.3. Mode transition restrictions

As shown in the diagram in section 10.5, most of the floating-point ABI extensions can execute in multiple modes. A mode change may happen at an arbitrary point of execution in a multi-threaded executable, as all threads in a process must change mode simultaneously. Unfortunately, not all code can be interrupted and survive a mode change; specifically, this affects code using the FPXX ABI extension together with odd-numbered single-precision registers.

Of the various alternatives for implementing a mode change the most flexible is to treat all floating-point registers as double-precision and store them to memory, switch modes, and load them back to the same register numbers. This also happens to be the simplest and most natural fit to context switching code which is implemented in terms of double-precision data. The description below assumes a double-precision based context save/restore.

Why can't FPXX with odd-numbered single-precision registers survive all possible mode switches?

To answer this, it is important to consider three things: the impact that a mode switch has on each class of register, the modes in which each ABI extension can execute, and the classes of register that each ABI extension can use.

The table below shows all combinations of start and end modes and whether each class of register survives the transition. N/A marks a transition where the register class exists in only one of the two modes. In that case its data would either have to be invented from nowhere or would simply vanish.

| start mode  |    FR0    |    FR1    |    FRE    |	
| end mode    | FR1 | FRE | FR0 | FRE | FR0 | FR1 |
| even single | yes | yes | yes | yes | yes | yes |
| even double | yes | yes | yes | yes | yes | yes |
| odd single  | no  | yes | no  | no  | yes | no  |
| odd double  | N/A | N/A | N/A | yes | N/A | yes |

The next table shows which modes each ABI extension can execute in:

| ABI extension              | FR0 | FR1 | FRE |
| FP32                       | yes | no  | yes |
| FPXX (without odd-singles) | yes | yes | yes |
| FPXX (with odd-singles)    | yes | yes | yes |
| FP64A                      | no  | yes | yes |
| FP64                       | no  | yes | no  |

Finally the register classes usable by each ABI extension:

| ABI extension              | even | even | odd  | odd  |
|                            | sgl  | dbl  | sgl  | dbl  |
| FP32                       | yes  | yes  | yes  | no   |
| FPXX (without odd-singles) | yes  | yes  | no   | no   |
| FPXX (with odd-singles)    | yes  | yes  | yes  | no   |
| FP64A                      | yes  | yes  | no   | yes  |
| FP64                       | yes  | yes  | yes  | yes  |

The only class of register that can fail to transition is odd-singles. This restricts the affected ABI extensions to FP32, FPXX (with odd-singles) and FP64.

- FP64 is not a problem because it can only execute in FR1 mode, so no mode change is possible.
- FP32 is not a problem because it cannot execute in FR1 mode. Only transitions between FR0 and FRE can occur, and odd-singles survive both of these.

This leaves FPXX with odd-singles as the only ABI extension that can fail.

10.5.4. GNU ifunc - Work in progress please ignore

11. Compiler options

11.1. FP mode options and defaults

| Option | Meaning                                                             |
| -mfp32 | Disables code generation restrictions that would be required        |
|        | to create FPXX code.                                                |
|        | Emits a .module fp=32 directive at the start of the assembly output |
|        | Targets FP32 calling convention.                                    |
| -mfpxx | Enables code generation restrictions that are required to create    |
|        | FPXX code.                                                          |
|        | Emits a .module fp=xx directive at the start of the assembly output.|
|        | Targets FPXX calling convention.                                    |
| -mfp64 | Emits a .module fp=64 directive at the start of the assembly output |
|        | Targets FP64 calling convention.                                    |

Ideally, any MIPS I arch/core will default to -mfp32, and any MIPS II to MIPS32r2 arch/core will default to -mfpxx. However, this should be controlled via a configure-time option that adjusts the default ABI from O32 FP32 to O32 FPXX (or O32 FP64 as needed). The new configure-time option is --with-fp-32=[32|xx|64]. It affects the FP mode only when targeting the O32 ABI.

Not all architectures/cores can support all FP modes, as shown in Figure 9. Requesting an unsupported combination is a compiler error.

| Arch     | Use -mfp32 | Use -mfpxx | Use -mfp64               |
| mips1    | Yes        | No         | No                       |
| others   | Yes        | Yes        | If MTHC1/MFHC1 supported |
| mips32r6 | No         | Yes        | Yes                      |
        Figure 9: Valid compiler FP mode overrides

11.2. Odd-numbered single-precision registers

A critical aspect of the O32 interlinking plan is careful control over the use of odd-numbered single-precision registers. In particular, the O32 FPXX extension should almost always be used without odd-singles. There are two reasons for this:

  1. It is not possible to perform multi-threaded dynamic-mode switching when O32-FPXX code that uses odd-singles is loaded
  2. It is not possible to statically determine if O32-FPXX assembly code is compliant with the ABI when odd-singles are used.

The ability to control the use of odd-numbered single-precision registers serves two further purposes. It is used when targeting the FP64A extension. It is also used to generate generic MIPS32 code that is compatible with processors such as loongson-3a, which do not support arithmetic operations on odd-singles.

Despite this change to the default behaviour, O32 FPXX does not forbid the use of all 32 single-precision registers. However, the warnings above must be made clear in documentation.

The option for this feature is:

| Option         | Meaning                                                         |
| -modd-spreg    | Enable odd-numbered single-precision floating point registers   |
|                | if supported by the current architecture                        |
| -mno-odd-spreg | Disable odd-numbered single-precision floating point registers  |

These options are not supported for 64-bit ABIs. Compilers should default to -mno-odd-spreg for O32-FPXX, and -mno-odd-spreg should always be in effect for loongson-3a.

System integrators may choose to mandate whether odd-numbered single-precision registers are available by default. A configure-time option, --with-odd-spreg-32=yes|no, is therefore available to control this. Using both --with-fp-32=64 and --with-odd-spreg-32=no sets the default O32 ABI extension to FP64A.

11.3. Function attributes

Individual functions can be annotated with an attribute to indicate the FP mode that they should use. There are 3 attributes to match the 3 assembler directives:

fp32, fpxx, fp64

Similar attributes are also supported for selecting whether odd registers are enabled or not:

odd-spreg, no-odd-spreg

These attributes follow the same rules as the assembler directives listed in Section 9.2.

12. Interaction with context related functions

12.1. setjmp/longjmp

No changes are required (nor permitted) to either the size or the alignment of the jmp_buf structure used by setjmp/longjmp. Furthermore, the behaviour of setjmp/longjmp is not affected by any of the calling conventions listed in section 6 or by the dynamic mode switching shown in section 9.5.

12.1.1. Calling convention

The implementation of setjmp/longjmp must conform to the code generation rules for its FP mode. When built as FPXX, these routines must use only FR-mode-aware instructions. The alignment of jmp_buf cannot be increased. It may therefore be necessary to reorganise the layout so that the 8-byte alignment required by LDC1/SDC1 is achieved dynamically.

The jmp_buf layout is opaque and can be changed without affecting existing objects. Both setjmp and longjmp are obtained from the same library and are always internally consistent.

12.1.2. Dynamic mode switching

The setjmp function implements the first half of the FP mode switching routine shown in Figure 8 and the longjmp function implements the second half. Any number of FP mode switches can then occur between a setjmp and longjmp and the program will behave correctly.

12.2. *context functions

The setcontext, getcontext, makecontext and swapcontext functions are impacted similarly to setjmp/longjmp.

13. Fall-back plan for MSA

A contingency plan to ensure that MSA can be used with pre-existing O32 FP32 objects is also required to give access to FR=1 mode for small regions of code contained within one function.

During development, this mode has been known as FP64 compatibility mode, selected with the option -mfp64-compat. The feature essentially switches a function into FP64 mode in the prologue and back to FP32 mode in the epilogue. It also ensures that any function calls are made in FP32 mode. Full details of this feature are beyond the scope of this document.

The options to trigger this compatibility mode will be:

-mmsa -mfp32

These options are sufficient to say that the function must operate as if in FP32 mode and therefore has to use the compatibility feature to enable use of the MSA instruction set.

14. Achievements

14.1. Debian

Debian is traditionally built for a MIPS II architecture and then used on newer architectures. The new ABI allows the next version of Debian to be built as O32 FPXX. It can then gain access to MSA on hardware that supports it, and can run on future architectures that may be FP64 only. Pre-existing Debian builds/versions will not be able to access MSA functionality.
Once the default O32 ABI is changed, it is expected that over the following months (or years) the entire Debian distribution will become O32 FPXX. It will then be able to access both FP64 features and MSA.

14.2. Android

Android can now gain access to an MSA-capable or FP64-only core by migrating its two major architecture-dependent parts:

- The base Android image is expected to be rebuilt from source when targeting a new architecture. It would therefore naturally become O32 FPXX, with some libraries that use MSA potentially being O32 FP64.
- Native apps will begin migrating towards O32 FPXX once the O32 FPXX ABI is introduced to the Android NDK in the near future.

By the time an MSA-capable Android device is available, native apps are expected to have been refreshed/updated/rebuilt using new tools and to have become O32 FPXX. This will happen in an ad hoc manner. However, the popularity of apps changes over time, so stale applications are likely to be less important.

Apps that do not get rebuilt will continue to run on an FR=0 system where the base Android image is O32 FPXX or O32 FP32. Such apps cannot, however, link with MSA-optimised libraries or run on an FR=1-only architecture.

14.3. FP64/MSA

Linking with FP64/MSA objects, for either bare-metal or Linux applications, becomes naturally possible for any architecture from MIPS II onwards. The resulting executables must, of course, run on an FR=1 and/or MSA-capable architecture. Over time, pre-built objects and libraries will migrate to O32 FPXX, so more and more software will be able to link naturally against FP64/MSA objects.

14.4. FR=1 systems

O32 FPXX binaries now carry forward to MIPS32r6, MIPS64r6 and beyond, which only implement FR=1. In particular, the rebuilt Debian and Android will naturally work on such architectures.

Appendix A. O32 Instruction Traces

A.1. FP32 code calling FP32

|                      |         FR=0 FPU            |        FR=1 FPU        |     FR=1 FRE=1 FPU     |
| mov $gp, 1           |                             |                        |                        |
| caller:              |                             |                        |                        |
| addiu $sp, $sp, -8   |                             |                        |                        |
| sdc1 $f20, 0($sp)    | store $f20:$f21 to stack    | store $f20 to stack    | store $f20 to stack    |
| mtc1 $gp, $f21       | $f21 == 1                   | $f21 == 1              | $f20[32-63] == 1       |
| jalr callee          |                             |                        |                        |
| nop                  |                             |                        |                        |
| > addiu $sp, $sp, -8 |                             |                        |                        |
| > sdc1 $f20, 0($sp)  | store $f20:$f21 to stack    | store $f20 to stack    | store $f20 to stack    |
| > mtc1 $0, $f21      | $f21 == 0                   | $f21 == 0              | $f20[32-63] == 0       | 
| > ldc1 $f20, 0($sp)  | $f21 == 1                   | $f21 == 0              | $f20[32-63] == 1       |
| > addiu $sp, $sp, 8  |                             |                        |                        |
| > jr $ra             |                             |                        |                        |
| > nop                |                             |                        |                        |
| mfc1 $2, $f21        | $2 == 1                     | ** $2 == 0 **          | $2 == 1                |
| ldc1 $f20, 0($sp)    | Restore $f20:$f21 from stack| Restore $f20 from stack| Restore $f20 from stack|
| addiu $sp, $sp, 8    |                             |                        |                        |
| jr $ra               |                             |                        |                        |
| nop                  |                             |                        |                        |

The existing O32 code cannot be run on an FR=1 FPU, as shown by the result marked with **. The callee code shown here uses SDC1/LDC1 on $f20 to preserve $f21. This is not strictly necessary, as $f21 could have been preserved using SWC1/LWC1 on $f21. Double-precision save/restore is used partly because of historic code generation behaviour. It is also used because one SDC1/LDC1 pair preserves both halves of $f20:$f21 at once.

It is therefore recommended that only double-precision save/restore is performed for callee-saved registers.

A.2. O32 FPXX code calling FPXX

|                      |         FR=0 FPU            |        FR=1 FPU        |      FR=1 FRE=1 FPU        |
| mov $gp, 1           |                             |                        |                            |
| caller:              |                             |                        |                            |
# addiu $sp, $sp, -16  #                             #                        #                            #
| sdc1 $f20, 0($sp)    | store $f20:$f21 to stack    | store $f20 to stack    | store $f20 to stack        |
| mtc1 $gp, $f21       | $f21 == 1                   | $f21 == 1              | $f20[32-63] == 1           |
# swc1 $f21, 8($sp)    # store $f21 to stack         # store $f21 to stack    # store $f20[32-63] to stack #
| jalr callee          |                             |                        |                            |
| nop                  |                             |                        |                            |
| > addiu $sp, $sp, -8 |                             |                        |                            |
| > sdc1 $f20, 0($sp)  | store $f20:$f21 to stack    | store $f20 to stack    | store $f20 to stack        |
| > mtc1 $0, $f21      | $f21 == 0                   | $f21 == 0              | $f20[32-63] == 0           | 
| > ldc1 $f20, 0($sp)  | $f21 == 1                   | $f21 == 0              | $f20[32-63] == 1           |
| > addiu $sp, $sp, 8  |                             |                        |                            |
| > jr $ra             |                             |                        |                            |
| > nop                |                             |                        |                            |
# lwc1 $f21, 8($sp)    # $f21 = 1                    # $f21 = 1               # $f20[32-63] = 1            #
# mfc1 $2, $f21        # $2 == 1                     # $2 == 1                # $2 == 1                    #
| ldc1 $f20, 0($sp)    | Restore $f20:$f21 from stack| Restore $f20 from stack| Restore $f20 from stack    |
# addiu $sp, $sp, 16   #                             #                        #                            #
| jr $ra               |                             |                        |                            |
| nop                  |                             |                        |                            |

Changes from existing O32 are marked with #. Note that the callee code is identical for O32 FP32 and O32 FPXX demonstrating that all combinations of O32 FP32 and O32 FPXX result in executables that operate correctly on FR=0 FPUs.

A.3. O32 FP64A code calling FP64A

|                      |        FR=1 FPU        |      FR=1 FRE=1 FPU        |
| mov $gp, 1           |                        |                            |
| caller:              |                        |                            |
| jr $ra               |                        |                            |
| nop                  |                        |                            |

A.4. O32 FP64 code calling FP64

|                      |        FR=1 FPU        |
| mov $gp, 1           |                        |
| caller:              |                        |
| addiu $sp, $sp, -8   |                        |
| mtc1 $gp, $f21       | $f21 == 1              |
| swc1 $f21, 0($sp)    | store $f21 to stack    |
| jalr callee          |                        |
| nop                  |                        |
| > mtc1 $0, $f21      | $f21 == 0              |
| > jr $ra             |                        |
| > nop                |                        |
| lwc1 $f21, 0($sp)    | $f21 = 1               |
| mfc1 $2, $f21        | $2 == 1                |
| addiu $sp, $sp, 8    |                        |
| jr $ra               |                        |
| nop                  |                        |

FR=0 and FR=1 FRE=1 are not shown above, as FP64 code cannot execute in these modes.

A.5. O32 FPXX code calling FP64

|                      |         FR=0 FPU            |        FR=1 FPU        |
| mov $gp, 1           |                             |                        |
| caller:              |                             |                        |
| addiu $sp, $sp, -16  |                             |                        |
| sdc1 $f20, 0($sp)    | store $f20:$f21 to stack    | store $f20 to stack    |
| mtc1 $gp, $f21       | $f21 == 1                   | $f21 == 1              |
| swc1 $f21, 8($sp)    | store $f21 to stack         | store $f21 to stack    |
| jalr callee          |                             |                        |
| nop                  |                             |                        |
| > mtc1 $0, $f21      | $f21 == 0                   | $f21 == 0              |
| > jr $ra             |                             |                        |
| > nop                |                             |                        |
| lwc1 $f21, 8($sp)    | $f21 = 1                    | $f21 = 1               |
| mfc1 $2, $f21        | $2 == 1                     | $2 == 1                |
| ldc1 $f20, 0($sp)    | Restore $f20:$f21 from stack| Restore $f20 from stack|
| addiu $sp, $sp, 16   |                             |                        |
| jr $ra               |                             |                        |
| nop                  |                             |                        |

A.6. O32 FP64 code calling FPXX

|                      |         FR=0 FPU         |        FR=1 FPU        |
| mov $gp, 1           |                          |                        |
| caller:              |                          |                        |
| addiu $sp, $sp, -8   |                          |                        |
| mtc1 $gp, $f21       | $f21 == 1                | $f21 == 1              |
| swc1 $f21, 0($sp)    | store $f21 to stack      | store $f21 to stack    |
| jalr callee          |                          |                        |
| nop                  |                          |                        |
| > addiu $sp, $sp, -8 |                          |                        |
| > sdc1 $f20, 0($sp)  | store $f20:$f21 to stack | store $f20 to stack    |
| > mtc1 $0, $f21      | $f21 == 0                | $f21 == 0              | 
| > ldc1 $f20, 0($sp)  | $f21 == 1                | $f21 == 0              |
| > addiu $sp, $sp, 8  |                          |                        |
| > jr $ra             |                          |                        |
| > nop                |                          |                        |
| lwc1 $f21, 0($sp)    | $f21 = 1                 | $f21 = 1               |
| mfc1 $2, $f21        | $2 == 1                  | $2 == 1                |
| addiu $sp, $sp, 8    |                          |                        |
| jr $ra               |                          |                        |
| nop                  |                          |                        |

Appendix B. GNU Attributes

/* Object attribute tags.  */
  /* 0-3 are generic.  */

  /* Floating-point ABI used by this object file.  */
  Tag_GNU_MIPS_ABI_FP = 4,

  /* MSA ABI used by this object file.  */

/* Object attribute values.  */
  /* Values defined for Tag_GNU_MIPS_ABI_FP.  */

  /* Not tagged or not using any ABIs affected by the differences.  */

  /* Using hard-float -mdouble-float.  */

  /* Using hard-float -msingle-float.  */

  /* Using soft-float.  */

  /* Using -mips32r2 -mfp64.  */
  Val_GNU_MIPS_ABI_FP_OLD_64 = 4,

  /* Using -mfpxx */

  /* Using -mips32r2 -mfp64.  */
  Val_GNU_MIPS_ABI_FP_64 = 6,

  /* Using -mips32r2 -mfp64 -mno-odd-spreg.  */
  Val_GNU_MIPS_ABI_FP_64A = 7,

  /* Values defined for Tag_GNU_MIPS_ABI_MSA.  */

  /* Not tagged or not using any ABIs affected by the differences.  */

  /* Using 128-bit MSA.  */
  Val_GNU_MIPS_ABI_MSA_128 = 1,

Appendix C. FPU hardware modes

This section summarises the behaviour of each supported floating-point mode.

32-bit FPRs - FR=0

FR=0 is a mode in which the FPU presents 32 32-bit registers. These registers are numbered from $f0 to $f31. Pairs of even and odd registers are used to form 64-bit containers and are numbered from $f0 to $f30. Double-precision operations are forbidden from executing with odd-numbered registers. Prior to MIPS32r1 the odd-numbered registers are also not usable for single-precision operations.

64-bit FPRs - FR=1

FR=1 is a mode in which the FPU presents 32 64-bit registers. These registers are numbered from $f0 to $f31, and every register can be used for either single-precision or double-precision operations. When a register is used for single-precision operations, the upper 32 bits of the register become undefined.

Hybrid FPRs - FR=1 FRE=1

MIPS32r5 introduces a new hardware mode called FRE=1. This emulation mode exists to bridge the gap between FR=0 and FR=1. It allows O32 FP32 software to keep running while code transitions to O32 FPXX and O32 FP64A. Because FRE=1 is used in conjunction with FR=1, the FPU presents 32 64-bit registers, but instructions have modified behaviour. Operations on 64-bit and wider formats execute in exactly the same manner as in FR=1 mode, but 32-bit formats have special handling. Three special cases exist:

  1. A write to an even-numbered single-precision register will not clobber the upper 32 bits of the 64-bit register.
  2. A write to an odd-numbered single-precision register will update the upper 32 bits of the neighbouring even-numbered double-precision register and will not clobber the lower 32 bits.
  3. A read from an odd-numbered single-precision register will read the upper 32 bits of the neighbouring even-numbered double-precision register.

All O32 hard-float ABI extensions except FP64 can execute correctly in this mode. Debug access to single precision data must account for the non-standard layout of single/double precision registers and redirect reads and writes as above.

This hardware mode is implemented via software emulation of all 32-bit formats. When enabled, FRE=1 makes instructions which use 32-bit formats raise a reserved instruction exception and kernel emulation performs the operation using the rules above.

Because of the performance cost of FRE=1, it should be enabled only as a last resort, not for every process regardless of whether it is actually needed.
