#
#  Nintendo64 RSP Info Text, Copyright (C) 1998 by Michael Tedder
#  --------------------------------------------------------------
#
#  Includes an overview and description of the RSP, an opcode matrix with
#  mnemonics and descriptions of most of the opcodes.
#

RSP Overview
------------

As you probably already know, the RSP is one half of the RCP (the other half
being the RDP).  This CPU inside the N64 is specifically designed to handle
lots of math, mostly used in 3D and audio.

Put simply, the RSP is a MIPS R4000 CPU with a "customized" version of the
MDMX extensions.  The opcode matrix for the RSP is compatible with the R4000,
but is nothing like the MDMX specification.

If you're already familiar with MMX on x86-based PCs, then you should have no
problems understanding the RSP, as the RSP utilizes SIMD (Single Instruction,
Multiple Data) technology.

Since the RSP is a signal processor, it operates on data very quickly without
any kind of error checking.  For example, it *is* possible to divide by zero
on the RSP.  (For those of you who are curious, the result of n/0 is 7FFFh,
where 'n' is any number.)

Another benefit of the RSP is that memory alignment issues are not enforced.
It is perfectly valid to execute: LW t0, 1(r0)
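
To make the unaligned case concrete, here is a small host-side C sketch of
what such a load would read.  The names rsp_lw and dmem are made up for this
example, big-endian byte order is assumed (the N64 is big-endian), and a word
that straddles the end of DMEM is deliberately not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define DMEM_SIZE 0x1000u   /* 4K of data memory */

/* Hypothetical helper: read a 32-bit big-endian word from any byte address. */
static uint32_t rsp_lw(const uint8_t *dmem, uint32_t addr)
{
    assert(addr + 3 < DMEM_SIZE);   /* wrap at the end of DMEM is not modelled */
    return ((uint32_t)dmem[addr]     << 24) |
           ((uint32_t)dmem[addr + 1] << 16) |
           ((uint32_t)dmem[addr + 2] <<  8) |
            (uint32_t)dmem[addr + 3];
}
```

With bytes 00 11 22 33 44 at the start of DMEM, LW t0, 1(r0) corresponds to
rsp_lw(dmem, 1) in this sketch.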

The RSP can *only* access its own 8K of memory (4K of DMEM and 4K of IMEM).
Everything must be DMA'ed to and from this area.  Furthermore, data can only
be accessed in DMEM and code can only be executed from IMEM.  (In other words,
the RSP cannot modify its own code; all of its reads and writes go to DMEM.)

DMEM is mapped at 04000000h and extends to 04000FFFh.  IMEM is mapped at
04001000h and extends to 04001FFFh.
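
The address ranges above can be written down as constants.  The following C
sketch is only an illustration; the macro and function names are invented
here and are not taken from any official header.

```c
#include <assert.h>
#include <stdint.h>

#define SP_DMEM_START 0x04000000u
#define SP_DMEM_END   0x04000FFFu
#define SP_IMEM_START 0x04001000u
#define SP_IMEM_END   0x04001FFFu

typedef enum { SP_REGION_NONE, SP_REGION_DMEM, SP_REGION_IMEM } sp_region;

/* Decide which RSP memory bank (if any) a mapped address falls into. */
static sp_region sp_classify(uint32_t addr)
{
    if (addr >= SP_DMEM_START && addr <= SP_DMEM_END) return SP_REGION_DMEM;
    if (addr >= SP_IMEM_START && addr <= SP_IMEM_END) return SP_REGION_IMEM;
    return SP_REGION_NONE;
}

/* Offset of a mapped address inside its 4K bank. */
static uint32_t sp_offset(uint32_t addr)
{
    assert(sp_classify(addr) != SP_REGION_NONE);
    return addr & 0xFFFu;
}
```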

Code for the RSP is either poked (written directly) to IMEM by the main CPU,
the R4300i, or it is DMA'ed using the RSP's DMA unit.  When preparing the RSP
to utilize new code, the SP_STATUS_HALT and SP_STATUS_BROKE bits need to be
written to the SP status register so the RSP halts execution.  Once the new
code is in IMEM, the RSP's PC register is set to the appropriate location and
the SP_STATUS_HALT and SP_STATUS_BROKE bits are cleared from the SP status
register.  The RSP continues to execute code until its status bits are changed
or until it executes a BREAK opcode.  If it encounters a BREAK opcode, the
SP_STATUS_HALT and SP_STATUS_BROKE bits are automatically set and an SP
interrupt is generated if the SP_STATUS_INTR_BREAK bit is set.
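
The sequence above can be sketched as a toy model in C.  This is not code for
real hardware: the rsp_model struct, the function names and the bit values are
assumptions made for this example, and the real status register is driven
through its own set/clear write bits that are not described in this text.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IMEM_SIZE 0x1000u

/* Assumed bit positions, for this model only. */
#define SP_STATUS_HALT       0x001u
#define SP_STATUS_BROKE      0x002u
#define SP_STATUS_INTR_BREAK 0x040u

typedef struct {
    uint32_t status;
    uint32_t pc;
    uint8_t  imem[IMEM_SIZE];
    int      interrupt_raised;
} rsp_model;

/* Halt, copy code into IMEM, set the PC, then let the RSP run. */
static void rsp_load_and_start(rsp_model *rsp, const void *code, size_t len,
                               uint32_t entry)
{
    assert(len <= IMEM_SIZE && entry < IMEM_SIZE);
    rsp->status |= SP_STATUS_HALT | SP_STATUS_BROKE;    /* 1. stop the RSP  */
    memcpy(rsp->imem, code, len);                       /* 2. new code      */
    rsp->pc = entry;                                    /* 3. entry point   */
    rsp->status &= ~(SP_STATUS_HALT | SP_STATUS_BROKE); /* 4. resume        */
}

/* What the text says happens when a BREAK opcode is executed. */
static void rsp_on_break(rsp_model *rsp)
{
    rsp->status |= SP_STATUS_HALT | SP_STATUS_BROKE;
    if (rsp->status & SP_STATUS_INTR_BREAK)
        rsp->interrupt_raised = 1;
}
```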


The Registers
-------------

The RSP has 32 32-bit general purpose registers (R0, T0, T1, SP, RA, etc) taken
from its R4000 counterpart.  There is no COP1, so there are no floating-point
registers.  The RSP also contains 32 128-bit "vector" registers ($v0, $v1, $v2,
$v3 ... $v31).

Each vector register is split into 8 16-bit "units".  Operations on these
vector registers always treat each unit as a 16-bit value.

The RSP also has 8 "hidden" 32-bit accumulators and a set of 4 flags.  The
accumulators cannot be directly accessed, but are used in operations that
require "carrying-over" of data.  The flags are 16 bits in size, and the
first flag is used as a bitmask for carry-over of data.  The purpose of the
other 3 flags is unknown at this time.

There is no real COP0 on the RSP, but MTC0/MFC0 are used to communicate with
the RSP and RDP registers.  The mapping is as follows:

                  COP0 reg        Memory-mapped N64 register
                  --------        --------------------------
                     0             SP memory address
                     1             SP DRAM DMA address
                     2             SP read DMA length
                     3             SP write DMA length
                     4             SP status
                     5             SP DMA full
                     6             SP DMA busy
                     7             SP semaphore

                     8             DP CMD DMA start
                     9             DP CMD DMA end
                    10             DP CMD DMA current
                    11             DP CMD status
                    12             DP clock counter
                    13             DP buffer busy counter
                    14             DP pipe busy counter
                    15             DP TMEM load counter

For example: MFC0 t0, r0  would move the RSP's DMA memory address into T0.
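
For a disassembler or emulator, the mapping above fits naturally into a lookup
table.  The C sketch below is one possible layout; the identifiers are
invented for this example and the strings are simply the names from the table.

```c
#include <assert.h>
#include <string.h>

static const char *const rsp_cop0_names[16] = {
    "SP memory address",  "SP DRAM DMA address",
    "SP read DMA length", "SP write DMA length",
    "SP status",          "SP DMA full",
    "SP DMA busy",        "SP semaphore",
    "DP CMD DMA start",   "DP CMD DMA end",
    "DP CMD DMA current", "DP CMD status",
    "DP clock counter",   "DP buffer busy counter",
    "DP pipe busy counter", "DP TMEM load counter"
};

/* Name of the register reached by MFC0/MTC0 with this register number. */
static const char *rsp_cop0_name(unsigned reg)
{
    assert(reg < 16);
    return rsp_cop0_names[reg];
}

/* Reverse lookup; returns -1 if the name is not in the table. */
static int rsp_cop0_index(const char *name)
{
    for (int i = 0; i < 16; i++)
        if (strcmp(rsp_cop0_names[i], name) == 0)
            return i;
    return -1;
}

/* Registers 0-7 belong to the SP, 8-15 to the DP. */
static int rsp_cop0_is_dp(unsigned reg) { return reg >= 8 && reg < 16; }
```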


The Opcode Matrix
-----------------

Again, the RSP is based upon an R4000, so many of the instructions have
identical mappings.  All of the vector opcodes are mapped into COP2 on the RSP.


       28..26                       Opcode
          0       1       2       3       4       5       6       7
31..29+-------+-------+-------+-------+-------+-------+-------+-------+
   0  |Special|RegIMM |   J   |  JAL  |  BEQ  |  BNE  | BLEZ  | BGTZ  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   1  | ADDI  | ADDIU | SLTI  | SLTIU | ANDI  |  ORI  | XORI  |  LUI  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   2  | COP0  |  ---  | COP2  |  ---  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   3  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   4  |  LB   |  LH   |  ---  |  LW   |  LBU  |  LHU  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   5  |  SB   |  SH   |  ---  |  SW   |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   6  |  ---  |  ---  | LWC2  |  ---  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   7  |  ---  |  ---  | SWC2  |  ---  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+



        2..0                   SPECIAL function
          0       1       2       3       4       5       6       7
 5..3 +-------+-------+-------+-------+-------+-------+-------+-------+
   0  |  SLL  |  ---  |  SRL  |  SRA  | SLLV  |  ---  | SRLV  | SRAV  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   1  |  JR   |  ---  |  ---  |  ---  |  ---  | BREAK |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   2  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   3  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   4  |  ADD  | ADDU  |  SUB  | SUBU  |  AND  |  OR   |  XOR  |  NOR  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   5  |  ---  |  ---  |  SLT  | SLTU  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   6  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   7  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+



       18..16                     REGIMM rt
          0       1       2       3       4       5       6       7
20..19+-------+-------+-------+-------+-------+-------+-------+-------+
   0  | BLTZ  | BGEZ  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   1  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   2  |BLTZAL |BGEZAL |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   3  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+



       23..21                      COP0 rs
          0       1       2       3       4       5       6       7
25..24+-------+-------+-------+-------+-------+-------+-------+-------+
   0  | MFC0  |  ---  |  ---  |  ---  | MTC0  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   1  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   2  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   3  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+



       23..21                      COP2 rs
          0       1       2       3       4       5       6       7
25..24+-------+-------+-------+-------+-------+-------+-------+-------+
   0  | MFC2  |  ---  | CFC2  |  ---  | MTC2  |  ---  | CTC2  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   1  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   2  |VECTOP |VECTOP |VECTOP |VECTOP |VECTOP |VECTOP |VECTOP |VECTOP |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   3  |VECTOP |VECTOP |VECTOP |VECTOP |VECTOP |VECTOP |VECTOP |VECTOP |
      +-------+-------+-------+-------+-------+-------+-------+-------+



        2..0                   VECTOP function
          0       1       2       3       4       5       6       7
 5..3 +-------+-------+-------+-------+-------+-------+-------+-------+
   0  | VMULF | VMULU | VRNDP | VMULQ | VMUDL | VMUDM | VMUDN | VMUDH |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   1  | VMACF | VMACU | VRNDN | VMACQ | VMADL | VMADM | VMADN | VMADH |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   2  | VADD  | VSUB  | VSUT  | VABS  | VADDC | VSUBC | VADDB | VSUBB |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   3  | VACCB | VSUCB | VSAD  | VSAC  | VSUM  | VSAW  |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   4  |  VLT  |  VEQ  |  VNE  |  VGE  |  VCL  |  VCH  |  VCR  | VMRG  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   5  | VAND  | VNAND |  VOR  | VNOR  | VXOR  | VNXOR |  ---  |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   6  | VRCP  | VRCPL | VRCPH | VMOV  | VRSQ  | VRSQL | VRSQH | VNOOP |
      +-------+-------+-------+-------+-------+-------+-------+-------+
   7  | VEXTT | VEXTQ | VEXTN |  ---  | VINST | VINSQ | VINSN |  ---  |
      +-------+-------+-------+-------+-------+-------+-------+-------+
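
The matrices above translate directly into shifts and masks.  The following C
sketch shows one way to pull the fields out of a 32-bit instruction word and
to find a value's row and column in a matrix.  The function and enum names
are made up for this example; the numeric values follow from the row/column
positions in the tables (for instance COP2 is row 2, column 2, i.e. 12h).

```c
#include <stdint.h>

/* Field extraction, using the bit ranges shown next to each matrix. */
static unsigned rsp_opcode(uint32_t insn) { return insn >> 26; }          /* 31..26 */
static unsigned rsp_rs(uint32_t insn)     { return (insn >> 21) & 0x1F; } /* 25..21 */
static unsigned rsp_rt(uint32_t insn)     { return (insn >> 16) & 0x1F; } /* 20..16 */
static unsigned rsp_funct(uint32_t insn)  { return insn & 0x3F; }         /*  5..0  */

/* Position of a field value inside an 8-column matrix. */
static unsigned matrix_row(unsigned v) { return v >> 3; }
static unsigned matrix_col(unsigned v) { return v & 7; }

enum {
    OP_SPECIAL = 0x00, OP_REGIMM = 0x01,
    OP_COP0    = 0x10, OP_COP2   = 0x12,
    OP_LWC2    = 0x32, OP_SWC2   = 0x3A
};

/* Rows 2 and 3 of the COP2 rs matrix are VECTOP, i.e. bit 25 is set. */
static int rsp_is_vectop(uint32_t insn)
{
    return rsp_opcode(insn) == OP_COP2 && (rsp_rs(insn) & 0x10) != 0;
}
```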


Elemental Specifiers
--------------------

Many of the vector opcodes are formed as:

                  V<op> $v<dest>, $v<src1>, $v<src2>([elem])

The [elem] specifier (elemental) is optional.  The following elemental
specifiers are encoded in the RS bitmask:

                            RS          Elem
                           ----       --------
                             0         <none>
                             1          ---
                             2          [0q]
                             3          [1q]
                             4          [0h]
                             5          [1h]
                             6          [2h]
                             7          [3h]
                             8          [0]
                             9          [1]
                             A          [2]
                             B          [3]
                             C          [4]
                             D          [5]
                             E          [6]
                             F          [7]

The elemental specifier tells the RSP to utilize a specific portion of the
second source register to do operations on.  Take the VADD instruction,
for example.  With no elemental specifier, the following C code would
perform the same operation as:

                           VADD $v2, $v0, $v1

    v2[0] = v0[0] + v1[0];  v2[1] = v0[1] + v1[1];  v2[2] = v0[2] + v1[2];
    v2[3] = v0[3] + v1[3];  v2[4] = v0[4] + v1[4];  v2[5] = v0[5] + v1[5];
    v2[6] = v0[6] + v1[6];  v2[7] = v0[7] + v1[7];


However, if an elemental specifier is used:

                          VADD $v2, $v0, $v1[3]

    v2[0] = v0[0] + v1[3];  v2[1] = v0[1] + v1[3];  v2[2] = v0[2] + v1[3];
    v2[3] = v0[3] + v1[3];  v2[4] = v0[4] + v1[3];  v2[5] = v0[5] + v1[3];
    v2[6] = v0[6] + v1[3];  v2[7] = v0[7] + v1[3];


A simple table can break down the source elements used for each specifier,
including the h and q forms:

            Specifier                 Elements used
          ------------            ---------------------
             <none>                  0 1 2 3 4 5 6 7
              [0q]                   0 1 2 3 0 1 2 3
              [1q]                   4 5 6 7 4 5 6 7
              [0h]                   0 1 0 1 0 1 0 1
              [1h]                   2 3 2 3 2 3 2 3
              [2h]                   4 5 4 5 4 5 4 5
              [3h]                   6 7 6 7 6 7 6 7
              [0]                    0 0 0 0 0 0 0 0
              [1]                    1 1 1 1 1 1 1 1
              [2]                    2 2 2 2 2 2 2 2
              [3]                    3 3 3 3 3 3 3 3
              [4]                    4 4 4 4 4 4 4 4
              [5]                    5 5 5 5 5 5 5 5
              [6]                    6 6 6 6 6 6 6 6
              [7]                    7 7 7 7 7 7 7 7


Here is one final example using [1q]:

                          VADD $v2, $v0, $v1[1q]

    v2[0] = v0[0] + v1[4];  v2[1] = v0[1] + v1[5];  v2[2] = v0[2] + v1[6];
    v2[3] = v0[3] + v1[7];  v2[4] = v0[4] + v1[4];  v2[5] = v0[5] + v1[5];
    v2[6] = v0[6] + v1[6];  v2[7] = v0[7] + v1[7];
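
The two tables above can be folded into one small function.  The C sketch
below computes, for a given RS code and destination unit, which unit of the
second source register is used.  The function names are invented for this
example, RS code 1 (shown as '---' in the table) is treated like <none> as an
assumption, and the add itself is the plain per-unit add from the C fragments
above with no clamping.

```c
#include <assert.h>
#include <stdint.h>

/* Source unit used for destination unit `lane` (0..7) under RS code `e`. */
static unsigned rsp_elem_source(unsigned e, unsigned lane)
{
    assert(e < 16 && lane < 8);
    if (e >= 8) return e & 7;                        /* [0]..[7]   */
    if (e >= 4) return ((e & 3) << 1) | (lane & 1);  /* [0h]..[3h] */
    if (e >= 2) return ((e & 1) << 2) | (lane & 3);  /* [0q],[1q]  */
    return lane;                                     /* <none>     */
}

/* VADD $v<dst>, $v<s1>, $v<s2>[e], without clamping. */
static void rsp_vadd_elem(int16_t dst[8], const int16_t s1[8],
                          const int16_t s2[8], unsigned e)
{
    int16_t tmp[8];   /* lets dst alias s1 or s2 safely */
    for (unsigned i = 0; i < 8; i++)
        tmp[i] = (int16_t)(s1[i] + s2[rsp_elem_source(e, i)]);
    for (unsigned i = 0; i < 8; i++)
        dst[i] = tmp[i];
}
```

For example, rsp_vadd_elem(v2, v0, v1, 3) corresponds to the [1q] example
above, since RS code 3 is [1q].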


Loads and Stores
----------------

There are two ways to get data in and out of the RSP vector registers.
LWC2/SWC2 are the most common; however, MTC2/MFC2 also work.

LWC2/SWC2 are actually broken down into 12 different load and store operations,
depending on the encoding of the RD bitmask:

                RD                  Load Op     Offset Scalar
               ----                ---------   ---------------
                0                     LBV             1
                1                     LSV             2
                2                     LLV             4
                3                     LDV             8
                4                     LQV             16
                5                     LRV             16
                6                     LPV             8
                7                     LUV             8
                8                     LHV             16
                9                     LFV             16
                A                     LWV             16
                B                     LTV             16


The offset scalar is the value that the offset is multiplied by.  For example,
if a LQV instruction is encoded with an offset of 1, the actual opcode would
be:
                          LQV $v<reg>[0], 16(<base>)
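
The table and the scaling rule can be captured in a few lines of C.  This is
only a sketch: the type and function names are invented for this example, and
only the load mnemonics listed in the table are included.

```c
#include <assert.h>

typedef struct {
    const char *name;   /* load mnemonic from the table */
    unsigned    scale;  /* offset scalar, in bytes      */
} rsp_load_op;

/* Indexed by the RD value. */
static const rsp_load_op rsp_load_ops[12] = {
    {"LBV", 1},  {"LSV", 2},  {"LLV", 4},  {"LDV", 8},
    {"LQV", 16}, {"LRV", 16}, {"LPV", 8},  {"LUV", 8},
    {"LHV", 16}, {"LFV", 16}, {"LWV", 16}, {"LTV", 16}
};

/* Byte offset shown in the disassembly for a given encoded offset. */
static int rsp_load_byte_offset(unsigned rd, int encoded_offset)
{
    assert(rd < 12);
    return encoded_offset * (int)rsp_load_ops[rd].scale;
}
```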

For loads and stores that do not have a scalar of 16, an offset into the
vector register can be specified by the encoding of the SA bitmask.  (Note
that this value is stored shifted left by 1, so when disassembling, the SA
value must be shifted right by 1.)

As an example, LLV can be used to load a word (32-bits) at address 0 in DMEM
into elements 2 and 3 of vector register 0 by using the following:

                             LLV $v0[4], 0(r0)
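
The LLV example can be modelled on a host machine as follows.  This is a
sketch under assumptions: a vector register is treated as 16 bytes with the
most significant byte of each element first (big-endian), the names rsp_vreg,
rsp_vreg_elem and rsp_llv are invented here, and accesses that run past the
end of the register or of DMEM are not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define DMEM_SIZE 0x1000u

typedef struct { uint8_t b[16]; } rsp_vreg;   /* 128 bits as 16 bytes */

/* Element i (0..7) is made of bytes 2i and 2i+1, high byte first. */
static uint16_t rsp_vreg_elem(const rsp_vreg *v, unsigned i)
{
    assert(i < 8);
    return (uint16_t)((v->b[2 * i] << 8) | v->b[2 * i + 1]);
}

/* LLV $v[e], addr: copy 4 bytes from DMEM to byte offset e of the register. */
static void rsp_llv(rsp_vreg *v, unsigned e, const uint8_t *dmem, uint32_t addr)
{
    assert(e + 4 <= 16 && addr + 4 <= DMEM_SIZE);   /* edge cases not modelled */
    for (unsigned i = 0; i < 4; i++)
        v->b[e + i] = dmem[addr + i];
}
```

With the bytes 12 34 56 78 at address 0, rsp_llv(&v0, 4, dmem, 0) leaves
1234h in element 2 and 5678h in element 3.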


The Opcodes
-----------

This section will only contain the vector opcode descriptions specific to
the RSP.  For opcode descriptions of the standard R4000 opcodes, please see
the R4000 Programmers Manual.

The following codes are used for the Format descriptor:

    reg             This can be any R4000 register, such as R0, T1, S5, etc.

    base            The base GPR to use for load/store operations.  See reg.

    offset          The additional offset to use from a base register.
                    This can be just about any immediate 16-bit value.

    $v<src>         The source vector register, where the data for an operation
                    will be used.  Example: $v29

    $v<s1>          One of the source vector registers.  See $v<src>.

    $v<s2>          A second source vector register.  See $v<src>.

    $v<dest>        The destination vector register, where the final result
                    of the operation will be placed.  Example: $v12

    [sel]           Specifies the element to be used on a source vector
                    register in an operation.  This value ranges from 0 to
                    7.  Example: [5]

    [el]            This also specifies the element to be used on a source
                    vector register, but is optional and can also reference
                    the quad or half portions of a vector.  See [0q], [1q],
                    [0h] thru [3h] above.  Example: [1h]

    [del]           Specifies the destination element to be used as the
                    result for the operation.  See [sel].

=============================================================================

Opcode:    LBV                      Format:  LBV $v<dest>[del], offset(base)
Function:  Load byte to vector

Description:  LBV loads a byte from DMEM into the vector register and element
    specified in the opcode.

=============================================================================

Opcode:    LSV                      Format:  LSV $v<dest>[del], offset(base)
Function:  Load short (halfword) to vector

Description:  LSV loads a short (halfword, 16-bits) from DMEM into the vector
    register and element specified in the opcode.

=============================================================================

Opcode:    LLV                      Format:  LLV $v<dest>[del], offset(base)
Function:  Load long (word) to vector

Description:  LLV loads a long (word, 32-bits) from DMEM into the vector
    register and element specified in the opcode.

=============================================================================

Opcode:    LDV                      Format:  LDV $v<dest>[del], offset(base)
Function:  Load double (doubleword) to vector

Description:  LDV loads a double (doubleword, 64-bits) from DMEM into the
    vector register and element specified in the opcode.

=============================================================================

Opcode:    LQV                      Format:  LQV $v<dest>[del], offset(base)
Function:  Load quad (quadword) to vector

Description:  LQV loads a quad (quadword, 128-bits) from DMEM into the vector
    register and element specified in the opcode.

=============================================================================

Opcode:    LPV                      Format:  LPV $v<dest>[del], offset(base)
Function:  Load packed? to vector

Description:  LPV loads a packed? value from DMEM?

=============================================================================

Opcode:    LUV                      Format:  LUV $v<dest>[del], offset(base)
Function:  Load unpacked? to vector

Description:  LUV loads an unpacked? value from DMEM?

=============================================================================

Opcode:    LHV                      Format:  LHV $v<dest>[del], offset(base)
Function:  Load half? to vector

Description:  LHV loads a half? value from DMEM?

=============================================================================

Opcode:    LFV                      Format:  LFV $v<dest>[del], offset(base)
Function:  Load fourth? to vector

Description:  LFV loads a fourth? value from DMEM?

=============================================================================

Opcode:    LWV                      Format:  LWV $v<dest>[del], offset(base)
Function:  Load wrap? to vector

Description:  LWV loads a wrapped? value from DMEM?

=============================================================================

Opcode:    LTV                      Format:  LTV $v<dest>[del], offset(base)
Function:  Load transpose? to vector

Description:  LTV loads a transposed? value from DMEM?

=============================================================================

Opcode:    SBV                      Format:  SBV $v<src>[sel], offset(base)
Function:  Store byte from vector

Description:  SBV stores a byte from the vector register and element specified
    in the opcode to DMEM.

=============================================================================

Opcode:    SSV                      Format:  SSV $v<src>[sel], offset(base)
Function:  Store short (halfword) from vector

Description:  SSV stores a short (halfword, 16-bits) from the vector register
    and element specified in the opcode to DMEM.

=============================================================================

Opcode:    SLV                      Format:  SLV $v<src>[sel], offset(base)
Function:  Store long (word) from vector

Description:  SLV stores a long (word, 32-bits) from the vector register and
    element specified in the opcode to DMEM.

=============================================================================

Opcode:    SDV                      Format:  SDV $v<src>[sel], offset(base)
Function:  Store double (doubleword) from vector

Description:  SDV stores a double (doubleword, 64-bits) from the vector
    register and element specified in the opcode to DMEM.

=============================================================================

Opcode:    SQV                      Format:  SQV $v<src>[sel], offset(base)
Function:  Store quad (quadword) from vector

Description:  SQV stores a quad (quadword, 128-bits) from the vector register
    and element specified in the opcode to DMEM.

=============================================================================

Opcode:    SPV                      Format:  SPV $v<src>[sel], offset(base)
Function:  Store packed? from vector

Description:  SPV stores a packed? value to DMEM?

=============================================================================

Opcode:    SUV                      Format:  SUV $v<src>[sel], offset(base)
Function:  Store unpacked? from vector

Description:  SUV stores an unpacked? value to DMEM?

=============================================================================

Opcode:    SHV                      Format:  SHV $v<src>[sel], offset(base)
Function:  Store half? from vector

Description:  SHV stores a half? value to DMEM?

=============================================================================

Opcode:    SFV                      Format:  SFV $v<src>[sel], offset(base)
Function:  Store fourth? from vector

Description:  SFV stores a fourth? value to DMEM?

=============================================================================

Opcode:    SWV                      Format:  SWV $v<src>[sel], offset(base)
Function:  Store wrap? from vector

Description:  SWV stores a wrapped? value to DMEM?

=============================================================================

Opcode:    STV                      Format:  STV $v<src>[sel], offset(base)
Function:  Store transpose? from vector

Description:  STV stores a transposed? value to DMEM?

=============================================================================

Opcode:    VNOOP                    Format:  VNOOP
Function:  Vector no-op

Description:  Does nothing.

=============================================================================

Opcode:    VMOV                     Format:  VMOV $v<dest>[del], $v<src>[sel]
Function:  Vector move

Description:  Moves a 16-bit element from one vector to another.

=============================================================================

Opcode:    VXOR                     Format:  VXOR $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector XOR

Description:  Performs a bitwise XOR operation on the vector registers
    specified by s1 and s2 and element el.

=============================================================================

Opcode:    VNXOR                    Format:  VNXOR $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector NXOR

Description:  Performs a bitwise NOT XOR operation on the vector registers
    specified by s1 and s2 and element el.

=============================================================================

Opcode:    VAND                     Format:  VAND $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector AND

Description:  Performs a bitwise AND operation on the vector registers
    specified by s1 and s2 and element el.

=============================================================================

Opcode:    VNAND                    Format:  VNAND $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector NAND

Description:  Performs a bitwise NOT AND operation on the vector registers
    specified by s1 and s2 and element el.

=============================================================================

Opcode:    VOR                      Format:  VOR $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector OR

Description:  Performs a bitwise OR operation on the vector registers
    specified by s1 and s2 and element el.

=============================================================================

Opcode:    VNOR                     Format:  VNOR $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector NOR

Description:  Performs a bitwise NOT OR operation on the vector registers
    specified by s1 and s2 and element el.
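
The six logical opcodes above all work on one 16-bit unit at a time, so a
single helper covers them.  The C sketch below shows the per-unit result; the
enum and function names are made up for this example, and the element
selection by [el] is left out.

```c
#include <stdint.h>

typedef enum { V_AND, V_NAND, V_OR, V_NOR, V_XOR, V_NXOR } rsp_vlogic;

/* Result for one 16-bit unit of a logical vector opcode. */
static uint16_t rsp_vlogic_lane(rsp_vlogic op, uint16_t a, uint16_t b)
{
    switch (op) {
    case V_AND:  return (uint16_t)(a & b);
    case V_NAND: return (uint16_t)~(a & b);
    case V_OR:   return (uint16_t)(a | b);
    case V_NOR:  return (uint16_t)~(a | b);
    case V_XOR:  return (uint16_t)(a ^ b);
    case V_NXOR: return (uint16_t)~(a ^ b);
    }
    return 0;   /* not reached for valid op values */
}
```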

=============================================================================

Opcode:    VADD                     Format:  VADD $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector ADD

Description:  Performs a vector add on the vector registers specified by s1
    and s2 and element el.  Automatically clamps at 7FFFh if s1+s2 is too
    large to fit into a 16-bit value.
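
A per-unit sketch of the clamping in C might look like this.  The function
names are invented for this example, and clamping at 8000h on the negative
side is an assumption made by symmetry with VSUB; the text above only
mentions the 7FFFh case.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the signed 16-bit range. */
static int16_t rsp_clamp_s16(int32_t x)
{
    if (x > 0x7FFF)  return 0x7FFF;
    if (x < -0x8000) return -0x8000;
    return (int16_t)x;
}

/* One unit of VADD: add in 32 bits, then clamp. */
static int16_t rsp_vadd_lane(int16_t a, int16_t b)
{
    return rsp_clamp_s16((int32_t)a + b);
}
```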

=============================================================================

Opcode:    VSUB                     Format:  VSUB $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector SUB

Description:  Performs a vector subtract on the vector registers specified by
    s1 and s2 and element el.  Automatically clamps at 8000h if s1-s2 is too
    small to fit into a 16-bit value.  Subtracts an extra 1 if the flag for
    the unit is on.
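
In C, one unit of this operation could be sketched as follows.  The names are
made up for this example, `flag` stands for the unit's bit in the first flag
register, and clamping at 7FFFh on the positive side is an assumption; the
text above only mentions the 8000h case.

```c
#include <stdint.h>

/* One unit of VSUB: subtract (plus the extra 1 if flagged), then clamp. */
static int16_t rsp_vsub_lane(int16_t a, int16_t b, int flag)
{
    int32_t r = (int32_t)a - b - (flag ? 1 : 0);
    if (r > 0x7FFF)  return 0x7FFF;    /* assumed upper clamp */
    if (r < -0x8000) return -0x8000;   /* clamp at 8000h      */
    return (int16_t)r;
}
```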

=============================================================================

Opcode:    VSUBC                    Format:  VSUBC $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector SUB, noclamp

Description:  Performs a vector subtract on the vector registers specified by
    s1 and s2 and element el.  Does not clip, but sets the appropriate bit(s)
    in the flags register if wraparound occurs.
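
A possible per-unit model in C is shown below.  Several things here are
assumptions rather than facts from the text: "wraparound" is taken to mean an
unsigned borrow, bit n of the flag is taken to belong to unit n, and the bit
is cleared when no wraparound occurs.  The function name is invented.

```c
#include <assert.h>
#include <stdint.h>

/* One unit of VSUBC: wrap instead of clamping, and record the borrow. */
static uint16_t rsp_vsubc_lane(uint16_t a, uint16_t b, unsigned lane,
                               uint16_t *flags)
{
    assert(lane < 8);
    if (a < b)
        *flags |= (uint16_t)(1u << lane);    /* wraparound occurred */
    else
        *flags &= (uint16_t)~(1u << lane);   /* assumed: bit cleared */
    return (uint16_t)(a - b);
}
```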

=============================================================================

Opcode:    VSUT                     Format:  VSUT $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector SUT?

Description:  Unknown.  Appears to zap the vector register specified by dest
    in the opcode.

=============================================================================

Opcode:    VRCPH                    Format:  VRCPH $v<dest>[del], $v<src>[sel]
Function:  Vector reciprocal (high)

Description:  VRCPH/VRCPL are "pairable" instructions which utilize special
    values held inside temporary registers.  VRCPH immediately writes the
    current value of 'high_val' to the destination vector.  It then stores
    away the value contained in the source vector to 'high_source'.

=============================================================================

Opcode:    VRCPL                    Format:  VRCPL $v<dest>[del], $v<src>[sel]
Function:  Vector reciprocal (low)

Description:  VRCPH/VRCPL are "pairable" instructions which utilize special
    values held inside temporary registers.  VRCPL immediately performs the
    following math operation:

                                   0.4999963
                val = ----------------------------------
                      (high_source << 16) | $v<src>[sel]

    'val' is then converted to 16.16 fixed point, and the lower portion is
    written to $v<dest>[del].  The upper portion is stored in 'high_val' to
    be written to a vector register by a later VRCPH.

    NOTE: The dividend *really* is 0.4999963; it is not 0.5.
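
    The VRCPH/VRCPL pairing can be sketched in C as shown below.  This is
    a model of the description above, not of the hardware: the struct and
    function names are invented, the divide is done in floating point, the
    16.16 conversion truncates, and only non-zero divisors up to 7FFFFFFFh
    are handled (negative and zero inputs are left out).

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t high_source;   /* saved by VRCPH, used by VRCPL */
    uint16_t high_val;      /* saved by VRCPL, read by VRCPH */
} rsp_rcp_state;

/* VRCPH: hand back the pending high_val, then remember the source. */
static uint16_t rsp_vrcph(rsp_rcp_state *st, uint16_t src)
{
    uint16_t out = st->high_val;
    st->high_source = src;
    return out;
}

/* VRCPL: divide, convert to 16.16, return the low half, keep the high half. */
static uint16_t rsp_vrcpl(rsp_rcp_state *st, uint16_t src)
{
    uint32_t divisor = ((uint32_t)st->high_source << 16) | src;
    assert(divisor != 0 && divisor <= 0x7FFFFFFFu);   /* other cases not modelled */
    double   val   = 0.4999963 / (double)divisor;
    uint32_t fixed = (uint32_t)(val * 65536.0);       /* 16.16, truncated */
    st->high_val = (uint16_t)(fixed >> 16);
    return (uint16_t)(fixed & 0xFFFFu);
}
```

    In this model a divisor of 1 gives 7FFFh rather than 8000h, which is
    the visible effect of the dividend being slightly less than 0.5.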

=============================================================================

Opcode:    VMUDN                    Format:  VMUDN $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector mid-n multiply

Description:  Performs an unsigned multiply of two vectors, storing the result
    both in the destination vector and in the lower half of the accumulator.

=============================================================================

Opcode:    VMADN                    Format:  VMADN $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector mid-n multiply accumulate

Description:  Performs an unsigned multiply of two vectors, adding the result
    to the lower half of the accumulator and storing the accumulator's result
    to the destination vector.

=============================================================================

Opcode:    VMUDH                    Format:  VMUDH $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector high multiply

Description:  Performs a signed multiply of two vectors, storing the result
    both in the destination vector and in the upper half of the accumulator.
    Automatically clips at 7FFFh if overflow occurs and 8000h if underflow
    occurs.

=============================================================================

Opcode:    VMADH                    Format:  VMADH $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector high multiply accumulate

Description:  Performs a signed multiply of two vectors, adding the result
    to the upper half of the accumulator and storing the upper accumulator's
    result to the destination vector.  Automatically clips at 7FFFh if overflow
    occurs and 8000h if underflow occurs.

=============================================================================

Opcode:    VMUDL                    Format:  VMUDL $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector low multiply

Description:  Performs an unsigned multiply of two vectors, storing the upper
    16 bits of the result to both the destination vector and the upper half of
    the accumulator.

=============================================================================

Opcode:    VMADL                    Format:  VMADL $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector low multiply accumulate

Description:  Performs an unsigned multiply of two vectors, taking the upper
    16 bits of the result, adding it to the upper half of the accumulator and
    storing the upper half of the accumulator to the destination vector.

=============================================================================

Opcode:    VMUDM                    Format:  VMUDM $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector mid-m multiply

Description:  Performs a signed multiply of two vectors, storing the upper
    16 bits of the result to both the destination vector and the upper half of
    the accumulator.

=============================================================================

Opcode:    VMADM                    Format:  VMADM $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector mid-m multiply accumulate

Description:  Performs a signed multiply of two vectors, taking the upper
    16 bits of the result, adding it to the upper half of the accumulator and
    storing the upper half of the accumulator to the destination vector.

=============================================================================

Opcode:    VMULF                    Format:  VMULF $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector (frac?) multiply

Description:  Performs a signed multiply of two vectors, except the multiplier
    (vector s2) is multiplied by 2 before the multiply operation.  The final
    result is then "rounded down" by 16 bits and stored in both the destination
    vector and high portion of the accumulator.  The lower portion of the
    accumulator holds the result of the multiply, truncated to 16 bits, and
    XORed by 8000h.

    The "rounded down" algorithm is implemented as:

        if (val & 0x8000) return (val >> 16) + 1; else return (val >> 16);
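
    Putting the pieces together, one unit of VMULF could be modelled in C
    as follows.  The struct and function names are invented for this
    example.  The sketch assumes that >> on a negative value is an
    arithmetic shift (true on common compilers, but implementation-defined
    in C), and it does not try to handle the 8000h * 8000h corner case,
    which the description above does not cover.

```c
#include <stdint.h>

/* The "rounded down" helper, as given in the text. */
static int32_t rsp_round16(int32_t val)
{
    if (val & 0x8000) return (val >> 16) + 1;
    return val >> 16;
}

typedef struct {
    int16_t  high;   /* destination unit / high portion of the accumulator */
    uint16_t low;    /* low portion of the accumulator                     */
} rsp_mulf_result;

/* One unit of VMULF: a * (b * 2), then split as described. */
static rsp_mulf_result rsp_vmulf_lane(int16_t a, int16_t b)
{
    int64_t prod = (int64_t)a * ((int64_t)b * 2);
    rsp_mulf_result r;
    r.high = (int16_t)rsp_round16((int32_t)prod);
    r.low  = (uint16_t)((prod & 0xFFFF) ^ 0x8000);
    return r;
}
```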

=============================================================================

Opcode:    VMACF                    Format:  VMACF $v<dest>, $v<s1>, $v<s2>[el]
Function:  Vector (frac?) multiply accumulate

Description:  Performs a signed multiply of two vectors, except the multiplier
    (vector s2) is multiplied by 2 before the multiply operation and the
    upper half of the accumulator is added to the result.  The final result
    is then "rounded down" by 16 bits and stored in both the destination
    vector and high portion of the accumulator.  The lower portion of the
    accumulator holds the result of the multiply, truncated to 16 bits, and
    XORed by 8000h.

=============================================================================

Opcode:    MTC2                     Format:  MTC2 $v<dest>[el], reg
Function:  Move to COP2

Description:  Moves the lower 16-bit portion of a GPR to an element in a
    vector register.  The vector element specifier can range from 0-15 and is
    a byte offset into the 128-bit vector register.  Examples:

        ORI t0, r0, 0x1234
        MTC2 $v0[0], t0             - Moves 1234h into element v0[0]

        ORI t0, r0, 0xFEDC          - Moves FEh into the lower half of v0[0]
        MTC2 $v0[1], t0               and DCh into the upper half of v0[1]

        ORI t0, r0, 0x5489
        MTC2 $v0[14], t0            - Moves 5489h into element v0[7]

        ORI t0, r0, 0xABCD
        MTC2 $v0[15], t0            - Moves ABh into the lower half of v0[7]
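
    The four examples above can be reproduced with a small C model.  As
    before, this is a sketch: the register is treated as 16 bytes with
    the high byte of each element first, and the names rsp_vreg, rsp_mtc2
    and rsp_vreg_elem are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t b[16]; } rsp_vreg;   /* 128 bits as 16 bytes */

/* MTC2 $v[e], gpr: write the low 16 bits of gpr at byte offset e. */
static void rsp_mtc2(rsp_vreg *v, unsigned e, uint32_t gpr)
{
    assert(e < 16);
    v->b[e] = (uint8_t)(gpr >> 8);      /* high byte of the low 16 bits */
    if (e + 1 < 16)
        v->b[e + 1] = (uint8_t)gpr;     /* low byte; dropped when e == 15 */
}

/* Element i (0..7) is made of bytes 2i and 2i+1, high byte first. */
static uint16_t rsp_vreg_elem(const rsp_vreg *v, unsigned i)
{
    assert(i < 8);
    return (uint16_t)((v->b[2 * i] << 8) | v->b[2 * i + 1]);
}
```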

=============================================================================

Opcode:    MFC2                     Format:  MFC2 $v<src>[el], reg
Function:  Move from COP2

Description:  Moves a 16-bit value from a vector register into a GPR.
    The vector element specifier can range from 0-15 and is a byte offset
    into the 128-bit vector register.  See MTC2 for examples.

