Excessive checking in compatibility mode hypercall argument translation:

The hypercall argument translation needed for 32-bit guests running on 64-bit hypervisors performs checks on the final register state. These checks cover all registers potentially holding hypercall arguments, not just the ones actually doing so for the hypercall being processed, since the code was originally intended for use only by PV guests.

While this is not a problem for PV guests (as they can’t enter 64-bit mode and hence can’t alter the high halves of any of the registers), the subsequent reuse of the same functionality for HVM guests exposed those checks to values (specifically, unexpected values for the high halves of registers not holding hypercall arguments) controlled by guest software.


logic error (32->64 compat)



Without the fix, HVM/PVH guests can trigger the final BUG_ON() in that function by entering 64-bit mode, setting the high halves of affected registers to non-zero values, leaving 64-bit mode, and issuing a hypercall that might get preempted and hence become subject to continuation argument translation (HYPERVISOR_memory_op being the only one possible for HVM, PVH also having the option of using HYPERVISOR_mmuext_op). This issue was introduced when HVM code was switched to use compat_memory_op(): neither that function nor hypercall_xlat_continuation() was originally intended to be used by anything other than PV guests (which can’t enter 64-bit mode and hence have no way to alter the high halves of 64-bit registers).


hypercall_xlat_continuation(NULL,  2, 0x2, nat, arg);
hypercall_xlat_continuation(&left, 4, 0x01, nat_ops, cmp_uops)
hypercall_xlat_continuation(&left, 4, 0)
hypercall_xlat_continuation(&cmd,  2, 0x02, nat.hnd, compat)
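
The third argument in each of these calls is the translation mask, where bit i selects the i'th hypercall argument. As a minimal sketch of how to read those values, the helper below lists the argument indices a mask selects. The name mask_to_indices and the limit of 6 (taken from the six argument registers) are assumptions made for illustration; nothing like this helper exists in Xen.

```c
#include <assert.h>
#include <stddef.h>

#define NR_ARG_REGS 6 /* assumed: ebx, ecx, edx, esi, edi, ebp */

/*
 * Hypothetical helper, not part of Xen: write the indices of the
 * arguments selected by mask into out and return how many there are.
 * out must have room for nr entries.
 */
static size_t mask_to_indices(unsigned int mask, unsigned int nr,
                              unsigned int *out)
{
    size_t count = 0;

    assert(nr <= NR_ARG_REGS);
    assert(!(mask >> nr)); /* no bits beyond the last argument */

    for ( unsigned int i = 0; i < nr; ++i )
        if ( mask & (1U << i) )
            out[count++] = i;

    return count;
}
```

Under this reading, a mask of 0x2 with nr = 2 selects only argument 1, a mask of 0x01 selects only argument 0, and a mask of 0 selects nothing, so every argument is then treated as untranslated.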


 * Translate a native continuation into a compat guest continuation.
 * id: If non-NULL then points to an integer N between 0-5. Will be updated
 * with the value of the N'th argument to the hypercall. The N'th argument must
 * not be subject to translation (i.e. cannot be referenced by @mask below).
 * This option is useful for extracting the "op" argument or similar from the
 * hypercall to enable further xlat processing.
 * nr: Total number of arguments the hypercall has.
 * mask: Specifies which of the hypercall arguments require compat translation.
 * bit 0 indicates that the 0'th argument requires translation, bit 1 indicates
 * that the first argument requires translation and so on. Native and compat
 * values for each translated argument are provided as @varargs (see below).
 * varargs: For each bit which is set in @mask the varargs contain a native
 * value (unsigned long) and a compat value (unsigned int). If the native value
 * and compat value differ and the N'th argument is equal to the native value
 * then that argument is replaced by the compat value. If the native and compat
 * values are equal then no translation takes place. If the N'th argument does
 * not equal the native value then no translation takes place.
 * Any untranslated argument (whether due to not being requested in @mask,
 * native and compat values being equal or N'th argument not equalling native
 * value) must be equal in both native and compat representations (i.e. the
 * native version cannot have any bits > 32 set)
 * Return: Number of arguments which were actually translated.

    int hypercall_xlat_continuation(unsigned int *id, unsigned int nr,
                                    unsigned int mask, ...)
        ASSERT(nr <= ARRAY_SIZE(mcs->call.args));
        ASSERT(!(mask >> nr));     
        ASSERT(!id || *id < nr);   
        ASSERT(!id || !(mask & (1U << *id)));
        va_start(args, mask);
        regs = guest_cpu_user_regs();
        for ( i = 0; i < nr; ++i, mask >>= 1 ) // originally i < 6
            unsigned long *reg;
            switch ( i )
            case 0: reg = &regs->ebx; break;
            case 1: reg = &regs->ecx; break;
            case 2: reg = &regs->edx; break;
            case 3: reg = &regs->esi; break;
            case 4: reg = &regs->edi; break;
            case 5: reg = &regs->ebp; break;
            default: BUG(); reg = NULL; break;

            if ( (mask & 1) )
                nval = va_arg(args, unsigned long);
                cval = va_arg(args, unsigned int);
                if ( cval == nval )
                    mask &= ~1U;
                else
                    BUG_ON(nval == (unsigned int)nval);
            else if ( id && *id == i )
                *id = *reg;
                id = NULL;

            if ( (mask & 1) && *reg == nval ) // implies cval != nval
                *reg = cval;
            else
                BUG_ON(*reg != (unsigned int)*reg);

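Before the patch, the final BUG_ON could be triggered, because reg is 64 bits wide while unsigned int is 32 bits (see Wikipedia). In fact, execution reaches that final BUG_ON most of the time, because the condition of the preceding if is very strict: it requires *reg == nval, which means Xen must already know the register's value (that is, it is the only value Xen can handle), and that is unlikely. If the hypercall itself takes only 2 arguments, an attacker can use the remaining 4 unused registers to trigger the BUG_ON.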
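
The per-register decision discussed here can be modeled in isolation. The following is a simplified sketch, not the Xen source: the enum, the function name xlat_one and the use of a return code in place of BUG_ON() are all assumptions made for illustration, and the separate sanity check on nval is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Outcome of processing one argument register in this model. */
enum xlat_result { XLAT_UNCHANGED, XLAT_TRANSLATED, XLAT_WOULD_BUG };

/*
 * Hypothetical model of the rule for a single register:
 * - if the argument is selected by the mask, nval and cval differ, and
 *   the register currently holds nval, replace it with cval;
 * - otherwise leave it alone, but require that it fits in 32 bits
 *   (the condition that BUG_ON() enforces in the hypervisor).
 */
static enum xlat_result xlat_one(uint64_t *reg, bool selected,
                                 uint64_t nval, uint32_t cval)
{
    assert(reg);

    if ( selected && nval != cval && *reg == nval )
    {
        *reg = cval;
        return XLAT_TRANSLATED;
    }

    if ( *reg != (uint32_t)*reg )
        return XLAT_WOULD_BUG; /* high half is non-zero */

    return XLAT_UNCHANGED;
}
```

In this model a register that is not selected for translation and holds, say, 0x100000000 yields XLAT_WOULD_BUG, which corresponds to the crash path the paragraph describes.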

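Xen assumes that the high 32 bits of reg are zero, which is why this BUG_ON exists. If a 32-bit guest VM enters 64-bit mode, sets the high 32 bits of a register to a non-zero value, returns to 32-bit mode, and then issues a hypercall that may be preempted, that assumption no longer holds and the BUG_ON fires when the continuation arguments are translated.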
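
The effect of the loop bound (i < 6 originally, i < nr in the code shown earlier) can be illustrated with a small model. This is a sketch under stated assumptions: the register file is reduced to an array of six 64-bit values, and first_offending_reg is a hypothetical name that reports the failing register instead of crashing.

```c
#include <assert.h>
#include <stdint.h>

#define NR_ARG_REGS 6 /* ebx, ecx, edx, esi, edi, ebp */

/*
 * Hypothetical model: return the index of the first register among the
 * first `limit` whose high 32 bits are non-zero, or -1 if there is none.
 * In the hypervisor the same condition would hit BUG_ON().
 */
static int first_offending_reg(const uint64_t regs[NR_ARG_REGS],
                               unsigned int limit)
{
    assert(limit <= NR_ARG_REGS);

    for ( unsigned int i = 0; i < limit; ++i )
        if ( regs[i] != (uint32_t)regs[i] )
            return (int)i;

    return -1;
}
```

For a two-argument hypercall where the guest has planted a non-zero high half in the fourth register (index 3), checking all six registers reports index 3, whereas checking only the first nr = 2 registers reports nothing, because the unused register is never inspected.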


 * To allow safe resume of do_memory_op() after preemption, we need to know
 * at what point in the page list to resume. For this purpose I steal the
 * high-order bits of the @cmd parameter, which are otherwise unused and zero.
 * Note that both of these values are effectively part of the ABI, even if
 * we don't need to make them a formal part of it: A guest suspended for
 * migration in the middle of a continuation would fail to work if resumed on
 * a hypervisor using different values.
#define MEMOP_EXTENT_SHIFT 6 /* cmd[:6] == start_extent */
#define MEMOP_CMD_MASK     ((1 << MEMOP_EXTENT_SHIFT) - 1)
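
As a rough illustration of the encoding this comment describes, the sketch below packs an operation number and a start extent into one cmd value and unpacks them again. The two macros are copied from above; the function names and the sample operation number are hypothetical and chosen only for the example.

```c
#include <assert.h>

#define MEMOP_EXTENT_SHIFT 6 /* cmd[:6] == start_extent */
#define MEMOP_CMD_MASK     ((1 << MEMOP_EXTENT_SHIFT) - 1)

/* Hypothetical helper: combine an op number with a resume point. */
static unsigned long memop_encode(unsigned long op,
                                  unsigned long start_extent)
{
    assert(op <= MEMOP_CMD_MASK); /* op must fit in the low 6 bits */
    return op | (start_extent << MEMOP_EXTENT_SHIFT);
}

/* Hypothetical helper: recover the op number from the low bits. */
static unsigned long memop_op(unsigned long cmd)
{
    return cmd & MEMOP_CMD_MASK;
}

/* Hypothetical helper: recover the resume point from the high bits. */
static unsigned long memop_start_extent(unsigned long cmd)
{
    return cmd >> MEMOP_EXTENT_SHIFT;
}
```

With an arbitrary op number of 1 and a start extent of 5, the encoded value is 1 | (5 << 6) = 321, and both fields can be recovered from it. A continuation restarts the hypercall with such a combined cmd, so the handler can pick up at the recorded extent.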



A buggy or malicious HVM guest can crash the host.