[interp] Rework the allocation of offsets for variables (#49072)
author Vlad Brezae <brezaevlad@gmail.com>
Fri, 12 Mar 2021 15:03:37 +0000 (17:03 +0200)
committer GitHub <noreply@github.com>
Fri, 12 Mar 2021 15:03:37 +0000 (17:03 +0200)
* [interp] Cleanup get_interp_local_offset

Use it just for allocating an offset for a var, at the top of the locals space.

* [interp] Save the args inside call instructions

This will aid later optimizations by making it easy to detect the call args of an opcode. The args are stored as a -1 terminated array of var indexes. Also change the structure of the newobj_reg_map array so that it can reuse this format (newobj_reg_map should be killed at some point anyway).
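As a rough illustration of the -1 terminated format, the sketch below walks such an array to count the args of a call. The helper name `count_call_args` is hypothetical and is not taken from the interpreter sources.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the interpreter sources): count the entries
 * of a -1 terminated array of var indexes. A NULL array counts as no args.
 */
static int
count_call_args (const int *call_args)
{
	int n = 0;

	if (call_args == NULL)
		return 0;
	while (call_args [n] != -1)
		n++;
	return n;
}
```

A pass that needs the args of a call can iterate the same way, stopping at the -1 sentinel instead of consulting the method signature.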

* [interp] Pass target_ip in a normal var to MINT_CALLI_NAT_FAST

This makes it consistent with the other calli opcodes and slightly simplifies the code generation path.

* [interp] Add explicit return for CallArgs opcodes

Before this change, calls used to receive a single special dreg argument. This was resolved to an offset. At this offset, the call could find all the parameters and the return value was also written at the same offset. With this change we move towards having an explicit dreg return. For calls, the last sreg must be of the special type MINT_CALL_ARGS_SREG. The var offset allocator should ensure all call args are allocated one after the other and that this special reg type is resolved to the offset where these args reside.
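The split between the return offset and the args offset can be pictured with a toy model. In this sketch, `toy_call_add` is a hypothetical stand-in for a call opcode; it assumes two 8-byte args laid out one after the other in a byte-addressed locals buffer, and it writes the result to a separate return offset instead of overwriting the first arg.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical toy "call": reads two consecutive 8-byte args starting at
 * call_args_offset and stores their sum at return_offset. Both offsets are
 * byte offsets into the locals buffer.
 */
static void
toy_call_add (unsigned char *locals, uint16_t return_offset, uint16_t call_args_offset)
{
	int64_t a, b, r;

	memcpy (&a, locals + call_args_offset, sizeof (a));
	memcpy (&b, locals + call_args_offset + 8, sizeof (b));
	r = a + b;
	/* The result goes to the explicit return var, not to the args area */
	memcpy (locals + return_offset, &r, sizeof (r));
}
```

With the old scheme the result would have landed on top of the first arg. Here the args area is left intact unless the allocator happens to assign the same offset to both.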

* [interp] Remove call args flag from code generation / optimization phases

This flag should only be relevant to the var offset allocator.

* [interp] Add new local offset allocator

This change aims to simplify the handling of vars during optimizations. Before this change we had different types of vars: managed locals, vars residing on the execution stack and vars that are the arguments of a call. Multiple restrictions applied to vars residing on the execution stack and to call args. Following this change, all vars share the same semantics during optimization passes. At the very end, we allocate offsets for them and end up with 3 types of vars: global vars (used from multiple bblocks), local vars (used in a single bblock) and call arg vars. Call arg vars are always local.

The first step of the allocator is to detect all global vars and allocate offsets for them by doing a full iteration over the code. They will reside in the first section of the stack frame and they are allocated one after the other in the order they are detected. The param area (containing call arg vars) will have to be allocated after the local var space, otherwise a call would overwrite vars in the calling method. These vars are allocated for one basic block at a time.
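The global var detection described above can be sketched roughly as follows. All names here (`ToyRef`, `toy_alloc_globals`) are hypothetical, and the sketch assumes bblock indexes are non-negative and that var references are visited in code order.

```c
/* One reference to a var from an instruction in a given basic block */
typedef struct {
	int var;
	int bblock;
} ToyRef;

/*
 * Hypothetical sketch: a var referenced from more than one bblock is global
 * and receives the next free offset, in detection order. offsets [v] stays -1
 * for vars that are not global. first_bb is caller-provided scratch space
 * with nvars entries. Returns the total size of the global var section.
 */
static int
toy_alloc_globals (const ToyRef *refs, int nrefs, const int *sizes, int nvars, int *first_bb, int *offsets)
{
	int top = 0;

	for (int v = 0; v < nvars; v++) {
		first_bb [v] = -1;
		offsets [v] = -1;
	}
	for (int i = 0; i < nrefs; i++) {
		int v = refs [i].var;

		if (first_bb [v] == -1) {
			/* First time we see this var, remember its bblock */
			first_bb [v] = refs [i].bblock;
		} else if (first_bb [v] != refs [i].bblock && offsets [v] == -1) {
			/* Seen from a second bblock: promote to global */
			offsets [v] = top;
			top += sizes [v];
		}
	}
	return top;
}
```

The returned size is where the per-bblock local var space would start in this toy layout.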

For simple local vars we do an initial iteration over the bblock instructions and we set the liveness information for each referenced var (live_start and live_end). We will maintain a list of active vars and the current top of stack. As a var becomes alive we allocate it at the current offset and add it to the active_vars array. As a var becomes dead, we remove entries from the active_vars array and update the current top of stack, if space has been freed at the end of the stack.
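A minimal sketch of this liveness-driven allocation for a single bblock is shown below. The names (`ToyVar`, `toy_alloc_bblock_locals`, `TOY_MAX_VARS`) are hypothetical, liveness is assumed to be precomputed as instruction indexes, and a var that dies on the same instruction where another one starts does not have its space reused by that var. The real allocator may differ in all of these details.

```c
#define TOY_MAX_VARS 64

typedef struct {
	int live_start; /* index of the first instruction referencing the var */
	int live_end;   /* index of the last instruction referencing the var */
	int size;       /* size in bytes */
	int offset;     /* filled in by the allocator */
} ToyVar;

/*
 * Hypothetical sketch: allocate offsets for the local vars of one bblock.
 * Returns the highest stack top reached, or -1 if there are too many vars.
 */
static int
toy_alloc_bblock_locals (ToyVar *vars, int nvars, int ninsns, int base_offset)
{
	int active [TOY_MAX_VARS];
	int nactive = 0;
	int top = base_offset;
	int max_top = base_offset;

	if (nvars > TOY_MAX_VARS)
		return -1;

	for (int ins = 0; ins < ninsns; ins++) {
		/* Vars becoming alive are allocated at the current top of stack */
		for (int v = 0; v < nvars; v++) {
			if (vars [v].live_start != ins)
				continue;
			vars [v].offset = top;
			top += vars [v].size;
			active [nactive++] = v;
			if (top > max_top)
				max_top = top;
		}
		/* Vars dying at this instruction leave the active list */
		for (int i = 0; i < nactive;) {
			if (vars [active [i]].live_end == ins)
				active [i] = active [--nactive];
			else
				i++;
		}
		/* The top drops only if space was freed at the end of the stack */
		top = base_offset;
		for (int i = 0; i < nactive; i++) {
			int end = vars [active [i]].offset + vars [active [i]].size;
			if (end > top)
				top = end;
		}
	}
	return max_top;
}
```

Note that a hole left by a var dying in the middle of the stack is not reused in this sketch; only space at the end of the stack is reclaimed, which matches the description above.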

For call args, because we must control the offset at which these vars are allocated, in the initial pass we generate MOVs from the var to a new local var, if the call arg was initially global. Afterwards, call arg vars are allocated in a similar manner to normal local vars. The space for them is tied to the param area of the call, so the entire space is allocated at once. A call becomes active when any of its args is first written. The liveness of the call ends when the actual call is done, at which point we resolve the offset of every arg relative to the start of the param area of the method. Once all normal local vars are allocated, we will compute the final offset of the call arg vars.
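The final resolution step can be sketched as below. This is only an illustration under stated assumptions: `toy_resolve_call_args` and its parameters are hypothetical names, `param_area_start` stands for the offset just past all global and local vars, and `call_offset` is the position of this call's args relative to the start of the param area.

```c
/*
 * Hypothetical sketch: lay out the args of one call one after the other,
 * starting at call_offset inside the param area. The param area itself
 * begins at param_area_start, after the global and local var space.
 * Returns the offset just past the last arg.
 */
static int
toy_resolve_call_args (const int *arg_sizes, int nargs, int param_area_start, int call_offset, int *arg_offsets)
{
	int off = param_area_start + call_offset;

	for (int i = 0; i < nargs; i++) {
		arg_offsets [i] = off;
		off += arg_sizes [i];
	}
	return off;
}
```

Because `param_area_start` is only known once all normal local vars have offsets, the relative offsets have to be computed first and the final ones in a later step, as the description above states.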

* [interp] Improve dumping for call instructions

* [interp] Fix var type of valuetype this

* [interp] Re-enable copy propagation

* [interp] Rename MINT_NEWOBJ opcodes

* [interp] Disable tracking of offsets on the execution stack during codegen

They are no longer needed. We generate offsets for every var at the very end.

* [interp] Remove memmove of args during newobj

The offset allocator is allocating the vars at the right offset in the param area. We also used `push_types` to add the arguments back on the stack, which was allocating new vars for each argument. We no longer do this, so newobj_reg_map is not needed anymore.

* [interp] Re-enable inlining of constructors

For object ctors, MINT_NEWOBJ_INLINED allocates an object which will be used both as a `this` arg to the ctor as well as the return var from the newobj operation.

For valuetype ctors, we first need to inform the var offset allocator that the valuetype exists before the MINT_NEWOBJ_VT_INLINED invocation. That opcode takes the address of the valuetype, which is then used as the `this` arg to the inlined method. We also need to dummy use the valuetype so that it never dies before the ctor is inlined; otherwise `this` would point to garbage. We use this def/dummy_use mechanism in order to avoid promoting the valuetype to a global var, as happens with normal vars that have their address taken (via ldloca).

* [interp] Avoid optimization if newobj is guarded

MINT_NEWOBJ* should not store into a local if the ctor might throw, because we set the return value before the ctor starts executing, and a guarding clause can see this variable as being set.

* [interp] Refactor the active vars code a bit

* [interp] Add missing implicit conversion

When passing an argument or returning a value from a method, the stack contents do not necessarily match the signature type, in which case we add conversions.

* [interp] Disable test using excessive stack space

This test was exceeding the stack limit even before the new offset allocator; it was just not reported.

src/mono/mono/mini/interp/interp-internals.h
src/mono/mono/mini/interp/interp.c
src/mono/mono/mini/interp/mintops.def
src/mono/mono/mini/interp/mintops.h
src/mono/mono/mini/interp/transform.c
src/mono/mono/mini/interp/transform.h
src/tests/issues.targets

index 4414a83..2afac6e 100644
@@ -125,10 +125,7 @@ struct InterpMethod {
        MonoType **param_types;
        MonoJitInfo *jinfo;
 
-       // This doesn't include the size of stack locals
-       guint32 total_locals_size;
-       // The size of locals that map to the execution stack
-       guint32 stack_size;
+       guint32 locals_size;
        guint32 alloca_size;
        int num_clauses; // clauses
        int transformed; // boolean
index 89bd0be..2c6bcc9 100644
@@ -93,8 +93,6 @@ struct FrameClauseArgs {
        const guint16 *end_at_ip;
        /* When exiting this clause we also exit the frame */
        int exit_clause;
-       /* Exception that we are filtering */
-       MonoException *filter_exception;
        /* Frame that is executing this clause */
        InterpFrame *exec_frame;
 };
@@ -222,11 +220,12 @@ frame_data_allocator_pop (FrameDataAllocator *stack, InterpFrame *frame)
  *   Reinitialize a frame.
  */
 static void
-reinit_frame (InterpFrame *frame, InterpFrame *parent, InterpMethod *imethod, gpointer stack)
+reinit_frame (InterpFrame *frame, InterpFrame *parent, InterpMethod *imethod, gpointer retval, gpointer stack)
 {
        frame->parent = parent;
        frame->imethod = imethod;
        frame->stack = (stackval*)stack;
+       frame->retval = (stackval*)retval;
        frame->state.ip = NULL;
 }
 
@@ -1092,7 +1091,7 @@ ves_array_get (InterpFrame *frame, stackval *sp, stackval *retval, MonoMethodSig
 }
 
 static MonoException*
-ves_array_element_address (InterpFrame *frame, MonoClass *required_type, MonoArray *ao, stackval *sp, gboolean needs_typecheck)
+ves_array_element_address (InterpFrame *frame, MonoClass *required_type, MonoArray *ao, gpointer *ret, stackval *sp, gboolean needs_typecheck)
 {
        MonoClass *ac = ((MonoObject *) ao)->vtable->klass;
 
@@ -1105,7 +1104,7 @@ ves_array_element_address (InterpFrame *frame, MonoClass *required_type, MonoArr
        if (needs_typecheck && !mono_class_is_assignable_from_internal (m_class_get_element_class (mono_object_class ((MonoObject *) ao)), required_type))
                return mono_get_exception_array_type_mismatch ();
        gint32 esize = mono_array_element_size (ac);
-       sp [-1].data.p = mono_array_addr_with_size_fast (ao, esize, pos);
+       *ret = mono_array_addr_with_size_fast (ao, esize, pos);
        return NULL;
 }
 
@@ -1379,12 +1378,12 @@ retry:
                case MONO_TYPE_U8:
                case MONO_TYPE_VALUETYPE:
                case MONO_TYPE_GENERICINST:
-                       margs->retval = &frame->stack->data.p;
+                       margs->retval = (gpointer*)frame->retval;
                        margs->is_float_ret = 0;
                        break;
                case MONO_TYPE_R4:
                case MONO_TYPE_R8:
-                       margs->retval = &frame->stack->data.p;
+                       margs->retval = (gpointer*)frame->retval;
                        margs->is_float_ret = 1;
                        break;
                case MONO_TYPE_VOID:
@@ -1404,9 +1403,9 @@ interp_frame_arg_to_data (MonoInterpFrameHandle frame, MonoMethodSignature *sig,
        InterpFrame *iframe = (InterpFrame*)frame;
        InterpMethod *imethod = iframe->imethod;
 
-       // If index == -1, we finished executing an InterpFrame and the result is at the bottom of the stack.
+       // If index == -1, we finished executing an InterpFrame and the result is at retval.
        if (index == -1)
-               stackval_to_data (sig->ret, iframe->stack, data, TRUE);
+               stackval_to_data (sig->ret, iframe->retval, data, TRUE);
        else if (sig->hasthis && index == 0)
                *(gpointer*)data = iframe->stack->data.p;
        else
@@ -1421,7 +1420,7 @@ interp_data_to_frame_arg (MonoInterpFrameHandle frame, MonoMethodSignature *sig,
 
        // Get result from pinvoke call, put it directly on top of execution stack in the caller frame
        if (index == -1)
-               stackval_from_data (sig->ret, iframe->stack, data, TRUE);
+               stackval_from_data (sig->ret, iframe->retval, data, TRUE);
        else if (sig->hasthis && index == 0)
                iframe->stack->data.p = *(gpointer*)data;
        else
@@ -1435,7 +1434,7 @@ interp_frame_arg_to_storage (MonoInterpFrameHandle frame, MonoMethodSignature *s
        InterpMethod *imethod = iframe->imethod;
 
        if (index == -1)
-               return iframe->stack;
+               return iframe->retval;
        else
                return STACK_ADD_BYTES (iframe->stack, get_arg_offset (imethod, sig, index));
 }
@@ -1475,6 +1474,7 @@ ves_pinvoke_method (
        MonoFuncV addr,
        ThreadContext *context,
        InterpFrame *parent_frame,
+       stackval *ret_sp,
        stackval *sp,
        gboolean save_last_error,
        gpointer *cache)
@@ -1483,6 +1483,7 @@ ves_pinvoke_method (
        frame.parent = parent_frame;
        frame.imethod = imethod;
        frame.stack = sp;
+       frame.retval = ret_sp;
 
        MonoLMFExt ext;
        gpointer args;
@@ -1556,7 +1557,7 @@ ves_pinvoke_method (
 #else
        // Only the vt address has been returned, we need to copy the entire content on interp stack
        if (!context->has_resume_state && MONO_TYPE_ISSTRUCT (sig->ret))
-               stackval_from_data (sig->ret, frame.stack, (char*)frame.stack->data.p, sig->pinvoke);
+               stackval_from_data (sig->ret, frame.retval, (char*)frame.retval->data.p, sig->pinvoke);
 
        g_free (margs->iargs);
        g_free (margs->fargs);
@@ -1840,6 +1841,7 @@ interp_runtime_invoke (MonoMethod *method, void *obj, void **params, MonoObject
        InterpFrame frame = {0};
        frame.imethod = imethod;
        frame.stack = sp;
+       frame.retval = sp;
 
        // The method to execute might not be transformed yet, so we don't know how much stack
        // it uses. We bump the stack_pointer here so any code triggered by method compilation
@@ -1926,6 +1928,7 @@ interp_entry (InterpEntryData *data)
        InterpFrame frame = {0};
        frame.imethod = data->rmethod;
        frame.stack = sp;
+       frame.retval = sp;
 
        context->stack_pointer = (guchar*)sp_args;
 
@@ -1953,7 +1956,7 @@ interp_entry (InterpEntryData *data)
 }
 
 static void
-do_icall (MonoMethodSignature *sig, int op, stackval *sp, gpointer ptr, gboolean save_last_error)
+do_icall (MonoMethodSignature *sig, int op, stackval *ret_sp, stackval *sp, gpointer ptr, gboolean save_last_error)
 {
        if (save_last_error)
                mono_marshal_clear_last_error ();
@@ -1968,7 +1971,7 @@ do_icall (MonoMethodSignature *sig, int op, stackval *sp, gpointer ptr, gboolean
        case MINT_ICALL_V_P: {
                typedef gpointer (*T)(void);
                T func = (T)ptr;
-               sp [0].data.p = func ();
+               ret_sp->data.p = func ();
                break;
        }
        case MINT_ICALL_P_V: {
@@ -1980,7 +1983,7 @@ do_icall (MonoMethodSignature *sig, int op, stackval *sp, gpointer ptr, gboolean
        case MINT_ICALL_P_P: {
                typedef gpointer (*T)(gpointer);
                T func = (T)ptr;
-               sp [0].data.p = func (sp [0].data.p);
+               ret_sp->data.p = func (sp [0].data.p);
                break;
        }
        case MINT_ICALL_PP_V: {
@@ -1992,7 +1995,7 @@ do_icall (MonoMethodSignature *sig, int op, stackval *sp, gpointer ptr, gboolean
        case MINT_ICALL_PP_P: {
                typedef gpointer (*T)(gpointer,gpointer);
                T func = (T)ptr;
-               sp [0].data.p = func (sp [0].data.p, sp [1].data.p);
+               ret_sp->data.p = func (sp [0].data.p, sp [1].data.p);
                break;
        }
        case MINT_ICALL_PPP_V: {
@@ -2004,7 +2007,7 @@ do_icall (MonoMethodSignature *sig, int op, stackval *sp, gpointer ptr, gboolean
        case MINT_ICALL_PPP_P: {
                typedef gpointer (*T)(gpointer,gpointer,gpointer);
                T func = (T)ptr;
-               sp [0].data.p = func (sp [0].data.p, sp [1].data.p, sp [2].data.p);
+               ret_sp->data.p = func (sp [0].data.p, sp [1].data.p, sp [2].data.p);
                break;
        }
        case MINT_ICALL_PPPP_V: {
@@ -2016,7 +2019,7 @@ do_icall (MonoMethodSignature *sig, int op, stackval *sp, gpointer ptr, gboolean
        case MINT_ICALL_PPPP_P: {
                typedef gpointer (*T)(gpointer,gpointer,gpointer,gpointer);
                T func = (T)ptr;
-               sp [0].data.p = func (sp [0].data.p, sp [1].data.p, sp [2].data.p, sp [3].data.p);
+               ret_sp->data.p = func (sp [0].data.p, sp [1].data.p, sp [2].data.p, sp [3].data.p);
                break;
        }
        case MINT_ICALL_PPPPP_V: {
@@ -2028,7 +2031,7 @@ do_icall (MonoMethodSignature *sig, int op, stackval *sp, gpointer ptr, gboolean
        case MINT_ICALL_PPPPP_P: {
                typedef gpointer (*T)(gpointer,gpointer,gpointer,gpointer,gpointer);
                T func = (T)ptr;
-               sp [0].data.p = func (sp [0].data.p, sp [1].data.p, sp [2].data.p, sp [3].data.p, sp [4].data.p);
+               ret_sp->data.p = func (sp [0].data.p, sp [1].data.p, sp [2].data.p, sp [3].data.p, sp [4].data.p);
                break;
        }
        case MINT_ICALL_PPPPPP_V: {
@@ -2040,7 +2043,7 @@ do_icall (MonoMethodSignature *sig, int op, stackval *sp, gpointer ptr, gboolean
        case MINT_ICALL_PPPPPP_P: {
                typedef gpointer (*T)(gpointer,gpointer,gpointer,gpointer,gpointer,gpointer);
                T func = (T)ptr;
-               sp [0].data.p = func (sp [0].data.p, sp [1].data.p, sp [2].data.p, sp [3].data.p, sp [4].data.p, sp [5].data.p);
+               ret_sp->data.p = func (sp [0].data.p, sp [1].data.p, sp [2].data.p, sp [3].data.p, sp [4].data.p, sp [5].data.p);
                break;
        }
        default:
@@ -2052,7 +2055,7 @@ do_icall (MonoMethodSignature *sig, int op, stackval *sp, gpointer ptr, gboolean
 
        /* convert the native representation to the stackval representation */
        if (sig)
-               stackval_from_data (sig->ret, &sp [0], (char*) &sp [0].data.p, sig->pinvoke);
+               stackval_from_data (sig->ret, ret_sp, (char*) &ret_sp->data.p, sig->pinvoke);
 }
 
 /* MONO_NO_OPTIMIZATION is needed due to usage of INTERP_PUSH_LMF_WITH_CTX. */
@@ -2061,12 +2064,12 @@ do_icall (MonoMethodSignature *sig, int op, stackval *sp, gpointer ptr, gboolean
 #endif
 // Do not inline in case order of frame addresses matters, and maybe other reasons.
 static MONO_NO_OPTIMIZATION MONO_NEVER_INLINE gpointer
-do_icall_wrapper (InterpFrame *frame, MonoMethodSignature *sig, int op, stackval *sp, gpointer ptr, gboolean save_last_error)
+do_icall_wrapper (InterpFrame *frame, MonoMethodSignature *sig, int op, stackval *ret_sp, stackval *sp, gpointer ptr, gboolean save_last_error)
 {
        MonoLMFExt ext;
        INTERP_PUSH_LMF_WITH_CTX (frame, ext, exit_icall);
 
-       do_icall (sig, op, sp, ptr, save_last_error);
+       do_icall (sig, op, ret_sp, sp, ptr, save_last_error);
 
        interp_pop_lmf (&ext);
 
@@ -2264,7 +2267,7 @@ init_jit_call_info (InterpMethod *rmethod, MonoError *error)
 }
 
 static MONO_NEVER_INLINE void
-do_jit_call (stackval *sp, InterpFrame *frame, InterpMethod *rmethod, MonoError *error)
+do_jit_call (stackval *ret_sp, stackval *sp, InterpFrame *frame, InterpMethod *rmethod, MonoError *error)
 {
        MonoLMFExt ext;
        JitCallInfo *cinfo;
@@ -2293,7 +2296,7 @@ do_jit_call (stackval *sp, InterpFrame *frame, InterpMethod *rmethod, MonoError
        }
        /* return address */
        if (cinfo->ret_mt != -1)
-               args [pindex ++] = sp;
+               args [pindex ++] = ret_sp;
        for (int i = 0; i < rmethod->param_count; ++i) {
                stackval *sval = STACK_ADD_BYTES (sp, get_arg_offset_fast (rmethod, stack_index + i));
                if (cinfo->arginfo [i] == JIT_ARG_BYVAL)
@@ -2330,16 +2333,16 @@ do_jit_call (stackval *sp, InterpFrame *frame, InterpMethod *rmethod, MonoError
                //  Sign/zero extend if necessary
                switch (cinfo->ret_mt) {
                case MINT_TYPE_I1:
-                       sp->data.i = *(gint8*)sp;
+                       ret_sp->data.i = *(gint8*)sp;
                        break;
                case MINT_TYPE_U1:
-                       sp->data.i = *(guint8*)sp;
+                       ret_sp->data.i = *(guint8*)sp;
                        break;
                case MINT_TYPE_I2:
-                       sp->data.i = *(gint16*)sp;
+                       ret_sp->data.i = *(gint16*)sp;
                        break;
                case MINT_TYPE_U2:
-                       sp->data.i = *(guint16*)sp;
+                       ret_sp->data.i = *(guint16*)sp;
                        break;
                case MINT_TYPE_I4:
                case MINT_TYPE_I8:
@@ -2347,7 +2350,7 @@ do_jit_call (stackval *sp, InterpFrame *frame, InterpMethod *rmethod, MonoError
                case MINT_TYPE_R8:
                case MINT_TYPE_VT:
                case MINT_TYPE_O:
-                       /* The result was written to sp */
+                       /* The result was written to ret_sp */
                        break;
                default:
                        g_assert_not_reached ();
@@ -2593,6 +2596,7 @@ interp_entry_from_trampoline (gpointer ccontext_untyped, gpointer rmethod_untype
        InterpFrame frame = {0};
        frame.imethod = rmethod;
        frame.stack = sp;
+       frame.retval = sp;
 
        /* Copy the args saved in the trampoline to the frame stack */
        gpointer retp = mono_arch_get_native_call_context_args (ccontext, &frame, sig);
@@ -3007,7 +3011,7 @@ mono_interp_leave (InterpFrame* parent_frame)
         * to check the abort threshold. For this to work we use frame as a
         * dummy frame that is stored in the lmf and serves as the transition frame
         */
-       do_icall_wrapper (&frame, NULL, MINT_ICALL_V_P, &tmp_sp, (gpointer)mono_thread_get_undeniable_exception, FALSE);
+       do_icall_wrapper (&frame, NULL, MINT_ICALL_V_P, &tmp_sp, &tmp_sp, (gpointer)mono_thread_get_undeniable_exception, FALSE);
 
        return (MonoException*)tmp_sp.data.p;
 }
@@ -3114,6 +3118,7 @@ interp_exec_method (InterpFrame *frame, ThreadContext *context, FrameClauseArgs
        const guint16 *ip = NULL;
        unsigned char *locals = NULL;
        int call_args_offset;
+       int return_offset;
 
 #if DEBUG_INTERP
        int tracing = global_tracing;
@@ -3161,11 +3166,6 @@ interp_exec_method (InterpFrame *frame, ThreadContext *context, FrameClauseArgs
 
        INIT_INTERP_STATE (frame, clause_args);
 
-       if (clause_args && clause_args->filter_exception) {
-               // Write the exception on to the first slot on the excecution stack
-               LOCAL_VAR (frame->imethod->total_locals_size, MonoException*) = clause_args->filter_exception;
-       }
-
 #ifdef ENABLE_EXPERIMENT_TIERED
        mini_tiered_inc (frame->imethod->method, &frame->imethod->tiered_counter, 0);
 #endif
@@ -3193,6 +3193,8 @@ main_loop:
                        MINT_IN_BREAK;
                MINT_IN_CASE(MINT_NOP)
                MINT_IN_CASE(MINT_NIY)
+               MINT_IN_CASE(MINT_DEF)
+               MINT_IN_CASE(MINT_DUMMY_USE)
                        g_assert_not_reached ();
                        MINT_IN_BREAK;
                MINT_IN_CASE(MINT_BREAK)
@@ -3208,10 +3210,10 @@ main_loop:
                        ip += 2;
                        MINT_IN_BREAK;
                MINT_IN_CASE(MINT_INIT_ARGLIST) {
-                       const guint16 *call_ip = frame->parent->state.ip - 5;
+                       const guint16 *call_ip = frame->parent->state.ip - 6;
                        g_assert_checked (*call_ip == MINT_CALL_VARARG);
-                       int params_stack_size = call_ip [4];
-                       MonoMethodSignature *sig = (MonoMethodSignature*)frame->parent->imethod->data_items [call_ip [3]];
+                       int params_stack_size = call_ip [5];
+                       MonoMethodSignature *sig = (MonoMethodSignature*)frame->parent->imethod->data_items [call_ip [4]];
 
                        // we are being overly conservative with the size here, for simplicity
                        gpointer arglist = frame_data_allocator_alloc (&context->data_stack, frame, params_stack_size + MINT_STACK_SLOT_SIZE);
@@ -3307,9 +3309,10 @@ main_loop:
                }
                MINT_IN_CASE(MINT_CALL_DELEGATE) {
                        // FIXME We don't need to encode the whole signature, just param_count
-                       MonoMethodSignature *csignature = (MonoMethodSignature*)frame->imethod->data_items [ip [3]];
+                       MonoMethodSignature *csignature = (MonoMethodSignature*)frame->imethod->data_items [ip [4]];
                        int param_count = csignature->param_count;
-                       call_args_offset = ip [1];
+                       return_offset = ip [1];
+                       call_args_offset = ip [2];
                        MonoDelegate *del = LOCAL_VAR (call_args_offset, MonoDelegate*);
                        gboolean is_multicast = del->method == NULL;
                        InterpMethod *del_imethod = (InterpMethod*)del->interp_invoke_impl;
@@ -3351,31 +3354,31 @@ main_loop:
                                        // Target method is static but the delegate has a target object. We handle
                                        // this separately from the case below, because, for these calls, the instance
                                        // is allowed to be null.
-                                       LOCAL_VAR (ip [1], MonoObject*) = del->target;
+                                       LOCAL_VAR (call_args_offset, MonoObject*) = del->target;
                                } else if (del->target) {
                                        MonoObject *this_arg = del->target;
 
                                        // replace the MonoDelegate* on the stack with 'this' pointer
                                        if (m_class_is_valuetype (this_arg->vtable->klass)) {
                                                gpointer unboxed = mono_object_unbox_internal (this_arg);
-                                               LOCAL_VAR (ip [1], gpointer) = unboxed;
+                                               LOCAL_VAR (call_args_offset, gpointer) = unboxed;
                                        } else {
-                                               LOCAL_VAR (ip [1], MonoObject*) = this_arg;
+                                               LOCAL_VAR (call_args_offset, MonoObject*) = this_arg;
                                        }
                                } else {
                                        // skip the delegate pointer for static calls
                                        // FIXME we could avoid memmove
-                                       memmove (locals + call_args_offset, locals + call_args_offset + MINT_STACK_SLOT_SIZE, ip [2]);
+                                       memmove (locals + call_args_offset, locals + call_args_offset + MINT_STACK_SLOT_SIZE, ip [3]);
                                }
                        }
-                       ip += 4;
+                       ip += 5;
 
                        goto call;
                }
                MINT_IN_CASE(MINT_CALLI) {
                        MonoMethodSignature *csignature;
 
-                       csignature = (MonoMethodSignature*)frame->imethod->data_items [ip [3]];
+                       csignature = (MonoMethodSignature*)frame->imethod->data_items [ip [4]];
 
                        cmethod = LOCAL_VAR (ip [2], InterpMethod*);
                        if (cmethod->method->flags & METHOD_ATTRIBUTE_PINVOKE_IMPL) {
@@ -3383,7 +3386,8 @@ main_loop:
                                mono_interp_error_cleanup (error); /* FIXME: don't swallow the error */
                        }
 
-                       call_args_offset = ip [1];
+                       return_offset = ip [1];
+                       call_args_offset = ip [3];
 
                        if (csignature->hasthis) {
                                MonoObject *this_arg = LOCAL_VAR (call_args_offset, MonoObject*); 
@@ -3393,66 +3397,69 @@ main_loop:
                                        LOCAL_VAR (call_args_offset, gpointer) = unboxed;
                                }
                        }
-                       ip += 4;
+                       ip += 5;
 
                        goto call;
                }
                MINT_IN_CASE(MINT_CALLI_NAT_FAST) {
-                       MonoMethodSignature *csignature = (MonoMethodSignature*)frame->imethod->data_items [ip [2]];
-                       int opcode = ip [3];
-                       gboolean save_last_error = ip [4];
+                       MonoMethodSignature *csignature = (MonoMethodSignature*)frame->imethod->data_items [ip [4]];
+                       int opcode = ip [5];
+                       gboolean save_last_error = ip [6];
 
-                       stackval *args = (stackval*)(locals + ip [1]);
-                       gpointer target_ip = args [csignature->param_count].data.p;
+                       stackval *ret = (stackval*)(locals + ip [1]);
+                       gpointer target_ip = LOCAL_VAR (ip [2], gpointer);
+                       stackval *args = (stackval*)(locals + ip [3]);
                        /* for calls, have ip pointing at the start of next instruction */
-                       frame->state.ip = ip + 5;
+                       frame->state.ip = ip + 7;
 
-                       do_icall_wrapper (frame, csignature, opcode, args, target_ip, save_last_error);
+                       do_icall_wrapper (frame, csignature, opcode, ret, args, target_ip, save_last_error);
                        EXCEPTION_CHECKPOINT_GC_UNSAFE;
                        CHECK_RESUME_STATE (context);
-                       ip += 5;
+                       ip += 7;
                        MINT_IN_BREAK;
                }
                MINT_IN_CASE(MINT_CALLI_NAT_DYNAMIC) {
-                       MonoMethodSignature* csignature = (MonoMethodSignature*)frame->imethod->data_items [ip [3]];
+                       MonoMethodSignature* csignature = (MonoMethodSignature*)frame->imethod->data_items [ip [4]];
 
-                       call_args_offset = ip [1];
+                       return_offset = ip [1];
                        guchar* code = LOCAL_VAR (ip [2], guchar*);
+                       call_args_offset = ip [3];
 
                        cmethod = mono_interp_get_native_func_wrapper (frame->imethod, csignature, code);
 
-                       ip += 4;
+                       ip += 5;
                        goto call;
                }
                MINT_IN_CASE(MINT_CALLI_NAT) {
-                       MonoMethodSignature *csignature = (MonoMethodSignature*)frame->imethod->data_items [ip [3]];
-                       InterpMethod *imethod = (InterpMethod*)frame->imethod->data_items [ip [4]];
+                       MonoMethodSignature *csignature = (MonoMethodSignature*)frame->imethod->data_items [ip [4]];
+                       InterpMethod *imethod = (InterpMethod*)frame->imethod->data_items [ip [5]];
 
                        guchar *code = LOCAL_VAR (ip [2], guchar*);
 
-                       gboolean save_last_error = ip [5];
-                       gpointer *cache = (gpointer*)&frame->imethod->data_items [ip [6]];
+                       gboolean save_last_error = ip [6];
+                       gpointer *cache = (gpointer*)&frame->imethod->data_items [ip [7]];
                        /* for calls, have ip pointing at the start of next instruction */
-                       frame->state.ip = ip + 7;
-                       ves_pinvoke_method (imethod, csignature, (MonoFuncV)code, context, frame, (stackval*)(locals + ip [1]), save_last_error, cache);
+                       frame->state.ip = ip + 8;
+                       ves_pinvoke_method (imethod, csignature, (MonoFuncV)code, context, frame, (stackval*)(locals + ip [1]), (stackval*)(locals + ip [3]), save_last_error, cache);
 
                        EXCEPTION_CHECKPOINT_GC_UNSAFE;
                        CHECK_RESUME_STATE (context);
 
-                       ip += 7;
+                       ip += 8;
                        MINT_IN_BREAK;
                }
                MINT_IN_CASE(MINT_CALLVIRT_FAST) {
                        MonoObject *this_arg;
                        int slot;
 
-                       cmethod = (InterpMethod*)frame->imethod->data_items [ip [2]];
-                       call_args_offset = ip [1];
+                       cmethod = (InterpMethod*)frame->imethod->data_items [ip [3]];
+                       return_offset = ip [1];
+                       call_args_offset = ip [2];
 
                        this_arg = LOCAL_VAR (call_args_offset, MonoObject*);
 
-                       slot = (gint16)ip [3];
-                       ip += 4;
+                       slot = (gint16)ip [4];
+                       ip += 5;
                        cmethod = get_virtual_method_fast (cmethod, this_arg->vtable, slot);
                        if (m_class_is_valuetype (this_arg->vtable->klass) && m_class_is_valuetype (cmethod->method->klass)) {
                                /* unbox */
@@ -3482,7 +3489,7 @@ main_loop:
                        } else if (code_type == IMETHOD_CODE_COMPILED) {
                                frame->state.ip = ip;
                                error_init_reuse (error);
-                               do_jit_call ((stackval*)(locals + call_args_offset), frame, cmethod, error);
+                               do_jit_call ((stackval*)(locals + return_offset), (stackval*)(locals + call_args_offset), frame, cmethod, error);
                                if (!is_ok (error)) {
                                        MonoException *ex = mono_error_convert_to_exception (error);
                                        THROW_EX (ex, ip);
@@ -3494,18 +3501,20 @@ main_loop:
                        MINT_IN_BREAK;
                }
                MINT_IN_CASE(MINT_CALL_VARARG) {
-                       // Same as MINT_CALL, except at ip [3] we have the index for the csignature,
+                       // Same as MINT_CALL, except at ip [4] we have the index for the csignature,
                        // which is required by the called method to set up the arglist.
-                       cmethod = (InterpMethod*)frame->imethod->data_items [ip [2]];
-                       call_args_offset = ip [1];
-                       ip += 5;
+                       cmethod = (InterpMethod*)frame->imethod->data_items [ip [3]];
+                       return_offset = ip [1];
+                       call_args_offset = ip [2];
+                       ip += 6;
                        goto call;
                }
 
                MINT_IN_CASE(MINT_CALLVIRT) {
                        // FIXME CALLVIRT opcodes are not used on netcore. We should kill them.
-                       cmethod = (InterpMethod*)frame->imethod->data_items [ip [2]];
-                       call_args_offset = ip [1];
+                       cmethod = (InterpMethod*)frame->imethod->data_items [ip [3]];
+                       return_offset = ip [1];
+                       call_args_offset = ip [2];
 
                        MonoObject *this_arg = LOCAL_VAR (call_args_offset, MonoObject*);
 
@@ -3519,18 +3528,19 @@ main_loop:
 #ifdef ENABLE_EXPERIMENT_TIERED
                        ip += 5;
 #else
-                       ip += 3;
+                       ip += 4;
 #endif
                        goto call;
                }
                MINT_IN_CASE(MINT_CALL) {
-                       cmethod = (InterpMethod*)frame->imethod->data_items [ip [2]];
-                       call_args_offset = ip [1];
+                       cmethod = (InterpMethod*)frame->imethod->data_items [ip [3]];
+                       return_offset = ip [1];
+                       call_args_offset = ip [2];
 
 #ifdef ENABLE_EXPERIMENT_TIERED
                        ip += 5;
 #else
-                       ip += 3;
+                       ip += 4;
 #endif
 call:
                        /*
@@ -3548,7 +3558,7 @@ call:
                                        // Not free currently, but will be when allocation attempted.
                                        frame->next_free = child_frame;
                                }
-                               reinit_frame (child_frame, frame, cmethod, locals + call_args_offset);
+                               reinit_frame (child_frame, frame, cmethod, locals + return_offset, locals + call_args_offset);
                                frame = child_frame;
                        }
                        if (method_entry (context, frame,
@@ -3570,18 +3580,18 @@ call:
                        MINT_IN_BREAK;
                }
                MINT_IN_CASE(MINT_JIT_CALL) {
-                       InterpMethod *rmethod = (InterpMethod*)frame->imethod->data_items [ip [2]];
+                       InterpMethod *rmethod = (InterpMethod*)frame->imethod->data_items [ip [3]];
                        error_init_reuse (error);
                        /* for calls, have ip pointing at the start of next instruction */
-                       frame->state.ip = ip + 3;
-                       do_jit_call ((stackval*)(locals + ip [1]), frame, rmethod, error);
+                       frame->state.ip = ip + 4;
+                       do_jit_call ((stackval*)(locals + ip [1]), (stackval*)(locals + ip [2]), frame, rmethod, error);
                        if (!is_ok (error)) {
                                MonoException *ex = mono_error_convert_to_exception (error);
                                THROW_EX (ex, ip);
                        }
 
                        CHECK_RESUME_STATE (context);
-                       ip += 3;
+                       ip += 4;
 
                        MINT_IN_BREAK;
                }
@@ -3612,23 +3622,23 @@ call:
                        MINT_IN_BREAK;
                }
                MINT_IN_CASE(MINT_RET)
-                       frame->stack [0] = LOCAL_VAR (ip [1], stackval);
+                       frame->retval [0] = LOCAL_VAR (ip [1], stackval);
                        goto exit_frame;
                MINT_IN_CASE(MINT_RET_VOID)
                        goto exit_frame;
                MINT_IN_CASE(MINT_RET_VT) {
-                       memmove (frame->stack, locals + ip [1], ip [2]);
+                       memmove (frame->retval, locals + ip [1], ip [2]);
                        goto exit_frame;
                }
                MINT_IN_CASE(MINT_RET_LOCALLOC)
-                       frame->stack [0] = LOCAL_VAR (ip [1], stackval);
+                       frame->retval [0] = LOCAL_VAR (ip [1], stackval);
                        frame_data_allocator_pop (&context->data_stack, frame);
                        goto exit_frame;
                MINT_IN_CASE(MINT_RET_VOID_LOCALLOC)
                        frame_data_allocator_pop (&context->data_stack, frame);
                        goto exit_frame;
                MINT_IN_CASE(MINT_RET_VT_LOCALLOC) {
-                       memmove (frame->stack, locals + ip [1], ip [2]);
+                       memmove (frame->retval, locals + ip [1], ip [2]);
                        frame_data_allocator_pop (&context->data_stack, frame);
                        goto exit_frame;
                }
@@ -4705,24 +4715,21 @@ call:
                }
                MINT_IN_CASE(MINT_NEWOBJ_ARRAY) {
                        MonoClass *newobj_class;
-                       guint32 token = ip [2];
-                       guint16 param_count = ip [3];
+                       guint32 token = ip [3];
+                       guint16 param_count = ip [4];
 
                        newobj_class = (MonoClass*) frame->imethod->data_items [token];
 
-                       LOCAL_VAR (ip [1], MonoObject*) = ves_array_create (newobj_class, param_count, (stackval*)(locals + ip [1]), error);
+                       LOCAL_VAR (ip [1], MonoObject*) = ves_array_create (newobj_class, param_count, (stackval*)(locals + ip [2]), error);
                        if (!is_ok (error))
                                THROW_EX (mono_error_convert_to_exception (error), ip);
-                       ip += 4;
+                       ip += 5;
                        MINT_IN_BREAK;
                }
                MINT_IN_CASE(MINT_NEWOBJ_STRING) {
-                       cmethod = (InterpMethod*)frame->imethod->data_items [ip [2]];
-                       call_args_offset = ip [1];
-
-                       int param_size = ip [3];
-                       if (param_size)
-                               memmove (locals + call_args_offset + MINT_STACK_SLOT_SIZE, locals + call_args_offset, param_size);
+                       cmethod = (InterpMethod*)frame->imethod->data_items [ip [3]];
+                       return_offset = ip [1];
+                       call_args_offset = ip [2];
 
                        // `this` is implicit null. The created string will be returned
                        // by the call, even though the call has void return (?!).
@@ -4730,17 +4737,12 @@ call:
                        ip += 4;
                        goto call;
                }
-               MINT_IN_CASE(MINT_NEWOBJ_FAST) {
-                       MonoVTable *vtable = (MonoVTable*) frame->imethod->data_items [ip [3]];
+               MINT_IN_CASE(MINT_NEWOBJ) {
+                       MonoVTable *vtable = (MonoVTable*) frame->imethod->data_items [ip [4]];
                        INIT_VTABLE (vtable);
-                       guint16 imethod_index = ip [2];
-                       guint16 param_size = ip [4];
-                       call_args_offset = ip [1];
-                       const gboolean is_inlined = imethod_index == INLINED_METHOD_FLAG;
-
-                       // Make room for two copies of o -- this parameter and return value.
-                       if (param_size)
-                               memmove (locals + call_args_offset + 2 * MINT_STACK_SLOT_SIZE, locals + call_args_offset, param_size);
+                       guint16 imethod_index = ip [3];
+                       return_offset = ip [1];
+                       call_args_offset = ip [2];
 
                        MonoObject *o = mono_gc_alloc_obj (vtable, m_class_get_instance_size (vtable->klass));
                        if (G_UNLIKELY (!o)) {
@@ -4749,51 +4751,63 @@ call:
                        }
 
                        // This is return value
-                       LOCAL_VAR (call_args_offset, MonoObject*) = o;
+                       LOCAL_VAR (return_offset, MonoObject*) = o;
                        // Set `this` arg for ctor call
-                       call_args_offset += MINT_STACK_SLOT_SIZE;
                        LOCAL_VAR (call_args_offset, MonoObject*) = o;
-                       ip += 6;
-                       if (!is_inlined) {
-                               cmethod = (InterpMethod*)frame->imethod->data_items [imethod_index];
-                               goto call;
-                       }
+                       ip += 5;
+
+                       cmethod = (InterpMethod*)frame->imethod->data_items [imethod_index];
+                       goto call;
                        MINT_IN_BREAK;
                }
+               MINT_IN_CASE(MINT_NEWOBJ_INLINED) {
+                       MonoVTable *vtable = (MonoVTable*) frame->imethod->data_items [ip [2]];
+                       INIT_VTABLE (vtable);
 
-               MINT_IN_CASE(MINT_NEWOBJ_VT_FAST) {
-                       guint16 imethod_index = ip [2];
-                       guint16 ret_size = ip [3];
-                       guint16 param_size = ip [4];
-                       gboolean is_inlined = imethod_index == INLINED_METHOD_FLAG;
-                       call_args_offset = ip [1];
-                       gpointer this_vt = locals + call_args_offset;
+                       MonoObject *o = mono_gc_alloc_obj (vtable, m_class_get_instance_size (vtable->klass));
+                       if (G_UNLIKELY (!o)) {
+                               mono_error_set_out_of_memory (error, "Could not allocate %i bytes", m_class_get_instance_size (vtable->klass));
+                               THROW_EX (mono_error_convert_to_exception (error), ip);
+                       }
 
-                       if (param_size)
-                               memmove (locals + call_args_offset + ret_size + MINT_STACK_SLOT_SIZE, locals + call_args_offset, param_size);
+                       // This is return value
+                       LOCAL_VAR (ip [1], MonoObject*) = o;
+                       ip += 3;
+                       MINT_IN_BREAK;
+               }
+
+               MINT_IN_CASE(MINT_NEWOBJ_VT) {
+                       guint16 imethod_index = ip [3];
+                       guint16 ret_size = ip [4];
+                       return_offset = ip [1];
+                       call_args_offset = ip [2];
+                       gpointer this_vt = locals + return_offset;
 
                        // clear the valuetype
                        memset (this_vt, 0, ret_size);
-                       call_args_offset += ret_size;
                        // pass the address of the valuetype
                        LOCAL_VAR (call_args_offset, gpointer) = this_vt;
+                       ip += 5;
 
-                       ip += 6;
-                       if (!is_inlined) {
-                               cmethod = (InterpMethod*)frame->imethod->data_items [imethod_index];
-                               goto call;
-                       }
+                       cmethod = (InterpMethod*)frame->imethod->data_items [imethod_index];
+                       goto call;
                        MINT_IN_BREAK;
                }
-               MINT_IN_CASE(MINT_NEWOBJ) {
-                       guint32 const token = ip [2];
-                       guint16 param_size = ip [3];
-                       call_args_offset = ip [1];
+               MINT_IN_CASE(MINT_NEWOBJ_VT_INLINED) {
+                       guint16 ret_size = ip [3];
+                       gpointer this_vt = locals + ip [2];
 
-                       cmethod = (InterpMethod*)frame->imethod->data_items [token];
+                       memset (this_vt, 0, ret_size);
+                       LOCAL_VAR (ip [1], gpointer) = this_vt;
+                       ip += 4;
+                       MINT_IN_BREAK;
+               }
+               MINT_IN_CASE(MINT_NEWOBJ_SLOW) {
+                       guint32 const token = ip [3];
+                       return_offset = ip [1];
+                       call_args_offset = ip [2];
 
-                       if (param_size)
-                               memmove (locals + call_args_offset + 2 * MINT_STACK_SLOT_SIZE, locals + call_args_offset, param_size);
+                       cmethod = (InterpMethod*)frame->imethod->data_items [token];
 
                        MonoClass * const newobj_class = cmethod->method->klass;
 
@@ -4812,8 +4826,7 @@ call:
                        }
                        error_init_reuse (error);
                        MonoObject* o = mono_object_new_checked (newobj_class, error);
-                       LOCAL_VAR (call_args_offset, MonoObject*) = o; // return value
-                       call_args_offset += MINT_STACK_SLOT_SIZE;
+                       LOCAL_VAR (return_offset, MonoObject*) = o; // return value
                        LOCAL_VAR (call_args_offset, MonoObject*) = o; // first parameter
 
                        mono_interp_error_cleanup (error); // FIXME: do not swallow the error
@@ -5509,9 +5522,9 @@ call:
                        MINT_IN_BREAK;
                }
                MINT_IN_CASE(MINT_LDELEMA) {
-                       guint16 rank = ip [2];
-                       guint16 esize = ip [3];
-                       stackval *sp = (stackval*)(locals + ip [1]);
+                       guint16 rank = ip [3];
+                       guint16 esize = ip [4];
+                       stackval *sp = (stackval*)(locals + ip [2]);
 
                        MonoArray *ao = (MonoArray*) sp [0].data.o;
                        NULL_CHECK (ao);
@@ -5527,21 +5540,21 @@ call:
                                pos = (pos * len) + (guint32)(idx - lower);
                        }
 
-                       sp [0].data.p = mono_array_addr_with_size_fast (ao, esize, pos);
-                       ip += 4;
+                       LOCAL_VAR (ip [1], gpointer) = mono_array_addr_with_size_fast (ao, esize, pos);
+                       ip += 5;
                        MINT_IN_BREAK;
                }
                MINT_IN_CASE(MINT_LDELEMA_TC) {
-                       stackval *sp = (stackval*)(locals + ip [1]);
+                       stackval *sp = (stackval*)(locals + ip [2]);
 
                        MonoObject *o = (MonoObject*) sp [0].data.o;
                        NULL_CHECK (o);
 
-                       MonoClass *klass = (MonoClass*)frame->imethod->data_items [ip [2]];
-                       MonoException *ex = ves_array_element_address (frame, klass, (MonoArray *) o, sp + 1, TRUE);
+                       MonoClass *klass = (MonoClass*)frame->imethod->data_items [ip [3]];
+                       MonoException *ex = ves_array_element_address (frame, klass, (MonoArray *) o, (gpointer*)(locals + ip [1]), sp + 1, TRUE);
                        if (ex)
                                THROW_EX (ex, ip);
-                       ip += 3;
+                       ip += 4;
                        MINT_IN_BREAK;
                }
 
@@ -6099,25 +6112,31 @@ call:
                        MINT_IN_BREAK;
                }
                MINT_IN_CASE(MINT_ICALL_V_V) 
-               MINT_IN_CASE(MINT_ICALL_V_P)
                MINT_IN_CASE(MINT_ICALL_P_V) 
-               MINT_IN_CASE(MINT_ICALL_P_P)
                MINT_IN_CASE(MINT_ICALL_PP_V)
-               MINT_IN_CASE(MINT_ICALL_PP_P)
                MINT_IN_CASE(MINT_ICALL_PPP_V)
-               MINT_IN_CASE(MINT_ICALL_PPP_P)
                MINT_IN_CASE(MINT_ICALL_PPPP_V)
-               MINT_IN_CASE(MINT_ICALL_PPPP_P)
                MINT_IN_CASE(MINT_ICALL_PPPPP_V)
-               MINT_IN_CASE(MINT_ICALL_PPPPP_P)
                MINT_IN_CASE(MINT_ICALL_PPPPPP_V)
-               MINT_IN_CASE(MINT_ICALL_PPPPPP_P)
                        frame->state.ip = ip + 3;
-                       do_icall_wrapper (frame, NULL, *ip, (stackval*)(locals + ip [1]), frame->imethod->data_items [ip [2]], FALSE);
+                       do_icall_wrapper (frame, NULL, *ip, NULL, (stackval*)(locals + ip [1]), frame->imethod->data_items [ip [2]], FALSE);
                        EXCEPTION_CHECKPOINT_GC_UNSAFE;
                        CHECK_RESUME_STATE (context);
                        ip += 3;
                        MINT_IN_BREAK;
+               MINT_IN_CASE(MINT_ICALL_V_P)
+               MINT_IN_CASE(MINT_ICALL_P_P)
+               MINT_IN_CASE(MINT_ICALL_PP_P)
+               MINT_IN_CASE(MINT_ICALL_PPP_P)
+               MINT_IN_CASE(MINT_ICALL_PPPP_P)
+               MINT_IN_CASE(MINT_ICALL_PPPPP_P)
+               MINT_IN_CASE(MINT_ICALL_PPPPPP_P)
+                       frame->state.ip = ip + 4;
+                       do_icall_wrapper (frame, NULL, *ip, (stackval*)(locals + ip [1]), (stackval*)(locals + ip [2]), frame->imethod->data_items [ip [3]], FALSE);
+                       EXCEPTION_CHECKPOINT_GC_UNSAFE;
+                       CHECK_RESUME_STATE (context);
+                       ip += 4;
+                       MINT_IN_BREAK;
                MINT_IN_CASE(MINT_MONO_LDPTR) 
                        LOCAL_VAR (ip [1], gpointer) = frame->imethod->data_items [ip [2]];
                        ip += 3;
@@ -6406,9 +6425,9 @@ call:
                        int i32 = READ32 (ip + 3);
                        if (i32 == -1) {
                        } else if (i32) {
-                               memmove (frame->stack, locals + ip [1], i32);
+                               memmove (frame->retval, locals + ip [1], i32);
                        } else {
-                               frame->stack [0] = LOCAL_VAR (ip [1], stackval);
+                               frame->retval [0] = LOCAL_VAR (ip [1], stackval);
                        }
 
                        if ((flag & TRACING_FLAG) || ((flag & PROFILING_FLAG) && MONO_PROFILER_ENABLED (method_leave) &&
@@ -6417,7 +6436,7 @@ call:
                                prof_ctx->interp_frame = frame;
                                prof_ctx->method = frame->imethod->method;
                                if (i32 != -1)
-                                       prof_ctx->return_value = frame->stack;
+                                       prof_ctx->return_value = frame->retval;
                                if (flag & TRACING_FLAG)
                                        mono_trace_leave_method (frame->imethod->method, frame->imethod->jinfo, prof_ctx);
                                if (flag & PROFILING_FLAG)
@@ -6691,8 +6710,6 @@ resume:
                        /* spec says stack should be empty at endfinally so it should be at the start too */
                        locals = (guchar*)frame->stack;
                        g_assert (context->exc_gchandle);
-                       // Write the exception on to the first slot on the excecution stack
-                       LOCAL_VAR (frame->imethod->total_locals_size, MonoObject*) = mono_gchandle_get_target_internal (context->exc_gchandle);
 
                        clear_resume_state (context);
                        // goto main_loop instead of MINT_IN_DISPATCH helps the compiler and therefore conserves stack.
@@ -6875,19 +6892,20 @@ interp_run_filter (StackFrameInfo *frame, MonoException *ex, int clause_index, g
        child_frame.retval = &retval;
 
        /* Copy the stack frame of the original method */
-       memcpy (child_frame.stack, iframe->stack, iframe->imethod->total_locals_size);
+       memcpy (child_frame.stack, iframe->stack, iframe->imethod->locals_size);
+       // Write the exception object in its reserved stack slot
+       *((MonoException**)((char*)child_frame.stack + iframe->imethod->clause_data_offsets [clause_index])) = ex;
        context->stack_pointer += iframe->imethod->alloca_size;
 
        memset (&clause_args, 0, sizeof (FrameClauseArgs));
        clause_args.start_with_ip = (const guint16*)handler_ip;
        clause_args.end_at_ip = (const guint16*)handler_ip_end;
-       clause_args.filter_exception = ex;
        clause_args.exec_frame = &child_frame;
 
        interp_exec_method (&child_frame, context, &clause_args);
 
        /* Copy back the updated frame */
-       memcpy (iframe->stack, child_frame.stack, iframe->imethod->total_locals_size);
+       memcpy (iframe->stack, child_frame.stack, iframe->imethod->locals_size);
 
        context->stack_pointer = (guchar*)child_frame.stack;
 
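The handler changes in this file follow one pattern: call-style opcodes now read the return offset from ip [1] and the call args offset from ip [2], and the remaining operands shift up by one word. Below is a minimal sketch of that decoding. All names (`demo_call`, `demo_add`, `demo_read_i32`, `DEMO_SLOT_SIZE`) are hypothetical and the callee is a toy; this is not the interpreter's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical slot size; stands in for MINT_STACK_SLOT_SIZE. */
#define DEMO_SLOT_SIZE 8

/* Toy callee: reads two int32 args laid out one slot apart, writes the sum to ret. */
static void
demo_add (unsigned char *ret, const unsigned char *args)
{
	int32_t a, b, sum;
	memcpy (&a, args, sizeof (a));
	memcpy (&b, args + DEMO_SLOT_SIZE, sizeof (b));
	sum = a + b;
	memcpy (ret, &sum, sizeof (sum));
}

/*
 * Decode a 4-word call instruction in the new layout:
 *   ip [0] opcode, ip [1] return offset, ip [2] call args offset, ip [3] data index.
 * The data index is ignored here because the callee is fixed.
 * Returns the ip of the next instruction.
 */
static const uint16_t*
demo_call (const uint16_t *ip, unsigned char *locals)
{
	uint16_t return_offset = ip [1];
	uint16_t call_args_offset = ip [2];
	demo_add (locals + return_offset, locals + call_args_offset);
	return ip + 4;
}

/* Read back an int32 stored at a byte offset in locals. */
static int32_t
demo_read_i32 (const unsigned char *locals, int offset)
{
	int32_t v;
	memcpy (&v, locals + offset, sizeof (v));
	return v;
}
```

Because the result goes to its own offset, the argument slots are left untouched by the call, which is what lets the old memmove-based shuffling in the newobj handlers go away.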
index 04dfb90..03f5184 100644
@@ -10,6 +10,8 @@
 
 OPDEF(MINT_NOP, "nop", 1, 0, 0, MintOpNoArgs)
 OPDEF(MINT_NIY, "niy", 1, 0, 0, MintOpNoArgs)
+OPDEF(MINT_DEF, "def", 2, 1, 0, MintOpNoArgs)
+OPDEF(MINT_DUMMY_USE, "dummy_use", 2, 0, 1, MintOpNoArgs)
 OPDEF(MINT_BREAK, "break", 1, 0, 0, MintOpNoArgs)
 OPDEF(MINT_BREAKPOINT, "breakpoint", 1, 0, 0, MintOpNoArgs)
 OPDEF(MINT_LDNULL, "ldnull", 2, 1, 0, MintOpNoArgs)
@@ -294,11 +296,13 @@ OPDEF(MINT_JMP, "jmp", 2, 0, 0, MintOpMethodToken)
 
 OPDEF(MINT_ENDFILTER, "endfilter", 2, 0, 1, MintOpNoArgs)
 
-OPDEF(MINT_NEWOBJ, "newobj", 4, CallArgs, 0, MintOpMethodToken)
-OPDEF(MINT_NEWOBJ_ARRAY, "newobj_array", 4, CallArgs, 0, MintOpMethodToken)
-OPDEF(MINT_NEWOBJ_STRING, "newobj_string", 4, CallArgs, 0, MintOpMethodToken)
-OPDEF(MINT_NEWOBJ_FAST, "newobj_fast", 6, CallArgs, 0, MintOpMethodToken)
-OPDEF(MINT_NEWOBJ_VT_FAST, "newobj_vt_fast", 6, CallArgs, 0, MintOpMethodToken)
+OPDEF(MINT_NEWOBJ_SLOW, "newobj_slow", 4, 1, 1, MintOpMethodToken)
+OPDEF(MINT_NEWOBJ_ARRAY, "newobj_array", 5, 1, 1, MintOpMethodToken)
+OPDEF(MINT_NEWOBJ_STRING, "newobj_string", 4, 1, 1, MintOpMethodToken)
+OPDEF(MINT_NEWOBJ, "newobj", 5, 1, 1, MintOpMethodToken)
+OPDEF(MINT_NEWOBJ_INLINED, "newobj_inlined", 3, 1, 0, MintOpMethodToken)
+OPDEF(MINT_NEWOBJ_VT, "newobj_vt", 5, 1, 1, MintOpMethodToken)
+OPDEF(MINT_NEWOBJ_VT_INLINED, "newobj_vt_inlined", 4, 1, 1, MintOpMethodToken)
 OPDEF(MINT_INITOBJ, "initobj", 3, 0, 1, MintOpShortInt)
 OPDEF(MINT_CASTCLASS, "castclass", 4, 1, 1, MintOpClassToken)
 OPDEF(MINT_ISINST, "isinst", 4, 1, 1, MintOpClassToken)
@@ -339,8 +343,8 @@ OPDEF(MINT_LDELEM_REF, "ldelem.ref", 4, 1, 2, MintOpNoArgs)
 OPDEF(MINT_LDELEM_VT, "ldelem.vt", 5, 1, 2, MintOpShortInt)
 
 OPDEF(MINT_LDELEMA1, "ldelema1", 5, 1, 2, MintOpShortInt)
-OPDEF(MINT_LDELEMA, "ldelema", 4, CallArgs, 0, MintOpTwoShorts)
-OPDEF(MINT_LDELEMA_TC, "ldelema.tc", 3, CallArgs, 0, MintOpTwoShorts)
+OPDEF(MINT_LDELEMA, "ldelema", 5, 1, 1, MintOpTwoShorts)
+OPDEF(MINT_LDELEMA_TC, "ldelema.tc", 4, 1, 1, MintOpTwoShorts)
 
 OPDEF(MINT_STELEM_I, "stelem.i", 4, 0, 3, MintOpNoArgs)
 OPDEF(MINT_STELEM_I1, "stelem.i1", 4, 0, 3, MintOpNoArgs)
@@ -605,34 +609,34 @@ OPDEF(MINT_ARRAY_ELEMENT_SIZE, "array_element_size", 3, 1, 1, MintOpNoArgs)
 OPDEF(MINT_ARRAY_IS_PRIMITIVE, "array_is_primitive", 3, 1, 1, MintOpNoArgs)
 
 /* Calls */
-OPDEF(MINT_CALL, "call", 3, CallArgs, 0, MintOpMethodToken)
-OPDEF(MINT_CALLVIRT, "callvirt", 3, CallArgs, 0, MintOpMethodToken)
-OPDEF(MINT_CALLVIRT_FAST, "callvirt.fast", 4, CallArgs, 0, MintOpMethodToken)
-OPDEF(MINT_CALL_DELEGATE, "call.delegate", 4, CallArgs, 0, MintOpTwoShorts)
-OPDEF(MINT_CALLI, "calli", 4, CallArgs, 1, MintOpMethodToken)
-OPDEF(MINT_CALLI_NAT, "calli.nat", 7, CallArgs, 1, MintOpMethodToken)
-OPDEF(MINT_CALLI_NAT_DYNAMIC, "calli.nat.dynamic", 4, CallArgs, 1, MintOpMethodToken)
-OPDEF(MINT_CALLI_NAT_FAST, "calli.nat.fast", 5, CallArgs, 0, MintOpMethodToken)
-OPDEF(MINT_CALL_VARARG, "call.vararg", 5, CallArgs, 0, MintOpMethodToken)
-OPDEF(MINT_CALLRUN, "callrun", 4, CallArgs, 0, MintOpNoArgs)
-
-OPDEF(MINT_ICALL_V_V, "mono_icall_v_v", 3, CallArgs, 0, MintOpShortInt)
-OPDEF(MINT_ICALL_V_P, "mono_icall_v_p", 3, CallArgs, 0, MintOpShortInt)
-OPDEF(MINT_ICALL_P_V, "mono_icall_p_v", 3, CallArgs, 0, MintOpShortInt)
-OPDEF(MINT_ICALL_P_P, "mono_icall_p_p", 3, CallArgs, 0, MintOpShortInt)
-OPDEF(MINT_ICALL_PP_V, "mono_icall_pp_v", 3, CallArgs, 0, MintOpShortInt)
-OPDEF(MINT_ICALL_PP_P, "mono_icall_pp_p", 3, CallArgs, 0, MintOpShortInt)
-OPDEF(MINT_ICALL_PPP_V, "mono_icall_ppp_v", 3, CallArgs, 0, MintOpShortInt)
-OPDEF(MINT_ICALL_PPP_P, "mono_icall_ppp_p", 3, CallArgs, 0, MintOpShortInt)
-OPDEF(MINT_ICALL_PPPP_V, "mono_icall_pppp_v", 3, CallArgs, 0, MintOpShortInt)
-OPDEF(MINT_ICALL_PPPP_P, "mono_icall_pppp_p", 3, CallArgs, 0, MintOpShortInt)
-OPDEF(MINT_ICALL_PPPPP_V, "mono_icall_ppppp_v", 3, CallArgs, 0, MintOpShortInt)
-OPDEF(MINT_ICALL_PPPPP_P, "mono_icall_ppppp_p", 3, CallArgs, 0, MintOpShortInt)
-OPDEF(MINT_ICALL_PPPPPP_V, "mono_icall_pppppp_v", 3, CallArgs, 0, MintOpShortInt)
-OPDEF(MINT_ICALL_PPPPPP_P, "mono_icall_pppppp_p", 3, CallArgs, 0, MintOpShortInt)
+OPDEF(MINT_CALL, "call", 4, 1, 1, MintOpMethodToken)
+OPDEF(MINT_CALLVIRT, "callvirt", 4, 1, 1, MintOpMethodToken)
+OPDEF(MINT_CALLVIRT_FAST, "callvirt.fast", 5, 1, 1, MintOpMethodToken)
+OPDEF(MINT_CALL_DELEGATE, "call.delegate", 5, 1, 1, MintOpTwoShorts)
+OPDEF(MINT_CALLI, "calli", 5, 1, 2, MintOpMethodToken)
+OPDEF(MINT_CALLI_NAT, "calli.nat", 8, 1, 2, MintOpMethodToken)
+OPDEF(MINT_CALLI_NAT_DYNAMIC, "calli.nat.dynamic", 5, 1, 2, MintOpMethodToken)
+OPDEF(MINT_CALLI_NAT_FAST, "calli.nat.fast", 7, 1, 2, MintOpMethodToken)
+OPDEF(MINT_CALL_VARARG, "call.vararg", 6, 1, 1, MintOpMethodToken)
+OPDEF(MINT_CALLRUN, "callrun", 5, 1, 1, MintOpNoArgs)
+
+OPDEF(MINT_ICALL_V_V, "mono_icall_v_v", 3, 0, 1, MintOpShortInt)
+OPDEF(MINT_ICALL_V_P, "mono_icall_v_p", 4, 1, 1, MintOpShortInt)
+OPDEF(MINT_ICALL_P_V, "mono_icall_p_v", 3, 0, 1, MintOpShortInt)
+OPDEF(MINT_ICALL_P_P, "mono_icall_p_p", 4, 1, 1, MintOpShortInt)
+OPDEF(MINT_ICALL_PP_V, "mono_icall_pp_v", 3, 0, 1, MintOpShortInt)
+OPDEF(MINT_ICALL_PP_P, "mono_icall_pp_p", 4, 1, 1, MintOpShortInt)
+OPDEF(MINT_ICALL_PPP_V, "mono_icall_ppp_v", 3, 0, 1, MintOpShortInt)
+OPDEF(MINT_ICALL_PPP_P, "mono_icall_ppp_p", 4, 1, 1, MintOpShortInt)
+OPDEF(MINT_ICALL_PPPP_V, "mono_icall_pppp_v", 3, 0, 1, MintOpShortInt)
+OPDEF(MINT_ICALL_PPPP_P, "mono_icall_pppp_p", 4, 1, 1, MintOpShortInt)
+OPDEF(MINT_ICALL_PPPPP_V, "mono_icall_ppppp_v", 3, 0, 1, MintOpShortInt)
+OPDEF(MINT_ICALL_PPPPP_P, "mono_icall_ppppp_p", 4, 1, 1, MintOpShortInt)
+OPDEF(MINT_ICALL_PPPPPP_V, "mono_icall_pppppp_v", 3, 0, 1, MintOpShortInt)
+OPDEF(MINT_ICALL_PPPPPP_P, "mono_icall_pppppp_p", 4, 1, 1, MintOpShortInt)
 // FIXME: MintOp
-OPDEF(MINT_JIT_CALL, "mono_jit_call", 3, CallArgs, 0, MintOpNoArgs)
-OPDEF(MINT_JIT_CALL2, "mono_jit_call2", 6, CallArgs, 0, MintOpNoArgs)
+OPDEF(MINT_JIT_CALL, "mono_jit_call", 4, 1, 1, MintOpNoArgs)
+OPDEF(MINT_JIT_CALL2, "mono_jit_call2", 7, 1, 1, MintOpNoArgs)
 
 OPDEF(MINT_MONO_LDPTR, "mono_ldptr", 3, 1, 0, MintOpShortInt)
 OPDEF(MINT_MONO_SGEN_THREAD_INFO, "mono_sgen_thread_info", 2, 1, 0, MintOpNoArgs)
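Each OPDEF row carries the instruction length and the dreg and sreg counts, so changing a row is enough to update every table generated from it. The sketch below shows how such a table can expand into an enum and parallel arrays. It is a miniature under stated assumptions: the `DEMO_` names and the `DEMO_OPDEFS` macro are hypothetical, and only the four rows' numbers are taken from the new definitions above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature of an OPDEF table: (symbol, mnemonic, length, dregs, sregs). */
#define DEMO_OPDEFS(X) \
	X(DEMO_CALL, "call", 4, 1, 1) \
	X(DEMO_ICALL_V_V, "mono_icall_v_v", 3, 0, 1) \
	X(DEMO_ICALL_V_P, "mono_icall_v_p", 4, 1, 1) \
	X(DEMO_NEWOBJ_INLINED, "newobj_inlined", 3, 1, 0)

/* First expansion: one enum value per opcode. */
#define X(sym, name, len, dregs, sregs) sym,
typedef enum { DEMO_OPDEFS(X) DEMO_LASTOP } DemoOpcode;
#undef X

/* Further expansions: parallel tables indexed by opcode. */
#define X(sym, name, len, dregs, sregs) len,
static const unsigned char demo_oplen [] = { DEMO_OPDEFS(X) };
#undef X

#define X(sym, name, len, dregs, sregs) dregs,
static const int demo_op_dregs [] = { DEMO_OPDEFS(X) };
#undef X

#define X(sym, name, len, dregs, sregs) sregs,
static const int demo_op_sregs [] = { DEMO_OPDEFS(X) };
#undef X

/* Skip over one instruction using the length table. */
static const uint16_t*
demo_next_ins (const uint16_t *ip)
{
	return ip + demo_oplen [*ip];
}
```

In this miniature, the void icall keeps length 3 with no dreg, while the pointer-returning variant grows to 4 words to hold the explicit return slot.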
index 262062e..f2ae6df 100644
@@ -60,7 +60,6 @@ typedef enum {
 #define MINT_IS_BINOP_CONDITIONAL_BRANCH(op) ((op) >= MINT_BEQ_I4 && (op) <= MINT_BLT_UN_R8_S)
 #define MINT_IS_CALL(op) ((op) >= MINT_CALL && (op) <= MINT_JIT_CALL)
 #define MINT_IS_PATCHABLE_CALL(op) ((op) >= MINT_CALL && (op) <= MINT_VCALL)
-#define MINT_IS_NEWOBJ(op) ((op) >= MINT_NEWOBJ && (op) <= MINT_NEWOBJ_MAGIC)
 #define MINT_IS_LDC_I4(op) ((op) >= MINT_LDC_I4_M1 && (op) <= MINT_LDC_I4)
 #define MINT_IS_UNOP(op) ((op) >= MINT_ADD1_I4 && (op) <= MINT_CEQ0_I4)
 #define MINT_IS_BINOP(op) ((op) >= MINT_ADD_I4 && (op) <= MINT_CLT_UN_R8)
@@ -68,6 +67,7 @@ typedef enum {
 #define MINT_IS_STFLD(op) ((op) >= MINT_STFLD_I1 && (op) <= MINT_STFLD_O)
 
 #define MINT_CALL_ARGS 2
+#define MINT_CALL_ARGS_SREG -2
 
 extern unsigned char const mono_interp_oplen[];
 extern int const mono_interp_op_dregs [];
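The commit message says that call args are saved as a -1 terminated array of var indexes, and that the offset allocator must place all call args one after the other so that MINT_CALL_ARGS_SREG can resolve to the offset where they reside. The allocator itself is not part of this excerpt, so the following is only an illustrative sketch of that invariant. `DemoVar`, `demo_count_call_args`, `demo_alloc_call_args` and `DEMO_SLOT_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot size; stands in for MINT_STACK_SLOT_SIZE. */
#define DEMO_SLOT_SIZE 8
#define DEMO_ALIGN_TO(val, align) (((val) + ((align) - 1)) & ~((align) - 1))

/* Hypothetical var record: byte size and assigned offset (-1 while unassigned). */
typedef struct {
	int size;
	int offset;
} DemoVar;

/* Count the entries of a -1 terminated array; NULL means the call has no args. */
static int
demo_count_call_args (const int *call_args)
{
	int count = 0;
	if (!call_args)
		return 0;
	while (call_args [count] != -1)
		count++;
	return count;
}

/*
 * Lay out the call args contiguously starting at base, in array order.
 * Returns the first free offset after the last arg; base is the offset
 * that a call-args sreg would resolve to in this sketch.
 */
static int
demo_alloc_call_args (DemoVar *vars, const int *call_args, int base)
{
	int offset = base;
	if (!call_args)
		return offset;
	for (int i = 0; call_args [i] != -1; i++) {
		DemoVar *var = &vars [call_args [i]];
		var->offset = offset;
		offset += DEMO_ALIGN_TO (var->size, DEMO_SLOT_SIZE);
	}
	return offset;
}
```

The layout follows the order of the array rather than the var indexes, so a call can pass vars created at any earlier point as long as each one is listed before the terminator.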
index 848ad9c..f22cfb0 100644
@@ -377,12 +377,12 @@ realloc_stack (TransformData *td)
 }
 
 static int
-get_tos_offset (TransformData *td)
+get_stack_size (StackInfo *sp, int count)
 {
-       if (td->sp == td->stack)
-               return 0;
-       else
-               return td->sp [-1].offset + td->sp [-1].size;
+       int result = 0;
+       for (int i = 0; i < count; i++)
+               result += sp [i].size;
+       return result;
 }
 
 static MonoType*
@@ -423,38 +423,55 @@ create_interp_local_explicit (TransformData *td, MonoType *type, int size)
        td->locals [td->locals_size].indirects = 0;
        td->locals [td->locals_size].offset = -1;
        td->locals [td->locals_size].size = size;
+       td->locals [td->locals_size].live_start = -1;
+       td->locals [td->locals_size].bb_index = -1;
        td->locals_size++;
        return td->locals_size - 1;
 
 }
 
 static int
-create_interp_stack_local (TransformData *td, int type, MonoClass *k, int type_size, int offset)
+create_interp_stack_local (TransformData *td, int type, MonoClass *k, int type_size)
 {
        int local = create_interp_local_explicit (td, get_type_from_stack (type, k), type_size);
 
        td->locals [local].flags |= INTERP_LOCAL_FLAG_EXECUTION_STACK;
-       td->locals [local].stack_offset = offset;
        return local;
 }
 
 static void
-push_type_explicit (TransformData *td, int type, MonoClass *k, int type_size)
+ensure_stack (TransformData *td, int additional)
 {
-       int sp_height;
-       sp_height = td->sp - td->stack + 1;
-       if (sp_height > td->max_stack_height)
-               td->max_stack_height = sp_height;
-       if (sp_height > td->stack_capacity)
+       int current_height = td->sp - td->stack;
+       int new_height = current_height + additional;
+       if (new_height > td->stack_capacity)
                realloc_stack (td);
+       if (new_height > td->max_stack_height)
+               td->max_stack_height = new_height;
+}
+
+static void
+push_type_explicit (TransformData *td, int type, MonoClass *k, int type_size)
+{
+       ensure_stack (td, 1);
        td->sp->type = type;
        td->sp->klass = k;
        td->sp->flags = 0;
-       td->sp->offset = get_tos_offset (td);
-       td->sp->local = create_interp_stack_local (td, type, k, type_size, td->sp->offset);
+       td->sp->local = create_interp_stack_local (td, type, k, type_size);
        td->sp->size = ALIGN_TO (type_size, MINT_STACK_SLOT_SIZE);
-       if ((td->sp->size + td->sp->offset) > td->max_stack_size)
-               td->max_stack_size = td->sp->size + td->sp->offset;
+       td->sp++;
+}
+
+static void
+push_var (TransformData *td, int var_index)
+{
+       InterpLocal *var = &td->locals [var_index];
+       ensure_stack (td, 1);
+       td->sp->type = stack_type [var->mt];
+       td->sp->klass = mono_class_from_mono_type_internal (var->type);
+       td->sp->flags = 0;
+       td->sp->local = var_index;
+       td->sp->size = ALIGN_TO (var->size, MINT_STACK_SLOT_SIZE);
        td->sp++;
 }
 
@@ -482,7 +499,7 @@ static void
 set_type_and_local (TransformData *td, StackInfo *sp, MonoClass *klass, int type)
 {
        SET_TYPE (sp, type, klass);
-       sp->local = create_interp_stack_local (td, type, NULL, MINT_STACK_SLOT_SIZE, sp->offset);
+       sp->local = create_interp_stack_local (td, type, NULL, MINT_STACK_SLOT_SIZE);
 }
 
 static void
@@ -1168,19 +1185,23 @@ interp_generate_mae_throw (TransformData *td, MonoMethod *method, MonoMethod *ta
        push_simple_type (td, STACK_TYPE_I);
        interp_ins_set_dreg (td->last_ins, td->sp [-1].local);
        td->last_ins->data [0] = get_data_item_index (td, method);
-       td->locals [td->sp [-1].local].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
 
        interp_add_ins (td, MINT_MONO_LDPTR);
        push_simple_type (td, STACK_TYPE_I);
        interp_ins_set_dreg (td->last_ins, td->sp [-1].local);
        td->last_ins->data [0] = get_data_item_index (td, target_method);
-       td->locals [td->sp [-1].local].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
 
        td->sp -= 2;
+       int *call_args = (int*)mono_mempool_alloc (td->mempool, 3 * sizeof (int));
+       call_args [0] = td->sp [0].local;
+       call_args [1] = td->sp [1].local;
+       call_args [2] = -1;
+
        interp_add_ins (td, MINT_ICALL_PP_V);
-       interp_ins_set_dreg (td->last_ins, td->sp [0].local);
+       interp_ins_set_sreg (td->last_ins, MINT_CALL_ARGS_SREG);
        td->last_ins->data [0] = get_data_item_index (td, (gpointer)info->func);
-
+       td->last_ins->info.call_args = call_args;
+       td->last_ins->flags |= INTERP_INST_FLAG_CALL;
 }
 
 static void
@@ -1189,11 +1210,10 @@ interp_generate_bie_throw (TransformData *td)
        MonoJitICallInfo *info = &mono_get_jit_icall_info ()->mono_throw_bad_image;
 
        interp_add_ins (td, MINT_ICALL_V_V);
-       // Allocate a dummy local to serve as dreg for this instruction
-       push_simple_type (td, STACK_TYPE_I4);
-       td->sp--;
-       interp_ins_set_dreg (td->last_ins, td->sp [0].local);
+       interp_ins_set_sreg (td->last_ins, MINT_CALL_ARGS_SREG);
        td->last_ins->data [0] = get_data_item_index (td, (gpointer)info->func);
+       td->last_ins->info.call_args = NULL;
+       td->last_ins->flags |= INTERP_INST_FLAG_CALL;
 }
 
 static void
@@ -1202,11 +1222,10 @@ interp_generate_not_supported_throw (TransformData *td)
        MonoJitICallInfo *info = &mono_get_jit_icall_info ()->mono_throw_not_supported;
 
        interp_add_ins (td, MINT_ICALL_V_V);
-       // Allocate a dummy local to serve as dreg for this instruction
-       push_simple_type (td, STACK_TYPE_I4);
-       td->sp--;
-       interp_ins_set_dreg (td->last_ins, td->sp [0].local);
+       interp_ins_set_sreg (td->last_ins, MINT_CALL_ARGS_SREG);
        td->last_ins->data [0] = get_data_item_index (td, (gpointer)info->func);
+       td->last_ins->info.call_args = NULL;
+       td->last_ins->flags |= INTERP_INST_FLAG_CALL;
 }
 
 static void
@@ -1232,13 +1251,18 @@ interp_generate_ipe_throw_with_msg (TransformData *td, MonoError *error_msg)
        interp_add_ins (td, MINT_MONO_LDPTR);
        push_simple_type (td, STACK_TYPE_I);
        interp_ins_set_dreg (td->last_ins, td->sp [-1].local);
-       td->locals [td->sp [-1].local].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
        td->last_ins->data [0] = get_data_item_index (td, msg);
 
        td->sp -= 1;
+       int *call_args = (int*)mono_mempool_alloc (td->mempool, 2 * sizeof (int));
+       call_args [0] = td->sp [0].local;
+       call_args [1] = -1;
+
        interp_add_ins (td, MINT_ICALL_P_V);
-       interp_ins_set_dreg (td->last_ins, td->sp [0].local);
+       interp_ins_set_sreg (td->last_ins, MINT_CALL_ARGS_SREG);
        td->last_ins->data [0] = get_data_item_index (td, (gpointer)info->func);
+       td->last_ins->info.call_args = call_args;
+       td->last_ins->flags |= INTERP_INST_FLAG_CALL;
 }
 
 static int
@@ -1252,37 +1276,28 @@ create_interp_local (TransformData *td, MonoType *type)
        return create_interp_local_explicit (td, type, size);
 }
 
+// Allocates var at the offset that tos points to, also updating it.
 static int
-get_interp_local_offset (TransformData *td, int local, gboolean resolve_stack_locals)
+alloc_var_offset (TransformData *td, int local, gint32 *ptos)
 {
-       // FIXME MINT_PROF_EXIT when void
-       if (local == -1)
-               return -1;
+       int size, offset;
 
-       if ((td->locals [local].flags & INTERP_LOCAL_FLAG_EXECUTION_STACK) && !resolve_stack_locals)
-               return -1;
+       offset = *ptos;
+       size = td->locals [local].size;
 
-       if (td->locals [local].offset != -1)
-               return td->locals [local].offset;
+       td->locals [local].offset = offset;
 
-       if (td->locals [local].flags & INTERP_LOCAL_FLAG_EXECUTION_STACK) {
-               td->locals [local].offset = td->total_locals_size + td->locals [local].stack_offset;
-       } else {
-               int size, offset;
-
-               offset = td->total_locals_size;
-               size = td->locals [local].size;
-
-               td->locals [local].offset = offset;
-
-               td->total_locals_size = ALIGN_TO (offset + size, MINT_STACK_SLOT_SIZE);
-       }
-
-       //g_assert (td->total_locals_size < G_MAXUINT16);
+       *ptos = ALIGN_TO (offset + size, MINT_STACK_SLOT_SIZE);
 
        return td->locals [local].offset;
 }
 
+static int
+alloc_global_var_offset (TransformData *td, int var)
+{
+       return alloc_var_offset (td, var, &td->total_locals_size);
+}
+
 /*
  * ins_offset is the associated offset of this instruction
  * if ins is null, it means the data belongs to an instruction that was
@@ -1384,9 +1399,7 @@ dump_interp_compacted_ins (const guint16 *ip, const guint16 *start)
        g_print ("IR_%04x: %-14s", ins_offset, mono_interp_opname (opcode));
        ip++;
 
-        if (mono_interp_op_dregs [opcode] == MINT_CALL_ARGS)
-                g_print (" [call_args %d <-", *ip++);
-        else if (mono_interp_op_dregs [opcode] > 0)
+        if (mono_interp_op_dregs [opcode] > 0)
                 g_print (" [%d <-", *ip++);
         else
                 g_print (" [nil <-");
@@ -1419,20 +1432,30 @@ dump_interp_inst_no_newline (InterpInst *ins)
        int opcode = ins->opcode;
        g_print ("IL_%04x: %-14s", ins->il_offset, mono_interp_opname (opcode));
 
-        if (mono_interp_op_dregs [opcode] == MINT_CALL_ARGS)
-                g_print (" [call_args %d <-", ins->dreg);
-        else if (mono_interp_op_dregs [opcode] > 0)
-                g_print (" [%d <-", ins->dreg);
-        else
-                g_print (" [nil <-");
-
-        if (mono_interp_op_sregs [opcode] > 0) {
-                for (int i = 0; i < mono_interp_op_sregs [opcode]; i++)
-                        g_print (" %d", ins->sregs [i]);
-                g_print ("],");
-        } else {
-                g_print (" nil],");
-        }
+       if (mono_interp_op_dregs [opcode] > 0)
+               g_print (" [%d <-", ins->dreg);
+       else
+               g_print (" [nil <-");
+
+       if (mono_interp_op_sregs [opcode] > 0) {
+               for (int i = 0; i < mono_interp_op_sregs [opcode]; i++) {
+                       if (ins->sregs [i] == MINT_CALL_ARGS_SREG) {
+                               g_print (" c:");
+                               int *call_args = ins->info.call_args;
+                               if (call_args) {
+                                       while (*call_args != -1) {
+                                               g_print (" %d", *call_args);
+                                               call_args++;
+                                       }
+                               }
+                       } else {
+                               g_print (" %d", ins->sregs [i]);
+                       }
+               }
+               g_print ("],");
+       } else {
+               g_print (" nil],");
+       }
 
        if (opcode == MINT_LDLOCA_S) {
                // LDLOCA has special semantics, it has data in sregs [0], but it doesn't have any sregs
@@ -1713,7 +1736,6 @@ interp_emit_ldelema (TransformData *td, MonoClass *array_class, MonoClass *check
        MonoClass *element_class = m_class_get_element_class (array_class);
        int rank = m_class_get_rank (array_class);
        int size = mono_class_array_element_size (element_class);
-       gboolean call_args = FALSE;
 
        gboolean bounded = m_class_get_byval_arg (array_class) ? m_class_get_byval_arg (array_class)->type == MONO_TYPE_ARRAY : FALSE;
 
@@ -1727,25 +1749,33 @@ interp_emit_ldelema (TransformData *td, MonoClass *array_class, MonoClass *check
                        td->last_ins->data [0] = size;
                } else {
                        interp_add_ins (td, MINT_LDELEMA);
-                       for (int i = 0; i < rank + 1; i++)
-                               td->locals [td->sp [i].local].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
+                       interp_ins_set_sreg (td->last_ins, MINT_CALL_ARGS_SREG);
+                       int *call_args = (int*)mono_mempool_alloc (td->mempool, (rank + 2) * sizeof (int));
+                       for (int i = 0; i < rank + 1; i++) {
+                               call_args [i] = td->sp [i].local;
+                       }
+                       call_args [rank + 1] = -1;
                        td->last_ins->data [0] = rank;
                        g_assert (size < G_MAXUINT16);
                        td->last_ins->data [1] = size;
-                       call_args = TRUE;
+                       td->last_ins->info.call_args = call_args;
+                       td->last_ins->flags |= INTERP_INST_FLAG_CALL;
                }
        } else {
                interp_add_ins (td, MINT_LDELEMA_TC);
-               for (int i = 0; i < rank + 1; i++)
-                       td->locals [td->sp [i].local].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
+               interp_ins_set_sreg (td->last_ins, MINT_CALL_ARGS_SREG);
+               int *call_args = (int*)mono_mempool_alloc (td->mempool, (rank + 2) * sizeof (int));
+               for (int i = 0; i < rank + 1; i++) {
+                       call_args [i] = td->sp [i].local;
+               }
+               call_args [rank + 1] = -1;
                td->last_ins->data [0] = get_data_item_index (td, check_class);
-               call_args = TRUE;
+               td->last_ins->info.call_args = call_args;
+               td->last_ins->flags |= INTERP_INST_FLAG_CALL;
        }
 
        push_simple_type (td, STACK_TYPE_MP);
        interp_ins_set_dreg (td->last_ins, td->sp [-1].local);
-       if (call_args)
-               td->locals [td->sp [-1].local].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
 }
 
 static gboolean
@@ -2769,6 +2799,101 @@ interp_inline_method (TransformData *td, MonoMethod *target_method, MonoMethodHe
        return ret;
 }
 
+static gboolean
+interp_inline_newobj (TransformData *td, MonoMethod *target_method, MonoMethodSignature *csignature, int ret_mt, StackInfo *sp_params, gboolean is_protected)
+{
+       ERROR_DECL(error);
+       InterpInst *newobj_fast, *prev_last_ins;
+       int dreg, this_reg = -1;
+       int prev_sp_offset;
+       MonoClass *klass = target_method->klass;
+
+       if (!(mono_interp_opt & INTERP_OPT_INLINE) ||
+                       !interp_method_check_inlining (td, target_method, csignature))
+               return FALSE;
+
+       if (mono_class_has_finalizer (klass) ||
+                       m_class_has_weak_fields (klass))
+               return FALSE;
+
+       prev_last_ins = td->cbb->last_ins;
+       prev_sp_offset = td->sp - td->stack;
+
+       // Allocate var holding the newobj result. We do it here, because the var has to be alive
+       // before the call, since newobj writes to it before executing the call.
+       gboolean is_vt = m_class_is_valuetype (klass);
+       int vtsize = 0;
+       if (is_vt) {
+               if (ret_mt == MINT_TYPE_VT)
+                       vtsize = mono_class_value_size (klass, NULL);
+               else
+                       vtsize = MINT_STACK_SLOT_SIZE;
+
+               dreg = create_interp_stack_local (td, stack_type [ret_mt], klass, vtsize);
+
+               // For valuetypes, we need to control the lifetime of the valuetype.
+               // MINT_NEWOBJ_VT_INLINED takes the address of this reg and we should keep
+               // the vt alive until the inlining is completed.
+               interp_add_ins (td, MINT_DEF);
+               interp_ins_set_dreg (td->last_ins, dreg);
+       } else {
+               dreg = create_interp_stack_local (td, stack_type [ret_mt], klass, MINT_STACK_SLOT_SIZE);
+       }
+
+       // Allocate `this` pointer
+       if (is_vt) {
+               push_simple_type (td, STACK_TYPE_I);
+               this_reg = td->sp [-1].local;
+       } else {
+               push_var (td, dreg);
+       }
+
+       // Push back the params to top of stack. The original vars are maintained.
+       ensure_stack (td, csignature->param_count);
+       memcpy (td->sp, sp_params, sizeof (StackInfo) * csignature->param_count);
+       td->sp += csignature->param_count;
+
+       if (is_vt) {
+               // Receives the valuetype allocated with MINT_DEF, and returns its address
+               newobj_fast = interp_add_ins (td, MINT_NEWOBJ_VT_INLINED);
+               interp_ins_set_dreg (newobj_fast, this_reg);
+               interp_ins_set_sreg (newobj_fast, dreg);
+               newobj_fast->data [0] = ALIGN_TO (vtsize, MINT_STACK_SLOT_SIZE);
+       } else {
+               MonoVTable *vtable = mono_class_vtable_checked (klass, error);
+               goto_if_nok (error, fail);
+               newobj_fast = interp_add_ins (td, MINT_NEWOBJ_INLINED);
+               interp_ins_set_dreg (newobj_fast, dreg);
+               newobj_fast->data [0] = get_data_item_index (td, vtable);
+       }
+
+       if (is_protected)
+               newobj_fast->flags |= INTERP_INST_FLAG_PROTECTED_NEWOBJ;
+
+       MonoMethodHeader *mheader = interp_method_get_header (target_method, error);
+       goto_if_nok (error, fail);
+
+       if (!interp_inline_method (td, target_method, mheader, error))
+               goto fail;
+
+       if (is_vt) {
+               interp_add_ins (td, MINT_DUMMY_USE);
+               interp_ins_set_sreg (td->last_ins, dreg);
+       }
+
+       push_var (td, dreg);
+       return TRUE;
+fail:
+       // Restore the state
+       td->sp = td->stack + prev_sp_offset;
+       td->last_ins = prev_last_ins;
+       td->cbb->last_ins = prev_last_ins;
+       if (td->last_ins)
+               td->last_ins->next = NULL;
+
+       return FALSE;
+}
+
 static void
 interp_constrained_box (TransformData *td, MonoClass *constrained_class, MonoMethodSignature *csignature, MonoError *error)
 {
@@ -2799,6 +2924,55 @@ interp_get_method (MonoMethod *method, guint32 token, MonoImage *image, MonoGene
                return (MonoMethod *)mono_method_get_wrapper_data (method, token);
 }
 
+/*
+ * emit_convert:
+ *
+ *   Emit some implicit conversions which are not part of the .net spec, but are allowed by MS.NET.
+ */
+static void
+emit_convert (TransformData *td, StackInfo *sp, MonoType *target_type)
+{
+       int stype = sp->type;
+       target_type = mini_get_underlying_type (target_type);
+
+       // FIXME: Add more
+       switch (target_type->type) {
+       case MONO_TYPE_I8: {
+               switch (stype) {
+               case STACK_TYPE_I4:
+                       interp_add_conv (td, sp, NULL, STACK_TYPE_I8, MINT_CONV_I8_I4);
+                       break;
+               default:
+                       break;
+               }
+               break;
+       }
+#if SIZEOF_VOID_P == 8
+       case MONO_TYPE_I:
+       case MONO_TYPE_U: {
+               switch (stype) {
+               case STACK_TYPE_I4:
+                       interp_add_conv (td, sp, NULL, STACK_TYPE_I8, MINT_CONV_I8_U4);
+                       break;
+               default:
+                       break;
+               }
+       }
+#endif
+       default:
+               break;
+       }
+}
+
+static void
+interp_emit_arg_conv (TransformData *td, MonoMethodSignature *csignature)
+{
+       StackInfo *arg_start = td->sp - csignature->param_count;
+
+       for (int i = 0; i < csignature->param_count; i++)
+               emit_convert (td, &arg_start [i], csignature->params [i]);
+}
+
 /* Return FALSE if error, including inline failure */
 static gboolean
 interp_transform_call (TransformData *td, MonoMethod *method, MonoMethod *target_method, MonoGenericContext *generic_context, MonoClass *constrained_class, gboolean readonly, MonoError *error, gboolean check_visibility, gboolean save_last_error, gboolean tailcall)
@@ -3010,19 +3184,18 @@ interp_transform_call (TransformData *td, MonoMethod *method, MonoMethod *target
        if (calli) {
                --td->sp;
                fp_sreg = td->sp [0].local;
-               td->locals [fp_sreg].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
        }
 
-       guint32 tos_offset = get_tos_offset (td);
-       td->sp -= csignature->param_count + !!csignature->hasthis;
-       guint32 params_stack_size = tos_offset - get_tos_offset (td);
+       interp_emit_arg_conv (td, csignature);
 
-       if (op == -1 || mono_interp_op_dregs [op] == MINT_CALL_ARGS) {
-               // We must not optimize out these locals, storing to them is part of the interp call convention
-               // unless we already intrinsified this call
-               for (int i = 0; i < (csignature->param_count + !!csignature->hasthis); i++)
-                       td->locals [td->sp [i].local].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
-       }
+       int num_args = csignature->param_count + !!csignature->hasthis;
+       td->sp -= num_args;
+       guint32 params_stack_size = get_stack_size (td->sp, num_args);
+
+       int *call_args = (int*) mono_mempool_alloc (td->mempool, (num_args + 1) * sizeof (int));
+       for (int i = 0; i < num_args; i++)
+               call_args [i] = td->sp [i].local;
+       call_args [num_args] = -1;
 
        // We overwrite it with the return local, save it for future use
        if (csignature->param_count || csignature->hasthis)
@@ -3049,13 +3222,10 @@ interp_transform_call (TransformData *td, MonoMethod *method, MonoMethod *target
                        res_size = MINT_STACK_SLOT_SIZE;
                }
                dreg = td->sp [-1].local;
-               if (op == -1 || mono_interp_op_dregs [op] == MINT_CALL_ARGS) {
-                       // This dreg needs to be at the same offset as the call args
-                       td->locals [dreg].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
-               }
        } else {
                // Create a new dummy local to serve as the dreg of the call
-               // This dreg is only used to resolve the call args offset
+               // FIXME Consider adding special dreg type (ex -1), that is
+               // resolved to null offset. The opcode shouldn't really write to it
                push_simple_type (td, STACK_TYPE_I4);
                td->sp--;
                dreg = td->sp [0].local;
@@ -3089,12 +3259,15 @@ interp_transform_call (TransformData *td, MonoMethod *method, MonoMethod *target
        } else if (!calli && !is_delegate_invoke && !is_virtual && mono_interp_jit_call_supported (target_method, csignature)) {
                interp_add_ins (td, MINT_JIT_CALL);
                interp_ins_set_dreg (td->last_ins, dreg);
+               interp_ins_set_sreg (td->last_ins, MINT_CALL_ARGS_SREG);
+               td->last_ins->flags |= INTERP_INST_FLAG_CALL;
                td->last_ins->data [0] = get_data_item_index (td, (void *)mono_interp_get_imethod (target_method, error));
                mono_error_assert_ok (error);
        } else {
                if (is_delegate_invoke) {
                        interp_add_ins (td, MINT_CALL_DELEGATE);
                        interp_ins_set_dreg (td->last_ins, dreg);
+                       interp_ins_set_sreg (td->last_ins, MINT_CALL_ARGS_SREG);
                        td->last_ins->data [0] = params_stack_size;
                        td->last_ins->data [1] = get_data_item_index (td, (void *)csignature);
                } else if (calli) {
@@ -3109,13 +3282,14 @@ interp_transform_call (TransformData *td, MonoMethod *method, MonoMethod *target
                        if (op != -1) {
                                interp_add_ins (td, MINT_CALLI_NAT_FAST);
                                interp_ins_set_dreg (td->last_ins, dreg);
+                               interp_ins_set_sregs2 (td->last_ins, fp_sreg, MINT_CALL_ARGS_SREG);
                                td->last_ins->data [0] = get_data_item_index (td, (void *)csignature);
                                td->last_ins->data [1] = op;
                                td->last_ins->data [2] = save_last_error;
                        } else if (native && method->dynamic && csignature->pinvoke) {
                                interp_add_ins (td, MINT_CALLI_NAT_DYNAMIC);
                                interp_ins_set_dreg (td->last_ins, dreg);
-                               interp_ins_set_sreg (td->last_ins, fp_sreg);
+                               interp_ins_set_sregs2 (td->last_ins, fp_sreg, MINT_CALL_ARGS_SREG);
                                td->last_ins->data [0] = get_data_item_index (td, (void *)csignature);
                        } else if (native) {
                                interp_add_ins (td, MINT_CALLI_NAT);
@@ -3140,7 +3314,7 @@ interp_transform_call (TransformData *td, MonoMethod *method, MonoMethod *target
                                }
 
                                interp_ins_set_dreg (td->last_ins, dreg);
-                               interp_ins_set_sreg (td->last_ins, fp_sreg);
+                               interp_ins_set_sregs2 (td->last_ins, fp_sreg, MINT_CALL_ARGS_SREG);
                                td->last_ins->data [0] = get_data_item_index (td, csignature);
                                td->last_ins->data [1] = get_data_item_index (td, imethod);
                                td->last_ins->data [2] = save_last_error;
@@ -3149,7 +3323,7 @@ interp_transform_call (TransformData *td, MonoMethod *method, MonoMethod *target
                        } else {
                                interp_add_ins (td, MINT_CALLI);
                                interp_ins_set_dreg (td->last_ins, dreg);
-                               interp_ins_set_sreg (td->last_ins, fp_sreg);
+                               interp_ins_set_sregs2 (td->last_ins, fp_sreg, MINT_CALL_ARGS_SREG);
                                td->last_ins->data [0] = get_data_item_index (td, (void *)csignature);
                        }
                } else {
@@ -3172,6 +3346,7 @@ interp_transform_call (TransformData *td, MonoMethod *method, MonoMethod *target
                                interp_add_ins (td, MINT_CALL);
                        }
                        interp_ins_set_dreg (td->last_ins, dreg);
+                       interp_ins_set_sreg (td->last_ins, MINT_CALL_ARGS_SREG);
                        td->last_ins->data [0] = get_data_item_index (td, (void *)imethod);
 
 #ifdef ENABLE_EXPERIMENT_TIERED
@@ -3182,8 +3357,10 @@ interp_transform_call (TransformData *td, MonoMethod *method, MonoMethod *target
                        }
 #endif
                }
+               td->last_ins->flags |= INTERP_INST_FLAG_CALL;
        }
        td->ip += 5;
+       td->last_ins->info.call_args = call_args;
 
        return TRUE;
 }
@@ -3597,16 +3774,16 @@ interp_method_compute_offsets (TransformData *td, InterpMethod *imethod, MonoMet
        for (i = 0; i < num_args; i++) {
                MonoType *type;
                if (sig->hasthis && i == 0)
-                       type = m_class_get_byval_arg (td->method->klass);
+                       type = m_class_is_valuetype (td->method->klass) ? m_class_get_this_arg (td->method->klass) : m_class_get_byval_arg (td->method->klass);
                else
                        type = mono_method_signature_internal (td->method)->params [i - sig->hasthis];
                int mt = mint_type (type);
                td->locals [i].type = type;
                td->locals [i].offset = offset;
-               td->locals [i].flags = 0;
+               td->locals [i].flags = INTERP_LOCAL_FLAG_GLOBAL;
                td->locals [i].indirects = 0;
                td->locals [i].mt = mt;
-               if (mt == MINT_TYPE_VT && (!sig->hasthis || i != 0)) {
+               if (mt == MINT_TYPE_VT) {
                        size = mono_type_size (type, &align);
                        td->locals [i].size = size;
                        offset += ALIGN_TO (size, MINT_STACK_SLOT_SIZE);
@@ -3631,7 +3808,7 @@ interp_method_compute_offsets (TransformData *td, InterpMethod *imethod, MonoMet
                imethod->local_offsets [i] = offset;
                td->locals [index].type = header->locals [i];
                td->locals [index].offset = offset;
-               td->locals [index].flags = 0;
+               td->locals [index].flags = INTERP_LOCAL_FLAG_GLOBAL;
                td->locals [index].indirects = 0;
                td->locals [index].mt = mint_type (header->locals [i]);
                if (td->locals [index].mt == MINT_TYPE_VT)
@@ -3643,16 +3820,17 @@ interp_method_compute_offsets (TransformData *td, InterpMethod *imethod, MonoMet
        }
        offset = ALIGN_TO (offset, MINT_VT_ALIGNMENT);
        td->il_locals_size = offset - td->il_locals_offset;
+       td->total_locals_size = offset;
 
        imethod->clause_data_offsets = (guint32*)g_malloc (header->num_clauses * sizeof (guint32));
+       td->clause_vars = (int*)mono_mempool_alloc (td->mempool, sizeof (int) * header->num_clauses);
        for (i = 0; i < header->num_clauses; i++) {
-               imethod->clause_data_offsets [i] = offset;
-               offset += sizeof (MonoObject*);
+               int var = create_interp_local (td, mono_get_object_type ());
+               td->locals [var].flags |= INTERP_LOCAL_FLAG_GLOBAL;
+               alloc_global_var_offset (td, var);
+               imethod->clause_data_offsets [i] = td->locals [var].offset;
+               td->clause_vars [i] = var;
        }
-       offset = ALIGN_TO (offset, MINT_VT_ALIGNMENT);
-
-       //g_assert (offset < G_MAXUINT16);
-       td->total_locals_size = offset;
 }
 
 void
@@ -3795,37 +3973,6 @@ interp_emit_load_const (TransformData *td, gpointer field_addr, int mt)
        return TRUE;
 }
 
-/*
- * emit_convert:
- *
- *   Emit some implicit conversions which are not part of the .net spec, but are allowed by MS.NET.
- */
-static void
-emit_convert (TransformData *td, int stype, MonoType *ftype)
-{
-       ftype = mini_get_underlying_type (ftype);
-
-       // FIXME: Add more
-       switch (ftype->type) {
-       case MONO_TYPE_I8: {
-               switch (stype) {
-               case STACK_TYPE_I4:
-                       td->sp--;
-                       interp_add_ins (td, MINT_CONV_I8_I4);
-                       interp_ins_set_sreg (td->last_ins, td->sp [0].local);
-                       push_simple_type (td, STACK_TYPE_I8);
-                       interp_ins_set_dreg (td->last_ins, td->sp [-1].local);
-                       break;
-               default:
-                       break;
-               }
-               break;
-       }
-       default:
-               break;
-       }
-}
-
 static void
 interp_emit_sfld_access (TransformData *td, MonoClassField *field, MonoClass *field_class, int mt, gboolean is_load, MonoError *error)
 {
@@ -3949,8 +4096,7 @@ initialize_clause_bblocks (TransformData *td)
                        bb->stack_state [0].type = STACK_TYPE_O;
                        bb->stack_state [0].klass = NULL; /*FIX*/
                        bb->stack_state [0].size = MINT_STACK_SLOT_SIZE;
-                       bb->stack_state [0].offset = 0;
-                       bb->stack_state [0].local = create_interp_stack_local (td, STACK_TYPE_O, NULL, MINT_STACK_SLOT_SIZE, 0);
+                       bb->stack_state [0].local = td->clause_vars [i];
                }
 
                if (c->flags == MONO_EXCEPTION_CLAUSE_FILTER) {
@@ -3962,8 +4108,7 @@ initialize_clause_bblocks (TransformData *td)
                        bb->stack_state [0].type = STACK_TYPE_O;
                        bb->stack_state [0].klass = NULL; /*FIX*/
                        bb->stack_state [0].size = MINT_STACK_SLOT_SIZE;
-                       bb->stack_state [0].offset = 0;
-                       bb->stack_state [0].local = create_interp_stack_local (td, STACK_TYPE_O, NULL, MINT_STACK_SLOT_SIZE, 0);
+                       bb->stack_state [0].local = td->clause_vars [i];
                } else if (c->flags == MONO_EXCEPTION_CLAUSE_NONE) {
                        /*
                         * JIT doesn't emit sdb seq intr point at the start of catch clause, probably
@@ -4033,6 +4178,17 @@ handle_stelem (TransformData *td, int op)
 }
 
 static gboolean
+is_ip_protected (MonoMethodHeader *header, int offset)
+{
+       for (int i = 0; i < header->num_clauses; i++) {
+               MonoExceptionClause *clause = &header->clauses [i];
+               if (clause->try_offset <= offset && offset < (clause->try_offset + clause->try_len))
+                       return TRUE;
+       }
+       return FALSE;
+}
+
+static gboolean
 generate_code (TransformData *td, MonoMethod *method, MonoMethodHeader *header, MonoGenericContext *generic_context, MonoError *error)
 {
        int target;
@@ -4600,6 +4756,12 @@ generate_code (TransformData *td, MonoMethod *method, MonoMethodHeader *header,
                }
                case CEE_RET: {
                        link_bblocks = FALSE;
+                       MonoType *ult = mini_type_get_underlying_type (signature->ret);
+                       if (ult->type != MONO_TYPE_VOID) {
+                               // Convert stack contents to return type if necessary
+                               CHECK_STACK (td, 1);
+                               emit_convert (td, td->sp - 1, ult);
+                       }
                        /* Return from inlined method, return value is on top of stack */
                        if (inlining) {
                                td->ip++;
@@ -4612,9 +4774,7 @@ generate_code (TransformData *td, MonoMethod *method, MonoMethodHeader *header,
                        }
 
                        int vt_size = 0;
-                       MonoType *ult = mini_type_get_underlying_type (signature->ret);
                        if (ult->type != MONO_TYPE_VOID) {
-                               CHECK_STACK (td, 1);
                                --td->sp;
                                if (mint_type (ult) == MINT_TYPE_VT) {
                                        MonoClass *klass = mono_class_from_mono_type_internal (ult);
@@ -5308,6 +5468,7 @@ generate_code (TransformData *td, MonoMethod *method, MonoMethodHeader *header,
                case CEE_NEWOBJ: {
                        MonoMethod *m;
                        MonoMethodSignature *csignature;
+                       gboolean is_protected = is_ip_protected (header, td->ip - header->code);
 
                        td->ip++;
                        token = read32 (td->ip);
@@ -5349,30 +5510,40 @@ generate_code (TransformData *td, MonoMethod *method, MonoMethodHeader *header,
                                        interp_add_conv (td, td->sp - 1, NULL, stack_type [ret_mt], MINT_CONV_OVF_I4_I8);
 #endif
                        } else if (m_class_get_parent (klass) == mono_defaults.array_class) {
+                               int *call_args = (int*)mono_mempool_alloc (td->mempool, (csignature->param_count + 1) * sizeof (int));
                                td->sp -= csignature->param_count;
-                               for (int i = 0; i < csignature->param_count; i++)
-                                       td->locals [td->sp [i].local].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
+                               for (int i = 0; i < csignature->param_count; i++) {
+                                       call_args [i] = td->sp [i].local;
+                               }
+                               call_args [csignature->param_count] = -1;
 
                                interp_add_ins (td, MINT_NEWOBJ_ARRAY);
                                td->last_ins->data [0] = get_data_item_index (td, m->klass);
                                td->last_ins->data [1] = csignature->param_count;
                                push_type (td, stack_type [ret_mt], klass);
                                interp_ins_set_dreg (td->last_ins, td->sp [-1].local);
-                               td->locals [td->sp [-1].local].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
+                               interp_ins_set_sreg (td->last_ins, MINT_CALL_ARGS_SREG);
+                               td->last_ins->flags |= INTERP_INST_FLAG_CALL;
+                               td->last_ins->info.call_args = call_args;
                        } else if (klass == mono_defaults.string_class) {
-                               guint32 tos_offset = get_tos_offset (td);
+                               int *call_args = (int*)mono_mempool_alloc (td->mempool, (csignature->param_count + 2) * sizeof (int));
                                td->sp -= csignature->param_count;
-                               guint32 params_stack_size = tos_offset - get_tos_offset (td);
 
-                               for (int i = 0; i < csignature->param_count; i++)
-                                       td->locals [td->sp [i].local].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
+                               // First arg is dummy var, it is null when passed to the ctor
+                               call_args [0] = create_interp_stack_local (td, stack_type [ret_mt], NULL, MINT_STACK_SLOT_SIZE);
+                               for (int i = 0; i < csignature->param_count; i++) {
+                                       call_args [i + 1] = td->sp [i].local;
+                               }
+                               call_args [csignature->param_count + 1] = -1;
 
                                interp_add_ins (td, MINT_NEWOBJ_STRING);
                                td->last_ins->data [0] = get_data_item_index (td, mono_interp_get_imethod (m, error));
-                               td->last_ins->data [1] = params_stack_size;
                                push_type (td, stack_type [ret_mt], klass);
+
                                interp_ins_set_dreg (td->last_ins, td->sp [-1].local);
-                               td->locals [td->sp [-1].local].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
+                               interp_ins_set_sreg (td->last_ins, MINT_CALL_ARGS_SREG);
+                               td->last_ins->flags |= INTERP_INST_FLAG_CALL;
+                               td->last_ins->info.call_args = call_args;
                        } else if (m_class_get_image (klass) == mono_defaults.corlib &&
                                        !strcmp (m_class_get_name (m->klass), "ByReference`1") &&
                                        !strcmp (m->name, ".ctor")) {
@@ -5397,19 +5568,14 @@ generate_code (TransformData *td, MonoMethod *method, MonoMethodHeader *header,
                                push_type_vt (td, klass, mono_class_value_size (klass, NULL));
                                interp_ins_set_dreg (td->last_ins, td->sp [-1].local);
                        } else {
-                               guint32 tos_offset = get_tos_offset (td);
                                td->sp -= csignature->param_count;
-                               guint32 params_stack_size = tos_offset - get_tos_offset (td);
 
                                // Move params types in temporary buffer
-                               // FIXME stop leaking sp_params
-                               StackInfo *sp_params = (StackInfo*) g_malloc (sizeof (StackInfo) * csignature->param_count);
+                               StackInfo *sp_params = (StackInfo*) mono_mempool_alloc (td->mempool, sizeof (StackInfo) * csignature->param_count);
                                memcpy (sp_params, td->sp, sizeof (StackInfo) * csignature->param_count);
 
-                               // We must not optimize out these locals, storing to them is part of the interp call convention
-                               // FIXME this affects inlining efficiency. We need to first remove the param moving by NEWOBJ
-                               for (int i = 0; i < csignature->param_count; i++)
-                                       td->locals [sp_params [i].local].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
+                               if (interp_inline_newobj (td, m, csignature, ret_mt, sp_params, is_protected))
+                                       break;
 
                                // Push the return value and `this` argument to the ctor
                                gboolean is_vt = m_class_is_valuetype (klass);
@@ -5426,63 +5592,53 @@ generate_code (TransformData *td, MonoMethod *method, MonoMethodHeader *header,
                                        push_type (td, stack_type [ret_mt], klass);
                                }
                                int dreg = td->sp [-2].local;
-                               td->locals [dreg].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
 
-                               // Push back the params to top of stack
-                               push_types (td, sp_params, csignature->param_count);
+                               // Push back the params to top of stack. The original vars are maintained.
+                               ensure_stack (td, csignature->param_count);
+                               memcpy (td->sp, sp_params, sizeof (StackInfo) * csignature->param_count);
+                               td->sp += csignature->param_count;
 
                                if (!mono_class_has_finalizer (klass) &&
                                        !m_class_has_weak_fields (klass)) {
                                        InterpInst *newobj_fast;
 
                                        if (is_vt) {
-                                               newobj_fast = interp_add_ins (td, MINT_NEWOBJ_VT_FAST);
+                                               newobj_fast = interp_add_ins (td, MINT_NEWOBJ_VT);
                                                interp_ins_set_dreg (newobj_fast, dreg);
                                                newobj_fast->data [1] = ALIGN_TO (vtsize, MINT_STACK_SLOT_SIZE);
                                        } else {
                                                MonoVTable *vtable = mono_class_vtable_checked (klass, error);
                                                goto_if_nok (error, exit);
-                                               newobj_fast = interp_add_ins (td, MINT_NEWOBJ_FAST);
+                                               newobj_fast = interp_add_ins (td, MINT_NEWOBJ);
                                                interp_ins_set_dreg (newobj_fast, dreg);
                                                newobj_fast->data [1] = get_data_item_index (td, vtable);
                                        }
-                                       // FIXME remove these once we have our own local offset allocator, even for execution stack locals
-                                       newobj_fast->data [2] = params_stack_size;
-                                       newobj_fast->data [3] = csignature->param_count;
-
-                                       if ((mono_interp_opt & INTERP_OPT_INLINE) && interp_method_check_inlining (td, m, csignature)) {
-                                               MonoMethodHeader *mheader = interp_method_get_header (m, error);
-                                               goto_if_nok (error, exit);
-
-                                               // Add local mapping information for cprop to use, in case we inline
-                                               int param_count = csignature->param_count;
-                                               int *newobj_reg_map = (int*)mono_mempool_alloc (td->mempool, sizeof (int) * param_count * 2);
-                                               for (int i = 0; i < param_count; i++) {
-                                                       newobj_reg_map [2 * i] = sp_params [i].local;
-                                                       newobj_reg_map [2 * i + 1] = td->sp [-param_count + i].local;
-                                               }
 
-                                               if (interp_inline_method (td, m, mheader, error)) {
-                                                       newobj_fast->data [0] = INLINED_METHOD_FLAG;
-                                                       newobj_fast->info.newobj_reg_map = newobj_reg_map;
-                                                       break;
-                                               }
-                                       }
                                        // Inlining failed. Set the method to be executed as part of newobj instruction
                                        newobj_fast->data [0] = get_data_item_index (td, mono_interp_get_imethod (m, error));
                                        /* The constructor was not inlined, abort inlining of current method */
                                        if (!td->aggressive_inlining)
                                                INLINE_FAILURE;
                                } else {
-                                       interp_add_ins (td, MINT_NEWOBJ);
+                                       interp_add_ins (td, MINT_NEWOBJ_SLOW);
                                        g_assert (!m_class_is_valuetype (klass));
                                        interp_ins_set_dreg (td->last_ins, dreg);
                                        td->last_ins->data [0] = get_data_item_index (td, mono_interp_get_imethod (m, error));
-                                       td->last_ins->data [1] = params_stack_size;
                                }
                                goto_if_nok (error, exit);
+
+                               interp_ins_set_sreg (td->last_ins, MINT_CALL_ARGS_SREG);
+                               td->last_ins->flags |= INTERP_INST_FLAG_CALL;
+                               if (is_protected)
+                                       td->last_ins->flags |= INTERP_INST_FLAG_PROTECTED_NEWOBJ;
                                // Parameters and this pointer are popped off the stack. The return value remains
                                td->sp -= csignature->param_count + 1;
+                               // Save the arguments for the call
+                               int *call_args = (int*) mono_mempool_alloc (td->mempool, (csignature->param_count + 2) * sizeof (int));
+                               for (int i = 0; i < csignature->param_count + 1; i++)
+                                       call_args [i] = td->sp [i].local;
+                               call_args [csignature->param_count + 1] = -1;
+                               td->last_ins->info.call_args = call_args;
                        }
                        break;
                }
@@ -5816,7 +5972,7 @@ generate_code (TransformData *td, MonoMethod *method, MonoMethodHeader *header,
                        MonoType *ftype = mono_field_get_type_internal (field);
                        mt = mint_type (ftype);
 
-                       emit_convert (td, td->sp [-1].type, ftype);
+                       emit_convert (td, td->sp - 1, ftype);
 
                        /* the vtable of the field might not be initialized at this point */
                        MonoClass *fld_klass = mono_class_from_mono_type_internal (ftype);
@@ -6006,12 +6162,16 @@ generate_code (TransformData *td, MonoMethod *method, MonoMethodHeader *header,
                                CHECK_TYPELOAD (klass);
                                interp_add_ins (td, MINT_LDELEMA_TC);
                                td->sp -= 2;
-                               td->locals [td->sp [0].local].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
-                               td->locals [td->sp [1].local].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
+                               int *call_args = (int*)mono_mempool_alloc (td->mempool, 3 * sizeof (int));
+                               call_args [0] = td->sp [0].local;
+                               call_args [1] = td->sp [1].local;
+                               call_args [2] = -1;
                                push_simple_type (td, STACK_TYPE_MP);
                                interp_ins_set_dreg (td->last_ins, td->sp [-1].local);
-                               td->locals [td->sp [-1].local].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
                                td->last_ins->data [0] = get_data_item_index (td, klass);
+                               td->last_ins->info.call_args = call_args;
+                               interp_ins_set_sreg (td->last_ins, MINT_CALL_ARGS_SREG);
+                               td->last_ins->flags |= INTERP_INST_FLAG_CALL;
                        } else {
                                interp_add_ins (td, MINT_LDELEMA1);
                                td->sp -= 2;
@@ -6598,7 +6758,6 @@ generate_code (TransformData *td, MonoMethod *method, MonoMethodHeader *header,
                                        // Push back to top of stack and fixup the local offset
                                        push_types (td, &tos, 1);
                                        td->sp [-1].local = saved_local;
-                                       td->locals [saved_local].stack_offset = td->sp [-1].offset;
 
                                        if (!interp_transform_call (td, method, NULL, generic_context, NULL, FALSE, error, FALSE, FALSE, FALSE))
                                                goto exit;
@@ -6616,27 +6775,23 @@ generate_code (TransformData *td, MonoMethod *method, MonoMethodHeader *header,
                                        break;
                                }
                                case CEE_MONO_ICALL: {
-                                       int dreg;
+                                       int dreg = -1;
                                        MonoJitICallId const jit_icall_id = (MonoJitICallId)read32 (td->ip + 1);
                                        MonoJitICallInfo const * const info = mono_find_jit_icall_info (jit_icall_id);
                                        td->ip += 5;
 
                                        CHECK_STACK (td, info->sig->param_count);
                                        td->sp -= info->sig->param_count;
+                                       int *call_args = (int*)mono_mempool_alloc (td->mempool, (info->sig->param_count + 1) * sizeof (int));
                                        for (int i = 0; i < info->sig->param_count; i++)
-                                               td->locals [td->sp [i].local].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
+                                               call_args [i] = td->sp [i].local;
+                                       call_args [info->sig->param_count] = -1;
                                        if (!MONO_TYPE_IS_VOID (info->sig->ret)) {
                                                int mt = mint_type (info->sig->ret);
                                                push_simple_type (td, stack_type [mt]);
                                                dreg = td->sp [-1].local;
-                                               td->locals [dreg].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
-                                       } else {
-                                               // Create a new dummy local to serve as the dreg of the call
-                                               // This dreg is only used to resolve the call args offset
-                                               push_simple_type (td, STACK_TYPE_I4);
-                                               td->sp--;
-                                               dreg = td->sp [0].local;
                                        }
+
                                        if (jit_icall_id == MONO_JIT_ICALL_mono_threads_attach_coop) {
                                                rtm->needs_thread_attach = 1;
                                        } else if (jit_icall_id == MONO_JIT_ICALL_mono_threads_detach_coop) {
@@ -6647,8 +6802,12 @@ generate_code (TransformData *td, MonoMethod *method, MonoMethodHeader *header,
 
                                                interp_add_ins (td, icall_op);
                                                // hash here is overkill
-                                               interp_ins_set_dreg (td->last_ins, dreg);
+                                               if (dreg != -1)
+                                                       interp_ins_set_dreg (td->last_ins, dreg);
+                                               interp_ins_set_sreg (td->last_ins, MINT_CALL_ARGS_SREG);
+                                               td->last_ins->flags |= INTERP_INST_FLAG_CALL;
                                                td->last_ins->data [0] = get_data_item_index (td, (gpointer)info->func);
+                                               td->last_ins->info.call_args = call_args;
                                        }
                                        break;
                                }
@@ -7295,13 +7454,13 @@ emit_compacted_instruction (TransformData *td, guint16* start_ip, InterpInst *in
                g_array_append_val (td->line_numbers, lne);
        }
 
-       if (opcode == MINT_NOP)
+       if (opcode == MINT_NOP || opcode == MINT_DEF || opcode == MINT_DUMMY_USE)
                return ip;
 
        *ip++ = opcode;
        if (opcode == MINT_SWITCH) {
                int labels = READ32 (&ins->data [0]);
-               *ip++ = get_interp_local_offset (td, ins->sregs [0], TRUE);
+               *ip++ = td->locals [ins->sregs [0]].offset;
                // Write number of switch labels
                *ip++ = ins->data [0];
                *ip++ = ins->data [1];
@@ -7320,7 +7479,7 @@ emit_compacted_instruction (TransformData *td, guint16* start_ip, InterpInst *in
                        opcode == MINT_BR_S || opcode == MINT_LEAVE_S || opcode == MINT_LEAVE_S_CHECK || opcode == MINT_CALL_HANDLER_S) {
                const int br_offset = start_ip - td->new_code;
                for (int i = 0; i < mono_interp_op_sregs [opcode]; i++)
-                       *ip++ = get_interp_local_offset (td, ins->sregs [i], TRUE);
+                       *ip++ = td->locals [ins->sregs [i]].offset;
                if (ins->info.target_bb->native_offset >= 0) {
                        // Backwards branch. We can already patch it.
                        *ip++ = ins->info.target_bb->native_offset - br_offset;
@@ -7341,7 +7500,7 @@ emit_compacted_instruction (TransformData *td, guint16* start_ip, InterpInst *in
                        opcode == MINT_BR || opcode == MINT_LEAVE || opcode == MINT_LEAVE_CHECK || opcode == MINT_CALL_HANDLER) {
                const int br_offset = start_ip - td->new_code;
                for (int i = 0; i < mono_interp_op_sregs [opcode]; i++)
-                       *ip++ = get_interp_local_offset (td, ins->sregs [i], TRUE);
+                       *ip++ = td->locals [ins->sregs [i]].offset;
                if (ins->info.target_bb->native_offset >= 0) {
                        // Backwards branch. We can already patch it
                        int target_offset = ins->info.target_bb->native_offset - br_offset;
@@ -7407,14 +7566,18 @@ emit_compacted_instruction (TransformData *td, guint16* start_ip, InterpInst *in
 #endif
        } else {
                if (mono_interp_op_dregs [opcode])
-                       *ip++ = get_interp_local_offset (td, ins->dreg, TRUE);
+                       *ip++ = td->locals [ins->dreg].offset;
 
                if (mono_interp_op_sregs [opcode]) {
-                       for (int i = 0; i < mono_interp_op_sregs [opcode]; i++)
-                               *ip++ = get_interp_local_offset (td, ins->sregs [i], TRUE);
+                       for (int i = 0; i < mono_interp_op_sregs [opcode]; i++) {
+                               if (ins->sregs [i] == MINT_CALL_ARGS_SREG)
+                                       *ip++ = td->locals [ins->info.call_args [0]].offset;
+                               else
+                                       *ip++ = td->locals [ins->sregs [i]].offset;
+                       }
                } else if (opcode == MINT_LDLOCA_S) {
                        // This opcode receives a local but it is not viewed as a sreg since we don't load the value
-                       *ip++ = get_interp_local_offset (td, ins->sregs [0], TRUE);
+                       *ip++ = td->locals [ins->sregs [0]].offset;
                }
 
                int left = get_inst_length (ins) - (ip - start_ip);
@@ -7426,22 +7589,6 @@ emit_compacted_instruction (TransformData *td, guint16* start_ip, InterpInst *in
        return ip;
 }
 
-static void
-alloc_ins_locals (TransformData *td, InterpInst *ins)
-{
-       int opcode = ins->opcode;
-       if (mono_interp_op_sregs [opcode]) {
-               for (int i = 0; i < mono_interp_op_sregs [opcode]; i++)
-                       get_interp_local_offset (td, ins->sregs [i], FALSE);
-       } else if (opcode == MINT_LDLOCA_S) {
-               // This opcode receives a local but it is not viewed as a sreg since we don't load the value
-               get_interp_local_offset (td, ins->sregs [0], FALSE);
-       }
-
-       if (mono_interp_op_dregs [opcode])
-               get_interp_local_offset (td, ins->dreg, FALSE);
-}
-
 // Generates the final code, after we are done with all the passes
 static void
 generate_compacted_code (TransformData *td)
@@ -7456,7 +7603,6 @@ generate_compacted_code (TransformData *td)
                InterpInst *ins = bb->first_ins;
                while (ins) {
                        size += get_inst_length (ins);
-                       alloc_ins_locals (td, ins);
                        ins = ins->next;
                }
        }
@@ -7523,7 +7669,6 @@ interp_local_deadce (TransformData *td, int *local_ref_count)
                g_assert (td->locals [i].indirects >= 0);
                if (!local_ref_count [i] &&
                                !td->locals [i].indirects &&
-                               !(td->locals [i].flags & INTERP_LOCAL_FLAG_CALL_ARGS) &&
                                (td->locals [i].flags & INTERP_LOCAL_FLAG_DEAD) == 0) {
                        needs_dce = TRUE;
                        td->locals [i].flags |= INTERP_LOCAL_FLAG_DEAD;
@@ -7936,6 +8081,30 @@ interp_fold_binop_cond_br (TransformData *td, InterpBasicBlock *cbb, LocalValue
 }
 
 static void
+cprop_sreg (TransformData *td, InterpInst *ins, int *psreg, int *local_ref_count, LocalValue *local_defs)
+{
+       int sreg = *psreg;
+
+       local_ref_count [sreg]++;
+       if (local_defs [sreg].type == LOCAL_VALUE_LOCAL) {
+               int cprop_local = local_defs [sreg].local;
+
+               // We are trying to replace sregs [i] with its def local (cprop_local), but cprop_local has since been
+               // modified, so we can't use it.
+               if (local_defs [cprop_local].ins != NULL && local_defs [cprop_local].def_index > local_defs [sreg].def_index)
+                       return;
+
+               if (td->verbose_level)
+                       g_print ("cprop %d -> %d:\n\t", sreg, cprop_local);
+               local_ref_count [sreg]--;
+               *psreg = cprop_local;
+               local_ref_count [cprop_local]++;
+               if (td->verbose_level)
+                       dump_interp_inst (ins);
+       }
+}
+
+static void
 interp_cprop (TransformData *td)
 {
        LocalValue *local_defs = (LocalValue*) g_malloc (td->locals_size * sizeof (LocalValue));
@@ -7981,27 +8150,21 @@ retry:
                                // FIXME MINT_PROF_EXIT when void
                                if (sregs [i] == -1)
                                        continue;
-                               local_ref_count [sregs [i]]++;
-                               if (local_defs [sregs [i]].type == LOCAL_VALUE_LOCAL) {
-                                       int cprop_local = local_defs [sregs [i]].local;
-                                       // We are not allowed to extend the liveness of execution stack locals because
-                                       // it can end up conflicting with another such local. Once we will have our
-                                       // own offset allocator for these locals, this restriction can be lifted.
-                                       if (td->locals [cprop_local].flags & INTERP_LOCAL_FLAG_EXECUTION_STACK)
-                                               continue;
-
-                                       // We are trying to replace sregs [i] with its def local (cprop_local), but cprop_local has since been
-                                       // modified, so we can't use it.
-                                       if (local_defs [cprop_local].ins != NULL && local_defs [cprop_local].def_index > local_defs [sregs [i]].def_index)
-                                               continue;
-
-                                       if (td->verbose_level)
-                                               g_print ("cprop %d -> %d:\n\t", sregs [i], cprop_local);
-                                       local_ref_count [sregs [i]]--;
-                                       sregs [i] = cprop_local;
-                                       local_ref_count [cprop_local]++;
-                                       if (td->verbose_level)
-                                               dump_interp_inst (ins);
+                               if (sregs [i] == MINT_CALL_ARGS_SREG) {
+                                       int *call_args = ins->info.call_args;
+                                       if (call_args) {
+                                               while (*call_args != -1) {
+                                                       cprop_sreg (td, ins, call_args, local_ref_count, local_defs);
+                                                       call_args++;
+                                               }
+                                       }
+                               } else {
+                                       cprop_sreg (td, ins, &sregs [i], local_ref_count, local_defs);
+                                       // This var is used as a source to a normal instruction. In case this var will
+                                       // also be used as source to a call, make sure the offset allocator will create
+                                       // a new temporary call arg var and not use this one. Call arg vars have special
+                                       // semantics. They can be assigned only once and they die once the call is made.
+                                       td->locals [sregs [i]].flags |= INTERP_LOCAL_FLAG_NO_CALL_ARGS;
                                }
                        }
 
@@ -8043,9 +8206,9 @@ retry:
                                        }
                                } else if (local_defs [sreg].ins != NULL &&
                                                (td->locals [sreg].flags & INTERP_LOCAL_FLAG_EXECUTION_STACK) &&
-                                               !(td->locals [sreg].flags & INTERP_LOCAL_FLAG_CALL_ARGS) &&
                                                !(td->locals [dreg].flags & INTERP_LOCAL_FLAG_EXECUTION_STACK) &&
-                                               interp_prev_ins (ins) == local_defs [sreg].ins) {
+                                               interp_prev_ins (ins) == local_defs [sreg].ins &&
+                                               !(interp_prev_ins (ins)->flags & INTERP_INST_FLAG_PROTECTED_NEWOBJ)) {
                                        // hackish temporary optimization that won't be necessary in the future
                                        // We replace `local1 <- ?, local2 <- local1` with `local2 <- ?, local1 <- local2`
                                        // if local1 is execution stack local and local2 is normal global local. This makes
@@ -8104,16 +8267,6 @@ retry:
                                ins = interp_fold_binop (td, local_defs, local_ref_count, ins);
                        } else if (MINT_IS_BINOP_CONDITIONAL_BRANCH (opcode)) {
                                ins = interp_fold_binop_cond_br (td, bb, local_defs, local_ref_count, ins);
-                       } else if ((ins->opcode == MINT_NEWOBJ_FAST || ins->opcode == MINT_NEWOBJ_VT_FAST) && ins->data [0] == INLINED_METHOD_FLAG) {
-                               // FIXME Drop the CALL_ARGS flag on the params so this will no longer be necessary
-                               int param_count = ins->data [3];
-                               int *newobj_reg_map = ins->info.newobj_reg_map;
-                               for (int i = 0; i < param_count; i++) {
-                                       int src = newobj_reg_map [2 * i];
-                                       int dst = newobj_reg_map [2 * i + 1];
-                                       local_defs [dst] = local_defs [src];
-                                       local_defs [dst].ins = NULL;
-                               }
                        } else if (MINT_IS_LDFLD (opcode) && ins->data [0] == 0) {
                                InterpInst *ldloca = local_defs [sregs [0]].ins;
                                if (ldloca != NULL && ldloca->opcode == MINT_LDLOCA_S &&
@@ -8192,6 +8345,430 @@ interp_optimize_code (TransformData *td)
                MONO_TIME_TRACK (mono_interp_stats.super_instructions_time, interp_super_instructions (td));
 }
 
+static void
+foreach_local_var (TransformData *td, InterpInst *ins, int data, void (*callback)(TransformData*, int, int))
+{
+       int opcode = ins->opcode;
+       if (mono_interp_op_sregs [opcode]) {
+               for (int i = 0; i < mono_interp_op_sregs [opcode]; i++) {
+                       int sreg = ins->sregs [i];
+
+                       if (sreg == MINT_CALL_ARGS_SREG) {
+                               int *call_args = ins->info.call_args;
+                               if (call_args) {
+                                       int var = *call_args;
+                                       while (var != -1) {
+                                               callback (td, var, data);
+                                               call_args++;
+                                               var = *call_args;
+                                       }
+                               }
+                       } else {
+                               callback (td, sreg, data);
+                       }
+               }
+       }
+
+       if (mono_interp_op_dregs [opcode])
+               callback (td, ins->dreg, data);
+}
+
+static void
+set_var_live_range (TransformData *td, int var, int ins_index)
+{
+       // We don't track liveness yet for global vars
+       if (td->locals [var].flags & INTERP_LOCAL_FLAG_GLOBAL)
+               return;
+       if (td->locals [var].live_start == -1)
+               td->locals [var].live_start = ins_index;
+       td->locals [var].live_end = ins_index;
+}
+
+static void
+initialize_global_var (TransformData *td, int var, int bb_index)
+{
+       // Check if already handled
+       if (td->locals [var].flags & INTERP_LOCAL_FLAG_GLOBAL)
+               return;
+
+       if (td->locals [var].bb_index == -1) {
+               td->locals [var].bb_index = bb_index;
+       } else if (td->locals [var].bb_index != bb_index) {
+               // var used in multiple basic blocks
+               if (td->verbose_level)
+                       g_print ("alloc global var %d to offset %d\n", var, td->total_locals_size);
+               alloc_global_var_offset (td, var);
+               td->locals [var].flags |= INTERP_LOCAL_FLAG_GLOBAL;
+       }
+}
+
+static void
+initialize_global_vars (TransformData *td)
+{
+       InterpBasicBlock *bb;
+
+       for (bb = td->entry_bb; bb != NULL; bb = bb->next_bb) {
+               InterpInst *ins;
+
+               for (ins = bb->first_ins; ins != NULL; ins = ins->next) {
+                       int opcode = ins->opcode;
+                       if (opcode == MINT_NOP) {
+                               continue;
+                       } else if (opcode == MINT_LDLOCA_S) {
+                               int var = ins->sregs [0];
+                               // If global flag is set, it means its offset was already allocated
+                               if (!(td->locals [var].flags & INTERP_LOCAL_FLAG_GLOBAL)) {
+                                       if (td->verbose_level)
+                                               g_print ("alloc ldloca global var %d to offset %d\n", var, td->total_locals_size);
+                                       alloc_global_var_offset (td, var);
+                                       td->locals [var].flags |= INTERP_LOCAL_FLAG_GLOBAL;
+                               }
+                       }
+                       foreach_local_var (td, ins, bb->index, initialize_global_var);
+               }
+       }
+}
+
+// Data structure used for offset allocation of call args
+typedef struct {
+       InterpInst *call;
+       int param_size;
+} ActiveCall;
+
+typedef struct {
+       ActiveCall *active_calls;
+       int active_calls_count;
+       int active_calls_capacity;
+       int param_size;
+} ActiveCalls;
+
+static void
+init_active_calls (TransformData *td, ActiveCalls *ac)
+{
+       ac->active_calls_count = 0;
+       ac->active_calls_capacity = 5;
+       ac->active_calls = (ActiveCall*)mono_mempool_alloc (td->mempool, ac->active_calls_capacity * sizeof (ActiveCall));
+       ac->param_size = 0;
+}
+
+static void
+reinit_active_calls (TransformData *td, ActiveCalls *ac)
+{
+       ac->active_calls_count = 0;
+       ac->param_size = 0;
+}
+
+static int
+get_call_param_size (TransformData *td, InterpInst *call)
+{
+       int *call_args = call->info.call_args;
+       if (!call_args)
+               return 0;
+
+       int param_size = 0;
+
+       int var = *call_args;
+       while (var != -1) {
+               param_size = ALIGN_TO (param_size + td->locals [var].size, MINT_STACK_SLOT_SIZE);
+               call_args++;
+               var = *call_args;
+       }
+       return param_size;
+}
+
+static void
+add_active_call (TransformData *td, ActiveCalls *ac, InterpInst *call)
+{
+       // Check if already added
+       if (call->flags & INTERP_INST_FLAG_ACTIVE_CALL)
+               return;
+
+       if (ac->active_calls_count == ac->active_calls_capacity) {
+               ActiveCall *old = ac->active_calls;
+               ac->active_calls_capacity *= 2;
+               ac->active_calls = (ActiveCall*)mono_mempool_alloc (td->mempool, ac->active_calls_capacity * sizeof (ActiveCall));
+               memcpy (ac->active_calls, old, ac->active_calls_count * sizeof (ActiveCall));
+       }
+
+       ac->active_calls [ac->active_calls_count].call = call;
+       ac->active_calls [ac->active_calls_count].param_size = get_call_param_size (td, call);
+       ac->param_size += ac->active_calls [ac->active_calls_count].param_size;
+       ac->active_calls_count++;
+
+       // Mark a flag on it so we don't have to look up the array with every argument store.
+       call->flags |= INTERP_INST_FLAG_ACTIVE_CALL;
+}
+
+static void
+end_active_call (TransformData *td, ActiveCalls *ac, InterpInst *call)
+{
+       // Remove call from array
+       for (int i = 0; i < ac->active_calls_count; i++) {
+               if (ac->active_calls [i].call == call) {
+                       ac->active_calls_count--;
+                       ac->param_size -= ac->active_calls [i].param_size;
+                       // Since this entry is removed, move the last entry into it
+                       if (ac->active_calls_count > 0 && i < ac->active_calls_count)
+                               ac->active_calls [i] = ac->active_calls [ac->active_calls_count];
+               }
+       }
+       // This is the relative offset (to the start of the call args stack) where the args
+       // for this call reside.
+       int start_offset = ac->param_size;
+
+       // Compute the offset of each call argument
+       int *call_args = call->info.call_args;
+       if (call_args && (*call_args != -1)) {
+               int var = *call_args;
+               while (var != -1) {
+                       alloc_var_offset (td, var, &start_offset);
+                       call_args++;
+                       var = *call_args;
+               }
+       } else {
+               // This call has no argument. Allocate a dummy one so when we resolve the
+               // offset for MINT_CALL_ARGS_SREG during compacted instruction emit, we can
+               // always use the offset of the first var in the call_args array
+               int new_var = create_interp_local (td, mono_get_int_type ());
+               td->locals [new_var].call = call;
+               td->locals [new_var].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
+               alloc_var_offset (td, new_var, &start_offset);
+
+               call_args = (int*)mono_mempool_alloc (td->mempool, 3 * sizeof (int));
+               call_args [0] = new_var;
+               call_args [1] = -1;
+
+               call->info.call_args = call_args;
+       }
+}
+
+// Data structure used for offset allocation of local vars
+
+typedef struct {
+       int var;
+       gboolean is_alive;
+} ActiveVar;
+
+typedef struct {
+       ActiveVar *active_vars;
+       int active_vars_count;
+       int active_vars_capacity;
+} ActiveVars;
+
+static void
+init_active_vars (TransformData *td, ActiveVars *av)
+{
+       av->active_vars_count = 0;
+       av->active_vars_capacity = MAX (td->locals_size / td->bb_count, 10);
+       av->active_vars = (ActiveVar*)mono_mempool_alloc (td->mempool, av->active_vars_capacity * sizeof (ActiveVar));
+}
+
+static void
+reinit_active_vars (TransformData *td, ActiveVars *av)
+{
+       av->active_vars_count = 0;
+}
+
+static void
+add_active_var (TransformData *td, ActiveVars *av, int var)
+{
+       if (av->active_vars_count == av->active_vars_capacity) {
+               av->active_vars_capacity *= 2;
+               ActiveVar *new_array = (ActiveVar*)mono_mempool_alloc (td->mempool, av->active_vars_capacity * sizeof (ActiveVar));
+               memcpy (new_array, av->active_vars, av->active_vars_count * sizeof (ActiveVar));
+               av->active_vars = new_array;
+       }
+       av->active_vars [av->active_vars_count].var = var;
+       av->active_vars [av->active_vars_count].is_alive = TRUE;
+       av->active_vars_count++;
+}
+
+static void
+end_active_var (TransformData *td, ActiveVars *av, int var)
+{
+       // Iterate over active vars, set the entry associated with var as !is_alive
+       for (int i = 0; i < av->active_vars_count; i++) {
+               if (av->active_vars [i].var == var) {
+                       av->active_vars [i].is_alive = FALSE;
+                       return;
+               }
+       }
+}
+
+static void
+compact_active_vars (TransformData *td, ActiveVars *av, gint32 *current_offset)
+{
+       if (!av->active_vars_count)
+               return;
+       int i = av->active_vars_count - 1;
+       while (i >= 0 && !av->active_vars [i].is_alive) {
+               av->active_vars_count--;
+               *current_offset = td->locals [av->active_vars [i].var].offset;
+               i--;
+       }
+}
+
+static void
+dump_active_vars (TransformData *td, ActiveVars *av)
+{
+       if (td->verbose_level) {
+               g_print ("active :");
+               for (int i = 0; i < av->active_vars_count; i++) {
+                       if (av->active_vars [i].is_alive)
+                               g_print (" %d (end %d),", av->active_vars [i].var, td->locals [av->active_vars [i].var].live_end);
+               }
+               g_print ("\n");
+       }
+}
+
+static void
+interp_alloc_offsets (TransformData *td)
+{
+       InterpBasicBlock *bb;
+       ActiveCalls ac;
+       ActiveVars av;
+
+       if (td->verbose_level)
+               g_print ("\nvar offset allocator iteration\n");
+
+       initialize_global_vars (td);
+
+       init_active_vars (td, &av);
+       init_active_calls (td, &ac);
+
+       int final_total_locals_size = td->total_locals_size;
+       // We now have the top of stack offset. All local regs are allocated after this offset, with each basic block starting over from it
+       for (bb = td->entry_bb; bb != NULL; bb = bb->next_bb) {
+               InterpInst *ins;
+               int ins_index = 0;
+               if (td->verbose_level)
+                       g_print ("BB%d\n", bb->index);
+
+               reinit_active_calls (td, &ac);
+               reinit_active_vars (td, &av);
+
+               for (ins = bb->first_ins; ins != NULL; ins = ins->next) {
+                       if (ins->opcode == MINT_NOP)
+                               continue;
+                       if (ins->opcode == MINT_NEWOBJ || ins->opcode == MINT_NEWOBJ_VT ||
+                                       ins->opcode == MINT_NEWOBJ_SLOW || ins->opcode == MINT_NEWOBJ_STRING) {
+                               // The offset allocator assumes that the liveness of destination var starts
+                               // after the source vars, which means the destination var can be allocated
+                               // at the same offset as some of the arguments. However, for newobj opcodes,
+                               // the created object is set before the call is made. We solve this by making
+                               // sure that the dreg is not allocated in the param area, so there is no
+                               // risk of conflicts.
+                               td->locals [ins->dreg].flags |= INTERP_LOCAL_FLAG_NO_CALL_ARGS;
+                       }
+                       if (ins->flags & INTERP_INST_FLAG_CALL) {
+                               int *call_args = ins->info.call_args;
+                               if (call_args) {
+                                       int var = *call_args;
+                                       while (var != -1) {
+                                               if (td->locals [var].flags & INTERP_LOCAL_FLAG_GLOBAL ||
+                                                               td->locals [var].flags & INTERP_LOCAL_FLAG_NO_CALL_ARGS) {
+                                                       // A global or NO_CALL_ARGS var is an argument to a call, which is not allowed.
+                                                       // We need to copy it into a new local var
+                                                       int new_var = create_interp_local (td, td->locals [var].type);
+                                                       td->locals [new_var].call = ins;
+                                                       td->locals [new_var].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
+                                                       int opcode = get_mov_for_type (mint_type (td->locals [var].type), FALSE);
+                                                       InterpInst *new_inst = interp_insert_ins_bb (td, bb, ins->prev, opcode);
+                                                       interp_ins_set_dreg (new_inst, new_var);
+                                                       interp_ins_set_sreg (new_inst, var);
+                                                       if (opcode == MINT_MOV_VT)
+                                                               new_inst->data [0] = td->locals [var].size;
+                                                       // The arg of the call is no longer global
+                                                       *call_args = new_var;
+                                                       // Also update liveness for this instruction
+                                                       foreach_local_var (td, new_inst, ins_index, set_var_live_range);
+                                                       ins_index++;
+                                               } else {
+                                                       // Flag this var as it has special storage on the call args stack
+                                                       td->locals [var].call = ins;
+                                                       td->locals [var].flags |= INTERP_LOCAL_FLAG_CALL_ARGS;
+                                               }
+                                               call_args++;
+                                               var = *call_args;
+                                       }
+                               }
+                       }
+                       // Set live_start and live_end for every referenced local that is not global
+                       foreach_local_var (td, ins, ins_index, set_var_live_range);
+                       ins_index++;
+               }
+               gint32 current_offset = td->total_locals_size;
+
+               ins_index = 0;
+               for (ins = bb->first_ins; ins != NULL; ins = ins->next) {
+                       int opcode = ins->opcode;
+                       gboolean is_call = ins->flags & INTERP_INST_FLAG_CALL;
+
+                       if (opcode == MINT_NOP)
+                               continue;
+
+                       if (td->verbose_level) {
+                               g_print ("\tins_index %d\t", ins_index);
+                               dump_interp_inst (ins);
+                       }
+
+                       // Expire source vars. We first mark them as not alive and then compact the array
+                       for (int i = 0; i < mono_interp_op_sregs [opcode]; i++) {
+                               int var = ins->sregs [i];
+                               if (var == MINT_CALL_ARGS_SREG)
+                                       continue;
+                               if (!(td->locals [var].flags & INTERP_LOCAL_FLAG_GLOBAL) && td->locals [var].live_end == ins_index) {
+                                       g_assert (!(td->locals [var].flags & INTERP_LOCAL_FLAG_CALL_ARGS));
+                                       end_active_var (td, &av, var);
+                               }
+                       }
+
+                       if (is_call)
+                               end_active_call (td, &ac, ins);
+
+                       compact_active_vars (td, &av, &current_offset);
+
+                       // Alloc dreg local starting at current_offset
+                       if (mono_interp_op_dregs [opcode]) {
+                               int var = ins->dreg;
+
+                               if (td->locals [var].flags & INTERP_LOCAL_FLAG_CALL_ARGS) {
+                                       add_active_call (td, &ac, td->locals [var].call);
+                               } else if (!(td->locals [var].flags & INTERP_LOCAL_FLAG_GLOBAL) && td->locals [var].offset == -1) {
+                                       alloc_var_offset (td, var, &current_offset);
+                                       if (current_offset > final_total_locals_size)
+                                               final_total_locals_size = current_offset;
+
+                                       if (td->verbose_level)
+                                               g_print ("alloc var %d to offset %d\n", var, td->locals [var].offset);
+
+                                       if (td->locals [var].live_end > ins_index) {
+                                               // if dreg is still used in the basic block, add it to the active list
+                                               add_active_var (td, &av, var);
+                                       } else {
+                                               current_offset = td->locals [var].offset;
+                                       }
+                               }
+                       }
+                       if (td->verbose_level)
+                               dump_active_vars (td, &av);
+                       ins_index++;
+               }
+       }
+
+       // Iterate over all call args locals, update their final offset (i.e. add td->param_area_offset to them)
+       // then also update td->total_locals_size to account for this space.
+       td->param_area_offset = final_total_locals_size;
+       for (int i = 0; i < td->locals_size; i++) {
+               // These are allocated separately at the end of the stack
+               if (td->locals [i].flags & INTERP_LOCAL_FLAG_CALL_ARGS) {
+                       td->locals [i].offset += td->param_area_offset;
+                       final_total_locals_size = MAX (td->locals [i].offset + td->locals [i].size, final_total_locals_size);
+               }
+       }
+       td->total_locals_size = ALIGN_TO (final_total_locals_size, MINT_STACK_SLOT_SIZE);
+}
+
 /*
  * Very few methods have localloc. Handle it separately to not impact performance
  * of other methods. We replace the normal return opcodes with opcodes that also
@@ -8304,6 +8881,8 @@ generate (MonoMethod *method, MonoMethodHeader *header, InterpMethod *rtm, MonoG
 
        interp_optimize_code (td);
 
+       interp_alloc_offsets (td);
+
        generate_compacted_code (td);
 
        if (td->total_locals_size >= G_MAXUINT16) {
@@ -8317,7 +8896,7 @@ generate (MonoMethod *method, MonoMethodHeader *header, InterpMethod *rtm, MonoG
 
        if (td->verbose_level) {
                g_print ("Runtime method: %s %p\n", mono_method_full_name (method, TRUE), rtm);
-               g_print ("Locals size %d, stack size: %d\n", td->total_locals_size, td->max_stack_size);
+               g_print ("Locals size %d\n", td->total_locals_size);
                g_print ("Calculated stack height: %d, stated height: %d\n", td->max_stack_height, header->max_stack);
                dump_interp_code (td->new_code, td->new_code_end);
        }
@@ -8348,11 +8927,8 @@ generate (MonoMethod *method, MonoMethodHeader *header, InterpMethod *rtm, MonoG
                if (c->flags & MONO_EXCEPTION_CLAUSE_FILTER)
                        c->data.filter_offset = get_native_offset (td, c->data.filter_offset);
        }
-       rtm->stack_size = td->max_stack_size;
-       // FIXME revisit whether we actually need this
-       rtm->stack_size += 2 * MINT_STACK_SLOT_SIZE; /* + 1 for returns of called functions  + 1 for 0-ing in trace*/
-       rtm->total_locals_size = ALIGN_TO (td->total_locals_size, MINT_VT_ALIGNMENT);
-       rtm->alloca_size = ALIGN_TO (rtm->total_locals_size + rtm->stack_size, 8);
+       rtm->alloca_size = td->total_locals_size;
+       rtm->locals_size = td->param_area_offset;
        rtm->data_items = (gpointer*)mono_mem_manager_alloc0 (td->mem_manager, td->n_data_items * sizeof (td->data_items [0]));
        memcpy (rtm->data_items, td->data_items, td->n_data_items * sizeof (td->data_items [0]));
 
@@ -8530,8 +9106,7 @@ mono_interp_transform_method (InterpMethod *imethod, ThreadContext *context, Mon
                }
                if (nm == NULL) {
                        mono_os_mutex_lock (&calc_section);
-                       imethod->stack_size = sizeof (stackval); /* for tracing */
-                       imethod->alloca_size = imethod->stack_size;
+                       imethod->alloca_size = sizeof (stackval); /* for tracing */
                        mono_memory_barrier ();
                        imethod->transformed = TRUE;
                        mono_interp_stats.methods_transformed++;
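
Note on the allocator added in this file: within a single basic block, interp_alloc_offsets treats the space above the global vars as a stack. A dreg is placed at current_offset, vars whose live range ends are only marked dead, and compact_active_vars reclaims space by popping dead entries from the top of the active list. The block below is a minimal standalone sketch of that idea, not the runtime's code. `Var`, `ActiveList`, `SLOT_SIZE`, `MAX_ACTIVE` and every helper name are hypothetical simplifications, a fixed-size array stands in for the mempool-backed one, and `alloc_offset` assumes the offset is simply bumped by the slot-aligned size (the real alloc_var_offset is not shown in this part of the diff).

```c
#include <assert.h>
#include <stdbool.h>

#define SLOT_SIZE 8    /* assumed stand-in for MINT_STACK_SLOT_SIZE */
#define MAX_ACTIVE 16  /* hypothetical fixed capacity for the sketch */

typedef struct { int size; int offset; } Var;          /* simplified InterpLocal */
typedef struct { int var; bool is_alive; } Active;     /* mirrors ActiveVar */
typedef struct { Active items [MAX_ACTIVE]; int count; } ActiveList;

static int
align_to (int value, int alignment)
{
	return (value + alignment - 1) / alignment * alignment;
}

/* Place var at *cur, then bump *cur past it (assumed allocation policy). */
static void
alloc_offset (Var *vars, int var, int *cur)
{
	vars [var].offset = *cur;
	*cur = align_to (*cur + vars [var].size, SLOT_SIZE);
}

/* Record var as live at the top of the active list. */
static void
push_active (ActiveList *al, int var)
{
	assert (al->count < MAX_ACTIVE);
	al->items [al->count].var = var;
	al->items [al->count].is_alive = true;
	al->count++;
}

/* Mark var as dead; its space is not reclaimed yet. */
static void
end_active (ActiveList *al, int var)
{
	for (int i = 0; i < al->count; i++) {
		if (al->items [i].var == var) {
			al->items [i].is_alive = false;
			return;
		}
	}
}

/* Pop dead entries from the top only, rewinding *cur to each popped var's offset. */
static void
compact (ActiveList *al, const Var *vars, int *cur)
{
	while (al->count > 0 && !al->items [al->count - 1].is_alive) {
		al->count--;
		*cur = vars [al->items [al->count].var].offset;
	}
}
```

With three 8-byte vars allocated in order, ending the first one while the second is still live reclaims nothing, because the dead entry is buried under a live one. Its slot is recovered only once everything above it has also died. This trades some wasted space for a very cheap allocation step.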
index bc31c6f..0609d20 100644
@@ -9,10 +9,17 @@
 #define INTERP_INST_FLAG_SEQ_POINT_METHOD_EXIT 4
 #define INTERP_INST_FLAG_SEQ_POINT_NESTED_CALL 8
 #define INTERP_INST_FLAG_RECORD_CALL_PATCH 16
+#define INTERP_INST_FLAG_CALL 32
+// Flag used internally by the var offset allocator
+#define INTERP_INST_FLAG_ACTIVE_CALL 64
+// This instruction is protected by a clause
+#define INTERP_INST_FLAG_PROTECTED_NEWOBJ 128
 
 #define INTERP_LOCAL_FLAG_DEAD 1
 #define INTERP_LOCAL_FLAG_EXECUTION_STACK 2
 #define INTERP_LOCAL_FLAG_CALL_ARGS 4
+#define INTERP_LOCAL_FLAG_GLOBAL 8
+#define INTERP_LOCAL_FLAG_NO_CALL_ARGS 16
 
 typedef struct _InterpInst InterpInst;
 typedef struct _InterpBasicBlock InterpBasicBlock;
@@ -27,8 +34,6 @@ typedef struct
         * the stack a new local is created.
         */
        int local;
-       /* The offset from the execution stack start where this is stored */
-       int offset;
        /* Saves how much stack this is using. It is a multiple of MINT_VT_ALIGNMENT */
        int size;
 } StackInfo;
@@ -70,9 +75,10 @@ struct _InterpInst {
        union {
                InterpBasicBlock *target_bb;
                InterpBasicBlock **target_bb_table;
-               // We handle newobj poorly due to not having our own local offset allocator.
-               // We temporarily use this array to let cprop know the values of the newobj args.
-               int *newobj_reg_map;
+               // For call instructions, this represents an array of all call arg vars
+               // in the order they are pushed to the stack. This makes it easy to find
+               // all source vars for these types of opcodes. This is terminated with -1.
+               int *call_args;
        } info;
        // Variable data immediately following the dreg/sreg information. This is represented exactly
        // in the final code stream as in this array.
@@ -135,9 +141,12 @@ typedef struct {
        int indirects;
        int offset;
        int size;
+       int live_start, live_end;
+       // index of first basic block where this var is used
+       int bb_index;
        union {
-               // the offset from the start of the execution stack locals space
-               int stack_offset;
+               // If var is INTERP_LOCAL_FLAG_CALL_ARGS, this is the call instruction using it
+               InterpInst *call;
        };
 } InterpLocal;
 
@@ -161,8 +170,8 @@ typedef struct
        StackInfo *sp;
        unsigned int max_stack_height;
        unsigned int stack_capacity;
-       unsigned int max_stack_size;
-       unsigned int total_locals_size;
+       gint32 param_area_offset;
+       gint32 total_locals_size;
        InterpLocal *locals;
        unsigned int il_locals_offset;
        unsigned int il_locals_size;
@@ -176,6 +185,7 @@ typedef struct
        GHashTable *patchsite_hash;
 #endif
        int *clause_indexes;
+       int *clause_vars;
        gboolean gen_sdb_seq_points;
        GPtrArray *seq_points;
        InterpBasicBlock **offset_to_bb;
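
Note on the `call_args` field introduced in this header: it is a -1 terminated array of var indexes, and get_call_param_size in the transform code walks it while rounding the running total up to a stack slot after each argument. The following standalone sketch shows that traversal in isolation. `SLOT_SIZE`, `align_to`, `call_param_size` and the plain `sizes` array are hypothetical names for illustration (the real code reads sizes from td->locals and aligns with ALIGN_TO and MINT_STACK_SLOT_SIZE).

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_SIZE 8  /* assumed stand-in for MINT_STACK_SLOT_SIZE */

static int
align_to (int value, int alignment)
{
	return (value + alignment - 1) / alignment * alignment;
}

/*
 * call_args: -1 terminated list of var indexes, or NULL when the call has none.
 * sizes:     sizes [v] is the byte size of var v (simplified td->locals [v].size).
 * Returns the slot-aligned size of the param area needed by the call.
 */
static int
call_param_size (const int *call_args, const int *sizes)
{
	int total = 0;

	if (!call_args)
		return 0;

	for (const int *p = call_args; *p != -1; p++) {
		assert (sizes [*p] >= 0);
		/* Each argument ends on a slot boundary, so the next one starts aligned. */
		total = align_to (total + sizes [*p], SLOT_SIZE);
	}
	return total;
}
```

Using a sentinel instead of a separate count keeps the field a single pointer inside the `info` union, at the cost of a linear walk whenever the length is needed.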
index c982ed9..188de18 100644
         <ExcludeList Include = "$(XunitTestBinBase)/JIT/jit64/opt/cse/HugeArray1/**">
             <Issue>https://github.com/dotnet/runtime/issues/46622</Issue>
         </ExcludeList>
+        <ExcludeList Include = "$(XunitTestBinBase)/JIT/Regression/JitBlue/GitHub_21990/**">
+            <Issue>https://github.com/dotnet/runtime/issues/46622</Issue>
+        </ExcludeList>
         <ExcludeList Include = "$(XunitTestBinBase)/JIT/Directed/zeroinit/tail/**">
             <Issue>https://github.com/dotnet/runtime/issues/37955</Issue>
         </ExcludeList>