深入 OpenJDK 17 源码：Java Native 方法的注册、绑定与调用全流程

油盐米

223人浏览 · 2026-06-04 17:01:40

油盐米 · 2026-06-04 17:01:40 发布

前言：Java 如何“走出”虚拟机？

作为一名 Java 开发者，你或许很少直接编写 JNI（Java Native Interface）代码，但当你调用 Thread.start()、FileOutputStream.write() 或是 System.currentTimeMillis() 时，你已经在使用 native 方法了。这些方法的实现并不在 Java 代码中，而是隐藏在 JVM 的内部或底层操作系统里。那么，Java 方法调用是如何跨越虚拟机边界，准确找到 C/C++ 函数的？这个过程涉及哪些关键的数据结构和代码生成技术？

本文基于 OpenJDK 17 源码（以 x86-64 / Linux 平台为例），从 java.lang.Thread 类的 native 方法注册开始，逐步剖析：

JNI 动态注册的内部实现（jni_RegisterNatives）
Method 对象如何绑定 native 函数地址（Method::register_native）
SharedRuntime::generate_native_wrapper 如何为每个 native 方法生成专用的适配代码（native wrapper）
整个调用链路的完整串联

全文大约 8500 字，包含大量源码片段及中文注释，适合希望深入理解 JVM 底层机制的读者。

一、Native 方法概览与 JNI 注册

1.1 两种注册方式

JNI 规范定义了两种将 Java native 方法映射到 C 函数的途径：

静态注册：JVM 按照 Java_包名_类名_方法名 的命名规则在动态库中查找符号。这种方式简单但符号名称冗长，且加载慢。
动态注册：在 JNI_OnLoad 或类的静态初始化器中调用 RegisterNatives 主动注册。这种方式更灵活、高效，也是现代 JDK 广泛采用的方式。

以 java.lang.Thread 为例，其 native 方法均采用动态注册。

1.2 `Thread.registerNatives` 源码分析

在 Thread.java 中有一个静态 native 方法 registerNatives，它在类初始化时被调用：

java

// java.base/share/classes/java/lang/Thread.java
private static native void registerNatives();
static {
    registerNatives();
}

对应的 C 实现位于 Thread.c：

// jdk/src/java.base/share/native/libjava/Thread.c
static JNINativeMethod methods[] = {
    {"start0",           "()V",        (void *)&JVM_StartThread},
    {"stop0",            "(" OBJ ")V", (void *)&JVM_StopThread},
    {"isAlive",          "()Z",        (void *)&JVM_IsThreadAlive},
    // ... 其他方法
};

JNIEXPORT void JNICALL
Java_java_lang_Thread_registerNatives(JNIEnv *env, jclass cls)
{
    (*env)->RegisterNatives(env, cls, methods, ARRAY_LENGTH(methods));
}

注释：

JNINativeMethod 结构包含三个字段：Java 方法名、签名（使用 JNI 类型描述符）、C 函数指针。
JVM_StartThread 等是 JVM 内部实现的函数（位于 jvm.cpp），它们最终会调用操作系统 API 完成线程创建。
RegisterNatives 是一个 JNI 函数，由 JVM 实现（jni_RegisterNatives）。

当我们调用 Thread.start() 时，实际调用的是 start0()，这是一个 native 方法，它通过上述注册表被链接到 JVM_StartThread。下面我们深入 JVM 层，看看 RegisterNatives 到底做了什么。

二、JNI 函数 `RegisterNatives` 的 JVM 实现

JNI 函数 RegisterNatives 的入口位于 jni.cpp 中的 jni_RegisterNatives。我们逐段阅读并添加注释。

cpp

// src/hotspot/share/prims/jni.cpp

JNI_ENTRY(jint, jni_RegisterNatives(JNIEnv *env, jclass clazz,
                                    const JNINativeMethod *methods,
                                    jint nMethods))
  // 宏：记录 JNI 调用入口（用于性能跟踪）
  HOTSPOT_JNI_REGISTERNATIVES_ENTRY(env, clazz, (void *) methods, nMethods);
  jint ret = 0;
  DT_RETURN_MARK(RegisterNatives, jint, (const jint&)ret);

  // 1. 将 jclass 解析为 JVM 内部的 Klass*
  Klass* k = java_lang_Class::as_Klass(JNIHandles::resolve_non_null(clazz));

  // 2. 检查是否需要发出警告（重新注册平台类的方法）
  bool do_warning = false;
  if (k->is_instance_klass()) {
    oop cl = k->class_loader();
    InstanceKlass* ik = InstanceKlass::cast(k);
    // 如果是平台类加载器（boot/platform）加载的类，且来自非同名模块的调用者
    if ((cl == NULL || SystemDictionary::is_platform_class_loader(cl)) &&
        ik->module()->is_named()) {
      Klass* caller = thread->security_get_caller_class(1); // 调用者的类
      do_warning = (caller == NULL) || caller->class_loader() != cl;
    }
  }

  // 3. 遍历每个要注册的方法
  for (int index = 0; index < nMethods; index++) {
    const char* meth_name = methods[index].name;
    const char* meth_sig = methods[index].signature;
    int meth_name_len = (int)strlen(meth_name);

    // 从符号表（SymbolTable）中查找方法名和签名字符串
    // probe() 只查找不创建，若不存在则说明类加载时该方法未被解析，直接失败
    TempNewSymbol name = SymbolTable::probe(meth_name, meth_name_len);
    TempNewSymbol signature = SymbolTable::probe(meth_sig, (int)strlen(meth_sig));

    if (name == NULL || signature == NULL) {
      ResourceMark rm(THREAD);
      stringStream st;
      st.print("Method %s.%s%s not found", k->external_name(), meth_name, meth_sig);
      THROW_MSG_(vmSymbols::java_lang_NoSuchMethodError(), st.as_string(), -1);
    }

    if (do_warning) {
      ResourceMark rm(THREAD);
      log_warning(jni, resolve)("Re-registering of platform native method: %s.%s%s "
              "from code in a different classloader", k->external_name(), meth_name, meth_sig);
    }

    // 4. 核心：调用 Method::register_native 完成绑定
    bool res = Method::register_native(k, name, signature,
                                       (address) methods[index].fnPtr, THREAD);
    if (!res) {
      ret = -1;
      break;
    }
  }
  return ret;
JNI_END

关键点总结：

JNIHandles::resolve_non_null：将 JNI 引用（jclass）转换为 JVM 内部对象指针（oop），再通过 java_lang_Class::as_Klass 得到 Klass*。
SymbolTable::probe：快速从字符串常量池中查找符号，如果找不到说明 Java 类中没有声明该方法，抛出 NoSuchMethodError。
安全检查：若来自不同类加载器的代码试图重新注册平台类的 native 方法，会记录警告（但仍允许，用于调试代理）。
真正执行绑定的是 Method::register_native。

三、`Method::register_native`：绑定方法与函数指针

Method::register_native 定义在 method.cpp 中，它的职责是根据 Klass*、方法名和签名找到对应的 Method 对象，并设置其 _native_function 字段。

cpp

// src/hotspot/share/oops/method.cpp

bool Method::register_native(Klass* k, Symbol* name, Symbol* signature, address entry, TRAPS) {
  // 1. 在 Klass 中查找方法（遍历虚函数表、接口表等）
  Method* method = k->lookup_method(name, signature);
  if (method == NULL) {
    ResourceMark rm(THREAD);
    stringStream st;
    st.print("Method '");
    print_external_name(&st, k, name, signature);
    st.print("' name or signature does not match");
    THROW_MSG_(vmSymbols::java_lang_NoSuchMethodError(), st.as_string(), false);
  }

  // 2. 如果不是 native 方法，尝试查找加了前缀的版本（JVM TI 代理可能添加前缀）
  if (!method->is_native()) {
    method = find_prefixed_native(k, name, signature, THREAD);
    if (method == NULL) {
      ResourceMark rm(THREAD);
      stringStream st;
      st.print("Method '");
      print_external_name(&st, k, name, signature);
      st.print("' is not declared as native");
      THROW_MSG_(vmSymbols::java_lang_NoSuchMethodError(), st.as_string(), false);
    }
  }

  // 3. 设置或清除 native 函数指针
  if (entry != NULL) {
    // 第二个参数表示是否需要发送 JVMTI 绑定事件（通常为 true）
    method->set_native_function(entry, native_bind_event_is_interesting);
  } else {
    method->clear_native_function();
  }

  // 记录调试日志（如果启用）
  if (log_is_enabled(Debug, jni, resolve)) {
    ResourceMark rm(THREAD);
    log_debug(jni, resolve)("[Registering JNI native method %s.%s]",
                            method->method_holder()->external_name(),
                            method->name()->as_C_string());
  }
  return true;
}

补充说明：

lookup_method 会遍历当前类及其父类、接口的方法表，因此注册时不必要求该方法在 Java 代码中显式声明为 native（但必须存在）。
find_prefixed_native 用于处理 JVM TI 的 RetransformClasses 功能，代理可能给方法名加前缀（例如 jdk_internal_...），确保能找到正确的 native 实现。
native_bind_event_is_interesting 是一个全局标志，表示是否有 JVMTI 代理监听 NativeMethodBind 事件。如果有，set_native_function 会调用 JvmtiExport::post_native_method_bind，允许代理修改目标函数地址。

3.1 `set_native_function` 与 `Method` 的内存布局

Method::set_native_function 负责将 entry 写入 Method 对象后面的一个指针槽位。

cpp

// method.hpp 中定义
address* native_function_addr() const {
  assert(is_native(), "must be native");
  // Method 对象之后紧跟一个 address 大小的字段，用于存储 native 函数指针
  return (address*) (this + 1);
}

// method.cpp
void Method::set_native_function(address function, bool post_event_flag) {
  assert(function != NULL, "use clear_native_function to unregister");
  address* native_function = native_function_addr();

  address current = *native_function;
  if (current == function) return;   // 已指向相同地址，无需重复设置

  // JVMTI 事件处理：允许代理修改 function 指针
  if (post_event_flag && JvmtiExport::should_post_native_method_bind() &&
      function != NULL) {
    JvmtiExport::post_native_method_bind(this, &function);
  }

  // 写入新的函数地址
  *native_function = function;

  // 如果该方法已有编译好的 native wrapper（nmethod），需要使其失效
  // 因为老的 wrapper 可能内联了旧的函数地址
  CompiledMethod* nm = code();
  if (nm != NULL) {
    nm->make_not_entrant();
  }
}

布局示意（简化）：

text

+------------------+
| Method header    |
| ...              |
| _constMethod     |
| _from_compiled_entry |
| ...              |
+------------------+   <-- Method 对象起始 (this)
| _native_function |   <-- this+1 处（仅 native 方法存在）
+------------------+

因此，当我们在 Java 层调用 Thread.start0() 时，JVM 可以通过 Method::native_function() 直接获取到 JVM_StartThread 的地址。

四、Native Wrapper：从 Java 调用到 C 函数的桥梁

注册完成后，JVM 如何实际执行一个 native 方法？关键在于 native wrapper —— 一段动态生成的机器码，它充当了 Java 调用约定与 C 调用约定之间的适配层。每次第一次执行 native 方法时（或方法被重新注册后），JVM 会调用 SharedRuntime::generate_native_wrapper 为该 Method 创建专用的 nmethod。

下面我们深入 generate_native_wrapper 的实现（位于 sharedRuntime.cpp），理解它的工作流程。由于函数较长，我们将其划分为几个逻辑阶段。

4.1 函数签名与初始获取

cpp

nmethod* SharedRuntime::generate_native_wrapper(MacroAssembler* masm,
                                                const methodHandle& method,
                                                int compile_id,
                                                BasicType* in_sig_bt,
                                                VMRegPair* in_regs,
                                                BasicType ret_type,
                                                address critical_entry) {
  // 处理 MethodHandle intrinsic 的特殊情况（快速路径）
  if (method->is_method_handle_intrinsic()) {
    // ... 省略（通常不涉及普通 native）
  }

  bool is_critical_native = true;
  address native_func = critical_entry;
  if (native_func == NULL) {
    // 从 Method 对象中获取已注册的 native 函数地址
    native_func = method->native_function();
    is_critical_native = false;
  }
  assert(native_func != NULL, "must have function");

methodHandle 是一个智能指针，持有 Method*。
in_sig_bt 和 in_regs 描述了 Java 调用约定下参数的类型和位置（解释器或 JIT 传过来的）。
critical_entry 用于 Critical Native（一种优化，直接暴露数组原始指针，避免 JNI 函数调用开销），这里略过。
method->native_function() 返回之前在注册时设置的地址。

4.2 构造 C 调用约定的参数数组

JNI 函数在 C 层面总是有两个隐含参数：JNIEnv* 和（静态方法）jclass。我们需要将 Java 参数列表扩展为 C 参数列表。

cpp

  const int total_in_args = method->size_of_parameters();   // Java 可见参数个数
  int total_c_args = total_in_args;
  if (!is_critical_native) {
    total_c_args += 1;                    // +1 for JNIEnv*
    if (method->is_static()) {
      total_c_args++;                     // +1 for jclass
    }
  } else {
    // Critical native 处理数组参数时会拆分成 (length, pointer) 对
    for (int i = 0; i < total_in_args; i++) {
      if (in_sig_bt[i] == T_ARRAY) {
        total_c_args++;
      }
    }
  }

  BasicType* out_sig_bt = NEW_RESOURCE_ARRAY(BasicType, total_c_args);
  VMRegPair* out_regs   = NEW_RESOURCE_ARRAY(VMRegPair, total_c_args);

  int argc = 0;
  if (!is_critical_native) {
    out_sig_bt[argc++] = T_ADDRESS;       // JNIEnv* (opaque pointer)
    if (method->is_static()) {
      out_sig_bt[argc++] = T_OBJECT;      // jclass
    }
    for (int i = 0; i < total_in_args; i++) {
      out_sig_bt[argc++] = in_sig_bt[i];
    }
  } else { /* ... 省略 ... */ }

  // 根据平台 ABI 决定每个参数放在哪个寄存器或栈上
  int out_arg_slots = c_calling_convention(out_sig_bt, out_regs, NULL, total_c_args);

c_calling_convention 根据 System V AMD64 ABI（或 Windows x64）将参数分配到寄存器（RDI, RSI, RDX, RCX, R8, R9 及 XMM 寄存器）或栈上。
返回的 out_arg_slots 是栈上为参数分配的总槽位数（每个 slot 8 字节）。

4.3 计算栈帧布局

wrapper 需要一个栈帧来保存返回地址、调用者保存的寄存器、JNI 句柄表、同步锁的 lock record 等。

cpp

  // 基础栈大小 = ABI 保留区 + C 参数所需的栈空间
  int stack_slots = SharedRuntime::out_preserve_stack_slots() + out_arg_slots;

  // 空间用于保存输入寄存器中的 oop（对象引用）句柄
  int total_save_slots = 6 * VMRegImpl::slots_per_word;  // 最多6个通用寄存器参数
  if (is_critical_native) {
    // 实际只保存真正用到的寄存器，重新计算 total_save_slots
    // ...
  }
  int oop_handle_offset = stack_slots;
  stack_slots += total_save_slots;

  // 静态方法需要额外槽位存储 klass 句柄（jclass）
  int klass_slot_offset = 0, klass_offset = -1;
  if (method->is_static()) {
    klass_slot_offset = stack_slots;
    stack_slots += VMRegImpl::slots_per_word;
    klass_offset = klass_slot_offset * VMRegImpl::stack_slot_size;
  }

  // 同步方法需要 lock record（BasicLock）
  int lock_slot_offset = 0;
  if (method->is_synchronized()) {
    lock_slot_offset = stack_slots;
    stack_slots += VMRegImpl::slots_per_word;
  }

  // 额外预留 6 个槽位用于参数移动时的临时存储
  stack_slots += 6;

  // 16 字节对齐
  stack_slots = align_up(stack_slots, StackAlignmentInSlots);
  int stack_size = stack_slots * VMRegImpl::stack_slot_size;

栈帧图示（从高地址到低地址）：

text

        +-------------------------+
   rbp  |  old rbp                |
        +-------------------------+
        |  return address         |
        +-------------------------+
        |  6个临时槽位            |
        +-------------------------+
        |  lock record (同步)     | <- lock_slot_offset
        +-------------------------+
        |  klass handle (静态)    | <- klass_slot_offset
        +-------------------------+
        |  oop handle area        | <- oop_handle_offset
        +-------------------------+
        |  C 参数栈空间           |
        +-------------------------+
        |  ABI 保留区             |
   rsp  +-------------------------+

4.4 入口代码：内联缓存与栈溢出检查

cpp

  // 内联缓存（IC）检查：确保接收者的 klass 与预期一致
  const Register ic_reg = rax;
  const Register receiver = j_rarg0;          // 非静态方法第一个参数是 this
  __ load_klass(rscratch1, receiver, rscratch2);
  __ cmpq(ic_reg, rscratch1);
  __ jcc(Assembler::equal, hit);
  __ jump(RuntimeAddress(SharedRuntime::get_ic_miss_stub()));
  __ bind(hit);

  // 栈溢出检查（触碰保护页）
  __ bang_stack_with_offset((int)StackOverflow::stack_shadow_zone_size());

  // 建立新的栈帧
  __ enter();                                 // push rbp ; mov rbp, rsp
  __ subptr(rsp, stack_size - 2*wordSize);    // 预留栈空间

4.5 参数移动：从 Java 位置到 C 位置（“大洗牌”）

这是 wrapper 中最复杂的部分。参数移动时需要处理三种情况：

直接移动（同一个寄存器或栈位置）。
使用临时寄存器打破循环依赖。
对象参数需要创建 JNI 本地引用（handlize），并记录到 OopMap。

cpp

  OopMap* map = new OopMap(stack_slots * 2, 0);
  GrowableArray<int> arg_order(2 * total_in_args);
  VMRegPair tmp_vmreg;
  tmp_vmreg.set2(rbx->as_VMReg());           // RBX 作为临时寄存器

  if (!is_critical_native) {
    // 从后向前移动可以避免覆盖未移动的值
    for (int i = total_in_args - 1, c_arg = total_c_args - 1; i >= 0; i--, c_arg--) {
      arg_order.push(i);
      arg_order.push(c_arg);
    }
  } else {
    // 需要用算法计算无冲突的移动顺序（此处略）
  }

  int temploc = -1;
  for (int ai = 0; ai < arg_order.length(); ai += 2) {
    int i = arg_order.at(ai);          // Java 参数索引
    int c_arg = arg_order.at(ai + 1);  // C 参数索引
    if (c_arg == -1) {
      // 需要暂时移到临时寄存器
      __ mov(tmp_vmreg.first()->as_Register(), in_regs[i].first()->as_Register());
      in_regs[i] = tmp_vmreg;
      temploc = i;
      continue;
    } else if (i == -1) {
      i = temploc;
      temploc = -1;
    }

    switch (in_sig_bt[i]) {
      case T_OBJECT:
        // 对象参数需要转换为 JNI 本地引用，存储在 oop handle area
        __ object_move(map, oop_handle_offset, stack_slots, in_regs[i], out_regs[c_arg],
                       ((i == 0) && (!is_static)), &receiver_offset);
        break;
      case T_LONG:
        __ long_move(in_regs[i], out_regs[c_arg]);
        break;
      // ... 其他基本类型
    }
  }

object_move 会将原始 oop 写入到栈上的 handle 区域，并让 out_regs[c_arg] 指向该 handle 的地址（即 jobject）。
同时，它会更新 OopMap，标记这些 handle 位置为 oop，以便 GC 能够扫描。

4.6 同步方法的锁获取

cpp

  if (method->is_synchronized()) {
    // 获取对象锁（非静态：this；静态：klass mirror）
    __ movptr(obj_reg, Address(oop_handle_reg, 0));
    if (UseBiasedLocking) {
      __ biased_locking_enter(lock_reg, obj_reg, swap_reg, rscratch1, rscratch2, false, lock_done, &slow_path_lock);
    }
    // 尝试轻量级锁：CAS 替换对象头中的 mark word
    __ movptr(swap_reg, 1);
    __ orptr(swap_reg, Address(obj_reg, oopDesc::mark_offset_in_bytes()));
    __ movptr(Address(lock_reg, mark_word_offset), swap_reg);
    __ lock();
    __ cmpxchgptr(lock_reg, Address(obj_reg, oopDesc::mark_offset_in_bytes()));
    __ jcc(Assembler::equal, lock_done);
    // 慢路径：膨胀为重量级锁或等待
    // ...
  }

4.7 调用前的准备：设置线程状态和 last_Java_frame

cpp

  intptr_t the_pc = (intptr_t) __ pc();
  oop_maps->add_gc_map(the_pc - start, map);

  // 记录当前栈指针和 PC，供 GC 扫描
  __ set_last_Java_frame(rsp, noreg, (address)the_pc);

  if (!is_critical_native) {
    // 将 JNIEnv* 放入第一个参数寄存器 (RDI)
    __ lea(c_rarg0, Address(r15_thread, in_bytes(JavaThread::jni_environment_offset())));
    // 切换线程状态为 _thread_in_native（表示当前执行 native 代码）
    __ movl(Address(r15_thread, JavaThread::thread_state_offset()), _thread_in_native);
  }

  // 调用真正的 native 函数
  __ call(RuntimeAddress(native_func));

set_last_Java_frame 将 rsp 和 the_pc 保存到 JavaThread 对象中。当 GC 发生时，虚拟机能够从该位置开始扫描栈上的 oop。
线程状态变为 _thread_in_native 后，JVM 认为该线程不持有任何 Java 堆的引用（实际上可能通过 JNI 引用持有），因此 GC 不需要暂停它（会等待它返回）。

4.8 从 native 返回后的处理：安全点与挂起检查

cpp

  Label after_transition;
  if (is_critical_native) {
    // 快速检查是否需要在返回后进入安全点
    __ safepoint_poll(needs_safepoint, r15_thread, false, false);
    // ...
  }

  // 切换到过渡状态
  __ movl(Address(r15_thread, JavaThread::thread_state_offset()), _thread_in_native_trans);
  __ membar(Assembler::StoreLoad);   // 确保内存可见性

  // 检查安全点或线程挂起请求
  {
    Label slow_path;
    __ safepoint_poll(slow_path, r15_thread, true, false);
    __ cmpl(Address(r15_thread, JavaThread::suspend_flags_offset()), 0);
    __ jcc(Assembler::equal, Continue);
    __ bind(slow_path);
    // 保存返回值，调用 C 函数处理挂起/安全点
    save_native_result(masm, ret_type, stack_slots);
    __ mov(c_rarg0, r15_thread);
    __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, JavaThread::check_special_condition_for_native_trans)));
    restore_native_result(masm, ret_type, stack_slots);
    __ bind(Continue);
  }

  // 恢复为 _thread_in_Java 状态
  __ movl(Address(r15_thread, JavaThread::thread_state_offset()), _thread_in_Java);
  __ bind(after_transition);

_thread_in_native_trans 是一个短暂状态，表示刚从 native 返回但尚未完成安全点检查。在此状态下，JVM 可以安全地暂停线程。
check_special_condition_for_native_trans 会处理挂起请求和 GC 同步。

4.9 返回值解包、解锁、异常转发

cpp

  // 根据返回类型调整返回值（如将 jint 符号扩展为 Java int）
  switch (ret_type) {
    case T_BOOLEAN: __ c2bool(rax); break;
    case T_BYTE:    __ sign_extend_byte(rax); break;
    // ...
  }

  // 同步方法需要释放锁（类似 unlock）
  if (method->is_synchronized()) {
    // ... 释放锁的汇编代码
  }

  // 检查是否有待决异常
  __ cmpptr(Address(r15_thread, Thread::pending_exception_offset()), NULL_WORD);
  __ jcc(Assembler::notEqual, exception_pending);

  // 清除 JNI 本地引用表（重置 top）
  __ movptr(rcx, Address(r15_thread, JavaThread::active_handles_offset()));
  __ movl(Address(rcx, JNIHandleBlock::top_offset_in_bytes()), 0);

  __ leave();
  __ ret(0);

  // 异常处理代码
  __ bind(exception_pending);
  __ jump(RuntimeAddress(StubRoutines::forward_exception_entry()));

4.10 生成 nmethod 并返回

cpp

  nmethod *nm = nmethod::new_native_nmethod(method,
                                            compile_id,
                                            masm->code(),
                                            vep_offset,
                                            frame_complete,
                                            stack_slots / VMRegImpl::slots_per_word,
                                            (is_static ? in_ByteSize(klass_offset) : in_ByteSize(receiver_offset)),
                                            in_ByteSize(lock_slot_offset*VMRegImpl::stack_slot_size),
                                            oop_maps);
  return nm;
}

new_native_nmethod 分配内存，将生成的代码拷贝到 CodeCache 中，并返回 nmethod 指针。
此后，Method 对象的 _from_compiled_entry 将指向该 nmethod 的入口，后续调用直接执行 wrapper，无需重新生成。

五、完整的调用链路串联

现在我们可以将一个 native 方法从声明到执行的全过程串联起来：

Java 类加载：Thread 类被加载，执行 <clinit>，调用 registerNatives()。
静态注册入口：Java_java_lang_Thread_registerNatives 调用 (*env)->RegisterNatives。
JVM 处理注册：jni_RegisterNatives 对每个方法调用 Method::register_native，将 JVM_StartThread 等地址写入 Method 对象的 _native_function 槽位。
Java 调用 thread.start() → start0()。
解释器/JIT 发现 Method::_from_compiled_entry 为空（首次调用），调用 SharedRuntime::generate_native_wrapper 生成 nmethod。
Wrapper 执行：
- 保存上下文，移动参数，创建 JNI 句柄。
- 设置线程状态 _thread_in_native，记录 last_Java_frame。
- 调用 native_func（即 JVM_StartThread）。
- 返回后处理安全点/挂起，解锁，恢复状态。
JVM_StartThread 内部调用 pthread_create 或 CreateThread 创建操作系统线程。
返回 wrapper，清理 JNI 引用，返回 Java 层。

六、调试验证：如何观察 native wrapper 的生成？

如果你想亲眼看到 native wrapper 的生成过程，可以在 Linux 上构建一个 Debug 版本的 OpenJDK，并设置如下 JVM 参数：

text

-XX:+PrintNativeMethodStubs -Xlog:jni+resolve=debug

在控制台会输出类似：

text

[Registering JNI native method java.lang.Thread.start0()V]

并且 generate_native_wrapper 会打印参数移动的注释（若在源码中启用了 __ block_comment）。

此外，可以使用 hsdis 反汇编生成的 nmethod，例如：

text

java -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,*Thread.start0

七、总结

通过以上对 OpenJDK 17 源码的详细解读，我们揭示了 Java native 方法的底层运作机制：

注册阶段：通过 RegisterNatives 将方法名、签名映射到 C 函数地址，并存储在 Method 对象的内存扩展区。
Wrapper 生成阶段：SharedRuntime::generate_native_wrapper 为每个 native 方法动态生成一段汇编代码，负责处理参数转换、线程状态切换、锁同步、安全点协调等复杂任务。
调用阶段：生成的 nmethod 被安装到 Method 对象中，后续调用直接执行，实现高效的跨语言调用。

理解这些底层细节，不仅有助于解决 JNI 相关的疑难问题（如 UnsatisfiedLinkError、JNI 临界区死锁），也为深入 JVM 性能调优和编译器开发打下坚实基础。希望本文能成为你探索 OpenJDK 源码旅途中的一份实用地图。

#源码

static JNINativeMethod methods[] = {
    {"start0",           "()V",        (void *)&JVM_StartThread},
    {"stop0",            "(" OBJ ")V", (void *)&JVM_StopThread},
    {"isAlive",          "()Z",        (void *)&JVM_IsThreadAlive},
    {"suspend0",         "()V",        (void *)&JVM_SuspendThread},
    {"resume0",          "()V",        (void *)&JVM_ResumeThread},
    {"setPriority0",     "(I)V",       (void *)&JVM_SetThreadPriority},
    {"yield",            "()V",        (void *)&JVM_Yield},
    {"sleep",            "(J)V",       (void *)&JVM_Sleep},
    {"currentThread",    "()" THD,     (void *)&JVM_CurrentThread},
    {"interrupt0",       "()V",        (void *)&JVM_Interrupt},
    {"holdsLock",        "(" OBJ ")Z", (void *)&JVM_HoldsLock},
    {"getThreads",        "()[" THD,   (void *)&JVM_GetAllThreads},
    {"dumpThreads",      "([" THD ")[[" STE, (void *)&JVM_DumpThreads},
    {"setNativeName",    "(" STR ")V", (void *)&JVM_SetNativeThreadName},
};

JNIEXPORT void JNICALL
Java_java_lang_Thread_registerNatives(JNIEnv *env, jclass cls)
{
    (*env)->RegisterNatives(env, cls, methods, ARRAY_LENGTH(methods));
}

#注册JNI
JNI_ENTRY(jint, jni_RegisterNatives(JNIEnv *env, jclass clazz,
                                    const JNINativeMethod *methods,
                                    jint nMethods))
  HOTSPOT_JNI_REGISTERNATIVES_ENTRY(env, clazz, (void *) methods, nMethods);
  jint ret = 0;
  DT_RETURN_MARK(RegisterNatives, jint, (const jint&)ret);

  Klass* k = java_lang_Class::as_Klass(JNIHandles::resolve_non_null(clazz));

  // There are no restrictions on native code registering native methods,
  // which allows agents to redefine the bindings to native methods, however
  // we issue a warning if any code running outside of the boot/platform
  // loader is rebinding any native methods in classes loaded by the
  // boot/platform loader that are in named modules. That will catch changes
  // to platform classes while excluding classes added to the bootclasspath.
  bool do_warning = false;

  // Only instanceKlasses can have native methods
  if (k->is_instance_klass()) {
    oop cl = k->class_loader();
    InstanceKlass* ik = InstanceKlass::cast(k);
    // Check for a platform class
    if ((cl ==  NULL || SystemDictionary::is_platform_class_loader(cl)) &&
        ik->module()->is_named()) {
      Klass* caller = thread->security_get_caller_class(1);
      // If no caller class, or caller class has a different loader, then
      // issue a warning below.
      do_warning = (caller == NULL) || caller->class_loader() != cl;
    }
  }


  for (int index = 0; index < nMethods; index++) {
    const char* meth_name = methods[index].name;
    const char* meth_sig = methods[index].signature;
    int meth_name_len = (int)strlen(meth_name);

    // The class should have been loaded (we have an instance of the class
    // passed in) so the method and signature should already be in the symbol
    // table.  If they're not there, the method doesn't exist.
    TempNewSymbol  name = SymbolTable::probe(meth_name, meth_name_len);
    TempNewSymbol  signature = SymbolTable::probe(meth_sig, (int)strlen(meth_sig));

    if (name == NULL || signature == NULL) {
      ResourceMark rm(THREAD);
      stringStream st;
      st.print("Method %s.%s%s not found", k->external_name(), meth_name, meth_sig);
      // Must return negative value on failure
      THROW_MSG_(vmSymbols::java_lang_NoSuchMethodError(), st.as_string(), -1);
    }

    if (do_warning) {
      ResourceMark rm(THREAD);
      log_warning(jni, resolve)("Re-registering of platform native method: %s.%s%s "
              "from code in a different classloader", k->external_name(), meth_name, meth_sig);
    }

    bool res = Method::register_native(k, name, signature,
                                       (address) methods[index].fnPtr, THREAD);
    if (!res) {
      ret = -1;
      break;
    }
  }
  return ret;
JNI_END


bool Method::register_native(Klass* k, Symbol* name, Symbol* signature, address entry, TRAPS) {
  Method* method = k->lookup_method(name, signature);
  if (method == NULL) {
    ResourceMark rm(THREAD);
    stringStream st;
    st.print("Method '");
    print_external_name(&st, k, name, signature);
    st.print("' name or signature does not match");
    THROW_MSG_(vmSymbols::java_lang_NoSuchMethodError(), st.as_string(), false);
  }
  if (!method->is_native()) {
    // trying to register to a non-native method, see if a JVM TI agent has added prefix(es)
    method = find_prefixed_native(k, name, signature, THREAD);
    if (method == NULL) {
      ResourceMark rm(THREAD);
      stringStream st;
      st.print("Method '");
      print_external_name(&st, k, name, signature);
      st.print("' is not declared as native");
      THROW_MSG_(vmSymbols::java_lang_NoSuchMethodError(), st.as_string(), false);
    }
  }

  if (entry != NULL) {
    method->set_native_function(entry, native_bind_event_is_interesting);
  } else {
    method->clear_native_function();
  }
  if (log_is_enabled(Debug, jni, resolve)) {
    ResourceMark rm(THREAD);
    log_debug(jni, resolve)("[Registering JNI native method %s.%s]",
                            method->method_holder()->external_name(),
                            method->name()->as_C_string());
  }
  return true;
}


void Method::set_native_function(address function, bool post_event_flag) {
  assert(function != NULL, "use clear_native_function to unregister natives");
  assert(!is_method_handle_intrinsic() || function == SharedRuntime::native_method_throw_unsatisfied_link_error_entry(), "");
  address* native_function = native_function_addr();

  // We can see racers trying to place the same native function into place. Once
  // is plenty.
  address current = *native_function;
  if (current == function) return;
  if (post_event_flag && JvmtiExport::should_post_native_method_bind() &&
      function != NULL) {
    // native_method_throw_unsatisfied_link_error_entry() should only
    // be passed when post_event_flag is false.
    assert(function !=
      SharedRuntime::native_method_throw_unsatisfied_link_error_entry(),
      "post_event_flag mis-match");

    // post the bind event, and possible change the bind function
    JvmtiExport::post_native_method_bind(this, &function);
  }
  *native_function = function;
  // This function can be called more than once. We must make sure that we always
  // use the latest registered method -> check if a stub already has been generated.
  // If so, we have to make it not_entrant.
  CompiledMethod* nm = code(); // Put it into local variable to guard against concurrent updates
  if (nm != NULL) {
    nm->make_not_entrant();
  }
}

address* native_function_addr() const          { assert(is_native(), "must be native"); return (address*) (this+1); }

// ---------------------------------------------------------------------------
// Generate a native wrapper for a given method.  The method takes arguments
// in the Java compiled code convention, marshals them to the native
// convention (handlizes oops, etc), transitions to native, makes the call,
// returns to java state (possibly blocking), unhandlizes any result and
// returns.
//
// Critical native functions are a shorthand for the use of
// GetPrimtiveArrayCritical and disallow the use of any other JNI
// functions.  The wrapper is expected to unpack the arguments before
// passing them to the callee. Critical native functions leave the state _in_Java,
// since they cannot stop for GC.
// Some other parts of JNI setup are skipped like the tear down of the JNI handle
// block and the check for pending exceptions it's impossible for them
// to be thrown.
//
nmethod* SharedRuntime::generate_native_wrapper(MacroAssembler* masm,
                                                const methodHandle& method,
                                                int compile_id,
                                                BasicType* in_sig_bt,
                                                VMRegPair* in_regs,
                                                BasicType ret_type,
                                                address critical_entry) {
  if (method->is_method_handle_intrinsic()) {
    vmIntrinsics::ID iid = method->intrinsic_id();
    intptr_t start = (intptr_t)__ pc();
    int vep_offset = ((intptr_t)__ pc()) - start;
    gen_special_dispatch(masm,
                         method,
                         in_sig_bt,
                         in_regs);
    int frame_complete = ((intptr_t)__ pc()) - start;  // not complete, period
    __ flush();
    int stack_slots = SharedRuntime::out_preserve_stack_slots();  // no out slots at all, actually
    return nmethod::new_native_nmethod(method,
                                       compile_id,
                                       masm->code(),
                                       vep_offset,
                                       frame_complete,
                                       stack_slots / VMRegImpl::slots_per_word,
                                       in_ByteSize(-1),
                                       in_ByteSize(-1),
                                       (OopMapSet*)NULL);
  }
  bool is_critical_native = true;
  address native_func = critical_entry;
  if (native_func == NULL) {
    native_func = method->native_function();
    is_critical_native = false;
  }
  assert(native_func != NULL, "must have function");

  // An OopMap for lock (and class if static)
  OopMapSet *oop_maps = new OopMapSet();
  intptr_t start = (intptr_t)__ pc();

  // We have received a description of where all the java arg are located
  // on entry to the wrapper. We need to convert these args to where
  // the jni function will expect them. To figure out where they go
  // we convert the java signature to a C signature by inserting
  // the hidden arguments as arg[0] and possibly arg[1] (static method)

  const int total_in_args = method->size_of_parameters();
  int total_c_args = total_in_args;
  if (!is_critical_native) {
    total_c_args += 1;
    if (method->is_static()) {
      total_c_args++;
    }
  } else {
    for (int i = 0; i < total_in_args; i++) {
      if (in_sig_bt[i] == T_ARRAY) {
        total_c_args++;
      }
    }
  }

  BasicType* out_sig_bt = NEW_RESOURCE_ARRAY(BasicType, total_c_args);
  VMRegPair* out_regs   = NEW_RESOURCE_ARRAY(VMRegPair, total_c_args);
  BasicType* in_elem_bt = NULL;

  int argc = 0;
  if (!is_critical_native) {
    out_sig_bt[argc++] = T_ADDRESS;
    if (method->is_static()) {
      out_sig_bt[argc++] = T_OBJECT;
    }

    for (int i = 0; i < total_in_args ; i++ ) {
      out_sig_bt[argc++] = in_sig_bt[i];
    }
  } else {
    in_elem_bt = NEW_RESOURCE_ARRAY(BasicType, total_in_args);
    SignatureStream ss(method->signature());
    for (int i = 0; i < total_in_args ; i++ ) {
      if (in_sig_bt[i] == T_ARRAY) {
        // Arrays are passed as int, elem* pair
        out_sig_bt[argc++] = T_INT;
        out_sig_bt[argc++] = T_ADDRESS;
        ss.skip_array_prefix(1);  // skip one '['
        assert(ss.is_primitive(), "primitive type expected");
        in_elem_bt[i] = ss.type();
      } else {
        out_sig_bt[argc++] = in_sig_bt[i];
        in_elem_bt[i] = T_VOID;
      }
      if (in_sig_bt[i] != T_VOID) {
        assert(in_sig_bt[i] == ss.type() ||
               in_sig_bt[i] == T_ARRAY, "must match");
        ss.next();
      }
    }
  }

  // Now figure out where the args must be stored and how much stack space
  // they require.
  int out_arg_slots;
  out_arg_slots = c_calling_convention(out_sig_bt, out_regs, NULL, total_c_args);

  // Compute framesize for the wrapper.  We need to handlize all oops in
  // incoming registers

  // Calculate the total number of stack slots we will need.

  // First count the abi requirement plus all of the outgoing args
  int stack_slots = SharedRuntime::out_preserve_stack_slots() + out_arg_slots;

  // Now the space for the inbound oop handle area
  int total_save_slots = 6 * VMRegImpl::slots_per_word;  // 6 arguments passed in registers
  if (is_critical_native) {
    // Critical natives may have to call out so they need a save area
    // for register arguments.
    int double_slots = 0;
    int single_slots = 0;
    for ( int i = 0; i < total_in_args; i++) {
      if (in_regs[i].first()->is_Register()) {
        const Register reg = in_regs[i].first()->as_Register();
        switch (in_sig_bt[i]) {
          case T_BOOLEAN:
          case T_BYTE:
          case T_SHORT:
          case T_CHAR:
          case T_INT:  single_slots++; break;
          case T_ARRAY:  // specific to LP64 (7145024)
          case T_LONG: double_slots++; break;
          default:  ShouldNotReachHere();
        }
      } else if (in_regs[i].first()->is_XMMRegister()) {
        switch (in_sig_bt[i]) {
          case T_FLOAT:  single_slots++; break;
          case T_DOUBLE: double_slots++; break;
          default:  ShouldNotReachHere();
        }
      } else if (in_regs[i].first()->is_FloatRegister()) {
        ShouldNotReachHere();
      }
    }
    total_save_slots = double_slots * 2 + single_slots;
    // align the save area
    if (double_slots != 0) {
      stack_slots = align_up(stack_slots, 2);
    }
  }

  int oop_handle_offset = stack_slots;
  stack_slots += total_save_slots;

  // Now any space we need for handlizing a klass if static method

  int klass_slot_offset = 0;
  int klass_offset = -1;
  int lock_slot_offset = 0;
  bool is_static = false;

  if (method->is_static()) {
    klass_slot_offset = stack_slots;
    stack_slots += VMRegImpl::slots_per_word;
    klass_offset = klass_slot_offset * VMRegImpl::stack_slot_size;
    is_static = true;
  }

  // Plus a lock if needed

  if (method->is_synchronized()) {
    lock_slot_offset = stack_slots;
    stack_slots += VMRegImpl::slots_per_word;
  }

  // Now a place (+2) to save return values or temp during shuffling
  // + 4 for return address (which we own) and saved rbp
  stack_slots += 6;

  // Ok The space we have allocated will look like:
  //
  //
  // FP-> |                     |
  //      |---------------------|
  //      | 2 slots for moves   |
  //      |---------------------|
  //      | lock box (if sync)  |
  //      |---------------------| <- lock_slot_offset
  //      | klass (if static)   |
  //      |---------------------| <- klass_slot_offset
  //      | oopHandle area      |
  //      |---------------------| <- oop_handle_offset (6 java arg registers)
  //      | outbound memory     |
  //      | based arguments     |
  //      |                     |
  //      |---------------------|
  //      |                     |
  // SP-> | out_preserved_slots |
  //
  //


  // Now compute actual number of stack words we need rounding to make
  // stack properly aligned.
  stack_slots = align_up(stack_slots, StackAlignmentInSlots);

  int stack_size = stack_slots * VMRegImpl::stack_slot_size;

  // First thing make an ic check to see if we should even be here

  // We are free to use all registers as temps without saving them and
  // restoring them except rbp. rbp is the only callee save register
  // as far as the interpreter and the compiler(s) are concerned.


  const Register ic_reg = rax;
  const Register receiver = j_rarg0;

  Label hit;
  Label exception_pending;

  assert_different_registers(ic_reg, receiver, rscratch1);
  __ verify_oop(receiver);
  __ load_klass(rscratch1, receiver, rscratch2);
  __ cmpq(ic_reg, rscratch1);
  __ jcc(Assembler::equal, hit);

  __ jump(RuntimeAddress(SharedRuntime::get_ic_miss_stub()));

  // Verified entry point must be aligned
  __ align(8);

  __ bind(hit);

  int vep_offset = ((intptr_t)__ pc()) - start;

  if (VM_Version::supports_fast_class_init_checks() && method->needs_clinit_barrier()) {
    Label L_skip_barrier;
    Register klass = r10;
    __ mov_metadata(klass, method->method_holder()); // InstanceKlass*
    __ clinit_barrier(klass, r15_thread, &L_skip_barrier /*L_fast_path*/);

    __ jump(RuntimeAddress(SharedRuntime::get_handle_wrong_method_stub())); // slow path

    __ bind(L_skip_barrier);
  }

#ifdef COMPILER1
  // For Object.hashCode, System.identityHashCode try to pull hashCode from object header if available.
  if ((InlineObjectHash && method->intrinsic_id() == vmIntrinsics::_hashCode) || (method->intrinsic_id() == vmIntrinsics::_identityHashCode)) {
    inline_check_hashcode_from_object_header(masm, method, j_rarg0 /*obj_reg*/, rax /*result*/);
  }
#endif // COMPILER1

  // The instruction at the verified entry point must be 5 bytes or longer
  // because it can be patched on the fly by make_non_entrant. The stack bang
  // instruction fits that requirement.

  // Generate stack overflow check
  __ bang_stack_with_offset((int)StackOverflow::stack_shadow_zone_size());

  // Generate a new frame for the wrapper.
  __ enter();
  // -2 because return address is already present and so is saved rbp
  __ subptr(rsp, stack_size - 2*wordSize);

  BarrierSetAssembler* bs = BarrierSet::barrier_set()->barrier_set_assembler();
  bs->nmethod_entry_barrier(masm);

  // Frame is now completed as far as size and linkage.
  int frame_complete = ((intptr_t)__ pc()) - start;

    if (UseRTMLocking) {
      // Abort RTM transaction before calling JNI
      // because critical section will be large and will be
      // aborted anyway. Also nmethod could be deoptimized.
      __ xabort(0);
    }

#ifdef ASSERT
    {
      Label L;
      __ mov(rax, rsp);
      __ andptr(rax, -16); // must be 16 byte boundary (see amd64 ABI)
      __ cmpptr(rax, rsp);
      __ jcc(Assembler::equal, L);
      __ stop("improperly aligned stack");
      __ bind(L);
    }
#endif /* ASSERT */


  // We use r14 as the oop handle for the receiver/klass
  // It is callee save so it survives the call to native

  const Register oop_handle_reg = r14;

  //
  // We immediately shuffle the arguments so that any vm call we have to
  // make from here on out (sync slow path, jvmti, etc.) we will have
  // captured the oops from our caller and have a valid oopMap for
  // them.

  // -----------------
  // The Grand Shuffle

  // The Java calling convention is either equal (linux) or denser (win64) than the
  // c calling convention. However the because of the jni_env argument the c calling
  // convention always has at least one more (and two for static) arguments than Java.
  // Therefore if we move the args from java -> c backwards then we will never have
  // a register->register conflict and we don't have to build a dependency graph
  // and figure out how to break any cycles.
  //

  // Record esp-based slot for receiver on stack for non-static methods
  int receiver_offset = -1;

  // This is a trick. We double the stack slots so we can claim
  // the oops in the caller's frame. Since we are sure to have
  // more args than the caller doubling is enough to make
  // sure we can capture all the incoming oop args from the
  // caller.
  //
  OopMap* map = new OopMap(stack_slots * 2, 0 /* arg_slots*/);

  // Mark location of rbp (someday)
  // map->set_callee_saved(VMRegImpl::stack2reg( stack_slots - 2), stack_slots * 2, 0, vmreg(rbp));

  // Use eax, ebx as temporaries during any memory-memory moves we have to do
  // All inbound args are referenced based on rbp and all outbound args via rsp.


#ifdef ASSERT
  bool reg_destroyed[RegisterImpl::number_of_registers];
  bool freg_destroyed[XMMRegisterImpl::number_of_registers];
  for ( int r = 0 ; r < RegisterImpl::number_of_registers ; r++ ) {
    reg_destroyed[r] = false;
  }
  for ( int f = 0 ; f < XMMRegisterImpl::number_of_registers ; f++ ) {
    freg_destroyed[f] = false;
  }

#endif /* ASSERT */

  // This may iterate in two different directions depending on the
  // kind of native it is.  The reason is that for regular JNI natives
  // the incoming and outgoing registers are offset upwards and for
  // critical natives they are offset down.
  GrowableArray<int> arg_order(2 * total_in_args);

  VMRegPair tmp_vmreg;
  tmp_vmreg.set2(rbx->as_VMReg());

  if (!is_critical_native) {
    for (int i = total_in_args - 1, c_arg = total_c_args - 1; i >= 0; i--, c_arg--) {
      arg_order.push(i);
      arg_order.push(c_arg);
    }
  } else {
    // Compute a valid move order, using tmp_vmreg to break any cycles
    ComputeMoveOrder cmo(total_in_args, in_regs, total_c_args, out_regs, in_sig_bt, arg_order, tmp_vmreg);
  }

  int temploc = -1;
  for (int ai = 0; ai < arg_order.length(); ai += 2) {
    int i = arg_order.at(ai);
    int c_arg = arg_order.at(ai + 1);
    __ block_comment(err_msg("move %d -> %d", i, c_arg));
    if (c_arg == -1) {
      assert(is_critical_native, "should only be required for critical natives");
      // This arg needs to be moved to a temporary
      __ mov(tmp_vmreg.first()->as_Register(), in_regs[i].first()->as_Register());
      in_regs[i] = tmp_vmreg;
      temploc = i;
      continue;
    } else if (i == -1) {
      assert(is_critical_native, "should only be required for critical natives");
      // Read from the temporary location
      assert(temploc != -1, "must be valid");
      i = temploc;
      temploc = -1;
    }
#ifdef ASSERT
    if (in_regs[i].first()->is_Register()) {
      assert(!reg_destroyed[in_regs[i].first()->as_Register()->encoding()], "destroyed reg!");
    } else if (in_regs[i].first()->is_XMMRegister()) {
      assert(!freg_destroyed[in_regs[i].first()->as_XMMRegister()->encoding()], "destroyed reg!");
    }
    if (out_regs[c_arg].first()->is_Register()) {
      reg_destroyed[out_regs[c_arg].first()->as_Register()->encoding()] = true;
    } else if (out_regs[c_arg].first()->is_XMMRegister()) {
      freg_destroyed[out_regs[c_arg].first()->as_XMMRegister()->encoding()] = true;
    }
#endif /* ASSERT */
    switch (in_sig_bt[i]) {
      case T_ARRAY:
        if (is_critical_native) {
          unpack_array_argument(masm, in_regs[i], in_elem_bt[i], out_regs[c_arg + 1], out_regs[c_arg]);
          c_arg++;
#ifdef ASSERT
          if (out_regs[c_arg].first()->is_Register()) {
            reg_destroyed[out_regs[c_arg].first()->as_Register()->encoding()] = true;
          } else if (out_regs[c_arg].first()->is_XMMRegister()) {
            freg_destroyed[out_regs[c_arg].first()->as_XMMRegister()->encoding()] = true;
          }
#endif
          break;
        }
      case T_OBJECT:
        assert(!is_critical_native, "no oop arguments");
        __ object_move(map, oop_handle_offset, stack_slots, in_regs[i], out_regs[c_arg],
                    ((i == 0) && (!is_static)),
                    &receiver_offset);
        break;
      case T_VOID:
        break;

      case T_FLOAT:
        __ float_move(in_regs[i], out_regs[c_arg]);
          break;

      case T_DOUBLE:
        assert( i + 1 < total_in_args &&
                in_sig_bt[i + 1] == T_VOID &&
                out_sig_bt[c_arg+1] == T_VOID, "bad arg list");
        __ double_move(in_regs[i], out_regs[c_arg]);
        break;

      case T_LONG :
        __ long_move(in_regs[i], out_regs[c_arg]);
        break;

      case T_ADDRESS: assert(false, "found T_ADDRESS in java args");

      default:
        __ move32_64(in_regs[i], out_regs[c_arg]);
    }
  }

  int c_arg;

  // Pre-load a static method's oop into r14.  Used both by locking code and
  // the normal JNI call code.
  if (!is_critical_native) {
    // point c_arg at the first arg that is already loaded in case we
    // need to spill before we call out
    c_arg = total_c_args - total_in_args;

    if (method->is_static()) {

      //  load oop into a register
      __ movoop(oop_handle_reg, JNIHandles::make_local(method->method_holder()->java_mirror()));

      // Now handlize the static class mirror it's known not-null.
      __ movptr(Address(rsp, klass_offset), oop_handle_reg);
      map->set_oop(VMRegImpl::stack2reg(klass_slot_offset));

      // Now get the handle
      __ lea(oop_handle_reg, Address(rsp, klass_offset));
      // store the klass handle as second argument
      __ movptr(c_rarg1, oop_handle_reg);
      // and protect the arg if we must spill
      c_arg--;
    }
  } else {
    // For JNI critical methods we need to save all registers in save_args.
    c_arg = 0;
  }

  // Change state to native (we save the return address in the thread, since it might not
  // be pushed on the stack when we do a a stack traversal). It is enough that the pc()
  // points into the right code segment. It does not have to be the correct return pc.
  // We use the same pc/oopMap repeatedly when we call out

  intptr_t the_pc = (intptr_t) __ pc();
  oop_maps->add_gc_map(the_pc - start, map);

  __ set_last_Java_frame(rsp, noreg, (address)the_pc);


  // We have all of the arguments setup at this point. We must not touch any register
  // argument registers at this point (what if we save/restore them there are no oop?

  {
    SkipIfEqual skip(masm, &DTraceMethodProbes, false);
    // protect the args we've loaded
    save_args(masm, total_c_args, c_arg, out_regs);
    __ mov_metadata(c_rarg1, method());
    __ call_VM_leaf(
      CAST_FROM_FN_PTR(address, SharedRuntime::dtrace_method_entry),
      r15_thread, c_rarg1);
    restore_args(masm, total_c_args, c_arg, out_regs);
  }

  // RedefineClasses() tracing support for obsolete method entry
  if (log_is_enabled(Trace, redefine, class, obsolete)) {
    // protect the args we've loaded
    save_args(masm, total_c_args, c_arg, out_regs);
    __ mov_metadata(c_rarg1, method());
    __ call_VM_leaf(
      CAST_FROM_FN_PTR(address, SharedRuntime::rc_trace_method_entry),
      r15_thread, c_rarg1);
    restore_args(masm, total_c_args, c_arg, out_regs);
  }

  // Lock a synchronized method

  // Register definitions used by locking and unlocking

  const Register swap_reg = rax;  // Must use rax for cmpxchg instruction
  const Register obj_reg  = rbx;  // Will contain the oop
  const Register lock_reg = r13;  // Address of compiler lock object (BasicLock)
  const Register old_hdr  = r13;  // value of old header at unlock time

  Label slow_path_lock;
  Label lock_done;

  if (method->is_synchronized()) {
    assert(!is_critical_native, "unhandled");


    const int mark_word_offset = BasicLock::displaced_header_offset_in_bytes();

    // Get the handle (the 2nd argument)
    __ mov(oop_handle_reg, c_rarg1);

    // Get address of the box

    __ lea(lock_reg, Address(rsp, lock_slot_offset * VMRegImpl::stack_slot_size));

    // Load the oop from the handle
    __ movptr(obj_reg, Address(oop_handle_reg, 0));

    if (UseBiasedLocking) {
      __ biased_locking_enter(lock_reg, obj_reg, swap_reg, rscratch1, rscratch2, false, lock_done, &slow_path_lock);
    }

    // Load immediate 1 into swap_reg %rax
    __ movl(swap_reg, 1);

    // Load (object->mark() | 1) into swap_reg %rax
    __ orptr(swap_reg, Address(obj_reg, oopDesc::mark_offset_in_bytes()));

    // Save (object->mark() | 1) into BasicLock's displaced header
    __ movptr(Address(lock_reg, mark_word_offset), swap_reg);

    // src -> dest iff dest == rax else rax <- dest
    __ lock();
    __ cmpxchgptr(lock_reg, Address(obj_reg, oopDesc::mark_offset_in_bytes()));
    __ jcc(Assembler::equal, lock_done);

    // Hmm should this move to the slow path code area???

    // Test if the oopMark is an obvious stack pointer, i.e.,
    //  1) (mark & 3) == 0, and
    //  2) rsp <= mark < mark + os::pagesize()
    // These 3 tests can be done by evaluating the following
    // expression: ((mark - rsp) & (3 - os::vm_page_size())),
    // assuming both stack pointer and pagesize have their
    // least significant 2 bits clear.
    // NOTE: the oopMark is in swap_reg %rax as the result of cmpxchg

    __ subptr(swap_reg, rsp);
    __ andptr(swap_reg, 3 - os::vm_page_size());

    // Save the test result, for recursive case, the result is zero
    __ movptr(Address(lock_reg, mark_word_offset), swap_reg);
    __ jcc(Assembler::notEqual, slow_path_lock);

    // Slow path will re-enter here

    __ bind(lock_done);
  }

  // Finally just about ready to make the JNI call

  // get JNIEnv* which is first argument to native
  if (!is_critical_native) {
    __ lea(c_rarg0, Address(r15_thread, in_bytes(JavaThread::jni_environment_offset())));

    // Now set thread in native
    __ movl(Address(r15_thread, JavaThread::thread_state_offset()), _thread_in_native);
  }

  __ call(RuntimeAddress(native_func));

  // Verify or restore cpu control state after JNI call
  __ restore_cpu_control_state_after_jni();

  // Unpack native results.
  switch (ret_type) {
  case T_BOOLEAN: __ c2bool(rax);            break;
  case T_CHAR   : __ movzwl(rax, rax);      break;
  case T_BYTE   : __ sign_extend_byte (rax); break;
  case T_SHORT  : __ sign_extend_short(rax); break;
  case T_INT    : /* nothing to do */        break;
  case T_DOUBLE :
  case T_FLOAT  :
    // Result is in xmm0 we'll save as needed
    break;
  case T_ARRAY:                 // Really a handle
  case T_OBJECT:                // Really a handle
      break; // can't de-handlize until after safepoint check
  case T_VOID: break;
  case T_LONG: break;
  default       : ShouldNotReachHere();
  }

  Label after_transition;

  // If this is a critical native, check for a safepoint or suspend request after the call.
  // If a safepoint is needed, transition to native, then to native_trans to handle
  // safepoints like the native methods that are not critical natives.
  if (is_critical_native) {
    Label needs_safepoint;
    __ safepoint_poll(needs_safepoint, r15_thread, false /* at_return */, false /* in_nmethod */);
    __ cmpl(Address(r15_thread, JavaThread::suspend_flags_offset()), 0);
    __ jcc(Assembler::equal, after_transition);
    __ bind(needs_safepoint);
  }

  // Switch thread to "native transition" state before reading the synchronization state.
  // This additional state is necessary because reading and testing the synchronization
  // state is not atomic w.r.t. GC, as this scenario demonstrates:
  //     Java thread A, in _thread_in_native state, loads _not_synchronized and is preempted.
  //     VM thread changes sync state to synchronizing and suspends threads for GC.
  //     Thread A is resumed to finish this native method, but doesn't block here since it
  //     didn't see any synchronization is progress, and escapes.
  __ movl(Address(r15_thread, JavaThread::thread_state_offset()), _thread_in_native_trans);

  // Force this write out before the read below
  __ membar(Assembler::Membar_mask_bits(
              Assembler::LoadLoad | Assembler::LoadStore |
              Assembler::StoreLoad | Assembler::StoreStore));

  // check for safepoint operation in progress and/or pending suspend requests
  {
    Label Continue;
    Label slow_path;

    __ safepoint_poll(slow_path, r15_thread, true /* at_return */, false /* in_nmethod */);

    __ cmpl(Address(r15_thread, JavaThread::suspend_flags_offset()), 0);
    __ jcc(Assembler::equal, Continue);
    __ bind(slow_path);

    // Don't use call_VM as it will see a possible pending exception and forward it
    // and never return here preventing us from clearing _last_native_pc down below.
    // Also can't use call_VM_leaf either as it will check to see if rsi & rdi are
    // preserved and correspond to the bcp/locals pointers. So we do a runtime call
    // by hand.
    //
    __ vzeroupper();
    save_native_result(masm, ret_type, stack_slots);
    __ mov(c_rarg0, r15_thread);
    __ mov(r12, rsp); // remember sp
    __ subptr(rsp, frame::arg_reg_save_area_bytes); // windows
    __ andptr(rsp, -16); // align stack as required by ABI
    __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, JavaThread::check_special_condition_for_native_trans)));
    __ mov(rsp, r12); // restore sp
    __ reinit_heapbase();
    // Restore any method result value
    restore_native_result(masm, ret_type, stack_slots);
    __ bind(Continue);
  }

  // change thread state
  __ movl(Address(r15_thread, JavaThread::thread_state_offset()), _thread_in_Java);
  __ bind(after_transition);

  Label reguard;
  Label reguard_done;
  __ cmpl(Address(r15_thread, JavaThread::stack_guard_state_offset()), StackOverflow::stack_guard_yellow_reserved_disabled);
  __ jcc(Assembler::equal, reguard);
  __ bind(reguard_done);

  // native result if any is live

  // Unlock
  Label unlock_done;
  Label slow_path_unlock;
  if (method->is_synchronized()) {

    // Get locked oop from the handle we passed to jni
    __ movptr(obj_reg, Address(oop_handle_reg, 0));

    Label done;

    if (UseBiasedLocking) {
      __ biased_locking_exit(obj_reg, old_hdr, done);
    }

    // Simple recursive lock?

    __ cmpptr(Address(rsp, lock_slot_offset * VMRegImpl::stack_slot_size), (int32_t)NULL_WORD);
    __ jcc(Assembler::equal, done);

    // Must save rax if if it is live now because cmpxchg must use it
    if (ret_type != T_FLOAT && ret_type != T_DOUBLE && ret_type != T_VOID) {
      save_native_result(masm, ret_type, stack_slots);
    }


    // get address of the stack lock
    __ lea(rax, Address(rsp, lock_slot_offset * VMRegImpl::stack_slot_size));
    //  get old displaced header
    __ movptr(old_hdr, Address(rax, 0));

    // Atomic swap old header if oop still contains the stack lock
    __ lock();
    __ cmpxchgptr(old_hdr, Address(obj_reg, oopDesc::mark_offset_in_bytes()));
    __ jcc(Assembler::notEqual, slow_path_unlock);

    // slow path re-enters here
    __ bind(unlock_done);
    if (ret_type != T_FLOAT && ret_type != T_DOUBLE && ret_type != T_VOID) {
      restore_native_result(masm, ret_type, stack_slots);
    }

    __ bind(done);

  }
  {
    SkipIfEqual skip(masm, &DTraceMethodProbes, false);
    save_native_result(masm, ret_type, stack_slots);
    __ mov_metadata(c_rarg1, method());
    __ call_VM_leaf(
         CAST_FROM_FN_PTR(address, SharedRuntime::dtrace_method_exit),
         r15_thread, c_rarg1);
    restore_native_result(masm, ret_type, stack_slots);
  }

  __ reset_last_Java_frame(false);

  // Unbox oop result, e.g. JNIHandles::resolve value.
  if (is_reference_type(ret_type)) {
    __ resolve_jobject(rax /* value */,
                       r15_thread /* thread */,
                       rcx /* tmp */);
  }

  if (CheckJNICalls) {
    // clear_pending_jni_exception_check
    __ movptr(Address(r15_thread, JavaThread::pending_jni_exception_check_fn_offset()), NULL_WORD);
  }

  if (!is_critical_native) {
    // reset handle block
    __ movptr(rcx, Address(r15_thread, JavaThread::active_handles_offset()));
    __ movl(Address(rcx, JNIHandleBlock::top_offset_in_bytes()), (int32_t)NULL_WORD);
  }

  // pop our frame

  __ leave();

  if (!is_critical_native) {
    // Any exception pending?
    __ cmpptr(Address(r15_thread, in_bytes(Thread::pending_exception_offset())), (int32_t)NULL_WORD);
    __ jcc(Assembler::notEqual, exception_pending);
  }

  // Return

  __ ret(0);

  // Unexpected paths are out of line and go here

  if (!is_critical_native) {
    // forward the exception
    __ bind(exception_pending);

    // and forward the exception
    __ jump(RuntimeAddress(StubRoutines::forward_exception_entry()));
  }

  // Slow path locking & unlocking
  if (method->is_synchronized()) {

    // BEGIN Slow path lock
    __ bind(slow_path_lock);

    // has last_Java_frame setup. No exceptions so do vanilla call not call_VM
    // args are (oop obj, BasicLock* lock, JavaThread* thread)

    // protect the args we've loaded
    save_args(masm, total_c_args, c_arg, out_regs);

    __ mov(c_rarg0, obj_reg);
    __ mov(c_rarg1, lock_reg);
    __ mov(c_rarg2, r15_thread);

    // Not a leaf but we have last_Java_frame setup as we want
    __ call_VM_leaf(CAST_FROM_FN_PTR(address, SharedRuntime::complete_monitor_locking_C), 3);
    restore_args(masm, total_c_args, c_arg, out_regs);

#ifdef ASSERT
    { Label L;
    __ cmpptr(Address(r15_thread, in_bytes(Thread::pending_exception_offset())), (int32_t)NULL_WORD);
    __ jcc(Assembler::equal, L);
    __ stop("no pending exception allowed on exit from monitorenter");
    __ bind(L);
    }
#endif
    __ jmp(lock_done);

    // END Slow path lock

    // BEGIN Slow path unlock
    __ bind(slow_path_unlock);

    // If we haven't already saved the native result we must save it now as xmm registers
    // are still exposed.
    __ vzeroupper();
    if (ret_type == T_FLOAT || ret_type == T_DOUBLE ) {
      save_native_result(masm, ret_type, stack_slots);
    }

    __ lea(c_rarg1, Address(rsp, lock_slot_offset * VMRegImpl::stack_slot_size));

    __ mov(c_rarg0, obj_reg);
    __ mov(c_rarg2, r15_thread);
    __ mov(r12, rsp); // remember sp
    __ subptr(rsp, frame::arg_reg_save_area_bytes); // windows
    __ andptr(rsp, -16); // align stack as required by ABI

    // Save pending exception around call to VM (which contains an EXCEPTION_MARK)
    // NOTE that obj_reg == rbx currently
    __ movptr(rbx, Address(r15_thread, in_bytes(Thread::pending_exception_offset())));
    __ movptr(Address(r15_thread, in_bytes(Thread::pending_exception_offset())), (int32_t)NULL_WORD);

    // args are (oop obj, BasicLock* lock, JavaThread* thread)
    __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::complete_monitor_unlocking_C)));
    __ mov(rsp, r12); // restore sp
    __ reinit_heapbase();
#ifdef ASSERT
    {
      Label L;
      __ cmpptr(Address(r15_thread, in_bytes(Thread::pending_exception_offset())), (int)NULL_WORD);
      __ jcc(Assembler::equal, L);
      __ stop("no pending exception allowed on exit complete_monitor_unlocking_C");
      __ bind(L);
    }
#endif /* ASSERT */

    __ movptr(Address(r15_thread, in_bytes(Thread::pending_exception_offset())), rbx);

    if (ret_type == T_FLOAT || ret_type == T_DOUBLE ) {
      restore_native_result(masm, ret_type, stack_slots);
    }
    __ jmp(unlock_done);

    // END Slow path unlock

  } // synchronized

  // SLOW PATH Reguard the stack if needed

  __ bind(reguard);
  __ vzeroupper();
  save_native_result(masm, ret_type, stack_slots);
  __ mov(r12, rsp); // remember sp
  __ subptr(rsp, frame::arg_reg_save_area_bytes); // windows
  __ andptr(rsp, -16); // align stack as required by ABI
  __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::reguard_yellow_pages)));
  __ mov(rsp, r12); // restore sp
  __ reinit_heapbase();
  restore_native_result(masm, ret_type, stack_slots);
  // and continue
  __ jmp(reguard_done);



  __ flush();

  nmethod *nm = nmethod::new_native_nmethod(method,
                                            compile_id,
                                            masm->code(),
                                            vep_offset,
                                            frame_complete,
                                            stack_slots / VMRegImpl::slots_per_word,
                                            (is_static ? in_ByteSize(klass_offset) : in_ByteSize(receiver_offset)),
                                            in_ByteSize(lock_slot_offset*VMRegImpl::stack_slot_size),
                                            oop_maps);

  return nm;
}

address native_function() const                { return *(native_function_addr()); }