Young Generation
The previous article covered G1's regions and RSets; this time we look at how G1 handles the young generation when a GC occurs. The number of young-generation regions is governed by the region size G1 computed earlier. If MaxNewSize and NewSize are set, dividing each by the derived region size yields the maximum and minimum numbers of young regions. If both (MaxNewSize and NewSize) and NewRatio are set, NewRatio is ignored. If only NewRatio is set, the young-generation size is heap size / (NewRatio + 1); dividing that by the derived region size gives the region count.
If neither (MaxNewSize and NewSize) nor NewRatio is set, G1 computes the maximum and minimum region counts from its own flags, G1MaxNewSizePercent (default 60) and G1NewSizePercent (default 5), as percentages of the whole heap.
When the derived maximum and minimum region counts are equal, the young generation cannot be resized dynamically, which means the pause-time prediction may be unable to meet its target.
void G1YoungGenSizer::recalculate_min_max_young_length(uint number_of_heap_regions, uint* min_young_length, uint* max_young_length) {
  assert(number_of_heap_regions > 0, "Heap must be initialized");
  switch (_sizer_kind) {
    case SizerDefaults:
      *min_young_length = calculate_default_min_length(number_of_heap_regions);
      *max_young_length = calculate_default_max_length(number_of_heap_regions);
      break;
    case SizerNewSizeOnly:
      *max_young_length = calculate_default_max_length(number_of_heap_regions);
      *max_young_length = MAX2(*min_young_length, *max_young_length);
      break;
    case SizerMaxNewSizeOnly:
      *min_young_length = calculate_default_min_length(number_of_heap_regions);
      *min_young_length = MIN2(*min_young_length, *max_young_length);
      break;
    case SizerMaxAndNewSize:
      // Do nothing. Values set on the command line, don't update them at runtime.
      break;
    case SizerNewRatio:
      *min_young_length = number_of_heap_regions / (NewRatio + 1);
      *max_young_length = *min_young_length;
      break;
    default:
      ShouldNotReachHere();
  }
}
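The sizing rules above reduce to simple arithmetic. Below is a minimal sketch of that arithmetic (the helper names are hypothetical; it only mirrors the default-percent and NewRatio rules described above, not the HotSpot code):

```cpp
#include <cassert>
#include <cstddef>

// Simplified model of the young-generation region bounds (illustrative only).
struct YoungBounds { size_t min_regions; size_t max_regions; };

// Default path: G1NewSizePercent (5) and G1MaxNewSizePercent (60) of the heap.
YoungBounds default_bounds(size_t heap_regions) {
    return { heap_regions * 5 / 100, heap_regions * 60 / 100 };
}

// NewRatio path: young = heap / (NewRatio + 1); min == max, so no resizing.
YoungBounds new_ratio_bounds(size_t heap_regions, unsigned new_ratio) {
    size_t n = heap_regions / (new_ratio + 1);
    return { n, n };
}
```

For a heap of 2048 regions, the defaults let the young generation float between 102 and 1228 regions, while -XX:NewRatio=2 pins it at 682 regions, removing that freedom.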
G1 handles young-generation sizing dynamically, adapting by growing the heap as needed. The dynamic calculation is in G1CollectorPolicy.cpp#expansion_amount, shown below.
size_t G1CollectorPolicy::expansion_amount() {
  double recent_gc_overhead = recent_avg_pause_time_ratio() * 100.0;
  double threshold = _gc_overhead_perc;
  if (recent_gc_overhead > threshold) {
    const size_t min_expand_bytes = 1*M;
    size_t reserved_bytes = _g1->max_capacity();
    size_t committed_bytes = _g1->capacity();
    size_t uncommitted_bytes = reserved_bytes - committed_bytes;
    size_t expand_bytes;
    size_t expand_bytes_via_pct =
      uncommitted_bytes * G1ExpandByPercentOfAvailable / 100;
    expand_bytes = MIN2(expand_bytes_via_pct, committed_bytes);
    expand_bytes = MAX2(expand_bytes, min_expand_bytes);
    expand_bytes = MIN2(expand_bytes, uncommitted_bytes);
    return expand_bytes;
  } else {
    return 0;
  }
}
The threshold _gc_overhead_perc is tied to the flag GCTimeRatio (default 9): as long as GC time stays within about 10% of application time, no expansion is needed; beyond that, the heap is expanded. How much to expand is governed by G1ExpandByPercentOfAvailable (default 20): G1 takes the smaller of doubling the currently committed space and G1ExpandByPercentOfAvailable percent of the still-uncommitted space, raises it to a minimum expansion of 1M, and caps it at the remaining uncommitted space, so if even the minimum cannot be satisfied, whatever space is left is handed out. The trigger point is g1CollectedHeap.cpp#do_collection_pause_at_safepoint: during the GC pause, the expand method is ultimately called to grow the heap.
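The heuristic just described can be sketched in isolation (a simplified model of the arithmetic, with assumed parameter values; not the HotSpot implementation):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Simplified model of expansion_amount() (illustrative sketch of the
// heuristic above). All values are bytes; percent stands in for
// G1ExpandByPercentOfAvailable (default 20).
size_t expansion_bytes(size_t reserved, size_t committed, unsigned percent) {
    const size_t min_expand = 1024 * 1024;         // 1M lower bound
    size_t uncommitted = reserved - committed;
    size_t by_pct = uncommitted * percent / 100;   // percent of what is left
    size_t expand = std::min(by_pct, committed);   // at most double current size
    expand = std::max(expand, min_expand);         // but at least 1M
    return std::min(expand, uncommitted);          // never beyond what remains
}
```

With 1G reserved and 512M committed at the default 20%, this yields roughly 102M of expansion; when less than 1M is left uncommitted, it simply hands out everything that remains.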
bool G1CollectedHeap::expand(size_t expand_bytes) {
  size_t aligned_expand_bytes = ReservedSpace::page_align_size_up(expand_bytes);
  aligned_expand_bytes = align_size_up(aligned_expand_bytes,
                                       HeapRegion::GrainBytes);
  ergo_verbose2(ErgoHeapSizing,
                "expand the heap",
                ergo_format_byte("requested expansion amount")
                ergo_format_byte("attempted expansion amount"),
                expand_bytes, aligned_expand_bytes);

  if (is_maximal_no_gc()) {
    ergo_verbose0(ErgoHeapSizing,
                  "did not expand the heap",
                  ergo_format_reason("heap already fully expanded"));
    return false;
  }

  uint regions_to_expand = (uint)(aligned_expand_bytes / HeapRegion::GrainBytes);
  assert(regions_to_expand > 0, "Must expand by at least one region");

  uint expanded_by = _hrm.expand_by(regions_to_expand);

  if (expanded_by > 0) {
    size_t actual_expand_bytes = expanded_by * HeapRegion::GrainBytes;
    assert(actual_expand_bytes <= aligned_expand_bytes, "post-condition");
    g1_policy()->record_new_heap_size(num_regions());
  } else {
    ergo_verbose0(ErgoHeapSizing,
                  "did not expand the heap",
                  ergo_format_reason("heap expansion operation failed"));
    // The expansion of the virtual storage space was unsuccessful.
    // Let's see if it was because we ran out of swap.
    if (G1ExitOnExpansionFailure &&
        _hrm.available() >= regions_to_expand) {
      // We had head room...
      vm_exit_out_of_memory(aligned_expand_bytes, OOM_MMAP_ERROR, "G1 heap expansion");
    }
  }
  return regions_to_expand > 0;
}
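The round-up at the top of expand() can be illustrated with a small sketch (hypothetical helpers; HotSpot uses ReservedSpace::page_align_size_up and align_size_up for this):

```cpp
#include <cassert>
#include <cstddef>

// Round bytes up to the next multiple of alignment (illustrative only).
size_t align_up(size_t bytes, size_t alignment) {
    return (bytes + alignment - 1) / alignment * alignment;
}

// A requested byte count always becomes a whole number of regions.
size_t regions_to_expand(size_t expand_bytes, size_t region_bytes) {
    return align_up(expand_bytes, region_bytes) / region_bytes;
}
```

So even a 1-byte request, once aligned to a 1M region granularity, expands by a full region.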
G1-YGC
As we all know, a GC is triggered when the young generation no longer has room for an allocation. A young GC collects only part of the heap, so pause times are short, and in G1's regionized heap the amount of young-generation memory collected is not fixed either. First, note that a young GC happens inside a stop-the-world pause. G1 then selects the collection set (CSet), which for a young GC is simply all young regions, adds it to the collection tasks, and processes references in parallel. Once the reference search completes, the referenced objects are evacuated: live objects are copied and promoted, object headers are restored for objects whose promotion failed, and the heap may be expanded. The G1 young GC workflow is as follows.
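The steps above can be summarized as an ordered list of phases. This is a sketch only; every name below is illustrative, not a HotSpot identifier, and each phase records itself so the ordering can be checked:

```cpp
#include <cassert>
#include <string>
#include <vector>

// High-level sketch of one G1 young collection, mirroring the phases
// described above (illustrative pseudocode in C++ form).
std::vector<std::string> young_gc_phases() {
    std::vector<std::string> log;
    log.push_back("stop-the-world");     // YGC runs inside a safepoint
    log.push_back("choose-cset");        // for a young GC: all young regions
    log.push_back("process-roots");      // Java roots + JVM roots, in parallel
    log.push_back("scan-rsets");         // old->young references via RSets
    log.push_back("evacuate");           // copy/promote live objects
    log.push_back("handle-failures");    // restore headers, maybe expand heap
    log.push_back("resume-the-world");
    return log;
}
```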
Let's go straight into g1CollectedHeap.cpp#evacuate_collection_set and take a look. The figure below shows the workflow of the parallel CSet evacuation.
- Root scanning is done by the G1RootProcessor class, which scans direct strong references: mainly JVM roots and Java roots. G1ParCopyHelper performs the object copying.
- Java roots
  - Class loaders
    Deep-traverse every live Klass object loaded by the current class loaders; each live object found is copied to a survivor region or promoted to the old generation.
  - Thread stacks
    Search the Java thread stacks and native method stacks; StackFrameStream's next walks to the sender frame, yielding the caller, from which the referenced live heap objects are found and copied to a survivor region or promoted to the old generation.
Knowing that G1RootProcessor looks for live objects along these two broad paths, let's go straight to the code, g1RootProcessor.cpp#evacuate_roots:
void G1RootProcessor::process_java_roots(OopClosure* strong_roots,
                                         CLDClosure* thread_stack_clds,
                                         CLDClosure* strong_clds,
                                         CLDClosure* weak_clds,
                                         CodeBlobClosure* strong_code,
                                         G1GCPhaseTimes* phase_times,
                                         uint worker_i) {
  assert(thread_stack_clds == NULL || weak_clds == NULL, "There is overlap between those, only one may be set");
  // Iterating over the CLDG and the Threads are done early to allow us to
  // first process the strong CLDs and nmethods and then, after a barrier,
  // let the thread process the weak CLDs and nmethods.
  {
    G1GCParPhaseTimesTracker x(phase_times, G1GCPhaseTimes::CLDGRoots, worker_i);
    if (!_process_strong_tasks->is_task_claimed(G1RP_PS_ClassLoaderDataGraph_oops_do)) {
      ClassLoaderDataGraph::roots_cld_do(strong_clds, weak_clds);
    }
  }
  {
    G1GCParPhaseTimesTracker x(phase_times, G1GCPhaseTimes::ThreadRoots, worker_i);
    Threads::possibly_parallel_oops_do(strong_roots, thread_stack_clds, strong_code);
  }
}

void ClassLoaderDataGraph::roots_cld_do(CLDClosure* strong, CLDClosure* weak) {
  for (ClassLoaderData* cld = _head; cld != NULL; cld = cld->_next) {
    CLDClosure* closure = cld->keep_alive() ? strong : weak;
    if (closure != NULL) {
      closure->do_cld(cld);
    }
  }
}

void ClassLoaderData::oops_do(OopClosure* f, KlassClosure* klass_closure, bool must_claim) {
  if (must_claim && !claim()) {
    return;
  }
  f->do_oop(&_class_loader);
  _dependencies.oops_do(f);
  _handles->oops_do(f);
  if (klass_closure != NULL) {
    classes_do(klass_closure);
  }
}

void ClassLoaderData::classes_do(KlassClosure* klass_closure) {
  for (Klass* k = _klasses; k != NULL; k = k->next_link()) {
    klass_closure->do_klass(k);
    assert(k != k->next_link(), "no loops!");
  }
}
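The strong/weak dispatch inside roots_cld_do can be mimicked with a toy model (all types and names below are hypothetical, standing in for ClassLoaderData and the CLD closures):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of ClassLoaderDataGraph::roots_cld_do: walk the class-loader
// entries and hand each to either the strong or the weak sink depending on
// whether it must be kept alive. Illustrative only.
struct ToyCLD { std::string name; bool keep_alive; };

void roots_cld_do_sketch(const std::vector<ToyCLD>& graph,
                         std::vector<std::string>* strong,
                         std::vector<std::string>* weak) {
    for (const ToyCLD& cld : graph) {
        std::vector<std::string>* sink = cld.keep_alive ? strong : weak;
        if (sink != nullptr) sink->push_back(cld.name);  // a closure may be absent
    }
}
```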
What ends up being called is do_klass in G1KlassScanClosure:
class G1KlassScanClosure : public KlassClosure {
  G1ParCopyHelper* _closure;
  bool             _process_only_dirty;
  int              _count;
 public:
  G1KlassScanClosure(G1ParCopyHelper* closure, bool process_only_dirty)
      : _process_only_dirty(process_only_dirty), _closure(closure), _count(0) {}
  void do_klass(Klass* klass) {
    if (!_process_only_dirty || klass->has_modified_oops()) {
      klass->clear_modified_oops();
      _closure->set_scanned_klass(klass);
      klass->oops_do(_closure);
      _closure->set_scanned_klass(NULL);
    }
    _count++;
  }
};
The key statement is klass->oops_do(_closure); the closure here is a G1ParCopyHelper object, so the call lands in do_oop of g1CollectedHeap.cpp@G1ParCopyClosure, whose do_oop_work copies the live objects to new regions. Threads, on the other hand, are handled in thread.cpp#possibly_parallel_oops_do:
Threads::possibly_parallel_oops_do(strong_roots, thread_stack_clds, strong_code);
This in turn calls JavaThread::oops_do to walk the stack frames:
void Thread::oops_do(OopClosure* f, CLDClosure* cld_f, CodeBlobClosure* cf) {
  active_handles()->oops_do(f);
  // Do oop for ThreadShadow
  f->do_oop((oop*)&_pending_exception);
  handle_area()->oops_do(f);
}

void JavaThread::oops_do(OopClosure* f, CLDClosure* cld_f, CodeBlobClosure* cf) {
  Thread::oops_do(f, cld_f, cf);
  assert((!has_last_Java_frame() && java_call_counter() == 0) ||
         (has_last_Java_frame() && java_call_counter() > 0), "wrong java_sp info!");
  if (has_last_Java_frame()) {
    RememberProcessedThread rpt(this);
    if (_privileged_stack_top != NULL) {
      _privileged_stack_top->oops_do(f);
    }
    if (_array_for_gc != NULL) {
      for (int index = 0; index < _array_for_gc->length(); index++) {
        f->do_oop(_array_for_gc->adr_at(index));
      }
    }
    for (MonitorChunk* chunk = monitor_chunks(); chunk != NULL; chunk = chunk->next()) {
      chunk->oops_do(f);
    }
    for (StackFrameStream fst(this); !fst.is_done(); fst.next()) {
      fst.current()->oops_do(f, cld_f, cf, fst.register_map());
    }
  }
  set_callee_target(NULL);
  assert(vframe_array_head() == NULL, "deopt in progress at a safepoint!");
  GrowableArray<jvmtiDeferredLocalVariableSet*>* list = deferred_locals();
  if (list != NULL) {
    for (int i = 0; i < list->length(); i++) {
      list->at(i)->oops_do(f);
    }
  }
  f->do_oop((oop*) &_threadObj);
  f->do_oop((oop*) &_vm_result);
  f->do_oop((oop*) &_exception_oop);
  f->do_oop((oop*) &_pending_async_exception);
  if (jvmti_thread_state() != NULL) {
    jvmti_thread_state()->oops_do(f);
  }
}
Live objects are searched for in the JNI native code stacks and the JVM's internal method stacks; the Java stacks are walked frame by frame, the monitor chunks are traversed, and so is the JVMTI (JVM Tool Interface) state, which mainly matters when Java agents are in use. Finally G1ParCopyHelper's do_oop runs, and its do_oop_work copies the live objects to new regions.
- JVM roots
  Global JVM objects such as Universe, JNIHandles, SystemDictionary, StringTable, and so on.
void G1RootProcessor::process_vm_roots(OopClosure* strong_roots,
                                       OopClosure* weak_roots,
                                       G1GCPhaseTimes* phase_times,
                                       uint worker_i) {
  {
    G1GCParPhaseTimesTracker x(phase_times, G1GCPhaseTimes::UniverseRoots, worker_i);
    if (!_process_strong_tasks->is_task_claimed(G1RP_PS_Universe_oops_do)) {
      Universe::oops_do(strong_roots);
    }
  }
  ....
}

void Universe::oops_do(OopClosure* f, bool do_all) {
  f->do_oop((oop*) &_int_mirror);
  f->do_oop((oop*) &_float_mirror);
  f->do_oop((oop*) &_double_mirror);
  ........
}
For the JVM roots the call likewise goes through G1ParCopyHelper's do_oop; the only difference is that the roots are the various global objects, for example Universe. The workflow of g1CollectedHeap.cpp@G1ParCopyClosure#do_oop_work is as follows.
The object copy itself is performed in G1ParScanThreadState#copy_to_survivor_space. The specifics are handled as follows.
- Processing the RSet
  The entry point for RSet processing is in the work method of G1ParTask.
void G1RootProcessor::scan_remembered_sets(G1ParPushHeapRSClosure* scan_rs,
                                           OopClosure* scan_non_heap_weak_roots,
                                           uint worker_i) {
  ...
  _g1h->g1_rem_set()->oops_into_collection_set_do(scan_rs, &scavenge_cs_nmethods, worker_i);
}
This mainly runs the oops_into_collection_set_do method of G1RemSet, whose main work is updating the RSets and then scanning them.
void G1RemSet::oops_into_collection_set_do(G1ParPushHeapRSClosure* oc,
                                           CodeBlobClosure* code_root_cl,
                                           uint worker_i) {
  DirtyCardQueue into_cset_dcq(&_g1->into_cset_dirty_card_queue_set());
  updateRS(&into_cset_dcq, worker_i);
  scanRS(oc, code_root_cl, worker_i);
  _cset_rs_update_cl[worker_i] = NULL;
}
Note the DCQ here. We met this kind of queue when studying RSets, where it was described as the mutator-side log recording the references application threads write at runtime. The DCQ in this code instead records the references that must be preserved after a copy failure; its contents are passed on to the DirtyCardQueueSet that manages RSet updates.
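The buffer-and-hand-off idea behind a DCQ can be sketched as follows (an illustrative toy; the real DirtyCardQueue and DirtyCardQueueSet handle concurrency, buffer recycling, and much more):

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Minimal sketch of a dirty card queue: a writer appends card indices to a
// small local buffer; when the buffer fills, it is handed off to a shared
// "completed buffers" sink, which a refinement step later drains.
struct DirtyCardQueueSketch {
    std::vector<std::size_t> buffer;   // thread-local log of dirty cards
    std::set<std::size_t>* completed;  // shared sink for full buffers
    std::size_t capacity;

    DirtyCardQueueSketch(std::set<std::size_t>* sink, std::size_t cap)
        : completed(sink), capacity(cap) {}

    void enqueue(std::size_t card) {
        buffer.push_back(card);
        if (buffer.size() >= capacity) flush();
    }

    void flush() {  // hand the local buffer to the shared set
        completed->insert(buffer.begin(), buffer.end());
        buffer.clear();
    }
};
```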
- Updating the RSet
  This mainly stores the cards from the DCQ above into the RSets' PRTs.
  G1GCParPhaseTimesTracker x(_g1p->phase_times(), G1GCPhaseTimes::UpdateRS, worker_i);
  // Apply the given closure to all remaining log entries.
  RefineRecordRefsIntoCSCardTableEntryClosure into_cset_update_rs_cl(_g1, into_cset_dcq);
  _g1->iterate_dirty_card_closure(&into_cset_update_rs_cl, into_cset_dcq, false, worker_i);
}

void G1CollectedHeap::iterate_dirty_card_closure(CardTableEntryClosure* cl,
                                                 DirtyCardQueue* into_cset_dcq,
                                                 bool concurrent,
                                                 uint worker_i) {
  // Clean cards in the hot card cache
  G1HotCardCache* hot_card_cache = _cg1r->hot_card_cache();
  hot_card_cache->drain(worker_i, g1_rem_set(), into_cset_dcq);

  DirtyCardQueueSet& dcqs = JavaThread::dirty_card_queue_set();
  size_t n_completed_buffers = 0;
  while (dcqs.apply_closure_to_completed_buffer(cl, worker_i, 0, true)) {
    n_completed_buffers++;
  }
  g1_policy()->phase_times()->record_thread_work_item(G1GCPhaseTimes::UpdateRS, worker_i, n_completed_buffers);
  dcqs.clear_n_completed_buffers();
  assert(!dcqs.completed_buffers_exist_dirty(), "Completed buffers exist!");
}
First the RefineRecordRefsIntoCSCardTableEntryClosure closure is applied: if anywhere in a card there is a reference to a heap object, the card is dirty and must be enqueued to be handled by the refinement threads. The iterate_dirty_card_closure method then processes the DCQs remaining in the DCQS, in the same way the Java threads do.
- Scanning the RSet
  The information in the RSet is used to find the referencing objects.
void G1RemSet::scanRS(G1ParPushHeapRSClosure* oc, CodeBlobClosure* code_root_cl, uint worker_i) {
  double rs_time_start = os::elapsedTime();
  HeapRegion* startRegion = _g1->start_cset_region_for_worker(worker_i);
  ScanRSClosure scanRScl(oc, code_root_cl, worker_i);

  _g1->collection_set_iterate_from(startRegion, &scanRScl);
  scanRScl.set_try_claimed();
  _g1->collection_set_iterate_from(startRegion, &scanRScl);

  double scan_rs_time_sec = (os::elapsedTime() - rs_time_start) - scanRScl.strong_code_root_scan_time_sec();
  assert(_cards_scanned != NULL, "invariant");
  _cards_scanned[worker_i] = scanRScl.cards_done();

  _g1p->phase_times()->record_time_secs(G1GCPhaseTimes::ScanRS, worker_i, scan_rs_time_sec);
  _g1p->phase_times()->record_time_secs(G1GCPhaseTimes::CodeRoots, worker_i, scanRScl.strong_code_root_scan_time_sec());
}
Work is sharded across GC threads by worker id, each starting at a different region, and the collection set is scanned in two passes. Both ordinary objects and code objects are handled, the latter mainly covering references held by code after inlining optimization. The main execution flow is as follows.
- Object copying
  All objects found by the root scan, plus the child objects reached through the RSets, are copied into new regions. Every object is placed on the G1ParScanThreadState queue; performing the copy means popping from that queue and handling each object type, ultimately through the deal_with_reference method. In the end every live object in the CSet has been copied into a survivor region or the old generation of a new region.
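The copy loop just described (pop an entry, copy the object, push its children) can be sketched with a toy object graph; all types here are illustrative, not HotSpot's G1ParScanThreadState:

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <vector>

// Toy evacuation loop: objects are nodes in a graph keyed by id. "Copying"
// an object records its new location; its children are then pushed so they
// are evacuated transitively. Garbage (unreachable ids) is never copied.
struct ToyObject { std::vector<int> children; };

std::map<int, int> evacuate(const std::map<int, ToyObject>& heap,
                            const std::vector<int>& roots) {
    std::map<int, int> forwarded;          // old id -> new location (toy)
    int next_location = 0;
    std::deque<int> queue(roots.begin(), roots.end());
    while (!queue.empty()) {               // drain the per-thread task queue
        int id = queue.front();
        queue.pop_front();
        if (forwarded.count(id)) continue; // already copied: forwarding entry
        forwarded[id] = next_location++;   // "copy" to survivor/old region
        const ToyObject& obj = heap.at(id);
        queue.insert(queue.end(), obj.children.begin(), obj.children.end());
    }
    return forwarded;
}
```

With roots {1} over a graph 1 -> {2, 3}, 2 -> {3}, the loop copies exactly the reachable objects and skips garbage.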