From 0e5ce680295709d971a32fa9d13008227332c1a0 Mon Sep 17 00:00:00 2001 From: Ivan Maidanski Date: Tue, 20 Jun 2017 10:31:44 +0300 Subject: [PATCH] Convert overview.html, tree.html to Markdown format * README.md: Replace overview.html with overview.md. * doc/doc.am (dist_doc_DATA): Likewise. * doc/doc.am (dist_doc_DATA): Replace tree.html with tree.md. * doc/gcdescr.md (Mark phase): Likewise. * doc/overview.html: Change file suffix to .md; convert text format from HTML to Markdown. * doc/tree.html: Likewise. --- README.md | 4 +- doc/doc.am | 4 +- doc/gcdescr.md | 2 +- doc/overview.html | 431 ------------------------------------------------------ doc/overview.md | 388 ++++++++++++++++++++++++++++++++++++++++++++++++ doc/tree.html | 195 ------------------------ doc/tree.md | 175 ++++++++++++++++++++++ 7 files changed, 568 insertions(+), 631 deletions(-) delete mode 100644 doc/overview.html create mode 100644 doc/overview.md delete mode 100644 doc/tree.html create mode 100644 doc/tree.md diff --git a/README.md b/README.md index 1ba70ad..527161f 100644 --- a/README.md +++ b/README.md @@ -73,8 +73,8 @@ collector. (See doc/README.cords and H.-J. Boehm, R. Atkinson, and M. Plass, (December 1995), pp. 1315-1330. This is very similar to the "rope" package in Xerox Cedar, or the "rope" package in the SGI STL or the g++ distribution.) -Further collector documentation can be found -in [overview.html](doc/overview.html). +Further collector documentation can be found in the +[overview](doc/overview.md). ## General Description diff --git a/doc/doc.am b/doc/doc.am index ae1bd92..73197ac 100644 --- a/doc/doc.am +++ b/doc/doc.am @@ -42,8 +42,8 @@ dist_doc_DATA = \ doc/gcdescr.md \ doc/gcinterface.md \ doc/leak.md \ - doc/overview.html \ + doc/overview.md \ doc/porting.md \ doc/scale.md \ doc/simple_example.md \ - doc/tree.html + doc/tree.md diff --git a/doc/gcdescr.md b/doc/gcdescr.md index 4dbae2f..bddbab3 100644 --- a/doc/gcdescr.md +++ b/doc/gcdescr.md @@ -251,7 +251,7 @@ block. This is done in the following steps: actually consist of multiple such pages. HBLKSIZE is usually the page size divided by a small power of two.) * The page address part of the candidate pointer is looked up in - a [table](tree.html). Each table entry contains either 0, indicating that + a [table](tree.md). Each table entry contains either 0, indicating that the page is not part of the garbage collected heap, a small integer _n_, indicating that the page is part of large object, starting at least _n_ pages back, or a pointer to a descriptor for the page. In the first case, diff --git a/doc/overview.html b/doc/overview.html deleted file mode 100644 index 3c31c86..0000000 --- a/doc/overview.html +++ /dev/null @@ -1,431 +0,0 @@ - -A garbage collector for C and C++ - - - - - - - - - - -
Interface OverviewTutorial SlidesFAQExampleDownloadLicense
-

A garbage collector for C and C++

- -[ This is an updated version of the page formerly at -www.hpl.hp.com/personal/Hans_Boehm/gc/, -before that at -http://reality.sgi.com/boehm/gc.html -and before that at -ftp://ftp.parc.xerox.com/pub/gc/gc.html. ] -

-The Boehm-Demers-Weiser -conservative Garbage Collector (BDWGC) can -be used as a garbage collecting -replacement for C malloc or C++ new. -It allows you to allocate memory basically as you normally would, -without explicitly deallocating memory that is no longer useful. -The collector automatically recycles memory when it determines -that it can no longer be otherwise accessed. -A simple example of such a use is given -here. -

-The collector is also used by a number of programming language -implementations that either use C as intermediate code, want -to facilitate easier interoperation with C libraries, or -just prefer the simple collector interface. -For a more detailed description of the interface, see -here. -

-Alternatively, the garbage collector may be used as -a leak detector -for C or C++ programs, though that is not its primary goal. -

-Typically several versions are offered for -downloading: -preview, stable, legacy. -Usually you should use the one marked as the latest stable release. -Preview versions may contain additional features, platform support, -but are likely to be less well tested. -The list of changes for each version is specified on the -releases page. -

-The arguments for and against conservative garbage collection -in C and C++ are briefly -discussed in -issues.html. -The beginnings of a frequently-asked-questions list are -here. -

-The garbage collector code is copyrighted by -Hans-J. Boehm, -Alan J. Demers, -Xerox Corporation, -Silicon Graphics, -and -Hewlett-Packard Company. -It may be used and copied without payment of a fee under minimal restrictions. -See the README file in the distribution or the -license for more details. -IT IS PROVIDED AS IS, -WITH ABSOLUTELY NO WARRANTY EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK. -

-Empirically, this collector works with most unmodified C programs, -simply by replacing -malloc with GC_malloc calls, -replacing realloc with GC_realloc calls, and removing -free calls. Exceptions are discussed -in issues.html. -

Platforms

-The collector is not completely portable, but the distribution -includes ports to most standard PC and UNIX/Linux platforms. -The collector should work on Linux, *BSD, recent Windows versions, -MacOS X, HP/UX, Solaris, -Tru64, Irix and a few other operating systems. -Some ports are more polished than others. -

-Irix pthreads, Linux threads, Win32 threads, Solaris threads -(pthreads only), -HP/UX 11 pthreads, Tru64 pthreads, and MacOS X threads are supported -in recent versions. -

Separately distributed ports

-For MacOS 9/Classic use, Patrick Beard's latest port is available from -http://homepage.mac.com/pcbeard/gc/. -(Unfortunately, that's now quite dated. -I'm not in a position to test under MacOS. Although I try to -incorporate changes, it is impossible for -me to update the project file.) -

-Precompiled versions of the collector for NetBSD are available -here. -

-Debian Linux includes prepackaged -versions of the collector. -

Scalable multiprocessor versions

-Kenjiro Taura, Toshio Endo, and Akinori Yonezawa have made available -a parallel collector -based on this one. Their collector takes advantage of multiple processors -during a collection. Starting with collector version 6.0alpha1 -we also do this, though with more modest processor scalability goals. -Our approach is discussed briefly in -this document. -

Some Collector Details

-The collector uses a mark-sweep algorithm. -It provides incremental and generational -collection under operating systems which provide the right kind of -virtual memory support. (Currently this includes SunOS[45], IRIX, -OSF/1, Linux, and Windows, with varying restrictions.) -It allows finalization code -to be invoked when an object is collected. -It can take advantage of type information to locate pointers if such -information is provided, but it is usually used without such information. -See the README and -gc.h files in the distribution for more details. -

-For an overview of the implementation, see here. -

-The garbage collector distribution includes a C string -(cord) package that provides -for fast concatenation and substring operations on long strings. -A simple curses- and win32-based editor that represents the entire file -as a cord is included as a -sample application. -

-Performance of the nonincremental collector is typically competitive -with malloc/free implementations. Both space and time overhead are -likely to be only slightly higher -for programs written for malloc/free -(see Detlefs, Dosser and Zorn's -Memory Allocation Costs in Large C and C++ Programs.) -For programs allocating primarily very small objects, the collector -may be faster; for programs allocating primarily large objects it will -be slower. If the collector is used in a multi-threaded environment -and configured for thread-local allocation, it may in some cases -significantly outperform malloc/free allocation in time. -

-We also expect that in many cases any additional overhead -will be more than compensated for by decreased copying etc. -if programs are written -and tuned for garbage collection. -

Further Reading:

-The beginnings of a frequently asked questions list for this -collector are here. -

-The following provide information on garbage collection in general: -

-Paul Wilson's garbage collection ftp archive and GC survey. -

-The Ravenbrook -Memory Management Reference. -

-David Chase's -GC FAQ. -

-Richard Jones' - -Garbage Collection Page and - -his book. -

-The following papers describe the collector algorithms we use -and the underlying design decisions at -a higher level. -

-(Some of the lower level details can be found -here.) -

-The first one is not available -electronically due to copyright considerations. Most of the others are -subject to ACM copyright. -

-Boehm, H., "Dynamic Memory Allocation and Garbage Collection", Computers in Physics -9, 3, May/June 1995, pp. 297-303. This is directed at an otherwise sophisticated -audience unfamiliar with memory allocation issues. The algorithmic details differ -from those in the implementation. There is a related letter to the editor and a minor -correction in the next issue. -

-Boehm, H., and M. Weiser, -"Garbage Collection in an Uncooperative Environment", -Software Practice & Experience, September 1988, pp. 807-820. -

-Boehm, H., A. Demers, and S. Shenker, "Mostly Parallel Garbage Collection", -Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, -SIGPLAN Notices 26, 6 (June 1991), pp. 157-164. -

-Boehm, H., "Space Efficient Conservative Garbage Collection", -Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design -and Implementation, SIGPLAN Notices 28, 6 (June 1993), pp. 197-206. -

-Boehm, H., "Reducing Garbage Collector Cache Misses", - Proceedings of the 2000 International Symposium on Memory Management . - -Official version. - -Technical report version. Describes the prefetch strategy -incorporated into the collector for some platforms. Explains why -the sweep phase of a "mark-sweep" collector should not really be -a distinct phase. -

-M. Serrano, H. Boehm, -"Understanding Memory Allocation of Scheme Programs", -Proceedings of the Fifth ACM SIGPLAN International Conference on -Functional Programming, 2000, Montreal, Canada, pp. 245-256. - -Official version. - -Earlier Technical Report version. Includes some discussion of the -collector debugging facilities for identifying causes of memory retention. -

-Boehm, H., -"Fast Multiprocessor Memory Allocation and Garbage Collection", - -HP Labs Technical Report HPL 2000-165. Discusses the parallel -collection algorithms, and presents some performance results. -

-Boehm, H., "Bounding Space Usage of Conservative Garbage Collectors", -Proceedings of the 2002 ACM SIGPLAN-SIGACT Symposium on Principles of -Programming Languages, Jan. 2002, pp. 93-100. - -Official version. - -Technical report version. -Includes a discussion of a collector facility to much more reliably test for -the potential of unbounded heap growth. -

-The following papers discuss language and compiler restrictions necessary to guaranteed -safety of conservative garbage collection. -

-We thank John Levine and JCLT for allowing -us to make the second paper available electronically, and providing PostScript for the final -version. -

-Boehm, H., "Simple Garbage-Collector-Safety", -Proceedings of the ACM SIGPLAN '96 Conference on Programming Language Design -and Implementation. -

-Boehm, H., and D. Chase, "A Proposal for Garbage-Collector-Safe C Compilation", -Journal of C Language Translation 4, 2 (December 1992), pp. 126-141. -

-Other related information: -

-The Detlefs, Dosser and Zorn's Memory Allocation Costs in Large C and C++ Programs. - This is a performance comparison of the Boehm-Demers-Weiser collector to malloc/free, -using programs written for malloc/free. -

-Joel Bartlett's mostly copying conservative garbage collector for C++. -

-John Ellis and David Detlef's -Safe Efficient Garbage Collection for C++ -proposal. -

-Henry Baker's paper collection. -

-Slides for Hans Boehm's Allocation and GC Myths talk. -

Current users:

-Known current users of some variant of this collector include: -

-The runtime system for -GCJ, -the static GNU java compiler. -

-W3m, a text-based web browser. -

-Some versions of the Xerox DocuPrint printer software. -

-The Mozilla project, as leak -detector. -

-The Mono project, -an open source implementation of the .NET development framework. -

-The DotGNU Portable.NET -project, another open source .NET implementation. -

-The Irssi IRC client. -

-The Berkeley Titanium project. -

-The NAGWare f90 Fortran 90 compiler. -

-Elwood Corporation's Eclipse Common Lisp system, C library, and translator. -

-The Bigloo Scheme -and Camloo ML compilers -written by Manuel Serrano and others. -

-Brent Benson's libscheme. -

-The MzScheme scheme implementation. -

-The University of Washington Cecil Implementation. -

-The Berkeley Sather implementation. -

-The Berkeley Harmonia Project. -

-The Toba Java Virtual -Machine to C translator. -

-The Gwydion Dylan compiler. -

-The -GNU Objective C runtime. -

-Macaulay 2, a system to support -research in algebraic geometry and commutative algebra. -

-The Vesta configuration management -system. -

-Visual Prolog 6. -

-Asymptote LaTeX-compatible -vector graphics language. -

More information on the BDWGC primary site

-A simple illustration of how to build and -use the collector. -

-Description of alternate interfaces to the -garbage collector. -

-Slides from an ISMM 2004 tutorial about the GC. -

-A FAQ (frequently asked questions) list. -

-How to use the garbage collector as a leak detector. -

-Some hints on debugging garbage collected -applications. -

-An overview of the implementation of the -garbage collector. -

-The data structure used for fast pointer lookups. -

-Scalability of the collector to multiprocessors. -

-Directory containing -the distribution files of all garbage collector releases. -It duplicates -Download page on -GitHub. -

More background information

-An attempt to establish a bound on space usage of -conservative garbage collectors. -

-Mark-sweep versus copying garbage collectors -and their complexity. -

-Pros and cons of conservative garbage collectors, -in comparison to other collectors. -

-Issues related to garbage collection vs. -manual memory management in C/C++. -

-An example of a case in which garbage collection -results in a much faster implementation as a result of reduced synchronization. -

-Slide set discussing performance of nonmoving -garbage collectors. -

- -Slide set discussing Destructors, Finalizers, and Synchronization -(POPL 2003). -

- -Paper corresponding to above slide set -( -Technical Report version). -

-A Java/Scheme/C/C++ garbage collection benchmark. -

-Slides for talk on memory allocation myths. -

-Slides for OOPSLA 98 garbage collection talk. -

-Related papers. -

Contacts and new release announcements

-GitHub and Stack Overflow are the major two places for communication. -

-Technical questions (how to, how does it work, etc.) should be posted to -Stack Overflow -with "boehm-gc" tag. -

-To contribute, please rebase your code to the latest -master and submit -a pull request to GitHub. -

-To report a bug, or propose (request) a new feature, create -a GitHub issue. -Please make sure it has not been reported yet by someone else. -

-To receive notifications on every release, please subscribe to -Releases RSS feed. -Notifications on all issues and pull requests are available by -watching the project. -

-Mailing lists (bdwgc-announce@lists.opendylan.org, bdwgc@lists.opendylan.org, -and the former gc-announce@linux.hpl.hp.com and gc@linux.hpl.hp.com) are not -used at this moment. Their content is available in -bdwgc-announce -and -bdwgc -archive files, respectively. -The gc list archive may also be read at -Narkive. -

-Some prior discussion of the collector has taken place on the gcc -java mailing list, whose archives appear -here, and also on -gclist@iecc.com. -

diff --git a/doc/overview.md b/doc/overview.md new file mode 100644 index 0000000..fb3aa2a --- /dev/null +++ b/doc/overview.md @@ -0,0 +1,388 @@ +[Interface Overview](gcinterface.md) | [Tutorial Slides](http://www.hboehm.info/gc/04tutorial.pdf) | [FAQ](http://www.hboehm.info/gc/faq.html) | [Example](simple_example.md) | [Download](https://github.com/ivmai/bdwgc/wiki/Download) | [License](http://www.hboehm.info/gc/license.txt) +---|---|---|---|---|--- + +# A garbage collector for C and C++ + + * Platforms + * Scalable multiprocessor versions + * Some collector details + * Further reading + * Current users + * Local Links for this collector + * Local Background Links + * Contacts and Mailing List + +[ This is an updated version of the page formerly at +`www.hpl.hp.com/personal/Hans_Boehm/gc/`, before that at +`http://reality.sgi.com/boehm/gc.html` and before that at +`ftp://ftp.parc.xerox.com/pub/gc/gc.html`. ] + +The +[Boehm](http://www.hboehm.info)-[Demers](http://www.cs.cornell.edu/annual_report/00-01/bios.htm#demers)-[Weiser](http://www.ubiq.com/hypertext/weiser/weiser.html) +conservative Garbage Collector (**BDWGC**) can be used as a garbage collecting +replacement for C `malloc` or C++ `new`. It allows you to allocate memory +basically as you normally would, without explicitly deallocating memory that +is no longer useful. The collector automatically recycles memory when +it determines that it can no longer be otherwise accessed. A simple example +of such a use is given [here](simple_example.md). + +The collector is also used by a number of programming language implementations +that either use C as intermediate code, want to facilitate easier +interoperation with C libraries, or just prefer the simple collector +interface. For a more detailed description of the interface, see +[here](gcinterface.md). + +Alternatively, the garbage collector may be used as a [leak detector](leak.md) +for C or C++ programs, though that is not its primary goal. + +Typically several versions are offered for +[downloading](https://github.com/ivmai/bdwgc/wiki/Download): preview, stable, +legacy. Usually you should use the one marked as the _latest stable_ release. +Preview versions may contain additional features, platform support, but are +likely to be less well tested. The list of changes for each version +is specified on the [releases](https://github.com/ivmai/bdwgc/releases) page. + +The arguments for and against conservative garbage collection in C and C++ are +briefly discussed [here](http://www.hboehm.info/gc/issues.html). The +beginnings of a frequently-asked-questions list are +[here](http://www.hboehm.info/gc/faq.html). + +The garbage collector code is copyrighted by +[Hans-J. Boehm](http://www.hboehm.info), Alan J. Demers, +[Xerox Corporation](http://www.xerox.com/), +[Silicon Graphics](http://www.sgi.com/), and +[Hewlett-Packard Company](http://www.hp.com/). It may be used and copied +without payment of a fee under minimal restrictions. See the README.md file +in the distribution or the [license](http://www.hboehm.info/gc/license.txt) +for more details. **IT IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY +EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK**. + +Empirically, this collector works with most unmodified C programs, simply +by replacing `malloc` with `GC_malloc` calls, replacing `realloc` with +`GC_realloc` calls, and removing free calls. Exceptions are discussed +[here](http://www.hboehm.info/gc/issues.html). + +## Platforms + +The collector is not completely portable, but the distribution includes ports +to most standard PC and UNIX/Linux platforms. The collector should work +on Linux, *BSD, recent Windows versions, MacOS X, HP/UX, Solaris, Tru64, Irix +and a few other operating systems. Some ports are more polished than others. + +Irix pthreads, Linux threads, Win32 threads, Solaris threads (pthreads only), +HP/UX 11 pthreads, Tru64 pthreads, and MacOS X threads are supported in recent +versions. + +### Separately distributed ports + +For MacOS 9/Classic use, Patrick Beard's latest port is available from +`http://homepage.mac.com/pcbeard/gc/`. (Unfortunately, that's now quite dated. +I'm not in a position to test under MacOS. Although I try to incorporate +changes, it is impossible for me to update the project file.) + +Precompiled versions of the collector for NetBSD are available +[here](ftp://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/devel/boehm-gc/README.html). + + +[Debian Linux](http://www.debian.org/) includes prepackaged versions of the +collector. + +## Scalable multiprocessor versions + +Kenjiro Taura, Toshio Endo, and Akinori Yonezawa have made available +a [parallel collector](http://ieeexplore.ieee.org/abstract/document/1592629/) +based on this one. Their collector takes advantage of multiple processors +during a collection. Starting with GC v6.0alpha1 we also do this, though with +more modest processor scalability goals. Our approach is discussed briefly +in [this document](scale.md). + +## Some Collector Details + +The collector uses a [mark-sweep](http://www.hboehm.info/gc/complexity.html) +algorithm. It provides incremental and generational collection under operating +systems which provide the right kind of virtual memory support. (Currently +this includes SunOS[45], IRIX, OSF/1, Linux, and Windows, with varying +restrictions.) It allows [_finalization_](finalization.md) code to be invoked +when an object is collected. It can take advantage of type information +to locate pointers if such information is provided, but it is usually used +without such information. See the README and `gc.h` files in the distribution +for more details. + +For an overview of the implementation, see [here](gcdescr.md). + +The garbage collector distribution includes a C string (`cord.h`) package that +provides for fast concatenation and substring operations on long strings. +A simple curses- and win32-based editor that represents the entire file as +a cord is included as a sample application. + +Performance of the non-incremental collector is typically competitive with +`malloc`/`free` implementations. Both space and time overhead are likely to be +only slightly higher for programs written for `malloc`/`free` (see Detlefs, +Dosser and Zorn's +[Memory Allocation Costs in Large C and C++ Programs](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.30.3073&rep=rep1&type=ps)). +For programs allocating primarily very small objects, the collector may be +faster; for programs allocating primarily large objects it will be slower. +If the collector is used in a multi-threaded environment and configured for +thread-local allocation, it may in some cases significantly outperform +`malloc`/`free` allocation in time. + +We also expect that in many cases any additional overhead will be more than +compensated for by decreased copying etc. if programs are written and tuned +for garbage collection. + +# Further Reading: + +**The beginnings of a frequently asked questions list for this collector are +[here](http://www.hboehm.info/gc/faq.html)**. + +**The following provide information on garbage collection in general**: Paul +Wilson's [garbage collection ftp archive](ftp://ftp.cs.utexas.edu/pub/garbage) +and [GC survey](ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps). + +The Ravenbrook +[Memory Management Reference](http://www.memorymanagement.org/). + +David Chase's [GC FAQ](http://www.iecc.com/gclist/GC-faq.html). + +Richard Jones' +[Garbage Collection Page](https://www.cs.kent.ac.uk/people/staff/rej/gc.html) +and his [book](http://www.cs.kent.ac.uk/people/staff/rej/gcbook/gcbook.html). + +**The following papers describe the collector algorithms we use and the +underlying design decisions at a higher level.** + +(Some of the lower level details can be found [here](gcdescr.md).) + +The first one is not available electronically due to copyright considerations. +Most of the others are subject to ACM copyright. + +Boehm, H., Dynamic Memory Allocation and Garbage Collection, +_Computers in Physics 9_, 3, May/June 1995, pp. 297-303. This is directed +at an otherwise sophisticated audience unfamiliar with memory allocation +issues. The algorithmic details differ from those in the implementation. There +is a related letter to the editor and a minor correction in the next issue. + +Boehm, H., and [M. Weiser](http://www.ubiq.com/hypertext/weiser/weiser.html), +[Garbage Collection in an Uncooperative Environment](http://www.hboehm.info/spe_gc_paper/), +_Software Practice & Experience_, September 1988, pp. 807-820. + +Boehm, H., A. Demers, and S. Shenker, +[Mostly Parallel Garbage Collection](http://www.hboehm.info/gc/papers/pldi91.ps.Z), +Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design +and Implementation, _SIGPLAN Notices 26_, 6 (June 1991), pp. 157-164. + +Boehm, H., +[Space Efficient Conservative Garbage Collection](http://www.hboehm.info/gc/papers/pldi93.ps.Z), +Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design +and Implementation, _SIGPLAN Notices 28_, 6 (June 1993), pp. 197-206. + +Boehm, H., Reducing Garbage Collector Cache Misses, +_Proceedings of the 2000 International Symposium on Memory Management_. +[Official version](http://portal.acm.org/citation.cfm?doid=362422.362438). +[Technical report](http://www.hpl.hp.com/techreports/2000/HPL-2000-99.html) +version. Describes the prefetch strategy incorporated into the collector for +some platforms. Explains why the sweep phase of a _mark-sweep_ collector +should not really be a distinct phase. + +M. Serrano, H. Boehm, Understanding Memory Allocation of Scheme Programs, +_Proceedings of the Fifth ACM SIGPLAN International Conference on Functional +Programming_, 2000, Montreal, Canada, pp. 245-256. +[Official version](http://dl.acm.org/citation.cfm?id=351264). Earlier +[Technical Report](http://www.hpl.hp.com/techreports/2000/HPL-2000-62.html) +version. Includes some discussion of the collector debugging facilities for +identifying causes of memory retention. + +Boehm, H., Fast Multiprocessor Memory Allocation and Garbage Collection, +[HP Labs Technical Report HPL 2000-165](http://www.hpl.hp.com/techreports/2000/HPL-2000-165.html). +Discusses the parallel collection algorithms, and presents some performance +results. + +Boehm, H., Bounding Space Usage of Conservative Garbage Collectors, +_Proceedings of the 2002 ACM SIGPLAN-SIGACT Symposium on Principles +of Programming Languages_, Jan. 2002, pp. 93-100. +[Official version](http://portal.acm.org/citation.cfm?doid=503272.503282). +[Technical report](http://www.hpl.hp.com/techreports/2001/HPL-2001-251.html) +version. Includes a discussion of a collector facility to much more reliably +test for the potential of unbounded heap growth. + +**The following papers discuss language and compiler restrictions necessary +to guaranteed safety of conservative garbage collection.** + +We thank John Levine and JCLT for allowing us to make the second paper +available electronically, and providing PostScript for the final version. + +Boehm, H., +[Simple Garbage-Collector-Safety](http://www.hboehm.info/gc/papers/pldi96.ps.gz), +Proceedings of the ACM SIGPLAN '96 Conference on Programming Language Design +and Implementation. + +Boehm, H., and D. Chase, +[A Proposal for Garbage-Collector-Safe C Compilation](http://www.hboehm.info/gc/papers/boecha.ps.gz), +_Journal of C Language Translation 4_, 2 (December 1992), pp. 126-141. + +**Other related information:** + +The Detlefs, Dosser and Zorn's +[Memory Allocation Costs in Large C and C++ Programs](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.30.3073&rep=rep1&type=ps). +This is a performance comparison of the Boehm-Demers-Weiser collector +to `malloc`/`free`, using programs written for `malloc`/`free`. + +Joel Bartlett's +[mostly copying conservative garbage collector for C++](ftp://gatekeeper.dec.com/pub/compaq/WRL/research-reports/WRL-TN-12.ps). + +John Ellis and David Detlef's +[Safe Efficient Garbage Collection for C++](http://dl.acm.org/citation.cfm?id=1267983) +proposal. + +Henry Baker's [paper collection](http://home.pipeline.com/%7Ehbaker1/). + +Slides for Hans Boehm's +[Allocation and GC Myths](http://www.hboehm.info/gc/myths.ps) talk. + +# Current users: + +Known current users of some variant of this collector include: + +The runtime system for [GCJ](https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcj/), +the static GNU java compiler. + +[W3m](http://w3m.sourceforge.net/), a text-based web browser. + +Some versions of the Xerox DocuPrint printer software. + +The [Mozilla](http://www.mozilla.org/) project, as leak detector. + +The [Mono](http://www.mono-project.com) project, an open source implementation +of the .NET development framework. + +The [DotGNU Portable.NET project](http://www.gnu.org/projects/dotgnu/), +another open source .NET implementation. + +The [Irssi IRC client](http://irssi.org/). + +[The Berkeley Titanium project](http://titanium.cs.berkeley.edu/). + +[The NAGWare f90 Fortran 90 compiler](http://www.nag.co.uk/nagware/NP/NP_desc.asp). + +Elwood Corporation's Eclipse Common Lisp system, C library, and translator. + +The [Bigloo Scheme](http://www-sop.inria.fr/mimosa/fp/Bigloo/) and +[Camloo ML compilers](https://github.com/samoht/camloo) written by Manuel +Serrano and others. + +Brent Benson's +[libscheme](http://www.cs.indiana.edu/scheme-repository/libscheme-vhll/final.html). + +The MzScheme scheme implementation. + +The +[University of Washington Cecil Implementation](http://www.cs.washington.edu/research/projects/cecil/www/cecil-home.html). + +[The Berkeley Sather implementation](http://www1.icsi.berkeley.edu/~sather/). + +[The Berkeley Harmonia Project](http://www.cs.berkeley.edu/~harmonia/harmonia/index.html). + +The +[Toba](http://www.cs.arizona.edu/projects/sumatra/toba/) Java Virtual Machine +to C translator. + +The [Gwydion Dylan compiler](http://www.gwydiondylan.org/). + +The +[GNU Objective C runtime](http://gcc.gnu.org/onlinedocs/gcc/Objective-C.html). + +[Macaulay 2](http://www.math.uiuc.edu/Macaulay2), a system to support research +in algebraic geometry and commutative algebra. + +The [Vesta](http://www.vestasys.org/) configuration management system. + +[Visual Prolog 6](http://www.visual-prolog.com/). + +[Asymptote LaTeX-compatible vector graphics language](http://asymptote.sf.net/). + +# More information on the BDWGC primary site + +[A simple illustration of how to build and use the collector](simple_example.md). + +[Description of alternate interfaces to the garbage collector](gcinterface.md). + +[Slides from an ISMM 2004 tutorial about the GC](http://www.hboehm.info/gc/04tutorial.pdf). + +[A FAQ (frequently asked questions) list](http://www.hboehm.info/gc/faq.html). + +[How to use the garbage collector as a leak detector](leak.md). + +[Some hints on debugging garbage collected applications](debugging.md). + +[An overview of the implementation of the garbage collector](gcdescr.md). + +[The data structure used for fast pointer lookups](tree.md). + +[Scalability of the collector to multiprocessors](scale.md). + +[Directory](http://www.hboehm.info/gc/gc_source/) containing the distribution +files of all garbage collector releases. It duplicates +[Download](https://github.com/ivmai/bdwgc/wiki/Download) page on GitHub. + +# More background information + +[An attempt to establish a bound on space usage of conservative garbage collectors](http://www.hboehm.info/gc/bounds.html). + +[Mark-sweep versus copying garbage collectors and their complexity](http://www.hboehm.info/gc/complexity.html). + +[Pros and cons of conservative garbage collectors, in comparison to other collectors](http://www.hboehm.info/gc/conservative.html). + +[Issues related to garbage collection vs. manual memory management in C/C++](http://www.hboehm.info/gc/issues.html). + +[An example of a case in which garbage collection results in a much faster implementation as a result of reduced synchronization](http://www.hboehm.info/gc/example.html). + +[Slide set discussing performance of nonmoving garbage collectors](http://www.hboehm.info/gc/nonmoving/). + +[Slide set discussing _Destructors, Finalizers, and Synchronization_, POPL 2003](http://www.hboehm.info/popl03/web/). + +[Paper corresponding to above slide set](http://portal.acm.org/citation.cfm?doid=604131.604153) +([Technical Report](http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html) +version). + +[A Java/Scheme/C/C++ garbage collection benchmark](http://www.hboehm.info/gc/gc_bench/). + +[Slides for talk on memory allocation myths](http://www.hboehm.info/gc/myths.ps). + +[Slides for OOPSLA 98 garbage collection talk](http://www.hboehm.info/gc/gctalk.ps). + +[Related papers](http://www.hboehm.info/gc/papers/). + +# Contacts and new release announcements + +GitHub and Stack Overflow are the major two places for communication. + +Technical questions (how to, how does it work, etc.) should be posted +to [Stack Overflow](https://stackoverflow.com/questions/tagged/boehm-gc) with +_boehm-gc_ tag. + +To contribute, please rebase your code to the latest +[master](https://github.com/ivmai/bdwgc/tree/master/) and submit +a [pull request](https://github.com/ivmai/bdwgc/pulls) to GitHub. + +To report a bug, or propose (request) a new feature, create +a [GitHub issue](https://github.com/ivmai/bdwgc/issues). Please make sure +it has not been reported yet by someone else. + +To receive notifications on every release, please subscribe to +[Releases RSS feed](https://github.com/ivmai/bdwgc/releases.atom). +Notifications on all issues and pull requests are available +by [watching](https://github.com/ivmai/bdwgc/watchers) the project. + +Mailing lists (bdwgc-announce@lists.opendylan.org, bdwgc@lists.opendylan.org, +and the former gc-announce@linux.hpl.hp.com and gc@linux.hpl.hp.com) are not +used at this moment. Their content is available +in +[bdwgc-announce](https://github.com/ivmai/bdwgc/files/1037650/bdwgc-announce-mailing-list-archive-2014_02.tar.gz) +and +[bdwgc](https://github.com/ivmai/bdwgc/files/1038163/bdwgc-mailing-list-archive-2017_04.tar.gz) +archive files, respectively. The gc list archive may also be read +at [Narkive](http://bdwgc.opendylan.narkive.com). + +Some prior discussion of the collector has taken place on the gcc java mailing +list, whose archives appear [here](http://gcc.gnu.org/ml/java/), and also +on [gclist@iecc.com](http://lists.tunes.org/mailman/listinfo/gclist). diff --git a/doc/tree.html b/doc/tree.html deleted file mode 100644 index 2bfd583..0000000 --- a/doc/tree.html +++ /dev/null @@ -1,195 +0,0 @@ - - - Two-Level Tree Structure for Fast Pointer Lookup - Hans-J. Boehm, Silicon Graphics (now at HP) - - -

Two-Level Tree Structure for Fast Pointer Lookup

-

-The BDWGC conservative garbage collector uses a 2-level tree -data structure to aid in fast pointer identification. -This data structure is described in a bit more detail here, since -

    -
  1. Variations of the data structure are more generally useful. -
  2. It appears to be hard to understand by reading the code. -
  3. Some other collectors appear to use inferior data structures to -solve the same problem. -
  4. It is central to fast collector operation. -
-A candidate pointer is divided into three sections, the high, -middle, and low bits. The exact division between these -three groups of bits is dependent on the detailed collector configuration. -

-The high and middle bits are used to look up an entry in the table described -here. The resulting table entry consists of either a block descriptor -(struct hblkhdr * or hdr *) -identifying the layout of objects in the block, or an indication that this -address range corresponds to the middle of a large block, together with a -hint for locating the actual block descriptor. Such a hint consist -of a displacement that can be subtracted from the middle bits of the candidate -pointer without leaving the object. -

-In either case, the block descriptor (struct hblkhdr) -refers to a table of object starting addresses (the hb_map field). -The starting address table is indexed by the low bits if the candidate pointer. -The resulting entry contains a displacement to the beginning of the object, -or an indication that this cannot be a valid object pointer. -(If all interior pointer are recognized, pointers into large objects -are handled specially, as appropriate.) - -

The Tree

-

-The rest of this discussion focuses on the two level data structure -used to map the high and middle bits to the block descriptor. -

-The high bits are used as an index into the GC_top_index (really -GC_arrays._top_index) array. Each entry points to a -bottom_index data structure. This structure in turn consists -mostly of an array index indexed by the middle bits of -the candidate pointer. The index array contains the actual -hdr pointers. -

-Thus a pointer lookup consists primarily of a handful of memory references, -and can be quite fast: -

    -
  1. The appropriate bottom_index pointer is looked up in -GC_top_index, based on the high bits of the candidate pointer. -
  2. The appropriate hdr pointer is looked up in the -bottom_index structure, based on the middle bits. -
  3. The block layout map pointer is retrieved from the hdr -structure. (This memory reference is necessary since we try to share -block layout maps.) -
  4. The displacement to the beginning of the object is retrieved from the -above map. -
-

-In order to conserve space, not all GC_top_index entries in fact -point to distinct bottom_index structures. If no address with -the corresponding high bits is part of the heap, then the entry points -to GC_all_nils, a single bottom_index structure consisting -only of NULL hdr pointers. -

-Bottom_index structures contain slightly more information than -just hdr pointers. The asc_link field is used to link -all bottom_index structures in ascending order for fast traversal. -This list is pointed to be GC_all_bottom_indices. -It is maintained with the aid of key field that contains the -high bits corresponding to the bottom_index. - -

64 bit addresses

-

-In the case of 64 bit addresses, this picture is complicated slightly -by the fact that one of the index structures would have to be huge to -cover the entire address space with a two level tree. We deal with this -by turning GC_top_index into a chained hash table, instead of -a simple array. This adds a hash_link field to the -bottom_index structure. -

-The "hash function" consists of dropping the high bits. This is cheap to -compute, and guarantees that there will be no collisions if the heap -is contiguous and not excessively large. - -

A picture

-

-The following is an ASCII diagram of the data structure. -This was contributed by Dave Barrett several years ago. -

-
-                Data Structure used by GC_base in gc3.7:
-                              21-Apr-94
-
-
-
-
-    63                  LOG_TOP_SZ[11]  LOG_BOTTOM_SZ[10]   LOG_HBLKSIZE[13]
-   +------------------+----------------+------------------+------------------+
- p:|                  |   TL_HASH(hi)  |                  |   HBLKDISPL(p)   |
-   +------------------+----------------+------------------+------------------+
-    \-----------------------HBLKPTR(p)-------------------/
-    \------------hi-------------------/
-                      \______ ________/ \________ _______/ \________ _______/
-                             V                   V                  V
-                             |                   |                  |
-           GC_top_index[]    |                   |                  |
- ---      +--------------+   |                   |                  |
-  ^       |              |   |                   |                  |
-  |       |              |   |                   |                  |
- TOP      +--------------+<--+                   |                  |
- _SZ   +-<|      []      | *                     |                  |
-(items)|  +--------------+  if 0 < bi< HBLKSIZE  |                  |
-  |    |  |              | then large object     |                  |
-  |    |  |              | starts at the bi'th   |                  |
-  v    |  |              | HBLK before p.        |             i    |
- ---   |  +--------------+                       |          (word-  |
-       v                                         |         aligned) |
-   bi= |GET_BI(p){->hash_link}->key==hi          |                  |
-       v                                         |                  |
-       |   (bottom_index)  \ scratch_alloc'd     |                  |
-       |   ( struct  bi )  / by get_index()      |                  |
- ---   +->+--------------+                       |                  |
-  ^       |              |                       |                  |
-  ^       |              |                       |                  |
- BOTTOM   |              |   ha=GET_HDR_ADDR(p)  |                  |
-_SZ(items)+--------------+<----------------------+          +-------+
-  |   +--<|   index[]    |                                  |
-  |   |   +--------------+                      GC_obj_map: v
-  |   |   |              |              from      / +-+-+-----+-+-+-+-+  ---
-  v   |   |              |              GC_add   < 0| | |     | | | | |   ^
- ---  |   +--------------+             _map_entry \ +-+-+-----+-+-+-+-+   |
-      |   |   asc_link   |                          +-+-+-----+-+-+-+-+ MAXOBJSZ
-      |   +--------------+                      +-->| | |  j  | | | | |  +1
-      |   |     key      |                      |   +-+-+-----+-+-+-+-+   |
-      |   +--------------+                      |   +-+-+-----+-+-+-+-+   |
-      |   |  hash_link   |                      |   | | |     | | | | |   v
-      |   +--------------+                      |   +-+-+-----+-+-+-+-+  ---
-      |                                         |   |<--MAX_OFFSET--->|
-      |                                         |         (bytes)
-HDR(p)| GC_find_header(p)                       |   |<--MAP_ENTRIES-->|
-      |                           \ from        |    =HBLKSIZE/WORDSZ
-      |    (hdr) (struct hblkhdr) / alloc_hdr() |    (1024 on Alpha)
-      +-->+----------------------+              |    (8/16 bits each)
-GET_HDR(p)| word   hb_sz (words) |              |
-          +----------------------+              |
-          | struct hblk *hb_next |              |
-          +----------------------+              |
-          |mark_proc hb_mark_proc|              |
-          +----------------------+              |
-          | char * hb_map        |>-------------+
-          +----------------------+
-          | ushort hb_obj_kind   |
-          +----------------------+
-          |   hb_last_reclaimed  |
- ---      +----------------------+
-  ^       |                      |
- MARK_BITS|       hb_marks[]     | *if hdr is free, hb_sz
-_SZ(words)|                      |  is the size of a heap chunk (struct hblk)
-  v       |                      |  of at least MININCR*HBLKSIZE bytes (below),
- ---      +----------------------+  otherwise, size of each object in chunk.
-
-Dynamic data structures above are interleaved throughout the heap in blocks of
-size MININCR * HBLKSIZE bytes as done by gc_scratch_alloc which cannot be
-freed; free lists are used (e.g. alloc_hdr).  HBLK's below are collected.
-
-              (struct hblk)                                  HDR_BYTES
- ---      +----------------------+ < HBLKSIZE  ---            (bytes)
-  ^       +-----hb_body----------+ (and WORDSZ) ^         ---   ---
-  |       |                      |   aligned    |          ^     ^
-  |       |                      |              |        hb_sz   |
-  |       |                      |              |       (words)  |
-  |       |      Object 0        |              |          |     |
-  |       |                      |            i |(word-    v     |
-  |       + - - - - - - - - - - -+ ---   (bytes)|aligned) ---    |
-  |       |                      |  ^           |          ^     |
-  |       |                      |  j (words)   |          |     |
-  n *     |      Object 1        |  v           v        hb_sz BODY_SZ
- HBLKSIZE |                      |---------------          |   (words)
- (bytes)  |                      |                         v   MAX_OFFSET
-  |       + - - - - - - - - - - -+                        ---  (bytes)
-  |       |                      | !ALL_INTERIOR_POINTERS  ^     |
-  |       |                      | sets j only for       hb_sz   |
-  |       |      Object N        | valid object offsets.   |     |
-  v       |                      | All objects WORDSZ      v     v
- ---      +----------------------+ aligned.               ---   ---
-
-
- diff --git a/doc/tree.md b/doc/tree.md new file mode 100644 index 0000000..cdf1aa7 --- /dev/null +++ b/doc/tree.md @@ -0,0 +1,175 @@ +# Two-Level Tree Structure for Fast Pointer Lookup + +The Boehm-Demers-Weiser conservative Garbage Collector uses a 2-level tree +data structure to aid in fast pointer identification. This data structure +is described in a bit more detail here, since + + 1. Variations of the data structure are more generally useful. + 2. It appears to be hard to understand by reading the code. + 3. Some other collectors appear to use inferior data structures to solve the + same problem. + 4. It is central to fast collector operation. A candidate pointer + is divided into three sections, the _high_, _middle_, and _low_ bits. The + exact division between these three groups of bits is dependent on the + detailed collector configuration. + +The high and middle bits are used to look up an entry in the table described +here. The resulting table entry consists of either a block descriptor +(`struct hblkhdr *` or `hdr *`) identifying the layout of objects in the +block, or an indication that this address range corresponds to the middle of +a large block, together with a hint for locating the actual block descriptor. +Such a hint consist of a displacement that can be subtracted from the middle +bits of the candidate pointer without leaving the object. + +In either case, the block descriptor (`struct hblkhdr`) refers to a table +of object starting addresses (the `hb_map` field). The starting address table +is indexed by the low bits if the candidate pointer. The resulting entry +contains a displacement to the beginning of the object, or an indication that +this cannot be a valid object pointer. (If all interior pointer are +recognized, pointers into large objects are handled specially, +as appropriate.) + +## The Tree + +The rest of this discussion focuses on the two level data structure used +to map the high and middle bits to the block descriptor. + +The high bits are used as an index into the `GC_top_index` (really +`GC_arrays._top_index`) array. Each entry points to a `bottom_index` data +structure. This structure in turn consists mostly of an array `index` indexed +by the middle bits of the candidate pointer. The `index` array contains the +actual `hdr` pointers. + +Thus a pointer lookup consists primarily of a handful of memory references, +and can be quite fast: + + 1. The appropriate `bottom_index` pointer is looked up in `GC_top_index`, + based on the high bits of the candidate pointer. + 2. The appropriate `hdr` pointer is looked up in the `bottom_index` + structure, based on the middle bits. + 3. The block layout map pointer is retrieved from the `hdr` structure. (This + memory reference is necessary since we try to share block layout maps.) + 4. The displacement to the beginning of the object is retrieved from the + above map. + +In order to conserve space, not all `GC_top_index` entries in fact point +to distinct `bottom_index` structures. If no address with the corresponding +high bits is part of the heap, then the entry points to `GC_all_nils`, +a single `bottom_index` structure consisting only of `NULL` `hdr` pointers. + +`Bottom_index` structures contain slightly more information than just `hdr` +pointers. The `asc_link` field is used to link all `bottom_index` structures +in ascending order for fast traversal. This list is pointed to be +`GC_all_bottom_indices`. It is maintained with the aid of `key` field that +contains the high bits corresponding to the `bottom_index`. + +## 64-bit addresses + +In the case of 64-bit addresses, this picture is complicated slightly by the +fact that one of the index structures would have to be huge to cover the +entire address space with a two level tree. We deal with this by turning +`GC_top_index` into a chained hash table, instead of a simple array. This adds +a `hash_link` field to the `bottom_index` structure. + +The _hash function_ consists of dropping the high bits. This is cheap +to compute, and guarantees that there will be no collisions if the heap +is contiguous and not excessively large. + +## A picture + +The following is an _ASCII_ diagram of the data structure used by GC_base (as +of GC v3.7, Apr 21, 1994). This was contributed by Dave Barrett. + + + 63 LOG_TOP_SZ[11] LOG_BOTTOM_SZ[10] LOG_HBLKSIZE[13] + +------------------+----------------+------------------+------------------+ + p:| | TL_HASH(hi) | | HBLKDISPL(p) | + +------------------+----------------+------------------+------------------+ + \-----------------------HBLKPTR(p)-------------------/ + \------------hi-------------------/ + \______ ________/ \________ _______/ \________ _______/ + V V V + | | | + GC_top_index[] | | | + --- +--------------+ | | | + ^ | | | | | + | | | | | | + TOP +--------------+<--+ | | + _SZ +-<| [] | * | | + (items)| +--------------+ if 0 < bi< HBLKSIZE | | + | | | | then large object | | + | | | | starts at the bi'th | | + v | | | HBLK before p. | i | + --- | +--------------+ | (word- | + v | aligned) | + bi= |GET_BI(p){->hash_link}->key==hi | | + v | | + | (bottom_index) \ scratch_alloc'd | | + | ( struct bi ) / by get_index() | | + --- +->+--------------+ | | + ^ | | | | + ^ | | | | + BOTTOM | | ha=GET_HDR_ADDR(p) | | + _SZ(items)+--------------+<----------------------+ +-------+ + | +--<| index[] | | + | | +--------------+ GC_obj_map: v + | | | | from / +-+-+-----+-+-+-+-+ --- + v | | | GC_add < 0| | | | | | | | ^ + --- | +--------------+ _map_entry \ +-+-+-----+-+-+-+-+ | + | | asc_link | +-+-+-----+-+-+-+-+ MAXOBJSZ + | +--------------+ +-->| | | j | | | | | +1 + | | key | | +-+-+-----+-+-+-+-+ | + | +--------------+ | +-+-+-----+-+-+-+-+ | + | | hash_link | | | | | | | | | | v + | +--------------+ | +-+-+-----+-+-+-+-+ --- + | | |<--MAX_OFFSET--->| + | | (bytes) + HDR(p)| GC_find_header(p) | |<--MAP_ENTRIES-->| + | \ from | =HBLKSIZE/WORDSZ + | (hdr) (struct hblkhdr) / alloc_hdr() | (1024 on Alpha) + +-->+----------------------+ | (8/16 bits each) + GET_HDR(p)| word hb_sz (words) | | + +----------------------+ | + | struct hblk *hb_next | | + +----------------------+ | + |mark_proc hb_mark_proc| | + +----------------------+ | + | char * hb_map |>-------------+ + +----------------------+ + | ushort hb_obj_kind | + +----------------------+ + | hb_last_reclaimed | + --- +----------------------+ + ^ | | + MARK_BITS| hb_marks[] | * if hdr is free, hb_sz is the size of + _SZ(words)| | a heap chunk (struct hblk) of at least + v | | MININCR*HBLKSIZE bytes (below), + --- +----------------------+ otherwise, size of each object in chunk. + + +Dynamic data structures above are interleaved throughout the heap in blocks +of size `MININCR * HBLKSIZE` bytes as done by `gc_scratch_alloc` which cannot +be freed; free lists are used (e.g. `alloc_hdr`). `HBLK`'s below are +collected. + + + (struct hblk) HDR_BYTES + --- +----------------------+ < HBLKSIZE --- (bytes) + ^ +-----hb_body----------+ (and WORDSZ) ^ --- --- + | | | aligned | ^ ^ + | | | | hb_sz | + | | | | (words) | + | | Object 0 | | | | + | | | i |(word- v | + | + - - - - - - - - - - -+ --- (bytes)|aligned) --- | + | | | ^ | ^ | + | | | j (words) | | | + n * | Object 1 | v v hb_sz BODY_SZ + HBLKSIZE | |--------------- | (words) + (bytes) | | v MAX_OFFSET + | + - - - - - - - - - - -+ --- (bytes) + | | | !ALL_INTERIOR_POINTERS ^ | + | | | sets j only for hb_sz | + | | Object N | valid object offsets. | | + v | | All objects WORDSZ v v + --- +----------------------+ aligned. --- --- -- 2.7.4