#+TITLE: UglifyJS -- a JavaScript parser/compressor/beautifier
#+KEYWORDS: javascript, js, parser, compiler, compressor, mangle, minify, minifier
#+DESCRIPTION: a JavaScript parser/compressor/beautifier in JavaScript
#+STYLE: <link rel="stylesheet" type="text/css" href="docstyle.css" />
#+EMAIL: mihai.bazon@gmail.com

I started working on UglifyJS's successor, version 2.  It's almost a full
rewrite (the parser is heavily modified; everything else starts from
scratch).  I've detailed my reasons in the README; see:

[[https://github.com/mishoo/UglifyJS2][https://github.com/mishoo/UglifyJS2]]

Version 1 will continue to be maintained for fixing show-stopper bugs, but
no new features should be expected.

Please help me focus on version 2 by [[http://pledgie.com/campaigns/18110][making a donation]]!

* UglifyJS --- a JavaScript parser/compressor/beautifier

This package implements a general-purpose JavaScript
parser/compressor/beautifier toolkit.  It is developed on [[http://nodejs.org/][NodeJS]], but it
should work on any JavaScript platform supporting the CommonJS module system
(and if your platform of choice doesn't support CommonJS, you can easily
implement it, or discard the =exports.*= lines from UglifyJS sources).

The tokenizer/parser generates an abstract syntax tree from JS code.  You
can then traverse the AST to learn more about the code, or do various
manipulations on it.  This part is implemented in [[../lib/parse-js.js][parse-js.js]] and it's a
port to JavaScript of the excellent [[http://marijn.haverbeke.nl/parse-js/][parse-js]] Common Lisp library from [[http://marijn.haverbeke.nl/][Marijn
Haverbeke]].

(See [[http://github.com/mishoo/cl-uglify-js][cl-uglify-js]] if you're looking for the Common Lisp version of
UglifyJS.)

The second part of this package, implemented in [[../lib/process.js][process.js]], inspects and
manipulates the AST generated by the parser to provide the following:

- ability to re-generate JavaScript code from the AST, optionally
  indented.  You can use this if you want to “beautify” a program that has
  been compressed, so that you can inspect the source.  You can also run
  our code generator to print out an AST without any whitespace, so you
  achieve compression as well.

- shorten variable names (usually to single characters).  Our mangler will
  analyze the code and generate proper variable names, depending on scope
  and usage, and is smart enough to deal with globals defined elsewhere, or
  with =eval()= calls or =with{}= statements.  In short, if =eval()= or
  =with{}= are used in some scope, then all variables in that scope and any
  variables in the parent scopes will remain unmangled, and any references
  to such variables remain unmangled as well.

- various small optimizations that may lead to faster code but certainly
  lead to smaller code.  Where possible, we do the following:

  - foo["bar"]  ==>  foo.bar

  - remove block brackets ={}=

  - join consecutive var declarations:
    var a = 10; var b = 20; ==> var a=10,b=20;

  - resolve simple constant expressions: 1 + 2 * 3 ==> 7.  We only do the
    replacement if the result occupies fewer bytes; for example 1/3 would
    translate to 0.333333333333, so in this case we don't replace it.

  - consecutive statements in blocks are merged into a sequence; in many
    cases, this leaves blocks with a single statement, so then we can remove
    the block brackets.

  - various optimizations for IF statements:

    - if (foo) bar(); else baz(); ==> foo?bar():baz();
    - if (!foo) bar(); else baz(); ==> foo?baz():bar();
    - if (foo) bar(); ==> foo&&bar();
    - if (!foo) bar(); ==> foo||bar();
    - if (foo) return bar(); else return baz(); ==> return foo?bar():baz();
    - if (foo) return bar(); else something(); ==> {if(foo)return bar();something()}

  - remove some unreachable code and warn about it (code that follows a
    =return=, =throw=, =break= or =continue= statement, except
    function/variable declarations).

- act as a limited version of a pre-processor (cf. the pre-processor of
  C/C++) to allow you to safely replace selected global symbols with
  specified values.  When combined with the optimisations above this can
  make UglifyJS operate slightly more like a compilation process, in
  that when certain symbols are replaced by constant values, entire code
  blocks may be optimised away as unreachable.

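The IF rewrites listed above rely on the short forms behaving the same as the statements they replace.  The following is only a hand-written check of that equivalence, not UglifyJS output; the function names =withIf= and =compressed= and the logged strings are made up for the illustration.

```javascript
// Long form: plain if/else statements.
function withIf(foo, log) {
    if (foo) log.push("bar"); else log.push("baz");
    if (!foo) log.push("qux");
    return log;
}

// Short form: the same logic written with ?: and ||.
function compressed(foo, log) {
    foo ? log.push("bar") : log.push("baz");
    foo || log.push("qux");
    return log;
}

// Compare both forms over a mix of truthy and falsy inputs.
var inputs = [true, false, 0, 1, "", "x", null, undefined];
var same = inputs.every(function (v) {
    return JSON.stringify(withIf(v, [])) === JSON.stringify(compressed(v, []));
});
console.log(same); // true
```

Both forms take the same branch for every input because =if=, =?:= and =||= all test truthiness in the same way.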
** <<Unsafe transformations>>

The following transformations can in theory break code, although they're
probably safe in most practical cases.  To enable them you need to pass the
=--unsafe= flag.

*** Calls involving the global Array constructor

The following transformations occur:

  new Array(1, 2, 3, 4)  => [1,2,3,4]
  Array(a, b, c)         => [a,b,c]
  new Array(5)           => Array(5)
  new Array(a)           => Array(a)

These are all safe if the Array name isn't redefined.  JavaScript does allow
one to globally redefine Array (and pretty much everything, in fact) but I
personally don't see why anyone would do that.

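To see why a redefined Array matters, here is a small hypothetical illustration (not taken from UglifyJS).  It temporarily replaces the global Array by plain assignment and compares the constructor call with the literal that the transformation would produce.  The replacement function and the variable names are invented for the example.

```javascript
// Keep a reference so the real constructor can be restored afterwards.
var OriginalArray = Array;
var viaConstructor, viaLiteral;
try {
    // Plain assignment, with no `var` or `function` declaration.
    Array = function () { return { fake: true }; };
    viaConstructor = new Array(1, 2, 3); // goes through the replacement
    viaLiteral = [1, 2, 3];              // literals ignore the replacement
} finally {
    Array = OriginalArray;               // always restore the real Array
}

console.log(viaConstructor.fake === true);      // true
console.log(OriginalArray.isArray(viaLiteral)); // true
```

With such a redefinition in place, rewriting =new Array(1, 2, 3)= to =[1,2,3]= would change what the program computes.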
UglifyJS does handle the case where Array is redefined locally, or even
globally but with a =function= or =var= declaration.  Therefore, in the
following cases UglifyJS *doesn't touch* calls or instantiations of Array:

  // case 1.  globally declared variable
  var Array;
  new Array(1, 2, 3);

  // or (can be declared later)
  new Array(1, 2, 3);
  var Array;

  // or (can be a function)
  function Array() { ... }

  // case 2.  declared in a function
  (function(){
    a = new Array(1, 2, 3);
    var Array;
  })();
  (function(Array){
    return Array(5, 6, 7);
  })();
  (function(){
    return new Array(1, 2, 3, 4);
    function Array() { ... }
  })();

*** =obj.toString()= ==> =obj+“”=

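A minimal sketch of where the two forms can differ, assuming an object that defines both =valueOf= and =toString= (the object below is made up for the illustration):

```javascript
// An object whose valueOf() and toString() disagree.
var obj = {
    valueOf: function () { return 42; },
    toString: function () { return "forty-two"; }
};

var explicit = obj.toString(); // calls toString() directly
var concatenated = obj + "";   // string concatenation consults valueOf() first

console.log(explicit);     // forty-two
console.log(concatenated); // 42

// For ordinary values the two forms agree.
var plainSame = (123).toString() === 123 + "";
console.log(plainSame);    // true
```

Objects like this are rare in practice, which is why the rewrite is offered only as an unsafe option.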
** Install (NPM)

UglifyJS is now available through NPM --- =npm install uglify-js= should do
the job.

** Install latest code from GitHub

  ## clone the repository
  mkdir -p /where/you/wanna/put/it
  cd /where/you/wanna/put/it
  git clone git://github.com/mishoo/UglifyJS.git

  ## make the module available to Node
  mkdir -p ~/.node_libraries/
  cd ~/.node_libraries/
  ln -s /where/you/wanna/put/it/UglifyJS/uglify-js.js

  ## and if you want the CLI script too:
  mkdir -p ~/bin
  cd ~/bin
  ln -s /where/you/wanna/put/it/UglifyJS/bin/uglifyjs
  # (then add ~/bin to your $PATH if it's not there already)

** Usage

There is a command-line tool that exposes the functionality of this library
for your shell-scripting needs:

  uglifyjs [ options... ] [ filename ]

=filename= should be the last argument and should name the file from which
to read the JavaScript code.  If you don't specify it, it will read code
from STDIN.

Supported options:

- =-b= or =--beautify= --- output indented code; when passed, additional
  options control the beautifier:

  - =-i N= or =--indent N= --- indentation level (number of spaces)

  - =-q= or =--quote-keys= --- quote keys in literal objects (by default,
    only keys that cannot be identifier names will be quoted).

- =-c= or =--consolidate-primitive-values= --- consolidates null, Boolean,
  and String values.  Known as aliasing in the Closure Compiler.  Worsens the
  data compression ratio of gzip.

- =--ascii= --- pass this argument to encode non-ASCII characters as
  =\uXXXX= sequences.  By default UglifyJS won't bother to do it and will
  output Unicode characters instead.  (The output is always encoded in UTF8,
  but if you pass this option you'll only get ASCII.)

- =-nm= or =--no-mangle= --- don't mangle names.

- =-nmf= or =--no-mangle-functions= --- in case you want to mangle variable
  names, but not touch function names.

- =-ns= or =--no-squeeze= --- don't call =ast_squeeze()= (which does various
  optimizations that result in smaller, less readable code).

- =-mt= or =--mangle-toplevel= --- mangle names in the toplevel scope too
  (by default we don't do this).

- =--no-seqs= --- when =ast_squeeze()= is called (thus, unless you pass
  =--no-squeeze=) it will reduce consecutive statements in blocks into a
  sequence.  For example, "a = 10; b = 20; foo();" will be written as
  "a=10,b=20,foo();".  On various occasions, this allows us to discard the
  block brackets (since the block becomes a single statement).  This is ON
  by default because it seems safe and saves a few hundred bytes on some
  libs that I tested it on, but pass =--no-seqs= to disable it.

- =--no-dead-code= --- by default, UglifyJS will remove code that is
  obviously unreachable (code that follows a =return=, =throw=, =break= or
  =continue= statement and is not a function/variable declaration).  Pass
  this option to disable this optimization.

- =-nc= or =--no-copyright= --- by default, =uglifyjs= will keep the initial
  comment tokens in the generated code (assumed to be copyright information
  etc.).  If you pass this option, they will be discarded.

- =-o filename= or =--output filename= --- put the result in =filename=.  If
  this isn't given, the result goes to standard output (or see the next
  option).

- =--overwrite= --- if the code is read from a file (not from STDIN) and you
  pass =--overwrite= then the output will be written to the same file.

- =--ast= --- pass this if you want to get the Abstract Syntax Tree instead
  of JavaScript as output.  Useful for debugging or learning more about the
  internals.

- =-v= or =--verbose= --- output some notes on STDERR (for now just how long
  each operation takes).

- =-d SYMBOL[=VALUE]= or =--define SYMBOL[=VALUE]= --- replaces all
  instances of the specified symbol where it is used as an identifier
  (except where the symbol has been properly declared by a =var=
  declaration, used as a function parameter, or similar) with the
  specified value.  This argument may be specified multiple times to
  define multiple symbols.  If no value is specified, the symbol will be
  replaced with the value =true=; otherwise you can specify a numeric
  value (such as =1024=), a quoted string value (such as ="object"= or
  ='https://github.com'=), or the name of another symbol or keyword
  (such as =null= or =document=).

  This allows you, for example, to assign meaningful names to key
  constant values but discard the symbolic names in the uglified
  version for brevity/efficiency.  When used with care, it also allows
  UglifyJS to operate as a form of *conditional compilation*,
  whereby defining appropriate values may, by dint of the constant
  folding and dead code removal features above, remove entire
  superfluous code blocks (e.g. completely remove instrumentation or
  trace code for production use).

  Where string values are being defined, the handling of quotes is
  likely to be subject to the specifics of your command shell
  environment, so you may need to experiment with quoting styles
  depending on your platform, or you may find the option
  =--define-from-module= more suitable.

- =--define-from-module SOMEMODULE= --- loads the named module (as
  per the NodeJS =require()= function) and iterates over all the exported
  properties of the module, defining each one as a symbol (as if by the
  =--define= option) named after the property (i.e. without the module
  name prefix) and given the value of the property.  This is a much
  easier way to handle and document groups of symbols to be defined than
  a large number of =--define= options.

- =--unsafe= --- enable other additional optimizations that are known to be
  unsafe in some contrived situations, but could still be generally useful.
  The transformations are:

  - foo.toString()  ==>  foo+""
  - new Array(x,...)  ==> [x,...]
  - new Array(x) ==> Array(x)

- =--max-line-len= (default 32K characters) --- add a newline after around
  32K characters.  I've seen both FF and Chrome croak when all the code was
  on a single line of around 670K.  Pass =--max-line-len 0= to disable this
  feature.

- =--reserved-names= --- some libraries rely on certain names to be used, as
  pointed out in issues #92 and #81, so this option allows you to exclude
  such names from the mangler.  For example, to keep names =require= and
  =$super= intact you'd specify =--reserved-names "require,$super"=.

- =--inline-script= --- when you want to include the output literally in an
  HTML =<script>= tag you can use this option to prevent =</script= from
  showing up in the output.

- =--lift-vars= --- when you pass this, UglifyJS will apply the following
  transformations (see the notes in API, =ast_lift_variables=):

  - put all =var= declarations at the start of the scope
  - make sure a variable is declared only once
  - discard unused function arguments
  - discard unused inner (named) functions
  - finally, try to merge assignments into that one =var= declaration, if
    possible.

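As an illustration of what the =--ascii= option described above means for the output, the sketch below escapes every character above U+007F as a =\uXXXX= sequence.  It is a stand-alone approximation written for this document, not the code UglifyJS uses; the function name =toAsciiOnly= is hypothetical.

```javascript
// Sketch only: replace each non-ASCII UTF-16 code unit with \uXXXX.
function toAsciiOnly(str) {
    return str.replace(/[\u0080-\uffff]/g, function (ch) {
        var hex = ch.charCodeAt(0).toString(16);
        // Left-pad the hex code to four digits.
        return "\\u" + "0000".slice(hex.length) + hex;
    });
}

console.log(toAsciiOnly('var s = "\u00e9t\u00e9";'));
// var s = "\u00e9t\u00e9";
```

The escaped text is pure ASCII, yet a JavaScript engine reads it back as the same string.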
** API

To use the library from JavaScript, you'd do the following (example for
NodeJS):

  var jsp = require("uglify-js").parser;
  var pro = require("uglify-js").uglify;

  var orig_code = "... JS code here";
  var ast = jsp.parse(orig_code); // parse code and get the initial AST
  ast = pro.ast_mangle(ast); // get a new AST with mangled names
  ast = pro.ast_squeeze(ast); // get an AST with compression optimizations
  var final_code = pro.gen_code(ast); // compressed code here

The above performs the full compression that is possible right now.  As you
can see, there is a sequence of steps which you can apply.  For example, if
you want compressed output but for some reason you don't want to mangle
variable names, you would simply skip the line that calls
=pro.ast_mangle(ast)=.

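The steps above can be wrapped in a small helper that makes each stage optional.  This is a sketch under assumptions: the helper name =compress= and its =mangle= / =squeeze= flags are invented here, and the library object is passed in as an argument (normally the result of =require("uglify-js")=) so the function does not depend on how the package is installed.

```javascript
// `uglify` is expected to expose `parser` and `uglify`, as in the
// example above.  `opts.mangle` and `opts.squeeze` are hypothetical flags.
function compress(uglify, code, opts) {
    opts = opts || {};
    var jsp = uglify.parser;
    var pro = uglify.uglify;
    var ast = jsp.parse(code);                              // initial AST
    if (opts.mangle !== false) ast = pro.ast_mangle(ast);   // optional step
    if (opts.squeeze !== false) ast = pro.ast_squeeze(ast); // optional step
    return pro.gen_code(ast);                               // back to source
}

// Usage, assuming uglify-js is installed:
//   var out = compress(require("uglify-js"), "var foo = 1;", { mangle: false });
```

Skipping a stage is then a matter of passing =false= for the corresponding flag rather than editing the call sequence.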
Some of these functions take optional arguments.  Here's a description:

- =jsp.parse(code, strict_semicolons)= -- parses JS code and returns an AST.
  =strict_semicolons= is optional and defaults to =false=.  If you pass
  =true= then the parser will throw an error when it expects a semicolon and
  it doesn't find it.  For most JS code you don't want that, but it's useful
  if you want to strictly sanitize your code.

- =pro.ast_lift_variables(ast)= -- merge and move =var= declarations to the
  top of the scope; discard unused function arguments or variables; discard
  unused (named) inner functions.  It also tries to merge assignments
  following the =var= declaration into it.

  If your code is very hand-optimized concerning =var= declarations,
  lifting variable declarations might actually increase size.  For me it
  helps out.  On jQuery it adds 865 bytes (243 after gzip).  YMMV.  Also
  note that (since it's not enabled by default) this operation isn't yet
  heavily tested (please report if you find issues!).

  Note that although it might increase the output size, it's technically
  more correct: in certain situations, dead code removal might drop
  variable declarations, which would not happen if the variables are
  lifted in advance.

  Here's an example of what it does:

  function f(a, b, c, d, e) {
      var q;
      var w;
      w = 10; q = 20;
      for (var i = 1; i < 10; ++i) {
          var boo = foo(a);
      }
      for (var i = 0; i < 1; ++i) {
          var boo = bar(c);
      }
      function foo(){ ... }
      function bar(){ ... }
      function baz(){ ... }
  }

  // transforms into ==>

  function f(a, b, c) {
      var i, boo, w = 10, q = 20;
      for (i = 1; i < 10; ++i) {
          boo = foo(a);
      }
      for (i = 0; i < 1; ++i) {
          boo = bar(c);
      }
      function foo() { ... }
      function bar() { ... }
  }

- =pro.ast_mangle(ast, options)= -- generates a new AST containing mangled
  (compressed) variable and function names.  It supports the following
  options:

  - =toplevel= -- mangle toplevel names (by default we don't touch them).
  - =except= -- an array of names to exclude from compression.
  - =defines= -- an object with properties named after symbols to
    replace (see the =--define= option for the script) and the values
    representing the AST replacement value.  For example,
    ={ defines: { DEBUG: ['name', 'false'], VERSION: ['string', '1.0'] } }=

- =pro.ast_squeeze(ast, options)= -- employs further optimizations designed
  to reduce the size of the code that =gen_code= would generate from the
  AST.  Returns a new AST.  =options= can be a hash; the supported options
  are:

  - =make_seqs= (default true) which will cause consecutive statements in a
    block to be merged using the "sequence" (comma) operator

  - =dead_code= (default true) which will remove unreachable code.

- =pro.gen_code(ast, options)= -- generates JS code from the AST.  By
  default it's minified, but using the =options= argument you can get nicely
  formatted output.  =options= is, well, optional :-) and if you pass it, it
  must be an object and supports the following properties (below you can see
  the default values):

  - =beautify: false= -- pass =true= if you want indented output
  - =indent_start: 0= (only applies when =beautify= is =true=) -- initial
    indentation in spaces
  - =indent_level: 4= (only applies when =beautify= is =true=) --
    indentation level, in spaces (pass an even number)
  - =quote_keys: false= -- if you pass =true= it will quote all keys in
    literal objects
  - =space_colon: false= (only applies when =beautify= is =true=) -- whether
    to put a space before the colon in object literals
  - =ascii_only: false= -- pass =true= if you want to encode non-ASCII
    characters as =\uXXXX=.
  - =inline_script: false= -- pass =true= to escape occurrences of
    =</script= in strings

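The option objects described above are plain JavaScript values, so they can be built ahead of time.  The sketch below assumes a hypothetical helper, =toDefines=, that converts ordinary values into the =defines= form shown for =ast_mangle=; it only produces the two node kinds that appear in that example (=name= and =string=) and rejects anything else.

```javascript
// Hypothetical helper: map plain values to the ['name', ...] and
// ['string', ...] forms shown in the `defines` example.
function toDefines(values) {
    var defines = {};
    Object.keys(values).forEach(function (name) {
        var v = values[name];
        if (typeof v === "string") {
            defines[name] = ["string", v];
        } else if (v === true || v === false || v === null) {
            defines[name] = ["name", String(v)];
        } else {
            throw new TypeError("unsupported value for " + name);
        }
    });
    return defines;
}

// Options for ast_mangle and gen_code, using only documented properties.
var mangleOptions = {
    toplevel: false,
    except: ["require"],
    defines: toDefines({ DEBUG: false, VERSION: "1.0" })
};
var printOptions = { beautify: true, indent_level: 2, quote_keys: false };

// Usage, assuming `pro` and `ast` from the example earlier in this section:
//   ast = pro.ast_mangle(ast, mangleOptions);
//   var code = pro.gen_code(ast, printOptions);
```

Keeping the options in named variables makes it easy to reuse the same settings across several files.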
*** Beautifier shortcoming -- no more comments

The beautifier can be used as a general purpose indentation tool.  It's
useful when you want to make a minified file readable.  One limitation,
though, is that it discards all comments, so you don't really want to use it
to reformat your code, unless you don't have, or don't care about, comments.

In fact it's not the beautifier that discards comments --- they are dumped at
the parsing stage, when we build the initial AST.  Comments don't really
make sense in the AST, and while we could add nodes for them, it would be
inconvenient because we'd have to add special rules to ignore them at all
the processing stages.

*** Use as a code pre-processor

The =--define= option can be used, particularly when combined with the
constant folding logic, as a form of pre-processor to enable or remove
particular constructions, such as might be used for instrumenting
development code, or to produce variations aimed at a specific
platform.

The code below illustrates the way this can be done, and how the
symbol replacement is performed.

  CLAUSE1: if (typeof DEVMODE === 'undefined') {
      DEVMODE = true;
  }

  CLAUSE2: function init() {
      if (DEVMODE) {
          console.log("init() called");
      }
      DEVMODE && console.log("init() complete");
  }

  CLAUSE3: function reportDeviceStatus(device) {
      var DEVMODE = device.mode, DEVNAME = device.name;
      if (DEVMODE === 'open') { ... }
  }

When the above code is normally executed, the undeclared global
variable =DEVMODE= will be assigned the value *true* (see =CLAUSE1=)
and so the =init()= function (=CLAUSE2=) will write messages to the
console log when executed, but in =CLAUSE3= a locally declared
variable will mask access to the =DEVMODE= global symbol.

If the above code is processed by UglifyJS with an argument of
=--define DEVMODE=false= then UglifyJS will replace =DEVMODE= with the
boolean constant value *false* within =CLAUSE1= and =CLAUSE2=, but it
will leave =CLAUSE3= as it stands because there =DEVMODE= resolves to
a validly declared variable.

Moreover, the constant-folding features of UglifyJS will recognise
that the =if= condition of =CLAUSE1= is thus always false, and so will
remove the test and body of =CLAUSE1= altogether (including the
otherwise slightly problematic statement =false = true;= which it
will have formed by replacing =DEVMODE= in the body).  Similarly,
within =CLAUSE2= both calls to =console.log()= will be removed.

In this way you can mimic, to a limited degree, the functionality of
the C/C++ pre-processor to enable or completely remove blocks
depending on how certain symbols are defined, perhaps using UglifyJS
to generate different versions of source aimed at different
environments.

It is recommended (but not made mandatory) that symbols designed for
this purpose are given names consisting of =UPPER_CASE_LETTERS= to
distinguish them from other (normal) symbols and avoid the sort of
clash that =CLAUSE3= above illustrates.

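When several such symbols are needed, they can be collected in one module for the =--define-from-module= option instead of repeating =--define=.  The module below is a sketch: the file name =build-defines.js= and the symbols =LOG_LEVEL= and =BUILD_TAG= are made up for the illustration; only =DEVMODE= comes from the example above.

```javascript
// build-defines.js (hypothetical file name).
// Each exported property is meant to become one symbol definition,
// as if it had been passed through --define.
var defines = {
    DEVMODE: false,          // disables the development-only branches
    LOG_LEVEL: 0,            // a numeric value
    BUILD_TAG: "production"  // a string value, no shell quoting needed
};

// Export for require(); guarded so the snippet also runs outside CommonJS.
if (typeof module !== "undefined" && module.exports) {
    module.exports = defines;
}
```

The module would then be named on the command line with =--define-from-module=, where it is loaded as per the NodeJS =require()= function.  The upper-case names follow the naming recommendation above.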
** Compression -- how good is it?

Here are updated statistics.  (I also updated my Google Closure and YUI
installations.)

We're still a lot better than YUI in terms of compression, though slightly
slower.  We're still a lot faster than Closure, and compression after gzip
is comparable.

Sizes are in bytes; the figures in parentheses are run times.

| File                        | UglifyJS         | UglifyJS+gzip | Closure          | Closure+gzip | YUI              | YUI+gzip |
|-----------------------------+------------------+---------------+------------------+--------------+------------------+----------|
| jquery-1.6.2.js             | 91001 (0:01.59)  |         31896 | 90678 (0:07.40)  |        31979 | 101527 (0:01.82) |    34646 |
| paper.js                    | 142023 (0:01.65) |         43334 | 134301 (0:07.42) |        42495 | 173383 (0:01.58) |    48785 |
| prototype.js                | 88544 (0:01.09)  |         26680 | 86955 (0:06.97)  |        26326 | 92130 (0:00.79)  |    28624 |
| thelib-full.js (DynarchLIB) | 251939 (0:02.55) |         72535 | 249911 (0:09.05) |        72696 | 258869 (0:01.94) |    76584 |

** Bugs?

Unfortunately, for the time being there is no automated test suite.  But I
ran the compressor manually on non-trivial code, and then I tested that the
generated code works as expected.  A few hundred times.

DynarchLIB was started in times when there was no good JS minifier.
Therefore I was quite religious about trying to write short code manually,
and as such DL contains a lot of syntactic hacks[1] such as “foo == bar ? a
= 10 : b = 20”, though the more readable version would clearly be to use
=if/else=.

Since the parser/compressor runs fine on DL and jQuery, I'm quite confident
that it's solid enough for production use.  If you can identify any bugs,
I'd love to hear about them ([[http://groups.google.com/group/uglifyjs][use the Google Group]] or email me directly).

[1] I even reported a few bugs and suggested some fixes in the original
    [[http://marijn.haverbeke.nl/parse-js/][parse-js]] library, and Marijn pushed fixes literally in minutes.

** Links

- Twitter: [[http://twitter.com/UglifyJS][@UglifyJS]]
- Project at GitHub: [[http://github.com/mishoo/UglifyJS][http://github.com/mishoo/UglifyJS]]
- Google Group: [[http://groups.google.com/group/uglifyjs][http://groups.google.com/group/uglifyjs]]
- Common Lisp JS parser: [[http://marijn.haverbeke.nl/parse-js/][http://marijn.haverbeke.nl/parse-js/]]
- JS-to-Lisp compiler: [[http://github.com/marijnh/js][http://github.com/marijnh/js]]
- Common Lisp JS uglifier: [[http://github.com/mishoo/cl-uglify-js][http://github.com/mishoo/cl-uglify-js]]

** License

UglifyJS is released under the BSD license:

  Copyright 2010 (c) Mihai Bazon <mihai.bazon@gmail.com>
  Based on parse-js (http://marijn.haverbeke.nl/parse-js/).

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions
  are met:

      * Redistributions of source code must retain the above
        copyright notice, this list of conditions and the following
        disclaimer.

      * Redistributions in binary form must reproduce the above
        copyright notice, this list of conditions and the following
        disclaimer in the documentation and/or other materials
        provided with the distribution.

  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER “AS IS” AND ANY
  EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
  PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE
  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
  OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
  PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
  PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
  THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
  TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
  THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  SUCH DAMAGE.