From: Father Chrysostomos Date: Tue, 7 Aug 2012 20:25:24 +0000 (-0700) Subject: Prevent double frees/crashes with format syntax errs X-Git-Tag: upstream/5.20.0~5854 X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=6c7ae946aa60721f014be4f2a95bdd91470cf3d9;p=platform%2Fupstream%2Fperl.git Prevent double frees/crashes with format syntax errs This was brought up in ticket #43425. The new slab allocator for ops (8be227ab5e) makes a CV responsible for cleaning up its ops if it is freed prematurely (before the root is attached). Certain syntax errors involving formats can cause the parser to free the CV owning an op when that op is on the PL_nextval stack. This happens in these cases: format = @ use; format strict . format = @ ;use strict . format foo require bar In the first two cases it is the line containing ‘strict’ that is being interpreted as a format picture line and being fed through force_next by S_scan_formline. Then the error condition kicks in after the force, causing a LEAVE_SCOPE in the parser (perly.c) which frees the sub owning the const op for the format picture. Then a token with a freed op is fed to the parser. To make this clearer, when the second case above is parsed, the tokens produced are as follows: format = [;] [formline] "@\n" [,] ; use [] [;] [formline] [;] . Notice how there is an implicit semicolon before each (implicit) formline. Notice also how there is an implicit semicolon before the final dot. The thing represents the "strict\n" constant after it has been freed. (-DT displays it as THING(opval=op_freed).) When the implicit semicolon is emitted before a formline, the next two tokens (formline and the string constant for this format picture line) are put on to the PL_nextval stack via force_next. It is when the implicit semicolon before "strict\n" is emitted that the parser sees the error (there is only one path through the gram- mar that uses the USE token, and it must have two WORDs following it; therefore a semicolon after one WORD is an immediate error), calling LEAVE_SCOPE, which frees the sub created by ‘use’, which owns the const op on the PL_nextval stack containing the word "strict" and con- sequently frees it. I thought I could fix this by putting an implicit do { ... } around the argument line. (This would fix another bug, whereby the argument line in a format can ‘leak out’ of the formline(...).) But this does not solve anything, as we end up with four tokens ( } ; formline const ) on the PL_nextval stack when we emit the implicit semicolon after ‘use’, instead of two. format= @ ;use strict . will turn into format = [;] [formline] "@\n" [,] [do] [{] ; use [] [;] [}] [;] [formline] "strict\n"/ [;] . It is when the lexer reaches "strict" that it will emit the semicolon after the use. So we will be in the same situation as before. So fixing the fact that the argument line can ‘leak out’ of the formline and start a new statement won’t solve this particu- lar problem. I tried eliminating the LEAVE_SCOPE. (See where Dave Mitchell explains that the LEAVE_SCOPE is not strictly nec- essary, but it is ‘still good to ensure that the savestack gets cor- rectly popped during error recovery’.) That does not help, because the lexer itself does ENTER/LEAVE to make sure form_lex_state and lex_formbrack get restored properly after the lexer exits the format (see 583c9d5cccf and 64a408986cf). So when the final dot is reached, the ‘use’ CV is freed. Then an op tree that includes the now-freed "strict\n" const op is passed to newFORM, which tries to do op_free(block) (as of 3 commits ago; before that the errors were more catastrophic), and ends up freeing an op belonging to a freed slab. Removing LEAVE_SCOPE did actually fix ‘format foo require bar’, because there is no ENTER/LEAVE involved there, as the = (ENTER) has not been reached yet. It was failing because ‘require bar’ would call force_next for "bar", and then feed a REQUIRE token to the parser, which would immediately see the error and call LEAVE_SCOPE (free- ing the format), with the "bar" (belonging to the format’s slab) still pending. The final solution I came up with was to reuse an mechanism I came up with earlier. Since the savestack may cause ops to outlive their CVs due to SAVEFREEOP, opslab_force_free (called when an incomplete CV is freed prematurely) will skip any op with o->op_savestack set. The nextval stack can use the same flag. To make sure nothing goes awry (we don’t want the same op on the nextval stack and the savestack at the same time), I added a few assertions. --- diff --git a/perly.act b/perly.act index d88e7ed..c961797 100644 --- a/perly.act +++ b/perly.act @@ -1725,5 +1725,5 @@ case 2: /* Generated from: * 50649beb21bb272a5dafa43d5fbe244206c6a99e0e67b39b3123a22a35702873 perly.y - * 38f866dcd8341ad3c0810347587113eb2c6ac7d4f33bdab75b020efce92865be regen_perly.pl + * 4a3ced51fe6585f6ec59e65e1ffd74bc976c498b0e6992eaab337219fee72e51 regen_perly.pl * ex: set ro: */ diff --git a/perly.h b/perly.h index 1097227..29c317c 100644 --- a/perly.h +++ b/perly.h @@ -211,6 +211,28 @@ +#ifdef PERL_IN_TOKE_C +static bool +S_is_opval_token(int type) { + switch (type) { + case FUNC0OP: + case FUNC0SUB: + case FUNCMETH: + case LSTOPSUB: + case METHOD: + case PLUGEXPR: + case PLUGSTMT: + case PMFUNC: + case PRIVATEREF: + case QWLIST: + case THING: + case UNIOPSUB: + case WORD: + return 1; + } + return 0; +} +#endif /* PERL_IN_TOKE_C */ #endif /* PERL_CORE */ #if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED typedef union YYSTYPE @@ -243,5 +265,5 @@ typedef union YYSTYPE /* Generated from: * 50649beb21bb272a5dafa43d5fbe244206c6a99e0e67b39b3123a22a35702873 perly.y - * 38f866dcd8341ad3c0810347587113eb2c6ac7d4f33bdab75b020efce92865be regen_perly.pl + * 4a3ced51fe6585f6ec59e65e1ffd74bc976c498b0e6992eaab337219fee72e51 regen_perly.pl * ex: set ro: */ diff --git a/perly.tab b/perly.tab index c5a8e66..edb9de4 100644 --- a/perly.tab +++ b/perly.tab @@ -1114,5 +1114,5 @@ static const toketypes yy_type_tab[] = /* Generated from: * 50649beb21bb272a5dafa43d5fbe244206c6a99e0e67b39b3123a22a35702873 perly.y - * 38f866dcd8341ad3c0810347587113eb2c6ac7d4f33bdab75b020efce92865be regen_perly.pl + * 4a3ced51fe6585f6ec59e65e1ffd74bc976c498b0e6992eaab337219fee72e51 regen_perly.pl * ex: set ro: */ diff --git a/regen_perly.pl b/regen_perly.pl index e0120fa..124b031 100644 --- a/regen_perly.pl +++ b/regen_perly.pl @@ -95,6 +95,7 @@ close $ctmp_fh; my ($actlines, $tablines) = extract($clines); +our %tokens; $tablines .= make_type_tab($y_file, $tablines); my ($act_fh, $tab_fh, $h_fh) = map { @@ -120,7 +121,23 @@ my $tokens; while (<$tmph_fh>) { print $h_fh "#ifdef PERL_CORE\n" if $. == 1; if (!$endcore_done and /YYSTYPE_IS_DECLARED/) { - print $h_fh "#endif /* PERL_CORE */\n"; + print $h_fh <\s+// + s/^%((token)|type)\s+<(\w+)>\s+// or die "$y_file: unparseable token/type line: $_"; - $tokens{$_} = $2 for (split ' ', $_); - $types{$2} = 1; + for (split ' ', $_) { + $tokens{$_} = $3; + if ($2) { + $just_tokens{$_} = $3; + } + } + $types{$3} = 1; } + *tokens = \%just_tokens; # perly.h needs this die "$y_file: no __DEFAULT__ token defined\n" unless $default_token; $types{$default_token} = 1; diff --git a/scope.h b/scope.h index f8df5b4..0fad9a3 100644 --- a/scope.h +++ b/scope.h @@ -273,6 +273,7 @@ scope has the given name. Name must be a literal string. # define save_freeop(op) \ ({ \ OP * const _o = (OP *)(op); \ + assert(!_o->op_savefree); \ _o->op_savefree = 1; \ save_pushptr((void *)(_o), SAVEt_FREEOP); \ }) @@ -280,6 +281,7 @@ scope has the given name. Name must be a literal string. # define save_freeop(op) \ ( \ PL_Xpv = (XPV *)(op), \ + assert_(!((OP *)PL_Xpv)->op_savefree) \ ((OP *)PL_Xpv)->op_savefree = 1, \ save_pushptr((void *)(PL_Xpv), SAVEt_FREEOP) \ ) diff --git a/t/op/write.t b/t/op/write.t index 090fc4f..760de1a 100644 --- a/t/op/write.t +++ b/t/op/write.t @@ -61,7 +61,7 @@ for my $tref ( @NumTests ){ my $bas_tests = 21; # number of tests in section 3 -my $bug_tests = 8 + 3 * 3 * 5 * 2 * 3 + 2 + 66 + 4 + 2 + 3 + 96 + 6; +my $bug_tests = 8 + 3 * 3 * 5 * 2 * 3 + 2 + 66 + 4 + 2 + 3 + 96 + 7; # number of tests in section 4 my $hmb_tests = 35; @@ -1058,6 +1058,19 @@ eval { write ERROR }; like $@, qr'Undefined format', 'formats with compilation errors are not created'; +# This syntax error used to cause a crash, double free, or a least +# a bad read. +# See the long-winded explanation at: +# https://rt.perl.org/rt3/Ticket/Display.html?id=43425#txn-1144500 +eval q| +format = +@ +use;format +strict +. +|; +pass('no crash with invalid use/format inside format'); + ############################# ## Section 4 diff --git a/toke.c b/toke.c index 7fdfa51..4fc3f43 100644 --- a/toke.c +++ b/toke.c @@ -1986,6 +1986,11 @@ S_force_next(pTHX_ I32 type) tokereport(type, &NEXTVAL_NEXTTOKE); } #endif + /* Don’t let opslab_force_free snatch it */ + if (S_is_opval_token(type & 0xffff) && NEXTVAL_NEXTTOKE.opval) { + assert(!NEXTVAL_NEXTTOKE.opval->op_savefree); + NEXTVAL_NEXTTOKE.opval->op_savefree = 1; + } #ifdef PERL_MAD if (PL_curforce < 0) start_force(PL_lasttoke); @@ -4478,6 +4483,8 @@ Perl_yylex(pTHX) PL_lex_allbrackets--; next_type &= 0xffff; } + if (S_is_opval_token(next_type) && pl_yylval.opval) + pl_yylval.opval->op_savefree = 0; /* release */ #ifdef PERL_MAD /* FIXME - can these be merged? */ return next_type;