From 9c3de1e2a0591c2526453cf85260c09e3c624189 Mon Sep 17 00:00:00 2001
From: Wouter van Oortmerssen
Date: Fri, 25 Jul 2014 15:04:35 -0700
Subject: [PATCH] Extended symbolic enum parsing in JSON for integers and OR-ing.

Change-Id: Iedbd9914a1ca3897776fb92aa9a1fdfc4603da3c
Tested: on Windows and Linux
---
 docs/html/md__cpp_usage.html | 1 +
 docs/html/md__go_usage.html | 2 +-
 docs/html/md__schemas.html | 9 +++++++-
 docs/source/CppUsage.md | 3 +++
 docs/source/Schemas.md | 26 ++++++++++++++++++++-
 include/flatbuffers/idl.h | 5 ++--
 src/idl_gen_cpp.cpp | 8 +++----
 src/idl_parser.cpp | 55 ++++++++++++++++++++++++++++++++++++--------
 tests/test.cpp | 12 ++++++----
 9 files changed, 98 insertions(+), 23 deletions(-)

diff --git a/docs/html/md__cpp_usage.html b/docs/html/md__cpp_usage.html
index f02436e..50a0a13 100644
--- a/docs/html/md__cpp_usage.html
+++ b/docs/html/md__cpp_usage.html
@@ -107,6 +107,7 @@ assert(inv->Get(9) == 9);

Text & schema parsing

Using binary buffers with the generated header provides a super low overhead use of FlatBuffer data. There are, however, times when you want to use text formats, for example because it interacts better with source control, or you want to give your users easy access to data.

Another reason might be that you already have a lot of data in JSON format, or a tool that generates JSON, and if you can write a schema for it, this will provide you an easy way to use that data directly.

+(see the schema documentation for some specifics on the JSON format accepted).

There are two ways to use text formats:

Using the compiler as a conversion tool

This is the preferred path, as it doesn't require you to add any new code to your program, and is maximally efficient since you can ship with binary data. The disadvantage is that it is an extra step for your users/developers to perform, though you might be able to automate it.

flatc -b myschema.fbs mydata.json
diff --git a/docs/html/md__go_usage.html b/docs/html/md__go_usage.html
index 8a66250..5a873b2 100644
--- a/docs/html/md__go_usage.html
+++ b/docs/html/md__go_usage.html
@@ -81,7 +81,7 @@ example.MonsterAddTest_Type(builder, 1)
 example.MonsterAddTest(builder, mon2)
 example.MonsterAddTest4(builder, test4s)
 mon := example.MonsterEnd(builder)
-Unlike C++, Go does not support table creation functions like 'createMonster()'. This is to create the buffer without using temporary object allocation (since the Vec3 is an inline component of Monster, it has to be created right where it is added, whereas the name and the inventory are not inline). Structs do have convenient methods that allow you to construct them in one call. These even have arguments for nested structs, e.g. if a struct has a field a and a nested struct field b (which has fields c and d), then the arguments will be a, c and d.
+Unlike C++, Go does not support table creation functions like 'createMonster()'. This is to create the buffer without using temporary object allocation (since the Vec3 is an inline component of Monster, it has to be created right where it is added, whereas the name and the inventory are not inline). Structs do have convenient methods that allow you to construct them in one call. These also have arguments for nested structs, e.g. if a struct has a field a and a nested struct field b (which has fields c and d), then the arguments will be a, c and d.

Vectors also use this start/end pattern to allow vectors of both scalar types and structs:

example.MonsterStartInventoryVector(builder, 5)
 for i := 4; i >= 0; i-- {
     builder.PrependByte(byte(i))
diff --git a/docs/html/md__schemas.html b/docs/html/md__schemas.html
index 85381e7..0ddf7e1 100644
--- a/docs/html/md__schemas.html
+++ b/docs/html/md__schemas.html
@@ -112,7 +112,7 @@ root_type Monster;
 

Namespaces

These will generate the corresponding namespace in C++ for all helper code, and packages in Java. You can use . to specify nested namespaces / packages.

Root type

-This declares what you consider to be the root table (or struct) of the serialized data.
+This declares what you consider to be the root table (or struct) of the serialized data. This is particularly important for parsing JSON data, which doesn't include object type information.

Comments & documentation

May be written as in most C-based languages. Additionally, a triple comment (///) on a line by itself signals that a comment is documentation for whatever is declared on the line after it (table/struct/field/enum/union/element), and the comment is output in the corresponding C++ code. Multiple such lines per item are allowed.

Attributes

@@ -125,6 +125,13 @@ root_type Monster;
  • force_align: size (on a struct): force the alignment of this struct to be something higher than what it is naturally aligned to. Causes these structs to be aligned to that amount inside a buffer, IF that buffer is allocated with that alignment (which is not necessarily the case for buffers accessed directly inside a FlatBufferBuilder).
  • bit_flags (on an enum): the values of this field indicate bits, meaning that any value N specified in the schema will end up representing 1<<N, or if you don't specify values at all, you'll get the sequence 1, 2, 4, 8, ...
+JSON Parsing

+The same parser that is parsing the schema declarations above is also able to parse JSON objects that conform to this schema. So, unlike other JSON parsers, this parser is strongly typed, and parses directly into a FlatBuffer (see the compiler documentation on how to do this from the command line, or the C++ documentation on how to do this at runtime).

+Besides needing a schema, there are a few other changes to how it parses JSON:

+

Gotchas

Schemas and version control

FlatBuffers relies on new field declarations being added at the end, and earlier declarations to not be removed, but be marked deprecated when needed. We think this is an improvement over the manual number assignment that happens in Protocol Buffers (and which is still an option using the id attribute mentioned above).

diff --git a/docs/source/CppUsage.md b/docs/source/CppUsage.md
index 5fe14cd..418b1f9 100755
--- a/docs/source/CppUsage.md
+++ b/docs/source/CppUsage.md
@@ -199,6 +199,9 @@ Another reason might be that you already have a lot of data in JSON format, or a tool that generates JSON, and if you can write a schema for it, this will provide you an easy way to use that data directly.
+(see the schema documentation for some specifics on the JSON format
+accepted).
+
 There are two ways to use text formats:
 
 ### Using the compiler as a conversion tool
diff --git a/docs/source/Schemas.md b/docs/source/Schemas.md
index 4a02179..30f1862 100755
--- a/docs/source/Schemas.md
+++ b/docs/source/Schemas.md
@@ -144,7 +144,8 @@ packages.
 
 ### Root type
 
 This declares what you consider to be the root table (or struct) of the
-serialized data.
+serialized data. This is particularly important for parsing JSON data,
+which doesn't include object type information.
 
 ### Comments & documentation
@@ -193,6 +194,29 @@ Current understood attributes:
 representing 1<value -
-              enum_def.vals.vec.front()->value + 1;
+  auto range = enum_def.vals.vec.back()->value -
+               enum_def.vals.vec.front()->value + 1;
   // Average distance between values above which we consider a table
   // "too sparse". Change at will.
   static const int kMaxSparseness = 5;
-  if (range / static_cast(enum_def.vals.vec.size()) < kMaxSparseness) {
+  if (range / static_cast(enum_def.vals.vec.size()) < kMaxSparseness) {
     code += "inline const char **EnumNames" + enum_def.name + "() {\n";
     code += " static const char *names[] = { ";
-    int val = enum_def.vals.vec.front()->value;
+    auto val = enum_def.vals.vec.front()->value;
     for (auto it = enum_def.vals.vec.begin(); it != enum_def.vals.vec.end(); ++it) {
diff --git a/src/idl_parser.cpp b/src/idl_parser.cpp
index 29b328c..e0b767f 100644
--- a/src/idl_parser.cpp
+++ b/src/idl_parser.cpp
@@ -556,16 +556,51 @@ bool Parser::TryTypedValue(int dtoken,
   return match;
 }
 
+int64_t Parser::ParseIntegerFromString(Type &type) {
+  int64_t result = 0;
+  // Parse one or more enum identifiers, separated by spaces.
+  const char *next = attribute_.c_str();
+  do {
+    const char *divider = strchr(next, ' ');
+    std::string word;
+    if (divider) {
+      word = std::string(next, divider);
+      next = divider + strspn(divider, " ");
+    } else {
+      word = next;
+      next += word.length();
+    }
+    if (type.enum_def) {  // The field has an enum type
+      auto enum_val = type.enum_def->vals.Lookup(word);
+      if (!enum_val)
+        Error("unknown enum value: " + word +
+              ", for enum: " + type.enum_def->name);
+      result |= enum_val->value;
+    } else {  // No enum type, probably integral field.
+      if (!IsInteger(type.base_type))
+        Error("not a valid value for this field: " + word);
+      // TODO: could check if it's a valid number constant here.
+      const char *dot = strchr(word.c_str(), '.');
+      if (!dot) Error("enum values need to be qualified by an enum type");
+      std::string enum_def_str(word.c_str(), dot);
+      std::string enum_val_str(dot + 1, word.c_str() + word.length());
+      auto enum_def = enums_.Lookup(enum_def_str);
+      if (!enum_def) Error("unknown enum: " + enum_def_str);
+      auto enum_val = enum_def->vals.Lookup(enum_val_str);
+      if (!enum_val) Error("unknown enum value: " + enum_val_str);
+      result |= enum_val->value;
+    }
+  } while(*next);
+  return result;
+}
+
 void Parser::ParseSingleValue(Value &e) {
-  // First check if derived from an enum, to allow strings/identifier values:
-  if (e.type.enum_def && (token_ == kTokenIdentifier ||
-                          token_ == kTokenStringConstant)) {
-    auto enum_val = e.type.enum_def->vals.Lookup(attribute_);
-    if (!enum_val)
-      Error("unknown enum value: " + attribute_ +
-            ", for enum: " + e.type.enum_def->name);
-    e.constant = NumToString(enum_val->value);
-    Next();
+  // First check if this could be a string/identifier enum value:
+  if (e.type.base_type != BASE_TYPE_STRING &&
+      e.type.base_type != BASE_TYPE_NONE &&
+      (token_ == kTokenIdentifier || token_ == kTokenStringConstant)) {
+    e.constant = NumToString(ParseIntegerFromString(e.type));
+    Next();
   } else if (TryTypedValue(kTokenIntegerConstant,
                            IsScalar(e.type.base_type),
                            e,
@@ -653,7 +688,7 @@ void Parser::ParseEnum(bool is_union) {
         if (static_cast((*it)->value) >=
             SizeOf(enum_def.underlying_type.base_type) * 8)
           Error("bit flag out of range of underlying integral type");
-        (*it)->value = 1 << (*it)->value;
+        (*it)->value = 1LL << (*it)->value;
       }
     }
   }
diff --git a/tests/test.cpp b/tests/test.cpp
index c28be7d..f860897 100644
--- a/tests/test.cpp
+++ b/tests/test.cpp
@@ -504,10 +504,14 @@ void ScientificTest() {
 }
 
 void EnumStringsTest() {
-  flatbuffers::Parser parser;
-
-  TEST_EQ(parser.Parse("enum E:byte { A, B, C } table T { F:[E]; } root_type T;"
-                       "{ F:[ A, B, \"C\" ] }"), true);
+  flatbuffers::Parser parser1;
+  TEST_EQ(parser1.Parse("enum E:byte { A, B, C } table T { F:[E]; }"
+                        "root_type T;"
+                        "{ F:[ A, B, \"C\", \"A B C\" ] }"), true);
+  flatbuffers::Parser parser2;
+  TEST_EQ(parser2.Parse("enum E:byte { A, B, C } table T { F:[int]; }"
+                        "root_type T;"
+                        "{ F:[ \"E.C\", \"E.A E.B E.C\" ] }"), true);
 }
-- 
2.7.4
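The standalone sketch below shows the technique `ParseIntegerFromString` introduces: split the string into words, look each one up as an enum identifier, and OR the values together. It also includes the `1LL <<` bit-flag fix. This is an illustration and not FlatBuffers code. `ToyEnumDef`, `ToyEnumTable`, `BitFlagValue` and `ParseEnumWords` are hypothetical names. Unknown names are reported by returning `false` instead of calling `Parser::Error`. Words are split on any whitespace, not only on `' '`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for an enum definition: its name plus a map from
// identifier to (already computed) integer value. Not the FlatBuffers type.
struct ToyEnumDef {
  std::string name;
  std::map<std::string, int64_t> vals;
};

// Hypothetical registry of enums by name, playing the role of Parser::enums_.
typedef std::map<std::string, ToyEnumDef> ToyEnumTable;

// Value of a bit_flags enum member declared at bit `position`. Shifting a
// 64-bit one (as the patch does with 1LL) avoids overflowing int for
// positions >= 31.
inline int64_t BitFlagValue(int position) {
  assert(position >= 0 && position < 63);
  return static_cast<int64_t>(1) << position;
}

// Parses one or more whitespace-separated enum identifiers and ORs their
// values into *result. With a non-null field_enum, bare names ("A B") are
// looked up in that enum; otherwise every word must be qualified as
// "Enum.Value" ("E.A E.B"). Returns false on an empty string, an unqualified
// word, an unknown enum or an unknown value, leaving *result unchanged.
bool ParseEnumWords(const std::string &text, const ToyEnumDef *field_enum,
                    const ToyEnumTable &enums, int64_t *result) {
  int64_t acc = 0;
  bool saw_word = false;
  std::istringstream words(text);
  std::string word;
  while (words >> word) {  // operator>> skips runs of whitespace
    saw_word = true;
    const ToyEnumDef *def = field_enum;
    std::string value_name = word;
    if (def == nullptr) {
      // Integral field: split "Enum.Value" at the first dot.
      auto dot = word.find('.');
      if (dot == std::string::npos) return false;
      auto e = enums.find(word.substr(0, dot));
      if (e == enums.end()) return false;
      def = &e->second;
      value_name = word.substr(dot + 1);
    }
    auto v = def->vals.find(value_name);
    if (v == def->vals.end()) return false;
    acc |= v->second;
  }
  if (!saw_word) return false;
  *result = acc;
  return true;
}
```

Take `enum E:byte { A, B, C }`, whose values are 0, 1 and 2. Both `"A B C"` on an `E` field and `"E.A E.B E.C"` on an `int` field then yield 0 | 1 | 2 = 3. These are the inputs exercised by `EnumStringsTest` above.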