Clarify that (Flat|Flex)Buffers do not deduplicate vector elements (#6415)
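
To illustrate the point of this change, here is a minimal sketch of a vector whose two elements refer to the same value. `ToyBuffer` and all of its members are hypothetical names made up for this illustration; this is not the FlatBuffers or FlexBuffers API. The toy stores absolute positions in host byte order and ignores alignment, whereas the real format uses relative offsets and aligned little-endian scalars.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical toy buffer, NOT the FlatBuffers API. Values are appended to a
// byte array and referred to by their absolute position in that array.
struct ToyBuffer {
  std::vector<std::uint8_t> bytes;

  // Appends a length-prefixed, null-terminated string and returns its position.
  std::uint32_t AddString(const std::string &s) {
    const std::uint32_t pos = static_cast<std::uint32_t>(bytes.size());
    AppendU32(static_cast<std::uint32_t>(s.size()));
    bytes.insert(bytes.end(), s.begin(), s.end());
    bytes.push_back(0);
    return pos;
  }

  // Appends a count-prefixed vector of positions exactly as given.
  // Nothing is copied or deduplicated: if the caller passes the same
  // position twice, both elements refer to the same stored value.
  std::uint32_t AddOffsetVector(const std::vector<std::uint32_t> &elems) {
    const std::uint32_t pos = static_cast<std::uint32_t>(bytes.size());
    AppendU32(static_cast<std::uint32_t>(elems.size()));
    for (std::uint32_t e : elems) AppendU32(e);
    return pos;
  }

  // Reads a 32-bit value stored at the given position.
  std::uint32_t ReadU32(std::uint32_t pos) const {
    std::uint32_t v;
    std::memcpy(&v, bytes.data() + pos, sizeof v);
    return v;
  }

  // Returns the position stored in element i of the vector at `vec`.
  std::uint32_t ElementOffset(std::uint32_t vec, std::uint32_t i) const {
    return ReadU32(vec + 4 + 4 * i);
  }

 private:
  // Appends a 32-bit value in host byte order (a simplification).
  void AppendU32(std::uint32_t v) {
    std::uint8_t raw[sizeof v];
    std::memcpy(raw, &v, sizeof v);
    bytes.insert(bytes.end(), raw, raw + sizeof v);
  }
};
```

Serializing a string once and then calling `AddOffsetVector({s, s})` with its position `s` yields a two-element vector in which both elements hold the same position, so the string bytes appear in the buffer only once.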

diff --git a/docs/source/Internals.md b/docs/source/Internals.md

index 60b8130..16a1666 100644
--- a/docs/source/Internals.md
+++ b/docs/source/Internals.md
@@ -15,6 +15,12 @@ all commonly used CPUs today. FlatBuffers will also work on big-endian
  machines, but will be slightly slower because of additional
  byte-swap intrinsics.
  
+It is assumed that the following conditions are met, to ensure
+cross-platform interoperability:
+- The binary `IEEE-754` format is used for floating-point numbers.
+- The `two's complement` representation is used for signed integers.
+- The endianness is the same for floating-point numbers as for integers.
+
  On purpose, the format leaves a lot of details about where exactly
  things live in memory undefined, e.g. fields in a table can have any
  order, and objects to some extent can be stored in many orders. This is
@@ -56,7 +62,7 @@ when needed. Unsigned means they can only point in one direction, which
  typically is forward (towards a higher memory location). Any backwards
  offsets will be explicitly marked as such.
  
-The format starts with an `uoffset_t` to the root object in the buffer.
+The format starts with an `uoffset_t` to the root table in the buffer.
  
  We have two kinds of objects, structs and tables.
  
@@ -82,7 +88,9 @@ They start with an `soffset_t` to a vtable. This is a signed version of
  This offset is subtracted (not added) from the object start to arrive at
  the vtable start. This offset is followed by all the
  fields as aligned scalars (or offsets). Unlike structs, not all fields
-need to be present. There is no set order and layout.
+need to be present. There is no set order and layout. A table may contain
+field offsets that point to the same value if the user explicitly
+serializes the same offset twice.
  
  To be able to access fields regardless of these uncertainties, we go
  through a vtable of offsets. Vtables are shared between any objects that
@@ -105,13 +113,21 @@ is 0, that means the field is not present in this object, and the
  default value is returned. Otherwise, the entry is used as an offset to the
  field to be read.
  
+### Unions
+
+Unions are encoded as the combination of two fields: an enum representing the
+union choice and the offset to the actual element. FlatBuffers reserves the
+enumeration constant `NONE` (encoded as 0) to mean that the union field is not
+set.
+
  ### Strings and Vectors
  
  Strings are simply a vector of bytes, and are always
  null-terminated. Vectors are stored as contiguous aligned scalar
  elements prefixed by a 32bit element count (not including any
  null termination). Neither is stored inline in their parent, but are referred to
-by offset.
+by offset. A vector may consist of more than one offset pointing to the same
+value if the user explicitly serializes the same offset twice.
  
  ### Construction
  
@@ -169,7 +185,7 @@ Unions share a lot with enums.
  Predeclare all data types since circular references between types are allowed
  (circular references between objects are not, though).
  
-    MANUALLY_ALIGNED_STRUCT(4) Vec3 {
+    FLATBUFFERS_MANUALLY_ALIGNED_STRUCT(4) Vec3 {
       private:
        float x_;
        float y_;
@@ -183,7 +199,7 @@ Predeclare all data types since circular references between types are allowed
        float y() const { return flatbuffers::EndianScalar(y_); }
        float z() const { return flatbuffers::EndianScalar(z_); }
      };
-    STRUCT_END(Vec3, 12);
+    FLATBUFFERS_STRUCT_END(Vec3, 12);
  
  These ugly macros do a couple of things: they turn off any padding the compiler
  might normally do, since we add padding manually (though none in this example),
@@ -340,6 +356,9 @@ Since this is an untyped vector `SL_VECTOR`, it is followed by 3 type
  bytes (one per element of the vector), which are always following the vector,
  and are always a uint8_t even if the vector is made up of bigger scalars.
  
+A vector may include more than one offset pointing to the same value if the
+user explicitly serializes the same offset twice.
+
  ### Types
  
  A type byte is made up of 2 components (see flexbuffers.h for exact values):