Add design docs for upcoming file_time_type change.

author Eric Fiselier <eric@efcs.ca>

Wed, 25 Jul 2018 02:53:53 +0000 (02:53 +0000)

committer Eric Fiselier <eric@efcs.ca>

Wed, 25 Jul 2018 02:53:53 +0000 (02:53 +0000)
author Eric Fiselier <eric@efcs.ca>
Wed, 25 Jul 2018 02:53:53 +0000 (02:53 +0000)
committer Eric Fiselier <eric@efcs.ca>
Wed, 25 Jul 2018 02:53:53 +0000 (02:53 +0000)
diff --git a/libcxx/docs/DesignDocs/FileTimeType.rst b/libcxx/docs/DesignDocs/FileTimeType.rst

new file mode 100644 (file)

index 0000000..83d5fed
--- /dev/null
+++ b/libcxx/docs/DesignDocs/FileTimeType.rst
@@ -0,0 +1,493 @@
+==============
+File Time Type
+==============
+
+.. contents::
+   :local:
+
+.. _file-time-type-motivation:
+
+Motivation
+==========
+
+The filesystem library provides interfaces for getting and setting the last
+write time of a file or directory. The interfaces use the ``file_time_type``
+type, which is a specialization of ``chrono::time_point`` for the
+"filesystem clock". According to [fs.filesystem.syn]
+
+  trivial-clock is an implementation-defined type that satisfies the
+  Cpp17TrivialClock requirements ([time.clock.req]) and that is capable of
+  representing and measuring file time values. Implementations should ensure
+  that the resolution and range of file_time_type reflect the operating
+  system dependent resolution and range of file time values.
+
+
+On POSIX systems, file times are represented using the ``timespec`` struct,
+which is defined as follows:
+
+.. code-block:: cpp
+
+  struct timespec {
+    time_t tv_sec;
+    long   tv_nsec;
+  };
+
+To represent the range and resolution of ``timespec``, we need to (A) have
+nanosecond resolution, and (B) use more than 64 bits (assuming a 64 bit ``time_t``).
+
+As the standard requires us to use the ``chrono`` interface, we have to define
+our own filesystem clock which specifies the period and representation of
+the time points and duration it provides. It will look like this:
+
+.. code-block:: cpp
+
+  struct _FilesystemClock {
+    using period = nano;
+    using rep = TBD; // What is this?
+
+    using duration = chrono::duration<rep, period>;
+    using time_point = chrono::time_point<_FilesystemClock>;
+
+    // ... //
+  };
+
+  using file_time_type = _FilesystemClock::time_point;
+
+
+To get nanosecond resolution, we simply define ``period`` to be ``std::nano``.
+But what type can we use as the arithmetic representation that is capable
+of representing the range of the ``timespec`` struct?
+
+Problems To Consider
+====================
+
+Before considering solutions, lets consider the problems they should solve,
+and how important solving those problems are:
+
+
+Having a Smaller Range than ``timespec``
+----------------------------------------
+
+One solution to the range problem is to simply reduce the resolution of
+``file_time_type`` to be less than that of nanoseconds. This is what libc++'s
+initial implementation of ``file_time_type`` did; it's also what
+``std::system_clock`` does. As a result, it can represent time points about
+292 thousand years on either side of the epoch, as opposed to only 292 years
+at nanosecond resolution.
+
+``timespec`` can represent time points +/- 292 billion years from the epoch
+(just in case you needed a time point 200 billion years before the big bang,
+and with nanosecond resolution).
+
+To get the same range, we would need to drop our resolution to that of seconds
+to come close to having the same range.
+
+This begs the question, is the range problem "really a problem"? Sane usages
+of file time stamps shouldn't exceed +/- 300, so should we care to support it?
+
+I believe the answer is yes. We're not designing the filesystem time API, we're
+providing glorified C++ wrappers for it. If the underlying API supports
+a value, then we should too. Our wrappers should not place artificial restrictions
+on users that are not present in the underlying filesystem.
+
+Additionally, having a smaller range that the underlying filesystem forces the
+implementation to report ``value_too_large`` errors when it encounters a time
+point that it can't represent. This can cause the call to ``last_write_time``
+to throw in cases where the user was confident the call should succeed. (See below)
+
+
+.. code-block:: cpp
+
+  #include <filesystem>
+  using namespace std::filesystem;
+
+  // Set the times using the system interface.
+  void set_file_times(const char* path, struct timespec ts) {
+    timespec both_times[2];
+    both_times[0] = ts;
+    both_times[1] = ts;
+    int result = ::utimensat(AT_FDCWD, path, both_times, 0);
+    assert(result != -1);
+  }
+
+  // Called elsewhere to set the file time to something insane, and way
+  // out of the 300 year range we might expect.
+  void some_bad_persons_code() {
+    struct timespec new_times;
+    new_times.tv_sec = numeric_limits<time_t>::max();
+    new_times.tv_nsec = 0;
+    set_file_times("/tmp/foo", new_times); // OK, supported by most FSes
+  }
+
+  int main() {
+    path p = "/tmp/foo";
+    file_status st = status(p);
+    if (!exists(st) || !is_regular_file(st))
+      return 1;
+    if ((st.permissions() & perms::others_read) == perms::none)
+      return 1;
+    // It seems reasonable to assume this call should succeed.
+    file_time_type tp = last_write_time(p); // BAD! Throws value_too_large.
+  }
+
+
+Having a Smaller Resolution than ``timespec``
+---------------------------------------------
+
+As mentioned in the previous section, one way to solve the range problem
+is by reducing the resolution, and matching the range of ``timespec`` using a
+64 bit representation requires limiting the resolution to seconds.
+
+So we might ask: Do users "need" nanosecond precision? Is seconds not good enough?
+I limit my consideration of the point to this: Why was it not good enough for
+the underlying system interfaces? If it wasn't good enough for them, then it
+isn't good enough for us. Our job is to match the filesystems range and
+representation, not design it.
+
+
+Having a Larger Range than ``timespec``
+----------------------------------------
+
+We also should consider the opposite problem of having ``file_time_type``
+be able to represent a larger range than that of ``timespec``. At least in
+this case ``last_write_time`` can be used to get and set all possible values
+supported by the underlying filesystem; meaning ``last_write_time(p)`` will
+never throw a overflow error.
+
+However, this introduces a new problem, where users are allowed to create time
+points beyond what the filesystem can represent. Two particular values are
+``file_time_type::min()`` and ``file_time_type::max()``. As such the following
+code would throw:
+
+.. code-block:: cpp
+
+  void test() {
+    last_write_time("/tmp/foo", file_time_type::max()); // Throws
+    last_write_time("/tmp/foo", file_time_type::min()); // Throws.
+  }
+
+Apart from cases explicitly using ``min`` and ``max``, I don't see users taking
+a valid time point, adding a couple hundred billions of years to it in error,
+and then trying to update a files write time with that value very often.
+
+Compared to having a smaller range, this problem seems preferable. At least
+now we can represent any time point the filesystem can, so users won't be forced
+to revert back to system interfaces to avoid limitations in the C++ STL.
+
+I posit that we should only consider this concern *after* we have something
+with at least the same range and resolution of the underlying filesystem. The
+latter two problems are much more important to solve.
+
+Potential Solutions And Their Complications
+===========================================
+
+Source Code Portability Across Implementations
+-----------------------------------------------
+
+As we've discussed, ``file_time_type`` needs a representation that uses more
+than 64 bits. The possible solutions include using ``__int128_t``, emulating a
+128 bit integer using a class, or potentially defining a ``timespec`` like
+arithmetic type. All three will solve allow us to, at minimum, match the range
+and resolution, and the last one might even allow us to match them exactly.
+
+But when considering these potential solutions, we need to consider more than
+just the values they can represent. We need to consider the effect they will have
+on users and their code. For example, each of them breaks the following code
+in some way:
+
+.. code-block:: cpp
+
+  // Bug caused by an unexpected 'rep' type returned by count.
+  void print_time(path p) {
+    // __int128_t doesn't have streaming operators, and neither would our
+    // custom arithmetic types.
+    cout << last_write_time(p).time_since_epoch().count() << endl;
+  }
+
+  // Overflow during creation bug.
+  file_time_type timespec_to_file_time_type(struct timespec ts) {
+    // woops! chrono::seconds and chrono::nanoseconds use a 64 bit representation
+    // this may overflow before it's converted to a file_time_type.
+    auto dur = seconds(ts.tv_sec) + nanoseconds(ts.tv_nsec);
+    return file_time_type(dur);
+  }
+
+  file_time_type correct_timespec_to_file_time_type(struct timespec ts) {
+    // This is the correct version of the above example, where we
+    // avoid using the chrono typedefs as their not sufficient.
+    // Can we expect users to avoid this bug?
+    using fs_seconds = chrono::duration<file_time_type::rep>;
+    using fs_nanoseconds = chrono::duration<file_time_type::rep, nano>;
+    auto dur = fs_seconds(ts.tv_sec) + fs_nanoseconds(tv.tv_nsec);
+    return file_time_type(dur);
+  }
+
+  // Implicit truncation during conversion bug.
+  intmax_t get_time_in_seconds(path p) {
+    using fs_seconds = duration<file_time_type::rep, /* default ration */>;
+    auto tp = last_write_time(p);
+
+    // This works with truncation for __int128_t, but what does it do for
+    // our custom arithmetic types.
+    return duration_cast<fs_seconds>().count();
+  }
+
+
+Each of the above examples would require a user to adjust their filesystem code
+to the particular eccentricities of the representation type, hopefully in
+such a way that the code is still portable across implementations.
+
+It seems like at least some of the above issues are unavoidable, no matter
+what representation we choose. But some representations may be quirkier than
+others, and, as I'll argue later, using an actual arithmetic type (``__int128_t``)
+provides the least aberrant behavior.
+
+
+Chrono and ``timespec`` Emulation.
+----------------------------------
+
+One of the options we've considered is using something akin to ``timespec``
+to represent the ``file_time_type``. It only seems natural seeing as that's
+what the underlying system uses, and because it might allow us to match
+the range and resolution exactly. But would it work with chrono? And could
+it still act at all like a ``timespec`` struct?
+
+For ease of consideration, lets consider the what the implementation might
+look like.
+
+.. code-block:: cpp
+
+  struct fs_timespec_rep {
+    fs_timespec_rep(long long v)
+      : tv_sec(v / nano::den), tv_nsec(v % nano::den)
+    { }
+  private:
+    time_t tv_sec;
+    long tv_nsec;
+  };
+  bool operator==(fs_timespec_rep, fs_timespec_rep);
+  fs_int128_rep operator+(fs_timespec_rep, fs_timespec_rep);
+  // ... arithmetic operators ... //
+
+The first thing to notice is that we can't construct ``fs_timespec_rep`` like
+a ``timespec`` by passing ``{secs, nsecs}``. Instead we're limited to constructing
+it from a single 64 bit integer.
+
+We also can't allow the user to inspect the ``tv_sec`` or ``tv_nsec`` values
+directly. A ``chrono::duration`` represents its value as a tick period and a
+number of ticks stored using ``rep``. The representation is unaware of the
+tick period its being used to represent, but ``timespec`` is setup to assume a
+nanosecond tick period. That's the only case where the names of the ``tv_sec``
+and ``tv_nsec`` members matches the values they store.
+
+When we convert a nanosecond duration to a seconds, ``fs_timespec_rep``
+will be using ``tv_sec`` to represent the number of giga seconds, and ``tv_nsec``
+the remaining seconds. Lets consider how this might cause a bug were users allowed
+to manipulate the fields directly.
+
+.. code-block:: cpp
+
+  template <class Period>
+  timespec convert_to_timespec(duration<fs_time_rep, Period> dur) {
+    fs_timespec_rep rep = dur.count();
+    return {rep.tv_sec, rep.tv_nsec}; // Oops! Period may not be nanoseconds.
+  }
+
+  template <class Duration>
+  Duration convert_to_duration(timespec ts) {
+    Duration dur({ts.tv_sec, ts.tv_nsec}); // Oops! Period may not be nanoseconds.
+    return file_time_type(dur);
+    file_time_type tp = last_write_time(p);
+    auto dur =
+  }
+
+  time_t extract_seconds(file_time_type tp) {
+    // Converting to seconds is a silly bug, but I could see it happening.
+    using SecsT = chrono::duration<file_time_type::rep, ratio<1, 1>>;
+    auto secs = duration_cast<Secs>(tp.time_since_epoch());
+    // tv_sec is now representing gigaseconds.
+    return secs.count().tv_sec; // Oops!
+  }
+
+However, despite ``fs_timespec_rep`` not being usable in any manner resembling
+``timespec``, it still might buy us our goal of matching its range exactly,
+right?
+
+Sort of. Chrono provides a specialization point which specifies the minimum
+and maximum values for a custom representation. It looks like this:
+
+.. code-block:: cpp
+
+  template <>
+  struct duration_values<fs_timespec_rep> {
+    static fs_timespec_rep zero();
+    static fs_timespec_rep min();
+    static fs_timespec_rep max() { // assume friendship.
+      fs_timespec_rep val;
+      val.tv_sec = numeric_limits<time_t>::max();
+      val.tv_nsec = nano::den - 1;
+      return val;
+    }
+  };
+
+Notice that ``duration_values`` doesn't tell the representation what tick period
+it's actually representing. This would indeed correctly limit the range of
+``duration<fs_timespec_rep, nano>`` to exactly that of ``timespec``. But
+nanoseconds isn't the only tick period it will be used to represent. For example:
+
+.. code-block:: cpp
+
+  void test() {
+    using rep = file_time_type::rep;
+    using fs_nsec = duration<rep, nano>;
+    using fs_sec = duration<rep>;
+    fs_nsec nsecs(fs_seconds::max()); // Truncates
+  }
+
+Though the above example may appear silly, I it follows from the incorrect
+notion that using a ``timespec`` rep in chrono actually makes it act as if it
+were an actual ``timespec``.
+
+Interactions with 32 bit ``time_t``
+-----------------------------------
+
+Up until now we've only be considering cases where ``time_t`` is 64 bits, but what
+about 32 bit systems/builds where ``time_t`` is 32 bits? (this is the common case
+for 32 bit builds).
+
+When ``time_t`` is 32 bits, we can implement ``file_time_type`` simply using 64-bit
+``long long``. There is no need to get either ``__int128_t`` or ``timespec`` emulation
+involved. And nor should we, as it would suffer from the numerous complications
+described by this paper.
+
+Obviously our implementation for 32-bit builds should act as similarly to the
+64-bit build as possible. Code which compiles in one, should compile in the other.
+This consideration is important when choosing between ``__int128_t`` and
+emulating ``timespec``. The solution which provides the most uniformity is
+the preferable one, with the least eccentricity is the preferable one.
+
+Summary
+=======
+
+The ``file_time_type`` time point is used to represent the write times for files.
+Its job is to act as part of a C++ wrapper for less ideal system interfaces. The
+underlying filesystem uses the ``timespec`` struct for the same purpose.
+
+However, the initial implementation of ``file_time_type`` could not represent
+either the range or resolution of ``timespec``, making it unsuitable. Fixing
+this requires an implementation which uses more than 64 bits to store the
+time point.
+
+We primarily considered two solutions: Using ``__int128_t`` and using a
+arithmetic emulation of ``timespec``. Each has its pros and cons, and both
+come with more than one complication.
+
+The Potential Solutions
+-----------------------
+
+``long long`` - The Status Quo
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Pros:
+
+* As a type ``long long`` places the nicest with others:
+
+  * It works with streaming operators and other library entities which support
+    builtin integer types, but don't support ``__int128_t``.
+  * Its the representation used by chrono's ``nanosecond`` and ``second`` typedefs.
+
+Cons:
+
+* It cannot provide the same resolution as ``timespec`` unless we limit it
+  to a range of +/- 300 years from the epoch.
+* It cannot provide the same range as ``timespec`` unless we limit its resolution
+  to seconds.
+* ``last_write_time`` has to report an error when the time reported by the filesystem
+  is unrepresentable.
+
+__int128_t
+~~~~~~~~~~~
+
+Pros:
+
+* It is an integer type.
+* It makes the implementation simply and efficient.
+* Acts exactly like other arithmetic types.
+* Can be implicitly converted to a builtin integer type by the user.
+
+  * This is important for doing things like:
+
+    .. code-block:: cpp
+
+      void c_interface_using_time_t(const char* p, time_t);
+
+      void foo(path p) {
+        file_time_type tp = last_write_time(p);
+        time_t secs = duration_cast<seconds>(tp.time_since_epoch()).count();
+        c_interface_using_time_t(p.c_str(), secs);
+      }
+
+Cons:
+
+* It isn't always available (but on 64 bit machines, it normally is).
+* It causes ``file_time_type`` to have a larger range than ``timespec``.
+* It doesn't always act the same as other builtin integer types. For example
+  with ``cout`` or ``to_string``.
+* Allows implicit truncation to 64 bit integers.
+* It can be implicitly converted to a builtin integer type by the user,
+  truncating its value.
+
+Arithmetic ``timespec`` Emulation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Pros:
+
+* It has the exact same range and resolution of ``timespec`` when representing
+  a nanosecond tick period.
+* It's always available, unlike ``__int128_t``.
+
+Cons:
+
+* It has a larger range when representing any period longer than a nanosecond.
+* Doesn't actually allow users to use it like a ``timespec``.
+* The required representation of using ``tv_sec`` to store the giga tick count
+  and ``tv_nsec`` to store the remainder adds nothing over a 128 bit integer,
+  but complicates a lot.
+* It isn't a builtin integer type, and can't be used anything like one.
+* Chrono can be made to work with it, but not nicely.
+* Emulating arithmetic classes come with their own host of problems regarding
+  overload resolution (Each operator needs three SFINAE constrained versions of
+  it in order to act like builtin integer types).
+* It offers little over simply using ``__int128_t``.
+* It acts the most differently than implementations using an actual integer type,
+  which has a high chance of breaking source compatibility.
+
+
+Selected Solution - Using ``__int128_t``
+=========================================
+
+The solution I selected for libc++ is using ``__int128_t`` when available,
+and otherwise falling back to using ``long long`` with nanosecond precision.
+
+When ``__int128_t`` is available, or when ``time_t`` is 32-bits, the implementation
+provides same resolution and a greater range than ``timespec``. Otherwise
+it still provides the same resolution, but is limited to a range of +/- 300
+years. This final case should be rather rare, as ``__int128_t``
+is normally available in 64-bit builds, and ``time_t`` is normally 32-bits
+during 32-bit builds.
+
+Although falling back to ``long long`` and nanosecond precision is less than
+ideal, it also happens to be the implementation provided by both libstdc++
+and MSVC. (So that makes it better, right?)
+
+Although the ``timespec`` emulation solution is feasible and would largely
+do what we want, it comes with too many complications, potential problems
+and discrepancies when compared to "normal" chrono time points and durations.
+
+An emulation of a builtin arithmetic type using a class is never going to act
+exactly the same, and the difference will be felt by users. It's not reasonable
+to expect them to tolerate and work around these differences. And once
+we commit to an ABI it will be too late to change. Committing to this seems
+risky.
+
+Therefore, ``__int128_t`` seems like the better solution.
diff --git a/libcxx/docs/Makefile.sphinx b/libcxx/docs/Makefile.sphinx

index ae37a34b54eebbc6dcd46c9016cf25cabb386b16..a34f0cc0bad3eb2eb09d91f1a88805c66826b411 100644 (file)
--- a/libcxx/docs/Makefile.sphinx
+++ b/libcxx/docs/Makefile.sphinx
@@ -5,7 +5,7 @@
  # out-of-tree builds.
  
  # You can set these variables from the command line.
-SPHINXOPTS    = -n -W
+SPHINXOPTS    = -n -W -v
  SPHINXBUILD   = sphinx-build
  PAPER         =
  BUILDDIR      = _build
diff --git a/libcxx/docs/index.rst b/libcxx/docs/index.rst

index 3526b47210f0173fb044a1f89d33e67c003ea638..e4b3a879da9436524796a706fec04ebdb21ec0cb 100644 (file)
--- a/libcxx/docs/index.rst
+++ b/libcxx/docs/index.rst
@@ -134,6 +134,7 @@ Design Documents
     DesignDocs/ABIVersioning
     DesignDocs/VisibilityMacros
     DesignDocs/ThreadingSupportAPI
+   DesignDocs/FileTimeType
  
  * `<atomic> design <http://libcxx.llvm.org/atomic_design.html>`_
  * `<type_traits> design <http://libcxx.llvm.org/type_traits_design.html>`_
author	Eric Fiselier <eric@efcs.ca>
	Wed, 25 Jul 2018 02:53:53 +0000 (02:53 +0000)
committer	Eric Fiselier <eric@efcs.ca>
	Wed, 25 Jul 2018 02:53:53 +0000 (02:53 +0000)
libcxx/docs/DesignDocs/FileTimeType.rst	[new file with mode: 0644]	patch \| blob
libcxx/docs/Makefile.sphinx		patch \| blob \| history
libcxx/docs/index.rst		patch \| blob \| history