libstdc++: Reduce monotonic_buffer_resource overallocation [PR 96942]
The primary reason for this change is to reduce the size of buffers allocated by std::pmr::monotonic_buffer_resource. Previously, a new buffer would always add the size of the linked-list node (11 bytes) and then round up to the next power of two. This results in a huge increase if the expected size of the next buffer is already a power of two. For example, if the resource is constructed with a desired initial size of 4096, the first buffer it allocates will be std::bit_ceil(4096+11), which is 8192. If the user has carefully selected the initial size to match their expected memory requirements, then allocating double that amount wastes a lot of memory. After this patch the allocated size is rounded up to a 64-byte boundary instead of to a power of two, so for an initial size of 4096 only 4160 bytes get allocated.

Previously only the base-2 logarithm of the size was stored, which fits in a single 8-bit integer. Now that the size isn't always a power of two, more bits are needed to store it. Because the size is always a multiple of 64, the low six bits are not needed, so we can use the same approach the pool resources already use: store the base-2 logarithm of the alignment in the low bits that are not used for the size.

To avoid code duplication, this patch introduces a new aligned_size<N> helper class, which is then used by both the pool resources' big_block type and the monotonic_buffer_resource::_Chunk type. Originally the big_block type used two bit-fields to store the size and alignment in the space of a single size_t member. The aligned_size type uses a single size_t member and manipulates the size and alignment values with masks and bitwise operations. This results in better code than the old version, because the bit-fields weren't optimally ordered for little-endian architectures: the alignment was actually stored in the high bits, not the unused low bits, requiring additional shifts to calculate the values. Using bitwise operations directly avoids needing to reorder the bit-fields depending on the endianness.

While adapting the _Chunk and big_block types to use aligned_size<N>, I also added checks for size overflow (technically, unsigned wraparound). The memory resources now ensure that when they require an allocation that is too large to represent in size_t, they request SIZE_MAX bytes from the upstream resource rather than requesting a small value that results from wraparound. The testsuite is enhanced to verify this.

libstdc++-v3/ChangeLog:

        PR libstdc++/96942
        * include/std/memory_resource (monotonic_buffer_resource::do_allocate):
        Use __builtin_expect when checking if a new buffer needs to be
        allocated from the upstream resource, and for checks for edge cases
        like zero sized buffers and allocations.
        * src/c++17/memory_resource.cc (aligned_size): New class template.
        (aligned_ceil): New helper function to round up to a given alignment.
        (monotonic_buffer_resource::chunk): Replace _M_size and _M_align
        with an aligned_size member. Remove _M_canary member. Change _M_next
        to pointer instead of unaligned buffer.
        (monotonic_buffer_resource::chunk::allocate): Round up to multiple
        of 64 instead of to power of two. Check for size overflow. Remove
        redundant check for minimum required alignment.
        (monotonic_buffer_resource::chunk::release): Adjust for changes to
        data members.
        (monotonic_buffer_resource::_M_new_buffer): Use aligned_ceil.
        (big_block): Replace _M_size and _M_align with aligned_size member.
        (big_block::big_block): Check for size overflow.
        (big_block::size, big_block::align): Adjust to use aligned_size.
        (big_block::alloc_size): Use aligned_ceil.
        (munge_options): Use aligned_ceil.
        (__pool_resource::allocate): Use big_block::align for alignment.
        * testsuite/20_util/monotonic_buffer_resource/allocate.cc: Check
        upstream resource gets expected values for impossible sizes.
        * testsuite/20_util/unsynchronized_pool_resource/allocate.cc:
        Likewise. Adjust checks for expected alignment in existing test.
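The before/after figures quoted above can be reproduced with a short standalone C++20 program. This is purely illustrative and not part of the patch: the 16-byte chunk header used for the new calculation is an assumption about an LP64 target, and aligned_ceil is re-created locally rather than taken from the library internals.

    // Sketch: compare the old power-of-two rounding with the new 64-byte rounding.
    #include <bit>
    #include <cstddef>
    #include <cstdio>

    constexpr std::size_t aligned_ceil(std::size_t n, std::size_t alignment)
    { return (n + alignment - 1) & ~(alignment - 1); }

    int main()
    {
      constexpr std::size_t initial = 4096; // desired initial buffer size
      constexpr std::size_t old_hdr = 11;   // old _Chunk bookkeeping (per commit message)
      constexpr std::size_t new_hdr = 16;   // assumed new _Chunk size on LP64

      std::size_t before = std::bit_ceil(initial + old_hdr);     // 8192
      std::size_t after  = aligned_ceil(initial + new_hdr, 64);  // 4160

      std::printf("before: %zu bytes, after: %zu bytes\n", before, after);
    }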
parent f40866967d
commit 1e718ec51a
4 changed files with 203 additions and 69 deletions
include/std/memory_resource

@@ -636,11 +636,11 @@ namespace pmr
     void*
     do_allocate(size_t __bytes, size_t __alignment) override
     {
-      if (__bytes == 0)
+      if (__builtin_expect(__bytes == 0, false))
        __bytes = 1; // Ensures we don't return the same pointer twice.
 
       void* __p = std::align(__alignment, __bytes, _M_current_buf, _M_avail);
-      if (!__p)
+      if (__builtin_expect(__p == nullptr, false))
        {
          _M_new_buffer(__bytes, __alignment);
          __p = _M_current_buf;
@@ -671,7 +671,7 @@ namespace pmr
     static size_t
     _S_next_bufsize(size_t __buffer_size) noexcept
     {
-      if (__buffer_size == 0)
+      if (__builtin_expect(__buffer_size == 0, false))
        __buffer_size = 1;
       return __buffer_size * _S_growth_factor;
     }
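Both hunks above use __builtin_expect to mark the zero-size and allocation-failure paths as cold. A minimal sketch of the pattern follows; it is illustrative only, with an assumed growth factor of 2 rather than the library's _S_growth_factor.

    // __builtin_expect(cond, false) tells GCC the branch is rarely taken,
    // so the fix-up for a zero-sized request is laid out as the cold path.
    #include <cstddef>

    std::size_t next_bufsize(std::size_t buffer_size) noexcept
    {
      if (__builtin_expect(buffer_size == 0, false))
        buffer_size = 1;          // rare: only for a zero-sized request
      return buffer_size * 2;     // growth factor of 2 assumed here
    }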
src/c++17/memory_resource.cc

@@ -175,6 +175,47 @@ namespace pmr
   // versions will not use this symbol.
   monotonic_buffer_resource::~monotonic_buffer_resource() { release(); }
 
+  namespace {
+
+  // aligned_size<N> stores the size and alignment of a memory allocation.
+  // The size must be a multiple of N, leaving the low log2(N) bits free
+  // to store the base-2 logarithm of the alignment.
+  // For example, allocate(1024, 32) is stored as 1024 + log2(32) = 1029.
+  template<unsigned N>
+    struct aligned_size
+    {
+      // N must be a power of two
+      static_assert( std::__popcount(N) == 1 );
+
+      static constexpr size_t _S_align_mask = N - 1;
+      static constexpr size_t _S_size_mask = ~_S_align_mask;
+
+      constexpr
+      aligned_size(size_t sz, size_t align) noexcept
+      : value(sz | (std::__bit_width(align) - 1u))
+      {
+        __glibcxx_assert(size() == sz); // sz must be a multiple of N
+      }
+
+      constexpr size_t
+      size() const noexcept
+      { return value & _S_size_mask; }
+
+      constexpr size_t
+      alignment() const noexcept
+      { return size_t(1) << (value & _S_align_mask); }
+
+      size_t value; // size | log2(alignment)
+    };
+
+  // Round n up to a multiple of alignment, which must be a power of two.
+  constexpr size_t aligned_ceil(size_t n, size_t alignment)
+  {
+    return (n + alignment - 1) & ~(alignment - 1);
+  }
+
+  } // namespace
+
   // Memory allocated by the upstream resource is managed in a linked list
   // of _Chunk objects. A _Chunk object recording the size and alignment of
   // the allocated block and a pointer to the previous chunk is placed
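The encoding used by aligned_size<N> can be checked in isolation. The sketch below re-creates it with the public <bit> functions (std::popcount, std::bit_width) instead of the internal __popcount/__bit_width helpers, just to verify the 1024 + log2(32) = 1029 example from the comment above; it is not the library type itself.

    #include <bit>
    #include <cstddef>

    // Local re-creation of the aligned_size<N> encoding, for illustration only.
    template<unsigned N>
    struct aligned_size_sketch
    {
      static_assert(std::popcount(N) == 1, "N must be a power of two");

      static constexpr std::size_t align_mask = N - 1;
      static constexpr std::size_t size_mask = ~align_mask;

      constexpr aligned_size_sketch(std::size_t sz, std::size_t align) noexcept
      : value(sz | (std::bit_width(align) - 1u)) { }

      constexpr std::size_t size() const noexcept
      { return value & size_mask; }

      constexpr std::size_t alignment() const noexcept
      { return std::size_t(1) << (value & align_mask); }

      std::size_t value; // size | log2(alignment)
    };

    int main()
    {
      constexpr aligned_size_sketch<64> s(1024, 32);
      static_assert(s.value == 1029);     // 1024 | log2(32)
      static_assert(s.size() == 1024);
      static_assert(s.alignment() == 32);
    }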
@@ -189,23 +230,26 @@ namespace pmr
       allocate(memory_resource* __r, size_t __size, size_t __align,
                _Chunk*& __head)
       {
-       __size = std::__bit_ceil(__size + sizeof(_Chunk));
+       const size_t __orig_size = __size;
 
-       if constexpr (alignof(_Chunk) > 1)
+       // Add space for the _Chunk object and round up to 64 bytes.
+       __size = aligned_ceil(__size + sizeof(_Chunk), 64);
+
+       // Check for unsigned wraparound
+       if (__size < __orig_size) [[unlikely]]
          {
-           // PR libstdc++/90046
-           // For targets like epiphany-elf where alignof(_Chunk) != 1
-           // ensure that the last sizeof(_Chunk) bytes in the buffer
-           // are suitably-aligned for a _Chunk.
-           // This should be unnecessary, because the caller already
-           // passes in max(__align, alignof(max_align_t)).
-           if (__align < alignof(_Chunk))
-             __align = alignof(_Chunk);
+           // monotonic_buffer_resource::do_allocate is not allowed to throw.
+           // If the required size is too large for size_t then ask the
+           // upstream resource for an impossibly large size and alignment.
+           __size = -1;
+           __align = ~(size_t(-1) >> 1);
          }
 
        void* __p = __r->allocate(__size, __align);
 
        // Add a chunk defined by (__p, __size, __align) to linked list __head.
+       // We know the end of the buffer is suitably-aligned for a _Chunk
+       // because the caller ensured __align is at least alignof(max_align_t).
        void* const __back = (char*)__p + __size - sizeof(_Chunk);
        __head = ::new(__back) _Chunk(__size, __align, __head);
        return { __p, __size - sizeof(_Chunk) };
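The wraparound handling added to _Chunk::allocate follows a simple pattern: round the request up to cover the chunk header, and if the unsigned addition wrapped past zero, substitute an impossibly large size and alignment so the upstream resource is guaranteed to fail cleanly. A standalone sketch of that pattern (hypothetical helper, not the library code):

    #include <cstddef>
    #include <cstdint>
    #include <utility>

    // Round the request up to cover the chunk header; on wraparound, request
    // SIZE_MAX bytes with the largest power-of-two alignment so the upstream
    // resource must refuse the allocation.
    std::pair<std::size_t, std::size_t>
    chunk_request(std::size_t size, std::size_t align, std::size_t header) noexcept
    {
      const std::size_t orig_size = size;
      size = (size + header + 63) & ~std::size_t(63);  // round up to 64

      if (size < orig_size)                            // unsigned wraparound
        {
          size = SIZE_MAX;
          align = ~(SIZE_MAX >> 1);                    // highest power of two
        }
      return { size, align };
    }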
@@ -220,16 +264,9 @@ namespace pmr
        while (__next)
          {
            _Chunk* __ch = __next;
-           __builtin_memcpy(&__next, __ch->_M_next, sizeof(_Chunk*));
-
-           __glibcxx_assert(__ch->_M_canary != 0);
-           __glibcxx_assert(__ch->_M_canary == (__ch->_M_size|__ch->_M_align));
-
-           if (__ch->_M_canary != (__ch->_M_size | __ch->_M_align))
-             return; // buffer overflow detected!
-
-           size_t __size = (size_t)1 << __ch->_M_size;
-           size_t __align = (size_t)1 << __ch->_M_align;
+           __next = __ch->_M_next;
+           size_t __size = __ch->_M_size.size();
+           size_t __align = __ch->_M_size.alignment();
            void* __start = (char*)(__ch + 1) - __size;
            __r->deallocate(__start, __size, __align);
          }
@@ -237,24 +274,18 @@ namespace pmr
 
     private:
       _Chunk(size_t __size, size_t __align, _Chunk* __next) noexcept
-      : _M_size(std::__bit_width(__size) - 1),
-       _M_align(std::__bit_width(__align) - 1)
-      {
-       __builtin_memcpy(_M_next, &__next, sizeof(__next));
-       _M_canary = _M_size | _M_align;
-      }
+      : _M_size(__size, __align), _M_next(__next)
+      { }
 
-      unsigned char _M_canary;
-      unsigned char _M_size;
-      unsigned char _M_align;
-      unsigned char _M_next[sizeof(_Chunk*)];
+      aligned_size<64> _M_size;
+      _Chunk* _M_next;
     };
 
   void
   monotonic_buffer_resource::_M_new_buffer(size_t bytes, size_t alignment)
   {
     const size_t n = std::max(bytes, _M_next_bufsiz);
-    const size_t m = std::max(alignment, alignof(std::max_align_t));
+    const size_t m = aligned_ceil(alignment, alignof(std::max_align_t));
     auto [p, size] = _Chunk::allocate(_M_upstream, n, m, _M_head);
     _M_current_buf = p;
     _M_avail = size;
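The "11 bytes" figure in the commit message comes from the old _Chunk layout shown above (three unsigned chars plus an unaligned pointer stored as a byte array). A sketch with local stand-in types compares the two layouts; the concrete numbers assume an LP64 target.

    #include <cstddef>
    #include <cstdio>

    // Local stand-ins for the two _Chunk layouts, for size comparison only.
    struct old_chunk_layout
    {
      unsigned char canary;
      unsigned char size_log2;
      unsigned char align_log2;
      unsigned char next[sizeof(void*)];  // unaligned pointer bytes
    };

    struct new_chunk_layout
    {
      std::size_t size_and_align;  // aligned_size<64>: size | log2(alignment)
      void* next;
    };

    int main()
    {
      // On an LP64 target: 11 bytes vs 16 bytes.  The buffer size is rounded
      // up to a multiple of 64 anyway, e.g. 4096 + 16 -> 4160.
      std::printf("old: %zu, new: %zu\n",
                  sizeof(old_chunk_layout), sizeof(new_chunk_layout));
    }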
@@ -550,49 +581,43 @@ namespace pmr
   // An oversized allocation that doesn't fit in a pool.
   struct big_block
   {
-    // Alignment must be a power-of-two so we only need to use enough bits
-    // to store the power, not the actual value:
-    static constexpr unsigned _S_alignbits
-      = std::__bit_width((unsigned)numeric_limits<size_t>::digits - 1);
-    // Use the remaining bits to store the size:
-    static constexpr unsigned _S_sizebits
-      = numeric_limits<size_t>::digits - _S_alignbits;
-    // The maximum value that can be stored in _S_size
-    static constexpr size_t all_ones = size_t(-1) >> _S_alignbits;
-    // The minimum size of a big block (smaller sizes will be rounded up).
-    static constexpr size_t min = 1u << _S_alignbits;
+    // The minimum size of a big block.
+    // All big_block allocations will be a multiple of this value.
+    // Use bit_ceil to get a power of two even for e.g. 20-bit size_t.
+    static constexpr size_t min = __bit_ceil(numeric_limits<size_t>::digits);
 
     constexpr
     big_block(size_t bytes, size_t alignment)
-    : _M_size(alloc_size(bytes) >> _S_alignbits),
-      _M_align_exp(std::__bit_width(alignment) - 1u)
-    { }
+    : _M_size(alloc_size(bytes), alignment)
+    {
+      // Check for unsigned wraparound
+      if (size() < bytes) [[unlikely]]
+        {
+          // (sync|unsync)_pool_resource::do_allocate is not allowed to throw.
+          // If the required size is too large for size_t then ask the
+          // upstream resource for an impossibly large size and alignment.
+          _M_size.value = -1;
+        }
+    }
 
     void* pointer = nullptr;
-    size_t _M_size : numeric_limits<size_t>::digits - _S_alignbits;
-    size_t _M_align_exp : _S_alignbits;
+    aligned_size<min> _M_size;
 
     size_t size() const noexcept
     {
-      // If all bits are set in _M_size it means the maximum possible size:
-      if (__builtin_expect(_M_size == (size_t(-1) >> _S_alignbits), false))
-       return (size_t)-1;
-      else
-       return _M_size << _S_alignbits;
+      if (_M_size.value == size_t(-1)) [[unlikely]]
+       return size_t(-1);
+      return _M_size.size();
     }
 
-    size_t align() const noexcept { return size_t(1) << _M_align_exp; }
+    size_t align() const noexcept
+    { return _M_size.alignment(); }
 
     // Calculate size to be allocated instead of requested number of bytes.
     // The requested value will be rounded up to a multiple of big_block::min,
-    // so the low _S_alignbits bits are all zero and don't need to be stored.
+    // so the low bits are all zero and can be used to hold the alignment.
     static constexpr size_t alloc_size(size_t bytes) noexcept
-    {
-      const size_t s = bytes + min - 1u;
-      if (__builtin_expect(s < bytes, false))
-       return size_t(-1); // addition wrapped past zero, return max value
-      else
-       return s & ~(min - 1u);
-    }
+    { return aligned_ceil(bytes, min); }
 
     friend bool operator<(void* p, const big_block& b) noexcept
     { return less<void*>{}(p, b.pointer); }
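A few compile-time checks of the big_block size arithmetic shown above, written with the public std::bit_ceil rather than the internal __bit_ceil helper; the value 64 for big_block::min assumes a 64-bit size_t.

    #include <bit>
    #include <cstddef>
    #include <limits>

    constexpr std::size_t aligned_ceil(std::size_t n, std::size_t alignment)
    { return (n + alignment - 1) & ~(alignment - 1); }

    // big_block::min for this target: a power of two no smaller than the
    // number of bits in size_t (64 on an LP64 target).
    constexpr std::size_t big_block_min
      = std::bit_ceil(std::size_t(std::numeric_limits<std::size_t>::digits));

    static_assert(aligned_ceil(1, big_block_min) == big_block_min);
    static_assert(aligned_ceil(1000, 64) == 1024);  // rounded up to a multiple of min
    static_assert(aligned_ceil(1024, 64) == 1024);  // already a multiple

    int main() { }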
@@ -895,9 +920,8 @@ namespace pmr
     {
       // Round to preferred granularity
       static_assert(std::__has_single_bit(pool_sizes[0]));
-      constexpr size_t mask = pool_sizes[0] - 1;
-      opts.largest_required_pool_block += mask;
-      opts.largest_required_pool_block &= ~mask;
+      opts.largest_required_pool_block
+       = aligned_ceil(opts.largest_required_pool_block, pool_sizes[0]);
     }
 
   if (opts.largest_required_pool_block < big_block::min)
@@ -964,7 +988,9 @@ namespace pmr
     auto& b = _M_unpooled.emplace_back(bytes, alignment);
     __try {
       // N.B. need to allocate b.size(), which might be larger than bytes.
-      void* p = resource()->allocate(b.size(), alignment);
+      // Also use b.align() instead of alignment parameter, which will be
+      // an impossibly large value if (bytes+bookkeeping) > SIZE_MAX.
+      void* p = resource()->allocate(b.size(), b.align());
       b.pointer = p;
       if (_M_unpooled.size() > 1)
        {
testsuite/20_util/monotonic_buffer_resource/allocate.cc

@@ -210,6 +210,51 @@ test06()
   }
 }
 
+void
+test07()
+{
+  // Custom exception thrown on expected allocation failure.
+  struct very_bad_alloc : std::bad_alloc { };
+
+  struct careful_resource : __gnu_test::memory_resource
+  {
+    void* do_allocate(std::size_t bytes, std::size_t alignment)
+    {
+      // pmr::monotonic_buffer_resource::do_allocate is not allowed to
+      // throw an exception when asked for an allocation it can't satisfy.
+      // The libstdc++ implementation will ask upstream to allocate
+      // bytes=SIZE_MAX and alignment=bit_floor(SIZE_MAX) instead of throwing.
+      // Verify that we got those values:
+      if (bytes != std::numeric_limits<std::size_t>::max())
+        VERIFY( !"upstream allocation should request maximum number of bytes" );
+      if (alignment != (1 + std::numeric_limits<std::size_t>::max() / 2))
+        VERIFY( !"upstream allocation should request maximum alignment" );
+
+      // A successful failure:
+      throw very_bad_alloc();
+    }
+  };
+
+  careful_resource cr;
+  std::pmr::monotonic_buffer_resource mbr(&cr);
+  try
+  {
+    // Try to allocate a ridiculous size:
+    void* p = mbr.allocate(std::size_t(-2), 1);
+    // Should not reach here!
+    VERIFY( !"attempt to allocate SIZE_MAX-1 should not have succeeded" );
+    throw p;
+  }
+  catch (const very_bad_alloc&)
+  {
+    // Should catch this exception from careful_resource::do_allocate
+  }
+  catch (const std::bad_alloc&)
+  {
+    VERIFY( !"monotonic_buffer_resource::do_allocate is not allowed to throw" );
+  }
+}
+
 int
 main()
 {
@@ -219,4 +264,5 @@ main()
   test04();
   test05();
   test06();
+  test07();
 }
testsuite/20_util/unsynchronized_pool_resource/allocate.cc

@@ -189,7 +189,16 @@ test06()
       if (bytes < expected_size)
        throw bad_size();
       else if (align != expected_alignment)
-       throw bad_alignment();
+       {
+         if (bytes == std::numeric_limits<std::size_t>::max()
+               && align == (1 + std::numeric_limits<std::size_t>::max() / 2))
+           {
+             // Pool resources request bytes=SIZE_MAX && align=bit_floor(SIZE_MAX)
+             // when they are unable to meet an allocation request.
+           }
+         else
+           throw bad_alignment();
+       }
       // Else just throw, don't really try to allocate:
       throw std::bad_alloc();
     }
@@ -239,6 +248,58 @@ test06()
   }
 }
 
+void
+test07()
+{
+  // Custom exception thrown on expected allocation failure.
+  struct very_bad_alloc : std::bad_alloc { };
+
+  struct careful_resource : __gnu_test::memory_resource
+  {
+    void* do_allocate(std::size_t bytes, std::size_t alignment)
+    {
+      // Need to allow normal allocations for the pool resource's internal
+      // data structures:
+      if (alignment < 1024)
+        return __gnu_test::memory_resource::do_allocate(bytes, alignment);
+
+      // pmr::unsynchronized_pool_resource::do_allocate is not allowed to
+      // throw an exception when asked for an allocation it can't satisfy.
+      // The libstdc++ implementation will ask upstream to allocate
+      // bytes=SIZE_MAX and alignment=bit_floor(SIZE_MAX) instead of throwing.
+      // Verify that we got those values:
+      if (bytes != std::numeric_limits<size_t>::max())
+        VERIFY( !"upstream allocation should request SIZE_MAX bytes" );
+      if (alignment != (1 + std::numeric_limits<size_t>::max() / 2))
+        VERIFY( !"upstream allocation should request SIZE_MAX/2 alignment" );
+
+      // A successful failure:
+      throw very_bad_alloc();
+    }
+  };
+
+  careful_resource cr;
+  std::pmr::unsynchronized_pool_resource upr(&cr);
+  try
+  {
+    // Try to allocate a ridiculous size (and use a large extended alignment
+    // so that careful_resource::do_allocate can distinguish this allocation
+    // from any required for the pool resource's internal data structures):
+    void* p = upr.allocate(std::size_t(-2), 1024);
+    // Should not reach here!
+    VERIFY( !"attempt to allocate SIZE_MAX-1 should not have succeeded" );
+    throw p;
+  }
+  catch (const very_bad_alloc&)
+  {
+    // Should catch this exception from careful_resource::do_allocate
+  }
+  catch (const std::bad_alloc&)
+  {
+    VERIFY( !"unsynchronized_pool_resource::do_allocate is not allowed to throw" );
+  }
+}
+
 int
 main()
 {
@@ -248,4 +309,5 @@ main()
   test04();
   test05();
   test06();
+  test07();
 }