Loading...
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 | # xzone malloc xzone malloc is a memory allocator for Apple OS platforms designed to mitigate heap memory safety vulnerabilities to the maximum extent possible while also achieving excellent performance. It is a part of Apple's [Memory Integrity Enforcement][mie] technology. ## Security features Key security features of xzone malloc include: - Bucketed type isolation - Zero-on-free - Externalized metadata - Probabilistic guard pages - Allocation fronts - MTE support ### Type isolation One of the most important security features of xzone malloc is _bucketed type isolation_. On Apple platforms, clients of the system memory allocator pass information about the associated type for each allocation in addition to size. xzone malloc uses this information to partition the type space into buckets and serve allocations for each bucket from mutually isolated areas of the virtual address space. This makes it impossible to _reliably_ cause allocations that may fall into different buckets to ever share the same virtual address, which disrupts many use-after-free type confusion exploitation techniques that depend on this. The benefit of using the bucketed approach for this mitigation, rather than trying to isolate at the level of individual types, is that the fragmentation impact is significantly lower. There's a security-performance trade-off in the selection of the bucket count, with more buckets being better for security but coming at the cost of increased fragmentation, so deployments of xzone malloc in different contexts are able to choose the best practical option for that context's security significance and memory constraints. Type information used to perform the bucketing of allocations is passed to libmalloc in the form of "type descriptors", which are described in `<malloc/malloc.h>` and come from a number of possible sources: - For manual C/C++ calls to `malloc()`/etc, the [Typed Memory Operations][tmo] [compiler feature][tmo-rfc], which is enabled by default for software in Apple operating systems, infers the allocation type from the passed size expression and rewrites the call to the corresponding `malloc_type_` interface - For C++ `operator new`, Typed Memory Operations uses the type information available via the language and rewrites the call to the typed `operator new` entrypoint - Allocations for objects in the Objective-C and Swift languages have suitable type descriptors synthesized by the runtime - When no type information is supplied directly to the allocator, the caller program counter value is used as a fallback proxy for type xzone malloc's partitioning policy varies according to OS platform, process type and other factors, but normally involves: - A special bucket for "pure data" allocations, i.e. allocations that should not contain pointers - A special bucket for Objective-C object allocations - N general or "pointer" buckets, which contain allocations not falling into any of the special categories The number of general buckets ranges from 1 to 4 depending on configuration. Allocations are assigned randomly to buckets by type using a source of entropy that is stable across executions of a given binary within the same boot, to prevent an attacker to achieve a desired bucketing by repeatedly crashing their process of interest. Allocations for a given (size, type space partition) combination are served from virtual addresses that are isolated from any others for the life of the program, so that it's not possible to reliably cause allocations that may fall into different buckets to ever share the same virtual address. #### Early allocator To mitigate fragmentation caused by type isolation in cases where a particular size-and-type bucket is only lightly utilized, xzone malloc has special functionality for serving "early" allocations. An allocation from a particular bucket is considered early if it's one of the first N allocations of that type in the lifetime of the address space. These allocations generally aren't easily controlled by attackers, so by policy they are allowed to be allocated from a separate, simpler allocator that doesn't enforce the type isolation properties and that is optimized to minimize fragmentation. ### Zero-on-free To reduce the risk of an information leak due to missing initialization of memory, allocations below a size threshold (currently 1KB) are zeroed on free. ### Externalized metadata Almost all of xzone malloc's metadata is stored in portions of the address space that are located separately and at unpredictable offsets from the allocations themselves. This prevents an attacker from turning a general heap memory-safety bug into corruption of the allocator metadata. The only case where xzone malloc uses inline metadata is for free-list linkages, and it takes special steps to defend these from manipulation: when [ARM pointer authentication][pac] is available, each free-list linkage is PAC-signed, with the signature incorporating a sequence number to prevent straightforward replay attacks. ### Probabilistic guard pages When resources permit, xzone malloc probabilistically places inaccessible guard pages between ranges of the heap being used to serve allocations. This is intended to frustrate grooming of the heap and exploitation of out-of-bounds bugs by making it difficult to predict whether any given page will be mapped or used for any particular bucket, even given knowledge about allocations in other pages. ### Allocation fronts To prevent allocations falling into different buckets from being reliably interleaved, xzone malloc partitions the set of buckets into two groups that are each assigned a direction in which to grow within the virtual address space (i.e. a "front"). This ensures that it's not possible to induce a reliable spatial A-before-B placement relationship between any two types A and B that may fall into different buckets. ### MTE support xzone malloc supports [ARM MTE][mte]. When configured to do so on supported hardware, xzone malloc assigns MTE tags to most blocks that are <= 32KB in size. This strongly mitigates exploitation of many use-after-free and overflow bugs. For blocks up to 4KB in size, tags for each block are reassigned on free. The tag-on-free policy is excellent for security and bug-finding, catching use-after-free immediately after deallocation with high probability. For larger blocks up to 32KB in size, tags are reassigned on allocation. The tag-on-alloc policy is better for performance at larger sizes, although it comes at the cost of allowing use-after-free accesses to free blocks before they're next reallocated. That allows an attacker to use use-after-free accesses to a block for scratch space, but they are still prevented from exploiting use-after-free type confusion since that requires the block to be reused. When assigning tags, the tags of the previous incarnation of the block as well as its neighbors are excluded (except, as a current implementation detail, at page boundaries). Under the default policy, allocations that are "pure data", i.e. allocations whose type information indicates that they contain no pointers, are not tagged. xzone malloc can be configured to also tag them via entitlement. ## Configuration xzone malloc's configuration can be modified via mechanisms including entitlements and environment variables in cases where a different security or performance profile than the platform default is needed. ### Hardened heap The "hardened heap" configuration of xzone malloc enables some security features and behavior that are too costly to be part of the platform defaults, but provide valuable additional protection in especially security-sensitive processes. It is engaged: - By the `com.apple.security.hardened-process.hardened-heap` [entitlement][hardened-heap], which is part of the set included by the [Enhanced Security][enhanced-security] capability in Xcode - By the entitlements in the `com.apple.developer.web-browser-engine` family - And by default for certain Apple operating system processes On all platforms, the probabilistic guard pages feature is enabled by the hardened heap configuration. On iOS and watchOS, the hardened heap configuration increases the number of general type isolation buckets. ### MTE xzone malloc's support for MTE is engaged in processes that have MTE enabled. MTE is enabled for a process via the `com.apple.security.hardened-process.checked-allocations` [entitlement][checked-allocations]. Tagging of pure-data allocations up to the tagging size threshold is enabled via the `com.apple.security.hardened-process.checked-allocations.enable-pure-data` [entitlement][enable-pure-data]. ## Design overview An individual allocation served by xzone malloc is a **block**. The finest granularity of virtual memory at which xzone malloc normally manages metadata is a **slice**. This is typically one operating system virtual page, which is 16KB on most Apple platforms. A **span** is a contiguous range of one or more slices. A span that is currently being used to serve one or more blocks is a **chunk**. A span that isn't currently in use is a **free span**. A reservation of virtual address space from which chunks are allocated is a **segment**. The standard segment size in xzone malloc is 4MB. Each segment has its own **segment metadata array**, which is an array with an entry for each slice in the segment containing the metadata (e.g. free-list head or bitmap) for that slice. xzone malloc uses different strategies to serve allocations depending on their size and its configuration. For allocations that are around the standard segment size, xzone malloc allocates a private segment for each allocation, so there is a 1:1:1 relationship between block : chunk : segment. This is the `HUGE` allocation strategy. Allocations that are greater than the slice size are served from standard-sized segments that are subdivided into smaller chunks. In this case, the block : chunk relationship is 1:1, and the chunk : segment relationship is many-to-1. This is the `LARGE` allocation strategy. The _segment layer_ of the allocator is responsible for keeping track of the set of segments in use and which spans within each are free or in use. The **segment tables** are a simple sparse directly-indexed data structure that map each segment-sized granule of the virtual address space to the location of the segment metadata for that segment if one is present. They allow for fast lookup for the metadata associated with any address in the virtual address space, and the indirection they provide enables segment metadata to be kept separate from the segment bodies. The set of all segments can be enumerated by walking the tables. The **segment group** is the entity that tracks the set of free spans across the set of segments that can be used for a particular purpose, and it does so with an array of size-segregated **span queues**. The span queues are maintained by a straightforward split-and-coalesce allocator approach: - Allocation from a segment group takes the smallest available span that will serve the request, splitting off any unneeded remainder size and re-enqueuing it on the appropriate smaller span-queue - Deallocation to a segment group checks the state of the adjacent spans, coalescing with either if they're free and enqueuing the resulting coalesced span to the span queue of the corresponding size For sizes smaller than the slice size, xzone malloc uses a slab allocator design organized around an abstraction called an **xzone**: - An xzone is a slab allocator that serves allocations of a single size - An xzone serves allocations by splitting chunks (i.e. slabs) into equally-sized blocks of the xzone's block size - Each chunk maintains metadata about which blocks within the chunk are allocated and free - xzones keep track of the set of chunks that belong to them, including "current" chunks being used to serve new allocations, partially-full chunks that can be used to serve additional allocations, and empty chunks, which hold no allocations but must be kept isolated to the xzone in order to maintain type isolation - When an xzone has no existing chunks that can be used to serve a new allocation request, it allocates a new chunk from the segment group it is configured with The name "xzone" is short for "xnu-style zone", since they are intended to be similar in concept to the zones in xnu's zone allocator. The "x" qualifier is necessary because "zone" already has a distinct meaning in Darwin's userspace libmalloc, referring to the allocator instances that are interacted with via the `malloc_zone*` interfaces. The sizes served by xzones are quantized into discrete size classes referred to as **bins**. Each bin is served by a set of xzones according to the type isolation policy. On allocation, xzone malloc computes the bin for the allocation from the requested size, and then computes which of the bin's xzones to use based on the supplied type information. xzones use either a **free-list** or **bitmap** approach to track the allocated and free blocks within each of their chunks. There are a few different types of xzones: - The `TINY` xzones are used to serve allocations that are <= 4KB in size, from single-slice chunks. They maintain per-chunk free-lists that are manipulated atomically. - The `SMALL` xzones are normally used to serve allocations where `4KB < size <= 32KB`, from 4-slice chunks. They maintain per-chunk bitmaps of free blocks and use chunk-level locking for synchronization. - In higher-performance configurations, `SMALL_FREELIST` xzones are used to serve `4KB < size <= 32KB` allocations from 8-slice chunks, using the same atomic free-list approach as `TINY`. ## Performance features ### Deferred reclaim One of the traditional trade-offs balanced by a memory allocator is between the memory cost of holding on to pages that have become empty but might be used again in the future and the CPU cost of decommitting those pages so that they can be reused by the rest of the operating system. Allocators prioritizing speed tend to hold on to such pages, while allocators prioritizing memory efficiency tend to give them back. Darwin has introduced a new primitive for decommitting pages that aims to reduce the need for this trade-off by reducing the cost of the operation, called **deferred reclaim**. Rather than a synchronous syscall like the previous `madvise(MADV_FREE_REUSABLE)` primitive, deferred reclaim uses a shared memory ringbuffer between the kernel and the userspace process into which userspace can enqueue entries describing ranges of virtual address space to be decommitted. The kernel monitors process-local and system-wide memory conditions to determine whether and when to drain from this ringbuffer and actually decommit pages in it. When userspace wants to re-commit a range that was previous enqueued, it can try to remove the entry that it placed in the ring, and if the kernel hasn't reclaimed it yet, no syscalls were required on either the decommit or recommit side of the operation. xzone malloc uses this primitive to tell the kernel as eagerly as possible about all of the pages available to be reclaimed while avoiding the expensive syscalls traditionally required to do so, improving both CPU performance and memory efficiency. ### Contention detection and thread caching Memory allocators often make use of per-thread or per-CPU caches of blocks to enable their fast paths to be simple and scale well for multi-core workloads. However, this technique also generally results in increased fragmentation, as opportunities for block reuse are limited by the thread/CPU separation. xzone malloc implements two performance features to balance between these: - **Contention detection** is applied for all xzones: - Each xzone starts out using a single current chunk - Contention on metadata updates is monitored (via metadata atomic compare-exchange failures), and if a threshold level of contention is reached, the xzone upgrades to using per-CPU (or per asymmetric multiprocessing cluster, on Apple Silicon hardware) multiple current chunks - **Thread caching** is applied for xzones serving very small block sizes: - Each xzone starts out with no thread-level cache - Allocation volume and metadata contention are monitored for each xzone on each thread, and if a threshold level of either is reached, a thread-level cache is brought up Both of these features default to a memory-efficient starting configuration, and transition to a higher-performance configuration in response to observed runtime conditions. ## Ancestry xzone malloc is partly derived from the [mimalloc][mimalloc] allocator. At the beginning of its development, the design and implementation of mimalloc was used as a starting point that provided solutions to many of the basic/fundamental problems an allocator needs to solve. Many of the concepts and terminology in xzone malloc are inherited from mimalloc: - mimalloc also reserves virtual memory in **segments** that have an associated **segment metadata array** - mimalloc's finest unit of virtual memory management is also **slices** - mimalloc's **pages** are xzone malloc's **chunks** - mimalloc's **bins** for size classes are the same as xzone malloc's - mimalloc's **heaps** are like **xzones**, managing a set of pages/chunks for allocations of a particular size - mimalloc **tlds** are somewhat like xzone malloc's **segment groups**, in that they maintain **span queues** of free spans across segments From that foundation, xzone malloc diverged by adding and changing aspects of the design focusing on its specific security and performance goals. At the time of this writing (09/25), most of xzone malloc's key security features are not present in mimalloc: - xzones and segment groups are differentiated from mimalloc's heaps and tlds by their support for bucketed type isolation - xzone malloc uses a segment table rather than mimalloc's segment bitmap to allow its metadata to be separated from the contents of the heap - xzone malloc's allocation fronts and guard pages features introduce further obstacles to exploit reliability that there are no direct analogues for in mimalloc - mimalloc does not support ARM MTE xzone malloc's security features are mostly inspired by those in the xnu kernel's memory allocator, [kalloc\_type][kalloc_type], which can be considered its other ancestor. With respect to performance, the main difference between mimalloc and xzone malloc is in the trade-offs they make between speed and fragmentation. Because the goal of xzone malloc was to be used in all processes on Apple OS platforms, including small, long-running operating system services and daemons, it makes a number of choices that prioritize memory efficiency over speed and scalability: - mimalloc's high-level design of using independent per-thread-everything is excellent for speed and scalability, but generally results in higher fragmentation than can be achieved by using centralized structures that are synchronized with locking or atomics. This is why xzones and segment groups in xzone malloc are global rather than per-thread. - mimalloc's philsophy of avoiding specialization of allocation strategies for different ranges of sizes keeps its design simple and fast, but forgoes some significant memory optimization opportunities, like the ability to decommit individual pages within multi-page slabs when they aren't currently occupied by any blocks, which xzone malloc's `SMALL` allocator implements. ## Name The name of the allocator is "xzone malloc". Referring to it just as "xzone" is incorrect. "xzone" is short for "xnu-style zone", in reference to the zone abstraction in xnu's zalloc allocator that they resemble. Plain "zone" already had the separate meaning of "allocator instance" in userspace libmalloc, necessitating the "x" prefix. [mie]: https://security.apple.com/blog/memory-integrity-enforcement/ [tmo]: https://developer.apple.com/documentation/xcode/adopting-type-aware-memory-allocation [tmo-rfc]: https://discourse.llvm.org/t/rfc-typed-allocator-support/79720 [pac]: https://developer.arm.com/documentation/109576/0100/Pointer-Authentication-Code/Introduction-to-PAC [mte]: https://developer.arm.com/documentation/108035/0100/Introduction-to-the-Memory-Tagging-Extension [hardened-heap]: https://developer.apple.com/documentation/bundleresources/entitlements/com.apple.security.hardened-process.hardened-heap [enhanced-security]: https://developer.apple.com/documentation/Xcode/enabling-enhanced-security-for-your-app [checked-allocations]: https://developer.apple.com/documentation/bundleresources/entitlements/com.apple.security.hardened-process.checked-allocations [enable-pure-data]: https://developer.apple.com/documentation/bundleresources/entitlements/com.apple.security.hardened-process.checked-allocations.enable-pure-data [mimalloc]: https://github.com/microsoft/mimalloc [kalloc_type]: https://security.apple.com/blog/towards-the-next-generation-of-xnu-memory-safety/ |