Loading...
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 | # String handling in xnu xnu implements most POSIX C string functions, including the inherited subset of standard C string functions. Unfortunately, poor design choices have made many of these functions, including the more modern `strl` functions, confusing or unsafe. In addition, the advent of -fbounds-safety support in xnu is forcing some string handling practices to be revisited. This document explains the failings of POSIX C string functions, xnu's `strbuf` functions, and their intersection with the -fbounds-safety C extension. ## The short-form guidance * Use `strbuf*` when you have the length for all the strings; * use `strl*` when you have the length of _one_ string, and the other is guaranteed to be NUL-terminated; * use `str*` when you don't have the length for any of the strings, and they are all guaranteed to be NUL-terminated; * stop using `strn*` functions. ## Replacing `strncmp` `strncmp` is always wrong with -fbounds-safety, and it's unavailable as a result. Given `strcmp(first, secnd, n)`, you need to know the types of `first` and `secnd` to pick a replacement. Choose according to this table: | strncmp(first, secnd, n) | __null_terminated first | __indexable first | | ------------------------ | ------------------------- | ------------------------------- | | __null_terminated secnd | n/a | strlcmp(first, secnd, n1) | | __indexable secnd | strlcmp(secnd, first, n2) | strbufcmp(first, n1, secnd, n2) | Using `strncmp` with two NUL-terminated strings is uncommon and it has no direct replacement. The first person who needs to use -fbounds-safety in a file that does this might need to write the string function. If you try to use `strlcmp` and you get a diagnostic like this: > passing 'const char *__indexable' to parameter of incompatible type > 'const char *__null_terminated' is an unsafe operation ... then you might need to swap the two string arguments. `strlcmp` is sensitive to the argument order: just like for `strlcpy`, the indexable string goes first. # The problems with string functions POSIX/BSD string handling functions come in many variants: * `str` functions (strlen, strcat, etc), unsafe for writing; * `strn` functions (strnlen, strncat, etc), unsafe for writing; * `strl` functions (strlcpy, strlcat, etc), safe but easily misunderstood. `str` functions for writing (`strcpy`, `strcat`, etc) are **all** unsafe because they don't care about the bounds of the output buffer. Most or all of these functions have been deprecated or outright removed from xnu. You should never use `str` functions to write to strings. Functions that simply read strings (`strlen`, `strcmp`, `strchr`, etc) are generally found to be safe because there is no confusion that their input must be NUL-terminated and there is no danger of writing out of bounds (out of not writing at all). `strn` functions for writing (`strncpy`, `strncat`, etc) are **all** unsafe. `strncpy` doesn't NUL-terminate the output buffer, and `strncat` doesn't accept a length for the output buffer. **All** new string buffers should include space for a NUL terminator. `strn` functions for reading (`strncmp`, `strnlen`) are _generally_ safe, but `strncmp` can cause confusion over which string is bound by the given size. In extreme cases, this can create information disclosure bugs or stability issues. `strl` functions, from OpenBSD, only come in writing variants, and they always NUL-terminate their output. This makes the writing part safe. (xnu adds `strl` comparison functions, which do no writing and are also safe.) However, these functions assume the output pointer is a buffer and the input is a NUL- terminated string. Because of coexistence with `strn` functions that make no such assumption, this mental model isn't entirely adopted by many users. For instance, the following code is buggy: ```c char output[4]; char input[8] = "abcdefgh"; /* not NUL-terminated */ strlcpy(output, input, sizeof(output)); ``` `strlcpy` returns the length of the input string; in xnu's implementation, literally by calling `strlen(input)`. Even though only 3 characters are written to `output` (plus a NUL), `input` is read until reaching a NUL character. This is always a problem from the perspective of memory disclosures, and in some cases, it can also lead to stability issues. # Changes with -fbounds-safety When enabling -fbounds-safety, character buffers and NUL-terminated strings are two distinct types, and they do not implicitly convert to each other. This prevents confusing the two in the way that is problematic with `strlcpy`, for instance. However, it creates new problems: * What is the correct way to transform a character buffer into a NUL-terminated string? * When -fbounds-safety flags that the use of a string function was improper, what is the solution? The most common use of character buffers is to build a string, and then this string is passed without bounds as a NUL-terminated string to downstream users. -fbounds-safety and XNU enshrine this practice with the following additions: * `tsnprintf`: like `snprintf`, but it returns a NUL-terminated string; * `strbuf` functions, explicitly accepting character buffers and a distinct count for each: * `strbuflen(buffer, length)`: like `strnlen`; * `strbufcmp(a, alen, b, len)`: like `strcmp`; * `strbufcasecmp(a, alen, b, blen)`: like `strcasecmp`; * `strbufcpy(a, alen, b, blen)`: like `strlcpy` but returns `a` as a NUL- terminated string; * `strbufcat(a, alen, b, blen)`: like `strlcat` but returns `a` as a NUL- terminated string; * `strl` (new) functions, accepting _one_ character buffer of a known size and _one_ NUL-terminated string: * `strlcmp(a, b, alen)`: like `strcmp`; * `strlcasecmp(a, b, alen)`: like `strcasecmp`. `strbuf` functions additionally all have overloads accepting character arrays in lieu of a pointer+length pair: `strbuflen(array)`, `strbufcmp(a, b)`, `strbufcasecmp(a, b)`, `strbufcpy(a, b)`, `strbufcat(a, b)`. If the destination array of `strbufcpy` or `strbufcat` has a size of 0, they return NULL without doing anything else. Otherwise, the destination is always NUL-terminated and returned as a NUL-terminated string pointer. While you are modifying a string, you should reference its data as some flavor of indexable pointer, and only once you're done should you convert it to a NUL-terminated string. NUL-terminated character pointers are generally not suitable for modifications as bounds are determined by contents. Overwriting any NUL character found through a `__null_terminated` pointer access will result in a trap. For instance: ```c void my_string_consuming_func(const char *); // lots of __unsafe! char *__null_terminated my_string = __unsafe_forge_null_terminated( kalloc_data(my_string_size, Z_WAITOK)); memcpy( __unsafe_forge_bidi_indexable(void *, my_string, my_string_size), my_data, my_string_size); my_string_consuming_func(my_string); ``` This code converts the string pointer to a NUL-terminated string too early, while it's still being modified. Keeping my_string a `__null_terminated` pointer while it's being modified leads to more forging, which has more chances of introducing errors, and is less ergonomic. Consider this instead: ```c void my_string_consuming_func(const char *); char *my_buffer = kalloc_data(my_string_size, Z_WAITOK); const char *__null_terminated finished_string = strbufcpy(my_buffer, my_string_size, my_data, my_string_size); my_string_consuming_func(finished); ``` This example has two views of the same data: `my_buffer` (through which the string is being modified) and `finished_string` (which is `const` and NUL-terminated). Using `my_buffer` as an indexable pointer allows you to modify it ergonomically, and importantly, without forging. You turn it into a NUL-terminated string at the same time you turn it into a `const` reference, signalling that you're done making changes. With -fbounds-safety enabled, you should structure the final operation modifying a character array such that you get a NUL-terminated view of it. For instance, this plain C code: ```c char thread_name[MAXTHREADNAMESIZE]; (void) snprintf(thread_name, sizeof(thread_name), "dlil_input_%s", ifp->if_xname); thread_set_thread_name(inp->dlth_thread, thread_name); ``` becomes: ```c char thread_name_buf[MAXTHREADNAMESIZE]; const char *__null_terminated thread_name; thread_name = tsnprintf(thread_name_buf, sizeof(thread_name_buf), "dlil_input_%s", ifp->if_xname); thread_set_thread_name(inp->dlth_thread, thread_name); ``` Although `tsnprintf` and `strbuf` functions return a `__null_terminated` pointer to you for convenience, not all use cases are resolved by calling `tsnprintf` or `strbufcpy` once. As a quick reference, with -fbounds-safety enabled, you can use `__unsafe_null_terminated_from_indexable(p_start, p_nul)` to convert a character array to a `__null_terminated` string if you need to perform more manipulations. (`p_start` is a pointer to the first character, and `p_nul` is a pointer to the NUL character in that string.) For instance, if you build a string with successive calls to `scnprintf`, you would use `__unsafe_null_terminated_from_indexable` at the end of the sequence to get your NUL-terminated string pointer. Occasionally, you need to turn a NUL-terminated string back into "char buffer" (usually to interoperate with copy APIs that need a pointer and a byte count). When possible, it's advised to use APIs that copy NUL-terminated pointers (like `strlcpy`). Otherwise, convert the NUL-terminated string to an indexable buffer using `__null_terminated_to_indexable` (if you don't need the NUL terminator to be in bounds of the result pointer) or `__unsafe_null_terminated_to_indexable` (if you need it). Also keep in mind that in code which pervasively deals with buffers that have lengths and some of them happen to also be NUL-terminated strings, it could be simply more convenient to keep string buffers in some flavor of indexable pointers instead of having conversions from and to NUL-terminated strings. # I have a choice between `strn*`, `strl*`, `strbuf*`. Which one do I use? You might come across cases where the same function in different families would seem like they all do the trick. For instance: ```c struct foo { char buf1[10]; char buf2[16]; }; void bar(struct foo *f) { /* how do I test whether buf1 and buf2 contain the same string? */ if (strcmp(f->buf1, f->buf2) == 0) { /* ... */ } if (strncmp(f->buf1, f->buf2, sizeof(f->buf1)) == 0) { /* ... */ } if (strlcmp(f->buf1, f->buf2, sizeof(f->buf1)) == 0) { /* ... */ } if (strbufcmp(f->buf1, f->buf2) == 0) { /* ... */ } } ``` Without -fbounds-safety, these all work the same, but when you enable it, `strbufcmp` could be the only one that builds. If you do not have the privilege of -fbounds-safety to guide you to the best choice, as a rule of thumb, you should prefer APIs in the following order: 1. `strbuf*` APIs; 2. `strl*` APIs; 3. `str*` APIs. That is, to implement `bar`, you have a choice of `strcmp`, `strncmp` and `strbufcmp`, and you should prefer `strbufcmp`. `strn` functions are **never** recommended. You should use `strbuflen` over `strnlen` (they do the same thing, but having a separate `strbuflen` function makes the guidance to avoid `strn` functions easier), and you should use `strbufcmp`, `strlcmp` or even `strcmp` over `strncmp` (depending on whether you know the length of each string, of just one, or of neither). |