<!DOCTYPE html>
<html>

<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">

<title>What is `Box&lt;str&gt;` and how is it different from `String` in Rust?</title>
<meta name="description" content="Today I and a friend went down a rabbit hole about Rust and how it manages the heap when we use Box, or String, or Vec, and while we were at it, I found out ...">

<link href="https://fonts.googleapis.com/css?family=Secular+One|Nunito|Mononoki" rel="stylesheet">
<link rel="stylesheet" href="/css/main.css">
<link rel="canonical" href="http://localhost:4000/rust-box-str-vs-string/">
<link rel="alternate" type="application/rss+xml" title="mahdi" href="http://localhost:4000/feed.xml" />

<!--<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>-->

<script>
var channel = new BroadcastChannel('egg');

channel.addEventListener('message', message => {
  alert('Got a message from the other tab:\n' + message.data);
});
</script>
</head>

<body>

<header class="site-header">

<h1>
<a class='site-title' href='/'>
mahdi
</a>
</h1>

<nav>
<p>
<a href="/snippets">snippets</a>
<a href="/art">pictures</a>
</p>
<!--<p class='categories'>-->
<!---->
<!---->
<!--<a href="">art</a>-->
<!---->
<!---->
<!---->
<!---->
<!--</p>-->
<p>
<a href='mailto:mdibaiee@pm.me'>email</a>
<a href='https://git.mahdi.blog/mahdi'>git</a>
<a href='https://www.librarything.com/profile/mdibaiee'>librarything</a>
<a href="http://localhost:4000/feed.xml">feed</a>
</p>
</nav>

</header>

<div class="page-content">
<div class="wrapper">
<h1 class="page-heading"></h1>

<div class='post lang-en'>

<div class="post-header">
<h1 class="post-title"><p>What is <code class="language-plaintext highlighter-rouge">Box&lt;str&gt;</code> and how is it different from <code class="language-plaintext highlighter-rouge">String</code> in Rust?</p>
</h1>

<h2 class="post-subtitle"><p>Using <code class="language-plaintext highlighter-rouge">rust-lldb</code> to understand Rust memory internals</p>
</h2>

<p class="post-meta">
<span>Jun 16, 2022</span>

• <span>Reading time: 13 minutes</span>
</p>
</div>

<article class="post-content">
<p>Today a friend and I went down a rabbit hole about Rust and how it manages the heap when we use <code class="language-plaintext highlighter-rouge">Box</code>, <code class="language-plaintext highlighter-rouge">String</code>, or <code class="language-plaintext highlighter-rouge">Vec</code>. Along the way I found out there is such a thing as <code class="language-plaintext highlighter-rouge">Box&lt;str&gt;</code>, which might look a bit <em>strange</em> to an untrained eye, since most of the time the <code class="language-plaintext highlighter-rouge">str</code> primitive type is passed around as <code class="language-plaintext highlighter-rouge">&amp;str</code>.</p>

<hr />

<p>TL;DR:</p>

<p><code class="language-plaintext highlighter-rouge">Box&lt;str&gt;</code> is a primitive <code class="language-plaintext highlighter-rouge">str</code> allocated on the heap, whereas <code class="language-plaintext highlighter-rouge">String</code> is a wrapper around a <code class="language-plaintext highlighter-rouge">Vec&lt;u8&gt;</code>, also allocated on the heap, which keeps spare capacity so that appends and removals are efficient. On a 64-bit target the <code class="language-plaintext highlighter-rouge">Box&lt;str&gt;</code> handle (pointer and length, 16 bytes) is smaller than the <code class="language-plaintext highlighter-rouge">String</code> handle (pointer, capacity and length, 24 bytes); the text itself takes the same space on the heap.</p>

<hr />

<p>I will be using <code class="language-plaintext highlighter-rouge">rust-lldb</code> throughout this post to understand what is going on in the Rust programs we write and run. The source code for this blog post is available on <a href="https://git.mahdi.blog/mahdi/rust-memory-playground">mdibaiee/rust-memory-playground</a>.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://git.mahdi.blog/mahdi/rust-memory-playground
<span class="nb">cd </span>rust-memory-playground
</code></pre></div></div>

<h1 id="the-stack">The Stack</h1>

<p>Most values of primitive types used throughout a program, along with bookkeeping information about the running program itself, are usually allocated on the stack. Consider this simple program:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">add_ten</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="nb">u8</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u8</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">10</span><span class="p">;</span>
    <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
<span class="p">}</span>


<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"{}"</span><span class="p">,</span> <span class="nf">add_ten</span><span class="p">(</span><span class="mi">9</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Let’s examine the stack when we are running <code class="language-plaintext highlighter-rouge">a + b</code> by setting a breakpoint on that line:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cargo build &amp;&amp; rust-lldb target/debug/stack-program

(lldb) breakpoint set -f main.rs -l 3
Breakpoint 1: where = stack-program`stack_program::add_ten::h42edbf0bdcb04851 + 24 at main.rs:3:5, address = 0x0000000100001354

(lldb) run
Process 65188 launched: '/Users/mahdi/workshop/rust-memory-playground/target/debug/stack-program' (arm64)
Process 65188 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100001354 stack-program`stack_program::add_ten::h42edbf0bdcb04851(a=5) at main.rs:3:5
   1    fn add_ten(a: u8) -&gt; u8 {
   2        let b = 10;
-&gt; 3        a + b
   4    }
   5
   6
   7    fn main() {

(lldb) frame var -L -f X
0x000000016fdfed7e: (unsigned char) a = 0x09
0x000000016fdfed7f: (unsigned char) b = 0x0A
</code></pre></div></div>

<p>Our program allocates two variables directly on the stack here. Notice that they are allocated right next to each other, their addresses only one byte apart. Most primitive types are allocated on the stack and are copied when passed around, because they are small enough that copying them is cheaper than allocating them on the heap and passing around a pointer. In this case, a <code class="language-plaintext highlighter-rouge">u8</code> fits in a single byte, so it would not make sense to allocate a pointer to it (pointer size varies by platform, but is 8 bytes on the 64-bit machine used here). Every time you call a function, a copy of the values passed to it, along with the values defined in the function itself, constitutes the stack frame of that function.</p>
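<p>As a rough sketch of the same point in code (not taken from the playground repository), the snippet below reuses <code class="language-plaintext highlighter-rouge">add_ten</code> and prints the size of a <code class="language-plaintext highlighter-rouge">u8</code> next to the size of a reference to one. The variable names <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> are only illustrative, and the pointer size printed depends on the target:</p>

```rust
use std::mem::size_of;

/// Takes its argument by value. Because `u8` is `Copy`,
/// the caller keeps its own copy of the byte.
fn add_ten(a: u8) -> u8 {
    let b = 10;
    a + b
}

fn main() {
    let x: u8 = 9;
    let y = add_ten(x);

    // `x` is still usable here: passing it copied the single byte.
    println!("x = {}, y = {}", x, y);

    // A `u8` occupies one byte; a reference to it is pointer-sized.
    println!("size of u8:  {} byte(s)", size_of::<u8>());
    println!("size of &u8: {} byte(s)", size_of::<&u8>());
}
```

<p>On a 64-bit target the reference is eight times larger than the value it points to, which is why copying the byte is the sensible choice.</p>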

<p>The stack of a whole program includes more information though, such as the <em>backtrace</em>, which allows the program to know how to navigate: once I am done with this function, where should I return to? That information is available in the stack as well. Note the first couple of lines here, indicating that we are currently in <code class="language-plaintext highlighter-rouge">stack_program::add_ten</code>, and we came here from <code class="language-plaintext highlighter-rouge">stack_program::main</code>, so once we are finished with <code class="language-plaintext highlighter-rouge">add_ten</code>, we will go back to <code class="language-plaintext highlighter-rouge">main</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(lldb) thread backtrace
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001350 stack-program`stack_program::add_ten::hf7dc9cccae290c37(a='\t') at main.rs:3:5
    frame #1: 0x00000001000013a8 stack-program`stack_program::main::he22b9cf577b52c34 at main.rs:8:20
    frame #2: 0x00000001000015a4 stack-program`core::ops::function::FnOnce::call_once::hd6bac0cd3fcb8c07((null)=(stack-program`stack_program::main::he22b9cf577b52c34 at main.rs:7), (null)=&lt;unavailable&gt;) at function.rs:227:5
    frame #3: 0x00000001000014c4 stack-program`std::sys_common::backtrace::__rust_begin_short_backtrace::hc4df46810f9a7139(f=(stack-program`stack_program::main::he22b9cf577b52c34 at main.rs:7)) at backtrace.rs:122:18
    frame #4: 0x0000000100001178 stack-program`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::hbec5b809d627978a at rt.rs:145:18
    frame #5: 0x000000010001440c stack-program`std::rt::lang_start_internal::hc453db0ee48af82e [inlined] core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_once::h485d4c2966ec30a8 at function.rs:259:13 [opt]
    frame #6: 0x0000000100014400 stack-program`std::rt::lang_start_internal::hc453db0ee48af82e [inlined] std::panicking::try::do_call::h375a887be0bea938 at panicking.rs:492:40 [opt]
    frame #7: 0x0000000100014400 stack-program`std::rt::lang_start_internal::hc453db0ee48af82e [inlined] std::panicking::try::hecad40482ef3be15 at panicking.rs:456:19 [opt]
    frame #8: 0x0000000100014400 stack-program`std::rt::lang_start_internal::hc453db0ee48af82e [inlined] std::panic::catch_unwind::haf1f664eb41a88eb at panic.rs:137:14 [opt]
    frame #9: 0x0000000100014400 stack-program`std::rt::lang_start_internal::hc453db0ee48af82e [inlined] std::rt::lang_start_internal::_$u7b$$u7b$closure$u7d$$u7d$::h976eba434e9ff4cf at rt.rs:128:48 [opt]
    frame #10: 0x0000000100014400 stack-program`std::rt::lang_start_internal::hc453db0ee48af82e [inlined] std::panicking::try::do_call::h8f2501ab92e340b0 at panicking.rs:492:40 [opt]
    frame #11: 0x0000000100014400 stack-program`std::rt::lang_start_internal::hc453db0ee48af82e [inlined] std::panicking::try::hbeb9f8df83454d42 at panicking.rs:456:19 [opt]
    frame #12: 0x0000000100014400 stack-program`std::rt::lang_start_internal::hc453db0ee48af82e [inlined] std::panic::catch_unwind::h0a9390b2202af6e9 at panic.rs:137:14 [opt]
    frame #13: 0x0000000100014400 stack-program`std::rt::lang_start_internal::hc453db0ee48af82e at rt.rs:128:20 [opt]
    frame #14: 0x0000000100001140 stack-program`std::rt::lang_start::h69bdd2191bba2dab(main=(stack-program`stack_program::main::he22b9cf577b52c34 at main.rs:7), argc=1, argv=0x000000016fdff168) at rt.rs:144:17
    frame #15: 0x0000000100001434 stack-program`main + 32
    frame #16: 0x00000001000750f4 dyld`start + 520
</code></pre></div></div>

<h1 id="box-string-and-vec-pointers-to-heap">Box, String and Vec: Pointers to Heap</h1>

<p>There are times when we are working with data large enough that we would really like to avoid copying it when passing it around. Let’s say you have just loaded a file that is 1,000,000 bytes (1 MB) in size. In this case it is much more memory- and compute-efficient to pass around a pointer to this value (8 bytes) rather than copying all 1,000,000 bytes.</p>

<p>This is where types such as <code class="language-plaintext highlighter-rouge">Box</code>, <code class="language-plaintext highlighter-rouge">String</code> and <code class="language-plaintext highlighter-rouge">Vec</code> come into play: these types let you allocate a value on the heap, a region of memory separate from the stack, and later reference that value through a pointer stored on the stack.</p>

<p>Let’s start with <code class="language-plaintext highlighter-rouge">Box</code>, the most generic one, which allows you to allocate some data on the heap. Consider this example:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">a</span> <span class="o">=</span> <span class="nn">Box</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">5_u8</span><span class="p">);</span>
    <span class="k">let</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">10_u8</span><span class="p">;</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"{}, {}"</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>We again use <code class="language-plaintext highlighter-rouge">lldb</code> to check out what is happening:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cargo build &amp;&amp; rust-lldb target/debug/stack-and-heap-program

(lldb) breakpoint set -f main.rs -l 4
Breakpoint 1: where = stack-and-heap-program`stack_and_heap_program::main::ha895783273646dc7 + 100 at main.rs:4:5, address = 0x0000000100005264

(lldb) run
Process 67451 launched: '/Users/mahdi/workshop/rust-memory-playground/target/debug/stack-and-heap-program' (arm64)
Process 67451 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100005264 stack-and-heap-program`stack_and_heap_program::main::ha895783273646dc7 at main.rs:4:5
   1    fn main() {
   2        let a = Box::new(5_u8);
   3        let b = 10_u8;
-&gt; 4        println!("{}, {}", a, b);
   5    }

(lldb) frame var -L -f X
0x000000016fdfed48: (unsigned char *) a = 0x0000600000008010 "\U00000005"
0x000000016fdfed57: (unsigned char) b = 0x0A

(lldb) memory read -count 1 -f X 0x0000600000008010
0x600000008010: 0x05
</code></pre></div></div>

<p>Note that here, instead of having the value <code class="language-plaintext highlighter-rouge">5</code>, <code class="language-plaintext highlighter-rouge">a</code> has the value <code class="language-plaintext highlighter-rouge">0x0000600000008010</code>, which is a pointer to a location in memory! <code class="language-plaintext highlighter-rouge">lldb</code> recognises that this is a pointer (note the <code class="language-plaintext highlighter-rouge">*</code> sign beside the variable type) and shows us what the memory location contains, but we can also read that memory location directly, and of course we find <code class="language-plaintext highlighter-rouge">5</code> there. The address of the heap-allocated <code class="language-plaintext highlighter-rouge">5</code> is far from the stack-allocated <code class="language-plaintext highlighter-rouge">10</code>, since the stack and the heap are separate parts of memory.</p>

<p>Using <code class="language-plaintext highlighter-rouge">Box</code> for an unsigned 8-bit value does not really make sense, since the value itself is smaller than the pointer created by <code class="language-plaintext highlighter-rouge">Box</code>. However, allocating on the heap is useful when we have data that we need to be able to pass around the program without copying it.</p>
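<p>To make the "pass around without copying" part concrete, here is a minimal sketch, assuming a 1,000,000-byte buffer like the file mentioned earlier. The helper name <code class="language-plaintext highlighter-rouge">heap_address_survives_move</code> is made up for this illustration; it moves a boxed buffer to a new owner and checks that the heap address of the data did not change, so only the small handle on the stack was moved:</p>

```rust
/// Hypothetical helper: moves a boxed buffer into a new owner and reports
/// whether the heap address of the data stayed the same.
fn heap_address_survives_move(buf: Box<[u8]>) -> bool {
    let before = buf.as_ptr();
    let moved = buf; // moves only the (pointer, length) handle
    let after = moved.as_ptr();
    before == after
}

fn main() {
    // Build the buffer via Vec so the megabyte is created on the heap directly.
    let big: Box<[u8]> = vec![0_u8; 1_000_000].into_boxed_slice();

    println!("handle size on stack: {} bytes", std::mem::size_of_val(&big));
    println!("data size on heap:    {} bytes", big.len());
    println!(
        "same heap address after move: {}",
        heap_address_survives_move(big)
    );
}
```

<p>The buffer is built through <code class="language-plaintext highlighter-rouge">vec!</code> rather than <code class="language-plaintext highlighter-rouge">Box::new([0_u8; 1_000_000])</code> so that the array is not first materialised on the stack in a debug build.</p>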

<p>Turns out, <code class="language-plaintext highlighter-rouge">String</code> and <code class="language-plaintext highlighter-rouge">Vec</code> cover two of the most common cases where we may want to allocate something on the heap! Let’s look at what goes on behind allocating a variable of type <code class="language-plaintext highlighter-rouge">String</code>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">s</span> <span class="o">=</span> <span class="nn">String</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="s">"hello"</span><span class="p">);</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"{}"</span><span class="p">,</span> <span class="n">s</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And here we go again:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(lldb) breakpoint set -f main.rs -l 3
Breakpoint 1: where = string-program`string_program::main::h64ca96ee87b0ceaf + 44 at main.rs:3:5, address = 0x000000010000476c

(lldb) run
Process 68317 launched: '/Users/mahdi/workshop/rust-memory-playground/target/debug/string-program' (arm64)
Process 68317 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x000000010000476c string-program`string_program::main::h64ca96ee87b0ceaf at main.rs:3:5
   1    fn main() {
   2        let s = String::from("hello");
-&gt; 3        println!("{}", s);
   4    }

(lldb) frame var -L -T
0x000000016fdfed78: (alloc::string::String) s = "hello" {
0x000000016fdfed78:   (alloc::vec::Vec&lt;unsigned char, alloc::alloc::Global&gt;) vec = size=5 {
0x0000600000004010:     (unsigned char) [0] = 'h'
0x0000600000004011:     (unsigned char) [1] = 'e'
0x0000600000004012:     (unsigned char) [2] = 'l'
0x0000600000004013:     (unsigned char) [3] = 'l'
0x0000600000004014:     (unsigned char) [4] = 'o'
  }
}
</code></pre></div></div>

<p>This is a formatted output from <code class="language-plaintext highlighter-rouge">lldb</code>, and here you can see that the <code class="language-plaintext highlighter-rouge">String</code> type is basically a <code class="language-plaintext highlighter-rouge">Vec&lt;unsigned char, alloc::alloc::Global&gt;</code> (note that <code class="language-plaintext highlighter-rouge">unsigned char</code> is <code class="language-plaintext highlighter-rouge">u8</code> in Rust, so in Rust terminology the type is <code class="language-plaintext highlighter-rouge">Vec&lt;u8&gt;</code>). Let’s now look at the same command, but this time raw and unformatted (<code class="language-plaintext highlighter-rouge">-R</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(lldb) frame var -L -T -R
0x000000016fdfed78: (alloc::string::String) s = {
0x000000016fdfed78:   (alloc::vec::Vec&lt;unsigned char, alloc::alloc::Global&gt;) vec = {
0x000000016fdfed78:     (alloc::raw_vec::RawVec&lt;unsigned char, alloc::alloc::Global&gt;) buf = {
0x000000016fdfed78:       (core::ptr::unique::Unique&lt;unsigned char&gt;) ptr = {
0x000000016fdfed78:         (unsigned char *) pointer = 0x0000600000004010
0x000000016fdfed78:         (core::marker::PhantomData&lt;unsigned char&gt;) _marker = {}
      }
0x000000016fdfed80:       (unsigned long) cap = 6
0x000000016fdfed78:       (alloc::alloc::Global) alloc = {}
    }
0x000000016fdfed88:     (unsigned long) len = 6
  }
}
</code></pre></div></div>

<p>Ah! I see the <code class="language-plaintext highlighter-rouge">ptr</code> field of <code class="language-plaintext highlighter-rouge">RawVec</code> with a value of <code class="language-plaintext highlighter-rouge">0x0000600000004010</code>, which is the memory address of the beginning of our string (namely the <code class="language-plaintext highlighter-rouge">h</code> of our <code class="language-plaintext highlighter-rouge">hello</code>)! There are also <code class="language-plaintext highlighter-rouge">cap</code> and <code class="language-plaintext highlighter-rouge">len</code>, which stand for capacity and length respectively. Both read 6 in this raw dump; note that for the five-byte <code class="language-plaintext highlighter-rouge">hello</code> one would expect 5, matching the <code class="language-plaintext highlighter-rouge">size=5</code> reported by the formatted output above. The difference between the two is that <a href="https://doc.rust-lang.org/nightly/std/vec/struct.Vec.html#capacity-and-reallocation">you can have a <code class="language-plaintext highlighter-rouge">Vec</code> with a capacity of 10 while it has zero items</a>; this allows you to append 10 items to the <code class="language-plaintext highlighter-rouge">Vec</code> without a new allocation for each append, making the process more efficient. Also, a <a href="https://doc.rust-lang.org/nightly/std/vec/struct.Vec.html#guarantees">Vec is not automatically shrunk down</a> in size when items are removed from it, to avoid unnecessary deallocations, hence the length might be smaller than the capacity. So in a nutshell, our String is basically something like this (inspired by <a href="https://doc.rust-lang.org/nightly/std/vec/struct.Vec.html#guarantees">std::vec::Vec</a>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Stack:
--------------------------------
| String                       |
|   \-&gt; Vec                    |
|         \-&gt; (ptr, cap, len)  |
|                |             |
-----------------|--------------
Heap:            v
-----------------------------
| ('h', 'e', 'l', 'l', 'o') |
-----------------------------
</code></pre></div></div>

<p>Okay, so far so good. We have <code class="language-plaintext highlighter-rouge">String</code>, which uses a <code class="language-plaintext highlighter-rouge">Vec</code> under the hood, which is represented by a pointer, capacity and length triplet.</p>
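<p>The capacity and length bookkeeping can also be observed without a debugger. The following is a small sketch, not part of the playground repository; the function name <code class="language-plaintext highlighter-rouge">len_and_capacity_demo</code> is invented for the example. It records <code class="language-plaintext highlighter-rouge">(len, capacity)</code> for a <code class="language-plaintext highlighter-rouge">String</code> while it is empty, after an append, and after a truncation:</p>

```rust
/// Hypothetical helper: returns (len, capacity) snapshots of a String
/// when empty, after appending, and after truncating.
fn len_and_capacity_demo() -> [(usize, usize); 3] {
    // Reserve room for at least 10 bytes up front; the length is still 0.
    let mut s = String::with_capacity(10);
    let empty = (s.len(), s.capacity());

    // Appending within the reserved capacity does not reallocate.
    s.push_str("hello");
    let filled = (s.len(), s.capacity());

    // Removing data shortens the length but keeps the allocation.
    s.truncate(2);
    let truncated = (s.len(), s.capacity());

    [empty, filled, truncated]
}

fn main() {
    for (len, cap) in len_and_capacity_demo() {
        println!("len = {}, capacity = {}", len, cap);
    }
}
```

<p>The length goes 0, 5, 2 while the capacity stays at whatever was reserved (at least 10), which is exactly the slack that a <code class="language-plaintext highlighter-rouge">Box&lt;str&gt;</code> does not carry.</p>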

<p>If <code class="language-plaintext highlighter-rouge">String</code> is already heap-allocated, why would anyone want <code class="language-plaintext highlighter-rouge">Box&lt;str&gt;</code>!? Let’s look at how <code class="language-plaintext highlighter-rouge">Box&lt;str&gt;</code> would be represented in memory:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">boxed_str</span><span class="p">:</span> <span class="nb">Box</span><span class="o">&lt;</span><span class="nb">str</span><span class="o">&gt;</span> <span class="o">=</span> <span class="s">"hello"</span><span class="nf">.into</span><span class="p">();</span>

    <span class="nd">println!</span><span class="p">(</span><span class="s">"boxed_str: {}"</span><span class="p">,</span> <span class="n">boxed_str</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And <code class="language-plaintext highlighter-rouge">lldb</code> tells us:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0x000000016fdfed80: (alloc::boxed::Box&lt;&gt;) boxed_str = {
0x000000016fdfed80:   data_ptr = 0x0000600000004010 "hello"
0x000000016fdfed88:   length = 5
}
</code></pre></div></div>

<p>Okay, so a <code class="language-plaintext highlighter-rouge">Box&lt;str&gt;</code> is much simpler than a <code class="language-plaintext highlighter-rouge">String</code>: there is no <code class="language-plaintext highlighter-rouge">Vec</code> and no <code class="language-plaintext highlighter-rouge">capacity</code>, and the underlying data is a primitive <code class="language-plaintext highlighter-rouge">str</code> that does not allow efficient appending or removing. It is a smaller representation as well, due to the missing <code class="language-plaintext highlighter-rouge">capacity</code> field. We can compare their sizes on the stack using <a href="https://doc.rust-lang.org/std/mem/fn.size_of_val.html">std::mem::size_of_val</a>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">boxed_str</span><span class="p">:</span> <span class="nb">Box</span><span class="o">&lt;</span><span class="nb">str</span><span class="o">&gt;</span> <span class="o">=</span> <span class="s">"hello"</span><span class="nf">.into</span><span class="p">();</span>
<span class="nd">println!</span><span class="p">(</span><span class="s">"size of boxed_str on stack: {}"</span><span class="p">,</span> <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nf">size_of_val</span><span class="p">(</span><span class="o">&amp;</span><span class="n">boxed_str</span><span class="p">));</span>

<span class="k">let</span> <span class="n">s</span> <span class="o">=</span> <span class="nn">String</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="s">"hello"</span><span class="p">);</span>
<span class="nd">println!</span><span class="p">(</span><span class="s">"size of string on stack: {}"</span><span class="p">,</span> <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nf">size_of_val</span><span class="p">(</span><span class="o">&amp;</span><span class="n">s</span><span class="p">));</span>
</code></pre></div></div>

<p>Results in:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>size of boxed_str on stack: 16
size of string on stack: 24
</code></pre></div></div>

<p>Note that their size on the heap is the same, because both store the bytes of <code class="language-plaintext highlighter-rouge">hello</code> on the heap (the measurements below show all of the heap allocations of each program, not only the string; what matters here is that the two programs have exactly the same total heap size):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cargo run --bin string-dhat
Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/string-dhat`
hello
dhat: Total: 1,029 bytes in 2 blocks
dhat: At t-gmax: 1,029 bytes in 2 blocks
dhat: At t-end: 1,024 bytes in 1 blocks
dhat: The data has been saved to dhat-heap.json, and is viewable with dhat/dh_view.html

$ cargo run --bin box-str-dhat
Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/box-str-dhat`
boxed_str: hello
dhat: Total: 1,029 bytes in 2 blocks
dhat: At t-gmax: 1,029 bytes in 2 blocks
dhat: At t-end: 1,024 bytes in 1 blocks
dhat: The data has been saved to dhat-heap.json, and is viewable with dhat/dh_view.html
</code></pre></div></div>

<p>There is also <code class="language-plaintext highlighter-rouge">Box&lt;[T]&gt;</code>, which is the fixed-size counterpart to <code class="language-plaintext highlighter-rouge">Vec&lt;T&gt;</code>.</p>
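<p>A short sketch of that relationship, assuming a <code class="language-plaintext highlighter-rouge">Vec&lt;u32&gt;</code> as the starting point: the helper <code class="language-plaintext highlighter-rouge">freeze</code> is a made-up name that simply wraps the standard <code class="language-plaintext highlighter-rouge">Vec::into_boxed_slice</code>, which drops any excess capacity and hands back a <code class="language-plaintext highlighter-rouge">Box&lt;[u32]&gt;</code>:</p>

```rust
use std::mem::size_of;

/// Hypothetical helper: turns a growable Vec into a fixed-size boxed slice.
fn freeze(v: Vec<u32>) -> Box<[u32]> {
    v.into_boxed_slice()
}

fn main() {
    let mut v: Vec<u32> = Vec::with_capacity(8);
    v.push(1);
    v.push(2);
    v.push(3);
    println!("Vec: len {}, capacity {}", v.len(), v.capacity());

    // The boxed slice only knows its length; there is no capacity field.
    let b = freeze(v);
    println!("Box<[u32]>: len {}", b.len());

    println!(
        "handle sizes: Vec {} bytes, Box<[u32]> {} bytes",
        size_of::<Vec<u32>>(),
        size_of::<Box<[u32]>>()
    );
}
```

<p>As with <code class="language-plaintext highlighter-rouge">String</code> and <code class="language-plaintext highlighter-rouge">Box&lt;str&gt;</code>, the boxed form saves one machine word on the stack at the cost of not being growable in place.</p>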

<h1 id="should-i-use-boxstr-or-string">Should I use <code class="language-plaintext highlighter-rouge">Box&lt;str&gt;</code> or <code class="language-plaintext highlighter-rouge">String</code>?</h1>

<p>The only use case for <code class="language-plaintext highlighter-rouge">Box&lt;str&gt;</code> over <code class="language-plaintext highlighter-rouge">String</code> that I can think of is optimising for memory usage when the string is fixed and you do not intend to append to or remove from it. I looked for uses of <code class="language-plaintext highlighter-rouge">Box&lt;str&gt;</code> in the wild and found a few examples:</p>

<ul>
<li>Hyper uses it in one place to reduce memory usage, since the string they have is read-only: <a href="https://github.com/hyperium/hyper/pull/2727">hyper#2727</a></li>
<li>Rust-analyzer uses it to store some strings in their snippets data structure: <a href="https://github.com/rust-lang/rust-analyzer/blob/5c88d9344c5b32988bfbfc090f50aba5de1db062/crates/ide-completion/src/snippet.rs#L123">rust-lang/rust-analyzer/crates/ide-completion/src/snippet.rs</a></li>
<li>It is also used in some parts of the compiler itself, probably with the same aim of optimising memory usage: <a href="https://github.com/rust-lang/rust/blob/7846610470392abc3ab1470853bbe7b408fe4254/src/libsyntax/symbol.rs#L82-L85">rust-lang/rust/src/libsyntax/symbol.rs</a></li>
</ul>
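<p>In practice the two types convert into each other cheaply, so one plausible pattern (a sketch, not something taken from the projects above) is to build the text as a <code class="language-plaintext highlighter-rouge">String</code> and freeze it into a <code class="language-plaintext highlighter-rouge">Box&lt;str&gt;</code> once it is final. The function name <code class="language-plaintext highlighter-rouge">build_label</code> is hypothetical; <code class="language-plaintext highlighter-rouge">String::into_boxed_str</code> and <code class="language-plaintext highlighter-rouge">into_string</code> on <code class="language-plaintext highlighter-rouge">Box&lt;str&gt;</code> are standard library methods:</p>

```rust
/// Hypothetical helper: concatenates the parts with a growable String,
/// then freezes the result into a fixed-size Box<str>.
fn build_label(parts: &[&str]) -> Box<str> {
    let mut s = String::new();
    for part in parts {
        s.push_str(part);
    }
    // Drops any spare capacity and returns the (pointer, length) handle.
    s.into_boxed_str()
}

fn main() {
    let label = build_label(&["hello", ", ", "world"]);
    println!("{}", label);

    // If the text needs to change later, convert back to a String.
    let mut editable: String = label.into_string();
    editable.push('!');
    println!("{}", editable);
}
```

<p>Whether the saved word per string is worth the extra conversions depends on how many such strings the program keeps alive.</p>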

</article>

<div class="share-page">
Share in

<a href="https://twitter.com/intent/tweet?text=What is `Box<str>` and how is it different from `String` in Rust?&url=http://localhost:4000/rust-box-str-vs-string/&via=&related=" rel="nofollow" target="_blank" title="Share on Twitter">Twitter</a>
<a href="https://facebook.com/sharer.php?u=http://localhost:4000/rust-box-str-vs-string/" rel="nofollow" target="_blank" title="Share on Facebook">Facebook</a>
<a href="https://plus.google.com/share?url=http://localhost:4000/rust-box-str-vs-string/" rel="nofollow" target="_blank" title="Share on Google+">Google+</a>
</div>

<div id="commento"></div>
<script defer
src="//commento.mahdi.blog/js/commento.js">
</script>

<script src="/js/heading-links.js"></script>
</div>

</div>
</div>

</body>

</html>