The Monthly Oxide #2
April 2021 - Unsafe is not what you think it means
Welcome to the second letter of The Monthly Oxide, which bundles some Rust knowledge, articles, and projects for you to peruse and use. This month we’re gonna talk about unsafe and what it means. Too often the terminology and attitudes towards it, both from within and outside the community, either spread FUD or ignore what it actually means. We’re gonna set the record straight this month! Let’s get to it.
Unsafe is not what you think it means
unsafe is probably one of the more poorly named concepts in Rust, in my opinion. While its main purpose is to highlight areas where things can go wrong if the code is written improperly, the word itself muddles the concept. Let’s start by defining what unsafe code lets you do in Rust:
Dereference a raw pointer
Call an unsafe function or method
Access or modify a mutable static variable
Implement an unsafe trait
Access fields of unions
Unsafe Rust is a superset of Safe Rust. It only lets you do these five extra things. That’s it. The borrow checker still works, the type system still works, and nothing gets turned off. It’s just Rust with some extra superpowers. Why would they be unsafe though? It’s because any of them could subvert Rust’s safety guarantees of:
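To make the "nothing gets turned off" point concrete, here’s a minimal sketch. The function name bump and the variables are made up for illustration. It dereferences a raw pointer, which needs unsafe, and notes in a comment that the borrow checker would still reject a second mutable borrow inside the very same block.

```rust
/// Adds one to `value` through a raw pointer and returns the new value.
/// (`bump` is a hypothetical helper, not a standard library function.)
fn bump(value: &mut i32) -> i32 {
    // Creating a raw pointer is safe; only dereferencing it requires `unsafe`.
    let ptr: *mut i32 = value;
    // SAFETY: `ptr` comes from a live, exclusive reference, so it is valid
    // and aligned, and nothing else touches the value while we use it.
    unsafe {
        *ptr += 1;
        *ptr
    }
}

fn main() {
    let mut count = 1;
    let mut label = String::from("count");
    let ptr: *mut i32 = &mut count;

    unsafe {
        // SAFETY: `ptr` points at `count`, which is alive and not otherwise
        // borrowed while this block runs.
        *ptr += 1;

        let first = &mut label;
        // Uncommenting the next line is still a compile error (E0499),
        // because `first` is used afterwards. `unsafe` does not switch
        // off the borrow checker.
        // let second = &mut label;
        first.push_str("er");
    }

    println!("{} = {}", label, bump(&mut count));
}
```

The only lines that actually need the unsafe keyword here are the two raw pointer dereferences. Everything else is checked exactly as it would be in Safe Rust.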
No double free or use after free
No data races
With Safe Rust we can assume all these things, but throw unsafe into the mix and some of these can be subverted:
Dereferencing a raw pointer that points to uninitialized or freed memory is a problem the moment you read from it, whereas in Safe Rust the references & and &mut always have to point to valid data.
Calling an unsafe function without checking the invariants you must uphold (the rules you need to follow to avoid Undefined Behavior, or just unexpected behavior) could cause an issue. A good example of this is using std::mem::transmute to cast one type to another by just telling the compiler that these bytes are this type now.
Modifying a mutable static variable can subvert thread safety. It’s a globally available variable that can change at runtime, so one thread can change its value while another thread is still using it. Since access isn’t synchronized by something like a Mutex or an atomic type, this is a data race, which is UB.
Implementing an unsafe trait means you are telling the compiler you know what you’re doing. Send and Sync are traits marked unsafe, and if you implement them manually for a type you are telling the compiler that the type is thread safe in some way. If you implemented Sync for a MutexGuard regardless of what it guards, this would be bad, because now you could have data races in safe code. This did actually happen, though because of auto traits rather than because someone implemented it manually. Either way, it would be a very unsafe thing to do intentionally.
Accessing the fields of a union is unsafe because, unlike with an enum, you don’t really know which type it currently holds and can’t match on it. Accessing the field of type A when its field of type B is what’s actually initialized can cause issues, as Rust will assume the type of the field you asked for is correct, subverting type safety. I’ve written a small example on the playground to show you what can happen here. As you can see the number printed out is indeed not 5 at all.
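For readers who want to see the union case inline, here’s a small sketch of the idea. This is my own illustration with made-up names (IntOrFloat, read_float_as_int), and it assumes the usual 32-bit IEEE 754 layout for f32. It stores 5.0 through the float field and then reads the same bytes back through the integer field.

```rust
// Both fields share the same four bytes of storage.
#[repr(C)]
union IntOrFloat {
    int: u32,
    float: f32,
}

/// Writes `value` into the `float` field, then reads the `int` field.
/// (Hypothetical helper, named for this example only.)
fn read_float_as_int(value: f32) -> u32 {
    let data = IntOrFloat { float: value };
    // SAFETY: both fields are 4 bytes and every bit pattern is a valid
    // `u32`, so this read is defined. It is just not the number we stored.
    unsafe { data.int }
}

fn main() {
    let bits = read_float_as_int(5.0);
    // Prints the raw bit pattern of 5.0_f32 (0x40A00000), not 5.
    println!("{} ({:#010X})", bits, bits);
}
```

Reading a u32 out of float bytes only gives a surprising number. With a field type that has invalid bit patterns, such as bool or a reference, the same mistake would be Undefined Behavior, which is why the compiler makes you write unsafe for every union field read.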
Now you might be thinking: if these can cause so many issues, why even have them at all? It seems pretty bad that you can make code not work, and if this is used in a crate I depend on then my code might go wrong! Yes it could, but hold your horses. You will rarely need unsafe. I’ve barely used it in 6 years of using Rust. It’s most common in embedded systems, operating system development, and FFI. It has other uses too, and we do need unsafe to have a useful language.
Want to do I/O? You’re gonna need a raw pointer to read or write data somewhere, and most programs are useless without I/O. No one actually wants a fully pure program. You usually want it to act differently based on the input given to it. Your program will rely on unsafe in some way, even if only through the standard library. Unsafe is not an inherently bad thing. It lets you do very useful things and interact with a system. Pretty much every Rust program will link to, say, libc or some system libs in order to be able to do things on the OS, and somewhere in there lurks some unsafe code. Want to do things like TLS or cryptographic functions? You’re going to need to call assembly written so that the functions are constant time and can’t be hit with timing attacks.
My point is that unsafe is not bad, but that it’s something that must be handled with care. Factories are built to increase how many things we can make, and to make them automatically. This is useful! However, one has to be careful when operating large machinery. In C this fell on the operator, by saying “Just be careful” rather than building safety into the system, which as we know leads to tons of CVEs due to memory safety issues. Rust builds the machinery in such a way that it is safe to operate, and only falls back on “Just be careful” in very limited areas where there’s not much of a choice.
Now this often gets people saying “Well what’s the point then? Why not just use C?”. The point is that by limiting the scope of unsafe to certain code blocks or functions we can easily audit sources of UB, whereas in C anywhere could be a source of UB, and, more importantly, we can build safe abstractions on top of them. We generally see this pattern in Rust with x-sys crates, where x is the C lib we want to link to. The sys crate holds all the bindings and glue code needed to interact with the lib, and other crates then use the sys crate to build safer, idiomatic Rust abstractions on top. Those crates uphold all the invariants needed to use the code safely. Arc welding is very dangerous, but with the correct PPE it can be quite safe.
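Here’s a rough sketch of that layering in miniature. Nothing in it is a real crate: the sys module and its raw_sum function are hypothetical stand-ins for the kind of pointer-and-length binding a -sys crate might expose, written in plain Rust so the example runs on its own. The safe sum function on top is the only thing a caller ever touches.

```rust
/// Stand-in for a hypothetical `-sys` style module with a raw,
/// C-shaped API. A real one would declare an `extern "C"` function.
mod sys {
    /// Sums `len` integers starting at `ptr`.
    ///
    /// # Safety
    /// `ptr` must point to `len` consecutive, initialized `i32` values
    /// that stay valid for the duration of the call.
    pub unsafe fn raw_sum(ptr: *const i32, len: usize) -> i32 {
        let mut total = 0i32;
        for i in 0..len {
            // SAFETY: the caller guarantees `ptr.add(i)` is in bounds
            // and points at an initialized `i32`.
            total = total.wrapping_add(unsafe { *ptr.add(i) });
        }
        total
    }
}

/// Safe wrapper: a slice always carries a valid pointer and length,
/// so callers cannot break `raw_sum`'s invariants.
pub fn sum(values: &[i32]) -> i32 {
    // SAFETY: `as_ptr` and `len` come from the same live slice, which is
    // exactly the contract `raw_sum` asks for.
    unsafe { sys::raw_sum(values.as_ptr(), values.len()) }
}

fn main() {
    println!("{}", sum(&[1, 2, 3])); // prints 6
}
```

If something ever goes wrong here, the audit surface is the single unsafe block in sum plus the body of raw_sum, rather than every call site in the program.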
What’s the point then of me writing quite a lot already about this? I think unsafe gets a bad rap. You’ve got community members who are overzealous about it, trying to remove all instances of it from their deps and code. You’ve got people outside the Rust community who just haven’t done the work to understand it and spread misunderstandings about it. And you’ve got people who reach for it too much (you tend to see this in C/C++ people coming to Rust, but that’s not always the case).
unsafe should make you worry in the sense that you need to be careful, but it shouldn’t be something you avoid entirely either. It requires a balance. It’s a sharp tool that, when built and used correctly, allows you to do things Rust can’t prove are okay to do. I encourage you to try it out a bit if you haven’t touched it already. There’s a lot you can learn from it.
Articles of the Month
Arenas in Rust - manishearth wrote up how you might implement an Arena in Rust, but the really interesting bits here are the lifetimes and how they work. Well worth the read if you want a better understanding of them.
The social consequences of type systems - Rain wrote about type systems and what they convey a while back, and it was brought up again this month. Another one that’s worth reading.
Algebra and Data Types - My coworker Justin wrote a really easy to understand article on algebra and data types using Rust to explain the various concepts. If you’re like me and need some more concrete examples to understand theory better I really think you’ll like this article. I learned something new and didn’t feel wrapped up in jargon or CS terms as they were explained as I went!