Safety in an unsafe world

(lwn.net)

65 points | by signa11 15 hours ago

31 comments

nneonneo 2 hours ago
The fancy lock-ordering type bounds can be found here: https://fuchsia.googlesource.com/fuchsia/+/refs/heads/main/s...
Whenever you explicitly declare a lock ordering "B must be locked after A", it creates an (explicit) trait implementation "impl LockAfter<A> for B". It also creates a blanket (generic) trait implementation "impl LockAfter<X> for B" for any X where A implements LockAfter<X>; this basically fills in all the transitive edges of the graph.
Rust prohibits multiple implementations for the same trait and type. If there's a cycle in this graph involving A, then eventually the transitive walk will generate an "impl LockAfter<A> for A", after which it will generate a blanket "impl LockAfter<A> for B" which conflicts with the explicit impl and thus results in a compiler error.
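The mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the Fuchsia code: `LockAfter` is the trait named in the comment, while the helper trait `LockBefore`, the lock levels `Unlocked`, `DeviceLock` and `RouteLock`, and the function `may_lock_after` are hypothetical names chosen for the example. The sketch assumes every lock level has exactly one declared predecessor.

```rust
/// Marker trait: `Self` may be locked after `A`.
pub trait LockAfter<A> {}

/// Assumed helper: the mirror relation, derived from `LockAfter`.
pub trait LockBefore<B> {}
impl<A, B: LockAfter<A>> LockBefore<B> for A {}

// Hypothetical lock levels, used only for this illustration.
pub struct Unlocked;
pub struct DeviceLock;
pub struct RouteLock;

// Declared edge: DeviceLock is locked after Unlocked.
impl LockAfter<Unlocked> for DeviceLock {}
// Blanket impl: DeviceLock is also after anything that precedes Unlocked.
impl<X: LockBefore<Unlocked>> LockAfter<X> for DeviceLock {}

// Declared edge: RouteLock is locked after DeviceLock.
impl LockAfter<DeviceLock> for RouteLock {}
// Blanket impl: this supplies the transitive edge Unlocked -> RouteLock.
impl<X: LockBefore<DeviceLock>> LockAfter<X> for RouteLock {}

/// Type-checks only when `Later` may be locked after `Earlier`.
pub fn may_lock_after<Earlier, Later: LockAfter<Earlier>>() -> bool {
    true
}

// Declaring the reverse edge (DeviceLock after RouteLock) in the same way
// closes a cycle and is expected to be rejected by the compiler as
// conflicting trait implementations.
```

With these declarations, `may_lock_after::<Unlocked, RouteLock>()` type-checks even though no direct `Unlocked -> RouteLock` edge was written, while `may_lock_after::<RouteLock, DeviceLock>()` does not.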
smallstepforman 10 hours ago
Nice article. Any project rewritten from scratch (version 3 here) by the same experienced engineers will inevitably be better/more robust/more performant than the previous versions. During our career growth as craftsmen, we build using tools we understand, and get a certain output. As we learn more about various other tools (techniques), we have a wider understanding and will make it better again.
Having read about their journey, I can see they use 77 mutexes and a hierarchy chart for locking to prevent deadlocks. How quaint. I keep on harping about the Actor programming model to deaf ears, but I guess the apprentices need more stumbling around before achieving true enlightenment.
Version #4, perhaps?
Any guru want to share what path to take after Actors? I’m ready…
- Sytten 2 hours ago
  The problem with actors in Rust is twofold, and I think it would prevent their use in this context:
  - They need async. Otherwise you need to implement yielding in-house, have one actor per thread, etc. OS code is usually sync / callback-based.
  - They need owned, and usually Send, values for all input. Since you have to send input / messages over a channel, the values must be owned, and they must be Send if they cross thread boundaries. A very annoying requirement in Rust.
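  The second point can be illustrated with a minimal thread-per-actor sketch built on the standard library's `std::sync::mpsc` channel. The message type `Msg` and the function `spawn_counter` are hypothetical names for this example; it is one possible shape for such an actor, not a recommendation for OS code.

  ```rust
  use std::sync::mpsc;
  use std::thread;

  /// Messages are moved into the channel, so they must be owned values.
  /// Because the receiver lives on another thread, they must also be Send.
  pub enum Msg {
      Add(u64),
      /// Carries a reply channel so the caller can read the current total.
      Get(mpsc::Sender<u64>),
  }

  /// Spawns a hypothetical counter actor on its own OS thread.
  pub fn spawn_counter() -> (mpsc::Sender<Msg>, thread::JoinHandle<()>) {
      let (tx, rx) = mpsc::channel::<Msg>();
      let handle = thread::spawn(move || {
          // State is private to the actor thread, so no lock is needed.
          let mut total = 0u64;
          // The loop ends once every Sender has been dropped.
          for msg in rx {
              match msg {
                  Msg::Add(n) => total += n,
                  Msg::Get(reply) => {
                      // Ignore the error if the requester went away.
                      let _ = reply.send(total);
                  }
              }
          }
      });
      (tx, handle)
  }
  ```

  A message holding a borrowed reference or an `Rc` would not compile here, which is the friction described above.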
- znkr 6 hours ago
  Not a guru, but my take is that the actor model is one method of architecting a system to separate synchronization from other concerns. There are other ways to do that, often specific to a particular problem and with more or less separation. As always, there are many tradeoffs involved.
- mrkeen 4 hours ago
  After actors? Transactions for sure. The SQL databases have been doing it the right way for just as long.
- kriiuuu 8 hours ago
  Effect systems are great for concurrent programming and easier to reason about than actors. They aren’t available in all programming languages however.
pornel 4 hours ago
I'm amazed how well the Send/Sync bounds work.
Finding all possible data races in arbitrarily large and complex programs (even across 3rd party dependencies and dynamic callbacks) seems like a challenging task requiring specialized static and dynamic analyzers. But it turns out it can be reduced to automatically marking structs as safe to move to a thread, and type annotations on mutexes and thread-spawning functions.
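As a small illustration of those bounds, here is a sketch that uses only the standard library; the function name `parallel_count` is hypothetical. `std::thread::spawn` requires its closure to be `Send`, and `Arc<Mutex<u64>>` satisfies that bound automatically.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments a shared counter from several threads and returns the total.
pub fn parallel_count(workers: usize, per_worker: u64) -> u64 {
    // Arc<Mutex<u64>> is Send + Sync, so clones may be moved into threads.
    let total = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let total = Arc::clone(&total);
            // thread::spawn requires the closure (and its captures) to be Send.
            thread::spawn(move || {
                for _ in 0..per_worker {
                    *total.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let result = *total.lock().unwrap();
    result
}
```

Replacing `Arc<Mutex<u64>>` with `Rc<RefCell<u64>>` makes the `thread::spawn` call fail to compile, because `Rc` is not `Send`.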
uecker 11 hours ago
What is described there seems like basic encapsulation to me. We do this too in C, with structure types and an API around them that enforces the invariants. So is C an X-safe language too? Or what am I missing?
- sitharus 11 hours ago
  C is not X-safe because you can’t declare conformity in the type system.
  In Rust, if I understand the article, you can create a “trait” that marks a type as conforming to an invariant, so in the article they marked thread-safe structures as Send and the thread functions as requiring types that implement Send.
  Send isn’t an API to implement or type definition, it’s a sentinel saying “I declare that this type conforms to the documented expectations” even though the expectations can’t be checked by a compiler.
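  A rough sketch of what such a declaration looks like. In most cases the compiler derives `Send` automatically from a struct's fields; the manual form below is needed only when it cannot, as with raw pointers. The type `RawCounter` and the function `bump_on_other_thread` are hypothetical names for this example.

  ```rust
  use std::thread;

  /// Hypothetical handle around a raw pointer. Raw pointers are not Send,
  /// so this struct is not Send unless the author declares it.
  pub struct RawCounter {
      ptr: *mut u64,
  }

  // The author's promise, which the compiler cannot verify: the pointer is
  // uniquely owned by this handle, so moving it to another thread is sound.
  unsafe impl Send for RawCounter {}

  impl RawCounter {
      pub fn new(value: u64) -> Self {
          RawCounter { ptr: Box::into_raw(Box::new(value)) }
      }

      pub fn bump(&mut self) -> u64 {
          // Sound only because `ptr` came from Box::into_raw and is unique.
          unsafe {
              *self.ptr += 1;
              *self.ptr
          }
      }
  }

  impl Drop for RawCounter {
      fn drop(&mut self) {
          // Reclaim the allocation made in `new`.
          unsafe { drop(Box::from_raw(self.ptr)) }
      }
  }

  /// thread::spawn accepts the closure only because RawCounter is Send.
  pub fn bump_on_other_thread(mut counter: RawCounter) -> u64 {
      thread::spawn(move || counter.bump()).join().unwrap()
  }
  ```

  Removing the `unsafe impl Send` line makes `bump_on_other_thread` fail to compile.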
  - dzaima 10 hours ago
    More generally, with C you can't restrict what can (accidentally or not) interact with the internal unsafe bits (without the cost of forcing the data to always live in the heap at least; or perhaps annoying field names that are automatically searched-for by your build system, though then you're essentially making a DSL), or even force using the "safe" parts properly (not enforced at compile-time, at least) outside of, again, a rather limited subset of cases.
    As a very general example, you can repeat basically any statement in C twice and it'll still compile. If you get lucky, the compiler might tell you you've ended up with a double-free or something, but that's a very limited set of cases, and won't help if the second copy is invoked down a couple function calls.
    There may be some ways to still get additional true guarantees in C, but they'll be rather more restrictive than ones you can write in Rust, and you'll likely end up with overhead, which tempts skimping out on doing things properly in the name of performance.
    - uecker 10 hours ago
      Indeed, the double-consumption rule is something you cannot express in C. But invariants of data structures are not a practical problem in C.
      Looking at the Rust code of this project, though, I trust my C code a lot more... ;-)
      - dzaima 10 hours ago
        > invariants of data structures are not a practical problem in C.
        Is that not the cause of like all memory safety vulnerabilities, which are like 30%-or-whatever of Linux ones? I've certainly written my fair share of mistakes around invariants in C code. Of course, if you're a perfect developer, indeed the choice of language won't end up mattering.
        - uecker 10 hours ago
          Straw man fallacy.
  - 0xDEAFBEAD 8 hours ago
    >it’s a sentinel saying “I declare that this type conforms to the documented expectations” even though the expectations can’t be checked by a compiler.
    Interesting. So perhaps the next step is to sprinkle asserts in randomly at runtime to help with catching bugs.
- db48x 10 hours ago
  Yea, it’s just encapsulation. Rust gives you some additional tools for achieving it though. Enums are very useful for this, as are the rules for handling shared and mutable references.
  For some examples, imagine an HTTP server that answers requests from clients. You might imagine having a Response object that lets you set headers and the body, with a send method that sends the response back to the client that made the request. It would be an obvious sort of error to send the thing twice, so in C you would assert that send was only called once. In Rust, on the other hand, the send method can _consume_ the Response object. This takes it away from the caller, so the compiler will ensure that they can’t even write code with two calls to the send method. You can’t enforce this at compile time in C because in C all methods take a simple pointer to the object to act on.
  Another invariant that you might want to enforce is that only one body gets attached to the Response, that the body is attached before the Response is sent, and that the user cannot forget to attach a body. You would start with a Response object that has methods for adding headers. It would also have a method that attaches a body. This body method would _consume_ the Response and return a ResponseWithBody object. The ResponseWithBody object doesn’t have any methods for adding headers, or for adding a body, so several of our requirements are now checked by the compiler. It does have the method for sending the response though, and the Response object does not. This satisfies the rest of the requirements. If you try to send a Response, it’ll fail to compile. If you try to add headers after the body, it’ll fail to compile. You literally just make a state machine out of types, with methods that consume one type and return another, and the compiler enforces that only valid state transitions are possible. This is usually called “typestate programming” if you want to search for more examples.
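  A minimal sketch of the state machine described above. `Response` and `ResponseWithBody` are the type names used in the comment; the method names `add_header`, `body` and `send`, and the choice to return the rendered message as a `String` instead of writing to a socket, are assumptions made for this example.

  ```rust
  /// A response that has no body yet. It can take headers but cannot be sent.
  pub struct Response {
      headers: Vec<(String, String)>,
  }

  /// A response with a body attached. It can only be sent.
  pub struct ResponseWithBody {
      headers: Vec<(String, String)>,
      body: String,
  }

  impl Response {
      pub fn new() -> Self {
          Response { headers: Vec::new() }
      }

      /// Headers may be added any number of times before the body.
      pub fn add_header(&mut self, name: &str, value: &str) {
          self.headers.push((name.to_string(), value.to_string()));
      }

      /// Takes `self` by value: the Response is consumed, so no second
      /// body and no further headers can be attached afterwards.
      pub fn body(self, body: &str) -> ResponseWithBody {
          ResponseWithBody { headers: self.headers, body: body.to_string() }
      }
  }

  impl ResponseWithBody {
      /// Takes `self` by value: a second call to `send` on the same
      /// value is a compile error (use of moved value).
      pub fn send(self) -> String {
          let mut wire = String::new();
          for (name, value) in &self.headers {
              wire.push_str(&format!("{}: {}\r\n", name, value));
          }
          wire.push_str("\r\n");
          wire.push_str(&self.body);
          wire
      }
  }
  ```

  Calling `send` on a plain `Response`, or calling `add_header` on a `ResponseWithBody`, fails because those methods do not exist on those types.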
  - uecker 10 hours ago
    I think an object ownership system is something we should have in C. Otherwise, I am relatively unimpressed, TBH. And the readability of this is questionable:
    https://cs.opensource.google/fuchsia/fuchsia/+/main:src/conn...
    - db48x 5 hours ago
      I see a lot of very familiar things there, but with lock ordering declared at compile time. Of course there’s plenty I would need to know more about before I could add to it, but if you were to practice Rust instead of merely commenting about it you would find that the readability improves with usage, same as any other language.
    - SkiFire13 5 hours ago
      IMO "readability" will always be an issue. It's a natural consequence of making more invariants and pre/post conditions explicit in your code.
    - pjmlp 9 hours ago
      We have been waiting for proper arrays and strings, or some form of fat pointers, for the last 50 years.
      I doubt an ownership system will ever arrive.
      AT&T's work on Cyclone ended up being picked up by Rust, instead of anyone at WG14 taking inspiration from it for papers.
      - uecker 8 hours ago
        It is true, I sometimes feel pretty lonely in WG14 pushing these ideas. But it is not as if we haven't made progress: when I joined there was only a vague understanding of the memory model and provenance, and there were even ideas to make it less reliable in favor of optimization ("wobbly values"), etc. We now have a good model for provenance, killed a lot of questionable ideas, strengthened semantics when there is UB (prior I/O is not affected), introduced checked arithmetic, started to eliminate UB from the language (this is ongoing but progressing well), and made dependent array types a requirement, with concrete plans to add a dependent structure type. On the compiler side, tools are also evolving.
        - pjmlp 8 hours ago
          I skimmed through those ideas for C2y; I hope they make it through.
    - dzaima 10 hours ago
      With my ~2 weeks of Rust usage, that looks pretty readable. You can freely skip over reading some if not most of the boilerplate, reading just the bits actually doing the main stuff (and whatever context you desire), without fear of having skipped out on some safety-critical part.
- jandrewrogers 10 hours ago
  I think the main thing is that it is all done in the type system at compile-time. This is the kind of thing C++ is good at but I’m not sure that C can do it with the same guarantees.
  - hgomersall 6 hours ago
    The other thing that is important is the statically enforced move and ownership semantics. They are required for types to encode state.
  - pjmlp 9 hours ago
    It can't, because C doesn't have the ability to create library types as if they were built-ins.
    - uecker 8 hours ago
      Why would this be required?
      - pjmlp 8 hours ago
        Because otherwise there is no mechanism to introduce types that can be used like built-ins while having the connection points across the language to enforce invariants.
rurban 8 hours ago
Using an actually safe language would also have helped. Pony, for example, is deadlock-free.
- IshKebab 2 hours ago
  You can't seriously be suggesting that Google use an extremely niche "pre-1.0" language for a production system intended to be used by hundreds of millions of people?
- pornel 5 hours ago
  How does it prevent two actors from waiting on each other?
  - rurban 3 hours ago
    There are no locks. There is no blocking wait; the IO lib is nonblocking throughout. Actors cannot wait.
    Messages are guaranteed to be processed sequentially, in order.
    https://tutorial.ponylang.io/index.html#whats-pony-anyway