Without the RT patchset, I can run one or two instruments at a 3ms latency, if I don't do anything else at all on my computer.
With it, I routinely have 6 instruments at 1ms, while having dozens of Chrome windows open and playing 3D shooters without issue.
It's shocking how much difference it makes over the regular (non-rt) low latency scheduler.
Are there any good resources on how this kind of real-time programming is done?
What goes into ensuring that a program is actually realtime? Are there formal proofs, or just experience and "vibes"? Is realtime coding any different from normal coding? How do modern CPU architectures, which have a lot of non-constant time instructions, branch prediction, potential for cache misses and such play into this?
> What goes into ensuring that a program is actually realtime?
Realtime mostly means predictable runtime for code. As long as it’s predictable, you can scale the CPU/microcontroller to fit your demands or optimize your code to fit the constraints. It’s about making sure your code can always respond in time to hardware inputs, timers, and other interrupts.
Generally the Linux kernel’s scheduling makes the system very unpredictable. RT Linux tries to address that along with several other subsystems. On embedded CPUs, getting predictable timing usually also means disabling advanced features like cache and speculative execution (although I don’t remember if RT handles that part since it’s very vendor specific).
I'm not hugely experienced in the field personally, but from what I've seen, actually proving hard real time capabilities is rather involved. If something is safety critical (think brake systems, avionic computers, etc.) it likely means you also need some special certification or even formal verification. And (correct me if I'm wrong) I don't think you'll want to use a Linux kernel, even with the PREEMPT_RT patches. I'd say specialized RT operating systems, like FreeRTOS or Zephyr, would be more fitting (though I don't have direct experience with them).
As for the hardware, you can't really use a ‘regular’ CPU and expect completely deterministic behavior. The things you mentioned (and for example caching) absolutely impact this. IIRC AMD/Xilinx actually offer a processor that has both regular Arm cores and some Arm real-time cores for these exact reasons.
For things like VxWorks, it's mostly vibes and setting priority between processes. But there are other ways. You can "offline schedule" your tasks, i.e. you run a scheduler at compile time which decides all possible supported orderings and how long a slot each task gets to run.
Then, there's the whole thing of hardware. Do you have one or more cores? If you have more than one core, can they introduce jitter or slowdown to each other accessing memory? And so on and so forth.
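A minimal sketch of the "offline schedule" idea, in case it helps make it concrete. The task names, periods, and runtimes are made up, and the placement policy (non-preemptive, earliest deadline first, deadline equal to the end of the period) is just one assumption among many possible ones. It also produces a single fixed ordering over one hyperperiod rather than all supported orderings, and a real toolchain would emit the table at build time for the target rather than compute it in Python.

```python
from functools import reduce
from math import gcd

# Hypothetical task set: name -> (period_ms, worst_case_runtime_ms).
TASKS = {
    "control": (5, 1),
    "sensors": (10, 2),
    "logging": (20, 3),
}


def lcm(a, b):
    return a * b // gcd(a, b)


def build_schedule(tasks):
    """Return (hyperperiod, slots) where slots is a list of (start, end, task).

    Jobs are placed non-preemptively, earliest deadline first, with each
    job's deadline equal to the end of its period. Raises ValueError if a
    job cannot finish before its deadline.
    """
    hyperperiod = reduce(lcm, (period for period, _ in tasks.values()))

    # Expand every task into its jobs: (release, deadline, runtime, name).
    jobs = []
    for name, (period, runtime) in tasks.items():
        for release in range(0, hyperperiod, period):
            jobs.append((release, release + period, runtime, name))

    schedule, now = [], 0
    while jobs:
        ready = [job for job in jobs if job[0] <= now]
        if not ready:
            now = min(job[0] for job in jobs)  # idle until the next release
            continue
        job = min(ready, key=lambda j: j[1])  # earliest deadline wins
        release, deadline, runtime, name = job
        if now + runtime > deadline:
            raise ValueError(f"{name} released at {release} ms misses its deadline")
        schedule.append((now, now + runtime, name))
        now += runtime
        jobs.remove(job)
    return hyperperiod, schedule


if __name__ == "__main__":
    hyperperiod, slots = build_schedule(TASKS)
    print(f"hyperperiod: {hyperperiod} ms")
    for start, end, name in slots:
        print(f"{start:>3} to {end:>3} ms  {name}")
```

The resulting table is what the target would walk through in a loop. Because the slots are fixed ahead of time, an overloaded task set fails when the table is built instead of at runtime.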
> it's mostly vibes and setting priority between processes
I'm laughing so so hard right now. Thanks for, among other things, confirming for me that there isn't some magic tool that I'm missing :). At least I have the benefit of working on softer real-time systems where missing a deadline might result in lower quality data but there are no lives at risk.
Setting and clearing GPIOs on task entry/exit is a nice touch for verification too.
> If you have more than one core, can they introduce jitter or slowdown to each other accessing memory?
DMA and fancy peripherals like UART, SPI, etc. could be name-dropped in this regard, too.
You don't break the electrical equipment/motor/armature/process it's hooked up to.
In rt land, you test in prod and hope for the best.
On all the real time systems I've worked on, it has just been empirical measurements of CPU load for the different task periods and a good enough margin against overruns.
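Roughly what that kind of check looks like as arithmetic, as a sketch only. The measurements and the 0.70 limit are invented for illustration. The Liu and Layland bound is included for comparison; it only applies to independent periodic tasks with deadlines equal to periods under preemptive rate-monotonic priorities, and a worst observed runtime is not a proven worst case.

```python
def cpu_load(tasks):
    """Total utilization: sum of runtime / period over all tasks."""
    return sum(runtime / period for runtime, period in tasks)


def rate_monotonic_bound(n):
    """Liu and Layland sufficient utilization bound for n periodic tasks."""
    return n * (2 ** (1 / n) - 1)


# Hypothetical measurements: (worst observed runtime in ms, period in ms).
measured = [(0.4, 1.0), (1.5, 10.0), (8.0, 100.0)]

load = cpu_load(measured)  # 0.4 + 0.15 + 0.08
limit = 0.70               # assumed project-specific "OK CPU load" limit

print(f"load={load:.2f} limit={limit:.2f} "
      f"rm_bound={rate_monotonic_bound(len(measured)):.3f}")
print("within margin" if load <= limit else "over budget")
```

The gap between the measured load and the limit is the margin that absorbs overruns you did not happen to observe during testing.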
On an ECU I worked on, the cache was turned off to not have cache misses ... no cache no problem. I argued it should be turned on and the "OK cpu load" limit decreased instead. But nope.
I wouldn't say there is any conceptual difference from normal coding, except that you'd want to be kinda sure algorithms terminate in a reasonable time in a time constrained task. More online algorithms than normally, though.
My take is that most of the strangeness in real time coding is actually about doing control theory stuff. The program often feels like a state machine going in a circle.
> On an ECU I worked on, the cache was turned off to not have cache misses ... no cache no problem. I argued it should be turned on and the "OK cpu load" limit decreased instead. But nope.
Yeah, the tradeoff there is interesting. Sometimes "get it as deterministic as possible" is the right answer, even if it's slower.
> My take is that most of the strangeness in real time coding is actually about doing control theory stuff. The program often feels like a state machine going in a circle.
Lol, with my colleagues/juniors I'll often encourage them to take code that doesn't look like that and figure out if there's a sane way to turn it into "state-machine going in a circle". For problems that fit that mold, being able to say "event X in state Y will have effect Z" is really powerful for being able to reason about the system. Plus, sometimes, you can actually use that state machine to more formally reason about it or even informally just draw out the states, events, and transitions and identify if there's anywhere you might get stuck.
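A small sketch of what "event X in state Y will have effect Z" can look like as a table, plus a check for places you might get stuck. The states and events are a made-up controller, not from any real system, and ignoring unknown events is an assumption of this sketch.

```python
from collections import deque

# Hypothetical controller: (state, event) -> next state.
TRANSITIONS = {
    ("idle", "start"): "heating",
    ("heating", "at_temp"): "holding",
    ("heating", "fault"): "error",
    ("holding", "stop"): "idle",
    ("holding", "fault"): "error",
    # "error" has no outgoing transitions in this example.
}


def step(state, event):
    """Return the next state; events with no entry for this state are ignored."""
    return TRANSITIONS.get((state, event), state)


def reachable_states(start):
    """Breadth-first search over the transition table."""
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        for (src, _event), dst in TRANSITIONS.items():
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


def stuck_states(start):
    """Reachable states that no event can ever leave."""
    has_exit = {src for (src, _event) in TRANSITIONS}
    return reachable_states(start) - has_exit


print(stuck_states("idle"))  # {'error'}
```

Here the check flags "error" as a state with no way out, which is the kind of thing that is easy to miss in hand-written control flow and easy to see once the transitions are data.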
[dupe]
More discussion: https://news.ycombinator.com/item?id=41584907
Hooray!
This is big for the CNC community. RT is a must have, and this makes builds that much easier.
Why use Linux for that though? Why not build the machine like a 3D printer, with a dedicated microcontroller that doesn't even run an OS and has completely predictable timing, and a separate non-RT Linux system for the GUI?
I feel like Klipper's approach is fairly reasonable: let a non-RT system (that generally has better performance than your microcontroller) calculate the movement but leave the actual commanding of the stepper motors to the microcontroller.
Yeah, I looked at Klipper a few months ago and really liked what I saw. Haven't had a chance to try it out yet but like you say they seem to have nailed the interface boundary between "things that should run fast" (on an embedded computer) and "things that need precise timing" (on a microcontroller).
One thing to keep in mind for people looking at the RT patches and thinking about things like this: these patches allow you to do RT processing on Linux, but they don't make some of the complexity go away. In the Klipper case, for example, writing to the GPIOs that actually send the signals to the stepper motors in Linux is relatively complex. You're usually making a write() syscall that's going through the VFS layer etc. to finally get to the actual pin register. On a microcontroller you can write directly to the pin register and know exactly how many clock cycles that operation is going to take.
I've seen embedded Linux code that actually opened /dev/mem and did the same thing, writing directly to GPIO registers... and that is horrifying :)
Sounds exciting. Can anyone recommend a good place to read about the nuances of these patches? Is the ZDNet link about the best at the moment?
A few months ago, I played around with a contemporary build of PREEMPT_RT to see if it was at the point where I could replace Xenomai. My requirement is to be able to wake up on a timer with an interval of less than 350 us and do some work with low jitter. I wrote a simple task that just woke up every 350 us and wrote down the time. It managed to do it once every 700 us.
I don't believe they've actually made the kernel completely preemptive, though others can correct me. This means that you cannot achieve the same realtime performance with this as you could with a co-kernel like Xenomai.
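For anyone wanting to try a similar "wake up and write down the time" experiment, here is a rough sketch. It is not the test described above: it is plain Python with `time.sleep`, so the interpreter adds its own jitter and it sets no realtime priority. The function names are made up for this example. It only shows the shape of the measurement, sleeping towards absolute deadlines and then summarizing the observed intervals.

```python
import time


def wake_intervals(period_us, count):
    """Sleep towards absolute deadlines; return observed intervals in microseconds."""
    period_ns = period_us * 1_000
    next_wake = time.monotonic_ns() + period_ns
    stamps = []
    for _ in range(count):
        delay_ns = next_wake - time.monotonic_ns()
        if delay_ns > 0:
            time.sleep(delay_ns / 1e9)
        stamps.append(time.monotonic_ns())
        next_wake += period_ns  # absolute schedule, so sleep errors do not accumulate
    return [(b - a) / 1_000 for a, b in zip(stamps, stamps[1:])]


def summarize(intervals_us, period_us):
    """Return (mean interval, worst deviation from the requested period)."""
    mean = sum(intervals_us) / len(intervals_us)
    worst = max(abs(interval - period_us) for interval in intervals_us)
    return mean, worst


if __name__ == "__main__":
    mean, worst = summarize(wake_intervals(350, 1000), 350)
    print(f"mean interval {mean:.1f} us, worst deviation {worst:.1f} us")
```

For hard numbers you would want the same loop in C with a realtime scheduling policy, but the summary step is the same: the mean tells you whether the period is being hit at all, and the worst deviation is the jitter that matters.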
Did you pin the kernel to its own core?