• 2 Posts
  • 342 Comments
Joined 1 year ago
cake
Cake day: June 16th, 2023

help-circle
  • Where I worked we had a very important time sensitive project. The server had to do a lot of calculations on a terrain dataset that covered the entire planet.

    The server had a huge amount of RAM and each calculation block took about a week. It could not be saved until the end of the calculation and only that server had the RAM to do the work. So if it went down we could lose almost a weeks work.

    Project was due in 6 months and calculation time was estimated to be about 5 1/2 months. So we couldn’t afford any interruptions.

    We had bought a huge UPS meant for a whole server rack. For this one server. It could keep the server up for three days. That way even if wet lost power over the weekend it would keep going and we would have time to buy a generator.

    One Friday afternoon the building losses power and I go check on the server room. Sure enough the big UPS with a sign saying only for project xyz has a bunch of other servers plugged into it.

    I quickly unplug all but ours. I tell my boss and we go home at 5. Latter that day the power comes back on.

    On Monday there are a ton of departments bitching that they came in an their servers were unplugged. Lots of people wanted me fired. My boss backed me and nothing happened but it was stressful.




















  • Pretty specific use case. A normal OS handleds time slicing and core assignment for processes and uses it’s judgement for that. So at any time your process can be suspended and you don’t know when you get your next time slice.

    Same with when you make wait calls. You might say wait 100ms but it may be much longer before your process gets to run again.

    In a real time OS if you have real time priority the OS will suspend anything else including it self to give you the time you request. It also won’t suspend you no matter how long you use the core.

    So if you need to control a process with extreme precision like a chemical manufacturing process, medical device, or flying a rocket where being 10ms late means failure they are required.

    However with great power comes great responsibility. You need to make sure your code calls sleep frequently enough that other tasks have time to run. Including things like file io or the gui.