[trustable-software] I fear we need trustable hardware too...

Thu Jan 4 19:30:37 GMT 2018

On Thu, Jan 4, 2018 at 3:10 AM, Paul Sherwood <paul.sherwood at codethink.co.uk> wrote:

> <snip>
>

> But I'm interested to get others' thoughts on this, in the aftermath...
>
> - are we wasting our time?
> - is there any of what we are thinking about that re-applies to hardware?
>

No, and yes.

Performance, efficiency and functionality desires drive complexity up.
Complexity means a system becomes porous, and malicious or accidental
untrustworthy behavior is essentially a given. I believe trustworthy
software needs both to be built statically (ahead of time) in a way that
maximizes trust, and to dynamically (during execution) monitor, catch
excursions and enforce policies (fail-safes or other).

Take two examples, low and high level:

- the NVIDIA Drive PX systems based on the Parker SOC. The "Denver"
high-perf ARM CPU core is not an ARM core. To get higher performance and
higher efficiency, it's a funky VLIW core with complicated software
controlled speculation, running a microkernel, a binary translator and a
software optimizer. All hidden from the system! When executing code becomes
hot, kernel or user space, the optimizer grabs it, munges it, and produces
an optimized version that is chained into other snippets in a code cache
even as other threads are passing through. This is a complicated beast,
ripe for such research in timing attacks with much larger scope than
traditional hardware OOO engines (and I say that as one of the designers
:-/). It has hardware controls to help minimize these, such as in the
memory controller, but still... On the GPU side, you're also facing a deep
software stack, with more JIT compilers, optimizers and a hidden
instruction set. This complexity isn't going away, although some of the
mitigation techniques, such as
https://support.google.com/faqs/answer/7625886 , essentially gimp the
performance-oriented features (like the branch target buffer on modern
cores).

- ML and other statistical models in an application (whether a self-driving
car, or a high frequency trading algorithm). It's surprisingly easy to
craft input data that triggers misbehavior - e.g. stickers that you can put
on street signs to cause misclassification. A stop sign can be made to
disappear, or become a speed limit sign. In trading terms, in a former
role, as part of understanding this, we crafted a benign looking system
that when fed external market data with a hidden trigger pattern, would
respond with a series of orders that deliberately leaked information into
the market. Testing won't show the absence of these. Nor do techniques like
model distillation, while improving robustness, fully defend against them.
Our own complex application models are likely vulnerable even if we've
developed them in a trustworthy fashion.

So what to do?

Our systems need to be trustworthy in an untrustworthy world. We don't
trust ourselves, let alone anything else. And that's complexity, not just
malice, so even allegedly "isolated" systems are not immune.

Just like building fault-tolerant distributed systems, I believe we need
layers of defense, and assume that of course things are broken, all the
time. In the interest of keeping this message to a tolerable length, I
won't go into all the details. But, I think we need two approaches, both of
which deserve pages of their own with many sub-categories:

1.  Try to not go off the rails: software isolation techniques - e.g. block
syscalls in most user code or emulate in an async fashion, explicit swim
lanes and software circuit breakers in the surrounding "app server" logic
rather than badly reinvented in an ad hoc fashion within user logic,
diverse (redundant, perturbed) execution / models and other cross-checks,
and so on. In general, the "happy path" through user logic is already hard
for us all to get correct. The unhappy path is harder, needs to be explicit
and trustworthy, and needs to not be haphazardly sprinkled through user
code.

2. *Assuming* we're going off the rails: explicitly monitor and react to
violations at runtime. Besides the graph of user logic, we should separate
and make explicit the control / monitoring / reaction software. To tie that
back to trustworthy guidelines we've been discussing, we should explicitly
state, say, "model X outputs numbers between 1 and 10 and other values
are not to be trusted". And explicitly have guardband logic separate from
the model that enforces this, and prevents the pollution of downstream user
logic with crazy values. This is a trivial example, but imagine
extrapolating to say "the rate of change of the output values should be
below x", or "models x and y should output values within 10% of one
another". Or "if the objects classified as vehicles are all braking
aggressively, consider whether the running models are missing
something...". Rather than bits of logic embedded inline in the user
application logic, these should be hoisted out for checking and enforcement
'by the system' that is supervising the user logic, and augmented with
lightweight anomaly detectors etc.
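
To make the guardband idea in point 2 a little more concrete, here is a
minimal sketch in Python, not a definitive implementation. The class name
Guardband, the max_step limit of 2.0 and the choice to return None on a
violation are all assumptions made for illustration; only the 1..10 range
and the 10% agreement figure come from the examples above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Guardband:
    """Hypothetical supervisor-side checks, kept separate from the model."""
    lo: float = 1.0             # "model X outputs numbers between 1 and 10"
    hi: float = 10.0
    max_step: float = 2.0       # assumed limit on change between outputs
    max_rel_diff: float = 0.10  # "models x and y ... within 10%"
    _last: Optional[float] = None  # last value that passed every check

    def check(self, x: float, y: float) -> Optional[float]:
        """Return x if it passes every policy, otherwise None."""
        # Range policy: anything outside [lo, hi] (including NaN) is rejected.
        if not (self.lo <= x <= self.hi):
            return None
        # Cross-check policy: the two models must roughly agree.
        if not (abs(x - y) <= self.max_rel_diff * max(abs(x), abs(y))):
            return None
        # Rate-of-change policy: reject jumps from the last trusted value.
        if self._last is not None and abs(x - self._last) > self.max_step:
            return None
        self._last = x
        return x
```

The supervising system, not the user logic, would call check() on every
model output and decide what a None means (hold the last good value, fall
back to a simpler model, trip a fail-safe, and so on).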

Can we prevent everything bad? No. It's not even the goal. The goal is to
minimize the chances of the bad thing, and when it does happen, to survive
it. And of course we should still use the existing techniques - e.g. no one
is saying to turn off your type checkers, just to recognize that well-typed
code can go horribly wrong in myriad ways.