I wanted to see how TypeScript compares to Rust for implementing somewhat complex algorithms with a non-trivial number of calculations, and thus The SHA Experiment was started.
TL;DR
- Deno is nice, and good for small utilities
- TypeScript is surprisingly performant, as long as you do not need 64-bit numbers
- Rust rocks, and doesn’t try to hide or gloss over any of the hard issues
- moving data around byte-by-byte hurts a lot.
A bit about the SHA algorithm
The Secure Hash Algorithms (SHA) are not only cryptographic workhorses, they are also fairly compact algorithms of non-trivial but manageable complexity; and they are very well described (for SHA-1/2 see FIPS 180-4, and for test vectors see the Cryptographic Algorithm Validation Program).
The SHA algorithms are defined over a stream of bits, but the core algorithm is based on iterating over blocks of integers (for SHA-2, blocks of 16 unsigned integers of either 32-bit or 64-bit size).
The input has to be followed by an end-marker (a single 1-bit) and an integer stating the length of the message in bits. Any space left before the end of a full block is padded by inserting 0-bits between the end-marker and the length. If there isn’t room for the end-marker and the length in what is left, the padding is extended into the next block; i.e. end-marker + padding in the first block, and padding + length in the second.
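A minimal sketch of that padding step for the 64-byte (SHA-256-style) block size, assuming a byte-aligned message — the function name is made up for illustration and is not the project’s code:
function padTail(tail: Uint8Array, totalByteLength: number): Uint8Array {
  // the end-marker (1 byte) plus the 64-bit length (8 bytes) must fit after the
  // remaining bytes; otherwise the padding spills into a second 64-byte block
  const padded = new Uint8Array(tail.length + 9 <= 64 ? 64 : 128);
  padded.set(tail);                              // the remaining message bytes
  padded[tail.length] = 0x80;                    // end-marker: a single 1-bit, then 0-bits
  const view = new DataView(padded.buffer);
  view.setBigUint64(padded.length - 8, BigInt(totalByteLength) * 8n); // length in bits, big-endian
  return padded;
}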
The hash is then computed by taking an initial value and the first block, which results in a hash. If there is no more message (including end-marker, padding and length), then this is the final hash; otherwise the value is used as the initial value for computing the hash of the next block. And so forth.
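As a rough sketch (with hypothetical names — paddedBlocks and compressBlock are not the project’s actual API), the outer loop looks like this:
let hash = INITIAL_HASH;                      // the initial hash constants from the standard
for (const block of paddedBlocks(message)) {  // blocks of 16 unsigned integers, padding included
  hash = compressBlock(hash, block);          // the result feeds the next iteration
}
// after the last (padded) block, hash holds the final digest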
The calculations for a block consist of doing multiple rounds of permutations (additions, XORs, rolls and shifts), effectively shuffling the bits of the message around several times.
First, the 16 numbers taken from the input (a block) are extended, shuffling bits and feeding them forward:
for(let t=16; t<roundCount; t++) {
  w[t] = sigma1(w[t-2]) + w[t-7] + sigma0(w[t-15]) + w[t-16];
}
(w is the extended block, and the first 16 numbers have been filled from the input block)
And then it is simply a matter of doing the rounds of calculations, feeding in one number from the extended block at a time:
for(let t=0; t<roundCount; t++) {
  let T1 = vars[h] + Sigma1(vars[e]) + ch(vars[e], vars[f], vars[g]) + k[t] + w[t];
  let T2 = Sigma0(vars[a]) + maj(vars[a], vars[b], vars[c]);
  vars.copyWithin(1, 0);
  vars[e] = vars[e] + T1;
  vars[a] = T1 + T2;
}
(k is a set of constants defined by the standard; the variables a through h from the algorithm were implemented as an array, allowing for shifting them all with “copyWithin”, as an alternative to the usual h=g, g=f, … , b=a)
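For reference, the 32-bit (SHA-256) helper functions used in the snippets above are defined in FIPS 180-4; a sketch of them (not necessarily spelled exactly like in the project):
const rotr = (x: number, n: number) => ((x >>> n) | (x << (32 - n))) >>> 0; // rotate right
const ch     = (x: number, y: number, z: number) => ((x & y) ^ (~x & z)) >>> 0;
const maj    = (x: number, y: number, z: number) => ((x & y) ^ (x & z) ^ (y & z)) >>> 0;
const Sigma0 = (x: number) => (rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)) >>> 0;
const Sigma1 = (x: number) => (rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)) >>> 0;
const sigma0 = (x: number) => (rotr(x, 7) ^ rotr(x, 18) ^ (x >>> 3)) >>> 0;
const sigma1 = (x: number) => (rotr(x, 17) ^ rotr(x, 19) ^ (x >>> 10)) >>> 0;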
Streams and iterators
I initially set out wanting to explore streams, or rather the neighbouring concept of iterators. It leads to a simple, intuitive interface and naturally lends itself to partitioning the implementation into composable logical steps.
It is also obvious that it is possible to process the data sequentially, as the only data fed forward is a counter for the length and the intermediate hash, leading to an implementation whose memory use is unrelated to the message size.
The two widespread alternative ways of doing the interface are to either take the entire message in one go (which doesn’t scale well), or to take it piecemeal as a set of repeated calls, one chunk at a time.
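Sketched as TypeScript type signatures (hypothetical names and shapes, not the project’s actual API), the three styles look roughly like this:
type ShaOfIterator = (bytes: Iterable<number>) => Uint8Array;                    // stream/iterator: constant memory
type ShaOfMessage  = (message: Uint8Array) => Uint8Array;                        // everything in one go
type ShaHasher     = { update(chunk: Uint8Array): void; digest(): Uint8Array };  // repeated calls, chunk by chunk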
I ended up abandoning the byte stream/iterator approach in both languages for performance reasons. It is obvious that pushing large amounts of data around in small pieces isn’t efficient, but my assumption was that the cost would pale compared to the many calculations done on each block. In TypeScript it turned out that the time needed for reading a file and pushing byte data around was roughly on par with the block algorithm; I only eyeballed it in Rust, but it was sizeable in comparison there too.
My initial implementation was based on byte iterators, and it has been tested, so it works. I still like the interface, but for calculating hashes on a folder with a month’s worth of digital images in RAW format, it wouldn’t do.
In TypeScript the solution was to switch to a stream of variably sized, oversized chunks of data; in Rust, to read the file using a block-sized buffer (with a BufReader between it and the raw file).
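On the TypeScript/Deno side, the buffered reading can be sketched roughly like this (feedIntoHash is a made-up stand-in for the hash-update step, not the project’s code):
const file = await Deno.open("image.raw");      // requires --allow-read
const buffer = new Uint8Array(64 * 1024);       // a large, block-aligned buffer
while (true) {
  const bytesRead = await file.read(buffer);    // null signals end of file
  if (bytesRead === null) break;
  feedIntoHash(buffer.subarray(0, bytesRead));  // hypothetical hash-update step
}
file.close();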
TypeScript
For the TypeScript code, see this project on GitHub.
TypeScript vs. JavaScript
First of all, the obvious: TypeScript isn’t JavaScript. While the type system in TypeScript allows for a lot of generalization, you cannot write code that abstracts over bigint and number. E.g. both define addition with the + operator, but there is no way of expressing that as a type bound.
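To illustrate: the 32-bit and 64-bit helpers end up as near-duplicates, because a shared generic version cannot be expressed (a sketch, not the project’s code):
const add32 = (a: number, b: number): number => (a + b) >>> 0;                  // wrap to 32 bits
const add64 = (a: bigint, b: bigint): bigint => (a + b) & 0xffffffffffffffffn;  // wrap to 64 bits
// a generic <T extends number | bigint>(a: T, b: T) => a + b is rejected by the compiler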
Also, 32-bit numbers are signed by default. Mostly this can be ignored, but there is the “>>>” operator for shifting without extending the sign bit. This operator isn’t defined for bigints.
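A small illustration of the signedness issue:
const asSigned   = 0xffffffff | 0;    // -1: bitwise operators work on signed 32-bit values
const asUnsigned = 0xffffffff >>> 0;  // 4294967295: ">>> 0" reinterprets the value as unsigned
// there is no ">>>" for bigints; using it on them is rejected at compile time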
So I ended up writing almost equivalent versions of the 32-bit and 64-bit operations, both because of the (lack of) expressiveness in TypeScript’s type system and because of JavaScript idiosyncrasies.
The main difference between JavaScript and TypeScript is the type system. As mentioned, it does occasionally prevent you from writing generic code, but in general I found that it helped me a lot more than it got in the way (and in one instance it even forced me to rethink something that had gotten a bit too clever). When converting back and forth between different representations, and with code that requires a high level of correctness, it leads to higher productivity by catching opaque mismatches early on.
TypeScript vs. Rust
Obviously Rust can outperform TypeScript: Rust is much more low-level, and thus developers are much more in control. But TypeScript (on the V8 engine) really gave Rust a good run for its money.
And this difference is probably also why TypeScript turned out to be more agile than Rust. I started out in Rust, but switched over to TypeScript after having gotten a functioning solution. Working in TypeScript, I realized that I initially had gotten it the wrong way around: splitting the data into blocks and then padding is actually much easier than the other way around. Because of the low-level and rigid nature of Rust (compared to TypeScript), it didn’t occur to me there. But I later went back and rewrote the Rust code too.
Or to put it another way: if you want to get a feel for the different solution alternatives and experiment a bit, a language like TypeScript is probably better; if you know exactly which bits should go where and need full control and raw performance, then Rust is better. Going forward, I will probably prototype in TypeScript first before writing it in Rust.
TypedArray / DataView / ArrayBuffer
The SHA algorithm’s input is a stream of bits, but the algorithm is defined using unsigned integers. Having fair-sized chunks of data, and needing both a byte-oriented view and an integer view of them, looks like an obvious case for the TypedArray / DataView / ArrayBuffer constructs in JavaScript. And they are highly efficient and work very well.
But there is a gotcha. Integer views use the platform endianness for performance reasons, and the SHA algorithms are of course defined for a specific endianness (big-endian). Which means that I ended up having to copy all the data to compensate for endianness.
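For reference, DataView reads with an explicit endianness flag, so single values can be read as big-endian on any platform (a sketch, not necessarily what the project ended up doing):
const bytes = new Uint8Array([0x01, 0x02, 0x03, 0x04]);
const view = new DataView(bytes.buffer);
const bigEndian = view.getUint32(0, false);         // 0x01020304 on every platform, as SHA-2 expects
const platform  = new Uint32Array(bytes.buffer)[0]; // 0x04030201 on little-endian machines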
Deno
Another motivation for this project was to try out Deno as a platform. Node.js is hugely popular if you don’t necessarily want to run JavaScript in a browser. Node.js was originally written by Ryan Dahl, and he is also a big part of the team behind Deno.
I found Deno intuitive to use, and in many ways much simpler than Node.js. That comparison probably isn’t entirely fair, as Node.js has been on a long journey and has been a frontrunner on concepts that are now stabilized and internalized in JavaScript, so obviously Deno has the advantage of a fresh start.
Deno has TypeScript as a first-class citizen alongside JavaScript.
Deno allows you to run code directly from e.g. GitHub, and to install it as a script. See the README.md of the GitHub project for an example.
I can easily see Deno fitting into the gap between small scripts and applications. It is a perfect fit for something like this project: a non-trivial but well-defined utility. It probably will not replace Node.js, because Node.js is already an established and comprehensive platform.
Rust
For the Rust code, see this project on GitHub.
… to be continued;