Our first result concerns a software TFHE implementation of ciphertext-ciphertext multiplication. This operation is required, for instance, when AI/ML computations are outsourced to the cloud and both the user and the service provider use encryption, i.e. both the user's input data and the provider's network weights are encrypted. Existing support for such an operation is either lacking or extremely slow. We developed an approach that improves the performance of this multiplication by applying carry-save addition. Its theoretical speedup is proportional to the bit width of the plaintext integer operands. The approach also speeds up multi-operand summation.
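To illustrate the idea, the following is a minimal plaintext sketch of carry-save addition applied to multi-operand summation and to multiplication. It operates on ordinary Python integers, not on TFHE ciphertexts; the function names (`csa`, `multi_operand_sum`, `multiply`) are hypothetical and chosen for this example only. In a homomorphic setting, each bitwise operation would presumably map to encrypted gates, but that mapping is an assumption and is not shown here.

```python
def csa(a, b, c, mask):
    """Carry-save adder: reduce three operands to a sum word and a carry word.

    Every bit position is computed independently, so no carry ripples
    through the word at this stage.
    """
    s = a ^ b ^ c                                        # per-bit sum
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask  # per-bit majority, shifted
    return s, carry


def multi_operand_sum(operands, width):
    """Sum any number of operands modulo 2**width using carry-save reduction."""
    mask = (1 << width) - 1
    ops = [x & mask for x in operands]
    while len(ops) > 2:
        full = len(ops) // 3 * 3          # operands that form complete triples
        nxt = []
        for i in range(0, full, 3):       # these adders are mutually independent
            nxt.extend(csa(ops[i], ops[i + 1], ops[i + 2], mask))
        nxt.extend(ops[full:])            # leftovers pass through to the next level
        ops = nxt
    # A single carry-propagating addition finishes the job.
    return sum(ops) & mask


def multiply(a, b, width):
    """Multiply two width-bit unsigned integers via partial products."""
    partial_products = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    return multi_operand_sum(partial_products, 2 * width)


if __name__ == "__main__":
    print(multiply(1234, 5678, 16) == 1234 * 5678)
```

The only step with a sequential carry chain is the final two-operand addition; all earlier reduction steps work bit position by bit position.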

This approach introduces easily exploitable parallelism at a level above TFHE gates. A 15x speedup over previous results was obtained for 16-bit multiplication on a 64-core processor. This leads to much faster dot product and convolution computations, which combine multiplications with a multi-operand sum. A 45x speedup is achieved for a 16-bit, 32-element dot product, and a 30x speedup for a convolution with a 32x32 filter.
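One way to see where the parallelism may come from is to count how many mutually independent carry-save adders each level of a reduction tree contains. The sketch below is a plaintext counting model only. The function name `csa_schedule` is hypothetical, and the assumption that a dot product pools the partial products of all its multiplications into a single reduction tree is made for illustration and is not confirmed by the text above.

```python
def csa_schedule(num_operands):
    """Return the number of carry-save adders used at each reduction level.

    Each adder turns three operands into two, so a level with n operands
    uses n // 3 adders and leaves 2 * (n // 3) + n % 3 operands.
    Adders within one level do not depend on each other.
    """
    levels = []
    n = num_operands
    while n > 2:
        adders = n // 3
        levels.append(adders)
        n = 2 * adders + n % 3
    return levels


if __name__ == "__main__":
    # 16 partial products for one 16-bit multiplication.
    print("multiplication:", csa_schedule(16))
    # Assumed pooling: 32 multiplications of 16 partial products each.
    print("dot product:", csa_schedule(16 * 32))
```

Under this model, wide levels near the top of the tree offer many independent units of work, which is the kind of coarse-grained parallelism a many-core processor can exploit.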

The multiplication also becomes more than twice as fast on a GPU when our approach is used.