
Accelerating Secure and Privacy Preserving Computations

Data privacy and security have gained widespread attention in recent years. This has implications for data-intensive applications in healthcare, financial transactions, recommendation systems, and other domains that rely on cloud platforms for implementation. Data security becomes critical when the data uploaded to the cloud, or the results returned from it, are sensitive and confidential. Encryption protects data in transit. However, because cloud platforms are public, shared resources, the application’s data remains vulnerable if it is decrypted on the cloud platform for execution.
[Figure: secure computation]
A new class of cryptosystems, known as Homomorphic Encryption (HE), provides a potential solution to these challenges. HE allows arbitrary computations on encrypted data without requiring access to the secret key. The cloud can therefore evaluate functions on encrypted data without needing, or being able, to decrypt it, which preserves end-to-end privacy.
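
To make the idea of computing on ciphertexts concrete, the following is a minimal sketch using the textbook Paillier cryptosystem. This is an assumption chosen for brevity: Paillier supports only addition of plaintexts, whereas the schemes targeted by this project are polynomial-based and support richer computations. The tiny primes and the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`) are illustrative only and offer no real security.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1 is used
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt m (0 <= m < n) under public key n with fresh randomness."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, secret_key, c):
    """Recover the plaintext using the secret key (lam, mu)."""
    lam, mu = secret_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


def add_encrypted(n, c1, c2):
    """Server-side step: uses only the public key, never the secret key.

    Multiplying two ciphertexts yields an encryption of the plaintext sum.
    """
    return (c1 * c2) % (n * n)


# The client encrypts, the server adds ciphertexts, the client decrypts.
n, sk = keygen(1009, 1013)
c_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
print(decrypt(n, sk, c_sum))
```

Only `decrypt` touches the secret key; `add_encrypted` runs entirely on ciphertexts, which is the role the cloud plays in the setting described above.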

Computations on encrypted data are demanding and require significant hardware resources because high-degree polynomials are needed to achieve the desired security level. In addition, these computations are data intensive and require fast access to large amounts of data.

Heterogeneous architectures consisting of CPUs and FPGAs have become popular as they integrate the general-purpose processing power of CPUs with the energy-efficient, fine-grained parallelism of FPGAs. This project focuses on exploiting these architectures to accelerate privacy-preserving Machine Learning (ML) models. The project is structured into three levels: HE operations, HE ML models, and HE framework.

Areas of Interest:

HE operation acceleration, HE-CNN acceleration, HE framework, and overlay architecture

HE Operation Acceleration

The wide variability in computational complexity across different HE operations presents significant challenges for acceleration. Each operation also comes with distinct memory and bandwidth requirements. We focus on developing a comprehensive set of HE kernels to accelerate all key operations, e.g., multiplication, rotation, and bootstrapping.

We have implemented low-latency Number Theoretic Transform (NTT) kernels to enable fast polynomial multiplication. Additionally, we have designed a rotation kernel that uses a dynamic layout to address memory inefficiencies caused by varying data access patterns. We are developing a high-performance, bootstrappable Fully Homomorphic Encryption (FHE) accelerator to significantly improve the bandwidth efficiency of bootstrapping, the most time-consuming operation.
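
As an illustration of why the NTT enables fast polynomial multiplication, here is a minimal software sketch of negacyclic multiplication in Z_q[x]/(x^n + 1), the ring commonly used by polynomial-based HE schemes. It is not the project's FPGA kernel: the recursive formulation, the small parameters (n = 8, q = 257), and all function names are assumptions made for readability.

```python
def find_psi(n, q):
    """Find a primitive 2n-th root of unity mod prime q (needs 2n | q - 1)."""
    assert (q - 1) % (2 * n) == 0
    for g in range(2, q):
        psi = pow(g, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:  # psi^n = -1, so psi has order exactly 2n
            return psi
    raise ValueError("no primitive 2n-th root of unity found")


def ntt(a, omega, q):
    """Recursive Cooley-Tukey NTT; len(a) is a power of two, omega has order len(a)."""
    n = len(a)
    if n == 1:
        return list(a)
    w2 = omega * omega % q
    even, odd = ntt(a[0::2], w2, q), ntt(a[1::2], w2, q)
    out, w = [0] * n, 1
    for k in range(n // 2):
        t = w * odd[k] % q
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q
        w = w * omega % q
    return out


def intt(a, omega, q):
    """Inverse NTT: forward transform with omega^-1, then scale by n^-1."""
    n_inv = pow(len(a), -1, q)
    return [x * n_inv % q for x in ntt(a, pow(omega, -1, q), q)]


def negacyclic_mul_ntt(a, b, q):
    """Multiply a and b modulo (x^n + 1, q) in O(n log n) operations."""
    n = len(a)
    psi = find_psi(n, q)
    omega, psi_inv = psi * psi % q, pow(psi, -1, q)
    # Twisting by powers of psi turns the negacyclic product into a cyclic one.
    a_t = [x * pow(psi, i, q) % q for i, x in enumerate(a)]
    b_t = [x * pow(psi, i, q) % q for i, x in enumerate(b)]
    prod = [x * y % q for x, y in zip(ntt(a_t, omega, q), ntt(b_t, omega, q))]
    c_t = intt(prod, omega, q)
    return [x * pow(psi_inv, i, q) % q for i, x in enumerate(c_t)]


def negacyclic_mul_schoolbook(a, b, q):
    """O(n^2) reference: x^n wraps around to -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k, term = i + j, a[i] * b[j]
            if k < n:
                c[k] = (c[k] + term) % q
            else:
                c[k - n] = (c[k - n] - term) % q
    return c


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
print(negacyclic_mul_ntt(a, b, 257) == negacyclic_mul_schoolbook(a, b, 257))
```

The pointwise product in the transform domain replaces the quadratic double loop of the schoolbook method, which is why the NTT dominates the design of polynomial multiplication kernels.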

HE-CNN Acceleration

A major challenge in efficiently implementing CNN inference over homomorphically encrypted data is the large computational complexity and storage requirements of executing convolution layers. This is because HE converts the inputs into high-degree polynomials, with degrees typically ranging from 1K to 32K. Furthermore, the computations can only be performed in a coefficient-wise manner and require expensive rotation operations to align the coefficients. These challenges have prevented the porting of efficient convolution algorithms such as im2col and frequency-domain convolution, which have been used successfully to develop low-latency CNN implementations targeting FPGAs.
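
The constraint described above can be sketched with a plaintext stand-in for a packed ciphertext. In the sketch below, a Python list plays the role of the packed values, and the only permitted operations are element-wise arithmetic and cyclic rotation. The 1D, wrap-around convolution and the helper names are assumptions for illustration; in an actual HE scheme each call to `rotate` would be an expensive homomorphic operation.

```python
def rotate(slots, k):
    """Cyclic left rotation by k positions (stand-in for an HE rotation)."""
    k %= len(slots)
    return slots[k:] + slots[:k]


def mul_const(slots, c):
    """Element-wise multiplication by a plaintext weight."""
    return [c * s for s in slots]


def add(u, v):
    """Element-wise addition of two packed vectors."""
    return [x + y for x, y in zip(u, v)]


def conv1d_rotate_and_sum(x, w):
    """Compute y[i] = sum_k w[k] * x[(i + k) mod N] using only the ops above.

    Returns the result and the number of rotations that were needed.
    """
    acc = [0] * len(x)
    rotations = 0
    for k, wk in enumerate(w):
        if k == 0:
            shifted = x
        else:
            shifted = rotate(x, k)  # align x[i + k] with output position i
            rotations += 1
        acc = add(acc, mul_const(shifted, wk))
    return acc, rotations


x = [1, 2, 3, 4, 5, 6, 7, 8]
w = [1, 0, -1]
y, n_rot = conv1d_rotate_and_sum(x, w)
print(y, n_rot)
```

Because no operation can read an individual element directly, every kernel tap beyond the first costs one rotation, so the rotation count grows with the kernel size. This is the cost that makes direct ports of plaintext convolution algorithms unattractive.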

We have developed end-to-end implementations of HE-CNN models which achieve high performance. We are designing efficient algorithms and architectures for dense and sparse HE convolutions. We are also developing performance models that capture the noise and execution time of available design choices and enable the selection of optimal parameters for HE-CNN.

HE Framework and Overlay Architecture

Manually programming FPGAs for HE applications is non-trivial because parameter settings and latency requirements differ widely across applications. It is also challenging because HE operations can be mapped onto FPGAs in many different ways. To close this gap, we are developing a framework that generates low-latency FPGA accelerators for all the operations supported by HE, enabling users to utilize FPGA-accelerated HE processing without requiring knowledge of FPGA implementation details. We are also developing an FPGA-based overlay accelerator, compiler, and instruction set architecture, offering a flexible programming abstraction for HE computations.

Recent Publications