As clock frequencies stagnate, improving instructions per cycle (IPC) for single-threaded applications requires dealing with fundamentally difficult constraints such as branch mispredictions and cache misses. Overcoming these constraints, while difficult, promises substantial increases in IPC, and has now become a necessity to support the scale of growing data. Fast processors, TPUs, accelerators, and heterogeneous architectures have made computation fast enough that memory performance has become the bottleneck in many applications. Because these applications are memory bound, the latency of memory accesses must be reduced. Several emerging memory technologies, such as 3D-stacked DRAM and non-volatile memory, attempt to address the memory bottleneck from a hardware perspective, but with tradeoffs among bandwidth, power, latency, and cost.
Rather than redesigning existing algorithms to suit a specific memory technology, we propose to develop a Machine Learning (ML) based approach that automatically learns access patterns, which can then be used to prefetch data. Specifically, LSTM (Long Short-Term Memory) based deep learning has been used successfully in natural language tasks such as part-of-speech tagging, grammar learning, and text prediction. In fact, the "QuickType" feature of the iPhone uses an LSTM to predict the next word while typing. Since memory accesses have an underlying grammar similar to natural language (albeit simpler due to being context free), such models are naturally applicable to learning accesses. However, the application of LSTMs to learning memory accesses has only recently gained the attention of researchers. While this approach has been shown to obtain high precision and recall, training offline and testing online for an individual application does not yield a practical prefetcher, as pointed out by the researchers themselves, and is only the initial step towards building one.
A practical LSTM-based prefetcher implementation requires dealing with certain challenges, which we address: (i) keeping the machine learning model small, to enable fast inference and ensure that implementation is feasible; (ii) training on large traces to obtain a small model that is highly accurate in predicting memory accesses; (iii) ensuring real-time inference; and (iv) retraining the model online and on demand to learn application-specific models, which requires fast learning from a small amount of data. The size (number of parameters) of a typical LSTM model for memory access prediction is dominated by the output layer, a dense layer connecting to tens of thousands of outputs (possible memory addresses). This is the root of all the challenges listed above. We propose a compression technique that reduces the number of parameters by a factor of O(n/log n) without compromising accuracy, thus overcoming these challenges. Note that our objective is not to develop a full-scale prefetcher, but to design a highly accurate and compact LSTM-based access prediction model that demonstrates the utility of highly compressed LSTMs for improving prefetching. A prefetcher built on top of this model, and its hardware implementation, will be explored in future work.
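To make the parameter-count argument concrete, the following is a minimal sketch and not the model proposed here. It assumes that n denotes the number of possible output addresses and that an O(n/log n) reduction could arise from emitting each address as roughly log2(n) binary outputs instead of one output per address; that encoding is an illustrative assumption and is not stated in the text above. The hidden width, input width, and address count are hypothetical values chosen only for the arithmetic.

```python
def lstm_cell_params(input_size: int, hidden_size: int) -> int:
    """Parameters of one LSTM layer in the standard formulation:
    four gates, each with input weights, recurrent weights, and one bias."""
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


def dense_output_params(hidden_size: int, num_addresses: int) -> int:
    """Weights plus biases of a dense layer with one output per address."""
    return hidden_size * num_addresses + num_addresses


def binary_output_params(hidden_size: int, num_addresses: int) -> int:
    """Weights plus biases if each address is emitted as ceil(log2 n) bits.
    ASSUMPTION: binary encoding is used purely for illustration."""
    bits = max(1, (num_addresses - 1).bit_length())
    return hidden_size * bits + bits


# Hypothetical sizes, chosen only to illustrate the scaling.
hidden_size = 128
input_size = 128
num_addresses = 50_000

recurrent = lstm_cell_params(input_size, hidden_size)
dense = dense_output_params(hidden_size, num_addresses)
compact = binary_output_params(hidden_size, num_addresses)

print(f"LSTM layer parameters:          {recurrent}")
print(f"dense output layer parameters:  {dense}")
print(f"binary output layer parameters: {compact}")
print(f"output layer reduction factor:  {dense / compact:.0f}")
```

Under these assumed sizes the dense output layer holds about 6.45 million parameters against roughly 0.13 million in the LSTM layer itself, which illustrates why the output layer dominates. The reduction factor for the output layer equals n divided by the number of output bits, independent of the hidden width.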