Daniel Hesslow (LightOn)

Daniel’s research took a comprehensive approach to exploring the requirements for pushing the boundaries of very large AI models, with a special focus on natural language processing in the era of large language models.

His investigation started with the photonic hardware known as the Optical Processing Unit (OPU, developed by LightOn since 2016) and with strategies for its efficient use, notably in alternative methods for training deep neural networks. Daniel played a significant part in two major contributions: first, the "linearization" of the Optical Processing Unit by digital interferometry, i.e. by combining two intensity-only measurements with "anchor" measurements; second, the adaptation of a training method for deep neural networks called Direct Feedback Alignment, in which the OPU is used in the feedback path, applied in parallel to all layers (a minimal sketch of this training scheme is given below). Combining these two methods, the team obtained experimental results on the optical training of DNNs several orders of magnitude larger than previously reported in the literature.
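As an illustration of the second contribution, here is a minimal numerical sketch of Direct Feedback Alignment for a small multilayer perceptron, written in plain NumPy. The fixed random feedback matrices B1 and B2 stand in for the random projection that the OPU performs optically in the actual system; the network size, data and hyper-parameters are arbitrary placeholders, not those of the published experiments.

import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (placeholder dimensions).
X = rng.standard_normal((256, 32))
Y = rng.standard_normal((256, 4))

# Two-hidden-layer MLP parameters.
W1 = rng.standard_normal((32, 64)) * 0.1; b1 = np.zeros(64)
W2 = rng.standard_normal((64, 64)) * 0.1; b2 = np.zeros(64)
W3 = rng.standard_normal((64, 4)) * 0.1;  b3 = np.zeros(4)

# Fixed random feedback matrices: in hardware, these projections of the output
# error would be computed optically by the OPU, for all layers in parallel.
B1 = rng.standard_normal((4, 64))
B2 = rng.standard_normal((4, 64))

def tanh(x):  return np.tanh(x)
def dtanh(x): return 1.0 - np.tanh(x) ** 2

lr = 1e-2
for step in range(200):
    # Forward pass.
    a1 = X @ W1 + b1; h1 = tanh(a1)
    a2 = h1 @ W2 + b2; h2 = tanh(a2)
    y_hat = h2 @ W3 + b3

    # Output error (mean-squared-error gradient at the output).
    e = (y_hat - Y) / len(X)

    # Direct Feedback Alignment: each hidden layer receives the output error
    # through its own fixed random matrix instead of the transposed weights.
    d2 = (e @ B2) * dtanh(a2)
    d1 = (e @ B1) * dtanh(a1)

    # Parameter updates.
    W3 -= lr * h2.T @ e;  b3 -= lr * e.sum(0)
    W2 -= lr * h1.T @ d2; b2 -= lr * d2.sum(0)
    W1 -= lr * X.T @ d1;  b1 -= lr * d1.sum(0)

Because the feedback projections of the error do not depend on each other, they can all be computed in parallel, which is what makes an optical random-projection co-processor a natural fit for this training rule.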

Daniel then focused his work on Large Language Models. He explored model architecture design and data processing techniques designed to run on high-end GPU hardware. Ultimately, Daniel demonstrated how these factors come together in the training of multiple large models, culminating with Falcon 180B, a model whose capabilities, following pre-training, were exceeded at the time of release only by GPT-4. He is a named co-inventor of US patent US11574178B2.

Daniel also contributed to the development of a novel method for performing linear optical random projections without the need for holography. The method consists of a computationally trivial combination of multiple intensity measurements that mitigates the information loss usually associated with the absolute-square non-linearity imposed by optical intensity measurements. Both experimental and numerical findings demonstrate that the resulting projection matrix consists of real-valued, independent and identically distributed (i.i.d.) Gaussian random entries. The optical setup is simple and robust, as it does not require interference between two beams. The practical applicability of the method was demonstrated by performing dimensionality reduction on high-dimensional data, a common task in randomized numerical linear algebra with relevant applications in machine learning.
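To make the idea concrete, the following is a minimal numerical sketch, assuming a complex i.i.d. Gaussian transmission matrix and a single random "anchor" vector: subtracting the intensities of the input and of the anchor from the intensity of their sum isolates a cross term that is linear in the input, so intensity-only measurements can emulate a linear random projection. The matrix sizes and the verification step are illustrative placeholders; the published protocol combines measurements differently to obtain a matrix with i.i.d. Gaussian entries, whereas this sketch only demonstrates that the combination is linear in the input.

import numpy as np

rng = np.random.default_rng(1)
n, m = 1024, 256                 # input dimension, number of detector pixels (placeholders)

# Complex i.i.d. Gaussian transmission matrix modelling the random scattering medium.
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2.0)

def intensity(v):
    # An optical intensity measurement only records |A v|^2; the phase is lost.
    return np.abs(A @ v) ** 2

x = rng.standard_normal(n)       # real-valued input signal
anchor = rng.standard_normal(n)  # fixed random "anchor" vector

# Three intensity-only measurements ...
I_sum    = intensity(x + anchor)
I_x      = intensity(x)
I_anchor = intensity(anchor)

# ... combined into a quantity that is linear in x, since
# |A(x+a)|^2 - |Ax|^2 - |Aa|^2 = 2 Re[(A x) * conj(A a)] elementwise.
y = 0.5 * (I_sum - I_x - I_anchor)

# The same result written as an explicit linear projection M x,
# with M_ij = Re[ conj((A a)_i) * A_ij ].
M = np.real(np.conj(A @ anchor)[:, None] * A)
print(np.allclose(y, M @ x))     # True: the combination is linear in x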


Key publications by Daniel Hesslow:

[1] Daniel Hesslow et al. (2021). Photonic Co-processors in HPC: Using LightOn OPUs for Randomized Numerical Linear Algebra. The 33rd Hot Chips Symposium; arXiv:2104.14429.

[2] Ruben Ohana, Daniel Hesslow, Daniel Brunner, Sylvain Gigan and Kilian Mueller (2023). Linear Optical Random Projections Without Holography. arXiv:2305.12988.

[3] T. Wang, A. Roberts, D. Hesslow, et al. (2022). What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? International Conference on Machine Learning, 22964-22984. https://proceedings.mlr.press/v162/wang22u/wang22u.pdf.

[4] Daniel Hesslow et al. (2022). RITA: a Study on Scaling Up Generative Protein Sequence Models. arXiv preprint arXiv:2205.05789.

[5] A. Chatelain, A. Djeghri, D. Hesslow and J. Launay (2021). Is the Number of Trainable Parameters All That Actually Matters? I (Still) Can't Believe It's Not Better! Workshop at NeurIPS 2021, 27-32.

[6] D. Hesslow and I. Poli (2021). Contrastive Embeddings for Neural Architectures. arXiv preprint arXiv:2102.04208.