Anas Skalli (UBFC)
Artificial Neural Networks (ANNs) have become a ubiquitous technology; their flexibility allows them to excel in tasks ranging from medical diagnosis to language models. Contrary to classical algorithms, these networks process information in parallel. Photonics in particular shows great promise as a platform for implementing ANNs in terms of scalability, speed, energy efficiency and parallel information processing [1]. In [2, 3], we physically implemented the first fully autonomous photonic neural network (PNN), using spatially multiplexed modes of an injection-locked large-area vertical-cavity surface-emitting laser (LA-VCSEL). All components of the PNN, including learning, are realized in hardware using off-the-shelf, commercially available, low-energy-consumption components, while still achieving >98% accuracy in 6-bit header recognition tasks.
The ANN we built follows the reservoir computing (RC) concept and as such comprises three main parts: the input layer, the reservoir, and the output layer. The experimental scheme is shown in Fig. 1(a). Binary headers displayed on a digital micromirror device (DMDa) are injected onto the LA-VCSEL through a multimode fiber (MMF), which passively implements random input weights Win. The VCSEL transforms the injected information u non-linearly, yielding the perturbed mode profile x. The final part of the ANN is its output layer: the VCSEL's surface is imaged onto DMDb, whose pixels can flip between two positions, one of which reflects light onto the photodetector (DET), giving us Boolean output weights Wout. More than 350 nodes are implemented fully in parallel [4]; the output of the ANN, y, is the optical power detected at DET. Training on DMDb is realized via iterative optimization until the output y approximates the target of a computationally meaningful task such as header recognition. Fig. 1(b) shows performance for a 6-bit header recognition task, with an error of 1.5%. We also studied the impact of different learning strategies and physical parameters of our ANN (injection power and wavelength, bias current, etc.) on its performance for classification tasks, as well as how they impact other dynamical properties such as consistency and dimensionality, measuring a promising consistency of 99%. We also studied the performance of our PNN on the MNIST dataset, with promising initial results.
Figure 1: (left panel) Working principle of the experimental ANN. (right panel) Representative learning curve showing the symbol error rate (SER) for a 6-bit header recognition task.
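To make the learning loop concrete, the following is a minimal digital stand-in (not the experimental code): a response matrix plays the role of the VCSEL mode profile imaged onto DMDb, and a Boolean mask is improved by greedy single-mirror flips, mimicking the iterative optimization of Wout. All names and sizes are illustrative.

```python
import numpy as np

def train_boolean_readout(X, target, n_iters=2000, seed=1):
    """Toy stand-in for training the Boolean output weights Wout.

    X      : node responses (samples x nodes), a digital proxy for the
             VCSEL mode profile imaged onto DMDb.
    target : desired scalar output per sample.
    One random "mirror" is flipped at a time and the flip is kept only
    if the mean-square error decreases, mimicking the hardware loop
    where the error is measured directly at the detector.
    """
    rng = np.random.default_rng(seed)
    n_nodes = X.shape[1]
    mask = rng.integers(0, 2, n_nodes).astype(bool)

    def mse(m):
        y = X[:, m].sum(axis=1)          # detected power: sum of selected nodes
        return np.mean((y - target) ** 2)

    best = mse(mask)
    for _ in range(n_iters):
        i = rng.integers(n_nodes)        # flip a single mirror
        mask[i] = ~mask[i]
        e = mse(mask)
        if e < best:
            best = e                     # keep an improving flip
        else:
            mask[i] = ~mask[i]           # revert otherwise
    return mask, best
```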
We also studied how the performance varies with different physical parameters, namely injection wavelength, injection power, and bias current. Furthermore, these physical parameters were linked to the general computational measures of consistency and dimensionality, and a general method of gauging dimensionality in high-dimensional nonlinear systems subject to noise was introduced. This method could be applied to many systems in the context of neuromorphic computing. This fundamental characterization paves the way for the use of similar photonic devices as building blocks for more complex hardware ANN platforms.
Fig. 3(a) shows how the VCSEL reacts to an external drive laser at different injection wavelengths. At around 918.9 nm, the VCSEL locks to the injection laser and its free-running modes are suppressed by around 10 dB. At this point, the VCSEL's own emission wavelength is shifted to that of the injection laser, a phenomenon called injection locking. Fig. 3(b) shows how injection locking impacts performance: the injection wavelength is swept and the dependence of the NMSE on it is shown for different power ratios (PR) between the VCSEL and the injection. The VCSEL was biased 50% above its threshold and its emission power was 3.6 mW. There is a clear trend showing optimal performance around the resonance wavelength, indicated by the red dotted line. In addition, performance increases with higher PRs, yet it starts degrading for PRs higher than 1.
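For reference, the NMSE in Fig. 3(b) can be read with the standard normalized mean-square error convention in mind (the exact normalization is an assumption here, as it is not spelled out in this report):

$$\mathrm{NMSE} = \frac{1}{T\,\sigma_{\tilde{y}}^{2}} \sum_{t=1}^{T} \left( y_t - \tilde{y}_t \right)^2,$$

where $y_t$ is the PNN output, $\tilde{y}_t$ the target, and $\sigma_{\tilde{y}}^{2}$ the target's variance.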
Figure 3: a) Injection locking of the VCSEL by an external drive laser. b) Performance (NMSE) for different injection wavelengths and powers. c) Total system consistency. d) Dimensionality with the VCSEL ON and OFF.
In addition to these physical parameters, consistency and dimensionality were also studied. Consistency is the ability of a given physical system to respond similarly when subjected to the same drive signal or input information. It is a crucial property when studying dynamical systems, especially when considering them as potential hardware candidates for neural network implementations: a system that is not consistent to some degree cannot learn any task. Crucially, consistency serves as an upper bound on a given system's performance; for instance, a system that is 95% consistent cannot achieve lower than 5% error in a continuous function approximation task. Dimensionality is connected to the computational power of the device: a higher-dimensional system will in principle be able to solve more complicated tasks. Fig. 3(c) shows how the consistency saturates to nearly 100% at a PR of around 0.35. The high consistency value that was measured is promising, as it shows the highly robust nature of the device. Finally, Fig. 3(d) shows how the VCSEL expands the dimensionality of the input data (ON vs OFF).
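As an illustration, one common way to estimate these two quantities from recorded responses is sketched below. Note that [4] introduces its own noise-aware dimensionality measure, so this is only a simplified proxy, and all names are illustrative.

```python
import numpy as np

def consistency(r1, r2):
    """Pearson correlation between two responses r1, r2 of the system to
    the *same* drive signal; values near 1 indicate a consistent system."""
    r1, r2 = r1 - r1.mean(), r2 - r2.mean()
    return float(r1 @ r2 / (np.linalg.norm(r1) * np.linalg.norm(r2)))

def effective_dimensionality(X):
    """Participation ratio of the PCA spectrum of the response matrix X
    (samples x nodes): (sum_i l_i)^2 / sum_i l_i^2, with l_i the
    eigenvalues of the covariance matrix. It counts how many principal
    components carry comparable weight in the response."""
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    lam = np.clip(lam, 0.0, None)   # guard against small numerical negatives
    return float(lam.sum() ** 2 / (lam ** 2).sum())
```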
We then worked in close collaboration with ESR5 on improving the photonic neural network (PNN) based on the LA-VCSEL presented in Fig. 3.
We performed a ceiling analysis on a digitally simulated neural network (NN) of representative size (100 neurons) to identify which parts of the network boost the system's performance the most when optimized for an image classification task. The well-known MNIST dataset was used for this analysis, as it can later be used with the PNN for comparison. The main findings of the ceiling analysis are summarized in Table 1. Restricting the neural network's output weights Wout to positive values only while keeping the input weights random yields the worst performance, at 60% classification accuracy. Allowing Wout to take negative values as well increases performance dramatically, to 87%. Furthermore, training the input weights Win in addition to the output weights yields another significant jump in performance, to 97% accuracy.
Table 1: Ceiling analysis of a simulated 100-neuron network on MNIST.

System | Performance
NN (100 neurons), Wout trained, positive only | 60%
NN (100 neurons), Wout trained, positive and negative | 87%
NN (100 neurons), Wout and Win trained | 97%
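The gap between the first two rows can be reproduced qualitatively in a few lines. The sketch below uses synthetic data in place of MNIST and ordinary vs. non-negative least squares for the readout, so the numbers are only indicative; all names and sizes are illustrative.

```python
import numpy as np
from scipy.optimize import nnls

# Toy version of the ceiling analysis: 100 random-feature neurons on a
# synthetic binary task standing in for MNIST.
rng = np.random.default_rng(0)
n_samples, n_in, n_hidden = 2000, 64, 100
X = rng.standard_normal((n_samples, n_in))
y = (X @ rng.standard_normal(n_in) > 0).astype(float)   # binary labels

W_in = rng.standard_normal((n_in, n_hidden))             # fixed random Win
H = np.tanh(X @ W_in)                                    # nonlinear node states

w_pos, _ = nnls(H, y)                                    # Wout >= 0 only
w_signed, *_ = np.linalg.lstsq(H, y, rcond=None)         # signed Wout

accuracy = lambda w: float((((H @ w) > 0.5) == (y > 0.5)).mean())
print(f"positive-only Wout: {accuracy(w_pos):.1%}")
print(f"signed Wout:        {accuracy(w_signed):.1%}")
```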
These conclusions may seem trivial in the context of conventional machine learning, yet they have profound consequences when it comes to building a setup that leverages a physical system, i.e., the LA-VCSEL, as the central piece of an optical neural network. Firstly, negative weights, while crucial for accuracy, are not trivial to implement optically. Secondly, implementing trainable input weights is a challenge, because one cannot rely on error backpropagation to train them as is done in digitally simulated neural networks. Therefore, a sufficiently efficient model-free or "black-box" optimization algorithm is required. Here, we use an evolutionary strategy that updates the weights based on multiple samplings of the network's classification error within a certain (potentially small) variation of the weights. In particular, we use the Parameter-exploring Policy Gradients (PEPG) algorithm, for the following reasons (a minimal sketch of the update rule follows the list):
- The algorithm is model-free and hence does not need any knowledge of the PNN to train the input weights.
- The PNN, with its high throughput (15 kHz), allows the error to be sampled quickly for different weights.
- The weight update requires only minimal computational effort and scales linearly with the number of parameters.
- The algorithm works even with a restricted resolution of the weights to be trained.
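The sketch below shows one PEPG update with symmetric (mirrored) sampling, following Sehnke et al.; hyperparameters, the reward convention (here, minus the measured classification error), and function names are illustrative, not the exact experimental implementation.

```python
import numpy as np

def pepg_step(mu, sigma, reward_fn, pop_size=8, lr_mu=0.1, lr_sigma=0.05):
    """One PEPG update with symmetric (mirrored) sampling.

    mu, sigma : mean and per-parameter std of the weight distribution.
    reward_fn : maps a weight vector to a scalar reward; on the hardware
                this would be minus the classification error measured at
                the detector for that weight setting.
    """
    eps = np.random.randn(pop_size, mu.size) * sigma      # perturbations
    r_plus = np.array([reward_fn(mu + e) for e in eps])   # mirrored pair +
    r_minus = np.array([reward_fn(mu - e) for e in eps])  # mirrored pair -

    # Mean update: finite-difference gradient estimate from mirrored rewards.
    mu = mu + lr_mu * ((r_plus - r_minus)[:, None] * eps).mean(axis=0)

    # Std update: widen sigma where exploration beat the average reward.
    r_avg = (r_plus + r_minus) / 2.0
    baseline = r_avg.mean()
    grad_sigma = (eps ** 2 - sigma ** 2) / sigma
    sigma = sigma + lr_sigma * ((r_avg - baseline)[:, None] * grad_sigma).mean(axis=0)
    return mu, np.maximum(sigma, 1e-6)                    # keep sigma positive
```

Each call needs 2 × pop_size error measurements on the PNN, which the 15 kHz throughput makes inexpensive, and the update itself is a handful of vector operations, linear in the number of weights.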
With these findings in mind, we designed an improved version of the setup in which all weights of the PNN can be trained, yielding a highly tunable network that can handle real datasets and move beyond the basic proof of concept.
The updated experimental setup is as follows: input images u displayed on a digital micromirror device (DMDa) are passed through a phase mask displayed on a spatial light modulator (SLM), which encodes the trainable input weights Win. The phase-modulated input is injected onto the LA-VCSEL through a multimode fiber (MMF), which passively implements a random linear mixing Wrand. The VCSEL then transforms the injected information non-linearly, yielding the perturbed mode profile x and up to 350 fully parallel neurons. The final part of the PNN is its output layer: the VCSEL's surface is imaged onto DMDb, whose pixels can flip between two positions, one of which reflects light onto the photodetector (DET), giving us Boolean output weights Wout. Here, up to 5x optical magnification of the LA-VCSEL is implemented to increase its imaged size on DMDb and accordingly the number of available Boolean weights. Finally, the output of the PNN, y, is the optical power detected at DET. Negative output weights are achieved by recording the output of the PNN twice and implementing an electronic subtraction. As described above, Wout and Win are trained via the PEPG evolutionary optimization, based on multiple samplings of the error using slightly perturbed weights. Ideally, a second SLM should be used at the output to provide higher resolution, which might increase classification accuracy further. The described experimental scheme is shown in Fig. 4.

Figure 4: Working principle of the improved experimental photonic neural network.
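The double-recording trick amounts to decomposing a signed (here trinary) weight vector into two Boolean DMD masks and subtracting the two detector readings electronically. In the sketch below, `measure_power` is a hypothetical stand-in for the optical measurement.

```python
import numpy as np

def split_trinary(w):
    """Decompose trinary weights w in {-1, 0, +1}^N into two Boolean
    masks such that w = mask_plus - mask_minus."""
    w = np.asarray(w)
    return w > 0, w < 0

def signed_output(measure_power, w):
    """Signed readout from two sequential optical measurements.
    measure_power(mask) -> detector power with the given DMDb mask
    (hypothetical placeholder for the actual hardware call)."""
    mask_plus, mask_minus = split_trinary(w)
    return measure_power(mask_plus) - measure_power(mask_minus)
```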
Figure 5: Preliminary classification performance for the MNIST task using Boolean weights (BOOL) and trinary weights (3val).
Fig. 5 shows preliminary classification performance for the MNIST task using Boolean weights (BOOL) and trinary weights (3val: -1, 0, +1). Trinary weights have a significant positive impact on performance, reaching an average accuracy of 90%. In addition, we conducted a long-term stability analysis over a period of 10 hours, showing little to no degradation in performance or drift. Moreover, the average cross-correlation between different outputs over the 10 hours was 98%.
The VCSEL-based PNN shows promising initial results on the MNIST dataset while achieving classification at a bandwidth of 15 kHz. Our approach, originally developed with partner CSIC, is highly relevant, fully parallel, and scalable both in network size and in depth, as we have a clear avenue for using VCSELs in a deep PNN configuration. Lastly, the inference bandwidth is also highly scalable thanks to the fast VCSEL response time, without any significant increase in power consumption. We will continue this collaboration with the goal of fully exploiting the setup's computational capabilities by training the input weights of the hardware.
References, including key publications by Anas Skalli:
[1] B. Shastri, et al., “Photonics for artificial intelligence and neuromorphic computing,” Nature Photonics 15, 102-114 (2021).
[2] X. Porte, A. Skalli, N. Haghighi, S. Reitzenstein, J. Lott, D. Brunner, “A complete, parallel, and autonomous photonic neural network in a semiconductor multimode laser,” J. Phys. Photonics 3, 024017 (2021).
[3] A. Skalli, et al., “Photonic neuromorphic computing using vertical cavity semiconductor lasers,” Optical Materials Express 12, 2395 (2022).
[4] A. Skalli, et al., “Computational metrics and parameters of an injection-locked large area semiconductor laser for neural network computing,” Optical Materials Express 12, 2793 (2022).