Enhanced approach for artificial neural network-based optical fiber channel modeling: Geometric constellation shaping WDM system as a case study

Abbass, A. M.; Fyath, R. S.

doi:10.22201/icat.24486736e.2024.22.6.2490


Journal of applied research and technology

On-line version ISSN 2448-6736; print version ISSN 1665-6423

J. appl. res. technol vol.22 no.6 Ciudad de México Dec. 2024 Epub Aug 18, 2025

https://doi.org/10.22201/icat.24486736e.2024.22.6.2490

Articles

Enhanced approach for artificial neural network-based optical fiber channel modeling: Geometric constellation shaping WDM system as a case study

A. M. Abbass^a^*
http://orcid.org/0000-0003-3101-6761

R. S. Fyath^b
http://orcid.org/0000-0002-1029-3471

^{^a}Mustansiriyah University, Computer Engineering Department, Baghdad, Iraq

^{^b}Al-Nahrain University, Computer Engineering Department, Baghdad, Iraq

Abstract

Recently, there has been increasing interest in applying machine learning (ML) approaches to enhance the performance of optical communication systems. This paper applies some of these approaches to the design of advanced wavelength-division multiplexed (WDM) coherent optical fiber communication (OFC) systems assisted by constellation shaping. A theoretical design and performance investigation is reported, assuming an end-to-end deep learning (E2EDL) autoencoder (AE)-assisted system configuration. A flexible artificial neural network (ANN)-based optical fiber channel modeling approach, suitable for different multi-span transmission links in OFC systems, is presented. This approach is applied to E2EDL-based geometric constellation shaping WDM systems, and the results reveal that a bi-directional gated recurrent unit (Bi-GRU) neural network (NN) gives the best model, tracking the numerical nonlinear interference noise fiber model with much less computation time (~7%). The work is implemented in Python, using the TensorFlow framework to develop the simulation models.

Keywords: End-to-end deep learning; coherent optical fiber communication; autoencoder; GCS-WDM system; ANN-fiber channel modeling

1. Introduction

Coherent optical fiber communication (OFC) systems are usually used to transmit high data rates over long-haul fiber transmission links (^{Neves et al., 2023}; ^{Escobar-Landero et al., 2023}). At the transmitter side, the input binary data is embedded in the amplitude A, the phase φ, or both, of the optical carrier. This carrier is represented by the electric field e(t) = A cos(2πω0t + φ) emitted from a continuous-wave (CW) laser, where ω0 is the optical carrier (laser) radian frequency. This leads to three types of signal modulation formats, namely amplitude-shift keying (ASK), phase-shift keying (PSK), and quadrature-amplitude modulation (QAM), respectively (^{Yang, 2021}). The QAM signal can be considered as the sum of two ASK signals that use the in-phase (I) and quadrature-phase (Q) components as their own optical carriers (i.e., cos(2πω0t) and sin(2πω0t), respectively) (^{Binh, 2015}). The QAM digital signal element (i.e., symbol) can be expressed as (^{Binh, 2015})

$e_{\mathrm{QAM}}(t) = a_j \cos(2\pi\omega_0 t) + b_k \sin(2\pi\omega_0 t)$ (1)

where $a_j$ and $b_k$ are bipolar discrete electric field amplitudes with j = 1, 2, … and k = 1, 2, … . The combination of the discrete amplitude sets {$a_j$, $b_k$} gives M discrete symbols, each with its own amplitude $A_q = (a_j^2 + b_k^2)^{1/2}$ and phase $\phi_q = \tan^{-1}(b_k/a_j)$. Here, q = 1, 2, …, M, and each of the M symbols carries $\log_2 M$ bits of information. Thus, the M-QAM symbol has 4, 6, and 8 bits when M = 16, 64, and 256, respectively. A good graphical representation of the M-QAM signal is the constellation diagram (i.e., the IQ-plane), which shows all the possible transmitted symbols as a collection of points (^{Haroun, 2023}). Each symbol is represented by a single point whose distance from the origin and angle with respect to the I-axis give the symbol amplitude and phase, respectively. Figures 1 (a) and (b) show the constellation diagrams of conventional 16-QAM and 64-QAM signaling, respectively. Note that in conventional communication systems (including OFC systems), the M-QAM symbols are transmitted with equal probability (= 1/M).
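As an illustration of the symbol definitions above, the following minimal sketch builds a conventional square M-QAM grid and evaluates the amplitude, phase, and bits per symbol of each point. The function name `square_qam` and the unnormalized amplitude levels (±1, ±3, …) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def square_qam(M):
    """Return the M points of a square M-QAM grid as complex numbers a_j + 1j*b_k."""
    side = int(round(np.sqrt(M)))
    if side * side != M:
        raise ValueError("M must be a perfect square (16, 64, 256, ...)")
    # Bipolar amplitude levels: ..., -3, -1, 1, 3, ... (unnormalized, assumed)
    levels = np.arange(-(side - 1), side, 2)
    a, b = np.meshgrid(levels, levels)
    return (a + 1j * b).ravel()

points = square_qam(16)
amplitude = np.abs(points)                    # A_q = (a_j^2 + b_k^2)^(1/2)
phase = np.angle(points)                      # phi_q in radians; atan2 resolves the quadrant
bits_per_symbol = int(np.log2(points.size))   # log2(M) bits per symbol
```

The same function returns the 64 points of Figure 1 (b) when called with `M = 64`.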

Figure 1 Constellation diagrams of (a) 16-QAM signaling and (b) 64-QAM signaling (^{Haroun, 2023}).

Coherent OFC systems usually use coherent detection to recover the data at the receiver side (^{Binh, 2015}). Here, the received signal electric field is mixed with that of a local CW laser whose frequency, phase, and polarization match those of the unmodulated transmitter laser. Figure 2 shows a simplified block diagram of an optical QAM communication system. The input binary data is converted to QAM symbols by the QAM mapper and then used to modulate the transmitter laser field. For long-haul transmission, the fiber link is constructed using a multi-span configuration, with each span consisting of a section of single-mode fiber (SMF) followed by an optical amplifier to compensate for the span loss (^{He et al., 2023}). Thus, the transmission link acts as a quasi-lossless channel. At the receiver side, the detected symbols are converted back to binary data using a QAM demapper. The propagation of the QAM symbols in the fiber is influenced mainly by fiber material dispersion, as a linear effect, and by fiber nonlinearity (^{Liang et al., 2023}). The nonlinear effect increases in WDM systems, where a group of OFC systems uses the same fiber link (^{Deligiannidis et al., 2023}). The nonlinear fiber effects and their interaction with fiber dispersion lead to nonlinear interference (NLI), which plays a key role in determining the bit rate-distance product in WDM-OFC systems. This issue should be addressed carefully in these advanced communication systems.

Figure 2 Simplified block diagram of an optical QAM communication system.

Recently, there has been increasing interest in using constellation shaping (CS) techniques to reduce the effect of NLI in WDM-OFC systems (^{Civelli, 2024}; ^{Xing et al., 2024}). These techniques are classified into three categories, namely geometric constellation shaping (GCS), probabilistic constellation shaping (PCS), and joint GCS/PCS (JGPCS) (^{Liu et al., 2023}). In GCS, the locations of the points in the constellation diagram are rearranged without changing the probability of the symbols (= 1/M) (^{Xing et al., 2024}). The PCS technique changes the probability of the symbols without affecting their power (i.e., their locations in the constellation diagram) (^{Amirabadi et al., 2022}). The JGPCS combines the effects of both GCS and PCS (^{Yao et al., 2023}). In CS-assisted OFC systems, the conventional QAM mapper (demapper) is modified to produce (deal with) a new constellation diagram by encoding (decoding) the binary data according to the CS technique used. Therefore, the modified mapper and demapper are called the encoder and decoder, respectively. The use of deep learning (DL) in the design of these two components has attracted increasing interest in recent years. The two components can be designed jointly using end-to-end deep learning (E2EDL) techniques and are therefore lumped into one virtual component called an autoencoder (AE). The performance of the E2EDL-based GCS-WDM system has been investigated by some research groups assuming five multiplexed channels (^{Jones et al., 2018}; ^{Jones et al., 2019}; ^{Oliari et al., 2021}; ^{Jovanovic et al., 2022}; ^{Jovanovic et al., 2023}), 11 multiplexed channels (^{Gümüs et al., 2020}), and a larger number of channels (^{Abbass & Fyath, 2024}).

The model-driven simulation approach is constructed in a divide-and-conquer manner and consists of a series of model blocks (^{Wang et al., 2020}). Among these blocks are the laser, pulse shaper, modulator, fiber channel, optical amplifiers, filters, and detectors. All these blocks are characterized by rigorous numerical models (^{Wang et al., 2020}). Commercial optical communication software is usually closed-source and expensive (^{Jiang et al., 2022}). Furthermore, the computational complexity of conventional simulations can be very high due to the nested-function construct and the repeated iterative operations (^{Jiang et al., 2022}). For example, the optical fiber channel can be modeled using the split-step Fourier method (SSFM). This method numerically solves the nonlinear Schrödinger equation (NLSE), which describes pulse propagation in the fiber and takes into account both linear and nonlinear fiber effects. In this method, the optical fiber is divided into multiple short segments (steps), and the linear and nonlinear fiber effects are calculated separately for each step (^{Yang et al., 2022}). Using a shorter step offers high modeling accuracy but requires a large computation time. To reduce the computation time, a data-driven model has been proposed to characterize the transmission in OFC systems (^{Neves et al., 2023}). In this model, the fiber channel is replaced by an ANN that is trained with data collected from experimental measurements or from first-step simulation predictions. This modeling approach has been applied successfully to single-channel (^{You et al., 2023}) and multi-channel (^{Yang et al., 2022}) OFC systems. The investigations in these references focus on comparison with the model-driven approach and on selecting a suitable ANN configuration for that purpose. However, no DL-based data-driven optical fiber channel modeling has been reported in the literature for constellation shaping-assisted OFC systems, even for single-channel operation. This issue is addressed in this work, where a flexible ANN-based fiber model is proposed that can be applied to GCS, PCS, and JGPCS WDM-OFC systems.
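To make the cost of the step-by-step SSFM concrete, the sketch below propagates a single-polarization envelope through one span by alternating a linear (dispersion and loss) step in the frequency domain with a nonlinear (Kerr phase) step in the time domain. It is a simplified first-order splitting written for illustration only: the function name `ssfm`, the sign convention, the units, the Gaussian test pulse, and the value β2 ≈ -21 ps²/km are assumptions and are not the authors' implementation.

```python
import numpy as np

def ssfm(a0, dt, length_km, n_steps, beta2, gamma, alpha_db_km):
    """Propagate the complex envelope a0 through one fiber span (scalar NLSE).

    Assumed units: dt in ps, beta2 in ps^2/km, gamma in 1/(W km), |a0|^2 in W.
    """
    dz = length_km / n_steps
    alpha = alpha_db_km * np.log(10.0) / 10.0           # dB/km -> 1/km (power)
    omega = 2.0 * np.pi * np.fft.fftfreq(a0.size, d=dt)  # angular frequency grid, rad/ps
    # Linear operator for one step: dispersion and loss in the frequency domain.
    linear = np.exp((0.5j * beta2 * omega**2 - 0.5 * alpha) * dz)
    a = a0.astype(complex)
    for _ in range(n_steps):
        a = np.fft.ifft(np.fft.fft(a) * linear)             # linear part of the step
        a = a * np.exp(1j * gamma * np.abs(a)**2 * dz)      # nonlinear part (Kerr phase)
    return a

t = (np.arange(4096) - 2048) * 1.0                       # time grid in ps
a_in = np.sqrt(1e-3) * np.exp(-t**2 / (2.0 * 25.0**2))   # Gaussian pulse, 1 mW peak (assumed)
a_out = ssfm(a_in, dt=1.0, length_km=100.0, n_steps=200,
             beta2=-21.0, gamma=1.3, alpha_db_km=0.2)
energy_ratio = np.sum(np.abs(a_out)**2) / np.sum(np.abs(a_in)**2)
```

Each step costs two FFTs, and accuracy improves as `n_steps` grows, which is the accuracy versus computation-time trade-off that motivates replacing the numerical channel with a trained ANN.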

2. Related works

^{Wang et al. (2020)} proposed a data-driven modeling approach utilizing bidirectional (Bi) long short-term memory (LSTM) NNs to mimic the fiber channel. Both on-off keying and pulse amplitude modulation-4 signals were studied for transmission. The Bi-LSTM-based method demonstrated strong performance and produced results comparable to the conventional SSFM-based model. ^{Jiang et al. (2022)} investigated a data-driven approach utilizing a deep neural network (DNN) to predict the nonlinear fiber channel in OFC systems. The DNN method effectively represents the transfer function of the fiber channel. ^{Yang et al. (2022)} suggested a hybrid model-data-driven approach for rapid and precise waveform modeling of long-distance multi-channel optical fiber transmission. It utilizes a linear-nonlinear feature decoupling distributed waveform modeling technique. The conventional approach used for modeling waveforms in optical fiber communication systems is the SSFM. ^{You et al. (2023)} suggested a method for modeling optical fibers with a low-complexity LSTM (C-LSTM), and the computational complexity of the C-LSTM was determined for comparison with modeling techniques based on conditional generative adversarial networks and the SSFM.

ANN-based end-to-end deep learning techniques have been investigated in the literature. Table 1 presents a comparison between some of these works and the one reported in this paper.

Table 1 Comparison with related works.

Ref.	ANN	N_ch	Transmission Distance (km)	Fiber Model	Constellation Shaping (CS)
(Wang et al., 2020)	Bi-LSTM, Bi-RNN, and BP-DNN	1	10 to 80	SSFM	-
(Jiang et al., 2022)	Bi-LSTM, DNN	1	80-240	SSFM	-
(Yang et al., 2022)	Bi-LSTM, DNN	5-41	80-1040	SSFM	-
(You et al., 2023)	C-LSTM, conditional generative adversarial network (CGAN)	1	200-1000	SSFM	-
This Work	Bi-GRU, CNN	32, 64	1-2000	NLIN	CS

Based on the previous survey, no work reported in the literature uses DL-based data-driven optical fiber channel modeling for CS-assisted WDM systems to achieve high-capacity data transmission. To address this gap, this work presents a flexible ANN-based fiber modeling approach that is applicable to these systems.

3. General algorithm of multi-span optical fiber modeling-based artificial neural networks

Figures 3 (a)-(c) explain the block diagrams of the main steps to design general multi-span optical fiber model-based ANNs, and the steps are stated below:

Step I: Data collection

Simulate the OFC system assuming a single-span transmission link. Choose a specific fiber channel model in the simulation (such as NLIN or SSFM) and record both the input and output data of the fiber (x(t) and y(t)).

Step II: ANN training

Choose an ANN configuration and train it to model a single-span transmission link. Train this network, ANN-single span (ANN-ss), using the data collected in Step I.

Step III: Construction of the multi-span link modeling

The ANN model of the multi-span link (ANN-ms) is constructed by cascading Nsp copies of ANN-ss, where Nsp is the number of spans.
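Step III can be sketched as plain function composition: the same single-span model is applied Nsp times in sequence. In the sketch below, `toy_single_span` is a hypothetical placeholder (a fixed phase rotation) standing in for a trained ANN-ss; the helper name `cascade` is also an assumption made for this example.

```python
import numpy as np

def cascade(single_span_model, n_spans):
    """Build a multi-span model by applying one single-span model n_spans times."""
    def multi_span_model(x):
        y = x
        for _ in range(n_spans):
            y = single_span_model(y)
        return y
    return multi_span_model

def toy_single_span(x):
    # Placeholder for a trained ANN-ss: a fixed 0.01 rad phase rotation per span.
    return x * np.exp(1j * 0.01)

ann_ms = cascade(toy_single_span, n_spans=20)
y = ann_ms(np.array([1 + 1j, -1 + 1j]))
```

Because only the single-span model is trained, changing the link length amounts to changing `n_spans`, with no retraining in this construction.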

Figure 3 Block diagrams of general multi-span optical fiber modeling-based ANN. (a) data collection, (b) ANN training, and (c) construction of the multi-span link modeling. SMF: Single-mode fiber, OA: Optical amplifier.

4. Optical fiber communication channel model using ANNs

The configuration of the communication system and the alternative approach for modeling the OFC using an ANN are illustrated in Figure 4. In this figure, the AE simulation platform comprises two NNs positioned in the encoder and decoder, with a fiber channel model connecting them.

Figure 4 AE simulation platform with optical fiber modeling of ANN-ss.

The modulated carrier x(t) is transmitted across the OFC channel to produce an output y(t).

$y(t) = f_{\mathrm{NLIN}}(x(t))$ (2)

The NLIN channel model, fNLIN, describes the impact of nonlinear interference on fiber communication (^{You et al., 2023}). To account for the nonlinear effects that degrade the transmitted signal, this model considers the launch power per channel as well as the moments of the constellation. The NLIN model simplifies these nonlinear effects into additive white Gaussian noise (AWGN), with the variance controlled by the fiber communication channel parameters. As a result, the channel impairments are governed by the amplified spontaneous emission (ASE) noise, which is dictated by the amplifier noise figure Fn, the average launch power per channel PL (referred to as the launch power), and the high-order moments of the constellation (^{You et al., 2023}).

$\mu_4 = \dfrac{E[|X|^4]}{\left(E[|X|^2]\right)^2}$ and $\mu_6 = \dfrac{E[|X|^6]}{\left(E[|X|^2]\right)^3}$ (3)

The noise variance can be calculated as follows

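$\sigma_n^2 = \sigma_{\mathrm{ASE}}^2(F_n) + \sigma_{\mathrm{NLIN}}^2$ (4)

where $\sigma_{\mathrm{ASE}}^2(F_n)$ is the ASE noise variance, and $\sigma_{\mathrm{NLIN}}^2$ is the nonlinear interference variance, which is a function of $P_L$, $\mu_4$, and $\mu_6$.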
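The constellation moments that enter the NLIN variance are the normalized fourth- and sixth-order moments of Eq. (3), μ4 = E[|X|^4]/(E[|X|^2])^2 and μ6 = E[|X|^6]/(E[|X|^2])^3. The sketch below evaluates them for an arbitrary set of points; the function name `constellation_moments` and the optional `probs` argument (uniform by default, as in GCS) are assumptions introduced for this example.

```python
import numpy as np

def constellation_moments(points, probs=None):
    """Return (mu4, mu6) of a constellation, normalized by its average power."""
    points = np.asarray(points, dtype=complex)
    if probs is None:
        # Equiprobable symbols (probability 1/M), as in conventional QAM and GCS.
        probs = np.full(points.size, 1.0 / points.size)
    power = np.sum(probs * np.abs(points)**2)           # E[|X|^2]
    mu4 = np.sum(probs * np.abs(points)**4) / power**2  # E[|X|^4] / E[|X|^2]^2
    mu6 = np.sum(probs * np.abs(points)**6) / power**3  # E[|X|^6] / E[|X|^2]^3
    return mu4, mu6

levels = np.array([-3, -1, 1, 3])
qam16 = (levels[:, None] + 1j * levels[None, :]).ravel()
mu4, mu6 = constellation_moments(qam16)   # 1.32 and 1.96 for square 16-QAM
```

Both moments are invariant to scaling of the constellation, so they depend only on its shape, which is the quantity a geometric shaping encoder can change.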

The encoder and decoder are both represented by dense layers, denoted as NN_en(w_en) for the encoder and NN_de(w_de) for the decoder. The variables w_en and w_de represent adjustable weights, including biases, that can be trained to enhance the performance of the system (^{Jovanovic et al., 2021}). The transmitted and received symbols can be represented by tx = NN_en(s, w_en) and rx = NN_de(ŝ, w_de), respectively. The role of the encoder is to convert the input messages into transmitted symbols, aiming to minimize the effects of channel distortions. The decoder produces output signals usually expressed as the posterior probabilities of the transmitted messages. Using these probabilities, the receiver can recover the original input messages. When considering an AE for this system, the input of the encoder is usually a probability vector in the form of a one-hot vector, which identifies the symbol being transmitted. Each one-hot vector is represented as s ∈ S = {e_i | i = 1, ..., M}, where M denotes the modulation order and e_i is a binary vector with all elements set to zero except for a single '1' at position i, denoting the symbol's index (Zhang et al., 2022; ^{Srinivasan et al., 2023}). The encoder of the AE model optimizes the positions of the constellation points, while the decoder learns the decision boundaries of the distorted symbols (^{Jovanovic et al., 2022}).

The parameters for training the AE in this work are provided in Table 2. A total of 250 epochs were used. Glorot initialization is employed to initialize the weights (^{Rex et al., 2022}). During each epoch of training, a new set of samples is created. These samples consist of N = 128 × M one-hot encoded vectors that are evenly distributed. The vectors are then separated into batches of size B = 16 × M, so the number of batches, obtained by dividing the total sample size by the batch size, is N/B = 8. The learning rate is tuned to 0.001. For classification, a softmax layer is employed at the decoder. The purpose of this layer is to convert the decoder's output into a probability vector, ensuring that the sum of its elements is equal to one (^{Cardarilli et al., 2021}).

Table 2 Parameters of the deep learning network.

Deep learning parameters	Encoder	Decoder
Number of input nodes	M	2
Number of output nodes	2	M
Number of hidden layers	4	4
Number of nodes per hidden layer	16	16
Activation function in the hidden layer	ReLU	ReLU
Activation function in the output layer	ReLU	Softmax
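The layer sizes in Table 2 can be illustrated with a forward pass written in plain NumPy. This is only a sketch under stated assumptions: the paper uses TensorFlow, whereas the helpers below (`glorot_dense`, `build`, `forward`) are hypothetical names, the channel between encoder and decoder is omitted, no training or power normalization is performed, and Glorot uniform is assumed as the variant of Glorot initialization.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 64                        # modulation order (64-QAM case)
HIDDEN = [16, 16, 16, 16]     # four hidden layers of 16 nodes (Table 2)

def glorot_dense(fan_in, fan_out):
    """Weights drawn with Glorot uniform initialization, zero biases (assumed variant)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, (fan_in, fan_out)), np.zeros(fan_out)

def build(sizes):
    return [glorot_dense(i, o) for i, o in zip(sizes[:-1], sizes[1:])]

def relu(v):
    return np.maximum(v, 0.0)

def softmax(v):
    e = np.exp(v - v.max(axis=1, keepdims=True))   # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def forward(layers, x, out_activation):
    for w, b in layers[:-1]:
        x = relu(x @ w + b)                        # ReLU hidden layers
    w, b = layers[-1]
    return out_activation(x @ w + b)

encoder = build([M] + HIDDEN + [2])                # M inputs -> 2 outputs (I and Q)
decoder = build([2] + HIDDEN + [M])                # 2 inputs -> M outputs

N, B = 128 * M, 16 * M                             # samples per epoch and batch size
n_batches = N // B                                 # 8 batches per epoch
labels = rng.integers(0, M, size=B)
s = np.eye(M)[labels]                              # one batch of one-hot vectors
tx = forward(encoder, s, relu)                     # output activation as listed in Table 2
posteriors = forward(decoder, tx, softmax)         # channel omitted in this sketch
```

In the full system, `tx` would pass through the fiber channel model before reaching the decoder, and both weight sets would be updated jointly by back-propagation.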

The AE technique is founded on the principle of E2EDL, which aims for joint optimization of the components of the transmitter and receiver within a single process. Nevertheless, a significant limitation that obstructs the practical application is the requirement of a differentiable channel model, namely the knowledge of the gradient of the instantaneous channel transfer function. If the channel lacks a differentiable model, the gradients cannot be calculated during back-propagation to alter the network's parameters during training.

DL has the capability to approximate any function, and it can give an effective solution for linear and nonlinear problems. DL offers an innovative framework for reevaluating the optical communication modeling problem. DL models approximate the model functions by mapping independent variables to dependent variables that correspond to the input and output data (^{Wang et al., 2020}). Therefore, ANNs are used here to represent the segment located between the encoder and the decoder and to accelerate the training process of the E2EDL system. First, the AE is trained on a single span with a span length of 100 km, and the outputs of the encoder and the inputs of the decoder are stored; these values represent the transmitted and received symbols, respectively. In this situation, the communication fiber channel consists of a conventional SMF, and the nonlinear interference noise (NLIN) fiber model is used to emulate the OFC channel. This model is built upon an improved Gaussian noise model. It describes the nonlinear interference as an additive Gaussian noise process and assesses its variance and spectrum (^{Dar et al., 2013}; ^{Dar et al., 2014}). An erbium-doped fiber amplifier (EDFA) is used at the end of the span to compensate for the signal loss. The amplifier has a noise figure of 5 dB. This AE is designed for a WDM system to increase the data transmission capacity, and Table 3 lists the parameter values of the WDM system used in the AE platform.

Table 3 Parameter values of the WDM system used in the AE platform.

Modulation format	DP 64-QAM
Number of WDM channels (N_ch)	32, 64
Symbol rate (R_s)	40 Gbaud
Central channel frequency (f_c)	193.41 THz
Frequency channel spacing (Δf)	50 GHz
Number of link spans (N_sp)	20
Span length (L)	100 km
Fiber nonlinear coefficient (γ)	1.3 (W km)^-1
Fiber group-velocity dispersion (D)	16.5 ps/(nm km)
Fiber dispersion slope (S ≡ dD/dλ)	0.08 ps/(nm² km)
Fiber attenuation (𝛼)	0.2 dB/km
Optical amplifier gain (𝐺)	20 dB
Optical amplifier noise figure	5 dB
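A short arithmetic check of the Table 3 values shows why each span can be treated as quasi-lossless and how far the full link extends. The variable names are assumptions for this example, and the occupied bandwidth is approximated as the channel count times the channel spacing.

```python
alpha_db_per_km = 0.2        # fiber attenuation
span_length_km = 100         # span length
amp_gain_db = 20             # optical amplifier gain
n_spans = 20                 # number of link spans
channel_spacing_ghz = 50     # frequency channel spacing

span_loss_db = alpha_db_per_km * span_length_km   # loss accumulated over one span
net_gain_db = amp_gain_db - span_loss_db          # residual gain per span after amplification
link_length_km = n_spans * span_length_km         # total transmission distance
# Approximate occupied WDM bandwidth for the two channel counts studied.
wdm_bandwidth_ghz = {n_ch: n_ch * channel_spacing_ghz for n_ch in (32, 64)}
```

The amplifier gain equals the 20 dB span loss, so the signal power is restored at the end of every span, and 20 spans give the 2000 km maximum distance listed in Table 1.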

After the AE training procedure is finished on a single span, the encoder output and decoder input data are saved. Afterwards, the stored data are retrieved and used as inputs and labels to train the Bi-GRU-NN (^{Liu et al., 2023}) and the CNN (^{Jiang et al., 2023}) separately. This methodology yields an efficient ANN model specifically tailored to a single span, which is referred to as the ANN-ss model. The trained ANN-ss model is stored and then retrieved to substitute the conventional OFC (NLIN) model for a single span. The multi-span model (ANN-ms) is obtained by cascading copies of the ANN-ss model, and it replaces the optical fiber spans in the communication system, as shown in Figure 5. Cascading the ANN-ss model in this way makes it possible to simulate long-distance optical transmission systems flexibly for different numbers of WDM channels.

Figure 5 Optical Fiber modeling for ANN-multi-span link.

The BER is a metric that estimates the probability of error as the number of erroneous bits per transmitted bit (^{You et al., 2023}). The BER of the M-order modulation format is determined using (^{You et al., 2023})

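$\mathrm{BER} = \dfrac{2}{m}\left(1 - \dfrac{1}{\sqrt{M}}\right)\operatorname{erfc}\left(\sqrt{\dfrac{3m\,\mathrm{SNR}}{2(M-1)}}\right)$ (5)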

where M is the number of discrete symbols involved in the modulation (i.e. modulation order), m is the number of bits per transmitted symbol (m =log2M ), and erfc denotes the complementary error function.
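Eq. (5) can be evaluated directly. The sketch below assumes that Eq. (5) is the standard approximation for square M-QAM, BER = (2/m)(1 - 1/sqrt(M)) erfc(sqrt(3 m SNR / (2(M - 1)))), with SNR given as a linear ratio rather than in dB; the function name `qam_ber` is an assumption for this example.

```python
from math import erfc, log2, sqrt

def qam_ber(snr, M):
    """Approximate BER of square M-QAM for a linear (not dB) SNR value."""
    m = log2(M)                                   # bits per transmitted symbol
    return (2.0 / m) * (1.0 - 1.0 / sqrt(M)) * erfc(sqrt(3.0 * m * snr / (2.0 * (M - 1))))

snr_db = 20.0
ber_64qam = qam_ber(10 ** (snr_db / 10), 64)      # convert dB to a linear ratio first
```

For M = 4 the expression reduces to 0.5 erfc(sqrt(SNR)), the familiar QPSK result, which is a quick sanity check on the reconstruction.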

5. Architecture perspectives: Artificial neural networks structures

This section describes the architecture of the proposed Bi-GRU-NN and CNN models, which are used for modeling the OFC channel in the AE-based GCS-WDM system for long-haul transmission distances.

5.1. Bi-directional gated recurrent neural networks

This subsection explains the structure of the GRU and of the proposed Bi-GRU-NN model, which is employed for optical fiber prediction. Recurrent neural networks (RNNs), which take sequence correlation into account, can typically reconstruct the channel crosstalk in most situations (^{You et al., 2023}). In addition, RNNs are frequently used to identify relationships in sequential data with temporal dependencies, which makes them particularly suitable for channel prediction. Among the several types of RNNs, the LSTM model is especially proficient at mitigating the vanishing-gradient and exploding-gradient problems that are common in regular RNNs. The GRU is a simpler variant of the LSTM that is also robust to gradient explosion and uses gated cells to regulate the flow of information within the network, making its implementation easier than that of the LSTM (^{Hu et al., 2023}).

Each GRU cell consists of two gates: an update gate and a reset gate. The update gate regulates how much information is carried over to the next time step, while the reset gate controls how much past information is discarded. These two gates together decide the output of the hidden state (^{Yin et al., 2021}). The structure of the GRU unit is shown in Figure 6. The GRU unit computes its output from the current input tx_t and the previous state h_{t-1}, taking into account the combined effect of these gates. The internal gate outputs of the GRU unit are summarized below (^{Liu et al., 2023}).

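$r_t = \sigma\left(W_r\,[h_{t-1},\, tx_t] + b_r\right)$

$z_t = \sigma\left(W_z\,[h_{t-1},\, tx_t] + b_z\right)$

$\tilde{h}_t = \tanh\left(W_h\,[r_t \odot h_{t-1},\, tx_t] + b_h\right)$

$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$ (6)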

where W_r, W_z, and W_h denote the weight matrices for the reset gate, the update gate, and the candidate memory computation, respectively, and b_r, b_z, and b_h are the corresponding bias vectors. The sigmoid function σ is used for both the reset and update gates. For the memory computation, tanh denotes the hyperbolic tangent activation function, and ⨀ denotes the Hadamard (element-wise) product.
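A single GRU time step following Eq. (6) can be written in a few lines of NumPy. The sketch is illustrative only: the function name `gru_step`, the parameter dictionary layout, the random weights, and the two-feature input (for example, the I and Q components of a symbol) are assumptions, not the trained network from the paper.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(h_prev, x_t, params):
    """One GRU update: reset gate, update gate, candidate state, new hidden state."""
    concat = np.concatenate([h_prev, x_t])                    # [h_{t-1}, tx_t]
    r = sigmoid(params["Wr"] @ concat + params["br"])         # reset gate r_t
    z = sigmoid(params["Wz"] @ concat + params["bz"])         # update gate z_t
    gated = np.concatenate([r * h_prev, x_t])                 # [r_t ⊙ h_{t-1}, tx_t]
    h_cand = np.tanh(params["Wh"] @ gated + params["bh"])     # candidate state
    return (1.0 - z) * h_prev + z * h_cand                    # new hidden state h_t

hidden, n_in = 64, 2            # 64 GRU units; two input features assumed
rng = np.random.default_rng(0)
params = {name: rng.normal(0.0, 0.1, (hidden, hidden + n_in)) for name in ("Wr", "Wz", "Wh")}
params.update({name: np.zeros(hidden) for name in ("br", "bz", "bh")})

h = np.zeros(hidden)
for x_t in rng.normal(size=(10, n_in)):    # run over a 10-step input sequence
    h = gru_step(h, x_t, params)
```

A bidirectional layer would run a second cell with its own parameters over the reversed sequence and concatenate the two hidden states at each time step.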

Figure 6 Diagram depicting the structural components of a GRU memory unit, adapted from (^{Liu et al., 2023}).

This work uses two Bi-GRU-NN layers. Each Bi-GRU-NN layer consists of 64 GRU units that process the input sequence in the forward direction and another 64 GRU units that process it in the backward direction. The bidirectional GRU helps mitigate the error propagation that results from unidirectional prediction. Furthermore, the bidirectional approach improves feature extraction by examining the correlation between neighboring data points in both directions. The output layer is a fully connected layer that employs a linear activation function to compute the weighted sum of the hidden layer outputs. The structure of the Bi-GRU-NN is illustrated in Figure 7.

Figure 7 Architecture of the Bi-GRU-NN that is used for optical fiber modeling.

5.2. Convolutional neural networks

In the field of optical communication, CNNs are employed for various reasons. They are utilized for different tasks such as classification and serve as effective equalizers, exhibiting excellent bit error rate (BER) performance and possessing robust equalization capabilities (^{Musumeci et al., 2019}). In addition, CNN is used for modeling optical fiber communication which yields exceptional predictive accuracy (^{Jiang et al., 2023}). Therefore, this work uses CNN for modeling the optical fiber. This subsection provides a detailed explanation of the design of the CNN model, which is employed for optical fiber prediction in the AE-based GCS-WDM system.

Figure 8 illustrates the architecture of the CNN model used in this work. The model consists of two one-dimensional (1D) convolutional layers, to which the transmitted symbols tx are applied, with no max-pooling layer, followed by a flatten layer. The flatten layer converts the output data into a flattened vector, guaranteeing compatibility with the succeeding fully connected layer (FCL). The sequential structure of this model allows for efficient extraction of features using convolutional processes. These features are then transformed into a one-dimensional vector representation, which is suitable for the processing requirements of the FCL. The FCL is positioned at the end of the CNN architecture and acts as the output layer. Its input is the vector obtained by flattening the feature map of the preceding convolutional layer (^{Liu & Zhao, 2023}). The substantial number of trainable parameters in the FCL is what allows it to represent intricate nonlinear discriminant functions in the feature space into which the input data are transformed (^{Basha et al., 2020}).
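The data flow described above (two 1D convolutional layers, no pooling, a flatten step, and a fully connected output layer) is sketched below in NumPy. The paper does not state the kernel size, number of filters, input window length, or hidden activation, so the values used here (kernel size 3, 8 filters, a 16-symbol window, ReLU) and the names `conv1d` and `cnn_forward` are purely hypothetical.

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1D convolution (cross-correlation). x: (L, C_in), w: (K, C_in, C_out)."""
    # Sliding windows over the sequence axis, shape (L-K+1, C_in, K).
    windows = np.lib.stride_tricks.sliding_window_view(x, w.shape[0], axis=0)
    return np.einsum("lck,kco->lo", windows, w) + b

def cnn_forward(x, p):
    h = np.maximum(conv1d(x, p["w1"], p["b1"]), 0.0)   # first 1D conv layer (ReLU assumed)
    h = np.maximum(conv1d(h, p["w2"], p["b2"]), 0.0)   # second 1D conv layer, no pooling
    flat = h.reshape(-1)                               # flatten layer
    return p["w_out"] @ flat + p["b_out"]              # fully connected output layer

rng = np.random.default_rng(0)
L, C_IN, K, F = 16, 2, 3, 8      # hypothetical window length, I/Q inputs, kernel size, filters
p = {
    "w1": rng.normal(0.0, 0.1, (K, C_IN, F)), "b1": np.zeros(F),
    "w2": rng.normal(0.0, 0.1, (K, F, F)),    "b2": np.zeros(F),
    "w_out": rng.normal(0.0, 0.1, (2, (L - 2 * (K - 1)) * F)), "b_out": np.zeros(2),
}
y = cnn_forward(rng.normal(size=(L, C_IN)), p)         # predicted I/Q pair for the window
```

Without pooling, each valid convolution shortens the sequence by only K - 1 samples, so the flattened vector keeps most of the temporal resolution of the input window.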

Figure 8 Architecture of the CNN that is used for optical fiber modeling.

5.3. Comparative analysis: Bi-directional gated recurrent neural networks vs. convolutional neural networks

This section presents a performance comparison of the AE-based GCS-WDM system using the Bi-GRU-NN and CNN modeling architectures. Both models are trained for a total of 150 epochs. The weights are initialized using the Glorot initialization approach, which provides an efficient initialization of the network weights and facilitates learning during training (^{Rex et al., 2022}). The batch size equals 1024 for each model. For regression, the mean squared error (MSE) is a reliable metric for assessing the quality of an estimator, since it takes into account both the variance and the bias of the estimator. Therefore, the MSE is chosen as the performance indicator (^{Wang et al., 2020}). The MSE in this work is the mean of the squared amplitude errors, i.e., the average of the squared differences between the amplitude values of the waveforms produced by the ANN-ss and those generated by the NLIN model. The normalized MSE is used to statistically assess the similarity between the two simulation approaches. Because the optical communication system is simulated at various optical launch powers, the absolute MSE may grow with higher power levels. Therefore, the normalized MSE is preferred over the absolute MSE. The normalized MSE is defined as follows (^{Jiang et al., 2022})

$\mathrm{MSE}_{\mathrm{norm}} = \dfrac{\sum_{i}^{m}\left(\bar{Y}_i - Y_i\right)^2}{\sum_{i}^{m}\bar{Y}_i^{\,2}}$ (7)

where m denotes the sample size, $\bar{Y}$ represents the output label signal (i.e., rx), and $Y$ is the output signal generated by the ANN-ss.
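The normalized MSE of Eq. (7), the sum of squared differences between label and prediction divided by the sum of squared label values, is straightforward to compute. The function name `normalized_mse` and the toy values are assumptions for this example.

```python
import numpy as np

def normalized_mse(label, prediction):
    """Squared error between label and prediction, normalized by the label energy."""
    label = np.asarray(label, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    return np.sum((label - prediction) ** 2) / np.sum(label ** 2)

value = normalized_mse([1.0, 2.0], [1.0, 1.0])   # (0 + 1) / (1 + 4) = 0.2
```

Scaling both signals by the same factor leaves the result unchanged, which is why the metric remains comparable across launch powers.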

Figures 9 (a) and (b) compare the normalized MSEs of the ANN-ss versus epoch number for the Bi-GRU-NN and the CNN, assuming DP 64-QAM, a baud rate R_s = 40 Gbaud, a launch power P_L = -2 dBm, and a number of channels N_ch of (a) 32 and (b) 64. The normalized MSEs of both ANN models reach low levels on the order of 10^-3. More precisely, when N_ch is equal to 32 and 64, the Bi-GRU-NN model exhibits normalized MSEs of 3.64x10^-3 and 3.90x10^-3, respectively. By comparison, the normalized MSEs of the CNN for the same channel counts are 3.81x10^-3 and 4.51x10^-3. As a result, the Bi-GRU-NN-based ANN-ss shows smaller losses than the CNN-based one.

Figure 9 Variation of the normalized MSE of the ANN-ss with epoch numbers for Bi-GRU-NN and CNN, DP 64-QAM, respectively. N_ch (a) = 32, (b) = 64, and P_L = -2 dBm.

Figures 10 (a) and (b) compare the performance of the AE-based GCS-WDM system using the NLIN model and the ANN-ms models based on Bi-GRU-NN and CNN, in terms of BER as a function of the number of link spans, for N_ch = 32 and 64, respectively, with R_s = 40 Gbaud and P_L = -2 dBm. It is evident from these figures that the BERs of the three systems are comparable at a small number of spans and remain below the BER threshold (BER_th) for all numbers of spans considered. The developed AE-based GCS-WDM system gives a lower BER when Bi-GRU-NN is used to model the multi-span fiber link than when CNN is used, for both 32 and 64 channels. On the other hand, modeling the multi-span fiber link with CNN yields a BER closer to that obtained with the NLIN fiber link model. Tables 4 (a) and (b) compare the AE's performance when trained using the same system parameters given above, listing the BER for various values of N_sp. The results demonstrate that the AE-based GCS-WDM system using the Bi-GRU-NN model gives a lower BER across the various N_sp values. The simulation results reveal that the data-driven (ANN) model reduces computation time by approximately 7% compared with the numerical NLIN model, and this time reduction is almost independent of the ANN configuration used and the number of spans.

Figure 10 Variation of the BER with the number of spans for the AE-based GCS-WDM system with DP 64-QAM, using the NLIN model and the ANN-ms models of Bi-GRU-NN and CNN. N_ch (a) = 32, (b) = 64, and P_L = -2 dBm.

Table 4 Comparison of AE-based GCS-WDM system performance using the NLIN model and the ANN-ms models of Bi-GRU-NN and CNN for various numbers of spans in terms of BER, assuming DP 64-QAM and P_L = -2 dBm. (a) N_ch = 32 (b) N_ch = 64.

(a) N_ch = 32
Channel Model	Bit Error Rate (BER)
	Number of Spans N_sp
	1	5	10	15	20
Fiber (NLIN)	7.23 x 10^-7	3.32 x 10^-4	1.24 x 10^-3	2.19 x 10^-3	3.31 x 10^-3
Bi-GRU-NN	6.01 x 10^-7	2.15 x 10^-4	9.80 x 10^-4	1.72 x 10^-3	2.36 x 10^-3
CNN	6.53 x 10^-7	2.53 x 10^-4	1.09 x 10^-3	1.94 x 10^-3	2.96 x 10^-3

(b) N_ch = 64
Channel Model	Bit Error Rate (BER)
	Number of Spans N_sp
	1	5	10	15	20
Fiber (NLIN)	8.03 x 10^-7	3.34 x 10^-4	1.26 x 10^-3	2.22 x 10^-3	3.35 x 10^-3
Bi-GRU-NN	7.24 x 10^-7	2.56 x 10^-4	9.85 x 10^-4	1.65 x 10^-3	2.52 x 10^-3
CNN	7.72 x 10^-7	2.94 x 10^-4	1.14 x 10^-2	1.95 x 10^-3	3.11 x 10^-3

Figures 11 (a) and (b) display the constellation diagrams learned with the NLIN model and with the ANN-ms models of Bi-GRU-NN and CNN for N_sp = 10 and 20, with N_ch (a) = 32, (b) = 64, R_s = 40 Gbaud, and P_L = -2 dBm. These figures show that the distribution of constellation points at 10 spans is nearly identical for all three models at both N_ch = 32 and N_ch = 64, and that the points are arranged in regular rings with a uniform distribution, which indicates a low BER. At 20 spans, where the BER is higher, the locations of the inner symbols change slightly while the BER remains below BER_th, but the outer rings remain uniform. The best arrangement of the learned AE constellation is obtained with the Bi-GRU-NN model, which gives the lowest BER. Furthermore, it is observed that the constellation has been learned to tolerate NLI noise.

Figure 11 Learned constellation diagram at different numbers of spans for AE-based GCS- WDM system, assuming N _ch (a) = 32, (b) = 64, N _sp = 10, and = 20, R _s = 40 Gbaud, and P _L = -2 dBm for DP 64-QAM.

6. Conclusion

A versatile, low-computation ANN-based model has been developed for the optical fiber channel in WDM systems. The model has been applied successfully to a case study of E2EDL-based GCS-WDM systems with multi-span transmission links. The simulation results reveal that the data-driven (ANN) model reduces computation time by approximately 7% compared with the numerical NLIN model, and this time reduction is almost independent of the ANN configuration used. Further, the AE-based GCS-WDM system attains a lower BER when the multi-span fiber link is modeled with Bi-GRU-NN, for both 32 and 64 channels, whereas modeling the link with CNN yields a BER closer to that of the NLIN fiber model. The best arrangement of the learned constellation is obtained with the Bi-GRU-NN model, which gives the lowest BER. Furthermore, the learned constellation is observed to tolerate NLI noise.

Acknowledgements

The authors express their gratitude to the College of Engineering, Al-Nahrain University, for granting them access to the resources needed to carry out this work. Mrs. Ayam expresses her gratitude to the College of Engineering at Mustansiriyah University for providing her Ph.D. scholarship.

References

Abbass, A. M., & Fyath, R. S. (2024). Performance investigation of geometric constellation shaping-based coherent WDM optical fiber communication system supported by deep-learning autoencoder.Results in Optics,15, 100629. https://doi.org/10.1016/j.rio.2024.100629 [ Links ]

Amirabadi, M. A., Kahaei, M. H., & Nezamalhosseini, S. A. (2022). End-to-end deep learning for joint geometric-probabilistic constellation shaping in FMF system.Physical Communication,55, 101903. https://doi.org/10.1016/j.phycom.2022.101903 [ Links ]

Basha, S. S., Dubey, S. R., Pulabaigari, V., & Mukherjee, S. (2020). Impact of fully connected layers on performance of convolutional neural networks for image classification.Neurocomputing,378, 112-119. http://doi.org/10.1016/j.neucom.2019.10.008 [ Links ]

Binh, L. N. (2015). Advanced Digital Optical Communications (2nd ed.). CRC Press. https://doi.org/10.1201/b18128 [ Links ]

Cardarilli, G. C., Di Nunzio, L., Fazzolari, R., Giardino, D., Nannarelli, A., Re, M., & Spanò, S. (2021). A pseudo-softmax function for hardware-based high speed image classification.Scientific reports,11(1), 15307. https://doi.org/10.1038/s41598-021-94691-7 [ Links ]

Civelli, S., Forestieri, E., & Secondini, M. (2024). Sequence-selection-based constellation shaping for nonlinear channels.Journal of Lightwave Technology,42(3), 1031-1043. [ Links ]

Dar, R., Feder, M., Mecozzi, A., & Shtaif, M. (2013). Properties of nonlinear noise in long, dispersion-uncompensated fiber links.Optics Express,21(22), 25685-25699. https://doi.org/10.1364/OE.21.025685 [ Links ]

Dar, R., Feder, M., Mecozzi, A., & Shtaif, M. (2014). Accumulation of nonlinear interference noise in fiber-optic systems.Optics express,22(12), 14199-14211. https://doi.org/10.1364/OE.22.014199 [ Links ]

Deligiannidis, S., Bottrill, K. R. H., Sozos, K., Mesaritakis, C., Petropoulos, P., & Bogris, A. (2023). Multichannel Nonlinear Equalization in Coherent WDM Systems based on Bi-directional Recurrent Neural Networks. Journal of Lightwave Technology, 1-9. https://doi.org/10.1109/jlt.2023.3318559 [ Links ]

Escobar-Landero, S., Zhao, X., Gac, D. Le, Lorences-Riesgo, A., Viret-Denaix, T., Guo, Q., Gan, L., Li, S., Cao, S., Xiao, X., Demirtzioglou, I., Dahdah, N. El, Gallet, A., Yu, S., Hafermann, H., Godard, L., Brenot, R., Frignac, Y., & Charlet, G. (2023). Demonstration and Characterization of High-Throughput 200.5 Tbit/s S+C+L Transmission over 2x100 PSCF Spans. Journal of Lightwave Technology, 41(12), 3668-3673. https://doi.org/10.1109/JLT.2023.3266926 [ Links ]

Gümüş, K., Alvarado, A., Chen, B., Häger, C., & Agrell, E. (2020). End-to-end learning of geometrical shaping maximizing generalized mutual information. In2020 Optical Fiber Communications Conference and Exhibition (OFC)(pp. 1-3). IEEE. https://ieeexplore.ieee.org/document/9083181 [ Links ]

Haroun, I. A. (2023). Digital Communication Systems. In Essentials of RF Front‐end Design and Testing (pp. 39-60). Wiley. https://doi.org/10.1002/9781394210640.ch3 [ Links ]

He, Z., Vijayan, K., Mirani, A., Karlsson, M.,& Schroder, J. (2023). Inter-Channel Interference Cancellation for Long-Haul Superchannel System. Journal of Lightwave Technology. https://doi.org/10.1109/JLT.2023.3304007 [ Links ]

Hu, X., Huo, Y., Dong, X., Wu, F. Y., & Huang, A. (2023). Channel prediction using adaptive bidirectional GRU for underwater MIMO communications.IEEE Internet of Things Journal. https://doi.org/10.1109/JIOT.2023.3296116 [ Links ]

Jiang, R., Fu, Z., Bao, Y., Wang, H., Ding, X., & Wang, Z. (2022). Data-driven method for nonlinear optical fiber channel modeling based on deep neural network.IEEE Photonics Journal,14(4), 1-8. http://doi.org/10.1109/JPHOT.2022.3184354 [ Links ]

Jiang, R., Wang, Z., Jia, T., Fu, Z., Shang, C., & Wu, C. (2023). Flexible optical fiber channel modeling based on a neural network module.Optics Letters,48(16), 4332-4335. https://doi.org/10.1364/OL.491573 [ Links ]

Jones, R. T., Eriksson, T. A., Yankov, M. P., & Zibar, D. (2018). Deep Learning of Geometric Constellation Shaping including Fiber Nonlinearities. http://arxiv.org/abs/1805.03785 [ Links ]

Jones, R. T., Yankov, M. P., & Zibar, D. (2019). End-to-end Learning for GMI Optimized Geometric Constellation Shape. http://arxiv.org/abs/1907.08535 [ Links ]

Jovanovic, O., Yankov, M. P., Da Ros, F., & Zibar, D. (2021). Gradient-free training of autoencoders for non-differentiable communication channels.Journal of Lightwave Technology,39(20), 6381-6391. http://doi.org/10.1109/JLT.2021.3103339 [ Links ]

Jovanovic, O., Yankov, M. P., Da Ros, F., & Zibar, D. (2022). End-to-End Learning of a Constellation Shape Robust to Channel Condition Uncertainties. Journal of Lightwave Technology, 40(10), 3316-3324. https://doi.org/10.1109/JLT.2022.3169993 [ Links ]

Jovanovic, O., Da Ros, F., Zibar, D., & Yankov, M. P. (2023). Geometric constellation shaping for fiber-optic channels via end-to-end learning.Journal of Lightwave Technology,41(12), 3726-3736. https://doi.org/10.1109/JLT.2023.3276300 [ Links ]

Liang, Z., Chen, B., Lei, Y., Liga, G., & Alvarado, A. (2023). Analytical Model of Nonlinear Fiber Propagation for General Dual-Polarization Four-Dimensional Modulation Formats. Journal of Lightwave Technology, 1-15. https://doi.org/10.1109/jlt.2023.3316836 [ Links ]

Liu, Z., Liu, X., Xiao, S., Yang, W., & Hu, W. (2023). Bi-GRU Enhanced Cost-effective Memory-aware End-to-End Learning for Geometric Constellation Shaping in Optical Coherent Communications.IEEE Photonics Journal. https://doi.org/10.1109/JPHOT.2023.3344184 [ Links ]

Liu, J., & Zhao, Y. (2023). Improved generalization performance of convolutional neural networks with LossDA.Applied Intelligence,53(11), 13852-13866. http://doi.org/10.1007/s10489-022-04208-6 [ Links ]

Musumeci, F., Rottondi, C., Nag, A., Macaluso, I., Zibar, D., Ruffini, M., & Tornatore, M. (2019). An overview on application of machine learning techniques in optical networks.IEEE Communications Surveys & Tutorials,21(2), 1383-1408. http://doi.org/10.1109/COMST.2018.2880039 [ Links ]

Neves, M. S., Lorences-Riesgo, A., Martins, C. S., Mumtaz, S., Charlet, G., Monteiro, P. P., & Guiomar, F. P. (2023). Carrier-Phase Recovery for Coherent Optical Systems: Algorithms, Challenges and Solutions. Journal of Lightwave Technology. https://doi.org/10.1109/JLT.2023.3340010 [ Links ]

Oliari, V., Karanov, B., Goossens, S., Liga, G., Vassilieva, O., Kim, I., Palacharla, P., Okonkwo, C., & Alvarado, A. (2021). High-Cardinality Hybrid Shaping for 4D Modulation Formats in Optical Communications Optimized via End-to-End Learning. http://arxiv.org/abs/2112.10471 [ Links ]

Rex, C. E. S., Annrose, J., & Jose, J. J. (2022). Comparative analysis of deep convolution neural network models on small scale datasets.Optik,271, 170238. http://doi.org/10.1016/j.ijleo.2022.170238 [ Links ]

Srinivasan, M., Song, J., Grabowski, A., Szczerba, K., Iversen, H. K., Schmidt, M. N., ... & Wymeersch, H. (2023). End-to-end learning for VCSEL-based optical interconnects: State-of-the-art, challenges, and opportunities.Journal of Lightwave Technology,41(11), 3261-3277. http://doi.org/10.1109/JLT.2023.3251660 [ Links ]

Wang, D., Song, Y., Li, J., Qin, J., Yang, T., Zhang, M., ... & Boucouvalas, A. C. (2020). Data-driven optical fiber channel modeling: A deep learning approach.Journal of Lightwave Technology,38(17), 4730-4743. http://doi.org/10.1109/JLT.2020.2993271 [ Links ]

Xing, S., Li, Z., Huang, C., Li, G., Sun, A., Yan, A., Shen, W., Shi, J., Li, Z., Shen, C., Chi, N., & Zhang, J. (2024). End-to-End Deep Learning for a Flexible Coherent PON with User-Specific Constellation Optimization. Journal of Optical Communications and Networking, 16(1), 59. https://doi.org/10.1364/jocn.500500 [ Links ]

Yang, S. M. M. (2021). Modern Digital Radio Communication Signals and Systems: Second edition. In Modern Digital Radio Communication Signals and Systems: Second Edition. Springer International Publishing. https://doi.org/10.1007/978-3-030-57706-3 [ Links ]

Yang, H., Niu, Z., Zhao, H., Xiao, S., Hu, W., & Yi, L. (2022). Fast and accurate waveform modeling of long-haul multi-channel optical fiber transmission using a hybrid model-data driven scheme.Journal of Lightwave Technology,40(14), 4571-4580. http://doi.org/10.1109/JLT.2022.3168698 [ Links ]

Yao, S., Mahadevan, A., Lefevre, Y., Kaneda, N., Houtsma, V., & Van Veen, D. (2023). Artificial Neural Network Assisted Probabilistic and Geometric Shaping for Flexible Rate High-Speed PONs. Journal of Lightwave Technology, 41(16), 5217-5225. https://doi.org/10.1109/JLT.2023.3259929 [ Links ]

Yin, X., Liu, C., & Fang, X. (2021). Sentiment analysis based on BiGRU information enhancement. InJournal of Physics: Conference Series(Vol. 1748, No. 3, p. 032054). IOP Publishing. http://doi.org/10.1088/1742-6596/1748/3/032054 [ Links ]

You, X., Chang, H., Zhang, Q., Gao, R., Li, Y., Tian, F., ... & Xin, X. (2023). Low-complexity characterized-long-short-term-memory-aided channel modeling for optical fiber communications.Applied Optics,62(32), 8543-8551. http://doi.org/10.1364/ao.502537 [ Links ]

Zhang, Q., Wang, Z., Duan, S., Cao, B., Wu, Y., Chen, J., ... & Wang, M. (2021). An improved end-to-end autoencoder based on reinforcement learning by using decision tree for optical transceivers.Micromachines,13(1), 31. http://doi.org/10.3390/mi13010031 [ Links ]

Funding

The authors received no specific funding for this work.

Peer Review under the responsibility of Universidad Nacional Autónoma de México.

Received: March 15, 2024; Accepted: July 31, 2024; Published: December 31, 2024

*Corresponding author. E-mail address: ayammohsen@uomustansiriyah.edu.iq (A. M. Abbass).

Conflict of interest

The authors have no conflict of interest to declare.

This is an open-access article distributed under the terms of the Creative Commons Attribution License