# Efficient DSP Receiver Architecture for Underwater Acoustic Communications

Lecturer Dr. A. E. Abdelkareem\*

# Abstract

This paper presents the design of a coherent receiver for underwater acoustic communication that is suitable for implementation on a digital signal processor (DSP). The design uses channel coding in the form of bit-interleaved coded modulation with iterative decoding (BICM-ID). A single-carrier receiver architecture is proposed instead of multicarrier modulation so that it can be implemented in real-time applications. To evaluate the architecture, MATLAB simulation results are presented; they show that BICM-ID mitigates errors over an AWGN channel. In addition, the receiver architecture is investigated in C with a view to real-time applications.

Keywords: BICM-ID, DSP, UNDERWATER ACOUSTIC

\* Tikrit University

# 1. Introduction

Underwater acoustic communications are more difficult than conventional communications because electromagnetic waves do not propagate over long distances underwater except at high power. This constraint makes acoustic waveforms the best solution for transmitting data undersea. Many researchers [1-3] have characterized acoustic channels as multipath, time-varying channels because of signal reflections and wave motion. These impairments, together with ambient noise such as that produced by marine life and human activity, are the main causes of degraded data link performance. Given these characteristics, the communication system designer must increase channel efficiency in order to transmit information reliably and to cope with the low data rates imposed by the limited bandwidth.

One way to achieve a high data rate is simply to increase the transmission speed. However, because communication channels are band-limited, doing so inevitably increases the sampled channel length L and therefore the amount of intersymbol interference (ISI). ISI is not necessarily detrimental, since it can be treated as a source of time diversity for combating signal fades, but an equalizer is nonetheless needed to either exploit or mitigate it. Historically, the most common types of equalizer are maximum likelihood sequence estimators (MLSEs), which are implemented by the Viterbi algorithm, and tapped-delay-line (TDL) equalizers, which include linear equalizers (LEs) and decision feedback equalizers (DFEs) [3, 4].

This work deals primarily with single-carrier modulation and considers an algorithm suitable for real-time implementation. Single-carrier modulation with equalization is employed in existing coherent underwater communication systems as an approach to combating ISI [5]. As the data rate increases, the symbol duration decreases, so a channel with the same delay spread contains more channel taps when converted to the baseband discrete-time model. Consequently, complex channel equalization is required for a single-carrier system to improve the bit rate. Reducing receiver complexity in order to obtain an efficient real-time system has therefore attracted many researchers. Much of the receiver complexity comes from the estimation algorithms used, such as LMS and RLS. The researchers in [6-8] present a pre-processor that estimates the Doppler shift in a single-carrier system by measuring the time between two known signals and removes it using a computationally efficient linear interpolator; they implemented their system in real time using three ADSP2106L (SHARC) processors. In [9], a decision feedback equalizer was used in conjunction with a phase-locked loop to track small changes in phase. In [10], a novel method was proposed to compensate for both time delay spread and Doppler spread using a new system architecture for high data rates. The method characterizes the channel in the time and frequency domains and then applies successive interference cancellation using the Doppler and time information of the different multipath arrivals.

مجلة المنصور / عدد/20/ خاص AL-Mansour Journal / No.20/ Special Issue 2013

An approach to multipath rejection at the receiver has been investigated at the University of Newcastle [8]. The researchers used LMS-type adaptive beamforming to suppress the reflected wave. It was found that the beamformer encounters difficulties as the range increases relative to the depth, because a 64-point correlation sequence was used. The real-time system was implemented using multiple digital signal processors (DSPs) connected via a VME bus. The system was tested in shallow water at 9.975 kbps and achieved a BER between $2.2 \times 10^{-2}$ and $<10^{-3}$.

To achieve reliable communication over an acoustic channel, channel coding in the form of block or convolutional coding of the source bit stream should be employed [11]. In 1982, Ungerboeck introduced trellis-coded modulation (TCM) as a bandwidth-efficient signaling scheme over an additive white Gaussian noise (AWGN) channel [12]. That study set out to assess the importance of mapping by "set partitioning". Its most interesting finding was a coding gain of about 3-4 dB over uncoded transmission carrying the same information. However, these results were less encouraging for the undersea channel: over fading channels the diversity order of the coded modulation system should be high, so the performance of TCM is degraded in such channels [13], although it can be improved by adding a symbol interleaver. The limited diversity order of symbol-interleaved coded modulation, together with the cost of increasing the code complexity, led to a different approach called bit-interleaved coded modulation (BICM). It was suggested by Zehavi [14] to improve the performance of coded modulation over fading channels. It was shown that bitwise interleaving increases the diversity order to the minimum number of distinct bits rather than symbols. It was shown in [15-21] that, with iterative decoding, BICM can provide excellent performance over any channel, provided the signal mapping is well designed.

# 2. Proposed System

## 2.1 Transmitter

The proposed system shown in Figure 1-A has been simulated. A random message of 1024 bits is generated in MATLAB to form the encoder input. Forward error correction (FEC) uses a non-systematic convolutional (NSC) code with code rate ½, constraint length K=5, and generator polynomials [23 35] in octal. The encoder memory consists of M=K-1=4 shift-register stages [33], so the trellis has 16 states. The coded sequence is 2056 bits long, i.e. 2 × (1024 + 4), because the first K-1 trellis levels correspond to the encoder's departure from the initial state and the last K-1 levels correspond to its return to the final state. An S-random interleaver (S=6, L=2056 bits) permutes the encoder output and thereby randomizes errors. An IFFT is used to convert the signal to the time domain. The digital modulation is quadrature phase shift keying (QPSK) with non-Gray mapping.
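The encoder described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the MATLAB code used in the paper: the function name `conv_encode` and the bit ordering inside the shift register (newest bit in the most significant position) are assumptions, while the code rate, constraint length, and octal generators are those stated in the text.

```python
# Rate-1/2 non-systematic convolutional encoder, K = 5, generators (23, 35) octal.
# Assumption: the newest input bit occupies the most significant register bit.
K = 5
GENERATORS = (0o23, 0o35)


def conv_encode(bits):
    """Encode a bit list and append K-1 zero tail bits to terminate the trellis."""
    state = 0  # K-1 = 4 memory bits, i.e. 16 trellis states
    coded = []
    for b in list(bits) + [0] * (K - 1):
        register = (b << (K - 1)) | state
        for g in GENERATORS:
            # Each output bit is the parity of the register taps selected by g.
            coded.append(bin(register & g).count("1") & 1)
        state = register >> 1  # shift: drop the oldest bit
    return coded


message = [0] * 1024
print(len(conv_encode(message)))  # 2 * (1024 + 4) coded bits
```

With a 1024-bit message the output length is 2 × (1024 + 4) = 2056 bits, which matches the interleaver length L quoted above.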

## 2.2 Receiver

In BICM-ID, the receiver requires only one encoder/decoder pair, which reduces receiver complexity. After passing through the AWGN channel and the FFT, the received symbols are transformed into ML symbol log-likelihoods (Figure 1-B) and then converted to binary log-likelihood ratios (LLRs) by the soft demapper. The decoder uses a linear approximation to log-MAP; the soft-in soft-out (SISO) algorithm for the convolutional code generates LLRs [22] for both data and code bits, which are used in iterative decoding.
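The core of a log-MAP SISO decoder is the max* (Jacobian logarithm) operation used in each add-compare-select step. A minimal sketch of the exact operation and of a linear approximation to its correction term follows. The slope and threshold constants are values commonly quoted for the linear-log-MAP approximation and are an assumption here; the paper does not state which constants its decoder uses, and the function names are hypothetical.

```python
import math

# Assumed constants for the linear approximation of log(1 + exp(-d)).
SLOPE = 0.24904
THRESHOLD = 2.5068


def max_star_exact(a, b):
    """Exact Jacobian logarithm: log(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_star_linear(a, b):
    """max* with the correction term replaced by a clipped straight line."""
    d = abs(a - b)
    correction = SLOPE * (THRESHOLD - d) if d < THRESHOLD else 0.0
    return max(a, b) + correction


print(max_star_exact(0.0, 1.0), max_star_linear(0.0, 1.0))
```

The linear form replaces an exponential and a logarithm with one multiply and one compare, which is the kind of saving that matters when the ACS loop dominates execution time.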

$$L_{a}(C_{k}(i)) = \log \frac{P(C_{k}(i) = 0)}{P(C_{k}(i) = 1)}$$

Equation 1

$$L_{e}(C_{k}(i)) = \log \frac{P(C_{k}(i) = 0 \mid x_k^{\prime}, L_{a}(C_{k}))}{P(C_{k}(i) = 1 \mid x_k^{\prime}, L_{a}(C_{k}))} - L_{a}(C_{k}(i))$$

Equation 2

Where  $C_k(i)$  denotes the binary random variable with realizations  $c_k(i) \in \{0,1\}$ .

Using Bayes' rule and taking the expectation of $p(x_k^{\prime} \mid x_k)$ over $P(x_k \mid C_k(i) = b)$, $b \in \{0,1\}$, for bit position $i \in \{1, 2, ..., m\}$, where $\Omega_b^i$ denotes the subset of constellation symbols whose $i$-th bit label equals $b$:

$$L_{e}(C_{k}(i)) = \log \frac{\sum_{x_k \in \Omega_{0}^{i}} p(x_k^{\prime} \mid x_k)\, P(x_k \mid C_{k}(i) = 0)}{\sum_{x_k \in \Omega_{1}^{i}} p(x_k^{\prime} \mid x_k)\, P(x_k \mid C_{k}(i) = 1)}$$

Equation 3

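The first term $p(x_k^{\prime} \mid x_k)$ is computed from the channel model, which is a Gaussian distribution:

$$p(x_k^{\prime} \mid x_k) = \frac{1}{2\pi\sigma_n^2} \exp\left(-\frac{|x_k^{\prime} - x_k|^2}{2\sigma_n^2}\right)$$

Equation 4

Here $\sigma_n^2$ is the noise variance per real dimension; the normalizing constant cancels in the ratio of Equation 3.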

The second term $P(x_k \mid C_k(i) = b)$ is computed from the *a priori* information of the individual bits:

$$P(x_k \mid C_k(i) = b) = \prod_{j=1, j \neq i}^{m} \frac{e^{-L_a(C_k(j)) \cdot c_k(j)}}{1 + e^{-L_a(C_k(j))}}$$

Equation 5

Where  $m = \log_2 M$ .

The extrinsic estimates $L_e(C_k(i))$ are deinterleaved and applied to the *a posteriori probability* (APP) channel decoder. In iterative decoding, extrinsic information about the coded bits from the decoder is fed back and regarded as a priori information $L_a(C_k(i))$ at the demapper. During the initial demapping step, the a priori LLRs are set to zero.
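The demapper of Equations 3 to 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the natural (non-Gray) QPSK labelling in `CONSTELLATION`, the function name `extrinsic_llrs`, and the use of the squared Euclidean distance in the Gaussian likelihood are all assumptions made for the example.

```python
import math

# Assumed non-Gray (natural) QPSK labelling: bit labels (c(1), c(2)) -> symbol.
CONSTELLATION = {
    (0, 0): complex(1, 1) / math.sqrt(2),
    (0, 1): complex(-1, 1) / math.sqrt(2),
    (1, 0): complex(-1, -1) / math.sqrt(2),
    (1, 1): complex(1, -1) / math.sqrt(2),
}


def extrinsic_llrs(received, a_priori, sigma2):
    """Extrinsic LLR of each bit of one symbol (Equation 3).

    received: complex received sample x'_k
    a_priori: list of a priori LLRs L_a(C_k(j)), zeros on the first pass
    sigma2:   noise variance per real dimension
    """
    m = len(a_priori)
    llrs = []
    for i in range(m):
        sums = [0.0, 0.0]  # numerator (bit = 0) and denominator (bit = 1)
        for labels, symbol in CONSTELLATION.items():
            # Channel likelihood; the constant factor cancels in the ratio.
            likelihood = math.exp(-abs(received - symbol) ** 2 / (2.0 * sigma2))
            # A priori probability of the other bits of this symbol (Equation 5).
            prior = 1.0
            for j in range(m):
                if j != i:
                    la = a_priori[j]
                    prior *= math.exp(-la * labels[j]) / (1.0 + math.exp(-la))
            sums[labels[i]] += likelihood * prior
        llrs.append(math.log(sums[0] / sums[1]))
    return llrs


print(extrinsic_llrs(CONSTELLATION[(0, 1)], [0.0, 0.0], 0.5))
```

On the first pass the a priori LLRs are zero, so every symbol is equally likely and the output reduces to a plain soft demapping; a positive LLR favours bit value 0, in line with Equation 1.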



Figure 1: Block diagram of a COFDM BICM-ID system with soft-decision feedback. (A) Transmitter; (B) Receiver.

# 3. DSP System Implementation

## 3.1 Platform Selection

The first stage in the real-time implementation of the system is to select an appropriate DSP, which is an important issue when tackling real-time signals. A programmable DSP is more flexible, cheaper, and faster than other processors for such tasks, so it is the preferred solution for many communication, medical, and industrial products for which traditional microprocessors are inappropriate. The main aspects of selecting a DSP processor are: data format, memory bandwidth, CPU architecture, and throughput in million instructions per second (MIPS) or million floating-point operations per second (MFLOPS) [23]. In terms of data format, fixed-point DSPs are generally cheaper but produce higher quantization noise, which is added to the signal and lowers the signal-to-noise ratio of the system. Also, extra code has to be written to handle overflow and underflow, and the programmer must be aware of what scaling needs to take place.

In comparison, floating-point devices have better precision, higher dynamic range, and a shorter development cycle [24]. Because the suggested receiver uses iterative decoding, and most of the execution time is spent in the SISO algorithm because of its add-compare-select (ACS) operations, it is important to take advantage of architectural features such as those of the Super Harvard Architecture (SHARC), which includes an instruction cache in the CPU. This feature avoids conflicts between data and instruction transfers during the fetch cycle, because cached instructions do not have to be fetched again from program memory. Consequently, all memory-to-CPU transfers can be accomplished in a single cycle, which results in high memory access bandwidth. On-chip memory is also a key factor when deciding which DSP device to use, because it must be large enough to hold the digitized samples. The third aspect of selecting a DSP is the CPU architecture. For instance, a traditional architecture uses a single memory for both data and instructions, whereas some DSPs have Very Long Instruction Word (VLIW) cores that execute multiple instructions in parallel, which results in fast operation. However, these architectures [25] dissipate more power than conventional DSP architectures.

In addition, SHARC includes a high-speed I/O controller that supports direct memory access (DMA). In the proposed system, the SHARC ADSP-21364 was selected in order to use its DMA chaining facility, which lets the DMA controller auto-initialize itself between multiple DMA transfers. The DMA attributes for each DMA operation are stored in a section of internal memory called the transfer control block (TCB). A chain pointer is also associated with each DMA operation; the chain pointer (the address of a TCB) links one DMA operation to the next. To set up and initiate a chained DMA, the TCBs are first filled in with the appropriate attribute information. To enable the chained DMA, the DMA enable and chain enable bits in the corresponding DMA control register are set simultaneously. Finally, to start the DMA controller, the address of the first TCB (the chain pointer) is written to the chain pointer register. The DMA controller auto-initializes itself with the first TCB and then starts the first transfer. When this transfer is over, if the current chain pointer register is non-zero, it is used as a pointer to a new TCB and the process begins again.
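The chaining behaviour described above can be modelled in software as a linked list that the controller walks until it meets a zero pointer. The sketch below is a conceptual model only: the `TCB` class, its fields, and `run_chained_dma` are hypothetical names, and none of this corresponds to the actual ADSP-21364 register interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TCB:
    """Hypothetical transfer control block: attributes of one DMA operation."""
    source: List[int]             # samples to move
    dest: List[int]               # destination buffer, filled in place
    next: Optional["TCB"] = None  # chain pointer; None models a zero pointer


def run_chained_dma(first: Optional[TCB]) -> int:
    """Walk the chain from the first TCB and return the number of transfers."""
    transfers = 0
    tcb = first
    while tcb is not None:                            # non-zero chain pointer
        tcb.dest[: len(tcb.source)] = tcb.source      # perform this transfer
        transfers += 1
        tcb = tcb.next                                # auto-initialize from next TCB
    return transfers


rx_a, rx_b = [0] * 4, [0] * 4
second = TCB(source=[5, 6, 7, 8], dest=rx_b)
first = TCB(source=[1, 2, 3, 4], dest=rx_a, next=second)
print(run_chained_dma(first), rx_a, rx_b)
```

In this model the chain ends after the second block; pointing the last block back at the first would model continuous, repeating transfers.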

Furthermore, SHARC uses shadow registers [26] for all of the CPU's registers. They allow interrupts to be serviced quickly by moving the entire register contents to these registers in a single clock cycle. The challenge for any floating-point architecture in real-time applications is how many operations can be carried out simultaneously. Benchmarks are used to express the speed of a microprocessor as a number. For example, [27] points out that floating-point devices can be specified in MFLOPS and fixed-point devices in MIPS. This gauge is useful only in terms of a single, known processor architecture, so MIPS and MFLOPS figures can be misleading [28] because the amount of work done by an instruction can vary depending on the instruction format of the processor. In the current application, effort has therefore been focused on the number of operations in the receiver, which comprises the following stages:

*Total operations* = band-pass filter + synchronization + equalisation + decoding

= 179 + 388 + 200 + 5325 = 6092

The figure of 6092 is obtained by adding the operation counts of all stages. Decoding is the most complex stage, as it includes the iterative ACS operations in the SISO decoder.

Because the sampling frequency varies across the receiver stages as a result of down-sampling, and the SISO decoder operates on 2 samples per symbol, the required clock rate is:

Required processor clock = (179 + 388 + 200) × 48000 + 5325 × 48000/12

= 36,816,000 + 21,300,000 = 58,116,000 cycles/s ≈ 58.12 MHz

According to [29], a high-level language such as C takes about 2-3 times the execution time of a low-level language; therefore the minimum required processor clock is about 3 × 58.12 ≈ 174.35 MHz.

## 3.2 DSP Board Overview

As mentioned above, I used the ADSP-21364 EZ-KIT Lite, which is based on a SHARC-family processor from Analog Devices. The ADSP-21364 is a 32-bit/40-bit floating-point processor optimized for high-performance automotive audio applications, with large on-chip SRAM (3 Mbit) and ROM (4 Mbit). It achieves an instruction cycle time of 3.0 ns at 333 MHz.

## 3.3 A/D Interface

One of the key components of the acoustic modem is the audio signal input/output module. The ADSP-21364 development board I used has a built-in module for sampling audio signals; the task is handled by the integrated Analog Devices AD183x CODEC family [30]. Data word lengths of 16, 20, 24, and 32 bits are supported, with sampling rates from 8 kHz to 96 kHz. The operating mode of the AD183x is programmed through a set of control registers. At a sampling rate of 48 kHz, a processor working on frames of 1024 samples has a frame acquisition interval of 21.33 ms (i.e., 1024 × 20.833 $\mu$s = 21.33 ms), so the DSP has 21.33 ms to complete all the required processing for that frame. Three buffers of size 1024 are used to exchange these samples between the CODEC, DMA, and serial ports, as shown in (TABLE1), so that data sampling and processing proceed simultaneously and no incoming samples are missed while the DSP is processing previously received data. However, because the DAC/ADC has two data pins, it needs to be configured appropriately.
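The frame deadline and the rotation of the three buffers can be illustrated with a short sketch. The role names (filling, processing, draining) and the function names are assumptions introduced for the example; the paper states only that three 1024-sample buffers are exchanged between the CODEC, DMA, and serial ports.

```python
SAMPLE_RATE_HZ = 48_000
FRAME_SAMPLES = 1024


def frame_interval_ms(n_samples=FRAME_SAMPLES, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time to acquire one frame, which is also the processing deadline."""
    return 1000.0 * n_samples / sample_rate_hz


def buffer_roles(frame_index, n_buffers=3):
    """Return the buffer indices being (filled, processed, drained).

    Assumed scheme: each buffer moves from filling to processing to draining
    on successive frames, so the three roles never share a buffer.
    """
    filling = frame_index % n_buffers
    processing = (frame_index - 1) % n_buffers
    draining = (frame_index - 2) % n_buffers
    return filling, processing, draining


print(round(frame_interval_ms(), 2))
for frame in range(4):
    print(frame, buffer_roles(frame))
```

Because the three indices are always distinct, the input DMA can fill one buffer while the DSP works on the previous frame and the output side empties the one before it.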

## 3.4 Code Optimization

The single-carrier modem prototype works well at baseband, as the C-language simulation results show. The challenge is to run the same code on the DSP board, especially on the receiver side. Effort has therefore been concentrated on the bottleneck in the receiver, starting with optimization of the Viterbi decoder algorithm. I first tackled the memory limitation of the DSP. For instance, in the decoder code, the call to `calloc` spends most of the execution time finding free memory and then writing data. With max\_states=16, nn=2, LL=604, and KK=5, the available DSP memory is insufficient to allocate these massive arrays dynamically, so in code (A) the variable *g\_encoder* is defined as a constant array of two integer elements, whereas in code (B) out0, out1, state0, and state1 are defined as integer arrays of size 16.

When memory or data storage is at a premium [29], a *bit field* in what is called a union structure allows data to be packed tightly, so the union structure has been used in the proposed algorithm. However, this structure works with integer data types only, and it has been adapted to work with a hard-in hard-out Viterbi decoder. The next step is to modify this code to work with soft-in hard-out data.
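The packing idea can be illustrated with Python's `ctypes`, which mirrors C bit fields and unions. The field layout below is a hypothetical one chosen for the example (2 bits per branch output and 4 bits per next state, sized for nn=2 and 16 states); it is not the paper's actual structure, and the bit ordering in the generator taps is likewise an assumption.

```python
import ctypes

K = 5
GENERATORS = (0o23, 0o35)  # generators from the transmitter description


class TrellisBits(ctypes.Structure):
    """Assumed layout: four small integer fields sharing one 32-bit word."""
    _fields_ = [
        ("out0", ctypes.c_uint32, 2),    # branch output for input bit 0
        ("out1", ctypes.c_uint32, 2),    # branch output for input bit 1
        ("state0", ctypes.c_uint32, 4),  # next state for input bit 0
        ("state1", ctypes.c_uint32, 4),  # next state for input bit 1
    ]


class TrellisEntry(ctypes.Union):
    _fields_ = [("bits", TrellisBits), ("word", ctypes.c_uint32)]


def branch(state, bit):
    """Return (2-bit output, next state) for one trellis branch."""
    register = (bit << (K - 1)) | state
    out = 0
    for g in GENERATORS:
        out = (out << 1) | (bin(register & g).count("1") & 1)
    return out, register >> 1


def build_trellis():
    table = (TrellisEntry * 16)()  # statically sized, no dynamic allocation
    for s in range(16):
        table[s].bits.out0, table[s].bits.state0 = branch(s, 0)
        table[s].bits.out1, table[s].bits.state1 = branch(s, 1)
    return table


trellis = build_trellis()
print(ctypes.sizeof(trellis))  # bytes used by the packed table
```

Packed this way the whole 16-state table occupies one 32-bit word per state, instead of four separate integer arrays of size 16.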



Figure 2 Comparison of uncoded and coded performance in AWGN channel

# 4. Conclusions

The performance of BICM-ID in a single-carrier underwater acoustic communication system has been investigated. The receiver architecture has been evaluated over an AWGN channel and is suitable for real-time implementation. It has been shown that bit-level interleaving with iterative decoding reduces the bit error rate after two iterations compared with an uncoded system. Testing this receiver with real data is an interesting direction for future work. Additionally, a comparison between RF and underwater coded receivers would be a worthwhile area of research.

# References

- [1] R. F. W. Coates, *Underwater Acoustic Systems*, 1st ed. Hampshire, England: Macmillan New Electronics, 1990.
- [2] J. A. Catipovic, "Performance limitations in underwater acoustic telemetry," *Oceanic Engineering, IEEE Journal of,* vol. 15, pp. 205-216, 1990.
- [3] J. G. Proakis and M. Salehi, *Digital Communications*, 5th ed.: McGraw-Hill, 2008.
- [4] S. U. H. Qureshi, "Adaptive equalization," *Proceedings of the IEEE,* vol. 73, pp. 1349-1387, 1985.
- [5] G. S. Howe, P. S. D. Tarbit, O. R. Hinton, B. S. Sharif, and A. E. Adams, "Sub-sea acoustic remote communications utilising an adaptive receiving beamformer for multipath suppression," in *OCEANS '94. 'Oceans Engineering for Today's Technology and Tomorrow's Preservation.' Proceedings*, 1994, pp. I/313-I/316 vol.1.
- [6] D. B. Kilfoyle and A. B. Baggeroer, "The state of the art in underwater acoustic telemetry," *Oceanic Engineering, IEEE Journal of,* vol. 25, pp. 4-27, 2000.
- [7] B. S. Sharif, J. Neasham, O. R. Hinton, A. E. Adams, and J. Davies, "Adaptive Doppler compensation for coherent acoustic communication," *Radar, Sonar and Navigation, IEE Proceedings,* vol. 147, pp. 239-246, 2000.
- [8] B. S. Sharif, J. Neasham, O. R. Hinton, and A. E. Adams, "A computationally efficient Doppler compensation system for underwater acoustic communications," *Oceanic Engineering, IEEE Journal of,* vol. 25, pp. 52-61, 2000.
- [9] B. S. Sharif, J. Neasham, O. R. Hinton, and A. E. Adams, "Doppler compensation for underwater acoustic communications," in *OCEANS '99 MTS/IEEE. Riding the Crest into the 21st Century*, 1999, pp. 216-221 vol.1.
- [10] T. C. Yang and A. Al-Kurd, "Performance limitations of joint adaptive channel equalizer and phase locking loop in random oceans: initial test with data," in *OCEANS 2000 MTS/IEEE Conference and Exhibition*, 2000, pp. 803-808 vol.2.
- [11] J. S. Dhanoa and R. F. Ormondroyd, "Combined differential Doppler and time delay compensation for an underwater acoustic communication system," in *Oceans '02 MTS/IEEE*, 2002, pp. 581-587 vol.1.
- [12] J. G. Proakis, "Coded modulation for digital communications over Rayleigh fading channels," *Oceanic Engineering, IEEE Journal of,* vol. 16, pp. 66-73, 1991.
- [13] G. Ungerboeck, "Channel coding with multilevel/phase signals," *Information Theory, IEEE Transactions on,* vol. 28, pp. 55-67, 1982.
- [14] D. Divsalar and M. K. Simon, "The design of trellis coded MPSK for fading channels: performance criteria," *Communications, IEEE Transactions on,* vol. 36, pp. 1004-1012, 1988.

- [15] E. Zehavi, "8-PSK trellis codes for a Rayleigh channel," *Communications, IEEE Transactions on,* vol. 40, pp. 873-884, 1992.
- [16] E. M. Sozer, J. G. Proakis, and F. Blackmon, "Iterative equalization and decoding techniques for shallow water acoustic channels," in *OCEANS, 2001. MTS/IEEE Conference and Exhibition*, 2001, pp. 2201-2208 vol.4.
- [17] S. Y. Le Goff, "Signal constellations for bit-interleaved coded modulation," *Information Theory, IEEE Transactions on,* vol. 49, pp. 307-313, 2003.
- [18] S. Y. Le Goff, B. K. Khoo, C. C. Tsimenidis, and B. S. Sharif, "Constellation Shaping for Bandwidth-Efficient Turbo-Coded Modulation With Iterative Receiver," *Wireless Communications, IEEE Transactions on,* vol. 6, pp. 2223-2233, 2007.
- [19] F. Schreckenbach, N. Gortz, J. Hagenauer, and G. Bauch, "Optimized symbol mappings for bit-interleaved coded modulation with iterative decoding," in *Global Telecommunications Conference, 2003. GLOBECOM '03. IEEE*, 2003, pp. 3316-3320 vol.6.
- [20] S. ten Brink, J. Speidel, and R. H. Han, "Iterative demapping for QPSK modulation," *Electronics Letters,* vol. 34, pp. 1459-1460, 1998.
- [21] F. Simoens, H. Wymeersch, H. Bruneel, and M. Moeneclaey, "Multidimensional mapping for bit-interleaved coded modulation with BPSK/QPSK signaling," *Communications Letters, IEEE,* vol. 9, pp. 453-455, 2005.
- [22] X. Li and J. A. Ritcey, "Bit-interleaved coded modulation with iterative decoding," in *Communications, 1999. ICC '99. 1999 IEEE International Conference on*, 1999, pp. 858-863 vol.2.
- [23] S. Haykin and M. Moher, *Modern Wireless Communications*: Pearson, 2005.
- [24] J. Eyre and J. Bier, "The evolution of DSP processors," *Signal Processing Magazine, IEEE,* vol. 17, pp. 43-51, 2000.
- [25] S. W. Smith, *The Scientist and Engineer's Guide to Digital Signal Processing*: Analog Devices, 1998.
- [26] W. H. Park, M. H. Sunwoo, and S. K. Oh, "Efficient DSP Architecture for Viterbi Decoding with Small Trace Back Latency," *IEICE Transactions,* pp. 2813-2818, 2006.
- [27] P. Lapsley and G. Blalock, "How to estimate DSP processor performance," *Spectrum, IEEE,* vol. 33, pp. 74-78, 1996.
- [28] Analog Devices, "ADSP-21364 EZ-KIT Lite Evaluation System Manual," Revision 3.2 ed.: Analog Devices, July 2007.
- [29] A. Kelley and I. Pohl, *A Book on C*, 4th ed.: Addison-Wesley, 1998.

### Highly Efficient Digital Signal Processor Receiver Architecture for Underwater Acoustic Communications

Lecturer Dr. Ammar Abdulmalik Abdelkareem\*

#### Abstract

In this paper, the design of a receiver for underwater acoustic communications, applicable to a digital signal processor, is presented. This includes the use of channel coding in the form of bit-interleaved coded modulation with iterative decoding. A single-carrier receiver architecture is proposed instead of multicarrier modulation for implementation in real-time applications. To examine the architecture, results obtained by simulation using the MATLAB software package are presented; they confirm that bit-interleaved coded modulation with iterative decoding is useful for reducing errors over an additive white Gaussian noise channel. In addition, the C language was adopted to examine the receiver architecture, which can be applied to real-time applications.

<sup>\*</sup> Tikrit University / Department of Computer Science and Mathematics