Solar-Blind Focal Plane Array Photodetectors for Massive Parallel Processing Application Based on Optoelectronic Integrated Circuit and Field-Programmable Gate Array

In this paper, a concept for a solar-blind ultraviolet (SB-UV) photodetector system that detects and analyzes a hydrogen flame as a high-speed object by massive parallel processing is proposed. A two-dimensional (2D) UV photodetector, together with the readout and edge detection circuits, is fabricated as an optoelectronic integrated circuit (OEIC). The edge detection circuit extracts the edge of the object using an outer vertebrate retina network and produces binary images, which are sent to a field-programmable gate array (FPGA). To reduce the dimensions of the binary image from two to one, a projection histogram circuit is implemented in the FPGA. The edge locations of the projections are processed to generate information such as the object presence, object speed, object direction, and object spreading status. At the current stage, a personal computer (PC) is used to generate the binary images of a moving object to simulate the output of the OEIC. The system implemented in the FPGA achieved real-time performance of 71 frames/s for 250 × 250 pixels with the PC.


Introduction
A solar-blind ultraviolet (SB-UV) photodetector detects UV radiation in the UV-C region (200-280 nm). The term solar blind means that UV radiation from the sun does not cause false detection on the SB-UV photodetector when it is subjected to the intended UV source at sea level. This is possible because the sun's UV radiation between 200 and 300 nm is absorbed by ozone in the atmosphere. (1) To detect radiation in the UV-C region, the photodetector material must have a wide bandgap. Al_xGa_(1−x)N is an example of a promising material from the group III nitrides that can be used as a SB-UV photodetector, because its Al composition can be tailored to change the bandgap: AlN has a bandgap of 6.2 eV (cutoff near 210 nm), and GaN has a bandgap of 3.42 eV (cutoff near 360 nm). An Al composition of around 50% in Al_xGa_(1−x)N provides a cutoff wavelength of around 270 nm. (2) In this study, the SB-UV photodetector is designed as a smart UV sensor for a hydrogen flame monitoring system. This UV sensor not only detects the presence of a hydrogen flame but also analyzes it, generating information such as the flame location, flame speed, flame direction, and flame spreading status. To realize these goals, a focal plane array of SB-UV photodiodes that produces a two-dimensional UV image must be fabricated along with the readout circuit as an optoelectronic integrated circuit (OEIC). The readout circuit also contains the edge detector circuit, which mimics the outer vertebrate retina as described in ref. 3. The output of the edge detector circuit is a binary image that is transferred to a field-programmable gate array (FPGA) device. Within the FPGA, a projection histogram circuit is implemented to reduce the dimensions of the binary image from two to one. After the edge locations of the projected images are determined, the system calculates the necessary information.
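The bandgap-to-cutoff-wavelength correspondence quoted above follows from the approximate relation λ [nm] ≈ 1240/E_g [eV]. A minimal sketch of this arithmetic (the function name and the 50% Al bandgap value are our own illustrative assumptions; the formula gives ~200 nm for AlN, slightly below the quoted 210 nm, because both the relation and the quoted cutoffs are approximate):

```python
# Approximate optical cutoff wavelength from a semiconductor bandgap:
# lambda [nm] ~= h*c / E_g ~= 1240 / E_g [eV].
def cutoff_wavelength_nm(bandgap_ev):
    return 1240.0 / bandgap_ev

print(round(cutoff_wavelength_nm(6.2)))   # AlN (6.2 eV): ~200 nm
print(round(cutoff_wavelength_nm(3.42)))  # GaN (3.42 eV): ~363 nm
print(round(cutoff_wavelength_nm(4.59)))  # ~50% Al AlGaN (assumed 4.59 eV): ~270 nm
```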
Currently, the OEIC is still under development. A personal computer (PC) is used to generate the binary image. The design concept of the smart UV sensor and the performance of the implemented system in the FPGA are described in the following sections.

Design Concept
Hydrogen is an essential material used in many industries. However, leakage may occur during its manufacture, storage, and distribution. When hydrogen is released and mixed with oxygen in air, it can self-ignite. (4,5) It is therefore important to have a monitoring system that can locate a hydrogen flame. Because the system has to locate the source of the flame, a focal plane array (FPA) is needed. An FPA consists of a two-dimensional array of SB-UV photodetectors acting as an imaging sensor whose images are sent to the processing unit. In high-speed imaging, the data should be transferred from the photodetectors to the readout circuit, and then to the processing unit, as fast as possible. Commercially available complementary metal oxide semiconductor (CMOS) image sensor chips utilize high-speed serial communication to transfer the images from the readout circuit to the processing unit; for example, the PYTHON 1300 series produced by ON Semiconductor uses low-voltage differential signaling (LVDS) to achieve a frame rate of 860 frames/s with a data rate of 720 Mbps × 4 lanes at video graphics array (VGA) resolution. (6) In this case, however, parallel data transfer has some advantages. The first is that the circuit becomes simple, because each pixel is connected to the readout circuit without an additional circuit controller. The second is that current FPGA devices have abundant registers to which each output of the readout circuit can be connected. The third is that it shortens the system execution time, because the images from the readout circuit are delivered to the processing unit at one time. Parallel data transfer is more effective when carried out with a stacking technique that minimizes the length of the metal connections. A hybrid image sensor comprising a planar membrane array of frontside-illuminated AlGaN/GaN photodetectors bonded to a CMOS readout circuit using metal via connections has been demonstrated in ref. 7.
Figure 1 shows the concept in this study. The photodetectors are stacked onto the readout circuits as an OEIC. Then, the outputs of the OEIC are connected to the FPGA's registers. An FPGA is also used because of its programmability. A suitable circuit can be implemented in the FPGA without refabricating the unit.

Previous work on the OEIC
The OEIC opens the possibility of massive parallel processing. (3,7-11) Combinations of photodetectors or light-emitting diodes (LEDs) with CMOS circuits have been demonstrated in our laboratory. A Au/n-GaN Schottky barrier diode (SBD) used as an ultraviolet sensor together with an nMOS Si charge transfer signal processor was successfully fabricated. (8) Because the charge transfer signal processor was difficult to build, a GaN-based UV sensor array integrated with a silicon sensing circuit was then implemented. (9) A back-side-illuminated photodiode has also been designed and fabricated along with an optical current amplifier and a pulse-width modulation (PWM) generator based on CMOS processes, with the aim of demonstrating further integration with other circuits such as light-emitting devices. (10) LEDs and metal oxide semiconductor field-effect transistor (MOSFET) circuits have been fabricated monolithically using a Si/III-V-N alloy/Si structure; the LEDs were fabricated in the III-V-N alloy layer, whereas the MOSFET circuits were built on the silicon capping layer. (11) References 10 and 11 show the possibility of integrating different chips.
An edge detection circuit based on the outer vertebrate retina, demonstrated in ref. 3, is used in this study and combined with a readout circuit to produce binary images. Figure 2 shows the edge detection architecture based on the outer vertebrate retina model. The model is composed of photodetectors P, horizontal cells H, and bipolar cells B. The photodetectors P convert the UV radiation from a hydrogen flame into photocurrent, as shown in Fig. 2.
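As a rough digital illustration of this retina model (not the analog OEIC circuit itself), the horizontal cells can be modeled as a local spatial average of the photoreceptor outputs, and the bipolar cells as the thresholded difference between a photoreceptor and its horizontal-cell signal, which responds only at intensity transitions. A one-dimensional sketch under these assumptions (the threshold value is arbitrary):

```python
# 1-D sketch of the outer-retina edge detection model: photoreceptors P,
# horizontal cells H (local smoothing of P), bipolar cells B (P - H).
# Thresholding |B| yields a binary edge map. This is an illustrative
# digital analogue, not the analog circuit of the OEIC.

def detect_edges(p, threshold=0.2):
    n = len(p)
    # Horizontal cells: average of each photoreceptor and its neighbours.
    h = [(p[max(i - 1, 0)] + p[i] + p[min(i + 1, n - 1)]) / 3.0 for i in range(n)]
    # Bipolar cells: difference between photoreceptor and horizontal-cell output.
    b = [p[i] - h[i] for i in range(n)]
    return [1 if abs(x) > threshold else 0 for x in b]

# A bright "object" (flame) on a dark background: edges appear at its borders.
row = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
print(detect_edges(row))  # [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
```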

Stacking approaches
The photodetector of the circuit in Fig. 2 may be fabricated separately from the readout circuit, after which they are connected using a bonding or stacking technique. The purpose is to widen the pixel area, which increases the sensitivity of the photodetector itself. A back-side-illuminated AlGaN Schottky photodiode on a sapphire substrate is the candidate in this study, for four reasons. The first is that, by adjusting the Al concentration in the AlGaN, the desired bandgap can be achieved. The second is that a Schottky structure offers a fast response and little persistent photoconductivity (PPC). The third is that a sapphire substrate in conjunction with an AlN template is the best choice to avoid cracks and absorption losses, compared with a GaN template on a silicon substrate. The fourth is that the back-side-illuminated structure may be stacked with the readout circuit using microbumps. Then, the question arises of how to stack the OEIC and the FPGA. Three existing methods can be adopted for this purpose. The first method is to stack them chip-to-chip using a multilayer printed circuit board; there is, however, a pin limitation for the FPGA chips available in the market. The Virtex-T series from Xilinx has 1200 I/O pins, which is only enough for 34 × 34 pixels to be connected directly. The second method is 2.5D technology, called stacked silicon interconnect (SSI) technology, introduced by Xilinx. (12) SSI technology enables the integration of heterogeneous dies, for example, logic with RF or high-speed I/O dies; microbumps and silicon interposers are used to make the connections. The third method is 3D technology, in which the OEIC and the logic (FPGA) are stacked vertically and connected with through-silicon vias (TSVs). Figure 3 shows a demonstration of a chip-based 3D heterogeneous integration system as a prototype of a high-speed and highly parallel image processing system.
(13) Cu TSVs and Cu/Sn microbumps are used to vertically connect the CMOS image sensor (CIS), correlated double sampling circuit (CDS), and analog-to-digital converter (ADC) chips. During bonding, each chip is held by a support wafer attached with an adhesive layer. It was confirmed that a frame rate of 4000 frames/s at a 320 × 240 pixel resolution was achieved using those chips.
For this study, the outputs of the OEIC might be connected separately to the FPGA using wires. The technical approach for stacking the OEIC is shown in Fig. 4. This approach is also used in ref. 7, in which the UV sensor array and silicon signal processor were integrated via microbumps.

Figure 5 shows the system algorithm, using 10 × 10 pixels to illustrate each process; the implementation itself uses 250 × 250 pixels. Figure 5(a) shows the ultraviolet radiation from a hydrogen flame captured by the photodetectors. The gray scale indicates the photocurrent intensity of each pixel. The bright pixels, called the object, represent the hydrogen flame. The next step is extracting the edge of the object, as shown in Fig. 5(b). Following the process described in Fig. 2, the black pixels indicate the edge of the object, while the white pixels indicate the background. This produces a binary image, which is transferred to the registers of the FPGA. Here, the logic is inverted to reduce power consumption. The projection histogram circuit projects the object's edge horizontally and vertically, as shown in Fig. 5(c). Originally, a projection was performed using adders to sum up each row and each column; an image of 250 rows by 250 columns would need 500 adders to perform a projection at one time, which would consume many of the available resources. Considering that the images are binary, OR gates are chosen instead of adders for the projection. Thus, the circuit complexity is reduced and, as a result, the execution time is shorter. Figure 5(d) shows the edge locator circuit that identifies the edge positions of each projected image. After that, the weighting circuit associates each edge location with a number that indicates its real position from the leftmost side, as shown in Fig. 5(e). Finally, the calculation circuit generates information about the object location, object speed, object direction, and object spreading status.
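The OR-based projection described above can be modeled in software as a logical-OR reduction along each row and each column of the binary image. A minimal sketch of the idea (the function and variable names are ours; this models the logic, not the FPGA circuit):

```python
# Projection histogram of a binary image using OR instead of adders: a row or
# column projects to 1 iff it contains at least one edge pixel, mirroring the
# chained two-input OR gates of the FPGA circuit.

def project(image):
    row_proj = [1 if any(row) else 0 for row in image]
    col_proj = [1 if any(col) else 0 for col in zip(*image)]
    return row_proj, col_proj

# A 5 x 5 image containing the edge ring of a 3 x 3 object in the middle.
img = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
rows, cols = project(img)
print(rows)  # [0, 1, 1, 1, 0]
print(cols)  # [0, 1, 1, 1, 0]
```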
Those four edge locations determined by the weighting circuit are used in the calculations below. The object locations (O_X and O_Y) are defined as the midpoints of the projected histograms, as shown in eqs. (1) and (2):

O_X = (X_L + X_R)/2, (1)

O_Y = (Y_T + Y_B)/2, (2)

where X_L and X_R are the left and right edge locations of the horizontal projection, and Y_T and Y_B are the top and bottom edge locations of the vertical projection.
The object speed (O_P) is defined as the resultant of the object velocities (V_X and V_Y), as shown in eq. (3). The object velocity is defined as the change in displacement between two consecutive images, as shown in eqs. (4) and (5):

O_P = √(V_X² + V_Y²), (3)

V_X = (O_X,n − O_X,n−1)/T, (4)

V_Y = (O_Y,n − O_Y,n−1)/T, (5)

where the index n denotes the current image, the index n−1 denotes the previous image, and T is the time interval between two consecutive images.
The object direction (O_D) is defined as the inverse tangent of the object velocities, as shown in eq. (6):

O_D = tan⁻¹(V_Y/V_X). (6)

The object size (O_Z) is defined as the product of the lengths of the projected histograms, as shown in eq. (7):

O_Z = d_H × d_V, (7)

where d_H and d_V are the lengths of the horizontal and vertical projections, respectively. The object size therefore corresponds to the area of the object's bounding box, i.e., the maximum possible size of the object. By comparing the object sizes between two consecutive images, the spreading status determines whether the object becomes bigger, smaller, unchanged, or lost from the image.
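The object location, speed, direction, and size calculations described above can be sketched in software as follows (a numerical model only; the helper names and the example edge locations are illustrative):

```python
import math

# Illustrative model of the object metrics: location as the midpoint of each
# projection, velocity from two consecutive locations, speed as the resultant,
# direction as the inverse tangent, and size as the product of the projection
# lengths.

def location(x_left, x_right, y_top, y_bottom):
    return (x_left + x_right) / 2.0, (y_top + y_bottom) / 2.0

def speed_and_direction(loc_now, loc_prev, T):
    vx = (loc_now[0] - loc_prev[0]) / T     # V_X, pixels/s
    vy = (loc_now[1] - loc_prev[1]) / T     # V_Y, pixels/s
    o_p = math.hypot(vx, vy)                # object speed O_P
    o_d = math.degrees(math.atan2(vy, vx))  # object direction O_D, degrees
    return o_p, o_d

def size(d_h, d_v):
    return d_h * d_v                        # object size O_Z, pixel^2

# A 10-pixel-wide object moving one pixel to the right per 14.08 ms frame:
prev = location(40, 49, 120, 129)
curr = location(41, 50, 120, 129)
o_p, o_d = speed_and_direction(curr, prev, 14.08e-3)
print(round(o_p), o_d)  # ~71 pixels/s at 0.0 degrees
print(size(10, 10))     # 100 pixel^2
```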

Results and Discussion
A PC is used to generate the binary images of a moving object to imitate the output of the OEIC. The object in the binary images can be moved manually using the PC's mouse or automatically by a movement algorithm. The binary images are sent to the FPGA through a serial communication link. Each binary image of 250 × 250 pixels is compressed before it is sent to the FPGA to reduce the transmission time: each row is encoded into two bytes of data, the start address and the length of the object. Thus, the total data for one binary image sent to the FPGA is 502 bytes, after adding two additional bytes (a start byte and a stop byte) for synchronization. The FPGA receives these data and decodes them back, row by row, into a complete image. The program on the PC is written in Processing (https://processing.org/). The PC itself is equipped with an Intel® Core™ i3-4130 CPU and 4 GB of RAM.
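The per-row compression just described can be sketched as follows (an illustrative software model, not the actual firmware; it assumes one contiguous object run per row, as in the test images):

```python
# Model of the per-row compression on the PC-to-FPGA link: each 250-pixel row
# is encoded as two bytes, the start address and the length of the object run.
# 250 rows x 2 bytes + a start byte + a stop byte = 502 bytes per frame.

WIDTH = 250

def encode_row(row):
    if 1 not in row:
        return bytes([0, 0])          # empty row: zero-length run
    start = row.index(1)
    return bytes([start, sum(row)])   # assumes a single contiguous run

def decode_row(data):
    start, length = data[0], data[1]
    row = [0] * WIDTH
    row[start:start + length] = [1] * length
    return row

row = [0] * WIDTH
row[100:110] = [1] * 10
enc = encode_row(row)
print(list(enc))               # [100, 10]
print(decode_row(enc) == row)  # True
```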
So far, we have not measured the real speed of a moving object because the object speed is expressed in units of pixels/s. The range of detectable speeds can be calculated theoretically as shown in Fig. 6. In Fig. 6, one image (one frame) is ready to be processed at every time instant T_n. The duration between two consecutive images is called the time interval (Δt), which is inversely related to the frame rate. The moving object (the bright pixel in Fig. 6) moves one pixel to the right every Δt seconds, giving a speed of 1 pixel/Δt. In this experiment, the measured time interval is 14.08 ms. Therefore, the minimum detectable speed is 1 pixel/14.08 ms (~71 pixels/s); when the object moves more slowly than 71 pixels/s, the system might not be able to detect the object speed. With an image size of 250 × 250 pixels, the maximum displacement of a pixel is 249, horizontally or vertically. Thus, the maximum detectable speed is 249 pixels/14.08 ms (~17,685 pixels/s); when the object moves faster than 17,685 pixels/s, the system might again not be able to detect the object speed. The relationship between the image size and the time interval that gives the range of detectable speeds is calculated as

v_min/max = D_min/max/Δt, (8)

where D_min/max is the minimum or maximum displacement inside the binary image. To convert these into real-world units such as m/s, sensor calibration is needed. Figures 7 and 8 show the experimental results of the speed test. Figure 7 shows the result when an object moved from left to right (the origin of the image is the top-left corner) at a speed of 71 pixels/s, with a direction of 0°, a diameter of 10 pixels, and a y-position of 128. The object was designed to appear in the image, move to the right side, and then disappear from the image. The x-position of the object (O_X) increased linearly from 0 to 249, meaning that the object moved from the left side to the right side.
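The detectable-speed bounds above follow directly from dividing the displacement by the frame interval; a minimal check of that arithmetic (the function name is ours):

```python
# Range of detectable speeds: v = D / dt, where D is the displacement in
# pixels between two frames and dt is the frame interval in seconds.
def detectable_speed(displacement_px, dt_s):
    return displacement_px / dt_s

DT = 14.08e-3  # measured frame interval, 14.08 ms
print(round(detectable_speed(1, DT)))    # minimum: 71 pixels/s
print(round(detectable_speed(249, DT)))  # maximum: 17685 pixels/s
```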
The object size (O_Z) changed from 0 to 100 pixel², remained constant at 100 pixel², and finally decreased to 0. This shows that, at first, the object was not in the image; it then gradually appeared, moved with a constant size, and finally disappeared from the image. The object speed (O_P) was also detected to be constant at 71 pixels/s, but a fluctuation was observed at the beginning and the end. This fluctuation occurred because, while the object was gradually appearing, the calculated x-position was sometimes constant between frames, so the calculated speed was zero; the same happened when the object gradually disappeared from the image.

Figure 8 shows the result when an object with a diameter of 10 pixels moved from left to right at a speed of 3551.14 pixels/s, with a direction of 0° and a y-position of 128. The x-position of the object (O_X) confirmed that the object moved from the left side of the image to the right side. The object size (O_Z) also confirmed that the object moved across the image. The object speed (O_P) was measured as 3551 pixels/s when the object was close to the center of the image.

Figure 9 shows the experimental results of the direction test. An object with a diameter of 10 pixels moved in a circular loop at an angular speed of 12.4 rad/s. The center position of the object (defined by O_X and O_Y) rotated and appeared as a sinusoidal pattern in the graph. The measured direction (O_D) appeared as a sawtooth pattern from around 6 to 353°. This means that the system can measure the direction of the object; however, the object speed (O_P) was not measured correctly because the system uses a linear speed calculation.
These experimental results confirmed that the system implemented in the FPGA enabled the analysis of an object in motion for the given binary images. The projection histogram algorithm played an important role because it finds the object edges rapidly using simple circuits. The implemented system consumed 31% of the available slices (1194 of 3758 slices). The real-time performance using the PC is 71 frames/s, or 14.08 ms per frame, because the Processing software on the PC sets the limit on the frame rate, even when a baud rate of 921600 bit/s is used. Processing executes the code within its draw() loop at a certain frame rate depending on the specifications of the PC. The implemented circuit in the XC6SLX25 FPGA (on a XuLA2-LX25 development board) takes about 7.05 ms to process one binary image at a clock rate of 100 MHz. Receiving a complete binary image from the PC takes 7 ms, while the remaining processes (histogram projection, edge locator, weighting, and calculation) consume about 0.05 ms. This means that, if parallel data transfer of a complete binary image at a single time can be realized, we can achieve an execution time of about 0.05 ms, corresponding to a processing rate of up to 20,000 frames/s. Nevertheless, one should consider an appropriate relationship between the frame rate and the object speed: higher frame rates would be useless for slow-moving objects, and vice versa. Once the OEIC has been fabricated and connected to the FPGA, calibration should be conducted to convert real-world units such as m/s into pixels/s. A hydrogen flame might have a speed of 18.6 m/s. (14) After calibration, one could determine the proper frame rate.
Realizing the parallel data transfer between the OEIC and the FPGA means that the time for transferring the images from the OEIC into the FPGA becomes very short, requiring as little as a single clock period. When the image resolution is increased, the histogram projection, edge locator, and weighting circuits scale accordingly, whereas the calculation circuit is unchanged. This means that the system execution time does not scale with resolution either for transferring the images from the OEIC to the FPGA or for the calculation circuit.
The histogram projection circuit utilizes OR gates in series for each row and each column. Each OR gate has two inputs, connected to the current pixel and the output of the previous OR gate. Therefore, the execution time of the histogram projection [t_HP(m,n)] is equal to the longest propagation delay of the chain of OR gates, as defined by

t_HP(m,n) = (h − 1) × t_PD, (9)

where t_PD is the propagation delay of one OR gate, h = max(m,n), m is the number of rows, and n is the number of columns. The minus 1 appears in eq. (9) because the last OR gate is connected to the last two pixels. The edge locator and weighting circuits were modified for higher-resolution images to reduce the circuit complexity; they were combined into a searching circuit. The searching circuit finds the edges by seeking a logic 1 within the projected registers from left to right, step by step. Once a logic 1 is found, it seeks again from right to left, step by step, and the number of steps equals the real position of the edge. Therefore, the execution time of the searching circuit [t_SC(m,n)] for an image with a resolution of m × n pixels is defined by

t_SC(m,n) = [(m + 2 − d_V) + (n + 2 − d_H)] × t_CLK, (10)

where d_V is the length of the vertical projection, d_H is the length of the horizontal projection, and t_CLK is the clock period of the FPGA. Hence, the total execution time of the circuit [t_CIR(m,n)] in the FPGA is defined by

t_CIR(m,n) = t_HP(m,n) + t_SC(m,n) + t_FIX, (11)

where t_FIX is the execution time of the calculation circuit and the logic controlling those circuits. t_FIX does not scale with higher-resolution images because that circuit remains unchanged.
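The timing model above can be evaluated numerically as follows (t_PD and t_FIX below are illustrative placeholders, not measured figures; only t_CLK = 10 ns follows from the 100 MHz clock mentioned earlier):

```python
# Numerical sketch of the execution-time model: t_HP for the OR-gate
# projection chain, t_SC for the searching circuit, and a fixed t_FIX for the
# calculation circuit and its control logic.

def t_hp(m, n, t_pd):
    # The longest chain of two-input OR gates has max(m, n) - 1 gates.
    return (max(m, n) - 1) * t_pd

def t_sc(m, n, d_v, d_h, t_clk):
    # Step-by-step scans over both projected registers.
    return ((m + 2 - d_v) + (n + 2 - d_h)) * t_clk

def t_cir(m, n, d_v, d_h, t_pd, t_clk, t_fix):
    return t_hp(m, n, t_pd) + t_sc(m, n, d_v, d_h, t_clk) + t_fix

# 250 x 250 image, 10 x 10 object, t_CLK = 10 ns (100 MHz clock),
# with assumed placeholders t_PD = 1 ns and t_FIX = 10 us:
total = t_cir(250, 250, 10, 10, 1e-9, 10e-9, 10e-6)
print(total)  # ~15.1 us for these placeholder values
```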
Ideally, only a single object appears in the image. When several objects appear simultaneously, the system assumes that they are fragments of one large object. This case is handled by the searching circuit because it finds the boundaries of the outermost objects. Nevertheless, further investigation into multiobject analysis is necessary.

Conclusion
The potential of OEIC technology for massive parallel processing applications has been discussed. At the current stage, a high-speed motion detector using an image processing circuit has been implemented in an FPGA. A PC was used to emulate the OEIC and send binary images of 250 × 250 pixels to the FPGA at a frame rate of 71 frames/s. The detectable speeds ranged from 71 to 17,685 pixels/s. The execution time of the system was 7.05 ms, of which 7 ms was spent handling the data transfer between the PC and the FPGA. This means that, if the parallel data transfer between the OEIC and the FPGA is realized in the near future, the time required for data transfer between the OEIC and the FPGA could become negligible, and the real-time performance may increase up to 20,000 frames/s.