Teaching Tool for Fun Learning of AI-based Banknote Detection Technology

This paper presents a teaching tool for schoolchildren to learn artificial intelligence (AI) technology through which a variety of banknotes can be recognized. This was done by first using a pretrained YOLOv3 object detection model. Secondly, transfer learning was conducted on the pretrained model using 11 collected banknotes, including US$, Euro, Japanese Yen, and NT$. The banknote detection model was experimentally validated to give an average precision (AP) of up to 99.09% at an Intersection over Union (IoU) threshold of 0.8. Once a banknote is successfully recognized, the face value and the country name thereon are displayed, and schoolchildren can access suggested websites, i.e., Wikipedia, Google Maps, and the Bank of Taiwan, to learn more about the exchange rate between currencies and the history and location of the country that issued the banknote. Consequently, schoolchildren can have fun using this tool and acquire a more global outlook. Moreover, they may be motivated to become AI professionals in the future.


Introduction
Given their rapid advances, artificial intelligence (AI) technologies have been widely applied and already affect our daily lives. Among AI-related technologies, deep learning is a hot topic, and considerable progress has been made in addressing image recognition problems in computer vision. (1)(2)(3)(4) A clear advantage of a deep learning model is that significantly improved recognition accuracy and robustness can be achieved. Furthermore, an input image can be directly applied to the model without conventional preprocessing.
In an object detection task, it is necessary to first locate and then recognize specific objects in an image or a video. Object detection is an event in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), hosted by ImageNet. (20) Today, commonly used object detection models include Regions with CNN features (R-CNN), (9) Faster R-CNN, (10) Single Shot MultiBox Detector (SSD), (11,12) and You Only Look Once (YOLO). (13)(14)(15) A model trained on the COCO dataset (21) can recognize up to 80 types of objects, including humans, vehicles, cats, and dogs, and can be widely applied to fields such as smart homes, smart security, smart traffic, and intelligent image analysis and retrieval.
However, well-established object detection techniques have a limitation: only the object types on which a model was pretrained can be detected, such as the 80 types of objects in the COCO dataset. Therefore, transfer learning must be conducted so as to detect objects not contained in the COCO dataset. In this manner, object detection techniques can be applied to a wide variety of disciplines.
In light of this, we present in this paper a teaching tool for banknote detection, through which schoolchildren, especially those from seven to ten years old, can learn AI technology in a fun way. Once a banknote is successfully recognized, the face value and the country name are displayed instantly, and suggested websites, i.e., Wikipedia, Google Maps, and the Bank of Taiwan, are listed. Schoolchildren can access the listed websites to learn more about the country that issued the banknote, and consequently acquire a more global outlook. Hopefully, this tool will appeal to schoolchildren and encourage them to engage in the AI industry in the future.
The presented AI-based teaching tool was developed using a pretrained YOLOv3 object detection model. (15) Transfer learning was conducted on the model using a variety of collected banknotes. Schoolchildren can use the teaching tool to access the internet for learning purposes. The YOLOv3 model is acknowledged as an efficient object detection model with a satisfactory mean average precision (mAP), a measure of object detection performance. (21,22) This feature gives the YOLOv3 model a clear advantage over its counterparts. This paper is organized as follows. Section 2 describes the YOLOv3 model used for object detection, Sect. 3 details the operation of the presented AI-based teaching tool, Sect. 4 gives a discussion of experimental results, and Sect. 5 concludes the paper.

YOLOv3 Object Detection
YOLO, short for you only look once, is a real-time convolution-based object detection algorithm. In reality, real-time detection can be carried out well using YOLO but at the cost of an acceptable degradation of precision. As its name indicates, YOLOv3 is the third version of YOLO (15) and has a high speed. Moreover, as experimentally validated in Ref. 15, YOLOv3-320 has a slightly higher mAP and runs three times faster than SSD321. Major improvements in YOLOv3, as compared with earlier versions, are detailed as follows.
Firstly, YOLOv3 employs Darknet-53 as the backbone, (15) which is an upgraded version of Darknet-19 used in YOLOv2. In addition to more layers in the backbone, ResNet and Feature Pyramid Networks (FPN) were introduced into Darknet-53 for the following reasons: the vanishing gradient problem due to more layers in the backbone can be resolved using ResNet, and small objects, which were a major problem in the earlier versions, can be well detected using the FPN structure. As in YOLOv2, YOLOv3 lacks a fully connected (FC) layer, and consequently, there is no limitation on input image dimensions, except that they must be multiples of 32.
In YOLOv3, multiscale detection is carried out using FPN. More precisely, detection is performed at three scales. For example, three feature maps of sizes 13 × 13, 26 × 26, and 52 × 52 are employed for an input image of size 416 × 416. A small feature map is used to detect large objects, and vice versa. Moreover, three anchor boxes are employed for object detection at each scale, that is, a total of nine anchor boxes are used. As a consequence, 13 × 13 × 3 + 26 × 26 × 3 + 52 × 52 × 3 = 10647 bounding boxes in total are predicted in this case, which is more than 12 times as many as in a YOLOv2 counterpart.
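The bounding box count above can be checked with a short calculation. The following is a minimal sketch, not the authors' code: the function name is hypothetical, the strides of 32, 16, and 8 are inferred from the 13 × 13, 26 × 26, and 52 × 52 feature maps for a 416 × 416 input, and the YOLOv2 figure assumes five anchor boxes on a single 13 × 13 feature map, which is not stated in the text.

```python
def yolo_v3_box_count(input_size=416, strides=(32, 16, 8), anchors_per_scale=3):
    """Return the feature map sizes and the total number of predicted boxes."""
    # Input dimensions must be multiples of 32 (the coarsest stride).
    if input_size % 32 != 0:
        raise ValueError("input size must be a multiple of 32")
    grid_sizes = [input_size // s for s in strides]
    total = sum(g * g * anchors_per_scale for g in grid_sizes)
    return grid_sizes, total


grid_sizes, total_boxes = yolo_v3_box_count(416)

# Assumption for comparison: YOLOv2 with 5 anchors on one 13 x 13 map.
yolo_v2_boxes = 13 * 13 * 5
ratio = total_boxes / yolo_v2_boxes

print(grid_sizes, total_boxes, round(ratio, 1))
```

Under these assumptions, the ratio comes out slightly above 12, in line with the statement above.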
Object detection generates two outputs: object localization and classification. The former was referred to as the bounding box prediction in Ref. 15, which predicted the coordinates of the bounding boxes and the confidence scores of an object, and the latter was referred to as the class prediction therein. For training purposes, a loss function is defined as the sum of three terms: the loss of bounding box offsets, the loss of object confidence, and the loss of class prediction. In the bounding box prediction, $(b_x, b_y)$ represents the coordinates of the centroid of the bounding box, $(c_x, c_y)$ represents the offset of the grid cell from the top-left corner of the image, and $(b_w, b_h)$ and $(p_w, p_h)$ represent the widths and heights of the predicted bounding box and the anchor box (prior), respectively, following Ref. 15.
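The symbols defined above correspond to the bounding box parameterization of Ref. 15. A minimal sketch of the decoding step is given below. The raw network outputs t_x, t_y, t_w, and t_h are not named in the text and follow the notation of Ref. 15, in which (p_w, p_h) denote the width and height of the anchor box prior; the function names are hypothetical, and all quantities are assumed to be in grid-cell units.

```python
import math


def sigmoid(x):
    """Logistic function, which keeps the centroid inside its grid cell."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Map raw outputs (t_*) to a box centroid (b_x, b_y) and size (b_w, b_h)."""
    b_x = sigmoid(t_x) + c_x   # centroid x: cell offset plus in-cell position
    b_y = sigmoid(t_y) + c_y   # centroid y
    b_w = p_w * math.exp(t_w)  # width: anchor prior scaled exponentially
    b_h = p_h * math.exp(t_h)  # height
    return b_x, b_y, b_w, b_h


# Zero raw outputs place the centroid at the cell center and keep the prior size.
box = decode_box(0.0, 0.0, 0.0, 0.0, c_x=6, c_y=6, p_w=3.0, p_h=2.0)
print(box)
```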

Proposed System
Our aim was to develop a teaching tool for schoolchildren to learn AI-related technologies. Schoolchildren are expected to have fun using the tool, become interested in AI technologies, and perhaps even aspire to become AI professionals in the future. Once a banknote is successfully recognized, the face value and the country name thereon are listed immediately. Schoolchildren can access suggested websites, i.e., Wikipedia, Google Maps, and the Bank of Taiwan, for more information, e.g., the exchange rate between currencies and the history and location of the country that issued the banknote. Hopefully, this will help schoolchildren to acquire a more global outlook. Figure 1 illustrates the flow of the AI-based banknote detection tool. As can be seen therein, an image is captured using a webcam as the first step. Subsequently, the captured image is input into the AI-based banknote detection model. If a banknote is identified in the image, it is framed, and the information thereon, i.e., the face value and country name, is then displayed. A number of suggested websites, as mentioned previously, are also listed for users. Otherwise, the detection tool waits for the next input image. In this way, schoolchildren can familiarize themselves with the use of this AI-based teaching tool.
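The flow described above can be sketched for a single frame as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the detector is passed in as a hypothetical callable that returns either None or a dictionary with the country, face value, and bounding box, and the URL patterns for the three suggested websites are assumptions chosen for illustration.

```python
from urllib.parse import quote


def suggested_websites(country):
    """Build links for a recognized country (URL patterns are assumptions)."""
    return {
        "Wikipedia": "https://en.wikipedia.org/wiki/" + quote(country.replace(" ", "_")),
        "Google Maps": "https://www.google.com/maps/search/?api=1&query=" + quote(country),
        "Bank of Taiwan": "https://rate.bot.com.tw/xrt",
    }


def process_frame(frame, detect):
    """Run one step of the flow: detect, then frame and label, or wait."""
    result = detect(frame)  # hypothetical detector interface
    if result is None:
        return None  # no banknote found: wait for the next input image
    return {
        "label": f"{result['country']} {result['face_value']}",
        "box": result["box"],
        "websites": suggested_websites(result["country"]),
    }


def stub_detect(frame):
    """Stand-in for the trained model, used only to exercise the flow."""
    if frame == "banknote":
        return {"country": "Japan", "face_value": "1000 yen", "box": (40, 60, 300, 180)}
    return None


output = process_frame("banknote", stub_detect)
print(output["label"], output["websites"]["Wikipedia"])
```

In a real tool, the frame would come from the webcam and the stub would be replaced by the trained detection model.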
This work was developed using a pretrained YOLOv3 model, whereon transfer learning was carried out. There were two tasks before conducting transfer learning. The first was to collect training data, that is, a variety of banknotes having different denominations and issued by different countries. The second was to label the collected training data, including bounding boxes and classifications. Table 1 lists the development environment in which the presented banknote detection system was developed. As can be seen therein, the code was written in Python, and libraries including Keras, TensorFlow, OpenCV, and NumPy were used. The hardware consists of a PC, a web camera, and a GeForce GTX 1060Ti graphics card. Table 2 lists the collected training data, that is, a total of 11 banknotes, each including the obverse and reverse sides. Since the image on the obverse side is very different from that on the reverse side, there are 22 banknote images in this work, numbered 1-22, as shown below the "image number" field in Table 2. Finally, transfer learning was conducted on the pretrained YOLOv3 model using the collected training data, and the model was validated using the testing data. Table 3 gives all the numbered banknote images.

Experimental Results
An object detection model not only needs to recognize an object, but also has to determine a bounding box thereof. Precision is used as a performance measure of a detection model and was tested for our teaching tool. As listed in the rightmost column of Table 2, each collected banknote image was assigned 20 items of testing data, that is, there were 440 pieces of testing data in total. For unbiased testing, no items of training data were reused as items of testing data.
Note that "+" and "-" identify the obverse and reverse sides of a banknote, respectively.
As its name indicates, the Intersection over Union (IoU) refers to the intersection area between two objects divided by the union area, expressed as
$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|},$$
where A and B represent the predicted and ground truth bounding boxes, respectively. A high value of IoU indicates that there is a good match between A and B. Therefore, the precision was evaluated using IoU as a threshold. For example, the IoU threshold was set to 0.5 in the PASCAL VOC challenge. (22) The precision for the jth classification is defined as
$$\mathrm{Precision}(c_j) = \frac{TP(c_j)}{TP(c_j) + FP(c_j)}, \quad j = 1, \ldots, N_c,$$
where $N_c = 22$ represents the number of classifications, and $TP(c_j)$ and $FP(c_j)$ are the numbers of true positives (TPs) and false positives (FPs), respectively, in the recognition of the object of the jth type. TP means that the predicted object type matches the ground truth type and IoU is not less than the chosen threshold. Otherwise, the predicted outcome is classified as an FP. Table 4 lists the values of precision with IoU as a parameter. Note that 100% precision is achieved for all the images in the case of IoU ≥ 0.7. This observation also applies to the case of IoU ≥ 0.8, except that 95% precision is obtained for images 6, 7, 17, and 21. As compared with the previous two cases, the precision plunges across all the images in the case of IoU ≥ 0.9. The precision in each case is averaged and listed in Table 5. As can be seen therein, there is poor average precision (AP) in the case of IoU ≥ 0.9, that is, AP = 61.36%.
Table 4. Precision (%) for each banknote image with IoU as a parameter.
Image number   IoU ≥ 0.7   IoU ≥ 0.8   IoU ≥ 0.9
1              100         100         70
2              100         100         70
3              100         100         45
4              100         100         55
5              100         100         65
6              100         95          75
7              100         95          65
8              100         100         50
9              100         100         60
10             100         100         60
11             100         100         50
12             100         100         45
13             100         100         95
14             100         100         70
15             100         100         70
16             100         100         65
17             100         95          70
18             100         100         50
19             100         100         60
20             100         100         65
21             100         95          40
22             100         100         55
Figure 2 shows predicted and ground truth bounding boxes in red and green, respectively, for comparison purposes in each test case. The banknote images in Figs. 2(a) and 2(b) were detected with the highest and second highest values of IoU, respectively; in Figs. 2(c) and 2(d), they were detected with an IoU of approximately 0.9; in Figs. 2(e) and 2(f), they were detected with an IoU slightly higher than 0.8; and in Figs. 2(g) and 2(h), they were detected with an IoU below 0.8. The recognized currency, face value, and confidence score are also presented above the upper-left corner of the predicted bounding box in Figs. 2(a)-2(h).
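The IoU and the per-class precision used in this section can be sketched as follows. This is a minimal illustration rather than the evaluation code used in this work: boxes are assumed to be given as (x_min, y_min, x_max, y_max) tuples, the function names and the record layout are hypothetical, and a detection is counted as a TP when its IoU is not less than the threshold.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision(detections, iou_threshold):
    """Precision TP / (TP + FP) for one class.

    Each detection is (predicted_class, true_class, predicted_box, true_box).
    """
    if not detections:
        return 0.0
    tp = sum(
        1
        for pred_cls, true_cls, pred_box, true_box in detections
        if pred_cls == true_cls and iou(pred_box, true_box) >= iou_threshold
    )
    return tp / len(detections)  # every non-TP detection counts as an FP


truth = (0, 0, 10, 10)
detections = [
    (6, 6, (0, 0, 10, 10), truth),  # perfect overlap, IoU = 1.0
    (6, 6, (0, 0, 10, 8), truth),   # partial overlap, IoU = 0.8
]
print(precision(detections, 0.8), precision(detections, 0.9))
```

With the two example detections, the second one passes the 0.8 threshold but fails the 0.9 threshold, which mirrors how the precision drops as the threshold is raised.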
There is a satisfactory match between the predicted and ground truth bounding boxes if IoU ≥ 0.8, that is, an AP of up to 99.09%, as listed in Table 5. It must be stressed that the presented banknote detection model was developed as a teaching tool for schoolchildren and not as a counterfeit money detector. Therefore, an error in banknote recognition does not result in any loss. It is even possible that schoolchildren will be motivated to correct the error as the first step to becoming a young AI engineer.
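The averaged figures quoted above can be reproduced from the per-image precision values reported in Table 4. The following sketch simply averages those values over the 22 banknote images; the variable names are hypothetical.

```python
# Per-image precision (%) from Table 4: (IoU >= 0.7, IoU >= 0.8, IoU >= 0.9).
table4 = {
    1: (100, 100, 70), 2: (100, 100, 70), 3: (100, 100, 45),
    4: (100, 100, 55), 5: (100, 100, 65), 6: (100, 95, 75),
    7: (100, 95, 65), 8: (100, 100, 50), 9: (100, 100, 60),
    10: (100, 100, 60), 11: (100, 100, 50), 12: (100, 100, 45),
    13: (100, 100, 95), 14: (100, 100, 70), 15: (100, 100, 70),
    16: (100, 100, 65), 17: (100, 95, 70), 18: (100, 100, 50),
    19: (100, 100, 60), 20: (100, 100, 65), 21: (100, 95, 40),
    22: (100, 100, 55),
}
thresholds = (0.7, 0.8, 0.9)

# Average the precision over all images for each IoU threshold.
average_precision = {
    t: round(sum(row[i] for row in table4.values()) / len(table4), 2)
    for i, t in enumerate(thresholds)
}
print(average_precision)
```

The averages for IoU ≥ 0.8 and IoU ≥ 0.9 agree with the 99.09% and 61.36% stated in the text.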

Conclusions
This paper presented an AI-based teaching tool for schoolchildren. A variety of banknotes can be well recognized using the teaching tool, through which schoolchildren can obtain hands-on experience in AI technologies. A pretrained YOLOv3 model for object detection played a key role in this tool. Transfer learning was conducted on the pretrained model using collected banknote images. The banknote detection model was experimentally validated to perform well if IoU ≥ 0.8, that is, an AP of up to 99.09%. Finally, the model was implemented as a teaching tool.
Once a banknote is successfully recognized, relevant websites, i.e., Wikipedia, Google Maps, and the Bank of Taiwan, are displayed instantly, and schoolchildren can access them to learn about, e.g., exchange rates between currencies and the history and location of the country that issued the banknote, thereby acquiring a more global outlook. Hopefully, this teaching tool will appeal to children and motivate them to become AI engineers in the future.
Furthermore, a more efficient model, such as YOLOv4, will be employed in the near future so as to upgrade the performance of banknote recognition. In addition, another interesting teaching tool for schoolchildren is also planned.