Custom Object Detection in the Browser Using TensorFlow.js

2024.03.23 磐创AI

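    What is object detection?
    Object detection is one of the most widely used computer vision techniques for identifying and locating objects in images and videos. Computer vision, as the name suggests, is a computer's ability to see and recognize objects in a way that resembles human vision. Object detection can be thought of as image recognition with an extra capability: the algorithm not only recognizes the objects in an image or video, it also localizes them. In other words, it draws a bounding box around each object in the image or video frame.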

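    Example of object detection
    Object detection algorithms
    Some popular algorithms used for object detection are:
    R-CNN: Region-based Convolutional Neural Network
    Fast R-CNN: Fast Region-based Convolutional Neural Network
    Faster R-CNN: Faster Region-based Convolutional Neural Network
    YOLO: You Only Look Once
    SSD: Single Shot Detector
    Each algorithm has its own strengths and weaknesses. How these algorithms work is beyond the scope of this article.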

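    Architecture of a convolutional neural network
    Most of us remember the good old days of coming home from school in the evening and switching on the TV to watch our favorite cartoons. So how can we relive those days?
    Today we will learn how to build an end-to-end custom object detection web application using TensorFlow.js. We will train a model on a custom dataset and deploy it in the browser as a full-fledged web app.
    If you are excited about building your own object detection model, what are you waiting for? Let's dive in.
    In this article we will build a model that detects cartoon characters in real time in the browser. Feel free to choose your own dataset; the overall process stays the same.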
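    Creating the dataset
    The first step is to collect images of the objects you want to detect. Here the favorite cartoon characters are Doraemon, Scooby-Doo, Mickey Mouse, Mr. Bean and McQueen, and they form the classes of the model. About 60 images were collected for each of these five classes. This is what the dataset looks like.

    Remember: garbage in, garbage out. For the best results, make sure you collect enough images for the model to learn the features from.
    Once you have collected enough data, move on to the next step.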
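    Labeling the dataset
    To label the objects in the dataset, we need an annotation tool. Many tools can do this, for example LabelImg, Intel OpenVINO CVAT and VGG Image Annotator.
    All of these are among the best annotation tools in the industry, but LabelImg turned out to be the easiest to use. Feel free to pick whichever annotation tool you prefer, or simply follow along with this article.
    Below is an example of an annotated image: a bounding box around the region of interest (the object) together with its label name.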

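    Image annotation
    For every annotated image, a corresponding XML file is generated. It contains metadata such as the bounding box coordinates, class name, image name and image path.
    This information is needed when training the model. We will get to that part later.
    Here is an example of what an XML annotation file looks like.

    Annotation XML file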
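    Since the annotation screenshot does not survive here, the following is a minimal sketch of what a LabelImg (Pascal VOC) annotation file typically contains and how its fields can be read in Python. The file name, path, image size, class name and coordinates in the sample are made up for illustration; they are not taken from the article's dataset, and parse_voc is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical Pascal VOC annotation, in the layout LabelImg writes.
SAMPLE_XML = """
<annotation>
  <folder>images</folder>
  <filename>doraemon_01.jpg</filename>
  <path>/home/user/images/doraemon_01.jpg</path>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>Doraemon</name>
    <bndbox>
      <xmin>120</xmin><ymin>80</ymin><xmax>420</xmax><ymax>400</ymax>
    </bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text):
    """Return one dict per annotated object in a Pascal VOC XML string."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append({
            "filename": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
            "class": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
        })
    return rows


annotations = parse_voc(SAMPLE_XML)
print(annotations)
```

    Records of this kind (file name, image size, class and box corners) are the information that is later flattened into CSV and then into TFRecords for training.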
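    Once all images are annotated correctly, split the dataset into training and test sets using the following directory structure:

    Directory structure of the dataset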
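    The directory layout itself is shown only as an image, so the sketch below illustrates just the splitting step: shuffling the annotated image names and dividing them into a training list and a test list. The 80/20 ratio, the fixed seed, the file stems and the function name split_dataset are assumptions for illustration; the article does not state which ratio was used.

```python
import random


def split_dataset(names, train_fraction=0.8, seed=42):
    """Shuffle names reproducibly and split them into (train, test) lists."""
    shuffled = sorted(names)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file stems; each stands for an image plus its XML annotation.
stems = [f"image_{i:03d}" for i in range(300)]
train, test = split_dataset(stems)
print(len(train), len(test))
```

    Keeping each image together with its XML file when moving them into the two folders matters more than the exact ratio, because an image without its annotation cannot be used for training.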
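    Uploading the dataset to Google Drive
    Sign in to your Google account and upload the zipped dataset to your Google Drive. We will fetch this dataset during model training. Make sure the upload was not interrupted by network issues and completed fully.

    Dataset on Google Drive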
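    Cloning the repository on your local machine
    https://github.com/NSTiwari/TensorFlow.js-Custom-Object-Detection
    This repository contains a Colab notebook named Custom_Object_Detection_using_TensorFlow_js.ipynb.
    Open Google Colab and upload this notebook there. Now we can start actually training our object detection model.
    Because we are using Google Colab, there is no need to install TensorFlow and other libraries on your local machine, which avoids the hassle of manual installation and the errors that a faulty install can cause.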
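    Configuring Google Colab
    After uploading the notebook to Google Colab, check that the runtime type is set to "GPU". To do this, click Runtime -> Change runtime type.

    Google Colab settings
    In the notebook settings, if the hardware accelerator is set to 'GPU' as shown below, you are good to go.

    Google Colab settings
    If all five steps above completed successfully, the real game begins: model training.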
    Training the model
    Configure all the necessary training parameters.

    Mount Google Drive:
    Access the dataset you stored on Google Drive in step 3.
    from google.colab import drive
    drive.mount('/content/drive')
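    Install the TensorFlow Object Detection API:
    Install and set up the TensorFlow Object Detection API, Protobuf and the other necessary dependencies.
    Dependencies:
    Most of the required dependencies come pre-installed in Google Colab. The only additional package we need to install is TensorFlow.js, which is used to convert our trained model into a web-compatible model.
    Protocol Buffers:
    The TensorFlow Object Detection API relies on what are called protocol buffers (also known as protobufs). Protobufs are a language-neutral way of describing information. That means you can write a protobuf definition once and then compile it for use in other languages such as Python, Java or C++. The protoc command used in the notebook compiles all the protocol buffers in the object_detection/protos folder for Python.
    Environment:
    To use the Object Detection API, we need to add it to our PYTHONPATH together with slim, which contains code for training and evaluating several widely used convolutional neural network (CNN) image classification models.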
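    A minimal sketch of the environment step in a Colab cell, assuming the TensorFlow models repository was cloned to /content/models (the location used by the commands later in this article). The helper name add_to_pythonpath is hypothetical; the notebook may set the variable differently.

```python
import os
import sys


def add_to_pythonpath(paths, environ=os.environ, sys_path=sys.path):
    """Expose directories to this interpreter and to child processes."""
    existing = [p for p in environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    for path in paths:
        if path not in existing:
            existing.append(path)   # seen by subprocesses such as `!python ...`
        if path not in sys_path:
            sys_path.append(path)   # seen by imports in the current notebook kernel
    environ["PYTHONPATH"] = os.pathsep.join(existing)
    return environ["PYTHONPATH"]


# Assumed clone location of the TensorFlow models repository.
RESEARCH_DIR = "/content/models/research"
SLIM_DIR = RESEARCH_DIR + "/slim"
print(add_to_pythonpath([RESEARCH_DIR, SLIM_DIR]))
```

    Both places are updated because a notebook runs code in two ways: cells that import object_detection use the kernel's sys.path, while shell commands started with ! inherit the PYTHONPATH environment variable.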

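    Test the setup:
    Run the model builder test to verify that everything was set up successfully.
    !python object_detection/builders/model_builder_tf1_test.py
    Copy the dataset folder from Google Drive:
    Fetch the image and annotation dataset saved on Drive.
    !unzip /content/drive/MyDrive/TFJS-Custom-Detection -d /content/
    %cd /content/
    %mkdir data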
    Load the xml_to_csv.py file:
    !wget https://raw.githubusercontent.com/NSTiwari/TensorFlow.js-Custom-Object-Detection/master/xml_to_csv.py -P /content/TFJS-Custom-Detection/
    Convert the XML annotations to CSV files:
    All PascalVOC labels are converted to CSV files for the training and test data.
    %cd /content/
    !python TFJS-Custom-Detection/xml_to_csv.py
    Create the labelmap.pbtxt file in the data folder:
    Consider the following example:
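    The original example is not reproduced here, so the following sketch shows one way to generate a label map in the text format the Object Detection API reads: one item block per class, with IDs starting at 1 (ID 0 is reserved for the background). The class names are assumed from the five cartoon characters mentioned earlier and must match the names used while annotating; build_label_map is a hypothetical helper.

```python
def build_label_map(class_names):
    """Render class names as label map text, one `item` block per class."""
    blocks = []
    for class_id, name in enumerate(class_names, start=1):
        blocks.append(
            "item {\n"
            f"  id: {class_id}\n"
            f"  name: '{name}'\n"
            "}\n"
        )
    return "\n".join(blocks)


# Assumed class names; use exactly the labels from your own annotations.
CLASSES = ["Doraemon", "Scooby Doo", "Mickey Mouse", "Mr. Bean", "McQueen"]
label_map_text = build_label_map(CLASSES)
print(label_map_text)

# In the notebook this text would be saved as data/labelmap.pbtxt, for example:
# with open("/content/data/labelmap.pbtxt", "w") as f:
#     f.write(label_map_text)
```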

    Create the TFRecords:
    Download the generate_tf_records.py file.
    !wget https://raw.githubusercontent.com/NSTiwari/TensorFlow.js-Custom-Object-Detection/master/generate_tf_records.py -P /content/
    !python generate_tf_records.py -l /content/data/labelmap.pbtxt -o data/train.record -i TFJS-Custom-Detection/images -csv TFJS-Custom-Detection/train_labels.csv
    !python generate_tf_records.py -l /content/data/labelmap.pbtxt -o data/val.record -i TFJS-Custom-Detection/images -csv TFJS-Custom-Detection/val_labels.csv
    Navigate to the models/research directory:
    %cd /content/models/research
    Download the base model:
    Training a model from scratch can take a lot of compute time. Instead, we apply transfer learning on a pre-trained model, which greatly reduces both the computation and the time required. The base model we will use is MobileNet, which is very fast.
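    The download cell itself is not shown in this article. As a rough sketch, a pre-trained checkpoint from the TensorFlow 1 detection model zoo can be fetched and unpacked as below. The specific model name ssd_mobilenet_v2_coco_2018_03_29 is an assumption (the article only says "MobileNet"), as are the helper names and the commented-out destination directory.

```python
import os
import tarfile
import urllib.request

# Base URL of the TensorFlow 1 detection model zoo archives.
MODEL_ZOO_URL = "http://download.tensorflow.org/models/object_detection/"


def model_archive_url(model_name):
    """Build the download URL for a model zoo archive."""
    return MODEL_ZOO_URL + model_name + ".tar.gz"


def download_base_model(model_name, dest_dir):
    """Download and extract a model archive; return the extracted folder path."""
    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, model_name + ".tar.gz")
    urllib.request.urlretrieve(model_archive_url(model_name), archive_path)
    with tarfile.open(archive_path) as archive:
        archive.extractall(dest_dir)
    return os.path.join(dest_dir, model_name)


MODEL_NAME = "ssd_mobilenet_v2_coco_2018_03_29"  # assumed MobileNet variant
print(model_archive_url(MODEL_NAME))

# Hypothetical usage in the notebook (requires network access):
# CHECKPOINT_PATH = download_base_model(MODEL_NAME, "/content/models/research/pretrained_model")
```

    The extracted folder is expected to hold the model.ckpt files that the fine_tune_checkpoint setting in the pipeline configuration points to.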

    Model configuration:
    Before training starts, we need to configure the training pipeline by specifying the paths to the label map, the TFRecords and the checkpoint. The default batch size is 128, which also needs to be changed because it is too large for Colab to handle.
    import os
    import re
    from google.protobuf import text_format
    from object_detection.utils import config_util
    from object_detection.utils import label_map_util
    # CONFIG_TYPE, LABEL_MAP_PATH, TRAIN_RECORD_PATH, VAL_RECORD_PATH, CHECKPOINT_PATH
    # and DATA_PATH are expected to be set in the training-parameters cell.
    pipeline_skeleton = '/content/models/research/object_detection/samples/configs/' + CONFIG_TYPE + '.config'
    configs = config_util.get_configs_from_pipeline_file(pipeline_skeleton)
    label_map = label_map_util.get_label_map_dict(LABEL_MAP_PATH)
    num_classes = len(label_map.keys())
    meta_arch = configs["model"].WhichOneof("model")
    # Values that replace the defaults of the sample pipeline config.
    override_dict = {
        'model.{}.num_classes'.format(meta_arch): num_classes,
        'train_config.batch_size': 24,
        'train_input_path': TRAIN_RECORD_PATH,
        'eval_input_path': VAL_RECORD_PATH,
        'train_config.fine_tune_checkpoint': os.path.join(CHECKPOINT_PATH, 'model.ckpt'),
        'label_map_path': LABEL_MAP_PATH
    }
    configs = config_util.merge_external_params_with_configs(configs, kwargs_dict=override_dict)
    pipeline_config = config_util.create_pipeline_proto_from_configs(configs)
    config_util.save_pipeline_config(pipeline_config, DATA_PATH)
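    Start training:
    Run the cell below to start training the model. Training is launched by calling the model_main script and passing it the following arguments:
    · The location of the pipeline.config we created.
    · The location where we want to save the model.
    · The number of steps we want to train the model for (the longer you train, the more potential the model has to learn).
    · The number of evaluation steps (or how often the model is tested), which gives us an idea of how well the model is doing.
    !rm -rf $OUTPUT_PATH
    !python -m object_detection.model_main \
      --pipeline_config_path=$DATA_PATH/pipeline.config \
      --model_dir=$OUTPUT_PATH \
      --num_train_steps=$NUM_TRAIN_STEPS \
      --num_eval_steps=100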
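    To get a feel for what the number of training steps means, it helps to convert steps into passes over the dataset (epochs). Each step processes one batch, so epochs = steps x batch size / number of training images. The sketch below uses the batch size of 24 from the configuration and assumes 240 training images (80% of roughly 300); the image count is an assumption, not a figure from the article.

```python
def steps_to_epochs(num_steps, batch_size, num_train_images):
    """Approximate number of passes over the training set."""
    return num_steps * batch_size / num_train_images


BATCH_SIZE = 24          # batch size set in the pipeline configuration
NUM_TRAIN_IMAGES = 240   # assumed size of the training split

for steps in (500, 5000):
    epochs = steps_to_epochs(steps, BATCH_SIZE, NUM_TRAIN_IMAGES)
    print(f"{steps} steps is about {epochs:.0f} epochs")
```

    On a dataset this small, even a few hundred steps already revisit every image many times, which is why watching the evaluation results matters more than simply raising the step count.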
    Export the inference graph:
    A checkpoint is generated after every 500 training steps. Each checkpoint is a snapshot of your model at that point in training.
    If training crashes for some reason, such as a network or power failure, you can resume from the last checkpoint instead of starting from scratch.
    import os
    import re
    # Find the checkpoint with the highest step number in OUTPUT_PATH.
    regex = re.compile(r"model\.ckpt-([0-9]+)\.index")
    numbers = [int(regex.search(f).group(1)) for f in os.listdir(OUTPUT_PATH) if regex.search(f)]
    TRAINED_CHECKPOINT_PREFIX = os.path.join(OUTPUT_PATH, 'model.ckpt-{}'.format(max(numbers)))
    print(f'Using {TRAINED_CHECKPOINT_PREFIX}')
    !rm -rf $EXPORTED_PATH
    !python -m object_detection.export_inference_graph \
      --pipeline_config_path=$DATA_PATH/pipeline.config \
      --trained_checkpoint_prefix=$TRAINED_CHECKPOINT_PREFIX \
      --output_directory=$EXPORTED_PATH
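    Test the model:
    Now let's test the model on a few images. Remember that the model has only been trained for 500 steps, so the accuracy may not be very high. Run the cell below to test the model yourself and see how well it was trained.
    Note: this command sometimes fails to run; if so, try running it again. Also, try training the model for 5,000 steps and see how the accuracy changes.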
    from IPython.display import display, Javascript, Image
    from google.colab.output import eval_js
    from base64 import b64decode
    import tensorflow as tf
    # Use JavaScript to take a photo with the webcam.
    def take_photo(filename, quality=0.8):
        js = Javascript('''
            async function takePhoto(quality) {
                const div = document.createElement('div');
                const capture = document.createElement('button');
                capture.textContent = 'Capture';
                div.appendChild(capture);
                const video = document.createElement('video');
                video.style.display = 'block';
                const stream = await navigator.mediaDevices.getUserMedia({video: true});
                document.body.appendChild(div);
                div.appendChild(video);
                video.srcObject = stream;
                await video.play();
                // Resize the output to fit the video element.
                google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);
                // Wait for Capture to be clicked.
                await new Promise((resolve) => capture.onclick = resolve);
                const canvas = document.createElement('canvas');
                canvas.width = video.videoWidth;
                canvas.height = video.videoHeight;
                canvas.getContext('2d').drawImage(video, 0, 0);
                stream.getVideoTracks()[0].stop();
                div.remove();
                return canvas.toDataURL('image/jpeg', quality);
            }
        ''')
        display(js)
        # The JavaScript returns a data URL; decode the base64 part and save it.
        data = eval_js('takePhoto({})'.format(quality))
        binary = b64decode(data.split(',')[1])
        with open(filename, 'wb') as f:
            f.write(binary)
        return filename
    try:
        take_photo('/content/photo.jpg')
    except Exception as err:
        # Errors will be thrown if the user does not have a webcam or if they do not
        # grant the page permission to access it.
        print(str(err))
    # Use the captured photo to make predictions.
    %matplotlib inline
    import os
    import numpy as np
    from matplotlib import pyplot as plt
    from PIL import Image as PImage
    from object_detection.utils import visualization_utils as vis_util
    from object_detection.utils import label_map_util
    # Load the labels.
    category_index = label_map_util.create_category_index_from_labelmap(LABEL_MAP_PATH, use_display_name=True)
    # Load the frozen model into a new graph.
    path_to_frozen_graph = os.path.join(EXPORTED_PATH, 'frozen_inference_graph.pb')
    detection_graph = tf.Graph()
    with detection_graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(path_to_frozen_graph, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def, name='')
    with detection_graph.as_default():
        with tf.Session(graph=detection_graph) as sess:
            # Define the input and output tensors for detection_graph.
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Each box represents a part of the image where a particular object was detected.
            detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represents the level of confidence for one of the objects.
            # The score is shown on the result image, together with the class label.
            detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
            detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            image = PImage.open('/content/photo.jpg')
            # The array-based representation of the image is used later to prepare the
            # result image with boxes and labels on it.
            (im_width, im_height) = image.size
            image_np = np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)
            # Expand dimensions since the model expects images to have shape [1, None, None, 3].
            image_np_expanded = np.expand_dims(image_np, axis=0)
            # Actual detection.
            (boxes, scores, classes, num) = sess.run(
                [detection_boxes, detection_scores, detection_classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # Visualization of the results of a detection.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)
            plt.figure(figsize=(12, 8))
            plt.imshow(image_np)
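    The detection tensors returned by sess.run can also be post-processed without the visualization helper. The Object Detection API reports each box as normalized [ymin, xmin, ymax, xmax] values between 0 and 1, so they have to be scaled by the image size to obtain pixel coordinates. The sketch below does this in plain Python on made-up values; the 0.5 score threshold and the function name to_pixel_detections are assumptions.

```python
def to_pixel_detections(boxes, scores, classes, im_width, im_height, min_score=0.5):
    """Keep confident detections and convert normalized boxes to pixel corners."""
    results = []
    for (ymin, xmin, ymax, xmax), score, class_id in zip(boxes, scores, classes):
        if score < min_score:
            continue  # skip low-confidence detections
        results.append({
            "class_id": int(class_id),
            "score": float(score),
            "left": round(xmin * im_width),
            "top": round(ymin * im_height),
            "right": round(xmax * im_width),
            "bottom": round(ymax * im_height),
        })
    return results


# Made-up values standing in for np.squeeze(boxes), np.squeeze(scores)
# and np.squeeze(classes) from the cell above.
boxes = [[0.25, 0.125, 0.75, 0.5], [0.0, 0.0, 0.5, 0.5]]
scores = [0.9, 0.2]
classes = [1.0, 3.0]
detections = to_pixel_detections(boxes, scores, classes, im_width=640, im_height=480)
print(detections)
```

    The same conversion is needed again in the browser, where the web app has to draw the boxes on a canvas or video element itself.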
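    Convert the model to TFJS:
    The model we exported works in Python. To deploy it in a web browser, however, we need to convert it to the TensorFlow.js format so that it can run directly in the browser.
    In addition, the model reports detected objects only as the numeric IDs defined in labelmap.pbtxt. So we also need to create a JSON list of all the labels that can be mapped to those IDs.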
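    The label list can be derived from the label map with a few lines of Python. The sketch below parses label map text with a regular expression and serializes the names, ordered by ID, as JSON. The regular-expression approach, the sample label map and the output shape (a plain JSON array) are assumptions for illustration; the notebook may build the list differently, and the web app must read whatever shape you produce.

```python
import json
import re

# Sample label map text in the labelmap.pbtxt format (class names assumed).
LABEL_MAP_TEXT = """
item {
  id: 2
  name: 'Scooby Doo'
}
item {
  id: 1
  name: 'Doraemon'
}
"""


def labels_from_label_map(text):
    """Return class names ordered by id, so that list index i maps to id i + 1."""
    pattern = re.compile(r"id:\s*(\d+)\s*name:\s*['\"]([^'\"]+)['\"]")
    pairs = [(int(class_id), name) for class_id, name in pattern.findall(text)]
    return [name for _, name in sorted(pairs)]


labels_json = json.dumps(labels_from_label_map(LABEL_MAP_TEXT))
print(labels_json)
```

    Because label map IDs start at 1 while list indices start at 0, the code that looks up a label in the browser has to subtract 1 from the detected class ID when using a list like this.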

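    Download the model:
    The TFJS model is now ready to be downloaded.
    Note: this command sometimes does not run or throws an error. If that happens, try running it again.
    You can also download the model by right-clicking the model_web.zip file in the file browser in the left sidebar.
    from google.colab import files
    files.download('/content/model_web.zip')
    If you have made it this far without trouble, congratulations: you have successfully trained the model.
    Deploying the model in a web application using TensorFlow.js
    After downloading the TFJS model, copy the model_web folder into the TensorFlow.js-Custom-Object-Detection/React_Web_App/public directory.
    Now run the following commands:
    cd TensorFlow.js-Custom-Object-Detection/React_Web_App
    npm install
    npm start
    Finally, open localhost:3000 in your web browser and test the model yourself.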

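Object detection output of the TF.js model
Congratulations on creating an end-to-end custom object detection model with TensorFlow and deploying it in a web application using TensorFlow.js.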