What Is OpenCV? An Introduction to Basic Computer Vision Tasks

2024.03.23 磐创AI

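    If you are interested in, or planning to work on, anything involving images or video, computer vision is well worth considering. Computer vision (CV) is a branch of artificial intelligence (AI) that enables computers to extract meaningful information from images, video, and other visual input and to act on it. Examples include self-driving cars, automated traffic management, surveillance, and image-based quality inspection.
    What is OpenCV?
    OpenCV is a library aimed primarily at computer vision. It provides the tools you need for common CV work. "Open" stands for open source, and "CV" stands for computer vision.
    What will I learn?
    This article covers what you need to get started with computer vision using the OpenCV library: reading and displaying images, drawing, blending, transforming, filtering, and detecting features. By the end you should feel more confident and productive working with images in code.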
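    Reading and Displaying Images
    First, let's look at how to read an image and display it, which is the foundation of CV.
    Reading the image:
    import numpy as np
    import cv2 as cv
    import matplotlib.pyplot as plt
    # cv.imread returns a NumPy array, or None if the file cannot be read
    img = cv.imread('../input/images-for-computer-vision/tiger1.jpg')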
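    'img' holds the image as a NumPy array. Let's print its type and shape:
    print(type(img))
    print(img.shape)
    The shape of the NumPy array is (667, 1200, 3), where
    667 is the image height, 1200 is the image width, and 3 is the number of channels.
    This image has three color channels. The original image is in RGB form, but OpenCV reads images in BGR channel order by default, so we have to convert it back to RGB before displaying it with Matplotlib.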
    Displaying the image:
    # Convert the image from BGR to RGB for display
    img_convert = cv.cvtColor(img, cv.COLOR_BGR2RGB)
    plt.imshow(img_convert)

    Drawing on Images
    We can draw lines, shapes, and text on an image.
    # Rectangle
    color = (240, 150, 240)  # Color of the rectangle (B, G, R)
    # (100, 100) is the (x, y) of the top-left corner and (300, 300) is the (x, y) of the bottom-right corner.
    # For a filled rectangle, use thickness=-1.
    cv.rectangle(img, (100, 100), (300, 300), color, thickness=10, lineType=8)
    # Circle
    color = (150, 255, 50)  # Each channel value must lie in 0-255
    # (650, 350) is the (x, y) of the center of the circle and 100 is the radius.
    # For a filled circle, use thickness=-1.
    cv.circle(img, (650, 350), 100, color, thickness=10)
    # Text
    color = (50, 200, 100)
    font = cv.FONT_HERSHEY_SCRIPT_COMPLEX
    # (200, 150) is the bottom-left corner of the text and 5 is the font scale.
    cv.putText(img, 'Save Tigers', (200, 150), font, 5, color, thickness=5, lineType=cv.LINE_AA)
    # Convert BGR to RGB for display
    img_convert = cv.cvtColor(img, cv.COLOR_BGR2RGB)
    plt.imshow(img_convert)

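    Blending Images
    We can also blend two or more images with OpenCV. An image is just an array of numbers, so you can add, subtract, multiply, and divide images element by element to produce new images. One thing to keep in mind is that the images must have the same size.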
    # Plot several images side by side
    def myplot(images, titles):
        fig, axs = plt.subplots(1, len(images), sharey=True)
        fig.set_figwidth(15)
        for img, ax, title in zip(images, axs, titles):
            if img.shape[-1] == 3:
                img = cv.cvtColor(img, cv.COLOR_BGR2RGB)  # OpenCV reads images as BGR, so convert them back to RGB
            else:
                img = cv.cvtColor(img, cv.COLOR_GRAY2BGR)  # Expand grayscale to three identical channels
            ax.imshow(img)
            ax.set_title(title)
    img1 = cv.imread('../input/images-for-computer-vision/tiger1.jpg')
    img2 = cv.imread('../input/images-for-computer-vision/horse.jpg')
    # Resize img1 to the size of img2; cv.resize takes (width, height)
    img1_resize = cv.resize(img1, (img2.shape[1], img2.shape[0]))
    # Adding, subtracting, multiplying, and dividing images
    img_add = cv.add(img1_resize, img2)
    img_subtract = cv.subtract(img1_resize, img2)
    img_multiply = cv.multiply(img1_resize, img2)
    img_divide = cv.divide(img1_resize, img2)
    # Blending images
    img_blend = cv.addWeighted(img1_resize, 0.3, img2, 0.7, 0)  # 30% tiger and 70% horse
    myplot([img1_resize, img2], ['Tiger', 'Horse'])
    myplot([img_add, img_subtract, img_multiply, img_divide, img_blend], ['Addition', 'Subtraction', 'Multiplication', 'Division', 'Blending'])

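    The multiplied image is almost entirely white and the divided image is almost entirely black. White corresponds to 255 and black to 0. When we multiply two pixel values, the product is usually far larger than 255, and OpenCV clips (saturates) it to 255, so the result is white or close to white. Division does the opposite: the quotient of two 8-bit values is usually a very small number, so the result is close to black.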
    Image Transformations
    Image transformations include translating, rotating, scaling, shearing, and flipping an image.
    img = cv.imread('../input/images-for-computer-vision/tiger1.jpg')
    height, width, _ = img.shape  # NumPy shape is (rows, cols, channels) = (height, width, channels)
    # Translating
    M_translate = np.float32([[1, 0, 200], [0, 1, 100]])  # 200 => translation along the x-axis, 100 => translation along the y-axis
    img_translate = cv.warpAffine(img, M_translate, (width, height))  # Output size is given as (width, height)
    # Rotating
    center = (width / 2, height / 2)  # (x, y) of the image center
    M_rotate = cv.getRotationMatrix2D(center, angle=90, scale=1)
    img_rotate = cv.warpAffine(img, M_rotate, (width, height))
    # Scaling
    scale_percent = 50
    scaled_width = int(img.shape[1] * scale_percent / 100)
    scaled_height = int(img.shape[0] * scale_percent / 100)
    dim = (scaled_width, scaled_height)
    img_scale = cv.resize(img, dim, interpolation=cv.INTER_AREA)
    # Flipping
    img_flip = cv.flip(img, 1)  # 0: around the horizontal axis, 1: around the vertical axis, -1: around both axes
    # Shearing
    srcTri = np.array([[0, 0], [img.shape[1] - 1, 0], [0, img.shape[0] - 1]]).astype(np.float32)
    dstTri = np.array([[0, img.shape[1] * 0.33], [img.shape[1] * 0.85, img.shape[0] * 0.25],
                       [img.shape[1] * 0.15, img.shape[0] * 0.7]]).astype(np.float32)
    warp_mat = cv.getAffineTransform(srcTri, dstTri)
    img_warp = cv.warpAffine(img, warp_mat, (width, height))
    myplot([img, img_translate, img_rotate, img_scale, img_flip, img_warp],
           ['Original Image', 'Translated Image', 'Rotated Image', 'Scaled Image', 'Flipped Image', 'Sheared Image'])

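    Image Preprocessing
    Thresholding: in binary thresholding, pixel values less than or equal to the threshold become 0 (black), and pixel values greater than the threshold become 255 (white).
    I set the threshold to 150, but you can choose any other number.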
    # For visualising the filters
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots
    def plot_3d(img1, img2, titles):
        # Plot the first channel of each image as a 3D surface (pixel value as height)
        fig = make_subplots(rows=1, cols=2,
                            specs=[[{'is_3d': True}, {'is_3d': True}]],
                            subplot_titles=[titles[0], titles[1]],
                            )
        x, y = np.mgrid[0:img1.shape[0], 0:img1.shape[1]]
        fig.add_trace(go.Surface(x=x, y=y, z=img1[:, :, 0]), row=1, col=1)
        fig.add_trace(go.Surface(x=x, y=y, z=img2[:, :, 0]), row=1, col=2)
        fig.update_traces(contours_z=dict(show=True, usecolormap=True,
                                          highlightcolor="limegreen", project_z=True))
        fig.show()
    img = cv.imread('../input/images-for-computer-vision/simple_shapes.png')
    # Pixel values above the threshold become 255; all others become 0
    _, img_threshold = cv.threshold(img, 150, 255, cv.THRESH_BINARY)
    plot_3d(img, img_threshold, ['Original Image', 'Threshold Image=150'])

    After applying the threshold, every pixel value greater than 150 becomes 255, and every other value becomes 0.
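    Filtering: image filtering changes the appearance of an image by changing its pixel values. Each type of filter changes the pixel values according to its own mathematical formula. I won't go into the math here, but I will show how each filter works by visualizing the result in 3D.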
    img = cv.imread('../input/images-for-computer-vision/simple_shapes.png')
    # Gaussian Filter
    ksize = (11, 11)  # Both values must be odd
    img_gaussian = cv.GaussianBlur(img, ksize, 0)  # sigma 0 => computed from the kernel size
    plot_3d(img, img_gaussian, ['Original Image', 'Gaussian Image'])
    # Median Filter
    ksize = 11  # Must be odd and greater than 1
    img_medianblur = cv.medianBlur(img, ksize)
    plot_3d(img, img_medianblur, ['Original Image', 'Median blur'])
    # Bilateral Filter
    img_bilateralblur = cv.bilateralFilter(img, d=5, sigmaColor=50, sigmaSpace=5)
    myplot([img, img_bilateralblur], ['Original Image', 'Bilateral blur Image'])
    plot_3d(img, img_bilateralblur, ['Original Image', 'Bilateral blur'])

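    Gaussian filter: blurs the image by removing detail and noise.
    Median filter: a nonlinear filter that is useful for reducing impulse noise, also known as salt-and-pepper noise.
    Bilateral filter: smooths the image to reduce noise while preserving edges.
    Put simply, filters help reduce or remove noise, that is, random variation in brightness or color. This is called smoothing.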
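    Feature Detection
    Feature detection computes abstractions of the image information and makes a local decision at every image point about whether a feature of a given type is present there. For an image of a face, for example, the features are the eyes, nose, lips, ears, and so on, and we try to identify them. Let's start by identifying the edges in an image.
    Edge detection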
    img = cv.imread('../input/images-for-computer-vision/simple_shapes.png')
    img_canny1 = cv.Canny(img, 50, 200)  # 50 and 200 are the lower and upper hysteresis thresholds
    # Smooth the image before feeding it to Canny
    filter_img = cv.GaussianBlur(img, (7, 7), 0)
    img_canny2 = cv.Canny(filter_img, 50, 200)
    myplot([img, img_canny1, img_canny2],
           ['Original Image', 'Canny Edge Detector(Without Smoothing)', 'Canny Edge Detector(With Smoothing)'])

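    Here we use the Canny edge detector, an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in an image. It was developed by John F. Canny in 1986. I won't go into how Canny works in detail; the key point is that it is used to extract edges.
    Before detecting edges with Canny, we smooth the image to remove noise. As you can see from the images, smoothing gives us cleaner edges.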
    Contours
    img = cv.imread('../input/images-for-computer-vision/simple_shapes.png')
    img_copy = img.copy()
    img_gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    _, img_binary = cv.threshold(img_gray, 50, 200, cv.THRESH_BINARY)
    # Eroding and dilating for smooth contours
    kernel = np.ones((10, 10), np.uint8)  # Structuring element: an array, not a size tuple
    img_binary_erode = cv.erode(img_binary, kernel, iterations=5)
    img_binary_dilate = cv.dilate(img_binary_erode, kernel, iterations=5)
    contours, hierarchy = cv.findContours(img_binary_dilate, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
    cv.drawContours(img, contours, -1, (0, 0, 255), 3)  # Draws all contours (-1) on the original image, in place
    myplot([img_copy, img], ['Original Image', 'Contours in the Image'])

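    Erosion: uses a structuring element to probe and shrink the shapes contained in the image.
    Dilation: adds pixels to the boundaries of objects in the image; it is the opposite of erosion.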

    Hulls
    img = cv.imread('../input/images-for-computer-vision/simple_shapes.png')  # Read in color so the red hulls are visible
    img_gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    _, threshold = cv.threshold(img_gray, 50, 255, cv.THRESH_BINARY)
    contours, hierarchy = cv.findContours(threshold, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
    hulls = [cv.convexHull(c) for c in contours]  # One convex hull per contour
    img_hull = cv.drawContours(img, hulls, -1, (0, 0, 255), 2)  # Draws the hulls on the image in place and returns it
    plt.imshow(cv.cvtColor(img_hull, cv.COLOR_BGR2RGB))

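    Summary
    We saw how to read and display images; draw shapes and text on them; blend two images; transform images by rotating, scaling, translating, and so on; filter images with Gaussian, median, and bilateral blur; and detect features using Canny edge detection and contour finding.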