如何避免这8个常见的深度学习/计算机视觉错误？

2024.07.04 磐创AI

    人是不完美的，我们经常在程序中犯错误。有时这些错误很容易发现：你的代码根本不能工作，你的应用程序崩溃等等。但是有些bug是隐藏的，这使得它们更加危险。
    在解决深度学习问题时，由于一些不确定性，很容易出现这种类型的bug：很容易看到web应用端点路由请求是否正确，而不容易检查你的梯度下降步骤是否正确。然而，在DL从业者生涯中有很多错误是可以避免的。

    我想分享一些我的经验，关于我在过去两年的计算机视觉工作中看到或制造的错误。我在会议上谈到过这个话题，很多人在会后告诉我：“是的，伙计，我也有很多这样的错误。”我希望我的文章可以帮助你至少避免其中的一些问题。
    1．翻转图像和关键点
    假设一个关键点检测问题的工作。它们的数据看起来像图像和一系列关键点元组，例如［（0，1），（2，2）］，其中每个关键点是一对x和y坐标。
    让我们对这个数据实现一个基本的数据增强：
    def flip＿img＿and＿keypoints（img： np．ndarray， kpts： Sequence［Sequence［int］］）：
     img ＝ np．fliplr（img）
     h， w，＊＿＝ img．shape
     kpts ＝［（y， w － x） for y， x in kpts］
     return img， kpts
    看起来好像是正确的，嗯，让我们把结果可视化一下：
    mage ＝ np．ones（（10， 10）， dtype＝np．float32）
    kpts ＝［（0， 1），（2， 2）］
    image＿flipped， kpts＿flipped ＝ flip＿img＿and＿keypoints（image， kpts）
    img1 ＝ image．copy（）
    for y， x in kpts：
     img1［y， x］＝ 0
    img2 ＝ image＿flipped．copy（）
    for y， x in kpts＿flipped：
     img2［y， x］＝ 0
    ＿＝ plt．imshow（np．hstack（（img1， img2）））

    不对称看起来很奇怪！如果我们检查极值的情况呢？
    image ＝ np．ones（（10， 10）， dtype＝np．float32）
    kpts ＝［（0， 0），（1， 1）］
    image＿flipped， kpts＿flipped ＝ flip＿img＿and＿keypoints（image， kpts）
    img1 ＝ image．copy（）
    for y， x in kpts：
     img1［y， x］＝ 0
    img2 ＝ image＿flipped．copy（）
    for y， x in kpts＿flipped：
     img2［y， x］＝ 0
    out：
    IndexError
    Traceback （most recent call last）
    ＜ipython－input－5－997162463eae＞ in ＜module＞
     8 img2 ＝ image＿flipped．copy（）
     9 for y， x in kpts＿flipped：
    －－－＞ 10 img2［y， x］＝ 0
    IndexError： index 10 is out of bounds for axis 1 with size 10
    程序报错了！这是一个典型的差一误差。正确的代码是这样的：
    def flip＿img＿and＿keypoints（img： np．ndarray， kpts： Sequence［Sequence［int］］）：
     img ＝ np．fliplr（img）
     h， w，＊＿＝ img．shape
     kpts ＝［（y， w － x － 1） for y， x in kpts］
     return img， kpts
    我们可以通过可视化来检测这个问题，而在x ＝ 0点的单元测试也会有帮助。
    2．还是关键点问题
    即使在上述错误被修复之后，仍然存在问题。现在更多的是语义上的问题，而不仅仅是代码上的问题。
    假设需要增强具有两只手掌的图像。看起来好像没问题－左右翻转后手还是手。

    但是等等！我们对我们拥有的关键点语义一无所知。如果这个关键点的意思是这样的：
    kpts ＝［
     （20， 20），＃左小指
     （20， 200），＃右小指
     ．．．
     ］

    这意味着增强实际上改变了语义：左变成右，右变成左，但我们不交换数组中的关键点索引。它会给训练带来大量的噪音和更糟糕的度量。
    我们应该吸取教训：
    在应用增强或其他特性之前，要了解和考虑数据结构和语义；
    保持你的实验原子性：添加一个小的变化（例如一个新的变换），如果分数已经提高，检查它如何进行和合并。
    3．编码自定义损失函数
    熟悉语义分割问题的人可能知道IoU度量。不幸的是，我们不能直接用SGD来优化它，所以常用的方法是用可微损失函数来近似它。让我们编码实现一个！
    def iou＿continuous＿loss（y＿pred， y＿true）：
     eps ＝ 1e－6
     def ＿sum（x）：
     return x．sum（－1）．sum（－1）
     numerator ＝（＿sum（y＿true ＊ y＿pred）＋ eps）
     denominator ＝（＿sum（y＿true ＊＊ 2）＋＿sum（y＿pred ＊＊ 2）
     －＿sum（y＿true ＊ y＿pred）＋ eps）
     return （numerator ／ denominator）．mean（）
    看起来不错，让我们测试一下：
    In ［3］： ones ＝ np．ones（（1， 3， 10， 10））
     ．．．： x1 ＝ iou＿continuous＿loss（ones ＊ 0．01， ones）
     ．．．： x2 ＝ iou＿continuous＿loss（ones ＊ 0．99， ones）
    In ［4］： x1， x2
    Out［4］：（0．010099999897990103， 0．9998990001020204）
    在x1中，我们计算了与正确数据完全不同的数据的损失，而x2则是非常接近正确数据的数据损失结果。我们期望x1很大因为预测很糟糕，x2应该接近0。但是结果与我期望的有差别，哪里出现错误了呢？
    上面的函数是度量的一个很好的近似。度量不是一种损失：它通常（包括这种情况）越高越好。当我们使用SGD最小化损失时，我们应该做一些改变：
    def iou＿continuous（y＿pred， y＿true）：
     eps ＝ 1e－6
     def ＿sum（x）：
     return x．sum（－1）．sum（－1）
     numerator ＝（＿sum（y＿true ＊ y＿pred）＋ eps）
     denominator ＝（＿sum（y＿true ＊＊ 2）＋＿sum（y＿pred ＊＊ 2）
     －＿sum（y＿true ＊ y＿pred）＋ eps）
     return （numerator ／ denominator）．mean（）
    def iou＿continuous＿loss（y＿pred， y＿true）：
     return 1 － iou＿continuous（y＿pred， y＿true）
    这些问题可以从两个方面来确定：
    编写一个单元测试来检查损失的方向
    运行健全性检查
    4．当我们遇到Pytorch的时候
    假设有一个预先训练好的模型。编写基于ceevee API的Predictor 类。
    from ceevee．base import AbstractPredictor
    class MySuperPredictor（AbstractPredictor）：
     def ＿＿init＿＿（self，
     weights＿path： str，
     ）：
     super（）．＿＿init＿＿（）
     self．model ＝ self．＿load＿model（weights＿path＝weights＿path）
     def process（self， x，＊kw）：
     with torch．no＿grad（）：
     res ＝ self．model（x）
     return res
     ＠staticmethod
     def ＿load＿model（weights＿path）：
     model ＝ ModelClass（）
     weights ＝ torch．load（weights＿path， map＿location＝＇cpu＇）
     model．load＿state＿dict（weights）
     return model
    这个代码正确吗？也许！对于某些模型来说确实是正确的。例如，当模型没有dropout或norm 层，如torch．nn．BatchNorm2d。
    但是对于大多数计算机视觉应用来说，代码忽略了一些重要的东西：转换到评估模式。
    如果试图将动态PyTorch图转换为静态PyTorch图，这个问题很容易意识到。torch．jit模块用于这种转换。
    In ［3］： model ＝ nn．Sequential（
     ．．．： nn．Linear（10， 10），
     ．．．： nn．Dropout（．5）
     ．．．：）
     ．．．：
     ．．．： traced＿model ＝ torch．jit．trace（model， torch．rand（10））
    ／Users／Arseny／．pyenv／versions／3．6．6／lib／python3．6／site－packages／torch／jit／＿＿init＿＿．py：914： TracerWarning： Trace had nondeterministic nodes． Did you forget call ．eval（） on your model？ Nodes：
     ％12 ： Float（10）＝ aten：：dropout（％input，％10，％11）， scope： Sequential／Dropout［1］＃／Users／Arseny／．pyenv／versions／3．6．6／lib／python3．6／site－packages／torch／nn／functional．py：806：0
    This may cause errors in trace checking． To disable trace checking， pass check＿trace＝False to torch．jit．trace（）
     check＿tolerance，＿force＿outplace， True，＿module＿class）
    ／Users／Arseny／．pyenv／versions／3．6．6／lib／python3．6／site－packages／torch／jit／＿＿init＿＿．py：914： TracerWarning： Output nr 1． of the traced function does not match the corresponding output of the Python function． Detailed error：
    Not within tolerance rtol＝1e－05 atol＝1e－05 at input［5］（0．0 vs． 0．5454154014587402） and 5 other locations （60．00％）
    check＿tolerance，＿force＿outplace， True，＿module＿class）
    一个简单的解决办法：
    In ［4］： model ＝ nn．Sequential（
     ．．．： nn．Linear（10， 10），
     ．．．： nn．Dropout（．5）
     ．．．：）
     ．．．：
     ．．．： traced＿model ＝ torch．jit．trace（model．eval（）， torch．rand（10））
     ＃没有警告！
    torch．jit．trace运行模型几次并比较结果。
    然而torch．jit．trace并不是万能的，你应该了解并记住。
    5．复制粘贴问题
    很多东西都是成对存在的：训练和验证、宽度和高度、纬度和经度……如果你仔细阅读，你会很容易发现一个bug是由某一个成员中复制粘贴到另外一个成员中引起的：
    def make＿dataloaders（train＿cfg， val＿cfg， batch＿size）：
     train ＝ Dataset．from＿config（train＿cfg）
     val ＝ Dataset．from＿config（val＿cfg）
     shared＿params ＝｛＇batch＿size＇： batch＿size，＇shuffle＇： True，＇num＿workers＇： cpu＿count（）｝
     train ＝ DataLoader（train，＊＊shared＿params）
     val ＝ DataLoader（train，＊＊shared＿params）
     return train， val
    不仅仅是我犯了愚蠢的错误，例如。流行的albumentations库中也有类似的问题。
    ＃ https：／／github．com／albu／albumentations／blob／0．3．0／albumentations／augmentations／transforms．py
    def apply＿to＿keypoint（self， keypoint， crop＿height＝0， crop＿width＝0， h＿start＝0， w＿start＝0， rows＝0， cols＝0，＊＊params）：
     keypoint ＝ F．keypoint＿random＿crop（keypoint， crop＿height， crop＿width， h＿start， w＿start， rows， cols）
     scale＿x ＝ self．width ／ crop＿height
     scale＿y ＝ self．height ／ crop＿height
     keypoint ＝ F．keypoint＿scale（keypoint， scale＿x， scale＿y）
     return keypoint
    不过别担心，现在已经修复好了。
    如何避免？尽量以不需要复制和粘贴的方式编写代码。
    下面这种编程方式不是一个好的方式：
    datasets ＝［］
    data＿a ＝ get＿dataset（MyDataset（config［＇dataset＿a＇］）， config［＇shared＿param＇］， param＿a）
    datasets．append（data＿a）
    data＿b ＝ get＿dataset（MyDataset（config［＇dataset＿b＇］）， config［＇shared＿param＇］， param＿b）
    datasets．append（data＿b）
    而下面的方式看起来好多了：
    datasets ＝［］
    for name， param in zip（（＇dataset＿a＇，＇dataset＿b＇），
     （param＿a， param＿b），
     ）：
     datasets．append（get＿dataset（MyDataset（config［name］）， config［＇shared＿param＇］， param））
    6．正确的数据类型让我们编写一个新的增强：def add＿noise（img： np．ndarray）－＞ np．ndarray：
     mask ＝ np．random．rand（＊img．shape）＋．5
     img ＝ img．astype（＇float32＇）＊ mask
     return img．astype（＇uint8＇）

    图像已被更改。这是我们所期望的吗？嗯，可能修改得有点过了。
    这里有一个危险的操作：将float32转换为uint8。它可能会导致溢出：
    def add＿noise（img： np．ndarray）－＞ np．ndarray：
     mask ＝ np．random．rand（＊img．shape）＋．5
     img ＝ img．astype（＇float32＇）＊ mask
     return np．clip（img， 0， 255）．astype（＇uint8＇）
    img ＝ add＿noise（cv2．imread（＇two＿hands．jpg＇）［：，：，：：－1］）
    ＿＝ plt．imshow（img）
    看起来好多了，是吧？
    顺便说一句，还有一种方法可以避免这个问题：不要重造轮子，不要从头开始编写增强代码，而是使用现有的增强，比如：albumentations．augmentations．transforms．GaussNoise。
    我曾经犯过另一个同样的错误。
    raw＿mask ＝ cv2．imread（＇mask＿small．png＇）
    mask ＝ raw＿mask．astype（＇float32＇）／ 255
    mask ＝ cv2．resize（mask，（64， 64）， interpolation＝cv2．INTER＿LINEAR）
    mask ＝ cv2．resize（mask，（128， 128）， interpolation＝cv2．INTER＿CUBIC）
    mask ＝（mask ＊ 255）．astype（＇uint8＇）
    ＿＝ plt．imshow（np．hstack（（raw＿mask， mask）））
    这里出了什么问题？首先，用三次样条插值调整mask的大小是一个坏主意。与转换float32到uint8的问题是一样的：三次样条插值的输出值会大于输入值，会导致溢出。

    我在做可视化的时候发现了这个问题。在你的训练循环中到处使用断言也是一个好主意。
    7．拼写错误发生
    假设需要对全卷积网络（如语义分割问题）和一个巨大的图像进行推理。该图像是如此巨大，没有机会把它放在你的GPU上－例如，它可以是一个医疗或卫星图像。
    在这种情况下，可以将图像分割成网格，独立地对每一块进行推理，最后合并。此外，一些预测交叉可能有助于平滑边缘的伪影
    让我们编码实现吧！
    from tqdm import tqdm
    class GridPredictor：
     ＂＂＂
     你有GPU内存限制时，此类可用于预测大图像的分割掩码
     ＂＂＂
     def ＿＿init＿＿（self， predictor： AbstractPredictor， size： int， stride： Optional［int］＝ None）：
     self．predictor ＝ predictor
     self．size ＝ size
     self．stride ＝ stride if stride is not None else size ／／ 2
     def ＿＿call＿＿（self， x： np．ndarray）：
     h， w，＿＝ x．shape
     mask ＝ np．zeros（（h， w， 1）， dtype＝＇float32＇）
     weights ＝ mask．copy（）
     for i in tqdm（range（0， h － 1， self．stride））：
     for j in range（0， w － 1， self．stride）：
     a， b， c， d ＝ i， min（h， i ＋ self．size）， j， min（w， j ＋ self．size）
     patch ＝ x［a：b， c：d，：］
     mask［a：b， c：d，：］＋＝ np．expand＿dims（self．predictor（patch），－1）
     weights［a：b， c：d，：］＝ 1
     return mask ／ weights
    有一个符号输入错误，可以很容易地找到它，检查代码是否正确：
    class Model（nn．Module）：
     def forward（self， x）：
     return x．mean（axis＝－1）
    model ＝ Model（）
    grid＿predictor ＝ GridPredictor（model， size＝128， stride＝64）
    simple＿pred ＝ np．expand＿dims（model（img），－1）
    grid＿pred ＝ grid＿predictor（img）
    np．testing．assert＿allclose（simple＿pred， grid＿pred， atol＝．001）
    AssertionError Traceback （most recent call last）
    ＜ipython－input－24－a72034c717e9＞ in ＜module＞
     9 grid＿pred ＝ grid＿predictor（img）
     10
    －－－＞ 11 np．testing．assert＿allclose（simple＿pred， grid＿pred， atol＝．001）
    ～／．pyenv／versions／3．6．6／lib／python3．6／site－packages／numpy／testing／＿private／utils．py in assert＿allclose（actual， desired， rtol， atol， equal＿nan， err＿msg， verbose）
     1513 header ＝＇Not equal to tolerance rtol＝％g， atol＝％g＇％（rtol， atol）
     1514 assert＿array＿compare（compare， actual， desired， err＿msg＝str（err＿msg），
    －＞ 1515 verbose＝verbose， header＝header， equal＿nan＝equal＿nan）
     1516
     1517
    ～／．pyenv／versions／3．6．6／lib／python3．6／site－packages／numpy／testing／＿private／utils．py in assert＿array＿compare（comparison， x， y， err＿msg， verbose， header， precision， equal＿nan， equal＿inf）
     839 verbose＝verbose， header＝header，
     840 names＝（＇x＇，＇y＇）， precision＝precision）
    －－＞ 841 raise AssertionError（msg）
     842 except ValueError：
     843 import traceback
    AssertionError：
    Not equal to tolerance rtol＝1e－07， atol＝0．001
    Mismatch： 99．6％
    Max absolute difference： 765．
    Max relative difference： 0．75000001
     x： array（［［［215．333333］，
     ［192．666667］，
     ［250．］，．．．
     y： array（［［［ 215．33333］，
     ［ 192．66667］，
     ［ 250．］，．．．
    call方法的正确版本如下：
    def ＿＿call＿＿（self， x： np．ndarray）：
     h， w，＿＝ x．shape
     mask ＝ np．zeros（（h， w， 1）， dtype＝＇float32＇）
     weights ＝ mask．copy（）
     for i in tqdm（range（0， h － 1， self．stride））：
     for j in range（0， w － 1， self．stride）：
     a， b， c， d ＝ i， min（h， i ＋ self．size）， j， min（w， j ＋ self．size）
     patch ＝ x［a：b， c：d，：］
     mask［a：b， c：d，：］＋＝ np．expand＿dims（self．predictor（patch），－1）
     weights［a：b， c：d，：］＋＝ 1
     return mask ／ weights
    如果你仍然不知道问题是什么，注意行weights［a：b， c：d，：］＋＝ 1。
    8．Imagenet归一化
    当一个人需要做迁移学习时，用训练Imagenet时的方法将图像归一化通常是一个好主意。
    让我们使用熟悉的albumentations来实现：
    from albumentations import Normalize
    norm ＝ Normalize（）
    img ＝ cv2．imread（＇img＿small．jpg＇）
    mask ＝ cv2．imread（＇mask＿small．png＇， cv2．IMREAD＿GRAYSCALE）
    mask ＝ np．expand＿dims（mask，－1）＃ shape （64， 64）－＞ shape （64， 64， 1）
    normed ＝ norm（image＝img， mask＝mask）
    img， mask ＝［normed［x］ for x in ［＇image＇，＇mask＇］］
    def img＿to＿batch（x）：
     x ＝ np．transpose（x，（2， 0， 1））．astype（＇float32＇）
     return torch．from＿numpy（np．expand＿dims（x， 0））
    img， mask ＝ map（img＿to＿batch，（img， mask））
    criterion ＝ F．binary＿cross＿entropy
    现在是时候训练一个网络并对单个图像进行拟合——正如我所提到的，这是一种很好的调试技术：
    model＿a ＝ UNet（3， 1）
    optimizer ＝ torch．optim．Adam（model＿a．parameters（）， lr＝1e－3）
    losses ＝［］
    for t in tqdm（range（20））：
     loss ＝ criterion（model＿a（img）， mask）
     losses．append（loss．item（））
     optimizer．zero＿grad（）
     loss．backward（）
     optimizer．step（）
    ＿＝ plt．plot（losses）

    曲率看起来很好，但是－300不是我们期望的交叉熵的损失值。是什么问题？
    归一化处理图像效果很好，但掩码需要缩放到［0，1］之间。
    model＿b ＝ UNet（3， 1）
    optimizer ＝ torch．optim．Adam（model＿b．parameters（）， lr＝1e－3）
    losses ＝［］
    for t in tqdm（range（20））：
     loss ＝ criterion（model＿b（img）， mask ／ 255．）
     losses．append（loss．item（））
     optimizer．zero＿grad（）
     loss．backward（）
     optimizer．step（）
    ＿＝ plt．plot（losses）
    在训练循环时一个简单运行断言（例如assert mask．max（）＜＝ 1）可以很快地检测到问题。同样，也可以是单元测试。