6 Common Pitfalls to Avoid While Building Out Your AI Tools

By: Ramprakash Ramamoorthy on April 24, 2020

The AI frameworks built by enterprise-level companies with thousands of employees across the globe can be replicated by B2B companies of nearly any size. In fact, even without access to a ton of proprietary data or immense computational power, you can build effective AI tools.


To begin with, you can introduce publicly available datasets from movie review websites, e-commerce review sites and elsewhere to train your system. When doing so, be sure to double-check the license details of such publicly available corpora, as some do not allow commercial usage. Then, during the preprocessing stage—after removing swear words, jargon and other irrelevant elements from your corpus—you can add noise and slightly alter your existing data to create even more relevant data points in your corpus.
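As a rough sketch of that augmentation step, the snippet below creates a noisy copy of a tokenized review by swapping and dropping a few words; the corpus, the probabilities and the add_noise helper are purely illustrative assumptions:

```python
import random

def add_noise(tokens, swap_prob=0.1, drop_prob=0.05):
    """Create a slightly altered copy of a tokenized review by randomly
    swapping adjacent words and dropping a few tokens."""
    noisy = tokens[:]
    for i in range(len(noisy) - 1):
        if random.random() < swap_prob:
            noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return [t for t in noisy if random.random() > drop_prob]

# Hypothetical cleaned corpus of (review, label) pairs from a public dataset
corpus = [("the build pipeline failed after the last deploy", "negative")]

augmented = []
for text, label in corpus:
    tokens = text.split()
    augmented.append((text, label))                         # keep the original
    augmented.append((" ".join(add_noise(tokens)), label))  # plus a noisy copy
```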

After this, you can build a semi-supervised learning algorithm through techniques such as tri-training. In tri-training, we essentially train three different neural networks over binary classification data. When two of the models agree on a label for an unlabeled example, that example is added to the third model's training data and the third model is retrained; this continues until all three models are in agreement. However, before your AI tools are up and running, it's important to be mindful of the following common pitfalls.
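A heavily simplified version of that loop might look like the following sketch, which uses three scikit-learn classifiers in place of neural networks; the model choices, bootstrap seeding and round count are assumptions made for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

def tri_train(X_labeled, y_labeled, X_unlabeled, rounds=5):
    """Simplified tri-training: three diverse models pseudo-label unlabeled
    data for each other whenever the other two agree."""
    models = [LogisticRegression(max_iter=1000),
              DecisionTreeClassifier(random_state=0),
              GaussianNB()]
    train_sets = []
    for seed, model in enumerate(models):
        # Each model starts from its own bootstrap sample of the labeled data
        Xb, yb = resample(X_labeled, y_labeled, random_state=seed)
        model.fit(Xb, yb)
        train_sets.append((Xb, yb))

    for _ in range(rounds):
        preds = [m.predict(X_unlabeled) for m in models]
        for k in range(3):
            i, j = [idx for idx in range(3) if idx != k]
            agree = preds[i] == preds[j]   # where the other two models agree
            if not agree.any():
                continue
            # Add the agreed-upon pseudo-labels to model k's data and retrain it
            Xk = np.vstack([train_sets[k][0], X_unlabeled[agree]])
            yk = np.concatenate([train_sets[k][1], preds[i][agree]])
            models[k].fit(Xk, yk)
            train_sets[k] = (Xk, yk)
    return models
```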

Overfitting AI Models

It may seem counterintuitive, but you really do not want your AI model to be too accurate. If your cross-validation score is approaching 100%, you might need to reconsider deploying the model, since it is likely prone to overfitting. When a model is overfit to its training data, one small change can render the entire model useless. Your model should be accurate, but not so accurate that it loses the ability to adapt.
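One practical way to catch this before deployment is to compare training accuracy against cross-validation accuracy; a large gap, or a validation score hugging 100%, is a red flag. A minimal sketch with scikit-learn, using a synthetic dataset as a stand-in:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

# An unconstrained tree will happily memorize the training data
model = DecisionTreeClassifier(random_state=42)
model.fit(X, y)

train_acc = model.score(X, y)
cv_acc = cross_val_score(model, X, y, cv=5).mean()

print(f"train accuracy: {train_acc:.3f}, cross-val accuracy: {cv_acc:.3f}")
# A near-perfect training score with a noticeably lower cross-val score
# suggests the model is overfitting and may not generalize in production.
```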

Class Imbalance

When training your AI tools, it sometimes makes sense to include a disproportionate amount of your target variable. For example, perhaps you're training your computer vision AI tool to recognize planks of wood in a warehouse. Even if planks of wood account for only 1% of your total inventory, you'll want to oversample that minority class; in this case, perhaps to around 60% of your training sample. Amplifying the minority class this way ensures the model sees enough examples of the thing you actually want it to recognize.
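A minimal sketch of that oversampling step, assuming a pandas DataFrame of labeled inventory images and a 60/40 target split (both assumptions for illustration):

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical labeled inventory images: 'is_plank' is the rare positive class
df = pd.DataFrame({
    "image_id": range(1000),
    "is_plank": [1] * 10 + [0] * 990,   # planks are ~1% of the data
})

planks = df[df["is_plank"] == 1]
others = df[df["is_plank"] == 0]

# Oversample planks (with replacement) so they make up roughly 60% of the sample
target_planks = int(len(others) * 0.6 / 0.4)
planks_up = resample(planks, replace=True, n_samples=target_planks, random_state=42)

balanced = pd.concat([planks_up, others]).sample(frac=1, random_state=42)
print(balanced["is_plank"].value_counts(normalize=True))
```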

Data Leakage

Another common misstep is failing to catch data leakage, which can occur in any number of ways. A common instance is when an element used to test the model slips into the training data. This generally causes the model to appear to perform well, even though its predictions might not hold up once deployed in production. To rectify this, it is vital that you keep track of any changes to your AI tools. In addition to your organization's code reviews and configuration change reviews, you should have dataset reviews to ensure your AI engineering is being done properly. Any parameter change within your AI tools needs to be fastidiously documented.
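One frequent source of exactly this kind of leakage is fitting preprocessing steps such as scalers on the full dataset before splitting it. A minimal sketch of the safer pattern, splitting first and letting a scikit-learn Pipeline fit the scaler on training data only (the dataset and model are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Split FIRST, so the test set never influences preprocessing or training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# The pipeline fits the scaler on training data only, then applies it to test data
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

print("held-out accuracy:", model.score(X_test, y_test))
```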

Concept Drift

All AI tools are dependent upon the shape of the data coming into your system. For example, perhaps your AI model was trained on response times between 1 and 100 seconds, and suddenly response times range from 1 to 1,000 seconds. In this case, your model will likely fail to capture the new data points. It's important to be aware of such an occurrence so that you can retrain your model accordingly.

There are many techniques that can be used to identify instances of concept drift. Rather than waiting for feedback from your customers, you can set up anomaly detection so that it identifies concept drift in incoming data and automatically updates your AI model.
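For a numeric signal such as response time, one simple drift check is to compare a recent window of incoming values against the training distribution, for example with a two-sample Kolmogorov-Smirnov test; the data, window size and p-value threshold below are assumptions for illustration:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Response times the model was trained on (roughly 1-100 seconds)
training_response_times = rng.uniform(1, 100, size=5000)

# A recent window of incoming response times (now ranging up to ~1,000 seconds)
recent_window = rng.uniform(1, 1000, size=500)

stat, p_value = ks_2samp(training_response_times, recent_window)
if p_value < 0.01:
    # Distributions differ significantly: flag drift and trigger retraining
    print(f"Concept drift detected (KS statistic={stat:.2f}); retrain the model.")
```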

Using Personal Data

With the CCPA and GDPR in place and a federally mandated U.S. data privacy law on the horizon, organizations are wary of using their customers’ data to train their AI tools. It’s prudent to train your AI system with anonymized data; this way, you won’t set yourself up for problems down the road. 
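As a rough sketch of what anonymized training data can look like in practice, you might drop direct identifiers and replace stable IDs with salted one-way hashes before the data reaches the training pipeline; the column names and salting scheme here are hypothetical:

```python
import hashlib

import pandas as pd

SALT = "rotate-me-regularly"   # hypothetical secret salt, stored outside the dataset

def pseudonymize(value: str) -> str:
    """One-way hash so records stay joinable without exposing the raw identifier."""
    return hashlib.sha256((SALT + value).encode()).hexdigest()[:16]

raw = pd.DataFrame({
    "email": ["alice@example.com", "bob@example.com"],
    "user_id": ["u-001", "u-002"],
    "ticket_text": ["login fails after update", "dashboard loads slowly"],
})

training_data = raw.drop(columns=["email"])                 # drop direct identifiers
training_data["user_id"] = training_data["user_id"].map(pseudonymize)
```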

Failing to Ensure All Your AI Decisions Are Explainable

Most AI-enabled decisions are acted upon by teams rather than individuals. Hence, it's vital that your AI model's recommended courses of action come with explanations. Additionally, these explanations should include confidence intervals, making the results easier to interpret.

During root cause analysis and remediation, it’s often unclear which team is ultimately responsible for the issue at hand. If the AI’s suggested course of action comes with an explanation, this helps immensely during the remediation process. For example, if the starter event is a network build-up, the issue likely lies with the engineering team. Alternatively, if the starter event is a network configuration change, the network team is the likely culprit. By closely monitoring your AI tools’ performance and ensuring that all AI decisions are explainable, you can quickly solve issues and course correct.
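One lightweight way to attach an explanation and a confidence figure to each recommendation is to surface the model's predicted probability alongside its most influential features; a minimal sketch with a tree-based scikit-learn model, where the feature names and data are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = ["network_config_changes", "queue_depth", "error_rate", "deploy_count"]
X, y = make_classification(n_samples=1000, n_features=4, random_state=1)

model = RandomForestClassifier(random_state=1).fit(X, y)

incident = X[:1]
proba = model.predict_proba(incident)[0].max()           # confidence of the prediction
top = np.argsort(model.feature_importances_)[::-1][:2]   # most influential signals overall

print(f"Suggested root cause class: {model.predict(incident)[0]} "
      f"(confidence {proba:.0%}); key signals: {[feature_names[i] for i in top]}")
```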
