Quality Data has to Lead the AI Charge

Posted by IntegrityM | Data & Statistical Analysis, Hot Topics, Technology & AI

Author: Natasha Williams, Chief Operating Officer

There is a wave of excitement in the industry surrounding artificial intelligence (AI), and not without reason. AI, with its ability to synthesize massive quantities of information at lightning speed, promises limitless possibility. I share in the collective optimism surrounding this new frontier. The detection and prevention of fraud, waste, and abuse (FWA) in healthcare can feel overwhelming in scope, and tools as potentially powerful as AI would be significant assets in the fight to protect vulnerable populations and taxpayer dollars.

That being said, many of our current conversations about AI, machine learning, and emerging technology tend to downplay one foundational truth that underpins any future successes in AI implementation – without the right data, Artificial Intelligence…isn’t.

The value AI brings to the table lies in its computing power. It can run sophisticated analysis on terabytes of data in minutes and draw connections at speeds that humans simply cannot replicate. The quality of its results, however, depends entirely on the quality of its data. At the end of the day, AI algorithms execute a task based on the information they are given. If fed poor information, they will inevitably return poor (or even unusable) results.

I’ll illustrate with a relevant example. Data analysts develop an algorithm to identify potentially aberrant providers, producing a list of potential leads for fraud investigation. The algorithm is designed to detect a common red flag for potential fraud: cases where a provider prescribes an expensive therapy (typically requiring extensive diagnostics) to a beneficiary whom they have never previously met. The AI does its job beautifully and returns a list several pages long, which is passed on to the team of investigators. Soon enough, the analytics team hears back: most of the leads they flagged are actually false positives that require no further investigation.

What went wrong?

The analytics team hadn’t realized that they were missing one key piece of information: in this specific scenario, attending physicians on hospital claims have a prior relationship with the beneficiary. In these frequent cases, patients may well have been run through extensive diagnostics by a hospital and, as a result, been prescribed an entirely valid treatment by this “red flag” provider. There was no way for the algorithm to pick up on this nuance without being specifically told by a human. It needed expertly informed, quality data to do its job correctly.
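To make the scenario concrete, here is a minimal Python sketch of how such a flag might behave with and without the hospital context. It is not the analytics team's actual algorithm. The record layout, the field names (`provider`, `beneficiary`, `claim_type`, `attending`), and the sample claims are all assumptions invented for illustration.

```python
from datetime import date

# Hypothetical, hand-made claims; field names and values are illustrative only.
prior_claims = [
    # Office visit: DR_A has seen beneficiary B1 before.
    {"provider": "DR_A", "beneficiary": "B1", "claim_type": "professional",
     "service_date": date(2020, 1, 10), "attending": None},
    # Hospital claim: billed by the hospital, with DR_B listed as attending for B2.
    {"provider": "HOSP_1", "beneficiary": "B2", "claim_type": "hospital",
     "service_date": date(2020, 2, 3), "attending": "DR_B"},
]

expensive_prescriptions = [
    {"provider": "DR_A", "beneficiary": "B1", "service_date": date(2020, 3, 1)},
    {"provider": "DR_B", "beneficiary": "B2", "service_date": date(2020, 3, 1)},
    {"provider": "DR_C", "beneficiary": "B3", "service_date": date(2020, 3, 1)},
]


def known_pairs(claims, use_attending):
    """Collect (provider, beneficiary) pairs with a documented prior relationship."""
    pairs = set()
    for c in claims:
        pairs.add((c["provider"], c["beneficiary"]))
        # The "informed" version also counts attending physicians on hospital claims.
        if use_attending and c["claim_type"] == "hospital" and c["attending"]:
            pairs.add((c["attending"], c["beneficiary"]))
    return pairs


def flag_leads(prescriptions, claims, use_attending):
    """Flag prescriptions where no earlier claim links provider and beneficiary."""
    leads = []
    for rx in prescriptions:
        earlier = [c for c in claims if c["service_date"] < rx["service_date"]]
        if (rx["provider"], rx["beneficiary"]) not in known_pairs(earlier, use_attending):
            leads.append((rx["provider"], rx["beneficiary"]))
    return leads


naive_leads = flag_leads(expensive_prescriptions, prior_claims, use_attending=False)
informed_leads = flag_leads(expensive_prescriptions, prior_claims, use_attending=True)
print(naive_leads)
print(informed_leads)
```

On this toy data, the naive version flags both DR_B and DR_C, while the informed version flags only DR_C. The logic is identical in both runs; the only difference is whether the data carries the hospital relationship that a subject matter expert would know to look for.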

This is just one example, but there are many ways seemingly small deficiencies can be magnified by strict computer algorithms to produce unsatisfactory results. To avoid this, I propose a new primary paradigm for AI-bound datasets: the IACC (Informed, Accurate, Current and Comprehensive) standard.

IACC – Informed, Accurate, Current and Comprehensive


Informed

Information is only as useful as the question it answers. Before you even enter the collection stage for data, ensure you have a specific question or problem you’re trying to solve with the information you’re gathering. Knowing what you need to know will be the best guide to building a robust data set.

I’ll illustrate with a simple, non-medical example. Let’s say that you are looking to analyze the impact of sunlight on plant growth. There are a number of obvious variables that you’ll need to capture, plant height and sunlight exposure being the most readily apparent. However, you’ll also want to consider potentially confounding variables in your dataset. Soil type, provision of water, plant species, temperature, other growth treatments: failing to account for any of these in your analysis can lead to a skewed (and most likely downright inaccurate) conclusion.
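A small Python sketch shows how a missing variable can distort the answer. The measurements below are made up purely for illustration, and the choice of watering schedule as the confounder is an assumption. In this toy dataset, the sunnier plants also happen to be watered more often.

```python
from statistics import mean

# Made-up measurements for illustration: (sunlight, watering, height_cm).
plants = [
    ("high", "daily", 30), ("high", "daily", 32), ("high", "daily", 31),
    ("high", "weekly", 20),
    ("low", "daily", 28),
    ("low", "weekly", 18), ("low", "weekly", 19), ("low", "weekly", 17),
]


def mean_height(rows, sunlight, watering=None):
    """Average height for a sunlight group, optionally within one watering schedule."""
    heights = [h for s, w, h in rows
               if s == sunlight and (watering is None or w == watering)]
    return mean(heights)


# Comparison that ignores watering entirely.
naive_gap = mean_height(plants, "high") - mean_height(plants, "low")

# Same comparison, made separately within each watering schedule.
gap_by_watering = {
    w: mean_height(plants, "high", w) - mean_height(plants, "low", w)
    for w in ("daily", "weekly")
}

print(naive_gap)
print(gap_by_watering)
```

Ignoring watering, high-sunlight plants look 7.75 cm taller on average. Comparing like with like, the gap shrinks to 3 cm for daily-watered plants and 2 cm for weekly-watered plants. A dataset that never recorded watering would credit sunlight with far more of the difference than it deserves.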

The same tenets apply, at a more advanced level, to the healthcare arena. Without an understanding of the question one is asking and what it entails, one runs the risk of compiling datasets that ignore important contextual components.  In compiling datasets, it is crucial to recruit experienced subject matter experts (SMEs) who can provide this understanding and guide the selection of relevant information.


Accurate

This is the most obvious and most crucial component of a quality data set. Whatever you’re measuring, make sure that it’s captured correctly.

Pin this down first: have humans validate information, cross-reference, and re-check the facts that your data sets contain. To do this thoroughly, you’ll likely need data experts and subject matter experts to collaborate on meaning and operationalization.
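Automated checks cannot replace that human validation, but they can help decide where reviewers should look first. The Python sketch below is one possible way to build a review queue. The field names and the three checks are hypothetical examples, not a complete or authoritative rule set.

```python
# Hypothetical records; none of these field names come from a real claims system.
records = [
    {"claim_id": "C1", "provider": "DR_A", "paid_amount": 120.0},
    {"claim_id": "C2", "provider": "", "paid_amount": 80.0},
    {"claim_id": "C3", "provider": "DR_C", "paid_amount": -15.0},
    {"claim_id": "C1", "provider": "DR_A", "paid_amount": 120.0},
]


def review_queue(rows):
    """Return (claim_id, reason) pairs that a human reviewer should re-check."""
    issues = []
    seen = set()
    for r in rows:
        if not r["provider"]:
            issues.append((r["claim_id"], "missing provider"))
        if r["paid_amount"] < 0:
            # May be a legitimate adjustment, which is why a person decides.
            issues.append((r["claim_id"], "negative paid amount"))
        if r["claim_id"] in seen:
            issues.append((r["claim_id"], "duplicate claim id"))
        seen.add(r["claim_id"])
    return issues


queue = review_queue(records)
for claim_id, reason in queue:
    print(claim_id, reason)
```

The checks only surface candidates. Whether a negative amount is an error or a valid adjustment is exactly the kind of judgment that needs a subject matter expert.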


Current

Information changes. Context changes. Even if your dataset is close to perfect, the rules you’ll apply to it won’t remain static. The regulatory landscape of 1992 is not the one we operate in now.

Ensure that the rules you’re using, the laws you’re applying, and the measures you’re gathering are all reflective of the most recent documentation available.
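One way to keep rules current in practice is to store each rule with the date it took effect and look up the version that applies to a given date of service. The Python sketch below assumes a hypothetical dollar threshold with two versions. The dates and amounts are invented and do not correspond to any real regulation.

```python
from datetime import date

# Hypothetical rule history: (effective_from, dollar_threshold). Values are invented.
threshold_versions = [
    (date(2015, 1, 1), 1000.0),
    (date(2019, 7, 1), 1500.0),
]


def threshold_on(service_date, versions):
    """Return the threshold in effect on service_date, or None if none applies yet."""
    applicable = [(start, value) for start, value in versions if start <= service_date]
    if not applicable:
        return None
    # Tuples compare by effective date first, so max() picks the latest version.
    return max(applicable)[1]


print(threshold_on(date(2018, 5, 1), threshold_versions))
print(threshold_on(date(2020, 1, 1), threshold_versions))
print(threshold_on(date(2010, 1, 1), threshold_versions))
```

Here a 2018 claim is judged against the 2015 version of the rule and a 2020 claim against the 2019 version. A claim that predates every known version returns `None` rather than silently borrowing a rule that did not exist at the time.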


Comprehensive

I’ll refer again to the example I provided earlier. The data analytics team had checked all of the other boxes for a quality data set: their algorithm was working with raw material that was informed, accurate and current. Its Achilles’ heel was missing the full picture: the information wasn’t comprehensive enough to produce reliable results. It’s easier said than done, but ensuring that you have the full (sometimes complex) picture of the healthcare scenario you’re looking at is indispensable when optimizing machine-driven solutions.

Quality IACC data takes work to compile and maintain, and in many cases will require the backing of data analysts and area SMEs. This work, though, is the first and most fundamental step for all organizations that want to leverage AI. As we pave the way to implement widespread technological innovation, let’s ensure that we’re equipped to train our AI with the informed, accurate, current and comprehensive data it will need to succeed.

Integrity Management Services, Inc. (IntegrityM) is a certified women-owned small business, ISO 9001:2015 certified, CMMI Level 3 Services and FISMA compliant organization. IntegrityM was created to support the program integrity efforts of federal and state government programs, as well as private sector organizations. Results are achieved through consulting services such as statistical and data analytics, technology solutions, compliance, audit, investigation, medical review, and training.

Copyright © 2021 Integrity Management Services, Inc. All Rights Reserved.
