Without Quality Data, Artificial Intelligence Is a Shriveled-up Prune

[Image: shriveled-up prunes resembling an AI application with no data]
I’ve got nothing against prunes… Really.

Building Artificial Intelligence (AI) applications in your business without high quality data is like driving a Ferrari without gas. It’s expensive and it gets you nowhere.

Defining Artificial Intelligence

Before dissecting my claim, it’s important to first state that the field of AI is overwhelmingly vast. Currently, Limited Memory AI is the most promising and useful type of AI for business, so it is the focus of this article. Limited Memory AI uses machine learning (ML) algorithms that combine various data sets to establish probabilistic knowledge. These data sets can come from facts (e.g., measurement conversion rates), past events (e.g., previous system transactions), simulations (e.g., weather projections) or even previous algorithm executions (e.g., a toy car better recognizing a wall after running into it multiple times). In essence, the goal of these algorithms is to iteratively crunch data, recognize patterns and establish principles to make predictions within a confidence interval. Rules can then be built around those predictions and confidence intervals to automate actions within your systems.
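
To make this concrete, here is a minimal sketch of that pattern: crunch past events, make a prediction with a confidence interval, and wrap a business rule around it. The supplier lead times, thresholds and messages are invented for illustration, not taken from any real system.

```python
# A minimal sketch of the pattern above: learn from past events,
# predict within a confidence interval, and build a rule around the
# prediction to automate an action. All numbers are illustrative.
import statistics

# Past events: delivery lead times (in days) logged for one supplier.
past_lead_times = [12, 14, 13, 15, 12, 13, 14, 16, 13, 12]

mean = statistics.mean(past_lead_times)
stdev = statistics.stdev(past_lead_times)

# Roughly 95% of future deliveries should fall within ~2 standard
# deviations of the mean (assuming approximately normal data).
low, high = mean - 2 * stdev, mean + 2 * stdev
print(f"Predicted lead time: {mean:.1f} days (95% CI: {low:.1f}-{high:.1f})")

# The automation rule: only act automatically when the prediction is
# confident enough (a narrow interval); otherwise defer to a human.
if (high - low) <= 5:
    print("Auto-schedule the production run.")
else:
    print("Interval too wide: route to a planner for review.")
```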

As you can see, to effectively harness Limited Memory AI, you need quality data. It must be complete, available, and accurate enough to be trusted. Without it, system predictions are worse than human predictions (which are not great to begin with…). So, how can you ensure you have a solid data foundation on which to build your organizational AI capabilities? The answer is simple enough, but attaining it is difficult…

Building a good data foundation requires mastery of data quality management processes for three types of system data: organizational data, master data and transactional data. In short, this involves establishing a set of standard processes and tools to govern your data’s lifecycle, from creation to decommissioning. How do we do this?

Data Quality Management

Organizational Data Quality

First, there is organizational data. This is the easiest type of data to manage as it is the most static. Organizational data represents the entities that make up your organizational structure. These can be concrete entities such as physical plants, office buildings or production lines. These can also be conceptual entities such as divisions, customer segments or departments. Because of its foundational nature, organizational data is naturally updated when there are changes in the organizational structure. These are rarely forgotten as they enable operations and reporting according to the new structure. Everything else “hangs” on these data objects.

Nevertheless, ensuring data quality requires good governance. You should create processes and tools for requesters within the organization to submit their creation/modification requests. These tools should capture all the data required to action these objects within your systems (e.g., web forms). Furthermore, you should have mechanisms in place to verify the validity of organizational data on a consistent basis (e.g., automated yearly questionnaires to data owners). Even better if you have automated triggers to decommission old data when required (e.g., office closure events).
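
As a rough illustration of those verification mechanisms and decommission triggers, here is a hypothetical sketch. The record fields, owners and one-year threshold are all invented; a real implementation would hang off your master data governance tooling.

```python
# A hypothetical sketch of the governance mechanisms above: flag
# organizational records whose owners haven't confirmed them within a
# year, and decommission records tied to a closure event.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class OrgRecord:
    name: str
    owner_email: str
    last_verified: date
    closed: bool = False  # set by an upstream closure event

records = [
    OrgRecord("Plant 0100 - Lyon", "owner1@example.com", date(2024, 1, 15)),
    OrgRecord("Sales Office 22", "owner2@example.com", date(2025, 3, 1)),
    OrgRecord("Plant 0200 - Oslo", "owner3@example.com", date(2023, 6, 1), closed=True),
]

STALE_AFTER = timedelta(days=365)

for rec in records:
    if rec.closed:
        print(f"DECOMMISSION: {rec.name} (closure event received)")
    elif date.today() - rec.last_verified > STALE_AFTER:
        # In practice, this is where the yearly questionnaire goes out.
        print(f"VERIFY: email {rec.owner_email} about {rec.name}")
```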

Master Data Quality

Second, there is master data. This is where things become more complex. This type of data changes continuously. Master data represents the objects needed to carry out business activities, or transactions. For example, in order to execute a purchasing transaction, you need at minimum a supplier, a product and a price. As these objects do not define the organization’s structure, nor can they represent a transaction on their own, they are categorized as master data. Furthermore, sources for master data can be internal to the organization (e.g., a bill of materials for a finished good as determined by the marketing department) or external (e.g., a supplier’s address, pricing, payment and banking information). This makes managing the quality of this type of data difficult.

Internal Master Data

For internal master data, assigning data ownership to the teams responsible for creating that data will help. For example, plant maintenance teams should be responsible for a manufacturing plant’s location address data (where everything is located in the plant). If teams are empowered, staffed and held accountable for data maintenance, they will maintain address data as it changes. Why? When the digital world reflects the physical world, doing a good job is much easier. However, your teams need help transitioning to this model. Consider using a Data Centre of Excellence (CoE) to help them ramp up in competency. You can also turn to automation once your data quality processes are mature.

External Master Data

For external data, you also need data owners. However, as the “real data owners” are outside your organization, this is not sufficient. The ideal setup involves getting a data feed directly from the source that automatically updates your data (e.g., commodity pricing from the reference index). For distributed master data, you might be able to find solution providers that specialize in aggregating, enriching and publishing the type of data you’re looking for. They might even be using their own internal AI/ML technology that runs on a high-quality data foundation to provide this service. However, when this type of feed doesn’t exist, you can fall back on supplementing your process with user knowledge, judgment and/or surveys. Bonus points if you partner with a solution provider to develop the feed you need!

For example, in the case of supplier data, it is the supplier who is the “true data owner”. It is their actions that impact data validity over time. This makes keeping a supplier database up to date difficult. While it’s not impossible to get thousands of your suppliers to regularly update their information in your systems, it’s pretty close. It’s also important to note that attempting it will drive you mad. So, if you can connect your systems to a service that does it automatically for you, doesn’t that make sense? Of course, you will always have internal contextual information to capture as well (e.g., the salesperson assigned to you at a given supplier). Additionally, these data feeds won’t always be 100% accurate. Therefore, coupling data owners with data feeds is crucial to ensuring high-quality external data.
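
Here is a simplified sketch of that coupling. The feed payload, field names and records are hypothetical; real providers each have their own APIs and schemas, so treat this as the shape of the idea rather than an integration recipe.

```python
# A simplified sketch of coupling a data feed with a data owner.

# What the external feed says about the supplier (the "true owner's" data).
feed_record = {
    "supplier_id": "S-1001",
    "address": "42 New Harbour Rd",
    "bank_account": "NO93 8601 1117 947",
}

# What your system currently holds, including internal-only context the
# feed can never know (e.g., your assigned salesperson at the supplier).
internal_record = {
    "supplier_id": "S-1001",
    "address": "17 Old Dock St",
    "bank_account": "NO93 8601 1117 947",
    "salesperson": "M. Dubois",  # internal context: never fed externally
}

EXTERNAL_FIELDS = {"address", "bank_account"}

for field in EXTERNAL_FIELDS:
    if internal_record[field] != feed_record[field]:
        # Feeds aren't 100% accurate: surface the change to the data
        # owner for review instead of silently overwriting it.
        print(f"Review {field}: '{internal_record[field]}' -> '{feed_record[field]}'")
```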

Transactional Data Quality

Finally, there is transactional data. This type of data is also difficult to manage. It represents all the different variations of business processes executed within your organization. Every day, every time someone interacts with a system to execute an action, they are logging a transaction. Purchase orders, goods receipts, invoices, sales orders, production orders, employee onboarding records and new system users are all transactional data. Each type of transaction, say creating a purchase order, can also contain multiple variations (e.g., Stock, Non-Stock, Subcontracting, Consignment, Free Goods, etc.). This represents hundreds if not thousands of different processes, all generating transactional data. Without oversight, you quickly end up with a “data box of chocolates”… You never know what you’re going to get!

Governing Processes

In this case, it is not the transactional data itself that needs to be governed. Rather, it is the overarching processes that need owners. Processes are made up of transactions, so if each process owner ensures a coherent end-to-end process from both a business and an application perspective, that process will produce high-quality data outputs. When analyzed, these outputs will provide meaningful insights about how to further optimize your business. To achieve this, identify and catalogue your business processes, assign each one to an owner, and then formally define and translate them into system processes and transactions.

In short, start by identifying and cataloguing your processes in a business process hierarchy (BPH). You can use the APQC Process Classification Framework as a starting point. Assign each key process a business owner who is responsible for defining, overseeing, managing and optimizing it. This should be a senior person within the associated function. When owners focus on process measurements such as data quality output, continuous improvement of process outcomes becomes top of mind. This leads to increasingly sophisticated “application guardrails” as process and application weaknesses are identified and resolved with new process activities, system configurations, developments or add-ons.
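
A bare-bones sketch of what such a catalogue might look like follows. The process names loosely echo APQC-style groupings, but the codes, owners and data quality scores are invented for illustration.

```python
# A bare-bones sketch of a business process hierarchy (BPH) catalogue
# with owners and a data quality output measurement per process.
from dataclasses import dataclass

@dataclass
class Process:
    bph_code: str   # position in the hierarchy, e.g. "4.2.1"
    name: str
    owner: str      # senior business owner accountable for the process
    dq_score: float # share of transactions passing data quality checks

catalog = [
    Process("4.2.1", "Create purchase order (stock)", "VP Procurement", 0.96),
    Process("4.2.2", "Create purchase order (subcontracting)", "VP Procurement", 0.81),
    Process("4.3.1", "Post goods receipt", "Director Logistics", 0.99),
]

# Surface the weakest processes so owners know where guardrails are needed.
for p in sorted(catalog, key=lambda proc: proc.dq_score):
    flag = "NEEDS GUARDRAILS" if p.dq_score < 0.90 else "ok"
    print(f"{p.bph_code} {p.name} ({p.owner}): {p.dq_score:.0%} {flag}")
```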

A Mountain to Climb

As I’m sure you’ve realized by now, mastering quality for these three types of data across your organization is no small endeavor. In fact, I don’t think I’ve ever seen it done across a whole organization.

Should that deter you?

No.

Do not surrender progress to perfection. Start small. Master data quality for the areas of your business where Limited Memory AI feasibility and benefits align. This type of approach does not need to be widespread within your organization to have massive impact.

Start Small

For example, let’s explore the purchasing (P2P) process. Why not start with data owners for your top 20 vendors? Or, by assigning process owners for your top 5 most executed purchasing processes? If these are already optimized, start with troubled vendors/processes. The Pareto principle usually applies here (80% of your problems come from 20% of your vendors/processes). Give your team a set of ground rules (a short training, document templates, terminology, etc.) and begin. Adjust as needed along the way. As you start cultivating a data quality mindset with your team and tracking the results, the initiative will snowball and grow by itself. As you create “believers” on your team, you will uncover opportunities to automate the various data quality management processes you’ve built.
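
A quick sketch of that Pareto check: rank vendors by the data quality issues they generate and see how few account for most of the problems. The vendor names and issue counts below are made up.

```python
# Rank vendors by data quality issues and find the cutoff where the
# cumulative share crosses 80% (the Pareto check described above).
issues_by_vendor = {
    "Vendor A": 120, "Vendor B": 85, "Vendor C": 40,
    "Vendor D": 12, "Vendor E": 8, "Vendor F": 5,
    "Vendor G": 3, "Vendor H": 2, "Vendor I": 2, "Vendor J": 1,
}

total = sum(issues_by_vendor.values())
ranked = sorted(issues_by_vendor.items(), key=lambda kv: kv[1], reverse=True)

running = 0
for i, (vendor, count) in enumerate(ranked, start=1):
    running += count
    print(f"{vendor}: {count} issues ({running / total:.0%} cumulative)")
    if running / total >= 0.80:
        print(f"-> top {i} of {len(ranked)} vendors cause 80% of issues")
        break
```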

Develop Artificial Intelligence Literacy

In parallel, work on developing your team’s Artificial Intelligence literacy. Once you have acceptable data quality levels, the real difficulty becomes identifying valid opportunities for the use of Limited Memory AI within your business. You must be able to separate the art of the possible from the empty promises. As a starting point, you now know that high-quality data is required for the effective use of Limited Memory AI, regardless of its source. When AI initiatives come up in internal meetings or supplier sales pitches, you can now ask the most important question when considering the proposal: “Where does the underlying data come from?”

How to “Mr. Miyagi” Data Quality

Once you understand data quality management principles and pair that knowledge with Artificial Intelligence literacy, you may start asking yourself: “Couldn’t we use Limited Memory AI to help automate the data quality management process?” That’s when you’re ready to fight Johnny Lawrence in the All Valley Karate Championship…

As you design data quality management processes and assign data owners to different types of objects, you will undoubtedly realize that mastering data quality is hard. For master data, the job is never done because data is always changing and you are working with imperfect information (e.g., supplier data). For transactional data, new process variations constantly pop up because of specific just-in-time business requirements. These are typically executed outside the system and patched over with accounting entries, or shoehorned into existing system processes they were never designed for. All of this contributes to data quality degradation.

The Tools That Will Help

The most impactful Limited Memory AI tools on the market today are those that help you proactively manage these data quality problems at the source by automating a large part of the processes described above. These applications recognize that it is unreasonable to expect humans to keep data quality perfect, and that you are most likely starting your journey with low-quality data. They run machine learning algorithms that continuously and iteratively improve data quality levels using a combination of available data and user inputs. Over time, as the application develops operational rules based on usage, it can get you pretty close to perfect. Better data may not be perfect data, but it sure as heck is better than bad data.
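
A toy sketch of that loop follows: an algorithm flags likely data quality problems, users confirm or reject the findings, and confirmed decisions feed back into the next pass. Real products are far more sophisticated; the product IDs, prices and thresholds here are all invented.

```python
# Flag likely bad data, collect user feedback, and get quieter on each
# pass as confirmed decisions accumulate.
import statistics

unit_prices = {"P-1": 10.2, "P-2": 9.8, "P-3": 10.5, "P-4": 98.0, "P-5": 10.1}
confirmed_ok: set[str] = set()  # user feedback accumulated over time

def flag_outliers(prices: dict[str, float], ok: set[str]) -> list[str]:
    med = statistics.median(prices.values())
    mad = statistics.median(abs(v - med) for v in prices.values())
    # Use robust statistics (median/MAD) so the outlier itself doesn't
    # distort the baseline; flag anything more than 5 MADs from the
    # median, unless a user has already confirmed it as legitimate.
    return [pid for pid, v in prices.items()
            if abs(v - med) > 5 * mad and pid not in ok]

print(flag_outliers(unit_prices, confirmed_ok))  # ['P-4'] -> sent for review
confirmed_ok.add("P-4")                          # reviewer: price is correct
print(flag_outliers(unit_prices, confirmed_ok))  # [] -> nothing left to review
```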

In Conclusion

Once your quality data foundation is in place, this opens up additional opportunities to leverage Limited Memory AI to recognize patterns and (more reliably!) predict the future across your business.

Don’t think of AI initiatives as an all-or-nothing, now-or-never proposition. They are a journey that promises exponential returns but requires long-term vision and continuous work.

Take your cue from the plum tree… On average, it takes 6 years for a plum tree to bear fruit. Then, you either put in the work to pick the fruit or you end up with a bunch of shriveled-up prunes.

What will your data be doing in 6 years?

Summary

  • Limited Memory AI is currently the most viable form of Artificial Intelligence for business purposes.
  • In order to benefit from these possibilities, data quality must be high.
  • To improve data quality, you must tackle three kinds of data: organizational, master and transactional.
    • For organizational and master data, position data creation and maintenance actions as close to the true data owners as possible.
    • For transactional data, use process owners and process management techniques to get high data quality outputs from your processes.
  • Start small, with the data objects and processes where higher data quality levels and Limited Memory AI functionality will provide the most benefit.
  • As you augment your data quality management maturity, work on automating processes where possible and developing your organization’s Artificial Intelligence literacy.
    • Automate the correction of data consistency problems (e.g., set up business rules around “illegal” report line item combinations, with notifications; see the sketch after this list). With this in place, data owners can act as reviewers instead of researchers.
    • You need to know how to spot real Limited Memory AI opportunities within your business. You also need to be able to spot empty promises to avoid building/purchasing AI “shelf-ware”.
  • The shortcut in this process is purchasing/building Limited Memory AI applications that help you automate data quality management processes (such as maintaining master data or mining & defining processes). These applications greatly reduce or eliminate the data owner’s operational task load in data quality management processes.
    • These are the types of “AI applications” that show the most concrete value today on the market.
    • As these applications are adopted more widely, new AI applications that need high data quality levels will emerge and be viable for use by organizations who have a data foundation in place.
  • How are you working on your data foundation? Remember the plum tree!
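
As promised above, here is a minimal sketch of the “application guardrail” idea: encode illegal field combinations as business rules, run them over report line items, and notify the owner. The accounts, cost centers and rules are invented for illustration.

```python
# Encode illegal combinations as rules and surface violations to the
# data owner, turning them into reviewers instead of researchers.
line_items = [
    {"id": 1, "account": "travel", "cost_center": "CC-PROD", "amount": 540.0},
    {"id": 2, "account": "raw_materials", "cost_center": "CC-SALES", "amount": 1200.0},
]

# Each rule: a human-readable name and a predicate that returns True
# when a line item violates it.
rules = [
    ("Raw materials may not post to a sales cost center",
     lambda li: li["account"] == "raw_materials" and li["cost_center"].startswith("CC-SALES")),
    ("Amounts must be positive",
     lambda li: li["amount"] <= 0),
]

for li in line_items:
    for name, violated in rules:
        if violated(li):
            # In a real setup, this would open a ticket or notify the owner.
            print(f"Line {li['id']}: {name}")
```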

———————————
What other mechanisms have you used to master data quality? What other factors do you see contributing to the success of AI initiatives in organizations? What are your AI initiative lessons learned? Let me know in the comments.
