More data may not always be better, but some is needed.
Big Data is all the buzz these days, but what does it mean? You're not alone if Big Data leaves you scratching your head like Benjamin Braddock in "The Graduate" when Mr. McGuire says "plastics."
Data has become synonymous with credibility because data speaks for itself: It's accessible, unassailable, even democratic in how fairly it seems to present the facts. The experts always fall back on "the data." If data is so trustworthy, so untainted, then it's easy to imagine how Big Data can sound utopian.
This may be why Big Data has spawned something like the industrial version of a New Age faith in data as a Rosetta stone for making sense of chaos, for distilling random and inexplicable events into understandable sequences and avoidable outcomes. Big Data holds the promise of omniscience, the power to foresee every conceivable outcome before it happens and to steer us away from all manner of bad and costly mistakes. It's exciting.
The truth is slightly less exhilarating because data doesn't come with the answers or predictions it's supposed to deliver. Data is just data. It's inanimate and inarticulate, and making data say meaningful things, especially if there's a lot of it, is complicated. It takes tools, knowledge and domain expertise for Big Data to live up to its potential, because more is not necessarily better and may, in many cases, be worse: it complicates the job of analysis.
So how much data is enough? The answer depends on the effectiveness of the process for separating wheat from chaff, on the knowledge and experience of the people designing and managing that process, and on their demonstrated ability to find meaningful information in clouds of background noise or even gibberish. Performance enveloping illustrates the problem.
Performance enveloping is the process of monitoring hundreds and sometimes thousands of data points and using them to take a snapshot of a moment in time during which a process (mechanical or otherwise) appears to be performing optimally. The snapshot creates a picture or envelope of parameters that signal optimal performance. Analysts compare data subsequently captured in real time to the snapshot of optimal performance to expose deviations in performance. Statistically significant deviations can suggest a process has gone awry.
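The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular vendor's method: the article does not say how the envelope is computed, so the sketch substitutes a simple band of the mean plus or minus k standard deviations per parameter. The function names, the parameter names (discharge_temp_c, vibration_mm_s), the readings and the threshold k = 3 are all hypothetical.

```python
from statistics import mean, stdev


def build_envelope(baseline, k=3.0):
    """Build a per-parameter band from readings taken while the process
    appeared to be performing optimally.

    baseline: dict mapping parameter name -> list of readings.
    k: assumed band width in standard deviations (hypothetical choice).
    Returns a dict mapping parameter name -> (low, high).
    """
    envelope = {}
    for name, readings in baseline.items():
        centre = mean(readings)
        spread = stdev(readings)  # sample standard deviation
        envelope[name] = (centre - k * spread, centre + k * spread)
    return envelope


def find_deviations(envelope, sample):
    """Return the parameters in a new sample that fall outside the envelope."""
    deviations = {}
    for name, value in sample.items():
        low, high = envelope[name]
        if not low <= value <= high:
            deviations[name] = value
    return deviations


# Hypothetical snapshot of a compressor running well.
baseline = {
    "discharge_temp_c": [80.0, 81.0, 79.0, 80.5, 79.5],
    "vibration_mm_s": [2.0, 2.1, 1.9, 2.0, 2.0],
}
envelope = build_envelope(baseline)

# Hypothetical readings captured later in real time.
normal_sample = {"discharge_temp_c": 81.0, "vibration_mm_s": 2.1}
suspect_sample = {"discharge_temp_c": 81.0, "vibration_mm_s": 2.6}

print(find_deviations(envelope, normal_sample))
print(find_deviations(envelope, suspect_sample))
```

Note what the sketch does and does not do: it reports that the vibration reading has left the envelope, but it says nothing about why, how serious the deviation is, or what to do about it.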
But once the data deviates from the optimal norms, then what? Is the abnormal data the cause, the effect or just a symptom? And how serious is the problem? Knowing something is wrong is very different from understanding the problem and further still from knowing how to fix it, and all this precedes estimating risk (to prioritize the seriousness of the detected fault) and determining accountability (because someone will have to pay). Is it a design flaw or human error? And if human, is it a systematic or cultural problem, a random or predictable event?
Data can point fingers, so who analyzes it matters. To the uninitiated, Big Data may sound like it can reduce cause and effect to black-and-white events detectable by "if, then" logic and algorithms engineered from a blank sheet of paper. Experienced plant managers know better.
Understanding how data and events may be correlated, especially in complex industrial or mechanical processes, still requires domain expertise—knowledge and experience augmented by informed inference, intuition, intelligence and logic. These are uniquely human faculties on which we all depend all the time to make sense of the world. It's why experience and knowledge matter.
Can a lot of the "art" of domain expertise be reduced to code? Certainly. Data has always been the medium of choice for translating inference and intuition into describable, predictable and sometimes correlated phenomena that can reduce if not eliminate our dependence on informed guesswork. But the process has always been iterative and dynamic. Transforming intuition into logic, art into science, is a journey of accumulation during which knowledge becomes so specialized by virtue of perpetually building on itself that it becomes all but unintelligible to anybody but experts.
The truth is, for all its promise, Big Data may not be accessible or democratic at all, because the process of decoding unexplained events and compiling new knowledge has the curious side effect of concentrating expertise with the experts, and at an accelerating rate. As applications of Big Data evolve, they give birth to entirely new lexicons, each one unique to the domain expertise of the analysts and scientists exploring new frontiers in performance, until the experts are speaking their own language, a language foreign to everyone but themselves. Fluency in foreign languages should never be underestimated. A product I received recently came with the following warning: "Do not overcharge or your battery may catch a fever." You would not want this translator interpreting the Big Data outputs of your fever-prone 350,000-hp compressor. Ironically, the misuse of a common phrase demonstrates how profoundly Big Data applications will depend on expert interpretation.
Human nature yearns for pattern recognition. It wants to see rational design in chaos and will leap to conclusions that seem to fit, phrases that appear to make sense. This is Big Data's deathtrap. Interpreting the data incorrectly can be catastrophic, and with more data—Big Data—the nuances become more subtle, the meanings more complex.
Does that mean Big Data's utopian promise is at risk? Yes and no. Experts with the tools and domain expertise to harvest Big Data will multiply their knowledge and influence, which will increase their value in the marketplace. Life will be good, even utopian, for the experts who find ways to harness Big Data to enrich products and services. They will make the world a better place. We will know more, fail less and produce more efficiently in safer environments. But the spoils will go to the experts. It's as old as the rich getting richer because knowledge, like wealth, compounds.
Unlike retail or finance, industry has not excelled at accumulating knowledge. The distinct and distributed nature of production operations has been a natural obstacle to the study of common industrial challenges and opportunities. That will change as the Industrial Internet of Things (IIoT) creates the same digital fabric of interconnectedness that we see in financial and retail markets. Maintaining competitiveness will demand participation, and participating will expose flaws and opportunities.
Leaping into the Big Data stream and submitting to the diagnostic scrutiny of experts will feel anything but utopian, but doing it early will pay off. Early adopters will benefit disproportionately by leveraging novel insights, and by developing fluency in the language of performance that the domain experts will use in the process of deploying their Big Data applications. We will know Big Data has truly taken hold when the buzz about Big Data subsides, and the new hot topic is the power of performance analytics. This will be the sign that Big Data is finally saying meaningful things. Will Big Data learn to speak on its own, and how will its insightfulness evolve?
These are the questions the IT-heavy, billion-dollar IIoT platform designers seem committed to answer, even though Big Data is in use today by domain experts who routinely apply scalable, expert diagnostic systems to reduce cost, improve safety, increase production and optimize efficiency. They are already compiling the data and codifying the knowledge that is laying the foundations for standards of performance, which are already in use to benchmark and improve performance.
It's possible that holistic platforms, self-learning machines, optimization algorithms and automated everything will eventually render the domain expert superfluous. But I wouldn't count on it in industry, because even retail and especially finance still depend on these masters of the universe.
How should performance be defined? How do you cost and price it? And how much of performance belongs to the vendor vs. the buyer/operator? These are only a few of the questions Big Data users and application developers are rushing to answer. And because Big Data is a language, the meanings it can express are virtually limitless. Unfortunately, data has limits, starting with the finite universe of events it's capturing. Data cannot know what it does not measure, and this is where selling, measuring and managing performance
First published in Plant Engineering magazine, June 2016.