Christopher Savoie, PhD, is the CEO & founder of Zapata AI. He is a published scholar in medicine, biochemistry and computer science.
Generative AI is continuing to make headlines for its impressive ability to create text and images on command through well-known tools like ChatGPT. But text and images are just the beginning of what generative AI can do—particularly when it comes to enterprise use cases.
In my last article for the Forbes Technology Council, I covered some underreported enterprise use cases for generative AI, including industrial optimization problems. I also discussed how to reduce the costs of generative AI tools trained using a company’s private IP and data by compressing large language models (LLMs).
Here, I’ll discuss how generative AI can enhance a core capability of a 21st-century enterprise: data analytics.
Generative AI And Analytics
Data analytics—whether descriptive, predictive, or prescriptive—are widely used across industries to help leaders make more informed decisions. But analytics are only as good as the data that fuels them, and without all the right data, analytics can suffer. Generative AI can help fill in the gaps and generate statistically accurate scenarios and simulations for consideration by enterprise decision-makers.
There are many reasons why data can be difficult to collect. In some cases, data access is restricted by legal protections, privacy regulations or ethical concerns, as with healthcare data. For other data of interest, such as data on stock market crashes, the sample size is too small for meaningful analysis. Cost constraints can also limit data collection, as can physical limitations, such as where sensors can be placed on industrial equipment. Importantly, data is often insufficient for underrepresented populations because, by definition, they are statistically underrepresented.
In these cases where data is difficult or impossible to collect or is simply unavailable, generative AI can generate synthetic data that follows the probability distribution of the real data. This way, generative AI can augment incomplete or missing datasets to enrich analytics and train machine learning models.
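To make this concrete, here is a minimal sketch (in Python, using scikit-learn) of the pattern: fit a generative model to scarce real data, then sample distribution-matched synthetic rows to augment it. The Gaussian mixture here is a deliberately simple stand-in for more powerful generative models, and the "real" data is simulated for illustration.

```python
# A minimal sketch of distribution-matched synthetic data, using a
# Gaussian mixture as a stand-in for a more powerful generative model.
# The "real" data here is simulated purely for illustration.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical real dataset: 200 rows of two correlated numeric
# features (e.g., sensor readings that are expensive to collect).
real = rng.multivariate_normal(
    mean=[10.0, 5.0], cov=[[2.0, 1.2], [1.2, 1.5]], size=200
)

# Fit a generative model to the joint distribution of the real data.
model = GaussianMixture(n_components=3, random_state=0).fit(real)

# Sample 1,000 synthetic rows that follow the learned distribution.
synthetic, _ = model.sample(1000)

# Augment the scarce real data for downstream analytics or training.
augmented = np.vstack([real, synthetic])
print(real.shape, synthetic.shape, augmented.shape)
```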
Synthetic data isn’t exactly new: the concept has existed since the 1970s. But synthetic data capabilities have improved dramatically in recent years due to advances in AI and modeling. Gartner has predicted that by 2024, 60% of all data used for developing AI and analytics will be synthetically generated.
However, recent advances, spurred in part by the generative AI gold rush, portend even greater improvements for synthetic data and the business decisions that flow from it.
Inferring Data For Unmeasurable Variables
In many cases, key data isn’t just expensive or difficult to collect; it’s impossible to measure directly. In automotive and aerospace design, downforce and drag have a major impact on the driving or flying experience, yet neither can be measured directly. The same is true of various risky lifestyle choices of interest to insurance companies. Likewise, it is impossible to directly measure an individual’s placebo response, a major confounding variable in determining a drug’s effectiveness in clinical trials.
Using complex mathematical models, data for each of these variables can be inferred from correlations with measurable data. This isn’t just generating new rows in a dataset; it’s generating new columns. The result is a more complete picture of the ground truth, one that can reduce trial and error, create more accurate simulations and ultimately save costs across industries.
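As a rough illustration of how a new column can be inferred, the sketch below assumes a small labeled sample exists (say, drag measured in a wind tunnel for 100 designs) alongside cheap measurable features for thousands of designs; a model learns the correlation on the labeled sample and fills in the column everywhere else. All data, feature names and the physics stand-in are hypothetical.

```python
# A sketch of inferring an unmeasurable column from measurable ones.
# Assumption: ground truth (drag) exists only for a small tested
# sample, while measurable features exist for every design.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Measurable features for 5,000 hypothetical designs
# (e.g., frontal area, speed, ride height).
X_all = rng.uniform([1.5, 20.0, 0.10], [2.5, 60.0, 0.20], size=(5000, 3))

# Ground truth is available for only 100 wind-tunnel-tested designs.
labeled_idx = rng.choice(len(X_all), size=100, replace=False)
# Toy stand-in for the true physics relating features to drag.
drag_labeled = 0.5 * X_all[labeled_idx, 0] * X_all[labeled_idx, 1] ** 2

# Learn the correlation between measurable features and drag ...
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_all[labeled_idx], drag_labeled)

# ... then infer the "new column" for every design in the dataset.
drag_inferred = model.predict(X_all)
dataset_with_new_column = np.column_stack([X_all, drag_inferred])
print(dataset_with_new_column.shape)  # (5000, 4)
```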
Synthetic data of this kind is not to be confused with mock, “fake” data: it is rooted in the statistical properties of the real, observable data. Unlike a mock data generator, a generative model can extrapolate the rules that govern the interrelations among variables and use those rules to create realistic, novel data and scenarios.
Combining LLMs And Synthetic Data
The power of LLMs in applications like ChatGPT speaks for itself. But LLMs are also valuable for generating synthetic text-based data.
One example is AI-based clinical decision support or prescriptive analytics in healthcare. Biases in training data are a significant ethical concern with AI in healthcare. Training data for clinical AI is already difficult enough to obtain due to privacy regulations, and even more so for minorities and rare clinical cases.
LLMs could augment these training datasets with synthetic case studies for underrepresented communities or uncommon medical scenarios. Learning from these augmented datasets, clinical AI tools can mitigate bias and more reliably support clinicians with care plans for underrepresented patients.
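A rough sketch of what that might look like with the OpenAI Python client is below. The model name, prompt and system message are illustrative assumptions, and any synthetic case study would need clinician review and bias auditing before entering a training set.

```python
# A rough sketch of LLM-generated synthetic clinical text. The model
# name and prompts are illustrative assumptions; outputs are entirely
# fictional drafts requiring clinician review before any use.
# Requires OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Write a synthetic, entirely fictional clinical case study of a "
    "rare presentation of Kawasaki disease in an adult patient from "
    "an underrepresented population. Include history, exam findings, "
    "labs and a care plan. Do not reference any real patient."
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any capable chat model could be used
    messages=[
        {"role": "system", "content": "You generate synthetic medical "
         "training data. Never include real patient information."},
        {"role": "user", "content": prompt},
    ],
    temperature=0.9,  # higher temperature for more varied case studies
)

synthetic_case = response.choices[0].message.content
print(synthetic_case)
```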
Generating High-Quality Synthetic Data From A Limited Dataset
In generative modeling, generalization refers to a model’s ability to generate high-quality samples that don’t just repeat the training data. Generalization is of course highly desirable for any synthetic data problem, but it can suffer when the training data is limited.
However, recent research has suggested that quantum generative models, thanks to their unique statistical properties, may have an advantage over classical neural networks in generalizing from limited data. In that study, quantum-inspired mathematical models were run on classical hardware (GPUs, specifically) but were forward-compatible with quantum hardware.
Though limited today by low qubit counts and high error rates, quantum hardware is on track to mature to the point where it can be used in production for enterprise use cases. In other words, generative models based on quantum statistics, but running on classical hardware like GPUs and CPUs, can provide an advantage today for synthetic data use cases with limited training data—with the potential for an even greater advantage on tomorrow’s advanced quantum hardware.
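Whatever the model family, generalization can be checked empirically: synthetic samples should match the statistics of the training data without simply copying its rows. Below is a minimal sketch of such a check; the novelty threshold and fidelity metrics are illustrative assumptions, not industry standards.

```python
# A minimal sketch of an empirical generalization check: synthetic
# samples should match training statistics without copying training
# rows. Thresholds and metrics here are illustrative assumptions.
import numpy as np
from scipy.spatial.distance import cdist

def generalization_report(train, synthetic, copy_threshold=1e-3):
    # Novelty: fraction of synthetic rows that are not near-duplicates
    # of any training row (nearest-neighbor distance above threshold).
    nn_dist = cdist(synthetic, train).min(axis=1)
    novelty = float((nn_dist > copy_threshold).mean())

    # Fidelity: per-feature agreement of means and standard deviations.
    mean_gap = np.abs(train.mean(axis=0) - synthetic.mean(axis=0)).max()
    std_gap = np.abs(train.std(axis=0) - synthetic.std(axis=0)).max()

    return {"novelty": novelty,
            "max_mean_gap": float(mean_gap),
            "max_std_gap": float(std_gap)}

# Toy example: a good model scores high novelty and low gaps.
rng = np.random.default_rng(2)
train = rng.normal(size=(100, 4))       # limited training set
synthetic = rng.normal(size=(1000, 4))  # stand-in for model samples
print(generalization_report(train, synthetic))
```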
Implementing Generative AI For Analytics In The Enterprise
Every enterprise will have different applications for generative AI. Once you have a target use case, the next step is to choose a generative model and train it on your proprietary data, in your own secure environment. Different models have different strengths for different applications, so it’s necessary to benchmark their relative performance to choose the right one for your problem.
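One simple, model-agnostic way to run such a benchmark is to score each candidate on how closely its synthetic output matches held-out real data, feature by feature. The sketch below uses the two-sample Kolmogorov-Smirnov statistic; the scoring rule is an assumption for illustration, and a production benchmark would also weigh privacy, downstream task performance and cost.

```python
# A sketch of model-agnostic benchmarking: score each candidate
# generative model by how closely its synthetic output matches
# held-out real data, feature by feature (two-sample KS test).
# The scoring rule is an illustrative assumption.
import numpy as np
from scipy.stats import ks_2samp

def fidelity_score(real_holdout, synthetic):
    # Lower mean KS statistic across features = closer distributions.
    stats = [ks_2samp(real_holdout[:, j], synthetic[:, j]).statistic
             for j in range(real_holdout.shape[1])]
    return float(np.mean(stats))

# Hypothetical candidates: each maps a row count to synthetic samples.
rng = np.random.default_rng(3)
real_holdout = rng.normal(loc=1.0, size=(500, 3))
candidates = {
    "model_a": lambda n: rng.normal(loc=1.0, size=(n, 3)),  # good fit
    "model_b": lambda n: rng.normal(loc=2.0, size=(n, 3)),  # biased
}

scores = {name: fidelity_score(real_holdout, gen(2000))
          for name, gen in candidates.items()}
print(scores)  # pick the candidate with the lowest score
```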
Generative AI continues to evolve, so models should be updated and retested over time. Critically, model outputs should never be blindly trusted; they should be continuously validated to ensure accurate and reliable results. With the right safeguards, generative AI can be a valuable tool for enriching analytics and promoting better business decisions.