Artificial Intelligence (AI) is rapidly transforming industries, from healthcare to banking, reshaping how businesses operate and interact with customers. According to a report by NASSCOM titled ‘AI Enterprise Adoption Index 2.0: Tracking India’s Sectoral Progress in AI Adoption’, India’s AI market is expected to grow at a compound annual growth rate (CAGR) of 25-35% over the next three to four years, aligning with global trends. However, this anticipated growth is contingent upon efficient AI use, which, in turn, depends on high-quality datasets.
While AI holds immense potential to drive innovation, it also brings critical challenges, one of the most pressing being AI hallucinations. As AI systems become more deeply embedded in decision-making processes, understanding and mitigating hallucinations is essential to ensuring accuracy, reliability, and trust. Hallucinations often arise because models are trained on broad internet data that mixes reliable and unreliable information, without the robust ranking methods search engines use to filter for quality.
Similar to human hallucinations, where we see something that isn’t there, AI hallucinations occur when models generate incorrect or misleading information. Cognitive scientist Douglas Hofstadter demonstrated a well-known large language model’s (LLM’s) tendency to "hallucinate" by asking it nonsensical questions. Asked when the Golden Gate Bridge was moved across Egypt, the model confidently supplied a false date. This fluent but factually incorrect response revealed a reliance on learned patterns over real-world knowledge.
LLMs hallucinate when they encounter data limitations, such as biases, outdated information, missing datasets, or incorrectly labelled data, all of which lead to inaccurate outputs. Model limitations, including overfitting and a lack of real-world grounding, also contribute to misleading responses. Contextual ambiguity exacerbates the issue further: a model may misinterpret a prompt and produce incorrect or fabricated information with high confidence.
AI hallucinations have far-reaching consequences across industries, especially in a country like India, where businesses are rapidly going digital. Potential risks include misinformation and disinformation, medical errors, financial losses, and reputational damage. AI-driven trading or analysis tools that generate erroneous insights can cause substantial financial setbacks for businesses and investors, and businesses that rely on AI-generated insights for customer service risk losing consumer trust if their systems produce misleading or fabricated information.
Many enterprises fine-tune or train commercial and open-source LLMs using publicly available and proprietary data rather than building their own models from scratch, a process that can cost millions of dollars and is out of reach for most companies. While foundational LLMs are trained on vast amounts of internet data, that data includes both accurate and unreliable information. Unlike search engines, which use sophisticated ranking algorithms to refine results, LLMs lack mature filtering mechanisms, making them prone to generating incorrect or misleading responses.
To enhance reliability without changing or fine-tuning models, businesses are increasingly turning to Retrieval-Augmented Generation (RAG) techniques. RAG enables models to access a specific knowledge base containing the most up-to-date, accurate information available before generating a response. By incorporating ranking techniques similar to those of search engines and prioritizing high-quality datasets, enterprises can leverage RAG to build AI applications that deliver more accurate and trustworthy insights. For instance, banks using AI for fraud detection would benefit more from grounding responses in internal transaction patterns than in general financial market trends.
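As an illustration, the minimal Python sketch below shows the retrieval step of a RAG workflow, assuming a small in-memory knowledge base. The documents, the retrieve and build_prompt helpers, and the TF-IDF ranking are illustrative stand-ins, not a prescribed implementation; a production system would typically use a vector store, embedding-based search, and the enterprise’s chosen LLM API.

```python
# Minimal RAG sketch: rank a small, curated internal knowledge base against a
# query and build a grounded prompt. All data and helpers here are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical internal knowledge base, e.g. curated fraud-detection notes.
DOCUMENTS = [
    "Card-present transactions above Rs. 2 lakh require a second approval.",
    "Flagged accounts show three or more failed OTP attempts within an hour.",
    "International transfers to new beneficiaries are held for 24 hours.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(DOCUMENTS)

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Rank the knowledge base against the query and return the best matches."""
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    ranked = sorted(range(len(DOCUMENTS)), key=lambda i: scores[i], reverse=True)
    return [DOCUMENTS[i] for i in ranked[:top_k]]

def build_prompt(query: str) -> str:
    """Ground the model's answer in retrieved context instead of open internet data."""
    context = "\n".join(retrieve(query))
    return (
        "Answer using only the context below. If the context is insufficient, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

if __name__ == "__main__":
    # In practice, this prompt would be sent to the enterprise's LLM of choice.
    print(build_prompt("Why was this international transfer delayed?"))
```

The key design point is that the model answers from ranked, vetted internal context rather than from whatever it absorbed during training, which is what reduces the scope for hallucination.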
AI is only as good as the data it learns from. Clean, structured, and centralized data ensures accuracy and reliability. Yet, many enterprises struggle with fragmented data silos, leading to AI models learning from incomplete or biased information.
To maximize AI's potential, organizations must prioritize data governance, real-time validation, and seamless integration of structured and unstructured data. Curating high-quality internal datasets over unfiltered public data improves AI reliability, which is critical for fraud detection and risk analysis applications. By implementing stringent vetting processes, organizations can minimize bias and maintain high data integrity.
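As a rough illustration of what such a vetting pass might look like, the sketch below filters out incomplete, duplicate, or mislabelled records before they reach a model. The field names, labels, and thresholds are hypothetical assumptions, not a prescribed schema.

```python
# Minimal vetting sketch, assuming records arrive as Python dictionaries.
# The schema and label set below are illustrative only.
from collections import Counter

REQUIRED_FIELDS = {"transaction_id", "amount", "label"}  # hypothetical schema

def vet_records(records: list[dict]) -> list[dict]:
    """Drop records with missing fields, duplicate IDs, or invalid labels."""
    seen_ids = set()
    clean = []
    for record in records:
        if not REQUIRED_FIELDS.issubset(record):
            continue  # incomplete record
        if record["transaction_id"] in seen_ids:
            continue  # duplicate
        if record["label"] not in {"fraud", "legitimate"}:
            continue  # mislabelled data is a known driver of hallucination
        seen_ids.add(record["transaction_id"])
        clean.append(record)
    return clean

def label_balance(records: list[dict]) -> Counter:
    """Report class balance so heavily skewed (biased) datasets are caught early."""
    return Counter(r["label"] for r in records)
```

Checks like these are deliberately simple; the point is that they run before training or retrieval, so the model never sees data the organization has not vetted.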
While improving data quality is crucial, human oversight remains indispensable in mitigating AI hallucinations. Organizations must implement human-in-the-loop models to validate AI-generated outputs, foster AI literacy within teams to identify and correct hallucinations, and develop regulatory frameworks to ensure responsible AI deployment.
For AI to become a truly trusted tool, organizations must minimize hallucinations through rigorous data management and governance. The reliability of AI systems depends on their ability to produce accurate, transparent, and trustworthy insights. Just as a well-balanced diet supports long-term health, a strong AI data strategy ensures accuracy, reliability, and trust, the essential ingredients for harnessing AI's potential fully. Businesses must treat data as the foundation of AI reliability and ensure that its deployment is both responsible and impactful. With a strong focus on data integrity and human insight, India's 'trusted AI evolution' can pave the way for sustained growth and meaningful innovation.
Views are personal. The author is Managing Director, India, Snowflake.