en.Wedoany.com Reported - As generative Artificial Intelligence (AI) transitions from the Proof of Concept (PoC) stage to actual service operations, the challenges faced by enterprises have expanded from improving model performance to managing the content and behavior of AI outputs. Against the backdrop of AI's full integration into business and services, erroneous responses, security issues, and unexpected failures are directly translating into corporate risks, making "LLM Observability" a new topic in the enterprise IT domain.
In a video interview with ZDNet Korea on the 26th, Ko Ji-hoon, head of the WhaTap Labs application team, and developer Shin Min-cheol emphasized the changes enterprises must address in the era of generative AI and the importance of LLM observability. Ko Ji-hoon stated that even if responses are provided by AI, customers will ultimately regard them as official corporate information. Therefore, a system for continuously managing response quality and reliability is indispensable during the AI service operation phase.
The case of Air Canada demonstrates that enterprises must bear responsibility for AI responses. The company's chatbot once informed a customer about a non-existent discount product. The customer purchased a ticket based on this information and, upon being denied the discount, initiated a legal dispute. The Canadian court ruled that even if the response came from AI, the responsibility for published information lies with the enterprise. Air Canada lost the case, facing financial losses and reputational damage. Shin Min-cheol pointed out that cases where AI chatbot responses are regarded as the official stance of an enterprise are occurring frequently, and a single erroneous response can directly lead to financial losses and a decline in brand credibility.
Ko Ji-hoon added that as of last year, most enterprises were still at the level of AI pilot applications. However, starting this year, cases of actual service deployment are rapidly increasing, particularly in the financial, public, and enterprise sectors. Yet, many enterprises have launched services without a system for observing response quality.
Existing monitoring methods are inadequate for detecting AI response errors. Even if server and network metrics are normal, there is no way to know when AI outputs incorrect responses. Ko Ji-hoon noted that enterprises may face new types of problems where CPU and memory are normal, but customer complaints surge. Monitoring only infrastructure cannot capture anomalies in response quality. Meanwhile, security threats are also evolving in new forms. As AI agents can execute code and control systems, "Prompt Injection" attacks—where malicious inputs induce AI to perform unintended operations—have become a reality. WhaTap Labs also experienced an incident during internal experiments where, without malicious input, an AI misjudgment led to the complete deletion of a development PC folder. Shin Min-cheol explained that LLMs have evolved from being limited to generating text to becoming agents capable of function calling, code execution, and external system control. A single prompt input can now be directly linked to actual system operations.
To address these issues, WhaTap Labs has launched an LLM observability solution. This solution performs correlation analysis across the entire process, from GPU resource usage to application performance and AI response quality, enabling unified management of errors and failures occurring in the service operation environment. Key monitoring items include: the appropriateness and accuracy of AI responses, hallucinations (where AI fabricates non-existent information), prompt injection attacks, whether personal information is included, unnecessary response detour paths, and Token and GPU resource efficiency. This solution is particularly suitable for domestic financial and public institutions that build their own GPU operation models due to security reasons, preventing the use of external AI services. It is explained that in a self-built GPU operation model environment, the Tokens used for AI responses are directly related to GPU resources. By optimizing response paths, both processing performance and cost efficiency can be improved. Shin Min-cheol emphasized that enterprises operating AI services should have a system to monitor everything from response quality to security threats on a single platform, which is the core infrastructure for maintaining service credibility. Ko Ji-hoon predicted that in the future, the role of operators will shift from directly analyzing data to designing guardrails for safe AI operation. A unified system for observing infrastructure, applications, and AI models will determine enterprise competitiveness.
This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com









