
How Palo Alto Networks enhanced device security infra log analysis with Amazon Bedrock


This post is co-written by Fan Zhang, Sr Principal Engineer/Architect from Palo Alto Networks.

Palo Alto Networks’ Device Security team needed to detect early warning signs of potential production issues to give SMEs more time to react to these emerging problems. The primary challenge they faced was that reactively processing over 200 million daily service and application log entries resulted in delayed response times to these critical issues, leaving them at risk of potential service degradation.

To address this challenge, they partnered with the AWS Generative AI Innovation Center (GenAIIC) to develop an automated log classification pipeline powered by Amazon Bedrock. The solution achieved 95% precision in detecting production issues while reducing incident response times by 83%.

In this post, we explore how to build a scalable and cost-effective log analysis system using Amazon Bedrock to transform reactive log monitoring into proactive issue detection. We discuss how Amazon Bedrock, through Anthropic’s Claude Haiku model, and Amazon Titan Text Embeddings work together to automatically classify and analyze log data. We explore how this automated pipeline detects critical issues, examine the solution architecture, and share implementation insights that have delivered measurable operational improvements.

Palo Alto Networks offers Cloud-Delivered Security Services (CDSS) to tackle device security risks. Their solution uses machine learning and automated discovery to provide visibility into connected devices, enforcing Zero Trust principles. Teams facing similar log analysis challenges can find practical insights in this implementation.

Solution overview

Palo Alto Networks’ automated log classification system helps their Device Security team detect and respond to potential service failures ahead of time. The solution processes over 200 million service and application logs daily, automatically identifying critical issues before they escalate into service outages that impact customers.

The system uses Amazon Bedrock with Anthropic’s Claude Haiku model to understand log patterns and classify severity levels, and Amazon Titan Text Embeddings enables intelligent similarity matching. Amazon Aurora provides a caching layer that makes processing massive log volumes feasible in real time. The solution integrates seamlessly with Palo Alto Networks’ existing infrastructure, helping the Device Security team focus on preventing outages instead of managing complex log analysis processes.

Palo Alto Networks and the AWS GenAIIC collaborated to build a solution with the following capabilities:

  • Intelligent deduplication and caching – The system scales by intelligently identifying duplicate log entries for the same code event. Rather than using a large language model (LLM) to classify every log individually, the system first identifies duplicates through exact matching, then uses overlap similarity, and finally employs semantic similarity only if no previous match is found. This approach cost-effectively reduces the 200 million daily logs by over 99%, to logs representing only unique events. The caching layer enables real-time processing by reducing the need for redundant LLM invocations (a simplified sketch of this tiered matching follows this list).
  • Context retrieval for unique logs – For unique logs, Anthropic’s Claude Haiku model on Amazon Bedrock classifies each log’s severity. The model processes the incoming log together with relevant labeled historical examples. The examples are dynamically retrieved at inference time through vector similarity search. Over time, labeled examples are added to provide rich context to the LLM for classification. This context-aware approach improves accuracy for Palo Alto Networks’ internal logs and systems and evolving log patterns that traditional rule-based systems struggle to handle.
  • Classification with Amazon Bedrock – The solution provides structured predictions, including severity classification (Priority 1 (P1), Priority 2 (P2), Priority 3 (P3)) and detailed reasoning for each decision. This comprehensive output helps Palo Alto Networks’ SMEs quickly prioritize responses and take preventive action before potential outages occur.
  • Integration with existing pipelines for action – Results integrate with their existing FluentD and Kafka pipeline, with data flowing to Amazon Simple Storage Service (Amazon S3) and Amazon Redshift for further analysis and reporting.
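
The snippet below is a minimal sketch of the tiered matching logic described in the deduplication capability above. It assumes a normalized-hash cache, a token-overlap check, and Amazon Titan Text Embeddings V2 invoked through the Bedrock runtime; the helper names, thresholds, and cache structures are illustrative rather than Palo Alto Networks’ production code.

import hashlib
import json
import math

import boto3

bedrock = boto3.client("bedrock-runtime")

def titan_embed(text):
    # Amazon Titan Text Embeddings V2; adjust the model ID to what is enabled in your account.
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def find_cached_classification(log, exact_cache, seen_entries):
    """Return a prior classification via exact, overlap, or semantic matching, or None."""
    # 1) Exact match on a hash of the normalized log (timestamps stripped upstream).
    key = hashlib.sha256(log.encode()).hexdigest()
    if key in exact_cache:
        return exact_cache[key]

    # 2) Token-overlap (Jaccard) similarity against previously classified logs.
    tokens = set(log.split())
    for entry in seen_entries:
        union = tokens | entry["tokens"]
        if union and len(tokens & entry["tokens"]) / len(union) >= 0.9:
            return entry["classification"]

    # 3) Semantic similarity with Titan Text Embeddings as the last, most expensive check.
    vector = titan_embed(log)
    for entry in seen_entries:
        if cosine(vector, entry["embedding"]) >= 0.95:
            return entry["classification"]

    return None  # cache miss: the log is treated as unique and sent to the LLM stages

Only the logs that fall through all three checks reach the LLM stages, which is what keeps per-log inference costs manageable at this volume.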

The following diagram (Figure 1) illustrates how the three-stage pipeline processes Palo Alto Networks’ 200 million daily log volume while balancing scale, accuracy, and cost-efficiency. The architecture consists of the following key components:

  • Data ingestion layer – FluentD and Kafka pipeline for incoming logs
  • Processing pipeline – Consisting of the following stages:
    • Stage 1: Smart caching and deduplication – Aurora for exact matching and Amazon Titan Text Embeddings for semantic matching
    • Stage 2: Context retrieval – Amazon Titan Text Embeddings to enable historical labeled examples and vector similarity search
    • Stage 3: Classification – Anthropic’s Claude Haiku model for severity classification (P1/P2/P3)
  • Output layer – Aurora, Amazon S3, Amazon Redshift, and SME review interface

Figure 1: Automated log classification system architecture

The processing workflow moves through the following stages:

  • Stage 1: Smart caching and deduplication – Incoming logs from Palo Alto Networks’ FluentD and Kafka pipeline are immediately processed by an Aurora-based caching layer. The system first applies exact matching, then falls back to overlap similarity, and finally uses semantic similarity through Amazon Titan Text Embeddings if no previous match is found. During testing, this approach identified that more than 99% of logs corresponded to duplicate events, even though they contained different time stamps, log levels, and phrasing. The caching system reduced response times for cached results and eliminated unnecessary LLM processing.
  • Stage 2: Context retrieval for unique logs – The remaining less than 1% of truly unique logs require classification. For these entries, the system uses Amazon Titan Text Embeddings to identify the most relevant historical examples from Palo Alto Networks’ labeled dataset. Rather than using static examples, this dynamic retrieval makes sure each log receives contextually appropriate guidance for classification (a retrieval sketch follows this list).
  • Stage 3: Classification with Amazon Bedrock – Unique logs and their selected examples are processed through Amazon Bedrock using Anthropic’s Claude Haiku model. The model analyzes the log content alongside relevant historical examples to provide severity classifications (P1, P2, P3) and detailed explanations. Results are stored in Aurora and the cache and integrated into Palo Alto Networks’ existing data pipeline for SME review and action.
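
As a rough illustration of Stage 2, the sketch below retrieves the nearest labeled examples for a unique log. It assumes the labeled dataset is stored in an Aurora PostgreSQL table with the pgvector extension and that the log has already been embedded with the titan_embed helper shown earlier; the table name, columns, and connection handling are hypothetical and simplified.

import psycopg2  # assumes the Aurora PostgreSQL cluster has the pgvector extension installed

def retrieve_similar_examples(log_vector, k=5):
    """Fetch the k labeled historical examples closest to the incoming log embedding."""
    conn = psycopg2.connect("dbname=log_classification")  # connection details omitted
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT log_text, severity, category
            FROM labeled_examples
            ORDER BY embedding <=> %s::vector  -- pgvector cosine distance operator
            LIMIT %s
            """,
            (str(log_vector), k),
        )
        return cur.fetchall()

The returned examples are formatted into the few-shot section of the prompt shown next, so each unique log is classified with context drawn from its closest labeled neighbors.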

This architecture enables cost-effective processing of massive log volumes while maintaining 95% precision for critical P1 severity detection. The system uses carefully crafted prompts that combine domain expertise with dynamically selected examples:

system_prompt = """
You are an expert log analysis system responsible for classifying production system logs based on severity. Your analysis helps engineering teams prioritize their response to system issues and maintain service reliability.

Severity definitions:
P1 (Critical): Requires immediate action - system-wide outages, repeated application crashes
P2 (High): Warrants attention during business hours - performance issues, partial service disruption
P3 (Low): Can be addressed when resources are available - minor bugs, authorization failures, intermittent network issues

Examples:

2024-08-17 01:15:00.00 [warn] failed (104: Connection reset by peer) while reading response header from upstream
severity: P3
category: Category A

2024-08-18 17:40:00.00 Error: Request failed with status code 500 at settle
severity: P2
category: Category B

Log: {incoming_log_snippet}
Location: {system_location}

Provide severity classification (P1/P2/P3) and detailed reasoning.
"""
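
A minimal sketch of how such a prompt might be sent to Anthropic’s Claude Haiku model follows, using the Amazon Bedrock Converse API through boto3. The model ID shown is Anthropic’s Claude 3 Haiku; for simplicity the filled-in template is passed as a single user message, and the function name and inference settings are illustrative.

import boto3

bedrock = boto3.client("bedrock-runtime")

def classify_log(incoming_log_snippet, system_location):
    """Ask Claude Haiku for a severity classification (P1/P2/P3) with reasoning."""
    prompt = system_prompt.format(
        incoming_log_snippet=incoming_log_snippet,
        system_location=system_location,
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0},
    )
    # The Converse API returns the generated text in the first content block.
    return response["output"]["message"]["content"][0]["text"]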

Implementation insights

The core value of Palo Alto Networks’ solution lies in making an insurmountable challenge manageable: AI helps their team analyze 200 million daily logs efficiently, while the system’s dynamic adaptability makes it possible to extend the solution into the future by adding more labeled examples. Palo Alto Networks’ successful implementation of their automated log classification system yielded key insights that can help organizations building production-scale AI solutions:

  • Continuous learning systems deliver compounding value – Palo Alto Networks designed their system to improve automatically as SMEs validate classifications and label new examples. Each validated classification becomes part of the dynamic few-shot retrieval dataset, improving accuracy for similar future logs while increasing cache hit rates. This approach creates a cycle where operational use enhances system performance and reduces costs (see the sketch after this list).
  • Intelligent caching enables AI at production scale – The multi-layered caching architecture processes more than 99% of logs through cache hits, transforming expensive per-log LLM operations into a cost-effective system capable of handling 200 million daily logs. This foundation makes AI processing economically viable at enterprise scale while maintaining response times.
  • Adaptive systems handle evolving requirements without code changes – The solution accommodates new log categories and patterns without requiring system modifications. When performance needs improvement for novel log types, SMEs can label additional examples, and the dynamic few-shot retrieval automatically incorporates this knowledge into future classifications. This adaptability allows the system to scale with business needs.
  • Explainable classifications drive operational confidence – SMEs responding to critical alerts require confidence in AI recommendations, particularly for P1 severity classifications. By providing detailed reasoning alongside each classification, Palo Alto Networks enables SMEs to quickly validate decisions and take appropriate action. Clear explanations transform AI outputs from predictions into actionable intelligence.
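
As a small illustration of the continuous learning loop from the first insight above, the sketch below stores an SME-validated classification back into the same hypothetical labeled_examples table used earlier, reusing the titan_embed helper so the new example becomes retrievable for future few-shot prompts.

def add_validated_example(conn, log_text, severity, category):
    """Persist an SME-validated classification so future retrievals can reuse it."""
    embedding = titan_embed(log_text)  # embed once at write time for later similarity search
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO labeled_examples (log_text, severity, category, embedding) "
            "VALUES (%s, %s, %s, %s::vector)",
            (log_text, severity, category, str(embedding)),
        )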

These insights demonstrate how AI systems designed for continuous learning and explainability become increasingly valuable operational assets.

Conclusion

Palo Alto Networks’ automated log classification system demonstrates how generative AI powered by AWS helps operational teams manage massive log volumes in real time. In this post, we explored how an architecture combining Amazon Bedrock, Amazon Titan Text Embeddings, and Aurora processes 200 million daily logs through intelligent caching and dynamic few-shot learning, enabling proactive detection of critical issues with 95% precision. Palo Alto Networks’ automated log classification system delivered concrete operational improvements:

  • 95% precision, 90% recall for P1 severity logs – Critical alerts are accurate and actionable, minimizing false alarms while catching 9 out of 10 urgent issues, leaving the remaining alerts to be captured by existing monitoring systems
  • 83% reduction in debugging time – SMEs spend less time on routine log analysis and more time on strategic improvements
  • Over 99% cache hit rate – The intelligent caching layer processes the 200 million daily log volume cost-effectively with subsecond responses
  • Proactive issue detection – The system identifies potential problems before they impact customers, preventing the multi-week outages that previously disrupted service
  • Continuous improvement – Each SME validation automatically improves future classifications and increases cache efficiency, resulting in reduced costs

For organizations evaluating AI initiatives for log analysis and operational monitoring, Palo Alto Networks’ implementation offers a blueprint for building production-scale systems that deliver measurable improvements in operational efficiency and cost reduction. To build your own generative AI solutions, explore Amazon Bedrock for managed access to foundation models. For more guidance, check out the AWS Machine Learning resources and browse implementation examples on the AWS Artificial Intelligence Blog.

The collaboration between Palo Alto Networks and the AWS GenAIIC demonstrates how thoughtful AI implementation can transform reactive operations into proactive, scalable systems that deliver sustained business value.

To get started with Amazon Bedrock, see Build generative AI solutions with Amazon Bedrock.


About the authors


Rizwan Mushtaq

Rizwan is a Principal Solutions Architect at AWS. He helps customers design innovative, resilient, and cost-effective solutions using AWS services. He holds an MS in Electrical Engineering from Wichita State University.


Hector Lopez

Hector Lopez, PhD is an Applied Scientist in AWS’s Generative AI Innovation Center, where he specializes in delivering production-ready generative AI solutions and proof-of-concepts across diverse industry applications. His expertise spans traditional machine learning and data science in the life and physical sciences. Hector takes a first-principles approach to customer solutions, working backwards from core business needs to help organizations understand and leverage generative AI tools for meaningful business transformation.


Meena Menon

Meena Menon is a Sr. Customer Success Manager at AWS with over 20 years of experience delivering enterprise customer outcomes and digital transformation. At AWS, she partners with strategic ISVs including Palo Alto Networks, Proofpoint, New Relic, and Splunk to accelerate cloud modernization and migrations.


Fan Zhang

Fan is a Senior Principal Engineer/Architect at Palo Alto Networks, leading the IoT Security team’s infrastructure and data pipeline, as well as its generative AI infrastructure.
