
From Kindergarten to PhD - Leveraging open-appsec WAF Machine Learning Levels for Robust Web Protection

Updated: Apr 29

Introduction 

Modern web applications face ever-evolving security threats, making robust application security essential. Machine learning has revolutionized the field of Web App and API security, providing tools that can learn and adapt to the dynamic nature of web traffic. One of the key innovations in this space is the concept of learning levels. These levels represent the maturity of a machine-learning model as it observes and analyzes application behavior over time.  


By progressing through learning levels, organizations can achieve: 

  • Enhanced Accuracy: Improved detection quality of security threats and minimized false positive occurrences. 

  • A measured, gradually built learning process: A systematic approach to transitioning from learning to proactive threat prevention. 

  • Tuning: The ability to fine-tune machine learning’s accuracy by adapting its recommendations and providing feedback.  


With these advantages, learning levels provide a clear path for maximizing the ML’s performance and the WAF’s overall protection. 


open-appsec WAF’s Contextual Machine Learning provides an innovative approach by continuously learning and adapting to application behaviors. This blog explains how learning levels progress in open-appsec, how to track them, which steps are needed to move from Detect mode to Prevent mode, and how to keep improving the learning after Prevent mode has been reached. 


What is open-appsec? 

open-appsec is an open-source, next-generation WAF, powered by machine learning and AI. It delivers robust protection against various web threats, including OWASP Top 10 vulnerabilities, zero-day attacks, and more. One key feature distinguishing open-appsec from traditional WAF solutions is its machine-learning-based threat detection capabilities. It continuously evolves and learns from web requests and new attacks, offering dynamic, real-time protection without requiring manual updates or rule sets (no more signatures!). This adaptability makes it particularly effective against sophisticated attacks that can evade signature-based detection. 


open-appsec WAF also offers flexible management options, including a central WebUI (SaaS) for easy, centralized control. Alternatively, it supports local, declarative configurations through custom resources in Kubernetes or configuration files in Docker and Linux, making it compatible with CI/CD and GitOps workflows. 


open-appsec supports integration with many commonly used proxy solutions, like NGINX/Ingress NGINX, Kong, APISIX, NGINX Proxy Manager, Docker SWAG, Envoy, and more. 


open-appsec Contextual Machine Learning Model 

Imagine a busy airport check-in counter where passengers represent web traffic. Most passengers are ordinary travelers with valid tickets and proper documentation. However, among them, a few may be carrying prohibited items or posing potential threats—similar to malicious web requests. 


Initially, the airport staff is new and unfamiliar with the patterns of passenger behavior. They scrutinize every passenger equally, sometimes raising false alarms. As they process more passengers, they become more efficient, because experience teaches them to recognize normal behavior. They can, for example, quickly identify regular travelers who behave in typical ways, which lets them focus their attention on suspicious individuals whose behavior deviates from the norm. 

Eventually, the check-in counter staff become experts at distinguishing between safe passengers and potential risks. They implement advanced tools like scanners and facial recognition systems to enhance accuracy, ensuring a smooth and secure process for most travelers while promptly addressing any potential threats. 


This analogy reflects the progression of open-appsec’s machine learning over time. The machine learning engine can already detect and prevent malicious web requests reasonably well right after deployment, but at that point it has only limited context (“experience”). It first has to process and learn the application’s usual requests before it can understand the “baseline behavior” of regular traffic for a given web application. Over time, as it processes more data (similar to "passengers"), it refines its understanding and becomes an even more reliable and intelligent tool for identifying and protecting against known as well as unknown attacks.  


open-appsec’s machine learning mirrors this journey, guiding the system from basic observation to advanced precision. The Contextual Machine Learning Engine utilizes a three-phase approach for detecting and preventing web application and API attacks. These three phases deliver accurate results with a very low number of false positives, protecting the environment against known and unknown zero-day attacks with real-time protection. 



In stage 1, the engine parses and decodes the payload (analyzing all fields of the HTTP request, base64 decoding, etc.).  
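To make the decoding step more concrete, here is a minimal Python sketch of peeling encoding layers off a single field value. It is an illustration only, not open-appsec's parser: the function names `decode_layers` and `_try_base64`, the depth limit, and the "at least 8 characters" heuristic are all assumptions made for this example.

```python
import base64
import binascii
from typing import List, Optional
from urllib.parse import unquote


def _try_base64(text: str) -> Optional[str]:
    """Return the decoded text if `text` is valid base64 holding printable UTF-8."""
    # Assumption: very short strings are not worth treating as base64.
    if len(text) < 8 or len(text) % 4 != 0:
        return None
    try:
        decoded = base64.b64decode(text, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    return decoded if decoded.isprintable() else None


def decode_layers(value: str, max_depth: int = 3) -> List[str]:
    """Return the raw value followed by each successively decoded form."""
    layers = [value]
    current = value
    for _ in range(max_depth):
        decoded = unquote(current)  # percent-decoding first
        if decoded == current:
            decoded = _try_base64(current)  # then try base64
        if decoded is None or decoded == current:
            break
        layers.append(decoded)
        current = decoded
    return layers


print(decode_layers("PHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg=="))
```

Later stages can then inspect every layer, so an attack string hidden behind percent-encoding or base64 is still visible to the indicator search.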


In stage 2, the engine looks for short attack indicators within the HTTP request to estimate how likely it is that the request is being used to exploit a vulnerability. This evaluation is based on a supervised Machine Learning model that is trained offline, in an ongoing process, using millions of malicious and benign requests. Scores representing the likelihood of being part of an attack are assigned not only to each indicator on its own but also to pairs of indicators. Aggregating the indicator scores into a total stage 2 score gives open-appsec a good initial understanding of the attack likelihood of the HTTP request.  
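The idea of scoring single indicators and indicator pairs can be sketched in a few lines of Python. Everything here is hypothetical: the indicator strings, the score values, and the "sum and cap at 1.0" aggregation are assumptions chosen for illustration, whereas the real model learns its scores from the offline training described above.

```python
from itertools import combinations
from typing import List

# Hypothetical scores; a real model would learn these from labeled traffic.
SINGLE_SCORES = {"select": 0.2, "union": 0.3, "<script": 0.6, "../": 0.4}
PAIR_SCORES = {frozenset({"union", "select"}): 0.5}


def find_indicators(payload: str) -> List[str]:
    """Return the known indicators that occur in the payload."""
    lowered = payload.lower()
    return sorted(ind for ind in SINGLE_SCORES if ind in lowered)


def attack_score(payload: str) -> float:
    """Aggregate single and pair scores into one value between 0 and 1."""
    found = find_indicators(payload)
    score = sum(SINGLE_SCORES[ind] for ind in found)
    score += sum(
        PAIR_SCORES.get(frozenset(pair), 0.0) for pair in combinations(found, 2)
    )
    return min(score, 1.0)  # assumption: cap the total at 1.0


print(attack_score("please select a color"))
print(attack_score("id=1 UNION SELECT password FROM users"))
```

The pair score is what separates the two calls: "select" alone is weak evidence, while "union" together with "select" is far more typical of SQL injection than of ordinary input.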


In stage 3, requests that the stage 2 indicator analysis flagged as potentially malicious are analyzed further by the contextual machine learning evaluation engine. The goal is to gain the best possible confidence that a flagged HTTP request is indeed an attack, and to rule out false positives effectively. To do this, open-appsec considers additional context such as the application structure, how users generally or individually interact with the content, and more. This evaluation is done with an online, unsupervised ML model, which is built and updated continuously in real time for the specific protected environment based on the inbound traffic.  
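As a rough intuition for how context can rule out false positives, consider the toy Python heuristic below. It is not open-appsec's model, whose internals are not described here. It assumes a hypothetical rule: an indicator counts as "usual" for a parameter once enough distinct sources have sent it there. The class name, the threshold, and the verdict strings are invented for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


class ParameterBaseline:
    """Remembers which sources sent which indicators to each parameter."""

    def __init__(self, min_sources: int = 3) -> None:
        self.min_sources = min_sources  # hypothetical threshold
        self._sources: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def observe(self, parameter: str, indicator: str, source: str) -> None:
        """Update the baseline from one inspected request."""
        self._sources[(parameter, indicator)].add(source)

    def is_usual(self, parameter: str, indicator: str) -> bool:
        """True if enough distinct sources routinely send this indicator."""
        seen = self._sources.get((parameter, indicator), set())
        return len(seen) >= self.min_sources


def final_verdict(
    baseline: ParameterBaseline, parameter: str, indicators: Iterable[str]
) -> str:
    """Keep a request suspicious only if some indicator is unusual in context."""
    unusual = [i for i in indicators if not baseline.is_usual(parameter, i)]
    return "suspicious" if unusual else "benign"


baseline = ParameterBaseline()
for user in ("alice", "bob", "carol"):
    # e.g. a SQL tutorial site whose users legitimately submit queries
    baseline.observe("query", "select", user)

print(final_verdict(baseline, "query", ["select"]))
print(final_verdict(baseline, "comment", ["select"]))
```

The same indicator yields different verdicts depending on where it appears, which is the essence of a contextual evaluation: the baseline is specific to the protected application and keeps changing as traffic arrives.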

For more information about the inner mechanics of open-appsec’s contextual machine learning engine, see the detailed video session led by open-appsec Product Manager Christopher Lutat.


open-appsec Learning Levels: A User-Centric Design 

When HTTP requests are inspected, the open-appsec Contextual Machine Learning model will progress over different learning levels until it reaches the optimum learning state. Each level represents the maturity stage of the learning model and provides clear guidance to the administrator about what is needed to reach the next level.  

open-appsec’s learning levels are designed to provide a relatable, intuitive, and interactive experience for users. Represented through an educational analogy, these levels progress from Kindergarten to PhD, symbolizing the model’s increasing maturity and proficiency. Here are the key levels: 


  • Kindergarten: The starting phase where the model begins to observe and gather basic data. 

  • Primary School: The model builds foundational knowledge of application behavior. 

  • High School: Intermediate understanding is achieved, with the model recognizing common patterns and traffic types. 

  • Graduate: A high level of accuracy is reached, enabling safe transition to Prevent mode for critical events. When the machine learning reaches Graduate level, it is recommended to change the asset Mode to Prevent for Critical confidence events. 

  • Master: Advanced insights are obtained by incorporating trusted sources, improving detection capabilities further. 

  • PhD: The pinnacle of learning, where the model is highly refined, with minimal need for additional training. When the machine learning reaches PhD level, it is recommended to extend Prevent mode to High confidence (and above) events. 

 

Transitioning From Learn/Detect to Prevent Mode

When a new asset is added to open-appsec, it is recommended to run open-appsec in Learn/Detect mode first, so that it can create an initial baseline of application behavior. This phase gives the machine learning its foundation and typically lasts 2-3 days, provided that the application receives a substantial amount of traffic. Here’s a breakdown of the process: 


1. Initial Learning Period 

  • Traffic Volume and Variance: The Contextual Machine Learning engine observes HTTP requests to learn how the application is used. The diversity and volume of traffic directly impact the speed of learning. The more, the better.  

  • Tuning Suggestions: During this phase, the engine proposes configuration tweaks to refine its accuracy. Administrators can review and apply these recommendations to accelerate the learning process, a technique referred to as supervised learning. 


2. Moving to Prevent Mode 

When the learning level reaches Graduate, it is recommended to change the asset Mode to Prevent for Critical confidence events. Graduate level ensures a very good level of accuracy (i.e. a low number of false positives). To reach Master or PhD level, it is necessary to configure Trusted Sources. PhD is the highest level, which means that additional learning is unlikely to improve the model much further. 
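The guidance above can be summarized as a small decision helper. This is a sketch for orientation rather than an open-appsec API: the function names and the returned strings are made up, and the assumption that Master keeps the Graduate recommendation is ours, since the article only states recommendations for Graduate and PhD.

```python
LEVELS = ["Kindergarten", "Primary School", "High School", "Graduate", "Master", "PhD"]


def recommended_mode(level: str) -> str:
    """Map a learning level to the mode this article recommends for it."""
    rank = LEVELS.index(level)  # raises ValueError for an unknown level
    if level == "PhD":
        return "Prevent: High confidence and above"
    if rank >= LEVELS.index("Graduate"):
        # Assumption: Master keeps the Graduate recommendation.
        return "Prevent: Critical confidence"
    return "Learn/Detect"


def needs_trusted_sources(target_level: str) -> bool:
    """Master and PhD can only be reached once Trusted Sources are configured."""
    return LEVELS.index(target_level) >= LEVELS.index("Master")


for level in LEVELS:
    print(f"{level:15} -> {recommended_mode(level)}")
```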

 

Tracking Learning Progress, Recommendations and Actions

The open-appsec management portal gives users real-time insight into their machine learning progress. For each protected web application or web API asset, the ‘What’s next?’ section also indicates what is required to reach the next level. 

For example, in the image below, the learning level that was reached is ‘Graduate’, and the ‘What’s Next?’ section indicates how many more HTTP requests and trusted sources are required to reach the next level of learning. 


Some positive contributing factors to the learning process are the number of trusted sources defined by the admin, elapsed time, amount of traffic inspected, decisions made on tuning suggestions, and more. 
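Conceptually, a ‘What’s Next?’ hint is the gap between what has been observed so far and what the next level requires. The Python sketch below shows that arithmetic for two of the factors, HTTP requests and trusted sources. The data classes and the threshold numbers are purely hypothetical, because the actual requirements per level are not stated in this article.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Progress:
    requests_seen: int
    trusted_sources: int


@dataclass
class Requirement:
    min_requests: int
    min_trusted_sources: int


def whats_next(progress: Progress, requirement: Requirement) -> Dict[str, int]:
    """Return how much more of each factor is needed; 0 means already satisfied."""
    return {
        "additional_requests": max(
            0, requirement.min_requests - progress.requests_seen
        ),
        "additional_trusted_sources": max(
            0, requirement.min_trusted_sources - progress.trusted_sources
        ),
    }


# Made-up numbers, for illustration only.
current = Progress(requests_seen=42_000, trusted_sources=1)
next_level = Requirement(min_requests=50_000, min_trusted_sources=3)
print(whats_next(current, next_level))
```

Other factors named above, such as elapsed time and decisions on tuning suggestions, could be modeled the same way, but they are left out to keep the sketch short.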



In addition, the WebUI shows a specific recommendation for each asset, which tells the administrator what action to take now, independent of what is needed to reach the next level.

Recommendations include the following options (Recommendation: Action Required): 

  • Keep Learning: No action required. The machine learning model requires additional HTTP requests (and additional time). 

  • Review Tuning Suggestions: The learning mechanism generated tuning suggestions. Review them and decide whether the events are malicious or benign. 

  • Prevent Critical Severity Events: The system is ready to prevent critical severity events. Navigate to the Web Attack tab and change the Web Attacks sub-practice Mode to Prevent for Critical Severity events. 

  • Prevent High Severity And Above Events: The system is ready to prevent high-severity (and above) events. Navigate to the Web Attack tab and change the Web Attacks sub-practice Mode to Prevent for High and above Severity events. 

 

In the example below, the Recommendation is ‘Keep Learning’ as additional HTTP requests and more time are required to reach the next learning level.  



In the example below, the Recommendation is ‘The learning mechanism generated critical tuning suggestions. Review them and decide whether the events are malicious or benign’.

In the image below, the highest level of learning was reached - PhD. At this point, the user gets clear feedback that the asset is protected and that any severe events will be blocked. This clear feedback helps users understand that no other action is required. 


The image below shows examples of ‘Tuning Suggestions’. When the administrator reviews them, the machine learning can improve its accuracy. 


Advantages of the Web UI Representation

The learning levels are visually represented in the web interface with playful and engaging icons — a teddy bear for Kindergarten, a graduation cap for Master, and a diploma for PhD. This design achieves the following: 


  • User Understanding: By drawing parallels to familiar educational stages, users can easily grasp the model’s progress and maturity. 

  • Interactive Experience: The progression creates a sense of achievement and encourages user engagement, turning security management into an interactive journey. 

  • Clear Guidance: Users can quickly determine the current state of the model and what actions are needed to advance to the next level. 


This combination of functionality and design enhances the overall user experience, making complex machine learning processes accessible and engaging. 

 

Conclusion 

open-appsec’s Contextual Machine Learning empowers organizations to defend against sophisticated threats by continuously learning and adapting. By understanding the learning levels, tracking progress, and applying the recommended configuration adjustments, administrators can easily and securely transition from Learn/Detect mode to Prevent mode, ensuring robust, effective application security. By embracing this process and leveraging these tools, they can stay ahead of the curve in application protection. 


open-appsec is an open-source project that builds on machine learning to provide pre-emptive web app & API threat protection against OWASP Top 10 and zero-day attacks. It simplifies maintenance because there is no need for the threat signature upkeep and exception handling that are common in many WAF solutions. 


More information about open-appsec's Learning Levels can be found here. 

To achieve the best Threat Prevention results of the ML engine, read this blog. 


To learn more about how open-appsec works, see this White Paper and the in-depth Video Tutorial. You can also experiment with deployment in the free Playground. 

 

Experiment with open-appsec on Linux, Docker and Kubernetes using a free virtual lab
