7.7 CVE-2024-52595

lxml_html_clean is a project for HTML cleaning functionalities copied from `lxml.html.clean`. Prior to version 0.4.0, the HTML Parser in lxml does not properly handle context-switching for special HTML tags such as `<svg>`, `<math>` and `<noscript>`. This behavior deviates from how web browsers parse and interpret such tags. Specifically, content in CSS comments is ignored by lxml_html_clean but may be interpreted differently by web browsers, enabling malicious scripts to bypass the cleaning process. This vulnerability could lead to Cross-Site Scripting (XSS) attacks, compromising the security of users relying on lxml_html_clean in default configuration for sanitizing untrusted HTML content. Users employing the HTML cleaner in a security-sensitive context should upgrade to lxml 0.4.0, which addresses this issue. As a temporary mitigation, users can configure lxml_html_clean with the following settings to prevent the exploitation of this vulnerability. Via `remove_tags`, one may specify tags to remove - their content is moved to their parents' tags. Via `kill_tags`, one may specify tags to be removed completely. Via `allow_tags`, one may restrict the set of permissible tags, excluding context-switching tags like `<svg>`, `<math>` and `<noscript>`.
https://nvd.nist.gov/vuln/detail/CVE-2024-52595

References

security-advisories@github.com

https://github.com/fedora-python/lxml_html_clean/commit/c5d816f86eb3707d72a8e...
https://github.com/fedora-python/lxml_html_clean/pull/19
https://github.com/fedora-python/lxml_html_clean/security/advisories/GHSA-5jf...

CPE

cpe	start	end

REMEDIATION

EXPLOITS

Exploit-db.com

id	description	date
No known exploits

POC Github

Url
No known exploits

Other Nist (github, ...)

Url
No known exploits

CAPEC

Common Attack Pattern Enumerations and Classifications

id	description	severity
120	Double Encoding The adversary utilizes a repeating of the encoding process for a set of characters (that is, character encoding a character encoding of a character) to obfuscate the payload of a particular request. This may allow the adversary to bypass filters that attempt to detect illegal characters or strings, such as those that might be used in traversal or injection attacks. Filters may be able to catch illegal encoded strings, but may not catch doubly encoded strings. For example, a dot (.), often used in path traversal attacks and therefore often blocked by filters, could be URL encoded as %2E. However, many filters recognize this encoding and would still block the request. In a double encoding, the % in the above URL encoding would be encoded again as %25, resulting in %252E which some filters might not catch, but which could still be interpreted as a dot (.) by interpreters on the target. [Survey the application for user-controllable inputs] Using a browser, an automated tool or by inspecting the application, an attacker records all entry points to the application. [Probe entry points to locate vulnerabilities] Try double-encoding for parts of the input in order to try to get past the filters. For instance, by double encoding certain characters in the URL (e.g. dots and slashes) an adversary may try to get access to restricted resources on the web server or force browse to protected pages (thus subverting the authorization service). An adversary can also attempt other injection style attacks using this attack pattern: command injection, SQL injection, etc.	Medium
15	Command Delimiters An attack of this type exploits a programs' vulnerabilities that allows an attacker's commands to be concatenated onto a legitimate command with the intent of targeting other resources such as the file system or database. The system that uses a filter or denylist input validation, as opposed to allowlist validation is vulnerable to an attacker who predicts delimiters (or combinations of delimiters) not present in the filter or denylist. As with other injection attacks, the attacker uses the command delimiter payload as an entry point to tunnel through the application and activate additional attacks through SQL queries, shell commands, network scanning, and so on. [Assess Target Runtime Environment] In situations where the runtime environment is not implicitly known, the attacker makes connections to the target system and tries to determine the system's runtime environment. Knowing the environment is vital to choosing the correct delimiters. [Survey the Application] The attacker surveys the target application, possibly as a valid and authenticated user [Attempt delimiters in inputs] The attacker systematically attempts variations of delimiters on known inputs, observing the application's response each time. [Use malicious command delimiters] The attacker uses combinations of payload and carefully placed command delimiters to attack the software.	High
182	Flash Injection An attacker tricks a victim to execute malicious flash content that executes commands or makes flash calls specified by the attacker. One example of this attack is cross-site flashing, an attacker controlled parameter to a reference call loads from content specified by the attacker. [Find Injection Entry Points] The attacker first takes an inventory of the entry points of the application. [Determine the application's susceptibility to Flash injection] Determine the application's susceptibility to Flash injection. For each URL identified in the explore phase, the attacker attempts to use various techniques such as direct load asfunction, controlled evil page/host, Flash HTML injection, and DOM injection to determine whether the application is susceptible to Flash injection. [Inject malicious content into target] Inject malicious content into target utilizing vulnerable injection vectors identified in the Experiment phase	Medium
3	Using Leading 'Ghost' Character Sequences to Bypass Input Filters Some APIs will strip certain leading characters from a string of parameters. An adversary can intentionally introduce leading "ghost" characters (extra characters that don't affect the validity of the request at the API layer) that enable the input to pass the filters and therefore process the adversary's input. This occurs when the targeted API will accept input data in several syntactic forms and interpret it in the equivalent semantic way, while the filter does not take into account the full spectrum of the syntactic forms acceptable to the targeted API. [Survey the application for user-controllable inputs] Using a browser, an automated tool or by inspecting the application, an adversary records all entry points to the application. [Probe entry points to locate vulnerabilities] The adversary uses the entry points gathered in the "Explore" phase as a target list and injects various leading 'Ghost' character sequences to determine how to application filters them. [Bypass input filtering] Using what the adversary learned about how the application filters input data, they craft specific input data that bypasses the filter. This can lead to directory traversal attacks, arbitrary shell command execution, corruption of files, etc.	Medium
43	Exploiting Multiple Input Interpretation Layers An attacker supplies the target software with input data that contains sequences of special characters designed to bypass input validation logic. This exploit relies on the target making multiples passes over the input data and processing a "layer" of special characters with each pass. In this manner, the attacker can disguise input that would otherwise be rejected as invalid by concealing it with layers of special/escape characters that are stripped off by subsequent processing steps. The goal is to first discover cases where the input validation layer executes before one or more parsing layers. That is, user input may go through the following logic in an application: <parser1> --> <input validator> --> <parser2>. In such cases, the attacker will need to provide input that will pass through the input validator, but after passing through parser2, will be converted into something that the input validator was supposed to stop. [Determine application/system inputs where bypassing input validation is desired] The attacker first needs to determine all of the application's/system's inputs where input validation is being performed and where they want to bypass it. [Determine which character encodings are accepted by the application/system] The attacker then needs to provide various character encodings to the application/system and determine which ones are accepted. The attacker will need to observe the application's/system's response to the encoded data to determine whether the data was interpreted properly. [Combine multiple encodings accepted by the application.] The attacker now combines encodings accepted by the application. The attacker may combine different encodings or apply the same encoding multiple times. [Leverage ability to bypass input validation] Attacker leverages their ability to bypass input validation to gain unauthorized access to system. There are many attacks possible, and a few examples are mentioned here.	High
6	Argument Injection An attacker changes the behavior or state of a targeted application through injecting data or command syntax through the targets use of non-validated and non-filtered arguments of exposed services or methods. [Discovery of potential injection vectors] Using an automated tool or manual discovery, the attacker identifies services or methods with arguments that could potentially be used as injection vectors (OS, API, SQL procedures, etc.). [1. Attempt variations on argument content] Possibly using an automated tool, the attacker will perform injection variations of the arguments. [Abuse of the application] The attacker injects specific syntax into a particular argument in order to generate a specific malicious effect in the targeted application.	High
71	Using Unicode Encoding to Bypass Validation Logic An attacker may provide a Unicode string to a system component that is not Unicode aware and use that to circumvent the filter or cause the classifying mechanism to fail to properly understanding the request. That may allow the attacker to slip malicious data past the content filter and/or possibly cause the application to route the request incorrectly. [Survey the application for user-controllable inputs] Using a browser or an automated tool, an attacker follows all public links and actions on a web site. They record all the links, the forms, the resources accessed and all other potential entry-points for the web application. [Probe entry points to locate vulnerabilities] The attacker uses the entry points gathered in the "Explore" phase as a target list and injects various Unicode encoded payloads to determine if an entry point actually represents a vulnerability with insufficient validation logic and to characterize the extent to which the vulnerability can be exploited.	High
73	User-Controlled Filename An attack of this type involves an adversary inserting malicious characters (such as a XSS redirection) into a filename, directly or indirectly that is then used by the target software to generate HTML text or other potentially executable content. Many websites rely on user-generated content and dynamically build resources like files, filenames, and URL links directly from user supplied data. In this attack pattern, the attacker uploads code that can execute in the client browser and/or redirect the client browser to a site that the attacker owns. All XSS attack payload variants can be used to pass and exploit these vulnerabilities.	High
85	AJAX Footprinting This attack utilizes the frequent client-server roundtrips in Ajax conversation to scan a system. While Ajax does not open up new vulnerabilities per se, it does optimize them from an attacker point of view. A common first step for an attacker is to footprint the target environment to understand what attacks will work. Since footprinting relies on enumeration, the conversational pattern of rapid, multiple requests and responses that are typical in Ajax applications enable an attacker to look for many vulnerabilities, well-known ports, network locations and so on. The knowledge gained through Ajax fingerprinting can be used to support other attacks, such as XSS. [Send request to target webpage and analyze HTML] Using a browser or an automated tool, an adversary sends requests to a webpage and records the received HTML response. Adversaries then analyze the HTML to identify any known underlying JavaScript architectures. This can aid in mappiong publicly known vulnerabilities to the webpage and can also helpo the adversary guess application architecture and the inner workings of a system.	Low

Cybersecurity needs ?

Strengthen software security from the outset with our DevSecOps expertise

Integrate security right from the start of the software development cycle for more robust applications and greater customer confidence.
Our team of DevSecOps experts can help you secure your APIs, data pipelines, CI/CD chains, Docker containers and Kubernetes deployments.

Discover this offer

Soutenez No Hack Me sur Tipeee