Description
To understand the sensitivity under attacks and to develop defense mechanisms, machine-learning model designers craft worst-case adversarial perturbations with gradient-descent optimization algorithms against the model under evaluation. However, many of the proposed defenses have been shown to provide a false sense of robustness due to failures of the attacks, rather than actual improvements in the machine‐learning models’ robustness, as highlighted by more rigorous evaluations. Although guidelines and best practices have been suggested to improve current adversarial robustness evaluations, the lack of automatic testing and debugging tools makes it difficult to apply these recommendations in a systematic and automated manner. To this end, the analysis of failures in the optimization of adversarial attacks is the only valid strategy to avoid repeating mistakes of the past.
Next sessions
-
Towards More Secure Large Language Models
Speaker : Raouf Kerkouche - Inria Lille
Large Language Models (LLMs) have achieved considerable success and are now widely used across multiple domains, highlighting their transformative impact on both technology and society. However, this widespread adoption also exposes LLMs to numerous security threats that can alter model behavior or degrade overall performance. To mitigate these threats, most research has focused on alignment[…]-
Machine learning
-