Description

  • Speaker

    Diane Leblanc-Albarel - KU Leuven

Perceptual hash functions identify multimedia content by mapping similar inputs to similar outputs. They are widely used for detecting copyright violations and illegal content, but they lack transparency, as their design details are typically kept secret. Governments are considering extending these functions to Client-Side Scanning (CSS) for end-to-end encrypted services: multimedia content would be checked against known illegal content before encryption is applied. In 2021, Apple presented a detailed proposal for CSS based on the NeuralHash perceptual hash function. After strong criticism pointing out privacy and security concerns, Apple withdrew the proposal, but the NeuralHash software is still present on Apple devices. Brute-force collisions for NeuralHash (with a 96-bit result) require $2^{48}$ evaluations. Shortly after the publication of NeuralHash, it was demonstrated that it is easy to craft two colliding inputs that are perceptually dissimilar. In the context of CSS, this means that it is easy to falsely incriminate someone by sending an innocent picture with the same hash value as illegal content.
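
For context, the $2^{48}$ figure is the classical birthday bound, under the idealized assumption that the hash behaves as a uniformly random 96-bit function. Among $q$ distinct inputs to a random $n$-bit hash,

$$\Pr[\text{at least one collision}] \approx 1 - e^{-q(q-1)/2^{n+1}},$$

which reaches $1/2$ near $q \approx 2^{n/2}$; with $n = 96$, this gives $q \approx 2^{48}$.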

This work shows a more serious weakness: when inputs are restricted to a set of human faces, random collisions are highly likely to occur in input sets of size $2^{16}$. Unlike the targeted attack, our attacks are black-box attacks: they do not require knowledge of the design of the perceptual hash function. In addition, we show that the false negative rate is high as well. We demonstrate the generality of our approach by applying a similar attack to PhotoDNA, a widely deployed perceptual hash function proposed by Microsoft with a hash result of 1152 bits. Here we show that specific small input sets result in near-collisions, with similar impact. These results imply that current designs of perceptual hash functions are completely unsuitable for large-scale client-side scanning, as they would result in an unacceptably high false positive rate. This work underscores the need to reassess the security and feasibility of perceptual hash functions, particularly for large-scale applications where privacy risks and false positives have serious consequences.
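
Under the same idealized model, a set of $2^{16}$ inputs contains about $2^{31}$ pairs, so a random collision would occur with probability only around $2^{31-96} = 2^{-65}$; observing collisions at this scale therefore means the hash's output distribution on faces is very far from uniform. The search procedure itself is simple; a minimal sketch of such a black-box (near-)collision search follows, with a toy stand-in hash, since the point is the method rather than any particular function:

```python
import hashlib
import itertools

def toy_hash_96(data: bytes) -> int:
    """Placeholder 96-bit hash (truncated SHA-256). A real experiment would
    call the perceptual hash under test here; this stand-in only exercises
    the search logic, not any perceptual weakness."""
    return int.from_bytes(hashlib.sha256(data).digest()[:12], "big")

def near_collisions(items, hash_fn=toy_hash_96, max_hamming=0):
    """Black-box birthday search: one hash query per input, then an all-pairs
    Hamming-distance comparison. max_hamming=0 finds exact collisions; a small
    positive value finds near-collisions (relevant for long hashes such as
    PhotoDNA's 1152 bits)."""
    hashed = [(x, hash_fn(x)) for x in items]
    return [(a, b)
            for (a, ha), (b, hb) in itertools.combinations(hashed, 2)
            if bin(ha ^ hb).count("1") <= max_hamming]

# Toy usage: an ideal 96-bit hash should yield no collisions at this scale.
inputs = [f"face-{i}".encode() for i in range(1000)]
print(near_collisions(inputs))
```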

Practical information

Next sessions

  • CHERI: Architectural Support for Memory Protection and Software Compartmentalization

    • September 12, 2025 (10:00 - 11:00)

    • Inria Center of the University of Rennes - Room Métivier

    Speaker: Robert Watson - University of Cambridge

    CHERI is a processor architecture protection model enabling fine-grained C/C++ memory protection and scalable software compartmentalization. CHERI hybridizes conventional processor, instruction-set, and software designs with an architectural capability model. Originating in DARPA’s CRASH research program in 2010, the work has progressed from FPGA prototypes to the recently released Arm Morello[…] A toy sketch of the capability idea follows the tag list below.

    • SoSysec

    • SemSecuElec

    • Compartmentalization

    • Hardware/software co-design

    • Hardware architecture
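
    As referenced in the session description above, here is a toy model of the capability idea: a pointer that carries bounds and permissions which are checked on every access. This is a conceptual illustration only; real CHERI capabilities are compressed 128-bit architectural values (plus a validity tag) enforced in hardware, not software objects.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    """Toy model of a CHERI-style capability: a pointer carrying bounds and
    permissions that are checked on every access."""
    base: int
    length: int
    perm_write: bool = True

    def _check(self, addr: int) -> None:
        if not (self.base <= addr < self.base + self.length):
            raise MemoryError("capability bounds violation")  # hardware trap

    def load(self, mem: bytearray, offset: int) -> int:
        self._check(self.base + offset)
        return mem[self.base + offset]

    def store(self, mem: bytearray, offset: int, value: int) -> None:
        if not self.perm_write:
            raise PermissionError("write permission cleared")
        self._check(self.base + offset)
        mem[self.base + offset] = value

mem = bytearray(64)
cap = Capability(base=8, length=16)   # a 16-byte "allocation"
cap.store(mem, 15, 0x41)              # in bounds: permitted
try:
    cap.store(mem, 16, 0x41)          # one past the end: trap, not corruption
except MemoryError as e:
    print("trapped:", e)
```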

  • CHERI standardization and software ecosystem

    • September 12, 2025 (11:00 - 12:00)

    • Inria Center of the University of Rennes - Room Métivier

    Speaker: Carl Shaw - Codasip

    This talk will describe the current status of the RISC-V International standardization process to add CHERI as an official extension to RISC-V. It will then explore the current state of CHERI-enabled operating systems, toolchains, and software tooling, focusing on CHERI-RISC-V hardware implementations, before turning to likely future development roadmaps and how the[…]

    • SoSysec

    • SemSecuElec

    • Compartmentalization

    • Operating system and virtualization

    • Hardware/software co-design

    • Hardware architecture

  • Towards privacy-preserving and fairness-aware federated learning framework

    • September 19, 2025 (11:00 - 12:00)

    • Inria Center of the University of Rennes - Petri/Turing room

    Speaker: Nesrine Kaaniche - Télécom SudParis

    Federated Learning (FL) enables the distributed training of a model across multiple data owners under the orchestration of a central server responsible for aggregating the models generated by the different clients. However, the original FL approach has significant shortcomings related to privacy and fairness requirements. Specifically, the observation of the model updates may lead to privacy[…] A minimal sketch of the server-side aggregation step follows the tag list below.

    • Cryptography

    • SoSysec

    • Privacy

    • Machine learning
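
    The aggregation step mentioned in the abstract is, in the canonical FedAvg scheme, a weighted average of client parameters by local dataset size; a minimal sketch under that assumption (illustrative only, not the speaker's specific framework):

```python
def fedavg(client_updates):
    """FedAvg-style aggregation: average client parameters, weighted by each
    client's number of local training examples. client_updates is a list of
    (num_examples, params) pairs; params is a flat list of floats here, where
    a real framework would aggregate per-layer tensors."""
    total = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    return [sum(n * p[i] for n, p in client_updates) / total for i in range(dim)]

# Three clients with different data sizes contribute local models.
print(fedavg([(10, [1.0, 2.0]), (30, [0.0, 1.0]), (60, [0.5, 0.5])]))  # [0.4, 0.8]
```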

  • Malware Detection with AI Systems: bridging the gap between industry and academia

    • October 09, 2025 (11:00)

    • Inria Center of the University of Rennes - Room Aurigny

    Speaker: Luca Demetrio - University of Genova

    With the abundance of programs developed every day, it is possible to build next-generation antivirus programs that leverage this vast accumulated knowledge. In practice, these technologies combine established techniques like pattern matching with machine learning algorithms, both tailored to achieve high detection rates and low false-alarm rates. While companies state the[…] A minimal sketch of such a hybrid detector follows the tag list below.

    • SoSysec

    • Intrusion detection

    • Machine learning
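
    A minimal sketch of the mixture described in the abstract: a pattern-matching rule combined with a statistical score, so that either strong signal flags a sample. Both the signature and the score below are made-up stand-ins (the score is a byte-entropy proxy, not a trained model), not any vendor's actual rules or classifier.

```python
import math
import re

# Illustrative signature list: a single made-up byte pattern (a packer
# marker), standing in for a real rule set.
SIGNATURES = [re.compile(rb"UPX!")]

def statistical_score(sample: bytes) -> float:
    """Stand-in for a trained classifier: normalized byte entropy in [0, 1].
    High entropy often indicates packed or encrypted payloads."""
    if not sample:
        return 0.0
    n = len(sample)
    counts = [sample.count(b) for b in range(256)]
    return -sum(c / n * math.log2(c / n) for c in counts if c) / 8

def is_flagged(sample: bytes, threshold: float = 0.9) -> bool:
    """Flag a sample when either a signature matches or the score is high."""
    return any(sig.search(sample) for sig in SIGNATURES) or statistical_score(sample) > threshold

print(is_flagged(b"hello world" * 10))    # False: low entropy, no signature
print(is_flagged(b"\x00data UPX! data"))  # True: signature hit
```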
