Talks and presentations


Delivering Fairness in Human Resources AI: Mutual Information to the Rescue

November 23, 2022

Oral Presentation, Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (AACL 2022), Virtual Event

Automatic language processing is used frequently in the Human Resources (HR) sector for automated candidate sourcing and evaluation of resumes. These models often use pretrained language models where it is difficult to know if possible biases exist. Recently, Mutual Information (MI) methods have demonstrated notable performance in obtaining representations agnostic to sensitive variables such as gender or ethnicity. However, accessing these variables can sometimes be challenging, and their use is prohibited in some jurisdictions. These factors can make detecting and mitigating biases challenging. In this context, we propose to minimize the MI between a candidate’s name and a latent representation of their CV or short biography. This method may mitigate bias from sensitive variables without requiringthe collection of these variables. We evaluate this methodology by first projecting the name representation into a smaller space to prevent potential MI minimization problems in high dimensions.

Don’t Judge Me by My Face: An Indirect Adversarial Approach to Remove Sensitive Information From Multimodal Neural Representation in Asynchronous Job Video Interviews

September 28, 2021

Poster Presentation, 9th International Conference on Affective Computing & Intelligent Interaction (ACII 2021), Virtual Event

Use of machine learning for automatic analysis of job interview videos has recently seen increased interest. Despite claims of fair output regarding sensitive information such as gender or ethnicity of the candidates, the current approaches rarely provide proof of unbiased decision-making, or that sensitive information is not used. Recently, adversarial methods have been proved to effectively remove sensitive information from the latent representation of neural networks. However, these methods rely on the use of explicitly labeled protected variables (e.g. gender), which cannot be collected in the context of recruiting in some countries (e.g. France). In this article, we propose a new adversarial approach to remove sensitive information from the latent representation of neural networks without the need to collect any sensitive variable. Using only a few frames of the interview, we train our model to not be able to find the face of the candidate related to the job interview in the inner layers of the model. This, in turn, allows us to remove relevant private information from these layers. Comparing our approach to a standard baseline on a public dataset with gender and ethnicity annotations, we show that it effectively removes sensitive information from the main network. Moreover, to the best of our knowledge, this is the first application of adversarial techniques for obtaining a multimodal fair representation in the context of video job interviews. In summary, our contributions aim at improving fairness of the upcoming automatic systems processing videos of job interviews for equality in job selection.

Slices of Attention in Asynchronous Video Job Interviews

September 04, 2019

Poster Presentation, 8th International Conference on Affective Computing & Intelligent Interaction (ACII 2019), Cambridge, United Kingdom

The impact of non verbal behaviour in a hiring decision remains an open question. Investigating this question is important, as it could provide a better understanding on how to train candidates for job interviews and make recruiters be aware of influential non verbal behaviour. This research has recently been accelerated due to the development of tools for the automatic analysis of social signals (facial expression detection, speech processing, etc), and the emergence of machine learning methods. However, these studies are still mainly based on hand engineered features, which imposes a limit to the discovery of influential social signals. On the other side, deep learning methods are a promising tool to discover complex patterns without the necessity of feature engineering. In this paper, we focus on studying influential non verbal social signals in asynchronous job video interviews that are discovered by deep learning methods. We use a previously published deep learning system that aims at inferring the hirability of a candidate with regard to a sequence of interview questions. One particularity of this system is the use of attention mechanisms, which aim at identifying the relevant parts of an answer. Thus, information at a fine-grained temporal level could be extracted using global (at the interview level) annotations on hirability. While most of the deep learning systems use attention mechanisms to offer a quick visualization of slices when a rise of attention occurs, we perform an in-depth analysis to understand what happens during these moments. First, we propose a methodology to automatically extract slices where there is a rise of attention (attention slices). Second, we study the content of attention slices by comparing them with randomly sampled slices. Finally, we show that they bear significantly more information for hirability than randomly sampled slices, and that such information is related to visual cues associated with anxiety and turn taking.

TLDR : Attention Mechanism applied on facial expressions detect effectively key moment of job interview. These moments mainly occurs at the beginning and end of the answers.

Download link

HireNet: a Hierarchical Attention Model for the Automatic Analysis of Asynchronous Video Job Interviews

January 29, 2019

Poster Presentation, The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19) , Honolulu, Hawaii, USA

New technologies drastically change recruitment techniques. Some research projects aim at designing interactive systems that help candidates practice job interviews. Other studies aim at the automatic detection of social signals (e.g. smile, turn of speech, etc…) in videos of job interviews. These studies are limited with respect to the number of interviews they process, but also by the fact that they only analyze simulated job interviews (e.g. students pretending to apply for a fake position). Asynchronous video interviewing tools have become mature products on the human resources market, and thus, a popular step in the recruitment process. As part of a project to help recruiters, we collected a corpus of more than 7000 candidates having asynchronous video job interviews for real positions and recording videos of themselves answering a set of questions. We propose a new hierarchical attention model called HireNet that aims at predicting the hirability of the candidates as evaluated by recruiters. In HireNet, an interview is considered as a sequence of questions and answers contain- ing salient socials signals. Two contextual sources of information are modeled in HireNet: the words contained in the question and in the job position. Our model achieves better F1-scores than previous approaches for each modality (verbal content, audio and video). Results from early and late multimodal fusion suggest that more sophisticated fusion schemes are needed to improve on the monomodal results. Finally, some examples of moments captured by the attention mechanisms suggest our model could potentially be used to help finding key moments in an asynchronous job interview.

TLDR : Hierarchical and Contextual (questions and job titles) are important to model. Attention mechanism could probably inform on key moments of the job interview.

Download link


Attention Slices dans les Entretiens d’Embauche Vidéo Différés

June 05, 2020

Oral Presentation, Workshop sur les “Affects, Compagnons Artificiels et Interactions” (WACAI 2020), Ile d' Olérons, France

Nous nous intéressons à l’étude de signaux influents dans les entretiens vidéo d’embauche asynchrones découverts par des méthodes d’apprentissage profond. Le système que nous étudions emploie des mécanismes d’attention, qui permettent d’extraire d’un entretien les informations et les instants décisifs (qui ont influencé la décision du système au niveau de l’entretien), sans requérir d’annotation locale. Alors que la majorité des approches similaires évaluent les mécanismes d’attention en se contentant de visualiser les moments d’attention maximale, nous proposons ici une méthodologie permettant d’automatiser l’analyse du contenu de ces attention slices afin de fournir des éléments d’interprétation des prédictions du système.

TLDR : Les tranches de vidéos selectionnées par le mécanisme d’attention ont une meilleure prédictabilité que des tranches aléatoires. Ces tranches ont lieu le plus souvent au début et en fin d’interactions pour les modalités audio et vidéo.

Lien de téléchargement

Entretien vidéo différé : modèle prédictif pour pré-sélection de candidats sur la base du contenu verbal

June 14, 2018

Oral Presentation, Workshop sur les “Affects, Compagnons Artificiels et Interactions” (WACAI2018), Ile de Porquerolles, France

Les entretiens vidéo différés sont devenus plus en plus populaires dans le milieu des ressources humaines et constituent un objet de recherche en traitement informatique de signaux sociaux. Dans cet article, nous passons en revue plusieurs études sur les recherches pertinentes en termes d’analyses comportementales et de systèmes d’entraînement à passer des entretiens d’embauche. Nous présentons ensuite un premier modèle de prédiction automatiques des classements des candidats pour un recruteur. Nous nous focalisons dans cette première étude sur l’analyse automatique du contenu verbal d’entretiens vidéo différés. Ce travail s’inscrit en partenariat avec la société EASYRECRUE, société proposant une plateforme d’entretiens vidéo différés, avec laquelle un corpus de plus de 300 candidats évalués par un recruteur a été collecté. Nos travaux ont montré la faisabilité de l’exploitation du contenu verbal après l’utilisation d’un outil de reconnaissance automatique. Les perspectives en ce qui concerne le traitement automatique de la prosodie et des comportements non verbaux sont présentées.