Publications
Pedro Henrique Luz de Araujo & Benjamin Roth. (2025). Helpful Assistant or Fruitful Facilitator? Investigating How Personas Affect Language Model Behavior. PLOS ONE. [Code and data]
Yuxi Xia, Pedro Henrique Luz de Araujo, Klim Zaporojets, & Benjamin Roth. (2025). Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles. Accepted at ACL 2025.
Benjamin Roth, Pedro Henrique Luz de Araujo, Yuxi Xia, Saskia Kaltenbrunner, & Christoph Korab. (2025). Specification overfitting in artificial intelligence. Artif. Intell. Rev.
Yuxi Xia, Anastasiia Sedova, Pedro Henrique Luz de Araujo, Vasiliki Kougia, Lisa Nußbaumer, & Benjamin Roth. (2024). Exploring prompts to elicit memorization in masked language model-based named entity recognition. Preprint.
Andreas Stephan, Lukas Miklautz, Collin Leiber, Pedro Henrique Luz de Araujo, Dominik Répás, Claudia Plant, & Benjamin Roth. (2024). Text-Guided Alternative Image Clustering. In Proceedings of the 9th Workshop on Representation Learning for NLP (RepL4NLP-2024). Bangkok, Thailand. [Code and data]
Pedro Henrique Luz de Araujo & Benjamin Roth. (2024). Functionality Learning through Specification Instructions. In Findings of the Association for Computational Linguistics: EMNLP 2024. Miami, Florida, USA. [Code and data]
Pedro Henrique Luz de Araujo & Benjamin Roth. (2023). Cross-functional Analysis of Generalization in Behavioral Learning. TACL. [Code and data]
Pedro Henrique Luz de Araujo, Ana Paula G. S. de Almeida, Fabricio Ataides Braz, Nilton Correia da Silva, Flavio de Barros Vidal, & Teófilo Emídio de Campos. (2023). Sequence-aware multimodal page classification of Brazilian legal documents. Int. J. Document Anal. Recognit. [Dataset][Code]
Pedro Henrique Luz de Araujo & Benjamin Roth. (2022). Checking HateCheck: A Cross-Functional Analysis of Behaviour-Aware Learning for Hate Speech Detection. In Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP. Dublin, Ireland. [Code and data]
Pedro Henrique Luz de Araujo & Teófilo E. de Campos. (2020). Topic Modelling Brazilian Supreme Court Lawsuits. In Legal Knowledge and Information Systems - JURIX 2020: The Thirty-third Annual Conference. [Code and data]
Pedro Henrique Luz de Araujo, Teófilo Emídio de Campos, Fabricio Ataides Braz, & Nilton Correia da Silva. (2020). VICTOR: a Dataset for Brazilian Legal Documents Classification. In Proceedings of The 12th Language Resources and Evaluation Conference, LREC 2020. Marseille, France. [Dataset][Code]
Pedro Henrique Luz de Araujo, Teófilo Emídio de Campos, & Marcelo Magalhães Silva de Sousa. (2020). Inferring the Source of Official Texts: Can SVM Beat ULMFiT? In Computational Processing of the Portuguese Language - 14th International Conference, PROPOR 2020. Évora, Portugal. [Code and data]
Pedro Henrique Luz de Araujo, Teófilo E. de Campos, Renato R. R. de Oliveira, Matheus Stauffer, Samuel Couto, & Paulo Henrique S. Bermejo. (2018). LeNER-Br: A Dataset for Named Entity Recognition in Brazilian Legal Text. In Computational Processing of the Portuguese Language - 13th International Conference, PROPOR 2018. Canela, Brazil. [Dataset][Code]