My amazing collaborators will be presenting several works at ACL 2024, ICML 2024, and CoLM 2024 in the coming weeks and months!
Our work at ACL 2024
We will be presenting four papers this year at ACL, the flagship NLP conference:
- Analysing The Impact of Sequence Composition on Language Model Pre-Training, by Yu Zhao et al. – we analyse several language model pre-training schemes and find that, for example, intra-document causal masking improves both pre-training dynamics and accuracy on a wide array of downstream tasks (a minimal sketch of this masking scheme follows the list below)! This approach was later adopted by Llama 3, Meta’s flagship language model family. The paper will be presented as an Oral – top 8% of the accepted papers!
- SparseFit: Few-shot Prompting with Sparse Fine-tuning for Jointly Generating Predictions and Natural Language Explanations, by Jesus Solano, Mardhiyah Sanni et al. – we introduce SparseFit, a method that combines sparse fine-tuning with few-shot discrete prompting to efficiently generate both predictions and natural language explanations (NLEs) with large pre-trained language models, achieving competitive performance while significantly reducing the number of fine-tuned parameters! [poster]
- Using Natural Language Explanations to Improve Robustness of In-context Learning, by Xuanli He et al. – we find that integrating NLEs into in-context learning significantly improves the robustness of large language models against adversarial inputs, and that generating NLEs with frontier models in a few-shot setting yields significantly higher accuracy on challenging natural language inference tasks than traditional in-context learning or human-written NLEs.
- Probing the Emergence of Cross-lingual Alignment during LLM Training, by Hetong Wang et al. – we analyse how cross-lingual alignment emerges during the training of multilingual large language models by probing neuron activity across languages. We find that higher neuron overlap between languages correlates strongly with better zero-shot cross-lingual transfer performance, but we also identify phases during training where both alignment and performance degrade, offering new insights into the dynamics of multilingual model training! [poster]
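To make the intra-document causal masking mentioned above more concrete, here is a minimal sketch (not the authors' code): when several documents are packed into one training sequence, each token is only allowed to attend to earlier tokens from the same document. The function and variable names (`intra_document_causal_mask`, `doc_ids`) are illustrative assumptions.

```python
import torch

def intra_document_causal_mask(doc_ids: torch.Tensor) -> torch.Tensor:
    """doc_ids: (seq_len,) tensor giving the document id of each position.
    Returns a (seq_len, seq_len) boolean mask where True means attention is allowed."""
    seq_len = doc_ids.shape[0]
    # Standard causal constraint: position i may attend to positions j <= i.
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Intra-document constraint: i and j must belong to the same packed document.
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    return causal & same_doc

# Example: two packed documents of lengths 3 and 2.
doc_ids = torch.tensor([0, 0, 0, 1, 1])
mask = intra_document_causal_mask(doc_ids)
# mask[3, 2] is False: the first token of document 1 cannot attend back into document 0.
# In practice, disallowed positions are set to -inf before the attention softmax.
print(mask)
```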
Our work at ICML 2024
We will be presenting three works this year at ICML – one in the main conference and two in co-located workshops:
- On the Independence Assumption in Neurosymbolic Learning, by Emile van Krieken et al. – we analyse the common assumption in neurosymbolic learning that symbols are conditionally independent given the input, and argue that this assumption biases models towards deterministic solutions and limits their ability to express uncertainty.
- Attention Is All You Need But You Don’t Need All Of It For Inference of Large Language Models, by Georgy Tyukin et al. – we investigate the effects of removing MLP and attention layers from large language models at inference time, finding that removing deeper attention layers only marginally reduces performance while significantly improving inference speed (see the sketch after this list)!
- An Auditing Test to Detect Behavioral Shift in Language Models, by Leo Richter et al. – we propose a continuous online auditing framework to detect behavioural shifts in language models, ensuring that deployed models remain aligned with societal values and preventing vendors or attackers from covertly deploying unaligned models for malicious purposes.
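As a rough illustration of the layer-removal idea above, here is a minimal, self-contained sketch (not the paper's implementation) of skipping the attention sub-layer in the deepest blocks at inference time while keeping the MLP sub-layer. `ToyBlock` and the chosen sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """A pre-norm transformer block whose attention sub-layer can be skipped."""

    def __init__(self, d_model: int, n_heads: int, skip_attention: bool = False):
        super().__init__()
        self.skip_attention = skip_attention
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.skip_attention:
            # Attention residual branch; dropped entirely in the pruned blocks.
            h = self.ln1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
        # The MLP sub-layer always runs.
        return x + self.mlp(self.ln2(x))

# Skip the attention sub-layer in the last 2 of 8 blocks, a rough analogue of
# removing deeper attention layers to speed up inference.
blocks = nn.ModuleList(ToyBlock(64, 4, skip_attention=(i >= 6)) for i in range(8))
x = torch.randn(1, 16, 64)
with torch.no_grad():
    for block in blocks:
        x = block(x)
```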
Our work at CoLM 2024
The Conference on Language Modeling (CoLM) is a brand-new venue, holding its first edition this year. I have been area-chairing for CoLM, and I’m really impressed by the quality of the submissions! We will be presenting two papers:
- Forklift: An Extensible Neural Lifter, by Jordi Armengol-Estapé et al. – we introduce Forklift, a framework that uses neural models to translate assembly code across different instruction set architectures by “lifting” source assembly code into an intermediate representation, thereby reducing the engineering effort required for cross-architecture software migration!
- Evaluating the Adversarial Robustness of Retrieval-Based In-Context Learning for Large Language Models, by Simon Yu, Jie He et al. – we analyse the adversarial robustness of retrieval-based in-context learning, finding that while retrieval-augmented methods improve robustness against attacks on test samples, they increase vulnerability to adversarially perturbed demonstrations; to address this, we propose a new training-free defence method, which significantly improves adversarial robustness (a minimal sketch of retrieval-based in-context learning follows below).
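For readers unfamiliar with retrieval-based in-context learning, here is a minimal sketch of the basic setup the paper analyses: for each test input, the most similar labelled demonstrations are retrieved from a pool and prepended to the prompt. TF-IDF similarity below stands in for whatever retriever is actually used, and the demonstration pool and prompt template are illustrative assumptions; the paper's defence method is not shown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical pool of labelled demonstrations for a sentiment task.
pool = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I regretted buying a ticket for this film.", "negative"),
    ("A heartfelt story with wonderful acting.", "positive"),
    ("The plot was dull and the pacing glacial.", "negative"),
]

def build_prompt(test_input: str, k: int = 2) -> str:
    """Retrieve the k demonstrations most similar to the test input and
    prepend them to the prompt that would be sent to the language model."""
    texts = [text for text, _ in pool]
    vectorizer = TfidfVectorizer().fit(texts + [test_input])
    sims = cosine_similarity(vectorizer.transform([test_input]),
                             vectorizer.transform(texts))[0]
    top = sims.argsort()[::-1][:k]  # indices of the k most similar demonstrations
    demos = "\n".join(f"Review: {pool[i][0]}\nSentiment: {pool[i][1]}" for i in top)
    return f"{demos}\nReview: {test_input}\nSentiment:"

print(build_prompt("The acting was wonderful but the plot dragged."))
```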