<p><strong>neuralnoise.com</strong> – Homepage of Dr Pasquale Minervini <br/> Researcher/Faculty at the University of Edinburgh, School of Informatics <br/> Co-Founder and CTO at Miniml.AI <br/> ELLIS Scholar, Edinburgh Unit <br/> <a href="https://neuralnoise.com/">https://neuralnoise.com/</a></p>
<h2 id="july-2024-in-research">July 2024 in Research</h2>
<p>My amazing collaborators will be presenting several works at <a href="https://2024.aclweb.org/">ACL 2024</a>, <a href="https://icml.cc/">ICML 2024</a>, and <a href="https://colmweb.org/">CoLM 2024</a> in the upcoming weeks and months!</p>
<h3 id="our-work-at-acl-2024">Our work at ACL 2024</h3>
<p>We will be presenting four papers this year at ACL, the flagship NLP conference:</p>
<ul>
<li><a href="https://arxiv.org/abs/2402.13991">Analysing The Impact of Sequence Composition on Language Model Pre-Training</a>, by <a href="https://huggingface.co/yuzhaouoe">Yu Zhao</a> et al. – we analyse several language model pre-training schemes and find that, e.g., intra-document causal masking helps both in terms of pre-training dynamics and accuracy on a wide array of downstream tasks! This approach was later adopted by <a href="https://llama.meta.com/">Llama 3</a>, Meta’s flagship language model family. This paper will be presented as an Oral – top 8% of the accepted papers!</li>
<li><a href="https://arxiv.org/abs/2305.13235">SparseFit: Few-shot Prompting with Sparse Fine-tuning for Jointly Generating Predictions and Natural Language Explanations</a>, by <a href="https://scholar.google.com/citations?user=KqAy5mQAAAAJ&hl=en">Jesus Solano</a>, <a href="https://www.linkedin.com/in/mardhiyah-sanni/?originalSubdomain=uk">Mardhiyah Sanni</a> et al. – we introduce SparseFit, a sparse fine-tuning method that uses few-shot prompting and discrete prompts to efficiently generate both predictions and natural language explanations (NLEs) with large pre-trained language models, achieving competitive performance while significantly reducing the number of fine-tuning parameters! [<a href="https://x.com/PMinervini/status/1822912368940081265">poster</a>]</li>
<li><a href="https://arxiv.org/abs/2311.07556">Using Natural Language Explanations to Improve Robustness of In-context Learning</a>, by <a href="https://xlhex.github.io">Xuanli He</a> et al. – we find that integrating NLEs into in-context learning significantly improves the robustness of large language models against adversarial inputs, and show that generating NLEs with frontier models in a few-shot setting can significantly improve accuracy on challenging natural language inference tasks compared to traditional in-context learning and human-generated NLEs.</li>
<li><a href="https://arxiv.org/abs/2406.13229">Probing the Emergence of Cross-lingual Alignment during LLM Training</a>, by <a href="https://x.com/ErikaaWang">Hetong Wang</a> et al. – we analyse how cross-lingual alignment emerges during the training of multilingual large language models by probing neuron activity across languages. We find that higher neuron overlap between languages correlates strongly with improved zero-shot cross-lingual transfer performance, but we also identify phases during training where both alignment and performance degrade, offering new insights into the dynamics of multilingual model training! [<a href="https://x.com/ErikaaWang/status/1822297520334094724">poster</a>]</li>
</ul>
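<p>The intra-document causal masking scheme analysed in the first paper above can be sketched in a few lines: when several documents are packed into a single training sequence, each token attends only to <em>earlier</em> tokens <em>from the same document</em>. This is a toy illustration with my own naming, not code from the paper:</p>

```python
import numpy as np

def intra_document_causal_mask(doc_ids):
    """Boolean attention mask for a packed sequence: position i may attend to
    position j only if j <= i (causal) AND both tokens belong to the same document."""
    doc_ids = np.asarray(doc_ids)
    n = len(doc_ids)
    causal = np.tril(np.ones((n, n), dtype=bool))        # j <= i
    same_doc = doc_ids[:, None] == doc_ids[None, :]      # same source document
    return causal & same_doc

# Two documents packed into one training sequence: [A A A | B B]
mask = intra_document_causal_mask([0, 0, 0, 1, 1])
```

With plain causal masking, token 3 (the first token of document B) would attend to tokens of document A; with intra-document masking it cannot.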
<h3 id="our-work-at-icml-2024">Our work at ICML 2024</h3>
<p>We will be presenting three works this year at <a href="https://icml.cc/">ICML</a> – one in the main conference and two in co-located workshops:</p>
<ul>
<li><a href="https://arxiv.org/abs/2404.08458">On the Independence Assumption in Neurosymbolic Learning</a>, by <a href="https://www.emilevankrieken.com">Emile van Krieken</a> et al. – we analyse the common assumption in neurosymbolic learning that symbols are conditionally independent given the input, and argue that this assumption biases models towards deterministic solutions and limits their ability to express uncertainty.</li>
<li><a href="https://arxiv.org/abs/2407.15516">Attention Is All You Need But You Don’t Need All Of It For Inference of Large Language Models</a>, by <a href="https://www.linkedin.com/in/georgy-tyukin-644048150/?originalSubdomain=uk">Georgy Tyukin</a> et al. – we investigate the effects of removing MLP and attention layers from large language models during inference, finding that removing deeper attention layers only marginally reduces performance while significantly improving inference speed!</li>
<li><a href="https://openreview.net/forum?id=N0lEOF2eDm">An Auditing Test to Detect Behavioral Shift in Language Models</a>, by <a href="https://scholar.google.com/citations?hl=en&user=1BMnCH0AAAAJ&view_op=list_works&sortby=pubdate">Leo Richter</a> et al. – we propose a continuous online auditing framework to detect behavioural shifts in language models, ensuring that deployed models remain aligned with societal values and preventing vendors or attackers from covertly deploying unaligned models for malicious purposes.</li>
</ul>
<h3 id="our-work-at-colm-2024">Our work at CoLM 2024</h3>
<p>The <a href="https://colmweb.org/">Conference on Language Modeling</a> (CoLM) is a brand-new venue. I served as an Area Chair for CoLM this year, and I’m really impressed by the quality of the submissions!
We will be presenting two papers:</p>
<ul>
<li><a href="https://arxiv.org/abs/2404.16041">Forklift: An Extensible Neural Lifter</a>, by <a href="https://jordiae.com/">Jordi Armengol-Estapé</a> et al. – we introduce Forklift, a framework that uses neural models to translate assembly code across different instruction set architectures by “lifting” source assembly code into an intermediate representation, thereby reducing the engineering effort required for cross-architecture software migration!</li>
<li><a href="https://arxiv.org/abs/2405.15984">Evaluating the Adversarial Robustness of Retrieval-Based In-Context Learning for Large Language Models</a>, by <a href="https://simon-yu.netlify.app/">Simon Yu</a>, <a href="https://probe2.github.io/">Jie He</a> et al. – we analyse the adversarial robustness of retrieval-based in-context learning, finding that while retrieval-augmented methods improve robustness against test sample attacks, they increase vulnerability to adversarially perturbed demonstrations; to address this, we propose a new training-free defence method, which significantly improves adversarial robustness.</li>
</ul>
<p><em>Mon, 01 Jul 2024 · <a href="https://neuralnoise.com/2024/research/">permalink</a> · tags: machine learning, natural language processing, research, academia, edinburgh</em></p>
<h2 id="looking-for-postdocs-june-2024-edition">Looking for Postdocs, June 2024 Edition</h2>
<p>We have an opening for a 3-year postdoc – <a href="https://ellis.eu/jobs/post-doctoral-research-associate">more details are available here</a> – on a project funded by Huawei via the Huawei-Edinburgh Joint Lab initiative, with me as the Principal Investigator (PI).</p>
<p>The researcher will work on projects involving the design and application of methods for improving the robustness and trustworthiness of Large Language Models when solving complex reasoning tasks, while also improving their explainability and generalisation properties. They will be part of the <a href="https://edinburghnlp.inf.ed.ac.uk/">Edinburgh NLP Group</a>, a world-leading research group in Natural Language Processing.</p>
<p><em>Sat, 01 Jun 2024 · <a href="https://neuralnoise.com/2024/postdoc/">permalink</a> · tags: machine learning, natural language processing, hallucinations, reasoning, academia, edinburgh, postdoc</em></p>
<h2 id="looking-for-postdocs">Looking for Postdocs!</h2>
<p>We have an opening for a 2-year postdoc – <a href="https://elxw.fa.em3.oraclecloud.com/hcmUI/CandidateExperience/en/sites/CX_1001/job/5583">more details are available here</a> – on a project titled <a href="https://web.inf.ed.ac.uk/eliai/projects/gradient-based-learning-of-complex-latent-structur">Gradient-based Learning of Complex Latent Structures</a>, with me as the Principal Investigator (PI), and <a href="http://nolovedeeplearning.com/">Antonio Vergari</a> (<a href="https://web.inf.ed.ac.uk/anc">IANC</a>) and <a href="https://ducdauge.github.io/">Edoardo Ponti</a> (<a href="https://web.inf.ed.ac.uk/ilcc">ILCC</a>) as co-PIs. The position is entirely funded by the <a href="https://web.inf.ed.ac.uk/eliai">Edinburgh Laboratory for Integrated Artificial Intelligence</a> (ELIAI) – if you want to know more, feel free to reach out!</p>
<p>You can apply <a href="https://elxw.fa.em3.oraclecloud.com/hcmUI/CandidateExperience/en/sites/CX_1001/job/5583">at this link</a>.</p>
<h3 id="project-description">Project description</h3>
<p>Imposing structural constraints on the latent representations learned by deep neural models has several benefits: it can improve their explainability, their robustness, and their ability to generalise to out-of-domain distributions. For example, we can learn more explainable models by making them selectively decide which parts of the input to consider, and we can improve their generalisation properties by learning representations that are suitable for reasoning tasks, such as deductive reasoning and planning, and that comply with any desired constraints. For instance, the intermediate structure can represent a relational graph between objects in the world, the relationships between multiple sub-questions in a complex question, or a computation graph that can be executed to produce a prediction.</p>
<p>In this project, we aim to investigate how we can derive better methods for back-propagating through mixed continuous-discrete complex latent structures, and how we can leverage them for learning more explainable, data-efficient, and robust deep neural models. The reason discrete latent representations are not widely adopted in deep neural models is that they tend not to interact well with gradient-based optimisation methods; this started to change recently (e.g., see <a href="https://arxiv.org/abs/2106.01798">Niepert et al., 2021</a>; <a href="https://arxiv.org/abs/2209.04862">Minervini et al., 2022</a>), enabling a wide range of applications and use cases.</p>
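<p>As a minimal illustration of the core difficulty (and of the simplest workaround – not one of the methods cited above), the classic straight-through estimator runs the discrete step in the forward pass but pretends it was the identity in the backward pass. Function names and the toy loss below are mine:</p>

```python
import numpy as np

def hard_round(x):
    """Non-differentiable discretization: its true derivative is 0 almost everywhere,
    so naive backpropagation through it yields no learning signal."""
    return (x > 0.5).astype(float)

def forward_backward(theta, target):
    """One step of a toy model with a discrete bottleneck, using the
    straight-through estimator: the backward pass treats rounding as identity."""
    p = 1.0 / (1.0 + np.exp(-theta))   # continuous relaxation (sigmoid)
    z = hard_round(p)                  # discrete forward pass
    loss = 0.5 * np.sum((z - target) ** 2)
    dz = z - target                    # d loss / d z
    dp = dz                            # straight-through: d z / d p ~= 1
    dtheta = dp * p * (1.0 - p)        # chain rule through the sigmoid
    return loss, dtheta

loss, grad = forward_backward(np.array([2.0, -2.0]), np.array([0.0, 0.0]))
```

The estimator is biased, but it propagates a usable signal through the discrete step; the methods cited above aim for estimators with better statistical properties.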
<p>Related papers:</p>
<ul>
<li>Niepert, Minervini, and Franceschi - <a href="https://arxiv.org/abs/2106.01798">Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions</a>. NeurIPS 2021</li>
<li>Minervini, Franceschi, and Niepert - <a href="https://arxiv.org/abs/2209.04862">Adaptive Perturbation-Based Gradient Estimation for Discrete Latent Variable Models</a>. AAAI 2023</li>
<li>Ahmed, Teso, Chang, Van den Broeck, Vergari - <a href="https://arxiv.org/abs/2206.00426">Semantic Probabilistic Layers for Neuro-Symbolic Learning</a>. NeurIPS 2022</li>
</ul>
<h3 id="position">Position</h3>
<p>The post holder will work on projects involving the design and application of deep learning models with discrete latent structures for improving their explainability, generalisation, and robustness properties. They will be part of the new <a href="https://web.inf.ed.ac.uk/eliai">Edinburgh Laboratory for Integrated Artificial Intelligence</a> and the <a href="https://edinburghnlp.inf.ed.ac.uk/">Edinburgh NLP Group</a>, a world-leading research group in Natural Language Processing.</p>
<p>The School of Informatics is one of the largest research centres in Computer Science in Europe, and it has been <a href="https://www.ed.ac.uk/informatics/news-events/stories/2022/informatics-ref2021-results-global-reach-genuine-i">ranked #1 in the UK</a> in terms of research power by a large margin. The Edinburgh NLP Group is consistently ranked among the <a href="https://csrankings.org/#/index?nlp&world">world’s leading research groups</a> in Natural Language Processing. We are offering an exciting opportunity to work in an interdisciplinary, collaborative, friendly, and supportive environment, integrating different sub-fields of Computer Science and Artificial Intelligence.</p>
<p><em>Tue, 01 Nov 2022 · <a href="https://neuralnoise.com/2022/postdoc/">permalink</a> · tags: machine learning, natural language processing, knowledge graphs, neuro-symbolic reasoning, academia, edinburgh, postdoc</em></p>
<h2 id="phd-projects">PhD Projects</h2>
<p>As mentioned <a href="/2021/research-interests/">here</a>, in September 2022 I joined the <a href="https://www.research.ed.ac.uk/en/organisations/institute-of-language-cognition-and-computation">Institute for Language, Cognition and Computation</a> (ILCC) at the <a href="https://www.ed.ac.uk/informatics">School of Informatics</a>, <a href="https://www.ed.ac.uk/">University of Edinburgh</a>, one of the <a href="https://csrankings.org/#/fromyear/2016/toyear/2022/index?nlp&world">world’s best schools in NLP and related areas</a>, as a faculty member in NLP! If you are interested in working with me, I have funding for multiple PhD students: make sure to apply either to the <a href="https://web.inf.ed.ac.uk/cdt/natural-language-processing">UKRI CDT in Natural Language Processing</a> or to the <a href="http://www.ilcc.inf.ed.ac.uk/study/possible-phd-topics-in-ilcc">ILCC 3-year PhD program</a>!</p>
<p>Some more details on the <a href="https://web.inf.ed.ac.uk/ilcc/study-with-us/studentships/linguistics-speech-technology-cognitive-science">ILCC PhD program</a> – there are <a href="https://web.inf.ed.ac.uk/ilcc/study-with-us/studentships/linguistics-speech-technology-cognitive-science">two deadlines for applying</a>: the first round is on 25th November 2022, and the second round is on 27th January 2023. I strongly recommend that non-UK applicants submit their applications in the first round, to maximise their chances of funding.</p>
<p>Regarding the <a href="https://web.inf.ed.ac.uk/cdt/natural-language-processing">NLP CDT program</a> – there are <a href="https://web.inf.ed.ac.uk/cdt/natural-language-processing/apply">also two deadlines for applying</a>: the first round is on 25th November 2022, and the second round is on 27th January 2023. Likewise, I strongly recommend that non-UK applicants submit their applications in the first round, to maximise their chances of funding.</p>
<p>If you are interested in working with me, you can apply via the ILCC PhD program’s and the NLP CDT program’s application portals. You will be asked to submit a research proposal: this is mostly used for assessing candidate PhD students and for matching them with potential faculty supervisors, and you can decide to work on different problems during your PhD. If you would like some feedback on your research proposal, get in touch!</p>
<p>Below is a (non-exhaustive but fairly up-to-date) list of PhD topics we may decide to work on – this list is also available on the <a href="https://web.inf.ed.ac.uk/ilcc/study-with-us/possible-phd-topics-ilcc/language-processing-computational-linguistics">Possible PhD topics in ILCC</a> page. An older list of possible research topics is available <a href="/2021/research-interests/">at this link</a> – and feel free to propose new project topics that interest you! I’m always happy to explore new directions!</p>
<h4 id="open-domain-complex-question-answering-at-scale">Open-Domain Complex Question Answering at Scale</h4>
<p>Open-Domain Question Answering (ODQA) is a task where a system needs to generate the answer to a given general-domain question, and the evidence is not given as input to the system. A core limitation of modern ODQA models (and, more generally, of all models for solving <a href="https://aclanthology.org/2021.naacl-main.200/">knowledge-intensive tasks</a>) is that they remain limited to answering simple, factoid questions, where the answer to the question is explicit in a single piece of evidence. In contrast, complex questions involve aggregating information from multiple documents, requiring some form of logical reasoning and sequential, multi-hop processing in order to generate the answer. Projects in this area involve proposing new ODQA models for answering complex questions, for example, by taking inspiration from models for answering complex queries in Knowledge Graphs (<a href="https://arxiv.org/abs/2011.03459">Arakelyan et al., 2021</a>; <a href="https://www.ijcai.org/proceedings/2022/741">Minervini et al., 2022a</a>) and Neural Theorem Provers (<a href="https://arxiv.org/abs/2007.06477">Minervini et al., 2020a</a>; <a href="https://arxiv.org/abs/1912.10824">Minervini et al., 2020b</a>), and proposing methods by which neural ODQA models can learn to search in massively large text corpora, such as the entire Web.</p>
<h4 id="neuro-symbolic-and-hybrid-discrete-continuous-natural-language-processing-models">Neuro-Symbolic and Hybrid Discrete-Continuous Natural Language Processing Models</h4>
<p>Incorporating discrete components, such as discrete decision steps and symbolic reasoning algorithms, in neural models can significantly improve their interpretability, data efficiency, and predictive properties – for example, see (<a href="https://arxiv.org/abs/2106.01798">Niepert et al., 2021</a>; <a href="https://arxiv.org/abs/2209.04862">Minervini et al., 2022b</a>; <a href="https://www.ijcai.org/proceedings/2022/741">Minervini et al., 2020a</a>; <a href="https://arxiv.org/abs/1912.10824">Minervini et al., 2020b</a>). However, approaches in this space rely either on ad-hoc continuous relaxations (e.g., <a href="https://www.ijcai.org/proceedings/2022/741">Minervini et al., 2020a</a>; <a href="https://arxiv.org/abs/1912.10824">Minervini et al., 2020b</a>) or on gradient estimation techniques that require some assumptions on the distributions of the discrete variables (<a href="https://arxiv.org/abs/2106.01798">Niepert et al., 2021</a>; <a href="https://arxiv.org/abs/2209.04862">Minervini et al., 2022b</a>). Projects in this area involve devising neuro-symbolic approaches for solving NLP tasks that require some degree of reasoning and compositionality, and identifying gradient estimation techniques (for back-propagating through discrete decision steps) that are data-efficient, hyperparameter-free, accurate, and require fewer assumptions on the distribution of the discrete variables.</p>
<h4 id="learning-from-graph-structured-data">Learning from Graph-Structured Data</h4>
<p>Graph-structured data is everywhere – e.g. consider Knowledge Graphs, social networks, protein and drug interaction networks, and molecular profiles. In this project, we aim to improve models for learning from graph-structured data and their evaluation protocols. Projects in this area involve incorporating invariances and constraints in graph machine learning models (e.g., see <a href="https://arxiv.org/abs/1707.07596">Minervini et al., 2017</a>), proposing methods of transferring knowledge between graph representations, automatically identifying functional inductive biases for learning from graphs from a given domain (such as Knowledge Graphs – for example, see <a href="https://arxiv.org/abs/2207.09980">our NeurIPS 2022 paper on incorporating the inductive biases used by factorisation-based models into GNNs</a>) and proposing techniques for explaining the output of black-box graph machine learning methods (such as graph embeddings).</p>
<p><em>Sat, 01 Oct 2022 · <a href="https://neuralnoise.com/2022/phd-projects/">permalink</a> · tags: machine learning, natural language processing, knowledge graphs, neuro-symbolic reasoning, academia, edinburgh, phd</em></p>
<h2 id="call-for-phd-students">Call for PhD Students</h2>
<p>From September 2022, I will join the <a href="https://www.research.ed.ac.uk/en/organisations/institute-of-language-cognition-and-computation">Institute for Language, Cognition and Computation</a> (ILCC) at the <a href="https://www.ed.ac.uk/informatics">School of Informatics</a>, <a href="https://www.ed.ac.uk/">University of Edinburgh</a>!</p>
<p>And there is more! I have <strong>funding for multiple PhD students</strong>: if you are interested in working with me, make sure to apply either to the <a href="https://web.inf.ed.ac.uk/cdt/natural-language-processing">UKRI CDT in Natural Language Processing</a> or to the <a href="http://www.ilcc.inf.ed.ac.uk/study/possible-phd-topics-in-ilcc">ILCC 3-year PhD program</a>.</p>
<p>In general, I care about anything that can help Deep Learning models become more <em>data-efficient</em>, <em>statistically robust</em>, and <em>explainable</em>. As Artificial Intelligence and Machine Learning systems become more pervasive in areas like critical infrastructure, education, and healthcare, there is an increasing need for AI-based systems that we can <strong>trust</strong>.
For example, the European Union is working on a <a href="https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai">new set of regulations</a> that will require AI-based systems used in high-risk areas to produce high-quality explanations for their users and to achieve high levels of robustness and accuracy, among other things.
This will automatically exclude the vast majority of the Deep Learning systems that we love and work with on a daily basis.</p>
<p>My research focuses on <em>filling this gap</em>: developing Deep Learning systems that can produce <em>faithful explanations</em>, learn from fewer examples (e.g. thanks to stronger inductive biases), and work even on out-of-distribution samples (such as adversarial inputs).</p>
<p>You may want to know a bit more about my research in these directions – here are some pointers. Let me know if any of them clicks with you, and feel free to reach out!</p>
<h3 id="bridging-neural-and-symbolic-computation">Bridging Neural and Symbolic Computation</h3>
<p>One way I am trying to address some of the limitations of modern Deep Learning models is by designing <em>hybrid</em> approaches that inherit the strengths of both neural and symbolic systems.</p>
<p>For example, let’s consider the problem of answering complex symbolic queries over (potentially very large) Knowledge Graphs. In our paper <a href="https://arxiv.org/abs/2011.03459">Complex Query Answering with Neural Link Predictors</a>, presented at <a href="https://iclr.cc/Conferences/2021">ICLR 2021</a>, we presented a hybrid approach where the query-answering task is reduced to an optimisation problem whose structure follows the compositional logical structure of the query. Using orders of magnitude less training data, our approach obtains significant improvements over the purely neural state-of-the-art models in this space, while also producing faithful explanations for its users. This paper received an <a href="https://iclr-conf.medium.com/announcing-iclr-2021-outstanding-paper-awards-9ae0514734ab">Outstanding Paper Award</a> at <a href="https://iclr.cc/Conferences/2021">ICLR 2021</a>.</p>
<p>Or, for example, let’s consider the problem of <em>deductive reasoning</em> – i.e. deriving logical conclusions. <a href="https://arxiv.org/abs/1908.06177">Previous research</a> shows that even BERT-based models do not generalise properly when required to perform reasoning tasks that differ from those observed during training – e.g. because they require composing multiple reasoning patterns that were never observed together at training time. We proposed several approaches for solving this problem, designing neural models whose behaviour mimics that of logical deductive reasoners. Our approaches <a href="https://arxiv.org/abs/1906.06187">enable neural models to perform multi-hop reasoning over multiple documents</a> (<a href="https://acl2019.org/EN/index.xhtml.html">ACL 2019</a>), and <a href="">learn logic rules from graph-structured data</a> (<a href="https://icml.cc/Conferences/2020">ICML 2020</a> and <a href="https://aaai.org/Conferences/AAAI-20/">AAAI 2020</a>).</p>
<p>More recently, we wondered whether it could be possible to incorporate black-box algorithmic components, like <a href="https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm">Dijkstra’s shortest-path algorithm</a> or any <a href="https://en.wikipedia.org/wiki/Integer_programming">ILP solver</a>, in a neural model. In our paper <a href="https://arxiv.org/abs/2106.01798">Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions</a>, presented at <a href="https://nips.cc/Conferences/2021/">NeurIPS 2021</a>, we developed a very general (and extremely simple!) method for back-propagating through a wide variety of algorithmic components, effectively allowing neural models to use them as off-the-shelf modules. See <a href="https://www.youtube.com/watch?v=hb2b0K2PTxI">our presentation</a> of this paper, as well as <a href="https://www.youtube.com/watch?v=W2UT8NjUqrk">Yannic Kilcher’s explanation</a>.</p>
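<p>The core perturb-and-MAP idea behind Implicit MLE can be sketched for a toy categorical variable. The helper names and the toy loss below are mine, and the sketch omits many details from the paper:</p>

```python
import numpy as np

def map_state(theta):
    """MAP state of a toy categorical variable: a one-hot argmax of the parameters."""
    z = np.zeros_like(theta)
    z[np.argmax(theta)] = 1.0
    return z

def imle_gradient(theta, grad_z, lam=1.0, seed=0):
    """I-MLE-style gradient estimate of d loss / d theta: compare the MAP state
    under Gumbel-perturbed parameters with the MAP state under 'target'
    parameters nudged against the downstream gradient grad_z = d loss / d z."""
    eps = np.random.default_rng(seed).gumbel(size=theta.shape)  # perturb-and-MAP noise
    z_fwd = map_state(theta + eps)
    z_target = map_state(theta + eps - lam * grad_z)
    return (z_fwd - z_target) / lam

theta = np.array([1.0, 2.0, 0.5])
z = map_state(theta)                     # discrete forward pass
grad_z = z - np.array([0.0, 0.0, 1.0])   # gradient of a toy squared loss w.r.t. z
g = imle_gradient(theta, grad_z)
```

The same recipe applies whenever `map_state` is replaced by any combinatorial solver (shortest path, top-k, ILP), which is what makes the method so general.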
<h3 id="incorporating-constraints-in-neural-models">Incorporating Constraints in Neural Models</h3>
<p>Sometimes, we would like a neural model to <em>comply</em> with a given set of constraints. For example, when our model predicts that <em>$X$ is a parent of $Y$</em> and <em>$Y$ is a parent of $Z$</em>, we would also like it to predict that <em>$X$ is a grandparent of $Z$</em>. Constraints are key to developing statistically robust models – for example, think of <em>adversarial perturbations</em> in computer vision. In the case of adversarial perturbations, the model is essentially violating a single constraint: <em>given an image $X$, if $Y$ is a semantically-invariant perturbation of $X$, the model should produce the same output for both $X$ and $Y$</em>.</p>
<p>In our paper <a href="http://auai.org/uai2017/proceedings/papers/306.pdf">Adversarial Sets for Regularising Neural Link Predictors</a>, presented at UAI 2017, we introduced the first method for incorporating arbitrary constraints, encoded as First-Order Logic rules, in a wide class of neural models. The idea is very simple and general: at each training step, an <em>adversary</em> finds the inputs on which the model maximally violates a given constraint, and the model is then trained to reduce the degree of such violations. We also show that, for a wide class of models and constraint types, the problem of finding where the model maximally violates a constraint admits <em>efficient and globally-optimal</em> solutions. This is pretty amazing, since (1) it makes the training procedure extremely efficient, adding very little overhead, and (2) if the search process does not return any significant violation of a constraint, it means that <em>the model will never violate that constraint, for any possible input it may encounter</em>. This provides a way of producing <em>safety guarantees</em> for a large set of neural models, which are very desirable in high-risk settings.</p>
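<p>To make the notion of constraint violation concrete, here is a toy sketch (my own naming, not code from the paper) of a fuzzy inconsistency loss for the grandparent rule mentioned above, using a product t-norm over sigmoid-squashed link-prediction scores:</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rule_violation(score_body1, score_body2, score_head):
    """Degree to which 'parent(X,Y) AND parent(Y,Z) => grandparent(X,Z)' is violated:
    the body's truth degree (product t-norm) minus the head's, clipped at zero."""
    body = sigmoid(score_body1) * sigmoid(score_body2)
    head = sigmoid(score_head)
    return max(0.0, body - head)

# A confident body with an unconfident head yields a large violation...
high = rule_violation(5.0, 5.0, -5.0)
# ...while an equally confident head yields zero violation.
low = rule_violation(5.0, 5.0, 5.0)
```

In the adversarial-sets setting, an adversary searches over (embeddings of) $X$, $Y$, $Z$ to maximise such a violation, and the resulting maximum is added to the training loss as a regulariser.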
<p>We explored further applications of these ideas in several settings. For example, in <a href="https://arxiv.org/abs/1808.08609">Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge</a> (<a href="https://www.conll.org/2018">CoNLL 2018</a>), we show that some common-sense reasoning patterns can also be represented as constraints, and that incorporating them in neural Natural Language Inference (NLI) models yields improvements on both in-distribution and out-of-distribution data. In <a href="https://arxiv.org/abs/2004.07790">Gone At Last: Removing the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training</a> (<a href="https://2020.emnlp.org/">EMNLP 2020</a>), we show that we can use <em>ensembles of adversaries</em> for de-biasing neural NLI models. In <a href="https://arxiv.org/abs/2003.04808">Undersensitivity in Neural Reading Comprehension</a> (<a href="https://2020.emnlp.org/">Findings of EMNLP 2020</a>), we found that neural Question Answering (QA) models often ignore semantically meaningful variations in the input questions, and proposed a training procedure for correcting this behaviour. In <a href="https://arxiv.org/abs/1910.03065">Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations</a> (<a href="https://acl2020.org/">ACL 2020</a>), we showed that models for producing natural language explanations often violate self-consistency constraints, and can produce mutually inconsistent explanations.</p>
<p><em>Fri, 01 Oct 2021 · <a href="https://neuralnoise.com/2021/research-interests/">permalink</a> · tags: machine learning, natural language processing, academia, edinburgh, phd</em></p>
<h2 id="gaussian-fields-and-label-propagation">Some notes on Gaussian Fields and Label Propagation</h2>
<p>On several occasions, we find ourselves in need of <em>propagating</em> information among nodes in an undirected graph.</p>
<p>For instance, consider graph-based Semi-Supervised Learning (SSL): here, labeled and unlabeled examples are represented by an undirected graph, referred to as the <em>similarity graph</em>.</p>
<p>The task consists in finding a <em>label assignment</em> to all examples, such that:</p>
<ol>
<li>The final labeling is consistent with training data (e.g. positive training examples are still classified as positive at the end of the learning process), and</li>
<li>Similar examples are assigned similar labels: this is referred to as the <em>semi-supervised smoothness assumption</em>.</li>
</ol>
<p>Similarly, in networked data such as social networks, we might assume that related entities (such as <em>friends</em>) are associated with similar attributes (such as political and religious views, musical tastes, and so on): in social network analysis, this phenomenon is commonly referred to as <em>homophily</em> (love of the same).</p>
<p>In both cases, propagating information from a limited set of nodes in a graph to all nodes provides a method for predicting the attributes of such nodes, when this information is missing.</p>
<p>In the following, we introduce a really clever method for efficiently propagating information about nodes in undirected graphs, known as the <em>Gaussian Fields</em> method.</p>
<h3 id="propagation-as-a-cost-minimization-problem">Propagation as a Cost Minimization Problem</h3>
<p>We now cast the propagation problem as a binary classification task.
Let $X = \{ x_{1}, x_{2}, \ldots, x_{n} \}$ be a set of $n$ instances, of which only $l$ are labeled: $X^{+}$ are positive examples, while $X^{-}$ are negative examples.</p>
<p>Similarity relations between instances can be represented by means of an undirected similarity graph having adjacency matrix $\mathbf{W} \in \mathbb{R}^{n \times n}$: if two instances are connected in the similarity graph, it means that they are considered <em>similar</em>, and should be assigned the same label.
Specifically, $\mathbf{W}_{ij} > 0$ iff the instances $x_{i}, x_{j} \in X$ are connected by an edge in the similarity graph, and $\mathbf{W}_{ij} = 0$ otherwise.</p>
<p>Let $y_{i} \in \{ \pm 1 \}$ be the label assigned to the $i$-th instance $x_{i} \in X$.
We can encode our assumption that <em>similar instances should be assigned similar labels</em> by defining a quadratic cost function over labeling functions in the form $f : X \mapsto \{ \pm 1 \}$:</p>
\[E(f) = \frac{1}{2} \sum_{x_{i} \in X} \sum_{x_{j} \in X} \mathbf{W}_{ij} \left[ f(x_{i}) - f(x_{j}) \right]^{2}.\]
<p>Given an input labeling function $f$, the cost function $E(\cdot)$ associates, with each pair of instances $x_{i}, x_{j} \in X$, a non-negative cost $\mathbf{W}_{ij} \left[ f(x_{i}) - f(x_{j}) \right]^{2}$: this quantity is $0$ when $\mathbf{W}_{ij} = 0$ (i.e. $x_{i}$ and $x_{j}$ are not linked in the similarity graph), or when $f(x_{i}) = f(x_{j})$ (i.e. they are assigned the same label).</p>
<p>For such a reason, the cost function $E(\cdot)$ favors labeling functions that are more likely to assign the same labels to instances that are linked by an edge in the similarity graph.</p>
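<p>A small numerical example of the cost function $E(\cdot)$ (the toy graph and the naming are mine): on a three-node path graph with unit edge weights, a constant labeling has zero cost, while a labeling whose sign flips across both edges is penalised.</p>

```python
import numpy as np

def quadratic_cost(W, f):
    """E(f) = 1/2 * sum_ij W_ij * (f_i - f_j)^2, the Gaussian-fields cost."""
    diff = f[:, None] - f[None, :]
    return 0.5 * np.sum(W * diff ** 2)

# Path graph 0 -- 1 -- 2 with unit edge weights (symmetric adjacency matrix).
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])

smooth = quadratic_cost(W, np.array([1., 1., 1.]))    # constant labeling
rough  = quadratic_cost(W, np.array([1., -1., 1.]))   # sign flips on both edges
```

Since $\mathbf{W}$ is symmetric, each edge contributes twice to the double sum, which the factor $\frac{1}{2}$ compensates for; here the rough labeling pays $(2)^{2} = 4$ per edge.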
<p>Now, the problem of finding a labeling function that is both consistent with the training labels and assigns similar labels to similar instances can be cast as a <em>cost minimization problem</em>. Let’s represent a labeling function $f$ by a vector $\mathbf{f} \in \mathbb{R}^{n}$, let $L \subset X$ denote the labeled instances, and let $\mathbf{y}_{i} \in \{ \pm 1 \}$ denote the label of the $i$-th instance $x_{i}$.
The optimization problem can be defined as follows:</p>
\[\begin{aligned}
& \underset{\mathbf{f} \in \{ \pm 1 \}^{n}}{\text{minimize}}
& & E(\mathbf{f}) \\
& \text{subject to}
& & \forall x_{i} \in L: \; \mathbf{f}_{i} = \mathbf{y}_{i}.
\end{aligned}\]
<p>The constraint $\forall x_{i} \in L : \mathbf{f}_{i} = \mathbf{y}_{i}$ fixes the label of each labeled example $x_{i} \in L$: $\mathbf{f}_{i} = +1$ if the instance has a positive label, and $\mathbf{f}_{i} = -1$ if it has a negative label, so as to achieve consistency with the training labels.</p>
<p>However, constraining labeling functions $f$ to only take discrete values has two main drawbacks:</p>
<ul>
<li>Each function $f$ can only provide <em>hard</em> classifications, without yielding any measure of confidence in the provided classification.</li>
<li>The cost term $E(\cdot)$ can be hard to optimize in a multi-label classification setting.</li>
</ul>
<p>To overcome these limitations, Zhu et al. propose a <em>continuous relaxation</em> of the previous optimization problem, with an additional regularization term:</p>
\[\begin{aligned}
& \underset{\mathbf{f} \in \mathbb{R}^{n}}{\text{minimize}}
& & E(\mathbf{f}) + \epsilon \, \mathbf{f}^{T} \mathbf{f} \\
& \text{subject to}
& & \forall x_{i} \in L: \; \mathbf{f}_{i} = \mathbf{y}_{i},
\end{aligned}\]
<p>where the term $\sum_{x_{i} \in X} \mathbf{f}_{i}^{2} = \mathbf{f}^{T} \mathbf{f}$ is an $L_{2}$ regularizer over $\mathbf{f}$, weighted by a parameter $\epsilon > 0$, which ensures that the optimization problem has a unique global solution.</p>
<p>The parameter $\epsilon$ can be interpreted as the <em>decay</em> of the propagation process: as the distance from a labeled instance within the similarity graph increases, the confidence in the classification (as measured by the continuous label) gets closer to zero.</p>
<p>This optimization problem has a unique, global solution that can be calculated in closed form. Specifically, the optimal (relaxed) discriminant function $f : X \mapsto \mathbb{R}$ is given by $\mathbf{\hat{f}} = \left[ \mathbf{\hat{f}}_{L}, \mathbf{\hat{f}}_{U} \right]^{T}$, where $\mathbf{\hat{f}}_{L} = \mathbf{y}_{L}$ (i.e. labels for labeled examples in $L$ coincide with the training labels), while $\mathbf{\hat{f}}_{U}$ is given by:</p>
\[\mathbf{\hat{f}}_{U} = (\mathbf{L}_{UU} + \epsilon \mathbf{I})^{-1} \mathbf{W}_{UL} \mathbf{\hat{f}}_{L},\]
<p>where $\mathbf{L} = \mathbf{D} - \mathbf{W}$ is the <em>graph Laplacian</em> of the similarity graph with adjacency matrix $\mathbf{W}$, and $\mathbf{D}$ is a diagonal matrix such that $\mathbf{D}_{ii} = \sum_{j} \mathbf{W}_{ij}$.</p>
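<p>The closed-form solution can be implemented in a few lines of NumPy – a minimal sketch (the naming is mine) that assumes the labeled nodes are ordered first in the adjacency matrix:</p>

```python
import numpy as np

def propagate_labels(W, y_labeled, eps=1e-2):
    """Closed-form Gaussian-fields solution: the first len(y_labeled) nodes are
    labeled (+1/-1); returns continuous labels for the remaining nodes via
    f_U = (L_UU + eps * I)^{-1} W_UL f_L."""
    n, l = W.shape[0], len(y_labeled)
    D = np.diag(W.sum(axis=1))
    L = D - W                              # graph Laplacian
    L_UU = L[l:, l:]                       # unlabeled-unlabeled block
    W_UL = W[l:, :l]                       # unlabeled-labeled block
    f_L = np.asarray(y_labeled, dtype=float)
    f_U = np.linalg.solve(L_UU + eps * np.eye(n - l), W_UL @ f_L)
    return f_U

# Path graph (+1) -- u1 -- u2 -- (-1); node order: [labeled+, labeled-, u1, u2].
W = np.array([[0., 0., 1., 0.],
              [0., 0., 0., 1.],
              [1., 0., 0., 1.],
              [0., 1., 1., 0.]])
f_U = propagate_labels(W, [1.0, -1.0])
```

On this toy graph, the unlabeled node adjacent to the positive seed receives a positive continuous label, its neighbour a symmetric negative one, with magnitudes shrunk towards zero by the decay parameter $\epsilon$.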
<p><em>Sun, 01 Jan 2017 · <a href="https://neuralnoise.com/2017/gaussian-fields/">permalink</a> · tags: machine learning, semi-supervised learning</em></p>