neuralnoise.com: Homepage of Dr Pasquale Minervini, PhD <br/> Researcher at University College London <br/> London, United Kingdom
http://www.neuralnoise.com//
Fri, 17 Dec 2021 10:52:28 +0100

Call for PhD Students!

<p>From September 2022, I will join the <a href="https://www.research.ed.ac.uk/en/organisations/institute-of-language-cognition-and-computation">Institute for Language, Cognition and Computation</a> (ILCC) at the <a href="https://www.ed.ac.uk/informatics">School of Informatics</a>, <a href="https://www.ed.ac.uk/">University of Edinburgh</a> as a faculty member!</p>
<p>I have <strong>funding for multiple PhD students</strong>: if you would like to work with me, make sure to apply to the <a href="https://web.inf.ed.ac.uk/cdt/natural-language-processing">UKRI CDT in Natural Language Processing</a> and the <a href="http://www.ilcc.inf.ed.ac.uk/study/possible-phd-topics-in-ilcc">ILCC PhD Program</a>. The deadline for applying is <strong><a href="https://web.inf.ed.ac.uk/cdt/natural-language-processing/apply">January 28th, 2022</a></strong>: this deadline is mainly for UK applicants, but <strong>foreign applicants will be considered as well</strong>. If you have any further questions, feel free to reach out! 😊 And please, <strong>share this ad</strong> with your friends and connections who may be interested, especially if they come from under-represented groups!</p>
<p>I care about anything that can help Deep Learning models become more <strong>data-efficient</strong>, <strong>statistically robust</strong>, and <strong>explainable</strong>. As Artificial Intelligence and Machine Learning systems become more pervasive in high-risk areas like education and healthcare, there is an increasing need for AI-based systems that <strong>we can trust</strong>.</p>
<p>My research focuses on <strong>filling this gap</strong> and developing Deep Learning systems that can produce explanations, that can learn from fewer examples, and that can handle out-of-distribution data such as adversarial inputs.</p>
<p>You may want to know a bit more about my research in these directions, so here are a few pointers to some of my recent works. Let me know if any of them resonates with you!</p>
<h3 id="bridging-neural-and-symbolic-computation">Bridging Neural and Symbolic Computation</h3>
<p>One way I am trying to address some of the limitations of modern Deep Learning models is by designing <strong>hybrid approaches</strong> that inherit the strengths of both neural and symbolic systems.</p>
<p>For example, letâ€™s consider the problem of answering complex symbolic queries on potentially very large Knowledge Graphs. In our paper <a href="https://arxiv.org/abs/2011.03459">Complex Query Answering with Neural Link Predictors</a>, we propose a hybrid approach that combines symbolic and neural computation. Using orders of magnitude less training data, our approach obtains significant improvements compared with the purely-neural state-of-the-art models while also being able to produce faithful explanations to its users. This paper received an <a href="https://iclr-conf.medium.com/announcing-iclr-2021-outstanding-paper-awards-9ae0514734ab">Outstanding Paper Award</a> at <a href="https://iclr.cc/Conferences/2021">ICLR 2021</a>.</p>
<p>Or, for example, let's consider tasks that require some sort of <strong>logical deductive reasoning</strong>. <a href="https://arxiv.org/abs/1908.06177">Previous research</a> shows that even BERT-based models may not generalise properly when they are required to perform new reasoning tasks. We proposed several approaches for solving this problem by designing neural models whose behaviour mimics that of logical deductive reasoners. Our approaches enable neural models to <a href="https://arxiv.org/abs/1906.06187">answer multi-hop questions</a> and to <a href="https://arxiv.org/abs/2007.06477">jointly learn logic rules and reasoning policies</a>, even <a href="https://arxiv.org/abs/1912.10824">on massive Knowledge Bases</a>.</p>
<p>More recently, we were wondering whether it could be possible to incorporate black-box algorithmic components, like <a href="https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm">Dijkstra's shortest path algorithm</a> or any <a href="https://en.wikipedia.org/wiki/Integer_programming">ILP solver</a>, in a neural model. In our paper <a href="https://arxiv.org/abs/2106.01798">Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions</a>, we propose a very general (and <a href="https://github.com/uclnlp/torch-imle">extremely simple!</a>) method for back-propagating through a massive variety of algorithmic components, effectively allowing neural models to use them as off-the-shelf components. See <a href="https://www.youtube.com/watch?v=hb2b0K2PTxI">our NeurIPS 2021 presentation</a> of this paper, as well as <a href="https://www.youtube.com/watch?v=W2UT8NjUqrk">Yannic Kilcher's explanation</a>.</p>
<h3 id="incorporating-constraints-in-neural-models">Incorporating Constraints in Neural Models</h3>
<p>At other times, we would like a neural model to comply with a given set of constraints, coming, for example, from domain experts or from existing laws.</p>
<p>In <a href="http://auai.org/uai2017/proceedings/papers/306.pdf">Adversarial Sets for Regularising Neural Link Predictors</a>, we propose the first framework for incorporating a wide family of (First-Order!) logic constraints in neural models. Our framework can also be useful for producing formal <strong>robustness guarantees</strong>: in many interesting cases, we can mathematically prove that <em>for any possible input, the model will never violate a given set of constraints</em>!</p>
<p>We explored further applications of these ideas in several settings. For example, in <a href="https://arxiv.org/abs/1808.08609">Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge</a>, we show that some common-sense reasoning patterns can also be represented as constraints, and incorporating these in neural Natural Language Inference (NLI) models yields improvements both on in-distribution and out-of-distribution data. In <a href="https://arxiv.org/abs/2004.07790">Gone At Last: Removing the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training</a>, we propose a method for effectively de-biasing neural NLI models. In <a href="https://arxiv.org/abs/2003.04808">Undersensitivity in Neural Reading Comprehension</a>, we find that neural Question Answering (QA) models can often ignore semantically meaningful variations in the inputs, and analyse different ways of correcting such behaviour. In <a href="https://arxiv.org/abs/1910.03065">Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations</a>, we show that models for producing natural language explanations can easily contradict themselves!</p>
<p>If you want to connect, feel free to come chat with me at <a href="https://nips.cc/">NeurIPS</a> in case you are attending, or just send me an <a href="mailto:p.minervini@gmail.com">e-mail</a>!</p>
Wed, 01 Dec 2021 01:00:00 +0100
http://www.neuralnoise.com//2021/call-for-phd-students/
Tags: machine learning, natural language processing, academia, edinburgh, phd, scholarships

Some notes on Gaussian Fields and Label Propagation

<p>On several occasions, we find ourselves in need of <em>propagating</em> information among nodes in an undirected graph.</p>
<p>For instance, consider graph-based Semi-Supervised Learning (SSL): here, labeled and unlabeled examples are represented by an undirected graph, referred to as the <em>similarity graph</em>.</p>
<p>The task consists in finding a <em>label assignment</em> to all examples, such that:</p>
<ol>
<li>The final labeling is consistent with training data (e.g. positive training examples are still classified as positive at the end of the learning process), and</li>
<li>Similar examples are assigned similar labels: this is referred to as the <em>semi-supervised smoothness assumption</em>.</li>
</ol>
<p>Similarly, in networked data such as social networks, we might assume that related entities (such as <em>friends</em>) are associated with similar attributes (such as political and religious views, musical tastes and so on): in social network analysis, this phenomenon is commonly referred to as <em>homophily</em> (love of the same).</p>
<p>In both cases, propagating information from a limited set of nodes to all the other nodes in the graph provides a method for predicting the attributes of those nodes when this information is missing.</p>
<p>In the following, we introduce a really clever method for efficiently propagating information about nodes in undirected graphs, known as the <em>Gaussian Fields</em> method.</p>
<h3 id="propagation-as-a-cost-minimization-problem">Propagation as a Cost Minimization Problem</h3>
<p>We now cast the propagation problem as a binary classification task.
Let $X = \{ x_{1}, x_{2}, \ldots, x_{n} \}$ be a set of $n$ instances, of which only $l$ are labeled: $X^{+}$ are positive examples, while $X^{-}$ are negative examples.</p>
<p>Similarity relations between instances can be represented by means of an undirected similarity graph having adjacency matrix $\mathbf{W} \in \mathbb{R}^{n \times n}$: if two instances are connected in the similarity graph, it means that they are considered <em>similar</em>, and should be assigned the same label.
Specifically, $\mathbf{W}_{ij} > 0$ iff the instances $x_{i}, x_{j} \in X$ are connected by an edge in the similarity graph, and $\mathbf{W}_{ij} = 0$ otherwise.</p>
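<p>As an illustration, one common way to build such a similarity matrix $\mathbf{W}$ is a Gaussian (RBF) kernel over instance vectors; the post leaves the construction of the similarity graph open, so the function name and the dense construction below are illustrative assumptions:</p>

```python
import numpy as np

def rbf_similarity(X, sigma=1.0):
    """Build a dense RBF similarity matrix W from instance vectors X.

    W[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)), with zeros on the
    diagonal so that no instance is linked to itself.
    """
    # Pairwise squared Euclidean distances via broadcasting.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W
```

<p>Note that $\mathbf{W}$ is symmetric by construction, and nearby instances receive larger edge weights than distant ones.</p>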
<p>Let $y_{i} \in \{ \pm 1 \}$ be the label assigned to the $i$-th instance $x_{i} \in X$.
We can encode our assumption that <em>similar instances should be assigned similar labels</em> by defining a quadratic cost function over labeling functions in the form $f : X \mapsto \{ \pm 1 \}$:</p>
\[E(f) = \frac{1}{2} \sum_{x_{i} \in X} \sum_{x_{j} \in X} \mathbf{W}_{ij} \left[ f(x_{i}) - f(x_{j}) \right]^{2}.\]
<p>Given an input labeling function $f$, the cost function $E(\cdot)$ associates, with each pair of instances $x_{i}, x_{j} \in X$, a non-negative cost $\mathbf{W}_{ij} \left[ f(x_{i}) - f(x_{j}) \right]^{2}$: this quantity is $0$ when $\mathbf{W}_{ij} = 0$ (i.e. $x_{i}$ and $x_{j}$ are not linked in the similarity graph), or when $f(x_{i}) = f(x_{j})$ (i.e. they are assigned the same label).</p>
<p>For such a reason, the cost function $E(\cdot)$ favors labeling functions that are more likely to assign the same labels to instances that are linked by an edge in the similarity graph.</p>
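<p>As a quick sanity check of the cost function above, here is a minimal NumPy sketch (the helper name <code>labeling_cost</code> is mine, not from the post):</p>

```python
import numpy as np

def labeling_cost(W, f):
    """Quadratic cost E(f) = 1/2 * sum_{i,j} W_ij * (f_i - f_j)^2.

    Equivalently f^T L f, where L = D - W is the graph Laplacian.
    """
    diff = f[:, None] - f[None, :]  # pairwise label differences
    return 0.5 * np.sum(W * diff ** 2)

# A single edge between two nodes with opposite labels incurs a cost,
# while agreeing labels cost nothing.
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(labeling_cost(W, np.array([1.0, -1.0])))  # 4.0
print(labeling_cost(W, np.array([1.0, 1.0])))   # 0.0
```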
<p>Now, the problem of finding a labeling function that is both consistent with the training labels, and assigns similar labels to similar instances, can be cast as a <em>cost minimization problem</em>. Let's represent a labeling function $f$ by a vector $\mathbf{f} \in \mathbb{R}^{n}$, let $L \subset X$ denote the set of labeled instances, and let $\mathbf{y}_{i} \in \{ \pm 1 \}$ denote the label of the $i$-th instance $x_{i}$.
The optimization problem can be defined as follows:</p>
\[\begin{aligned}
& \underset{\mathbf{f} \in \{ \pm 1 \}^{n}}{\text{minimize}}
& & E(\mathbf{f}) \\
& \text{subject to}
& & \forall x_{i} \in L: \; \mathbf{f}_{i} = \mathbf{y}_{i}.
\end{aligned}\]
<p>The constraint $\forall x_{i} \in L : \mathbf{f}_{i} = \mathbf{y}_{i}$ fixes the label of each labeled example $x_{i} \in L$: $\mathbf{f}_{i} = +1$ if the instance has a positive label, and $\mathbf{f}_{i} = -1$ if it has a negative label, so as to achieve consistency with the training labels.</p>
<p>However, constraining labeling functions $f$ to only take discrete values has two main drawbacks:</p>
<ul>
<li>Each function $f$ can only provide <em>hard</em> classifications, without yielding any measure of confidence in the provided classification.</li>
<li>The cost term $E(\cdot)$ can be hard to optimize in a multi-label classification setting.</li>
</ul>
<p>To overcome these limitations, Zhu et al. propose a <em>continuous relaxation</em> of the previous optimization problem:</p>
\[\begin{aligned}
& \underset{\mathbf{f} \in \mathbb{R}^{n}}{\text{minimize}}
& & E(\mathbf{f}) + \epsilon \sum_{x_{i} \in X} \mathbf{f}_{i}^{2} \\
& \text{subject to}
& & \forall x_{i} \in L: \; \mathbf{f}_{i} = \mathbf{y}_{i},
\end{aligned}\]
<p>where the term $\sum_{x_{i} \in X} \mathbf{f}_{i}^{2} = \mathbf{f}^{T} \mathbf{f}$ is an $L_{2}$ regularizer over $\mathbf{f}$, weighted by a parameter $\epsilon > 0$, which ensures that the optimization problem has a unique global solution.</p>
<p>The parameter $\epsilon$ can be interpreted as the <em>decay</em> of the propagation process: as the distance from a labeled instance within the similarity graph increases, the confidence in the classification (as measured by the continuous label) gets closer to zero.</p>
<p>This optimization problem has a unique, global solution that can be calculated in closed form. Specifically, the optimal (relaxed) discriminant function $f : X \mapsto \mathbb{R}$ is given by $\mathbf{\hat{f}} = \left[ \mathbf{\hat{f}}_{L}, \mathbf{\hat{f}}_{U} \right]^{T}$, where $\mathbf{\hat{f}}_{L} = \mathbf{y}_{L}$ (i.e. labels for labeled examples in $L$ coincide with the training labels), while $\mathbf{\hat{f}}_{U}$ is given by:</p>
\[\mathbf{\hat{f}}_{U} = (\mathbf{L}_{UU} + \epsilon \mathbf{I})^{-1} \mathbf{W}_{UL} \mathbf{\hat{f}}_{L},\]
<p>where $\mathbf{L} = \mathbf{D} - \mathbf{W}$ is the <em>graph Laplacian</em> of the similarity graph with adjacency matrix $\mathbf{W}$, and $\mathbf{D}$ is a diagonal matrix such that $\mathbf{D}_{ii} = \sum_{j} \mathbf{W}_{ij}$.</p>
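<p>Putting the closed-form solution together, a minimal NumPy sketch might look as follows; the post provides no code, so the function name and the argument layout are illustrative:</p>

```python
import numpy as np

def propagate_labels(W, y_labeled, labeled_idx, eps=1e-2):
    """Closed-form Gaussian Fields label propagation.

    W: (n, n) symmetric adjacency matrix of the similarity graph.
    y_labeled: +/-1 training labels for the instances in labeled_idx.
    Returns a length-n score vector f with f fixed to the training
    labels on labeled instances, and
    f_U = (L_UU + eps * I)^{-1} W_UL f_L on the unlabeled ones.
    """
    n = W.shape[0]
    labeled = np.zeros(n, dtype=bool)
    labeled[labeled_idx] = True
    unlabeled = ~labeled

    D = np.diag(W.sum(axis=1))
    L = D - W  # graph Laplacian

    L_UU = L[np.ix_(unlabeled, unlabeled)]
    W_UL = W[np.ix_(unlabeled, labeled)]

    f = np.zeros(n)
    f[labeled] = y_labeled
    # Solve the regularized linear system rather than inverting L_UU.
    f[unlabeled] = np.linalg.solve(
        L_UU + eps * np.eye(L_UU.shape[0]), W_UL @ f[labeled]
    )
    return f
```

<p>On a small chain graph with a single positively-labeled endpoint, the scores stay positive and shrink with the distance from the labeled node, matching the interpretation of $\epsilon$ as the decay of the propagation process.</p>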
Sun, 01 Jan 2017 01:00:00 +0100
http://www.neuralnoise.com//2017/gaussian-fields/
Tags: machine learning, semi-supervised learning