Again 7x7 papers!! yay!!! (Note that the papers from October 4 count towards this week) |
|
Rei: Semi-supervised Multitask Learning for Sequence Labelling
They propose a model based on bi-directional LSTMs where, in addition to the task objective (tagging), the forward LSTM has to predict the next token and the backward LSTM has to predict the previous token. They achieve consistent improvements across a large range of tasks.
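Roughly, the combined objective just adds the weighted language-modelling losses to the tagging loss (the weight gamma and the exact notation below are my paraphrase, not copied from the paper):

```latex
E \;=\; E_{\text{tag}} \;+\; \gamma\,\big(\overrightarrow{E}_{\text{LM}} + \overleftarrow{E}_{\text{LM}}\big),
\qquad
\overrightarrow{E}_{\text{LM}} = -\sum_t \log p\big(w_{t+1} \mid \overrightarrow{h}_t\big),
\quad
\overleftarrow{E}_{\text{LM}} = -\sum_t \log p\big(w_{t-1} \mid \overleftarrow{h}_t\big)
```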
---
Smith and Topin: Deep CNN Design Patterns
They give an overview of typical design choices for (vision) CNNs, such as the first layer increasing the number of filters by a large factor. They also propose two innovative design choices: (1) freeze-drop training, where parts of a symmetrical model are either de-activated or not trained, and (2) Taylor expansion networks, where coefficients of a polynomial are learned.
---
Ye et al.: Jointly extracting relations with class ties via effective deep ranking
They combine ideas from CNN-based relation classification with attention over instances (~multi-instance learning) and a ranking-based objective. For the ranking-based objective, the score of a positive fact must be higher than that of a negative one (plus a margin). Additionally, the attention mechanism can be calculated using any positive relation for the instance tuple, not just the one that is currently predicted.
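Schematically, the ranking constraint can be written as a margin loss over positive/negative fact pairs (this generic form is a sketch; the paper's exact scoring function and margins may differ):

```latex
L_{\text{rank}} \;=\; \sum_{(x,\,r^{+},\,r^{-})} \max\!\big(0,\; m \;-\; s(x, r^{+}) \;+\; s(x, r^{-})\big)
```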
---
Iyyer et al.: Search-based neural structured learning for sequential question answering
They reformulate Wiki-Table-based QA such that a complex question is split into a set of smaller questions. A semantic parser that generates SQL code (over tables) is learned using reinforcement learning. Partial feedback is given for partially correct results (using the Jaccard coefficient).
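A minimal sketch of the partial-feedback idea, assuming the reward is simply the Jaccard overlap between predicted and gold answer sets (the function name is mine, not the authors'):

```python
def jaccard_reward(predicted, gold):
    """Partial credit for a partially correct answer set."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0
    return len(predicted & gold) / len(predicted | gold)
```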
---
Chen et al.: Reading Wikipedia to answer open domain questions
A combined pipeline of IR and DeepQA is built and applied to Wikipedia. It is evaluated on TREC QA data. Since the TREC data is not annotated for provenance, a distantly supervised data set is constructed, where paragraphs need to contain the answer and have token overlap with the query. The prediction module uses an LSTM encoder over words and other features (such as query-overlap indicators); prediction is done with a pointer model.
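A rough sketch of how such a distantly supervised set could be built, assuming a simple string-containment test for the answer plus token overlap with the question (illustrative only, not the authors' code):

```python
def distant_paragraphs(question, answer, paragraphs):
    """Keep paragraphs that contain the answer string and share tokens with the question."""
    q_tokens = set(question.lower().split())
    return [p for p in paragraphs
            if answer.lower() in p.lower()
            and q_tokens & set(p.lower().split())]
```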
---
Peng et al.: Cross-sentence n-ary relation extraction with graph lstms
They work on a novel, distantly supervised corpus that contains bio-medical n-ary relations. They use a bidirectional LSTM which, in addition to linear-chain connections, contains connections from a parse tree. For each direction, only edges in that direction are considered; different edge labels have different LSTM parameters.
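A sketch of the per-edge-label parameters, showing only how predecessor hidden states might be aggregated (the dictionary U of label-specific matrices is an assumption for illustration; the actual gating is more involved):

```python
import numpy as np

def aggregate_predecessors(hidden, in_edges, U):
    """Sum in-neighbour hidden states, each transformed by a label-specific matrix.

    hidden: dict node -> hidden state vector
    in_edges: list of (source_node, edge_label) pairs for the current node
    U: dict edge_label -> weight matrix
    """
    out_dim = next(iter(U.values())).shape[0]
    agg = np.zeros(out_dim)
    for src, label in in_edges:
        agg += U[label] @ hidden[src]
    return agg
```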
---
Mishra et al.: Domain-targeted, high precision knowledge extraction
They propose a large-scale OpenIE data set for helping with grade-school science tests. They create their tuples using standard techniques such as head-word selection. They have 15% of the tuples annotated and train a classifier to filter the rest (count-based features). They achieve generalization using hypernym metrics derived from KBs (WordNet) and counts, and by optimizing via an ILP that enforces the constraint that each predicate is either a generalizing or a generalized concept.
|
|
Abend and Rappoport: The State of the Art in Semantic Representation
Several semantic representation frameworks are compared: FrameNet, VerbNet, UCCA, CCG, HPSG, etc. The main differences lie in how much they follow a syntactic analysis or deviate from it in order to represent similar meaning in a similar way. This difference is important, e.g., for machine translation. Another difference is in how much the formalisms are lexicalized (e.g., does each verb have its own roles) or universal (e.g., a role inventory shared among all verbs).
---
Du et al.: Learning to Ask: Neural Question Generation for Reading Comprehension
They learn a seq2seq model from SQuAD to generate questions. Their model uses attention and encodes part of the paragraph together with the sentence for which the question should be generated. They compare to and outperform rule-based systems, as measured by automatic and human evaluation.
---
Yang & Mitchell: Leveraging Knowledge Bases for Improving Machine Reading
They employ type information from Freebase and WordNet to improve LSTM-based tagging tasks. For each word, they search the KBs for related concepts (entities/types/synsets), for which embeddings were learned beforehand. The weighted, averaged, and transformed embeddings are added to the hidden state.
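A minimal sketch of that combination step, assuming dot-product attention over the retrieved embeddings and a single transformation matrix W (both assumptions; the paper's gating may differ):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kb_augmented_hidden(h_t, kb_embs, W):
    """h_t: (d,) LSTM state; kb_embs: (k, e) retrieved KB embeddings; W: (d, e)."""
    weights = softmax(kb_embs @ (W.T @ h_t))   # relevance of each retrieved entry
    context = weights @ kb_embs                # weighted average of the KB embeddings
    return h_t + W @ context                   # transform and add to the hidden state
```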
---
Zhang et al.: Prior knowledge integration for neural machine translation using posterior regularization
They extend the typical likelihood term that maximizes p(y|x;theta) by a second term (posterior regularization) that makes the model prediction p(y|x;theta) similar to a second function p(y|x;gamma). This second function is learned simultaneously, but it is constrained to a log-linear function of predefined features. Since the KL divergence has to be computed over an exponentially large combinatorial space and the log-linear function has an expensive normalization term, it is approximated by sampling.
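Schematically, the objective combines likelihood with a KL term that pulls the model towards the log-linear feature function (the weighting lambda and this exact form are my paraphrase):

```latex
J(\theta, \gamma) \;=\; \sum_{(x,y)} \log p(y \mid x; \theta)
\;-\; \lambda\, \mathrm{KL}\big(q(y \mid x; \gamma) \,\|\, p(y \mid x; \theta)\big),
\qquad
q(y \mid x; \gamma) \;\propto\; \exp\big(\gamma \cdot \phi(x, y)\big)
```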
---
Yu et al.: Improved neural relation detection for KB question answering
They provide a model that scores KB relations given a natural language question. Highly scored relations are then combined to yield a database query on the KB. The scoring between the NL question and the relations is achieved by appending the relation terms to the question and processing the result with LSTMs (+ pooling).
---
Cui et al.: Attention over attention networks for reading comprehension
The task is cloze-style question answering, where a term from a document has to fill a placeholder in a query. They extend a bidirectional attention model: the co-attention matrix between document and query is normalized along both the document and the query dimension, and the query-side attention is averaged into a single vector. The dot product between the document-side attention matrix and this vector then gives a score for each term in the document; the maximum indicates the answer.
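A compact numpy sketch of the attention-over-attention computation as described above (shapes and names are mine):

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_over_attention(D, Q):
    """D: (n_doc, h) document states; Q: (n_query, h) query states."""
    M = D @ Q.T                    # pairwise matching scores, shape (n_doc, n_query)
    alpha = softmax(M, axis=0)     # document-level attention, one column per query word
    beta = softmax(M, axis=1)      # query-level attention, one row per document word
    beta_avg = beta.mean(axis=0)   # averaged query-level attention, shape (n_query,)
    return alpha @ beta_avg        # one score per document position
```

Scores of identical words are then summed, and the highest-scoring word is returned as the answer.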
---
Katiyar & Cardie: Going out on a limb: Joint extraction of entity mentions and relations without dependency trees
They propose a model for joint NE tagging and relation extraction on ACE data. The model is based on LSTMs and has a tagging objective. Additionally, it has a relation objective, which computes a relational score for each preceding position. The maximum of this score indicates which token the current token stands in a relation with (as in pointer networks). Relation types are also predicted.
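A minimal sketch of the pointer-style relation scoring, assuming a simple bilinear compatibility between the current token's state and each preceding state (the paper's actual attention-based scoring may differ):

```python
import numpy as np

def relation_pointer_scores(states, t, W):
    """Score every preceding position as the relation partner of token t.

    states: (seq_len, d) array of hidden states; W: (d, d) scoring matrix (illustrative).
    """
    return np.array([states[t] @ W @ states[j] for j in range(t)])
```

The argmax over these scores points to the partner token, analogous to a pointer network.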
|
|
Rehbein & Ruppenhofer: Detecting annotation noise in automatically labelled data
They apply an algorithm originally used for singling out bad human annotations to combine the output of different taggers and to identify errors made by them. Underlying it is a variational approximation of the true probability p(y|x), as estimated from the tagger outputs. If this probability has a high entropy (i.e., the classifiers cannot decide between labels), the instance is selected for annotation. The variational estimation uses a prior, and this prior is updated with those annotations.
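A minimal sketch of the entropy-based selection step, assuming p(y|x) per instance has already been estimated from the tagger outputs (the threshold is illustrative):

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def select_for_annotation(label_distributions, threshold=1.0):
    """Return indices of instances whose estimated p(y|x) has high entropy."""
    return [i for i, p in enumerate(label_distributions) if entropy(p) > threshold]
```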
---
Xu et al.: A Local Detection Approach for Named Entity Recognition and Mention Detection
They propose to tackle tagging tasks with a sequence representation that combines one-hot vectors using exponentially decaying weights (from the start or end position). The representation is unique to a sequence, and its dimensionality is the vocabulary size. This way, words can also be represented in character space without needing to "learn" that embedding. They also combine these predictions with a CRF on top.
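A small sketch of such an exponentially decayed bag-of-words code (the decay factor alpha is illustrative; uniqueness of the code depends on its value):

```python
import numpy as np

def decayed_bow(token_ids, vocab_size, alpha=0.5):
    """Sum one-hot vectors with exponentially decaying weights; the last token gets weight 1."""
    z = np.zeros(vocab_size)
    for t in token_ids:
        z = alpha * z   # decay everything seen so far
        z[t] += 1.0     # add the one-hot vector of the current token
    return z
```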
---
He et al.: Generating Natural Answers by Incorporating Copying and Retrieving Mechanisms in Sequence-to-Sequence Learning
They provide a new task and data set for predicting natural-sounding answers (i.e., the information asked for, embedded in a complete sentence) to a natural language question over a KB. They encode the question and the KB. The decoder can switch between generating a new word from the whole vocabulary, copying a word from the question, and retrieving an entity from the KB.
---
Liang et al.: Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
They propose a method to automatically translate questions into LISP programs that can retrieve and process information from a database. They distinguish between a "programmer" and a "computer". The programmer is a generative decoder that writes LISP programs (it is constrained to valid programs and has special mechanisms to hold and refer to variables). The computer executes those programs and gives feedback/reward (i.e., whether the answer is correct). Reinforcement learning is used, but because of the large and sparse search space, pre-training and other tricks have to be applied.
---
Hao et al.: An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge
They propose a model for the WebQuestions task, where they additionally include information from Freebase. Their model uses cross-attention between the query and the source sentence; the KB embeddings are used during attention.
---
Choi et al.: Coarse-to-Fine Question Answering for Long Documents
They propose a model for sentence selection on a data set that contains web questions and texts, where the answer must be extracted from the text. The sentence-selection model predicts a score from BOW representations (query + sentence). From the selected sentence, the answer is extracted. Since success depends not only on whether the sentence contains the answer but also on whether it is easy to extract, scoring is only possible after running the more complicated extraction model. To combine these two steps, reinforcement learning is used.
---
Zhou & Neubig: Multi-space Variational Encoder-Decoders for Semi-supervised Labeled Sequence Transduction
They propose a variational encoder-decoder with the following modifications to the standard setting: (1) instead of encoding and decoding the same input, a source is encoded and decoded to a target; (2) in addition to the input, a modification of the input is encoded in an additional random variable z, which is categorical and encodes properties of the target. Since they have discrete random variables (in addition to the continuous ones), the model only stays differentiable by using the Gumbel-softmax approximation.
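For reference, a standard Gumbel-softmax sample looks roughly like this (generic sketch, not the paper's code):

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature=1.0, rng=None):
    """Differentiable relaxation of sampling from a categorical distribution."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=logits.shape)
    gumbel = -np.log(-np.log(u + 1e-20) + 1e-20)   # Gumbel(0, 1) noise
    y = (logits + gumbel) / temperature
    e = np.exp(y - y.max())
    return e / e.sum()                             # soft one-hot vector
```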
|
|