beroth
98 papers challenge
Week 2 of 2

beroth commits to:
Read and summarize 7 scientific papers per day.
To track progress, my own (abstract-length) summaries of the papers are posted as comments on my stickK goal on the day I read the paper.
No more reports due
My Commitment Journal
beroth
October 11, 2017, 7:26 AM
Again 7x7 papers!! yay!!! (Note that the papers from October 4 count towards this week)
beroth
October 10, 2017, 9:46 PM
Rei: Semi-supervised Multitask Learning for Sequence Labelling

They propose a model based on bi-directional LSTMs where, in addition to the task objective (tagging), the forward LSTM has to predict the next token and the backward LSTM has to predict the previous token. They achieve consistent improvements over a large range of tasks.
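A minimal PyTorch sketch of this setup (my own illustration, not the authors' code; module and dimension names are hypothetical):

    import torch
    import torch.nn as nn

    class MultitaskTagger(nn.Module):
        # BiLSTM tagger with auxiliary language-modelling heads, in the spirit of the paper
        def __init__(self, vocab_size, emb_dim=100, hidden=100, num_tags=10):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
            self.tag_head = nn.Linear(2 * hidden, num_tags)
            self.fwd_lm_head = nn.Linear(hidden, vocab_size)  # forward states predict the next token
            self.bwd_lm_head = nn.Linear(hidden, vocab_size)  # backward states predict the previous token

        def forward(self, tokens):
            h, _ = self.lstm(self.emb(tokens))                # (batch, seq, 2*hidden)
            half = h.size(-1) // 2
            h_fwd, h_bwd = h[..., :half], h[..., half:]
            return self.tag_head(h), self.fwd_lm_head(h_fwd), self.bwd_lm_head(h_bwd)

    def multitask_loss(model, tokens, tags, lm_weight=0.1):
        ce = nn.CrossEntropyLoss()
        tag_logits, fwd_logits, bwd_logits = model(tokens)
        l_tag = ce(tag_logits.flatten(0, 1), tags.flatten())
        # forward LM: position t predicts token t+1; backward LM: position t predicts token t-1
        l_fwd = ce(fwd_logits[:, :-1].flatten(0, 1), tokens[:, 1:].flatten())
        l_bwd = ce(bwd_logits[:, 1:].flatten(0, 1), tokens[:, :-1].flatten())
        return l_tag + lm_weight * (l_fwd + l_bwd)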
---
Smith and Topin: Deep CNN Design Patterns

They give an overview of typical design choices for (vision) CNNs, such as the first layer increasing the number of filters by a large factor. They also propose two innovative design choices: (1) freeze-drop training, where parts of a symmetrical model are either de-activated or not trained, and (2) Taylor expansion networks, where the coefficients of a polynomial are learned.
---
Ye et al.: Jointly extracting relations with class ties via effective deep ranking

They combine ideas from CNN-based relation classification with attention over instances (~multi-instance learning) and a ranking-based objective: the score of a positive fact must be higher than that of a negative one, plus a margin. Additionally, the attention mechanism can be calculated using any positive relation for the instance tuple, not just the one that is currently predicted.
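A hedged sketch of such a pairwise margin-based ranking objective (illustrative only; in the paper the scores come from the CNN + attention encoder):

    import torch

    def ranking_loss(pos_score, neg_score, margin=1.0):
        # hinge loss: the positive fact should outscore the negative one by at least `margin`
        return torch.clamp(margin + neg_score - pos_score, min=0).mean()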
---
Iyyer et al.: Search-based neural structured learning for sequential question answering

They reformulate Wiki-table-based QA such that a complex question is split into a set of smaller questions. A semantic parser that generates SQL code (over tables) is learned using reinforcement learning. Partial feedback is given for partially correct results (using the Jaccard coefficient).
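Such a partial reward could look roughly like this (my illustration of the idea, not the paper's exact reward):

    def jaccard_reward(predicted_answers, gold_answers):
        # partial credit: overlap between predicted and gold answer sets (Jaccard coefficient)
        p, g = set(predicted_answers), set(gold_answers)
        if not p and not g:
            return 1.0
        return len(p & g) / len(p | g)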
---
Chen et al.: Reading Wikipedia to answer open domain questions

A combined pipeline of IR and deep QA is built and applied to Wikipedia. It is evaluated on TREC QA data. Since the TREC data is not annotated for provenance, a distantly supervised data set is constructed, where paragraphs need to contain the answer and have token overlap with the query. The prediction module uses an LSTM encoder over words and other features (such as query-overlap indicators); prediction is done with a pointer model.
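The span prediction can be sketched roughly as a bilinear match between a question vector and each paragraph position (a simplification of the reader; all names are mine):

    import torch
    import torch.nn as nn

    class SpanScorer(nn.Module):
        # scores each paragraph token as a start/end position given an encoded question vector
        def __init__(self, hidden):
            super().__init__()
            self.w_start = nn.Linear(hidden, hidden, bias=False)
            self.w_end = nn.Linear(hidden, hidden, bias=False)

        def forward(self, para_states, question_vec):
            # para_states: (seq, hidden) encoder states, question_vec: (hidden,)
            start_scores = para_states @ self.w_start(question_vec)  # (seq,)
            end_scores = para_states @ self.w_end(question_vec)      # (seq,)
            return start_scores, end_scores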
---
Peng et al.: Cross-sentence n-ary relation extraction with graph lstms

They work on a novel, distantly supervised corpus which contains bio-medical n-ary relations. They use a bidirectional LSTM which, in addition to linear-chain connections, contains connections from a parse tree. For each direction, only edges in that direction are considered; different edge labels have different LSTM parameters.
---
Mishra et al.: Domain-targeted, high precision knowledge extraction

They propose a large-scale OpenIE data set for helping with grade-school science tests. They create their tuples using standard techniques such as head-word selection. They have 15% of the tuples annotated, and train a classifier to filter the rest (count-based features). They achieve generalization using hypernym metrics derived from KBs (WordNet) and counts, optimizing via an ILP that enforces the constraint that each predicate is either a generalizing or a generalized concept.
beroth
October 9, 2017, 9:08 PM
Abend and Rappoport: The State of the Art in Semantic Representation

Several semantic representation frameworks are compared: FrameNet, VerbNet, UCCA, CCG, HPSG, etc. The main differences lie in how much they follow a syntactic analysis or deviate from it in order to represent similar meaning in a similar way. This difference is important, e.g., for machine translation. Another difference is in how much the formalisms are lexicalized (e.g. does each verb have its own roles) or universal (e.g. a role inventory shared among all verbs).
---
Du et al.: Learning to Ask: Neural Question Generation for Reading Comprehension

They learn a seq2seq model on SQuAD to generate questions. Their model uses attention and encodes part of the paragraph together with the sentence for which the question should be generated. They compare to and outperform rule-based systems, as measured by automatic and human evaluation.
---
Yang & Mitchell: Leveraging Knowledge Bases for Improving Machine Reading

They employ type information from Freebase and WordNet to improve LSTM-based tagging tasks. For each word they search for concepts (entities/types/synsets) in the KBs, for which embeddings were learned beforehand. The averaged, transformed and weighted embeddings are added to the hidden state.
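As I understand it, the injection step looks roughly like this (attention-weighted average of the retrieved KB concept embeddings, added to the hidden state; all names are hypothetical):

    import torch

    def augment_hidden(h_t, kb_embs, W):
        # h_t: (hidden,) LSTM state; kb_embs: (k, kb_dim) embeddings of the retrieved concepts
        # W: (hidden, kb_dim) projection of KB embeddings into the hidden space
        projected = kb_embs @ W.T                        # (k, hidden)
        weights = torch.softmax(projected @ h_t, dim=0)  # attention over the k candidates
        return h_t + weights @ projected                 # weighted, transformed, added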
---
Zhang et al.: Prior knowledge integration for neural machine translation using posterior regularization

They extend the typical likelihood term that maximizes p(y|x;theta) by a second term (posterior regularization) that makes the model prediction p(y|x;theta) similar to a second distribution q(y|x;gamma). This second distribution is learned simultaneously; however, it is constrained to be a log-linear function of predefined features. Since the KL divergence has to be computed over an exponentially large combinatorial space, and the log-linear function has an expensive normalization term, it is approximated by sampling.
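The sampled approximation of the KL term can be sketched as follows (a simplification; the exact estimator, KL direction and weighting follow the paper):

    def sampled_kl(q_logprob, p_logprob, samples):
        # Monte-Carlo estimate of KL(q || p) from samples y drawn from q:
        # E_q[log q(y|x) - log p(y|x)]
        return sum(q_logprob(y) - p_logprob(y) for y in samples) / len(samples)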
---
Yu et al.: Improved neural relation detection for KB question answering

They provide a model that scores KB relations given a natural-language question. Highly scored relations are then combined to yield a database query on the KB. The scoring between the NL question and a relation is achieved by appending the relation terms to the question and processing it with LSTMs (+ pooling).
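Schematically, the scoring could look like this (a rough sketch; the paper's model is more elaborate, and all names here are mine):

    import torch
    import torch.nn as nn

    class QuestionRelationScorer(nn.Module):
        # append relation tokens to the question, encode with a BiLSTM,
        # max-pool over time, and map to a scalar score
        def __init__(self, vocab_size, emb_dim=100, hidden=100):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
            self.score = nn.Linear(2 * hidden, 1)

        def forward(self, question_ids, relation_ids):
            joint = torch.cat([question_ids, relation_ids], dim=1)  # (batch, q_len + r_len)
            h, _ = self.lstm(self.emb(joint))                       # (batch, seq, 2*hidden)
            pooled = h.max(dim=1).values
            return self.score(pooled).squeeze(-1)                   # one score per candidate relation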
---
Cui et al.: Attention over attention networks for reading comprehension

The task is cloze-style question answering, where a term from a document needs to fill a placeholder in a query. They extend a bidirectional attention model: the co-attention matrix between document and query is normalized in both directions, the document-to-query attention is averaged to get a query-level vector, and the dot product between the query-to-document attention matrix and this vector gives a score for each term in the document; the maximum indicates the answer.
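A compact NumPy sketch of that attention-over-attention step (names are mine):

    import numpy as np

    def softmax(x, axis):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention_over_attention(doc, query):
        # doc: (n, d) document token states, query: (m, d) query token states
        M = doc @ query.T            # (n, m) pairwise match scores
        alpha = softmax(M, axis=0)   # query-to-document attention (normalized over document positions)
        beta = softmax(M, axis=1)    # document-to-query attention (normalized over query positions)
        q = beta.mean(axis=0)        # (m,) averaged document-to-query attention
        return alpha @ q             # (n,) score per document token; the argmax indicates the answer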
---
Katiyar & Cardie: Going on a limb: Joint extraction of entity mentions and relations without dependency trees

They propose a model for joint NE tagging and relation extraction on ACE data. The model is based on LSTMs and has a tagging objective. Additionally, it has a relation objective, which computes a relational score for each preceding position. The maximum of this score indicates which token the current one stands in a relation with (as in pointer networks). Relation types are also predicted.
beroth
October 8, 2017, 6:11 PM
Rehbein & Ruppenhofer: Detecting annotation noise in automatically labelled data

They apply an algorithm that was originally used for singling out bad human annotations in order to combine the output of different taggers and to identify the errors they make. Underlying it is a variational approximation of the true probability p(y|x), as estimated from the tagger outputs. If this probability has a high entropy (i.e. the classifiers can't decide between labels), the instance is selected for annotation. The variational estimation uses a prior, and this prior is updated with those annotations.
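The selection criterion can be sketched like this (a simplification of the variational estimate described above; here the distribution is just the empirical vote distribution over taggers):

    import numpy as np

    def label_entropy(tagger_outputs):
        # tagger_outputs: labels assigned to one token by the different taggers
        labels, counts = np.unique(tagger_outputs, return_counts=True)
        p = counts / counts.sum()
        # high entropy = the taggers disagree = candidate for manual annotation
        return -np.sum(p * np.log(p))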
---
Xu et al.: A Local Detection Approach for Named Entity Recognition and Mention Detection

They propose to tackle tagging tasks with a sequence representation that combines one-hot vectors using an exponentially decaying weight (from the start or end position). The representation is unique to a sequence, and its dimensionality is the vocabulary size. This way, words can also be represented in character space without needing to "learn" that embedding. They also combine these predictions with a CRF on top.
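A minimal sketch of such a decaying one-hot encoding (parameter names are mine):

    import numpy as np

    def decaying_onehot_encode(token_ids, vocab_size, alpha=0.7):
        # fixed-size encoding of a variable-length sequence: a sum of one-hot vectors,
        # each discounted by its distance from the end of the sequence
        z = np.zeros(vocab_size)
        for t, tok in enumerate(token_ids):
            z[tok] += alpha ** (len(token_ids) - 1 - t)
        return z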
---
He et al.: Generating Natural Answers by Incorporating Copying and Retrieving Mechanisms in Sequence-to-Sequence Learning

They provide a new task and data set for predicting natural-sounding answers (i.e. the information asked for, embedded in a complete sentence) to a natural-language question over a KB. They encode the question and the KB. The decoder can switch between generating a new word from the whole vocabulary, copying a word from the question, and retrieving an entity from the KB.
---
Liang et al.: Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision

They propose a method to automatically translate questions into LISP programs that can retrieve and process information from a database. They distinguish between a "programmer" and a "computer". The programmer is a generative decoder that writes LISP programs (it is constrained to valid programs and has special mechanisms to hold and refer to variables). The computer executes those programs and gives feedback/reward (i.e. whether the answer is correct). Reinforcement learning is used, but because of the large and sparse search space, pre-training and other tricks have to be applied.
---
Hao et al.: An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge

They propose a model for the WebQuestions task, where they additionally include information from Freebase. Their model uses cross-attention between the query and the source sentence; the KB embeddings are used during attention.
---
Choi: Coarse-to-Fine Question Answering for Long Documents

They propose a model for sentence selection on a data set that contains web questions and texts, where the answer must be extracted from the text. The sentence selection model predicts a score from BOW representations (query + sentence). From the selected sentence, the answer is extracted. Since success does not only depend on whether the sentence contains the answer, but also on whether the answer is easy to extract, scoring is only possible after the more complicated extraction model has run. To combine these two steps, reinforcement learning is used.
---
Zhou & Neubig: Multi-space Variational Encoder-Decoders for Semi-supervised Labeled Sequence Transduction

They propose a variational encoder-decoder with the following modifications to the standard setting: (1) It is not the same input that is encoded and decoded; instead, a source is encoded and decoded into a target. (2) In addition to the input, the modification to be applied to it is encoded in an additional random variable z, which is categorical and encodes properties of the target. Since they have discrete random variables (in addition to the continuous ones), the model stays differentiable only by using a softmax approximation to the Gumbel trick (Gumbel-softmax).
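For reference, the Gumbel-softmax relaxation they rely on can be sketched as follows (a standard technique, not specific to this paper):

    import torch

    def gumbel_softmax_sample(logits, temperature=1.0):
        # differentiable approximation of sampling a one-hot categorical variable:
        # add Gumbel noise to the logits and take a temperature-controlled softmax
        gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
        return torch.softmax((logits + gumbel) / temperature, dim=-1)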
October 4 to October 11: Successful (referee feedback expired); reports: Success, No report submitted
September 27 to October 4: Successful; reports: Success, Success

asia-biega
- Referee approval report
Congrats! :)
beroth
- Committed user success report
Read and summarized 7 papers for each of the last 7 days. Success! ;)
Recipient of Stakes
Money to a friend ($20.00 to asia-biega per failed reporting period)
Total at stake: $40.00
Stakes per period: $20.00
Remaining Stakes: $0.00
Total Money Lost: $0.00
Referee
Supporters
This Commitment doesn't have any Supporters yet!