<?xml version="1.0"?>
<article>
  <header>
    <firstpageheader>
      <page local="1" global="46"/>
      <title>Task-oriented Evaluation of Syntactic Parsers and Their Representations</title>
      <pubinfo>Proceedings of ACL-08: HLT, pages 46-54, Columbus, Ohio, USA, June 2008. ©2008 Association for Computational Linguistics</pubinfo>
      <author surname="Miyao" givenname="Yusuke">
        <org name="University of Tokyo" country="Japan" city="Tokyo"/>
      </author>
      <author surname="S&#xE6;tre" givenname="Rune">
        <org name="University of Tokyo" country="Japan" city="Tokyo"/>
      </author>
      <author surname="Sagae" givenname="Kenji">
        <org name="University of Tokyo" country="Japan" city="Tokyo"/>
      </author>
      <author surname="Matsuzaki" givenname="Takuya">
        <org name="University of Tokyo" country="Japan" city="Tokyo"/>
      </author>
      <author surname="Tsujii" givenname="Jun'ichi">
        <org name="University of Tokyo" country="Japan" city="Tokyo"/>
      </author>
    </firstpageheader>
    <frontmatter>
      <p>
        <b>Task-oriented Evaluation of Syntactic Parsers and Their Representations</b>
      </p>
      <p><b>Yusuke Miyao<footnote anchor="1"/>   Rune Saetre<footnote anchor="1"/>   Kenji Sagae<footnote anchor="1"/>   Takuya Matsuzaki<footnote anchor="1"/>   Jun'ichi Tsujii<footnote anchor="1"/> </b>Department of Computer Science, University of Tokyo, Japan; School of Computer Science, University of Manchester, UK; National Center for Text Mining, UK</p>
      <p>{yusuke,rune.saetre,sagae,matuzaki,tsujii}@is.s.u-tokyo.ac.jp</p>
    </frontmatter>
    <abstract>This paper presents a comparative evaluation of several state-of-the-art English parsers based on different frameworks. Our approach is to measure the impact of each parser when it is used as a component of an information extraction system that performs protein-protein interaction (PPI) identification in biomedical papers. We evaluate eight parsers (based on dependency parsing, phrase structure parsing, or deep parsing) using five different parse representations. We run a PPI system with several combinations of parser and parse representation, and examine their impact on PPI identification accuracy. Our experiments show that the levels of accuracy obtained with these different parsers are similar, but that accuracy improvements vary when the parsers are retrained with domain-specific data.</abstract>
  </header>
  <body>
    <section number="1" title="Introduction">
      <p>Parsing technologies have improved considerably in the past few years, and high-performance syntactic parsers are no longer limited to PCFG-based frameworks (Charniak, 2000; Klein and Manning, 2003; Charniak and Johnson, 2005; Petrov and Klein, 2007), but also include dependency parsers (McDonald and Pereira, 2006; Nivre and Nilsson, 2005; Sagae and Tsujii, 2007) and deep parsers (Kaplan et al., 2004; Clark and Curran, 2004; Miyao and Tsujii, 2008). However, efforts to perform extensive comparisons of syntactic parsers based on different frameworks have been limited. The most popular method for parser comparison involves the direct measurement of the parser output accuracy in terms of metrics such as bracketing precision and recall, or dependency accuracy. This assumes the existence of a gold-standard test corpus, such as the Penn Treebank (Marcus et al., 1994). It is difficult to apply this method to compare parsers based on different frameworks, because parse representations are often framework-specific and differ from parser to parser (Ringger et al., 2004). The lack of such comparisons is a serious obstacle for NLP researchers in choosing an appropriate parser for their purposes.</p>
      <p>In this paper, we present a comparative evaluation of syntactic parsers and their output representations based on different frameworks: dependency parsing, phrase structure parsing, and deep parsing. Our approach to parser evaluation is to measure accuracy improvement in the task of identifying protein-protein interaction (PPI) information in biomedical papers, by incorporating the output of different parsers as statistical features in a machine learning classifier (Yakushiji et al., 2005; Katrenko and Adriaans, 2006; Erkan et al., 2007; Saetre et al., 2007). PPI identification is a reasonable task for parser evaluation, because it is a typical information extraction (IE) application, and because recent studies have shown the effectiveness of syntactic parsing in this task. Since our evaluation method is applicable to any parser output, and is grounded in a real application, it allows for a fair comparison of syntactic parsers based on different frameworks.</p>
      <p>Parser evaluation in PPI extraction also illuminates domain portability. Most state-of-the-art parsers for English were trained with the Wall Street Journal (WSJ) portion of the Penn Treebank, and high accuracy has been reported for WSJ text; however, these parsers rely on lexical information to attain high accuracy, and they have been criticized for possibly overfitting to WSJ text (Gildea, 2001;<page local="2" global="47"/> Klein and Manning, 2003). Another issue for discussion is the portability of training methods. When training data in the target domain is available, as is the case with the GENIA Treebank (Kim et al., 2003) for biomedical papers, a parser can be retrained to adapt to the target domain, and larger accuracy improvements are expected if the training method is sufficiently general. We will examine these two aspects of domain portability by comparing the original parsers with the retrained parsers.</p>
    </section>
    <section number="2" title="Syntactic Parsers and Their Representations">
      <p>This paper focuses on eight representative parsers that are classified into three parsing frameworks: <i>dependency parsing, phrase structure parsing, </i>and <i>deep parsing. </i>In general, our evaluation methodology can be applied to English parsers based on any framework; however, in this paper, we chose parsers that were originally developed and trained with the Penn Treebank or its variants, since such parsers can be re-trained with GENIA, thus allowing us to investigate the effect of domain adaptation.</p>
      <subsection number="2.1" title="Dependency parsing">
        <p>Because the shared tasks of CoNLL-2006 and CoNLL-2007 focused on data-driven dependency parsing, it has recently been extensively studied in parsing research. The aim of dependency parsing is to compute a tree structure of a sentence where nodes are words, and edges represent the relations among words. Figure 1 shows a dependency tree for the sentence "IL-8 recognizes and activates CXCR1." An advantage of dependency parsing is that dependency trees are a reasonable approximation of the semantics of sentences, and are readily usable in NLP applications. Furthermore, the efficiency of popular approaches to dependency parsing compares favorably with that of phrase structure parsing or deep parsing. While a number of approaches have been proposed for dependency parsing, this paper focuses on two typical methods.</p>
        <p><b>mst </b>McDonald and Pereira (2006)'s dependency parser,<footnote anchor="1"/> based on the Eisner algorithm for projective dependency parsing (Eisner, 1996) with the second-order factorization.</p>
        <footnote label="1">http://sourceforge.net/projects/mstparser</footnote>
        <p><b>ksdep </b>Sagae and Tsujii (2007)'s dependency parser,<footnote anchor="2"/> based on a probabilistic shift-reduce algorithm extended by the pseudo-projective parsing technique (Nivre and Nilsson, 2005).</p>
        <figure caption="Figure 1: CoNLL-X dependency tree"/>
        <figure caption="Figure 2: Penn Treebank-style phrase structure tree"/>
      </subsection>
      <subsection number="2.2" title="Phrase structure parsing">
        <p>Owing largely to the Penn Treebank, the mainstream of data-driven parsing research has been dedicated to phrase structure parsing. These parsers output Penn Treebank-style phrase structure trees, although function tags and empty categories are stripped off (Figure 2). While most of the state-of-the-art parsers are based on probabilistic CFGs, the parameterization of the probabilistic model of each parser varies. In this work, we chose the following four parsers.</p>
        <p><b>no-rerank </b>Charniak (2000)'s parser, based on a lexicalized PCFG model of phrase structure trees.<footnote anchor="3"/> The probabilities of CFG rules are parameterized on carefully hand-tuned extensive information such as lexical heads and symbols of ancestor/sibling nodes.</p>
        <p><b>rerank </b>Charniak and Johnson (2005)'s reranking parser. The reranker of this parser receives n-best<footnote anchor="4"/> parse results from no-rerank, and selects the most likely result by using a maximum entropy model with manually engineered features.</p>
        <p><b>berkeley </b>Berkeley's parser (Petrov and Klein, 2007).<footnote anchor="5"/> The parameterization of this parser is optimized<page local="3" global="48"/> automatically by assigning latent variables to each nonterminal node and estimating the parameters of the latent variables by the EM algorithm (Matsuzaki et al., 2005).</p>
        <footnote label="2">http://www.cs.cmu.edu/~sagae/parser/</footnote>
        <footnote label="3">http://bllip.cs.brown.edu/resources.shtml</footnote>
        <footnote label="4">We set n = 50 in this paper.</footnote>
        <footnote label="5">http://nlp.cs.berkeley.edu/Main.html#Parsing</footnote>
        <figure caption="Figure 3: Predicate argument structure"/>
        <p>This study demonstrates that <b>IL-8 </b>recognizes and activates <b>CXCR1, CXCR2, </b>and the <b>Duffy antigen </b>by distinct mechanisms.</p>
        <p>The molar ratio of serum <b>retinol-binding protein </b><b>(RBP) </b>to <b>transthyretin (TTR) </b>is not useful to assess vitamin A status during infection in hospitalised children.</p>
        <p><b>STANFORD </b>Stanford's unlexicalized parser (Klein and Manning, 2003).<footnote anchor="6"/> Unlike NO-RERANK, probabilities are not parameterized on lexical heads.</p>
      </subsection>
      <subsection number="2.3" title="Deep parsing">
        <p>Recent research developments have allowed for efficient and robust deep parsing of real-world texts (Kaplan et al., 2004; Clark and Curran, 2004; Miyao and Tsujii, 2008). While deep parsers compute theory-specific syntactic/semantic structures, predicate argument structures (PAS) are often used in parser evaluation and applications. PAS is a graph structure that represents syntactic/semantic relations among words (Figure 3). The concept is therefore similar to CoNLL dependencies, though PAS expresses deeper relations, and may include reentrant structures. In this work, we chose two versions of the Enju parser (Miyao and Tsujii, 2008).</p>
        <p><b>enju </b>The HPSG parser that consists of an HPSG grammar extracted from the Penn Treebank, and a maximum entropy model trained with an HPSG treebank derived from the Penn Treebank.<footnote anchor="7"/></p>
        <p><b>enju-genia </b>The HPSG parser adapted to biomedical texts, by the method of Hara et al. (2007). Because this parser is trained with both WSJ and GENIA, we compare it with parsers that are retrained with GENIA (see section 3.3).</p>
      </subsection>
    </section>
    <section number="3" title="Evaluation Methodology">
      <p>In our approach to parser evaluation, we measure the accuracy of a PPI extraction system, in which the parser output is embedded as statistical features of a machine learning classifier. We run a classifier with features of every possible combination of a parser and a parse representation, by applying conversions between representations when necessary. We also measure the accuracy improvements obtained by parser retraining with GENIA, to examine the domain portability, and to evaluate the effectiveness of domain adaptation.</p>
      <footnote label="6">http://nlp.stanford.edu/software/lex-parser.shtml</footnote>
      <footnote label="7">http://www-tsujii.is.s.u-tokyo.ac.jp/enju/</footnote>
      <figure caption="Figure 4: Sentences including protein names"/>
      <p>ENTITY1(IL-8) -SBJ- recognizes -OBJ- ENTITY2(CXCR1)</p>
      <figure caption="Figure 5: Dependency path"/>
      <subsection number="3.1" title="PPI extraction">
        <p>PPI extraction is an NLP task to identify protein pairs that are mentioned as interacting in biomedical papers. Because the number of biomedical papers is growing rapidly, it is impossible for biomedical researchers to read all papers relevant to their research; thus, there is an emerging need for reliable IE technologies, such as PPI identification.</p>
        <p>Figure 4 shows two sentences that include protein names: the former sentence mentions a protein interaction, while the latter does not. Given a protein pair, PPI extraction is a binary classification task; for example, (IL-8, CXCR1) is a positive example, and (RBP, TTR) is a negative example. Recent studies on PPI extraction demonstrated that dependency relations between target proteins are effective features for machine learning classifiers (Katrenko and Adriaans, 2006; Erkan et al., 2007; Saetre et al., 2007). For the protein pair <b>IL-8 </b>and <b>CXCR1 </b>in Figure 4, a dependency parser outputs the dependency tree shown in Figure 1. From this dependency tree, we can extract the dependency path shown in Figure 5, which appears to be a strong clue that these proteins are mentioned as interacting.</p>
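        <p>The path extraction described above can be illustrated with a minimal sketch. It is not the system used in the paper: the function names are hypothetical, the tree is keyed by word form (which assumes every token is unique), and the heads and labels given to "and" and "activates" are assumptions made only to complete the example. Steps that descend from the common ancestor are marked with an "r" prefix.</p>

```python
def path_to_root(token, heads):
    """Return the tokens from `token` up to the root, inclusive."""
    path = [token]
    while heads[path[-1]] is not None:
        path.append(heads[path[-1]])
    return path


def dependency_path(src, dst, heads, labels):
    """Return (label, left, right) steps linking src to dst through their
    lowest common ancestor. Descending steps get an 'r' prefix."""
    up = path_to_root(src, heads)
    down = path_to_root(dst, heads)
    common = next(t for t in up if t in down)  # lowest common ancestor
    steps = []
    # Climb from the source entity to the common ancestor.
    for tok in up[:up.index(common)]:
        steps.append((labels[tok], tok, heads[tok]))
    # Descend from the common ancestor to the target entity.
    for tok in reversed(down[:down.index(common)]):
        steps.append(("r" + labels[tok], heads[tok], tok))
    return steps


# Toy tree for "IL-8 recognizes and activates CXCR1".
# The entries for "and" and "activates" are illustrative assumptions.
heads = {"IL-8": "recognizes", "recognizes": None, "and": "recognizes",
         "activates": "and", "CXCR1": "recognizes"}
labels = {"IL-8": "SBJ", "recognizes": "ROOT", "and": "COORD",
          "activates": "CONJ", "CXCR1": "OBJ"}

print(dependency_path("IL-8", "CXCR1", heads, labels))
```

        <p>A real implementation would index tokens by position rather than by word form, since protein names can repeat within a sentence.</p>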
        <page local="4" global="49"/>
        <p>(dep_path (SBJ (ENTITY1 recognizes)) (rOBJ (recognizes ENTITY2)))</p>
        <figure caption="Figure 6: Tree representation of a dependency path"/>
        <p>We follow the PPI extraction method of Saetre et al. (2007), which is based on SVMs with SubSet Tree Kernels (Collins and Duffy, 2002; Moschitti, 2006), while using different parsers and parse representations. Two types of features are incorporated in the classifier. The first is bag-of-words features, which are regarded as a strong baseline for IE systems. Lemmas of words before, between and after the pair of target proteins are included, and the linear kernel is used for these features. These features are commonly included in all of the models. Filtering by a stop-word list is not applied because this setting made the scores higher than Saetre et al. (2007)'s setting. The other type of feature is syntactic features. For dependency-based parse representations, a dependency path is encoded as a flat tree as depicted in Figure 6 (prefix "r" denotes reverse relations). Because a tree kernel measures the similarity of trees by counting common subtrees, it is expected that the system finds effective subsequences of dependency paths. For the PTB representation, we directly encode phrase structure trees.</p>
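        <p>As a rough illustration of the flat-tree encoding, the sketch below serializes a dependency path into the bracketed form of Figure 6. The function name and the (label, left, right) tuple layout are assumptions introduced for this example, not part of the original system.</p>

```python
def encode_path(steps):
    """Serialize (label, left, right) steps as a flat bracketed tree
    whose root node is `dep_path`, one child per dependency step."""
    parts = ["({} ({} {}))".format(label, left, right)
             for label, left, right in steps]
    return "(dep_path " + " ".join(parts) + ")"


# Path between the two target proteins, with entity names abstracted.
steps = [("SBJ", "ENTITY1", "recognizes"),
         ("rOBJ", "recognizes", "ENTITY2")]
print(encode_path(steps))
```

        <p>Keeping every step as a sibling under one root is what lets a tree kernel match any shared step between two paths, regardless of the path lengths.</p>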
      </subsection>
      <subsection number="3.2" title="Conversion of parse representations">
        <p>It is widely believed that the choice of representation format for parser output may greatly affect the performance of applications, although this has not been extensively investigated. We should therefore evaluate the parser performance in multiple parse representations. In this paper, we create multiple parse representations by converting each parser's default output into other representations when possible. This experiment can also be considered to be a comparative evaluation of parse representations, thus providing an indication for selecting an appropriate parse representation for similar IE tasks.</p>
        <p>Figure 7 shows our scheme for representation conversion. This paper focuses on five representations as described below.</p>
        <p><b>CoNLL </b>The dependency tree format used in the 2006 and 2007 CoNLL shared tasks on dependency parsing. This is a representation format supported by several data-driven dependency parsers. This representation is also obtained from Penn Treebank-style trees by applying constituent-to-dependency conversion (Johansson and Nugues, 2007). It should be noted, however, that this conversion cannot work perfectly with automatic parsing, because the conversion program relies on function tags and empty categories of the original Penn Treebank.</p>
        <p>
          <b>mst ksdep rerank no-rerank berkeley stanford enju enju-genia</b>
        </p>
        <figure caption="Figure 7: Conversion of parse representations"/>
        <figure caption="Figure 8: Head dependencies"/>
        <p><b>PTB </b>Penn Treebank-style phrase structure trees without function tags and empty nodes. This is the default output format for phrase structure parsers. We also create this representation by converting ENJU's output by tree structure matching, although this conversion is not perfect because forms of PTB and ENJU's output are not necessarily compatible.</p>
        <p><b>HD </b>Dependency trees of syntactic heads (Figure 8). This representation is obtained by converting PTB trees. We first determine lexical heads of nonterminal nodes by using Bikel's implementation of Collins' head detection algorithm<footnote anchor="9"/> (Bikel, 2004; Collins, 1997). We then convert lexicalized trees into dependencies between lexical heads.</p>
        <p><b>SD </b>The Stanford dependency format (Figure 9). This format was originally proposed for extracting dependency relations useful for practical applications (de Marneffe et al., 2006). A program to convert PTB is attached to the Stanford parser. Although the concept looks similar to CoNLL, this representation does not necessarily form a tree structure, and is designed to express more fine-grained relations such as apposition. Research groups for biomedical NLP recently adopted this representation for corpus annotation (Pyysalo et al., 2007a) and parser evaluation (Clegg and Shepherd, 2007; Pyysalo et al., 2007b).</p>
        <footnote label="8">http://nlp.cs.lth.se/pennconverter/</footnote>
        <footnote label="9">http://www.cis.upenn.edu/~dbikel/software.html</footnote>
        <page local="5" global="50"/>
        <figure caption="Figure 9: Stanford dependencies"/>
        <p><b>PAS </b>Predicate-argument structures. This is the default output format for ENJU and ENJU-GENIA.</p>
        <p>Although only CoNLL is available for dependency parsers, we can create four representations for the phrase structure parsers, and five for the deep parsers. Dotted arrows in Figure 7 indicate imperfect conversion, in which the conversion inherently introduces errors, and may decrease the accuracy. We should therefore exercise caution when comparing the results obtained by imperfect conversion. We also measure the accuracy obtained by the ensemble of two parsers/representations. This experiment indicates the differences and overlaps of information conveyed by a parser or a parse representation.</p>
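        <p>The two-step HD conversion (find the lexical head of each nonterminal, then link the heads of the non-head children to it) can be sketched as follows. This is only an illustration under strong assumptions: the head table is a toy stand-in and not Collins' head rules, the tree encoding as nested tuples is hypothetical, and words are assumed to be unique within the sentence.</p>

```python
# Toy head table: for each nonterminal, the child label that carries the head.
# These three rules are illustrative assumptions, not Collins' actual rules.
HEAD_CHILD = {"S": "VP", "VP": "VBZ", "NP": "NN"}


def head_dependencies(tree, deps):
    """Return the lexical head of `tree` and append (dependent, governor)
    pairs to `deps`. A tree is (label, child, ...); a preterminal is
    (pos_tag, word)."""
    label, children = tree[0], tree[1:]
    if isinstance(children[0], str):  # preterminal: the word is its own head
        return children[0]
    heads = [head_dependencies(child, deps) for child in children]
    wanted = HEAD_CHILD.get(label)
    # Fall back to the first child when no rule matches.
    head_index = next(
        (i for i, child in enumerate(children) if child[0] == wanted), 0)
    for i, head in enumerate(heads):
        if i != head_index:
            deps.append((head, heads[head_index]))
    return heads[head_index]


tree = ("S",
        ("NP", ("NN", "IL-8")),
        ("VP", ("VBZ", "recognizes"), ("NP", ("NN", "CXCR1"))))
deps = []
root = head_dependencies(tree, deps)
print(root, deps)
```

        <p>Because the dependencies carry no relation labels, the result is an unlabeled head-dependency tree, which is the main difference from the CoNLL and SD formats.</p>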
      </subsection>
      <subsection number="3.3" title="Domain portability and parser retraining">
        <p>Since the domain of our target text is different from WSJ, our experiments also highlight the domain portability of parsers. We run two versions of each parser in order to investigate the two types of domain portability. First, we run the original parsers trained with WSJ<footnote anchor="10"/> (39832 sentences). The results in this setting indicate the domain portability of the original parsers. Next, we run parsers re-trained with GENIA<footnote anchor="11"/> (8127 sentences), which is a Penn Treebank-style treebank of biomedical paper abstracts. Accuracy improvements in this setting indicate the possibility of domain adaptation, and the portability of the training methods of the parsers. Since the parsers listed in Section 2 include programs for training with a Penn Treebank-style treebank, we use those programs as-is. Default parameter settings are used for this parser re-training.</p>
        <footnote label="10">Some of the parser packages include parsing models trained with extended data, but we used the models trained with WSJ section 2-21 of the Penn Treebank.</footnote>
        <footnote label="11">The domains of GENIA and AImed are not exactly the same, because they are collected independently.</footnote>
        <p>In preliminary experiments, we found that dependency parsers attain higher dependency accuracy when trained only with GENIA. We therefore only input GENIA as the training data for the retraining of dependency parsers. For the other parsers, we input the concatenation of WSJ and GENIA for the retraining, while the reranker of RERANK was not retrained due to its cost. Since the parsers other than NO-RERANK and RERANK require an external POS tagger, a WSJ-trained POS tagger is used with WSJ-trained parsers, and geniatagger (Tsuruoka et al., 2005) is used with GENIA-retrained parsers.</p>
      </subsection>
    </section>
    <section number="4" title="Experiments">
      <subsection number="4.1" title="Experiment settings">
        <p>In the following experiments, we used AImed (Bunescu and Mooney, 2004), which is a popular corpus for the evaluation of PPI extraction systems. The corpus consists of 225 biomedical paper abstracts (1970 sentences), which are sentence-split, tokenized, and annotated with proteins and PPIs. We use gold protein annotations given in the corpus. Multi-word protein names are concatenated and treated as single words. The accuracy is measured by abstract-wise 10-fold cross validation and the one-answer-per-occurrence criterion (Giuliano et al., 2006). A threshold for SVMs is moved to adjust the balance of precision and recall, and the maximum f-scores are reported for each setting.</p>
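        <p>The threshold adjustment used in this evaluation can be sketched as a sweep over classifier scores that keeps the cut-off with the highest f-score. The function names and the toy scores below are assumptions made for illustration; they are not taken from the paper or from any SVM package.</p>

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels):
    """Try each observed score as a decision threshold and return the
    (f_score, threshold) pair with the highest f-score."""
    best = (0.0, None)
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y)
        fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
        fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = f_score(precision, recall)
        if f > best[0]:
            best = (f, threshold)
    return best


# Toy decision values and gold labels for four candidate protein pairs.
scores = [0.9, 0.4, -0.2, -0.7]
labels = [True, False, True, False]
print(best_threshold(scores, labels))
```

        <p>In a cross-validation setting the sweep would be applied to the pooled or per-fold decision values; which of the two the paper used is not stated here, so the sketch leaves that choice open.</p>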
      </subsection>
      <subsection number="4.2" title="Comparison of accuracy improvements">
        <p>Tables 1 and 2 show the accuracy obtained by using the output of each parser in each parse representation. The row "baseline" indicates the accuracy obtained with bag-of-words features. Table 3 shows the time for parsing the entire AImed corpus, and Table 4 shows the time required for 10-fold cross validation with GENIA-retrained parsers.</p>
        <p>When using the original WSJ-trained parsers (Table 1), all parsers achieved almost the same level of accuracy, a significantly better result than the baseline. To the best of our knowledge, this is the first result showing that dependency parsing, phrase structure parsing, and deep parsing perform equally well in a real application.<page local="6" global="51"/> Among these parsers, RERANK performed slightly better than the other parsers, although the difference in the f-score is small and it requires a much higher parsing cost.</p>
        <p>When the parsers are retrained with GENIA (Table 2), the accuracy increases significantly, demonstrating that the WSJ-trained parsers are not sufficiently domain-independent, and that domain adaptation is effective. It is an important observation that the improvements by domain adaptation are larger than the differences among the parsers in the previous experiment. Nevertheless, not all parsers had their performance improved upon retraining. Parser retraining yielded only slight improvements for RERANK, BERKELEY, and STANFORD, while larger improvements were observed for MST, KSDEP, NO-RERANK, and ENJU. Such results indicate the differences in the portability of training methods. A large improvement from ENJU to ENJU-GENIA shows the effectiveness of the specifically designed domain adaptation method, suggesting that the other parsers might also benefit from more sophisticated approaches for domain adaptation.</p>
        <p>While the accuracy level of PPI extraction is similar for the different parsers, parsing speed differs significantly.<page local="7" global="52"/> The dependency parsers are much faster than the other parsers, while the phrase structure parsers are relatively slower, and the deep parsers are in between. It is noteworthy that the dependency parsers achieved accuracy comparable with that of the other parsers while being more efficient.</p>
        <table caption="Table 1: Accuracy on the PPI task with WSJ-trained parsers (precision/recall/f-score)" class="main" frame="box" rules="all" border="1" regular="False">
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>CoNLL</p>
            </td>
            <td class="cell">
              <p>PTB</p>
            </td>
            <td class="cell">
              <p>HD</p>
            </td>
            <td class="cell">
              <p>SD</p>
            </td>
            <td class="cell">
              <p>PAS</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>baseline</p>
            </td>
            <td class="cell">
              <p>48.2/54.9/51.1</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>MST</p>
            </td>
            <td class="cell">
              <p>53.2/56.5/54.6</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>KSDEP</p>
            </td>
            <td class="cell">
              <p>49.3/63.0/55.2</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>NO-RERANK</p>
            </td>
            <td class="cell">
              <p>50.7/60.9/55.2</p>
            </td>
            <td class="cell">
              <p>45.9/60.5/52.0</p>
            </td>
            <td class="cell">
              <p>50.6/60.9/55.1</p>
            </td>
            <td class="cell">
              <p>49.9/58.2/53.5</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>RERANK</p>
            </td>
            <td class="cell">
              <p>53.6/59.2/56.1</p>
            </td>
            <td class="cell">
              <p>47.0/58.9/52.1</p>
            </td>
            <td class="cell">
              <p>48.1/65.8/55.4</p>
            </td>
            <td class="cell">
              <p>50.7/62.7/55.9</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>BERKELEY</p>
            </td>
            <td class="cell">
              <p>45.8/67.6/54.5</p>
            </td>
            <td class="cell">
              <p>50.5/57.6/53.7</p>
            </td>
            <td class="cell">
              <p>52.3/58.8/55.1</p>
            </td>
            <td class="cell">
              <p>48.7/62.4/54.5</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>STANFORD</p>
            </td>
            <td class="cell">
              <p>50.4/60.6/54.9</p>
            </td>
            <td class="cell">
              <p>50.9/56.1/53.0</p>
            </td>
            <td class="cell">
              <p>50.7/60.7/55.1</p>
            </td>
            <td class="cell">
              <p>51.8/58.1/54.5</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>ENJU</p>
            </td>
            <td class="cell">
              <p>52.6/58.0/55.0</p>
            </td>
            <td class="cell">
              <p>48.7/58.8/53.1</p>
            </td>
            <td class="cell">
              <p>57.2/51.9/54.2</p>
            </td>
            <td class="cell">
              <p>52.2/58.1/54.8</p>
            </td>
            <td class="cell">
              <p>48.9/64.1/55.3</p>
            </td>
            <td class="cell"/>
          </tr>
        </table>
        <table caption="Table 2: Accuracy on the PPI task with GENIA-retrained parsers (precision/recall/f-score)" class="main" frame="box" rules="all" border="1" regular="False">
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>CoNLL</p>
            </td>
            <td class="cell">
              <p>PTB</p>
            </td>
            <td class="cell">
              <p>HD</p>
            </td>
            <td class="cell">
              <p>SD</p>
            </td>
            <td class="cell">
              <p>PAS</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>baseline</p>
            </td>
            <td class="cell">
              <p>48.2/54.9/51.1</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>MST</p>
            </td>
            <td class="cell">
              <p>49.1/65.6/55.9</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>KSDEP</p>
            </td>
            <td class="cell">
              <p>51.6/67.5/58.3</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>NO-RERANK</p>
            </td>
            <td class="cell">
              <p>53.9/60.3/56.8</p>
            </td>
            <td class="cell">
              <p>51.3/54.9/52.8</p>
            </td>
            <td class="cell">
              <p>53.1/60.2/56.3</p>
            </td>
            <td class="cell">
              <p>54.6/58.1/56.2</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>RERANK</p>
            </td>
            <td class="cell">
              <p>52.8/61.5/56.6</p>
            </td>
            <td class="cell">
              <p>48.3/58.0/52.6</p>
            </td>
            <td class="cell">
              <p>52.1/60.3/55.7</p>
            </td>
            <td class="cell">
              <p>53.0/61.1/56.7</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>BERKELEY</p>
            </td>
            <td class="cell">
              <p>52.7/60.3/56.0</p>
            </td>
            <td class="cell">
              <p>48.0/59.9/53.1</p>
            </td>
            <td class="cell">
              <p>54.9/54.6/54.6</p>
            </td>
            <td class="cell">
              <p>50.5/63.2/55.9</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>STANFORD</p>
            </td>
            <td class="cell">
              <p>49.3/62.8/55.1</p>
            </td>
            <td class="cell">
              <p>44.5/64.7/52.5</p>
            </td>
            <td class="cell">
              <p>49.0/62.0/54.5</p>
            </td>
            <td class="cell">
              <p>54.6/57.5/55.8</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>ENJU</p>
            </td>
            <td class="cell">
              <p>54.4/59.7/56.7</p>
            </td>
            <td class="cell">
              <p>48.3/60.6/53.6</p>
            </td>
            <td class="cell">
              <p>56.7/55.6/56.0</p>
            </td>
            <td class="cell">
              <p>54.4/59.3/56.6</p>
            </td>
            <td class="cell">
              <p>52.0/63.8/57.2</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>ENJU-GENIA</p>
            </td>
            <td class="cell">
              <p>56.4/57.4/56.7</p>
            </td>
            <td class="cell">
              <p>46.5/63.9/53.7</p>
            </td>
            <td class="cell">
              <p>53.4/60.2/56.4</p>
            </td>
            <td class="cell">
              <p>55.2/58.3/56.5</p>
            </td>
            <td class="cell">
              <p>57.5/59.8/58.4</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
          </tr>
        </table>
        <table caption="Table 3: Parsing time (sec.)" class="main" frame="box" rules="all" border="1" regular="False">
          <tr class="row">
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>WSJ-trained</p>
            </td>
            <td class="cell">
              <p>GENIA-retrained</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>MST</p>
            </td>
            <td class="cell">
              <p>613</p>
            </td>
            <td class="cell">
              <p>425</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>KSDEP</p>
            </td>
            <td class="cell">
              <p>136</p>
            </td>
            <td class="cell">
              <p>111</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>NO-RERANK</p>
            </td>
            <td class="cell">
              <p>2049</p>
            </td>
            <td class="cell">
              <p>1372</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>RERANK</p>
            </td>
            <td class="cell">
              <p>2806</p>
            </td>
            <td class="cell">
              <p>2125</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>BERKELEY</p>
            </td>
            <td class="cell">
              <p>1118</p>
            </td>
            <td class="cell">
              <p>1198</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>STANFORD</p>
            </td>
            <td class="cell">
              <p>1411</p>
            </td>
            <td class="cell">
              <p>1645</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>ENJU</p>
            </td>
            <td class="cell">
              <p>1447</p>
            </td>
            <td class="cell">
              <p>727</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>ENJU-GENIA</p>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>821</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
          </tr>
        </table>
        <table caption="Table 4: Evaluation time (sec.)" class="main" frame="box" rules="all" border="1" regular="False">
          <tr class="row">
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>CoNLL</p>
            </td>
            <td class="cell">
              <p>PTB</p>
            </td>
            <td class="cell">
              <p>HD</p>
            </td>
            <td class="cell">
              <p>SD</p>
            </td>
            <td class="cell">
              <p>PAS</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>baseline</p>
            </td>
            <td class="cell">
              <p>424</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>MST</p>
            </td>
            <td class="cell">
              <p>809</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>KSDEP</p>
            </td>
            <td class="cell">
              <p>864</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>NO-RERANK</p>
            </td>
            <td class="cell">
              <p>851</p>
            </td>
            <td class="cell">
              <p>4772</p>
            </td>
            <td class="cell">
              <p>882</p>
            </td>
            <td class="cell">
              <p>795</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>RERANK</p>
            </td>
            <td class="cell">
              <p>849</p>
            </td>
            <td class="cell">
              <p>4676</p>
            </td>
            <td class="cell">
              <p>881</p>
            </td>
            <td class="cell">
              <p>778</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>BERKELEY</p>
            </td>
            <td class="cell">
              <p>869</p>
            </td>
            <td class="cell">
              <p>4665</p>
            </td>
            <td class="cell">
              <p>895</p>
            </td>
            <td class="cell">
              <p>804</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>STANFORD</p>
            </td>
            <td class="cell">
              <p>847</p>
            </td>
            <td class="cell">
              <p>4614</p>
            </td>
            <td class="cell">
              <p>886</p>
            </td>
            <td class="cell">
              <p>799</p>
            </td>
            <td class="cell">
              <p>N/A</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>ENJU</p>
            </td>
            <td class="cell">
              <p>832</p>
            </td>
            <td class="cell">
              <p>4611</p>
            </td>
            <td class="cell">
              <p>884</p>
            </td>
            <td class="cell">
              <p>789</p>
            </td>
            <td class="cell">
              <p>1005</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>ENJU-GENIA</p>
            </td>
            <td class="cell">
              <p>874</p>
            </td>
            <td class="cell">
              <p>4624</p>
            </td>
            <td class="cell">
              <p>895</p>
            </td>
            <td class="cell">
              <p>783</p>
            </td>
            <td class="cell">
              <p>1020</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
          </tr>
        </table>
        <p>The experimental results also demonstrate that PTB is significantly worse than the other representations with respect to cost for training/testing and contributions to accuracy improvements. The conversion from PTB to dependency-based representations is therefore desirable for this task, although it is possible that better results might be obtained with PTB if a different feature extraction mechanism is used. Dependency-based representations are competitive, while CoNLL seems superior to HD and SD in spite of the imperfect conversion from PTB to CoNLL. This might be a reason for the high performance of the dependency parsers that directly compute CoNLL dependencies. The results for ENJU-CoNLL and ENJU-PAS show that PAS contributes to a larger accuracy improvement, although this does not necessarily mean the superiority of PAS, because two imperfect conversions, i.e., PAS-to-PTB and PTB-to-CoNLL, are applied for creating CoNLL.</p>
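        <p>As an illustration only, the following Python sketch summarizes the per-representation trend discussed above. It copies the f-scores of Table 2 and the evaluation times of Table 4 for the six parsers that have results in all of CoNLL, PTB, HD, and SD, and takes an unweighted mean per representation. The averaging and the names F_SCORE, EVAL_TIME, and summarize are assumptions introduced for this sketch and are not part of the reported experiments.</p>

```python
from statistics import mean

# F-scores per representation, copied from Table 2 for the six parsers that
# have results in all four formats, in the order NO-RERANK, RERANK, BERKELEY,
# STANFORD, ENJU, ENJU-GENIA.
F_SCORE = {
    "CoNLL": [56.8, 56.6, 56.0, 55.1, 56.7, 56.7],
    "PTB": [52.8, 52.6, 53.1, 52.5, 53.6, 53.7],
    "HD": [56.3, 55.7, 54.6, 54.5, 56.0, 56.4],
    "SD": [56.2, 56.7, 55.9, 55.8, 56.6, 56.5],
}

# Evaluation time in seconds, copied from Table 4 for the same parsers.
EVAL_TIME = {
    "CoNLL": [851, 849, 869, 847, 832, 874],
    "PTB": [4772, 4676, 4665, 4614, 4611, 4624],
    "HD": [882, 881, 895, 886, 884, 895],
    "SD": [795, 778, 804, 799, 789, 783],
}


def summarize(table):
    """Return the unweighted mean of each representation's column."""
    return {rep: mean(values) for rep, values in table.items()}


mean_f = summarize(F_SCORE)
mean_time = summarize(EVAL_TIME)

for rep in F_SCORE:
    print(f"{rep:6s} mean f-score {mean_f[rep]:.2f}  mean time {mean_time[rep]:.0f} s")
```

        <p>Under these assumptions, PTB has both the lowest mean f-score and by far the highest mean evaluation time, while the mean for CoNLL is marginally above those for SD and HD.</p>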
      </subsection>
      <subsection number="4.3" title="Parser ensemble results">
        <p>Table 5 shows the accuracy obtained with ensembles of two parsers/representations (except the PTB format). Bracketed figures denote improvements from the accuracy with a single parser/representation. The results show that the task accuracy improves significantly with parser/representation ensembles. Interestingly, accuracy improvements are observed even for ensembles of different representations from the same parser. This indicates that a single parse representation is insufficient for expressing the true potential of a parser. Effectiveness of the parser ensemble is also attested by the fact that it resulted in larger improvements. Further investigation of the sources of these improvements will illustrate the advantages and disadvantages of these parsers and representations, leading us to better parsing models and a better design for parse representations.</p>
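        <p>The bracketed figures can be reproduced with a small calculation. The sketch below assumes that each improvement is measured against the better of the two single-parser f-scores in Table 2; this reading is inferred from the numbers rather than stated in the text, and the names SINGLE and improvement are hypothetical.</p>

```python
# Single-parser f-scores from Table 2 (GENIA-retrained parsers).
SINGLE = {
    ("KSDEP", "CoNLL"): 58.3,
    ("RERANK", "CoNLL"): 56.6,
    ("RERANK", "HD"): 55.7,
    ("RERANK", "SD"): 56.7,
    ("ENJU", "CoNLL"): 56.7,
    ("ENJU", "HD"): 56.0,
    ("ENJU", "SD"): 56.6,
    ("ENJU", "PAS"): 57.2,
}


def improvement(ensemble_f, first, second):
    """Gain of an ensemble over the better of its two members (assumed reading)."""
    best_single = max(SINGLE[first], SINGLE[second])
    return round(ensemble_f - best_single, 1)


# Three ensemble f-scores taken from Table 5.
print(improvement(59.5, ("RERANK", "CoNLL"), ("ENJU", "PAS")))
print(improvement(57.1, ("KSDEP", "CoNLL"), ("RERANK", "HD")))
print(improvement(56.7, ("RERANK", "CoNLL"), ("RERANK", "HD")))
```

        <p>For these three cells the computed gains agree with the bracketed values in Table 5 (+2.3, -1.2, and +0.1).</p>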
        <table caption="Table 6: Comparison with previous results on PPI extraction (precision/recall/f-score)" class="main" frame="box" rules="all" border="1" regular="False">
          <tr class="row">
            <td class="cell">
              <p>Bag-of-words features</p>
            </td>
            <td class="cell">
              <p>48.2/54.9/51.1</p>
            </td>
          </tr>
          <tr class="row">
            <td class="cell">
              <p>Yakushiji et al. (2005)</p>
            </td>
            <td class="cell">
              <p>33.7/33.1/33.4</p>
            </td>
          </tr>
          <tr class="row">
            <td class="cell">
              <p>Mitsumori et al. (2006)</p>
            </td>
            <td class="cell">
              <p>54.2/42.6/47.7</p>
            </td>
          </tr>
          <tr class="row">
            <td class="cell">
              <p>Giuliano et al. (2006)</p>
            </td>
            <td class="cell">
              <p>60.9/57.2/59.0</p>
            </td>
          </tr>
          <tr class="row">
            <td class="cell">
              <p>Saetre et al. (2007)</p>
            </td>
            <td class="cell">
              <p>64.3/44.1/52.0</p>
            </td>
          </tr>
          <tr class="row">
            <td class="cell">
              <p>This paper</p>
            </td>
            <td class="cell">
              <p>54.9/65.5/59.5</p>
            </td>
          </tr>
        </table>
      </subsection>
      <subsection number="4.4" title="Comparison with previous results on PPI extraction">
        <p>PPI extraction experiments on AImed have been reported repeatedly, although the figures cannot be compared directly because of the differences in data preprocessing and the number of target protein pairs (Saetre et al., 2007). Table 6 compares our best result with previously reported accuracy figures. Giuliano et al. (2006) and Mitsumori et al. (2006) do not rely on syntactic parsing; the former applied SVMs with kernels on surface strings, and the latter is similar to our baseline method. Bunescu and Mooney (2005) applied SVMs with subsequence kernels to the same task, although they provided only a precision-recall graph, from which the f-score appears to be around 50. Since we did not run experiments on protein-pair-wise cross validation, our system cannot be compared directly to the results reported by Erkan et al. (2007) and Katrenko and Adriaans (2006), while Saetre et al. (2007) presented better results than theirs under the same evaluation criterion.<page local="8" global="53"/></p>
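        <p>For readers comparing the precision/recall/f-score triples in Table 6, the following sketch recomputes the f-score as the harmonic mean of precision and recall. The function name f_score and the dictionary ROWS are introduced here for illustration. Some reported f-scores differ slightly from the recomputed ones; one possible explanation, which is an assumption and not stated in this section, is that the reported figures are averages over cross-validation folds.</p>

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f-score) rows copied from Table 6.
ROWS = {
    "Yakushiji et al. (2005)": (33.7, 33.1, 33.4),
    "Mitsumori et al. (2006)": (54.2, 42.6, 47.7),
    "Giuliano et al. (2006)": (60.9, 57.2, 59.0),
    "Saetre et al. (2007)": (64.3, 44.1, 52.0),
    "This paper": (54.9, 65.5, 59.5),
}

for system, (p, r, reported) in ROWS.items():
    recomputed = f_score(p, r)
    print(f"{system:24s} reported {reported:.1f}  recomputed {recomputed:.1f}")
```

        <p>The first three rows are reproduced to one decimal place, whereas the last two differ from the reported values by a few tenths of a point.</p>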
        <table caption="Table 5: Results of parser/representation ensemble (f-score)" class="main" frame="box" rules="all" border="1" regular="False">
          <tr class="row">
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>RERANK</p>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>ENJU</p>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>CoNLL</p>
            </td>
            <td class="cell">
              <p>HD</p>
            </td>
            <td class="cell">
              <p>SD</p>
            </td>
            <td class="cell">
              <p>CoNLL</p>
            </td>
            <td class="cell">
              <p>HD</p>
            </td>
            <td class="cell">
              <p>SD</p>
            </td>
            <td class="cell">
              <p>PAS</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>KSDEP</p>
            </td>
            <td class="cell">
              <p>CoNLL</p>
            </td>
            <td class="cell">
              <p>58.5 (+0.2)</p>
            </td>
            <td class="cell">
              <p>57.1 (-1.2)</p>
            </td>
            <td class="cell">
              <p>58.4 (+0.1)</p>
            </td>
            <td class="cell">
              <p>58.5 (+0.2)</p>
            </td>
            <td class="cell">
              <p>58.0 (-0.3)</p>
            </td>
            <td class="cell">
              <p>59.1 (+0.8)</p>
            </td>
            <td class="cell">
              <p>59.0 (+0.7)</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>RERANK</p>
            </td>
            <td class="cell">
              <p>CoNLL</p>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>56.7 (+0.1)</p>
            </td>
            <td class="cell">
              <p>57.1 (+0.4)</p>
            </td>
            <td class="cell">
              <p>58.3 (+1.6)</p>
            </td>
            <td class="cell">
              <p>57.3 (+0.7)</p>
            </td>
            <td class="cell">
              <p>58.7 (+2.1)</p>
            </td>
            <td class="cell">
              <p>59.5 (+2.3)</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>HD</p>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>56.8 (+0.1)</p>
            </td>
            <td class="cell">
              <p>57.2 (+0.5)</p>
            </td>
            <td class="cell">
              <p>56.5 (+0.5)</p>
            </td>
            <td class="cell">
              <p>56.8 (+0.2)</p>
            </td>
            <td class="cell">
              <p>57.6 (+0.4)</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>SD</p>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>58.3 (+1.6)</p>
            </td>
            <td class="cell">
              <p>58.3 (+1.6)</p>
            </td>
            <td class="cell">
              <p>56.9 (+0.2)</p>
            </td>
            <td class="cell">
              <p>58.6 (+1.4)</p>
            </td>
            <td class="cell"/>
          </tr>
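          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p>ENJU</p>
            </td>
            <td class="cell">
              <p>CoNLL</p>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>57.0 (+0.3)</p>
            </td>
            <td class="cell">
              <p>57.2 (+0.5)</p>
            </td>
            <td class="cell">
              <p>58.4 (+1.2)</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>HD</p>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>57.1 (+0.5)</p>
            </td>
            <td class="cell">
              <p>58.1 (+0.9)</p>
            </td>
            <td class="cell"/>
          </tr>
          <tr class="row">
            <td class="cell"/>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>SD</p>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p/>
            </td>
            <td class="cell">
              <p>58.3 (+1.1)</p>
            </td>
            <td class="cell"/>
          </tr>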
          <tr class="row">
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
            <td class="cell"/>
          </tr>
        </table>
      </subsection>
    </section>
    <section number="5" title="Related Work">
      <p>Though the evaluation of syntactic parsers has been a major concern in the parsing community, and a couple of works have recently presented the comparison of parsers based on different frameworks, their methods were based on the comparison of the parsing accuracy in terms of a certain intermediate parse representation (Ringger et al., 2004; Kaplan et al., 2004; Briscoe and Carroll, 2006; Clark and Curran, 2007; Miyao et al., 2007; Clegg and Shepherd, 2007; Pyysalo et al., 2007b; Pyysalo et al., 2007a; Sagae et al., 2008). Such evaluation requires gold standard data in an intermediate representation. However, it has been argued that the conversion of parsing results into an intermediate representation is difficult and far from perfect.</p>
      <p>The relationship between parsing accuracy and task accuracy has been obscure for many years. Quirk and Corston-Oliver (2006) investigated the impact of parsing accuracy on statistical MT. However, this work was only concerned with a single dependency parser, and did not focus on parsers based on different frameworks.</p>
    </section>
    <section number="6" title="Conclusion and Future Work">
      <p>We have presented our attempts to evaluate syntactic parsers and their representations that are based on different frameworks: dependency parsing, phrase structure parsing, or deep parsing. The basic idea is to measure the accuracy improvements of the PPI extraction task by incorporating the parser output as statistical features of a machine learning classifier. Experiments showed that state-of-the-art parsers attain accuracy levels that are on par with each other, while parsing speed differs significantly. We also found that accuracy improvements vary when parsers are retrained with domain-specific data, indicating the importance of domain adaptation and the differences in the portability of parser training methods.</p>
      <p>Although we restricted ourselves to parsers trainable with Penn Treebank-style treebanks, our methodology can be applied to any English parser. Candidates include RASP (Briscoe and Carroll, 2006), the C&amp;C parser (Clark and Curran, 2004), the XLE parser (Kaplan et al., 2004), MINIPAR (Lin, 1998), and Link Parser (Sleator and Temperley, 1993; Pyysalo et al., 2006), but the domain adaptation of these parsers is not straightforward. It is also possible to evaluate unsupervised parsers, which is attractive since evaluation of such parsers with gold-standard data is extremely problematic.</p>
      <p>A major drawback of our methodology is that the evaluation is indirect and the results depend on a selected task and its settings. This indicates that different results might be obtained with other tasks. Hence, we cannot draw conclusions about the superiority of parsers/representations from our results alone. In order to obtain general ideas on parser performance, experiments on other tasks are indispensable.</p>
    </section>
    <section title="Acknowledgments">
      <p>This work was partially supported by Grant-in-Aid for Specially Promoted Research (MEXT, Japan), Genome Network Project (MEXT, Japan), and Grant-in-Aid for Young Scientists (MEXT, Japan).</p>
    </section>
    <references>
      <p>D. M. Bikel. 2004. Intricacies of Collins' parsing model. <i>Computational Linguistics, </i>30(4):479-511.</p>
      <p>T. Briscoe and J. Carroll. 2006. Evaluating the accuracy of an unlexicalized statistical parser on the PARC DepBank. In <i>COLING/ACL 2006 Poster Session.</i></p>
      <p>R. Bunescu and R. J. Mooney. 2004. Collective information extraction with relational markov networks. In <i>ACL 2004, </i>pages 439-446.</p>
      <p>R. C. Bunescu and R. J. Mooney. 2005. Subsequence kernels for relation extraction. In <i>NIPS 2005.</i></p>
      <p>E. Charniak and M. Johnson. 2005. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In <i>ACL 2005.</i></p>
      <p>E. Charniak. 2000. A maximum-entropy-inspired parser. In <i>NAACL-2000, </i>pages 132-139.</p>
      <p>S. Clark and J. R. Curran. 2004. Parsing the WSJ using CCG and log-linear models. In <i>42nd ACL.</i></p>
      <p>S. Clark and J. R. Curran. 2007. Formalism-independent parser evaluation with CCG and DepBank. In <i>ACL 2007.</i></p>
      <p>A. B. Clegg and A. J. Shepherd. 2007. Benchmarking natural-language parsers for biological applications using dependency graphs. <i>BMC Bioinformatics,</i> 8:24.</p>
      <page local="9" global="54"/>
      <p>M. Collins and N. Duffy. 2002. New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. In <i>ACL 2002.</i></p>
      <p>M. Collins. 1997. Three generative, lexicalised models for statistical parsing. In <i>35th ACL.</i></p>
      <p>M.-C. de Marneffe, B. MacCartney, and C. D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In <i>LREC 2006.</i></p>
      <p>J. M. Eisner. 1996. Three new probabilistic models for dependency parsing: An exploration. In <i>COLING 1996.</i></p>
      <p>G. Erkan, A. Ozgur, and D. R. Radev. 2007. Semi-supervised classification for extracting protein interaction sentences using dependency parsing. In <i>EMNLP 2007.</i></p>
      <p>D. Gildea. 2001. Corpus variation and parser performance. In <i>EMNLP 2001, </i>pages 167-202.</p>
      <p>C. Giuliano, A. Lavelli, and L. Romano. 2006. Exploiting shallow linguistic information for relation extraction from biomedical literature. In <i>EACL 2006.</i></p>
      <p>T. Hara, Y. Miyao, and J. Tsujii. 2007. Evaluating impact of re-training a lexical disambiguation model on domain adaptation of an HPSG parser. In <i>IWPT 2007.</i></p>
      <p>R. Johansson and P. Nugues. 2007. Extended constituent-to-dependency conversion for English. In <i>NODALIDA 2007.</i></p>
      <p>R. M. Kaplan, S. Riezler, T. H. King, J. T. Maxwell, and A. Vasserman. 2004. Speed and accuracy in shallow and deep stochastic parsing. In <i>HLT/NAACL '04.</i></p>
      <p>S. Katrenko and P. Adriaans. 2006. Learning relations from biomedical corpora using dependency trees. In <i>KDECB, </i>pages 61-80.</p>
      <p>J.-D. Kim, T. Ohta, Y. Teteisi, and J. Tsujii. 2003. GE-NIA corpus — a semantically annotated corpus for bio-textmining. <i>Bioinformatics, </i>19:i180-182.</p>
      <p>D. Klein and C. D. Manning. 2003. Accurate unlexicalized parsing. In <i>ACL 2003.</i></p>
      <p>D. Lin. 1998. Dependency-based evaluation of MINIPAR. In <i>LREC Workshop on the Evaluation of Parsing Systems.</i></p>
      <p>M. Marcus, B. Santorini, and M. A. Marcinkiewicz. 1994. Building a large annotated corpus of English: The Penn Treebank. <i>Computational Linguistics, </i>19(2):313-330.</p>
      <p>T. Matsuzaki, Y. Miyao, and J. Tsujii. 2005. Probabilistic CFG with latent annotations. In <i>ACL 2005.</i></p>
      <p>R. McDonald and F. Pereira. 2006. Online learning of approximate dependency parsing algorithms. In <i>EACL 2006.</i></p>
      <p>T. Mitsumori, M. Murata, Y. Fukuda, K. Doi, and H. Doi. 2006. Extracting protein-protein interaction information from biomedical text with SVM. <i>IEICE - Trans. Inf. Syst., </i>E89-D(8):2464-2466.</p>
      <p>Y. Miyao and J. Tsujii. 2008. Feature forest models for probabilistic HPSG parsing. <i>Computational Linguistics, </i>34(1):35-80.</p>
      <p>Y. Miyao, K. Sagae, and J. Tsujii. 2007. Towards framework-independent evaluation of deep linguistic parsers. In <i>Grammar Engineering across Frameworks 2007, </i>pages 238-258.</p>
      <p>A. Moschitti. 2006. Making tree kernels practical for natural language processing. In <i>EACL 2006.</i></p>
      <p>J. Nivre and J. Nilsson. 2005. Pseudo-projective dependency parsing. In <i>ACL 2005.</i></p>
      <p>S. Petrov and D. Klein. 2007. Improved inference for unlexicalized parsing. In <i>HLT-NAACL 2007.</i></p>
      <p>S. Pyysalo, T. Salakoski, S. Aubin, and A. Nazarenko. 2006. Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches. <i>BMC Bioinformatics, </i>7(Suppl. 3).</p>
      <p>S. Pyysalo, F. Ginter, J. Heimonen, J. Björne, J. Boberg, J. Järvinen, and T. Salakoski. 2007a. BioInfer: a corpus for information extraction in the biomedical domain. <i>BMC Bioinformatics, </i>8(50).</p>
      <p>S. Pyysalo, F. Ginter, V. Laippala, K. Haverinen, J. Heimonen, and T. Salakoski. 2007b. On the unification of syntactic annotations under the Stanford dependency scheme: A case study on BioInfer and GENIA. In <i>BioNLP 2007, </i>pages 25-32.</p>
      <p>C. Quirk and S. Corston-Oliver. 2006. The impact of parse quality on syntactically-informed statistical machine translation. In <i>EMNLP 2006.</i></p>
      <p>E. K. Ringger, R. C. Moore, E. Charniak, L. Vanderwende, and H. Suzuki. 2004. Using the Penn Treebank to evaluate non-treebank parsers. In <i>LREC 2004.</i></p>
      <p>R. Sætre, K. Sagae, and J. Tsujii. 2007. Syntactic features for protein-protein interaction extraction. In <i>LBM 2007 short papers.</i></p>
      <p>K. Sagae and J. Tsujii. 2007. Dependency parsing and domain adaptation with LR models and parser ensembles. In <i>EMNLP-CoNLL 2007.</i></p>
      <p>K. Sagae, Y. Miyao, T. Matsuzaki, and J. Tsujii. 2008. Challenges in mapping of syntactic representations for framework-independent parser evaluation. In <i>the Workshop on Automated Syntactic Annotations for Interoperable Language Resources.</i></p>
      <p>D. D. Sleator and D. Temperley. 1993. Parsing English with a Link Grammar. In <i>3rd IWPT.</i></p>
      <p>Y. Tsuruoka, Y. Tateishi, J.-D. Kim, T. Ohta, J. McNaught, S. Ananiadou, and J. Tsujii. 2005. Developing a robust part-of-speech tagger for biomedical text. In <i>10th Panhellenic Conference on Informatics.</i></p>
      <p>A. Yakushiji, Y. Miyao, Y. Tateisi, and J. Tsujii. 2005. Biomedical information extraction with predicate-argument structure patterns. In <i>First International Symposium on Semantic Mining in Biomedicine.</i></p>
    </references>
  </body>
</article>
