1 Introduction
Scientists must routinely review the scholarly literature in their fields to keep abreast of current advances and to retrieve information relevant to their research. However, the volume of online scientific literature is immense and rapidly increasing. In the biomedical field, the National Center for Biotechnology Information (NCBI) developed a literature search engine, PubMed¹, to access various databases such as MEDLINE (journal citations and abstracts for biomedical literature), full-text life science e-journals, and online books.
In 2010, PubMed repositories contained more than 20 million citations for biomedical literature [23]. By 2019, the number of citations had increased to more than 30 million². As a consequence, it has become extremely challenging for biomedical scientists to keep current with information in their fields. This challenge has led Natural Language Processing (NLP) researchers to develop resources and automated tools for various Information Extraction (IE) and Text Mining (TM) tasks on online corpora of biomedical articles, and thus to help biomedical researchers better manage and exploit this volume of data [18].
These research activities have led to the development of a new field, Biomedical Natural Language Processing (BioNLP), a collaboration between the biomedical and computational linguistics/artificial intelligence communities [17]. The tasks currently handled by BioNLP systems have generally aimed at extracting very specific and limited information, for example, protein and gene names and relations [11], and so have been able to rely on relatively simple forms of information extraction. BioNLP has adapted various standard information extraction techniques, both rule-based (e.g., shallow parsing, syntactic pattern matching) and Machine Learning (e.g., Support Vector Machines, k-nearest neighbour classification), to address several text-mining tasks, including the extraction of protein-protein interactions (PPI) [21], drug-drug interactions (DDI) [28], gene relationships [19], and protein-residue associations [25].
Although these approaches fulfil some information needs, information extraction systems based on them can only recognize and extract minimal and specific information from biomedical texts. Other, more in-depth and comprehensive information contained in biomedical texts would be highly valuable to scientists, because it can help them validate scientific claims, trace current research directions in their field, reproduce scientific procedures, and so forth. Recently, a new and more challenging information extraction task has been introduced as a means of obtaining these types of detailed information: identifying the argumentation structure in biomedical articles (e.g., [15] and [16]). Argumentation mining can be used to validate scientific claims and experimental methodology, and to plot deeper chains of scientific reasoning. Unlike earlier, simpler forms of information extraction, here the goal is to identify the structure of argumentative components within an entire text (for example, premises, evidence, and conclusions) as well as the relationships between these components.
To achieve this goal the text needs to be analyzed. Our approach to this analysis is based on a working hypothesis:
We hypothesize that recognizing and detecting rhetorical moves would provide important information to our argumentation analysis framework, and that the Method sections in biochemistry articles contain moves which can be correlated with the author's experimental procedures. These moves can be used to determine salient information about the elements of the article's argumentative structure (e.g., premises) and can contribute to the overall understanding of the author's scientific claims.
A key aspect of our hypothesis is that development of a frame-based knowledge representation can be based on the semantics of the verbs associated with these procedures. This representation can provide detailed knowledge for understanding these rhetorical moves, which will in turn facilitate analysis of argumentation structure. In other words, we propose that a procedurally rhetorical verb-centric frame semantics can be used to obtain a sufficiently deep analysis of sentence meaning.
While this approach seems straightforward enough, the writing style of biochemistry articles requires the reader to have knowledge of biochemistry and of biochemistry laboratory techniques and practices. This paper first presents the semantic roles that can be used in the semantics of each verb. It then gives an example of how an ontology containing knowledge about biochemistry laboratory techniques and practices can be used to fill semantic roles of verbs that cannot be filled from information in the text.
2 Related Work
Swales [29] proposed the Create-A-Research-Space (CARS) model, which draws on intuitions about the argumentative structure of scientific research articles. Swales defined rhetorical moves as text segments that convey communicative goals. He reviewed the Introduction sections of 48 articles from the social and natural sciences and found rhetorical structures common to most of them. Swales identified three moves in these articles: establishing a research territory, establishing a niche, and occupying the niche.
However, despite the widespread influence of the CARS model, some researchers observed two problems: (i) the inconsistent assignment of rhetorical moves to text segments because the identification of the rhetorical moves relies on overall text comprehension, and (ii) a lack of empirical validation of moves in linguistic terms [20].
To overcome these problems, Kanoksilapatham [20] extended Swales' approach to move analysis by developing a framework that combines his original CARS model with Biber's multidimensional analysis [6], enriching the model with additional information about linguistic characteristics. Biber's multidimensional analysis [6] is concerned with variation in spoken and written English. It can be used to identify differences in linguistic characteristics between various text types at different levels of document structure (e.g., genre, internal section level). Although Kanoksilapatham extends Swales' move analysis and attempted to validate these moves in biochemistry articles, she provides only a descriptive analysis of rhetorical moves, without defining an explicit method for analyzing and recognizing these moves in texts.
Liakata et al. [22] developed an annotation scheme called Core Scientific Concepts (CoreSC) to classify sentences into scientific categories (e.g., related to the author's other work). The CoreSC scheme consists of three layers: the first includes several categories to classify sentences; the second is concerned with properties of these categories; and the third creates a link to related instances of the same category. The authors use Machine Learning classifiers (i.e., Conditional Random Fields and Support Vector Machines) to automatically classify sentences into the CoreSC categories. The data set consisted of 265 biochemistry and chemistry articles. The authors achieved an accuracy of only around 50% in assigning sentences to the appropriate CoreSC categories, which is inadequate for such a task.
Green [15] proposed a plan for creating an annotated corpus of biomedical genetics research articles. Green emphasized that this corpus would benefit the argumentation mining community, since it would provide a fine-grained annotation of argumentative components. Also, since few annotated corpora are yet available, such a corpus would enrich research in the field of Computational Argumentation in general. The author stated that this corpus will be made publicly available for further investigation by different research groups in various argumentation mining tasks.
Green [16] specified a set of argumentation schemes for scientific claims in genetics research articles. The author used a corpus of unannotated genetics research articles, and identified the components (e.g., premises, conclusions) of an argument as well as its type of scheme. Based on the analyses of various genetics research articles, the author specified 10 argumentation schemes that are semantically different. These schemes were new and had not previously been proposed.
Furthermore, the specification of argumentation schemes was used to create annotation guidelines. These guidelines were then evaluated in a pilot study based on participants' ability to recognize the schemes after reading the guidelines. Overall, the author's ultimate goal for this initial study was to develop annotation guidelines for creating corpora for argumentation mining research. However, performance in the pilot study varied between the two groups of participants (undergraduate students and researchers): the students performed poorly in recognizing argumentation schemes, while the researchers identified the schemes correctly in most cases.
3 Our Proposed Approach: Rhetorical Moves Mirror Scientific Experimental Procedures
Our intention is to develop a formal knowledge representation based on procedural verbs as a method for argumentation analysis. We introduced the notion of Swales' CARS model [29] in Section 2. We hypothesize that recognizing and detecting rhetorical moves would provide additional information to our framework of argumentation analysis. We also hypothesize that the Method sections in biochemistry articles contain moves which can be correlated with the author's experimental procedures. These moves can be used to determine salient information about the elements of the article's argumentative structure (e.g., premises) and can contribute to the overall understanding of the author's scientific claims. A key aspect of our hypothesis is that development of a frame-based knowledge representation can be based on the semantics of the verbs associated with these procedures. This representation can provide detailed knowledge for understanding these rhetorical moves, which will in turn facilitate analysis of argumentation structure. In other words, we propose that procedurally rhetorical verb-centric frame semantics can be used to obtain a deeper analysis of sentence meaning than is currently possible with simple methods of Information Extraction (e.g., shallow syntactic pattern matching), and in a computationally feasible manner.
Scientific argument³ is defined as a process that scientists follow, using certain procedures to obtain empirical data that will either support or defeat their claims, hence leading to the intended conclusion. The strength of a scientific argument depends on its reproducibility and consistency. For a scientific argument to be strong, a scientist should identify and explain all the procedures in their experiment (reproducibility), so that another researcher who follows the same procedures will reach the same conclusion (consistency). Thus, for a well-constructed scientific article, a scientist should expect to reach the same conclusion if she follows the same procedures in the same sequence as described in the Method section.
Scientific writing in the biochemistry domain has certain characteristics that make it ideal for our purposes. In this domain, experimental procedures describe the sequence of actions the biochemist performs to carry out an experiment to derive verifiable scientific conclusions. The experimental procedures themselves can be verified because they are standard procedures described in detail in experimental manuals (e.g., Boyer [7] and Sambrook and Russell [26]). Verbs play an essential role as indicators of these experimental procedures.
These procedures can be viewed as corresponding to the elements of the scientific argumentation structure. For example, when examining a biological substance (e.g., a certain type of bacteria) in order to prove a hypothesis (e.g., that this bacterium is correlated with a certain disease), the biochemist would perform a sequence of procedures to arrive at a conclusion. Essentially, biochemists create an argumentation framework through the scientific methodology they follow: how they perform their experiments is how they argue. We can observe that this genre (biochemistry articles) is procedure-oriented, since the scientific procedures described are parallel to the scientific argumentation in the text. For example:
Example 1 "Beads with bound proteins were washed six times (for 10 min under rotation at 4 °C) with pulldown buffer and proteins harvested in SDS-sample buffer, separated by SDS-PAGE, and analyzed by autoradiography." [12].
In this example, the verbs "washed", "harvested", "separated", and "analyzed" are used to illustrate the procedure steps in sequential order. Such an experiment can be reproduced if one follows these steps.
Fillmore [13] introduced the notion of frame semantics as a theory of meaning. Fillmore [14] defines a semantic frame as "any coherent individuatable perception, memory, experience, action or object". In other words, coherently structured, interrelated concepts together represent complete knowledge of world events or experiences. For example, to understand the word "buy", one would access the knowledge contained in the commercial transaction frame, which includes elements such as the person who buys the goods (buyer), the goods that are being sold (goods), the person who sells the goods (seller), and the currency that the buyer and seller agree on (money).
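To make the frame notion concrete, the following Python sketch models a frame as a name, a fixed set of roles, and the words that evoke it. The frame name `Commercial_transaction`, its word list, and the class and method names are illustrative assumptions, not FrameNet's actual entries or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Frame:
    """A semantic frame: a named situation type and the roles it defines."""
    name: str
    roles: tuple[str, ...]
    lexical_units: set[str] = field(default_factory=set)  # words that evoke the frame

    def evokes(self, word: str) -> bool:
        return word.lower() in self.lexical_units

    def instantiate(self, word: str, **fillers: str) -> dict[str, str | None]:
        """Map every role to its filler; roles the context leaves open stay None."""
        if not self.evokes(word):
            raise ValueError(f"{word!r} does not evoke {self.name}")
        unknown = set(fillers) - set(self.roles)
        if unknown:
            raise ValueError(f"not roles of {self.name}: {sorted(unknown)}")
        return {role: fillers.get(role) for role in self.roles}


# Illustrative frame; the name and word list are assumptions, not FrameNet data.
commercial_transaction = Frame(
    name="Commercial_transaction",
    roles=("Buyer", "Goods", "Seller", "Money"),
    lexical_units={"buy", "sell", "pay"},
)

instance = commercial_transaction.instantiate("buy", Buyer="Ann", Goods="a book", Money="$10")
print(instance)
# {'Buyer': 'Ann', 'Goods': 'a book', 'Seller': None, 'Money': '$10'}
```

Keeping the unfilled Seller role explicit (as `None`) is the useful part: the frame tells the reader what is missing, not only what is present.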
Following Fillmore's theory of frame semantics, FrameNet [5] was developed as an online lexical resource for English. It includes more than 170,000 manually annotated sentences and 10,000 words. The computational linguistics community has been attracted to frame semantics and has developed computational resources based on it, such as VerbNet [27], an online verb lexicon for English, and PropBank [24], a corpus annotated with basic semantic propositions.
Following the notion of frame semantics, we propose to build a knowledge representation framework to analyze verbs in a procedure-oriented genre. Our concept of procedurally rhetorical verb-centric frame semantics is intended to address the gap between simple information extraction methods and a deeper analysis of sentence meaning by developing a computationally feasible knowledge representation that will enable argumentation analysis.
The knowledge contained in the frame semantics will facilitate the extraction of elements of arguments, i.e., argumentation mining. To reiterate, our hypothesis is that procedurally rhetorical verb-centric frame semantics can provide a knowledge representation framework for analyzing and representing the meanings of the verbs used in biochemistry articles. In turn, these frames will facilitate the identification of argumentation structure in the discourse describing experimental procedures.
4 Ontological Knowledge Sources
To provide the knowledge required for the rhetorical move analysis discussed in the previous section, we propose two knowledge sources organized as ontologies. An ontology, as used here, is composed of concepts and the relations between them. The first, semantic roles, represents the knowledge about verbs that we argue is needed to analyze rhetorical moves; this information is organized in VerbNet-like [27] verb frames. The second is composed of information about experimental procedures in the biochemistry domain; this information is organized in the familiar graph-based web of objects, classes of objects, and relations among them.
4.1 Semantic Roles
As described earlier, our experimental event scheme was inspired by the annotation scheme for bio-events [30]. We based our scheme for verb arguments on the inventory of semantic roles in VerbNet [27], modifying some roles and adding new ones. Our experimental event scheme includes Theme, Patient, Predicate, Agent, Location, Goal, etc. The complete set of semantic roles and their definitions in our experimental event scheme is presented in Table 2.
Move type | Definition |
---|---|
Description-of-method | Concerned with sentences that describe experimental events. |
Appeal-to-authority | Concerned with sentences that discuss the use of well-established methods. |
Background information | Concerned with all background information for the experimental events such as “method justification, comment, or observation, exclusion of data, approval of use of human tissue” as defined by Kanoksilapatham (2003). |
Source-of-materials | Concerned with the use of certain biological materials in the experimental events. |

Table 2. Semantic roles and their definitions in our experimental event scheme.

Semantic role | Definition |
---|---|
Agent | Generally a human or an animate subject. |
Patient | Participants that have undergone a process. |
Theme | Participants in a location or undergoing a change of location. |
Goal (physical) | Identifies a thing toward which an action is directed or a place to which something moves. |
Goal (purpose) | Identifies the stated purpose in a sentence for doing certain actions. |
Goal (factitive) | A referent that results from the action or state identified by a verb. |
Location | The physical place where the experiments took place. |
Protocol-Detail (time) | Identifies the time or a duration of an experimental process. |
Protocol-Detail (temperature) | Identifies the temperature of an experimental process. |
Protocol-Detail (condition) | Identifies the condition of how an experimental process is performed. |
Protocol-Detail (repetition) | Identifies the number of times an experimental process is repeated. |
Protocol-Detail (buffer) | Identifies the buffer that was used in an experimental process. |
Protocol-Detail (cofactor) | Identifies the cofactor that was used in an experimental process. |
Instrument (change) | Describes objects (or forces) that come in contact with an object and cause some change. |
Instrument (measure) | Describes an object or protocol that can measure another object(s). |
Instrument (observe) | Describes an object which can be used to observe another object(s). |
Instrument (maintain) | Describes an object or protocol which can be used to maintain the state of object(s). |
Instrument (catalyst) | Describes an object that can be used as a catalytic "facilitator" for an experimental event to occur. |
Instrument (reference) | Refers to a method or protocol that is being used. |
Instrument (mathematical) | Describes a mathematical or computational instrument. |
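The two-level structure of Table 2 (simple roles, plus composite roles with subcategories) can be encoded directly. The Python sketch below is one possible encoding; the dictionary layout and the helper names `parent_of` and `label` are our own illustration, not part of VerbNet or of any published resource.

```python
from __future__ import annotations

# Composite roles from Table 2 and their subcategories.
SUBROLES: dict[str, tuple[str, ...]] = {
    "Goal": ("Physical", "Purpose", "Factitive"),
    "Protocol-Detail": ("Time", "Temperature", "Condition", "Repetition", "Buffer", "Cofactor"),
    "Instrument": ("Change", "Measure", "Observe", "Maintain", "Catalyst", "Reference", "Mathematical"),
}
# Roles from Table 2 that have no subcategories.
SIMPLE_ROLES: tuple[str, ...] = ("Agent", "Patient", "Theme", "Location")


def parent_of(subrole: str) -> str:
    """Return the composite role a subcategory belongs to, e.g. 'Time' -> 'Protocol-Detail'."""
    for role, subs in SUBROLES.items():
        if subrole in subs:
            return role
    raise KeyError(subrole)


def label(role: str, sub: str | None = None) -> str:
    """Build a label such as 'Instrument (catalyst)', rejecting roles not in Table 2."""
    if role not in SUBROLES and role not in SIMPLE_ROLES:
        raise ValueError(f"unknown role {role!r}")
    if sub is None:
        return role  # a bare composite role is allowed, e.g. 'Goal'
    if sub not in SUBROLES.get(role, ()):
        raise ValueError(f"{sub!r} is not a subcategory of {role!r}")
    return f"{role} ({sub.lower()})"


print(label("Instrument", "Catalyst"))  # Instrument (catalyst)
print(parent_of("Buffer"))              # Protocol-Detail
```

Validating labels against a single inventory keeps annotations consistent as subcategories are added or revised.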
We have extended the VerbNet definition of the semantic role Instrument from simply describing "an object or force that comes in contact with an object and causes some change in them" [27] to include a variety of subcategories that correspond to various types of biological and man-made instruments that are used in a biochemistry laboratory. The new semantic roles (with example text in boldface) are:
- Instruments used to change the state of an object. For example:
Example 2 "Beads with bound proteins were washed six times (for 10 min under rotation at 4 °C) with **pulldown buffer**…" [12].
In this example, the pulldown buffer was used to wash (change the state of) the beads with bound proteins. In this instance, the phrase "pulldown buffer" should be labeled as instrument (change).
- Instruments used to maintain the state of an object. For example:
Example 3 "Once the samples were in EPR tubes, they were immediately frozen in liquid nitrogen, and stored in **liquid nitrogen** before using." [10].
In this example, the liquid nitrogen was used to store (maintain the condition of) the samples which were in the EPR tubes. In this case, the phrase "liquid nitrogen" should be labeled as instrument (maintain).
- Instruments used to observe an object. For example:
Example 4 The mitochondria were observed by **spinning disk confocal microscopy**.
Spinning disk confocal microscopy is used to observe the mitochondria, so the phrase "spinning disk confocal microscopy" should be labeled as instrument (observe).
- Instruments used as catalysts that enable experimental processes to occur. For example:
Example 5 "The ca. 900 bp PCR products were digested with **NdeI and HindIII** and ligated into pUC19." [9].
In this example, NdeI and HindIII are restriction enzymes used to facilitate the digestion (cutting) of the ca. (approximately) 900 bp PCR products. In this instance, the phrase "NdeI and HindIII" should be labeled as instrument (catalyst).
- Instruments used to measure an object. For example:
Example 6 "Beads with bound proteins were washed six times (for 10 min under rotation at 4 °C) with pulldown buffer and proteins harvested in SDS-sample buffer, separated by SDS-PAGE, and analyzed by **autoradiography**." [12].
In this example, autoradiography was used to analyze (measure) the proteins, so the word "autoradiography" should be labeled as instrument (measure).
- Instruments that are mathematical or computational (e.g., simulations, algorithms, equations, and software). For example:
Example 7 "Simulations of these EPR spectra were accomplished with **the computer program QPOWA [30, 31]**." [10].
The computer program QPOWA was used here as a computational instrument to simulate the EPR spectra mentioned above. So, the phrase "the computer program QPOWA [30, 31]" should be labeled as instrument (computational instrument).
- Finally, instruments that are references to a method or protocol being used. For example:
Example 8 "The preparation of authentic vaccinia H5R protein and recombinant B1R protein kinase were **as previously described [11]**." [8]
The phrase "as previously described [11]" indicates that the authors are referring to a previously described method that they used in their current experimental process. We should label the phrase "as previously described [11]" as instrument (reference).
These subcategories of the semantic role Instrument are not necessarily exhaustive. However, they are the most comprehensive set we have identified from our full-text analysis to date. We will add or update subcategories if we encounter a new type (usage) of instrument.
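As a rough illustration of how these subcategories might be assigned automatically, the Python sketch below maps a verb lemma to a default Instrument subcategory, seeded only from Examples 2 to 8. The mapping, the cue list, and the function name are hypothetical; a real system would need a much larger lexicon and contextual checks, since the same verb can take instruments of different types.

```python
from __future__ import annotations

# Hypothetical default mapping from a procedural verb (lemma) to the Instrument
# subcategory its instrument phrase received in Examples 2-6. Not a validated
# lexicon; context can override these defaults.
DEFAULT_SUBTYPE: dict[str, str] = {
    "wash": "change",      # Example 2: washed ... with pulldown buffer
    "store": "maintain",   # Example 3: stored in liquid nitrogen
    "observe": "observe",  # Example 4: observed by ... microscopy
    "digest": "catalyst",  # Example 5: digested with NdeI and HindIII
    "analyze": "measure",  # Example 6: analyzed by autoradiography
}

# Phrases pointing to an earlier method (Example 8) are references,
# whatever verb they attach to.
REFERENCE_CUES = ("as previously described", "as described previously")


def instrument_label(verb_lemma: str, phrase: str) -> str | None:
    """Guess a label for an instrument phrase; None signals an unseen usage."""
    if phrase.lower().startswith(REFERENCE_CUES):
        return "instrument (reference)"
    subtype = DEFAULT_SUBTYPE.get(verb_lemma.lower())
    return f"instrument ({subtype})" if subtype else None


print(instrument_label("wash", "pulldown buffer"))            # instrument (change)
print(instrument_label("be", "as previously described [11]"))  # instrument (reference)
```

Returning `None` for an unknown verb, rather than guessing, flags exactly the cases that might call for a new subcategory.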
We have also proposed a new semantic role, Protocol-Detail, which identifies certain types of information about experimental processes. Its subcategories (with example text in boldface) are:
- Time or the duration of a process [27]. For example:
Example 9 "Beads with bound proteins were washed six times (**for 10 min** under rotation at 4 °C) with pulldown buffer…" [12].
- Temperature of an experimental process. For example:
Example 10 "Beads with bound proteins were washed six times (for 10 min under rotation **at 4 °C**) with pulldown buffer…" [12].
- Condition, or the manner in which an experimental process was carried out. For example:
Example 11 "Beads with bound proteins were washed six times (for 10 min **under rotation** at 4 °C) with pulldown buffer…" [12].
- Buffer, which is "a solution containing either a weak acid and a conjugate base or a weak base and a conjugate acid, used to stabilize the pH of a liquid upon dilution."⁴ For example:
Example 12 "For phosphorylation, three identical reactions contained H5R protein (70 pmol), B1R protein kinase (90 μl), **Tris-HCl, pH 7.4 (20 mM)**, magnesium chloride (5 mM), ATP (50 μM), [γ-32P] ATP (50 μCi) and dithiothreitol (2 mM) in a total volume of 500 μl" [8].
- Cofactor, defined as "substances that are required for, or increase the rate of, catalysis."⁵ For example:
Example 13 "For phosphorylation, three identical reactions contained H5R protein (70 pmol), B1R protein kinase (90 μl), Tris-HCl, pH 7.4 (20 mM), **magnesium chloride (5 mM)**, ATP (50 μM), [γ-32P] ATP (50 μCi) and dithiothreitol (2 mM) in a total volume of 500 μl" [8].
- Repetition of a step in an experimental process. For example:
Example 14 "Beads with bound proteins were washed **six times** (for 10 min under rotation at 4 °C) with pulldown buffer…" [12].
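Several Protocol-Detail subcategories have a fairly regular surface form, so a first approximation can use patterns. The following Python sketch, assuming the phrasing of Examples 9, 10, 11, and 14, extracts Time, Temperature, Condition, and Repetition spans; the patterns are our own and will miss many real variants. Buffer and Cofactor are deliberately omitted, because recognizing them requires domain knowledge rather than surface patterns.

```python
from __future__ import annotations

import re

# Hypothetical surface patterns, tuned only to the phrasing of Examples 9-14.
PATTERNS: dict[str, re.Pattern[str]] = {
    "Repetition": re.compile(r"\b(?:once|twice|\w+ times)\b"),
    "Time": re.compile(r"\bfor \d+(?:\.\d+)? ?(?:s|sec|min|h|hr|hours?|days?)\b"),
    "Temperature": re.compile(r"\bat -?\d+(?:\.\d+)? ?°?C\b"),  # degree sign optional
    "Condition": re.compile(r"\bunder \w+"),
}


def protocol_details(sentence: str) -> dict[str, list[str]]:
    """Return every match of each Protocol-Detail pattern in the sentence."""
    return {role: pattern.findall(sentence) for role, pattern in PATTERNS.items()}


example = ("Beads with bound proteins were washed six times "
           "(for 10 min under rotation at 4 °C) with pulldown buffer")
for role, spans in protocol_details(example).items():
    print(role, spans)
# Repetition ['six times']
# Time ['for 10 min']
# Temperature ['at 4 °C']
# Condition ['under rotation']
```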
With these semantic roles we are able to provide the frames for procedural verbs. To illustrate, Fig. 1 contains the frame for the verb digest.
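Since Fig. 1 is not reproduced here, the following Python sketch shows one hypothetical way to represent a procedural verb frame such as the one for "digest": a set of required roles, a set of optional ones, and a check that reports which required roles a sentence leaves unfilled. The split into required and optional roles is an assumption for illustration and may differ from the frame in Fig. 1.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VerbFrame:
    lemma: str
    required: frozenset[str]
    optional: frozenset[str]

    def missing(self, filled: dict[str, str]) -> set[str]:
        """Required roles with no filler: candidates for lookup in the ontology."""
        unknown = set(filled) - self.required - self.optional
        if unknown:
            raise ValueError(f"roles not in frame {self.lemma!r}: {sorted(unknown)}")
        return set(self.required - set(filled))


# Hypothetical frame for "digest"; role names follow Table 2.
DIGEST = VerbFrame(
    lemma="digest",
    required=frozenset({"Patient", "Instrument (catalyst)"}),
    optional=frozenset({"Goal (factitive)", "Protocol-Detail (time)",
                        "Protocol-Detail (temperature)"}),
)

# Roles of Event 1 in Table 4.
event_1 = {"Patient": "The over-expression plasmid for L1, pUB5832",
           "Instrument (catalyst)": "NdeI and HindIII"}
print(DIGEST.missing(event_1))               # set()
# A hypothetical sentence that names no enzyme:
print(DIGEST.missing({"Patient": "pUC19"}))  # {'Instrument (catalyst)'}
```

A non-empty result marks a role that the text does not fill, which is where ontological knowledge about laboratory practice would be consulted.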
4.2 An Ontology of Biochemical Techniques and Laboratory Practices
Knowledge about how experiments are carried out in a biochemistry laboratory is essential to understanding much of the text found in biochemistry articles. We needed assistance from a biochemist to understand many of the sentences in our corpus. With this in mind, we have developed a prototype ontology to support a computational approach to analyzing the sentences found in the Method section of a biochemistry article. Details of this prototype ontology are described elsewhere [3].
An example procedure, Alkaline Agarose Gel Electrophoresis, is given in text format in Fig. 2. This common procedure is used to separate the biological substance needed in subsequent procedures from the other substances in the solution produced by previous procedures. Knowledge of how this electrophoresis procedure is carried out has been implemented in the prototype ontology. Why this knowledge is important is discussed in the following section.
5 A Manual Annotation of a Portion of a Method Section
We randomly selected three articles from our corpus in order to manually analyze and extract the steps of experimental procedures (processes) from their Method sections. Table 3 shows some sentences from one of these articles [9]. The purpose of this analysis is to identify the semantic roles of experimental processes and the semantic frames of the procedural verbs that occur in these processes. We also want to demonstrate the usefulness of our approach by mapping the knowledge of frame semantics and the ontological knowledge to rhetorical moves.
No. | Sentence |
---|---|
1 | The over-expression plasmid for L1, pUB5832, was digested with NdeI and HindIII, and the resulting ca. 900 bp piece was gel purified and ligated using T4 ligase into pUC19, which was also digested with NdeI and HindIII, to yield the cloning plasmid pL1PUC19. |
2 | Mutations were introduced into the L1 gene by using the overlap extension method of Ho et al. [60], as described previously [68]. |
3 | The oligonucleotides used for the preparation of the mutants are shown in Table 1.1. |
The sentences in Table 3 are three contiguous sentences in a biochemistry article. They describe cutting a DNA piece from a plasmid, which is "a small circular and double-stranded DNA molecule that is distinct from a cell's chromosomal DNA"⁶, and ligating (attaching) that piece to another plasmid to produce the desired protein. Table 4 shows five events from the sentences in Table 3. Events 1, 2, 3, and 4, which are illustrated in Fig. 3, are extracted from Sentence No. 1; Sentence No. 2 contains only Event 5; and Sentence No. 3 contains no actual experimental event, since it simply refers to a table earlier in the article. Each event in Table 4 represents one complete experimental procedure. Also, the actual sequence of experimental events in the lab does not necessarily follow the order in which these events appear in the text. Another important point is that not all the essential information about experimental processes is stated in the text; some of it is implied. These implied pieces of information can, however, be inferred from an ontology of standard biochemistry procedures, part of which we have developed. Consider Events 1-4 in Table 4:
- Event 1: digestion of pUB5832. A ca. 900 bp piece was cut out using two restriction enzymes (NdeI and HindIII).
- Event 2: gel purification of the 900 bp piece. Gel electrophoresis was used in this purification step; this is implied information derived from the ontology.
- Event 3: digestion of pUC19. This happens at any time before Event 4: before, after, between, or during Events 1 and 2.
- Event 4: ligation of the 900 bp piece into pUC19, which occurs after Events 1, 2, and 3.
Table 4. Experimental events extracted from the sentences in Table 3.

Event | Sentence | Patient | Predicate | Instrument | Goal |
---|---|---|---|---|---|
1 | No. 1 | The over-expression plasmid for L1, pUB5832 | digested | (catalyst) NdeI and HindIII | |
2 | No. 1 | the resulting ca. 900 bp piece | gel purified | (catalyst) Gel electrophoresis | |
3 | No. 1 | pUC19 | digested | (catalyst) NdeI and HindIII | |
4 | No. 1 | the resulting ca. 900 bp piece | ligated | (catalyst) using T4 ligase | into pUC19 |
5 | No. 2 | the L1 gene | introduced (mutated) | (reference) using the overlap extension method of Ho et al. | |

Sentence No. 3 does not contain experimental events.
Much information can be derived from the text using knowledge about the verbs. As described earlier, the semantic roles of each verb, together with syntactic information, allow this information to be extracted from the text; Table 4 shows the extracted information. However, this is not enough to fully understand what the text describes.
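The following Python sketch illustrates both the value and the limits of purely textual extraction. It assumes a single hypothetical passive pattern ("X was/were VERB-ed with/by/using Y") and fills Patient, Predicate, and Instrument; the pattern and function name are our own, not the method used to build Table 4. On Sentence No. 1 it recovers only Event 1: the implied instrument of Event 2, the relative clause of Event 3, and the phrasing of Event 4 all fall outside the pattern.

```python
from __future__ import annotations

import re

# Hypothetical pattern:  <Patient>[,] was/were <verb>ed with/by/using <Instrument>
# The Instrument phrase stops at a comma, at " and <verb>ed", or at a period.
PASSIVE = re.compile(
    r"^(?P<patient>.+?),? (?:was|were) (?P<predicate>\w+ed) "
    r"(?:with|by|using) (?P<instrument>.+?)(?=,| and \w+ed\b|\.|$)"
)


def extract_passive_event(clause: str) -> dict[str, str] | None:
    """Fill Patient, Predicate and Instrument for one simple passive clause."""
    m = PASSIVE.match(clause)
    if m is None:
        return None  # e.g. Sentence No. 3, which describes no event
    return {"Patient": m["patient"], "Predicate": m["predicate"],
            "Instrument": m["instrument"]}


sentence_1 = ("The over-expression plasmid for L1, pUB5832, was digested with NdeI "
              "and HindIII, and the resulting ca. 900 bp piece was gel purified and "
              "ligated using T4 ligase into pUC19, which was also digested with NdeI "
              "and HindIII, to yield the cloning plasmid pL1PUC19.")
print(extract_passive_event(sentence_1))
```

The instrument's subcategory (catalyst) is not decided here; that still requires knowledge of the verb frame.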
A proper interpretation of the events described in Sentence No. 1 cannot be derived from the text alone. An understanding of laboratory practice, together with knowledge of what is involved in performing plasmid digestion, purification, and ligation, is required. Some of the event sequencing can be derived from the text; for instance, the pragmatics of the conjunction "and" usually indicate that the second conjunct follows temporally after the first conjunct has completed. The phrase "the resulting" is also a key linguistic clue to this sequence. However, determining when the third event happens requires knowledge of biochemistry and laboratory practice, as well as knowledge of the complete method. The linguistic information provided by the relative clause does not enable a complete understanding of this event, so the ontology is needed to supply the information required for a proper interpretation. Another important aspect of the text is that all of the referents are described by singular nouns. However, knowledge of the biological processes carried out in the laboratory is important here: solutions containing large numbers of the biological elements are used. Hence, one is not dealing with a single plasmid or a single piece from the plasmid. When digestion occurs, all of the pieces from the plasmids are in the solution, including plasmids that did not get digested, hence the need for the gel purification step, which separates the various biological elements.
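The sequencing constraints in the list of Events 1-4 form a partial order, not a single sequence. The brute-force Python sketch below, our own illustration rather than part of the ontology, enumerates every sequential order consistent with those constraints (Event 1 before Event 2, and Events 1, 2, and 3 before Event 4). It treats events as non-overlapping, so the possibility that Event 3 happens during Events 1 or 2 is not represented.

```python
from itertools import permutations

# (a, b) means event a completes before event b starts.
CONSTRAINTS = {(1, 2), (1, 4), (2, 4), (3, 4)}


def consistent_orders(events, constraints):
    """Enumerate every sequential order of the events that respects all constraints."""
    orders = []
    for perm in permutations(events):
        position = {event: i for i, event in enumerate(perm)}
        if all(position[a] < position[b] for a, b in constraints):
            orders.append(perm)
    return orders


for order in consistent_orders([1, 2, 3, 4], CONSTRAINTS):
    print(order)
# (1, 2, 3, 4)
# (1, 3, 2, 4)
# (3, 1, 2, 4)
```

Checking all permutations is only sensible for a handful of events; a larger method section would call for a proper topological-sort enumeration.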
We now give an example of inferring implied information from the ontology. Event 2 in Table 4 is gel purification, but what is used to perform this task is not stated in the text. The following SPARQL query extracts some domain knowledge about the experimental procedure of Alkaline Agarose Gel Electrophoresis from our ontology, providing the missing instrument semantic role information.
Figure 4 shows all of the instruments involved in any state for all steps of the Alkaline Agarose Gel Electrophoresis procedure. Using this information and knowledge about the steps in the procedure, the instrument gel electrophoresis can be inferred.
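SPARQL Query
Query 1. Return all devices involved in a state of all steps (1.1, 1.2, 3)
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# Placeholder namespace; substitute the ontology's actual IRI.
PREFIX : <http://example.org/biochem-lab#>
SELECT ?step ?state ?item
WHERE {
  ?step rdf:type :Step .
  ?step :hasState ?state .
  ?state :involves ?item .
  ?item rdf:type :Device .
}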
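To show what the graph pattern in Query 1 computes, the following dependency-free Python sketch evaluates the same four triple patterns by nested joins over a handful of toy triples. All identifiers in the data (`step1_1`, `gel_tray`, and so on) are hypothetical and not taken from the authors' ontology; in practice the query itself would be run against an RDF store, for example with rdflib's `Graph.query`.

```python
# Toy triples in the shape Query 1 expects; every identifier is hypothetical.
TRIPLES = {
    ("step1_1", "rdf:type", ":Step"),
    ("step1_1", ":hasState", "state1_1a"),
    ("state1_1a", ":involves", "gel_tray"),
    ("gel_tray", "rdf:type", ":Device"),
    ("step3", "rdf:type", ":Step"),
    ("step3", ":hasState", "state3a"),
    ("state3a", ":involves", "electrophoresis_apparatus"),
    ("electrophoresis_apparatus", "rdf:type", ":Device"),
    ("state3a", ":involves", "dna_sample"),
    ("dna_sample", "rdf:type", ":Material"),  # involved, but not a Device
}


def objects(triples, subject, predicate):
    """All o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def devices_by_step(triples):
    """Evaluate Query 1's graph pattern by nested joins; returns (step, state, item) rows."""
    rows = []
    for step, p, o in triples:
        if p == "rdf:type" and o == ":Step":
            for state in objects(triples, step, ":hasState"):
                for item in objects(triples, state, ":involves"):
                    if ":Device" in objects(triples, item, "rdf:type"):
                        rows.append((step, state, item))
    return sorted(rows)


for row in devices_by_step(TRIPLES):
    print(row)
# ('step1_1', 'state1_1a', 'gel_tray')
# ('step3', 'state3a', 'electrophoresis_apparatus')
```

The `:Device` filter is what drops `dna_sample`: it is involved in a state, but as a material rather than an instrument.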
6 Conclusions and Future Work
In this research we have provided prototypes for two ontologies of the biochemistry domain. The first ontology, procedurally rhetorical frame semantics, provides semantic roles for procedural verbs. The second ontology provides information about biochemical techniques. This ontology can be used to give information that does not appear in the scientific article text. To the best of our knowledge, no research has proposed or incorporated the idea of a semantic frame based on verb analysis to assist in the analysis of argumentation in biochemistry articles. Nor has any attempt been made to build an ontology of biochemical techniques and laboratory practices.
Our future goal is an in-depth argumentation analysis of biochemistry articles. Access to the rhetorical moves extracted using the two ontologies should enable a computationally feasible technique for argumentation mining of more detailed scientific knowledge than is currently available. This will be an important step towards giving Computational Argumentation researchers who work in domains with a similar discourse structure a means of using and evaluating the metrics we will develop. We have begun conducting annotation studies for both semantic roles [1] and rhetorical moves [2]. In addition, we have built a prototype ontology, described in other work [3].
The SPARQL query and Fig. 4 show the power of using ontological knowledge to obtain relevant information about specific experimental processes⁷. We have also developed a set of frames for frequent procedural verbs (e.g., "digest") in our analyzed data set. Our aim is to extend the VerbNet project by providing syntactic and semantic information for these procedural verbs. Further details can be found in the first author's PhD thesis [4].