Computación y Sistemas

On-line version ISSN 2007-9737; print version ISSN 1405-5546

Comp. y Sist. vol. 23 no. 3, Ciudad de México, Jul./Sep. 2019; Epub Aug 09, 2021

https://doi.org/10.13053/cys-23-3-3242 

Articles of the Thematic Issue

Refining Concepts by Machine Learning

Marek Menšík1  * 

Marie Duží1 

Adam Albert1 

Vojtĕch Patschka1 

Miroslav Pajr2 

1 VSB-Technical University of Ostrava, Department of Computer Science, Czech Republic. marek.mensik@vsb.cz, marie.duzi@vsb.cz.

2 Silesian University in Opava, Institute of Computer Science, Czech Republic.


Abstract

In this paper we deal with machine learning methods and algorithms applied in learning simple concepts by their refining or explication. The method of refining a simple concept of an object O consists in discovering a molecular concept that defines the same object as O, or a very similar one. Typically, such a molecular concept is a professional definition of the object, for instance a biological definition according to taxonomy, or a legal definition of roles, acts, etc. Our background theory is Transparent Intensional Logic (TIL). In TIL, concepts are explicated as abstract procedures encoded by natural language terms. These procedures are defined as six kinds of TIL constructions. First, we briefly introduce the method of learning with a supervisor that is applied in our case. Then we describe the algorithm 'Framework' together with the heuristic methods it applies. The heuristics are based on a plausible supply of positive and negative (near-miss) examples by which the learner's hypotheses are refined and adjusted. Given a positive example, the learner refines the hypothesis learnt so far, while a near-miss example triggers specialization. Our heuristic methods deal with the way refinement is applied, including its special cases, generalization and specialization.

Keywords: Machine learning; supervisor; transparent intensional logic; TIL; refinement; generalization; specialization; hypothesis; heuristics

1 Introduction

The method of supervised machine learning enables the agents in a multi-agent system to adjust their ontology and increase their knowledge. In [13] the method has been applied to learning the concept of a property that classifies geometric figures such as lancet arches.

In this paper we deal with natural language processing, an interdisciplinary field involving linguistics, logic and computer science. The goal of this paper is to describe the application of machine learning methods in agents' learning of simple concepts by their refinement or explication. Refinement is rigorously defined below. Briefly, by refining a simple concept of an object O, we mean discovering a molecular concept that defines the same object O. In mathematics we use definitions like "a group is a set G equipped with a binary operation that combines any two elements of G to form another element of G in such a way that the group axioms are satisfied, namely associativity, existence of the neutral element in G, and invertibility."

Here the simple concept to be refined is that of a 'group'. The molecular concept refining 'group' is encoded by the right side of the definition, namely 'a set G equipped with a binary operation that combines any two elements of G to form another element of G in such a way that the group axioms are satisfied, namely associativity, existence of the neutral element in G, and invertibility'. In the case of empirical concepts, it is more plausible to speak about explication. The reason is this.

To say that a molecular concept C is a refinement of a simple empirical concept D is risky. It would be a refinement only if the molecular concept C were equivalent to the original concept D, which means that both are concepts of the same object O.

However, in the most interesting cases of empirical concepts of PWS-intensions we use a Carnapian explication rather than a definition proper, and then equivalence is surely not guaranteed1. Rather, a new molecular concept C (explicatum) should define an object O that is as close as possible to the object referred to by an inexact prescientific concept D (explicandum). In Meaning and Necessity (1947), Carnap characterizes explication as follows:

The task of making more exact a vague or not quite exact concept used in everyday life or in an earlier stage of scientific or logical development, or rather of replacing it by a newly constructed, more exact concept, belongs among the most important tasks of logical analysis and logical construction. We call this the task of explicating, or of giving an explication for, the earlier concept [1, pp. 7-8].

Keeping this difference in mind, in this paper we use the term 'refinement' for both cases, including explication of empirical concepts, because in our sample example of explicating the concept of myopia this simplification is harmless.

Our background theory is Transparent Intensional Logic (TIL) with its procedural (as opposed to set-theoretical) semantics. In TIL we explicate concepts procedurally. They are abstract structured procedures assigned to natural language terms as their meanings. In this way structured meanings are formalized in a fine-grained way as so-called TIL constructions, so that almost all the semantically salient features can be successfully dealt with.

To this end we use the so-called Normal Translation Algorithm (NTA) that processes text data and produces TIL constructions as their meanings.2 Having a meaning procedure, we can apply logic to prove what is entailed by it, compute the object (if any) produced by the procedure, deal with its structure, etc.

However, there is a problem of understanding simple or atomic concepts that are expressed by semantically simple terms like 'cat', 'dog', 'myopia', etc. They are the basic 'building blocks' of molecular concepts, and as such they are formalized just by the simplest procedure, the Trivialization of a given object O, '0O' in symbols, which refers to the object O and makes it available for other molecular procedures to operate on. In proof-theoretic semantics the meaning of atomic terms is given by the rules that determine how to use them in proofs3. This works well in the language of mathematics and logic.

However, in natural language the 'meaning as proof' semantics is much less successful. For these reasons we decided to apply supervised machine learning methods.

The issue is this. When processing a natural language text, our agents learn structured TIL procedures encoded by sentences. For instance, the sentence "Tom has myopia" translates into the TIL procedure λwλt [0Myopiawt 0Tom].

It can be viewed as an instruction for how to evaluate the truth-conditions of the sentence in any possible world (λw) and at any time (λt); the evaluation consists of these steps:

  • ― Take the individual Tom: 0Tom

  • ― Take the property of having Myopia: 0Myopia

  • ― Extensionalize the property with respect to world w and time t of evaluation: 0Myopiawt

  • ― Produce a truth-value by checking whether Tom has this property at the world w and time t of evaluation: [0Myopiawt 0Tom]
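To make the four steps concrete, here is a minimal sketch (ours, not the authors' implementation) of how this evaluation could be mimicked in Python, with the extension of the property Myopia modelled as a hypothetical function from (world, time) pairs to sets of individuals:

```python
# Toy possible-world model; the sample data are hypothetical.
Tom = "Tom"                                  # 0Tom: direct reference to the individual
myopia_extension = {
    ("w1", "t1"): {"Tom", "Ann"},
    ("w1", "t2"): {"Ann"},
}

def myopia_wt(w, t):
    """Extensionalize 0Myopia at (w, t): the set of individuals having it."""
    return myopia_extension.get((w, t), set())

# lambda w: lambda t: ... mirrors λwλt [0Myopia_wt 0Tom]
proposition = lambda w: lambda t: Tom in myopia_wt(w, t)

print(proposition("w1")("t1"))   # True:  Tom has myopia at (w1, t1)
print(proposition("w1")("t2"))   # False: he does not at (w1, t2)
```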

So far so good. We can derive that somebody has myopia, but this piece of information does not suffice to derive, for instance, that Tom has problems with impaired vision, needs negative dioptre correction, etc.

We need to refine the simple concept 0Myopia to learn in more detail what 'myopia' is. In other words, we want to define the property of having myopia. To this end we try to extract from natural language texts the collection of so-called requisites that together define the property. Hence, the supervisor looks for sentences like "Myopia (also called near-sightedness) is the most common cause of impaired vision in people under age 40". Based on this piece of information the agent makes a hypothesis that among the requisites of myopia are 'near-sightedness' and 'impaired vision'.

This is a positive example. Furthermore, we can read sentences like "Myopia is not caused by nerve trauma; rather, it occurs when the eyeball is too long, relative to the focusing power of the cornea and lens of the eye. This causes light rays to focus at a point in front of the retina, rather than directly on its surface. Near-sightedness also can be caused by the cornea and/or lens being too curved for the length of the eyeball. In some cases, myopia is due to a combination of these factors." The supervisor should extract a negative example that myopia is not caused by nerve trauma and a collection of positive examples like 'too long eyeball', 'wrong focusing', etc.

The algorithm of the learning process is based on such positive and negative examples. Given a positive example, the hypotheses are adjusted so that concepts of other requisites or typical properties are inserted. Negative (also 'near-miss') examples serve to adjust the hypothesis (learnt so far) by specialization, which excludes non-plausible elements. As a special case of refinement, we can also apply generalization. This is the case of inserting a more general concept in addition to some special constituents of the hypothesis. For instance, the degree of myopia is described in terms of the power of the ideal correction, which is measured in dioptres.

Now the agent can extract information like this: "Low myopia usually describes myopia of -3.00 dioptres or less (i.e. closer to 0.00), moderate myopia is between -3.00 and -6.00 dioptres, and high myopia is the degree of -6.00 dioptres or more." By generalization we obtain the information that myopia is corrected by negative dioptres. The rest of the paper is organized as follows.

In Section 2 we summarize foundations of TIL to describe logical machinery that we need in the rest of the paper. Section 3 introduces the principles of supervised machine learning. In Section 4 we deal with heuristic methods that are used to adjust and enrich agents' knowledge base. In Section 5, an example of using the algorithm of machine learning together with TIL formalization is adduced. Finally, concluding remarks can be found in Section 6.

2 Foundations of Transparent Intensional Logic (TIL)

Since the TIL logical system has been introduced in numerous papers and two books, see, for instance [2,3,4,5,7,8,17], here we just briefly summarise the main principles of a TIL fragment that we need for the purposes of this paper.

TIL is a partial, typed hyperintensional lambda calculus with procedural as opposed to set-theoretical denotational semantics. The terms of the TIL language denote abstract procedures that produce set-theoretical mappings (functions-in-extension) or lower-order procedures. These procedures are rigorously defined as TIL constructions. Being procedural objects, constructions can be executed in order to operate on input objects (of a lower-order type) and produce the object (if any) they are typed to produce, while non-procedural objects, i.e. non-constructions, cannot be executed. There are two atomic constructions that present input objects to be operated on.

They are Trivialization and Variables. The operational sense of Trivialization is similar to that of constants in formal languages. Trivialization presents an object X without the mediation of any other procedures. Using the terminology of programming languages, the Trivialization of X, '0X' in symbols, is just a pointer that refers to X. Variables produce objects dependently on valuations; they v-construct. We adopt an objectual variant of the Tarskian conception of variables. To each type, countably many variables are assigned that range over this particular type. Objects of each type can be arranged into infinitely many sequences.

The valuation v selects one such sequence of objects of the respective type, and the first variable v-constructs the first object of the sequence, the second variable v-constructs the second object of the sequence, and so on. Thus, the execution of a Trivialization or a variable never fails to produce an object. However, the execution of some of the molecular constructions can fail to present an object of the type they are typed to produce. When this happens, we say that such constructions are v-improper.

There are two dual molecular constructions, which correspond to λ-abstraction and application in λ-calculi, namely Closure and Composition. (λ-)Closure, [λx1…xn X], transforms into the very procedure of producing a function by abstracting over the values of the variables x1,…,xn. The Closure [λx1…xm Y] is not v-improper for any valuation v, as it always v-constructs a function. Composition, [X X1…Xn], is the very procedure of applying a function produced by the procedure X to the tuple-argument (if any) produced by the procedures X1,…,Xn. While Closure never fails to produce a function, Composition is v-improper if one or more of its constituents X, X1,…,Xn are v-improper.

This happens when a partial function f is applied to an argument a such that the function f is not defined at a. Another cause of improperness can be type-theoretical incoherence of the Composition. For instance, the proposition that the number 5 is a student does not have a truth-value at any world w and time t of evaluation, because the property of being a student is a property of individuals rather than numbers. Hence the application of the (extensionalized) property of being a student to the number 5 in a particular world w and time t of evaluation, in symbols [[[0Student w] t] 05], or [0Studentwt 05] for short, is v-improper for every valuation v of the variables w (ranging over possible worlds) and t (ranging over times).

In what follows we define six kinds of TIL constructions, and afterwards the ramified hierarchy of types into which objects of TIL ontology are organised.

Definition (Constructions)

  • (i) Variables x, y, … are constructions that construct objects (elements of their respective ranges) dependently on a valuation v; they v-construct.

  • (ii) Where X is an object whatsoever (even a construction), 0X is the construction Trivialization that constructs X without any change of X.

  • (iii) Let X, Y1,…,Yn be arbitrary constructions. Then the Composition [X Y1…Yn] is the following construction. For any v, the Composition [X Y1…Yn] is v-improper if at least one of the constructions X, Y1,…,Yn is v-improper by failing to v-construct anything, or if X does not v-construct a function that is defined at the n-tuple of objects v-constructed by Y1,…,Yn. If X does v-construct such a function, then [X Y1…Yn] v-constructs the value of this function at the n-tuple.

  • (iv) (λ-)Closure [λx1…xm Y] is the following construction. Let x1, x2,…,xm be pair-wise distinct variables and Y a construction. Then [λx1…xm Y] v-constructs the function f that takes any members B1,…,Bm of the respective ranges of the variables x1,…,xm into the object (if any) that is v(B1/x1,…,Bm/xm)-constructed by Y, where v(B1/x1,…,Bm/xm) is like v except for assigning B1 to x1,…, Bm to xm.

  • (v) Where X is an object whatsoever, 1X is the construction Single Execution that v-constructs what X v-constructs. Thus, if X is a v-improper construction or not a construction at all, 1X is v-improper.

  • (vi) Where X is an object whatsoever, 2X is the construction Double Execution. If X is not itself a construction, or if X does not v-construct a construction, or if X v-constructs a v-improper construction, then 2X is v-improper. Otherwise 2X v-constructs what is v-constructed by the construction v-constructed by X.

  • (vii) Nothing is a construction, unless it so follows from (i) through (vi).
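For readers who think in code, the six kinds can be mirrored by a small abstract syntax tree; the following sketch is our own simplification (the class names and Python representation are not part of TIL's formal apparatus):

```python
# A sketch: the six construction kinds of the definition above as an AST.
from dataclasses import dataclass
from typing import Any, Tuple

class Construction: ...

@dataclass
class Variable(Construction):         # (i)   v-constructs via the valuation
    name: str

@dataclass
class Trivialization(Construction):   # (ii)  0X: presents X directly
    obj: Any

@dataclass
class Composition(Construction):      # (iii) [X Y1...Yn]: may be v-improper
    fn: Construction
    args: Tuple[Construction, ...]

@dataclass
class Closure(Construction):          # (iv)  [λx1...xm Y]: never v-improper
    params: Tuple[str, ...]
    body: Construction

@dataclass
class SingleExecution(Construction):  # (v)   1X
    x: Any

@dataclass
class DoubleExecution(Construction):  # (vi)  2X
    x: Any
```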

With constructions of constructions, constructions of functions, functions, and functional values in our stratified ontology, we need to keep track of the traffic between multiple logical strata. The ramified type hierarchy does just that. The type of first-order objects includes all objects that are not constructions. Therefore, it includes not only the standard objects of individuals, truth-values, sets, etc., but also functions defined on possible worlds (i.e., the intensions germane to possible-world semantics). The type of second-order objects includes constructions of first-order objects and functions that have such constructions in their domain or range. The type of third-order objects includes constructions of first- and second-order objects and functions that have such constructions in their domain or range. And so on, ad infinitum.

Definition (types of order n). Let B be a base, where a base is a collection of pair-wise disjoint, non-empty sets. Then:

  • T1 (types of order 1).

  • i) Every member of B is an elementary type of order 1 over B.

  • ii) Let α, β1,…,βm (m > 0) be types of order 1 over B. Then the collection (α β1…βm) of all m-ary partial mappings from β1 ×…× βm into α is a functional type of order 1 over B.

  • iii) Nothing is a type of order 1 over B unless it so follows from (i) and (ii).

  • Cn (constructions of order n)

  • i) Let x be a variable ranging over a type of order n. Then x is a construction of order n over B.

  • ii) Let X be a member of a type of order n. Then 0X, 1X, 2X are constructions of order n over B.

  • iii) Let X, X1,…,Xm (m > 0) be constructions of order n over B. Then the Composition [X X1…Xm] is a construction of order n over B.

  • iv) Let x1,…,xm, X (m > 0) be constructions of order n over B. Then the Closure [λx1…xm X] is a construction of order n over B.

  • v) Nothing is a construction of order n over B unless it so follows from Cn (i)-(iv).

  • Tn+1 (types of order n + 1)

  • Let *n be the collection of all constructions of order n over B. Then

  • i) *n and every type of order n are types of order n + 1.

  • ii) If m > 0 and α, β1,…,βm are types of order n + 1 over B, then the collection (α β1…βm) (see T1 ii)) is a type of order n + 1 over B.

  • iii) Nothing is a type of order n + 1 over B unless it so follows from (i) and (ii).

Remark. For the purposes of the analysis of our sample example of agents' learning the concept of myopia, the intensional fragment of TIL based on simple types of order 1 suffices. Yet when the agents learn new concepts, they enrich their ontology with new constructions that are just displayed rather than executed. To this end, the full ramified hierarchy is needed. For details see, e.g., [7,8].

For the purposes of natural-language analysis, we usually assume the following base of elementary types:

  • o: the set of truth-values {T, F};

  • ι: the set of individuals (the universe of discourse);

  • τ: the set of real numbers (doubling as times);

  • ω: the set of logically possible worlds (the logical space).

We model sets and relations by their characteristic functions. Thus, for instance, (oι) is the type of a set of individuals, while (oιι) is the type of a relation-in-extension between individuals. Empirical expressions denote empirical conditions that may or may not be satisfied at the world/time pair selected as points of evaluation. We model these empirical conditions as possible-world-semantic intensions. Intensions are entities of type (βω): mappings from possible worlds to an arbitrary type β. The type β is frequently the type of the chronology of α-objects, i.e., a mapping of type (ατ). Thus α-intensions are usually functions of type ((ατ)ω), abbreviated as 'ατω' . Extensional entities are entities of a type α where α≠(βω) for any type β.

Hence, empirical expressions denote (non-trivial, i.e. non-constant) intensions. Where variable w ranges over ω and t over τ, the following logical form essentially characterizes the logical syntax of empirical language:

λwλt [… w … t …].

Examples of frequently used intensions are:

  • ―propositions of type oτω denoted by sentences like "John is a student";

  • ―properties of individuals of type (oι)τω denoted by nouns and adjectives, e.g. 'student', 'red', 'tall', 'myopia', 'near-sighted';

  • ―binary relations-in-intension between individuals of type (oιι)τω, e.g. being 'composed of', 'to like';

  • ―individual offices (or roles) of type ιτω that are denoted by definite descriptions like 'the tallest mountain', 'Miss World 2019', 'the President of Zanzibar'.

Logical objects like truth-functions and quantifiers are extensional: ∧ (conjunction), ∨ (disjunction) and ⊃ (implication) are of type (ooo), and ¬ (negation) is of type (oo). The quantifiers ∀α, ∃α are type-theoretically polymorphic total functions of type (o(oα)), for an arbitrary type α, defined as follows. The universal quantifier ∀α is a function that associates a class A of α-elements with T if A contains all elements of the type α, otherwise with F. The existential quantifier ∃α is a function that associates a class A of α-elements with T if A is a non-empty class, otherwise with F.

Notational conventions. Below, all type indications will be provided outside the formulae in order not to clutter the notation. The outermost brackets of Closures will be omitted whenever no confusion arises. Furthermore, 'X/α' means that an object X is (a member) of type α. 'X →v α' means that X is typed to v-construct an object of type α, regardless of whether X in fact constructs anything. We write 'X → α' if what is v-constructed does not depend on a valuation v. Throughout, it holds that the variables w →v ω and t →v τ. If C →v ατω, then the frequently used Composition [[C w] t], which is the intensional descent (a.k.a. extensionalization) of the α-intension v-constructed by C, will be encoded as 'Cwt'. When applying quantifiers, we use the simpler notation '∀xB', '∃xB' instead of the full notation '[0∀α λxB]', '[0∃α λxB]' (x → α, B → o), to make the quantified constructions easier to read. When applying truth-functions we use infix notation without Trivialization. For instance, instead of the Composition '[0∧ A B]' we write simply '[A ∧ B]'.

For illustration, here is an example of the analysis of the simple sentence "John is near-sighted". First, the type-theoretical analysis, i.e. assigning types to the objects that receive mention in the sentence: John/ι; Nearsighted/(oι)τω; the whole sentence denotes a proposition of type oτω. Now we compose constructions of these objects to construct the denoted proposition. To predicate the property of being near-sighted of John, the property must be extensionalized first: [[0Nearsighted w] t], or 0Nearsightedwt for short (→v (oι)). The Composition [0Nearsightedwt 0John] →v o; and finally, the whole empirical sentence denotes a proposition of type oτω, hence it encodes as its meaning the Closure:

λwλt [0Nearsightedwt 0John] → oτω.

In TIL we reject individual essentialism; instead, we adhere to intensional essentialism. It means that each α-intension P is necessarily related to a collection of requisites of P, its essence, that together define the intension P. For instance, requisites of the property of being a horse are the property of being a mammal of the family Equidae, species Equus caballus, the property of having a blood circuit, being a living creature, and many others. Necessarily, if some individual a happens to be a horse, then a is a mammal of the family Equidae, etc.

The requisite relations Req are a family of relations-in-extension between two intensions, hence of the polymorphous type (oατωβτω), where possibly α = β. Infinitely many combinations of Req are possible, but the following is the relevant one that we need for our purpose:4

Req/(o(oι)τω(oι)τω): an individual property is a requisite of another such property. Thus, we define:

Definition (requisite relation between ι-properties) Let X, Y be constructions of properties, X, Y/*n → (oι)τω; x → ι; True/(o oτω)τω: the property of propositions of being true in a given world and time of evaluation. Then:

[0Req Y X] = ∀w∀t ∀x [[0Truewt λwλt [Xwt x]] ⊃ [0Truewt λwλt [Ywt x]]].

Gloss definiendum as, "Y is a requisite of X", and definiens as, "Necessarily, at every (w, t), whatever x instantiates X at (w, t) also instantiates Y at (w, t)."

Remark. Here we have to apply the property of propositions True to deal with partiality. This is due to the fact that there is a stronger relation between properties, namely that of a pre-requisite. If Y is a pre-requisite of X, then if an individual x does not instantiate Y, it is neither true nor false that x instantiates X. The proposition λwλt [Xwt x] has a truth-value gap. For instance, the property of having stopped smoking has the pre-requisite of being an ex-smoker. If somebody never smoked, they could not have stopped smoking, of course. Then, however, the Composition [0Truewt λwλt [Xwt x]] is simply false, and since it is the antecedent of the above implication, the implication is true, as it should be.
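The role of True in blocking the pre-requisite gap can be illustrated by a small sketch (our own, with None standing for a truth-value gap; the example data are hypothetical):

```python
def TRUE(prop, w, t):
    """[0True_wt p]: total property of propositions; False on gaps and falsity."""
    return prop(w, t) is True

ex_smokers = {("w", "t"): {"Ann"}}   # pre-requisite: being an ex-smoker
stopped    = {("w", "t"): {"Ann"}}

def stopped_smoking(x):
    def prop(w, t):
        if x not in ex_smokers.get((w, t), set()):
            return None              # truth-value gap: x never smoked
        return x in stopped.get((w, t), set())
    return prop

p_ann, p_bob = stopped_smoking("Ann"), stopped_smoking("Bob")
print(p_bob("w", "t"))               # None: gap
print(TRUE(p_bob, "w", "t"))         # False, so the Req implication holds trivially
print(TRUE(p_ann, "w", "t"))         # True
```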

Since the topic of this paper is learning and refining concepts, we need to define the notion of a concept. In TIL, concepts are defined as closed constructions in their normal form. Referring for details to [8, §2.2], we briefly recapitulate. Concepts are meanings of semantically complete terms that do not contain indexicals or other pragmatically incomplete terms. In the case of the latter, we furnish a pragmatically incomplete expression with an open construction containing free variables.

An open construction cannot be executed unless a valuation of its free variables is supplied, usually by the situation of utterance. For instance, the meaning of the sentence "He is smart" is the open construction λwλt [0Smartwt he] (he →v ι), which cannot be evaluated until an individual is assigned to the free variable he as its valuation.5 Hence, we don't treat this open construction as a concept. Since concepts should be, at least in principle, executable in any state of affairs, we explicate them as closed constructions. However, our TIL constructions are a bit too fine-grained from the procedural point of view.

Some closed constructions differ so slightly that they are virtually identical. In a natural language we cannot even render their distinctness, which is caused by the role of λ-bound variables that lack a counterpart in natural languages. These considerations motivated the definition of the relation of procedural isomorphism on TIL constructions.6 Procedurally isomorphic constructions form an equivalence class from which we can choose a representative. To this end a normalization procedure has been defined that results in the unique normal form C of a construction; this normal form is the representative of the class of procedurally isomorphic constructions. Hence, we adopt this definition:

Definition (concept) A concept is a closed construction in its normal form.

For the sake of simplicity, in what follows we deal with concepts simply as closed constructions, ignoring the above technicalities, because we believe that this simplification is harmless for our purposes. The last notion we need to define is that of the refinement of a concept. Basically, by refining a simple concept 0O of an object O, we mean replacing 0O by an equivalent molecular concept C that produces the same object O. We also say that the molecular construction C is an ontological definition of the object O.

Here is an example. The Trivialization 0Prime is in fact the least informative procedure for producing the set of prime numbers.

Using particular definitions of the set of primes, we can refine the simple concept 0Prime in many ways, including:7

λx [[0Card λy [0Divide y x]] = 02],

λx [[x ≠ 01] ∧ ∀y [[0Divide y x] ⊃ [[y = 01] ∨ [y = x]]]],

λx [[x > 01] ∧ ¬∃y [[y > 01] ∧ [y < x] ∧ [0Divide y x]]].

The involved types are: ν, the type of natural numbers; Card/(ν(oν)): the cardinality of a set of natural numbers; Divide/(oνν): the relation of x being divisible by y; the other types are obvious.
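As a quick sanity check, the three refinements can be transcribed into Python predicates and verified to produce the same set of primes on an initial segment of the naturals (a sketch of ours, not part of the TIL machinery):

```python
def divisors(x):
    return {y for y in range(1, x + 1) if x % y == 0}

prime1 = lambda x: len(divisors(x)) == 2                        # exactly two factors
prime2 = lambda x: x != 1 and all(y == 1 or y == x for y in divisors(x))
prime3 = lambda x: x > 1 and not any(x % y == 0 for y in range(2, x))

sample = range(1, 200)
assert [x for x in sample if prime1(x)] == \
       [x for x in sample if prime2(x)] == \
       [x for x in sample if prime3(x)]
```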

Thus, we define:

Definition (refinement of a construction) Let C1, C2, C3 be constructions. Let 0X be a simple concept of an object X and let 0X occur as a constituent of C1. If C2 differs from C1 only by containing, in lieu of 0X, an ontological definition of X, then C2 is a refinement of C1. If C3 is a refinement of C2 and C2 is a refinement of C1, then C3 is a refinement of C1.

Corollary. If C2 is a refinement of C1, then C1, C2 are equivalent but not procedurally isomorphic.

For instance, the simple concept of primes is not procedurally isomorphic with the above refinements, of course, which are molecular concepts with a much richer structure than just 0Prime. As a result, the term 'prime' is not synonymous with its equivalents like 'the set of naturals with just two factors' or 'the set of naturals distinct from 1 that are divisible just by the number 1 and themselves', because the meanings of synonymous terms are procedurally isomorphic. Rather, 'prime' is only equivalent to these definitions. So much for our formalism and background theory.

3 Supervised Machine Learning

Supervised machine learning is a method of predicting functional dependencies between input values and the output value. The supervisor provides an agent/learner with a set of training data. These data describe an object by a set of attribute values such that there is a functional dependency between these values.

For instance, a house can be characterized by its size, locality, date of building, architecture style, etc., and its price. Obviously, the price of a house depends on its size, locality, date of building and architecture style. Hence, the price is called an output attribute and the other attributes are input attributes.

The goal of learning is to discover this functional dependency on the grounds of training data examples so that the agent can predict the value of the output attribute given the values of input attributes of a new instance.

More generally, where x1,…,xn are values of input attributes and y is an output attribute value, there is a function f such that y = f(x1,…,xn). The goal of the learning process is to discover a function h that approximates the function f as closely as possible. The function h is called a hypothesis. The learner creates hypotheses on the grounds of training data (input-output values) provided by the supervisor.

Correctness of the hypothesis is verified by using a set of test examples, given their input attributes. The hypothesis is plausible if the learner predicts the values of the output attribute with maximum accuracy.8 Since we decided to apply this method to learning concepts, we have to adjust the method a bit. First, instead of input/output attributes, we deal with concepts, that is, closed constructions. The role of input 'attributes' is played by the constituents of a hypothetic molecular concept, and instead of the output attribute we deal with the simple atomic concept that the learner aims to refine.

The hypothetic function is that of a requisite. Training data are natural-language texts. The supervisor extracts from the text data positive and negative examples. For instance, let the 'output' concept to be learned be that of a cat, i.e. 0Cat. The role of positive examples is played by particular descriptions of the property of being a cat like "Cat is a predatory mammal that has been domesticated". The learner establishes a hypothesis that the property:

λwλt λx [[[0Predatory 0Mammal]wt x] ∧ [0Domesticatedwt x]],

belongs to the essence of the property Cat. Negative examples delineate the hypothesis from other similar objects. For instance, the sentence "Dog is a domesticated predatory mammal that barks" can serve as a negative example for Cat. This triggers a specialization of the hypothetic concept to the construction:

λwλt λx [[[0Predatory 0Mammal]wt x] ∧ [0Domesticatedwt x] ∧ ¬[0Barkwt x]].

Hence, given a positive example, the learner refines the hypothetic molecular concept by adding other concepts to the essence, while a negative example triggers specialization of the hypotheses. The hypothetic concept can be also generalized. For instance, the learner can obtain the sentence "Cat is a wild feline predatory mammal" as another positive example describing the property Cat. Since the properties Wild and Domesticated are inconsistent, the agent consults his/her ontology for a more general concept. If there is none, the 'union' of the properties, Wild or Domesticated, is included. As a result, the learner obtains this hypothesis:

λwλt λx [[[0Feline [0Predatory 0Mammal]]wt x] ∧ [[0Domesticatedwt x] ∨ [0Wildwt x]] ∧ ¬[0Barkwt x]].

Remark. Both Feline and Predatory are property modifiers of type ((oι)τω(oι)τω), i.e. functions that, given an input property, return another property as output. Since these two modifiers are intersective, the rules of left- and right-subsectivity are applicable here.9 In other words, a predatory mammal is a predator and is a mammal; similarly for feline.

If our agent has these pieces of information in their knowledge base, the above Composition [[0Feline [0Predatory 0Mammal]]wt x] can be further refined to [[0Feline'wt x] ∧ [0Predatory'wt x] ∧ [0Mammalwt x]], where Feline' and Predatory' are properties of individuals, i.e. objects of type (oι)τω.
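The evolution of the cat hypothesis can be pictured with a simple data representation; the following sketch (ours) keeps the hypothesis as a set of signed property literals:

```python
# Hypothesis as a set of signed literals; the encoding is a simplification.
hypothesis = set()

# Positive: "Cat is a predatory mammal that has been domesticated"
hypothesis |= {("+", "Predatory"), ("+", "Mammal"), ("+", "Domesticated")}

# Near-miss: "Dog is a domesticated predatory mammal that barks"
# -> specialize by forbidding the distinguishing property:
hypothesis |= {("-", "Bark")}

# Positive: "Cat is a wild feline predatory mammal"; Wild conflicts with
# Domesticated and no more general concept is available, so take the 'union':
hypothesis -= {("+", "Domesticated")}
hypothesis |= {("+", "Feline"), ("+", "Domesticated-or-Wild")}

print(sorted(hypothesis))
```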

Generalization, specialization and conjunctive extension are all methods of refining a hypothetic concept; these are the methods that we are going to describe in the next section.

3.1 Refining Hypothesis Space

In our method we try to find the description of all plausible hypotheses that are consistent with the training data and are derivable from the provided examples.10 To this end we assume that there is no noise in the training data [14]. In other words, the examples supplied to the learner are adequate for the prediction of the refined concept.

Obviously, a learner can usually examine just a small finite training set of examples instead of a possibly infinite set of sample concepts. Hence, inductive learning is applied to obtain a hypothetic concept.11 In the process of inductive learning, the relation 'more general', defined on the set of hypotheses, is used. This relation is defined as follows. Let h1, h2 be hypothetic concepts defined on an input domain X. Then h1 is more general than h2, in symbols 'h2 ≤ h1', iff:

∀x∈X [[h2(x) = 1] ⊃ [h1(x) = 1]].

Note. By (hi(x) = 1) we mean that an object x falls under the concept hi in a given state of affairs. Hence, this simplified notation can be read as "all objects x that fall under the concept h2 fall also under the more general concept h1". The subset of hypotheses obtained by inductive learning which is consistent with the training set of examples is called version-space.
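The 'more general' relation is directly testable over a finite domain; here is a minimal sketch (the toy predicates and domain are our own illustration):

```python
def more_general(h1, h2, domain):
    """h2 <= h1: every x falling under h2 also falls under h1."""
    return all(h1(x) for x in domain if h2(x))

mammal = lambda x: x in {"cat", "dog", "horse"}
cat    = lambda x: x == "cat"
domain = {"cat", "dog", "horse", "sparrow"}

print(more_general(mammal, cat, domain))   # True:  cat <= mammal
print(more_general(cat, mammal, domain))   # False
```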

3.2 Algorithm Framework

All machine learning algorithms, no matter which family they belong to, can be characterized by common categories which form a framework [11]. The algorithms are characterized by task goals, training data, data representation, and a set of operators that manipulate the data representation. In our machine learning algorithm, the framework can be briefly described as follows.

Objective Goal. As mentioned above, the goal of an agent is to discover the best refinement of the learned simple concept of an object O, i.e. a molecular closed construction that produces the same object. Moreover, this molecular concept should specify as many of the requisites of the object O as possible, so that it also excludes other similar concepts.

Training data. An agent works with positive and negative examples that are sentences extracted by a supervisor from a textual base. Positive examples contain concepts of requisites specifying the learned simple concept, while negative examples specify properties that do not belong to the essence of the intension provided by the concept.

Data Representation. The agents must have an internal formal representation of data obtained by examples. Plausible hypotheses are then formulated in terms of this representation. Our formalism is that of Transparent Intensional Logic so that the sentences are analysed in terms of TIL constructions.

Knowledge Modifying Module. The learning algorithm is biased in favor of a preferred hypothesis. By using proper preferences, we reduce the hypothesis space. In version-space learning the bias is called a restriction bias, because the bias is obtained by restricting the allowable hypotheses. The agent uses a set of operations to modify the hypothesis during a heuristic search in the hypothesis space. The three main operations to modify a hypothetic concept are generalization, specialization and refinement. There are two ways to obtain a proper hypothesis. The first one is based on using merely positive examples. In this case we need to be sure that the examples cover the positive cases well; in other words, we need examples containing all and only the requisites of the learned concept.

The second way, for which we opt, uses both positive and negative examples. By applying specialization based on negative examples, we exclude overly general hypotheses.

4 Inductive Heuristics

For our purpose we opted for an adjusted version of Patrick Winston's algorithm [18] of supervised machine learning. This algorithm applies the principles of generalization and specialization to obtain a plausible hypothesis, i.e. the functional dependency between input and output attributes. In our case the main principle is the method of refining the output simple concept. Hence, instead of a functional dependency between input and output attributes, we are looking for molecular concepts refining the output simple concept; the constituents of the molecular concept are related to the output concept by the requisite relation. Winston's algorithm assumes that examples differ from the model in just one attribute, while in our case we develop the molecular concept by adding new constituents contained in example sentences describing, or rather refining, the output concept. Hence our algorithm does not compare a model with examples; rather, it compares the hypothetic concept with the information in sample sentences.

As stated above, our main method is refinement of a concept, i.e. a hypothetic molecular construction. Based on positive examples we extend the collection of requisites by adding missing concepts in a conjunctive way. As a special case, generalization can be applied. Based on agents' ontologies, generalization usually concerns replacing one or more constituents of the hypothetic concept by a more general one.

Specialization is triggered by negative examples. As a result, negation of a property that does not belong to the essence of the hypothetic concept is inserted. Specialization serves to distinguish the output concept from similar ones. For instance, a wooden horse can serve as a negative example to the concept of horse, because a wooden horse is not a horse; rather, it is a toy horse though it may look like a genuine living horse.

Heuristic methods of the original Winston algorithm work with examples that cover all the attributes of a learned object. Based on positive examples the hypothesis is modified in such a way that the values of attributes are adjusted, or in case of a negative example an unwanted attribute marked as Must-not-be is inserted.

In our application, the sentences that mention the learned concept contain as constituents some but not all of the requisites of this concept, and we build up a new molecular concept by adding new information extracted from positive or negative examples. Hence, we had to implement a new heuristic, Concept-introduction, for inserting concepts of new requisites into a hypothetic concept. Negative examples trigger the method Forbid-link, which inserts the concept of a negated property into the hypothesis. Generalization is realized by modules that introduce a concept of a more general property; to this end we also adjusted the original heuristic Close-interval so that it is possible to generalize values of numeric concepts by the union of interval values from an example and the model.12 Here is a brief specification of the algorithm (a condensed code sketch follows the specification).

Refinement

  • 1. Compare the model hypothesis (to be refined) and the positive example to find a significant difference

  • 2. If there is a significant difference, then

    • a) if the positive example contains as its constituent a concept that the model does not have, use the Concept-introduction

    • b) else ignore example

Specialization

  • 1. Compare the model hypothesis (to be refined) and the near-miss example to find a significant difference

  • 2. If there is a significant difference, then

    • a) if the near-miss example has as a constituent a concept that the model does not have, use the Forbid-link

    • b) else ignore example

Generalization

  • 1. Compare the model hypothesis (to be refined) and the positive example to determine a difference

  • 2. For each difference do

    • a) if a concept in the model points at a value that differs from the value in the example, then

      • i) if the properties in which the model and example differ have the most specific general property, use the Climb-tree

      • ii) else use Set-union

    • b) if the model and example differ at an attribute numerical value or interval, use the Close-interval

    • c) else ignore example.
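The following condensed sketch shows one possible implementation of the three procedures, under simplifying assumptions of our own: a hypothesis is a set of constituents, an example carries a polarity plus the constituents extracted by the supervisor, and the ontology maps pairs of properties to their most specific common generalization, if any:

```python
from dataclasses import dataclass

@dataclass
class Example:
    polarity: str            # "+" for positive, "-" for near-miss
    constituents: frozenset

def refine(model, example):
    """Refinement: Concept-introduction inserts missing constituents."""
    return model | (example.constituents - model)

def specialize(model, near_miss):
    """Specialization: Forbid-link inserts negated constituents."""
    return model | {("not", c) for c in near_miss.constituents - model}

def generalize(model, old, new, ontology):
    """Generalization: Climb-tree if the ontology has a common concept,
    otherwise Set-union (disjunction of the two properties)."""
    g = ontology.get(frozenset({old, new}))
    return (model - {old}) | ({g} if g else {("or", old, new)})

# Toy run: Climb-tree replaces Glasses by Correction (cf. Section 4):
ontology = {frozenset({"Glasses", "Lenses"}): "Correction"}
print(generalize({"Glasses"}, "Glasses", "Lenses", ontology))  # {'Correction'}
```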

5 Example of Learning the Concept of Myopia

As a sample example, we now introduce the process of learning refinements of the simple concept of myopia, i.e. 0Myopia, by extracting information from natural language sentences describing the property of having myopia. As always, first the types: Myopia/(oι)τω; Sharp, Blur, Disorder, Eye-Nerve, Eye-Lenses/(oι)τω; Eye-Focus, Damaged, Inflexible/((oι)τω(oι)τω); Close, Distant, Looking-at/(oιι)τω; x, y → ι; Req/(o(oι)τω(oι)τω); ∃/(o(oι)).

Positive examples trigger the heuristic Concept-introduction that inserts a concept of a new requisite into the concept learned so far:

  • 1. In myopia, close objects look sharp:

[0Req λwλt λx ∀y [[[0Looking-atwt x y] ∧ [0Closewt x y]] ⊃ [0Sharpwt y]] 0Myopia].

  • 2. In myopia, distant objects appear blurred:

[0Req λwλt λx ∀y [[[0Looking-atwt x y] ∧ [0Distantwt x y]] ⊃ [0Blurwt y]] 0Myopia].

  • 3. Myopia is an eye focusing disorder:

[0Req [0Eye-Focus 0Disorder] 0Myopia].

Negative examples trigger the heuristic Forbid-link that inserts negative information into the concept learned so far:

  • 1. Cause of myopia is not a damaged eye-nerve:

¬[0Cause [0Damaged 0Eye-Nerve] 0Myopia].

  • 2. Cause of myopia is not inflexible eye lenses:

¬[0Cause [0Inflexible 0Eye-Lenses] 0Myopia].

Cause/(o(oι)τω(oι)τω): a relation between properties; for the sake of simplicity, we analyse 'cause of something' just as such a relation, though we are aware of the fact that the problem of the semantics of causal relations is much more complicated. Yet such a simplification is harmless in our example.

Simulation of the Algorithm Execution

The algorithm creates, step by step, a molecular concept that will serve as an explication of the learned simple concept. Each positive/negative example yields a conjunctive/disjunctive or negative insertion of new constituents into the hypothesis learned so far. The execution of our algorithm begins with the first chosen positive example.

The construction encoded by this sentence (see above) becomes an initial hypothesis model:

"In myopia, close objects look sharp."

The second positive example:

"In myopia distant objects appear blur."

refines the model by Concept-introduction. This heuristic module inserts a new concept into the hypothetic model in a conjunctive way. As a result, we have the hypothetic model "In myopia, close objects look sharp and distant objects look blurred":

[0Req λwλt λx ∀y [[[0Looking-atwt x y] ∧ [0Closewt x y]] ⊃ [0Sharpwt y]] 0Myopia],

[0Req λwλt λx ∀y [[[0Looking-atwt x y] ∧ [0Distantwt x y]] ⊃ [0Blurwt y]] 0Myopia].

The last positive example:

"It is an eye focusing disorder."

also refines the model by Concept-introduction:

[0Req λwλt λx ∀y [[[0Looking-atwt x y] ∧ [0Closewt x y]] ⊃ [0Sharpwt y]] 0Myopia],

[0Req λwλt λx ∀y [[[0Looking-atwt x y] ∧ [0Distantwt x y]] ⊃ [0Blurwt y]] 0Myopia],

[0Req [0Eye-Focus 0Disorder] 0Myopia].

The first negative example "The cause of myopia is not a damaged eye nerve" triggers specialization of the hypothesis. As a result, we insert negative information about myopia:

[0Req λwλt λx ∀y [[[0Looking-atwt x y] ∧ [0Closewt x y]] ⊃ [0Sharpwt y]] 0Myopia],

[0Req λwλt λx ∀y [[[0Looking-atwt x y] ∧ [0Distantwt x y]] ⊃ [0Blurwt y]] 0Myopia],

[0Req [0Eye-Focus 0Disorder] 0Myopia],

¬[0Cause [0Damaged 0Eye-Nerve] 0Myopia].

The second negative example "Myopia is not caused by inflexible eye lenses" also specializes the concept. The resulting molecular concept is:

[0Req λwλt λx ∀y [[[0Looking-atwt x y] ∧ [0Closewt x y]] ⊃ [0Sharpwt y]] 0Myopia],

[0Req λwλt λx ∀y [[[0Looking-atwt x y] ∧ [0Distantwt x y]] ⊃ [0Blurwt y]] 0Myopia],

[0Req [0Eye-Focus 0Disorder] 0Myopia],

¬[0Cause [0Damaged 0Eye-Nerve] 0Myopia],

¬[0Cause [0Inflexible 0Eye-Lenses] 0Myopia].

In this example we have not yet dealt with generalization. Generalization triggers one of three heuristics, namely Climb-tree, Set-union or Close-interval. These heuristic modules adjust the hypothetic concept in the following way.

The Climb-tree heuristic module replaces two or more constituents of the concept learned so far by a more general constituent. To this end an agent must have a tree model organizing the concepts in the agent's ontology. The concepts in the model are ordered with respect to the requisite relation. For instance, consider the concepts of the properties of having an eye correction, wearing dioptric glasses, and having contact lenses. Necessarily, if an individual x happens to have an eye correction, then x wears dioptric glasses or contact lenses, or perhaps has some other eye correction.

Hence, the property of having an eye correction is a requisite of both the property of wearing dioptric glasses and that of having contact lenses. Let Correction/(oι)τω be the property of having an eye correction, Glasses/(oι)τω the property of wearing dioptric glasses, and Lenses/(oι)τω the property of having contact lenses.

Then the concept 0Correction is more general than 0Glasses and more general than 0Lenses, i.e. the property Correction is a requisite of the properties Glasses and Lenses.
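The requisite ordering that Climb-tree exploits can be sketched as a search for the most specific common generalization in a (hypothetical) requisite graph; the dictionary below is our own toy ontology:

```python
requisites = {                      # property -> its direct requisites
    "Glasses": {"Correction"},
    "Lenses": {"Correction"},
    "Correction": set(),
}

def ancestors(p):
    """All properties reachable from p along the requisite relation."""
    seen, todo = set(), [p]
    while todo:
        for r in requisites.get(todo.pop(), set()):
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return seen

def most_specific_general(p1, p2):
    common = ancestors(p1) & ancestors(p2)
    # keep only the minimal elements of the common ancestors
    return {c for c in common if not any(c in ancestors(d) for d in common)}

print(most_specific_general("Glasses", "Lenses"))   # {'Correction'}
```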

Hence, assume that a hypothetic model learned so far has a requisite constituent13:

[0Req 0Glasses 0Myopia].

Since this construction is too specific to properly explicate Myopia (having myopia, one can have contact lenses or have undergone eye-surgery), assume that the new positive example provided by the supervisor is:

"If somebody has myopia then

he/she has contact lenses".

It means that the property 0Lenses should also be inserted as a requisite constituent. However, the model would then be too specific: if somebody has myopia, he/she might have another eye correction.

Therefore, we have to generalize, which is the task that the algorithm performs using the agent's ontology. As a result, the new generalized model is adjusted so that the requisite conjunct [0Req 0Glasses 0Myopia] is replaced by the more general conjunct:

[0Req 0Correction 0Myopia].

The Set-union heuristic is applied in case we need to generalize several concepts C1, C2, C3, … of properties but there is no most specific common general concept in the agent's ontology. Generalization is achieved by inserting the respective concepts in a disjunctive way. For instance, if an agent has the hypothesis that the symptom of myopia is a headache, [0Symptom 0Headache 0Myopia], and by a new positive example the agent learns that the symptom of myopia is an eye-ache, the new version of the hypothesis will have a constituent stating that the symptom of myopia is Headache or Eye-ache:

[0Symptom λwλt λx [[0Headachewt x] ∨ [0Eye-achewt x]] 0Myopia].

Types. Symptom/(o(oι)τω(oι)τω): the relation between properties P, Q such that typically, if x has the property Q then x also has P; Headache, Eye-ache/(oι)τω.14

The Close-interval heuristic module deals with attributes that have numerical values. For instance, in Wikipedia we can read:

The degree of myopia is described in terms of the power of the ideal correction, which is measured in dioptres:

  • ― Low myopia usually describes myopia of -3.00 dioptres or less (i.e. closer to 0.00).

  • ― Moderate myopia usually describes myopia between -3.00 and -6.00 dioptres.

  • ― High myopia usually describes myopia of -6.00 or more.

To simplify a bit, we just assume that the requisite part of the molecular concept has been inductively enriched by these constituents. Let PoC/(τι)τω be the attribute that associates an individual with a number that is the power of correction (measured in dioptres).

Remark. Attributes that have numerical values associate an individual with a number. Yet, the situation is a bit more complicated. We also need information on the unit in which this number has been obtained. In our example, the agents should know that the power of correction is measured in dioptres. Since the issue concerning physical, medical and other units has still not been properly dealt with in TIL, our agents simply keep these pieces of information in their ontology. Hence, when an agent learns that among the requisites of myopia there is the property of having power of correction equal to -5, the agent associates this number with dioptres.

The additional constituents of our hypothetic concept are these.

[0Req λwλt λx [[0-3.00 < [0PoCwt x]] ∧ [[0PoCwt x] < 00.00]] 0Myopia],

[0Req λwλt λx [[0-6.00 < [0PoCwt x]] ∧ [[0PoCwt x] < 0-3.00]] 0Myopia],

[0Req λwλt λx [[0PoCwt x] < 0-6.00] 0Myopia].

By applying generalization, i.e. the Close-interval heuristic, we obtain a generalized constituent, namely that the power of correction is negative:

[0Req λwλt λx [[0PoCwt x] < 00.00] 0Myopia].
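In code, the Close-interval heuristic amounts to taking unions of touching intervals; a minimal sketch under our own representation of the PoC constraints:

```python
NEG_INF = float("-inf")

def close_interval(a, b):
    """Union of two overlapping or adjacent (low, high) intervals."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    assert max(a_lo, b_lo) <= min(a_hi, b_hi), "intervals must touch"
    return (min(a_lo, b_lo), max(a_hi, b_hi))

low      = (-3.0, 0.0)       # low myopia:      -3.00 < PoC < 0.00
moderate = (-6.0, -3.0)      # moderate myopia: -6.00 < PoC < -3.00
high     = (NEG_INF, -6.0)   # high myopia:              PoC < -6.00

merged = close_interval(close_interval(high, moderate), low)
print(merged)                # (-inf, 0.0): the power of correction is negative
```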

The resulting explication of 'myopia' that our agent has learned is this:

[0Req λwλt λx ∀y [[[0Looking-atwt x y] ∧ [0Closewt x y]] ⊃ [0Sharpwt y]] 0Myopia],

[0Req λwλt λx ∀y [[[0Looking-atwt x y] ∧ [0Distantwt x y]] ⊃ [0Blurwt y]] 0Myopia],

[0Req [0Eye-Focus 0Disorder] 0Myopia],

¬[0Cause [0Damaged 0Eye-Nerve] 0Myopia],

¬[0Cause [0Inflexible 0Eye-Lenses] 0Myopia],

[0Req 0Correction 0Myopia],

[0Symptom λwλt λx [[0Headachewt x] ∨ [0Eye-achewt x]] 0Myopia],

[0Req λwλt λx [[0PoCwt x] < 00.00] 0Myopia].

Obviously, our agent has learned a lot about myopia. Yet we hesitate to claim that the agent has learned a definition of the property of having myopia. The example we presented here is an idealized one.

In practice, many more difficulties can crop up. The supervisor could have classified particular constituents improperly, for instance by assigning them the role of a requisite where they might have been just typical properties. Or the supervisor might have confused symptoms and causes of myopia. Last but not least, pieces of information extracted from the text data might have come from an unreliable source. Anyway, we can conclude that the agent discovered a useful explication of the simple concept of myopia.

6 Conclusion

In this paper we introduced the basic principles of supervised machine learning, namely the method of refining hypotheses by means of positive and negative examples.

The process of refinement of a given hypothesis triggered by positive and near-miss examples together with heuristic functions that modify the hypothesis has been described and illustrated by examples. We applied an adjusted version of Patrick Winston's data driven algorithm for machine learning.

The area under scrutiny has been agents' learning of simple concepts by refining them. In other words, our agents learn new concepts by discovering compound concepts that explicate a given simple concept. The method itself has been illustrated by the example of an agent's learning the simple concept of myopia. Our data have been formalized by means of the TIL tools, namely constructions and types produced by the NLA algorithm [12].

The proposed machine learning method relies heavily on the role of a supervisor. For success in learning, it is important that the supervisor extracts from a given text those sentences that mention the concept in a way plausible for learning. Moreover, there should not be any noise in these input data, and the supervisor should properly classify these sentences into positive and negative examples and properly recognise those properties that are requisites. Hence, we assume that the role of the supervisor is played by an experienced linguist. As future research, we intend to extend the functionality of the algorithm so that it will also cover the extraction of sample sentences in which the learned output concept receives mention.

Though there is no substitute for a supervisor in a supervised machine learning method, the supervisor's role can be at least partly played by the algorithm so that the manual work of a linguist is reduced to a minimum.

Our next goal is to improve the method so that the agents would learn synonymous terms referring to the same concept and distinguish them from merely equivalent ones. This is important for properly dealing with hyperintensional attitudes of knowing, believing, designing, calculating, solving, etc.

These attitudinal verbs are part and parcel of our everyday vernacular so that their proper analysis and logic should not be missing from any automatized multiagent system. And since these attitudinal verbs establish hyperintensional contexts where the substitution of merely equivalent terms fails, the agents need to know the synonyms of the learned concepts as well.

Acknowledgements

This research has been supported by the Grant Agency of the Czech Republic, project No. GA18-23891S, "Hyperintensional Reasoning over Natural Language Texts" and also by the internal grant agency of VSB-Technical University Ostrava, project No. SP2019/40, "Application of Formal Methods in Knowledge Modelling and Software Engineering II". Versions of this paper were presented at the 20th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2019, France.

References

1. Carnap, R. (1947). Meaning and Necessity. Chicago: Chicago University Press.

2. Číhalová, M., Duží, M., & Menšík, M. (2014). Logical Specification of Processes. Frontiers in Artificial Intelligence and Applications, Vol. 260, pp. 45-63.

3. Duží, M. (2012). Extensional logic of hyperintensions. Lecture Notes in Computer Science, Vol. 7260, pp. 268-290. DOI: 10.1007/978-3-642-28279-9-19.

4. Duží, M. (2014). Communication in a Multi-Cultural World. Organon F, Vol. 21, No. 2, pp. 198-218.

5. Duží, M. (2017). Property modifiers and intensional essentialism. Computación y Sistemas, Vol. 21, No. 4, pp. 601-613. DOI: 10.13053/CyS-21-4-2811.

6. Duží, M. (2018). Logic of Dynamic Discourse; Anaphora Resolution. Frontiers in Artificial Intelligence and Applications, Vol. 301, pp. 263-279. DOI: 10.3233/978-1-61499-834-1-263.

7. Duží, M. (2019). If structured propositions are logical procedures then how are procedures individuated? Synthese, special issue on the Unity of Propositions, Vol. 196, No. 4, pp. 1249-1283. DOI: 10.1007/s11229-017-1595-5.

8. Duží, M., Jespersen, B., & Materna, P. (2010). Procedural Semantics for Hyperintensional Logic. Foundations and Applications of Transparent Intensional Logic. Berlin: Springer.

9. Francez, N. (2015). Proof-theoretic Semantics. Studies in Logic 57, College Publications.

10. Kovář, V., Baisa, V., & Jakubíček, M. (2016). Sketch Engine for Bilingual Lexicography. International Journal of Lexicography, Vol. 29, No. 3, pp. 339-352.

11. Luger, G. F. (2009). Artificial Intelligence: Structures and Strategies for Complex Problem Solving. Addison-Wesley.

12. Medved', M., Šulganová, T., & Horák, A. (2017). Multilinguality Adaptations of Natural Language Logical Analyzer. Proceedings of the 11th Workshop on Recent Advances in Slavonic Natural Language Processing (RASLAN'17), pp. 51-58.

13. Menšík, M., Duží, M., Albert, A., Patschka, V., & Pajr, M. (2020). Machine learning using TIL. To appear in Frontiers in Artificial Intelligence and Applications.

14. Mitchell, T. M. (1997). Machine Learning. McGraw-Hill.

15. Poole, D. L. & Mackworth, A. K. (2010). Artificial Intelligence: Foundations of Computational Agents. Cambridge University Press.

16. Russell, S. J. & Norvig, P. (2014). Artificial Intelligence: A Modern Approach. Pearson Education.

17. Tichý, P. (1988). The Foundations of Frege's Logic. Berlin: de Gruyter.

18. Winston, P. H. (1992). Artificial Intelligence. Addison-Wesley.

1As an example, consider the simple concept of a planet. Which property falls under this concept? For sure, it is a property of individuals such that being a celestial body is a requisite of the property. Necessarily, if any individual x happens to be a planet, then x is a celestial body. However, which are the other requisites? One of the results of the IAU 2006 General Assembly in Prague was resolution 5A on the 'Definition of a Planet'. A 'planet' was defined as a celestial body that (a) is in orbit around the Sun, (b) has sufficient mass for its self-gravity to overcome rigid body forces so that it assumes a hydrostatic equilibrium (nearly round) shape, and (c) has cleared the neighbourhood around its orbit. Is this definition a refinement of the original concept of a planet? No, it is just an explication. As a secondary result, it was also decided that the modifier 'dwarf' is privative with respect to 'planet' (see [5]); a dwarf planet like Pluto is not a planet. For details see the IAU (International Astronomical Union) press release iau0603: https://www.iau.org/news/pressreleases/detail/iau0603/

2For details, see [10], [12].

3See, for instance, [9].

4For details see [8, §4.1]

5If such a sentence occurs in a broader discourse, its meaning can be completed by anaphoric references as well. For instance, in "John is a student, he is smart" the meanings are not pragmatically incomplete, because the individual John is substituted for the anaphoric variable he. For details on resolving anaphoric references in TIL, see [6].

6For details, see [7].

7For the sake of simplicity, here we again use infix notation without Trivialization for the application of the binary relations >, < and the identity = between numbers.

8For details, see [14,15,16].

9For details and analysis of other kinds of modifiers, see [5].

10A hypothesis is consistent with the training data, i.e. the set S of examples, if the value predicted by the hypothesis equals the value of the output attribute for all examples belonging to S.

11For details on and definition of inductive learning see, e.g., [14, §2.2.2, p. 23].

12For the sake of simplicity, we did not change the original names of the particular modules, though we no longer work with 'links' between objects and attribute values. The heuristics Require-link and Drop-link from the original algorithm have not been used in our adjusted version.

13We assume that the supervisor would not stop the learning process at this stage, of course, because this hypothesis is too specific and calls for generalization.

14We don't deal with the problem of defining the simple concept 0Symptom. While a requisite is a necessary relation between properties, a symptom is just a typical relation between properties.

Received: January 18, 2019; Accepted: February 21, 2019

* Corresponding author is Marek Menšík. marek.mensik@vsb.cz.

This is an open-access article distributed under the terms of the Creative Commons Attribution License.