Computación y Sistemas

On-line version ISSN 2007-9737. Print version ISSN 1405-5546.

Comp. y Sist. vol.24 n.1 Ciudad de México Jan./Mar. 2020  Epub Sep 27, 2021

https://doi.org/10.13053/cys-24-1-2940 

Articles

Presburger Constraints in Trees

Everardo Bárcenas1  * 

Edgard Benítez-Guerrero2 

Jesús Lavalle3 

Guillermo Molero-Castillo1 

1 Universidad Nacional Autónoma de México, México. ebarcenas@unam.mx, gmoleroca@fi-b.unam.mx

2 Universidad Veracruzana, México. edbenitez@uv.mx

3 Benemérita Universidad Autónoma de Puebla, México. jlavalle@cs.buap.mx


Abstract

The fully enriched μ-calculus is an expressive propositional modal logic with least and greatest fixed-points, nominals, inverse programs and graded modalities. Several fragments of this logic are known to be decidable in EXPTIME. However, the full logic is undecidable. Nevertheless, it has been recently shown that the fully enriched μ-calculus is decidable in EXPTIME when its models are finite trees. In the present work, we study the fully-enriched μ-calculus for trees extended with Presburger constraints. These constraints generalize graded modalities by restricting the number of children nodes with respect to Presburger arithmetic expressions. We show that this extension is decidable in EXPTIME. In addition, we also identify decidable extensions of regular tree languages (XML schemas) with interleaving and counting operators. This is achieved by a linear characterization in terms of the logic. Regular path queries (XPath) with Presburger constraints on children paths are also characterized. These results imply new optimal reasoning (emptiness, containment, equivalence) bounds on counting extensions of XPath queries and XML schemas.

Keywords: Presburger arithmetic; modal logics; automated reasoning; XPath; regular languages; interleaving

1 Introduction

The μ-calculus is an extension of propositional modal logic with least and greatest fixed-points. This logic subsumes many temporal, modal and description logics (DLs), such as Propositional Dynamic Logic (PDL) and Computation Tree Logic (CTL) [10,9]. Due to its expressive power and nice computational properties, the μ-calculus has been extensively used in many areas of computer science, such as program verification, concurrent systems and knowledge representation. In this last domain, the μ-calculus has been particularly useful in the identification of expressive and computationally well-behaved DLs [10], which underlie the web ontology language OWL, now a standard W3C technology. Another W3C standard is the XPath query language for XML.

XPath also plays an important role in many XML technologies, such as XProc, XSLT and XQuery. Due to its capability to express recursive and multi-directional navigation, the μ-calculus has also been successfully used as a framework for the evaluation of, and reasoning about, XPath queries [5,4,12]. However, extending XPath queries with (Presburger) arithmetical constraints leads to undecidability [36]. In the current paper, we identify a decidable extension, via a logic characterization, of XPath queries with Presburger arithmetical constraints on children paths.

The μ-calculus is as expressive as monadic second order logic (MSO) [12]; it has thus been successfully used in the XML setting for the description of schema languages [4], which can be seen as the tree version of regular expressions. Just as regular expressions are interpreted as sets of strings, XML schemas (regular tree expressions) are interpreted as sets of unranked trees (XML documents). For example, the expression p(q*) represents the set of trees rooted at p with zero or more children subtrees matching q. See Figure 1(a) for an interpretation of p(q*). Counting operators impose occurrence bounds on children [28]. For instance, p(q[3,5]) denotes the trees rooted at p with at least 3 but no more than 5 children matching q. An interpretation of p(q[3,5]) is depicted in Figure 1(b). In the present work, we propose a new EXPTIME decision algorithm for regular tree languages with counting operators. Furthermore, we also identify a new extension of regular tree languages with the interleaving operator [27]. This extension also has an EXPTIME upper bound. The interleaving operator is used to denote the trees resulting from the permutation of siblings, and it also has applications in algebraic approaches to concurrent computation models. Interleaving operators also appear in XML schema languages, such as RelaxNG.

Fig. 1 Tree expressions 

1.1 Related Work

The extension of the μ-calculus with nominals, inverse programs and graded modalities is known as the fully enriched μ-calculus [10]. Nominals are intuitively interpreted as singleton sets, inverse programs are used to express past properties (backward navigation along accessibility relations), and graded modalities express numerical constraints on the number of immediate successor nodes [10]. However, satisfiability/validity of the fully enriched μ-calculus was proven by Bonatti and Peron to be undecidable [11]. Nevertheless, Barcenas et al. [4] recently showed that the fully enriched μ-calculus is decidable in single exponential time when its interpretations are restricted to finite trees. Graded modalities in the context of trees are used to constrain the number of children nodes with respect to natural numbers. In this work, we introduce a generalization of graded modalities.

This generalization considers numerical bounds on children with respect to Presburger arithmetical expressions, as for instance ϕ>ψ, which restricts the number of children where ϕ holds to be strictly greater than the number of children where ψ is true. Other works have previously considered Presburger constraints on tree logics. MSO with Presburger constraints was shown to be undecidable by Seidl et al. [31]. Demri and Lugiez proved a PSPACE-complete bound on the propositional modal logic with Presburger constraints [16]. A tree logic with a fixed-point and Presburger constraints was shown to be decidable in EXPTIME by Seidl et al. [33,32]. In the current work, we push decidability further by allowing nominals and inverse programs in addition to Presburger constraints.

Regarding expressiveness, an extension of MSO with counting on unranked trees, together with the corresponding weighted automata, is proposed in [17]. Other forms of counting considering more extensive tree regions, such as ancestors or descendants, have been studied in [37,2,8,39,5,34,1,24]; however, in contrast with the current work, these works consider node occurrence constraints with respect to natural numbers only. An extension of first order logic with two variables (FO2) with counting quantifiers, interpreted over two forests of ranked trees, was recently proposed in [13,14]. The counting quantifiers consist of existential restrictions # k with respect to a constant k, where # ∈ {≤, ≥, =}. This logic was shown to be in NEXPTIME.

As in this article, Fischer-Ladner satisfiability algorithms for counting extensions of the μ-calculus for trees are introduced in [4,5]. In these works, it was also shown that counting constraints with respect to numbers only, although exponentially more succinct, do not introduce extra expressive power. This allowed the use of the traditional Fixed-Point Theorem for the μ-calculus to prove the correctness of the algorithm. In contrast, in the current work we propose a counting extension of the μ-calculus with full Presburger arithmetic, which clearly implies more expressive power. In order to prove the correctness of the algorithm, we prove a generalization of the Fixed-Point Theorem for the Presburger extension of the μ-calculus for trees, which is also a technical result of independent interest.

Complexity and succinctness of regular tree (string) languages extended with counting and interleaving operators have been extensively studied in [28,27,19,4,15]. In [19], Gelade shows that the interleaving operator is exponentially more succinct, even when it is directly encoded by tree automata. It is also shown that hardcoding counting operators produces doubly exponentially larger expressions. Meyer and Stockmeyer showed in [28] that the equivalence of regular expressions with counting operators is EXPSPACE-complete. EXPSPACE-completeness of the equivalence of regular expressions with interleaving was proven in [27].

In [4], an extension of regular tree expressions with counting and interleaving is described, where emptiness, containment and equivalence are decidable in EXPTIME. In the current work, we identify further extensions of interleaving occurring in recursive and disjunctive fragments. Emptiness, containment and equivalence are also proved to be in EXPTIME.

Regarding counting, the operators introduced in [4], although they impose occurrence bounds on children, do not restrict the consecutive occurrence of children. This contrasts with the traditional semantics of counting in regular languages [28,19]. In the current work, by means of Presburger constraints, we identify an extension of regular tree expressions with traditional counting operators that is decidable in EXPTIME. In [15], a polynomial algorithm is introduced for the containment of regular trees with both counting and interleaving. The algorithm assumes two main restrictions on tree expressions: propositions occur only once, and counting can be applied to propositions only. The EXPTIME algorithm proposed in the current work overcomes these two limitations.

Counting operators on regular paths (XPath) have been studied before in [36,4,5]. ten Cate and Marx [36] showed that Presburger constraints on full regular paths lead to an undecidable formalism. In [4], it was shown that when restricting only children with respect to a natural number (encoded in binary), reasoning (emptiness, containment and equivalence) is in EXPTIME. This result was extended to operators capable of constraining (w.r.t. a binary number) any regular path, including ancestors, descendants, compositions, etc. [5]. In this paper, we show that full Presburger arithmetic becomes decidable on children paths. Furthermore, we set new optimal EXPTIME bounds for containment and equivalence on full (multi-directional) regular paths with Presburger constraints on children.

1.2 Outline

We introduce a modal logic for trees with fixed-points, inverse programs, and Presburger constraints in Section 2.

In Section 3, an EXPTIME satisfiability algorithm is described and proved correct. Also, it is shown that the computational cost of the algorithm is single exponential time with respect to the size of the input formula, even if the Presburger constraints are encoded in binary form.

In Section 4, we introduce extensions of regular tree languages with counting and interleaving operators.

In Section 5, regular path queries extended with Presburger constraints on children are introduced. Both extensions of regular trees and paths are shown to be succinctly captured in terms of logic formulas. This implies the satisfiability algorithm can be used as an optimal reasoning framework for regular trees and paths with counting and interleaving.

A summary together with a discussion of further research perspectives are reported in Section 6.

2 Tree Logic with Recursion, Inverse, and Presburger Constraints

In this section, we introduce an expressive modal logic for finite unranked tree models. The tree logic (TL) is equipped with operators for recursion (μ), inverse programs (I), and Presburger constraints (C). This logic can be seen as the fully enriched μ-calculus [10], extended with Presburger constraints, interpreted over tree structures.

2.1 Syntax and Semantics

We first consider a set of propositions P, a finite set of modalities M={↓,→,↑,←}, and a set of variables X.

Definition 1 (μTLIC syntax). The set of μTLIC formulas is defined by the following grammar:

\[
\phi ::= p \mid x \mid \neg\phi \mid \phi \vee \phi \mid \langle m \rangle \phi \mid \mu x.\phi \mid \gamma > b, \qquad \gamma ::= a\phi \mid \gamma + \gamma,
\]

where p ∈ P, x ∈ X, m ∈ M, b ∈ ℕ, k ∈ ℕ \ {0,1}, and a ∈ ℤ \ {0}. Numbers a, k and b are assumed to be encoded in binary. We make the following assumptions about variable occurrences. As usual, in order to ensure the existence of fixed-points, we assume variables occur positively, that is, under the scope of an even number of negations [38]. Variables are also guarded, that is, variables occur bound (μ is the only binding operator) and under the scope of a modal formula ⟨m⟩ϕ or a counting formula (γ > b) [38]. Finally, in order to make the least and greatest fixed-points coincide, we assume variables are cycle-free, that is, variables do not occur under the scope of both a modality and its converse [20].

In order to provide a formal semantics, we need some preliminaries. A tree structure T is a tuple (P, N, R, L), where:

  • P is a set of propositions;

  • the set of nodes N is a complete prefix-closed non-empty finite set of words over the natural numbers ℕ, that is, N is a finite set of words N ⊆ ℕ* such that if n·i ∈ N, where n ∈ ℕ* and i ∈ ℕ, then also n ∈ N;

  • R ⊆ N × M × N is a transition relation, written n ∈ R(n′, m), such that for all n·i, n·(i+1) ∈ N with i ∈ ℕ, we have n ∈ R(n·i, ↑), n·i ∈ R(n, ↓), n·(i+1) ∈ R(n·i, →) and n·i ∈ R(n·(i+1), ←); we say n is the parent of n·i, hence n·i is a child of n, n·(i+1) is the following (right) sibling of n·i, and hence n·i is the previous (left) sibling of n·(i+1); and

  • L ⊆ N × P is a left-total labeling relation, written p ∈ L(n).

In the setting of XML documents (unranked trees), node labels are defined by a function instead of a relation, that is, exactly one proposition holds at each node. The satisfiability algorithm described in Section 3 can easily be adapted for this restriction (see Definition 6).

Given a tree structure, a valuation V of variables is a function from the set of variables X to sets of nodes, V: X → 2^N. For a set of nodes N′ ⊆ N, we write V[N′/x] to denote the valuation V′ such that V′(x) = N′ and V′(x′) = V(x′) for x′ ≠ x.

Definition 2 (μTLIC semantics). Given a tree structure T and a valuation V, μTLIC formulas are interpreted as follows:

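\[
\begin{aligned}
[\![p]\!]^{T}_{V} &= \{ n \mid p \in L(n) \}, \\
[\![x]\!]^{T}_{V} &= V(x), \\
[\![\neg\phi]\!]^{T}_{V} &= N \setminus [\![\phi]\!]^{T}_{V}, \\
[\![\phi \vee \psi]\!]^{T}_{V} &= [\![\phi]\!]^{T}_{V} \cup [\![\psi]\!]^{T}_{V}, \\
[\![\langle m \rangle \phi]\!]^{T}_{V} &= \{ n \mid R(n,m) \cap [\![\phi]\!]^{T}_{V} \neq \emptyset \}, \\
[\![\mu x.\phi]\!]^{T}_{V} &= \bigcap \{ N' \subseteq N \mid [\![\phi]\!]^{T}_{V[N'/x]} \subseteq N' \}, \\
[\![\gamma > b]\!]^{T}_{V} &= \{ n \mid [\![\gamma]\!]^{T}_{V}(n) > b \}, \\
[\![a\phi]\!]^{T}_{V}(n) &= a \times | R(n,\downarrow) \cap [\![\phi]\!]^{T}_{V} |, \\
[\![\gamma_1 + \gamma_2]\!]^{T}_{V}(n) &= [\![\gamma_1]\!]^{T}_{V}(n) + [\![\gamma_2]\!]^{T}_{V}(n).
\end{aligned}
\]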

We say a tree T satisfies, or is a model of, a formula ϕ, if and only if, the interpretation of ϕ under T and any valuation V is not empty, that is, $[\![\phi]\!]^{T}_{V} \neq \emptyset$. A formula is valid, if and only if, it is satisfied by every tree. It is easy to see that a formula ϕ is valid, if and only if, ¬ϕ is unsatisfiable.

The satisfiability problem for μTLIC consists in deciding whether or not a given formula is satisfiable.

We now give an intuition about the interpretation of formulas. Propositions p are node labels. Negation and disjunction are interpreted as the complement and the union of sets, respectively. Modal formulas ⟨m⟩ϕ are true in nodes such that ϕ holds in at least one node accessible through adjacency m, which may be either ↓, →, ↑ or ←, which in turn are interpreted as the children, right sibling, parent and left sibling relations, respectively. μx.ϕ is interpreted as the least fixed-point. A counting formula γ > b holds in a node, if and only if, the weighted number of its children satisfying the subformulas of γ is greater than b. For instance, p1 + (−2)p2 > 0 holds in nodes where the number of children where p1 holds is greater than twice the number of children where p2 is true.
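The semantics of Definition 2 can be read as a recursive evaluator over a finite tree. The following is a minimal sketch, not the paper's implementation: the tuple encoding of formulas, the 0-based numbering of children, and the names `successors` and `interp` are assumptions made for illustration. The least fixed-point is computed by iterating from the empty set, which is sound here because variables occur positively and the tree is finite.

```python
# Assumed encoding of formulas as nested tuples (not from the paper):
#   ("p", name) | ("x", var) | ("not", f) | ("or", f, g)
#   ("dia", m, f) with m in {"down", "right", "up", "left"}
#   ("mu", var, f) | ("gt", ((a1, f1), (a2, f2), ...), b)  for a1*f1 + ... > b
# Nodes are words over the naturals, written as tuples of ints; () is the root.

def successors(nodes, n, m):
    """R(n, m): the nodes reachable from n through modality m."""
    if m == "down":
        return {c for c in nodes if len(c) == len(n) + 1 and c[:-1] == n}
    if not n:                      # the root has neither parent nor siblings
        return set()
    if m == "up":
        return {n[:-1]}
    step = 1 if m == "right" else -1
    sibling = n[:-1] + (n[-1] + step,)
    return {sibling} & nodes

def interp(f, nodes, labels, val):
    """Set of nodes where formula f holds under valuation val."""
    kind = f[0]
    if kind == "p":
        return {n for n in nodes if f[1] in labels[n]}
    if kind == "x":
        return val[f[1]]
    if kind == "not":
        return nodes - interp(f[1], nodes, labels, val)
    if kind == "or":
        return interp(f[1], nodes, labels, val) | interp(f[2], nodes, labels, val)
    if kind == "dia":
        target = interp(f[2], nodes, labels, val)
        return {n for n in nodes if successors(nodes, n, f[1]) & target}
    if kind == "mu":               # iterate from the empty set up to the fixed-point
        current = set()
        while True:
            nxt = interp(f[2], nodes, labels, {**val, f[1]: current})
            if nxt == current:
                return current
            current = nxt
    if kind == "gt":               # weighted count over the children of each node
        terms = [(a, interp(g, nodes, labels, val)) for a, g in f[1]]
        return {n for n in nodes
                if sum(a * len(successors(nodes, n, "down") & s)
                       for a, s in terms) > f[2]}
    raise ValueError(f"unknown formula kind: {kind}")

# A root labeled p with children q, q, r.
nodes = {(), (0,), (1,), (2,)}
labels = {(): {"p"}, (0,): {"q"}, (1,): {"q"}, (2,): {"r"}}
# p and not (1r + (-2)q > 0), written with the primitive connectives only.
psi = ("not", ("or", ("not", ("p", "p")),
               ("gt", ((1, ("p", "r")), (-2, ("p", "q"))), 0)))
```

With this tree, `interp(psi, nodes, labels, {})` contains only the root, since the root is labeled p and 1·1 + (−2)·2 is not greater than 0.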

We also use the following traditional notation:

\[
\top := p \vee \neg p, \quad \bot := \neg\top, \quad \phi \wedge \psi := \neg(\neg\phi \vee \neg\psi), \quad [m]\phi := \neg\langle m \rangle\neg\phi, \quad \gamma \leq b := \neg(\gamma > b).
\]

Note that ⊤ is true in every node, and hence ⊥ in none; the conjunction ϕ ∧ ψ holds whenever both ϕ and ψ are true; [m]ϕ holds in nodes where ϕ is true in each node accessible through m; and γ ≤ b is true in nodes whose children satisfy the constraint ≤ b. Other common counting operators can also be expressed; for instance, γ = b can be written instead of (γ ≤ b) ∧ (γ > b − 1), where b > 0. In later sections, we also write b1 # γ # b2, with # ∈ {>, ≤, =}, instead of (γ # b1) ∧ (γ # b2).

Examples

Consider for instance the following formula ψ:

\[
p \wedge (r \leq 2q),
\]

where (r ≤ 2q) stands for 1r + (−2)q ≤ 0. ψ is true in nodes labeled by p such that the number of their r children is at most twice the number of their q children. This is a common example that goes beyond the expressive power of the (graded) μ-calculus and regular languages [16].

In Figure 2, there is a graphical representation of a model for ψ:

Fig. 2 A model for ψ:=(p˄[r≤2q]) 

It is also possible to express recursive navigation, for example, consider the following formula ϕ:

\[
\mu x.\psi \vee \langle\downarrow\rangle x,
\]

ϕ is true in nodes with at least one descendant where ψ is true, that is, ϕ recursively navigates along children until it finds a ψ node.

Backward navigation may also be expressible with the help of inverse programs (converse modalities). For instance, consider the following formula φ:

\[
\mu x.\psi \vee \langle\uparrow\rangle x,
\]

φ holds in nodes with an ancestor where ψ is true, that is, φ recursively navigates along parents until it finds a ψ node. Furthermore, in contrast with other approaches without converse modalities [16,32], this feature also makes it possible to count on sibling nodes. For instance:

\[
\langle\uparrow\rangle (p > 10)
\]

holds in nodes with more than 10 siblings named p.

2.2 Other Forms of Counting

We now show how several notions of counting (nominals, graded modalities and global counting) can also be expressed in terms of μTLIC formulas.

Hybrid Logics

The interpretation of nominals is a singleton, that is, nominals are formulas which are true in exactly one node in the entire model [9]. Now, it is easy to see that μTLIC can navigate recursively thanks to the fixed-points, and in all directions thanks to inverse programs.

Hence, μTLIC can express that a formula is true in one node while being false in all other nodes of the model. Nominals are then defined as follows:

\[
nom(\phi) = \phi \wedge Siblings(\neg\phi) \wedge Descendants(\neg\phi) \wedge Ancestors\big(\neg\phi \wedge Siblings(\neg\phi \wedge Descendants(\neg\phi))\big),
\]

where formulas Siblings(ϕ), Ancestors(ϕ), and Descendants(ϕ) are true, if and only if, ϕ is true in all siblings, ancestors, and descendants, respectively. More precisely:

\[
\begin{aligned}
Nav_m(\phi) &:= [m]\,\mu x.\phi \wedge (\langle m \rangle x \vee \neg\langle m \rangle\top), \\
Siblings(\phi) &:= Nav_{\rightarrow}(\phi) \wedge Nav_{\leftarrow}(\phi), \\
Descendants(\phi) &:= Nav_{\downarrow}(\phi \wedge Siblings(\phi)), \\
Ancestors(\phi) &:= Nav_{\uparrow}(\phi).
\end{aligned}
\]

Graded Logics

In modal logics, graded modalities are specialized operators for expressing numerical bounds on the occurrence of a sole formula in adjacent nodes. In the context of tree models, the numerical bounds are on children nodes [4]. For instance, the formula ⟨↓,k⟩ϕ holds in nodes with at least k + 1 children where ϕ is true. More precisely, given a tree structure T and a valuation V, graded formulas are interpreted as follows:

\[
\begin{aligned}
[\![\langle\downarrow,k\rangle\phi]\!]^{T}_{V} &= \{ n \mid |\{ n' \mid n' \in R(n,\downarrow) \cap [\![\phi]\!]^{T}_{V} \}| > k \}, \\
[\![[\downarrow,k]\phi]\!]^{T}_{V} &= [\![\neg\langle\downarrow,k\rangle\neg\phi]\!]^{T}_{V},
\end{aligned}
\]

where k is a positive integer encoded in binary. Formulas [↓,k]ϕ are true in nodes with all but at most k children satisfying ϕ. It is then easy to see that Presburger formulas can express graded modalities; more precisely, for any tree T and valuation V, we have that:

\[
[\![\phi > k]\!]^{T}_{V} = [\![\langle\downarrow,k\rangle\phi]\!]^{T}_{V}.
\]

Global Counting

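Global numerical constraints, as their name suggests, are operators used to impose constraints on the occurrence of a sole formula with respect to a constant in the entire model [37,2,39,5].

That is, a formula $\phi >_G k$ holds in the entire model, if and only if, ϕ is satisfied by at least k + 1 nodes. More precisely, the interpretation of global counting formulas with respect to a tree T and a valuation V is the following:

\[
[\![\phi >_G k]\!]^{T}_{V} =
\begin{cases}
N & \text{if } |[\![\phi]\!]^{T}_{V}| > k; \\
\emptyset & \text{otherwise.}
\end{cases}
\]

Note that the intended interpretation of $\phi \leq_G k$ is the same as for the formula $\neg(\phi >_G k)$. In [5], it was shown that regular path queries (XPath) with numerical constraints on any path, for instance ancestors or descendants, can be succinctly expressed by global counting formulas. It was also shown in [5] that global numerical constraints do not provide extra expressive power, by means of a reduction to the two-way μ-calculus (without counting). More precisely, for any binary tree T (for technical convenience, binary trees are used instead of n-ary trees; there is a well-known bijection between them, see Figure 3) and valuation V, we have the following:

\[
[\![\phi >_G k]\!]^{T}_{V} = [\![\mu x.(C_k(\phi) \wedge r) \vee \langle\uparrow\rangle x \vee \langle\leftarrow\rangle x]\!]^{T}_{V},
\]

where r stands for the root node, ¬⟨↑⟩⊤ ∧ ¬⟨←⟩⊤, and $C_k(\phi)$ counts at least k + 1 occurrences of ϕ in descendant nodes. More precisely:

\[
\begin{aligned}
C_0(\phi) :={}& \mu x.\phi \vee \langle\downarrow\rangle x \vee \langle\rightarrow\rangle x, \\
C_1(\phi) :={}& \mu x.\big(\phi \wedge (\langle\downarrow\rangle C_0(\phi) \vee \langle\rightarrow\rangle C_0(\phi))\big) \\
& \vee \big(\neg\phi \wedge \langle\downarrow\rangle C_0(\phi) \wedge \langle\rightarrow\rangle C_0(\phi)\big) \vee \langle\downarrow\rangle x \vee \langle\rightarrow\rangle x, \\
C_i(\phi) :={}& \mu x.\Big(\phi \wedge \big(\langle\downarrow\rangle C_{i-1}(\phi) \vee \langle\rightarrow\rangle C_{i-1}(\phi) \\
& \qquad \vee \bigvee_{k_1 + k_2 = i-2} \langle\downarrow\rangle C_{k_1}(\phi) \wedge \langle\rightarrow\rangle C_{k_2}(\phi)\big)\Big) \\
& \vee \Big(\neg\phi \wedge \bigvee_{k_1 + k_2 = i-1} \langle\downarrow\rangle C_{k_1}(\phi) \wedge \langle\rightarrow\rangle C_{k_2}(\phi)\Big) \\
& \vee \langle\downarrow\rangle x \vee \langle\rightarrow\rangle x.
\end{aligned}
\]

μTLIC can therefore also express global numerical constraints. However, it is not hard to see that hardcoding global numerical constraints in this way comes at an exponential cost [5].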

3 Satisfiability

In the current section, we describe a satisfiability algorithm for the logic μTLIC. That is, given an input μTLIC formula, the algorithm decides whether or not there is a tree model satisfying the formula. The algorithm is inspired by the well-known Fischer-Ladner approach [18] and the binary encoding of counting constraints introduced in [5,4]. Candidate trees are enumerated starting from single nodes (leaves). Then, parents are iteratively added until a satisfying tree is found. The stop condition for this iterative process is given by the number of available nodes, which are defined as sets of subformulas (of the input formula). These subformulas represent the information required to build the trees: node names, tree topology and Presburger constraints. One notable distinction of our algorithm is that Presburger constraints are encoded in binary form.

Before describing the algorithm, we describe the notion of trees. Then we show that the algorithm is correct and in EXPTIME.

3.1 Fischer-Ladner-Presburger Trees

We first give a detailed description of the syntactic version of tree models constructed by the satisfiability algorithm.

Some preliminaries are now introduced. There is a well-known bijection between binary and n-ary trees [22]. One adjacency is interpreted as the first child relation and the other adjacency as the right sibling relation. An example of the bijection is depicted in Figure 3. Hence, without loss of generality, from now on we consider binary unranked trees only. At the logic level, formulas are interpreted as expected: ⟨↓⟩ϕ holds in nodes such that ϕ is true in their first child; ⟨→⟩ϕ holds in nodes where ϕ is satisfied by their right (following) sibling; ⟨↑⟩ϕ is true in nodes whose parent satisfies ϕ; and ⟨←⟩ϕ is true in nodes where ϕ holds in their left (previous) sibling.

Fig. 3 Bijection of n-ary and binary trees 
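The bijection between n-ary and binary trees is the usual first-child/right-sibling encoding. A minimal sketch follows; the tuple representations and the function names `to_binary` and `to_nary` are assumptions chosen for illustration, not notation from the paper.

```python
# n-ary tree:  (label, [children])
# binary tree: (label, first_child, right_sibling), with None as the empty tree

def to_binary(forest):
    """Encode a list of sibling n-ary trees as one binary tree."""
    if not forest:
        return None
    (label, children), rest = forest[0], forest[1:]
    # The first child becomes the left branch, the next sibling the right one.
    return (label, to_binary(children), to_binary(rest))

def to_nary(binary):
    """Decode a binary tree back into the list of sibling n-ary trees."""
    if binary is None:
        return []
    label, first_child, right_sibling = binary
    return [(label, to_nary(first_child))] + to_nary(right_sibling)

# A root p with children q, q, r.
tree = ("p", [("q", []), ("q", []), ("r", [])])
encoded = to_binary([tree])
```

Here `encoded` is `("p", ("q", None, ("q", None, ("r", None, None))), None)`, and `to_nary(encoded)` gives back `[tree]`.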

For the satisfiability algorithm we consider formulas in negation normal form only.

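The negation normal form (NNF) of μTLIC formulas is defined by the usual De Morgan rules and the following ones:

\[
\begin{aligned}
nnf(\neg p) &:= \neg p, \\
nnf(\neg x) &:= \neg x, \\
nnf(\neg(\phi \vee \psi)) &:= nnf(\neg\phi) \wedge nnf(\neg\psi), \\
nnf(\neg(\phi \wedge \psi)) &:= nnf(\neg\phi) \vee nnf(\neg\psi), \\
nnf(\neg\langle m \rangle\phi) &:= \langle m \rangle nnf(\neg\phi) \vee \neg\langle m \rangle\top, \\
nnf(\neg\mu x.\phi) &:= \mu x.nnf(\neg\phi)[x/\neg x], \\
nnf(\neg(\gamma > b)) &:= \gamma \leq b, \\
nnf(\neg(\gamma \leq b)) &:= \gamma > b.
\end{aligned}
\]

Hence, the negation symbol ¬ in formulas in NNF occurs only in front of propositions and formulas of the form ⟨m⟩⊤. It is also easy to see that the negation normal form of a formula has linear size with respect to the size of the formula. Also notice that we consider an extension of μTLIC formulas consisting of conjunctions, γ ≤ b, and ¬⟨m⟩⊤ formulas, with the expected semantics.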
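The NNF rules translate directly into a recursive rewriting. The sketch below is illustrative only: the tuple encoding and the names `nnf` and `flip` are assumptions, counting formulas are treated as atoms (their inner formulas are not rewritten), and bound variables are assumed not to occur inside counting expressions. The helper `flip` applies the substitution [x/¬x] and cancels the resulting double negation on the variable.

```python
# Assumed encoding: ("p", name) | ("x", var) | ("top",) | ("not", f)
#   ("or", f, g) | ("and", f, g) | ("dia", m, f) | ("mu", var, f)
#   ("gt", gamma, b) | ("le", gamma, b)
TOP = ("top",)

def flip(f, var):
    """Apply [var / not var], cancelling double negations on var."""
    if f == ("not", ("x", var)):
        return ("x", var)
    if f == ("x", var):
        return ("not", f)
    kind = f[0]
    if kind in ("or", "and"):
        return (kind, flip(f[1], var), flip(f[2], var))
    if kind == "dia":
        return ("dia", f[1], flip(f[2], var))
    if kind == "mu" and f[1] != var:       # an inner binder of var shadows it
        return ("mu", f[1], flip(f[2], var))
    return f

def nnf(f):
    """Push negations inward following the rules of the text."""
    kind = f[0]
    if kind in ("or", "and"):
        return (kind, nnf(f[1]), nnf(f[2]))
    if kind in ("dia", "mu"):
        return (kind, f[1], nnf(f[2]))
    if kind != "not":
        return f                           # p, x, top and counting formulas
    g = f[1]
    kind = g[0]
    if kind == "not":
        return nnf(g[1])
    if kind == "or":
        return ("and", nnf(("not", g[1])), nnf(("not", g[2])))
    if kind == "and":
        return ("or", nnf(("not", g[1])), nnf(("not", g[2])))
    if kind == "dia":
        if g[2] == TOP:                    # not <m>top is already normal
            return f
        return ("or", ("dia", g[1], nnf(("not", g[2]))),
                ("not", ("dia", g[1], TOP)))
    if kind == "mu":
        return ("mu", g[1], flip(nnf(("not", g[2])), g[1]))
    if kind == "gt":
        return ("le", g[1], g[2])
    if kind == "le":
        return ("gt", g[1], g[2])
    return f                               # not p and not x are already normal
```

For instance, negating μx.q ∨ ⟨↓⟩x yields μx.¬q ∧ (⟨↓⟩x ∨ ¬⟨↓⟩⊤), in which negation appears only in front of a proposition and of ⟨↓⟩⊤.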

From now on, we often write γ#b to denote any of the following formulas: γ > b or γ ≤ b.

We now consider a binary encoding of natural numbers. Given a finite set of propositions, the binary encoding of a natural number is the Boolean combination of propositions satisfying the binary representation of the given number. For example, number 0 is written $\bigwedge_{i \geq 0} \neg p_i$, and number 7 is written $p_2 \wedge p_1 \wedge p_0 \wedge \bigwedge_{i > 2} \neg p_i$ (111 in binary). The binary encoding of numbers is required in the definition of counters, which are used in the satisfiability algorithm to verify counting subformulas.

Definition 3 (Counters). Given a formula ϕ and a number b > 0, a counter of ϕ set to b is defined by:

\[
C(\phi) = b := \overline{\phi_i},
\]

where, for $i \in [0, \log b]$, $\overline{\phi_i}$ is the sequence of propositions $\phi_i$ occurring positively in the binary encoding of b.

Consider for instance the counter of formula ϕ set to 7:

\[
C(\phi) = 7 := \phi_0, \phi_1, \phi_2.
\]

We write (C(ϕ) = b) ∈ S when each $\phi_i \in S$, where $C(\phi) = b := \overline{\phi_i}$.
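As an illustration of counters, the propositions occurring positively in a counter are the set bits of the number. The sketch below assumes a hypothetical naming scheme `name_i` for the proposition associated with bit i; the function names are likewise illustrative.

```python
def counter(name, b):
    """Propositions occurring positively in the binary encoding of b."""
    return {f"{name}_{i}" for i in range(b.bit_length()) if (b >> i) & 1}

def counter_in(name, b, node):
    """(C(name) = b) in node: every positive proposition occurs in the node."""
    return counter(name, b) <= node

def counter_value(name, node):
    """Read back the number encoded by the counter propositions of a node."""
    prefix = name + "_"
    return sum(1 << int(s[len(prefix):]) for s in node if s.startswith(prefix))

# The counter of phi set to 7 uses the three lowest bits.
seven = counter("phi", 7)
```

Here `seven` is `{"phi_0", "phi_1", "phi_2"}`, and a counter set to 0 is the empty set, that is, no counter proposition occurs. Only a logarithmic number of propositions is needed per counter, which is what keeps the binary encoding of bounds succinct.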

A formula ϕ induces a set of counters corresponding to its counting subformulas. The bound on the number of propositions used by counters is given by K(ϕ), and it is proved in Theorem 5.

Nodes in Fischer-Ladner-Presburger trees are defined as sets of subformulas. These subformulas are extracted with the help of the Fischer-Ladner-Presburger Closure. Before defining the Closure, we define the following set of subformulas of a counting expression:

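\[
S(a\phi) = \{\phi\}, \qquad S(\gamma_1 + \gamma_2) = S(\gamma_1) \cup S(\gamma_2).
\]

We often write $\phi_\gamma$ to denote a formula ϕ ∈ S(γ).

Now, consider the following binary relation $R^{FLP}$ on the set of μTLIC formulas, for i = 1, 2, ∘ = ∨, ∧, j = 0, …, log K(ϕ), and each $\phi_\gamma \in S(\gamma)$:

\[
\begin{aligned}
& R^{FLP}(\psi, nnf(\neg\psi)), && R^{FLP}(\psi_1 \circ \psi_2, \psi_i), \\
& R^{FLP}(\langle m \rangle\psi, \psi), && R^{FLP}(\mu x.\psi, \psi[\mu x.\psi / x]), \\
& R^{FLP}(\gamma \# b, \mu x.\phi_\gamma \vee \langle\rightarrow\rangle x), && R^{FLP}(\gamma \# b, (\phi_\gamma)_j), \\
& R^{FLP}(\neg\psi, \psi).
\end{aligned}
\]

Definition 4 (Fischer-Ladner-Presburger Closure). Given a formula ϕ, the Fischer-Ladner-Presburger Closure of ϕ is defined as $CL^{FLP}(\phi) = CL^{FLP}_k(\phi)$, such that k is the smallest positive integer satisfying $CL^{FLP}_k(\phi) = CL^{FLP}_{k+1}(\phi)$, where for i ≥ 0:

\[
\begin{aligned}
CL^{FLP}_0(\phi) &= \{\phi\}, \\
CL^{FLP}_{i+1}(\phi) &= CL^{FLP}_i(\phi) \cup \{ \psi \mid R^{FLP}(\psi', \psi),\ \psi' \in CL^{FLP}_i(\phi) \}.
\end{aligned}
\]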
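Definition 4 describes a saturation: the relation is applied to the current set until nothing new appears. The sketch below shows that iteration for a reduced relation covering only Boolean connectives, modalities, negation and fixed-point unfolding; the rules for nnf, counting formulas and counter propositions are omitted. The tuple encoding and the names `substitute`, `related` and `closure` are assumptions made for illustration.

```python
# Assumed encoding: ("p", name) | ("x", var) | ("not", f)
#   ("or", f, g) | ("and", f, g) | ("dia", m, f) | ("mu", var, f)

def substitute(f, var, g):
    """f[g / var]: replace the free occurrences of variable var in f by g."""
    if f == ("x", var):
        return g
    kind = f[0]
    if kind in ("or", "and"):
        return (kind, substitute(f[1], var, g), substitute(f[2], var, g))
    if kind == "dia":
        return ("dia", f[1], substitute(f[2], var, g))
    if kind == "not":
        return ("not", substitute(f[1], var, g))
    if kind == "mu" and f[1] != var:
        return ("mu", f[1], substitute(f[2], var, g))
    return f

def related(f):
    """Formulas one step away from f in the reduced relation."""
    kind = f[0]
    if kind in ("or", "and"):
        return {f[1], f[2]}
    if kind in ("dia", "not"):
        return {f[-1]}
    if kind == "mu":                       # unfold the fixed-point once
        return {substitute(f[2], f[1], f)}
    return set()

def closure(phi, step=related):
    """Iterate CL_0, CL_1, ... until CL_k = CL_{k+1}."""
    current = {phi}
    while True:
        nxt = current | {g for f in current for g in step(f)}
        if nxt == current:
            return current
        current = nxt

# mu x. q or <right> x : the formula used to look for q among right siblings.
phi = ("mu", "x", ("or", ("p", "q"), ("dia", "right", ("x", "x"))))
```

For this `phi`, the closure stabilizes with four formulas: the fixed-point itself, its unfolding q ∨ ⟨→⟩ϕ, the proposition q, and the modal formula ⟨→⟩ϕ.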

Example 1. Consider the following formula:

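\[
\phi := p \wedge (q - r > 1) \wedge (r > 0).
\]

This formula holds in p nodes with at least one r child and at least two more q children than r children. In Figure 4, there is a graphical representation of a ϕ-tree (Definition 7) for formula ϕ. In the definition of ϕ-trees, we use the notion of Fischer-Ladner-Presburger closure, which in the case of ϕ is defined as follows for j = 0, 1, 2:

\[
\begin{aligned}
CL^{FLP}(\phi) = \{ & p \wedge (q - r > 1) \wedge (r > 0),\ p \wedge (q - r > 1),\ r > 0,\ p,\ q - r > 1,\ q,\ r, \\
& q_j,\ r_j,\ \mu x.q \vee \langle\rightarrow\rangle x,\ \mu x.r \vee \langle\rightarrow\rangle x \} \cup CL^{FLP}(nnf(\neg\phi)).
\end{aligned}
\]

Fig. 4 ϕ-tree model for ϕ := p ∧ (q − r > 1) ∧ (r > 0) built by the satisfiability algorithm in 5 steps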

We are now ready to define the lean set for nodes in Fischer-Ladner-Presburger trees. The lean set contains the propositions, modal subformulas, counters and counting subformulas of the formula in question (for the satisfiability algorithm). Intuitively, propositions will serve to label nodes, modal subformulas contain the topological information of the trees, and counters are used to verify the satisfaction of counting constraints.

Definition 5 (Lean). Given a formula ϕ, its lean set is defined as follows:

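\[
lean(\phi) = \{ p,\ \langle m \rangle\psi,\ \gamma \# b,\ (\psi_\gamma)_i \in CL^{FLP}(\phi) \} \cup \{ \langle m \rangle\top,\ p' \},
\]

provided that p′ does not occur in ϕ, and m ∈ {↓, →, ↑, ←}.

Example 2. Consider again the formula ϕ := p ∧ (q − r > 1) ∧ (r > 0) of Example 1; then for i = 1, 2 and j = 0, 1, 2, we have that:

\[
\begin{aligned}
lean(\phi) = \{ & p,\ q,\ r,\ q_j,\ r_j,\ \langle\downarrow\rangle\psi_i,\ \langle\rightarrow\rangle\psi_i,\ \langle\downarrow\rangle nnf(\neg\psi_i),\ \langle\rightarrow\rangle nnf(\neg\psi_i), \\
& q - r \# 1,\ r \# 0,\ \langle m \rangle\top,\ p' \},
\end{aligned}
\]

where

\[
\psi_1 = \mu x.q \vee \langle\rightarrow\rangle x \quad \text{and} \quad \psi_2 = \mu x.r \vee \langle\rightarrow\rangle x.
\]

Recall that $q_j$ and $r_j$ are the propositions (associated to q and r) required to define the corresponding binary encoding of numbers.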

Definition 6. The set of ϕ-nodes is defined as follows:

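\[
N^{\phi} = \{ n^{\phi} \subseteq lean(\phi) \mid p \in n^{\phi},\ \langle m \rangle\psi \in n^{\phi} \Rightarrow \langle m \rangle\top \in n^{\phi},\ \langle\uparrow\rangle\top \in n^{\phi} \Rightarrow \langle\leftarrow\rangle\top \notin n^{\phi} \}.
\]

Intuitively, a ϕ-node $n^{\phi}$ is defined as a subset of the lean, such that:

  • at least one proposition (exactly one, in the XML setting), different from the counter propositions, occurs in $n^{\phi}$;

  • if a modal subformula ⟨m⟩ψ occurs in $n^{\phi}$, then ⟨m⟩⊤ also does; and

  • ⟨↑⟩⊤ and ⟨←⟩⊤ cannot both occur in $n^{\phi}$.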

When it is clear from the context, ϕ-nodes are called simply nodes.

We are finally ready to define the Fischer-Ladner-Presburger ϕ-trees.

Definition 7. A ϕ-tree is defined:

  • — either as empty ∅, or

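  • as a triple $(n^{\phi}, T_1^{\phi}, T_2^{\phi})$, provided that $n^{\phi}$ is a ϕ-node and $T_i^{\phi}$ (i = 1, 2) are ϕ-trees.

The root of $(n^{\phi}, T_1^{\phi}, T_2^{\phi})$ is $n^{\phi}$. We often call ϕ-trees simply trees.

Example 3. Consider again the formula ϕ := p ∧ (q − r > 1) ∧ (r > 0). Then $T = (n_0, (n_1, \emptyset, (n_2, \emptyset, (n_3, \emptyset, (n_4, \emptyset, \emptyset)))), \emptyset)$ is a ϕ-tree, where:

\[
\begin{aligned}
n_0 &= \{ p,\ C(q) = 0,\ C(r) = 0,\ q - r > 1,\ r > 0,\ \langle\downarrow\rangle\psi_1,\ \langle\downarrow\rangle\psi_2,\ \langle\downarrow\rangle\top \}, \\
n_1 &= \{ q,\ C(q) = 3,\ C(r) = 1,\ q - r \leq 1,\ r \leq 0,\ \langle\rightarrow\rangle\psi_1,\ \langle\rightarrow\rangle\psi_2,\ \langle\rightarrow\rangle\top,\ \langle\uparrow\rangle\top \}, \\
n_2 &= \{ q,\ C(q) = 2,\ C(r) = 1,\ q - r \leq 1,\ r \leq 0,\ \langle\rightarrow\rangle\psi_1,\ \langle\rightarrow\rangle\psi_2,\ \langle\rightarrow\rangle\top,\ \langle\leftarrow\rangle\top \}, \\
n_3 &= \{ q,\ C(q) = 1,\ C(r) = 1,\ q - r \leq 1,\ r \leq 0,\ \langle\rightarrow\rangle\psi_2,\ \langle\rightarrow\rangle\top,\ \langle\leftarrow\rangle\top \}, \\
n_4 &= \{ r,\ C(q) = 0,\ C(r) = 1,\ q - r \leq 1,\ r \leq 0,\ \langle\leftarrow\rangle\top \}.
\end{aligned}
\]

The ϕ-nodes $n_i$ (i = 0, ..., 4) are defined from the lean of ϕ (Example 2). A graphical representation of T is depicted in Figure 4. Notice that the counters in the root $n_0$ are set to zero, that is, no proposition corresponding to counters occurs. This is because counters are intended to count siblings only. For instance, the counters in $n_1$ are set to 3 and 1 for q and r, respectively, because there are three q's and one r among $n_1$ and its siblings. Counting formulas occur positively only at the root $n_0$, because they are intended to be true when the counters in the children of $n_0$ satisfy the Presburger constraints. Since the nodes $n_i$ (i > 0) do not have children, counting formulas occur negatively in these nodes (recall that the negation normal form of the negated input formula is also in the lean). Finally, notice that modal subformulas define the topology of the tree.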

3.2 The Algorithm

We now define a satisfiability algorithm for the logic μTLIC following the Fischer-Ladner method [5,4,16,18]. Given an input formula, the algorithm decides whether or not the formula is satisfiable. The algorithm builds ϕ-trees in a bottom-up manner. Starting from the leaves, parents are iteratively added until a satisfying tree, with respect to ϕ, is found.

Algorithm 1 describes the bottom-up construction of ϕ-trees.

Algorithm 1 Satisfiability algorithm for μTLIC 

The set Init(ϕ) gathers the leaves. The satisfiability of formulas with respect to ϕ-trees is tested with the entailment relation ⊢. Inside the loop, the Update function consistently adds parents to previously built trees until either a satisfying tree is found or no more trees can be built. If a satisfying tree is found, the algorithm returns that the input formula is satisfiable; otherwise, the algorithm returns that the input formula is not satisfiable.
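The loop described above can be sketched as follows. This is a hedged reading of the prose only, since the listing of Algorithm 1 is not reproduced in the text: the function `satisfiable` and its parameters `init`, `nodes`, `update` and `entails` are hypothetical placeholders for Init, the set of ϕ-nodes, Update and the entailment relation, and the toy instantiation at the end is unrelated to μTLIC and only exercises the control flow.

```python
def satisfiable(phi, init, nodes, update, entails):
    """Bottom-up search: start from the leaves and add parents until a tree
    entailing phi is found or no new tree can be built."""
    trees = set(init(phi))              # leaves
    candidates = nodes(phi)             # nodes available as new roots
    while not any(entails(t, phi) for t in trees):
        bigger = trees | update(trees, candidates)
        if bigger == trees:             # nothing new: the search space is exhausted
            return False
        trees = bigger
    return True

# Toy instantiation (an assumption for illustration): a tree is a chain
# (height, subtree), and the "formula" is a target height.
def toy_init(phi):
    return {(0, None)}

def toy_nodes(phi):
    return {0, 1, 2}

def toy_update(trees, candidates):
    return {(c, t) for c in candidates for t in trees if c == t[0] + 1}

def toy_entails(tree, phi):
    return tree[0] == phi
```

With the toy functions, `satisfiable(2, toy_init, toy_nodes, toy_update, toy_entails)` succeeds after two updates, while a target height of 5 fails because no candidate node allows a chain higher than 2. Termination follows the same argument as in the text: the set of available nodes is finite, so the set of trees that Update can build eventually stops growing.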

Example 4. Consider the formula ϕ := p ∧ (q − r > 1) ∧ (r > 0). The ϕ-tree T, described in Example 3, is built by the satisfiability algorithm in 5 steps. All leaves are first defined by Init(ϕ) (Definition 9): notice that $n_4$ is a leaf because it does not contain downward modal formulas. Once in the while cycle, parents and previous siblings are iteratively added to previously built trees, which by the second step consist of leaves only: since ⟨→⟩(μx.r ∨ ⟨→⟩x) and ⟨→⟩⊤ occur in $n_3$, and r and ⟨←⟩⊤ occur in $n_4$, it is clear that $n_3$ can be the parent of $n_4$; analogously for $n_2$ and $n_3$, and for $n_1$ and $n_2$, respectively; it is also clear that $n_0$ can be a parent of $n_1$. Notice that $n_0$ is the root due to the absence of the upward modal formulas ⟨↑⟩⊤ and ⟨←⟩⊤. The construction of T is graphically represented in Figure 4.

We now give a detailed description of the algorithm components.

Definition 8 (Entailment). The entailment relation is defined as follows:

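\[
\frac{}{n \vdash \top} \qquad
\frac{\phi \in n}{n \vdash \phi} \qquad
\frac{\phi \notin n}{n \vdash \neg\phi} \qquad
\frac{n \vdash \phi \quad n \vdash \psi}{n \vdash \phi \wedge \psi} \qquad
\frac{n \vdash \phi}{n \vdash \phi \vee \psi} \qquad
\frac{n \vdash \psi}{n \vdash \phi \vee \psi} \qquad
\frac{n \vdash \phi[\mu x.\phi / x]}{n \vdash \mu x.\phi}
\]

If there is a node n in a tree T such that n entails ϕ (n ⊢ ϕ), and the formulas ⟨↑⟩⊤ and ⟨←⟩⊤ do not occur in the root of T, we then say that the tree T entails ϕ, written T ⊢ ϕ. Given a set of trees X, if there is a tree T in X entailing ϕ (T ⊢ ϕ), then X entails ϕ, written X ⊢ ϕ. The relation ⊬ is defined in the obvious manner.

Leaves are ϕ-nodes without downward adjacencies, that is, formulas of the form ⟨↓⟩ψ or ⟨→⟩ψ do not occur in leaves. Also, counters are properly initialized, that is, for each counting subformula γ # b of the input formula, if a leaf satisfies $\phi_\gamma$ then $C(\phi_\gamma) = 1$ is contained in the leaf, otherwise $C(\phi_\gamma) = 0$, that is, no counting proposition corresponding to $\phi_\gamma$ occurs in the leaf. The set of leaves is defined by the Init function.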

Definition 9 (Init). Given a formula ϕ, its initial set Init(ϕ) is defined as follows:

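\[
\begin{aligned}
Init(\phi) = \{ n^{\phi} \in N^{\phi} \mid{} & \langle\downarrow\rangle\top, \langle\rightarrow\rangle\top \notin n^{\phi}, \\
& \forall \gamma \# b \in lean(\phi):\ \phi_\gamma \in n^{\phi} \Rightarrow (C(\phi_\gamma) = 1) \in n^{\phi}, \\
& \forall \gamma \# b \in lean(\phi):\ \phi_\gamma \notin n^{\phi} \Rightarrow (C(\phi_\gamma) = 0) \in n^{\phi} \}.
\end{aligned}
\]

Notice that, from the definition of ϕ-nodes, if formulas of the forms ⟨↓⟩⊤ and ⟨→⟩⊤ do not occur in leaves, then neither do formulas of the forms ⟨↓⟩ψ and ⟨→⟩ψ.

Example 5. Consider again the formula ϕ of Example 3. It is then easy to see that $n_4$ is a leaf: $n_4$ does not contain downward modal formulas ⟨↓⟩ψ and ⟨→⟩ψ. Also, counters are properly initialized in $n_4$, i.e., C(r) = 1 occurs in $n_4$.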

The Update function consistently adds parents to previously built trees. Consistency is defined with respect to two different notions. One notion is with respect to modal formulas. For example, a modal formula ⟨↓⟩ϕ is contained in the root of a tree, if and only if, its first child satisfies ϕ.

Definition 10 (Modal Consistency). Given a ϕ-node $n^{\phi}$ and a ϕ-tree T with root r, $n^{\phi}$ and T are m-modally consistent, written $\Delta_m(n^{\phi}, T)$, if and only if, for all ⟨m⟩ψ and $\langle \overline{m} \rangle\psi$ in lean(ϕ), where m ∈ {↓, →} and $\overline{m}$ ∈ {↑, ←}, we have that:

\[
\langle m \rangle\psi \in n^{\phi} \Leftrightarrow r \vdash \psi, \qquad
\langle \overline{m} \rangle\psi \in r \Leftrightarrow n^{\phi} \vdash \psi.
\]

Example 6. Consider ϕ in Figure 4. In step 2, it is easy to see that $n_3$ is modally consistent with $n_4$: the formula ⟨→⟩(μx.r ∨ ⟨→⟩x) is clearly true in $n_3$, because r occurs in $n_4$. In the following steps, $n_i$ is clearly modally consistent with $n_{i+1}$.

Another consistency notion is defined in terms of counters. Since the first child is the upper one in a tree, it must contain all the information regarding counters, i.e., each time a previous sibling is added by the algorithm, counters must be updated. Counter consistency must also ensure that a counting formula occurs in a parent if and only if the counters of its first child are consistent with the constraints in the counting subformulas.

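Definition 11 (Counter Consistency). Given a ϕ-node $n^\phi$ and trees $T_1$ and $T_2$, we say that $n^\phi$, $T_1$ and $T_2$ are counter consistent, written $\Theta(n^\phi, T_1, T_2)$, if and only if, for the roots $r_1$ and $r_2$ of $T_1$ and $T_2$, respectively, and for all counting formulas $\gamma \# b$ in lean(ϕ), we have that:

$$\begin{aligned}
C(\phi_\gamma) = b' \in n^\phi,\ n^\phi \vdash \phi_\gamma &\Leftrightarrow C(\phi_\gamma) = b' - 1 \in r_2,\\
C(\phi_\gamma) = b' \in n^\phi,\ n^\phi \nvdash \phi_\gamma &\Leftrightarrow C(\phi_\gamma) = b' \in r_2,\\
\gamma \# b \in n^\phi &\Leftrightarrow C(\phi_\gamma) = b' \in r_1 : \gamma\{b' / \phi_\gamma\} \# b.
\end{aligned}$$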

Example 7. Consider the formula ϕ of Example 3 and Figure 4. In steps 2, 3 and 4, since previous siblings are added, counters for q are incremented in n3, n2 and n1, respectively. In step 5, the counting formulas q-r>1 and r>0 are present in the root n0, due to the fact that counters, in the first child, satisfy the Presburger constraints.
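The counter bookkeeping of Example 7 can be mimicked with a short Python sketch. It assumes a simplified setting that is not the paper's data structure: children are given as a plain list of labels, and the hypothetical function `sibling_counters` computes, for each child, how many siblings from that child onwards carry each tracked label. The first child then holds the totals against which the Presburger constraints of the parent are checked.

```python
def sibling_counters(labels, tracked):
    """Counters carried by each child, computed from the last sibling back to the first.

    The counter of a child for a label is the number of siblings, from that
    child to the last one, carrying that label.
    """
    running = {p: 0 for p in tracked}
    counters = []
    for label in reversed(labels):
        if label in running:
            running[label] += 1
        counters.append(dict(running))
    counters.reverse()
    return counters


# Children n1..n4 of the root n0 in the running example: three q's and one r.
children = ["q", "q", "q", "r"]
counters = sibling_counters(children, ["q", "r"])
first_child = counters[0]

# The root may contain the counting formulas q - r > 1 and r > 0 only if the
# counters of its first child satisfy them.
root_constraints_hold = (first_child["q"] - first_child["r"] > 1
                         and first_child["r"] > 0)
print(counters, root_constraints_hold)
```

The counter for q grows by one at each of n3, n2 and n1, matching steps 2 to 4 of the example, and the constraints are checked once at the first child.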

The Update function gathers the notions of counter and modal consistency.

Definition 12 (Update). Given a set of ϕ-trees $X$ and a set of ϕ-nodes $Y$, the update function is defined as follows, for $i = 1, 2$:

$$Update(X, Y) = \{(n^\phi, T_1, T_2) \mid T_i \in X,\ n^\phi \in Y,\ \Delta_i(n^\phi, T_i),\ \Theta(n^\phi, T_1, T_2)\}.$$

We finally define the function root(X), which takes as input a set of ϕ-trees and returns the set of roots of these ϕ-trees.

3.3 Correctness

We now show that Algorithm 1 is correct, and then we describe its complexity. Correctness is shown by proving that the algorithm is sound and complete. For these proofs, we first need a fixed point theorem. Proving substitution is monotone is the first step.

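Lemma 1. Given a μTLIC formula $\phi$, a tree structure $\mathcal{T} = (P, N, R, L)$ and a valuation $V$, let $f : \mathcal{P}(N) \to \mathcal{P}(N)$ be defined as $f(S) = [\![\phi]\!]^{\mathcal{T}}_{V[S/x]}$, where $x$ is a free variable in $\phi$. If $S \subseteq S'$, then $f(S) \subseteq f(S')$.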

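Proof. We proceed by induction on the structure of ϕ. Base cases are trivial and most inductive cases are straightforward by inductive hypothesis. Recall that, since we are considering only formulas in negation normal form, negation symbols ¬ occur in front of propositions and formulas $\langle m\rangle\top$ only. Consider for instance the case of conjunction $\psi \wedge \varphi$. By inductive hypothesis we know $[\![\psi]\!]^{\mathcal{T}}_{V[S/x]} \subseteq [\![\psi]\!]^{\mathcal{T}}_{V[S'/x]}$ and $[\![\varphi]\!]^{\mathcal{T}}_{V[S/x]} \subseteq [\![\varphi]\!]^{\mathcal{T}}_{V[S'/x]}$, hence $[\![\psi]\!]^{\mathcal{T}}_{V[S/x]} \cap [\![\varphi]\!]^{\mathcal{T}}_{V[S/x]} \subseteq [\![\psi]\!]^{\mathcal{T}}_{V[S'/x]} \cap [\![\varphi]\!]^{\mathcal{T}}_{V[S'/x]}$, and then $[\![\psi \wedge \varphi]\!]^{\mathcal{T}}_{V[S/x]} \subseteq [\![\psi \wedge \varphi]\!]^{\mathcal{T}}_{V[S'/x]}$.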

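The case of the Presburger formula $\gamma > b$ is more interesting. We prove this case by a second induction on the structure of $\gamma$. We distinguish three base cases. The first one is $a\psi > b$; notice that in this case $a \geq 0$. By the first inductive hypothesis we know $[\![\psi]\!]^{\mathcal{T}}_{V[S/x]} \subseteq [\![\psi]\!]^{\mathcal{T}}_{V[S'/x]}$. It is then clear that, for any node $n$, $[\![\psi]\!]^{\mathcal{T}}_{V[S/x]}(n) \leq [\![\psi]\!]^{\mathcal{T}}_{V[S'/x]}(n)$. The second base case is $a_1\gamma_1 + a_2\gamma_2 > b$, where both $a_1$ and $a_2$ are non-negative integers. This is straightforward from the first base case. The third base case is $a_1\gamma_1 - a_2\gamma_2 > b$. From the first base case, it is easy to see that for any $n$ and $i = 1, 2$, $[\![a_i\gamma_i]\!]^{\mathcal{T}}_{V[S/x]}(n) \leq [\![a_i\gamma_i]\!]^{\mathcal{T}}_{V[S'/x]}(n)$. Hence, $[\![a_1\gamma_1]\!]^{\mathcal{T}}_{V[S/x]}(n) - [\![a_2\gamma_2]\!]^{\mathcal{T}}_{V[S/x]}(n) \leq [\![a_1\gamma_1]\!]^{\mathcal{T}}_{V[S'/x]}(n) - [\![a_2\gamma_2]\!]^{\mathcal{T}}_{V[S'/x]}(n)$. The inductive step in $\gamma > b$ is immediate from the base cases. For the case γ ≥ b, recall that in order to ensure the fixed-point existence, variables can only occur positively, hence, variables in γ occur negatively. We then proceed analogously as in the case γ > b.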

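Consider now the case for $\mu y.\psi$. Let the sets $S_i \subseteq N$ be defined by $g(S_i, S) \subseteq S_i$, where $g(S_i, S) = [\![\psi]\!]^{\mathcal{T}}_{V[S_i/y][S/x]}$, and the sets $S_i' \subseteq N$ by $g(S_i', S') \subseteq S_i'$. Now let $S_0 = \bigcap_i S_i$ and $S_0' = \bigcap_i S_i'$. Note that $f(S) = S_0$ and $f(S') = S_0'$. By inductive hypothesis we know $g(S_i', S) \subseteq g(S_i', S')$ for every $i$. Now, by transitivity of the subset relation (recall $g(S_i', S') \subseteq S_i'$), we obtain $g(S_0', S) \subseteq S_0'$, that is, there is an $i$ such that $S_0' = S_i$. By definition of $S_0$, it is easy to see that $S_0 \subseteq S_i$ for every $i$. We then conclude $S_0 \subseteq S_0'$.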

We now prove the fixed point Theorem.

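Theorem 1. Given a μTLIC formula $\phi$, a tree structure $\mathcal{T} = (P, N, R, L)$ and a valuation $V$, let $f : \mathcal{P}(N) \to \mathcal{P}(N)$ be defined as $f(S) = [\![\phi]\!]^{\mathcal{T}}_{V[S/x]}$, where $x$ is a free variable in $\phi$. Then the least fixed point of $f$ is $\bigcap \{N' \subseteq N \mid f(N') \subseteq N'\}$.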

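Proof. Since $N$ is finite, there is a finite number of sets $S_i \subseteq N$ such that $f(S_i) \subseteq S_i$. Let $S = \bigcap_i S_i$. Since $S \subseteq S_i$ for every $i$, by Lemma 1 we obtain that $f(S) \subseteq f(S_i)$. By definition of each $S_i$, we also know $f(S_i) \subseteq S_i$ for every $i$. Then, by transitivity of the subset relation, we obtain $f(S) \subseteq S_i$. Now, recalling that $S = \bigcap_i S_i$, $f(S) \subseteq S_i$ for every $i$ implies $f(S) \subseteq S$. Then, by Lemma 1, $f(f(S)) \subseteq f(S)$. Then again, by definition of the $S_i$, there is a $j$ such that $f(S) = S_j$. Since $S \subseteq S_j$, we have $S \subseteq f(S)$ and therefore $S = f(S)$. Now, since $S \subseteq S_i$ for every $i$, it is clear that $S$ is the least fixed point.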
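The characterization in Theorem 1 can be checked on a small finite example. The following Python sketch is an illustration only: the node set, the `NEXT` relation and the function `f` are hypothetical, chosen to mimic a formula of the shape "r holds here, or the property holds at the next sibling". It computes the least fixed point both as the intersection of all sets $N'$ with $f(N') \subseteq N'$ and by iterating `f` from the empty set, which is valid for a monotone function on a finite powerset.

```python
from itertools import chain, combinations


def subsets(universe):
    """All subsets of `universe`, as frozensets."""
    items = list(universe)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)))


def lfp_by_intersection(f, universe):
    """Intersection of all pre-fixed points of f (sets S with f(S) <= S)."""
    result = frozenset(universe)
    for s in subsets(universe):
        if f(s) <= s:
            result &= s
    return result


def lfp_by_iteration(f):
    """Iterate f from the empty set until it stabilizes (f must be monotone)."""
    current = frozenset()
    while True:
        following = f(current)
        if following == current:
            return current
        current = following


# Hypothetical structure: node 0 is the root, nodes 1..4 are siblings in a row,
# and only node 4 is labeled r.
NODES = range(5)
NEXT = {1: 2, 2: 3, 3: 4}
R_NODES = {4}


def f(s):
    """Nodes labeled r, or whose next sibling is already in s."""
    return frozenset(n for n in NODES if n in R_NODES or NEXT.get(n) in s)


print(sorted(lfp_by_intersection(f, NODES)), sorted(lfp_by_iteration(f)))
```

Both computations agree on this example. The intersection form enumerates every subset and is exponential in the number of nodes, so it serves only as a specification.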

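A straightforward observation from the fixed point Theorem 1 is that a fixed point $\mu x.\phi$ is equivalent to its unfolding $\phi\{\mu x.\phi / x\}$, that is, for any $\mathcal{T}$ and valuation $V$, we have that $[\![\mu x.\phi]\!]^{\mathcal{T}}_V = [\![\phi\{\mu x.\phi / x\}]\!]^{\mathcal{T}}_V$.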

Theorem 2 (Soundness). If the satisfiability algorithm returns that ϕ is satisfiable, then there is a tree model satisfying ϕ.

Proof. Assume $T$ is the ϕ-tree that entails ϕ. Then we construct a tree model $\mathcal{T}$ isomorphic to $T$ as follows:

  • the nodes of $\mathcal{T}$ are the ϕ-nodes;

  • for each triple $(n, T_1, T_2)$ in $T$, $n_1 \in R(n, \downarrow)$ and $n_2 \in R(n, \rightarrow)$, provided that $n_i$ are the roots of $T_i$ ($i = 1, 2$); and

  • if $p \in n$, then $p \in L(n)$.

We now show by induction on the structure of the input formula ϕ that $\mathcal{T}$ satisfies ϕ.

Base cases are immediate, that is, when the input formula is either a proposition, a negated proposition or a formula of the form $\neg\langle m\rangle\top$.

Negations and disjunctions are also immediate by induction. Modal formulas $\langle m\rangle\phi$ are clearly satisfied by the construction of the model $\mathcal{T}$ and because ϕ is satisfied by induction. For counting formulas $a_1\psi_1 + a_2\psi_2 + \dots + a_n\psi_n \# b$, recall that by induction there is a node $n$ in $T$ such that $C(\psi_1) = b_1, C(\psi_2) = b_2, \dots, C(\psi_n) = b_n \in n$ and $a_1 b_1 + a_2 b_2 + \dots + a_n b_n \# b$.

Also in $T$, we know that $n$ is the first child of a node $n_0$ where $a_1\psi_1 + a_2\psi_2 + \dots + a_n\psi_n \# b$ is entailed. It is then easy to see that $\mathcal{T}$ satisfies $a_1\psi_1 + a_2\psi_2 + \dots + a_n\psi_n \# b$ due to the construction described above. In the case of fixed points, if $\mu x.\phi$ is entailed by $T$, then we know by the definition of the entailment relation that $\phi\{\mu x.\phi / x\}$ is also entailed by $T$. In order to show that $\phi\{\mu x.\phi / x\}$ is satisfied by $\mathcal{T}$, we proceed by another structural induction on ϕ, which goes smoothly since fixed points are not considered (variables occur only in the scope of modalities or counting formulas).

The completeness proof is divided into two main steps: first we show that there is a lean labeled version of the satisfying model; then we show that the algorithm can actually build this lean labeled version of the tree model.

Theorem 3. If there is a tree structure T satisfying a formula ϕ, then there is a Fischer-Ladner-Presburger ϕ-tree entailing ϕ.

Proof. Assume $\mathcal{T}$ satisfies the formula ϕ. We construct a lean labeled version $T$ of $\mathcal{T}$ as follows: the nodes and shape of $T$ are the same as in $\mathcal{T}$; for each $\psi \in lean(\phi)$, if $n$ in $\mathcal{T}$ satisfies ψ, then ψ is in $n$ of $T$; and the counters are set in the nodes of $T$ as the algorithm does, in a bottom-up manner.

It is now shown by induction on the derivation of $\mathcal{T} \models \phi$ that $T$ entails ϕ. By the construction of $T$ and by induction, most cases are straightforward. For the fixed-point case $\mu x.\psi$, we proceed by induction on the structure of the unfolding $\psi\{\mu x.\psi / x\}$. That there is a finite unfolding comes from the fixed point Theorem 1: $[\![\mu x.\phi]\!]^{\mathcal{T}}_V = [\![\phi\{\mu x.\phi / x\}]\!]^{\mathcal{T}}_V$. This induction is immediate because variables, and hence unfolded fixed points, occur in the scope of modal or counting formulas only.

Before proving that the algorithm builds T, we need to show that there are enough ϕ-nodes to construct T. Recall ϕ-nodes are lean subsets, and the lean is composed by propositions, modal and counting formulas occurring in the input formula, plus counters. Since counters count children nodes, we then need a bound on the number of children. For this purpose, we use a bound of an integer programming problem.

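Theorem 4. [30] Let $A$ be an $m \times n$ integer matrix and $b$ an $m$-vector, both with entries in ℤ. If $Ax = b$ has a solution $x \in \mathbb{N}^n$, then it also has one in $\{0, \dots, n(ma)^{2m+1}\}^n$, where $a$ denotes the largest absolute value among the entries of $A$ and $b$.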

Theorem 5. If a formula ϕ is satisfiable, then there is a ϕ-tree entailing ϕ where each node has at most an exponential number of children with respect to the size of ϕ.

Proof. Since ϕ is satisfiable, there is a ϕ-tree $T$ entailing ϕ by Theorem 3. Recall that each node in $T$ is a subset of lean(ϕ), thus composed of propositions, modal formulas $\langle m\rangle\psi$ and counting formulas $\gamma \# b$. Notice that formulas $\langle\downarrow\rangle\psi$ and $\gamma \# b$ are the ones enforcing child witnesses. Also notice that formulas $\langle\downarrow\rangle\psi$ are equivalent to $\psi > 0$. We now encode this set of counting formulas as an integer programming problem in order to obtain a bound on the number of required children: $a_1\phi_1 + \dots + a_n\phi_n \# b$ is encoded as $a_1 x_{\phi_1} + \dots + a_n x_{\phi_n} - x = b + 1$ when # is >, and as $a_1 x_{\phi_1} + \dots + a_n x_{\phi_n} + x = b$ when # is ≤, where $x \geq 0$ is a slack variable. Then, each node has at most an exponential number of children by Theorem 4.
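To see why an exponential bound on the number of children is compatible with counters of linear size, one can evaluate the bound of Theorem 4 numerically. The Python sketch below is only illustrative: the function names are hypothetical, and it assumes that `a` is the largest absolute value among the entries of the system, `m` the number of equations and `n` the number of variables.

```python
def solution_bound(n, m, a):
    """Bound n * (m * a) ** (2 * m + 1) on the components of a small solution."""
    return n * (m * a) ** (2 * m + 1)


def counter_bits(n, m, a):
    """Number of bits needed to store a counter value up to the bound."""
    return solution_bound(n, m, a).bit_length()


# The bound grows exponentially with the number of constraints m,
# while the number of bits needed to write it grows only linearly.
for m in (2, 4, 8):
    print(m, solution_bound(3, m, 2), counter_bits(3, m, 2))
```

For example, with 3 variables, 2 equations and entries of absolute value at most 2, the bound is 3072, which fits in 12 bits.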

We are now ready to show that the algorithm builds the lean labeled version $T$ of the satisfying model $\mathcal{T}$.

Theorem 6 (Completeness). If there is a tree model T satisfying a formula ϕ, then the satisfiability algorithm returns that ϕ is satisfiable.

Proof. The proof proceeds by induction on the height of T. The base case is trivial. Consider now the induction step. By induction, we know that the left and right subtrees of T were built by the algorithm. We now show that the root n of T can be joined to the previously built left and right subtrees. This holds for two reasons: Δ(n,ni) is consistent with R, where i = 1, 2 and ni are the roots of the left and right subtrees, respectively; and by Theorem 5, there is at most an exponential number of children, with respect to the size of ϕ, distinguished by counters, which are encoded in binary by a linear number of propositions.

Theorem 7 (Complexity). μTLIC satisfiability is EXPTIME-complete.

Proof. We first show that the lean set of the input formula ϕ has linear size with respect to the size of ϕ. This is easily proven by induction on the structure of ϕ and by Theorem 5: an exponential number of children can be distinguished by a linear number of counting propositions (recall that counters are encoded in binary). We then proceed to show that the algorithm takes exponential time with respect to the size of the lean. Since ϕ-nodes are defined as subsets of the lean, the number of ϕ-nodes is single exponential with respect to the lean size; hence there is at most an exponential number of steps in the loop of the algorithm.

It remains to prove that each step, including the ones inside the loop, takes at most exponential time: computing Init(ϕ) implies the traversal of $N^\phi$ and hence takes exponential time; testing entailment takes linear time with respect to the node size, and hence its cost is exponential with respect to the set of trees; and since the cost of the relations of modal and counter consistency, $\Delta_m$ and $\Theta$, is linear, the Update function takes at most exponential time. Since the μ-calculus for trees is EXPTIME-complete [12], μTLIC is EXPTIME-hard and therefore also EXPTIME-complete.

4 Extended Regular Tree Languages

In this section, we introduce several extensions of regular tree languages, which encompass most XML schema languages used in practice, such as DTDs, XML Schema and RelaxNG [29,22]. First we consider the extension with the interleaving operator [27], and then the extension with counting operators [28]. We show that these extensions can be linearly characterized by μTLIC. In Section 5, we also show that regular path queries (XPath) [36] with Presburger constraints on children paths can be linearly expressed by μTLIC. As a consequence, μTLIC can be used as a framework for standard XML reasoning problems involving schemas and queries with counting and interleaving operators. In Section 3, we described an EXPTIME satisfiability algorithm, which together with the results described in this section implies new optimal (EXPTIME) bounds on emptiness, inclusion and equivalence of XPath queries (with counting) and XML schemas (with counting and interleaving).

4.1 Regular Tree Languages

We define the syntax of regular trees similarly as in [20,22].

Definition 13 (Syntax of regular trees). We define the set of regular tree expressions by the following grammar:

$$e := \epsilon \mid x \mid p[e] \mid e\,e \mid e + e \mid \text{let } \overline{x = e} \text{ in } e.$$

We write $p$ instead of $p[\epsilon]$ and we consider $e\epsilon$ and $\epsilon e$ to be simply $e$.

Following [22], we now give a precise semantics of regular tree expressions, but first we define the following notation. Consider a tree structure $\mathcal{T} = (P, N, R, L)$; recall that $P$ is a set of propositions, $N$ a set of nodes, $R$ a transition function among nodes forming a tree, and $L$ a function labeling nodes with propositions. We write $(n, \overline{T})$ to denote $\mathcal{T}$ with root $n$ and children subtrees $\overline{T} = T_1, T_2, \dots, T_k$, that is, $n_1, n_2, \dots, n_k \in R(n, \downarrow)$, where $n_i$ is the root of $T_i$ ($i = 1, \dots, k$). If we write $\overline{T} \in S$, we mean that the composition of the $T_i$ is in the set $S$. By the composition of a sequence of trees $\overline{T}$, written $T_1 T_2 \cdots T_k$, we mean the resulting tree $(n, \overline{T})$. The composition of two sets of trees $S_1$ and $S_2$, written $S_1 S_2$, denotes the composition of all pairs of trees in $S_1$ and $S_2$, more precisely, the trees $T_1 T_2$ such that $T_i \in S_i$ ($i = 1, 2$).

Definition 14 (Semantics of regular trees). Given a valuation V (from variables to sets of trees), regular tree expressions are interpreted as follows:

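$$\begin{aligned}
[\![\epsilon]\!]_V &= \emptyset,\\
[\![x]\!]_V &= V(x),\\
[\![p[e]]\!]_V &= \{(n, \overline{T}) \mid p \in L(n),\ \overline{T} \in [\![e]\!]_V\},\\
[\![e_1 e_2]\!]_V &= [\![e_1]\!]_V [\![e_2]\!]_V,\\
[\![e_1 + e_2]\!]_V &= [\![e_1]\!]_V \cup [\![e_2]\!]_V,\\
[\![\text{let } \overline{x = e} \text{ in } e]\!]_V &= [\![e]\!]_{lfp(V[\overline{[\![e]\!]_V / x}])},
\end{aligned}$$

where $lfp(V[\overline{[\![e]\!]_V / x}])$ stands for the least fixed point of the substitution function.

Note that there is always a least fixed point due to the Knaster-Tarski Theorem [35] on fixed points (substitution is monotone with respect to the subset ordering).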

Intuitively, regular tree expressions are interpreted as sets of unranked trees: ϵ is interpreted as the empty set; p[e] denotes the set of trees whose root is labeled by p and whose children are denoted by e; the interpretation of $e_1 e_2$ is the set of trees whose children are denoted by $e_1$ and $e_2$, from left to right; $e_1 + e_2$ is interpreted as the union of the interpretations of $e_1$ and $e_2$; and $\text{let } \overline{x = e} \text{ in } e$ is interpreted as the least fixed point. The Kleene star operator can be expressed in terms of the least fixed point; for instance, the regular tree expression $p[q^*]$ in Figure 1(a) can be written as follows:

$$\text{let } x = q(x + \epsilon) \text{ in } p[x + \epsilon].$$

We now show that regular tree expressions can be linearly translated into the μ-calculus, as already shown in [4,5]. This implies that traditional reasoning problems, such as emptiness, containment (inclusion) and equivalence, can be efficiently expressed in terms of the satisfiability of μ-calculus formulas. Before defining a translation function from regular tree expressions to logic formulas, we define $\mu \overline{x.\phi} \text{ in } \phi$ as a generalization (with several bound variables) of the fixed point operator with the expected semantics. This generalization does not provide more expressive power, although it is more succinct [4].

Definition 15. We define the following translation function from regular tree expressions to μ-calculus formulas:

Fϵ:=,Fx:=x,Fpe:=pFe,Fe1e2:=Fe1Fe2,Fe1+e2:=Fe1Fe2,Fletx=e____ in e:=μx.Fe. ______inFe,

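where $F^m(e)$ is defined as follows for $m \in \{\downarrow, \rightarrow\}$:

  • $\neg\langle m\rangle\top$ if $e$ is $\epsilon$,

  • $\neg\langle m\rangle\top \vee F^m(e')$ if $e$ has the forms $\epsilon + e'$, $e' + \epsilon$ and $e'$ is nullable,

  • $\neg\langle m\rangle\top \vee \langle m\rangle F(e')$ if $e$ has the forms $\epsilon + e'$, $e' + \epsilon$ and $e'$ is not nullable, and

  • $\langle m\rangle F(e)$ otherwise.

We say an expression $e$ is nullable when it is a variable bound to an expression that can be interpreted as the empty tree, as for instance $\epsilon + e'$.

Now, consider as an example the expression $p[q^*]$. This can be expressed in terms of μTLIC as follows:

$$\begin{aligned}
F(p[q^*]) &:= F(\text{let } x = q(x + \epsilon) \text{ in } p[x + \epsilon])\\
&:= \mu x.\, q \wedge \neg\langle\downarrow\rangle\top \wedge (\neg\langle\rightarrow\rangle\top \vee \langle\rightarrow\rangle x) \text{ in } p \wedge (\neg\langle\downarrow\rangle\top \vee \langle\downarrow\rangle x).
\end{aligned}$$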

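Theorem 8 (Reasoning on regular trees [4,5]). Given any two regular tree expressions $e_1$ and $e_2$, we have that for any tree $\mathcal{T}$ and valuations $V$ and $V'$:

  • $[\![e_1]\!]_V = \emptyset$ if and only if $[\![F(e_1)]\!]^{\mathcal{T}}_{V'} = \emptyset$;

  • $[\![e_1]\!]_V \subseteq [\![e_2]\!]_V$ if and only if $[\![F(e_1) \wedge \neg F(e_2)]\!]^{\mathcal{T}}_{V'} = \emptyset$; and

  • $F(e_i)$ has linear size with respect to $e_i$ ($i = 1, 2$).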

4.2 Interleaving

The interleaving operator, sometimes called the shuffle operator, is a common extension of regular languages [27]. In particular, there is an interleaving operator in XML Schema and RelaxNG. Intuitively, the interleaving of two regular tree expressions matches the concatenation of the trees corresponding to the expressions regardless of their order. This operator does not add expressive power to regular languages; however, it is double-exponentially more succinct [19]. For instance, the interleaving of expressions pq and rs can be described as follows:

$$pq \,\&\, rs := pqrs + prqs + prsq + rpqs + rpsq + rspq.$$
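The six alternatives above can be reproduced mechanically. The following Python sketch works on plain strings of labels rather than on regular tree expressions, which is a simplifying assumption, and the function name `interleave` is hypothetical. It follows the usual recursive reading of the shuffle: the first symbol of the result comes from either operand, and the rest is the interleaving of what remains.

```python
def interleave(u, v):
    """Set of all interleavings of the words u and v, preserving the internal order of each."""
    if not u:
        return {v}
    if not v:
        return {u}
    take_from_u = {u[0] + w for w in interleave(u[1:], v)}
    take_from_v = {v[0] + w for w in interleave(u, v[1:])}
    return take_from_u | take_from_v


print(sorted(interleave("pq", "rs")))
```

For words of lengths i and j without common symbols there are (i + j)! / (i! j!) interleavings, which gives an idea of why expanding the operator is costly.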

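Definition 16 (Interleaving). The interleaving operator in regular tree expressions is inductively defined as follows:

$$\begin{aligned}
e \,\&\, \epsilon &:= e,\\
\epsilon \,\&\, e &:= e,\\
e_0 \,\&\, (e_1 + e_2) &:= (e_0 \,\&\, e_1) + (e_0 \,\&\, e_2),\\
(e_1 + e_2) \,\&\, e_0 &:= (e_0 \,\&\, e_1) + (e_0 \,\&\, e_2),\\
p_1[e_1]e_2 \,\&\, p_2[e_3]e_4 &:= p_1[e_1](e_2 \,\&\, p_2[e_3]e_4) + p_2[e_3](p_1[e_1]e_2 \,\&\, e_4).
\end{aligned}$$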

Counting formulas in μTLIC can be used to represent the interleaving of regular tree expressions. For this purpose, we restrict the expressions that can be interleaved. This restriction is defined by the following grammar:

$$e' := p[e] \mid e'\,e' \mid e' + e',$$

where $e$ is a regular tree expression without restrictions (Definition 13), and disjunctions have constant size, that is, for expressions $e_1' + e_2'$, we have that $|e_1'|^* = |e_2'|^*$, where:

$$|p[e]|^* = 1, \qquad |e_1 e_2|^* = |e_1|^* + |e_2|^*, \qquad |e_1 + e_2|^* = \max(|e_1|^*, |e_2|^*).$$

For instance, expressions of the form p&q* are disallowed. Notice, however, that recursion can occur at another level of interleaving, for instance, p&r[q*]. As an example of constant size disjunctions, (ppp+qq)&rrr is not allowed, because |ppp|* ≠ |qq|*. Instead, equally sized disjunctions can occur at the same level of interleaving, for example (ppp+qqq)&rrr. Notice that this restriction applies at top level only; hence expressions such as s[ppp+qq]rrr are perfectly allowed.
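The size function and the constant-size restriction on disjunctions can be sketched as follows. The tuple encoding of expressions and the names `word`, `star_size` and `constant_size` are assumptions made for this illustration, not notation from the paper; the content below a node is ignored, since the restriction applies at top level only.

```python
def word(labels):
    """Concatenation of leaf nodes, e.g. word("ppp") for the expression ppp."""
    expr = ("node", labels[0])
    for label in labels[1:]:
        expr = ("cat", expr, ("node", label))
    return expr


def star_size(expr):
    """Size |e|*: 1 for a node, sum for concatenation, max for disjunction."""
    tag = expr[0]
    if tag == "node":
        return 1
    if tag == "cat":
        return star_size(expr[1]) + star_size(expr[2])
    if tag == "alt":
        return max(star_size(expr[1]), star_size(expr[2]))
    raise ValueError(f"unknown expression tag: {tag}")


def constant_size(expr):
    """True when every top-level disjunction has operands of equal size."""
    tag = expr[0]
    if tag == "node":
        return True  # the content below a node is unrestricted
    ok = constant_size(expr[1]) and constant_size(expr[2])
    if tag == "alt":
        ok = ok and star_size(expr[1]) == star_size(expr[2])
    return ok


print(constant_size(("alt", word("ppp"), word("qq"))),
      constant_size(("alt", word("ppp"), word("qqq"))))
```

The first disjunction, ppp+qq, is rejected because its operands have sizes 3 and 2, while ppp+qqq is accepted.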

We then define the translation of interleaving as follows.

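Definition 17 (Translation of interleaving). Given two regular tree expressions $e_1'$ and $e_2'$, we translate the interleaving operator as follows:

$$F(e_1' \,\&\, e_2') := (F(e_1') = 1) \wedge (F(e_2') = 1) \wedge (\top = chsize(e_1' \,\&\, e_2')),$$

where $F(e_i')$ is a linear translation of the expression $e_i'$ into a μTLIC formula:

$$F(p[e]) := p \wedge F^{\downarrow}(e), \qquad F(e_1 e_2) := F(e_1) \wedge \mu x.(\langle\rightarrow\rangle F(e_2) \vee \langle\rightarrow\rangle x), \qquad F(e_1 + e_2) := F(e_1) \vee F(e_2).$$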

The translation F of unrestricted regular tree expressions is given in Definition 15, and chsize is defined as follows:

$$\begin{aligned}
chsize(p[e]) &= 1,\\
chsize(e_1 + e_2) &= \max(chsize(e_1), chsize(e_2)),\\
chsize(e_1 e_2) = chsize(e_1 \,\&\, e_2) &= chsize(e_1) + chsize(e_2).
\end{aligned}$$

Intuitively, chsize computes the number of children to be interleaved.

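As an example consider the expression p[qr&st]. This can be expressed in terms of μTLIC as follows:

$$\begin{aligned}
F(p[qr \,\&\, st]) &:= p \wedge (F(qr) = 1) \wedge (F(st) = 1) \wedge (\top = chsize(qr \,\&\, st))\\
&:= p \wedge ((q \wedge \mu x.(\langle\rightarrow\rangle r \vee \langle\rightarrow\rangle x)) = 1) \wedge ((s \wedge \mu x.(\langle\rightarrow\rangle t \vee \langle\rightarrow\rangle x)) = 1) \wedge (\top = 4).
\end{aligned}$$

Note that the concatenation order is preserved, that is, q always precedes r, and s always precedes t. However, no other order restriction is imposed; hence q, r, s and t may occur interleaved, as long as there are exactly 4 children.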
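The effect of this translation on the children of a p node can be checked by brute force. The sketch below is an illustration under an assumed reading of the formula: exactly one child is labeled q and followed later by an r, exactly one child is labeled s and followed later by a t, and there are exactly 4 children. Children are represented as sequences of labels, and the function names are hypothetical.

```python
from itertools import product


def count_followed(children, first, second):
    """Number of children labeled `first` that have a later sibling labeled `second`."""
    return sum(1 for i, label in enumerate(children)
               if label == first and second in children[i + 1:])


def matches_qr_and_st(children):
    """Counting constraints of the example: one q-then-r, one s-then-t, 4 children."""
    return (len(children) == 4
            and count_followed(children, "q", "r") == 1
            and count_followed(children, "s", "t") == 1)


# Enumerate every sequence of 4 children over the labels q, r, s, t.
accepted = {"".join(w) for w in product("qrst", repeat=4) if matches_qr_and_st(w)}
print(sorted(accepted))
```

Out of the 256 candidate sequences, only the six interleavings of qr and st satisfy the constraints, which is the expected behavior of the interleaving operator.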

From Theorem 8 and Definition 17, it is now easy to see we can efficiently reason on regular expressions with interleaving in terms of the satisfiability of μTLIC formulas.

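Theorem 9 (Reasoning on regular trees with interleaving). Given any two regular tree expressions $e_1$ and $e_2$ with interleaving, we have that for any tree $\mathcal{T}$ and valuations $V$ and $V'$ the following holds:

  • $[\![e_1]\!]_V = \emptyset$ if and only if $[\![F(e_1)]\!]^{\mathcal{T}}_{V'} = \emptyset$;

  • $[\![e_1]\!]_V \subseteq [\![e_2]\!]_V$ if and only if $[\![F(e_1) \wedge \neg F(e_2)]\!]^{\mathcal{T}}_{V'} = \emptyset$; and

  • $F(e_i)$ has linear size with respect to $e_i$ ($i = 1, 2$).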

Proof. For the first item, the proof goes by induction on the structure of $e_1$. All cases are identical to those in Theorem 8. We only show here the case of interleaving, that is, when $e_1$ has the form $e_1' \,\&\, e_2'$. Recall that:

$$F(e_1' \,\&\, e_2') := (F(e_1') = 1) \wedge (F(e_2') = 1) \wedge (\top = chsize(e_1' \,\&\, e_2')).$$

Now, it is proved by induction that $F(e_i')$ is the translation of $e_i'$ ($i = 1, 2$), that is, $[\![e_i']\!]_V = \emptyset$ if and only if $[\![F(e_i')]\!]^{\mathcal{T}}_{V'} = \emptyset$. Consider that $e_i'$ is of the form p[e]; then:

$$F(p[e]) := p \wedge F^{\downarrow}(e).$$

The argument in this case goes as in the corresponding case of Theorem 8. Consider now this case:

$$F(e_{i,1}' e_{i,2}') := F(e_{i,1}') \wedge \mu x.(\langle\rightarrow\rangle F(e_{i,2}') \vee \langle\rightarrow\rangle x).$$

This case is immediate, since $F(e_{i,j}')$ (for $j = 1, 2$) corresponds by induction to the translation of $e_{i,j}'$. The case for $e_{i,1}' + e_{i,2}'$ is also straightforward by induction.

Now, $F(e_i') = 1$ states that $F(e_i')$ occurs as a child only once. Nevertheless, there is no restriction on the order of children. Since the number of children to be interleaved is constant, $\top = chsize(e_1' \,\&\, e_2')$ fixes the number of children to be interleaved. Therefore:

$$[\![e_1' \,\&\, e_2']\!]_V \neq \emptyset \text{ if and only if } [\![F(e_1' \,\&\, e_2')]\!]^{\mathcal{T}}_{V'} \neq \emptyset.$$

The second item is analogous. The third one is straightforward by noticing that F does not introduce duplications.

4.3 Counting

Counting operators in regular languages restrict the occurrences of expressions with respect to natural numbers. For instance, $p^{[2,5]}$ denotes the finite concatenation of at least 2 and at most 5 p's. This can be expressed as follows:

$$pp + ppp + pppp + ppppp.$$

As one may easily notice, these counting operators do not provide more expressive power; however, they are exponentially more succinct [19], that is, expressing $p^{[a,b]}$, where a and b are natural numbers encoded in binary, results in an exponentially larger regular expression (without counting constructors).
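The expansion of a counting operator into a plain regular expression, and the size gap with respect to the binary encoding of its bounds, can be illustrated with a few lines of Python. The function `expand_counting` is a hypothetical helper working on strings, not part of the paper.

```python
def expand_counting(symbol, low, high):
    """Disjunction of all repetitions of `symbol` from `low` to `high` times."""
    return "+".join(symbol * k for k in range(low, high + 1))


print(expand_counting("p", 2, 5))

# Even a single bound written with 11 bits already expands to 1024 symbols.
bound = 1024
print(bound.bit_length(), len(expand_counting("p", bound, bound)))
```

The length of the expansion grows with the value of the bounds, whereas the counting operator only stores their binary representation.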

One may think that counting restrictions in regular languages may be easily expressed by counting formulas in μTLIC. However, recall that counting formulas do not impose any occurrence order, whereas counting restrictions in regular languages do: expressions must be consecutively concatenated. In contrast with the counting regular expressions in [4,5], where there is no order preservation, here we show that counting Presburger formulas may impose order restrictions on counting regular expressions, as in [28,19]. Furthermore, in the current work we consider a more general form of counting than [28,19], because counting expressions may not exhibit an upper bound.

Definition 18 (Counting regular tree expressions). Counting regular tree expressions are defined as follows:

$$p[e'^{[a,b]}],$$

where $e'$ is a regular expression without recursion at top level, that is, $e' := p[e] \mid e'\,e' \mid e' + e'$, where $e$ is a regular expression without restrictions, $a$ is a natural number encoded in binary, and $b$ is either a natural number greater than $a$ encoded in binary or ∞.

Intuitively, $e^{[a,b]}$ stands for the successive concatenation of $e$, such that it occurs at least $a$ times and at most $b$ times. If $b$ is ∞, then there is no upper bound.

This can be seen as a generalization of the Kleene star, which can be expressed by $e^{[0,\infty]}$. It is worth noticing that although the general recursion operator is not allowed to occur inside the top level of the counting operator, other forms of recursion, as seen above with the Kleene star, can be used. It is also important to note that with this subtle extension, allowing no upper bound, counting expressions become exponentially more concise. This can be seen when expressing $e^{[a,\infty]}$, which can be encoded as $e^{[a,a]}e^*$. The exponential gain becomes more evident when this duplication of $e$ occurs in expressions with nested counting.

Definition 19 (Translation of counting). We translate the counting operator as follows:

$$F(p[e'^{[a,b]}]) := p \wedge (a \leq \top \leq b) \wedge (\top = F(e')),$$

where $e'$ is a regular expression without recursion at top level.

Consider as an example the following expression: $q[p^{[2,5]}]$. This can be expressed in terms of μTLIC formulas as follows:

$$q \wedge (2 \leq \top \leq 5) \wedge (\top = p).$$

This formula means that q nodes have at least 2 p children, but no more than 5.
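As a small illustration of what the counting formula requires of the children of a q node, the following Python sketch checks a list of child labels against a lower bound, an optional upper bound, and the requirement that all children carry the counted label. The function name and the representation of children are assumptions made for this example, and `high=None` stands for the unbounded case ∞.

```python
def satisfies_counting(children, label, low, high=None):
    """True when all children carry `label` and their number lies in [low, high]."""
    return (low <= len(children)
            and (high is None or len(children) <= high)
            and all(child == label for child in children))


print(satisfies_counting(["p", "p", "p"], "p", 2, 5))   # within bounds
print(satisfies_counting(["p"], "p", 2, 5))             # too few children
print(satisfies_counting(["p", "q"], "p", 2, 5))        # a child is not p
print(satisfies_counting(["p"] * 7, "p", 2))            # no upper bound
```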

From Theorem 8 and Definition 19, reasoning on regular expressions with counting and interleaving can clearly be reduced to the satisfiability of μTLIC formulas. One may have noticed that the translation of counting regular tree expressions is not linear when the resulting counting formula is considered as syntactic sugar. We then consider μTLIC extended in the obvious way with the additional counting operators. The satisfiability algorithm for μTLIC presented in Section 3 can be easily extended with the syntactic sugar operator for counting.

Theorem 10 (Reasoning on regular trees with interleaving and counting). Given any two regular tree expressions $e_1$ and $e_2$ with interleaving and counting operators, we have that for any tree $\mathcal{T}$ and valuations $V$ and $V'$ the following holds:

  • $[\![e_1]\!]_V = \emptyset$ if and only if $[\![F(e_1)]\!]^{\mathcal{T}}_{V'} = \emptyset$;

  • $[\![e_1]\!]_V \subseteq [\![e_2]\!]_V$ if and only if $[\![F(e_1) \wedge \neg F(e_2)]\!]^{\mathcal{T}}_{V'} = \emptyset$; and

  • $F(e_i)$ has linear size with respect to $e_i$ ($i = 1, 2$).

Proof. For the first item, the proof goes by induction on the structure of $e_1$. All cases are identical to those in Theorem 8. The case of interleaving was shown in Theorem 9. Here, we only show the case of counting: $[\![p[e'^{[a,b]}]]\!]_V = \emptyset$ if and only if $[\![p \wedge (a \leq \top \leq b) \wedge (\top = F(e'))]\!]^{\mathcal{T}}_{V'} = \emptyset$. It is shown by induction that $F(e')$ corresponds to the translation of $e'$:

$$[\![e']\!]_V = \emptyset \text{ if and only if } [\![F(e')]\!]^{\mathcal{T}}_{V'} = \emptyset.$$

This was already shown in the proof of Theorem 9. Then, it follows that $\top = F(e')$ restricts all children to match $F(e')$. In addition, $a \leq \top \leq b$ constrains the number of children to be at least $a$ but no more than $b$.

The second item is analogous. The third one is straightforward by noticing that F does not introduce duplications.

5 Regular Counting Paths

XPath is a query language for semi-structured data (XML), its navigation core is known as regular paths, and it corresponds to the First Order Logic with two variables FO2 [26]. We now introduce an extension of regular paths, considered in the specification of XPath [36], consisting of Presburger arithmetical constraints on children paths, that is, regular paths expressing children relations. We also give a new EXPTIME bound for reasoning on this counting extension of regular paths.

Definition 20 (Counting paths syntax). We inductively define regular paths expressions with Presburger constraints by the following grammar:

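$$\begin{aligned}
\alpha &:= \downarrow \mid \rightarrow \mid \uparrow \mid \leftarrow \mid \downarrow^* \mid \uparrow^*,\\
\varrho &:= \top \mid \alpha \mid p \mid \alpha{:}p \mid \varrho/\varrho \mid \varrho[\beta],\\
\beta &:= \kappa > b \mid \varrho \mid \beta \wedge \beta \mid \neg\beta,\\
\kappa &:= a\varrho \mid \kappa + \kappa,\\
\rho &:= \varrho \mid /\rho \mid \rho \cup \rho \mid \rho \cap \rho \mid \rho \setminus \rho,
\end{aligned}$$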

where p is a proposition and a and b are integers encoded in binary, b is non-negative.

In order to ensure decidability, path expressions occurring in the scope of a counting operator > are restricted to children, that is, they are expressions of the forms ↓, ↓:p, ↓[β] or ↓:p[β].

We now give a formal description of the interpretation of regular paths with Presburger constraints.

Definition 21 (Counting paths semantics). Given a tree structure T, regular paths with Presburger constraints are interpreted as follows:

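$$\begin{aligned}
[\![\top]\!]^{\mathcal{T}} &= N \times N,\\
[\![p]\!]^{\mathcal{T}} &= \{(n, n) \mid p \in L(n)\},\\
[\![\alpha]\!]^{\mathcal{T}} &= \{(n_1, n_2) \mid n_1 \xrightarrow{\alpha} n_2\},\\
[\![\alpha{:}p]\!]^{\mathcal{T}} &= \{(n_1, n_2) \in [\![\alpha]\!]^{\mathcal{T}} \mid p \in L(n_2)\},\\
[\![\varrho_1/\varrho_2]\!]^{\mathcal{T}} &= [\![\varrho_1]\!]^{\mathcal{T}} \circ [\![\varrho_2]\!]^{\mathcal{T}},\\
[\![\varrho[\beta]]\!]^{\mathcal{T}} &= \{(n_1, n_2) \in [\![\varrho]\!]^{\mathcal{T}} \mid n_2 \in [\![\beta]\!]^{\mathcal{T}}\},\\
[\![/\varrho]\!]^{\mathcal{T}} &= \{(r, n) \in [\![\varrho]\!]^{\mathcal{T}} \mid r \text{ is the root}\},\\
[\![\rho_1 \cup \rho_2]\!]^{\mathcal{T}} &= [\![\rho_1]\!]^{\mathcal{T}} \cup [\![\rho_2]\!]^{\mathcal{T}},\\
[\![\rho_1 \cap \rho_2]\!]^{\mathcal{T}} &= [\![\rho_1]\!]^{\mathcal{T}} \cap [\![\rho_2]\!]^{\mathcal{T}},\\
[\![\rho_1 \setminus \rho_2]\!]^{\mathcal{T}} &= [\![\rho_1]\!]^{\mathcal{T}} \setminus [\![\rho_2]\!]^{\mathcal{T}},
\end{aligned}$$

where $n_1 \xrightarrow{\alpha} n_2$ holds if and only if $n_1$ is related to $n_2$ through $\alpha$ in $\mathcal{T}$, and the interpretation of qualifiers ($\beta$) is the following.

$$\begin{aligned}
[\![\kappa > b]\!]^{\mathcal{T}} &= \{n \mid [\![\kappa]\!]^{\mathcal{T}}(n) > b\},\\
[\![a\zeta]\!]^{\mathcal{T}}(n) &= a * |\{n_1 \mid (n, n_1) \in [\![\zeta]\!]^{\mathcal{T}}\}|,\\
[\![\kappa_1 + \kappa_2]\!]^{\mathcal{T}}(n) &= [\![\kappa_1]\!]^{\mathcal{T}}(n) + [\![\kappa_2]\!]^{\mathcal{T}}(n),\\
[\![\varrho]\!]^{\mathcal{T}} &= \{n_1 \mid (n_1, n_2) \in [\![\varrho]\!]^{\mathcal{T}}\},\\
[\![\neg\beta]\!]^{\mathcal{T}} &= N \setminus [\![\beta]\!]^{\mathcal{T}},\\
[\![\beta_1 \wedge \beta_2]\!]^{\mathcal{T}} &= [\![\beta_1]\!]^{\mathcal{T}} \cap [\![\beta_2]\!]^{\mathcal{T}}.
\end{aligned}$$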

Intuitively, regular paths are interpreted over tree structures as pairs of nodes. The left nodes, known as the context, represent where the path is evaluated from, and the right nodes denote the selection of the path. The axis relation ↓, as in μTLIC, stands for the children relation, → for the following sibling relation, ↑ for parents, ← for previous siblings, $\downarrow^*$ for descendants, and $\uparrow^*$ for ancestors.

Basic paths $\alpha{:}p$ denote pairs of nodes such that the right node of the pair is labeled by $p$ and is related to the left node of the pair by $\alpha$. For instance, $\downarrow^*{:}p$ stands for the pairs of nodes such that the right node is labeled by $p$ and is a descendant of the left node. $\varrho/\varrho$ stands for the composition of paths. For example, $\downarrow{:}p/\downarrow^*{:}q$ intuitively navigates first to the children named p, and from there to the q descendants. $\varrho[\beta]$ denotes the pairs of nodes of $\varrho$ that satisfy $\beta$, which is a Boolean expression composed of regular paths and Presburger children paths. Consider for instance the path $\downarrow{:}p[\downarrow^*{:}q]$: in contrast with the previous example, this expression denotes the p children having at least one descendant named q. In Presburger expressions $\kappa > b$, the paths occurring in $\kappa$ are children paths. For example, $\downarrow{:}p[\downarrow{:}q - \downarrow{:}r > 0]$ denotes the p children with more q children than r children. Another example is $\downarrow{:}p[\downarrow{:}q > 5]$, which denotes the p children with more than 5 q children. We also use the following syntactic sugar for qualifiers:

$$\begin{aligned}
\beta_1 \vee \beta_2 &:= \neg(\neg\beta_1 \wedge \neg\beta_2),\\
\kappa \leq b &:= \neg(\kappa > b),\\
\kappa = b &:= (\kappa \leq b) \wedge (\kappa > b - 1),\\
a_1\varrho_1 \,\#\, a_2\varrho_2 &:= a_1\varrho_1 + (-a_2)\varrho_2 \,\#\, 0,
\end{aligned}$$

where # stands for <, ≤, ≥, =. $/\rho$ stands for the pairs of nodes denoted by $\rho$ such that the left node of the pair is the root. Union, intersection and difference of paths are expressed as $\rho \cup \rho'$, $\rho \cap \rho'$ and $\rho \setminus \rho'$, respectively.

Regular paths can be linearly translated in terms of the μ-calculus [4]. Consider for instance the following expression: $\downarrow{:}p[\downarrow^*{:}q]$. This path, evaluated from any node (context), selects the p children with at least one q descendant. Nodes selected by this path can be expressed by the following formula:

pμx.qx.

Arithmetical constraints on children paths can be expressed by μTLIC counting expressions. For example, $\downarrow{:}p[\downarrow{:}q > b]$ selects the p nodes with at least b + 1 children named q. This can be easily written in terms of μTLIC formulas as follows:

$$p \wedge (q > b).$$

As another example consider $\downarrow{:}p[\downarrow{:}q = \downarrow{:}r]$, which selects the p children with the same number of children named q and r. In terms of μTLIC, we then write:

$$p \wedge (q = r).$$
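The two example queries above can be evaluated directly on a small tree. The Python sketch below is only an illustration: trees are encoded as dictionaries, and the helper names `node`, `count_children`, `select_children` and `presburger` are hypothetical. A Presburger qualifier is given as a mapping from child labels to integer coefficients together with a bound, and it holds at a node when the weighted sum of its child counts exceeds the bound.

```python
def node(label, *children):
    """Build a tree node with a label and an ordered list of children."""
    return {"label": label, "children": list(children)}


def count_children(n, label):
    """Number of children of `n` carrying `label`."""
    return sum(1 for child in n["children"] if child["label"] == label)


def select_children(context, label, qualifier=lambda n: True):
    """Children of `context` named `label` that satisfy `qualifier`."""
    return [child for child in context["children"]
            if child["label"] == label and qualifier(child)]


def presburger(coefficients, bound):
    """Qualifier: sum of coefficient * (number of children with that label) > bound."""
    return lambda n: sum(a * count_children(n, label)
                         for label, a in coefficients.items()) > bound


root = node("s",
            node("p", node("q"), node("q"), node("r")),
            node("p", node("q"), node("r")),
            node("p", node("q"), node("q")))

# p children with more q children than r children (q - r > 0).
more_q_than_r = select_children(root, "p", presburger({"q": 1, "r": -1}, 0))

# p children with as many q children as r children (q = r).
same_q_and_r = select_children(
    root, "p", lambda n: count_children(n, "q") == count_children(n, "r"))

print(len(more_q_than_r), len(same_q_and_r))
```

On this tree, the first and third p nodes have more q than r children, and only the second one has as many q as r children.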

When characterizing regular paths in terms of μTLIC formulas, we can denote the context from where paths are evaluated by some other formula. We usually denote this context formula by C.

Definition 22 (Translation of counting paths). We define the translation of regular paths with Presburger constraints, with respect to a context formula C, as follows:

F,C:=C,F,C:=C,F,C:=C,F,C:=C,F,C:=μx.Cx,F,C:=μx.Cx,Fα:p,C:=Fα,Cp,Fϱ1/ϱ2,C:=Fϱ2,Fϱ1,C,Fϱβ,C:=Fϱ,CF'β,,F/ϱ,C:=Fϱ,C¬,Fρ1ρ2,C:=Fρ1,CFρ2,C,Fρ1ρ2,C:=Fρ1,CFρ2,C,Fρ1ρ2,C:=Fρ1,C¬Fρ2,C.

where the translation of qualifiers F' is defined as follows.

F'κ>b,C:=F'κ>b,F'aϱ,C:=aF''ϱ,F'κ1+κ2,C:=F'κ1+F'κ2,F'α,C:=Fα-,C,F'α:p,C:=Fα-:p,C,F'ϱ1/ϱ2,C:=F'ϱ1,F'ϱ2,C,F'ϱβ,C:=F'ϱ,F'β,C,F'¬β,C:=¬F'β,C,F'β1β2,C:=F'β1,CF'β2,C,

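where $\overline{\alpha}$ is the dual of $\alpha$, that is, $\overline{\downarrow} = \uparrow$, $\overline{\rightarrow} = \leftarrow$, $\overline{\downarrow^*} = \uparrow^*$, and $\overline{\overline{\alpha}} = \alpha$, and $F''$ translates children paths as follows:

$$F''(\downarrow) := \top, \qquad F''(\downarrow{:}p) := p, \qquad F''(\downarrow[\beta]) := F'(\beta, \top), \qquad F''(\downarrow{:}p[\beta]) := p \wedge F'(\beta, \top).$$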

From this translation, it is then clear that μTLIC can be used as a query reasoning framework for regular paths with Presburger constraints.

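Theorem 11 (Counting paths reasoning). For any regular path queries with Presburger constraints $\rho_1$ and $\rho_2$, any formula $C$, any tree structure $\mathcal{T}$, and any valuation $V$, the following holds:

  • $[\![F(\rho_1, C)]\!]^{\mathcal{T}}_V = \{n \mid (n', n) \in [\![\rho_1]\!]^{\mathcal{T}},\ n' \in [\![C]\!]^{\mathcal{T}}_V\}$;

  • $[\![\rho_1]\!]^{\mathcal{T}} \subseteq [\![\rho_2]\!]^{\mathcal{T}}$ if and only if $[\![F(\rho_1, \top) \wedge \neg F(\rho_2, \top)]\!]^{\mathcal{T}}_V = \emptyset$; and

  • $F(\rho_i)$ has linear size with respect to $\rho_i$ ($i = 1, 2$).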

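Proof. The proof of the first item goes by induction on the structure of the input query. Base cases are immediate, as well as most inductive ones. We only consider the case when the input query has the form $\varrho[\kappa > b]$. We then use another induction on the structure of $\kappa$. Consider the case $\varrho[a\!\downarrow{:}p > b]$. According to Definition 22, we obtain the following:

$$F(\varrho[a\!\downarrow{:}p > b], C) := F(\varrho, C) \wedge (ap > b).$$

Now, by induction, we know that for any tree $\mathcal{T}$ and valuation $V$, it is the case that:

$$[\![F(\varrho, C)]\!]^{\mathcal{T}}_V = \{n \mid (n', n) \in [\![\varrho]\!]^{\mathcal{T}},\ n' \in [\![C]\!]^{\mathcal{T}}_V\}.$$

Since $ap > b$ holds in nodes where $a$ times the number of p children is greater than $b$, it is then easy to see that:

$$[\![F(\varrho[a\!\downarrow{:}p > b], C)]\!]^{\mathcal{T}}_V = \{n \mid (n', n) \in [\![\varrho[a\!\downarrow{:}p > b]]\!]^{\mathcal{T}},\ n' \in [\![C]\!]^{\mathcal{T}}_V\}.$$

Other base cases for $\kappa$ are analogous. Consider now the following input query: $\varrho[\kappa_1 + \kappa_2 > b]$. This is translated as follows:

$$F(\varrho, C) \wedge (F'(\kappa_1) + F'(\kappa_2) > b).$$

As in the base cases, by structural induction on paths, we know that $\varrho$ exactly corresponds to its translation. By structural induction on children paths, we then obtain that $\kappa_1$ and $\kappa_2$ also correspond to their respective translations. We can then infer the following:

$$[\![F(\varrho[\kappa_1 + \kappa_2 > b], C)]\!]^{\mathcal{T}}_V = \{n \mid n' \in [\![C]\!]^{\mathcal{T}}_V,\ (n', n) \in [\![\varrho[\kappa_1 + \kappa_2 > b]]\!]^{\mathcal{T}}\}.$$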

The second item is an immediate consequence of the first one.

Regarding the third item, since the translation does not introduce duplications (Definition 22), the proof goes straightforward by structural induction.

Corollary 1 (Query reasoning in the presence of schemas). Given any two regular tree expressions $e_1$ and $e_2$ with interleaving and counting operators, any regular paths with Presburger constraints $\varrho_1$ and $\varrho_2$, and any formula $C$, we have that for any tree $\mathcal{T}$ and valuation $V$ the following holds:

  • a query $\varrho_1$ is empty in the presence of a regular tree (schema) $e_1$ if and only if $[\![F(\varrho_1, C) \wedge F(e_1)]\!]^{\mathcal{T}}_V = \emptyset$;

  • a query $\varrho_1$ in the presence of a regular tree (schema) $e_1$ is contained in a query $\varrho_2$ in the presence of a regular tree $e_2$ if and only if $[\![F(\varrho_1, C) \wedge F(e_1) \wedge \neg(F(\varrho_2, C) \wedge F(e_2))]\!]^{\mathcal{T}}_V = \emptyset$; and

  • $F(e_i)$ and $F(\varrho_i, C)$ have linear size with respect to $e_i$, $\varrho_i$ ($i = 1, 2$), and $C$.

6 Conclusions

We introduced a modal logic for trees with a fixed point, inverse programs, and Presburger constraints (μTLIC). This logic can be seen as the fully enriched μ-calculus for trees extended with Presburger constraints.

Regular tree languages (XML schemas) can be linearly captured by the logic. We introduced extensions of regular trees with interleaving and counting operators. These extensions can also be linearly characterized by μTLIC. Moreover, regular path queries (XPath) with Presburger constraints on children paths can also be linearly translated into μTLIC formulas.

Since the logic is closed under negation, it can be used as an XML reasoning framework for counting extensions of XPath and XML schemas. We showed that the logic is decidable in single exponential time, even if the Presburger constraints are encoded in binary.

This result implies new EXPTIME bounds on XPath counting fragments and regular tree extensions with interleaving and counting.

In [6,3], decidable classes of ranked trees with counting and (dis)equality constraints are studied. As a further research perspective, we are interested in the relation of counting and equality constraints on unranked trees. We believe efficient decidability algorithms may be extracted from the modal logic approach.

In another setting, arithmetical constraints on trees have also been successfully used in the verification of balanced tree structures such as AVL or red-black trees [25,21].

We believe another field of application for the logic presented in the current work is in the verification of balanced tree structures. We also believe the logic can be used as an expressive framework in context-aware systems [7,23], where counting constraints play a key role when modeling location/distance variables.

References

1. Aminof, B., Murano, A., & Rubin, S. (2018). CTL* with graded path modalities. Inf. Comput., 262(Part), 1-21.

2. Areces, C., Hoffmann, G., & Denis, A. (2010). Modal logics with counting. In Dawar, A. & de Queiroz, R. J. G. B., editors, Logic, Language, Information and Computation, WoLLIC 2010, volume 6188 of Lecture Notes in Computer Science. Springer, 98-109.

3. Bárcenas, E., Benítez-Guerrero, E., & Lavalle, J. (2016). On regular paths with counting and data tests. Electr. Notes Theor. Comput. Sci., 328, 3-16.

4. Bárcenas, E., Genevès, P., Layaïda, N., & Schmitt, A. (2011). Query reasoning on trees with types, interleaving, and counting. In Walsh, T., editor, IJCAI. IJCAI/AAAI, 718-723.

5. Bárcenas, E. & Lavalle, J. (2014). Global numerical constraints on trees. Logical Methods in Computer Science, 10(2).

6. Barguñó, L., Creus, C., Godoy, G., Jacquemard, F., & Vacher, C. (2013). Decidable classes of tree automata mixing local and global constraints modulo flat theories. Logical Methods in Computer Science, 9(2).

7. Bettini, C., Brdiczka, O., Henricksen, K., Indulska, J., Nicklas, D., Ranganathan, A., & Riboni, D. (2010). A survey of context modelling and reasoning techniques. Pervasive and Mobile Computing, 6(2), 161-180.

8. Bianco, A., Mogavero, F., & Murano, A. (2012). Graded computation tree logic. ACM Trans. Comput. Log., 13(3), 25.

9. Bonatti, P. A., Lutz, C., Murano, A., & Vardi, M. Y. (2006). The complexity of enriched mu-calculi. In Bugliesi, M., Preneel, B., Sassone, V., & Wegener, I., editors, ICALP, volume 4052 of Lecture Notes in Computer Science. Springer, 540-551.

10. Bonatti, P. A., Lutz, C., Murano, A., & Vardi, M. Y. (2008). The complexity of enriched mu-calculi. Logical Methods in Computer Science, 4(3).

11. Bonatti, P. A. & Peron, A. (2004). On the undecidability of logics with converse, nominals, recursion and counting. Artif. Intell., 158(1), 75-96.

12. Calvanese, D., Giacomo, G. D., Lenzerini, M., & Vardi, M. Y. (2010). Node selection query languages for trees. In Fox, M. & Poole, D., editors, AAAI. AAAI Press.

13. Charatonik, W. & Witkowski, P. (2013). Two-variable logic with counting and trees. In 28th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS. IEEE Computer Society, 73-82.

14. Charatonik, W. & Witkowski, P. (2016). Two-variable logic with counting and trees. ACM Trans. Comput. Log., 17(4), 31:1-31:27.

15. Colazzo, D., Ghelli, G., Pardini, L., & Sartiani, C. (2013). Efficient asymmetric inclusion of regular expressions with interleaving and counting for XML type-checking. Theor. Comput. Sci., 492, 88-116.

16. Demri, S. & Lugiez, D. (2010). Complexity of modal logics with Presburger constraints. J. Applied Logic, 8(3), 233-252.

17. Droste, M. & Vogler, H. (2011). Weighted logics for unranked tree automata. Theory of Computing Systems, 48(1), 23-47.

18. Fischer, M. J. & Ladner, R. E. (1977). Propositional modal logic of programs (extended abstract). In Hopcroft, J. E., Friedman, E. P., & Harrison, M. A., editors, Proceedings of the 9th Annual ACM Symposium on Theory of Computing. ACM, 286-294.

19. Gelade, W. (2010). Succinctness of regular expressions with interleaving, intersection and counting. Theor. Comput. Sci., 411(31-33), 2987-2998.

20. Genevès, P., Layaïda, N., Schmitt, A., & Gesbert, N. (2015). Efficiently deciding μ-calculus with converse over finite trees. ACM Trans. Comput. Log., 16(2), 16.

21. Habermehl, P., Iosif, R., & Vojnar, T. (2010). Automata-based verification of programs with tree updates. Acta Inf., 47(1), 1-31.

22. Hosoya, H., Vouillon, J., & Pierce, B. C. (2005). Regular expression types for XML. ACM Trans. Program. Lang. Syst., 27(1), 46-90.

23. Limón, Y., Bárcenas, E., Benítez-Guerrero, E., & Molero, G. (2018). On the consistency of context-aware systems. Journal of Intelligent and Fuzzy Systems, 34(5), 3373-3383.

24. Malvone, V., Mogavero, F., Murano, A., & Sorrentino, L. (2018). Reasoning about graded strategy quantifiers. Inf. Comput., 259(3), 390-411.

25. Manna, Z., Sipma, H. B., & Zhang, T. (2007). Verifying balanced trees. In Artëmov, S. N. & Nerode, A., editors, LFCS, volume 4514 of Lecture Notes in Computer Science. Springer. ISBN 978-3-540-72732-3, 363-378.

26. Marx, M. (2005). Conditional XPath. ACM Trans. Database Syst., 30(4), 929-959.

27. Mayer, A. J. & Stockmeyer, L. J. (1994). Word problems-this time with interleaving. Inf. Comput., 115(2), 293-311.

28. Meyer, A. R. & Stockmeyer, L. J. (1972). The equivalence problem for regular expressions with squaring requires exponential space. In 13th Annual Symposium on Switching and Automata Theory. IEEE Computer Society, 125-129.

29. Murata, M., Lee, D., Mani, M., & Kawaguchi, K. (2005). Taxonomy of XML schema languages using formal language theory. ACM Trans. Internet Techn., 5(4), 660-704.

30. Papadimitriou, C. H. (1981). On the complexity of integer programming. J. ACM, 28(4), 765-768.

31. Seidl, H., Schwentick, T., & Muscholl, A. (2003). Numerical document queries. In Neven, F., Beeri, C., & Milo, T., editors, PODS. ACM. ISBN 1-58113-670-6, 155-166.

32. Seidl, H., Schwentick, T., & Muscholl, A. (2008). Counting in trees. In Flum, J., Grädel, E., & Wilke, T., editors, Logic and Automata, volume 2 of Texts in Logic and Games. Amsterdam University Press, 575-612.

33. Seidl, H., Schwentick, T., Muscholl, A., & Habermehl, P. (2004). Counting in trees for free. In Díaz, J., Karhumäki, J., Lepistö, A., & Sannella, D., editors, ICALP, volume 3142 of Lecture Notes in Computer Science. Springer, 1136-1149.

34. Sorrentino, L., Rubin, S., & Murano, A. (2018). Graded CTL* over finite paths. In Aldini, A. & Bernardo, M., editors, Proceedings of the 19th Italian Conference on Theoretical Computer Science, volume 2243 of CEUR Workshop Proceedings. CEUR-WS.org, 152-161.

35. Tarski, A. (1955). A lattice-theoretical fixpoint theorem and its applications. Pacific J. Math., 5(2), 285-309.

36. ten Cate, B. & Marx, M. (2009). Axiomatizing the logical core of XPath 2.0. Theory Comput. Syst., 44(4), 561-589.

37. Tobies, S. (2001). Complexity results and practical algorithms for logics in knowledge representation. Ph.D. thesis, RWTH Aachen University, Germany.

38. Venema, Y. (2012). Lecture Notes on the modal μ-calculus. The University of Amsterdam.

39. Zawidzki, M., Schmidt, R. A., & Tishkovsky, D. (2013). Satisfiability problem for modal logic with global counting operators coded in binary is NExpTime-complete. Inf. Process. Lett., 113(1-2), 34-38.

1 In the XML setting, exactly one proposition occurs at each node.

Received: April 27, 2018; Accepted: October 29, 2019

* Corresponding author is Everardo Bárcenas. ebarcenas@unam.mx

This is an open-access article distributed under the terms of the Creative Commons Attribution License.