Using BiLSTM in Dependency Parsing for Vietnamese

Thi, Luong Nguyen; My, Linh Ha; Minh, Huyen Nguyen Thi; Le-Hong, Phuong

doi:10.13053/cys-22-3-3023


Computación y Sistemas

On-line version ISSN 2007-9737; Print version ISSN 1405-5546

Comp. y Sist. vol.22 n.3 Ciudad de México Jul./Sep. 2018

https://doi.org/10.13053/cys-22-3-3023

Articles of the Thematic Issue

Using BiLSTM in Dependency Parsing for Vietnamese

Luong Nguyen Thi¹

Linh Ha My²

Huyen Nguyen Thi Minh²

Phuong Le-Hong²

¹ Dalat University, Lamdong, Vietnam

² VNU University of Science, Hanoi, Vietnam

Abstract:

Recently, deep learning methods have achieved good results in dependency parsing for many natural languages. In this paper, we investigate the use of bidirectional long short-term memory (BiLSTM) network models for both transition-based and graph-based dependency parsing of Vietnamese. We also report our contribution to building a Vietnamese dependency treebank whose tagset conforms to the Universal Dependencies schema. Various experiments demonstrate the efficiency of this method, which achieves the best parsing accuracy in comparison with other existing approaches on the same corpus, with an unlabeled attachment score (UAS) of 84.45% and a labeled attachment score (LAS) of 78.56%.

Keywords: Deep learning; BiLSTM; dependency parsing; Vietnamese

1 Introduction

Dependency parsers fall into two main families: graph-based and transition-based (Kübler et al., 2009). Given a sentence s, a graph-based algorithm finds the highest-scoring parse tree among all possible outputs, while a transition-based algorithm builds a parse through a sequence of actions. In recent years, many researchers have developed deep learning approaches that reach high accuracy for English, Chinese and other languages. Chen and Manning proposed a novel way of learning a neural network classifier for a greedy, transition-based dependency parser, which achieved UAS=92.2% and LAS=89.7% on the English Penn Treebank [¹].

Dyer et al. (2015) [³] presented stack LSTMs, recurrent neural networks for sequences with push and pop operations, and used them to implement a state-of-the-art transition-based dependency parser with UAS=93.2% and LAS=90.9% for English. Kiperwasser and Goldberg (2016) [⁵] presented a simple and effective scheme for dependency parsing based on bidirectional LSTMs (BiLSTMs), which obtained UAS=93.8% and LAS=91.5% for English. Dozat and Manning (2016) [²] built on the approach of Kiperwasser and Goldberg, using neural attention in a simple graph-based dependency parser. Their parser achieved state-of-the-art or near state-of-the-art performance on standard treebanks in six different languages, with 95.7% UAS and 94.1% LAS on the most popular English PTB dataset.

Several studies have addressed Vietnamese dependency parsing. In 2008, Nguyễn Lê Minh et al. [¹²] used the MST parser on a corpus of 450 sentences. In 2012, Phuong Le et al. [⁶] applied a lexicalized tree-adjoining grammar parser trained on a subset of the Vietnamese treebank. In 2013, Thi-Luong et al. [¹⁸] used MaltParser on a Vietnamese dependency treebank that was converted automatically from a Vietnamese treebank. One year later, Dat et al. [¹⁴] presented a new conversion method to automatically transform a constituent-based Vietnamese treebank into dependency trees.

In 2015, Phuong Le et al. [⁸] improved the accuracy of Vietnamese dependency parsing by using distributed word representations from the Skip-gram and GloVe models in transition-based dependency parsing. In 2016, Thi-Luong et al. [¹⁶] used Skip-gram distributed word representations in graph-based dependency parsing for Vietnamese, and Dat et al. [¹³] presented an empirical study of Vietnamese dependency parsing. In 2017, Kiem Hieu [¹⁵] presented work on building BKTreebank, a dependency treebank for Vietnamese.

1.1 Transition-Based Dependency Parsing

A transition system has a set of configurations and a set of transitions that are applied to configurations. To parse a sentence, the system is initialized to an initial configuration based on the input sentence, and transitions are repeatedly applied to this configuration. After a finite number of transitions, the system arrives at a terminal configuration, and a parse tree is read off the terminal configuration. In a greedy parser, a classifier chooses the transition to take in each configuration, based on features extracted from the configuration itself. The parsing algorithm is presented in Algorithm 1 below.

Algorithm 1 Greedy transition-based parsing

Several transition-based systems [⁷] are in common use, such as the arc-eager and arc-standard algorithms. In this work, however, we employ the arc-hybrid system, which is similar to these. In the arc-hybrid system, a configuration c = (α, β, T) consists of a stack α, a buffer β, and a set T of dependency arcs.

Both the stack and the buffer hold integer indices pointing to sentence elements. Given a sentence s = w₁, w₂,..., w_n, the system is initialized with an empty stack, an empty arc set, and β = 1,..., n, ROOT, where ROOT is the special root index. Any configuration c with an empty stack and a buffer containing only ROOT is terminal, and the parse tree is given by the arc set T_c of c. The arc-hybrid system allows 3 possible transitions, SHIFT, LEFT and RIGHT, defined as:

— SHIFT[(α, b₀|β, T)] = (α|b₀, β, T),
— LEFT_l[(α|s₁|s₀, b₀|β, T)] = (α|s₁, b₀|β, T ∪ {(b₀, s₀, l)}),
— RIGHT_l[(α|s₁|s₀, β, T)] = (α|s₁, β, T ∪{(s₁, s₀, l)}).
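The three transitions and the greedy loop can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments: encoding ROOT as index 0, the helper names (`initial`, `legal`, `apply`, `greedy_parse`) and the hand-written `toy_score` function are all assumptions made for the example. In the actual parser the scores come from a trained classifier.

```python
from typing import Callable, List, Optional, Set, Tuple

ROOT = 0  # assumption: ROOT is encoded as index 0, words are 1..n
Arc = Tuple[int, int, str]                      # (head, dependent, label)
Config = Tuple[List[int], List[int], Set[Arc]]  # (stack, buffer, arcs)
Transition = Tuple[str, Optional[str]]          # e.g. ("LEFT", "nsubj")


def initial(n: int) -> Config:
    """Empty stack, buffer 1..n followed by ROOT, empty arc set."""
    return [], list(range(1, n + 1)) + [ROOT], set()


def is_terminal(c: Config) -> bool:
    stack, buffer, _ = c
    return not stack and buffer == [ROOT]


def legal(c: Config, labels: List[str]) -> List[Transition]:
    """Transitions whose preconditions hold in configuration c."""
    stack, buffer, _ = c
    moves: List[Transition] = []
    if buffer and buffer[0] != ROOT:      # ROOT is never shifted
        moves.append(("SHIFT", None))
    if stack and buffer:                  # LEFT needs s0 and b0
        moves += [("LEFT", l) for l in labels]
    if len(stack) >= 2:                   # RIGHT needs s1 and s0
        moves += [("RIGHT", l) for l in labels]
    return moves


def apply(c: Config, t: Transition) -> Config:
    stack, buffer, arcs = c
    name, label = t
    if name == "SHIFT":
        return stack + [buffer[0]], buffer[1:], arcs
    s0 = stack[-1]
    head = buffer[0] if name == "LEFT" else stack[-2]
    return stack[:-1], buffer, arcs | {(head, s0, label)}


def greedy_parse(n: int, score: Callable[[Config, Transition], float],
                 labels: List[str]) -> Set[Arc]:
    """Repeatedly apply the highest-scoring legal transition."""
    c = initial(n)
    while not is_terminal(c):
        c = apply(c, max(legal(c, labels), key=lambda t: score(c, t)))
    return c[2]


def toy_score(c: Config, t: Transition) -> float:
    """Hypothetical scorer: prefers LEFT, then SHIFT, then RIGHT."""
    return {"LEFT": 2.0, "SHIFT": 1.0, "RIGHT": 0.0}[t[0]]


arcs = greedy_parse(3, toy_score, ["dep"])
print(sorted(arcs))  # [(0, 3, 'dep'), (2, 1, 'dep'), (3, 2, 'dep')]
```

With this toy scorer every word attaches to its right neighbour and the last word attaches to ROOT, which makes the effect of each transition easy to trace by hand.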

1.2 Graph-Based Dependency Parsing

The second approach is the graph-based dependency parsing algorithm introduced by McDonald et al. [¹¹]. In this algorithm, a score is calculated for each edge of the dependency graph of a sentence as follows:

$$s(i, j) = w \cdot f(i, j),$$

where w is the weight vector and f(i, j) is the feature vector of the edge (i, j). The score of the edge (i, j) represents how likely it is that a dependency holds between the head (w_i) and the dependent (w_j). If the arc score function is known, then the score of a graph is:

$$S(G = (V, E)) = \sum_{(i, j) \in E} s(i, j).$$

Based on the scores of all edges in the graph, McDonald et al. [¹⁰] showed that parsing is equivalent to finding the highest-scoring directed spanning tree of the graph G originating out of the root node 0.
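The arc-factored formulation can be illustrated with a small sketch. The weight vector, the two-dimensional feature vectors and the function names are invented for the example. The search below enumerates every head assignment, which is only feasible for very short sentences; it stands in for a proper maximum spanning tree algorithm and is not the decoder used in the experiments.

```python
from itertools import product
from typing import Dict, Optional, Sequence, Tuple

Edge = Tuple[int, int]  # (head i, dependent j); node 0 is the root


def edge_score(w: Sequence[float], f: Sequence[float]) -> float:
    """s(i, j) = w . f(i, j)"""
    return sum(wk * fk for wk, fk in zip(w, f))


def is_tree(heads: Sequence[int]) -> bool:
    """heads[j - 1] is the head of word j; valid if every word reaches node 0."""
    for j in range(1, len(heads) + 1):
        seen, node = set(), j
        while node != 0:
            if node in seen:  # cycle, including a self-loop
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


def best_tree(scores: Dict[Edge, float], n: int):
    """Brute-force argmax of S(G) over all trees rooted at node 0."""
    best_heads: Optional[Tuple[int, ...]] = None
    best_total = float("-inf")
    for heads in product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue
        total = sum(scores[(h, j)] for j, h in enumerate(heads, start=1))
        if total > best_total:
            best_heads, best_total = heads, total
    return best_heads, best_total


# Hypothetical weights and features for a two-word sentence.
w = [1.0, 0.5]
features = {(0, 1): [1.0, 0.0], (0, 2): [3.0, 0.0],
            (1, 2): [2.0, 2.0], (2, 1): [4.0, 0.0]}
scores = {edge: edge_score(w, f) for edge, f in features.items()}

heads, total = best_tree(scores, n=2)
print(heads, total)  # (2, 0) 7.0 : word 1 depends on word 2, word 2 on the root
```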

1.3 Long Short-Term Memory

Recurrent Neural Network. The recurrent neural network (RNN) is a class of artificial neural networks designed for sequence tasks such as sequence labeling. It takes a sequence of vectors as input and returns another sequence. A simple RNN has an input layer x, a hidden layer h and an output layer y. At each time step t, the values of each layer are computed as follows:

$$h_t = f_h(W_{ih} x_t + W_{hh} h_{t-1}), \qquad y_t = f_o(W_{ho} h_t),$$

where W_ih, W_hh and W_ho are the three connection weight matrices, and f_h and f_o are the hidden and output unit activation functions (sigmoid and softmax, respectively).
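As a concrete reading of the two update equations, the following sketch computes one RNN step with plain Python lists. The layer sizes and weight values are arbitrary choices for illustration, and the helper names are not taken from any library.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def softmax(v: Vector) -> Vector:
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(z - m) for z in v]
    s = sum(e)
    return [x / s for x in e]


def rnn_step(x_t: Vector, h_prev: Vector, W_ih: Matrix, W_hh: Matrix,
             W_ho: Matrix) -> Tuple[Vector, Vector]:
    """h_t = sigmoid(W_ih x_t + W_hh h_{t-1}),  y_t = softmax(W_ho h_t)."""
    pre = [a + b for a, b in zip(matvec(W_ih, x_t), matvec(W_hh, h_prev))]
    h_t = sigmoid(pre)
    y_t = softmax(matvec(W_ho, h_t))
    return h_t, y_t


def rnn_forward(xs: List[Vector], h0: Vector, W_ih: Matrix, W_hh: Matrix,
                W_ho: Matrix) -> List[Vector]:
    """Run the RNN over a whole sequence and collect the outputs y_1..y_n."""
    h, ys = h0, []
    for x in xs:
        h, y = rnn_step(x, h, W_ih, W_hh, W_ho)
        ys.append(y)
    return ys


# Hypothetical sizes: 3-dimensional input, 2 hidden units, 3 output classes.
W_ih = [[0.1, 0.2, -0.1], [0.0, -0.3, 0.2]]
W_hh = [[0.5, -0.5], [0.3, 0.1]]
W_ho = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
outputs = rnn_forward([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0],
                      W_ih, W_hh, W_ho)
print(outputs)  # one probability distribution per time step
```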

Long Short-Term Memory. Long Short-Term Memory (LSTM) was first proposed in 1997 by Sepp Hochreiter et al. [⁴]. LSTM is an extension of the RNN designed to combat the vanishing and exploding gradient problems that arise when learning from long-range sequences. LSTM networks are the same as RNNs, except that the hidden layer updates are replaced by memory cells. Figure 1 shows an LSTM cell, in which i, f and o are the input, forget and output gates, respectively, and c and c̃ denote the memory cell content. The LSTM cell calculates a hidden state h_t with the following equations:

Fig. 1 Long Short-Term Memory cell

where σ is the element-wise sigmoid function and ⊙ is the element-wise product, i, f, o and c are the input gate, forget gate, output gate, and cell vector respectively. Uⁱ, U^f, U^c, U^o are connection weight matrices between input x and gates, and Wⁱ, W^f, W^c, W^o are connection weight matrices between gates and hidden state h.
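For readers who want to see the cell in executable form, here is a sketch of one LSTM step using the commonly cited formulation, written with the U (input) and W (hidden) matrices named above. It is an assumption that this matches the authors' exact equations; in particular, bias terms are omitted because the text does not mention them.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              U: Dict[str, Matrix], W: Dict[str, Matrix]) -> Tuple[Vector, Vector]:
    """One LSTM step; U and W are keyed by "i", "f", "o" and "c".

    i = sigmoid(U^i x + W^i h)      f = sigmoid(U^f x + W^f h)
    o = sigmoid(U^o x + W^o h)      c~ = tanh(U^c x + W^c h)
    c_t = f * c_prev + i * c~       h_t = o * tanh(c_t)
    """
    def gate(name: str, act: Callable[[float], float]) -> Vector:
        pre = [a + b for a, b in zip(matvec(U[name], x_t), matvec(W[name], h_prev))]
        return [act(z) for z in pre]

    i = gate("i", sigmoid)
    f = gate("f", sigmoid)
    o = gate("o", sigmoid)
    c_tilde = gate("c", math.tanh)
    c_t = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, c_tilde)]
    h_t = [oj * math.tanh(cj) for oj, cj in zip(o, c_t)]
    return h_t, c_t


# Hypothetical sizes: 3-dimensional input, 2 memory cells, small constant weights.
U = {k: [[0.1, 0.0, -0.1], [0.2, 0.1, 0.0]] for k in "ifoc"}
W = {k: [[0.05, 0.0], [0.0, 0.05]] for k in "ifoc"}
h1, c1 = lstm_step([1.0, 0.5, -1.0], [0.0, 0.0], [0.0, 0.0], U, W)
print(h1, c1)
```

The forget gate f controls how much of the previous cell content is kept, which is what lets the cell carry information across long spans.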

Bidirectional Long Short-Term Memory. The original LSTM uses only the previous context for prediction. For many sequence labeling tasks, it is advisable to take context from both directions. A bidirectional LSTM utilizes both the previous and the future context by processing the sequence in two directions, generating two independent sequences of LSTM output vectors.

2 Approach

2.1 Universal Dependency Parsing in Vietnamese

2.1.1 Universal Dependency

A dependency label represents the relation between two words in a sentence. Pairs of words in different positions may carry different dependency labels. Labels are assigned according to general annotation rules that are applied uniformly throughout a language. Many sets of relation labels exist for a given language, and they differ from one another.

Universal Dependencies (UD)¹ was developed by the Stanford University team, Marneffe et al. [⁹]. The project develops treebank annotation that is consistent across languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. It builds on the Stanford Dependencies (SD) labels, also by the Stanford University team (Marneffe et al., 2015), on the universal part-of-speech tags (Petrov et al., 2012) and on the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).

The general objective of developing Universal Dependencies is to provide a label set and guidelines that facilitate the construction of similar resources for other languages and allow extension to new languages. The labels in SD are organized in groups of subject, object, clauses, word definitions, or nouns. Stanford offers nearly 50 types of English dependencies based on the Penn Treebank corpus. All of these dependencies are binary relations between a head word and its dependent word. Each relation is given by three components: the dependency label, the head word and the dependent word.

Universal Dependencies can be applied to many different languages and can be used to suggest improvements in dependency parsing, even for English. The research team has developed a core label set that has been extensively tested on a variety of languages. New labels can be added as needed to capture special linguistic relations, whether for individual cases or for one or more groups of languages. The label set covers many different languages, such as English, French, German and Chinese. It is useful because it can express the dependencies of the same sentence consistently across different languages.

Universal Dependencies contains 40 labels, organized according to the principles of the UD taxonomy: rows correspond to functional categories in relation to the head (core arguments of clausal predicates, non-core dependents of clausal predicates, and dependents of nominals), while columns correspond to structural categories of the dependent (nominals, clauses, modifier words, function words), as in Table 1. All Universal Dependencies labels are defined with specific examples, which can be used to develop a complete label set for other languages.

Table 1 Dependencies in universal Stanford Dependencies

	Nominals	Clauses	Modifier words	Function Words
*Core arguments*	nsubj	csubj
	obj	ccomp
	iobj	xcomp
*Non-core arguments*	obl	advcl	advmod	aux
	vocative		discourse
	expl
	dislocated
*Coordination*	*MWE*	*Loose*	*Special*	*Other*
conj	fixed	list	orphan	punct
cc	flat	parataxis	goeswith	root
	compound		reparandum	dep

2.1.2 Vietnamese Dependencies

Based on Universal Dependencies and the Viettreebank, we have built a set of Vietnamese dependencies. This set contains labels that coincide with the labels in UD as well as several new labels, for a total of 46 labels. Some of the dependency labels that we designed specifically for Vietnamese are:

— csubj:asubj (adjective subject): An adjective subject is an adjective phrase that is the syntactic subject of a clause. In Vietnamese, the subject is usually a noun (or a noun phrase), but in some cases an adjective is the subject:
- - Xa_xa là hố bom.

— csubj:vsubj (verb subject): This label describes a verb acting as the subject of a sentence. In Vietnamese, the subject is usually a noun, but in some cases an adjective, a verb or a clause can be the subject of a sentence:
- - Học tập là nhiệm vụ chính → csubj:vsubj(là, học tập)

— nc (classifier noun): This relation holds between a classifier noun and a common noun. The classifier noun, for example “cái” or “con”, always stands before the common noun:
- - Hai con mèo đen đang ăn cá. → nc(mèo, con)

— vnom (verb nominal): This relation holds between a nominalized verb and a classifier noun. The classifier noun, for example “cái”, “sự” or “việc”, always stands before the verb:
- - Cái ăn khan hiếm quá! → vnom(ăn, cái)

Tables 2 and 3 compare the two sets of labels.

Table 2 Comparison between Vietnamese dependencies (VD) and Universal dependencies (UD), part 1

VD (2016)	UD (2015)	Meaning
csubj	csubj	Clausal subject
csubj:asubj
csubj:vsubj
acomp	xcomp	Adjectival complement
amod	amod	Adjectival modifier
apredmod	advmod	Adjectival modifier of a predicate
advmod	advmod	Adverbial modifier
advcl	advcl	Adverbial clause modifier
aux	aux	Auxiliary
auxpass	auxpass	Passive auxiliary
appos	appos	Appositional modifier
cc	cc	Coordination
ccomp	ccomp	Clausal complement
conj	conj	Conjunct
cop	cop	Copula
dep	dep	Dependent
det	det	Determiner
discourse	discourse	Discourse element
dislocated	dislocated	Dislocated elements
dobj	dobj	Direct object
foreign	foreign	Foreign words
iobj	iobj	Indirect object
list	list	List
mark	mark	Marker
neg	neg	Negation modifier

Table 3 Comparison between Vietnamese dependencies (VD) and Universal dependencies (UD), part 2.

VD (2016)	UD (2015)	Meaning
nn	compound	Noun compound modifier
nsubj	nsubj	Nominal subject
num	nummod	Numeric modifier
number	compound	Element of compound number
parataxis	parataxis	Parataxis
pcomp	mark	Prepositional complement
pobj	case	Object of a preposition
prep	nmod	Prepositional modifier
punct	punct	Punctuation
remnant	remnant	Remnant in ellipsis
reparandum	reparandum	Overridden disfluency
rcmod	acl:relcl	Relative clause modifier
ref	ref	Referent
root	root	root
tmod	nmod:tmod	Temporal modifier
vcomp	ccomp	Verb complement of a verb
vmod	amod:vmod	Verb modifier of an NP
vocative	vocative	Vocative
xcomp	xcomp	Open clausal complement
nsubjpass	nsubjpass	Passive nominal subject
csubjpass	csubjpass	Clausal passive subject
-	expl	Expletive
-	goeswith	Goes with
nc	-	Classifier noun
vnom	-	Verb nominal
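The correspondence in Tables 2 and 3 can be expressed as a simple lookup, for instance to relabel a VD-annotated tree with UD labels. The sketch below lists only the rows where the two columns differ. Collapsing the csubj subtypes to csubj, passing all other labels through unchanged, and returning None for the two labels without a UD counterpart are choices made for this illustration, not part of the treebank tooling.

```python
from typing import Optional

# Rows of Tables 2 and 3 where the VD label differs from the UD label.
VD_TO_UD = {
    "csubj:asubj": "csubj",   # assumption: subtypes collapse to csubj
    "csubj:vsubj": "csubj",
    "acomp": "xcomp",
    "apredmod": "advmod",
    "nn": "compound",
    "num": "nummod",
    "number": "compound",
    "pcomp": "mark",
    "pobj": "case",
    "prep": "nmod",
    "rcmod": "acl:relcl",
    "tmod": "nmod:tmod",
    "vcomp": "ccomp",
    "vmod": "amod:vmod",
}

# Labels introduced for Vietnamese with no UD counterpart in the tables.
VD_ONLY = {"nc", "vnom"}


def vd_to_ud(label: str) -> Optional[str]:
    """Map a VD label to its UD counterpart, or None if there is none."""
    if label in VD_ONLY:
        return None
    return VD_TO_UD.get(label, label)  # identical labels pass through


print(vd_to_ud("rcmod"))  # acl:relcl
print(vd_to_ud("nsubj"))  # nsubj
print(vd_to_ud("nc"))     # None
```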

2.2 BiLSTM in Dependency Parsing

2.2.1 Using BiLSTM Feature Representation

Instead of using feature vectors directly in dependency parsing, we use the method of [⁵]. Each word is represented by its BiLSTM encoding, and a concatenation of a minimal set of such BiLSTM encodings is used as the feature function, which is then passed to a non-linear scoring function (a multi-layer perceptron).

Given an input sentence s with n words w₁,..., w_n and the corresponding POS tags p₁,..., p_n, each word w_i and POS tag p_i is associated with the embedding vectors e(w_i) and e(p_i). We denote by x_1:n the sequence of input vectors, with:

$$x_i = e(w_i) \circ e(p_i),$$

where ∘ denotes vector concatenation. The embeddings are trained together with the model. We also denote by v_i the output of this model at position i, computed as follows:

$$v_i = \mathrm{BiLSTM}(x_{1:n}, i).$$

A bidirectional LSTM is composed of two LSTMs: LSTM_f and LSTM_b. LSTM_f reads the sequence in its regular order and LSTM_b reads it in reverse. Concretely, given a sequence of vectors x_1:n and an index i, the function BiLSTM_θ(x_1:n, i) is defined as:

$$\mathrm{BiLSTM}_\theta(x_{1:n}, i) = \mathrm{LSTM}_f(x_{1:i}) \circ \mathrm{LSTM}_b(x_{n:i}), \qquad v_i = \mathrm{BiLSTM}_\theta(x_{1:n}, i).$$

The feature function φ is then the concatenation of a small number of BiLSTM vectors. The resulting feature vectors are then scored using a non-linear function, namely a multi-layer perceptron with one hidden layer (MLP):

$$\mathrm{MLP}_\theta(x) = W^2 \cdot \tanh(W^1 \cdot x + b^1) + b^2,$$

where θ = {W¹, W², b¹, b²} are the model parameters.
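The indexing in the BiLSTM definition is easy to get wrong, so the sketch below shows how the forward and backward state sequences are aligned and concatenated, followed by the one-hidden-layer MLP. To keep the output checkable by hand, the recurrent cell is passed in as a function and the example uses a running sum as a stand-in for a real LSTM cell. The embedding tables, the POS tags and all names are hypothetical.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
Cell = Callable[[Vector, Vector], Vector]  # (previous state, input) -> new state


def embed(words: List[str], tags: List[str], e_word: Dict[str, Vector],
          e_pos: Dict[str, Vector]) -> List[Vector]:
    """x_i = e(w_i) concatenated with e(p_i)."""
    return [e_word[w] + e_pos[p] for w, p in zip(words, tags)]


def run(cell: Cell, xs: List[Vector], h0: Vector) -> List[Vector]:
    """Return the state after each input, reading xs left to right."""
    states, h = [], h0
    for x in xs:
        h = cell(h, x)
        states.append(h)
    return states


def bidirectional_encode(xs: List[Vector], cell_f: Cell, cell_b: Cell,
                         h0: Vector) -> List[Vector]:
    """v_i = forward state over x_1..x_i concatenated with backward state over x_n..x_i."""
    fwd = run(cell_f, xs, h0)
    bwd = run(cell_b, xs[::-1], h0)[::-1]  # re-reverse to align with positions
    return [f + b for f, b in zip(fwd, bwd)]


def mlp(x: Vector, W1, b1: Vector, W2, b2: Vector) -> Vector:
    """MLP(x) = W2 . tanh(W1 . x + b1) + b2"""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


def running_sum(h: Vector, x: Vector) -> Vector:
    """Toy recurrent cell (NOT an LSTM): the state is the sum of inputs so far."""
    return [a + b for a, b in zip(h, x)]


# Hypothetical one-dimensional embeddings for a two-word input.
e_word = {"con": [1.0], "mèo": [2.0]}
e_pos = {"Nc": [0.5], "N": [0.25]}
xs = embed(["con", "mèo"], ["Nc", "N"], e_word, e_pos)
v = bidirectional_encode(xs, running_sum, running_sum, [0.0, 0.0])
print(v)  # [[1.0, 0.5, 3.0, 0.75], [3.0, 0.75, 2.0, 0.25]]
```

Each v_i has twice the dimension of the cell state: the first half summarizes the words up to position i, the second half the words from the end of the sentence back to position i.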

2.2.2 Transition-Based Dependency Parsing Using BiLSTM Feature Representation

Given a sentence s, the transition-based parser is initialized with a configuration c. A feature function φ(c) then represents the configuration c as a vector. The feature function is the concatenation of the BiLSTM vectors of some items on the stack and the buffer. For example, for a configuration c = (...|s₂|s₁|s₀, b₀|..., T), the feature extractor uses the top 3 items on the stack and the first item on the buffer. It is defined as:

$$\phi(c) = v_{s_2} \circ v_{s_1} \circ v_{s_0} \circ v_{b_0}, \qquad v_i = \mathrm{BiLSTM}(x_{1:n}, i).$$

Each transition is scored using an MLP that is fed the BiLSTM encodings selected by the feature extractor. Each x_i is the concatenation of a word vector and a POS vector. The function SCORE assigns scores to (configuration, transition) pairs. It scores the possible transitions t ∈ {SHIFT, LEFT, RIGHT}, and the highest-scoring transition t̂ is chosen. Applying t̂ to the configuration yields a new configuration.
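A compact sketch of this scoring step follows. It assumes the vectors v_i have already been computed, ignores dependency labels, and fills missing stack positions with a padding vector, which is an assumption of this example. The MLP weights are hand-picked so that the result can be verified by inspection.

```python
import math
from typing import Dict, List

Vector = List[float]
TRANSITIONS = ["SHIFT", "LEFT", "RIGHT"]


def phi(stack: List[int], buffer: List[int], v: Dict[int, Vector],
        pad: Vector) -> Vector:
    """phi(c) = v_s2 . v_s1 . v_s0 . v_b0 (concatenation), padding absent items."""
    picks = [stack[-k] if len(stack) >= k else None for k in (3, 2, 1)]
    picks.append(buffer[0] if buffer else None)
    features: Vector = []
    for idx in picks:
        features.extend(v[idx] if idx is not None else pad)
    return features


def mlp(x: Vector, W1, b1: Vector, W2, b2: Vector) -> Vector:
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


def best_transition(stack: List[int], buffer: List[int], v: Dict[int, Vector],
                    pad: Vector, params, allowed: List[str]) -> str:
    """Score all transitions with the MLP and return the best allowed one."""
    scores = mlp(phi(stack, buffer, v, pad), *params)
    ranked = {t: s for t, s in zip(TRANSITIONS, scores) if t in allowed}
    return max(ranked, key=ranked.get)


# Hypothetical one-dimensional BiLSTM outputs; index 0 stands for ROOT.
v = {0: [9.0], 1: [1.0], 2: [2.0], 3: [3.0]}
pad = [0.0]
# One hidden unit that looks only at v_b0; one output row per transition.
params = ([[0.0, 0.0, 0.0, 1.0]], [0.0], [[1.0], [2.0], [0.0]], [0.0, 0.0, 0.0])

features = phi([1, 2], [3, 0], v, pad)
print(features)  # [0.0, 1.0, 2.0, 3.0]
print(best_transition([1, 2], [3, 0], v, pad, params, TRANSITIONS))  # LEFT
```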

2.2.3 Graph-Based Dependency Parsing Using BiLSTM Feature Representation

In graph-based parsing, the parser searches for the highest-scoring dependency tree of a sentence s = x_1:n, where the score of a tree is the sum of the scores of its parts:

$$\mathrm{predict}(s) = \arg\max_{y \in Y(s)} \mathrm{score}_{\mathrm{global}}(s, y), \qquad \mathrm{score}_{\mathrm{global}}(s, y) = \sum_{\mathrm{part} \in y} \mathrm{score}_{\mathrm{local}}(s, \mathrm{part}),$$

where Y(s) is the space of valid dependency trees over s.

Arc-factored parsing decomposes the score of a tree into the sum of the scores of its head-modifier arcs (h, m):

$$\mathrm{parse}(s) = \arg\max_{y \in Y(s)} \sum_{(h, m) \in y} \mathrm{score}(\phi(s, h, m)),$$

where φ(s, h, m) is the feature extractor, which uses the BiLSTM encodings of the head word and the modifier word: φ(s, h, m) = BiLSTM(x_1:n, h) ◦ BiLSTM(x_1:n, m).

The final model is:

$$\mathrm{parse}(s) = \arg\max_{y \in Y(s)} \sum_{(h, m) \in y} \mathrm{score}(\phi(s, h, m)) = \arg\max_{y \in Y(s)} \sum_{(h, m) \in y} \mathrm{MLP}(v_h \circ v_m), \qquad v_i = \mathrm{BiLSTM}(x_{1:n}, i).$$
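To make the final model concrete, the sketch below scores each arc with MLP(v_h ∘ v_m) and sums the arc scores of a candidate tree. It compares two hand-written candidate trees instead of searching the full space Y(s), which in practice requires a dedicated decoding algorithm. The vectors and the MLP parameters are invented for the example.

```python
import math
from typing import Dict, List

Vector = List[float]


def mlp_score(x: Vector, W1, b1: Vector, W2: Vector, b2: float) -> float:
    """Scalar MLP: W2 . tanh(W1 . x + b1) + b2, with a single output unit."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2


def arc_score(v: Dict[int, Vector], h: int, m: int, params) -> float:
    """score(phi(s, h, m)) = MLP(v_h concatenated with v_m)."""
    return mlp_score(v[h] + v[m], *params)


def tree_score(v: Dict[int, Vector], heads: Dict[int, int], params) -> float:
    """Sum of arc scores; heads maps each modifier m to its head h."""
    return sum(arc_score(v, h, m, params) for m, h in heads.items())


# Hypothetical BiLSTM outputs (index 0 is the root) and MLP parameters
# chosen so that an arc scores tanh(v_h - v_m).
v = {0: [0.0], 1: [1.0], 2: [-1.0]}
params = ([[1.0, -1.0]], [0.0], [1.0], 0.0)

tree_a = {1: 0, 2: 1}  # root -> word 1 -> word 2
tree_b = {1: 2, 2: 0}  # root -> word 2 -> word 1
best = max([tree_a, tree_b], key=lambda y: tree_score(v, y, params))
print(best)  # {1: 0, 2: 1}
```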

3 Experiments

3.1 Datasets

We use the same datasets as in our previous research [⁸, ¹⁶, ¹⁸].

Text corpus for distributed word representations. To create distributed word representations, we use a dataset consisting of 7.3 GB of text from 2 million articles collected from a Vietnamese news portal. The text is first normalized to lower case. All special characters are removed except these common symbols: the comma, the semi-colon, the colon, the full stop and the percentage sign. All numeral sequences are replaced with the special token <number>, so that correlations between a certain word and a number are correctly recognized by the neural network or the log-bilinear regression model.

A word in Vietnamese may consist of more than one syllable with spaces in between, and such a word could be regarded as multiple words by the unsupervised models. Hence it is necessary to replace the spaces within each word with underscores to create full word tokens. The tokenization process follows the method described in [¹⁷]. After removal of special characters and tokenization, the articles add up to 969 million word tokens, spanning a vocabulary of 1.5 million unique tokens. We train the unsupervised models with the full vocabulary to obtain the representation vectors, and then prune the collection of word vectors to the 5,000 most frequent words, excluding special symbols and the token <number> representing numeral sequences.
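The normalization steps described above might look as follows in Python. This is a sketch under several assumptions: the exact regular expressions, the NFC Unicode normalization step and the replacement of removed characters by a space are choices made here, and the word segmenter of [¹⁷] is not reproduced, so `join_syllables` simply takes already segmented words as input.

```python
import re
import unicodedata
from typing import List

NUMBER = "<number>"


def normalize(text: str) -> str:
    """Lower-case, drop special characters, and mask numeral sequences."""
    # NFC keeps Vietnamese letters as single precomposed code points.
    text = unicodedata.normalize("NFC", text).lower()
    # Keep letters, digits, underscores, whitespace and , ; : . %
    text = re.sub(r"[^\w\s,;:.%]", " ", text)
    # A numeral sequence may contain internal separators, e.g. 3,5 or 1.000
    text = re.sub(r"\d+(?:[.,]\d+)*", NUMBER, text)
    return re.sub(r"\s+", " ", text).strip()


def join_syllables(words: List[str]) -> str:
    """Join the syllables of each segmented word with underscores."""
    return " ".join("_".join(w.split()) for w in words)


print(normalize("Giá tăng 3,5% trong năm 2017!"))
# giá tăng <number>% trong năm <number>
print(join_syllables(["học sinh", "đi", "học"]))
# học_sinh đi học
```

Masking numbers after removing special characters keeps the angle brackets of the `<number>` token from being stripped.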

Dependency treebank. We conduct our experiments on the Vietnamese dependency treebank dataset. This treebank is derived automatically from the constituency-based annotation of the VTB [¹⁸] and contains 10,471 sentences (225,085 tokens). We manually checked the correctness of the conversion on a subset of the converted corpus, resulting in 3,000 sentences annotated with universal dependencies, divided into a training set of 2,200 sentences, a test set of 400 sentences and a development set of 400 sentences.

3.2 Feature Sets

Feature sets in transition-based parsing: Consider a parser configuration c = (...|s₂|s₁|s₀, b₀|..., T) and the transition f(c) in the gold parse; φ(c) is the feature vector representation of the parser configuration c. We denote the part-of-speech tag of token w by p(w). We use tk(w) and e(w) to denote the word form and the distributed representation of the word of token w, respectively, while rm(w) and lm(w) denote the right-most and left-most modifiers of token w. We use the feature templates for the classifier listed in Table 4. Each feature v_tk(w) = p(w)◦tk(w) or v_e(w) = p(w)◦e(w) is a feature template of token w.

Table 4 Feature sets for use in the transition classifier

Feature set	Feature templates
φ₀	v_tk(s₀), v_tk(s₁), v_tk(s₂), v_tk(b₀)
φ₁	v_e(s₀), v_e(s₁), v_e(s₂), v_e(b₀)
φ₂	φ₀, v_tk(rm(s₀)), v_tk(lm(s₀)), v_tk(rm(s₁)), v_tk(lm(s₁)), v_tk(rm(s₂)), v_tk(lm(s₂)), v_tk(lm(b₀))
φ₃	φ₁, v_e(rm(s₀)), v_e(lm(s₀)), v_e(rm(s₁)), v_e(lm(s₁)), v_e(rm(s₂)), v_e(lm(s₂)), v_e(lm(b₀))

Feature sets in graph-based parsing: McDonald et al. (2005) proposed a feature set with 18 templates for a first-order parser, while the first-order feature extractor in the actual implementation (MSTParser²) includes roughly a hundred feature templates. In our case, the feature extractor uses only the encodings of the head word and the modifier word, each built from the word and its POS tag.

3.3 Vietnamese Dependency Parsing Based on Bist-Parser

The Bist-parser is a tool that uses BiLSTM feature extractors with graph-based and transition-based dependency parsers. It was developed by Kiperwasser and Goldberg and uses the BiLSTM feature extractors described in Section 2.2.

We use two attachment scores, the labeled attachment score (LAS) and the unlabeled attachment score (UAS), to evaluate the accuracy of the dependency parsing system. Attachment scores are defined as the percentage of correct dependency relations recovered by the parser. A dependency relation is considered correct if both the source word and the target word are correct (UAS); for LAS, the dependency type must be correct as well.
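The two metrics can be computed as in the following sketch, where each token is represented by its (head index, label) pair. The function name and the tiny example sentence are illustrative; standard evaluation scripts may additionally apply conventions such as excluding punctuation, which are not modelled here.

```python
from typing import List, Tuple

Token = Tuple[int, str]  # (head index, dependency label)


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions of tokens."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    correct_both = sum(g == p for g, p in zip(gold, pred))
    return correct_heads / len(gold), correct_both / len(gold)


gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "iobj")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2%} LAS={las:.2%}")  # UAS=75.00% LAS=50.00%
```

In the example, the third token has the wrong head and the fourth token has the right head but the wrong label, so UAS counts three correct tokens out of four and LAS counts two.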

Table 5 reports the results on the Vietnamese universal dependency treebank. We also evaluate on the Vietnamese dependency treebank [¹⁸]; the result is the highest accuracy in Vietnamese dependency parsing, as presented in Table 6.

Table 5 Accuracy of Bist-parser with feature sets on the Vietnamese universal dependency treebank

Feature set	System	UAS (test)	LAS (test)
φ₂	Transition-based	76.86%	72.38%
	Graph-based	77.79%	74.08%
φ₃	Transition-based	75.75%	71.13%
	Graph-based	78.17%	74.84%
Phuong et al. [8]	Transition-based	73.21%	63.06%
Luong et al. [16]	Graph-based	73.09%	68.32%

Table 6 Accuracy of Bist-parser with feature sets on Vietnamese dependency treebank [¹⁸]

Feature set	System	UAS (test)	LAS (test)
φ₂	Transition-based	82.77%	76.02%
	Graph-based	84.05%	78.35%
φ₃	Transition-based	83.17%	76.70%
	Graph-based	84.45%	78.56%
Luong et al. [18]	Transition-based	73.03%	66.35%
Some results on other Vietnamese dependency treebanks
Kiem-Hieu [15]	Graph-based	84.4%	81.4%
Dat Quoc et al. [14]	Graph-based (MSTParser)	79.08%	71.66%
Dat Quoc et al. [13]	Graph-based (Neural network)	80.66%	73.53%

4 Conclusion

In this paper, we presented in detail our contribution to building a Vietnamese universal dependency treebank. We also used this data with the Bist-parser system, which is based on bidirectional LSTMs for dependency parsing. We evaluated the accuracy of the system for Vietnamese parsing in two cases: with and without the distributed word representation features in the Bist-parser system.

The accuracy of our system is UAS=78.17% and LAS=74.84% when we use the GloVe model to produce distributed word representations on the Vietnamese universal dependency treebank. This is the highest accuracy in comparison with previous research: UAS increases by about 5.0 points (from 73.21% to 78.17%) and LAS by about 6.5 points (from 68.32% to 74.84%). The system achieves state-of-the-art performance on the Viettreebank [¹⁸], with UAS=84.45% and LAS=78.56%.

In the future, we will integrate a CRF into this system. We also plan to apply this model to constituency-based structures in Vietnamese.

References

1. Chen, D., & Manning, C. D. (2014). A fast and accurate dependency parser using neural networks. Moschitti, A., Pang, B., & Daelemans, W., editors, EMNLP, ACL, pp. 740-750.

2. Dozat, T., & Manning, C. D. (2016). Deep biaffine attention for neural dependency parsing. CoRR, Vol. abs/1611.01734.

3. Dyer, C., Ballesteros, M., Ling, W., Matthews, A., & Smith, N. A. (2015). Transition-based dependency parsing with stack long short-term memory. CoRR, Vol. abs/1505.08075.

4. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Comput., Vol. 9, No. 8, pp. 1735-1780.

5. Kiperwasser, E., & Goldberg, Y. (2016). Simple and accurate dependency parsing using bidirectional LSTM feature representations. CoRR, Vol. abs/1603.04351.

6. Le-Hong, P., Nguyen, T. M. H., & Azim, R. (2012). Vietnamese parsing with an automatically extracted tree-adjoining grammar. Proceedings of the IEEE International Conference in Computer Science: Research, Innovation and Vision of the Future, RIVF, HCMC, Vietnam.

7. Le-Hong, P., Nguyen, T. M. H., Nguyen, P. T., & Roussanaly, A. (2010). Automated extraction of tree adjoining grammars from a treebank for Vietnamese. Proceedings of The Tenth International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+10), Yale University, New Haven, CT, USA.

8. Le-Hong, P., Nguyen, T.-M.-H., Nguyen, T.-L., & Ha, M.-L. (2015). Fast Dependency Parsing Using Distributed Word Representations. Springer International Publishing, Cham, pp. 261-272.

9. Marneffe, M.-C. D., Dozat, T., Silveira, N., Haverinen, K., Ginter, F., Nivre, J., & Manning, C. D. (2014). Universal Stanford dependencies: a cross-linguistic typology. Calzolari, N., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., & Piperidis, S., editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), European Language Resources Association (ELRA), Reykjavik, Iceland.

10. McDonald, R., Crammer, K., & Pereira, F. (2005). Online large-margin training of dependency parsers. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), pp. 91-98.

11. McDonald, R. T., & Nivre, J. (2011). Analyzing and integrating dependency parsers. Computational Linguistics, Vol. 37, No. 1, pp. 197-230.

12. Minh, N. L., Điệp, H. T., & Kế, T. M. (2008). Nghiên cứu luật hiệu chỉnh kết quả dùng phương pháp MST phân tích cú pháp phụ thuộc tiếng Việt. ICT-rda 8, Hanoi, Vietnam, pp. 258-267.

13. Nguyen, D. Q., Dras, M., & Johnson, M. (2016). An empirical study for Vietnamese dependency parsing. Proceedings of the Australasian Language Technology Association Workshop 2016, Melbourne, Australia, pp. 143-149.

14. Nguyen, D. Q., Nguyen, D. Q., Pham, S. B., Nguyen, P.-T., & Nguyen, M. L. (2014). From Treebank Conversion to Automatic Dependency Parsing for Vietnamese. Proceedings of 19th International Conference on Application of Natural Language to Information Systems, pp. 196-207.

15. Nguyen, K.-H. (2017). BKTreebank: Building a Vietnamese dependency treebank. CoRR, Vol. abs/1710.05519.

16. Nguyen, T.-L., Ha, M.-L., Le-Hong, P., & Nguyen, T.-M.-H. (2016). Using distributed word representations in graph-based dependency parsing for Vietnamese. pp. 804-810.

17. Le-Hong, P., Nguyen, T. M. H., Roussanaly, A., & Ho, T. V. (2008). A Hybrid Approach to Word Segmentation of Vietnamese Texts. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 240-249.

18. Nguyen, T.-L., Ha, M.-L., Nguyen, V.-H., Nguyen, T.-M.-H., & Le-Hong, P. (2013). Building a treebank for Vietnamese dependency parsing. International Conference on Computing and Communication Technologies, Research, Innovation, and Vision for the Future, RIVF 2013, Hanoi, Vietnam, November 10-13, 2013, IEEE, pp. 147-151.

¹ http://universaldependencies.org/guidelines.html

² http://www.seas.upenn.edu/strctlrn/MSTParser/MSTParser.html

Received: January 20, 2018; Accepted: March 05, 2018

Corresponding author is Luong Nguyen Thi. halinh.hus@gmail.com, luongnt@dlu.edu.vn, huyenntm@vnu.edu.vn, phuonglh@vnu.edu.vn

This is an open-access article distributed under the terms of the Creative Commons Attribution License