1 Introduction
In recent years, extensive research has been conducted on personality profiling based on text. Understanding the personality traits of individuals through their posts on platforms like Facebook or essays can serve various purposes. It can aid in tasks such as detecting psychological disorders, personalizing advertisements, and even identifying suitable candidates for employment [15, 12, 11].
Large Language Models (LLMs) have emerged as the gold standard for natural language generation. With LLMs [1, 7], we have the ability to imbue text with specific personality traits, allowing us to create dialogue or actions for characters in a manner that reflects their unique personalities [22, 24].
The prospect of automatically modeling characters with distinct personality and temperament using LLMs is particularly intriguing. However, to achieve this goal, our first step is to develop an automated system capable of verifying whether the generated text aligns with the intended personality traits.
In related work, researchers have commonly used both the Myers-Briggs Type Indicator (MBTI) and the Big Five [10] models for personality profiling. Although studies using the MBTI model have reported better metrics, such as F1 score and accuracy [6], criticisms of its validity and reliability abound [17, 18, 5].
Consequently, we opted to employ the Big Five model for our study. The Big Five model, also known as the Five Factor Model, offers a comprehensive framework for understanding and categorizing personality traits based on five key dimensions: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism (OCEAN).
The Big Five model has been extensively researched and validated across diverse populations and cultural contexts, making it a robust and well-established tool for personality assessment.
Its empirical foundation and cross-cultural applicability make it particularly suitable for our purposes of character profiling and classification [20, 21].
2 Related Work
Predicting personality traits from social media has become increasingly popular because this information can be leveraged to enhance user interactions across a wide range of computerized platforms and interfaces. The authors of [14] compared classifiers in WEKA (Bayes, Functions, Rules, Trees, and Meta) for predicting student personality traits using Twitter data. Only extraversion from the Big Five model was considered. Four correlated profile features were selected and mapped. Evaluation using 10-fold cross-validation identified OneR as the best classifier, with an F1 score of 0.87.
In their study, Alameda et al. [8] created a corpus of Portuguese Twitter posts for personality profiling using the Big Five model. They applied machine learning algorithms and achieved an F1 score of 0.76 using TF-IDF and logistic regression.
Akrami et al. [3] curated an extensive dataset by expertly annotating personality traits in texts from various online sources. They then partitioned this dataset into a large, low-reliability subset and a smaller, high-reliability subset. Using these datasets, they trained and tested multiple machine learning models, including a language model, to extract personality traits from text.
Results indicated superior performance of models trained on the smaller, high-reliability dataset, yet when tested on diverse datasets, the best model failed to outperform a random baseline. Another study analyzed personality data from 335 users and found that popular users and influencers tend to be extroverted and emotionally stable, with influencers showing higher levels of conscientiousness and popular users displaying higher levels of openness.
Additionally, the study demonstrated a method for accurately predicting a user’s personality traits based on publicly available profile information such as following, followers, and listed counts, achieving a root-mean-squared error below 0.88 on a [1,5] scale for all five personality traits [19]. In the realm of social media research, various studies have indicated that prediction accuracy remains consistent across Big Five traits. Moreover, these studies suggest that accuracy tends to improve when analyses encompass demographics and incorporate multiple types of digital footprints [4].
All preceding studies have focused on personality profiling of real individuals. However, our research aims to conduct personality profiling on fictional characters. In [9], the authors conducted personality profiling of fictional characters from books by creating a corpus comprising text encompassing direct speech, actions, and descriptions of the main characters. They employed WordNet, VerbNet, and word vector representations for feature extraction and used traditional machine learning models such as support vector machines. The results revealed an F1 score of 0.693 when using character descriptions for personality profiling.
The authors of [13] present a novel approach to crafting character profiles for Spanish fictional literary works using Artificial Intelligence techniques. They develop a tool to mitigate information loss stemming from text cacophony reduction.
The integration of a Bidirectional Encoder Representations from Transformers (BERT) layer ensures the model comprehends the broader context of the text within practical Natural Language Processing (NLP) applications. Notably, the researchers did not use the Big Five personality model and instead employed the MBTI for character profiling.
3 Corpus
Our methodology involves creating a corpus through the systematic collection and analysis of dialogues and actions from a selection of 87 movie characters. This corpus, which we named the Movie Character Personality Corpus based on Big Five (MochaP), comprises 4155 distinct texts, capturing both direct speech and character actions.
The annotation of the text corresponds to the Big Five profiling associated with each character, obtained from a website featuring personality profiles of various public figures, historical figures, and fictional characters.
To collect the movie scripts, we accessed the website “The Script Lab” [23] hosting PDF files of the scripts. Due to the absence of a standard format, we developed multiple scripts to extract dialogue and actions from various script formats. Additionally, we manually curated the results to address inconsistencies arising from the lack of uniform formatting.
For annotation purposes, the website “Personality database” [16] provides personality profiles based on various models, including the Big Five Model. These profiles adhere to the SLOAN convention, which we subsequently translated into the OCEAN model, a more widely recognized framework for personality analysis.
The personality profiles are assigned by the website contributors, and different contributions undergo validation, with the final profile determined by the consensus of the votes. We assigned a label of 0 if the personality trait was voted as low on the website, and 1 in the opposite case, resulting in binary labels. For instance, for the Extraversion trait, a character would receive a label of 1 if classified as extroverted and 0 if categorized as introverted. The same process was applied to each of the Big Five traits. It is important to note that the corpus is currently under development, with plans to expand both the number of characters and the variety of movies included. The distribution of dialogues and texts at this stage is shown in Figure 1 and Figure 2.
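The translation from SLOAN codes to binary OCEAN labels described above can be sketched as follows. This is a minimal illustration, not the authors' script: the letter-to-trait mapping is the commonly cited SLOAN convention and is an assumption here, since the text does not spell it out, and the function name `sloan_to_ocean` is hypothetical.

```python
# Position-wise SLOAN letters mapped to (OCEAN trait, binary label).
# Assumed convention (not stated in the paper):
#   S/R = Social/Reserved          -> Extraversion high/low
#   L/C = Limbic/Calm              -> Neuroticism high/low
#   O/U = Organized/Unstructured   -> Conscientiousness high/low
#   A/E = Accommodating/Egocentric -> Agreeableness high/low
#   I/N = Inquisitive/Non-curious  -> Openness high/low
SLOAN_TO_OCEAN = [
    {"S": ("E", 1), "R": ("E", 0)},
    {"L": ("N", 1), "C": ("N", 0)},
    {"O": ("C", 1), "U": ("C", 0)},
    {"A": ("A", 1), "E": ("A", 0)},
    {"I": ("O", 1), "N": ("O", 0)},
]


def sloan_to_ocean(code):
    """Convert a five-letter SLOAN code into binary labels keyed by OCEAN trait."""
    code = code.upper()
    if len(code) != len(SLOAN_TO_OCEAN):
        raise ValueError(f"expected a 5-letter SLOAN code, got {code!r}")
    labels = {}
    for letter, table in zip(code, SLOAN_TO_OCEAN):
        if letter not in table:
            raise ValueError(f"unexpected letter {letter!r} in {code!r}")
        trait, value = table[letter]
        labels[trait] = value
    # Return the labels in O, C, E, A, N order.
    return {trait: labels[trait] for trait in "OCEAN"}
```

Under this assumed convention, each character yields five independent binary targets, one per trait, which matches the one-classifier-per-trait setup used later.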

Fig. 1 The distribution of dialogues (including direct speech and actions) for each Big Five trait across the current corpus
4 Methodology
The task of personality profiling was approached as a classification problem, where each Big Five personality trait was treated as a separate binary classification task.
4.1 Feature Extraction
Initially, we used both actions and direct speech without distinction, so the texts were preprocessed together; Figure 3 shows the steps followed. We preprocessed the text to clean and normalize the data and thereby improve model performance. The steps are the following:
The order of preprocessing is crucial. First, we need to split the text into meaningful parts for processing, a step known as tokenization. In English, certain words contain contractions, so we use Multi-Word Token (MWT) expansion to address this.
Next, we apply part-of-speech tagging, which is essential for lemmatization as it aids in disambiguation and accurate tag assignment. Lemmatization helps reduce the dimensionality of vectors, making them less sparse.
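The ordering of the steps above (tokenization, contraction expansion, then lemmatization) can be illustrated with a toy pipeline. This is only a sketch under strong assumptions: the `CONTRACTIONS` and `LEMMAS` tables are tiny hypothetical lookups that stand in for a trained NLP toolkit, and part-of-speech tagging is omitted, so the lemma lookup here is not disambiguated by tag as it would be in the real pipeline.

```python
import re

# Hypothetical, deliberately tiny lookup tables for illustration only.
# A real pipeline would use a trained tokenizer, tagger, and lemmatizer.
CONTRACTIONS = {"don't": ["do", "not"], "i'm": ["i", "am"], "it's": ["it", "is"]}
LEMMAS = {"am": "be", "is": "be", "running": "run", "doors": "door"}

# Words (optionally with an apostrophe suffix), digit runs, or any other
# single non-space character, so special characters survive as tokens.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+|\S")


def preprocess(text):
    """Tokenize, expand contractions, then map each token to its lemma."""
    tokens = TOKEN_RE.findall(text.lower())
    expanded = []
    for tok in tokens:
        # Contractions become several tokens; everything else is kept as-is.
        expanded.extend(CONTRACTIONS.get(tok, [tok]))
    return [LEMMAS.get(tok, tok) for tok in expanded]
```

Expanding contractions before the lemma lookup matters: "i'm" has no lemma of its own, whereas the expanded "am" maps to "be".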
We decided to retain special characters, as they, like stopwords, might contribute to the effectiveness of personality profiling. Subsequently, we separated the actions from the direct speech; Figure 4 shows the steps. Direct speech underwent the preprocessing described above.
However, for the actions, we filtered them to include only the verbs, serving as additional independent features. If there were no actions or direct speech, these features were filled with zeros.
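A minimal sketch of the separated representation described above: direct speech and action verbs are counted over two independent vocabularies and concatenated, so a text with no speech or no actions simply contributes a block of zeros. The function names, the `(token, tag)` input format, and the `"VERB"` tag string are assumptions made for illustration.

```python
from collections import Counter


def count_vector(tokens, vocab):
    """Count how often each vocabulary entry occurs in tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


def separated_features(speech_tokens, tagged_action_tokens, speech_vocab, verb_vocab):
    """Concatenate speech counts with counts of the verbs found in actions.

    tagged_action_tokens is a list of (token, pos_tag) pairs; only tokens
    tagged "VERB" are kept. An empty input yields an all-zero block.
    """
    verbs = [tok for tok, pos in tagged_action_tokens if pos == "VERB"]
    return count_vector(speech_tokens, speech_vocab) + count_vector(verbs, verb_vocab)
```

Keeping the two vocabularies separate means the same word is counted as a different feature depending on whether it was spoken or performed.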
In the traditional machine learning approach, we chose to assess two distinct forms of vectorization: Bag of Words and TF-IDF (Term Frequency-Inverse Document Frequency) values.
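The two vectorizations can be sketched in plain Python as follows. The TF-IDF variant shown is the textbook form, raw term count times log(N / df), which is an assumption: the text does not state which weighting or normalization was used, and library implementations often add smoothing.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Sorted list of all distinct tokens across the tokenized documents."""
    return sorted({tok for doc in docs for tok in doc})


def bag_of_words(doc, vocab):
    """Raw unigram counts of one tokenized document over the vocabulary."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]


def tfidf(docs, vocab):
    """TF-IDF rows using tf * log(N / df); vocab must be built from docs."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    return [
        [count * idf[term] for count, term in zip(bag_of_words(doc, vocab), vocab)]
        for doc in docs
    ]
```

In this variant a term that occurs in every document receives a weight of zero, which is the down-weighting of ubiquitous words that motivates TF-IDF over raw counts.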
4.2 Machine Learning Models
We tested different machine learning algorithms for the binary classification of each of the Big Five personality traits. The classifiers are:
Logistic Regression: Stochastic Gradient Descent optimizer, l2 penalty.
Support Vector Machines: Stochastic Gradient Descent optimizer, linear kernel.
Naive Bayes: For multinomially distributed data.
We chose these classifiers because they have been extensively used over time across various text classification tasks, yielding consistently good results for this kind of task [2].
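To make the first configuration concrete, the following is a small from-scratch sketch of logistic regression trained with stochastic gradient descent and an L2 penalty. It is not the implementation used in the study; the learning rate, penalty strength, and epoch count are illustrative assumptions. Replacing the log-loss gradient with a hinge-loss subgradient would give the SGD-trained linear SVM.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg_sgd(X, y, lr=0.1, l2=1e-4, epochs=50, seed=0):
    """Fit weights and bias with per-sample SGD on log loss plus an L2 penalty."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]  # gradient of the log loss w.r.t. the logit
            # The L2 term shrinks every weight; the bias is left unpenalized.
            w = [wj - lr * (err * xj + l2 * wj) for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(X, w, b):
    """Predict label 1 when the estimated probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0 for x in X]
```

One such model would be trained per trait, each on the same feature vectors but with that trait's binary labels. The learned weights can be inspected term by term, which is the source of the explainability mentioned in the discussion.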
4.3 LSTM
For this approach, we applied the same preprocessing techniques, followed by text vectorization using a word embedding layer with a latent dimension of 20. A dropout layer was incorporated, followed by an LSTM cell, and finally a dense layer for classification. The loss function was binary cross-entropy, and we employed the Adam optimizer.
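For reference, the binary cross-entropy loss named above can be written out directly. This is a generic sketch of the standard definition, averaged over samples; the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero and is not taken from the paper.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)
```

A model that always outputs 0.5 scores log(2), roughly 0.693, which is a useful reference point when reading training curves for a balanced binary trait.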
5 Results
5.1 Traditional Machine Learning
For this approach, we evaluated both vectorizations, Bag of Words with unigrams (Table 1) and TF-IDF, also with unigrams (Table 2), with direct speech and actions merged. We conducted a 10-fold cross-validation, and the results reported in the following tables are the averages over the 10 folds. Additionally, we tested the separation of direct speech and actions described in the previous section, solely with Bag of Words with unigrams, also using 10-fold cross-validation. Results are shown in Table 3.
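The evaluation protocol described above, k-fold cross-validation with metrics averaged over the folds, can be sketched as follows. The helper names and the `fit`/`predict` callback interface are hypothetical, and the metrics are computed for the positive class only; the text does not say whether the reported scores are per-class or macro-averaged, nor whether folds were shuffled or stratified.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds (shuffle beforehand)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive class (0.0 when undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cross_validate(X, y, fit, predict, k=10):
    """Average (precision, recall, F1) over k folds.

    fit(X_train, y_train) returns a model; predict(model, X_test) returns labels.
    """
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        y_pred = predict(model, [X[i] for i in test])
        scores.append(precision_recall_f1([y[i] for i in test], y_pred))
    return tuple(sum(s[j] for s in scores) / k for j in range(3))
```

Any classifier can be plugged in through the two callbacks, so the same loop serves all three models and both vectorizations.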
Table 1 Results of logistic regression (LR), support vector machines (SVM), and Naive Bayes (NB) using bag of words vectorization
| Model | Metric | O | C | E | A | N |
| --- | --- | --- | --- | --- | --- | --- |
| LR | Recall | 0.6858 | 0.692 | 0.6825 | 0.6975 | 0.7148 |
| | Precision | 0.6733 | 0.6695 | 0.6713 | 0.6828 | 0.7138 |
| | Accuracy | 0.7 | 0.725 | 0.6931 | 0.7137 | 0.7154 |
| | F1-Score | 0.676 | 0.6757 | 0.6732 | 0.6858 | 0.7137 |
| SVM | Recall | 0.6623 | 0.6631 | 0.6548 | 0.6721 | 0.697 |
| | Precision | 0.6543 | 0.6572 | 0.6521 | 0.6646 | 0.697 |
| | Accuracy | 0.6777 | 0.7 | 0.6668 | 0.6911 | 0.6971 |
| | F1-Score | 0.6551 | 0.6593 | 0.6526 | 0.666 | 0.6961 |
| NB | Recall | 0.7123 | 0.7493 | 0.718 | 0.7291 | 0.7416 |
| | Precision | 0.6863 | 0.6816 | 0.6758 | 0.6762 | 0.7331 |
| | Accuracy | 0.7205 | 0.7573 | 0.7123 | 0.7272 | 0.7371 |
| | F1-Score | 0.6909 | 0.6932 | 0.6774 | 0.6812 | 0.7329 |
Table 2 Results of logistic regression (LR), support vector machines (SVM), and Naive Bayes (NB) using TF-IDF vectorization
| Model | Metric | O | C | E | A | N |
| --- | --- | --- | --- | --- | --- | --- |
| LR | Recall | 0.7157 | 0.7429 | 0.7001 | 0.723 | 0.7267 |
| | Precision | 0.6918 | 0.6743 | 0.6787 | 0.6864 | 0.7254 |
| | Accuracy | 0.7248 | 0.7517 | 0.7063 | 0.7289 | 0.7267 |
| | F1-Score | 0.6964 | 0.6855 | 0.6814 | 0.6915 | 0.7251 |
| SVM | Recall | 0.6889 | 0.7057 | 0.6832 | 0.706 | 0.7178 |
| | Precision | 0.6799 | 0.6761 | 0.6722 | 0.6879 | 0.7167 |
| | Accuracy | 0.7043 | 0.7354 | 0.6938 | 0.721 | 0.7181 |
| | F1-Score | 0.6824 | 0.6838 | 0.6741 | 0.6923 | 0.7164 |
| NB | Recall | 0.7581 | 0.791 | 0.726 | 0.7757 | 0.7432 |
| | Precision | 0.623 | 0.5925 | 0.6024 | 0.607 | 0.7272 |
| | Accuracy | 0.6959 | 0.7178 | 0.6666 | 0.6938 | 0.7325 |
| | F1-Score | 0.6069 | 0.5726 | 0.5736 | 0.5833 | 0.7257 |
Table 3 Results of logistic regression (LR), support vector machines (SVM), and Naive Bayes (NB) using bag of words vectorization with separated features
| Model | Metric | O | C | E | A | N |
| --- | --- | --- | --- | --- | --- | --- |
| LR | Recall | 0.5901 | 0.599 | 0.5851 | 0.6038 | 0.6108 |
| | Precision | 0.5822 | 0.5784 | 0.5776 | 0.5883 | 0.61 |
| | Accuracy | 0.6166 | 0.6573 | 0.606 | 0.6354 | 0.6118 |
| | F1-Score | 0.5819 | 0.5782 | 0.5764 | 0.5877 | 0.6089 |
| SVM | Recall | 0.574 | 0.5872 | 0.5709 | 0.5889 | 0.6031 |
| | Precision | 0.5692 | 0.5729 | 0.5656 | 0.578 | 0.6026 |
| | Accuracy | 0.6 | 0.6452 | 0.5918 | 0.6207 | 0.6036 |
| | F1-Score | 0.5691 | 0.5734 | 0.5647 | 0.5772 | 0.6011 |
| NB | Recall | 0.6104 | 0.65 | 0.6194 | 0.6096 | 0.6149 |
| | Precision | 0.5908 | 0.5827 | 0.5867 | 0.5764 | 0.6125 |
| | Accuracy | 0.6359 | 0.6876 | 0.6327 | 0.6395 | 0.6161 |
| | F1-Score | 0.5885 | 0.5745 | 0.5776 | 0.5681 | 0.6114 |
5.2 LSTM
For the deep learning approach, we employed the LSTM architecture described in the section above. We used only the merged features (direct speech with actions), and a hold-out validation strategy was implemented. The results are shown in Table 4.
6 Discussion
In approaching personality profiling as a binary classification task, we treat it as a text classification problem. Text classification has been extensively studied, and we tested several classification models known for their effectiveness with short text. Initially, we experimented with traditional machine learning algorithms, which are adept at classifying short text. The TF-IDF weighting helped penalize terms that appear frequently across documents, prioritizing words that can differentiate between documents. Additionally, the performance of logistic regression suggests that, within the feature space we constructed, the classes are at least partially linearly separable. This classification is a crucial step in a larger project where we intend to use it to develop a reward model. Therefore, we aimed to keep the classification process simple and efficient. Traditional machine learning methods were chosen for their speed and memory efficiency.
We also attempted to use deep learning, specifically LSTM, to enhance our results. However, we found that due to the length of the text and the size of the corpus, the LSTM did not perform as well as we had expected. Traditional machine learning can perform well even with a smaller corpus. Additionally, logistic regression is not a black box method, allowing us to achieve explainability in our experimental results.
7 Conclusion
Text generation has had a significant impact in recent years, offering various applications including the creation of fictional characters with predefined personalities. To achieve this automatically, we must develop automated personality profiling, which is currently a hot topic in research.
However, it’s essential to acknowledge that this task is challenging and requires substantial knowledge and resources to complete effectively. In this study, we aimed to explore personality profiling specifically for fictional characters.
While related works have shown promising results in personality profiling using social media data, they often incorporate additional data types beyond text alone. In our approach, we focused solely on textual data and experimented with both traditional machine learning algorithms and LSTM models. Our objective was to achieve profiling with minimal computational power.
Our results indicate that traditional machine learning algorithms outperformed LSTM models in this context. This suggests that lexical features can effectively capture the essence of personality traits. Notably, the TF-IDF representation proved to be well-suited for the task. Additionally, we observed that separating direct speech from actions did not improve classification performance.
While our results are promising, there is room for improvement. In future work, we aim to expand the corpus and enhance the features used for classification. Despite the challenges, we achieved commendable results with simple machine learning algorithms and established the Movie Character Personality Corpus based on Big Five (MochaP).









