Film language affecting behaviour: A psycholinguistic approach

Dublin Core

Title

Film language affecting behaviour: A psycholinguistic approach

Creator

Aleksandra Tuneski

Date

2021

Description

Films are a popular form of art and entertainment that let people enjoy a story through multiple perceptual stimuli and the stimulation of emotions. Many film elements shape the audience’s attitude towards a film, yet language style has rarely been taken into consideration in research. This study examined whether there is a relationship between the audience’s favouritism for films and the linguistic style present in them, concentrating predominantly on emotional factors of language in films. A dataset containing the widest public ratings of films was obtained from the Internet Movie Database (IMDb) platform and paired with the respective transcribed film dialogues provided by OpenSubtitles.org. The corpus’s transcripts (n=88,573) were analysed using the Linguistic Inquiry and Word Count software, and all the variables produced were then correlated with IMDb’s weighted film ratings. The project found that all types of emotions present in transcripts of film language were significantly, negatively associated with the IMDb rating outcomes, although the effect sizes were small. This finding suggests there might be an inclination for emotions to be felt in other areas of stimuli perception, rather than verbal language, when it comes to films. Additional exploratory analyses showed how other variables correlated with film rating scores, and practical applications of the study findings within the advertising industry were identified.

Subject

Pearson’s correlation

Source

Dataset

The dataset used for the study is purely secondary and consists of transcribed film dialogues (N=88,573), each complemented with the film’s Internet Movie Database (IMDb) rating, which at the time of collection was based on a minimum of 100 user ratings per film. IMDb is an online film rating platform where members of the wider audience must register for an account and are then able to rate and review the films they have watched. Registered IMDb members rate films on a 10-point scale, with 1 indicating “terrible” and 10 indicating “excellent” (Boyd et al., 2020). IMDb’s rating algorithms produce ratings that are weighted by metrics associated with users, rather than simple average ratings. Although the algorithms are unavailable to the public, IMDb’s rating system has shown consistency across all films because the weighted ratings reduce the possibility that a small group of users can take advantage of the rating system (IMDb, 2021). IMDb is one of the most popular and authoritative film rating websites, where the ratings of a film are provided anonymously and voluntarily (Sawers, 2015).

The transcribed film dialogue data was provided by OpenSubtitles.org, and the corpus was previously organised and used in a study by Boyd et al. (2020); it was generously provided by the authors for the purpose of this project. OpenSubtitles.org is a website that provides transcribed and translated captions of motion pictures, audio files and various other audio-visual files (OpenSubtitles.org, 2021). The corpus used by Boyd et al. (2020) contains purely English-language film subtitles, corresponding to films originally released in English or to foreign films whose dialogues have been translated into English. Boyd et al. (2020) combined the transcribed film dialogues provided by OpenSubtitles.org with the IMDb ratings, along with other IMDb categories such as film genre, year of release, country of production, et cetera. Almost 90% of the IMDb categories linked to the films’ ratings are irrelevant for the purpose of this project, thus solely the film ratings will be taken into consideration for analysis.

Automated Textual Analysis Software (LIWC)

To conduct the automated textual analysis, this research project will use the Linguistic Inquiry and Word Count (LIWC) tool, pronounced “Luke”. LIWC is a textual analysis program that measures the degree to which various dimensions of words are used in a text (Tausczik & Pennebaker, 2010). The LIWC program has two central features: the processing component and the dictionaries. The processing component takes a text file and analyses it word by word, comparing each word with the dictionary files and sorting it into a category such as, for example, verb or second person pronoun (Boyd, 2017). Once the program finishes running, it produces an output that lists all the LIWC categories used in the text, as well as the rates and percentages at which each category was used in the given text.

The dictionaries are at the heart of the LIWC program, and they identify the group of words that belong to each category (Pennebaker et al., 2015). When the program was being created, the authors aimed to develop measures that capture the emotions present in words, cognitive processes, signs of self-reflection, et cetera. In order to assign a psychological component to words, human judges contributed to developing the categories LIWC possesses today (Boyd, 2017). Across approximately 80 dimensions (see Appendix A), LIWC analyses the text in relation to various parts of speech, thinking styles, social concerns and emotions (Pennebaker et al., 2001). For example, the “positive emotion” category contains words such as “love”, “happy” and “nice”, while the “cognitive processes” category comprises words like “examine”, “think” and “understand”.
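The word-by-word dictionary lookup described above can be illustrated with a minimal sketch. It is not the LIWC implementation: the two-category dictionary, the category labels "posemo" and "cogproc", and the function name category_percentages are assumptions made for illustration, and the sketch matches exact words only, whereas the real dictionaries are far larger.

```python
import re
from collections import Counter

# Hypothetical toy dictionary built from the example words in the text.
# The real LIWC dictionaries cover roughly 80 dimensions and many more words.
TOY_DICTIONARY = {
    "posemo": {"love", "happy", "nice"},
    "cogproc": {"examine", "think", "understand"},
}


def category_percentages(text, dictionary=TOY_DICTIONARY):
    """Return, for each category, the percentage of words in `text`
    that belong to that category's word list."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)

    # Compare each word with every category's word list.
    counts = Counter()
    for word in words:
        for category, vocabulary in dictionary.items():
            if word in vocabulary:
                counts[category] += 1

    # Avoid dividing by zero for empty transcripts.
    if total == 0:
        return {category: 0.0 for category in dictionary}

    return {
        category: 100.0 * counts[category] / total
        for category in dictionary
    }
```

Expressing each category as a percentage of total words, rather than a raw count, keeps long and short transcripts comparable.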

Over the years, LIWC has been able to uncover psychological patterns and personalities purely from textual analysis. Petrie et al. (2008) used LIWC to investigate the Beatles’ lyrics and found that it was possible to distinguish each songwriter’s unique language style, and also to discover which Beatle’s style was predominant in collaboratively written songs. Research has shown LIWC to be one of the most reliable automated textual analysis tools, able to uncover and predict psychological implications residing in written sources; thus this study will employ this tool to test its hypothesis.

Data Preparation and Analysis

The initial corpus was subjected to cleaning procedures, in which data that did not meet all inclusion criteria was removed from the dataset. The inclusion criteria were that film ratings had at least 100 user votes, that transcribed dialogues had at least 100 words, and that records contained values for all corpus variables. The cleaned dataset (N=85,130) is going to be processed in the LIWC program, where each word within the transcripts will be counted and sorted among the LIWC dictionary categories it belongs to. For the main hypothesis, the program will analyse the dataset for LIWC variables that have been shown to be correlated with positive and negative evaluations in the past. This way, the rates of positive and negative emotion words in each dialogue will be quantified. Once the rates have been extracted, a bivariate Pearson’s correlation will be conducted to assess whether there is a significant relationship between positive and negative emotion words in film dialogues and their IMDb ratings. Additionally, exploratory analyses will be run to search for significant relationships between the dataset variables and the film ratings, again by conducting Pearson’s correlation tests between the ratings and all LIWC variables produced.
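The cleaning and correlation steps above can be sketched as follows. This is an illustrative outline under stated assumptions, not the project's actual pipeline: the record fields ("rating", "votes", "word_count" and a LIWC variable such as "posemo") and all function names are hypothetical. The correlation coefficient is Pearson's r, the covariance of two variables divided by the product of their standard deviations; the sketch does not compute a significance test.

```python
import math


def meets_inclusion_criteria(film):
    """Apply the three inclusion criteria to one record (a dict).

    Field names are assumptions made for this sketch.
    """
    # Criterion: the record must contain values for all variables.
    if any(value is None for value in film.values()):
        return False
    # Criteria: at least 100 user votes and at least 100 transcript words.
    return film["votes"] >= 100 and film["word_count"] >= 100


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cross / math.sqrt(ss_x * ss_y)


def correlate_with_ratings(films, variable):
    """Clean the records, then correlate one LIWC variable with the ratings."""
    clean = [film for film in films if meets_inclusion_criteria(film)]
    ratings = [film["rating"] for film in clean]
    values = [film[variable] for film in clean]
    return pearson_r(ratings, values)
```

For the exploratory analyses, the same helper could be called once per LIWC variable. With a sample of this size even very small coefficients can reach statistical significance, which is why effect sizes are reported alongside significance.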

Publisher

Lancaster University

Identifier

Tuneski (2021)

Contributor

Amy Austin and Lesley Wu

Rights

Open

Language

English

Type

Secondary Data

Coverage

LA1 4YF

LUSTRE

Supervisor

Ryan Boyd

Project Level

MSc

Topic

Language psychology

Sample Size

88,573

Statistical Analysis Type

Pearson Correlation

Files

LIWC Codebook.pdf

Collection

Citation

Aleksandra Tuneski, “Film language affecting behaviour: A psycholinguistic approach,” LUSTRE, accessed March 29, 2024, https://www.johnntowse.com/LUSTRE/items/show/106.