This code accompanies the following paper (Evolang 2024 presentation & proceedings paper), which describes the analysis performed using this code:
Peter Dekker, Sonja Gipper & Bart de Boer (2024). 3SG is the most conservative subject marker across languages: An exploratory study of rate of change.
The paper presents a cross-linguistic analysis of the rate of change of different subject markers. It compares proto-forms and modern forms from the following dataset:
Ilja Seržant (2021). Dataset for the paper "Universal attractors in language evolution provide evidence for the kinds of efficiency pressures involved" [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7641119
The script `analyse_data.py` analyses the rate of change of person markers for the different grammatical persons.
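The basic idea of comparing proto-forms with modern forms per grammatical person can be illustrated with a minimal sketch. It is not the implementation in `analyse_data.py`. It assumes character-level Levenshtein distance as the measure of change (the measure named in the `NORMALISATION` options for running the script). It also assumes a hypothetical record layout `(person, proto_form, modern_form)`. The toy forms are invented for illustration and are not taken from the Seržant dataset.

```python
from collections import defaultdict
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # delete char_a
                current[j - 1] + 1,                     # insert char_b
                previous[j - 1] + (char_a != char_b),   # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def mean_distance_per_person(records):
    """Average proto-to-modern Levenshtein distance for each grammatical person.

    `records` is a hypothetical iterable of (person, proto_form, modern_form) tuples.
    """
    distances = defaultdict(list)
    for person, proto_form, modern_form in records:
        distances[person].append(levenshtein(proto_form, modern_form))
    return {person: mean(values) for person, values in distances.items()}


# Hypothetical toy forms, NOT taken from the dataset.
toy_records = [
    ("1SG", "mi", "m"),
    ("1SG", "ka", "ka"),
    ("3SG", "t", "t"),
    ("3SG", "i", "i"),
]
print(mean_distance_per_person(toy_records))  # {'1SG': 0.5, '3SG': 0}
```

In this sketch, a lower mean distance for a person means its markers have changed less between the proto-language and the modern language.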
Running the script requires a few installation steps:
- The script uses R, so first install R, CMake and some system libraries needed by the R packages. On an Ubuntu machine:
  `sudo apt install r-base cmake libharfbuzz-dev libfribidi-dev`
- Install the Python package requirements:
  `pip3 install -r requirements.txt` (or with `sudo` for a global install)
- Install the required R packages:
  `python3 install_r_pkgs.py` (or with `sudo` for a global install)
- Download the file `verbal_person-number_indexes_merged.csv` from https://zenodo.org/records/7641119 (version v5) and place it in the folder `data`.
- Run the script:
  `python analyse_data.py`
  It writes plots to the `output_data` directory. For the model using unnormalised Levenshtein distance, set the variable `NORMALISATION` to `none`. For the normalised model, in which the Levenshtein distance is divided by the length of the longest form, set `NORMALISATION` to `max`.
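The difference between the two `NORMALISATION` settings can be sketched as follows. This is a minimal illustration, not the code in `analyse_data.py`. It assumes standard character-level Levenshtein distance, where insertions, deletions and substitutions each cost 1. It also assumes that "longest form" means the longer of the proto-form and the modern form being compared. The function names `levenshtein` and `form_distance` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # deletion
                current[j - 1] + 1,                     # insertion
                previous[j - 1] + (char_a != char_b),   # substitution
            ))
        previous = current
    return previous[-1]


def form_distance(proto_form: str, modern_form: str, normalisation: str = "none") -> float:
    """Hypothetical helper mirroring the two NORMALISATION settings.

    "none": raw Levenshtein distance.
    "max":  Levenshtein distance divided by the length of the longer form.
    """
    distance = levenshtein(proto_form, modern_form)
    if normalisation == "none":
        return float(distance)
    if normalisation == "max":
        longest = max(len(proto_form), len(modern_form))
        # Two empty forms are identical; avoid dividing by zero.
        return distance / longest if longest else 0.0
    raise ValueError(f"unknown normalisation: {normalisation!r}")


print(form_distance("kitten", "sitting", "none"))  # 3.0
print(form_distance("kitten", "sitting", "max"))   # 3 / 7, about 0.43
```

With `max`, distances fall between 0 and 1, so a single edit counts for less in a long form than in a short one.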