This is a rewrite of TALEN, a Tool for Annotation of Low-resource ENtities, using React.js for the frontend and a Python backend.
This software was designed for Named Entity Recognition (NER) annotation, but it can be used for any token-level sequence annotation task.
Check out a demo here: annotate.universalner.org.
Requirements:
- npm
- Python 3.6+
The code is separated into two folders: client/, which holds the frontend, and server/, which holds the backend.
Each folder has its own README file with more details (probably too many).
Installation and running will be done separately for each folder.
To install MongoDB:
$ brew update
$ brew install mongodb

To install the backend:
$ cd server
$ python -m venv cool-environment-name # virtual env optional but strongly recommended
$ source cool-environment-name/bin/activate
$ pip install -r requirements.txt
$ cd ..

To install the frontend:
$ cd client
$ npm install
$ cd ..

First, make sure that MongoDB is running locally.
$ bash start_mongo.sh
$ cd server
$ python -m scripts.mongo_stats -e dev  # check that it worked

You can also check MongoDB's status with check_mongo.sh, and stop it with stop_mongo.sh. (These commands assume you are on a Mac with Homebrew installed; the steps will differ on Windows and Linux.)
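If you prefer to verify the connection from Python directly, here is a minimal sketch using pymongo (which the backend's requirements should already pull in), assuming MongoDB is on its default port 27017:

```python
from pymongo import MongoClient

# Connect to the local MongoDB instance; fail fast if it is unreachable.
client = MongoClient("localhost", 27017, serverSelectionTimeoutMS=2000)

# "ping" raises ServerSelectionTimeoutError if the server is down.
client.admin.command("ping")
print("MongoDB is up")
```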
$ cd server
$ export ENV=dev && python app.py

This will default to port 8080, but you can change this by setting the $PORT variable.
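For example, assuming app.py reads $PORT from the environment as described, you could serve on port 9000 instead (the port number here is arbitrary):
$ export ENV=dev && export PORT=9000 && python app.py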
There are two options for viewing the frontend. If you want to modify it and have it reload automatically, start the node server (in a new terminal):
$ cd client
$ export REACT_APP_URL="http://localhost:8080" && npm run start

If you are ready to start annotating in earnest, compile the React code into static files and serve them alongside the Flask app:
$ cd client
$ npm run build

This will create a folder called client/build containing static files.
Then, with the backend server running, visit localhost:8080/.
Data is stored primarily in MongoDB.
This repo also contains some example datasets in server/data/, as well as corresponding dataset config files in config/datasets/.
Every .yml file in config/datasets/ will be loaded as a config file. Each config file must contain:
- name: some string identifier
- path: path to the dataset
- reader: the Python class that will read this data. See server/data_readers for examples.
You may optionally include a list of labels and their colors, but by default each config file inherits the
labelset from config/base.yml.
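As an illustration, a config for the sv_pud-ud-test dataset shipped in server/data/ might look something like the sketch below. The path, reader class name, and label entries are hypothetical; check server/data_readers and config/base.yml for the real values and schema:

```yaml
# config/datasets/sv_pud-ud-test.yml (illustrative sketch)
name: sv_pud-ud-test              # string identifier
path: server/data/sv_pud-ud-test  # hypothetical path to the dataset
reader: ConllReader               # hypothetical reader class from server/data_readers
# Optional: override the labelset inherited from config/base.yml.
labels:
  - name: PER
    color: "#e74c3c"
  - name: LOC
    color: "#2ecc71"
```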
One of the motivators for writing this software was to annotate Universal Dependencies with NER tags.
To get going with annotation, see this file.
To build:
$ ./run_docker.sh build
To run:
$ export ENV=prod # currently, you have to run on prod
$ ./run_docker.sh run
Then visit http://localhost:1337 in a browser.
Run:
$ python -m scripts.download_data_to_bio --environment prod --dataset-name sv_pud-ud-test
Replace sv_pud-ud-test with whichever dataset you want to export. This writes everything to a single file, and it only needs read-only privileges on MongoDB.
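As the script name suggests, the export is in BIO (begin/inside/outside) format. The exact column layout may differ, but a BIO-tagged sentence looks roughly like this (tokens and tags invented for illustration):

```
Dan     B-PER
Roth    I-PER
works   O
at      O
UPenn   B-ORG
```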
When running locally, the app adds a default user with username "a" and password "a". When running in production, use the manage_users.py script to add, update, or delete users.
Run:
$ python -m scripts.get_interannotator_agreement
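The script's internals aren't described here, but as background, token-level agreement between two annotators is commonly summarized with Cohen's kappa. Here is a minimal sketch using scikit-learn, with made-up label sequences (this illustrates the metric, not necessarily what the script computes):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical token-level BIO tags from two annotators over the same tokens.
annotator_a = ["B-PER", "I-PER", "O", "O", "B-ORG"]
annotator_b = ["B-PER", "I-PER", "O", "B-ORG", "B-ORG"]

# Cohen's kappa corrects raw agreement for agreement expected by chance.
print(cohen_kappa_score(annotator_a, annotator_b))
```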
The repo defines a GitHub Action in .github/workflows/cloud-run.yml that deploys to Google Cloud Run on merges to master. Note that the MONGO_USERNAME and MONGO_PASSWORD variables are stored as secrets in Google Cloud.
If you use this in your research paper, please cite us!
@inproceedings{talen2018,
  author    = {Stephen Mayhew and Dan Roth},
  title     = {TALEN: Tool for Annotation of Low-resource ENtities},
  booktitle = {ACL System Demonstrations},
  year      = {2018},
}
You can read the paper here: http://cogcomp.org/papers/MayhewRo18.pdf