Configure SmartAnno

Configure word embedding model

Because SmartAnno’s current backend models use word embeddings, we need to specify where to find the pretrained embedding model, along with its vocabulary size and dimension.

If you downloaded the GloVe embeddings directly, you first need to convert them to the word2vec text format with gensim, as follows:

from gensim.scripts.glove2word2vec import glove2word2vec
# suppose 'resources/glove.6B.100d.txt' is the embedding file to be converted 
glove2word2vec('resources/glove.6B.100d.txt', 'resources/glove.6B.100d.gs.txt')
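To make the conversion less opaque: the word2vec text format differs from the raw GloVe text format only by a first line giving the vocabulary size and the dimension. The sketch below illustrates that idea in plain Python. It is not gensim's actual implementation; the function name add_word2vec_header is hypothetical, and it assumes a space-separated GloVe text file with one token per line.

```python
def add_word2vec_header(glove_path, output_path):
    """Copy a GloVe text file, prepending a 'vocab_size dimension' header line."""
    vocab_size = 0
    dimension = 0
    # First pass: count the rows and infer the dimension from the first row.
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if vocab_size == 0:
                # Each row is: token value_1 value_2 ... value_d
                dimension = len(line.rstrip("\n").split(" ")) - 1
            vocab_size += 1
    # Second pass: write the header, then copy the rows unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        dst.write(f"{vocab_size} {dimension}\n")
        for line in src:
            dst.write(line)
    return vocab_size, dimension
```

In practice the gensim call above is the safer choice; this sketch is only meant to show what the header line contains.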

Then you can check its vocabulary size and dimension with the following command (Linux):

head -1 resources/glove.6B.100d.gs.txt

The first number is the vocabulary size, the second is the dimension size.
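On systems without the head command, the same two numbers can be read in Python. This is a minimal sketch, assuming the file has already been converted so that its first line is the "vocabulary_size dimension" header; the function name read_embedding_header is hypothetical.

```python
def read_embedding_header(path):
    """Return (vocab_size, dimension) from the first line of a word2vec text file."""
    with open(path, encoding="utf-8") as f:
        first_line = f.readline()
    # The header holds exactly two integers; an unconverted GloVe file
    # would fail here with a ValueError instead of returning wrong numbers.
    vocab_size, dimension = (int(field) for field in first_line.split())
    return vocab_size, dimension
```

For example, read_embedding_header('resources/glove.6B.100d.gs.txt') should return the two values that the head command prints.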

Additional transformer models will be available soon.

Configure UMLS API key

We can use the UMLS REST API to expand keywords by looking up their synonyms. These keywords are used in the initial rule-based pre-annotator.

Here is an instructional video on how to get a UMLS API key (starting at 01:12).

We can also leave the API key blank if we don’t want to use UMLS.

Configure SmartAnno

With the information above, we are now able to configure SmartAnno:

[Screenshot: configuring SmartAnno]

For more details, please refer to this runnable Colab Notebook.