
AMIRA is the successor suite to ASVMTools. The toolkit includes a clitic tokenizer (TOK), a part-of-speech tagger (POS), and a base phrase chunker (BPC), i.e., a shallow syntactic parser. AMIRA is based on supervised learning with no explicit dependence on knowledge of deep morphology; hence, in contrast to systems such as MADA, it relies on surface data to learn generalizations. The tools share a unified framework that casts each component problem as a classification problem. The underlying technology uses Support Vector Machines in a sequence modeling framework, implemented with the YAMCHA toolkit. The system is fast and robust, and it exposes a limited number of user settings that depend on the desired disambiguation granularity. Because of their speed and high performance, the AMIRA tools have been widely used in NLP applications, including preprocessing for MT, IR, parsing, NER, and IE. The components can be invoked together, taking raw text in any encoding and producing clitic-tokenized, POS-tagged, and/or chunked data, or they can be applied individually to a given text; for example, POS tagging can be applied directly to raw text without explicitly invoking tokenization. The AMIRA tools are trained on MSA; however, we have adapted them at a shallow level to handle dialectal Arabic.
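
To make the idea of casting tagging as classification in a sequence modeling framework more concrete, here is a minimal sketch in Python. It is not AMIRA's or YAMCHA's actual code or feature set: the function names (`window_features`, `greedy_tag`, `toy_classifier`), the window width, and the affix features are all assumptions chosen for illustration. Each token is described only by surface features (neighboring tokens, short affixes) plus the tags already predicted to its left, and any classifier, such as a trained SVM, can be plugged in to pick the tag.

```python
from typing import Callable, Dict, List

Features = Dict[str, str]


def window_features(tokens: List[str], prev_tags: List[str], i: int,
                    width: int = 2) -> Features:
    """Build surface features for position i (hypothetical feature set).

    Uses the tokens in a +/- `width` window, two-character affixes of the
    current token, and the tags already predicted for the previous
    `width` positions.
    """
    feats: Features = {}
    for off in range(-width, width + 1):
        j = i + off
        feats[f"w[{off}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    feats["prefix2"] = tokens[i][:2]
    feats["suffix2"] = tokens[i][-2:]
    for off in range(1, width + 1):
        j = i - off
        feats[f"t[-{off}]"] = prev_tags[j] if j >= 0 else "<PAD>"
    return feats


def greedy_tag(tokens: List[str],
               classify: Callable[[Features], str]) -> List[str]:
    """Tag left to right, feeding earlier predictions back in as features."""
    tags: List[str] = []
    for i in range(len(tokens)):
        tags.append(classify(window_features(tokens, tags, i)))
    return tags


def toy_classifier(feats: Features) -> str:
    """Rule-based stand-in for a trained classifier (illustration only)."""
    if feats["w[0]"].lower() == "the":
        return "DT"
    if feats["t[-1]"] == "DT":
        return "NN"
    return "X"


if __name__ == "__main__":
    print(greedy_tag(["the", "tagger", "runs"], toy_classifier))
```

The same loop could in principle serve tokenization, POS tagging, or chunking by changing only the label set and the unit being classified, which is one way to read the "unified framework" described above.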


