Named Entities Workshop (NEWS) 2011
IJCNLP 2011 Workshop
12 Nov 2011, Chiang Mai, Thailand

Frequently Asked Questions

  1. What is Machine Transliteration?
  2. What are the formats of training/development data and submission results?
  3. Some target names in English include stress marks and hyphens. What do they mean in Arabic transcriptions?
  4. Should hyphens-dashes be treated as word separators?

1. What is Machine Transliteration?

Transliteration is the transcription of words from one language into another language that uses a different writing system, while maintaining their phonetic equivalence. Machine Transliteration automates this process using computational approaches.

2. What are the formats of training/development data and submission results?

The format will be described in detail in the Whitepaper. It will be the same XML format used in last year's shared task.

3. Some target names in English include stress marks and hyphens. What do they mean in Arabic transcriptions?

Apostrophes represent either the Arabic letter 'alif (glottal stop) or 'ayn (voiced pharyngeal fricative). Dashes show that a name is unitary and indivisible. For example, on seeing the name "'Abdul 'Aziz", some non-Arabs might incorrectly assume that "'Abdul" is a given name and that "'Aziz" is a surname. Dashes in romanized Arab names help prevent such mistakes. You may remove these characters if your system does not need them.
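As a rough illustration of removing these characters, the sketch below deletes apostrophes and hyphens from a romanized name. The function name strip_marks and the exact set of characters removed are assumptions made for this example, not part of the task definition. Whether a dash should be deleted outright or replaced by a space is also a choice left to each system.

```python
# Hypothetical helper: the name and the character set are assumptions,
# not something specified by the shared task.
MARKS_TO_DROP = "'\u2019\u2018-"  # straight and curly apostrophes, hyphen

def strip_marks(name: str) -> str:
    """Delete apostrophes and hyphens from a romanized name."""
    # str.maketrans with three arguments maps every character in the
    # third argument to None, so translate() removes it.
    return name.translate(str.maketrans("", "", MARKS_TO_DROP))

print(strip_marks("'Abdul 'Aziz"))
```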

4. Should hyphens-dashes be treated as word separators?

Yes. All special characters, including hyphens, commas, parentheses, quotes, colons, semicolons, etc., should be treated as word splitters. For example, the title “The_Pursuit_of_Happyness” should be considered to be composed of 4 words {The, Pursuit, of, Happyness}. Similarly, the word “Al-Ma'un” is composed of 3 words {Al, Ma, un}. Likewise, “de Icaza” and “von Neumann” are each considered two strings rather than a single string. In the target language, if “von Neumann” corresponds to the single name “фонНеймана”, then no pairing of the full strings or their substrings is considered a transliteration. If “von Neumann” corresponds to “фон Неймана”, then there are two correct transliterations, namely {“von”, “фон”} and {“Neumann”, “Неймана”}.
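The splitting rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names split_words and candidate_pairs are hypothetical, every non-alphanumeric character (including the underscore and whitespace) is treated as a splitter, and words are paired by position only when both sides contain the same number of words. The last assumption reproduces the two “von Neumann” cases above but is not an official definition from the task.

```python
import re

# Assumption: any run of characters that is not a letter or digit
# (in any script), plus the underscore, acts as a word splitter.
SPLITTER = re.compile(r"[\W_]+")

def split_words(text: str) -> list[str]:
    """Split a name or title into words on special characters."""
    return [w for w in SPLITTER.split(text) if w]

def candidate_pairs(source: str, target: str) -> list[tuple[str, str]]:
    """Pair source and target words by position (hypothetical helper).

    If the word counts differ, no pairs are returned, mirroring the
    case where "von Neumann" maps to a single target string.
    """
    src, tgt = split_words(source), split_words(target)
    if len(src) != len(tgt):
        return []
    return list(zip(src, tgt))

print(split_words("The_Pursuit_of_Happyness"))
print(split_words("Al-Ma'un"))
print(candidate_pairs("von Neumann", "фонНеймана"))
print(candidate_pairs("von Neumann", "фон Неймана"))
```

Python 3 regular expressions are Unicode-aware by default for str patterns, so the same splitter applies to Cyrillic and other scripts without extra flags.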