Posts Tagged ‘lexer’

  • One+ Language: Three+ Scripts?


    My research and teaching are based in Hindi and Urdu. Although arguably different ends of a spectrum within the same language, they are written in different scripts. Hindi, which in general has more Sanskrit vocabulary, is written in the left-to-right Devanagari (देवनागरी) script, and Urdu, which draws more on Persian and Arabic, in a right-to-left Perso-Arabic script called nastaleeq (نستعلیق). Both scripts are generally but not entirely phonetic.

    In a series of small projects, I have explored the possibility of creating a meta-notation that encapsulates enough information to allow any given text to be represented in both scripts, as well as in phonetic transcription and diacritic-based transliteration. The overall goal of my project is to use digital technology to overcome the limitations of the division between these scripts/languages. My interest is in encoding more phonetic or etymological information about texts than Unicode will allow, so that digital texts can be used both for advanced humanities research and for language pedagogy.

    My particular research interest is in Urdu poetry, which is among the most prized literary genres in South Asia. I have explored ways of building on the class-based lexers/parsers used for script conversion in order to facilitate computational prosody of Urdu poetic texts. Urdu meter is based on syllable length rather than stress. Using the meta-notation would therefore allow these popular texts not only to be read in Hindi and in transliteration but also to be exposed as sound in time. How can that be visualized?
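
    As a rough illustration of how a lexer could feed a length-based scansion, here is a minimal sketch in Python. Everything in it is an assumption made for the example, not the project's actual notation or code: the romanized token set (doubled letters for long vowels, digraphs for aspirates), the names lex and scan, and the output symbols "=" (long) and "-" (short). The weighting rule is deliberately naive: a long vowel or a closed syllable counts as long, an open short-vowel syllable as short, and a leftover consonant as a short syllable of its own. Real Urdu scansion has many more conventions than this.

```python
import re

# Hypothetical romanized meta-notation (an assumption for this sketch):
# long vowels are doubled or diphthongs, aspirates are written as digraphs.
LONG_VOWELS = ("aa", "ii", "uu", "ai", "au", "e", "o")
SHORT_VOWELS = ("a", "i", "u")
CONSONANTS = ("kh", "gh", "ch", "jh", "th", "dh", "ph", "bh", "sh",
              "k", "g", "j", "t", "d", "n", "p", "b", "m",
              "y", "r", "l", "v", "s", "h", "z", "f", "q")


def _alternation(items):
    # Longest alternatives first, so "aa" wins over "a" and "kh" over "k".
    return "|".join(sorted(map(re.escape, items), key=len, reverse=True))


TOKEN_RE = re.compile(
    f"(?P<LONG>{_alternation(LONG_VOWELS)})"
    f"|(?P<SHORT>{_alternation(SHORT_VOWELS)})"
    f"|(?P<CONS>{_alternation(CONSONANTS)})"
)


def lex(word):
    """Split a word into (kind, text) tokens; kind is LONG, SHORT or CONS."""
    tokens, pos = [], 0
    while pos < len(word):
        match = TOKEN_RE.match(word, pos)
        if match is None:
            raise ValueError(f"unexpected character {word[pos]!r} at {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def scan(word):
    """Return a string of '=' (long) and '-' (short), one per syllable."""
    kinds = [kind for kind, _ in lex(word)]
    n = len(kinds)

    def is_vowel(j):
        return j < n and kinds[j] in ("LONG", "SHORT")

    weights, i = [], 0
    while i < n:
        if kinds[i] == "CONS" and is_vowel(i + 1):
            i += 1  # an onset consonant adds no weight of its own
        if kinds[i] == "LONG":
            weights.append("=")
            i += 1
        elif kinds[i] == "SHORT":
            # Closed syllable: the next consonant is not the onset of a vowel.
            closed = (i + 1 < n and kinds[i + 1] == "CONS"
                      and not is_vowel(i + 2))
            weights.append("=" if closed else "-")
            i += 2 if closed else 1
        else:
            weights.append("-")  # stray consonant counts as a short syllable
            i += 1
    return "".join(weights)
```

    Under these assumptions, scan("kitaab") gives "-=-" and scan("dil") gives "=". A string of this kind is one possible starting point for the visualization question, since each symbol can be mapped to a unit of duration.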