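A Computational Theory of Writing Systems takes as its starting point the practical problem of building text-to-speech conversion systems; its aim is not to survey the world's writing systems. The most important theoretical claims are set out in Chapter 1. The two central claims are: (1) the mapping from word forms to their written forms is a regular relation; (2) the linguistic information expressed by the writing system of a given language is consistent (consistency). The remaining chapters use examples to elaborate and support these two claims from different angles. Chapter 2 discusses the regularity of writing systems in more detail. Chapter 3 explains how particular scripts express linguistic information and addresses the consistency of the information so expressed. Chapter 4 reviews several taxonomies of writing systems commonly used in modern linguistics and then proposes a two-dimensional classification of writing systems. Chapter 5 briefly describes how psycholinguistic methods are used to analyze how native readers convert text to speech, and checks the theory proposed in the book against psycholinguistic findings. Chapter 6 first explains how writing systems are borrowed and passed down from one script to another, then covers how abbreviations and numerals are represented and converted, and closes with a summary of the book.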
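A regular relation is one that a finite-state transducer can compute. As a rough illustration only, the sketch below implements a two-state transducer for a made-up toy rule: the letter "c" is rendered as "s" before "e" or "i" and as "k" elsewhere. The rule, the function name `transduce`, and the output symbols are assumptions chosen for this example and are not taken from the book.

```python
def transduce(text):
    """Toy two-state transducer: spelling -> rough pronunciation string.

    Hypothetical rule (not from the book): 'c' -> 's' before 'e' or 'i',
    otherwise 'c' -> 'k'. All other letters are copied unchanged.
    """
    output = []
    pending_c = False  # state flag: a 'c' has been read but not yet emitted
    for ch in text:
        if pending_c:
            if ch in "ei":
                # Soft context: emit 's' for the held 'c', then the vowel.
                output.append("s" + ch)
                pending_c = False
            elif ch == "c":
                # Held 'c' is hard; the new 'c' becomes the pending one.
                output.append("k")
            else:
                # Hard context: emit 'k' for the held 'c', then the letter.
                output.append("k" + ch)
                pending_c = False
        elif ch == "c":
            pending_c = True  # delay output until the next letter is seen
        else:
            output.append(ch)
    if pending_c:
        output.append("k")  # word-final 'c' is hard
    return "".join(output)


if __name__ == "__main__":
    for word in ["cat", "city", "accent", "music"]:
        print(word, "->", transduce(word))
```

The machine needs only a bounded amount of memory (one pending letter), which is what makes the mapping finite-state; a real orthography would need a far larger rule set, but the claim is that it stays within this class.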
Introductory Guide
Preface
List of Figures
List of Tables
1 Reading Devices
1.1 Text-to-Speech Conversion: A Brief Introduction
1.2 The Task of Pronouncing Aloud: A Model
1.2.1 A Simple Example from Russian
1.2.2 Formal Definitions
1.2.2.1 AVMs and Annotation Graphs
1.2.2.2 Definitions
1.2.2.3 Axioms
1.2.3 Central Claims of the Theory
1.2.3.1 Regularity
1.2.3.2 Consistency
1.2.4 Further Issues
1.2.4.1 Why a Constrained Theory of Writing Systems?
1.2.4.2 Orthography and the “Segmental” Assumption
1.3 Terminology and Conventions
1.A Appendix: An Overview of Finite State Automata and Transducers
1.A.1 Regular Languages and Finite State Automata
1.A.2 Regular Relations and Finite State Transducers
2 Regularity
2.1 Planar Regular Languages and Planar Regular Relations
2.2 The Locality Hypothesis
2.3 Planar Arrangements: Examples
2.3.1 Korean Hankul
2.3.2 Devanagari
2.3.3 Pahawh Hmong
2.3.4 Chinese
2.3.5 A Counterexample from Ancient Egyptian
2.4 Cross-Writing-System Variation in the SLU
2.5 Macroscopic Catenation: Text Direction
2.A Sample Chinese Characters and Their Analyses
3 ORL Depth and Consistency
3.1 Russian and Belarusian Orthography: A Case Study
3.1.1 Vowel Reduction
3.1.2 Regressive Palatalization
3.1.3 Lexical Marking in Russian and Other Issues
3.1.4 Summary of Russian and Belarusian
3.2 English
3.3 The Orthographic Representation of Serbo-Croatian Consonant Devoicing
3.3.1 Methods and Materials
3.3.2 Results
3.4 Cyclicity in Orthography
3.5 Surface Orthographic Constraints
3.A English Deep and Shallow ORLs
3.A.1 Lexical Representations
3.A.2 Rules for the Deep ORL
3.A.3 Rules for the Shallow ORL
4 Linguistic Elements
4.1 Taxonomies of Writing Systems: A Brief Overview
4.1.1 Gelb
4.1.2 Sampson
4.1.3 DeFrancis
4.1.3.1 No Full Writing System Is Semasiographic
4.1.3.2 All Full Writing Is Phonographic
4.1.3.3 Hankul Is Not Featural
4.1.4 A New Proposal
4.1.5 Summary
4.2 Chinese Writing
4.3 Japanese Writing
4.4 Some Further Examples
4.4.1 Syriac Syame
4.4.2 Reduplication Markers
4.4.3 Cancellation Signs
5 Psycholinguistic Evidence
5.1 Multiple Routes and the Orthographic Depth Hypothesis
5.1.1 Evidence for the Orthographic Depth Hypothesis
5.1.2 Evidence against the Orthographic Depth Hypothesis
5.2 "Shallow" Processing in "Deep" Orthographies
5.2.1 Phonological Access in Chinese
5.2.2 Phonological Access in Japanese
5.2.3 Evidence for the Function of Phonetic Components in Chinese
5.2.4 Summary
5.3 Connectionist Models: The Seidenberg-McClelland Model
5.3.1 Outline of the Model
5.3.2 What Is Wrong with the Model?
5.4 Summary
6 Further Issues
6.1 Adaptation of Writing Systems: The Case of Manx Gaelic
6.2 Orthographic Reforms: The Case of Dutch
6.2.1 The 1954 Spelling Rules
6.2.2 The 1995 Spelling Rules
6.3 Other Forms of Notation: Numerical Notation and Its Relation to Number Names
6.4 Abbreviatory Devices
6.5 Non-Bloomfieldian Views on Writing
6.6 Postscript
Bibliography
Index
Our starting point for this study of writing systems is text-to-speech synthesis (TTS), and more specifically the computational problem of converting from written text into a linguistic representation. While the connection between TTS systems on the one hand and writing systems on the other may not be immediately apparent, a moment's reflection will make it clear that the problem to be solved by a TTS system, namely the conversion of written text into speech, is exactly the same problem as a human reader must solve when presented with a text to be read aloud. And just as writing systems, their properties, and the ways in which they encode linguistic information are of interest to psycholinguists who study how people read, so (in principle) should such considerations be of interest to those who develop TTS technology: At the very least, it ought to be of as much interest as, for example, understanding the physiology and acoustics underlying speech production, something that early speech synthesis researchers such as Fant (1960) were heavily involved in.
Since my starting point is TTS, and since I assume that most readers will not be familiar with this field, I will start this chapter with a review of some of the issues relevant to the development of TTS systems, particularly as they relate to the problem of analyzing input text. This will be the topic of Section 1.1. In Section 1.2 I will informally introduce, by way of a simple example, the model that I shall be developing throughout the rest of this book. Finally, Section 1.3 will introduce some aspects of the formalism and the conventions that will be used throughout this book.
……