Advancing Sino-Philippine linguistics and sociolinguistics using the Lannang Corpus (LanCorp)
A multilingual, POS-tagged, and audio-textual databank
This paper introduces the Lannang Corpus (LanCorp), a public 375,000-word collection of raw and transcribed
recordings of Lannang languages spoken in metropolitan Manila, which have been annotated with part-of-speech tags and linked to 40
types of sociolinguistic metadata. It begins with an overview of LanCorp (e.g. design, formats, accessibility). It then
shows, through several examples, how the corpus can be used for variationist sociolinguistic research, using Lánnang-uè data
as a case study. The findings from the exploratory studies indicate that Lannang languages are influenced by sociolinguistic
factors, demonstrating the intricate nature of the Sino-Philippine sociolinguistic ecology. Due to its large size, sociolinguistic
metadata, and various formats, LanCorp can be used to study Lannang languages in general and how they are used by specific social
groups. It enables scholars to investigate multilingual interactions across a wide range of sociolinguistic factors, furthering the
field of Sino-Philippine (socio)linguistics.
Article outline
- 1. Introduction
- 1.1 The Lannangs and language
- 1.2 The state of (socio)linguistic research in the Lannang and Sino-Philippine context
- 1.3 LanCorp as a solution
- 2. Considerations in the creation of LanCorp
- 3. Procedure
- 3.1 Collection
- 3.2 Transcription
- 3.3 Processing
- 4. Corpus data distribution by selected factors
- 4.1 Style
- 4.2 Age groups
- 4.3 Sex
- 4.4 Religion
- 4.5 Language
- 5. Metadata
- 6. Format and accessibility
- 7. LanCorp in action: Exploratory analyses
- 7.1 Variation in the first-person inclusive pronoun
- 7.2 Variation in conjunctions with adversative function: Tagalog-derived vs. Hokkien-derived
- 7.3 Variation in conjunctions of manner: Tagalog-derived vs. Hokkien-derived
- 7.4 Variation in Hokkien-derived conjunctions of manner: Khânân(g) vs. khâlân(g)
- 8. Conclusion