Advancing Sino-Philippine linguistics and sociolinguistics using the Lannang Corpus (LanCorp)
A multilingual, POS-tagged, and audio-textual databank
This paper introduces the Lannang Corpus (LanCorp), a publicly available 375,000-word collection of raw and transcribed recordings of Lannang languages spoken in metropolitan Manila, annotated with part-of-speech tags and linked to 40 types of sociolinguistic metadata. It first provides an overview of LanCorp (e.g. its design, formats, and accessibility). It then presents several examples of how the corpus can be used for variationist sociolinguistic research, using Lánnang-uè data as a case study. The findings of these exploratory studies indicate that Lannang languages are influenced by sociolinguistic factors, demonstrating the intricate nature of the Sino-Philippine sociolinguistic ecology. Given its size, sociolinguistic metadata, and multiple formats, LanCorp can be used to study Lannang languages in general as well as their use by specific social groups. It enables scholars to investigate multilingual interactions across a wide range of sociolinguistic factors, furthering the field of Sino-Philippine (socio)linguistics.
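Variationist analyses of the kind described above usually begin by tabulating how often each variant of a linguistic variable (for instance, two competing conjunction forms) occurs within each social group. The Python sketch that follows illustrates only that first step, under stated assumptions: tokens are represented as dicts with hypothetical keys (`variant`, `sex`) that do not reflect LanCorp's actual file formats or metadata field names, and the records are invented toy data, not corpus data. A full study would go on to model the variation statistically rather than stop at raw proportions.

```python
from collections import Counter, defaultdict


def variant_proportions(tokens, factor, variable="variant"):
    """Return {group: {variant: proportion}} for one social factor.

    Assumption: each token is a dict; the key names used here are
    hypothetical and are not LanCorp's real metadata fields.
    """
    # Count variant occurrences separately for each level of the factor.
    counts = defaultdict(Counter)
    for tok in tokens:
        counts[tok[factor]][tok[variable]] += 1

    # Convert raw counts to within-group relative frequencies.
    result = {}
    for group, ctr in counts.items():
        total = sum(ctr.values())
        result[group] = {v: n / total for v, n in ctr.items()}
    return result


# Invented toy records for illustration only; these are NOT corpus data.
toy_tokens = [
    {"variant": "A", "sex": "F"},
    {"variant": "A", "sex": "F"},
    {"variant": "B", "sex": "F"},
    {"variant": "B", "sex": "M"},
    {"variant": "B", "sex": "M"},
    {"variant": "A", "sex": "M"},
    {"variant": "B", "sex": "M"},
]

if __name__ == "__main__":
    for group, props in sorted(variant_proportions(toy_tokens, "sex").items()):
        print(group, props)
```

Swapping `"sex"` for another hypothetical factor key (e.g. an age-group or style field) gives the same per-group breakdown along that dimension.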
Article outline
- 1. Introduction
- 1.1 The Lannangs and language
- 1.2 The state of (socio)linguistic research in the Lannang and Sino-Philippine context
- 1.3 LanCorp as a solution
- 2. Considerations in the creation of LanCorp
- 3. Procedure
- 3.1 Collection
- 3.2 Transcription
- 3.3 Processing
- 4. Corpus data distribution by selected factors
- 4.1 Style
- 4.2 Age groups
- 4.3 Sex
- 4.4 Religion
- 4.5 Language
- 5. Metadata
- 6. Format and accessibility
- 7. LanCorp in action: Exploratory analyses
- 7.1 Variation in the first-person inclusive pronoun
- 7.2 Variation in conjunctions with adversative function: Tagalog-derived vs. Hokkien-derived
- 7.3 Variation in conjunctions of manner: Tagalog-derived vs. Hokkien-derived
- 7.4 Variation in Hokkien-derived conjunctions of manner: Khânân(g) vs. khâlân(g)
- 8. Conclusion