Preliminary investigations into the word categorization system of BERT, 2023

Item — Call Number: MU Thesis Chi
Identifier: b7931714

Scope and Contents

From the Collection:

The collection consists of theses written by students enrolled in the Monmouth University graduate Computer Science program. The holdings are primarily bound print documents that were submitted in partial fulfillment of requirements for the Master of Science degree.

Dates

  • Creation: 2023

Creator

Language of Materials

From the Collection:

Unless noted otherwise at the resource component level, the language of the collection materials is English.

Conditions Governing Access

The collection is open for research use. Access is by appointment only.

Access to the collection is confined to the Monmouth University Library and is subject to patron policies approved by the Monmouth University Library.

Collection holdings may not be borrowed through interlibrary loan.

Research appointments are scheduled by the Monmouth University Library Archives Collections Manager (732-923-4526). A minimum of three days' advance notice is required to arrange a research appointment for access to the collection.

Patrons must complete a Researcher Registration Form and provide appropriate identification to gain access to the collection holdings. Copies of these documents will be kept on file at the Monmouth University Library.

Extent

1 item (print book): 70 pages; 8.5 x 11.0 inches (28 cm).

Abstract

Bidirectional Encoder Representations from Transformers (BERT), introduced by Google, is a powerful natural language processing model because it captures the meaning of a word in the context of its sentence. WordNet, developed at Princeton University, is a lexical database that records semantic relationships between words. This thesis investigates BERT's word categorization system using groups of example sentences drawn from related WordNet synsets. Because BERT produces a contextual word embedding for each occurrence of a word, an embedding was extracted for the shared-meaning word in each example sentence. Agglomerative hierarchical clustering was then applied to these extracted embeddings to examine how the words with similar meanings across the sentences were related, and the resulting clusters were compared with the relationships WordNet assigns to the sentences.
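
The pipeline the abstract describes (per-sentence contextual embeddings, then hierarchical clustering) can be sketched as follows. This is a minimal illustration, not the thesis code: the vectors below are hypothetical toy stand-ins for BERT embeddings of the word "bank" in four sentences, and the clustering is a small hand-rolled average-linkage routine over cosine distance rather than a library call.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def average_linkage(vectors, cluster_a, cluster_b):
    """Mean pairwise distance between members of two clusters."""
    dists = [cosine_distance(vectors[i], vectors[j])
             for i in cluster_a for j in cluster_b]
    return sum(dists) / len(dists)

def agglomerative_cluster(vectors):
    """Greedy average-linkage agglomerative clustering.

    Returns the merge history as a list of (members_a, members_b)
    tuples, ordered from the first merge to the last.
    """
    clusters = [[i] for i in range(len(vectors))]
    history = []
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(vectors, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        history.append((tuple(clusters[a]), tuple(clusters[b])))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return history

# Hypothetical stand-ins for contextual embeddings of "bank" in four
# example sentences: two river-sense occurrences, two finance-sense.
embeddings = [
    [1.0, 0.1, 0.0],   # "bank" (river), sentence 1
    [0.9, 0.2, 0.1],   # "bank" (river), sentence 2
    [0.0, 0.1, 1.0],   # "bank" (finance), sentence 3
    [0.1, 0.0, 0.95],  # "bank" (finance), sentence 4
]

history = agglomerative_cluster(embeddings)
# Each sense group merges internally before the two senses merge together.
```

The resulting merge order forms a dendrogram that can then be compared against the grouping WordNet's synset structure implies for the same sentences, which is the comparison the thesis performs.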

Keywords: BERT, WordNet, natural language processing, agglomerative clustering, transformers, word embeddings

Partial Contents

Abstract -- Acknowledgements -- List of figures -- Table of contents -- 1. Introduction -- 2. Background -- 3. Implementation and results -- 4. Future research -- 5. Conclusion -- 6. References -- 7. Appendix.

Repository Details

Part of the Monmouth University Library Archives Repository

Contact:
Monmouth University Library
400 Cedar Avenue
West Long Branch, New Jersey 07764, United States
732-923-4526