
A multitask and multimodal ChatGPT on reasoning and interactivity

Student

JERAELYN TAN MING LI

Supervisor

Chan Chee Seng

Collaborator

Dr Fan Lixin


ChatGPT is a large language model based on the GPT-3.5 architecture, designed to generate human-like text in conversational settings. It uses a transformer-based neural network to capture context and produce fluent responses. However, it has two notable limitations: it can generate incorrect or incoherent responses (hallucination), and it has a token limit that restricts the length of text it can handle. This project aims to understand how ChatGPT works and to explore the token limit challenge. Its objectives are to analyze the model's architecture, investigate the challenges posed by the token limit, and optimize text generation. The literature review covers advances in language modeling, the impact of tokenization on text quality, prompt engineering, hallucination detection, and instruction tuning as a means of improving language models. The problem statements are the lack of understanding of ChatGPT's functioning and the limited research on the token limit challenge. The research methodology consists of a preliminary study, categorization, algorithm implementation, and evaluation. The proposed algorithm combines document embedding, vector search, and clustering to work around the token limit, and it shows promise in improving the precision and relevance of responses within that limit. Through these objectives and this methodology, the study aims to deepen understanding of ChatGPT, optimize text generation within the token limit, and contribute to advances in natural language processing and conversational AI.
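The abstract names document embedding, vector search, and clustering without giving details, so the following is only a minimal sketch of how such a pipeline might fit context into a token budget. Everything in it is an assumption made for illustration: the function names (`embed`, `select_context`), the parameters (`token_budget`, `top_k`, `threshold`), the bag-of-words vectors standing in for a learned embedding model, the greedy similarity-threshold grouping standing in for whatever clustering method the project used, and the word count standing in for a real tokenizer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a rough proxy for model tokens)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy embedding: L2-normalised bag-of-words counts.

    Assumption: a real system would call a learned embedding model here.
    """
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def select_context(chunks, query, token_budget, top_k=4, threshold=0.5):
    """Pick relevant, non-redundant chunks whose total size fits the budget."""
    q = embed(query)
    vecs = [embed(c) for c in chunks]

    # Vector search: keep the top_k chunks most similar to the query.
    ranked = sorted(range(len(chunks)),
                    key=lambda i: cosine(q, vecs[i]),
                    reverse=True)[:top_k]

    # Greedy clustering: a chunk that is close to an already chosen
    # representative is treated as part of that cluster and skipped.
    reps = []
    for i in ranked:
        if all(cosine(vecs[i], vecs[r]) < threshold for r in reps):
            reps.append(i)

    # Pack cluster representatives into the token budget, best match first.
    selected, used = [], 0
    for i in reps:
        cost = len(tokenize(chunks[i]))
        if used + cost <= token_budget:
            selected.append(chunks[i])
            used += cost
    return selected


if __name__ == "__main__":
    docs = [
        "The transformer uses self attention to capture context.",
        "Self attention lets the transformer capture context.",
        "Bananas are rich in potassium.",
        "Token limits restrict how much text the model can read.",
    ]
    question = "How does the transformer capture context?"
    for chunk in select_context(docs, question, token_budget=20, top_k=3):
        print(chunk)
```

In this sketch the two near-duplicate sentences about self attention fall into one cluster, so only one of them spends any of the budget. Swapping in a learned embedding model, a proper clustering algorithm, and the model's own tokenizer would change the numbers but not the overall shape of the pipeline.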

 

Keywords: ChatGPT, Tokenization, Vectorization, Clustering.