JERAELYN TAN MING LI
Chan Chee Seng
Dr Fan Lixin
ChatGPT is an
advanced language model based on the GPT-3.5 architecture, designed to
generate human-like text in conversational contexts. It employs a
transformer-based neural network to capture context and produce high-quality
text. However, it has notable limitations: it may generate incorrect or
incoherent responses (hallucination), and its token limit restricts the length
of text it can process and generate. Understanding these limitations is
crucial for using the model effectively. This project aims to
comprehensively understand ChatGPT's functioning and explore the token limit
challenge. The objectives include analyzing the model's architecture,
investigating token limit challenges, and optimizing text generation. The
literature review covers advancements in language modeling, tokenization's
impact on text quality, prompt engineering, and hallucination detection.
Instruction tuning is introduced as a technique for improving language models.
The problem statement identifies two gaps: an incomplete understanding of
ChatGPT's functioning and limited research on the token limit challenge. The
research methodology comprises a preliminary study, categorization, algorithm
implementation, and evaluation.
The proposed algorithm incorporates document embedding, vector search, and
clustering techniques to overcome the token limit issue. It shows promise in
improving response precision and relevance within the token limit. By
addressing these objectives through the proposed methodology, this study
aims to deepen understanding of ChatGPT, optimize text generation within the
token limit, and contribute to advances in natural language processing and
conversational AI.
Keywords: ChatGPT, Tokenization, Vectorization, Clustering.
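The embedding-and-vector-search pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the bag-of-words embedding, cosine ranking, and word-count token budget are simplified stand-ins for a learned embedding model, a vector index, and the model's real tokenizer, and the clustering step is omitted for brevity.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def embed(text, vocab):
    # Toy bag-of-words embedding over a shared vocabulary, normalized to
    # unit length (stand-in for a learned document embedding model).
    counts = Counter(tokenize(text))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Cosine similarity of unit vectors reduces to a dot product.
    return sum(x * y for x, y in zip(a, b))

def select_chunks(chunks, query, token_budget):
    # Vector search: rank document chunks by similarity to the query,
    # then greedily keep the most relevant ones within the token budget.
    vocab = sorted(set(tokenize(query)).union(*(set(tokenize(c)) for c in chunks)))
    q = embed(query, vocab)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c, vocab), q), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        cost = len(tokenize(chunk))  # crude proxy for the model's token count
        if used + cost <= token_budget:
            selected.append(chunk)
            used += cost
    return selected
```

In a full system, the selected chunks would be concatenated into the prompt sent to ChatGPT, so that only the most query-relevant material consumes the limited context window; a production version would also cluster chunks to deduplicate near-identical passages before selection.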