Llama Adventure is a minimal text-based adventure game with a backend powered by Python/Flask and a frontend developed using Next.js. This project was created as part of the AICG class task 2.
The system recognizes user inputs and responds accordingly through a hidden `<//Key Words//>` output mechanism.
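The exact tag grammar is internal to the game, but conceptually the frontend can parse these hidden tags out of a reply before displaying it. A minimal sketch, assuming tags take the `<//...//>` form shown above (the parsing details are illustrative, not the repo's actual code):

```ts
// Illustrative only: the exact tag grammar in the game may differ.
function parseReply(raw: string): { text: string; keywords: string[] } {
  // Collect every hidden <//...//> tag, then strip them from the visible text.
  const keywords = Array.from(raw.matchAll(/<\/\/(.*?)\/\/>/g), (m) => m[1].trim());
  const text = raw.replace(/<\/\/.*?\/\/>/g, "").trim();
  return { text, keywords };
}

// e.g. parseReply("You head north. <//Move North//>")
// -> { text: "You head north.", keywords: ["Move North"] }
```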
- Each of the four directions has a unique description, providing players with an immersive exploration experience.
- Players encounter a hidden landmine in this area. Triggering it results in instant death, reducing their health to 0.
- This section introduces environmental hazards and encourages players to think critically about their choices and surroundings.
- If players choose to spend the night here, they are ambushed by a gang of bandits and killed.
- This event highlights the dangers of trusting seemingly safe locations and encourages strategic planning.
- Players are shot and killed by a sniper upon entering this area.
- This encounter emphasizes the importance of scouting and caution in hostile environments.
- Players can choose to explore either the residential building ruins or the research facility ruins in this area.
- Players are killed by a blizzard in this area.
- Players discover an Old World fusion reactor here, gaining 100 gold.
- Start by heading to the city, then proceed to the research facility ruins within the city.
- Ensure Docker or Docker Desktop is installed on your system. For detailed installation instructions, visit this link.
- Clone the repository and navigate to the project root directory:

  ```bash
  git clone <repository-url>
  cd <repository-folder>
  ```

- Run the following command to start the application. Note that it may take some time to download the required images:

  ```bash
  docker-compose up
  ```

- Open your browser and visit http://localhost:3000/ to interact with the frontend.
This setup suits users with high-performance machines, real-time requirements, or those who want a leaner, more direct web experience.
- Clone the repository and copy the frontend and backend directories to your preferred machines or cloud servers.
- Install the required dependencies from `requirements.txt`. Ensure CUDA is correctly installed and matches the required version (e.g., 12.4):

  ```bash
  pip install -r requirements.txt
  ```

- Configure `configure.yaml` to include the necessary API keys for securing exposed APIs.
- Expose the necessary ports.
- Start the Uvicorn server with the following command (adjust based on your operating system):

  ```bash
  uvicorn app:app --host 0.0.0.0 --port 5000
  ```
- Install Node.js dependencies using `package-lock.json`:

  ```bash
  npm install
  ```

- Configure `configure.yaml` to set the backend server's IP address and API key.
- Build and start the Node.js server with the following commands:

  ```bash
  npm run build
  npm start
  ```

- Access the application through the Node.js server for the frontend experience.
Error:

```
Error: Failed to fetch
Failed to load resource: net::ERR_EMPTY_RESPONSE
```

Solution:
This indicates that the frontend cannot communicate with the backend. Please verify that the backend is running correctly and that the `public/config.yaml` file in the frontend correctly points to the backend.
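For reference, here is a minimal sketch of how the frontend might load this file at runtime; the `backendUrl`/`apiKey` field names and the `js-yaml` parser are assumptions, not the repo's actual loader:

```ts
import yaml from "js-yaml";

// Field names are assumptions; the repo's actual config keys may differ.
interface FrontendConfig {
  backendUrl: string;
  apiKey: string;
}

// Files in public/ are served from the site root, so public/config.yaml
// is reachable at /config.yaml.
async function loadFrontendConfig(): Promise<FrontendConfig> {
  const res = await fetch("/config.yaml");
  if (!res.ok) throw new Error(`Failed to load config.yaml: ${res.status}`);
  return yaml.load(await res.text()) as FrontendConfig;
}
```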
Error:

```
Access to fetch at 'http://localhost:5000/test-communication' from origin 'http://localhost:3001' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
```

Solution:
This is caused by the browser's CORS policy blocking direct API communication between local servers. Please check that the `FRONTEND_URL` setting in the backend's `config.yaml` correctly points to the frontend.
Solution:
This is not a bug. The delay occurs because the backend needs time to download the model after `docker-compose up` is run. To avoid blocking other processes, the model is downloaded and loaded in the background, so progress is not displayed in real time.
For the current `phi-3.5-mini` model, the weights are approximately 5 GB. You can monitor network traffic and memory usage with a resource manager or similar tool to check the download progress.
Yes. Both the frontend and backend images exceed 2 GB, and the backend image approaches 30 GB once the model is loaded. This is because the Node.js and PyTorch base images bundle extensive tooling for many development tasks, most of which is unnecessary for this specific task. In the future, I plan to slim the images or move the architecture to frameworks like WebLLM to reduce the overall footprint.
Sometimes the model unexpectedly generates unconventional tags. Once a reply in the chat history contains such an anomalous tag, subsequent replies usually repeat it. This is likely caused by insufficiently strict prompt design; however, due to GPU memory limitations, a longer and more detailed prompt is not currently feasible.
Temporary Solution:
Delete all YAML records in the backend `db` directory and restart the game.
Most of the time, this is caused by the model's large parameter count and excessively long prompts. There is currently no efficient fix. Future plans include using quantized models, adopting more efficient frameworks, or switching to smaller models.
Temporary Solutions:
- Adjust the `max_new_tokens` parameter in the backend's `config.yaml`. Note that this might truncate outputs and cause tag loss.
- Modify the `max_history_conversation` parameter in `chatAPIInteraction.ts` within the frontend's `utils` directory, as sketched below. (Future updates will integrate this parameter into the frontend `config.yaml` for easier configuration.)
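For orientation, the effect of `max_history_conversation` is to bound how many past turns are sent with each request. A hedged sketch (the type and function names here are illustrative, not the repo's actual code):

```ts
// Illustrative types and names; actual code in utils/chatAPIInteraction.ts
// may differ.
interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}

const MAX_HISTORY_CONVERSATION = 6; // assumed value; lower it to shorten prompts

// Keep only the most recent turns so the prompt the backend builds stays
// short enough for the model to answer quickly.
function trimHistory(history: ChatTurn[]): ChatTurn[] {
  return history.slice(-MAX_HISTORY_CONVERSATION);
}
```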
Using a Next.js project as the frontend facilitates rapid development of a functional prototype.
The application includes a development mode controlled by the environment variable `GAME_IS_IN_DEV_MODE`. When it is set to `"true"`, a toggle button appears at the top-right corner of the page. This button allows users to enable or disable development mode, which shows or hides logs, context, and other non-essential information.
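A minimal sketch of how such a toggle might be wired, assuming `GAME_IS_IN_DEV_MODE` is exposed to the client at build time (component and prop names are illustrative):

```tsx
import { useState } from "react";

// Assumes GAME_IS_IN_DEV_MODE is made available to the client at build time
// (e.g., via Next.js env configuration); names here are illustrative.
const isDevMode = process.env.GAME_IS_IN_DEV_MODE === "true";

export function DevModeToggle({ onChange }: { onChange: (on: boolean) => void }) {
  const [on, setOn] = useState(false);
  if (!isDevMode) return null; // the toggle only exists in development mode
  return (
    <button
      style={{ position: "fixed", top: 8, right: 8 }}
      onClick={() => {
        setOn(!on);
        onChange(!on); // show or hide logs, context, and other debug info
      }}
    >
      {on ? "Debug: ON" : "Debug: OFF"}
    </button>
  );
}
```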
The frontend continuously monitors the backend's `heartbeat` API. If communication fails or a timeout exceeds 10 seconds, the backend has likely crashed or disconnected. This feature helps users distinguish a backend that is still busy with an intensive task, such as large-model generation, from one that needs intervention after a crash or disconnection.
In debug mode, this component becomes clickable. When clicked, it turns yellow, stops sending requests to the backend, and updates the corresponding global variable. This lets developers pause the frequent API requests and debug frontend-backend issues more effectively.
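A sketch of the polling and pause logic described above, assuming a `GET /heartbeat` endpoint (identifier names are illustrative):

```ts
// Identifier names are illustrative; assumes a GET /heartbeat endpoint.
const HEARTBEAT_TIMEOUT_MS = 10_000;

let heartbeatPaused = false; // flipped by the debug click described above

async function checkHeartbeat(backendUrl: string): Promise<boolean> {
  if (heartbeatPaused) return true; // debug mode: stop polling the backend
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), HEARTBEAT_TIMEOUT_MS);
  try {
    // Abort counts as a failure: no reply within 10 s means the backend
    // may have crashed or disconnected.
    const res = await fetch(`${backendUrl}/heartbeat`, { signal: controller.signal });
    return res.ok;
  } catch {
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```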
- Dynamically adjusts the button's appearance based on the global states `modelLoaded`, `modelLoading`, and `modelGenerating` (see the sketch after this list):
  - Gray: "No Model" indicates no model is loaded; the button is disabled.
  - Green: "Send" indicates the model is loaded; the button is clickable.
  - Yellow: "Generating" indicates the model is generating; the button is disabled.
  - Orange: "Model Loading" indicates the model is loading; the button is disabled.
- On click, calls the `/chat/` API and sends the input box content as the user message.
- Integrates YAML configuration file reading to retrieve the backend URL and API key.
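The color logic above could be expressed as a simple state-to-appearance mapping. A minimal sketch (a plain function for clarity; the actual component may differ):

```ts
// A plain function for clarity; the actual component logic may differ.
interface ModelFlags {
  modelLoaded: boolean;
  modelLoading: boolean;
  modelGenerating: boolean;
}

function sendButtonState({ modelLoaded, modelLoading, modelGenerating }: ModelFlags) {
  if (modelLoading) return { label: "Model Loading", color: "orange", disabled: true };
  if (modelGenerating) return { label: "Generating", color: "yellow", disabled: true };
  if (modelLoaded) return { label: "Send", color: "green", disabled: false };
  return { label: "No Model", color: "gray", disabled: true };
}
```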
- Occupies 80% of the parent container's width, with a height spanning from 3% below the top to 15% above the bottom of the container.
- Calls the `/chat-get-response/` API to fetch generated dialogue content when `modelGenerating` changes from `true` to `false` (sketched below).
- Formats and displays the API's response (supports automatic line breaks, hides scrollbars, and allows vertical scrolling).
- Displays content in black font.
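A hedged sketch of that edge-triggered fetch, using a React hook to detect the `true` to `false` transition (the hook name and the `response` field are assumptions):

```ts
import { useEffect, useRef } from "react";

// Hook name and the response field are assumptions; the API path
// /chat-get-response/ is from the description above.
function useChatResponse(
  modelGenerating: boolean,
  backendUrl: string,
  onText: (text: string) => void,
) {
  const wasGenerating = useRef(false);
  useEffect(() => {
    // Fire only on the true -> false transition: generation just finished.
    if (wasGenerating.current && !modelGenerating) {
      fetch(`${backendUrl}/chat-get-response/`)
        .then((res) => res.json())
        .then((data) => onText(data.response)) // field name assumed
        .catch(console.error);
    }
    wasGenerating.current = modelGenerating;
  }, [modelGenerating, backendUrl, onText]);
}
```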
- A circular button with a diameter of 50px, positioned at the top-left corner of the parent container.
- On click, calls the `/unload_model/` API to unload the model (see the sketch below).
- Updates the global state `modelLoaded` to `false` based on the operation's result.
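A minimal sketch of the click handler; the HTTP method and the state setter name are assumptions:

```ts
// The HTTP method and the setter name are assumptions.
async function unloadModel(
  backendUrl: string,
  setModelLoaded: (loaded: boolean) => void,
) {
  try {
    const res = await fetch(`${backendUrl}/unload_model/`, { method: "POST" });
    if (res.ok) setModelLoaded(false); // model is no longer resident in memory
  } catch (err) {
    console.error("Failed to unload model:", err);
  }
}
```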
The use of Docker, despite the inclusion of large images, is aimed at maximizing accessibility for users with no technical background. Docker enables a seamless "download-and-run" experience, which is ideal for beginners. Advanced users, on the other hand, can explore alternative deployment methods, such as deploying the backend and frontend separately, to better suit their specific needs.
- Specialization: JavaScript is well-suited for frontend UI development, but it lacks robust support for backend tasks, particularly those involving AI frameworks, where Python excels.
- Efficiency: Bundling everything into the frontend would significantly increase its size (especially model parameters), leading to higher memory usage and slower loading times. Splitting the frontend and backend allows workloads to be distributed across two machines, improving performance and user experience.
- Rapid Prototyping: Transformers provides an easy-to-use framework that significantly reduces development time. Among the options, it is currently the most accessible framework known to the author.
- High Compatibility: While ONNX Runtime offers high efficiency and compatibility across various devices, it has a steeper learning curve. llama.cpp, though efficient, requires manual compilation, making it less beginner-friendly, especially for users who prefer a straightforward "download-and-run" approach.
- Updated Libraries: The ONNX Runtime `onnxruntime-genai` library was recently updated, and many tutorials have yet to catch up with the changes, so their examples currently fail to run. Given the lack of time to debug or wait for updated guides, Transformers was chosen to rapidly develop a minimal viable prototype.
Quantization in Transformers is not as straightforward as in the original PyTorch framework. Although efforts are underway to address this, switching to a different technology stack requires additional development time. This task is included in the project's to-do list for future improvements.
- Explore adopting ONNX Runtime for enhanced efficiency.
- Implement necessary model quantization to enable accessibility on a wider range of devices.
- Create a frontend-backend structure capable of communication (Completed)
- Develop a test API in the backend and a sample webpage in the frontend (Completed)
- Import the model into the backend and attempt to use it there (Completed)
- Generate outputs using the model in the backend (Completed)
- Develop additional functions in the backend (saving and reading history, checking status) (Completed)
- Package backend functions and features into APIs (Completed)
- Create initialization buttons and interaction logic in the frontend for communication (Completed)
- Develop a frontend-backend structure capable of single-turn dialogue (Completed)
- Enhance to support multi-turn dialogues with context-awareness (Completed)
- Add game logic and character state storage (Completed)
- Enable game-character-based dialogue by generating context for the model (Completed)
- Add more NPC characters (Not Completed)