Llama Adventure is a minimal text-based adventure game with a backend powered by Python/Flask and a frontend developed using Next.js. This project was created as part of the AICG class task 2.
The system recognizes user inputs and responds accordingly through a hidden `<//Key Words//>` output mechanism.
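The exact tag grammar is internal to the game, but conceptually the frontend can parse these hidden tags out of a reply before displaying it. A minimal sketch, assuming tags take the `<//...//>` form shown above (the parsing details are illustrative, not the repo's actual code):

```ts
// Illustrative only: the exact tag grammar in the game may differ.
function parseReply(raw: string): { text: string; keywords: string[] } {
  // Collect every hidden <//...//> tag, then strip them from the visible text.
  const keywords = Array.from(raw.matchAll(/<\/\/(.*?)\/\/>/g), (m) => m[1].trim());
  const text = raw.replace(/<\/\/.*?\/\/>/g, "").trim();
  return { text, keywords };
}

// e.g. parseReply("You head north. <//Move North//>")
// -> { text: "You head north.", keywords: ["Move North"] }
```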
- Each of the four directions has a unique description, providing players with an immersive exploration experience.
- Players encounter a hidden landmine in this area. Triggering it results in instant death, reducing their health to 0.
- This section introduces environmental hazards and encourages players to think critically about their choices and surroundings.
- If players choose to spend the night here, they are ambushed by a gang of bandits and killed.
- This event highlights the dangers of trusting seemingly safe locations and encourages strategic planning.
- Players are shot and killed by a sniper upon entering this area.
- This encounter emphasizes the importance of scouting and caution in hostile environments.
- Players can choose to explore either the residential building ruins or the research facility ruins in this area.
- Players are killed by a blizzard in this area.
- Players discover an Old World fusion reactor here, gaining 100 gold.
- Start by heading to the city, then proceed to the research facility ruins within the city.
- Ensure Docker or Docker Desktop is installed on your system. For detailed installation instructions, visit this link.
- Clone the repository and navigate to the project root directory:

  ```bash
  git clone <repository-url>
  cd <repository-folder>
  ```

- Run the following command to start the application. Note that it may take some time to download the required images:

  ```bash
  docker-compose up
  ```

- Open your browser and visit http://localhost:3000/ to interact with the frontend.
This setup suits users with high-performance machines, real-time requirements, or those who want a leaner, more direct web experience.
- Clone the repository and copy the frontend and backend directories to your preferred machines or cloud servers.
- Install the required dependencies from `requirements.txt`. Ensure CUDA is correctly installed and matches the required version (e.g., 12.4):

  ```bash
  pip install -r requirements.txt
  ```

- Configure `configure.yaml` to include the necessary API keys for securing exposed APIs.
- Expose the necessary ports.
- Start the Uvicorn server with the following command (adjust based on your operating system):

  ```bash
  uvicorn app:app --host 0.0.0.0 --port 5000
  ```
- Install Node.js dependencies using `package-lock.json`:

  ```bash
  npm install
  ```

- Configure `configure.yaml` to set the backend server's IP address and API key.
- Build and start the Node.js server with the following commands:

  ```bash
  npm run build
  npm start
  ```

- Access the application through the Node.js server for the frontend experience.
Error:

```
Error: Failed to fetch
Failed to load resource: net::ERR_EMPTY_RESPONSE
```

Solution:
This indicates that the frontend cannot communicate with the backend. Please verify that the backend is running correctly and that the `public/config.yaml` file in the frontend correctly points to the backend.
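For reference, here is a minimal sketch of how the frontend might load this file at runtime; the `backendUrl`/`apiKey` field names and the `js-yaml` parser are assumptions, not the repo's actual loader:

```ts
import yaml from "js-yaml";

// Field names are assumptions; the repo's actual config keys may differ.
interface FrontendConfig {
  backendUrl: string;
  apiKey: string;
}

// Files in public/ are served from the site root, so public/config.yaml
// is reachable at /config.yaml.
async function loadFrontendConfig(): Promise<FrontendConfig> {
  const res = await fetch("/config.yaml");
  if (!res.ok) throw new Error(`Failed to load config.yaml: ${res.status}`);
  return yaml.load(await res.text()) as FrontendConfig;
}
```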
Error:

```
Access to fetch at 'http://localhost:5000/test-communication' from origin 'http://localhost:3001' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
```

Solution:
This is caused by the browser's CORS policy blocking direct API communication between local servers. Please check that the `FRONTEND_URL` setting in the backend's `config.yaml` correctly points to the frontend.
Solution:
This is not a bug. The delay occurs because the backend needs time to download the model after `docker-compose up` is run. To avoid blocking other processes, the model is downloaded and loaded in the background, so progress is not displayed in real time.
For the current `phi-3.5-mini` model, the weights are approximately 5 GB. You can monitor network traffic and memory usage with a resource manager or similar tool to check the download progress.
Yes. Both the frontend and backend images exceed 2 GB, and the backend image approaches 30 GB once the model is loaded. This is because the Node.js and PyTorch base images bundle extensive tooling for many development tasks, most of which is unnecessary for this specific task. In the future, I plan to slim the images or move the architecture to frameworks like WebLLM to reduce the overall footprint.
Sometimes the model unexpectedly generates unconventional tags. Once a reply in the chat history contains such an anomalous tag, subsequent replies usually repeat it. This is likely caused by insufficiently strict prompt design; however, due to GPU memory limitations, a longer and more detailed prompt is not currently feasible.
Temporary Solution:
Delete all YAML records in the backend `db` directory and restart the game.
Most of the time, this is caused by the model's large parameter count and excessively long prompts. There is currently no efficient fix. Future plans include using quantized models, adopting more efficient frameworks, or switching to smaller models.
Temporary Solutions:
- Adjust the `max_new_tokens` parameter in the backend's `config.yaml`. Note that this might truncate outputs and cause tag loss.
- Modify the `max_history_conversation` parameter in `chatAPIInteraction.ts` within the frontend's `utils` directory, as sketched below. (Future updates will integrate this parameter into the frontend `config.yaml` for easier configuration.)
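For orientation, the effect of `max_history_conversation` is to bound how many past turns are sent with each request. A hedged sketch (the type and function names here are illustrative, not the repo's actual code):

```ts
// Illustrative types and names; actual code in utils/chatAPIInteraction.ts
// may differ.
interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}

const MAX_HISTORY_CONVERSATION = 6; // assumed value; lower it to shorten prompts

// Keep only the most recent turns so the prompt the backend builds stays
// short enough for the model to answer quickly.
function trimHistory(history: ChatTurn[]): ChatTurn[] {
  return history.slice(-MAX_HISTORY_CONVERSATION);
}
```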
Using a Next.js project as the frontend facilitates rapid development of a functional prototype.
The application includes a development mode controlled by the environment variable `GAME_IS_IN_DEV_MODE`. When it is set to `"true"`, a toggle button appears at the top-right corner of the page. This button allows users to enable or disable development mode, which shows or hides logs, context, and other non-essential information.
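A minimal sketch of how such a toggle might be wired, assuming `GAME_IS_IN_DEV_MODE` is exposed to the client at build time (component and prop names are illustrative):

```tsx
import { useState } from "react";

// Assumes GAME_IS_IN_DEV_MODE is made available to the client at build time
// (e.g., via Next.js env configuration); names here are illustrative.
const isDevMode = process.env.GAME_IS_IN_DEV_MODE === "true";

export function DevModeToggle({ onChange }: { onChange: (on: boolean) => void }) {
  const [on, setOn] = useState(false);
  if (!isDevMode) return null; // the toggle only exists in development mode
  return (
    <button
      style={{ position: "fixed", top: 8, right: 8 }}
      onClick={() => {
        setOn(!on);
        onChange(!on); // show or hide logs, context, and other debug info
      }}
    >
      {on ? "Debug: ON" : "Debug: OFF"}
    </button>
  );
}
```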
The frontend continuously monitors the backend's `heartbeat` API. If communication fails or a timeout exceeds 10 seconds, the backend has likely crashed or disconnected. This feature helps users distinguish a backend that is still busy with an intensive task, such as large-model generation, from one that needs intervention after a crash or disconnection.
In debug mode, this component becomes clickable. When clicked, it turns yellow, stops sending requests to the backend, and updates the corresponding global variable. This lets developers pause the frequent API requests and debug frontend-backend issues more effectively.
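A sketch of the polling and pause logic described above, assuming a `GET /heartbeat` endpoint (identifier names are illustrative):

```ts
// Identifier names are illustrative; assumes a GET /heartbeat endpoint.
const HEARTBEAT_TIMEOUT_MS = 10_000;

let heartbeatPaused = false; // flipped by the debug click described above

async function checkHeartbeat(backendUrl: string): Promise<boolean> {
  if (heartbeatPaused) return true; // debug mode: stop polling the backend
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), HEARTBEAT_TIMEOUT_MS);
  try {
    // Abort counts as a failure: no reply within 10 s means the backend
    // may have crashed or disconnected.
    const res = await fetch(`${backendUrl}/heartbeat`, { signal: controller.signal });
    return res.ok;
  } catch {
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```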
- Dynamically adjusts the button's appearance based on the global states `modelLoaded`, `modelLoading`, and `modelGenerating` (see the sketch after this list):
  - Gray: "No Model" indicates no model is loaded; the button is disabled.
  - Green: "Send" indicates the model is loaded; the button is clickable.
  - Yellow: "Generating" indicates the model is generating; the button is disabled.
  - Orange: "Model Loading" indicates the model is loading; the button is disabled.
- On click, calls the `/chat/` API and sends the input box content as the user message.
- Integrates YAML configuration file reading to retrieve the backend URL and API key.
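The color logic above could be expressed as a simple state-to-appearance mapping. A minimal sketch (a plain function for clarity; the actual component may differ):

```ts
// A plain function for clarity; the actual component logic may differ.
interface ModelFlags {
  modelLoaded: boolean;
  modelLoading: boolean;
  modelGenerating: boolean;
}

function sendButtonState({ modelLoaded, modelLoading, modelGenerating }: ModelFlags) {
  if (modelLoading) return { label: "Model Loading", color: "orange", disabled: true };
  if (modelGenerating) return { label: "Generating", color: "yellow", disabled: true };
  if (modelLoaded) return { label: "Send", color: "green", disabled: false };
  return { label: "No Model", color: "gray", disabled: true };
}
```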
- Occupies 80% of the parent container's width, with a height spanning from 3% below the top to 15% above the bottom of the container.
- Calls the `/chat-get-response/` API to fetch generated dialogue content when `modelGenerating` changes from `true` to `false` (sketched below).
- Formats and displays the API's response (supports automatic line breaks, hides scrollbars, and allows vertical scrolling).
- Displays content in black font.
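A hedged sketch of that edge-triggered fetch, using a React hook to detect the `true` to `false` transition (the hook name and the `response` field are assumptions):

```ts
import { useEffect, useRef } from "react";

// Hook name and the response field are assumptions; the API path
// /chat-get-response/ is from the description above.
function useChatResponse(
  modelGenerating: boolean,
  backendUrl: string,
  onText: (text: string) => void,
) {
  const wasGenerating = useRef(false);
  useEffect(() => {
    // Fire only on the true -> false transition: generation just finished.
    if (wasGenerating.current && !modelGenerating) {
      fetch(`${backendUrl}/chat-get-response/`)
        .then((res) => res.json())
        .then((data) => onText(data.response)) // field name assumed
        .catch(console.error);
    }
    wasGenerating.current = modelGenerating;
  }, [modelGenerating, backendUrl, onText]);
}
```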
- A circular button with a diameter of 50px, positioned at the top-left corner of the parent container.
- On click, calls the `/unload_model/` API to unload the model (see the sketch below).
- Updates the global state `modelLoaded` to `false` based on the operation's result.
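A minimal sketch of the click handler; the HTTP method and the state setter name are assumptions:

```ts
// The HTTP method and the setter name are assumptions.
async function unloadModel(
  backendUrl: string,
  setModelLoaded: (loaded: boolean) => void,
) {
  try {
    const res = await fetch(`${backendUrl}/unload_model/`, { method: "POST" });
    if (res.ok) setModelLoaded(false); // model is no longer resident in memory
  } catch (err) {
    console.error("Failed to unload model:", err);
  }
}
```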
The use of Docker, despite the inclusion of large images, is aimed at maximizing accessibility for users with no technical background. Docker enables a seamless "download-and-run" experience, which is ideal for beginners. Advanced users, on the other hand, can explore alternative deployment methods, such as deploying the backend and frontend separately, to better suit their specific needs.
- Specialization: JavaScript is well-suited for frontend UI development, but it lacks robust support for backend tasks, particularly those involving AI frameworks, where Python excels.
- Efficiency: Bundling everything into the frontend would significantly increase its size (especially model parameters), leading to higher memory usage and slower loading times. Splitting the frontend and backend allows workloads to be distributed across two machines, improving performance and user experience.
- Rapid Prototyping: Transformers provides an easy-to-use framework that significantly reduces development time. Among the options, it is currently the most accessible framework known to the author.
- High Compatibility: While ONNX Runtime offers high efficiency and compatibility across various devices, it has a steeper learning curve. llama.cpp, though efficient, requires manual compilation, making it less beginner-friendly, especially for users who prefer a straightforward "download-and-run" approach.
- Updated Libraries: The ONNX Runtime `onnxruntime-genai` library was recently updated, and many tutorials have yet to catch up with the changes, so their examples currently fail to run. Given the lack of time to debug or wait for updated guides, Transformers was chosen to rapidly develop a minimal viable prototype.
Quantization in Transformers is not as straightforward as in the original PyTorch framework. Although efforts are underway to address this, switching to a different technology stack requires additional development time. This task is included in the project's to-do list for future improvements.
- Explore adopting ONNX Runtime for enhanced efficiency.
- Implement necessary model quantization to enable accessibility on a wider range of devices.
- Create a frontend-backend structure capable of communication (Completed)
- Develop a test API in the backend and a sample webpage in the frontend (Completed)
- Import the model into the backend and attempt to use it there (Completed)
- Generate outputs using the model in the backend (Completed)
- Develop additional functions in the backend (saving and reading history, checking status) (Completed)
- Package backend functions and features into APIs (Completed)
- Create initialization buttons and interaction logic in the frontend for communication (Completed)
- Develop a frontend-backend structure capable of single-turn dialogue (Completed)
- Enhance to support multi-turn dialogues with context-awareness (Completed)
- Add game logic and character state storage (Completed)
- Enable game-character-based dialogue by generating context for the model (Completed)
- Add more NPC characters (Not Completed)