Prompt Tuning returns low-quality results #103
Attached: train.json |
Hi, I think I found the problem: I used the flash attention flag during training, so in the inference code we would need the extra flag. Thanks! |
It is probably worth noting this in the inference documentation, so that consumers of this model are aware of this particular flash attention flag, as it requires changes to downstream inference pipeline code. |
@weidotwisc On second thought, I'm a little puzzled. FA2 is a drop-in replacement for the attention layer; it only speeds up the compute and should not affect the result. As such, regardless of whether you trained with FA2, it should not matter whether FA2 is set during inference. |
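The claim that FA2 is a drop-in replacement can be checked numerically: FlashAttention's tiled, online-softmax computation produces exactly the same result as naive attention up to floating-point rounding. A small NumPy sketch of the core idea (illustrative only, not code from this repo):

```python
import numpy as np

def naive_attention(q, k, v):
    """Standard softmax attention: softmax(q @ k.T / sqrt(d)) @ v."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def tiled_attention(q, k, v, block=16):
    """Online-softmax attention over key/value tiles, as FlashAttention
    does, without materializing the full score matrix."""
    d = q.shape[-1]
    out = np.zeros_like(q)
    m = np.full(q.shape[0], -np.inf)   # running row max
    l = np.zeros(q.shape[0])           # running softmax denominator
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1))
        p = np.exp(s - m_new[:, None])
        scale = np.exp(m - m_new)      # rescale previous partial results
        l = l * scale + p.sum(axis=-1)
        out = out * scale[:, None] + p @ vb
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4))
k = rng.standard_normal((32, 4))
v = rng.standard_normal((32, 4))
print(np.allclose(naive_attention(q, k, v), tiled_attention(q, k, v)))  # True
```

So any quality difference between FA2 and non-FA2 runs should come from how the model is loaded or prompted, not from the attention math itself.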
@fabianlim If we use SFT trainer code and specify |
@weidotwisc could you also share your test set and evaluation script if you have them? |
EDIT: I now see the train.json in the above comment; I missed it the first time around. Trying to reproduce this issue using the platform testing environment. This process is being documented here. @weidotwisc, I don't have access to the data set you used to tune, |
Thanks @olson-ibm !
instead of any reasonable yaml-like output such as
|
@olson-ibm thank you! Some of the things we need to try:
|
@Ssukriti could we please assign this to someone if it is being worked on by the platform team? Or should we be tracking this elsewhere? |
@weidotwisc Neither I nor @olson-ibm was able to reproduce your low-quality inference results, although we are running inference differently than you. You can see Joe's configuration and inference run in the issue. Here are the configurations I used: {
"model_name_or_path": "granite-20b-code-all-yaml-2k-v1.1",
"tokenizer_name_or_path": "granite-20b-code-all-yaml-2k-v1.1",
"training_data_path": "train.json",
"num_train_epochs": 5.0,
"per_device_train_batch_size": 2,
"gradient_accumulation_steps": 1,
"learning_rate": 0.03,
"response_template": "\n### Response:",
"dataset_text_field": "output",
"num_virtual_tokens": 100,
"prompt_tuning_init": "RANDOM",
"peft_method": "pt",
"logging_steps": 1,
"include_tokens_per_second": true,
"max_seq_length": 512
} This matches the configurations you used above. I prompt-tuned on 1 GPU with flash-attention using the accelerate_launch.py script, which calls sft_trainer.py. Here is the loss for the tuning:
I then ran inference using the run_inference.py script, with changes I have in a PR to run with or without flash-attention. Because the sequence length is so low, I got 1368 warning messages. An inference I ran with flash-attention: {
"formatted input": "### Input: \n- name: Ensure pip\n community.general.easy_install:\n name: pip\n state: present\n\n- name: Ensure docker for python\n ansible.builtin.pip:\n name: docker\n version: 2.0.0\n\n- name: Fetch registry certificate\n ansible.builtin.fetch:\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\n dest: /tmp/docker_registry.crt\n flat: true\n run_once: true\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\n\n- name: Create registry certificate directory\n ansible.builtin.file:\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\n state: directory\n mode: 493\n\n- name: Copy registry certificate file\n ansible.builtin.copy:\n src: /tmp/docker_registry.crt\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\n\n- name: Enable service docker, restart to pickup registry certificate\n ansible.builtin.systemd:\n name: docker\n enabled: true\n state: restarted\n\n- name: Login to the registry\n\n### Response:",
"predicted target": [
"community.docker.docker_login:\n registry: \"{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\"\n username: \"{{ registry_username }}\"\n password: \"{{ registry_password }}\"\n email: \"{{ registry_email }}\"\n reauthorize: true\n\n- name: Pull the image\n community.docker.docker_image:\n name: \"{{ image_name }}\"\n tag: \"{{ image_tag }}\"\n source: pull"
],
"ref target": [
"community.docker.docker_login:\n password: \"{{ registry_pass }}\"\n username: \"{{ registry_user }}\"\n registry_url: \"{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\"\n reauthorize: true"
]
}, Same without flash-attention enabled at inference time: {
"formatted input": "### Input: \n- name: Ensure pip\n community.general.easy_install:\n name: pip\n state: present\n\n- name: Ensure docker for python\n ansible.builtin.pip:\n name: docker\n version: 2.0.0\n\n- name: Fetch registry certificate\n ansible.builtin.fetch:\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\n dest: /tmp/docker_registry.crt\n flat: true\n run_once: true\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\n\n- name: Create registry certificate directory\n ansible.builtin.file:\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\n state: directory\n mode: 493\n\n- name: Copy registry certificate file\n ansible.builtin.copy:\n src: /tmp/docker_registry.crt\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\n\n- name: Enable service docker, restart to pickup registry certificate\n ansible.builtin.systemd:\n name: docker\n enabled: true\n state: restarted\n\n- name: Login to the registry\n\n### Response:",
"predicted target": [
"community.docker.docker_login:\n registry: \"{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\"\n username: \"{{ registry_username }}\"\n password: \"{{ registry_password }}\"\n email: \"{{ registry_email }}\"\n reauthorize: true\n\n- name: Pull the image\n community.docker.docker_image:\n name: \"{{ image_name }}\"\n tag: \"{{ image_tag }}\"\n source: pull"
],
"ref target": [
"community.docker.docker_login:\n password: \"{{ registry_pass }}\"\n username: \"{{ registry_user }}\"\n registry_url: \"{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\"\n reauthorize: true"
]
} As you can see, the outputs are the same and both are formatted as correct Ansible. Here is another one where the results are slightly different but both still look good: {
"formatted input": "### Input: \n- name: Create {{ docker_compose_dir }} directory\n ansible.builtin.file:\n path: \"{{ docker_compose_dir }}\"\n state: directory\n\n- name: Create Redis socket directory\n ansible.builtin.file:\n path: \"{{ docker_compose_dir }}/redis_socket\"\n state: directory\n mode: 511\n\n- name: Create Docker Compose Configuration\n ansible.builtin.template:\n src: \"{{ item.file }}.j2\"\n dest: \"{{ docker_compose_dir }}/{{ item.file }}\"\n mode: \"{{ item.mode }}\"\n loop:\n - file: environment.sh\n mode: \"0600\"\n - file: credentials.py\n mode: \"0600\"\n - file: docker-compose.yml\n mode: \"0600\"\n - file: nginx.conf\n mode: \"0600\"\n - file: redis.conf\n mode: \"0664\"\n register: awx_compose_config\n\n- name: Render SECRET_KEY file\n ansible.builtin.copy:\n content: \"{{ secret_key }}\"\n dest: \"{{ docker_compose_dir }}/SECRET_KEY\"\n mode: 384\n register: awx_secret_key\n\n- name: Remove AWX containers before migrating postgres so that the old postgres container does not get used\n community.docker.docker_compose:\n project_src: \"{{ docker_compose_dir }}\"\n state: absent\n ignore_errors: true\n\n- name: Start the containers\n\n### Response:",
"predicted target": [
"community.docker.docker_compose:\n project_src: \"{{ docker_compose_dir }}\"\n state: present\n\n- name: Wait for AWX to start\n ansible.builtin.wait_for:\n port: 8052\n delay: 10\n\n- name: Migrate the database\n ansible.builtin.shell: docker exec -i awx_task awx-manage migrate --noinput\n register: migrate_result\n until: migrate"
],
"ref target": [
"community.docker.docker_compose:\n project_src: \"{{ docker_compose_dir }}\"\n restarted: \"{{ awx_compose_config is changed or awx_secret_key is changed }}\"\n register: awx_compose_start"
]
} Without flash-attn: {
"formatted input": "### Input: \n- name: Create {{ docker_compose_dir }} directory\n ansible.builtin.file:\n path: \"{{ docker_compose_dir }}\"\n state: directory\n\n- name: Create Redis socket directory\n ansible.builtin.file:\n path: \"{{ docker_compose_dir }}/redis_socket\"\n state: directory\n mode: 511\n\n- name: Create Docker Compose Configuration\n ansible.builtin.template:\n src: \"{{ item.file }}.j2\"\n dest: \"{{ docker_compose_dir }}/{{ item.file }}\"\n mode: \"{{ item.mode }}\"\n loop:\n - file: environment.sh\n mode: \"0600\"\n - file: credentials.py\n mode: \"0600\"\n - file: docker-compose.yml\n mode: \"0600\"\n - file: nginx.conf\n mode: \"0600\"\n - file: redis.conf\n mode: \"0664\"\n register: awx_compose_config\n\n- name: Render SECRET_KEY file\n ansible.builtin.copy:\n content: \"{{ secret_key }}\"\n dest: \"{{ docker_compose_dir }}/SECRET_KEY\"\n mode: 384\n register: awx_secret_key\n\n- name: Remove AWX containers before migrating postgres so that the old postgres container does not get used\n community.docker.docker_compose:\n project_src: \"{{ docker_compose_dir }}\"\n state: absent\n ignore_errors: true\n\n- name: Start the containers\n\n### Response:",
"predicted target": [
"community.docker.docker_compose:\n project_src: \"{{ docker_compose_dir }}\"\n state: present\n\n- name: Wait for AWX to start\n ansible.builtin.wait_for:\n host: \"{{ awx_host }}\"\n port: 8052\n delay: 10\n\n- name: Migrate the database\n ansible.builtin.shell: docker exec -i awx_task awx-manage migrate --noinput"
],
"ref target": [
"community.docker.docker_compose:\n project_src: \"{{ docker_compose_dir }}\"\n restarted: \"{{ awx_compose_config is changed or awx_secret_key is changed }}\"\n register: awx_compose_start"
]
} |
@anhuong @olson-ibm Thanks Anh and Joe! To reproduce what I have observed, I am attaching two files here. To run the script, first change two lines in the ansible_pt_inference_shared.py script: line 6 to point to your checkpoint (SFT flash_attn flag enabled) path, and line 7 to point to your train_tiny.json. Then run "python ansible_pt_inference_shared.py"; in my case, I see
Thanks! Wei P.S.: for some reason I cannot attach a .py file, so I renamed ansible_pt_inference_shared.py to ansible_pt_inference_shared.txt; you would need to rename it back. |
Also wanted to note some testing I did with different max sequence lengths:
|
Thanks, Anh! But the problem I was reporting, in my opinion, has less to do with training. When I run inference (see the ansible_pt_inference_shared.py I shared previously) without the attn flag against a flash-attn model, it returns gibberish results. Could you try running the inference script I attached against one of your flash-attention-enabled trained models and see if you hit the same problem? Thanks! |
The official inference stack for the platform is TGIS. We could not reproduce any such problem with TGIS, hence there is no bug for the platform to debug. As an example, we show local inferencing using HF APIs in our tuning library, which is the official way from HF to infer models; we did not see any issue with that either. So @weidotwisc, I would suggest you try the inference script in our repo and debug your script accordingly. The issue reported was that quality is poor; we have demonstrated quality is not poor using the platform stack. |
@anhuong You mentioned that "I then ran inference using the run_inference.py script with changes I have in a PR to run with or without flash-attention." --> What is the right process to run it, and where is the documentation? Thanks! Wei |
@weidotwisc Are you able to access this issue that walks through running the model on TGIS? This is what I ran for inference: # given file of input data
$ python scripts/run_inference.py --model <path-to-model> --text_file <path-to-text-file> --max_new_tokens 100
# single text input
$ python scripts/run_inference.py --model <path-to-model> --text "### Input: - name: Ensure pip\n community.general.easy_install:\n name: pip\n state: present\n\n- name: Ensure docker for python\n ansible.builtin.pip:\n name: docker\n version: 2.0.0\n\n- name: Fetch registry certificate\n ansible.builtin.fetch:\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\n dest: /tmp/docker_registry.crt\n flat: true\n run_once: true\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\n\n- name: Create registry certificate directory\n ansible.builtin.file:\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\n state: directory\n mode: 493\n\n- name: Copy registry certificate file\n ansible.builtin.copy:\n src: /tmp/docker_registry.crt\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\n\n- name: Enable service docker, restart to pickup registry certificate\n ansible.builtin.systemd:\n name: docker\n enabled: true\n state: restarted\n\n- name: Login to the registry ### Response:" --max_new_tokens 100 I also ran the script with the training dataset you have attached and was not able to get the same garbled output you mentioned. Running against my prompt tuned model, this is the output I got: [root@sft-trainer-test-anh-eval app]# python ansible_pt_inference_shared.py
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 9/9 [2:32:08<00:00, 1014.23s/it]
- name: Ensure pip
community.general.easy_install:
name: pip
state: present
- name: Ensure docker for python
ansible.builtin.pip:
name: docker
version: 2.0.0
- name: Fetch registry certificate
ansible.builtin.fetch:
src: "{{ docker_storage_dir }}/certs/docker.crt"
dest: /tmp/docker_registry.crt
flat: true
run_once: true
delegate_to: "{{ groups['docker_registry'][0] }}"
- name: Create registry certificate directory
ansible.builtin.file:
path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}
state: directory
mode: 493
- name: Copy registry certificate file
ansible.builtin.copy:
src: /tmp/docker_registry.crt
dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt
- name: Enable service docker, restart to pickup registry certificate
ansible.builtin.systemd:
name: docker
enabled: true
state: restarted
- name: Login to the registry
input_len: 284
/usr/local/lib/python3.11/site-packages/peft/peft_model.py:1232: UserWarning: Position ids are not supported for parameter efficient tuning. Ignoring position ids.
warnings.warn("Position ids are not supported for parameter efficient tuning. Ignoring position ids.")
len_outputs: 540
output_str:
community.docker.docker_login:
registry: "{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}"
username: "{{ registry_username }}"
password: "{{ registry_password }}"
email: "{{ registry_email }}"
reauthorize: true
- name: Install packages
ansible.builtin.apt:
name: "{{ item }}"
state: present
with_items:
- git
- vim
- curl
- htop
- tree
- tmux
- python-software-properties
- software-properties-common
- build-essential
- libssl-dev
- libffi-dev
- python-dev
- python-pip
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python- You can see the input string printed out as well as the output string, which although has |
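The degenerate tail above (the endlessly repeated `- python-dev`) is a classic repetition collapse. For an evaluation pipeline, a quick heuristic to flag such outputs might look like this (a hypothetical helper, not part of fms-hf-tuning):

```python
def repetition_ratio(text: str, n: int = 4) -> float:
    """Fraction of word n-grams that are duplicates; values near 1.0
    indicate the generation has collapsed into a repeating loop."""
    words = text.split()
    if len(words) < n + 1:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return 1.0 - len(set(ngrams)) / len(ngrams)

degenerate = "- python-dev\n" * 20
healthy = "community.docker.docker_login:\n registry: example\n username: user"
print(repetition_ratio(degenerate) > 0.9)  # True
print(repetition_ratio(healthy) < 0.1)     # True
```

A check like this makes it easy to count, across a test set, how many generations degrade the way the output above does.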
@anhuong Thanks a lot for the help! I wonder when you ran Thanks! |
Yes I ran against the same code model enabling flash_attn during tuning time and then did not use flash_attn via your script during inference time. |
@anhuong I used the model_path=/dccstor/weiz/bikinie/peft/pt/sft_debugging/fms-hf-tuning/pt_ckpts/checkpoint-5730
python scripts/run_inference.py --model $model_path --text "- name: Ensure pip\n community.general.easy_install:\n name: pip\n state: present\n\n- name: Ensure docker for python\n ansible.builtin.pip:\n name: docker\n version: 2.0.0\n\n- name: Fetch registry certificate\n ansible.builtin.fetch:\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\n dest: /tmp/docker_registry.crt\n flat: true\n run_once: true\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\n\n- name: Create registry certificate directory\n ansible.builtin.file:\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\n state: directory\n mode: 493\n\n- name: Copy registry certificate file\n ansible.builtin.copy:\n src: /tmp/docker_registry.crt\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\n\n- name: Enable service docker, restart to pickup registry certificate\n ansible.builtin.systemd:\n name: docker\n enabled: true\n state: restarted\n\n- name: Login to the registry" --max_new_tokens 100 Then I get the inference_result.json that looks like this: [
{
"input": "- name: Ensure pip\\n community.general.easy_install:\\n name: pip\\n state: present\\n\\n- name: Ensure docker for python\\n ansible.builtin.pip:\\n name: docker\\n version: 2.0.0\\n\\n- name: Fetch registry certificate\\n ansible.builtin.fetch:\\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\\n dest: /tmp/docker_registry.crt\\n flat: true\\n run_once: true\\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\\n\\n- name: Create registry certificate directory\\n ansible.builtin.file:\\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\\n state: directory\\n mode: 493\\n\\n- name: Copy registry certificate file\\n ansible.builtin.copy:\\n src: /tmp/docker_registry.crt\\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\\n\\n- name: Enable service docker, restart to pickup registry certificate\\n ansible.builtin.systemd:\\n name: docker\\n enabled: true\\n state: restarted\\n\\n- name: Login to the registry",
"output": "- name: Ensure pip\\n community.general.easy_install:\\n name: pip\\n state: present\\n\\n- name: Ensure docker for python\\n ansible.builtin.pip:\\n name: docker\\n version: 2.0.0\\n\\n- name: Fetch registry certificate\\n ansible.builtin.fetch:\\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\\n dest: /tmp/docker_registry.crt\\n flat: true\\n run_once: true\\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\\n\\n- name: Create registry certificate directory\\n ansible.builtin.file:\\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\\n state: directory\\n mode: 493\\n\\n- name: Copy registry certificate file\\n ansible.builtin.copy:\\n src: /tmp/docker_registry.crt\\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\\n\\n- name: Enable service docker, restart to pickup registry certificate\\n ansible.builtin.systemd:\\n name: docker\\n enabled: true\\n state: restarted\\n\\n- name: Login to the registry. -... -.2.2. -.:.:.:.:.:.:.............,.,.,...,..,.,.,.,.,.,.,....,.,..,::::::::::::::::.:.:........."
}
] If you scroll down to the end of the inference result, you can see my flash-attn-enabled PT-tuned model generates gibberish results. I have two questions: Thanks! Wei |
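One detail worth noting in the result above: the recorded input contains literal `\n` backslash sequences rather than real newlines, so the prompt the model sees may not match the training formatting. If that is a contributing factor (an assumption, not confirmed in this thread), normalizing shell-passed escapes before tokenization would look roughly like:

```python
def unescape_prompt(text: str) -> str:
    """Turn literal backslash escapes, as typed on a shell command line,
    into real control characters so the prompt matches training data."""
    return text.replace("\\n", "\n").replace("\\t", "\t")

raw = "- name: Ensure pip\\n  community.general.easy_install:\\n    name: pip"
print(unescape_prompt(raw))
```

Whether run_inference.py already does this normalization would need to be verified against the script itself.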
The inference script that uses flash attention is on a branch here: https://github.com/anhuong/fms-hf-tuning/blob/flash-attn-inference/scripts/run_inference.py and requires adding a flag. I also updated my above comment, as the input text used for inference should include the alpaca formatting with
I'm unsure; this seems to be the most relevant change, but if the previous use_flash_attention_2 flag is not fully deprecated yet, the functionality should be the same. |
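Since tuning used `response_template: "\n### Response:"` with inputs prefixed by `### Input:`, the inference prompt should carry the same wrapper. A minimal sketch of that formatting, with the exact template inferred from the examples earlier in this thread:

```python
def format_prompt(task_text: str) -> str:
    """Wrap a raw task in the same alpaca-style template used at tuning
    time, ending with the response template so generation starts there."""
    return f"### Input: \n{task_text}\n\n### Response:"

prompt = format_prompt("- name: Login to the registry")
print(prompt.endswith("\n### Response:"))  # True
```

Prompts missing this wrapper put the model off the distribution it was tuned on, which alone can degrade output quality.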
@anhuong Thanks! I was asking "What was your PT training launching script that enabled flash attention?" --> the training script, not the inference script. Thanks! Wei |
I used the sft_trainer.py script in this repo, but triggered it with the build scripts. |
@anhuong Thanks! This is the script that I used to launch the PT training job: export MODEL_PATH=/dccstor/ai4code-ansible/shared/ckpt/granite-20b-code-all-yaml-2k/
export DATA_PATH=/dccstor/weiz/peft/train.json
export OUTPUT_PATH=pt_ckpts
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=.
python tuning/sft_trainer.py \
--model_name_or_path $MODEL_PATH \
--training_data_path $DATA_PATH \
--output_dir $OUTPUT_PATH \
--peft_method "pt" \
--tokenizer_name_or_path $MODEL_PATH \
--num_train_epochs 5 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "epoch" \
--learning_rate 3e-2 \
--weight_decay 0. \
--warmup_ratio 0.0 \
--lr_scheduler_type "linear" \
--logging_steps 1 \
--include_tokens_per_second \
--packing False \
--response_template "\n### Response:" \
--dataset_text_field "output" \
--use_flash_attn True \
--torch_dtype "bfloat16" \
--num_virtual_tokens 100 \
--max_seq_length 512 \
--prompt_tuning_init RANDOM as mentioned at the top of this issue thread. I followed the README at https://github.com/foundation-model-stack/fms-hf-tuning#prompt-tuning- and modified it accordingly (in particular to have flash attn enabled via --use_flash_attn True). I wonder how I should run your training script in a similar fashion? Thanks! |
As noted above, I ran with the same configurations you set. This required creating the JSON config and setting an env var. By default when running this script, use_flash_attn is set to true. You can see the full set of params set:
|
To reproduce the bug, this is what I did: (1) Check out the latest fms-hf-tuning at commit c2f2f8c (Apr 29, 2024). (2) Run the training launching script, as documented in https://github.com/foundation-model-stack/fms-hf-tuning#prompt-tuning-, with the following modification (in particular, export MODEL_PATH=/dccstor/ai4code-ansible/shared/ckpt/granite-20b-code-all-yaml-2k/
export DATA_PATH=/dccstor/weiz/irene/fms-hf-tuning/examples/prompt_tuning_ans/containers_infra-ent-train-fms.json
export OUTPUT_PATH=pt_ckpts
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=.
python tuning/sft_trainer.py \
--model_name_or_path $MODEL_PATH \
--training_data_path $DATA_PATH \
--output_dir $OUTPUT_PATH \
--peft_method "pt" \
--tokenizer_name_or_path $MODEL_PATH \
--num_train_epochs 5 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "epoch" \
--learning_rate 3e-2 \
--weight_decay 0. \
--warmup_ratio 0.0 \
--lr_scheduler_type "linear" \
--logging_steps 1 \
--include_tokens_per_second \
--packing False \
--response_template "\n### Response:" \
--dataset_text_field "output" \
--use_flash_attn True \
--torch_dtype "bfloat16" \
--num_virtual_tokens 100 \
--max_seq_length 512 \
--prompt_tuning_init RANDOM (3) Run the single text example inference script as you proposed, pointing to the checkpoint from step (2) model_path=/dccstor/weiz/bikinie/peft/pt/sft_debugging/fms-hf-tuning/pt_ckpts/checkpoint-5730
python scripts/run_inference.py --model $model_path --text "- name: Ensure pip\n community.general.easy_install:\n name: pip\n state: present\n\n- name: Ensure docker for python\n ansible.builtin.pip:\n name: docker\n version: 2.0.0\n\n- name: Fetch registry certificate\n ansible.builtin.fetch:\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\n dest: /tmp/docker_registry.crt\n flat: true\n run_once: true\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\n\n- name: Create registry certificate directory\n ansible.builtin.file:\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\n state: directory\n mode: 493\n\n- name: Copy registry certificate file\n ansible.builtin.copy:\n src: /tmp/docker_registry.crt\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\n\n- name: Enable service docker, restart to pickup registry certificate\n ansible.builtin.systemd:\n name: docker\n enabled: true\n state: restarted\n\n- name: Login to the registry" --max_new_tokens 100 The inference output (aka after the input repeating part) of the model, stored in inference_result.json (attached below) looks like this:
Thanks! Wei |
@weidotwisc, instead of me tuning on an old commit, you should tune the model using the latest main branch of the repo. In addition, I noted to try inference with the |
(1) It is the latest commit that I tested against -- please note the commit hash! (2) I have updated my inference script to be: python scripts/run_inference.py --model $model_path --text "### Input: - name: Ensure pip\n community.general.easy_install:\n name: pip\n state: present\n\n- name: Ensure docker for python\n ansible.builtin.pip:\n name: docker\n version: 2.0.0\n\n- name: Fetch registry certificate\n ansible.builtin.fetch:\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\n dest: /tmp/docker_registry.crt\n flat: true\n run_once: true\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\n\n- name: Create registry certificate directory\n ansible.builtin.file:\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\n state: directory\n mode: 493\n\n- name: Copy registry certificate file\n ansible.builtin.copy:\n src: /tmp/docker_registry.crt\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\n\n- name: Enable service docker, restart to pickup registry certificate\n ansible.builtin.systemd:\n name: docker\n enabled: true\n state: restarted\n\n- name: Login to the registry ### Response:" --max_new_tokens 100 I still get garbage towards the end; see the attached json file. Thanks! |
Sorry for missing that it is the latest commit hash; thanks for tuning on the latest commit. Sorry for being pedantic, but could you try with the text Also, it looks like we tuned the same way, so unless the base models we are using are different, I'm not sure why the inference results differ. Also, if you are trying to use flash attention with the inference script, make sure you are using my fork. You should not need to pass in True Finally, since we are running on different devices with potentially different dependencies, this could also be causing problems. Can you please let me know what versions of these dependencies you have: transformers, accelerate, peft, trl (all available via pip show), and finally cuda (can get with
|
@anhuong Thanks! (1) Attached is the inference_result.json that uses "\n\n" before "### Response:". It is still gibberish towards the end. (2) I am not sure I understand your comments about (i) your fork and (ii) passing the --use_flash_attn flag when using the inference script. When I was using the inference script, (i) I used scripts/run_inference.py in the (3) My environment is the following: peft: 0.8.2
transformers: 4.37.2
trl: 0.7.10
accelerate: 0.27.0
cuda: nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0 Thanks! Wei |
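To make environment comparisons like the one above quicker, a small stdlib-only snippet can report the installed versions in one shot (illustrative; it gracefully handles packages that are not installed):

```python
from importlib.metadata import version, PackageNotFoundError

def report_versions(packages):
    """Return {package: version or 'not installed'} for quick env diffs."""
    out = {}
    for pkg in packages:
        try:
            out[pkg] = version(pkg)
        except PackageNotFoundError:
            out[pkg] = "not installed"
    return out

print(report_versions(["transformers", "accelerate", "peft", "trl"]))
```

Running this on both machines and diffing the output makes version skew between environments immediately visible.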
Thanks Wei, my change has been merged, so you can now use the latest main branch and add the flag I do wonder if the different dependency versions are making a difference in the tuning and inference, since you are on slightly older dependency versions but a newer cuda version than me, as we are not able to reproduce your results. |
Thanks, Anh! Yeah, now it works :) The attached is the I see your solution is also to ask the user to pass a flag indicating that this is a flash-attention-enabled trained model; the inference code would then load the model by including the option Now, two things are on my mind: (2) If (1) is indeed the case, then consumers of platform-SFT-trained models need to be aware of this, unless you have control over how all users are going to use the model. For example, in our group's (AI4Ansible) evaluation pipeline, we didn't have this Thanks! Wei |
Describe the bug
Prompt Tuning model generates low-quality output
Platform
Please provide details about the environment you are using, including the following:
Sample Code
My launching script is
To reproduce, one needs to set MODEL_PATH to the yaml-2k path (shared via a COS bucket) and DATA_PATH to the training data
Expected behavior
Using the first training data item as the input to the model should show something close to the ground truth, i.e.,
Observed behavior
When running through inference, I got
Additional context
The SFT training loss looked completely normal, though.
I was able to adapt the HF PT tutorial at https://huggingface.co/docs/peft/main/en/task_guides/clm-prompt-tuning to run our task to get reasonable results.