Code for the KDD17 paper "Meta-Graph Based Recommendation Fusion over Heterogeneous Information Networks" and its extended journal version "Learning with Heterogeneous Side Information Fusion for Recommender Systems".
Readers are welcome to fork this repository to reproduce the experiments and follow our work. Please kindly cite our papers:
@inproceedings{zhao2017meta,
title={Meta-Graph Based Recommendation Fusion over Heterogeneous Information Networks},
author={Zhao, Huan and Yao, Quanming and Li, Jianda and Song, Yangqiu and Lee, Dik Lun},
booktitle={KDD},
pages={635--644},
year={2017}
}
@TechnicalReport{zhao2018learning,
title={Learning with Heterogeneous Side Information Fusion for Recommender Systems},
author={Zhao, Huan and Yao, Quanming and Song, Yangqiu and Kwok, James and Lee, Dik Lun},
institution = {arXiv preprint arXiv:1801.02411},
year={2018}
}
We have released the related datasets: yelp-200k, amazon-200k, yelp-50k and amazon-50k. If you run into any problems, you can create an issue. Note that the Amazon dataset is provided by Prof. Julian McAuley, so if you use this dataset in your paper, please cite the authors' papers as instructed on the website http://jmcauley.ucsd.edu/data/amazon/
For convenience, quick instructions are given below for readers to reproduce the whole process on the yelp-50k dataset. Note that the programs have been tested on Linux (CentOS release 6.9) with Python 2.7 and NumPy 1.14.0 from Anaconda 4.3.6.
- Unzip the file `FMG_released_data.zip`, and create a directory `data` in this project directory.
- Move yelp-50k and amazon-50k into the `data` directory, then iteratively create the directories `sim_res/path_count` and `mf_features/path_count` in the directory `data/yelp-50k/exp_split/1/`.
- Create a directory `log` in the project by `mkdir log`.
- Create a directory `fm_res` in the project by `mkdir fm_res`.
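For reference, these preparation steps roughly correspond to the following shell commands (a sketch that assumes the released archive unpacks the yelp-50k and amazon-50k folders into the current directory; adjust the paths if your layout differs):

```
unzip FMG_released_data.zip
mkdir data
mv yelp-50k amazon-50k data/
mkdir -p data/yelp-50k/exp_split/1/sim_res/path_count
mkdir -p data/yelp-50k/exp_split/1/mf_features/path_count
mkdir log
mkdir fm_res
```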
To generate the similarity matrices on the yelp-50k dataset, run
python 200k_commu_mat_computation.py yelp-50k all 1
The arguments are explained in the following:
- `yelp-50k`: specifies the dataset.
- `all`: run for all pre-defined meta-graphs.
- `1`: run for split 1 of the dataset, i.e., `exp_split/1`.
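For example, assuming the corresponding directories have also been prepared under data/amazon-50k/exp_split/1/ and that the script accepts the other released datasets in the same way, the analogous command for the amazon-50k dataset would be:
python 200k_commu_mat_computation.py amazon-50k all 1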
One dependency is the bottleneck library; you may install it with `pip install bottleneck`.
To generate the latent features by matrix factorization (MF) based on the similarity matrices, run
python mf_features_generator.py yelp-50k all 1
The arguments are the same as the above ones.
Note that, to improve computation efficiency, some modules are implemented in C and called from Python (see the load_lib method in mf.py). Thus, to successfully run `python mf_features_generator.py`, you need to compile two C source files. The following commands were tested on CentOS, and readers may take them as references.
gcc -fPIC --shared setVal.c -o setVal.so
gcc -fPIC --shared partXY.c -o partXY.so
After compiling, you will get two files, setVal.so and partXY.so, in the project directory.
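As a quick sanity check that the compiled shared libraries can be loaded from Python (a minimal sketch using ctypes; the project's own loading logic lives in the load_lib method of mf.py), you may run from the project directory:
python -c "import ctypes; ctypes.CDLL('./setVal.so'); ctypes.CDLL('./partXY.so'); print('shared libraries loaded')"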
After obtaining the latent features, readers can run the FMG model as follows:
python run_exp.py config/yelp-50k.yaml -reg 0.5
One may read the comments in the files in the config directory for more information.
If you have any questions about this project, please open an issue, so that the answers can also help other people who are interested in this project. I will reply to your issues as soon as possible.