
Most popular items #33

Open
michaltrmac opened this issue Dec 14, 2016 · 3 comments
@michaltrmac

Hi,

How can I set up seldon-server to recommend the most popular items?
I tried to use the "cluster-by-dimension" model, but it always fails on the model train action with java.lang.ClassNotFoundException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException

Thanks
mt.

@ukclivecox
Contributor

Hi,

Can you tell us the command you are using to train the model?

@michaltrmac
Author

Hi,

I edited part of the Seldon conf file to look like this:

        "cluster-by-dimension": {
            "config": {
                "inputPath": "%SELDON_MODELS%",
                "outputPath": "%SELDON_MODELS%",
                "activate":true,
                "startDay" : 1,
                "days" : 1,
                "jdbc" : "jdbc:mysql://mysql:3306/client?user=root&characterEncoding=utf8",
                "minActionsPerUser" : 0,
                "delta" : 0.1,
                "minClusterSize" : 200
            },
            "training": {
                "job_info": {
                    "cmd": "%SPARK_HOME%/bin/spark-submit",
                    "cmd_args": [
                        "--class",
                        "io.seldon.spark.cluster.ClusterUsersByDimension",
                        "--master",
                        "spark://spark-master:7077",
                        "--driver-memory",
                        "8g",
                        "--executor-memory",
                        "8g",
                        "--total-executor-cores",
                        "12",
                        "%SELDON_SPARK_HOME%/seldon-spark-%SELDON_VERSION%-jar-with-dependencies.jar",
                        "--client",
                        "%CLIENT_NAME%",
                        "--zookeeper",
                        "%ZK_HOSTS%"
                    ]
                },
                "job_type": "spark"
            }
        },

Then I run:

seldon-cli client --action processactions --client-name test2 --input-date-string 20161214

and after that:

seldon-cli model --action add --client-name test2 --model-name cluster-by-dimension --startDay 17149 --days 30

with output:

connecting to zookeeper-1:2181,zookeeper-2:2181 [SUCCEEDED]
Model [cluster-by-dimension] already added
adding config  startDay : 17149
adding config  days : 30
Writing data to file[/seldon-data/conf/zkroot/all_clients/test2/offline/cluster-by-dimension/_data_]
updated zk node[/all_clients/test2/offline/cluster-by-dimension]
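(For anyone puzzled by the --startDay value: it appears to be the number of days since the Unix epoch, 1970-01-01, which lines up with the 20161214 input date above. A quick sanity check in Python, assuming that interpretation:)

```python
from datetime import date

def start_day(d: date) -> int:
    # Days elapsed since the Unix epoch (1970-01-01).
    return (d - date(1970, 1, 1)).days

# 2016-12-14 (the --input-date-string above) gives 17149,
# the same value passed to --startDay.
print(start_day(date(2016, 12, 14)))  # 17149
```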

and finally I run:

seldon-cli model --action train --client-name test2 --model-name cluster-by-dimension

Part of the train output:

log4j:WARN No appenders could be found for logger (org.apache.curator.retry.ExponentialBackoffRetry).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Confguration from zookeeper -> {"activate":true,"days":30,"delta":0.1,"inputPath":"/seldon-data/seldon-models","jdbc":"jdbc:mysql://mysql:3306/client?user=root&password=mypass&characterEncoding=utf8","minActionsPerUser":0,"minClusterSize":200,"outputPath":"/seldon-data$
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/12/15 09:54:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/15 09:54:45 INFO Slf4jLogger: Slf4jLogger started
16/12/15 09:54:45 INFO Remoting: Starting remoting
16/12/15 09:54:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:37775]
16/12/15 09:54:46 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
ClusterConfig(test2,/seldon-data/seldon-models,/seldon-data/seldon-models,17149,30,,,false,zookeeper-1:2181,zookeeper-2:2181,true,jdbc:mysql://mysql:3306/client?user=root&password=mypass&characterEncoding=utf8,0,0.1,200)
16/12/15 09:54:50 WARN ThrowableSerializationWrapper: Task exception could not be deserialized
java.lang.ClassNotFoundException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:278)
        at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1779)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
        at org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:167)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1907)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1806)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2016)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1940)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1806)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2016)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1940)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1806)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:108)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:105)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
16/12/15 09:54:50 ERROR TaskResultGetter: Could not deserialize TaskEndReason: ClassNotFound with classloader org.apache.spark.util.MutableURLClassLoader@51bc1897
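(The ClassNotFoundException suggests the MySQL Connector/J driver jar is not on the Spark executors' classpath. One possible fix, sketched here but not verified against this Seldon version, is to ship the driver with the job via spark-submit's --jars option by extending cmd_args in the training config. The jar path below is hypothetical and must match where the connector actually lives in your containers:)

```json
"cmd_args": [
    "--class",
    "io.seldon.spark.cluster.ClusterUsersByDimension",
    "--master",
    "spark://spark-master:7077",
    "--jars",
    "/opt/jars/mysql-connector-java-5.1.40.jar",
    ...
]
```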

Maybe I'm missing some setting, but the Seldon docs are pretty confusing and there is not much about the "most popular" recommender.

I also found this file: https://github.com/SeldonIO/seldon-server/blob/d1ec05a6f59b152eca438f1e67e2dc73a3879483/offline-jobs/spark/src/main/scala/io/seldon/spark/recommend/MostPopularJob.scala
but I don't know how to use it.
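(One untested sketch of how that job might be wired in: a training block mirroring the cluster-by-dimension one above, swapping in the MostPopularJob class from that file. The model key name "mostpopular" and the argument list are assumptions, not taken from the Seldon docs:)

```json
"mostpopular": {
    "training": {
        "job_info": {
            "cmd": "%SPARK_HOME%/bin/spark-submit",
            "cmd_args": [
                "--class",
                "io.seldon.spark.recommend.MostPopularJob",
                "--master",
                "spark://spark-master:7077",
                "%SELDON_SPARK_HOME%/seldon-spark-%SELDON_VERSION%-jar-with-dependencies.jar",
                "--client",
                "%CLIENT_NAME%",
                "--zookeeper",
                "%ZK_HOSTS%"
            ]
        },
        "job_type": "spark"
    }
},
```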

Thanks
m.

@seldondev
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
