-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathappendix-cypher5a.qmd
345 lines (263 loc) · 14.5 KB
/
appendix-cypher5a.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
---
title: "S: Student Clashes - Deeper Dive"
lightbox:
match: auto
effect: fade
desc-position: bottom
loop: false
---
<br>
This page explore different graph data structures and queries for the purposes of identifying student clashes. It illustrates the inherent flexibility of graph databases and that thorough modelling and profiling of the data can lead to more efficient and effective queries.
Use-case is king when it comes to optimised databases and performant queries.
## Scenario
Each model below will use the same basic scenario:
* Two students - Alice and Bob
* Three activities:
* `ITGD` - Introduction to Graph Databases
* `Neo4j` - Neo4j for Beginners
* `TigerDB` - TigerGraph for Data Scientists
* Each activity has a start and end time
* Each activity is scheduled for several weeks
* There are deliberate clashes between the activities to illustrate the concept of a student clash
## Model 1 - Activity Occurrence
Each 'occurrence' of an activity is a separate node. This model is simple and easy to understand, but proliferates nodes which lead to inefficient and complex queries.
### Create data
```{cypher}{.scroll-cypher}
// Create unique activity nodes (TestActivityModel1)
CREATE (:TestActivityModel1 { actName: "ITGD", date: date("2024-08-06"), startTime: localtime("09:00:00"), endTime: localtime("11:00:00") })
CREATE (:TestActivityModel1 { actName: "ITGD", date: date("2024-08-13"), startTime: localtime("09:00:00"), endTime: localtime("11:00:00") })
CREATE (:TestActivityModel1 { actName: "ITGD", date: date("2024-08-20"), startTime: localtime("09:00:00"), endTime: localtime("11:00:00") })
CREATE (:TestActivityModel1 { actName: "ITGD", date: date("2024-08-27"), startTime: localtime("09:00:00"), endTime: localtime("11:00:00") })
CREATE (:TestActivityModel1 { actName: "ITGD", date: date("2024-09-03"), startTime: localtime("09:00:00"), endTime: localtime("11:00:00") })
CREATE (:TestActivityModel1 { actName: "Neo4j", date: date("2024-07-30"), startTime: localtime("10:00:00"), endTime: localtime("11:00:00") })
CREATE (:TestActivityModel1 { actName: "Neo4j", date: date("2024-08-13"), startTime: localtime("10:00:00"), endTime: localtime("11:00:00") })
CREATE (:TestActivityModel1 { actName: "Neo4j", date: date("2024-08-27"), startTime: localtime("10:00:00"), endTime: localtime("11:00:00") })
CREATE (:TestActivityModel1 { actName: "TigerDb", date: date("2024-08-06"), startTime: localtime("11:00:00"), endTime: localtime("12:00:00") })
CREATE (:TestActivityModel1 { actName: "TigerDb", date: date("2024-08-13"), startTime: localtime("11:00:00"), endTime: localtime("12:00:00") });
// Create unique student nodes (TestStudentModel1)
CREATE (:TestStudentModel1 { stuFirstName_anon: "Alice", stuID_anon: "test-student-1" })
CREATE (:TestStudentModel1 { stuFirstName_anon: "Bob", stuID_anon: "test-student-2" });
// Create ATTENDS relationships (one student attends all TigerDb and Neo4j)
MATCH (s:TestStudentModel1 { stuID_anon: "test-student-1" })
MATCH (a:TestActivityModel1) WHERE a.actName IN ["TigerDb", "Neo4j"]
CREATE (s)-[:ATTENDS]->(a);
MATCH (s:TestStudentModel1 { stuID_anon: "test-student-2" })
MATCH (a:TestActivityModel1) WHERE a.actName IN ["ITGD", "Neo4j"]
MERGE (s)-[:ATTENDS]->(a) ;
```
![Model 1](./images/cypher-clash1-model.png)
To identify the clashes, this query can be run:
```{cypher}{.scroll-cypher}
MATCH (s:TestStudentModel1)-[:ATTENDS]->(a1:TestActivityModel1)
WITH s, a1
MATCH (s)-[:ATTENDS]->(a2:TestActivityModel1)
WHERE a1 <> a2
AND a1.date = a2.date
AND (a1.startTime < a2.endTime AND a1.endTime > a2.startTime) // overlap condition
AND NOT (a1.startTime = a2.endTime OR a1.endTime = a2.startTime) // xxclude "touching" cases
AND a1.actName < a2.actName // ensures only one direction of the pair is returned
RETURN s.stuFirstName_anon AS Student,
a1.date AS ClashDate,
a1.actName AS Activity1,
a1.startTime + "-" + a1.endTime AS Timeslot1,
a2.actName AS Activity2,
a2.startTime + "-" + a2.endTime AS Timeslot2
ORDER BY Student, ClashDate;
```
Which correctly identifies Bob's clash:
![Model 1 results](./images/cypher-clash1-result.png)
One way of measuring and comparing query performance is to look at the `PROFILE` and `dbhits.` In this instance there are 278 database accesses.
![Model 1 profile](./images/cypher-clash1-db-hits.png)
## Model 2 - Date and Time Nodes
Model 2 uses a single node for each activity but has date and time nodes. This model is more complex in that there are more node labels, but can be more efficient for certain queries.
### Create data
```{cypher}{.scroll-cypher}
// Create unique time nodes
CREATE (:TestStartTimeNode { time: localtime("09:00:00") })
CREATE (:TestStartTimeNode { time: localtime("10:00:00") })
CREATE (:TestStartTimeNode { time: localtime("11:00:00") })
CREATE (:TestEndTimeNode { time: localtime("11:00:00") })
CREATE (:TestEndTimeNode { time: localtime("12:00:00") })
// Create unique date nodes
CREATE (:TestDateNode { date: date("2024-07-30") })
CREATE (:TestDateNode { date: date("2024-08-06") })
CREATE (:TestDateNode { date: date("2024-08-13") })
CREATE (:TestDateNode { date: date("2024-08-20") })
CREATE (:TestDateNode { date: date("2024-08-27") })
CREATE (:TestDateNode { date: date("2024-09-03") })
// Create activity nodes
CREATE (:TestActivityModel2 { actName: "ITGD" })
CREATE (:TestActivityModel2 { actName: "Neo4j" })
CREATE (:TestActivityModel2 { actName: "TigerDb" });
// Connect ITGD to dates and times (using MERGE)
MATCH (a:TestActivityModel2 { actName: "ITGD" })
MATCH (d:TestDateNode) WHERE d.date IN [date("2024-08-06"), date("2024-08-13"), date("2024-08-20"), date("2024-08-27"), date("2024-09-03")]
MERGE (a)-[:SCHEDULED_ON]->(d)
WITH a
MATCH (st:TestStartTimeNode { time: localtime("09:00:00") })
MATCH (et:TestEndTimeNode { time: localtime("11:00:00") })
MERGE (a)-[:STARTS_AT]->(st)
MERGE (st)-[:ENDS_AT]->(et);
// Connect Neo4j to dates and times (adjust dates/times and use MERGE)
MATCH (a:TestActivityModel2 { actName: "Neo4j" })
MATCH (d:TestDateNode) WHERE d.date IN [date("2024-07-30"), date("2024-08-13"), date("2024-08-27")]
MERGE (a)-[:SCHEDULED_ON]->(d)
WITH a
MATCH (st:TestStartTimeNode { time: localtime("10:00:00") })
MATCH (et:TestEndTimeNode { time: localtime("11:00:00") })
MERGE (a)-[:STARTS_AT]->(st)
MERGE (st)-[:ENDS_AT]->(et);
// Connect TigerDb to dates and times (adjust dates/times and use MERGE)
MATCH (a:TestActivityModel2 { actName: "TigerDb" })
MATCH (d:TestDateNode) WHERE d.date IN [date("2024-08-06"), date("2024-08-13")]
MERGE (a)-[:SCHEDULED_ON]->(d)
WITH a
MATCH (st:TestStartTimeNode { time: localtime("11:00:00") })
MATCH (et:TestEndTimeNode { time: localtime("12:00:00") })
MERGE (a)-[:STARTS_AT]->(st)
MERGE (st)-[:ENDS_AT]->(et);
// Create Students and ATTENDS relationships (same as Model 1)
CREATE (:TestStudentModel2 { stuFirstName_anon: "Alice", stuID_anon: "test-student-1" })
CREATE (:TestStudentModel2 { stuFirstName_anon: "Bob", stuID_anon: "test-student-2" });
MATCH (s:TestStudentModel2 { stuID_anon: "test-student-1" })
MATCH (a:TestActivityModel2) WHERE a.actName IN ["TigerDb", "Neo4j"]
CREATE (s)-[:ATTENDS]->(a);
MATCH (s:TestStudentModel2 { stuID_anon: "test-student-2" })
MATCH (a:TestActivityModel2) WHERE a.actName IN ["ITGD", "Neo4j"]
MERGE (s)-[:ATTENDS]->(a) ;
```
![Model 2](./images/cypher-clash2-model.png)
To identify student clashes, this query can be run.
```{cypher}{.scroll-cypher}
MATCH (s:TestStudentModel2)-[:ATTENDS]->(a1:TestActivityModel2)-[:SCHEDULED_ON]->(d:TestDateNode)
WITH s, a1, d
MATCH (s)-[:ATTENDS]->(a2:TestActivityModel2)-[:SCHEDULED_ON]->(d) // Same date
MATCH (a1)-[:STARTS_AT]->(st1:TestStartTimeNode)-[:ENDS_AT]->(et1:TestEndTimeNode)
MATCH (a2)-[:STARTS_AT]->(st2:TestStartTimeNode)-[:ENDS_AT]->(et2:TestEndTimeNode)
WHERE a1 <> a2
AND (st1.time < et2.time AND et1.time > st2.time) // overlap condition
AND NOT (st1.time = et2.time OR et1.time = st2.time) // xxclude "touching" cases
AND a1.actName < a2.actName // ensures only one direction of the pair is returned
RETURN s.stuFirstName_anon AS Student,
d.date AS ClashDate,
a1.actName AS Activity1,
st1.time + "-" + et1.time AS Timeslot1,
a2.actName AS Activity2,
st2.time + "-" + et2.time AS Timeslot2
ORDER BY Student, ClashDate;
```
The results are the same as Model 1, but the profile is different, with fewer `database accesses` (143):
![Model 2 profile](./images/cypher-clash2-db-hits.png)
## Model 3 - Date Nodes
Model 3 uses a single node for each activity as well as date nodes - start and end times are properties of the activity.
```{cypher}{.scroll-cypher}
// Create unique date nodes
CREATE (:TestDateNodeModel3 { date: date("2024-07-30") })
CREATE (:TestDateNodeModel3 { date: date("2024-08-06") })
CREATE (:TestDateNodeModel3 { date: date("2024-08-13") })
CREATE (:TestDateNodeModel3 { date: date("2024-08-20") })
CREATE (:TestDateNodeModel3 { date: date("2024-08-27") })
CREATE (:TestDateNodeModel3 { date: date("2024-09-03") })
// Create activity nodes with start/end times as properties
CREATE (:TestActivityModel3 { actName: "ITGD", startTime: localtime("09:00:00"), endTime: localtime("11:00:00") })
CREATE (:TestActivityModel3 { actName: "Neo4j", startTime: localtime("10:00:00"), endTime: localtime("11:00:00") })
CREATE (:TestActivityModel3 { actName: "TigerDb", startTime: localtime("11:00:00"), endTime: localtime("12:00:00") });
// Connect Activities to Dates (using MERGE)
MATCH (a:TestActivityModel3 { actName: "ITGD" })
MATCH (d:TestDateNodeModel3) WHERE d.date IN [date("2024-08-06"), date("2024-08-13"), date("2024-08-20"), date("2024-08-27"), date("2024-09-03")]
MERGE (a)-[:SCHEDULED_ON]->(d);
MATCH (a:TestActivityModel3 { actName: "Neo4j" })
MATCH (d:TestDateNodeModel3) WHERE d.date IN [date("2024-07-30"), date("2024-08-13"), date("2024-08-27")]
MERGE (a)-[:SCHEDULED_ON]->(d);
MATCH (a:TestActivityModel3 { actName: "TigerDb" })
MATCH (d:TestDateNodeModel3) WHERE d.date IN [date("2024-08-06"), date("2024-08-13")]
MERGE (a)-[:SCHEDULED_ON]->(d);
// Create Students and ATTENDS relationships
CREATE (:TestStudentModel3 { stuFirstName_anon: "Alice", stuID_anon: "test-student-1" })
CREATE (:TestStudentModel3 { stuFirstName_anon: "Bob", stuID_anon: "test-student-2" });
MATCH (s:TestStudentModel3 { stuID_anon: "test-student-1" })
MATCH (a:TestActivityModel3) WHERE a.actName IN ["TigerDb", "Neo4j"]
CREATE (s)-[:ATTENDS]->(a) ;
MATCH (s:TestStudentModel3 { stuID_anon: "test-student-2" })
MATCH (a:TestActivityModel3) WHERE a.actName IN ["ITGD", "Neo4j"]
CREATE (s)-[:ATTENDS]->(a) ;
```
![Model 3](./images/cypher-clash3-model.png)
The results are the same as Model 1 and Model 2, but the profile is different again with even fewer `database accesses` (58):
![Model 3 profile](./images/cypher-clash3-db-hits.png)
### Model 4 - Times on Relationships
Model 4 uses a single node for each activity and date - start and end times are now properties of the *relationship* between the activity and the date.
```{cypher}{.scroll-cypher}
// Create unique date nodes
MERGE (:TestDateNodeModel4 { date: date("2024-07-30") })
MERGE (:TestDateNodeModel4 { date: date("2024-08-06") })
MERGE (:TestDateNodeModel4 { date: date("2024-08-13") })
MERGE (:TestDateNodeModel4 { date: date("2024-08-20") })
MERGE (:TestDateNodeModel4 { date: date("2024-08-27") })
MERGE (:TestDateNodeModel4 { date: date("2024-09-03") })
// Create activity nodes
MERGE (:TestActivityModel4 { actName: "ITGD" })
MERGE (:TestActivityModel4 { actName: "Neo4j" })
MERGE (:TestActivityModel4 { actName: "TigerDb" });
// Connect ITGD to Dates with START and END relationships
MATCH (a:TestActivityModel4 { actName: "ITGD" })
MATCH (d:TestDateNodeModel4) WHERE d.date IN [date("2024-08-06"), date("2024-08-13"), date("2024-08-20"), date("2024-08-27"), date("2024-09-03")]
MERGE (a)-[:STARTS { time: localtime("09:00:00") }]->(d)
MERGE (a)-[:ENDS { time: localtime("11:00:00") }]->(d);
// Connect Neo4j to Dates (adjust dates and times)
MATCH (a:TestActivityModel4 { actName: "Neo4j" })
MATCH (d:TestDateNodeModel4) WHERE d.date IN [date("2024-07-30"), date("2024-08-13"), date("2024-08-27")]
MERGE (a)-[:STARTS { time: localtime("10:00:00") }]->(d)
MERGE (a)-[:ENDS { time: localtime("11:00:00") }]->(d);
// Connect TigerDb to Dates (adjust dates and times)
MATCH (a:TestActivityModel4 { actName: "TigerDb" })
MATCH (d:TestDateNodeModel4) WHERE d.date IN [date("2024-08-06"), date("2024-08-13")]
MERGE (a)-[:STARTS { time: localtime("11:00:00") }]->(d)
MERGE (a)-[:ENDS { time: localtime("12:00:00") }]->(d);
// Create Students and ATTENDS relationships
MERGE (:TestStudentModel4 { stuFirstName_anon: "Alice", stuID_anon: "test-student-1" })
MERGE (:TestStudentModel4 { stuFirstName_anon: "Bob", stuID_anon: "test-student-2" });
MATCH (s:TestStudentModel4 { stuID_anon: "test-student-1" })
MATCH (a:TestActivityModel4) WHERE a.actName IN ["TigerDb", "Neo4j"]
MERGE (s)-[:ATTENDS]->(a) ;
MATCH (s:TestStudentModel4 { stuID_anon: "test-student-2" })
MATCH (a:TestActivityModel4) WHERE a.actName IN ["ITGD", "Neo4j"]
MERGE (s)-[:ATTENDS]->(a) ;
```
![Model 4](./images/cypher-clash4-model.png)
The results are again the same as Model 1, Model 2, and Model 3, but the profile is different with the most `database accesses` (293):
![Model 4 profile](./images/cypher-clash4-db-hits.png)
## Conclusion
Each model has its own strengths and weaknesses. The choice of model will depend on the specific requirements. The more complex models can be more efficient for certain queries, but can also be more difficult to understand and maintain. The simpler models are easier to understand and maintain, but can be less efficient for certain queries.
Of the four tested, Model 3 was the most efficient in terms of database hits on the *very* small test dataset used. However, this may not be the case with larger datasets. It is important to profile the queries and the data to determine the best model for the specific requirements.
## Delete Data
The cypher below deletes all test data.
### Model 1
```cypher
// Delete all TestActivityModel1 nodes
MATCH (a:TestActivityModel1)
DETACH DELETE a;
// Delete all TestStudentModel1 nodes
MATCH (s:TestStudentModel1)
DETACH DELETE s;
```
### Model 2
```cypher
// Delete test data for Model 2
MATCH (n)
WHERE n:TestStudentModel2 OR n:TestActivityModel2 OR n:TestDateNode OR n:TestStartTimeNode OR n:TestEndTimeNode
DETACH DELETE n
```
### Model 3
```cypher
// Delete test data for Model 3
MATCH (n)
WHERE n:TestStudentModel3 OR n:TestActivityModel3 OR n:TestDateNodeModel3
DETACH DELETE n
```
### Model 4
```cypher
MATCH (n)
WHERE n:TestStudentModel4 OR n:TestActivityModel4 OR n:TestDateNodeModel4
DETACH DELETE n
```