
Commit

update latex and remove archbee tags
AruneshSingh committed Apr 9, 2024
1 parent 026fe26 commit 1599892
Showing 2 changed files with 10 additions and 24 deletions.
23 changes: 9 additions & 14 deletions docs/articles/node_representation_learning.md
@@ -78,22 +78,19 @@ Let's look at Node2Vec first.

As opposed to BoW vectors, node embeddings are vector representations that capture the structural role and properties of nodes in a network. Node2Vec is an algorithm that learns node representations using the Skip-Gram method; it models the conditional probability of encountering a context node given a source node in node sequences (random walks):

-<!--
-$P(\text{context}|\text{source}) = \frac{1}{Z}\exp(w_{c}^Tw_s)$
--->
-
-<img src="../assets/use_cases/node_representation_learning/context_proba_v3.png" alt="Node2Vec conditional probability" data-size="50" />
-
-Here, *w_c* and *w_s* are the embeddings of the context node *c* and source node *s* respectively. The variable *Z* serves as a normalization constant, which, for computational efficiency, is never explicitly computed.
+$P(\text{context}|\text{source}) = \frac{1}{Z}\exp(w_{c}^Tw_s)$
+
+Here, $w_c$ and $w_s$ are the embeddings of the context node $c$ and source node $s$ respectively. The variable $Z$ serves as a normalization constant, which, for computational efficiency, is never explicitly computed.

The embeddings are learned by maximizing the co-occurrence probability for (source, context) pairs drawn from the true data distribution (positive pairs), and at the same time minimizing it for pairs drawn from a synthetic noise distribution. This process ensures that the embedding vectors of similar nodes are close in the embedding space, while dissimilar nodes are further apart (with respect to the dot product).
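As a concrete sketch of this objective, the snippet below scores one true (source, context) pair against sampled noise pairs using the dot-product form above. All names, sizes, and node IDs here are illustrative, not the article's actual training code:

```python
import torch
import torch.nn.functional as F

# Toy embedding table: 5 nodes, 16-dimensional vectors (sizes are made up).
num_nodes, dim = 5, 16
emb = torch.nn.Embedding(num_nodes, dim)

def skipgram_loss(source, context, negatives):
    """Maximize sigmoid(w_c . w_s) for true pairs; minimize it for noise pairs."""
    w_s = emb(source)      # (batch, dim)
    w_c = emb(context)     # (batch, dim)
    w_n = emb(negatives)   # (batch, k, dim)
    pos = F.logsigmoid((w_s * w_c).sum(-1))                             # real pairs
    neg = F.logsigmoid(-(w_n @ w_s.unsqueeze(-1)).squeeze(-1)).sum(-1)  # noise pairs
    return -(pos + neg).mean()

# One positive pair (0, 1) against two sampled negatives (2 and 3).
loss = skipgram_loss(torch.tensor([0]), torch.tensor([1]),
                     torch.tensor([[2, 3]]))
```

Minimizing this loss pulls true (source, context) embeddings together while pushing noise pairs apart, which avoids ever computing the normalization constant $Z$.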

-The random walks are sampled according to a policy, which is guided by 2 parameters: return *i*, and in-out *q*.
+The random walks are sampled according to a policy, which is guided by 2 parameters: return $p$, and in-out $q$.

-- The return parameter *p* affects the likelihood of immediately returning to the previous node. A higher *p* leads to more locally focused walks.
-- The in-out parameter *q* affects the likelihood of visiting nodes in the same or a different neighborhood. A higher *q* encourages Depth First Search (DFS), while a lower *q* promotes Breadth-First-Search-like (BFS) exploration.
+- The return parameter $p$ affects the likelihood of immediately returning to the previous node: the walk backtracks with weight $1/p$, so a lower $p$ leads to more locally focused walks.
+- The in-out parameter $q$ affects the likelihood of visiting nodes in the same or a different neighborhood. A higher $q$ encourages Breadth-First-Search-like (BFS) exploration, while a lower $q$ promotes Depth-First-Search-like (DFS) exploration.

-These parameters are particularly useful for accommodating different networks and tasks. Adjusting the values of *p* and *q* captures different characteristics of the graph in the sampled walks. BFS-like exploration is useful for learning local patterns. On the other hand, using DFS-like sampling is useful for capturing patterns on a bigger scale, like structural roles.
+These parameters are particularly useful for accommodating different networks and tasks. Adjusting the values of $p$ and $q$ captures different characteristics of the graph in the sampled walks. BFS-like exploration is useful for learning local patterns. On the other hand, using DFS-like sampling is useful for capturing patterns on a bigger scale, like structural roles.
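To make the sampling policy concrete, here is a toy sketch of a single biased-walk step in plain Python. The graph and parameter values are invented; the transition weights follow the $1/p$, $1$, $1/q$ scheme from the Node2Vec paper:

```python
import random

# Tiny illustrative graph as an adjacency list (undirected).
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}

def step(prev, curr, p, q):
    """Pick the next node: weight 1/p for returning to `prev`, 1 for nodes at
    distance 1 from `prev` (local), and 1/q for nodes at distance 2 (outward)."""
    weights = []
    for nxt in graph[curr]:
        if nxt == prev:
            weights.append(1.0 / p)   # immediately return to the previous node
        elif nxt in graph[prev]:
            weights.append(1.0)       # shared neighbor: stay in the neighborhood
        else:
            weights.append(1.0 / q)   # move further out into the graph
    return random.choices(graph[curr], weights=weights)[0]

random.seed(0)
walk = [0, 1]
for _ in range(5):
    walk.append(step(walk[-2], walk[-1], p=1.0, q=0.5))
```

With $q < 1$, outward moves get larger weight, so sampled walks drift away from the start (DFS-like); $q > 1$ would keep them local (BFS-like).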

### Node2Vec embedding process

@@ -172,7 +169,7 @@ A straightforward approach for combining vectors from different sources is to **

From the plot above, it's clear that the scales of the embedding vector lengths differ. To avoid the larger-magnitude Node2Vec vectors overshadowing the BoW vectors, we can divide each embedding vector by the average length of vectors from the same source.

-But we can _further_ optimize performance by introducing a **weighting factor** (α). The combined representations are constructed as `x = torch.cat((alpha * v_n2v, v_bow), dim=1)`. To determine the appropriate value for α, we employ a 1D grid search approach. Our results are displayed in the following plot.
+But we can _further_ optimize performance by introducing a **weighting factor** ($\alpha$). The combined representations are constructed as `x = torch.cat((alpha * v_n2v, v_bow), dim=1)`. To determine the appropriate value for $\alpha$, we employ a 1D grid search approach. Our results are displayed in the following plot.

![Grid search for alpha](../assets/use_cases/node_representation_learning/grid_search_alpha_bow.png)
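The normalization and weighted concatenation just described can be sketched as follows. Everything here is illustrative: random tensors stand in for the real embeddings, and `evaluate` is a dummy placeholder for the downstream scoring that the article's grid search actually uses:

```python
import torch

torch.manual_seed(0)
# Stand-ins for the real embeddings (all sizes are made up).
v_n2v = torch.randn(100, 128)   # Node2Vec embeddings
v_bow = torch.randn(100, 500)   # BoW vectors

# Rescale each embedding type by its average vector length so neither dominates.
v_n2v = v_n2v / v_n2v.norm(dim=1).mean()
v_bow = v_bow / v_bow.norm(dim=1).mean()

def evaluate(x):
    # Placeholder scorer: in the article this would be downstream
    # classification performance, not this dummy statistic.
    return float(-(x.norm(dim=1).mean() - 1.0).abs())

# 1D grid search over the weighting factor alpha.
best_alpha, best_score = None, float("-inf")
for alpha in torch.linspace(0.1, 2.0, 20):
    x = torch.cat((alpha * v_n2v, v_bow), dim=1)   # weighted concatenation
    score = evaluate(x)
    if score > best_score:
        best_alpha, best_score = float(alpha), score
```

Because $\alpha$ only rescales one half of the concatenated vector, the grid search directly trades off how much the classifier relies on graph structure versus text features.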

@@ -208,13 +205,11 @@ The GraphSAGE layer is defined as follows:
$$h_i^{(k)} = \sigma(W^{(k)} (h_i^{(k-1)} + \underset{j \in \mathcal{N}(i)}{\Sigma}h_j^{(k-1)}))$$


-<img align="left" src=assets/use_cases/node_representation_learning/sage_layer_eqn_v3.png alt="GraphSAGE layer defintion" data-size="50" />
-
-Here σ is a nonlinear activation function, *W^k* is a learnable parameter of layer *k*, and *N(i)* is the set of nodes neighboring node *i*. As in traditional Neural Networks, we can stack multiple GNN layers. The resulting multi-layer GNN will have a wider receptive field. That is, it will be able to consider information from bigger distances, thanks to recursive neighborhood aggregation.
+Here $\sigma$ is a nonlinear activation function, $W^{(k)}$ is a learnable parameter of layer $k$, and $\mathcal{N}(i)$ is the set of nodes neighboring node $i$. As in traditional Neural Networks, we can stack multiple GNN layers. The resulting multi-layer GNN will have a wider receptive field. That is, it will be able to consider information from bigger distances, thanks to recursive neighborhood aggregation.
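The layer equation above can be sketched in PyTorch roughly as follows. This is a minimal dense-adjacency sketch under made-up names and shapes; production implementations (e.g. PyTorch Geometric's `SAGEConv`) use sparse message passing and differ in detail:

```python
import torch

class SAGELayer(torch.nn.Module):
    """Sketch of one layer: h_i' = sigma(W (h_i + sum over neighbors of h_j))."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = torch.nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj):
        # adj is a dense {0,1} adjacency matrix; adj @ h sums each node's
        # neighbor embeddings, which we add to the node's own embedding.
        return torch.relu(self.W(h + adj @ h))

# Toy graph: node 0 connected to nodes 1 and 2.
adj = torch.tensor([[0., 1., 1.],
                    [1., 0., 0.],
                    [1., 0., 0.]])
h = torch.randn(3, 8)          # 8-dimensional input features
out = SAGELayer(8, 16)(h, adj)  # new 16-dimensional node embeddings
```

Stacking two such layers would let node 1's output embedding depend on node 2's input features, even though they are not directly connected, via the shared neighbor 0.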

To **learn the model parameters**, the [GraphSAGE authors](https://proceedings.neurips.cc/paper_files/paper/2017/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf) suggest two approaches:
1. In a _supervised_ setting, we can train the network in the same way we train a conventional NN for a supervised task (for example, using Cross Entropy for classification or Mean Squared Error for regression).
-2. If we have access only to the graph itself, we can approach model training as an _unsupervised_ task, where the goal is to predict the presence of edges in the graph based on node embeddings. In this case, the link probabilities are defined as *P*[*j ∈ N*(*i*)] = *σ*(dot(*h_i, h_j*)). The loss function is the Negative Log Likelihood of the presence of the edge and *p*.
+2. If we have access only to the graph itself, we can approach model training as an _unsupervised_ task, where the goal is to predict the presence of edges in the graph based on node embeddings. In this case, the link probabilities are defined as $P(j \in \mathcal{N}(i)) \approx \sigma(h_i^Th_j)$. The loss function is the negative log-likelihood of the observed edges (and sampled non-edges) under $P$.

It's also possible to combine the two approaches by using a linear combination of the two loss functions.
But in this example we stick with the unsupervised variant.
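A minimal sketch of that unsupervised edge-prediction loss, with illustrative tensor names, a toy graph, and hand-picked negative samples:

```python
import torch
import torch.nn.functional as F

def edge_loss(h, pos_edges, neg_edges):
    """NLL of edges under P(j is a neighbor of i) = sigmoid(h_i . h_j)."""
    def scores(edges):
        i, j = edges                    # edges: (2, num_edges) index tensor
        return (h[i] * h[j]).sum(-1)    # dot products h_i . h_j
    pos = F.logsigmoid(scores(pos_edges))    # edges that exist in the graph
    neg = F.logsigmoid(-scores(neg_edges))   # sampled non-edges
    return -(pos.mean() + neg.mean())

h = torch.randn(4, 8, requires_grad=True)  # embeddings produced by the GNN
pos = torch.tensor([[0, 1], [1, 2]]).T     # real edges (0-1) and (1-2)
neg = torch.tensor([[0, 3], [2, 3]]).T     # sampled negative pairs
loss = edge_loss(h, pos, neg)
```

In a full training loop, `h` would come from the stacked GraphSAGE layers and the negatives would be resampled each step; backpropagating this loss shapes the layer weights so that connected nodes end up with large dot products.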
@@ -350,7 +345,7 @@ As a final note, we've included a **pro vs con comparison** of our two node repr
| --- | --- | --- |
| Generalizing to new nodes | no | yes |
| Inference time | constant | we can control inference time |
-| Accommodating different graph types and objectives | by setting *p* and *q* parameters, we can adapt representations to fit our needs | limited control |
+| Accommodating different graph types and objectives | by setting $p$ and $q$ parameters, we can adapt representations to fit our needs | limited control |
| Combining with other representations | concatenation | by design, the model learns to map node features to embeddings |
| Dependency on additional representations | relies solely on graph data | depends on quality and availability of node representations; impacts model performance if lacking |
| Embedding flexibility | very flexible node representations | representations of nodes with similar neighborhoods can't have much variation |
11 changes: 1 addition & 10 deletions docs/manifesto.md
@@ -10,33 +10,24 @@ Getting stuff to production takes a lot more than launching “cool demos.” It

* Prioritize low latency & cost as baseline infrastructure prerequisites. We want speed and affordability. It is _not_ acceptable to be waiting *seconds* for queries.

-:::hint{type="info"}
-[Data Sources](building_blocks/data_sources/readme.md), [Vector Search & Management](building_blocks/vector_search/readme.md) have in-depth reviews of vendors and models.
-:::

## Composition

Full-stack LLM application builder tools are like a black box: it's hard to figure out what happens under the hood, and impossible to control it properly. As a result, we believe that building your stack from atomized components is far superior. It's transparent, and you can configure it to meet your needs.

-:::hint{type="info"}
-[Building Blocks](building_blocks/readme.md) is where we put together and revise literature around creating vector stacks.
-:::

## Anti-hype

We don’t want to make content *just* about LLMs or how to “build a chatgpt for your data.” Vector retrieval is much broader, and includes far more use cases, like recommender systems, fraud detection, computer vision, and beyond.

-:::hint{type="info"}
-[Use cases](use_cases/readme.md) is a dedicated space for the myriad ways in which vector retrieval is used.
-:::

## Together we're better

We're here to learn and support each other, as we develop this space together. This is a safe space and there are **no** stupid questions. Submit your feedback using the feedback button at the bottom of each page, or email [email protected] with the subject line "VectorHub feedback." The more we ask, test, and experiment, the better we become. Let's do this!

## Spread the word!

See something you like? Or hear something interesting? Tell people and share with the hashtag #vectorhub.

