Improve the performance.
yjcyxky committed Mar 23, 2024
1 parent 5e6b914 commit d179879
Showing 16 changed files with 322 additions and 86 deletions.
19 changes: 19 additions & 0 deletions docs/publications.md
@@ -0,0 +1,19 @@
### Entity Extraction

- [Ollama](https://ollama.com/): Get up and running with large language models, locally.
- [Vicuna](https://vicuna.lmsys.org/): An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality.
- [Phind-codellama](https://ollama.com/library/phind-codellama): Code generation model based on Code Llama.
- [allenai/OLMo](https://github.com/allenai/OLMo): Modeling, training, eval, and inference code for OLMo.

### Semantic Search and Reranking

- [ParadeDB](https://www.paradedb.com/): ParadeDB is a modern Elasticsearch alternative built on Postgres.

### Similar Projects

- [Consensus](https://consensus.app/search/): Ask a question, get conclusions from research papers.

### PDF Parsing

- [Grobid](https://grobid.readthedocs.io/en/latest/Grobid-service/)
- [pdffigures2](https://github.com/allenai/pdffigures2)
6 changes: 2 additions & 4 deletions src/api/req.rs
@@ -1,5 +1,5 @@
use anyhow;
use poem_openapi::{types::ToJSON, Object};
use poem_openapi::Object;
use reqwest;
use serde::{Deserialize, Serialize};

@@ -37,9 +37,7 @@ pub struct PublicationDetail {
}

impl Publication {
pub async fn fetch_publication(
id: &str,
) -> Result<PublicationDetail, anyhow::Error> {
pub async fn fetch_publication(id: &str) -> Result<PublicationDetail, anyhow::Error> {
let api_token = match std::env::var("GUIDESCOPER_API_TOKEN") {
Ok(token) => token,
Err(_) => {
8 changes: 4 additions & 4 deletions src/model/llm.rs
@@ -272,7 +272,7 @@ impl LlmContext for SubgraphWithCtx {
};

// You need to prepare two fields: 1) subgraph: a JSON string; 2) context_str: a string, which needs to be a disease name, such as "ME/CFS".
prompt_templates.insert("explain_subgraph_mechanism_with_disease_ctx", "Knowledge Subgraph: {{subgraph}}\n\nKnowledge Subgraph Analysis Request:\nI have a set of Subgraph data that includes a collection of genes/proteins and their connections to a specific disease, {{ context_str }}. This Subgraph consists of nodes (representing genes/proteins) and edges (representing interactions or relationships between the genes/proteins). Each node has associated attributes, such as name, description, or known disease associations. Edges may also have attributes, like the type or strength of interaction. My goal is to identify key nodes and paths within this Subgraph that are most relevant to {{ context_str }}. To achieve this, I need your assistance to: 1. Identify and explain the key nodes that are most relevant to {{ context_str }}. Please base your explanation on the nodes and their attributes and their known roles in the disease, explaining which nodes are critical and why these nodes are critical. 2. Determine and describe the main paths connecting these key nodes. Please discuss how these paths might be involved in the onset, progression, or treatment of the disease, considering the type and strength of interactions between nodes. 3. Provide a report summarizing your findings and understanding, including a list of key nodes and paths, along with a rationale for how they are related to {{ context_str }}. Note that, given the complexity and multifactorial nature of diseases, explanations may need to integrate multiple attributes of nodes and their interactions. \n\nGuidance for Response:\n\nPlease address the aforementioned inquiries based on the Knowledge Subgraph and your expertise. For each of the questions related to the Knowledge Subgraph and its implications for {{context_str}}, it is imperative that you provide supporting literature. This literature must exclusively come from PubMed, which is a critical repository for reliable medical research findings. 
Your responses should not only incorporate insights derived from these studies but also include citations formatted according to standard academic practices. Specifically, citations should detail the authors, title, journal name, year of publication, and the PubMed ID (PMID) to facilitate easy verification and further reading.\n\nFor example, a proper citation format would be: Doe J, Smith A, Jones B. Title of the Article. Journal Name. Year;Volume(Issue):Page numbers. PMID: XXXXXXX.");
prompt_templates.insert("explain_subgraph_mechanism_with_disease_ctx", "Knowledge Subgraph: {{subgraph}}\n\nKnowledge Subgraph Analysis Request:\nI have a set of Subgraph data that includes a collection of genes/proteins and their connections to a specific disease, {{context_str}}. This Subgraph consists of nodes (representing genes/proteins) and edges (representing interactions or relationships between the genes/proteins). Each node has associated attributes, such as name, description, or known disease associations. Edges may also have attributes, like the type or strength of interaction. My goal is to identify key nodes and paths within this Subgraph that are most relevant to {{context_str}}. To achieve this, I need your assistance to: 1. Identify and explain the key nodes that are most relevant to {{context_str}}. Please base your explanation on the nodes and their attributes and their known roles in the disease, explaining which nodes are critical and why these nodes are critical. 2. Determine and describe the main paths connecting these key nodes. Please discuss how these paths might be involved in the onset, progression, or treatment of the disease, considering the type and strength of interactions between nodes. 3. Provide a report summarizing your findings and understanding, including a list of key nodes and paths, along with a rationale for how they are related to {{context_str}}. Note that, given the complexity and multifactorial nature of diseases, explanations may need to integrate multiple attributes of nodes and their interactions. \n\nGuidance for Response:\n\nPlease address the aforementioned inquiries based on the Knowledge Subgraph and your expertise. For each of the questions related to the Knowledge Subgraph and its implications for {{context_str}}, it is imperative that you provide supporting literature. This literature must exclusively come from PubMed, which is a critical repository for reliable medical research findings. 
Your responses should not only incorporate insights derived from these studies but also include citations formatted according to standard academic practices. Specifically, citations should detail the authors, title, journal name, year of publication, and the PubMed ID (PMID) to facilitate easy verification and further reading.\n\nFor example, a proper citation format would be: Doe J, Smith A, Jones B. Title of the Article. Journal Name. Year;Volume(Issue):Page numbers. PMID: XXXXXXX.");

let mut m3 = HashMap::new();
m3.insert("key", "explain_subgraph_importance_with_disease_ctx");
@@ -287,10 +287,10 @@ impl LlmContext for SubgraphWithCtx {
};
// JSON version
// You need to prepare two fields: 1) subgraph: a JSON string; 2) context_str: a string, which needs to be a disease name, such as "ME/CFS".
// m.insert("explain_subgraph_importance_with_disease_ctx", "Knowledge Subgraph: {{subgraph}}\n\nKnowledge Subgraph Analysis Request:\nI have a set of Subgraph data that includes a collection of genes/proteins and their connections to a specific disease, {{ context_str }}. This Subgraph consists of nodes (representing genes/proteins) and edges (representing interactions or relationships between the genes/proteins). Each node has associated attributes, such as name, description, or known disease associations. Edges may also have attributes, like the type or strength of interaction. My goal is to identify key nodes and paths within this Subgraph that are most relevant to {{ context_str }}. To achieve this, please label these nodes and paths as critical, important, moderate, or less important based on your knowledges and the subgraph, and provide a rationale for your assessment. After labeling the nodes and paths, please output your findings as an array, the format is like```{your_output}```. The array contains a set of objects, each object have as least three columns: id (node or edge), importance, reason.");
// m.insert("explain_subgraph_importance_with_disease_ctx", "Knowledge Subgraph: {{subgraph}}\n\nKnowledge Subgraph Analysis Request:\nI have a set of Subgraph data that includes a collection of genes/proteins and their connections to a specific disease, {{context_str}}. This Subgraph consists of nodes (representing genes/proteins) and edges (representing interactions or relationships between the genes/proteins). Each node has associated attributes, such as name, description, or known disease associations. Edges may also have attributes, like the type or strength of interaction. My goal is to identify key nodes and paths within this Subgraph that are most relevant to {{context_str}}. To achieve this, please label these nodes and paths as critical, important, moderate, or less important based on your knowledges and the subgraph, and provide a rationale for your assessment. After labeling the nodes and paths, please output your findings as an array, the format is like```{your_output}```. The array contains a set of objects, each object have as least three columns: id (node or edge), importance, reason.");

// Table version
prompt_templates.insert("explain_subgraph_importance_with_disease_ctx", "Knowledge Subgraph: {{subgraph}}\n\nKnowledge Subgraph Analysis Request:\nI have a set of Subgraph data that includes a collection of genes/proteins and their connections to a specific disease, {{ context_str }}. This Subgraph consists of nodes (representing genes/proteins) and edges (representing interactions or relationships between the genes/proteins). Each node has associated attributes, such as name, description, or known disease associations. Edges may also have attributes, like the type or strength of interaction. My goal is to identify key nodes and edges within this Subgraph that are most relevant to {{ context_str }}. To achieve this, please label these nodes listed in the subgraph as Critical, Important, Moderate, or Less Important based on your knowledges on {{ context_str }} and the subgraph, and provide a rationale for your assessment. After labeling the nodes, please output your results as a table (not a file). The table contains a set of rows, each row have as least six columns: #, ID (node id), Name (node name), Importance, Reliability, Reason. Please note: 1. the subgraph might be incomplete, so you need to use your knowledges to think these nodes step by step, and then assess the importance of the nodes; 2. you need to consider the importance of the node types for specific diseases, for example, symptom might be important for a symptom-defined disease, but it might be less important for a genetic disease; 3. you need to consider the reliability of the relation, for example, if the relation is mentioned more frequent in your knowledgebase, then we can treat it more reliable; 4. the final table you output should ordered by importance and reliability, and the importance should be ordered by Critical, Important, Moderate, and Less Important; 5. 
you need to tell me why you think the node is important / less important, reliable / less reliable, and the reason should be based on the subgraph and your knowledges (recommendation).");
prompt_templates.insert("explain_subgraph_importance_with_disease_ctx", "Knowledge Subgraph: {{subgraph}}\n\nKnowledge Subgraph Analysis Request:\nI have a set of Subgraph data that includes a collection of genes/proteins and their connections to a specific disease, {{context_str}}. This Subgraph consists of nodes (representing genes/proteins) and edges (representing interactions or relationships between the genes/proteins). Each node has associated attributes, such as name, description, or known disease associations. Edges may also have attributes, like the type or strength of interaction. My goal is to identify key nodes and edges within this Subgraph that are most relevant to {{context_str}}. To achieve this, please label these nodes listed in the subgraph as Critical, Important, Moderate, or Less Important based on your knowledges on {{context_str}} and the subgraph, and provide a rationale for your assessment. After labeling the nodes, please output your results as a table (not a file). The table contains a set of rows, each row have as least six columns: #, ID (node id), Name (node name), Importance, Reliability, Reason. Please note: 1. the subgraph might be incomplete, so you need to use your knowledges to think these nodes step by step, and then assess the importance of the nodes; 2. you need to consider the importance of the node types for specific diseases, for example, symptom might be important for a symptom-defined disease, but it might be less important for a genetic disease; 3. you need to consider the reliability of the relation, for example, if the relation is mentioned more frequent in your knowledgebase, then we can treat it more reliable; 4. the final table you output should ordered by importance and reliability, and the importance should be ordered by Critical, Important, Moderate, and Less Important; 5. 
you need to tell me why you think the node is important / less important, reliable / less reliable, and the reason should be based on the subgraph and your knowledges (recommendation).");

let mut m4 = HashMap::new();
m4.insert("key", "explain_path_within_subgraph");
@@ -304,7 +304,7 @@ impl LlmContext for SubgraphWithCtx {
prompts.push(m4);
};
// Actually, in this case, the context_str is a path name, such as "ME/CFS-[treated_by]->Ibuprofen-[treats]->Headache".
prompt_templates.insert("explain_path_within_subgraph", "Knowledge Subgraph: {{subgraph}}\n\nMy goal is to explain the path {{ context_str }} within the subgraph. Please provide a detailed explanation of the path.");
prompt_templates.insert("explain_path_within_subgraph", "Knowledge Subgraph: {{subgraph}}\n\nMy goal is to explain the path {{context_str}} within the subgraph. Please provide a detailed explanation of the path.");

let mut m5 = HashMap::new();
m5.insert("key", "explain_path_with_attention_subgraph");
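The edits to this file only remove the spaces inside the `{{ context_str }}` placeholders. That kind of change matters when placeholders are substituted by exact string match. The sketch below illustrates the difference, written in TypeScript rather than Rust; `renderStrict` and `renderTolerant` are hypothetical helpers, not the renderer this project actually uses.

```typescript
// Hypothetical strict renderer: replaces only exact `{{name}}` occurrences.
// Illustration only; this is not the renderer used by src/model/llm.rs.
function renderStrict(template: string, fields: Record<string, string>): string {
  let out = template;
  for (const [name, value] of Object.entries(fields)) {
    // split/join avoids special handling of "$" in replacement strings.
    out = out.split(`{{${name}}}`).join(value);
  }
  return out;
}

// Hypothetical tolerant renderer: also accepts whitespace inside the braces.
function renderTolerant(template: string, fields: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) => {
    const value = fields[name];
    // Leave unknown placeholders untouched.
    return value === undefined ? match : value;
  });
}

const fields: Record<string, string> = { context_str: "ME/CFS" };
const spaced = "most relevant to {{ context_str }}";
const compact = "most relevant to {{context_str}}";

console.log(renderStrict(spaced, fields));
console.log(renderStrict(compact, fields));
console.log(renderTolerant(spaced, fields));
```

With a strict renderer, the spaced form is left unsubstituted, so normalizing every template to `{{context_str}}` is the simpler fix when the renderer cannot be changed.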
8 changes: 7 additions & 1 deletion studio/config/config.embed.ts
@@ -1,17 +1,23 @@
// https://umijs.org/config/
import { defineConfig } from 'umi';

// How to improve the loading performance: https://juejin.cn/post/7207743145998811173
export default defineConfig({
outputPath: '../assets',
publicPath: '/assets/',
history: {
type: 'hash',
},
// https://umijs.org/blog/code-splitting#%E4%BB%A3%E7%A0%81%E6%8B%86%E5%88%86%E6%8C%87%E5%8D%97 (It's similar to dynamicImport in umi 3.x)
codeSplitting: {
jsStrategy: 'granularChunks'
jsStrategy: 'depPerChunk'
},
esbuildMinifyIIFE: true,
favicons: ['/assets/gene.png'],
jsMinifier: 'terser',
jsMinifierOptions: {

},
proxy: undefined,
locale: {
default: 'en-US',
13 changes: 12 additions & 1 deletion studio/config/config.ts
@@ -2,6 +2,7 @@
import { defineConfig } from 'umi';
import path from 'path';
import proxy from './proxy';
import CompressionPlugin from 'compression-webpack-plugin';
import { routes as defaultRoutes } from './routes';

// const isDev = process.env.NODE_ENV === 'development';
@@ -24,7 +25,7 @@ export default defineConfig({
request: {},
npmClient: 'yarn',
dva: {},
chainWebpack: (config: any) => {
chainWebpack: (config: any, { env }) => {
config.merge({
resolve: {
fallback: {
@@ -36,6 +37,16 @@ export default defineConfig({
// https://github.com/webpack/webpack/discussions/13585
config.resolve.alias.set('perf_hooks', path.resolve(__dirname, 'perf_hooks.ts'));
// console.log("config.resolve.alias", config.resolve.alias);

if (env === 'production') {
config.plugin('compression-webpack-plugin').use(
new CompressionPlugin({
test: /\.(js|html|css)$/,
threshold: 10240,
deleteOriginalAssets: false,
}),
);
}
},
layout: {
// https://umijs.org/docs/max/layout-menu
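The `test` option of `compression-webpack-plugin` is a regular expression matched against asset file names. In a JavaScript regex, an unescaped `.` matches any character, so a pattern meant to select file extensions should escape it. The following sketch compares an unescaped pattern with an escaped one; the asset names are made up for illustration.

```typescript
// An unescaped "." matches any single character, so this pattern also accepts
// names that merely end in "js", "html", or "css" preceded by any character.
const loose = /.js$|.html$|.css$/;

// Escaping the dot and grouping the alternatives matches only real extensions.
const strict = /\.(js|html|css)$/;

// Made-up asset names for illustration.
const assets = ["umi.js", "index.html", "umi.css", "data.cjs", "notes.xhtml", "logo.png"];

const looseMatches = assets.filter((name) => loose.test(name));
const strictMatches = assets.filter((name) => strict.test(name));

console.log(looseMatches);
console.log(strictMatches);
```

Here the loose pattern additionally picks up `data.cjs` and `notes.xhtml`, while the strict pattern selects only the `.js`, `.html`, and `.css` files.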
1 change: 1 addition & 0 deletions studio/config/routes.ts
@@ -72,6 +72,7 @@ export const routes = [
redirect: '/knowledge-table',
},
{
path: '/*',
component: './404',
},
];
3 changes: 3 additions & 0 deletions studio/package.json
@@ -8,6 +8,7 @@
"dev": "max dev",
"build": "max build",
"build:embed": "cross-env UMI_APP_IS_STATIC=true UMI_ENV=embed UMI_APP_AUTH0_CLIENT_ID=Y08FauV1dAEiocNIZt5LiOifzNgXr6Uo UMI_APP_AUTH0_DOMAIN=biomedgps.jp.auth0.com max build",
"build:analyze": "cross-env ANALYZE=true max build",
"format": "prettier --cache --write .",
"postinstall": "max setup",
"setup": "max setup",
@@ -27,6 +28,7 @@
"@fingerprintjs/fingerprintjs": "^3.3.6",
"@handsontable/react": "^12.1.3",
"@mlc-ai/web-llm": "0.2.15",
"@sentry/react": "^7.108.0",
"@textea/json-viewer": "^2.9.0",
"@umijs/max": "^4.1.2",
"@umijs/route-utils": "^2.0.0",
@@ -83,6 +85,7 @@
"@types/react": "^18.0.0",
"@types/react-dom": "^18.0.0",
"@umijs/plugins": "^4.1.2",
"compression-webpack-plugin": "^11.1.0",
"cross-env": "^7.0.3",
"husky": "^8.0.3",
"lint-staged": "^13.2.0",
26 changes: 25 additions & 1 deletion studio/src/app.tsx
@@ -1,10 +1,31 @@
import Footer from '@/components/Footer';
import Header from '@/components/Header';
import ErrorBoundary from '@/components/ErrorBoundary';
import { RequestConfig, history, RuntimeConfig } from 'umi';
import { PageLoading, SettingDrawer } from '@ant-design/pro-components';
import { Auth0Provider } from '@auth0/auth0-react';
import { CustomSettings, AppVersion } from '../config/defaultSettings';
import { getJwtAccessToken } from '@/components/util';
import * as Sentry from "@sentry/react";

// Configure Sentry for error tracking
Sentry.init({
dsn: "https://[email protected]/4506958288846848",
integrations: [
Sentry.browserTracingIntegration(),
Sentry.replayIntegration({
maskAllText: false,
blockAllMedia: false,
}),
],
// Performance Monitoring
tracesSampleRate: 1.0, // Capture 100% of the transactions
// Set 'tracePropagationTargets' to control for which URLs distributed tracing should be enabled
tracePropagationTargets: ["localhost", /^https:\/\/drugs\.3steps\.cn\/api/],
// Session Replay
replaysSessionSampleRate: 0.1, // This sets the sample rate at 10%. You may want to change it to 100% while in development and then sample at a lower rate in production.
replaysOnErrorSampleRate: 1.0, // If you're not already sampling the entire session, change the sample rate to 100% when sampling sessions where errors occur.
});

// Runtime configuration
// @ts-ignore
@@ -48,6 +69,7 @@ export const request: RequestConfig = {
baseURL: apiPrefix,
errorConfig: {
errorHandler: (resData) => {
console.log("errorHandler: ", resData);
return {
...resData,
success: false,
@@ -189,7 +211,9 @@ export const layout: RuntimeConfig = (initialState: any) => {
} else {
return (
<>
{children}
<ErrorBoundary>
{children}
</ErrorBoundary>
</>
);
}
4 changes: 3 additions & 1 deletion studio/src/components/ChatBox/index.tsx
@@ -1,7 +1,7 @@
import { ReactChatPlugin } from 'biominer-components';
import { filter, set } from 'lodash';
import * as webllm from "@mlc-ai/web-llm";
import { initChat } from '@/components/util';
import { initChat } from '@/components/webllm';
import { useEffect, useState } from 'react';
import { message as AntdMessage } from 'antd';
import rehypeRaw from 'rehype-raw';
@@ -51,7 +51,9 @@ const ChatBoxWrapper: React.FC<ChatBoxProps> = (props) => {

useEffect(() => {
const initChatBox = async () => {
// @ts-ignore
if (window.chat) {
// @ts-ignore
setChat(window.chat);
} else {
const chat = await initChat();
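The hunk above reuses a chat instance stored on `window` and silences the type checker with `// @ts-ignore`. One possible alternative, sketched below under stated assumptions, is to cache the instance on a typed host object so that no suppression is needed. `ChatLike`, `ChatHost`, `getChat`, and `fakeInit` are hypothetical names, and this variant caches the promise rather than the resolved chat, which differs from the committed code.

```typescript
// Hypothetical minimal shape of a chat instance; the real web-llm type differs.
interface ChatLike {
  name: string;
}

// Typed host that stands in for `window`; in a browser this could instead be
// expressed by augmenting the global Window interface.
interface ChatHost {
  chatPromise?: Promise<ChatLike>;
}

// Returns the cached promise if present, otherwise starts initialization once.
// Caching the promise means concurrent callers share a single initialization.
function getChat(host: ChatHost, init: () => Promise<ChatLike>): Promise<ChatLike> {
  if (!host.chatPromise) {
    host.chatPromise = init();
  }
  return host.chatPromise;
}

// Stand-in for the project's initChat(); counts how often it runs.
let initCalls = 0;
const fakeInit = (): Promise<ChatLike> => {
  initCalls += 1;
  return Promise.resolve({ name: "demo" });
};

const host: ChatHost = {};
const first = getChat(host, fakeInit);
const second = getChat(host, fakeInit);

console.log(first === second, initCalls);
```

Both calls return the same promise and the initializer runs once, which is the behavior the `window.chat` check aims for.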