Genie gives rise to a large number of GenieSession files via GenieSessionFileSession with enormous footprint >300GB #702
Comments
I just realised that perhaps this issue should have been filed under Stipple.jl repository. |
@zygmuntszpak the sessions are used to store the state of the model and set it to its latest value after a page refresh for example. So it's not necessary, it's more of a UX feature. A 20 MB serialized session file is pretty large though, so for performance it would be better not to use the session anymore. But the weird part is that it should definitely not write multiple files - there is one session file per user (per browser session). Can you explain a bit how the requests are being made? |
This arises out of a Stipple app that I have made. I will try to create a minimal working example ASAP for you, but the basic structure is something like this:
module DataStream
# Handle to the background task, plus the global flag used to stop it
const update_plots_task::Ref{Task} = Ref(Task(nothing))
const loop_retrieve_data = Threads.Atomic{Bool}(false)
@genietools
@app begin
    @private running = false
    @in livestream_checked = false
    @private ext_heat_in_temp::DataFrame = filter(sensor -> sensor.point == "EXT_HEAT_IN_TEMP", list_of_sensor_messages) |> DataFrame
    @onchange isready begin
        @show "App is loaded"
    end
    @onchange livestream_checked begin
        if livestream_checked
            # I pass the __model__ variable here because I want to move all the code for updating the plots outside
            # of the @app block. Otherwise, I end up with a lot of business logic in between all of these handlers
            # and it makes it all harder to read. The __model__ variable is implicitly created by the macros as far as I see.
            # Perhaps what I am doing here is not permitted, and is the cause of all the additional sessions?
            spawn_update_plots_task!(__model__)
        else
            # Stop the existing task via this global atomic variable
            DataStream.loop_retrieve_data[] = false
        end
    end
end
function spawn_update_plots_task!(__model__)
    loop_retrieve_data[] = true
    DataStream.update_plots_task[] = @spawn update_plots!(__model__)
    errormonitor(DataStream.update_plots_task[])
    return nothing
end
function update_plots!(__model__)
    println("Task started")
    while loop_retrieve_data[]
        # Grab the latest readings from the SQL database
        list_of_sensor_messages = retrieve_data()
        # Update the reactive data fields
        __model__.ext_heat_in_temp[] = filter(sensor -> sensor.point == "EXT_HEAT_IN_TEMP", list_of_sensor_messages) |> DataFrame |> x -> pick_subset(__model__, x)
        update_ext_heat_in_temp_plot!(__model__)
        sleep(1)
    end
end
function update_ext_heat_in_temp_plot!(__model__)
    __model__.ext_heat_in_temp_trace[] = [scatter(
        x = __model__.ext_heat_in_temp[:, "timestamp"],
        y = __model__.ext_heat_in_temp[:, "value"],
        mode = "lines",
        marker = attr(size=10, color="rgba(255, 182, 193, .9)"),
        name = "Temperature")]
    return nothing
end
end
Then I have an App module:
module App
using GenieFramework
@genietools
# Need to do this here because setting the Logging level in SearchLight directly
# does not work (currently broken). If we don't put this log level to warn, then every time we do a SQL query the
# logs and stdout are flooded with info messages summarising the SQL query.
Genie.Configuration.config!(log_level = Genie.Logging.Warn)
using SearchLight
using SearchLightSQLite
using Serialization
using Base.Threads
using Dates
include(joinpath("lib","app","resources","sensors", "Sensors.jl"))
include(joinpath("lib","app","resources","sensors", "SensorsValidator.jl"))
using .Sensors
using .SensorsValidator
export Sensor
include(joinpath("lib","DataStream", "DataStream.jl"))
using .DataStream
@page("/", joinpath("lib", "DataStream", "datastream_ui.jl"), layout = "layout.jl", model = DataStream)
end # module App
|
Thanks - working on a quick patch to allow disabling the storage to session. |
Until the patch is out you can safely delete the files. You can get the path with: Stipple.ModelStorage.Sessions.GenieSessionFileSession.SESSIONS_PATH[] |
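For anyone who wants to automate that cleanup in the meantime, here is a minimal sketch in plain Julia with no Genie dependency. The function name `prune_old_files` and the 30-minute default are made up for illustration; it simply removes regular files in a directory whose modification time is older than a cutoff.

```julia
# Delete regular files in `dir` that were last modified more than
# `max_age_seconds` ago. Subdirectories are left alone.
# Returns the number of files removed.
function prune_old_files(dir::AbstractString; max_age_seconds::Real = 1800)
    cutoff = time() - max_age_seconds
    removed = 0
    for path in readdir(dir; join = true)
        isfile(path) || continue          # skip subdirectories
        if mtime(path) < cutoff
            rm(path; force = true)        # force: ignore files that vanished meanwhile
            removed += 1
        end
    end
    return removed
end
```

Assuming the sessions folder is the one returned by the `SESSIONS_PATH[]` expression quoted above, that value would be passed as `dir`. Note that this may also remove the file of a session that is still in use but has not been written to recently.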
@zygmuntszpak OK, currently tagging a new version of Stipple.jl that allows disabling the model storage. Once it's out you can use it like this:
module App
# set up Genie development environment
using GenieFramework
Stipple.enable_model_storage(false)
@genietools
# ... rest of your code
Commit here: GenieFramework/Stipple.jl@17bc484 |
Another thing, please make sure you run your deployed app in production env. You can pass With the patch the sessions are still created (as a security feature) but the model is no longer stored. If you still get a high number of sessions, let me know. There should only be one session per user/browser. |
Thank you. I'll also continue trying to produce a MWE for the large number of sessions I was experiencing. Regarding switching to the production environment, is it not sufficient for me to do something like: Genie.Configuration.config!(app_env= "prod") at the start of my code? I also have a suspicion for what might be the cause of #659 so I'll let you know if I make any significant discoveries there. |
Here is a MWE which gives rise to many sessions. I haven't tested this with your latest patch yet. I've attached the project as a standalone package GenieDebug and left only the core pieces to make a proper Genie app. A preview of the main files:
module App
using GenieFramework
@genietools
# TODO This needs to be read from a file instead (as in the SearchLight config file)
# We need to do this here because setting the Logging level in SearchLight directly
# does not work (currently broken)
Genie.Configuration.config!(log_level = Genie.Logging.Warn)
using Base.Threads
include("DataStream/DataStream.jl")
using .DataStream
@page("/", joinpath("DataStream", "datastream_ui.jl"), layout = "layout.jl", model = DataStream)
end # module App
module DataStream
using GenieFramework
using DataFrames
using PlotlyBase
using Base.Threads
@genietools
const update_plots_task::Ref{Task} = Ref(Task(nothing))
const loop_retrieve_data = Threads.Atomic{Bool}(false)
@app begin
    @in livestream_checked = false
    @private ext_heat_in_temp::DataFrame = DataFrame(timestamp=1:10000, value=rand(10000))
    @out ext_heat_in_temp_trace = [scatter()]
    @out ext_heat_in_temp_layout = PlotlyBase.Layout(
        xaxis_title = "Time",
        yaxis_title = "Temperature (Celcius)",
        title = "External Heat In Temperature"
    )
    @onchange isready begin
        @show "App is loaded"
    end
    @onchange livestream_checked begin
        if livestream_checked
            spawn_update_plots_task!(__model__)
        else
            # Stop the existing task via this global atomic variable
            DataStream.loop_retrieve_data[] = false
        end
    end
end
function retrieve_data()
    return rand(100)
end
function update_plots!(__model__)
    println("Task started")
    while loop_retrieve_data[]
        # Update the reactive data fields
        __model__.ext_heat_in_temp[] = DataFrame(timestamp=1:10000, value=rand(10000))
        update_ext_heat_in_temp_plot!(__model__)
        println("Inside Reading Task Timestamp: " * string(Dates.now()))
        sleep(1)
    end
end
function spawn_update_plots_task!(__model__)
    loop_retrieve_data[] = true
    DataStream.update_plots_task[] = @spawn update_plots!(__model__)
    errormonitor(DataStream.update_plots_task[])
    return nothing
end
function update_ext_heat_in_temp_plot!(__model__)
    __model__.ext_heat_in_temp_trace[] = [scatter(
        x = __model__.ext_heat_in_temp[:, "timestamp"],
        y = __model__.ext_heat_in_temp[:, "value"],
        mode = "lines",
        marker = attr(size=10, color="rgba(255, 182, 193, .9)"),
        name = "Temperature")]
    return nothing
end
end
header(class="st-header q-pa-sm",
    checkbox("Live Stream", :livestream_checked)
)
cell(class = "st-module",
    [
        Stipple.Html.div(class = "q-mt-pa",
            [
                plot(:ext_heat_in_temp_trace, layout = :ext_heat_in_temp_layout)
            ])
    ])
cell(style="display: flex; justify-content: space-between; align-items: center; background-color: #112244; padding: 10px 50px; color: #ffffff; top: 0; width: 100%; box-sizing: border-box;", [
    cell(style="font-size: 1.5em; font-weight: bold;",
        "Debug App"
    ),
    Html.div(style="display: flex; gap: 20px;", [
        a(href="/", style="text-decoration: none; color: #ffffff;",
            "Data Stream"
        )
    ])
])
page(model, partial=true, [@yield]) |
The flag needs to be set before Genie.loadapp() is called. Say you are launching the app from an SSH connection, you would do
Otherwise, you can switch the application to PROD by default by editing the
https://learn.genieframework.com/docs/reference/server/configuration |
I just released a patch so that we can also set |
This does change the app env but if you put it there it's run very late in the load order, so its impact is only partial (and so are the benefits). |
@zygmuntszpak thanks for the MWE. I've run it locally and I can't reproduce the issue of excessive session files being created. In my tests it works as expected, creating N+1 sessions (where N is the number of clients/browsers connected). The +1 is from |
Oh! Now I see it. When you check "Live Stream" it creates a few sessions per second. |
@zygmuntszpak OK, I found the issue. It is caused by using DataStream.update_plots_task[] = update_plots!(__model__) then the issue goes away. Clearly it has to do with running the update and the model in different threads/processes but I don't fully understand what exactly causes it, that would take a lot more debugging. I suggest running without FYI @hhaensel interesting bug here :) |
In reality I have a lot more streams, and I run some analysis and data wrangling operations; once those are complete I take the result and plot it. I put those in a different thread so as not to hold up the server's responsiveness to other actions a user might perform. So while in this instance not spawning a separate task on a separate thread makes no difference, in my actual use case it does, because it may take several seconds to complete processing between plot updates.
Interesting that you pointed out the updates may not arrive in the right order. Are you referring to the situation where an update for a different reactive variable is triggered in the main thread and an update for the plots is triggered in the separate thread I spawned, and the separate thread might then transmit before the first? If all the updates to the plots happen in the separate thread, then surely those plot updates should happen in the right order, because I'm only spawning one separate thread, which executes the plot updates in sequence.
I think that spawning a thread and triggering updates is the reason for the occasional closed connections from the browser that we are seeing in the other issue. I think some data race is created: while Genie is constructing a websocket frame to transmit to the browser due to an update of a reactive variable, the reactive variable's data changes midway and a corrupted websocket frame is generated. This may explain why it is difficult to consistently reproduce the closed connection issue (because it relies on an occasional data race).
My current thinking is to introduce a channel to communicate with the separate thread, which will retrieve the data, do some processing and push the result into the channel when it is ready to be plotted. The main app must then read from the channel and update the plots when they are ready. |
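The channel idea described above could look roughly like the sketch below. It is plain Julia with no Stipple involved: `start_producer`, `consume!` and the fake batches are hypothetical stand-ins for the real data retrieval and plot update code. The point is only that the heavy work runs in a spawned task while a single consumer applies the results one at a time, in the order they were produced.

```julia
# Producer: runs in a spawned task, does the slow work and hands over
# finished results through the channel.
function start_producer(results::Channel{Vector{Float64}}, n_batches::Int)
    return Threads.@spawn begin
        for i in 1:n_batches
            batch = fill(Float64(i), 3)   # stand-in for retrieval and analysis
            put!(results, batch)          # blocks if the buffer is full
        end
        close(results)                    # tells the consumer nothing more is coming
    end
end

# Consumer: one task drains the channel, so updates are applied
# sequentially and in production order.
function consume!(apply_update, results::Channel)
    for batch in results                  # ends once the channel is closed and empty
        apply_update(batch)
    end
    return nothing
end

results = Channel{Vector{Float64}}(8)     # the producer may run up to 8 batches ahead
applied = Vector{Float64}[]
producer = start_producer(results, 5)
consume!(batch -> push!(applied, batch), results)
wait(producer)
```

In a real app `apply_update` would be whatever assigns to the reactive plot fields, so that only one task ever touches the model. Whether that removes the extra sessions or the invalid frame errors is not something this sketch can show.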
@zygmuntszpak Have you run the app with the latest version of Stipple and with |
@essenciary I have just now tested Out of curiosity, I added a println of
I couldn't get this to work. I added a
julia> Genie.loadapp()
██████╗ ███████╗███╗ ██╗██╗███████╗ ███████╗
██╔════╝ ██╔════╝████╗ ██║██║██╔════╝ ██╔════╝
██║ ███╗█████╗ ██╔██╗ ██║██║█████╗ ███████╗
██║ ██║██╔══╝ ██║╚██╗██║██║██╔══╝ ╚════██║
╚██████╔╝███████╗██║ ╚████║██║███████╗ ███████║
╚═════╝ ╚══════╝╚═╝ ╚═══╝╚═╝╚══════╝ ╚══════╝
| Website https://genieframework.com
| GitHub https://github.com/genieframework
| Docs https://genieframework.com/docs
| Discord https://discord.com/invite/9zyZbD6J7H
| Twitter https://twitter.com/essenciary
Active env: PROD
┌ Warning:
│ No secret token is defined through `Genie.Secrets.secret_token!("token")`. Such a token
│ is needed to hash and to encrypt/decrypt sensitive data in Genie, including cookie
│ and session data.
│
│ If your app relies on cookies or sessions make sure you generate a valid token,
│ otherwise the encrypted data will become unreadable between app restarts.
│
│ You can resolve this issue by generating a valid `config/secrets.jl` file with a
│ random token, calling `Genie.Generator.write_secrets_file()`.
│
└ @ Genie.Secrets C:\Users\zygmu\.julia\packages\Genie\5qchC\src\Secrets.jl:27
Loading appERROR: LoadError: SystemError: opening file "C:\\Users\\zygmu\\.julia\\dev\\GenieDebug\\DataStream\\config\\env\\global.jl": No such file or directory
in expression starting at C:\Users\zygmu\.julia\dev\GenieDebug\DataStream\DataStream.jl:1
Stacktrace:
[1] systemerror(p::String, errno::Int32; extrainfo::Nothing)
@ Base .\error.jl:176
[2] kwcall(::NamedTuple{(:extrainfo,), Tuple{Nothing}}, ::typeof(systemerror), p::String, errno::Int32)
@ Base .\error.jl:176
[3] kwcall(::NamedTuple{(:extrainfo,), Tuple{Nothing}}, ::typeof(systemerror), p::String)
@ Base .\error.jl:176
[4] #systemerror#82
@ .\error.jl:175 [inlined]
[5] systemerror
@ .\error.jl:175 [inlined]
[6] open(fname::String; lock::Bool, read::Nothing, write::Nothing, create::Nothing, truncate::Nothing, append::Nothing)
@ Base .\iostream.jl:293
[7] open
@ .\iostream.jl:275 [inlined]
[8] open(f::Base.var"#418#419"{String}, args::String; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Base .\io.jl:393
[9] open
@ .\io.jl:392 [inlined]
[10] read
@ .\io.jl:473 [inlined]
[11] _include(mapexpr::Function, mod::Module, _path::String)
@ Base .\loading.jl:1955
[12] include
@ .\Base.jl:457 [inlined]
[13] bootstrap(context::Module; show_banner::Bool)
@ Genie.Loader C:\Users\zygmu\.julia\packages\Genie\5qchC\src\Loader.jl:78
[14] kwcall(::NamedTuple{(:show_banner,), Tuple{Bool}}, ::typeof(Genie.Loader.bootstrap), context::Module)
@ Genie.Loader C:\Users\zygmu\.julia\packages\Genie\5qchC\src\Loader.jl:64
[15] top-level scope
@ C:\Users\zygmu\.julia\packages\GenieFramework\VrbUK\src\GenieFramework.jl:113
in expression starting at C:\Users\zygmu\.julia\dev\GenieDebug\app.jl:15
Ready!
It appears to be looking for the I ended up getting the production environment to work by creating |
Actually, I'm surprised that the current app has global state. The current documentation states:
route("/") do
    model = @init
    @show model
    page(model, ui()) |> html
end
Example 1 of the Stipple repo says that one must declare a global model in the following manner:
route("/") do
    global model
    model = Name |> init
    page(model, ui()) |> html
end
I was actually interested in making a global model, and assumed that
@page("/", joinpath("DataStream", "datastream_ui.jl"), layout = "layout.jl", model = DataStream)
The documentation explains:
However, when I
@macroexpand @page("/", joinpath("DataStream", "datastream_ui.jl"), layout = "layout.jl", model = DataStream)
:(Stipple.Pages.Page(layout = "layout.jl", model = DataStream, context = Main, "/", view = joinpath("DataStream", "datastream_ui.jl")))
which makes no reference to
I noticed that there was some discussion about global models which seems to suggest some kind of alternative, but I couldn't glean enough from the discussion to understand how to proceed.
So in summary, I am actually surprised that I have a global state. I'm curious which part of my GenieDebug app sets the global state, and what one would in principle do to turn it off. In my actual use case, I have a multi-page app and so would like to understand how to toggle global versus non-global models in such a scenario.
Using your latest patch with
(Attached videos: websocket-recording.mp4, connected-clients.mp4)
I do, however, see invalid frame header errors from time to time, which I suspect are due to some data race condition. Regarding the
Here is the Manifest file in case it proves useful: |
@zygmuntszpak thanks for the feedback. The global requires some debugging, I don't have deep knowledge about that area as I try to never use global models. I think @PGimenez and @hhaensel might understand that better. But I'll try to take some time to dig into that myself. I suspect it has to do with explicitly passing the module into Thanks for sharing the manifest file, I'll try it out. Sounds weird that it still uses the temp folder. Sessions however, are still used. The use of session is two fold: |
Thanks for the alternative GenieDebug structure. I see what you mean regarding the async task. The issue is that it appears to no longer be possible to end the loop because
@onchange livestream_checked begin
    @warn "Livestream checked: $livestream_checked"
    @async begin
        while livestream_checked
            @warn livestream_checked
            ext_heat_in_temp_trace = update_stream()
            sleep(1)
            yield()
        end
    end |> errormonitor
end
However, I upped the number of points that would need to be plotted to slow down the app and noticed something rather odd. After unchecking the |
Yes, this, coupled with the high number of sessions seems to indicate that the async task gets disconnected from ws updates. |
This is an interesting case that is worth diving into. I suspect that it has to do with data sync across threads/workers. I'm not an expert and need to research more, but if some data serialization is involved, then I expect a websocket connection can not be serialized and restored. |
Apparently the issue is that when the variable's value is changed in the browser, the new value is propagated to the backend but not to the async task running the loop. This could be a limitation of Observables.jl, which is what Stipple uses for reactivity. To make your loop work, you'd need to define another variable to control the loop, and change its value from the Julia code instead of from the browser. Here's what I did to make it work:
@private run_livestream = false
@onchange livestream_checked begin
    run_livestream = livestream_checked
    @warn "Livestream checked: $livestream_checked"
    @async begin
        while run_livestream
            @warn livestream_checked
            @warn run_livestream
            ext_heat_in_temp_trace = update_stream()
            sleep(1)
        end
    end |> errormonitor
end
|
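Stripped of Stipple, the "control flag set from Julia code" pattern can be sketched as below. `start_loop` and `on_tick` are hypothetical names. The background loop re-reads a `Threads.Atomic{Bool}` on every iteration, much like `loop_retrieve_data` in the MWE. This only illustrates the pattern and says nothing about how Stipple propagates changes coming from the browser.

```julia
# Start a background loop that keeps calling `on_tick` until the shared
# flag is set to false. The task's result is the number of ticks executed.
function start_loop(on_tick, keep_running::Threads.Atomic{Bool}; interval = 0.01)
    task = Threads.@spawn begin
        ticks = 0
        while keep_running[]              # flag is re-read on every iteration
            ticks += 1
            on_tick(ticks)
            sleep(interval)
        end
        ticks
    end
    return errormonitor(task)             # log an error if the loop throws
end

keep_running = Threads.Atomic{Bool}(true)
task = start_loop(_ -> nothing, keep_running)
sleep(0.1)                                # let the loop run for a moment
keep_running[] = false                    # request a stop from the controlling code
total_ticks = fetch(task)                 # waits for the loop to finish
```

Because the flag lives in one shared object instead of being copied into the task, the loop sees the change on its next iteration.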
However, I think this issue is specific to your MWE @zygmuntszpak, perhaps because of the configuration changes. For example, this app works fine and I'm turning off the loop with a button in the browser:
module App
using GenieFramework
@genietools
@app begin
    @in running = false
    @in x = 0.00
    @in spawn = false
    @onbutton spawn begin
        if !running
            running = true
            @async begin
                x = 0
                while x <= 100 && running
                    x = x + 1
                    sleep(1)
                end
                x = 0
                running = false
            end
        end
    end
end
function ui()
    [btn("Spawn task", @click(:spawn)), btn("Stop task", @click("running = false")), bignumber("Counter", :x)]
end
@page("/", ui)
Server.isrunning() || Server.up()
end
|
Describe the bug
I'm writing an app (work related) which continually ingests an MQTT stream of IoT sensors, does some analysis and dynamically plots it using Stipple. The app is meant to run continually, but within 24 hours the PC runs out of disk space because of an enormous number of session files that are serialized to a temporary folder. Multiple session files are created within a minute, and each session file eventually grows to 20 MB. The sheer number of these files eventually crashed the program because there was no more disk space:
To reproduce
I need to construct a separate minimal Stipple app, run it for a while and monitor the number of session files to see if it will behave the same as my current app. As far as I can see I am not doing anything unusual in my current program. I am sending a lot of datapoints for plotting to the frontend, which is probably the reason for the large file size per session. However, I don't understand why so many sessions are created since I am simply starting the server once and letting it run continuously.
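One way to monitor the number and total size of session files while such a test app runs is a small helper like the one below. This is a sketch only: `dir_footprint` is a made-up name, and it looks at regular files directly inside the given folder without recursing.

```julia
# Count the regular files directly inside `dir` and add up their sizes in bytes.
function dir_footprint(dir::AbstractString)
    files = filter(isfile, readdir(dir; join = true))
    total_bytes = sum(filesize, files; init = 0)
    return (count = length(files), bytes = total_bytes)
end
```

Calling it once a minute on the sessions folder and logging the result would show whether files accumulate only while the live stream is active.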
Expected behavior
I don't fully understand what is meant to be stored in a Session (i.e. why they become so large), and why so many temporary files are created. I obviously don't expect to generate 300GB of temporary files within 24 hours. Is persisting the session files to disk absolutely necessary? Perhaps I could periodically delete some of them (e.g. run a cleanup every 30 minutes to delete the oldest files etc?)
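As a rough illustration of the periodic cleanup idea, Julia's built-in `Timer` can run a job at a fixed interval. The helper name `every` is hypothetical and the job shown here is just a counter; in practice it would be whatever function deletes the stale files.

```julia
# Run `f()` every `interval_seconds`, starting one interval from now.
# Errors inside `f` are logged instead of silently killing the timer callback.
function every(f, interval_seconds::Real)
    return Timer(interval_seconds; interval = interval_seconds) do _
        try
            f()
        catch err
            @warn "Periodic job failed" exception = err
        end
    end
end

# Short demo with a 50 ms interval; a real cleanup would use e.g. 30 * 60 seconds.
runs = Ref(0)
timer = every(() -> runs[] += 1, 0.05)
sleep(0.3)
close(timer)                              # stops further invocations
```

The timer callback runs as a task inside the same Julia process, so a long-running job would delay other work scheduled alongside it.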
Additional context
Please include the output of
julia> versioninfo()
and
pkg> st
Please answer these optional questions to help us understand, prioritise, and assign the issue
1/ Are you using Genie at work or for hobby/personal projects?
This is a work project, and I am busy running a trial data capture. I was surprised to run out of disk space within a day.
2/ Can you give us, in a few words, some details about the app you're building with Genie?
I'm writing an app (work related) which continually ingests an MQTT stream of IoT sensors, does some analysis and dynamically plots it using Stipple. The app is meant to run continually.