-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[New Hub] University of Texas #968
Comments
I'm checking it all out and getting acquainted. Thanks! This is an introductory class, so in the past I've used phpmyadmin for students to browse tables and to create their first tables. That avoids having to teach the CREATE syntax and browsing the tables (rather than using a SELECT * query) helps build analogies to Excel, helping to scaffold students. But phpmyadmin had its own issues, exposing lots of complexity. Looking at pgadmin as the equivalent in the postgres world seems even more complexity (or maybe that's just me adjusting to postgres rather than MySQL/MariaDB (which is fine, I'd been meaning to do that for years anyway). For browsing, any thoughts on this: |
I am cutting a pasting comments from written by @yuvipanda on this hub from another channel: @colliand I've now set up https://utexas-demo.pilot.2i2c.cloud/ with the postgres setup!
They can try it out now! Let me know what else I can do, @colliand. |
Next cut/paste from Yuvi: As an alternative to the ipython-sql magic, I've installed an SQL kernel itself (https://github.com/jupyter-xeus/xeus-sql). You can see this in JupyterLab as an xsql kernel: In notebooks using that, you can connect using:
And then write SQL (with syntax highlighting!) directly in the notebook, without having to use magics. Maybe this is an option. My suggestion is to avoid As an alternative to pgadmin4 / phpmyadmin, we can also consider https://www.adminer.org/. It seems simpler and maybe less confusing. We can also get them MySQL instead of postgres if that makes teaching easier! It's not much more difficult. Finally, when you respond to the UTexas people can you include me too? Thanks |
Ah, thanks @colliand and @yuvipanda very cool stuff. I had played with the xsql bit, but now understand it a bit more. Might indeed be useful. I'm happy to switch teaching to postgres, it's not a lot of work (thanks DBI-API in python!) and it's useful for me to know what is specific to MySQL and postgres in my materials. Plus postgres is a useful keyword for student resumes. Adminer (hah, ok, it's admin-er rather than ad-miner) looks useful for both setting up tables and basic browsing, not much different to phpmyadmin. Ah, and it even allows you to sort the columns by clicking on them (and shows the SQL used to execute the query it is showing, which is fantastic). So, I think adding adminer to the image would be great! Not sure if there is a way to get it to show up in the Launcher? |
@jameshowison great to hear adminer looks good to you. I'll try and get that running. What is your timeline for this class? |
Yes, that would be great. Very tight, I'm afraid. Ideally this would be
working next week.
…On Wed, Feb 2, 2022 at 10:25 AM Yuvi Panda ***@***.***> wrote:
@jameshowison <https://github.com/jameshowison> great to hear adminer
looks good to you. I'll try and get that running. What is your timeline for
this class?
—
Reply to this email directly, view it on GitHub
<#968 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAWOUVYXLF7TJZ3F2E3PPTUZFLF7ANCNFSM5NKXIFUQ>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I'm running into some issues with ipython-sql and CREATE DATABASE. Not sure of the right place to ask about this (or the level of engineer support) but I asked about this at the ipython-sql github. I'm also not sure how active that project has been, recently. catherinedevlin/ipython-sql#207 Any advice on location, fixes, support approach welcome! |
@jameshowison I'd say the xeus-sql project (with the xsql kernel) is more active, and might provide more help. So perhaps that's the avenue to explore? |
- Sets it up as a sidecar, so initial user setup is done easily. When https://github.com/yuvipanda/jupyter-pgadmin4-proxy actually works, we might be able to get rid of the sidecar. - Sets the default username and password to a well known string, and protection is offered by jupyter-server-proxy instead. - Upgrade to a more recent version of postgres while we are at it Ref 2i2c-org#968
@jameshowison I spent a bunch of time trying to get adminer to work but failed, unfortunately. Instead, I got pgadmin4 working. You can go to https://utexas-demo.pilot.2i2c.cloud/hub/user-redirect/proxy/absolute/5050/ and after starting your jupyterhub server, it will ask you for credentials. use [email protected], password jovyan. It'll still talk to each user's individual database server, although that will need to be setup in the GUI by the users too. Unideal for sure, but the field doesn't seem broad. If you have suggestions for other software we can try, I'm happy to give it a look. Currently, users don't have any database persistence. We'll have to fix that before getting actual users on it. Disk space gets expensive fast. I'm thinking we can just give users a couple hundred MB of non-ssd storage. How does that sound to you? |
Thanks. Yes, that sort of small amount of storage is fine. I will monitor if any student appears to be taking on a project with large datasources. I'm having a little trouble, perhaps related to the persistence of the database? e.g.: in a xsql kernal notebook I run:
And that works fine. My expectation is that database (and table) would be available in both pgadmin and in other (python kernel) notebooks. But it doesn't seem to be? In pgadmin4 I create a server using localhost as the host; it appears to connect and shows the database (but maybe not the table?), but then seems to disconnect and I get "internal server error" or errors from the dashboard. In another python notebook I run:
and get a I'm also having trouble via
But then class_music_festival is still there. Similarly a
Any ideas? With the sidecar approach the postgres server is running in a different container, right? But we are connecting to localhost, so are the ports on localhost forwarded to the other container? Using docker-compose I used the Docker hostnames and thus the internal networking. See the approach here: https://github.com/howisonlab/wrangling-docker |
Apologies, I think the psql issues are just my unfamiliarity with postgres, got to remember to end all commands/queries with a semicolon!
Let me try the other routes now (xeus-sql and psycopg). |
@jameshowison cool, let me know if the problems persist. Does pgadmin4 still have issues? |
@yuvipanda pgadmin4 seems to connect to the server (user:jovyan pass:) and I can see objects and use the browse/query interface but then it seems to disconnect or lockup. Even "Disconnect Server" from the right-click menu doesn't work. Logging out and logging back in does seem to work. |
- For each user, we create a 1Gi persistent standard non-ssd disk, and put the postgresql data on it. This way, it survives user pod restarts! - 1Gi seems to be the smallest individual disk we can get. We can resize it up automatically if needed. - Code adapted from common.yaml in berkeley-dsep-infra/datahub#2651. Ref 2i2c-org#968
@jameshowison if you restart your server, you'll get persistent database storage now! Each user will get 1GB - which is the smallest possible individual disk on our cloud provider it turns out. I think I know what's up with pgadmin4, I'll try tweak that too |
Well, I'm not sure what is up with pgadmin4 :( The fundamental error is this:
The general consensus is that this might be caused by multiple pgadmin4 processes running, but that's clearly not the case here. I'll try debug again on Monday. |
I'm seeing some version mismatches between the pg client tools on the commandline and the server. Perhaps this is because different versions are used on the main image and the database 'sidecar' image?
psql shows similar messages (but it does run):
|
Matches client version in the user image. Ref 2i2c-org#968 (comment)
@jameshowison #980 I am just downgrading to postgres 10 on the server so it matches. Hope that works ok? Do you need any new features in the latest version? |
That's fine, certainly not in this initial course. |
Given the difficulties with pgadmin (and it's overall complexity), is there any chance that https://github.com/pbugnion/jupyterlab-sql can work? I think it's a useful addition to things (particularly ability to browse tables, which I find really helps orient students). |
@jameshowison I just tried it, unfortunately as suspected it doesn't actually work with a new enough version of JupyterLab: Is GUI access a critical part of the course? |
@jameshowison separate from that, is GitHub authentication where you explicitly add student names good enough for authentication? https://docs.2i2c.org/en/latest/admin/howto/manage-users.html has info on how to add them. |
And finally, if this all works at some point I'll make utexas.pilot.2i2c.cloud (vs current utexas-demo.pilot.2i2c.cloud). |
I'm trying nbgitpuller but running into some issues where the hub returns a 404 error. I'm expecting these to pull a folder into a user's workspace (and then update that folder if the link is clicked again). But currently both produce a 404 at the hub. Here is the folder in the repo I'm testing with: https://github.com/howisonlab/datawrangling-exercises/tree/main/objects_database Here is the link I generated via the webform: https://utexas.pilot.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fhowisonlab%2Fdatawrangling-exercises%2F&urlpath=lab%2Ftree%2F%2Fobjects_database&branch=main And the one from the extension: http://utexas.pilot.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fhowisonlab%2Fdatawrangling-exercises&branch=main&urlpath=lab%2Ftree%2Fdatawrangling-exercises%2Fobjects_database |
@jameshowison can you confirm that nbgitpuller is installed in the user image for the hub? I just tried importing it in a session on your hub and got: |
Running to class but that is probably the issue. reading the docs I just assumed that we were using the default image and it would be installed: https://docs.2i2c.org/en/latest/admin/howto/content.html#content-nbgitpuller Perhaps the 404 could discuss that? But definitely I'd like to use nbgitpuller, so if we can get that installed that would be great. |
It is in the default image but I bet that @yuvipanda created a custom one when he added the database stuff |
Yep the environment repo is here: https://github.com/2i2c-org/utexas-image/ I added nbgitpuller here: 2i2c-org/utexas-image#1 That auto-pushed the updated image here: https://quay.io/repository/2i2c/utexas-image?tag=latest&tab=tags I went into the hub's configurator and updated the image to the new tag that was uploaded to the link above: https://docs.2i2c.org/en/latest/admin/howto/configurator.html Hit "submit". Then closed my Jupyter server. Started a new one. And when I click this link, it worked! http://utexas.pilot.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fhowisonlab%2Fdatawrangling-exercises&branch=main&urlpath=lab%2Ftree%2Fdatawrangling-exercises%2Fobjects_database |
Ok, we did our first class today, with students logging in and using that nbgitpuller link. All worked. One or two servers were slow to spin up (~3 minutes) but eventually came up. One or two initially had trouble with the nbgitpuller link, getting bounced in various ways, logging out and logging back in worked there (so no idea what the issue was, whether it was time passing that resolved it or the actual log out and back in). I have decided to switch the teaching order and teach the CREATE syntax first, and forgo the browsable tables. This is something I had considered anyway. One difficulty that persists, though, is that postgres uses |
pgadmin4 was using end user IP in its session cookie for extra security - however this doesn't work in kubernetes type environments with a lot of dynamic IPs. The documentation suggests we turn this off, and we do. Ref 2i2c-org#968
@jameshowison I'm still on vacation, but I managed to sneak some airplane time to work on this - and I think pgadmin4 properly works now. Try it out? https://utexas.pilot.2i2c.cloud/hub/user-redirect/proxy/absolute/5050/ |
Yes, this is great, thanks! Much easier to browse the tables (although I think for simplicity I will continue with xeus-sql). But this is crucial for debugging. |
Just leaving this note here in case anyone else is using this setup in the future. I was having trouble finding tables I created after creating a database in a xues-sql notebook. In short when postgres creates a database you have to manually switch to that database (ie it's like See discussion and examples here: jupyter-xeus/xeus-sql#57 |
One challenge we have is students running queries with large amounts of output. This locks up the browser and makes things unresponsive. Could adding an output limiter for jupyterhub help? Something like: |
@jameshowison does this seem to be the cause of the students needing to refresh the browser window? Or is this a different issue unrelated to that? I'm surprised you didn't run into this issue when people were working on their own laptops - how did you manage queries with really large outputs then? The browser stuff is not unique to a JupyterHub, I think... |
I think students are refreshing when hitting kernel issues as well. Good point, @choldgraf. Hmmm, let me think how we did this before. Ah, we were running queries like this using phpmyadmin, which provides paging by default. So we could switch to pgadmin but then there is a lot of copy and paste between windows (and pgadmin is confusing). I'm teaching them to use (and remove) LIMIT 10 as they build up queries on tables. That's good practice anyway, but guard rails are still useful. I also wonder if these issues are primarily when using the xsql kernel, rather than ipython, I will watch for that. Maybe just going to sql magics would help? I might give them that as an option. |
I think we need to see some logs on the container when these errors are occurring. Is there anyway to do that? Can we set this up so that the logs are available outside the container, at least while we sort this out. A friend wondered if endpoint/container networking might be causing intermittent issues? |
Just to confirm - this was a major issue that everybody was experiencing at first, and when we bumped the memory guarantees for each user session, the problem went down substantially. Is that correct? |
I'd say (somewhat guessing) that it is less common now, but still widespread. My TA and I encounter the kernel dying issues sporadically, as do others. |
A second issue arose this morning, when I tried to offer pgadmin as an alternative for executing queries. About 30% of the class ran into "Internal Gateway" errors when trying to follow this link to their pgadmin sidecar: https://utexas.pilot.2i2c.cloud/hub/user-redirect/proxy/absolute/5050/ (pass:[email protected] / jovyan). |
@jameshowison you can look at logs for your running container by looking at Looking at the distribution of memory usage, I don't believe most of the kernel deaths are about memory exhaustion, although that is still possible that the memory spike is so quick and so high that our prometheus metrics (which are sampled every minute) don't catch it. |
Great stuff, thanks @yuvipanda I can test stuff whenever you'd like! Good to know where the logs are, I'm not sure where I would have found to look for that, maybe https://infrastructure.2i2c.org/en/latest/topic/troubleshooting.html or https://infrastructure.2i2c.org/en/latest/howto/operate/index.html But since I'm an administrator and not a 2i2c engineer probably here (where it mentions helping users debug): https://docs.2i2c.org/en/latest/admin/howto/control-user-server.html |
Not related to 2i2c infrastructure setup, I'm guessing, but I ran into a confusing error using nbgitpuller to distribute files. Luckily this one only affected me, not the students. |
I think that this hub is now roughly in a steady state, without major work to be done in setting it up, so I'll close this and we can debug other problems as they pop up via support channels. (note that we do have an incident issue about this still open: #1021 ) |
Hub Description
This is a first pilot hub for the University of Texas. The hub will be administered by James Howison ( @jameshowison ) and used by students in his course. The course is focused on databases so is deployed with a "database sidecar" strategy as suggested by @yuvipanda.
Community Representative(s)
@jameshowison
Important dates
Target start date
2021-02-01
Preferred Cloud Provider
No preference (default)
Preferred Location of the Cloud Resources
No response
Do you have your own billing account?
Hub Authentication Type
GitHub Authentication (e.g., @MyGitHubHandle)
Hub logo
No response
Hub logo URL
No response
Hub image service
No response
Hub image
No response
Extra features you'd like to enable
Other relevant information
James Howison has some questions about the database sidecars.
Hub ID
utexas-demo
Hub Cluster
pilot
Hub URL
https://utexas-demo.pilot.2i2c.cloud/
Hub Type
basehub (with database sidecar)
Tasks to deploy the hub
The text was updated successfully, but these errors were encountered: