Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request pg upgrade for Data 101 Hub #5999

Open
cycomachead opened this issue Aug 23, 2024 · 12 comments
Open

Request pg upgrade for Data 101 Hub #5999

cycomachead opened this issue Aug 23, 2024 · 12 comments
Assignees
Labels
package-request Package addition request for a hub

Comments

@cycomachead
Copy link

Package Name

postgres 16 latest (upgrade from 14.5)

Hub URL

data101.datahub.berkeley.edu

Course Name

DATA 101
https://classes.berkeley.edu/content/2024-fall-data-101-001-lec-001

Semester Details

Fa24 and forward

Installation Deadline

Ideally 8/30/2024
but this change should be safe.

@cycomachead cycomachead added the package-request Package addition request for a hub label Aug 23, 2024
@felder
Copy link
Contributor

felder commented Aug 23, 2024

@cycomachead can you confirm if this is for the client, server, or both? Also in the case of the server, do you have any concerns about upgrading from 14->16 with the existing volumes created using 14? Typically when you upgrade postgresql major versions there's a dump and restore procedure that must be performed.

We wouldn't be doing that in this case so I'm not sure if that would result in the volumes being corrupted. The solution for fixing a corrupted volume in this case would likely be to delete it and let K8s provision an empty one.

@felder
Copy link
Contributor

felder commented Aug 23, 2024

For a client upgrade, the client is currently installed via apt.

In order to get the postgresql16 client we'd likely be pulling the package installation out of apt.txt and we'd place it in environment.yml to be deployed via conda which provides 16.4.

https://anaconda.org/conda-forge/postgresql

@felder
Copy link
Contributor

felder commented Aug 23, 2024

@cycomachead
Copy link
Author

cycomachead commented Aug 23, 2024 via email

@felder
Copy link
Contributor

felder commented Aug 23, 2024

@cycomachead so the way the student postgresql servers work is there is another container that runs in the pod which hosts each student's postgresql server. Additionally, a separate 4GB volume is provisioned for postgresql for each student. Those volumes are persistent (we delete them at the end of each semester). Mongodb works the same as well and it gets a 2GB volume.

For example I do this:

notebook) jovyan@jupyter-felder:~$ psql -h localhost
psql (14.12 (Ubuntu 14.12-0ubuntu0.22.04.1), server 14.5 (Debian 14.5-1.pgdg110+1))
Type "help" for help.

jovyan=# \l
                              List of databases
   Name    | Owner  | Encoding |  Collate   |   Ctype    | Access privileges 
-----------+--------+----------+------------+------------+-------------------
 jovyan    | jovyan | UTF8     | en_US.utf8 | en_US.utf8 | 
 postgres  | jovyan | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | jovyan | UTF8     | en_US.utf8 | en_US.utf8 | =c/jovyan        +
           |        |          |            |            | jovyan=CTc/jovyan
 template1 | jovyan | UTF8     | en_US.utf8 | en_US.utf8 | =c/jovyan        +
           |        |          |            |            | jovyan=CTc/jovyan
(4 rows)

jovyan=# create database test;
CREATE DATABASE
jovyan=# \l
                              List of databases
   Name    | Owner  | Encoding |  Collate   |   Ctype    | Access privileges 
-----------+--------+----------+------------+------------+-------------------
 jovyan    | jovyan | UTF8     | en_US.utf8 | en_US.utf8 | 
 postgres  | jovyan | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | jovyan | UTF8     | en_US.utf8 | en_US.utf8 | =c/jovyan        +
           |        |          |            |            | jovyan=CTc/jovyan
 template1 | jovyan | UTF8     | en_US.utf8 | en_US.utf8 | =c/jovyan        +
           |        |          |            |            | jovyan=CTc/jovyan
 test      | jovyan | UTF8     | en_US.utf8 | en_US.utf8 | 
(5 rows)

jovyan=# \q
(notebook) jovyan@jupyter-felder:~$ 

Then restart my server and do this, we can see the information persists:

(notebook) jovyan@jupyter-felder:~$ psql -h localhost
psql (14.12 (Ubuntu 14.12-0ubuntu0.22.04.1), server 14.5 (Debian 14.5-1.pgdg110+1))
Type "help" for help.

jovyan=# \l
                              List of databases
   Name    | Owner  | Encoding |  Collate   |   Ctype    | Access privileges 
-----------+--------+----------+------------+------------+-------------------
 jovyan    | jovyan | UTF8     | en_US.utf8 | en_US.utf8 | 
 postgres  | jovyan | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | jovyan | UTF8     | en_US.utf8 | en_US.utf8 | =c/jovyan        +
           |        |          |            |            | jovyan=CTc/jovyan
 template1 | jovyan | UTF8     | en_US.utf8 | en_US.utf8 | =c/jovyan        +
           |        |          |            |            | jovyan=CTc/jovyan
 test      | jovyan | UTF8     | en_US.utf8 | en_US.utf8 | 
(5 rows)

jovyan=# \q
(notebook) jovyan@jupyter-felder:~$

My concern here is if you or your teaching staff have already done some prep work within postgresql, upgrading the server might not work smoothly because the persistent volume contains a postgresql 14 db. Until I try this in staging, I won't really know what it's going to do. The remedy for an issue here would be removal of the existing persistent volumes so another, empty one, would be provisioned.

As for how many there are, whenever someone signs into data101.datahub.berkeley.edu these volumes are provisioned for them if they do not exist. Last week as part of our routine maintenance I removed all of the volumes except ones owned by you and lisa. Currently, there are 19 of them.

If your course treats these as ephemeral (which IMO is a very good idea since we don't archive them and support for an issue with a specific volume would be limited), then I'd assume there's not an issue with me deleting all of data101s volumes as part of the db upgrade.

Can you confirm that you and lisa don't have any data within your postgresql volumes that you'd miss? If so, I'll proceed.

@felder
Copy link
Contributor

felder commented Aug 23, 2024

In the meantime I can work on upgrading the client since the client is backwards compatible with the server.

@cycomachead
Copy link
Author

cycomachead commented Aug 23, 2024 via email

@felder
Copy link
Contributor

felder commented Aug 23, 2024

@cycomachead unfortunately I'm encountering significant resistance with the client install. Currently there is no easy mechanism for us to pull in postgresql 16 from the pgdg repo (https://www.postgresql.org/download/linux/ubuntu/) via apt and so I was trying to see I could do it via conda. However, I'm having significant dependency conflicts when attempting to run mamba install postgresql==16.4

The way to do this via apt would be to convert the data101 hub build process to use a Dockerfile so that we could add the repo and install from there. However, that is a somewhat significant task which would also require significant testing and that is not really practical this close to the start of instruction.

I'm going to open up an internal ticket for this and invite some others on the team to take a look at it, but I'm not super optimistic.

One technique is to attempt to do this on a running server and modify the installed packages accordingly to try and fix the dependency conflicts. However, I do not feel comfortable uninstalling packages or trying a bunch of different versions of packages because it's indeterminate what the impact of doing that would be on the notebooks your course relies on that use these packages.

Here is sample output of what I see when attempting to install postgresql:

(notebook) jovyan@jupyter-felder:~$ mamba install postgresql==16.4

Looking for: ['postgresql==16.4']

conda-forge/linux-64                                        Using cache
conda-forge/noarch                                          Using cache
pkgs/main/noarch                                              No change
pkgs/r/noarch                                                 No change
pkgs/r/linux-64                                               No change
pkgs/main/linux-64                                            No change

Pinned packages:
  - python 3.11.*


Could not solve for environment specs
The following packages are incompatible
├─ hdf4 is installable with the potential options
│  ├─ hdf4 4.2.15 would require
│  │  └─ jpeg >=9e,<10a , which can be installed;
│  ├─ hdf4 [4.2.11|4.2.12|4.2.13] would require
│  │  └─ jpeg [9* |>=9b,<10a ], which can be installed;
│  ├─ hdf4 [4.2.13|4.2.15] would require
│  │  └─ jpeg >=9d,<10a , which can be installed;
│  ├─ hdf4 4.2.13 would require
│  │  └─ jpeg >=9c,<10a , which can be installed;
│  └─ hdf4 4.2.15 would require
│     └─ libjpeg-turbo [>=2.1.5.1,<3.0a0 |>=3.0.0,<4.0a0 ], which requires
│        └─ jpeg <0.0.0a , which conflicts with any installable versions previously reported;
├─ jpeg is requested and can be installed;
├─ postgresql 16.4  is installable and it requires
│  ├─ krb5 >=1.21.3,<1.22.0a0 , which can be installed;
│  └─ libpq 16.4 h482b261_0, which can be installed;
└─ qt-main is not installable because there are no viable options
   ├─ qt-main 5.15.8 would require
   │  └─ libjpeg-turbo >=3.0.0,<4.0a0 , which cannot be installed (as previously explained);
   ├─ qt-main [5.15.2|5.15.6|5.15.8] would require
   │  └─ krb5 >=1.20.1,<1.21.0a0 , which conflicts with any installable versions previously reported;
   ├─ qt-main 5.15.2 would require
   │  └─ krb5 >=1.19.2,<1.20.0a0 , which conflicts with any installable versions previously reported;
   ├─ qt-main [5.15.3|5.15.4|5.15.6] would require
   │  └─ krb5 >=1.19.3,<1.20.0a0 , which conflicts with any installable versions previously reported;
   ├─ qt-main 5.15.8 would require
   │  └─ libpq >=15.3,<16.0a0  but there are no viable options
   │     ├─ libpq 15.3 conflicts with any installable versions previously reported;
   │     ├─ libpq 15.4 would require
   │     │  └─ openssl >=3.2.0,<3.2.0a0 , which does not exist (perhaps a missing channel);
   │     └─ libpq [15.4|15.5|15.6|15.7|15.8] conflicts with any installable versions previously reported;
   ├─ qt-main 5.15.8 would require
   │  └─ libpq >=15.4,<16.0a0  but there are no viable options
   │     ├─ libpq 15.4, which cannot be installed (as previously explained);
   │     └─ libpq [15.4|15.5|15.6|15.7|15.8] conflicts with any installable versions previously reported;
   ├─ qt-main 5.15.2 would require
   │  └─ libpq >=12.9,<13.0a0 , which conflicts with any installable versions previously reported;
   └─ qt-main 5.15.2 would require
      └─ krb5 >=1.19.4,<1.20.0a0 , which conflicts with any installable versions previously reported.
(notebook) jovyan@jupyter-felder:~$ 

One possibility is to have someone on your course staff attempt to resolve the dependencies while testing to make sure all of your course notebooks continue to work, and then submitting a PR with a new list of packages and their locked versions.

However, let's see if anyone else on the team has better success or other ideas.

@balajialg
Copy link
Contributor

balajialg commented Aug 29, 2024

@cycomachead We will be discussing the a) issues reported by Jon F and b) the next steps for this request during our planning meeting on Sep 3rd. Sorry that we couldn't offer any resolution before the start of the semester.

@cycomachead
Copy link
Author

No worries, appreciate the update!

@balajialg
Copy link
Contributor

@cycomachead During our discussion on Tuesday, we determined that, based on our current bandwidth, we will only be able to address this request before the start of SP 25. If you and Lisa are planning to offer Data 101 during SP 25, we can schedule a quick meeting in November or December to discuss the requirements and plan the implementation details. Let us know!

@cycomachead
Copy link
Author

November / December are totally fine

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package-request Package addition request for a hub
Projects
None yet
Development

No branches or pull requests

4 participants