Commit
1 parent eefe523, commit 715ff1a
Showing 1 changed file with 155 additions and 167 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,221 +1,223 @@ | ||
!!! info "Objectives" | ||
# File transfer to/from Bianca | ||
|
||
We'll go through the methods to transfer files | ||
!!!- info "Learning objectives" | ||
|
||
- wharf | ||
- transit server | ||
- rsync, scp/sftp | ||
- pros/cons of different solutions | ||
!!! warning | ||
|
||
It is important to keep the entire chain of transferring the data secure | ||
- Understand what the wharf is | ||
- Understand what the Transit server allows | ||
- Transfer files to/from Bianca using rsync | ||
- Transfer files to/from Bianca using FileZilla | ||
|
||
## How does it work? | ||
???- question "For teachers" | ||
|
||
![Bianca](./img/biancaorganisation-01.png) | ||
Prerequisites are: | ||
|
||
### The `wharf` | ||
- None | ||
|
||
!!! info "`wharf` is a harbour dock" | ||
Teaching goals are: | ||
|
||
- The `wharf` area can be reached both from within Bianca and from the Internet. | ||
- Therefore, it serves as a bridge between Internet and Bianca. | ||
|
||
## Data transfers: | ||
- <https://www.uppmax.uu.se/support/user-guides/bianca-user-guide/> | ||
- section 3: Transfer files to and from Bianca | ||
- Learners understand what the wharf is | ||
- Learners understand that the Transit server serves | ||
as a bridge between locations | ||
- Learners have transferred files to/from Bianca using rsync | ||
- Learners have transferred files to/from Bianca using FileZilla | ||
|
||
### The `wharf` location on Bianca | ||
|
||
- The path to this folder, once you are logged into your project's cluster, is: | ||
Lesson plan: | ||
|
||
`/proj/<projid>/nobackup/wharf/<username>/<username>-<projid>` | ||
E.g. | ||
`/proj/sens2023598/nobackup/wharf/myuser/myuser-sens2023598` | ||
```mermaid | ||
gantt | ||
title File transfer to/from Bianca | ||
dateFormat X | ||
axisFormat %s | ||
section First hour | ||
Course introduction: done, course_intro, 0, 15s | ||
Introduction : intro, after course_intro, 5s | ||
Theory 1: theory_1, after intro, 10s | ||
Exercise 1: crit, exercise_1, after theory_1, 20s | ||
Feedback 1: feedback_1, after exercise_1, 10s | ||
Break: milestone, after feedback_1 | ||
section Second hour | ||
Exercise 2: crit, exercise_2, 0, 10s | ||
Feedback 2: feedback_2, after exercise_2, 10s | ||
SLURM: done, slurm, after feedback_2, 25s | ||
Break: done, milestone, after slurm | ||
``` | ||
|
||
- To transfer data from Bianca, copy the files you want to transfer here. | ||
- To get the files transferred to the `wharf` area from outside, move the files to your project folder or home folder. | ||
|
||
- Please note that in the `wharf` you only have access to upload your files to the directory that is named: | ||
`<username>-<projid>` | ||
e.g. | ||
`myuser-sens2023598` | ||
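
The path pattern above can be sketched as a small shell function. This is only an illustration: `wharf_path` is a hypothetical helper name, not an UPPMAX tool, and the username and project ID are the example values used above.

```bash
# Hypothetical helper: print the wharf path for a given username and project ID,
# following the pattern /proj/<projid>/nobackup/wharf/<username>/<username>-<projid>
wharf_path() {
  local username="$1" projid="$2"
  printf '/proj/%s/nobackup/wharf/%s/%s-%s\n' "$projid" "$username" "$username" "$projid"
}

# Example values taken from the text above
wharf_path myuser sens2023598
```

Printing the path first makes it easy to check that the last directory is the `<username>-<projid>` one, which is the only directory you can upload to.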
As the video is 11 minutes, I assume around 3x as much time. | ||
|
||
## Why? | ||
|
||
Most users need to transfer files to/from Bianca, | ||
for example, their scripts to analyse their (sensitive) data. | ||
|
||
## Methods | ||
--- | ||
- GUI sftp clients | ||
- Using standard command line sftp client | ||
- Transit Server from/to Rackham | ||
- Mounting the wharf on your local computer | ||
In this session, we will transfer (non-sensitive) files to/from Bianca. | ||
|
||
## GUI sftp clients | ||
--- | ||
- Please note that **SFTP is NOT the same as SCP**. | ||
Be sure to really use an SFTP client -- not just an SCP client. | ||
## Terms | ||
|
||
- Also be aware that many SFTP clients use reconnects (with a cached version of your password). This will not work for Bianca, because of the second factor authentication! Other clients try to use multiple connections with the same password, which will fail as well. | ||
```mermaid | ||
flowchart LR | ||
subgraph sunet[SUNET] | ||
subgraph bianca[Bianca] | ||
wharf | ||
end | ||
transit[transit server] | ||
user[User in SUNET\nUser on Rackham\nUser on other NAISS clusters\n] | ||
wharf <--> transit | ||
transit <--> user | ||
end | ||
``` | ||
|
||
- So for example with the command line SFTP client LFTP, you need to "set net:connection_limit 1". LFTP may also defer the actual connection until it's really required unless you end your connect URL with a path. | ||
As Bianca is a sensitive data cluster, we need to know: | ||
|
||
- An example command line for LFTP would be | ||
- [wharf](http://docs.uppmax.uu.se/cluster_guides/wharf/): a folder | ||
on Bianca that is the only folder one can transfer data to/from | ||
- [Transit](http://docs.uppmax.uu.se/cluster_guides/transit/): | ||
a service that allows one to transfer files between Bianca | ||
and other places, such as your local computer, | ||
but also other sensitive data clusters | ||
|
||
`lftp sftp://<username>-<projname>@bianca-sftp.uppmax.uu.se/<username>-<projname>/` | ||
## Software | ||
|
||
### WinSCP (Windows) | ||
- Connect from local computer | ||
There are many ways to [transfer files to/from Bianca](http://docs.uppmax.uu.se/cluster_guides/transfer_bianca/). | ||
|
||
![WinSCP](./img/winscp-snaphot1.png) | ||
In this session, we use: | ||
|
||
### Filezilla (Linux/MacOS/Windows) | ||
- Asks for password every time you transfer files | ||
- Connect from local computer | ||
- [File transfer to/from Bianca using rsync](http://docs.uppmax.uu.se/cluster_guides/bianca_file_transfer_using_rsync/): | ||
the recommended way to do so | ||
- [File transfer to/from Bianca using FileZilla](http://docs.uppmax.uu.se/cluster_guides/bianca_file_transfer_using_filezilla/): | ||
the user-friendly way to do so | ||
|
||
![FileZilla](./img/filezilla-snapshot.png) | ||
We will use `rsync` first, as this is the UPPMAX-recommended way, | ||
as it is capable of transferring files of any size efficiently. | ||
|
||
FileZilla is easier to use and its guide is easier to go through | ||
without an UPPMAX expert. | ||
|
||
|
||
## Using standard sftp client (command line) | ||
--- | ||
<https://www.uppmax.uu.se/support/user-guides/basic-sftp-commands/> | ||
## Exercises | ||
|
||
```bash | ||
$ sftp -q <username>-<projid>@bianca-sftp.uppmax.uu.se | ||
``` | ||
Ex. | ||
```bash | ||
$ sftp -q [email protected] | ||
``` | ||
### Exercise 1: using rsync | ||
|
||
The `-q` flag makes `sftp` quiet: it hides the banner intended to help someone trying to ``ssh`` to the host. If your client does not support it, you can just skip it. | ||
???- info "Learning objectives" | ||
|
||
Use your normal UPPMAX password directly followed by | ||
the six digits from the second factor application. | ||
- Understand what the wharf is | ||
- Understand what the Transit server allows | ||
- Transfer files to/from Bianca using rsync | ||
|
||
Ex. if your password is "VerySecret" and the second factor code is 123 456 you would type VerySecret123456 as the password in this step. | ||
- Individually, read: | ||
|
||
Once connected you will have to type the sftp commands to upload/download files. Have a look at the Basic SFTP commands guide to get started with it. | ||
- Together, set a timer for 10 minutes | ||
- Individually, answer the questions within the time limit | ||
- Together, write down a shared answer on the GitHub project repository | ||
with path `learners/[a teammember's name]/pair_programming.md` | ||
- Upload the file to the GitHub repo. | ||
Use the GitHub web interface if pushing is a problem! | ||
|
||
Please note that in the wharf you only have access to upload your files to the directory that is named: | ||
Questions: | ||
|
||
`<username>-<projid>` e.g. `myuser-sens2023598` | ||
- What is pair programming? | ||
- How does a good pair behave? Describe what can be observed when pairing online | ||
- When to switch roles? Give a procedure | ||
- What effects does pair programming have? | ||
|
||
Example: | ||
```bash | ||
$ sftp -q [email protected] | ||
[email protected]'s password: | ||
???- question "Answers" | ||
|
||
sftp> ls | ||
pmitev-sens2023598 | ||
> - What is pair programming? | ||
|
||
sftp> cd pmitev-sens2023598 | ||
sftp> | ||
``` | ||
Pair programming is a software development practice | ||
in which two developers work on the same computer. | ||
The person with the keyboard ('the driver') develops new code. | ||
The person without the keyboard ('the navigator') reviews the code. | ||
|
||
Alternatively, you can specify this at the end of the sftp command, so that you will always end up in the correct folder directly. | ||
> - How does a good pair behave? Describe what can be observed when pairing online | ||
|
||
```bash | ||
$ sftp -q <username>-<projid>@bianca-sftp.uppmax.uu.se:<username>-<projid> | ||
``` | ||
E.g. | ||
```bash | ||
$ sftp -q [email protected]:myuser-sens2023598 | ||
``` | ||
- `sftp` supports a recursive flag `-r` to upload (`put -r folder_name`) or download (`get -r folder_name`) entire folders and subfolders. | ||
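
As a minimal sketch of the login form that ends up directly in the correct folder, the function below assembles the `<username>-<projid>@bianca-sftp.uppmax.uu.se:<username>-<projid>` target from its two parts. `sftp_target` is a hypothetical helper name used only for illustration; the host name is the one shown in the commands above.

```bash
# Hypothetical helper: build "<username>-<projid>@<host>:<username>-<projid>"
sftp_target() {
  local username="$1" projid="$2"
  local host="bianca-sftp.uppmax.uu.se"  # host as given in the sftp commands above
  printf '%s-%s@%s:%s-%s\n' "$username" "$projid" "$host" "$username" "$projid"
}

# Print the full command instead of running it, so it can be checked first
echo "sftp -q $(sftp_target myuser sens2023598)"
```

Once connected this way, `put -r folder_name` and `get -r folder_name` act relative to your `<username>-<projid>` directory.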
In an online course: | ||
|
||
## Transit server | ||
--- | ||
- To facilitate secure data transfers to, from, and within the system for computing on sensitive data, a special service is available via SSH at `transit.uppmax.uu.se`. | ||
- A good pair has the driver sharing his/her screen | ||
- In a good pair, both people talk a lot | ||
- A good pair switches roles regularly | ||
- A good pair has a lot of commits | ||
|
||
```bash | ||
Transit server | ||
> - When to switch roles? Give a procedure | ||
|
||
You can mount bianca wharf with the command | ||
Any procedure to achieve the goal of regularly switching roles: | ||
|
||
mount_wharf PROJECT [path] | ||
- after enough work has been done to put in a `git commit` | ||
such as 'Add documentation', 'Add test', 'Pass test' | ||
- each time a timer goes off, e.g. after 5 minutes | ||
|
||
If you do not give a path the mount will show up as PROJECT in your home | ||
directory. | ||
The first procedure sometimes fails when a driver (thinks he/she) | ||
has much more knowledge than the navigator on the subject | ||
and is (apparently) inexperienced in good pair programming. | ||
In such cases, the second procedure works better. | ||
|
||
Note; any chagnes you do to your normal home directory will not persist. | ||
``` | ||
- Example | ||
> - What effects does pair programming have? | ||
|
||
```bash | ||
$ ssh [email protected] | ||
All material for this exercise shows references to studies that | ||
show advantages of pair programming, | ||
for example (from two Wikipedia references): | ||
|
||
my_user@transit:~$ mount_wharf sens2023531 | ||
Mounting wharf (accessible for you only) to /home/<user>/sens2023531 | ||
<user>[email protected]'s password: | ||
``` | ||
- Enter password + 2FA | ||
* a pair considers more alternative ways for a solution [Flor et al., 1991] | ||
* 96% of developers prefer pair programming over developing alone [Williams & Kessler, 2000] | ||
|
||
```bash | ||
my_user@transit:~$ ls sens2023531/ | ||
my_user@transit:~$ | ||
``` | ||
However, the first study uses only 2 programming teams, | ||
the second study 41 self-selected respondents. | ||
One can/should be critical of these studies. | ||
|
||
- Note that your home directory is mounted _read-only_: any changes you make to your "local" home directory (on transit) will be lost upon logging out. | ||
Yet, for teaching, working in groups has a high effect size [Hattie, 2012], | ||
where the optimal group size is two [Schwartz & Gurung, 2012]. | ||
|
||
- You can use commands like ``rsync``, ``scp`` to fetch data and transfer it to your bianca wharf. | ||
- You can use cp to copy from Rackham to the wharf | ||
- Remember that you cannot make lasting changes to anything except for mounted wharf directories. Therefore you have to use rsync and scp to transfer from the ``wharf`` to Rackham. | ||
- The mounted directory will be kept for later sessions. | ||
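
Because only the mounted wharf directory keeps changes on transit, it can help to check that the mount point exists before copying. The sketch below assumes the wharf was mounted in your home directory under the project name, as in the example above. `copy_to_wharf` is a hypothetical helper, not a tool provided on transit.

```bash
# Hypothetical helper (not an UPPMAX tool): copy into a mounted wharf directory,
# but refuse to run if the mount point does not exist.
copy_to_wharf() {
  local source_path="$1" wharf_dir="$2"
  if [ ! -d "$wharf_dir" ]; then
    echo "No directory at $wharf_dir: mount the wharf first" >&2
    return 1
  fi
  # -r also copies folders and their subfolders
  cp -r "$source_path" "$wharf_dir"/
}

# Example call, assuming the wharf was mounted at ~/sens2023531:
# copy_to_wharf path/my_files ~/sens2023531
```

The check only tests that the directory exists; it cannot tell whether the directory is really a mounted wharf.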
### Exercise 2: practice pair programming | ||
|
||
### Moving data from transit to Rackham | ||
- **On Rackham:** (_or other computer_) copy files to Bianca via transit: | ||
```bash | ||
# scp | ||
scp path/my_files [email protected]:sens2023531/ | ||
???- info "Learning objectives" | ||
|
||
# rsync | ||
rsync -avh path/my_files [email protected]:sens2023531/ | ||
``` | ||
- Practice pair programming | ||
- Practice converting class diagrams to real code | ||
|
||
- **On transit:** copy files to Bianca from Rackham (_or other computer_) | ||
```bash | ||
# scp | ||
scp [email protected]:path/my_files ~/sens2023531/ | ||
Before doing the exercises: | ||
|
||
# rsync | ||
rsync -avh [email protected]:path/my_files ~/sens2023531/ | ||
``` | ||
- Reach an agreement on how to do pair programming: among others, | ||
decide upon the first driver and when to switch roles. | ||
|
||
:book: `rsync` [tutorial](https://www.digitalocean.com/community/tutorials/how-to-use-rsync-to-sync-local-and-remote-directories) for beginners. | ||
The exercise, to be done as a pair: | ||
|
||
:warning: Keep in mind that project folders on Rackham are not available on transit. | ||
- In the course's shared document, there is a list of classes | ||
extracted from the design document. Assign yourselves to write a class together | ||
- Find the GitHub repository of a Programming Formalism student project | ||
done in an earlier cohort. Find where the Python code for classes ended up. | ||
Look for the Python code of the simplest class. | ||
- Write the minimal code of your class together. | ||
Share code by `push`ing it to the `main` branch. | ||
'Minimal code' means only the name of the class, without any behavior! | ||
|
||
### Moving data between projects | ||
Reflect: | ||
|
||
- You can use transit to transfer data between projects by mounting the wharfs for the different projects and transferring data with ``rsync``. | ||
- Note that you may of course only do this if this is allowed (agreements, permissions, etc.) | ||
- Were roles swapped often enough? | ||
- Did you solve unexpected problems well? | ||
- Did the driver always share his/her screen? | ||
- Did each team member contribute? | ||
- Did each team member contribute to the code in the Python class? | ||
|
||
???- question "Answers to what needs to be done" | ||
|
||
### Software on Transit | ||
The hardest part will be to understand how little needs to be done here. | ||
|
||
- While logged in to Transit, you cannot make lasting changes to anything except for mounted wharf directories. However, anything you have added to your Rackham home directory is available on Transit. In addition, some modules are available. | ||
- SciLifeLab Data Delivery System - [https://delivery.scilifelab.se/](https://delivery.scilifelab.se/) | ||
A file needs to be created at `src/bacsim/[class_name].py`. | ||
For example, for a coordinate, | ||
this file will be called `src/bacsim/coordinate.py` | ||
|
||
```bash | ||
# Load the tool from the software module tree | ||
module load bioinfo-tools dds-cli | ||
The content of the file is, maybe unexpectedly, minimal. | ||
Here I show a good example from [an earlier Programming Formalisms cohort](https://github.com/programming-formalisms/programming_formalisms_project_autumn_2023/blob/main/src/pfpa2023/coordinate.py): | ||
|
||
# Run the tool | ||
dds | ||
``` | ||
![dds-cli](./img/dds-cli.png) | ||
```python | ||
"""A coordinate somewhere in space.""" | ||
|
||
- To download data from TCGA, log in to Rackham and install the GDC client to your home directory. Then log in to Transit, mount the wharf, and run ./gdc-client. | ||
class Coordinate: | ||
|
||
"""Where am I?.""" | ||
``` | ||
|
||
## NGI Deliver | ||
|
||
- Not covered here but | ||
- <https://www.uppmax.uu.se/support/user-guides/deliver-user-guide/> | ||
- <https://www.uppmax.uu.se/support/user-guides/grus-user-guide/> | ||
|
||
|
||
!!! info "Summary" | ||
|
@@ -232,17 +234,3 @@ rsync -avh [email protected]:path/my_files ~/sens2023531/ | |
- transit server | ||
- rsync, scp/sftp | ||
|
||
## Mounting the SFTP-server with ``sshfs`` on your local machine | ||
--- | ||
**Mount the wharf on your machine** | ||
|
||
- This is only possible on your own system. | ||
- ``sshfs`` allows you to mount the ``wharf`` on your own machine. | ||
- You will be able to copy and work on the data using your own local tools such as ``cp`` or ``vim``. | ||
- Remember that you are neither logged in on the distant server, nor is the data physically on your local disk (until you have copied it). | ||
|
||
!!! warning | ||
- UPPMAX doesn't have ``sshfs`` client package installed for security reasons. | ||
- ``sshfs`` is available on most Linux distributions: | ||
- install the package ``sshfs`` on Ubuntu, | ||
- ``fuse-sshfs`` on Fedora, RHEL7/CentOS7 (enable EPEL repository) and RHEL8 (enable codeready-builder repository) / CentOS8 (enable powertools repository). |