Home
Stratify subjects are associated with four recruitment centres (LONDON, SOUTHAMPTON, BERLIN and AACHEN) and are further divided into patients and controls.
Only acquisition centres can convert between identifying recruitment data and PSC1 codes.
We associate a specific prefix P with the PSC1 codes of each of the resulting classes:
| | Patient | Control |
|---|---|---|
| LONDON | 010001 (600 subjects) | 010000 (50 subjects) |
| SOUTHAMPTON | 090001 (600 subjects) | 090000 (400 subjects) |
| Total | 1200 | 450 |
After discarding 100 LONDON patient codes and adding 250 BERLIN patient codes on 21/12/2017:
| | Patient | Control |
|---|---|---|
| LONDON | 010001 (500 subjects) | 010000 (50 subjects) |
| SOUTHAMPTON | 090001 (600 subjects) | 090000 (400 subjects) |
| BERLIN | 040001 (250 subjects) | |
| Total | 1350 | 450 |
After adding 100 AACHEN codes (40 patients and 60 controls) on 17/09/2018:
| | Patient | Control |
|---|---|---|
| LONDON | 010001 (500 subjects) | 010000 (50 subjects) |
| SOUTHAMPTON | 090001 (600 subjects) | 090000 (400 subjects) |
| BERLIN | 040001 (250 subjects) | |
| AACHEN | 091001 (40 subjects) | 091000 (60 subjects) |
| Total | 1390 | 510 |
After discarding 1 LONDON patient code to deal with a patient who was mistakenly assigned a control code:
| | Patient | Control |
|---|---|---|
| LONDON | 010001 (499 subjects) | 010000 (50 subjects) |
| SOUTHAMPTON | 090001 (600 subjects) | 090000 (400 subjects) |
| BERLIN | 040001 (250 subjects) | |
| AACHEN | 091001 (40 subjects) | 091000 (60 subjects) |
| Total | 1389 | 510 |
After discarding 5 LONDON patient codes to deal with 5 patients who were mistakenly assigned control codes:
| | Patient | Control |
|---|---|---|
| LONDON | 010001 (494 subjects) | 010000 (50 subjects) |
| SOUTHAMPTON | 090001 (600 subjects) | 090000 (400 subjects) |
| BERLIN | 040001 (250 subjects) | |
| AACHEN | 091001 (40 subjects) | 091000 (60 subjects) |
| Total | 1384 | 510 |
After adding 200 new LONDON patient codes in 2021:
| | Patient | Control |
|---|---|---|
| LONDON | 010001 (494 subjects), 010002 (62 subjects), 010003 (53 subjects), 010004 (20 subjects), 010005 (65 subjects) | 010000 (50 subjects) |
| SOUTHAMPTON | 090001 (600 subjects) | 090000 (400 subjects) |
| BERLIN | 040001 (250 subjects) | |
| AACHEN | 091001 (40 subjects) | 091000 (60 subjects) |
| Total | 1584 | 510 |
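The totals in the last row can be re-derived from the per-prefix counts. The following is only a small Python sketch using the counts from the table above; the `counts` layout and the `totals` helper are illustrative names, not part of the project scripts.

```python
# Per-prefix subject counts copied from the table above.
# The nested-dictionary layout is an assumption made for illustration.
counts = {
    "LONDON": {
        "patient": {"010001": 494, "010002": 62, "010003": 53,
                    "010004": 20, "010005": 65},
        "control": {"010000": 50},
    },
    "SOUTHAMPTON": {"patient": {"090001": 600}, "control": {"090000": 400}},
    "BERLIN": {"patient": {"040001": 250}, "control": {}},
    "AACHEN": {"patient": {"091001": 40}, "control": {"091000": 60}},
}


def totals(counts):
    """Sum the number of codes per class across all centres."""
    result = {"patient": 0, "control": 0}
    for classes in counts.values():
        for name, prefixes in classes.items():
            result[name] += sum(prefixes.values())
    return result


print(totals(counts))  # {'patient': 1584, 'control': 510}
```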
Please note that LONDON control subjects should be recruited from the Imagen cohort, so we do not need to generate new specific pseudonyms for them: just re-use the existing Imagen pseudonym codes. A limited set of 50 new such codes has nevertheless been generated, in case some LONDON control subjects are recruited outside the Imagen cohort.
Pseudonyms are generated for all subjects of the above defined classes. These 12-digit codes are a concatenation of:
- a prefix P made of 6 digits, as documented in the table above,
- a main code C made of 5 digits, unique across all subjects (whether Imagen or Stratify),
- a check digit D made of a single digit, and obtained by applying the Damm algorithm to the concatenation of P and C, to detect invalid codes.
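The structure described above (6-digit prefix, 5-digit main code, Damm check digit) can be sketched in Python as follows. This is not the code of `stratify_generate_psc1.py`: the function names are hypothetical, and the quasigroup table is the commonly published one for the Damm algorithm, which is an assumption since the project scripts may use a different table.

```python
# Commonly published order-10 totally anti-symmetric quasigroup for the
# Damm algorithm. Assumption: the project scripts may use another table.
DAMM_TABLE = (
    (0, 3, 1, 7, 5, 9, 8, 6, 4, 2),
    (7, 0, 9, 2, 1, 5, 4, 8, 6, 3),
    (4, 2, 0, 6, 8, 7, 1, 3, 5, 9),
    (1, 7, 5, 0, 9, 8, 3, 4, 2, 6),
    (6, 1, 2, 3, 0, 4, 5, 9, 7, 8),
    (3, 6, 7, 4, 2, 0, 9, 5, 8, 1),
    (5, 8, 6, 9, 7, 2, 0, 1, 3, 4),
    (8, 9, 4, 5, 3, 6, 2, 0, 1, 7),
    (9, 4, 3, 8, 6, 1, 7, 2, 0, 5),
    (2, 5, 8, 1, 4, 3, 6, 7, 9, 0),
)


def damm_check_digit(digits: str) -> int:
    """Return the Damm check digit of a string of decimal digits."""
    interim = 0
    for ch in digits:
        interim = DAMM_TABLE[interim][int(ch)]
    return interim


def make_psc1(prefix: str, main: int) -> str:
    """Concatenate prefix P (6 digits), main code C (5 digits) and check digit D."""
    if len(prefix) != 6 or not (prefix.isascii() and prefix.isdigit()):
        raise ValueError("prefix must be exactly 6 digits")
    if not 0 <= main <= 99999:
        raise ValueError("main code must fit in 5 digits")
    body = f"{prefix}{main:05d}"
    return body + str(damm_check_digit(body))


def is_valid(code: str) -> bool:
    """A 12-digit code is valid when running Damm over all digits yields 0."""
    return (len(code) == 12 and code.isascii() and code.isdigit()
            and damm_check_digit(code) == 0)
```

Because a valid code reduces to 0 when the check digit is included, validation needs no separate comparison step: `is_valid(make_psc1("010001", 123))` holds, while changing any single digit makes the check fail.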
We make sure the Damerau–Levenshtein distance between the concatenation of C and D for any two subjects is at least 3, in order to mitigate the risk of manual input errors.
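As an illustration of the distance constraint, here is a minimal Python sketch of the Damerau–Levenshtein distance (the unrestricted variant, where transposed characters may be edited further) together with a filter that accepts a candidate only if it is far enough from every code already accepted. The names `damerau_levenshtein` and `far_enough` are hypothetical, and how the actual scripts compute the distance is not documented here.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Unrestricted Damerau-Levenshtein distance between two strings."""
    last_row = {}                      # last row in which each character of a was seen
    maxdist = len(a) + len(b)
    # The matrix is shifted by one in both directions to hold sentinel values.
    d = [[0] * (len(b) + 2) for _ in range(len(a) + 2)]
    d[0][0] = maxdist
    for i in range(len(a) + 1):
        d[i + 1][0] = maxdist
        d[i + 1][1] = i
    for j in range(len(b) + 1):
        d[0][j + 1] = maxdist
        d[1][j + 1] = j
    for i in range(1, len(a) + 1):
        last_col = 0                   # last column where a[i-1] matched in b
        for j in range(1, len(b) + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,                            # substitution
                d[i + 1][j] + 1,                           # insertion
                d[i][j + 1] + 1,                           # deletion
                d[k][l] + (i - k - 1) + 1 + (j - l - 1),   # transposition
            )
        last_row[a[i - 1]] = i
    return d[len(a) + 1][len(b) + 1]


def far_enough(candidate: str, accepted, min_distance: int = 3) -> bool:
    """True if candidate is at least min_distance away from every accepted code."""
    return all(damerau_levenshtein(candidate, other) >= min_distance
               for other in accepted)
```

With a minimum distance of 3, a single typo or a swap of two adjacent digits (distance 1) in the C and D part cannot turn one subject's code into another's.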
We ran the Python script `stratify_generate_psc1.py` as follows:

    stratify_generate_psc1.py | sort > stratify_codes_2017-07-20.txt
    stratify_generate_psc1_berlin.py | sort > stratify_codes_berlin_2017-12-13.txt
    stratify_generate_psc1_aachen.py | sort > stratify_psc2_aachen_2018-09-17.txt
    stratify_generate_psc1_london_2021.py | sort | head -200 > stratify_codes_2021-11-16.txt
Only NeuroSpin, acting as a trusted third party, can convert between PSC1 and PSC2 codes.
We associate a specific prefix to the PSC2 codes of patients and controls:
| | Prefix |
|---|---|
| Patient | 0001 |
| Control | 0000 |
This is consistent with Imagen:
- Imagen subjects already use `0000` as a PSC2 prefix.
- Some LONDON Imagen subjects will be used as Stratify controls.
The 12-digit PSC2 pseudonym codes are a concatenation of:
- a prefix P made of 4 digits, as documented in the table above,
- a main code C made of 7 digits, unique across all subjects (whether Imagen, c-VEDA or Stratify),
- a check digit D made of a single digit, and obtained by applying the Damm algorithm to the concatenation of P and C, to detect invalid codes.
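The 4 + 7 + 1 layout can be checked mechanically. The following Python sketch splits a PSC2 code into its three parts and maps the prefix to a class; `split_psc2` is a hypothetical helper, the sample value is a made-up placeholder rather than a real pseudonym, and the check digit is only extracted here, not verified.

```python
import re

# 4-digit prefix, 7-digit main code, 1 check digit, as described above.
PSC2_PATTERN = re.compile(r"(?P<prefix>[0-9]{4})(?P<main>[0-9]{7})(?P<check>[0-9])")

# Prefix table from this page. Note that 0000 is shared with Imagen subjects.
PSC2_CLASSES = {"0001": "patient", "0000": "control"}


def split_psc2(code: str):
    """Return (class, main code, check digit) for a 12-digit PSC2 code."""
    match = PSC2_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a 12-digit code: {code!r}")
    prefix = match["prefix"]
    if prefix not in PSC2_CLASSES:
        raise ValueError(f"unknown PSC2 prefix: {prefix}")
    return PSC2_CLASSES[prefix], match["main"], int(match["check"])


# Placeholder value for illustration only, not a real pseudonym.
print(split_psc2("000112345678"))  # ('patient', '1234567', 8)
```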
We make sure the Damerau–Levenshtein distance between the concatenation of C and D for any two subjects is at least 3, in order to mitigate the risk of manual input errors.
We ran the Python script `stratify_generate_psc2.py` as follows:

    stratify_generate_psc2.py | sort > stratify_psc2_2017-07-28.txt
Create separate PSC1 and PSC2 files for each patient/control class, and shuffle PSC2 files so that the conversion table cannot be inferred:

    grep -e '^0[19]0000' stratify_codes_2017-07-20.txt > controls_psc1.txt
    grep -e '^0[19]0001' stratify_codes_2017-07-20.txt > patients_psc1.txt
    grep -e '^0000' stratify_psc2_2017-07-28.txt | shuf > controls_psc2.txt
    grep -e '^0001' stratify_psc2_2017-07-28.txt | shuf > patients_psc2.txt
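Both generated files are sorted, so pairing them line by line without shuffling would map the n-th smallest PSC1 code to the n-th smallest PSC2 code, and the mapping could be reconstructed from the sorted lists alone. A rough Python equivalent of the `grep | shuf` step is sketched below; `select_and_shuffle` is a hypothetical name, and using `random.SystemRandom` (operating-system entropy) is a design choice of this sketch, not something the commands above prescribe.

```python
import random


def select_and_shuffle(codes, prefix, rng=None):
    """Keep the codes starting with prefix and return them in random order."""
    rng = rng or random.SystemRandom()   # OS-provided randomness, not seedable
    selected = [code for code in codes if code.startswith(prefix)]
    rng.shuffle(selected)
    return selected


# Placeholder values for illustration only, not real pseudonyms.
psc2_codes = ["000000000011", "000000000022", "000100000033", "000100000044"]
patients_psc2 = select_and_shuffle(psc2_codes, "0001")
controls_psc2 = select_and_shuffle(psc2_codes, "0000")
```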
Create the Stratify conversion table using dummy `000000` DAWBA codes for now, append this new Stratify table to the existing Imagen conversion table, and finally delete temporary files:

    paste -d '=' controls_psc1.txt controls_psc2.txt | sed 's/=/=000000=/' > controls.txt
    paste -d '=' patients_psc1.txt patients_psc2.txt | sed 's/=/=000000=/' > patients.txt
    rm controls_psc1.txt patients_psc1.txt controls_psc2.txt patients_psc2.txt
    cat controls.txt patients.txt | sort >> psc2psc.csv
    rm controls.txt patients.txt
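The `paste` and `sed` commands above produce lines of the form `PSC1=000000=PSC2`. The same step can be sketched in Python; `build_conversion_lines` is a hypothetical helper and the sample codes are placeholders. One deliberate difference: `paste` keeps going with empty fields when its inputs differ in length, whereas this sketch refuses to pair lists of unequal length.

```python
def build_conversion_lines(psc1_codes, psc2_codes, dawba="000000"):
    """Pair PSC1 and PSC2 codes position by position as PSC1=DAWBA=PSC2 lines."""
    if len(psc1_codes) != len(psc2_codes):
        raise ValueError("PSC1 and PSC2 lists must have the same length")
    lines = [f"{psc1}={dawba}={psc2}"
             for psc1, psc2 in zip(psc1_codes, psc2_codes)]
    return sorted(lines)   # mirrors the final `sort` before appending


# Placeholder values for illustration only, not real pseudonyms.
lines = build_conversion_lines(
    ["090001000022", "010001000011"],
    ["000100000044", "000100000033"],
)
for line in lines:
    print(line)
```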