General Naming Conventions and Typing #3

Open
miili opened this issue Nov 3, 2022 · 8 comments

@miili
Member

miili commented Nov 3, 2022

Hello all,

first off, thank you for this first draft.

I know, I know, Fortran once had a limit of 6 characters for any variable name.
Please consider using clear, self-explanatory variable names:

DASFileVersion  -> version
domain          -> data_unit
t0              -> start_time
dt              -> sampling_period
GL              -> gauge_length
lats            -> latitudes
longs           -> longitudes
elev            -> elevations
meta            -> additional_data
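
A minimal sketch of how this renaming could be applied to an existing metadata dictionary. The `RENAMES` table copies the mapping above; the helper `rename_keys` and the sample header values are hypothetical and not part of the project.

```python
# Old name -> proposed name, copied from the list above.
RENAMES = {
    "DASFileVersion": "version",
    "domain": "data_unit",
    "t0": "start_time",
    "dt": "sampling_period",
    "GL": "gauge_length",
    "lats": "latitudes",
    "longs": "longitudes",
    "elev": "elevations",
    "meta": "additional_data",
}


def rename_keys(header: dict) -> dict:
    """Return a copy of `header` with old keys replaced by the proposed names.

    Keys without an entry in RENAMES are passed through unchanged.
    """
    return {RENAMES.get(key, key): value for key, value in header.items()}


# Hypothetical header values, for illustration only.
old_header = {"DASFileVersion": 1, "t0": 0.0, "dt": 0.001, "GL": 10.0}
new_header = rename_keys(old_header)
print(new_header)
```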

miili changed the title from "General Naming Conventions" to "General Naming Conventions and Typing" on Nov 3, 2022

@miili
Member Author

miili commented Nov 3, 2022

Please use confined dataclasses (https://docs.python.org/3/library/dataclasses.html) for the metadata. Here is a proposal:

from dataclasses import dataclass
from typing import Any, Literal

StrainUnit = Literal["m/m", "cm/m", "nm/m"]


@dataclass
class DASMeta:
    version: int
    data_unit: StrainUnit
    start_time: float
    sampling_period: float
    gauge_length: float
    latitudes: list[float]
    longitudes: list[float]
    elevations: list[float]
    additional_data: dict[str, Any]

    def endtime(self, nsamples: int) -> float:
        ...
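
Note that a dataclass does not check `Literal` annotations at runtime, so a misspelled unit would pass silently unless a static type checker is used. The sketch below shows one possible way to fill in the proposal: it validates `data_unit` in `__post_init__` and implements `endtime`. The class name `CheckedDASMeta`, the positivity check on `sampling_period`, and the convention that `endtime` is the time of the last sample are assumptions, not part of the proposal above.

```python
from dataclasses import dataclass
from typing import Any, Literal, get_args

StrainUnit = Literal["m/m", "cm/m", "nm/m"]


@dataclass
class CheckedDASMeta:
    version: int
    data_unit: StrainUnit
    start_time: float
    sampling_period: float
    gauge_length: float
    latitudes: list[float]
    longitudes: list[float]
    elevations: list[float]
    additional_data: dict[str, Any]

    def __post_init__(self) -> None:
        # Literal is only a hint for static checkers, so check it explicitly.
        allowed = get_args(StrainUnit)
        if self.data_unit not in allowed:
            raise ValueError(
                f"data_unit must be one of {allowed}, got {self.data_unit!r}"
            )
        if self.sampling_period <= 0:
            raise ValueError("sampling_period must be positive")

    def endtime(self, nsamples: int) -> float:
        # Assumed convention: the time of the last sample, not one period after it.
        return self.start_time + (nsamples - 1) * self.sampling_period


# Illustrative values only.
meta = CheckedDASMeta(
    version=1,
    data_unit="nm/m",
    start_time=0.0,
    sampling_period=0.5,
    gauge_length=10.0,
    latitudes=[],
    longitudes=[],
    elevations=[],
    additional_data={},
)
print(meta.endtime(nsamples=5))  # 2.0
```

Constructing the object with an unsupported unit such as `"mm"` raises a `ValueError` instead of producing a silently invalid header.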

@andreas-wuestefeld
Collaborator

I chose readability over efficiency.
In my experience, your suggestion of classes increases the barrier to entry. For many students this might be their first contact with programming.

I envision this as a reference reader, not an optimal, super-duper high-class reader. It should help people understand the data format.

But I am open to arguments against this approach.

@andreas-wuestefeld
Collaborator

Regarding variable names, I am just lazy about typing :-)
I understand the argument for descriptive names.

Let's see what the community thinks

@jpmorten-asn

jpmorten-asn commented Nov 4, 2022

My preference is definitely for writing out the variable names, using underscores in place of spaces. This can avoid a lot of misunderstandings and makes it possible to discover the structure of the data even when documentation is not available (lost or forgotten). I think one aim of the project was indeed to create a discoverable format.

@miili
Member Author

miili commented Nov 4, 2022

> In my experience, your suggestion of classes increases the barrier to entry. For many students this might be their first contact with programming.

@andreas-wuestefeld, if we are looking for a sustainable DAS data format, we need an elaborate concept. Designing a data format is not a task for students or beginner programmers. We need performant I/O (layout) and efficient storage (compression).

> I envision this as a reference reader, not an optimal, super-duper high-class reader.

A sustainable data format should be super-duper efficient and versatile!

> It should help people understand the data format.

A user does not need to understand a data format. All its complexity has to be abstracted away by a reference library. This is why libraries such as ObsPy (libmseed), libjpeg, and libhdf5 are so successful.

The fundamental question is whether we are looking for a serious DAS data format implemented by IRIS which can be used for

  1. performant data analysis,
  2. efficient archiving,
  3. possibly streaming and
  4. querying online repositories (similar to FDSNWS)

or an HDF5 structure for project-internal exchange in February.

@andreas-wuestefeld
Collaborator

@miili I learned yesterday evening, in response to publishing this format, that IRIS is actually working on or considering a format.
It may well be that this format is rather short-lived, although I hope it will prove its worth.

I thus changed the potentially misleading name from IRIS (as part of the IRIS RCN efforts) to the more general miniDAS. The repo name is still the same but will hopefully be fixed over the weekend.

At this point, I feel it is most important to have a common format for the global month, ideal or not.
Your input is very good, and I am happy to hear these comments from someone obviously more familiar with the deep down programming features.

Maybe you can just point out the easiest-to-fix issues here so they can be implemented (space vs. time, for example?). Variable names can obviously also be adjusted.

@andreas-wuestefeld
Collaborator

Implemented.
Comments on the new names are welcome.

@dcbowden

dcbowden commented Dec 2, 2022

I'm a month late to these discussions. I agree with @andreas-wuestefeld that the formal object-oriented structure is going to be a bit harder for many of us academics to deal with; I also had to wrap my head around how to work with it. That said, I agree with @miili that it could be OK, in that most academics & students don't need to worry about the internals! We just need some very user-friendly demos before February. Maybe Jupyter Notebooks? Not just a README list of headers and function inputs/outputs, but a full step-by-step guide showing how to load some interrogator's raw output (Silixa, Febus, whatever), declare the metadata object, use the from_numpy() function to eventually save the proper output, etc.
