Code and data
Aaditya Dar edited this page Jul 17, 2019 · 7 revisions
The Zen of Python, by Tim Peters: "Long time Pythoneer Tim Peters succinctly channels the BDFL's guiding principles for Python's design into 20 aphorisms, only 19 of which have been written down."
- Beautiful is better than ugly.
- Explicit is better than implicit.
- Simple is better than complex.
- Complex is better than complicated.
- Flat is better than nested.
- Sparse is better than dense.
- Readability counts.
- Special cases aren't special enough to break the rules.
- Although practicality beats purity.
- Errors should never pass silently.
- Unless explicitly silenced.
- In the face of ambiguity, refuse the temptation to guess.
- There should be one-- and preferably only one --obvious way to do it.
- Although that way may not be obvious at first unless you're Dutch.
- Now is better than never.
- Although never is often better than right now.
- If the implementation is hard to explain, it's a bad idea.
- If the implementation is easy to explain, it may be a good idea.
- Namespaces are one honking great idea -- let's do more of those!
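
Two of the aphorisms above, "Errors should never pass silently" and "Unless explicitly silenced", can be illustrated with a minimal sketch. The function names `parse_age` and `parse_age_or_none` are hypothetical and chosen only for illustration; `contextlib.suppress` is part of the Python standard library.

```python
import contextlib


def parse_age(text):
    """Convert text to a non-negative integer age; bad input raises ValueError."""
    age = int(text)  # int() raises ValueError for non-numeric text
    if age < 0:
        raise ValueError(f"age must be non-negative, got {age}")
    return age


def parse_age_or_none(text):
    """Same as parse_age, but the ValueError is explicitly silenced."""
    # The suppression is visible at the call site, so it is a deliberate
    # choice rather than an error that passes silently.
    with contextlib.suppress(ValueError):
        return parse_age(text)
    return None
```

In this sketch the default path fails loudly, and the quiet variant has to be chosen by name. The full text of the Zen can be printed from any Python interpreter with `import this`.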
Some more rules copied from https://web.stanford.edu/~gentzkow/research/CodeAndData.pdf
- Automation
  - Automate everything that can be automated
  - Write a single script that executes all code from beginning to end
- Directories
  - Separate directories by function
  - Separate files into inputs and outputs
  - Make directories portable
- Keys
  - Store cleaned data in tables with unique, non-missing keys
  - Keep data normalized as far into your code pipeline as you can
- Abstraction
  - Abstract to eliminate redundancy
  - Abstract to improve clarity
  - Otherwise, don’t abstract
- Documentation
  - Don’t write documentation you will not maintain
  - Code should be self-documenting
- Management
  - Manage tasks with a task management system
  - E-mail is not a task management system
- Code Style
  - Make your functions shy
  - Order your functions for linear reading
  - Use descriptive names
  - Pay special attention to coding algebra
  - Make logical switches intuitive
  - Be consistent
  - Check for errors
Source: Code and Data for the Social Sciences: A Practitioner's Guide
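
The Keys rule ("unique, non-missing keys") and the Code Style rule "Check for errors" can be sketched together. This is a minimal illustration, not code from the guide: the helper name `check_key` and the toy rows are assumptions, and a table is represented here simply as a list of dictionaries.

```python
from collections import Counter


def check_key(rows, key):
    """Raise ValueError unless `key` is present, non-empty, and unique across rows."""
    values = [row.get(key) for row in rows]
    # A key counts as missing if it is absent, None, or an empty string.
    missing = [i for i, value in enumerate(values) if value is None or value == ""]
    if missing:
        raise ValueError(f"key {key!r} is missing in row(s) {missing}")
    duplicates = [value for value, count in Counter(values).items() if count > 1]
    if duplicates:
        raise ValueError(f"key {key!r} has duplicate value(s) {duplicates}")


# Toy rows, invented for illustration only.
clean = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
check_key(clean, "id")  # no exception: every id is present and unique
```

Running a check like this right after each cleaning step means a broken key stops the pipeline immediately instead of surfacing later as a bad merge.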
Computing practices from https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510
- Data management
  - Save the raw data.
  - Ensure that raw data are backed up in more than one location.
  - Create the data you wish to see in the world.
  - Create analysis-friendly data.
  - Record all the steps used to process data.
  - Anticipate the need to use multiple tables, and use a unique identifier for every record.
  - Submit data to a reputable DOI-issuing repository so that others can access and cite it.
- Software
  - Place a brief explanatory comment at the start of every program.
  - Decompose programs into functions.
  - Be ruthless about eliminating duplication.
  - Always search for well-maintained software libraries that do what you need.
  - Test libraries before relying on them.
  - Give functions and variables meaningful names.
  - Make dependencies and requirements explicit.
  - Do not comment and uncomment sections of code to control a program's behavior.
  - Provide a simple example or test data set.
  - Submit code to a reputable DOI-issuing repository.
- Collaboration
  - Create an overview of your project.
  - Create a shared "to-do" list for the project.
  - Decide on communication strategies.
  - Make the license explicit.
  - Make the project citable.
- Project organization
  - Put each project in its own directory, which is named after the project.
  - Put text documents associated with the project in the doc directory.
  - Put raw data and metadata in a data directory and files generated during cleanup and analysis in a results directory.
  - Put project source code in the src directory.
  - Put external scripts or compiled programs in the bin directory.
  - Name all files to reflect their content or function.
- Keeping track of changes
  - Back up (almost) everything created by a human being as soon as it is created.
  - Keep changes small.
  - Share changes frequently.
  - Create, maintain, and use a checklist for saving and sharing changes to the project.
  - Store each project in a folder that is mirrored off the researcher's working machine.
  - Add a file called CHANGELOG.txt to the project's docs subfolder.
  - Copy the entire project whenever a significant change has been made.
  - Use a version control system.
- Manuscripts
  - Write manuscripts using online tools with rich formatting, change tracking, and reference management.
  - Write the manuscript in a plain text format that permits version control.
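
The project-organization rules above name five subdirectories (doc, data, results, src, bin). A minimal sketch of creating that layout with the standard library follows; the function name `scaffold_project` is hypothetical, and only the directory names come from the list.

```python
from pathlib import Path

# Subdirectory names taken from the project-organization rules.
SUBDIRS = ("doc", "data", "results", "src", "bin")


def scaffold_project(parent, name):
    """Create parent/name with one subdirectory per entry in SUBDIRS."""
    project = Path(parent) / name
    for sub in SUBDIRS:
        # exist_ok=True makes the call safe to repeat on an existing project.
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project
```

Calling, for example, `scaffold_project(".", "my_project")` would create the layout under the current directory. Because existing directories are left untouched, the function can be rerun without harm.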