Practicalities of data handling
Scientific topics: Data management, Data acquisition, Data security
Operations: Data handling
Keywords: data encryption, data protection, setup, checksums, Data transfer, Data storage, data organisation
Resource type: Presentation
https://doi.org/10.5281/zenodo.4068227
http://tess.elixir-uk.org/materials/practicalities-of-data-handling
Slides for the "Practicalities of data handling" session of the "Research Data Management and Stewardship Training" held regularly by ELIXIR-Luxembourg.
Authors: Vilem Ded, Pinar Alper, Roland Krause, Nene Barry
Target audience: Researchers, PhD candidates, Master students
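The keywords for this session include checksums and data transfer. The slides are not reproduced here, so the following is only an illustrative sketch under that assumption. It computes a SHA-256 checksum of a file with Python's standard `hashlib` module, which lets you compare a file before and after a transfer. The function names `sha256_of_file` and `verify_transfer` and the 1 MiB chunk size are illustrative choices, not taken from the session.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal SHA-256 digest of the file at `path`.

    The file is read in fixed-size chunks so that memory use stays
    constant even for very large data files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(original, copy):
    """Return True if the two files have identical SHA-256 checksums."""
    return sha256_of_file(original) == sha256_of_file(copy)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "source.csv"
        target = Path(tmp) / "copy.csv"
        source.write_bytes(b"record_id,weight\n1,42\n")
        target.write_bytes(source.read_bytes())  # stands in for a transfer
        print(sha256_of_file(source))
        print("transfer verified:", verify_transfer(source, target))
```

In practice, the sender publishes the checksum next to the data and the recipient recomputes it after download. Any mismatch means the copy is incomplete or corrupted.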
Reproducible analysis
Scientific topics: Data architecture, analysis and design
Keywords: Workflows, Literate programming, Reproducible Science, Data analysis
Resource type: Presentation
https://doi.org/10.5281/zenodo.4071505
http://tess.elixir-uk.org/materials/reproducible-analysis
Slides for the "Reproducible analysis" session of the "Best practices in research data management and stewardship" held regularly by ELIXIR-Luxembourg.
Authors: Roland Krause, Pinar Alper, Vilem Ded
Target audience: PhD candidates, Researchers
Data Management Planning
Keywords: Data management plan
Resource type: Presentation
https://doi.org/10.5281/zenodo.3609525
http://tess.elixir-uk.org/materials/data-management-planning
Slides for the "Data Management Planning" session of the "Research Data Management and Stewardship Training" held regularly by ELIXIR-Luxembourg.
Authors: Pinar Alper
Target audience: Researchers, data stewards / data managers, PhD candidates, Master students
Introduction to FAIR principles
Keywords: FAIR data
Resource type: Presentation
https://doi.org/10.5281/zenodo.3609967
http://tess.elixir-uk.org/materials/introduction-to-fair-principles
Slides for the "Introduction to FAIR Principles" session of the "Research Data Management and Stewardship Training" held regularly by ELIXIR-Luxembourg.
Authors: Pinar Alper, Roland Krause
Target audience: Researchers, data stewards / data managers, PhD candidates, Master students
Implementing GDPR in a Biomedical Research Institute
Scientific topics: Data management, Data governance
Keywords: GDPR, data protection, sensitive data
https://doi.org/10.5281/zenodo.3609974
http://tess.elixir-uk.org/materials/implementing-gdpr-in-a-biomedical-research-institute
Slides for the "Data protection, working with sensitive human data in research" session of the "Research Data Management and Stewardship Training" held regularly by ELIXIR-Luxembourg.
Authors: Pinar Alper, Regina Becker, Vilem Ded
Target audience: Researchers, data stewards, data managers
ELIXIR Webinar: Requirements in data protection law and the upcoming General Data Protection Regulation (GDPR) implementation
Keywords: GDPR
Resource type: Webinar
https://www.elixir-europe.org/events/webinar-gdpr
http://tess.elixir-uk.org/materials/elixir-webinar-requirements-in-data-protection-law-and-the-upcoming-general-data-protection-regulation-gdpr-implementation
Regina Becker (ELIXIR Luxembourg) will give an introduction to the upcoming General Data Protection Regulation (GDPR), which will come into force on 25 May 2018. The new regulation has considerable consequences for managing and storing personal data, imposing more responsibility on data controllers and data processors.
During the webinar, relevant parts of the legislation for research and data sharing will be highlighted. Practical consequences for research institutions and in particular ELIXIR Nodes will be explained, and measures outlined that should be implemented for compliance. In addition, challenges and open questions will be addressed and suggestions made on how to jointly tackle these challenges in ELIXIR.
This webinar is of interest for data stewards, Technical Coordinators, and everyone interested in the requirements for processing data for research under the GDPR.
Authors: Regina Becker
Python for Social Science Data: Instructor Notes
http://datacarpentry.github.io/python-socialsci/guide/
http://tess.elixir-uk.org/materials/python-for-social-science-data-instructor-notes
pip is mentioned in the text, but it should not need to be used. It is assumed that Jupyter notebooks will be used for all of the coding (the shell is used only to explain the REPL). The setup instructions explain how to start Jupyter. All of the datasets used have been placed in the data folder. Learners should download them to their local machine before use.
R for Social Scientists: Instructor Notes
http://datacarpentry.github.io/r-socialsci/guide/
http://tess.elixir-uk.org/materials/r-for-social-scientists-instructor-notes
This lesson uses SAFI_clean.csv. The direct download link for this file is https://ndownloader.figshare.com/files/11492171. When the lesson reaches the point where this file is used, we recommend that instructors place the download.file() command in the Etherpad. Learners can then copy and paste it into their scripts to download the file directly from figshare into their working directory. If learners haven’t created the data/ directory and/or are not in the correct working directory, the download.file() command will produce an error. It is therefore important to use the sticky notes at this point.
Some learners may have previous R installations. On Mac, if a new install is performed, the learner’s system will create a symbolic link pointing to the new install as ‘Current’. Sometimes this does not happen. A new R is then installed and can be accessed via the R console, but RStudio does not find it. As a result, the learner’s RStudio will be running an older R install, which will cause package installations to fail. This can be fixed at the terminal. First, check for the appropriate R installation in the library; we are currently using R 3.x.y. If it isn’t there, the learner will need to install it. If it is present, you will need to set the symbolic link Current to point to the 3.x.y directory:
Then restart RStudio.
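The notes explain that download.file() fails when the data/ directory is missing or learners are in the wrong working directory. The lesson itself uses R. The following is only a Python sketch of the same safeguard: create data/ first, then download with the standard library's `urllib.request.urlretrieve`. The function names and the `data` default are hypothetical, chosen for illustration.

```python
import os
import urllib.request

# Direct download link for SAFI_clean.csv, as given in the notes above.
SAFI_URL = "https://ndownloader.figshare.com/files/11492171"


def prepare_destination(filename, data_dir="data"):
    """Create `data_dir` if it is missing and return the path for `filename`.

    Creating the folder first avoids the error learners hit when data/
    does not exist relative to their current working directory.
    """
    os.makedirs(data_dir, exist_ok=True)
    return os.path.join(data_dir, filename)


def download_to_data_dir(url, filename, data_dir="data"):
    """Download `url` to data_dir/filename and return the local path."""
    destination = prepare_destination(filename, data_dir)
    # Showing the absolute path helps diagnose "wrong folder" problems.
    print("Saving to", os.path.abspath(destination))
    urllib.request.urlretrieve(url, destination)
    return destination


# Usage (needs network access):
# download_to_data_dir(SAFI_URL, "SAFI_clean.csv")
```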
Data Organization in Spreadsheets for Social Scientists: Instructor Notes
http://datacarpentry.github.io/spreadsheets-socialsci/guide/
http://tess.elixir-uk.org/materials/data-organization-in-spreadsheets-for-social-scientists-instructor-notes
The challenge with this lesson is that the instructor’s version of the spreadsheet software is going to look different from about half the room’s. That makes it hard to show where menu options are and how to navigate them. Instead, discuss the concepts of quality control, and how things like sorting can help you find outliers in your data.
Provide information on setting up your environment so that learners can follow your live coding (increasing text size, changing text color, etc.). Also give general recommendations for working with coding tools to best suit the learning environment.
The main challenge with this lesson is that Excel looks, and even works, very differently between Mac and PC and between versions of Excel. The presenter’s environment will therefore match only some of the learners’. We need better notes and screenshots of how things work on both Mac and PC, but we likely won’t be able to cover all the different versions of Excel.
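The notes suggest showing how sorting helps find outliers. As a software-independent illustration, the following minimal sketch sorts a column so that suspicious extremes collect at either end. The values are hypothetical and do not come from the lesson data.

```python
def extremes_after_sort(values, n=2):
    """Return the n smallest and n largest values after sorting.

    Entry errors (a -999 placeholder, an extra digit) usually end up at
    one end of a sorted column, which is why sorting helps spot them.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ordered = sorted(values)
    return ordered[:n], ordered[-n:]


# Hypothetical household sizes containing two suspicious entries.
household_sizes = [4, 6, 5, -999, 7, 3, 60, 8]
smallest, largest = extremes_after_sort(household_sizes)
print("smallest:", smallest)  # smallest: [-999, 3]
print("largest:", largest)    # largest: [8, 60]
```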
OpenRefine for Social Science Data: Instructor Notes
http://datacarpentry.github.io/openrefine-socialsci/guide/
http://tess.elixir-uk.org/materials/openrefine-for-social-science-data-instructor-notes
There is a separate file with the setup instructions for installing OpenRefine (setup).
Episodes: Introduction; Working with OpenRefine; Filtering and Sorting; Examining Numbers in OpenRefine
Cloud Genomics: Instructor Notes (Cloud Genomics Pre-Workshop; During the workshop)
http://datacarpentry.github.io/cloud-genomics/guide/
http://tess.elixir-uk.org/materials/cloud-genomics-instructor-notescloud-genomics-pre-workshopduring-the-workshop
VM Image Directories
A high-level listing of the directory tree from the dcuser account is shown below. Please note that this may change over time, but we’ll try to remember to update this doc. A couple of instances died while we were going through our workshop.
Shell Genomics: Instructor Notes
http://datacarpentry.github.io/shell-genomics/guide/
http://tess.elixir-uk.org/materials/shell-genomics-instructor-notes
This lesson will introduce learners to fundamental skills needed for working with their computers through a command-line interface (using the bash shell). They will learn how to navigate their file system, computationally manipulate their files (e.g. copying, moving, renaming), search files, redirect output and write shell scripts. By the end of the lesson, learners will be prepared to move on to using more advanced bioinformatic command-line tools (see the lesson on Data Wrangling and Processing).
This lesson is meant to be taught in its entirety. For novice learners, schedule around 4 hours for this lesson. If your learners are already somewhat familiar with the bash shell, the earlier episodes can be condensed.
This lesson uses data hosted on an Amazon Machine Image (AMI). A few days before the workshop, the workshop coordinator will send instructors information on how to log in to the AMI. If you are running a self-organized workshop, register the workshop with our self-organized workshop form. Then email us at team@datacarpentry.org with the number of people you expect at the workshop, and we’ll start instances for you to use. The day before the workshop, we’ll send you the login information for your learners.
Learners will work on an Amazon Web Services (AWS) instance for this lesson. The workshop coordinator will set up AWS instances for your workshop a few days ahead of time. Put the links for all instances on your workshop Etherpad and have learners put their name next to the instance they will use. This prevents learners from accidentally messing up another learner’s filesystem. The workshop coordinator usually sets up more AWS instances than there are registered learners. If a learner accidentally deletes or overwrites data files, you can have them switch to a different AWS instance.
Genomics Organization: Instructor Notes
http://datacarpentry.github.io/organization-genomics/guide/
http://tess.elixir-uk.org/materials/genomics-organization-instructor-notes
Discussions can happen between neighbors in a workshop. After the paired discussion, there can be a short general discussion of the types of things that came up. You could also have people enter their responses to the discussion in the workshop Etherpad, or capture the general responses there yourself. The Etherpad then serves as a resource for learners after the workshop.
Genomics Workshop (Workshop Overview; Teaching Platform)
http://datacarpentry.github.io/genomics-workshop/
http://tess.elixir-uk.org/materials/genomics-workshopworkshop-overviewteaching-platform
This lesson assumes no prior experience with the tools covered in the workshop. However, learners are expected to have some familiarity with biological concepts, including nucleotide abbreviations and the concept of genomic variation within a population. Participants should bring their laptops and plan to participate actively. To get started, follow the directions in the Setup tab to get access to the required software and data for this workshop.
Please note that workshop materials for working with genomics data in R are under development and will become available in June 2018.
This workshop uses data from a long-term evolution experiment published in 2012: Blount ZD, Barrick JE, Davidson CJ, and Lenski RE, "Genomic analysis of a key innovation in an experimental Escherichia coli population" (doi: 10.1038/nature11514). More information about these data will be presented in the first lesson of the workshop.
Ecology Workshop Overview
http://datacarpentry.github.io/ecology-workshop/
http://tess.elixir-uk.org/materials/ecology-workshop-overview
There are no prerequisites, and the materials assume no prior knowledge about the tools. The data for this workshop is the Portal Project Teaching Database, available on figshare under a CC-BY license for reuse. The Portal Project Teaching Database is a simplified version of the Portal Project Database designed for teaching. It is a tabular dataset of observations of small mammals in a desert ecosystem in Arizona, USA, collected over more than 40 years. It provides a real-world example of life-history, population, and ecological data. It is complex enough to teach many aspects of data analysis and management, but many complexities have been removed so that students can focus on the core ideas and skills being taught.
More information on this dataset
The workshop can be taught using R or Python as the base language.
Instructor notes
http://datacarpentry.github.io/R-ecology-lesson/instructor-notes
http://tess.elixir-uk.org/materials/instructor-notes
This lesson mostly uses combined.csv. The three other CSV files (plots.csv, species.csv and surveys.csv) are only needed for the lesson on databases. combined.csv is downloaded directly in the chapter “Starting with Data” and does not need to be downloaded beforehand. This does, however, require a decent internet connection in the room where the workshop is being taught. To make the download easier, the code handout (see next paragraph) includes the chunk of code that specifies the URL of the CSV file and where it should be saved and under what name. This approach ensures that the file will be where the lesson expects it to be. It also teaches the good, reproducible practice of automating the download. If learners haven’t created the data/ directory and/or are not in the correct working directory, the download.file() command will produce an error. It is therefore important to use the sticky notes at this point.
The code handout is useful for Data Carpentry workshops; a download link is also available on the top bar of the lesson website. It includes an outline of the lesson content, the text for the challenges, and the links for the files that need to be downloaded for the lesson. It also includes pieces of code that may be difficult to type for learners who have no programming experience or are unfamiliar with R’s syntax. We encourage you to distribute it to the learners at the beginning of the lesson. We also encourage instructors to do the live coding directly in this file, so that participants can follow along.
Some learners may have previous R installations. On Mac, if a new install is performed, the learner’s system will create a symbolic link pointing to the new install as ‘Current’. Sometimes this does not happen. A new R is then installed and can be accessed via the R console, but RStudio does not find it. As a result, the learner’s RStudio will be running an older R install, which will cause package installations to fail. This can be fixed at the terminal. First, check for the appropriate R installation in the library; we are currently using R 3.4.x. If it isn’t there, the learner will need to install it. If it is present, you will need to set the symbolic link Current to point to the 3.4.x directory:
Python for Ecologists
http://datacarpentry.github.io/python-ecology-lesson/
http://tess.elixir-uk.org/materials/python-for-ecologists
Python is a general-purpose programming language that is useful for writing scripts to work effectively and reproducibly with data. This is an introduction to Python designed for participants with no programming experience. These lessons can be taught in a day (about 6 hours). They start with some basic information about Python syntax and the Jupyter notebook interface. They then move through importing CSV files, using the pandas package to work with data frames, and calculating summary information from a data frame, followed by a brief introduction to plotting. The last lesson demonstrates how to work with databases directly from Python.
Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow. These lessons assume no prior knowledge of the skills or tools. To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.
This lesson requires a working copy of Python. To get the most out of these materials, please make sure to install everything before working through this lesson.
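The lesson reads CSV files with pandas and computes summary information from data frames. The following is a minimal, dependency-free sketch of that idea using only the standard library. The rows are made up, and the column names (`record_id`, `species_id`, `weight`) only mimic the style of the Portal surveys data.

```python
import csv
import io
import statistics
from collections import defaultdict

# Made-up rows in the style of the Portal surveys.csv file.
SAMPLE_CSV = """record_id,species_id,weight
1,NL,218
2,NL,204
3,DM,42
4,DM,
5,DM,46
"""


def mean_weight_by_species(csv_text):
    """Return {species_id: mean weight}, skipping rows with no weight."""
    weights = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["weight"]:  # an empty field means the weight is missing
            weights[row["species_id"]].append(float(row["weight"]))
    return {species: statistics.mean(ws) for species, ws in weights.items()}


print(mean_weight_by_species(SAMPLE_CSV))  # {'NL': 211.0, 'DM': 44.0}
```

With pandas, as in the lesson, the equivalent would be roughly `pd.read_csv("surveys.csv").groupby("species_id")["weight"].mean()`. pandas skips missing weights by default.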
2017-10-09
Python for Ecologists: Instructor Notes (Challenge solutions)
http://datacarpentry.github.io/python-ecology-lesson/guide/
http://tess.elixir-uk.org/materials/python-for-ecologists-instructor-noteschallenge-solutions
To install Homebrew, you need to have the Xcode command line tools installed. From the terminal, type:
Then run the following command to ensure Homebrew is installed properly:
Install Python 3:
2017-10-09
SQL for Ecology: Instructor Notes
http://datacarpentry.github.io/sql-ecology-lesson/guide/
http://tess.elixir-uk.org/materials/sql-for-ecology-instructor-notes
Note that the figshare download is an archive (.zip) file that extracts all of its files directly into your current directory rather than into a subfolder. See this slide deck for a sample introduction to the lesson: SQL Intro Deck
Key points: If you’ve drawn a diagram of the data analysis pipeline (raw data -> clean data -> import and analyze -> results -> visualization), it can be helpful to point out that you’re now somewhere between clean data and analysis.
Tips
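The figshare archive unpacks its files straight into the current directory, so one option is to extract it into a dedicated folder instead. The following minimal sketch does this with Python's standard `zipfile` module. The archive and folder names in the usage line are placeholders, not the actual figshare file names.

```python
import zipfile
from pathlib import Path


def extract_into_subfolder(archive_path, target_dir):
    """Extract every member of a .zip archive into `target_dir`.

    Keeping the extracted files in their own folder stops them from
    mixing with whatever is already in the current directory.
    Returns the sorted member names so you can check what was unpacked.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())


# Usage with placeholder names:
# extract_into_subfolder("portal_data.zip", "portal_data")
```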
2017-10-09
Open Refine for Ecology: Instructor Notes
http://datacarpentry.github.io/OpenRefine-ecology-lesson/guide/
http://tess.elixir-uk.org/materials/open-refine-for-ecology-instructor-noteslesson
Note the file types OpenRefine handles: TSV, CSV, *SV, Excel (.xls, .xlsx), JSON, XML, RDF as XML, and Google Data documents. Support for other formats can be added with OpenRefine extensions.
In this first step, we’ll browse our computer to the sample data file for this lesson. (If you haven’t already, download the data from https://ndownloader.figshare.com/files/7823341.) For this lesson, I’ve modified the Portal_rodents.csv file. I added several columns (scientificName, locality, county, state, country) and generated several more columns in the lesson itself (JSON, decimalLatitude, decimalLongitude). The data in locality, county, country, JSON, decimalLatitude and decimalLongitude are contrived and in no way related to the original dataset.
Once OpenRefine is open, you’ll be asked if you want to Create, Open, or Import a Project.
Exploring data by applying multiple filters
OpenRefine supports faceted browsing as a mechanism for
2017-10-09
Data Organization in Spreadsheets: Instructor Notes
http://datacarpentry.github.io/spreadsheet-ecology-lesson/guide/
http://tess.elixir-uk.org/materials/data-organization-in-spreadsheets-instructor-notes
This lesson is optional. The challenge with this lesson is that the instructor’s version of the spreadsheet software is going to look different from about half the room’s. That makes it hard to show where menu options are and how to navigate them. Instead, discuss the concepts of quality control, and how things like sorting can help you find outliers in your data.
Provide information on setting up your environment so that learners can follow your live coding (increasing text size, changing text color, etc.). Also give general recommendations for working with coding tools to best suit the learning environment.
The main challenge with this lesson is that Excel looks, and even works, very differently between Mac and PC and between versions of Excel. The presenter’s environment will therefore match only some of the learners’.
2017-10-09
Data Carpentry Python for Ecologists
http://datacarpentry.github.io/python-ecology/
http://tess.elixir-uk.org/materials/data-carpentry-python-for-ecologists
Data Carpentry's aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with ecological data in Python. The data for this lesson come from the Portal Project Teaching Database, available on figshare. The data files used in this lesson are surveys.csv (download link: https://ndownloader.figshare.com/files/2292172) and species.csv (download link: https://ndownloader.figshare.com/files/3299483).
Requirements:
Data Carpentry's teaching is hands-on, so participants are encouraged to bring and use their own laptops. This ensures that their tools are properly set up for an efficient workflow once they leave the workshop. (We will provide instructions on setting up the required software several days in advance, and the classroom will have computers with the software installed.) There are no prerequisites, and we will assume no prior knowledge about the tools. Participants are required to abide by Software Carpentry's Code of Conduct.
Twitter: #datacarpentry
2016-03-07
Data Carpentry: R for data analysis and visualization of Ecological Data
http://datacarpentry.github.io/R-ecology-lesson/
http://tess.elixir-uk.org/materials/data-carpentry-r-for-data-analysis-for-ecology
Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with ecology data in R. This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (about 6 hours). They start with some basic information about R syntax and the RStudio interface. They then move through importing CSV files, the structure of data frames, dealing with factors, adding and removing rows and columns, and calculating summary statistics from a data frame, followed by a brief introduction to plotting. The last lesson demonstrates how to work with databases directly from R. Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow. These lessons assume no prior knowledge of the skills or tools, but working through this lesson requires working copies of the software described below. To get the most out of these materials, please make sure to download the data and install everything before working through this lesson.
2017-10-09
SQL for Ecology
http://datacarpentry.github.io/sql-ecology-lesson/
http://tess.elixir-uk.org/materials/data-carpentry-sql-for-ecology
This lesson will teach you what relational databases are, how you can load data into them and how you can query databases to extract just the information that you need.
Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow. These lessons assume no prior knowledge of the skills or tools. To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.
This lesson requires a working copy of DB Browser for SQLite. To get the most out of these materials, please make sure to install everything before working through this lesson. If you are teaching this lesson in a workshop, please see the Instructor notes.
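The lesson itself works in DB Browser for SQLite. The following is a minimal sketch of the same load-then-query idea using Python's built-in `sqlite3` module. The table layout and rows are hypothetical, loosely modelled on the Portal surveys data.

```python
import sqlite3

# Hypothetical rows in the style of the Portal surveys table:
# (record_id, year, species_id, weight)
ROWS = [
    (1, 1977, "NL", 218.0),
    (2, 1977, "DM", 42.0),
    (3, 1978, "DM", 46.0),
    (4, 1978, "DM", None),  # missing weight is stored as NULL
]


def load_and_query(rows):
    """Load rows into an in-memory SQLite table and return per-species stats."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE surveys ("
            "record_id INTEGER PRIMARY KEY, year INTEGER, "
            "species_id TEXT, weight REAL)"
        )
        con.executemany("INSERT INTO surveys VALUES (?, ?, ?, ?)", rows)
        # COUNT(*) counts every row; AVG() ignores NULL weights.
        query = (
            "SELECT species_id, COUNT(*), AVG(weight) "
            "FROM surveys GROUP BY species_id ORDER BY species_id"
        )
        return con.execute(query).fetchall()
    finally:
        con.close()


print(load_and_query(ROWS))  # [('DM', 3, 44.0), ('NL', 1, 218.0)]
```

Using an in-memory database keeps the sketch self-contained. Passing a file name to `sqlite3.connect` would instead create a database file that DB Browser for SQLite can open.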
2017-10-09
Open Refine for Ecology
http://datacarpentry.github.io/OpenRefine-ecology-lesson/
http://tess.elixir-uk.org/materials/data-carpentry-openrefine-for-ecology
OpenRefine (formerly Google Refine) is a powerful free and open-source tool for working with messy data: cleaning it and transforming it from one format into another. This lesson will teach you to use OpenRefine to effectively clean and format data and automatically track any changes that you make. Many people comment that this tool saves them literally months of work trying to make these edits by hand.
Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow. These lessons assume no prior knowledge of the skills or tools. To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.
This lesson requires a working copy of OpenRefine (formerly called Google Refine). To get the most out of these materials, please make sure to install everything before working through this lesson.
2017-10-09
Data Organization in Spreadsheets
http://datacarpentry.github.io/spreadsheet-ecology-lesson/
http://tess.elixir-uk.org/materials/data-carpentry-spreadsheets-for-ecology
We organize data in spreadsheets in the ways that we as humans want to work with the data, but computers require that data be organized in particular ways. To use tools that make computation more efficient, such as programming languages like R or Python, we need to structure our data the way computers need it. Since this is where most research projects start, this is where we want to start too!
In this lesson, you will learn:
In this lesson, however, you will not learn about data analysis with spreadsheets.
Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to organize the data so that you can perform a proper analysis later. It’s not the most fun, but it is necessary. In this lesson you will learn how to think about data organization and some practices for more effective data wrangling. With this approach you can better format current data and plan new data collection so that less data wrangling is needed.
Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow. These lessons assume no prior knowledge of the skills or tools. To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.
2017-10-09