Working with Messy Data

April 21, 2021 @ 6:00 am - 12:00 pm

This course will be offered via Zoom only.

-Summary:
When working with data, one thing is fairly certain: data is rarely in an optimal format. A misplaced space here, or an extra comma there, can mean the difference between two clicks and two hours of work. In this course, we will work through ways to isolate, extract, and transform data from webpages, text files, and published datasets using Python and Pandas. This class will also introduce regular expressions, a language for matching specific parts of text.
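To give a flavor of the problem, here is a minimal sketch of cleaning stray spaces and an extra trailing comma with regular expressions. It is not taken from the course materials: the sample rows and the `clean_line` helper are made up for illustration, and it uses only Python's standard `re` module rather than Pandas.

```python
import re

# Hypothetical messy rows (invented for this example): stray spaces
# around fields and an extra trailing comma on the second row.
raw_lines = [
    "  Alice ,  34,Greensboro ",
    "Bob,27 ,  Chapel Hill,",
    "Carol ,41,Durham",
]


def clean_line(line):
    """Split one comma-separated line into whitespace-trimmed fields."""
    # Remove surrounding whitespace, then any trailing commas or spaces.
    line = re.sub(r"[\s,]+$", "", line.strip())
    # Split on commas, absorbing the whitespace on either side of each one.
    return re.split(r"\s*,\s*", line)


rows = [clean_line(line) for line in raw_lines]

for row in rows:
    print(row)
```

Each row comes out as a list of three clean fields, ready to be written to a file or loaded into a table.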

-Why take this course?
The tools for handling data are often complicated, and the associated learning curves are very steep. In this course, we will cover a range of techniques using industry-standard data-processing tools. If you are curious about how to handle tabular data but feel intimidated by the prospect of programming, this class will get you started on the path towards better data management.

-What will participants learn?
Participants in this course will learn basic and intermediate Python programming and scripting as it pertains to importing and exporting data. We will cover some of the libraries associated with mathematical and statistical analysis, as well as text processing using regular expressions.

-Prerequisites and requirements:
This course is intended for data scientists with a basic-to-intermediate understanding of one or more of the following: the Python programming language, data import/export formats, text processing, and some statistical analysis. This class assumes that you will have a computer with a running installation of Python 3, a text editor that supports regular expressions, and a web browser with internet connectivity. We will be using Anaconda Individual Edition for Python 3 and Sublime Text.

Recommended installations:
Anaconda for Python 3: https://www.anaconda.com/products/individual
Sublime Text: https://www.sublimetext.com/

Instructor Bio:
Brown Biggers is the IT Operations Manager for the UNC Greensboro University Libraries. He holds a master’s degree in computer science, and has over 18 years of systems and network management experience in academic, public, and private sectors. His current research interests include natural language processing, text mining, data visualization, and social media crisis analytics.

Registration Fees:
– UNC CH Students: $0, with a $25 deposit to hold your spot (deposit is refundable upon your attendance for at least 66% of the course)
– UNC CH Faculty/Staff/Postdoc: $40
– Non UNC CH: $40

Details

Date:
April 21, 2021
Time:
6:00 am - 12:00 pm
Website:
https://odum.unc.edu/event/working-with-messy-data-online/

Venue

online

Organizer

UNC Odum Institute for Research in Social Science