# Let’s Talk About Sets In Probability

Discussing the foundation of probability and combinatorics

Probability is a vital area of study for any effective data scientist. It may not be the most fun, but understanding the math underlying the work your models do will allow you to better explain and better develop them. In this post, I will focus specifically on sets and cover these topics:

• Defining what a set is
• Explaining universal sets and subsets
• Discussing the following set operations: unions, intersections, relative complements, and absolute complements

But first, what exactly is a set? It’s generally described as a well-defined collection of objects. In mathematics, a set is often denoted by a capital letter such as 𝑆. If an object X belongs to the set, you write X ∈ 𝑆. If X does not belong to the set, you write X ∉ 𝑆. For example, if you define 𝑆 as the set of odd numbers and X = 1, then X ∈ 𝑆. …
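To make the notation concrete, here is a minimal sketch using Python's built-in `set` type. The universal set (the integers 1 through 10) and the second set `T` are made-up values chosen only for illustration; they are not from the original post.

```python
# Hypothetical universal set: the integers 1 through 10.
universal = set(range(1, 11))

# S is the set of odd numbers within that universal set.
S = {x for x in universal if x % 2 == 1}
# T is an arbitrary second set, used to demonstrate the operations.
T = {1, 2, 3, 4}

X = 1
is_member = X in S            # X ∈ S
is_not_member = 2 not in S    # 2 ∉ S
is_subset = S <= universal    # S is a subset of the universal set

union = S | T                          # elements in S or T
intersection = S & T                   # elements in both S and T
relative_complement = S - T            # elements of S that are not in T
absolute_complement = universal - S    # everything outside S

print(is_member, is_not_member, is_subset)
print(union, intersection, relative_complement, absolute_complement)
```

Note that the absolute complement only makes sense once a universal set is fixed, which is why it is computed as a difference from `universal`.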

# A Tutorial on Scraping Images from the Web Using BeautifulSoup

In the real world of data science, we often have to obtain some or all of our data ourselves. As much as we love to work with clean, organized datasets from Kaggle, that perfection is not always replicated in day-to-day tasks at work. That’s why knowing how to scrape data is a very valuable skill, and today I’m going to demonstrate how to do just that with images, and eventually display the image results in a Pandas DataFrame.

To start, I’m going to scrape from the website where I first learned to scrape images, books.toscrape.com. This is a great site to practice all of your scraping skills on, not just image scraping. Now, the first thing you’ll want to do is import some necessary packages: BeautifulSoup and requests. …
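As a rough sketch of where those two packages fit, the snippet below separates downloading a page (requests) from pulling image URLs out of its HTML (BeautifulSoup). The helper names `extract_image_urls` and `fetch_image_urls`, the `https://` scheme, and the sample HTML are all assumptions made for illustration; the real page markup may differ, and this is not necessarily the approach the full post takes.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def extract_image_urls(html, base_url):
    """Return absolute URLs for every <img> tag that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]


def fetch_image_urls(page_url):
    """Download a page and return the image URLs found in it (needs network access)."""
    response = requests.get(page_url, timeout=10)
    response.raise_for_status()
    return extract_image_urls(response.text, page_url)


# Made-up HTML snippet so the parsing step can be tried offline.
sample_html = """
<article><img src="media/cache/book-one.jpg" alt="Book One"></article>
<article><img src="media/cache/book-two.jpg" alt="Book Two"></article>
"""
sample_urls = extract_image_urls(sample_html, "https://books.toscrape.com/")
print(sample_urls)
```

To run it against the live site, you would call `fetch_image_urls("https://books.toscrape.com/")` instead of passing in the sample string.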

# Investigating the JSON Module

## Exploring one of the most popular data formats in Python

JSON, which stands for JavaScript Object Notation, was created to make data transport more efficient. It has done just that, and it is now one of the most widely used formats for transferring data on the web. In this blog post, I am going to go through how to work with the data stored in JSON files in Python using the json module.

First, we want to load our JSON file. The file I’ll be going through in this post contains New York City campaign finance data from 2001. …
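The loading step can be sketched with the standard-library `json` module. Since the actual file is not reproduced here, the snippet uses an in-memory stand-in with made-up field names and values (`meta`, `data`, and the candidate rows are hypothetical, not the real campaign finance schema).

```python
import io
import json

# Made-up JSON text standing in for the contents of a file on disk.
raw_text = """
{
  "meta": {"year": 2001},
  "data": [
    ["Candidate A", 100.0],
    ["Candidate B", 250.5]
  ]
}
"""

# io.StringIO behaves like an open text file, so json.load works the same way
# as it would inside `with open(path) as f: data = json.load(f)`.
fake_file = io.StringIO(raw_text)
data = json.load(fake_file)

# JSON objects become dicts and JSON arrays become lists.
print(type(data))
print(list(data.keys()))

# Once loaded, the data is navigated with ordinary indexing.
total = sum(row[1] for row in data["data"])
print(total)
```

After loading, a good first move is usually to inspect the type and the top-level keys, as above, before digging into the nested values.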

# 5 PEP8 Must-Remember Guidelines

## Follow these tips to create effective and legible code

Whenever you are writing code, there is one thing you must always remember: it has to be readable. Think back to handwriting essays in school, where the content of your essay meant nothing if your teacher couldn’t read it. Everyone who codes has their own unique styles and quirks that allow them to get things done. But if you are going to share your work as open source for anyone to see, it should be consistent in its appearance and readability. In this post, I will be sharing five tips to create organized and legible code when working in Python. …

# NoSQL Databases — The Solution to a Fast-Paced, Smartphone World

## What they are and why they’re useful

Relational databases are a foundational component of modern technology. They have been around since the 1970s (Shoutout IBM! ✊), and they are everywhere today because they are very reliable and fairly easy to access once you learn their query language. You can store, track, and analyze data all in one organized place. In most situations, a relational database is a great choice. However, as we live in the age of the internet and smartphones, there are some forms of data that aren’t a great fit for a traditional relational database. …

# How to Use SQL in Pandas

A Pandas DataFrame and a table from a SQL database are structured very similarly. Both consist of data points, or values, with every row identified by an index and each column by a name. SQL allows you to rapidly access the specific information you need for whatever project you are working on, and very similar queries can be made using Pandas! In this blog post, I will show you how to do just that, and explain which library you’ll need to make it happen.

## .query()

When using SQL, obtaining the information we need is called querying the data. Pandas has a built-in method, .query(), that lets you do the same thing. It saves time and makes your code easier to read because you don’t have to use bracket-based indexing syntax. For instance, a brief example to query data in Pandas using the .query() …
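As an illustration of the idea, here is a small sketch of `DataFrame.query()` next to the equivalent bracket-based filter. The DataFrame and its column names (`name`, `age`, `city`) are made up for this example and do not come from the original post.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "age": [23, 35, 41, 29],
    "city": ["NYC", "LA", "NYC", "LA"],
})

# Roughly the Pandas counterpart of:
#   SELECT * FROM df WHERE age > 25 AND city = 'NYC'
result = df.query("age > 25 and city == 'NYC'")

# The same filter written with boolean indexing, for comparison.
same_result = df[(df["age"] > 25) & (df["city"] == "NYC")]

# Local Python variables can be referenced inside the query string with @.
min_age = 30
older = df.query("age >= @min_age")

print(result)
print(older)
```

The query string reads much like a SQL WHERE clause, which is where the readability gain comes from once filters involve several conditions.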

# Do’s and Don’ts for Data Visualization

Making data visualizations is easy, but making effective visualizations is what will set you apart.

Making data visualizations is an essential skill on the way to becoming a well-rounded data scientist. With so many different types and styles to choose from, it can be easy to try to do too much when creating your visualizations. Unlike a lot of tasks in data science where you know your code will lead to a correct answer, there is no real “right” or “wrong” answer on how to display your data. That said, there are definitely some good habits you’ll want to develop in data visualization, along with some habits that you’ll want to avoid. …

# How to Use Pivot Tables In Pandas

A common Excel function made easier in Pandas.

In this short blog post, I will teach you about the different ways you can structure and index your datasets to make them simpler to process or comprehend. The three goals of this blog post are:

• Show the difference between a wide dataframe and a long dataframe.
• Compare simple, flattened index structures and multi-hierarchical index structures.
• Show how to make them yourself utilizing aggregation functions and pivot tables.

## Long vs. Wide

These two distinct arrangements refer to how you can structure your data in a dataframe. In the wide arrangement, every column represents a variable and each row represents a data point. The index is normally an integer, with 0 being the first row. …
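Here is a minimal sketch of the two arrangements, using a made-up table of student scores (the column names and values are hypothetical). It assumes the usual meaning of "long": one row per combination of identifier and variable. It reshapes wide to long with `melt` and then back again with `pivot_table`.

```python
import pandas as pd

# Wide arrangement: one column per variable, one row per data point,
# with the default integer index starting at 0.
wide = pd.DataFrame({
    "student": ["Ana", "Ben"],
    "math": [90, 75],
    "science": [85, 80],
})

# Long arrangement: one row per (student, subject) pair.
long = wide.melt(id_vars="student", var_name="subject", value_name="score")

# pivot_table goes back to a wide layout, aggregating any duplicates with the mean.
pivoted = long.pivot_table(
    index="student", columns="subject", values="score", aggfunc="mean"
)

print(wide)
print(long)
print(pivoted)
```

Because `pivot_table` applies an aggregation function, it also works when the long data contains several rows for the same student and subject, which a plain `pivot` would reject.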

# The Python Data Visualization Essentials

Going over the 3 most common ways to visualize your data in Python.

Data visualization is a crucial part of being a complete and successful data scientist. Manipulating and analyzing data can only go so far if you can’t properly display your results. While super cool or fancy visualizations can be a mind-blowing visual experience, ultimately your objective as a data scientist is to display your findings in a way that is clear and to the point. So sometimes sticking to and mastering the basics is the best way to go. …

# Conditional Statements: The Key to High-Level, Streamlined Code in Python

## An important aspect of mastering your control flow

When someone starts to learn how to code, the initial philosophy is often to write code with each line doing its own individual step, one step at a time. If you’re doing small tasks or just practicing your technique, that can be a good place to start. However, in the world of data science and analysis, projects will get more complex, deadlines will get shorter, and the pressure will start to intensify. With all of that in mind, your code will need to be smooth and streamlined; you can’t have clunky or choppy code that is slow and inefficient. …