Chapter 4: Data Conversion

Data conversion is a fundamental skill in data science. You'll frequently need to transform data between different types and structures to prepare it for analysis. This chapter covers essential conversion patterns you'll use every day.

Converting Lists to Dictionaries

Converting between lists and dictionaries is one of the most common data transformation tasks. Let's explore several practical approaches.

Creating a Dictionary from Key-Value Pairs

The most straightforward way to create a dictionary is from a list of tuples:
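A minimal sketch; the product names and prices are made-up sample data:

```python
# Hypothetical (key, value) pairs: product -> price
pairs = [("apple", 0.5), ("banana", 0.25), ("cherry", 3.0)]

# dict() builds a dictionary from any iterable of two-item sequences
prices = dict(pairs)

print(prices)            # {'apple': 0.5, 'banana': 0.25, 'cherry': 3.0}
print(prices["banana"])  # 0.25
```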

This pattern is useful when you have paired data that needs to be looked up by key.

Zipping Two Lists into a Dictionary

When you have two separate lists that should be paired together, use zip():
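A short illustration with two hypothetical parallel lists. The second part shows a behavior worth knowing: zip() stops at the shortest input.

```python
names = ["Alice", "Bob", "Carol"]
ages = [34, 28, 41]

# zip() yields (name, age) tuples; dict() turns them into key-value pairs
people = dict(zip(names, ages))
print(people)  # {'Alice': 34, 'Bob': 28, 'Carol': 41}

# Unmatched items in the longer list are silently dropped.
# On Python 3.10+, zip(..., strict=True) raises ValueError on a length mismatch.
truncated = dict(zip(["x", "y", "z"], [1, 2]))
print(truncated)  # {'x': 1, 'y': 2}
```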

The zip() function pairs elements from multiple iterables, making it perfect for creating dictionaries from parallel lists.

Creating a Dictionary with Default Values

Use dict.fromkeys() to initialize a dictionary where all keys start with the same value:
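A sketch with hypothetical task names. The second half shows a caveat: fromkeys() assigns the same object to every key, which matters when the default value is mutable.

```python
# Every key starts with the same immutable default
status = dict.fromkeys(["download", "parse", "save"], "pending")
print(status)  # {'download': 'pending', 'parse': 'pending', 'save': 'pending'}

# Counters: the value defaults to None if omitted, so pass 0 explicitly
counts = dict.fromkeys("abc", 0)
counts["a"] += 1
print(counts)  # {'a': 1, 'b': 0, 'c': 0}

# Pitfall: a mutable default (here a list) is shared by all keys
shared = dict.fromkeys(["x", "y"], [])
shared["x"].append(1)
print(shared)  # {'x': [1], 'y': [1]}

# A dict comprehension creates a separate list for each key instead
separate = {key: [] for key in ["x", "y"]}
separate["x"].append(1)
print(separate)  # {'x': [1], 'y': []}
```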

This is particularly useful when tracking state or initializing counters.

Converting Dictionaries to Lists

Extracting data from dictionaries into lists enables iteration, sorting, and other list-based operations.

Getting a List of Keys

The simplest conversion is to wrap the dictionary in list(), which returns its keys in insertion order:
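A minimal sketch, using a hypothetical inventory dictionary:

```python
inventory = {"pears": 12, "apples": 30, "figs": 7}

keys = list(inventory)  # same result as list(inventory.keys())
print(keys)  # ['pears', 'apples', 'figs']
```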

Getting Sorted Keys

Pass the dictionary to sorted() to get its keys in ascending order (alphabetical when the keys are strings):
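Continuing with the same hypothetical inventory. The key= argument is optional; the last line shows one common way to order the keys by their values instead:

```python
inventory = {"pears": 12, "apples": 30, "figs": 7}

print(sorted(inventory))                # ['apples', 'figs', 'pears']
print(sorted(inventory, reverse=True))  # ['pears', 'figs', 'apples']

# Use dict.get as the sort key to order keys by their values
by_value = sorted(inventory, key=inventory.get)
print(by_value)  # ['figs', 'pears', 'apples']
```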

Getting a List of Values

Extract all values using the .values() method:
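A minimal sketch with the same hypothetical inventory:

```python
inventory = {"pears": 12, "apples": 30, "figs": 7}

quantities = list(inventory.values())
print(quantities)  # [12, 30, 7]

# Many functions accept the values view directly, without list()
total = sum(inventory.values())
print(total)  # 49
```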

Getting Key-Value Pairs

Use .items() to get both keys and values as tuples:
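A sketch showing the list conversion, the usual unpacking loop, and sorting the pairs by value (inventory data is hypothetical):

```python
inventory = {"pears": 12, "apples": 30, "figs": 7}

pairs = list(inventory.items())
print(pairs)  # [('pears', 12), ('apples', 30), ('figs', 7)]

# Tuple unpacking in a loop is the usual way to consume items()
for fruit, qty in inventory.items():
    print(f"{fruit}: {qty}")

# Sort the pairs by value (the second element of each tuple)
by_qty = sorted(inventory.items(), key=lambda pair: pair[1])
print(by_qty)  # [('figs', 7), ('pears', 12), ('apples', 30)]
```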

Converting Dictionaries to Pandas DataFrames

Pandas DataFrames are the workhorse of data science. Converting dictionaries to DataFrames is a critical skill for data analysis.

Creating a DataFrame with the data Parameter

The most direct approach is to pass a dictionary to the DataFrame constructor; each key becomes a column name and each list of values becomes that column's data:
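A minimal sketch with hypothetical column data; it assumes pandas is installed and imported under the conventional pd alias. All lists must have the same length, otherwise the constructor raises a ValueError.

```python
import pandas as pd

data = {
    "name": ["Alice", "Bob", "Carol"],
    "age": [34, 28, 41],
}

# Keys become column names; list positions become rows 0, 1, 2
df = pd.DataFrame(data=data)
print(df)
print(df.columns.tolist())  # ['name', 'age']
print(df.shape)             # (3, 2)
```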

Using the from_dict() Method

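DataFrame.from_dict() is an alternative constructor that makes the dictionary's orientation explicit through its orient parameter, which defaults to 'columns':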
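A short sketch, reusing the same hypothetical data, confirming that the default orientation matches the plain constructor's behavior:

```python
import pandas as pd

data = {"name": ["Alice", "Bob", "Carol"], "age": [34, 28, 41]}

# orient="columns" is the default, so these two calls build the same frame
df_default = pd.DataFrame.from_dict(data)
df_columns = pd.DataFrame.from_dict(data, orient="columns")

print(df_default.equals(df_columns))  # True
print(df_default.columns.tolist())    # ['name', 'age']
```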

Creating DataFrames with Custom Index

Use orient='index' when your dictionary keys should become row indices:
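A sketch with hypothetical records. Each outer key maps to one row's values, and the keys 0, 1 and 3 are deliberately non-consecutive to show that the index does not have to be a continuous range:

```python
import pandas as pd

records = {
    0: {"name": "Alice", "age": 34},
    1: {"name": "Bob", "age": 28},
    3: {"name": "Carol", "age": 41},
}

# orient="index": outer keys -> row labels, inner keys -> column names
df = pd.DataFrame.from_dict(records, orient="index")
print(df)
print(df.index.tolist())  # [0, 1, 3]
print(df.loc[3, "name"])  # Carol
```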

Notice how the dictionary keys (0, 1, 3) become the row indices.

Assigning Column Names During Creation

Convert a simple dictionary to a DataFrame with explicit column names:
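Two possible approaches, sketched with hypothetical scores. The first turns each key-value pair into a row; the second keeps the keys as the index and names only the value column (from_dict() accepts the columns argument only when orient='index'):

```python
import pandas as pd

scores = {"alice": 88, "bob": 92, "carol": 79}

# Option 1: each (key, value) tuple becomes a row with two named columns
df_rows = pd.DataFrame(list(scores.items()), columns=["student", "score"])
print(df_rows.columns.tolist())  # ['student', 'score']

# Option 2: keys become the index; name the single value column
df_index = pd.DataFrame.from_dict(scores, orient="index", columns=["score"])
print(df_index.index.tolist())   # ['alice', 'bob', 'carol']
```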

Converting Strings to Integers

Type conversion between strings and integers is essential for data cleaning and parsing.

Basic String to Integer Conversion

By default, int() assumes base 10:
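A few illustrative inputs, including a leading-zero string and one that int() rejects:

```python
print(int("42"))      # 42
print(int("007"))     # 7    (leading zeros are fine in base 10)
print(int("  -15 "))  # -15  (surrounding whitespace is ignored)
print(int("1_000"))   # 1000 (underscores between digits are allowed)

# Strings that are not whole numbers raise ValueError
try:
    int("3.14")
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: '3.14'
```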

Notice that leading zeros don't affect the value in base 10.

Converting Binary Strings (Base 2)

Parse binary numbers by specifying base 2:
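For example (the sample strings are arbitrary):

```python
print(int("1011", 2))      # 11
print(int("0b1011", 2))    # 11  (the 0b prefix is optional)
print(int("11111111", 2))  # 255
```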

Converting Octal Strings (Base 8)

Octal (base 8) uses digits 0-7:
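A short sketch; the last part shows what happens with a digit outside 0-7:

```python
print(int("17", 8))     # 15
print(int("0o755", 8))  # 493 (the 0o prefix is optional)

try:
    int("8", 8)  # 8 is not a valid octal digit
except ValueError:
    print("not octal")
```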

Converting Hexadecimal Strings (Base 16)

Hexadecimal (base 16) uses digits 0-9 and letters A-F:
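For example (arbitrary sample values):

```python
print(int("ff", 16))    # 255
print(int("0xFF", 16))  # 255 (prefix optional, letters case-insensitive)
print(int("1A3", 16))   # 419
```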

Converting Other Bases

The int() function supports any base from 2 to 36:
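A few examples. The last line uses base 0, which int() also accepts: the base is then inferred from a 0b, 0o or 0x prefix, as in a Python integer literal.

```python
print(int("120", 3))   # 15   (1*9 + 2*3 + 0)
print(int("z", 36))    # 35   (letters a-z stand for 10-35)
print(int("zz", 36))   # 1295 (35*36 + 35)

# base=0 reads the prefix to decide the base
print(int("0x1f", 0), int("0o17", 0), int("0b101", 0))  # 31 15 5
```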

Converting Integers to Strings

Use str() to convert any number to its string representation:
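A minimal sketch; the f-string lines show that format specifications can add presentation details such as thousands separators:

```python
count = 255
text = str(count)
print(repr(text))  # '255'

# f-strings format the value and can add presentation details
print(f"{count}")          # 255
print(f"{1234567:,}")      # 1,234,567
print(str(-3) + " items")  # -3 items
```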

Converting Between Hexadecimal, Binary, and Floats

More advanced conversions involve hexadecimal and binary representations of numbers.

Converting Floats to Strings

str() converts a float to the shortest decimal string that reads back as the same float, so binary rounding error can become visible:
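A minimal sketch with arbitrary values; the last line uses the !r conversion:

```python
x = 0.1 + 0.2
print(str(x))      # 0.30000000000000004
print(str(2.5))    # 2.5
print(f"{x:.2f}")  # 0.30 (the format spec rounds for display only)

s = str(2.5)
print(f"{s!r}")    # '2.5' (repr shows quotes because s is a string)
```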

The !r in the f-string shows the repr() representation, which includes quotes for strings.

Converting Strings to Floats

Use float() to parse numeric strings with decimals:
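A short sketch of accepted and rejected inputs (the sample strings are arbitrary):

```python
import math

print(float("3.14"))        # 3.14
print(float("  -2.5e3 "))   # -2500.0 (whitespace and exponents are accepted)
print(float("inf"))         # inf
print(math.isnan(float("nan")))  # True

try:
    float("3,14")  # a comma decimal separator is not accepted
except ValueError:
    print("cannot parse '3,14'")
```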

Converting Integers to Hexadecimal

The hex() function converts integers to hexadecimal strings:
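A few examples; format() is shown as a related option when you want to control the prefix or padding:

```python
print(hex(255))             # 0xff
print(hex(-42))             # -0x2a
print(format(255, "x"))     # ff     (no prefix)
print(format(255, "#06x"))  # 0x00ff (prefix, zero-padded to width 6)
print(int(hex(255), 16))    # 255    (round trip)
```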

The '0x' prefix indicates hexadecimal notation.

Converting Floats to Hexadecimal

Floats have a .hex() method that returns their hexadecimal representation:
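For example; float.fromhex() is the inverse operation:

```python
print((1.0).hex())  # 0x1.0000000000000p+0
print((2.5).hex())  # 0x1.4000000000000p+1
print((0.1).hex())  # 0x1.999999999999ap-4

# Converting back recovers exactly the same float
print(float.fromhex((0.1).hex()) == 0.1)  # True
```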

Unlike the decimal string, this hexadecimal form is exact: it encodes the float's binary significand and exponent directly, and float.fromhex() converts it back without any rounding.

Converting Integers to Binary

The bin() function converts integers to binary strings:
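A few examples, including oct(), the octal counterpart listed in the takeaways:

```python
print(bin(10))            # 0b1010
print(bin(-5))            # -0b101
print(format(10, "08b"))  # 00001010 (no prefix, zero-padded to 8 digits)
print(oct(8))             # 0o10
print(int(bin(10), 2))    # 10 (round trip)
```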

Working with Binary Data

Python has dedicated types for binary data: bytes, an immutable sequence of integers from 0 to 255, and its mutable counterpart, bytearray.

Creating Bytes Literals

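Bytes literals look like string literals with a b prefix, but they may contain only ASCII characters; any other byte value must be written as an escape sequence such as \xff: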
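A minimal sketch; note that indexing a bytes object gives back integers, not one-character strings:

```python
data = b"hello"
print(type(data))  # <class 'bytes'>
print(data[0])     # 104 (indexing yields an int)
print(list(data))  # [104, 101, 108, 108, 111]

# Byte values outside ASCII need escape sequences
raw = b"\x00\xff"
print(len(raw))    # 2

# Writing a non-ASCII character directly, such as b"café", is a SyntaxError
```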

Encoding Strings to Base64

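Base64 encoding represents arbitrary binary data as ASCII text. The base64.b64encode() function takes bytes rather than str, so a string must be encoded first: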
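A minimal sketch using the standard-library base64 module and an arbitrary sample string:

```python
import base64

text = "Hello, World!"
encoded = base64.b64encode(text.encode("utf-8"))
print(encoded)                  # b'SGVsbG8sIFdvcmxkIQ=='
print(encoded.decode("ascii"))  # SGVsbG8sIFdvcmxkIQ==
```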

Base64 is commonly used for transmitting binary data over text-based protocols.

Decoding Base64 to Bytes

Reverse the encoding with b64decode():
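Decoding the same sample value; b64decode() returns bytes, so a final .decode() is needed to get text back:

```python
import base64

decoded = base64.b64decode(b"SGVsbG8sIFdvcmxkIQ==")
print(decoded)                  # b'Hello, World!'
print(decoded.decode("utf-8"))  # Hello, World!
```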

Converting Bytes to Strings

Use the .decode() method to convert bytes to strings:
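A short sketch; the bytes are the UTF-8 encoding of "café", and the last line shows one of the error handlers decode() accepts:

```python
raw = b"caf\xc3\xa9"  # UTF-8 bytes for "café"

print(raw.decode("utf-8"))  # café
print(raw.decode())         # café (UTF-8 is the default)

# Decoding with the wrong codec raises UnicodeDecodeError
# unless an error handler such as "ignore" is chosen
print(raw.decode("ascii", errors="ignore"))  # caf
```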

Converting Strings to Bytes

Use the .encode() method to convert strings to bytes:
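A minimal sketch showing that the resulting bytes depend on the chosen encoding:

```python
text = "café"

print(text.encode())           # b'caf\xc3\xa9' (UTF-8 by default)
print(text.encode("latin-1"))  # b'caf\xe9'
print(len(text), len(text.encode()))  # 4 5 (é takes two bytes in UTF-8)
```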

Practical Data Conversion Examples

Let's put these conversions together in real-world scenarios.

Parsing Configuration Data
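One possible sketch, assuming a simple hypothetical key=value text format. It combines string splitting, int()/float() parsing and dictionary building; the helper names parse_value and parse_config are illustrative only. For real INI files, the standard-library configparser module is usually a better fit.

```python
# Hypothetical configuration text in a simple key=value format
config_text = """
# hypothetical settings file
host = localhost
port = 8080
debug = true
timeout = 2.5
"""

def parse_value(raw: str):
    """Convert a raw string to bool, int or float when possible (illustrative rules)."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for convert in (int, float):
        try:
            return convert(raw)
        except ValueError:
            pass
    return raw  # fall back to the original string

def parse_config(text: str) -> dict:
    """Build a dict from key=value lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = parse_value(value.strip())
    return settings

config = parse_config(config_text)
print(config)  # {'host': 'localhost', 'port': 8080, 'debug': True, 'timeout': 2.5}
```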

Building DataFrames from Multiple Sources
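A sketch under stated assumptions: two hypothetical sources, parallel lists whose ages arrived as strings and a lookup dictionary keyed by name, are combined into one DataFrame. Series.map() looks each name up in the dictionary; a name missing from the dictionary would become NaN.

```python
import pandas as pd

# Source 1 (hypothetical): parallel lists, ages read from text as strings
names = ["Alice", "Bob", "Carol"]
age_strings = ["34", "28", "41"]

# Source 2 (hypothetical): a lookup dictionary keyed by name
departments = {"Alice": "Sales", "Bob": "IT", "Carol": "HR"}

# Convert the strings to ints while building the frame from a dict of lists
df = pd.DataFrame({"name": names, "age": [int(a) for a in age_strings]})

# Add a column by mapping each name through the lookup dictionary
df["department"] = df["name"].map(departments)

print(df)
print(df["age"].sum())  # 103
```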

Number System Conversions
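A small sketch that prints the same integers in several bases and checks the round trip with int(text, 0), which infers the base from each prefix. The helper name to_bases and the sample numbers are illustrative only.

```python
def to_bases(n: int) -> dict:
    """Return the binary, octal and hexadecimal strings for an integer."""
    return {"bin": bin(n), "oct": oct(n), "hex": hex(n)}

print(f"{'dec':>5} {'bin':>12} {'oct':>6} {'hex':>6}")
for n in (7, 42, 255):
    forms = to_bases(n)
    # Round trip: base 0 makes int() infer the base from the prefix
    assert all(int(text, 0) == n for text in forms.values())
    print(f"{n:>5} {forms['bin']:>12} {forms['oct']:>6} {forms['hex']:>6}")
```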

Key Takeaways

  1. List to Dict: Use dict() with tuples, zip() for parallel lists, or fromkeys() for defaults
  2. Dict to List: Use list() for keys, .values() for values, .items() for pairs
  3. Dict to DataFrame: Use DataFrame(data=dict) or DataFrame.from_dict() with the orient parameter
  4. String to Int: Use int(string, base) where base defaults to 10
  5. Number Conversions: Use hex(), bin(), oct() for integers; .hex() for floats
  6. Binary Data: Use bytes literals (b""), base64 encoding, and .encode()/.decode()

These conversion patterns form the foundation of data preprocessing in Python. Master them, and you'll be able to move data between the formats most analyses require.
