Chapter 4: Data Conversion Recipes
Data conversion is a fundamental skill in data science. You'll frequently need to transform data between different types and structures to prepare it for analysis. This chapter covers essential conversion patterns you'll use every day.
Converting Lists to Dictionaries
Converting between lists and dictionaries is one of the most common data transformation tasks. Let's explore several practical approaches.
Creating a Dictionary from Key-Value Pairs
The most straightforward way to create a dictionary is from a list of tuples:
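A minimal sketch of this recipe, using made-up fruit counts as sample data (the names `pairs` and `fruit_counts` are illustrative only):

```python
# Hypothetical sample data: each tuple is (key, value)
pairs = [("apple", 3), ("banana", 5), ("cherry", 7)]

# dict() accepts any iterable of two-element pairs
fruit_counts = dict(pairs)

print(fruit_counts)            # {'apple': 3, 'banana': 5, 'cherry': 7}
print(fruit_counts["banana"])  # 5
```

If the same key appears more than once in the list, the last pair wins.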
This pattern is useful when you have paired data that needs to be looked up by key.
Zipping Two Lists into a Dictionary
When you have two separate lists that should be paired together, use zip():
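As an illustration, assuming two parallel lists of hypothetical names and ages:

```python
# Hypothetical parallel lists: names[i] belongs with ages[i]
names = ["alice", "bob", "carol"]
ages = [30, 25, 35]

# zip() yields (name, age) tuples, which dict() consumes
age_by_name = dict(zip(names, ages))

print(age_by_name)  # {'alice': 30, 'bob': 25, 'carol': 35}
```

Note that zip() stops at the shorter input, so a length mismatch silently drops items. On Python 3.10 and later, passing strict=True to zip() raises a ValueError instead.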
The zip() function pairs elements from multiple iterables, making it perfect for creating dictionaries from parallel lists.
Creating a Dictionary with Default Values
Use dict.fromkeys() to initialize a dictionary where all keys start with the same value:
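A short sketch, assuming a hypothetical list of pipeline task names that all start as "pending":

```python
# Hypothetical task names for a small pipeline
tasks = ["load", "clean", "model"]

# Every key starts with the same value
status = dict.fromkeys(tasks, "pending")
status["load"] = "done"  # updating one key leaves the others alone

print(status)  # {'load': 'done', 'clean': 'pending', 'model': 'pending'}
```

Be careful with mutable defaults: dict.fromkeys(tasks, []) makes every key share the same list object. A dict comprehension such as {t: [] for t in tasks} gives each key its own list.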
This is particularly useful when tracking state or initializing counters.
Converting Dictionaries to Lists
Extracting data from dictionaries into lists enables iteration, sorting, and other list-based operations.
Getting a List of Keys
The simplest conversion is to wrap the dictionary in list(), which returns its keys in insertion order:
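For example, with a hypothetical inventory dictionary:

```python
# Hypothetical sample dictionary
inventory = {"pear": 2, "apple": 3, "fig": 9}

# Iterating a dict yields its keys, so list() collects them
keys = list(inventory)

print(keys)  # ['pear', 'apple', 'fig']
```

list(inventory.keys()) gives the same result and can read more explicitly.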
Getting Sorted Keys
Combine sorted() with dictionary keys for alphabetical ordering:
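A sketch using the same kind of hypothetical inventory data:

```python
# Hypothetical sample dictionary
inventory = {"pear": 2, "apple": 3, "fig": 9}

# sorted() always returns a new list; the dict itself is unchanged
sorted_keys = sorted(inventory)

print(sorted_keys)  # ['apple', 'fig', 'pear']
```

Strings sort by code point, so uppercase letters come before lowercase ones. Passing key=str.lower to sorted() is one way to get a case-insensitive order.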
Getting a List of Values
Extract all values using the .values() method:
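A minimal example, again with hypothetical inventory data:

```python
# Hypothetical sample dictionary
inventory = {"pear": 2, "apple": 3, "fig": 9}

# .values() returns a view; list() turns it into a real list
counts = list(inventory.values())

print(counts)       # [2, 3, 9]
print(sum(counts))  # 14
```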
Getting Key-Value Pairs
Use .items() to get both keys and values as tuples:
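A short sketch with hypothetical data, including one way to sort the pairs by value:

```python
# Hypothetical sample dictionary
inventory = {"pear": 2, "apple": 3, "fig": 9}

# Each item is a (key, value) tuple
pairs = list(inventory.items())
print(pairs)  # [('pear', 2), ('apple', 3), ('fig', 9)]

# Sort the pairs by value, largest first
by_count = sorted(inventory.items(), key=lambda kv: kv[1], reverse=True)
print(by_count)  # [('fig', 9), ('apple', 3), ('pear', 2)]
```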
Converting Dictionaries to Pandas DataFrames
Pandas DataFrames are the workhorse of data science. Converting dictionaries to DataFrames is a critical skill for data analysis.
Creating a DataFrame with the data Parameter
The most direct approach is to pass a dictionary whose keys become column names and whose values hold each column's data:
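A minimal sketch, assuming pandas is installed and using hypothetical names and ages as column data:

```python
import pandas as pd

# Hypothetical data: each key is a column, each list is that column's values
people = {"name": ["Alice", "Bob", "Carol"], "age": [30, 25, 35]}

df = pd.DataFrame(data=people)

print(df.columns.tolist())  # ['name', 'age']
print(df.shape)             # (3, 2)
```

All lists need the same length; otherwise pandas raises a ValueError.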
Using the from_dict() Method
An alternative syntax that's more explicit:
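The same hypothetical data passed through DataFrame.from_dict(), which treats keys as columns by default (orient='columns'):

```python
import pandas as pd

# Hypothetical data: keys are column names
people = {"name": ["Alice", "Bob", "Carol"], "age": [30, 25, 35]}

df = pd.DataFrame.from_dict(people)

print(df.columns.tolist())  # ['name', 'age']
print(df.index.tolist())    # [0, 1, 2]
```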
Creating DataFrames with Custom Index
Use orient='index' when your dictionary keys should become row indices:
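A sketch with a hypothetical dictionary keyed by the integers 0, 1, and 3, where each value is one row. The column names here are assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical data: each key is a row label, each list is one row
rows = {0: ["Alice", 30], 1: ["Bob", 25], 3: ["Carol", 35]}

# The columns argument is accepted together with orient='index'
df = pd.DataFrame.from_dict(rows, orient="index", columns=["name", "age"])

print(df.index.tolist())   # [0, 1, 3]
print(df.loc[3, "name"])   # Carol
```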
Notice how the dictionary keys (0, 1, 3) become the row indices.
Assigning Column Names During Creation
Convert a simple dictionary to a DataFrame with explicit column names:
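One possible way to do this, assuming a flat dictionary of hypothetical fruit counts: turn the items into rows and name the two columns explicitly.

```python
import pandas as pd

# Hypothetical flat dictionary: one scalar value per key
counts = {"apple": 3, "banana": 5}

# Each (key, value) pair becomes one row with two columns
df = pd.DataFrame(list(counts.items()), columns=["fruit", "count"])

print(df.columns.tolist())  # ['fruit', 'count']
print(df.shape)             # (2, 2)
```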
Converting Strings to Integers
Type conversion between strings and integers is essential for data cleaning and parsing.
Basic String to Integer Conversion
By default, int() assumes base 10:
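A few illustrative calls with arbitrary sample strings:

```python
# Plain decimal strings
print(int("42"))    # 42
print(int("-17"))   # -17

# Leading zeros are accepted and ignored in base 10
print(int("0042"))  # 42

# Surrounding whitespace is stripped before parsing
print(int(" 12 "))  # 12
```

A string that is not a valid integer, such as "4.2" or "abc", raises a ValueError.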
Notice that leading zeros don't affect the value in base 10.
Converting Binary Strings (Base 2)
Parse binary numbers by specifying base 2:
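For example, with arbitrary sample strings:

```python
# "1010" in base 2 is 8 + 2 = 10
print(int("1010", 2))    # 10

# The optional "0b" prefix is accepted when base=2
print(int("0b1010", 2))  # 10

print(int("11111111", 2))  # 255
```

Digits outside 0 and 1, as in int("102", 2), raise a ValueError.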
Converting Octal Strings (Base 8)
Octal (base 8) uses digits 0-7:
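A short sketch with arbitrary sample strings:

```python
# "17" in base 8 is 1*8 + 7 = 15
print(int("17", 8))     # 15

# The optional "0o" prefix is accepted when base=8
print(int("0o17", 8))   # 15

# "755" in base 8 is 7*64 + 5*8 + 5 = 493
print(int("755", 8))    # 493
```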
Converting Hexadecimal Strings (Base 16)
Hexadecimal (base 16) uses digits 0-9 and letters A-F:
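For instance, with arbitrary sample strings:

```python
# Letters are case-insensitive
print(int("ff", 16))    # 255
print(int("FF", 16))    # 255

# The optional "0x" prefix is accepted when base=16
print(int("0x1A", 16))  # 26
```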
Converting Other Bases
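The int() function supports any base from 2 to 36, using the letters a-z for digit values 10-35. It also accepts base 0, which infers the base from a 0b, 0o, or 0x prefix: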
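A few illustrative calls with arbitrary sample strings:

```python
# Base 3: "21" is 2*3 + 1 = 7
print(int("21", 3))    # 7

# Base 36: "z" is the largest single digit (35)
print(int("z", 36))    # 35
print(int("zz", 36))   # 1295

# Base 0 infers the base from the prefix
print(int("0x10", 0))  # 16
print(int("0b10", 0))  # 2
```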
Converting Integers to Strings
Use str() to convert any number to its string representation:
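A minimal sketch with arbitrary sample numbers:

```python
count = 42
label = str(count)

print(label)        # 42
print(type(label))  # <class 'str'>

# Concatenation needs a string, so convert first
message = "Total: " + str(1_000_000)
print(message)      # Total: 1000000
```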
Converting Between Hexadecimal, Binary, and Floats
More advanced conversions involve hexadecimal and binary representations of numbers.
Converting Floats to Strings
Calling str() on a float produces the shortest decimal string that converts back to the same float:
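A short sketch with an arbitrary sample value, using an f-string with !r to make the resulting type visible:

```python
value = 3.14
text = str(value)

# !r applies repr(), so the string is printed with its quotes
print(f"{text!r}")  # '3.14'

# Converting back gives the original float
print(float(text) == value)  # True
```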
The !r in the f-string shows the repr() representation, which includes quotes for strings.
Converting Strings to Floats
Use float() to parse numeric strings with decimals:
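A few illustrative calls with arbitrary sample strings:

```python
print(float("3.14"))   # 3.14

# Scientific notation is accepted
print(float("1e-3"))   # 0.001

# Surrounding whitespace is stripped before parsing
print(float(" 2.5 "))  # 2.5

# Integer-looking strings still produce a float
print(float("7"))      # 7.0
```

As with int(), a string that cannot be parsed, such as "abc", raises a ValueError.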
Converting Integers to Hexadecimal
The hex() function converts integers to hexadecimal strings:
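For example, with arbitrary sample integers:

```python
print(hex(255))   # 0xff
print(hex(-255))  # -0xff

# format() with "x" gives the digits without the prefix
print(format(255, "x"))    # ff
print(format(255, "04X"))  # 00FF
```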
The '0x' prefix indicates hexadecimal notation.
Converting Floats to Hexadecimal
Floats have a .hex() method that returns their hexadecimal representation:
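A short sketch with arbitrary sample values, including the round trip through float.fromhex():

```python
half = 0.5
print(half.hex())   # 0x1.0000000000000p-1

tenth = 0.1
print(tenth.hex())  # 0x1.999999999999ap-4

# float.fromhex() recovers exactly the same float
print(float.fromhex(tenth.hex()) == tenth)  # True
```

The part after p is a power-of-two exponent, so 0x1.0000000000000p-1 means 1.0 × 2⁻¹.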
Unlike a rounded decimal string, this hexadecimal form encodes the float's binary value exactly, so it can be converted back without any loss using float.fromhex().
Converting Integers to Binary
The bin() function converts integers to binary strings:
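For example, with arbitrary sample integers:

```python
print(bin(10))  # 0b1010
print(bin(-5))  # -0b101

# format() with "b" drops the prefix; a width pads with zeros
print(format(10, "b"))   # 1010
print(format(5, "08b"))  # 00000101
```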
Working with Binary Data
Python has dedicated types for working with binary data: the immutable bytes type and its mutable counterpart, bytearray.
Creating Bytes Literals
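Bytes literals look like string literals with a b prefix. They may contain only ASCII characters directly; any other byte value must be written as an escape such as \xff: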
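A minimal sketch with arbitrary sample data, showing both a bytes literal and a mutable bytearray built from it:

```python
data = b"hello"
print(type(data))  # <class 'bytes'>

# Indexing a bytes object yields an integer, not a character
first = data[0]
print(first)       # 104

# Non-ASCII byte values are written with \x escapes
raw = b"caf\xc3\xa9"
print(len(raw))    # 5

# bytes is immutable; bytearray allows in-place changes
buf = bytearray(data)
buf[0] = ord("j")
print(bytes(buf))  # b'jello'
```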
Encoding Strings to Base64
Base64 encoding converts binary data to ASCII text. Because b64encode() works on bytes, encode a string to bytes first:
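A short sketch using the standard-library base64 module and an arbitrary sample string:

```python
import base64

text = "hello"

# b64encode() takes bytes and returns bytes
encoded = base64.b64encode(text.encode("utf-8"))
print(encoded)  # b'aGVsbG8='

# Decode to str if a text value is needed
print(encoded.decode("ascii"))  # aGVsbG8=
```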
Base64 is commonly used for transmitting binary data over text-based protocols.
Decoding Base64 to Bytes
Reverse the encoding with b64decode():
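For example, decoding an arbitrary Base64 sample value:

```python
import base64

encoded = b"aGVsbG8="

# b64decode() returns the original bytes
decoded = base64.b64decode(encoded)
print(decoded)  # b'hello'

# A str input is also accepted
print(base64.b64decode("aGVsbG8="))  # b'hello'
```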
Converting Bytes to Strings
Use the .decode() method to convert bytes to strings:
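A minimal sketch, assuming the bytes hold UTF-8 encoded text:

```python
# UTF-8 bytes for "café": the é takes two bytes
raw = b"caf\xc3\xa9"

text = raw.decode("utf-8")
print(text)       # café
print(len(text))  # 4
```

Decoding with the wrong codec can raise a UnicodeDecodeError or silently produce the wrong characters, so it helps to state the encoding explicitly.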
Converting Strings to Bytes
Use the .encode() method to convert strings to bytes:
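The reverse direction, again with an arbitrary sample string and UTF-8 as the assumed encoding:

```python
text = "café"

raw = text.encode("utf-8")
print(raw)       # b'caf\xc3\xa9'
print(len(raw))  # 5

# The round trip returns the original string
print(raw.decode("utf-8") == text)  # True
```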
Practical Data Conversion Examples
Let's put these conversions together in real-world scenarios.
Parsing Configuration Data
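As one possible illustration, assume configuration arrives as hypothetical key=value strings. The keys, values, and variable names below are invented for this sketch; it combines the list-to-dictionary and string-to-number recipes from this chapter.

```python
# Hypothetical raw config lines; every value starts out as a string
raw_lines = ["host=localhost", "port=8080", "timeout=2.5", "mode=0o755"]

# Split each line once on "=" and build a dictionary from the pairs
config = dict(line.split("=", 1) for line in raw_lines)

# Convert each value to the type the program needs
port = int(config["port"])           # base 10
timeout = float(config["timeout"])
mode = int(config["mode"], 8)        # octal, "0o" prefix allowed

print(port, timeout, mode)  # 8080 2.5 493
```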
Building DataFrames from Multiple Sources
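A sketch under assumed inputs: two hypothetical parallel lists (with ages still stored as strings) and a separate dictionary of scores are combined into one DataFrame. All names and values are invented for illustration, and pandas is assumed to be installed.

```python
import pandas as pd

# Hypothetical sources
names = ["alice", "bob"]
ages = ["30", "25"]                      # strings, e.g. read from a text file
scores = {"alice": 88.5, "bob": 92.0}    # lookup table from another source

# Build one record per name, converting types along the way
records = {
    name: {"age": int(age), "score": scores[name]}
    for name, age in zip(names, ages)
}

# Outer keys become the row index, inner keys become columns
df = pd.DataFrame.from_dict(records, orient="index")

print(df.index.tolist())    # ['alice', 'bob']
print(df.columns.tolist())  # ['age', 'score']
```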
Number System Conversions
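One way to tie the integer recipes together is a small helper that parses a string in one base and reports it in several others. The function name `convert` and its return format are assumptions made for this sketch.

```python
def convert(text, from_base):
    """Parse text in from_base and return it in common notations."""
    n = int(text, from_base)
    return {
        "decimal": str(n),
        "binary": bin(n),
        "octal": oct(n),
        "hex": hex(n),
    }

result = convert("ff", 16)
print(result)
# {'decimal': '255', 'binary': '0b11111111', 'octal': '0o377', 'hex': '0xff'}
```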
Key Takeaways
- List to Dict: Use dict() with tuples, zip() for parallel lists, or fromkeys() for defaults
- Dict to List: Use list() for keys, .values() for values, .items() for pairs
- Dict to DataFrame: Use DataFrame(data=dict) or DataFrame.from_dict() with the orient parameter
- String to Int: Use int(string, base) where base defaults to 10
- Number Conversions: Use hex(), bin(), oct() for integers; .hex() for floats
- Binary Data: Use bytes literals (b""), base64 encoding, and .encode()/.decode()
These conversion patterns form the foundation of data preprocessing in Python. Master them, and you'll be able to transform data between any format needed for your analysis.