
# Introduction
Working with JSON in Python is often challenging. The basic json.loads() only gets you so far.
API responses, configuration files, and data exports often contain JSON that’s messy or poorly structured. You need to flatten nested objects, safely extract values without KeyError exceptions, merge multiple JSON files, or convert between JSON and other formats. These tasks come up constantly in web scraping, API integration, and data processing. This article walks you through five practical functions for handling common JSON parsing and processing tasks.
You can find the code for these functions on GitHub.
# 1. Safely Extracting Nested Values
JSON objects often nest several levels deep. Accessing deeply nested values with bracket notation gets tricky fast. If any key is missing, you get a KeyError.
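To make the problem concrete, here is a small sketch with a made-up `response` dictionary (not from the original code) showing the KeyError and the usual chained `.get()` workaround:

```python
# Hypothetical response where the "email" field is absent
response = {"user": {"profile": {}}}

try:
    email = response["user"]["profile"]["email"]
except KeyError as exc:
    print(f"Missing key: {exc}")   # Missing key: 'email'

# Chained .get() calls avoid the exception but grow with every level
email = response.get("user", {}).get("profile", {}).get("email")
print(email)                        # None
```

The chained form works, but it becomes hard to read once paths are three or four levels deep.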
Here is a function that lets you access nested values using dot notation, with a fallback for missing keys:
def get_nested_value(data, path, default=None):
    """
    Safely extract nested values from JSON using dot notation.

    Args:
        data: Dictionary or JSON object
        path: Dot-separated string like "user.profile.email"
        default: Value to return if path does not exist

    Returns:
        The value at the path, or default if not found
    """
    keys = path.split('.')
    current = data

    for key in keys:
        if isinstance(current, dict):
            current = current.get(key)
            if current is None:
                return default
        elif isinstance(current, list):
            try:
                index = int(key)
                current = current[index]
            except (ValueError, IndexError):
                return default
        else:
            return default

    return current
Let’s test it with a complex nested structure:
# Sample JSON data
user_data = {
    "user": {
        "id": 123,
        "profile": {
            "name": "Allie",
            "email": "allie@example.com",
            "settings": {
                "theme": "dark",
                "notifications": True
            }
        },
        "posts": [
            {"id": 1, "title": "First Post"},
            {"id": 2, "title": "Second Post"}
        ]
    }
}

# Extract values
email = get_nested_value(user_data, "user.profile.email")
theme = get_nested_value(user_data, "user.profile.settings.theme")
first_post = get_nested_value(user_data, "user.posts.0.title")
missing = get_nested_value(user_data, "user.profile.age", default=25)

print(f"Email: {email}")
print(f"Theme: {theme}")
print(f"First post: {first_post}")
print(f"Age (default): {missing}")
Output:
Email: allie@example.com
Theme: dark
First post: First Post
Age (default): 25
The function splits the path string on dots and walks through the data structure one key at a time. At each level, it checks whether the current value is a dictionary or a list. For dictionaries, it uses .get(key), which returns None for missing keys instead of raising an error. For lists, it tries to convert the key to an integer index. Because a None result is treated as missing, a key whose value is explicitly null also yields the default.
The default parameter provides a fallback when any part of the path doesn’t exist. This prevents your code from crashing when dealing with incomplete or inconsistent JSON data from APIs.
This pattern is especially useful when processing API responses where some fields are optional or only present under certain conditions.
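One edge case is worth knowing: the function treats a None returned by .get() as "missing", so a key that is explicitly set to null also falls back to the default. If that distinction matters in your data, a minimal sketch of a variant using a sentinel object could look like this. The names `get_nested_strict` and `_MISSING` are hypothetical and not part of the original code:

```python
_MISSING = object()  # hypothetical sentinel meaning "key not present"


def get_nested_strict(data, path, default=None):
    """Like get_nested_value, but an explicit null (None) is returned as-is."""
    current = data
    for key in path.split('.'):
        if isinstance(current, dict):
            current = current.get(key, _MISSING)
        elif isinstance(current, list):
            try:
                current = current[int(key)]
            except (ValueError, IndexError):
                current = _MISSING
        else:
            current = _MISSING
        if current is _MISSING:
            return default
    return current


record = {"user": {"nickname": None, "tags": ["a", "b"]}}

print(get_nested_strict(record, "user.nickname", default="n/a"))  # None
print(get_nested_strict(record, "user.age", default="n/a"))       # n/a
print(get_nested_strict(record, "user.tags.1"))                   # b
print(get_nested_strict(record, "user.tags.5", default="n/a"))    # n/a
```

Whether null should count as missing depends on the API you are consuming, so treat this as one possible design rather than a replacement.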
# 2. Flattening Nested JSON into Single-Level Dictionaries
Machine learning models, CSV exports, and database inserts often need flat data structures. But API responses and configuration files use nested JSON. Converting nested objects to flat key-value pairs is a common task.
Here is a function that flattens nested JSON with customizable separators:
def flatten_json(data, parent_key='', separator="_"):
    """
    Flatten nested JSON into a single-level dictionary.

    Args:
        data: Nested dictionary or JSON object
        parent_key: Prefix for keys (used in recursion)
        separator: String to join nested keys

    Returns:
        Flattened dictionary with concatenated keys
    """
    items = []

    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}{separator}{key}" if parent_key else key

            if isinstance(value, dict):
                # Recursively flatten nested dicts
                items.extend(flatten_json(value, new_key, separator).items())
            elif isinstance(value, list):
                # Flatten lists with indexed keys
                for i, item in enumerate(value):
                    list_key = f"{new_key}{separator}{i}"
                    if isinstance(item, (dict, list)):
                        items.extend(flatten_json(item, list_key, separator).items())
                    else:
                        items.append((list_key, item))
            else:
                items.append((new_key, value))
    else:
        items.append((parent_key, data))

    return dict(items)
Now let’s flatten a complex nested structure:
# Complex nested JSON
product_data = {
    "product": {
        "id": 456,
        "name": "Laptop",
        "specs": {
            "cpu": "Intel i7",
            "ram": "16GB",
            "storage": {
                "type": "SSD",
                "capacity": "512GB"
            }
        },
        "reviews": [
            {"rating": 5, "comment": "Excellent"},
            {"rating": 4, "comment": "Good value"}
        ]
    }
}

flattened = flatten_json(product_data)

for key, value in flattened.items():
    print(f"{key}: {value}")
Output:
product_id: 456
product_name: Laptop
product_specs_cpu: Intel i7
product_specs_ram: 16GB
product_specs_storage_type: SSD
product_specs_storage_capacity: 512GB
product_reviews_0_rating: 5
product_reviews_0_comment: Excellent
product_reviews_1_rating: 4
product_reviews_1_comment: Good value
The function uses recursion to handle arbitrary nesting depth. When it encounters a dictionary, it processes each key-value pair, building up the flattened key by concatenating parent keys with the separator.
For lists, it uses the index as part of the key. This lets you preserve the order and structure of array elements in the flattened output. The pattern reviews_0_rating tells you this is the rating from the first review.
The separator parameter lets you customize the output format. Use dots for dot notation, underscores for snake_case, or slashes for path-like keys depending on your needs.
This function is particularly useful when you need to convert JSON API responses into dataframes or CSV rows where each column needs a unique name.
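As a rough illustration of the CSV use case, the sketch below flattens a couple of made-up records with a dot separator and writes them with the standard library's csv.DictWriter. The `flatten_json` here is a condensed restatement so the snippet runs on its own; as an assumption of this sketch, it also descends into top-level lists and lists nested directly inside lists, which the version above leaves as-is:

```python
import csv
import io


def flatten_json(data, parent_key='', separator='_'):
    """Condensed flatten helper for this demo (see assumptions in the text)."""
    if isinstance(data, dict):
        pairs = data.items()
    elif isinstance(data, list):
        pairs = enumerate(data)
    else:
        return {parent_key: data}
    items = {}
    for key, value in pairs:
        new_key = f"{parent_key}{separator}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)):
            items.update(flatten_json(value, new_key, separator))
        else:
            items[new_key] = value
    return items


# Hypothetical records with slightly different shapes
records = [
    {"id": 1, "specs": {"cpu": "Intel i7", "ram": "16GB"}},
    {"id": 2, "specs": {"cpu": "Ryzen 7"}, "tags": ["sale"]},
]

rows = [flatten_json(record, separator='.') for record in records]

# Union of all keys in first-seen order, so every row shares one header
fieldnames = list(dict.fromkeys(key for row in rows for key in row))

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Building the header from the union of keys matters because records from real APIs rarely share exactly the same fields; `restval=''` fills the gaps with empty cells.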
# 3. Deep Merging Multiple JSON Objects
Configuration management often requires merging multiple JSON files containing default settings, environment-specific configs, user preferences, and more. A simple dict.update() only handles the top level. You need deep merging that recursively combines nested structures.
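To see the limitation concretely, here is a short sketch with two made-up dictionaries. With dict.update(), the nested "database" dictionary is replaced wholesale, so the default port is lost:

```python
# Hypothetical configs for illustration
defaults = {"database": {"host": "localhost", "port": 5432}}
overrides = {"database": {"host": "prod-db.example.com"}}

shallow = defaults.copy()
shallow.update(overrides)

print(shallow)
# {'database': {'host': 'prod-db.example.com'}}  -> "port" is gone
```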
Here is a function that deep merges JSON objects:
def deep_merge_json(base, override):
    """
    Deep merge two JSON objects, with override taking precedence.

    Args:
        base: Base dictionary
        override: Dictionary with values to override/add

    Returns:
        New dictionary with merged values
    """
    result = base.copy()

    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            # Recursively merge nested dictionaries
            result[key] = deep_merge_json(result[key], value)
        else:
            # Override or add the value
            result[key] = value

    return result
Let’s try merging sample configuration data:
import json

# Default configuration
default_config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "timeout": 30,
        "pool": {
            "min": 2,
            "max": 10
        }
    },
    "cache": {
        "enabled": True,
        "ttl": 300
    },
    "logging": {
        "level": "INFO"
    }
}

# Production overrides
prod_config = {
    "database": {
        "host": "prod-db.example.com",
        "pool": {
            "min": 5,
            "max": 50
        }
    },
    "cache": {
        "ttl": 600
    },
    "monitoring": {
        "enabled": True
    }
}

merged = deep_merge_json(default_config, prod_config)
print(json.dumps(merged, indent=2))
Output:
{
  "database": {
    "host": "prod-db.example.com",
    "port": 5432,
    "timeout": 30,
    "pool": {
      "min": 5,
      "max": 50
    }
  },
  "cache": {
    "enabled": true,
    "ttl": 600
  },
  "logging": {
    "level": "INFO"
  },
  "monitoring": {
    "enabled": true
  }
}
The function recursively merges nested dictionaries. When both the base and the override contain dictionaries at the same key, it merges those dictionaries instead of replacing them entirely. This preserves values that aren’t explicitly overridden.
Notice how database.port and database.timeout remain from the default configuration, while database.host gets overridden. The pool settings merge at the nested level, so min and max both get updated.
The function also adds new keys that don’t exist in the base config, like the monitoring section in the production override.
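One detail to keep in mind: base.copy() is a shallow copy, so any nested dictionary that the override does not touch is shared between the base and the merged result. Mutating the merged config later can then change the defaults too. If that is a concern, a hedged sketch of a variant built on copy.deepcopy could look like this (`deep_merge_copy` is a hypothetical name, and the repeated deep copies trade some speed for isolation):

```python
import copy


def deep_merge_copy(base, override):
    """Hypothetical variant: the result never shares nested objects with its inputs."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge_copy(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


defaults = {"logging": {"level": "INFO"}, "cache": {"ttl": 300}}
merged = deep_merge_copy(defaults, {"cache": {"ttl": 600}})

merged["logging"]["level"] = "DEBUG"   # mutate the merged result
print(defaults["logging"]["level"])    # INFO (the defaults are untouched)
print(merged["cache"]["ttl"])          # 600
```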
You can chain multiple merges to layer configurations:
final_config = deep_merge_json(
    deep_merge_json(default_config, prod_config),
    user_preferences
)
This pattern is common in application configuration where you have defaults, environment-specific settings, and runtime overrides.
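When there are more than two or three layers, nesting the calls gets awkward. One option is functools.reduce from the standard library, which applies the merge left to right over a list of layers. The sketch below repeats the merge function so it runs on its own; the three configs, including `user_preferences`, are made-up placeholders:

```python
from functools import reduce


def deep_merge_json(base, override):
    """Same merge logic as above, repeated so this snippet is self-contained."""
    result = base.copy()
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge_json(result[key], value)
        else:
            result[key] = value
    return result


# Hypothetical layers, lowest priority first
default_config = {"database": {"host": "localhost", "port": 5432}, "logging": {"level": "INFO"}}
prod_config = {"database": {"host": "prod-db.example.com"}}
user_preferences = {"logging": {"level": "DEBUG"}}

layers = [default_config, prod_config, user_preferences]
final_config = reduce(deep_merge_json, layers, {})

print(final_config)
# {'database': {'host': 'prod-db.example.com', 'port': 5432}, 'logging': {'level': 'DEBUG'}}
```

Later entries in `layers` win, so the list order doubles as the priority order.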
# 4. Filtering JSON by Schema or Whitelist
APIs often return more data than you need. Large JSON responses make your code harder to read. Sometimes you only want specific fields, or you need to remove sensitive data before logging.
Here is a function that filters JSON to keep only specified fields:
def filter_json(data, schema):
    """
    Filter JSON to keep only fields specified in schema.

    Args:
        data: Dictionary or JSON object to filter
        schema: Dictionary defining which fields to keep
                Use True to keep a field, nested dict for nested filtering

    Returns:
        Filtered dictionary containing only specified fields
    """
    if not isinstance(data, dict) or not isinstance(schema, dict):
        return data

    result = {}

    for key, value in schema.items():
        if key not in data:
            continue

        if value is True:
            # Keep this field as-is
            result[key] = data[key]
        elif isinstance(value, dict):
            # Recursively filter nested object
            if isinstance(data[key], dict):
                filtered_nested = filter_json(data[key], value)
                if filtered_nested:
                    result[key] = filtered_nested
            elif isinstance(data[key], list):
                # Filter each item in the list
                filtered_list = []
                for item in data[key]:
                    if isinstance(item, dict):
                        filtered_item = filter_json(item, value)
                        if filtered_item:
                            filtered_list.append(filtered_item)
                    else:
                        filtered_list.append(item)
                if filtered_list:
                    result[key] = filtered_list

    return result
Let’s filter a sample API response:
import json

# Sample API response
api_response = {
    "user": {
        "id": 789,
        "username": "Cayla",
        "email": "cayla@example.com",
        "password_hash": "secret123",
        "profile": {
            "name": "Cayla Smith",
            "bio": "Software developer",
            "avatar_url": "https://example.com/avatar.jpg",
            "private_notes": "Internal notes"
        },
        "posts": [
            {
                "id": 1,
                "title": "Hello World",
                "content": "My first post",
                "views": 100,
                "internal_score": 0.85
            },
            {
                "id": 2,
                "title": "Python Tips",
                "content": "Some tips",
                "views": 250,
                "internal_score": 0.92
            }
        ]
    },
    "metadata": {
        "request_id": "abc123",
        "server": "web-01"
    }
}

# Schema defining what to keep
public_schema = {
    "user": {
        "id": True,
        "username": True,
        "profile": {
            "name": True,
            "avatar_url": True
        },
        "posts": {
            "id": True,
            "title": True,
            "views": True
        }
    }
}

filtered = filter_json(api_response, public_schema)
print(json.dumps(filtered, indent=2))
Output:
{
  "user": {
    "id": 789,
    "username": "Cayla",
    "profile": {
      "name": "Cayla Smith",
      "avatar_url": "https://example.com/avatar.jpg"
    },
    "posts": [
      {
        "id": 1,
        "title": "Hello World",
        "views": 100
      },
      {
        "id": 2,
        "title": "Python Tips",
        "views": 250
      }
    ]
  }
}
The schema acts as a whitelist. Setting a field to True includes it in the output. Using a nested dictionary lets you filter nested objects. The function recursively applies the schema to nested structures.
For arrays, the schema applies to each item. In the example, the posts array gets filtered so each post only includes id, title, and views, while content and internal_score are excluded.
Notice how sensitive fields like password_hash and private_notes don’t appear in the output. This makes the function useful for sanitizing data before logging or sending it to frontend applications.
You can create different schemas for different use cases, such as a minimal schema for list views, a detailed schema for single-item views, and an admin schema that includes everything.
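As a sketch of the multiple-schema idea, the snippet below keeps two schemas in a dictionary and applies each to the same record. The `SCHEMAS` mapping and the `post` record are made up for illustration, and `filter_json` is restated in a condensed form (intended to behave like the version above) so the example runs on its own:

```python
def filter_json(data, schema):
    """Condensed restatement of the whitelist filter above."""
    if not isinstance(data, dict) or not isinstance(schema, dict):
        return data
    result = {}
    for key, rule in schema.items():
        if key not in data:
            continue
        value = data[key]
        if rule is True:
            result[key] = value
        elif isinstance(rule, dict):
            if isinstance(value, dict):
                kept = filter_json(value, rule)
            elif isinstance(value, list):
                kept = [filter_json(item, rule) for item in value]
                # Drop dict items that ended up empty; keep non-dict items
                kept = [item for item in kept if item or not isinstance(item, dict)]
            else:
                continue
            if kept:
                result[key] = kept
    return result


# Hypothetical record and per-view schemas
post = {
    "id": 1,
    "title": "Hello World",
    "content": "My first post",
    "author": {"username": "Cayla", "email": "cayla@example.com"},
    "internal_score": 0.85,
}

SCHEMAS = {
    "list": {"id": True, "title": True},
    "detail": {"id": True, "title": True, "content": True, "author": {"username": True}},
}

print(filter_json(post, SCHEMAS["list"]))
# {'id': 1, 'title': 'Hello World'}
print(filter_json(post, SCHEMAS["detail"]))
# {'id': 1, 'title': 'Hello World', 'content': 'My first post', 'author': {'username': 'Cayla'}}
```

Keeping the schemas as plain data means they can live in a config file and be reviewed separately from the code that applies them.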
# 5. Converting JSON to and from Dot Notation
Some systems use flat key-value stores, but you want to work with nested JSON in your code. Converting between flat dot-notation keys and nested structures helps achieve this.
Here is a pair of functions for bidirectional conversion.
## Converting JSON to Dot Notation
def json_to_dot_notation(data, parent_key=''):
    """
    Convert nested JSON to flat dot-notation dictionary.

    Args:
        data: Nested dictionary
        parent_key: Prefix for keys (used in recursion)

    Returns:
        Flat dictionary with dot-notation keys
    """
    items = {}

    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}.{key}" if parent_key else key

            if isinstance(value, dict):
                items.update(json_to_dot_notation(value, new_key))
            else:
                items[new_key] = value
    else:
        items[parent_key] = data

    return items
## Converting Dot Notation to JSON
def dot_notation_to_json(flat_data):
    """
    Convert flat dot-notation dictionary to nested JSON.

    Args:
        flat_data: Dictionary with dot-notation keys

    Returns:
        Nested dictionary
    """
    result = {}

    for key, value in flat_data.items():
        parts = key.split('.')
        current = result

        # Create intermediate dictionaries for every part except the last
        for part in parts[:-1]:
            if part not in current:
                current[part] = {}
            current = current[part]

        current[parts[-1]] = value

    return result
Let’s test the round-trip conversion:
import json

# Original nested JSON
config = {
    "app": {
        "name": "MyApp",
        "version": "1.0.0"
    },
    "database": {
        "host": "localhost",
        "credentials": {
            "username": "admin",
            "password": "secret"
        }
    },
    "features": {
        "analytics": True,
        "notifications": False
    }
}

# Convert to dot notation (for environment variables)
flat = json_to_dot_notation(config)
print("Flat format:")
for key, value in flat.items():
    print(f" {key} = {value}")

print("\n" + "="*50 + "\n")

# Convert back to nested JSON
nested = dot_notation_to_json(flat)
print("Nested format:")
print(json.dumps(nested, indent=2))
Output:
Flat format:
 app.name = MyApp
 app.version = 1.0.0
 database.host = localhost
 database.credentials.username = admin
 database.credentials.password = secret
 features.analytics = True
 features.notifications = False

==================================================

Nested format:
{
  "app": {
    "name": "MyApp",
    "version": "1.0.0"
  },
  "database": {
    "host": "localhost",
    "credentials": {
      "username": "admin",
      "password": "secret"
    }
  },
  "features": {
    "analytics": true,
    "notifications": false
  }
}
The json_to_dot_notation function flattens the structure by recursively walking through nested dictionaries and joining keys with dots. Unlike the earlier flatten function, this one does not expand arrays (a list is stored as a single value); it’s geared toward configuration data that is purely key-value.
The dot_notation_to_json function reverses the process. It splits each key on dots and builds up the nested structure by creating intermediate dictionaries as needed. The loop handles all parts except the last one, creating nesting levels. Then it assigns the value to the final key.
This approach keeps your configuration readable and maintainable while working within the constraints of flat key-value systems.
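The round trip is lossless for typical configuration data, but two cases are worth checking before relying on it. The sketch below restates both functions compactly (using dict.setdefault, which is equivalent to the explicit check above) and then shows a clean round trip alongside two inputs that do not survive it. The sample dictionaries are made up for illustration:

```python
def json_to_dot_notation(data, parent_key=''):
    """Compact restatement of the flattening function above."""
    items = {}
    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}.{key}" if parent_key else key
            if isinstance(value, dict):
                items.update(json_to_dot_notation(value, new_key))
            else:
                items[new_key] = value
    else:
        items[parent_key] = data
    return items


def dot_notation_to_json(flat_data):
    """Compact restatement of the un-flattening function above."""
    result = {}
    for key, value in flat_data.items():
        parts = key.split('.')
        current = result
        for part in parts[:-1]:
            current = current.setdefault(part, {})
        current[parts[-1]] = value
    return result


config = {"app": {"name": "MyApp"}, "features": {"analytics": True}}
round_trip = dot_notation_to_json(json_to_dot_notation(config))
print(round_trip == config)                              # True

# Caveat 1: an empty dictionary produces no flat keys, so it vanishes
print(json_to_dot_notation({"plugins": {}}))             # {}

# Caveat 2: a key that already contains a dot is split on the way back
tricky = {"hosts": {"example.com": 1}}
print(dot_notation_to_json(json_to_dot_notation(tricky)))
# {'hosts': {'example': {'com': 1}}}
```

If your keys can contain dots, choosing a separator that never appears in them (or escaping the dots) would be one way to avoid the second problem.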
# Wrapping Up
JSON processing goes beyond basic json.loads(). In most projects, you will need tools to navigate nested structures, transform shapes, merge configurations, filter fields, and convert between formats.
The techniques in this article transfer to other data processing tasks as well. You can adapt these patterns for XML, YAML, or custom data formats.
Start with the safe access function to prevent KeyError exceptions in your code. Add the others as you run into specific needs. Happy coding!
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.
