Pandas JSON

JSON（ J ava S cript O bject N otation，JavaScript 对象表示法），是存储和交换文本信息的语法，类似 XML。

JSON 比 XML 更小、更快，更易解析，更多 JSON 内容可以参考 JSON 教程。

Pandas 可以很方便的处理 JSON 数据，本文以 sites.json 为例，内容如下：

实例

[ { " id " : " A001 " , " name " : " 云搜索MX教程 " , " url " : " www.yssmx.com " , " likes " : 61 } , { " id " : " A002 " , " name " : " Google " , " url " : " www.google.com " , " likes " : 124 } , { " id " : " A003 " , " name " : " 淘宝 " , " url " : " www.taobao.com " , " likes " : 45 } ]

实例

import pandas as pd

df = pd. read_json ( 'sites.json' )

print ( df. to_string ( ) )

to_string() 用于返回 DataFrame 类型的数据，我们也可以直接处理 JSON 字符串。

实例

import pandas as pd

data = [
{
"id" : "A001" ,
"name" : "云搜索MX教程" ,
"url" : "www.yssmx.com" ,
"likes" : 61
} ,
{
"id" : "A002" ,
"name" : "Google" ,
"url" : "www.google.com" ,
"likes" : 124
} ,
{
"id" : "A003" ,
"name" : "淘宝" ,
"url" : "www.taobao.com" ,
"likes" : 45
}
]
df = pd. DataFrame ( data )

print ( df )

以上实例输出结果为：

									id    name             url  likes
0  A001    云搜索MX教程  www.yssmx.com     61
1  A002  Google  www.google.com    124
2  A003      淘宝  www.taobao.com     45

JSON 对象与 Python 字典具有相同的格式，所以我们可以直接将 Python 字典转化为 DataFrame 数据：

实例

import pandas as pd

# 字典格式的 JSON
s = {
"col1" : { "row1" : 1 , "row2" : 2 , "row3" : 3 } ,
"col2" : { "row1" : "x" , "row2" : "y" , "row3" : "z" }
}

# 读取 JSON 转为 DataFrame
df = pd. DataFrame ( s )
print ( df )

以上实例输出结果为：

									col1 col2
row1     1    x
row2     2    y
row3     3    z

从 URL 中读取 JSON 数据：

实例

import pandas as pd

URL = 'https://static.jyshare.com/download/sites.json'
df = pd. read_json ( URL )
print ( df )

以上实例输出结果为：

									id    name             url  likes
0  A001    云搜索MX教程  www.yssmx.com     61
1  A002  Google  www.google.com    124
2  A003      淘宝  www.taobao.com     45

内嵌的 JSON 数据

假设有一组内嵌的 JSON 数据文件 nested_list.json ：

nested_list.json 文件内容

{
"school_name" : "ABC primary school" ,
"class" : "Year 1" ,
"students" : [
{
"id" : "A001" ,
"name" : "Tom" ,
"math" : 60 ,
"physics" : 66 ,
"chemistry" : 61
} ,
{
"id" : "A002" ,
"name" : "James" ,
"math" : 89 ,
"physics" : 76 ,
"chemistry" : 51
} ,
{
"id" : "A003" ,
"name" : "Jenny" ,
"math" : 79 ,
"physics" : 90 ,
"chemistry" : 78
} ]
}

使用以下代码格式化完整内容：

实例

import pandas as pd

df = pd. read_json ( 'nested_list.json' )

print ( df )

以上实例输出结果为：

									school_name   class                                           students
0  ABC primary school  Year 1  {'id': 'A001', 'name': 'Tom', 'math': 60, 'phy...
1  ABC primary school  Year 1  {'id': 'A002', 'name': 'James', 'math': 89, 'p...
2  ABC primary school  Year 1  {'id': 'A003', 'name': 'Jenny', 'math': 79, 'p...

这时我们就需要使用到 json_normalize() 方法将内嵌的数据完整的解析出来：

实例

import pandas as pd
import json

# 使用 Python JSON 模块载入数据
with open ( 'nested_list.json' , 'r' ) as f:
data = json. loads ( f. read ( ) )

# 展平数据
df_nested_list = pd. json_normalize ( data , record_path = [ 'students' ] )
print ( df_nested_list )

以上实例输出结果为：

									id   name  math  physics  chemistry
0  A001    Tom    60       66         61
1  A002  James    89       76         51
2  A003  Jenny    79       90         78

data = json.loads(f.read()) 使用 Python JSON 模块载入数据。

json_normalize() 使用了参数 record_path 并设置为 ['students'] 用于展开内嵌的 JSON 数据 students 。

显示结果还没有包含 school_name 和 class 元素，如果需要展示出来可以使用 meta 参数来显示这些元数据：

实例

import pandas as pd
import json

# 使用 Python JSON 模块载入数据
with open ( 'nested_list.json' , 'r' ) as f:
data = json. loads ( f. read ( ) )

# 展平数据
df_nested_list = pd. json_normalize (
data ,
record_path = [ 'students' ] ,
meta = [ 'school_name' , 'class' ]
)
print ( df_nested_list )

以上实例输出结果为：

									id   name  math  physics  chemistry         school_name   class
0  A001    Tom    60       66         61  ABC primary school  Year 1
1  A002  James    89       76         51  ABC primary school  Year 1
2  A003  Jenny    79       90         78  ABC primary school  Year 1

接下来，让我们尝试读取更复杂的 JSON 数据，该数据嵌套了列表和字典，数据文件 nested_mix.json 如下：

nested_mix.json 文件内容

{
"school_name" : "local primary school" ,
"class" : "Year 1" ,
"info" : {
"president" : "John Kasich" ,
"address" : "ABC road, London, UK" ,
"contacts" : {
"email" : "[email protected]" ,
"tel" : "123456789"
}
} ,
"students" : [
{
"id" : "A001" ,
"name" : "Tom" ,
"math" : 60 ,
"physics" : 66 ,
"chemistry" : 61
} ,
{
"id" : "A002" ,
"name" : "James" ,
"math" : 89 ,
"physics" : 76 ,
"chemistry" : 51
} ,
{
"id" : "A003" ,
"name" : "Jenny" ,
"math" : 79 ,
"physics" : 90 ,
"chemistry" : 78
} ]
}

nested_mix.json 文件转换为 DataFrame：

实例

import pandas as pd
import json

# 使用 Python JSON 模块载入数据
with open ( 'nested_mix.json' , 'r' ) as f:
data = json. loads ( f. read ( ) )

df = pd. json_normalize (
data ,
record_path = [ 'students' ] ,
meta = [
'class' ,
[ 'info' , 'president' ] ,
[ 'info' , 'contacts' , 'tel' ]
]
)

print ( df )

以上实例输出结果为：

									id   name  math  physics  chemistry   class info.president info.contacts.tel
0  A001    Tom    60       66         61  Year 1    John Kasich         123456789
1  A002  James    89       76         51  Year 1    John Kasich         123456789
2  A003  Jenny    79       90         78  Year 1    John Kasich         123456789

读取内嵌数据中的一组数据

以下是实例文件 nested_deep.json ，我们只读取内嵌中的 math 字段：

nested_deep.json 文件内容

{
"school_name" : "local primary school" ,
"class" : "Year 1" ,
"students" : [
{
"id" : "A001" ,
"name" : "Tom" ,
"grade" : {
"math" : 60 ,
"physics" : 66 ,
"chemistry" : 61
}

} ,
{
"id" : "A002" ,
"name" : "James" ,
"grade" : {
"math" : 89 ,
"physics" : 76 ,
"chemistry" : 51
}

} ,
{
"id" : "A003" ,
"name" : "Jenny" ,
"grade" : {
"math" : 79 ,
"physics" : 90 ,
"chemistry" : 78
}
} ]
}

这里我们需要使用到 glom 模块来处理数据套嵌， glom 模块允许我们使用 . 来访问内嵌对象的属性。

第一次使用我们需要安装 glom：

									pip3 install glom

实例

import pandas as pd
from glom import glom

df = pd. read_json ( 'nested_deep.json' )

data = df [ 'students' ] . apply ( lambda row: glom ( row , 'grade.math' ) )
print ( data )

以上实例输出结果为：

									0    60
1    89
2    79
Name: students, dtype: int64

云搜索MX教程 -- 学的不仅是技术，更是梦想！

Pandas JSON

实例

实例

实例

实例

实例

内嵌的 JSON 数据

nested_list.json 文件内容

实例

实例

实例

nested_mix.json 文件内容

实例

读取内嵌数据中的一组数据

nested_deep.json 文件内容

实例

微信关注