
Enumerating Elements with Python's xml Module

I tried enumerating the elements of an XML document with Python's built-in xml module and with the defusedxml module.

Getting elements from an ElementTree object
Example using ElementTree
Getting elements from an Element object
Example using Element

Getting elements from an ElementTree object

The iter method of ElementTree returns an iterator that yields every element in the ElementTree, in document order, starting with the root.

import defusedxml.ElementTree as ET
# import xml.etree.ElementTree as ET # when using the built-in xml module

iter = tree.iter([tag])

Variable	Type	Description
tree	ElementTree	The ElementTree object to search.
tag	str	Optional; defaults to None. The tag to look for.
iter		Iterator that yields the elements with the specified tag.

When tag is None, the elements of every tag from the root of the ElementTree downward are included.
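
As a small self-contained illustration of the tag parameter, the sketch below builds a tree from an in-memory string instead of a file. The sample document and the variable names are made up for this illustration, and it uses the standard-library xml.etree.ElementTree so that it runs without installing defusedxml; the iter call should look the same with either import.

```python
import xml.etree.ElementTree as ET  # standard-library stand-in for defusedxml.ElementTree

# Hypothetical sample document (not the article's test.xml)
root = ET.fromstring("<a><b><c/></b><c/></a>")
tree = ET.ElementTree(root)

# tag omitted (None): every element, in document order, starting with the root
all_tags = [e.tag for e in tree.iter()]

# tag given: only the elements whose tag matches
c_tags = [e.tag for e in tree.iter("c")]

print(all_tags)  # ['a', 'b', 'c', 'c']
print(c_tags)    # ['c', 'c']
```

Collecting the results into lists is only for display; in a loop the iterator can be consumed directly.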

Example using ElementTree

Let's try it out.

Suppose we want to process the following XML, saved as test.xml.

<?xml version="1.0" encoding="UTF-8" ?>
<tip>
    <middle id='1'>
        <bottom>content_1</bottom>
    </middle>
    <middle id='2'>
        <!--
        <bottom>content_20</bottom>
        <bottom>content_21</bottom>
        -->
        <bottom>いろは</bottom>
        <bottom>にほへと</bottom>
        <bottom></bottom>
    </middle>
    <middle id='3'>
        content_3
    </middle>
</tip>

Let's print the tag name, attributes, and text of every element.

import defusedxml.ElementTree as ET

tree = ET.parse('test.xml')

for i in tree.iter():
    print(i.tag, i.attrib, i.text)

The output looks like this.

tip {}

middle {'id': '1'}

bottom {} content_1
middle {'id': '2'}


bottom {} いろは
bottom {} にほへと
bottom {} None
middle {'id': '3'}
        content_3

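Whitespace such as newlines and indentation between a start tag and its first child is stored as the element's text, which is why blank lines appear between the tags. The second middle element shows two blank lines, presumably because the comment is skipped and the whitespace before and after it is joined into one text. For an element with no text at all, text is None.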
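
If the blank lines get in the way, one option is to strip the whitespace before printing. The following is a minimal sketch under a few assumptions: clean_text is a hypothetical helper name, the XML is a shortened copy of the sample embedded as a string, and the standard-library xml.etree.ElementTree is used in place of defusedxml. Note that this treats whitespace-only text the same as missing text.

```python
import xml.etree.ElementTree as ET  # standard-library stand-in for defusedxml.ElementTree

# Shortened copy of the sample document, embedded so the sketch is self-contained
XML = """<tip>
    <middle id='1'>
        <bottom>content_1</bottom>
    </middle>
    <middle id='3'>
        content_3
    </middle>
</tip>"""


def clean_text(element):
    """Return the element's text without surrounding whitespace, or None if nothing is left."""
    text = (element.text or "").strip()
    return text or None


rows = [(e.tag, e.attrib, clean_text(e)) for e in ET.fromstring(XML).iter()]
for row in rows:
    print(*row)
```

With this helper, tip and the first middle print None, while the last middle prints content_3 without the surrounding indentation.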

Now let's specify bottom for tag.

import defusedxml.ElementTree as ET

tree = ET.parse('test.xml')

for i in tree.iter(tag='bottom'):
    print(i.tag, i.attrib, i.text)

The output looks like this.

bottom {} content_1
bottom {} いろは
bottom {} にほへと
bottom {} None

Getting elements from an Element object

The iter method of Element returns an iterator that yields the Element itself and every element below it.

import defusedxml.ElementTree as ET
# import xml.etree.ElementTree as ET # when using the built-in xml module

iter = element.iter([tag])

Variable	Type	Description
element	Element	The Element object to search.
tag	str	Optional; defaults to None. The tag to look for.
iter		Iterator that yields the elements with the specified tag.

When tag is None, the elements of every tag from the Element downward are included.

Example using Element

Let's try it out.

import defusedxml.ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()

for i in root.iter():
    print(i.tag, i.attrib, i.text)

The getroot method returns the root element of the ElementTree as an Element object.

The output looks like this.

tip {}

middle {'id': '1'}

bottom {} content_1
middle {'id': '2'}


bottom {} いろは
bottom {} にほへと
bottom {} None
middle {'id': '3'}
        content_3
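
The output matches the ElementTree example because iterating from the root element covers the same elements as iterating from the tree. A quick way to check this is sketched below, using a made-up one-line document and the standard-library xml.etree.ElementTree rather than defusedxml; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET  # standard-library stand-in for defusedxml.ElementTree

# Hypothetical sample document (not the article's test.xml)
root = ET.fromstring("<tip><middle id='1'><bottom>content_1</bottom></middle></tip>")
tree = ET.ElementTree(root)

from_tree = list(tree.iter())
from_root = list(tree.getroot().iter())

# Both iterators should yield the very same Element objects in the same order
same = len(from_tree) == len(from_root) and all(
    a is b for a, b in zip(from_tree, from_root)
)
print(same)  # True
```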

Now let's specify bottom for tag.

import defusedxml.ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()

# getroot() takes no arguments; pass the tag to iter() instead
for i in root.iter(tag='bottom'):
    print(i.tag, i.attrib, i.text)

The output looks like this.

bottom {} content_1
bottom {} いろは
bottom {} にほへと
bottom {} None

In short, the iter method of ElementTree covers the whole XML tree, whereas the iter method of Element covers only that Element and the elements below it.
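
The difference only shows up when iter is called on an element other than the root. The sketch below picks the middle element with id 2 and iterates from there. It assumes a shortened, made-up variant of the sample document, uses the standard-library xml.etree.ElementTree in place of defusedxml, and uses find with an attribute predicate merely as one convenient way to select a non-root element.

```python
import xml.etree.ElementTree as ET  # standard-library stand-in for defusedxml.ElementTree

# Shortened, hypothetical variant of the sample document
XML = """<tip>
    <middle id='1'><bottom>content_1</bottom></middle>
    <middle id='2'><bottom>a</bottom><bottom>b</bottom></middle>
</tip>"""

root = ET.fromstring(XML)
second = root.find("middle[@id='2']")  # one way to pick a non-root element

# From the root: every <bottom> in the document
whole = [e.text for e in root.iter("bottom")]

# From the second <middle>: only the <bottom> elements below it
subtree = [e.text for e in second.iter("bottom")]

# With no tag, the starting element itself comes first
subtree_tags = [e.tag for e in second.iter()]

print(whole)         # ['content_1', 'a', 'b']
print(subtree)       # ['a', 'b']
print(subtree_tags)  # ['middle', 'bottom', 'bottom']
```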


Published 2018-07-10


某エンジニアのお仕事以外のメモ（分冊）
