
How to Parse Namespaces using the Python RSS Parser?

Mani Gopalakrishnan

In the last tutorial, we learned how to build a Python-based RSS parser. Building on that tutorial, let’s now look at parsing namespaces and namespace-specific elements.

Getting Ready

For the purpose of this tutorial, we will use the WhizRssAggregator.py file that we created in the previous tutorial.

Parsing Namespaces

Let’s extend the RSS Aggregator file below.

[code language=”Python”]
import feedparser


class WhizRssAggregator():
    feedurl = ""

    def __init__(self, paramrssurl):
        print(paramrssurl)
        self.feedurl = paramrssurl
        self.parse()

    def parse(self):
        thefeed = feedparser.parse(self.feedurl)

        print("Getting Feed Data")
        print(thefeed.feed.get("title", ""))
        print(thefeed.feed.get("link", ""))
        print(thefeed.feed.get("description", ""))
        print(thefeed.feed.get("published", ""))
        # Use a plain default so a feed without a date does not raise an error
        print(thefeed.feed.get("published_parsed", ""))

        for thefeedentry in thefeed.entries:
            print("__________")
            print(thefeedentry.get("guid", ""))
            print(thefeedentry.get("title", ""))
            print(thefeedentry.get("link", ""))
            print(thefeedentry.get("description", ""))
            print("__________")

            # Parsing Namespaces
            for thefeednamespace in thefeed.namespaces:
                if thefeednamespace == "media":
                    # parse for Yahoo Media
                    print("Media")
                    # Default to an empty list when the entry has no media content
                    allmediacontent = thefeedentry.get("media_content", [])
                    for themediacontent in allmediacontent:
                        print(themediacontent["url"])
                        # height and width are optional attributes
                        print(themediacontent.get("height", ""))
                        print(themediacontent.get("width", ""))
[/code]

In the code that follows the Parsing Namespaces comment, you use another powerful capability of feedparser. By referencing thefeed.namespaces, you can retrieve the namespaces declared in the RSS XML document as a dictionary that maps each prefix to its namespace URI. Iterating over it yields the prefixes. The example above assumes that the “media” namespace (Yahoo Media RSS) is declared in the RSS XML document.
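Since feedparser exposes the namespaces as a dictionary keyed by prefix, a membership test can stand in for the loop. The snippet below is only a minimal sketch: the helper name uses_namespace and the hand-written sample mapping are assumptions for illustration, not part of feedparser or of the original WhizRssAggregator.py file.

```python
def uses_namespace(namespaces, prefix):
    """Return True when the prefix is declared in a namespaces mapping."""
    return prefix in namespaces


# Hand-written stand-in for thefeed.namespaces (assumed values, not a real feed)
sample_namespaces = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "media": "http://search.yahoo.com/mrss/",
}

# Show each declared prefix next to its namespace URI
for prefix, uri in sample_namespaces.items():
    print(prefix, "->", uri)

print(uses_namespace(sample_namespaces, "media"))
print(uses_namespace(sample_namespaces, "itunes"))
```

With a parsed feed, the same check would be uses_namespace(thefeed.namespaces, "media").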

The media namespace uses a series of tags to define its content. With feedparser, you can access a tag defined within a namespace through a key of the form namespace_tagname, so a <media:content> tag becomes media_content.
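One way to see which namespaced keys an entry carries is to filter its keys by prefix. This is a rough sketch under stated assumptions: keys_for_namespace is a hypothetical helper, and sample_entry is a hand-written dictionary shaped like a feedparser entry rather than data from a real feed.

```python
def keys_for_namespace(entry, prefix):
    """List the entry keys that start with '<prefix>_', sorted for stable output."""
    marker = prefix + "_"
    return sorted(key for key in entry.keys() if key.startswith(marker))


# Hand-written stand-in for one item of thefeed.entries (assumed values)
sample_entry = {
    "title": "Example item",
    "link": "https://example.com/item",
    "media_content": [{"url": "https://example.com/a.jpg"}],
    "media_thumbnail": [{"url": "https://example.com/a-thumb.jpg"}],
}

print(keys_for_namespace(sample_entry, "media"))
```

Because a feedparser entry behaves like a dictionary, the same helper should accept thefeedentry directly.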

In this example, since we are referencing the <media:content> tags defined within the namespace, you can use the get() function with the “media_content” parameter. This returns a list with one item per <media:content> tag in the entry, each holding that tag's attributes. You can then iterate over the list and print each attribute. Here, print(themediacontent["url"]) prints the link to the media content, which is the url attribute of the content tag.
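The same lookup can be wrapped in a small function that tolerates entries without media content or with missing attributes. Treat this as a sketch, not as the tutorial's own code: extract_media is a hypothetical helper, and sample_entry is a hand-written dictionary that mimics the shape of a feedparser entry.

```python
def extract_media(entry):
    """Collect url, width and height from an entry's media_content list."""
    items = []
    # An entry without media content simply yields an empty list
    for media in entry.get("media_content", []):
        items.append({
            "url": media.get("url", ""),
            "width": media.get("width", ""),
            "height": media.get("height", ""),
        })
    return items


# Hand-written stand-in for one item of thefeed.entries (assumed values)
sample_entry = {
    "title": "Example item",
    "media_content": [
        {"url": "https://example.com/a.jpg", "width": "640", "height": "480"},
        {"url": "https://example.com/b.jpg"},
    ],
}

for item in extract_media(sample_entry):
    print(item["url"], item["width"], item["height"])
```

Returning plain dictionaries instead of printing keeps the parsing separate from the output, which may help if the results are later rendered somewhere other than the console.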

Conclusion 

Many RSS documents use multiple namespaces. By checking thefeed.namespaces and reading the matching namespace_tagname keys on each entry, you can support popular namespaces with little extra code.

I hope this was helpful. Have fun coding in Python.

P.S. Click here to download the files via GitHub.

  1. Very useful tutorials…I am currently trying to find a way to use all that code with Flask so that the output can be displayed on a webpage, any pointers in the right direction will be greatly appreciated..i’m pretty new in the Python stage (started learning last December!).
