Bad episode XML

A place for developers to advertise their TheTVDB.com enabled app and get help from other developers with the API.
lpwcomp
Posts: 16
Joined: Sun Oct 30, 2011 8:20 pm

Fri Nov 18, 2011 7:18 am

I noticed in another thread that you no longer have to use specific mirrors. Are you now doing automatic load balancing between two or more copies of the DB? Is it possible that at least one of the copies contains corrupt or differently formatted data? I ask because of the following:

Lately, a lot of the time I am getting what I can only describe as garbage when trying to fetch the XML for a specific episode. It eventually works properly, but as of right now I am experiencing this problem with episode 2 of "Once Upon a Time". Here is the relevant code:

Code:

        # Fragment from a larger script: urllib and parse() (ElementTree) are
        # imported at the top of it, and debug() and toHex() are my own helpers.
        url = MirrorURL + "/api/" + APIKEY + "/series/" + seriesid + "/default/" + season + "/" + episode + "/en.xml"
        debug(3, "getEpisodeInfoXML: Using URL " + url)
        try:
                episodeInfoXML = parse(urllib.urlopen(url)).getroot()
        except Exception as e:
                debug(0, "\n Exception = " + str(e))
                rawXML = urllib.urlopen(url).read()
                debug(0, "\nrawXML = " + rawXML + "\n\nhexXML = " + toHex(rawXML))
This is the output:

getEpisodeInfoXML: Using URL http://thetvdb.com/api/0403764A0DA51955/series/248835/default/1/2/en.xml

Exception = not well-formed (invalid token): line 1, column 0

rawXML = ▼ ♥]S█R█0►}╫W∞ΣÑ/%è╔ÑåY╠@‼.♥⌠☻┤¥>1"▐8jm)#╔I╙Θ╟wì/$╝φ9{$¡v╧ΓΘƒ"ç59»
¡9ΘE²A☼╚╠m¬Mv╥√÷xq►≈α4◄8UA%8[ioSb¼╙d¶MFâxéÆcü₧ö╖ªó'ôß°►eGê÷╪º▓x&ùpnƒx§¿éÆ╟%┴πÆ▀ç
ƒ╢ä[╗&╕│>╝▲¬D☻/┤≤ßL;Jô├A¶↔Dââß σ♫-≡▓$▼▲ér>A╣♥°;¼ÿδÆKG↓£‼§╩á∞Xü?£♫\┌Y¬
╕▓╬nt°♂ `ûnöKßF»=╩F%≡3╖p¡iô▄Sªìéα4▬▬╓═ fEí└û☺∞☻▲°üφ││÷7ü2iK/╔╜└+2n√╬C«╫|├è∟WF&Σ
╟(wyüï\e↓G<ö6Σ╛⌂ƒ>Ñ┌╧┘▬▄εWPgj╦4Ö♠╘↓¬goj├╘é}«╓═ùj§:Eï♦¬go≤2╨Sw├[å♂µ╬Ü╩\═┼^▲ÄΓx8ûì
!░╨╬YWwc╟á0êÅú°x<B╣º►x}7=⌂║₧remT¡┼uæ]p≤Ω¥iü└{§xAÆ↑e‼ |xiW│Q◄╩=,≡VÖ¼T↓⌂Ä}▐☺╤mU
╒▒j┼┼⌂°ßå0↕♦

hexXML = 1f8b08000000000000035d53db52db30107dd757ece4a52f258ac9a58659cc40132e03f
402b49d3e3122de386a6d2923c949d3e9c7778d2f24bced397b24ad76cfe2e99f22873539afad39e
945fd410fc8cc6daa4d76d2fbf6787110f7e0341138554125385b696f5362acd364144d468378829
263819e94b7a6a22793e1f810654788f6d8a7b2782697706e9f7815a88292c725c1e392df879fb68
45bbb26b8b33ebc1eaa44022fb4f3e14c3b4a93c341141d448383e100e50e2df0b2241f1e82723e4
1b903f83bac9807eb924b47199c1315caa0ec58813f9c0e5cda59aa0ab8b2ce6e74f80bff60966e9
44be14607af3dca4625f033b770ad6993dc53a68d82e03479081616d6cd096645a1c09601ec021ef
881edb3b3f6378132694b2fc9bdc02b326efbce43aed77cc38a1c574626e4db3edc91329ba5cee93
de800da83a335a99c5258da0d04eedb6cad73f85a1219cee53c00ce55fcbc749ea05c59f302174ab
b6de093b0b12e4ffb28bbfa057e71362de781fdf0b11a35ca3784c05cf950ae5215b8c9d1303a3a8
a3f1c0dc7287779818b5c6519473c9436e4be7f9f3ea5dacfd916dcee5750676acb349906d419aa6
76f6ac3d4827daed6cd976a153a458b04aa676ff332d05377c35b860be6ce9aca5ccdc55e1e8ee27
838968dd1fbbf5619ffa5555596afa6ccd5d7bacaf20d21b0d0ce59577763c7a030888fa3f8783c4
2b9a710787d373d7fba9e72656d54adc575915d70f3ea9d6981c07b15784192186513097c786957b
35111ca3d2cf05699ac54197f8e7dde01d16d55d5b16ac5c57ff8e1863012040000
!! Error looking up data for this episode, skipping.

BTW, I just tried it again and it worked properly. But now episode 3 is returning garbage.
szsori
Site Admin
Posts: 2229
Joined: Fri Nov 03, 2006 2:23 pm

Fri Nov 18, 2011 10:06 am

No mirrors right now and the mirror code is not needed. I'm showing proper XML for episode 3 as well...
http://thetvdb.com/api/0403764A0DA51955 ... 1/3/en.xml
lpwcomp
Posts: 16
Joined: Sun Oct 30, 2011 8:20 pm

Fri Nov 18, 2011 3:07 pm

Thanks for the quick response.
szsori wrote:No mirrors right now and the mirror code is not needed. I'm showing proper XML for episode 3 as well...
http://thetvdb.com/api/0403764A0DA51955 ... 1/3/en.xml
I was afraid you were going to say that. At this point, I'm thinking my ISP (U-Verse) is doing something weird, although it works fine for me too when I paste the URL into my browser. And so far, it eventually works in the program.

I'm going to continue to see if I can find the source of the problem, or at least a workaround. Meantime, does anyone out there have any ideas? BTW, in the version of the code I am using, I have hardcoded MirrorURL to be "http://theTVDB.com".

I have no clue what is causing this. I just ran the program and it didn't work for episodes 2 & 3 of "Once Upon a Time" but did work for episode 4.
etw
Site Moderator
Posts: 1138
Joined: Sat Oct 16, 2010 3:48 pm
Location: England

Sat Nov 19, 2011 12:31 am

Just a thought, but have you checked the response headers to see if the data is being compressed (gzip)? Web browsers will handle this automagically, but you'll need to handle it yourself in any custom code.
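
For example, a minimal sketch in Python 2, using urllib as in the code above (the URL is only a placeholder):

Code:

import urllib

# Minimal sketch: look at the response headers to see whether the body
# came back gzip-compressed. The URL below is only a placeholder.
url = "http://thetvdb.com/api/APIKEY/series/248835/default/1/2/en.xml"
response = urllib.urlopen(url)
print "Content-Encoding:", response.info().getheader("Content-Encoding")  # "gzip" or None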
guillesn
Posts: 9
Joined: Wed Nov 02, 2011 2:47 am
Location: Palma de Mallorca

Sat Nov 19, 2011 1:57 am

I've tested my application with those XML files and they worked as expected, so the files are OK. What's happening to you is quite strange... just a few thoughts on this:
·Try using http://thetvdb.com as MirrorURL, since the API may be case sensitive about the URL.
·As etw said, check whether the data is being compressed. I don't think it is, since we can use these XML files, but make sure of that.
·Lastly, does your XML library use any kind of caching policy? If it does, maybe a connection went wrong the first time you fetched these XML files and your library is returning corrupt data from a local cache.

Good luck!
lpwcomp
Posts: 16
Joined: Sun Oct 30, 2011 8:20 pm

Sat Nov 19, 2011 6:53 pm

Based on the two previous posts (thanks!), I did some research and attempted a gzip decompression, and that returned valid data. So it appears that sometimes I am getting compressed data. I added code to treat the response as compressed if the initial parse failed, and that seems to work. I'm not really happy with the way I have it coded, so I will continue to work on it.

Not sure why I am sometimes getting compressed data and sometimes not. I would think it would be consistent one way or the other.
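
Roughly what I mean by "treat it as compressed if the initial parse failed" is the sketch below (not my final code; parse() here is ElementTree's):

Code:

import StringIO
import gzip
import urllib
from xml.etree.ElementTree import parse

# Sketch of the interim approach: try to parse the raw response first,
# and only if that fails, assume it was gzipped and parse it again
# through GzipFile.
def getEpisodeXML(url):
        rawXML = urllib.urlopen(url).read()
        try:
                return parse(StringIO.StringIO(rawXML)).getroot()
        except Exception:
                unzipped = gzip.GzipFile(fileobj=StringIO.StringIO(rawXML))
                return parse(unzipped).getroot()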
lpwcomp
Posts: 16
Joined: Sun Oct 30, 2011 8:20 pm

Mon Nov 21, 2011 7:56 am

I am trying to get the zap2it program/episode ID. In order to do this, I must access the data for a specific season. Unfortunately, no matter what URL I use, I can only get the current or final season's data. Even if I put the URL in a browser, I get the same page. If I refresh it, I get the correct page, but there doesn't seem to be any way to do that within a Python program.

Example:

Code:

        url = "http://tvlistings.zap2it.com/tv/perry-mason/episode-guide/EP00003343/1"
        rawHTMLfile = urllib.urlopen(url)
        rawHTML = rawHTMLfile.read()
rawHTML should have the data for the episode guide for season 1 of "Perry Mason". Instead, it has the data for season 9, which was the final season.

This may be a feature of zap2it to prevent someone from doing exactly what I am trying to do, but does anyone have any ideas?
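
One thing I may try is sending the no-cache request headers that a browser refresh sends, using urllib2 instead of urllib. This is only a sketch; whether zap2it (or an intermediate cache) actually honours these headers is an assumption on my part:

Code:

import urllib2

# Sketch: ask any caches along the way for a fresh copy, roughly what a
# browser refresh does. Whether this helps with zap2it is untested.
url = "http://tvlistings.zap2it.com/tv/perry-mason/episode-guide/EP00003343/1"
req = urllib2.Request(url, headers={"Cache-Control": "no-cache", "Pragma": "no-cache"})
rawHTML = urllib2.urlopen(req).read()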
Omertron
Posts: 9
Joined: Thu Mar 26, 2009 11:31 pm

Tue Nov 22, 2011 9:26 am

It's not just you. I get this a lot with YAMJ.

Interested to see your solution, so I will try unzipping the corrupt data as well.
etw
Site Moderator
Posts: 1138
Joined: Sat Oct 16, 2010 3:48 pm
Location: England

Tue Nov 22, 2011 9:32 am

If whatever library or function you're using makes the HTTP headers available to you, you can check the "Content-Encoding" header: it should be set to "gzip" if the response is compressed, and it probably won't be present at all if the response is uncompressed.
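
With urllib, as used earlier in the thread, that could look something like this sketch (deciding from the header rather than sniffing the magic bytes; the URL is a placeholder):

Code:

import StringIO
import gzip
import urllib

# Sketch: use the Content-Encoding response header, rather than the
# 1f8b08 magic bytes, to decide whether the body needs gunzipping.
url = "http://thetvdb.com/api/APIKEY/series/248835/default/1/2/en.xml"
response = urllib.urlopen(url)
body = response.read()
if response.info().getheader("Content-Encoding") == "gzip":
        body = gzip.GzipFile(fileobj=StringIO.StringIO(body)).read()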
lpwcomp
Posts: 16
Joined: Sun Oct 30, 2011 8:20 pm

Tue Nov 22, 2011 1:07 pm

Omertron wrote:It's not just you. I get this a lot with YAMJ.

Interested to see your solution, so I will try unzipping the corrupt data as well.

Here is the Python code I am using:

Code:

import StringIO
import gzip
import urllib
from xml.etree.ElementTree import parse  # or whichever parse() you already use

def getXML(url):

        try:
                rawXML = urllib.urlopen(url).read()
        except Exception as e:
                print "\n Exception = " + str(e)
                return None

        xml = None
        if ( toHex(rawXML[0:3]) != "1f8b08" ): # first three bytes are not the gzip magic number
                filestream = StringIO.StringIO(rawXML)
        else:
                filestream = gzip.GzipFile(fileobj=StringIO.StringIO(rawXML))
        try:
                xml = parse(filestream).getroot()
        except Exception as e:
                print "\n Exception = " + str(e)

        return xml

def toHex(s):
    lst = []
    for ch in s:
        hv = hex(ord(ch)).replace('0x', '')
        if len(hv) == 1:
            hv = '0' + hv
        lst.append(hv)
    return ''.join(lst)
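
And here's a call, for reference. The specific URL values are just placeholders, and the element path assumes the usual <Data><Episode>...</Episode></Data> layout of the episode XML:

Code:

# Example call; APIKEY is a placeholder.
url = "http://thetvdb.com/api/APIKEY/series/248835/default/1/2/en.xml"
root = getXML(url)
if root is not None:
        print root.findtext("Episode/EpisodeName")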