TheTVDB.com

Online TV Database
All times are UTC - 6 hours [ DST ]




Posted: Wed Sep 28, 2011 1:43 pm

Joined: Thu May 05, 2011 1:04 pm
Posts: 4
Hi,

I think it would be good to have an automatic db script that corrects values that were accidentally entered wrong. According to http://www.thetvdb.com/wiki/index.php/Category:Episodes, multiple entries in the Guest Stars, Director and Writer fields must be delimited with a pipe symbol ("|"). In my scraped database, about 15% of these fields (mainly Writer) are split by commas instead of pipes.

Any suggestions?

gerosil


Posted: Wed Sep 28, 2011 1:48 pm
Forum Owner

Joined: Tue Apr 28, 2009 11:28 am
Posts: 3420
Honestly people aren't going to enter these right no matter how much we hound them, and setting loose an automatic fix script is just bound to be a disaster at some point. Wouldn't it make more sense for the scraper to simply accept both commas and pipes?
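
For illustration, a scraper-side split that accepts both delimiters could look roughly like the sketch below. The function name `split_people` and the sample names are made up for this example, and it shares the weakness of any comma-based approach: a name that itself contains a comma gets split too.

```python
import re

def split_people(raw):
    """Split a people field on pipes or commas, dropping empty pieces."""
    pieces = re.split(r"[|,]", raw)
    return [p.strip() for p in pieces if p.strip()]

# Both the documented pipe form and the comma form give the same list.
print(split_people("|John Doe|Jane Roe|"))
print(split_people("John Doe, Jane Roe"))
```

Leading and trailing pipes produce empty pieces, which is why the list comprehension filters them out.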



Posted: Wed Sep 28, 2011 1:55 pm
Site Admin

Joined: Fri Nov 03, 2006 5:23 pm
Posts: 2145
Handled completely on the new site, since people are correctly split into a separate table. This will allow the API to properly return each person as a separate XML entity, meaning the end-user app can display it in whatever manner they see fit.
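
As a rough sketch of the idea only (not the actual new schema or API output): with one row per person, each credit can be emitted as its own XML element. The table names `people` and `episode_credits`, the element names `Episode` and `Person`, and the sample data are all assumptions made up for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: one row per person, one row per credit.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE episode_credits (
        episode_id INTEGER NOT NULL,
        person_id  INTEGER NOT NULL REFERENCES people(id),
        role       TEXT NOT NULL   -- 'writer', 'director' or 'guest_star'
    );
    INSERT INTO people (id, name) VALUES (1, 'Jane Roe'), (2, 'Peter, Paul and Mary');
    INSERT INTO episode_credits VALUES (100, 1, 'writer'), (100, 2, 'guest_star');
""")

def episode_xml(episode_id):
    """Return an XML string with one Person element per credited person."""
    episode = ET.Element("Episode", id=str(episode_id))
    rows = conn.execute(
        "SELECT c.role, p.name FROM episode_credits c "
        "JOIN people p ON p.id = c.person_id "
        "WHERE c.episode_id = ? ORDER BY p.id",
        (episode_id,),
    )
    for role, name in rows:
        ET.SubElement(episode, "Person", role=role).text = name
    return ET.tostring(episode, encoding="unicode")

print(episode_xml(100))
```

Because a name is stored as a single value, a comma inside it never has to be interpreted as a delimiter.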


Posted: Wed Sep 28, 2011 2:08 pm

Joined: Thu May 05, 2011 1:04 pm
Posts: 4
hikaricore wrote:
Honestly people aren't going to enter these right no matter how much we hound them, and setting loose an automatic fix script is just bound to be a disaster at some point. Wouldn't it make more sense for the scraper to simply accept both commas and pipes?


Sure, it would be better to "educate" the users, but a small SQL script should fix all the errors at once. Something like this should work:

Code:
-- first add leading and trailing delimiters to fields that contain commas
UPDATE <table> SET <field> = '|' || <field> || '|' WHERE <field> LIKE '%,%';
-- then replace all occurrences of ", " and "," with the pipe
UPDATE <table> SET <field> = REPLACE(REPLACE(<field>, ', ', '|'), ',', '|') WHERE <field> LIKE '%,%';


On the other hand, if you are redesigning the db schema in the next release, we could wait a little. But I doubt that all the scrapers will adapt to the new API (for example, my favorite media manager, Ember Media Manager, isn't developed any more).


Posted: Wed Sep 28, 2011 3:15 pm
Site Admin

Joined: Fri Nov 03, 2006 5:23 pm
Posts: 2145
The new database schema can be used to generate the same results as the current API, but with the errors fixed. Put succinctly, it'll be backwards compatible but more accurate. The problem with your solution is that we can't properly assume that all directors, writers, and actors don't have the word "and" or a comma in the middle of their name or that they shouldn't remain grouped as a single entity.

These instances would be rare, but take the case of "Hootie and the Blowfish". In my opinion a musical group that is listed in the credits on a show as a single group should remain grouped together. They could also be listed individually as "Darius Rucker|Dean Felber|Jim Sonefeld|Mark Bryan". But if we apply your SQL we'd end up with "Hootie|the Blowfish" which is definitely less appropriate than the other two.

Obviously I'm applying this to the instance where someone uses "and" instead of a comma or pipe, but it's merely a demonstration of the issue, which is also true for commas.
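
To make the backwards-compatibility point concrete, here is a minimal sketch of how a pipe-delimited field might be rebuilt from individually stored names. The helper name `legacy_field` is invented for this example, and both the leading/trailing pipes and the empty string for "no names" are assumptions about the old format rather than confirmed behaviour.

```python
def legacy_field(names):
    """Join individually stored names into a pipe-delimited field like |A|B|."""
    if not names:
        return ""  # assumption: an empty credit list yields an empty field
    return "|" + "|".join(names) + "|"

# A group stored as one entity stays one entity in the old-style output.
print(legacy_field(["Hootie and the Blowfish", "Jane Roe"]))
```

Nothing is split or guessed at output time, so whatever grouping was stored is what the old-style field shows.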


Posted: Wed Sep 28, 2011 3:19 pm
Site Admin

Joined: Fri Nov 03, 2006 5:23 pm
Posts: 2145
Specific comma example... "Peter, Paul and Mary" would result in "Peter|Paul and Mary" or "Peter|Paul|Mary". None of those is as accurate as "Peter, Paul and Mary" existing as a single entity.
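
The effect is easy to reproduce outside the database. The sketch below mimics the two proposed UPDATE statements in Python; `naive_fix` is a name made up for this example, and it is only an approximation of what the SQL would do to a single field value.

```python
def naive_fix(field):
    """Mimic the proposed UPDATEs: wrap in pipes, then turn commas into pipes."""
    if "," not in field:
        return field  # the WHERE ... LIKE '%,%' clause skips these rows
    wrapped = "|" + field + "|"
    return wrapped.replace(", ", "|").replace(",", "|")

print(naive_fix("John Doe, Jane Roe"))    # |John Doe|Jane Roe|
print(naive_fix("Peter, Paul and Mary"))  # |Peter|Paul and Mary|
```

The first value is repaired as intended, while the second is broken apart even though it was a single credited entity.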


Posted: Thu Sep 29, 2011 11:49 pm

Joined: Fri Mar 06, 2009 8:13 pm
Posts: 195
szsori wrote:
Handled completely on the new site, since people are correctly split into a separate table. This will allow the API to properly return each person as a separate XML entity, meaning the end-user app can display it in whatever manner they see fit.

Ahh good I was hoping it would be something like this.


Posted: Sat Oct 01, 2011 5:28 pm

Joined: Sat Apr 30, 2011 1:55 am
Posts: 9
It will also f*ck up the name if the actor is called something like Joey McCorney, Jr.

_________________
Coder of a Swedish XMLTV project and a TVGuide.


Posted: Sat Oct 01, 2011 5:37 pm
Forum Owner

Joined: Tue Apr 28, 2009 11:28 am
Posts: 3420
alatoor wrote:
It will also f*ck up the name if the actor is called something like Joey McCorney, Jr.


Why are you censoring yourself? :lol:



Posted: Wed Mar 14, 2012 12:36 pm

Joined: Tue Feb 14, 2012 2:42 pm
Posts: 4
Sorry for the bump, but in a similar vein to this, would it be possible to suppress characters that cause XML errors?

Quite regularly my program falls over while searching because of people putting strange tags in show descriptions like this which I just caught:

Quote:
Written by AfterBurner <aburner@erols.com>


on a show which caused any XML using part of the show name to be unusable by the API.
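
For what it's worth, this kind of text does not have to be suppressed; escaping it when the XML is generated is enough. A minimal sketch in Python, where the element name `Overview` and the helper `overview_xml` are assumptions for the example and say nothing about how the site actually builds its XML:

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "Written by AfterBurner <aburner@erols.com>"

def overview_xml(text):
    """Wrap free text in an element, escaping &, < and > first."""
    return "<Overview>" + escape(text) + "</Overview>"

doc = overview_xml(raw)
print(doc)  # <Overview>Written by AfterBurner &lt;aburner@erols.com&gt;</Overview>

# The escaped document parses, and the parser hands back the original text.
print(ET.fromstring(doc).text == raw)
```

Without the `escape` call, the angle brackets around the e-mail address are read as a malformed tag and the parser rejects the whole document.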

