Restructure/Redesign

Announcements about the database, website, and plugins.
c4ho
Posts: 160
Joined: Wed Apr 18, 2007 3:23 am

Re: Restructure/Redesign

Postby c4ho » Wed Mar 04, 2009 6:31 pm

click170 wrote:In this economy, I don't think anyone expects it to stay up forever under its current donations model. If you are based in the United States, I am astonished you have survived this long.


Please don't take this as a dig, but if you move in server farm or colo circles, what we're talking about is a negligible drop in the bucket. You can't make huge generalities like "In this economy" or "don't think anyone expects it to stay up forever". Essentially you're just guessing. Not a dig :)


szsori wrote:
c4ho wrote:I am confused. I know hosting used to cost money a while back, but with the current hardware donations, what currently runs to FAR more than $100 a month?


You surely can't assume that the company providing hosting doesn't pay for it. Right now it's being donated, but we can't expect that to continue indefinitely.


I am not assuming anything; I am differentiating between ACTUAL costs and POTENTIAL costs, along with ACTUAL income and POTENTIAL income. There is a fundamental difference. Let's be careful not to confuse the two.

Right now you have ZERO ACTUAL costs but HIGH POTENTIAL costs. You also have ACTUAL income that's accumulating in a period of no ACTUAL costs.

I'm not trying to labour the point; it's just business plan 101, which will be vital if you do want to take this to LLC level.


szsori wrote:However, bandwidth is the entire discussion. With our current bandwidth usage we're using around 2500GB/month. You can't price out individual dedicated hosts, since we can't cluster the database without jumping through a ton of hoops. Using Amazon's S3 service as an example, bandwidth costs $0.20/GB. That means we're looking at $500/month for that. That's not to mention that we expect bandwidth usage to increase over time instead of decrease. Mirroring will help, but we still wouldn't be below the current site revenue. That's not even taking into account trying to future-protect the site by building up a small savings for it either (my goal would be savings for 6 months of costs).

That's a ridiculous price to pay for bandwidth. I have one server using in excess of 5TB a month and it costs me 60 Euros. I feel now we are getting to the crux of it. Bandwidth is VERY cheap, but the current codebase's backend requirements are so specific we can't make best use of the good deals out there.
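To put the two bandwidth figures side by side, here is a rough sketch of the arithmetic. It only uses numbers quoted in this thread; treating 1TB as 1000GB and reading "in excess of 5TB" as exactly 5TB are assumptions, and the two prices are in different currencies, so no conversion is attempted.

```python
# Figures quoted in the thread. Treating 1 TB as 1000 GB is an assumption.
S3_PRICE_PER_GB_USD = 0.20   # Amazon S3 example price from szsori's post
SITE_USAGE_GB = 2500         # site's monthly bandwidth usage from szsori's post
FLAT_FEE_EUR = 60            # monthly fee for the 5TB server
FLAT_USAGE_GB = 5 * 1000     # "in excess of 5TB", taken as a lower bound


def metered_monthly_cost(usage_gb: float, price_per_gb: float) -> float:
    """Monthly bill when every GB transferred is charged at a fixed rate."""
    return usage_gb * price_per_gb


def effective_price_per_gb(monthly_fee: float, usage_gb: float) -> float:
    """Per-GB price implied by a flat monthly fee at a given usage level."""
    return monthly_fee / usage_gb


s3_monthly_usd = metered_monthly_cost(SITE_USAGE_GB, S3_PRICE_PER_GB_USD)
flat_rate_eur_per_gb = effective_price_per_gb(FLAT_FEE_EUR, FLAT_USAGE_GB)

print(f"Metered (S3 example): ${s3_monthly_usd:.2f}/month")
print(f"Flat deal: about EUR {flat_rate_eur_per_gb:.3f}/GB or less")
```

This only compares raw transfer pricing; it says nothing about whether the cheaper server has the CPU and memory the site needs.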

Focusing on this should be the way forward. In any startup you minimise outgoings on day 1. This would leave you free to do a lot more with the money when you finally decide on the next route.

Lastly, if you are going to donate a server you should get that money back.

Edit: removed a comment that didn't actually add anything

emigrating
Site Admin
Posts: 278
Joined: Mon Aug 27, 2007 4:38 pm

Re: Restructure/Redesign

Postby emigrating » Wed Mar 04, 2009 6:58 pm

szsori wrote:Donations before that time were used primarily for expenses as well, but there were some left over and combined with more recent donations we're still under $400.

I would just like to point out that that $400 is over a period of more than a year.

c4ho wrote:I am confused. I know hosting used to cost money a while back, but with the current hardware donations, what currently runs to FAR more than $100 a month?

Which bit do you find confusing? The hardware/bandwidth/electricity/support doesn't cost TheTVDB anything, but I assure you someone, somewhere, is still paying for it, and that's the core of what we are discussing - we are trying to find a way for these costs to be paid for by the site; meaning that TheTVDB LLC manages their own hosting rather than relying on third-parties for free.

c4ho wrote:Based on some quick calculations you could get THREE of these for that kind of money:

Like I pointed out earlier in this thread, that LeaseWeb server is not an option due to its below-par processor and memory. They do have other alternatives that would better suit us, but since we already own hardware, we don't need to rent/lease a virtual or dedicated server; we need rackspace in a datacenter with good bandwidth and peering agreements. Sure, LeaseWeb provides this also, but we are talking a lot more money than the 29 euro you considered.

click170 wrote:I can't help but thinking, and please don't take this as an attack against the developers [praise be with you], but how efficient is the codebase if it requires that kind of horsepower behind it? Which is it that is taking up so much memory and or CPU cycles? You mention most of the load is for API calls, is there any caching in place for the most recent or most popularly pulled episode? Is it the construction and compression of the xml files?

You have a point, and you are right on track; the current codebase is bringing the server to a standstill - this will obviously be bettered during the redesign, but as we are looking to go back to a fully dynamic (rather than the currently employed static file) system any improvements we make to the code as such will be reversed by the fact that we are no longer serving out cached XML files.

To expand on this a little - the XML (or ZIP) files we are currently serving out are generated as and when they are needed and cached on disk indefinitely. The only time you (an API user) hit the database is during initial series lookups and smaller ratings/favorites queries. This works fairly well, but it means we (the third-party developers) can no longer ask the site to hand us all the data on series A, B, C and D as well as episodes 1-20, 48-56 and actors X, Y and Z - that is to say; we can, but it has to be done "manually" in several calls. Going back to a fully dynamic system all the above could be queried with a single (or possibly three) call(s) to the site - and at the same time it would allow for true personalization of the returned dataset.
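The "generate when needed, then cache on disk indefinitely" pattern described above can be sketched roughly as follows. This is only an illustration of the general technique, not the site's actual code: the function name `get_cached_xml`, the `build_xml` callback and the one-file-per-series layout are all hypothetical.

```python
from pathlib import Path
from typing import Callable


def get_cached_xml(series_id: int, cache_dir: Path,
                   build_xml: Callable[[int], str]) -> str:
    """Return the XML for a series, building it only on the first request.

    build_xml is a hypothetical callback standing in for whatever queries
    the database and renders the XML. The cached file never expires, which
    mirrors "cached on disk indefinitely".
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{series_id}.xml"

    if cache_file.exists():
        # Cache hit: served straight from disk, no database work at all.
        return cache_file.read_text(encoding="utf-8")

    # Cache miss: do the expensive work once and keep the result.
    xml = build_xml(series_id)
    cache_file.write_text(xml, encoding="utf-8")
    return xml
```

The trade-off is the one described in the post: repeat requests are cheap, but every distinct combination of data needs its own file, so a request spanning several series, episodes and actors has to be split into several calls.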

What's actually eating CPU at the moment is a combination of Apache, MySQL and the various running scripts that create the cached XML/ZIP files. In addition there is the image mirror script which, when it wakes up, is fairly processor intensive.

What you have to remember though is that we serve millions of pages per day, so the fact that we need fairly hefty hardware shouldn't come as a shock to anyone.

click170 wrote:Could I get you to elaborate on your hesitance towards clustering? I understand from a programming perspective, that does seem like a rather daunting task, but, planning for the future growth of the site, shouldn't it be assumed that we will eventually need to move to a clustering system? And perhaps, one day, ideally, entirely decentralized?

We kind of are planning for it, and it will [in a way] be a completely decentralized system. Basically, once the restructure is done and dusted we should be able to fire up another server in no-time and pull the plug on the original server without anyone (apart from the actual admins) knowing anything about it. With a little more magic we should be able to provide geographical mirrors also, which ensures high speed transfers all around the world.

click170 wrote:Also, I think buddy has a point about the wiki-style donations bar .... a really good idea would be asking the developers to be aware of this bar and to try and implement it similarly in their applications somehow. If people had a visual indication, "omg, TheTVDB is going to go down if I don't donate!" I think their reaction might be different to the current simple donation link.

The problem is that the majority of traffic comes in through the backdoor (i.e. the API). Yes, we could ask the third-party developers to add a visual indication to their applications, but would it really have much of an effect? When I sit down in front of the HTPC I sit down to watch TV (or a movie, or browse some photos etc) and not to worry about where my TV listings come from or if they need some money to keep delivering those listings to me. Even if I got a huge on-screen popup saying "Donate now or the site will go down in a week" I would probably close it quickly and continue watching the last episode of Lost. I reckon most people [actually using HTPCs with a TheTVDB compatible frontend] are the same.

If we were able to donate from the HTPC frontend (rather than having to use an actual PC) it would likely have more of an effect, but unless all the third-party developers spend time developing their own payment system or PayPal gateways [for us - for free] it's not going to happen.

I find it interesting that there is such a small number of people posting here. Is it safe to assume from that fact alone that most people do not have a problem with what we are proposing (i.e. certain premium services available for a small fee)?

In any case, history has shown that we cannot rely on donations alone. Google ads help a little, but for them to be really effective we need to drive more people to the site (something we are looking into as well), and in order to do this we need to provide more services/features for these people to utilize, and more traffic == more processors/memory needed == more bandwidth consumed == more overall cost.

I really don't want to compare this [relatively small site] with larger ones, but there are plenty of [large community] sites out there trying to survive on advertising alone and if they cannot do it ... well, you do the math :)

c4ho wrote:That's a ridiculous price to pay for bandwidth. I have one server using in excess of 5TB a month and it costs me 60 Euros. I feel now we are getting to the crux of it. Bandwidth is VERY cheap, but the current codebase's backend requirements are so specific we can't make best use of the good deals out there.

Can I just ask how many hits per second that server is getting? And please don't tell me it's your seedbox, because that needs next to no CPU and/or memory (heck, I've run bt clients on my Fonera wifi AP successfully).

Yes, bandwidth is cheap (I can soon get a 100Mbps pipe installed in my office for ~$40/month, 1Gbps pipes are also on the way apparently, for roughly the same price) - but running a dynamic website receiving millions (yes millions, as in 7 figures) of hits per day is demanding on the hardware and that has nothing to do with the codebase being "specific". And it's not just initial hardware, it's also ongoing replacement of that hardware. If a drive fails it needs to be replaced, if a fan fails that needs to be replaced, if lightning strikes and kills the entire server the entire server needs to be replaced. What about physical access to the server? iLO access?

dgaust
Posts: 38
Joined: Fri Jan 04, 2008 12:27 am

Re: Restructure/Redesign

Postby dgaust » Wed Mar 04, 2009 10:11 pm

emigrating wrote:
I find it interesting that there is such a small number of people posting here. Is it safe to assume from that fact alone that most people do not have a problem with what we are proposing (i.e. certain premium services available for a small fee)?


No problem here. The basic information will still be available. I would have no problem if Fan Art was to be considered a part of that premium service.
'Underbelly' is not one show with 4 seasons, but 4 independent mini-series. If you insist otherwise you're a moron.

behanw
Posts: 1
Joined: Mon Oct 06, 2008 1:47 pm

Re: Restructure/Redesign

Postby behanw » Wed Mar 04, 2009 10:22 pm

It all sounds good to me; especially the part about using stored procedures in PostgreSQL.

At this point I'm mostly curious what's going to happen to the API (I agree it could be made a lot more consistent, and more fine-grained). But I'm sure whatever is decided will be fine.

szsori
Site Admin
Posts: 1911
Joined: Fri Nov 03, 2006 5:23 pm

Re: Restructure/Redesign

Postby szsori » Thu Mar 05, 2009 3:16 am

In addition to all the points emigrating pointed out, here's another one. We're getting around 25 hits per second just on our main API. That's excluding images, which would more than double it. That's a lot of stuff hitting a dynamic interface. As emigrating pointed out, the restructure is an attempt to address part of the issue.
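As a quick sanity check, the per-second figure lines up with the "millions of hits per day" mentioned earlier in the thread. The sketch below assumes the 25 hits per second is a sustained daily average rather than a peak, which the post does not state, and treats "more than double" as a lower bound of exactly double.

```python
SECONDS_PER_DAY = 24 * 60 * 60

API_HITS_PER_SECOND = 25  # figure quoted above for the main API only


def hits_per_day(hits_per_second: int) -> int:
    """Convert a sustained per-second rate into a daily total."""
    return hits_per_second * SECONDS_PER_DAY


api_hits_per_day = hits_per_day(API_HITS_PER_SECOND)

# Images "would more than double it", so doubling gives a lower bound.
with_images_lower_bound = 2 * api_hits_per_day

print(f"API only: {api_hits_per_day:,} hits/day")
print(f"API plus images: more than {with_images_lower_bound:,} hits/day")
```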

It really doesn't matter, though. The fact remains that the site's current income is far less than what we need to pay. No, we're not currently paying it, but we should be and if we don't, the site won't stick around.

On top of that, since we'll be forming an LLC for the site, we probably won't be taking donations after that. If people really want to donate, we'll give them a good cause to donate to. Frankly the donations have been so small that they'd barely make a dent in our expenses anyway. I'd much rather add some premium services to the site, which will be somewhat consistent income, rather than wonder month-to-month if the site will stay up.

I also still fail to understand why people would be upset about us adding some new premium services when all of our existing ones will stay completely free. Everything that people can currently do with the API and site will still be possible and free. In a nutshell, that really doesn't affect anyone outside the project, so I'm not sure what the complaint is.

c4ho
Posts: 160
Joined: Wed Apr 18, 2007 3:23 am

Re: Restructure/Redesign

Postby c4ho » Thu Mar 05, 2009 3:24 am

emigrating, can you clarify a couple of points in your post? You say tvdb has $400 but it took a year to make. In another post ACTUAL income is $100 a month and ACTUAL outgoings for many months have been zero. Obviously that doesn't match.

Also, you keep blurring the divide between bandwidth and hardware. Unless you post specifics about what you need, it is pointless saying hardware x and y is too slow. (To be clear, I know the last example I posted is too slow, but it was cited clearly as an example of bandwidth costs).

Lastly, your roadmap idea of "TheTVDB LLC manages their own hosting rather than relying on third-parties for free.". This is a new point. So we're saying that the plan is to drop donated servers in favour of servers owned or rented by TVDB. Why would you want to do that? Surely you would augment a core set of TVDB hardware with the generous donations of servers?

Also, emigrating, the tone of your responses is still very defensive. I realize you are heavily personally invested in this and that's a good thing, but a couple of us here are trying to help speak for the silent masses. You guys have done an excellent job and likely invested a large amount of personal man-hours and that's appreciated (very much so)... but just remember the amount of hours you have spent on tvdb will be much less than what the open source and tvdb community as a whole has spent.

click170
Posts: 23
Joined: Thu Feb 19, 2009 4:21 am
Location: Canada-Land

Re: Restructure/Redesign

Postby click170 » Thu Mar 05, 2009 4:56 am

szsori wrote:I also still fail to understand why people would be upset about us adding some new premium services when all of our existing ones will stay completely free. Everything that people can currently do with the API and site will still be possible and free. In a nutshell, that really doesn't affect anyone outside the project, so I'm not sure what the complaint is.

Not to jump all over you :P but I think that's overstating the resistance to the subscriptions and premium services that have been suggested so far.
Re-reading the posts, the somewhat heated debate started when someone mentioned database dumps.

c4ho wrote:That sounds [perfectly] ok, but it's a fine line between extra features and artificially restricting access to user [submitted] data. This may be easily covered, though, with very regular and publicly downloadable database dumps. This way you are saying "here is the data, it is free" but "to access it in a slick way requires us to spend money, so we're charging you for some premium services". It's a very fine line, as this is what IMDB did and has... and whilst it is a superb source of data, it is hardly in the same spirit as tvdb is now.

There, he used it as an example of a way of providing premium features ^without^ artificially restricting access to user submitted data, and the example was public database dumps published regularly. As has been pointed out, these are already available on request (where does one submit a request?) and have been granted already on numerous occasions despite the website currently ^not^ having premium services.
Actually offering this service as you do is, I think, extremely generous because of the lack of automation involved. If it were an automated process which gathered the data and laid it out in an easily parseable format [people interested in this information are likely only encumbered by the format it is in, i.e. SQL table dumps], that would be different (and should be free). But because there is manual labor involved in handling the request, dumping all database tables, and stripping non-relevant user information, this requires effort, and I thank you for providing that service as you have thus far. Please continue to do so, at least until it's automated :)

Moving on,
It seems, in that specific comment at least, that his implication was that the API could be a pay-for service but the database dumps should be free because of their user-submitted nature. This has already been ruled out: as you have said, what is already there will remain free. Fantastic!

c4ho wrote:Also playing devil's advocate. Dbase dumps will be a paid-for feature because bandwidth costs money. That's fair enough until someone says I will host a torrent for it so you have virtually no bandwidth costs. I am not being anal, I am just predicting the reaction of some to hopefully make the transition easier.
Oh! Me! Me Me Me!!! *puts his hand up so high he's standing on his seat*
If that's all it takes to get you to publish database dumps, let's do it, I'm there, let's go! Like I said, I will provide you with CDs/DVDs if you are concerned about the bandwidth of the initial push. Somehow, though, I suspect my volunteering alone won't do the job, but nonetheless the offer is on the table, so you can't say nobody offered.

Anyway, back to my original point, I don't think there is actually that much resentment towards the idea itself of collecting subscriptions for the 'premium services' that were proposed, except for the obviously controversial idea of the database dumps. There has been plenty of back and forth about the database dumps and whether they should be paid for, and I think that's where most of the controversy has been: on that idea, not so much on the idea of offering subscription services or even on the other proposed subscription services. I would take that as a green light for the subscription based model with premium services, with the exception of working out the kinks in the whole database dump thing.

Aw, and then c4ho goes and wrecks it by saying 'minimum of a day' in his post below. *distances himself from that particular post*

Edit: Oh, and of course, the various bickering over server costs, but I don't think that stems from any kind of resentment towards the subscription idea itself.
Last edited by click170 on Thu Mar 05, 2009 5:52 am, edited 2 times in total.

emigrating
Site Admin
Posts: 278
Joined: Mon Aug 27, 2007 4:38 pm

Re: Restructure/Redesign

Postby emigrating » Thu Mar 05, 2009 4:58 am

c4ho wrote:emigrating, can you clarify a couple of points in your post? You say tvdb has $400 but it took a year to make. In another post ACTUAL income is $100 a month and ACTUAL outgoings for many months have been zero. Obviously that doesn't match.

No, I said donations were below $400.00 over a 12 month period - with adsense revenue "we're currently bringing in under $100/month on average". Nowhere does it say actual income == $100/month. It clearly says it is below $100.00 / month.

c4ho wrote:Also, you keep blurring the divide between bandwidth and hardware. Unless you post specifics about what you need, it is pointless saying hardware x and y is too slow. (To be clear, I know the last example I posted is too slow, but it was cited clearly as an example of bandwidth costs).

Lastly, your roadmap idea of "TheTVDB LLC manages their own hosting rather than relying on third-parties for free.". This is a new point. So we're saying that the plan is to drop donated servers in favour of servers owned or rented by TVDB. Why would you want to do that? Surely you would augment a core set of TVDB hardware with the generous donations of servers?

Wrong again. Go back to your first post suggesting the €29.00 LeaseWeb alternative, my reply was "the LeaseWeb dedicated server for €29.00 would not be much of an alternative ... actual bandwidth will work ... The site is currently running on 4x dual core Intel Xeon 3GHz platform and CPU usage is maxed out at times ... trying to move from an 8-core to a single-core setup is madness".

As for the hardware, it really doesn't matter if we have several Dell and HP servers donated or not - sure, the initial outlay is taken care of, but as I explained above, what about ongoing hardware maintenance? Also, renting rackspace/bandwidth for a colo-server is [usually] more expensive than simply renting a dedicated server, as the dedicated server is based on 1u. If we take LeaseWeb's standard 3u space option as an example we are talking €119.00; on this package we would have to change the 1amp option to 2amp [at least], which brings us to a total of €168.00. Now, let's change their default 1TB/month bandwidth allocation to 4TB and we're at €228.00 per month. Even if we go for 2u of rackspace we're at €198.00/month. Quite a bit more than your estimated "you can get three of these" idea.
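The colo figures above can be broken down as follows. Only the four totals and the €29.00 server price come from this thread; the individual upgrade prices and the 2u base price are inferred by subtraction, and the 2u figure additionally assumes the power and bandwidth upgrades cost the same on a 2u package.

```python
# Monthly totals quoted in the post (EUR).
BASE_3U = 119                  # standard 3u package, 1amp, 1TB/month
TOTAL_3U_2AMP = 168            # after moving from 1amp to 2amp
TOTAL_3U_2AMP_4TB = 228        # after raising bandwidth from 1TB to 4TB
TOTAL_2U_2AMP_4TB = 198        # same upgrades on 2u of rackspace
BUDGET_SERVER = 29             # the dedicated server suggested earlier

# Inferred by subtraction, not quoted price-list entries.
power_upgrade = TOTAL_3U_2AMP - BASE_3U
bandwidth_upgrade = TOTAL_3U_2AMP_4TB - TOTAL_3U_2AMP

# Assumes the upgrades cost the same on the 2u package.
implied_base_2u = TOTAL_2U_2AMP_4TB - power_upgrade - bandwidth_upgrade

three_budget_servers = 3 * BUDGET_SERVER

print(f"2amp upgrade: EUR {power_upgrade}/month")
print(f"4TB upgrade: EUR {bandwidth_upgrade}/month")
print(f"Implied 2u base: EUR {implied_base_2u}/month")
print(f"Three budget servers: EUR {three_budget_servers}/month")
print(f"2u colo vs three servers: EUR {TOTAL_2U_2AMP_4TB - three_budget_servers} more")
```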

c4ho wrote:Also, emigrating, the tone of your responses is still very defensive. I realize you are heavily personally invested in this and that's a good thing, but a couple of us here are trying to help speak for the silent masses. You guys have done an excellent job and likely invested a large amount of personal man-hours and that's appreciated (very much so)... but just remember the amount of hours you have spent on tvdb will be much less than what the open source and tvdb community as a whole has spent.

Obviously it's "defensive" as certain people fail to understand what we are saying. Would you prefer to be told "Fuck ya'll. It is what it is and this is what's happening." and for us to call it a day? No, I didn't think so.

As for the silent masses, after I specifically asked "Is it safe to assume ... most people do not have a problem with what we are proposing" two more people came along and said "it's all good" - are you sure you still speak for the masses?

c4ho
Posts: 160
Joined: Wed Apr 18, 2007 3:23 am

Re: Restructure/Redesign

Postby c4ho » Thu Mar 05, 2009 5:39 am

OK, I'm out. Your responses are far too confrontational/selectively nit-picky for me and do not fit with my interpretation of the spirit of the site.

  • Make sure you allow (sooner rather than later) FREE download of the database at a minimum of once a day to match the spirit of the people that freely donate(d) their time to get you where you are today (and to allow a fork if someone deems it necessary)
  • Make sure you realize you cannot retrospectively change the license of the user submissions without asking every single one of them to agree to it
  • Make sure you realize you cannot sell the content of the site (as it's not yours to sell), only the services

Good luck with your project. You are attempting to do what few open source projects have ever managed to; I really hope you make it work and find a good balance between commercial and open source.

emigrating
Site Admin
Posts: 278
Joined: Mon Aug 27, 2007 4:38 pm

Re: Restructure/Redesign

Postby emigrating » Thu Mar 05, 2009 6:49 am

c4ho wrote:OK, I'm out. Your responses are far too confrontational/selectively nit-picky for me and do not fit with my interpretation of the spirit of the site.

  • Make sure you allow (sooner rather than later) FREE download of the database at a minimum of once a day to match the spirit of the people that freely donate(d) their time to get you where you are today (and to allow a fork if someone deems it necessary)
  • Make sure you realize you cannot retrospectively change the license of the user submissions without asking every single one of them to agree to it
  • Make sure you realize you cannot sell the content of the site (as it's not yours to sell), only the services

Good luck with your project. You are attempting to do what few open source projects have ever managed to; I really hope you make it work and find a good balance between commercial and open source.

  • The database is available for download today and has been since day one. Simply request it and someone will sort it out for you.
  • No-one is changing the license of the data.
  • No-one is selling the content of the site, we have said time and time again that the data is free. Services may or may not be.