Restructure - Schema

Announcements about the database, website, and plugins.
szsori
Site Admin
Posts: 1972
Joined: Fri Nov 03, 2006 5:23 pm

Restructure - Schema

Postby szsori » Tue Feb 10, 2009 4:37 pm

Ok, I think I've got my sequences, tables, and fields created. I will change the tables to use a single sequence if a knowledgeable person thinks it's a good idea, but I have a feeling that Postgres will be forced to do sequential inserts into separate tables if they share a sequence. If someone can find this out for me, I'd be very grateful.

I have yet to set up my foreign keys, indexes, stored procedures, etc. I also have to run down the fields and set NOT NULL and default values where appropriate. I'll give a quick summary of my goal... every table that will have revisions (rollback capability) will have a modifiedby foreign key (the user) and modified date. This will allow us to push non-active revisions into the revision table and sort by modified date (desc, of course). Tables that don't allow revisions will all have an active flag that will determine if the field is available for new/updated records (ie shown in web interface and available in the API).

My main concern is how seasons will work. Seasontypes will have "Air Date", "DVD", "Absolute", "Best of DVD", etc. This means that we can add additional orderings for episodes if we want. From there each episode can belong to any number of seasons, but we'll restrict it so it can only belong to one airdate season, one dvd season, one absolute season, etc. I think this will give us complete flexibility in how we store episodes without making things too complex for devs and end users (they'll just select the order they want for a series and that's what it'll use).

You may notice that imdb id, tv.com id, etc aren't included. These will be stored in the remoteids table, which will give us more flexibility for adding future sources without having to add a field. Each of those other sources will be in remotesources. I also include remotesites, which I was thinking would be the actual links to other sites for the episode/series/person/etc. I may combine the id's and url's into one table though. I'm open to feedback on this.

I'd appreciate any feedback on this before I spend time making the revision tables, fk's, etc.

EDIT: For anyone wishing to work with this, I'm running Postgres Plus 8.3 (http://www.enterprisedb.com/products/pg ... do#windows). It includes PgAdmin, which is an excellent DB GUI (I'm extremely impressed and I work in SQL Studio and MySQL Administrator on a regular basis).

Code: Select all

-- Create the sequences
CREATE DATABASE tvdb WITH OWNER = tvdb ENCODING = 'UNICODE';

CREATE SEQUENCE apikeys_seq;
CREATE SEQUENCE banners_seq;
CREATE SEQUENCE comments_seq;
CREATE SEQUENCE episodepeople_seq;
CREATE SEQUENCE episodes_seq;
CREATE SEQUENCE genres_seq;
CREATE SEQUENCE languages_seq;
CREATE SEQUENCE networks_seq;
CREATE SEQUENCE people_seq;
CREATE SEQUENCE peopletypes_seq;
CREATE SEQUENCE ratings_seq;
CREATE SEQUENCE remoteids_seq;
CREATE SEQUENCE remotesites_seq;
CREATE SEQUENCE remotesources_seq;
CREATE SEQUENCE seasonepisodes_seq;
CREATE SEQUENCE seasons_seq;
CREATE SEQUENCE seasontypes_seq;
CREATE SEQUENCE series_seq;
CREATE SEQUENCE seriesaka_seq;
CREATE SEQUENCE seriespeople_seq;
CREATE SEQUENCE seriesstatus_seq;
CREATE SEQUENCE translatedepisodes_seq;
CREATE SEQUENCE translatedseries_seq;
CREATE SEQUENCE userepisodes_seq;
CREATE SEQUENCE userfavorites_seq;
CREATE SEQUENCE users_seq;

CREATE TABLE apikeys
(
  apikey_pk integer NOT NULL DEFAULT nextval('apikeys_seq'::regclass),
  apikey_user_fk integer,
  apikey_key character varying(16),
  apikey_project character varying(100),
  apikey_modifiedby_fk integer,
  apikey_modified timestamp with time zone,
  CONSTRAINT apikeys_primarykey PRIMARY KEY (apikey_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE banners
(
  banner_pk integer NOT NULL DEFAULT nextval('banners_seq'::regclass),
  banner_network_fk integer,
  banner_series_fk integer,
  banner_season_fk integer,
  banner_episode_fk integer,
  banner_people_fk integer,
  banner_filename character varying(100),
  banner_width integer,
  banner_height integer,
  banner_comment character varying(100),
  banner_modifiedby_fk integer,
  banner_modified timestamp with time zone,
  CONSTRAINT banners_primarykey PRIMARY KEY (banner_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE comments
(
  comment_pk integer NOT NULL DEFAULT nextval('comments_seq'::regclass),
  comment_user_fk integer,
  comment_series_fk integer,
  comment_episode_fk integer,
  comment_text character varying(500),
  comment_modifiedby_fk integer,
  comment_modified timestamp with time zone,
  CONSTRAINT comments_primarykey PRIMARY KEY (comment_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE episodepeople
(
  episodepeople_pk integer NOT NULL DEFAULT nextval('episodepeople_seq'::regclass),
  episodepeople_episode_fk integer,
  episodepeople_peopletype_fk integer,
  episodepeople_role character varying(100),
  episodepeople_modifiedby_fk integer,
  episodepeople_modified timestamp with time zone,
  CONSTRAINT episodepeople_primarykey PRIMARY KEY (episodepeople_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE episodes
(
  episode_pk integer NOT NULL DEFAULT nextval('episodes_seq'::regclass),
  episode_series_fk integer,
  episode_number integer,
  episode_airdate date,
  episode_runtime integer,
  episode_productioncode character varying(25),
  episode_modifiedby_fk integer,
  episode_modified timestamp with time zone,
  CONSTRAINT episodes_primarykey PRIMARY KEY (episode_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE genres
(
  genre_pk integer NOT NULL DEFAULT nextval('genres_seq'::regclass),
  genre_name character varying(50),
  genre_active boolean DEFAULT true,
  CONSTRAINT genres_primarykey PRIMARY KEY (genre_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE languages
(
  language_pk integer NOT NULL DEFAULT nextval('languages_seq'::regclass),
  language_name character varying(50),
  language_englishname character varying(50),
  language_active boolean DEFAULT true,
  CONSTRAINT languages_primarykey PRIMARY KEY (language_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE networks
(
  network_pk integer NOT NULL DEFAULT nextval('networks_seq'::regclass),
  network_name character varying(100) NOT NULL,
  network_active boolean DEFAULT true,
  CONSTRAINT networks_primarykey PRIMARY KEY (network_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE people
(
  people_pk integer NOT NULL DEFAULT nextval('people_seq'::regclass),
  people_firstname character varying(50),
  people_lastname character varying(50),
  people_birthdate date,
  people_bio character varying(500),
  people_modifiedby_fk integer,
  people_modified timestamp with time zone,
  CONSTRAINT people_primarykey PRIMARY KEY (people_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE peopletypes
(
  peopletype_pk integer NOT NULL DEFAULT nextval('peopletypes_seq'::regclass),
  peopletype_name character varying(25),
  peopletype_active boolean DEFAULT true,
  CONSTRAINT peopletypes_primarykey PRIMARY KEY (peopletype_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE ratings
(
  rating_pk integer NOT NULL DEFAULT nextval('ratings_seq'::regclass),
  rating_user_fk integer NOT NULL,
  rating_banner_fk integer,
  rating_series_fk integer,
  rating_episode_fk integer,
  rating_score smallint NOT NULL,
  rating_modifiedby_fk integer,
  rating_modified timestamp with time zone,
  CONSTRAINT ratings_primarykey PRIMARY KEY (rating_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE remoteids
(
  remoteid_pk integer NOT NULL DEFAULT nextval('remoteids_seq'::regclass),
  remoteid_series_fk integer,
  remoteid_episode_fk integer,
  remoteid_remotesource_fk integer,
  remoteid_id character varying(50),
  remoteid_modifiedby_fk integer,
  remoteid_modified timestamp with time zone,
  CONSTRAINT remoteids_primarykey PRIMARY KEY (remoteid_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE remotesites
(
  remotesite_pk integer NOT NULL DEFAULT nextval('remotesites_seq'::regclass),
  remotesite_series_fk integer,
  remotesite_episode_fk integer,
  remotesite_remotesource_fk integer,
  remotesite_url character varying(250),
  remotesite_modifiedby_fk integer,
  remotesite_modified timestamp with time zone,
  CONSTRAINT remotesites_primarykey PRIMARY KEY (remotesite_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE remotesources
(
  remotesrc_pk integer NOT NULL DEFAULT nextval('remotesources_seq'::regclass),
  remotesrc_name character varying(100) NOT NULL,
  remotesrc_active boolean DEFAULT true,
  CONSTRAINT remotesources_primarykey PRIMARY KEY (remotesrc_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE seasonepisodes
(
  seasonep_pk integer NOT NULL DEFAULT nextval('seasonepisodes_seq'::regclass),
  seasonep_series_fk integer,
  seasonep_season_fk integer,
  seasonep_episode_fk integer,
  seasonep_modifiedby_fk integer,
  seasonep_modified timestamp with time zone,
  CONSTRAINT seasonepisodes_primarykey PRIMARY KEY (seasonep_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE seasons
(
  season_pk integer NOT NULL DEFAULT nextval('seasons_seq'::regclass),
  season_series_fk integer,
  season_seasontype_fk integer,
  season_number integer,
  season_name character varying(100),
  season_modifiedby_fk integer,
  season_modified timestamp with time zone,
  season_lockedby_fk integer,
  season_locked timestamp with time zone,
  CONSTRAINT seasons_primarykey PRIMARY KEY (season_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE seasontypes
(
  seasontype_pk integer NOT NULL DEFAULT nextval('seasontypes_seq'::regclass),
  seasontype_name character varying(25) NOT NULL,
  seasontype_shortname character varying(10) NOT NULL,
  seasontype_active boolean DEFAULT true,
  CONSTRAINT seasontypes_primarykey PRIMARY KEY (seasontype_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE series
(
  series_pk integer NOT NULL DEFAULT nextval('series_seq'::regclass),
  series_network_fk integer,
  series_genre_fk integer,
  series_runtime integer,
  series_night smallint,
  series_status_fk smallint,
  series_modifiedby_fk integer,
  series_modified timestamp with time zone,
  series_lockedby_fk integer,
  series_locked timestamp with time zone,
  CONSTRAINT series_primarykey PRIMARY KEY (series_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE seriesaka
(
  seriesaka_pk integer NOT NULL DEFAULT nextval('seriesaka_seq'::regclass),
  seriesaka_series_fk integer,
  seriesaka_name character varying(100),
  seriesaka_modifiedby_fk integer,
  seriesaka_modified timestamp with time zone,
  CONSTRAINT seriesaka_primarykey PRIMARY KEY (seriesaka_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE seriespeople
(
  seriespeople_pk integer NOT NULL DEFAULT nextval('seriespeople_seq'::regclass),
  seriespeople_series_fk integer,
  seriespeople_peopletype_fk integer,
  seriespeople_role character varying(100),
  seriespeople_modifiedby_fk integer,
  seriespeople_modified timestamp with time zone,
  CONSTRAINT seriespeople_primarykey PRIMARY KEY (seriespeople_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE seriesstatus
(
  serstatus_pk integer NOT NULL DEFAULT nextval('seriesstatus_seq'::regclass),
  serstatus_name character varying(25) NOT NULL,
  serstatus_active boolean DEFAULT true,
  CONSTRAINT seriesstatus_primarykey PRIMARY KEY (serstatus_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE translatedepisodes
(
  tranepisode_pk integer NOT NULL DEFAULT nextval('translatedepisodes_seq'::regclass),
  tranepisode_episode_fk integer,
  tranepisode_language_fk integer,
  tranepisode_name character varying(100),
  tranepisode_overview character varying(500),
  tranepisode_modifiedby_fk integer,
  tranepisode_modified timestamp with time zone,
  CONSTRAINT translatedepisodes_primarykey PRIMARY KEY (tranepisode_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE translatedseries
(
  transeries_pk integer NOT NULL DEFAULT nextval('translatedseries_seq'::regclass),
  transeries_series_fk integer,
  transeries_language_fk integer,
  transeries_name character varying(100),
  transeries_overview character varying(500),
  transeries_modifiedby_fk integer,
  transeries_modified timestamp with time zone,
  CONSTRAINT translatedseries_primarykey PRIMARY KEY (transeries_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE userepisodes
(
  userep_pk integer NOT NULL DEFAULT nextval('userepisodes_seq'::regclass),
  userep_user_fk integer,
  userep_episode_fk integer,
  userep_watched timestamp with time zone,
  userep_modifiedby_fk integer,
  userep_modified timestamp with time zone,
  CONSTRAINT userepisodes_primarykey PRIMARY KEY (userep_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE userfavorites
(
  userfav_pk integer NOT NULL DEFAULT nextval('userfavorites_seq'::regclass),
  userfav_user_fk integer,
  userfav_series_fk integer,
  userfav_modifiedby_fk integer,
  userfav_modified timestamp with time zone,
  CONSTRAINT userfavorites_primarykey PRIMARY KEY (userfav_pk)
)
WITH (OIDS=FALSE);

CREATE TABLE users
(
  user_pk integer NOT NULL DEFAULT nextval('users_seq'::regclass),
  user_login character varying(25) NOT NULL,
  user_password character varying(25) NOT NULL,
  user_email character varying(50) NOT NULL,
  user_roles integer NOT NULL DEFAULT 1,
  user_language_fk integer NOT NULL DEFAULT 0,
  user_bannerlimit smallint NOT NULL DEFAULT 1,
  user_agreements smallint NOT NULL DEFAULT 0,
  user_lastipaddress character varying(25),
  user_lastlogin timestamp with time zone,
  user_modifiedby_fk integer,
  user_modified timestamp with time zone,
  CONSTRAINT users_primarykey PRIMARY KEY (user_pk)
)
WITH (OIDS=FALSE);

emigrating
Site Admin
Posts: 278
Joined: Mon Aug 27, 2007 4:38 pm
Contact:

Re: Restructure - Schema

Postby emigrating » Wed Feb 11, 2009 12:55 pm

Does this mean that the decision to move to PostgreSQL is made? I have some reservations against this as postgres is kinda quirky (well, it was the last time I used it).

Postgres is case sensitive by default (MSSQL is not - although it be on a column by column basis - Pgres cannot) and this is a huge issue for us as it will not treat Mash the same as mash or MASH. You can use ILIKE 'mash' or UPPER any queries, but I still see it as a huge downfall - especially as you cannot use Indexes.

Replication is also somewhat lacking on postgres which may become an issue as/when we decide we need to roll out mirrors for the DB itself and not just content like we do today.

AFAIK you are also unable to reference "remote" databases in postgresql. This can easily be worked around, but ideally we would split the user/content database into two during the restructure. If for nothing else to be able to more easily export a full copy of the contentDB without having to remove usernames/passwords/emails etc. Other features we have discussed also makes sense being put in separate databases rather than just adding more and more tables to the same DB - being able to reference seriesDB.tvseries.seriesid from a query run in usersDB could prove useful when it comes to reporting.

Also, there is no UPSERT in pgres - although the "need" for this comes down to what sort of revision control we decide to use (if any).

szsori
Site Admin
Posts: 1972
Joined: Fri Nov 03, 2006 5:23 pm

Re: Restructure - Schema

Postby szsori » Wed Feb 11, 2009 3:17 pm

As we've discussed in our private forum and email, MS SQL makes us uneasy because of the licensing cost and difficulty in finding a suitable host if the current hosting situation ever changes. Yes, I know there are other hosts that provide MS SQL, but the large majority of people that have offered us free hosting have been running linux servers. That is my primary concern with MS SQL otherwise I'd be all for it. As such my research and reading has been primarily focused on MySQL vs PostgreSQL. I'm not entirely opposed to MS SQL, but I don't think my fears about using it are unfounded.

For case sensitivity, we can handle that by using LOWER() or UPPER(), ILIKE, or *~. Obviously each of these has some drawbacks which is why it'd probably be smart for us to store a transformed field in the translations table. This can easily be handled by the stored procs. If there are other methods, I'm not aware of them. I'm also not sure how much this would affect performance, since a transformed field could be indexed as well. I'd love to hear more feedback on this from someone familiar with the issue.

I'll look into Postgres replication more. I've run across Slony-I (http://www.slony.info/) a number of times when reading about Postgres used in large applications, but I'm not sure if it's considered the standard method for replication in Postgres or not. I think we should focus on keeping a single master while possibly adding multiple slaves, since that simplifies things a bit. Postback API's would have to hit the master while all other API calls could hit the slaves. (Note: I feel bad talking about hitting slaves :? )

For connecting to remote databases, I believe that's done using the CREATE DATABASE LINK statement, but I may be mistaken. Andre_pl... can you provide any insight here? I'm not sold on separating the databases since I'm not sure how that would work with all of the various relationships between the tables. I understand partitioning when the database is huge and complex, but ours really isn't all that big nor is it really complex. If this is something we need to discuss more, I'm all ears. :)

UPSERT won't be necessary for the revision control that I'm considering. If we go with a framework that may change, but right now I'm envisioning that each stored proc will first "INSERT INTO... SELECT FROM" the revision table and then INSERT INTO the live table. The revision table will be identical to the live table except it won't use SERIAL fields and the indexes may be set differently. This way finding all revisions for a record will be as simple as finding all records with that PK and sorting by modifieddate. Finding deleted records will be as simple as finding all records that exist in the revision table but not the live table, although I may set up a separate "deletedrecords" table to increase performance. Restoring a record will just require that we delete the live record and "INSERT INTO... SELECT FROM" the live table from the revisions table. My main concern right now is that primary keys would have to be ditched to allow duplicates, but I can still make that field an index. I could create another PK on each revision table, but the "INSERT INTO... SELECT FROM" statements become a lot more complex at that point (I'd have to list all of the fields other than the revision_pk or else ditch the revision_pk in-between the select and insert somehow). Another possibility is to have OIDS enabled on the revision table.

Obviously creating revisions will require wrapping those statements in a transaction, but I don't see a major performance hit since we won't store revisions for things like userepisodes, userfavorites, etc.

emigrating
Site Admin
Posts: 278
Joined: Mon Aug 27, 2007 4:38 pm
Contact:

Re: Restructure - Schema

Postby emigrating » Wed Feb 11, 2009 3:56 pm

I remember your concerns for MSSQL, but I also remember discussing the feasibility of rolling out on SQLExpress as a development platform and purchasing licenses as needed when it was actually needed. We also offered up VLK editions for initial use.

Don't get me wrong, I'm not against pgres as such, it does most of what is needed without a problem and what it doesn't do it can work around, but it seems to me that we have not learnt by our mistakes; by that I mean to say that none of the current site admins have a previous [good] working knowledge of postgres (to my knowledge) so whilst we no longer have some of the old "issues" working against us, we're now up against a completely new [SQL dialect] which will have to be picked up as and when things need coding/fixing/integrating. I can't talk for everyone, but I personally don't feel like having to pick up more [albeit useful] knowledge just for the heck of it ;)

I do agree though, PostgreSQL is the obvious choice ahead of MySQL - I just also think that MSSQL is the obvious choice ahead of pgres. Sure it will be harder to find free hosting of the site, but we currently have the main server running just fine with a image backup in a secondary location. If we start hitting bandwidth barriers again I could come up with another image backup fairly quickly as well as allow for a full mirror half way around the world (by which time we should look at selecting a mirror based on your geographical location). I know there were fears about being locked into the arrangement we have at the moment, but if you recall there were "offers of safety" made for this very reason.

I dunno, perhaps my "fears" are unfounded, but it feels like what you're proposing now is somewhat in disagreement with the discussions we had late last year in the "development" forum - where we even had people starting to look at MySQL-MSSQL migration. Perhaps we should move the discussion back to the dev forum for now?

szsori
Site Admin
Posts: 1972
Joined: Fri Nov 03, 2006 5:23 pm

Re: Restructure - Schema

Postby szsori » Wed Feb 11, 2009 4:18 pm

Wherever we discuss them is fine with me. I've just granted andre_pl access to that forum as well, since I think his experience in postgres and other technologies warrants his involvement. I'll start a thread down there. :)

andre_pl
Posts: 13
Joined: Wed Feb 04, 2009 11:25 pm

Re: Restructure - Schema

Postby andre_pl » Wed Feb 11, 2009 4:32 pm

as for the case sensitivity, it really is a non-issue. text-comparisons should be done using postgres' fulltext engine, which is pretty amazing.
also, I'm not familiar with MSSQL, but Postgres supports the concepts of schema's and tablespaces. which should provide the separation you're looking for in the user database. schema are like a logical separation, whereas a tablespace can live on a different area of the disk.

I find it odd that you refer to Postgres as "Quirky" its been my experience that its probably the most standards-compliant of the bunch. Postgres' SQL Dialect is very similar to Oracles, MS is the outcast with their whacky "Embraced and Extended" dialect. furthermore, this site is "open" as it says on the homepage. the DB Schema and code are open source, why pollute that by tying them to a proprietary RDBMS for no other reason that a lack of desire to learn a new SQL dialect? how "open" is it really, if I can't just download, install and USE it for free? (I realize this is a confusion of free as in beer, and free as in speech, but I think this site aims to accomplish both of those goals). and yes, SQL Express, blah blah. I dont have a windows machine, and dont want one... Postgres runs under every operating system, as does every other tool involved in this development process. MSSQL would be an anchor tying it to one specific platform, put everyone involved at the mercy of the behemoth. I would go so far as to say if mssql is chosen, I'm out, and I'll be your primary competition, leveraging all your nice open source code ;)

Remote database connections are possible w/ postgres, though I have never used them, and I believe it may require some sort of extension, don't quote me on that though.

and also, Slony is the way to go for replication, it has a bit of a steep learning curve, but its quite powerful.

Hope that helps with the decision making.