Invelos Forums->DVD Profiler: Contribution Discussion
Credit Name Parsing
DVD Profiler Unlimited Registrant xradman
Registered: June 17, 2002
Registered: March 14, 2007
United States Posts: 1,328
I would also like to go with film-based credits rather than DVD-based credits (so we don't have to duplicate effort for all the different localities and releases).  I know there are a few exceptions where credits differ based on locality, but the benefits would outweigh the downsides.  If there is a simple way to make exceptions, even better.
My Home Theater
DVD Profiler Desktop and Mobile Registrant Star Contributor m.cellophane
tonight's the night...
Registered: March 13, 2007
Reputation: High Rating
United States Posts: 3,480
  • A unique numeric identifier per name.

  • Each online profile stores only the identifier and the Credited As.

  • Locally, for display purposes, one chooses between:
        - Credited As name
        - Most credited name in local db
        - A local custom display name (which may or may not be any of the credited names)

  • Users could decide for their local db to use a global default method for all names with name-specific overrides (as above).

  • Display names must be local-only. Any contribution of display-names leads to all of the arguments about which method is "right".

    You could use a combination of local identifiers and online identifiers so that one could add names locally without involvement with the online db.  The cross-referencing would be done when contributing or downloading profiles. You could also allow users to ignore the cross-referencing when they download profiles if they are not interested in making the linking decisions (which I suspect could be too cumbersome for many users).

    We need a name management interface so we can sort through name issues. A function to merge two names, while preserving Credited As, would be very helpful.
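
A rough sketch of how the local display choice and the merge function described above might fit together. Everything here is an assumption made for illustration: the class name `LocalNameDb`, the method names, and the three method labels are hypothetical and are not taken from DVD Profiler.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LocalNameDb:
    """Hypothetical local store; credits are (person_id, credited_as) pairs."""
    credits: list = field(default_factory=list)
    custom_names: dict = field(default_factory=dict)  # person_id -> local display name
    overrides: dict = field(default_factory=dict)     # person_id -> display method
    default_method: str = "credited_as"               # global default for all names

    def most_credited(self, person_id, fallback):
        # Count how often each Credited As string is used for this identifier.
        counts = Counter(name for pid, name in self.credits if pid == person_id)
        return counts.most_common(1)[0][0] if counts else fallback

    def display_name(self, person_id, credited_as):
        # A name-specific override wins over the global default method.
        method = self.overrides.get(person_id, self.default_method)
        if method == "custom" and person_id in self.custom_names:
            return self.custom_names[person_id]
        if method == "most_credited":
            return self.most_credited(person_id, credited_as)
        return credited_as

    def merge(self, keep_id, drop_id):
        # Re-point every credit to keep_id; the Credited As text is untouched.
        self.credits = [(keep_id if pid == drop_id else pid, name)
                        for pid, name in self.credits]
        self.custom_names.pop(drop_id, None)
        self.overrides.pop(drop_id, None)
```

Note that only the identifier and the Credited As string would ever leave the local database; the display settings stay local, as proposed above.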
    ...James

    "People fake a lot of human interactions, but I feel like I fake them all, and I fake them very well. That’s my burden, I guess." ~ Dexter Morgan
    DVD Profiler Unlimited Registrant Mark Harrison
    I like IMDB
    Registered: March 13, 2007
    Reputation: Great Rating
    United States Posts: 3,321
    Quoting T!M:
    Quote:
    I wouldn't presume to know how to handle it from a programmer's point of view (a unique ID for everyone certainly seems a logical step), but what I'd like as a user is, ideally, that when I'm auditing a DVD, and I enter a name - say "John Doe", is that the program asks me which "John Doe" I mean. I'd like the program to show me a list of the existing people named "John Doe", including people whose real or common name may not be John Doe but who have been credited as such. If at all possible, it would be great if something extra was thrown in to further identify the various John Doe's, like maybe including two or three of their film titles, their production years, or when it's about the crew section, maybe the job category in which they're mostly credited. Basically, when I type "John Doe", I'd like to be asked whether I meant the actor that acted in this-and-that, the sound mixer who worked on so-and-so, or the editor that worked on such-and-such. I'm afraid I'm asking a lot here... 


    I DO presume to know how to handle it from a programmer's point of view. But to be fair, I think the way to approach this is to figure out how to make it as simple as possible for the end user.  If you start from a programmer's point of view, you'll end up with something that works fantastically but that no one can figure out.

    With that said, I think T!M is dead on.  I noticed the other day my Mad Men profiles have Elizbeta Moss.  Not sure if that's a mistake to be corrected yet or an alternate spelling.  But it's a good example.  From an end user point of view, I want to enter something like Elizabeth Moss and have the program display all current Elizabeth Moss' in the database.  It should also display any alternate spellings / existing links if they exist.  I'd like to see: Elizabeth Moss (AKA Elisabeth Moss, Lisa Moss, Beth Moss, etc).  I would use code from your CLT tool to order the names from most popular variations to least popular.

    From there you could take it a step further and use a Sounds Like algorithm to find similar names. They're fairly simple to implement.
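
For illustration, here is a minimal sketch of one well-known "sounds like" algorithm, American Soundex. Treating Soundex as the algorithm meant above is an assumption, and the helper `soundex_key` (which codes each word of a name separately) is a hypothetical convenience. Letters outside A-Z, such as accented characters, are simply ignored in this sketch.

```python
# Map each consonant to its Soundex digit; vowels, H, W and Y have no digit.
_CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def soundex(word):
    """Return the four-character American Soundex code of a single word."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same digit
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated digit is kept after it
    return (result + "000")[:4]


def soundex_key(name):
    """Code each space-separated part of a name separately."""
    return tuple(soundex(part) for part in name.split())
```

With this sketch, "Elizbeta Moss", "Elizabeth Moss" and "Elisabeth Moss" all produce the same key, so a lookup on any one spelling could surface the others. The key could also be stored alongside each name to make searching faster.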

    It should give me more than a name.  I should be able to see what that person has worked on previously.  Ideally it would list a couple of their biggest titles, the ones they're most known for, with the option to see all titles.  But that would probably require finding which titles are the "most owned" by people.  I think that would have to be automated or we would never agree on what someone is most known for.  This could be what they're most known for overall, with the ability to drill down to see every title you own with that actor.  If we're retaining the existing birth years, those should be displayed as well for identification purposes.  I think that alone would handle a huge chunk of problems, at least for big names.

    I'd also display their photo if one is available.

    Initially I wouldn't place too many restrictions around linking or unlinking people.  Unless it proves to be a problem.  Someone should be able to say that John Doe from movie A isn't the same John Doe from movie B.  The voters ought to be able to catch most mistakes like that.

    That's all I can think of for the end user.  While it's easy to let my imagination run wild, I think it should be a simple interface.  If it becomes too convoluted, you'll end up with frustrated users.

    From the technical side, just some thoughts off the top of my head.  This is just a first stab.  I think I would have an Actor table with 2 IDs.  The first would be a Person ID, the second would be a Variant ID.  So with my previous example, Elisabeth Moss might be Person ID #1.  Elizabeth Moss would also be Person ID #1.  Each spelling would get a different Variant ID.  Other fields would include the names.  And if we retain birthyears, those would be there too.  Perhaps a pre-computed "sounds like" value should you decide to implement that for faster searching.

    Also, in an ideal world, if Elizabeth Moss is a typo, then when it's corrected, Elizabeth Moss would be removed from the table if that name is no longer connected to any existing profiles. That would help a little to keep the database clean and keep the number of potential variations smaller when looking someone up.

    So if I enter Elisabeth Moss, it would select any exact matches on that name.  You would also pick up all the variants on the spelling.  If I pick one, that person ID and variant ID get saved in the movie profile.  If I determine it is a new person with the exact same name, they get a new person ID and variant ID.  If you determine that Elizabeth Moss is not actually Elisabeth Moss, then you change the existing Person ID and Variant ID to the proper person or make them a new person.
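
As a first stab at what such a table could look like, here is a small sketch using an in-memory SQLite database. The table names, column names, and profile labels ("Profile A", "Profile B") are all made up for the example; nothing here reflects how the actual DVD Profiler database is designed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE name_variant (
    variant_id INTEGER PRIMARY KEY,   -- one row per spelling
    person_id  INTEGER NOT NULL,      -- shared by all spellings of one person
    name       TEXT NOT NULL,
    birth_year INTEGER                -- only if birth years are retained
);
CREATE TABLE credit (
    profile    TEXT NOT NULL,
    person_id  INTEGER NOT NULL,
    variant_id INTEGER NOT NULL REFERENCES name_variant(variant_id)
);
""")
conn.executemany("INSERT INTO name_variant VALUES (?, ?, ?, ?)", [
    (1, 1, "Elisabeth Moss", None),
    (2, 1, "Elizabeth Moss", None),   # alternate spelling of person 1
    (3, 2, "Elisabeth Moss", None),   # a different person with the same name
])
conn.executemany("INSERT INTO credit VALUES (?, ?, ?)", [
    ("Profile A", 1, 1),
    ("Profile B", 2, 3),
])


def candidates(name):
    """All variants of every person who has an exact match on `name`."""
    rows = conn.execute("""
        SELECT person_id, variant_id, name FROM name_variant
        WHERE person_id IN (SELECT person_id FROM name_variant WHERE name = ?)
        ORDER BY person_id, variant_id
    """, (name,))
    return rows.fetchall()


def prune_unused_variants():
    """Drop spellings that no profile references any more."""
    conn.execute("""
        DELETE FROM name_variant
        WHERE variant_id NOT IN (SELECT variant_id FROM credit)
    """)
```

Entering "Elisabeth Moss" returns both people plus the alternate spelling of the first one, which is the list the user would pick from. After `prune_unused_variants()`, the unreferenced "Elizabeth Moss" row is gone.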

    I'd be happy to ponder the technical side further and consult with some DBA friends, but I suspect you really don't need such assistance.
    Get the CSVExport and Database Query plug-ins here.
    Create fake parent profiles to organize your collection.
     Last edited: by Mark Harrison
    DVD Profiler Unlimited Registrant Star Contributor surfeur51
    Since July 3, 2003
    Registered: March 29, 2007
    Reputation: Great Rating
    France Posts: 4,479
    Quoting xradman:
    Quote:

    I don't know about Flipper, but Madonna and Cher are stage names IMO and belong in the Given Name field.

    For sorting reasons, those stage names should be better in surname place.
    Images from movies
    DVD Profiler Unlimited Registrant Star Contributor surfeur51
    Since July 3, 2003
    Registered: March 29, 2007
    Reputation: Great Rating
    France Posts: 4,479
    The naming of headshots should reflect linking (using the ID?). It could be something like "givenname_surname_ID".
    In fact, the ID field would replace the BY field in the database, with of course a different function.

    At present, if one user has Zhang Ziyi in his local database and another user has Ziyi Zhang, we get two headshots when we share headshot databases. It would be great to have only one. With IDs, we could compare Zhang_Ziyi_56789 to Ziyi_Zhang_56789 and see that they are the same person.
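
A small sketch of how that comparison could work, assuming the file name always ends in "_ID", optionally followed by an extension. The function names and the ".jpg" extension are hypothetical.

```python
def headshot_id(filename):
    """Return the trailing ID of a 'givenname_surname_ID' style file name."""
    tail = filename.rsplit("_", 1)[1]    # e.g. "56789.jpg" or "56789"
    return int(tail.split(".", 1)[0])    # drop an extension if there is one


def dedupe(filenames):
    """Keep only the first headshot seen for each ID."""
    by_id = {}
    for name in filenames:
        by_id.setdefault(headshot_id(name), name)
    return list(by_id.values())
```

Two shared files named Zhang_Ziyi_56789 and Ziyi_Zhang_56789 would then be recognized as the same person, whatever name order each user prefers.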
    Images from movies
     Last edited: by surfeur51
    DVD Profiler Unlimited Registrant Star Contributor CharlieM
    Registered Sept 5 2005
    Registered: May 20, 2007
    Reputation: High Rating
    United States Posts: 2,934
    I posted this awhile back, but I think it is what is needed (an expansion of what xradman said)

    Quote:
    I don't think we will get any meaningful linking unless we get away from thinking about it in terms of names.

    We need to think of it more in terms of how we associate.  We really don't care if this is the same name as these other 10 variants.  The only thing we care about is "Are these 2 people the same?".

    This is a 1-to-1, not a 1-to-many, relationship.

    I am not a very good diagrammer, but here goes (very simplified):


                                              KEY1
          J. Doe --  John Doe -- J. R. Doe --  John R Doe -- John Robert Doe

                                              KEY2
          J. Doe --  John Doe -- J. R. Doe --  John R Doe -- John Randall Doe

    Here we have 2 distinct people, represented by Key1 and Key2.  The names below them are WYSIWYT.  What name you enter is irrelevant.  No need for BY, or correct names, or common names; the only thing that matters is the relationship between the name you entered and the key to associate it with.  In this scenario, the common name would automatically be determined by the number of entries of each name associated with the key.

    When you enter a name into a profile you are auditing, the system would use a soundex or some other filtering subroutine to determine possibilities, and present them into a list.  So if you type in:

    John Doe as Role 1

    A list would pop up and give you options like

    1  J. Doe --  John Doe -- J. R. Doe --  John R Doe -- John Robert Doe
    2  J. Doe --  John Doe -- J. R. Doe --  John R Doe -- John Randall Doe

    Of course, the more info you have, the narrower the list becomes.  So typing in John R. Doe would of course limit the list.  At this point, you would be able to click on and select the appropriate one.  If you did not know the right one, you would be able to look at the credits associated with each one, and then make your selection.  The entry in the DB would be John Doe (key?).  Now you have an association.
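
To make the name-to-key association a bit more concrete, here is a very simplified sketch. The sample entries, the function names, and the exact-match filter are assumptions for the example; a soundex or other fuzzy filter could replace the exact match.

```python
from collections import Counter, defaultdict

# (entered name, key) pairs as they might accumulate from audited profiles.
entries = [
    ("John Doe", 1), ("John Doe", 1), ("J. Doe", 1), ("John Robert Doe", 1),
    ("John Doe", 2), ("John R Doe", 2), ("John R Doe", 2), ("John Randall Doe", 2),
]

# For each key, count how many entries use each name.
names_by_key = defaultdict(Counter)
for name, key in entries:
    names_by_key[key][name] += 1


def common_name(key):
    """The common name is simply the name with the most entries for the key."""
    return names_by_key[key].most_common(1)[0][0]


def possibilities(typed):
    """Keys that have at least one entry under exactly the typed name."""
    return sorted(k for k, names in names_by_key.items() if typed in names)
```

Typing "John Doe" offers both keys, while "John Randall Doe" narrows the list to key 2. Nobody has to decide what the common name is; it falls out of the counts.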

    It would require work, but when fully implemented, cast and crew would be linked properly.  It would not care if you accented or not.

    Now, I do not know how to implement this.  I can picture the structure, but not knowing how Ken's DB is designed, it may very well be a massive overhaul.


    Charlie:

    PS a little off topic, but --- nested cast dividers please...
    DVD Profiler Unlimited Registrant jmbox
    Registered: April 14, 2007
    United Kingdom Posts: 415
    If you don't want to be downloading the entire cast/crew list to your local database, then the linking of new names will have to be done by the contribution system.

    Say you have John Doe #1 and John Doe #2. You profile a new movie, which you know has John Doe #1 in it, so you select that one from your list when entering. Now, there is also a Jane Doe, and you have a Jane Doe #1 but not the correct one. You can enter Jane Doe as a new variant and the program gives it a local unique id. When contributing, John Doe #1 is automatically accepted, but for Jane Doe, you are given the option to choose which of the online database's Jane Does is correct, or the option to create a new one.

    Obviously the contribution system is going to have to give you list of titles that each Jane Doe is profiled in to make it easier to select the correct one.
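
A sketch of that contribution step, under the assumption that the local database keeps a map from local ids to online ids for names it already knows. All names here (`Credit`, `plan_contribution`, the id values) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credit:
    name: str
    local_id: int


def plan_contribution(credits, local_to_online, online_names):
    """Split credits into auto-accepted ones and ones needing a user decision.

    local_to_online: local id -> online id, for names already linked.
    online_names:    online id -> set of names credited to that person.
    """
    accepted, pending = [], []
    for credit in credits:
        online_id = local_to_online.get(credit.local_id)
        if online_id is not None:
            accepted.append((credit.name, online_id))
        else:
            # Offer every online person credited under this name, or a new one.
            options = sorted(oid for oid, names in online_names.items()
                             if credit.name in names)
            pending.append((credit, options + ["new"]))
    return accepted, pending
```

For each pending credit, the contribution screen would then list the titles behind every option so the right Jane Doe can be picked.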

    I don't think it will be sensible to completely remove the CLT, but you can at least change it to give the online variant list.
    DVD Profiler Unlimited Registrant jmbox
    Registered: April 14, 2007
    United Kingdom Posts: 415
    Oh, and please link cast db ids to the crew db ids (or join them completely), so that, say, clicking on George Lucas as director will also bring up his cameo/acting roles.
    DVD Profiler Desktop and Mobile Registrant Star Contributor hal9g
    Who is John Galt?
    Registered: March 13, 2007
    Reputation: High Rating
    United States Posts: 6,635
    This is what I posted HERE a while back along with a poll and discussion:

    Here's how I see it possibly working:

    Eliminate the "As Credited" function of the program as well as the CLT.  All credits would be entered WYSIWYT.

    Ken would create a "Linking" table which would contain all variations of a person's name.  This table could be initially populated by Ken by extracting the current "As Credited" data from the main database.  This means that all the work done over the past couple of years trying to link names using "As Credited" would be preserved and we would be starting this system with a significant number of names already linked.  This table would become part of the download files whenever you "Update" your local database.

    Ken would have to modify the code for the local application such that when you double-click on a name in the cast or crew section, the program would find the exact name that you clicked on in the table (referenced above) and list not just the profiles with an exact match, but profiles with matches to all name variants listed in the table for that name.

    The program would link all instances within your local database automatically without making you manually go to each profile and update the credit to include "As Credited" data.

    For new links, users would submit "Link Requests" through a separate contribution process.  These requests would require sufficient supporting evidence that the two linked names are in fact the same person.  These "contributions" would be voted on the same way as profile contributions today, and ultimately reviewed by the screeners and approved/rejected as appropriate.

    We all know that the current main database is riddled with IMDb data.  A simple linking system wouldn't care if the actual credits in the main database are "As Credited" or not.  It would only need to know that credit A equals credit B equals credit C, therefore, a "cleanup" of the main database is not required to make this system work immediately.

    In all fairness, I need to point out one downside of this system.  Take for instance "Al Smithee".  Today, with as credited, "Al Smithee" has been identified to be an alias for more than one person (which is in fact the case) and only the profiles that are associated with the same person will get linked.  In the simple linking system, this cannot be done, and all people associated with the name of "Al Smithee" will get linked together.
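
Here is a minimal sketch of such a linking table, written as a union-find structure so that "credit A equals credit B equals credit C" is handled transitively. The class and method names are hypothetical, and "Director A" and "Director B" are placeholder names, not real credits.

```python
class LinkTable:
    """Groups name strings that have been linked as the same person."""

    def __init__(self):
        self.parent = {}

    def find(self, name):
        """Return the representative name of the group `name` belongs to."""
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            # Path halving: point this name at its grandparent as we walk up.
            self.parent[name] = self.parent[self.parent[name]]
            name = self.parent[name]
        return name

    def link(self, a, b):
        """Record an approved link request: a and b are the same person."""
        self.parent[self.find(a)] = self.find(b)

    def variants(self, name):
        """All names linked, directly or indirectly, to `name`."""
        root = self.find(name)
        return sorted(n for n in list(self.parent) if self.find(n) == root)


table = LinkTable()
table.link("J. Doe", "John Doe")
table.link("John Doe", "John R. Doe")

# The downside: one shared alias pulls two different people together.
table.link("Al Smithee", "Director A")
table.link("Al Smithee", "Director B")
```

Double-clicking "J. Doe" would look up `table.variants("J. Doe")` and list profiles for all three spellings. The last two links show the flaw described above: `table.variants("Director A")` now includes "Director B" as well, because a name-only table cannot tell the two people behind the alias apart.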

    Personally, I believe this scenario is so infrequent, that the benefits of a simple linking system far outweigh this one small flaw.  And knowing this community, it is entirely possible that someone here can come up with a solution to that problem, too.

    To really put a cap on this, if we collapse the name fields to a single name field, we'd have a near perfect system that would eliminate 75% plus of the arguments in the forum. (I think that's a good thing????).

    Please feel free to shoot this full of holes!!!
    Hal
    DVD Profiler Unlimited Registrant Star Contributor Ace_of_Sevens
    Registered: December 10, 2007
    Reputation: High Rating
    Posts: 3,004
    Xradman's suggestion (and all the basically identical suggestions over the years) could be done without breaking the current functionality. The system would just have to auto-convert all names to the new format, then leave it to us to correct any that we hadn't been able to differentiate before.

    What Ken is doing here will fix the eternal questions of Asian names and what is and isn't a middle name, but not help at all with the probably bigger issues of people we can't find birth years for, people with the same name and birth year, the difficulty of determining common names and, my least favorite, having to change every entry in the DB if a common name changes. Some sort of unique ID system would solve all of these issues. The difficulty is making sure everyone gets assigned one ID and vice versa, but I have lots of suggestions for how to deal with that. It would largely need to be done server-side, though.

    While we're at it, can we go to one DB for cast and crew? When I bring up Clint Eastwood, I'd like to see his acting and directing work together.
    DVD Profiler Unlimited Registrant Star Contributor Merrik
    NON-STEPFORD PROFILER
    Registered: September 30, 2008
    Reputation: Highest Rating
    Canada Posts: 1,805
    Fantastic that something like this is finally happening!    And everyone's brought up some nice ideas already!

    Quoting surfeur51:
    Quote:
    Quoting xradman:
    Quote:

    I don't know about Flipper, but Madonna and Cher are stage names IMO and belong in the Given Name field.

    For sorting reasons, those stage names should be better in surname place.


    This doesn't really have anything to do with the thread per se (and I'm not trying to derail the thread), but we have to be careful when talking about things like this. Madonna is NOT a stage name. It's her given name. Ciccone (which she's been credited as only a single time in her career) is her surname. We have to make sure that when we call something a stage name, it's actually a stage name. I do however understand the need for proper sorting, so it could be as simple as this: if there's no surname entered, the entry simply gets sorted by the only name present. Putting a first name into the surname field kind of defeats the purpose (in those cases only) of going to only two fields to properly enter given names into a given name field and surnames into a surname field.

    So I'd like to see: A single name = being sorted by that name without having to enter it into the incorrect field.
    The night is calling. And it whispers to me softly come and play.
    DVD Profiler Unlimited Registrant Star Contributor Ace_of_Sevens
    Registered: December 10, 2007
    Reputation: High Rating
    Posts: 3,004
    It would be pretty easy to program DVD Profiler to sort on the given name if the family name field was blank.
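
Something along these lines, as a sketch. The function name `sort_key` and the (given name, family name) tuple layout are assumptions, not how DVD Profiler stores names.

```python
def sort_key(given, family):
    """Sort on the family name, or on the given name when the family name is blank."""
    primary = family if family else given
    return (primary.casefold(), given.casefold())


people = [("Madonna", ""), ("John", "Wayne"), ("Fernandel", ""), ("Clint", "Eastwood")]
ordered = sorted(people, key=lambda p: sort_key(*p))
```

With this key, single names sort among the other entries by their only name (Eastwood, Fernandel, Madonna, Wayne) instead of collecting at the top of the list because of the empty family name.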
    DVD Profiler Unlimited Registrant Star Contributor surfeur51
    Since July 3, 2003
    Registered: March 29, 2007
    Reputation: Great Rating
    France Posts: 4,479
    Quoting Merrik:
    Quote:

    So I'd like to see: A single name = being sorted by that name without having to enter it into the incorrect field.

    I agree. In fact, when I spoke of stage names, I meant unique names or group names. John Wayne should be treated as given name/surname. A unique name may be the given name of the person, or his (her) surname, or a specific stage name (Miou-Miou, Bourvil, Fernandel...). We also have to handle group names (The Beatles). My concern is to find Fernandel among the F's, not before the A's.
    Images from movies
    DVD Profiler Unlimited Registrant Star Contributor surfeur51
    Since July 3, 2003
    Registered: March 29, 2007
    Reputation: Great Rating
    France Posts: 4,479
    Just something to think about: "credited as" seems interesting for many users, and I can understand that. But shouldn't the "credited as" name be exactly as credited, without any case transformation? With a display name, everyone could keep "credited as" with its original case, or transform the case with whatever rules he finds good for him.

    So we could have in the credited as field:
    KRISTIN SCOTT THOMAS
    Kristin SCOTT THOMAS,
    without losing any information about what the credit maker actually did.

    With accented names, it would simplify entering the "credited as" field, since in 99.9% of cases names are correctly spelled in credits.
    Images from movies
     Last edited: by surfeur51
    DVD Profiler Unlimited Registrant Star Contributor VirusPil
    uncredited
    Registered: January 1, 2009
    Reputation: Highest Rating
    Germany Posts: 3,087
    Some ideas by different users have been posted here.

    I'll post my current idea this evening.
    DVD Profiler Unlimited Registrant Star Contributor GSyren
    Profiling since 2001
    Registered: March 14, 2007
    Reputation: Highest Rating
    Sweden Posts: 4,685
    I know a single name field would break current functionality, but personally that's something that I would happily sacrifice. If we can't decide on proper parsing (e.g. Kristin Scott / Thomas vs Kristin / Scott Thomas), then sorting by last name is crippled anyway. But that's just me...
    My freeware tools for DVD Profiler users.
    Gunnar