Invelos Forums->General: General Discussion
AI blues
GSyren (DVD Profiler Unlimited Registrant, Star Contributor)
Profiling since 2001
Registered: March 14, 2007
Reputation: Highest Rating
Sweden Posts: 4,926
Inspired by a post in the Cast/Crew Edit 2 thread, I started to look for AI APIs that would allow me to look up IMDb entries. ChatGPT said it could, but the API wouldn't be free. Well, for a freeware program that is a no-no, in my opinion. So I turned to Microsoft Copilot. It said that it couldn't provide IMDb data via the API, so I didn't bother to check if the API was free. Finally I asked Gemini if it could do it. It said that it could, and that there was a free API (with some limitations).

That sounded promising, so I did some initial testing, and it worked. I instructed Gemini that I wanted all cast and crew, and I wanted the credits exactly as in IMDb. On closer testing I found that Gemini failed on both counts. Running the same movie multiple times would produce a different number of credits, and slight variations in roles and crew jobs - not as credited. I tried changing the prompt to impress the criteria on Gemini, but that didn't help. That isn't good enough.

So, at least for now, this idea has been shot down. Disappointing, because I was really hoping to be able to mine IMDb data without having to resort to screen scraping. So I guess it's still either Cast/Crew Edit 2, or being content with TMDb data using TmdbInfo.
My freeware tools for DVD Profiler users.
Gunnar

AlunH (DVD Profiler Unlimited Registrant, Star Contributor)
Registered: February 19, 2012
Reputation: Superior Rating
United Kingdom Posts: 118
I don't think I've ever seen a single IMDb set of credits that doesn't contain at least one mistake.

It was once a valuable resource (and I'm talking probably twenty years ago).  It's definitely not any more.

GSyren (DVD Profiler Unlimited Registrant, Star Contributor)
That may be true, but that’s beside the point. The data should be checked against the credits anyway.  You use it as a starting point. However, if the data is not complete, you’re no better off than if you use data from TMDb.

Also, transforming crew jobs to Profiler format is hopeless if the source isn’t consistent.

I may take another look at this in the future. But it feels like too little, too late. I’m not sure that there are enough users who care, in order for this to be worth the effort.

mediadogg (DVD Profiler Desktop and Mobile Registrant)
Aim high. Ride the wind.
Registered: March 18, 2007
Reputation: Highest Rating
United States Posts: 6,541
Quoting GSyren:
Quote:
That may be true, but that’s beside the point. The data should be checked against the credits anyway.  You use it as a starting point. However, if the data is not complete, you’re no better off than if you use data from TMDb.

Also, transforming crew jobs to Profiler format is hopeless if the source isn’t consistent.

I may take another look at this in the future. But it feels like too little, too late. I’m not sure that there are enough users who care, in order for this to be worth the effort.

My experience with using Claude for writing code was very positive. It knew the eCommerce package I was using and exactly how to write and install supported plugins.

Perhaps another approach for you would be to have the AI write the code rather than present the result. Then you could request a specific coding approach and examine the code, either guiding the AI toward an improved version or augmenting it manually.
Thanks for your support.
Free Plugins available here.
Advanced plugins available here.
Hey, new product!!! BDPFrog.

ObiKen (DVD Profiler Unlimited Registrant, Star Contributor)
Registered: October 22, 2015
Reputation: Highest Rating
Australia Posts: 344
Quoting GSyren:
Quote:
That may be true, but that’s beside the point. The data should be checked against the credits anyway.  You use it as a starting point. However, if the data is not complete, you’re no better off than if you use data from TMDb.

Also, transforming crew jobs to Profiler format is hopeless if the source isn’t consistent.

I may take another look at this in the future. But it feels like too little, too late. I’m not sure that there are enough users who care, in order for this to be worth the effort.

What about going direct to the IMDb Non-Commercial Datasets here?

GSyren (DVD Profiler Unlimited Registrant, Star Contributor)
Quoting ObiKen:
Quote:
What about going direct to the IMDb Non-Commercial Datasets here?

Yeah, I looked at that. 7 files totaling over 9 GB unzipped. 6.6 GB if you skip AKAS and Ratings. Still, you would need to load those files into some database in order to be able to extract any meaningful information out of them. Trying to use those flat files directly would mean that any lookup would take forever. And if you wanted to keep it up to date, you'd have to go through the whole download/unzip/load again. And again ...

So not really a useful option, I'm afraid.

GSyren (DVD Profiler Unlimited Registrant, Star Contributor)
Just for some perspective, I ran a line count on all the IMDb files:

title.principals.tsv - 99 738 115 lines
title.akas.tsv - 57 439 274 lines
name.basics.tsv - 15 378 039 lines
title.basics.tsv - 12 539 092 lines
title.crew.tsv - 12 537 727 lines
title.episode.tsv - 9 685 416 lines
title.ratings.tsv - 1 676 413 lines

I guess it would be theoretically possible to write a program that loads (some of) these files into a database, and get full cast and crew listings from it. title.principals.tsv and title.basics.tsv would probably suffice for movies, add title.episode.tsv for TV shows. But would anyone be interested?
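
For illustration, a minimal sketch of what such a lookup could look like once the files are in SQLite. Everything here is an assumption rather than something from the thread: the table and column names mirror the TSV headers as I understand them, the `credits_for` helper is hypothetical, and the rows are made up. The sketch also joins name.basics, on the assumption that title.principals.tsv carries only person IDs (nconst) rather than names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE title_basics (tconst TEXT PRIMARY KEY, primaryTitle TEXT, startYear INTEGER);
    CREATE TABLE name_basics (nconst TEXT PRIMARY KEY, primaryName TEXT);
    CREATE TABLE title_principals (tconst TEXT, ordering INTEGER, nconst TEXT,
                                   category TEXT, job TEXT, characters TEXT);
    -- Without this index every lookup would scan the whole principals table.
    CREATE INDEX idx_principals_title ON title_principals (tconst);
""")

# Made-up rows for illustration only; not real IMDb data.
conn.execute("INSERT INTO title_basics VALUES ('tt0000000', 'Example Movie', 2001)")
conn.executemany("INSERT INTO name_basics VALUES (?, ?)",
                 [("nm0000001", "Jane Doe"), ("nm0000002", "John Roe")])
conn.executemany("INSERT INTO title_principals VALUES (?, ?, ?, ?, ?, ?)",
                 [("tt0000000", 2, "nm0000002", "director", "\\N", "\\N"),
                  ("tt0000000", 1, "nm0000001", "actress", "\\N", '["Lead"]')])

def credits_for(conn, title, year):
    """Return (name, category, characters) rows for one title, in billing order."""
    return conn.execute(
        """SELECT n.primaryName, p.category, p.characters
           FROM title_basics AS b
           JOIN title_principals AS p ON p.tconst = b.tconst
           JOIN name_basics AS n ON n.nconst = p.nconst
           WHERE b.primaryTitle = ? AND b.startYear = ?
           ORDER BY p.ordering""",
        (title, year),
    ).fetchall()

for row in credits_for(conn, "Example Movie", 2001):
    print(row)
```

The index on tconst is what would keep a single-title lookup fast on a table with close to 100 million rows; the literal `\N` strings stand in for the missing-value marker used in the TSV files.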

ObiKen (DVD Profiler Unlimited Registrant, Star Contributor)
My mantra is that the solution has got to be simple: when I ask for a banana, I don't want a gorilla holding that banana.

GSyren (DVD Profiler Unlimited Registrant, Star Contributor)
Quoting ObiKen:
Quote:
My mantra is that the solution has got to be simple: when I ask for a banana, I don't want a gorilla holding that banana.

I take it that you didn’t realize you were suggesting a gorilla? 
I think I may build this for myself, to see if it climbs the Empire State Building with the banana.
"T’was fruit killed the beast" 

ObiKen (DVD Profiler Unlimited Registrant, Star Contributor)
Yep, I only saw the leaves of the forest when I made the suggestion 

I thought you may try a proof of concept for personal use, in which case, don't overdose on the potassium 

GSyren (DVD Profiler Unlimited Registrant, Star Contributor)
I tested loading the largest file into an SQLite table. I found that I could load about 200 lines per second. That would mean that it would take over 5 days (!) to complete.
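
The estimate can be checked against the line count for title.principals.tsv given earlier in the thread. A quick sketch, assuming the observed rate would have held for the whole file:

```python
# Rough load-time estimate for title.principals.tsv at the observed insert rate.
LINES = 99_738_115        # line count reported earlier in this thread
ROWS_PER_SECOND = 200     # approximate rate observed in the first test

seconds = LINES / ROWS_PER_SECOND
days = seconds / 86_400   # 86,400 seconds per day

print(f"{seconds:,.0f} s = {days:.2f} days")   # 498,691 s = 5.77 days
```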

I'm a fairly patient guy, but I would go bananas waiting that long, so I aborted the test. 

It's nice of IMDb to dump their database tables into downloadable files (for non-commercial use), but I can't help but wonder who actually finds a use for them.

So, this whole project was pretty much a waste of time. Oh, well, I did get a little refresher on how to use SQLite, so at least that's something. I wonder if I can think of something else where I can use it ...

ObiKen (DVD Profiler Unlimited Registrant, Star Contributor)
Sounds like SQLite is performing disk writes for each row and slowing things down. Did you check whether the default settings in SQLite can be changed to prioritize speed over data safety when bulk loading records? Commit/rollback processing and logging for each record may need to be turned off.
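
One way to act on this, sketched in Python with the standard sqlite3 module. The file name, the `open_for_bulk_load` helper and the choice of pragmas are assumptions for illustration, not settings anyone in this thread reported using. Both pragmas trade crash safety for speed, which is only reasonable here because the database can always be rebuilt from the TSV files.

```python
import os
import sqlite3
import tempfile

def open_for_bulk_load(path):
    """Open a SQLite database with durability relaxed for a one-off import."""
    conn = sqlite3.connect(path)
    # Do not wait for the operating system to flush each write to disk.
    conn.execute("PRAGMA synchronous = OFF")
    # Keep the rollback journal in memory instead of in a file on disk.
    conn.execute("PRAGMA journal_mode = MEMORY")
    return conn

# Hypothetical database file in a scratch directory.
db_path = os.path.join(tempfile.mkdtemp(), "imdb.db")
conn = open_for_bulk_load(db_path)
print(conn.execute("PRAGMA synchronous").fetchone()[0])    # 0 means OFF
print(conn.execute("PRAGMA journal_mode").fetchone()[0])   # memory
```

These settings reduce the cost of each commit, but they do not remove the per-row commit itself; grouping many rows into one transaction addresses that separately.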

Maybe the AI engine can help you speed up the import process (yes, it's another suggestion, hopefully not Godzilla-sized). Just be warned, Large Language Model AI engines of the generative type (like the ones you tried) are prone to hallucinate in their answers!

GSyren (DVD Profiler Unlimited Registrant, Star Contributor)
I didn't think that it could be sped up enough to make it palatable, but how wrong I was! 
I changed the program to use transactions of 10,000 records each. Now it took 36 minutes instead of (the estimated) 5 days! Together with the other two files I needed, building the database took about an hour (not counting the coding time).
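
Committing once per 10,000 rows instead of once per row might look roughly like this in Python. The post does not say which language or table layout was used, so the `principals` table, its columns and the `load_tsv` helper are hypothetical; the column order is assumed to follow the header of title.principals.tsv (tconst, ordering, nconst, category, job, characters), and the sample rows are made up.

```python
import csv
import io
import itertools
import sqlite3

BATCH_SIZE = 10_000   # rows per transaction, as in the post

def load_tsv(conn, lines, batch_size=BATCH_SIZE):
    """Insert tab-separated rows into `principals`, one commit per batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS principals ("
        "tconst TEXT, ordering INTEGER, nconst TEXT, "
        "category TEXT, job TEXT, characters TEXT)"
    )
    # QUOTE_NONE keeps the double quotes inside the characters column literal.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)                      # skip the header row
    total = 0
    while True:
        batch = list(itertools.islice(reader, batch_size))
        if not batch:
            break
        with conn:                    # commits once when the block exits
            conn.executemany(
                "INSERT INTO principals VALUES (?, ?, ?, ?, ?, ?)", batch
            )
        total += len(batch)
    return total

# Made-up sample rows standing in for the real file.
sample = io.StringIO(
    "tconst\tordering\tnconst\tcategory\tjob\tcharacters\n"
    "tt0000000\t1\tnm0000001\tactress\t\\N\t[\"Lead\"]\n"
    "tt0000000\t2\tnm0000002\tdirector\t\\N\t\\N\n"
)
conn = sqlite3.connect(":memory:")
print(load_tsv(conn, sample, batch_size=1))   # 2 rows, one transaction each
```

For the real file, `lines` would be something like `open(path, newline="", encoding="utf-8")`. Reading in slices keeps memory use bounded to one batch regardless of file size.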

Now I just have to build a program that queries that database. But that's a job for another day.

Thanks for pushing me, ObiKen! 

PS The database became almost 7 GB.

GSyren (DVD Profiler Unlimited Registrant, Star Contributor)
This was an interesting challenge. Unfortunately it turned out that the data does not contain all cast and crew. Not even close.
So what's the point of providing these files? Most disappointing.

Oh, well. I got to play a little with SQLite. Maybe this experience will be useful some time.