Registered: March 14, 2007 | Posts: 4,926
Inspired by a post in the Cast/Crew Edit 2 thread, I started looking for AI APIs that would allow me to look up IMDb entries. ChatGPT said it could, but the API wouldn't be free. For a freeware program, that is a no-no in my opinion. So I turned to Microsoft Copilot. It said that it couldn't provide IMDb data via the API, so I didn't bother to check whether the API was free. Finally I asked Gemini. It said that it could, and that there was a free API (with some limitations). That sounded promising, so I did some initial testing, and it worked. I instructed Gemini that I wanted all cast and crew, with the credits exactly as in IMDb. On closer testing, I found that Gemini failed on both counts. Running the same movie multiple times produced a different number of credits each time, and slight variations in roles and crew jobs, not as credited. I tried changing the prompt to impress the criteria on Gemini, but that didn't help. That isn't good enough. So, at least for now, this idea has been shot down. It's disappointing, because I was really hoping to be able to mine IMDb data without having to resort to screen scraping. So I guess it's still either Cast/Crew Edit 2, or being content with TMDb data using TmdbInfo.
My freeware tools for DVD Profiler users. Gunnar
Registered: February 19, 2012 | Posts: 118
I don't think I've ever seen a single set of IMDb credits that doesn't contain at least one mistake.
It was once a valuable resource (I'm talking about twenty years ago, probably). It definitely isn't any more.
Registered: March 14, 2007 | Posts: 4,926
That may be true, but that’s beside the point. The data should be checked against the credits anyway. You use it as a starting point. However, if the data is not complete, you’re no better off than if you use data from TMDb.
Also, transforming crew jobs to Profiler format is hopeless if the source isn’t consistent.
I may take another look at this in the future. But it feels like too little, too late. I’m not sure there are enough users who care for this to be worth the effort.
Registered: March 18, 2007 | Posts: 6,541
Quoting GSyren:
"That may be true, but that’s beside the point. The data should be checked against the credits anyway. You use it as a starting point. However, if the data is not complete, you’re no better off than if you use data from TMDb.
Also, transforming crew jobs to Profiler format is hopeless if the source isn’t consistent.
I may take another look at this in the future. But it feels like too little, too late. I’m not sure there are enough users who care for this to be worth the effort."
My experience using Claude for writing code was very positive. It knew the eCommerce package I was using and exactly how to write and install supported plugins. Perhaps another approach for you would be to have the AI write the code rather than present the result. Then you could request a specific coding approach and examine the code for possible improvements, which the AI could make based on your guidance or you could add manually.
Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog.
Last edited by mediadogg
Registered: October 22, 2015 | Posts: 344
Quoting GSyren:
"That may be true, but that’s beside the point. The data should be checked against the credits anyway. You use it as a starting point. However, if the data is not complete, you’re no better off than if you use data from TMDb.
Also, transforming crew jobs to Profiler format is hopeless if the source isn’t consistent.
I may take another look at this in the future. But it feels like too little, too late. I’m not sure there are enough users who care for this to be worth the effort."
What about going directly to the IMDb Non-Commercial Datasets here?
Registered: March 14, 2007 | Posts: 4,926
Quoting ObiKen: "What about going directly to the IMDb Non-Commercial Datasets here?"
Yeah, I looked at that. It's 7 files totaling over 9 GB unzipped, or 6.6 GB if you skip AKAS and Ratings. Even so, you would need to load those files into some database to extract any meaningful information from them. Using the flat files directly would mean that every lookup takes forever. And to keep the data up to date, you'd have to go through the whole download/unzip/load cycle again. And again... So not really a useful option, I'm afraid.
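To illustrate why direct flat-file lookups are so slow, here is a minimal Python sketch (not Gunnar's code, which isn't shown in the thread). A lookup in the raw TSV has to read the file from top to bottom, while a database table with an index on the title ID can go straight to the matching rows. It assumes the published column layout of title.principals.tsv (tconst, ordering, nconst, category, job, characters), a header row, and tab-separated fields without quoting; the table name `principals` and all function names are hypothetical.

```python
import sqlite3

# Hypothetical helper names; not taken from the thread.


def scan_principals(path, tconst):
    """Flat-file lookup: every call reads the TSV from the top, so the cost
    grows with the size of the whole file, not with the size of the answer."""
    matches = []
    with open(path, encoding="utf-8") as f:
        header = f.readline().rstrip("\n").split("\t")
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if fields[0] == tconst:
                matches.append(dict(zip(header, fields)))
    return matches


def create_principals_table(conn):
    """Database alternative: an index on tconst lets SQLite locate a
    title's rows without reading the entire table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS principals ("
        "tconst TEXT, ordering INTEGER, nconst TEXT, "
        "category TEXT, job TEXT, characters TEXT)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_principals_tconst ON principals (tconst)"
    )


def indexed_lookup(conn, tconst):
    """Same question as scan_principals, answered through the index."""
    return conn.execute(
        "SELECT nconst, category FROM principals WHERE tconst = ? ORDER BY ordering",
        (tconst,),
    ).fetchall()
```

The trade-off Gunnar describes is visible here: the index only pays off once the data has been loaded, and that load has to be repeated whenever fresh files are downloaded.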
Registered: March 14, 2007 | Posts: 4,926
Just for some perspective, I ran a line count on all the IMDb files:
title.principals.tsv: 99 738 115 lines
title.akas.tsv: 57 439 274 lines
name.basics.tsv: 15 378 039 lines
title.basics.tsv: 12 539 092 lines
title.crew.tsv: 12 537 727 lines
title.episode.tsv: 9 685 416 lines
title.ratings.tsv: 1 676 413 lines
I guess it would theoretically be possible to write a program that loads (some of) these files into a database and gets full cast and crew listings from it. title.principals.tsv and title.basics.tsv would probably suffice for movies; add title.episode.tsv for TV shows. But would anyone be interested?
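A rough idea of what the queries could look like once the files are in SQLite is sketched below. It is an illustration under assumptions, not Gunnar's design: the table names (`title_basics`, `name_basics`, `principals`, `title_episode`) and the functions are hypothetical, and the schema keeps only the columns used here. One detail worth noting from IMDb's published layout: title.principals.tsv stores person IDs (nconst) rather than names, so turning a credit list into names would also need name.basics.tsv; the sketch assumes that file is loaded too.

```python
import sqlite3

# Hypothetical, trimmed schema: only the columns the queries below use.
SCHEMA = """
CREATE TABLE IF NOT EXISTS title_basics (
    tconst TEXT PRIMARY KEY, titleType TEXT, primaryTitle TEXT, startYear TEXT);
CREATE TABLE IF NOT EXISTS name_basics (
    nconst TEXT PRIMARY KEY, primaryName TEXT);
CREATE TABLE IF NOT EXISTS principals (
    tconst TEXT, ordering INTEGER, nconst TEXT,
    category TEXT, job TEXT, characters TEXT);
CREATE INDEX IF NOT EXISTS idx_principals_tconst ON principals (tconst);
CREATE TABLE IF NOT EXISTS title_episode (
    tconst TEXT PRIMARY KEY, parentTconst TEXT,
    seasonNumber TEXT, episodeNumber TEXT);
"""


def find_titles(conn, title, year=None):
    """Candidate title IDs by primary title and optional start year."""
    sql = ("SELECT tconst, titleType, startYear FROM title_basics "
           "WHERE primaryTitle = ?")
    params = [title]
    if year is not None:
        sql += " AND startYear = ?"
        params.append(str(year))
    return conn.execute(sql, params).fetchall()


def credits_for_title(conn, tconst):
    """Credits for one title in IMDb's ordering, with person names resolved.

    LEFT JOIN keeps a credit even if its person is missing from name_basics.
    IMDb writes missing values as the two characters \\N; those become None.
    """
    rows = conn.execute(
        "SELECT p.ordering, n.primaryName, p.category, p.job, p.characters "
        "FROM principals AS p "
        "LEFT JOIN name_basics AS n ON n.nconst = p.nconst "
        "WHERE p.tconst = ? ORDER BY p.ordering",
        (tconst,),
    ).fetchall()
    return [tuple(None if v == "\\N" else v for v in row) for row in rows]


def episodes_of(conn, parent_tconst):
    """Episode IDs of a TV series, sorted by season and episode number."""
    return conn.execute(
        "SELECT tconst, seasonNumber, episodeNumber FROM title_episode "
        "WHERE parentTconst = ? "
        "ORDER BY CAST(seasonNumber AS INTEGER), CAST(episodeNumber AS INTEGER)",
        (parent_tconst,),
    ).fetchall()
```

For a TV show, each ID returned by `episodes_of` could then be passed to `credits_for_title`.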
Registered: October 22, 2015 | Posts: 344
My mantra is that the solution has got to be simple: when I ask for a banana, I don't want a gorilla holding that banana.
Registered: March 14, 2007 | Posts: 4,926
Quoting ObiKen: "My mantra is that the solution has got to be simple: when I ask for a banana, I don't want a gorilla holding that banana."
I take it that you didn’t realize you were suggesting a gorilla? I think I may build this for myself, to see if it climbs the Empire State Building with the banana. "’Twas fruit killed the beast."
Registered: October 22, 2015 | Posts: 344
Yep, I only saw the leaves of the forest when I made the suggestion. I thought you might try a proof of concept for personal use, in which case, don't overdose on the potassium.
Last edited by ObiKen
Registered: March 14, 2007 | Posts: 4,926
I tested loading the largest file into an SQLite table. I found that I could load about 200 lines per second, which means it would take over 5 days (!) to complete. I'm a fairly patient guy, but I would go bananas waiting that long, so I aborted the test. It's nice of IMDb to dump their database tables into downloadable files (for non-commercial use), but I can't help but wonder who actually finds a use for them. So this whole project was pretty much a waste of time. Oh, well, I did get a little refresher on how to use SQLite, so at least that's something. I wonder if I can think of something else where I can use it...
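The 5-day estimate follows directly from the line count posted earlier: about 99.7 million lines at 200 lines per second. A tiny Python check, where the function and constant names are hypothetical and the rate is assumed to stay constant for the whole load:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def load_time_days(rows: int, rows_per_second: float) -> float:
    """Estimated load time in days, assuming a constant insert rate."""
    return rows / rows_per_second / SECONDS_PER_DAY


# Line count of title.principals.tsv reported earlier in the thread
# (it includes the header row, which makes no practical difference).
PRINCIPALS_LINES = 99_738_115

print(f"{load_time_days(PRINCIPALS_LINES, 200):.1f} days")  # prints "5.8 days"
```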
Registered: October 22, 2015 | Posts: 344
Sounds like SQLite is performing disk writes for each row, which slows things down. Did you check whether SQLite's default settings can be changed to prioritize speed over data safety when bulk loading records? The per-record commit/rollback handling and journaling may need to be turned off.
Maybe the AI engine can help you speed up the import process (yes, it's another suggestion, hopefully not Godzilla-sized). Just be warned: generative large language model AI engines (like the ones you tried) are prone to hallucinating their answers!
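In SQLite, the settings ObiKen refers to are exposed as PRAGMA statements. The sketch below, in Python's built-in sqlite3 module, shows one plausible way to apply them for a one-off import. It is an assumption about how the suggestion could be applied, not what Gunnar's program did, and the function name is hypothetical. The trade-off is real: with these settings a crash mid-import can leave the database file corrupt, which is only acceptable because the data can be reloaded from the downloaded files.

```python
import sqlite3

# Hypothetical helper; the PRAGMAs themselves are standard SQLite settings.


def open_for_bulk_load(path):
    """Open an SQLite database tuned for a one-off bulk import.

    These settings trade crash safety for speed. If the process dies
    mid-import, the database may be corrupt and should be rebuilt from
    the source files.
    """
    conn = sqlite3.connect(path)
    # Do not wait for the OS to confirm that each write has reached the disk.
    conn.execute("PRAGMA synchronous = OFF")
    # Keep the rollback journal in memory instead of writing it to disk.
    conn.execute("PRAGMA journal_mode = MEMORY")
    # Use memory rather than disk for temporary tables and indices.
    conn.execute("PRAGMA temp_store = MEMORY")
    return conn
```

These settings reduce the cost of each commit, but they don't change how often a commit happens, so grouping many inserts into one transaction is usually the bigger lever.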
Registered: March 14, 2007 | Posts: 4,926
I didn't think it could be sped up enough to make it palatable, but how wrong I was! I changed the program to use transactions of 10,000 records each, and now it took 36 minutes instead of the estimated 5 days! Together with the other two files I needed, building the database took about an hour (not counting the coding time). Now I just have to build a program that queries that database, but that's a job for another day. Thanks for pushing me, ObiKen!
PS: The database ended up at almost 7 GB.
Last edited by GSyren
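Committing once per batch instead of once per row is the change that made the difference: 36 minutes for roughly 99.7 million lines works out to about 46,000 rows per second, compared with 200 before. Gunnar's program isn't shown, so the following is only a sketch of the same idea in Python's built-in sqlite3 module. The function name is hypothetical, and it assumes IMDb-style TSV files (header row, tab-separated, no quoting) and a target table whose columns match the file's columns in order.

```python
import sqlite3
from itertools import islice

BATCH_SIZE = 10_000  # rows per transaction, the figure used in the thread


def load_tsv(conn, path, table, batch_size=BATCH_SIZE):
    """Load a TSV file into an existing table, one transaction per batch.

    `table` is interpolated into the SQL text, so it must be a trusted,
    hard-coded name, never user input. Returns the number of rows loaded.
    """
    with open(path, encoding="utf-8") as f:
        header = f.readline().rstrip("\n").split("\t")
        placeholders = ", ".join("?" * len(header))
        sql = f"INSERT INTO {table} VALUES ({placeholders})"
        rows = (line.rstrip("\n").split("\t") for line in f)
        total = 0
        while True:
            batch = list(islice(rows, batch_size))
            if not batch:
                break
            # The connection context manager commits the batch on success
            # and rolls it back if an exception is raised.
            with conn:
                conn.executemany(sql, batch)
            total += len(batch)
    return total
```

The batch size is a tuning knob rather than a magic number: larger batches mean fewer commits but more work lost if one batch fails.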
Registered: March 14, 2007 | Posts: 4,926
This was an interesting challenge. Unfortunately, it turned out that the data does not contain all cast and crew. Not even close. So what's the point of providing these files? Most disappointing. Oh, well. I got to play a little with SQLite. Maybe this experience will be useful sometime.
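One way to put a number on "not even close" is to summarize how many credits the loaded table holds per title, and to compare one title against a credit list typed in from the screen. IMDb's dataset documentation describes title.principals as containing the principal cast/crew for titles, which may explain the gap. The sketch assumes a hypothetical `principals` table with a tconst column; both function names are hypothetical too.

```python
import sqlite3

# Hypothetical helpers for checking how complete the loaded credits are.


def credits_per_title_stats(conn):
    """Summarize how many credits the principals table holds per title.

    Returns (number of titles, fewest credits, most credits, average).
    """
    return conn.execute(
        "SELECT COUNT(*), MIN(n), MAX(n), AVG(n) FROM ("
        "  SELECT tconst, COUNT(*) AS n FROM principals GROUP BY tconst"
        ")"
    ).fetchone()


def missing_from_imdb(imdb_names, reference_names):
    """Names in a reference credit list (for example, typed in from the
    on-screen credits) that the IMDb rows do not contain, in reference order."""
    have = set(imdb_names)
    return [name for name in reference_names if name not in have]
```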