Duplicate Content


Moderators: Peke, Gurus

-me.

Duplicate Content

Post by -me. »

Hi there,

I'm trying to organize and clean up my music library and I have a small problem: ~5000 "duplicate content" files. I figured I'd better learn how to use scripts, or I will have to spend a couple of months just going through them.

So here is a question. I opened the mediamonkey.mdb database, but I cannot figure out how MediaMonkey knows that two files have the same content. Is there some kind of hash key in the Songs table, do you use some other table to store that info, or is it just magic :) ?

thanks a bunch! MM rocks :)
Bex
Posts: 6316
Joined: Fri May 21, 2004 5:44 am
Location: Sweden

Post by Bex »

That's a lot of duplicates!
Have you checked and deleted your dead links? To me it seems that you have a lot of tracks there.

Are you good with Access?
Guest

Post by Guest »

Bex wrote: That's a lot of duplicates!
Have you checked and deleted your dead links? To me it seems that you have a lot of tracks there.

Are you good with Access?
Well, I had two music libraries which, for historical reasons, I kept separate for a while; then I decided it was time to merge them back. Since I didn't want to lose any of my files or ID3 tags, I just imported them both. Now I just need to clean up a bit (actually a lot).

Access? Yep. Should be fine. I do software for a living.

Not sure what the script is going to do, but first I need to find out how to programmatically find "duplicate content" files.
Bex
Posts: 6316
Joined: Fri May 21, 2004 5:44 am
Location: Sweden

Post by Bex »

Ok, I understand!

You'll find your duplicate content with nested queries, something like this:

Code: Select all

SELECT Songs.SignPart1, Songs.SignPart2, Songs.SignPart3, Songs.SignPart4, Count(Songs.ID) AS CountOfID
FROM Songs
GROUP BY Songs.SignPart1, Songs.SignPart2, Songs.SignPart3, Songs.SignPart4
HAVING (((Count(Songs.ID))>1));
Save this as Query1.

Code: Select all

SELECT Songs.SongTitle, Songs.SongPath, Query1.CountOfID
FROM Songs INNER JOIN Query1 ON (Songs.SignPart4 = Query1.SignPart4) AND (Songs.SignPart3 = Query1.SignPart3) AND (Songs.SignPart2 = Query1.SignPart2) AND (Songs.SignPart1 = Query1.SignPart1);
You get the picture.
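
For readers without Access at hand, here is a minimal sketch of the same two-step idea (group by the four SignPart fields, then join back to list the files). It uses Python's built-in sqlite3 module as a stand-in for mediamonkey.mdb; the column types and the sample rows are assumptions made up for illustration, not the real MediaMonkey schema.

```python
import sqlite3

# In-memory stand-in for the Songs table; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Songs (
           ID INTEGER PRIMARY KEY, SongTitle TEXT, SongPath TEXT,
           SignPart1 INTEGER, SignPart2 INTEGER,
           SignPart3 INTEGER, SignPart4 INTEGER)"""
)
# Hypothetical sample data: the first two rows share a content signature.
conn.executemany(
    "INSERT INTO Songs (SongTitle, SongPath, SignPart1, SignPart2, SignPart3, SignPart4)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Song A", r"C:\music\a.mp3", 1, 2, 3, 4),
        ("Song A copy", r"D:\music\a.mp3", 1, 2, 3, 4),
        ("Song B", r"C:\music\b.mp3", 5, 6, 7, 8),
    ],
)

# The inner SELECT plays the role of the saved Query1; the outer join
# lists every file whose signature occurs more than once.
DUPLICATES_SQL = """
SELECT S.SongTitle, S.SongPath, Q.CountOfID
FROM Songs AS S
INNER JOIN (
    SELECT SignPart1, SignPart2, SignPart3, SignPart4, COUNT(ID) AS CountOfID
    FROM Songs
    GROUP BY SignPart1, SignPart2, SignPart3, SignPart4
    HAVING COUNT(ID) > 1
) AS Q
    ON S.SignPart1 = Q.SignPart1 AND S.SignPart2 = Q.SignPart2
   AND S.SignPart3 = Q.SignPart3 AND S.SignPart4 = Q.SignPart4
ORDER BY S.SongPath
"""
duplicates = conn.execute(DUPLICATES_SQL).fetchall()
for title, path, count in duplicates:
    print(title, path, count)
```

Only the two "Song A" rows come back, each with a count of 2; "Song B" has a unique signature and is left out.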

Good luck!
-me.

Post by -me. »

Thanks! You rock! :)
onglide
Posts: 4
Joined: Thu Jan 05, 2006 8:28 pm

Duplicate Content

Post by onglide »

Hi guys. I have a similar problem.

I have ~3000 true duplicate content songs. Most of them have one copy under a folder on my D drive and another under a folder on my C drive. (Of course, the folders on both drives have lots of other songs under them.)

For all such files I'd like to delete the copy of the file that's on the C drive.

I know SQL like the back of my hand, and I understand the SQL above, however I'm a rank newbie at media monkeying. I don't know how to turn SQL into script, where to save the script files or how to run them.

I opened up mediamonkey.mdb in Access to check it out, but it looks like that's not the real database. Am I right? Is its purpose just to be able to write and validate SQL against some dummy data?

I think I could do this with a shell script fairly easily (instead of through MediaMonkey) if I could generate a text file of full-path filenames for all songs MediaMonkey has flagged as "duplicate content". (I'd sort skipping the first character, then use a regular-expression search and replace to change each C:\xxx, newline, D:\xxx pair into "rm C:\xxx"; then I could delete all the lines that don't start with "rm ".)

Any ideas how to do either? [I.e. generate a text file of full-path filenames of all "duplicate content" (1st choice) OR run some SQL directly (2nd choice) OR learn how to do real MM scripting].
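
The sort-and-replace idea can be sketched in a few lines of Python, assuming the list of flagged paths is already available and that the two copies share the same path apart from the drive letter (which is what sorting on everything after the first character relies on). The function name and the sample paths are hypothetical.

```python
from collections import defaultdict


def c_drive_copies_to_delete(paths):
    """Return the C: paths whose remainder (after the drive letter) also exists on D:."""
    drives_by_tail = defaultdict(set)
    for path in paths:
        # Compare everything after the first character, ignoring case.
        drives_by_tail[path[1:].lower()].add(path[:1].upper())
    return sorted(
        p for p in paths
        if p[:1].upper() == "C" and "D" in drives_by_tail[p[1:].lower()]
    )


# Hypothetical list of files flagged as "duplicate content".
sample = [
    r"C:\xxx\song.mp3",
    r"D:\xxx\song.mp3",
    r"C:\yyy\only_on_c.mp3",
    r"C:\yyy\copy_of_only_on_c.mp3",
]
to_delete = c_drive_copies_to_delete(sample)
for path in to_delete:
    print(f"rm {path}")
```

Files whose duplicates both live on C: are never emitted, so at least one copy of each song survives.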

Thank you,

Eric
Peke
Posts: 18562
Joined: Tue Jun 10, 2003 7:21 pm
Location: Earth

Post by Peke »

I'm a little bit confused here. Are you saying that:
1. In Library->Files to edit->Duplicate titles you have ~3000 tracks?
2. You want, for example, to delete all duplicates from drive C:?

If the answer to both is yes, then you can drag and drop the Path column so it comes first, sort the listing on that column, and delete, for example, all tracks from drive C:, either from the library only or from both the library and the computer.

I'm sorry if my English is a little bit bad, but it is 03:00 am and I just dropped by to check if there is something interesting.
Best regards,
Peke
MediaMonkey Team lead QA/Tech Support guru
Admin of Free MediaMonkey addon Site HappyMonkeying
onglide
Posts: 4
Joined: Thu Jan 05, 2006 8:28 pm

Duplicate content

Post by onglide »

Thanks Peke. Your English is great.

Right, I see that you can sort and do multiple select and operate from the "Duplicate Titles" node.

I want to use MediaMonkey's cool file-contents matching trick which is available only under the "Duplicate Content" node. Under that node you can't select more than one song at the same time (let alone sort the whole node). If I could my job would be easier.

If I work in Duplicate Titles I know I'll delete more than I want to, and probably also fail to delete some songs with matching music content but different tags.

Thanks for the advice though,

Eric
Bex
Posts: 6316
Joined: Fri May 21, 2004 5:44 am
Location: Sweden

Post by Bex »

Hi onglide and welcome!

Peke's suggestion is pretty smart, but beware! If you delete all the tracks found on C: under the "Duplicate Content" node, you will delete every occurrence of a file when both the original and the copy exist only on C:!

You could however install "Magic Nodes" and write a very advanced sql filter that only finds the tracks you want to delete.
Magic nodes:
http://www.mediamonkey.com/forum/viewtopic.php?t=3358

Since that node would be very slow, also read this for some workarounds regarding speed:
http://www.mediamonkey.com/forum/viewto ... c&start=15

Just ask if you want any help!

/Bex
onglide
Posts: 4
Joined: Thu Jan 05, 2006 8:28 pm

Duplicate Content

Post by onglide »

Thanks Bex. I've been playing with Magic Nodes. Very cool.

I had everything working in Access but I get errors moving into Magic Nodes.

Here's what I have:

First, a query saved in the database as DupContent (Same as you suggested to the "-me" poster above):

Code: Select all

SELECT Songs.SignPart1, Songs.SignPart2, Songs.SignPart3, Songs.SignPart4, Count(Songs.ID) AS CountOfID
FROM Songs
GROUP BY Songs.SignPart1, Songs.SignPart2, Songs.SignPart3, Songs.SignPart4
HAVING (((Count(Songs.ID))>1));
Then a query of Songs matching a certain path, joined to DupContent on the SignPart fields, with a correlated "EXISTS" subquery that makes sure at least one dup of the song exists outside of the matched path:

Code: Select all

SELECT ID AS IDToDelete, SongTitle, SongPath
FROM Songs AS S INNER JOIN DupContent ON (S.SignPart1 = DupContent.SignPart1) AND (S.SignPart2 = DupContent.SignPart2) AND (S.SignPart3 = DupContent.SignPart3) AND (S.SignPart4 = DupContent.SignPart4)
WHERE IDMedia = 49
AND SongPath LIKE ":\Documents and Settings\Eric\My Documents\Robs Friends Music\*"
AND EXISTS (
select 'x' 
FROM Songs S2 
WHERE (S2.SignPart1 = S.SignPart1) AND (S2.SignPart2 = S.SignPart2) AND (S2.SignPart3 = S.SignPart3) AND (S2.SignPart4 = S.SignPart4)
AND  (IDMedia<> 49 OR SongPath NOT LIKE ":\Documents and Settings\Eric\My Documents\Robs Friends Music\*")
AND S.ID <> S2.ID);
The above works great inside access and appears to return the right rows.
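
A minimal sketch of the same "keep at least one copy outside the folder" logic, run against an in-memory SQLite table instead of Access. Everything here is an assumption for illustration: the column types, the sample rows, the shortened folder name, and the use of the % wildcard that SQLite expects. The join to DupContent is left out because the EXISTS test already implies a second row with the same signature.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Songs (
           ID INTEGER PRIMARY KEY, SongTitle TEXT, SongPath TEXT, IDMedia INTEGER,
           SignPart1 INTEGER, SignPart2 INTEGER,
           SignPart3 INTEGER, SignPart4 INTEGER)"""
)
# Hypothetical rows: ID 1 has a twin outside the folder, IDs 3 and 4
# are twins that both live inside it, ID 5 has no twin at all.
conn.executemany(
    "INSERT INTO Songs VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Kept elsewhere", r":\Friends\a.mp3", 49, 1, 1, 1, 1),
        (2, "Kept elsewhere", r":\Music\a.mp3", 50, 1, 1, 1, 1),
        (3, "Both in folder", r":\Friends\b.mp3", 49, 2, 2, 2, 2),
        (4, "Both in folder", r":\Friends\b copy.mp3", 49, 2, 2, 2, 2),
        (5, "Unique", r":\Friends\c.mp3", 49, 3, 3, 3, 3),
    ],
)

# Select songs inside the folder only when another row with the same
# signature exists outside it (different media or different path).
SAFE_TO_DELETE_SQL = """
SELECT S.ID
FROM Songs AS S
WHERE S.IDMedia = :media
  AND S.SongPath LIKE :folder
  AND EXISTS (
      SELECT 1
      FROM Songs AS S2
      WHERE S2.SignPart1 = S.SignPart1 AND S2.SignPart2 = S.SignPart2
        AND S2.SignPart3 = S.SignPart3 AND S2.SignPart4 = S.SignPart4
        AND (S2.IDMedia <> :media OR S2.SongPath NOT LIKE :folder)
        AND S2.ID <> S.ID)
ORDER BY S.ID
"""
params = {"media": 49, "folder": ":\\Friends\\%"}
ids_to_delete = [row[0] for row in conn.execute(SAFE_TO_DELETE_SQL, params)]
print(ids_to_delete)
```

Only song 1 is selected. Songs 3 and 4 duplicate each other, but neither has a copy outside the folder, so both are kept.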

Then I created the following Magic Node:

Code: Select all

DupsToDelete|SQL filter: Songs.ID IN (SELECT IDToDelete FROM DupsToDelete)\<Artist>
But I get this "Too Few Parameters" Error:

Code: Select all

There was a problem querying the database: 07002: [Microsoft][ODBC Microsoft Access Driver] Too Few Parameters. Expected 1.
Looking back through the forum postings got my suspicions pointing at the “LIKE” conditions (either disallowed characters, or perhaps ones that could be escaped, or use of the wrong wildcards [* and ? or % and _]), or changing quote characters (using # instead of ‘).
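
To make the wildcard question concrete: Access's own query designer uses * and ?, while ANSI-style SQL uses % and _. The sketch below translates one style into the other and runs the result against SQLite, which follows the % and _ convention. SQLite is only a stand-in here; which style the ODBC driver behind Magic Nodes expects is an assumption this thread does not settle. The helper name and the sample paths are hypothetical.

```python
import sqlite3


def access_to_ansi_like(pattern: str) -> str:
    """Translate Access-style wildcards (* and ?) to ANSI-style (% and _)."""
    return pattern.translate(str.maketrans({"*": "%", "?": "_"}))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Songs (ID INTEGER PRIMARY KEY, SongPath TEXT)")
conn.executemany(
    "INSERT INTO Songs (SongPath) VALUES (?)",
    [(r":\Music\A\one.mp3",), (r":\Music\B\two.mp3",), (r":\Other\three.mp3",)],
)

ansi_pattern = access_to_ansi_like(r":\Music\*")
matches = [
    row[0]
    for row in conn.execute(
        "SELECT SongPath FROM Songs WHERE SongPath LIKE ? ORDER BY ID",
        (ansi_pattern,),
    )
]
# A pattern without wildcards only matches a path equal to the whole string.
no_wildcard_count = conn.execute(
    "SELECT COUNT(*) FROM Songs WHERE SongPath LIKE 'zzzzzy'"
).fetchone()[0]
print(ansi_pattern, matches, no_wildcard_count)
```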

So, in the pursuit of trying to understand how SQL Filters and LIKEs are supposed to work in Magic Nodes I tried creating the following super simple Magic Node:

Code: Select all

Path2 |SQL filter: Songs.SongPath LIKE 'zzzzzy'
Of course I have no songs with paths containing zzzzzy so I’d expect the node to show no songs. Instead it shows all songs. Same for

Code: Select all

SongPath LIKE #zzzzzy#
So obviously I’m missing something.

Thank you,

onglide (Eric)
Bex
Posts: 6316
Joined: Fri May 21, 2004 5:44 am
Location: Sweden

Post by Bex »

It seems that a manually written query doesn't work as a "real" Access query?! Well, well, you never stop learning new stuff.
The workaround is to make a table of the query instead.
Open the query in design view, choose Query->Make table query, choose a name for your table, and simply run the query. Don't forget to change the Magic Node so it points to the table.

What the heck, here are the codes:

Code: Select all

SELECT S.ID INTO X
FROM Songs AS S INNER JOIN DupContent ON (S.SignPart4 = DupContent.SignPart4) AND (S.SignPart3 = DupContent.SignPart3) AND (S.SignPart2 = DupContent.SignPart2) AND (S.SignPart1 = DupContent.SignPart1)
WHERE (((S.SongPath) Like ":\Documents and Settings\Eric\My Documents\Robs Friends Music\*") AND ((S.IDMedia)=49) AND ((Exists (select 'x'
FROM Songs S2
WHERE (S2.SignPart1 = S.SignPart1) AND (S2.SignPart2 = S.SignPart2) AND (S2.SignPart3 = S.SignPart3) AND (S2.SignPart4 = S.SignPart4)
AND  (IDMedia<> 49 OR SongPath NOT LIKE ":\Documents and Settings\Eric\My Documents\Robs Friends Music\*")
AND S.ID <> S2.ID))<>False));
This query creates a table X. Once it is created, open X in design view and add a key to the ID field. (Put the cursor on the ID field and simply press the yellow key button.)

Then use this Magic Node:

Code: Select all

DupsToDelete|SQL filter: Songs.ID IN (SELECT ID FROM X)\<Artist>
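
The make-table workaround amounts to running the expensive query once, storing the resulting IDs in a small keyed table, and letting the filter do a simple lookup against it. A minimal sketch of that pattern, using SQLite in place of Access; the stand-in selection query and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Songs (ID INTEGER PRIMARY KEY, SongTitle TEXT)")
conn.executemany(
    "INSERT INTO Songs (ID, SongTitle) VALUES (?, ?)",
    [(1, "dup one"), (2, "keep me"), (3, "dup two")],
)

# Materialise the result of the (here trivial, stand-in) selection query
# into table X, with ID as the primary key so later lookups are indexed.
conn.execute("CREATE TABLE X (ID INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO X (ID) SELECT ID FROM Songs WHERE SongTitle LIKE 'dup%'")

# The filter then only needs a cheap IN lookup against the stored IDs.
filtered_ids = [
    row[0]
    for row in conn.execute(
        "SELECT ID FROM Songs WHERE ID IN (SELECT ID FROM X) ORDER BY ID"
    )
]
print(filtered_ids)
```

The table is a snapshot: if the library changes, it has to be rebuilt before the filter is trusted again.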
Enjoy
/Bex
Guest

Post by Guest »

:o wow, write code, learn code. If you have the talent/education, or time to compensate for lack of either, what I just read sounds do-able. However, if your relationship to your computer is the same as a person to their 1st car (used) , the duplicates could be spawning triplicates by the time you learn enough code to ferret them out. On the other hand, within a matter of minutes, you could download the free version of DoubleKiller:

http://www.bigbangenterprises.de/en/doublekiller/,

and have your results within 10 minutes (moving at a leisurely mosey). Results arrive within a minute or 2, optional selection criteria are by checksum, date, size, etc.

I have a really small hard drive (is this why men keep saying size does matter? :wink: ), and before I begin moving content to CD semi-oblivion, I search for doubles, especially as music is in 2 separate locations. The results have been pretty accurate, and quick.
Bex
Posts: 6316
Joined: Fri May 21, 2004 5:44 am
Location: Sweden

Post by Bex »

Well guest, what we are discussing here is not how you find your duplicates. That's very easy, just go to your "Files to edit->Duplicate Content" node and there they are!
We're discussing a smart way to find the ones you want to delete since you could have like thousands of them.

/Bex
onglide
Posts: 4
Joined: Thu Jan 05, 2006 8:28 pm

Duplicate Content

Post by onglide »

Thanks Bex!

That did the trick. My hard drive is now 2846 songs cleaner!

I wonder if Mr Magic Nodes would like to know that there's something funny with either string pattern matching sql or perhaps just long sql queries.

Anyway, I think we have a great technique of cleaning up large amounts of "true" dup content in an automated way.

Thanks again.

Onglide / Eric
Bex
Posts: 6316
Joined: Fri May 21, 2004 5:44 am
Location: Sweden

Post by Bex »

Glad it worked!
As for problems with long SQL queries in the SQL filter of Magic Nodes, I think MS's ODBC driver is to blame. Actually, I think the very same driver is the reason your query didn't work either.

Mr Magic Node, the great Pablo, is working on a highly anticipated update on the Magic Node script. We'll see what it contains when it comes.

Thanks
/Bex