Duplicate Content


by Bex » Tue Apr 25, 2006 12:33 pm

I'm not a scripter, so I really don't know, but I don't think it is.

One thing that should work, though, is a script that deletes the old data and appends new data to an indexed table, which you then use in a Magic Node. Psyxonova was into that but never finished it.

Perhaps some other good scripter can create such a script.

Anyone?

/Bex

by Teknojnky » Tue Apr 25, 2006 11:36 am

Yea that might work.

Another question (I know I'm full of them!)..

Is it possible for magicnodes to call a macro to update the query/table automatically?

by Bex » Tue Apr 25, 2006 11:18 am

Perhaps just click on the "Duplicate Content" node so you get all dups in the window, and then sort on length?

by Teknojnky » Tue Apr 25, 2006 10:08 am

Yay! Got it to work, Thanks Bex!

That works very fast!

Now, the only issue I see is when tracks are not named properly and show up under different artists... then you have single tracks under each artist node but don't know which tracks are duplicates of each other.

Not sure how to view that unless there can be a 'signature' node.
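
One way to picture the 'signature' node idea: group tracks by their signature instead of by artist, so each group lists the tracks that are duplicates of each other regardless of how they are tagged. The following is a minimal Python sketch under stated assumptions, not part of MediaMonkey or the Magic Nodes script. Each track is a plain dict with hypothetical keys (`id`, `artist`, `length`, `signature`), and the sample data is made up.

```python
from collections import defaultdict

def group_by_signature(tracks):
    """Group tracks by their 4-part signature; keep only groups of 2 or more."""
    groups = defaultdict(list)
    for track in tracks:
        sig = track["signature"]
        # A zero in any part is treated as "no usable signature", mirroring
        # the <>0 checks in the DupContent1 query further down the thread.
        if any(part == 0 for part in sig):
            continue
        groups[sig].append(track)
    # Sort each duplicate set by length so the copies sit next to each other.
    return {
        sig: sorted(members, key=lambda t: t["length"])
        for sig, members in groups.items()
        if len(members) > 1
    }

# Made-up sample library: tracks 1 and 2 are the same audio under two artist names.
tracks = [
    {"id": 1, "artist": "Artist A", "length": 201, "signature": (11, 22, 33, 44)},
    {"id": 2, "artist": "Artist A (misspelt)", "length": 200, "signature": (11, 22, 33, 44)},
    {"id": 3, "artist": "Artist B", "length": 150, "signature": (5, 6, 7, 8)},
    {"id": 4, "artist": "Artist C", "length": 90, "signature": (0, 0, 0, 0)},
]

result = group_by_signature(tracks)
for sig, members in result.items():
    print(sig, [(t["id"], t["artist"]) for t in members])
```

With this grouping, the two differently named copies end up in the same group even though they would appear under separate artist nodes.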

by Bex » Tue Apr 25, 2006 4:47 am

Oh sorry, I made a mistake. It should work now.

/Bex

by Teknojnky » Mon Apr 24, 2006 9:31 pm

Nope, I had deleted the table and 3 queries and re-did everything step by step as your post says.

Now I get "could not find output table 'dup content'".

by Bex » Mon Apr 24, 2006 7:49 pm

I've updated the instructions so it should work now.
- Replace the make table query with the updated code and run the query again. Say yes to delete the existing table.
- Add the key to the table. It should work now.
- Change the query type: go to Query -> Append Query, hit OK and save.
- Test the Magic Node.

It works now, doesn't it?

/Bex

by Teknojnky » Mon Apr 24, 2006 6:26 pm

Bex wrote:
3. Run the make table query (DupContent3). Close it and open up the newly created DupContent table in design view and add a key to the ID field.

I kept getting an error about duplicates when attempting to save the table with the key.

Bex wrote:
4. Open the DupContent3 query in design view and change the query type to Append query.

Sorry, I'm not quite SQL literate. How do I change the type to "Append"?

Bex wrote:
Now you will have three queries:
- DupContent1, which is nested into DupContent3
- DupContent2, which deletes all data in table DupContent
- DupContent3, which adds new data to table DupContent
And one new table:
- DupContent, which is indexed on field ID

I now have all that plus the macro, but I am unable to set the key on the DupContent table.

by Bex » Mon Apr 24, 2006 6:08 pm

Ok here we go!

This will give you a fast Magic Node which gives you all your "Duplicate Content" tracks, so you can easily access all of them in one go.

1. First create a select query by pasting this code into the SQL view of an Access query:

Code:

SELECT Songs.SignPart1, Songs.SignPart2, Songs.SignPart3, Songs.SignPart4
FROM Songs
GROUP BY Songs.SignPart1, Songs.SignPart2, Songs.SignPart3, Songs.SignPart4
HAVING (((Songs.SignPart1)<>0) AND ((Songs.SignPart2)<>0) AND ((Songs.SignPart3)<>0) AND ((Songs.SignPart4)<>0) AND ((Count(Songs.ID))>1));
Save it as DupContent1

2. Create a "make table" query from this code:

Code:

SELECT Songs.ID INTO DupContent
FROM DupContent1 INNER JOIN Songs ON (DupContent1.SignPart4 = Songs.SignPart4) AND (DupContent1.SignPart3 = Songs.SignPart3) AND (DupContent1.SignPart2 = Songs.SignPart2) AND (DupContent1.SignPart1 = Songs.SignPart1)
GROUP BY Songs.ID;
Save it as DupContent3

3. Run the make table query (DupContent3). Close it and open up the newly created DupContent table in design view and add a key to the ID field.

4. Open the DupContent3 query in design view and change the query type to Append query. (Query -> Append Query, hit OK and save.)

5. Create a new delete query from this code:

Code:

DELETE DupContent.*
FROM DupContent;
Save it as DupContent2

Now you will have three queries:
- DupContent1, which is nested into DupContent3
- DupContent2, which deletes all data in table DupContent
- DupContent3, which adds new data to table DupContent
And one new table:
- DupContent, which is indexed on field ID
(The reason for having one delete and one add query is that we want to keep the key in the table DupContent, which wouldn't be the case if we used a single make table query instead.)
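
The steps above can be tried out on a toy database. The sketch below uses Python's built-in sqlite3 module as a stand-in for Access, an assumption made purely so the example is self-contained. The Songs table holds only the columns the queries use, the sample rows are invented, and a view plays the role of the saved DupContent1 query.

```python
import sqlite3

# Toy stand-in for the MediaMonkey database: SQLite instead of Access,
# and only the columns that the queries above touch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Songs (
        ID INTEGER PRIMARY KEY,
        SignPart1 INTEGER, SignPart2 INTEGER,
        SignPart3 INTEGER, SignPart4 INTEGER
    );
    INSERT INTO Songs VALUES
        (1, 11, 22, 33, 44),   -- duplicate pair
        (2, 11, 22, 33, 44),
        (3,  5,  6,  7,  8),   -- unique track
        (4,  0,  0,  0,  0),   -- zero signature, ignored
        (5,  0,  0,  0,  0);

    -- Role of DupContent1: signatures shared by more than one track.
    CREATE VIEW DupContent1 AS
        SELECT SignPart1, SignPart2, SignPart3, SignPart4
        FROM Songs
        GROUP BY SignPart1, SignPart2, SignPart3, SignPart4
        HAVING SignPart1 <> 0 AND SignPart2 <> 0
           AND SignPart3 <> 0 AND SignPart4 <> 0
           AND COUNT(ID) > 1;

    -- The keyed table: created once, afterwards only emptied and refilled.
    CREATE TABLE DupContent (ID INTEGER PRIMARY KEY);
""")

# Role of DupContent3 as an append query: join back to Songs to get track IDs.
conn.execute("""
    INSERT INTO DupContent (ID)
    SELECT Songs.ID
    FROM DupContent1
    JOIN Songs
      ON DupContent1.SignPart1 = Songs.SignPart1
     AND DupContent1.SignPart2 = Songs.SignPart2
     AND DupContent1.SignPart3 = Songs.SignPart3
     AND DupContent1.SignPart4 = Songs.SignPart4
    GROUP BY Songs.ID
""")
conn.commit()

dup_ids = [row[0] for row in conn.execute("SELECT ID FROM DupContent ORDER BY ID")]
print(dup_ids)

# The key is still there because the table itself was never dropped.
# In PRAGMA table_info output the last column (pk) is non-zero for key columns.
pk_flags = [row[5] for row in conn.execute("PRAGMA table_info(DupContent)")]
print(pk_flags)
```

Only the two tracks that share a non-zero signature land in DupContent, and the table keeps its key after being filled.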

Create a new Magic Node:

Code:

Duplicate Content|SQL filter: songs.id in (select id from DupContent)\<Artist>
This Magic Node is fast! 8)

The downside is that you MUST run the DupContent2 and DupContent3 queries, in that order, every time you add or delete tracks in MM.
This could, however, be simplified by creating an Access Macro:
- SetWarnings = no
- OpenQuery DupContent2
- OpenQuery DupContent3
Save it as MacroDupContent
The macro runs in 3 seconds!
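
For readers who would rather see the macro's two steps as code, here is a rough Python sketch of the same "empty, then refill" refresh. It is an illustration only: it runs against an in-memory SQLite database rather than the Access file, `refresh_dup_content` and `dup_ids` are hypothetical helper names, and an EXISTS subquery is swapped in for the nested DupContent1 query to keep the example short.

```python
import sqlite3

def refresh_dup_content(conn):
    """Empty DupContent and refill it, like running DupContent2 then DupContent3."""
    with conn:  # commits both statements together, rolls back if either fails
        conn.execute("DELETE FROM DupContent")
        conn.execute("""
            INSERT INTO DupContent (ID)
            SELECT s.ID
            FROM Songs AS s
            WHERE s.SignPart1 <> 0 AND s.SignPart2 <> 0
              AND s.SignPart3 <> 0 AND s.SignPart4 <> 0
              AND EXISTS (
                  SELECT 1 FROM Songs AS o
                  WHERE o.ID <> s.ID
                    AND o.SignPart1 = s.SignPart1 AND o.SignPart2 = s.SignPart2
                    AND o.SignPart3 = s.SignPart3 AND o.SignPart4 = s.SignPart4
              )
        """)

def dup_ids(conn):
    """Return the IDs currently stored in DupContent."""
    return [row[0] for row in conn.execute("SELECT ID FROM DupContent ORDER BY ID")]

# Made-up toy library: tracks 1 and 2 share a signature, track 3 is unique.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Songs (ID INTEGER PRIMARY KEY,
                        SignPart1 INTEGER, SignPart2 INTEGER,
                        SignPart3 INTEGER, SignPart4 INTEGER);
    CREATE TABLE DupContent (ID INTEGER PRIMARY KEY);
    INSERT INTO Songs VALUES (1, 11, 22, 33, 44), (2, 11, 22, 33, 44), (3, 5, 6, 7, 8);
""")

refresh_dup_content(conn)
before = dup_ids(conn)

# A second copy of track 3 is added to the library, so the table is now stale.
conn.execute("INSERT INTO Songs VALUES (4, 5, 6, 7, 8)")
stale = dup_ids(conn)

refresh_dup_content(conn)
after = dup_ids(conn)
print(before, stale, after)
```

The middle result shows why the refresh has to be repeated: the table does not notice new tracks until both steps are run again.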

Enjoy!
/Bex

by Teknojnky » Mon Apr 24, 2006 5:30 pm

Ok, this looks like the OP acquired a library of songs from a friend (or an older backup) which contained a large number of previously shared files (duplicates).

This isn't exactly what I need at this point, but would have been good in the past (and who knows maybe the future too).

I tried adding the tables/queries as indicated, and of course I changed the paths to match my library, but I could not get it to work properly; MN gives a bunch of ODBC and other errors.

Basically, what I want is an improved 'Duplicate Content' node that functions like a normal node, where you can select all files, expand out artists/albums etc., and perform the normal rename/organize/tag operations.

The existing 'Duplicate Content' node does not work that way, and by itself it is not very helpful.

by Bex » Sun Jan 08, 2006 2:31 pm

Glad it worked!
As for problems with long SQL queries in the SQL filter of Magic Nodes, I think MS's ODBC driver is to blame. Actually, I think the very same driver is the reason your query didn't work either.

Mr Magic Node, the great Pablo, is working on a highly anticipated update of the Magic Node script. We'll see what it contains when it comes.

Thanks
/Bex

by onglide » Sun Jan 08, 2006 1:43 pm

Thanks Bex!

That did the trick. My hard drive is now 2846 songs cleaner!

I wonder if Mr Magic Nodes would like to know that there's something funny with either string pattern matching in SQL or perhaps just long SQL queries.

Anyway, I think we have a great technique for cleaning up large amounts of "true" dup content in an automated way.

Thanks again.

Onglide / Eric

by Bex » Sun Jan 08, 2006 11:57 am

Well guest, what we are discussing here is not how you find your duplicates. That's very easy, just go to your "Files to edit->Duplicate Content" node and there they are!
We're discussing a smart way to find the ones you want to delete since you could have like thousands of them.

/Bex

by Guest » Sun Jan 08, 2006 12:50 am

:o wow, write code, learn code. If you have the talent/education, or time to compensate for lack of either, what I just read sounds doable. However, if your relationship to your computer is the same as a person's to their 1st car (used), the duplicates could be spawning triplicates by the time you learn enough code to ferret them out. On the other hand, within a matter of minutes, you could download the free version of DoubleKiller:

http://www.bigbangenterprises.de/en/doublekiller/,

and have your results within 10 minutes (moving at a leisurely mosey). Results arrive within a minute or 2, optional selection criteria are by checksum, date, size, etc.
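
For comparison, duplicate detection by checksum can be sketched in a few lines of Python. This is not how DoubleKiller is implemented (its internals are not described here); it is a generic illustration in which `file_checksum` and `find_duplicates` are hypothetical helper names. Files are first bucketed by size, and only same-sized files are hashed. The demo at the end builds its own throwaway files.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path

def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(folders):
    """Map checksum -> sorted list of paths for files whose bytes are identical."""
    by_size = defaultdict(list)
    for folder in folders:
        for path in Path(folder).rglob("*"):
            if path.is_file():
                by_size[path.stat().st_size].append(path)

    by_hash = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a byte-identical twin
        for path in same_size:
            by_hash[file_checksum(path)].append(path)

    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}

# Demo on throwaway files in two separate folders (contents are made up).
with tempfile.TemporaryDirectory() as root:
    first, second = Path(root, "first"), Path(root, "second")
    first.mkdir()
    second.mkdir()
    (first / "x.mp3").write_bytes(b"same bytes")
    (second / "y.mp3").write_bytes(b"same bytes")
    (second / "z.mp3").write_bytes(b"different!")
    duplicate_sets = [
        [p.name for p in paths] for paths in find_duplicates([first, second]).values()
    ]
print(duplicate_sets)
```

A checksum only matches files that are identical byte for byte, so two copies of the same track with different tags would not be reported as duplicates by this approach.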

I have a really small hard drive (is this why men keep saying size does matter? :wink: ), and before I begin moving content to cd semi-oblivion, I search for doubles, especially as music is in 2 separate locations. The results have been pretty accurate, and quick.

by Bex » Sat Jan 07, 2006 6:03 pm

It seems that a manually written query doesn't work as a "real" Access query?! Well, well, you never stop learning new stuff.
The workaround is to make a table of the query instead.
Open up the query in design view and choose Query -> Make Table Query, choose a name for your table and simply run the query. Don't forget to change the Magic Node so it points to the table.

What the heck, here is the code:

Code:

SELECT S.ID INTO X
FROM Songs AS S INNER JOIN DupContent ON (S.SignPart4 = DupContent.SignPart4) AND (S.SignPart3 = DupContent.SignPart3) AND (S.SignPart2 = DupContent.SignPart2) AND (S.SignPart1 = DupContent.SignPart1)
WHERE (((S.SongPath) Like ":\Documents and Settings\Eric\My Documents\Robs Friends Music\*") AND ((S.IDMedia)=49) AND ((Exists (select 'x'
FROM Songs S2
WHERE (S2.SignPart1 = S.SignPart1) AND (S2.SignPart2 = S.SignPart2) AND (S2.SignPart3 = S.SignPart3) AND (S2.SignPart4 = S.SignPart4)
AND  (IDMedia<> 49 OR SongPath NOT LIKE ":\Documents and Settings\Eric\My Documents\Robs Friends Music\*")
AND S.ID <> S2.ID))<>False));
This query creates a table X. Once it is created, open X in design view and add a key to the ID field. (Put the cursor on the ID field and simply press the yellow key button.)

Then use this Magic Node:

Code:

DupsToDelete|SQL filter: Songs.ID IN (SELECT ID FROM X)\<Artist>
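
The selection logic of the make table query (only list a track from the folder being cleaned if another copy exists outside that folder) can be checked on a toy database. A minimal sketch, assuming SQLite in place of Access: `ids_to_delete` is a hypothetical helper, the paths and rows are made up, the media ID 49 is taken from the query above, and an explicit non-zero signature test stands in for the join against the duplicate-signature query.

```python
import sqlite3

def ids_to_delete(conn, folder, media_id):
    """IDs of tracks under `folder` on `media_id` that have a copy elsewhere."""
    sql = """
        SELECT s.ID
        FROM Songs AS s
        WHERE s.IDMedia = :media
          AND substr(s.SongPath, 1, length(:folder)) = :folder
          AND s.SignPart1 <> 0 AND s.SignPart2 <> 0
          AND s.SignPart3 <> 0 AND s.SignPart4 <> 0
          AND EXISTS (
              SELECT 1 FROM Songs AS s2
              WHERE s2.ID <> s.ID
                AND s2.SignPart1 = s.SignPart1 AND s2.SignPart2 = s.SignPart2
                AND s2.SignPart3 = s.SignPart3 AND s2.SignPart4 = s.SignPart4
                -- the surviving copy must live outside the folder being cleaned
                AND (s2.IDMedia <> :media
                     OR substr(s2.SongPath, 1, length(:folder)) <> :folder)
          )
        ORDER BY s.ID
    """
    return [row[0] for row in conn.execute(sql, {"media": media_id, "folder": folder})]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Songs (ID INTEGER PRIMARY KEY, SongPath TEXT, IDMedia INTEGER,
                        SignPart1 INTEGER, SignPart2 INTEGER,
                        SignPart3 INTEGER, SignPart4 INTEGER);
    INSERT INTO Songs VALUES
        (1, '/incoming/a.mp3',  49, 11, 22, 33, 44),  -- has a copy in /library/
        (2, '/library/a.mp3',   49, 11, 22, 33, 44),  -- the copy to keep
        (3, '/incoming/b.mp3',  49,  5,  6,  7,  8),  -- duplicated only inside /incoming/
        (4, '/incoming/b2.mp3', 49,  5,  6,  7,  8),
        (5, '/incoming/c.mp3',  49,  9,  9,  9,  9);  -- unique
""")

to_delete = ids_to_delete(conn, "/incoming/", 49)
print(to_delete)
```

Tracks 3 and 4 are duplicates of each other but are both left alone, because deleting them would remove every copy; that is what the "elsewhere" condition guards against.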
Enjoy
/Bex
