unknown tokenizer

Post a reply

Smilies
:D :) :( :o :-? 8) :lol: :x :P :oops: :cry: :evil: :roll: :wink:

BBCode is ON
[img] is ON
[url] is ON
Smilies are ON

Topic review
   

Expand view Topic review: unknown tokenizer

Re: unknown tokenizer

by Barry4679 » Fri Jul 15, 2022 4:35 am

Doxtur wrote: Wed Jul 13, 2022 4:35 am I don't know how the MediaMonkey tokenizer works, but this wrapper allows you to make changes to text fields in the Songs table without removing triggers or other tricky manipulations.
Thanks Doxtur, that sounds super useful. :D
I will give it a try shortly.

Re: unknown tokenizer

by Doxtur » Wed Jul 13, 2022 4:35 am

I don't know how the MediaMonkey tokenizer works, but this wrapper allows you to make changes to text fields in the Songs table without removing triggers or other tricky manipulations.

You can do this on python 3:

1) Install the following modules:
pip install sqlitefts
pip install cffi

2) Use script:

Code: Select all

import sqlite3
import re
import sqlitefts as fts

class MMTokenizer(fts.Tokenizer):
    _p = re.compile(r'\w+', re.UNICODE)

    def tokenize(self, text):
        for m in self._p.finditer(text):
            s, e = m.span()
            t = text[s:e]
            l = len(t.encode('utf-8'))
            p = len(text[:s].encode('utf-8'))
            yield t, p, p + l


def cmp(a, b):
    return (a > b) - (a < b)
    
    
def i_unicode_collate(s1, s2):
    return cmp(s1.lower(), s2.lower())
    
    
def sqlite_getextension(a):
    return ''
    
conn = sqlite3.connect("MM.DB", check_same_thread=False)
conn.create_collation('IUNICODE', i_unicode_collate)
tk = fts.make_tokenizer_module(MMTokenizer())
fts.register_tokenizer(conn, 'mm', tk)
conn.create_function('getextension', -1, sqlite_getextension)
cursor = conn.cursor()

cursor.execute("INSERT INTO Songs (SongTitle, Artist) VALUES (?,?)", ('foo', 'bar'))
conn.commit()

64bit version in the future?

by Barry4679 » Fri Aug 06, 2021 9:09 am

TIV73 wrote: Fri Aug 06, 2021 4:39 am In the end, it probably comes down to the question "Is it worth it?".
I do think that we need a 64bit version of the SQLite3MMExt.dll, extension, which Ventis supplies to allow 3rd party Developers to interact with the MM database.

You made the point that it may be a large job for Ventis to move from 32bit up to 64bit.
OK, but I think that it is near impossible, and highly undesirable, for a Developer to downgrade their 64bit app down to 32 bits.

Is there going to be a 64bit version of SQLite3MMExt.dll, or even better SQLite3MM.dll?
I have asked this question several times, with no answer.

Re: unknown tokenizer

by Barry4679 » Sun May 09, 2021 3:33 am

drakinite wrote: Wed May 05, 2021 11:22 am Barry, I just realized that if you use the COM model when MM is not running, it (should) open MM. This can solve your issue of MM not being running.
Thanks Drakinite, I was not aware of that.

Ludek wrote: Wed May 05, 2021 9:03 am Deleting SQL insert triggers is not a good practice at all

It works, but it is not my first preference obviously.

My first preference would be that you ship a 64 bit version of your sql-related dlls. That way we could do as Petr intended.
Are you planning this, or is it off the table?

My second preference would be as I requested during the alpha of MM5 ... ie. deliver some songs.customx fields that are not covered by FTS. ... or even better, add a checkbox for each of the customx columns, into Options|Library|Fieldsm to condition whether or not the column is to be covered by FTS.

This way MM5 inter-operability will be enhanced by removing the tokenizer issue for selected customx columns. These columns have been provided for customer-specific purposes, and at the moment they are pain to update from external applications.

In my case I am using two customx columns. Every row will contain a value in both columns, and in one column it will be mostly distinct values. The data is numeric, so I will never want to search these values in FTS. Won't I reduce FTS overhead by their absence?
Ludek wrote: Wed May 05, 2021 9:03 am actually can cause unpredictable issues that could lead for a need to use Manage Database > Rebuild database, the MM tokenizer is needed to insert correct values to our SongsText table (sqlite's FTS engine), it is needed for indexing the text values without the special characters for the full-text-searching to work properly.

If I don't want a customx column covered by FTS search, then deleting the trigger is a no-harm activity isn't it. ... I think that I am better off by not having alll that dead weight is the SongsText file ... aren't I?

If will re look at the COM option after Drakinite's comment. ... Need to check overhead and responsiveness.

I need to to join the MM database to another sql database, and then run a query which updates the MM database.
I have checked, using your SQL Editor addon sandbox, that I can ATTACH and DROP my database, and that is OK. But are you sure that this is going to be a better solution than just dropping FTS support for a couple of unwanted customx columns?

Re: unknown tokenizer

by drakinite » Sat May 08, 2021 1:23 pm

drakinite wrote: Wed May 05, 2021 11:22 am
I believe it would be more beneficial to use the Chromium Devtools API to actually communicate with MM and make it create its database calls, since that will work on all platforms in the future; but if you create a SongsDB5.SDBApplication object, it (should) open MM5 automatically. The reason I'm typing "should" is because I just tested it and it's a regression; tracked as https://www.ventismedia.com/mantis/view.php?id=17823.

So after it's fixed, you can do the following to ensure MM5 is open:

Code: Select all

import win32com.client as win32
sdb = win32.Dispatch('SongsDB5.SDBApplication')
Never mind about the not working bit. I just had an issue on my own non-portable install; instantiating SongsDB5.SDBApplication should work with opening MM5. :slight_smile:

Re: unknown tokenizer

by drakinite » Wed May 05, 2021 11:22 am

Ludek wrote: Wed May 05, 2021 9:03 am You can use the "SQL Editor" addon https://www.mediamonkey.com/addons/brow ... ql-editor/ for executing SQL write operations
I believe Barry wanted his Python program to execute SQL operations automatically.
Barry4679 wrote: Wed Apr 28, 2021 10:40 pm I don't want pass the updates to MM because I have no guarantee that it is running at the time, and in many cases it won't be.
Barry, I just realized that if you use the COM model when MM is not running, it (should) open MM. This can solve your issue of MM not being running.
I believe it would be more beneficial to use the Chromium Devtools API to actually communicate with MM and make it create its database calls, since that will work on all platforms in the future; but if you create a SongsDB5.SDBApplication object, it (should) open MM5 automatically. The reason I'm typing "should" is because I just tested it and it's a regression; tracked as https://www.ventismedia.com/mantis/view.php?id=17823.

So after it's fixed, you can do the following to ensure MM5 is open:

Code: Select all

import win32com.client as win32
sdb = win32.Dispatch('SongsDB5.SDBApplication')

Re: unknown tokenizer

by Ludek » Wed May 05, 2021 9:03 am

Something which also fixed the tokenizer issue would be nice, but I have a workaround which is delete the SQL insert triggers for the Customx columns that I am updating.
Deleting SQL insert triggers is not a good practice at all -- actually can cause unpredictable issues that could lead for a need to use Manage Database > Rebuild database, the MM tokenizer is needed to insert correct values to our SongsText table (sqlite's FTS engine), it is needed for indexing the text values without the special characters for the full-text-searching to work properly.

You can use the "SQL Editor" addon https://www.mediamonkey.com/addons/brow ... ql-editor/ for executing SQL write operations

Re: unknown tokenizer

by ZvezdanD » Sun May 02, 2021 5:38 pm

PetrCBR wrote: Sun May 02, 2021 3:49 am Check what SQLite version they originally using as our SQLite3MM.dll is standard sqlite library only with our tokenizer (we're running on 3.23.1).
It would be nice if you leave the text that you are quoting because it is not obvious to whom you are replaying.

If you are replaying to me, as I said, I tried several your versions from MM4 and from MM5. For example, I tried SQLite Expert Personal 3.4.71, which has included SQlite 3.7.17, with your SQLite 3.7.6.3 from MM4. I think these two versions are close enough to not cause any trouble. Besides, I tried the same version of SQLite Expert with many original DLL files downloaded from SQLite site and it worked fine with non-MM databases.

I don't think that your SQLite3MM.dll is standard library since it is not recognized by two applications that I tried. If you state at least one SQLite application beside of MM that could work with your DLL, I will agree with you.

Re: unknown tokenizer

by Barry4679 » Sun May 02, 2021 4:26 am

ZvezdanD wrote: Sun May 02, 2021 3:40 am What works for you? Did you try Petr suggestion, which is not working as I explained before:
"I copied SQLite3MM.dll from MM folder to the SQLite Expert folder and renamed it to sqlite3.dll".

Notice that I am talking about SQLite3MM.dll here, not SQLite3MMExt.dll.
OK, misunderstanding, sorry.
ZvezdanD wrote: Sun May 02, 2021 2:38 am I just tried the same with the mentioned DB Browser for SQLite 3.12.1 32-bit and I got: Entry Point Not Found - The procedure entry point sqlite3_create_function_v2 could not be located in the dynamic link library sqlite3.dll.
I thought "the mentioned DB Browser for SQLite 3.12.1 32-bit" was referring to my discussion with Petr about an extension library for SQLITE browsers.

What "worked " for me was this:
  • take menu optionTools|LoadExtension. and load Petr's SQLite3MMExt.dll from 2014
  • run a query ==> no IUNICODE error
Something which also fixed the tokenizer issue would be nice, but I have a workaround which is delete the SQL insert triggers for the Customx columns that I am updating.

I didn't need to follow his other advice to get a convenient browser for MM database ... and I haven't tried using SQLite3MM.dll in my application, because I have a 64bit app atm ... I would like to try it if he can make a 64 bit version.
PetrCBR wrote: Sun May 02, 2021 3:49 am Check what SQLite version they originally using as our SQLite3MM.dll is standard sqlite library only with our tokenizer (we're running on 3.23.1).
The DB Browser is built for Sqlite version 3.33.0 (August 2020) ... 3.23.1 is quite old (April 2018).

Re: unknown tokenizer

by PetrCBR » Sun May 02, 2021 3:49 am

Check what SQLite version they originally using as our SQLite3MM.dll is standard sqlite library only with our tokenizer (we're running on 3.23.1).

Re: unknown tokenizer

by ZvezdanD » Sun May 02, 2021 3:40 am

Barry4679 wrote: Sun May 02, 2021 3:20 am It works fine for me
What works for you? Did you try Petr suggestion, which is not working as I explained before:
"I copied SQLite3MM.dll from MM folder to the SQLite Expert folder and renamed it to sqlite3.dll".

Notice that I am talking about SQLite3MM.dll here, not SQLite3MMExt.dll.

Re: unknown tokenizer

by Barry4679 » Sun May 02, 2021 3:20 am

ZvezdanD wrote: Sun May 02, 2021 2:38 am In another words, could you move your tokenizer and all other MM related stuff to the same SQLite3MMExt.dll file, which could be loaded as extension in MM and other SQLite applications that support loading of extensions?

You know, in my opinion this whole mess with unknown tokenizer and unrecognized collations and poorly supported Unicode is just plainly ridiculous from the SQLite authors. I doubt that any else database format has such elementary problems.
I agree with these two points, although I doubt that the problem is with Sqlite itself. I expect that the MM implementation is the problem.

I also think that Ventis would do everybody, and themselves, a big favour if they authored a KB entry to cover usage of the MM database by external add-on applications. ... The current situation is a massive time sink for anyone trying to get started. ... This is evidenced by the many unanswered queries over the years. And I believe that the problem is larger than that, because any Community member that was considering a project would see all those posts, and likely silently cancel their project before even starting it.
ZvezdanD wrote: Sun May 02, 2021 2:38 am I just tried the same with the mentioned DB Browser for SQLite 3.12.1 32-bit and I got: Entry Point Not Found - The procedure entry point sqlite3_create_function_v2 could not be located in the dynamic link library sqlite3.dll.
It works fine for me .... but I have only tried a couple of basic queries. What was your query? ... I could try it here.

Re: unknown tokenizer

by ZvezdanD » Sun May 02, 2021 2:38 am

PetrCBR wrote: Sat May 01, 2021 3:53 pm MM tokenizer is defined directly in SQLite3MM.dll we're using (you can try to use this library instead of 'original' sqlite3.dll library used by your SQLite app).
As I already wrote in Mantis 8 years ago:
"When I copied SQLite3MM.dll from MM folder to the SQLite Expert folder and renamed it to sqlite3.dll, when I started SQLite Expert I got the message:
Could not load SQLite library: sqlite3.dll
and it automatically switched to the internal engine."

I just tried the same with the mentioned DB Browser for SQLite 3.12.1 32-bit and I got: Entry Point Not Found - The procedure entry point sqlite3_create_function_v2 could not be located in the dynamic link library sqlite3.dll.

I tried that with SQLite3MM.dll from several MM versions including MM4 and MM5.

And here is one question that was left in another Mantis thread unanswered:
"Is it possible to implement your collations and additional functions and Unicode support, and whatever else you added to SQLite engine, as one or more external .dll extensions and keep the basic SQLite file intact? In that case, we could download it directly from the SQLite site, whatever version we need, even the latest one, and put it in the MediaMonkey folder."

In another words, could you move your tokenizer and all other MM related stuff to the same SQLite3MMExt.dll file, which could be loaded as extension in MM and other SQLite applications that support loading of extensions?

You know, in my opinion this whole mess with unknown tokenizer and unrecognized collations and poorly supported Unicode is just plainly ridiculous from the SQLite authors. I doubt that any else database format has such elementary problems.

Re: unknown tokenizer

by Barry4679 » Sat May 01, 2021 8:51 pm

PetrCBR wrote: Sat May 01, 2021 3:53 pm IUNICODE collation and some other methods are defined in SQLite3MMExt.dll, but MM tokenizer not .... so for this reason i wrote it didn't fix your issue (you wrote you do not know where you can get this extension library so for this reason i wrote it's in mantis entry you've mentioned).
MM tokenizer is defined directly in SQLite3MM.dll we're using (you can try to use this library instead of 'original' sqlite3.dll library used by your SQLite app).
Thanks, I can understand that explanation..

I had already tried your SQLite3MMExt.dll, extension. ... It would not load "Errror loading extension. The specified module could not be found." . this with this open source SQLITE Browser

Today I have found that is because yours is a 32 bit extension. Can you make a 64 bit version?

Your extension works with the 32 bit version of the browser mentioned above. ... It is good to finally have a good browser for for MM db, without having to screw around with IUNICODE workarounds ... thanks.

As you say it does not get around the tokinizer problem for updates to text columns covered by FTS. Do you have a 64 bit version of SQLite3MM.dll? My app is 64 bit, and it is much harder for me to downgrade that to 64 bit. ... I do have a current workaround by deleting your SQL trigger for the Customx columns that I want to work with. ... Although this workaround is of limited use, because other people many want to update other columns, where trigger removal is not an option.

Re: unknown tokenizer

by PetrCBR » Sat May 01, 2021 3:53 pm

IUNICODE collation and some other methods are defined in SQLite3MMExt.dll, but MM tokenizer not .... so for this reason i wrote it didn't fix your issue (you wrote you do not know where you can get this extension library so for this reason i wrote it's in mantis entry you've mentioned).
MM tokenizer is defined directly in SQLite3MM.dll we're using (you can try to use this library instead of 'original' sqlite3.dll library used by your SQLite app).

Top