Installing Scotas OLS at 12c/Docker

Scotas OLS is Solr Lucene + Oracle

Docker is the new way for DevOps operation specially during development process.

There is a possibility to run Scotas OLS with Oracle 12c running on Docker following this post On docker, Ubuntu and Oracle RDBMS.

Assuming that there is a Docker – Oracle 12c template ready for use:

# docker images
REPOSITORY TAG
oracle-12102 latest
solr/ols latest
alvinhenrick/hadoop-nn-dn latest
alvinhenrick/hadoop-dn latest
alvinhenrick/hadoop-base latest
alvinhenrick/serf latest
java openjdk-6-jdk
ubuntu latest
ubuntu trusty
oraclelinux 6

once We have the image ready first step is to prepare a shared directory at the host machine to be public in a test container, here the steps:

# mkdir -p /var/lib/docker/db/test
# chown 54321:54321 /var/lib/docker/db/test
# cp /var/lib/docker/dockerfiles/build-oracle-12102/db_install-simple.dbt /var/lib/docker/db/test/db_install.dbt

db_install.dbt must have enabled at least these two features JSERVER and XDB_PROTOCOLS, first feature is obviously required because Scotas OLS is entirely written in Java (Apache Solr) and second property is because Scotas OLS storage could be exported using XMLDB repository (WebDAV/ftp).

First Docker image run operation will create an empty database named test, for example:

# docker run –privileged=true –ipc=host –volume=/var/lib/docker/db/test:/u01/app/oracle/data –name test –hostname test –detach=true –publish=1521:1521 –publish=9099:9099 oracle-12102

It will take a long time according to your hardware, in the mean time a log operation could be checked using:

# docker logs -f test

once the Oracle RDBMS is created a good option is to stop and start again to see if everything is fine:

# docker stop test
# docker rm test

Note that docker rm operation will not remove your created DB, it only removes changes at the docker image created during the operation, you could check that by inspecting the shared directory:

# cd /var/lib/docker/db/test
# ls
DATABASE_IS_SETUP db_install.dbt test

now to install Scotas OLS We have to copy install zip file, license and patched ODCI.jar as is described in the post Upgrading system libraries…, for example:

# cp /tmp/lucene-ols-bin-2.0.1.zip /var/lib/docker/db/test
# cp /tmp/ODCI.jar /var/lib/docker/db/test
# cp /tmp/valid.ols

these files copied at the host machine will available at the Docker image during next run:

# docker run –privileged=true –ipc=host –volume=/var/lib/docker/db/test:/u01/app/oracle/data –name test –hostname test –detach=true –publish=1521:1521 –publish=9099:9099 oracle-12102
# docker exec -ti test bash
[root@test oracle]# id
uid=0(root) gid=0(root) groups=0(root)

by executing last step We attach to a running images interactive using a bash command, according to the  oracle-12102 template image the shell script is connected as root.

First step is to log as oracle owner and check SQLNet connectivity and SYS password, oracle-12102 template left SYS password defined as oracle:

[root@test oracle]# su -l oracle
[oracle@test ~]$ tnsping test
TNS Ping Utility for Linux: Version 12.1.0.2.0 – Production on 01-SEP-2015 13:49:19
Copyright (c) 1997, 2014, Oracle. All rights reserved.
Used parameter files:

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 1521))) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = test)))
OK (0 msec)
[oracle@test ~]$ sqlplus “sys/oracle@test as sysdba
SQL*Plus: Release 12.1.0.2.0 Production on Tue Sep 1 13:49:30 2015
Copyright (c) 1982, 2014, Oracle. All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 – 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 – 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

if everything is fine We can start installing Scotas OLS, first patching the ODCI.jar, updating Scotas OLS license and running the installer:

[oracle@test ~]$ cd /tmp
[oracle@test tmp]$ unzip /u01/app/oracle/data/lucene-ols-bin-2.0.1.zip
Archive: /u01/app/oracle/data/lucene-ols-bin-2.0.1.zip
… lots of messages here …
[oracle@test tmp]$ cp /u01/app/oracle/data/ODCI.jar ols/
[oracle@test tmp]$ cp /u01/app/oracle/data/valid.ols ols/bin/com/scotas/license/valid.ols
[oracle@test tmp]$ cd ols
[oracle@test ols]$ loadjava -f -r -v -s -g public -u sys/oracle@test ODCI.jar
…. lots of messages here ….
[oracle@test ols]$ ant -Ddb.str=test -Ddb.tbl=SYSAUX -Ddba.usr=sys -Ddba.pwd=oracle
…. lots of output here ….

last step start Scotas OLS installer using sys user, oracle password and the target database test using SYSAUX tablespace (created by default during installation).

At this point Scotas OLS should be installed you can start using it by shutdown/startup your database or if its not possible enabling manually the OLS Parallel Shared Server process:

[oracle@test ols]$ ant -Ddb.str=test -Ddba.usr=sys -Ddba.pwd=oracle enable-jobs
… some SQLPlus messages here ….
BUILD SUCCESSFUL
Total time: 1 second

Now We are ready to test Scotas OLS running in a Docker container, there are some test prepared to run as Scott but Scott user is now included by default into the RDBMS installation, We can create them doing these SQL commands (logged as SYS):

SQL> create user scott identified by tiger
default tablespace SYSAUX
temporary tablespace TEMP
quota unlimited on SYSAUX;
SQL> grant connect,resource,luceneuser to scott;

SYSAUX tablespace is not a right place to hold scott’s data but is the tablespace available into a default RDBMS installation.

With this user We can run some test using Ant tool, for example Scotas OLS tutorial by editing the file db/sqlunit.properties

[oracle@test ols]$ cat db/sqlunit.properties
# SQLUnit test
sqlunit.driver = oracle.jdbc.driver.OracleDriver
sqlunit.url = jdbc:oracle:oci:@test
sqlunit.user = scott
sqlunit.password = tiger
[oracle@test ols]$ ant test-tutorial

One or more SQLUnit Tests will fail, see sqlunit-test-tutorial.txt for details, this is caused some time by different timezone configuration but most of the test will pass.

Another test is test-parallel-ols, this test simulates a heavy multi-tier application which insert, update, delete, search and count hits on a simple table, the parameters for this should be changed directly into build.xml file, for example:

<target name=”test-parallel-ols” description=”Run Test on Domain Index parallel functionality OLS”>
<junit printsummary=”yes” fork=”no” haltonfailure=”yes” showoutput=”yes”>
<sysproperty key=”db.usr” value=”scott“/>
<sysproperty key=”db.pwd” value=”tiger“/>
<sysproperty key=”db.str” value=”test“/>
<sysproperty key=”idx.upd” value=”0″/>
<sysproperty key=”idx.sch” value=”0,1″/>
<sysproperty key=”db.online-mode” value=”true”/>
<sysproperty key=”db.commitOnSync” value=”false”/>
<sysproperty key=”db.bufferedQueue” value=”true”/>
<sysproperty key=”db.batch-size” value=”200″/>

Sample test output:

[oracle@test ols]$ ant test-parallel-ols
Buildfile: build.xml

test-parallel-ols:
[junit] Running com.scotas.solr.index.TestDBIndexParallel
[junit] Table created: T1
[junit] Index altered: B$SIT1
[junit] OnLine: true CommitOnSync: false
[junit] Avg time on Solr deletes: 2
[junit] Avg time on Solr deletes: 1
[junit] Avg time on Solr updates: 2
[junit] Avg time on Solr updates: 2
[junit] Avg time on Solr updates: 2
[junit] Avg time on Solr deletes: 16
[junit] Avg time on Solr inserts: 0
[junit] Avg time on Solr inserts: 0
[junit] Avg time on Solr inserts: 0
[junit] Avg time on Solr count hits: 26
[junit] Avg time on Solr search: 33
[junit] Avg time on Solr search: 33
[junit] Avg time on Solr count hits: 27
[junit] Avg time on Solr search: 32
[junit] Avg time on Solr count hits: 27
[junit] Index droped: B$SIT1
[junit] Table droped: T1
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 310.189 sec

BUILD SUCCESSFUL
Total time: 5 minutes 10 seconds

And that’s all Scotas OLS works perfect using Oracle 12c on Docker.

Apache Solr tutorial in a relational way

Many questions to be answered when starting with Scotas products is the lack of knowledge on how to implement typical Solr operations in a SQL way or vice-versa.

The idea of this tutorial is to do the same thing explained into the Apache Solr tutorial but using relational concepts using Scotas OLS.

Scotas OLS and Oracle BigData architecture

Following the Oracle Information Architecture: An Architect’s Guide to Big Data.

I modified the figure 3 to show how Scotas OLS product fit into this reference architecture.

as you can see Scotas OLS Search integration is at the decision layer proving:

  • Rich free text searching a la Google
  • Faceted Search and Filtering
  • Scalable index maintenance, no index fragmentation, NRT data changes
  • Advanced, Configurable Text Analysis
  • A Real Data Schema, with Numeric Types, Dynamic Fields, Unique Keys
  • Powerful Extensions to the Lucene Query Language
  • Geospatial Search

with these additions decision makers have in hand a native SQL integrated Solr solution which can be used on any Oracle Analytic tool.

5 minutes tutorial with Scotas OLS

If you already have Scotas OLS installed this is a 5 minutes guide for using Oracle-Solr integration.

Scotas OLS binary distribution have a sample configuration files located at db/tutorial directory which are used to simulate all operations described into Solr Tutorial but using Scotas OLS.

Here a complete list of steps:

1) grant LUCENEUSER role to SCOTT (this step could be avoided if was already done):

-bash-4.2$ sqlplus “/ as sysdba”
SQL*Plus: Release 11.2.0.3.0 Production on Thu May 10 08:07:51 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> grant LUCENEUSER to SCOTT;
Grant succeeded.

2) upload Solr related files to XMLDB repository (assuming that we will create an index named SCOTT.TUTORIAL_SIDX):

# cd db/tutorial/LUCENE.TUTORIAL_SIDX/conf
# ftp localhost 2100
Connected to localhost.localdomain.
220- localhost.localdomain
Unauthorised use of this FTP server is prohibited and may be subject to civil and criminal prosecution.
220 localhost.localdomain FTP Server (Oracle XML DB/Oracle Database) ready.
Name (localhost:mochoa): scott
331 pass required for SCOTT
Password: *****
230 SCOTT logged in
Remote system type is Unix.
ftp> cd public/solr
250 CWD Command successful
ftp> mkdir SCOTT.TUTORIAL_SIDX
250 CWD Command successful
ftp> cd SCOTT.TUTORIAL_SIDX
250 CWD Command successful
ftp> mkdir conf
250 CWD Command successful
ftp> cd conf
250 CWD Command successful
ftp> mput *
mput elevate.xml [anpqy?]? a
………
ftp> dir
227 Entering Passive Mode (127,0,0,1,249,27)
150 ASCII Data Connection
-rw-r–r– 1 SCOTT oracle 1274 MAY 10 11:15 elevate.xml
-rw-r–r– 1 SCOTT oracle 78514 MAY 10 11:15 mapping-FoldToASCII.txt
-rw-r–r– 1 SCOTT oracle 2868 MAY 10 11:15 mapping-ISOLatin1Accent.txt
-rw-r–r– 1 SCOTT oracle 367 MAY 10 11:15 mb_attributes.txt
-rw-r–r– 1 SCOTT oracle 121 MAY 10 11:15 mb_letterBuckets.txt
-rw-r–r– 1 SCOTT oracle 873 MAY 10 11:15 protwords.txt
-rw-r–r– 1 SCOTT oracle 30717 MAY 10 11:15 schema.xml
-rw-r–r– 1 SCOTT oracle 921 APR 13 13:46 scripts.conf
-rw-r–r– 1 SCOTT oracle 13267 MAY 10 11:15 solrconfig.xml
-rw-r–r– 1 SCOTT oracle 13 MAY 10 11:15 spellings.txt
-rw-r–r– 1 SCOTT oracle 1171 MAY 10 11:15 stopwords.txt
-rw-r–r– 1 SCOTT oracle 1132 MAY 10 11:15 synonyms.txt
226 ASCII Transfer Complete
ftp> pwd
257 “/public/solr/SCOTT.TUTORIAL_SIDX/conf” is current directory.
ftp> quit

3) Create and populate a sample table. This table have a relational version of the Solr tutorial documents (these SQL statements are at db/tutorial/schemaTutorial.sql file).

[mochoa@localhost tutorial]$ sqlplus scott/tiger@orcl
SQL*Plus: Release 11.2.0.3.0 Production on Thu May 10 08:22:44 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> set define off
SQL> CREATE TABLE OLS_TUTORIAL ( ID VARCHAR2(30) PRIMARY KEY,
NAME VARCHAR2(400),
MANU VARCHAR2(4000),
CAT VARCHAR2(400),
FEATURES CLOB,
INCLUDES VARCHAR2(4000),
WEIGHT NUMBER,
PRICE NUMBER,
POPULARITY NUMBER,
INSTOCK CHAR(5), — true or false
MANUFACTUREDATE_DT TIMESTAMP,
PAYLOADS VARCHAR2(4000),
STORE VARCHAR2(200)) ;
Table created.
SQL> insert into ols_tutorial values ( ‘SP2514N’, ‘Samsung SpinPoint P120 SP2514N – hard drive – 250 GB – ATA-133′, ‘Samsung Electronics Co. Ltd.’, ‘electronics hard-drive’, ‘7200RPM, 8MB cache, IDE Ultra ATA-133 NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor’, NULL, NULL, 92, 6, ‘true’, TO_TIMESTAMP(‘2006-02-13T15:26:37Z’,’YYYY-MM-DD”T”HH24:MI:SS”Z”‘), NULL, ‘35.0752,-97.032′) ;
…….
1 row created.
SQL> commit;
Commit complete.
SQL> select count(*) from OLS_TUTORIAL;
COUNT(*)
———-
16

4) Indexing using Scotas OLS.

SQL> CREATE INDEX TUTORIAL_SIDX ON OLS_TUTORIAL(ID) INDEXTYPE IS LUCENE.SOLRINDEX
parameters(‘Updater:localhost@9099;Searcher:localhost@9099;SyncMode:OnLine;LockMasterTable:false;IncludeMasterColumn:false;LogLevel:ALL;MltColumn:title;HighlightColumn:name,features;DefaultColumn:text;ExtraCols:id “id”,cat “cat”,name “name”,features “features”,manu “manu”,includes “includes”,price “price”,popularity “popularity”,inStock “inStock”,to_char(manufacturedate_dt,”YYYY-MM-DD”T”HH24:MI:SS”Z””) “manufacturedate_dt”‘);
Index created.

5) Doing some queries (equivalent to the section  …and now you can search for all sorts of Solr Tutorial, there are more query examples at db/tutorial/testTutorial.xml file in a form o SQLUnit syntax):

SQL> SELECT /*+ DOMAIN_INDEX_SORT */ id FROM OLS_TUTORIAL T where scontains(id,’video’)>0;
ID
——————————
MA147LL/A
100-435805
EN7800GTX/2DHTV/256M

SQL> SELECT id FROM OLS_TUTORIAL T WHERE SCONTAINS(ID,’name:video’)>0;
ID
——————————
MA147LL/A
100-435805

SQL> SELECT id,price FROM OLS_TUTORIAL WHERE SCONTAINS(ID,’video AND price:[* TO 400]’)>0;
ID PRICE
—————————— ———-
MA147LL/A                           399

6) Other complex queries (Faceted Search example):

SQL> SELECT FIELD,SJOIN(T.FACETS) F FROM TABLE(SFACETS(USER||’.TUTORIAL_SIDX’,’*:*’,’facet.field=cat&facet.field=inStock’)) T;
cat
electronics(14),memory(3),connector(2),graphics-card(2),hard-drive(2),monitor(2) ,search(2),software(2),camera(1),copier(1),multifunction-printer(1),music(1),pri nter(1),scanner(1)
inStock
true(12),false(4)

7) More queries (Text Analysis)

SQL> SELECT /*+ DOMAIN_INDEX_SORT */ SSCORE(1),name FROM OLS_TUTORIAL T WHERE SCONTAINS(ID,’power-shot‘,1)>0;
.683216989
Canon PowerShot SD500

SQL> SELECT /*+ DOMAIN_INDEX_SORT */ SSCORE(1),NAME,features FROM OLS_TUTORIAL T WHERE SCONTAINS(ID,’features:recharging‘,1)>0;
.336813927
Apple 60 GB iPod with Video Playback Black iTunes, Podcasts, Audiobooks Stores up to 15,000 songs, 25,000 photos, or 150 hours of video 2.5-inch, 320×240 color TFT LCD display with LED backlight Up to 20 hours of battery life Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in g ames, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication

And that’s all. Here some notes on the index parameters defined above:

  • IncludeMasterColumn:false, defined that the column ID which is the master column of the index defined when the SQL DDL is used as CREATE INDEX TUTORIAL_SIDX ON OLS_TUTORIAL(ID)… is not included during index construction as Solr field.
  • MltColumn:title, defines which column is used with More Like This ancillary operator mlt(), not showed at the examples.
  • HighlightColumn:name,features, columns used for highlighting with shighlight() ancillary operator, not showed at the examples.
  • DefaultColumn:text, which column is used as default for searching when using Solr Query Syntax at scontains() operator, note that text is not table column, is a virtual column defined at schema.xml with the Solr functionality copyField and works as bag including all Solr fields (cat, name, manu, features, includes).
  • ExtraCols:id “id“,cat “cat“,name “name“,features “features“,manu “manu“,includes “includes“,price “price“,popularity “popularity“,inStock “inStock“,to_char(manufacturedate_dt,”YYYY-MM-DD”T”HH24:MI:SS”Z””) “manufacturedate_dt“, these are table columns defined as Solr field, note that we are using double quoted (“) to defined case sensitive Solr Field names and an special to_char mask to format dates includes in manufacterdate_dt column in Solr date format.

 

Scaling OLS (Oracle/Solr Integration)

OLS have differents ways to deal with large data-sets by using some Oracle built in features available on RAC or multi-core environments.

A difference with Solr distributed search/sharding is that OLS is using a transactional/shared Lucene storage which means that if one process commits a change on them it is visible immediately to all other processes, is not necessary a copy post process, we can see this like Solr over SAN, but with additional of the SQL behavior.

To explain above functionality we can use the index optimize task. This task takes a long time to execute and can hit Out Of Memory exceptions if some Solr parameter are not fine tuned, lets see the example.

On a console 1 a running test-suite simulates multiple users requesting searches on a table which has around 600K rows:

[mochoa@localhost ols]$ ant test-queryhits-solr
Buildfile: build.xml
[echo] Building ols…
test-queryhits-solr:
[junit] Running com.scotas.solr.indexer.TestQueryHits
[junit] Hits: 36707

[junit] Elapsed time: 335
….
[junit] Query time: 59
[junit] Elapsed time: 12983
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 13.323 sec
BUILD SUCCESSFUL
Total time: 14 seconds

on a parallel console 2 an optimize task is started

[mochoa@localhost Download]$ sqlplus scott/tiger@orcl
SQL*Plus: Release 11.2.0.3.0 Production on Tue Apr 24 08:46:26 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> exec SolrDomainIndex.optimize(‘SOURCE_BIG_SIDX’);

and we see that the CPU usage of processor 4 is at the top of CPU (attached to the optimize task, PID 10567) and other Oracle process are on top of the execution processing request for searching from the client side (Java test-suite, oracleorcl[LOCAL=NO]) and the Shared Searcher process (ora_j010_orcl)

Note that other processes are not affected by the optimize task running with PID 10567 and to show what happens in the worst scenario we will kill the process trying to show a catastrophic problem, such as an OOM exception or a RAC node failure:

[mochoa@localhost ~]$ su
Password:
[root@localhost mochoa]# kill -9 10567
———————————————————————————
SQL> exec SolrDomainIndex.optimize(‘SOURCE_BIG_SIDX’);
BEGIN SolrDomainIndex.optimize(‘SOURCE_BIG_SIDX’); END;
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 10567
Session ID: 77 Serial number: 109
Elapsed: 00:01:05.96

by starting again the test suite, the Oracle RDBMS automatically did a rollback on any uncommitted changes on the Lucene storage so the optimization task didn’t break the storage and can be started again without any problem, lets see:

[mochoa@localhost ols]$ ant test-queryhits-solr
Buildfile: build.xml
[echo] Building ols…
test-queryhits-solr:

[junit] Running com.scotas.solr.indexer.TestQueryHits
[junit] Hits: 36707
[junit] Elapsed time: 275
….
[junit] Query time: 20
[junit] Elapsed time: 11293
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 11.571 sec
BUILD SUCCESSFUL
Total time: 12 seconds

SQL> set timing on
SQL> exec SolrDomainIndex.optimize(‘SOURCE_BIG_SIDX’);
PL/SQL procedure successfully completed.
Elapsed: 00:04:34.63
SQL> commit;
Commit complete.

Another feature of OLS is the parallel execution and could be used to speedup heavy indexes by configuring multiples processes listening on different ports and altering the index to use them, for example we have 3 OLS parallel/shared server processes running:

-bash-4.2$ netstat -anp|grep ora_j0
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 :::9099 :::* LISTEN 4479/ora_j010_orcl
tcp 0 0 :::9199 :::* LISTEN 4483/ora_j012_orcl
tcp 0 0 :::9299 :::* LISTEN 4485/ora_j013_orcl

now by executing an alter command above index will be configured to use two processes for reading, and one process for updating the index.

SQL> alter index SOURCE_BIG_SIDX parameters(‘Searcher:localhost@9099,localhost@9199;
Updater:localhost@9299′);
Index altered.
Elapsed: 00:00:00.28

any successful write operations committed by the writer process (PID 4485) will cause that the other two readers (PID 4479/4483) hot-reload their indexes structure in RAM to see changes transparently.

Above examples shows how easily OLS can scale on multi-core environments.

Another important  functionality is known as partitioning/sharding, OLS automatically inherits partitioning from tables on Oracle Enterprise Edition, for example we have a table with 24 partitions created as:

CREATE TABLE “SCOTT”.”MB_SALES”
( “ID” VARCHAR2(20 BYTE),
“S_DATE” DATE,
…..
“T_TRM_LOOKUPS” NUMBER(10,0)
) COMPRESS
PARTITION BY RANGE (s_date)
(
PARTITION SALES_2010_01 VALUES LESS THAN (TO_DATE(’01-FEB-2010′,’DD-MON-YYYY’)),
….
PARTITION SALES_2011_12 VALUES LESS THAN (TO_DATE(’01-JAN-2012′,’DD-MON-YYYY’)),
PARTITION SALES_2012 VALUES LESS THAN (MAXVALUE)
) NOLOGGING;

by creating an OLS index using:

CREATE INDEX SIDX ON MB_SALES (T_NAME)
INDEXTYPE IS LUCENE.SOLRINDEX
PARAMETERS (‘PopulateIndex:false;DefaultColumn:t_name;IncludeMasterColumn:false;
BatchCount:4000;LockMasterTable:false;LogLevel:ALL;SyncMode:OnLine;
HighlightColumn:t_name;ExtraCols:t_name “t_name”,t_duration “t_duration”,
t_a_name “t_a_name”,t_r_name “t_r_name”,t_r_attributes “r_type”,t_r_tracks “t_r_tracks”,
t_trm_lookups “t_trm_lookups”‘)
LOCAL parallel 2;

and setting PopulateIndex:false we can define later an specific parameter for each partition:

alter index SIDX REBUILD PARTITION SALES_2010_01 parameters (‘Searcher:localhost@9099;Updater:localhost@9099′);
alter index SIDX REBUILD PARTITION SALES_2010_02 parameters (‘Searcher:localhost@9199;Updater:localhost@9199′);

at the time of populating the index port 9099 is used for serving request on partition SALES_2010_01, 9199 for SALES_2010_02, and so on.

Finally, Oracle automatically controls which partitions of the shard are visited when doing a query, for example:

SQL> explain plan for
SELECT SHIGHLIGHT(1),SSCORE(1) FROM MB_SALES WHERE S_DATE BETWEEN TO_DATE(’01-JAN-2010′,’DD-MON-YYYY’) AND TO_DATE(’31-MAY-2010′,’DD-MON-YYYY’) AND SCONTAINS(T_NAME,’rownum:[1 TO 10] AND love’,1)>0
ORDER BY SSCORE(1) DESC;

……..

——————————————————–
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
——————————————————–
| 0 | SELECT STATEMENT | | 26026 | 1092K| | 2841 (1)| 00:00:35 | | |
| 1 | SORT ORDER BY | | 26026 | 1092K| 1448K| 2841 (1)| 00:00:35 | | |
| 2 | PARTITION RANGE ITERATOR | | 26026 | 1092K| | 2553 (1)| 00:00:31 | 1 | 5 |
|* 3 | TABLE ACCESS BY LOCAL INDEX ROWID| MB_SALES | 26026 | 1092K| | 2553 (1)| 00:00:31 | 1 | 5 |
|* 4 | DOMAIN INDEX | SIDX | | | | | | | |
——————————————————–

Predicate Information (identified by operation id):
——————————————————–

3 – filter(“S_DATE”<=TO_DATE(‘ 2010-05-31 00:00:00′, ‘syyyy-mm-dd hh24:mi:ss’) AND “S_DATE”>=TO_DATE(‘
2010-01-01 00:00:00′, ‘syyyy-mm-dd hh24:mi:ss’))
4 – access(“LUCENE”.”SCONTAINS”(“T_NAME”,’rownum:[1 TO 10] AND love’,1)>0)

18 rows selected.

Elapsed: 00:00:01.80

Note that columns Pstart and Pstop shows that the RDBMS optimizer knows that only partitions 1 to 5 of the index need to be visited to satisfy the query with the condition of  01-JAN-2010 and 31-MAY-2010, this functionality is known as partition pruning and is automatically exposed to OLS.

The advantages of partitioning/sharding are not only the possibility to transparently distribute searches across multiple processes, but also the index structure is smaller and as consequence of that less RAM is used during index operations and less time is involved doing maintenance operations such as optimization or export/import.

For example MB_SALES table has 167M rows, but each partition has only 6.9M rows. Finally, the index structure is only using 1.5Gb of space on each partition.

 Note: Using inline OLS pagination with partitioned tables caused that the query returns more rows than expected, it means for example if we defined rownum:[1 TO 10] instead of 10 rows expected above query returns 50 rows (10 by 5 partitions visited) and the sscore(1) sort should be re-executed at SQL level; if we need only 10 rows the solution is to use the Top-N syntax or limiting the cursor result at client-side.

New Oracle Lucene Domain Index release based on Lucene 3.0.2

Just a few words to announce a new release of Oracle Lucene Domain Index, this zip is valid for 10g and 11g database version (10g using back-ported classes from 1.5 to 1.4)
This release is compiled using Lucene 3.0.2 version and incorporates a set of new features added, here the list:

  • Added a long awaited functionality, a parallel/shared/slave search process used during a start-fetch-close and CountHits function
  • Added lfreqterms ancillary operator returning the freq terms array of rows visited
  • Added lsimilarity ancillary operator returning a computed Levenshtein distance of the row visited
  • Added a ldidyoumean pipeline table function using DidYouMean.indexDictionary storage
  • Added test using SQLUnit

The bigger addition is the Parallel-Shared-Slave search process, this architectural change was in my to-do list for a long time and finally I added in this release :)
The idea behind this is to have a new Oracle process started by the DBMS_SCHEDULER sub-system during the database startup process and stopped immediately before shutdown.
Now this process is responsible for implementing the ODCI methods start-fetch-close/count-hit on behalf of the client process (process associated to an specific user session) which connect to the shared-slave process by using RMI.
With this new architecture we have two principal benefits:

  • Reduce memory consumption
  • Increase Lucene Cache Hits

Less memory consumption because the internal OJVM implementation is attached to a client session, so the Java space used by Lucene structures is isolated and independent from another concurrent session, now allLucene memory structures used during index scan process are created in a shared process and then not replicated.
Also if one session submits a Lucene search, this search is cached for subsequent queries, all subsequent queries coming from the same client session or any other which are associated to the same index and with the same Query string implies a hit.
I’ll explain more in detail this new architecture in another post also showing how many parallel process can work together when using Parallel Indexing and Searching.
On the other hand next week I’ll be at the Oracle OpenWorld 2010 in SFO presenting the session:

Schedule: Tuesday: 09:30AM
Session ID: S315660
Title: Database Applications Lifecycle Management
Event: JavaOne and Oracle Develop
Stream(s): ORACLE DEVELOP
Track(s): Database Development
Abstract: Complex applications, such as Java running inside the database, require an application lifecycle management to develop and delivery good code. This session will cover some best practices, tools, and experience managing and delivering code for running inside the database, including tools for debugging, automatic test, packaging, deployment, and release management. Some of the tools presented will include Apache Maven, JUnit, log4j, Oracle JDeveloper, and others integrated into the Oracle Java Virtual Machine (JVM) development environment.

See you there or at any of networking planned events :)