Scotas OLS official Docker image build scripts

We uploaded to GitHub two official Docker build image scripts for Scotas OLS 2.0.1 (RDBMS 12c R1) and OLS 2.0.2 (RDBMS 12c R2).

Both scripts are based on Oracle official RDBMS 12c Docker build images scripts, and they make the installation of Oracle/Scotas OLS too easy.

It basically start by using your own Oracle 12c Docker image and install OLS’s binaries and license. Follow these steps to get your OLS up and running on Linux/Mac/Windows:

First check that you have the 12c image ready, for example:

[mochoa@localhost ols-12cr2]$ docker image ls|grep oracle
….
oracle/database                12.2.0.1-ee                 1a870e15682d                6 weeks ago            14.8 GB
….

then checkout our scripts from GitHub using:

[mochoa@localhost tmp]$ git clone https://github.com/scotas/docker-images.git
Cloning into ‘docker-images’…
remote: Counting objects: 39, done.
remote: Compressing objects: 100% (28/28), done.
remote: Total 39 (delta 14), reused 32 (delta 10), pack-reused 0
Unpacking objects: 100% (39/39), done.
Checking connectivity… done.
[mochoa@localhost tmp]$ cd docker-images/ols-12cr2/
[mochoa@localhost ols-12cr2]$ ./buildDockerImage.sh

.. Docker build steps here (requires internet connection)..

once you have Scotas OLS image ready at your local Docker images repository you can install OLS at first startup of your Docker container using:

docker run –name <container name> \
-p <host port>:1521 -p <host port>:5500 \
-e ORACLE_SID=<your SID> \
-e ORACLE_PDB=<your PDB name> \
-e ORACLE_PWD=<your database passwords> \
-e ORACLE_CHARACTERSET=<your character set> \
-v [<host mount point>:]/opt/oracle/oradata \
ols-official:2.0.2

Important: unlike Oracle official Docker image script first run requires ORACLE_PWD argument, it doesn’t work with random generated password. During this first startup your RDBMS will create all the datafiles at <host mount point> directory and unpack and install Scotas OLS, you will see some like this:

ODCI patched
Installing OLS…
OLS installed see /home/oracle/install-OLS.log and /home/oracle/install-OLS.err files for details
OLS Installed OK….
#########################
DATABASE IS READY TO USE!
#########################

after a successful Database creation you can start/stop the container, rm and run again without the ORACLE_PWD argument.

Note: that the CDB and PDB$SEED are patched against an update version of ODCI.jar library, the included library for RDBMS 12c R1 and R2 are outdated and doesn’t have a correct wrapper files for latest ODCI library, see this post for more details.

Note: During first startup you will see an ora_m00n_pid process consuming a lot of CPU, this is because once you start using Java inside the Database a JIT compilation phase start converting Java code to assembler, it includes a complete Java/Oracle stack and Scotas OLS code, after a week of OLS usage all Java code must be compiled to assembler and this process only takes some CPU usage, obviously when all the code is compiled your indexing and query time will be faster.

5 minutes tutorial with Scotas OLS

If you already have Scotas OLS installed this is a 5 minutes guide for using Oracle-Solr integration.

Scotas OLS binary distribution have a sample configuration files located at db/tutorial directory which are used to simulate all operations described into Solr Tutorial but using Scotas OLS.

Here a complete list of steps:

1) grant LUCENEUSER role to SCOTT (this step could be avoided if was already done):

-bash-4.2$ sqlplus “/ as sysdba”
SQL*Plus: Release 11.2.0.3.0 Production on Thu May 10 08:07:51 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> grant LUCENEUSER to SCOTT;
Grant succeeded.

2) upload Solr related files to XMLDB repository (assuming that we will create an index named SCOTT.TUTORIAL_SIDX):

# cd db/tutorial/LUCENE.TUTORIAL_SIDX/conf
# ftp localhost 2100
Connected to localhost.localdomain.
220- localhost.localdomain
Unauthorised use of this FTP server is prohibited and may be subject to civil and criminal prosecution.
220 localhost.localdomain FTP Server (Oracle XML DB/Oracle Database) ready.
Name (localhost:mochoa): scott
331 pass required for SCOTT
Password: *****
230 SCOTT logged in
Remote system type is Unix.
ftp> cd public/solr
250 CWD Command successful
ftp> mkdir SCOTT.TUTORIAL_SIDX
250 CWD Command successful
ftp> cd SCOTT.TUTORIAL_SIDX
250 CWD Command successful
ftp> mkdir conf
250 CWD Command successful
ftp> cd conf
250 CWD Command successful
ftp> mput *
mput elevate.xml [anpqy?]? a
………
ftp> dir
227 Entering Passive Mode (127,0,0,1,249,27)
150 ASCII Data Connection
-rw-r–r– 1 SCOTT oracle 1274 MAY 10 11:15 elevate.xml
-rw-r–r– 1 SCOTT oracle 78514 MAY 10 11:15 mapping-FoldToASCII.txt
-rw-r–r– 1 SCOTT oracle 2868 MAY 10 11:15 mapping-ISOLatin1Accent.txt
-rw-r–r– 1 SCOTT oracle 367 MAY 10 11:15 mb_attributes.txt
-rw-r–r– 1 SCOTT oracle 121 MAY 10 11:15 mb_letterBuckets.txt
-rw-r–r– 1 SCOTT oracle 873 MAY 10 11:15 protwords.txt
-rw-r–r– 1 SCOTT oracle 30717 MAY 10 11:15 schema.xml
-rw-r–r– 1 SCOTT oracle 921 APR 13 13:46 scripts.conf
-rw-r–r– 1 SCOTT oracle 13267 MAY 10 11:15 solrconfig.xml
-rw-r–r– 1 SCOTT oracle 13 MAY 10 11:15 spellings.txt
-rw-r–r– 1 SCOTT oracle 1171 MAY 10 11:15 stopwords.txt
-rw-r–r– 1 SCOTT oracle 1132 MAY 10 11:15 synonyms.txt
226 ASCII Transfer Complete
ftp> pwd
257 “/public/solr/SCOTT.TUTORIAL_SIDX/conf” is current directory.
ftp> quit

3) Create and populate a sample table. This table have a relational version of the Solr tutorial documents (these SQL statements are at db/tutorial/schemaTutorial.sql file).

[mochoa@localhost tutorial]$ sqlplus scott/tiger@orcl
SQL*Plus: Release 11.2.0.3.0 Production on Thu May 10 08:22:44 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> set define off
SQL> CREATE TABLE OLS_TUTORIAL ( ID VARCHAR2(30) PRIMARY KEY,
NAME VARCHAR2(400),
MANU VARCHAR2(4000),
CAT VARCHAR2(400),
FEATURES CLOB,
INCLUDES VARCHAR2(4000),
WEIGHT NUMBER,
PRICE NUMBER,
POPULARITY NUMBER,
INSTOCK CHAR(5), — true or false
MANUFACTUREDATE_DT TIMESTAMP,
PAYLOADS VARCHAR2(4000),
STORE VARCHAR2(200)) ;
Table created.
SQL> insert into ols_tutorial values ( ‘SP2514N’, ‘Samsung SpinPoint P120 SP2514N – hard drive – 250 GB – ATA-133′, ‘Samsung Electronics Co. Ltd.’, ‘electronics hard-drive’, ‘7200RPM, 8MB cache, IDE Ultra ATA-133 NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor’, NULL, NULL, 92, 6, ‘true’, TO_TIMESTAMP(‘2006-02-13T15:26:37Z’,’YYYY-MM-DD”T”HH24:MI:SS”Z”‘), NULL, ‘35.0752,-97.032′) ;
…….
1 row created.
SQL> commit;
Commit complete.
SQL> select count(*) from OLS_TUTORIAL;
COUNT(*)
———-
16

4) Indexing using Scotas OLS.

SQL> CREATE INDEX TUTORIAL_SIDX ON OLS_TUTORIAL(ID) INDEXTYPE IS LUCENE.SOLRINDEX
parameters(‘Updater:localhost@9099;Searcher:localhost@9099;SyncMode:OnLine;LockMasterTable:false;IncludeMasterColumn:false;LogLevel:ALL;MltColumn:title;HighlightColumn:name,features;DefaultColumn:text;ExtraCols:id “id”,cat “cat”,name “name”,features “features”,manu “manu”,includes “includes”,price “price”,popularity “popularity”,inStock “inStock”,to_char(manufacturedate_dt,”YYYY-MM-DD”T”HH24:MI:SS”Z””) “manufacturedate_dt”‘);
Index created.

5) Doing some queries (equivalent to the section  …and now you can search for all sorts of Solr Tutorial, there are more query examples at db/tutorial/testTutorial.xml file in a form o SQLUnit syntax):

SQL> SELECT /*+ DOMAIN_INDEX_SORT */ id FROM OLS_TUTORIAL T where scontains(id,’video’)>0;
ID
——————————
MA147LL/A
100-435805
EN7800GTX/2DHTV/256M

SQL> SELECT id FROM OLS_TUTORIAL T WHERE SCONTAINS(ID,’name:video’)>0;
ID
——————————
MA147LL/A
100-435805

SQL> SELECT id,price FROM OLS_TUTORIAL WHERE SCONTAINS(ID,’video AND price:[* TO 400]’)>0;
ID PRICE
—————————— ———-
MA147LL/A                           399

6) Other complex queries (Faceted Search example):

SQL> SELECT FIELD,SJOIN(T.FACETS) F FROM TABLE(SFACETS(USER||’.TUTORIAL_SIDX’,’*:*’,’facet.field=cat&facet.field=inStock’)) T;
cat
electronics(14),memory(3),connector(2),graphics-card(2),hard-drive(2),monitor(2) ,search(2),software(2),camera(1),copier(1),multifunction-printer(1),music(1),pri nter(1),scanner(1)
inStock
true(12),false(4)

7) More queries (Text Analysis)

SQL> SELECT /*+ DOMAIN_INDEX_SORT */ SSCORE(1),name FROM OLS_TUTORIAL T WHERE SCONTAINS(ID,’power-shot‘,1)>0;
.683216989
Canon PowerShot SD500

SQL> SELECT /*+ DOMAIN_INDEX_SORT */ SSCORE(1),NAME,features FROM OLS_TUTORIAL T WHERE SCONTAINS(ID,’features:recharging‘,1)>0;
.336813927
Apple 60 GB iPod with Video Playback Black iTunes, Podcasts, Audiobooks Stores up to 15,000 songs, 25,000 photos, or 150 hours of video 2.5-inch, 320×240 color TFT LCD display with LED backlight Up to 20 hours of battery life Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in g ames, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication

And that’s all. Here some notes on the index parameters defined above:

  • IncludeMasterColumn:false, defined that the column ID which is the master column of the index defined when the SQL DDL is used as CREATE INDEX TUTORIAL_SIDX ON OLS_TUTORIAL(ID)… is not included during index construction as Solr field.
  • MltColumn:title, defines which column is used with More Like This ancillary operator mlt(), not showed at the examples.
  • HighlightColumn:name,features, columns used for highlighting with shighlight() ancillary operator, not showed at the examples.
  • DefaultColumn:text, which column is used as default for searching when using Solr Query Syntax at scontains() operator, note that text is not table column, is a virtual column defined at schema.xml with the Solr functionality copyField and works as bag including all Solr fields (cat, name, manu, features, includes).
  • ExtraCols:id “id“,cat “cat“,name “name“,features “features“,manu “manu“,includes “includes“,price “price“,popularity “popularity“,inStock “inStock“,to_char(manufacturedate_dt,”YYYY-MM-DD”T”HH24:MI:SS”Z””) “manufacturedate_dt“, these are table columns defined as Solr field, note that we are using double quoted (“) to defined case sensitive Solr Field names and an special to_char mask to format dates includes in manufacterdate_dt column in Solr date format.

 

Scaling OLS (Oracle/Solr Integration)

OLS have differents ways to deal with large data-sets by using some Oracle built in features available on RAC or multi-core environments.

A difference with Solr distributed search/sharding is that OLS is using a transactional/shared Lucene storage which means that if one process commits a change on them it is visible immediately to all other processes, is not necessary a copy post process, we can see this like Solr over SAN, but with additional of the SQL behavior.

To explain above functionality we can use the index optimize task. This task takes a long time to execute and can hit Out Of Memory exceptions if some Solr parameter are not fine tuned, lets see the example.

On a console 1 a running test-suite simulates multiple users requesting searches on a table which has around 600K rows:

[mochoa@localhost ols]$ ant test-queryhits-solr
Buildfile: build.xml
[echo] Building ols…
test-queryhits-solr:
[junit] Running com.scotas.solr.indexer.TestQueryHits
[junit] Hits: 36707

[junit] Elapsed time: 335
….
[junit] Query time: 59
[junit] Elapsed time: 12983
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 13.323 sec
BUILD SUCCESSFUL
Total time: 14 seconds

on a parallel console 2 an optimize task is started

[mochoa@localhost Download]$ sqlplus scott/tiger@orcl
SQL*Plus: Release 11.2.0.3.0 Production on Tue Apr 24 08:46:26 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> exec SolrDomainIndex.optimize(‘SOURCE_BIG_SIDX’);

and we see that the CPU usage of processor 4 is at the top of CPU (attached to the optimize task, PID 10567) and other Oracle process are on top of the execution processing request for searching from the client side (Java test-suite, oracleorcl[LOCAL=NO]) and the Shared Searcher process (ora_j010_orcl)

Note that other processes are not affected by the optimize task running with PID 10567 and to show what happens in the worst scenario we will kill the process trying to show a catastrophic problem, such as an OOM exception or a RAC node failure:

[mochoa@localhost ~]$ su
Password:
[root@localhost mochoa]# kill -9 10567
———————————————————————————
SQL> exec SolrDomainIndex.optimize(‘SOURCE_BIG_SIDX’);
BEGIN SolrDomainIndex.optimize(‘SOURCE_BIG_SIDX’); END;
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 10567
Session ID: 77 Serial number: 109
Elapsed: 00:01:05.96

by starting again the test suite, the Oracle RDBMS automatically did a rollback on any uncommitted changes on the Lucene storage so the optimization task didn’t break the storage and can be started again without any problem, lets see:

[mochoa@localhost ols]$ ant test-queryhits-solr
Buildfile: build.xml
[echo] Building ols…
test-queryhits-solr:

[junit] Running com.scotas.solr.indexer.TestQueryHits
[junit] Hits: 36707
[junit] Elapsed time: 275
….
[junit] Query time: 20
[junit] Elapsed time: 11293
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 11.571 sec
BUILD SUCCESSFUL
Total time: 12 seconds

SQL> set timing on
SQL> exec SolrDomainIndex.optimize(‘SOURCE_BIG_SIDX’);
PL/SQL procedure successfully completed.
Elapsed: 00:04:34.63
SQL> commit;
Commit complete.

Another feature of OLS is the parallel execution and could be used to speedup heavy indexes by configuring multiples processes listening on different ports and altering the index to use them, for example we have 3 OLS parallel/shared server processes running:

-bash-4.2$ netstat -anp|grep ora_j0
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 :::9099 :::* LISTEN 4479/ora_j010_orcl
tcp 0 0 :::9199 :::* LISTEN 4483/ora_j012_orcl
tcp 0 0 :::9299 :::* LISTEN 4485/ora_j013_orcl

now by executing an alter command above index will be configured to use two processes for reading, and one process for updating the index.

SQL> alter index SOURCE_BIG_SIDX parameters(‘Searcher:localhost@9099,localhost@9199;
Updater:localhost@9299′);
Index altered.
Elapsed: 00:00:00.28

any successful write operations committed by the writer process (PID 4485) will cause that the other two readers (PID 4479/4483) hot-reload their indexes structure in RAM to see changes transparently.

Above examples shows how easily OLS can scale on multi-core environments.

Another important  functionality is known as partitioning/sharding, OLS automatically inherits partitioning from tables on Oracle Enterprise Edition, for example we have a table with 24 partitions created as:

CREATE TABLE “SCOTT”.”MB_SALES”
( “ID” VARCHAR2(20 BYTE),
“S_DATE” DATE,
…..
“T_TRM_LOOKUPS” NUMBER(10,0)
) COMPRESS
PARTITION BY RANGE (s_date)
(
PARTITION SALES_2010_01 VALUES LESS THAN (TO_DATE(’01-FEB-2010′,’DD-MON-YYYY’)),
….
PARTITION SALES_2011_12 VALUES LESS THAN (TO_DATE(’01-JAN-2012′,’DD-MON-YYYY’)),
PARTITION SALES_2012 VALUES LESS THAN (MAXVALUE)
) NOLOGGING;

by creating an OLS index using:

CREATE INDEX SIDX ON MB_SALES (T_NAME)
INDEXTYPE IS LUCENE.SOLRINDEX
PARAMETERS (‘PopulateIndex:false;DefaultColumn:t_name;IncludeMasterColumn:false;
BatchCount:4000;LockMasterTable:false;LogLevel:ALL;SyncMode:OnLine;
HighlightColumn:t_name;ExtraCols:t_name “t_name”,t_duration “t_duration”,
t_a_name “t_a_name”,t_r_name “t_r_name”,t_r_attributes “r_type”,t_r_tracks “t_r_tracks”,
t_trm_lookups “t_trm_lookups”‘)
LOCAL parallel 2;

and setting PopulateIndex:false we can define later an specific parameter for each partition:

alter index SIDX REBUILD PARTITION SALES_2010_01 parameters (‘Searcher:localhost@9099;Updater:localhost@9099′);
alter index SIDX REBUILD PARTITION SALES_2010_02 parameters (‘Searcher:localhost@9199;Updater:localhost@9199′);

at the time of populating the index port 9099 is used for serving request on partition SALES_2010_01, 9199 for SALES_2010_02, and so on.

Finally, Oracle automatically controls which partitions of the shard are visited when doing a query, for example:

SQL> explain plan for
SELECT SHIGHLIGHT(1),SSCORE(1) FROM MB_SALES WHERE S_DATE BETWEEN TO_DATE(’01-JAN-2010′,’DD-MON-YYYY’) AND TO_DATE(’31-MAY-2010′,’DD-MON-YYYY’) AND SCONTAINS(T_NAME,’rownum:[1 TO 10] AND love’,1)>0
ORDER BY SSCORE(1) DESC;

……..

——————————————————–
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
——————————————————–
| 0 | SELECT STATEMENT | | 26026 | 1092K| | 2841 (1)| 00:00:35 | | |
| 1 | SORT ORDER BY | | 26026 | 1092K| 1448K| 2841 (1)| 00:00:35 | | |
| 2 | PARTITION RANGE ITERATOR | | 26026 | 1092K| | 2553 (1)| 00:00:31 | 1 | 5 |
|* 3 | TABLE ACCESS BY LOCAL INDEX ROWID| MB_SALES | 26026 | 1092K| | 2553 (1)| 00:00:31 | 1 | 5 |
|* 4 | DOMAIN INDEX | SIDX | | | | | | | |
——————————————————–

Predicate Information (identified by operation id):
——————————————————–

3 – filter(“S_DATE”<=TO_DATE(‘ 2010-05-31 00:00:00′, ‘syyyy-mm-dd hh24:mi:ss’) AND “S_DATE”>=TO_DATE(‘
2010-01-01 00:00:00′, ‘syyyy-mm-dd hh24:mi:ss’))
4 – access(“LUCENE”.”SCONTAINS”(“T_NAME”,’rownum:[1 TO 10] AND love’,1)>0)

18 rows selected.

Elapsed: 00:00:01.80

Note that columns Pstart and Pstop shows that the RDBMS optimizer knows that only partitions 1 to 5 of the index need to be visited to satisfy the query with the condition of  01-JAN-2010 and 31-MAY-2010, this functionality is known as partition pruning and is automatically exposed to OLS.

The advantages of partitioning/sharding are not only the possibility to transparently distribute searches across multiple processes, but also the index structure is smaller and as consequence of that less RAM is used during index operations and less time is involved doing maintenance operations such as optimization or export/import.

For example MB_SALES table has 167M rows, but each partition has only 6.9M rows. Finally, the index structure is only using 1.5Gb of space on each partition.

 Note: Using inline OLS pagination with partitioned tables caused that the query returns more rows than expected, it means for example if we defined rownum:[1 TO 10] instead of 10 rows expected above query returns 50 rows (10 by 5 partitions visited) and the sscore(1) sort should be re-executed at SQL level; if we need only 10 rows the solution is to use the Top-N syntax or limiting the cursor result at client-side.