Push Connector 2.0 for Oracle XE 18c

At the end of 2018 Oracle finally released the long-awaited 18c XE for Linux, and a few days ago the Windows version followed.

This new release includes native JSON support inside the Oracle Database. This is good news because it:

  • standardizes the API (no more dependency on the PL/JSON library or APEX_JSON)
  • improves performance, since the support is implemented entirely in C
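As a quick illustration of the native API, here is a minimal sketch using a hypothetical table (not part of the connector):

```sql
-- hypothetical table with a JSON-checked column; 18c validates and queries it natively
CREATE TABLE docs (id NUMBER, body VARCHAR2(4000) CHECK (body IS JSON));

INSERT INTO docs VALUES (1, JSON_OBJECT('title' VALUE 'hello', 'line' VALUE 42));

-- native JSON path access, no PL/JSON or APEX_JSON needed
SELECT JSON_VALUE(body, '$.title') AS title FROM docs;
```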

Using this new feature we completely rewrote the Scotas Push Connector for Solr on top of the new API, and because XE is free, the Scotas Push Connector is now also delivered as an open source version, with the source on GitHub!

Deploying this new version is as simple as deploying a new Docker stack on your Swarm cluster or single server; a sample stack is in the docker-compose.yml file. Here are the main components:

Screenshot from 2019-02-26 10-53-15

The first part of the stack starts a new Solr instance that backs the Push Connector's search facilities; remember that the Push Connector implements its search functionality using an external Apache Solr instance.

Screenshot from 2019-02-26 10-56-03

The second service is an Oracle RDBMS instance with two important parts: external storage for the database data, as the Oracle Docker image suggests (you can do the same with the persistent storage of the Apache Solr image), and environment variables defining the initial SYS password for the XE instance (mandatory during PC installation) and the hostname and port of the Solr instance.

Finally, the stack provides the startup scripts for installing Scotas Push Connector 2.0.0; these scripts are called automatically right after the Oracle XE instance is created.
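In outline, the services described above look something like this (a hedged sketch; the image tags, volume names and environment variable names are assumptions — the real file is the docker-compose.yml in the repository):

```yaml
version: '3.1'
services:
  solr:
    image: solr:7.7                   # external Solr backing the Push Connector searches
    ports:
      - "8983:8983"
  db:
    image: oracle/database:18.4.0-xe
    ports:
      - "1521:1521"
    environment:
      ORACLE_PWD: Oracle_2018         # initial SYS password, mandatory during PC installation
      SOLR_HOST: solr                 # hostname and port of the Solr service above
      SOLR_PORT: 8983
    volumes:
      - db-data:/opt/oracle/oradata   # external storage for the database data
    configs:                          # startup scripts run right after XE is created
      - source: 00-unzip-pc
        target: /opt/oracle/scripts/setup/00-unzip-pc.sh
volumes:
  db-data:
configs:
  00-unzip-pc:
    external: true
```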

Screenshot from 2019-02-26 11-04-23

To deploy this stack, either on a Swarm cluster or a single node with Docker installed, just follow these steps:

  • Checkout GitHub scripts

$ git clone https://github.com/scotas/docker-images.git
$ cd docker-images/pc-18cXE

  • Create configs for Oracle XE startup scripts

$ curl https://raw.githubusercontent.com/scotas/docker-images/master/pc-18cXE/00-unzip-pc.sh | docker config create 00-unzip-pc -
$ curl https://raw.githubusercontent.com/scotas/docker-images/master/pc-18cXE/01-pc-ins.sh | docker config create 01-pc-ins -
$ curl https://raw.githubusercontent.com/scotas/docker-images/master/pc-18cXE/02-clean-up-files.sh | docker config create 02-clean-up-files -

  • Deploy sample stack

$ cd ../sample-stacks
$ docker stack deploy pc -c docker-compose-pc.yml
$ docker service logs -f pc_db
pc_db.1.y7826xns4ig6@pocho | ORACLE PASSWORD FOR SYS AND SYSTEM: Oracle_2018
....
pc_db.1.y7826xns4ig6@pocho | old 33: insert into PC.bg_process (bg_process_name,host_name,port) values ('Default &connector Server','&newhost',&newport);
pc_db.1.y7826xns4ig6@pocho | new 33: insert into PC.bg_process (bg_process_name,host_name,port) values ('Default solr Server','solr',8983);
pc_db.1.y7826xns4ig6@pocho |
pc_db.1.y7826xns4ig6@pocho | PL/SQL procedure successfully completed.
pc_db.1.y7826xns4ig6@pocho |
pc_db.1.y7826xns4ig6@pocho | Elapsed: 00:00:10.63
pc_db.1.y7826xns4ig6@pocho | Disconnected from Oracle Database 18c Express Edition Release 18.0.0.0.0 - Production
pc_db.1.y7826xns4ig6@pocho | Version 18.4.0.0.0
pc_db.1.y7826xns4ig6@pocho |
pc_db.1.y7826xns4ig6@pocho | /opt/oracle/runOracle.sh: running /opt/oracle/scripts/setup/02-clean-up-files.sh
pc_db.1.y7826xns4ig6@pocho |
pc_db.1.y7826xns4ig6@pocho | DONE: Executing user defined scripts
pc_db.1.y7826xns4ig6@pocho |
pc_db.1.y7826xns4ig6@pocho | The Oracle base remains unchanged with value /opt/oracle
pc_db.1.y7826xns4ig6@pocho | #########################
pc_db.1.y7826xns4ig6@pocho | DATABASE IS READY TO USE!
pc_db.1.y7826xns4ig6@pocho | #########################


Testing your Solr Push connector

Once the stack is deployed and your XE instance is ready to use, a simple test is to connect to Oracle and create a scott account, for example:

$ /opt/sqlcl/bin/sql
Username? (''?) sys/Oracle_2018@localhost:1521/XEPDB1 as sysdba
Connected to:
Oracle Database 18c Express Edition Release 18.0.0.0.0 - Production
SQL> create user scott identified by tiger
2 default tablespace users
3 temporary tablespace temp
4* quota unlimited on users;
User SCOTT created.

SQL> grant connect,resource,pcuser to scott;

Grant succeeded.


Preparing Solr to index a sample table

Unlike Scotas OLS, which runs Apache Solr inside the RDBMS, Push Connector technology relies on an external Solr instance. To prepare the Solr mappings for indexing a sample table, just run these steps in a shell inside the Solr instance:

$ docker exec -ti pc_solr.1.tn78o19x9qov3635yhkyyea6r /bin/bash
solr@solr:/opt/solr$ bin/solr create_core -c source_big_pidx
....
Created new core 'source_big_pidx'
solr@solr:/opt/solr$ curl -X POST -H 'Content-type:application/json' --data-binary '{
> "add-field":{
> "name":"rowid",
> "type":"string",
> "indexed":true,
> "stored":true,
> "required":true,
> "multiValued":false
> },
> "add-field":{
> "name":"solridx",
> "type":"string",
> "indexed":true,
> "stored":true,
> "required":true,
> "multiValued":false
> }
> }' http://localhost:8983/solr/source_big_pidx/schema
{
"responseHeader":{
"status":0,
"QTime":336}}
solr@solr:/opt/solr$ sed -i 's/<uniqueKey>id/<uniqueKey>rowid/g' /opt/solr/server/solr/source_big_pidx/conf/managed-schema
solr@solr:/opt/solr$ curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=source_big_pidx"
{
"responseHeader":{
"status":0,
"QTime":1253}}
solr@solr:/opt/solr$ curl -X POST -H 'Content-type:application/json' --data-binary '{
> "add-field":{
> "name":"title",
> "type":"text_general",
> "indexed":true,
> "stored":true,
> "multiValued":true
> },
> "add-copy-field" :{
> "source":"*",
> "dest":"_text_"
> }
> }' http://localhost:8983/solr/source_big_pidx/schema
{
"responseHeader":{
"status":0,
"QTime":188}}
solr@solr:/opt/solr$ curl http://localhost:8983/solr/source_big_pidx/schema/uniquekey?wt=json
{
"responseHeader":{
"status":0,
"QTime":0},
"uniqueKey":"rowid"}


Index your first table

Once Apache Solr is ready to receive the Push Connector indexing information, we can do the following:

SQL> conn scott/tiger@localhost:1521/XEPDB1
Connected.
SQL> set long 10000 lines 140 pages 50 timing on echo on
SQL> set serveroutput on size 1000000
SQL> create table test_source_big as (
2 select owner,name,type,line,text from (select rownum as ntop_pos,q.* from
3 (select * from all_source) q)
4 where ntop_pos>=0 and ntop_pos<=5000
5 );

Table TEST_SOURCE_BIG created.

Elapsed: 00:00:05.460
SQL> CREATE INDEX SOURCE_BIG_PIDX ON TEST_SOURCE_BIG(TEXT)
2 INDEXTYPE IS PC.SOLR
3 parameters('{LogLevel:"INFO",
4 Updater:"solr@8983",
5 Searcher:"solr@8983",
6 SolrBase:"solr/source_big_pidx",
7 SyncMode:"OnLine",
8 BatchCount:5000,
9 CommitOnSync:true,
10 LockMasterTable:false,
11 IncludeMasterColumn:false,
12 DefaultColumn:"text",
13 HighlightColumn:"title",
14 ExtraCols:""text" value text,"title" value substr(text,1,256),"line_i" value line,"type_s" value type"}');

Index SOURCE_BIG_PIDX created.

Elapsed: 00:00:05.923

Note that the ExtraCols parameter uses the latest json_object column syntax.
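The same column expressions can be previewed as a plain query before creating the index; a sketch using the table created above, showing the JSON document the connector pushes for one row:

```sql
-- preview the JSON document the connector will push for a single row
SELECT JSON_OBJECT('text'   VALUE text,
                   'title'  VALUE substr(text,1,256),
                   'line_i' VALUE line,
                   'type_s' VALUE type) AS doc
  FROM test_source_big
 WHERE rownum <= 1;
```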


Doing some queries

If you look at the Solr instance logs during indexing you will see something like:

$ docker service logs -f pc_solr

pc_solr.1.tn78o19x9qov@pocho | 2019-01-28 13:00:50.281 INFO (qtp735937428-21) [ x:source_big_pidx] o.a.s.u.DirectUpdateHandler2 end_commit_flush
pc_solr.1.tn78o19x9qov@pocho | 2019-01-28 13:00:50.285 INFO (qtp735937428-21) [ x:source_big_pidx] o.a.s.u.p.LogUpdateProcessorFactory [source_big_pidx] webapp=/solr path=/update/json params={waitSearcher=true&ident=on&commit=true&softCommit=false&wt=json&expungeDeletes=false}{add=[AAAR79AAMAAAADjAAA (1623909145638862848), AAAR79AAMAAAADjAAB (1623909145757351936), AAAR79AAMAAAADjAAC (1623909145761546240), AAAR79AAMAAAADjAAD (1623909145765740544), AAAR79AAMAAAADjAAE (1623909145769934848), AAAR79AAMAAAADjAAF (1623909145772032000), AAAR79AAMAAAADjAAG (1623909145775177728), AAAR79AAMAAAADjAAH (1623909145778323456), AAAR79AAMAAAADjAAI (1623909145781469184), AAAR79AAMAAAADjAAJ (1623909145785663488), … (5000 adds)],commit=} 0 6174

This tells you that Apache Solr is receiving batches of 5000 rows for indexing from the XE instance.

Here are some sample outputs of querying Oracle with the Scotas Push Connector domain index functionality. Counting hits:

SQL> select SolrPushConnector.countHits('SOURCE_BIG_PIDX','*:*') from dual;

SOLRPUSHCONNECTOR.COUNTHITS('SOURCE_BIG_PIDX','*:*')
----------------------------------------------------
5000

Elapsed: 00:00:00.030

query with highlighting:

SQL> select sscore(1),shighlight(1) from test_source_big where scontains(text,'"procedure java"~10',1)>0 order by sscore(1) desc;

SSCORE(1)
----------
SHIGHLIGHT(1)
--------------------------------------------------------------------------------------------------------------------------------------------
1.3910314
{"title":[" -- <em>procedure</em> executing the recursive statement via DBMS_SQL, a <em>Java</em> stored\n"]}
Elapsed: 00:00:00.347

domain index pagination:

SQL> select /*+ DOMAIN_INDEX_SORT */ sscore(1) sc,text from test_source_big where scontains(text,'rownum:[1 TO 10] AND function',1)>0 order by sscore(1) asc;
1.3612658
member function getBlobVal(csid IN number, pflag IN number, indent IN number) return BLOB deterministic parallel_enable
…………
1.5529456
If it is based on a user defined type, this function will return
10 rows selected.

Elapsed: 00:00:00.112

faceting and pivot:

SQL> declare
2 obj f_info;
3 BEGIN
4 obj := SolrPushConnector.facet('SOURCE_BIG_PIDX',null,'facet.field=type_s&facet.limit=5&facet.pivot=type_s,line_i');
5 dbms_output.put_line(obj.fields.to_string);
6 dbms_output.put_line(obj.pivots.to_string);
7 END;
8 /
{"type_s":["PACKAGE",13493,"TYPE",6438,"FUNCTION",64,"PROCEDURE",5]}
{"type_s,line_i":[{"field":"type_s","value":"PACKAGE","count":13493,"pivot":[{"field":"line_i","value":1,"count":20},{"field":"line_i","value":2,"count":19},{"field":"line_i","value":3,"count":19},{"field":"line_i","value":4,"count":19},{"field":"line_i","value":5,"count":19}]},{"field":"type_s","value":"TYPE","count":6438,"pivot":[{"field":"line_i","value":1,"count":456},{"field":"line_i","value":2,"count":429},{"field":"line_i","value":3,"count":346},{"field":"line_i","value":4,"count":295},{"field":"line_i","value":5,"count":283}]},{"field":"type_s","value":"FUNCTION","count":64,"pivot":[{"field":"line_i","value":1,"count":14},{"field":"line_i","value":2,"count":14},{"field":"line_i","value":3,"count":10},{"field":"line_i","value":4,"count":6},{"field":"line_i","value":5,"count":5}]},{"field":"type_s","value":"PROCEDURE","count":5,"pivot":[{"field":"line_i","value":1,"count":1},{"field":"line_i","value":2,"count":1},{"field":"line_i","value":3,"count":1},{"field":"line_i","value":4,"count":1},{"field":"line_i","value":5,"count":1}]}]}
PL/SQL procedure successfully completed.

Elapsed: 00:00:00.080

More samples are available in the source, in the testSourceBig.sql script and the tutorial directory.

5 minutes tutorial with ElasticSearch Push Connector

Continuing the 5 minutes tutorial started in the previous post, this one is about how to start working with the Scotas ElasticSearch Push Connector technology.

After you install the Push Connector and your ElasticSearch (ES) server, you can do a simple test using the examples included in the db/tutorial/esTutorial.sql script in your distribution directory.

Assuming the ES server is installed with defaults, we can start it using the -f flag to see the log output on the console, for example:

[mochoa@localhost elasticsearch-0.19.4]$ export JAVA_HOME=/usr/local/jdk1.6
[mochoa@localhost elasticsearch-0.19.4]$ export PATH=$JAVA_HOME/bin:$PATH
[mochoa@localhost elasticsearch-0.19.4]$ bin/elasticsearch -f
[2012-06-22 11:00:40,089][INFO ][node ] [Grim Hunter] {0.19.4}[4061]: initializing …

[2012-06-22 11:00:46,899][INFO ][http ] [Grim Hunter] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.100.55:9200]}
[2012-06-22 11:00:46,900][INFO ][node ] [Grim Hunter] {0.19.4}[4061]: started
[2012-06-22 11:00:46,907][INFO ][gateway ] [Grim Hunter] recovered [0] indices into cluster_state

As you see above, ES server is listening on 9200 TCP/IP port which is the default value used by the Push Connector (PC).

First, on another console, we need to connect as a user with the DBA role to grant the PCUSER role to scott. Then we are able to start working with the connector examples using the traditional scott user; let's see:

-bash-4.2$ sqlplus “/ as sysdba”

SQL*Plus: Release 11.2.0.3.0 Production on Fri Jun 22 11:01:10 2012

Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> grant PCUSER TO scott;

Grant succeeded.

Now we are ready to start creating a table to store a simple tweet structure as is shown in many ES examples.

CREATE TABLE ES_TUTORIAL (
USER_ID VARCHAR2(30) PRIMARY KEY,
MESSAGE VARCHAR2(140),
POST_DATE TIMESTAMP);

and populate it with some rows:

insert into ES_TUTORIAL values (
'kimchy',
'trying out Elastic Search',
TO_TIMESTAMP('2009-11-15T14:12:12','YYYY-MM-DD"T"HH24:MI:SS'));

insert into ES_TUTORIAL (
select 'kimchy-'||rownum,'trying out Elastic Search '||rownum,TO_TIMESTAMP(sysdate+rownum,'YYYY-MM-DD"T"HH24:MI:SS') from dual connect by rownum <= 100);

insert into ES_TUTORIAL (
select 'kimchy-'||(rownum+100),'trying out Elastic Search '||rownum,TO_TIMESTAMP(sysdate+rownum,'YYYY-MM-DD"T"HH24:MI:SS') from dual connect by rownum <= 100);

insert into ES_TUTORIAL (
select 'kimchy-'||(rownum+200),'trying out Elastic Search '||rownum,TO_TIMESTAMP(sysdate+rownum,'YYYY-MM-DD"T"HH24:MI:SS') from dual connect by rownum <= 100);

commit;

Note that we are using the TO_TIMESTAMP SQL function with a Java-compatible format mask ('YYYY-MM-DD"T"HH24:MI:SS') to build the Oracle timestamp values.
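The format mask works in both directions; a quick sketch of the round trip between the ISO-like string ES expects and an Oracle timestamp:

```sql
-- Java/ISO-style string -> Oracle timestamp -> back to the same string
SELECT TO_CHAR(
         TO_TIMESTAMP('2009-11-15T14:12:12','YYYY-MM-DD"T"HH24:MI:SS'),
         'YYYY-MM-DD"T"HH24:MI:SS') AS es_date
  FROM dual;
-- ES_DATE: 2009-11-15T14:12:12
```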

Now we are ready to index the above table:

[mochoa@localhost pc]$ sqlplus scott/tiger@test

SQL*Plus: Release 11.2.0.3.0 Production on Fri Jun 22 11:38:21 2012

Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> set define off timing on serverout on
SQL> CREATE INDEX TWEET_ESIDX ON ES_TUTORIAL(MESSAGE) INDEXTYPE IS PC.ES
PARAMETERS('{Settings:{"number_of_shards" : 1},SyncMode:OnLine,LockMasterTable:false,IncludeMasterColumn:false,ExtraCols:"user_id \"user\",to_char(post_date,"YYYY-MM-DD\"T\"HH24:MI:SS") \"post_date\",message \"message\""}');

Index created.

Elapsed: 00:00:00.74

We can see the output on the ES console telling us that a new index has been created:

[2012-06-22 11:38:47,837][INFO ][cluster.metadata ] [Grim Hunter] [scott.tweet_esidx] creating index, cause [api], shards [1]/[1], mappings []
[2012-06-22 11:38:48,404][INFO ][cluster.metadata ] [Grim Hunter] [scott.tweet_esidx] update_mapping [row] (dynamic)

Note that the index was created at the ES server with the syntax [schema].[index_name], in lower case.

Here is an explanation of the parameters used at index creation time:

  • Settings: any ES settings parameter; in the example, the sharding configuration.
  • SyncMode: OnLine means Near Real Time (NRT) mode for inserts/updates and Real Time (RT) mode for deletes; for more information about this configuration please read the Scotas OLS White Paper, section Working modes, replacing Solr by ES.
  • LockMasterTable: false means that during the table scan for selecting rows there is no row-level locking; when this parameter is true, rows are selected in "for update" mode.
  • IncludeMasterColumn: false means that the values of the master column (the one defined in the create index syntax, MESSAGE) will not be included in the index structure; please see the next parameter.
  • ExtraCols: the list of columns, formats and field names used in the ES index structure. The example above shows that the ES index will include three fields: user, post_date and message. Note that field names are case sensitive and here we are lower-casing them.

Now the index is ready to use and has been populated with the 301 rows in the table. To check that all rows were properly indexed we can query the rowid field; this field is always indexed and works as the id in the ES index, let's see:

SQL> select ElasticSearch.countHits('TWEET_ESIDX','rowid:*') from dual;
301
Elapsed: 00:00:00.04

We can also select rows that match a specific string, for example looking for the string 53:

SQL> select escore(1),message from ES_TUTORIAL where econtains(message,'message:53',1)>0;
2.327857                      trying out Elastic Search 53
2.327857                      trying out Elastic Search 53
2.327857                      trying out Elastic Search 53
Elapsed: 00:00:00.14

A few comments about the above syntax for querying ES:

  • econtains is the operator used to query ES through the Push Connector:
    the first argument is the master column of the index, the one included in the create index syntax;
    the second argument is the query string, using ES's q parameter of the _search? URL;
    the third argument is the ancillary operator identifier (used by escore(), see the next item);
    the return value is >0 when the row matches the input query.
  • escore is used to get the score computed by the econtains() operator for a specific row; note that its input parameter must be correlated to the last argument of econtains().
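To see what econtains maps to under the hood, the equivalent direct ES request can be sketched as follows (hedged: the host and port assume the default setup above, and the index name follows the [schema].[index_name] convention already shown):

```shell
# build the _search URL that corresponds to econtains(message,'message:53',1)
ES_INDEX="scott.tweet_esidx"   # [schema].[index_name], lower-cased
Q="message:53"                 # same string passed as econtains' second argument
URL="http://localhost:9200/${ES_INDEX}/_search?q=${Q}"
echo "$URL"
# run it against a live server with: curl "$URL"
```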

Now let's explain what NRT means with a few queries; let's see what happens during an update:

SQL> update ES_TUTORIAL set message=message||'-mod' where rownum<=10;
10 rows updated.

The RDBMS tells us that 10 rows were updated, but NRT defers this update to the ES server until we commit the transaction; we can easily check with the select above whether it returns mod as a positive hit:

SQL> select escore(1),message from ES_TUTORIAL where econtains(message,'message:mod',1)>0;
no rows selected

Now, what happens when we commit the pending changes of the current transaction:

SQL> commit;
Commit complete.
SQL> select escore(1),message from ES_TUTORIAL where econtains(message,'message:mod',1)>0;
no rows selected
SQL> select escore(1),message from ES_TUTORIAL where econtains(message,'message:mod',1)>0;

1.6282115     trying out Elastic Search 76-mod
1.6282115     trying out Elastic Search 77-mod
1.6282115     trying out Elastic Search 78-mod
1.6282115     trying out Elastic Search 79-mod
1.6282115     trying out Elastic Search 80-mod
1.6282115     trying out Elastic Search 81-mod
1.6282115     trying out Elastic Search 82-mod
1.6282115     trying out Elastic Search 83-mod
1.6282115     trying out Elastic Search 84-mod
1.6282115     trying out Elastic Search 85-mod
10 rows selected.

After a few milliseconds the ES layer receives the update and the rows are returned as positive hits.

Now we can test RT deletes, for example:

SQL> delete from es_tutorial where rowid in (select rowid from ES_TUTORIAL where econtains(message,'message:mod',1)>0);
10 rows deleted.
SQL> select escore(1),message from ES_TUTORIAL where econtains(message,'message:mod',1)>0;
no rows selected

As you can see, after the 10 rows were deleted they immediately stopped being returned as positive hits (Real Time); for the current transaction these 10 rows are not available unless we do a rollback.

To check what happens on the ES side, we can open another concurrent transaction in parallel and check whether these rows are visible to other users (remember that the transaction is not committed yet). In a second connection:

SQL> select escore(1),message from ES_TUTORIAL where econtains(message,'message:mod',1)>0;
1.6282115     trying out Elastic Search 76-mod
…..
1.6282115     trying out Elastic Search 85-mod
10 rows selected.

Now we commit the deleted rows in the first connection:

SQL> commit;
Commit complete.
SQL> select escore(1),message from ES_TUTORIAL where econtains(message,'message:mod',1)>0;
no rows selected

and see what happens at the second connection:

SQL> select escore(1),message from ES_TUTORIAL where econtains(message,'message:mod',1)>0;
no rows selected

The second connection gets no hits because the rows were definitively deleted and the ES server received the changes.

Finally, by dropping the index, the ES server drops the index structure too; here is the SQL command:

SQL> drop index TWEET_ESIDX;
Index dropped.

and the ES server console:

[2012-06-22 14:52:59,460][INFO ][cluster.metadata ] [Grim Hunter] [scott.tweet_esidx] deleting index

And that's all. As you can see, the ElasticSearch Push Connector works like all Scotas searching products; functionality not covered in this blog post, such as faceting, sorting by score or other fields, detecting changes on multiple columns, etc., is also supported by the connector.