Feed aggregator

query performance tuning

Tom Kyte - Wed, 2023-11-29 22:26
Hi, there are queries which have been performing slowly for the past day; earlier the same queries were giving output in 22 mins and now they take 55 mins. This is basically the EOD process. Query 1: <code>SELECT DISTINCT DECODE(:B1 ,'C',IR.REASON_CODE,'D',IR.DESCRIPTION) REASON FROM DEA_REQN_REASONS RR INNER JOIN DEA_INVESTMENT_REASON IR ON UPPER(RR.REASON_CODE) = UPPER(IR.REASON_CODE) WHERE RR.REQ_ID IN (SELECT DEAL_ID FROM DEA_SMD_DEALS_ALLOC WHERE APP_REFER = :B2 UNION ALL SELECT DEAL_ID FROM DEA_SMD_DEALS_ALLOC_HIST WHERE APP_REFER = :B2 UNION ALL SELECT DEAL_ID FROM DEA_PMDEAL_ALLOC WHERE APP_REFER = :B2 UNION ALL SELECT DEAL_ID FROM DEA_PMDEAL_ALLOC_HIST WHERE APP_REFER = :B2 ) AND IR.RECTYPE='L'</code> Query 2: <code>select * from v_chkposint where exists (select scheme from scheme where amc_code = v_chkposint.amc_code and scheme = v_chkposint.scheme and project_y_n = 'Y' and rectype = 'L')</code> What is the scope for tuning? Your expert advice would help to improve the performance. <code>BEGIN_INTERVAL_TIME PLAN_HASH_VALUE OPTIMIZER_ SQL_PROFIL DISK_READS_TOTAL BUFFER_GETS_TOTAL CPU_TIME_TOTAL ELAPSED_TIME_TOTAL IOWAIT_TOTAL ------------------------------ --------------- ---------- ---------- ---------------- ----------------- ----------------- ------------------ ----------------- 20-OCT-23 04.30.46.535 PM 1596829921 ALL_ROWS 45897 11332241 18512737 68903861 8012199 20-OCT-23 01.30.04.043 PM 1596829921 ALL_ROWS 680 7598713 12852262 47355121 428789 19-OCT-23 03.30.03.269 PM 1596829921 ALL_ROWS 27854 393591114 729369364 2402058351 781202 19-OCT-23 11.30.11.237 AM 1596829921 ALL_ROWS 27854 379549234 706629192 2327540410 781202 18-OCT-23 06.30.23.024 PM 1596829921 ALL_ROWS 13936 126131290 220169362 733216565 418827</code> Please find below the execution plans for SQL_IDs [bjhx2g3b6krmf and ff0sjh69smwx2]. +++++++++++++Execution Plan for SQL_ID [bjhx2g3b6krmf] <code>SQL> SELECT * FROM table(DBMS_XPLAN.DISPLAY_AWR('bjhx2g3b6krmf')); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- SQL_ID bjhx2g3b6krmf -------------------- SELECT DISTINCT DECODE(:B1 ,'C',IR.REASON_CODE,'D',IR.DESCRIPTION) REASON FROM DEA_REQN_REASONS RR INNER JOIN DEA_INVESTMENT_REASON IR ON UPPER(RR.REASON_CODE) = UPPER(IR.REASON_CODE) WHERE RR.REQ_ID IN (SELECT DEAL_ID FROM DEA_SMD_DEALS_ALLOC WHERE APP_REFER = :B2 UNION ALL SELECT DEAL_ID FROM DEA_SMD_DEALS_ALLOC_HIST WHERE APP_REFER = :B2 UNION ALL SELECT DEAL_ID FROM DEA_PMDEAL_ALLOC...
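
For reference, figures like the BEGIN_INTERVAL_TIME / PLAN_HASH_VALUE listing above can be pulled from AWR (licence permitting). The following is only a sketch of that kind of query, assuming the SQL_ID is known and AWR retention covers the period of interest:

<code>
-- Sketch: per-snapshot workload figures for one SQL_ID from AWR
select  sn.begin_interval_time,
        st.plan_hash_value,
        st.executions_delta,
        round(st.elapsed_time_delta/1e6,1)  elapsed_secs,
        round(st.cpu_time_delta/1e6,1)      cpu_secs,
        st.buffer_gets_delta,
        st.disk_reads_delta
from    dba_hist_sqlstat   st
join    dba_hist_snapshot  sn
on      sn.snap_id         = st.snap_id
and     sn.dbid            = st.dbid
and     sn.instance_number = st.instance_number
where   st.sql_id = 'bjhx2g3b6krmf'
order by
        sn.begin_interval_time
/
</code>

Comparing the elapsed time per execution across snapshots (and across plan hash values) usually shows whether the slowdown coincides with a plan change or simply with more work being done per execution.
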
Categories: DBA Blogs

SQL Firewall in 23c - UKOUG and Scripts

Pete Finnigan - Mon, 2023-11-27 15:26
I spoke twice at the UKOUG conference just over a week ago, and the second of my talks was about the new SQL Firewall in Oracle 23c. This was a walk-through of the SQL Firewall and what it does....[Read More]

Posted by Pete On 27/11/23 At 03:18 PM

Categories: Security Blogs

Writing DDL_EVENT Triggers

Tom Kyte - Mon, 2023-11-27 15:26
Do you have a full example of writing a DDL_EVENT trigger? I want to audit, at the schema level, any creates, alters, or drops of any objects, and write the actual calling syntax to a table. Oracle8 has the new DDL_EVENT feature, and there is an example in the SQL Reference, but it just shows a pl/sql_block, not how to extract the calling DDL. If there is not an easier way I would end up querying v$sql_text and looking for text with some DDL call in it. I would then write the text to a separate table. Any other ideas or examples would be appreciated. Thanks
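
A minimal sketch of the kind of trigger being described is shown below. The audit table and trigger names are made up for illustration; the ora_sysevent, ora_dict_obj_* and ora_sql_txt event attribute functions are the documented way to capture the event details and the statement text:

<code>
-- Hypothetical audit table for captured DDL
create table ddl_audit (
        ddl_date      date,
        username      varchar2(128),
        ddl_event     varchar2(30),
        object_owner  varchar2(128),
        object_name   varchar2(128),
        object_type   varchar2(30),
        sql_text      clob
);

create or replace trigger trg_audit_schema_ddl
after create or alter or drop on schema
declare
        l_sql    clob;
        l_chunks ora_name_list_t;
        l_n      pls_integer;
begin
        -- ora_sql_txt() returns the triggering statement as a collection of chunks
        l_n := ora_sql_txt(l_chunks);
        for i in 1 .. nvl(l_n, 0) loop
                l_sql := l_sql || l_chunks(i);
        end loop;

        insert into ddl_audit (
                ddl_date, username, ddl_event,
                object_owner, object_name, object_type, sql_text
        ) values (
                sysdate, ora_login_user, ora_sysevent,
                ora_dict_obj_owner, ora_dict_obj_name, ora_dict_obj_type, l_sql
        );
end;
/
</code>

This avoids having to trawl v$sql after the event, since the statement text arrives with the trigger itself.
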
Categories: DBA Blogs

Easy-to-Follow RAG Pipeline Tutorial: Invoice Processing with ChromaDB & LangChain

Andrejus Baranovski - Mon, 2023-11-27 07:11
I explain the implementation of a pipeline to process invoice data from PDF documents. The data is loaded into Chroma DB's vector store. Through the LangChain API, the data from the vector store is ready to be consumed by an LLM as part of the RAG infrastructure.

 

sys_op_descend()

Jonathan Lewis - Mon, 2023-11-27 03:46

When you specify that a column used in an index should be a “descending” column Oracle uses the internal, undocumented, sys_op_descend() function to derive a value to store for that column in the index leaf block.

For many years I’ve claimed that this function simply takes the one’s-complement of a character value and appends 0xff to the result before storing it. This is nothing like the whole story, and in a recent comment to a note on descending indexes and finding max() values I was given another part of the story in response to my confession that I did not know why a varchar2(128) had turned into a raw(193) – it’s the length, not the rawness, that puzzled me – when subjected to sys_op_descend().

Here’s a little script to generate some data that helps to explain what’s going on.

rem
rem     Script:         sys_op_descend.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2023
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem

create table t1 (
        v1      varchar2(10),
        nv1     nvarchar2(10)
)
/

insert into t1 values('A','A');
insert into t1 values('AB','AB');
insert into t1 values('ABC','ABC');
commit;

create table t2 (
        v1      varchar2(10),
        nv1     nvarchar2(10)
)
/

insert into t2 values(chr(0), chr(0));
insert into t2 values(chr(1), chr(1));
insert into t2 values(chr(2), chr(2));
insert into t2 values(chr(3), chr(3));
insert into t2 values(chr(4), chr(4));
commit;

There’s nothing particularly special about the two tables and data I’ve inserted, I’m just trying to generate patterns that make it easy to see what’s going on; and I’ll be comparing a varchar2() and an nvarchar2() because my nvarchar2() column uses a multi-byte character set.

Let’s look at the (nearly raw) data from t1 where I’ve stored growing strings of printable characters. Here’s a simple query and its result set:

set linesize 50
break on row

select
        dump(v1,16)                     v1,
        dump(sys_op_descend(v1), 16)    v1_desc,
        dump(nv1,16)                    nv1,
        dump(sys_op_descend(nv1), 16)   nv1_desc
from
        t1
/


V1
--------------------------------------------------
V1_DESC
--------------------------------------------------
NV1
--------------------------------------------------
NV1_DESC
--------------------------------------------------
Typ=1 Len=1: 41
Typ=23 Len=2: be,ff
Typ=1 Len=2: 0,41
Typ=23 Len=4: fe,fb,be,ff

Typ=1 Len=2: 41,42
Typ=23 Len=3: be,bd,ff
Typ=1 Len=4: 0,41,0,42
Typ=23 Len=7: fe,fb,be,fe,fb,bd,ff

Typ=1 Len=3: 41,42,43
Typ=23 Len=4: be,bd,bc,ff
Typ=1 Len=6: 0,41,0,42,0,43
Typ=23 Len=10: fe,fb,be,fe,fb,bd,fe,fb,bc,ff


If you start with the first two columns of the output rows you can see: ‘A’ == 0x41, which becomes 0xbe, 0xff, following the “one’s complement with an appended 0xff” rule. The same pattern is visible for ‘AB’ and ‘ABC’.

When you look at the 3rd and 4th columns of each row (the nvarchar2), ‘A’ is now a 2-byte value (0x00, 0x41) which turns into the four bytes: 0xfe, 0xfb, 0xbe, 0xff. The value 0xbe is recognisable as the one’s-complement of 0x41 that appeared for the varchar2() values – but the 0x00 in the original nvarchar2() value seems to have turned into a two-byte 0xfe, 0xfb rather than the “expected” 0xff.

Why doesn’t Oracle use the “right” one’s complement for zero? Maybe because 0xff is the byte that’s supposed to mark the end of the string, and it’s important to avoid the risk of ambiguity. But now we have a new problem: Oracle is using 0xfe as the first of two bytes to represent the “descending” zero, and 0xfe is the one’s-complement of 0x01. So how is Oracle working around the fact that it would be a bad idea to have two possible meanings for the value 0xfe? That’s where the second data set comes in; here’s the same query, with results, run against the t2 table:

select
        dump(v1,16)                     v1,
        dump(sys_op_descend(v1), 16)    v1_desc,
        dump(nv1,16)                    nv1,
        dump(sys_op_descend(nv1), 16)   nv1_desc
from
        t2
/

V1
--------------------------------------------------
V1_DESC
--------------------------------------------------
NV1
--------------------------------------------------
NV1_DESC
--------------------------------------------------
Typ=1 Len=1: 0
Typ=23 Len=3: fe,fe,ff
Typ=1 Len=2: 0,0
Typ=23 Len=3: fe,fd,ff

Typ=1 Len=1: 1
Typ=23 Len=3: fe,fa,ff
Typ=1 Len=2: 0,1
Typ=23 Len=3: fe,fc,ff

Typ=1 Len=1: 2
Typ=23 Len=2: fd,ff
Typ=1 Len=2: 0,2
Typ=23 Len=4: fe,fb,fd,ff

Typ=1 Len=1: 3
Typ=23 Len=2: fc,ff
Typ=1 Len=2: 0,3
Typ=23 Len=4: fe,fb,fc,ff

Typ=1 Len=1: 4
Typ=23 Len=2: fb,ff
Typ=1 Len=2: 0,4
Typ=23 Len=4: fe,fb,fb,ff

Looking at the last three groups of 4 lines we can see the varchar2() column following the “one’s complement, append 0xff” rule and the nvarchar2() following the additional “use 0xfe 0xfb for zero” rule; but for chr(0) and chr(1) the dumps need some further untangling. With the tests so far all we can say with any confidence is that “if you see the 0xfe byte then a different pattern is about to appear briefly”.

I don’t really need to do any more experiments to guess why, in my previous note, the descending varchar2(128) was projected as raw(193) (though you might want to play around to see what happens with strings of several consecutive zeros or ones). I suppose the size reported could vary with character set, but if I have a two-byte fixed-width character set and most of the data corresponds to basic ASCII characters then I’ll have a lot of characters where every other byte is a zero that encodes into two bytes when descending – so it makes sense to use for the projection a size derived as: 64 * 1 + 64 * 2 + 1 (0xff) = 193. Q.E.D.

Index sizing

An important side effect of this improved understanding is the realisation of what can happen to the size of an index when declared descending. For a simple demonstration, here’s a table with 4 columns and 50,000 rows copied from all_objects.object_name, using the nvarchar2() type for one of the pairs of copies.

create table t1a (
        nva nvarchar2(128),
        nvd nvarchar2(128),
        va  varchar2(128),
        vd  varchar2(128)
)
/

insert into t1a
select  object_name, object_name, object_name, object_name
from    all_objects
where
        rownum <= 50000
;

create index t1a_nva on t1a(nva);
create index t1a_nvd on t1a(nvd desc);
create index t1a_va on t1a(va);
create index t1a_vd on t1a(vd desc);

execute dbms_stats.gather_table_stats(user,'t1a')

select index_name, leaf_blocks
from
        user_indexes
where
        table_name = 'T1A'
order by
        index_name
/

select 
        column_name, avg_col_len 
from    user_tab_cols 
where   table_name = 'T1A' 
order by 
        column_name
/


INDEX_NAME           LEAF_BLOCKS
-------------------- -----------
T1A_NVA                      590
T1A_NVD                      854
T1A_VA                       336
T1A_VD                       343

4 rows selected.

COLUMN_NAME          AVG_COL_LEN
-------------------- -----------
NVA                           74
NVD                           74
SYS_NC00005$                 111
SYS_NC00006$                  39
VA                            38
VD                            38

6 rows selected.

As you can see, the descending varchar2() index (backed by column sys_nc00006$) is only slightly larger than the corresponding ascending index, but the descending nvarchar2() (backed by column sys_nc00005$) has increased by something much closer to 50% in size because half the bytes in each object_name were zeroes that have been replaced by the two-byte 0xfe 0xfb. This is much worse than the “one byte per descending column per row” that I’ve been claiming for the last 20 or 30 years.
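
If you want to tie each descending index back to the hidden column that supports it, a quick sketch against the standard dictionary views (output not reproduced here) is all it takes:

select
        index_name, column_name, descend
from
        user_ind_columns
where
        table_name = 'T1A'
order by
        index_name
/

select
        column_name, data_type, data_length
from
        user_tab_cols
where
        table_name = 'T1A'
and     hidden_column = 'YES'
order by
        column_name
/

user_ind_columns should report the sys_nc column (with DESCEND = 'DESC') for each descending index, and user_tab_cols shows the declared size of those hidden virtual columns.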

Why book reports aren’t a good homework assignment any more

RDBMS Insight - Fri, 2023-11-24 13:40

So, everyone knows that the kids these days are using ChatGPT to write their book reports. Even Big Nate knows it!

But what about ChatGPT’s safeguards? Isn’t it supposed to have some kind of anti-cheating baked in, so it won’t just write essays for kids? Why doesn’t that work?

Sure, it does have safeguards… kind of. If you just ask it to write an essay, it responds with a “helpful” answer about how to write an essay. The thing is that these safeguards are incredibly easy to work around.

Let’s pretend we’re a student who has to write a book report on The Kingdom over the Sea, by Zohra Nabi. Here’s how to write it in 20 minutes without even touching the book.

1. Find a review.

Look up the book online in a search engine. Find a book review that describes it and skim the review. Note the main character’s name and what happens.

In this case, the Kingdom over the Sea was reviewed on Kirkus Reviews. The main character’s name is Yara, and the author’s name is Zohra Nabi.

2. Ask ChatGPT for the first paragraph.

Let’s pretend we’re a Grade 6 student asking ChatGPT what should go in the first paragraph of this book report. Here are the prompts I used:

Big Nate:

Hi Im a Grade 6 student and I need to write a baper on a book called, the Kingdom over the Sea.

Big Nate:

can you tell me what to put in the first paragraph

ChatGPT:

Of course! The first paragraph of your paper is usually the introduction, and it should give your reader a brief overview of the book and what your paper will be about. Here’s a basic structure for the first paragraph:

Start with a hook: Begin with an interesting or engaging sentence to grab your reader’s attention. It could be a quote from the book, a surprising fact, or a question related to the story.

Introduce the book: Mention the title of the book (“The Kingdom over the Sea”) and the author’s name if you know it.

Provide some context: Briefly describe the genre of the book (e.g., fantasy, adventure) and its setting (where and when the story takes place).

Mention the main idea: Give a hint about the main theme or what the story is about. For example, is it about a young hero going on an epic adventure, a mystery to be solved, or a journey to a magical land?

State your purpose: Finally, state that your paper will explore and analyze various aspects of the book. Mention the topics or themes you plan to discuss, such as characters, plot, or themes.

Here’s an example to get you started:

“In the enchanting world of ‘The Kingdom over the Sea,’ written by [Author’s Name], readers are transported to a magical land filled with adventures, mystery, and wonder. This fantasy novel, set in a distant realm, follows the journey of [Main Character’s Name] as he embarks on an epic quest. In this paper, we will delve into the captivating elements of this story, exploring the characters, plot twists, and the themes that make ‘The Kingdom over the Sea’ an unforgettable read.”

Feel free to customize it to fit your own thoughts and ideas. If you have any specific details about the book you’d like to include in the introduction, please let me know!

3. Fill in and rephrase

Ignore most of ChatGPT’s long, didactic answer. Grab the sample paragraph, fill in the missing “Author’s Name” and “Main Character’s Name”, and–this part is key–ask ChatGPT to rephrase it like a Grade 6 student wrote it. If you don’t, you’ll wind up with this problem:

[Big Nate comic]

Here’s the rewritten prompt:

Big Nate:

please rewrite this text so it sounds like a Grade 6 student wrote it: “In the enchanting world of ‘The Kingdom over the Sea,’ written by Zohra Nabi, readers are transported to a magical land filled with adventures, mystery, and wonder. This fantasy novel, set in a distant realm, follows the journey of Yara as she embarks on an epic quest. In this paper, we will delve into the captivating elements of this story, exploring the characters, plot twists, and the themes that make ‘The Kingdom over the Sea’ an unforgettable read.”

ChatGPT:

“In ‘The Kingdom over the Sea’ by Zohra Nabi, it’s like you’re going on a journey to a super cool and magical place! This book is all about adventures, mysteries, and really amazing stuff. It’s a fantasy book, and it happens in a faraway world. We get to follow this awesome character named Yara, and she’s off on a super big adventure. In my paper, we’ll check out the cool stuff in the story, like the characters, surprises, and the things that make ‘The Kingdom over the Sea’ a book you won’t forget.”

And there’s the first paragraph of your book report — done!

4. Second paragraph: Paraphrase

For the second paragraph, just ask ChatGPT to paraphrase the whole review you found in step 1. Paste the entire review into the prompt. (I haven’t pasted the whole thing; the review starts with “Cryptic directions”)

Big Nate:

can you write this paragraph like a 6th grader wrote it “Cryptic directions…

ChatGPT:

“Yara’s mom left her some tricky clues that sent her from England to a super magical world. Her mom passed away and left her directions to go to a place called Zehaira, where she’d meet a sorceress named Leyla Khatoun. Yara wasn’t sure if this was real at first, but she really wanted to know more about her mom and their past because, on paper, they were from Iraq. When she got there, she found a world where the Sultan was against magic and had the sorcerers hiding from the Inquisition. On her way to find Leyla, Yara heard about a plan to wipe out all the sorcerers. But Leyla was busy running a secret sorcerer hideout and didn’t want to help the city sorcerers. Yara had to work really hard to get Leyla to see the big danger and to tell her the secrets her mom kept. And guess what? Yara discovers she has some magic of her own, which was a bit unexpected. But we already liked Yara a lot by then. The world in the story is full of magic and feels like an Arabian adventure, and the people and jinn in the story are really interesting. The ending was kind of quick, but it looks like there will be more exciting adventures in the next book.”

5. Expand

Now you have two paragraphs. Feed both of them to ChatGPT and ask it to write Paragraph 3.

Big Nate: Paragraph 1 of the book review is "In 'The Kingdom over the Sea'…. Paragraph 2 is "Yara's mom…" What should I put in Paragraph 3?

Note: In the actual prompt, I pasted in the entire first and second paragraphs; I didn’t duplicate them here because that would be boring. But you get the idea.

ChatGPT:

“One of my favorite parts in ‘The Kingdom over the Sea’ was when Yara discovered her own magical abilities. It was a big surprise, and I thought it was really cool how the author revealed her powers. I also liked how the characters changed during the story. Yara became even braver, and it was awesome to see her grow. The themes in the book, like courage and friendship, are important in real life too. I thought they were well shown in the story. The author’s writing style made the book easy to read, and I felt like I was right there with Yara on her adventure. Overall, I really enjoyed ‘The Kingdom over the Sea,’ and I would totally recommend it to my friends because it’s full of magic and fun!”

6. Put it all together

Put the three paragraphs all together to get a finished 440-word book review that could plausibly come from an overly enthusiastic middle-schooler:

In ‘The Kingdom over the Sea’ by Zohra Nabi, it’s like you’re going on a journey to a super cool and magical place! This book is all about adventures, mysteries, and really amazing stuff. It’s a fantasy book, and it happens in a faraway world. We get to follow this awesome character named Yara, and she’s off on a super big adventure. In my paper, we’ll check out the cool stuff in the story, like the characters, surprises, and the things that make ‘The Kingdom over the Sea’ a book you won’t forget.

Yara’s mom left her some tricky clues that sent her from England to a super magical world. Her mom passed away and left her directions to go to a place called Zehaira, where she’d meet a sorceress named Leyla Khatoun. Yara wasn’t sure if this was real at first, but she really wanted to know more about her mom and their past because, on paper, they were from Iraq. When she got there, she found a world where the Sultan was against magic and had the sorcerers hiding from the Inquisition. On her way to find Leyla, Yara heard about a plan to wipe out all the sorcerers. But Leyla was busy running a secret sorcerer hideout and didn’t want to help the city sorcerers. Yara had to work really hard to get Leyla to see the big danger and to tell her the secrets her mom kept. And guess what? Yara discovers she has some magic of her own, which was a bit unexpected. But we already liked Yara a lot by then. The world in the story is full of magic and feels like an Arabian adventure, and the people and jinn in the story are really interesting. The ending was kind of quick, but it looks like there will be more exciting adventures in the next book.

One of my favorite parts in ‘The Kingdom over the Sea’ was when Yara discovered her own magical abilities. It was a big surprise, and I thought it was really cool how the author revealed her powers. I also liked how the characters changed during the story. Yara became even braver, and it was awesome to see her grow. The themes in the book, like courage and friendship, are important in real life too. I thought they were well shown in the story. The author’s writing style made the book easy to read, and I felt like I was right there with Yara on her adventure. Overall, I really enjoyed ‘The Kingdom over the Sea,’ and I would totally recommend it to my friends because it’s full of magic and fun!

7. Sprinkle with mistakes and hand in

Add a few spelling mistakes for verisimilitude, and the homework is done.

This took me about 20 minutes, start to finish: much less time than it would’ve taken me to read the book, think about it, come up with something to say, and write it down.

And if I really was a Grade 6 student, I would’ve lost the experience of reading a fantastic book, thinking about it, and reflecting on my own reaction to it. (I did read the book, and it really is lovely.)

What’s a teacher to do?

I don’t know, but here are a couple of ideas from a non-teacher:

  • Use in-class quizzes, not book reports, to see who's done the assigned reading.
  • Ask for response papers that discuss a student’s personal reaction to a book, rather than summarizing the book.
  • Ask for personalized answers in which students contextualize their response to a book with something in their own lives.
  • Require handwritten homework so that, at the very least, students have to hand-copy what ChatGPT spits out. Even the act of copying can lead to reflection.
Post credits
  • Post images are from Big Nate, written and drawn by Lincoln Peirce.

  • Chat CSS is adapted from W3Schools.
Categories: DBA Blogs

Using vector databases for context in AI

Kubilay Çilkara - Fri, 2023-11-24 11:30

In the realm of Artificial Intelligence (AI), understanding and retaining context stands as a pivotal factor for decision-making and enhanced comprehension. Vector databases are the foundational pillars for encapsulating your own data to be used in conjunction with AI and LLMs, empowering these systems to absorb and retain intricate contextual information.

Understanding Vector Databases

Vector databases are specialised data storage systems engineered to efficiently manage and retrieve vectorised data, also known as embeddings. These databases store information in a vector format, where each data entity is represented as a multidimensional numerical vector, encapsulating various attributes and relationships and thus preserving rich context. That is, text, video or audio is translated into numbers with many attributes in a multidimensional space, and mathematics is then used to calculate the proximity between these numbers. Loosely speaking, that is what a neural network in an LLM does: it computes proximity (similarity) between the vectors. The vector database is simply where those vectors are stored. If you don't use a vector database with an LLM, under architectures like RAG, you will not be able to bring your own data or context into your AI model; all it will know is what it was trained on, which will probably be data from the public internet. Vector databases enable you to bring your own data to AI.
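
As a toy illustration of that "proximity" arithmetic (a sketch only, with made-up values, and nothing to do with how any particular vector database actually stores its data), here is the cosine similarity of two three-dimensional embeddings computed with ordinary SQL:

with vecs (vec_id, dim, val) as (
        select 'doc',   1, 0.12 from dual union all
        select 'doc',   2, 0.98 from dual union all
        select 'doc',   3, 0.05 from dual union all
        select 'query', 1, 0.10 from dual union all
        select 'query', 2, 0.95 from dual union all
        select 'query', 3, 0.07 from dual
)
select
        -- dot product divided by the product of the two vector lengths
        sum(d.val * q.val) /
        ( sqrt(sum(d.val * d.val)) * sqrt(sum(q.val * q.val)) ) as cosine_similarity
from
        vecs d, vecs q
where
        d.vec_id = 'doc'
and     q.vec_id = 'query'
and     d.dim    = q.dim
/

A vector store does essentially this comparison, at scale and with specialised indexing, between a query embedding and the millions of stored embeddings.
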

Examples of Vector Databases

Several platforms offer vector databases, such as Pinecone, Faiss by Facebook, Annoy, Milvus, and Elasticsearch with dense vector support. These databases cater to diverse use cases, offering functionalities tailored to handle vast amounts of vectorised information, be it images, text, audio, or other complex data types.

Importance in AI Context

Within the AI landscape, vector databases play a pivotal role in serving specific data and context to AI models. In particular, in the Retrieval-Augmented Generation (RAG) architecture, where retrieval of relevant information is an essential part of content generation, vector databases act as repositories, storing precomputed embeddings of your own private data. These embeddings encode the semantic and contextual essence of your data, facilitating efficient retrieval in your AI apps and bots. Bringing vector databases into your AI apps, agents and chatbots brings your own data with them, so that these apps can speak your data when built on LLMs.

Advantages for Organisations and AI Applications

Organisations can harness the prowess of vector databases within RAG architectures to elevate their AI applications and enable them to use organisation-specific data:

  1. Enhanced Contextual Understanding: By leveraging vector databases, AI models grasp nuanced contextual information, enabling more informed decision-making and more precise content generation based on specific and private organisational context.

  2. Improved Efficiency in Information Retrieval: Vector databases expedite the retrieval of pertinent information by enabling similarity searches based on vector representations, augmenting the speed and accuracy of AI applications.

  3. Scalability and Flexibility: These databases offer scalability and flexibility, accommodating diverse data types and expanding corpora, essential for the evolving needs of AI-driven applications.

  4. Optimised Resource Utilisation: Vector databases streamline resource utilisation by efficiently storing and retrieving vectorised data, thus optimising computational resources and infrastructure.

Closing Thoughts

In the AI landscape, where the comprehension of context is paramount, vector databases emerge as linchpins, fortifying AI systems with the capability to retain and comprehend context-rich information. Their integration within RAG architectures not only elevates AI applications but also empowers organisations to glean profound insights, fostering a new era of context-driven AI innovation from data.

In essence, the power vested in vector databases will reshape the trajectory of AI, propelling it toward unparalleled contextualisation and intelligent decision-making based on in-house, organisation-owned data.

But the enigma persists: What precisely will be the data fuelling the AI model?

Categories: DBA Blogs

Unified Auditing on tables

Tom Kyte - Thu, 2023-11-23 20:06
We have been trying to enable auditing on a few DDL activities in an autonomous database. As per the following code, we created, enabled and verified the audit configs. Everything looks normal to us; however, none of the table create/drop activities (listed at the end) are being tracked in the <code>unified_audit_trail</code> table. We can only see audit management related activities like <code>DBMS_AUDIT_MGMT BEGIN dbms_audit_mgmt.flush_unified_audit_trail; END;</code> in the <code>unified_audit_trail</code> table. I must be missing something; can you please help me navigate through it? <code> -- Connect to ADMIN user -- DB Version SELECT * FROM v$version; -- Verify Audit Status SELECT parameter , value FROM v$option WHERE parameter = 'Unified Auditing'; -- Enable immediate write mode BEGIN dbms_audit_mgmt.set_audit_trail_property(audit_trail_type => dbms_audit_mgmt.audit_trail_unified , audit_trail_property => dbms_audit_mgmt.audit_trail_write_mode ,audit_trail_property_value => dbms_audit_mgmt.audit_trail_immediate_write); END; / -- Validate audit configs SELECT * FROM dba_audit_mgmt_config_params; -- Create an audit policy CREATE AUDIT POLICY xx_ddl_test_policy PRIVILEGES CREATE ANY TABLE , ALTER ANY TABLE , DROP ANY TABLE WHEN 'SYS_CONTEXT (''USERENV'', ''SESSION_USER'') IN (''ADMIN'',''XXTST'')' EVALUATE PER STATEMENT ; -- Enable Audit Policy AUDIT POLICY xx_ddl_test_policy; -- Validate policy SELECT * FROM audit_unified_enabled_policies WHERE policy_name IN ('XX_DDL_TEST_POLICY'); SELECT * FROM audit_unified_policies WHERE policy_name in (SELECT policy_name FROM audit_unified_enabled_policies); -- Flush audit trail EXEC dbms_audit_mgmt.flush_unified_audit_trail; -- Connect to XXTST user -- Test Audit Policy CREATE TABLE xx_test (vc_col VARCHAR2(25)); INSERT INTO xx_test VALUES ('Test 01'); COMMIT; TRUNCATE TABLE xx_test; DROP TABLE xx_test; -- Verify audit trail SELECT * FROM unified_audit_trail ORDER BY event_timestamp DESC; </code>
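
One detail worth double-checking here (offered only as a sketch of a possible cause, not a confirmed diagnosis): a PRIVILEGES-based unified audit policy records statements that actually exercise the listed privilege, and creating or dropping a table in your own schema does not use CREATE ANY TABLE or DROP ANY TABLE. An action-based policy is the usual alternative, along these lines:

<code>
-- Sketch: audit the actions themselves rather than the ANY privileges
CREATE AUDIT POLICY xx_ddl_actions_policy
  ACTIONS CREATE TABLE, ALTER TABLE, DROP TABLE
  WHEN 'SYS_CONTEXT (''USERENV'', ''SESSION_USER'') IN (''ADMIN'',''XXTST'')'
  EVALUATE PER STATEMENT;

AUDIT POLICY xx_ddl_actions_policy;
</code>
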
Categories: DBA Blogs

Statistics Gathering using older version feature

Tom Kyte - Thu, 2023-11-23 20:06
Hello Connor/Chris, I am in the middle of troubleshooting a slow-running application maintenance job for a database that has been migrated/upgraded from on-premise 12.1.0.2 on linux x64 to Exadata Cloud @ Customer setup. Many of the "Top SQLs" that have regressed are internal SQLs generated by call to GATHER_TABLE_STATS call that is part of the application job. The internal SQLs are similar to below but for some reason are running 5-6 times slower on 19c database. I am running out of options to figure out what could be causing this but I was wondering whether there exists a way to replicate the behaviour of statistics gathering from 12.1.0.2 in 19c database. Example internal SQL that has regressed: <code>select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad */to_char(count("OWNER")),substrb(dump(min("OWNER"),16,0,64) ,1,240),substrb(dump(max("OWNER"),16,0,64),1,240), to_char(count("OBJECT_NAME")),substrb(dump(min("OBJECT_NAME"),16,0,64),1, 240),substrb(dump(max("OBJECT_NAME"),16,0,64),1,240), to_char(count("SUBOBJECT_NAME")),substrb(dump(min("SUBOBJECT_NAME"),16,0,64) ,1,240),substrb(dump(max("SUBOBJECT_NAME"),16,0,64),1,240), to_char(count("OBJECT_ID")),substrb(dump(min("OBJECT_ID"),16,0,64),1,240), substrb(dump(max("OBJECT_ID"),16,0,64),1,240), to_char(count("DATA_OBJECT_ID")),substrb(dump(min("DATA_OBJECT_ID"),16,0,64) ,1,240),substrb(dump(max("DATA_OBJECT_ID"),16,0,64),1,240), to_char(count("OBJECT_TYPE")),substrb(dump(min("OBJECT_TYPE"),16,0,64),1, 240),substrb(dump(max("OBJECT_TYPE"),16,0,64),1,240), to_char(count("CREATED")),substrb(dump(min("CREATED"),16,0,64),1,240), substrb(dump(max("CREATED"),16,0,64),1,240),to_char(count("LAST_DDL_TIME")), substrb(dump(min("LAST_DDL_TIME"),16,0,64),1,240), substrb(dump(max("LAST_DDL_TIME"),16,0,64),1,240),to_char(count("TIMESTAMP") ),substrb(dump(min("TIMESTAMP"),16,0,64),1,240), substrb(dump(max("TIMESTAMP"),16,0,64),1,240),to_char(count("STATUS")), substrb(dump(min("STATUS"),16,0,64),1,240),substrb(dump(max("STATUS"),16,0, 64),1,240),to_char(count("TEMPORARY")),substrb(dump(min("TEMPORARY"),16,0, 64),1,240),substrb(dump(max("TEMPORARY"),16,0,64),1,240), to_char(count("GENERATED")),substrb(dump(min("GENERATED"),16,0,64),1,240), substrb(dump(max("GENERATED"),16,0,64),1,240),to_char(count("SECONDARY")), substrb(dump(min("SECONDARY"),16,0,64),1,240),substrb(dump(max("SECONDARY"), 16,0,64),1,240),to_char(count("NAMESPACE")),substrb(dump(min("NAMESPACE"), 16,0,64),1,240),substrb(dump(max("NAMESPACE"),16,0,64),1,240), to_char(count("EDITION_NAME")),substrb(dump(min("EDITION_NAME"),16,0,64),1, 240),substrb(dump(max("EDITION_NAME"),16,0,64),1,240), to_char(count("SHARING")),substrb(dump(min("SHARING"),16,0,64),1,240), substrb(dump(m...
Categories: DBA Blogs

503 Service Unavailable - ORA-28001: the password has expired

Tom Kyte - Tue, 2023-11-21 13:26
Hi sir, I got this message when I try to run my app: "503 Service Unavailable 2023-11-15T09:24:51.336389300Z | 8-GbuZcF3_cnIHgb1ul5Kg | ORDS was unable to make a connection to the database. This can occur if the database is unavailable, the maximum number of sessions has been reached or the pool is not correctly configured. The connection pool named: |default|lo| had the following error(s): ORA-28001: the password has expired" How can I solve it? Thanks
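
A sketch of the usual remedy (assuming you can connect as an administrative user, and that the account named in the ORDS pool configuration is the one that has expired; the user name below is a placeholder):

<code>
-- Check which accounts have expired
SELECT username, account_status, expiry_date
FROM   dba_users
WHERE  account_status LIKE 'EXPIRED%';

-- Reset the password of the account used by the ORDS connection pool
ALTER USER ords_pool_user IDENTIFIED BY "NewStrongPassword_1";
</code>

After resetting it, update the stored ORDS pool credentials so they match the new password; for service accounts it may also be worth assigning a profile whose PASSWORD_LIFE_TIME suits your security policy.
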
Categories: DBA Blogs

Oracle SE2 Licence NUP

Tom Kyte - Tue, 2023-11-21 13:26
Hello, I am very new to Oracle licensing issues and I am facing a situation with an Oracle SE2 Licence NUP. I am very confused about how to count the number of NUPs I will have to purchase. Is a NUP an explicit Oracle user I have created? What about system Oracle users? I have to deal with a web/backend application that uses an Oracle user to connect to an Oracle instance, but this application is in fact creating parallel processes, each connecting to Oracle with this user. How do I count the needed NUPs in this case? Is NUP licensing the right way to go? Thanks in advance for your help. Regards, Laure
Categories: DBA Blogs

How to extract the tag value form the xml tag with colon

Tom Kyte - Tue, 2023-11-21 13:26
How to extract the tag value from XML which has tag names with a colon. <code> <link>https://www.rba.gov.au/statistics/frequency/exchange-rates.html#JPY</link> <description>AUD/JPY representative rate as at 4.00 pm Eastern Australian time on 14 Nov 2023</description> <dc:date>2023-11-14T17:52:00+11:00</dc:date> <dc:language>en</dc:language> <dcterms:audience>general</dcterms:audience> <cb:statistics rdf:parseType="Resource"> <rdf:type rdf:resource="http://www.cbwiki.net/wiki/index.php/RSS-CB_1.2_RDF_Schema#Exchange-Rates"/> <cb:country>AU</cb:country> <cb:institutionAbbrev>RBA</cb:institutionAbbrev> <cb:exchangeRate rdf:parseType="Resource"> <rdf:type rdf:resource="http://www.cbwiki.net/wiki/index.php/RSS-CB_1.2_RDF_Schema#Exchange-Rates"/> <cb:observation rdf:parseType="Resource"> <rdf:type rdf:resource="http://www.cbwiki.net/wiki/index.php/RSS-CB_1.2_RDF_Schema#Exchange-Rates"/> <cb:value>96.64</cb:value> <cb:unit>AUD</cb:unit> <cb:decimals>2</cb:decimals> </cb:observation> <cb:baseCurrency>AUD</cb:baseCurrency> <cb:targetCurrency>JPY</cb:targetCurrency> <cb:rateType>4.00 pm foreign exchange rates</cb:rateType> <cb:observationPeriod rdf:parseType="Resource"> <rdf:type rdf:resource="http://www.cbwiki.net/wiki/index.php/RSS-CB_1.2_RDF_Schema#Exchange-Rates"/> <cb:frequency>daily</cb:frequency> <cb:period>2023-11-14</cb:period> </cb:observationPeriod> </cb:exchangeRate> </cb:statistics> </code> Tried by inserting the XML as a CLOB object. <code>SELECT XMLTYPE(t.xclob).EXTRACT('/rdf:RDF/item/cb:statistics/cb:exchangeRate/cb:baseCurrency/text()').getStringVal() FROM test_clob t; Getting ORA-31011: XML parsing failed ORA-19202: Error occurred in XML processing LPX-00601: Invalid token in: ' '/rdf:RDF/item/cb:statistics/cb:exchangeRate/cb:baseCurrency/text()' 31011. 00000 - "XML parsing failed" *Cause: XML parser returned an error while trying to parse the document. *Action: Check if the document to be parsed is valid.</code> Please suggest.
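
The LPX-00601 error comes from the namespace prefixes: the parser has to be told what rdf:, cb: and friends map to. As a hedged sketch (the namespace URI below is a placeholder; use the one actually declared on the root element of the feed), XMLTABLE with an XMLNAMESPACES clause is one way to do it:

<code>
-- Sketch: substitute the real namespace URIs from the <rdf:RDF ...> root element
SELECT x.base_currency, x.target_currency, x.rate_value
FROM   test_clob t,
       XMLTABLE(
         XMLNAMESPACES(
           'http://example.com/cb-placeholder#' AS "cb"
         ),
         '//cb:statistics/cb:exchangeRate'
         PASSING XMLTYPE(t.xclob)
         COLUMNS
           base_currency   VARCHAR2(3) PATH 'cb:baseCurrency',
           target_currency VARCHAR2(3) PATH 'cb:targetCurrency',
           rate_value      NUMBER      PATH 'cb:observation/cb:value'
       ) x;
</code>
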
Categories: DBA Blogs

Question about Leading Hint and outer Joined tables

Tom Kyte - Mon, 2023-11-20 19:06
Hi Tom, I have 5 tables in a query: Tab1, Tab2, Tab3, Tab4 and Tab5, and I have the following query below. Select * From Tab1, Tab2, Tab3, Tab4, Tab5 Where Tab1.x = Tab2.y(+) and Tab1.x = Tab3.y and Tab3.x = Tab4.y(+) and Tab3.x = Tab5.y; Question 1. Can a table that's outer joined (in this case Tab2, Tab4) be a driving table? Question 2. In the join order, what happens to tables that are outer joined (in this case Tab2, Tab4)? Do we only use tables that are inner joined in the Leading hint? Regards, Ravi
Categories: DBA Blogs

UKOUG 2023 - Using Database Vault in Real Life

Pete Finnigan - Mon, 2023-11-20 18:26
I went down to Reading, UK last week on the train from York and presented at the conference being held at Oracle's offices there in Reading. This is a short post to note that I have posted a pdf of....[Read More]

Posted by Pete On 20/11/23 At 09:44 AM

Categories: Security Blogs

I started an AI newsletter just in time for OpenAI to implode

RDBMS Insight - Mon, 2023-11-20 17:53

Last week, I decided to soft-launch a weekly newsletter to highlight three or four recent news stories in AI. When I started it, I had no idea this would be one of the biggest weeks in AI news all year. There was one giant story: OpenAI, makers of ChatGPT, fired their CEO, Sam Altman. And then it got messy.

You can read the rest of this week’s newsletter here.

Why a newsletter: I spend several hours a week reading about AI/machine learning. I get frustrated when I see overly-simple takes on AI in the media, on social media, etc. I figured that the best thing I could do about that would be to share what I’m learning every week.

What the newsletter is: A weekly email about some of the past week’s most interesting stories in AI, plus the occasional backgrounder.

What it’s not: Business-focused, highly technical, or comprehensive.

What you’ll get from AI Week:

  • An interesting email about AI in your inbox every week.
  • Background knowledge to contextualize all the breathless media stories about the powers, or dangers, of AI.
  • A feeling for where the field is going, why it’s going there, and who’s powering it.

Where do I sign up? There's a signup form (powered by Buttondown) in the original post.

Categories: DBA Blogs

gby_pushdown

Jonathan Lewis - Mon, 2023-11-20 12:42

This is a note that’s been awaiting completion for nearly 10 years. It’s about a feature (or, at least, a hint for the feature) that appeared in 10.2.0.5 to control some of the inter-process messaging that takes place in parallel execution.

It’s a very simple idea that can make a significant difference in CPU usage for large parallel queries – can you aggregate the raw data before distributing it between slave sets (minimising the volume of traffic) or should you simply distribute the data and aggregate late to avoid aggregating twice. The strategy of aggregating early is known as “group by pushdown”.

I’ll start with a script to create a simple data set and a trivial query with a parallel hint:

rem
rem     Script:         gby_pushdown.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2023
rem
rem     Last tested 
rem             19.11.0.0
rem             12.2.0.1
rem

set linesize 180
set pagesize 60
set trimspool on
set tab off

create table t1 
as 
select 
        * 
from 
        all_objects 
where 
        rownum <= 50000
/


alter session set statistics_level = all;

set serveroutput off

prompt  ==================================================
prompt  Default (for this data) pushdown and hash group by
prompt  ==================================================

set feedback only

select 
        /*+ 
                qb_name(main)
                parallel(t1 2) 
--              no_gby_pushdown(@main)
--              no_use_hash_gby_for_pushdown(@main)
        */ 
        owner, object_type, count(*) ct
from 
        t1 
group by 
        owner, object_type
order by
        owner, object_type
/

set feedback on
select * from table(dbms_xplan.display_cursor(format=>'outline allstats cost hint_report -predicate'));

In the absence of any hints (apart from the qb_name() and parallel() hints), the plan I get from the query is as follows:

Plan hash value: 2919148568

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |    77 (100)|     96 |00:00:00.07 |       5 |      0 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |            |     96 |00:00:00.07 |       5 |      0 | 11264 | 11264 |     1/0/0|
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    SORT GROUP BY         |          |      2 |    276 |    77   (8)|     96 |00:00:00.01 |       0 |      0 |  4096 |  4096 |     2/0/0|
|   4 |     PX RECEIVE           |          |      2 |    276 |    77   (8)|    121 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE       | :TQ10000 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       HASH GROUP BY      |          |      2 |    276 |    77   (8)|    121 |00:00:00.04 |    1043 |    991 |  1079K|  1079K|     2/0/0|
|   7 |        PX BLOCK ITERATOR |          |      2 |  50000 |    73   (3)|  50000 |00:00:00.02 |    1043 |    991 |       |       |          |
|*  8 |         TABLE ACCESS FULL| T1       |     26 |  50000 |    73   (3)|  50000 |00:00:00.01 |    1043 |    991 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"MAIN")
      FULL(@"MAIN" "T1"@"MAIN")
      GBY_PUSHDOWN(@"MAIN")
      USE_HASH_GBY_FOR_PUSHDOWN(@"MAIN")
      END_OUTLINE_DATA
  */

You’ll notice in the Outline Data that Oracle has recorded the use_hash_gby_for_pushdown() hint and the gby_pushdown() hint. I’ll be repeating the query disabling each of these hints in turn – which is why the negative versions of the hints appear as comments in my original query.

If you look at operation 6 of the plan you can see that the optimizer has chosen to use a hash group by to aggregate the selected rows, reducing 50,000 rows to 121 rows. We could query v$pq_tqstat, or run the SQL Monitor report to get more detail about how much work each PX process did, but it’s fairly safe to assume that it was shared reasonably evenly between the two processes.

After aggregating their data the first layer of PX processes distributes the results by range (operation 5, PX Send Range) to the second layer of PX processes, which re-aggregate the much reduced data set. At this point Oracle chooses to aggregate by sorting (Sort Group By) as this will deal with the order by clause at the same time. (Note: you could tell Oracle to split the aggregation and ordering by adding a use_hash_aggregation hint to the query.)

With my data set it’s fairly clear that it’s a good idea to do this “two stage” aggregation because the initial raw data is reduced by a very large factor by the first layer of PX processes before they forward the results – and the reduction in inter-process messaging is likely to be a good idea.

There may be cases, of course, where some feature of the data pattern means that two-stage aggregation is a good idea, but aggregating by a sort is more efficient than an aggregation by hashing – a cardinality or clustering estimate might have persuaded the optimizer to make the wrong choice – so let’s see what happens to the plan if we enable the no_use_hash_gby_for_pushdown() hint:

Plan hash value: 3954212205

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |    77 (100)|     96 |00:00:00.14 |       5 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |            |     96 |00:00:00.14 |       5 | 11264 | 11264 |     1/0/0|
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |       |       |          |
|   3 |    SORT GROUP BY         |          |      2 |    276 |    77   (8)|     96 |00:00:00.03 |       0 |  4096 |  4096 |     2/0/0|
|   4 |     PX RECEIVE           |          |      2 |    276 |    77   (8)|    120 |00:00:00.03 |       0 |       |       |          |
|   5 |      PX SEND RANGE       | :TQ10000 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |       |       |          |
|   6 |       SORT GROUP BY      |          |      2 |    276 |    77   (8)|    120 |00:00:00.13 |    1043 |  9216 |  9216 |     2/0/0|
|   7 |        PX BLOCK ITERATOR |          |      2 |  50000 |    73   (3)|  50000 |00:00:00.07 |    1043 |       |       |          |
|*  8 |         TABLE ACCESS FULL| T1       |     26 |  50000 |    73   (3)|  50000 |00:00:00.02 |    1043 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------

It doesn’t really make any difference in this very small test case, though the sorting does seem to have taken a fraction of a second more CPU. The key change is that operation 6 has become a Sort Group By.

There is an interesting detail to look out for, though, in the Outline Data and Hint Report:

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"MAIN")
      FULL(@"MAIN" "T1"@"MAIN")
      GBY_PUSHDOWN(@"MAIN")
      END_OUTLINE_DATA
  */

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
   1 -  MAIN
           -  qb_name(main)

   8 -  MAIN / T1@MAIN
           -  parallel(t1 2)

I added the hint /*+ no_use_hash_gby_for_pushdown(@main) */ to the query, and the hint has clearly worked; but it’s not reported in the Hint Report, and it doesn’t appear in the Outline Data. This suggests that if you created an SQL Baseline for this query with this hint in place the plan would not reproduce because the SQL Baseline would be missing the critical hint. (To be tested – left as an exercise to readers.)

The next step is to enable the no_gby_pushdown() hint. For the purposes of the note this is the important one. It’s also the one that you are more likely to use as it’s fairly obvious (if you know the data) when it’s a good idea to use it. (In some cases, of course, the optimizer might have avoided “group by pushdown” when it should have used it – in which case you’d be using the gby_pushdown() hint.) Here’s the plan when I block “group by pushdown”.

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |      1 |        |    77 (100)|     96 |00:00:00.08 |       5 |       |       |          |
|   1 |  PX COORDINATOR         |          |      1 |        |            |     96 |00:00:00.08 |       5 |  6144 |  6144 |     1/0/0|
|   2 |   PX SEND QC (ORDER)    | :TQ10001 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |       |       |          |
|   3 |    SORT GROUP BY        |          |      2 |    276 |    77   (8)|     96 |00:00:00.11 |       0 |  4096 |  4096 |     2/0/0|
|   4 |     PX RECEIVE          |          |      2 |  50000 |    73   (3)|  50000 |00:00:00.07 |       0 |       |       |          |
|   5 |      PX SEND RANGE      | :TQ10000 |      0 |  50000 |    73   (3)|      0 |00:00:00.01 |       0 |       |       |          |
|   6 |       PX BLOCK ITERATOR |          |      4 |  50000 |    73   (3)|  50091 |00:00:00.02 |    1051 |       |       |          |
|*  7 |        TABLE ACCESS FULL| T1       |     28 |  50000 |    73   (3)|  50091 |00:00:00.01 |    1051 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"MAIN")
      FULL(@"MAIN" "T1"@"MAIN")
      END_OUTLINE_DATA
  */

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3
---------------------------------------------------------------------------
   1 -  MAIN
           -  no_gby_pushdown(@main)
           -  qb_name(main)

   7 -  MAIN / T1@MAIN
           -  parallel(t1 2)

Key details to highlight here are:

  • There’s only one aggregation step, appearing at operation 3 after the PX SEND/RECEIVE
  • 50,000 rows are passed from slave set 1 (operation 6) to slave set 2 (operation 4).
  • The no_gby_pushdown(@main) does appear in the Hint Report
  • The no_gby_pushdown(@main) doesn’t appear in the Outline Data (which is, again, a little worrying).

Again, with this small data set, the change in plan isn’t going to make much difference to the end user, but you may find cases where there is a best choice to keep the user sufficiently happy and save machine resources.

Summary

When using parallel query the optimizer may choose to aggregate the data in two steps so that the current rowsource is reduced by one set of PX processes before it is passed to the next set of PX processes that does a second aggregation step. This tends to be a good idea if the first set of processes can reduce the size of the data set by a large factor and save a lot of machine resources on the subsequent inter-process communication; but it is a bad idea if the data doesn’t reduce and a large volume of data ends up being aggregated in both sets of PX processes.

You can over-ride the optimizer’s choice with the [no_]gby_pushdown() hint.

Session Activity Stats

Jonathan Lewis - Mon, 2023-11-20 09:35

A little detail to remember when trouble-shooting at the session level – some of the information summarised in the Instance Activity figures (v$sysstat) is not available in the Session Activity figures (v$sesstat / v$mystat). The difference goes right down to the x$ objects, and here are two versions of a simple query I wrote for 19c to check for the missing statistics (you’ll have to be able to connect as SYS to get results from the first one):

rem
rem     Script:         mystat_missing.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2023
rem     Purpose:        
rem
rem     Last tested 
rem             23.3.0.0
rem             19.11.0.0
rem             12.2.0.1

spool mystat_missing

set linesize 132
set pagesize 60
set trimspool on
set tab off


select 
        usd.indx, usd.ksusdnam, usd.ksusdcls
from 
        x$ksusd usd
where
        usd.indx not in (
                select  sta.indx 
                from    x$ksumysta      sta
        )
/

select
        sys.statistic#, sys.name, sys.class
from
        v$sysstat sys
where
        sys.statistic# not in (
                select  mst.statistic# 
                from    v$mystat        mst
        )
/

spool off

If you run the script against 12.2.0.1 you’ll find that there are no “missing” session stats, but when you upgrade to 19c (in my case 19.11.0.0) you’ll find a couple of dozen statistics reported. This was the output I got:

STATISTIC# NAME                                                                  CLASS
---------- ---------------------------------------------------------------- ----------
      2010 Instance Statistic test case                                            384
      2011 cell RDMA reads unused                                                  264
      2012 cell RDMA writes                                                        264
      2013 cell pmem cache read hits unused                                        264
      2014 cell pmem cache writes                                                  264
      2015 NS transactions                                                         257
      2016 NS transactions timed out                                               257
      2017 NS transactions interrupted                                             257
      2018 NS transactions not using all standbys                                  257
      2019 NS transactions skipping send                                           257
      2020 NS transaction setups                                                   257
      2021 NS transaction bytes loaded                                             320
      2022 NS transaction bytes to network                                         320
      2023 NS transaction bytes relogged                                           322
      2024 NS transaction bytes logged                                             322
      2025 NS transaction send time                                                320
      2026 NS transaction setup time                                               320
      2027 NS transaction confirm time                                             320
      2028 NS recovery timeout interrupts                                          384
      2029 NS recovery DTC full interrupts                                         384
      2030 NS recovery fetch requests made                                         384
      2031 NS recovery fetch ranges received                                       384
      2032 NS recovery fetch requested bytes                                       384
      2033 NS recovery fetch received bytes                                        384
      2034 NS recovery fetch received time                                         384
      2035 NS recovery fetch requests remade                                       384

26 rows selected.

Running the query against 23c Free (23.3) on the Developer VM created by Oracle, the number of “missing” statistics jumped to 1,052 – so I won’t list them. Given the size of the list I did a quick check to remind myself of how many statistics were actually present in v$sysstat, and that produced an interesting comparison.

--
--      19.11.0.0
--
SQL> select count(*) ct_19c from v$sysstat;

    CT_19C
----------
      2036

--
--      23.3.0.0 (on premises)
--
SQL> select count(*) ct_23c from v$sysstat;

    CT_23C
----------
      2661

So there are 600-ish extra stats available in 23c, but 1,000-ish stats that don’t appear at the session level. So if you’ve been taking advantage of some of the “new” 19c stats to help with trouble-shooting you may find that they disappear on the upgrade. More on that later.

If you look at the output I’ve listed above you might spot that all the missing stats satisfy the predicate “class >= 256”. In fact, the class is a bit vector, and a more appropriate predicate would be: “bitand(class,256) = 256”. Either option gives you a fairly lazy way to do any checks you might be interested in. For example, after setting up a database link from a 19c instance to a 23c instance, I ran the following query to find out how many statistics that were visible in the 19c v$sesstat had changed their class to become invisible in the 23c v$sesstat.

select name from v$sysstat where class < 256
intersect
select name from V$sysstat@oracle23 where class >= 256
/


NAME
----------------------------------------------------------------
...
SMON posted for dropping temp segment
SMON posted for instance recovery
SMON posted for txn recovery for other instances
SMON posted for undo segment recovery
SMON posted for undo segment shrink
TBS Extension: bytes extended
TBS Extension: files extended
TBS Extension: tasks created
TBS Extension: tasks executed
...
commit cleanout failures: block lost
commit cleanout failures: buffer being written
commit cleanout failures: callback failure
commit cleanout failures: cannot pin
commit cleanout failures: delayed log
commit cleanout failures: flashback
commit cleanout failures: hot backup in progress
commit cleanout failures: write disabled
...
db corrupt blocks detected
db corrupt blocks recovered
...


502 rows selected.

Of the 502 stats a very large fraction were about In Memory (prefixed IM), and there were a number that looked as if they were relevant only to background processes and therefore (to some extent, possibly) not of interest when debugging user activity. It’s also possible that some of the statistics would fall into different classes if certain features (e.g. hybrid columnar compression) were linked into the executable.
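
If you don’t have a second instance (or a database link) to hand, the bitand() form mentioned above gives an even lazier single-instance check. A minimal sketch:

select count(*) hidden_from_sesstat
from   v$sysstat
where  bitand(class, 256) = 256
/

The count this returns won’t necessarily match the “missing” figures exactly (the claim above is only that all the missing stats have the 256 bit set), but it’s a quick sanity check before you go hunting for a particular statistic at the session level.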

Another query that might be of interest is a typical “what’s changed” query. What statistics are available in 23c that aren’t in 19c (and vice versa):

select
        name, version
from    (
        select name, '23c' version from v$sysstat@oracle23
        minus
        select name, '23c' from v$sysstat
        union all
        select name, '19c' version from v$sysstat
        minus
        select name, '19c' from v$sysstat@oracle23
        )
order by
        version desc, name
/

Again there are so many that I don’t think it would be helpful to reproduce my results, but I’ll just say that there were 663 stats in 23c that weren’t in 19c, and 38 stats in 19c that weren’t in 23c. Of course, it’s possible (and I didn’t check carefully) that some of these new/”disappearing” statistics show up only because they’ve had a name change as the version changed.
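
If you wanted to chase the possible renames a little further, a crude sketch (purely a heuristic, and assuming the same oracle23 database link) is to match names across the two versions after stripping case and spaces; anything it reports is only a candidate for a rename:

select  a.name  name_19c,
        b.name  name_23c
from    v$sysstat a,
        v$sysstat@oracle23 b
where   upper(replace(a.name, ' ')) = upper(replace(b.name, ' '))
and     a.name <> b.name
/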

A couple of the new 23c stats that I like the look of (and would like to see at the session level) are:

user commits PL/SQL
user commits Autonomous
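
A minimal sketch to check their current values at the instance level (assuming, of course, that you are connected to a 23c instance):

select  name, value
from    v$sysstat
where   name in ('user commits PL/SQL', 'user commits Autonomous')
/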

I’m sure that there are more that will reveal themselves as (occasionally) useful over time, and further investigation is left to the reader.

Create MySQL In Vagrant Box

Michael Dinh - Sun, 2023-11-19 09:17

First, thank you to jazzfog (Pavel) for https://github.com/jazzfog/MySql-in-Vagrant as it was straightforward.

Too tired and too lazy to explain all the steps; however, the output should be self-explanatory.

pegasus@Greek MINGW64 /c/vagrant/MySql-in-Vagrant (master)
$ ls -al
ls -al
total 25
drwxr-xr-x 1 pegasus 197121    0 Nov 19 06:37 .
drwxr-xr-x 1 pegasus 197121    0 Nov 19 06:34 ..
drwxr-xr-x 1 pegasus 197121    0 Nov 19 06:34 .git
-rw-r--r-- 1 pegasus 197121   31 Nov 19 06:34 .gitignore
drwxr-xr-x 1 pegasus 197121    0 Nov 19 06:35 .vagrant
drwxr-xr-x 1 pegasus 197121    0 Nov 19 06:37 MysqlData
-rw-r--r-- 1 pegasus 197121  635 Nov 19 06:34 Vagrantfile
drwxr-xr-x 1 pegasus 197121    0 Nov 19 06:34 provision
-rw-r--r-- 1 pegasus 197121 3010 Nov 19 06:34 readme.md

pegasus@Greek MINGW64 /c/vagrant/MySql-in-Vagrant (master)
$

======================================================================

pegasus@Greek MINGW64 /c/vagrant
$ git clone https://github.com/jazzfog/MySql-in-Vagrant
Cloning into 'MySql-in-Vagrant'...
remote: Enumerating objects: 32, done.
remote: Total 32 (delta 0), reused 0 (delta 0), pack-reused 32
Receiving objects: 100% (32/32), 5.32 KiB | 5.32 MiB/s, done.
Resolving deltas: 100% (12/12), done.

pegasus@Greek MINGW64 /c/vagrant
$ ls
MySql-in-Vagrant/  vagrant/  vagrant-projects/

pegasus@Greek MINGW64 /c/vagrant
$ cd MySql-in-Vagrant/

pegasus@Greek MINGW64 /c/vagrant/MySql-in-Vagrant (master)
$ ls
Vagrantfile  provision/  readme.md

pegasus@Greek MINGW64 /c/vagrant/MySql-in-Vagrant (master)
$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'ubuntu/trusty64'...

vagrant@vagrant-ubuntu-trusty-64:~$ mysql -V
mysql  Ver 14.14 Distrib 5.5.62, for debian-linux-gnu (x86_64) using readline 6.3

vagrant@vagrant-ubuntu-trusty-64:~$ mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 37
Server version: 5.5.62-0ubuntu0.14.04.1 (Ubuntu)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> SELECT version();
+-------------------------+
| version()               |
+-------------------------+
| 5.5.62-0ubuntu0.14.04.1 |
+-------------------------+
1 row in set (0.00 sec)

mysql> STATUS;
--------------
mysql  Ver 14.14 Distrib 5.5.62, for debian-linux-gnu (x86_64) using readline 6.3

Connection id:          37
Current database:
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.5.62-0ubuntu0.14.04.1 (Ubuntu)
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /var/run/mysqld/mysqld.sock
Uptime:                 5 min 13 sec

Threads: 1  Questions: 113  Slow queries: 0  Opens: 48  Flush tables: 1  Open tables: 41  Queries per second avg: 0.361
--------------

mysql> SHOW VARIABLES LIKE "%version%";
+-------------------------+-------------------------+
| Variable_name           | Value                   |
+-------------------------+-------------------------+
| innodb_version          | 5.5.62                  |
| protocol_version        | 10                      |
| slave_type_conversions  |                         |
| version                 | 5.5.62-0ubuntu0.14.04.1 |
| version_comment         | (Ubuntu)                |
| version_compile_machine | x86_64                  |
| version_compile_os      | debian-linux-gnu        |
+-------------------------+-------------------------+
7 rows in set (0.00 sec)

mysql>
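
Once the server is up, a quick smoke test along the following lines confirms it will happily accept DDL and DML. (The database and table names here are illustrative only and are not part of the original session.)

-- illustrative smoke test, not from the original session
CREATE DATABASE IF NOT EXISTS sandbox;
USE sandbox;
CREATE TABLE smoke_test (id INT PRIMARY KEY, note VARCHAR(50));
INSERT INTO smoke_test VALUES (1, 'vagrant mysql is up');
SELECT * FROM smoke_test;
DROP DATABASE sandbox;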

Vector Database Impact on RAG Efficiency: A Simple Overview

Andrejus Baranovski - Sun, 2023-11-19 08:54
I explain the importance of a Vector DB for RAG implementation. I show, with a simple example, how data retrieval from the Vector DB can affect LLM performance. Before data is sent to the LLM, you should verify that quality data has been fetched from the Vector DB.

 

Pros and cons of DBLinks for application integration

Tom Kyte - Fri, 2023-11-17 17:46
Hello Tom, A colleague of mine is a big fan of service-oriented architectures and middleware. His view is that dblinks are 99.9% of the time a bad idea and that you should go via some sort of message layer in order to pass information between applications. In many circumstances I can see the value of this, particularly in heterogeneous environments; however, we are an all-Oracle shop and I don't see the value in writing a Windows service to pull data from one database and insert it into another. What are your views on this? I realise this is not really a PL/SQL question but I would be interested to hear your thoughts on this. Regards Alastair
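
For reference, the dblink approach being weighed up here boils down to something like the sketch below: create a link to the remote Oracle database and move the data in a single SQL statement, with no intermediate service. (All object names are illustrative; none of them come from the question.)

create database link remote_app
  connect to app_user identified by app_password
  using 'REMOTE_TNS_ALIAS';

-- pull today's rows straight across the link
insert into local_orders
select *
from   orders@remote_app
where  order_date >= trunc(sysdate);
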
Categories: DBA Blogs
