百度360必应搜狗淘宝本站头条
当前位置:网站首页 > IT技术 > 正文

Oracle向量数据库操作的一些随手笔记

wptr33 2024-12-26 17:07 45 浏览

1. Basic Demo:

| c(2,6). . b(5,6)
| .
| .
| a(2,2)
|_________________________

|b-a| = sqrt( (5-2)^2 + (6-2)^2 ) = 5

SELECT VECTOR_DISTANCE( vector('[2,2]'), vector('[5,6]'), EUCLIDEAN ) as distance;

How about COSINE?

CREATE TABLE IF NOT EXISTS embedding_store_hysun (
collection_name VARCHAR2(200) NOT NULL,
embedding VECTOR(*, FLOAT32) NOT NULL,
doc CLOB NOT NULL,
src VARCHAR2(500)
);

############################ In database embedding ############################

#EXEC DBMS_VECTOR.DROP_ONNX_MODEL(model_name => 'doc_model', force => true);
#SQL> grant DB_DEVELOPER_ROLE to vector;
SQL> grant create mining model to pocuser;
Grant succeeded.
SQL> create or replace directory HYSUN_DUMP as '/u01/ords_sw/hysun_dump';
Directory HYSUN_DUMP created.
SQL> grant read on directory HYSUN_DUMP to pocuser;
Grant succeeded.

EXECUTE DBMS_VECTOR.LOAD_ONNX_MODEL('HYSUN_DUMP','bge-base-zh-v1.5.onnx','hysun_bge_zh_model',JSON('{"function" : "embedding", "embeddingOutput" : "embedding"}'));

SELECT MODEL_NAME, MINING_FUNCTION, ALGORITHM, ALGORITHM_TYPE, MODEL_SIZE
FROM USER_MINING_MODELS;

SQL> INSERT INTO embedding_store_hysun select 'DB_EMBED_TEST0', VECTOR_EMBEDDING(hysun_bge_zh_model USING 'Minimum Age to Get a Licence The minimum age to get a licence. minimum age' as input), 'Minimum Age to Get a Licence The minimum age to get a licence. minimum age', '/home/hysunhe/projects/oracle_vectordb/source_data/cdc_poc/QA_1.txt' from dual;
1 row inserted.

SQL> INSERT INTO embedding_store_hysun select 'DB_EMBED_TEST0', VECTOR_EMBEDDING(hysun_bge_zh_model USING 'Minimum Requirements for Enrolment The list of requirements/ enrolment prerequisites that needs to be met before enrolment. class 3/3a, Class 3A, class 2B, class 2, minimum requirements, enrolment' as input), 'Minimum Requirements for Enrolment The list of requirements/ enrolment prerequisites that needs to be met before enrolment. class 3/3a, Class 3A, class 2B, class 2, minimum requirements, enrolment', '/home/hysunhe/projects/oracle_vectordb/source_data/cdc_poc/QA_2.txt' from dual;
1 row inserted.

SQL> SELECT VECTOR_EMBEDDING(hysun_bge_zh_model USING 'mininum age to get a license' as input) AS embedding;

SELECT
collection_name,
embedding,
doc,
src,
VECTOR_DISTANCE(embedding, VECTOR_EMBEDDING(hysun_bge_zh_model USING 'mininum age to get a license' as input), COSINE) as distance
FROM embedding_store_hysun
WHERE
collection_name = 'DB_EMBED_TEST0'
ORDER BY distance
FETCH FIRST 3 ROWS ONLY;

######################## In database embedding end ########################

### Index:

show parameter vector_memory_size;
ALTER SYSTEM SET vector_memory_size=ON SCOPE=BOTH;
SELECT value FROM V$PARAMETER WHERE name='sga_target'; -- (max vector_memory_size = 70% SGA)
SELECT CON_ID, sum(alloc_bytes) / 1024 / 1024 FROM V$VECTOR_MEMORY_POOL GROUP BY CON_ID;
SELECT CON_ID, sum(USED_BYTES) / 1024 / 1024 FROM V$VECTOR_MEMORY_POOL GROUP BY CON_ID;

############################################################

In-Memory Neighbor Graph Vector Index(HNSW)

############################################################

create table galaxies (id number, name varchar2(50), doc varchar2(500), embedding vector);
insert into galaxies values (1, 'M31', 'Messier 31 is a barred spiral galaxy in the Andromeda constellation which has a lot of barred spiral galaxies.', '[0,2,2,0,0]');
insert into galaxies values (2, 'M33', 'Messier 33 is a spiral galaxy in the Triangulum constellation.', '[0,0,1,0,0]');
insert into galaxies values (3, 'M58', 'Messier 58 is an intermediate barred spiral galaxy in the Virgo constellation.', '[1,1,1,0,0]');
insert into galaxies values (4, 'M63', 'Messier 63 is a spiral galaxy in the Canes Venatici constellation.', '[0,0,1,0,0]');
insert into galaxies values (5, 'M77', 'Messier 77 is a barred spiral galaxy in the Cetus constellation.', '[0,1,1,0,0]');
insert into galaxies values (6, 'M91', 'Messier 91 is a barred spiral galaxy in the Coma Berenices constellation.', '[0,1,1,0,0]');
insert into galaxies values (7, 'M49', 'Messier 49 is a giant elliptical galaxy in the Virgo constellation.', '[0,0,0,1,1]');
insert into galaxies values (8, 'M60', 'Messier 60 is an elliptical galaxy in the Virgo constellation.', '[0,0,0,0,1]');
insert into galaxies values (9, 'NGC1073', 'NGC 1073 is a barred spiral galaxy in Cetus constellation.', '[0,1,1,0,0]');
SELECT name
FROM galaxies
ORDER BY VECTOR_DISTANCE( embedding, to_vector('[0,1,1,0,0]'), COSINE )
FETCH FIRST 3 ROWS ONLY;
SELECT name,
ROUND( VECTOR_DISTANCE( embedding, to_vector('[0,1,1,0,0]'), COSINE ), 2) as distance
FROM galaxies
ORDER BY distance
FETCH APPROXIMATE FIRST 4 ROWS ONLY;
-- WITH TARGET ACCURACY 90
EXPLAIN PLAN FOR
SELECT name,
VECTOR_DISTANCE( embedding, to_vector('[0,1,1,0,0]'), COSINE ) as distance
FROM galaxies
ORDER BY distance
FETCH APPROXIMATE FIRST 4 ROWS ONLY;
select plan_table_output from table(dbms_xplan.display('plan_table',null,'all'));
CREATE VECTOR INDEX galaxies_hnsw_idx ON galaxies (embedding) ORGANIZATION
INMEMORY NEIGHBOR GRAPH
DISTANCE COSINE
WITH TARGET ACCURACY 95;
CREATE VECTOR INDEX galaxies_hnsw_idx ON galaxies (embedding) ORGANIZATION
INMEMORY NEIGHBOR GRAPH
DISTANCE COSINE
WITH TARGET ACCURACY 90 PARAMETERS (type HNSW, neighbors 40, efconstruction
500);
SELECT name,
ROUND(VECTOR_DISTANCE( embedding, to_vector('[0,1,1,0,0]'), COSINE ), 3) distance
FROM galaxies
WHERE name <> 'NGC1073'
ORDER BY distance
FETCH APPROXIMATE FIRST 4 ROWS ONLY WITH TARGET ACCURACY 90;
drop INDEX galaxies_hnsw_idx;

##############################################################

Neighbor Partition Vector Index (IVF)

##############################################################

CREATE VECTOR INDEX galaxies_ivf_idx ON galaxies (embedding) ORGANIZATION
NEIGHBOR PARTITIONS
DISTANCE COSINE
WITH TARGET ACCURACY 95;
CREATE VECTOR INDEX galaxies_ivf_idx ON galaxies (embedding) ORGANIZATION
NEIGHBOR PARTITIONS
DISTANCE COSINE
WITH TARGET ACCURACY 90 PARAMETERS (type IVF, neighbor partitions 100);
The APPROX and APPROXIMATE keywords are optional. If omitted while connected to an
ADB-S instance, an approximate search using a vector index is attempted if one
exists.
-- Accuracy report
SET SERVEROUTPUT ON
declare
report varchar2(128);
begin
report := dbms_vector.index_accuracy_query(
OWNER_NAME => 'POCUSER',
INDEX_NAME => 'GALAXIES_IVF_IDX',
qv => to_vector('[0,1,1,0,0]'),
top_K => 10,
target_accuracy => 95 );
dbms_output.put_line(report);
end;
/

-- Index detail:

grant read on VECSYS.VECTOR$INDEX to pocuser;
SELECT JSON_SERIALIZE(IDX_PARAMS RETURNING VARCHAR2 PRETTY)
FROM VECSYS.VECTOR$INDEX WHERE IDX_NAME = 'GALAXIES_IVF_IDX';
CREATE PUBLIC DATABASE LINK LinkToLA1 CONNECT TO vectordemo IDENTIFIED BY "welcome1" USING '146.235.233.91:1521/pdb1.sub08030309530.justinvnc1.oraclevcn.com';
select OWNER, DB_LINK, USERNAME, VALID, HOST from all_db_links;
alter session set global_names=false;
select 1 from dual@LINKTOLA1;

#### Memo

grant create any directory to pocuser;
create directory RAG_DOC_DIR as '/u01/hysun/rag_docs';
create table RAG_FILES (
file_name varchar2(500),
file_content BLOB
);
create table RAG_INDB_PIPELINE (
id number,
name varchar2(50),
doc varchar2(500),
embedding VECTOR
);
Declare
mFile VARCHAR2(500) := 'Oracle向量数据库_lab.pdf';
mBLOB BLOB := Empty_Blob();
mBinFile BFILE := BFILENAME('RAG_DOC_DIR', mFile);
Begin
DBMS_LOB.OPEN(mBinFile, DBMS_LOB.LOB_READONLY); -- Open BFILE
DBMS_LOB.CreateTemporary(mBLOB, TRUE, DBMS_LOB.Session); -- BLOB locator initialization
DBMS_LOB.OPEN(mBLOB, DBMS_LOB.LOB_READWRITE); -- Open BLOB locator for writing
DBMS_LOB.LoadFromFile(mBLOB, mBinFile, DBMS_LOB.getLength(mBinFile)); -- Reading BFILE into BLOB
DBMS_LOB.CLOSE(mBLOB); -- Close BLOB locator
DBMS_LOB.CLOSE(mBinFile); -- Close BFILE

INSERT INTO RAG_FILES(file_name, file_content) values (mFile, mBLOB);
commit;
End;
/
insert into RAG_FILES(file_name, file_content) values('oracle-vector-lab', to_blob(bfilename('RAG_DOC_DIR', 'Oracle向量数据库_lab.pdf')));
commit;
select DBMS_LOB.getLength(FILE_CONTENT) from RAG_FILES;
drop table rag_doc_chunks purge;
create table rag_doc_chunks (doc_id varchar2(500), chunk_id number, chunk_data varchar2(4000), chunk_embedding vector);
-- utl_to_text: PDF -> TEXT
-- utl_to_chunks: TEXT -> CHUNKS
-- utl_to_embeddings: CHUNKS -> VECTORS
insert into rag_doc_chunks
select
dt.file_name doc_id,
et.embed_id chunk_id,
et.embed_data chunk_data,
to_vector(et.embed_vector) chunk_embedding
from
rag_files dt,
dbms_vector_chain.utl_to_embeddings(
dbms_vector_chain.utl_to_chunks(
dbms_vector_chain.utl_to_text(dt.file_content),
json('{"normalize":"all"}')
),
json('{"provider":"database", "model":"mydoc_model"}')
) t,
JSON_TABLE(
t.column_value,
'$[*]' COLUMNS (
embed_id NUMBER PATH '$.embed_id',
embed_data VARCHAR2(4000) PATH '$.embed_data',
embed_vector CLOB PATH '$.embed_vector'
)
) et;
commit;
insert into rag_doc_chunks
select
dt.file_name doc_id,
et.embed_id chunk_id,
et.embed_data chunk_data,
to_vector(et.embed_vector) chunk_embedding
from
rag_files dt,
dbms_vector_chain.utl_to_embeddings(
dbms_vector_chain.utl_to_chunks(
dbms_vector_chain.utl_to_text(dt.file_content),
JSON('{ "by":"words",
"max":"240",
"overlap":"15",
"split":"recursively",
"language":"SIMPLIFIED CHINESE",
"normalize":"all" }')
),
json('{"provider":"database", "model":"mydoc_model"}')
) t,
JSON_TABLE(
t.column_value,
'$[*]' COLUMNS (
embed_id NUMBER PATH '$.embed_id',
embed_data VARCHAR2(4000) PATH '$.embed_data',
embed_vector CLOB PATH '$.embed_vector'
)
) et;
commit;
select
dbms_vector_chain.utl_to_chunks(TO_CLOB(FILE_CONTENT),
JSON('{ "by":"words",
"max":"240",
"overlap":"15",
"split":"recursively",
"language":"SIMPLIFIED CHINESE",
"normalize":"all" }'))
from RAG_FILES;
SELECT
dbms_vector.utl_to_embedding(
'This is a test',
json('{
"provider": "OCIGenAI",
"credential_name": "OCI_GENAI_CRED_FOR_APEX",
"url": "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/embedText",
"model": "cohere.embed-multilingual-v3.0"
}')
) embedding
FROM dual;
SELECT
dbms_vector.utl_to_embedding(
'This is a test',
json('{
"provider": "database",
"model": "doc_model"
}')
) embedding
FROM dual;
create or replace directory MODELS_DIR as '/u01/hysun/models';
EXEC DBMS_VECTOR.DROP_ONNX_MODEL(model_name => 'mydoc_model', force => true);
-- BEGIN
-- DBMS_VECTOR.LOAD_ONNX_MODEL(
-- directory => 'MODELS_DIR',
-- file_name => 'bge-base-zh-v1.5.onnx',
-- model_name => 'mydoc_model',
-- metadata => JSON('{"function" : "embedding", "embeddingOutput" : "embedding", "input":{"input": ["DATA"]}}')
-- );
-- END;
-- /
BEGIN
DBMS_VECTOR.LOAD_ONNX_MODEL(
directory => 'MODELS_DIR',
file_name => 'bge-base-zh-v1.5.onnx',
model_name => 'mydoc_model'
);
END;
/
SELECT vector_embedding(mydoc_model using 'hello' as data);
select
chunk_data,
VECTOR_DISTANCE(chunk_embedding, VECTOR_EMBEDDING(mydoc_model USING '本次实验的先决条件' as data), COSINE) as distance
from rag_doc_chunks
order by distance
FETCH APPROX FIRST 1 ROWS ONLY;
-- grant CREATE CREDENTIAL
BEGIN
DBMS_VECTOR_CHAIN.CREATE_CREDENTIAL (
CREDENTIAL_NAME => 'LAB_OPENAI_CRED',
PARAMS => json('{ "access_token": "EMPTY" }')
);
END;
/
select dbms_vector_chain.utl_to_generate_text(
'Oracle 向量数据库是什么',
json('{
"provider": "openai",
"credential_name": "LAB_OPENAI_CRED",
"url": "http://146.235.226.110:8098/v1/chat/completions",
"model": "Qwen2-7B-Instruct"
}') ) from dual;
select *
from (
select
chunk_data
from rag_doc_chunks
order by VECTOR_DISTANCE(chunk_embedding, VECTOR_EMBEDDING(mydoc_model USING '本次实验的先决条件' as data), COSINE)
FETCH APPROX FIRST 3 ROWS ONLY
) dt,
dbms_vector_chain.utl_to_generate_text(
dt.chunk_data,
json('{
"provider": "openai",
"credential_name": "LAB_OPENAI_CRED",
"url": "http://146.235.226.110:8098/v1/chat/completions",
"model": "Qwen2-7B-Instruct"
}')
) rag
declare
l_question varchar2(500) := '本次实验的先决条件';
l_input CLOB;
l_clob CLOB;
j apex_json.t_values;
l_context CLOB;
l_rag_result CLOB;
begin
-- 第一步:从向量数据库中检索出与问题相似的内容
for rec in (
select
chunk_data
from rag_doc_chunks
order by VECTOR_DISTANCE(chunk_embedding, VECTOR_EMBEDDING(mydoc_model USING l_question as data), COSINE)
FETCH APPROX FIRST 3 ROWS ONLY
) loop
l_context := l_context || rec.chunk_data || chr(10);
end loop;

-- 第二步:提示工程:将相似内容和用户问题一起,组成大语言模型的输入
l_input := '你是一个诚实且专业的数据库知识问答助手,请仅仅根据提供的上下文信息内容,回答用户的问题,且不要试图编造答案。\n 以下是上下文信息:' || replace(l_context, chr(10), '\n') || '\n请用英文回答用户问题:' || l_question;


-- 第三步:调用大语言模型,生成RAG结果
for rec in (select dbms_vector_chain.utl_to_generate_text(
l_input,
json('{
"provider": "openai",
"credential_name": "LAB_OPENAI_CRED",
"url": "http://146.235.226.110:8098/v1/chat/completions",
"model": "Qwen2-7B-Instruct"
}')
) as rag from dual) loop
dbms_output.put_line('*** RAG Result: ' || rec.rag);
end loop;
-- apex_json.parse(j, l_clob);
-- l_rag_result := apex_json.get_varchar2(p_path => 'choices[%d].message.content', p0 => 1, p_values => j);

-- dbms_output.put_line('*** RAG Result: ' || l_rag_result);
end;
/

```

srvctl stop instance -d ai23 -i ai232 -force
srvctl status database -d ai23
srvctl start instance -d ai23 -i ai232

相关推荐

开发者必看的八大Material Design开源项目

MaterialDesign是介于拟物和扁平之间的一种设计风格,自从它发布以来,便引起了很多开发者的关注,在这里小编介绍在Android开发者当中里最受青睐的八个MaterialDesign开源项...

另类插这么可爱,一定是…(另类t恤)

IT之家(www.ithome.com):另类插图:这么可爱,一定是…OSXMavericks和Yosemite打破了苹果对Mac操作系统传统的命名方式,使用加州的某些标志性景点来替换猫...

Android常用ADB命令(安卓adb工具是什么)

杀死应用①根据包名获取APP的PIDadbshellps|grep应用包名②执行kill命令...

微软Mac版PowerPoint测试Reading Order Pane功能

IT之家5月20日消息,微软公司昨日(5月19日)发布博文,邀请Microsoft365Insiders成员,测试macOS新版PowerPoint演示文稿应用,重点引入...

Visual Studio跨平台开发实战(4):Xamarin Android控制项介绍

前言不同于iOS,Xamarin在VisualStudio中针对Android,可以直接设计使用者界面.在本篇教学文章中,笔者会针对Android的专案目录结构以及基本控制项进行介绍,包...

用云存储30分钟快速搭建APP,你信吗?

背景不管你承认与否,移动互联的时代已经到来,这是一个移动互联的时代,手机已经是当今世界上引领潮流的趋势,大型的全球化企业和中小企业都把APP程序开发纳入到他们的企业发展策略当中。但随着手机APP上传的...

谷歌P图神器来了!不用学不用教,输入一句话,分分钟给结果

Pine发自凹非寺量子位|公众号QbitAI当你拍照片时,“模特不好好配合”怎么办?...

iOS文本编辑控件UITextField和UITextVie

记录一个菜鸟的IOS学习之旅,如能帮助正在学习的你,亦枫不胜荣幸;如路过的大神如指教几句,亦枫感激涕淋!细心的朋友可能已经注意到了,IOS学习之旅系列教程在本篇公众号的文章中,封面已经换成美女图片了,...

Android入门图文教程集锦(android 入门教程)

Android入门视频教程集锦AndroidStudio错误gradientandroid:endXattributenotfound...

如何使用Android自定义复合视图(如何使用android自定义复合视图)

在最近的一个客户应用中,我遇到了一个需求,根据选定的值来生成指定数量的编辑框字段,这样用户可以输入人物信息。最初我的想法是把这些逻辑放到Fragment中,只是根据选中值的变化来向线性布局容器中增加编...

原生安卓开发app的框架frida常用关键代码定位

前言有时候可能会对APP进行字符串加密等操作,这样的话你的变量名等一些都被混淆了,看代码就可能无从下手...

教程10 | 三分钟搞定一个智能输入法程序

一案例描述1、考核知识点网格布局线性布局样式和主题Toast2、练习目标掌握网格布局的使用掌握Toast的使用掌握线性布局的使用...

(Android 8.1) 功能与新特性(android的功能)

和你一起终身学习,这里是程序员AndroidAndroid8.1(API级别27)为用户和开发人员引入了各种新特性和功能。本文档重点介绍了开发人员的新功能。通过本章阅读,您将获取到以下内容:Andr...

怎样设置EditText内部文字被锁定不可删除和修改

在做项目的时候,我曾经遇到过这样的要求,就是跟百度贴吧客户端上的一样,在回复帖子的时候,在EditText中显示回复人的名字,而且这个名字不可以修改和删除,说白了就是不可操作,只能在后面输入内容。在E...

如何阻止 Android 活动启动时 EditText 获得焦点

技术背景在Android开发中,当活动启动时,EditText有时会自动获得焦点并弹出虚拟键盘,这可能不是用户期望的行为。为了提升用户体验,我们需要阻止...