|
SELECT u.name, COUNT(o.id) AS total_orders
FROM `commerce`.sales.users AS u
JOIN `commerce`.sales.orders AS o ON u.id = o.user_id
WHERE o.status = "completed"
  AND DATE_DIFF_STR(NOW_STR(), o.order_date, "day") <= 30
GROUP BY u.name
ORDER BY total_orders DESC
LIMIT 5; |
The query above provides valuable insights from your data stored in Couchbase about your top 5 users who generated the most completed orders during the past 30 days. But what if you're not an advanced SQL++ developer and need the answers by 11 p.m. for a report? Then you have to wait for a developer to write a SQL++ query and get you the answers.
Alternatively, consider a case where you need to do some ad hoc debugging to address questions like:
- Are there any documents where the date the order was delivered is missing?
- Does that mean the order was cancelled? Or did we misplace the order and it never got delivered? Or was everything okay, but we simply missed adding the order_delivered value in the field?
In this case, you not only need to search the order_delivered field, but also look at order_cancelled or check comments to figure out whether it was misplaced, and so on. So the query to be written isn't simple or straightforward.
|
SELECT o.orderId, o.orderDate, o.order_cancelled, o.order_delivered, o.comments,
  CASE
    WHEN o.order_cancelled = TRUE THEN "Order was cancelled"
    WHEN ANY c IN o.comments SATISFIES LOWER(c) LIKE "%misplac%" OR LOWER(c) LIKE "%lost%" END THEN "Order may have been misplaced"
    WHEN ANY c IN o.comments SATISFIES LOWER(c) LIKE "%deliver%" END THEN "Delivered but field not updated"
    ELSE "Reason unknown, investigate"
  END AS reason
FROM `commerce`.`sales`.`orders` AS o
WHERE o.order_delivered IS MISSING OR o.order_delivered IS NULL; |
In such cases, it would help if you had a reliable assistant available 24×7 to get all these answers. The UDF described in this blog is such an assistant. It accepts your questions in the most natural way and returns results in JSON. Behind the scenes, it connects to a model of your choice, along with your API key, to convert your thoughts into SQL++ and then executes it. And all you need to do to invoke this assistant is call the UDF.
|
SELECT NL2SQL( ["`commerce`.`sales`.`orders`"], "Are there any documents where the order_delivered date is missing? And if so, why?", "", "https://api.openai.com/v1/chat/completions", "gpt-4o-2024-05-13" ); |
How It Works
1. Set up the library.
You first create the JavaScript library used by the UDF.
Library:
|
/* input:
   keyspaces: an array of strings, each string represents a keyspace "bucket.scope.collection" with proper escaping using grave-accent quotes wherever required
   prompt: the user's natural language request
   apikey: your openai api key
   model: string representing the model's name, see https://platform.openai.com/docs/api-reference/completions/create#completions-create-model for more details
   output: chat-completions api response with the generated sql statement */
function inferencer(k) {
    // Retrieve the collection's schema via INFER
    var infq = N1QL("SELECT t.properties FROM (INFER " + k + ") AS t");
    var res = [];
    for (const doc of infq) {
        res.push(doc);
    }
    return res[0];
}

function nl2sql(keyspaces, prompt, apikey, modelapi, model) {
    // Collect the schema of every keyspace to ground the LLM
    collectionSchema = {};
    for (const k in keyspaces) {
        c = inferencer(keyspaces[k]);
        collectionSchema[keyspaces[k]] = c;
    }
    collectionSchemaStr = JSON.stringify(collectionSchema);

    promptContent = `Information:\nCollection's schema: ${collectionSchemaStr}\n\nPrompt: "${prompt}"\n\nThe query context is set.\n\nBased on the above Information, write valid SQL++ and return only the statement and no explanation. For retrieval, use aliases. Use UNNEST from the FROM clause when appropriate.\n\nIf you are sure the Prompt can't be used to generate a query, first say "#ERR:" and then explain why not.`;

    data = {"messages":[{"role":"system","content":"You are a Couchbase Capella expert. Your task is to create valid queries to retrieve or create data based on the provided Information.\n\nApproach this task step-by-step and take your time."},{"role":"user","content":promptContent}], "model": model, "temperature": 0, "max_tokens": 1024, "stream": false};

    var dataStr = JSON.stringify(data)
        .replace(/\\/g, "\\\\")  // escape backslashes
        .replace(/"/g, '\\"');   // escape quotes

    var completionsurl = modelapi;
    var q = `SELECT CURL("${completionsurl}", { "request": "POST", "header": ["Authorization: Bearer ${apikey}", "Content-Type: application/json"], "data": "${dataStr}" }) AS result;`;
    var completionsq = N1QL(q);
    var res = [];
    for (const doc of completionsq) {
        res.push(doc);
    }

    try {
        content = res[0].result.choices[0].message.content;
    } catch (e) {
        return res;
    }

    // Strip the ```sql fence around the model's answer
    stmt = content.trim().substring(7, content.length - 4);

    // Only SELECT statements are executed automatically
    isSelect = stmt.substring(0, 6).toLowerCase() === "select";
    if (isSelect === false) {
        return { "generated_statement": stmt };
    }

    var runq = N1QL(stmt);
    var rrun = [];
    for (const doc of runq) {
        rrun.push(doc);
    }
    return { "generated_statement": stmt, "results": rrun };
} |
2. Upload the library.
Run the curl command below after copying the provided library code into a file, e.g., usingailib.js.
|
curl -X POST http://localhost:9499/evaluator/v1/libraries/usingailib --data-binary @usingailib.js -u Administrator:password |
3. Create the UDF.
Use the CREATE FUNCTION statement below to create the UDF once you have created the library:
|
CREATE OR REPLACE FUNCTION NL2SQL(keyspaces, prompt, apikey, modelapi, model) LANGUAGE JAVASCRIPT AS "nl2sql" AT "usingailib"; |
NL2SQL() now acts as your multilingual translator between human language and Couchbase's query engine. You simply give it some context and a natural language request, and it returns a response.
How the UDF Thinks
Under the hood, it uses your preferred model, specified when invoking the UDF, to understand your intent and generate a query that Couchbase can execute.
The advantage of using the chat completions API is that you can simply plug in a model from any other provider that is compliant with the same API spec. You can use your own private LLM or well-known ones from OpenAI, Gemini, Claude, etc.
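Provider portability follows from the fact that only the request shape matters. A minimal sketch of the payload the UDF builds (the field names mirror the library code above; `buildPayload` is a hypothetical helper, not part of the library):

```javascript
// Build the provider-agnostic chat-completions payload.
// Only the endpoint URL, API key, and model name change per provider.
function buildPayload(model, systemMsg, userMsg) {
  return {
    messages: [
      { role: "system", content: systemMsg },
      { role: "user", content: userMsg },
    ],
    model: model,     // e.g. "gpt-4o-2024-05-13" or "gemini-2.0-flash"
    temperature: 0,   // deterministic output for SQL generation
    max_tokens: 1024,
    stream: false,
  };
}

// Same payload shape, different provider: only the model name differs here,
// and only the endpoint URL and API key differ at request time.
const openaiReq = buildPayload("gpt-4o-2024-05-13",
  "You are a Couchbase Capella expert.", "Show me hotels in France");
const geminiReq = buildPayload("gemini-2.0-flash",
  "You are a Couchbase Capella expert.", "Show me hotels in France");
```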
The invoked UDF requires the following information from you:
- keyspaces – An array of strings, each representing a Couchbase keyspace (bucket.scope.collection). Use grave accent quotes where needed to escape special names (like `travel-sample`.inventory.route). This tells the UDF where to look for your data.
- prompt – Your request in plain English (or any other language). Example: "Show me all users who made a purchase in the last 24 hours."
- apikey – Your API key used for authenticating with the model endpoint.
- model endpoint – e.g., an OpenAI-compliant chat completions URL.
- model – The name of the model you want to use from the provider, e.g., "gpt-4o-2024-05-13".
There are also a few notable functions in the library:
inferencer()
Before generating a query, the UDF first tries to understand your data. The inferencer() helper function calls Couchbase's INFER statement to retrieve a collection's schema:
|
function inferencer(k) { var infq = N1QL("SELECT t.properties FROM (INFER " + k + ") AS t"); var res = []; for (const doc of infq) { res.push(doc); } return res[0]; } |
This schema is used to help the AI understand what kind of data lives inside each collection.
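To see how the per-keyspace schema map is assembled before it is serialized into the prompt, here is a small sketch with inferencer() stubbed out (the real function runs INFER through N1QL(); the sample fields returned by the stub are illustrative, not an exact INFER result):

```javascript
// Stub of inferencer(): the real one runs
// `SELECT t.properties FROM (INFER <keyspace>) AS t` inside Couchbase.
function inferencer(keyspace) {
  return { properties: { name: { type: "string" }, city: { type: "string" } } };
}

// Mirror of the schema-collection loop in nl2sql():
// one entry per keyspace, keyed by the keyspace string itself.
function collectSchemas(keyspaces) {
  const collectionSchema = {};
  for (const k in keyspaces) {
    collectionSchema[keyspaces[k]] = inferencer(keyspaces[k]);
  }
  return collectionSchema;
}

const schemas = collectSchemas(["`travel-sample`.inventory.hotel"]);
// The serialized map is what gets embedded in the LLM prompt:
const schemaStr = JSON.stringify(schemas);
```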
The main function: nl2sql()
- Collects all schemas for the given keyspaces using inferencer(), then constructs a prompt that includes: the inferred schemas, your natural language query, and a Couchbase-specific prompt to nudge the LLM.
- Sends it to the LLM.
- Extracts the generated SQL++ from the model's response.
- Executes it directly if it's a SELECT statement and returns both the generated SQL++ statement and the query results.
The reason for not executing non-SELECT statements is that you don't want this UDF to insert, update, or delete documents in a collection without you verifying it first. Instead, the generated SQL++ statement is returned so you can execute it yourself after it's been verified.
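The extract-and-guard step can be sketched in isolation. The model is expected to wrap its answer in a ```sql fence, which substring(7, length - 4) strips (7 characters for the opening fence plus newline, 4 for the closing newline plus fence); only statements that then begin with SELECT are run. The model reply below is a hypothetical example:

```javascript
// Sketch of the extraction + guard logic in nl2sql().
const content = "```sql\nSELECT h.name FROM `travel-sample`.inventory.hotel AS h;\n```";

// Strip the leading "```sql\n" (7 chars) and trailing "\n```" (4 chars):
const trimmed = content.trim();
const stmt = trimmed.substring(7, trimmed.length - 4);

// Only SELECT statements are executed; anything else is returned unrun
// so a human can review it before execution.
const isSelect = stmt.substring(0, 6).toLowerCase() === "select";
```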
Instance use case:
|
SELECT default:NL2SQL( ["`travel-sample`.inventory.hotel"], "Give me hotels in San Francisco that have free parking and free breakfast and a rating of more than 3", "", "https://api.openai.com/v1/chat/completions", "gpt-4o-2024-05-13" ); |
|
Result: [{ "$1": { "generated_statement": "SELECT h.name, h.address, h.city, h.state, h.country, h.free_parking, h.free_breakfast, r.ratings.Overall\nFROM `travel-sample`.inventory.hotel AS h\nUNNEST h.reviews AS r\nWHERE h.city = \"San Francisco\"\n AND h.free_parking = true\n AND h.free_breakfast = true\n AND r.ratings.Overall > 3;", "results": [{ "Overall": 4, "address": "520 Church St", "city": "San Francisco", "country": "United States", "free_breakfast": true, "free_parking": true, "name": "Parker House", "state": "California" }, { "Overall": 4, "address": "520 Church St", "city": "San Francisco", "country": "United States", "free_breakfast": true, "free_parking": true, "name": "Parker House", "state": "California" }, { "Overall": 5, "address": "520 Church St", "city": "San Francisco", "country": "United States", "free_breakfast": true, "free_parking": true, "name": "Parker House", "state": "California" }, { "Overall": 4, "address": "520 Church St", "city": "San Francisco", "country": "United States", "free_breakfast": true, "free_parking": true, "name": "Parker House", "state": "California" }, { "Overall": 5, "address": "465 Grant Ave", "city": "San Francisco", "country": "United States", "free_breakfast": true, "free_parking": true, "name": "Grant Plaza Hotel", "state": "California" }, { "Overall": 5, "address": "465 Grant Ave", "city": "San Francisco", "country": "United States", "free_breakfast": true, "free_parking": true, "name": "Grant Plaza Hotel", "state": "California" }, ... |
Experimenting with models from other providers
The next example uses Gemini's OpenAI-compatible API. You simply change the model provider's URL from the previous OpenAI API to Gemini's API. Also, remember to change the model parameter to a model it recognizes. Of course, you also need to update the api-key from OpenAI's key to Gemini's key.
|
SELECT NL2SQL( ["`travel-sample`.inventory.hotel"], "Show me hotels in France", "", "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions", "gemini-2.0-flash" ) AS p; |
The following illustrates the result:
|
[{ "p": { "generated_statement": "SELECT h.name AS hotel_name, h.city AS hotel_city\nFROM `travel-sample`.inventory.hotel AS h\nWHERE h.country = \"France\";", "results": [{ "hotel_city": "Giverny", "hotel_name": "The Robins" }, { "hotel_city": "Giverny", "hotel_name": "Le Clos Fleuri" }, … { "hotel_city": "Ferney-Voltaire", "hotel_name": "Hotel Formule 1" } ] } }] |
Conclusion
This blog provides a glimpse into how you can leverage AI to interact with your data in Couchbase. With this UDF, natural language querying becomes a reality: no SQL++ expertise required. It's model-agnostic, and because only SELECT statements are executed automatically, generated writes never touch your data without review.
And this is just the beginning. In the future, we hope to extend it to:
- Image → SQL++
- Voice → SQL++
- Agent-like pipelines
… all running inside Couchbase workflows.
References
Capella IQ: https://docs.couchbase.com/cloud/get-started/capella-iq/get-started-with-iq.html
Chat completions APIs:
https://platform.openai.com/docs/api-reference/chat
https://ai.google.dev/gemini-api/docs/openai#rest