Gemini Nano in Chrome 138: notes for AI Engineers
at long last, Gemini Nano is almost here for all Chrome users. I was reminded by this HN post.
I don’t like the way Google writes docs, so this blog post is basically me rewriting their docs in a way that fits my brain.
they have a few APIs for commonly used patterns on offer, but really the main one you’ll care about as an engineer is the Prompt API, the most flexible/open-ended one.
setup
Unlike the initial overpromise of window.ai, the currently released implementation is much less “clean”. anyway, here’s the current way to set it up.
- make sure you have Chrome 138+
- go to chrome://flags/#prompt-api-for-gemini-nano and turn it on (unfortunately you’ll have to relaunch Chrome)
- then download the model by calling LanguageModel.create() for the first time - takes a few mins on home wifi. Gemini says it “has an approximate download size ranging from 1.5 GB to 2.4 GB”, so let’s say that’s a 4-6B model at a 4-8 bit quantization.
const session = await LanguageModel.create({
  monitor(m) {
    m.addEventListener("downloadprogress", (e) => {
      console.log(`Downloaded ${e.loaded * 100}%`);
    });
  },
  // uncomment if you want multimodal input https://developer.chrome.com/docs/ai/prompt-api#multimodal_capabilities
  // expectedInputs: [
  //   { type: "audio" },
  //   { type: "image" }
  // ]
})
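btw, if you want to check whether the model is already downloaded (or even downloadable on this machine) before calling create(), the Prompt API exposes an availability() method - a quick sketch, assuming the Chrome 138 shape of the API:

// returns "unavailable" | "downloadable" | "downloading" | "available"
const status = await LanguageModel.availability();
console.log(status);
if (status === "unavailable") {
  console.warn("Prompt API / Gemini Nano isn't supported on this device or channel");
}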
basic important things
the loaded model has a 6k token context (just ask for inputQuota without any initialPrompts):
session.inputQuota
// 6144
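you can also keep an eye on how much of that quota you’ve burned - per the Prompt API explainer there is a session.inputUsage counter and a measureInputUsage() helper (hedging here: this is from the spec/explainer, double-check it’s in your shipped build):

session.inputUsage
// tokens consumed by this session so far

// measure a prompt *before* sending it so you don't blow past inputQuota
const cost = await session.measureInputUsage("some long prompt...");
if (session.inputUsage + cost > session.inputQuota) {
  console.warn("this prompt won't fit in the remaining context");
}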
Now unlike the Gemini Nano team, I happen to be a guy who thinks that function calling/JSON output is very important, so let’s see how to get this going in Gemini Nano, with prompt examples stolen from Hamel and Jason:
const JSONschema = `<schema>
{
  "description": "Correctly extracted \`UserDetail\` with all the required parameters with correct types",
  "name": "UserDetail",
  "parameters": {
    "properties": {
      "age": {
        "title": "Age",
        "type": "integer"
      },
      "name": {
        "title": "Name",
        "type": "string"
      }
    },
    "required": [
      "age",
      "name"
    ],
    "type": "object"
  }
}
</schema>`
const JSONsession = await LanguageModel.create({
  initialPrompts: [
    { role: 'system', content: 'You are a helpful LLM that only responds in valid JSON fitting a schema: ' + JSONschema },
    { role: 'user', content: "Extract Jason is 35 years old" },
    { role: 'assistant', content: '{age: 35, name: Jason}' },
  ]
});
const result1 = await JSONsession.prompt("Extract sarah is 22 years old");
console.log(result1);
// {age: 22, name: Sarah}
pitfalls
it doesn’t do great instruction following, so required fields aren’t really respected:
const result1 = await JSONsession.prompt("its been a year since vibhu's birthday, he was 28 last year, guess how old he is now");
console.log(result1);
// { "age": 29 }
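one mitigation: the Chrome Prompt API docs also describe a native structured output option, where you pass a JSON Schema as responseConstraint to prompt() (same idea as the responseConstraint example further down, just without the wrapper) - hedged sketch, the plain-object schema here is mine and this may need a recent Chrome:

const userSchema = {
  type: "object",
  properties: {
    age: { type: "integer" },
    name: { type: "string" }
  },
  required: ["age", "name"]
};
// per the Prompt API structured output docs
const constrained = await JSONsession.prompt(
  "Extract sarah is 22 years old",
  { responseConstraint: userSchema }
);
console.log(constrained);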
the other thing is that sessions are stateful by default, which can be a little nasty if you forget. So a stateless version looks like:
const baseSession = await LanguageModel.create({
  initialPrompts: // blah blah, as above
})
// you can also implement this as a class if you want to force users to use the `new` keyword to make super clear it is stateless
const statelessSession = {
  async prompt(str) {
    // clone the base session each time so no state leaks between calls
    const clonedSession = await baseSession.clone()
    return clonedSession.prompt(str)
  }
}
// these are all stateless calls now! yay repeatability and predictability!
const result1 = await statelessSession.prompt("Extract sarah is 22 years old");
console.log(result1);
const result2 = await statelessSession.prompt("Extract tanisha is 30 years old");
console.log(result2);
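one extra caveat with the clone-per-call trick: each clone presumably holds its own resources until it gets garbage collected, so if you’re making lots of calls you may want to destroy() clones when you’re done (destroy() is part of the Prompt API) - minimal tweak:

const statelessSessionTidy = {
  async prompt(str) {
    const clonedSession = await baseSession.clone()
    try {
      return await clonedSession.prompt(str)
    } finally {
      // free the clone's resources instead of waiting for GC
      clonedSession.destroy()
    }
  }
}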
pitfalls like these are why you will probably want little wrapper libraries, either hand-rolled or something off the shelf like https://github.com/kstonekuan/simple-chromium-ai
the last tip here for non-JS pros is how to import those wrapper libraries in browser contexts (aka without npm install or a build step) using ESM syntax (you may need a <script type="module"> tag, and to run on localhost or a site with relaxed CSP):
// alternatively use https://cdn.jsdelivr.net/npm/simple-chromium-ai@0.1.1/dist/simple-chromium-ai.mjs
const ChromiumAI = await import('https://unpkg.com/simple-chromium-ai@0.1.1/dist/simple-chromium-ai.mjs');
const ai = await ChromiumAI.initialize("You are a friendly assistant");
const response = await ChromiumAI.prompt(ai, "Tell me a joke");
console.log(response);
// Why don't scientists trust atoms? Because they make up everything!
// and of course... the structured output implementation now works:
const schema = {
  type: "object",
  properties: {
    sentiment: {
      type: "string",
      enum: ["positive", "negative", "neutral"]
    },
    confidence: {
      type: "number",
      minimum: 0,
      maximum: 1
    },
    keywords: {
      type: "array",
      items: { type: "string" },
      maxItems: 5
    }
  },
  required: ["sentiment", "confidence", "keywords"]
};
// Prompt with a response constraint
const response = await ChromiumAI.prompt(
  ai,
  "Analyze the sentiment of this text: 'I love this new feature!'",
  undefined, // no timeout
  { responseConstraint: schema }
);

// Response will be valid JSON matching the schema
const result = JSON.parse(response);
console.log(result);