Do More Newsletter

This issue features the article "Your Computer Is Smarter Than You Think: A Guide to Running AI Locally," plus product news about Similarweb AI Studio – Ask Any Market Question, JIM.com – AI Business Agent In Your Pocket, Fabricate – Natural‐Language Full‐Stack App Builder, askROI – AI Productivity Workspace for Teams, and NetSuite AI Innovations – Embedded Workflow Automation.

Keep up to date on the latest products, workflows, apps and models so that you can excel at your work. Curated by Duet.

Stay ahead with the most recent breakthroughs—here’s what’s new and making waves in AI-powered productivity:

Similarweb has launched AI Studio, an enterprise intelligence workspace that lets business users ask plain‑language questions about markets, competitors, and audiences and get consultant‑grade reports in seconds. Sitting on top of Similarweb’s massive data infrastructure (web traffic, search, apps, e‑commerce, and more), AI Studio can generate deep research summaries, opportunity analyses, and even auto‑updating dashboards without requiring analytics expertise, so product, marketing, and strategy teams can move from question to decision dramatically faster. For busy knowledge workers, this means less time stitching together spreadsheets and more time acting on clear, visual insights that are already tailored to their role.

JIM.com has brought its AI‑powered business platform to Android, turning a regular smartphone into a self‑driving finance and operations agent for micro and small businesses. The app’s AI Business Agent can handle payments, spin up payment‑enabled websites, surface nearby sales opportunities like pop‑ups and trade shows, generate marketing content, and analyze cash‑flow patterns through a simple chat interface. With features like tap‑to‑pay via NFC, instant payouts, and payment links, solo entrepreneurs and side‑hustlers can manage revenue, collections, and basic growth tasks from one consumer‑grade app instead of juggling multiple tools and terminals.

Fabricate has launched an AI‑driven full‑stack app builder that lets anyone describe an idea in everyday language and get a deployed web app with production‑ready React and TypeScript code in minutes. The platform automatically wires up databases, payment integrations, authentication, and global edge deployment, effectively compressing days or weeks of engineering work into a guided conversational flow. For solo founders, small teams, or even non‑technical operators inside larger companies, Fabricate turns app building into an iterative dialog with an AI that can generate, refine, and ship working software as fast as ideas evolve.

askROI, a subsidiary of Hyperscale Data, has positioned itself as a full AI productivity suite rather than just another chatbot. Inside one workspace, users get an AI assistant for research and task execution, prompt libraries and reusable workflows, knowledge management tools, automated document generation, and real‑time collaboration, all aimed at reducing manual, repetitive work. The platform is designed for business professionals and teams that want to centralize their AI usage into a single hub where documents, project context, and automations live together instead of being scattered across disconnected tools.

Oracle NetSuite has introduced new embedded AI capabilities that weave intelligence directly into finance, operations, and customer workflows, rather than adding yet another standalone assistant. Highlights include SuiteCloud Developer Assistant, which accelerates scripting, customization, documentation, and testing for NetSuite developers, and an AI‑enhanced planning agent in NetSuite EPM that helps finance teams model scenarios and plans more quickly. By turning disconnected ERP tasks into end‑to‑end intelligent workflows, these features aim to boost automation, shorten cycle times, and help companies unlock more value from the data already living in their NetSuite instance.

Similarweb’s AI Studio is built for one simple promise: turn any market question into a clear, usable answer without forcing teams to become data scientists or BI experts. On top of Similarweb’s vast data graph—spanning hundreds of millions of sites, billions of keywords, app usage, and e‑commerce signals—AI Studio lets users type natural‑language questions like “Where are my competitors gaining traffic?” or “Which channels drive the highest‑value visitors in my category?” and receive structured, consultant‑style reports. Instead of navigating dashboards or exporting CSVs, a marketing manager, product leader, or founder can have a dialog with the platform and progressively refine the question until the output directly supports a decision.

At the heart of AI Studio is its Deep Research capability, which synthesizes multiple data streams into narrative answers that feel like a mini market study. When a user asks a complex question—say, evaluating new market entry or benchmarking performance against a set of rivals—the system pulls together traffic trends, search behavior, audience profiles, and competitive movements into one cohesive storyline with charts and highlights. The benefit is twofold: non‑technical stakeholders can finally access the same level of insight previously reserved for power users, and experienced analysts can offload the tedious first pass of data gathering and formatting so they can focus on interpretation and strategy.

AI Studio’s AI Dashboards take that one step further by translating plain‑language descriptions into dynamic, auto‑updating visualizations. Instead of manually building and maintaining reports, a user can say “Create a dashboard that tracks my share of traffic versus these three competitors across organic search and paid channels” and have the system assemble and maintain it. As conditions change—new campaigns, shifting search intent, seasonal spikes—the dashboard stays in sync, reducing the upkeep burden on analytics teams and making sure stakeholders always see current data when they log in. This can dramatically cut the “reporting tax” that slows down many organizations and keeps insights stuck with a small group of specialists.

Importantly, Similarweb emphasizes that the real advantage isn’t just speed, it’s access and adoption across the company. By wrapping a mature data estate in a conversational interface, AI Studio lowers the barrier for teams in sales, partnerships, and even executive leadership to ask more—and better—questions about their markets. That wider adoption can change how organizations operate: decisions get made with fresh external data instead of stale assumptions, and insights flow outward from the analytics core to every function interacting with customers and competitors. For companies feeling pressure to “do more with AI” without fragmenting their tool stack, AI Studio offers a focused, high‑impact use case: making competitive and market intelligence as easy to query as a chat window.

Your Computer Is Smarter Than You Think: A Guide to Running AI Locally

You're reading this because you use AI — probably ChatGPT, Claude, Gemini, or some combination. You type a prompt, it zips off to a data center, and an answer comes back. It works great. Until the service goes down. Or hits you with a rate limit at 11 PM when you're on a roll. Or you realize you just pasted your company's proprietary code into someone else's server.

There's another option: running AI directly on your own computer. No cloud inference, no subscription fees, no prompts leaving your machine during use. And in 2026, it's gotten surprisingly easy.

Why Would You Want To?

The pitch for local AI boils down to three things: privacy, cost, and control.

Privacy is the big one. When you use a cloud AI service, your prompts travel to a remote server. For casual use, that's fine. But if you're working with sensitive business documents, medical records, legal contracts, or proprietary code, "fine" isn't good enough. With local AI, your prompts and responses are processed entirely on-device — as long as you haven't enabled any optional cloud or web-search features that some tools offer. You'll still need an internet connection initially to download models and tools, but once they're on your machine, inference happens locally.

Cost is often straightforward math. If you're paying $20–40 a month for AI subscriptions — or hundreds in API fees — a one-time hardware investment can eventually pay for itself. The break-even point depends on your usage volume, API pricing, electricity costs, and hardware lifespan, but for heavy users the case is compelling. And once you're set up, every additional prompt costs you essentially nothing beyond electricity.
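That break-even math is easy to sketch. The function below is illustrative only — the hardware cost, subscription price, and electricity estimate in the example are made-up placeholder numbers, not figures from any vendor:

```python
def breakeven_months(hardware_cost, monthly_subscription, monthly_electricity):
    """Months until a one-time hardware purchase beats ongoing subscription fees.

    Returns infinity if running locally doesn't actually save money each month.
    """
    monthly_saving = monthly_subscription - monthly_electricity
    if monthly_saving <= 0:
        return float("inf")
    return hardware_cost / monthly_saving

# Hypothetical numbers: a $1,600 GPU vs. a $40/month subscription,
# with roughly $5/month in extra electricity.
months = breakeven_months(1600, 40, 5)  # ~45.7 months
```

Plug in your own usage and prices; for API-heavy users paying hundreds per month, the payback period shrinks dramatically.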

Control means fewer external dependencies. Your AI works when your computer works, even offline. No outages on someone else's servers, no rate limits, no surprise changes to a service you depend on. You can also swap between different models for different tasks without asking anyone's permission. (You're still subject to each model's license terms and your local hardware's reliability, but you've removed the biggest external bottleneck.)

The Catch (There's Always a Catch)

Let's be honest about the tradeoff. Local models are generally not as capable as the top cloud models for the hardest tasks. Claude, GPT-4, and Gemini are massive, trained on enormous compute budgets, and constantly improving. On broad leaderboards like Chatbot Arena, the top-ranked models are still largely proprietary — though the gap has narrowed significantly, and leading open-weight models now compete seriously in many categories.

For a lot of everyday tasks — drafting, summarizing, coding assistance, brainstorming, data analysis, translation — strong open-weight models can feel close to the subscription chatbots. Performance varies widely by task, though. A local model might nail your Python question and stumble on a tricky multi-step reasoning problem. The question isn't "is local AI as good as cloud AI?" It's "is local AI good enough for what I need?"

What You Need: Hardware

The good news: you probably don't need new hardware to get started. Below is a rough guide to what runs what. Keep in mind that context length and quantization level affect memory requirements just as much as raw parameter count.

8 GB RAM, any modern CPU (your basic laptop): You can run small models in the 1–3 billion parameter range. Think Meta's Llama 3.2 1B or 3B, Microsoft's Phi-4-mini (3.8B), or Qwen 2.5 1.5B. These handle basic Q&A, simple coding help, and text generation. They won't write your novel, but they'll draft an email.

16 GB RAM (a decent modern laptop or desktop): You're in the sweet spot for models in the 7–14 billion parameter range. Mistral 7B, Qwen 2.5 14B, and Microsoft's Phi-4 (14B) perform well here. This is where local AI starts to feel genuinely useful for daily work.

A gaming GPU with 24 GB VRAM (like an RTX 3090 or 4090): Now you're running the really capable stuff. Qwen 2.5 Coder 32B — which the Qwen team reports performs comparably to GPT-4o on the Aider coding benchmark — can fit comfortably with 4-bit quantization. Heavily quantized 70B-parameter models like Llama 3.1 70B or Llama 3.3 70B can also run at this tier, though you may need CPU/RAM offloading with a speed penalty, and KV-cache memory grows with longer context windows.

Apple Silicon Macs: The M-series chips deserve special mention. Their unified memory architecture — where CPU and GPU share the same memory pool — simplifies allocating large model weights compared to discrete-VRAM systems. An M4 Pro with 48 GB of unified memory can run impressively large models. Apple's MLX framework has become a solid option for local inference on Apple hardware.

For the truly dedicated, purpose-built AI hardware has arrived. NVIDIA's DGX Spark and systems powered by AMD's Ryzen AI Max+ 395 both pack 128 GB of unified memory into desktop-friendly form factors. The DGX Spark runs about $4,000; AMD-based alternatives start well below that depending on configuration. These are overkill for most people, but they represent a new category of "AI workstation" that didn't exist two years ago.

The Software: Easier Than You Think

The software side is where things have gotten dramatically simpler. Two tools dominate the beginner-friendly space.

Ollama is the command-line option. Install it, open a terminal, type ollama run llama3.2, and you're chatting with a local AI. That's it. Ollama handles model downloading and local serving, and its library models default to efficient 4-bit quantization. It also exposes an API that's compatible with OpenAI's format, which means apps built for ChatGPT can often be pointed at your local Ollama server with a one-line configuration change. If you're comfortable with a terminal, Ollama is the fastest path from zero to running local AI.
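To make that API compatibility concrete, here's a minimal sketch that talks to Ollama's OpenAI-format endpoint using only the Python standard library. It assumes Ollama is running on its default port (11434) and that you've already pulled llama3.2; model name and prompt are placeholders:

```python
import json
import urllib.request

# Ollama exposes an OpenAI-compatible chat endpoint on localhost by default.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model, prompt):
    """Build an OpenAI-format chat payload that Ollama's local server accepts."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask_local_model(model, prompt):
    """POST the request to the local Ollama server and return the reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `ask_local_model("llama3.2", "Summarize this paragraph: ...")` returns the model's reply — and because the payload is standard OpenAI format, swapping a cloud app over to your local server is often just a base-URL change.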

LM Studio is the graphical option. It gives you a ChatGPT-like interface: download a model with a click, start chatting. It shows real-time stats on memory usage and token generation speed, which helps you understand what your hardware can handle. LM Studio also supports Apple's MLX framework natively, making it an excellent choice for Mac users. If you prefer clicking to typing, start here.

Other notable tools include Jan (privacy-by-default, open-source, and capable of running fully offline for local models), vLLM (a production-grade serving engine built for high-throughput inference — a different beast from the desktop tools), and AnythingLLM (an orchestration layer that adds document ingestion and retrieval, letting you chat with your own files locally, and can connect to multiple backends).

Under the hood, many desktop local-AI apps — including Ollama, LM Studio, and Jan — use llama.cpp, a remarkable open-source project that makes large models run efficiently on consumer hardware through clever optimization and quantization. But not everything in the local AI world is built on llama.cpp: vLLM is its own GPU-optimized serving engine, and tools like AnythingLLM are wrappers that can talk to multiple backends.

The Models: An Embarrassment of Riches

Two years ago, picking a local model meant choosing between a few options. Today, the open-weight model ecosystem is thriving. A Stanford HAI analysis documented Chinese AI labs' growing share of open-weight model downloads in 2025, with Qwen surpassing Llama in total downloads — a shift that's brought fierce competition and rapid quality improvements.

Meta's Llama family remains the default starting point. Llama 3.2 includes lightweight text models at 1B and 3B parameters — designed for edge and mobile devices — plus vision models at 11B and 90B. For beefier text work, Llama 3.1 offers 8B, 70B, and 405B variants with 128K-token context windows, and Llama 3.3 provides a refined 70B text model. Llama is the Honda Civic of local AI — reliable, well-documented, works everywhere.

Alibaba's Qwen family has arguably become the most-downloaded open model family. Qwen 3 includes coding-focused releases and strong multilingual support. The Qwen 2.5 Coder 32B model, in particular, has earned a reputation for strong coding performance that fits on a single 24 GB GPU.

DeepSeek made waves when its R1 reasoning model showed that open models could match proprietary ones on complex reasoning tasks. DeepSeek V3.2, released in late 2025, pushes this further as a Mixture-of-Experts model with 685 billion total parameters. Its architecture includes one shared expert and 256 routed experts, with 8 experts activated per token, resulting in roughly 37 billion active parameters during inference. Running the full model locally requires serious hardware, but distilled versions are very capable.
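The "8 of 256 experts per token" idea is just top-k gating. The sketch below is a generic illustration of that routing pattern — not DeepSeek's actual implementation, which involves learned gate networks and a shared expert on top:

```python
import math
import random

def topk_route(gate_scores, k=8):
    """Pick the k highest-scoring routed experts for one token and
    softmax-normalize their scores into mixing weights (generic top-k MoE gate)."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:k]
    # Softmax over only the chosen experts' scores.
    exps = [math.exp(gate_scores[i]) for i in chosen]
    total = sum(exps)
    weights = [e / total for e in exps]
    return chosen, weights

# 256 routed experts, 8 active per token: only the chosen experts' weights
# participate in inference, which is why active parameters stay small.
scores = [random.random() for _ in range(256)]
experts, mix = topk_route(scores, k=8)
```

Because only the selected experts run for each token, compute and active memory scale with the 8 chosen experts rather than all 256 — the core trick that lets a 685B-parameter model infer with roughly 37B active parameters.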

Google's Gemma 3 punches above its weight through distillation from the larger Gemini models. Google's benchmarks show strong positioning for the 27B variant relative to much more resource-intensive models, though specific claims should be checked against particular benchmarks. The 1B version is designed for mobile and web deployment via Google AI Edge.

Mistral's Ministral models at 3B and 8B parameters are optimized for speed on resource-constrained hardware, making them practical choices for devices where every millisecond of latency counts.

Microsoft's Phi-4 (14B) excels at complex reasoning and math relative to its size, and the newer Phi-4-mini (3.8B) brings function calling and multilingual support to even more constrained hardware.

What Quantization Means (And Why You Should Care)

You'll encounter the term "quantization" immediately when exploring local AI. Here's the short version: AI models store their knowledge as numbers (called "weights"). Many open-weight releases ship in 16-bit formats like BF16 or FP16. Quantization compresses these to 8-bit, 4-bit, or even fewer bits, dramatically reducing memory requirements at the cost of some accuracy.

A 70-billion-parameter model at 16-bit precision needs over 140 GB of memory. Quantized to 4-bit, the weights alone fit in about 35 GB — though real-world memory usage also includes KV cache, runtime buffers, and overhead that grows with longer context windows. The quality loss from quantization is often surprisingly small — for most everyday use, it's hard to notice.
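The arithmetic behind those figures is simple enough to sketch. This estimator counts weights only (decimal gigabytes), deliberately ignoring the KV cache and runtime overhead mentioned above:

```python
def weight_memory_gb(params_billion, bits):
    """Approximate memory for model weights alone:
    parameter count x (bits / 8) bytes each, in decimal GB.
    Excludes KV cache, runtime buffers, and other overhead."""
    bytes_total = params_billion * 1e9 * bits / 8
    return bytes_total / 1e9

fp16 = weight_memory_gb(70, 16)  # 140.0 GB at 16-bit
q4 = weight_memory_gb(70, 4)     # 35.0 GB at 4-bit
```

Run the same numbers for a 14B model at 4-bit and you get about 7 GB — which is why that size lands comfortably in the 16 GB RAM tier.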

For llama.cpp-based desktop tools (Ollama, LM Studio, Jan), GGUF is the dominant file format for quantized models. Other ecosystems use formats like GPTQ or AWQ. Most tools handle the details automatically — you just pick a model and go.

Getting Started in Five Minutes

Here's the fastest path to your first local AI conversation:

Install Ollama from ollama.com. Open your terminal. Type ollama run llama3.2. Wait for the download (time varies from minutes to an hour or more depending on model size and your connection speed). Start typing.

That's it. You're running AI locally.

If you want the GUI experience, download LM Studio from lmstudio.ai. Open it, search for a model (start with Llama 3.2 3B if your hardware is modest), click download, and start chatting.

From there, experiment. Try different models. Ask them to help with the kinds of tasks you'd normally use cloud AI for. See where local models shine and where they struggle. You might be surprised how capable that little box on your desk has become.

The Bottom Line

Running AI locally isn't about replacing cloud AI entirely. The frontier models from Anthropic, OpenAI, and Google still lead on the hardest tasks — though the gap has narrowed meaningfully, and open-weight models continue to improve at a rapid pace. For a growing list of everyday uses — especially those involving sensitive data, offline work, or simple cost avoidance — local AI has crossed the threshold from "interesting experiment" to "genuinely practical tool."

The hardware is capable. The software is mature. The models are good and getting better fast. If you've been curious, there's never been a better time to try it.

Partner Spotlight: Duet Display

With Duet, you can use an iPad, Android, Mac, or PC as an extra monitor for your primary computer with low‑latency performance, touch support where available, and simple setup that works well in home offices, studios, and on the go. For creators and professionals juggling design tools, timelines, and dashboards, Duet Display makes it easier to spread out workflows, keep reference material visible, and maintain focus across complex projects. Learn more at Duet Display.

Stay productive, stay curious—see you next week with more AI breakthroughs!