THE Kartik Voice

Microsoft Foundry is Microsoft’s unified platform for building, managing and deploying AI solutions. It is a platform-as-a-service that provides tools and services for complete AI end-to-end lifecycle management. It helps developers and data scientists to collaborate. It supports traditional Generative AI and ML Models.

This foundation is built on top of production-quality infrastructure and user-friendly interfaces.
It allows developers to concentrate on developing the application and not be responsible for the infrastructure.
Microsoft Foundry integrates agents, models, and tools within a single grouping for enterprise management, and provides built-in enterprise readiness features such as tracing, monitoring, evaluations and customizable enterprise setup configurations.
A unified role-based access control (RBAC), networking, and policies are provided under one Azure resource provider namespace, making the platform easier to manage.

In simple words, Microsoft Foundry is a platform from Microsoft that helps you build, test, and deploy AI applications in one place. Instead of using different tools for models, data, monitoring, and security, everything is available in a single platform.

Why Use Azure AI Foundry?

We can access AI models, build applications, test them, and deploy them, the complete end-to-end solution without switching between multiple services.
We can have access to many more models from OpenAI, Claude AI, Grok AI, Mistral AI, DeepSeek, and Microsoft models, and we can choose the one which fits to our use case.
We can create agents that can connect to SharePoint, databases, APIs, and other enterprise systems.
Microsoft Foundry has ready to use templates to start quickly with chatbots, document analysis, code generation, and many more.
Vector stores and retrieval-augmented generation (RAG) allow your AI solution to answer questions based on your own documents and company data.
Optimize models using data from your own business to get better results. Monitor model responses, performance, and mistakes to gain insights into the behavior of your AI application.
Security, compliance, and responsible AI controls come with the enterprise grade security from the beginning.
Scale and transition from a small prototype to a production-ready enterprise application on the same platform.

The Microsoft Foundry model landscape (who’s who)

AI Model Providers

Step-by-Step Setup Guide

Follow the complete Blog to get started with Microsoft Foundry

1. Log in to the Azure Portal. Visit portal.azure.com and sign in with your Azure account. If you don’t have one, you can create a free trial account.

2. Create a Resource Group Resource Groups act as containers for your Azure resources.

Search for “Resource groups” in the portal.
Click Create.
Give it a meaningful name (e.g., "Microsoft Foundry").
Choose a region. I used East US 2 (or a similar region) because it often has better availability for the latest AI models.
Review and create the resource group.

3. Deploy Microsoft Foundry

Inside your new resource group, click Create and search for "Microsoft Foundry".
Follow the deployment process: Select your subscription and the resource group you have created. Provide a name for your Microsoft Foundry instance. Choose the appropriate region.
Review the settings and deploy. Deployment may take a few minutes.

Once you create the Microsoft Foundry, you will see 2 options

Foundry = Main workspace that manages AI resources and settings.
Foundry Project = Actual working area where you build and test AI applications.

4. Access the Microsoft Foundry dashboard. Once the deployment is completed, open your Foundry resource. Here, you’ll see the overview page containing important details such as:

Endpoints and API keys (essential for connecting your applications).
Subscription and billing information.
Quick links to key sections.

Endpoints and API keys

Exploring Key Features in Microsoft Foundry

After setup, the platform opens up a rich set of capabilities:

Model Catalog – Browse and deploy a wide variety of models, including GPT-series from OpenAI, Grok, Mistral, and Microsoft’s Phi and other foundation models from the Foundry portal.

Playgrounds – Experiment with chat, image generation, audio, and other modalities without writing code. This is great for rapid prototyping.

Agents – We can build intelligent agents that can use tools, connect to MCP servers, SharePoint, Logic Apps, databases, and more.

Templates – You can also quickly get started with prebuilt templates for use cases like chatbots, code modernization, and other business solutions. Just pick a template based on your need, customize it, and start building without doing everything from scratch.

Fine-Tuning Customize models with your own industry-specific data to improve performance on domain tasks.

Monitoring we can monitor usage, performance and costs in real time of our applications.

Evaluation Evaluate the performance and safety of our AI models and agents.

Under Evaluation, there is an option of Evaluator Library, which is a collection of prebuilt testing tools that are used to measure how good, safe, and accurate our AI application is.

Why Is It Used?

Suppose you build:

Chatbot
RAG application
Copilot
Q&A assistant

Now you need to check:

Is the answer correct?
Is the response relevant?
Is harmful content generated?
Is grounding proper?
Is hallucination happening?

The Evaluator Library helps measure all this automatically.

Example

If your chatbot answers: "The capital of India is Mumbai". The evaluator can detect: Incorrect answer, having Low relevance, Poor grounding.

Batch jobs Batch jobs in Microsoft Foundry are used to process large numbers of AI requests together in bulk, instead of sending one request at a time. We can process many records/files/prompts in one go.

Conclusion - Microsoft Foundry is an end-to-end AI development platform used to create, manage, and deploy generative AI applications. Microsoft Foundry makes AI development easier by putting everything at one platform. Whether you’re just starting with AI or already working on AI projects, Foundry is really worth trying. It is easy to set up, simple to use, and helps you turn your ideas into real applications much faster.

Happy Exploring! Happy Learning!

Recently, DeepSeek released a paper and came up with a new approach called Context Optical Compression that stores text as vision tokens to help AI process long context in a more efficient manner.
I have broken down their whole research in an easy and beginner-friendly manner so you can understand what exactly DeepSeek discovered and why the whole AI world is talking about it.

Before going into depth, let's talk about a few things

OCR (Optical Character Recognition) - OCR means reading a text from an image.

Token - A token is a small piece of a word. Example - "Kartik" may be split into "Kar" + "tik" or "Kartik".
LLM (Large Language Model) - LLM models are trained on massive datasets and provide answers depending on the information they have learnt.
Context Window - Is the memory used by the AI to recall the previous conversation.
Vision Token - A small piece of an image that AI can read and understand.

DeepSeek-OCR: Contexts Optical Compression

Goal - The Idea is to encode the equivalent of a thousand words in a single image and have the model read them back. This approach has the potential to transform the way of thinking regarding AI memory and long context processing. This can be useful to process very long contexts, possibly reaching 10 million tokens or beyond, which is its aim.

Why does AI need this?

Currently, the Large Language Models (LLMs) like DeepSeek, ChatGPT, and Gemini talk to us using tokens and they do struggle with processing long textual content due to limited memory. Because of this, they forget the past conversation.

Right Now,

1 word = 1 token
More words - more tokens - more memory needed

There is always a memory limit for a normal account and even for the premium account. The Longer you chat to an AI Model, the more tokens it uses. Once the limit is full, it starts forgetting the old parts of the conversation. This is the Big Problem with the Model.

What is DeepSeek's new Idea?

What if instead of storing the text as text, we could store text in an image and later these images are broken down into vision tokens and AI can read them back. This idea is called COC (Context Optical Compression).

Breakdown of the COC-

Context - memory or conversation history
Optical - using images
Compression - making things compressed (smaller)

Why Images? - Because Images take less space and can store well amount of the data. Imagine you have taken a picture of the classroom board instead of writing everything in your notebook. The photo stores everything in less space.

Same with the AI:

That's the reason DeepSeek wants to use Images as AI Memory.

How Good Is DeepSeek’s Compression?

DeepSeek OCR can convert text into vision tokens and convert these vision tokens into text with high accuracy. Here is how well DeepSeek performs in the benchmark:

According to the stats. 100 tokens can store around 1000 words with almost perfect accuracy. This is 10 times smaller than the normal way.

Can This Change the Future of AI Memory?

Yes, today AI can handle maybe 128k to 1M tokens in long chats. With DeepSeek's compression idea, it could go to 10M to 20M.
Benefits from this - AI can remember more, faster response, cheaper computation

How Do Images Become Tokens?

DeepSeek uses a model called ViT: Vision Transformer to read images. In Simple terms, ViT cuts an image into small patches. Each patch becomes a token that the AI can understand.

Example:
A patch of 16x16 pixels has 256 pixels.
Each pixel has 3 colors (red, green, blue).
So 256 x 3 = 768 numbers = embedding for that patch. This lets the AI understand the image in small parts.

DeepSeek’s Secret Ingredient: Deep Encoder

The big issue with this process is that Images can produce too many vision tokens, which again increases the memory. So DeepSeek added a smart tool called the Deep Encoder.
Deep Encoder helps with:

Reducing the number of vision tokens
keeping only important parts
handling high-quality images better

It works in 2 stages:

Stage 1:
SAM (Segment Anything Model) - SAM always looks at which parts of the image matter the most.

Example - If the image has a page with text along with the background, SAM focuses on the text part, not the blank spaces.

Stage 2:
CLIP + ViT + Deep Encoder - Once SAM selects important areas:

CLIP ViT creates embeddings (understandable picture pieces)
Deep Encoder compresses these pieces into fewer tokens

Finally, it sends the compressed vision tokens to DeepSeek-3B MOE.

What Is DeepSeek-3B MOE (Mixture of Experts)?

It is a decoder model that chooses which expert module is best for the job. It has 3B total parameters, but only 570M are active at a time. This makes it fast and efficient. This decoder reads the vision tokens and converts them back to text.

Different Modes - DeepSeek OCR has different modes depending on how much detail is needed:

Why does this matter so much?

DeepSeek is not just improving the Optical Character Recognition but they are changing how AI stores, compresses and remembers information. This is going to be the biggest change in the way LLM works.

This research could lead to:

AI systems with huge memory to remember more context
Better knowledge storage
Faster processing
Cheaper AI costs

This could become a new type of AI memory.

Where Is It Available?

Research paper is released - https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf
Code is on GitHub - https://github.com/deepseek-ai/DeepSeek-OCR

Happy Exploring! Happy Learning!

Each day, there is some article on AI a new LLM, Gen AI transforming the world, or AI Agents and Agentic AI as the next big thing. This will sound exciting, yet a little bit confusing, right? Are they same under different titles or do they really mean something different?

This is what I am explaining here. No technical terms, no textbook descriptions and just clear, real-life cases so you can finally get to see what makes Gen AI, AI Agents and Agentic AI special (and how all these relate to each other).

Generative AI

Generative AI is a kind of Artificial Intelligence that can generate the new content. It learns patterns using large volumes of data, either available on the internet or data that we feed it, and then produces text, images, audio, video, or even code.

The Generative AI systems, which are most commonly implemented, are based on the LLM (Large Language Model). The examples are ChatGPT, Gemini, Claude, or Perplexity. These models are trained on massive datasets and provide the answers depending on the information they have learnt.

In its simplest form, generative AI is reactive. It only reacts to your input, It does not think in advance, It does not have any actual memory, unless it is made to store context.

Example: Write emails, summarize documents, generate content, generate images and voiceovers, and many others. When you query a gen AI model by asking, "What is the weather today", it cannot provide the answer unless and until it is linked to the live data.

Advantages of using Gen AI

Content Creation: write articles, social media posts, emails
Automating Repetitive Tasks: Auto-generate product descriptions, summarizing documents, generating templates.
Multimodal Capabilities: Create logos, marketing visuals, generate voiceovers, podcasts
Boosting Productivity: Speed up the creative process, better decision-making

Limitations of using Gen AI

· Data Cutoff: Gen AI models can be trained until a certain date and they do not understand what has changed in real-time.
No Initiative: They will do nothing without prompting.
Accuracy Issues: Sometimes generate incorrect or hallucinated information.
No Personalization: It forgets past interactions without memory.
No Tool Use: Is unable to check live weather, flight prices, or to carry out transactions that are not connected to external APIs.

Real-World Examples of Gen AI

ChatGPT: Generates human-like text responses.
DALL·E / MidJourney: Creates images from text prompts.
GitHub Copilot: Assists developers by generating code.
Runway: Generates videos and creative media.

Recent Enhancements

The most significant change is the introduction of memory. More recent models, such as ChatGPT and other models, are able to both recall your preferences and context throughout communication.
Models can now work with large quantities of information simultaneously, think hundreds of pages of documents as opposed to a few pages.

AI Agent

AI Agent is a program that accepts the input, thinks, and performs an action to finish a task with the help of tools, memory, and knowledge. Unlike pure Generative AI, which only reacts with an answer, an AI Agent can actually do something with the information it generates. It is more independent and has some autonomy to make decisions. Usually, AI agents are designed to perform narrow, simple, and specific tasks effectively.

Once you create the LLM for your use case and give it access to external APIs or tools, your LLM is now smart enough to take action. For example, it can call the flight API and fetch the latest price of the ticket.

Let's say your LLM is not able to provide the response for the particular input, it will keep on looking for the external things that will be able to handle this particular case like – what is the weather today?

Think of it as A personal assistant who doesn’t just tell you “Flights are available” but actually goes and books one for you.

Advantages of AI Agents

Works Automatically: You only need to give it a task and it takes care of all the steps without you having to guide it every time.
Uses Tools and Apps: It is able to integrate with other applications, search the Internet, do data analysis, and manage programs to achieve tasks.
Saves Time: It could be available 24/7, so you do not have to do repetitive or multi-step tasks.
Remembers the Task Context: It maintains a record of its actions while doing the task, and therefore does not lose its way.

Limitations of AI Agents

Can Be Slow and Costly: Since it involves heavy processing with AI, it may take time and may be expensive to execute.
Makes Mistakes Sometimes: In the case of misunderstanding your purpose or the tool response, it could retrieve the incorrect information such as displaying you the prices of wrong date or leave a part of an answer.
Security Risks: it has access to other tools and data, there is a risk unless it is closely monitored.
Hard to Debug: When things go wrong it may be difficult to tell where and why it went wrong.

Real-World Examples AI Agents

Research Helper: Does the research work online and summarizes it on your behalf.
Data Assistant: It gathers, cleans and examines data automatically.
Travel Serch: Finds you the perfect flight and hotel deals.
Customer Support Agent: Reads the messages of the customers, interprets the problem, and forwards the request to the corresponding department.

Recent Improvements

Teamwork Features: New tools like AutoGen let multiple agents work together like a team.
Better Decision-Making: Agents are getting better at thinking through tasks and checking their own work.
Self-Correction: Some agents can now notice their own mistakes and try to fix them before finishing a task.

Tools for Building AI Agents

Zapier , N8N, LangChain , AutoGen , CrewAI

Tools AI Agents Can Use

AI agents often rely on other tools to do their jobs. Here are a few examples:

Web Browsers: To look up real-time information (e.g., weather, news, prices).
Calendars: To schedule meetings or check availability.

Databases/APIs: To fetch or update data (e.g., pulling customer info from Salesforce).
Code Interpreters: To run Python scripts for data analysis or file processing.
Email/Chat Apps: To send messages or notifications.

Agentic AI –

The next stage of AI Agents is Agentic AI. Agentic AI is an AI system that can make decisions without human intervention and can take actions on its own to achieve a goal without being told exactly what to do at every step. AI Agents take actions when instructed to do something. Whereas Agentic AI will act on its own, it is independent, thinks in advance, and acts proactively. It is not waiting to be told what to do, it can even know what you need.

In the Agentic AI, Multiple AI Agents will be there, and they will be collaborating with each other to complete the requirement.

Advantages of Agentic AI

Teamwork: Several agents collaborate with each other and each one of them takes a portion of the task.
Greater Accuracy: Agents check the work of each other and minimize mistakes and hallucinations.

Solves Complex Problems: Excellent in complex, multi-step problems such as making a detailed trip plan including booking the tickets, book hotels and related places to visit at the location or handling large data sets.
Flexibility: The agents are able to focus on various specializations (finance, travel, data) and merge their expertise.

Limitations of Agentic AI

Costly: More computing power and resources are required as many agents are being employed.
Hard to Implement: It is more difficult to design and coordinate a team of agents compared to when using one AI.
Slowness: Overkill, Simple tasks and Simple overweight projects.
Requires Supervision: Human monitoring is required to make sure that the team is working in a proper and safe manner.

Real-World Examples of Agentic AI

Trip Planning: A team of agents works together one finds flights, one books hotels, one creates an itinerary, and another checks for errors or better options.
Business Reports: One agent can gather data for you, another agent will analyze the data, another agent can create visuals, and the last agent can write the summary for you.
Software Development: One agent will write the code, another tests it, and a third reviews it for bugs or improvements.
Customer Support: It can look up your order, issue a refund, schedule a pickup, and confirm everything via email - all autonomously.

Tools for Building Agentic AI

CrewAI: good for creating teams of role-based agents (eg: researcher, writer) that collaborate on complex tasks.
AutoGen: Microsoft's framework for building custom groups of conversational agents that talk to each other to solve problems.
LangChain: Helps developers build multiple applications by connecting LLMs with external data sources and tools, enabling multi-step reasoning.
n8n: Workflow automation platform that lets you visually connect AI models with business apps (like CRM, email, databases) to create automated agents.

Recent News

Recently, ChatGPT introduced the ChatGPT Agent Builder, enabling users to create agents.

Happy Exploring! Happy Learning!

AI & ML

Data Engineering

Data Governance

ABOUT ME

The Kartik Voice

SUBSCRIBE & FOLLOW

POPULAR POSTS

Categories

Advertisement

Contact Form

Search This Blog

Not Just Logs — Stories from a Data Engineer