How to Recognize Images with AI

모카쨩 2025. 5. 31. 08:34

https://www.youtube.com/watch?v=ZaeT7N8XH3A

For how to use LLaMaSharp, see the link below.

https://wmmu.tistory.com/entry/%EB%9D%BC%EB%A7%88%EC%83%B5-LLaMaSharp-%EC%82%AC%EC%9A%A9%EB%B2%95

On March 31, 2024, AI applications took a big step forward.

LLaMaSharp started supporting LLaVA, a model that processes both images and text.

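ChatGPT gained image input in September 2023, so this brought what was then cutting-edge technology within reach of ordinary users.

The technical term for this is multimodal.

Perhaps because it is so new, it is very heavy and slow.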

https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf

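Download two files from the link above: the language model (the code below uses llava-v1.6-mistral-7b.Q4_K_M.gguf) and the CLIP projector (mmproj-model-f16.gguf).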
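Because the agent needs both files, it can help to confirm they are in place before any loading starts, so a single message lists everything that is missing. This is a minimal sketch, assuming the default Models/ paths used by the agent class later in this post; `ModelFileCheck`, `FindMissing` and `EnsureExist` are hypothetical helper names, not part of LLaMaSharp.

```csharp
using System.Collections.Generic;
using System.IO;

namespace Sketch
{
    // Hypothetical helper, not part of LLaMaSharp: checks that the downloaded
    // model files exist before anything is loaded.
    public static class ModelFileCheck
    {
        // Returns every path that does not point to an existing file.
        public static List<string> FindMissing(params string[] paths)
        {
            var missing = new List<string>();
            foreach (var path in paths)
            {
                if (!File.Exists(path))
                {
                    missing.Add(path);
                }
            }
            return missing;
        }

        // Throws one exception that names all missing files at once.
        public static void EnsureExist(params string[] paths)
        {
            var missing = FindMissing(paths);
            if (missing.Count > 0)
            {
                throw new FileNotFoundException(
                    "Missing model file(s): " + string.Join(", ", missing));
            }
        }
    }
}
```

A call such as `ModelFileCheck.EnsureExist("Models/llava-v1.6-mistral-7b.Q4_K_M.gguf", "Models/mmproj-model-f16.gguf");` placed before constructing the agent would report both files if neither has been downloaded yet.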

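Be aware that because two models are loaded at the same time, VRAM runs very short.

Fortunately, the GGUF files above are quantized, lightweight builds, so the problem is somewhat smaller.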

https://scisharp.github.io/LLamaSharp/0.12.0/Examples/LLavaInteractiveModeExecute/

https://scisharp.github.io/LLamaSharp/0.11.2/xmldocs/llama.common.modelparams/

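I wrote the code below based on the official documentation above.

The official example resets the cache with KvCacheRemove between prompts,

but I ran into all sorts of problems with that, apparently because the cache was not fully cleared or still occupied memory,

so instead I recreate the context and executor for every request.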
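The idea behind recreating the executor can be illustrated without LLaMaSharp at all. In this sketch `FakeSessionState` is a hypothetical stand-in for a context holding a KV cache (it is not a LLaMaSharp type): every request gets a brand-new instance, so nothing from the previous prompt carries over, and `using` disposes it so its resources are released instead of piling up.

```csharp
using System;
using System.Collections.Generic;

namespace Sketch
{
    // Hypothetical stand-in for a stateful, disposable resource such as an
    // inference context with a KV cache. It is NOT a LLaMaSharp type.
    public sealed class FakeSessionState : IDisposable
    {
        // Counts instances that have been created but not yet disposed
        // (not thread-safe; fine for a single-threaded illustration).
        public static int LiveInstances;

        private readonly List<string> history = new List<string>();

        // State that would leak into the next request if the instance were reused.
        public int TurnsSeen => history.Count;

        public FakeSessionState() => LiveInstances++;

        public void Feed(string input) => history.Add(input);

        public void Dispose() => LiveInstances--;
    }

    public static class PerRequestRunner
    {
        // Creates fresh state for every request and disposes it afterwards,
        // so each call starts clean and nothing stays allocated.
        public static int Run(Func<FakeSessionState> factory, string input)
        {
            using var state = factory();
            state.Feed(input);
            return state.TurnsSeen;
        }
    }
}
```

Each call to `PerRequestRunner.Run` returns 1 and leaves `LiveInstances` at 0. If one instance were reused across calls instead, `TurnsSeen` would keep growing, which mirrors the leftover state the author suspected with KvCacheRemove. The cost is one allocation per request; in the real code the model weights are loaded once and only the context is recreated.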

https://gist.github.com/ahzkwid/6320132dc70b70b9d706059ca3128268

using LLama;
using LLama.Common;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Text;
using System.Threading.Tasks;

namespace Ahzkwid
{
    public class LlavaMultimodalAgent : IDisposable
    {
        private readonly ModelParams parameters;
        private readonly LLamaWeights modelWeights;   // language model (LLaVA 1.6 Mistral 7B)
        private readonly LLavaWeights clipProjection; // CLIP projector (mmproj)

        public LlavaMultimodalAgent(
            string modelPath = "Models/llava-v1.6-mistral-7b.Q4_K_M.gguf",
            string clipModelPath = "Models/mmproj-model-f16.gguf")
        {
            Debug.WriteLine("Initializing LlavaMultimodalAgent");
            parameters = new ModelParams(modelPath)
            {
                ContextSize = 1024,
                GpuLayerCount = 32, // layers offloaded to the GPU; lower this if VRAM runs out
                //Threads = Environment.ProcessorCount * 4,
                //BatchThreads = Environment.ProcessorCount * 4,
                BatchSize = 256,
                //UBatchSize = 256,
            };
            // Both models are loaded once and kept for the lifetime of the agent.
            modelWeights = LLamaWeights.LoadFromFile(parameters);
            clipProjection = LLavaWeights.LoadFromFile(clipModelPath);

            Debug.WriteLine("Model initialization complete.");
            Debug.WriteLine("");
            Debug.Write("USER:");
        }

        public async Task<string> DescribeImageAsync(Bitmap bitmap, string question)
        {
            // Create a fresh context and executor per request instead of clearing the
            // KV cache; `using` frees the context when the method returns.
            using var context = modelWeights.CreateContext(parameters);
            var executor = new InteractiveExecutor(context, clipProjection);

            // The official example resets state like this instead:
            //executor.Images.Clear();
            //executor.Context.NativeHandle.KvCacheRemove(LLamaSeqId.Zero, -1, -1);

            // Attach an image (and the <image> tag) only when one was supplied;
            // an empty byte array is not a valid image.
            var imageTag = "";
            if (bitmap != null)
            {
                executor.Images.Add(BitmapToBytes(bitmap));
                imageTag = "<image>\n\n";
            }

            //string prompt = "<image>\n<|USER|>\n" + question + "\n<|ASSISTANT|>\n";
            string prompt = $"{imageTag}USER:\n{question.Trim()}\nASSISTANT:\n";
            Debug.WriteLine(question);

            // Stop generating once the model starts writing the next user turn.
            var antiPrompt = "USER:";
            var inferParams = new InferenceParams
            {
                MaxTokens = 256,
                AntiPrompts = new List<string> { antiPrompt },
            };

            var result = new StringBuilder();
            await foreach (var token in executor.InferAsync(prompt, inferParams))
            {
                result.Append(token);
                Debug.Write(token);
            }

            // Strip the anti-prompt from the end of the generated text.
            return result.ToString().Replace(antiPrompt, "").Trim();
        }

        private static byte[] BitmapToBytes(Bitmap bitmap)
        {
            // Encode the bitmap as PNG bytes for the executor's Images list.
            using (var ms = new MemoryStream())
            {
                bitmap.Save(ms, ImageFormat.Png);
                return ms.ToArray();
            }
        }

        public void Dispose()
        {
            // Release the native model memory held by both weight files.
            clipProjection.Dispose();
            modelWeights.Dispose();
        }
    }
}

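Since the model was developed abroad, it handles Korean poorly, so write prompts in English where possible.

Korean is not completely unsupported, though, so if translating is a hassle you can still use Korean.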

https://www.youtube.com/watch?v=ZaeT7N8XH3A

It works properly.

The end.