Omnivision-968M: Vision Language Model with 9x Tokens Reduction for Edge Devices

(nexa.ai)

62 points | by BUFU 14 hours ago

10 comments

gizajob 9 hours ago
Its description of the art piece is so awful.
- ImageXav 3 hours ago
  I thought the same, but the description of the cat picture is pretty spot on. I wonder if this is a dataset issue. Cat pictures are far more prevalent than abstract art on the internet so might well be overrepresented. Can Vision LLMs deal with a long tail of underrepresented objects when small? Or can they only do so at scale?
nighthawk454 9 hours ago
Easy to try here: https://huggingface.co/spaces/NexaAIDev/omnivlm-dpo-demo
https://i.imgur.com/44XYyXU.png
- TacticalCoder 5 hours ago
  I saw a turntable at a shop recently and my inner classifier went: "Oh a DSOTM turntable, that's sweet!"
  https://www.project-audio.com/en/product/the-dark-side-of-th...
  I was kinda expecting the model in your picture to make the link with the album cover.
jsjohnst 13 hours ago
Need to try this directly before passing judgement, but this could unlock a few project ideas I have if the quality lives up to the examples with resource requirements this low.
throwaway314155 13 hours ago
Can GitHub please acquire all these model-hub companies like fal, replicate, ollama, hf, and *checks notes* "nexa.ai"? That way we can get past the inevitable fragmentation and ultimate breaking of everyone's workflow w.r.t. ML-oriented dev ops.
- gessha 4 hours ago
  When faced with a diversity of implementations, why is the go-to “let’s have a corporate entity acquire them all” instead of “let’s come up with a good runtime standard”? The company is going to do the same thing anyway, except with the additional risk of messing up the API and throwing away the hard work of so many people.
- croes 13 hours ago
  You want everything under the control of Microsoft?
- byyoung3 13 hours ago
  Satya is that you?
zhiyuan8 11 hours ago
I definitely wish to try this https://nexa.ai/blogs/omni-vision