I want to host some LLMs locally and use more advanced models. Since new hardware is out of the question, I think I should be able to pull something off by buying some yesteryear equipment on eBay etc. Has anybody attempted such a project? Does it scale horizontally? (I.e., can I connect two boxes to overcome single-box slowness?)

  • ryokimball@infosec.pub
    16 hours ago

    How old are we talking? I personally wouldn’t go further back than the RTX 2000 series. A friend has had good luck with Intel GPUs for ‘cheap’.

    No, you absolutely cannot scale horizontally for speed. VRAM is king; system RAM can stand in for it, but with major speed penalties. SSD is slower still, and all of those are orders of magnitude faster than any Ethernet you’ll be connecting boxes together with. That’s not to say clustering isn’t an option, just that speed gets worse the more you scale out like that.
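A rough back-of-the-envelope sketch of the point above, in Python. It assumes token generation is limited by how fast the model weights can be read, so the best-case speed is roughly bandwidth divided by model size. The bandwidth figures and the 4 GB model size are assumed round numbers for illustration, not measurements, and treating every tier as if the weights streamed through it is a simplification.

```python
# Ballpark bandwidths in GB/s. These are assumed round figures for
# illustration only; real hardware varies widely.
BANDWIDTH_GB_S = {
    "GPU VRAM": 300.0,
    "system RAM": 20.0,
    "NVMe SSD": 2.0,
    "gigabit Ethernet": 0.125,
}


def max_tokens_per_second(model_size_gb, bandwidth_gb_s):
    """Best-case rate if every weight is read once per generated token."""
    return bandwidth_gb_s / model_size_gb


if __name__ == "__main__":
    model_size_gb = 4.0  # hypothetical model size, chosen for illustration
    for tier, bandwidth in BANDWIDTH_GB_S.items():
        rate = max_tokens_per_second(model_size_gb, bandwidth)
        print(f"{tier:>17}: {rate:8.2f} tokens/s at best")
```

Each step down the list costs roughly an order of magnitude under these assumptions, which is why spilling out of VRAM hurts so much.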

    • droopy4096@lemmy.caOP
      16 hours ago

      I’ve got some circa-2010 cards lying about, plus a 32 GB server that already has 8 GB carved out for TrueNAS, so essentially I could squeeze 16-24 GB out of it, but it’s an older Intel i5 CPU.
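To put that RAM budget in perspective, here is a minimal Python sketch that estimates the weight footprint as parameter count times bits per weight. The model sizes and quantization levels are hypothetical examples, and the estimate ignores context cache and runtime overhead, so real usage will be higher.

```python
def weights_size_gb(params_billions, bits_per_weight):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions, bits_per_weight, budget_gb):
    """True if the weights alone fit in the budget (overhead ignored)."""
    return weights_size_gb(params_billions, bits_per_weight) <= budget_gb


if __name__ == "__main__":
    budget_gb = 32 - 8  # 32 GB server minus the 8 GB reserved for TrueNAS
    # Hypothetical (parameters in billions, bits per weight) combinations.
    for params, bits in [(7, 4), (13, 8), (70, 4)]:
        size = weights_size_gb(params, bits)
        verdict = "fits" if fits(params, bits, budget_gb) else "does not fit"
        print(f"{params}B at {bits}-bit: ~{size:.1f} GB, {verdict} in {budget_gb} GB")
```

Running entirely from system RAM on an older CPU would still be slow, so this only answers whether a model loads, not whether it is pleasant to use.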

      • robber@lemmy.ml
        1 hour ago

        Your biggest issue with 2010 cards will be software (inference engine) support, I assume.