I want to host some LLMs locally and use more advanced models. Since new hardware is out of the question, I think I should be able to pull something off by buying some yesteryear equipment on eBay etc. Has anybody attempted such a project? Does it scale horizontally? (I.e., can I connect two boxes to overcome single-box slowness?)

  • Barbecue Cowboy@lemmy.dbzer0.com
    5 hours ago

    With older hardware, once you accumulate enough VRAM to run it, your problem is going to shift to memory bandwidth, and your question is going to shift from ‘Can I run this model?’ to ‘Can I run this model at an acceptable speed?’
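
To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch. It assumes the common rule of thumb that generating one token reads every weight from memory once, so bandwidth divided by model size gives a ceiling on tokens per second. The function names and the example figures (a 70B-parameter model at 4 bits per weight, 350 GB/s of bandwidth) are illustrative assumptions, not measurements of any particular card, and real throughput will land below this ceiling.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (ignores KV cache and runtime overhead)."""
    if params_billion <= 0 or bits_per_weight <= 0:
        raise ValueError("params_billion and bits_per_weight must be positive")
    return params_billion * bits_per_weight / 8


def max_tokens_per_second(size_gb: float, bandwidth_gb_s: float) -> float:
    """Rough ceiling on decode speed, assuming all weights are read once per token."""
    if size_gb <= 0 or bandwidth_gb_s <= 0:
        raise ValueError("size_gb and bandwidth_gb_s must be positive")
    return bandwidth_gb_s / size_gb


if __name__ == "__main__":
    # Illustrative placeholder numbers; substitute your own model and hardware specs.
    size = model_size_gb(params_billion=70, bits_per_weight=4)
    ceiling = max_tokens_per_second(size, bandwidth_gb_s=350)
    print(f"weights: ~{size:.0f} GB, ceiling: ~{ceiling:.0f} tokens/s")
```

Plugging in the memory bandwidth from a used card's spec sheet gives a quick sanity check on whether a model that fits would also be usable before you buy.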