Venturing into AI Security with Locally Hosted LLMs

“we believe that AI is more of a net positive for defenders than it is for adversaries.”

- Red Canary

The term ‘AI security’ can mean different things to different people. ‘Artificial intelligence’ encompasses various interconnected fields and applications, including machine learning, deep learning, Large Language Models (LLMs), image generation, and even deepfakes.

For those of us who live in the world of cybersecurity, AI will likely mean leveraging AI to help defend against cyber threats. For the adversary, AI not only helps with crafting more sophisticated social engineering campaigns but also assists in the development of malware, or even in automating parts of the attack chain. A study from 2023 highlights just how LLMs are being used in social media botnets, with accounts amassing hundreds of thousands of followers; botnets like these are ideal for nation states looking to spread disinformation, since the accounts can churn out ‘trending’ content with very little effort.

So you can see ‘AI security’ is already a vast topic, but for this blog post I’m going to focus on using LLMs to bolster security tooling and processes, and to leave you with an understanding of how adversaries are using LLMs in cyber warfare. Also, this is a relatively new area for me, so I figured that I’d simply turn my own notes into this blog, which will hopefully help you too!

Large Language Models are AI systems trained on massive datasets of text. As you’ll see with recent developments in AI, some advanced models, often referred to as multimodal models, can process text, audio (voice / music) and images. They can understand and generate human-like text, answer questions, and even write code (and do it very well!). Think of LLMs as incredibly smart text prediction engines on steroids. ChatGPT by OpenAI and Claude by Anthropic are prime examples, but they’re just the tip of the iceberg.

What’s really interesting is the rise of open-source LLM projects. Tools like Ollama, KoboldCpp, Llamafile, and LocalAI are bringing powerful language models to your local machine. This means you can run AI without relying on cloud services or worrying about data privacy issues. This is a game-changer for privacy enthusiasts, but also for defenders and potential threat actors.

For defenders, imagine having an AI assistant that can analyze logs, spot patterns in threat data, or even help write and optimize security rules. It’s like having a tireless analyst who never needs coffee breaks. These tools can help security operations teams sift through the noise and focus on what really matters. I know that sounds like a marketing pitch from a security vendor, but what I’m talking about is all open source, and locally hosted.

But we do have a problem. Adversaries have access to these tools too. They are starting to use LLMs to generate more convincing phishing emails, craft malware that’s harder to detect, or even automate parts of their attack chains. These threats are very real and already in play. I’ve personally used AI to craft ransomware as part of a research project, and you can bet threat actors are not sitting on their hands, but are actively exploiting these capabilities today. Cybercriminals and state-sponsored hackers are quick to adopt new technologies, and AI is no exception.

This rapid adoption of AI by threat actors makes it absolutely crucial for defenders to stay ahead of the curve. We need to leverage these same technologies to enhance our detection capabilities, automate our responses, and predict new attack vectors before they’re exploited.

AI Cyber Warfare

How are the bad guys using this tech? Adversaries aren’t just sitting on their hands while we beef up our defenses. They’re weaponizing LLMs in some pretty clever (and concerning) ways. For starters, they’re using these models to generate more convincing phishing emails: highly personalized, contextually accurate messages that can fool even the most vigilant users. They’re also leveraging LLMs to write malware, making it easier for less skilled attackers to create sophisticated threats. Some adversaries are using LLMs to analyze stolen data more efficiently, potentially de-anonymizing data breaches by using LLMs to perform correlation, as discussed in this study.

Perhaps most concerning is the use of LLMs in disinformation campaigns. As I mentioned earlier, there’s evidence of LLM-powered social media bots amassing huge followings and spreading targeted misinformation. This isn’t just about stealing data or breaching systems anymore, but manipulating perceptions and influencing behaviors on a massive scale.

Locally Hosted LLMs

This is where things get interesting for both defenders and adversaries. I’ve already mentioned Ollama, LlamaFile, KoboldCpp, and LocalAI, but if you are just getting started with local LLMs, then you might also want to check out LM Studio, a free (though closed source) desktop app that runs on Windows, Mac, or Linux. There is also Text Generation Web UI, an open source project that provides a browser-based interface for local models.

If you’re serious about running LLMs locally, you’ll need some serious GPU power. I use an NVIDIA RTX 3080 Ti, but even that struggles with the larger models. The critical resource for LLMs is VRAM, and you’ll need lots of it! Your main system RAM is reasonably fast, but once your graphics card runs out of VRAM and data has to shuttle between the GPU and system RAM, that transfer becomes the bottleneck and you’ll hit extreme slowdowns.

If you want to run the larger models, you’ll need something like an RTX 4090 with 24GB of VRAM. For even more power, there’s the RTX 6000 Ada with 48GB. Just be prepared for the price tag: these cards aren’t cheap, and at over $7,000 for the RTX 6000 Ada you might need a very good explanation for your significant other!

Models - Size Matters

When it comes to running LLMs locally, the size of the model matters a lot. If you’re working with a lower-end GPU, you’ll need to stick with smaller models. A great example is Vicuna-13B, available on Hugging Face. This model, with its 13 billion parameters, offers a good balance between performance and resource requirements. It can run on GPUs with as little as 16GB of VRAM, making it accessible for most people with mid-range graphics cards.

As a privacy nerd, I don’t often talk about Meta in a positive light, but a very recent development has tilted the open source LLM landscape on its axis: Llama 3.1. This instruction-tuned model comes in three sizes: 8B, 70B, and the ridiculously huge 405B. The 8B model is perfect for those with limited GPU resources, while the 70B strikes a balance between capability and hardware demands. The 405B version? It needs almost 1TB of VRAM! 810GB for the model itself (in FP16 precision), plus another 123GB for the KV cache when using the full 128K token context. That’s 933GB total.

For reference, even a high-end professional GPU like the RTX 6000 Ada, which I mentioned earlier, ‘only’ has 48GB of VRAM. You’d need about 20 of these cards to have enough memory for Llama 3.1 405B. And no, you can’t just slap 20 GPUs in a PC and call it a day. Even if you could physically fit them, the power requirements and heat generation would be insane, not to mention the software challenges of coordinating that many GPUs. If you have several million dollars at your disposal, with your own data center, then you’d need multiple NVIDIA H100 GPUs (with 80GB of VRAM each), at least 1TB of system RAM (probably more), several terabytes of fast NVMe storage, industrial-grade cooling, and a power supply that could probably jump-start a small city. Companies like Meta are running these models on massive GPU clusters, possibly spread across multiple data centers. Did I mention the H100 sells at over $41,000? We’d need at least 12 of them, so that’s pushing half a million US dollars.
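
To put the arithmetic above into something you can poke at, here’s a quick back-of-the-envelope calculation in Python. The 2 bytes per FP16 parameter and the 123GB KV cache figure come straight from the paragraphs above; the card prices are the rough figures I quoted, so treat the totals as ballpark numbers.

import math

# Back-of-the-envelope VRAM math for Llama 3.1 405B
params = 405e9                 # 405 billion parameters
model_gb = params * 2 / 1e9    # FP16 = 2 bytes per parameter -> ~810GB
kv_cache_gb = 123              # KV cache at the full 128K token context
total_gb = model_gb + kv_cache_gb   # ~933GB

# How many cards would that take, and at roughly what cost?
for name, vram_gb, price_usd in [("RTX 6000 Ada", 48, 7000), ("H100", 80, 41000)]:
    cards = math.ceil(total_gb / vram_gb)
    print(f"{name}: {cards} cards, roughly ${cards * price_usd:,}")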

For us mere mortals, if you want to play with LLMs locally, you’re better off with smaller models. They’re still incredibly powerful and won’t require you to take out a second mortgage (or get a loan from Elon Musk).

Openhermes2.5-mistral

Openhermes2.5-mistral is my favorite model for running locally. It’s based on the Mistral 7B architecture and, in my opinion, is probably the best local model for mid-range PCs right now. Openhermes2.5 excels at following instructions, engaging in open-ended dialogue, and even tackling some coding tasks. If you have a lower spec machine, you can also run quantized versions of the model.

Think of quantization as a compression technique for models. It’s like taking a high-resolution photo and compressing it to save space. You might lose some detail, but the image is still recognizable and takes up much less storage. In the world of LLMs, quantization reduces the precision of the model’s weights. Instead of using 32-bit or 16-bit floating-point numbers, a quantized model might use 8-bit integers or even 4-bit integers. This dramatically shrinks the model’s size and memory footprint.
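
To see why this matters in practice, here’s a tiny sketch of the weight memory needed for a 7B-parameter model (the size of Openhermes2.5-mistral’s base) at different precisions. Real quantized formats like GGUF mix precisions and add metadata, so treat these as ballpark figures only.

# Approximate weight memory for a 7B-parameter model at different precisions
params = 7e9

for label, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    gb = params * bits / 8 / 1e9
    print(f"{label}: ~{gb:.1f}GB")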

I won’t go too far down the rabbit hole of this model, but for more information, @Teknium1, the co-founder of Nous Research, shares benchmarks which show just how capable this model is.

AI Security Examples

Now that I’ve covered the basics of LLMs and local hosting, and hopefully that’s enough to get you started, I want to get back to how this can be used for AI security. Having conversations with chatbots is fun, but it’s not going to help us much as it relates to AI security. I see this falling into five areas:

  1. Using AI to defend against cyber threats
  2. Understanding threats and how adversaries use AI for malicious purposes
  3. Securing AI systems themselves (including data privacy)
  4. The ethical implications of AI in security
  5. Using AI to assist in OSINT investigations

For the purposes of this blog, let’s focus on the first two. I believe the best way to defend against cyber threats is to unlock the power of these local LLMs using Python. Once we can use LLMs within our code, we can start using models like Openhermes2.5-mistral to help with threat analysis and, as mentioned earlier, ‘sift through the noise’. From there, you’re only limited by your imagination. Perhaps you want to write some Python code to perform malware behavior analysis, or just use the LLM to improve incident response playbooks. The world really is your oyster!

Example 1: Ollama with Python

I didn’t plan to make this a technical ‘how-to’, but I did want to provide you with the foundations of AI security, and in my opinion, why locally hosted LLMs are a critical factor.

In the resources section below, I’ve included a great tutorial by Tech With Tim, on how to install Ollama and use a locally hosted model with Python. You can also run OpenHermes 2.5 in Ollama with ollama run openhermes as shown here.
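
If you want a taste before watching the tutorial, here’s a minimal sketch using the official ollama Python package (pip install ollama). It assumes you’ve already pulled the model with ollama run openhermes, and the log line is just an invented example of the ‘sift through the noise’ idea from earlier.

import ollama

# Ask the locally hosted OpenHermes model to triage a suspicious log entry
log_line = "Failed password for root from 203.0.113.42 port 52814 ssh2"

response = ollama.chat(
    model="openhermes",
    messages=[
        {"role": "system", "content": "You are a security analyst. Be concise."},
        {"role": "user", "content": f"Is this log entry suspicious? {log_line}"},
    ],
)

print(response["message"]["content"])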

Example 2: Ollama API

Now this is where things get interesting. Ollama has an API, so with a simple curl request, you can interact with your locally hosted model.

curl http://localhost:11434/api/generate -d '{
  "model": "openhermes",
  "prompt": "Write some python code to convert CSV to JSON",
  "stream": false
}'
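
And if you’d rather stay in Python, the same call with the requests library looks like this. The payload mirrors the curl example above, and with streaming disabled the generated text comes back in the "response" field of the JSON.

import requests

# Same request as the curl example, via Ollama's local REST API
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "openhermes",
        "prompt": "Write some python code to convert CSV to JSON",
        "stream": False,  # one JSON object back, rather than a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])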

Example 3: PrivateGPT for File Analysis

This is a hidden gem. PrivateGPT is another tool that you can use to attach files to your LLM requests, and I can see it becoming a powerful ally in analyzing malware, log files, and more. You can find the full docs here, and then start ingesting files. A full list of supported file formats can be found here.
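
As a rough sketch of the workflow, here’s what ingesting a log file and then questioning it might look like against PrivateGPT’s local API. The port and endpoint paths here follow PrivateGPT’s documentation at the time of writing, so double-check them against the docs linked above before relying on this.

import requests

BASE = "http://localhost:8001"  # PrivateGPT's default local port (check your setup)

# Ingest a log file so the LLM can answer questions about it
# (endpoint per PrivateGPT's API docs; verify against the docs linked above)
with open("auth.log", "rb") as f:
    requests.post(f"{BASE}/v1/ingest/file", files={"file": f}, timeout=300).raise_for_status()

# Ask a question grounded in the ingested file; use_context pulls in the documents
resp = requests.post(
    f"{BASE}/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Summarize any suspicious activity in auth.log"}],
        "use_context": True,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])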

Summary

There is so much more I could have covered here, from using Whisper to transcribe audio into text before ingesting it into PrivateGPT, to combining LLMs with Python in more elaborate pipelines. This is still a relatively new area of cybersecurity, and I find it really exciting as models become more efficient and open source projects like these continue to emerge.
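
As one example, the Whisper step takes only a few lines with the open source openai-whisper package; the model size and file names below are just placeholders.

import whisper

# Transcribe audio locally, then save the text for ingestion into PrivateGPT
model = whisper.load_model("base")              # small, CPU-friendly model
result = model.transcribe("suspicious_voicemail.mp3")

with open("transcript.txt", "w") as f:
    f.write(result["text"])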

AI security isn’t just a buzzword. By leveraging locally hosted LLMs, we put ourselves on the same playing field as our adversaries, potentially leapfrogging them. Red Canary’s 2024 Threat Detection Report said it best with “we believe that AI is more of a net positive for defenders than it is for adversaries.” This is only true if cybersecurity professionals leverage open source tooling like I’ve mentioned here, and start doing some really innovative things to bolster our cyber defenses.

These models, running on our own hardware, give us the power to analyze threats, automate responses, and even predict attacks before they happen. But remember, the tool is only as good as the person wielding it. As you dive into AI-powered security, keep experimenting, keep learning, and most importantly, keep questioning. AI security is evolving rapidly, and staying ahead means staying curious.

Further resources