Intelligent CIO Europe Issue 4 - Page 61

… can help determine the sophistication of their systems and the potential advantages and pitfalls.

Compare data sources

What is the data set used in the artificial intelligence?

Over time, security analytics has advanced from events, to alerts, to incidents. As more systems become digital and cloud-based, there is even more data available (think IoT sensors and microservice interactions). The growing data volume has increased the potential benefit while exacerbating the challenge of finding meaning in the data quickly enough to act on it effectively.

For enterprise threat detection and analysis, meaningful data depends on context from application, transaction, user and session visibility. These data elements may come from different sources, or from network traffic analytics that capture and analyse the entire L2–L7 data set, subsets of it, or metadata. The highest-level (L7) application data offers the most interesting data for behavioural anomalies and pattern matching; the lower-level packet data offers the most detail for forensics.

What network data architectures do you support?

Be sure your vendor can collect the network data in your evolving business architecture, including north-south, east-west and cloud visibility. The network is the best source of data from which AI systems can learn about all aspects of your business, because every significant activity touches the network. And that dependency is rising, as cloud services, including Software-as-a-Service (SaaS), depend on the network as well. Further, wire data is different from logs and other self-reported data, as it is empirically observed.

What data do you collect from encrypted traffic?

Encryption is increasingly a default technique: used by the good guys to foil government monitoring and protect data privacy, and by the bad guys to foil security tools and extend the lifespan of new attack techniques.
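The point above about L7 metadata being the richest source for behavioural anomalies can be illustrated with a minimal sketch. Everything here is hypothetical (the user names, request counts and three-sigma threshold are illustrative, not from the article); real analytics engines would use far richer features than a request count:

```python
import statistics

# Hypothetical per-user request counts observed on the wire (L7 metadata).
baseline = {
    "alice": [42, 38, 45, 40, 44],
    "bob": [12, 15, 11, 14, 13],
}

def is_anomalous(user: str, observed: int, z_threshold: float = 3.0) -> bool:
    """Flag a session whose request count deviates sharply from the
    user's historical baseline (a crude behavioural-anomaly check)."""
    history = baseline[user]
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    return abs(observed - mean) > z_threshold * stdev

print(is_anomalous("alice", 41))   # near baseline: False
print(is_anomalous("alice", 400))  # far outside baseline: True
```

The same per-entity baselining idea extends to sessions, transactions and devices; the design choice is which wire-data features to baseline, which is exactly where the L2–L7 data-set question matters.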
More of this traffic is using advanced encryption based on perfect forward secrecy (PFS) rather than public key encryption (PKE). Conceptually, unlike the shared-key model of PKE, PFS encrypts each session with a single-use key; if someone steals a key, they gain access only to that individual session's data.

The increasing prevalence and complexity of encryption presents a challenge for AI-driven analytics. Decryption approaches can introduce a time lag and counteract the security benefits of encrypting in the first place. AI-driven security analytics must be able to decrypt data at line rate in a way that does not expose sensitive data to risk, and few tools meet this requirement. Decryption is a common feature of many perimeter (north-south) security systems, including firewalls and web gateways. However, many remote users interact directly with the cloud provider, without traversing traditional on-premises networks. Perimeter systems also perform other control duties, and when resources are strained they will let encrypted traffic pass uninspected.

One hedge for analytics systems is to decode only headers and other metadata for encrypted traffic, but this leaves a huge blind spot. The actual content of the traffic may include information that is vital to catching bad actors: malware, suspicious database commands, sensitive files, SQL injection attacks, command-and-control behaviour and more.

Security-aware companies are encrypting to improve their defences against friend and foe. The downside of all this encryption is that an attacker or insider who gets into the network is obfuscated as well, able to proceed unmonitored. In attack terms, east-west activities include reconnaissance scans for vulnerable and desirable targets, lateral movement between devices and the transfer of data from an internal-only system to a system with external access privileges. These are low-frequency, high-risk activities that are ripe for AI analytics.
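The east-west reconnaissance pattern just described, one host probing many targets while its peers touch only a handful, is exactly the kind of behaviour an unsupervised analytic can surface without labelled attack data. A minimal sketch, with hypothetical flow records (host names, ports and the 10x-median threshold are all illustrative assumptions):

```python
import statistics
from collections import defaultdict

# Hypothetical east-west flow records: (source_host, destination_port).
# Normal servers talk to one or two ports; the scanner probes hundreds.
flows = (
    [("db01", 5432)] * 50
    + [("web01", 443)] * 40
    + [("victim-laptop", port) for port in range(1, 201)]  # port sweep
)

ports_touched = defaultdict(set)
for src, port in flows:
    ports_touched[src].add(port)

# Unsupervised heuristic: a host touching far more distinct ports than
# its peers looks like reconnaissance -- no labelled training data needed.
counts = {host: len(ports) for host, ports in ports_touched.items()}
median = statistics.median(counts.values())
suspects = [host for host, c in counts.items() if c > 10 * median]
print(suspects)  # ['victim-laptop']
```

Note that distinct-port counts come from flow metadata, so this particular detection survives even when payloads are encrypted; content-level detections (malware, SQL injection) do not.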
It’s likely that your business will encrypt more traffic within and beyond your enterprise, so the vendor’s model for analysing this traffic should be a make-or-break consideration.

How many and which protocols are providing data?

Once you peel off encryption, there’s a wealth of data at the application protocol layer. Most businesses have many different protocols running, yet most analytics start with just one or a few. Visibility into your existing protocols, with extensibility to your custom protocols, will be very important in detecting relevant activities with AI. As you evolve your digital business and introduce new specialty systems (IoT sensors, supply chain partners, cloud services), customisation features may become vital.

Compare data science

How do you train and tune your AI engines?

There’s value in both supervised and unsupervised machine learning. Supervision can help refine and evolve algorithms to improve accuracy and the detection of new and specialised threat artifacts. Unsupervised training can identify previously unknown attacks and insider threats, and you don’t have to pay for, or wait on, vendor staff to enhance detection.

The devil is in the details. Some vendors perform AI on a dedicated system at the customer site, so learning comes only from local detections, plus occasional updates from the vendor. Other vendors collect data from multiple customers and train AI engines across customer data sets; this shared education advances detection more quickly through crowdsourcing of data. When you overlay these training strategies onto the variations in data sources discussed in the previous section, AI analytics quality can drop quickly. Plus, the data shared with the cloud may be minimised or anonymised to protect privacy. These three variables (type of training, data sources and data anonymisation) all affect the potential for accurate, timely AI.

How do you reduce false positives?
Early efforts in AI often add to the decision clutter with alerts …
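The surviving text stops mid-sentence, but its theme, alert clutter, is clear. As one hedged illustration of why clutter reduction matters (this is a generic technique, not the article's stated method), collapsing repeated identical alerts into single scored records shrinks the volume an analyst must triage:

```python
from collections import Counter

# Hypothetical raw alert stream; real alerts would carry many more fields.
alerts = [
    ("web01", "port-scan"), ("web01", "port-scan"), ("web01", "port-scan"),
    ("db01", "sql-injection"), ("web01", "port-scan"),
]

# De-duplicate on (entity, alert type): five raw alerts collapse into
# two triage items, each with a repetition count an analyst can weigh.
deduped = Counter(alerts)
for (host, kind), count in deduped.most_common():
    print(f"{host}: {kind} x{count}")
```

In practice, vendors layer further techniques on top of de-duplication, such as scoring, suppression windows and peer-group baselining, and how a vendor handles this is a fair follow-up question.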