

  • Inside IR35, the 55% Tax Rate, so much for fairness.

    How can HMRC take more in tax than an inside-IR35 contractor takes home, while still claiming the system is fair, when outside-IR35 contractors and employees already pay broadly similar amounts of tax? Let me explain.

    HMRC says the off-payroll rules make sure a worker pays “broadly the same Income Tax and National Insurance as an employee would”. That is the official line. The difficulty is that this only describes part of what happens.

    A direct employee is paid through PAYE. Their wages are not subject to VAT, because services provided by an employee to their employer are outside the scope of VAT.

    An outside-IR35 contractor works through a company. That company pays corporation tax on profit, and the individual then pays dividend tax when profits are taken out. So outside IR35 is not tax free. HMRC still takes a substantial amount through company tax and personal tax.

    An inside-IR35 contractor is usually paid through an umbrella or other intermediary. HMRC’s umbrella guidance and its key information example show that the assignment rate is first reduced by employer costs such as employer National Insurance, the Apprenticeship Levy, holiday pay arrangements, pension costs and the umbrella margin before gross pay is calculated. PAYE and employee National Insurance are then deducted from what is left.

    Outside IR35 already carries a real tax burden

    The outside contractor does not sit outside the tax system. The company pays corporation tax, and the contractor then pays dividend tax on distributions. HMRC’s published rates also confirm the personal allowance taper above £100,000, which increases the effective marginal rate in that band. So the case for IR35 should not be based on the idea that outside contractors pay little or nothing. They do pay tax. The difference is that they are running a business and carrying business risk. They have to cover gaps between contracts, buy equipment, pay for training, fund pensions, and manage their own contingency.
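The umbrella deduction order just described (assignment rate, then employer-side costs, then gross pay, then PAYE and employee NI) can be sketched in a few lines. Every rate and figure below is an illustrative placeholder I have chosen to show the mechanics, not a current HMRC rate:

```python
# Sketch of the umbrella deduction order described above.
# All rates here are illustrative placeholders, NOT current HMRC figures.

def umbrella_take_home(assignment_rate_per_day, days):
    gross_invoice = assignment_rate_per_day * days

    # Employer-side costs come off the assignment rate first.
    employer_ni = gross_invoice * 0.13      # illustrative employer NI rate
    levy = gross_invoice * 0.005            # illustrative Apprenticeship Levy
    umbrella_margin = 25 * (days / 5)       # illustrative weekly margin
    gross_pay = gross_invoice - employer_ni - levy - umbrella_margin

    # PAYE and employee NI are then deducted from what is left.
    paye = gross_pay * 0.32                 # illustrative blended income tax
    employee_ni = gross_pay * 0.05          # illustrative employee NI
    return gross_pay - paye - employee_ni

print(round(umbrella_take_home(600, 220)))
```

The point of the sketch is the ordering: the worker's taxable gross pay is already smaller than the assignment rate before any personal tax is applied.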
    Inside IR35 creates a different problem

    Inside IR35 is often described as putting the contractor on a similar footing to an employee. In tax terms, that is only partly true. A real employee gets PAYE treatment, but they also get an actual employment relationship. That may include continuity, notice, redundancy protection, employer benefits, a pension contribution, and internal support for training and development. The inside contractor normally does not get that package. They are still temporary, they still face the risk of contracts ending, and they still have to think about pension, training and downtime themselves. At the same time, HMRC’s umbrella model means the assignment rate is reduced by employer-side costs before the worker even gets to gross pay. That is why many contractors see inside IR35 as worse than either of the cleaner models. It is not the same as being a normal employee, and it is not the same as running a business.

    Why inside rates rise in the real world

    If an outside contractor is pushed inside IR35, their take-home usually falls unless the day rate rises. That is because the inside structure is carrying more tax liabilities. Employer-side costs come out of the assignment rate first, then PAYE and employee NIC are taken from gross pay. HMRC’s umbrella example shows that clearly. That is why inside roles often command a higher day rate in practice. The rate increase is not because the work changed. It is because the tax and payment structure changed. Once the rate rises, the VAT on the labour supply rises as well, because supplies of staff are normally standard-rated for VAT. That is different from a direct employee, where wages are outside the scope of VAT.

    The VAT issue

    This is where the comparison with direct employment becomes awkward. A direct employee has no VAT on wages. HMRC is clear on that. A supply of staff is normally subject to VAT at the standard rate. HMRC is clear on that too.
    So an inside contractor can end up in a position where HMRC says they should be taxed broadly like an employee, but the labour still sits in a VATable supply chain that a real employee would never be in. That means the worker can face employment-style tax treatment while the client is still paying VAT on the labour supply. That is not the same as saying wages themselves are subject to VAT. They are not. The point is that the inside structure can still make the engagement more expensive than direct employment because the labour is being supplied through a VATable chain.

    Worked comparison

    Take a role with a market value of £600 a day over 220 days. That gives a base annual labour value of £132,000. Using current HMRC rates and umbrella mechanics, the broad picture looks like this:

    - Direct employee (employer cost equivalent to £600/day): client cash cost £132,000, no VAT; approx. HMRC take ~£57,600 (excl. and incl. VAT); approx. net to worker ~£74,400.
    - Outside IR35 (£600/day via PSC): client cash cost £158,400, incl. £26,400 VAT; approx. HMRC take ~£50,500 excl. VAT, ~£76,900 incl. VAT in chain; approx. net to worker ~£81,500.
    - Inside IR35, no uplift (£600/day via umbrella): client cash cost £158,400, incl. £26,400 VAT; approx. HMRC take ~£56,900 excl. VAT, ~£83,300 incl. VAT in chain; approx. net to worker ~£73,800.
    - Inside IR35, 20% uplift (£720/day via umbrella): client cash cost £190,080, incl. £31,680 VAT; approx. HMRC take ~£72,900 excl. VAT, ~£104,600 incl. VAT in chain; approx. net to worker ~£84,200.

    These figures are illustrative, not personal tax advice, but they reflect the structure shown in HMRC’s PAYE, NIC, VAT and umbrella guidance.

    The table shows four things. First, outside IR35 already produces a meaningful tax take. It is not a low-tax model in any ordinary sense. Second, inside IR35 at the same headline day rate can leave the worker with less than outside IR35, because the assignment rate is reduced before PAYE even starts. Third, if the inside contractor pushes for a higher rate to restore take-home, the client cost rises and the VAT rises with it. Fourth, inside IR35 can leave HMRC taking more than the worker takes home, once the full tax take in the chain is counted.
    Using the figures in the table:

    For inside IR35 at £600 a day with no uplift, the worker takes home about £73,800, while HMRC takes about £83,300. That means HMRC takes roughly £9,500 more than the contractor keeps. Put another way, HMRC takes about 113% of the worker’s take-home, or about 53% of the combined tax-plus-take-home total, leaving the worker with only 47%.

    For inside IR35 at £720 a day with a 20% uplift, the worker takes home about £84,200, while HMRC takes about £104,600. That means HMRC takes roughly £20,400 more than the contractor keeps. In percentage terms, HMRC takes about 124% of the worker’s take-home, or about 55% of the combined total, leaving the worker with only 45%.

    Let that sink in: a 55% effective tax rate for an inside-IR35 contractor, with no rights, no expenses, no job security, and no pension.

    Mobility and expenses

    This also affects where people are willing to work. Since April 2016, tax relief for home-to-work travel and subsistence has been restricted for workers providing personal services through an employment intermediary, such as an umbrella company, where they are subject to supervision, direction or control, or the right of it. HMRC’s Employment Status Manual sets this out, and the policy paper behind the change explains that the aim was to stop workers engaged through intermediaries claiming relief that direct employees could not usually claim. In practice, that means an inside contractor working away from home is often in a much weaker position on travel and subsistence than an outside contractor running a genuine business. Train fares, fuel, hotels and meals can stop being tax-efficient business expenses in the way many contractors expect. At the same time, the income funding those costs has already been pushed through PAYE and employee NIC, with employer-side costs already taken out of the assignment rate upstream. That changes behaviour.
    If a role is two hundred miles away and the worker cannot sensibly offset travel and accommodation costs, the role becomes much less attractive unless the rate goes up. So inside IR35 does not just affect take-home pay, it also makes it harder to take on work that involves travel or working away from home.

    Training, skills and self-funding

    A contractor normally funds a lot of their own development. That includes exams, courses, lab equipment, cloud spend, software, books and time between contracts. Outside IR35, the contractor is at least operating through a business structure that allows them to plan around that. Inside IR35 leaves less room. If more of the contract value is lost to employer-side costs, PAYE and NIC before the worker sees the money, there is less available for training, pension saving and contingency. That does not show up neatly on a tax table, but it does affect how easy it is for someone to keep their skills current.

    The main point

    If you only compare PAYE and NIC, you can say inside IR35 looks closer to employment. If you look at the whole position, including VAT on staff supply, the loss of deductible travel in many intermediary arrangements, the reduction in take-home at the same headline rate, and the lack of direct employee protections, the position is much less tidy. Outside IR35 already pays tax. Direct employment has no VAT on wages and comes with an actual employment relationship. Inside IR35 can combine lower take-home, higher client cost, VAT in the supply chain, limited expenses for mobility, and less room for pension, training and contingency. That is why many people do not accept the simple line that it creates parity. It may move tax treatment in that direction, but the wider commercial and practical position is still not the same.
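The percentage claims in the worked comparison fall straight out of the table's own figures. This quick sketch reproduces them (the inputs are the article's illustrative estimates, not official data):

```python
# Reproduce the headline percentages from the worked comparison above.

def split(hmrc_take, net_to_worker):
    """HMRC's take as a share of worker take-home, and of the combined total."""
    vs_take_home = hmrc_take / net_to_worker
    share_of_total = hmrc_take / (hmrc_take + net_to_worker)
    return round(vs_take_home * 100), round(share_of_total * 100)

# Inside IR35, no uplift: HMRC ~£83,300 vs worker ~£73,800
print(split(83_300, 73_800))   # -> roughly 113% of take-home, 53% of the total

# Inside IR35, 20% uplift: HMRC ~£104,600 vs worker ~£84,200
print(split(104_600, 84_200))  # -> roughly 124% of take-home, 55% of the total
```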

  • AI Prompt Engineering for IT Pros Without Producing Slop

    There is no shortage of nonsense written about AI prompt engineering. I am not claiming to be an expert, but I have spent time researching, testing, and working out what actually improves the output. AI, or more accurately an LLM, since it is not true AI, is now an invaluable assistant that doesn't charge by the hour or day. Prompt engineering is simply the skill of asking clearly, giving the right context, setting boundaries, and knowing what sort of result you actually want. That is it. No magic phrases. No hidden syntax. No secret incantation that turns a chatbot into an infallible architect, engineer, or security consultant. For technical people, that is actually good news. It means prompt engineering is not some ridiculous new profession. It is just the same engineering discipline you already use elsewhere: precision, structure, context, constraints, and validation.

    The basic rule

    Take this as a starting point. A weak prompt looks like this:

    How do I make an AI drone?

    That sounds reasonable until you notice what is missing. No audience, no scope, no safety boundary, no skill level, no budget, no structure, and no clue whether the answer is supposed to be a quick overview, a shopping list, a build guide, or a technical architecture. A better version looks like this:

    Explain how to build a hobbyist AI drone for a technical reader familiar with Linux, Python, Jetson or Raspberry Pi devices, and basic electronics. Focus on a safe build using a flight controller, companion computer, camera module, telemetry link, and onboard computer vision. Structure the answer as hardware, software, integration, testing, and safety considerations. Keep it practical and avoid hype.

    That is the heart of prompt engineering. Remove ambiguity, get better output.

    A simple structure that works

    Most useful prompts contain the same core parts, whether you are generating an article, reviewing code, analysing a screenshot, or extracting structured data.
    You need the task, the context, the constraints, the source material if there is any, and the output shape. In plain English, that usually means: what do you want done, who is it for, what matters, what should be avoided, and what the result should look like. A practical formula is:

    Act as a [role]. Create [output]. It is for [audience]. Use this context: [context]. Requirements: [constraints]. Avoid: [exclusions]. Output as: [format].

    Practical prompt examples

    The easiest way to understand prompt engineering is to look at weak prompts beside stronger ones. The pattern becomes obvious very quickly.

    Explaining the basics

    Weak prompt: How do I make an AI drone?

    Better prompt: Explain how a hobbyist can build an AI drone for learning purposes. Assume the reader understands Linux, Python, and basic electronics but has never built an autonomous drone before. Cover the main components, how they fit together, and the difference between a normal drone and one with onboard AI functions. Keep it practical and readable.

    Why it works: It defines the reader, the scope, and the level of explanation.

    Writing a technical article

    Weak prompt: Write about AI drones.

    Better prompt: Write a technical article for engineers and advanced hobbyists on how to build an AI drone using a flight controller, companion computer, camera, and onboard vision model. Focus on real-world build choices, compute limits, latency, telemetry, control loops, and safety constraints. Keep it grounded, practical, and free of generic tech hype.

    Why it works: It narrows the topic and tells the model what matters.

    Creating a beginner build guide

    Weak prompt: Give me the steps to build an AI drone.

    Better prompt: Create a beginner-friendly build guide for a hobbyist AI drone. Assume the reader is comfortable with Linux and Python but is new to drone hardware. Cover frame, motors, ESCs, flight controller, battery, camera, companion computer, telemetry, and safe testing. Keep it simple and structured.
    Why it works: It asks for a guide, defines the starting point, and gives the answer a clear shape.

    Designing the system properly

    Weak prompt: Design an AI drone.

    Better prompt: Design a hobbyist AI drone using a standard flight controller for stabilisation and a separate companion computer for onboard AI processing. Explain the role of the ESCs, motors, GPS, IMU, camera, telemetry radio, battery, and onboard computer. Show how data flows between components and highlight likely issues with latency, power draw, and signal loss.

    Why it works: It turns a vague request into an actual architecture exercise.

    Working within a budget

    Weak prompt: Suggest parts for an AI drone.

    Better prompt: Write a build plan for an AI drone with a budget of £800 to £1,200. Prioritise stability, safe testing, and offline object detection rather than acrobatics or long-range flight. Use a conventional flight controller and a separate companion computer, such as a Jetson Nano. Output the answer as a bill of materials, architecture summary, software stack, test plan, and known limitations.

    Why it works: It adds budget, priorities, and a specific output format.

    Generating Python safely

    Weak prompt: Write Python for an AI drone.

    Better prompt: Write a Python 3 script for the Jetson Nano on a hobbyist AI drone that reads frames from a camera, performs basic object detection, and outputs structured telemetry events rather than directly controlling flight. Keep the code modular, readable, and safe for lab testing. Include error handling and log detection confidence, timestamp, and object class.

    Why it works: It defines what the code should do, what it should not do, and how it should behave.

    Reviewing PowerShell properly

    Weak prompt: Check this script.

    Better prompt: Review this PowerShell script as if you were validating an engineering or audit tool. Focus on logic errors, false positives, bad assumptions, output usefulness, and edge cases. Do not rewrite the whole script unless necessary.
    Show the most important issues first, then provide targeted fixes.

    Why it works: It asks for review rather than a random rewrite.

    Images and screenshots

    Weak prompt: What is this screenshot?

    Better prompt: Analyse this screenshot and explain what the error likely means, what subsystem is involved, and what checks should be performed first. Do not just describe the image contents.

    Why it works: It asks for interpretation.

    Generating an image

    Weak prompt: Make an image of an AI drone.

    Better prompt: Create a wide technical blog banner in a dark, high-tech style. The subject is an AI drone operating in a hostile RF environment, with subtle references to computer vision, telemetry, navigation, and control link interference. Keep it sharp, serious, and minimal. Avoid toy-drone styling, cartoon clichés, and generic stock art.

    Why it works: It defines purpose, style, and what to avoid.

    Extracting structured data

    Weak prompt: Summarise this drone report.

    Better prompt: Read this report and extract the findings into JSON with fields for title, severity, affected_component, summary, evidence, and remediation. Keep the wording concise, preserve technical meaning, and do not invent missing details. If something is unclear, mark it as unknown rather than guessing.

    Why it works: It tells the model exactly how to shape the output.

    Asking for a serious long-form article

    Weak prompt: Write a good article about AI drones.

    Better prompt: Write a long-form technical article on how to build a hobbyist AI drone using a flight controller, companion computer, camera, and onboard processing. Keep the tone direct, practical, technical, and grounded in real-world constraints. Cover hardware selection, software stack, vision processing, telemetry, latency, power draw, testing, and safety controls. Include practical examples from beginner to advanced level. No sales language, no obvious AI phrasing, and no exaggerated claims about autonomy or intelligence.
    Why it works: It defines tone, scope, and the usual traps to avoid.

    How to validate output without reading every line

    This is where AI-assisted work either becomes efficient or turns into a complete faff.

    Cross-checking with other models: If the task matters, use another model to challenge the answer. ChatGPT, Gemini, and Claude often fail in different ways, which makes cross-checking genuinely useful. You may notice I don't reference Microsoft's Copilot. There is a perfectly straightforward reason: in my opinion it is sub-optimal, roughly on par with Bing as a service when compared to Google. And yes, multiple models agreeing can increase confidence. It is still not proof. Treat it as confidence scoring, not truth.

    Validating output yourself: If you're stuck with a particular model and need to validate every sentence manually, the time savings disappear. The answer is not to trust the output blindly. The answer is to validate more intelligently. Start with structure. Are the right sections present? Are the main risks surfaced? Does the answer actually match the task? Is anything suspiciously specific without evidence behind it?

    Make the model expose its own weak points.

    Ask for assumptions and uncertainty: List any assumptions you made, anything that may be uncertain, and anything that should be verified before this is used operationally.

    Ask for a self-critique: Review your own answer and identify the weakest parts, likely inaccuracies, and anything that sounds more confident than the evidence supports.

    A good review prompt for a second model is: Review the following technical answer for factual accuracy, omissions, hidden assumptions, and overconfident wording. Do not rewrite it yet. First identify anything that looks wrong, weak, or unsupported.

    When working from reports or documents, it also helps to force an evidence-first workflow: extract the key evidence points first, then group them by severity, then write a summary based only on those points.
    For code, test behaviour and worry less about syntax. Run it with good input, bad input, and awkward edge cases.

    What it is actually good for

    Used properly, AI is good at drafting, rewriting, explaining, reviewing code, analysing screenshots, summarising reports, extracting structure from messy information, and turning rough notes into something usable. It is also genuinely helpful for project work. If I can drag myself away from work, I may actually have time to build the AI drone. It can refine build guides, explain component roles, review Python snippets for image processing, help draft test plans, and turn messy technical notes into documentation that a person can actually read.

    Final thoughts

    The better you define the task, the audience, the constraints, and the output shape, the better the result tends to be. The less guessing the model has to do, the less rubbish you have to clean up afterwards.
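As a closing illustration, the role/output/audience formula from earlier can be wrapped in a tiny helper. The function name and example values here are my own, purely illustrative:

```python
# A minimal sketch of the prompt formula described above.
# The function name and field choices are illustrative, not a standard API.

def build_prompt(role, output, audience, context, requirements, avoid, fmt):
    return (
        f"Act as a {role}. Create {output}. It is for {audience}. "
        f"Use this context: {context}. "
        f"Requirements: {requirements}. "
        f"Avoid: {avoid}. "
        f"Output as: {fmt}."
    )

prompt = build_prompt(
    role="senior drone engineer",
    output="a beginner-friendly build guide for a hobbyist AI drone",
    audience="a reader comfortable with Linux and Python but new to drone hardware",
    context="flight controller plus separate companion computer, lab testing only",
    requirements="cover frame, motors, ESCs, battery, camera, telemetry, safe testing",
    avoid="hype, sales language, unsafe flight advice",
    fmt="structured sections with short practical steps",
)
print(prompt)
```

The value is not the code itself but the discipline it enforces: every field has to be filled in before the prompt is sent.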

  • LOLDrivers, why kernel drivers are the new attack surface

    Why Old, Signed, Legitimate Drivers Are a Risk

    Old but legitimately signed drivers are dangerous because they provide a trusted execution path straight into the Windows kernel, where most modern security controls have no visibility. Once a vulnerable driver is loaded, an attacker can abuse known flaws to gain arbitrary kernel read and write access, disable security features, tamper with credential protections, and hide processes or files, all while appearing “trusted” because the driver is signed. This technique, commonly called BYOVD (Bring Your Own Vulnerable Driver), bypasses application control, kernel exploit mitigations, and many EDR hooks, allowing attackers to operate below the operating system’s security boundary.

    LOLDrivers.io to the Rescue

    LOLDrivers is a curated, continuously maintained catalogue of Windows drivers that have been abused, or are suitable for abuse, in real attacks. The list covers both malicious drivers and legitimate signed drivers with exploitable flaws. Visit LOLDrivers; it's a great resource, it's free, and the browsable driver database is excellent. The original goal was to embed an offline reference of the LOLDrivers dataset into the security tool I am building. Instead, I decided to release my own PowerShell implementation. This script creates a local, offline copy of the LOLDrivers database and provides clear, colour-coded output when a match is detected.

    What “Living Off the Land Drivers” actually means

    Living off the land usually refers to abusing built-in system tools. LOLDrivers applies the same concept to the kernel. Instead of dropping custom malware, attackers load a signed, trusted, but vulnerable driver and use it as a control interface into kernel memory.
    Once loaded, these drivers can be used to:

    - Read and write arbitrary kernel memory
    - Kill protected security processes
    - Disable EDR hooks
    - Bypass HVCI and Protected Process Light (PPL)
    - Load unsigned kernel code

    The risk categories in LOLDrivers

    LOLDrivers splits drivers into functional risk classes. This is important, because not all of them are malicious by design.

    Vulnerable but legitimate drivers

    These are signed drivers shipped by vendors that expose IOCTL handlers or memory primitives that can be abused. They are frequently used for:

    - Arbitrary kernel read/write
    - Token manipulation
    - Security product termination
    - Callback and hook removal

    These are the backbone of most BYOVD chains.

    Explicitly malicious drivers

    These drivers contain intentionally malicious functionality. They are often used as stealth rootkits or kernel loaders. Typical capabilities include:

    - Process hiding
    - File hiding
    - Credential interception
    - Persistence enforcement

    These drivers usually exist only to support a wider malware framework.

    Dual-use operational drivers

    “Dual-use” drivers are legitimate software for admins, OEMs, and hardware vendors, but dangerous in the wrong hands. These drivers are not exploits, but they expose kernel-level control surfaces that attackers can directly abuse. Typical capabilities include:

    - Physical memory access
    - MSR and PCI configuration
    - Debug and hardware inspection features

    When abused, they provide the same primitives as a kernel exploit, without triggering exploit detection.

    Why Microsoft’s blocklist is not enough

    Microsoft maintains a kernel driver blocklist, but:

    - It is reactive
    - It is incomplete
    - It does not cover every vulnerable version
    - It is often bypassed by version pinning or re-signing

    What Needs to Be Done

    At a minimum, organisations must start treating driver loading as a high-risk security boundary, not a background system event. If you are not monitoring driver activity, you are blind to one of the most reliable attacker techniques in use today.
    The baseline controls should include:

    - Integrate LOLDrivers intelligence into Sysmon, using the curated driver blocklist and metadata maintained by Magicsword.io, so that known vulnerable and malicious drivers can be detected at load time.
    - Log every driver load event, not just failures. Silent, successful driver loads are how attackers bypass EDR and kernel protections.
    - Continuously compare loaded drivers against the LOLDrivers dataset, both in real time and retrospectively, so newly classified drivers can be flagged even after they have already been seen in the environment.
    - Alert based on abuse category, not just a hash match. A driver used for credential theft, EDR bypass, or kernel memory access represents a fundamentally different risk than a generic vulnerable driver, and should be triaged accordingly.

    Finally, credit where it's due

    Credit goes to the LOLDrivers project. Without their work, this attack surface would remain largely undocumented, leaving defenders blind to a class of kernel-level abuse that is actively exploited. Show your support and visit their site.
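To make the offline-comparison idea concrete, here is a minimal sketch in Python (the article's actual tool is PowerShell). The one-entry dataset is a stand-in of my own; the real LOLDrivers data carries far richer metadata per sample, such as vendor, CVE, and abuse category:

```python
# Minimal sketch: hash a driver file and look it up in an offline copy of
# the LOLDrivers dataset. Illustrative only; the dataset layout here is a
# simplified stand-in, not the project's real schema.
import hashlib
import pathlib
import tempfile

def sha256_file(path):
    """Hash a file the same way the dataset indexes driver samples."""
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

def classify_driver(path, dataset):
    """Return the dataset's classification for a driver, or None if unlisted."""
    return dataset.get(sha256_file(path))

# Demo with a fake driver file and a one-entry dataset.
demo = pathlib.Path(tempfile.mkdtemp()) / "demo_driver.sys"
demo.write_bytes(b"not a real driver")
dataset = {sha256_file(demo): "vulnerable driver, arbitrary kernel read/write"}
print(classify_driver(demo, dataset))
```

In practice the lookup would run against every driver load event, with the match result feeding the category-based alerting described above.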

  • Quantum-Resistant Cryptography: Preparing for the Post-Quantum Era

    A special thanks to 'D' for proofreading and providing valuable insights.

    With the rapid advancements in quantum computing, the world of cybersecurity is on the brink of a major transformation. While quantum computing promises breakthroughs in various fields, it also poses a significant threat to traditional encryption methods. Many of the cryptographic systems that secure our digital world today, such as RSA and ECC, could become obsolete in the face of quantum-powered attacks. This raises an urgent need for quantum-resistant cryptography, a new class of cryptographic algorithms designed to withstand attacks from quantum computers.

    What are Quantum Computers?

    Quantum computers are a revolutionary leap beyond classical computing, leveraging the strange and counterintuitive principles of quantum mechanics to process information in ways that are fundamentally different from traditional computers. At the core of a quantum computer are qubits (quantum bits), which, unlike classical bits that can only be 0 or 1, can exist in a superposition of both states simultaneously. This enables quantum computers to perform vast numbers of calculations in parallel, drastically increasing their computational power for certain types of problems. Another key principle is entanglement, where qubits become intrinsically linked, allowing changes to one qubit to instantaneously affect another, no matter the distance between them. This interconnectedness enables faster and more complex computations than classical systems. Additionally, quantum computers leverage quantum interference, manipulating probabilities to guide calculations toward the correct solution. While mainstream applications are still years away, quantum computing has the potential to revolutionize fields from artificial intelligence to materials science, unlocking new levels of computational power never before possible.
    The Threat of Quantum Computing to Encryption

    At the core of modern cryptography are mathematical problems that are computationally difficult for classical computers to solve. For instance:

    - RSA (Rivest-Shamir-Adleman) relies on the difficulty of factoring large numbers.
    - Elliptic Curve Cryptography (ECC) is based on the discrete logarithm problem.
    - Diffie-Hellman Key Exchange also depends on the discrete logarithm problem.

    These cryptographic methods are currently secure because classical computers would take an impractically long time to break them. However, quantum computers leverage principles like superposition and entanglement, allowing them to perform complex calculations exponentially faster than classical machines. One of the biggest threats is Shor’s Algorithm, which, once implemented on a sufficiently powerful quantum computer, could efficiently break RSA and ECC encryption. This means that secure communications, digital signatures, and even blockchain-based systems could be compromised.

    The "harvest now, decrypt later" strategy is a significant cybersecurity concern, especially in the context of post-quantum cryptography. In this approach, adversaries intercept and store encrypted data today, even if they cannot decrypt it with current technology. The assumption is that once powerful quantum computers become available, these adversaries will be able to break traditional encryption schemes and access the stored data retroactively.

    What is Quantum-Resistant Cryptography?

    Quantum-resistant cryptography, also known as post-quantum cryptography (PQC), refers to encryption algorithms that remain secure even in the presence of large-scale quantum computers. These algorithms rely on mathematical problems that are believed to be hard for both classical and quantum computers to solve.

    Types of Post-Quantum Cryptographic Approaches

    Lattice-Based Cryptography

    Based on complex problems related to high-dimensional lattices.
    One of the most promising areas for quantum-resistant encryption. Examples: Kyber (key encapsulation) and Dilithium (digital signatures), both selected by NIST for standardisation.

    Hash-Based Cryptography

    Uses cryptographic hash functions to secure data. Proven security but with limitations, mainly in key sizes and signature verification times. Example: SPHINCS+ (a stateless hash-based signature scheme).

    Code-Based Cryptography

    Relies on the hardness of decoding error-correcting codes. Examples: Classic McEliece, which has been studied for decades and remains unbroken, and BIKE, a newer code-based key encapsulation scheme still under study.

    Multivariate Polynomial Cryptography

    Uses equations with multiple variables to create cryptographic security. Example: Rainbow (digital signatures), although it was broken by cryptanalysis in 2022.

    Isogeny-Based Cryptography

    Based on the complexity of finding isogenies (mathematical maps) between elliptic curves. Example: SIKE (Supersingular Isogeny Key Encapsulation), built on SIDH, although it was broken by classical cryptanalysis in 2022.

    What is TLS 1.3?

    TLS 1.3 is the latest iteration of the TLS protocol, designed to provide faster and more secure internet connections. Compared to its predecessor, TLS 1.2, it offers:

    Reduced Latency: TLS 1.3 simplifies the handshake process, reducing the time needed to establish a secure connection.

    Enhanced Security: Older, vulnerable cryptographic algorithms have been removed, making TLS 1.3 resistant to various attacks.

    Forward Secrecy: Ensures that past communications remain secure even if current encryption keys are compromised.

    How TLS 1.3 Integrates PQC

    TLS 1.3 is being adapted to include PQC through hybrid key exchange mechanisms. These involve combining traditional cryptographic algorithms with post-quantum counterparts to ensure security against both classical and quantum attacks. Major tech companies and organizations, including Cloudflare, are already testing and deploying PQC in real-world applications.
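The hybrid idea can be illustrated by combining two independently derived shared secrets before key derivation, so the session key stays safe if either input is later broken. This sketch feeds a classical (e.g. ECDH) secret and a post-quantum (e.g. Kyber) KEM secret through a plain HKDF; the salt and info labels are made up, and TLS 1.3's real key schedule is considerably more involved:

```python
# Sketch of a hybrid key-exchange combiner: concatenate the classical and
# post-quantum shared secrets, then derive the session key with HKDF
# (RFC 5869 extract-then-expand, SHA-256). Illustrative, not the TLS 1.3
# key schedule.
import hashlib
import hmac

def hkdf_extract(salt, ikm):
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk, info, length=32):
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]

def hybrid_session_key(classical_secret, pq_secret):
    """Derive one session key from both secrets; breaking one is not enough."""
    prk = hkdf_extract(b"hybrid-salt", classical_secret + pq_secret)
    return hkdf_expand(prk, b"hybrid key exchange demo")

key = hybrid_session_key(b"\x01" * 32, b"\x02" * 32)
print(key.hex())
```

Because both secrets feed the KDF, an attacker who breaks only the classical exchange (say, with Shor's algorithm) still cannot recover the derived key.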
    Adoption and Future Outlook

    The adoption of PQC in TLS 1.3 is steadily increasing, with companies like Cloudflare reporting growing usage in their networks. Early integration allows organizations to future-proof their security before quantum computers become a practical threat.

    How Organizations Can Prepare for the Quantum Future

    Stay Informed on Post-Quantum Cryptography Standards

    NIST has been leading the effort to standardize post-quantum cryptographic algorithms. Organizations should monitor NIST's progress and start evaluating the proposed standards.

    Identify Cryptographic Dependencies

    Organizations should conduct a cryptographic inventory to identify where they are using RSA, ECC, and other vulnerable encryption methods. This includes:

    - SSL/TLS certificates
    - VPNs and secure communications
    - Data encryption at rest and in transit
    - Blockchain and digital signatures

    Begin Hybrid Cryptography Implementations

    Some security experts recommend a hybrid approach, where systems use both classical and post-quantum cryptography together. This allows for a smooth transition without immediate risks.

    Upgrade Hardware and Software for Post-Quantum Readiness

    Quantum-resistant algorithms may require more computational resources. Organizations should assess whether their hardware and software can support these new cryptographic methods. Hardware providers are preparing for the post-quantum era by transitioning their cryptographic signing processes, including firmware, drivers, and software, to quantum-resistant algorithms. Their plans vary, but they generally fall into three categories:

    Migration to Post-Quantum Cryptographic (PQC) Signing

    Hardware vendors are working to replace existing digital signature algorithms (e.g., RSA, ECC) with quantum-resistant alternatives, such as those selected by NIST (e.g., CRYSTALS-Dilithium for digital signatures). This process ensures that firmware, drivers, and software remain secure even in the face of future quantum threats.
Key Actions: Updating certificate authorities (CAs) to support PQC algorithms. Developing hybrid cryptographic signatures that combine classical and PQC schemes for backward compatibility. Issuing PQC-signed firmware and driver updates for existing hardware. Patching and Retrofitting Existing Hardware For current hardware, vendors are exploring software and firmware updates that integrate PQC-based signing. However, not all legacy devices can be easily updated due to hardware constraints. Key Actions: Issuing firmware updates with PQC signatures where feasible. Providing transition guidance to enterprises on handling mixed cryptographic environments. Collaborating with operating system vendors to ensure PQC-validated driver signing mechanisms. Development of New Hardware with Built-in PQC Support Some vendors are designing next-generation hardware with PQC capabilities embedded at the hardware level. This includes cryptographic modules, TPMs (Trusted Platform Modules), and secure boot mechanisms that natively support PQC algorithms. Key Actions: Designing processors, security chips, and embedded devices with PQC accelerators. Implementing secure boot and attestation processes using PQC algorithms. Ensuring compliance with NIST’s post-quantum cryptography standards. Overall, the transition to PQC signing will involve a mix of software updates, firmware patches, and new hardware development to ensure long-term security against quantum threats. NIST Road Map 2027 and 2030 The National Institute of Standards and Technology (NIST) has laid out a roadmap for the transition to post-quantum cryptography (PQC), recognizing the potential threat posed by quantum computers to classical cryptographic algorithms. Two key milestones stand out: 2027 for hardware support and 2030 for mandatory activation, aligning with the agency’s phased approach to adopting quantum-resistant security. 
2027: PQC-Capable Hardware Purchases Mandated (But Not Yet Activated) NIST’s guidance suggests that starting in 2027, all newly procured hardware should include built-in support for PQC, though the capability should not yet be enabled. This approach serves several key purposes: Future-Proofing Infrastructure By requiring hardware to be PQC-capable well in advance of the transition deadline, organizations can ensure that they won’t need to undertake costly and disruptive hardware replacements later. This also allows vendors to gradually integrate PQC into their product lines without forcing immediate adoption. Testing & Compatibility Assurance Having PQC built into the hardware, even if not enabled, allows for extensive real-world testing and validation within existing IT ecosystems. Organizations can assess interoperability with legacy cryptographic algorithms and transition strategies, ensuring smooth deployment when activation becomes mandatory. Security Flexibility There may still be ongoing refinements to PQC standards and implementations between 2027 and 2030. Keeping PQC disabled initially allows organizations to continue using classical cryptographic methods (e.g., RSA, ECC) while planning for a secure migration. 2030: PQC Must Be Activated on All Compliant Hardware By 2030, NIST mandates that PQC must be fully enabled on hardware that was purchased with built-in support. This requirement ensures that all critical systems transition to quantum-safe cryptographic algorithms within a defined timeframe. The rationale behind this activation deadline includes: Mitigating the Quantum Threat As quantum computing advances, the risk of classical cryptographic algorithms (such as RSA and ECC) becoming obsolete increases. Enforcing PQC activation by 2030 ensures that organizations are not left vulnerable to quantum-based attacks. 
Ensuring a Coordinated Transition By setting a firm deadline, NIST aligns government and industry efforts in adopting standardized PQC protocols. This prevents a fragmented, uncoordinated rollout where some systems remain vulnerable while others have transitioned. Compliance with Federal and Industry Standards Many regulatory frameworks (such as FIPS and CISA cybersecurity directives) will likely incorporate PQC requirements. Enabling PQC by 2030 ensures compliance with these emerging security standards. Avoiding “Harvest Now, Decrypt Later” Attacks Adversaries may already be collecting encrypted data, intending to decrypt it once they obtain a quantum computer capable of breaking classical cryptography. Enabling PQC ensures that sensitive information remains protected against both current and future decryption threats. 34% of Cloudflare HTTPS Requests are PQC According to recent Cloudflare data, roughly 34% of all TLS 1.3 connections established with Cloudflare currently use post-quantum key exchange, up from 2.83% twelve months earlier (March 2024). You can find the latest statistics on Cloudflare Radar. Microsoft's PQC Efforts in Windows Server 2022/2025 To support organizations in their PQC transition, Microsoft has integrated post-quantum cryptographic capabilities into its Windows Server environment. The Microsoft PQC API is a key component, enabling developers and IT administrators to: Test and implement quantum-resistant cryptographic algorithms. Ensure compatibility with emerging NIST PQC standards. Gradually transition critical infrastructure to PQC without breaking existing systems. Key Features of the Microsoft PQC API Support for NIST PQC Candidates The API provides access to quantum-resistant algorithms selected by NIST, such as Kyber (for key exchange) and Dilithium (for digital signatures). These algorithms are expected to replace vulnerable public-key encryption methods. 
Backward Compatibility Windows Server 2022/2025 allows hybrid cryptographic implementations, meaning organizations can use both classical and quantum-resistant algorithms during the transition period. Integration with Windows Cryptographic APIs The PQC API is integrated with existing Windows cryptographic frameworks, including CNG (Cryptography API: Next Generation) and SCHANNEL, enabling easy adoption without major application rewrites. Secure Key Exchange and Authentication The API supports PQC-enabled TLS (Transport Layer Security), allowing secure communication channels that are resistant to quantum threats. How to Get Started with Microsoft's PQC API If you're running Windows Server 2022 or planning to migrate to Windows Server 2025, you can start preparing for post-quantum security with the following steps: Enable PQC Features Ensure your Windows Server instance is updated to the latest version that includes PQC API support. Test PQC Algorithms Use Microsoft’s API to experiment with post-quantum cryptographic primitives in a controlled environment before full-scale deployment. Implement Hybrid Cryptography Transition gradually by using hybrid cryptographic approaches that combine classical and post-quantum algorithms to maintain compatibility while enhancing security. Monitor NIST and Microsoft Updates Stay informed about the latest developments in PQC standards and Microsoft’s implementation roadmap to ensure compliance with future security policies. Conclusion The quantum era is approaching, and while large-scale quantum computers capable of breaking RSA and ECC do not yet exist, organizations must start preparing. The transition to post-quantum cryptography is a complex but necessary shift to protect sensitive data from future threats. By staying informed, assessing cryptographic dependencies, and adopting quantum-resistant strategies, organizations can ensure they remain secure in a post-quantum world.

  • Microsoft Windows, Post-Quantum Crypto, and the Reality Gap as of 2026

Windows Server 2025 and Windows 11 (24H2 and newer) are the first Windows releases that include native post-quantum cryptographic primitives in the operating system. These changes are intended to reduce long-term exposure to data captured today and decrypted in the future. Windows does not yet provide a complete quantum-safe security stack. It only includes the base cryptographic primitives and hybrid mechanisms needed to begin transitioning away from classical algorithms. What follows is what is actually present in the platform today, where it can be used, and what it cannot yet do. Terminology Base cryptographic primitives This means Windows now has low-level implementations of PQC algorithms such as: ML-KEM for key exchange ML-DSA for digital signatures These live in SymCrypt and can be called by higher-level components. They are the equivalent of having AES or RSA functions available, not a full encryption product. They do not, by themselves: Replace RSA or ECC Automatically secure traffic Enforce policy Integrate into existing enterprise controls They are just the math engines. Hybrid mechanisms Hybrid means Windows does not use PQC on its own. Instead it: Combines classical crypto (ECC like X25519) With PQC (ML-KEM) Mixes both secrets into one session key This protects you if either algorithm is later broken, but it also means: PQC is not yet trusted enough to stand alone Classical crypto is still required everywhere So Windows is not “post-quantum only.” It is post-quantum assisted. Core PQC Algorithms in Windows Microsoft has implemented the NIST-standardised post-quantum algorithms directly into SymCrypt, the core cryptographic library used throughout Windows. The following primitives are now present in modern Windows builds: ML-KEM (formerly CRYSTALS-Kyber) Used for post-quantum key exchange and encapsulation. Windows includes support for ML-KEM-512, 768, and 1024. 
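The "mix both secrets into one session key" step described above can be sketched with a simple key derivation: concatenate the classical and post-quantum shared secrets and derive the session key from both, so an attacker must break both algorithms to recover it. An illustrative stdlib sketch, not Windows' actual construction:

```python
import hashlib
import hmac
import os

def hybrid_session_key(classical_secret: bytes, pq_secret: bytes,
                       transcript: bytes) -> bytes:
    """Derive one key from both shared secrets (HKDF-extract style).

    If either input stays secret, the output stays unpredictable.
    """
    ikm = classical_secret + pq_secret      # concatenated key material
    return hmac.new(transcript, ikm, hashlib.sha256).digest()

# Stand-ins for an X25519 shared secret and an ML-KEM decapsulated secret.
classical = os.urandom(32)
pq = os.urandom(32)
key = hybrid_session_key(classical, pq, b"handshake-transcript")
assert len(key) == 32
```

Real hybrid TLS uses the handshake's own key schedule rather than a single HMAC, but the security argument is the same: compromising one input alone is not enough.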
ML-DSA (formerly CRYSTALS-Dilithium) Used for digital signatures, certificate validation, and identity verification. These algorithms are integrated into the same core crypto engine that backs BitLocker, TLS, Kerberos, and CNG functions. This integration is necessary if the platform is ever to support PQC beyond academic demonstrations. Where PQC Exists in the Windows Stack SymCrypt and CNG PQC algorithms are exposed through the same cryptographic pipeline used by existing Windows security features, via CNG-backed providers. This means the operating system is capable of performing post-quantum operations natively, without third-party crypto libraries. However, these API surfaces are still considered early access or preview grade. Public, documented stable APIs for third-party developers are not yet available. The capability is present, but not fully supported for production use outside of Microsoft’s own components. Certificates and PKI Windows certificate stores can parse and retain PQC algorithm identifiers and hybrid certificate chains. Microsoft has demonstrated internal ADCS builds capable of issuing PQC and hybrid certificates, but as of early 2026 this is not a general availability feature. There is no supported workflow for running a full quantum-safe enterprise PKI yet. Windows can store and process PQC formats, but does not yet provide a complete operational lifecycle for PQC certificates. Hybrid TLS, The Only Practical Model All practical PQC deployments in Windows today use hybrid key exchange. Windows combines a classical key exchange algorithm such as X25519 with ML-KEM during TLS 1.3 handshakes, and both secrets are mixed into the session key. If the post-quantum algorithm were later found to be weak, the classical layer would still protect the session. If a quantum computer someday breaks the classical layer, the PQC portion remains intact. 
Hybrid key exchange is currently the only deployment model that provides backward compatibility and forward security. This support exists only in TLS 1.3. TLS 1.2 and earlier cannot negotiate hybrid key exchange with PQC algorithms. Operational Side Effects Post-quantum cryptography increases signature sizes, key blobs, and certificate chains, so handshake payloads are larger than their classical counterparts. A classical ECDSA signature is roughly 64 bytes. An ML-DSA (post-quantum) signature can be over 2,400 bytes. In practice, this means: Firewalls and load balancers may drop or fragment PQC handshakes. MTU and fragmentation settings may need adjustment. HTTP header limits and reverse proxy configurations might need raising to accommodate larger certificate headers. These impacts are concrete and should be tested early in any deployment. What “Enabled” Actually Means PQC is not globally enabled across Windows as a system-wide security mode. It is selectively used by Microsoft-controlled stacks such as Edge and specific TLS paths, and in internal services where hybrid negotiation is already implemented. For developers, PQC types are beginning to appear in upcoming .NET releases, but they remain in preview and subject to change. For administrators, there is no single switch, no Group Policy, and no supported way to enforce PQC system-wide yet. PQC adoption in Windows is an incremental platform change and cannot be enabled through a single configuration setting. Final Reality Check The operating system now includes native support for post-quantum algorithms at the cryptographic provider layer. That is the actual milestone. It is not yet quantum safe end-to-end. It is not yet enterprise ready. It is not a complete solution. Broader deployment depends on standards maturity, application support, and enterprise tooling.
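The size inflation described above is easy to quantify. A sketch comparing the signature bytes in a hypothetical three-certificate chain, and how many typical TCP segments those signatures alone occupy (figures are the approximate ones quoted above; the chain depth and MSS are illustrative assumptions):

```python
import math

ECDSA_SIG = 64        # approx. P-256 ECDSA signature, bytes
ML_DSA_SIG = 2420     # ML-DSA-44 signature, bytes
CHAIN_DEPTH = 3       # leaf + intermediate + root signatures (assumed)
MSS = 1460            # typical TCP payload per segment (1500 MTU - 40 headers)

def segments(total_bytes: int, mss: int = MSS) -> int:
    """How many full TCP segments the payload spills across."""
    return math.ceil(total_bytes / mss)

classical = ECDSA_SIG * CHAIN_DEPTH       # 192 bytes
post_quantum = ML_DSA_SIG * CHAIN_DEPTH   # 7260 bytes

print(f"classical signatures:    {classical} bytes, {segments(classical)} segment(s)")
print(f"post-quantum signatures: {post_quantum} bytes, {segments(post_quantum)} segment(s)")
```

Going from one segment to several for the signatures alone is exactly why middleboxes that assume small handshakes start dropping or fragmenting traffic.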

  • Windows 10 is End of Life: What’s Next for IT Professionals?

Windows 10 has officially reached its end of life. After a decade of service, Microsoft is pulling the plug on regular security updates and patches. This means no more fixes for newly discovered vulnerabilities, regardless of how critical they may be. Now, I know what you’re thinking: “What’s the plan, Microsoft?” Well, they expect everyone to migrate to Windows 11. But honestly, it feels like Microsoft is out of touch with reality. The Hardware Dilemma Windows 11 comes with a self-imposed set of hardware requirements. Most notably, it mandates a Trusted Platform Module 2.0 (TPM) and specific processor generations: Intel's 8th Gen Core (Coffee Lake, 2017) or newer, or AMD's Ryzen 2000 series (Zen+, 2018) or newer. Let’s dig a little deeper into this situation. What Are Microsoft Thinking? With TPM enabling BitLocker to protect your data from the threat of someone physically stealing your laptop, you can almost admire Microsoft’s logic. Obviously, that’s a far bigger risk than a few hundred million unpatched Windows 10 machines being exposed to the Internet. Because, of course, laptops are being stolen by the truckload every night, while malware and remote exploits are just fringe concerns. Brilliantly deduced. Microsoft’s obsession with this stance is leaving millions of Windows 10 devices unpatched and exposed to the Internet. That’s the blueprint for the largest collection of remotely exploitable systems the world has ever seen. Genius, really. Windows 10 vs Windows 11: Who’s Really Using What? Windows 11 has finally surpassed Windows 10 in market share. Around 55 percent of active Windows desktops now run Windows 11, while roughly 42 percent remain on Windows 10. That's 42% of the world’s Windows machines running an operating system that’s no longer receiving security updates. It's estimated that between 400 and 450 million Windows 10 devices exist. Let that number sink in. Nearly half a billion devices are now unpatched and will soon become vulnerable. 
TPM 2.0 simply isn’t present on roughly 30 percent of Windows-capable devices. Since TPM only became standard on mainstream hardware from around 2018 onward, and was made mandatory for Windows 11 in 2021, anything older than 5 to 8 years is unlikely to have a TPM. It’s Not Like the Hardware Isn’t Fast Enough! Most older PCs can run Windows 11 without breaking a sweat. They’re powerful enough to handle most workloads. So why the hardware restrictions? It feels like a forced upgrade rather than a necessary one. The Fear Factor Microsoft seems to be betting on a mass migration, a sudden leap from Windows 10 to Windows 11 driven by fear of being vulnerable. The message is clear: upgrade your hardware, buy a new PC, or live with unpatched vulnerabilities. It’s a calculated gamble that enough users will cave rather than risk running unsupported systems. Of course, support can be extended for free if you sign up for a Microsoft account for an additional year of support. Otherwise, it’s a paid service. The Truth of the Matter Microsoft, come down from your ivory tower. You’re not Apple, and your products aren’t objects of desire or status. There’s no more passion for your products; people run Windows because they haven’t yet discovered the alternatives like Apple, Linux, and ChromeOS. Admit Your Mistake If you’re willing to burn over 400 million devices because of an arbitrary decision, one that leaves them open to remote attacks while blocking any legitimate upgrade path to Windows 11, then maybe we should all consider alternatives before choosing any Microsoft product. Market Share: A Declining Landscape Windows devices have been losing ground for years. Mobile devices have taken over daily computing for most people, who now handle email, banking, shopping, streaming, and social media entirely on their phones. The traditional PC has become an optional extra. So if Microsoft keeps giving people reasons not to buy a PC, what do they think will happen? 
Enforcing nonsensical hardware barriers, locking features behind accounts nobody wants, and deliberately blocking perfectly good machines from upgrading will only drive customers away. Push hard enough, and they won’t just skip an upgrade; they’ll abandon the platform altogether. Once a customer fully switches to mobile or jumps to Apple, ChromeOS, or Linux, they’re gone for good. They won’t be coming back just because Microsoft finally decides to be reasonable. And right now, money is tight. Inflation has hammered disposable income, and consumers aren’t lining up to replace perfectly good hardware. Microsoft picked the worst possible moment to demand a hardware refresh in a market where users are already drifting toward mobiles and away from Windows entirely. Rant Over... Well, Almost As a die-hard Microsoft engineer, I feel better for letting that rant go public. There have been too many times in the last few years that Microsoft has misstepped and angered its customer base. Let’s name a few: Xbox repeatedly, Out of Tune (Intune), the bag of spanners that is massively inferior to SCCM, deprecating MDT, CoPilot, and AI stuffed into every corner of the OS and every app, sub-optimal and untested patches, reboots at the absolute worst times, ads on enterprise devices, ads in every facet of the OS, and finally, Windows Recall.

  • Zero Trust for the Home Lab - IPSec between Windows Domain and Linux using Certs (Part 7)

The Road to the World's Most Secure Home Lab: Implementing IPSec Between Windows Domain and Rocky Linux So far, in the pursuit of the world's most secure home lab, I've implemented several key strategies. Today, I’ll dive into the specifics of implementing IPSec between my Windows Domain and Rocky Linux. What's Covered in This Blog This post covers the implementation of IPSec, focusing on the integration between my Windows Domain and Rocky Linux. What Is Zero Trust - Recap Zero Trust is a security framework that assumes no user, device, or network segment is inherently trustworthy, regardless of where it sits in the network. The core principles include: Verify explicitly: Always authenticate and authorize access. Use least privilege access: Limit access to only what's necessary. Assume breach: Design as if attackers are already in the network. IPSec and Its Back Story If you haven’t already, start with Part 4, where I implement IPSec in a Windows environment using certificates. And yes, you guessed it, there’s more certificate configuration ahead. Wooohoooo, living the dream! Rocky Linux Rocky Linux version 10 is today’s Linux OS of choice and will be installed onto a Hyper-V platform. Rocky will serve as a Wazuh monitoring platform as part of the Zero Trust implementation for the home lab. The installation of Wazuh isn’t covered here; it’ll be the focus of the next article. Microsoft's SCOM might seem like the obvious choice for me, but there’s a longer-term goal to move away from Microsoft. As the company pivots to a Cloud and AI-first strategy, on-prem support and partner benefits are steadily being erased. This shift removes my ability and choice to deploy what and where I want. PfSense/Managed Switch VLAN To support the Rocky Linux servers and Wazuh, a new VLAN on the 192.168.90.0/24 subnet will be required. This aligns with the Zero Trust principle of service segregation. 
Initially, pfSense is configured to allow unrestricted traffic between VLAN 20, VLAN 30, and VLAN 90 in both directions. Don’t forget to update the managed switch to also allow the new VLAN tag of 90. A step-by-step guide for setting up VLANs, firewalls, etc., for pfSense is available in Part 2. IPSec Additional GPO for SSH An additional GPO exemption allowing SSH (port 22) access between the member server and the Rocky Linux hosts will ease deployment. This allows copy and paste between host and VM. Current Domain IPSec Settings Crucial! Windows domain traffic via GPO only supports IKEv1, not that Microsoft will make this obvious or configurable. Make a note of the current IPSec settings; any deviation will result in IPSec negotiation failure. The following IPSec settings are known to work reliably. While some configurations using AES-GCM 128 and 256 are supported, AES-GCM 192 is not supported on Rocky Linux. If you plan to deviate from this setup, be sure to confirm that your chosen ciphers are supported on both Windows and Linux. A step-by-step guide for setting up IPSec in a Windows Domain is available in Part 4. I strongly recommend following that guide before attempting to add Linux to the mix. IPSec Settings in GPO - Just for Info Open GPO Management and navigate to the IPSec policies and edit: Computer Configuration > Policies > Windows Settings > Security Settings Right-click and select Properties on Windows Defender Firewall with Advanced Security. Select the IPSec Settings tab. Open Main Mode's Customize... Select and edit the SHA384 integrity policy. Make a note of all the settings. Audit Quick Mode. Audit Authentication, which is using the Trusted Root certificate. DNS Create a host record for the intended Rocky host. Linux Packages for IPSec The latest release of Rocky Linux is installed as a Hyper-V VM, with 6GB RAM and 250GB disk. Finally, the virtual NIC is set for VLAN 90. 
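Before building firewall rules, a quick sanity check of the addressing plan can catch typos early. A small stdlib sketch using addresses from this series (the VLAN table is illustrative and should mirror your own plan):

```python
import ipaddress

# Subnet plan for this lab (illustrative; match your own VLANs).
VLANS = {
    20: ipaddress.ip_network("192.168.20.0/24"),   # domain controllers
    90: ipaddress.ip_network("192.168.90.0/24"),   # Wazuh / monitoring
}

def vlan_of(addr: str):
    """Return the VLAN tag whose subnet contains addr, or None."""
    ip = ipaddress.ip_address(addr)
    for tag, net in VLANS.items():
        if ip in net:
            return tag
    return None

assert vlan_of("192.168.90.100") == 90   # Rocky / Wazuh host
assert vlan_of("192.168.20.245") == 20   # a DC
assert vlan_of("10.0.0.1") is None       # outside the plan
```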
During installation, disable the root account and ensure that the user you create is added to the wheel group to grant administrative (sudo) privileges. Connection from Server Once Rocky is properly installed, pfSense should assign it an IP address via DHCP. In my case, it was 192.168.90.100 . Here’s the command to set a static IP, gateway, and DNS. SSH from PowerShell I’ll be connecting from my Windows server with PowerShell—no more Putty for me! Hostname Update the hostname to match the DNS entry created earlier. Updates Start where you mean to finish. Apply any updates to ensure stability and security fixes are applied. AD Packages Install the packages that will allow Rocky to be a domain member. Strongswan Install the following two packages in order. Time and Timezone Rocky will source its time from the DCs, not only to support authentication protocols but also to ensure that log timestamps are accurate and consistent across the environment. Search for your locale; mine's London. Copy the result and then set the timezone. Enable and start the time sync service. Update `chrony.conf` with the following, so it points at the DCs: `server 192.168.20.245 iburst` `server 192.168.20.247 iburst` `server 192.168.90.249 iburst` Restart the time service. Run the following commands once IPSec is implemented to confirm time and time sync. AD or Not to AD This step is included in case Rocky needs to join the domain. However, for its intended role as a monitoring solution, it’s best to minimize open ports and limit connectivity between it and the domain to reduce the attack surface. To Join the Domain In Active Directory Users and Computers, pre-create the computer object `wazuh90` in the required OU. If you don't do this step, Rocky will be added to the AD Computer container. Discover your domain (use your actual domain name in ALL CAPS). Join the domain using an account with permissions. Pull password information for an Active Directory user. Pull some domain info. 
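The chrony configuration above points Rocky at the DCs for time. Under the hood, each `server` line speaks NTP over UDP port 123, and a minimal SNTP client query is just a 48-byte packet whose first byte encodes leap indicator, version, and mode. An illustrative stdlib sketch that builds (but does not send) such a packet:

```python
import struct

def sntp_request() -> bytes:
    """Build a 48-byte SNTP v4 client request.

    First byte: LI=0 (no warning), VN=4, Mode=3 (client) -> 0x23.
    The remaining 47 bytes may be zero for a simple query.
    """
    li_vn_mode = (0 << 6) | (4 << 3) | 3
    return struct.pack("!B47x", li_vn_mode)

pkt = sntp_request()
assert len(pkt) == 48 and pkt[0] == 0x23
```

Sending this packet to a DC with a short socket timeout is a handy way to confirm UDP 123 is reachable before blaming chrony.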
IPSec Certificate for Linux Preparation Advanced certificate requests using version 3 templates are not supported through the traditional web enrollment interface (certsrv) unless you're using legacy systems like Windows XP or Server 2003. Clients running Windows Vista or newer cannot request v3 template certificates via this method due to compatibility limitations. Microsoft’s recommended approach for handling version 3 templates is to use Certificate Enrollment Web Services (CEP/CES) or leverage Autoenrollment via Group Policy. Both support modern certificate features and provide a more secure and scalable enrollment process. I’m not deploying a CES server; that's for another day and another blog, and it’s unnecessary for our needs. CES is mainly used by Windows clients for advanced certificate enrollment. Linux doesn’t require it, since it still supports the legacy method. New Linux Certificate Template Let's prep a certificate. Open the CA management snap-in, and then right-click on Certificate Templates and Manage. Duplicate a Certificate Template Either duplicate the IPSec (Offline) certificate or the previously created 'Non-TPM' template for server or workstation. General Tab: Set the validity period to 1 year. Compatibility Tab: Set both Compatibility settings to Windows 2003. Failure to do this will mean the template won't be available in the certificate web console. Request Handling Tab: Allow the private key to be exported. Cryptography Tab: Set the Algorithm to Determined by CSP and key size to 2048. Subject Name Tab: Set to Supply in the request. Extensions Tab: Edit the Application Policies and add in: - Client Authentication - IP Security IKE Intermediate - IP Security Tunnel Termination - IP Security User - Server Authentication Security Tab: Add the user or group that will perform the certificate enrollment. Remove any group that auto-enrolls. Publish the Certificate Template Return to the main CA Management snap-in. 
Right-click on Certificate Templates. Select New > Certificate Template to Issue > select Toyo Linux IPSec. Certificate Enrollment In this section, we’ll walk you through the process of requesting a certificate for a Linux system using the Windows CA web interface. SSH onto Rocky. Private Key A private key is generated locally to ensure it never leaves the system. A CSR is then created using that key to securely request a certificate from the CA without exposing the key itself. Create a working directory. Create a private key that remains on the host; I'll secure it shortly. Create CSR Create a CSR derived from the private key. Update the following with the FQDN of the Rocky host. Copy and paste into the SSH session. Cat the CSR, select all the text including the Begin and End Certificate Request lines, and press Enter to copy to the Windows clipboard. Cert Request from CA Web Console The CSR needs to be copied to the CA Web console to complete the certificate enrollment. From the Windows Server, open a browser and enter the address of the CA Web server, e.g., https://certs.toyo.loc/certsrv. Select Request a certificate. Select Submit a certificate request by using a base-64-encoded CMC. Paste the CSR into the Base-64-encoded window. Select the Toyo Linux IPSec template. Select Base 64 encoded. Click on Download certificate. Open the downloaded certificate with Notepad. Copy the entire contents to the clipboard. Create the Certificate Return to the SSH session. From this point onwards, every command will require sudo. `sudo nano FQDN.crt` and paste the contents of the Windows clipboard. Ctrl + O to output the contents to file. Ctrl + X to exit Nano. Copy the Private Key and Certificate to Strongswan The private key and the certificate need to be copied or moved to the strongswan directory and configured with the correct permissions. Copy the Private key. Set the private key to be readable and writable only by the file's owner. Set Root as the Owner. 
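A mangled copy-and-paste of the CSR is a common cause of rejected requests at the CA web console. Before pasting, a quick sanity check that the PEM body is intact (valid base64 that decodes to an ASN.1 SEQUENCE) can save a round-trip. An illustrative stdlib sketch, not a substitute for full parsing:

```python
import base64

def pem_body_ok(pem: str) -> bool:
    """Check a PEM blob: header/footer present, base64 decodes, DER starts 0x30."""
    lines = [ln.strip() for ln in pem.strip().splitlines()]
    if not (lines and lines[0].startswith("-----BEGIN")
            and lines[-1].startswith("-----END")):
        return False
    try:
        der = base64.b64decode("".join(lines[1:-1]), validate=True)
    except ValueError:
        return False
    # DER certificate requests are ASN.1 SEQUENCEs (tag 0x30).
    return der[:1] == b"\x30"

# Synthetic demo blob (not a real CSR), just to exercise the checks.
demo = ("-----BEGIN CERTIFICATE REQUEST-----\n"
        + base64.b64encode(b"\x30\x82\x01\x00" + b"\x00" * 16).decode()
        + "\n-----END CERTIFICATE REQUEST-----")
assert pem_body_ok(demo)
assert not pem_body_ok("not a pem at all")
```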
Repeat the steps to secure the private key in the home directory. Copy the certificate to the strongswan x509 directory. Set the certificate permissions so the owner can write and everyone else can read. Trusted Root CA The root CA certificate is required on the local host to establish trust in certificates issued by that authority. Without it, the system cannot validate or trust incoming connections or services secured with those certificates. In the CA web console, click the Home link, then select Download a CA certificate, certificate chain, or CRL. Select Base 64 and then Download CA Certificate. Open with Notepad and copy the contents. Ensure you're in the 'certs' working directory. Open nano and paste the Base64-encoded root certificate from your clipboard into the file. Ctrl + O to output the contents to file. Ctrl + X to exit Nano. Root Trust Copy the root CA certificate to the trusted anchors directory so the system recognizes it as a valid certificate authority. Copy the root CA to anchors so the browser trusts sites on my domain. Refresh the system’s trusted certificate store with the new certificate. Copy the root CA to the Strongswan x509ca directory. Set the certificate permissions so the owner can write and everyone else can read. Firewalls The following commands permanently open the required ports and protocols for IPSec traffic. Note that port 4500 and the AH protocol are not needed for non-VPN traffic or this specific configuration.

sudo firewall-cmd --list-all
sudo firewall-cmd --permanent --add-port=500/udp
sudo firewall-cmd --permanent --add-protocol=esp
sudo firewall-cmd --permanent --add-port=4500/udp
sudo firewall-cmd --permanent --add-protocol=ah
sudo firewall-cmd --reload

Swanctl.conf and Not Strongswan StrongSwan is an open-source implementation of the IPSec protocol suite, used to establish secure, encrypted connections between hosts or networks. 
It uses IKE (Internet Key Exchange), typically IKEv2, to negotiate and manage security associations. Naturally, Microsoft only supports IKEv2 for VPNs, so we're stuck with IKEv1. Configuration is handled through `swanctl.conf`, and the `swanctl` utility is used to load, manage, and monitor IPSec connections in real time. It supports certificates, EAP, and various authentication methods, making it ideal for inter-domain and subnet traffic, site-to-site, and remote access VPNs. `swanctl` is to be used as the legacy `ipsec` command is deprecated. `swanctl.conf` provides a more modular, flexible, and systemd-friendly way to manage StrongSwan. Backup the original `swanctl.conf`. Create a new `swanctl.conf` with nano. Download my swanctl.conf from Github and paste it into nano. Crucial! Update the highlighted values to exactly match your Windows domain. They’re explicit, and any mismatch will prevent the IPSec tunnel from negotiating:

proposals = aes256-sha384-ecp384 - Key Exchange (Main Mode): AES-CBC 256 encryption, SHA384 integrity, EC DH P-384 key exchange.
esp_proposals = aes128gcm128 - Data Protection (Quick Mode): AES-GCM 128 encryption with GMAC/GCM 128 integrity.

In transport mode, only traffic that matches `local_ts` and `remote_ts` will be protected by IPSec. Any traffic not matching these rules will pass as normal, unencrypted traffic. Ctrl + O to output the contents to file. Ctrl + X to exit Nano. Instruct the Charon daemon to load plugins dynamically, making the setup more flexible and easier to manage across different use cases. Start Strongswan Service Up to this point, access to Rocky has primarily been from Windows via SSH. The upcoming steps may terminate your session, and any misconfiguration will terminate the connection. With that in mind, you may want to switch to direct console access before proceeding. Note: Ignore any errors or warnings for the sqlite plugin; it’s harmless noise. 
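For orientation, the Main Mode and Quick Mode proposal values described above sit in a `swanctl.conf` skeleton roughly like the following. This is a trimmed, illustrative fragment only: the connection names, addresses, and certificate file name are placeholders drawn from this series, and the full working file on GitHub referenced above is the source of truth.

```
connections {
    windows-ipsec {
        version = 1                       # Windows GPO IPSec negotiates IKEv1
        local_addrs  = 192.168.90.100
        remote_addrs = 192.168.20.245
        proposals = aes256-sha384-ecp384  # Main Mode: AES-CBC 256 / SHA384 / ECDH P-384
        local {
            auth = pubkey
            certs = wazuh90.toyo.loc.crt  # placeholder file name
        }
        remote {
            auth = pubkey
        }
        children {
            dc-traffic {
                mode = transport              # host-to-host, matching the GPO
                esp_proposals = aes128gcm128  # Quick Mode: AES-GCM 128
                local_ts  = 192.168.90.100/32
                remote_ts = 192.168.20.245/32
            }
        }
    }
}
```

Any value here that differs from the Windows GPO settings, particularly the proposals, will cause negotiation to fail silently from the Windows side.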
Remove the SSH Exemption in GPO

Remove the IPSec GPO exemption for SSH between the Windows Server and Rocky.

Enable and Start Strongswan

Execute the following commands. Any misconfigurations, typos, or incorrect parameters in `swanctl.conf` will likely prevent the service from starting or successfully establishing an IPSec connection. Enable the Strongswan service. Start the Strongswan service. Load all parameters stored in the `swanctl.conf` file.

Status of Strongswan

Let's go through a few verification steps to confirm that IPSec is running correctly and establishing a successful connection to the Windows endpoint. `swanctl.conf` does not allow exemptions and communicates exclusively over IPSec with Windows, so I found that using `nslookup` is a better way to test the initial connection with the Windows domain controllers. Check the status of the Strongswan service to ensure it is running and enabled. I prefer retrieving a backdated list of events, so I use the `-n` option. This is particularly useful when troubleshooting issues where viewing only the latest events might miss the critical error that triggered the problem. `journalctl -u strongswan -f` displays StrongSwan events in real time as they occur. `sudo swanctl --list-conns` displays all configured IPSec connections from the `swanctl.conf` file, including their settings and current status. `sudo swanctl --list-certs` lists all loaded X.509 certificates, showing details like subject, issuer, validity period, and key usage. `sudo tcpdump -n esp or udp port 500` captures and displays network packets that are either ESP (IPSec encrypted) traffic or use UDP port 500, which is commonly used for IKE (Internet Key Exchange) in IPSec. Finally, let's examine the Windows side of the IPSec connection. Open `wf.msc` and navigate to either the Main Mode or Quick Mode section. There, you should see the IPSec connection established between Rocky Linux and the Windows Server.

It Never Works First Time....
Windows Firewall

From my experience, the following `journalctl` messages usually mean that IKE or ESP traffic is being blocked by a firewall, either on the Windows endpoint or by pfSense. If there are no matching log entries on the Windows server, I take it as a sign that the packets never made it through. In that case, I'll enable or check the firewall logs on pfSense to confirm whether it's dropping the traffic.

```
journalctl -u strongswan -n 50
wazuh90.toyo.loc charon-systemd[8207]: sending packet: from 192.168.90.100[500] to 192.168.30.61[500] (180 bytes)
wazuh90.toyo.loc charon-systemd[8207]: creating delete job for CHILD_SA ESP/0x00000000/192.168.20.245
wazuh90.toyo.loc charon-systemd[8207]: CHILD_SA ESP/0x00000000/192.168.20.245 not found for delete
wazuh90.toyo.loc charon-systemd[8207]: giving up after 5 retransmits
wazuh90.toyo.loc charon-systemd[8207]: establishing IKE_SA failed, peer not responding
wazuh90.toyo.loc charon-systemd[8207]: creating acquire job for policy 192.168.90.100/32[udp/35655] === 192.168.20.245/32[udp/domain] with reqid {2}
wazuh90.toyo.loc charon-systemd[8207]: initiating Main Mode IKE_SA windows-ipsec[1317] to 192.168.20.245
wazuh90.toyo.loc charon-systemd[8207]: generating ID_PROT request 0 [ SA V V V V V ]
wazuh90.toyo.loc charon-systemd[8207]: sending packet: from 192.168.90.100[500] to 192.168.20.245[500] (180 bytes)
wazuh90.toyo.loc charon-systemd[8207]: creating delete job for CHILD_SA ESP/0x00000000/192.168.20.247
wazuh90.toyo.loc charon-systemd[8207]: CHILD_SA ESP/0x00000000/192.168.20.247 not found for delete
wazuh90.toyo.loc charon-systemd[8207]: giving up after 5 retransmits
```

Syntax with swanctl.conf

The first three log extracts show that the `swanctl.conf` file is misconfigured, either due to a typo or something in the syntax being incorrect. Starting the service immediately fails with an exit code, which usually points to a parsing error or a missing/invalid configuration directive.
```
sudo systemctl start strongswan.service
Job for strongswan.service failed because the control process exited with error code.
See "systemctl status strongswan.service" and "journalctl -xeu strongswan.service" for details.
```

Running `sudo swanctl --load-all` gives the same result, confirming that the daemon can't even load the connection definitions.

```
sudo swanctl --load-all
Job for strongswan.service failed because the control process exited with error code.
See "systemctl status strongswan.service" and "journalctl -xeu strongswan.service" for details.
```

Checking `journalctl -u strongswan -n 50` reveals that `charon-systemd` is shutting down with `status=22`, which typically means there's a configuration error (e.g., invalid parameters, wrong file paths for certificates, or unsupported options).

```
journalctl -u strongswan -n 50
wazuh90.toyo.loc systemd[1]: strongswan.service: Control process exited, code=exited, status=22/n/a
wazuh90.toyo.loc charon-systemd[2882]: SIGTERM received, shutting down
wazuh90.toyo.loc systemd[1]: strongswan.service: Failed with result 'exit-code'.
wazuh90.toyo.loc systemd[1]: Failed to start strongswan.service - strongSwan IPsec IKEv1/IKEv2 daemon using swanctl.
```

The final log extract, however, tells a slightly different story. Here, I can see the IKE negotiation starting, but it's failing with "header verification failed." This points to either an IKE proposal mismatch (e.g., incorrect algorithms or key sizes), a certificate identity issue, or even corrupted packets caused by a misbehaving firewall/NAT device.
```
journalctl -u strongswan -n 50
wazuh90.toyo.loc charon-systemd[22855]: 192.168.20.247 is initiating a Main Mode IKE_SA
wazuh90.toyo.loc charon-systemd[22855]: selected proposal: IKE:AES_CBC_256/HMAC_SHA2_384_192/PRF_HMAC_SHA2_384/ECP_384
wazuh90.toyo.loc charon-systemd[22855]: generating ID_PROT response 0 [ SA V V V V ]
wazuh90.toyo.loc charon-systemd[22855]: sending packet: from 192.168.90.100[500] to 192.168.20.247[500] (160 bytes)
wazuh90.toyo.loc charon-systemd[22855]: header verification failed
wazuh90.toyo.loc charon-systemd[22855]: received invalid IKE header from 192.168.20.247 - ignored
```

Thanks for Your Time and Support...

Another IPSec and certificate-based blog wrapped up, and just one more to go before my home lab's Zero Trust panacea of perfection is fully implemented. Honestly, I loved working on this one. My first Linux IPSec deployment in prepping for this blog was Linux-to-Linux, and it was smooth, stable, and just worked. Then I brought Windows into the mix… and suddenly I was questioning my life choices and the tech I've devoted my time to. Back in the Vista days, when I was running 100% OpenSuse, I really should have stayed the course.

Related Posts:
Part 1 - Zero Trust Introduction
Part 2 - VLAN Tagging and Firewalls with pfSense
Part 3 - pfSense and 802.1x
Part 4 - IPSec for the Windows Domain
Part 5 - AD Delegation and Separation of Duties
Part 6 - Yubikey and Domain Smartcard Authentication Setup
Part 7 - IPSec between Windows Domain and Linux using Certs

  • Zero Trust for the Home Lab - IPSec (Part 4)

The Road to the World's Most Secure Home Lab.... So far in the pursuit of the World's most secure home lab, the following have been implemented: Part 1 - Zero Trust Introduction Part 2 - VLAN Tagging and Firewalls with pfSense Part 3 - pfSense and 802.1x

What's Covered in this Blog

This post covers implementing IPSec using certificates for authentication and data encryption across my Windows Domain. This is not for the faint of heart.

What Is Zero Trust - Recap

Zero Trust is a security framework that assumes no user, device, or network segment is inherently trustworthy, regardless of where it sits in the network. The core principles include: Verify explicitly – Always authenticate and authorize access. Use least privilege access – Limit access to only what's needed. Assume breach – Design as if attackers are already in the network.

How IPSec Addresses Zero Trust Security

Zero Trust assumes the network is hostile; even internal traffic can't be trusted without verification. Every connection must be authenticated, authorized, and encrypted. IPSec (Internet Protocol Security) is a key enabler:

IPSec Overview

IPSec is a suite of protocols designed to secure IP communications by authenticating and encrypting each IP packet. It lends itself to Zero Trust by enforcing confidentiality, data integrity, origin authentication, and replay protection. When deployed in a Windows environment, IPSec leverages Active Directory and the Microsoft Public Key Infrastructure (PKI), also known as a Certificate Authority (CA). End-to-End Encryption: All internal traffic should be encrypted. IPSec encrypts IP packets at the network layer, ensuring confidentiality and integrity from endpoint to endpoint, protecting data regardless of the underlying application or network. This can be challenging when accessing services that aren't inherently secure or encrypted, such as the Internet.
Mutual Authentication: With a CA managing digital certificates, IPSec can perform strong mutual authentication between devices. Identity Assurance: The use of a CA provides control over credentials, defining certificate lifespans and whether to issue or revoke them as needed.

IPSec Technical Components

Security Associations (SA): An IPSec Security Association (SA) is a unidirectional agreement between two endpoints (each session uses a pair, one per direction) defining traffic security: the encryption, authentication methods, keys, and lifetime for that session. These are negotiated automatically via IKE (Internet Key Exchange), ensuring both sides agree on how to protect the data in transit.

AH vs. ESP

IPSec offers two main ways to protect packets:

Authentication Header (AH) – IP Protocol 51
Authenticates the entire IP packet (including headers) to ensure data integrity and source authenticity. AH does not encrypt the payload; your data remains visible. Rarely used in practice; largely obsolete in modern IPSec deployments.

Encapsulating Security Payload (ESP) – IP Protocol 50
Encrypts and optionally authenticates the payload, providing:
Confidentiality (via encryption)
Data integrity and authentication (via HMAC)

Modes:
Transport Mode: Encrypts only the payload; the IP header is exposed.
Tunnel Mode: Encrypts the entire original IP packet and adds a new IP header.

Obvious Choice: ESP is the standard choice for secure, encrypted communications in Windows IPSec.

IKEv2

Microsoft strongly recommends IKEv2 as the default key management protocol: Supports certificate and EAP authentication methods. Built-in NAT Traversal with UDP port 4500 encapsulation. Robust against network disruptions and supports mobility. Provides streamlined, secure negotiation of SAs. Aligns well with 802.1X for device/network authentication. Windows IPSec implementations starting with Windows 7 and Server 2008 R2 default to IKEv2 for VPNs and IPSec tunnels.
Unfortunately, this feature isn't supported through GPO, so we will have to settle for IKEv1. Because GPOs don't natively support IKEv2, IPSec certificates must use RSA rather than ECC. You'll also need to remember it's IKEv1 when configuring `swanctl.conf` for Linux.

Quick and Main Modes

Phase 1: IKE SA Establishment (Main Mode or Aggressive Mode)
This phase establishes the ISAKMP/IKE Security Association, which provides a secure, authenticated channel for subsequent negotiation. It includes: Peer authentication using certificates, pre-shared keys, or Kerberos. Diffie-Hellman key exchange to derive shared secrets. Agreement on encryption, integrity, and hashing algorithms. Traffic in this phase uses UDP port 500.

Phase 2: IPSec SA Negotiation (Quick Mode)
Phase 2 leverages the secure channel from Phase 1 to: Establish one or more IPSec Security Associations (SAs). Define the protected traffic flows using traffic selectors (source/destination IPs, ports, protocols). Generate fresh keying material for encryption and integrity. Quick Mode messages are protected by the IKE SA established in Phase 1. The outcome is the creation of ESP SAs for actual packet-level protection.

Ports & Protocols: Crucial!
Required Windows firewall rules, both Inbound and Outbound:
UDP Port 500: IKE (initial negotiations)
UDP Port 4500: IKE NAT Traversal (when devices are behind NAT)
IP Protocol 50: ESP (for the encrypted data)
IP Protocol 51: AH (don't use)

Basic CA and GPO Configuration:
Ensure every Windows Domain client trusts the Windows Root CA, and deploy it using GPO if necessary. Enable auto-enrollment of certificates in Group Policy; certificate templates with the Autoenroll permission will enroll automatically. Crucial! Certificates should not serve multiple purposes. The IPSec certificate must exclude other Application policies to avoid conflicts and processing errors.
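Since the Linux side is configured later in this series, it's worth noting how the two phases map onto `swanctl.conf`: Phase 1 (Main Mode) settings go in the `proposals` line and Phase 2 (Quick Mode) in `esp_proposals`. A sketch using this lab's algorithm choices; the exact keyword spellings are my assumption, based on strongSwan's proposal-keyword conventions:

```
# Phase 1 (Main Mode) -> IKE SA, negotiated over UDP 500
proposals     = aes256-sha384-ecp384   # AES-CBC 256 / SHA-384 / ECDH P-384
# Phase 2 (Quick Mode) -> ESP SA
esp_proposals = aes128gcm128           # AES-GCM 128 (combined encryption + integrity)
```

Any mismatch between these strings and the Windows GPO settings will surface as a proposal failure in Phase 1 or Phase 2 respectively.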
Additional Logging

I've enabled the three IPSec auditing policies at the root of the Domain to assist with initial testing and troubleshooting. These settings generate a high volume of event logs, but once IPSec is stable and set to 'Required', logging can be reduced to capture only failures.

Client Certificates

No TPM..... Not all of my Windows clients and servers are blessed with a TPM; case in point, the Intel Skull Canyon and the Gen 6 NUC - Hyper-V host. They still cling to life, with retirement being long overdue. Without a TPM there's no support for the Microsoft Platform Crypto Provider. The fallback option when a TPM isn't available is to use the Microsoft Software Key Storage Provider; targeting the correct provider is managed through Active Directory groups controlling certificate enrollment. Create the following AD Groups. If you have followed the 802.1x blog, the client groups will have already been created:

For Computer objects that do not support TPM
RG_CA_WksAuthCert_Deny_TPM_Supt
RG_CA_MemberServer_Deny_TPM_Supt

For Computer objects that do support TPM
RG_CA_MemberServer_Allow_TPM_Supt
RG_CA_WksAuthCert_Allow_TPM_Supt

Workstation Authentication Client - TPM Supported

Workstation Authentication Template: Right-click Certificate Templates and select Manage. Right-click the Workstation Authentication template and select Duplicate Template.
General Tab: Template display name: Toyo Workstation IPSec. Validity period: 1 year. Renewal period: 6 weeks. Check 'Publish certificate in Active Directory.'
Compatibility Tab: Set compatibility levels to Windows Server 2016 and Windows 10.
Cryptography Tab: Provider Category: Key Storage Provider. Algorithm Name: RSA. RSA is supported for key generation and storage in the TPM. ECC (Elliptic Curve Cryptography) isn't generally supported for TPM storage of certificates.
Minimum key size: 2048. Request hash: SHA256. Requests must use one of the following providers: Microsoft Platform Crypto Provider. The Microsoft Platform Crypto Provider is the Key Storage Provider (KSP) that allows certificates and their private keys to be stored in the Trusted Platform Module (TPM). If no TPM is accessible, the certificate will fail to enroll.
Subject Name Tab: Under 'Build from this Active Directory information': Subject name format: Common Name is selected. Include this information in the alternate subject name: check DNS.
Extensions Tab: Select Application Policies and click Edit.... Ensure Client Authentication is present. Add IP Security IKE Intermediate > Click OK. This isn't required for 802.1X, but it will be relevant in the next article on IPsec.
Security Tab: Ensure Domain Computers is removed. Add RG_CA_WksAuthCert_Allow_TPM_Supt and Allow Read, Enroll and AutoEnroll. Click Apply and OK.

Workstation Authentication Client - TPM Not Supported

Toyo Workstation Authentication Template: Right-click the Toyo Workstation IPSec template and select Duplicate Template.
General Tab: Update the name to show that the TPM isn't supported.
Cryptography Tab: Provider Category: Key Storage Provider. Algorithm Name: RSA. Minimum key size: 2048. Request hash: SHA256. Requests can use any provider available on the subject's computer.
Security Tab: Ensure 'Domain Computers' is removed. Add RG_CA_WksAuthCert_Deny_TPM_Supt and Allow Read, Enroll, and AutoEnroll. Click Apply and OK.

Member Server Certificates

Duplicate the 'Computer v2' certificate template.
General Tab: Template display name: Toyo Member Server IPSec. Validity period: 1 year. Renewal period: 6 weeks. Check 'Publish certificate in Active Directory.'
Compatibility Tab: Set compatibility levels to Windows Server 2016 and Windows 10.
Cryptography Tab: Provider Category: Key Storage Provider. Algorithm Name: RSA. RSA is supported for key generation and storage in the TPM.
ECC (Elliptic Curve Cryptography) is supported for certificate storage in the TPM, but it cannot be used with GPO-configured IPSec. GPO IPSec policies rely on IKEv1 only, which limits authentication to RSA. Although PowerShell (Set-NetIPSecRule) can configure the key module to use IKEv2, this capability cannot be deployed or managed via Group Policy. Minimum key size: 2048. Request hash: SHA256. Requests must use one of the following providers: Microsoft Platform Crypto Provider. The Microsoft Platform Crypto Provider is the Key Storage Provider (KSP) that allows certificates and their private keys to be stored in the Trusted Platform Module (TPM). If no TPM is accessible, the certificate will fail to enroll.
Subject Name Tab: Under 'Build from this Active Directory information': Subject name format: Common Name is selected. Include this information in the alternate subject name: check DNS.
Extensions Tab: Select Application Policies and click Edit.... Remove Client Authentication and Server Authentication. Add IP Security IKE Intermediate > Click OK.
Security Tab: Ensure Domain Computers and Domain Controllers are removed. Add RG_CA_MemberServer_Allow_TPM_Supt and Allow Read, Enroll, and AutoEnroll. Click Apply and OK.

Domain Controller

Duplicate the 'Domain Controller Authentication' template, which should be deployed already. All of my domain controllers are virtual and equipped with a vTPM, allowing them to use the Microsoft Platform Crypto Provider. If your environment differs, separate the certificate templates as previously outlined for clients and servers.
General Tab: Template display name: Toyo Domain Controller IPSec. Validity period: 1 year. Renewal period: 6 weeks. Do not 'Publish certificate in Active Directory.'
Compatibility Tab: Set compatibility levels to Windows Server 2016 and Windows 10.
Cryptography Tab: Provider Category: Key Storage Provider. Algorithm Name: RSA. RSA is supported for key generation and storage in the TPM.
ECC (Elliptic Curve Cryptography) is supported for certificate storage in the TPM, but it cannot be used with GPO-configured IPSec. GPO IPSec policies rely on IKEv1 only, which limits authentication to RSA. Although PowerShell (Set-NetIPSecRule) can configure the key module to use IKEv2, this capability cannot be deployed or managed via Group Policy. Minimum key size: 2048. Request hash: SHA256. Requests must use one of the following providers: Microsoft Platform Crypto Provider. The Microsoft Platform Crypto Provider is the Key Storage Provider (KSP) that allows certificates and their private keys to be stored in the Trusted Platform Module (TPM). If no TPM is accessible, the certificate will fail to enroll.
Subject Name Tab: Under 'Build from this Active Directory information': Subject name format: Common Name is selected. Include this information in the alternate subject name: check DNS.
Extensions Tab: Select Application Policies and click Edit.... Remove Client Authentication, Server Authentication and Smart Card Logon. Add IP Security IKE Intermediate > Click OK.
Security Tab: Leave the default Security settings. Click Apply and OK.

Publish the New Templates:

In the Certification Authority console, right-click Certificate Templates, select New, then Certificate Template to Issue. Select your newly created Toyo Templates. Restart the clients to automatically enroll the certificate, or run gpupdate /force.

IPSec Certificates

The end result will be a dedicated set of certificates purpose-built for IPSec that caters for devices with and without a TPM.

The Somewhat Scary Stuff Begins Here.....IPSec Request

The "Request IPSec" policy is a relatively safe starting point for introducing IPSec into your environment without disrupting existing communication. Rather than enforcing encryption, it negotiates secure connections when possible and gracefully falls back to plaintext.
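As an aside, the same Request behaviour can be created locally on a single machine for ad-hoc testing via the NetSecurity PowerShell module. This is my own minimal sketch, not the method used in this series (which is GPO throughout), and the display name is a placeholder:

```powershell
# Negotiate IPSec when possible, fall back to clear - the 'Request' behaviour described above
New-NetIPsecRule -DisplayName "IPSec Request All (test)" `
    -InboundSecurity Request -OutboundSecurity Request
# Certificate-based machine authentication would additionally need an auth set built with
# New-NetIPsecAuthProposal / New-NetIPsecPhase1AuthSet and passed via -Phase1AuthSet.
```

A locally created rule like this is easy to delete with Remove-NetIPsecRule once testing is done, which is what makes it useful for a throwaway experiment before touching GPO.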
Create a new GPO for each tier of service: Domain Controllers, Member Servers and Clients, named appropriately to show that these policies are specifically for IPSec. Refrain from using the root domain policy or bundling the update into another already-established GPO. Don't create or modify root-level IPSec GPOs; not only is it bad practice, it could lead to a rather serious outage of the entire Domain. Consistency across all GPOs is critical; any mismatch in settings will lead to IPSec negotiation failures.

Exceptions for Routers and APs:

The Domain Controller IPSec policy was deployed first, with the following rules configured. In addition to the standard 'Request inbound and outbound' rule, exemptions were added for the router IP to allow Internet access and for the PiHole servers for DNS resolution. The Build LAN exception for 192.168.60.0/24 (VLAN60) allows communication with clients and servers that have yet to join the domain and, therefore, can't establish an IPSec tunnel due to missing certificates. The exceptions for VLAN60 include the DCs and the CA servers. Some follow-up actions are needed to create the additional VLAN and Firewall in pfSense.

GPO_IPSec_DomainControllers:

Navigate to Computer Configuration > Policies > Windows Settings > Security Settings > Windows Defender Firewall with Advanced Security. Right-click on Connection Security Rules and New Rule...
Rule Type: Custom
Endpoints: Any IP Address
Requirements: Request authentication for inbound and outbound connections.
Authentication Method: Advanced > Customize.....
Customize Advanced Authentication Methods: First Authentication > Add > Computer Certificate from this CA > Select the Enterprise Root Certificate
Protocols and Ports: Any.
Profile: Check only the Domain profile. It's unlikely the DCs and Servers will roam; otherwise, consider selecting all 3 profiles.
Name: Add a meaningful name.
IPSec Settings Tab:

Right-click on Windows Defender Firewall with Advanced Security, click on the IPSec Settings tab, then Customize....
Customize IPSec Defaults: For Main Mode, Quick Mode and Authentication Method, click on Advanced. Customize each in turn, following the guide below.

Key exchange (Main Mode): Click Add...
Integrity algorithm: SHA-256
Encryption algorithm: AES-CBC 256
Key exchange algorithm: Elliptic Curve Diffie-Hellman P-256
Key exchange options: Use Diffie-Hellman for enhanced security.

Data Protection (Quick Mode): Require encryption for all connection security rules that use these settings: enable this option to encrypt the data; without it, only the authentication process is encrypted.
Protocol: ESP (recommended)
Encryption algorithm: AES-GCM 128
Integrity algorithm: AES-GCM 128
Note: AES-GCM-128 and 256 are supported by Rocky Linux; AES-GCM-192 is not supported.

Authentication Method: First Authentication > Add > Computer Certificate from this CA > Select the Enterprise Root Certificate

Copy the First IPSec GPO:

Don't make things difficult by recreating the GPOs for each tier of OU; copy and paste the first IPSec policy in Group Policy Objects: Navigate to Group Policy Objects. Right-click GPO_IPSec_DomainController and Copy. Paste and rename 'Copy of GPO_IPSec_DomainController' to GPO_IPSec_MemberServers. Repeat, and create GPO_IPSec_Workstations. Link the GPOs to their corresponding OUs.

IPSec Require Mode and the Network Profile Race Condition

Crucial! When IPSec is configured in Require mode within a domain, all inbound traffic must be authenticated, which results in a race condition during system startup. DCs, Windows clients and Servers rely on the Network Location Awareness (NLA) service to determine which firewall profile (Domain, Private, or Public) to assign, based on whether it can reach a domain controller.
However, if IPSec is in Require mode on the DC and the client hasn't yet established an IPSec Security Association (SA), all unauthenticated attempts to reach domain services (like LDAP or Kerberos) are silently dropped. This creates a loop: The client can't authenticate because it hasn't established IPSec yet. It can't establish IPSec because the DC won't respond to unauthenticated traffic. As a result, the client falls back to the Public profile, blocking IPSec negotiation as it is not enabled for IPSec. To get around this catch-22, the following IPSec Exemption Mode rules are required to allow unencrypted network communication at system startup. While this introduces gaps in the IPSec enforcement policy, the majority of ports remain configured with the 'Require' setting. Critically, the two most commonly targeted services, SMB (445) and RDP (3389), are still enforced, mitigating the highest-risk vectors. Note - Don't think setting Request for these ports is an option.

| Service | Protocol/Port | Applies to |
| --- | --- | --- |
| DHCP | UDP 67, 68 | Clients, Servers |
| DNS | UDP/TCP 53 | DCs, Clients, Servers |
| SMB/CIFS | TCP 445 | DCs, Clients, Servers |
| Kerberos | UDP/TCP 88, 464 | DCs, Clients, Servers |
| LDAP | TCP 389, 636 | DCs, Clients, Servers |
| ESP (IPsec payload) | IP Protocol 50 | DCs, Clients, Servers |
| IKEv1 and IKEv2 | UDP 500 | DCs, Clients, Servers |

DC Exemptions

The domain controller exemptions permit Endpoint 1 to initiate connections from any source port to specific service ports (e.g., TCP/UDP 88 for Kerberos), and allow Endpoint 2 to send traffic from any source port to the same service ports on remote systems.

Server and Client Exemptions

Clients and servers initiating connections to a remote DC service port require Endpoint 2 to allow traffic from any source port to a specific destination port. Note: Both IPSec and exemption rules are initially configured with an open Any/Any endpoint setting. This approach makes life a little easier when investigating connectivity issues.
Once the environment is fully configured, tested and stable, a remediation step will be carried out to harden the policies to allow named services and subnets.

Firewalls.... What the.......

I did warn you. You may encounter a situation where enabling IPSec Request rules causes the Windows Firewall to enforce stricter behavior. Even previously reliable and well-established firewall rules may begin to block traffic unexpectedly, such as group policies failing to apply. Crucial! Firewalls, particularly the Windows Firewall, will be the most significant source of pain during implementation. While pfSense may present occasional challenges, the majority of connectivity and policy enforcement problems were from Windows. Ensure that Firewall logging is enabled and reporting correctly.

Test the Request Policy:

Update the policy on the DCs, Member Servers and Clients with gpupdate /force.

Windows Firewall - wf.msc

To confirm that IPSec is functioning correctly in Request mode, start by opening wf.msc (Windows Defender Firewall with Advanced Security). Under the Monitoring section, navigate to Security Associations > Main Mode and Quick Mode to view active IPSec tunnels. Clicking on Connection Security Rules shows the accumulation of policies that are actively being applied.

Main Mode
Quick Mode

Event Logs

Open Event Viewer (eventvwr). Paste the following event IDs and filter for IPSec events; remove the chaff: 5440, 5441, 5442, 5450, 5451, 5452, 5453, 5454, 5455, 5456, 5457, 5458, 5459. Review the logs.

That's the first step in implementing IPSec using Request mode, which carries minimal risk of service disruption. However, domain-joined machines, including domain controllers, will still accept traffic from devices that aren't using IPSec. This includes any unmanaged or potentially insecure IoT devices. Now it gets serious.....
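A small aid for the Event Viewer filtering above: the same ID list can be saved once as a custom XML query instead of being pasted each time. A sketch, assuming these audit events land in the standard Security log:

```
<QueryList>
  <Query Id="0" Path="Security">
    <!-- IPSec Main Mode / Quick Mode negotiation events listed above -->
    <Select Path="Security">
      *[System[(EventID &gt;= 5440 and EventID &lt;= 5442) or
               (EventID &gt;= 5450 and EventID &lt;= 5459)]]
    </Select>
  </Query>
</QueryList>
```

In Event Viewer, choose Filter Current Log > XML > Edit query manually and paste it in; saving it as a Custom View keeps it available for the Require-mode testing that follows.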
Now for the Really Scary Stuff .....IPSec Required

Implementing Require IPSec policies enforces strict authentication and encryption; all inbound and outbound traffic will be encrypted unless there's an exemption rule. With any misconfiguration of policy or certificates, expect a service outage. Thoroughly test all scenarios in a controlled environment before applying Require mode. You assume full responsibility for any service impact, and proceed at your own risk. Be prepared for some frustration, check the DC firewall rules, it may not be IPSec, and good luck!!!!

The Plan

Now that I've provided the appropriate warning and my conscience is clear, it's time to implement Require IPSec rules...gradually. Given the above warning, switching from Request to Require without a staged rollout isn't the smartest move. Request mode provides the necessary failsafe; it plays nicely and maintains connectivity during deployment, something Require mode does not. Even if IPSec tunnels appear to negotiate successfully in testing, enforcing Require can break communication with systems that are slightly off-kilter. Make sure you have physical access to the system and choose your test targets carefully. There's a recovery mechanism, hacking the Registry, if things go sideways. Let's avoid dragging out the crash cart if we can help it. Make sure every client, server and domain controller is on and their policies are up to date prior to proceeding. My target is a DC; I've plenty of them, easy to replace, minimal chance of data loss, and it helps being able to unpick the GPO and have it apply locally to that DC. Don't pick a DC with a FSMO role, especially the PDC emulator.

Required GPO:

Connect Group Policy Management to the target DC. This helps revert the GPO if network connectivity is lost. Create a new GPO, name it GPO_IPSec_Require. Remove Authenticated Users from the Security Filtering.
Add the nominated DC, in my case TOYODC19-3. This filters the GPO to the named DC and no others.

Connection Security Rules:

Edit the GPO and navigate to the Connection Security Rules. Create a new Connection Security Rule. Using the previous procedure, match the settings except for the Requirement page: select 'Require authentication for inbound and outbound connections'.

Post Require Checks:

Run gpupdate on the DC. If the stars align, and so do the certificates, CRL and policy, the more restrictive policy takes precedence and Required Mode is now in operation. Congratulations. Verify access to other domain resources and network shares, testing as many connections as possible. Open wf.msc and confirm entries in Main and Quick Mode.

Staged Deployment:

This is a gradual deployment; only fools rush in....and they're usually the ones pulling an all-nighter trying to fix their mess. Link the 'GPO_IPSec_Require' GPO to the Servers and Workstations OUs. Add additional computer objects to the Security Filtering, ensuring both types of objects, those with and those without a TPM. Validate connectivity after a gpupdate. The extent of testing and validation depends on your risk acceptance level. If you're confident everything's working as expected, update the existing Request IPSec GPOs to Require. Once that's done, you can safely remove the temporary GPO_IPSec_Require; it's served its purpose.

Whoa there, Mine's not Working....

Another fine mess...the network isn't connecting because IPSec can't establish a secure tunnel. When in "Require" mode, no tunnel means no traffic, so you're stuck until it's fixed. Start by checking the Security Event IDs like 5450, 5451, or 5442. These will point to why the IPSec negotiation failed, whether it's a certificate issue, authentication error, or policy mismatch. Then, open wf.msc and look under Monitoring > Security Associations for Main Mode and Quick Mode.
If there are no active associations, that confirms the tunnels have either collapsed or never got created in the first place. If you can't identify the problem right away, temporarily switch GPO_IPSec_Require back to "Request". This alone is unlikely to restore connectivity: if you weren't already on the DC with GPO Management open, update the policy there and run gpupdate. With connectivity restored, resolve any issues highlighted in the Event logs, then try Required again. If the Event logs provide no clues, verify that both the separate authentication certificate and the IPSec certificate are properly enrolled; combining them into a single certificate will break IPSec. Thoroughly check all GPO IPSec settings for any misconfigurations. Request mode lets connections pass whether or not IPSec succeeds, while Require mode blocks anything that doesn't meet IPSec's strict criteria, which is why Request often "just works" while Require can fail if the environment isn't perfectly configured.

No Connectivity, no GPO...

Without connectivity, there is little chance of any GPO ever reapplying and undoing the Require. There is one solution: delete the Registry hive for the GPO settings.
Open Regedt32.
Browse to HKLM:\Software\Policies\Microsoft\WindowsFirewall.
Delete WindowsFirewall.
Reboot.

Another Step in the Journey to Zero Trust is Over

After questioning my sanity, and whether I truly wanted to go through with enabling IPsec, it's finally in place. It wasn't without its challenges; enabling "Required" led to a fair bit of pain, including network profiles switching from Domain to Public, leading to denial of services and some other random dropped network connections. Still, I think (still not entirely sure) the effort was worth it. I'm now at least one step closer to Zero Trust, and what I'm convinced will be the world's most secure home lab.
Related Posts: Part 1  - Zero Trust Introduction Part 2  - VLAN Tagging and Firewalls with pfSense Part 3 - pfSense and 802.1x Part 4 - IPSec for the Windows Domain Part 5  - AD Delegation and Separation of Duties Part 6  - Yubikey and Domain Smartcard Authentication Setup Part 7  - IPSec between Windows Domain and Linux using Certs

  • Zero Trust for the Home Lab - Yubikey and Domain Smartcard Authentication Setup (Part 6)

The Road to the World's Most Secure Home Lab.... So far in the pursuit of the World's most secure home lab, the following have been implemented: Related Posts: Part 1  - Zero Trust Introduction Part 2  - VLAN Tagging and Firewalls with pfSense Part 3 - pfSense and 802.1x Part 4 - IPSec Part 5  - AD Delegation and Separation of Duties What's Covered in this Blog This post covers YubiKey smart card authentication and how to implement it with a Windows Enterprise CA. What Is Zero Trust - Recap Zero Trust is a security framework that assumes no user, device, or network segment is inherently trustworthy, regardless of where it sits in the network. The core principles include: Verify explicitly – Always authenticate and authorize access. Use least privilege access – Limit access to only what's needed. Assume breach – Design as if attackers are already in the network. How Smart Cards Address Zero Trust Security Zero Trust is built on the principle of “never trust, always verify.” It requires strict identity verification, least-privilege access, and continuous authentication. Strong Identity Verification: Smartcards use embedded chips to store cryptographic keys securely. They require something you have (the card) and something you know (a PIN), making them ideal for strong, multi-factor authentication. Credential Protection: Because authentication happens on the card itself, sensitive credentials are never exposed to the device, reducing the risk of phishing, keylogging, or malware-based theft. How YubiKey Functions as a SmartCard YubiKey devices support smart card functionality through their PIV (Personal Identity Verification) capability, which implements the NIST SP 800-73 standard. This allows organizations to use YubiKeys for authentication, signing, and encryption in enterprise environments. The PIV applet on the YubiKey securely stores cryptographic keys and certificates, enabling seamless authentication in Windows Active Directory domains. 
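Once a YubiKey has been provisioned (covered below), Windows' built-in tooling can confirm it is being presented as a smart card. A quick check, assuming the YubiKey drivers are already installed:

```powershell
# Lists smart card readers and the certificates Windows can see on the card.
# Useful for confirming the YubiKey is recognised before attempting a logon.
certutil -scinfo
```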
Yubikey Core SmartCard Functionality A YubiKey operates as a hardware security module that: Stores private keys securely in tamper-resistant hardware Performs cryptographic operations internally (signing, decryption) Prevents private key material from ever leaving the device Key Technical Components Secure Element: A dedicated cryptographic processor with: Protected memory for storing private keys and certificates Hardware-based random number generation Tamper-resistant design to prevent physical attacks Certificate Storage Architecture YubiKeys store certificates in a structured slot system: 24 Total Storage Slots: Slots 9a, 9c, 9d, and 9e are the primary slots used for certificates Each slot can store one certificate/key pair Slot 9a: Authentication (typically used for workstation login) Slot 9c: Digital Signature Slot 9d: Key Management (encryption/decryption) Slot 9e: Card Authentication YubiKey Smart Card Implementation To enable smartcard authentication in a Domain, we’ll need to configure Group Policy and create a smart card certificate template. GPO Settings Enable the following 3 settings under Computer Configuration > Admin Templates > Windows Components > Smart Card: Allow certificate with no extended key usage certificate attributes Allow ECC certificates to be used for logon and authentication Turn on Smart Card Plug and Play Service Note: When the certificate employs Elliptic Curve keys, 'Allow ECC certificates to be used for logon and authentication' must be enabled. YubiKey Smartcards Software YubiKey functionality requires the following: Drivers: Any system where the smartcard is used must have the appropriate drivers installed for the YubiKey to be recognized. Management Software: The YubiKey Manager software is needed to configure the devices and set users’ smartcard PINs. 
Download: The software can be downloaded from this link https://www.yubico.com/support/download YubiKey Manager To configure a YubiKey, open the Manager application and insert the first YubiKey. Navigate to Interfaces. Deselect all USB types other than PIV. To set the user's PIN (the PIN the user enters at logon), navigate to Applications, select PIV > PIN Management > Configure PIN. Select 'Use Default' or enter 123456 as the current PIN, then enter a new PIN. Smartcard Certificate Template Log in to the Certificate Authority (CA) server using an account with Domain Admin rights or delegated CA Manager permissions. At the Run prompt, enter certsrv.msc, navigate to Certificate Templates, right-click, and select Manage. According to Yubico's guidance, you can use the Smartcard Logon template for deployment, and this is my intention. However, when the User certificate template is already issued and the Smartcard Logon template is later deployed for enrollment, testing has shown that the overlapping certificate purposes can lead to authentication failures, with confusion over which certificate is presented during smartcard logon. Duplicate the Smartcard Logon template. General Tab: Publish certificate in Active Directory Do not automatically reenroll if a duplicate certificate exists in Active Directory Compatibility tab: Windows Server 2016 Windows 10 / Windows Server 2016 Request Handling tab: Include symmetric algorithms allowed by this subject For automatic renewal of smart card certificates, use the existing key if a new key cannot be created. Prompt the user during enrollment. Cryptography tab: Provider Category: Key Storage Provider Algorithm Name: ECDH_P384 (ECDH_P521 isn't supported) Requests must use one of the following providers: Microsoft Smart Card Key Storage Provider Request hash: SHA256 The table below compares the equivalent security levels of Rivest–Shamir–Adleman (RSA) and Elliptic Curve Cryptography (ECC). 
It highlights how ECC achieves the same level of security with significantly shorter key lengths and lower computational overhead. As ECC key sizes grow modestly, the corresponding RSA key sizes increase disproportionately.

RSA (bits) | ECC (bits) | Ratio (RSA/ECC)
512        | 112        | 4.6
1024       | 160        | 6.4
2048       | 224        | 9.1
3072       | 256        | 12.1
7680       | 384        | 20
15360      | 512        | 30

Security tab: Add Authenticated Users and Autoenroll. A named AD group could be used instead for a more targeted enrollment. Subject Name tab: User principal name (UPN) is enabled. E-mail name is unchecked. Alternatively, add the email address to the user's AD attributes instead of unchecking it; enrollment will fail if E-mail name is enabled and no address is provided. Smartcard Enrollment Once the YubiKey drivers are installed on the client machine, the user can enroll for a Smartcard certificate. Enrollment: The user opens 'certmgr.msc' Navigate to Personal > Certificates Right click on Certificates > All Tasks > Request New Certificate Select Active Directory Select the Smartcard template and enroll. During the enrollment process, when prompted, enter the PIN that was configured earlier during YubiKey setup. The YubiKey smartcard is now configured for User logon. Smart Card Misconceptions and Important Next Steps Windows password authentication is vulnerable to brute force, dictionary, guessing, and phishing attacks. Smartcards significantly reduce these risks. Although commonly thought to eliminate passwords entirely, that's not entirely accurate. When a user account is configured to require smartcard authentication, the password is reset one time to a random 120 character string. Failing to set the "Smart card is required for interactive logon" flag leaves the user's existing password unchanged, allowing them to continue logging in with their original, potentially insecure credentials. 
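Setting that flag can be done per user with the ActiveDirectory module; a minimal sketch, where the account name 'jsmith' is a placeholder:

```powershell
# Requires the RSAT ActiveDirectory module. Setting SmartcardLogonRequired to
# true triggers the one-time reset of the password to a long random value.
Import-Module ActiveDirectory

Set-ADUser -Identity 'jsmith' -SmartcardLogonRequired $true   # placeholder account

# Confirm the flag took effect
Get-ADUser -Identity 'jsmith' -Properties SmartcardLogonRequired |
    Select-Object SamAccountName, SmartcardLogonRequired
```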
That Password is still there for SSO: During the AS_REP stage of Kerberos authentication, the Key Distribution Center (KDC) includes the NTLM hash of this password in the PAC to support fallback to SSO when Kerberos is unavailable. The random password is highly resistant to offline cracking, even against well-resourced attacks using tools like John the Ripper. The User password is not so resistant. Pass the Hash Enabling smartcard authentication still leaves user accounts exposed to Pass-the-Hash attacks. As previously stated, when 'Smart card is required for interactive logon' is set, the user’s password is reset to a long, random value, but it remains static indefinitely. Without regular password rotation, the account stays vulnerable to Pass-the-Hash. To mitigate this risk, use the script below to refresh user passwords daily.

$scTrue = Get-ADUser -Filter * -Properties SmartcardLogonRequired |
    Where-Object { $_.SmartcardLogonRequired -eq $true }

foreach ($user in $scTrue) {
    # Toggling the flag off and on forces a fresh random password for the account
    Set-ADUser -Identity $user.DistinguishedName -SmartcardLogonRequired:$false
    Set-ADUser -Identity $user.DistinguishedName -SmartcardLogonRequired:$true
}

Important: Do not create a scheduled task on a Domain Controller running under a Domain Admin account to flip the smartcard attribute, this is reckless and a serious abuse of privileged credentials. The correct approach is to: Create a dedicated service account with delegated 'Write' permissions to the smartcard logon attribute on the Users OU. Assign the service account 'Log on as a batch job' rights on a hardened member server with the Active Directory tools installed. Create a scheduled task that runs daily, encoding the PowerShell script in Base64 and embedding it directly in the Task Scheduler's action tab. That's the Easy Part of the Zero Trust Completed.... The home lab’s in pretty good shape, certs everywhere, and a solid step toward Zero Trust. 
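Encoding the script for the scheduled task's action can be sketched as follows; the script path is illustrative, and note that PowerShell's -EncodedCommand switch expects Base64 over UTF-16LE (Unicode) bytes:

```powershell
# Encode the rotation script so it can be embedded directly in the task's action.
$script  = Get-Content -Raw 'C:\Scripts\Flip-SmartcardFlag.ps1'   # illustrative path
$encoded = [Convert]::ToBase64String([Text.Encoding]::Unicode.GetBytes($script))

# The resulting action string for Task Scheduler:
"powershell.exe -NoProfile -EncodedCommand $encoded"
```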
But let’s be honest, plastering certificates on every domain object alone just provides a false sense of security. There are still gaps, some can be secured, whilst others probably can't. Those Linux devices, the pfSense, the wifi AP and the printer all require my attention. The biggest challenge is monitoring, it's resource-hungry and needs an enormous amount of effort to configure correctly. Despite the effort, monitoring will be a major feature of the Zero Trust architecture, giving me insight and visibility into the Home Lab. There's no rest for the wicked and even less for those trying to keep the wicked at bay... thanks, and I hope the content so far is proving insightful. Next up is IPSec for Linux, after a break and some sleep. Related Posts: Part 1  - Zero Trust Introduction Part 2  - VLAN Tagging and Firewalls with pfSense Part 3 - pfSense and 802.1x Part 4 - IPSec for the Windows Domain Part 5  - AD Delegation and Separation of Duties Part 6  - Yubikey and Domain Smartcard Authentication Setup Part 7  - IPSec between Windows Domain and Linux using Certs

  • Zero Trust for the Home Lab - AD Delegation and Separation of Duties (Part 5)

    The Road to the World's Most Secure Home Lab.... So far in the pursuit of the World's most secure home lab, the following have been implemented: Part 1  - Zero Trust Introduction Part 2  - VLAN Tagging and Firewalls with pfSense Part 3 - pfSense and 802.1x Part 4 - IPSec What's Covered in this Blog This post covers the implementation of a fully scripted AD OU structure and delegation model. What Is Zero Trust - Recap Zero Trust is a security framework that assumes no user, device, or network segment is inherently trustworthy, regardless of where it sits in the network. The core principles include: Verify explicitly – Always authenticate and authorize access. Use least privilege access – Limit access to only what's needed. Assume breach – Design as if attackers are already in the network. How AD Delegation and Separation of Duties Address Zero Trust Security Zero Trust relies on minimizing trust and continuously verifying access. Active Directory delegation and separation of duties help enforce this principle by: Delegating Permissions: Assigning only necessary privileges to users, ensuring least-privilege access for each role. Separation of Duties: Splitting administrative tasks across different users or teams to prevent any one person from having too much control, reducing risks and ensuring accountability. These practices ensure access is strictly controlled and monitored, forming a key part of a Zero Trust framework. Revisited Most of this blog is repurposed from a previous article, the good news is that the delegation model is fully scripted and downloadable: Part 1 , includes how to prep and run the script. Script , downloadable from github. You may notice the domain name has changed. This home lab uses a full delegation model that’s evolved organically over more than a decade, unscripted, functional and secure, it's configured with URA deny policies to keep separation between tiers. 
This scripted approach isn't necessarily better, but it's shareable, and the script can do in minutes what would take a week manually. Aim of the Game The objective is to establish an Organizational Unit (OU) structure that aligns with a clear and consistent delegation model. This approach incorporates well-defined naming standards to enhance comprehensibility and facilitate ease of navigation and management within the structure. AD Group Best Practice Group management will follow Microsoft's best practice: Domain Local groups are assigned against the object, eg an OU or GPO. The Domain Global group is then added as a 'Member' of the Domain Local group, and the user is added to the Domain Global group as a 'Member'. The naming convention I've persisted with over the years, again from Microsoft, is naming delegation groups 'Action Tasks', a task being an individual permission set, and 'Roles', a role being a collection of Tasks or individual permissions. AG is Action Task Global Group AL is Action Task Domain Local Group RG is a Role Global Group RL is a Role Domain Local Group Again, something that I've persisted with over the years is that Groups and OUs are named based on their Distinguished Name (DN). Let's break down an example of a group name: AG_RG_Member Servers_SCCM_Servers_ResGrpAdmin AG\AL\RG\RL - AG is Action Task Global, AL is Action Task Domain Local, RG is Role Global, RL is Role Domain Local RG\OU\GPO - Restricted Group, OU or GPO - the type of object being delegated Member Servers - The Top-Tier OU name SCCM - The Application or Service eg SCCM or Certificates Servers - It's for Computer objects ResGrpAdmin - ResGrpAdmin is a Restricted Group providing Admin privileges. ResGrpUser is a Restricted Group providing User privileges. CompMgmt, create\delete and modify Computer objects. UserMgmt, create\delete and modify User objects. GroupMgmt, create\delete and modify Group objects. GPOModify, edit GPO settings. SvcMgmt, create\delete and modify service account objects. FullCtrl, full control over OU's and any child objects. 
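The AGDLP nesting above can be sketched with the ActiveDirectory module. The group names follow the convention described, but the OU path is illustrative:

```powershell
# Sketch of the AGDLP pattern: the Domain Local group carries the permission,
# the Global group is nested inside it, and user accounts go into the Global group.
Import-Module ActiveDirectory

$tasksOU = 'OU=AD Tasks,OU=Admin Resources,DC=toyo,DC=loc'   # illustrative path

New-ADGroup -Name 'AL_RG_Member Servers_SCCM_Servers_ResGrpAdmin' `
    -GroupScope DomainLocal -Path $tasksOU
New-ADGroup -Name 'AG_RG_Member Servers_SCCM_Servers_ResGrpAdmin' `
    -GroupScope Global -Path $tasksOU

# Nest the Global group in the Domain Local group, then add the admin account
Add-ADGroupMember -Identity 'AL_RG_Member Servers_SCCM_Servers_ResGrpAdmin' `
    -Members 'AG_RG_Member Servers_SCCM_Servers_ResGrpAdmin'
Add-ADGroupMember -Identity 'AG_RG_Member Servers_SCCM_Servers_ResGrpAdmin' `
    -Members 'CertAdmin01'
```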
JSON OU Configuration Traditionally, there are only 3 tiers, the lower the tier the less trustworthy: Zero = Domain Controllers and CA's One = Member Servers Two = Clients and Users Given that this script can potentially generate numerous levels or hierarchies, it seemed more suitable to avoid the term "tier" and instead opt to label the top-level OU's as "Organisations" for a more meaningful representation. The JSON configuration provided creates an OU structure based on a default OU structure for many businesses, where Organisation1 is for Member Servers and Organisation2 is for Clients and Users. In addition, Organisation0 provides the Admin Resources OU for the management of all delegation, role and admin account provision. Organisation0 - Admin Resources Organisation0 creates a top-level management OU named 'Admin Resources'. This OU serves as the central hub for all delegation and management groups across subsequent Organisations. Each Organisation benefits from having its own dedicated management OU within the Admin Resources OU. Organisation-specific delegation groups, roles, and admin accounts are created. This approach allows for potential future delegation. Admin Accounts Member Servers Admin Tasks Member Servers Admin Roles Member Servers

"OU": {
  "Organisation0": {
    "Name":"Admin Resources",
    "Path":"Root",
    "Type":"Admin",
    "Protect":"false",
    "AdministrativeOU":"Administrative Resources",
    "AdministrativeResources": [
      "AD Roles,Group",
      "AD Tasks,Group",
      "Admin Accounts,User"
    ]
  },

Organisation1 - Member Servers Organisation1 represents the typical Member Server OU and is of the Type Server. The Server type designates a behavioural difference for assigning policy. AppResources designates application service OU's that will be created eg Exchange. ServiceResources is used for creating OU's based on a set of standard administrative functions, for example Servers with the delegation and object type of Computers. 
" Organisation1 ": { "Name":"Member Servers ", "Path":"Root", "Type":"Server", "Protect":"false", "AdministrativeOU":"Service Infrastructure", "AdministrativeResources": [ "AD Roles,Group", "AD Tasks,Group", "Admin Accounts,User" ], "AppResources":"Certificates,MOSS,SCCM,SCOM,File Server,Exchange", "ServiceResources": [ "Servers,Computer", "Application Groups,Group", "Service Accounts,SvcAccts", "URA,Group" ] }, Organisation2 - Client Services Organisation2 represents the typical User Services OU and it's of the Type 'Clients'. " Organisation2 ": { "Name":"User Services", "Path":"Root", "Type":"Clients", "Protect":"false", "AdministrativeOU":"Service Infrastructure", "AdministrativeResources": [ "AD Roles,Group", "AD Tasks,Group", "Admin Accounts,User" ], "AppResources":"Clients", "ServiceResources": [ "Workstations,Computer", "Groups,Group", "Accounts,User", "URA,Group" ] } } Hundreds and thousands It's possible to add further top-level OU's by duplicating an Organisation, then updating the Organisation(*) and Name values as they need to be unique. It's possible to add hundreds or even thousands of Organisations, with this possibility in mind, the management and delegation structure reflects this within the design. Levels of OU Delegation As we delve deeper into the structure of each organization, we encounter a hierarchy consisting of three levels of delegation, using Member Servers as an example: Organisation = Member Servers (Level 1) Application Service = Certificates (Level 2) Resources = Computer, Groups, Users and Service Accounts (Level 3) OU delegation controls the level of access to manage objects eg create a Computer or Group object. Level 1 Level 1 is the organisation level in this case it's the Member Server OU. It's delegated with AL_OU_Member Servers_FullCtrl . The group provides full control over the OU, sub-OU's and all objects within. 
The arrow serves as an indicator, denoting the point at which the group's application takes effect within the structure. Level 2 Level 2 is the Service Application level, in this case Certificate services. AL_OU_Member Servers_Certificates_FullCtrl is applied a level below Member Servers and provides full control over itself and any subsequent objects. Level 3 At Level 3, the delegation involves the management of Service Application resources, which includes items such as Server objects and service accounts. The 4 default OU's allow the delegation and management of their respective resource types, for example, the Application Groups OU permits the creation and deletion of Group objects via AL_OU_Member Servers_Certificates_Application Groups_GroupMgmt. Application Groups - Application specific Groups Servers - Server or Computer objects Service Accounts - Service Accounts for running the application services URA - User Rights Assignments for services that require LogonAsAService etc Restricted Groups and User Rights Assignment (URA) Levels In this delegated model, Restricted Groups facilitate access by allowing administrative access whilst User Rights Assignments (URA) allow admins or users to log on over Remote Desktop Protocol (RDP). There are two primary levels of organisation. The first level encompasses the entire organisation, including all subsequent Organizational Units (OUs). The second level consists of a dedicated Servers OU for each specific Service Application. Level 1 of Restricted Groups The GPO GPO_Member Server_RestrictedGroups is linked to the Member Servers OU and has the following groups assigned: URA: Allow log on through Terminal Services: AL_RG_Member Servers_ResGrpAdmin AL_RG_Member Servers_ResGrpUser Restricted Group: Administrators: AL_RG_Member Servers_ResGrpAdmin Remote Desktop Users: AL_RG_Member Servers_ResGrpUser This is how it looks when applied in GPO. 
Within this delegation model, the ability to manage Group Policy Object (GPO) settings is also delegated, via the AL_GPO_Member Servers_GPOModify group, to ensure comprehensive control and management of the environment. Level 2 of Restricted Groups The GPO GPO_Member Server_Certificates_Servers_RestrictedGroups is linked to the sub-OU Servers under Certificates and has the following groups assigned, those of the Organisation and of the Service Application: URA: Allow log on through Terminal Services: AL_RG_Member Servers_ResGrpAdmin AL_RG_Member Servers_ResGrpUser AL_RG_Member Servers_Certificates_ResGrpAdmin AL_RG_Member Servers_Certificates_ResGrpUser Restricted Group: Administrators: AL_RG_Member Servers_ResGrpAdmin AL_RG_Member Servers_Certificates_ResGrpAdmin Remote Desktop Users: AL_RG_Member Servers_ResGrpUser AL_RG_Member Servers_Certificates_ResGrpUser This is how it looks when applied in GPO. As above, Group Policy Object (GPO) settings are also delegated via AL_GPO_Member Servers_Certificates_Servers_GPOModify. Bringing it all together with Roles In this demonstration, an account named 'CertAdmin01' has been specifically created to oversee the management of resources within the Certificates OU. The account is added to the role group RG_OU_Member Servers_Certificates_AdminRole. Opening the RG_ group and then selecting the 'Member Of' tab displays the nested RL_ group. Drilling down into the RL_ group displays the individual delegated task groups. Delegated Admin To test the certificate Admin (CertAdmin01), deploy an additional server, adding it to the domain and ensuring the computer object is in the Certificate Servers OU. Login as CertAdmin01 to the new member server and install the GPO Management and AD Tools. Browse to Member Servers and then the Certificates OU and complete the following tests: Right-click on Application Groups > New > Group Right-click on Servers > New > Computer Right-click on Service Accounts > New > User Right-click on URA > New > Group. 
Open Group Policy Management and edit GPO_Member Servers_Certificates_Servers_RestrictedGroups. Open Compmgmt.msc and confirm that the Administrators group contains the 2 _ResGrpAdmin groups and the local Administrator. AL_RG_Member Servers_Certificates_Servers_ResGrpAdmin AL_RG_Member Servers_ResGrpAdmin Confirm that CertAdmin01 is unable to create or manage any object outside the delegated OU's. Nearly there.....SCM Policies and ADMX Files As part of the delivery and configuration of the OU structure, Microsoft's Security Compliance Manager (SCM) GPOs and a collection of Administrative (ADMX) templates are included. SCM GPOs: Microsoft's SCM offers a set of pre-configured GPOs that are designed to enhance the security and compliance of Windows systems. These GPOs contain security settings, audit policies, and other configurations that align with industry best practices and Microsoft's security recommendations. ADMX Templates: ADMX files, also known as Administrative Template files, extend functionality within Group Policy Management, enabling settings for Microsoft and 3rd party applications. Within a Domain, ADMX files are copied to the PolicyDefinitions directory within Sysvol. Zipped... Both SCM and ADMX files are zipped and will automatically be uncompressed during the OU deployment. However, if you would like to add your own policies and ADMX files you can. SCM Policy Placement The SCM policies are delivered in their default configuration, without any modifications or merging. The policies are placed directly into the designated target directory, imported and linked to their respective OU. For example, the Member Server directory content will be linked to any OU that is of type 'Server'. The SCM imported policies are prefixed with 'MSFT', indicating that they are Microsoft-provided policies. There are a substantial number of these policies linked from the root of the domain down to client and server-specific policies. 
As far as delegation goes, the SCM policies remain under the jurisdiction of the Domain Admin, with control to effect change delegated to the _'RestrictedGroup' policies. One More Step in the Windows Journey to Zero Trust Remains You’ve successfully kicked your Domain Admins off the workstations, and they didn't even see it coming, well done. There's one final Microsoft hoop to jump through, and you thought I had finished with certificates: Next up is Smartcard authentication. Related Posts: Part 1  - Zero Trust Introduction Part 2  - VLAN Tagging and Firewalls with pfSense Part 3 - pfSense and 802.1x Part 4 - IPSec for the Windows Domain Part 5  - AD Delegation and Separation of Duties Part 6  - Yubikey and Domain Smartcard Authentication Setup Part 7  - IPSec between Windows Domain and Linux using Certs

  • Zero Trust for the Home Lab - Radius and 802.1x (Part 3)

    The Road to the World's Most Secure Home Lab.... So far in the pursuit of the World's most secure home lab, the following have been implemented: Part 1  - Zero Trust Introduction Part 2  - VLAN Tagging and Firewalls with pfSense What's Covered in this Blog This post explains how to implement EAP-TLS 802.1X authentication using FreeRADIUS alongside a Windows Enterprise CA for domain joined clients. What Is 802.1X and What Authentication Types Can Be Used? IEEE 802.1X is a network access control standard that enforces authentication before allowing devices to connect to a wired or wireless network, preventing unauthorized access to network resources. 802.1X works by involving three components: Supplicant: The device trying to connect (e.g., a laptop) Authenticator: The network device (e.g., switch or wireless access point) Authentication Server: Typically a RADIUS server that verifies credentials (pfSense) What Is Zero Trust - Recap Zero Trust is a security framework that assumes no user, device, or network segment is inherently trustworthy, regardless of where it sits in the network. The core principles include: Verify explicitly – Always authenticate and authorize access. Use least privilege access – Limit access to only what's needed. Assume breach – Design as if attackers are already in the network. How 802.1x Addresses Zero Trust Security Device and User Authentication at the Network Edge: 802.1x enforces authentication before a device can even communicate on the network. By validating the identity of both users and devices before granting access, it ensures that only trusted entities can join the network. Dynamic Access Based on Identity: After successful authentication, network access can be dynamically assigned based on user roles or device attributes, such as through VLAN tagging or access control lists. This supports Zero Trust’s principle of least-privilege access and helps isolate systems by trust level. 
Continuous Monitoring and Enforcement: When integrated with a Network Access Control (NAC) system, 802.1X allows ongoing assessment of device posture and compliance, with the ability to quarantine or restrict access if the device falls out of policy. Microsoft’s native NAC solution, Network Access Protection (NAP), has been deprecated and is no longer available. pfSense does not support full NAC functionality; third-party solutions such as PacketFence, Aruba ClearPass, or Cisco ISE are required to fulfill this role. Segmentation and Isolation: 802.1x pairs well with network segmentation strategies. Devices that fail authentication or are unknown can be automatically placed into restricted VLANs, such as guest or quarantine zones. This limits exposure and aligns with Zero Trust’s goal of minimizing lateral movement. Authentication Options MAC Address Authentication For devices that can't run 802.1X (like printers or IP phones), the switch can authenticate based on the device's MAC address. This is the least secure option and is typically used as a fallback method. PEAP with Active Directory (Username/Password-Based Authentication) PEAP (Protected Extensible Authentication Protocol) creates a secure TLS tunnel to protect the authentication exchange. Inside that tunnel, it commonly uses MSCHAPv2, which authenticates users via their Active Directory username and password. While this provides a basic layer of security, MSCHAPv2 has known vulnerabilities. If an attacker manages to capture the encrypted challenge, they can potentially crack it using brute-force or dictionary attacks. The effectiveness of this method ultimately depends on strong password hygiene and the security of your Active Directory environment, which can be a weak link in many setups. EAP-TLS with Certificates (Certificate-Based Mutual Authentication) EAP-TLS, on the other hand, uses digital certificates for both the client and the server to perform mutual authentication. 
Both sides must present valid certificates, creating a highly secure, trust-based environment. Since no passwords are exchanged, this method is immune to common credential-based attacks like brute-force, phishing, or credential stuffing. It also offers strong cryptographic protections, making it highly resistant to replay and man-in-the-middle (MiTM) attacks. Why I'm Choosing EAP-TLS Given the significantly stronger security model and elimination of password-based vulnerabilities, the obvious choice for my environment is EAP-TLS. It provides robust, certificate-based authentication. 802.1X and RADIUS Configuration Okay, let's set up 802.1X authentication on the pfSense using FreeRADIUS and an Enterprise Certificate Authority (CA). This is a long one, better grab that coffee now... Configure the Microsoft Enterprise CA On the CA server, open the Certification Authority console (certsrv.msc). Create a RADIUS Server Certificate Template:  Right-click Certificate Templates and select Manage. Right-click the RAS and IAS Server template (or Workstation Authentication as a base) and select Duplicate Template.   Compatibility Tab:  Set compatibility levels to Windows Server 2016 General Tab:  Template display name: pfSense RADIUS Server. Validity period: 2 years Do not check 'Publish certificate in Active Directory.' Request Handling Tab:  Purpose: Signature and encryption. Do not allow the private key to be exported, ensure it is unchecked. Cryptography Tab:  Provider Category: Key Storage Provider Algorithm Name: ECDH_P384 Minimum key size: 384 Request hash: SHA384 Subject Name Tab:  Select Supply in the request. pfSense will generate the necessary information based on your inputs. Click OK on the warning popup. Extensions Tab:  Select Application Policies and click Edit.... Remove Client Authentication, if present. Make sure the Server Authentication is present > Click OK. Select Key Usage and click Edit.... Ensure the Digital signature is checked. 
Ensure 'Allow key exchange only with key encipherment (key encipherment)' is checked. Ensure 'Make this extension critical' is unchecked. Security Tab:  Ensure Authenticated Users have Read permission. Create an AD group named 'RG_CA_pfSense_Radius_Req' and assign it Full Control. Add the account that will be used to create and enroll the Radius certificates. Click Apply and OK. Publish the New Template:  In the Certification Authority console, right-click Certificate Templates, select New, then Certificate Template to Issue.   Select your newly created pfSense RADIUS Server Template.  Configure pfSense & Generate CSR Install FreeRADIUS Package: Log in to your pfSense web interface. Navigate to System > Package Manager > Available Packages. Search for 'radius'. Click Install and confirm. The installation will take a while. Import the CA Certificate: From the CA, export the Root CA certificate: Right-click your CA name > Properties. On the General tab, click View Certificate. Go to the Details tab, click Copy to File.... Export Next > Select Base-64 encoded X.509 Next > Choose a filename > pfSenseRootCA.cer On pfSense: Navigate to System > Certificates > Authorities Click Add. Descriptive name: Enter something meaningful Method: Import an existing Certificate Authority. Certificate data: Paste the entire content of pfSenseRootCA.cer Click Save. Generate Certificate Signing Request (CSR) on pfSense:  Navigate to System > Certificates > Certificates. Click Add/Sign. Add/Sign a New Certificate: Method: Create a Certificate Signing Request. Descriptive name: Toyo pfSense Radius Server. External Signing Request Key type: ECDSA Key length: secp384r1 [HTTPS] - match the template Digest Algorithm: sha384 - match the template Common Name (CN):  radius.toyo.loc Crucial!  This MUST be the Fully Qualified Domain Name (FQDN) or IP address of the RADIUS server. 
From the DNS console on the DC: Create an 'A Record' pointing to the LAN interface - pfsense.toyo.loc @ 192.168.0.1. Create an ALIAS (CNAME) named 'radius' that resolves to pfsense.toyo.loc. 
Country Code: GB. State or Province: Hook. City or Locality: Hants. Organization: Tenaka.net. Organizational Unit: IT Department. Home Lab 
Certificate Attributes:  Certificate Type: Server Certificate. 
Alternative Names - add a SAN row for each of the following, covering the pfSense LAN address 192.168.0.1: 
FQDN or Hostname: radius.toyo.loc 
FQDN or Hostname: radius 
IP Address: 192.168.0.1 
FQDN or Hostname: pfsense 
FQDN or Hostname: pfsense.toyo.loc 
Export the CSR 
Export the newly created CSR by clicking on the arrow with the door icon. 
Issue the Certificate using the CSR 
Ensure your account has admin rights and is a member of the 'RG_CA_pfSense_Radius_Req' group, then open CMD or PowerShell. Run the command: certreq -submit -attrib "CertificateTemplate:pfSenseRADIUSServer" "C:\cert\pfSense+CSR.req" 
pfSense+CSR.req is the CSR previously downloaded. CertificateTemplate:pfSenseRADIUSServer is the template name, not the display name. 
A dialog will pop up asking you to select the CA; choose your issuing CA and click OK. Another dialog will ask where to save the issued certificate (.cer). Choose a location and a filename of 'pfSense Radius Cert.cer'. Click Save.   
Import Issued Certificate into pfSense 
Navigate to System > Cert. Manager > Certificates. Find the CSR entry you created earlier (Toyo pfSense Radius Server). Click the Edit icon (pencil). Open the issued certificate file 'pfSense Radius Cert.cer' with Notepad. Copy the entire content (including the BEGIN/END lines). Paste this content into the Final Certificate data field. Add a description and click Update. The entry should now change from a CSR to a valid certificate, showing its issuer and validity dates. 
Certificate Revocation List (CRL) 
You have 7 days... Crucial!   
When certificate revocation checking is enabled, a valid CRL must be configured to verify whether certificates have been revoked. If revocation checking is enabled but the CRL is unavailable, clients will be unable to confirm their certificate status and authentication will fail. 
Crucial!  In EAP-TLS, both the client and pfSense validate certificates. A critical part of this validation is checking whether a certificate has been revoked, which is done using the CRL. If the CRL has expired or is not available, pfSense will consequently fail authentication for the client. The default expiry is 7 days. 
With the current pfSense configuration, CRL updates are handled manually, and the CRL expires every 7 days. This is sub-optimal: if the CRL isn't updated on time, client authentication will fail across the board, locking users out of the Wi-Fi. Automating this is on the backlog... honest. The CRL validity is being extended to 6 months, which isn't ideal. The key caveat is that any time a certificate is revoked, the CRL on pfSense must also be updated to reflect the change. 
CRL Publishing Parameters: 
On the CA, right-click Revoked Certificates > Properties. Update the CRL publication interval to 6 months. Update Publish Delta CRLs to weekly. Publish the CRL with the new expiry date: right-click Revoked Certificates > All Tasks > Publish CRL. 
Export / Import 
The CRL will require exporting to a Base64 file and then pasting into pfSense. Copy the latest CRL from C:\Windows\System32\CertSrv\CertEnroll\ to C:\Certs and then convert it to Base64: 
certutil -encode "c:\certs\TOYO-TOYO01-CA(3)+.crl" c:\certs\crl_base64.txt 
Navigate to System > Cert. Manager > Certificates > Revocation. On the drop-down, select the CA (Toyo CA) and click Add. Select 'Import an existing Certificate Revocation List'. Add a descriptive name, copy and paste the crl_base64.txt content, and Save. There should now be a CRL entry. 
Configure FreeRADIUS on pfSense 
Navigate to Services > FreeRADIUS. 
Interfaces Tab:  Click Add. 
Interface IP Address: Select the pfSense LAN IP address 192.168.0.1. Port: 1812 (standard RADIUS authentication port). Interface Type: Authentication. IP Version: IPv4. Description: LAN Radius Authentication. Click Save. 
Click Add again. Interface IP Address: Select the same pfSense IP address. Port: 1813 (standard RADIUS accounting port). Interface Type: Accounting. IP Version: IPv4. Description: LAN Radius Accounting. Click Save. 
NAS / Clients Tab:  
This is where you define the switches or wireless access points that will forward authentication requests to pfSense. Click Add. Client IP Address: the Zyxel Wi-Fi AP is @ 192.168.0.253. Client IP Version: IPv4. Client Shared Secret: some ridiculously long password. You must configure the same secret on your switch/AP (the Zyxel Wi-Fi AP), and ensure the secret conforms to best practice. Client Shortname: Enter the hostname of the AP. Add other switches, routers, and AP devices as needed. Click Save. 
EAP Tab:  
The EAP tab configures authentication methods like EAP-TLS or PEAP used for secure 802.1X network access. Default EAP Type: TLS. Disable Weak EAP Types: checked (this disables the weak MD5 and GTC types). Minimum TLS version: 1.2. Certificates for TLS: on each of the drop-downs, select the relevant CA settings. 
Note: If the SSL Revocation List option is set and misconfigured, clients will fail to validate their certificates and won't be able to connect to the 802.1X SSID. It's possible to select None for testing purposes, but this is not a suitable option for production. 
EAP-TLS: Include Length: Yes. Fragment Size: 1024. Check Cert Issuer: Enable. CA Subject: Blank. Check Client Certificate: 
EAP-TLS Cache:  Leave the defaults. 
PEAP and TTLS: Leave other EAP types like PEAP and TTLS at their default values. Click Save. 
Settings Tab:  
Select the Settings tab. General Configuration: Leave the General Configuration settings at their default values. 
Logging Configuration: This is a personal preference; the RADIUS Logging Destination is updated to output to radius.log. All other settings remain default. Click Save. 
Configure Switch/AP (Zyxel Wi-Fi) 
The original Zyxel NWA50AX is another casualty of implementing Zero Trust; it turns out it doesn't support 802.1X or WPA2 Enterprise. So, after discovering that fun fact the hard way, I swapped it for an NWA130BE. £160 later, we finally have proper 802.1X support. 
SSID Profile Wizard: A new SSID was created, subtly named, of course, and initially configured with VLAN ID 1. While this configuration allowed clients to connect, it failed to automatically switch them to VLAN 40 and the Client interface. Since automatic VLAN switching wasn't functioning as expected, the VLAN ID was explicitly set to 40 within the SSID configuration. I guess I was asking too much for £160. 
Security Profile Wizard: Select WPA2 and Enterprise. Primary Radius Server Activated: Enable. Radius Server IP Address: 192.168.0.1. Radius Server Port: 1812 (authentication port, to match the port assigned on pfSense). It's time to pull out that shared secret set in the NAS/Clients section of pfSense earlier. Save. 
Configure Windows Clients 
The pfSense firewall will require a tweak to allow clients on VLAN 40 to access the Switch/AP. Navigate to Firewall > Rules > VLAN40_Clients. Add an 'allow' rule between the Zyxel Wi-Fi alias on 192.168.0.253 and the alias for clients. 
Basic CA and GPO Configuration:  Ensure every Windows domain client trusts the Windows Root CA, deploying it via GPO if necessary. Enable certificate auto-enrollment in Group Policy; certificate templates with the Autoenroll permission will then enroll automatically. 
Deploy Workstation Authentication Client Certificate - TPM Supported: 
It would almost feel wrong not to deploy more certificates. This time, the clients are getting the full treatment. 
Not all of my Windows clients and servers are blessed with a TPM; case in point, the Intel Skull Canyon and the Gen 6 Hyper-V host, which still cling to life with retirement long overdue. Without a TPM there's no support for the Microsoft Platform Crypto Provider. The fallback option when a TPM isn't available is to use the Microsoft Software Key Storage Provider, which is managed through Active Directory groups that control certificate enrollment. 
Create the following AD groups: 
For Computer objects that do not support TPM: RG_CA_WksAuthCert_Deny_TPM_Supt 
For Computer objects that do support TPM: RG_CA_WksAuthCert_Allow_TPM_Supt 
Workstation Authentication Template: 
Right-click Certificate Templates and select Manage. Right-click the Workstation Authentication template and select Duplicate Template. This assumes that the default Workstation Authentication or Computer templates are NOT deployed. 
General Tab: Template display name: Toyo Workstation Authentication. Validity period: 1 year. Renewal period: 6 weeks. Check 'Publish certificate in Active Directory.' 
Compatibility Tab:  Set compatibility levels to Windows Server 2016. 
Cryptography Tab:  Provider Category: Key Storage Provider. Algorithm Name: RSA. RSA is supported for key generation and storage in the TPM; ECC (Elliptic Curve Cryptography) isn't generally supported for TPM storage of certificates. Minimum key size: 2048. Request hash: SHA256. Requests must use one of the following providers: Microsoft Platform Crypto Provider. The Microsoft Platform Crypto Provider is the Key Storage Provider (KSP) that allows certificates and their private keys to be stored in the Trusted Platform Module (TPM). If no TPM is accessible, the certificate will fail to enroll. 
Subject Name Tab: Under 'Build from this Active Directory information': Subject name format: Common Name is selected. Include this information in alternate subject name: check DNS name.   
Extensions Tab:  Select Application Policies and click Edit.... 
Ensure that Client Authentication is present. 
Security Tab:  Ensure Domain Computers is removed. Add RG_CA_WksAuthCert_Allow_TPM_Supt and allow Read, Enroll, and Autoenroll. Click Apply and OK. 
Deploy Workstation Authentication Client Certificate - TPM is Not Supported: 
Toyo Workstation Authentication Template: Right-click Certificate Templates and select Manage. Right-click the Toyo Workstation Authentication template and select Duplicate Template. 
General Tab: Update the template display name to show that a TPM isn't supported. 
Cryptography Tab:  Provider Category: Key Storage Provider. Algorithm Name: RSA. Minimum key size: 2048. Request hash: SHA256. Requests can use any provider available on the subject's computer. 
Security Tab:  Ensure 'Domain Computers' is removed. Add RG_CA_WksAuthCert_Deny_TPM_Supt and allow Read, Enroll, and Autoenroll. Click Apply and OK. 
Publish the New Templates:  
In the Certification Authority console, right-click Certificate Templates, select New, then Certificate Template to Issue. Select your newly created Toyo Workstation Authentication templates. Restart the clients to automatically enroll the certificate, or run gpupdate /force. 
The Final Step (Honest) - Connect the Client to the Wi-Fi 
Confirm that the Toyo Workstation Authentication certificate has been enrolled on the clients: as an Administrator, run certlm.msc and confirm the certificate is present. After all that effort, the final step feels a bit anticlimactic; just select the 'Toyo-802.1X' Wi-Fi network and connect. No passwords required. Review the connection settings: 
GPO Settings 
For the Home Lab environment, these GPO settings may be somewhat excessive given there are only four domain-joined laptops. However, the GPO will enforce connection to the specified Wi-Fi access point and hide all other SSIDs from view. 
Create a new GPO and link it to the Domain workstations OU. Edit Computer Configuration > Policies > Windows Settings > Security Settings > Wireless Network (IEEE 802.11) Policies. 
Right-click and Create A New Wireless Network Policy for Windows Vista and Later Releases. 
General Tab: Update the policy name: 802.1x Toyo Wifi. Add a description. Click Add and select Infrastructure. 
Connection Tab: Update the profile name: Toyo 802.1X. Enter the Network Name(s) (SSID): Toyo-802.1X. Ensure that only 'Connect automatically when this network is in range' is selected. 
Security Tab: Authentication: WPA2-Enterprise. Encryption: AES-CCMP. Select a network authentication method: Microsoft Smartcard or other certificate. Authentication Method: Computer Authentication. Click OK. 
Network Permissions Tab: The following settings will hide all other SSIDs except those named in this GPO: Select Prevent connections to ad-hoc networks. Select Prevent connections to infrastructure networks. Uncheck Allow user to view denied networks. Select Only Group Policy profiles for allowed networks. 
Support Stuff 
Given the complexity that is now building within the network, it's not unexpected that there are going to be a few bumps along the way. The following section should help point you in the right direction. 
pfSense Logs 
Enable the FreeRADIUS logs by navigating to Services > FreeRADIUS > Settings. Personal choice: I prefer the radius.log output. Enable temporary SSH access by navigating to System > Advanced > Secure Shell and enabling Secure Shell Server. Add a firewall rule to allow TCP port 22, then Secure Shell into pfSense: 
ssh admin@pfsense.toyo.loc 
cat /var/log/radius.log 
When you're done, shut the door behind you: disable SSH and kill the firewall rule. 
Windows Client Logs 
When it comes to troubleshooting 802.1X on Windows, the built-in logging and diagnostic tools are the place to start. As an Admin, run the following netsh command: netsh wlan show wlanreport 
The netsh wlan show wlanreport command generates an HTML report that provides a detailed overview of recent Wi-Fi connection history, including connection successes, failures, signal quality, and reasons for disconnects. 
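Eyeballing the raw radius.log pulled over SSH gets old quickly. As a small triage aid, the authentication outcomes can be counted per identity. This is an illustrative Python sketch, assuming FreeRADIUS's default 'Auth' log lines ('Login OK' / 'Login incorrect'); the regex and the sample lines are mine, so adjust them to match what your radius.log actually emits:

```python
import re
from collections import Counter

# Illustrative parser for FreeRADIUS "Auth" lines in radius.log.
# Assumed line shape (verify against your own log output):
#   ... : Auth: (1) Login OK: [host/pc1.toyo.loc] (from client zyxel-ap port 0 ...)
AUTH_RE = re.compile(r"Auth:.*?(Login OK|Login incorrect): \[([^\]]+)\]")

def summarise(log_text: str) -> Counter:
    """Count (outcome, identity) pairs found in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = AUTH_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts

# Synthetic sample lines for demonstration only.
SAMPLE = (
    "Mon Jan  1 10:00:01 2024 : Auth: (1) Login OK: [host/pc1.toyo.loc] (from client zyxel-ap port 0)\n"
    "Mon Jan  1 10:00:02 2024 : Auth: (2) Login incorrect: [host/pc2.toyo.loc] (from client zyxel-ap port 0)\n"
    "Mon Jan  1 10:00:03 2024 : Info: unrelated line\n"
)

if __name__ == "__main__":
    for (outcome, identity), n in sorted(summarise(SAMPLE).items()):
        print(f"{outcome:16} {identity} x{n}")
```

Against the real file, something like summarise(open('/var/log/radius.log').read()) gives a quick view of which machines are failing before digging into individual lines.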
The following event logs offer a more traditional, readable output for analyzing connection issues. 
A Step to Zero Trust 
We made it... eventually. What a long post, and a whole lot of certificates. I'll keep it short from here. Thanks for sticking with it. Next up: IPsec. Don't miss it... everyone loves IPsec. 
Related Posts: 
Part 1 - Zero Trust Introduction 
Part 2 - VLAN Tagging and Firewalls with pfSense 
Part 3 - pfSense and 802.1x 
Part 4 - IPSec for the Windows Domain 
Part 5 - AD Delegation and Separation of Duties 
Part 6 - Yubikey and Domain Smartcard Authentication Setup 
Part 7 - IPSec between Windows Domain and Linux using Certs

  • Zero Trust for the Home Lab - VLAN Tagging and Firewalls with pfSense (Part 2)

What Is Zero Trust - Recap 
Zero Trust is a security framework that assumes no user, device, or network segment is inherently trustworthy, regardless of where it sits in the network. The core principles include: Verify explicitly – always authenticate and authorize access. Use least privilege access – limit access to only what's needed. Assume breach – design as if attackers are already in the network. 
What's Covered in this Blog 
This post covers implementing a pfSense Netgate 4200, a cheap PoE managed switch, VLANs and point-to-point firewalls. 
How Do VLANs and Firewalls Address Zero Trust? 
VLAN Tagging: Enforcing Logical Segmentation 
Virtual LANs (VLANs) break a physical network into multiple, isolated broadcast domains. Each VLAN behaves like a separate network, even if all devices are plugged into the same switch. With 802.1Q tagging, VLANs add a tag to Ethernet frames to denote which virtual segment the traffic belongs to. This enables: Separation of devices by function or trust level (e.g., IoT, guest, management, servers). Containment of potential breaches: malware on a smart TV can't reach your file server. 
Firewalls: Traffic Policy Enforcement 
Once VLANs are defined and routed, the firewall rules take over. Each VLAN has its own interface, letting you apply granular, interface-specific policies such as: blocking traffic between VLANs by default; only allowing explicit communication paths (e.g., IoT devices can talk to the internet but not to the LAN); logging and monitoring attempts to cross VLAN boundaries. 
Zero Trust for the Home Lab and pfSense 
This marks the first step in a series of technical changes to the home lab. As outlined in Part 1, the goal is to implement a Zero Trust model, or get as close to it as possible, while avoiding cloud dependencies and keeping costs to a minimum. The old Zyxel USG 60W device has finally been retired; it's out of support, with no more firmware updates. This is a basic Zero Trust requirement: keep it updated. 
Its replacement is a pfSense Netgate 4200. As discussed, one of the key changes is the introduction of 802.1Q VLAN tagging throughout the network. Rather than treating everything behind the firewall as implicitly trusted, I'm segmenting traffic by function: End User Devices, Infrastructure, DNS and IoT. Each VLAN gets its own virtual interface on pfSense, with point-to-point firewall rules and independent DHCP configurations. 
This rollout has downstream implications. For starters, the Windows Hyper-V hosts will be updated to support trunked VLANs on the switch, and the external Virtual Switch will be assigned the Server VLAN. Each virtual machine will map to a specific VLAN, ensuring that VMs are isolated according to their function and access needs. The pfSense management interface will be moved to a dedicated physical LAN port, isolated from all other LANs and VLANs by firewall rules. Additionally, the Pi-hole DNS servers and Domain Controllers, which previously sat on a flat LAN, will require some reconfiguration: DNS forwarders on the DCs will require re-pointing to the new Pi-hole IP addresses, and the DCs' Sites and Services entries will require updating. 
Setup and Initial Config 
With the initial setup complete, I've allocated the following: Port 1 is connected to the ISP's router. Port 2 to the NETGEAR 16-Port PoE GS316EP Managed Switch. Port 3 is reserved for the Management interface. 
Interfaces and VLANs Overview 
The physical interfaces are renamed to their correct designations of WAN, LAN and MGMT. Create the following VLANs and their tags under Interfaces > VLANs: 
Tag 1, the default tag for management, assigned to the LAN interface 
Tag 10 is for DNS\PiHoles 
Tag 20 is reserved for Domain Controllers 
Tag 30 is for Member Servers 
Tag 40 is for Domain Clients 
Tag 50 is for Domain Devices such as printers 
Tag 60 is reserved for SIEM and Monitoring 
Tag 100 is for any IoT 
Assign the newly created VLANs to the LAN interface, which is igc2. 
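As an aside, each of those tags travels on the wire as a 4-byte 802.1Q header inserted into the Ethernet frame: the TPID 0x8100, followed by a 16-bit TCI carrying the priority (PCP), the drop-eligible (DEI) bit and the 12-bit VLAN ID. A short Python sketch, purely illustrative and not something the pfSense config needs:

```python
import struct

def dot1q_tag(vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag: TPID 0x8100 followed by the TCI (PCP/DEI/VID)."""
    if not 0 <= vid <= 4095:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack(">HH", 0x8100, tci)  # both fields are big-endian on the wire

def vid_from_tag(tag: bytes) -> int:
    """Recover the VLAN ID from an 802.1Q tag."""
    tpid, tci = struct.unpack(">HH", tag)
    if tpid != 0x8100:
        raise ValueError("not an 802.1Q tag")
    return tci & 0x0FFF  # low 12 bits

if __name__ == "__main__":
    tag = dot1q_tag(40)        # the Domain Clients VLAN
    print(tag.hex())           # → 81000028
    print(vid_from_tag(tag))   # → 40
```

The switch and pfSense do exactly this insertion and stripping at trunk ports, which is why untagged devices further down never see the tag.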
KEA DHCP 
ISC DHCP is deprecated, despite being the default option, so swap it out for Kea: Go to System > Advanced > Networking. Under "DHCP Options," select "Kea DHCP" as the "Server Backend". 
DHCP Settings 
DHCP is to be enabled for each interface. Go to Services > DHCP Server. Under each interface, check "Enable DHCP on LAN Interface". The LAN interface will remain enabled to support devices that don't natively support VLAN tagging via their web management pages. Losing access to systems like the solar inverter would cause an unwelcome distraction. 
Note: There is no DHCP scope for the Domain Controllers on VLAN 20. 
LAN DHCP Scope: IP range 192.168.0.100 to 192.168.0.200. DNS points to the Internet: 1.1.1.1, 8.8.4.4, 8.8.8.8. 
MGMT DHCP Scope: IP range 192.168.99.100 to 192.168.99.200. DNS points to the PiHole servers: 192.168.10.70 and 192.168.10.71. 
DNS DHCP Scope (PiHole): IP range 192.168.10.100 to 192.168.10.200. DNS points to its own PiHole servers: 192.168.10.70 and 192.168.10.71. 
Member Servers DHCP Scope: IP range 192.168.30.100 to 192.168.30.200. DNS points to the Domain Controllers: 192.168.20.245, 192.168.20.247, 192.168.20.249. 
Domain Clients DHCP Scope: IP range 192.168.40.100 to 192.168.40.200. DNS points to the Domain Controllers: 192.168.20.245, 192.168.20.247, 192.168.20.249. 
Firewall Intro 
pfSense maintains a separate set of firewall rules for each interface (WAN, LAN and each VLAN). Rules are processed top-down, meaning the first rule that matches a packet is the one that gets applied. If no rule matches, pfSense blocks the traffic by default. Now for the fun part: initially, every interface, except for the WAN, was configured with permissive any-to-any firewall rules. This meant all devices could communicate freely across all networks. Once stability was confirmed and I'd verified that I hadn't broken anything during the transition to VLAN tagging, the rules were gradually tightened to enforce proper network segmentation. The end result is detailed below. 
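The top-down, first-match, default-deny behaviour described above can be made concrete in a few lines of Python. The rules here are a loose, hypothetical model of a LAN-style policy (allow web out, block everything else), not an export of the real pfSense rule set:

```python
import ipaddress

# Each rule: (action, source network, destination port or None for "any port").
# Hypothetical values loosely modelled on the LAN policy in this post.
RULES = [
    ("pass", "192.168.0.0/24", 443),    # LAN subnet out on HTTPS
    ("pass", "192.168.0.0/24", 80),     # LAN subnet out on HTTP
    ("block", "192.168.0.0/16", None),  # everything else from private space
]

def evaluate(src_ip: str, dst_port: int) -> str:
    """Walk the rules top-down; the first match wins.
    If nothing matches, a pfSense-style implicit default deny applies."""
    src = ipaddress.ip_address(src_ip)
    for action, network, port in RULES:
        if src in ipaddress.ip_network(network) and port in (None, dst_port):
            return action
    return "block"  # implicit default deny

if __name__ == "__main__":
    print(evaluate("192.168.0.150", 443))   # → pass  (first rule)
    print(evaluate("192.168.0.150", 8080))  # → block (third rule)
    print(evaluate("10.0.0.5", 443))        # → block (default deny)
```

The ordering matters: swapping the broad block rule above the HTTPS allow would silently kill web access, which is exactly why pfSense rule sets are read top to bottom when troubleshooting.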
In keeping with Zero Trust principles, permissive subnet rules were removed, effectively eliminating lateral movement, particularly between individual client systems. 
Aliases 
pfSense firewall aliases let you group IPs, networks, ports, or protocols under a single name. Instead of writing multiple rules for each item, you create one alias and reference it across your firewall rules, making rule management simpler, cleaner, and easier to update. For example, create an alias like Mgmt_Consoles with the IPs of all management addresses, then use that alias in your rules instead of listing each IP individually. 
I have the following aliases: 
Alias_ClientDevices, for printers 
Alias_Clients 
Alias_DomainControllers 
Alias_MemberServers 
Alias_PiHoles 
Each has been populated either with statically assigned IPs (in the case of the DCs) or by reserving the IP address in DHCP and then updating the corresponding alias. 
WAN 
The default behaviour for the WAN interface is to block RFC 1918 and bogon network ranges. RFC 1918 is a standard published by the Internet Engineering Task Force (IETF) that defines a set of private IP address ranges. These aren't routable on the public internet. The private IP ranges are: 
10.0.0.0 to 10.255.255.255 (10.0.0.0/8) 
172.16.0.0 to 172.31.255.255 (172.16.0.0/12) 
192.168.0.0 to 192.168.255.255 (192.168.0.0/16) 
A bogon network refers to a block of IP addresses that should not be seen on the public internet. These are addresses that are either unallocated by IANA or reserved for special use, like private networks or multicast. Examples of bogon IP ranges: 
0.0.0.0/8 – "this" network 
10.0.0.0/8, 192.168.0.0/16, 172.16.0.0/12 – RFC 1918 private addresses 
127.0.0.0/8 – loopback 
169.254.0.0/16 – link-local (APIPA) 
LAN 
The LAN contains the least trusted devices, such as mobile phones, smart TVs, the RoboRock vacuum, and the solar inverter. 
As a result, none of these devices is permitted to communicate with any other interface, including the Zyxel Wi-Fi router's management interface at 192.168.0.253. The two allow rules permit communication within the device's own LAN subnet and outbound access to the internet on ports 80 (HTTP) and 443 (HTTPS). The Mgmt_Consoles alias is used to block access to each interface's default gateway on ports 80 and 443. This overrides pfSense's default behavior, which permits access to the web management interface from all interfaces. 
MGMT 
The MGMT interface, by design, is allowed to communicate with all other interfaces. I've blocked most management access, with the exception of direct connections or access via a Member Server. 
VLAN10_DNS 
As with the other interfaces, access to pfSense's web management and to the Servers, MGMT, and Clients VLANs is explicitly blocked. Aliases for the Pi-holes are implemented, allowing only the two named IPs to reach each other within the subnet. Outbound DNS traffic (port 53) is permitted to allow name resolution via the Internet. Ports 80 and 443 are allowed to enable the Pi-hole instances to retrieve updates and patches. 
General Approach to Domain Services 
The VLANs for DCs, Servers and Clients allow unrestricted traffic between each VLAN. Ideally, firewall rules should be tightened to permit only the necessary services and ports. However, since the plan is to implement IPsec, the effort required to apply stricter rules at this stage isn't justified. IPsec will provide the necessary security controls and require only a couple of ports to be open, rather than the plethora of rules otherwise required between DCs and domain members. 
VLAN20_DCs 
The Domain Controllers provide authentication services and therefore require unrestricted communication with client and server aliases, as well as with each other. 
DNS traffic (TCP/UDP port 53) is allowed between the Domain Controllers and the Pi-hole instances, while port 123 is open to the Internet to support time synchronization services. 
VLAN30_Servers 
Servers are permitted to communicate with each other, as well as with DCs and client devices; SCCM makes this a necessity. The ClientDevice rule also enables support for network printing. Additionally, all servers are allowed outbound access to the Internet on ports 80 and 443 to facilitate Windows Updates. 
Note: DCs, Hyper-V hosts, SCCM, SCOM and Certificate Services shouldn't share the same VLAN. 
VLAN40_Clients 
The clients are exposed to the Internet and susceptible to various attacks during routine browsing and application use. A key principle of Zero Trust is to prevent lateral movement that could be used to discover and escalate privileges. The following measures are in place to help mitigate this risk: 
Client VLAN Isolation: The client VLAN is configured to block client-to-client communication; no firewall rules permit traffic between hosts within the same subnet. 
URA Logon Restrictions: As outlined in the link below, Server and Domain Admin accounts are explicitly denied logon rights to client machines via User Rights Assignments. https://www.tenaka.net/post/deny-domain-admins-logon-to-workstations 
Clients are only permitted unrestricted communication with the DC and Servers VLANs. External outbound traffic is restricted to web access on ports 80 (HTTP) and 443 (HTTPS). 
Admin Access 
The default configuration is for the LAN interface to apply anti-lockout rules to pfSense's management interface. The rules have been removed in favour of the MGMT interface, so proceed with caution: Navigate to System > Advanced > Admin Access, locate the Anti-lockout option and check 'Disable webConfigurator anti-lockout rule'. 
Netgear Managed Switch 
Of all the devices involved in the VLAN migration, the NETGEAR 16-Port PoE Switch (GS316EP) was the only one that caused real issues. 
Due to a fatal error on my part, the switch had to be factory reset. Losing the configuration wasn't a major problem, but diagnosing the loss of access to the switch that led to the reset was frustrating. The issue stemmed from allowing the switch to obtain its IP address via DHCP. Once the trunk uplink connecting pfSense and the switch was assigned, the switch lost its ability to pick up a DHCP IP address, resulting in a loss of management access. The most likely cause is that VLAN 1 traffic was dropped because it was explicitly tagged, while the switch or router expected it to be untagged. 
Assign a static IP address to the switch. Enable 'Basic 802.1Q VLAN'. Create the desired VLANs. 
Assign Trunk (uplink) to the following connections: 
Port 1 is the connection to pfSense 
Port 2 is the connection to the Wi-Fi access point 
Ports 3 and 4 are the connections to the Hyper-V hosts 
Assign VLAN tags to the relevant devices: 
Ports 5 and 6 are for the PiHoles on VLAN 10 
Ports 7 through 12 are for Clients on VLAN 40 
Wi-Fi Access Point 
To assign a VLAN to a Wi-Fi network, you need to configure the VLAN ID directly in the settings of the wireless access point (AP) or wireless controller. These are part of the settings when configuring the SSID and Wi-Fi password. 
Manually Set for Windows Client and Server (Non-Hyper-V) 
To manually tag a VLAN on a Windows machine: Open Control Panel > Network and Sharing Centre > Change Adapter Settings. Right-click > Properties > go to the Advanced tab. Look for a setting like VLAN ID, Priority & VLAN, or Packet Priority and VLAN. Set the VLAN ID you want and click OK. Once configured, all traffic from that adapter will be tagged with the specified VLAN ID. 
Note: Not all NIC drivers support VLAN tagging through Windows natively. Intel, Broadcom, and some Realtek adapters typically do, but it often requires their vendor-specific drivers (not just the Microsoft default). 
If the VLAN option is missing, you may need to install the manufacturer's advanced network driver suite. 
Hyper-V Servers 
The Intel (now ASUS) NUCs are equipped with only a single network interface, which limits the ability to physically separate VLANs across multiple NICs. This means Hyper-V management traffic and any VM traffic that defaults to the virtual switch go through VLAN 30. To mitigate this (I certainly don't want Kali on the same interface as my Member Servers), the VLAN is set within the network adapter settings of each individual VM, keeping the Zero Trust approach intact. Additionally, again deviating from strict Zero Trust principles, the physical hosts serve multiple roles, including File and Print services, as well as DFSR replication. 
Raspberry Pi and PiHole VLAN 
Before moving the PiHoles to their VLAN, it's important to complete the following steps, as VLANs aren't supported out of the box. 
Install the latest updates and then the vlan package: 
sudo apt update 
sudo apt install vlan 
Load the 8021q kernel module, which is essential for enabling VLAN tagging on network interfaces: 
sudo modprobe 8021q 
echo "8021q" | sudo tee -a /etc/modules 
Define how VLANs are created and configured: 
sudo nano /etc/systemd/network/25-vlan.network 
[Match] 
Name=eth0.VLAN_ID 
[Network] 
DHCP=yes 
This file instructs systemd-networkd how to create and manage the VLAN interface: 
sudo nano /etc/systemd/network/25-vlan.netdev 
[NetDev] 
Name=eth0.VLAN_ID 
Kind=vlan 
[VLAN] 
Id=VLAN_ID 
Note that systemd-networkd also expects the parent interface's .network file to carry a VLAN=eth0.VLAN_ID entry, so that the VLAN is attached to eth0. Update the VLAN tagging on the network switch that the Pis are plugged into, then restart the network stack: 
sudo systemctl restart systemd-networkd 
sudo systemctl status systemd-networkd 
Confirm the IP has updated from 192.168.0.70 to 192.168.10.70: 
ip addr show 
Conclusion 
Firstly, thanks for following along. There are a lot of moving pieces, and implementing VLAN tagging can present a few challenges. However, this is the first critical step toward achieving a more complete Zero Trust architecture. 
Related Posts: 
Part 1 - Zero Trust Introduction 
Part 2 - VLAN Tagging and Firewalls with pfSense 
Part 3 - pfSense and 802.1x 
Part 4 - IPSec for the Windows Domain 
Part 5 - AD Delegation and Separation of Duties 
Part 6 - Yubikey and Domain Smartcard Authentication Setup 
Part 7 - IPSec between Windows Domain and Linux using Certs
