What does data filtering protect against?

Data filtering protects against unauthorised disclosure, data leakage, regulatory non-compliance, fraud and reputational damage. Inbound filtering keeps bad data out, in-platform filtering limits access, and outbound filtering prevents over-sharing with vendors, ad platforms or APIs beyond consented purposes.

Is tokenisation mandatory under RBI rules?

Yes, for stored card data. RBI's card-on-file tokenisation framework requires merchants, payment aggregators and gateways to replace stored card numbers with tokens issued by the card network. Storing raw card numbers or CVV is prohibited, and tokenisation is now standard practice across Indian e-commerce.

How does DPDP impact outbound data sharing?

Outbound sharing of personal data to vendors or platforms is permitted only if it matches the lawful purpose disclosed at consent and is covered by a data-processing agreement. Data principals can request restriction or erasure, which must propagate to downstream recipients within prescribed timelines.

Do small B2B firms need data filtering?

Yes. DPDP applies regardless of size, and small firms often suffer the most damage from a breach. Even a basic classification policy, encrypted backups, role-based access and outbound DLP rules can meaningfully reduce risk without large investment.

Secure B2B and B2C Data Filtering

Secure B2B and B2C Data Filtering in India: A Practical 2026 Framework

Indian businesses must now filter, classify and govern data as a hard regulatory obligation. The Digital Personal Data Protection Act 2023 (DPDP Act), RBI's Master Direction on IT Governance (2023), SEBI's revised Cyber Security and Cyber Resilience Framework (CSCRF, August 2024) and CERT-In's 6-hour breach-reporting direction together create overlapping obligations governing how data enters your systems, how it is handled inside them and what is allowed to leave. Get any layer wrong and the penalty exposure is measured in crores — not lakhs.

What Data Filtering Actually Means in a Compliance Context

Most finance and IT teams still treat data filtering as a data-quality task: remove blanks, fix formats, deduplicate. In FY 2026-27, the term carries a different and heavier regulatory meaning.

Data filtering is the disciplined process of inspecting every record against a policy set — classification tier, consent status, contractual limits and regulatory obligations — and then deciding whether to allow, transform, mask, tokenise, redact or block that record. It applies at three distinct points: on the way in (inbound validation), while the data lives inside your systems (in-platform controls) and on the way out to vendors, partners, ad platforms and analytics tools (outbound governance).
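
A minimal sketch of that decision step follows. The `FieldContext` shape, the 1 to 5 tier scale and the specific rules are illustrative assumptions, not a prescribed policy; a real policy set would be driven by your own classification and consent registers.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    MASK = "mask"
    TOKENISE = "tokenise"
    BLOCK = "block"


@dataclass(frozen=True)
class FieldContext:
    tier: int          # hypothetical scale: 1 (public) to 5 (critical financial)
    has_consent: bool  # purpose-matched consent is on file
    outbound: bool     # True when the field is about to leave your systems


def decide(ctx: FieldContext) -> Action:
    """Illustrative rules only: the strictest applicable rule wins."""
    if ctx.tier >= 5:
        # Card-like data: tokenise at rest, never export
        return Action.BLOCK if ctx.outbound else Action.TOKENISE
    if ctx.tier == 4:
        if not ctx.has_consent:
            return Action.BLOCK
        return Action.MASK if ctx.outbound else Action.ALLOW
    return Action.ALLOW
```

Keeping the decision in one pure function makes it easy to call from the inbound, in-platform and outbound stages alike, and to unit-test each rule.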

The legal reason this matters: Section 8(3) of the DPDP Act 2023 requires a Data Fiduciary to ensure that personal data likely to be used to make a decision affecting a Data Principal is complete, accurate and consistent. A broken inbound filter that allows stale mobile numbers or incorrect PAN mappings into a lending or insurance workflow is not just a data-quality failure; it is a statutory violation the Data Protection Board can cite.

The Regulatory Stack Governing Your Data in FY 2026-27

Understanding which rule maps to which data flow prevents you from building three separate compliance programmes. Here is the consolidated picture.

DPDP Act 2023

Applies to any entity that processes digital personal data within India, or processes it outside India in connection with offering goods or services to Data Principals in India. For filtering purposes, the critical obligations are:

Section 6 (Consent): Unless a legitimate use under Section 7 applies, processing requires free, specific, informed and unambiguous consent. Pre-ticked boxes and bundled consent clauses are invalid. Consent status must be a machine-readable filter condition in your data pipelines, not a human-readable footnote in your privacy policy.
Section 8(5) (Reasonable security safeguards): Protect personal data in your possession or control, including data handled by your processors, with reasonable safeguards against a personal data breach. The Schedule sets a penalty of up to Rs. 250 crore for breach of this obligation.
Breach notification: Notify the Data Protection Board of India and affected Data Principals within timelines to be prescribed in Rules (Rules were pending as of May 2026). Until notified, align to the CERT-In 6-hour standard as a conservative interim baseline.

RBI IT Governance and Tokenisation

RBI's Master Direction on Information Technology Governance, Risk, Controls and Assurance Practices (2023) governs banks, NBFCs and payment system operators. On tokenisation, RBI's circular CO.DPSS.POLC.No.S-516/02-14-003/2021-22 and successor circulars prohibit merchants, payment aggregators and acquirers from storing raw card Primary Account Numbers (PANs), expiry dates or CVV after the October 2022 deadline. Tokens issued by card networks (Visa, Mastercard, RuPay) are the only permissible stored representation.

CERT-In Direction (IT Act 2000, Section 70B)

Cybersecurity incidents — including data breaches, unauthorised access and ransomware — must be reported to CERT-In within 6 hours of detection, not 6 hours of investigation completion. The clock starts when any team member becomes aware something has gone wrong. Approximately 20 categories of incidents are covered. Failure to report is a criminal offence under the IT Act.

SEBI CSCRF (August 2024 Revision)

The revised framework applies a tiered architecture:

Market Infrastructure Institutions (MIIs) — exchanges, clearing corporations, depositories: real-time monitoring, quarterly board review, strictest controls.
Qualified Regulated Entities (REs): annual third-party security audit, mandatory DLP, vulnerability scans.
Mid-size and Small REs: lighter but substantive requirements including access controls, data masking and incident reporting.

Step 1: Classify Every Dataset Before Writing a Single Filter Rule

You cannot filter what you have not classified. A data classification policy is the non-negotiable foundation.

Practical five-tier classification for Indian businesses:

Tier	Label	Examples	Regulatory sensitivity
1	Public	Published pricing, stock filings, blog posts	None
2	Internal	Operational KPIs, internal memos	Low
3	Confidential	Vendor price lists, customer contracts	Contractual
4	Sensitive Personal	PAN, Aadhaar, mobile, email, location, health, financial history	DPDP + IT Act
5	Critical Financial	CVV, card PANs, bank account credentials, SWIFT codes	RBI + PCI DSS

How to execute this in practice — a seven-step sequence:

Pull a complete inventory: databases, object stores (S3, Azure Blob, GCS), SaaS platforms (Salesforce, Zoho, Tally), API endpoints and email marketing tools.
For each data element, assign the tier above. Start with any system that touches customer or employee records.
Tag fields in a data catalog — Apache Atlas, Collibra and Microsoft Purview work at enterprise scale; a maintained spreadsheet works for SMEs.
Record for each sensitive field: the lawful purpose under DPDP, the consent reference, the retention period and every downstream system that consumes the field.
Run an automated data-discovery scan using tools such as BigID or Privacera (or open-source alternatives) to surface Aadhaar patterns (\d{4}\s\d{4}\s\d{4}), PAN formats ([A-Z]{5}[0-9]{4}[A-Z]{1}) and card numbers (13–19 digit strings passing the Luhn algorithm) lurking in unexpected places.
Document the data flow map: source → processing system → downstream consumers → deletion/archival endpoint.
Schedule an annual classification review, and trigger an ad-hoc review whenever a new data source or use case is introduced.

Without this inventory, every downstream filtering rule is guesswork — and regulators will treat it as such.
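
The pattern-matching part of the discovery scan can be sketched in a few lines. The two regular expressions are the ones quoted in the steps above; the word boundaries, the function name and the sample string are assumptions added for illustration, and a commercial scanner will do considerably more (context scoring, checksum checks, file-type handling).

```python
import re

# Patterns from the text; the \b word boundaries are an added assumption
AADHAAR_RE = re.compile(r"\b\d{4}\s\d{4}\s\d{4}\b")
PAN_RE = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")


def scan_text(text: str) -> dict[str, list[str]]:
    """Return Aadhaar-like and PAN-like strings found in free text."""
    return {
        "aadhaar_like": AADHAAR_RE.findall(text),
        "pan_like": PAN_RE.findall(text),
    }


# Made-up sample values, not real identifiers
sample = "Vendor note: PAN ABCDE1234F, Aadhaar 1234 5678 9012, ref INV-2041."
print(scan_text(sample))
```

A pattern hit is only a candidate: treat matches as leads for classification review, since order numbers and reference codes can look like identifiers.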

Filtering for B2B Data Flows

B2B filtering is transactional and contractual in character, but regulatory obligations apply equally.

Inbound Vendor and Partner Data

When vendors send you master-data files — supplier lists, invoice feeds, employee-secondment records — validate at the point of ingestion before any record enters your ERP or data warehouse:

GSTIN format check: 15-character alphanumeric. Characters 1–2 = state code, characters 3–12 = PAN of the entity, character 13 = entity number, character 14 = Z (default), character 15 = checksum digit. Any record failing this pattern should be quarantined, not rejected silently.
PAN validation: 10-character format check (AAAAA9999A pattern), cross-referenced against your existing TDS master to prevent duplicates that create reconciliation problems on Form 26Q / 27Q uploaded to TRACES (TDS Reconciliation Analysis and Correction Enabling System).
Bank account verification via penny drop: Every new beneficiary account must be verified through a penny-drop API before it is marked active for payment. This is both a fraud-prevention control and a practical requirement before any NEFT/RTGS instruction.
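
The GSTIN and PAN format checks above might look like the sketch below. The regular expressions follow the character layout described in the list. The check-character routine implements the commonly described mod-36 scheme and is an assumption here; confirm it against the official GSTN specification before relying on it. The sample GSTIN is used purely as an illustrative string.

```python
import re

# Layout: 2-digit state code, 10-char PAN, entity number, 'Z', check character
GSTIN_RE = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][0-9A-Z]Z[0-9A-Z]")
PAN_RE = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def gstin_check_char(first14: str) -> str:
    """Assumed mod-36 scheme: weights alternate 1, 2 from the left."""
    total = 0
    for i, ch in enumerate(first14):
        product = ALPHABET.index(ch) * (1 if i % 2 == 0 else 2)
        total += product // 36 + product % 36
    return ALPHABET[(36 - total % 36) % 36]


def validate_gstin(gstin: str) -> str:
    """Return 'ok' or a quarantine reason; never drop the record silently."""
    if not GSTIN_RE.fullmatch(gstin):
        return "quarantine: format"
    if gstin_check_char(gstin[:14]) != gstin[14]:
        return "quarantine: checksum"
    return "ok"


def embedded_pan(gstin: str) -> str:
    """Characters 3 to 12 of a GSTIN carry the entity's PAN."""
    return gstin[2:12]


print(validate_gstin("27AAPFU0939F1ZV"), embedded_pan("27AAPFU0939F1ZV"))
```

Comparing `embedded_pan(gstin)` with the PAN held in the TDS master is a cheap extra cross-check at ingestion. Returning a reason string rather than a boolean lets the quarantine queue show why each record was held.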

Why this matters for tax compliance: Under Section 206AA of the Income-tax Act 1961, if a vendor's PAN on file is invalid or missing, TDS on payments covered by sections such as 194C and 194J must be deducted at 20% instead of the standard rate (typically 1–10% depending on the section). On a vendor invoice of Rs. 50 lakh, 20% is Rs. 10 lakh against Rs. 0.5–5 lakh at the standard rate, a TDS differential of Rs. 5–9.5 lakh. The excess is recoverable in theory, but recovery is slow and contested when TRACES reconciliation fails.
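
The differential is simple arithmetic: the extra deduction is the invoice value multiplied by the gap between the 20% fallback rate and the standard rate. A small sketch, using whole-number percentages to avoid floating-point rounding (the function name is hypothetical):

```python
def tds_differential(invoice_rs: int, standard_pct: int, fallback_pct: int = 20) -> int:
    """Extra TDS withheld when the fallback rate replaces the standard rate."""
    return invoice_rs * (fallback_pct - standard_pct) // 100


invoice = 50_00_000  # Rs. 50 lakh
print(tds_differential(invoice, standard_pct=1))   # 950000 (Rs. 9.5 lakh)
print(tds_differential(invoice, standard_pct=10))  # 500000 (Rs. 5 lakh)
```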

Outbound Data to Vendors and Partners

Every data-sharing arrangement must clear a vendor-risk gate before the first byte leaves your system:

What data classification tier does this vendor receive?
Does the vendor hold ISO 27001, SOC 2 Type II or an equivalent current certification?
Is a Data Processing Agreement (DPA) — mandatory for processors under DPDP — signed and on file?
What is the breach-notification SLA in the DPA? A 72-hour SLA is too slow: notification from the vendor to you should arrive within a few hours of the vendor detecting the incident, so you can meet your CERT-In 6-hour clock.
Does the vendor use sub-processors? If so, are sub-processors contractually bound to the same security standards, and will the vendor notify you of any sub-processor change?

Filtering outbound TDS files: The quarterly TDS statement uploaded under Form 26Q (non-salary domestic), Form 27Q (non-resident payments), Form 24Q (salary) and Form 27EQ (TCS) must contain only: deductee PAN, deductee name, amount paid, TDS deducted and challan details. Strip every other field — full address, mobile number, email, bank account details — before the file is generated. This is simultaneously a DPDP data-minimisation obligation and a sensible security control.
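
One way to enforce this is an allow-list projection applied to every record before the file is written. The sketch below assumes hypothetical field names (`deductee_pan`, `challan_ref` and so on); map them to whatever your vendor master actually uses.

```python
# Hypothetical column names for the five permitted fields
TDS_ALLOWED_FIELDS = (
    "deductee_pan",
    "deductee_name",
    "amount_paid",
    "tds_deducted",
    "challan_ref",
)


def minimise(record: dict, allowed: tuple = TDS_ALLOWED_FIELDS) -> dict:
    """Keep only allow-listed fields; fail loudly if a required one is absent."""
    missing = [field for field in allowed if field not in record]
    if missing:
        raise KeyError(f"record lacks required fields: {missing}")
    return {field: record[field] for field in allowed}
```

An allow-list is safer than a deny-list here: a new column added to the vendor master later is excluded by default instead of leaking until someone remembers to block it.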

Filtering for B2C Data Flows

B2C filtering is where DPDP bites hardest, because Data Principals have enforceable rights that your systems must be technically capable of honouring.

Before any personal data enters a processing pipeline for marketing, analytics or behavioural modelling:

Query the consent register in your CRM or Customer Data Platform (CDP).
If consent is absent, withdrawn or has expired, route the record to a quarantine bucket. Do not delete — deletion may destroy evidence of a consent dispute. Do not process — processing without consent is a Section 6 violation.
Store every consent record with: timestamp, channel (web, IVR, mobile app), version of the consent notice displayed, IP address (for web) and a cryptographic hash of the notice text so you can prove what the person actually agreed to.
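
The consent gate and the consent record described above can be sketched as follows. The record layout, the `route` function and the `"process"` / `"quarantine"` labels are assumptions for illustration; SHA-256 is used as the notice hash, which is one reasonable choice and not a requirement stated in the Act.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def notice_hash(notice_text: str) -> str:
    """SHA-256 of the exact notice text shown to the person."""
    return hashlib.sha256(notice_text.encode("utf-8")).hexdigest()


def make_consent_record(principal_id: str, purpose: str, channel: str,
                        notice_version: str, notice_text: str,
                        captured_at: datetime, ip: Optional[str] = None,
                        expires_at: Optional[datetime] = None) -> dict:
    """Build a consent record carrying the fields listed in the text."""
    return {
        "principal_id": principal_id,
        "purpose": purpose,
        "channel": channel,  # e.g. web, IVR, mobile app
        "notice_version": notice_version,
        "notice_sha256": notice_hash(notice_text),
        "captured_at": captured_at,
        "ip": ip,
        "expires_at": expires_at,
        "withdrawn": False,
    }


def route(purpose: str, consent: Optional[dict], now: datetime) -> str:
    """Return 'process' only for live, purpose-matched consent."""
    if consent is None or consent.get("withdrawn"):
        return "quarantine"
    if consent["purpose"] != purpose:
        return "quarantine"
    expires_at = consent.get("expires_at")
    if expires_at is not None and expires_at <= now:
        return "quarantine"
    return "process"


captured = datetime(2026, 4, 1, tzinfo=timezone.utc)
rec = make_consent_record("c-1", "marketing", "web", "v3",
                          "We will email you offers.", captured)
print(route("marketing", rec, datetime(2026, 6, 1, tzinfo=timezone.utc)))
```

Note that the gate never deletes: anything that fails the check is quarantined, which matches the rule above about preserving evidence for a consent dispute.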

RBI Tokenisation: The Non-Negotiable for Card Data

If your business accepts card payments as a merchant, marketplace or payment aggregator:

Audit immediately: Search every database and data store for columns named card_number, pan, cvv, card_no, cc_num, expiry, card_expiry or any variant. Run a regex across unstructured data stores for 13–19 digit numeric strings that pass the Luhn algorithm.
Any match in a table that is not your payment gateway's token vault is a live RBI non-compliance.
Tokens are issued by card networks through acquiring banks and are specific to the merchant-device-network combination — a token cannot be used at a different merchant even if intercepted.
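
The regex-plus-Luhn audit of unstructured data can be sketched as below. This is a minimal version that only catches contiguous digit runs; numbers written with spaces or hyphens would need normalising first. The function names are illustrative, and 4111111111111111 is a widely used test card number, not a live account.

```python
import re

# 13 to 19 consecutive digits, not embedded in a longer digit run
CANDIDATE_RE = re.compile(r"(?<!\d)\d{13,19}(?!\d)")


def luhn_ok(digits: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_like(text: str) -> list[str]:
    """Return digit runs that have card-number length and pass Luhn."""
    return [m for m in CANDIDATE_RE.findall(text) if luhn_ok(m)]


print(find_card_like("order 4111111111111111 ref 1234567890123"))
```

The Luhn step matters because long digit runs are common (order IDs, timestamps, phone numbers with prefixes); it removes roughly nine in ten random candidates before anyone has to review them.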

Masking for Internal Analysis

Even inside your systems, analysts, CRM users and BI developers should not see raw sensitive personal data unless the role explicitly requires it.

Static data masking: Before any data reaches a development or test environment, mask it. A real PAN ABCDE1234F becomes XXXXX1234F. A real email address becomes r*****@g*****.com. Treat this as a CI/CD pipeline gate: no real personal data in dev or staging, ever.
Dynamic data masking: In production query environments, a data analyst sees XXXXXXXX0091 for a mobile number while the application layer retains the full value. Implement this at the database view or API gateway layer.
Access logging: Every time a full sensitive field is accessed — not just modified — the access must be logged with user identity, timestamp and client IP. This log is your evidence in a Data Protection Board inquiry.
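
The masking formats quoted above can be produced with a few small helpers. This is a sketch that assumes well-formed inputs (a 10-character PAN, an email address with a dotted domain); the function names are hypothetical, and production masking would normally run inside the database or a dedicated masking tool.

```python
def mask_pan(pan: str) -> str:
    """Hide the first five characters of a PAN, keep the rest."""
    return "X" * 5 + pan[5:]


def mask_mobile(mobile: str) -> str:
    """Keep only the last four digits visible."""
    return "X" * (len(mobile) - 4) + mobile[-4:]


def mask_email(email: str) -> str:
    """Keep the first character of the local part and domain name, plus the TLD.

    A fixed run of five asterisks is used so the mask does not reveal length.
    """
    local, _, domain = email.partition("@")
    name, dot, tld = domain.rpartition(".")
    return f"{local[:1]}{'*' * 5}@{name[:1]}{'*' * 5}{dot}{tld}"


print(mask_pan("ABCDE1234F"))         # XXXXX1234F
print(mask_mobile("919876540091"))    # XXXXXXXX0091
print(mask_email("user@example.com")) # u*****@e*****.com
```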

Honouring Data Principal Rights

DPDP creates rights that are effectively filter and workflow triggers:

Right of access: On request, your system must be able to extract every piece of personal data held about one individual across CRM, warehouse, email platform, analytics tool and backup copies. Build this capability before a request arrives. Provisional target: respond within 30 days (Rules may prescribe a shorter period).
Right to erasure: Purge or irreversibly anonymise data across all systems — including backup copies, which the Act explicitly addresses. Anonymisation means the individual can no longer be re-identified even with additional data.
Grievance redressal: Designate a point of contact (a Data Protection Officer if you are a Significant Data Fiduciary). Aim for 48-hour acknowledgement and 15 business day resolution as an internal standard until Rules specify a timeline.

Three-Layer Architecture: Inbound, In-Platform, Outbound

The most resilient filtering architecture places controls at all three stages. A failure at one layer is caught by the next.

Layer 1 — Inbound Controls

Schema validation: field types, mandatory fields, format patterns (GSTIN, PAN, IBAN, IFSC)
Consent token check: does this inbound record carry a valid, purpose-matched consent reference?
Duplicate detection: especially for vendor master and customer master feeds
Regulatory routing: records containing PAN, Aadhaar or card data are flagged immediately for handling under the appropriate policy

Layer 2 — In-Platform Controls

Role-based access control (RBAC) enforced at database and API layer, not just at the application UI
Dynamic data masking for analyst-facing query environments
Card tokenisation vault with network-issued tokens as the only stored representation
Data lineage tracking: every downstream table or BI report that consumes a sensitive field is mapped
Immutable audit logs retained for a minimum of 180 days (CERT-In direction minimum) and ideally 12 months for audit-readiness

Layer 3 — Outbound Controls

Pre-flight consent check before any export to advertising platforms (Meta CAPI, Google Enhanced Conversions, DV360, trade desks)
Data minimisation: programmatically strip every field the recipient does not need before the export runs
Encryption in transit: TLS 1.2 minimum, TLS 1.3 preferred; never transmit sensitive data over plain HTTP
Digital watermarking or canary tokens for high-risk data shares, so a leaked copy can be traced back to which recipient received it
API rate-limit anomaly detection: a partner pulling 10× their normal data volume in an off-hours window should trigger an alert and temporary rate cap
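
The volume-anomaly rule in the last item can be sketched as a simple comparison against a per-partner baseline. The median baseline, the 22:00 to 06:00 off-hours window and the function names are assumptions chosen for illustration; only the 10× multiplier comes from the text.

```python
from statistics import median


def is_off_hours(hour: int, start: int = 22, end: int = 6) -> bool:
    """True for hours in the overnight window [start, 24) or [0, end)."""
    return hour >= start or hour < end


def should_alert(daily_history: list[int], current: int, hour: int,
                 factor: float = 10.0) -> bool:
    """Alert when an off-hours pull is at least `factor` times the baseline."""
    if not daily_history:
        return False  # no baseline yet; new partners need separate handling
    baseline = median(daily_history)
    return baseline > 0 and current >= factor * baseline and is_off_hours(hour)
```

A median is used for the baseline so that one earlier spike does not inflate "normal" and hide the next one.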

Worked Example: What the Cost of Getting This Wrong Actually Looks Like

Scenario A — The misconfigured outbound API

A mid-size e-commerce company with 2 lakh customer records has a misconfigured analytics API connector that has been pulling full name, email, mobile number and city — all Tier 4 Sensitive Personal data — into a third-party BI tool for 18 months. A security researcher discovers the API is unauthenticated and files a public disclosure.

Regulatory exposure:

DPDP Act Schedule, Item 1: Failure to implement reasonable security safeguards → up to Rs. 250 crore penalty.
CERT-In direction: Breach detected Day 1, notified Day 3. Two-day gap on the 6-hour clock → potential criminal liability under Section 70B of the IT Act.
Direct incident costs (Indian mid-market benchmarks): forensics Rs. 15–25 lakh, legal Rs. 10–20 lakh, customer notification Rs. 5–10 lakh, PR management Rs. 5–15 lakh. Total direct cost before any regulatory penalty: Rs. 35–70 lakh.
Revenue impact: Cart abandonment typically spikes 15–25% in the 90 days following a publicised breach in Indian e-commerce.

The prevention cost: a dynamic data masking layer on the analytics API (Rs. 2–5 lakh per year for a SaaS DLP tool) and a consent management platform (Rs. 1–3 lakh per year). The economics are unambiguous.

Scenario B — The over-populated TDS outbound file

A finance team exports a vendor TDS file containing not only PAN, name and TDS amount but also credit limit, outstanding balance and pricing tier, because the report was built from the full vendor master view rather than a purpose-limited extract. The file lands in the vendor's inbox. One vendor employee is also a consultant to a competitor. The pricing data leaks.

The contractual and legal dispute costs Rs. 5–20 lakh in legal fees. The regulatory exposure under DPDP (sharing confidential data without a lawful basis) and IT Act Section 43A (failure of reasonable security practices) creates additional civil liability. The fix — an outbound filter that strips every field except PAN, name, amount and challan reference from the TDS extract — takes one developer half a day to implement.

Common Mistakes and How to Fix Them

Mistake 1: Treating Terms-and-Conditions Acceptance as Blanket Consent

What goes wrong: Terms-and-conditions acceptance is treated as blanket consent for all processing purposes, including sharing data with advertising partners and behavioural analytics vendors. Under DPDP Section 6, bundled consent is invalid. When challenged, you have no lawful basis for most of your data-processing activity. Fix: Granular, purpose-specific consent notices. Separate consent per purpose (order fulfilment, marketing, analytics, third-party sharing). Store each consent record with its own timestamp, notice version hash and withdrawal mechanism.

Mistake 2: Real Personal Data in Development and Test Environments

What goes wrong: Developers use production exports as test data because it "makes debugging easier." A developer's laptop contains 50,000 real PANs. The laptop is lost or stolen. Fix: Static data masking as a mandatory CI/CD gate. No real personal data enters any non-production environment. Automate the masking step — do not rely on developer discipline.

Mistake 3: Ignoring the Vendor's Sub-Processors

What goes wrong: Your primary cloud analytics vendor is ISO 27001 certified and has signed a DPA. But they use a sub-processor in a country to which the Central Government has restricted transfers under DPDP Section 16. Any cross-border transfer through the primary vendor to that sub-processor is a potential violation. Fix: Request a sub-processor list from every data vendor. Require your vendors to contractually bind sub-processors to identical security standards and to notify you of any sub-processor change with 30 days' notice.

Mistake 4: Log Retention Set Below Regulatory Minimum

What goes wrong: A breach is discovered 45 days after the intrusion. CERT-In demands logs from the incident window. Logs are configured to auto-delete after 30 days. You cannot reconstruct the incident or demonstrate containment — which is itself a separate compliance failure. Fix: Immutable (write-once) log retention of 180 days minimum across all systems. For payment systems and critical financial applications, retain 12 months. Store logs in a separate, write-protected storage tier with independent access controls.

Mistake 5: API Endpoints Returning More Fields Than the Consumer Needs

What goes wrong: A third-party integration was built three years ago and pulls a broad customer object including fields the vendor never actually uses — because narrowing the response required a ticket and no one prioritised it. Over time, the vendor becomes a data store of personal data you never intended to share. Fix: Enforce field-level API policies at the API gateway (AWS API Gateway, Apigee, Kong). Every endpoint returns only the minimum necessary fields, enforced at the infrastructure layer — not left to the discretion of the consuming application.

Incident Response: The Timelines You Cannot Afford to Miss

Build your incident-response runbook around these hard deadlines. The clock on each starts at the moment of detection, not when investigation is complete.

Event	Deadline	Authority	Consequence of missing
Cybersecurity incident (any of 20 categories)	6 hours from detection	CERT-In (MeitY)	Criminal liability under IT Act Section 70B
Personal data breach — Data Principals	As per DPDP Rules (use 72-hour standard until notified)	Data Protection Board	Up to Rs. 200 crore penalty per Schedule
Personal data breach — Data Protection Board	As per DPDP Rules	Data Protection Board	Up to Rs. 200 crore penalty
RBI-regulated entity breach	Immediate to RBI CISO; formal within 2–6 hours	RBI	Regulatory action under Banking Regulation Act / PSS Act
SEBI MII/RE breach	Report to SEBI within 6 hours; board intimation within 24 hours	SEBI	Show-cause, penalties under SEBI Act 1992
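
Because every clock starts at detection, a runbook can compute the due times the moment an incident is logged. The sketch below covers only the fixed-hour deadlines from the table; the dictionary keys and function name are hypothetical, and the DPDP and RBI rows are left out because their timelines are not fixed numbers in the table.

```python
from datetime import datetime, timedelta

# Fixed-hour deadlines taken from the table above
DEADLINES = {
    "CERT-In report": timedelta(hours=6),
    "SEBI report": timedelta(hours=6),
    "SEBI board intimation": timedelta(hours=24),
}


def due_times(detected_at: datetime) -> dict[str, datetime]:
    """Map each obligation to its due time, counted from detection."""
    return {name: detected_at + delta for name, delta in DEADLINES.items()}


for name, due in due_times(datetime(2026, 5, 1, 22, 30)).items():
    print(f"{name}: due {due:%Y-%m-%d %H:%M}")
```

An incident detected at 22:30 has a CERT-In due time of 04:30 the next morning, which is why the reporting officer needs an out-of-hours escalation path.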

Tabletop exercise requirement: Run at least one annual exercise with your CFO, CISO (or IT head), legal counsel and communications lead. Simulate three distinct scenarios: (a) a ransomware attack that encrypts the customer database, (b) a vendor-side breach where a sub-processor exposes data, and (c) an accidental public link that exposes an internal financial model. Each scenario requires a different response, and testing your plan against all three is the only way to know it works.

Key Takeaways

Classify first, filter second. Assign every data element a tier (Public through Critical Financial) before writing a single filter rule. Without an enforced classification, every downstream control is guesswork — and regulators treat it that way.
Consent is a machine-readable filter gate, not a policy document clause. Under DPDP Section 6, bundled consent is invalid. Wire consent status — per purpose, per channel — into your data pipeline as a hard block on processing.
RBI tokenisation is a live obligation, not a future roadmap item. Scan your databases for raw card PANs, expiry dates and CVVs today. Any match outside a card-network token vault is a standing compliance violation with direct RBI consequences.
Your vendor's perimeter is your perimeter. Many data breaches trace back to a vendor or sub-processor, not the enterprise itself. Every data-sharing arrangement requires a DPA, a security attestation and a breach-notification SLA that fits inside your own regulatory clocks.
Three-layer filtering is more resilient than a single perimeter control. Inbound validation, in-platform masking and tokenisation, and outbound field-level controls in combination catch failures that any single layer alone would miss.
The CERT-In 6-hour clock starts on detection, not on investigation completion. Designate a CERT-In reporting officer. That person must be able to file an initial report within 6 hours of any team member raising a security alert, even if the full scope of the incident is not yet known.
Immutable logs are your evidence, not just your audit trail. Set 180-day minimum retention across all systems, stored in a write-protected tier. A breach you cannot reconstruct from logs is a breach you cannot defend before any of the four regulatory authorities above.
