Build a 2026-ready data governance and quality programme that satisfies the DPDP Act and the audit-trail rule, and unlocks AI for Indian enterprises.
Data Governance & Quality Management
Data governance is the set of policies, people, and processes that determine who owns a dataset, how accurate it must be, how long it is retained, and who can access it. In FY 2026-27, that definition carries legal weight: the Digital Personal Data Protection (DPDP) Act 2023 mandates demonstrable consent records and breach response; the Companies Act audit-trail rule requires a tamper-proof edit log for every accounting transaction; and AI projects collapse when their training data is ungoverned. This guide gives you a practical blueprint to build a programme that satisfies all three simultaneously.
What Data Governance Actually Covers — and What It Does Not
Most organisations equate data governance with buying a catalogue tool. That confuses the tool with the discipline. Governance is an operating model built on five pillars:
- Ownership — every dataset has a named data owner (a business leader accountable for quality) and a data steward (the analyst who executes standards day to day).
- Standards — field definitions, permitted values, formats, and master code lists are documented, approved, and version-controlled.
- Policies — classification (public, internal, confidential, restricted), retention schedules, access rules, and data-lifecycle decisions are written down and enforced.
- Quality measurement — completeness, accuracy, consistency, timeliness, uniqueness, and validity are scored continuously, not checked at year-end.
- Lineage — every dataset has a documented path: where it originates, every system it passes through, and what the downstream consumer does with it.
What governance is not: it is not a one-time data-cleaning project, not solely an IT responsibility, and not a compliance checkbox that gets ticked and forgotten. The moment a policy is written but not operationalised through steward workflows and automated quality tests, it is decorative.
The Six Quality Dimensions — With Indian Compliance Examples
These six dimensions are not theoretical; they map directly to the failure modes that generate GST notices, audit qualifications, and regulatory penalties.
1. Completeness
Required fields must be populated. For GST data, this means every B2B invoice in your accounting system must carry a valid customer GSTIN, an HSN or SAC code at the prescribed digit level (six digits where aggregate turnover in the preceding financial year exceeded Rs. 5 crore, four digits otherwise), and a correct Place of Supply state code. A missing HSN causes your GSTR-1 to fail schema validation on the GST portal, blocking filing.
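A completeness check like this is easy to automate. The sketch below scores, per field, the share of invoices with a value present and lists the invoices that would block filing. The `Invoice` record, its field names, and the sample GSTINs are hypothetical; it tests presence only, since digit-level and format checks belong under validity.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Invoice:
    """Hypothetical invoice record; field names are illustrative, not from any ERP."""
    number: str
    customer_gstin: Optional[str]
    hsn_or_sac: Optional[str]
    place_of_supply: Optional[str]

REQUIRED_FIELDS = ("customer_gstin", "hsn_or_sac", "place_of_supply")

def is_filled(value: Optional[str]) -> bool:
    """Treat None, empty and whitespace-only strings as missing."""
    return bool((value or "").strip())

def completeness(invoices: list[Invoice]) -> dict[str, float]:
    """Share of invoices (0.0 to 1.0) with each required field populated."""
    if not invoices:
        return {field: 1.0 for field in REQUIRED_FIELDS}
    return {
        field: sum(is_filled(getattr(inv, field)) for inv in invoices) / len(invoices)
        for field in REQUIRED_FIELDS
    }

def blocking_gaps(invoices: list[Invoice]) -> list[str]:
    """Invoice numbers missing any required field; fix these before filing."""
    return [inv.number for inv in invoices
            if not all(is_filled(getattr(inv, f)) for f in REQUIRED_FIELDS)]

invoices = [
    Invoice("INV-001", "27AAACA1234B1Z5", "330499", "27"),
    Invoice("INV-002", "29AAACB5678C1Z2", None, "29"),
    Invoice("INV-003", "33AAACC9012D1Z8", "8471", ""),
]
print(blocking_gaps(invoices))  # ['INV-002', 'INV-003']
```

Run daily, the per-field scores become the completeness line of the quality scorecard, and the blocking list becomes the steward's work queue before each GSTR-1 cycle.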
2. Accuracy
Values must match an authoritative reference. Your vendor master's bank account name should match what NPCI returns against the IFSC + account number pair. A mismatch discovered during a payment run is embarrassing; one discovered during a statutory audit is a finding.
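Name matching against a bank's response is rarely an exact string comparison. A minimal sketch, assuming you already have the beneficiary name returned by your bank or verification provider, normalises both names and uses the standard library's `difflib.SequenceMatcher` ratio. The 0.85 threshold is an assumption to tune on your own data, and borderline scores should go to a steward rather than be auto-approved.

```python
import difflib
import re

def normalise(name: str) -> str:
    """Upper-case, turn punctuation into spaces, collapse repeated whitespace."""
    name = re.sub(r"[^A-Za-z0-9 ]", " ", name).upper()
    return " ".join(name.split())

def name_match(master_name: str, bank_name: str, threshold: float = 0.85) -> tuple[bool, float]:
    """Compare the vendor-master name with the name returned by the bank.

    Returns (passes_threshold, similarity ratio rounded to 3 places).
    """
    ratio = difflib.SequenceMatcher(None, normalise(master_name), normalise(bank_name)).ratio()
    return ratio >= threshold, round(ratio, 3)

print(name_match("Sharma Traders", "SHARMA TRADERS"))  # (True, 1.0)
```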
3. Consistency
The same fact must be the same across all systems. The most common failure: a customer's legal name in your CRM is "ABC Pvt Ltd" but "ABC Private Limited" in the ERP and "A.B.C. Pvt. Ltd." in the GST portal. Each variant creates a separate bucket in your analytics, understates concentration risk, and breaks GSTR-2B reconciliation matching.
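One common way to catch these variants is to derive a canonical matching key for each record and group on it. The sketch below assumes a small, hypothetical synonym table for legal-form abbreviations; the key only proposes merge candidates, and a steward still confirms each merge.

```python
import re

# Hypothetical synonym table; extend it from the variants found in your own masters.
TOKEN_MAP = {"PVT": "PRIVATE", "LTD": "LIMITED", "CO": "COMPANY", "&": "AND"}

def canonical_key(name: str) -> str:
    """Collapse spelling variants of a legal name into one matching key."""
    name = name.upper().replace(".", "")        # "A.B.C." -> "ABC", "PVT." -> "PVT"
    name = re.sub(r"[^A-Z0-9& ]", " ", name)    # any other punctuation becomes a space
    tokens = [TOKEN_MAP.get(tok, tok) for tok in name.split()]
    return " ".join(tokens)

variants = ["ABC Pvt Ltd", "ABC Private Limited", "A.B.C. Pvt. Ltd."]
print({canonical_key(v) for v in variants})  # {'ABC PRIVATE LIMITED'}
```

Storing this key alongside the legal name in each system lets analytics and GSTR-2B matching join on the key while the displayed name stays as registered.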
4. Timeliness
Data must arrive within the window the consumer needs it. For TDS, deductee-level data (PAN, amount paid, tax deducted) must be available before the due date for the quarterly TDS statement (the 31st of the month following the quarter for the first three quarters, and 31 May for the January–March quarter). If payroll data arrives in your accounting system five days late, the quarterly TDS return either misses the deadline, attracting a late fee under Section 234E of the Income-tax Act 1961 of Rs. 200 per day (capped at the amount of TDS), or gets filed with stale numbers.
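The deadline arithmetic can be encoded so that a timeliness monitor knows how much slack remains. This sketch assumes due dates of 31 July, 31 October, 31 January and 31 May for the four quarters, and a Section 234E fee of Rs. 200 per day capped at the TDS amount; confirm both for the year you are filing, since statutory dates are sometimes extended.

```python
from datetime import date

def tds_return_due_date(fy_start_year: int, quarter: int) -> date:
    """Due date of the quarterly TDS statement for the FY beginning 1 April fy_start_year."""
    due = {
        1: date(fy_start_year, 7, 31),       # April-June
        2: date(fy_start_year, 10, 31),      # July-September
        3: date(fy_start_year + 1, 1, 31),   # October-December
        4: date(fy_start_year + 1, 5, 31),   # January-March
    }
    if quarter not in due:
        raise ValueError("quarter must be 1-4")
    return due[quarter]

def late_fee_234e(due: date, filed: date, tds_amount: int) -> int:
    """Rs. 200 for each day of delay, never more than the TDS amount."""
    days_late = max((filed - due).days, 0)
    return min(200 * days_late, tds_amount)

print(late_fee_234e(tds_return_due_date(2026, 1), date(2026, 8, 5), tds_amount=50_000))  # 1000
```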
5. Uniqueness
No duplicate masters. The gold standard in Indian compliance: one customer, one PAN, one active record. Duplicates inflate your customer count, split transaction history, distort Form 26AS reconciliation, and cause AIS/TIS mismatches that downstream tax computations must painfully untangle.
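A uniqueness test for the "one customer, one PAN" rule can be as simple as grouping active records by normalised PAN. The record shape and sample PANs below are made up for illustration.

```python
from collections import defaultdict

def duplicate_pans(customers: list[dict]) -> dict[str, list[str]]:
    """Group active customer codes by PAN; any PAN with more than one code is a merge candidate."""
    by_pan: dict[str, list[str]] = defaultdict(list)
    for c in customers:
        pan = (c.get("pan") or "").strip().upper()   # tolerate case and stray spaces
        if pan and c.get("active", True):            # blank PANs are a completeness issue, not a duplicate
            by_pan[pan].append(c["code"])
    return {pan: codes for pan, codes in by_pan.items() if len(codes) > 1}

customers = [
    {"code": "C001", "pan": "AAACA1234B"},
    {"code": "C047", "pan": "aaaca1234b "},
    {"code": "C102", "pan": "AAACB5678C"},
    {"code": "C200", "pan": None},
]
print(duplicate_pans(customers))  # {'AAACA1234B': ['C001', 'C047']}
```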
6. Validity
Values must conform to defined patterns and permitted lists. State codes follow the two-digit GST state code list. Currency codes follow ISO 4217. Tax codes follow your chart of accounts taxonomy. An invalid state code (e.g., "33" entered as "033") will silently pass a spreadsheet but break any downstream API call to the GST portal.
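Validity rules are best expressed as exact-match lists and patterns rather than spreadsheet formatting. The sketch below holds only a subset of GST state codes for brevity (load the full official list in practice) and checks the usual 15-character GSTIN structure: two-digit state code, ten-character PAN, entity number, a "Z" in the fourteenth position, and a check character. It does not verify the check character itself.

```python
import re

# Illustrative subset of GST state codes; load the complete official list in practice.
GST_STATE_CODES = {
    "07": "Delhi", "09": "Uttar Pradesh", "19": "West Bengal", "24": "Gujarat",
    "27": "Maharashtra", "29": "Karnataka", "33": "Tamil Nadu",
}

# Structure: state code (2 digits), PAN (5 letters, 4 digits, 1 letter),
# entity number (1-9 or A-Z), 'Z', check character. The check character is not verified.
GSTIN_PATTERN = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]")

def valid_state_code(code: str) -> bool:
    """Exact match against the permitted list: '033' and '3' are both rejected."""
    return code in GST_STATE_CODES

def valid_gstin(gstin: str) -> bool:
    """Structural check plus a permitted state-code prefix."""
    return GSTIN_PATTERN.fullmatch(gstin) is not None and gstin[:2] in GST_STATE_CODES

print(valid_state_code("33"), valid_state_code("033"))  # True False
print(valid_gstin("27AAACA1234B1Z5"))                   # True
```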
Building Your Governance Operating Model
A governance operating model has three tiers. Understanding each tier's mandate prevents the common failure of everything collapsing into IT.
Tier 1 — Steering Committee
A cross-functional body including the CFO, CTO or Head of IT, Chief Compliance Officer, and heads of key functions (Sales, Operations, HR). Meets quarterly. Reviews quality metrics, approves policy changes, resolves inter-departmental data disputes, and signs off on the annual data-governance budget. Without executive sponsorship at this level, governance stalls the moment it conflicts with a short-term operational priority.
Tier 2 — Data Owners
Business leaders who are formally accountable for the quality of a domain's data. The Finance Head owns general-ledger master data and transaction data. The Sales Head owns customer master and pipeline data. Ownership means they are named in the governance policy, they review monthly quality scorecards, and they are escalation points when stewards cannot resolve issues.
Tier 3 — Data Stewards
The day-to-day executors. A steward for the customer master handles duplicate identification and merge requests, monitors completeness dashboards, enforces field-validation rules, and maintains the data dictionary. In a 500-person Indian enterprise, you might have four to six stewards — often existing analysts given a formal stewardship mandate, not new headcount.
The central Data Management Office (or its equivalent in smaller firms: a single Data Manager) sets the policies, owns the catalogue and tooling, defines quality thresholds, and runs the monthly data-quality review meeting.
DPDP Compliance: Governance as the Enforcement Mechanism
The Digital Personal Data Protection Act 2023, with its implementing rules (the DPDP Rules 2025) coming into force in phases, imposes obligations that only a functioning data-governance framework can satisfy in practice.
Consent management — where consent is the basis for processing personal data, a Data Fiduciary must obtain free, specific, informed, unconditional, and unambiguous consent, given through a clear affirmative action. That consent record (who gave it, when, for what purpose, and the exact notice presented) must be retrievable on demand. This is a data-governance problem: consent records are a dataset with an owner (Compliance team), a retention policy (at minimum, for the duration of processing plus the applicable statute of limitations), and a quality requirement (completeness = 100%, no gaps allowed).
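Treated as a dataset, consent records can be tested like any other. The schema below is a hypothetical illustration (the Act does not prescribe field names); the sketch flags incomplete records and answers the on-demand question "did this person consent to this purpose at this time?".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ConsentRecord:
    """Hypothetical consent-record schema; field names are illustrative only."""
    principal_id: str
    purpose: str
    notice_version: str          # identifies the exact notice text that was shown
    given_at: Optional[datetime]
    channel: str                 # e.g. "web", "app", "branch"
    withdrawn_at: Optional[datetime] = None

MANDATORY = ("principal_id", "purpose", "notice_version", "given_at", "channel")

def incomplete(records: list[ConsentRecord]) -> list[int]:
    """Positions of records with any mandatory field empty; the target is []."""
    return [i for i, r in enumerate(records)
            if any(getattr(r, f) in (None, "") for f in MANDATORY)]

def has_consent(records: list[ConsentRecord], principal_id: str,
                purpose: str, at: datetime) -> bool:
    """True if a consent for this purpose was in force at the given moment."""
    return any(
        r.principal_id == principal_id and r.purpose == purpose
        and r.given_at is not None and r.given_at <= at
        and (r.withdrawn_at is None or at < r.withdrawn_at)
        for r in records
    )
```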
Data-Principal rights — the Act gives individuals the right to access their data, correct inaccuracies, and request erasure. You have a defined window to respond. Satisfying a correction request requires knowing where the person's data lives across all your systems — which is precisely what a data lineage map tells you. Without lineage, your team spends days manually checking CRM, ERP, marketing automation, and HR systems, and still may miss a copy sitting in a data warehouse.
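With a lineage map, finding every copy of a person's data becomes a graph traversal rather than a manual search. The sketch assumes a hypothetical map in which each system lists the systems it feeds, and walks it breadth-first from the system where the data originates.

```python
from collections import deque

# Hypothetical lineage map: each system lists the systems it feeds.
LINEAGE = {
    "crm": ["marketing_automation", "warehouse.customers"],
    "erp": ["warehouse.invoices"],
    "hrms": ["payroll"],
    "warehouse.customers": ["bi.churn_dashboard", "ml.features"],
    "marketing_automation": [],
}

def downstream(sources: list[str], lineage: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk returning every system reachable from the sources, in visit order."""
    seen = set(sources)
    order: list[str] = []
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in lineage.get(node, []):
            if nxt not in seen:          # guards against cycles in the map
                seen.add(nxt)
                queue.append(nxt)
    return order

print(downstream(["crm"], LINEAGE))
# ['crm', 'marketing_automation', 'warehouse.customers', 'bi.churn_dashboard', 'ml.features']
```

For a correction request on a customer whose record originates in the CRM, the returned list is the checklist of systems to update, including the warehouse copy that a manual search tends to miss.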
Breach notification — a breach affecting personal data must be notified to the Data Protection Board of India (DPBI) and to each affected Data Principal. The Schedule to the DPDP Act 2023 prescribes penalties of up to Rs. 250 crore for failing to take reasonable security safeguards to prevent a breach (item 1 of the Schedule) and up to Rs. 200 crore for failing to notify the Board or affected Data Principals (item 2). A data-governance programme that has already classified data by sensitivity (what is personal, and which personal data is high-risk, such as financial, health, or children's data) dramatically reduces detection-to-notification time because the scope of affected records is immediately knowable.
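The scoping step can be sketched as a lookup against the classification catalogue. The catalogue contents, class labels, and row counts below are hypothetical; the design choice worth copying is that unclassified tables are reported separately rather than silently treated as out of scope.

```python
# Hypothetical classification catalogue: table -> (classification, row count).
CATALOGUE = {
    "crm.contacts": ("personal", 120_000),
    "crm.minor_accounts": ("personal_child", 1_500),
    "erp.item_master": ("internal", 9_000),
    "hr.employees": ("personal", 480),
}
PERSONAL_CLASSES = {"personal", "personal_child"}

def breach_scope(compromised: list[str]) -> tuple[dict[str, int], list[str]]:
    """Split compromised tables into personal-data tables (with row counts) and unclassified ones."""
    in_scope: dict[str, int] = {}
    unclassified: list[str] = []
    for table in compromised:
        entry = CATALOGUE.get(table)
        if entry is None:
            unclassified.append(table)   # unknown: investigate, do not assume out of scope
        elif entry[0] in PERSONAL_CLASSES:
            in_scope[table] = entry[1]
    return in_scope, unclassified

scope, unknown = breach_scope(["crm.contacts", "erp.item_master", "legacy.tmp_export"])
print(scope, unknown)  # {'crm.contacts': 120000} ['legacy.tmp_export']
```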
Retention and deletion — the Act requires a Data Fiduciary to erase personal data once the purpose for which it was collected is no longer being served, unless retention is necessary to comply with law. Your governance policy must document retention periods by data category, and your systems must enforce automated purging or anonymisation when those periods expire. "We keep everything forever in a data lake" is not compliant. "Customer transaction data is retained for 8 years to meet Companies Act and GST record-keeping obligations, then pseudonymised; marketing consent data is retained for the period of the relationship plus 2 years" is a governance policy.
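A retention schedule only works if something enforces it. The sketch below encodes the two example policies as a hypothetical schedule and lists records whose retention clock has run out, with the action to take. The record layout and the choice of "clock start" (transaction date, or relationship end date for marketing consent) are assumptions.

```python
from datetime import date

# Hypothetical schedule mirroring the example policy: years after the clock starts, then action.
RETENTION = {
    "customer_transaction": {"years": 8, "action": "pseudonymise"},
    "marketing_consent": {"years": 2, "action": "delete"},  # clock starts at relationship end
}

def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def due_for_purge(records: list[tuple[str, str, date]], today: date) -> list[tuple[str, str]]:
    """Return (record id, action) for each record whose retention period has expired.

    An unknown category raises KeyError, so gaps in the schedule surface loudly.
    """
    due = []
    for record_id, category, clock_start in records:
        rule = RETENTION[category]
        if add_years(clock_start, rule["years"]) <= today:
            due.append((record_id, rule["action"]))
    return due

records = [
    ("T1", "customer_transaction", date(2018, 4, 1)),
    ("T2", "customer_transaction", date(2019, 4, 1)),
    ("M1", "marketing_consent", date(2024, 3, 31)),
]
print(due_for_purge(records, date(2026, 4, 1)))  # [('T1', 'pseudonymise'), ('M1', 'delete')]
```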
The MCA Audit-Trail Obligation and What It Demands of Your Data
The Companies (Accounts) Amendment Rules 2021 introduced a requirement, in force from 1 April 2023 and reported on by statutory auditors for FY 2023-24 onwards, that every company using accounting software must maintain an audit trail (edit log) of each transaction. Specifically:
- The software must record who created, modified, or deleted each transaction entry, and when.
- The audit trail must be enabled throughout the financial year and must not have been disabled at any point.
- The logs themselves must be tamper-proof and retained for the period required under the Companies Act 2013 (generally eight years).
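One standard way to make a log tamper-evident is hash chaining: each entry stores a SHA-256 hash computed over its own contents plus the previous entry's hash, so editing any past entry breaks the check from that point on. The sketch below illustrates the principle only; it is not a substitute for the audit-trail feature of your accounting software, and the `EditLog` class and its fields are hypothetical. A chain is tamper-evident rather than tamper-proof: anyone able to rewrite the whole log could recompute it, so the latest hash should also be stored somewhere the log's users cannot modify.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(body: dict, prev_hash: str) -> str:
    """SHA-256 over the entry's canonical JSON plus the previous entry's hash."""
    payload = json.dumps(body, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class EditLog:
    """Hypothetical append-only edit log with hash chaining."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def record(self, voucher: str, user: str, action: str, before, after) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {
            "voucher": voucher, "user": user, "action": action,
            "before": before, "after": after,
            "at": datetime.now(timezone.utc).isoformat(),
        }
        self.entries.append({**body, "hash": entry_hash(body, prev)})

    def verify(self) -> bool:
        """Recompute every hash in order; any edited entry breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry_hash(body, prev) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

log = EditLog()
log.record("JV-101", "asha", "create", None, {"amount": 5000})
log.record("JV-101", "ravi", "modify", {"amount": 5000}, {"amount": 5500})
print(log.verify())                      # True
log.entries[0]["after"]["amount"] = 500  # simulate a silent back-dated edit
print(log.verify())                      # False
```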
For FY 2026-27 statutory audits, auditors must report under Rule 11(g) of the Companies (Audit and Auditors) Rules 2014 on whether the audit trail operated throughout the year, and will report adversely on any period when it was switched off or where log integrity cannot be confirmed.
From a data-governance perspective, this means:
- System logging is a policy requirement, not an IT feature to be turned on optionally.
- Log data is itself a governed dataset with a steward, a retention schedule, and a quality requirement (zero gaps, tamper-evident storage).
- Lineage for accounting entries must include the software system, the user ID, the timestamp, and the original value before modification. A journal voucher created in Tally, modified in an Excel import, and posted again is a lineage chain that must be reconstructable.
If your company uses multiple systems — say, a manufacturing ERP for production orders and a separate accounting package for GL — the audit trail obligation applies to each system, and governance must cover the interface between them.
Worked Example: Calculating the Cost of Poor Master Data
Consider a mid-size FMCG distributor — 12 branches, approximately Rs. 150 crore annual turnover, 8,000 active customer master records in their ERP.
A one-time data audit reveals:
- 1,200 records (15%) have an incorrect or missing PAN, making Form 26AS reconciliation unreliable.
- 600 records have a GST state code that does not match the billing address, causing GSTR-1 auto-population to place supplies in the wrong jurisdiction.
- 320 records are outright duplicates — the same customer exists under two codes, splitting transaction history.
Direct rework cost per quarter: Two accounts executives spend roughly 15 working days each quarter manually identifying and correcting these records. At a fully loaded cost of Rs. 40,000 per month per person, and treating 15 working days as half a month, that is Rs. 20,000 per person per quarter, or Rs. 40,000 per quarter for both: Rs. 1.6 lakh annually, just in salary cost for rework that should not exist.
Regulatory cost: Three GST show-cause notices arrive in the year relating to ITC mismatches caused by state-code errors. Each notice costs Rs. 30,000–45,000 in professional fees to respond to, plus management time. Conservative estimate: Rs. 1.2 lakh in professional fees.
Opportunity cost: Duplicate customer records mean the analytics team cannot reliably compute customer lifetime value or concentration risk. One large customer appears as two entities with separate credit limits, resulting in a combined outstanding that exceeds the intended exposure by Rs. 28 lakh — discovered only when the customer defaults.
Total visible cost of this data-quality gap: approximately Rs. 2.8 lakh per year in direct costs, before accounting for the default exposure.
A structured master-data governance programme (a part-time data steward role, automated duplicate-detection rules in the ERP, and a monthly quality scorecard reviewed by the Finance Head) would cost approximately Rs. 3–4 lakh to set up and under Rs. 1 lakh annually to run. On direct costs alone, that pays back in roughly two years; if it prevents even a tenth of the Rs. 28 lakh credit exposure, it pays back in year one.
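The arithmetic behind the example, and the payback it implies, can be checked in a few lines. The sketch takes the figures quoted in the example, assumes (as the example does) that 15 working days cost half a month's salary, uses Rs. 1 lakh as the upper bound of running cost, and assumes the programme removes the full Rs. 2.8 lakh of direct cost.

```python
# Figures from the worked example; the half-month proration is the example's own simplification.
MONTHLY_COST_PER_EXEC = 40_000       # Rs., fully loaded
EXECS = 2
QUARTERS = 4
NOTICES = 3
NOTICE_FEE_RANGE = (30_000, 45_000)  # Rs. per GST notice
NOTICE_FEES_ESTIMATE = 120_000       # the example's estimate for the year

# 15 working days treated as half a month of salary per executive per quarter.
rework_per_year = (MONTHLY_COST_PER_EXEC // 2) * EXECS * QUARTERS
annual_direct_cost = rework_per_year + NOTICE_FEES_ESTIMATE

# Sanity check: the notice estimate sits inside 3 x the quoted fee range (Rs. 0.9-1.35 lakh).
assert NOTICES * NOTICE_FEE_RANGE[0] <= NOTICE_FEES_ESTIMATE <= NOTICES * NOTICE_FEE_RANGE[1]

def payback_years(setup_cost: float, annual_run_cost: float, annual_savings: float) -> float:
    """Years to recover the setup cost from net annual savings."""
    net = annual_savings - annual_run_cost
    if net <= 0:
        raise ValueError("programme never pays back on these figures")
    return setup_cost / net

print(rework_per_year, annual_direct_cost)                          # 160000 280000
print(round(payback_years(300_000, 100_000, annual_direct_cost), 1))  # 1.7
print(round(payback_years(400_000, 100_000, annual_direct_cost), 1))  # 2.2
```

On direct costs alone, the setup and running figures quoted imply a payback of roughly 1.7 to 2.2 years; any avoided credit loss shortens it.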
Common Mistakes That Derail Governance Programmes
Starting with the tool instead of the owner
Organisations buy a data catalogue, spend three months configuring it, and then discover that no business user logs in because no one has been assigned stewardship. The tool is an accelerator; the operating model is the foundation. Assign owners before you evaluate software.
Governing everything at once
A "big bang" catalogue project that attempts to document all datasets in the enterprise simultaneously almost always stalls at around 30% completion. Start with the three to five datasets that generate the most pain — typically customer master, vendor master, and the core transaction tables used for statutory reporting. Demonstrate value, then expand.
Treating quality thresholds as binary pass/fail
A completeness threshold of "100% of PAN fields must be populated" sounds rigorous but triggers constant false alarms for genuinely optional records (e.g., walk-in customers below the PAN-quoting threshold). Set tiered thresholds: critical fields have a 99%+ target; important fields have an 85%+ target; desirable fields are monitored but not alarmed. This prevents alert fatigue that leads teams to ignore dashboards.
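Tiered thresholds translate naturally into a small rule table. The targets below are the ones given above; which field sits in which tier is a hypothetical assignment.

```python
# Tier targets from the text; None means "monitor, never alarm".
TIER_TARGETS = {"critical": 0.99, "important": 0.85, "desirable": None}

# Hypothetical field-to-tier assignment; PAN is "important" because walk-in customers may lack one.
FIELD_TIERS = {"gstin": "critical", "pan": "important", "email": "desirable"}

def evaluate(scores: dict[str, float]) -> dict[str, str]:
    """Map each field's completeness score (0.0 to 1.0) to ALERT, OK or MONITOR."""
    status = {}
    for field, score in scores.items():
        target = TIER_TARGETS[FIELD_TIERS[field]]
        if target is None:
            status[field] = "MONITOR"
        elif score < target:
            status[field] = "ALERT"
        else:
            status[field] = "OK"
    return status

print(evaluate({"gstin": 0.985, "pan": 0.91, "email": 0.40}))
# {'gstin': 'ALERT', 'pan': 'OK', 'email': 'MONITOR'}
```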
Ignoring data-at-rest in legacy systems
Governance programmes tend to focus on the modern data warehouse and forget that the HR system running on a 2012 on-premise server holds employee records subject to DPDP obligations. Every system that touches personal data is in scope, regardless of vintage.
Conflating data governance with data security
Security controls who can access data. Governance defines what the data means, how clean it must be, and what the rules for its use are. You need both; neither substitutes for the other.
Data Lineage and AI Readiness
Data lineage — the documented path from a data point's origin to every downstream consumer — has two immediate payoffs in FY 2026-27.
For regulators: When a tax officer or auditor asks "where did this number come from?", lineage lets you trace a figure in a tax return back through the warehouse transformation, the ERP posting, and the source document in under an hour rather than days of manual archaeology. This is the practical meaning of audit-trail readiness.
For AI initiatives: Every AI or machine-learning model is only as trustworthy as its training data. Without lineage, you cannot answer the questions a responsible AI governance framework demands: What data trained this model? Over what time period? Were there known quality issues in that data? Has the training data been updated since the model was deployed? What is the bias risk given the demographic composition of the training set?
For Indian enterprises launching AI-assisted underwriting, demand forecasting, or fraud detection in FY 2026-27, these are not hypothetical questions. RBI's expectations on model risk management for regulated entities, and SEBI's emerging guidance on AI and algorithmic systems, both point towards documented data and model lineage. A data-governance programme that already tracks dataset versions, transformation logic, and data-quality scores at ingestion is most of the way to meeting these expectations without building a parallel "AI governance" silo.
Structure your AI readiness checklist within governance as follows:
- Training data passport — dataset name, version, source system, quality scorecard at time of extraction, known gaps.
- Model owner — a named business owner accountable for model outputs, not just the data science team.
- Inference log — every model inference is logged with input features (suitably anonymised for personal data) and output, enabling post-hoc auditing.
- Retraining schedule — defined trigger conditions (data drift beyond a threshold, known regulatory change) that initiate retraining, with sign-off by the model owner.
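One way to make the "data drift beyond a threshold" trigger concrete is the population stability index (PSI), a drift measure commonly used in credit-model monitoring. The choice of PSI and the 0.2 cutoff are assumptions, not something the text or any regulator prescribes; the baseline bins would sit in the training data passport.

```python
import math

def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """Population stability index over matching bins of proportions (each list sums to ~1)."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must use the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # floor empty bins to avoid log(0)
        total += (a - e) * math.log(a / e)
    return total

def needs_retraining(expected: list[float], actual: list[float], threshold: float = 0.2) -> bool:
    """Trigger when drift exceeds the threshold; 0.2 is a rule of thumb, not a rule."""
    return psi(expected, actual) > threshold

baseline = [0.25, 0.25, 0.25, 0.25]   # hypothetical feature distribution at training time
print(needs_retraining(baseline, [0.26, 0.24, 0.25, 0.25]))  # False
print(needs_retraining(baseline, [0.70, 0.10, 0.10, 0.10]))  # True
```

A True result would open a retraining ticket for the model owner to sign off, rather than retrain automatically.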
Tools and Phased Implementation
The modern data-governance stack for Indian enterprises typically combines three layers:
- Data catalogue: Atlan (increasingly popular with Indian data teams), OpenMetadata (open-source), Collibra or Alation (larger enterprise). The catalogue is where business definitions, ownership, classification, and lineage are documented.
- Quality testing framework: dbt tests and Great Expectations are widely used in warehouse-centric architectures; Soda works with both cloud and on-premises data sources. These run as part of daily ETL pipelines and produce automated quality scores.
- Lineage tracker: Most modern cloud warehouses (BigQuery, Snowflake, Databricks) emit lineage metadata natively. Complement with OpenLineage-compatible tooling for cross-system coverage.
Recommended phasing for a 200–2,000 employee Indian enterprise:
- Months 1–2: Identify the five highest-risk datasets. Assign owners and stewards. Document current-state definitions in a shared spreadsheet or wiki; do not wait for catalogue procurement.
- Months 3–4: Run your first quality profile on those five datasets. Establish baseline scores. Set thresholds. Begin a defect log.
- Months 5–6: Implement automated quality tests in your ETL pipeline for the critical fields in those five datasets. Publish the first monthly quality dashboard.
- Months 7–9: Evaluate and deploy a catalogue tool. Migrate definitions and ownership from the spreadsheet. Connect quality scores to the catalogue.
- Months 10–12: Expand to the next ten datasets. Begin data-literacy sessions for business users. Review and formalise DPDP-related policies (consent, retention, breach response) within the governance framework.
Cost-calibrate the tooling choice to your scale. A 300-person company does not need an enterprise Collibra licence; OpenMetadata hosted on a cloud VM costs a fraction of that and covers most of the functional requirement.
Key Takeaways
- Data governance is a board-level obligation in FY 2026-27, not a technology project. The DPDP Act, the Companies Act audit-trail rule, and AI model risk expectations all require a functioning governance operating model with named owners, documented policies, and measurable quality.
- Assign data owners before buying tools. Without a named business owner and steward for each critical dataset, no catalogue or quality platform will sustain adoption.
- The six quality dimensions — completeness, accuracy, consistency, timeliness, uniqueness, validity — map directly to Indian compliance failure modes: GST notices, TDS late fees, audit qualifications, and AIS mismatches.
- DPDP penalties of up to Rs. 250 crore for failing to take reasonable security safeguards are not abstract. Consent management, retention schedules, and breach response are all governance artefacts that must be operationalised, not just drafted.
- Poor master data has a quantifiable rupee cost. In the worked example, a Rs. 150 crore distributor loses about Rs. 2.8 lakh a year in rework and professional fees from customer master data gaps, before counting a Rs. 28 lakh credit exposure. A modest governance programme recovers its cost within about two years on direct savings alone, and within year one if it prevents even part of such a loss.
- Data lineage is the connective tissue between DPDP breach response, MCA audit-trail compliance, and AI model documentation. Build it once; it satisfies multiple regulators.
- Phase your rollout. Start with five high-risk datasets, prove value with a monthly quality scorecard, then expand. A phased programme that actually runs beats an ambitious big-bang project that stalls.