@shreeshd-tn
Contributor

What does this PR do?

Implements the new address class using a context-based approach. It also makes slight changes to ordinals for English transliterations and includes performance improvements.

Before your PR is "Ready for review"

Pre checks:

  • Have you signed your commits? Use git commit -s to sign.
  • Do all unittests finish successfully before sending PR?
    1. pytest or (if your machine does not have GPU) pytest --cpu from the root folder (given you marked your test cases accordingly @pytest.mark.run_only_on('CPU')).
    2. Sparrowhawk tests bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
  • If you are adding a new feature: have you added test cases for both pytest and Sparrowhawk here?
  • Have you added __init__.py for every folder and subfolder, including data folder which has .TSV files?
  • Have you followed the CodeQL results and removed unused variables and imports (the report is at the bottom of the PR in the GitHub review box)?
  • Have you added the correct license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
  • If you copied nemo_text_processing/text_normalization/en/graph_utils.py your header's second line should be Copyright 2015 and onwards Google, Inc.. See an example here.
  • Remove import guards (try import: ... except: ...) if not already done.
  • If you added a new language or a new feature please update the NeMo documentation (lives in different repo).
  • Have you added your language support to tools/text_processing_deployment/pynini_export.py?

PR Type:

  • New Feature
  • Bugfix
  • Documentation
  • Test

If you haven't finished some of the above items you can still open "Draft" PR.

shreeshd-tn and others added 30 commits October 9, 2025 11:30
Signed-off-by: shreeshd-tn <[email protected]>
…#258)

* Future Implementations for classes - Measure, Money, and Date

Signed-off-by: Namrata Gachchi <[email protected]>

* Resolved the conflicts with mm_yyyy and date ranges and added the previously removed failing test cases.

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed the unused empty string implementation

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fixes for the tagger files

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* reformatted decimal final graph

Signed-off-by: Namrata Gachchi <[email protected]>

* incorporated the suggestion for decimal graph

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Century implementations

Signed-off-by: Namrata Gachchi <[email protected]>

* Working on the yyyy format for the date class

Signed-off-by: Namrata Gachchi <[email protected]>

* reverted yyyy code

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* working on future implementations

Signed-off-by: Namrata Gachchi <[email protected]>

* working on improving the date class accuracy

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added year prefix for the date class

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* working on the comma cases for date class

Signed-off-by: Namrata Gachchi <[email protected]>

* minor fixes

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* implemented mixed fractions

Signed-off-by: Namrata Gachchi <[email protected]>

* rectified the test case

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* working on quarterly measurements

Signed-off-by: Namrata Gachchi <[email protected]>

* reformatted the prefixes and suffixes for date tagger class

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* replaced text tag with era tag for the date class

Signed-off-by: Namrata Gachchi <[email protected]>

* Removed the text tag reference from date class verbalizer

Signed-off-by: Namrata Gachchi <[email protected]>

---------

Signed-off-by: Namrata Gachchi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Signed-off-by: Mariana <[email protected]>
…IDIA#310)

* Staging hi tn (NVIDIA#271)

* Future Implementations for classes - Measure, Money, and Date (NVIDIA#258)

* update jenkins cache

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Potential fix for code scanning alert no. 821: Unused local variable

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Signed-off-by: Mariana <[email protected]>

---------

Signed-off-by: Namrata Gachchi <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Co-authored-by: Namrata Gachchi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Future Implementations

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Namrata Gachchi <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…A#320)

* telephone class integration

(cherry picked from commit a7c9adf)
Signed-off-by: shreeshd-tn <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: shreeshd-tn <[email protected]>

* Updated date in Jenkins file to the PR creation date

Signed-off-by: shreeshd-tn <[email protected]>

* Jenkins file date change

Signed-off-by: shreeshd-tn <[email protected]>

* Trying today's date

Signed-off-by: shreeshd-tn <[email protected]>

* improved country code coverage + some test cases

Signed-off-by: shreeshd-tn <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Ignore test generated files

Signed-off-by: shreeshd-tn <[email protected]>

* Improved landline detection and added edge test cases for proper coverage

Signed-off-by: shreeshd-tn <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Deleted gitignore file

Signed-off-by: shreeshd-tn <[email protected]>

---------

Signed-off-by: shreeshd-tn <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: shreeshd-tn <[email protected]>
Signed-off-by: shreeshd-tn <[email protected]>
Signed-off-by: shreeshd-tn <[email protected]>
Signed-off-by: shreeshd-tn <[email protected]>
Signed-off-by: shreeshd-tn <[email protected]>
* Future Implementations for classes - Measure, Money, and Date (NVIDIA#258)

* Hindi TN Future Implementations 2.0. - Fraction, Measure and Time (NVIDIA#310)

* Hindi TN 2.0 - Telephone class integration from staging branch (NVIDIA#320)

* Rebase Hindi TN update: Fix Jenkinsfile for CI (NVIDIA#325) (NVIDIA#331)

* Staging hi tn (NVIDIA#271)

* Future Implementations for classes - Measure, Money, and Date (NVIDIA#258)

Signed-off-by: shreeshd-tn <[email protected]>

* Fix Jenkinsfile for CI (NVIDIA#325)

* Fix Jenkinsfile for CI

Signed-off-by: Anand Joseph <[email protected]>

* Fix requirements for test

Signed-off-by: Anand Joseph <[email protected]>

* Update paths and docker

Signed-off-by: Anand Joseph <[email protected]>

* Fix docker name

Signed-off-by: Anand Joseph <[email protected]>

* Fix click version

Signed-off-by: Anand Joseph <[email protected]>

* Change path of grammars for sparrowhawk tests

Signed-off-by: Anand Joseph <[email protected]>

* Update paths in sh_test.sh

Signed-off-by: Anand Joseph <[email protected]>

* Update paths

Signed-off-by: Anand Joseph <[email protected]>

* Revert paths

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: shreeshd-tn <[email protected]>

* Future Implementations for classes - Measure, Money, and Date (NVIDIA#258)

Signed-off-by: shreeshd-tn <[email protected]>

* update jenkins cache

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: shreeshd-tn <[email protected]>

* Potential fix for code scanning alert no. 821: Unused local variable

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: shreeshd-tn <[email protected]>

* Hindi TN Future Implementations 2.0. - Fraction, Measure and Time (NVIDIA#310)

* Staging hi tn (NVIDIA#271)

* Future Implementations for classes - Measure, Money, and Date (NVIDIA#258)

* Future Implementations for classes - Measure, Money, and Date

Signed-off-by: Namrata Gachchi <[email protected]>

* Resolved the conflicts with mm_yyyy and date ranges and added the previously removed failing test cases.

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed the unused empty string implementation

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fixes for the tagger files

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* reformatted decimal final graph

Signed-off-by: Namrata Gachchi <[email protected]>

* incorporated the suggestion for decimal graph

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Century implementations

Signed-off-by: Namrata Gachchi <[email protected]>

* Working on the yyyy format for the date class

Signed-off-by: Namrata Gachchi <[email protected]>

* reverted yyyy code

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* working on future implementations

Signed-off-by: Namrata Gachchi <[email protected]>

* working on improving the date class accuracy

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added year prefix for the date class

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* working on the comma cases for date class

Signed-off-by: Namrata Gachchi <[email protected]>

* minor fixes

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* implemented mixed fractions

Signed-off-by: Namrata Gachchi <[email protected]>

* rectified the test case

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* working on quarterly measurements

Signed-off-by: Namrata Gachchi <[email protected]>

* reformatted the prefixes and suffixes for date tagger class

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* replaced text tag with era tag for the date class

Signed-off-by: Namrata Gachchi <[email protected]>

* Removed the text tag reference from date class verbalizer

Signed-off-by: Namrata Gachchi <[email protected]>

---------

Signed-off-by: Namrata Gachchi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update jenkins cache

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Potential fix for code scanning alert no. 821: Unused local variable

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Signed-off-by: Mariana <[email protected]>

---------

Signed-off-by: Namrata Gachchi <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Co-authored-by: Namrata Gachchi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Future Implementations

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Namrata Gachchi <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Signed-off-by: shreeshd-tn <[email protected]>

* Hindi TN 2.0 - Telephone class integration from staging branch (NVIDIA#320)

* telephone class integration

(cherry picked from commit a7c9adf)
Signed-off-by: shreeshd-tn <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: shreeshd-tn <[email protected]>

* Updated date in Jenkins file to the PR creation date

Signed-off-by: shreeshd-tn <[email protected]>

* Jenkins file date change

Signed-off-by: shreeshd-tn <[email protected]>

* Trying today's date

Signed-off-by: shreeshd-tn <[email protected]>

* improved country code coverage + some test cases

Signed-off-by: shreeshd-tn <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Ignore test generated files

Signed-off-by: shreeshd-tn <[email protected]>

* Improved landline detection and added edge test cases for proper coverage

Signed-off-by: shreeshd-tn <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Deleted gitignore file

Signed-off-by: shreeshd-tn <[email protected]>

---------

Signed-off-by: shreeshd-tn <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: shreeshd-tn <[email protected]>

* Ran tests successfully and updated cache date to today

Signed-off-by: shreeshd-tn <[email protected]>

---------

Signed-off-by: Namrata Gachchi <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: shreeshd-tn <[email protected]>
Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Namrata Gachchi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <[email protected]>
Co-authored-by: Mariana Graterol Fuenmayor <[email protected]>

* Hindi TN: Ordinal Implementation (NVIDIA#343)

* Adding ordinals into staging_hi_tn

Signed-off-by: shreeshd-tn <[email protected]>

* Ordinal Cleanup

Signed-off-by: shreeshd-tn <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Review changes

Signed-off-by: shreeshd-tn <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: shreeshd-tn <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Hindi TN: Main to staging Fix + Cardinals (leading zero update) (NVIDIA#348)

* Staging hi tn (NVIDIA#271)

* Future Implementations for classes - Measure, Money, and Date (NVIDIA#258)

* Future Implementations for classes - Measure, Money, and Date

Signed-off-by: Namrata Gachchi <[email protected]>

* Resolved the conflicts with mm_yyyy and date ranges and added the previously removed failing test cases.

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed the unused empty string implementation

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fixes for the tagger files

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* reformatted decimal final graph

Signed-off-by: Namrata Gachchi <[email protected]>

* incorporated the suggestion for decimal graph

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Century implementations

Signed-off-by: Namrata Gachchi <[email protected]>

* Working on the yyyy format for the date class

Signed-off-by: Namrata Gachchi <[email protected]>

* reverted yyyy code

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* working on future implementations

Signed-off-by: Namrata Gachchi <[email protected]>

* working on improving the date class accuracy

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added year prefix for the date class

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* working on the comma cases for date class

Signed-off-by: Namrata Gachchi <[email protected]>

* minor fixes

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* implemented mixed fractions

Signed-off-by: Namrata Gachchi <[email protected]>

* rectified the test case

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* working on quarterly measurements

Signed-off-by: Namrata Gachchi <[email protected]>

* reformatted the prefixes and suffixes for date tagger class

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* replaced text tag with era tag for the date class

Signed-off-by: Namrata Gachchi <[email protected]>

* Removed the text tag reference from date class verbalizer

Signed-off-by: Namrata Gachchi <[email protected]>

---------

Signed-off-by: Namrata Gachchi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update jenkins cache

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Potential fix for code scanning alert no. 821: Unused local variable

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Signed-off-by: Mariana <[email protected]>

---------

Signed-off-by: Namrata Gachchi <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Co-authored-by: Namrata Gachchi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Fix Jenkinsfile for CI (NVIDIA#325)

* Fix Jenkinsfile for CI

Signed-off-by: Anand Joseph <[email protected]>

* Fix requirements for test

Signed-off-by: Anand Joseph <[email protected]>

* Update paths and docker

Signed-off-by: Anand Joseph <[email protected]>

* Fix docker name

Signed-off-by: Anand Joseph <[email protected]>

* Fix click version

Signed-off-by: Anand Joseph <[email protected]>

* Change path of grammars for sparrowhawk tests

Signed-off-by: Anand Joseph <[email protected]>

* Update paths in sh_test.sh

Signed-off-by: Anand Joseph <[email protected]>

* Update paths

Signed-off-by: Anand Joseph <[email protected]>

* Revert paths

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>

* Comma bugfix for En electronics (NVIDIA#332)

* fix bug with commas and electronics

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update jenkins

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update Jenkinsfile (NVIDIA#341)

Only mount TestData from path

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] pre-commit suggestions (NVIDIA#335)

updates:
- [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](pre-commit/pre-commit-hooks@v5.0.0...v6.0.0)
- [github.com/PyCQA/flake8: 7.2.0 → 7.3.0](PyCQA/flake8@7.2.0...7.3.0)
- [github.com/PyCQA/isort: 6.0.1 → 6.1.0](PyCQA/isort@6.0.1...6.1.0)
- https://github.com/psf/black → https://github.com/psf/black-pre-commit-mirror
- [github.com/psf/black-pre-commit-mirror: 25.1.0 → 25.9.0](psf/black-pre-commit-mirror@25.1.0...25.9.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Cardinal: Leading zero changes

Signed-off-by: shreeshd-tn <[email protected]>

---------

Signed-off-by: Namrata Gachchi <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Signed-off-by: shreeshd-tn <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Namrata Gachchi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <[email protected]>

* debug file issue

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* debug ordinals error

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* ci debug

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* revert to original suffixes for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* CI fix: Missing init file (NVIDIA#350)

Signed-off-by: shreeshd-tn <[email protected]>

* HI TN: Staging branch cleanup for main merge (NVIDIA#355)

* Review changes - cleanup

Signed-off-by: shreeshd-tn <[email protected]>

* Missed cleanup

Signed-off-by: shreeshd-tn <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: shreeshd-tn <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Cache date change (NVIDIA#356)

* Cache date change

Signed-off-by: shreeshd-tn <[email protected]>

* Cache date changes again

Signed-off-by: shreeshd-tn <[email protected]>

---------

Signed-off-by: shreeshd-tn <[email protected]>

---------

Signed-off-by: Namrata Gachchi <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: shreeshd-tn <[email protected]>
Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: Namrata Gachchi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: shreeshd-tn <[email protected]>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: shreeshd-tn <[email protected]>
@shreeshd-tn marked this pull request as ready for review on November 12, 2025 16:36
from nemo_text_processing.text_normalization.hi.utils import get_abs_path

EN_TO_HI_DIGIT_MAPPINGS = [
("0", "०"),
Collaborator

can this be a TSV file?
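
A minimal sketch of what the TSV-backed variant could look like, assuming a two-column tab-separated file (ASCII digit, Devanagari digit). The helper name `load_digit_mappings` and the in-memory sample data are hypothetical and not taken from the PR; in the repository the file would presumably be resolved through `get_abs_path`.

```python
import csv
import io


def load_digit_mappings(tsv_text):
    """Parse two-column TSV text into a list of (ascii_digit, hindi_digit) tuples."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # Skip blank or malformed rows that do not have both columns.
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


# Sample content standing in for a data file (hypothetical).
# Devanagari digits occupy the code points U+0966 to U+096F.
SAMPLE_TSV = "".join(f"{i}\t{chr(0x0966 + i)}\n" for i in range(10))

EN_TO_HI_DIGIT_MAPPINGS = load_digit_mappings(SAMPLE_TSV)
```

Keeping the pairs in a data file would let other grammars reuse the same mapping without importing a Python constant.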

# Hindi-specific: Convert COMMA marker back to actual comma
# (Used in address verbalization to avoid "sil" token in Sparrowhawk)
if self.lang == "hi":
output = output.replace(" COMMA ", ", ")
Collaborator

did you add test cases for this specific change?
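
A sketch of how the marker replacement could be covered by a unit test. `restore_comma_marker` is a hypothetical standalone helper that mirrors the snippet under review; it is not a function from the repository, and the sample strings are illustrative only.

```python
def restore_comma_marker(output: str, lang: str) -> str:
    """Convert the ' COMMA ' marker back to ', ' for Hindi output only."""
    if lang == "hi":
        output = output.replace(" COMMA ", ", ")
    return output


def test_restore_comma_marker():
    # Hindi output: the marker becomes a comma followed by a space.
    assert restore_comma_marker("मकान COMMA गली", "hi") == "मकान, गली"
    # Other languages are left untouched.
    assert restore_comma_marker("house COMMA street", "en") == "house COMMA street"
    # A marker without a trailing space is not matched by the replacement.
    assert restore_comma_marker("मकान COMMA", "hi") == "मकान COMMA"


test_restore_comma_marker()
```

The last assertion documents an edge case: because the pattern includes the surrounding spaces, a marker at the very end of the string would pass through unchanged.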

# replace non breaking space with breaking space
denorm_pred=$(echo $written | normalizer_main --config=sparrowhawk_configuration.ascii_proto 2>&1 | tail -n 1 | sed 's/\xC2\xA0/ /g')
# replace non breaking space with breaking space, and convert COMMA back to comma
denorm_pred=$(echo $written | normalizer_main --config=sparrowhawk_configuration.ascii_proto 2>&1 | tail -n 1 | sed 's/\xC2\xA0/ /g' | sed 's/ \+COMMA /, /g')
Collaborator

we shouldn't replace this for tests. the graph needs to handle it

HI_TWO_POINT_FIVE = "२.५" # 2.5
HI_DECIMAL_25 = ".२५" # .25
HI_DECIMAL_75 = ".७५" # .75
HI_POINT_FIVE = ".५"
Collaborator

should these be part of our utils and imported here so they can be leveraged by other classes if needed?

expanded_mapping.append([x, y])
en_context_words.append(x)
if x and x[0].isalpha():
capitalized = x[0].upper() + x[1:]
Collaborator

iirc we have a function to get all possible capitalization options. can you check how this is handled for English?

Contributor Author

I found the capitalized_input_graph in the English utils file. I have updated the code to use that
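
For reference, the manual approach this thread replaces can be sketched in plain Python as follows. The function name `expand_with_capitalized` and the sample mapping are hypothetical, and the step after computing `capitalized` is an assumption, since the reviewed snippet is cut off at that line.

```python
def expand_with_capitalized(mapping):
    """Return the mapping plus a first-letter-capitalized variant of each alphabetic key."""
    expanded = []
    for x, y in mapping:
        expanded.append([x, y])
        if x and x[0].isalpha():
            capitalized = x[0].upper() + x[1:]
            # Assumed continuation: add the variant only when it differs from the key.
            if capitalized != x:
                expanded.append([capitalized, y])
    return expanded


# Illustrative data only, not taken from the PR's TSV files.
sample = [["road", "रोड"], ["12", "१२"]]
expanded = expand_with_capitalized(sample)
```

This only covers first-letter capitalization, which is one reason a shared graph-level utility is the more maintainable choice.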

# " , " -> " COMMA " so Sparrowhawk doesn't output "sil"
# Then sed script converts "COMMA" back to ","
comma_marker = pynini.cdrewrite(
pynini.cross(" , ", " COMMA "),
Collaborator

if we handle this here, why does it have to be processed in normalize or in the output?

Contributor Author

these were workaround fixes since I couldn't find the issue at the time, I've now removed all of them and fixed it in the tagger itself

@shreeshd-tn changed the title from "Hindi TN: Address Class (context based)" to "Hindi TN: Address Class (context + structural)" on Dec 15, 2025
Signed-off-by: shreeshd-tn <[email protected]>
3 participants