Process for Estimating Tax
A peek into tax estimation process for redeeming units that are outside STCG zone
Intro
In the previous chapter, we've seen how to compute total number of LTCG taxation eligible units, if equity transaction history has been provided.
Picking up from where we’d left off in in part 1, we can start with planning for potential tax computations
Planning for Tax Computation
At this point, we’ve computed units outside of STCG zone (purchased more than a year ago, that’s still in investor’s portfolio, eligible for LTCG taxation if redeemed). STCG stands for Short Term Capital Gain.
We need to compute these:
tax liability, if investor were to sell all these units.
total number of units present, where net tax liability is equal to or below 1L, if investors were to sell those
How to compute Tax?
Tax is computed on realized gain, or booked profits. Realized gain, can be written as SUM(selling price of units - purchase price of units)
.
Say, an investor purchased
U1
units, on datet1
, at priceP1
.U2
units, on datet2
, at priceP2
.U3
units, on datet3
, at priceP3
.
Assume selling price (NAV on day of selling) is P
.
This investor has (U1 + U2 + U3)
available for selling. Realized gain is:
Which can be written as:
In previous section, we’ve computed effective value of (U1 + U2 + U3)
, for any number of transactions.
We know P1
, P2
, P3
; for all those transactions.
Only missing information is selling price, P
.
What we need, is selling price as on today (the given day when investor is attempting the sell transaction).
In other words, we need a way to get access to latest NAV of these funds.
Because this NAV would change every day (equity mutual funds update NAVs at the end of the day, every business day); it’d be better to programmatically access it, like we used GOOGLEFINANCE()
for historic data.
GOOGLEFINANCE()
can be one option here. But as we’ve learned in the previous section, it has issues (remember seeing #NA
all over the place?)
Instead, we’d go to the source!
Getting the latest NAV
Every AMC has a regulatory requirement to update their NAV daily with AMFI (Association of Mutual Funds in India). And AMFI makes this data available for parsing, in a CSV format, on their website.
This is the link: https://www.amfiindia.com/spages/NAVAll.txt
Remember when you were thinking learning SPLIT()
TRANSPOSE()
would’ve been of no use? Well, as you’re about to see, not all CSV data come in text files.
Google Sheets has an IMPORTDATA()
function, that’d do just fine. We provide it a URL for the AMFI NAV endpoint, and it’d fetch that latest raw NAV data.
Then we can split that raw NAV feed into cells, as we see fit.
Finding your fund
This AMFI URL has ~20k entries. To be able to find the funds as in the CSV transaction dump, our excel matchers would be fine.
But the problem is this: the name of fund can appear differently in that list, as opposed to how it appears in the CSV.
Finding fund by matching strings would lead to misses.
A better option would be to find fund the way it’s usually done - by a unique identifier.
This unique identifier is already present in the AMFI NAV feed URL: ISIN
ISIN (International Securities Identification Numbers) is unique across global securities market. Not only can there be no two funds with same ISIN in India; there can be no two securities with duplicate ISINs in regulated markets, across the world!
We have to find ISIN for each of the funds in CSV transaction, manually, and add a column in the second sheet, next to each fund, to add its ISIN.
List of ISIN is also present in AMFI NAV feed.
This is actually a straight-forward, one-time exercise. It should also make it clear to the reader why having too many funds to portfolio can be cumbersome.
Execution
Adding ISIN column
Let’s start by adding the ISIN column first
In the second sheet (not the one named data
), add a column next to fund’s name. You can use Insert Column.
Insert ISINs as follows (copy-paste from below):
Final results, should look like this
Parag Parikh Flexi Cap Fund - Regular Plan - Growth
INF879O01019
Invesco India Multicap Fund - Growth
INF205K01DN2
Axis Focused 25 Fund - Regular Plan - Growth
INF846K01CH7
Mirae Asset Large Cap Fund - Regular Plan - Dividend
INF769K01036
Mirae Asset Emerging Bluechip Fund - Regular Growth
INF769K01101
HDFC Mid-Cap Opportunities Fund - Growth
INF179K01CR2
Franklin India Smaller Companies Fund - Growth
INF090I01569
Mirae Asset Large Cap Fund - Regular Plan - Growth
INF769K01010
Franklin India Flexi Cap Fund - Growth
INF090I01239
Franklin India Bluechip Fund - Growth
INF090I01171
Franklin India Prima Fund - Growth
INF090I01809
ICICI Prudential Bluechip Fund - Growth
INF109K01BL4
DSP Tax Saver Fund - Regular Plan - Growth
INF740K01185
Fetch and Process Latest NAV from AMFI
Follow these steps
Create a new sheet, name it AMFI NAV (Raw)
In a cell, enter the AMFI URL:
https://www.amfiindia.com/spages/NAVAll.txt
Invoke
IMPORTDATA()
function, and pass the cell ID of the URL above, as argumentGiven it few seconds, it’d fetch latest NAV from AMFI for all mutual funds. It should add some 18k+ rows.
This data is in CSV format, separated by semicolon, per line.
Create another new sheet, name it AMFI NAV (Processed)
In this sheet, add somewhere this formula
=IF(ISBLANK(<cell ID>), , SPLIT(<cell ID>, ";")
, where<cell ID>
points to a row of semicolon-separated data, in the sheet AMFI NAV (Raw).ISBLANK(<cell ID>)
checks if the cell is empty. As you can see from the data, there can be plenty of empty cells. InvokingSPLIT()
on these, would result in an error, it’d show up as a#VALUE
error on spread sheet.IF(condition, arg1, arg2)
checks ifcondition
is true for a cell (in this case, if the cell is empty). If so, the first argumentarg1
in inserted in the result cell (not the cell on which condition is being evaluated). Otherwise,arg2
is inserted.=IF(ISBLANK(<cell ID>), , SPLIT(<cell ID>, ";")
basically means if<cell Id>
is empty, don’t put anything, else split the content of the cell by semicolon, and put the results in cells.Use autofill drag to fill out all 18k+ rows in new sheet.
Technically, you won't need to fill out all 18k+ rows. As long as you fill out enough number of rows, such that all funds in that other sheet appear in those rows, it's good enough.
Refer to the following video(s) for reference
Your sheets would now have latest available NAV from all mutual funds, in a nice table format.
We've introduced a few new Google Sheets functions in this section:
Computation of Value as on Today
In this section, we’d compute net redemption value of units older than 1 year.
This is total available units older than one year (already computed earlier), multiplied by latest NAV.
We can look up latest NAV using VLOOKUP
with ISIN as search query.
VLOOKUP()
stands for vertical lookup or row-wise lookup, across columns. It’s a way of searching for matching data.
There’s also HLOOKUP()
and LOOKUP()
; but for our use case, VLOOKUP()
would work fine for now.
VLOOKUP() documentation | archive.org link | archive.is link
Follow these steps:
Come back to the main sheet, which has information on available units and sold units.
Add a column at the end, titled Latest Value
Invoke
VLOOKUP()
for the top-most row as follows:1st argument,
<cell ID that has ISIN for the fund>
is ISIN, but we’re doing it this way, to be able to drag and auto-fill.2nd argument,
<range of data from processed NAV sheet>
is a bit interesting.VLOOKUP()
does not work if the column we are searching for is not the first column of a range. In our processed NAV sheet, the ISIN is the second column.We have to pick a range, leaving out the first column.
This range is a combination of rows and cells. For instance, if your processed NAV sheet starts at
B3
withScheme Code
text in it; then, our range would be something likeC9:F18123
(leaving out the last, date column).Since we’d be using drag & auto-fill, we can lock these cells:
$C$9:$F$18123
.3rd argument,
<column index, that has NAV for the fund>
, is column index of the column for the result.We’re looking for NAV when ISIN matches, which is the 4th column in our range (ISIN is the first column in this range). Hence, we can put this value as 4.Final argument, is
false
. It tellsVLOOKUP()
that the dataset in the range is not sorted. Our data set is not sorted on the basis of ISIN (sorted alphabetically on AMC name, actually), hencefalse
is the right value to pass here.
Refer to this video to follow along steps:
Final outcome in the newly added column, should be similar to these screenshots:
Depending on when you're looking at these, the exact value might differ. Because it depends on latest NAV as on that date. This computation was done with NAV of 22nd March, 2021
We did not explicitly set out to do this, but it's trivial to take a CSV and compute latest portfolio valuation! You already have total number of units investor has in portfolio for each fund, and latest NAV for each fund.
Adding up total cost of purchase
On the surface, this sounds easier than previous steps.
All one has to do, is go and sum on units and NAV column, where date is much before the date 1 year ago.
In fact, investment amount is provided in the CSV transaction dump, so multiplication isn’t even needed (invested amount = unit x NAV on that date).
Except, it’s not that simple.
We have realized that even if units were acquired more than one year ago, doesn’t mean some of those weren’t sold in last 1 year, or before.
Remember that available units is NOT sum of all units purchased more than a year ago. We had to subtract the sold units (sold at any point in time, not just more than a year ago) to arrive at LTCG-eligible units.
In other words, we need to find unit purchase price for only these units. Not that of all units.
Another way to look at it: keep adding purchase price, until total number of units added is same as LTCG-eligible units.
An Example
An example to illustrate this idea:
Say, an investor is buying 10 units every month, from 2018 January, to 2020 December. That's a total 36 month period.
At the end of December 2020, units purchased between January 2018 and December 2019, are eligible for LTCG.
Now assume that in 2020, at some point, this investor has also sold 54 units.
Then, at that point, they’ve units available for LTCG-eligible redemption.
What were the purchase price(s) for these 186 units?
More importantly, when were these 186 units acquired?
We know that investor purchased 10 units every month, from Jan 2018 to Dec 2020. Investor has also sold 54 units some time in 2020.
Redemption works in a FIFO (First In First Out) manner in folio. Meaning, earliest purchased units are to be redeemed first.
We first need to account for 54 units that were sold. These units were acquired as follows -
January 2018
10
February 2018
10
March 2018
10
April 2018
10
May 2018
10
June 2018
4
Once we’ve depleted the earliest-purchased-earliest-sold scheme for already redeemed units, we can start to account for remaining 186 units.
These 186 LTCG-eligible units were acquired as follows:
June 2018
6
July 2018
10
August 2018
10
September 2018
10
October 2018
10
November 2018
10
December 2018
10
January 2019
10
February 2019
10
March 2019
10
April 2019
10
May 2019
10
June 2019
10
July 2019
10
August 2019
10
September 2019
10
October 2019
10
November 2019
10
December 2019
10
Total
When computing purchase price for these 186 units, we need to use purchase price from these months in the table; and not purchase price from January 2018.
In other words, this is going to be the most complex part of the computation.
To recap, we compute purchase cost / acquisition cost as follows:
Account for existing sell transactions, depleting units, starting from initial purchase transaction; until total units add up to all units sold so far.
Start adding up units from this date, till it adds up to total available LTCG-eligible units.
A Special Case
Luckily, we have a special scenario on our hands.
In our case, if you notice the units sold and LTCG-eligible units in the column, investor has either never sold any units from their current holdings in a specific fund; or they’ve completely sold off all their units in that specific fund.
Which means, for funds where there are LTCG eligible units; we do NOT need to account for previous sell transactions.
In other words, one can start adding up the purchase transaction costs from the very first purchase.
This won’t be the case for a typical CSV transaction dumps for most investors.
Grandfathering LTCG Computations
We’ve so far carefully avoided discussing the grandfathering clause in equity LTCG computation.
Union budget 2018, defined purchase price of equity, for LTCG computation as higher of price as on 31st January 2018, or actual purchase price - if units were bought before this date.
To factor this in, we’d need information on NAV price as on 31st January, for each of the funds present in the CSV transaction dump, which has LTCG-eligible units.
The AMFI URL for latest NAV won’t do.
Instead, we can use this URL for fetching historic data: http://portal.amfiindia.com/DownloadNAVHistoryReport_Po.aspx?mf=<AMC Code>&tp=1&frmdt=31-Jan-2018&todt=31-Jan-2018
This URL returns data in CSV format as before. However, it’s specific to each AMC, identified by an AMC code.
An AMC code is a number, between 1 and 100. However, not all numbers between 1 and 100 correspond to a valid AMC code. You’d have to try out each number, and see which ones result in valid URLs that return data.
For example, Nippon AMC is identified by AMC code 21. This URL would return NAV history of all Nippon AMC funds, as on 31st January 2018: http://portal.amfiindia.com/DownloadNAVHistoryReport_Po.aspx?mf=21&tp=1&frmdt=31-Jan-2018&todt=31-Jan-2018
You can follow similar steps as earlier to incorporate data from this URL into your calculations.
An Exercise for You
We won’t be covering the last bit of calculation here, as it’d be quite complex.
It’s left as an exercise to the reader; as it won’t be of much value to do these with Excel / Spreadsheet, for most investors.
To achieve such conditional summing-up, one would most likely need to use QUERY()
function, writing something akin to SQL queries.
QUERY() documentation | archive.org link | archive.is link
It’d also be much harder to update or understand. for another person going through same excel sheet or spreadsheet; which would have such esoteric constructs.
At that point, using a for-loop equivalent in commonly used programming languages (Python / Node / Java / Ruby / Go etc.) would be much easier to reason about.
Another option would be to use GApps Script (archive.org link | archive.is link), in Google Sheets; which is basically JavaScript to program your sheets. However, we won’t be covering that here.
Wrapping Up
Phew! We covered a lot of ground in these three chapters.
We can only hope power of spreadsheets are becoming more and more apparent to you. Simultaneously, you've also started seeing the rough edges.
Excel / spreadsheets with complex formulas can be hard to maintain - it's tough to understand and update if needed.
We'd gradually address these in upcoming chapters and series.
Last updated