Deriving the gradient of categorical cross entropy is fairly straightforward, and working through it can help in understanding the loss function better.
Let $K$ be the number of categories, $t_j$ the target probability of category $j$ (a one-hot vector in the usual case), $z_j$ the logits, and $\hat{y_j}$ the corresponding softmax outputs. The loss is
$$ L = - \sum_{j=1}^K t_j \log \hat{y_j}. $$
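Before differentiating, a minimal NumPy sketch of the loss itself may help make the notation concrete (the function name `categorical_cross_entropy` and the small `eps` guard against $\log 0$ are my own choices, not from any particular library):

```python
import numpy as np

def categorical_cross_entropy(t, y_hat, eps=1e-12):
    """L = -sum_j t_j * log(y_hat_j) for a single sample.

    t     : target probabilities of length K (one-hot in the usual case)
    y_hat : predicted probabilities of length K (non-negative, sum to 1)
    eps   : small constant to avoid log(0)
    """
    return -np.sum(t * np.log(y_hat + eps))

# Example with K = 3 categories, true class = 1
t = np.array([0.0, 1.0, 0.0])
y_hat = np.array([0.2, 0.7, 0.1])
print(categorical_cross_entropy(t, y_hat))  # -log(0.7) ≈ 0.357
```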
Using the chain rule, we can write the partial derivative of the loss with respect to the logit $z_i$ as follows:
$$ \frac{\partial L}{\partial z_i} = \sum_j \frac{\partial L}{\partial \hat{y_j}} \frac{\partial \hat{y_j}}{\partial z_i}. $$
We can substitute $\frac{\partial L}{\partial \hat{y_j}}$:
$$ \frac{\partial L}{\partial \hat{y_j}} = \frac{\partial \left (- \sum_{k=1}^K t_k \log \hat{y_k} \right)}{\partial \hat{y_j}} = - \frac{t_j}{\hat{y_j}}. $$
Then $\frac{\partial L}{\partial z_i}$ becomes:
$$ \frac{\partial L}{\partial z_i} = \sum_j - \frac{t_j}{\hat{y_j}} \frac{\partial \hat{y_j}}{\partial z_i}. $$
We can also substitute $\frac{\partial \hat{y_j}}{\partial z_i}$, treating the cases $i = j$ and $i \neq j$ separately:
$$ \frac{\partial \hat{y_j}}{\partial z_i} = \frac{\partial \left (\frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} \right )}{\partial z_i}. $$
If $i=j$, applying the quotient rule gives:
$$ \frac{\partial \hat{y_i}}{\partial z_i} = \frac{\partial \left (\frac{e^{z_i}}{\sum_{k=1}^K e^{z_k}} \right )}{\partial z_i} = \frac{e^{z_i}\sum_{k=1}^K e^{z_k} - e^{z_i} e^{z_i}}{\left (\sum_{k=1}^K e^{z_k} \right )^2} = \frac{e^{z_i}\sum_{\substack{k=1 \\ k \neq i}}^K e^{z_k}}{\left (\sum_{k=1}^K e^{z_k} \right )^2} = \frac{\sum_{\substack{k=1 \\ k \neq i}}^K e^{z_k}}{\sum_{k=1}^K e^{z_k}} \, \frac{e^{z_i}}{\sum_{k=1}^K e^{z_k}} = \left ( 1- \frac{e^{z_i}}{\sum_{k=1}^K e^{z_k}} \right ) \frac{e^{z_i}}{\sum_{k=1}^K e^{z_k}}, $$
$$ \frac{\partial \hat{y_i}}{\partial z_i} = \hat{y_i} (1-\hat{y_i}). $$
Otherwise, if $i \neq j$, only the denominator depends on $z_i$:
$$ \frac{\partial \hat{y_j}}{\partial z_i} = \frac{\partial \left (\frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} \right )}{\partial z_i} = - \frac{e^{z_j} e^{z_i}}{\left ( \sum_{k=1}^K e^{z_k} \right )^2} = - \hat{y_j} \hat{y_i}. $$
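Both cases together say that the softmax Jacobian is $\mathrm{diag}(\hat{y}) - \hat{y}\hat{y}^{\top}$. Here is a small NumPy sketch, under that assumption, which builds the Jacobian and checks it against central finite differences (the helper names `softmax` and `softmax_jacobian` are mine, chosen for illustration):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: y_hat_j = exp(z_j) / sum_k exp(z_k)."""
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def softmax_jacobian(z):
    """J[j, i] = d y_hat_j / d z_i.

    Combines the two cases derived above:
      i == j : y_hat_i * (1 - y_hat_i)
      i != j : -y_hat_j * y_hat_i
    which is exactly diag(y_hat) - outer(y_hat, y_hat).
    """
    y = softmax(z)
    return np.diag(y) - np.outer(y, y)

# Compare against a central finite-difference approximation
z = np.array([0.5, -1.0, 2.0])
eps = 1e-6
numeric = np.zeros((3, 3))
for i in range(3):
    dz = np.zeros(3)
    dz[i] = eps
    numeric[:, i] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)
print(np.allclose(softmax_jacobian(z), numeric, atol=1e-6))  # True
```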
Going back to the initial statement and substituting $\frac{\partial \hat{y_j}}{\partial z_i}$ gives:
$$ \frac{\partial L}{\partial z_i} = \frac{-t_i}{\hat{y_i}}\frac{\partial \hat{y_i}}{\partial z_i} + \sum_{\substack{j=1 \\ j \neq i}}^K \frac{-t_j}{\hat{y_j}} \frac{\partial \hat{y_j}}{\partial z_i} $$
$$ \frac{\partial L}{\partial z_i} = \frac{-t_i}{\hat{y_i}}\hat{y_i}(1-\hat{y_i}) + \sum_{\substack{j=1 \\ j \neq i}}^K \left ( \frac{-t_j}{\hat{y_j}} \right ) (-\hat{y_j} \hat{y_i}) $$
$$ \frac{\partial L}{\partial z_i} = - t_i + t_i \hat{y_i} + \sum_{\substack{j=1 \\ j \neq i}}^K t_j \hat{y_i} $$
Since the targets sum to one, $\sum_{k=1}^K t_k = 1$, we finally end up with
$$ \frac{\partial L}{\partial z_i} = -t_i + \hat{y_i} \sum_{j=1}^K t_j = \hat{y_i} - t_i. $$
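As a sanity check on the final result, the following sketch compares the analytic gradient $\hat{y} - t$ with a finite-difference approximation of the loss (the function names and test values are arbitrary, chosen only for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def loss(z, t):
    """Categorical cross entropy applied to the softmax of the logits z."""
    return -np.sum(t * np.log(softmax(z)))

z = np.array([0.5, -1.0, 2.0])
t = np.array([0.0, 0.0, 1.0])        # one-hot target, class 2

# Analytic gradient from the derivation: dL/dz_i = y_hat_i - t_i
analytic = softmax(z) - t

# Central finite-difference gradient for comparison
eps = 1e-6
numeric = np.array([
    (loss(z + eps * np.eye(3)[i], t) - loss(z - eps * np.eye(3)[i], t)) / (2 * eps)
    for i in range(3)
])
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```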