Topic: [Bug/Need Info] Page 1 in web interface == Page 0 in JSON API

Posted under Site Bug Reports & Feature Requests

Bug overview description. The same page of results is presented as "1" (e.g. post/index/1/foobar) by the web interface, but as "0" (e.g. post/index.json?page=0) by the JSON API.
What part(s) of the site page(s) are affected? JSON post/index API. Possibly XML API as well. Potentially other parts of the query API that are paginated and have a web-interface analog.
What is the expected behavior? Requesting page=1 via the API gives the same results as post/index/1 in the web interface.
What actual behavior is given instead? Requesting page=1 via the API gives the same results as post/index/2 in the web interface.
Can you reproduce the bug every time? Yes. I have been seeing it for some time in a few queries, but only now took the time to figure out exactly what was wrong with the produced listings.
What steps did you take to replicate this bug? Fetched a series of result pages for {set:lancearmstrongfavs} from the JSON API, compared the IDs given with those found in the post/index/ pages.

I think it's likely all indices are offset by -1. However, I only tested with the first few page indices (site 1 -> api 0; site 2 -> api 1)

Updated

Unfortunately, page zero is an invalid page, and you cannot access it even if you try. I wasn't able to reproduce this in testing either.
The server applies page = 1 if page <= 0, and the web and API requests take the same code path, using the same query and paginator code. The only difference is the output path, where the collection is either serialized as JSON/XML or passed to the view code to be displayed. The output format is more of an afterthought and is not ingrained in the logic.

Requesting https://e621.net/post/index?page=1&limit=1 returns the same post ID and result as https://e621.net/post/index.json?page=1&limit=1

Are you sure this isn't an undefined sorting order problem? More details would be appreciated.

Updated by anonymous

Are you sure this isn't an undefined sorting order problem? More details would be appreciated.

Well, I verified that the order of the IDs matched between the JSON and the webpage; they were just offset by about one page. That is, I checked runs of 5 IDs: they were all in both, and they were presented in the same order.
I recall you talking about undefined sort order before, but don't really remember the details.

Short on time currently but will work up a minimal tester using jq when I get back.

EDIT:
I've confirmed what you said for pages 0 and 1 in the web and JSON API, including cross-checking. Not sure what else to do to push this forward. It could be a bug in my (fairly simple) code.
The main loop looks like:

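BEFORE=
for V in $(seq "$START" "$END"); do
   # $V only names the output file; it is sent to the server on the first request alone.
   FNAME=$(printf '%s-%03d.json' "$PREFIX" "$V")
   if [ -n "$BEFORE" ]; then
     # Later requests: continue from the last ID seen.
     download "$FNAME" "https://e621.net/post/index.json?limit=320&before_id=$BEFORE&tags=$WHAT"
   else
     # First request: ask for page $V.
     download "$FNAME" "https://e621.net/post/index.json?limit=320&page=$V&tags=$WHAT"
   fi

   # Report short or empty pages.
   LENGTH=$(jq length < "$FNAME")
   if [ "$LENGTH" -lt 320 ]; then
     if [ "$LENGTH" -lt 1 ]; then
       echomd " _320_ records expected, but *NONE* found"
     else
       echomd " _320_ records expected, but _${LENGTH}_ found"
     fi
   fi
   # ID of the last post on this page (jq prints null if the page was empty).
   BEFORE=$(jq '.[-1].id' < "$FNAME")
   echo "BEFORE=$BEFORE;  index=$V"
   # Pause between requests, except after the last one.
   [ "$V" -lt "$END" ] && sleep 30
done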


(where download is a wrapper function that chooses wget or curl according to availability)

Updated by anonymous

This would happen if you tried to use page zero (or any value less than or equal to zero) as a starting index, and then continued to download using before_id, because your internal page index goes up by one each time, but the server paging would still start at 1. It would create an effective page + 1 offset.

Something else to note here is that if you enter an order: tag, it will be effective only on the first request and then ignored for all subsequent requests, because before_id ignores the sorting order.
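
A small guard along these lines could catch that case before a download run starts. This is only a sketch: has_order_tag is a hypothetical helper, and it assumes the tag string is still space-separated (not yet URL-encoded) when it is checked.

```shell
# Hypothetical helper: succeeds when the tag string contains an order: metatag.
# Assumes tags are separated by spaces, i.e. checked before URL-encoding.
has_order_tag() {
  case " $1 " in
    *" order:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use with the script's $WHAT variable (defaults to empty here).
if has_order_tag "${WHAT:-}"; then
  echo "warning: order: only affects the first request when paging with before_id" >&2
fi
```

The padding spaces around "$1" are there so that a tag merely ending in "order:" something (for example a tag starting with "border:") is not mistaken for the metatag.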

Updated by anonymous

Well, I was using page 1 as the start before. I only switched to page 0 recently, as it seemed to address the problem of a pile of posts being omitted from results.

I'm not entirely sure I understand, though..

This would happen if you tried to use page zero (or any value less than or equal to zero) as a starting index, and then continued to download using before_id, because your internal page index goes up by one each time, but the server paging would still start at 1. It would create an effective page + 1 offset.

Why is the server paging even relevant when using before_id?

I can run a trace, but I'm pretty sure that the internal page ID is only used on the first download. I don't see how it can interact with e621 after that first query, since it isn't passed to e621.

...

For the sake of completeness, I'll mention that I don't use order: because I suspected I would encounter the behaviour you described.

Updated by anonymous

It isn't directly relevant, but it is conceptually relevant. You are using for V in $(seq $START $END); do, and $V determines which page your output appears to be at in the filename. This means that your script, not the server, is assigning the page numbers.

By this logic, if $START is zero, you have an invisible one-page offset in your output files. You originally requested page zero but received page one instead, so when your loop moves on to $V = 1, the before_id request returns the equivalent of page 2. Once the before_id logic kicks in, server paging no longer matters, only what your script thinks the page number is.
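
The labelling effect can be reproduced without touching the site. The sketch below is a simulation only: server_page is a hypothetical stand-in for the clamp quoted earlier (page = 1 if page <= 0), not e621's actual code, and it assumes each before_id request simply returns the page that follows the previous one.

```shell
# Hypothetical stand-in for the server-side clamp: pages <= 0 become page 1.
server_page() {
  if [ "$1" -le 0 ]; then echo 1; else echo "$1"; fi
}

# Print which server page ends up in each numbered output file.
# Only the first request carries page=$start; every later request uses
# before_id, so it is modelled as "the page after the previous one".
label_map() {
  start=$1
  end=$2
  served=$(server_page "$start")
  for v in $(seq "$start" "$end"); do
    echo "file $v -> server page $served"
    served=$((served + 1))
  done
}

label_map 0 2   # labels start at 0, server pages start at 1
label_map 1 3   # labels and server pages agree
```

With a start of 0 every file label is one lower than the server page it holds; with a start of 1 the two line up.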

I cannot attest to why it seems there are results missing, or what the cause of that would be. Tracing the script might provide better insight into the actual behavior over the perceived behavior.

Updated by anonymous

Ah, you mean the output numbers will be offset, rather than what e621 produces. Now I get it. That would result in the output numbered 1 being different from the output numbered 0, with the wrong implication that the on-disk '1' is actually what e621 calls page 1.

Didn't really mean to imply that you should know why there are results missing. The whole situation is more than a bit confusing.

For now, I'll amend the script to clamp the START value to >=1.
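
Roughly like this, placed before the main loop. It is a sketch, not the final script: clamp_start is a hypothetical helper name, and the fallback of 1 for an unset START is an assumption.

```shell
# Hypothetical helper: raise any start index below 1 up to 1.
clamp_start() {
  if [ "$1" -lt 1 ]; then echo 1; else echo "$1"; fi
}

# Assumption: START defaults to 1 when it has not been set.
START=$(clamp_start "${START:-1}")
echo "START=$START"
```

That way the on-disk numbering starts at the same index the server uses for its first page.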

Updated by anonymous
