Comment on Oops, something went wrong!
hperrin@lemmy.ca 2 days agoI am a developer of software. I can guarantee you that what you’re asking for would make my job harder, because I’ve done it, and it has made my job harder. If an error is transient (like, a caching layer error, a db connection error, an external API error, an endpoint connectivity error, etc), giving the user an error code will make it more likely that they’ll file a useless bug report or support ticket. The errors are all logged internally, and we can see when there is a spike in the error count. There’s no reason to give the user an error code, because there’s nothing helpful that the user can do with it, and there’s a lot of unhelpful things a user can do with it.
There are times where a message to the user is appropriate, like if they made a mistake with their input. But there are so many things that could go wrong that the user can’t do anything about. You’re not going to work around your DB shard going down, and a replica will replace it in a few seconds anyway, so giving you an error code does more harm than good. Telling you to try again later is exactly what I would tell you if you filed a support ticket. I don’t want to deal with useless support tickets, and you don’t want to deal with useless error messages.
Modern software stacks are big, complex systems with lots of failure points. We monitor them, and we can tell when you see these errors. If we chose to not show you a specific error code/message, there’s almost definitely a good reason.
Cryophilia@lemmy.world 2 days ago
So what you’re saying is that your code is garbage and you’re hiding it from users because it’s too much work to fix it.
hperrin@lemmy.ca 2 days ago
What I’m saying is that error messages can be helpful or harmful. Knowing that and how to tell the difference is what makes you an expert. Just firing off any information to the user without thinking about it is what makes you a novice, and will eventually get you fired. We’re talking about systems with millions of daily users. If you cause 2,000 unnecessary support tickets every day because you don’t know when to send what information to the user, you won’t get very far in tech.
Cryophilia@lemmy.world 1 day ago
If you have 2000 daily people getting error messages, your code is garbage rofl
And if your company would rather you avoid those tickets by not giving out error codes, your company is also garbage. Which to be fair, is a lot of tech companies.
hperrin@lemmy.ca 1 day ago
I feel like you really don’t understand how big tech works. There’s not some single server running every service perfectly. There are tons of different layers and services running on thousands or hundreds of thousands of hosts.
Let’s say you make a request to something like Facebook. Say you’re liking a post. Here’s what happens:
That request goes in through a PoP (point of presence). These are sometimes called edge servers or edge gateways, but at Facebook we called them PoPs. This is a server that’s physically close to you that’s used to terminate the TLS connection. It doesn’t have any user data. Its job is to take your encrypted request, decrypt it, then pass it on to Facebook’s regional data center on their internal network.
The request enters a webby. These are usually called frontend servers, but again, at Facebook we called them webbies. This is a server that runs the monolithic Facebook web app. Again, it doesn’t have any user data. Its job is to take your request and orchestrate actions on deeper services to fulfill that request.
First it’s going to check a local memory cache server for sitevars. These control system level switches, like AB tests, and whether certain services are brought down. That server returns the sitevars and the webby proceeds, now knowing which logic paths to take.
For a like, which is a write request between your user account and a post, it will create two DB entries (you likes post, post liked by you). It needs to first get the data from the caching layer, so it will make two requests to TOA, one for your account, and one for the post.
TOA runs in the same regional data center, and if it doesn’t have the two data objects cached, it will request them from the regional db shards.
These regional db shards also run in the same data center, and they’ll return the data.
TOA returns the data back to the webby.
The webby (after doing some permission checks, which probably hit TOA again) now creates the two relationships, likes and liked by, referencing the two data objects, you and the post. TOA is a write-through cache, so the webby sends the writes to TOA.
TOA now needs to send the requests to the db primary shards, since they are the only ones that can handle writes. Your primary shard and the post’s primary shard are probably in different data centers, so TOA now passes the writes to the regional data centers for each primary shard.
A host running TOA in each regional data center for each primary shard now passes the write to each shard.
Each primary shard now writes the data to the local disk, and waits for the binary log to be written to the local journal before returning a success message.
The success message is passed from the local TOA host back to the original region’s TOA host.
When that TOA host gets both requests back successfully, it returns a success back to the webby handling your request.
The webby then returns a success to the PoP you’re still connected to.
The PoP then returns a success to the client running on your device.
The client doesn’t notify you of anything, because it already showed you a filled in like button right after you pressed it.
This was how it worked back in 2013 when I worked there. It probably hasn’t changed a whole lot, but this is also an extremely simplified overview. That request will probably hit hundreds of services. Some of them can fail and the request could still succeed. But some are required to succeed for your request to be considered successful, like the db write operations.
If you know a better way to make a system like this that works for billions of users across the planet, you should write a paper and submit it to a local conference. If they approve you for a talk, you can present your designs to an audience there. If the audience is really receptive, your designs could make a big impact in the tech sector. That’s basically what the highest level engineers at these big tech companies do when they design these multi-billion user systems, so it’s definitely possible for you to do it too.