Thursday, January 15, 2026

CoderByte assessments fall to ChatGPT · Ponderings of an Andy


Introduction

Today I will be evaluating CoderByte and their publicly posted challenges against ChatGPT. I chose an easy, a medium and the only hard
question from their list of open challenges and ran these through ChatGPT.

In earlier articles in this series, I’ve shown how ChatGPT can easily solve interview assessments from LeetCode,
TestGorilla, CodeSignal, Codility and HackerRank. I’ve said it in
those articles, but I’ll say it again: companies need to evaluate their engineering candidates (all candidates, actually) appropriately
when they’re hiring. Your engineers are using many tools to do their jobs, and AI tooling is one of the newest tools on the market. Removing
a tool from your engineers is going to make them less efficient, and not testing your candidates on how they use these tools is
going to mean you hire someone who has not fully demonstrated their capabilities.

ChatGPT should be treated as a tool, but it’s also helpful to think of it as a junior-level engineer. You have to check its work.
ChatGPT is known to be inaccurate, which is why it remains banned on Stack Overflow. There are a few instances where
I had to do that below.

First a disclaimer: I was Director of Engineering at Woven Teams in 2022. Prior to that,
I was a customer of Woven for over two years. I don’t remember CoderByte being mentioned during my tenure, and CoderByte doesn’t
seem to consider Woven a competitor, as Woven is not listed as an alternative on their pricing FAQ page.

Easy – React Button Toggle

While I’ve tried to be mostly random in my selection of assessments during these tests, I intentionally chose this React
assessment because I don’t know React. I’ve liked these kinds of assessment questions in the previous articles because it means I can’t
apply my experience to “nudge” ChatGPT toward a solution.

The challenge here is to toggle a button between “On” and “Off”. You’re presented with a very small amount of boilerplate code.

I simply copy/pasted the prompt and the template into ChatGPT. It sent back this:

import React, { useState } from 'react';
import ReactDOM from 'react-dom';

function Toggle() {
  const [toggle, setToggle] = useState(true);

  function handleClick() {
    setToggle(!toggle);
  }

  return (
    <button onClick={handleClick}>{toggle ? 'ON' : 'OFF'}</button>
  );
}

ReactDOM.render(
  <Toggle />,
  document.getElementById('root')
);

I copied that into the IDE provided by CoderByte and submitted the code.

This test took two minutes to complete. The bulk of that time was spot checking that ChatGPT didn’t mess with the boilerplate code too badly.

CoderByte - Easy Challenge - React Toggle Button Time Taken

The interesting thing about the results is that even with the 10/10, the system says it only scored higher than 12% of other users. Does this mean
that only 12% of the users passing this assessment are getting lower than a 10/10? To me, that is the sign of a bad assessment.

CoderByte - Easy Challenge - React Toggle Button Score comparison

Medium – SQL Member Count

The medium challenge I chose was an SQL question. SQL is interesting, in my experience, in that the results returned on an assessment are a
lot more uniform compared to a non-SQL coding assessment. This one wasn’t a hard one with joins.

The goal of this assessment was to return a list of names in alphabetical order, a count of the number of people who report to them, and the
average age of their team members. I pasted the prompt, the expected table layout for the output and the table layout being selected from.

The primary response was

SELECT ReportsTo, COUNT(ID) AS Members, AVG(Age) AS Average_Age
FROM maintable_YTTQH
WHERE ReportsTo IS NOT NULL
GROUP BY ReportsTo
ORDER BY ReportsTo;

This is where experience, and checking the results, are important. The expected output showed that the average age should be an integer. It also showed that the average age column should be named Average Age, not Average_Age (notice the lack of the underscore). I provided this guidance to ChatGPT:

The AVG(Age) should be an integer not a float and The “Average_Age” field should be “Average Age” without an underscore.

It corrected both problems with a new query, which I submitted.

SELECT ReportsTo, COUNT(ID) AS Members, ROUND(AVG(Age)) AS "Average Age"
FROM maintable_YTTQH
WHERE ReportsTo IS NOT NULL
GROUP BY ReportsTo
ORDER BY ReportsTo;

CoderByte - Medium Challenge - SQL Score

ChatGPT scored another 10/10.

CoderByte - Medium Challenge - SQL Time Taken

Even with that extra round trip with ChatGPT, it took less than a minute to complete this assessment.

CoderByte - Medium Challenge - SQL Score Comparison

This assessment seems to have a slightly better split of user scores, which is progress I guess.
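For readers who want to try the corrected query locally, here is a minimal sketch using Python's built-in sqlite3 module. Only the table name and the ID, ReportsTo and Age columns come from the query above; the sample rows are made up for illustration, and the real CoderByte table has a layout that isn't reproduced in this article. The database engine CoderByte uses is also not stated here, so behavior may differ: in SQLite, for example, ROUND returns a floating-point value such as 31.0 rather than the integer 31.

```python
import sqlite3

# In-memory database with a hypothetical, cut-down version of the table.
# Only the names maintable_YTTQH, ID, ReportsTo and Age appear in the article.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE maintable_YTTQH (ID INTEGER, ReportsTo TEXT, Age INTEGER)")

# Made-up sample rows: (ID, ReportsTo, Age)
sample_rows = [
    (1, None, 50),      # reports to nobody, so the WHERE clause drops this row
    (2, "Jenny", 30),
    (3, "Jenny", 31),
    (4, "Jenny", 36),
    (5, "Bob", 40),
    (6, "Bob", 25),
    (7, "Bob", 28),
]
conn.executemany("INSERT INTO maintable_YTTQH VALUES (?, ?, ?)", sample_rows)

# The corrected query from the article, unchanged.
query = """
SELECT ReportsTo, COUNT(ID) AS Members, ROUND(AVG(Age)) AS "Average Age"
FROM maintable_YTTQH
WHERE ReportsTo IS NOT NULL
GROUP BY ReportsTo
ORDER BY ReportsTo;
"""
cursor = conn.execute(query)
columns = [col[0] for col in cursor.description]
result = cursor.fetchall()

print(columns)
for row in result:
    print(row)
```

The column names printed first make it easy to confirm that the alias is "Average Age" with a space, which was one of the two corrections requested.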

Hard – Bracket Combinations

There is only one hard assessment available publicly. The goal for this assessment is to return the number of combinations you can make with a number
of pairs of parentheses. The example provided is:

The possible combinations of three pairs of parenthesis, namely: ()()(), are ()()(), ()(()), (())(), ((())), and (()())

I copied the prompt and boilerplate code into ChatGPT and it returned the following response:

def BracketCombinations(num):

    from math import factorial
    return factorial(2 * num) // (factorial(num) * factorial(num + 1))

# keep this function call here
print(BracketCombinations(input()))

Experience here says that’s odd. Technically speaking, Python will allow you to have that import statement inside a function, but if I saw one of
my developers do it, I’d mention it during a code review. I was curious though and submitted it as is.

I also went back and told ChatGPT:

Write the function without importing factorial

It replied with this function:

def BracketCombinations(num):
    def fact(n):
        result = 1
        for i in range(1, n + 1):
            result *= i
        return result

    return fact(2 * num) // (fact(num) * fact(num + 1))

# keep this function call here
print(BracketCombinations(int(input())))

I submitted this response too.

I only have one set of images for this challenge, because they both output exactly the same responses. Both solutions scored the same. Both solutions
were better than 35.1% of other users. Both solutions had an O(n) complexity.
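Both answers evaluate the same closed form, (2n)! / (n! (n+1)!), which is the nth Catalan number. As a sanity check that I did not run on CoderByte, here is a minimal sketch that compares that formula against a brute-force enumeration of balanced parenthesis strings for small inputs. The helper names catalan_closed_form and balanced_strings are my own and are not part of the CoderByte template or the ChatGPT output.

```python
from math import factorial


def catalan_closed_form(n):
    # Same expression ChatGPT returned: (2n)! / (n! * (n+1)!)
    return factorial(2 * n) // (factorial(n) * factorial(n + 1))


def balanced_strings(n):
    """Yield every balanced arrangement of n pairs of parentheses."""
    def extend(prefix, opened, closed):
        if opened == n and closed == n:
            yield prefix
            return
        if opened < n:
            # There are still opening brackets left to place.
            yield from extend(prefix + "(", opened + 1, closed)
        if closed < opened:
            # A closing bracket is only valid if it matches an open one.
            yield from extend(prefix + ")", opened, closed + 1)

    yield from extend("", 0, 0)


# The brute-force count should agree with the formula for small n.
for n in range(1, 7):
    assert sum(1 for _ in balanced_strings(n)) == catalan_closed_form(n)

print(sorted(balanced_strings(3)))
print(catalan_closed_form(3))
```

For three pairs, the enumeration produces the same five strings listed in the challenge example. The brute-force version grows quickly with n, so it is only useful as a check, not as a submission.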

CoderByte - Hard Challenge - Combination Score and Complexity

Displaying the complexity is a nice touch. I’ll admit that I was expecting the solution that imported factorial to be rejected because it’s using a
built-in function instead of rolling your own. Good for CoderByte for allowing a developer to use the built-in libraries as an engineer would do in
a production environment, instead of adding a false constraint.

CoderByte - Hard Challenge - Combination Time Taken

Copy pasting to CoderByte kept this close to a minute.

Final Thoughts

Another code-only assessment tool has been shown to be ineffective. The one thing this one does, that others don’t show publicly, is the
time it took to solve a problem. I’m very surprised that a majority of these were solved in under ten minutes. Even so, solving a problem in
less than a minute will probably raise a flag of some kind for a hiring manager giving this type of assessment. I guarantee that time is visible
to them.

These three assessments show how good ChatGPT can be at being your virtual junior engineer. With the React question, I don’t know if it’s
the most efficient way to solve the problem, but it works. With the SQL question, experience identified two small problems and allowed me to
provide feedback and get a corrected query in seconds. The third question turned in two different ways of solving the problem, both of which,
according to CoderByte, perform exactly the same.

The total time it took me to run these three tests was under five minutes. That is an amazing tool to have in my pocket.
