ArXiV Technical Paper API Github Repo

My name is Ethan and I am trying to build an API for scraping technical papers, for developers to use. Right now it only works for arXiv, but I would greatly appreciate some mentoring or a code review of my repo. I am a new developer and want to bring my code up to professional quality.

Repo: https://github.com/evader110/ArXivPully

Source provided as well:

from falcon import API
from urllib import request
from bs4 import BeautifulSoup


class ArXivPully:
    # Removes rogue newline characters from the title and abstract
    def cleanText(self,text):
        return ' '.join(text.split('\n'))

    def pullFromArXiv(self,search_query, num_results=10):
        # Fix Input if it has spaces in it
        split_query = search_query.split(' ')
        if(len(split_query) > 1):
            search_query = '%20'.join(split_query)
        url = 'https://export.arxiv.org/api/query?search_query=all:'+search_query+'&start=0&max_results='+str(num_results)
        data = request.urlopen(url).read()
        output = []
        soup = BeautifulSoup(data, 'html.parser')
        titles = soup.find_all('title')

        # ArXiv populates the first title value as the search query
        titles.pop(0)

        bodies = soup.find_all('summary')
        links = soup.find_all('link', title='pdf')
        for i in range(len(titles)):
            title = self.cleanText(titles[i].text.strip())
            body = self.cleanText(bodies[i].text.strip())
            pdf_link = links[i]['href']
            output.append([pdf_link, title, body])
        return output

    def on_get(self, req, resp):
        """Handles GET requests"""
        output = []
        for item in req.params.items():
            output.append(self.pullFromArXiv(item[0],item[1]))
        resp.media = output


api = API()
api.add_route('/api/query', ArXivPully())
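
For reference, as on_get is written, each query-string key is treated as a search term and its value as the number of results. A minimal sketch of how a client might form such a request is below. The helper name build_request_url and the localhost base URL are assumptions for illustration and are not part of the repo.

```python
from urllib.parse import quote


def build_request_url(base_url, searches):
    """Build a GET URL for the /api/query route.

    `searches` maps a search term to the number of results wanted,
    mirroring how on_get reads each (key, value) pair of req.params.
    """
    # Percent-encode each search term so spaces become %20
    pairs = [quote(term, safe='') + '=' + str(count)
             for term, count in searches.items()]
    return base_url + '/api/query?' + '&'.join(pairs)


# Hypothetical local deployment; the real host is not given in the post
print(build_request_url('http://localhost:8000', {'deep learning': 5}))
```

Each key/value pair in the request produces one list of [pdf_link, title, body] entries in the response.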

Some design explanations: I run this API on Google Cloud Platform using the Falcon framework because both are free for me and were the simplest to implement. Some known issues are already posted in the repo, but I want to better understand software development skills, best practices, etc. I would greatly appreciate any tips, big or small, and I look forward to drastically changing this source code to make it more robust.
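
As one illustration of the kind of robustness change in question, the sketch below pairs each paper's fields by walking the Atom entry elements one at a time, instead of indexing three parallel lists. It is only a sketch under stated assumptions: it swaps BeautifulSoup for the standard library's xml.etree.ElementTree, the function name parse_entries is hypothetical, the sample feed is made up for the example, and it collapses all runs of whitespace rather than only replacing newlines.

```python
import xml.etree.ElementTree as ET

# Atom namespace used by the feed's elements
ATOM = '{http://www.w3.org/2005/Atom}'

# Made-up sample feed, not real arXiv output
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample query title</title>
  <entry>
    <title>An Example
  Title</title>
    <summary>First line
second line.</summary>
    <link title="pdf" href="http://example.org/pdf/sample"/>
  </entry>
</feed>"""


def parse_entries(feed_xml):
    """Return [pdf_link, title, summary] for each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.iter(ATOM + 'entry'):
        # split() + join collapses newlines and repeated spaces
        title = ' '.join(entry.findtext(ATOM + 'title', default='').split())
        summary = ' '.join(entry.findtext(ATOM + 'summary', default='').split())
        # An entry without a pdf link yields None instead of shifting indexes
        pdf_link = None
        for link in entry.findall(ATOM + 'link'):
            if link.get('title') == 'pdf':
                pdf_link = link.get('href')
                break
        results.append([pdf_link, title, summary])
    return results


print(parse_entries(SAMPLE_FEED))
```

Because only entry elements are visited, the feed-level title never enters the results and does not need to be popped.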

asked 11 mins ago

evader110

New contributor

python web-scraping api git
