- March 3, 2014
- Vasilis Vryniotis
In the previous article we discussed the Data Envelopment Analysis technique and we saw how it can be used as an effective non-parametric ranking algorithm. In this blog post we will develop an implementation of Data Envelopment Analysis in JAVA and we will use it to evaluate the Social Media Popularity of webpages and articles on the web. The code is open-sourced (under GPL v3 license) and you can download it freely from Github.
Update: The Datumbox Machine Learning Framework is now open-source and free to download. Check out the package com.datumbox.framework.algorithms.dea to see the implementation of Data Envelopment Analysis in Java.
Data Envelopment Analysis implementation in JAVA
The code is written in JAVA and can be downloaded directly from Github. It is licensed under GPLv3 so feel free to use it, modify it and redistribute it freely.
The code implements the Data Envelopment Analysis algorithm, uses the lp_solve library to solve the Linear Programming problems and uses data extracted from the Web SEO Analytics index in order to construct a composite social media popularity metric for webpages based on their shares on Facebook, Google Plus and Twitter. All the theoretical aspects of the algorithm are covered in the previous article, and in the source code you can find detailed javadoc comments concerning the implementation.
Below we provide a high level description of the architecture of the implementation:
1. lp_solve 5.5 library
In order to solve the various linear programming problems, we use an open source library called lp_solve. The particular lib is written in ANSI C and uses a JAVA wrapper to invoke the library methods. Thus before running the code you must install lp_solve on your system. Binaries of the library are available both for Linux and Windows, and you can read more information about the installation in the lp_solve documentation.
Please make sure that the particular library is installed on your system before trying to run the JAVA code. For any problem concerning installing and configuring the library please refer to the lp_solve documentation.
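A quick way to confirm that the JVM can actually see the native part of lp_solve is to try loading it yourself. The sketch below assumes the JNI library of the lp_solve 5.5 Java wrapper is named lpsolve55j (lpsolve55j.dll on Windows, liblpsolve55j.so on Linux); the class name LpSolveCheck is hypothetical and not part of the project.

```java
public class LpSolveCheck {
    // Assumed JNI library name of the lp_solve 5.5 Java wrapper
    // (lpsolve55j.dll on Windows, liblpsolve55j.so on Linux)
    static final String WRAPPER = "lpsolve55j";

    // Returns true if the JVM could load the native wrapper and its dependencies
    public static boolean isAvailable() {
        try {
            System.loadLibrary(WRAPPER);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Directories the JVM searches for native libraries
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
        System.out.println(isAvailable() ? "lp_solve wrapper found" : "lp_solve wrapper NOT found");
    }
}
```

If the wrapper is not found, the directory that contains it can be passed to the JVM with -Djava.library.path=... or added to PATH (Windows) or LD_LIBRARY_PATH (Linux).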
2. DataEnvelopmentAnalysis Class
This is the main class of the implementation of the DEA algorithm. It implements a public method called estimateEfficiency() which takes a Map of records and returns their DEA scores.
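For reference, this is the textbook input-oriented CCR model in multiplier form, the linear program that DEA solves once for every record $o$, with outputs $y_{rj}$, inputs $x_{ij}$ and $n$ records. We assume the class uses this or an equivalent formulation; the javadoc comments in the source describe the exact version.

```latex
\begin{aligned}
\max_{u,\,v}\quad & \theta_o = \sum_{r=1}^{s} u_r\, y_{ro} \\
\text{s.t.}\quad & \sum_{i=1}^{m} v_i\, x_{io} = 1, \\
& \sum_{r=1}^{s} u_r\, y_{rj} - \sum_{i=1}^{m} v_i\, x_{ij} \le 0, \qquad j = 1, \dots, n, \\
& u_r \ge 0, \quad v_i \ge 0 .
\end{aligned}
```

The optimal $\theta_o$ lies between 0 and 1, and records with $\theta_o = 1$ lie on the efficient frontier. Since one such program is needed per record, scoring $n$ records means solving $n$ linear programs.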
3. DeaRecord Object
The DeaRecord is a special Object that stores the data of our record. Since DEA requires separating the input and output, the DeaRecord Object stores our data separately in a way that DEA can handle it.
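To make the separation concrete, here is a minimal stand-in for such a record. It is a hypothetical class (DeaRecordSketch, with made-up getters), not the framework's DeaRecord; the only detail borrowed from this post is the argument order (output, input) used in the example further down.

```java
import java.util.Arrays;

// Hypothetical stand-in for DeaRecord: keeps outputs and inputs in separate arrays
public final class DeaRecordSketch {
    private final double[] output;
    private final double[] input;

    public DeaRecordSketch(double[] output, double[] input) {
        // Defensive copies so callers cannot modify the record afterwards
        this.output = Arrays.copyOf(output, output.length);
        this.input = Arrays.copyOf(input, input.length);
    }

    public double[] getOutput() { return Arrays.copyOf(output, output.length); }
    public double[] getInput() { return Arrays.copyOf(input, input.length); }

    public static void main(String[] args) {
        // Depot1: outputs ISSUES, RECEIPTS, REQS; inputs STOCK, WAGES
        DeaRecordSketch depot1 = new DeaRecordSketch(
                new double[]{40.0, 55.0, 30.0}, new double[]{3.0, 5.0});
        System.out.println(depot1.getOutput().length + " outputs, "
                + depot1.getInput().length + " inputs");
    }
}
```

An output-only record, as used for social media counts, would simply pass an empty input array: new DeaRecordSketch(new double[]{135, 337, 9079}, new double[0]).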
4. SocialMediaPopularity Class
The SocialMediaPopularity is an application which uses DEA to evaluate the popularity of a page on Social Media networks based on its Facebook likes, Google +1s, and Tweets. It implements two protected methods, calculatePopularity() and estimatePercentiles(), along with two public methods, loadFile() and getPopularity().
The calculatePopularity() method uses the DEA implementation to estimate the scores of the pages based on their social media counts. The estimatePercentiles() method gets the DEA scores and converts them into percentiles. In general, percentiles are easier to explain than DEA scores; thus when we say that the popularity score of a page is 70% it means that the particular page is more popular than 70% of the pages.
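The conversion from DEA scores to percentiles can be sketched as follows, assuming "more popular than X% of the pages" means the share of pages with a strictly lower DEA score. The helper name toPercentiles is hypothetical; estimatePercentiles() in the project may use a different convention.

```java
import java.util.Arrays;

public class PercentileSketch {
    // Hypothetical helper: for every score, the percentage of scores strictly lower than it
    public static double[] toPercentiles(double[] scores) {
        double[] sorted = scores.clone();
        Arrays.sort(sorted);
        double[] percentiles = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            int lower = lowerBound(sorted, scores[i]); // number of scores < scores[i]
            percentiles[i] = 100.0 * lower / scores.length;
        }
        return percentiles;
    }

    // Index of the first element >= key in a sorted array (binary search)
    private static int lowerBound(double[] sorted, double key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        double[] dea = {0.2, 1.0, 0.5, 0.8}; // illustrative DEA scores
        System.out.println(Arrays.toString(toPercentiles(dea)));
    }
}
```

With this convention pages with equal DEA scores share a percentile, and the top page of n gets 100(n-1)/n rather than 100, since it is only compared with pages that score strictly lower.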
In order to be able to estimate the popularity of a particular page, we must have a dataset with the social media counts of other pages. This makes sense, since in order to predict which page is popular and which is not, you must be able to compare it with other pages on the web. To do so, we use a small anonymized sample from the Web SEO Analytics index provided in txt format. You can build your own database by extracting the social media counts from more pages on the web.
The loadFile() method is used to load the aforementioned statistics into DEA, and the getPopularity() method is an easy to use method that gets the Facebook likes, Google +1s and number of Tweets of a page and evaluates its popularity on social media.
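The layout of the txt file is not described in this post, so the loader below is only a sketch under an explicit assumption: one page per line holding three counts (Facebook likes, Google +1s, Tweets) separated by commas or whitespace. The class SocialCountsLoaderSketch and its parse() method are hypothetical; check the dataset in the repository for its real format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SocialCountsLoaderSketch {
    // Hypothetical parser; ASSUMES "likes,plusones,tweets" (or space separated) per line
    public static List<double[]> parse(BufferedReader reader) throws IOException {
        List<double[]> pages = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue; // skip blank lines
            String[] parts = line.split("[,\\s]+");
            if (parts.length != 3) {
                throw new IOException("Expected 3 counts, got: " + line);
            }
            double[] counts = new double[3];
            for (int i = 0; i < 3; i++) counts[i] = Double.parseDouble(parts[i]);
            pages.add(counts); // each page becomes an output-only DEA record
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        // First line uses the counts from the example in this post; second line is made up
        String sample = "135,337,9079\n10 2 40\n";
        List<double[]> pages = parse(new BufferedReader(new StringReader(sample)));
        System.out.println(pages.size() + " pages loaded");
    }
}
```

To read the bundled resource instead of a string, the URL returned by getResource() can be wrapped as new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)).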
Using the Data Envelopment Analysis JAVA implementation
In the DataEnvelopmentAnalysisExample Class I provide 2 different examples of how to use the code.
The first example uses the DEA method directly to evaluate the efficiency of organizational units based on their output (ISSUES, RECEIPTS, REQS) and input (STOCK, WAGES). This example was taken from an article of DEAzone.com.
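import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

Map<String, DeaRecord> records = new LinkedHashMap<>();
//new DeaRecord(output, input): outputs ISSUES, RECEIPTS, REQS; inputs STOCK, WAGES
records.put("Depot1", new DeaRecord(new double[]{40.0,55.0,30.0}, new double[]{3.0,5.0}));
//...adding more records here...
DataEnvelopmentAnalysis dea = new DataEnvelopmentAnalysis();
Map<String, Double> results = dea.estimateEfficiency(records);
System.out.println((new TreeMap<>(results)).toString()); //print the DEA scores sorted by record name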
The second example uses our Social Media Popularity application to evaluate the popularity of a page by using data from Social Media such as Facebook Likes, Google +1s and Tweets. All social media counts are marked as output and we pass to DEA an empty input vector.
SocialMediaPopularity rank = new SocialMediaPopularity();
rank.loadFile(DataEnvelopmentAnalysisExample.class.getResource("/datasets/socialcounts.txt"));
Double popularity = rank.getPopularity(135, 337, 9079); //Facebook likes, Google +1s, Tweets
System.out.println("Page Social Media Popularity: "+popularity.toString());
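With an empty input vector, each DEA program reduces to maximizing the weighted outputs of the evaluated page, subject to every page's weighted outputs being at most 1 and all weights non-negative. This is the CCR model with the same constant unit input for every page, which is our reading of what the empty input vector amounts to. The sketch below solves that program with a small textbook simplex (Bland's rule) in place of lp_solve, so it runs without native libraries. It is not the Datumbox code, and the class and method names are hypothetical.

```java
import java.util.Arrays;

public class OutputOnlyDea {
    // Hypothetical helper. Score of record o when there are no inputs:
    //   max  sum_r u[r] * y[o][r]
    //   s.t. sum_r u[r] * y[j][r] <= 1 for every record j,  u >= 0
    // Solved with a dense simplex tableau using Bland's rule (stands in for lp_solve).
    public static double score(double[][] y, int o) {
        int m = y.length, s = y[0].length;
        int cols = s + m + 1;                        // weights u, slacks, right-hand side
        double[][] t = new double[m + 1][cols];
        int[] basis = new int[m];
        for (int j = 0; j < m; j++) {
            for (int r = 0; r < s; r++) t[j][r] = y[j][r];
            t[j][s + j] = 1.0;                       // slack of constraint j
            t[j][cols - 1] = 1.0;                    // right-hand side
            basis[j] = s + j;                        // slacks form a feasible starting basis
        }
        for (int r = 0; r < s; r++) t[m][r] = -y[o][r]; // objective row: z - y_o . u = 0
        final double eps = 1e-12;
        while (true) {
            int enter = -1;                          // Bland: lowest index with negative reduced cost
            for (int c = 0; c < cols - 1 && enter < 0; c++) {
                if (t[m][c] < -eps) enter = c;
            }
            if (enter < 0) return t[m][cols - 1];   // optimal: no improving column left
            int leave = -1;                          // minimum-ratio test, ties by lowest basic index
            double best = Double.POSITIVE_INFINITY;
            for (int j = 0; j < m; j++) {
                if (t[j][enter] <= eps) continue;
                double ratio = t[j][cols - 1] / t[j][enter];
                if (leave < 0 || ratio < best - eps
                        || (ratio <= best + eps && basis[j] < basis[leave])) {
                    best = ratio;
                    leave = j;
                }
            }
            if (leave < 0) throw new IllegalStateException("unbounded program");
            double pivot = t[leave][enter];
            for (int c = 0; c < cols; c++) t[leave][c] /= pivot;
            for (int i = 0; i <= m; i++) {
                double f = t[i][enter];
                if (i == leave || f == 0.0) continue;
                for (int c = 0; c < cols; c++) t[i][c] -= f * t[leave][c];
            }
            basis[leave] = enter;
        }
    }

    public static void main(String[] args) {
        // Illustrative records with two outputs each and no inputs
        double[][] y = {{2, 0}, {0, 2}, {0.5, 0.5}};
        double[] scores = new double[y.length];
        for (int o = 0; o < y.length; o++) scores[o] = score(y, o);
        System.out.println(Arrays.toString(scores));
    }
}
```

For these illustrative records the first two lie on the frontier (score 1.0), while the third scores 0.5, since doubling its outputs reaches the segment between the other two. Because every right-hand side equals 1, the slack variables give a feasible starting point and no phase-one step is needed; models with real inputs add the equality constraint on the weighted inputs and are better left to a full solver such as lp_solve.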
Necessary Expansions
The provided code is just an example of how DEA can be used as a ranking algorithm. Here are a few expansions that must be made in order to improve the implementation:
1. Speeding up the implementation
The particular DEA implementation evaluates the DEA scores of all the records in the database. This makes the implementation slow, since we need to solve as many linear programming problems as the number of records in the database. If we do not need to calculate the scores of all the records, then we can speed up the execution significantly. Thus a small expansion of the algorithm can give us better control over which records should be solved and which should be used only as constraints.
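The split between records that are scored and records that only act as constraints is easiest to see in the single-input, single-output case, where the CCR score has the closed form (y_o/x_o) / max_j (y_j/x_j). The sketch below uses that special case (not the general multi-output LP the project solves) and a hypothetical scoreSubset() helper: every record enters the constraint through the best ratio, but only the requested records are scored.

```java
import java.util.Arrays;

public class SubsetDeaSketch {
    // Hypothetical helper for the single-input, single-output CCR case (x > 0):
    // efficiency_o = (y_o / x_o) / max_j (y_j / x_j)
    public static double[] scoreSubset(double[] x, double[] y, int[] targets) {
        double best = 0.0;
        for (int j = 0; j < x.length; j++) {
            best = Math.max(best, y[j] / x[j]);     // all records shape the frontier
        }
        double[] scores = new double[targets.length];
        for (int k = 0; k < targets.length; k++) {
            int o = targets[k];                      // only these records are scored
            scores[k] = (y[o] / x[o]) / best;
        }
        return scores;
    }

    public static void main(String[] args) {
        double[] input = {2.0, 4.0, 5.0};
        double[] output = {4.0, 4.0, 15.0};
        // ratios 2, 1, 3 -> best is 3; score only records 0 and 1
        System.out.println(Arrays.toString(scoreSubset(input, output, new int[]{0, 1})));
    }
}
```

In the general case each scored record still needs its own linear program, but the number of programs drops from the size of the database to the number of records you actually want to rank.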
2. Expanding the Social Media Counts Database
The provided Social Media Counts Database consists of 1111 samples from the Web SEO Analytics index. To be able to estimate a more accurate popularity score, a larger sample is necessary. You can create your own database by collecting the social media counts of more pages on the web.
3. Adding more Social Media Networks
The implementation uses the Facebook Likes, the Google +1s and the number of Tweets to evaluate the popularity of an article. Nevertheless, metrics from other social media networks can easily be taken into account. All you need to do is build a database with the social media counts from the networks that you are interested in and expand the SocialMediaPopularity class to handle them accordingly.
Final comments on the implementation
To be able to expand the implementation you must have a good understanding of how Data Envelopment Analysis works. This is covered in the previous article, so please make sure you read the tutorial before you proceed to any modifications. Moreover, in order to use the JAVA code you must have the lp_solve library installed on your system (see above).
If you use the implementation in an interesting project, drop us a line and we will feature your project on our blog. Also, if you like the article, please take a moment and share it on Twitter or Facebook.
